Search results for: rainfall intensity-duration-frequency data
23581 Towards a Distributed Computation Platform Tailored for Educational Process Discovery and Analysis
Authors: Awatef Hicheur Cairns, Billel Gueni, Hind Hafdi, Christian Joubert, Nasser Khelifa
Abstract:
Given the ever changing needs of the job markets, education and training centers are increasingly held accountable for student success. Therefore, education and training centers have to focus on ways to streamline their offers and educational processes in order to achieve the highest level of quality in curriculum contents and managerial decisions. Educational process mining is an emerging field in the educational data mining (EDM) discipline, concerned with developing methods to discover, analyze and provide a visual representation of complete educational processes. In this paper, we present our distributed computation platform which allows different education centers and institutions to load their data and access to advanced data mining and process mining services. To achieve this, we present also a comparative study of the different clustering techniques developed in the context of process mining to partition efficiently educational traces. Our goal is to find the best strategy for distributing heavy analysis computations on many processing nodes of our platform.Keywords: educational process mining, distributed process mining, clustering, distributed platform, educational data mining, ProM
Procedia PDF Downloads 45423580 Assimilating Remote Sensing Data Into Crop Models: A Global Systematic Review
Authors: Luleka Dlamini, Olivier Crespo, Jos van Dam
Abstract:
Accurately estimating crop growth and yield is pivotal for timely sustainable agricultural management and ensuring food security. Crop models and remote sensing can complement each other and form a robust analysis tool to improve crop growth and yield estimations when combined. This study thus aims to systematically evaluate how research that exclusively focuses on assimilating RS data into crop models varies among countries, crops, data assimilation methods, and farming conditions. A strict search string was applied in the Scopus and Web of Science databases, and 497 potential publications were obtained. After screening for relevance with predefined inclusion/exclusion criteria, 123 publications were considered in the final review. Results indicate that over 81% of the studies were conducted in countries associated with high socio-economic and technological advancement, mainly China, the United States of America, France, Germany, and Italy. Many of these studies integrated MODIS or Landsat data into WOFOST to improve crop growth and yield estimation of staple crops at the field and regional scales. Most studies use recalibration or updating methods alongside various algorithms to assimilate remotely sensed leaf area index into crop models. However, these methods cannot account for the uncertainties in remote sensing observations and the crop model itself. l. Over 85% of the studies were based on commercial and irrigated farming systems. Despite a great global interest in data assimilation into crop models, limited research has been conducted in resource- and data-limited regions like Africa. We foresee a great potential for such application in those conditions. Hence facilitating and expanding the use of such an approach, from which developing farming communities could benefit.Keywords: crop models, remote sensing, data assimilation, crop yield estimation
Procedia PDF Downloads 13123579 Assimilating Remote Sensing Data into Crop Models: A Global Systematic Review
Authors: Luleka Dlamini, Olivier Crespo, Jos van Dam
Abstract:
Accurately estimating crop growth and yield is pivotal for timely sustainable agricultural management and ensuring food security. Crop models and remote sensing can complement each other and form a robust analysis tool to improve crop growth and yield estimations when combined. This study thus aims to systematically evaluate how research that exclusively focuses on assimilating RS data into crop models varies among countries, crops, data assimilation methods, and farming conditions. A strict search string was applied in the Scopus and Web of Science databases, and 497 potential publications were obtained. After screening for relevance with predefined inclusion/exclusion criteria, 123 publications were considered in the final review. Results indicate that over 81% of the studies were conducted in countries associated with high socio-economic and technological advancement, mainly China, the United States of America, France, Germany, and Italy. Many of these studies integrated MODIS or Landsat data into WOFOST to improve crop growth and yield estimation of staple crops at the field and regional scales. Most studies use recalibration or updating methods alongside various algorithms to assimilate remotely sensed leaf area index into crop models. However, these methods cannot account for the uncertainties in remote sensing observations and the crop model itself. l. Over 85% of the studies were based on commercial and irrigated farming systems. Despite a great global interest in data assimilation into crop models, limited research has been conducted in resource- and data-limited regions like Africa. We foresee a great potential for such application in those conditions. Hence facilitating and expanding the use of such an approach, from which developing farming communities could benefit.Keywords: crop models, remote sensing, data assimilation, crop yield estimation
Procedia PDF Downloads 8223578 Enhancing Robustness in Federated Learning through Decentralized Oracle Consensus and Adaptive Evaluation
Authors: Peiming Li
Abstract:
This paper presents an innovative blockchain-based approach to enhance the reliability and efficiency of federated learning systems. By integrating a decentralized oracle consensus mechanism into the federated learning framework, we address key challenges of data and model integrity. Our approach utilizes a network of redundant oracles, functioning as independent validators within an epoch-based training system in the federated learning model. In federated learning, data is decentralized, residing on various participants' devices. This scenario often leads to concerns about data integrity and model quality. Our solution employs blockchain technology to establish a transparent and tamper-proof environment, ensuring secure data sharing and aggregation. The decentralized oracles, a concept borrowed from blockchain systems, act as unbiased validators. They assess the contributions of each participant using a Hidden Markov Model (HMM), which is crucial for evaluating the consistency of participant inputs and safeguarding against model poisoning and malicious activities. Our methodology's distinct feature is its epoch-based training. An epoch here refers to a specific training phase where data is updated and assessed for quality and relevance. The redundant oracles work in concert to validate data updates during these epochs, enhancing the system's resilience to security threats and data corruption. The effectiveness of this system was tested using the Mnist dataset, a standard in machine learning for benchmarking. Results demonstrate that our blockchain-oriented federated learning approach significantly boosts system resilience, addressing the common challenges of federated environments. This paper aims to make these advanced concepts accessible, even to those with a limited background in blockchain or federated learning. We provide a foundational understanding of how blockchain technology can revolutionize data integrity in decentralized systems and explain the role of oracles in maintaining model accuracy and reliability.Keywords: federated learning system, block chain, decentralized oracles, hidden markov model
Procedia PDF Downloads 6323577 Multiple Query Optimization in Wireless Sensor Networks Using Data Correlation
Authors: Elaheh Vaezpour
Abstract:
Data sensing in wireless sensor networks is done by query deceleration the network by the users. In many applications of the wireless sensor networks, many users send queries to the network simultaneously. If the queries are processed separately, the network’s energy consumption will increase significantly. Therefore, it is very important to aggregate the queries before sending them to the network. In this paper, we propose a multiple query optimization framework based on sensors physical and temporal correlation. In the proposed method, queries are merged and sent to network by considering correlation among the sensors in order to reduce the communication cost between the sensors and the base station.Keywords: wireless sensor networks, multiple query optimization, data correlation, reducing energy consumption
Procedia PDF Downloads 33423576 Efficient Tuning Parameter Selection by Cross-Validated Score in High Dimensional Models
Authors: Yoonsuh Jung
Abstract:
As DNA microarray data contain relatively small sample size compared to the number of genes, high dimensional models are often employed. In high dimensional models, the selection of tuning parameter (or, penalty parameter) is often one of the crucial parts of the modeling. Cross-validation is one of the most common methods for the tuning parameter selection, which selects a parameter value with the smallest cross-validated score. However, selecting a single value as an "optimal" value for the parameter can be very unstable due to the sampling variation since the sample sizes of microarray data are often small. Our approach is to choose multiple candidates of tuning parameter first, then average the candidates with different weights depending on their performance. The additional step of estimating the weights and averaging the candidates rarely increase the computational cost, while it can considerably improve the traditional cross-validation. We show that the selected value from the suggested methods often lead to stable parameter selection as well as improved detection of significant genetic variables compared to the tradition cross-validation via real data and simulated data sets.Keywords: cross validation, parameter averaging, parameter selection, regularization parameter search
Procedia PDF Downloads 41523575 Digital Image Steganography with Multilayer Security
Authors: Amar Partap Singh Pharwaha, Balkrishan Jindal
Abstract:
In this paper, a new method is developed for hiding image in a digital image with multilayer security. In the proposed method, the secret image is encrypted in the first instance using a flexible matrix based symmetric key to add first layer of security. Then another layer of security is added to the secret data by encrypting the ciphered data using Pythagorean Theorem method. The ciphered data bits (4 bits) produced after double encryption are then embedded within digital image in the spatial domain using Least Significant Bits (LSBs) substitution. To improve the image quality of the stego-image, an improved form of pixel adjustment process is proposed. To evaluate the effectiveness of the proposed method, image quality metrics including Peak Signal-to-Noise Ratio (PSNR), Mean Square Error (MSE), entropy, correlation, mean value and Universal Image Quality Index (UIQI) are measured. It has been found experimentally that the proposed method provides higher security as well as robustness. In fact, the results of this study are quite promising.Keywords: Pythagorean theorem, pixel adjustment, ciphered data, image hiding, least significant bit, flexible matrix
Procedia PDF Downloads 33723574 MapReduce Logistic Regression Algorithms with RHadoop
Authors: Byung Ho Jung, Dong Hoon Lim
Abstract:
Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. Logistic regression is used extensively in numerous disciplines, including the medical and social science fields. In this paper, we address the problem of estimating parameters in the logistic regression based on MapReduce framework with RHadoop that integrates R and Hadoop environment applicable to large scale data. There exist three learning algorithms for logistic regression, namely Gradient descent method, Cost minimization method and Newton-Rhapson's method. The Newton-Rhapson's method does not require a learning rate, while gradient descent and cost minimization methods need to manually pick a learning rate. The experimental results demonstrated that our learning algorithms using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also compared the performance of our Newton-Rhapson's method with gradient descent and cost minimization methods. The results showed that our newton's method appeared to be the most robust to all data tested.Keywords: big data, logistic regression, MapReduce, RHadoop
Procedia PDF Downloads 28523573 Iterative Panel RC Extraction for Capacitive Touchscreen
Authors: Chae Hoon Park, Jong Kang Park, Jong Tae Kim
Abstract:
Electrical characteristics of capacitive touchscreen need to be accurately analyzed to result in better performance for multi-channel capacitance sensing. In this paper, we extracted the panel resistances and capacitances of the touchscreen by comparing measurement data and model data. By employing a lumped RC model for driver-to-receiver paths in touchscreen, we estimated resistance and capacitance values according to the physical lengths of channel paths which are proportional to the RC model. As a result, we obtained the model having 95.54% accuracy of the measurement data.Keywords: electrical characteristics of capacitive touchscreen, iterative extraction, lumped RC model, physical lengths of channel paths
Procedia PDF Downloads 33423572 Combining Shallow and Deep Unsupervised Machine Learning Techniques to Detect Bad Actors in Complex Datasets
Authors: Jun Ming Moey, Zhiyaun Chen, David Nicholson
Abstract:
Bad actors are often hard to detect in data that imprints their behaviour patterns because they are comparatively rare events embedded in non-bad actor data. An unsupervised machine learning framework is applied here to detect bad actors in financial crime datasets that record millions of transactions undertaken by hundreds of actors (<0.01% bad). Specifically, the framework combines ‘shallow’ (PCA, Isolation Forest) and ‘deep’ (Autoencoder) methods to detect outlier patterns. Detection performance analysis for both the individual methods and their combination is reported.Keywords: detection, machine learning, deep learning, unsupervised, outlier analysis, data science, fraud, financial crime
Procedia PDF Downloads 9523571 Machine Learning Development Audit Framework: Assessment and Inspection of Risk and Quality of Data, Model and Development Process
Authors: Jan Stodt, Christoph Reich
Abstract:
The usage of machine learning models for prediction is growing rapidly and proof that the intended requirements are met is essential. Audits are a proven method to determine whether requirements or guidelines are met. However, machine learning models have intrinsic characteristics, such as the quality of training data, that make it difficult to demonstrate the required behavior and make audits more challenging. This paper describes an ML audit framework that evaluates and reviews the risks of machine learning applications, the quality of the training data, and the machine learning model. We evaluate and demonstrate the functionality of the proposed framework by auditing an steel plate fault prediction model.Keywords: audit, machine learning, assessment, metrics
Procedia PDF Downloads 27123570 Efficient Sampling of Probabilistic Program for Biological Systems
Authors: Keerthi S. Shetty, Annappa Basava
Abstract:
In recent years, modelling of biological systems represented by biochemical reactions has become increasingly important in Systems Biology. Biological systems represented by biochemical reactions are highly stochastic in nature. Probabilistic model is often used to describe such systems. One of the main challenges in Systems biology is to combine absolute experimental data into probabilistic model. This challenge arises because (1) some molecules may be present in relatively small quantities, (2) there is a switching between individual elements present in the system, and (3) the process is inherently stochastic on the level at which observations are made. In this paper, we describe a novel idea of combining absolute experimental data into probabilistic model using tool R2. Through a case study of the Transcription Process in Prokaryotes we explain how biological systems can be written as probabilistic program to combine experimental data into the model. The model developed is then analysed in terms of intrinsic noise and exact sampling of switching times between individual elements in the system. We have mainly concentrated on inferring number of genes in ON and OFF states from experimental data.Keywords: systems biology, probabilistic model, inference, biology, model
Procedia PDF Downloads 34923569 A Real-time Classification of Lying Bodies for Care Application of Elderly Patients
Authors: E. Vazquez-Santacruz, M. Gamboa-Zuniga
Abstract:
In this paper, we show a methodology for bodies classification in lying state using HOG descriptors and pressures sensors positioned in a matrix form (14 x 32 sensors) on the surface where bodies lie down. it will be done in real time. Our system is embedded in a care robot that can assist the elderly patient and medical staff around to get a better quality of life in and out of hospitals. Due to current technology a limited number of sensors is used, wich results in low-resolution data array, that will be used as image of 14 x 32 pixels. Our work considers the problem of human posture classification with few information (sensors), applying digital process to expand the original data of the sensors and so get more significant data for the classification, however, this is done with low-cost algorithms to ensure the real-time execution.Keywords: real-time classification, sensors, robots, health care, elderly patients, artificial intelligence
Procedia PDF Downloads 86623568 Disidentification of Historical City Centers: A Comparative Study of the Old and New Settlements of Mardin, Turkey
Authors: Fatma Kürüm Varolgüneş, Fatih Canan
Abstract:
Mardin is one of the unique cities in Turkey with its rich cultural and historical heritage. Mardin’s traditional dwellings have been affected both by natural data such as climate and topography and by cultural data like lifestyle and belief. However, in the new settlements, housing is formed with modern approaches and unsuitable forms clashing with Mardin’s culture and environment. While the city is expanding, traditional textures are ignored. Thus, traditional settlements are losing their identity and are vanishing because of the rapid change and transformation. The main aim of this paper is to determine the physical and social data needed to define the characteristic features of Mardin’s old and new settlements. In this context, based on social and cultural data, old and new settlement formations of Mardin have been investigated from various aspects. During this research, the following methods have been utilized: observations, interviews, public surveys, literature review, as well as site examination via maps, photographs and questionnaire methodology. In conclusion, this paper focuses on how changes in the physical forms of cities affect the typology and the identity of cities, as in the case of Mardin.Keywords: urban and local identity, historical city center, traditional settlements, Mardin
Procedia PDF Downloads 32823567 Pediatric Hearing Aid Use: A Study Based on Data Logging Information
Authors: Mina Salamatmanesh, Elizabeth Fitzpatrick, Tim Ramsay, Josee Lagacé, Lindsey Sikora, JoAnne Whittingham
Abstract:
Introduction: Hearing loss (HL) is one of the most common disorders that presents at birth and in early childhood. Universal newborn hearing screening (UNHS) has been adopted based on the assumption that with early identification of HL, children will have access to optimal amplification and intervention at younger ages, therefore, taking advantage of the brain’s maximal plasticity. One particular challenge for parents in the early years is achieving consistent hearing aid (HA) use which is critical to the child’s development and constitutes the first step in the rehabilitation process. This study examined the consistency of hearing aid use in young children based on data logging information documented during audiology sessions in the first three years after hearing aid fitting. Methodology: The first 100 children who were diagnosed with bilateral HL before 72 months of age since 2003 to 2015 in a pediatric audiology clinic and who had at least two hearing aid follow-up sessions with available data logging information were included in the study. Data from each audiology session (age of child at the session, average hours of use per day (for each ear) in the first three years after HA fitting) were collected. Clinical characteristics (degree of hearing loss, age of HA fitting) were also documented to further understanding of factors that impact HA use. Results: Preliminary analysis of the results of the first 20 children shows that all of them (100%) have at least one data logging session recorded in the clinical audiology system (Noah). Of the 20 children, 17(85%) have three data logging events recorded in the first three years after HA fitting. Based on the statistical analysis of the first 20 cases, the median hours of use in the first follow-up session after the hearing aid fitting in the right ear is 3.9 hours with an interquartile range (IQR) of 10.2h. For the left ear the median is 4.4 and the IQR is 9.7h. In the first session 47% of the children use their hearing aids ≤5 hours, 12% use them between 5 to 10 hours and 22% use them ≥10 hours a day. However, these children showed increased use by the third follow-up session with a median (IQR) of 9.1 hours for the right ear and 2.5, and of 8.2 hours for left ear (IQR) IQR is 5.6 By the third follow-up session, 14% of children used hearing aids ≤5 hours, while 38% of children used them ≥10 hours. Based on the primary results, factors like age and level of HL significantly impact the hours of use. Conclusion: The use of data logging information to assess the actual hours of HA provides an opportunity to examine the: a) challenges of families of young children with HAs, b) factors that impact use in very young children. Data logging when used collaboratively with parents, can be a powerful tool to identify problems and to encourage and assist families in maximizing their child’s hearing potential.Keywords: hearing loss, hearing aid, data logging, hours of use
Procedia PDF Downloads 23023566 Towards End-To-End Disease Prediction from Raw Metagenomic Data
Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker
Abstract:
Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. Using two public real-life data-sets as well a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.Keywords: deep learning, disease prediction, end-to-end machine learning, metagenomics, multiple instance learning, precision medicine
Procedia PDF Downloads 12523565 The Role of Waqf Forestry for Sustainable Economic Development: A Panel Logit Analysis
Authors: Patria Yunita
Abstract:
Kuznets’ environmental curve analysis suggests sacrificing economic development to reduce environmental problems. However, we hope to achieve sustainable economic development. In this case, Islamic social finance, especially that of waqf in Indonesia, can be used as a solution to bridge the problem of environmental damage to the sustainability of economic development. The Panel Logit Regression method was used to analyze the probability of increasing economic growth and the role of waqf in the environmental impact of CO₂ emissions. This study uses panel data from 33 Indonesian provinces. The data used were the National Waqf Index, Forest Area, Waqf Land Area, Growth Rate of Regional Gross Domestic Product (YoY), and CO₂ Emissions for 2018-2022. Data were obtained from the Indonesian Waqf Board, Climate World Data, the Ministry of the Environment, and the Bank of Indonesia. The results prove that CO₂ emissions have a negative effect on regional economic growth and that waqf governance in the waqf index has a positive effect on regional economic growth in 33 provinces.Keywords: waqf, CO₂ emissions, panel logit analysis, sustainable economic development
Procedia PDF Downloads 4323564 Intelligent Human Pose Recognition Based on EMG Signal Analysis and Machine 3D Model
Authors: Si Chen, Quanhong Jiang
Abstract:
In the increasingly mature posture recognition technology, human movement information is widely used in sports rehabilitation, human-computer interaction, medical health, human posture assessment, and other fields today; this project uses the most original ideas; it is proposed to use the collection equipment for the collection of myoelectric data, reflect the muscle posture change on a degree of freedom through data processing, carry out data-muscle three-dimensional model joint adjustment, and realize basic pose recognition. Based on this, bionic aids or medical rehabilitation equipment can be further developed with the help of robotic arms and cutting-edge technology, which has a bright future and unlimited development space.Keywords: pose recognition, 3D animation, electromyography, machine learning, bionics
Procedia PDF Downloads 7923563 Optimizing Energy Efficiency: Leveraging Big Data Analytics and AWS Services for Buildings and Industries
Authors: Gaurav Kumar Sinha
Abstract:
In an era marked by increasing concerns about energy sustainability, this research endeavors to address the pressing challenge of energy consumption in buildings and industries. This study delves into the transformative potential of AWS services in optimizing energy efficiency. The research is founded on the recognition that effective management of energy consumption is imperative for both environmental conservation and economic viability. Buildings and industries account for a substantial portion of global energy use, making it crucial to develop advanced techniques for analysis and reduction. This study sets out to explore the integration of AWS services with big data analytics to provide innovative solutions for energy consumption analysis. Leveraging AWS's cloud computing capabilities, scalable infrastructure, and data analytics tools, the research aims to develop efficient methods for collecting, processing, and analyzing energy data from diverse sources. The core focus is on creating predictive models and real-time monitoring systems that enable proactive energy management. By harnessing AWS's machine learning and data analytics capabilities, the research seeks to identify patterns, anomalies, and optimization opportunities within energy consumption data. Furthermore, this study aims to propose actionable recommendations for reducing energy consumption in buildings and industries. By combining AWS services with metrics-driven insights, the research strives to facilitate the implementation of energy-efficient practices, ultimately leading to reduced carbon emissions and cost savings. The integration of AWS services not only enhances the analytical capabilities but also offers scalable solutions that can be customized for different building and industrial contexts. The research also recognizes the potential for AWS-powered solutions to promote sustainable practices and support environmental stewardship.Keywords: energy consumption analysis, big data analytics, AWS services, energy efficiency
Procedia PDF Downloads 6423562 Bandwidth Efficient Cluster Based Collision Avoidance Multicasting Protocol in VANETs
Authors: Navneet Kaur, Amarpreet Singh
Abstract:
In Vehicular Adhoc Networks, Data Dissemination is a challenging task. There are number of techniques, types and protocols available for disseminating the data but in order to preserve limited bandwidth and to disseminate maximum data over networks makes it more challenging. There are broadcasting, multicasting and geocasting based protocols. Multicasting based protocols are found to be best for conserving the bandwidth. One such protocol named BEAM exists that improves the performance of Vehicular Adhoc Networks by reducing the number of in-network message transactions and thereby efficiently utilizing the bandwidth during an emergency situation. But this protocol may result in multicar chain collision as there was no V2V communication. So, this paper proposes a new protocol named Enhanced Bandwidth Efficient Cluster Based Multicasting Protocol (EBECM) that will overcome the limitations of existing BEAM protocol. And Simulation results will show the improved performance of EBECM in terms of Routing overhead, throughput and PDR when compared with BEAM protocol.Keywords: BEAM, data dissemination, emergency situation, vehicular adhoc network
Procedia PDF Downloads 34823561 Machine Learning-Based Workflow for the Analysis of Project Portfolio
Authors: Jean Marie Tshimula, Atsushi Togashi
Abstract:
We develop a data-science approach for providing an interactive visualization and predictive models to find insights into the projects' historical data in order for stakeholders understand some unseen opportunities in the African market that might escape them behind the online project portfolio of the African Development Bank. This machine learning-based web application identifies the market trend of the fastest growing economies across the continent as well skyrocketing sectors which have a significant impact on the future of business in Africa. Owing to this, the approach is tailored to predict where the investment needs are the most required. Moreover, we create a corpus that includes the descriptions of over more than 1,200 projects that approximately cover 14 sectors designed for some of 53 African countries. Then, we sift out this large amount of semi-structured data for extracting tiny details susceptible to contain some directions to follow. In the light of the foregoing, we have applied the combination of Latent Dirichlet Allocation and Random Forests at the level of the analysis module of our methodology to highlight the most relevant topics that investors may focus on for investing in Africa.Keywords: machine learning, topic modeling, natural language processing, big data
Procedia PDF Downloads 16823560 On the Existence of Homotopic Mapping Between Knowledge Graphs and Graph Embeddings
Authors: Jude K. Safo
Abstract:
Knowledge Graphs KG) and their relation to Graph Embeddings (GE) represent a unique data structure in the landscape of machine learning (relative to image, text and acoustic data). Unlike the latter, GEs are the only data structure sufficient for representing hierarchically dense, semantic information needed for use-cases like supply chain data and protein folding where the search space exceeds the limits traditional search methods (e.g. page-rank, Dijkstra, etc.). While GEs are effective for compressing low rank tensor data, at scale, they begin to introduce a new problem of ’data retreival’ which we observe in Large Language Models. Notable attempts by transE, TransR and other prominent industry standards have shown a peak performance just north of 57% on WN18 and FB15K benchmarks, insufficient practical industry applications. They’re also limited, in scope, to next node/link predictions. Traditional linear methods like Tucker, CP, PARAFAC and CANDECOMP quickly hit memory limits on tensors exceeding 6.4 million nodes. This paper outlines a topological framework for linear mapping between concepts in KG space and GE space that preserve cardinality. Most importantly we introduce a traceable framework for composing dense linguistic strcutures. We demonstrate performance on WN18 benchmark this model hits. This model does not rely on Large Langauge Models (LLM) though the applications are certainy relevant here as well.Keywords: representation theory, large language models, graph embeddings, applied algebraic topology, applied knot theory, combinatorics
Procedia PDF Downloads 6823559 The Names of the Traditional Motif of Batik Solo
Authors: Annisa D. Febryandini
Abstract:
Batik is a unique cultural heritage that strongly linked with its community. As a product of current culture in Solo, Batik Solo not only has a specific design and color to represent the cultural identity, cultural values, and spirituality of the community, but also has some specific names given by its community which are not arbitrary. This qualitative research paper uses the primary data by interview method as well as the secondary data to support it. Based on the data, this paper concludes that the names consist of a word or words taken from a current name of things in Javanese language. They indicate the cultural meaning such as a specific event, a hope, and the social status of the people who use the motif. Different from the other research, this paper takes a look at the names of traditional motif of Batik Solo which analyzed linguistically to reveal the cultural meaning.Keywords: traditional motif, Batik, solo, anthropological linguistics
Procedia PDF Downloads 27723558 Review of Speech Recognition Research on Low-Resource Languages
Authors: XuKe Cao
Abstract:
This paper reviews the current state of research on low-resource languages in the field of speech recognition, focusing on the challenges faced by low-resource language speech recognition, including the scarcity of data resources, the lack of linguistic resources, and the diversity of dialects and accents. The article reviews recent progress in low-resource language speech recognition, including techniques such as data augmentation, end to-end models, transfer learning, and multi-task learning. Based on the challenges currently faced, the paper also provides an outlook on future research directions. Through these studies, it is expected that the performance of speech recognition for low resource languages can be improved, promoting the widespread application and adoption of related technologies.Keywords: low-resource languages, speech recognition, data augmentation techniques, NLP
Procedia PDF Downloads 1423557 SEM Image Classification Using CNN Architectures
Authors: Güzi̇n Ti̇rkeş, Özge Teki̇n, Kerem Kurtuluş, Y. Yekta Yurtseven, Murat Baran
Abstract:
A scanning electron microscope (SEM) is a type of electron microscope mainly used in nanoscience and nanotechnology areas. Automatic image recognition and classification are among the general areas of application concerning SEM. In line with these usages, the present paper proposes a deep learning algorithm that classifies SEM images into nine categories by means of an online application to simplify the process. The NFFA-EUROPE - 100% SEM data set, containing approximately 21,000 images, was used to train and test the algorithm at 80% and 20%, respectively. Validation was carried out using a separate data set obtained from the Middle East Technical University (METU) in Turkey. To increase the accuracy in the results, the Inception ResNet-V2 model was used in view of the Fine-Tuning approach. By using a confusion matrix, it was observed that the coated-surface category has a negative effect on the accuracy of the results since it contains other categories in the data set, thereby confusing the model when detecting category-specific patterns. For this reason, the coated-surface category was removed from the train data set, hence increasing accuracy by up to 96.5%.Keywords: convolutional neural networks, deep learning, image classification, scanning electron microscope
Procedia PDF Downloads 12523556 Nearest Neighbor Investigate Using R+ Tree
Authors: Rutuja Desai
Abstract:
Search engine is fundamentally a framework used to search the data which is pertinent to the client via WWW. Looking close-by spot identified with the keywords is an imperative concept in developing web advances. For such kind of searching, extent pursuit or closest neighbor is utilized. In range search the forecast is made whether the objects meet to query object. Nearest neighbor is the forecast of the focuses close to the query set by the client. Here, the nearest neighbor methodology is utilized where Data recovery R+ tree is utilized rather than IR2 tree. The disadvantages of IR2 tree is: The false hit number can surpass the limit and the mark in Information Retrieval R-tree must have Voice over IP bit for each one of a kind word in W set is recouped by Data recovery R+ tree. The inquiry is fundamentally subordinate upon the key words and the geometric directions.Keywords: information retrieval, nearest neighbor search, keyword search, R+ tree
Procedia PDF Downloads 29123555 Classical and Bayesian Inference of the Generalized Log-Logistic Distribution with Applications to Survival Data
Authors: Abdisalam Hassan Muse, Samuel Mwalili, Oscar Ngesa
Abstract:
A generalized log-logistic distribution with variable shapes of the hazard rate was introduced and studied, extending the log-logistic distribution by adding an extra parameter to the classical distribution, leading to greater flexibility in analysing and modeling various data types. The proposed distribution has a large number of well-known lifetime special sub-models such as; Weibull, log-logistic, exponential, and Burr XII distributions. Its basic mathematical and statistical properties were derived. The method of maximum likelihood was adopted for estimating the unknown parameters of the proposed distribution, and a Monte Carlo simulation study is carried out to assess the behavior of the estimators. The importance of this distribution is that its tendency to model both monotone (increasing and decreasing) and non-monotone (unimodal and bathtub shape) or reversed “bathtub” shape hazard rate functions which are quite common in survival and reliability data analysis. Furthermore, the flexibility and usefulness of the proposed distribution are illustrated in a real-life data set and compared to its sub-models; Weibull, log-logistic, and BurrXII distributions and other parametric survival distributions with 3-parmaeters; like the exponentiated Weibull distribution, the 3-parameter lognormal distribution, the 3- parameter gamma distribution, the 3-parameter Weibull distribution, and the 3-parameter log-logistic (also known as shifted log-logistic) distribution. The proposed distribution provided a better fit than all of the competitive distributions based on the goodness-of-fit tests, the log-likelihood, and information criterion values. Finally, Bayesian analysis and performance of Gibbs sampling for the data set are also carried out.Keywords: hazard rate function, log-logistic distribution, maximum likelihood estimation, generalized log-logistic distribution, survival data, Monte Carlo simulation
Procedia PDF Downloads 20223554 Fuzzy Inference-Assisted Saliency-Aware Convolution Neural Networks for Multi-View Summarization
Authors: Tanveer Hussain, Khan Muhammad, Amin Ullah, Mi Young Lee, Sung Wook Baik
Abstract:
The Big Data generated from distributed vision sensors installed on large scale in smart cities create hurdles in its efficient and beneficial exploration for browsing, retrieval, and indexing. This paper presents a three-folded framework for effective video summarization of such data and provide a compact and representative format of Big Video Data. In the first fold, the paper acquires input video data from the installed cameras and collect clues such as type and count of objects and clarity of the view from a chunk of pre-defined number of frames of each view. The decision of representative view selection for a particular interval is based on fuzzy inference system, acquiring a precise and human resembling decision, reinforced by the known clues as a part of the second fold. In the third fold, the paper forwards the selected view frames to the summary generation mechanism that is supported by a saliency-aware convolution neural network (CNN) model. The new trend of fuzzy rules for view selection followed by CNN architecture for saliency computation makes the multi-view video summarization (MVS) framework a suitable candidate for real-world practice in smart cities.Keywords: big video data analysis, fuzzy logic, multi-view video summarization, saliency detection
Procedia PDF Downloads 18823553 Relation between Pavement Roughness and Distress Parameters for Highways
Authors: Suryapeta Harini
Abstract:
Road surface roughness is one of the essential aspects of the road's functional condition, indicating riding comfort in both the transverse and longitudinal directions. The government of India has made maintaining good surface evenness a prerequisite for all highway projects. Pavement distress data was collected with a Network Survey Vehicle (NSV) on a National Highway. It determines the smoothness and frictional qualities of the pavement surface, which are related to driving safety and ease. Based on the data obtained in the field, a regression equation was created with the IRI value and the visual distresses. The suggested system can use wireless acceleration sensors and GPS to gather vehicle status and location data, as well as calculate the international roughness index (IRI). Potholes, raveling, rut depth, cracked area, and repair work are all affected by pavement roughness, according to the current study. The study was carried out in one location. Data collected through using Bump integrator was used for the validation. The bump integrator (BI) obtained using deflection from the network survey vehicle was correlated with the distress parameter to establish an equation.Keywords: roughness index, network survey vehicle, regression, correlation
Procedia PDF Downloads 17623552 Structural Health Monitoring using Fibre Bragg Grating Sensors in Slab and Beams
Authors: Pierre van Tonder, Dinesh Muthoo, Kim twiname
Abstract:
Many existing and newly built structures are constructed on the design basis of the engineer and the workmanship of the construction company. However, when considering larger structures where more people are exposed to the building, its structural integrity is of great importance considering the safety of its occupants (Raghu, 2013). But how can the structural integrity of a building be monitored efficiently and effectively. This is where the fourth industrial revolution step in, and with minimal human interaction, data can be collected, analysed, and stored, which could also give an indication of any inconsistencies found in the data collected, this is where the Fibre Bragg Grating (FBG) monitoring system is introduced. This paper illustrates how data can be collected and converted to develop stress – strain behaviour and to produce bending moment diagrams for the utilisation and prediction of the structure’s integrity. Embedded fibre optic sensors were used in this study– fibre Bragg grating sensors in particular. The procedure entailed making use of the shift in wavelength demodulation technique and an inscription process of the phase mask technique. The fibre optic sensors considered in this report were photosensitive and embedded in the slab and beams for data collection and analysis. Two sets of fibre cables have been inserted, one purposely to collect temperature recordings and the other to collect strain and temperature. The data was collected over a time period and analysed used to produce bending moment diagrams to make predictions of the structure’s integrity. The data indicated the fibre Bragg grating sensing system proved to be useful and can be used for structural health monitoring in any environment. From the experimental data for the slab and beams, the moments were found to be64.33 kN.m, 64.35 kN.m and 45.20 kN.m (from the experimental bending moment diagram), and as per the idealistic (Ultimate Limit State), the data of 133 kN.m and 226.2 kN.m were obtained. The difference in values gave room for an early warning system, in other words, a reserve capacity of approximately 50% to failure.Keywords: fibre bragg grating, structural health monitoring, fibre optic sensors, beams
Procedia PDF Downloads 139