Search results for: accuracy
3390 A U-Net Based Architecture for Fast and Accurate Diagram Extraction
Authors: Revoti Prasad Bora, Saurabh Yadav, Nikita Katyal
Abstract:
In the context of educational data mining, the use case of extracting information from images containing both text and diagrams is of high importance. Hence, document analysis requires extracting the diagrams from such images so that the text and diagrams can be processed separately. To the authors' best knowledge, none of the many approaches for extracting tables, figures, etc., satisfies the need for real-time processing with high accuracy as required in multiple applications. In the education domain, diagrams can have varied characteristics, viz. line-based ones such as geometric diagrams, chemical bonds, mathematical formulas, etc. There are two broad categories of approaches that try to solve similar problems, viz. traditional computer vision based approaches and deep learning approaches. The traditional computer vision based approaches mainly leverage connected components and distance transform based processing and hence perform well in very limited scenarios. The existing deep learning approaches leverage either YOLO or faster-RCNN architectures. These approaches suffer from a performance-accuracy tradeoff. This paper proposes a U-Net based architecture that formulates diagram extraction as a segmentation problem. The proposed method provides similar accuracy with a much faster extraction time compared to the mentioned state-of-the-art approaches. Further, the segmentation mask in this approach allows the extraction of diagrams of irregular shapes.
Keywords: computer vision, deep-learning, educational data mining, faster-RCNN, figure extraction, image segmentation, real-time document analysis, text extraction, U-Net, YOLO
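As a rough illustration of the segmentation formulation, the following is a minimal U-Net-style encoder-decoder in PyTorch. It is a sketch, not the authors' architecture: the depth, channel widths, and single-channel (grayscale page) input are assumptions made for compactness. The per-pixel mask it outputs is what permits cropping diagrams of irregular shape.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    # Two 3x3 convs with ReLU, the basic U-Net building unit
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class MiniUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = block(1, 16), block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.mid = block(32, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = block(64, 32)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = block(32, 16)
        self.head = nn.Conv2d(16, 1, 1)  # one-channel diagram mask

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        m = self.mid(self.pool(e2))
        # Skip connections concatenate encoder features with upsampled ones
        d2 = self.dec2(torch.cat([self.up2(m), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)  # logits; apply sigmoid + threshold for the mask

mask_logits = MiniUNet()(torch.randn(1, 1, 128, 128))  # -> (1, 1, 128, 128)
```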
Procedia PDF Downloads 137

3389 Sparse Signal Restoration Algorithm Based on Piecewise Adaptive Backtracking Orthogonal Least Squares
Authors: Linyu Wang, Jiahui Ma, Jianhong Xiang, Hanyu Jiang
Abstract:
The traditional greedy compressed sensing algorithm needs to know the signal sparsity when recovering the signal, but in practical applications the sparsity cannot be obtained as a priori information, and the recovery accuracy is low, which does not meet the needs of practical applications. To solve this problem, this paper puts forward a piecewise adaptive backtracking orthogonal least squares algorithm. The algorithm is divided into two stages. In the first stage, a sparsity pre-estimation strategy is adopted, which can quickly approach the real sparsity and reduce time consumption. In the second-stage iteration, a correction strategy and an adaptive step size are used to accurately estimate the sparsity, and the backtracking idea is introduced to improve the accuracy of signal recovery. Experimental simulations show that the algorithm can accurately recover the estimated signal with fewer iterations when the sparsity is unknown.
Keywords: compressed sensing, greedy algorithm, least square method, adaptive reconstruction
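The backtracking orthogonal least squares family builds on the greedy loop sketched below in NumPy: pick the atom most correlated with the residual, then re-fit all selected atoms by least squares. Only that shared core with a fixed maximum sparsity is shown; the paper's sparsity pre-estimation, correction strategy, adaptive step size, and backtracking stages are not reproduced here.

```python
import numpy as np

def greedy_ols(A, y, max_sparsity, tol=1e-6):
    """Greedy recovery: add the atom most correlated with the residual,
    then re-fit the coefficients of all chosen atoms by least squares."""
    residual, support = y.copy(), []
    for _ in range(max_sparsity):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
        if np.linalg.norm(residual) < tol:
            break
    x = np.zeros(A.shape[1])
    x[support] = x_s
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 256)) / 8.0          # measurement matrix
x_true = np.zeros(256)
x_true[rng.choice(256, 5, replace=False)] = rng.standard_normal(5)
x_hat = greedy_ols(A, A @ x_true, max_sparsity=10)  # sparsity 5, bound 10
```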
Procedia PDF Downloads 148

3388 Trading off Accuracy for Speed in Powerdrill
Authors: Filip Buruiana, Alexander Hall, Reimar Hofmann, Thomas Hofmann, Silviu Ganceanu, Alexandru Tudorica
Abstract:
In-memory column-stores make interactive analysis feasible for many big data scenarios. PowerDrill is a system used internally at Google for exploration of logs data. Even though it is a highly parallelized column-store and uses in-memory caching, interactive response times cannot be achieved for all datasets (note that it is common to analyze data with 50 billion records in PowerDrill). In this paper, we investigate two orthogonal approaches to optimize performance at the expense of an acceptable loss of accuracy. Both approaches can be implemented as outer wrappers around existing database engines, so they should be easily applicable to other systems. For the first optimization we show that memory is the limiting factor in executing queries at speed and therefore explore possibilities to improve memory efficiency. We adapt some of the theory behind data sketches to reduce the size of particularly expensive fields in our largest tables by a factor of 4.5 when compared to a standard compression algorithm. This saves 37% of the overall memory in PowerDrill and introduces a 0.4% relative error in the 90th percentile for results of queries with the expensive fields. We additionally evaluate the effects of using sampling on accuracy and propose a simple heuristic for annotating individual result-values as accurate (or not). Based on measurements of user behavior in our real production system, we show that these estimates are essential for interpreting intermediate results before final results are available. For a large set of queries this effectively brings down the 95th latency percentile from 30 to 4 seconds.
Keywords: big data, in-memory column-store, high-performance SQL queries, approximate SQL queries
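To make the annotation idea concrete, here is a minimal sketch of one way such a heuristic could work, assuming uniform row sampling: the relative standard error of a group's estimated count shrinks with the number of sampled rows, so small groups get flagged as inaccurate. The binomial error approximation and the threshold are illustrative assumptions, not PowerDrill's actual heuristic.

```python
import math

def annotate(sample_count, sampling_rate, max_rel_err=0.05):
    """Estimate a group's true count from a uniform row sample and flag
    whether the estimate is precise enough to display as-is."""
    estimate = sample_count / sampling_rate
    # Binomial approximation of the relative standard error of the estimate
    rel_err = math.sqrt((1.0 - sampling_rate) / max(sample_count, 1))
    return estimate, rel_err, rel_err <= max_rel_err

print(annotate(sample_count=400, sampling_rate=0.01))  # large group: accurate
print(annotate(sample_count=12, sampling_rate=0.01))   # small group: flagged
```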
Procedia PDF Downloads 259

3387 Uncertainty of the Brazilian Earth System Model for Solar Radiation
Authors: Elison Eduardo Jardim Bierhals, Claudineia Brazil, Deivid Pires, Rafael Haag, Elton Gimenez Rossini
Abstract:
This study evaluated the uncertainties involved in the solar radiation projections generated by the Brazilian Earth System Model (BESM) of the Weather and Climate Prediction Center (CPTEC), belonging to the Coupled Model Intercomparison Project Phase 5 (CMIP5), with the aim of assessing the efficiency of the model's solar radiation projections and thereby establishing the viability of its use. Two different scenarios elaborated by the Intergovernmental Panel on Climate Change (IPCC) were evaluated: RCP 4.5 (with more optimistic boundary conditions) and RCP 8.5 (with more pessimistic initial conditions). The methods used to verify the accuracy of the model were the Nash coefficient and the statistical bias, as they better represent these atmospheric patterns. The BESM showed a tendency to overestimate the solar radiation projections in most regions of the state of Rio Grande do Sul, and under the validation methods adopted by this study, BESM did not present satisfactory accuracy.
Keywords: climate changes, projections, solar radiation, uncertainty
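Both verification measures named above have standard closed forms, sketched here in NumPy; the sample values are invented for illustration. A Nash-Sutcliffe efficiency near 1 indicates good agreement, and a positive percent bias indicates the overestimation the study reports.

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def percent_bias(obs, sim):
    """Positive values mean the model overestimates on average."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sum(sim - obs) / np.sum(obs)

obs = [18.2, 21.5, 16.9, 20.3, 22.8]   # measured daily radiation, MJ/m^2
sim = [20.1, 23.0, 18.5, 22.2, 24.9]   # model projection (illustrative)
print(nash_sutcliffe(obs, sim), percent_bias(obs, sim))
```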
Procedia PDF Downloads 250

3386 Mobile Platform’s Attitude Determination Based on Smoothed GPS Code Data and Carrier-Phase Measurements
Authors: Mohamed Ramdani, Hassen Abdellaoui, Abdenour Boudrassen
Abstract:
Mobile platform attitude estimation approaches are mainly based on combined positioning techniques and dedicated algorithms that aim to reach a fast and accurate solution. In this work, we describe the design and implementation of an attitude determination (AD) process using only measurements from GPS sensors. The approach is based on GPS code data smoothed with a Hatch filter and raw carrier-phase measurements, integrated into an attitude algorithm based on vector measurements using the least squares (LSQ) estimation method. A GPS dataset from a static experiment is used to investigate the effectiveness of the presented approach and consequently to check the accuracy of the attitude estimation algorithm. Attitude results from GPS multi-antenna over short baselines are introduced and analyzed. The 3D accuracy of the attitude parameters estimated from smoothed measurements is over 0.27°.
Keywords: attitude determination, GPS code data smoothing, Hatch filter, carrier-phase measurements, least-squares attitude estimation
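A minimal sketch of the Hatch filter step used to smooth the code pseudoranges follows, assuming the code and carrier-phase inputs are already expressed in meters; the window length is an illustrative assumption. Each new pseudorange is blended with the previous smoothed value propagated by the carrier-phase delta.

```python
import numpy as np

def hatch_filter(code, phase, window=100):
    """Carrier-smoothed code: average the noisy code measurement with the
    previous smoothed value carried forward by the precise phase change."""
    smoothed = np.empty(len(code), dtype=float)
    smoothed[0] = code[0]
    for k in range(1, len(code)):
        n = min(k + 1, window)                        # cap the averaging window
        predicted = smoothed[k - 1] + (phase[k] - phase[k - 1])
        smoothed[k] = code[k] / n + predicted * (n - 1) / n
    return smoothed
```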
Procedia PDF Downloads 155

3385 Utilizing Temporal and Frequency Features in Fault Detection of Electric Motor Bearings with Advanced Methods
Authors: Mohammad Arabi
Abstract:
The development of advanced technologies in the field of signal processing and vibration analysis has enabled more accurate analysis and fault detection in electrical systems. This research investigates the application of temporal and frequency features in detecting faults in electric motor bearings, aiming to enhance fault detection accuracy and prevent unexpected failures. The use of methods such as deep learning algorithms and neural networks in this process can yield better results. The main objective of this research is to evaluate the efficiency and accuracy of methods based on temporal and frequency features in identifying faults in electric motor bearings to prevent sudden breakdowns and operational issues. Additionally, the feasibility of using techniques such as machine learning and optimization algorithms to improve the fault detection process is also considered. This research employed an experimental method and random sampling. Vibration signals were collected from electric motors under normal and faulty conditions. After standardizing the data, temporal and frequency features were extracted. These features were then analyzed using statistical methods such as analysis of variance (ANOVA) and t-tests, as well as machine learning algorithms like artificial neural networks and support vector machines (SVM). The results showed that using temporal and frequency features significantly improves the accuracy of fault detection in electric motor bearings. ANOVA indicated significant differences between normal and faulty signals. Additionally, t-tests confirmed statistically significant differences between the features extracted from normal and faulty signals. Machine learning algorithms such as neural networks and SVM also significantly increased detection accuracy, demonstrating high effectiveness in timely and accurate fault detection. This study demonstrates that using temporal and frequency features combined with machine learning algorithms can serve as an effective tool for detecting faults in electric motor bearings. This approach not only enhances fault detection accuracy but also simplifies and streamlines the detection process. However, challenges such as data standardization and the cost of implementing advanced monitoring systems must also be considered. Utilizing temporal and frequency features in fault detection of electric motor bearings, along with advanced machine learning methods, offers an effective solution for preventing failures and ensuring the operational health of electric motors. Given the promising results of this research, it is recommended that this technology be more widely adopted in industrial maintenance processes.
Keywords: electric motor, fault detection, frequency features, temporal features
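A compact sketch of this kind of pipeline: temporal features (RMS, kurtosis, peak) and a frequency feature (dominant spectral line) are extracted from vibration frames and fed to an SVM. The synthetic signals, sampling rate, and specific feature set are assumptions for illustration, not the study's data or exact feature list.

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.svm import SVC

def features(signal, fs):
    """Temporal statistics plus the dominant spectral peak of one frame."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return [np.sqrt(np.mean(signal ** 2)),       # RMS energy
            kurtosis(signal),                    # impulsiveness (bearing hits)
            np.max(np.abs(signal)),              # peak amplitude
            freqs[np.argmax(spectrum[1:]) + 1]]  # dominant frequency (skip DC)

rng = np.random.default_rng(1)
fs, t = 10_000, np.arange(4096) / 10_000
healthy = [features(rng.normal(0, 0.1, 4096), fs) for _ in range(40)]
faulty = [features(np.sin(2 * np.pi * 157 * t) * rng.normal(1, 0.3, 4096), fs)
          for _ in range(40)]
X, y = np.vstack([healthy, faulty]), np.repeat([0, 1], 40)
clf = SVC(kernel="rbf").fit(X, y)    # 0 = normal, 1 = faulty bearing
```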
Procedia PDF Downloads 47

3384 Enhancing Predictive Accuracy in Pharmaceutical Sales through an Ensemble Kernel Gaussian Process Regression Approach
Authors: Shahin Mirshekari, Mohammadreza Moradi, Hossein Jafari, Mehdi Jafari, Mohammad Ensaf
Abstract:
This research employs Gaussian Process Regression (GPR) with an ensemble kernel, integrating Exponential Squared, Revised Matern, and Rational Quadratic kernels to analyze pharmaceutical sales data. Bayesian optimization was used to identify optimal kernel weights: 0.76 for Exponential Squared, 0.21 for Revised Matern, and 0.13 for Rational Quadratic. The ensemble kernel demonstrated superior performance in predictive accuracy, achieving an R² score near 1.0, and significantly lower values in MSE, MAE, and RMSE. These findings highlight the efficacy of ensemble kernels in GPR for predictive analytics in complex pharmaceutical sales datasets.
Keywords: Gaussian process regression, ensemble kernels, Bayesian optimization, pharmaceutical sales analysis, time series forecasting, data analysis
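In scikit-learn such an ensemble kernel can be written directly as a weighted sum, sketched below with the paper's reported weights. The RBF kernel stands in for the "Exponential Squared" kernel (they are the same family), and a standard Matern replaces the "Revised Matern", whose exact form the abstract does not give; the length scales and the toy sales series are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF, Matern, RationalQuadratic, ConstantKernel as C)

# Weighted sum of three kernel families using the paper's weights
kernel = (C(0.76) * RBF(length_scale=5.0)
          + C(0.21) * Matern(length_scale=5.0, nu=1.5)
          + C(0.13) * RationalQuadratic(length_scale=5.0, alpha=1.0))

X = np.arange(48, dtype=float).reshape(-1, 1)      # weekly time index
rng = np.random.default_rng(0)
y = 100 + 10 * np.sin(X.ravel() / 4) + rng.normal(0, 1, 48)  # toy sales

gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
mean, std = gpr.predict(np.array([[48.0], [49.0]]), return_std=True)
```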
Procedia PDF Downloads 71

3383 A Survey of Feature Selection and Feature Extraction Techniques in Machine Learning
Authors: Samina Khalid, Shamila Nasreen
Abstract:
Dimensionality reduction as a preprocessing step to machine learning is effective in removing irrelevant and redundant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase in the dimensionality of data poses a severe challenge to many existing feature selection and feature extraction methods with respect to efficiency and effectiveness. In the field of machine learning and pattern recognition, dimensionality reduction is an important area where many approaches have been proposed. In this paper, some widely used feature selection and feature extraction techniques are analyzed with respect to how effectively they can be used to achieve high performance of learning algorithms, which ultimately improves the predictive accuracy of the classifier. A brief analysis of widely used dimensionality reduction methods is presented with the purpose of investigating their strengths and weaknesses.
Keywords: age related macular degeneration, feature selection, feature subset selection, feature extraction/transformation, FSA's, relief, correlation based method, PCA, ICA
Procedia PDF Downloads 496

3382 Using the Smith-Waterman Algorithm to Extract Features in the Classification of Obesity Status
Authors: Rosa Figueroa, Christopher Flores
Abstract:
Text categorization is the problem of assigning a new document to a set of predetermined categories on the basis of a training set of free-text data that contains documents whose category membership is known. To train a classification model, it is necessary to extract characteristics in the form of tokens that facilitate the learning and classification process. In text categorization, the feature extraction process involves the use of word sequences, also known as N-grams. In general, it is expected that documents belonging to the same category share similar features. The Smith-Waterman (SW) algorithm is a dynamic programming algorithm that performs a local sequence alignment in order to determine similar regions between two strings or protein sequences. This work explores the use of the SW algorithm as an alternative to feature extraction in text categorization. The dataset used for this purpose contains 2,610 annotated documents with the classes Obese/Non-Obese. This dataset was represented in matrix form using the Bag of Words approach. The score selected to represent the occurrence of the tokens in each document was the term frequency-inverse document frequency (TF-IDF). In order to extract features for classification, four experiments were conducted: the first experiment used SW to extract features, the second used unigrams (single words), the third used bigrams (two-word sequences), and the last used a combination of unigrams and bigrams. To test the effectiveness of the extracted feature set for the four experiments, a Support Vector Machine (SVM) classifier was tuned using 20% of the dataset. The remaining 80% of the dataset, together with 5-fold cross validation, was used to evaluate and compare the performance of the four feature extraction experiments. Results from the tuning process suggest that SW performs better than the N-gram based feature extraction. These results were confirmed using the remaining 80% of the dataset, where SW performed best (accuracy = 97.10%, weighted average F-measure = 97.07%). The second best was obtained by the combination of unigrams-bigrams (accuracy = 96.04%, weighted average F-measure = 95.97%), closely followed by the bigrams (accuracy = 94.56%, weighted average F-measure = 94.46%) and finally unigrams (accuracy = 92.96%, weighted average F-measure = 92.90%).
Keywords: comorbidities, machine learning, obesity, Smith-Waterman algorithm
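A minimal Smith-Waterman scorer over word tokens rather than protein residues, matching the use described above; the scoring parameters and example strings are illustrative. The local-alignment score is high when a document contains a region similar to a category-characteristic pattern.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment score between two token sequences (words, not chars)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment floors scores at zero so alignments can restart
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

doc = "patient presents with morbid obesity and hypertension".split()
pattern = "morbid obesity".split()
print(smith_waterman(doc, pattern))  # high score: doc contains the pattern
```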
Procedia PDF Downloads 297

3381 Small Target Recognition Based on Trajectory Information
Authors: Saad Alkentar, Abdulkareem Assalem
Abstract:
Recognizing small targets has always posed a significant challenge in image analysis. Over long distances, the image signal-to-noise ratio tends to be low, limiting the amount of useful information available to detection systems. Consequently, visual target recognition becomes an intricate task to tackle. In this study, we introduce a Track Before Detect (TBD) approach that leverages target trajectory information (coordinates) to effectively distinguish between noise and potential targets. By reframing the problem as a multivariate time series classification, we have achieved remarkable results. Specifically, our TBD method achieves an impressive 97% accuracy in separating target signals from noise within a mere half-second time span (consisting of 10 data points). Furthermore, when classifying the identified targets into our predefined categories—airplane, drone, and bird—we achieve an outstanding classification accuracy of 96% over a more extended period of 1.5 seconds (comprising 30 data points).
Keywords: small targets, drones, trajectory information, TBD, multivariate time series
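One simple way to cast track-before-detect as multivariate time-series classification is sketched below with assumed summary features over 10-point tracks: noise tracks wander, while true targets move coherently. The features, classifier, and synthetic tracks are illustrative, not the paper's exact formulation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def track_features(xy):
    """Summary features of a short (x, y) track: speed and turning statistics."""
    v = np.diff(xy, axis=0)                   # step vectors between detections
    speed = np.linalg.norm(v, axis=1)
    heading = np.arctan2(v[:, 1], v[:, 0])
    turn = np.abs(np.diff(heading))
    return [speed.mean(), speed.std(), turn.mean(), turn.max()]

rng = np.random.default_rng(2)
# Noise: 10 uncorrelated positions; target: coherent motion with small jitter
noise = [track_features(rng.normal(0, 5, (10, 2))) for _ in range(100)]
target = [track_features(np.cumsum(rng.normal([3, 0], 0.5, (10, 2)), axis=0))
          for _ in range(100)]
X, y = np.vstack([noise, target]), np.repeat([0, 1], 100)
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```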
Procedia PDF Downloads 47

3380 The Effects of Big 6+6 Skill Training on Daily Living Skills for an Adolescent with Intellectual Disability
Authors: Luca Vascelli, Silvia Iacomini, Giada Gueli, Francesca Cavallini, Carlo Cavallini, Federica Berardo
Abstract:
The study was conducted to evaluate the effect of training on Big 6 + 6 motor skills to promote daily living skills. Precision teaching (PT) suggests that improved speed of the component behaviors can lead to better performance of composite skills. This study assessed the effects of repeated timed practice of component motor skills on the speed and accuracy of composite skills related to daily living skills. An 18-year-old adolescent with intellectual disability participated. A pre-post probe single-subject design was used. The results suggest that the participant was able to perform the component skills at his individual aims (endurance was assessed). The speed and accuracy of composite skills increased; stability and retention were also measured for the composite skill after the training.
Keywords: big 6+6, daily living skills, intellectual disability, precision teaching
Procedia PDF Downloads 154

3379 Evaluation of Ensemble Classifiers for Intrusion Detection
Authors: M. Govindarajan
Abstract:
One of the major developments in machine learning in the past decade is the ensemble method, which finds a highly accurate classifier by combining many moderately accurate component classifiers. In this research work, new ensemble classification methods are proposed: a homogeneous ensemble classifier using bagging and a heterogeneous ensemble classifier using arcing, and their performances are analyzed in terms of accuracy. A classifier ensemble is designed using Radial Basis Function (RBF) and Support Vector Machine (SVM) as base classifiers. The feasibility and the benefits of the proposed approaches are demonstrated by means of standard datasets of intrusion detection. The main originality of the proposed approach is based on three main parts: a preprocessing phase, a classification phase, and a combining phase. A wide range of comparative experiments is conducted on standard datasets of intrusion detection. The performance of the proposed homogeneous and heterogeneous ensemble classifiers is compared to the performance of other standard homogeneous and heterogeneous ensemble methods; the standard homogeneous ensemble methods include error-correcting output codes (ECOC) and Dagging, and the heterogeneous ensemble methods include majority voting and stacking. The proposed ensemble methods provide a significant improvement in accuracy compared to individual classifiers: the proposed bagged RBF and SVM perform significantly better than ECOC and Dagging, and the proposed hybrid RBF-SVM performs significantly better than voting and stacking. Heterogeneous models also exhibit better results than homogeneous models for standard datasets of intrusion detection.
Keywords: data mining, ensemble, radial basis function, support vector machine, accuracy
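The homogeneous bagged-SVM ensemble described above maps directly onto scikit-learn's BaggingClassifier, sketched here on a synthetic stand-in for the intrusion-detection data; the ensemble size is an assumed value.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for an intrusion-detection dataset (normal vs. attack)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Homogeneous ensemble: many SVMs, each trained on a bootstrap resample
bagged_svm = BaggingClassifier(SVC(kernel="rbf"), n_estimators=10)
bagged_svm.fit(X_tr, y_tr)
print(accuracy_score(y_te, bagged_svm.predict(X_te)))
```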
Procedia PDF Downloads 248

3378 Using Open Source Data and GIS Techniques to Overcome Data Deficiency and Accuracy Issues in the Construction and Validation of Transportation Network: Case of Kinshasa City
Authors: Christian Kapuku, Seung-Young Kho
Abstract:
An accurate representation of the transportation system serving the region is one of the important aspects of transportation modeling. Such representation often requires developing an abstract model of the system elements, which in turn requires a significant amount of data, surveys, and time. However, in some cases, such as in developing countries, data deficiencies and time and budget constraints do not always allow such an accurate representation, leaving room for assumptions that may negatively affect the quality of the analysis. With the emergence of open source Internet data, especially in mapping technologies, as well as advances in Geographic Information Systems, opportunities to tackle these issues have arisen. Therefore, the objective of this paper is to demonstrate such an application through the practical case of developing the transportation network for the city of Kinshasa. GIS geo-referencing was used to construct the digitized map of Transportation Analysis Zones using available scanned images. Centroids were then dynamically placed at the center of activities using an activity density map. Next, the road network with its characteristics was built using OpenStreetMap data and other official road inventory data by intersecting their layers and cleaning up unnecessary links such as residential streets. The accuracy of the final network was then checked by comparing it with satellite images from Google and Bing. For validation, the final network was exported into Emme3 to check for potential network coding issues. Results show a high accuracy between the built network and the satellite images, which can mostly be attributed to the use of open source data.
Keywords: geographic information system (GIS), network construction, transportation database, open source data
Procedia PDF Downloads 167

3377 Software-Defined Architecture and Front-End Optimization for DO-178B Compliant Distance Measuring Equipment
Authors: Farzan Farhangian, Behnam Shakibafar, Bobda Cedric, Rene Jr. Landry
Abstract:
Many air navigation technologies are capable of increasing aviation sustainability as well as improving accuracy in Alternative Positioning, Navigation, and Timing (APNT), especially avionics Distance Measuring Equipment (DME), Very high-frequency Omni-directional Range (VOR), etc. The integration of these air navigation solutions could provide robust and efficient accuracy in air mobility, air traffic management, and autonomous operations. Designing a proper RF front-end, power amplifier, and software-defined transponder could pave the way to an optimized avionics navigation solution. In this article, the possibility of reaching an optimum front-end to be used with a single low-cost Software-Defined Radio (SDR) has been investigated in order to reach a software-defined DME architecture. Our software-defined approach uses the firmware possibilities to design a real-time software architecture compatible with a Multi Input Multi Output (MIMO) BladeRF to estimate an accurate time delay between the transmission (Tx) and reception (Rx) channels using synchronous scheduled communication. We were able to design a novel power amplifier for the transmission channel of the DME to pass the minimum transmission power. This article also investigates designing proper pulse pairs based on the DO-178B avionics standard. Various guidelines have been tested, and the possibility of passing the certification process for each standard term has been analyzed. Finally, the performance of the DME was tested in the laboratory environment using an IFR6000, which showed that the proposed architecture reached an accuracy of less than 0.23 Nautical miles (Nmi) with 98% probability.
Keywords: avionics, DME, software defined radio, navigation
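The Tx-Rx time-delay estimation at the heart of a DME reply can be illustrated by cross-correlating the transmitted pulse against the received stream. The Gaussian pulse shape, sample rate, inserted delay, and noise level below are assumptions for illustration, not the BladeRF implementation.

```python
import numpy as np

def estimate_delay(tx, rx, fs):
    """Delay between transmitted and received pulses via cross-correlation."""
    corr = np.correlate(rx, tx, mode="full")
    lag = np.argmax(corr) - (len(tx) - 1)   # delay in samples
    return lag / fs

fs = 10e6                                    # assumed 10 MS/s SDR sample rate
t = np.arange(200) / fs
pulse = np.exp(-(((t - 5e-6) / 2e-6) ** 2))  # Gaussian-shaped DME-like pulse
rng = np.random.default_rng(3)
rx = np.concatenate([np.zeros(123), pulse]) + rng.normal(0, 0.01, 323)
print(estimate_delay(pulse, rx, fs) * 1e6, "microseconds")  # ~12.3 us
```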
Procedia PDF Downloads 79

3376 Forecasting Model to Predict Dengue Incidence in Malaysia
Authors: W. H. Wan Zakiyatussariroh, A. A. Nasuhar, W. Y. Wan Fairos, Z. A. Nazatul Shahreen
Abstract:
Forecasting dengue incidence in a population can provide useful information to facilitate the planning of public health interventions. Many studies on dengue cases in Malaysia have been conducted, but they are limited in modeling the outbreak and forecasting incidence. This article attempts to propose the most appropriate time series model to explain the behavior of dengue incidence in Malaysia for the purpose of forecasting future dengue outbreaks. Several seasonal auto-regressive integrated moving average (SARIMA) models were developed to model Malaysia's weekly number of dengue incidences, collected from January 2001 to December 2011. The SARIMA(2,1,1)(1,1,1)_52 model was found to be the most suitable model for Malaysia's dengue incidence, with the least values of the Akaike information criterion (AIC) and Bayesian information criterion (BIC) for in-sample fitting. The models were further evaluated for out-of-sample forecast accuracy using four different accuracy measures. The results indicate that SARIMA(2,1,1)(1,1,1)_52 performed well for both in-sample fitting and out-of-sample evaluation.
Keywords: time series modeling, Box-Jenkins, SARIMA, forecasting
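The selected specification maps directly onto statsmodels' SARIMAX, sketched below on a synthetic weekly series standing in for the 2001-2011 dengue counts; AIC and BIC are read off the fit as in the paper's model selection.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic weekly series with a 52-week seasonal cycle (stand-in data)
idx = pd.date_range("2001-01-07", periods=572, freq="W")
rng = np.random.default_rng(0)
y = pd.Series(200 + 80 * np.sin(2 * np.pi * np.arange(572) / 52)
              + rng.normal(0, 20, 572), index=idx)

# SARIMA(2,1,1)(1,1,1)_52, the specification the paper selects
model = SARIMAX(y, order=(2, 1, 1), seasonal_order=(1, 1, 1, 52)).fit(disp=False)
print(model.aic, model.bic)          # in-sample selection criteria
forecast = model.forecast(steps=12)  # 12 weeks ahead, out-of-sample
```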
Procedia PDF Downloads 484

3375 A Ratio-Weighted Decision Tree Algorithm for Imbalance Dataset Classification
Authors: Doyin Afolabi, Phillip Adewole, Oladipupo Sennaike
Abstract:
Most well-known classifiers, including the decision tree algorithm, can make predictions on balanced datasets efficiently. However, the decision tree algorithm tends to be biased towards the majority class on imbalanced datasets because of the skewness of their distribution. To overcome this problem, this study proposes a weighted decision tree algorithm that aims to remove the bias toward the majority class and prevent the reduction of majority observations in imbalanced dataset classification. The proposed weighted decision tree algorithm was tested on three imbalanced datasets: a cancer dataset, the German credit dataset, and a banknote dataset. The specificity, sensitivity, and accuracy metrics were used to evaluate the performance of the proposed decision tree algorithm on the datasets. The evaluation results show that, for some of the weights of the proposed decision tree, the specificity, sensitivity, and accuracy metrics gave better results than those of the ID3 decision tree and the decision tree induced with minority entropy for all three datasets.
Keywords: data mining, decision tree, classification, imbalance dataset
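The paper's ratio-weighting is its own construction, but the general idea of counter-weighting the minority class in a decision tree can be sketched with scikit-learn's class_weight; the 9:1 weights below are an assumed ratio matching the synthetic imbalance, and specificity and sensitivity are computed as in the evaluation.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix

# 9:1 imbalance; weighting the minority class by the inverse ratio counters it
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
tree = DecisionTreeClassifier(class_weight={0: 1.0, 1: 9.0}, random_state=0)
tree.fit(X, y)

tn, fp, fn, tp = confusion_matrix(y, tree.predict(X)).ravel()
print("specificity:", tn / (tn + fp), "sensitivity:", tp / (tp + fn))
```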
Procedia PDF Downloads 136

3374 A Neural Network Classifier for Identifying Duplicate Image Entries in Real-Estate Databases
Authors: Sergey Ermolin, Olga Ermolin
Abstract:
A deep convolutional neural network with triplet loss is used to identify duplicate images in real-estate advertisements in the presence of image artifacts such as watermarking, cropping, hue/brightness adjustment, and others. The effects of batch normalization, spatial dropout, and various convergence methodologies on the resulting detection accuracy are discussed. For a comparative return-on-investment study (per industry request), end-to-end performance is benchmarked on both Nvidia Titan GPUs and Intel Xeon CPUs. A new real-estate dataset from the San Francisco Bay Area is used for this work. Sufficient duplicate detection accuracy is achieved to supplement other database-grounded methods of duplicate removal. The implemented method is used in a proof-of-concept project in the real-estate industry.
Keywords: visual recognition, convolutional neural networks, triplet loss, spatial batch normalization with dropout, duplicate removal, advertisement technologies, performance benchmarking
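A minimal PyTorch sketch of training an embedding with triplet loss for duplicate detection: the anchor and positive are two variants of the same listing photo, the negative is a different one. The tiny backbone, 64-dimensional embedding, and margin are assumptions for brevity, not the authors' network.

```python
import torch
import torch.nn as nn

# Small embedding CNN; a production system would use a deeper backbone
embed = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 64))

loss_fn = nn.TripletMarginLoss(margin=1.0)
anchor = embed(torch.randn(8, 3, 64, 64))    # original listing photo
positive = embed(torch.randn(8, 3, 64, 64))  # same photo, cropped/watermarked
negative = embed(torch.randn(8, 3, 64, 64))  # photo of a different property
loss = loss_fn(anchor, positive, negative)   # pulls duplicate pairs together
loss.backward()
```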
Procedia PDF Downloads 338

3373 The Relationship between Iranian EFL Learners' Multiple Intelligences and Their Performance on Grammar Tests
Authors: Rose Shayeghi, Pejman Hosseinioun
Abstract:
The Multiple Intelligences theory characterizes human intelligence as a multifaceted entity that exists in all human beings to varying degrees. The most important contribution of this theory to the field of English Language Teaching (ELT) is its role in identifying individual differences and designing more learner-centered programs. The present study aims at investigating the relationship between different elements of multiple intelligence and grammar scores. To this end, 63 female Iranian EFL learners selected from among intermediate students participated in the study. The instruments employed were a Nelson English language test, the Michigan Grammar Test, and the Teele Inventory for Multiple Intelligences (TIMI). The results of Pearson Product-Moment Correlation revealed a significant positive correlation between grammatical accuracy and linguistic as well as interpersonal intelligence. The results of stepwise multiple regression indicated that linguistic intelligence contributed to the prediction of grammatical accuracy.
Keywords: multiple intelligence, grammar, ELT, EFL, TIMI
Procedia PDF Downloads 490

3372 Face Tracking and Recognition Using Deep Learning Approach
Authors: Degale Desta, Cheng Jian
Abstract:
The most important factor in identifying a person is their face. Even identical twins have their own distinct faces. As a result, identification and face recognition are needed to tell one person from another. A face recognition system is a verification tool used to establish a person's identity using biometrics. Nowadays, face recognition is a common technique used in a variety of applications, including home security systems, criminal identification, and phone unlock systems. Such a system is more secure because it only requires a facial image instead of other dependencies like a key or card. Face detection and face identification are the two phases that typically make up a human recognition system. This paper explains the idea behind designing and creating a face recognition system using deep learning with Azure ML and Python's OpenCV. Face recognition is a task that can be accomplished using deep learning, and given the accuracy of this method, it appears to be a suitable approach. To show how accurate the suggested face recognition system is, experimental results are given: the system reaches 98.46% accuracy using Fast-RCNN, reflecting the performance of the algorithm under different training conditions.
Keywords: deep learning, face recognition, identification, fast-RCNN
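As a rough sketch of the detection stage, torchvision's pretrained Faster R-CNN (a close relative of the Fast-RCNN named above, swapped in because it is available off the shelf) can be run directly; restricting the classes to faces would require retraining the box predictor. The weights argument follows torchvision 0.13 and later; older versions use pretrained=True.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Pretrained detector standing in for the detection stage; a face-specific
# model would replace the classes with {background, face} and retrain
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
with torch.no_grad():
    detections = model([torch.rand(3, 480, 640)])[0]   # one RGB frame in [0,1]
boxes = detections["boxes"][detections["scores"] > 0.8]  # confident regions
```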
Procedia PDF Downloads 140

3371 Comparison of Different Machine Learning Algorithms for Solubility Prediction
Authors: Muhammet Baldan, Emel Timuçin
Abstract:
Molecular solubility prediction plays a crucial role in various fields, such as drug discovery, environmental science, and material science. In this study, we compare the performance of five machine learning algorithms—linear regression, support vector machines (SVM), random forests, gradient boosting machines (GBM), and neural networks—for predicting molecular solubility using the AqSolDB dataset. The dataset consists of 9,981 data points with their corresponding solubility values. MACCS keys (166 bits), RDKit properties (20 properties), and structural properties (3) are extracted for every SMILES representation in the dataset, for a total of 189 features used in training and testing for every molecule. Each algorithm is trained on a subset of the dataset and evaluated using accuracy scores. Additionally, the computational time for training and testing is recorded to assess the efficiency of each algorithm. Our results demonstrate that the random forest model outperformed the other algorithms in terms of predictive accuracy, achieving a 0.93 accuracy score. Gradient boosting machines and neural networks also exhibit strong performance, closely followed by support vector machines. Linear regression, while simpler in nature, demonstrates competitive performance but with slightly higher errors compared to the ensemble methods. Overall, this study provides valuable insights into the performance of machine learning algorithms for molecular solubility prediction, highlighting the importance of algorithm selection in achieving accurate and efficient predictions in practical applications.
Keywords: random forest, machine learning, comparison, feature extraction
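A sketch of the featurization described above, using RDKit's MACCS keys plus a few descriptors as stand-ins for the 20 RDKit properties used in the study; the SMILES strings and solubility values are illustrative, not AqSolDB entries.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import MACCSkeys, Descriptors
from sklearn.ensemble import RandomForestRegressor

def featurize(smiles):
    """MACCS key bits plus a few RDKit descriptors for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    maccs = list(MACCSkeys.GenMACCSKeys(mol))   # fixed-length bit vector
    extras = [Descriptors.MolWt(mol), Descriptors.MolLogP(mol),
              Descriptors.TPSA(mol)]            # stand-ins for RDKit properties
    return maccs + extras

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]  # ethanol, benzene, aspirin
logS = [-0.77, -1.64, -1.72]                            # illustrative solubilities
X = np.array([featurize(s) for s in smiles])
model = RandomForestRegressor(n_estimators=100).fit(X, logS)
```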
Procedia PDF Downloads 40

3370 Implementation of an Image Processing System Using Artificial Intelligence for the Diagnosis of Malaria Disease
Authors: Mohammed Bnebaghdad, Feriel Betouche, Malika Semmani
Abstract:
Image processing has become more sophisticated over time due to technological advances, especially artificial intelligence (AI) technology. Currently, AI image processing is used in many areas, including surveillance, industry, science, and medicine. AI in medical image processing can help doctors diagnose diseases faster, with minimal mistakes, and with less effort. Among these diseases is malaria, which remains a major public health challenge in many parts of the world. It affects millions of people every year, particularly in tropical and subtropical regions. Early detection of malaria is essential to prevent serious complications and reduce the burden of the disease. In this paper, we propose and implement a scheme based on AI image processing to enhance malaria diagnosis through automated analysis of blood smear images. The scheme is based on the convolutional neural network (CNN) method. We have developed a model that classifies infected and uninfected single red cells using images available on Kaggle, as well as real blood smear images obtained from the Central Laboratory of Medical Biology EHS Laadi Flici (formerly El Kettar) in Algeria. The real images were segmented into individual cells using the watershed algorithm in order to match the images from the Kaggle dataset. The model was trained and tested, achieving an accuracy of 99%, and 97% on new real images. This validates that the model performs well with new real images, although with slightly lower accuracy. Additionally, the model has been embedded in a Raspberry Pi 4, and a graphical user interface (GUI) was developed to visualize the malaria diagnostic results and facilitate user interaction.
Keywords: medical image processing, malaria parasite, classification, CNN, artificial intelligence
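A minimal Keras CNN of the kind used for infected/uninfected single-cell classification; the input size, layer counts, and dropout rate are assumptions, not the authors' exact model. The sigmoid head makes it a binary classifier suitable for the two-class cell crops.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Binary classifier over single-cell crops (infected vs. uninfected)
model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"), layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```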
Procedia PDF Downloads 19

3369 An ALM Matrix Completion Algorithm for Recovering Weather Monitoring Data
Authors: Yuqing Chen, Ying Xu, Renfa Li
Abstract:
The development of matrix completion theory provides new approaches for data gathering in Wireless Sensor Networks (WSN). The existing matrix completion algorithms for WSN mainly consider how to reduce the number of samples without considering the real-time performance when recovering the data matrix. In order to guarantee recovery accuracy while reducing the recovery time, we propose a new ALM algorithm to recover weather monitoring data. Extensive experiments have been carried out to investigate the performance of the proposed ALM algorithm using different parameter settings, sampling rates, and sampling models. In addition, we compare the proposed ALM algorithm with existing algorithms in the literature. Experimental results show that the ALM algorithm can obtain better overall recovery accuracy with less computing time, demonstrating that the ALM algorithm is an effective and efficient approach for recovering real-world weather monitoring data in WSN.
Keywords: wireless sensor network, matrix completion, singular value thresholding, augmented Lagrange multiplier
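ALM-based completion builds on singular value thresholding; the classic SVT iteration that such solvers share is sketched below in NumPy on a synthetic low-rank matrix with half its entries observed. The threshold, step size, and iteration count are illustrative, not the paper's tuned parameters.

```python
import numpy as np

def svt_complete(M, mask, tau=5.0, step=1.2, iters=200):
    """Singular value thresholding: shrink singular values, then re-impose
    the observed entries, repeating until the fill-in stabilizes."""
    Y = np.zeros_like(M)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = U @ np.diag(np.maximum(s - tau, 0)) @ Vt   # shrinkage step
        Y = Y + step * mask * (M - X)                  # correct observed cells
    return X

rng = np.random.default_rng(4)
true = rng.standard_normal((30, 5)) @ rng.standard_normal((5, 30))  # rank 5
mask = rng.random((30, 30)) < 0.5                                   # 50% sampled
rec = svt_complete(true * mask, mask)
print(np.linalg.norm(rec - true) / np.linalg.norm(true))  # relative error
```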
Procedia PDF Downloads 384

3368 Artificial Intelligence-Based Detection of Individuals Suffering from Vestibular Disorder
Authors: Dua Hişam, Serhat İkizoğlu
Abstract:
Identifying the problem behind a balance disorder is one of the most interesting topics in the medical literature. This study develops artificial intelligence (AI) algorithms that apply multiple machine learning (ML) models to gait sensor data collected from humans in order to classify between normal people and those suffering from Vestibular System (VS) problems. Although AI is widely utilized as a diagnostic tool in medicine, AI models have not been used to perform feature extraction and identify VS disorders through training on raw data. In this study, three machine learning (ML) models, the Random Forest Classifier (RF), Extreme Gradient Boosting (XGB), and K-Nearest Neighbor (KNN), have been trained to detect VS disorder, and the performance of the algorithms has been compared using accuracy, recall, precision, and F1-score. With an accuracy of 95.28%, the Random Forest Classifier (RF) was the most accurate model.
Keywords: vestibular disorder, machine learning, random forest classifier, k-nearest neighbor, extreme gradient boosting
Procedia PDF Downloads 69

3367 Implementation of a Low-Cost Instrumentation for an Open Cycle Wind Tunnel to Evaluate Pressure Coefficient
Authors: Cristian P. Topa, Esteban A. Valencia, Victor H. Hidalgo, Marco A. Martinez
Abstract:
Wind tunnel experiments for aerodynamic profiles offer numerous advantages, such as clean steady laminar flow, controlled environmental conditions, streamline visualization, and real data acquisition. However, the experimental instrumentation is usually expensive, and hence each test adds to the design cost. The aim of this work is to select and implement a low-cost static pressure data acquisition system for a NACA 2412 airfoil in an open cycle wind tunnel. This work compares the wind tunnel experiment with a Computational Fluid Dynamics (CFD) simulation and a parametric analysis. The experiment was evaluated at a Reynolds number of 1.65e5, with angles increasing from -5° to 15°. The comparison shows good agreement between the experiment and CFD, while the parametric analysis results differ widely from the other methods, which is consistent with the limited accuracy of the latter approach due to its simplicity.
Keywords: wind tunnel, low cost instrumentation, experimental testing, CFD simulation
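The quantity being measured reduces to the standard pressure-coefficient formula, Cp = (p - p_inf) / (0.5 * rho * V_inf^2), sketched below; the air density, freestream speed, and tap readings are assumed values for illustration, not the experiment's conditions.

```python
def pressure_coefficient(p, p_inf, rho=1.225, v_inf=15.0):
    """Cp from a static tap reading: (p - p_inf) / q_inf."""
    q_inf = 0.5 * rho * v_inf ** 2    # freestream dynamic pressure, Pa
    return (p - p_inf) / q_inf

# Illustrative gauge readings along the airfoil upper surface (Pa)
for tap in [-95.0, -60.0, -20.0, 5.0]:
    print(round(pressure_coefficient(tap, 0.0), 3))
```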
Procedia PDF Downloads 180

3366 Effects of Listening to Pleasant Thai Classical Music on Increasing Working Memory in Elderly: An Electroencephalogram Study
Authors: Anchana Julsiri, Seree Chadcham
Abstract:
The present study determined the effects of listening to pleasant Thai classical music on increasing working memory in the elderly. Thai classical music without lyrics, which made participants feel amused and aroused, was used in the experiment for 3.19-5.40 minutes. The accuracy scores of the Counting Span Task (CST), upper alpha ERD%, and theta ERS% were used to assess participants' working memory both before and after listening to pleasant Thai classical music. The results showed that the accuracy scores of the CST and upper alpha ERD% in the frontal area of participants after listening to Thai classical music were significantly higher than before listening (p < .05). Theta ERS% in the fronto-parietal network of participants after listening to Thai classical music was significantly lower than before listening (p < .05).
Keywords: brain wave, elderly, pleasant Thai classical music, working memory
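ERD% and ERS% are band-power ratios between a reference (pre-stimulus) interval and the task interval; one common convention is sketched below with invented band powers. Under this sign convention a power drop gives a positive ERD% and a power rise gives a positive ERS%; the study's exact convention may differ.

```python
def erd_percent(ref_power, task_power):
    """ERD% = (reference - task) / reference * 100 under one common
    convention: positive means desynchronization (power drop)."""
    return (ref_power - task_power) / ref_power * 100.0

upper_alpha_ref, upper_alpha_task = 4.8, 3.1   # illustrative band power, uV^2
theta_ref, theta_task = 2.0, 2.6
print(erd_percent(upper_alpha_ref, upper_alpha_task))  # ERD, positive
print(-erd_percent(theta_ref, theta_task))             # ERS% as the sign flip
```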
Procedia PDF Downloads 459

3365 The Sequential Estimation of the Seismoacoustic Source Energy in C-OTDR Monitoring Systems
Authors: Andrey V. Timofeev, Dmitry V. Egorov
Abstract:
A practical, efficient approach is suggested for estimating the seismoacoustic source energy in C-OTDR monitoring systems. This approach represents a sequential plan for confidence estimation of both the seismoacoustic source energy and the absorption coefficient of the soil. The sequential plan delivers non-asymptotic guaranteed accuracy of the obtained estimates in the form of non-asymptotic confidence regions with prescribed sizes. These confidence regions are valid for a finite sample size when the distributions of the observations are unknown. Thus, the suggested estimates are non-asymptotic and nonparametric, and they guarantee the prescribed estimation accuracy in the form of an a priori prescribed size of the confidence regions and a prescribed confidence coefficient value.
Keywords: nonparametric estimation, sequential confidence estimation, multichannel monitoring systems, C-OTDR-system, non-linear regression
Procedia PDF Downloads 356

3364 [Keynote Speech]: Feature Selection and Predictive Modeling of Housing Data Using Random Forest
Authors: Bharatendra Rai
Abstract:
Predictive data analysis and modeling involving machine learning techniques become challenging in the presence of too many explanatory variables or features. The presence of too many features in machine learning is known not only to cause algorithms to slow down, but also to lead to a decrease in model prediction accuracy. This study involves a housing dataset with 79 quantitative and qualitative features that describe various aspects people consider while buying a new house. The Boruta algorithm, which supports feature selection using a wrapper approach built around random forest, is used in this study. This feature selection process leads to 49 confirmed features, which are then used for developing predictive random forest models. The study also explores five different data partitioning ratios; their impact on model accuracy is captured using the coefficient of determination (R-squared) and root mean square error (RMSE).
Keywords: housing data, feature selection, random forest, Boruta algorithm, root mean square error
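The Boruta wrapper is available for Python as BorutaPy; a sketch of the confirm-and-transform workflow on a synthetic regression stand-in for the housing data follows. The estimator settings are assumptions, and the original study may have used the R implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
from boruta import BorutaPy

# Synthetic stand-in: 30 features, only 8 of which are informative
X, y = make_regression(n_samples=500, n_features=30, n_informative=8,
                       noise=10.0, random_state=0)

rf = RandomForestRegressor(n_jobs=-1, max_depth=5)
selector = BorutaPy(rf, n_estimators="auto", random_state=1)
selector.fit(X, y)                 # compares features against shadow copies
print("confirmed:", np.where(selector.support_)[0])
X_selected = selector.transform(X)  # keep only the confirmed features
```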
Procedia PDF Downloads 323

3363 Research on Reservoir Lithology Prediction Based on Residual Neural Network and Squeeze-and-Excitation Neural Network
Authors: Li Kewen, Su Zhaoxin, Wang Xingmou, Zhu Jian Bing
Abstract:
Conventional reservoir prediction methods are not sufficient to explore the implicit relations between seismic attributes, and thus data utilization is low. In order to improve the predictive classification accuracy of reservoir lithology, this paper proposes a deep learning lithology prediction method based on ResNet (Residual Neural Network) and SENet (Squeeze-and-Excitation Neural Network). The neural network model is built and trained using seismic attribute data and lithology data from the Shengli oilfield, and the nonlinear mapping relationship between seismic attributes and lithology markers is established. The experimental results show that this method can significantly improve the classification of reservoir lithology, with a classification accuracy close to 70%. This study can effectively predict the lithology of undrilled areas and provide support for exploration and development.
Keywords: convolutional neural network, lithology, prediction of reservoir, seismic attributes
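A minimal PyTorch sketch of the two building blocks the paper combines: a Squeeze-and-Excitation gate inserted into a residual unit. The channel counts, reduction ratio, and 2-D attribute-patch input are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels by globally pooled context."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = x.mean(dim=(2, 3))             # squeeze: global average pool
        w = self.fc(w)[:, :, None, None]   # excitation: per-channel gates
        return x * w

class SEResidualBlock(nn.Module):
    """Residual unit with an SE gate, the ResNet + SENet combination."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels))
        self.se = SEBlock(channels)

    def forward(self, x):
        return torch.relu(x + self.se(self.body(x)))  # identity shortcut

out = SEResidualBlock(32)(torch.randn(2, 32, 16, 16))  # attribute patches
```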
Procedia PDF Downloads 177

3362 A Chinese Nested Named Entity Recognition Model Based on Lexical Features
Abstract:
In the field of named entity recognition, most research has been conducted on simple entities. However, nested named entities, which contain entities within entities, have been difficult to identify accurately due to their boundary ambiguity. In this paper, a hierarchical recognition model is constructed based on the grammatical structure and semantic features of Chinese text, with boundary calculation based on lexical features. The analysis is carried out at different levels in terms of granularity, semantics, and lexicality, respectively, avoiding repetitive work to reduce computational effort and using the semantic features of words to calculate entity boundaries to improve recognition accuracy. The results of experiments carried out on web-based microblogging data show that the model achieves an accuracy of 86.33% and an F1 value of 89.27% in recognizing nested named entities, making up for the shortcomings of some previous recognition models and improving the efficiency of nested named entity recognition.
Keywords: coarse-grained, nested named entity, Chinese natural language processing, word embedding, T-SNE dimensionality reduction algorithm
Procedia PDF Downloads 128

3361 Speaker Identification by Atomic Decomposition of Learned Features Using Computational Auditory Scene Analysis Principals in Noisy Environments
Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic
Abstract:
Speaker recognition is performed in high Additive White Gaussian Noise (AWGN) environments using principles of Computational Auditory Scene Analysis (CASA). CASA methods often classify sounds from images in the time-frequency (T-F) plane using spectrograms or cochleagrams as the image. In this paper, atomic decomposition implemented by matching pursuit performs a transform from time series speech signals to the T-F plane. The atomic decomposition creates a sparsely populated T-F vector in "weight space" where each populated T-F position contains an amplitude weight. The weight space vector, along with the atomic dictionary, represents a denoised, compressed version of the original signal. The arrangement of the atomic indices in the T-F vector is used for classification. Unsupervised feature learning implemented by a sparse autoencoder learns a single dictionary of basis features from a collection of envelope samples from all speakers. The approach is demonstrated using pairs of speakers from the TIMIT data set. Pairs of speakers are selected randomly from a single district. Each speaker has 10 sentences: two are used for training and 8 for testing. Atomic index probabilities are created for each training sentence and also for each test sentence. Classification is performed by finding the lowest Euclidean distance between the probabilities from the training sentences and the test sentences. Training is done at a 30 dB Signal-to-Noise Ratio (SNR). Testing is performed at SNRs of 0 dB, 5 dB, 10 dB and 30 dB. The algorithm has a baseline classification accuracy of ~93% averaged over 10 pairs of speakers from the TIMIT data set. The baseline accuracy is attributable to the short sequences of training and test data as well as the overall simplicity of the classification algorithm. The accuracy is not affected by AWGN and remains ~93% at 0 dB SNR.
Keywords: time-frequency plane, atomic decomposition, envelope sampling, Gabor atoms, matching pursuit, sparse dictionary learning, sparse autoencoder
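The atomic decomposition by matching pursuit described above reduces to a short greedy loop: project the residual onto a unit-norm dictionary, keep the best atom's weight, subtract, and repeat. The random dictionary below stands in for the learned Gabor-like dictionary; sizes and the toy signal are assumptions.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms=10):
    """Greedy atomic decomposition: repeatedly pick the dictionary atom best
    correlated with the residual, record its weight, and subtract it."""
    residual = signal.copy()
    weights = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        corr = dictionary.T @ residual
        k = int(np.argmax(np.abs(corr)))
        weights[k] += corr[k]
        residual -= corr[k] * dictionary[:, k]
    return weights, residual              # sparse "weight space" vector

rng = np.random.default_rng(5)
D = rng.standard_normal((256, 1024))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms (Gabor-like stand-in)
x = 2 * D[:, 7] - 1.5 * D[:, 300] + rng.normal(0, 0.01, 256)
w, r = matching_pursuit(x, D, n_atoms=5)
print(np.argsort(np.abs(w))[-2:])         # should recover atoms 7 and 300
```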
Procedia PDF Downloads 289