Search results for: Support Vector Machines (SVM)
8037 Leveraging SHAP Values for Effective Feature Selection in Peptide Identification
Authors: Sharon Li, Zhonghang Xia
Abstract:
Post-database search is an essential phase in peptide identification using tandem mass spectrometry (MS/MS) to refine peptide-spectrum matches (PSMs) produced by database search engines. These engines frequently face difficulty differentiating between correct and incorrect peptide assignments. Despite advances in statistical and machine learning methods aimed at improving the accuracy of peptide identification, challenges remain in selecting critical features for these models. In this study, two machine learning models—a random forest tree and a support vector machine—were applied to three datasets to enhance PSMs. SHAP values were utilized to determine the significance of each feature within the models. The experimental results indicate that the random forest model consistently outperformed the SVM across all datasets. Further analysis of SHAP values revealed that the importance of features varies depending on the dataset, indicating that a feature's role in model predictions can differ significantly. This variability in feature selection can lead to substantial differences in model performance, with false discovery rate (FDR) differences exceeding 50% between different feature combinations. Through SHAP value analysis, the most effective feature combinations were identified, significantly enhancing model performance.Keywords: peptide identification, SHAP value, feature selection, random forest tree, support vector machine
Procedia PDF Downloads 238036 Exploring Students' Alternative Conception in Vector Components
Authors: Umporn Wutchana
Abstract:
An open ended problem and unstructured interview had been used to explore students’ conceptual and procedural understanding of vector components. The open ended problem had been designed based on research instrument used in previous physics education research. Without physical context, we asked students to find out magnitude and draw graphical form of vector components. The open ended problem was given to 211 first year students of faculty of science during the third (summer) semester in 2014 academic year. The students spent approximately 15 minutes of their second time of the General Physics I course to complete the open ended problem after they had failed. Consequently, their responses were classified based on the similarity of errors performed in the responses. Then, an unstructured interview was conducted. 7 students were randomly selected and asked to reason and explain their answers. The study results showed that 53% of 211 students provided correct numerical magnitude of vector components while 10.9% of them confused and punctuated the magnitude of vectors in x- with y-components. Others 20.4% provided just symbols and the last 15.6% gave no answer. When asking to draw graphical form of vector components, only 10% of 211 students made corrections. A majority of them produced errors and revealed alternative conceptions. 46.5% drew longer and/or shorter magnitude of vector components. 43.1% drew vectors in different forms or wrote down other symbols. Results from the unstructured interview indicated that some students just memorized the method to get numerical magnitude of x- and y-components. About graphical form of component vectors, some students though that the length of component vectors should be shorter than those of the given one. So then, it could be combined to be equal length of the given vectors while others though that component vectors should has the same length as the given vectors. It was likely to be that many students did not develop a strong foundation of understanding in vector components but just learn by memorizing its solution or the way to compute its magnitude and attribute little meaning to such concept.Keywords: graphical vectors, vectors, vector components, misconceptions, alternative conceptions
Procedia PDF Downloads 1898035 Roughness Discrimination Using Bioinspired Tactile Sensors
Authors: Zhengkun Yi
Abstract:
Surface texture discrimination using artificial tactile sensors has attracted increasing attentions in the past decade as it can endow technical and robot systems with a key missing ability. However, as a major component of texture, roughness has rarely been explored. This paper presents an approach for tactile surface roughness discrimination, which includes two parts: (1) design and fabrication of a bioinspired artificial fingertip, and (2) tactile signal processing for tactile surface roughness discrimination. The bioinspired fingertip is comprised of two polydimethylsiloxane (PDMS) layers, a polymethyl methacrylate (PMMA) bar, and two perpendicular polyvinylidene difluoride (PVDF) film sensors. This artificial fingertip mimics human fingertips in three aspects: (1) Elastic properties of epidermis and dermis in human skin are replicated by the two PDMS layers with different stiffness, (2) The PMMA bar serves the role analogous to that of a bone, and (3) PVDF film sensors emulate Meissner’s corpuscles in terms of both location and response to the vibratory stimuli. Various extracted features and classification algorithms including support vector machines (SVM) and k-nearest neighbors (kNN) are examined for tactile surface roughness discrimination. Eight standard rough surfaces with roughness values (Ra) of 50 μm, 25 μm, 12.5 μm, 6.3 μm 3.2 μm, 1.6 μm, 0.8 μm, and 0.4 μm are explored. The highest classification accuracy of (82.6 ± 10.8) % can be achieved using solely one PVDF film sensor with kNN (k = 9) classifier and the standard deviation feature.Keywords: bioinspired fingertip, classifier, feature extraction, roughness discrimination
Procedia PDF Downloads 3128034 Parkinson’s Disease Detection Analysis through Machine Learning Approaches
Authors: Muhtasim Shafi Kader, Fizar Ahmed, Annesha Acharjee
Abstract:
Machine learning and data mining are crucial in health care, as well as medical information and detection. Machine learning approaches are now being utilized to improve awareness of a variety of critical health issues, including diabetes detection, neuron cell tumor diagnosis, COVID 19 identification, and so on. Parkinson’s disease is basically a disease for our senior citizens in Bangladesh. Parkinson's Disease indications often seem progressive and get worst with time. People got affected trouble walking and communicating with the condition advances. Patients can also have psychological and social vagaries, nap problems, hopelessness, reminiscence loss, and weariness. Parkinson's disease can happen in both men and women. Though men are affected by the illness at a proportion that is around partial of them are women. In this research, we have to get out the accurate ML algorithm to find out the disease with a predictable dataset and the model of the following machine learning classifiers. Therefore, nine ML classifiers are secondhand to portion study to use machine learning approaches like as follows, Naive Bayes, Adaptive Boosting, Bagging Classifier, Decision Tree Classifier, Random Forest classifier, XBG Classifier, K Nearest Neighbor Classifier, Support Vector Machine Classifier, and Gradient Boosting Classifier are used.Keywords: naive bayes, adaptive boosting, bagging classifier, decision tree classifier, random forest classifier, XBG classifier, k nearest neighbor classifier, support vector classifier, gradient boosting classifier
Procedia PDF Downloads 1298033 Application of Rapid Eye Imagery in Crop Type Classification Using Vegetation Indices
Authors: Sunita Singh, Rajani Srivastava
Abstract:
For natural resource management and in other applications about earth observation revolutionary remote sensing technology plays a significant role. One of such application in monitoring and classification of crop types at spatial and temporal scale, as it provides latest, most precise and cost-effective information. Present study emphasizes the use of three different vegetation indices of Rapid Eye imagery on crop type classification. It also analyzed the effect of each indices on classification accuracy. Rapid Eye imagery is highly demanded and preferred for agricultural and forestry sectors as it has red-edge and NIR bands. The three indices used in this study were: the Normalized Difference Vegetation Index (NDVI), the Green Normalized Difference Vegetation Index (GNDVI), and the Normalized Difference Red Edge Index (NDRE) and all of these incorporated the Red Edge band. The study area is Varanasi district of Uttar Pradesh, India and Radial Basis Function (RBF) kernel was used here for the Support Vector Machines (SVMs) classification. Classification was performed with these three vegetation indices. The contribution of each indices on image classification accuracy was also tested with single band classification. Highest classification accuracy of 85% was obtained using three vegetation indices. The study concluded that NDRE has the highest contribution on classification accuracy compared to the other vegetation indices and the Rapid Eye imagery can get satisfactory results of classification accuracy without original bands.Keywords: GNDVI, NDRE, NDVI, rapid eye, vegetation indices
Procedia PDF Downloads 3628032 Comparison of GIS-Based Soil Erosion Susceptibility Models Using Support Vector Machine, Binary Logistic Regression and Artificial Neural Network in the Southwest Amazon Region
Authors: Elaine Lima Da Fonseca, Eliomar Pereira Da Silva Filho
Abstract:
The modeling of areas susceptible to soil loss by hydro erosive processes consists of a simplified instrument of reality with the purpose of predicting future behaviors from the observation and interaction of a set of geoenvironmental factors. The models of potential areas for soil loss will be obtained through binary logistic regression, artificial neural networks, and support vector machines. The choice of the municipality of Colorado do Oeste in the south of the western Amazon is due to soil degradation due to anthropogenic activities, such as agriculture, road construction, overgrazing, deforestation, and environmental and socioeconomic configurations. Initially, a soil erosion inventory map constructed through various field investigations will be designed, including the use of remotely piloted aircraft, orbital imagery, and the PLANAFLORO/RO database. 100 sampling units with the presence of erosion will be selected based on the assumptions indicated in the literature, and, to complement the dichotomous analysis, 100 units with no erosion will be randomly designated. The next step will be the selection of the predictive parameters that exert, jointly, directly, or indirectly, some influence on the mechanism of occurrence of soil erosion events. The chosen predictors are altitude, declivity, aspect or orientation of the slope, curvature of the slope, composite topographic index, flow power index, lineament density, normalized difference vegetation index, drainage density, lithology, soil type, erosivity, and ground surface temperature. After evaluating the relative contribution of each predictor variable, the erosion susceptibility model will be applied to the municipality of Colorado do Oeste - Rondônia through the SPSS Statistic 26 software. Evaluation of the model will occur through the determination of the values of the R² of Cox & Snell and the R² of Nagelkerke, Hosmer and Lemeshow Test, Log Likelihood Value, and Wald Test, in addition to analysis of the Confounding Matrix, ROC Curve and Accumulated Gain according to the model specification. The validation of the synthesis map resulting from both models of the potential risk of soil erosion will occur by means of Kappa indices, accuracy, and sensitivity, as well as by field verification of the classes of susceptibility to erosion using drone photogrammetry. Thus, it is expected to obtain the mapping of the following classes of susceptibility to erosion very low, low, moderate, very high, and high, which may constitute a screening tool to identify areas where more detailed investigations need to be carried out, applying more efficient social resources.Keywords: modeling, susceptibility to erosion, artificial intelligence, Amazon
Procedia PDF Downloads 668031 Identification of Hepatocellular Carcinoma Using Supervised Learning Algorithms
Authors: Sagri Sharma
Abstract:
Analysis of diseases integrating multi-factors increases the complexity of the problem and therefore, development of frameworks for the analysis of diseases is an issue that is currently a topic of intense research. Due to the inter-dependence of the various parameters, the use of traditional methodologies has not been very effective. Consequently, newer methodologies are being sought to deal with the problem. Supervised Learning Algorithms are commonly used for performing the prediction on previously unseen data. These algorithms are commonly used for applications in fields ranging from image analysis to protein structure and function prediction and they get trained using a known dataset to come up with a predictor model that generates reasonable predictions for the response to new data. Gene expression profiles generated by DNA analysis experiments can be quite complex since these experiments can involve hypotheses involving entire genomes. The application of well-known machine learning algorithm - Support Vector Machine - to analyze the expression levels of thousands of genes simultaneously in a timely, automated and cost effective way is thus used. The objectives to undertake the presented work are development of a methodology to identify genes relevant to Hepatocellular Carcinoma (HCC) from gene expression dataset utilizing supervised learning algorithms and statistical evaluations along with development of a predictive framework that can perform classification tasks on new, unseen data.Keywords: artificial intelligence, biomarker, gene expression datasets, hepatocellular carcinoma, machine learning, supervised learning algorithms, support vector machine
Procedia PDF Downloads 4298030 LED Lighting Interviews and Assessment in Forest Machines
Authors: Rauno Pääkkönen, Fabriziomaria Gobba, Leena Korpinen
Abstract:
The objective of the study is to assess the implementation of LED lighting into forest machine work in the dark. In addition, the paper includes a wide variety of important and relevant safety and health parameters. In modern, computerized work in the cab of forest machines, artificial illumination is a demanding task when performing duties, such as the visual inspections of wood and computer calculations. We interviewed entrepreneurs and gathered the following as the most pertinent themes: (1) safety, (2) practical problems, and (3) work with LED lighting. The most important comments were in regards to the practical problems of LED lighting. We found indications of technical problems in implementing LED lighting, like snow and dirt on the surfaces of lamps that dim the emission of light. Moreover, service work in the dark forest is dangerous and increases the risks of on-site accidents. We also concluded that the amount of blue light to the eyes should be assessed, especially, when the drivers are working in a semi-dark cab.Keywords: forest machines, health, LED, safety
Procedia PDF Downloads 4308029 Comparison Study of Machine Learning Classifiers for Speech Emotion Recognition
Authors: Aishwarya Ravindra Fursule, Shruti Kshirsagar
Abstract:
In the intersection of artificial intelligence and human-centered computing, this paper delves into speech emotion recognition (SER). It presents a comparative analysis of machine learning models such as K-Nearest Neighbors (KNN),logistic regression, support vector machines (SVM), decision trees, ensemble classifiers, and random forests, applied to SER. The research employs four datasets: Crema D, SAVEE, TESS, and RAVDESS. It focuses on extracting salient audio signal features like Zero Crossing Rate (ZCR), Chroma_stft, Mel Frequency Cepstral Coefficients (MFCC), root mean square (RMS) value, and MelSpectogram. These features are used to train and evaluate the models’ ability to recognize eight types of emotions from speech: happy, sad, neutral, angry, calm, disgust, fear, and surprise. Among the models, the Random Forest algorithm demonstrated superior performance, achieving approximately 79% accuracy. This suggests its suitability for SER within the parameters of this study. The research contributes to SER by showcasing the effectiveness of various machine learning algorithms and feature extraction techniques. The findings hold promise for the development of more precise emotion recognition systems in the future. This abstract provides a succinct overview of the paper’s content, methods, and results.Keywords: comparison, ML classifiers, KNN, decision tree, SVM, random forest, logistic regression, ensemble classifiers
Procedia PDF Downloads 458028 Semigroups of Linear Transformations with Fixed Subspaces: Green’s Relations and Ideals
Authors: Yanisa Chaiya, Jintana Sanwong
Abstract:
Let V be a vector space over a field and W a subspace of V. Let Fix(V,W) denote the set of all linear transformations on V with fix all elements in W. In this paper, we show that Fix(V,W) is a semigroup under the composition of maps and describe Green’s relations on this semigroup in terms of images, kernels and the dimensions of subspaces of the quotient space V/W where V/W = {v+W : v is an element in V} with v+W = {v+w : w is an element in W}. Let dim(U) denote the dimension of a vector space U and Vα = {vα : v is an element in V} where vα is an image of v under a linear transformation α. For any cardinal number a let a'= min{b : b > a}. We also show that the ideals of Fix(V,W) are precisely the sets. Fix(r) ={α ∊ Fix(V,W) : dim(Vα/W) < r} where 1 ≤ r ≤ a' and a = dim(V/W). Moreover, we prove that if V is a finite-dimensional vector space, then every ideal of Fix(V,W) is principle.Keywords: Green’s relations, ideals, linear transformation semi-groups, principle ideals
Procedia PDF Downloads 2928027 Improved Classification Procedure for Imbalanced and Overlapped Situations
Authors: Hankyu Lee, Seoung Bum Kim
Abstract:
The issue with imbalance and overlapping in the class distribution becomes important in various applications of data mining. The imbalanced dataset is a special case in classification problems in which the number of observations of one class (i.e., major class) heavily exceeds the number of observations of the other class (i.e., minor class). Overlapped dataset is the case where many observations are shared together between the two classes. Imbalanced and overlapped data can be frequently found in many real examples including fraud and abuse patients in healthcare, quality prediction in manufacturing, text classification, oil spill detection, remote sensing, and so on. The class imbalance and overlap problem is the challenging issue because this situation degrades the performance of most of the standard classification algorithms. In this study, we propose a classification procedure that can effectively handle imbalanced and overlapped datasets by splitting data space into three parts: nonoverlapping, light overlapping, and severe overlapping and applying the classification algorithm in each part. These three parts were determined based on the Hausdorff distance and the margin of the modified support vector machine. An experiments study was conducted to examine the properties of the proposed method and compared it with other classification algorithms. The results showed that the proposed method outperformed the competitors under various imbalanced and overlapped situations. Moreover, the applicability of the proposed method was demonstrated through the experiment with real data.Keywords: classification, imbalanced data with class overlap, split data space, support vector machine
Procedia PDF Downloads 3088026 CNC Milling-Drilling Machine Cutting Tool Holder
Authors: Hasan Al Dabbas
Abstract:
In this paper, it is addressed that the mechanical machinery captures a major share of innovation in drilling and milling chucks technology. Users demand higher speeds in milling because they are cutting more aluminum and are relying on higher speeds to eliminate secondary finishing operations. To meet that demand, milling-machine builders have enhanced their machine’s rigidity. Moreover, faster cutting has caught up with boring mills. Cooling these machine’s internal components is a challenge at high speeds. Another trend predicted that it is more use of controlled axes to let the machines do many more operations on 5 sides without having to move or re-fix the work. Advances of technology in mechanical engineering have helped to make high-speed machining equipment. To accompany these changes in milling and drilling machines chucks, the demand of easiest software is increased. An open architecture controller is being sought that would allow flexibility and information exchange.Keywords: drilling, milling, chucks, cutting edges, tools, machines
Procedia PDF Downloads 5728025 A Machine Learning Approach for Earthquake Prediction in Various Zones Based on Solar Activity
Authors: Viacheslav Shkuratskyy, Aminu Bello Usman, Michael O’Dea, Saifur Rahman Sabuj
Abstract:
This paper examines relationships between solar activity and earthquakes; it applied machine learning techniques: K-nearest neighbour, support vector regression, random forest regression, and long short-term memory network. Data from the SILSO World Data Center, the NOAA National Center, the GOES satellite, NASA OMNIWeb, and the United States Geological Survey were used for the experiment. The 23rd and 24th solar cycles, daily sunspot number, solar wind velocity, proton density, and proton temperature were all included in the dataset. The study also examined sunspots, solar wind, and solar flares, which all reflect solar activity and earthquake frequency distribution by magnitude and depth. The findings showed that the long short-term memory network model predicts earthquakes more correctly than the other models applied in the study, and solar activity is more likely to affect earthquakes of lower magnitude and shallow depth than earthquakes of magnitude 5.5 or larger with intermediate depth and deep depth.Keywords: k-nearest neighbour, support vector regression, random forest regression, long short-term memory network, earthquakes, solar activity, sunspot number, solar wind, solar flares
Procedia PDF Downloads 738024 Retrospective Evaluation of Vector-borne Infections in Cats Living in Germany (2012-2019)
Authors: I. Schäfer, B. Kohn, M. Volkmann, E. Müller
Abstract:
Introduction: Blood-feeding arthropods transmit parasitic, bacterial, or viral pathogens to domestic animals and wildlife. Vector-borne infections are gaining significance due to the increase of travel, import of domestic animals from abroad, and the changing climate in Europe. Aims of the study: The main objective of this retrospective study was to assess the prevalence of vector-borne infections in cats in which a ‘Feline Travel Profile’ had been conducted. Material and Methods: This retrospective study included test results from cats for which a ‘Feline Travel Profile’ established by LABOKLIN had been requested by veterinarians between April 2012 and December 2019. This profile contains direct detection methods via polymerase chain reaction (PCR) for Hepatozoon spp. and Dirofilaria spp. as well as indirect detection methods via immunofluorescence antibody test (IFAT) for Ehrlichia spp. and Leishmania spp. This profile was expanded to include an IFAT for Rickettsia spp. from July 2015 onwards. The prevalence of the different vector-borne infectious agents was calculated. Results: A total of 602 cats were tested using the ‘Feline Travel Profile’. Positive test results were as follows: Rickettsia spp. IFAT 54/442 (12.2%), Ehrlichia spp. IFAT 68/602 (11.3%), Leishmania spp. IFAT 21/602 (3.5%), Hepatozoon spp. PCR 51/595 (8.6%), and Dirofilaria spp. PCR 1/595 cats (0.2%). Co-infections with more than one pathogen could be detected in 22/602 cats. Conclusions: 170/602 cats (28.2%) were tested positive for at least one vector-borne pathogen. Infections with multiple pathogens could be detected in 3.7% of the cats. The data emphasizes the importance of considering vector-borne infections as potential differential diagnoses in cats.Keywords: arthopod-transmitted infections, feline vector-borne infections, Germany, laboratory diagnostics
Procedia PDF Downloads 1668023 Prioritization in a Maintenance, Repair and Overhaul (MRO) System Based on Fuzzy Logic at Iran Khodro (IKCO)
Authors: Izadi Banafsheh, Sedaghat Reza
Abstract:
Maintenance, Repair, and Overhaul (MRO) of machinery are a key recent issue concerning the automotive industry. It has always been a debated question what order or priority should be adopted for the MRO of machinery. This study attempts to examine several criteria including process sensitivity, average time between machine failures, average duration of repair, availability of parts, availability of maintenance personnel and workload through a literature review and experts survey so as to determine the condition of the machine. According to the mentioned criteria, the machinery were ranked in four modes below: A) Need for inspection, B) Need for minor repair, C) Need for part replacement, and D) Need for major repair. The Fuzzy AHP was employed to determine the weighting of criteria. At the end, the obtained weights were ranked through the AHP for each criterion, three groups were specified: shaving machines, assembly and painting in four modes. The statistical population comprises the elite in the Iranian automotive industry at IKCO covering operation managers, CEOs and maintenance professionals who are highly specialized in MRO and perfectly knowledgeable in how the machinery function. The information required for this study were collected from both desk research and field review, which eventually led to construction of a questionnaire handed out to the sample respondents in order to collect information on the subject matter. The results of the AHP for weighting the criteria revealed that the availability of maintenance personnel was the top priority at coefficient of 0.206, while the process sensitivity took the last priority at coefficient of 0.066. Furthermore, the results of TOPSIS for prioritizing the IKCO machinery suggested that at the mode where there is need for inspection, the assembly machines took the top priority while paining machines took the third priority. As for the mode where there is need for minor repairs, the assembly machines took the top priority while the third priority belonged to the shaving machines. As for the mode where there is need for parts replacement, the assembly machines took the top priority while the third belonged to the paining machinery. Finally, as for the mode where there is need for major repair, the assembly machines took the top priority while the third belonged to the paining machinery.Keywords: maintenance, repair, overhaul, MRO, prioritization of machinery, fuzzy logic, AHP, TOPSIS
Procedia PDF Downloads 2868022 Feasibility Study of Wireless Communication for the Control and Monitoring of Rotating Electrical Machine
Authors: S. Ben Brahim, T. H. Vuong, J. David, R. Bouallegue, M. Pietrzak-David
Abstract:
Electrical machine monitoring is important to protect motor from unexpected problems. Today, using wireless communication for electrical machines is interesting for both real time monitoring and diagnostic purposes. In this paper, we propose a system based on wireless communication IEEE 802.11 to control electrical machine. IEEE 802.11 standard is recommended for this type of applications because it provides a faster connection, better range from the base station, and better security. Therefore, our contribution is to study a new technique to control and monitor the rotating electrical machines (motors, generators) using wireless communication. The reliability of radio channel inside rotating electrical machine is also discussed. Then, the communication protocol, software and hardware design used for the proposed system are presented in detail and the experimental results of our system are illustrated.Keywords: control, DFIM machine, electromagnetic field, EMC, IEEE 802.11, monitoring, rotating electrical machines, wireless communication
Procedia PDF Downloads 6958021 Recent Advances in Pulse Width Modulation Techniques and Multilevel Inverters
Authors: Satish Kumar Peddapelli
Abstract:
This paper presents advances in pulse width modulation techniques which refers to a method of carrying information on train of pulses and the information be encoded in the width of pulses. Pulse Width Modulation is used to control the inverter output voltage. This is done by exercising the control within the inverter itself by adjusting the ON and OFF periods of inverter. By fixing the DC input voltage we get AC output voltage. In variable speed AC motors the AC output voltage from a constant DC voltage is obtained by using inverter. Recent developments in power electronics and semiconductor technology have lead improvements in power electronic systems. Hence, different circuit configurations namely multilevel inverters have become popular and considerable interest by researcher are given on them. A fast Space-Vector Pulse Width Modulation (SVPWM) method for five-level inverter is also discussed. In this method, the space vector diagram of the five-level inverter is decomposed into six space vector diagrams of three-level inverters. In turn, each of these six space vector diagrams of three-level inverter is decomposed into six space vector diagrams of two-level inverters. After decomposition, all the remaining necessary procedures for the three-level SVPWM are done like conventional two-level inverter. The proposed method reduces the algorithm complexity and the execution time. It can be applied to the multilevel inverters above the five-level also. The experimental setup for three-level diode-clamped inverter is developed using TMS320LF2407 DSP controller and the experimental results are analysed.Keywords: five-level inverter, space vector pulse wide modulation, diode clamped inverter, electrical engineering
Procedia PDF Downloads 3888020 Application of KL Divergence for Estimation of Each Metabolic Pathway Genes
Authors: Shohei Maruyama, Yasuo Matsuyama, Sachiyo Aburatani
Abstract:
The development of the method to annotate unknown gene functions is an important task in bioinformatics. One of the approaches for the annotation is The identification of the metabolic pathway that genes are involved in. Gene expression data have been utilized for the identification, since gene expression data reflect various intracellular phenomena. However, it has been difficult to estimate the gene function with high accuracy. It is considered that the low accuracy of the estimation is caused by the difficulty of accurately measuring a gene expression. Even though they are measured under the same condition, the gene expressions will vary usually. In this study, we proposed a feature extraction method focusing on the variability of gene expressions to estimate the genes' metabolic pathway accurately. First, we estimated the distribution of each gene expression from replicate data. Next, we calculated the similarity between all gene pairs by KL divergence, which is a method for calculating the similarity between distributions. Finally, we utilized the similarity vectors as feature vectors and trained the multiclass SVM for identifying the genes' metabolic pathway. To evaluate our developed method, we applied the method to budding yeast and trained the multiclass SVM for identifying the seven metabolic pathways. As a result, the accuracy that calculated by our developed method was higher than the one that calculated from the raw gene expression data. Thus, our developed method combined with KL divergence is useful for identifying the genes' metabolic pathway.Keywords: metabolic pathways, gene expression data, microarray, Kullback–Leibler divergence, KL divergence, support vector machines, SVM, machine learning
Procedia PDF Downloads 4038019 Infodemic Detection on Social Media with a Multi-Dimensional Deep Learning Framework
Authors: Raymond Xu, Cindy Jingru Wang
Abstract:
Social media has become a globally connected and influencing platform. Social media data, such as tweets, can help predict the spread of pandemics and provide individuals and healthcare providers early warnings. Public psychological reactions and opinions can be efficiently monitored by AI models on the progression of dominant topics on Twitter. However, statistics show that as the coronavirus spreads, so does an infodemic of misinformation due to pandemic-related factors such as unemployment and lockdowns. Social media algorithms are often biased toward outrage by promoting content that people have an emotional reaction to and are likely to engage with. This can influence users’ attitudes and cause confusion. Therefore, social media is a double-edged sword. Combating fake news and biased content has become one of the essential tasks. This research analyzes the variety of methods used for fake news detection covering random forest, logistic regression, support vector machines, decision tree, naive Bayes, BoW, TF-IDF, LDA, CNN, RNN, LSTM, DeepFake, and hierarchical attention network. The performance of each method is analyzed. Based on these models’ achievements and limitations, a multi-dimensional AI framework is proposed to achieve higher accuracy in infodemic detection, especially pandemic-related news. The model is trained on contextual content, images, and news metadata.Keywords: artificial intelligence, fake news detection, infodemic detection, image recognition, sentiment analysis
Procedia PDF Downloads 2548018 Image Compression Based on Regression SVM and Biorthogonal Wavelets
Authors: Zikiou Nadia, Lahdir Mourad, Ameur Soltane
Abstract:
In this paper, we propose an effective method for image compression based on SVM Regression (SVR), with three different kernels, and biorthogonal 2D Discrete Wavelet Transform. SVM regression could learn dependency from training data and compressed using fewer training points (support vectors) to represent the original data and eliminate the redundancy. Biorthogonal wavelet has been used to transform the image and the coefficients acquired are then trained with different kernels SVM (Gaussian, Polynomial, and Linear). Run-length and Arithmetic coders are used to encode the support vectors and its corresponding weights, obtained from the SVM regression. The peak signal noise ratio (PSNR) and their compression ratios of several test images, compressed with our algorithm, with different kernels are presented. Compared with other kernels, Gaussian kernel achieves better image quality. Experimental results show that the compression performance of our method gains much improvement.Keywords: image compression, 2D discrete wavelet transform (DWT-2D), support vector regression (SVR), SVM Kernels, run-length, arithmetic coding
Procedia PDF Downloads 3828017 Develop a Conceptual Data Model of Geotechnical Risk Assessment in Underground Coal Mining Using a Cloud-Based Machine Learning Platform
Authors: Reza Mohammadzadeh
Abstract:
The major challenges in geotechnical engineering in underground spaces arise from uncertainties and different probabilities. The collection, collation, and collaboration of existing data to incorporate them in analysis and design for given prospect evaluation would be a reliable, practical problem solving method under uncertainty. Machine learning (ML) is a subfield of artificial intelligence in statistical science which applies different techniques (e.g., Regression, neural networks, support vector machines, decision trees, random forests, genetic programming, etc.) on data to automatically learn and improve from them without being explicitly programmed and make decisions and predictions. In this paper, a conceptual database schema of geotechnical risks in underground coal mining based on a cloud system architecture has been designed. A new approach of risk assessment using a three-dimensional risk matrix supported by the level of knowledge (LoK) has been proposed in this model. Subsequently, the model workflow methodology stages have been described. In order to train data and LoK models deployment, an ML platform has been implemented. IBM Watson Studio, as a leading data science tool and data-driven cloud integration ML platform, is employed in this study. As a Use case, a data set of geotechnical hazards and risk assessment in underground coal mining were prepared to demonstrate the performance of the model, and accordingly, the results have been outlined.Keywords: data model, geotechnical risks, machine learning, underground coal mining
Procedia PDF Downloads 2748016 Possibility of Creating Polygon Layers from Raster Layers Obtained by using Classic Image Processing Software: Case of Geological Map of Rwanda
Authors: Louis Nahimana
Abstract:
Most maps are in a raster or pdf format and it is not easy to get vector layers of published maps. Faced to the production of geological simplified map of the northern Lake Tanganyika countries without geological information in vector format, I tried a method of obtaining vector layers from raster layers created from geological maps of Rwanda and DR Congo in pdf and jpg format. The procedure was as follows: The original raster maps were georeferenced using ArcGIS10.2. Under Adobe Photoshop, map areas with the same color corresponding to a lithostratigraphic unit were selected all over the map and saved in a specific raster layer. Using the same image processing software Adobe Photoshop, each RGB raster layer was converted in grayscale type and improved before importation in ArcGIS10. After georeferencing, each lithostratigraphic raster layer was transformed into a multitude of polygons with the tool "Raster to Polygon (Conversion)". Thereafter, tool "Aggregate Polygons (Cartography)" allowed obtaining a single polygon layer. Repeating the same steps for each color corresponding to a homogeneous rock unit, it was possible to reconstruct the simplified geological constitution of Rwanda and the Democratic Republic of Congo in vector format. By using the tool «Append (Management)», vector layers obtained were combined with those from Burundi to achieve vector layers of the geology of the « Northern Lake Tanganyika countries ».Keywords: creating raster layer under image processing software, raster to polygon, aggregate polygons, adobe photoshop
Procedia PDF Downloads 4428015 Machine Learning in Agriculture: A Brief Review
Authors: Aishi Kundu, Elhan Raza
Abstract:
"Necessity is the mother of invention" - Rapid increase in the global human population has directed the agricultural domain toward machine learning. The basic need of human beings is considered to be food which can be satisfied through farming. Farming is one of the major revenue generators for the Indian economy. Agriculture is not only considered a source of employment but also fulfils humans’ basic needs. So, agriculture is considered to be the source of employment and a pillar of the economy in developing countries like India. This paper provides a brief review of the progress made in implementing Machine Learning in the agricultural sector. Accurate predictions are necessary at the right time to boost production and to aid the timely and systematic distribution of agricultural commodities to make their availability in the market faster and more effective. This paper includes a thorough analysis of various machine learning algorithms applied in different aspects of agriculture (crop management, soil management, water management, yield tracking, livestock management, etc.).Due to climate changes, crop production is affected. Machine learning can analyse the changing patterns and come up with a suitable approach to minimize loss and maximize yield. Machine Learning algorithms/ models (regression, support vector machines, bayesian models, artificial neural networks, decision trees, etc.) are used in smart agriculture to analyze and predict specific outcomes which can be vital in increasing the productivity of the Agricultural Food Industry. It is to demonstrate vividly agricultural works under machine learning to sensor data. Machine Learning is the ongoing technology benefitting farmers to improve gains in agriculture and minimize losses. This paper discusses how the irrigation and farming management systems evolve in real-time efficiently. Artificial Intelligence (AI) enabled programs to emerge with rich apprehension for the support of farmers with an immense examination of data.Keywords: machine Learning, artificial intelligence, crop management, precision farming, smart farming, pre-harvesting, harvesting, post-harvesting
Procedia PDF Downloads 1058014 Computational Linguistic Implications of Gender Bias: Machines Reflect Misogyny in Society
Authors: Irene Yi
Abstract:
Machine learning, natural language processing, and neural network models of language are becoming more and more prevalent in the fields of technology and linguistics today. Training data for machines are at best, large corpora of human literature and at worst, a reflection of the ugliness in society. Computational linguistics is a growing field dealing with such issues of data collection for technological development. Machines have been trained on millions of human books, only to find that in the course of human history, derogatory and sexist adjectives are used significantly more frequently when describing females in history and literature than when describing males. This is extremely problematic, both as training data, and as the outcome of natural language processing. As machines start to handle more responsibilities, it is crucial to ensure that they do not take with them historical sexist and misogynistic notions. This paper gathers data and algorithms from neural network models of language having to deal with syntax, semantics, sociolinguistics, and text classification. Computational analysis on such linguistic data is used to find patterns of misogyny. Results are significant in showing the existing intentional and unintentional misogynistic notions used to train machines, as well as in developing better technologies that take into account the semantics and syntax of text to be more mindful and reflect gender equality. Further, this paper deals with the idea of non-binary gender pronouns and how machines can process these pronouns correctly, given its semantic and syntactic context. This paper also delves into the implications of gendered grammar and its effect, cross-linguistically, on natural language processing. Languages such as French or Spanish not only have rigid gendered grammar rules, but also historically patriarchal societies. The progression of society comes hand in hand with not only its language, but how machines process those natural languages. These ideas are all extremely vital to the development of natural language models in technology, and they must be taken into account immediately.Keywords: computational analysis, gendered grammar, misogynistic language, neural networks
Procedia PDF Downloads 1198013 Investigation of the Effects of Sampling Frequency on the THD of 3-Phase Inverters Using Space Vector Modulation
Authors: Khattab Al Qaisi, Nicholas Bowring
Abstract:
This paper presents the simulation results of the effects of sampling frequency on the total harmonic distortion (THD) of three-phase inverters using the space vector pulse width modulation (SVPWM) and space vector control (SVC) algorithms. The relationship between the variables was studied using curve fitting techniques, and it has been shown that, for 50 Hz inverters, there is an exponential relation between the sampling frequency and THD up to around 8500 Hz, beyond which the performance of the model becomes irregular, and there is an negative exponential relation between the sampling frequency and the marginal improvement to the THD. It has also been found that the performance of SVPWM is better than that of SVC with the same sampling frequency in most frequency range, including the range where the performance of the former is irregular.Keywords: DSI, SVPWM, THD, DC-AC converter, sampling frequency, performance
Procedia PDF Downloads 4858012 Molecular Detection of Leishmania from the Phlebotomus Genus: Tendency towards Leishmaniasis Regression in Constantine, North-East of Algeria
Authors: K. Frahtia, I. Mihoubi, S. Picot
Abstract:
Leishmaniasis is a group of parasitic disease with a varied clinical expression caused by flagellate protozoa of the Leishmania genus. These diseases are transmitted to humans and animals by the sting of a vector insect, the female sandfly. Among the groups of dipteral disease vectors, Phlebotominae occupy a prime position and play a significant role in human pathology, such as leishmaniasis that affects nearly 350 million people worldwide. The vector control operation launched by health services throughout the country proves to be effective since despite the prevalence of the disease remains high especially in rural areas, leishmaniasis appears to be declining in Algeria. In this context, this study mainly concerns molecular detection of Leishmania from the vector. Furthermore, a molecular diagnosis has also been made on skin samples taken from patients in the region of Constantine, located in the North-East of Algeria. Concerning the vector, 5858 sandflies were captured, including 4360 males and 1498 females. Male specimens were identified based on their morphological. The morphological identification highlighted the presence of the Phlebotomus genus with a prevalence of 93% against 7% represented by the Sergentomyia genus. About the identified species, P. perniciosus is the most abundant with 59.4% of the male identified population followed by P. longicuspis with 24.7% of the workforce. P. perfiliewi is poorly represented by 6.7% of specimens followed by P. papatasi with 2.2% and 1.5% S. dreyfussi. Concerning skin samples, 45/79 (56.96%) collected samples were found positive by real-time PCR. This rate appears to be in sharp decline compared to previous years (alert peak of 30,227 cases in 2005). Concerning the detection of Leishmania from sandflies by RT-PCR, the results show that 3/60 PCR performed genus are positive with melting temperatures corresponding to that of the reference strain (84.1 +/- 0.4 ° C for L. infantum). This proves that the vectors were parasitized. On the other side, identification by RT-PCR species did not give any results. This could be explained by the presence of an insufficient amount of leishmanian DNA in the vector, and therefore support the hypothesis of the regression of leishmaniasis in Constantine.Keywords: Algeria, molecular diagnostic, phlebotomus, real time PCR
Procedia PDF Downloads 2728011 Sentiment Analysis of Chinese Microblog Comments: Comparison between Support Vector Machine and Long Short-Term Memory
Authors: Xu Jiaqiao
Abstract:
Text sentiment analysis is an important branch of natural language processing. This technology is widely used in public opinion analysis and web surfing recommendations. At present, the mainstream sentiment analysis methods include three parts: sentiment analysis based on a sentiment dictionary, based on traditional machine learning, and based on deep learning. This paper mainly analyzes and compares the advantages and disadvantages of the SVM method of traditional machine learning and the Long Short-term Memory (LSTM) method of deep learning in the field of Chinese sentiment analysis, using Chinese comments on Sina Microblog as the data set. Firstly, this paper classifies and adds labels to the original comment dataset obtained by the web crawler, and then uses Jieba word segmentation to classify the original dataset and remove stop words. After that, this paper extracts text feature vectors and builds document word vectors to facilitate the training of the model. Finally, SVM and LSTM models are trained respectively. After accuracy calculation, it can be obtained that the accuracy of the LSTM model is 85.80%, while the accuracy of SVM is 91.07%. But at the same time, LSTM operation only needs 2.57 seconds, SVM model needs 6.06 seconds. Therefore, this paper concludes that: compared with the SVM model, the LSTM model is worse in accuracy but faster in processing speed.Keywords: sentiment analysis, support vector machine, long short-term memory, Chinese microblog comments
Procedia PDF Downloads 948010 Classifier for Liver Ultrasound Images
Authors: Soumya Sajjan
Abstract:
Liver cancer is the most common cancer disease worldwide in men and women, and is one of the few cancers still on the rise. Liver disease is the 4th leading cause of death. According to new NHS (National Health Service) figures, deaths from liver diseases have reached record levels, rising by 25% in less than a decade; heavy drinking, obesity, and hepatitis are believed to be behind the rise. In this study, we focus on Development of Diagnostic Classifier for Ultrasound liver lesion. Ultrasound (US) Sonography is an easy-to-use and widely popular imaging modality because of its ability to visualize many human soft tissues/organs without any harmful effect. This paper will provide an overview of underlying concepts, along with algorithms for processing of liver ultrasound images Naturaly, Ultrasound liver lesion images are having more spackle noise. Developing classifier for ultrasound liver lesion image is a challenging task. We approach fully automatic machine learning system for developing this classifier. First, we segment the liver image by calculating the textural features from co-occurrence matrix and run length method. For classification, Support Vector Machine is used based on the risk bounds of statistical learning theory. The textural features for different features methods are given as input to the SVM individually. Performance analysis train and test datasets carried out separately using SVM Model. Whenever an ultrasonic liver lesion image is given to the SVM classifier system, the features are calculated, classified, as normal and diseased liver lesion. We hope the result will be helpful to the physician to identify the liver cancer in non-invasive method.Keywords: segmentation, Support Vector Machine, ultrasound liver lesion, co-occurance Matrix
Procedia PDF Downloads 4118009 Acoustic Emission Techniques in Monitoring Low-Speed Bearing Conditions
Authors: Faisal AlShammari, Abdulmajid Addali, Mosab Alrashed
Abstract:
It is widely acknowledged that bearing failures are the primary reason for breakdowns in rotating machinery. These failures are extremely costly, particularly in terms of lost production. Roller bearings are widely used in industrial machinery and need to be maintained in good condition to ensure the continuing efficiency, effectiveness, and profitability of the production process. The research presented here is an investigation of the use of acoustic emission (AE) to monitor bearing conditions at low speeds. Many machines, particularly large, expensive machines operate at speeds below 100 rpm, and such machines are important to the industry. However, the overwhelming proportion of studies have investigated the use of AE techniques for condition monitoring of higher-speed machines (typically several hundred rpm, or even higher). Few researchers have investigated the application of these techniques to low-speed machines ( < 100 rpm). This paper addressed this omission and has established which, of the available, AE techniques are suitable for the detection of incipient faults and measurement of fault growth in low-speed bearings. The first objective of this paper program was to assess the applicability of AE techniques to monitor low-speed bearings. It was found that the measured statistical parameters successfully monitored bearing conditions at low speeds (10-100 rpm). The second objective was to identify which commonly used statistical parameters derived from the AE signal (RMS, kurtosis, amplitude and counts) could identify the onset of a fault in the out race. It was found that these parameters effectually identify the presence of a small fault seeded into the outer races. Also, it is concluded that rotational speed has a strong influence on the measured AE parameters but that they are entirely independent of the load under such load and speed conditions.Keywords: acoustic emission, condition monitoring, NDT, statistical analysis
Procedia PDF Downloads 2488008 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data
Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad
Abstract:
Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction
Procedia PDF Downloads 340