Search results for: k nearest neighbor classifier
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 682

232 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very large datasets is a significant problem for most current data mining and machine learning algorithms. MicroRNA (miRNA) data are among the important large, non-coding genomic datasets representing genome sequences. In this paper, a hybrid method for the classification of miRNA data is proposed. Due to the variety of cancers and the high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features is high relative to the number of samples, and the data are imbalanced. A feature selection method is used to select the features that best distinguish the classes and to eliminate obscure features. Afterward, a Convolutional Neural Network (CNN) classifier is used to classify cancer types, with a Genetic Algorithm employed to optimize the CNN hyper-parameters. To make classification by the CNN faster, a Graphics Processing Unit (GPU) is recommended for performing the computations in parallel. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: cancer classification, feature selection, deep learning, genetic algorithm

Procedia PDF Downloads 112
231 EEG-Based Classification of Psychiatric Disorders: Bipolar Mood Disorder vs. Schizophrenia

Authors: Han-Jeong Hwang, Jae-Hyun Jo, Fatemeh Alimardani

Abstract:

Accurate diagnosis of psychiatric diseases is challenging, particularly when the symptoms of different diseases overlap, such as the delusions that appear in both bipolar mood disorder (BMD) and schizophrenia (SCH). In the present study, we propose a way to discriminate BMD from SCH using electroencephalography (EEG). A total of thirty BMD and SCH patients (15 vs. 15) took part in our experiment. EEG signals were measured with nineteen electrodes attached to the scalp according to the international 10-20 system while the patients were exposed to a visual stimulus flickering at 16 Hz for 95 s. The flickering visual stimulus induces a brain signal known as the steady-state visual evoked potential (SSVEP), whose amplitude differs between patients with BMD and SCH because they process the same visual information in their own unique ways. To classify BMD and SCH patients, a machine learning technique was employed with leave-one-out cross-validation. The SSVEPs induced at the fundamental (16 Hz) and second harmonic (32 Hz) stimulation frequencies were extracted using the fast Fourier transform (FFT) and used as features. The most discriminative feature was selected using the Fisher score, and a support vector machine (SVM) was used as the classifier. The analysis yielded a classification accuracy of 83.33%, showing the feasibility of discriminating patients with BMD and SCH using EEG. We expect that our approach can help psychiatrists diagnose BMD and SCH more accurately.
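The feature-extraction step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the sampling rate, the synthetic trace, and the function names are assumptions made for illustration, and a direct single-frequency DFT stands in for a full FFT (it gives the same value at that frequency bin). The two-class Fisher score shown is one common definition; the abstract does not state which form was used.

```python
import cmath
import math
import statistics


def amplitude_at(signal, fs, freq):
    """Single-sided DFT amplitude of `signal` at `freq` Hz, sampled at `fs` Hz."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * i / fs)
              for i, x in enumerate(signal))
    return 2.0 * abs(acc) / n


def ssvep_features(signal, fs, stim_hz=16.0):
    """Amplitudes at the stimulation frequency and at its second harmonic."""
    return [amplitude_at(signal, fs, stim_hz),
            amplitude_at(signal, fs, 2.0 * stim_hz)]


def fisher_score(group_a, group_b):
    """Two-class Fisher score of one feature (assumed form):
    squared mean difference divided by the sum of the class variances."""
    gap = statistics.mean(group_a) - statistics.mean(group_b)
    spread = statistics.pvariance(group_a) + statistics.pvariance(group_b)
    return gap * gap / spread


# Synthetic one-second trace (assumption: 256 Hz sampling, no noise).
fs = 256
trace = [1.5 * math.sin(2 * math.pi * 16 * i / fs)
         + 0.5 * math.sin(2 * math.pi * 32 * i / fs)
         for i in range(fs)]
features = ssvep_features(trace, fs)
score = fisher_score([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

In a pipeline like the one described, each feature would be ranked by its score across patients, and only the top-ranked feature would be passed to the classifier inside each leave-one-out fold.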

Keywords: bipolar mood disorder, electroencephalography, schizophrenia, machine learning

Procedia PDF Downloads 424
230 Stock Market Prediction Using Convolutional Neural Network That Learns from a Graph

Authors: Mo-Se Lee, Cheol-Hwi Ahn, Kee-Young Kwahk, Hyunchul Ahn

Abstract:

Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, the Convolutional Neural Network (CNN), known as an effective solution for recognizing and classifying images, has been widely applied to classification and prediction problems in various fields. In this study, we apply CNN to stock market prediction, one of the most challenging tasks in machine learning research. Specifically, we propose to use CNN as a binary classifier that predicts stock market direction (up or down) using a graph as its input. That is, our proposal is to build a machine learning algorithm that mimics a person who looks at the graph and predicts whether the trend will go up or down. Our proposed model consists of four steps. In the first step, it divides the dataset into intervals of 5, 10, 15, and 20 days. In step 2, it creates graphs for each interval. In the next step, CNN classifiers are trained using the graphs generated in the previous step. In step 4, it optimizes the hyper-parameters of the trained model using the validation dataset. To validate our model, we will apply it to the prediction of KOSPI200 over 1,986 days in eight years (from 2009 to 2016). The experimental dataset will include 14 technical indicators, such as CCI, Momentum, and ROC, and the daily closing price of KOSPI200 of the Korean stock market.

Keywords: convolutional neural network, deep learning, Korean stock market, stock market prediction

Procedia PDF Downloads 425
229 Adaptive Swarm Balancing Algorithms for Rare-Event Prediction in Imbalanced Healthcare Data

Authors: Jinyan Li, Simon Fong, Raymond Wong, Mohammed Sabah, Fiaidhi Jinan

Abstract:

Clinical data analysis and forecasting have made great contributions to disease control, prevention and detection. However, such data usually suffer from highly imbalanced class distributions. In this paper, we target binary imbalanced datasets in which the positive samples form the minority. We investigate two meta-heuristic algorithms, particle swarm optimization and the bat-inspired algorithm, and combine both with the synthetic minority over-sampling technique (SMOTE) for processing the datasets. One approach is to process the full dataset as a whole. The other is to split up the dataset and adaptively process it one segment at a time. The experimental results reveal that while the performance improvements obtained by the former approach do not scale to larger datasets, the latter, which we call Adaptive Swarm Balancing Algorithms, leads to significant efficiency and effectiveness improvements on large datasets. We also find it more consistent with the practice of typical large imbalanced medical datasets. We further use the meta-heuristic algorithms to optimize two key parameters of SMOTE, leading to more credible classifier performance and a shorter running time compared with the brute-force method.
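For readers unfamiliar with SMOTE, the sketch below shows its core interpolation step under stated assumptions. It is not the authors' implementation and omits the swarm optimization entirely. The abstract does not name the two SMOTE parameters it tunes; the number of neighbours `k` and the number of synthetic samples `n_new` used here are the usual candidates and are labeled as assumptions.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic minority samples by interpolating between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours (plain SMOTE, no meta-heuristic tuning)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b)
                               for b, o in zip(base, other)))
    return synthetic


# Hypothetical 2-D minority class with four samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)
```

Every synthetic point lies on a line segment between two real minority samples, which is why the choice of `k` and of the oversampling amount strongly affects how far the minority region is extended.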

Keywords: Imbalanced dataset, meta-heuristic algorithm, SMOTE, big data

Procedia PDF Downloads 443
228 Thermal Performance of an Air Heating Storing System

Authors: Mohammed A. Elhaj, Jamal S. Yassin

Abstract:

Owing to the lack of synchronization between solar energy availability and heat demand in a specific application, an energy storage sub-system is necessary to maintain the continuity of the thermal process. The present work deals with an active solar heating storage system in which an air solar collector is connected to a storage unit, from which the energy is distributed to the heated space in a controlled manner. The solar collector is a box-type absorber in which air flows between a number of vanes attached between the collector absorber and the bottom plate. This design can improve efficiency by increasing the heat transfer area exposed to the flowing air, as well as through heat conduction along the metal vanes from the top absorbing surface. The storage unit is a packed bed through which air from the collector is circulated in order to add or remove energy during the charging and discharging processes, respectively. The major advantage of packed bed storage is its high degree of thermal stratification. The packed bed energy storage is solved numerically by dividing the bed into a number of equal segments and solving the energy equation for each segment as a function of its neighboring segments. The design and performance parameters studied in the developed simulation model include particle size, void fraction, etc. The final results showed that the collector efficiency fluctuated between 55% and 61% in the winter season (January) under the climatic conditions of Misurata, Libya. A maximum temperature of 52ºC is attained at the top of the bed, while the lowest is 25ºC, at the end of charging the bed with hot air. This distribution can satisfy the heating load of most houses in Libya.

Keywords: solar energy, thermal process, performance, collector, packed bed, numerical analysis, simulation

Procedia PDF Downloads 332
227 Blood Glucose Level Measurement from Breath Analysis

Authors: Tayyab Hassan, Talha Rehman, Qasim Abdul Aziz, Ahmad Salman

Abstract:

The constant monitoring of blood glucose level is necessary for maintaining health of patients and to alert medical specialists to take preemptive measures before the onset of any complication as a result of diabetes. The current clinical monitoring of blood glucose uses invasive methods repeatedly which are uncomfortable and may result in infections in diabetic patients. Several attempts have been made to develop non-invasive techniques for blood glucose measurement. In this regard, the existing methods are not reliable and are less accurate. Other approaches claiming high accuracy have not been tested on extended dataset, and thus, results are not statistically significant. It is a well-known fact that acetone concentration in breath has a direct relation with blood glucose level. In this paper, we have developed the first of its kind, reliable and high accuracy breath analyzer for non-invasive blood glucose measurement. The acetone concentration in breath was measured using MQ 138 sensor in the samples collected from local hospitals in Pakistan involving one hundred patients. The blood glucose levels of these patients are determined using conventional invasive clinical method. We propose a linear regression classifier that is trained to map breath acetone level to the collected blood glucose level achieving high accuracy.

Keywords: blood glucose level, breath acetone concentration, diabetes, linear regression

Procedia PDF Downloads 172
226 Ambisyllabic Conditioning in English: Evidence from the Accent of Nigerian Speakers of English

Authors: Nkereke Mfon Essien

Abstract:

In an ambisyllabic environment, one consonant sound simultaneously assumes both the coda and onset positions of a word due to its structural proclivity to affect two phonological processes or repair two ill-formed sequences in those syllable positions at the same time. This study sets out to examine the structural conditions that trigger this not-so-common phonological privilege for consonant sounds in the English language and Nigerian English, and whether such constraints could have any correspondence in the language studied. Data for the study were obtained from a native speaker of English, who was the control, and twenty (20) educated Nigerian speakers of English from the three ethnic/linguistic groups in Nigeria. Preliminary findings from the data show that ambisyllabicity in English is triggered mainly by stress, a condition which causes a consonant in a stressed syllable to become glottalised and simultaneously devoices the nearest voiced consonant in the next syllable. For example, the word coupler, /'kʌplɜr/, is realized as ['kʌˀpl̥ɜr]. In some Nigerian English, preliminary findings show that ambisyllabicity is triggered by a sequence of intervocalic short, high central vowels and a coda nasal. Since the short vowel may not occur in an open syllable, the nasal serves to close the impermissible open syllable. However, since the Nigerian English foot structure does not permit a CVC.V syllable, the same coda nasal simultaneously repairs the impermissible syllable foot to (CV.CV) by applying the Maximal Onset Principle. Since this is a preliminary investigation, a conclusion would not suffice yet.

Keywords: ambisyllabicity, nasal, coda, stress, phonological process, syllable, foot

Procedia PDF Downloads 23
225 MhAGCN: Multi-Head Attention Graph Convolutional Network for Web Services Classification

Authors: Bing Li, Zhi Li, Yilong Yang

Abstract:

Web service classification can improve the quality of service discovery and management in a service repository. It is widely used to locate the services that developers want. Although traditional classification methods based on supervised learning models can accomplish classification tasks, developers need to tag web services manually, and the quality of these tags may not be sufficient to build an accurate classifier for service classification. As the number of web services keeps doubling, manual tagging has become unrealistic. In recent years, the attention mechanism has made remarkable progress in the field of deep learning, and its huge potential has been fully demonstrated in various fields. This paper designs a multi-head attention graph convolutional network (MhAGCN) service classification method, which can assign different weights to neighborhood nodes without complicated matrix operations or reliance on understanding the entire graph structure. The framework combines the advantages of the attention mechanism and the graph convolutional neural network. It can classify web services through automatic feature extraction. Comprehensive experimental results on a real dataset not only show the superior performance of the proposed model over existing models but also demonstrate its potentially good interpretability for graph analysis.

Keywords: attention mechanism, graph convolutional network, interpretability, service classification, service discovery

Procedia PDF Downloads 137
224 Using Predictive Analytics to Identify First-Year Engineering Students at Risk of Failing

Authors: Beng Yew Low, Cher Liang Cha, Cheng Yong Teoh

Abstract:

Due to a lack of continual assessment or grade related data, identifying first-year engineering students in a polytechnic education at risk of failing is challenging. Our experience over the years tells us that there is no strong correlation between having good entry grades in Mathematics and the Sciences and excelling in hardcore engineering subjects. Hence, identifying students at risk of failure cannot be on the basis of entry grades in Mathematics and the Sciences alone. These factors compound the difficulty of early identification and intervention. This paper describes the development of a predictive analytics model in the early detection of students at risk of failing and evaluates its effectiveness. Data from continual assessments conducted in term one, supplemented by data of student psychological profiles such as interests and study habits, were used. Three classification techniques, namely Logistic Regression, K Nearest Neighbour, and Random Forest, were used in our predictive model. Based on our findings, Random Forest was determined to be the strongest predictor with an Area Under the Curve (AUC) value of 0.994. Correspondingly, the Accuracy, Precision, Recall, and F-Score were also highest among these three classifiers. Using this Random Forest Classification technique, students at risk of failure could be identified at the end of term one. They could then be assigned to a Learning Support Programme at the beginning of term two. This paper gathers the results of our findings. It also proposes further improvements that can be made to the model.
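Of the three classifiers named in this abstract, K Nearest Neighbour is the simplest to show in full. The sketch below is illustrative only: the two features (a term-one score and weekly study hours), the labels, and the value of `k` are hypothetical and are not taken from the paper, and features are left unscaled for brevity even though scaling usually matters for distance-based methods.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points
    (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical students: (term-one score, weekly study hours).
train_x = [(35, 2), (40, 3), (45, 1), (70, 8), (80, 10), (75, 6)]
train_y = ["at_risk", "at_risk", "at_risk", "pass", "pass", "pass"]

low = knn_predict(train_x, train_y, (42, 2))
high = knn_predict(train_x, train_y, (78, 9))
```

Here `low` is labeled `at_risk` and `high` is labeled `pass`, because in each case all three nearest neighbours belong to a single class.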

Keywords: continual assessment, predictive analytics, random forest, student psychological profile

Procedia PDF Downloads 136
223 Assessment of Genetic Diversity and Population Structure of Goldstripe Sardinella, Sardinella gibbosa in the Transboundary Area of Kenya and Tanzania Using mtDNA and msDNA Markers

Authors: Sammy Kibor, Filip Huyghe, Marc Kochzius, James Kairo

Abstract:

Goldstripe Sardinella, Sardinella gibbosa (Bleeker, 1849), is a commercially and ecologically important small pelagic fish common in the Western Indian Ocean region. The present study aimed to assess the genetic diversity and population structure of the species in the Kenya-Tanzania transboundary area using mtDNA and msDNA markers. A sequence of some 630 bp in the mitochondrial DNA (mtDNA) Cytochrome C Oxidase I (COI) gene and five polymorphic microsatellite DNA loci were analyzed. Fin clips of 309 individuals from eight locations within the transboundary area were collected between July and December 2018. The S. gibbosa individuals from the different locations were distinguishable from one another based on the mtDNA variation, as demonstrated with a neighbor-joining tree and minimum spanning network analysis. None of the 22 identified haplotypes was shared between Kenya and Tanzania. Gene diversity per locus was relatively high (0.271-0.751), and the highest Fis was 0.391. The structure analysis, discriminant analysis of principal components (DAPC), and the pair-wise values (FST = 0.136, P < 0.001) after Bonferroni correction using five microsatellite loci provided clear inference on genetic differentiation and thus evidence of population structure of S. gibbosa along the Kenya-Tanzania coast. This study shows a high level of genetic diversity and the presence of population structure (Φst = 0.078, P < 0.001), resulting in the existence of four populations and giving a clear indication of minimal gene flow among the populations. This information has application in the design of marine protected areas, an important tool for marine conservation.

Keywords: marine connectivity, microsatellites, population genetics, transboundary

Procedia PDF Downloads 124
222 Detecting Venomous Files in IDS Using an Approach Based on Data Mining Algorithm

Authors: Sukhleen Kaur

Abstract:

The Intrusion Detection System (IDS) has become an important component of security infrastructure and has received increasing attention in recent years. An IDS is an effective way to detect different kinds of attacks and malicious code in a network and helps to secure the network. Data mining techniques can be applied to IDS to analyze large amounts of data and give better results. Data mining can contribute to improving intrusion detection by adding a level of focus to anomaly detection. Studies so far have concentrated on finding attacks, whereas this paper detects malicious files. Some intruders do not attack directly; instead, they hide harmful code inside files, or corrupt those files, to attack the system. These files are detected according to defined parameters, which yield two lists of files: normal files and harmful files. Data mining is then performed. In this paper, a hybrid classifier combining the Naive Bayes and Ripper classification methods is used. The results show how a file uploaded to the database is tested against the parameters and characterised as either a normal or a harmful file, after which the mining is performed. Moreover, when a user tries to mine a harmful file, an exception is generated stating that mining cannot be performed on corrupted or harmful files.

Keywords: data mining, association, classification, clustering, decision tree, intrusion detection system, misuse detection, anomaly detection, naive Bayes, ripper

Procedia PDF Downloads 414
221 A Neural Network Classifier for Estimation of the Degree of Infestation by Late Blight on Tomato Leaves

Authors: Gizelle K. Vianna, Gabriel V. Cunha, Gustavo S. Oliveira

Abstract:

Foliage diseases in plants can cause a reduction in both the quality and quantity of agricultural production. Intelligent detection of plant diseases is an essential research topic, as it may help monitor large fields of crops by automatically detecting the symptoms of foliage diseases. This work investigates ways to recognize late blight disease from the analysis of digital images of tomato plants collected directly from the field. A pair of multilayer perceptron neural networks analyzes the digital images, using data from both the RGB and HSL color models, and classifies each image pixel. One neural network is responsible for identifying healthy regions of the tomato leaf, while the other identifies the injured regions. The outputs of both networks are combined to generate the final classification of each pixel in the image, and the pixel classes are used to repaint the original tomato images using a color representation that highlights the injuries on the plant. The new images have only green, red, or black pixels, depending on whether they come from healthy portions of the leaf, injured portions of the leaf, or the background of the image, respectively. The system achieved an accuracy of 97% in detecting and estimating the level of damage caused by late blight on the tomato leaves.

Keywords: artificial neural networks, digital image processing, pattern recognition, phytosanitary

Procedia PDF Downloads 330
220 Community Participation for Sustainable Development Tourism in Bang Noi Floating Market, Bangkonti District, Samutsongkhram Province

Authors: Bua Srikos, Phusit Phukamchanoad

Abstract:

The purpose is to study the model and characteristics of community participation suitable for the sustainable development of the floating market in Bang Noi Floating Market, Bangkonti District, Samutsongkhram Province. A total of 342 survey questionnaires were administered to potential respondents. The researchers interviewed the leader of the community. Appreciation Influence Control (AIC) was used in discussions with 20 villagers. The findings revealed that, overall, most people had a moderate level of participation in the sustainable development of Bang Noi Floating Market and in gaining benefits from developing it, with its atmosphere and beautiful views, for tourism. For example, the landscape is beautiful and has public utilities. Participation in preserving and developing Bang Noi Floating Market retains the former way of life. Basic personal factors such as age, level of education, career, and monthly income affect people's participation. Most participants are original residents who have houses and shops located in and around the market. These people share in the benefits, have the power to shape the floating market strategy, and play the major role in setting up the information database. It was also found that the leader and the villagers play an important role in setting up a physical database with five levels of information: position of the village, territory of the village, roads, rivers, and premises. Cultural information consists of two levels: points of interest and itineraries. The information arises from presentation and practice by the leader and villagers in the community. All phases are presented so that the leader and the villagers can listen to and examine the database together in the participation process.

Keywords: participation, community, sustainable development, encouragement, tourism

Procedia PDF Downloads 348
219 Fused Structure and Texture (FST) Features for Improved Pedestrian Detection

Authors: Hussin K. Ragb, Vijayan K. Asari

Abstract:

In this paper, we present a pedestrian detection descriptor called Fused Structure and Texture (FST) features, based on the combination of local phase information with texture features. Since the phase of a signal conveys more structural information than its magnitude, the phase congruency concept is used to capture the structural features. The Center-Symmetric Local Binary Pattern (CSLBP) approach, in turn, is used to capture the texture information of the image. The dimensionless nature of phase congruency and the robustness of the CSLBP operator on flat image regions, as well as under blur and illumination changes, make the proposed descriptor more robust and less sensitive to light variations. The proposed descriptor is formed by extracting the phase congruency and the CSLBP values of each pixel of the image with respect to its neighborhood. The histogram of the oriented phase and the histogram of the CSLBP values for the local regions in the image are computed and concatenated to construct the FST descriptor. Several experiments were conducted on the INRIA and the low-resolution DaimlerChrysler datasets to evaluate the detection performance of the pedestrian detection system based on the FST descriptor. A linear Support Vector Machine (SVM) is used to train the pedestrian classifier. These experiments showed that the proposed FST descriptor has better detection performance than a set of state-of-the-art feature extraction methodologies.
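The texture half of the descriptor can be illustrated with a minimal sketch of the Center-Symmetric LBP operator on a 3x3 neighborhood. This is not the authors' code: the neighbour ordering, the threshold value, and the function names are assumptions, and the phase congruency half of the descriptor is not shown.

```python
def cslbp_code(patch, threshold=0.0):
    """4-bit CS-LBP code of a 3x3 patch (list of three rows).
    Each bit compares one neighbour with the neighbour opposite to it
    across the center pixel, which itself is not used."""
    # Neighbours in circular order: E, NE, N, NW, W, SW, S, SE
    ring = [patch[1][2], patch[0][2], patch[0][1], patch[0][0],
            patch[1][0], patch[2][0], patch[2][1], patch[2][2]]
    code = 0
    for i in range(4):
        if ring[i] - ring[i + 4] > threshold:
            code |= 1 << i
    return code


def cslbp_histogram(image, threshold=0.0):
    """16-bin histogram of CS-LBP codes over the interior pixels of a
    grayscale image given as a list of equal-length rows."""
    hist = [0] * 16
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[cslbp_code(patch, threshold)] += 1
    return hist


flat = [[7] * 4 for _ in range(4)]
edge_patch = [[9, 9, 9],
              [1, 5, 9],
              [1, 1, 1]]
flat_hist = cslbp_histogram(flat)
edge_code = cslbp_code(edge_patch)
```

A flat region produces only code 0, whereas the diagonal edge in `edge_patch` sets all four bits. Comparing opposite pairs gives 16 possible codes instead of the 256 of standard LBP, which keeps the regional histograms short.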

Keywords: pedestrian detection, phase congruency, local phase, LBP features, CSLBP features, FST descriptor

Procedia PDF Downloads 490
218 A Decision Support System to Detect the Lumbar Disc Disease on the Basis of Clinical MRI

Authors: Yavuz Unal, Kemal Polat, H. Erdinc Kocer

Abstract:

In this study, a decision support system comprising three stages is proposed to detect disc abnormalities of the lumbar region. In the first stage, feature extraction, T2-weighted sagittal and axial Magnetic Resonance Images (MRI) were taken from 55 people, and 27 appearance and shape features were then acquired from both the sagittal and transverse images. In the second stage, feature weighting, the k-means clustering based feature weighting (KMCBFW) method proposed by Gunes et al. was applied. Finally, in the third stage, classification, classifier algorithms including the multi-layer perceptron (MLP neural network), support vector machine (SVM), Naïve Bayes, and decision tree were used to classify whether or not the subject has lumbar disc disease. To test the performance of the proposed method, the classification accuracy (%), sensitivity, specificity, precision, recall, f-measure, kappa value, and computation times were used. The best hybrid model for detecting lumbar disc disease based on both sagittal and axial MR images is the combination of k-means clustering based feature weighting and the decision tree.

Keywords: lumbar disc abnormality, lumbar MRI, lumbar spine, hybrid models, hybrid features, k-means clustering based feature weighting

Procedia PDF Downloads 521
217 The Impact of Coffee Consumption to Body Mass Index and Body Composition

Authors: A.L. Tamm, N. Šott, J. Jürimäe, E. Lätt, A. Orav, Ü. Parm

Abstract:

Coffee is one of the most frequently consumed beverages in the world, yet its effects on the human organism are not completely understood. Coffee has also been used as a method for weight loss, but its effectiveness has not been proven. There is also no consensus on whether body mass index (BMI) or fat percentage (fat%) should be used to classify overweight. The aim of the study was to determine associations between coffee consumption and body composition and, secondly, to determine which measure (BMI or fat%) more accurately describes overweight. Altogether, 103 persons were enrolled in the study and divided into three groups: coffee non-consumers (n=39); average coffee drinkers, who consumed 1 to 4 cups (1 cup = ca. 200 ml) of coffee per day (n=40); and excessive coffee consumers, who drank at least five cups of coffee per day (n=24). Body mass (medical electronic scale, A&D Instruments, Abingdon, UK) and height (Martin metal anthropometer, to the nearest 0.1 cm) were measured and BMI calculated (kg/m²). Participants' body composition was measured with dual energy X-ray absorptiometry (DXA, Hologic), and general data (including history of chronic diseases) and information about coffee consumption and physical activity level were collected with questionnaires. The results showed that excessive coffee consumption was associated with increased fat-free mass. This could be due primarily to a greater physical activity level in school time or to the greater (not significant) proportion of males in the excessive coffee consumers group. For estimating overweight, fat% is recommended over BMI, as it gives more accurate results when evaluating chronic disease risks. In conclusion, coffee consumption probably does not affect body composition, and for estimating body composition fat% seems to be more accurate than BMI.

Keywords: body composition, body fat percentage, body mass index, coffee consumption

Procedia PDF Downloads 420
216 Anomaly Detection in a Data Center with a Reconstruction Method Using a Multi-Autoencoders Model

Authors: Victor Breux, Jérôme Boutet, Alain Goret, Viviane Cattin

Abstract:

Early detection of anomalies in data centers is important to reduce downtimes and the costs of periodic maintenance. However, there is little research on this topic and even less on the fusion of sensor data for the detection of abnormal events. The goal of this paper is to propose a method for anomaly detection in data centers by combining sensor data (temperature, humidity, power) and deep learning models. The model described in the paper uses one autoencoder per sensor to reconstruct the inputs. The autoencoders contain Long Short-Term Memory (LSTM) layers and are trained using the normal samples of the relevant sensors selected by correlation analysis. The difference signal between the input and its reconstruction is then used to classify the samples using feature extraction and a random forest classifier. The data measured by the sensors of a data center between January 2019 and May 2020 are used to train the model, while the data between June 2020 and May 2021 are used to assess it. The performance of the model is assessed a posteriori through the F1-score by comparing detected anomalies with the data center’s history. The proposed model outperforms the state-of-the-art reconstruction method, which uses only one autoencoder taking multivariate sequences and detects an anomaly with a threshold on the reconstruction error, with an F1-score of 83.60% compared to 24.16%.
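The baseline this abstract compares against, a threshold on the reconstruction error scored with F1, can be sketched as follows. The autoencoder itself is not shown; the "reconstructions", the threshold, and the ground-truth flags below are made-up illustrative values, and mean squared error is an assumed choice of error measure.

```python
def reconstruction_errors(inputs, reconstructions):
    """Mean squared error between each input window and its reconstruction."""
    return [sum((a - b) ** 2 for a, b in zip(x, r)) / len(x)
            for x, r in zip(inputs, reconstructions)]


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the anomaly class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sensor windows and the output of some trained autoencoder.
inputs = [[20.0, 21.0], [20.5, 20.5], [30.0, 31.0], [21.0, 20.0], [28.0, 29.0]]
recons = [[20.1, 20.9], [20.4, 20.6], [22.0, 22.5], [20.8, 20.2], [23.0, 23.5]]
truth = [False, False, True, True, True]  # from the maintenance history

errors = reconstruction_errors(inputs, recons)
flags = [e > 1.0 for e in errors]  # threshold of 1.0 is an assumption
f1 = f1_score(truth, flags)
```

In this toy case the fourth window is a true anomaly that reconstructs well, so the threshold misses it and recall drops. Errors of that kind are what a classifier trained on features of the difference signal is meant to reduce.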

Keywords: anomaly detection, autoencoder, data centers, deep learning

Procedia PDF Downloads 194
215 Intelligent Transport System: Classification of Traffic Signs Using Deep Neural Networks in Real Time

Authors: Anukriti Kumar, Tanmay Singh, Dinesh Kumar Vishwakarma

Abstract:

Traffic control has been one of the most common and irritating problems since automobiles first hit the roads. Problems like traffic congestion have led to a significant time burden around the world, and one significant solution to these problems can be the proper implementation of the Intelligent Transport System (ITS). It involves the integration of various tools like smart sensors, artificial intelligence, positioning technologies, and mobile data services to manage traffic flow, reduce congestion, and enhance drivers' ability to avoid accidents during adverse weather. Road and traffic sign recognition is an emerging field of research in ITS. The traffic sign classification problem needs to be solved, as it is a major step towards building semi-autonomous and autonomous driving systems. This work implements an approach to traffic sign classification by developing a Convolutional Neural Network (CNN) classifier using the GTSRB (German Traffic Sign Recognition Benchmark) dataset. Rather than using hand-crafted features, our model addresses the concerns of an exploding number of parameters and of data augmentation methods. Our model achieved an accuracy of around 97.6%, which is comparable to various state-of-the-art architectures.

Keywords: multiclass classification, convolution neural network, OpenCV

Procedia PDF Downloads 177
214 Diagnosis and Analysis of Automated Liver and Tumor Segmentation on CT

Authors: R. R. Ramsheeja, R. Sreeraj

Abstract:

A wide range of medical imaging modalities is available nowadays for viewing the internal structures of the human body, such as the liver, brain, and kidneys. Computed tomography (CT) is one of the most significant of these modalities. This paper uses CT liver images to study automatic computer-aided techniques for calculating the volume of a liver tumor. A segmentation method is proposed for detecting the tumor in the CT scan. A Gaussian filter is used to denoise the liver image, and an adaptive thresholding algorithm is used for segmentation. A method based on multiple regions of interest (ROI) may help to characterize the different features, and it has a significant impact on classification performance. Because of the characteristics of liver tumor lesions, inherent difficulties arise in feature selection. To improve performance, a novel system is introduced that performs multiple-ROI-based feature selection and classification. Obtaining relevant features for the Support Vector Machine (SVM) classifier is important for better generalization performance. The proposed system improves classification performance because it uses a significantly reduced set of features. Diagnosing liver cancer from CT images is inherently difficult, and early detection of a liver tumor is very helpful in saving lives.
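
The abstract does not say which adaptive thresholding variant is used. As a minimal sketch under that caveat, the example below assumes the common local-mean form, in which a pixel is kept as foreground when it is brighter than the mean of its neighbourhood minus an offset. The function name, the `radius` and `offset` parameters and the toy image are hypothetical.

```python
def adaptive_threshold(img, radius=1, offset=0.0):
    """Binary mask: 1 where a pixel exceeds (local mean - offset), else 0.

    img is a 2D list of grey levels; the window is clipped at the image border.
    """
    h, w = len(img), len(img[0])
    mask = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            r0, r1 = max(0, r - radius), min(h, r + radius + 1)
            c0, c1 = max(0, c - radius), min(w, c + radius + 1)
            window = [img[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            local_mean = sum(window) / len(window)
            mask[r][c] = 1 if img[r][c] > local_mean - offset else 0
    return mask


# Hypothetical 3x3 patch with one bright pixel standing in for a lesion.
patch = [
    [10, 10, 10],
    [10, 50, 10],
    [10, 10, 10],
]
mask = adaptive_threshold(patch, radius=1, offset=0.0)
```

Because the threshold follows the local mean rather than one global value, only the pixel that stands out from its surroundings is retained.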

Keywords: computed tomography (CT), multiple region of interest(ROI), feature values, segmentation, SVM classification

Procedia PDF Downloads 509
213 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course

Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu

Abstract:

This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.
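
To make the per-grade ROC-AUC analysis concrete, here is a minimal sketch of a one-vs-rest AUC computed for each class and then averaged. It assumes an unweighted mean over classes, which the abstract does not specify, and it uses three grades instead of ten to stay short. The function names and the toy probabilities are hypothetical.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive sample outscores a negative one (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_ovr_auc(y_true, proba, classes):
    """One-vs-rest AUC per class plus their unweighted mean.

    proba[i][k] is the predicted probability that sample i belongs to classes[k].
    """
    aucs = {}
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[k] for row in proba]
        aucs[cls] = binary_auc(labels, scores)
    return aucs, sum(aucs.values()) / len(aucs)


# Hypothetical toy example with three grades.
classes = ["A", "B", "C"]
y_true = ["A", "B", "C", "A"]
proba = [
    [0.70, 0.20, 0.10],
    [0.30, 0.50, 0.20],
    [0.20, 0.30, 0.50],
    [0.25, 0.55, 0.20],
]
per_class, mean_auc = mean_ovr_auc(y_true, proba, classes)
```

A mean AUC near 0.5 corresponds to chance-level ranking, which is the scale on which the reported values of 0.4377 and 0.6149 should be read.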

Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, Gaussian copula CTGAN

Procedia PDF Downloads 44
212 A Kernel-Based Method for MicroRNA Precursor Identification

Authors: Bin Liu

Abstract:

MicroRNAs (miRNAs) are small non-coding RNA molecules that function in the transcriptional and post-transcriptional regulation of gene expression. Discriminating real pre-miRNAs from false ones (such as hairpin sequences with similar stem-loops) is necessary for understanding the role of miRNAs in the control of cell life and death. Because of both their small size and their sequence specificity, this discrimination cannot be based on sequence information alone; it requires structure information about the miRNA precursor to achieve satisfactory performance. Kmers are convenient and widely used features for modeling the properties of miRNAs and other biological sequences. However, Kmers suffer from an inherent limitation: if the parameter K is increased to incorporate long-range effects, certain Kmers appear rarely or not at all, so that most Kmers are absent and a few are present only once. Thus, statistical learning approaches that use Kmers as features become susceptible to noisy data once K becomes large. In this study, we propose a gapped k-mer approach to overcome the disadvantages of Kmers and apply it to miRNA prediction. Combined with the structure status composition, this yields a classifier called imiRNA-GSSC, which we compare with the original imiRNA-kmer and alternative approaches. Trained on human miRNA precursors, this predictor achieves an accuracy of 82.34% in predicting 4022 pre-miRNA precursors from eleven species.
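
The abstract does not define its gapped k-mer features precisely. The sketch below assumes one common formulation: every window of length `l` is reduced to `k` informative positions, and the remaining `l - k` positions are replaced by a gap symbol, so that sparse long patterns share counts. The function name, the parameters `l` and `k`, and the gap symbol `-` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l, k):
    """Count gapped k-mers: windows of length l with k kept positions and l - k gaps ('-')."""
    if not 0 < k <= l:
        raise ValueError("require 0 < k <= l")
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        # Enumerate every choice of k informative positions inside the window.
        for keep in combinations(range(l), k):
            keep_set = set(keep)
            pattern = "".join(ch if i in keep_set else "-" for i, ch in enumerate(window))
            counts[pattern] += 1
    return counts


# Hypothetical toy sequence.
features = gapped_kmer_counts("ACGU", l=3, k=2)
```

For the sequence `ACGU` with `l=3` and `k=2`, the two windows `ACG` and `CGU` each contribute three patterns, for example `A-G` and `C-U`. Two long sequences that differ at a single position still share most of their gapped patterns, which is what eases the sparsity problem described above.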

Keywords: gapped k-mer, imiRNA-GSSC, microRNA precursor, support vector machine

Procedia PDF Downloads 163
211 Enhancement Method of Network Traffic Anomaly Detection Model Based on Adversarial Training With Category Tags

Authors: Zhang Shuqi, Liu Dan

Abstract:

To address problems in intelligent network anomaly traffic detection models, such as low detection accuracy caused by a lack of training samples and poor detection of attacks with few samples, a classification model enhancement method is proposed: F-ACGAN (Flow Auxiliary Classifier Generative Adversarial Network), which introduces a generative adversarial network and adversarial training. Generating adversarial data with category labels can enhance the training effect and improve classification accuracy and model robustness. F-ACGAN consists of three steps. First, the features are preprocessed, which includes data type conversion, dimensionality reduction, normalization, and so on. Second, a generative adversarial network model with feature learning ability is designed, and its sample generation is improved through adversarial iterations between the generator and the discriminator. Third, an adversarial disturbance factor along the gradient direction of the classification model is added to improve the diversity and antagonism of the generated data and to encourage the model to learn from adversarial classification features. Experiments in which a classification model is built on the UNSW-NB15 dataset show that enhancing the basic model with F-ACGAN improves classification accuracy by 8.09% and the F1 score by 6.94%.

Keywords: data imbalance, GAN, ACGAN, anomaly detection, adversarial training, data augmentation

Procedia PDF Downloads 106
210 Contribution to the Study of Reproduction of Water Birds (Case of Marsh Boussedra, North East Algeria)

Authors: Wahiba Boudraa, Khalil Draidi, Badis Bakhouch, Farah Chettibi, Meriem Aberkane, Zihad Bouslama, Moussa Houhamdi

Abstract:

The Gulf of Annaba is located in the extreme north-east of Algeria. Our study site is a marsh that is administratively part of the municipality of El-Bouni in the wilaya of Annaba. It extends over a surface of 55 hectares, with a maximum depth of less than 2 m. A work plan was adopted to evaluate and characterize the reproduction of nesting water birds in the marsh of Boussedra. Following standardized methods, several important parameters described in the scientific literature were monitored regularly during the breeding period. The following parameters were taken into account: the installation date of the nests; the supporting vegetation; the hatching of eggs; the causes of hatching failure (predation or abandonment); the characteristics of the nests (composition, internal diameter, external diameter, depth, and height); the distance to the nearest neighboring nest; the depth of the water; egg measurements; and clutch size. The monitoring in the marsh was carried out from March 2013 to July 2014 at a rate of two field visits per week, during which the nests were located, recorded, and checked each week. The study of the reproduction of water birds shows that this site plays a very important part in the wintering and reproduction of certain important species. This study opens broad prospects for studying several phenomena related to the ecology of water birds and the conservation of wetlands.

Keywords: Algeria, Boussedra, nests, reproduction, water birds

Procedia PDF Downloads 257
209 Tensor Deep Stacking Neural Networks and Bilinear Mapping Based Speech Emotion Classification Using Facial Electromyography

Authors: P. S. Jagadeesh Kumar, Yang Yung, Wenli Hu

Abstract:

Speech emotion classification is a dominant research field concerned with finding a robust and efficient classifier suitable for different real-life applications. This work focuses on classifying different emotions from the speech signal, using features related to pitch, formants, energy contours, jitter, and shimmer, together with spectral, perceptual, and temporal features. Tensor deep stacking neural networks were used to examine the factors that influence the classification success rate. Facial electromyography signals were collected from several subjects in a controlled environment by means of audio-visual stimuli. The acquired facial electromyography signals were pre-processed using a moving average filter, and a set of arithmetical features was extracted. The extracted features were mapped to the corresponding emotions using bilinear mapping. With facial electromyography signals, a database comprising diverse emotions will be presented with suitable fine-tuning of features and training data. A success rate of 92% can be attained without increasing the system complexity or the computation time for classifying diverse emotional states.

Keywords: speech emotion classification, tensor deep stacking neural networks, facial electromyography, bilinear mapping, audio-visual stimuli

Procedia PDF Downloads 256
208 Diversity Indices as a Tool for Evaluating Quality of Water Ways

Authors: Khadra Ahmed, Khaled Kheireldin

Abstract:

In this paper, we present a pedestrian detection descriptor called Fused Structure and Texture (FST) features, based on the combination of local phase information with texture features. Since the phase of a signal conveys more structural information than its magnitude, the phase congruency concept is used to capture the structural features. The Center-Symmetric Local Binary Pattern (CSLBP) approach, in turn, is used to capture the texture information of the image. The dimensionless quantity of the phase congruency and the robustness of the CSLBP operator on flat images, as well as under blur and illumination changes, make the proposed descriptor more robust and less sensitive to light variations. The proposed descriptor is formed by extracting the phase congruency and the CSLBP values of each pixel of the image with respect to its neighborhood. The histogram of the oriented phase and the histogram of the CSLBP values for the local regions in the image are computed and concatenated to construct the FST descriptor. Several experiments were conducted on the INRIA and the low-resolution DaimlerChrysler datasets to evaluate the detection performance of the pedestrian detection system based on the FST descriptor. A linear Support Vector Machine (SVM) is used to train the pedestrian classifier. These experiments showed that the proposed FST descriptor has better detection performance than a set of state-of-the-art feature extraction methodologies.
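
The texture half of the descriptor can be sketched as follows. This is a minimal illustration of a center-symmetric LBP code on an 8-neighbourhood, where each of the four opposite neighbour pairs contributes one bit, followed by a 16-bin histogram over a region. The neighbour ordering, the default threshold of 0 and the function names are assumptions, and the phase congruency half of FST is not covered.

```python
# Offsets (row, col) of the 8 neighbours, ordered so that entry i and entry i + 4 are opposite.
NEIGHBOURS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def cslbp_code(img, r, c, threshold=0.0):
    """4-bit CSLBP code of pixel (r, c): bit i is set when neighbour i exceeds its opposite by more than threshold."""
    code = 0
    for i in range(4):
        dr1, dc1 = NEIGHBOURS[i]
        dr2, dc2 = NEIGHBOURS[i + 4]
        diff = img[r + dr1][c + dc1] - img[r + dr2][c + dc2]
        if diff > threshold:
            code |= 1 << i
    return code


def cslbp_histogram(img, threshold=0.0):
    """16-bin histogram of CSLBP codes over the interior pixels of a 2D grey-level region."""
    hist = [0] * 16
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[cslbp_code(img, r, c, threshold)] += 1
    return hist


# Hypothetical 3x3 region with a single interior pixel.
region = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
hist = cslbp_histogram(region)
```

On a perfectly flat region every difference is zero, so all pixels fall into bin 0. Raising the threshold slightly gives the same behaviour on nearly flat regions, which is one way to read the robustness on flat images mentioned in the abstract.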

Keywords: planktons, diversity indices, water quality index, water ways

Procedia PDF Downloads 519
207 Performance Comparison of Situation-Aware Models for Activating Robot Vacuum Cleaner in a Smart Home

Authors: Seongcheol Kwon, Jeongmin Kim, Kwang Ryel Ryu

Abstract:

We assume an IoT-based smart-home environment in which the on-off status of each electrical appliance, including the room lights, can be recognized in real time by monitoring and analyzing the smart meter data. At any moment in such an environment, we can recognize what the household or the user is doing by referring to the status data of the appliances. In this paper, we focus on a smart-home service that activates a robot vacuum cleaner at the right time by recognizing the user situation. This requires a situation-aware model that can distinguish the situations that allow vacuum cleaning (Yes) from those that do not (No). As candidate models, we learn a few classifiers, such as naïve Bayes, decision tree, and logistic regression, that can map the appliance-status data to Yes and No situations. Our training and test data are obtained from simulations of user behaviors, in which a sequence of user situations such as cooking, eating, dish washing, and so on is generated, with the status of the relevant appliances changing in accordance with the situation changes. During the simulation, both the situation transition and the resulting appliance status are determined stochastically. To compare the performance of these classifiers, we obtain their learning curves for different types of users through simulations. The result of our empirical study reveals that naïve Bayes achieves slightly better classification accuracy than the other classifiers compared.
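
As a minimal sketch of how on-off appliance status could be mapped to Yes and No situations, the example below implements a Bernoulli naive Bayes classifier with Laplace smoothing. The three appliance features, the toy training rows and the function names are hypothetical and are not taken from the study's simulator.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and per-feature P(on | class) with Laplace smoothing alpha."""
    model = {}
    n_features = len(X[0])
    for cls in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == cls]
        prior = len(rows) / len(X)
        p_on = [
            (sum(row[j] for row in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (prior, p_on)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for a binary status vector x."""
    best_cls, best_lp = None, -math.inf
    for cls, (prior, p_on) in model.items():
        lp = math.log(prior)
        for xj, p in zip(x, p_on):
            lp += math.log(p if xj == 1 else 1.0 - p)
        if lp > best_lp:
            best_cls, best_lp = cls, lp
    return best_cls


# Hypothetical features: [stove_on, tv_on, living_room_light_on].
X = [[1, 0, 1], [1, 1, 1], [0, 1, 1], [0, 0, 0], [0, 0, 0], [0, 0, 1]]
y = ["No", "No", "No", "Yes", "Yes", "Yes"]
model = train_bernoulli_nb(X, y)
```

With these toy rows, an all-off status vector is classified as Yes (cleaning allowed) and an all-on vector as No.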

Keywords: situation-awareness, smart home, IoT, machine learning, classifier

Procedia PDF Downloads 422
206 A Machine Learning Model for Predicting Students’ Academic Performance in Higher Institutions

Authors: Emmanuel Osaze Oshoiribhor, Adetokunbo MacGregor John-Otumu

Abstract:

In recent years, there has been a need to predict student academic achievement prior to graduation, in order to help students improve their grades, especially those who have struggled in the past. The purpose of this research is to use supervised learning techniques to create a model that predicts student academic progress. Many scholars have developed models that predict student academic achievement based on characteristics including smoking, demography, culture, social media, parent educational background, parent finances, and family background, to mention a few. These features, as well as the models used, could have led to students being misclassified in terms of their academic achievement. As a prerequisite to predicting whether a student will perform well in the future on related courses, this model is built using a logistic regression classifier with basic features such as the previous semester's course score, class attendance, class participation, and the total number of course materials or resources the student is able to cover per semester. With an accuracy of 96.7 percent, the model outperformed other classifiers such as Naive Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, and AdaBoost. The model is offered as a desktop application with user-friendly interfaces for forecasting student academic progress, for both teachers and students. Both students and professors are therefore encouraged to use this technique to predict outcomes better.

Keywords: artificial intelligence, ML, logistic regression, performance, prediction

Procedia PDF Downloads 110
205 Major Depressive Disorder: Diagnosis based on Electroencephalogram Analysis

Authors: Wajid Mumtaz, Aamir Saeed Malik, Syed Saad Azhar Ali, Mohd Azhar Mohd Yasin

Abstract:

In this paper, a technique based on electroencephalogram (EEG) analysis is presented, aimed at diagnosing major depressive disorder (MDD) among a potential population of MDD patients and healthy controls. EEG is recognized as a clinical modality in applications such as seizure diagnosis, anesthesia indexing, and detection of brain death or stroke. However, its usability for psychiatric illnesses such as MDD is less studied. Therefore, in this study, two groups of participants were recruited for the sake of diagnosis: 1) MDD patients and 2) healthy people as controls. EEG data acquired from both groups were analyzed using inter-hemispheric asymmetry and the composite permutation entropy index (CPEI). To automate the process, quantities derived from the EEG were used as inputs to classifiers such as logistic regression (LR) and support vector machine (SVM). The learning of these classification models was tested with a test dataset. Their learning efficiency is reported as the accuracy of classifying MDD patients from controls, and their sensitivities and specificities are reported accordingly (LR = 81.7% and SVM = 81.5%). Based on the results, it is concluded that the derived measures are indicators for diagnosing MDD from a potential population of normal controls. In addition, the results motivate further exploration of other measures for the same purpose.
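
The three figures of merit named in the abstract follow directly from the confusion matrix. A minimal sketch, with the MDD class coded as 1 and controls as 0; the function name and the toy labels are hypothetical and do not reproduce the reported results.

```python
def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient and one control")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of patients correctly identified
        "specificity": tn / (tn + fp),  # share of controls correctly identified
    }


# Hypothetical test set: 4 patients (1) and 6 controls (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = diagnostic_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters when the two groups are of unequal size, because accuracy alone can hide a classifier that favors the larger group.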

Keywords: major depressive disorder, diagnosis based on EEG, EEG derived features, CPEI, inter-hemispheric asymmetry

Procedia PDF Downloads 546
204 Iris Feature Extraction and Recognition Based on Two-Dimensional Gabor Wavelength Transform

Authors: Bamidele Samson Alobalorun, Ifedotun Roseline Idowu

Abstract:

Biometric technologies use parts of the human body for unique and reliable identification based on physiological traits. The iris recognition system is a biometric-based method for identification. The human iris has discriminating characteristics that make the method efficient. To achieve this efficiency, the distinct features of the human iris must be extracted in order to authenticate persons accurately. In this study, an approach to iris recognition that uses a 2D Gabor filter for feature extraction is applied to iris templates. The 2D Gabor filter formulated the patterns that were used for training and were also sent to the Hamming distance matching technique for recognition. A comparison of results is presented using two iris image subjects with different matching indices of 1, 2, 3, 4, 5 filter, based on the CASIA iris image database. Comparing the results for the two subjects, the actual computational time of the developed models was measured in terms of the training time and the average testing time of the Hamming distance classifier, and the best recognition accuracy was found to be 96.11%. Iris localization (segmentation) was performed using Daugman’s integro-differential operator, and normalization was confined to Daugman’s rubber sheet model.
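
The matching stage can be illustrated with a minimal sketch of normalized Hamming distance between binary iris codes, followed by a nearest-template decision. Occlusion masks and rotation shifts, which practical systems usually add, are left out. The function names, the 8-bit toy codes and the acceptance threshold of 0.35 are hypothetical.

```python
def hamming_distance(code_a, code_b):
    """Fraction of bit positions at which two equal-length binary codes differ."""
    if len(code_a) != len(code_b) or not code_a:
        raise ValueError("codes must be non-empty and of equal length")
    return sum(a != b for a, b in zip(code_a, code_b)) / len(code_a)


def best_match(probe, gallery, threshold=0.35):
    """Return (subject_id, distance) of the closest enrolled template, or (None, distance) if it is too far."""
    best_id, best_hd = None, float("inf")
    for subject_id, template in gallery.items():
        hd = hamming_distance(probe, template)
        if hd < best_hd:
            best_id, best_hd = subject_id, hd
    if best_hd > threshold:
        return None, best_hd  # closest template is still too different: reject
    return best_id, best_hd


# Hypothetical enrolled templates and a probe code.
gallery = {
    "s1": [0, 1, 1, 0, 1, 0, 0, 1],
    "s2": [1, 1, 0, 0, 0, 1, 1, 0],
}
probe = [0, 1, 1, 0, 1, 0, 1, 1]
match = best_match(probe, gallery)
```

Here the probe differs from `s1` in one bit out of eight and from `s2` in five, so `s1` is accepted at a distance of 0.125.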

Keywords: Daugman rubber sheet, feature extraction, Hamming distance, iris recognition system, 2D Gabor wavelet transform

Procedia PDF Downloads 66
203 A Comparative Legal Enquiry on the Concept of Invention

Authors: Giovanna Carugno

Abstract:

The concept of invention is rarely scrutinized by legal scholars since it is a slippery one, full of nuances and difficult to define. When does an idea become relevant for patent law? When is it possible to say what an invention is? This is the first question to be answered in order to obtain a patent, but it is sometimes neglected by treaties or reduced to very simple and automatically recited definitions, perhaps because it is more a transnational and cultural concept than a mere institution of law. Tautology is used to avoid the challenge (in United States patent regulation, the inventor is the one who contributed to a patentable invention); in other cases, a clear definition is, surprisingly, not even provided (see, e.g., the European Patent Convention). In Europe, the issue is even more complicated because several different solutions have been elaborated inorganically by national court systems, varying from one to another, solely with the aim of solving different IP cases. A neighboring domain such as copyright law does not assist us in the research either, since in that field an author is entitled to be called the 'inventor' or the 'author' and to protection insofar as he produces something new. Novelty is not enough in patent law. A simple distinction between a mere improvement that can be achieved by a person skilled in the art (a sort of reasonable man, in other sectors) and a non-obvious change that rises to the dignity of protection does not seem to go very far. It still does not define the concept; it is rigid and not fruitful. So, setting aside for the moment the issue of defining the invention/inventor, our proposal is to scrutinize the possible self-sufficiency of a system in which the inventor or the improver would be awarded royalties or similar compensation according to the economic improvement he was able to bring.
The law, in this case, lies in the penumbra of misleading concepts, divided between facts that are obscure and technical and that do not necessarily involve legal issues. The aim of this paper is to find a single definition (or, at least, the minimum elements common to the different legal systems) of what (legally) constitutes an invention and of the hints that can help to identify an authentic invention in practice. In conclusion, it proposes an alternative system in which the invention is no longer considered and the only thing that matters is the revenue generated by technological improvement brought about by the worker's activity.

Keywords: comparative law, intellectual property, invention, patents

Procedia PDF Downloads 184