Search results for: wolof word classification
2603 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine
Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li
Abstract:
Machine learning and data mining are two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a widely used technique for predicting qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries, especially China, air quality classification has become one of the most important topics in air quality research and modelling. This study introduces a hybrid classification model based on information theory and Support Vector Machine (SVM), using air quality data from four cities in China, namely Beijing, Guangzhou, Shanghai and Tianjin, from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection classifies daily air quality into 6 levels, namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent, based on their respective Air Quality Index (AQI) values. Using information theory, information gain (IG) is calculated and feature selection is performed for both categorical features and continuous numeric features. The SVM algorithm is then applied to the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.
Keywords: machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation
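The information gain step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a categorical feature and uses hypothetical toy data (the `wind`, `season` and `level` lists are invented for illustration and are not taken from the paper); the function names are likewise assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    the samples by the value of one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: two candidate features and an air quality level.
wind = ["low", "low", "high", "high"]
season = ["winter", "summer", "winter", "summer"]
level = ["Severe", "Severe", "Good", "Good"]

ig_wind = information_gain(wind, level)      # feature separates the classes
ig_season = information_gain(season, level)  # feature carries no class information
print(ig_wind, ig_season)
```

In a feature selection setting, features would be ranked by this score and only the highest-scoring ones passed on to the classifier; continuous features would first need to be discretised, which this sketch does not cover.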
Procedia PDF Downloads 234
2602 Auto Classification of Multiple ECG Arrhythmic Detection via Machine Learning Techniques: A Review
Authors: Ng Liang Shen, Hau Yuan Wen
Abstract:
Arrhythmia analysis of the ECG signal plays a major role in diagnosing most cardiac diseases. Arrhythmia detection on a single electrocardiographic (ECG) record can involve multiple patterns and various algorithms, with each ECG beat matched accordingly using supervised machine learning. Researchers have used different features and classification methods to classify different arrhythmia types. A major problem in these studies is that the symptoms of the disease do not show all the time in the ECG record. Hence, a successful diagnosis might require the manual investigation of several hours of ECG records. This paper reviews investigations of cardiac arrhythmia in electrocardiogram (ECG) signals that examine irregular ECG waveforms, beat by beat, and match them to the corresponding arrhythmia using machine learning pattern recognition.
Keywords: electrocardiogram, ECG, classification, machine learning, pattern recognition, detection, QRS
Procedia PDF Downloads 374
2601 Land Use/Land Cover Mapping Using Landsat 8 and Sentinel-2 in a Mediterranean Landscape
Authors: Moschos Vogiatzis, K. Perakis
Abstract:
Spatially explicit and up-to-date land use/land cover information is fundamental for spatial planning, land management, sustainable development, and sound decision-making. In the last decade, many satellite-derived land cover products at different spatial, spectral, and temporal resolutions have been developed, such as the European Copernicus Land Cover product. However, more efficient and detailed information for land use/land cover is required at the regional or local scale. A typical Mediterranean basin with a complex landscape comprising various forest types, crops, artificial surfaces, and wetlands was selected to test and develop our approach. In this study, we investigate the improvement of the Copernicus Land Cover product (CLC2018) using Landsat 8 and Sentinel-2 pixel-based classification based on all available existing geospatial data (Forest Maps, LPIS, Natura2000 habitats, cadastral parcels, etc.). We examined and compared the performance of the Random Forest classifier for land use/land cover mapping. In total, 10 land use/land cover categories were recognized in Landsat 8 and 11 in Sentinel-2A. A comparison of the overall classification accuracies for 2018 shows that Landsat 8 classification accuracy was slightly higher than that of Sentinel-2A (82.99% vs. 80.30%). We concluded that the main land use/land cover types of CLC2018, even within a heterogeneous area, can be successfully mapped and updated according to CLC nomenclature. Future research should be oriented toward integrating spatiotemporal information from seasonal bands and spectral indexes in the classification process.
Keywords: classification, land use/land cover, mapping, random forest
Procedia PDF Downloads 123
2600 Bilingual Gaming Kit to Teach English Language through Collaborative Learning
Authors: Sarayu Agarwal
Abstract:
This paper aims to teach English (secondary language) by bridging the understanding between the regional language (primary language) and the English language (secondary language). Here, the primary language is the one a person has learned from birth or within the critical period, while a secondary language is any other language one learns or speaks. The paper also focuses on evolving old teaching methods into a contemporary participatory model of learning and teaching. Pilot studies were conducted to gauge students’ knowledge of the English language. Teachers and students were interviewed and their academic curriculum was assessed as part of the initial study. An extensive literature study and design thinking principles were used to devise a solution to the problem. The objective is met using a holistic learning kit/card game to teach children word recognition, word pronunciation, word spelling and writing words. The outcome reported in the paper is a noticeable improvement in the understanding and grasping of the English language. With the increasing usage and applicability of English as a second language (ESL) the world over, the paper is relevant due to its easy replicability to any other primary or secondary language. The future scope of this paper would be transforming the idea of participatory learning into self-regulated learning methods. With the upcoming government learning centres in rural areas and the provision of smart devices such as tablets, the development of the card games into digital applications seems very feasible.
Keywords: English as a second language, vocabulary-building card games, learning through gamification, rural education
Procedia PDF Downloads 244
2599 Terrain Classification for Ground Robots Based on Acoustic Features
Authors: Bernd Kiefer, Abraham Gebru Tesfay, Dietrich Klakow
Abstract:
The motivation of our work is to detect different terrain types traversed by a robot based on acoustic data from the robot-terrain interaction. Different acoustic features and classifiers were investigated, such as Mel-frequency cepstral coefficients and Gammatone frequency cepstral coefficients for feature extraction, and a Gaussian mixture model and a feed-forward neural network for classification. We analyze the system’s performance by comparing our proposed techniques with some other features surveyed from distinct related works. We achieve precision and recall values between 87% and 100% per class, and an average accuracy of 95.2%. We also study the effect of varying the audio chunk size in the application phase of the models and find only a mild impact on performance.
Keywords: acoustic features, autonomous robots, feature extraction, terrain classification
Procedia PDF Downloads 366
2598 Comparing the Contribution of General Vocabulary Knowledge and Academic Vocabulary Knowledge to Learners' Academic Achievement
Authors: Reem Alsager, James Milton
Abstract:
Coxhead’s (2000) Academic Word List (AWL) is believed to be essential for students pursuing higher education and helps differentiate English for Academic Purposes (EAP) from General English as a course of study, and it is thought to be important for comprehending English academic texts. The AWL has been described as an infrequent, discrete set of vocabulary items unreachable from general language. On the other hand, it has been known for some time that general vocabulary knowledge is a good predictor of academic achievement. This study is an attempt to measure and compare the contribution of academic knowledge and general vocabulary knowledge to learners’ GPA, to examine which kind of knowledge is a better predictor of academic achievement, and to investigate whether the AWL, as a specialised list of infrequent words, relates to the frequency effect. The participants were 44 international postgraduate students at Swansea University, all from the School of Management, following the taught MSc (Master of Science). The study employed the Academic Vocabulary Size Test (AVST) and the XK_Lex vocabulary size test. The findings indicate that the AWL is a list based on word frequency rather than a discrete and unique word list and that the AWL performs the same function as general vocabulary, with tests of each found to measure largely the same quality of knowledge. The findings also suggest that the contribution that AWL knowledge provides for academic success is not sufficient and that general vocabulary knowledge is better in predicting academic achievement. Furthermore, the contribution that academic knowledge added above the contribution of general vocabulary knowledge when combined is really small and noteworthy. This study’s results are in line with the argument and suggest that the development of general vocabulary size is an essential quality for academic success and that acquiring the words of the AWL will form part of this process.
The AWL by itself does not provide sufficient coverage, and is probably not specialised enough, for knowledge of this list to influence this general process. It can be concluded that the AWL as an academic word list epitomizes only a fraction of the words that are actually needed for academic success in English and that knowledge of academic vocabulary combined with general vocabulary knowledge above the most frequent 3000 words is what matters most to ultimate academic success.
Keywords: academic achievement, academic vocabulary, general vocabulary, vocabulary size
Procedia PDF Downloads 218
2597 The Implementation of the Multi-Agent Classification System (MACS) in Compliance with FIPA Specifications
Authors: Mohamed R. Mhereeg
Abstract:
The paper discusses the implementation of the Multi-Agent Classification System (MACS) and its use in providing an automated and accurate classification of end users developing applications in the spreadsheet domain. Different technologies have been brought together to build MACS. The strength of the system is the integration of agent technology with the FIPA specifications together with other technologies, namely .NET Windows service based agents, Windows Communication Foundation (WCF) services, the Service Oriented Architecture (SOA), and Oracle Data Mining (ODM). Microsoft's .NET Windows service based agents were used to develop the monitoring agents of MACS, while the .NET WCF services together with the SOA approach allowed distribution and communication between agents over the WWW. The Monitoring Agents (MAs) were configured to execute automatically to monitor Excel spreadsheet development activities by content. Data gathered by the Monitoring Agents from various resources over a period of time was collected and filtered by a Database Updater Agent (DUA) residing in the .NET client application of the system. This agent then transfers and stores the data in an Oracle server database via Oracle stored procedures for further processing that leads to the classification of the end user developers.
Keywords: MACS, implementation, multi-agent, SOA, autonomous, WCF
Procedia PDF Downloads 272
2596 Method To Create Signed Word - Application In Teaching And Learning Vietnamese Sign Language
Authors: Nguyen Thi Kim Thoa
Abstract:
Vietnam currently has about two million five hundred deaf/hard of hearing people. Although the issue of Vietnamese Sign Language (VSL) education has received attention from the State, there are still many issues that need to be resolved, such as policies, teacher training in both knowledge and teaching methods, education programs, and textbook compilation. Furthermore, research on VSL has not yet attracted the attention of linguists. Using the quantitative description method, the article analyzes, synthesizes, and compares in order to identify the methods used to create signed words in VSL, such as those based on external shape characteristics, operational characteristics, operating methods, and basic meanings. From these we can see the special nature of signed words, the division of word types and the morphological meaning of creating new words through sign methods. From the results of this research, the aspect of ‘visual culture’ in Vietnamese Deaf Culture will be clarified. On that basis, we also develop a number of vocabulary teaching methods (such as teaching vocabulary through groups of methods of forming signed words, teaching vocabulary using mind maps, and teaching vocabulary through culture), with the aim of further improving the effectiveness of teaching and learning VSL in Vietnam. The research results also provide deaf people in Vietnam with a scientific and effective method of learning vocabulary, helping them quickly integrate into the community. The article will be a useful reference for linguists who want to research VSL.
Keywords: Vietnamese sign language (VSL), signed word, teaching, method
Procedia PDF Downloads 34
2595 A Text Classification Approach Based on Natural Language Processing and Machine Learning Techniques
Authors: Rim Messaoudi, Nogaye-Gueye Gning, François Azelart
Abstract:
Automatic text classification mostly applies natural language processing (NLP) and other AI-guided techniques to classify text automatically in a faster and more accurate manner. This paper discusses the use of predictive maintenance to manage incident tickets within a company. It focuses on proposing a tool that processes and analyses comments and notes written by administrators after resolving an incident ticket. The goal here is to increase the quality of these comments. The tool is based on NLP and machine learning techniques that perform the textual analytics of the extracted data. This approach was tested using real data taken from the French National Railways (SNCF) company and gave high-quality results.
Keywords: machine learning, text classification, NLP techniques, semantic representation
Procedia PDF Downloads 98
2594 Dynamics of Hybrid Language in Urban and Rural Uttar Pradesh India
Authors: Divya Pande
Abstract:
The dynamics of culture expresses itself in language. Even after India gained independence in 1947, English subtly crept into the language of the masses with a silent and powerful flow towards the vernacular. The culture contact resulted in the learning and emergence of a new language across the Hindi-speaking belt of Northern and Central India. The hybrid words thus formed displaced the original words and became contextualized and absorbed in the language of the common masses. The research paper explores the interesting new vocabulary used extensively in the urban and rural districts of the state of Uttar Pradesh, which is the most populous state of India. The paper adopts a two-way classification, formal and contextual, for the analysis of the hybrid vocabulary, that is, linguistic items where one element is necessarily from the English language and the other from Hindi. The new vocabulary represents languages of the wider world, cutting across geographical and cultural barriers. The paper also broadly points to the Hinglish commonly used in the state.
Keywords: assimilation, culture contact, Hinglish, hybrid words
Procedia PDF Downloads 399
2593 A Deep Learning Approach to Subsection Identification in Electronic Health Records
Authors: Nitin Shravan, Sudarsun Santhiappan, B. Sivaselvan
Abstract:
Subsection identification, in the context of Electronic Health Records (EHRs), means identifying the important sections for downstream tasks like auto-coding. In this work, we classify the text present in EHRs according to its information, using machine learning and deep learning techniques. We first briefly describe the problem and formulate it as a text classification problem. Then, we discuss methods from the literature. We try two approaches: traditional feature extraction based machine learning methods and deep learning methods. Through experiments on a private dataset, we establish that the deep learning methods perform better than the feature extraction based machine learning models.
Keywords: deep learning, machine learning, semantic clinical classification, subsection identification, text classification
Procedia PDF Downloads 215
2592 Comparative Analysis of Spectral Estimation Methods for Brain-Computer Interfaces
Authors: Rafik Djemili, Hocine Bourouba, M. C. Amara Korba
Abstract:
In this paper, we present a method to classify EEG signals for Brain-Computer Interfaces (BCI). EEG signals are first processed by means of spectral estimation methods to derive reliable features before the classification step. The spectral estimation methods used are the standard periodogram and the periodogram calculated by the Welch method; both methods are compared with Logarithm of Band Power (logBP) features. In the proposed method, we apply Linear Discriminant Analysis (LDA) followed by Support Vector Machine (SVM). Classification accuracy reached as high as 85%, which demonstrates the effectiveness of spectral methods for classifying EEG signals in BCI.
Keywords: brain-computer interface, motor imagery, electroencephalogram, linear discriminant analysis, support vector machine
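To make the spectral feature step concrete, the following is a minimal sketch of a standard periodogram and a log band power feature derived from it. It is an illustration under stated assumptions, not the authors' pipeline: the sampling rate, band edges and synthetic test signal are hypothetical, the periodogram is computed with a direct DFT over non-negative frequencies (without one-sided doubling), and the Welch averaging, LDA and SVM stages are not shown.

```python
import cmath
import math


def periodogram(signal, fs):
    """Return (freqs, power) for the non-negative frequency bins,
    using a direct DFT: power[k] = |X[k]|^2 / (fs * N)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


def log_band_power(signal, fs, low, high):
    """Natural log of the summed periodogram power in the band [low, high] Hz."""
    freqs, power = periodogram(signal, fs)
    band = [p for f, p in zip(freqs, power) if low <= f <= high]
    return math.log(sum(band))


# Hypothetical one-second test signal: strong 10 Hz component, weak 20 Hz component.
fs = 128
signal = [
    math.sin(2 * math.pi * 10 * t / fs) + 0.1 * math.sin(2 * math.pi * 20 * t / fs)
    for t in range(fs)
]

mu = log_band_power(signal, fs, 8, 12)     # band containing the 10 Hz component
beta = log_band_power(signal, fs, 18, 26)  # band containing the 20 Hz component
print(mu, beta)
```

One feature per band and channel, computed this way, would form the vector handed to a classifier; the Welch method would replace the single periodogram with an average over overlapping windowed segments.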
Procedia PDF Downloads 498
2591 The Image of Cultural Tourism in the Tourists’ Point of View
Authors: Wanida Suwunniponth
Abstract:
The purposes of this research were to investigate tourists’ perceived cultural image of, and loyalty toward, the attractions of the Banglumphu neighborhood in Bangkok, and to study the relationship between the cultural image of the Banglumphu community and tourists’ loyalty to visiting this area. This study employed both quantitative and qualitative approaches. In the quantitative research, a questionnaire was used to collect data from 300 systematically sampled tourists who visited the Banglumphu area, and correlation analysis was used to analyze the data. The results revealed that the tourists’ overall view of the Banglumphu cultural image was at a good level, with lifestyle having the best image, followed by value and belief, physical dimension, community identity, tradition, and local wisdom. In addition, the overall aspects of tourists’ loyalty, including satisfaction, word of mouth, and revisiting, were at good levels, with word of mouth receiving the highest value, followed by revisiting and satisfaction, respectively. Furthermore, cultural image in the aspects of lifestyle, tradition, local wisdom, belief, and community identity was moderately correlated with each aspect of loyalty to visiting Banglumphu (satisfaction, word of mouth, and revisiting) at the 0.05 significance level, whereas physical dimension was not correlated with any aspect of tourists’ loyalty.
Keywords: cultural tourism, image, loyalty, revisit
Procedia PDF Downloads 249
2590 Optimizing Perennial Plants Image Classification by Fine-Tuning Deep Neural Networks
Authors: Khairani Binti Supyan, Fatimah Khalid, Mas Rina Mustaffa, Azreen Bin Azman, Amirul Azuani Romle
Abstract:
Perennial plant classification plays a significant role in various agricultural and environmental applications, assisting in plant identification, disease detection, and biodiversity monitoring. Nevertheless, attaining high accuracy in perennial plant image classification remains challenging due to the complex variations in plant appearance, the diverse range of environmental conditions under which images are captured, and the inherent variability in image quality stemming from factors such as lighting conditions, camera settings, and focus. This paper proposes an adaptation approach to optimize perennial plant image classification by fine-tuning pre-trained deep neural network (DNN) models. It explores the efficacy of fine-tuning prevalent architectures, namely VGG16, ResNet50, and InceptionV3, leveraging transfer learning to tailor the models to the specific characteristics of perennial plant datasets. A subset of the MYLPHerbs dataset, consisting of 13481 images of 6 perennial plant species under various environmental conditions, was used in the experiments. Different strategies for fine-tuning, including adjusting learning rates, training set sizes, data augmentation, and architectural modifications, were investigated. The experimental outcomes underscore the effectiveness of fine-tuning deep neural networks for perennial plant image classification, with ResNet50 showing the highest accuracy of 99.78%. Despite ResNet50's superior performance, both VGG16 and InceptionV3 achieved commendable accuracy of 99.67% and 99.37%, respectively. The overall outcomes reaffirm the robustness of the fine-tuning approach across different deep neural network architectures, offering insights into strategies for optimizing model performance in the domain of perennial plant image classification.
Keywords: perennial plants, image classification, deep neural networks, fine-tuning, transfer learning, VGG16, ResNet50, InceptionV3
Procedia PDF Downloads 63
2589 Obstacle Classification Method Based on 2D LIDAR Database
Authors: Moohyun Lee, Soojung Hur, Yongwan Park
Abstract:
This paper proposes a method that uses only a LIDAR system to classify an obstacle and determine its type by establishing a database for classifying obstacles based on LIDAR. The existing LIDAR system, in recognizing obstructions for an autonomous vehicle, has advantages in terms of accuracy and shorter recognition time. However, it has been difficult to determine the type of obstacle, and therefore accurate path planning based on the type of obstacle was not possible. To overcome this problem, a method of classifying obstacle type based on existing LIDAR and using the width of obstacle materials was previously proposed. However, width measurement alone was not sufficient to improve accuracy. In this research, the width data were used for a first classification; a database of LIDAR intensity data for four major obstacle materials on the road was created; the LIDAR intensity data of actual obstacle materials were compared against it; and the obstacle type was determined by finding the entry with the highest similarity value. An experiment using an actual autonomous vehicle in a real environment shows that, although data quality declined in comparison to 3D LIDAR, it was possible to classify obstacle materials using 2D LIDAR.
Keywords: obstacle, classification, database, LIDAR, segmentation, intensity
Procedia PDF Downloads 347
2588 Contribution of Word Decoding and Reading Fluency on Reading Comprehension in Young Typical Readers of Kannada Language
Authors: Vangmayee V. Subban, Suzan Deelan. Pinto, Somashekara Haralakatta Shivananjappa, Shwetha Prabhu, Jayashree S. Bhat
Abstract:
Introduction and Need: During the early years of schooling, instruction in schools mainly focuses on children’s word decoding abilities. However, skilled readers should master all the components of reading, such as word decoding, reading fluency and comprehension. Nevertheless, the relationship between the components during the process of learning to read is less clear. Studies conducted in alphabetic languages report mixed findings on the relative contribution of word decoding and reading fluency to reading comprehension, while the situation in alphasyllabary languages is unexplored. Aim and Objectives: The aim of the study was to explore the role of word decoding and reading fluency in the reading comprehension abilities of children learning to read Kannada between the ages of 5.6 and 8.6 years. Method: In this cross-sectional study, a total of 60 typically developing children, 20 each from Grade I, Grade II and Grade III, with an equal gender ratio and in the age ranges of 5.6 to 6.6 years, 6.7 to 7.6 years and 7.7 to 8.6 years respectively, were selected from Kannada medium schools. The reading fluency and reading comprehension abilities of the children were assessed using grade-level passages selected from the Kannada textbook of the children’s core curriculum. Each passage was accompanied by five questions to assess reading comprehension. Pseudoword decoding skills were assessed using 40 pseudowords varying in syllable length and Akshara composition. Pseudowords were formed by interchanging the syllables within a meaningful word while maintaining the phonotactic constraints of the Kannada language. The assessment material was subjected to content validation and reliability measures before the data were collected from the study sample. The data were collected individually, and reading fluency was assessed as words correctly read per minute. Pseudoword decoding was scored for the accuracy of reading.
Results: The descriptive statistics indicated that mean pseudoword reading, reading comprehension, and words accurately read per minute increased with grade. The performance of Grade III children was highest, that of Grade I lowest, and that of Grade II intermediate between the two. The trend indicated that reading skills gradually improve with grade. Pearson’s correlation coefficient showed a moderate and highly significant (p=0.00) positive correlation between the variables, indicating the interdependency of all three components required for reading. The hierarchical regression analysis revealed that 37% of the variance in reading comprehension was explained by pseudoword decoding, which was highly significant. On subsequent entry of the reading fluency measure, there was no significant change in R-square (a change of only 3%). Therefore, pseudoword decoding emerged as the single most significant predictor of reading comprehension during the early grades of reading acquisition. Conclusion: The present study concludes that pseudoword decoding skills contribute more significantly to reading comprehension than reading fluency does during the initial years of schooling in children learning to read the Kannada language.
Keywords: alphasyllabary, pseudo-word decoding, reading comprehension, reading fluency
Procedia PDF Downloads 260
2587 Metamorphic Computer Virus Classification Using Hidden Markov Model
Authors: Babak Bashari Rad
Abstract:
A metamorphic computer virus uses different code transformation techniques to mutate its body in duplicated instances. The characteristics and function of new instances are mostly similar to those of their parents, but they cannot be easily detected by the majority of antivirus products on the market, as these depend on string signature-based detection techniques. The purpose of this research is to propose a Hidden Markov Model for the classification of metamorphic viruses in executable files. In the proposed solution, portable executable files are inspected to extract the instruction opcodes needed for the examination of code. A Hidden Markov Model trained on portable executable files is employed to classify metamorphic viruses of the same family. The proposed model is able to generate and recognize common statistical features of mutated code. The model has been evaluated on a test data set, and its performance has been measured in terms of False Positive Rate, Detection Rate and Overall Accuracy. The results showed acceptable performance, with a high average Detection Rate of 99.7%.
Keywords: malware classification, computer virus classification, metamorphic virus, metamorphic malware, Hidden Markov Model
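The idea of scoring an opcode sequence against a per-family Hidden Markov Model can be sketched with the scaled forward algorithm. This is a minimal illustration under stated assumptions: the two models below (`FAMILY_A`, `FAMILY_B`), their parameters and the three-opcode alphabet are hypothetical and hand-written rather than trained, and the sketch does not reproduce the paper's feature extraction or training procedure.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[s]    -- initial probability of hidden state s
    trans[p][s] -- probability of moving from state p to state s
    emit[s][o]  -- probability of emitting symbol o in state s
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)            # P(obs[t] | obs[:t]) under the model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


# Hypothetical two-state models over a toy opcode alphabet (not trained).
FAMILY_A = (
    [0.6, 0.4],
    [[0.7, 0.3], [0.4, 0.6]],
    [{"mov": 0.6, "push": 0.3, "jmp": 0.1}, {"mov": 0.2, "push": 0.2, "jmp": 0.6}],
)
FAMILY_B = (
    [0.5, 0.5],
    [[0.5, 0.5], [0.5, 0.5]],
    [{"mov": 0.1, "push": 0.8, "jmp": 0.1}, {"mov": 0.1, "push": 0.7, "jmp": 0.2}],
)
models = {"family_a": FAMILY_A, "family_b": FAMILY_B}

sample = ["push", "push", "push", "mov", "push"]
scores = {name: log_likelihood(sample, *params) for name, params in models.items()}
best = max(scores, key=scores.get)  # family whose model explains the sequence best
print(best, scores)
```

In practice each family's parameters would be estimated from opcode sequences of known samples, for example with Baum-Welch, and a threshold on the score would be needed to reject files that belong to no known family.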
Procedia PDF Downloads 314
2586 Road Vehicle Recognition Using Magnetic Sensing Feature Extraction and Classification
Authors: Xiao Chen, Xiaoying Kong, Min Xu
Abstract:
This paper presents a road vehicle detection approach for intelligent transportation systems. The approach uses a low-cost magnetic sensor and an associated data collection system to collect magnetic signals. This system can measure changes in the magnetic field, and it can also detect and count vehicles. We extend Mel Frequency Cepstral Coefficients to analyze vehicle magnetic signals. Vehicle type features are extracted using a representation of the cepstrum, frame energy, and gap cepstrum of the magnetic signals. We design a 2-dimensional map algorithm using Vector Quantization to classify vehicle magnetic features into four typical types of vehicles in Australian suburbs: sedan, van, truck, and bus. Experimental results show that our approach achieves a high level of accuracy for vehicle detection and classification.
Keywords: vehicle classification, signal processing, road traffic model, magnetic sensing
Procedia PDF Downloads 318
2585 Comparative Study of Accuracy of Land Cover/Land Use Mapping Using Medium Resolution Satellite Imagery: A Case Study
Authors: M. C. Paliwal, A. K. Jain, S. K. Katiyar
Abstract:
Classification of satellite imagery is very important for the assessment of its accuracy. In order to determine the accuracy of the classified image, the assumed-true data are usually derived from ground truth data collected using the Global Positioning System. The data derived from the satellite imagery and the ground truth data are then compared to determine the accuracy of the data, and error matrices are prepared. Overall and individual accuracies are calculated using different methods. The study illustrates advanced classification and accuracy assessment of land use/land cover mapping using satellite imagery. IRS-1C-LISS IV data were used for the classification. The satellite image was classified into fourteen classes, including water bodies, agricultural fields, forest land, urban settlement, barren land and unclassified area. Classification of the satellite imagery and calculation of accuracy were carried out using ERDAS-Imagine software to find the best method. This study is based on the data collected for the Bhopal city boundaries in the state of Madhya Pradesh, India.
Keywords: resolution, accuracy assessment, land use mapping, satellite imagery, ground truth data, error matrices
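The overall and individual accuracies derived from an error matrix can be shown with a short sketch. The matrix below is hypothetical two-class data invented for illustration, and the row/column convention (rows are classified labels, columns are ground truth) is an assumption; the study's own fourteen-class matrices are not reproduced here.

```python
def accuracy_report(matrix):
    """Compute accuracies from a square error (confusion) matrix.

    matrix[i][j] = number of samples classified as class i whose
    ground-truth class is j (rows: classified, columns: reference).
    Returns (overall accuracy, user's accuracies, producer's accuracies).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total
    # User's accuracy: correct samples divided by the row (classified) total.
    users = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # Producer's accuracy: correct samples divided by the column (reference) total.
    producers = [matrix[i][i] / sum(matrix[r][i] for r in range(n)) for i in range(n)]
    return overall, users, producers


# Hypothetical two-class error matrix, e.g. water vs. non-water.
matrix = [
    [45, 5],
    [10, 40],
]
overall, users, producers = accuracy_report(matrix)
print(overall, users, producers)
```

User's accuracy reflects how often a mapped class is correct on the ground, while producer's accuracy reflects how much of each ground-truth class the map captured; reporting both alongside the overall figure exposes class-specific errors that a single number hides.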
Procedia PDF Downloads 505
2584 MSIpred: A Python 2 Package for the Classification of Tumor Microsatellite Instability from Tumor Mutation Annotation Data Using a Support Vector Machine
Authors: Chen Wang, Chun Liang
Abstract:
Microsatellite instability (MSI) is characterized by a high degree of polymorphism in microsatellite (MS) length due to a deficiency in the mismatch repair (MMR) system. MSI is associated with several tumor types, and its status can be considered an important indicator for tumor prognosis. Conventional clinical diagnosis of MSI examines PCR products of a panel of MS markers using electrophoresis (MSI-PCR), which is laborious, time consuming, and less reliable. MSIpred, a Python 2 package for automatic classification of MSI, is released with this study. It computes important somatic mutation features from files in mutation annotation format (MAF) generated from paired tumor-normal exome sequencing data, and subsequently uses these to predict tumor MSI status with a support vector machine (SVM) classifier trained on MAF files of 1074 tumors belonging to four types. Evaluation of MSIpred on an independent 358-tumor test set achieved an overall accuracy of over 98% and an area under the receiver operating characteristic (ROC) curve of 0.967. These results indicate that MSIpred is a robust pan-cancer MSI classification tool and can serve as a complementary diagnostic to MSI-PCR in MSI diagnosis.
Keywords: microsatellite instability, pan-cancer classification, somatic mutation, support vector machine
Procedia PDF Downloads 168
2583 The Effect of Feature Selection on Pattern Classification
Authors: Chih-Fong Tsai, Ya-Han Hu
Abstract:
The aim of feature selection (or dimensionality reduction) is to filter out unrepresentative features (or variables) so that the classifier performs better than one built without feature selection. Although there are many well-known feature selection algorithms, and classifiers built on different selection results may perform differently, very few studies examine the effect of different feature selection algorithms on the classification performance of different classifiers over different types of datasets. In this paper, two widely used algorithms, the genetic algorithm (GA) and information gain (IG), are used to perform feature selection. Three well-known classifiers are then constructed: the CART decision tree (DT), the multi-layer perceptron (MLP) neural network, and the support vector machine (SVM). Based on 14 different types of datasets, the experimental results show that in most cases IG is a better feature selection algorithm than GA. In addition, the combinations of IG with DT and IG with SVM perform best and second best for small- and large-scale datasets.
Keywords: data mining, feature selection, pattern classification, dimensionality reduction
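As a rough illustration of how information gain (IG) can score a feature, the sketch below computes IG for a categorical feature as the label entropy minus the weighted entropy after splitting on that feature. The function names and the toy data are hypothetical; the paper's own implementation and its handling of continuous features are not described in the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data (illustration only): one feature separates the classes, one does not.
labels = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]
print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

Ranking features by this score and keeping the top few is one common way to use IG as a filter before training a classifier.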
Procedia PDF Downloads 667
2582 Words Spotting in the Images Handwritten Historical Documents
Authors: Issam Ben Jami
Abstract:
Information retrieval in digital libraries is very important because many famous historical documents are of significant value. Word spotting in historical documents is a very difficult problem, because such documents are cursive by nature and the same words vary widely in scale and translation within a single document. We first present a system for automatic recognition based on extracting interest points of words from the image model. The key-point extraction phase builds on a representation of the image as a synthetic description of shape in a multidimensional space. We therefore use advanced methods that find and describe interest points that are invariant to scale, rotation and lighting and are linked to local configurations of pixels. We test this approach on documents from the 15th century. Our experiments give important results.
Keywords: feature matching, historical documents, pattern recognition, word spotting
Procedia PDF Downloads 273
2581 Application of Data Mining Techniques for Tourism Knowledge Discovery
Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee
Abstract:
Five implementations of three data mining classification techniques were tested for extracting important insights from tourism data. The aim was to find the best-performing algorithm among those compared for tourism knowledge discovery. The knowledge discovery process from data was used as the process model, and 10-fold cross-validation was used for testing. Various data preprocessing activities were performed to obtain the final dataset for model building. Classification models of the selected algorithms were built under different scenarios on the preprocessed dataset. The best-performing algorithm on the tourism dataset was Random Forest (76%) before applying information-gain-based attribute selection, and J48 (C4.5) (75%) after selecting the attributes most relevant to the class (target) attribute. In terms of model building time, attribute selection improved the efficiency of all algorithms. The Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented; they show intricate, non-trivial knowledge that would otherwise not be discovered by simple statistical analysis, despite the mediocre accuracy of the classification algorithms.
Keywords: classification algorithms, data mining, knowledge discovery, tourism
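The 10-fold cross-validation procedure mentioned in this abstract can be sketched as follows. This is a generic illustration using only the standard library; the function name, the sample count and the seed are assumptions, and the study's actual tooling is not stated beyond the algorithm names.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Assumption: samples are shuffled once, then dealt round-robin into k folds,
    so each sample appears in exactly one test fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 25 samples: each round trains on nine folds and tests on one.
for round_no, (train, test) in enumerate(k_fold_indices(25, k=10), start=1):
    print(round_no, len(train), len(test))
```

The reported accuracy for an algorithm would then typically be the average of the per-fold test accuracies.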
Procedia PDF Downloads 294
2580 Accuracy Improvement of Traffic Participant Classification Using Millimeter-Wave Radar by Leveraging Simulator Based on Domain Adaptation
Authors: Tokihiko Akita, Seiichi Mita
Abstract:
Millimeter-wave radar is the sensor most robust against adverse environments, making it an essential environment recognition sensor for automated driving. However, the reflection signal is sparse and unstable, so it is difficult to obtain high recognition accuracy. Deep learning provides high recognition accuracy even for such signals, but it requires large-scale datasets with ground truth. In particular, annotating millimeter-wave radar data is very costly. As a solution, a simulator that can generate a huge annotated dataset is effective. However, radar simulation is more difficult to match with real-world data than camera image simulation, and deep learning recognition with higher-order features trained on the simulator causes further deviation. We attempted to improve the accuracy of traffic participant classification by fusing simulator and real-world data with a domain adaptation technique. Experimental results with the domain adaptation network we created show that classification accuracy can be improved even with a small amount of real-world data.
Keywords: millimeter-wave radar, object classification, deep learning, simulation, domain adaptation
Procedia PDF Downloads 91
2579 Attribute Index and Classification Method of Earthquake Damage Photographs of Engineering Structure
Authors: Ming Lu, Xiaojun Li, Bodi Lu, Juehui Xing
Abstract:
The damage caused by each large earthquake provides a comprehensive and profound real-world test of the dynamic performance and failure mechanisms of different engineering structures. Understanding the characteristics of engineering structures through seismic damage phenomena is often far superior to expensive shaking table experiments. After an earthquake, people record many different types of engineering damage photographs. However, a large number of earthquake damage photographs lack sufficient information, which reduces their value. To improve the research value and usability of engineering seismic damage photographs, this paper aims to explore and present seismic damage background information, including the earthquake magnitude, the earthquake intensity, and the characteristics of the damaged structure. Based on research requirements in the earthquake engineering field, the authors use photographs from the 2008 China Wenchuan M8.0 earthquake and propose four kinds of attribute indexes and classifications: seismic information, structure types, earthquake damage parts and disaster causation factors. The final objective is to set up an engineering structural seismic damage database based on these four attribute indicators and classifications, and eventually to build a website providing seismic damage photographs.
Keywords: attribute index, classification method, earthquake damage picture, engineering structure
Procedia PDF Downloads 763
2578 Classification of Cosmological Wormhole Solutions in the Framework of General Relativity
Authors: Usamah Al-Ali
Abstract:
We explore the effect of expanding space on the exoticity of the matter supporting a traversable Lorentzian wormhole of zero radial tide whose line element is given by $ds^2 = dt^2 - a^2(t)\left[\frac{dr^2}{1 - kr^2 - b(r)/r} + r^2 d\Omega^2\right]$ in the context of General Relativity. This task is achieved by deriving the Einstein field equations for an anisotropic matter field corresponding to the considered cosmological wormhole metric and classifying their solutions on the basis of a variable equation of state (EoS) of the form $p = \omega(r)\rho$. Explicit forms of the shape function b(r) and the scale factor a(t) arising in the classification are used to construct the corresponding energy-momentum tensor, and the energy conditions for each case are investigated. While the violation of energy conditions is inevitable in the case of static wormholes, the classification we performed leads to interesting solutions in which this violation is either reduced or eliminated.
Keywords: general relativity, Einstein field equations, energy conditions, cosmological wormhole
Procedia PDF Downloads 62
2577 Fat-Tail Test of Regulatory DNA Sequences
Authors: Jian-Jun Shu
Abstract:
The statistical properties of cis-regulatory modules (CRMs) are explored by estimating the similar-word set occurrence distribution. It is observed that CRMs tend to have a fat-tail distribution for similar-word set occurrence. Thus, a fat-tail test with two fatness coefficients is proposed to distinguish CRMs from non-CRMs, especially from exons. For the first fatness coefficient, the separation accuracy between CRMs and exons is higher than with the existing content-based CRM prediction method, the fluffy-tail test. For the second fatness coefficient, the computing time is lower than with the fluffy-tail test, making it very suitable for long sequences and large database analysis in the post-genome era. Moreover, these indexes may be used to predict CRMs that have not yet been observed experimentally. This can serve as a valuable filtering process for experiments.
Keywords: statistical approach, transcription factor binding sites, cis-regulatory modules, DNA sequences
Procedia PDF Downloads 289
2576 On the Interactive Search with Web Documents
Authors: Mario Kubek, Herwig Unger
Abstract:
Due to the large amount of information in the World Wide Web (WWW, web) and the lengthy and usually linearly ordered result lists of web search engines, which do not indicate semantic relationships between their entries, the search for topically similar and related documents can become a tedious task. In particular, formulating queries with proper terms that represent specific information needs requires much effort from the user. This problem gets even bigger when the user's knowledge of a subject and its technical terms is not sufficient to do so. This article presents the new and interactive search application DocAnalyser, which addresses this problem by enabling users to find similar and related web documents based on automatic query formulation and state-of-the-art search word extraction. Additionally, this tool can be used to track topics across semantically connected web documents.
Keywords: DocAnalyser, interactive web search, search word extraction, query formulation, source topic detection, topic tracking
Procedia PDF Downloads 391
2575 Application of Argumentation for Improving the Classification Accuracy in Inductive Concept Formation
Authors: Vadim Vagin, Marina Fomina, Oleg Morosin
Abstract:
This paper describes an argumentation approach to the problem of inductive concept formation. It proposes using argumentation, based on defeasible reasoning with justification degrees, to improve the quality of classification models obtained by generalization algorithms. Experimental results on both clean and noisy data are also presented.
Keywords: argumentation, justification degrees, inductive concept formation, noise, generalization
Procedia PDF Downloads 440
2574 Information Retrieval for Kafficho Language
Authors: Mareye Zeleke Mekonen
Abstract:
The Kafficho language poses distinct challenges for information retrieval because of its restricted resources and lack of standardized methods. In this work, with the cooperation and support of linguists and native speakers, we investigate the creation of information retrieval systems specifically designed for the Kafficho language. The Kafficho information retrieval system allows Kafficho speakers to access information efficiently and effectively. Our objective is to conduct an information retrieval experiment using 220 Kafficho text files and fifteen sample questions. Tokenization, normalization, stop word removal, stemming, and other data pre-processing tasks, together with additional tasks such as term weighting, were prerequisites for the vector space model to represent each page and a particular query. The three well-known metrics we used for our work were Precision, Recall, and F-measure, with values of 87%, 28%, and 35%, respectively. This demonstrates how the Kafficho information retrieval system performed while using the vector space model.
Keywords: Kafficho, information retrieval, stemming, vector space
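For readers unfamiliar with the three metrics named in this abstract, the following minimal sketch computes precision, recall and the balanced F-measure for a single query. The function name and the document identifiers are hypothetical, and the abstract does not state how its values were averaged over the fifteen queries, so this sketch is not meant to reproduce the reported figures.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall and balanced F-measure for one query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical query: four documents retrieved, five relevant, two in common.
p, r, f = precision_recall_f1(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5", "d6", "d7"],
)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

When results are reported over many queries, the F-measure averaged per query generally differs from the F-measure computed from averaged precision and recall, so the averaging scheme matters when comparing systems.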
Procedia PDF Downloads 55