Search results for: RNA-related genomic features
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3964

Search results for: RNA-related genomic features

3934 Myanmar Character Recognition Using Eight Direction Chain Code Frequency Features

Authors: Kyi Pyar Zaw, Zin Mar Kyu

Abstract:

Character recognition is the process of converting a text image file into editable and searchable text file. Feature Extraction is the heart of any character recognition system. The character recognition rate may be low or high depending on the extracted features. In the proposed paper, 25 features for one character are used in character recognition. Basically, there are three steps of character recognition such as character segmentation, feature extraction and classification. In segmentation step, horizontal cropping method is used for line segmentation and vertical cropping method is used for character segmentation. In the Feature extraction step, features are extracted in two ways. The first way is that the 8 features are extracted from the entire input character using eight direction chain code frequency extraction. The second way is that the input character is divided into 16 blocks. For each block, although 8 feature values are obtained through eight-direction chain code frequency extraction method, we define the sum of these 8 feature values as a feature for one block. Therefore, 16 features are extracted from that 16 blocks in the second way. We use the number of holes feature to cluster the similar characters. We can recognize the almost Myanmar common characters with various font sizes by using these features. All these 25 features are used in both training part and testing part. In the classification step, the characters are classified by matching the all features of input character with already trained features of characters.

Keywords: chain code frequency, character recognition, feature extraction, features matching, segmentation

Procedia PDF Downloads 294
3933 An Experimental Study for Assessing Email Classification Attributes Using Feature Selection Methods

Authors: Issa Qabaja, Fadi Thabtah

Abstract:

Email phishing classification is one of the vital problems in the online security research domain that have attracted several scholars due to its impact on the users payments performed daily online. One aspect to reach a good performance by the detection algorithms in the email phishing problem is to identify the minimal set of features that significantly have an impact on raising the phishing detection rate. This paper investigate three known feature selection methods named Information Gain (IG), Chi-square and Correlation Features Set (CFS) on the email phishing problem to separate high influential features from low influential ones in phishing detection. We measure the degree of influentially by applying four data mining algorithms on a large set of features. We compare the accuracy of these algorithms on the complete features set before feature selection has been applied and after feature selection has been applied. After conducting experiments, the results show 12 common significant features have been chosen among the considered features by the feature selection methods. Further, the average detection accuracy derived by the data mining algorithms on the reduced 12-features set was very slight affected when compared with the one derived from the 47-features set.

Keywords: data mining, email classification, phishing, online security

Procedia PDF Downloads 407
3932 Genodata: The Human Genome Variation Using BigData

Authors: Surabhi Maiti, Prajakta Tamhankar, Prachi Uttam Mehta

Abstract:

Since the accomplishment of the Human Genome Project, there has been an unparalled escalation in the sequencing of genomic data. This project has been the first major vault in the field of medical research, especially in genomics. This project won accolades by using a concept called Bigdata which was earlier, extensively used to gain value for business. Bigdata makes use of data sets which are generally in the form of files of size terabytes, petabytes, or exabytes and these data sets were traditionally used and managed using excel sheets and RDBMS. The voluminous data made the process tedious and time consuming and hence a stronger framework called Hadoop was introduced in the field of genetic sciences to make data processing faster and efficient. This paper focuses on using SPARK which is gaining momentum with the advancement of BigData technologies. Cloud Storage is an effective medium for storage of large data sets which is generated from the genetic research and the resultant sets produced from SPARK analysis.

Keywords: human genome project, Bigdata, genomic data, SPARK, cloud storage, Hadoop

Procedia PDF Downloads 228
3931 Exploring Syntactic and Semantic Features for Text-Based Authorship Attribution

Authors: Haiyan Wu, Ying Liu, Shaoyun Shi

Abstract:

Authorship attribution is to extract features to identify authors of anonymous documents. Many previous works on authorship attribution focus on statistical style features (e.g., sentence/word length), content features (e.g., frequent words, n-grams). Modeling these features by regression or some transparent machine learning methods gives a portrait of the authors' writing style. But these methods do not capture the syntactic (e.g., dependency relationship) or semantic (e.g., topics) information. In recent years, some researchers model syntactic trees or latent semantic information by neural networks. However, few works take them together. Besides, predictions by neural networks are difficult to explain, which is vital in authorship attribution tasks. In this paper, we not only utilize the statistical style and content features but also take advantage of both syntactic and semantic features. Different from an end-to-end neural model, feature selection and prediction are two steps in our method. An attentive n-gram network is utilized to select useful features, and logistic regression is applied to give prediction and understandable representation of writing style. Experiments show that our extracted features can improve the state-of-the-art methods on three benchmark datasets.

Keywords: authorship attribution, attention mechanism, syntactic feature, feature extraction

Procedia PDF Downloads 106
3930 Analysis of Expression Data Using Unsupervised Techniques

Authors: M. A. I Perera, C. R. Wijesinghe, A. R. Weerasinghe

Abstract:

his study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better subtypes of tumors. Identifying subtypes of cancer help in improving the efficacy and reducing the toxicity of the treatments by identifying clues to find target therapeutics. Process of gene expression data analysis described under three steps as preprocessing, clustering, and cluster validation. Feature selection is important since the genomic data are high dimensional with a large number of features compared to samples. Hierarchical clustering and K Means are often used in the analysis of gene expression data. There are several cluster validation techniques used in validating the clusters. Heatmaps are an effective external validation method that allows comparing the identified classes with clinical variables and visual analysis of the classes.

Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation

Procedia PDF Downloads 118
3929 Using New Machine Algorithms to Classify Iranian Musical Instruments According to Temporal, Spectral and Coefficient Features

Authors: Ronak Khosravi, Mahmood Abbasi Layegh, Siamak Haghipour, Avin Esmaili

Abstract:

In this paper, a study on classification of musical woodwind instruments using a small set of features selected from a broad range of extracted ones by the sequential forward selection method was carried out. Firstly, we extract 42 features for each record in the music database of 402 sound files belonging to five different groups of Flutes (end blown and internal duct), Single –reed, Double –reed (exposed and capped), Triple reed and Quadruple reed. Then, the sequential forward selection method is adopted to choose the best feature set in order to achieve very high classification accuracy. Two different classification techniques of support vector machines and relevance vector machines have been tested out and an accuracy of up to 96% can be achieved by using 21 time, frequency and coefficient features and relevance vector machine with the Gaussian kernel function.

Keywords: coefficient features, relevance vector machines, spectral features, support vector machines, temporal features

Procedia PDF Downloads 290
3928 Breeding Cotton for Annual Growth Habit: Remobilizing End-of-season Perennial Reserves for Increased Yield

Authors: Salman Naveed, Nitant Gandhi, Grant Billings, Zachary Jones, B. Todd Campbell, Michael Jones, Sachin Rustgi

Abstract:

Cotton (Gossypium spp.) is the primary source of natural fiber in the U.S. and a major crop in the Southeastern U.S. Despite constant efforts to increase the cotton fiber yield, the yield gain has stagnated. Therefore, we undertook a novel approach to improve the cotton fiber yield by altering its growth habit from perennial to annual. In this effort, we identified genotypes with high-expression alleles of five floral induction and meristem identity genes (FT, SOC1, FUL, LFY, and AP1) from an upland cotton mini-core collection and crossed them in various combinations to develop cotton lines with annual growth habit, optimal flowering time and enhanced productivity. To facilitate the characterization of genotypes with the desired combinations of stacked alleles, we identified markers associated with the gene expression traits via genome-wide association analysis using a 63K SNP Array (Hulse-Kemp et al. 2015 G3 5:1187). Over 14,500 SNPs showed polymorphism and were used for association analysis. A total of 396 markers showed association with expression traits. Out of these 396 markers, 159 mapped to genes, 50 to untranslated regions, and 187 to random genomic regions. Biased genomic distribution of associated markers was observed where more trait-associated markers mapped to the cotton D sub-genome. Many quantitative trait loci coincided at specific genomic regions. This observation has implications as these traits could be bred together. The analysis also allowed the identification of candidate regulators of the expression patterns of these floral induction and meristem identity genes whose functions will be validated via virus-induced gene silencing.

Keywords: cotton, GWAS, QTL, expression traits

Procedia PDF Downloads 125
3927 Exploring Chess Game AI Features Application

Authors: Bashayer Almalki, Mayar Bajrai, Dana Mirah, Kholood Alghamdi, Hala Sanyour

Abstract:

This research aims to investigate the features of an AI chess app that are most preferred by users. A questionnaire was used as the methodology to gather responses from a varied group of participants. The questionnaire consisted of several questions related to the features of the AI chess app. The responses were analyzed using descriptive statistics and factor analysis. The findings indicate that the most preferred features of an AI chess app are the ability to play against the computer, the option to adjust the difficulty level, and the availability of tutorials and puzzles. The results of this research could be useful for developers of AI chess apps to enhance the user experience and satisfaction.

Keywords: chess, game, application, computics

Procedia PDF Downloads 45
3926 Research on Perceptual Features of Couchsurfers on New Hospitality Tourism Platform Couchsurfing

Authors: Yuanxiang Miao

Abstract:

This paper aims to examine the perceptual features of couchsurfers on a new hospitality tourism platform, the free homestay website couchsurfing. As a local host, the author has accepted 61 couchsurfers in Kyoto, Japan, and attempted to figure out couchsurfers' characteristics on perception by hosting them. Moreover, the methodology of this research is mainly based on in-depth interviews, by talking with couchsurfers, observing their behaviors, doing questionnaires, etc. Five dominant perceptual features of couchsurfers were identified: (1) Trusting; (2) Meeting; (3) Sharing; (4) Reciprocity; (5) Worries. The value of this research lies in figuring out a deeper understanding of the perceptual features of couchsurfers, and the author indeed hosted and stayed with 61 couchsurfers from 30 countries and areas over one year. Lastly, the author offers practical suggestions for future research.

Keywords: couchsurfing, depth interview, hospitality tourism, perceptual features

Procedia PDF Downloads 122
3925 The Latent Model of Linguistic Features in Korean College Students’ L2 Argumentative Writings: Syntactic Complexity, Lexical Complexity, and Fluency

Authors: Jiyoung Bae, Gyoomi Kim

Abstract:

This study explores a range of linguistic features used in Korean college students’ argumentative writings for the purpose of developing a model that identifies variables which predict writing proficiencies. This study investigated the latent variable structure of L2 linguistic features, including syntactic complexity, the lexical complexity, and fluency. One hundred forty-six university students in Korea participated in this study. The results of the study’s confirmatory factor analysis (CFA) showed that indicators of linguistic features from this study-provided a foundation for re-categorizing indicators found in extant research on L2 Korean writers depending on each latent variable of linguistic features. The CFA models indicated one measurement model of L2 syntactic complexity and L2 learners’ writing proficiency; these two latent factors were correlated with each other. Based on the overall findings of the study, integrated linguistic features of L2 writings suggested some pedagogical implications in L2 writing instructions.

Keywords: linguistic features, syntactic complexity, lexical complexity, fluency

Procedia PDF Downloads 144
3924 Comparison between XGBoost, LightGBM and CatBoost Using a Home Credit Dataset

Authors: Essam Al Daoud

Abstract:

Gradient boosting methods have been proven to be a very important strategy. Many successful machine learning solutions were developed using the XGBoost and its derivatives. The aim of this study is to investigate and compare the efficiency of three gradient methods. Home credit dataset is used in this work which contains 219 features and 356251 records. However, new features are generated and several techniques are used to rank and select the best features. The implementation indicates that the LightGBM is faster and more accurate than CatBoost and XGBoost using variant number of features and records.

Keywords: gradient boosting, XGBoost, LightGBM, CatBoost, home credit

Procedia PDF Downloads 134
3923 Predicting Open Chromatin Regions in Cell-Free DNA Whole Genome Sequencing Data by Correlation Clustering  

Authors: Fahimeh Palizban, Farshad Noravesh, Amir Hossein Saeidian, Mahya Mehrmohamadi

Abstract:

In the recent decade, the emergence of liquid biopsy has significantly improved cancer monitoring and detection. Dying cells, including those originating from tumors, shed their DNA into the blood and contribute to a pool of circulating fragments called cell-free DNA. Accordingly, identifying the tissue origin of these DNA fragments from the plasma can result in more accurate and fast disease diagnosis and precise treatment protocols. Open chromatin regions are important epigenetic features of DNA that reflect cell types of origin. Profiling these features by DNase-seq, ATAC-seq, and histone ChIP-seq provides insights into tissue-specific and disease-specific regulatory mechanisms. There have been several studies in the area of cancer liquid biopsy that integrate distinct genomic and epigenomic features for early cancer detection along with tissue of origin detection. However, multimodal analysis requires several types of experiments to cover the genomic and epigenomic aspects of a single sample, which will lead to a huge amount of cost and time. To overcome these limitations, the idea of predicting OCRs from WGS is of particular importance. In this regard, we proposed a computational approach to target the prediction of open chromatin regions as an important epigenetic feature from cell-free DNA whole genome sequence data. To fulfill this objective, local sequencing depth will be fed to our proposed algorithm and the prediction of the most probable open chromatin regions from whole genome sequencing data can be carried out. Our method integrates the signal processing method with sequencing depth data and includes count normalization, Discrete Fourie Transform conversion, graph construction, graph cut optimization by linear programming, and clustering. To validate the proposed method, we compared the output of the clustering (open chromatin region+, open chromatin region-) with previously validated open chromatin regions related to human blood samples of the ATAC-DB database. The percentage of overlap between predicted open chromatin regions and the experimentally validated regions obtained by ATAC-seq in ATAC-DB is greater than 67%, which indicates meaningful prediction. As it is evident, OCRs are mostly located in the transcription start sites (TSS) of the genes. In this regard, we compared the concordance between the predicted OCRs and the human genes TSS regions obtained from refTSS and it showed proper accordance around 52.04% and ~78% with all and the housekeeping genes, respectively. Accurately detecting open chromatin regions from plasma cell-free DNA-seq data is a very challenging computational problem due to the existence of several confounding factors, such as technical and biological variations. Although this approach is in its infancy, there has already been an attempt to apply it, which leads to a tool named OCRDetector with some restrictions like the need for highly depth cfDNA WGS data, prior information about OCRs distribution, and considering multiple features. However, we implemented a graph signal clustering based on a single depth feature in an unsupervised learning manner that resulted in faster performance and decent accuracy. Overall, we tried to investigate the epigenomic pattern of a cell-free DNA sample from a new computational perspective that can be used along with other tools to investigate genetic and epigenetic aspects of a single whole genome sequencing data for efficient liquid biopsy-related analysis.

Keywords: open chromatin regions, cancer, cell-free DNA, epigenomics, graph signal processing, correlation clustering

Procedia PDF Downloads 114
3922 Native Language Identification with Cross-Corpus Evaluation Using Social Media Data: ’Reddit’

Authors: Yasmeen Bassas, Sandra Kuebler, Allen Riddell

Abstract:

Native language identification is one of the growing subfields in natural language processing (NLP). The task of native language identification (NLI) is mainly concerned with predicting the native language of an author’s writing in a second language. In this paper, we investigate the performance of two types of features; content-based features vs. content independent features, when they are evaluated on a different corpus (using social media data “Reddit”). In this NLI task, the predefined models are trained on one corpus (TOEFL), and then the trained models are evaluated on different data using an external corpus (Reddit). Three classifiers are used in this task; the baseline, linear SVM, and logistic regression. Results show that content-based features are more accurate and robust than content independent ones when tested within the corpus and across corpus.

Keywords: NLI, NLP, content-based features, content independent features, social media corpus, ML

Procedia PDF Downloads 106
3921 Hybrid Anomaly Detection Using Decision Tree and Support Vector Machine

Authors: Elham Serkani, Hossein Gharaee Garakani, Naser Mohammadzadeh, Elaheh Vaezpour

Abstract:

Intrusion detection systems (IDS) are the main components of network security. These systems analyze the network events for intrusion detection. The design of an IDS is through the training of normal traffic data or attack. The methods of machine learning are the best ways to design IDSs. In the method presented in this article, the pruning algorithm of C5.0 decision tree is being used to reduce the features of traffic data used and training IDS by the least square vector algorithm (LS-SVM). Then, the remaining features are arranged according to the predictor importance criterion. The least important features are eliminated in the order. The remaining features of this stage, which have created the highest level of accuracy in LS-SVM, are selected as the final features. The features obtained, compared to other similar articles which have examined the selected features in the least squared support vector machine model, are better in the accuracy, true positive rate, and false positive. The results are tested by the UNSW-NB15 dataset.

Keywords: decision tree, feature selection, intrusion detection system, support vector machine

Procedia PDF Downloads 236
3920 Applying EzRAD Method for SNPs Discovery in Population Genetics of Freshwater and Marine Fish in the South of Vietnam

Authors: Quyen Vu Dang Ha, Oanh Truong Thi, Thuoc Tran Linh, Kent Carpenter, Thinh Doan Vu, Binh Dang Thuy

Abstract:

Enzyme restriction site associated DNA (EzRAD) has recently emerged as a promising genomic approach for exploring fish genetic diversity on a genome-wide scale. This is a simplified method for genomic genotyping in non-model organisms and applied for SNPs discovery in the population genetics of freshwater and marine fish in the South of Vietnam. The observations of regional-scale differentiation of commercial freshwater fish (smallscale croakers Boesemania microlepis) and marine fish (emperor Lethrinus lentjan) are clarified. Samples were collected along Hau River and coastal area in the south and center Vietnam. 52 DNA samples from Tra Vinh, An Giang Province for Boesemania microlepis and 34 DNA samples of Lethrinus lentjan from Phu Quoc, Nha Trang, Da Nang Province were used to prepare EzRAD libraries from genomic DNA digested with MboI and Sau3AI. A pooled sample of regional EzRAD libraries was sequenced using the HiSeq 2500 Illumina platform. For Boesemania microlepis, the small scale population different from upstream to downstream of Hau river were detected, An Giang population exhibited less genetic diversity (SNPs per individual from 14 to 926), in comparison to Tra Vinh population (from 11 to 2172). For Lethrinus lentjan, the result showed the minor difference between populations in the Northern and the Southern Mekong River. The numbers of contigs and SNPs vary from 1315 to 2455 and from 7122 to 8594, respectively (P ≤ 0.01). The current preliminary study reveals regional scale population disconnection probably reflecting environmental changing. Additional sampling and EzRad libraries need to be implemented for resource management in the Mekong Delta.

Keywords: Boesemania microlepis, EzRAD, Lethrinus lentjan, SNPs

Procedia PDF Downloads 474
3919 Task Distraction vs. Visual Enhancement: Which Is More Effective?

Authors: Huangmei Liu, Si Liu, Jia’nan Liu

Abstract:

The present experiment investigated and compared the effectiveness of two kinds of methods of attention control: Task distraction and visual enhancement. In the study, the effectiveness of task distractions to explicit features and of visual enhancement to implicit features of the same group of Chinese characters were compared based on their effect on the participants’ reaction time, subjective confidence rating, and verbal report. We found support that the visual enhancement on implicit features did overcome the contrary effect of training distraction and led to awareness of those implicit features, at least to some extent.

Keywords: task distraction, visual enhancement, attention, awareness, learning

Procedia PDF Downloads 409
3918 Security Features for Remote Healthcare System: A Feasibility Study

Authors: Tamil Chelvi Vadivelu, Nurazean Maarop, Rasimah Che Yusoff, Farhana Aini Saludin

Abstract:

Implementing a remote healthcare system needs to consider many security features. Therefore, before any deployment of the remote healthcare system, a feasibility study from the security perspective is crucial. Remote healthcare system using WBAN technology has been used in other countries for medical purposes but in Malaysia, such projects are still not yet implemented. This study was conducted qualitatively. The interview results involving five healthcare practitioners are further elaborated. The study has addressed four important security features in order to incorporate remote healthcare system using WBAN in Malaysian government hospitals.

Keywords: remote healthcare, IT security, security features, wireless sensor application

Procedia PDF Downloads 275
3917 Mood Recognition Using Indian Music

Authors: Vishwa Joshi

Abstract:

The study of mood recognition in the field of music has gained a lot of momentum in the recent years with machine learning and data mining techniques and many audio features contributing considerably to analyze and identify the relation of mood plus music. In this paper we consider the same idea forward and come up with making an effort to build a system for automatic recognition of mood underlying the audio song’s clips by mining their audio features and have evaluated several data classification algorithms in order to learn, train and test the model describing the moods of these audio songs and developed an open source framework. Before classification, Preprocessing and Feature Extraction phase is necessary for removing noise and gathering features respectively.

Keywords: music, mood, features, classification

Procedia PDF Downloads 476
3916 Systems Versioning: A Features-Based Meta-Modeling Approach

Authors: Ola A. Younis, Said Ghoul

Abstract:

Systems running these days are huge, complex and exist in many versions. Controlling these versions and tracking their changes became a very hard process as some versions are created using meaningless names or specifications. Many versions of a system are created with no clear difference between them. This leads to mismatching between a user’s request and the version he gets. In this paper, we present a system versions meta-modeling approach that produces versions based on system’s features. This model reduced the number of steps needed to configure a release and gave each version its unique specifications. This approach is applicable for systems that use features in its specification.

Keywords: features, meta-modeling, semantic modeling, SPL, VCS, versioning

Procedia PDF Downloads 416
3915 Complete Genome Sequence Analysis of Pasteurella multocida Subspecies multocida Serotype A Strain PMTB2.1

Authors: Shagufta Jabeen, Faez J. Firdaus Abdullah, Zunita Zakaria, Nurulfiza M. Isa, Yung C. Tan, Wai Y. Yee, Abdul R. Omar

Abstract:

Pasteurella multocida (PM) is an important veterinary opportunistic pathogen particularly associated with septicemic pasteurellosis, pneumonic pasteurellosis and hemorrhagic septicemia in cattle and buffaloes. P. multocida serotype A has been reported to cause fatal pneumonia and septicemia. Pasteurella multocida subspecies multocida of serotype A Malaysian isolate PMTB2.1 was first isolated from buffaloes died of septicemia. In this study, the genome of P. multocida strain PMTB2.1 was sequenced using third-generation sequencing technology, PacBio RS2 system and analyzed bioinformatically via de novo analysis followed by in-depth analysis based on comparative genomics. Bioinformatics analysis based on de novo assembly of PacBio raw reads generated 3 contigs followed by gap filling of aligned contigs with PCR sequencing, generated a single contiguous circular chromosome with a genomic size of 2,315,138 bp and a GC content of approximately 40.32% (Accession number CP007205). The PMTB2.1 genome comprised of 2,176 protein-coding sequences, 6 rRNA operons and 56 tRNA and 4 ncRNAs sequences. The comparative genome sequence analysis of PMTB2.1 with nine complete genomes which include Actinobacillus pleuropneumoniae, Haemophilus parasuis, Escherichia coli and five P. multocida complete genome sequences including, PM70, PM36950, PMHN06, PM3480, PMHB01 and PMTB2.1 was carried out based on OrthoMCL analysis and Venn diagram. The analysis showed that 282 CDs (13%) are unique to PMTB2.1and 1,125 CDs with orthologs in all. This reflects overall close relationship of these bacteria and supports the classification in the Gamma subdivision of the Proteobacteria. In addition, genomic distance analysis among all nine genomes indicated that PMTB2.1 is closely related with other five Pasteurella species with genomic distance less than 0.13. Synteny analysis shows subtle differences in genetic structures among different P.multocida indicating the dynamics of frequent gene transfer events among different P. multocida strains. However, PM3480 and PM70 exhibited exceptionally large structural variation since they were swine and chicken isolates. Furthermore, genomic structure of PMTB2.1 is more resembling that of PM36950 with a genomic size difference of approximately 34,380 kb (smaller than PM36950) and strain-specific Integrative and Conjugative Elements (ICE) which was found only in PM36950 is absent in PMTB2.1. Meanwhile, two intact prophages sequences of approximately 62 kb were found to be present only in PMTB2.1. One of phage is similar to transposable phage SfMu. The phylogenomic tree was constructed and rooted with E. coli, A. pleuropneumoniae and H. parasuis based on OrthoMCL analysis. The genomes of P. multocida strain PMTB2.1 were clustered with bovine isolates of P. multocida strain PM36950 and PMHB01 and were separated from avian isolate PM70 and swine isolates PM3480 and PMHN06 and are distant from Actinobacillus and Haemophilus. Previous studies based on Single Nucleotide Polymorphism (SNPs) and Multilocus Sequence Typing (MLST) unable to show a clear phylogenetic relatedness between Pasteurella multocida and the different host. In conclusion, this study has provided insight on the genomic structure of PMTB2.1 in terms of potential genes that can function as virulence factors for future study in elucidating the mechanisms behind the ability of the bacteria in causing diseases in susceptible animals.

Keywords: comparative genomics, DNA sequencing, phage, phylogenomics

Procedia PDF Downloads 156
3914 Machine Vision System for Measuring the Quality of Bulk Sun-dried Organic Raisins

Authors: Navab Karimi, Tohid Alizadeh

Abstract:

An intelligent vision-based system was designed to measure the quality and purity of raisins. A machine vision setup was utilized to capture the images of bulk raisins in ranges of 5-50% mixed pure-impure berries. The textural features of bulk raisins were extracted using Grey-level Histograms, Co-occurrence Matrix, and Local Binary Pattern (a total of 108 features). Genetic Algorithm and neural network regression were used for selecting and ranking the best features (21 features). As a result, the GLCM features set was found to have the highest accuracy (92.4%) among the other sets. Followingly, multiple feature combinations of the previous stage were fed into the second regression (linear regression) to increase accuracy, wherein a combination of 16 features was found to be the optimum. Finally, a Support Vector Machine (SVM) classifier was used to differentiate the mixtures, producing the best efficiency and accuracy of 96.2% and 97.35%, respectively.

Keywords: sun-dried organic raisin, genetic algorithm, feature extraction, ann regression, linear regression, support vector machine, south azerbaijan.

Procedia PDF Downloads 49
3913 Training a Neural Network Using Input Dropout with Aggressive Reweighting (IDAR) on Datasets with Many Useless Features

Authors: Stylianos Kampakis

Abstract:

This paper presents a new algorithm for neural networks called “Input Dropout with Aggressive Re-weighting” (IDAR) aimed specifically at datasets with many useless features. IDAR combines two techniques (dropout of input neurons and aggressive re weighting) in order to eliminate the influence of noisy features. The technique can be seen as a generalization of dropout. The algorithm is tested on two different benchmark data sets: a noisy version of the iris dataset and the MADELON data set. Its performance is compared against three other popular techniques for dealing with useless features: L2 regularization, LASSO and random forests. The results demonstrate that IDAR can be an effective technique for handling data sets with many useless features.

Keywords: neural networks, feature selection, regularization, aggressive reweighting

Procedia PDF Downloads 429
3912 An Automatic Feature Extraction Technique for 2D Punch Shapes

Authors: Awais Ahmad Khan, Emad Abouel Nasr, H. M. A. Hussein, Abdulrahman Al-Ahmari

Abstract:

Sheet-metal parts have been widely applied in electronics, communication and mechanical industries in recent decades; but the advancement in sheet-metal part design and manufacturing is still behind in comparison with the increasing importance of sheet-metal parts in modern industry. This paper presents a methodology for automatic extraction of some common 2D internal sheet metal features. The features used in this study are taken from Unipunch ™ catalogue. The extraction process starts with the data extraction from STEP file using an object oriented approach and with the application of suitable algorithms and rules, all features contained in the catalogue are automatically extracted. Since the extracted features include geometry and engineering information, they will be effective for downstream application such as feature rebuilding and process planning.

Keywords: feature extraction, internal features, punch shapes, sheet metal

Procedia PDF Downloads 589
3911 Robust Features for Impulsive Noisy Speech Recognition Using Relative Spectral Analysis

Authors: Hajer Rahali, Zied Hajaiej, Noureddine Ellouze

Abstract:

The goal of speech parameterization is to extract the relevant information about what is being spoken from the audio signal. In speech recognition systems Mel-Frequency Cepstral Coefficients (MFCC) and Relative Spectral Mel-Frequency Cepstral Coefficients (RASTA-MFCC) are the two main techniques used. It will be shown in this paper that it presents some modifications to the original MFCC method. In our work the effectiveness of proposed changes to MFCC called Modified Function Cepstral Coefficients (MODFCC) were tested and compared against the original MFCC and RASTA-MFCC features. The prosodic features such as jitter and shimmer are added to baseline spectral features. The above-mentioned techniques were tested with impulsive signals under various noisy conditions within AURORA databases.

Keywords: auditory filter, impulsive noise, MFCC, prosodic features, RASTA filter

Procedia PDF Downloads 398
3910 Enterprise Information Portal Features: Results of Content Analysis Literature Review

Authors: Michal Krčál

Abstract:

Since their introduction in 1990’s, Enterprise Information Portals (EIPs) were investigated from different perspectives (e.g. project management, technology acceptance, IS success). However, no systematic literature review was produced to systematize both the research efforts and the technology itself. This paper reports first results of an extent systematic literature review study focused on research of EIPs and its categorization, specifically it reports a conceptual model of EIP features. The previous attempt to categorize EIP features was published in 2002. For the purpose of the literature review, content of 89 articles was analyzed in order to identify and categorize features of EIPs. The methodology of the literature review was as follows. Firstly, search queries in major indexing databases (Web of Science and SCOPUS) were used. The results of queries were analyzed according to their usability for the goal of the study. Then, full-texts were coded in Atlas.ti according to previously established coding scheme. The codes were categorized and the conceptual model of EIP features was created.

Keywords: enterprise information portal, content analysis, features, systematic literature review

Procedia PDF Downloads 272
3909 Content-Based Image Retrieval Using HSV Color Space Features

Authors: Hamed Qazanfari, Hamid Hassanpour, Kazem Qazanfari

Abstract:

In this paper, a method is provided for content-based image retrieval. Content-based image retrieval system searches query an image based on its visual content in an image database to retrieve similar images. In this paper, with the aim of simulating the human visual system sensitivity to image's edges and color features, the concept of color difference histogram (CDH) is used. CDH includes the perceptually color difference between two neighboring pixels with regard to colors and edge orientations. Since the HSV color space is close to the human visual system, the CDH is calculated in this color space. In addition, to improve the color features, the color histogram in HSV color space is also used as a feature. Among the extracted features, efficient features are selected using entropy and correlation criteria. The final features extract the content of images most efficiently. The proposed method has been evaluated on three standard databases Corel 5k, Corel 10k and UKBench. Experimental results show that the accuracy of the proposed image retrieval method is significantly improved compared to the recently developed methods.

Keywords: content-based image retrieval, color difference histogram, efficient features selection, entropy, correlation

Procedia PDF Downloads 225
3908 Investigating the Stylistic Features of Advertising: Ad Design and Creation

Authors: Asma Ben Abdallah

Abstract:

Language has a powerful influence over people and their actions. The language of advertising has a very great impact on the consumer. It makes use of different features from the linguistic continuum. The present paper attempts to apply the theories of stylistics to the analysis of advertising texts. In order to decipher the stylistic features of the advertising discourse, 30 advertising text samples designed by MA Business students have been selected. These samples have been analyzed at the level of design and content. The study brings insights into the use of stylistic devices in advertising, and it reveals that both linguistic and non-linguistic features of advertisements are frequently employed to develop a well-thought-out design and content. The practical significance of the study is to highlight the specificities of the advertising genre so that people interested in the language of advertising (Business students and ESP teachers) will have a better understanding of the nature of the language used and the techniques of writing and designing ads. Similarly, those working in the advertising sphere (ad designers) will appreciate the specificities of the advertising discourse.

Keywords: the language of advertising, advertising discourse, ad design, stylistic features

Procedia PDF Downloads 209
3907 Morphological Features Fusion for Identifying INBREAST-Database Masses Using Neural Networks and Support Vector Machines

Authors: Nadia el Atlas, Mohammed el Aroussi, Mohammed Wahbi

Abstract:

In this paper a novel technique of mass characterization based on robust features-fusion is presented. The proposed method consists of mainly four stages: (a) the first phase involves segmenting the masses using edge information’s. (b) The second phase is to calculate and fuse the most relevant morphological features. (c) The last phase is the classification step which allows us to classify the images into benign and malignant masses. In this step we have implemented Support Vectors Machines (SVM) and Artificial Neural Networks (ANN), which were evaluated with the following performance criteria: confusion matrix, accuracy, sensitivity, specificity, receiver operating characteristic ROC, and error histogram. The effectiveness of this new approach was evaluated by a recently developed database: INBREAST database. The fusion of the most appropriate morphological features provided very good results. The SVM gives accuracy to within 64.3%. Whereas the ANN classifier gives better results with an accuracy of 97.5%.

Keywords: breast cancer, mammography, CAD system, features, fusion

Procedia PDF Downloads 570
3906 Using Priority Order of Basic Features for Circumscribed Masses Detection in Mammograms

Authors: Minh Dong Le, Viet Dung Nguyen, Do Huu Viet, Nguyen Huu Tu

Abstract:

In this paper, we present a new method for circumscribed masses detection in mammograms. Our method is evaluated on 23 mammographic images of circumscribed masses and 20 normal mammograms from public Mini-MIAS database. The method is quite sanguine with sensitivity (SE) of 95% with only about 1 false positive per image (FPpI). To achieve above results we carry out a progression following: Firstly, the input images are preprocessed with the aim to enhance key information of circumscribed masses; Next, we calculate and evaluate statistically basic features of abnormal regions on training database; Then, mammograms on testing database are divided into equal blocks which calculated corresponding features. Finally, using priority order of basic features to classify blocks as an abnormal or normal regions.

Keywords: mammograms, circumscribed masses, evaluated statistically, priority order of basic features

Procedia PDF Downloads 308
3905 1D Convolutional Networks to Compute Mel-Spectrogram, Chromagram, and Cochleogram for Audio Networks

Authors: Elias Nemer, Greg Vines

Abstract:

Time-frequency transformation and spectral representations of audio signals are commonly used in various machine learning applications. Training networks on frequency features such as the Mel-Spectrogram or Cochleogram have been proven more effective and convenient than training on-time samples. In practical realizations, these features are created on a different processor and/or pre-computed and stored on disk, requiring additional efforts and making it difficult to experiment with different features. In this paper, we provide a PyTorch framework for creating various spectral features as well as time-frequency transformation and time-domain filter-banks using the built-in trainable conv1d() layer. This allows computing these features on the fly as part of a larger network and enabling easier experimentation with various combinations and parameters. Our work extends the work in the literature developed for that end: First, by adding more of these features and also by allowing the possibility of either starting from initialized kernels or training them from random values. The code is written as a template of classes and scripts that users may integrate into their own PyTorch classes or simply use as is and add more layers for various applications.

Keywords: neural networks Mel-Spectrogram, chromagram, cochleogram, discrete Fourrier transform, PyTorch conv1d()

Procedia PDF Downloads 200