Search results for: DNA sequences classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2637

Search results for: DNA sequences classification

2487 Well Log Sequences Stratigraphy and Potential Reservoirs of Wells KF-1and KF-2; Kribi Oil Field, Douala-Kribi-Campo Basin, Cameroon

Authors: Nkwanyang L. Takem, Christopher M. Agyingi

Abstract:

Background and aim: An integrated interpretation of wireline logs and lithology of two selected wells (KF-1 and KF-2) of Kribi oil field within the southeastern offshore Douala/Kribi Campo Basin was carried out for sequence stratigraphic analysis of sediments penetrated by the wells. Methods: The stratigraphic units within the wells were subdivided into depositional sequences using characteristic well log patterns that were deposited between Tertiary Miocene to lower Cretaceous. Results: Nine (9) and eight (8) depositional sequences were identified respectively for KF-1 and KF-2. The sequences comprise LST (progradational packages), TSTs (retrogradational packages) and HSTs (aggradational packages), which reflect depositional systems deposited during different phases of base-level changes. The (LST) consists of Basin Floor Fans (BFF), Slope Fans and Channel Sands deposited when sea level was low and accommodation space lower than rate of sediment influx. TST consists of retrogradational marine shales deposited during high relative sea levels and when accommodation space was higher than rate of sediment influx. HST consisted of shoreface sands displaying mostly aggradational to progradational stacking patterns. Conclusion: The rapid facies changes between successive systems tracts provide potential stratigraphic traps. Reservoir stratification and continuity vary greatly between systems tracts and this enhanced development of stratigraphic traps in the area. Basin floor fans comprise sandstone of good reservoir quality, thus huge accumulation of HC can be trapped in this reservoirs.

Keywords: Douala-Kribi-Campo Basin, reservoirs, sequence strastigraphyy, system tracks

Procedia PDF Downloads 533
2486 Exploring the Role of Data Mining in Crime Classification: A Systematic Literature Review

Authors: Faisal Muhibuddin, Ani Dijah Rahajoe

Abstract:

This in-depth exploration, through a systematic literature review, scrutinizes the nuanced role of data mining in the classification of criminal activities. The research focuses on investigating various methodological aspects and recent developments in leveraging data mining techniques to enhance the effectiveness and precision of crime categorization. Commencing with an exposition of the foundational concepts of crime classification and its evolutionary dynamics, this study details the paradigm shift from conventional methods towards approaches supported by data mining, addressing the challenges and complexities inherent in the modern crime landscape. Specifically, the research delves into various data mining techniques, including K-means clustering, Naïve Bayes, K-nearest neighbour, and clustering methods. A comprehensive review of the strengths and limitations of each technique provides insights into their respective contributions to improving crime classification models. The integration of diverse data sources takes centre stage in this research. A detailed analysis explores how the amalgamation of structured data (such as criminal records) and unstructured data (such as social media) can offer a holistic understanding of crime, enriching classification models with more profound insights. Furthermore, the study explores the temporal implications in crime classification, emphasizing the significance of considering temporal factors to comprehend long-term trends and seasonality. The availability of real-time data is also elucidated as a crucial element in enhancing responsiveness and accuracy in crime classification.

Keywords: data mining, classification algorithm, naïve bayes, k-means clustering, k-nearest neigbhor, crime, data analysis, sistematic literature review

Procedia PDF Downloads 30
2485 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR). To classify the diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed them into the classifier algorithms. The dataset used in this study has been taken from University of California, Irvine (UCI) machine learning repository. Feature weighting methods including the fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, have been used and compered with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k- nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on combination of subtractive clustering based feature weighting and decision tree classifier has been obtained the classification accuracy of 100% in the screening of DR. These results have demonstrated that the proposed hybrid scheme is very promising in the medical data set classification.

Keywords: machine learning, data weighting, classification, data mining

Procedia PDF Downloads 301
2484 Feature Extraction and Classification Based on the Bayes Test for Minimum Error

Authors: Nasar Aldian Ambark Shashoa

Abstract:

Classification with a dimension reduction based on Bayesian approach is proposed in this paper . The first step is to generate a sample (parameter) of fault-free mode class and faulty mode class. The second, in order to obtain good classification performance, a selection of important features is done with the discrete karhunen-loeve expansion. Next, the Bayes test for minimum error is used to classify the classes. Finally, the results for simulated data demonstrate the capabilities of the proposed procedure.

Keywords: analytical redundancy, fault detection, feature extraction, Bayesian approach

Procedia PDF Downloads 501
2483 Network Traffic Classification Scheme for Internet Network Based on Application Categorization for Ipv6

Authors: Yaser Miaji, Mohammed Aloryani

Abstract:

The rise of recent applications in everyday implementation like videoconferencing, online recreation and voice speech communication leads to pressing the need for novel mechanism and policy to serve this steep improvement within the application itself and users‟ wants. This diversity in web traffics needs some classification and prioritization of the traffics since some traffics merit abundant attention with less delay and loss, than others. This research is intended to reinforce the mechanism by analysing the performance in application according to the proposed mechanism implemented. The mechanism used is quite direct and analytical. The mechanism is implemented by modifying the queue limit in the algorithm.

Keywords: traffic classification, IPv6, internet, application categorization

Procedia PDF Downloads 534
2482 An Unbiased Profiling of Immune Repertoire via Sequencing and Analyzing T-Cell Receptor Genes

Authors: Yi-Lin Chen, Sheng-Jou Hung, Tsunglin Liu

Abstract:

Adaptive immune system recognizes a wide range of antigens via expressing a large number of structurally distinct T cell and B cell receptor genes. The distinct receptor genes arise from complex rearrangements called V(D)J recombination, and constitute the immune repertoire. A common method of profiling immune repertoire is via amplifying recombined receptor genes using multiple primers and high-throughput sequencing. This multiplex-PCR approach is efficient; however, the resulting repertoire can be distorted because of primer bias. To eliminate primer bias, 5’ RACE is an alternative amplification approach. However, the application of RACE approach is limited by its low efficiency (i.e., the majority of data are non-regular receptor sequences, e.g., containing intronic segments) and lack of the convenient tool for analysis. We propose a computational tool that can correctly identify non-regular receptor sequences in RACE data via aligning receptor sequences against the whole gene instead of only the exon regions as done in all other tools. Using our tool, the remaining regular data allow for an accurate profiling of immune repertoire. In addition, a RACE approach is improved to yield a higher fraction of regular T-cell receptor sequences. Finally, we quantify the degree of primer bias of a multiplex-PCR approach via comparing it to the RACE approach. The results reveal significant differences in frequency of VJ combination by the two approaches. Together, we provide a new experimental and computation pipeline for an unbiased profiling of immune repertoire. As immune repertoire profiling has many applications, e.g., tracing bacterial and viral infection, detection of T cell lymphoma and minimal residual disease, monitoring cancer immunotherapy, etc., our work should benefit scientists who are interested in the applications.

Keywords: immune repertoire, T-cell receptor, 5' RACE, high-throughput sequencing, sequence alignment

Procedia PDF Downloads 161
2481 Transformers in Gene Expression-Based Classification

Authors: Babak Forouraghi

Abstract:

A genetic circuit is a collection of interacting genes and proteins that enable individual cells to implement and perform vital biological functions such as cell division, growth, death, and signaling. In cell engineering, synthetic gene circuits are engineered networks of genes specifically designed to implement functionalities that are not evolved by nature. These engineered networks enable scientists to tackle complex problems such as engineering cells to produce therapeutics within the patient's body, altering T cells to target cancer-related antigens for treatment, improving antibody production using engineered cells, tissue engineering, and production of genetically modified plants and livestock. Construction of computational models to realize genetic circuits is an especially challenging task since it requires the discovery of flow of genetic information in complex biological systems. Building synthetic biological models is also a time-consuming process with relatively low prediction accuracy for highly complex genetic circuits. The primary goal of this study was to investigate the utility of a pre-trained bidirectional encoder transformer that can accurately predict gene expressions in genetic circuit designs. The main reason behind using transformers is their innate ability (attention mechanism) to take account of the semantic context present in long DNA chains that are heavily dependent on spatial representation of their constituent genes. Previous approaches to gene circuit design, such as CNN and RNN architectures, are unable to capture semantic dependencies in long contexts as required in most real-world applications of synthetic biology. For instance, RNN models (LSTM, GRU), although able to learn long-term dependencies, greatly suffer from vanishing gradient and low-efficiency problem when they sequentially process past states and compresses contextual information into a bottleneck with long input sequences. In other words, these architectures are not equipped with the necessary attention mechanisms to follow a long chain of genes with thousands of tokens. To address the above-mentioned limitations of previous approaches, a transformer model was built in this work as a variation to the existing DNA Bidirectional Encoder Representations from Transformers (DNABERT) model. It is shown that the proposed transformer is capable of capturing contextual information from long input sequences with attention mechanism. In a previous work on genetic circuit design, the traditional approaches to classification and regression, such as Random Forrest, Support Vector Machine, and Artificial Neural Networks, were able to achieve reasonably high R2 accuracy levels of 0.95 to 0.97. However, the transformer model utilized in this work with its attention-based mechanism, was able to achieve a perfect accuracy level of 100%. Further, it is demonstrated that the efficiency of the transformer-based gene expression classifier is not dependent on presence of large amounts of training examples, which may be difficult to compile in many real-world gene circuit designs.

Keywords: transformers, generative ai, gene expression design, classification

Procedia PDF Downloads 29
2480 A Protein-Wave Alignment Tool for Frequency Related Homologies Identification in Polypeptide Sequences

Authors: Victor Prevost, Solene Landerneau, Michel Duhamel, Joel Sternheimer, Olivier Gallet, Pedro Ferrandiz, Marwa Mokni

Abstract:

The search for homologous proteins is one of the ongoing challenges in biology and bioinformatics. Traditionally, a pair of proteins is thought to be homologous when they originate from the same ancestral protein. In such a case, their sequences share similarities, and advanced scientific research effort is spent to investigate this question. On this basis, we propose the Protein-Wave Alignment Tool (”P-WAT”) developed within the framework of the France Relance 2030 plan. Our work takes into consideration the mass-related wave aspect of protein biosynthesis, by associating specific frequencies to each amino acid according to its mass. Amino acids are then regrouped within their mass category. This way, our algorithm produces specific alignments in addition to those obtained with a common amino acid coding system. For this purpose, we develop the ”P-WAT” original algorithm, able to address large protein databases, with different attributes such as species, protein names, etc. that allow us to align user’s requests with a set of specific protein sequences. The primary intent of this algorithm is to achieve efficient alignments, in this specific conceptual frame, by minimizing execution costs and information loss. Our algorithm identifies sequence similarities by searching for matches of sub-sequences of different sizes, referred to as primers. Our algorithm relies on Boolean operations upon a dot plot matrix to identify primer amino acids common to both proteins which are likely to be part of a significant alignment of peptides. From those primers, dynamic programming-like traceback operations generate alignments and alignment scores based on an adjusted PAM250 matrix.

Keywords: protein, alignment, homologous, Genodic

Procedia PDF Downloads 82
2479 Gene Prediction in DNA Sequences Using an Ensemble Algorithm Based on Goertzel Algorithm and Anti-Notch Filter

Authors: Hamidreza Saberkari, Mousa Shamsi, Hossein Ahmadi, Saeed Vaali, , MohammadHossein Sedaaghi

Abstract:

In the recent years, using signal processing tools for accurate identification of the protein coding regions has become a challenge in bioinformatics. Most of the genomic signal processing methods is based on the period-3 characteristics of the nucleoids in DNA strands and consequently, spectral analysis is applied to the numerical sequences of DNA to find the location of periodical components. In this paper, a novel ensemble algorithm for gene selection in DNA sequences has been presented which is based on the combination of Goertzel algorithm and anti-notch filter (ANF). The proposed algorithm has many advantages when compared to other conventional methods. Firstly, it leads to identify the coding protein regions more accurate due to using the Goertzel algorithm which is tuned at the desired frequency. Secondly, faster detection time is achieved. The proposed algorithm is applied on several genes, including genes available in databases BG570 and HMR195 and their results are compared to other methods based on the nucleotide level evaluation criteria. Implementation results show the excellent performance of the proposed algorithm in identifying protein coding regions, specifically in identification of small-scale gene areas.

Keywords: protein coding regions, period-3, anti-notch filter, Goertzel algorithm

Procedia PDF Downloads 365
2478 Comparison of the Classification of Cystic Renal Lesions Using the Bosniak Classification System with Contrast Enhanced Ultrasound and Magnetic Resonance Imaging to Computed Tomography: A Prospective Study

Authors: Dechen Tshering Vogel, Johannes T. Heverhagen, Bernard Kiss, Spyridon Arampatzis

Abstract:

In addition to computed tomography (CT), contrast enhanced ultrasound (CEUS), and magnetic resonance imaging (MRI) are being increasingly used for imaging of renal lesions. The aim of this prospective study was to compare the classification of complex cystic renal lesions using the Bosniak classification with CEUS and MRI to CT. Forty-eight patients with 65 cystic renal lesions were included in this study. All participants signed written informed consent. The agreement between the Bosniak classifications of complex renal lesions ( ≥ BII-F) on CEUS and MRI were compared to that of CT and were tested using Cohen’s Kappa. Sensitivity, specificity, positive and negative predictive values (PPV/NPV) and the accuracy of CEUS and MRI compared to CT in the detection of complex renal lesions were calculated. Twenty-nine (45%) out of 65 cystic renal lesions were classified as complex using CT. The agreement between CEUS and CT in the classification of complex cysts was fair (agreement 50.8%, Kappa 0.31), and was excellent between MRI and CT (agreement 93.9%, Kappa 0.88). Compared to CT, MRI had a sensitivity of 96.6%, specificity of 91.7%, a PPV of 54.7%, and an NPV of 54.7% with an accuracy of 63.1%. The corresponding values for CEUS were sensitivity 100.0%, specificity 33.3%, PPV 90.3%, and NPV 97.1% with an accuracy 93.8%. The classification of complex renal cysts based on MRI and CT scans correlated well, and MRI can be used instead of CT for this purpose. CEUS can exclude complex lesions, but due to higher sensitivity, cystic lesions tend to be upgraded. However, it is useful for initial imaging, for follow up of lesions and in those patients with contraindications to CT and MRI.

Keywords: Bosniak classification, computed tomography, contrast enhanced ultrasound, cystic renal lesions, magnetic resonance imaging

Procedia PDF Downloads 113
2477 Enhancement Method of Network Traffic Anomaly Detection Model Based on Adversarial Training With Category Tags

Authors: Zhang Shuqi, Liu Dan

Abstract:

For the problems in intelligent network anomaly traffic detection models, such as low detection accuracy caused by the lack of training samples, poor effect with small sample attack detection, a classification model enhancement method, F-ACGAN(Flow Auxiliary Classifier Generative Adversarial Network) which introduces generative adversarial network and adversarial training, is proposed to solve these problems. Generating adversarial data with category labels could enhance the training effect and improve classification accuracy and model robustness. FACGAN consists of three steps: feature preprocess, which includes data type conversion, dimensionality reduction and normalization, etc.; A generative adversarial network model with feature learning ability is designed, and the sample generation effect of the model is improved through adversarial iterations between generator and discriminator. The adversarial disturbance factor of the gradient direction of the classification model is added to improve the diversity and antagonism of generated data and to promote the model to learn from adversarial classification features. The experiment of constructing a classification model with the UNSW-NB15 dataset shows that with the enhancement of FACGAN on the basic model, the classification accuracy has improved by 8.09%, and the score of F1 has improved by 6.94%.

Keywords: data imbalance, GAN, ACGAN, anomaly detection, adversarial training, data augmentation

Procedia PDF Downloads 75
2476 International Classification of Primary Care as a Reference for Coding the Demand for Care in Primary Health Care

Authors: Souhir Chelly, Chahida Harizi, Aicha Hechaichi, Sihem Aissaoui, Leila Ben Ayed, Maha Bergaoui, Mohamed Kouni Chahed

Abstract:

Introduction: The International Classification of Primary Care (ICPC) is part of the morbidity classification system. It had 17 chapters, and each is coded by an alphanumeric code: the letter corresponds to the chapter, the number to a paragraph in the chapter. The objective of this study is to show the utility of this classification in the coding of the reasons for demand for care in Primary health care (PHC), its advantages and limits. Methods: This is a cross-sectional descriptive study conducted in 4 PHC in Ariana district. Data on the demand for care during 2 days in the same week were collected. The coding of the information was done according to the CISP. The data was entered and analyzed by the EPI Info 7 software. Results: A total of 523 demands for care were investigated. The patients who came for the consultation are predominantly female (62.72%). Most of the consultants are young with an average age of 35 ± 26 years. In the ICPC, there are 7 rubrics: 'infections' is the most common reason with 49.9%, 'other diagnoses' with 40.2%, 'symptoms and complaints' with 5.5%, 'trauma' with 2.1%, 'procedures' with 2.1% and 'neoplasm' with 0.3%. The main advantage of the ICPC is the fact of being a standardized tool. It is very suitable for classification of the reasons for demand for care in PHC according to their specificity, capacity to be used in a computerized medical file of the PHC. Its current limitations are related to the difficulty of classification of some reasons for demand for care. Conclusion: The ICPC has been developed to provide healthcare with a coding reference that takes into account their specificity. The CIM is in its 10th revision; it would gain from revision to revision to be more efficient to be generalized and used by the teams of PHC.

Keywords: international classification of primary care, medical file, primary health care, Tunisia

Procedia PDF Downloads 236
2475 Fractal Behaviour of Earthquake Sequences in Himalaya

Authors: Kamal, Adil Ahmad

Abstract:

Earthquakes are among the most versatile natural and dynamic processes, and hence a fractal model is considered to be the best representative of the same. We present a novel method to process and analyse information hidden in earthquake sequences using Fractal Dimensions and Iterative Function Systems (IFS). Spatial and temporal variations in the fractal dimensions of seismicity observed around the Indian peninsula in last 30 years are studied. This was used as a possible precursor before large earthquakes in the region. IFS images for observed seismicity in the Himalayan belt were also obtained. We scan the whole data set and coarse grain of a selected window to reduce it to four bins. A critical analysis of four-cornered chaos-game clearly shows that the spatial variation in earthquake occurrences in Himalayan range is not random. Two subzones of Himalaya have a tendency to follow each other in time.

Keywords: earthquakes, fractals, Himalaya, iterated function systems

Procedia PDF Downloads 269
2474 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to rapid growth of text documents in digital form, automated text classification has become an important research in the last two decades. The major challenge of text document representations are high dimension, sparsity, volume and semantics. Since the terms are only features that can be found in documents, selection of good terms (features) plays an very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency and accuracy. In this paper, we present a quantitative analysis of most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tfidf ), Mutual Information (MI), Information Gain (IG), CHISquare (x2), Term Frequency-Relevance Frequency (tfrf ), Term Strength (TS), Ambiguity Measure (AM) and Symbolic Feature Selection (SFS) to classify text documents. We evaluated all the feature selection methods on standard datasets like 20 Newsgroups, 4 University dataset and Reuters-21578.

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 423
2473 DNA Barcoding for Identification of Dengue Vectors from Assam and Arunachal Pradesh: North-Eastern States in India

Authors: Monika Soni, Shovonlal Bhowmick, Chandra Bhattacharya, Jitendra Sharma, Prafulla Dutta, Jagadish Mahanta

Abstract:

Aedes aegypti and Aedes albopictus are considered as two major vectors to transmit dengue virus. In North-east India, two states viz. Assam and Arunachal Pradesh are known to be high endemic zone for dengue and Chikungunya viral infection. The taxonomical classification of medically important vectors are important for mapping of actual evolutionary trends and epidemiological studies. However, misidentification of mosquito species in field-collected mosquito specimens could have a negative impact which may affect vector-borne disease control policy. DNA barcoding is a prominent method to record available species, differentiate from new addition and change of population structure. In this study, a combined approach of a morphological and molecular technique of DNA barcoding was adopted to explore sequence variation in mitochondrial cytochrome c oxidase subunit I (COI) gene within dengue vectors. The study has revealed the map distribution of the dengue vector from two states i.e. Assam and Arunachal Pradesh, India. Approximate five hundred mosquito specimens were collected from different parts of two states, and their morphological features were compared with the taxonomic keys. The analysis of detailed taxonomic study revealed identification of two species Aedes aegypti and Aedes albopictus. The species aegypti comprised of 66.6% of the specimen and represented as dominant dengue vector species. The sequences obtained through standard DNA barcoding protocol were compared with public databases, viz. GenBank and BOLD. The sequences of all Aedes albopictus have shown 100% similarity whereas sequence of Aedes aegypti has shown 99.77 - 100% similarity of COI gene with that of different geographically located same species based on BOLD database search. From dengue prevalent different geographical regions fifty-nine sequences were retrieved from NCBI and BOLD databases of the same and related taxa to determine the evolutionary distance model based on the phylogenetic analysis. Neighbor-Joining (NJ) and Maximum Likelihood (ML) phylogenetic tree was constructed in MEGA6.06 software with 1000 bootstrap replicates using Kimura-2-Parameter model. Data were analyzed for sequence divergence and found that intraspecific divergence ranged from 0.0 to 2.0% and interspecific divergence ranged from 11.0 to 12.0%. The transitional and transversional substitutions were tested individually. The sequences were deposited in NCBI: GenBank database. This observation claimed the first DNA barcoding analysis of Aedes mosquitoes from North-eastern states in India and also confirmed the range expansion of two important mosquito species. Overall, this study insight into the molecular ecology of the dengue vectors from North-eastern India which will enhance the understanding to improve the existing entomological surveillance and vector incrimination program.

Keywords: COI, dengue vectors, DNA barcoding, molecular identification, North-east India, phylogenetics

Procedia PDF Downloads 267
2472 Evaluation and Fault Classification for Healthcare Robot during Sit-To-Stand Performance through Center of Pressure

Authors: Tianyi Wang, Hieyong Jeong, An Guo, Yuko Ohno

Abstract:

Healthcare robot for assisting sit-to-stand (STS) performance had aroused numerous research interests. To author’s best knowledge, knowledge about how evaluating healthcare robot is still unknown. Robot should be labeled as fault if users feel demanding during STS when they are assisted by robot. In this research, we aim to propose a method to evaluate sit-to-stand assist robot through center of pressure (CoP), then classify different STS performance. Experiments were executed five times with ten healthy subjects under four conditions: two self-performed STSs with chair heights of 62 cm and 43 cm, and two robot-assisted STSs with chair heights of 43 cm and robot end-effect speed of 2 s and 5 s. CoP was measured using a Wii Balance Board (WBB). Bayesian classification was utilized to classify STS performance. The results showed that faults occurred when decreased the chair height and slowed robot assist speed. Proposed method for fault classification showed high probability of classifying fault classes form others. It was concluded that faults for STS assist robot could be detected by inspecting center of pressure and be classified through proposed classification algorithm.

Keywords: center of pressure, fault classification, healthcare robot, sit-to-stand movement

Procedia PDF Downloads 165
2471 From Primer Generation to Chromosome Identification: A Primer Generation Genotyping Method for Bacterial Identification and Typing

Authors: Wisam H. Benamer, Ehab A. Elfallah, Mohamed A. Elshaari, Farag A. Elshaari

Abstract:

A challenge for laboratories is to provide bacterial identification and antibiotic sensitivity results within a short time. Hence, advancement in the required technology is desirable to improve timing, accuracy and quality. Even with the current advances in methods used for both phenotypic and genotypic identification of bacteria the need is there to develop method(s) that enhance the outcome of bacteriology laboratories in accuracy and time. The hypothesis introduced here is based on the assumption that the chromosome of any bacteria contains unique sequences that can be used for its identification and typing. The outcome of a pilot study designed to test this hypothesis is reported in this manuscript. Methods: The complete chromosome sequences of several bacterial species were downloaded to use as search targets for unique sequences. Visual basic and SQL server (2014) were used to generate a complete set of 18-base long primers, a process started with reverse translation of randomly chosen 6 amino acids to limit the number of the generated primers. In addition, the software used to scan the downloaded chromosomes using the generated primers for similarities was designed, and the resulting hits were classified according to the number of similar chromosomal sequences, i.e., unique or otherwise. Results: All primers that had identical/similar sequences in the selected genome sequence(s) were classified according to the number of hits in the chromosomes search. Those that were identical to a single site on a single bacterial chromosome were referred to as unique. On the other hand, most generated primers sequences were identical to multiple sites on a single or multiple chromosomes. Following scanning, the generated primers were classified based on ability to differentiate between medically important bacterial and the initial results looks promising. Conclusion: A simple strategy that started by generating primers was introduced; the primers were used to screen bacterial genomes for match. Primer(s) that were uniquely identical to specific DNA sequence on a specific bacterial chromosome were selected. The identified unique sequence can be used in different molecular diagnostic techniques, possibly to identify bacteria. In addition, a single primer that can identify multiple sites in a single chromosome can be exploited for region or genome identification. Although genomes sequences draft of isolates of organism DNA enable high throughput primer design using alignment strategy, and this enhances diagnostic performance in comparison to traditional molecular assays. In this method the generated primers can be used to identify an organism before the draft sequence is completed. In addition, the generated primers can be used to build a bank for easy access of the primers that can be used to identify bacteria.

Keywords: bacteria chromosome, bacterial identification, sequence, primer generation

Procedia PDF Downloads 165
2470 Isolation and Classification of Red Blood Cells in Anemic Microscopic Images

Authors: Jameela Ali Alkrimi, Abdul Rahim Ahmad, Azizah Suliman, Loay E. George

Abstract:

Red blood cells (RBCs) are among the most commonly and intensively studied type of blood cells in cell biology. The lack of RBCs is a condition characterized by lower than normal hemoglobin level; this condition is referred to as 'anemia'. In this study, a software was developed to isolate RBCs by using a machine learning approach to classify anemic RBCs in microscopic images. Several features of RBCs were extracted using image processing algorithms, including principal component analysis (PCA). With the proposed method, RBCs were isolated in 34 second from an image containing 18 to 27 cells. We also proposed that PCA could be performed to increase the speed and efficiency of classification. Our classifier algorithm yielded accuracy rates of 100%, 99.99%, and 96.50% for K-nearest neighbor (K-NN) algorithm, support vector machine (SVM), and neural network ANN, respectively. Classification was evaluated in highly sensitivity, specificity, and kappa statistical parameters. In conclusion, the classification results were obtained for a short time period with more efficient when PCA was used.

Keywords: red blood cells, pre-processing image algorithms, classification algorithms, principal component analysis PCA, confusion matrix, kappa statistical parameters, ROC

Procedia PDF Downloads 379
2469 On the Utility of Bidirectional Transformers in Gene Expression-Based Classification

Authors: Babak Forouraghi

Abstract:

A genetic circuit is a collection of interacting genes and proteins that enable individual cells to implement and perform vital biological functions such as cell division, growth, death, and signaling. In cell engineering, synthetic gene circuits are engineered networks of genes specifically designed to implement functionalities that are not evolved by nature. These engineered networks enable scientists to tackle complex problems such as engineering cells to produce therapeutics within the patient's body, altering T cells to target cancer-related antigens for treatment, improving antibody production using engineered cells, tissue engineering, and production of genetically modified plants and livestock. Construction of computational models to realize genetic circuits is an especially challenging task since it requires the discovery of the flow of genetic information in complex biological systems. Building synthetic biological models is also a time-consuming process with relatively low prediction accuracy for highly complex genetic circuits. The primary goal of this study was to investigate the utility of a pre-trained bidirectional encoder transformer that can accurately predict gene expressions in genetic circuit designs. The main reason behind using transformers is their innate ability (attention mechanism) to take account of the semantic context present in long DNA chains that are heavily dependent on the spatial representation of their constituent genes. Previous approaches to gene circuit design, such as CNN and RNN architectures, are unable to capture semantic dependencies in long contexts, as required in most real-world applications of synthetic biology. For instance, RNN models (LSTM, GRU), although able to learn long-term dependencies, greatly suffer from vanishing gradient and low-efficiency problem when they sequentially process past states and compresses contextual information into a bottleneck with long input sequences. In other words, these architectures are not equipped with the necessary attention mechanisms to follow a long chain of genes with thousands of tokens. To address the above-mentioned limitations, a transformer model was built in this work as a variation to the existing DNA Bidirectional Encoder Representations from Transformers (DNABERT) model. It is shown that the proposed transformer is capable of capturing contextual information from long input sequences with an attention mechanism. In previous works on genetic circuit design, the traditional approaches to classification and regression, such as Random Forrest, Support Vector Machine, and Artificial Neural Networks, were able to achieve reasonably high R2 accuracy levels of 0.95 to 0.97. However, the transformer model utilized in this work, with its attention-based mechanism, was able to achieve a perfect accuracy level of 100%. Further, it is demonstrated that the efficiency of the transformer-based gene expression classifier is not dependent on the presence of large amounts of training examples, which may be difficult to compile in many real-world gene circuit designs.

Keywords: machine learning, classification and regression, gene circuit design, bidirectional transformers

Procedia PDF Downloads 33
2468 An Attempt at the Multi-Criterion Classification of Small Towns

Authors: Jerzy Banski

Abstract:

The basic aim of this study is to discuss and assess different classifications and research approaches to small towns that take their social and economic functions into account, as well as relations with surrounding areas. The subject literature typically includes three types of approaches to the classification of small towns: 1) the structural, 2) the location-related, and 3) the mixed. The structural approach allows for the grouping of towns from the point of view of the social, cultural and economic functions they discharge. The location-related approach draws on the idea of there being a continuum between the center and the periphery. A mixed classification making simultaneous use of the different approaches to research brings the most information to bear in regard to categories of the urban locality. Bearing in mind the approaches to classification, it is possible to propose a synthetic method for classifying small towns that takes account of economic structure, location and the relationship between the towns and their surroundings. In the case of economic structure, the small centers may be divided into two basic groups – those featuring a multi-branch structure and those that are specialized economically. A second element of the classification reflects the locations of urban centers. Two basic types can be identified – the small town within the range of impact of a large agglomeration, or else the town outside such areas, which is to say located peripherally. The third component of the classification arises out of small towns’ relations with their surroundings. In consequence, it is possible to indicate 8 types of small-town: from local centers enjoying good accessibility and a multi-branch economic structure to peripheral supra-local centers characterised by a specialized economic structure.

Keywords: small towns, classification, functional structure, localization

Procedia PDF Downloads 159
2467 A Novel Epitope Prediction for Vaccine Designing against Ebola Viral Envelope Proteins

Authors: Manju Kanu, Subrata Sinha, Surabhi Johari

Abstract:

Viral proteins of Ebola viruses belong to one of the best studied viruses; however no effective prevention against EBOV has been developed. Epitope-based vaccines provide a new strategy for prophylactic and therapeutic application of pathogen-specific immunity. A critical requirement of this strategy is the identification and selection of T-cell epitopes that act as vaccine targets. This study describes current methodologies for the selection process, with Ebola virus as a model system. Hence great challenge in the field of ebola virus research is to design universal vaccine. A combination of publicly available bioinformatics algorithms and computational tools are used to screen and select antigen sequences as potential T-cell epitopes of supertypes Human Leukocyte Antigen (HLA) alleles. MUSCLE and MOTIF tools were used to find out most conserved peptide sequences of viral proteins. Immunoinformatics tools were used for prediction of immunogenic peptides of viral proteins in zaire strains of Ebola virus. Putative epitopes for viral proteins (VP) were predicted from conserved peptide sequences of VP. Three tools NetCTL 1.2, BIMAS and Syfpeithi were used to predict the Class I putative epitopes while three tools, ProPred, IEDB-SMM-align and NetMHCII 2.2 were used to predict the Class II putative epitopes. B cell epitopes were predicted by BCPREDS 1.0. Immunogenic peptides were identified and selected manually by putative epitopes predicted from online tools individually for both MHC classes. Finally sequences of predicted peptides for both MHC classes were looked for common region which was selected as common immunogenic peptide. The immunogenic peptides were found for viral proteins of Ebola virus: epitopes FLESGAVKY, SSLAKHGEY. These predicted peptides could be promising candidates to be used as target for vaccine design.

Keywords: epitope, b cell, immunogenicity, ebola

Procedia PDF Downloads 284
2466 Multi-Class Text Classification Using Ensembles of Classifiers

Authors: Syed Basit Ali Shah Bukhari, Yan Qiang, Saad Abdul Rauf, Syed Saqlaina Bukhari

Abstract:

Text Classification is the methodology to classify any given text into the respective category from a given set of categories. It is highly important and vital to use proper set of pre-processing , feature selection and classification techniques to achieve this purpose. In this paper we have used different ensemble techniques along with variance in feature selection parameters to see the change in overall accuracy of the result and also on some other individual class based features which include precision value of each individual category of the text. After subjecting our data through pre-processing and feature selection techniques , different individual classifiers were tested first and after that classifiers were combined to form ensembles to increase their accuracy. Later we also studied the impact of decreasing the classification categories on over all accuracy of data. Text classification is highly used in sentiment analysis on social media sites such as twitter for realizing people’s opinions about any cause or it is also used to analyze customer’s reviews about certain products or services. Opinion mining is a vital task in data mining and text categorization is a back-bone to opinion mining.

Keywords: Natural Language Processing, Ensemble Classifier, Bagging Classifier, AdaBoost

Procedia PDF Downloads 205
2465 Genetic Approach to Target Putative PKS Genes Involved in Ochratoxin a Biosynthesis within Aspergillus Section Nigri, As a Main Cause of Human Nephropathy

Authors: Sabah Ben Fredj Melki, Yves Brygoo, Ahmed Mliki

Abstract:

A 700 pb PCR-derived DNA fragment was isolated from Aspergillus carbonarius, Aspergillus niger, and Aspergillus tubingensis using degenerated primers (LC1-LC2c) and two newly designed primer pairs (KSLB-LC6) for Aspergillus niger and (AFl1F-LC2) for Aspergillus tubingensis developed for the acyl transferase (AT) and the KS domains of fungal PKSs. DNA from the most of black Aspergillus species currently recognized was tested. Herein, we report on the identification and characterisation of a part of the novel putative OTA-polyketide synthase gene in A. carbonarius “ACPks”, A. niger “ANPks” and A. tubingenis “ATPks”. The sequences were aligned and analyzed using phylogenetic methods. Primers used in this study showed general applicability and other Aspergillus species belonging to section Nigri were successfully amplified especially in A. niger and A. tubingenis. The predicted amino acid sequences “ACPks” displayed 66 to 81% similarities to different polyketide synthase genes while “ANPks” similarities varied from 68 to 71% and “ATPks” were from 81 to 97%. The AT and the KS domains appeared to be specific for a particular type of fungal PKSs and were related to PKSs involved in different mycotoxin biosynthesis pathways, including ochratoxin A. The sequences presented in this work have a high utility for the discovery of novel fungal PKS gene clusters.

Keywords: Pks genes, OTA Biosynthesis, Aspergillus Nigri, sequence analysis

Procedia PDF Downloads 41
2464 Determination of the Bank's Customer Risk Profile: Data Mining Applications

Authors: Taner Ersoz, Filiz Ersoz, Seyma Ozbilge

Abstract:

In this study, the clients who applied to a bank branch for loan were analyzed through data mining. The study was composed of the information such as amounts of loans received by personal and SME clients working with the bank branch, installment numbers, number of delays in loan installments, payments available in other banks and number of banks to which they are in debt between 2010 and 2013. The client risk profile was examined through Classification and Regression Tree (CART) analysis, one of the decision tree classification methods. At the end of the study, 5 different types of customers have been determined on the decision tree. The classification of these types of customers has been created with the rating of those posing a risk for the bank branch and the customers have been classified according to the risk ratings.

Keywords: client classification, loan suitability, risk rating, CART analysis

Procedia PDF Downloads 317
2463 Performance Prediction Methodology of Slow Aging Assets

Authors: M. Ben Slimene, M.-S. Ouali

Abstract:

Asset management of urban infrastructures faces a multitude of challenges that need to be overcome to obtain a reliable measurement of performances. Predicting the performance of slowly aging systems is one of those challenges, which helps the asset manager to investigate specific failure modes and to undertake the appropriate maintenance and rehabilitation interventions to avoid catastrophic failures as well as to optimize the maintenance costs. This article presents a methodology for modeling the deterioration of slowly degrading assets based on an operating history. It consists of extracting degradation profiles by grouping together assets that exhibit similar degradation sequences using an unsupervised classification technique derived from artificial intelligence. The obtained clusters are used to build the performance prediction models. This methodology is applied to a sample of a stormwater drainage culvert dataset.

Keywords: artificial Intelligence, clustering, culvert, regression model, slow degradation

Procedia PDF Downloads 76
2462 Multi-Objective Evolutionary Computation Based Feature Selection Applied to Behaviour Assessment of Children

Authors: F. Jiménez, R. Jódar, M. Martín, G. Sánchez, G. Sciavicco

Abstract:

Abstract—Attribute or feature selection is one of the basic strategies to improve the performances of data classification tasks, and, at the same time, to reduce the complexity of classifiers, and it is a particularly fundamental one when the number of attributes is relatively high. Its application to unsupervised classification is restricted to a limited number of experiments in the literature. Evolutionary computation has already proven itself to be a very effective choice to consistently reduce the number of attributes towards a better classification rate and a simpler semantic interpretation of the inferred classifiers. We present a feature selection wrapper model composed by a multi-objective evolutionary algorithm, the clustering method Expectation-Maximization (EM), and the classifier C4.5 for the unsupervised classification of data extracted from a psychological test named BASC-II (Behavior Assessment System for Children - II ed.) with two objectives: Maximizing the likelihood of the clustering model and maximizing the accuracy of the obtained classifier. We present a methodology to integrate feature selection for unsupervised classification, model evaluation, decision making (to choose the most satisfactory model according to a a posteriori process in a multi-objective context), and testing. We compare the performance of the classifier obtained by the multi-objective evolutionary algorithms ENORA and NSGA-II, and the best solution is then validated by the psychologists that collected the data.

Keywords: evolutionary computation, feature selection, classification, clustering

Procedia PDF Downloads 339
2461 Mood Recognition Using Indian Music

Authors: Vishwa Joshi

Abstract:

The study of mood recognition in the field of music has gained a lot of momentum in the recent years with machine learning and data mining techniques and many audio features contributing considerably to analyze and identify the relation of mood plus music. In this paper we consider the same idea forward and come up with making an effort to build a system for automatic recognition of mood underlying the audio song’s clips by mining their audio features and have evaluated several data classification algorithms in order to learn, train and test the model describing the moods of these audio songs and developed an open source framework. Before classification, Preprocessing and Feature Extraction phase is necessary for removing noise and gathering features respectively.

Keywords: music, mood, features, classification

Procedia PDF Downloads 473
2460 Discriminant Analysis as a Function of Predictive Learning to Select Evolutionary Algorithms in Intelligent Transportation System

Authors: Jorge A. Ruiz-Vanoye, Ocotlán Díaz-Parra, Alejandro Fuentes-Penna, Daniel Vélez-Díaz, Edith Olaco García

Abstract:

In this paper, we present the use of the discriminant analysis to select evolutionary algorithms that better solve instances of the vehicle routing problem with time windows. We use indicators as independent variables to obtain the classification criteria, and the best algorithm from the generic genetic algorithm (GA), random search (RS), steady-state genetic algorithm (SSGA), and sexual genetic algorithm (SXGA) as the dependent variable for the classification. The discriminant classification was trained with classic instances of the vehicle routing problem with time windows obtained from the Solomon benchmark. We obtained a classification of the discriminant analysis of 66.7%.

Keywords: Intelligent Transportation Systems, data-mining techniques, evolutionary algorithms, discriminant analysis, machine learning

Procedia PDF Downloads 435
2459 Air Classification of Dust from Steel Converter Secondary De-dusting for Zinc Enrichment

Authors: C. Lanzerstorfer

Abstract:

The off-gas from the basic oxygen furnace (BOF), where pig iron is converted into steel, is treated in the primary ventilation system. This system is in full operation only during oxygen-blowing when the BOF converter vessel is in a vertical position. When pig iron and scrap are charged into the BOF and when slag or steel are tapped, the vessel is tilted. The generated emissions during charging and tapping cannot be captured by the primary off-gas system. To capture these emissions, a secondary ventilation system is usually installed. The emissions are captured by a canopy hood installed just above the converter mouth in tilted position. The aim of this study was to investigate the dependence of Zn and other components on the particle size of BOF secondary ventilation dust. Because of the high temperature of the BOF process it can be expected that Zn will be enriched in the fine dust fractions. If Zn is enriched in the fine fractions, classification could be applied to split the dust into two size fractions with a different content of Zn. For this air classification experiments with dust from the secondary ventilation system of a BOF were performed. The results show that Zn and Pb are highly enriched in the finest dust fraction. For Cd, Cu and Sb the enrichment is less. In contrast, the non-volatile metals Al, Fe, Mn and Ti were depleted in the fine fractions. Thus, air classification could be considered for the treatment of dust from secondary BOF off-gas cleaning.

Keywords: air classification, converter dust, recycling, zinc

Procedia PDF Downloads 405
2458 3D Reconstruction of Human Body Based on Gender Classification

Authors: Jiahe Liu, Hongyang Yu, Feng Qian, Miao Luo

Abstract:

SMPL-X was a powerful parametric human body model that included male, neutral, and female models, with significant gender differences between these three models. During the process of 3D human body reconstruction, the correct selection of standard templates was crucial for obtaining accurate results. To address this issue, we developed an efficient gender classification algorithm to automatically select the appropriate template for 3D human body reconstruction. The key to this gender classification algorithm was the precise analysis of human body features. By using the SMPL-X model, the algorithm could detect and identify gender features of the human body, thereby determining which standard template should be used. The accuracy of this algorithm made the 3D reconstruction process more accurate and reliable, as it could adjust model parameters based on individual gender differences. SMPL-X and the related gender classification algorithm have brought important advancements to the field of 3D human body reconstruction. By accurately selecting standard templates, they have improved the accuracy of reconstruction and have broad potential in various application fields. These technologies continue to drive the development of the 3D reconstruction field, providing us with more realistic and accurate human body models.

Keywords: gender classification, joint detection, SMPL-X, 3D reconstruction

Procedia PDF Downloads 38