Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 1142

Search results for: microarray dataset

1142 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analysing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics

Procedia PDF Downloads 530

1141 A Review of Effective Gene Selection Methods for Cancer Classification Using Microarray Gene Expression Profile

Authors: Hala Alshamlan, Ghada Badr, Yousef Alohali

Abstract:

Cancer is one of the dreadful diseases, which causes considerable death rate in humans. DNA microarray-based gene expression profiling has been emerged as an efficient technique for cancer classification, as well as for diagnosis, prognosis, and treatment purposes. In recent years, a DNA microarray technique has gained more attraction in both scientific and in industrial fields. It is important to determine the informative genes that cause cancer to improve early cancer diagnosis and to give effective chemotherapy treatment. In order to gain deep insight into the cancer classification problem, it is necessary to take a closer look at the proposed gene selection methods. We believe that they should be an integral preprocessing step for cancer classification. Furthermore, finding an accurate gene selection method is a very significant issue in a cancer classification area because it reduces the dimensionality of microarray dataset and selects informative genes. In this paper, we classify and review the state-of-art gene selection methods. We proceed by evaluating the performance of each gene selection approach based on their classification accuracy and number of informative genes. In our evaluation, we will use four benchmark microarray datasets for the cancer diagnosis (leukemia, colon, lung, and prostate). In addition, we compare the performance of gene selection method to investigate the effective gene selection method that has the ability to identify a small set of marker genes, and ensure high cancer classification accuracy. To the best of our knowledge, this is the first attempt to compare gene selection approaches for cancer classification using microarray gene expression profile.

Keywords: gene selection, feature selection, cancer classification, microarray, gene expression profile

Procedia PDF Downloads 417

1140 Microarray Data Visualization and Preprocessing Using R and Bioconductor

Authors: Ruchi Yadav, Shivani Pandey, Prachi Srivastava

Abstract:

Microarrays provide a rich source of data on the molecular working of cells. Each microarray reports on the abundance of tens of thousands of mRNAs. Virtually every human disease is being studied using microarrays with the hope of finding the molecular mechanisms of disease. Bioinformatics analysis plays an important part of processing the information embedded in large-scale expression profiling studies and for laying the foundation for biological interpretation. A basic, yet challenging task in the analysis of microarray gene expression data is the identification of changes in gene expression that are associated with particular biological conditions. Careful statistical design and analysis are essential to improve the efficiency and reliability of microarray experiments throughout the data acquisition and analysis process. One of the most popular platforms for microarray analysis is Bioconductor, an open source and open development software project based on the R programming language. This paper describes specific procedures for conducting quality assessment, visualization and preprocessing of Affymetrix Gene Chip and also details the different bioconductor packages used to analyze affymetrix microarray data and describe the analysis and outcome of each plots.

Keywords: microarray analysis, R language, affymetrix visualization, bioconductor

Procedia PDF Downloads 446

1139 A Gene Selection Algorithm for Microarray Cancer Classification Using an Improved Particle Swarm Optimization

Authors: Arfan Ali Nagra, Tariq Shahzad, Meshal Alharbi, Khalid Masood Khan, Muhammad Mugees Asif, Taher M. Ghazal, Khmaies Ouahada

Abstract:

Gene selection is an essential step for the classification of microarray cancer data. Gene expression cancer data (DNA microarray) facilitates computing the robust and concurrent expression of various genes. Particle swarm optimization (PSO) requires simple operators and less number of parameters for tuning the model in gene selection. The selection of a prognostic gene with small redundancy is a great challenge for the researcher as there are a few complications in PSO based selection method. In this research, a new variant of PSO (Self-inertia weight adaptive PSO) has been proposed. In the proposed algorithm, SIW-APSO-ELM is explored to achieve gene selection prediction accuracies. This new algorithm balances the exploration capabilities of the improved inertia weight adaptive particle swarm optimization and the exploitation. The self-inertia weight adaptive particle swarm optimization (SIW-APSO) is used to search the solution. The SIW-APSO is updated with an evolutionary process in such a way that each particle iteratively improves its velocities and positions. The extreme learning machine (ELM) has been designed for the selection procedure. The proposed method has been to identify a number of genes in the cancer dataset. The classification algorithm contains ELM, K- centroid nearest neighbor (KCNN), and support vector machine (SVM) to attain high forecast accuracy as compared to the start-of-the-art methods on microarray cancer datasets that show the effectiveness of the proposed method.

Keywords: microarray cancer, improved PSO, ELM, SVM, evolutionary algorithms

Procedia PDF Downloads 50

1138 Evaluation of DNA Microarray System in the Identification of Microorganisms Isolated from Blood

Authors: Merih Şimşek, Recep Keşli, Özgül Çetinkaya, Cengiz Demir, Adem Aslan

Abstract:

Bacteremia is a clinical entity with high morbidity and mortality rates when immediate diagnose, or treatment cannot be achieved. Microorganisms which can cause sepsis or bacteremia are easily isolated from blood cultures. Fifty-five positive blood cultures were included in this study. Microorganisms in 55 blood cultures were isolated by conventional microbiological methods; afterwards, microorganisms were defined in terms of the phenotypic aspects by the Vitek-2 system. The same microorganisms in all blood culture samples were defined in terms of genotypic aspects again by Multiplex-PCR DNA Low-Density Microarray System. At the end of the identification process, the DNA microarray system’s success in identification was evaluated based on the Vitek-2 system. The Vitek-2 system and DNA Microarray system were able to identify the same microorganisms in 53 samples; on the other hand, different microorganisms were identified in the 2 blood cultures by DNA Microarray system. The microorganisms identified by Vitek-2 system were found to be identical to 96.4 % of microorganisms identified by DNA Microarrays system. In addition to bacteria identified by Vitek-2, the presence of a second bacterium has been detected in 5 blood cultures by the DNA Microarray system. It was identified 18 of 55 positive blood culture as E.coli strains with both Vitek 2 and DNA microarray systems. The same identification numbers were found 6 and 8 for Acinetobacter baumanii, 10 and 10 for K.pneumoniae, 5 and 5 for S.aureus, 7 and 11 for Enterococcus spp, 5 and 5 for P.aeruginosa, 2 and 2 for C.albicans respectively. According to these results, DNA Microarray system requires both a technical device and experienced staff support; besides, it requires more expensive kits than Vitek-2. However, this method should be used in conjunction with conventional microbiological methods. Thus, large microbiology laboratories will produce faster, more sensitive and more successful results in the identification of cultured microorganisms.

Keywords: microarray, Vitek-2, blood culture, bacteremia

Procedia PDF Downloads 309

1137 EnumTree: An Enumerative Biclustering Algorithm for DNA Microarray Data

Authors: Haifa Ben Saber, Mourad Elloumi

Abstract:

In a number of domains, like in DNA microarray data analysis, we need to cluster simultaneously rows (genes) and columns (conditions) of a data matrix to identify groups of constant rows with a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. We introduce a new algorithm called, Enumerative tree (EnumTree) for biclustering of binary microarray data. is an algorithm adopting the approach of enumerating biclusters. This algorithm extracts all biclusters consistent good quality. The main idea of EnumLat is the construction of a new tree structure to represent adequately different biclusters discovered during the process of enumeration. This algorithm adopts the strategy of all biclusters at a time. The performance of the proposed algorithm is assessed using both synthetic and real DNA micryarray data, our algorithm outperforms other biclustering algorithms for binary microarray data. Biclusters with different numbers of rows. Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevent biclusters.

Keywords: DNA microarray, biclustering, gene expression data, tree, datamining.

Procedia PDF Downloads 340

1136 Integration of Microarray Data into a Genome-Scale Metabolic Model to Study Flux Distribution after Gene Knockout

Authors: Mona Heydari, Ehsan Motamedian, Seyed Abbas Shojaosadati

Abstract:

Prediction of perturbations after genetic manipulation (especially gene knockout) is one of the important challenges in systems biology. In this paper, a new algorithm is introduced that integrates microarray data into the metabolic model. The algorithm was used to study the change in the cell phenotype after knockout of Gss gene in Escherichia coli BW25113. Algorithm implementation indicated that gene deletion resulted in more activation of the metabolic network. Growth yield was more and less regulating gene were identified for mutant in comparison with the wild-type strain.

Keywords: metabolic network, gene knockout, flux balance analysis, microarray data, integration

Procedia PDF Downloads 549

1135 The Identification of Combined Genomic Expressions as a Diagnostic Factor for Oral Squamous Cell Carcinoma

Authors: Ki-Yeo Kim

Abstract:

Trends in genetics are transforming in order to identify differential coexpressions of correlated gene expression rather than the significant individual gene. Moreover, it is known that a combined biomarker pattern improves the discrimination of a specific cancer. The identification of the combined biomarker is also necessary for the early detection of invasive oral squamous cell carcinoma (OSCC). To identify the combined biomarker that could improve the discrimination of OSCC, we explored an appropriate number of genes in a combined gene set in order to attain the highest level of accuracy. After detecting a significant gene set, including the pre-defined number of genes, a combined expression was identified using the weights of genes in a gene set. We used the Principal Component Analysis (PCA) for the weight calculation. In this process, we used three public microarray datasets. One dataset was used for identifying the combined biomarker, and the other two datasets were used for validation. The discrimination accuracy was measured by the out-of-bag (OOB) error. There was no relation between the significance and the discrimination accuracy in each individual gene. The identified gene set included both significant and insignificant genes. One of the most significant gene sets in the classification of normal and OSCC included MMP1, SOCS3 and ACOX1. Furthermore, in the case of oral dysplasia and OSCC discrimination, two combined biomarkers were identified. The combined genomic expression achieved better performance in the discrimination of different conditions than in a single significant gene. Therefore, it could be expected that accurate diagnosis for cancer could be possible with a combined biomarker.

Keywords: oral squamous cell carcinoma, combined biomarker, microarray dataset, correlated genes

Procedia PDF Downloads 387

1134 A Significant Clinical Role for the Capitalbio™ DNA Microarray in the Diagnosis of Multidrug-Resistant Tuberculosis in Patients with Tuberculous Spondylitis Simultaneous with Pulmonary Tuberculosis in High Prevalence Settings in China

Authors: Wenjie Wu, Peng Cheng, Zehua Zhang, Fei Luo, Feng Wu, Min Zhong, Jianzhong Xu

Abstract:

Background: There has been limited research into the therapeutic efficacy of rapid diagnosis of spinal tuberculosis complicated with pulmonary tuberculosis. We attempted to discover whether the utilization of a DNA microarray assay to detect multidrug-resistant spinal tuberculosis complicated with pulmonary tuberculosis can improve clinical outcomes. Methods: A prospective study was conducted from February 2006 to September 2015. One hundred and forty-three consecutive culture–confirmed, clinically and imaging diagnosed MDR-TB patients with spinal tuberculosis complicated by pulmonary tuberculosis were enrolled into the study. The initial time to treatment for MDR-TB, the method of infection control, radiological indicators of spinal tubercular infectious foci, culture conversion, and adverse drug reactions were compared with the standard culture methods. Results: Of the total of 143 MDR-TB patients, 68 (47.6%) were diagnosed by conventional culture methods and 75 (52.4%) following the implementation of detection using the DNA microarray. Patients in the microarray group began rational use of the second-line drugs schedule more speedily than sufferers in the culture group (17.3 vs. 74.1 days). Among patients were admitted to a general tuberculosis ward, those from the microarray group spent less time in the ward than those from the culture group (7.8 vs. 49.2 days). In those patients with six months follow-up (n=134), patients in the microarray group had a higher rate of sputum negativity conversion at six months (89% vs. 73%). In the microarray group, the rate of drug adverse reactions was significantly lower (22.2% vs. 67.7%). At the same time, they had a more obvious reduction of the area with spinal tuberculous lesions in radiological examinations (77% vs. 108%). Conclusions: The application of the CapitalBio™ DNA Microarray assay caused noteworthy clinical advances including an earlier time to begin MDR-TB treatment, increased sputum culture conversion, improved infection control measures and better radiographical results

Keywords: tuberculosis, multidrug-resistant, tuberculous spondylitis, DNA microarray, clinical outcomes

Procedia PDF Downloads 255

1133 Fluoride-Induced Stress and Its Association with Bone Developmental Pathway in Osteosarcoma Cells

Authors: Deepa Gandhi, Pravin K. Naoghare, Amit Bafana, Krishnamurthi Kannan, Saravanadevi Sivanesana

Abstract:

Oxidative stress is known to depreciate normal functioning of osteoblast cells. Present study reports oxidative/inflammatory signatures in fluoride exposed human osteosarcoma (HOS) cells and its possible association with the genes involved in bone developmental pathway. Microarray analysis was performed to understand the possible molecular mechanisms of stress-mediated bone lose in HOS cells. Cells were chronically exposed with sub-lethal concentration of fluoride. Global gene expression is profiling revealed 34 up regulated and 2598 down-regulated genes, which were associated with several biological processes including bone development, osteoblast differentiation, stress response, inflammatory response, apoptosis, regulation of cell proliferation. Microarray data were further validated through qRT-PCR and western blot analyses using key representative genes. Based on these findings, it can be proposed that chronic exposure of fluoride may impair bone development via oxidative and inflammatory stress. The present finding also provides important biological clues, which will be helpful for the development of therapeutic targets against diseases related bone.

Keywords: bone, HOS cells, microarray, stress

Procedia PDF Downloads 347

1132 An Improved K-Means Algorithm for Gene Expression Data Clustering

Authors: Billel Kenidra, Mohamed Benmohammed

Abstract:

Data mining technique used in the field of clustering is a subject of active research and assists in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initiate K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results prove how the efficiency has been significantly improved.

Keywords: microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization

Procedia PDF Downloads 158

1131 In silico Analysis of Differentially Expressed Genes in High-Grade Squamous Intraepithelial Lesion and Squamous Cell Carcinomas Stages of Cervical Cancer

Authors: Rahul Agarwal, Ashutosh Singh

Abstract:

Cervical cancer is one of the women related cancers which starts from the pre-cancerous cells and a fraction of women with pre-cancers of the cervix will develop cervical cancer. Cervical pre-cancers if treated in pre-invasive stage can prevent almost all true cervical squamous cell carcinoma. The present study investigates the genes and pathways that are involved in the progression of cervical cancer and are responsible in transition from pre-invasive stage to other advanced invasive stages. The study used GDS3292 microarray data to identify the stage specific genes in cervical cancer and further to generate the network of the significant genes. The microarray data GDS3292 consists of the expression profiling of 10 normal cervices, 7 HSILs and 21 SCCs samples. The study identifies 70 upregulated and 37 downregulated genes in HSIL stage while 95 upregulated and 60 downregulated genes in SCC stages. Biological process including cell communication, signal transduction are highly enriched in both HSIL and SCC stages of cervical cancer. Further, the ppi interaction of genes involved in HSIL and SCC stages helps in identifying the interacting partners. This work may lead to the identification of potential diagnostic biomarker which can be utilized for early stage detection.

Keywords: cervical cancer, HSIL, microarray, SCC

Procedia PDF Downloads 192

1130 An Analysis on Clustering Based Gene Selection and Classification for Gene Expression Data

Authors: K. Sathishkumar, V. Thiagarasu

Abstract:

Due to recent advances in DNA microarray technology, it is now feasible to obtain gene expression profiles of tissue samples at relatively low costs. Many scientists around the world use the advantage of this gene profiling to characterize complex biological circumstances and diseases. Microarray techniques that are used in genome-wide gene expression and genome mutation analysis help scientists and physicians in understanding of the pathophysiological mechanisms, in diagnoses and prognoses, and choosing treatment plans. DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. This work presents an analysis of several clustering algorithms proposed to deals with the gene expression data effectively. The existing clustering algorithms like Support Vector Machine (SVM), K-means algorithm and evolutionary algorithm etc. are analyzed thoroughly to identify the advantages and limitations. The performance evaluation of the existing algorithms is carried out to determine the best approach. In order to improve the classification performance of the best approach in terms of Accuracy, Convergence Behavior and processing time, a hybrid clustering based optimization approach has been proposed.

Keywords: microarray technology, gene expression data, clustering, gene Selection

Procedia PDF Downloads 285

1129 Predicting Dose Level and Length of Time for Radiation Exposure Using Gene Expression

Authors: Chao Sima, Shanaz Ghandhi, Sally A. Amundson, Michael L. Bittner, David J. Brenner

Abstract:

In a large-scale radiologic emergency, potentially affected population need to be triaged efficiently using various biomarkers where personal dosimeters are not likely worn by the individuals. It has long been established that radiation injury can be estimated effectively using panels of genetic biomarkers. Furthermore, the rate of radiation, in addition to dose of radiation, plays a major role in determining biological responses. Therefore, a better and more accurate triage involves estimating both the dose level of the exposure and the length of time of that exposure. To that end, a large in vivo study was carried out on mice with internal emitter caesium-137 (¹³⁷Cs). Four different injection doses of ¹³⁷Cs were used: 157.5 μCi, 191 μCi, 214.5μCi, and 259 μCi. Cohorts of 6~7 mice from the control arm and each of the dose levels were sacrificed, and blood was collected 2, 3, 5, 7 and 14 days after injection for microarray RNA gene expression analysis. Using a generalized linear model with penalized maximum likelihood, a panel of 244 genes was established and both the doses of injection and the number of days after injection were accurately predicted for all 155 subjects using this panel. This has proven that microarray gene expression can be used effectively in radiation biodosimetry in predicting both the dose levels and the length of exposure time, which provides a more holistic view on radiation exposure and helps improving radiation damage assessment and treatment.

Keywords: caesium-137, gene expression microarray, multivariate responses prediction, radiation biodosimetry

Procedia PDF Downloads 169

1128 Distorted Document Images Dataset for Text Detection and Recognition

Authors: Ilia Zharikov, Philipp Nikitin, Ilia Vasiliev, Vladimir Dokholyan

Abstract:

With the increasing popularity of document analysis and recognition systems, text detection (TD) and optical character recognition (OCR) in document images become challenging tasks. However, according to our best knowledge, no publicly available datasets for these particular problems exist. In this paper, we introduce a Distorted Document Images dataset (DDI-100) and provide a detailed analysis of the DDI-100 in its current state. To create the dataset we collected 7000 unique document pages, and extend it by applying different types of distortions and geometric transformations. In total, DDI-100 contains more than 100,000 document images together with binary text masks, text and character locations in terms of bounding boxes. We also present an analysis of several state-of-the-art TD and OCR approaches on the presented dataset. Lastly, we demonstrate the usefulness of DDI-100 to improve accuracy and stability of the considered TD and OCR models.

Keywords: document analysis, open dataset, optical character recognition, text detection

Procedia PDF Downloads 132

1127 Biomolecules Based Microarray for Screening Human Endothelial Cells Behavior

Authors: Adel Dalilottojari, Bahman Delalat, Frances J. Harding, Michaelia P. Cockshell, Claudine S. Bonder, Nicolas H. Voelcker

Abstract:

Endothelial Progenitor Cell (EPC) based therapies continue to be of interest to treat ischemic events based on their proven role to promote blood vessel formation and thus tissue re-vascularisation. Current strategies for the production of clinical-grade EPCs requires the in vitro isolation of EPCs from peripheral blood followed by cell expansion to provide sufficient quantities EPCs for cell therapy. This study aims to examine the use of different biomolecules to significantly improve the current strategy of EPC capture and expansion on collagen type I (Col I). In this study, four different biomolecules were immobilised on a surface and then investigated for their capacity to support EPC capture and proliferation. First, a cell microarray platform was fabricated by coating a glass surface with epoxy functional allyl glycidyl ether plasma polymer (AGEpp) to mediate biomolecule binding. The four candidate biomolecules tested were Col I, collagen type II (Col II), collagen type IV (Col IV) and vascular endothelial growth factor A (VEGF-A), which were arrayed on the epoxy-functionalised surface using a non-contact printer. The surrounding area between the printed biomolecules was passivated with polyethylene glycol-bisamine (A-PEG) to prevent non-specific cell attachment. EPCs were seeded onto the microarray platform and cell numbers quantified after 1 h (to determine capture) and 72 h (to determine proliferation). All of the extracellular matrix (ECM) biomolecules printed demonstrated an ability to capture EPCs within 1 h of cell seeding with Col II exhibiting the highest level of attachment when compared to the other biomolecules. Interestingly, Col IV exhibited the highest increase in EPC expansion after 72 h when compared to Col I, Col II and VEGF-A. These results provide information for significant improvement in the capture and expansion of human EPC for further application.

Keywords: biomolecules, cell microarray platform, cell therapy, endothelial progenitor cells, high throughput screening

Procedia PDF Downloads 260

1126 SAMRA: Dataset in Al-Soudani Arabic Maghrebi Script for Recognition of Arabic Ancient Words Handwritten

Authors: Sidi Ahmed Maouloud, Cheikh Ba

Abstract:

Much of West Africa’s cultural heritage is written in the Al-Soudani Arabic script, which was widely used in West Africa before the time of European colonization. This Al-Soudani Arabic script is an African version of the Maghrebi script, in particular, the Al-Mebssout script. However, the local African qualities were incorporated into the Al-Soudani script in a way that gave it a unique African diversity and character. Despite the existence of several Arabic datasets in Oriental script, allowing for the analysis, layout, and recognition of texts written in these calligraphies, many Arabic scripts and written traditions remain understudied. In this paper, we present a dataset of words from Al-Soudani calligraphy scripts. This dataset consists of 100 images selected from three different manuscripts written in Al-Soudani Arabic script by different copyists. The primary source for this database was the libraries of Boston University and Cambridge University. This dataset highlights the unique characteristics of the Al-Soudani Arabic script as well as the new challenges it presents in terms of automatic word recognition of Arabic manuscripts. An HTR system based on a hybrid ANN (CRNN-CTC) is also proposed to test this dataset. SAMRA is a dataset of annotated Arabic manuscript words in the Al-Soudani script that can help researchers automatically recognize and analyze manuscript words written in this script.

Keywords: dataset, CRNN-CTC, handwritten words recognition, Al-Soudani Arabic script, HTR, manuscripts

Procedia PDF Downloads 71

1125 An Integrated Visualization Tool for Heat Map and Gene Ontology Graph

Authors: Somyung Oh, Jeonghyeon Ha, Kyungwon Lee, Sejong Oh

Abstract:

Microarray is a general scheme to find differentially expressed genes for target concept. The output is expressed by heat map, and biologists analyze related terms of gene ontology to find some characteristics of differentially expressed genes. In this paper, we propose integrated visualization tool for heat map and gene ontology graph. Previous two methods are used by static manner and separated way. Proposed visualization tool integrates them and users can interactively manage it. Users may easily find and confirm related terms of gene ontology for given differentially expressed genes. Proposed tool also visualize connections between genes on heat map and gene ontology graph. We expect biologists to find new meaningful topics by proposed tool.

Keywords: heat map, gene ontology, microarray, differentially expressed gene

Procedia PDF Downloads 278

1124 Fuzzy-Machine Learning Models for the Prediction of Fire Outbreak: A Comparative Analysis

Authors: Uduak Umoh, Imo Eyoh, Emmauel Nyoho

Abstract:

This paper compares fuzzy-machine learning algorithms such as Support Vector Machine (SVM), and K-Nearest Neighbor (KNN) for the predicting cases of fire outbreak. The paper uses the fire outbreak dataset with three features (Temperature, Smoke, and Flame). The data is pre-processed using Interval Type-2 Fuzzy Logic (IT2FL) algorithm. Min-Max Normalization and Principal Component Analysis (PCA) are used to predict feature labels in the dataset, normalize the dataset, and select relevant features respectively. The output of the pre-processing is a dataset with two principal components (PC1 and PC2). The pre-processed dataset is then used in the training of the aforementioned machine learning models. K-fold (with K=10) cross-validation method is used to evaluate the performance of the models using the matrices – ROC (Receiver Operating Curve), Specificity, and Sensitivity. The model is also tested with 20% of the dataset. The validation result shows KNN is the better model for fire outbreak detection with an ROC value of 0.99878, followed by SVM with an ROC value of 0.99753.

Keywords: Machine Learning Algorithms , Interval Type-2 Fuzzy Logic, Fire Outbreak, Support Vector Machine, K-Nearest Neighbour, Principal Component Analysis

Procedia PDF Downloads 133

1123 A Ratio-Weighted Decision Tree Algorithm for Imbalance Dataset Classification

Authors: Doyin Afolabi, Phillip Adewole, Oladipupo Sennaike

Abstract:

Most well-known classifiers, including the decision tree algorithm, can make predictions on balanced datasets efficiently. However, the decision tree algorithm tends to be biased towards imbalanced datasets because of the skewness of the distribution of such datasets. To overcome this problem, this study proposes a weighted decision tree algorithm that aims to remove the bias toward the majority class and prevents the reduction of majority observations in imbalance datasets classification. The proposed weighted decision tree algorithm was tested on three imbalanced datasets- cancer dataset, german credit dataset, and banknote dataset. The specificity, sensitivity, and accuracy metrics were used to evaluate the performance of the proposed decision tree algorithm on the datasets. The evaluation results show that for some of the weights of our proposed decision tree, the specificity, sensitivity, and accuracy metrics gave better results compared to that of the ID3 decision tree and decision tree induced with minority entropy for all three datasets.

Keywords: data mining, decision tree, classification, imbalance dataset

Procedia PDF Downloads 89

1122 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting

Authors: Kemal Polat

Abstract:

In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) for classification of Diabetes disease dataset has been used. The aims of this study are to reduce the variance within attributes of diabetes dataset and to improve the classification accuracy of classifier algorithm transforming from non-linear separable datasets to linearly separable datasets. Pima Indians Diabetes dataset has two classes including normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of K-means clustering method and is one of most used clustering methods in data mining and machine learning applications. In this study, as the first stage, fuzzy C-means clustering process has been used for finding the centers of attributes in Pima Indians diabetes dataset and then weighted the dataset according to the ratios of the means of attributes to centers of theirs. Secondly, after weighting process, the classifier algorithms including support vector machine (SVM) and k-NN (k- nearest neighbor) classifiers have been used for classifying weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of Pima Indians diabetes dataset.

Keywords: fuzzy C-means clustering, fuzzy C-means clustering based attribute weighting, Pima Indians diabetes, SVM

Procedia PDF Downloads 378

1121 Microarray Gene Expression Data Dimensionality Reduction Using PCA

Authors: Fuad M. Alkoot

Abstract:

Different experimental technologies such as microarray sequencing have been proposed to generate high-resolution genetic data, in order to understand the complex dynamic interactions between complex diseases and the biological system components of genes and gene products. However, the generated samples have a very large dimension reaching thousands. Therefore, hindering all attempts to design a classifier system that can identify diseases based on such data. Additionally, the high overlap in the class distributions makes the task more difficult. The data we experiment with is generated for the identification of autism. It includes 142 samples, which is small compared to the large dimension of the data. The classifier systems trained on this data yield very low classification rates that are almost equivalent to a guess. We aim at reducing the data dimension and improve it for classification. Here, we experiment with applying a multistage PCA on the genetic data to reduce its dimensionality. Results show a significant improvement in the classification rates which increases the possibility of building an automated system for autism detection.

Keywords: PCA, gene expression, dimensionality reduction, classification, autism

Procedia PDF Downloads 526

1120 Optimizing the Capacity of a Convolutional Neural Network for Image Segmentation and Pattern Recognition

Authors: Yalong Jiang, Zheru Chi

Abstract:

In this paper, we study the factors which determine the capacity of a Convolutional Neural Network (CNN) model and propose the ways to evaluate and adjust the capacity of a CNN model for best matching to a specific pattern recognition task. Firstly, a scheme is proposed to adjust the number of independent functional units within a CNN model to make it be better fitted to a task. Secondly, the number of independent functional units in the capsule network is adjusted to fit it to the training dataset. Thirdly, a method based on Bayesian GAN is proposed to enrich the variances in the current dataset to increase its complexity. Experimental results on the PASCAL VOC 2010 Person Part dataset and the MNIST dataset show that, in both conventional CNN models and capsule networks, the number of independent functional units is an important factor that determines the capacity of a network model. By adjusting the number of functional units, the capacity of a model can better match the complexity of a dataset.

Keywords: CNN, convolutional neural network, capsule network, capacity optimization, character recognition, data augmentation, semantic segmentation

Procedia PDF Downloads 117

1119 Energy Complementary in Colombia: Imputation of Dataset

Authors: Felipe Villegas-Velasquez, Harold Pantoja-Villota, Sergio Holguin-Cardona, Alejandro Osorio-Botero, Brayan Candamil-Arango

Abstract:

Colombian electricity comes mainly from hydric resources, affected by environmental variations such as the El Niño phenomenon. That is why incorporating other types of resources is necessary to provide electricity constantly. This research seeks to fill the wind speed and global solar irradiance dataset for two years with the highest amount of information. A further result is the characterization of the data by region that led to infer which errors occurred and offered the incomplete dataset.

Keywords: energy, wind speed, global solar irradiance, Colombia, imputation

Procedia PDF Downloads 109

1118 DOG1 Expression Is in Common Human Tumors: A Tissue Microarray Study on More than 15,000 Tissue Samples

Authors: Kristina Jansen, Maximilian Lennartz, Patrick Lebok, Guido Sauter, Ronald Simon, David Dum, Stefan Steurer

Abstract:

DOG1 (Discovered on GIST1) is a voltage-gated calcium-activated chloride and bicarbonate channel that is highly expressed in interstitial cells of Cajal and in gastrointestinal stromal tumors (GIST) derived from Cajal cells. To systematically determine in what tumor entities and normal tissue types DOG1 may be further expressed, a tissue microarray (TMA) containing 15,965 samples from 121 different tumor types and subtypes as well as 608 samples of 76 different normal tissue types were analyzed by immunohistochemistry. DOG1 immunostaining was found in 67 tumor types, including GIST (95.7%), esophageal squamous cell carcinoma (31.9%), pancreatic ductal adenocarcinoma (33.6%), adenocarcinoma of the Papilla Vateri (20%), squamous cell carcinoma of the vulva (15.8%) and the oral cavity (15.3%), mucinous ovarian cancer (15.3%), esophageal adenocarcinoma (12.5%), endometrioid endometrial cancer (12.1%), neuroendocrine carcinoma of the colon (11.1%) and diffuse gastric adenocarcinoma (11%). Low level-DOG1 immunostaining was seen in 17 additional tumor entities. DOG1 expression was unrelated to histopathological parameters of tumor aggressiveness and/or patient prognosis in cancers of the breast (n=1,002), urinary bladder (975), ovary (469), endometrium (173), stomach (233), and thyroid gland (512). High DOG1 expression was linked to estrogen receptor expression in breast cancer (p<0.0001) and the absence of HPV infection in squamous cell carcinomas (p=0.0008). In conclusion, our data identify several tumor entities that can show DOG1 expression levels at similar levels as in GIST. Although DOG1 is tightly linked to a diagnosis of GIST in spindle cell tumors, the differential diagnosis is much broader in DOG1 positive epithelioid neoplasms.

Keywords: biomarker, DOG1, immunohistochemistry, tissue microarray

Procedia PDF Downloads 176

1117 Dys-Regulation of Immune and Inflammatory Response in in vitro Fertilization Implantation Failure Patients under Ovarian Stimulation

Authors: Amruta D. S. Pathare, Indira Hinduja, Kusum Zaveri

Abstract:

Implantation failure (IF) even after the good-quality embryo transfer (ET) in the physiologically normal endometrium is the main obstacle in in vitro fertilization (IVF). Various microarray studies have been performed worldwide to elucidate the genes requisite for endometrial receptivity. These studies have included the population based on different phases of menstrual cycle during natural cycle and stimulated cycle in normal fertile women. Additionally, the literature is also available in recurrent implantation failure patients versus oocyte donors in natural cycle. However, for the first time, we aim to study the genomics of endometrial receptivity in IF patients under controlled ovarian stimulation (COS) during which ET is generally practised in IVF. Endometrial gene expression profiling in IF patients (n=10) and oocyte donors (n=8) were compared during window of implantation under COS by whole genome microarray (using Illumina platform). Enrichment analysis of microarray data was performed to determine dys-regulated biological functions and pathways using Database for Annotation, Visualization and Integrated Discovery, v6.8 (DAVID). The enrichment mapping was performed with the help of Cytoscape software. Microarray results were validated by real-time PCR. Localization of genes related to immune response (Progestagen-Associated Endometrial Protein (PAEP), Leukaemia Inhibitory Factor (LIF), Interleukin-6 Signal Transducer (IL6ST) was detected by immunohistochemistry. The study revealed 418 genes downregulated and 519 genes upregulated in IF patients compared to healthy fertile controls. The gene ontology, pathway analysis and enrichment mapping revealed significant downregulation in activation and regulation of immune and inflammation response in IF patients under COS. The lower expression of Progestagen Associated Endometrial Protein (PAEP), Leukemia Inhibitory Factor (LIF) and Interleukin 6 Signal Transducer (IL6ST) in cases compared to controls by real time and immunohistochemistry suggests the functional importance of these genes. The study was proved useful to uncover the probable reason of implantation failure being imbalance of immune and inflammatory regulation in our group of subjects. Based on the present study findings, a panel of significant dysregulated genes related to immune and inflammatory pathways needs to be further substantiated in larger cohort in natural as well as stimulated cycle. Upon which these genes could be screened in IF patients during window of implantation (WOI) before going for embryo transfer or any other immunological treatment. This would help to estimate the regulation of specific immune response during WOI in a patient. The appropriate treatment of either activation of immune response or suppression of immune response can be then attempted in IF patients to enhance the receptivity of endometrium.

Keywords: endometrial receptivity, immune and inflammatory response, gene expression microarray, window of implantation

Procedia PDF Downloads 110

1116 Clinicopathological and Immunohistochemical Study of Ovarian Sex Cord-Stromal Tumors and Their Histological Mimics

Authors: Ghada Esheba, Ebtisam Aljerayan, Afnan Al-Ghamdi, Atheer Alsharif, Hanan alzahrani

Abstract:

Background: Primary ovarian neoplasms comprise a heterogeneous group of tumors of three main subtypes: surface epithelial, germ cell, and sex cord-stromal. The wide morphological variation within and between these groups can result in diagnostic difficulties. Gonadal sex cord-stromal tumors (SCST) represent one of the most heterogeneous categories of human neoplasms, because they may contain various combinations of different gonadal sex cord and stromal element. Aim: The aim of this work is to highlight the clinicopathological characteristics of SCST and to assess the value of alpha-inhibin and calretinin in the distinction between SCST and their mimics. Material and methods: This study was carried out on 100 cases using full tissue sections; 70 cases were SCST and 30 cases were histological mimics of SCST. The cases were studied using immunohistochemically using alpha-inhibin. In addition, an ovarian tissue microarray containing 170 benign and malignant ovarian neoplasms was also studied immunohistochemically for calretinin expression. The ovarian microarray included 14 SCST, 59 ovarian serous borderline tumors, 17 mucinous borderline tumors, 10 mucinous adenocarcinomas, 32 endometrioid adenocarcinomas, 34 clear cell carcinomas, and 4 germ cell tumors. Results: 99% of SCST examined using full tissue sections exhibited positive cytoplasmic staining for inhibin. On the contrary, only 7% of the histological mimics (P value < 0.0001). 86% of SCST in the tissue microarray were positive for calretinin with nuclear and/or cytoplasmic staining compared to only 7% of the other tumor types (P value < 0.0001). Conclusions: SCST have characteristic clinicopathological and immunohistochemical features and their recognition is crucial for proper diagnosis and treatment. Alpha-inhibin and calretinin are of great help in the diagnosis of sex cord-stromal tumors.

Keywords: calretinin, granulosa cell tumor, inhibin, sex cord-stromal tumors

Procedia PDF Downloads 176

1115 Efficient Tuning Parameter Selection by Cross-Validated Score in High Dimensional Models

Authors: Yoonsuh Jung

Abstract:

As DNA microarray data contain relatively small sample size compared to the number of genes, high dimensional models are often employed. In high dimensional models, the selection of tuning parameter (or, penalty parameter) is often one of the crucial parts of the modeling. Cross-validation is one of the most common methods for the tuning parameter selection, which selects a parameter value with the smallest cross-validated score. However, selecting a single value as an "optimal" value for the parameter can be very unstable due to the sampling variation since the sample sizes of microarray data are often small. Our approach is to choose multiple candidates of tuning parameter first, then average the candidates with different weights depending on their performance. The additional step of estimating the weights and averaging the candidates rarely increase the computational cost, while it can considerably improve the traditional cross-validation. We show that the selected value from the suggested methods often lead to stable parameter selection as well as improved detection of significant genetic variables compared to the tradition cross-validation via real data and simulated data sets.

Keywords: cross validation, parameter averaging, parameter selection, regularization parameter search

Procedia PDF Downloads 381

1114 The Clustering of Multiple Sclerosis Subgroups through L2 Norm Multifractal Denoising Technique

Authors: Yeliz Karaca, Rana Karabudak

Abstract:

Multifractal Denoising techniques are used in the identification of significant attributes by removing the noise of the dataset. Magnetic resonance (MR) image technique is the most sensitive method so as to identify chronic disorders of the nervous system such as Multiple Sclerosis. MRI and Expanded Disability Status Scale (EDSS) data belonging to 120 individuals who have one of the subgroups of MS (Relapsing Remitting MS (RRMS), Secondary Progressive MS (SPMS), Primary Progressive MS (PPMS)) as well as 19 healthy individuals in the control group have been used in this study. The study is comprised of the following stages: (i) L2 Norm Multifractal Denoising technique, one of the multifractal technique, has been used with the application on the MS data (MRI and EDSS). In this way, the new dataset has been obtained. (ii) The new MS dataset obtained from the MS dataset and L2 Multifractal Denoising technique has been applied to the K-Means and Fuzzy C Means clustering algorithms which are among the unsupervised methods. Thus, the clustering performances have been compared. (iii) In the identification of significant attributes in the MS dataset through the Multifractal denoising (L2 Norm) technique using K-Means and FCM algorithms on the MS subgroups and control group of healthy individuals, excellent performance outcome has been yielded. According to the clustering results based on the MS subgroups obtained in the study, successful clustering results have been obtained in the K-Means and FCM algorithms by applying the L2 norm of multifractal denoising technique for the MS dataset. Clustering performance has been more successful with the MS Dataset (L2_Norm MS Data Set) K-Means and FCM in which significant attributes are obtained by applying L2 Norm Denoising technique.

Keywords: clinical decision support, clustering algorithms, multiple sclerosis, multifractal techniques

Procedia PDF Downloads 132

1113 Facial Expression Phoenix (FePh): An Annotated Sequenced Dataset for Facial and Emotion-Specified Expressions in Sign Language

Authors: Marie Alaghband, Niloofar Yousefi, Ivan Garibay

Abstract:

Facial expressions are important parts of both gesture and sign language recognition systems. Despite the recent advances in both fields, annotated facial expression datasets in the context of sign language are still scarce resources. In this manuscript, we introduce an annotated sequenced facial expression dataset in the context of sign language, comprising over 3000 facial images extracted from the daily news and weather forecast of the public tv-station PHOENIX. Unlike the majority of currently existing facial expression datasets, FePh provides sequenced semi-blurry facial images with different head poses, orientations, and movements. In addition, in the majority of images, identities are mouthing the words, which makes the data more challenging. To annotate this dataset we consider primary, secondary, and tertiary dyads of seven basic emotions of "sad", "surprise", "fear", "angry", "neutral", "disgust", and "happy". We also considered the "None" class if the image’s facial expression could not be described by any of the aforementioned emotions. Although we provide FePh as a facial expression dataset of signers in sign language, it has a wider application in gesture recognition and Human Computer Interaction (HCI) systems.

Keywords: annotated facial expression dataset, gesture recognition, sequenced facial expression dataset, sign language recognition

Procedia PDF Downloads 126