Search results for: cancer dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 622

Search results for: cancer dataset

472 Fusion of ETM+ Multispectral and Panchromatic Texture for Remote Sensing Classification

Authors: Mahesh Pal

Abstract:

This paper proposes to use ETM+ multispectral data and panchromatic band as well as texture features derived from the panchromatic band for land cover classification. Four texture features including one 'internal texture' and three GLCM based textures namely correlation, entropy, and inverse different moment were used in combination with ETM+ multispectral data. Two data sets involving combination of multispectral, panchromatic band and its texture were used and results were compared with those obtained by using multispectral data alone. A decision tree classifier with and without boosting were used to classify different datasets. Results from this study suggest that the dataset consisting of panchromatic band, four of its texture features and multispectral data was able to increase the classification accuracy by about 2%. In comparison, a boosted decision tree was able to increase the classification accuracy by about 3% with the same dataset.

Keywords: Internal texture; GLCM; decision tree; boosting; classification accuracy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1690
471 Improving the Performance of Deep Learning in Facial Emotion Recognition with Image Sharpening

Authors: Ksheeraj Sai Vepuri, Nada Attar

Abstract:

We as humans use words with accompanying visual and facial cues to communicate effectively. Classifying facial emotion using computer vision methodologies has been an active research area in the computer vision field. In this paper, we propose a simple method for facial expression recognition that enhances accuracy. We tested our method on the FER-2013 dataset that contains static images. Instead of using Histogram equalization to preprocess the dataset, we used Unsharp Mask to emphasize texture and details and sharpened the edges. We also used ImageDataGenerator from Keras library for data augmentation. Then we used Convolutional Neural Networks (CNN) model to classify the images into 7 different facial expressions, yielding an accuracy of 69.46% on the test set. Our results show that using image preprocessing such as the sharpening technique for a CNN model can improve the performance, even when the CNN model is relatively simple.

Keywords: Facial expression recognition, image pre-processing, deep learning, CNN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 492
470 Ranking Genes from DNA Microarray Data of Cervical Cancer by a local Tree Comparison

Authors: Frank Emmert-Streib, Matthias Dehmer, Jing Liu, Max Muhlhauser

Abstract:

The major objective of this paper is to introduce a new method to select genes from DNA microarray data. As criterion to select genes we suggest to measure the local changes in the correlation graph of each gene and to select those genes whose local changes are largest. More precisely, we calculate the correlation networks from DNA microarray data of cervical cancer whereas each network represents a tissue of a certain tumor stage and each node in the network represents a gene. From these networks we extract one tree for each gene by a local decomposition of the correlation network. The interpretation of a tree is that it represents the n-nearest neighbor genes on the n-th level of a tree, measured by the Dijkstra distance, and, hence, gives the local embedding of a gene within the correlation network. For the obtained trees we measure the pairwise similarity between trees rooted by the same gene from normal to cancerous tissues. This evaluates the modification of the tree topology due to tumor progression. Finally, we rank the obtained similarity values from all tissue comparisons and select the top ranked genes. For these genes the local neighborhood in the correlation networks changes most between normal and cancerous tissues. As a result we find that the top ranked genes are candidates suspected to be involved in tumor growth. This indicates that our method captures essential information from the underlying DNA microarray data of cervical cancer.

Keywords: Graph similarity, generalized trees, graph alignment, DNA microarray data, cervical cancer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1714
469 Assessing and Visualizing the Stability of Feature Selectors: A Case Study with Spectral Data

Authors: R.Guzman-Martinez, Oscar Garcia-Olalla, R.Alaiz-Rodriguez

Abstract:

Feature selection plays an important role in applications with high dimensional data. The assessment of the stability of feature selection/ranking algorithms becomes an important issue when the dataset is small and the aim is to gain insight into the underlying process by analyzing the most relevant features. In this work, we propose a graphical approach that enables to analyze the similarity between feature ranking techniques as well as their individual stability. Moreover, it works with whatever stability metric (Canberra distance, Spearman's rank correlation coefficient, Kuncheva's stability index,...). We illustrate this visualization technique evaluating the stability of several feature selection techniques on a spectral binary dataset. Experimental results with a neural-based classifier show that stability and ranking quality may not be linked together and both issues have to be studied jointly in order to offer answers to the domain experts.

Keywords: Feature Selection Stability, Spectral data, Data visualization

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1480
468 Thermalytix: An Advanced Artificial Intelligence Based Solution for Non-Contact Breast Screening

Authors: S. Sudhakar, Geetha Manjunath, Siva Teja Kakileti, Himanshu Madhu

Abstract:

Diagnosis of breast cancer at early stages has seen better clinical and survival outcomes. Survival rates in developing countries like India are very low due to accessibility and affordability issues of screening tests such as Mammography. In addition, Mammography is not much effective in younger women with dense breasts. This leaves a gap in current screening methods. Thermalytix is a new technique for detecting breast abnormality in a non-contact, non-invasive way. It is an AI-enabled computer-aided diagnosis solution that automates interpretation of high resolution thermal images and identifies potential malignant lesions. The solution is low cost, easy to use, portable and is effective in all age groups.  This paper presents the results of a retrospective comparative analysis of Thermalytix over Mammography and Clinical Breast Examination for breast cancer screening. Thermalytix was found to have better sensitivity than both the tests, with good specificity as well. In addition, Thermalytix identified all malignant patients without palpable lumps.

Keywords: Breast Cancer Screening, Radiology, Thermalytix, Artificial Intelligence, Thermography.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2708
467 Predictive Clustering Hybrid Regression(pCHR) Approach and Its Application to Sucrose-Based Biohydrogen Production

Authors: Nikhil, Ari Visa, Chin-Chao Chen, Chiu-Yue Lin, Jaakko A. Puhakka, Olli Yli-Harja

Abstract:

A predictive clustering hybrid regression (pCHR) approach was developed and evaluated using dataset from H2- producing sucrose-based bioreactor operated for 15 months. The aim was to model and predict the H2-production rate using information available about envirome and metabolome of the bioprocess. Selforganizing maps (SOM) and Sammon map were used to visualize the dataset and to identify main metabolic patterns and clusters in bioprocess data. Three metabolic clusters: acetate coupled with other metabolites, butyrate only, and transition phases were detected. The developed pCHR model combines principles of k-means clustering, kNN classification and regression techniques. The model performed well in modeling and predicting the H2-production rate with mean square error values of 0.0014 and 0.0032, respectively.

Keywords: Biohydrogen, bioprocess modeling, clusteringhybrid regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1729
466 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning

Authors: Walid Cherif

Abstract:

Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.

Keywords: Data mining, knowledge discovery, machine learning, similarity measurement, supervised classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1478
465 Data Mining for Cancer Management in Egypt Case Study: Childhood Acute Lymphoblastic Leukemia

Authors: Nevine M. Labib, Michael N. Malek

Abstract:

Data Mining aims at discovering knowledge out of data and presenting it in a form that is easily comprehensible to humans. One of the useful applications in Egypt is the Cancer management, especially the management of Acute Lymphoblastic Leukemia or ALL, which is the most common type of cancer in children. This paper discusses the process of designing a prototype that can help in the management of childhood ALL, which has a great significance in the health care field. Besides, it has a social impact on decreasing the rate of infection in children in Egypt. It also provides valubale information about the distribution and segmentation of ALL in Egypt, which may be linked to the possible risk factors. Undirected Knowledge Discovery is used since, in the case of this research project, there is no target field as the data provided is mainly subjective. This is done in order to quantify the subjective variables. Therefore, the computer will be asked to identify significant patterns in the provided medical data about ALL. This may be achieved through collecting the data necessary for the system, determimng the data mining technique to be used for the system, and choosing the most suitable implementation tool for the domain. The research makes use of a data mining tool, Clementine, so as to apply Decision Trees technique. We feed it with data extracted from real-life cases taken from specialized Cancer Institutes. Relevant medical cases details such as patient medical history and diagnosis are analyzed, classified, and clustered in order to improve the disease management.

Keywords: Data Mining, Decision Trees, Knowledge Discovery, Leukemia.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2165
464 Image Ranking to Assist Object Labeling for Training Detection Models

Authors: Tonislav Ivanov, Oleksii Nedashkivskyi, Denis Babeshko, Vadim Pinskiy, Matthew Putman

Abstract:

Training a machine learning model for object detection that generalizes well is known to benefit from a training dataset with diverse examples. However, training datasets usually contain many repeats of common examples of a class and lack rarely seen examples. This is due to the process commonly used during human annotation where a person would proceed sequentially through a list of images labeling a sufficiently high total number of examples. Instead, the method presented involves an active process where, after the initial labeling of several images is completed, the next subset of images for labeling is selected by an algorithm. This process of algorithmic image selection and manual labeling continues in an iterative fashion. The algorithm used for the image selection is a deep learning algorithm, based on the U-shaped architecture, which quantifies the presence of unseen data in each image in order to find images that contain the most novel examples. Moreover, the location of the unseen data in each image is highlighted, aiding the labeler in spotting these examples. Experiments performed using semiconductor wafer data show that labeling a subset of the data, curated by this algorithm, resulted in a model with a better performance than a model produced from sequentially labeling the same amount of data. Also, similar performance is achieved compared to a model trained on exhaustive labeling of the whole dataset. Overall, the proposed approach results in a dataset that has a diverse set of examples per class as well as more balanced classes, which proves beneficial when training a deep learning model.

Keywords: Computer vision, deep learning, object detection, semiconductor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 747
463 Semi-Supervised Outlier Detection Using a Generative and Adversary Framework

Authors: Jindong Gu, Matthias Schubert, Volker Tresp

Abstract:

In many outlier detection tasks, only training data belonging to one class, i.e., the positive class, is available. The task is then to predict a new data point as belonging either to the positive class or to the negative class, in which case the data point is considered an outlier. For this task, we propose a novel corrupted Generative Adversarial Network (CorGAN). In the adversarial process of training CorGAN, the Generator generates outlier samples for the negative class, and the Discriminator is trained to distinguish the positive training data from the generated negative data. The proposed framework is evaluated using an image dataset and a real-world network intrusion dataset. Our outlier-detection method achieves state-of-the-art performance on both tasks.

Keywords: Outlier detection, generative adversary networks, semi-supervised learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1014
462 Upgraded Rough Clustering and Outlier Detection Method on Yeast Dataset by Entropy Rough K-Means Method

Authors: P. Ashok, G. M. Kadhar Nawaz

Abstract:

Rough set theory is used to handle uncertainty and incomplete information by applying two accurate sets, Lower approximation and Upper approximation. In this paper, the rough clustering algorithms are improved by adopting the Similarity, Dissimilarity–Similarity and Entropy based initial centroids selection method on three different clustering algorithms namely Entropy based Rough K-Means (ERKM), Similarity based Rough K-Means (SRKM) and Dissimilarity-Similarity based Rough K-Means (DSRKM) were developed and executed by yeast dataset. The rough clustering algorithms are validated by cluster validity indexes namely Rand and Adjusted Rand indexes. An experimental result shows that the ERKM clustering algorithm perform effectively and delivers better results than other clustering methods. Outlier detection is an important task in data mining and very much different from the rest of the objects in the clusters. Entropy based Rough Outlier Factor (EROF) method is seemly to detect outlier effectively for yeast dataset. In rough K-Means method, by tuning the epsilon (ᶓ) value from 0.8 to 1.08 can detect outliers on boundary region and the RKM algorithm delivers better results, when choosing the value of epsilon (ᶓ) in the specified range. An experimental result shows that the EROF method on clustering algorithm performed very well and suitable for detecting outlier effectively for all datasets. Further, experimental readings show that the ERKM clustering method outperformed the other methods.

Keywords: Clustering, Entropy, Outlier, Rough K-Means, validity index.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1364
461 Evaluation of the Heating Capability and in vitro Hemolysis of Nanosized MgxMn1-xFe2O4 (x = 0.3 and 0.4) Ferrites Prepared by Sol-gel Method

Authors: Laura Elena De León Prado, Dora Alicia Cortés Hernández, Javier Sánchez

Abstract:

Among the different cancer treatments that are currently used, hyperthermia has a promising potential due to the multiple benefits that are obtained by this technique. In general terms, hyperthermia is a method that takes advantage of the sensitivity of cancer cells to heat, in order to damage or destroy them. Within the different ways of supplying heat to cancer cells and achieve their destruction or damage, the use of magnetic nanoparticles has attracted attention due to the capability of these particles to generate heat under the influence of an external magnetic field. In addition, these nanoparticles have a high surface area and sizes similar or even lower than biological entities, which allow their approaching and interaction with a specific region of interest. The most used magnetic nanoparticles for hyperthermia treatment are those based on iron oxides, mainly magnetite and maghemite, due to their biocompatibility, good magnetic properties and chemical stability. However, in order to fulfill more efficiently the requirements that demand the treatment of magnetic hyperthermia, there have been investigations using ferrites that incorporate different metallic ions, such as Mg, Mn, Co, Ca, Ni, Cu, Li, Gd, etc., in their structure. This paper reports the synthesis of nanosized MgxMn1-xFe2O4 (x = 0.3 and 0.4) ferrites by sol-gel method and their evaluation in terms of heating capability and in vitro hemolysis to determine the potential use of these nanoparticles as thermoseeds for the treatment of cancer by magnetic hyperthermia. It was possible to obtain ferrites with nanometric sizes, a single crystalline phase with an inverse spinel structure and a behavior near to that of superparamagnetic materials. Additionally, at concentrations of 10 mg of magnetic material per mL of water, it was possible to reach a temperature of approximately 45°C, which is within the range of temperatures used for the treatment of hyperthermia. The results of the in vitro hemolysis assay showed that, at the concentrations tested, these nanoparticles are non-hemolytic, as their percentage of hemolysis is close to zero. Therefore, these materials can be used as thermoseeds for the treatment of cancer by magnetic hyperthermia.

Keywords: Ferrites, heating capability, hemolysis, nanoparticles, sol-gel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 862
460 A Neural Network Approach in Predicting the Blood Glucose Level for Diabetic Patients

Authors: Zarita Zainuddin, Ong Pauline, C. Ardil

Abstract:

Diabetes Mellitus is a chronic metabolic disorder, where the improper management of the blood glucose level in the diabetic patients will lead to the risk of heart attack, kidney disease and renal failure. This paper attempts to enhance the diagnostic accuracy of the advancing blood glucose levels of the diabetic patients, by combining principal component analysis and wavelet neural network. The proposed system makes separate blood glucose prediction in the morning, afternoon, evening and night intervals, using dataset from one patient covering a period of 77 days. Comparisons of the diagnostic accuracy with other neural network models, which use the same dataset are made. The comparison results showed overall improved accuracy, which indicates the effectiveness of this proposed system.

Keywords: Diabetes Mellitus, principal component analysis, time-series, wavelet neural network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2943
459 Analysis of a Population of Diabetic Patients Databases with Classifiers

Authors: Murat Koklu, Yavuz Unal

Abstract:

Data mining can be called as a technique to extract information from data. It is the process of obtaining hidden information and then turning it into qualified knowledge by statistical and artificial intelligence technique. One of its application areas is medical area to form decision support systems for diagnosis just by inventing meaningful information from given medical data. In this study a decision support system for diagnosis of illness that make use of data mining and three different artificial intelligence classifier algorithms namely Multilayer Perceptron, Naive Bayes Classifier and J.48. Pima Indian dataset of UCI Machine Learning Repository was used. This dataset includes urinary and blood test results of 768 patients. These test results consist of 8 different feature vectors. Obtained classifying results were compared with the previous studies. The suggestions for future studies were presented.

Keywords: Artificial Intelligence, Classifiers, Data Mining, Diabetic Patients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5386
458 Anomaly Based On Frequent-Outlier for Outbreak Detection in Public Health Surveillance

Authors: Zalizah Awang Long, Abdul Razak Hamdan, Azuraliza Abu Bakar

Abstract:

Public health surveillance system focuses on outbreak detection and data sources used. Variation or aberration in the frequency distribution of health data, compared to historical data is often used to detect outbreaks. It is important that new techniques be developed to improve the detection rate, thereby reducing wastage of resources in public health. Thus, the objective is to developed technique by applying frequent mining and outlier mining techniques in outbreak detection. 14 datasets from the UCI were tested on the proposed technique. The performance of the effectiveness for each technique was measured by t-test. The overall performance shows that DTK can be used to detect outlier within frequent dataset. In conclusion the outbreak detection technique using anomaly-based on frequent-outlier technique can be used to identify the outlier within frequent dataset.

Keywords: Outlier detection, frequent-outlier, outbreak, anomaly, surveillance, public health

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2232
457 Diagnosis of Ovarian Cancer with Proteomic Patterns in Serum using Independent Component Analysis and Neural Networks

Authors: Simone C. F. Neves, Lúcio F. A. Campos, Ewaldo Santana, Ginalber L. O. Serra, Allan K. Barros

Abstract:

We propose a method for discrimination and classification of ovarian with benign, malignant and normal tissue using independent component analysis and neural networks. The method was tested for a proteomic patters set from A database, and radial basis functions neural networks. The best performance was obtained with probabilistic neural networks, resulting I 99% success rate, with 98% of specificity e 100% of sensitivity.

Keywords: Cancer ovarian, Proteomic patterns in serum, independent component analysis and neural networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1789
456 Cascaded Neural Network for Internal Temperature Forecasting in Induction Motor

Authors: Hidir S. Nogay

Abstract:

In this study, two systems were created to predict interior temperature in induction motor. One of them consisted of a simple ANN model which has two layers, ten input parameters and one output parameter. The other one consisted of eight ANN models connected each other as cascaded. Cascaded ANN system has 17 inputs. Main reason of cascaded system being used in this study is to accomplish more accurate estimation by increasing inputs in the ANN system. Cascaded ANN system is compared with simple conventional ANN model to prove mentioned advantages. Dataset was obtained from experimental applications. Small part of the dataset was used to obtain more understandable graphs. Number of data is 329. 30% of the data was used for testing and validation. Test data and validation data were determined for each ANN model separately and reliability of each model was tested. As a result of this study, it has been understood that the cascaded ANN system produced more accurate estimates than conventional ANN model.

Keywords: Cascaded neural network, internal temperature, three-phase induction motor, inverter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 817
455 Colorectal Cancer Screening by a CEACAM-6 Immunosensor

Authors: C. T. S. Ching, P. W. C hen, T. P. Sun, H. L. Shieh

Abstract:

The biomarker for colorectal cancer (CRC) is CEACAM-6 antigen (C6AG). Therefore, this study aims to develop a novel, simple and low-cost CEACAM-6 antigen immumosensor (C6AG-IMS), based on electrical impedance measurement, for precise determination of C6AG. A low-cost screen-printed graphite electrode was constructed and used as the sensor, with CEACAM-6 antibody (C6AB) immobilized on it. The procedures of sensor fabrication and antibody immobilization are simple and low-cost. Measurement of the electrical impedance at a definite frequency ranges (0.43 – 1.26 MHz) showed that the C6AG-IMS has an excellent linear (r2>0.9) response range (8.125 – 65 pg/mL), covering the normal physiological and pathological ranges of blood C6AG levels. Also, the C6AG-IMS has excellent reliability and validity, with the intraclass correlation coefficient being 0.97. In conclusion, a novel, simple, low-cost and reliable C6AG-IMS was designed and developed, being able to accurately determine blood C6AG levels in the range of pathological and normal physiological regions. The C6AG-IMS can provide a point-of-care and immediate screening results to the user at home.

Keywords: Colorectal Cancer, Immunosensor, Electrical Impedance, CEACAM-6, Measurement, Point-of-Care

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1591
454 Synthesis of PVA/γ-Fe2O3 Used in Cancer Treatment by Hyperthermia

Authors: Sajjad Seifi Mofarah, S. K. Sadrnezhaad, Shokooh Moghadam, Javad Tavakoli

Abstract:

In recent years a new method of combination treatment for cancer has been developed and studied that has led to significant advancements in the field of cancer therapy. Hyperthermia is a traditional therapy that, along with a creation of a medically approved level of heat with the help of an alternating magnetic AC current, results in the destruction of cancer cells by heat. This paper gives details regarding the production of the spherical nanocomposite PVA/γ-Fe2O3 in order to be used for medical purposes such as tumor treatment by hyperthermia. To reach a suitable and evenly distributed temperature, the nanocomposite with core-shell morphology and spherical form within a 100 to 200 nanometer size was created using phase separation emulsion, in which the magnetic nano-particles γ- Fe2O3 with an average particle size of 20 nano-meters and with different percentages of 0.2, 0.4, 0.5 and 0.6 were covered by polyvinyl alcohol. The main concern in hyperthermia and heat treatment is achieving desirable specific absorption rate (SAR) and one of the most critical factors in SAR is particle size. In this project all attempts has been done to reach minimal size and consequently maximum SAR. The morphological analysis of the spherical structure of the nanocomposite PVA/γ-Fe2O3 was achieved by SEM analyses and the study of the chemical bonds created was made possible by FTIR analysis. To investigate the manner of magnetic nanocomposite particle size distribution a DLS experiment was conducted. Moreover, to determine the magnetic behavior of the γ- Fe2O3 particle and the nanocomposite PVA/γ-Fe2O3 in different concentrations a VSM test was conducted. To sum up, creating magnetic nanocomposites with a spherical morphology that would be employed for drug loading opens doors to new approaches in developing nanocomposites that provide efficient heat and a controlled release of drug simultaneously inside the magnetic field, which are among their positive characteristics that could significantly improve the recovery process in patients.

Keywords: Nanocomposite, hyperthermia, cancer therapy, drug release.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4700
453 Data Mining Techniques in Computer-Aided Diagnosis: Non-Invasive Cancer Detection

Authors: Florin Gorunescu

Abstract:

Diagnosis can be achieved by building a model of a certain organ under surveillance and comparing it with the real time physiological measurements taken from the patient. This paper deals with the presentation of the benefits of using Data Mining techniques in the computer-aided diagnosis (CAD), focusing on the cancer detection, in order to help doctors to make optimal decisions quickly and accurately. In the field of the noninvasive diagnosis techniques, the endoscopic ultrasound elastography (EUSE) is a recent elasticity imaging technique, allowing characterizing the difference between malignant and benign tumors. Digitalizing and summarizing the main EUSE sample movies features in a vector form concern with the use of the exploratory data analysis (EDA). Neural networks are then trained on the corresponding EUSE sample movies vector input in such a way that these intelligent systems are able to offer a very precise and objective diagnosis, discriminating between benign and malignant tumors. A concrete application of these Data Mining techniques illustrates the suitability and the reliability of this methodology in CAD.

Keywords: Endoscopic ultrasound elastography, exploratorydata analysis, neural networks, non-invasive cancer detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1815
452 Using Machine Learning Techniques for Autism Spectrum Disorder Analysis and Detection in Children

Authors: Norah Alshahrani, Abdulaziz Almaleh

Abstract:

Autism Spectrum Disorder (ASD) is a condition related to issues with brain development that affects how a person recognises and communicates with others which results in difficulties with interaction and communication socially and it is constantly growing. Early recognition of ASD allows children to lead safe and healthy lives and helps doctors with accurate diagnoses and management of conditions. Therefore, it is crucial to develop a method that will achieve good results and with high accuracy for the measurement of ASD in children. In this paper, ASD datasets of toddlers and children have been analyzed. We employed the following machine learning techniques to attempt to explore ASD: Random Forest (RF), Decision Tree (DT), Na¨ıve Bayes (NB) and Support Vector Machine (SVM). Then feature selection was used to provide fewer attributes from ASD datasets while preserving model performance. As a result, we found that the best result has been provided by SVM, achieving 0.98% in the toddler dataset and 0.99% in the children dataset.

Keywords: Autism Spectrum Disorder, ASD, Machine Learning, ML, Feature Selection, Support Vector Machine, SVM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 505
451 Genetic Programming Approach for Multi-Category Pattern Classification Appliedto Network Intrusions Detection

Authors: K.M. Faraoun, A. Boukelif

Abstract:

This paper describes a new approach of classification using genetic programming. The proposed technique consists of genetically coevolving a population of non-linear transformations on the input data to be classified, and map them to a new space with a reduced dimension, in order to get a maximum inter-classes discrimination. The classification of new samples is then performed on the transformed data, and so become much easier. Contrary to the existing GP-classification techniques, the proposed one use a dynamic repartition of the transformed data in separated intervals, the efficacy of a given intervals repartition is handled by the fitness criterion, with a maximum classes discrimination. Experiments were first performed using the Fisher-s Iris dataset, and then, the KDD-99 Cup dataset was used to study the intrusion detection and classification problem. Obtained results demonstrate that the proposed genetic approach outperform the existing GP-classification methods [1],[2] and [3], and give a very accepted results compared to other existing techniques proposed in [4],[5],[6],[7] and [8].

Keywords: Genetic programming, patterns classification, intrusion detection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1665
450 Class Outliers Mining: Distance-Based Approach

Authors: Nabil M. Hewahi, Motaz K. Saad

Abstract:

In large datasets, identifying exceptional or rare cases with respect to a group of similar cases is considered very significant problem. The traditional problem (Outlier Mining) is to find exception or rare cases in a dataset irrespective of the class label of these cases, they are considered rare events with respect to the whole dataset. In this research, we pose the problem that is Class Outliers Mining and a method to find out those outliers. The general definition of this problem is “given a set of observations with class labels, find those that arouse suspicions, taking into account the class labels". We introduce a novel definition of Outlier that is Class Outlier, and propose the Class Outlier Factor (COF) which measures the degree of being a Class Outlier for a data object. Our work includes a proposal of a new algorithm towards mining of the Class Outliers, presenting experimental results applied on various domains of real world datasets and finally a comparison study with other related methods is performed.

Keywords: Class Outliers, Distance-Based Approach, Outliers Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3341
449 Continual Learning Using Data Generation for Hyperspectral Remote Sensing Scene Classification

Authors: Samiah Alammari, Nassim Ammour

Abstract:

When providing a massive number of tasks successively to a deep learning process, a good performance of the model requires preserving the previous tasks data to retrain the model for each upcoming classification. Otherwise, the model performs poorly due to the catastrophic forgetting phenomenon. To overcome this shortcoming, we developed a successful continual learning deep model for remote sensing hyperspectral image regions classification. The proposed neural network architecture encapsulates two trainable subnetworks. The first module adapts its weights by minimizing the discrimination error between the land-cover classes during the new task learning, and the second module tries to learn how to replicate the data of the previous tasks by discovering the latent data structure of the new task dataset. We conduct experiments on hyperspectral image (HSI) dataset on Indian Pines. The results confirm the capability of the proposed method.

Keywords: Continual learning, data reconstruction, remote sensing, hyperspectral image segmentation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 151
448 Machine Learning-Enabled Classification of Climbing Using Small Data

Authors: Nicholas Milburn, Yu Liang, Dalei Wu

Abstract:

Athlete performance scoring within the climbing domain presents interesting challenges as the sport does not have an objective way to assign skill. Assessing skill levels within any sport is valuable as it can be used to mark progress while training, and it can help an athlete choose appropriate climbs to attempt. Machine learning-based methods are popular for complex problems like this. The dataset available was composed of dynamic force data recorded during climbing; however, this dataset came with challenges such as data scarcity, imbalance, and it was temporally heterogeneous. Investigated solutions to these challenges include data augmentation, temporal normalization, conversion of time series to the spectral domain, and cross validation strategies. The investigated solutions to the classification problem included light weight machine classifiers KNN and SVM as well as the deep learning with CNN. The best performing model had an 80% accuracy. In conclusion, there seems to be enough information within climbing force data to accurately categorize climbers by skill.

Keywords: Classification, climbing, data imbalance, data scarcity, machine learning, time sequence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 489
447 Analysis of Relation between Unlabeled and Labeled Data to Self-Taught Learning Performance

Authors: Ekachai Phaisangittisagul, Rapeepol Chongprachawat

Abstract:

Obtaining labeled data in supervised learning is often difficult and expensive, and thus the trained learning algorithm tends to be overfitting due to small number of training data. As a result, some researchers have focused on using unlabeled data which may not necessary to follow the same generative distribution as the labeled data to construct a high-level feature for improving performance on supervised learning tasks. In this paper, we investigate the impact of the relationship between unlabeled and labeled data for classification performance. Specifically, we will apply difference unlabeled data which have different degrees of relation to the labeled data for handwritten digit classification task based on MNIST dataset. Our experimental results show that the higher the degree of relation between unlabeled and labeled data, the better the classification performance. Although the unlabeled data that is completely from different generative distribution to the labeled data provides the lowest classification performance, we still achieve high classification performance. This leads to expanding the applicability of the supervised learning algorithms using unsupervised learning.

Keywords: Autoencoder, high-level feature, MNIST dataset, selftaught learning, supervised learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1778
446 Fractal Dimension of Breast Cancer Cell Migration in a Wound Healing Assay

Authors: R. Sullivan, T. Holden, G. Tremberger, Jr, E. Cheung, C. Branch, J. Burrero, G. Surpris, S. Quintana, A. Rameau, N. Gadura, H. Yao, R. Subramaniam, P. Schneider, S. A. Rotenberg, P. Marchese, A. Flamhlolz, D. Lieberman, T. Cheung

Abstract:

Migration in breast cancer cell wound healing assay had been studied using image fractal dimension analysis. The migration of MDA-MB-231 cells (highly motile) in a wound healing assay was captured using time-lapse phase contrast video microscopy and compared to MDA-MB-468 cell migration (moderately motile). The Higuchi fractal method was used to compute the fractal dimension of the image intensity fluctuation along a single pixel width region parallel to the wound. The near-wound region fractal dimension was found to decrease three times faster in the MDA-MB- 231 cells initially as compared to the less cancerous MDA-MB-468 cells. The inner region fractal dimension was found to be fairly constant for both cell types in time and suggests a wound influence range of about 15 cell layer. The box-counting fractal dimension method was also used to study region of interest (ROI). The MDAMB- 468 ROI area fractal dimension was found to decrease continuously up to 7 hours. The MDA-MB-231 ROI area fractal dimension was found to increase and is consistent with the behavior of a HGF-treated MDA-MB-231 wound healing assay posted in the public domain. A fractal dimension based capacity index has been formulated to quantify the invasiveness of the MDA-MB-231 cells in the perpendicular-to-wound direction. Our results suggest that image intensity fluctuation fractal dimension analysis can be used as a tool to quantify cell migration in terms of cancer severity and treatment responses.

Keywords: Higuchi fractal dimension, box-counting fractal dimension, cancer cell migration, wound healing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2485
445 Towards Real-Time Classification of Finger Movement Direction Using Encephalography Independent Components

Authors: Mohamed Mounir Tellache, Hiroyuki Kambara, Yasuharu Koike, Makoto Miyakoshi, Natsue Yoshimura

Abstract:

This study explores the practicality of using electroencephalographic (EEG) independent components to predict eight-direction finger movements in pseudo-real-time. Six healthy participants with individual-head MRI images performed finger movements in eight directions with two different arm configurations. The analysis was performed in two stages. The first stage consisted of using independent component analysis (ICA) to separate the signals representing brain activity from non-brain activity signals and to obtain the unmixing matrix. The resulting independent components (ICs) were checked, and those reflecting brain-activity were selected. Finally, the time series of the selected ICs were used to predict eight finger-movement directions using Sparse Logistic Regression (SLR). The second stage consisted of using the previously obtained unmixing matrix, the selected ICs, and the model obtained by applying SLR to classify a different EEG dataset. This method was applied to two different settings, namely the single-participant level and the group-level. For the single-participant level, the EEG dataset used in the first stage and the EEG dataset used in the second stage originated from the same participant. For the group-level, the EEG datasets used in the first stage were constructed by temporally concatenating each combination without repetition of the EEG datasets of five participants out of six, whereas the EEG dataset used in the second stage originated from the remaining participants. The average test classification results across datasets (mean ± S.D.) were 38.62 ± 8.36% for the single-participant, which was significantly higher than the chance level (12.50 ± 0.01%), and 27.26 ± 4.39% for the group-level which was also significantly higher than the chance level (12.49% ± 0.01%). The classification accuracy within [–45°, 45°] of the true direction is 70.03 ± 8.14% for single-participant and 62.63 ± 6.07% for group-level which may be promising for some real-life applications. Clustering and contribution analyses further revealed the brain regions involved in finger movement and the temporal aspect of their contribution to the classification. These results showed the possibility of using the ICA-based method in combination with other methods to build a real-time system to control prostheses.

Keywords: Brain-computer interface, BCI, electroencephalography, EEG, finger motion decoding, independent component analysis, pseudo-real-time motion decoding.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 543
444 Advanced Image Analysis Tools Development for the Early Stage Bronchial Cancer Detection

Authors: P. Bountris, E. Farantatos, N. Apostolou

Abstract:

Autofluorescence (AF) bronchoscopy is an established method to detect dysplasia and carcinoma in situ (CIS). For this reason the “Sotiria" Hospital uses the Karl Storz D-light system. However, in early tumor stages the visualization is not that obvious. With the help of a PC, we analyzed the color images we captured by developing certain tools in Matlab®. We used statistical methods based on texture analysis, signal processing methods based on Gabor models and conversion algorithms between devicedependent color spaces. Our belief is that we reduced the error made by the naked eye. The tools we implemented improve the quality of patients' life.

Keywords: Bronchoscopy, digital image processing, lung cancer, texture analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1389
443 In silico Repopulation Model of Various Tumour Cells during Treatment Breaks in Head and Neck Cancer Radiotherapy

Authors: Loredana G. Marcu, David Marcu, Sanda M. Filip

Abstract:

Advanced head and neck cancers are aggressive tumours, which require aggressive treatment. Treatment efficiency is often hindered by cancer cell repopulation during radiotherapy, which is due to various mechanisms triggered by the loss of tumour cells and involves both stem and differentiated cells. The aim of the current paper is to present in silico simulations of radiotherapy schedules on a virtual head and neck tumour grown with biologically realistic kinetic parameters. Using the linear quadratic formalism of cell survival after radiotherapy, altered fractionation schedules employing various treatment breaks for normal tissue recovery are simulated and repopulation mechanism implemented in order to evaluate the impact of various cancer cell contribution on tumour behaviour during irradiation. The model has shown that the timing of treatment breaks is an important factor influencing tumour control in rapidly proliferating tissues such as squamous cell carcinomas of the head and neck. Furthermore, not only stem cells but also differentiated cells, via the mechanism of abortive division, can contribute to malignant cell repopulation during treatment.

Keywords: Radiation, tumour repopulation, squamous cell carcinoma, stem cell.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1917