Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 101

Search results for: pre-processing

101 LaPEA: Language for Preprocessing of Edge Applications in Smart Factory

Authors: Masaki Sakai, Tsuyoshi Nakajima, Kazuya Takahashi

Abstract:

In order to improve the productivity of a factory, it is often the case to create an inference model by collecting and analyzing operational data off-line and then to develop an edge application (EAP) that evaluates the quality of the products or diagnoses machine faults in real-time. To accelerate this development cycle, an edge application framework for the smart factory is proposed, which enables to create and modify EAPs based on prepared inference models. In the framework, the preprocessing component is the key part to make it work. This paper proposes a language for preprocessing of edge applications, called LaPEA, which can flexibly process several sensor data from machines into explanatory variables for an inference model, and proves that it meets the requirements for the preprocessing.

Keywords: edge application framework, edgecross, preprocessing language, smart factory

Procedia PDF Downloads 60
100 Optimized Preprocessing for Accurate and Efficient Bioassay Prediction with Machine Learning Algorithms

Authors: Jeff Clarine, Chang-Shyh Peng, Daisy Sang

Abstract:

Bioassay is the measurement of the potency of a chemical substance by its effect on a living animal or plant tissue. Bioassay data and chemical structures from pharmacokinetic and drug metabolism screening are mined from and housed in multiple databases. Bioassay prediction is calculated accordingly to determine further advancement. This paper proposes a four-step preprocessing of datasets for improving the bioassay predictions. The first step is instance selection in which dataset is categorized into training, testing, and validation sets. The second step is discretization that partitions the data in consideration of accuracy vs. precision. The third step is normalization where data are normalized between 0 and 1 for subsequent machine learning processing. The fourth step is feature selection where key chemical properties and attributes are generated. The streamlined results are then analyzed for the prediction of effectiveness by various machine learning algorithms including Pipeline Pilot, R, Weka, and Excel. Experiments and evaluations reveal the effectiveness of various combination of preprocessing steps and machine learning algorithms in more consistent and accurate prediction.

Keywords: bioassay, machine learning, preprocessing, virtual screen

Procedia PDF Downloads 191
99 The Implementation of the Javanese Lettered-Manuscript Image Preprocessing Stage Model on the Batak Lettered-Manuscript Image

Authors: Anastasia Rita Widiarti, Agus Harjoko, Marsono, Sri Hartati

Abstract:

This paper presents the results of a study to test whether the Javanese character manuscript image preprocessing model that have been more widely applied, can also be applied to segment of the Batak characters manuscripts. The treatment process begins by converting the input image into a binary image. After the binary image is cleaned of noise, then the segmentation lines using projection profile is conducted. If unclear histogram projection is found, then the smoothing process before production indexes line segments is conducted. For each line image which has been produced, then the segmentation scripts in the line is applied, with regard of the connectivity between pixels which making up the letters that there is no characters are truncated. From the results of manuscript preprocessing system prototype testing, it is obtained the information about the system truth percentage value on pieces of Pustaka Batak Podani Ma AjiMamisinon manuscript ranged from 65% to 87.68% with a confidence level of 95%. The value indicates the truth percentage shown the initial processing model in Javanese characters manuscript image can be applied also to the image of the Batak characters manuscript.

Keywords: connected component, preprocessing, manuscript image, projection profiles

Procedia PDF Downloads 311
98 Heuristic to Generate Random X-Monotone Polygons

Authors: Kamaljit Pati, Manas Kumar Mohanty, Sanjib Sadhu

Abstract:

A heuristic has been designed to generate a random simple monotone polygon from a given set of ‘n’ points lying on a 2-Dimensional plane. Our heuristic generates a random monotone polygon in O(n) time after O(nℓogn) preprocessing time which is improved over the previous work where a random monotone polygon is produced in the same O(n) time but the preprocessing time is O(k) for n < k < n2. However, our heuristic does not generate all possible random polygons with uniform probability. The space complexity of our proposed heuristic is O(n).

Keywords: sorting, monotone polygon, visibility, chain

Procedia PDF Downloads 353
97 Arabic Text Representation and Classification Methods: Current State of the Art

Authors: Rami Ayadi, Mohsen Maraoui, Mounir Zrigui

Abstract:

In this paper, we have presented a brief current state of the art for Arabic text representation and classification methods. We decomposed Arabic Task Classification into four categories. First we describe some algorithms applied to classification on Arabic text. Secondly, we cite all major works when comparing classification algorithms applied on Arabic text, after this, we mention some authors who proposing new classification methods and finally we investigate the impact of preprocessing on Arabic TC.

Keywords: text classification, Arabic, impact of preprocessing, classification algorithms

Procedia PDF Downloads 324
96 Preprocessing and Fusion of Multiple Representation of Finger Vein patterns using Conventional and Machine Learning techniques

Authors: Tomas Trainys, Algimantas Venckauskas

Abstract:

Application of biometric features to the cryptography for human identification and authentication is widely studied and promising area of the development of high-reliability cryptosystems. Biometric cryptosystems typically are designed for patterns recognition, which allows biometric data acquisition from an individual, extracts feature sets, compares the feature set against the set stored in the vault and gives a result of the comparison. Preprocessing and fusion of biometric data are the most important phases in generating a feature vector for key generation or authentication. Fusion of biometric features is critical for achieving a higher level of security and prevents from possible spoofing attacks. The paper focuses on the tasks of initial processing and fusion of multiple representations of finger vein modality patterns. These tasks are solved by applying conventional image preprocessing methods and machine learning techniques, Convolutional Neural Network (SVM) method for image segmentation and feature extraction. An article presents a method for generating sets of biometric features from a finger vein network using several instances of the same modality. Extracted features sets were fused at the feature level. The proposed method was tested and compared with the performance and accuracy results of other authors.

Keywords: bio-cryptography, biometrics, cryptographic key generation, data fusion, information security, SVM, pattern recognition, finger vein method.

Procedia PDF Downloads 72
95 Improving the Performance of Deep Learning in Facial Emotion Recognition with Image Sharpening

Authors: Ksheeraj Sai Vepuri, Nada Attar

Abstract:

We as humans use words with accompanying visual and facial cues to communicate effectively. Classifying facial emotion using computer vision methodologies has been an active research area in the computer vision field. In this paper, we propose a simple method for facial expression recognition that enhances accuracy. We tested our method on the FER-2013 dataset that contains static images. Instead of using Histogram equalization to preprocess the dataset, we used Unsharp Mask to emphasize texture and details and sharpened the edges. We also used ImageDataGenerator from Keras library for data augmentation. Then we used Convolutional Neural Networks (CNN) model to classify the images into 7 different facial expressions, yielding an accuracy of 69.46% on the test set. Our results show that using image preprocessing such as the sharpening technique for a CNN model can improve the performance, even when the CNN model is relatively simple.

Keywords: facial expression recognittion, image preprocessing, deep learning, CNN

Procedia PDF Downloads 47
94 Mean Shift-Based Preprocessing Methodology for Improved 3D Buildings Reconstruction

Authors: Nikolaos Vassilas, Theocharis Tsenoglou, Djamchid Ghazanfarpour

Abstract:

In this work we explore the capability of the mean shift algorithm as a powerful preprocessing tool for improving the quality of spatial data, acquired from airborne scanners, from densely built urban areas. On one hand, high resolution image data corrupted by noise caused by lossy compression techniques are appropriately smoothed while at the same time preserving the optical edges and, on the other, low resolution LiDAR data in the form of normalized Digital Surface Map (nDSM) is upsampled through the joint mean shift algorithm. Experiments on both the edge-preserving smoothing and upsampling capabilities using synthetic RGB-z data show that the mean shift algorithm is superior to bilateral filtering as well as to other classical smoothing and upsampling algorithms. Application of the proposed methodology for 3D reconstruction of buildings of a pilot region of Athens, Greece results in a significant visual improvement of the 3D building block model.

Keywords: 3D buildings reconstruction, data fusion, data upsampling, mean shift

Procedia PDF Downloads 252
93 Estimation of Chronic Kidney Disease Using Artificial Neural Network

Authors: Ilker Ali Ozkan

Abstract:

In this study, an artificial neural network model has been developed to estimate chronic kidney failure which is a common disease. The patients’ age, their blood and biochemical values, and 24 input data which consists of various chronic diseases are used for the estimation process. The input data have been subjected to preprocessing because they contain both missing values and nominal values. 147 patient data which was obtained from the preprocessing have been divided into as 70% training and 30% testing data. As a result of the study, artificial neural network model with 25 neurons in the hidden layer has been found as the model with the lowest error value. Chronic kidney failure disease has been able to be estimated accurately at the rate of 99.3% using this artificial neural network model. The developed artificial neural network has been found successful for the estimation of chronic kidney failure disease using clinical data.

Keywords: estimation, artificial neural network, chronic kidney failure disease, disease diagnosis

Procedia PDF Downloads 256
92 Clothes Identification Using Inception ResNet V2 and MobileNet V2

Authors: Subodh Chandra Shakya, Badal Shrestha, Suni Thapa, Ashutosh Chauhan, Saugat Adhikari

Abstract:

To tackle our problem of clothes identification, we used different architectures of Convolutional Neural Networks. Among different architectures, the outcome from Inception ResNet V2 and MobileNet V2 seemed promising. On comparison of the metrices, we observed that the Inception ResNet V2 slightly outperforms MobileNet V2 for this purpose. So this paper of ours proposes the cloth identifier using Inception ResNet V2 and also contains the comparison between the outcome of ResNet V2 and MobileNet V2. The document here contains the results and findings of the research that we performed on the DeepFashion Dataset. To improve the dataset, we used different image preprocessing techniques like image shearing, image rotation, and denoising. The whole experiment was conducted with the intention of testing the efficiency of convolutional neural networks on cloth identification so that we could develop a reliable system that is good enough in identifying the clothes worn by the users. The whole system can be integrated with some kind of recommendation system.

Keywords: inception ResNet, convolutional neural net, deep learning, confusion matrix, data augmentation, data preprocessing

Procedia PDF Downloads 82
91 Sentiment Analysis: An Enhancement of Ontological-Based Features Extraction Techniques and Word Equations

Authors: Mohd Ridzwan Yaakub, Muhammad Iqbal Abu Latiffi

Abstract:

Online business has become popular recently due to the massive amount of information and medium available on the Internet. This has resulted in the huge number of reviews where the consumers share their opinion, criticisms, and satisfaction on the products they have purchased on the websites or the social media such as Facebook and Twitter. However, to analyze customer’s behavior has become very important for organizations to find new market trends and insights. The reviews from the websites or the social media are in structured and unstructured data that need a sentiment analysis approach in analyzing customer’s review. In this article, techniques used in will be defined. Definition of the ontology and description of its possible usage in sentiment analysis will be defined. It will lead to empirical research that related to mobile phones used in research and the ontology used in the experiment. The researcher also will explore the role of preprocessing data and feature selection methodology. As the result, ontology-based approach in sentiment analysis can help in achieving high accuracy for the classification task.

Keywords: feature selection, ontology, opinion, preprocessing data, sentiment analysis

Procedia PDF Downloads 114
90 Microarray Data Visualization and Preprocessing Using R and Bioconductor

Authors: Ruchi Yadav, Shivani Pandey, Prachi Srivastava

Abstract:

Microarrays provide a rich source of data on the molecular working of cells. Each microarray reports on the abundance of tens of thousands of mRNAs. Virtually every human disease is being studied using microarrays with the hope of finding the molecular mechanisms of disease. Bioinformatics analysis plays an important part of processing the information embedded in large-scale expression profiling studies and for laying the foundation for biological interpretation. A basic, yet challenging task in the analysis of microarray gene expression data is the identification of changes in gene expression that are associated with particular biological conditions. Careful statistical design and analysis are essential to improve the efficiency and reliability of microarray experiments throughout the data acquisition and analysis process. One of the most popular platforms for microarray analysis is Bioconductor, an open source and open development software project based on the R programming language. This paper describes specific procedures for conducting quality assessment, visualization and preprocessing of Affymetrix Gene Chip and also details the different bioconductor packages used to analyze affymetrix microarray data and describe the analysis and outcome of each plots.

Keywords: microarray analysis, R language, affymetrix visualization, bioconductor

Procedia PDF Downloads 405
89 A New Approach of Preprocessing with SVM Optimization Based on PSO for Bearing Fault Diagnosis

Authors: Tawfik Thelaidjia, Salah Chenikher

Abstract:

Bearing fault diagnosis has attracted significant attention over the past few decades. It consists of two major parts: vibration signal feature extraction and condition classification for the extracted features. In this paper, feature extraction from faulty bearing vibration signals is performed by a combination of the signal’s Kurtosis and features obtained through the preprocessing of the vibration signal samples using Db2 discrete wavelet transform at the fifth level of decomposition. In this way, a 7-dimensional vector of the vibration signal feature is obtained. After feature extraction from vibration signal, the support vector machine (SVM) was applied to automate the fault diagnosis procedure. To improve the classification accuracy for bearing fault prediction, particle swarm optimization (PSO) is employed to simultaneously optimize the SVM kernel function parameter and the penalty parameter. The results have shown feasibility and effectiveness of the proposed approach

Keywords: condition monitoring, discrete wavelet transform, fault diagnosis, kurtosis, machine learning, particle swarm optimization, roller bearing, rotating machines, support vector machine, vibration measurement

Procedia PDF Downloads 340
88 High-Resolution ECG Automated Analysis and Diagnosis

Authors: Ayad Dalloo, Sulaf Dalloo

Abstract:

Electrocardiogram (ECG) recording is prone to complications, on analysis by physicians, due to noise and artifacts, thus creating ambiguity leading to possible error of diagnosis. Such drawbacks may be overcome with the advent of high resolution Methods, such as Discrete Wavelet Analysis and Digital Signal Processing (DSP) techniques. This ECG signal analysis is implemented in three stages: ECG preprocessing, features extraction and classification with the aim of realizing high resolution ECG diagnosis and improved detection of abnormal conditions in the heart. The preprocessing stage involves removing spurious artifacts (noise), due to such factors as muscle contraction, motion, respiration, etc. ECG features are extracted by applying DSP and suggested sloping method techniques. These measured features represent peak amplitude values and intervals of P, Q, R, S, R’, and T waves on ECG, and other features such as ST elevation, QRS width, heart rate, electrical axis, QR and QT intervals. The classification is preformed using these extracted features and the criteria for cardiovascular diseases. The ECG diagnostic system is successfully applied to 12-lead ECG recordings for 12 cases. The system is provided with information to enable it diagnoses 15 different diseases. Physician’s and computer’s diagnoses are compared with 90% agreement, with respect to physician diagnosis, and the time taken for diagnosis is 2 seconds. All of these operations are programmed in Matlab environment.

Keywords: ECG diagnostic system, QRS detection, ECG baseline removal, cardiovascular diseases

Procedia PDF Downloads 229
87 Brain Tumor Segmentation Based on Minimum Spanning Tree

Authors: Simeon Mayala, Ida Herdlevær, Jonas Bull Haugsøen, Shamundeeswari Anandan, Sonia Gavasso, Morten Brun

Abstract:

In this paper, we propose a minimum spanning tree-based method for segmenting brain tumors. The proposed method performs interactive segmentation based on the minimum spanning tree without tuning parameters. The steps involve preprocessing, making a graph, constructing a minimum spanning tree, and a newly implemented way of interactively segmenting the region of interest. In the preprocessing step, a Gaussian filter is applied to 2D images to remove the noise. Then, the pixel neighbor graph is weighted by intensity differences and the corresponding minimum spanning tree is constructed. The image is loaded in an interactive window for segmenting the tumor. The region of interest and the background are selected by clicking to split the minimum spanning tree into two trees. One of these trees represents the region of interest and the other represents the background. Finally, the segmentation given by the two trees is visualized. The proposed method was tested by segmenting two different 2D brain T1-weighted magnetic resonance image data sets. The comparison between our results and the standard gold segmentation confirmed the validity of the minimum spanning tree approach. The proposed method is simple to implement and the results indicate that it is accurate and efficient.

Keywords: brain tumor, brain tumor segmentation, minimum spanning tree, segmentation, image processing

Procedia PDF Downloads 30
86 Principle Component Analysis on Colon Cancer Detection

Authors: N. K. Caecar Pratiwi, Yunendah Nur Fuadah, Rita Magdalena, R. D. Atmaja, Sofia Saidah, Ocky Tiaramukti

Abstract:

Colon cancer or colorectal cancer is a type of cancer that attacks the last part of the human digestive system. Lymphoma and carcinoma are types of cancer that attack human’s colon. Colon cancer causes deaths about half a million people every year. In Indonesia, colon cancer is the third largest cancer case for women and second in men. Unhealthy lifestyles such as minimum consumption of fiber, rarely exercising and lack of awareness for early detection are factors that cause high cases of colon cancer. The aim of this project is to produce a system that can detect and classify images into type of colon cancer lymphoma, carcinoma, or normal. The designed system used 198 data colon cancer tissue pathology, consist of 66 images for Lymphoma cancer, 66 images for carcinoma cancer and 66 for normal / healthy colon condition. This system will classify colon cancer starting from image preprocessing, feature extraction using Principal Component Analysis (PCA) and classification using K-Nearest Neighbor (K-NN) method. Several stages in preprocessing are resize, convert RGB image to grayscale, edge detection and last, histogram equalization. Tests will be done by trying some K-NN input parameter setting. The result of this project is an image processing system that can detect and classify the type of colon cancer with high accuracy and low computation time.

Keywords: carcinoma, colorectal cancer, k-nearest neighbor, lymphoma, principle component analysis

Procedia PDF Downloads 88
85 Assessment of Pre-Processing Influence on Near-Infrared Spectra for Predicting the Mechanical Properties of Wood

Authors: Aasheesh Raturi, Vimal Kothiyal, P. D. Semalty

Abstract:

We studied mechanical properties of Eucalyptus tereticornis using FT-NIR spectroscopy. Firstly, spectra were pre-processed to eliminate useless information. Then, prediction model was constructed by partial least squares regression. To study the influence of pre-processing on prediction of mechanical properties for NIR analysis of wood samples, we applied various pretreatment methods like straight line subtraction, constant offset elimination, vector-normalization, min-max normalization, multiple scattering. Correction, first derivative, second derivatives and their combination with other treatment such as First derivative + straight line subtraction, First derivative+ vector normalization and First derivative+ multiplicative scattering correction. The data processing methods in combination of preprocessing with different NIR regions, RMSECV, RMSEP and optimum factors/rank were obtained by optimization process of model development. More than 350 combinations were obtained during optimization process. More than one pre-processing method gave good calibration/cross-validation and prediction/test models, but only the best calibration/cross-validation and prediction/test models are reported here. The results show that one can safely use NIR region between 4000 to 7500 cm-1 with straight line subtraction, constant offset elimination, first derivative and second derivative preprocessing method which were found to be most appropriate for models development.

Keywords: FT-NIR, mechanical properties, pre-processing, PLS

Procedia PDF Downloads 263
84 Selecting the Best Sub-Region Indexing the Images in the Case of Weak Segmentation Based on Local Color Histograms

Authors: Mawloud Mosbah, Bachir Boucheham

Abstract:

Color Histogram is considered as the oldest method used by CBIR systems for indexing images. In turn, the global histograms do not include the spatial information; this is why the other techniques coming later have attempted to encounter this limitation by involving the segmentation task as a preprocessing step. The weak segmentation is employed by the local histograms while other methods as CCV (Color Coherent Vector) are based on strong segmentation. The indexation based on local histograms consists of splitting the image into N overlapping blocks or sub-regions, and then the histogram of each block is computed. The dissimilarity between two images is reduced, as consequence, to compute the distance between the N local histograms of the both images resulting then in N*N values; generally, the lowest value is taken into account to rank images, that means that the lowest value is that which helps to designate which sub-region utilized to index images of the collection being asked. In this paper, we make under light the local histogram indexation method in the hope to compare the results obtained against those given by the global histogram. We address also another noteworthy issue when Relying on local histograms namely which value, among N*N values, to trust on when comparing images, in other words, which sub-region among the N*N sub-regions on which we base to index images. Based on the results achieved here, it seems that relying on the local histograms, which needs to pose an extra overhead on the system by involving another preprocessing step naming segmentation, does not necessary mean that it produces better results. In addition to that, we have proposed here some ideas to select the local histogram on which we rely on to encode the image rather than relying on the local histogram having lowest distance with the query histograms.

Keywords: CBIR, color global histogram, color local histogram, weak segmentation, Euclidean distance

Procedia PDF Downloads 290
83 An Adaptive Oversampling Technique for Imbalanced Datasets

Authors: Shaukat Ali Shahee, Usha Ananthakumar

Abstract:

A data set exhibits class imbalance problem when one class has very few examples compared to the other class, and this is also referred to as between class imbalance. The traditional classifiers fail to classify the minority class examples correctly due to its bias towards the majority class. Apart from between-class imbalance, imbalance within classes where classes are composed of a different number of sub-clusters with these sub-clusters containing different number of examples also deteriorates the performance of the classifier. Previously, many methods have been proposed for handling imbalanced dataset problem. These methods can be classified into four categories: data preprocessing, algorithmic based, cost-based methods and ensemble of classifier. Data preprocessing techniques have shown great potential as they attempt to improve data distribution rather than the classifier. Data preprocessing technique handles class imbalance either by increasing the minority class examples or by decreasing the majority class examples. Decreasing the majority class examples lead to loss of information and also when minority class has an absolute rarity, removing the majority class examples is generally not recommended. Existing methods available for handling class imbalance do not address both between-class imbalance and within-class imbalance simultaneously. In this paper, we propose a method that handles between class imbalance and within class imbalance simultaneously for binary classification problem. Removing between class imbalance and within class imbalance simultaneously eliminates the biases of the classifier towards bigger sub-clusters by minimizing the error domination of bigger sub-clusters in total error. The proposed method uses model-based clustering to find the presence of sub-clusters or sub-concepts in the dataset. The number of examples oversampled among the sub-clusters is determined based on the complexity of sub-clusters. The method also takes into consideration the scatter of the data in the feature space and also adaptively copes up with unseen test data using Lowner-John ellipsoid for increasing the accuracy of the classifier. In this study, neural network is being used as this is one such classifier where the total error is minimized and removing the between-class imbalance and within class imbalance simultaneously help the classifier in giving equal weight to all the sub-clusters irrespective of the classes. The proposed method is validated on 9 publicly available data sets and compared with three existing oversampling techniques that rely on the spatial location of minority class examples in the euclidean feature space. The experimental results show the proposed method to be statistically significantly superior to other methods in terms of various accuracy measures. Thus the proposed method can serve as a good alternative to handle various problem domains like credit scoring, customer churn prediction, financial distress, etc., that typically involve imbalanced data sets.

Keywords: classification, imbalanced dataset, Lowner-John ellipsoid, model based clustering, oversampling

Procedia PDF Downloads 330
82 Improve Student Performance Prediction Using Majority Vote Ensemble Model for Higher Education

Authors: Wade Ghribi, Abdelmoty M. Ahmed, Ahmed Said Badawy, Belgacem Bouallegue

Abstract:

In higher education institutions, the most pressing priority is to improve student performance and retention. Large volumes of student data are used in Educational Data Mining techniques to find new hidden information from students' learning behavior, particularly to uncover the early symptom of at-risk pupils. On the other hand, data with noise, outliers, and irrelevant information may provide incorrect conclusions. By identifying features of students' data that have the potential to improve performance prediction results, comparing and identifying the most appropriate ensemble learning technique after preprocessing the data, and optimizing the hyperparameters, this paper aims to develop a reliable students' performance prediction model for Higher Education Institutions. Data was gathered from two different systems: a student information system and an e-learning system for undergraduate students in the College of Computer Science of a Saudi Arabian State University. The cases of 4413 students were used in this article. The process includes data collection, data integration, data preprocessing (such as cleaning, normalization, and transformation), feature selection, pattern extraction, and, finally, model optimization and assessment. Random Forest, Bagging, Stacking, Majority Vote, and two types of Boosting techniques, AdaBoost and XGBoost, are ensemble learning approaches, whereas Decision Tree, Support Vector Machine, and Artificial Neural Network are supervised learning techniques. Hyperparameters for ensemble learning systems will be fine-tuned to provide enhanced performance and optimal output. The findings imply that combining features of students' behavior from e-learning and students' information systems using Majority Vote produced better outcomes than the other ensemble techniques.

Keywords: educational data mining, student performance prediction, e-learning, classification, ensemble learning, higher education

Procedia PDF Downloads 13
81 Prediction of Alzheimer's Disease Based on Blood Biomarkers and Machine Learning Algorithms

Authors: Man-Yun Liu, Emily Chia-Yu Su

Abstract:

Alzheimer's disease (AD) is the public health crisis of the 21st century. AD is a degenerative brain disease and the most common cause of dementia, a costly disease on the healthcare system. Unfortunately, the cause of AD is poorly understood, furthermore; the treatments of AD so far can only alleviate symptoms rather cure or stop the progress of the disease. Currently, there are several ways to diagnose AD; medical imaging can be used to distinguish between AD, other dementias, and early onset AD, and cerebrospinal fluid (CSF). Compared with other diagnostic tools, blood (plasma) test has advantages as an approach to population-based disease screening because it is simpler, less invasive also cost effective. In our study, we used blood biomarkers dataset of The Alzheimer’s disease Neuroimaging Initiative (ADNI) which was funded by National Institutes of Health (NIH) to do data analysis and develop a prediction model. We used independent analysis of datasets to identify plasma protein biomarkers predicting early onset AD. Firstly, to compare the basic demographic statistics between the cohorts, we used SAS Enterprise Guide to do data preprocessing and statistical analysis. Secondly, we used logistic regression, neural network, decision tree to validate biomarkers by SAS Enterprise Miner. This study generated data from ADNI, contained 146 blood biomarkers from 566 participants. Participants include cognitive normal (healthy), mild cognitive impairment (MCI), and patient suffered Alzheimer’s disease (AD). Participants’ samples were separated into two groups, healthy and MCI, healthy and AD, respectively. We used the two groups to compare important biomarkers of AD and MCI. In preprocessing, we used a t-test to filter 41/47 features between the two groups (healthy and AD, healthy and MCI) before using machine learning algorithms. Then we have built model with 4 machine learning methods, the best AUC of two groups separately are 0.991/0.709. We want to stress the importance that the simple, less invasive, common blood (plasma) test may also early diagnose AD. As our opinion, the result will provide evidence that blood-based biomarkers might be an alternative diagnostics tool before further examination with CSF and medical imaging. A comprehensive study on the differences in blood-based biomarkers between AD patients and healthy subjects is warranted. Early detection of AD progression will allow physicians the opportunity for early intervention and treatment.

Keywords: Alzheimer's disease, blood-based biomarkers, diagnostics, early detection, machine learning

Procedia PDF Downloads 244
80 Brain Age Prediction Based on Brain Magnetic Resonance Imaging by 3D Convolutional Neural Network

Authors: Leila Keshavarz Afshar, Hedieh Sajedi

Abstract:

Estimation of biological brain age from MR images is a topic that has been much addressed in recent years due to the importance it attaches to early diagnosis of diseases such as Alzheimer's. In this paper, we use a 3D Convolutional Neural Network (CNN) to provide a method for estimating the biological age of the brain. The 3D-CNN model is trained by MRI data that has been normalized. In addition, to reduce computation while saving overall performance, some effectual slices are selected for age estimation. By this method, the biological age of individuals using selected normalized data was estimated with Mean Absolute Error (MAE) of 4.82 years.

Keywords: brain age estimation, biological age, 3D-CNN, deep learning, T1-weighted image, SPM, preprocessing, MRI, canny, gray matter

Procedia PDF Downloads 56
79 Edge Detection in Low Contrast Images

Authors: Koushlendra Kumar Singh, Manish Kumar Bajpai, Rajesh K. Pandey

Abstract:

The edges of low contrast images are not clearly distinguishable to the human eye. It is difficult to find the edges and boundaries in it. The present work encompasses a new approach for low contrast images. The Chebyshev polynomial based fractional order filter has been used for filtering operation on an image. The preprocessing has been performed by this filter on the input image. Laplacian of Gaussian method has been applied on preprocessed image for edge detection. The algorithm has been tested on two test images.

Keywords: low contrast image, fractional order differentiator, Laplacian of Gaussian (LoG) method, chebyshev polynomial

Procedia PDF Downloads 452
78 Mood Recognition Using Indian Music

Authors: Vishwa Joshi

Abstract:

The study of mood recognition in the field of music has gained a lot of momentum in the recent years with machine learning and data mining techniques and many audio features contributing considerably to analyze and identify the relation of mood plus music. In this paper we consider the same idea forward and come up with making an effort to build a system for automatic recognition of mood underlying the audio song’s clips by mining their audio features and have evaluated several data classification algorithms in order to learn, train and test the model describing the moods of these audio songs and developed an open source framework. Before classification, Preprocessing and Feature Extraction phase is necessary for removing noise and gathering features respectively.

Keywords: music, mood, features, classification

Procedia PDF Downloads 400
77 Automatic Segmentation of Lung Pleura Based On Curvature Analysis

Authors: Sasidhar B., Bhaskar Rao N., Ramesh Babu D. R., Ravi Shankar M.

Abstract:

Segmentation of lung pleura is a preprocessing step in Computer-Aided Diagnosis (CAD) which helps in reducing false positives in detection of lung cancer. The existing methods fail in extraction of lung regions with the nodules at the pleura of the lungs. In this paper, a new method is proposed which segments lung regions with nodules at the pleura of the lungs based on curvature analysis and morphological operators. The proposed algorithm is tested on 06 patient’s dataset which consists of 60 images of Lung Image Database Consortium (LIDC) and the results are found to be satisfactory with 98.3% average overlap measure (AΩ).

Keywords: curvature analysis, image segmentation, morphological operators, thresholding

Procedia PDF Downloads 496
76 Sweepline Algorithm for Voronoi Diagram of Polygonal Sites

Authors: Dmitry A. Koptelov, Leonid M. Mestetskiy

Abstract:

Voronoi Diagram (VD) of finite set of disjoint simple polygons, called sites, is a partition of plane into loci (for each site at the locus) – regions, consisting of points that are closer to a given site than to all other. Set of polygons is a universal model for many applications in engineering, geoinformatics, design, computer vision, and graphics. VD of polygons construction usually done with a reduction to task of constructing VD of segments, for which there are effective O(n log n) algorithms for n segments. Preprocessing – constructing segments from polygons’ sides, and postprocessing – polygon’s loci construction by merging the loci of the sides of each polygon are also included in reduction. This approach doesn’t take into account two specific properties of the resulting segment sites. Firstly, all this segments are connected in pairs in the vertices of the polygons. Secondly, on the one side of each segment lies the interior of the polygon. The polygon is obviously included in its locus. Using this properties in the algorithm for VD construction is a resource to reduce computations. The article proposes an algorithm for the direct construction of VD of polygonal sites. Algorithm is based on sweepline paradigm, allowing to effectively take into account these properties. The solution is performed based on reduction. Preprocessing is the constructing of set of sites from vertices and edges of polygons. Each site has an orientation such that the interior of the polygon lies to the left of it. Proposed algorithm constructs VD for set of oriented sites with sweepline paradigm. Postprocessing is a selecting of edges of this VD formed by the centers of empty circles touching different polygons. Improving the efficiency of the proposed sweepline algorithm in comparison with the general Fortune algorithm is achieved due to the following fundamental solutions: 1. Algorithm constructs only such VD edges, which are on the outside of polygons. Concept of oriented sites allowed to avoid construction of VD edges located inside the polygons. 2. The list of events in sweepline algorithm has a special property: the majority of events are connected with “medium” polygon vertices, where one incident polygon side lies behind the sweepline and the other in front of it. The proposed algorithm processes such events in constant time and not in logarithmic time, as in the general Fortune algorithm. The proposed algorithm is fully implemented and tested on a large number of examples. The high reliability and efficiency of the algorithm is also confirmed by computational experiments with complex sets of several thousand polygons. It should be noted that, despite the considerable time that has passed since the publication of Fortune's algorithm in 1986, a full-scale implementation of this algorithm for an arbitrary set of segment sites has not been made. The proposed algorithm fills this gap for an important special case - a set of sites formed by polygons.

Keywords: voronoi diagram, sweepline, polygon sites, fortunes' algorithm, segment sites

Procedia PDF Downloads 67
75 Forecasting Residential Water Consumption in Hamilton, New Zealand

Authors: Farnaz Farhangi

Abstract:

Many people in New Zealand believe that the access to water is inexhaustible, and it comes from a history of virtually unrestricted access to it. For the region like Hamilton which is one of New Zealand’s fastest growing cities, it is crucial for policy makers to know about the future water consumption and implementation of rules and regulation such as universal water metering. Hamilton residents use water freely and they do not have any idea about how much water they use. Hence, one of proposed objectives of this research is focusing on forecasting water consumption using different methods. Residential water consumption time series exhibits seasonal and trend variations. Seasonality is the pattern caused by repeating events such as weather conditions in summer and winter, public holidays, etc. The problem with this seasonal fluctuation is that, it dominates other time series components and makes difficulties in determining other variations (such as educational campaign’s effect, regulation, etc.) in time series. Apart from seasonality, a stochastic trend is also combined with seasonality and makes different effects on results of forecasting. According to the forecasting literature, preprocessing (de-trending and de-seasonalization) is essential to have more performed forecasting results, while some other researchers mention that seasonally non-adjusted data should be used. Hence, I answer the question that is pre-processing essential? A wide range of forecasting methods exists with different pros and cons. In this research, I apply double seasonal ARIMA and Artificial Neural Network (ANN), considering diverse elements such as seasonality and calendar effects (public and school holidays) and combine their results to find the best predicted values. My hypothesis is the examination the results of combined method (hybrid model) and individual methods and comparing the accuracy and robustness. In order to use ARIMA, the data should be stationary. Also, ANN has successful forecasting applications in terms of forecasting seasonal and trend time series. Using a hybrid model is a way to improve the accuracy of the methods. Due to the fact that water demand is dominated by different seasonality, in order to find their sensitivity to weather conditions or calendar effects or other seasonal patterns, I combine different methods. The advantage of this combination is reduction of errors by averaging of each individual model. It is also useful when we are not sure about the accuracy of each forecasting model and it can ease the problem of model selection. Using daily residential water consumption data from January 2000 to July 2015 in Hamilton, I indicate how prediction by different methods varies. ANN has more accurate forecasting results than other method and preprocessing is essential when we use seasonal time series. Using hybrid model reduces forecasting average errors and increases the performance.

Keywords: artificial neural network (ANN), double seasonal ARIMA, forecasting, hybrid model

Procedia PDF Downloads 245
74 From Two-Way to Multi-Way: A Comparative Study for Map-Reduce Join Algorithms

Authors: Marwa Hussien Mohamed, Mohamed Helmy Khafagy

Abstract:

Map-Reduce is a programming model which is widely used to extract valuable information from enormous volumes of data. Map-reduce designed to support heterogeneous datasets. Apache Hadoop map-reduce used extensively to uncover hidden pattern like data mining, SQL, etc. The most important operation for data analysis is joining operation. But, map-reduce framework does not directly support join algorithm. This paper explains and compares two-way and multi-way map-reduce join algorithms for map reduce also we implement MR join Algorithms and show the performance of each phase in MR join algorithms. Our experimental results show that map side join and map merge join in two-way join algorithms has the longest time according to preprocessing step sorting data and reduce side cascade join has the longest time at Multi-Way join algorithms.

Keywords: Hadoop, MapReduce, multi-way join, two-way join, Ubuntu

Procedia PDF Downloads 349
73 Density-based Denoising of Point Cloud

Authors: Faisal Zaman, Ya Ping Wong, Boon Yian Ng

Abstract:

Point cloud source data for surface reconstruction is usually contaminated with noise and outliers. To overcome this, we present a novel approach using modified kernel density estimation (KDE) technique with bilateral filtering to remove noisy points and outliers. First we present a method for estimating optimal bandwidth of multivariate KDE using particle swarm optimization technique which ensures the robust performance of density estimation. Then we use mean-shift algorithm to find the local maxima of the density estimation which gives the centroid of the clusters. Then we compute the distance of a certain point from the centroid. Points belong to outliers then removed by automatic thresholding scheme which yields an accurate and economical point surface. The experimental results show that our approach comparably robust and efficient.

Keywords: point preprocessing, outlier removal, surface reconstruction, kernel density estimation

Procedia PDF Downloads 227
72 Identity Verification Using k-NN Classifiers and Autistic Genetic Data

Authors: Fuad M. Alkoot

Abstract:

DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN). 

Keywords: biometrics, genetic data, identity verification, k nearest neighbor

Procedia PDF Downloads 172