Search results for: preprocessing techniques
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 6485

6485 Text Data Preprocessing Library: Bilingual Approach

Authors: Kabil Boukhari

Abstract:

In the context of information retrieval, the selection of the most relevant words is a very important step. Text cleaning keeps only the most representative words for better use. In this paper, we propose a library for text preprocessing within an implemented application to facilitate this task. This study has two purposes. The first is to present the related work on the various steps involved in text preprocessing, presenting the segmentation, stemming and lemmatization algorithms that could be efficient in the rest of the study. The second is to implement a tool for text preprocessing in French and English. This library accepts unstructured text as input and provides the preprocessed text as output, based on a set of rules and on a base of stop words for both languages. The proposed library has been tested on different corpora and gave interesting results.
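As a rough illustration of the rule-and-stop-word approach described in this abstract, the sketch below lowercases text, splits it into word tokens, and drops stop words for English or French. It is not the authors' library: the function name `preprocess` and the tiny stop-word lists are assumptions made for the example.

```python
import re

# Tiny illustrative stop-word lists (assumed for this example);
# a real library would ship far larger ones.
STOP_WORDS = {
    "en": {"the", "is", "a", "of", "and", "in", "to"},
    "fr": {"le", "la", "les", "est", "un", "une", "de", "du", "et", "dans"},
}

def preprocess(text, lang):
    """Lowercase, split into word tokens, and drop stop words for `lang`."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS[lang]]

print(preprocess("The selection of relevant words", "en"))
print(preprocess("Le nettoyage du texte", "fr"))
```

Stemming and lemmatization, which the abstract also mentions, are left out here because they need language-specific rules or dictionaries.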

Keywords: text preprocessing, segmentation, knowledge extraction, normalization, text generation, information retrieval

Procedia PDF Downloads 59
6484 LaPEA: Language for Preprocessing of Edge Applications in Smart Factory

Authors: Masaki Sakai, Tsuyoshi Nakajima, Kazuya Takahashi

Abstract:

To improve the productivity of a factory, it is common to create an inference model by collecting and analyzing operational data off-line and then to develop an edge application (EAP) that evaluates the quality of the products or diagnoses machine faults in real time. To accelerate this development cycle, an edge application framework for the smart factory is proposed, which enables EAPs to be created and modified based on prepared inference models. In the framework, the preprocessing component is the key part that makes it work. This paper proposes a language for preprocessing of edge applications, called LaPEA, which can flexibly process data from several machine sensors into explanatory variables for an inference model, and proves that it meets the requirements for the preprocessing.

Keywords: edge application framework, edgecross, preprocessing language, smart factory

Procedia PDF Downloads 118
6483 Preprocessing and Fusion of Multiple Representation of Finger Vein patterns using Conventional and Machine Learning techniques

Authors: Tomas Trainys, Algimantas Venckauskas

Abstract:

Application of biometric features to cryptography for human identification and authentication is a widely studied and promising area in the development of high-reliability cryptosystems. Biometric cryptosystems are typically designed for pattern recognition: they acquire biometric data from an individual, extract feature sets, compare the feature set against the set stored in the vault, and give the result of the comparison. Preprocessing and fusion of biometric data are the most important phases in generating a feature vector for key generation or authentication. Fusion of biometric features is critical for achieving a higher level of security and prevents possible spoofing attacks. The paper focuses on the tasks of initial processing and fusion of multiple representations of finger vein modality patterns. These tasks are solved by applying conventional image preprocessing methods and machine learning techniques, Convolutional Neural Network (SVM) method for image segmentation and feature extraction. The article presents a method for generating sets of biometric features from a finger vein network using several instances of the same modality. Extracted feature sets were fused at the feature level. The proposed method was tested and compared with the performance and accuracy results of other authors.

Keywords: bio-cryptography, biometrics, cryptographic key generation, data fusion, information security, SVM, pattern recognition, finger vein method.

Procedia PDF Downloads 118
6482 Sentiment Analysis: An Enhancement of Ontological-Based Features Extraction Techniques and Word Equations

Authors: Mohd Ridzwan Yaakub, Muhammad Iqbal Abu Latiffi

Abstract:

Online business has become popular recently due to the massive amount of information and media available on the Internet. This has resulted in a huge number of reviews in which consumers share their opinions, criticisms, and satisfaction with the products they have purchased on websites or social media such as Facebook and Twitter. Analyzing customer behavior has therefore become very important for organizations seeking new market trends and insights. The reviews from websites or social media come as structured and unstructured data that need a sentiment analysis approach to analyze customer reviews. In this article, techniques used in will be defined. The definition of the ontology and a description of its possible usage in sentiment analysis will be given. It will lead to empirical research related to the mobile phones used in the research and the ontology used in the experiment. The researcher will also explore the role of data preprocessing and feature selection methodology. As a result, an ontology-based approach to sentiment analysis can help achieve high accuracy in the classification task.

Keywords: feature selection, ontology, opinion, preprocessing data, sentiment analysis

Procedia PDF Downloads 174
6481 Optimized Preprocessing for Accurate and Efficient Bioassay Prediction with Machine Learning Algorithms

Authors: Jeff Clarine, Chang-Shyh Peng, Daisy Sang

Abstract:

Bioassay is the measurement of the potency of a chemical substance by its effect on a living animal or plant tissue. Bioassay data and chemical structures from pharmacokinetic and drug metabolism screening are mined from and housed in multiple databases. Bioassay prediction is calculated accordingly to determine further advancement. This paper proposes a four-step preprocessing of datasets for improving bioassay predictions. The first step is instance selection, in which the dataset is divided into training, testing, and validation sets. The second step is discretization, which partitions the data in consideration of accuracy vs. precision. The third step is normalization, where data are normalized between 0 and 1 for subsequent machine learning processing. The fourth step is feature selection, where key chemical properties and attributes are generated. The streamlined results are then analyzed for the prediction of effectiveness by various machine learning algorithms and tools, including Pipeline Pilot, R, Weka, and Excel. Experiments and evaluations reveal the effectiveness of various combinations of preprocessing steps and machine learning algorithms in producing more consistent and accurate predictions.
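The third step above, normalization between 0 and 1, is commonly done with min-max scaling. The following is a minimal sketch of that one step. It assumes min-max scaling, which the abstract does not name explicitly, and `min_max_normalize` is a hypothetical helper name.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([2, 4, 6]))
```

In practice the minimum and maximum would usually be computed on the training set only and reused for the testing and validation sets.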

Keywords: bioassay, machine learning, preprocessing, virtual screen

Procedia PDF Downloads 247
6480 The Implementation of the Javanese Lettered-Manuscript Image Preprocessing Stage Model on the Batak Lettered-Manuscript Image

Authors: Anastasia Rita Widiarti, Agus Harjoko, Marsono, Sri Hartati

Abstract:

This paper presents the results of a study testing whether the Javanese character manuscript image preprocessing model, which has been more widely applied, can also be applied to segment Batak character manuscripts. The process begins by converting the input image into a binary image. After the binary image is cleaned of noise, line segmentation using projection profiles is conducted. If the projection histogram is unclear, a smoothing process is applied before the line segment indexes are produced. For each line image produced, character segmentation within the line is applied, taking into account the connectivity between the pixels that make up the letters so that no characters are truncated. Testing of the manuscript preprocessing system prototype showed that the system's correctness percentage on pieces of the Pustaka Batak Podani Ma AjiMamisinon manuscript ranged from 65% to 87.68% at a confidence level of 95%. These values indicate that the initial processing model for Javanese character manuscript images can also be applied to images of Batak character manuscripts.
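Line segmentation with a projection profile can be sketched as follows: sum the ink pixels in each row of the binary image and treat every run of non-empty rows as one text line. This is a simplified illustration under assumed conventions (1 = ink, 0 = background, no smoothing step), not the authors' implementation, and `segment_lines` is a hypothetical name.

```python
def segment_lines(binary_image):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    binary_image is a list of rows; each pixel is 1 (ink) or 0 (background).
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in binary_image]
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # a text line begins
        elif count == 0 and start is not None:
            lines.append((start, y))       # the text line ended on the previous row
            start = None
    if start is not None:
        lines.append((start, len(profile)))  # the last line touches the bottom edge
    return lines

image = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]
print(segment_lines(image))
```

A noisy profile would need smoothing or a threshold above zero before the runs are extracted, which is the role of the smoothing step mentioned in the abstract.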

Keywords: connected component, preprocessing, manuscript image, projection profiles

Procedia PDF Downloads 372
6479 High-Resolution ECG Automated Analysis and Diagnosis

Authors: Ayad Dalloo, Sulaf Dalloo

Abstract:

Electrocardiogram (ECG) recording is prone to complications when analyzed by physicians, due to noise and artifacts, creating ambiguity that can lead to diagnostic error. Such drawbacks may be overcome with the advent of high-resolution methods, such as Discrete Wavelet Analysis and Digital Signal Processing (DSP) techniques. This ECG signal analysis is implemented in three stages: ECG preprocessing, feature extraction, and classification, with the aim of realizing high-resolution ECG diagnosis and improved detection of abnormal conditions in the heart. The preprocessing stage involves removing spurious artifacts (noise) due to such factors as muscle contraction, motion, respiration, etc. ECG features are extracted by applying DSP and the suggested sloping method techniques. These measured features represent peak amplitude values and intervals of P, Q, R, S, R’, and T waves on the ECG, and other features such as ST elevation, QRS width, heart rate, electrical axis, QR and QT intervals. The classification is performed using these extracted features and the criteria for cardiovascular diseases. The ECG diagnostic system is successfully applied to 12-lead ECG recordings for 12 cases. The system is provided with information to enable it to diagnose 15 different diseases. Physician’s and computer’s diagnoses are compared with 90% agreement, with respect to physician diagnosis, and the time taken for diagnosis is 2 seconds. All of these operations are programmed in the Matlab environment.

Keywords: ECG diagnostic system, QRS detection, ECG baseline removal, cardiovascular diseases

Procedia PDF Downloads 271
6478 Mean Shift-Based Preprocessing Methodology for Improved 3D Buildings Reconstruction

Authors: Nikolaos Vassilas, Theocharis Tsenoglou, Djamchid Ghazanfarpour

Abstract:

In this work we explore the capability of the mean shift algorithm as a powerful preprocessing tool for improving the quality of spatial data, acquired from airborne scanners, from densely built urban areas. On one hand, high resolution image data corrupted by noise caused by lossy compression techniques are appropriately smoothed while at the same time preserving the optical edges and, on the other, low resolution LiDAR data in the form of normalized Digital Surface Map (nDSM) is upsampled through the joint mean shift algorithm. Experiments on both the edge-preserving smoothing and upsampling capabilities using synthetic RGB-z data show that the mean shift algorithm is superior to bilateral filtering as well as to other classical smoothing and upsampling algorithms. Application of the proposed methodology for 3D reconstruction of buildings of a pilot region of Athens, Greece results in a significant visual improvement of the 3D building block model.

Keywords: 3D buildings reconstruction, data fusion, data upsampling, mean shift

Procedia PDF Downloads 289
6477 Clothes Identification Using Inception ResNet V2 and MobileNet V2

Authors: Subodh Chandra Shakya, Badal Shrestha, Suni Thapa, Ashutosh Chauhan, Saugat Adhikari

Abstract:

To tackle our problem of clothes identification, we used different architectures of Convolutional Neural Networks. Among the different architectures, the outcomes from Inception ResNet V2 and MobileNet V2 seemed promising. On comparison of the metrics, we observed that Inception ResNet V2 slightly outperforms MobileNet V2 for this purpose. This paper therefore proposes a cloth identifier using Inception ResNet V2 and also contains the comparison between the outcomes of ResNet V2 and MobileNet V2. The document contains the results and findings of the research that we performed on the DeepFashion Dataset. To improve the dataset, we used different image preprocessing techniques such as image shearing, image rotation, and denoising. The whole experiment was conducted with the intention of testing the efficiency of convolutional neural networks on cloth identification so that we could develop a reliable system that is good enough at identifying the clothes worn by users. The whole system can be integrated with some kind of recommendation system.

Keywords: inception ResNet, convolutional neural net, deep learning, confusion matrix, data augmentation, data preprocessing

Procedia PDF Downloads 147
6476 A Survey of Feature Selection and Feature Extraction Techniques in Machine Learning

Authors: Samina Khalid, Shamila Nasreen

Abstract:

Dimensionality reduction as a preprocessing step to machine learning is effective in removing irrelevant and redundant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase in the dimensionality of data poses a severe challenge to many existing feature selection and feature extraction methods with respect to efficiency and effectiveness. In the field of machine learning and pattern recognition, dimensionality reduction is an important area in which many approaches have been proposed. In this paper, some widely used feature selection and feature extraction techniques are analyzed to determine how effectively they can be used to achieve high performance of learning algorithms, which ultimately improves the predictive accuracy of the classifier. A brief analysis of dimensionality reduction techniques is presented to investigate the strengths and weaknesses of some widely used dimensionality reduction methods.

Keywords: age related macular degeneration, feature selection, feature subset selection, feature extraction/transformation, FSA’s, relief, correlation based method, PCA, ICA

Procedia PDF Downloads 453
6475 Improve Student Performance Prediction Using Majority Vote Ensemble Model for Higher Education

Authors: Wade Ghribi, Abdelmoty M. Ahmed, Ahmed Said Badawy, Belgacem Bouallegue

Abstract:

In higher education institutions, the most pressing priority is to improve student performance and retention. Large volumes of student data are used in Educational Data Mining techniques to find new hidden information from students' learning behavior, particularly to uncover the early symptom of at-risk pupils. On the other hand, data with noise, outliers, and irrelevant information may provide incorrect conclusions. By identifying features of students' data that have the potential to improve performance prediction results, comparing and identifying the most appropriate ensemble learning technique after preprocessing the data, and optimizing the hyperparameters, this paper aims to develop a reliable students' performance prediction model for Higher Education Institutions. Data was gathered from two different systems: a student information system and an e-learning system for undergraduate students in the College of Computer Science of a Saudi Arabian State University. The cases of 4413 students were used in this article. The process includes data collection, data integration, data preprocessing (such as cleaning, normalization, and transformation), feature selection, pattern extraction, and, finally, model optimization and assessment. Random Forest, Bagging, Stacking, Majority Vote, and two types of Boosting techniques, AdaBoost and XGBoost, are ensemble learning approaches, whereas Decision Tree, Support Vector Machine, and Artificial Neural Network are supervised learning techniques. Hyperparameters for ensemble learning systems will be fine-tuned to provide enhanced performance and optimal output. The findings imply that combining features of students' behavior from e-learning and students' information systems using Majority Vote produced better outcomes than the other ensemble techniques.
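The Majority Vote ensemble named in this abstract combines the labels predicted by several base classifiers and keeps the most frequent one for each student. A minimal sketch of hard voting is shown below. The function name, the label values, and the three example models are assumptions for illustration, not the authors' setup.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by hard voting.

    predictions[m][i] is the label that model m assigns to sample i.
    """
    # zip(*predictions) regroups the labels so each tuple holds all votes for one sample.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

model_a = ["pass", "fail", "pass"]
model_b = ["pass", "pass", "fail"]
model_c = ["fail", "fail", "pass"]
print(majority_vote([model_a, model_b, model_c]))
```

With an even number of models a tie is possible; here it would be resolved in favor of the label seen first, so an odd number of voters is the simpler choice.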

Keywords: educational data mining, student performance prediction, e-learning, classification, ensemble learning, higher education

Procedia PDF Downloads 74
6474 Heuristic to Generate Random X-Monotone Polygons

Authors: Kamaljit Pati, Manas Kumar Mohanty, Sanjib Sadhu

Abstract:

A heuristic has been designed to generate a random simple monotone polygon from a given set of ‘n’ points lying on a 2-dimensional plane. Our heuristic generates a random monotone polygon in O(n) time after O(n log n) preprocessing time, which improves on previous work where a random monotone polygon is produced in the same O(n) time but the preprocessing time is O(k) for n < k < n². However, our heuristic does not generate all possible random polygons with uniform probability. The space complexity of our proposed heuristic is O(n).
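To make the complexity claim concrete, the sketch below builds one x-monotone polygon from a point set: sorting by x is the O(n log n) preprocessing, and a single O(n) pass then splits the points into a lower and an upper chain around the segment joining the leftmost and rightmost points. This is a deterministic textbook construction, not the authors' randomized heuristic, and it assumes distinct x-coordinates and points that are not all collinear.

```python
def x_monotone_polygon(points):
    """Return the vertices of a simple x-monotone polygon through all points.

    Assumes distinct x-coordinates and at least one point off the line
    through the leftmost and rightmost points.
    """
    pts = sorted(points)               # O(n log n) preprocessing: sort by x
    p, q = pts[0], pts[-1]             # leftmost and rightmost points

    def side(r):
        # Cross product of (q - p) and (r - p): positive if r lies above line pq.
        return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

    upper = [r for r in pts[1:-1] if side(r) > 0]
    lower = [r for r in pts[1:-1] if side(r) <= 0]
    # Walk the lower chain left to right, then the upper chain right to left.
    return [p] + lower + [q] + upper[::-1]

print(x_monotone_polygon([(3, 1), (0, 0), (2, -1), (4, 0), (1, 2)]))
```

Because the two chains lie on opposite sides of the segment pq and each is sorted by x, they cannot cross, so the resulting polygon is simple.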

Keywords: sorting, monotone polygon, visibility, chain

Procedia PDF Downloads 401
6473 Analysis of Expression Data Using Unsupervised Techniques

Authors: M. A. I Perera, C. R. Wijesinghe, A. R. Weerasinghe

Abstract:

This study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better subtypes of tumors. Identifying subtypes of cancer helps improve the efficacy and reduce the toxicity of treatments by providing clues for finding targeted therapeutics. The process of gene expression data analysis is described in three steps: preprocessing, clustering, and cluster validation. Feature selection is important since genomic data are high dimensional, with a large number of features compared to samples. Hierarchical clustering and K-Means are often used in the analysis of gene expression data. Several cluster validation techniques are used to validate the clusters. Heatmaps are an effective external validation method that allows the identified classes to be compared with clinical variables and analyzed visually.

Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation

Procedia PDF Downloads 112
6472 Deep Learning for Qualitative and Quantitative Grain Quality Analysis Using Hyperspectral Imaging

Authors: Ole-Christian Galbo Engstrøm, Erik Schou Dreier, Birthe Møller Jespersen, Kim Steenstrup Pedersen

Abstract:

Grain quality analysis is a multi-parameterized problem that includes a variety of qualitative and quantitative parameters such as grain type classification, damage type classification, and nutrient regression. Currently, these parameters require human inspection, a multitude of instruments employing a variety of sensor technologies, and predictive model types or destructive and slow chemical analysis. This paper investigates the feasibility of applying near-infrared hyperspectral imaging (NIR-HSI) to grain quality analysis. For this study two datasets of NIR hyperspectral images in the wavelength range of 900 nm - 1700 nm have been used. Both datasets contain images of sparsely and densely packed grain kernels. The first dataset contains ~87,000 image crops of bulk wheat samples from 63 harvests where protein value has been determined by the FOSS Infratec NOVA which is the golden industry standard for protein content estimation in bulk samples of cereal grain. The second dataset consists of ~28,000 image crops of bulk grain kernels from seven different wheat varieties and a single rye variety. In the first dataset, protein regression analysis is the problem to solve while variety classification analysis is the problem to solve in the second dataset. Deep convolutional neural networks (CNNs) have the potential to utilize spatio-spectral correlations within a hyperspectral image to simultaneously estimate the qualitative and quantitative parameters. CNNs can autonomously derive meaningful representations of the input data reducing the need for advanced preprocessing techniques required for classical chemometric model types such as artificial neural networks (ANNs) and partial least-squares regression (PLS-R). A comparison between different CNN architectures utilizing 2D and 3D convolution is conducted. These results are compared to the performance of ANNs and PLS-R. Additionally, a variety of preprocessing techniques from image analysis and chemometrics are tested. 
These include centering, scaling, standard normal variate (SNV), Savitzky-Golay (SG) filtering, and detrending. The results indicate that the combination of NIR-HSI and CNNs has the potential to be the foundation for an automatic system unifying qualitative and quantitative grain quality analysis within a single sensor technology and predictive model type.
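Of the preprocessing techniques listed, standard normal variate (SNV) is the simplest to show: each spectrum is centered on its own mean and divided by its own standard deviation. The sketch below assumes the sample standard deviation (n - 1 in the denominator), which is one common convention. The abstract does not say which variant was used, and `snv` is a hypothetical helper name.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (assumed convention)
    return [(x - m) / s for x in spectrum]

print(snv([1.0, 2.0, 3.0]))
```

SNV is applied to each spectrum independently, so it needs no statistics from the rest of the dataset. A perfectly flat spectrum has zero standard deviation and would need special handling.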

Keywords: deep learning, grain analysis, hyperspectral imaging, preprocessing techniques

Procedia PDF Downloads 67
6471 Arabic Text Representation and Classification Methods: Current State of the Art

Authors: Rami Ayadi, Mohsen Maraoui, Mounir Zrigui

Abstract:

In this paper, we present a brief overview of the current state of the art in Arabic text representation and classification methods. We divide work on Arabic text classification (TC) into four categories. First, we describe some algorithms applied to the classification of Arabic text. Second, we cite the major works that compare classification algorithms applied to Arabic text. Third, we mention some authors who propose new classification methods. Finally, we investigate the impact of preprocessing on Arabic TC.

Keywords: text classification, Arabic, impact of preprocessing, classification algorithms

Procedia PDF Downloads 435
6470 A Comparison of Convolutional Neural Network Architectures for the Classification of Alzheimer’s Disease Patients Using MRI Scans

Authors: Tomas Premoli, Sareh Rowlands

Abstract:

In this study, we investigate the impact of various convolutional neural network (CNN) architectures on the accuracy of diagnosing Alzheimer’s disease (AD) using patient MRI scans. Alzheimer’s disease is a debilitating neurodegenerative disorder that affects millions worldwide. Early, accurate, and non-invasive diagnostic methods are required for providing optimal care and symptom management. Deep learning techniques, particularly CNNs, have shown great promise in enhancing this diagnostic process. We aim to contribute to the ongoing research in this field by comparing the effectiveness of different CNN architectures and providing insights for future studies. Our methodology involved preprocessing MRI data, implementing multiple CNN architectures, and evaluating the performance of each model. We employed intensity normalization, linear registration, and skull stripping for our preprocessing. The selected architectures included VGG, ResNet, and DenseNet models, all implemented using the Keras library. We employed transfer learning and trained models from scratch to compare their effectiveness. Our findings demonstrated significant differences in performance among the tested architectures, with DenseNet201 achieving the highest accuracy of 86.4%. Transfer learning proved to be helpful in improving model performance. We also identified potential areas for future research, such as experimenting with other architectures, optimizing hyperparameters, and employing fine-tuning strategies. By providing a comprehensive analysis of the selected CNN architectures, we offer a solid foundation for future research in Alzheimer’s disease diagnosis using deep learning techniques. Our study highlights the potential of CNNs as a valuable diagnostic tool and emphasizes the importance of ongoing research to develop more accurate and effective models.

Keywords: Alzheimer’s disease, convolutional neural networks, deep learning, medical imaging, MRI

Procedia PDF Downloads 42
6469 An Adaptive Oversampling Technique for Imbalanced Datasets

Authors: Shaukat Ali Shahee, Usha Ananthakumar

Abstract:

A data set exhibits the class imbalance problem when one class has very few examples compared to the other class, and this is also referred to as between-class imbalance. Traditional classifiers fail to classify the minority class examples correctly due to their bias towards the majority class. Apart from between-class imbalance, imbalance within classes, where classes are composed of different numbers of sub-clusters and these sub-clusters contain different numbers of examples, also deteriorates the performance of the classifier. Previously, many methods have been proposed for handling the imbalanced dataset problem. These methods can be classified into four categories: data preprocessing, algorithm-based methods, cost-based methods, and ensembles of classifiers. Data preprocessing techniques have shown great potential as they attempt to improve the data distribution rather than the classifier. Data preprocessing techniques handle class imbalance either by increasing the minority class examples or by decreasing the majority class examples. Decreasing the majority class examples leads to loss of information, and when the minority class has an absolute rarity, removing majority class examples is generally not recommended. Existing methods for handling class imbalance do not address both between-class imbalance and within-class imbalance simultaneously. In this paper, we propose a method that handles between-class imbalance and within-class imbalance simultaneously for the binary classification problem. Removing between-class imbalance and within-class imbalance simultaneously eliminates the biases of the classifier towards bigger sub-clusters by minimizing the error domination of bigger sub-clusters in the total error. The proposed method uses model-based clustering to find the presence of sub-clusters or sub-concepts in the dataset. The number of examples oversampled among the sub-clusters is determined based on the complexity of the sub-clusters. 
The method also takes into consideration the scatter of the data in the feature space and adaptively copes with unseen test data using the Lowner-John ellipsoid to increase the accuracy of the classifier. In this study, a neural network is used because it is a classifier in which the total error is minimized, and removing the between-class imbalance and within-class imbalance simultaneously helps the classifier give equal weight to all the sub-clusters irrespective of the classes. The proposed method is validated on 9 publicly available data sets and compared with three existing oversampling techniques that rely on the spatial location of minority class examples in the Euclidean feature space. The experimental results show the proposed method to be statistically significantly superior to the other methods in terms of various accuracy measures. Thus, the proposed method can serve as a good alternative for handling various problem domains such as credit scoring, customer churn prediction, and financial distress, which typically involve imbalanced data sets.
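For contrast with the adaptive method proposed here, the simplest way to increase the minority class is plain random oversampling: duplicate randomly chosen minority examples until every class matches the size of the largest one. The sketch below shows only that naive baseline. It does not implement the authors' model-based clustering or the Lowner-John ellipsoid step, and `random_oversample` is a hypothetical name.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples until every class has as many as the largest class."""
    rng = random.Random(seed)          # fixed seed so the example is reproducible
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing examples with replacement from the class's own examples.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y

X = [[0], [1], [2], [3], [9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(y_bal.count(0), y_bal.count(1))
```

Because this baseline ignores sub-clusters inside each class, it balances the class counts but leaves the within-class imbalance discussed in the abstract untouched.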

Keywords: classification, imbalanced dataset, Lowner-John ellipsoid, model based clustering, oversampling

Procedia PDF Downloads 389
6468 Classification of Cochannel Signals Using Cyclostationary Signal Processing and Deep Learning

Authors: Bryan Crompton, Daniel Giger, Tanay Mehta, Apurva Mody

Abstract:

The task of classifying radio frequency (RF) signals has seen recent success in employing deep neural network models. In this work, we present a combined signal processing and machine learning approach to signal classification for cochannel anomalous signals. The power spectral density and cyclostationary signal processing features of a captured signal are computed and fed into a neural net to produce a classification decision. Our combined signal preprocessing and machine learning approach allows for simpler neural networks with fast training times and small computational resource requirements for inference, at the cost of longer preprocessing time.

Keywords: signal processing, machine learning, cyclostationary signal processing, signal classification

Procedia PDF Downloads 68
6467 Video Heart Rate Measurement for the Detection of Trauma-Related Stress States

Authors: Jarek Krajewski, David Daxberger, Luzi Beyer

Abstract:

Finding objective and non-intrusive measurements of emotional and psychopathological states (e.g., post-traumatic stress disorder, PTSD) is an important challenge. The approach proposed here therefore uses photoplethysmographic imaging (PPGI) applied to facial RGB camera videos to estimate heart rate levels. A pipeline for the signal processing of the raw image has been proposed containing different preprocessing approaches, e.g., Independent Component Analysis, Non-negative Matrix Factorization, and various other artefact correction approaches. Under resting and constant light conditions, we reached a sensitivity of 84% for pulse peak detection. The results indicate that PPGI can be a suitable solution for providing heart rate data from which post-traumatic stress states can be indirectly derived.

Keywords: heart rate, PTSD, PPGI, stress, preprocessing

Procedia PDF Downloads 100
6466 Automated End-to-End Pipeline Processing Solution for Autonomous Driving

Authors: Ashish Kumar, Munesh Raghuraj Varma, Nisarg Joshi, Gujjula Vishwa Teja, Srikanth Sambi, Arpit Awasthi

Abstract:

Autonomous driving vehicles are revolutionizing the transportation system of the 21st century. This has been possible due to intensive research put into making a robust, reliable, and intelligent program that can perceive and understand its environment and make decisions based on the understanding. It is a very data-intensive task with data coming from multiple sensors and the amount of data directly reflects on the performance of the system. Researchers have to design the preprocessing pipeline for different datasets with different sensor orientations and alignments before the dataset can be fed to the model. This paper proposes a solution that provides a method to unify all the data from different sources into a uniform format using the intrinsic and extrinsic parameters of the sensor used to capture the data allowing the same pipeline to use data from multiple sources at a time. This also means easy adoption of new datasets or In-house generated datasets. The solution also automates the complete deep learning pipeline from preprocessing to post-processing for various tasks allowing researchers to design multiple custom end-to-end pipelines. Thus, the solution takes care of the input and output data handling, saving the time and effort spent on it and allowing more time for model improvement.

Keywords: augmentation, autonomous driving, camera, custom end-to-end pipeline, data unification, lidar, post-processing, preprocessing

Procedia PDF Downloads 70
6465 Determination of Physical Properties of Crude Oil Distillates by Near-Infrared Spectroscopy and Multivariate Calibration

Authors: Ayten Ekin Meşe, Selahattin Şentürk, Melike Duvanoğlu

Abstract:

Petroleum refining is a highly complex process industry with continuous production and high operating costs. Physical separation of crude oil starts in the crude oil distillation unit, continues through various conversion and purification units, and passes through many stages before the final product is obtained. To meet the desired product specifications, process parameters are strictly followed. To ensure the quality of distillates, routine analyses are performed in quality control laboratories based on appropriate international standards such as American Society for Testing and Materials (ASTM) standard methods and European Standard (EN) methods. The cut point of distillates in the crude distillation unit is crucial for the efficiency of the downstream processes. To maximize process efficiency, the quality of distillates should be determined as quickly, reliably, and cost-effectively as possible. To this end, an alternative study was carried out on the crude oil distillation unit that serves the entire refinery process. In this work, studies were conducted with three different crude oil distillates: Light Straight Run Naphtha (LSRN), Heavy Straight Run Naphtha (HSRN), and Kerosene. These products are distinguished after separation by the number of carbon atoms they contain. LSRN consists of hydrocarbons containing five to six carbons, HSRN of six to ten, and kerosene of sixteen to twenty-two. Physical properties of the three crude distillation unit products (LSRN, HSRN, and Kerosene) were determined using Near-Infrared Spectroscopy with multivariate calibration. The absorbance spectra of the petroleum samples were obtained in the range from 10000 cm⁻¹ to 4000 cm⁻¹, employing a quartz transmittance flow-through cell with a 2 mm light path and a resolution of 2 cm⁻¹. A total of 400 samples were collected for each petroleum product over almost four years. Several different crude oil grades were processed during the sample collection period. Extended Multiplicative Signal Correction (EMSC) and Savitzky-Golay (SG) preprocessing techniques were applied to the FT-NIR spectra of the samples to eliminate baseline shifts and suppress unwanted variation. Two different multivariate calibration approaches (Partial Least Squares Regression, PLS, and Genetic Inverse Least Squares, GILS) and an ensemble model were applied to the preprocessed FT-NIR spectra. The predictive performance of each multivariate calibration and preprocessing technique was compared, and the best models were chosen according to the reproducibility of the ASTM reference methods. This work demonstrates that the developed models can be used for routine analysis instead of conventional analytical methods with over 90% accuracy.
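
The Savitzky-Golay step named in this abstract can be illustrated with a minimal sketch. The abstract does not state the window length or polynomial order, so a 5-point window with a quadratic fit is assumed here; the function name `savgol_smooth_5` and the synthetic absorbance values are hypothetical and not taken from the paper.

```python
# Standard Savitzky-Golay smoothing weights for a 5-point window, quadratic fit.
SG5_WEIGHTS = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)

def savgol_smooth_5(values):
    """Smooth a 1-D sequence; the two points at each edge are left unchanged."""
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window))
    return smoothed

# Hypothetical absorbance values: a quadratic trend plus alternating noise.
clean = [0.01 * k * k for k in range(9)]
noisy = [c + (0.05 if k % 2 == 0 else -0.05) for k, c in enumerate(clean)]
smoothed = savgol_smooth_5(noisy)
```

Because the weights come from a local quadratic fit, a noise-free quadratic trend passes through the filter unchanged while point-to-point noise is attenuated.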

Keywords: crude distillation unit, multivariate calibration, near infrared spectroscopy, data preprocessing, refinery

Procedia PDF Downloads 86
6464 Performance Evaluation of Various Segmentation Techniques on MRI of Brain Tissue

Authors: U.V. Suryawanshi, S.S. Chowhan, U.V. Kulkarni

Abstract:

Accuracy of segmentation methods is of great importance in brain image analysis. Tissue classification in Magnetic Resonance brain images (MRI) is an important issue in the analysis of several brain dementias. This paper presents the performance of segmentation techniques that are used on brain MRI. A large variety of algorithms for segmentation of brain MRI has been developed. The objective of this paper is to perform a segmentation process on MR images of the human brain using Fuzzy c-means (FCM), Kernel-based Fuzzy c-means clustering (KFCM), Spatial Fuzzy c-means (SFCM), and Improved Fuzzy c-means (IFCM). The review covers imaging modalities, MRI, methods for noise reduction, and segmentation approaches. All methods are applied to MRI brain images degraded by salt-and-pepper noise; the results demonstrate that the IFCM algorithm is more robust to noise than the standard FCM algorithm. We conclude with a discussion on the trend of future research in brain segmentation and changing norms in IFCM for better results.
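
As a rough illustration of the baseline method in this comparison, the sketch below implements standard fuzzy c-means only (not the KFCM, SFCM, or IFCM variants) on 1-D intensity values. The fuzzifier m = 2, the initial centres, and the sample intensities are assumptions made for the example.

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Standard FCM on 1-D data; returns (centers, memberships)."""
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in points:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # A point sitting exactly on a centre belongs fully to it.
                hit = dists.index(0.0)
                row = [1.0 if j == hit else 0.0 for j in range(len(centers))]
            else:
                row = [1.0 / sum((dj / dk) ** exponent for dk in dists)
                       for dj in dists]
            memberships.append(row)
        # Move each centre to the membership-weighted mean of the data.
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            centers[j] = sum(w * x for w, x in zip(weights, points)) / sum(weights)
    return centers, memberships

# Hypothetical pixel intensities forming a dark and a bright tissue class.
intensities = [10, 12, 11, 13, 200, 205, 198, 202]
centers, memberships = fuzzy_c_means(intensities, centers=[0.0, 255.0])
```

Each pixel receives a degree of membership in every cluster rather than a hard label, which is the property the noise-robust variants build on.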

Keywords: image segmentation, preprocessing, MRI, FCM, KFCM, SFCM, IFCM

Procedia PDF Downloads 296
6463 Selecting the Best Sub-Region Indexing the Images in the Case of Weak Segmentation Based on Local Color Histograms

Authors: Mawloud Mosbah, Bachir Boucheham

Abstract:

The color histogram is considered the oldest method used by CBIR systems for indexing images. Global histograms, however, do not include spatial information; this is why later techniques have attempted to overcome this limitation by involving the segmentation task as a preprocessing step. Weak segmentation is employed by local histograms, while other methods such as CCV (Color Coherent Vector) are based on strong segmentation. Indexing based on local histograms consists of splitting the image into N overlapping blocks or sub-regions and then computing the histogram of each block. The dissimilarity between two images is consequently reduced to computing the distances between the N local histograms of both images, resulting in N*N values; generally, the lowest value is used to rank images, which means that the lowest value designates which sub-region is used to index the images of the collection being queried. In this paper, we shed light on the local histogram indexing method in order to compare its results against those given by the global histogram. We also address another noteworthy issue when relying on local histograms, namely which of the N*N values to trust when comparing images, in other words, which sub-region to use as the basis for indexing images. Based on the results achieved here, it seems that relying on local histograms, which imposes an extra overhead on the system by involving another preprocessing step, namely segmentation, does not necessarily produce better results. In addition, we propose some ideas for selecting the local histogram used to encode the image, rather than relying on the local histogram with the lowest distance to the query histograms.
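
The N*N comparison described in this abstract can be sketched as follows. For brevity the example uses grayscale values and non-overlapping square blocks, both simplifications of the color, overlapping-block setting in the abstract; the helper names, bin count, and tiny test images are hypothetical.

```python
import math

def block_histograms(image, block, bins=4, levels=256):
    """Split a 2-D grayscale image into tiles; return one normalised histogram per tile."""
    hists = []
    for top in range(0, len(image), block):
        for left in range(0, len(image[0]), block):
            pixels = [image[r][c]
                      for r in range(top, min(top + block, len(image)))
                      for c in range(left, min(left + block, len(image[0])))]
            hist = [0.0] * bins
            for p in pixels:
                hist[p * bins // levels] += 1.0 / len(pixels)
            hists.append(hist)
    return hists

def pairwise_distances(hists_a, hists_b):
    """Euclidean distance between every local histogram of A and every one of B."""
    return [[math.dist(ha, hb) for hb in hists_b] for ha in hists_a]

query = [[0, 0, 255, 255], [0, 0, 255, 255], [0, 0, 0, 0], [0, 0, 0, 0]]
candidate = [[255, 255, 128, 128], [255, 255, 128, 128],
             [128, 128, 128, 128], [128, 128, 128, 128]]
dists = pairwise_distances(block_histograms(query, 2), block_histograms(candidate, 2))
# Lowest of the N*N values, with the indices of the two sub-regions that produced it.
best = min((d, i, j) for i, row in enumerate(dists) for j, d in enumerate(row))
```

Ranking by the minimum alone discards the other N*N - 1 distances, which is the selection question the abstract raises.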

Keywords: CBIR, color global histogram, color local histogram, weak segmentation, Euclidean distance

Procedia PDF Downloads 335
6462 Improving the Performance of Deep Learning in Facial Emotion Recognition with Image Sharpening

Authors: Ksheeraj Sai Vepuri, Nada Attar

Abstract:

We as humans use words with accompanying visual and facial cues to communicate effectively. Classifying facial emotion has been an active research area in the computer vision field. In this paper, we propose a simple method for facial expression recognition that enhances accuracy. We tested our method on the FER-2013 dataset, which contains static images. Instead of using histogram equalization to preprocess the dataset, we used Unsharp Mask to emphasize texture and details and to sharpen the edges. We also used ImageDataGenerator from the Keras library for data augmentation. We then used a Convolutional Neural Network (CNN) model to classify the images into 7 different facial expressions, yielding an accuracy of 69.46% on the test set. Our results show that image preprocessing such as the sharpening technique can improve the performance of a CNN model, even when the CNN model is relatively simple.
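
Unsharp masking adds back the difference between an image and a blurred copy of itself. The sketch below is a minimal illustration under stated assumptions: it substitutes a 3x3 box blur for the Gaussian blur usually used, and the `amount` parameter and test image are hypothetical, since the abstract does not give the settings.

```python
def box_blur(image):
    """3x3 mean blur with edge replication (stand-in for a Gaussian blur)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += image[rr][cc]
            out[r][c] = total / 9.0
    return out

def unsharp_mask(image, amount=1.0):
    """sharpened = original + amount * (original - blurred), clipped to [0, 255]."""
    blurred = box_blur(image)
    return [[min(255.0, max(0.0, p + amount * (p - b)))
             for p, b in zip(row, blurred_row)]
            for row, blurred_row in zip(image, blurred)]

# Hypothetical 4x4 patch containing a vertical step edge.
patch = [[50, 50, 150, 150] for _ in range(4)]
sharpened = unsharp_mask(patch)
```

Flat regions are left unchanged, while the intensity jump across the edge grows, which is the sense in which the filter emphasizes texture and detail.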

Keywords: facial expression recognition, image preprocessing, deep learning, CNN

Procedia PDF Downloads 103
6461 STTS-EAD: Improving Spatio-Temporal Learning Based Time Series Prediction via Embedded Anomaly Detection

Authors: Tianhao Zhang, Cen Chen, Dawei Cheng, Yuqi Liang, Yuanyuan Liang

Abstract:

Dealing with anomalies is a crucial preprocessing step for multivariate time series prediction. However, existing methods that separate anomaly preprocessing and model training into two stages have certain limitations. Specifically, these methods fail to leverage auxiliary information necessary to distinguish latent anomalies related to spatiotemporal factors during the preprocessing stage. Instead, they solely rely on data distribution for detection which may lead to incorrect processing of many samples that are beneficial for training. To address this, we propose STTS-EAD, an end-to-end method that seamlessly integrates anomaly detection into the training process of multivariate time series forecasting and aims to improve Spatio-Temporal learning based Time Series prediction via Embedded Anomaly Detection. Our proposed STTS-EAD leverages spatio-temporal information for forecasting and anomaly detection, with the two parts alternately executed and optimized for each other. To the best of our knowledge, STTS-EAD is the first to integrate anomaly detection and forecasting tasks in the training phase for improving the accuracy of multivariate time series forecasting. Extensive experiments on a public stock dataset and two real-world sales datasets from a renowned coffee chain enterprise show that our proposed method can effectively process detected anomalies in the training stage to improve forecasting performance in the inference stage and significantly outperform baselines.

Keywords: multivariate time series, anomaly detection, time series forecasting, spatiotemporal feature learning

Procedia PDF Downloads 9
6460 Mood Recognition Using Indian Music

Authors: Vishwa Joshi

Abstract:

The study of mood recognition in music has gained considerable momentum in recent years, with machine learning and data mining techniques and many audio features contributing to analyzing and identifying the relation between mood and music. In this paper, we carry the same idea forward and build a system for automatic recognition of the mood underlying audio song clips by mining their audio features. We evaluated several data classification algorithms in order to train and test the model describing the moods of these songs, and developed an open source framework. Before classification, preprocessing and feature extraction phases are necessary for removing noise and gathering features, respectively.

Keywords: music, mood, features, classification

Procedia PDF Downloads 473
6459 Estimation of Chronic Kidney Disease Using Artificial Neural Network

Authors: Ilker Ali Ozkan

Abstract:

In this study, an artificial neural network model has been developed to estimate chronic kidney failure, which is a common disease. The patients’ age, their blood and biochemical values, and various chronic diseases, 24 inputs in total, are used for the estimation process. The input data have been subjected to preprocessing because they contain both missing values and nominal values. The data of 147 patients obtained from the preprocessing have been divided into 70% training and 30% testing data. As a result of the study, the artificial neural network model with 25 neurons in the hidden layer has been found to be the model with the lowest error value. Chronic kidney failure could be estimated with an accuracy of 99.3% using this artificial neural network model. The developed artificial neural network has been found successful for the estimation of chronic kidney failure using clinical data.
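
The preprocessing described here (handling missing and nominal values, then a 70/30 split) might look like the sketch below. The abstract does not say how missing values were treated or how nominal values were encoded, so mean imputation and integer coding are assumptions, as are all function names and sample values.

```python
import random

def impute_mean(column):
    """Replace missing entries (None) with the mean of the known values."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]

def encode_nominal(column):
    """Map each distinct label to an integer code, in sorted label order."""
    codes = {label: i for i, label in enumerate(sorted(set(column)))}
    return [codes[v] for v in column]

def split_70_30(rows, seed=0):
    """Shuffle reproducibly, then cut into 70% training and 30% testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

filled = impute_mean([1.0, None, 3.0])
coded = encode_nominal(["yes", "no", "yes"])
train, test = split_70_30(range(147))  # 147 records, as in the abstract
```

With 147 records this split yields 103 training and 44 testing rows.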

Keywords: estimation, artificial neural network, chronic kidney failure disease, disease diagnosis

Procedia PDF Downloads 412
6458 An Efficient Machine Learning Model to Detect Metastatic Cancer in Pathology Scans Using Principal Component Analysis Algorithm, Genetic Algorithm, and Classification Algorithms

Authors: Bliss Singhal

Abstract:

Machine learning (ML) is a branch of Artificial Intelligence (AI) in which computers analyze data and find patterns in the data. The study focuses on the detection of metastatic cancer using ML. Metastatic cancer is the stage where cancer has spread to other parts of the body and is the cause of approximately 90% of cancer-related deaths. Normally, pathologists spend hours each day manually classifying whether tumors are benign or malignant. This tedious task contributes to metastasis being mislabeled over 60% of the time and emphasizes the importance of being aware of human error and other inefficiencies. ML is a good candidate to improve the correct identification of metastatic cancer, saving thousands of lives, and can also improve the speed and efficiency of the process, thereby taking fewer resources and less time. So far, the deep learning methodology of AI has been used in research to detect cancer. This study is a novel approach to determining the potential of using preprocessing algorithms combined with classification algorithms in detecting metastatic cancer. The study used two preprocessing algorithms, principal component analysis (PCA) and the genetic algorithm, to reduce the dimensionality of the dataset and then used three classification algorithms, logistic regression, decision tree classifier, and k-nearest neighbors, to detect metastatic cancer in the pathology scans. The highest accuracy of 71.14% was produced by the ML pipeline comprising PCA, the genetic algorithm, and the k-nearest neighbor algorithm, suggesting that preprocessing and classification algorithms have great potential for detecting metastatic cancer.
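
The final stage of the best-performing pipeline, k-nearest neighbors, can be sketched in a few lines. This covers only the classifier, not the PCA or genetic algorithm steps; the value k = 3, the 2-D feature vectors, and the labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical reduced feature vectors (e.g. two retained components per scan).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["benign"] * 3 + ["malignant"] * 3
near_origin = knn_predict(points, labels, (0.5, 0.5))
far_corner = knn_predict(points, labels, (5.5, 5.5))
```

Because every prediction requires distances to all training points, reducing dimensionality beforehand directly lowers the cost of this step.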

Keywords: breast cancer, principal component analysis, genetic algorithm, k-nearest neighbors, decision tree classifier, logistic regression

Procedia PDF Downloads 49
6457 Hyperspectral Mapping Methods for Differentiating Mangrove Species along Karachi Coast

Authors: Sher Muhammad, Mirza Muhammad Waqar

Abstract:

It is necessary to monitor and identify mangrove types and their spatial extent near coastal areas because they play an important role in the coastal ecosystem and in environmental protection. This research aims at identifying and mapping mangrove types along the Karachi coast, ranging from 24.79 to 24.85 degrees in latitude and 66.91 to 66.97 degrees in longitude, using hyperspectral remote sensing data and techniques. An image acquired in February 2012 by the Hyperion sensor was used for this research. Image preprocessing includes geometric and radiometric correction followed by Minimum Noise Fraction (MNF) and Pixel Purity Index (PPI). The output of MNF and PPI was analyzed by visualizing it in n dimensions for end-member extraction. Well-distributed clusters on the n-dimensional scatter plot were selected with the region of interest (ROI) tool as end members. These end members were used as input for the classification techniques applied to identify and map mangrove species, including Spectral Angle Mapper (SAM), Spectral Feature Fitting (SFF), and Spectral Information Divergence (SID). Only two types of mangroves, namely Avicennia marina (white mangroves) and Avicennia germinans (black mangroves), were observed throughout the study area.
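
Of the classifiers listed, the Spectral Angle Mapper is the simplest to illustrate: each pixel is assigned to the end member whose spectrum makes the smallest angle with it. The end-member names and four-band spectra below are hypothetical and are not values from the study.

```python
import math

def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = (math.sqrt(sum(p * p for p in pixel))
            * math.sqrt(sum(r * r for r in reference)))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def classify_sam(pixel, endmembers):
    """Return the name of the end member with the smallest spectral angle."""
    return min(endmembers, key=lambda name: spectral_angle(pixel, endmembers[name]))

# Hypothetical end-member spectra (reflectance in four bands).
endmembers = {"class_a": [0.1, 0.2, 0.6, 0.5], "class_b": [0.3, 0.3, 0.3, 0.3]}
bright_pixel = [0.2, 0.4, 1.2, 1.0]  # class_a spectrum at twice the brightness
label = classify_sam(bright_pixel, endmembers)
```

The angle ignores vector length, so a pixel that is simply brighter or darker than an end member still matches it, which makes the measure insensitive to illumination differences.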

Keywords: mangrove, hyperspectral, hyperion, SAM, SFF, SID

Procedia PDF Downloads 337
6456 Application of Data Mining Techniques for Tourism Knowledge Discovery

Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee

Abstract:

Five implementations of three data mining classification techniques were applied experimentally to extract important insights from tourism data. The aim was to find the best-performing algorithm among those compared for tourism knowledge discovery. The knowledge discovery process from data was used as the process model, and 10-fold cross-validation was used for testing. Various data preprocessing activities were performed to obtain the final dataset for model building. Classification models of the selected algorithms were built under different scenarios on the preprocessed dataset. The best-performing algorithm on the tourism dataset was Random Forest (76%) before applying information-gain-based attribute selection, and J48 (C4.5) (75%) after selecting the attributes most relevant to the class (target) attribute. In terms of model-building time, attribute selection improved the efficiency of all algorithms; the Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented; they reveal intricate, non-trivial knowledge that would not otherwise be discovered by simple statistical analysis, although the accuracy of the classification algorithms was mediocre.
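
Information-gain-based attribute selection ranks each attribute by how much knowing its value reduces the entropy of the class label. The sketch below shows the calculation on a toy table; the attribute names (`season`, `gender`) and values are hypothetical and are not drawn from the tourism dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(attribute_values):
        subset = [l for a, l in zip(attribute_values, labels) if a == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: does a visitor return?
returns = ["yes", "yes", "no", "no"]
season = ["summer", "summer", "winter", "winter"]
gender = ["m", "f", "m", "f"]
gain_season = information_gain(season, returns)
gain_gender = information_gain(gender, returns)
```

Here `season` separates the classes perfectly and `gender` carries no information, so a selector that keeps the top-ranked attributes would retain the first and drop the second.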

Keywords: classification algorithms, data mining, knowledge discovery, tourism

Procedia PDF Downloads 266