Search results for: time series classification
20923 Automatic Moment-Based Texture Segmentation
Authors: Tudor Barbu
Abstract:
An automatic moment-based texture segmentation approach is proposed in this paper. First, we describe the related work in this computer vision domain. Our texture feature extraction, the first part of the texture recognition process, produces a set of moment-based feature vectors. For each image pixel, a texture feature vector is computed as a sequence of area moments. Second, an automatic pixel classification approach is proposed. The feature vectors are clustered using some unsupervised classification algorithm, the optimal number of clusters being determined using a measure based on validation indexes. From the resulted pixel classes one determines easily the desired texture regions of the image.Keywords: image segmentation, moment-based, texture analysis, automatic classification, validation indexes
Procedia PDF Downloads 41420922 Deep Learning Application for Object Image Recognition and Robot Automatic Grasping
Authors: Shiuh-Jer Huang, Chen-Zon Yan, C. K. Huang, Chun-Chien Ting
Abstract:
Since the vision system application in industrial environment for autonomous purposes is required intensely, the image recognition technique becomes an important research topic. Here, deep learning algorithm is employed in image system to recognize the industrial object and integrate with a 7A6 Series Manipulator for object automatic gripping task. PC and Graphic Processing Unit (GPU) are chosen to construct the 3D Vision Recognition System. Depth Camera (Intel RealSense SR300) is employed to extract the image for object recognition and coordinate derivation. The YOLOv2 scheme is adopted in Convolution neural network (CNN) structure for object classification and center point prediction. Additionally, image processing strategy is used to find the object contour for calculating the object orientation angle. Then, the specified object location and orientation information are sent to robotic controller. Finally, a six-axis manipulator can grasp the specific object in a random environment based on the user command and the extracted image information. The experimental results show that YOLOv2 has been successfully employed to detect the object location and category with confidence near 0.9 and 3D position error less than 0.4 mm. It is useful for future intelligent robotic application in industrial 4.0 environment.Keywords: deep learning, image processing, convolution neural network, YOLOv2, 7A6 series manipulator
Procedia PDF Downloads 24820921 Using Gene Expression Programming in Learning Process of Rough Neural Networks
Authors: Sanaa Rashed Abdallah, Yasser F. Hassan
Abstract:
The paper will introduce an approach where a rough sets, gene expression programming and rough neural networks are used cooperatively for learning and classification support. The Objective of gene expression programming rough neural networks (GEP-RNN) approach is to obtain new classified data with minimum error in training and testing process. Starting point of gene expression programming rough neural networks (GEP-RNN) approach is an information system and the output from this approach is a structure of rough neural networks which is including the weights and thresholds with minimum classification error.Keywords: rough sets, gene expression programming, rough neural networks, classification
Procedia PDF Downloads 38220920 IT-Aided Business Process Enabling Real-Time Analysis of Candidates for Clinical Trials
Authors: Matthieu-P. Schapranow
Abstract:
Recruitment of participants for clinical trials requires the screening of a big number of potential candidates, i.e. the testing for trial-specific inclusion and exclusion criteria, which is a time-consuming and complex task. Today, a significant amount of time is spent on identification of adequate trial participants as their selection may affect the overall study results. We introduce a unique patient eligibility metric, which allows systematic ranking and classification of candidates based on trial-specific filter criteria. Our web application enables real-time analysis of patient data and assessment of candidates using freely definable inclusion and exclusion criteria. As a result, the overall time required for identifying eligible candidates is tremendously reduced whilst additional degrees of freedom for evaluating the relevance of individual candidates are introduced by our contribution.Keywords: in-memory technology, clinical trials, screening, eligibility metric, data analysis, clustering
Procedia PDF Downloads 49220919 U.S. Trade and Trade Balance with China: Testing for Marshall-Lerner Condition and the J-Curve Hypothesis
Authors: Anisul Islam
Abstract:
The U.S. has a very strong trade relationship with China but with a large and persistent trade deficit. Some has argued that the undervalued Chinese Yuan is to be blamed for the persistent trade deficit. The empirical results are mixed at best. This paper empirically estimates the U.S. export function along with the U.S. import function with its trade with China with the purpose of testing for the existence of the Marshall-Lerner (ML) condition as well for the possible existence of the J-curve hypothesis. Annual export and import data will be utilized for as long as the time series data exists. The export and import functions will be estimated using advanced econometric techniques, along with appropriate diagnostic tests performed to examine the validity and reliability of the estimated results. The annual time-series data covers from 1975 to 2022 with a sample size of 48 years, the longest period ever utilized before in any previous study. The data is collected from several sources, such as the World Bank’s World Development Indicators, IMF Financial Statistics, IMF Direction of Trade Statistics, and several other sources. The paper is expected to shed important light on the ongoing debate regarding the persistent U.S. trade deficit with China and the policies that may be useful to reduce such deficits over time. As such, the paper will be of great interest for the academics, researchers, think tanks, global organizations, and policy makers in both China and the U.S.Keywords: exports, imports, marshall-lerner condition, j-curve hypothesis, united states, china
Procedia PDF Downloads 6220918 Land Cover Classification System for the Estimation of Carbon Storage in Terrestrial Ecosystems
Authors: Lei Zhang
Abstract:
The carbon cycle greatly influences global change, and the land cover changes contribute to the status and rate of the carbon budget in ecosystems. This paper proposes a land cover classification system for mapping land cover, the national ecological environment assessment, and estimating carbon storage in ecosystems. The classification system consists of basic land cover classes at levels Ⅰ and Ⅱ and auxiliary features at level III. The basic 38 classes characterizing land cover features are derived from 19 criteria referring to composition, structure, pattern, phenology, etc. The basic classes reflect the status of carbon storage in ecosystems. The auxiliary classes at level III complement the attributes of higher levels by 9 criteria. The 5 environmental criteria of temperature, moisture, landform, aspect and slope mainly reflect the potential and intensity of carbon storage in ecosystems. The disturbance of vegetation succession caused by land use type influences the vegetation carbon budget. The other 3 vegetation cover criteria, growth period, and species characteristics further refine the vegetation types. The hierarchical structure of the land cover map (the classes of levels Ⅰ and Ⅱ) is independent of the products of level III, which is helpful for land cover product management and applications. The classification system has been adopted in the Chinese national land cover database for the carbon budget in ecosystems at a 30 m scale.Keywords: classification system, land cover, ecosystem, carbon storage, object based
Procedia PDF Downloads 6920917 An Investigation of Rainfall Changes in KanganCity During Years 1964 to 2003
Authors: Borzou Faramarzi, Farideh Azimi, Azam Gohardoust, Abbas Ghasemi Ghasemvand, Maryam Mirzaei, Mandana Amani
Abstract:
In this study, attempts were made to examine and analyze the trend for rainfall changes in Kangan City, Booshehr Province, during the time span 1964 to 2003, using seven rainfall threshold indices based on 50 climate extremes indices approved by WMO–CCL/CLIVAR. These indices include days with heavy precipitations, days with rainfalls, frequency of rainfall threshold values, intensity of rainfall threshold values, percentage of rainfall threshold values, successive days of rainfall, and successive days with no precipitation. Results are indicative of the fact that Kangan City climatic conditions have become more dried than before. Indices days with heavy precipitations and days with rainfalls do not show a certain trend in Kangan City. Frequency, intensity, and percentage of rainfall threshold values in the station under investigation do not indicate a certain trend. In analysis of time series of rainfall extreme indices, generally, it was revealed that Kangan City is influenced by general factors of global warming. Calculation of values for the next 10 years based on ARIMA models demonstrates a continuation of warming trends in Kangan City. On the whole, rainfall conditions in Kangan City have experienced more dry periods compared to the past, the trend which is also observable for next 10 years.Keywords: climatic indices, climate change, extreme temperature and precipitation, time series
Procedia PDF Downloads 27120916 From Type-I to Type-II Fuzzy System Modeling for Diagnosis of Hepatitis
Authors: Shahabeddin Sotudian, M. H. Fazel Zarandi, I. B. Turksen
Abstract:
Hepatitis is one of the most common and dangerous diseases that affects humankind, and exposes millions of people to serious health risks every year. Diagnosis of Hepatitis has always been a challenge for physicians. This paper presents an effective method for diagnosis of hepatitis based on interval Type-II fuzzy. This proposed system includes three steps: pre-processing (feature selection), Type-I and Type-II fuzzy classification, and system evaluation. KNN-FD feature selection is used as the preprocessing step in order to exclude irrelevant features and to improve classification performance and efficiency in generating the classification model. In the fuzzy classification step, an “indirect approach” is used for fuzzy system modeling by implementing the exponential compactness and separation index for determining the number of rules in the fuzzy clustering approach. Therefore, we first proposed a Type-I fuzzy system that had an accuracy of approximately 90.9%. In the proposed system, the process of diagnosis faces vagueness and uncertainty in the final decision. Thus, the imprecise knowledge was managed by using interval Type-II fuzzy logic. The results that were obtained show that interval Type-II fuzzy has the ability to diagnose hepatitis with an average accuracy of 93.94%. The classification accuracy obtained is the highest one reached thus far. The aforementioned rate of accuracy demonstrates that the Type-II fuzzy system has a better performance in comparison to Type-I and indicates a higher capability of Type-II fuzzy system for modeling uncertainty.Keywords: hepatitis disease, medical diagnosis, type-I fuzzy logic, type-II fuzzy logic, feature selection
Procedia PDF Downloads 30520915 Global Solar Irradiance: Data Imputation to Analyze Complementarity Studies of Energy in Colombia
Authors: Jeisson A. Estrella, Laura C. Herrera, Cristian A. Arenas
Abstract:
The Colombian electricity sector has been transforming through the insertion of new energy sources to generate electricity, one of them being solar energy, which is being promoted by companies interested in photovoltaic technology. The study of this technology is important for electricity generation in general and for the planning of the sector from the perspective of energy complementarity. Precisely in this last approach is where the project is located; we are interested in answering the concerns about the reliability of the electrical system when climatic phenomena such as El Niño occur or in defining whether it is viable to replace or expand thermoelectric plants. Reliability of the electrical system when climatic phenomena such as El Niño occur, or to define whether it is viable to replace or expand thermoelectric plants with renewable electricity generation systems. In this regard, some difficulties related to the basic information on renewable energy sources from measured data must first be solved, as these come from automatic weather stations. Basic information on renewable energy sources from measured data, since these come from automatic weather stations administered by the Institute of Hydrology, Meteorology and Environmental Studies (IDEAM) and, in the range of study (2005-2019), have significant amounts of missing data. For this reason, the overall objective of the project is to complete the global solar irradiance datasets to obtain time series to develop energy complementarity analyses in a subsequent project. Global solar irradiance data sets to obtain time series that will allow the elaboration of energy complementarity analyses in the following project. The filling of the databases will be done through numerical and statistical methods, which are basic techniques for undergraduate students in technical areas who are starting out as researchers technical areas who are starting out as researchers.Keywords: time series, global solar irradiance, imputed data, energy complementarity
Procedia PDF Downloads 7020914 DeClEx-Processing Pipeline for Tumor Classification
Authors: Gaurav Shinde, Sai Charan Gongiguntla, Prajwal Shirur, Ahmed Hambaba
Abstract:
Health issues are significantly increasing, putting a substantial strain on healthcare services. This has accelerated the integration of machine learning in healthcare, particularly following the COVID-19 pandemic. The utilization of machine learning in healthcare has grown significantly. We introduce DeClEx, a pipeline that ensures that data mirrors real-world settings by incorporating Gaussian noise and blur and employing autoencoders to learn intermediate feature representations. Subsequently, our convolutional neural network, paired with spatial attention, provides comparable accuracy to state-of-the-art pre-trained models while achieving a threefold improvement in training speed. Furthermore, we provide interpretable results using explainable AI techniques. We integrate denoising and deblurring, classification, and explainability in a single pipeline called DeClEx.Keywords: machine learning, healthcare, classification, explainability
Procedia PDF Downloads 5420913 Classification Rule Discovery by Using Parallel Ant Colony Optimization
Authors: Waseem Shahzad, Ayesha Tahir Khan, Hamid Hussain Awan
Abstract:
Ant-Miner algorithm that lies under ACO algorithms is used to extract knowledge from data in the form of rules. A variant of Ant-Miner algorithm named as cAnt-MinerPB is used to generate list of rules using pittsburgh approach in order to maintain the rule interaction among the rules that are generated. In this paper, we propose a parallel Ant MinerPB in which Ant colony optimization algorithm runs parallel. In this technique, a data set is divided vertically (i-e attributes) into different subsets. These subsets are created based on the correlation among attributes using Mutual Information (MI). It generates rules in a parallel manner and then merged to form a final list of rules. The results have shown that the proposed technique achieved higher accuracy when compared with original cAnt-MinerPB and also the execution time has also reduced.Keywords: ant colony optimization, parallel Ant-MinerPB, vertical partitioning, classification rule discovery
Procedia PDF Downloads 29320912 Random Subspace Ensemble of CMAC Classifiers
Authors: Somaiyeh Dehghan, Mohammad Reza Kheirkhahan Haghighi
Abstract:
The rapid growth of domains that have data with a large number of features, while the number of samples is limited has caused difficulty in constructing strong classifiers. To reduce the dimensionality of the feature space becomes an essential step in classification task. Random subspace method (or attribute bagging) is an ensemble classifier that consists of several classifiers that each base learner in ensemble has subset of features. In the present paper, we introduce Random Subspace Ensemble of CMAC neural network (RSE-CMAC), each of which has training with subset of features. Then we use this model for classification task. For evaluation performance of our model, we compare it with bagging algorithm on 36 UCI datasets. The results reveal that the new model has better performance.Keywords: classification, random subspace, ensemble, CMAC neural network
Procedia PDF Downloads 32820911 Comparison of Rainfall Trends in the Western Ghats and Coastal Region of Karnataka, India
Authors: Vinay C. Doranalu, Amba Shetty
Abstract:
In recent days due to climate change, there is a large variation in spatial distribution of daily rainfall within a small region. Rainfall is one of the main end climatic variables which affect spatio-temporal patterns of water availability. The real task postured by the change in climate is identification, estimation and understanding the uncertainty of rainfall. This study intended to analyze the spatial variations and temporal trends of daily precipitation using high resolution (0.25º x 0.25º) gridded data of Indian Meteorological Department (IMD). For the study, 38 grid points were selected in the study area and analyzed for daily precipitation time series (113 years) over the period 1901-2013. Grid points were divided into two zones based on the elevation and situated location of grid points: Low Land (exposed to sea and low elevated area/ coastal region) and High Land (Interior from sea and high elevated area/western Ghats). Time series were applied to examine the spatial analysis and temporal trends in each grid points by non-parametric Mann-Kendall test and Theil-Sen estimator to perceive the nature of trend and magnitude of slope in trend of rainfall. Pettit-Mann-Whitney test is applied to detect the most probable change point in trends of the time period. Results have revealed remarkable monotonic trend in each grid for daily precipitation of the time series. In general, by the regional cluster analysis found that increasing precipitation trend in shoreline region and decreasing trend in Western Ghats from recent years. Spatial distribution of rainfall can be partly explained by heterogeneity in temporal trends of rainfall by change point analysis. The Mann-Kendall test shows significant variation as weaker rainfall towards the rainfall distribution over eastern parts of the Western Ghats region of Karnataka.Keywords: change point analysis, coastal region India, gridded rainfall data, non-parametric
Procedia PDF Downloads 29120910 Application of Remote Sensing and GIS in Assessing Land Cover Changes within Granite Quarries around Brits Area, South Africa
Authors: Refilwe Moeletsi
Abstract:
Dimension stone quarrying around Brits and Belfast areas started in the early 1930s and has been growing rapidly since then. Environmental impacts associated with these quarries have not been documented, and hence this study aims at detecting any change in the environment that might have been caused by these activities. Landsat images that were used to assess land use/land cover changes in Brits quarries from 1998 - 2015. A supervised classification using maximum likelihood classifier was applied to classify each image into different land use/land cover types. Classification accuracy was assessed using Google Earth™ as a source of reference data. Post-classification change detection method was used to determine changes. The results revealed significant increase in granite quarries and corresponding decrease in vegetation cover within the study region.Keywords: remote sensing, GIS, change detection, granite quarries
Procedia PDF Downloads 31220909 A Double Ended AC Series Arc Fault Location Algorithm Based on Currents Estimation and a Fault Map Trace Generation
Authors: Edwin Calderon-Mendoza, Patrick Schweitzer, Serge Weber
Abstract:
Series arc faults appear frequently and unpredictably in low voltage distribution systems. Many methods have been developed to detect this type of faults and commercial protection systems such AFCI (arc fault circuit interrupter) have been used successfully in electrical networks to prevent damage and catastrophic incidents like fires. However, these devices do not allow series arc faults to be located on the line in operating mode. This paper presents a location algorithm for series arc fault in a low-voltage indoor power line in an AC 230 V-50Hz home network. The method is validated through simulations using the MATLAB software. The fault location method uses electrical parameters (resistance, inductance, capacitance, and conductance) of a 49 m indoor power line. The mathematical model of a series arc fault is based on the analysis of the V-I characteristics of the arc and consists basically of two antiparallel diodes and DC voltage sources. In a first step, the arc fault model is inserted at some different positions across the line which is modeled using lumped parameters. At both ends of the line, currents and voltages are recorded for each arc fault generation at different distances. In the second step, a fault map trace is created by using signature coefficients obtained from Kirchhoff equations which allow a virtual decoupling of the line’s mutual capacitance. Each signature coefficient obtained from the subtraction of estimated currents is calculated taking into account the Discrete Fast Fourier Transform of currents and voltages and also the fault distance value. These parameters are then substituted into Kirchhoff equations. In a third step, the same procedure described previously to calculate signature coefficients is employed but this time by considering hypothetical fault distances where the fault can appear. In this step the fault distance is unknown. The iterative calculus from Kirchhoff equations considering stepped variations of the fault distance entails the obtaining of a curve with a linear trend. Finally, the fault distance location is estimated at the intersection of two curves obtained in steps 2 and 3. The series arc fault model is validated by comparing current registered from simulation with real recorded currents. The model of the complete circuit is obtained for a 49m line with a resistive load. Also, 11 different arc fault positions are considered for the map trace generation. By carrying out the complete simulation, the performance of the method and the perspectives of the work will be presented.Keywords: indoor power line, fault location, fault map trace, series arc fault
Procedia PDF Downloads 13720908 Human Digital Twin for Personal Conversation Automation Using Supervised Machine Learning Approaches
Authors: Aya Salama
Abstract:
Digital Twin is an emerging research topic that attracted researchers in the last decade. It is used in many fields, such as smart manufacturing and smart healthcare because it saves time and money. It is usually related to other technologies such as Data Mining, Artificial Intelligence, and Machine Learning. However, Human digital twin (HDT), in specific, is still a novel idea that still needs to prove its feasibility. HDT expands the idea of Digital Twin to human beings, which are living beings and different from the inanimate physical entities. The goal of this research was to create a Human digital twin that is responsible for real-time human replies automation by simulating human behavior. For this reason, clustering, supervised classification, topic extraction, and sentiment analysis were studied in this paper. The feasibility of the HDT for personal replies generation on social messaging applications was proved in this work. The overall accuracy of the proposed approach in this paper was 63% which is a very promising result that can open the way for researchers to expand the idea of HDT. This was achieved by using Random Forest for clustering the question data base and matching new questions. K-nearest neighbor was also applied for sentiment analysis.Keywords: human digital twin, sentiment analysis, topic extraction, supervised machine learning, unsupervised machine learning, classification, clustering
Procedia PDF Downloads 8520907 Automatic Method for Classification of Informative and Noninformative Images in Colonoscopy Video
Authors: Nidhal K. Azawi, John M. Gauch
Abstract:
Colorectal cancer is one of the leading causes of cancer death in the US and the world, which is why millions of colonoscopy examinations are performed annually. Unfortunately, noise, specular highlights, and motion artifacts corrupt many images in a typical colonoscopy exam. The goal of our research is to produce automated techniques to detect and correct or remove these noninformative images from colonoscopy videos, so physicians can focus their attention on informative images. In this research, we first automatically extract features from images. Then we use machine learning and deep neural network to classify colonoscopy images as either informative or noninformative. Our results show that we achieve image classification accuracy between 92-98%. We also show how the removal of noninformative images together with image alignment can aid in the creation of image panoramas and other visualizations of colonoscopy images.Keywords: colonoscopy classification, feature extraction, image alignment, machine learning
Procedia PDF Downloads 25020906 Predicting Groundwater Areas Using Data Mining Techniques: Groundwater in Jordan as Case Study
Authors: Faisal Aburub, Wael Hadi
Abstract:
Data mining is the process of extracting useful or hidden information from a large database. Extracted information can be used to discover relationships among features, where data objects are grouped according to logical relationships; or to predict unseen objects to one of the predefined groups. In this paper, we aim to investigate four well-known data mining algorithms in order to predict groundwater areas in Jordan. These algorithms are Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbor (kNN) and Classification Based on Association Rule (CBA). The experimental results indicate that the SVMs algorithm outperformed other algorithms in terms of classification accuracy, precision and F1 evaluation measures using the datasets of groundwater areas that were collected from Jordanian Ministry of Water and Irrigation.Keywords: classification, data mining, evaluation measures, groundwater
Procedia PDF Downloads 27820905 Time Series Analysis of Air Pollution in Suceava County ( Nord- East of Romania)
Authors: Lazurca Liliana Gina
Abstract:
Different time series analysis of yearly air pollution at Suceava County, Nord-East of Romania, has been performed in this study. The trends in the atmospheric concentrations of the main gaseous and particulate pollutants in urban, industrial and rural environments across Suceava County were estimated for the period of 2008-2014. The non-parametric Mann-Kendall test was used to determine the trends in the annual average concentrations of air pollutants (NO2, NO, NOx, SO2, CO, PM10, O3, C6H6). The slope was estimated using the non-parametric Sen’s method. Trend significance was assumed at the 5% significance level (p < 0.05) in the current study. During the 7 year period, trends in atmospheric concentrations may not have been monotonic, in some instances concentrations of species increased and subsequently decreased. The trend in Suceava County is to keep a low concentration of pollutants in ambient air respecting the limit values.All the results that we obtained show that Romania has taken a lot of regulatory measures to decrease the concentrations of air pollutants in the last decade, in Suceava County the air quality monitoring highlight for the most part of the analyzed pollutants decreasing trends. For the analyzed period we observed considerable improvements in background air in Suceava County.Keywords: pollutant, trend, air quality monitoring, Mann-Kendall
Procedia PDF Downloads 36320904 Day/Night Detector for Vehicle Tracking in Traffic Monitoring Systems
Authors: M. Taha, Hala H. Zayed, T. Nazmy, M. Khalifa
Abstract:
Recently, traffic monitoring has attracted the attention of computer vision researchers. Many algorithms have been developed to detect and track moving vehicles. In fact, vehicle tracking in daytime and in nighttime cannot be approached with the same techniques, due to the extreme different illumination conditions. Consequently, traffic-monitoring systems are in need of having a component to differentiate between daytime and nighttime scenes. In this paper, a HSV-based day/night detector is proposed for traffic monitoring scenes. The detector employs the hue-histogram and the value-histogram on the top half of the image frame. Experimental results show that the extraction of the brightness features along with the color features within the top region of the image is effective for classifying traffic scenes. In addition, the detector achieves high precision and recall rates along with it is feasible for real time applications.Keywords: day/night detector, daytime/nighttime classification, image classification, vehicle tracking, traffic monitoring
Procedia PDF Downloads 55420903 Analyzing Tools and Techniques for Classification In Educational Data Mining: A Survey
Authors: D. I. George Amalarethinam, A. Emima
Abstract:
Educational Data Mining (EDM) is one of the newest topics to emerge in recent years, and it is concerned with developing methods for analyzing various types of data gathered from the educational circle. EDM methods and techniques with machine learning algorithms are used to extract meaningful and usable information from huge databases. For scientists and researchers, realistic applications of Machine Learning in the EDM sectors offer new frontiers and present new problems. One of the most important research areas in EDM is predicting student success. The prediction algorithms and techniques must be developed to forecast students' performance, which aids the tutor, institution to boost the level of student’s performance. This paper examines various classification techniques in prediction methods and data mining tools used in EDM.Keywords: classification technique, data mining, EDM methods, prediction methods
Procedia PDF Downloads 11420902 Evaluation of the CRISP-DM Business Understanding Step: An Approach for Assessing the Predictive Power of Regression versus Classification for the Quality Prediction of Hydraulic Test Results
Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter
Abstract:
Digitalisation in production technology is a driver for the application of machine learning methods. Through the application of predictive quality, the great potential for saving necessary quality control can be exploited through the data-based prediction of product quality and states. However, the serial use of machine learning applications is often prevented by various problems. Fluctuations occur in real production data sets, which are reflected in trends and systematic shifts over time. To counteract these problems, data preprocessing includes rule-based data cleaning, the application of dimensionality reduction techniques, and the identification of comparable data subsets to extract stable features. Successful process control of the target variables aims to centre the measured values around a mean and minimise variance. Competitive leaders claim to have mastered their processes. As a result, much of the real data has a relatively low variance. For the training of prediction models, the highest possible generalisability is required, which is at least made more difficult by this data availability. The implementation of a machine learning application can be interpreted as a production process. The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model with six phases that describes the life cycle of data science. As in any process, the costs to eliminate errors increase significantly with each advancing process phase. For the quality prediction of hydraulic test steps of directional control valves, the question arises in the initial phase whether a regression or a classification is more suitable. In the context of this work, the initial phase of the CRISP-DM, the business understanding, is critically compared for the use case at Bosch Rexroth with regard to regression and classification. The use of cross-process production data along the value chain of hydraulic valves is a promising approach to predict the quality characteristics of workpieces. Suitable methods for leakage volume flow regression and classification for inspection decision are applied. Impressively, classification is clearly superior to regression and achieves promising accuracies.Keywords: classification, CRISP-DM, machine learning, predictive quality, regression
Procedia PDF Downloads 14320901 Morphological Processing of Punjabi Text for Sentiment Analysis of Farmer Suicides
Authors: Jaspreet Singh, Gurvinder Singh, Prabhsimran Singh, Rajinder Singh, Prithvipal Singh, Karanjeet Singh Kahlon, Ravinder Singh Sawhney
Abstract:
Morphological evaluation of Indian languages is one of the burgeoning fields in the area of Natural Language Processing (NLP). The evaluation of a language is an eminent task in the era of information retrieval and text mining. The extraction and classification of knowledge from text can be exploited for sentiment analysis and morphological evaluation. This study coalesce morphological evaluation and sentiment analysis for the task of classification of farmer suicide cases reported in Punjab state of India. The pre-processing of Punjabi text involves morphological evaluation and normalization of Punjabi word tokens followed by the training of proposed model using deep learning classification on Punjabi language text extracted from online Punjabi news reports. The class-wise accuracies of sentiment prediction for four negatively oriented classes of farmer suicide cases are 93.85%, 88.53%, 83.3%, and 95.45% respectively. The overall accuracy of sentiment classification obtained using proposed framework on 275 Punjabi text documents is found to be 90.29%.Keywords: deep neural network, farmer suicides, morphological processing, punjabi text, sentiment analysis
Procedia PDF Downloads 32520900 Analysis of Extreme Rainfall Trends in Central Italy
Authors: Renato Morbidelli, Carla Saltalippi, Alessia Flammini, Marco Cifrodelli, Corrado Corradini
Abstract:
The trend of magnitude and frequency of extreme rainfalls seems to be different depending on the investigated area of the world. In this work, the impact of climate change on extreme rainfalls in Umbria, an inland region of central Italy, is examined using data recorded during the period 1921-2015 by 10 representative rain gauge stations. The study area is characterized by a complex orography, with altitude ranging from 200 to more than 2000 m asl. The climate is very different from zone to zone, with mean annual rainfall ranging from 650 to 1450 mm and mean annual air temperature from 3.3 to 14.2°C. Over the past 15 years, this region has been affected by four significant droughts as well as by six dangerous flood events, all with very large impact in economic terms. A least-squares linear trend analysis of annual maximums over 60 time series selected considering 6 different durations (1 h, 3 h, 6 h, 12 h, 24 h, 48 h) showed about 50% of positive and 50% of negative cases. For the same time series the non-parametrical Mann-Kendall test with a significance level 0.05 evidenced only 3% of cases characterized by a negative trend and no positive case. Further investigations have also demonstrated that the variance and covariance of each time series can be considered almost stationary. Therefore, the analysis on the magnitude of extreme rainfalls supplies the indication that an evident trend in the change of values in the Umbria region does not exist. However, also the frequency of rainfall events, with particularly high rainfall depths values, occurred during a fixed period has also to be considered. For all selected stations the 2-day rainfall events that exceed 50 mm were counted for each year, starting from the first monitored year to the end of 2015. Also, this analysis did not show predominant trends. Specifically, for all selected rain gauge stations the annual number of 2-day rainfall events that exceed the threshold value (50 mm) was slowly decreasing in time, while the annual cumulated rainfall depths corresponding to the same events evidenced trends that were not statistically significant. Overall, by using a wide available dataset and adopting simple methods, the influence of climate change on the heavy rainfalls in the Umbria region is not detected.Keywords: climate changes, rainfall extremes, rainfall magnitude and frequency, central Italy
Procedia PDF Downloads 23520899 A Nonlinear Feature Selection Method for Hyperspectral Image Classification
Authors: Pei-Jyun Hsieh, Cheng-Hsuan Li, Bor-Chen Kuo
Abstract:
For hyperspectral image classification, feature reduction is an important pre-processing for avoiding the Hughes phenomena due to the difficulty for collecting training samples. Hence, lots of researches developed feature selection methods such as F-score, HSIC (Hilbert-Schmidt Independence Criterion), and etc., to improve hyperspectral image classification. However, most of them only consider the class separability in the original space, i.e., a linear class separability. In this study, we proposed a nonlinear class separability measure based on kernel trick for selecting an appropriate feature subset. The proposed nonlinear class separability was formed by a generalized RBF kernel with different bandwidths with respect to different features. Moreover, it considered the within-class separability and the between-class separability. A genetic algorithm was applied to tune these bandwidths such that the smallest with-class separability and the largest between-class separability simultaneously. This indicates the corresponding feature space is more suitable for classification. In addition, the corresponding nonlinear classification boundary can separate classes very well. These optimal bandwidths also show the importance of bands for hyperspectral image classification. The reciprocals of these bandwidths can be viewed as weights of bands. The smaller bandwidth, the larger weight of the band, and the more importance for classification. Hence, the descending order of the reciprocals of the bands gives an order for selecting the appropriate feature subsets. In the experiments, three hyperspectral image data sets, the Indian Pine Site data set, the PAVIA data set, and the Salinas A data set, were used to demonstrate the selected feature subsets by the proposed nonlinear feature selection method are more appropriate for hyperspectral image classification. Only ten percent of samples were randomly selected to form the training dataset. All non-background samples were used to form the testing dataset. The support vector machine was applied to classify these testing samples based on selected feature subsets. According to the experiments on the Indian Pine Site data set with 220 bands, the highest accuracies by applying the proposed method, F-score, and HSIC are 0.8795, 0.8795, and 0.87404, respectively. However, the proposed method selects 158 features. F-score and HSIC select 168 features and 217 features, respectively. Moreover, the classification accuracies increase dramatically only using first few features. The classification accuracies with respect to feature subsets of 10 features, 20 features, 50 features, and 110 features are 0.69587, 0.7348, 0.79217, and 0.84164, respectively. Furthermore, only using half selected features (110 features) of the proposed method, the corresponding classification accuracy (0.84168) is approximate to the highest classification accuracy, 0.8795. For other two hyperspectral image data sets, the PAVIA data set and Salinas A data set, we can obtain the similar results. These results illustrate our proposed method can efficiently find feature subsets to improve hyperspectral image classification. One can apply the proposed method to determine the suitable feature subset first according to specific purposes. Then researchers can only use the corresponding sensors to obtain the hyperspectral image and classify the samples. This can not only improve the classification performance but also reduce the cost for obtaining hyperspectral images.Keywords: hyperspectral image classification, nonlinear feature selection, kernel trick, support vector machine
Procedia PDF Downloads 26220898 Classification of Health Information Needs of Hypertensive Patients in the Online Health Community Based on Content Analysis
Authors: Aijing Luo, Zirui Xin, Yifeng Yuan
Abstract:
Background: With the rapid development of the online health community, more and more patients or families are seeking health information on the Internet. Objective: This study aimed to discuss how to fully reveal the health information needs expressed by hypertensive patients in their questions in the online environment. Methods: This study randomly selected 1,000 text records from the question data of hypertensive patients from 2008 to 2018 collected from the website www.haodf.com and constructed a classification system through literature research and content analysis. This paper identified the background characteristics and questioning the intention of each hypertensive patient based on the patient’s question and used co-occurrence network analysis to explore the features of the health information needs of hypertensive patients. Results: The classification system for health information needs of patients with hypertension is composed of 9 parts: 355 kinds of drugs, 395 kinds of symptoms and signs, 545 kinds of tests and examinations , 526 kinds of demographic data, 80 kinds of diseases, 37 kinds of risk factors, 43 kinds of emotions, 6 kinds of lifestyles, 49 kinds of questions. The characteristics of the explored online health information needs of the hypertensive patients include: i)more than 49% of patients describe the features such as drugs, symptoms and signs, tests and examinations, demographic data, diseases, etc. ii) these groups are most concerned about treatment (77.8%), followed by diagnosis (32.3%); iii) 65.8% of hypertensive patients will ask doctors online several questions at the same time. 28.3% of the patients are very concerned about how to adjust the medication, and they will ask other treatment-related questions at the same time, including drug side effects, whether to take drugs, how to treat a disease, etc.; secondly, 17.6% of the patients will consult the doctors online about the causes of the clinical findings, including the relationship between the clinical findings and a disease, the treatment of a disease, medication, and examinations. Conclusion: In the online environment, the health information needs expressed by Chinese hypertensive patients to doctors are personalized; that is, patients with different background features express their questioning intentions to doctors. The classification system constructed in this study can guide health information service providers in the construction of online health resources, to help solve the problem of information asymmetry in communication between doctors and patients.Keywords: online health community, health information needs, hypertensive patients, doctor-patient communication
Procedia PDF Downloads 11820897 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine
Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li
Abstract:
Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.Keywords: machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation
Procedia PDF Downloads 23420896 Characterization of Agroforestry Systems in Burkina Faso Using an Earth Observation Data Cube
Authors: Dan Kanmegne
Abstract:
Africa will become the most populated continent by the end of the century, with around 4 billion inhabitants. Food security and climate changes will become continental issues since agricultural practices depend on climate but also contribute to global emissions and land degradation. Agroforestry has been identified as a cost-efficient and reliable strategy to address these two issues. It is defined as the integrated management of trees and crops/animals in the same land unit. Agroforestry provides benefits in terms of goods (fruits, medicine, wood, etc.) and services (windbreaks, fertility, etc.), and is acknowledged to have a great potential for carbon sequestration; therefore it can be integrated into reduction mechanisms of carbon emissions. Particularly in sub-Saharan Africa, the constraint stands in the lack of information about both areas under agroforestry and the characterization (composition, structure, and management) of each agroforestry system at the country level. This study describes and quantifies “what is where?”, earliest to the quantification of carbon stock in different systems. Remote sensing (RS) is the most efficient approach to map such a dynamic technology as agroforestry since it gives relatively adequate and consistent information over a large area at nearly no cost. RS data fulfill the good practice guidelines of the Intergovernmental Panel On Climate Change (IPCC) that is to be used in carbon estimation. Satellite data are getting more and more accessible, and the archives are growing exponentially. To retrieve useful information to support decision-making out of this large amount of data, satellite data needs to be organized so to ensure fast processing, quick accessibility, and ease of use. A new solution is a data cube, which can be understood as a multi-dimensional stack (space, time, data type) of spatially aligned pixels and used for efficient access and analysis. A data cube for Burkina Faso has been set up from the cooperation project between the international service provider WASCAL and Germany, which provides an accessible exploitation architecture of multi-temporal satellite data. The aim of this study is to map and characterize agroforestry systems using the Burkina Faso earth observation data cube. The approach in its initial stage is based on an unsupervised image classification of a normalized difference vegetation index (NDVI) time series from 2010 to 2018, to stratify the country based on the vegetation. Fifteen strata were identified, and four samples per location were randomly assigned to define the sampling units. For safety reasons, the northern part will not be part of the fieldwork. A total of 52 locations will be visited by the end of the dry season in February-March 2020. The field campaigns will consist of identifying and describing different agroforestry systems and qualitative interviews. A multi-temporal supervised image classification will be done with a random forest algorithm, and the field data will be used for both training the algorithm and accuracy assessment. The expected outputs are (i) map(s) of agroforestry dynamics, (ii) characteristics of different systems (main species, management, area, etc.); (iii) assessment report of Burkina Faso data cube.Keywords: agroforestry systems, Burkina Faso, earth observation data cube, multi-temporal image classification
Procedia PDF Downloads 14420895 Impacts of Climate Elements on the Annual Periodic Behavior of the Shallow Groundwater Level: Case Study from Central-Eastern Europe
Authors: Tamas Garamhegyi, Jozsef Kovacs, Rita Pongracz, Peter Tanos, Balazs Trasy, Norbert Magyar, Istvan G. Hatvani
Abstract:
Like most environmental processes, shallow groundwater fluctuation under natural circumstances also behaves periodically. With the statistical tools at hand, it can easily be determined if a period exists in the data or not. Thus, the question may be raised: Does the estimated average period time characterize the whole time period, or not? This is especially important in the case of such complex phenomena as shallow groundwater fluctuation, driven by numerous factors. Because of the continuous changes in the oscillating components of shallow groundwater time series, the most appropriate method should be used to investigate its periodicity, this is wavelet spectrum analysis. The aims of the research were to investigate the periodic behavior of the shallow groundwater time series of an agriculturally important and drought sensitive region in Central-Eastern Europe and its relationship to the European pressure action centers. During the research ~216 shallow groundwater observation wells located in the eastern part of the Great Hungarian Plain with a temporal coverage of 50 years were scanned for periodicity. By taking the full-time interval as 100%, the presence of any period could be determined in percentages. With the complex hydrogeological/meteorological model developed in this study, non-periodic time intervals were found in the shallow groundwater levels. On the local scale, this phenomenon linked to drought conditions, and on a regional scale linked to the maxima of the regional air pressures in the Gulf of Genoa. The study documented an important link between shallow groundwater levels and climate variables/indices facilitating the necessary adaptation strategies on national and/or regional scales, which have to take into account the predictions of drought-related climatic conditions.Keywords: climate change, drought, groundwater periodicity, wavelet spectrum and coherence analyses
Procedia PDF Downloads 38320894 Land Use/Land Cover Mapping Using Landsat 8 and Sentinel-2 in a Mediterranean Landscape
Authors: Moschos Vogiatzis, K. Perakis
Abstract:
Spatial-explicit and up-to-date land use/land cover information is fundamental for spatial planning, land management, sustainable development, and sound decision-making. In the last decade, many satellite-derived land cover products at different spatial, spectral, and temporal resolutions have been developed, such as the European Copernicus Land Cover product. However, more efficient and detailed information for land use/land cover is required at the regional or local scale. A typical Mediterranean basin with a complex landscape comprised of various forest types, crops, artificial surfaces, and wetlands was selected to test and develop our approach. In this study, we investigate the improvement of Copernicus Land Cover product (CLC2018) using Landsat 8 and Sentinel-2 pixel-based classification based on all available existing geospatial data (Forest Maps, LPIS, Natura2000 habitats, cadastral parcels, etc.). We examined and compared the performance of the Random Forest classifier for land use/land cover mapping. In total, 10 land use/land cover categories were recognized in Landsat 8 and 11 in Sentinel-2A. A comparison of the overall classification accuracies for 2018 shows that Landsat 8 classification accuracy was slightly higher than Sentinel-2A (82,99% vs. 80,30%). We concluded that the main land use/land cover types of CLC2018, even within a heterogeneous area, can be successfully mapped and updated according to CLC nomenclature. Future research should be oriented toward integrating spatiotemporal information from seasonal bands and spectral indexes in the classification process.Keywords: classification, land use/land cover, mapping, random forest
Procedia PDF Downloads 123