Search results for: classification and regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5048

Search results for: classification and regression

4598 Early Gastric Cancer Prediction from Diet and Epidemiological Data Using Machine Learning in Mizoram Population

Authors: Brindha Senthil Kumar, Payel Chakraborty, Senthil Kumar Nachimuthu, Arindam Maitra, Prem Nath

Abstract:

Gastric cancer is predominantly caused by demographic and diet factors as compared to other cancer types. The aim of the study is to predict Early Gastric Cancer (ECG) from diet and lifestyle factors using supervised machine learning algorithms. For this study, 160 healthy individual and 80 cases were selected who had been followed for 3 years (2016-2019), at Civil Hospital, Aizawl, Mizoram. A dataset containing 11 features that are core risk factors for the gastric cancer were extracted. Supervised machine algorithms: Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Multilayer perceptron, and Random Forest were used to analyze the dataset using Python Jupyter Notebook Version 3. The obtained classified results had been evaluated using metrics parameters: minimum_false_positives, brier_score, accuracy, precision, recall, F1_score, and Receiver Operating Characteristics (ROC) curve. Data analysis results showed Naive Bayes - 88, 0.11; Random Forest - 83, 0.16; SVM - 77, 0.22; Logistic Regression - 75, 0.25 and Multilayer perceptron - 72, 0.27 with respect to accuracy and brier_score in percent. Naive Bayes algorithm out performs with very low false positive rates as well as brier_score and good accuracy. Naive Bayes algorithm classification results in predicting ECG showed very satisfactory results using only diet cum lifestyle factors which will be very helpful for the physicians to educate the patients and public, thereby mortality of gastric cancer can be reduced/avoided with this knowledge mining work.

Keywords: Early Gastric cancer, Machine Learning, Diet, Lifestyle Characteristics

Procedia PDF Downloads 135
4597 Using India’s Traditional Knowledge Digital Library on Traditional Tibetan Medicine

Authors: Chimey Lhamo, Ngawang Tsering

Abstract:

Traditional Tibetan medicine, known as Sowa Rigpa (Science of healing), originated more than 2500 years ago with an insightful background, and it has been growing significant attention in many Asian countries like China, India, Bhutan, and Nepal. Particularly, the Indian government has targeted Traditional Tibetan medicine as its major Indian medical system, including Ayurveda. Although Traditional Tibetan medicine has been growing interest and has a long history, it is not easily recognized worldwide because it exists only in the Tibetan language and it is neither accessible nor understood by patent examiners at the international patent office, data about Traditional Tibetan medicine is not yet broadly exist in the Internet. There has also been the exploitation of traditional Tibetan medicine increasing. The Traditional Knowledge Digital Library is a database aiming to prevent the patenting and misappropriation of India’s traditional medicine knowledge by using India’s Traditional knowledge Digital Library on Sowa Rigpa in order to prevent its exploitation at international patent with the help of information technology tools and an innovative classification systems-traditional knowledge resource classification (TKRC). As of date, more than 3000 Sowa Rigpa formulations have been transcribed into a Traditional Knowledge Digital Library database. In this paper, we are presenting India's Traditional Knowledge Digital Library for Traditional Tibetan medicine, and this database system helps to preserve and prevent the exploitation of Sowa Rigpa. Gradually it will be approved and accepted globally.

Keywords: traditional Tibetan medicine, India's traditional knowledge digital library, traditional knowledge resources classification, international patent classification

Procedia PDF Downloads 109
4596 A Review: Detection and Classification Defects on Banana and Apples by Computer Vision

Authors: Zahow Muoftah

Abstract:

Traditional manual visual grading of fruits has been one of the agricultural industry’s major challenges due to its laborious nature as well as inconsistency in the inspection and classification process. The main requirements for computer vision and visual processing are some effective techniques for identifying defects and estimating defect areas. Automated defect detection using computer vision and machine learning has emerged as a promising area of research with a high and direct impact on the visual inspection domain. Grading, sorting, and disease detection are important factors in determining the quality of fruits after harvest. Many studies have used computer vision to evaluate the quality level of fruits during post-harvest. Many studies have used computer vision to evaluate the quality level of fruits during post-harvest. Many studies have been conducted to identify diseases and pests that affect the fruits of agricultural crops. However, most previous studies concentrated solely on the diagnosis of a lesion or disease. This study focused on a comprehensive study to identify pests and diseases of apple and banana fruits using detection and classification defects on Banana and Apples by Computer Vision. As a result, the current article includes research from these domains as well. Finally, various pattern recognition techniques for detecting apple and banana defects are discussed.

Keywords: computer vision, banana, apple, detection, classification

Procedia PDF Downloads 80
4595 Maturity Classification of Oil Palm Fresh Fruit Bunches Using Thermal Imaging Technique

Authors: Shahrzad Zolfagharnassab, Abdul Rashid Mohamed Shariff, Reza Ehsani, Hawa Ze Jaffar, Ishak Aris

Abstract:

Ripeness estimation of oil palm fresh fruit is important processes that affect the profitableness and salability of oil palm fruits. The adulthood or ripeness of the oil palm fruits influences the quality of oil palm. Conventional procedure includes physical grading of Fresh Fruit Bunches (FFB) maturity by calculating the number of loose fruits per bunch. This physical classification of oil palm FFB is costly, time consuming and the results may have human error. Hence, many researchers try to develop the methods for ascertaining the maturity of oil palm fruits and thereby, deviously the oil content of distinct palm fruits without the need for exhausting oil extraction and analysis. This research investigates the potential of infrared images (Thermal Images) as a predictor to classify the oil palm FFB ripeness. A total of 270 oil palm fresh fruit bunches from most common cultivar of oil palm bunches Nigresens according to three maturity categories: under ripe, ripe and over ripe were collected. Each sample was scanned by the thermal imaging cameras FLIR E60 and FLIR T440. The average temperature of each bunches were calculated by using image processing in FLIR Tools and FLIR ThermaCAM researcher pro 2.10 environment software. The results show that temperature content decreased from immature to over mature oil palm FFBs. An overall analysis-of-variance (ANOVA) test was proved that this predictor gave significant difference between underripe, ripe and overripe maturity categories. This shows that the temperature as predictors can be good indicators to classify oil palm FFB. Classification analysis was performed by using the temperature of the FFB as predictors through Linear Discriminant Analysis (LDA), Mahalanobis Discriminant Analysis (MDA), Artificial Neural Network (ANN) and K- Nearest Neighbor (KNN) methods. The highest overall classification accuracy was 88.2% by using Artificial Neural Network. This research proves that thermal imaging and neural network method can be used as predictors of oil palm maturity classification.

Keywords: artificial neural network, maturity classification, oil palm FFB, thermal imaging

Procedia PDF Downloads 331
4594 Using AI for Analysing Political Leaders

Authors: Shuai Zhao, Shalendra D. Sharma, Jin Xu

Abstract:

This research uses advanced machine learning models to learn a number of hypotheses regarding political executives. Specifically, it analyses the impact these powerful leaders have on economic growth by using leaders’ data from the Archigos database from 1835 to the end of 2015. The data is processed by the AutoGluon, which was developed by Amazon. Automated Machine Learning (AutoML) and AutoGluon can automatically extract features from the data and then use multiple classifiers to train the data. Use a linear regression model and classification model to establish the relationship between leaders and economic growth (GDP per capita growth), and to clarify the relationship between their characteristics and economic growth from a machine learning perspective. Our work may show as a model or signal for collaboration between the fields of statistics and artificial intelligence (AI) that can light up the way for political researchers and economists.

Keywords: comparative politics, political executives, leaders’ characteristics, artificial intelligence

Procedia PDF Downloads 61
4593 Amharic Text News Classification Using Supervised Learning

Authors: Misrak Assefa

Abstract:

The Amharic language is the second most widely spoken Semitic language in the world. There are several new overloaded on the web. Searching some useful documents from the web on a specific topic, which is written in the Amharic language, is a challenging task. Hence, document categorization is required for managing and filtering important information. In the classification of Amharic text news, there is still a gap in the domain of information that needs to be launch. This study attempts to design an automatic Amharic news classification using a supervised learning mechanism on four un-touch classes. To achieve this research, 4,182 news articles were used. Naive Bayes (NB) and Decision tree (j48) algorithms were used to classify the given Amharic dataset. In this paper, k-fold cross-validation is used to estimate the accuracy of the classifier. As a result, it shows those algorithms can be applicable in Amharic news categorization. The best average accuracy result is achieved by j48 decision tree and naïve Bayes is 95.2345 %, and 94.6245 % respectively using three categories. This research indicated that a typical decision tree algorithm is more applicable to Amharic news categorization.

Keywords: text categorization, supervised machine learning, naive Bayes, decision tree

Procedia PDF Downloads 168
4592 Review of Different Machine Learning Algorithms

Authors: Syed Romat Ali Shah, Bilal Shoaib, Saleem Akhtar, Munib Ahmad, Shahan Sadiqui

Abstract:

Classification is a data mining technique, which is recognizedon Machine Learning (ML) algorithm. It is used to classifythe individual articlein a knownofinformation into a set of predefinemodules or group. Web mining is also a portion of that sympathetic of data mining methods. The main purpose of this paper to analysis and compare the performance of Naïve Bayse Algorithm, Decision Tree, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN)and Support Vector Machine (SVM). This paper consists of different ML algorithm and their advantages and disadvantages and also define research issues.

Keywords: Data Mining, Web Mining, classification, ML Algorithms

Procedia PDF Downloads 265
4591 The Optimization of Decision Rules in Multimodal Decision-Level Fusion Scheme

Authors: Andrey V. Timofeev, Dmitry V. Egorov

Abstract:

This paper introduces an original method of parametric optimization of the structure for multimodal decision-level fusion scheme which combines the results of the partial solution of the classification task obtained from assembly of the mono-modal classifiers. As a result, a multimodal fusion classifier which has the minimum value of the total error rate has been obtained.

Keywords: classification accuracy, fusion solution, total error rate, multimodal fusion classifier

Procedia PDF Downloads 441
4590 The Prognostic Prediction Value of Positive Lymph Nodes Numbers for the Hypopharyngeal Squamous Cell Carcinoma

Authors: Wendu Pang, Yaxin Luo, Junhong Li, Yu Zhao, Danni Cheng, Yufang Rao, Minzi Mao, Ke Qiu, Yijun Dong, Fei Chen, Jun Liu, Jian Zou, Haiyang Wang, Wei Xu, Jianjun Ren

Abstract:

We aimed to compare the prognostic prediction value of positive lymph node number (PLNN) to the American Joint Committee on Cancer (AJCC) tumor, lymph node, and metastasis (TNM) staging system for patients with hypopharyngeal squamous cell carcinoma (HPSCC). A total of 826 patients with HPSCC from the Surveillance, Epidemiology, and End Results database (2004–2015) were identified and split into two independent cohorts: training (n=461) and validation (n=365). Univariate and multivariate Cox regression analyses were used to evaluate the prognostic effects of PLNN in patients with HPSCC. We further applied six Cox regression models to compare the survival predictive values of the PLNN and AJCC TNM staging system. PLNN showed a significant association with overall survival (OS) and cancer-specific survival (CSS) (P < 0.001) in both univariate and multivariable analyses, and was divided into three groups (PLNN 0, PLNN 1-5, and PLNN>5). In the training cohort, multivariate analysis revealed that the increased PLNN of HPSCC gave rise to significantly poor OS and CSS after adjusting for age, sex, tumor size, and cancer stage; this trend was also verified by the validation cohort. Additionally, the survival model incorporating a composite of PLNN and TNM classification (C-index, 0.705, 0.734) performed better than the PLNN and AJCC TNM models. PLNN can serve as a powerful survival predictor for patients with HPSCC and is a surrogate supplement for cancer staging systems.

Keywords: hypopharyngeal squamous cell carcinoma, positive lymph nodes number, prognosis, prediction models, survival predictive values

Procedia PDF Downloads 120
4589 Support Vector Regression Combined with Different Optimization Algorithms to Predict Global Solar Radiation on Horizontal Surfaces in Algeria

Authors: Laidi Maamar, Achwak Madani, Abdellah El Ahdj Abdellah

Abstract:

The aim of this work is to use Support Vector regression (SVR) combined with dragonfly, firefly, Bee Colony and particle swarm Optimization algorithm to predict global solar radiation on horizontal surfaces in some cities in Algeria. Combining these optimization algorithms with SVR aims principally to enhance accuracy by fine-tuning the parameters, speeding up the convergence of the SVR model, and exploring a larger search space efficiently; these parameters are the regularization parameter (C), kernel parameters, and epsilon parameter. By doing so, the aim is to improve the generalization and predictive accuracy of the SVR model. Overall, the aim is to leverage the strengths of both SVR and optimization algorithms to create a more powerful and effective regression model for various cities and under different climate conditions. Results demonstrate close agreement between predicted and measured data in terms of different metrics. In summary, SVM has proven to be a valuable tool in modeling global solar radiation, offering accurate predictions and demonstrating versatility when combined with other algorithms or used in hybrid forecasting models.

Keywords: support vector regression (SVR), optimization algorithms, global solar radiation prediction, hybrid forecasting models

Procedia PDF Downloads 10
4588 Statistic Regression and Open Data Approach for Identifying Economic Indicators That Influence e-Commerce

Authors: Apollinaire Barme, Simon Tamayo, Arthur Gaudron

Abstract:

This paper presents a statistical approach to identify explanatory variables linearly related to e-commerce sales. The proposed methodology allows specifying a regression model in order to quantify the relevance between openly available data (economic and demographic) and national e-commerce sales. The proposed methodology consists in collecting data, preselecting input variables, performing regressions for choosing variables and models, testing and validating. The usefulness of the proposed approach is twofold: on the one hand, it allows identifying the variables that influence e- commerce sales with an accessible approach. And on the other hand, it can be used to model future sales from the input variables. Results show that e-commerce is linearly dependent on 11 economic and demographic indicators.

Keywords: e-commerce, statistical modeling, regression, empirical research

Procedia PDF Downloads 203
4587 An Overview of the Porosity Classification in Carbonate Reservoirs and Their Challenges: An Example of Macro-Microporosity Classification from Offshore Miocene Carbonate in Central Luconia, Malaysia

Authors: Hammad T. Janjuhah, Josep Sanjuan, Mohamed K. Salah

Abstract:

Biological and chemical activities in carbonates are responsible for the complexity of the pore system. Primary porosity is generally of natural origin while secondary porosity is subject to chemical reactivity through diagenetic processes. To understand the integrated part of hydrocarbon exploration, it is necessary to understand the carbonate pore system. However, the current porosity classification scheme is limited to adequately predict the petrophysical properties of different reservoirs having various origins and depositional environments. Rock classification provides a descriptive method for explaining the lithofacies but makes no significant contribution to the application of porosity and permeability (poro-perm) correlation. The Central Luconia carbonate system (Malaysia) represents a good example of pore complexity (in terms of nature and origin) mainly related to diagenetic processes which have altered the original reservoir. For quantitative analysis, 32 high-resolution images of each thin section were taken using transmitted light microscopy. The quantification of grains, matrix, cement, and macroporosity (pore types) was achieved using a petrographic analysis of thin sections and FESEM images. The point counting technique was used to estimate the amount of macroporosity from thin section, which was then subtracted from the total porosity to derive the microporosity. The quantitative observation of thin sections revealed that the mouldic porosity (macroporosity) is the dominant porosity type present, whereas the microporosity seems to correspond to a sum of 40 to 50% of the total porosity. It has been proven that these Miocene carbonates contain a significant amount of microporosity, which significantly complicates the estimation and production of hydrocarbons. Neglecting its impact can increase uncertainty about estimating hydrocarbon reserves. Due to the diversity of geological parameters, the application of existing porosity classifications does not allow a better understanding of the poro-perm relationship. However, the classification can be improved by including the pore types and pore structures where they can be divided into macro- and microporosity. Such studies of microporosity identification/classification represent now a major concern in limestone reservoirs around the world.

Keywords: overview of porosity classification, reservoir characterization, microporosity, carbonate reservoir

Procedia PDF Downloads 126
4586 Using Time Series NDVI to Model Land Cover Change: A Case Study in the Berg River Catchment Area, Western Cape, South Africa

Authors: Adesuyi Ayodeji Steve, Zahn Munch

Abstract:

This study investigates the use of MODIS NDVI to identify agricultural land cover change areas on an annual time step (2007 - 2012) and characterize the trend in the study area. An ISODATA classification was performed on the MODIS imagery to select only the agricultural class producing 3 class groups namely: agriculture, agriculture/semi-natural, and semi-natural. NDVI signatures were created for the time series to identify areas dominated by cereals and vineyards with the aid of ancillary, pictometry and field sample data. The NDVI signature curve and training samples aided in creating a decision tree model in WEKA 3.6.9. From the training samples two classification models were built in WEKA using decision tree classifier (J48) algorithm; Model 1 included ISODATA classification and Model 2 without, both having accuracies of 90.7% and 88.3% respectively. The two models were used to classify the whole study area, thus producing two land cover maps with Model 1 and 2 having classification accuracies of 77% and 80% respectively. Model 2 was used to create change detection maps for all the other years. Subtle changes and areas of consistency (unchanged) were observed in the agricultural classes and crop practices over the years as predicted by the land cover classification. 41% of the catchment comprises of cereals with 35% possibly following a crop rotation system. Vineyard largely remained constant over the years, with some conversion to vineyard (1%) from other land cover classes. Some of the changes might be as a result of misclassification and crop rotation system.

Keywords: change detection, land cover, modis, NDVI

Procedia PDF Downloads 379
4585 Ontology-Based Backpropagation Neural Network Classification and Reasoning Strategy for NoSQL and SQL Databases

Authors: Hao-Hsiang Ku, Ching-Ho Chi

Abstract:

Big data applications have become an imperative for many fields. Many researchers have been devoted into increasing correct rates and reducing time complexities. Hence, the study designs and proposes an Ontology-based backpropagation neural network classification and reasoning strategy for NoSQL big data applications, which is called ON4NoSQL. ON4NoSQL is responsible for enhancing the performances of classifications in NoSQL and SQL databases to build up mass behavior models. Mass behavior models are made by MapReduce techniques and Hadoop distributed file system based on Hadoop service platform. The reference engine of ON4NoSQL is the ontology-based backpropagation neural network classification and reasoning strategy. Simulation results indicate that ON4NoSQL can efficiently achieve to construct a high performance environment for data storing, searching, and retrieving.

Keywords: Hadoop, NoSQL, ontology, back propagation neural network, high distributed file system

Procedia PDF Downloads 241
4584 Land Use Change Detection Using Satellite Images for Najran City, Kingdom of Saudi Arabia (KSA)

Authors: Ismail Elkhrachy

Abstract:

Determination of land use changing is an important component of regional planning for applications ranging from urban fringe change detection to monitoring change detection of land use. This data are very useful for natural resources management.On the other hand, the technologies and methods of change detection also have evolved dramatically during past 20 years. So it has been well recognized that the change detection had become the best methods for researching dynamic change of land use by multi-temporal remotely-sensed data. The objective of this paper is to assess, evaluate and monitor land use change surrounding the area of Najran city, Kingdom of Saudi Arabia (KSA) using Landsat images (June 23, 2009) and ETM+ image(June. 21, 2014). The post-classification change detection technique was applied. At last,two-time subset images of Najran city are compared on a pixel-by-pixel basis using the post-classification comparison method and the from-to change matrix is produced, the land use change information obtained.Three classes were obtained, urban, bare land and agricultural land from unsupervised classification method by using Erdas Imagine and ArcGIS software. Accuracy assessment of classification has been performed before calculating change detection for study area. The obtained accuracy is between 61% to 87% percent for all the classes. Change detection analysis shows that rapid growth in urban area has been increased by 73.2%, the agricultural area has been decreased by 10.5 % and barren area reduced by 7% between 2009 and 2014. The quantitative study indicated that the area of urban class has unchanged by 58.2 km〗^2, gained 70.3 〖km〗^2 and lost 16 〖km〗^2. For bare land class 586.4〖km〗^2 has unchanged, 53.2〖km〗^2 has gained and 101.5〖km〗^2 has lost. While agriculture area class, 20.2〖km〗^2 has unchanged, 31.2〖km〗^2 has gained and 37.2〖km〗^2 has lost.

Keywords: land use, remote sensing, change detection, satellite images, image classification

Procedia PDF Downloads 504
4583 Performance Prediction Methodology of Slow Aging Assets

Authors: M. Ben Slimene, M.-S. Ouali

Abstract:

Asset management of urban infrastructures faces a multitude of challenges that need to be overcome to obtain a reliable measurement of performances. Predicting the performance of slowly aging systems is one of those challenges, which helps the asset manager to investigate specific failure modes and to undertake the appropriate maintenance and rehabilitation interventions to avoid catastrophic failures as well as to optimize the maintenance costs. This article presents a methodology for modeling the deterioration of slowly degrading assets based on an operating history. It consists of extracting degradation profiles by grouping together assets that exhibit similar degradation sequences using an unsupervised classification technique derived from artificial intelligence. The obtained clusters are used to build the performance prediction models. This methodology is applied to a sample of a stormwater drainage culvert dataset.

Keywords: artificial Intelligence, clustering, culvert, regression model, slow degradation

Procedia PDF Downloads 84
4582 Assessment of the Spatio-Temporal Distribution of Pteridium aquilinum (Bracken Fern) Invasion on the Grassland Plateau in Nyika National Park

Authors: Andrew Kanzunguze, Lusayo Mwabumba, Jason K. Gilbertson, Dominic B. Gondwe, George Z. Nxumayo

Abstract:

Knowledge about the spatio-temporal distribution of invasive plants in protected areas provides a base from which hypotheses explaining proliferation of plant invasions can be made alongside development of relevant invasive plant monitoring programs. The aim of this study was to investigate the spatio-temporal distribution of bracken fern on the grassland plateau of Nyika National Park over the past 30 years (1986-2016) as well as to determine the current extent of the invasion. Remote sensing, machine learning, and statistical modelling techniques (object-based image analysis, image classification and linear regression analysis) in geographical information systems were used to determine both the spatial and temporal distribution of bracken fern in the study area. Results have revealed that bracken fern has been increasing coverage on the Nyika plateau at an estimated annual rate of 87.3 hectares since 1986. This translates to an estimated net increase of 2,573.1 hectares, which was recorded from 1,788.1 hectares (1986) to 4,361.9 hectares (2016). As of 2017 bracken fern covered 20,940.7 hectares, approximately 14.3% of the entire grassland plateau. Additionally, it was observed that the fern was distributed most densely around Chelinda camp (on the central plateau) as well as in forest verges and roadsides across the plateau. Based on these results it is recommended that Ecological Niche Modelling approaches be employed to (i) isolate the most important factors influencing bracken fern proliferation as well as (ii) identify and prioritize areas requiring immediate control interventions so as to minimize bracken fern proliferation in Nyika National Park.

Keywords: bracken fern, image classification, Landsat-8, Nyika National Park, spatio-temporal distribution

Procedia PDF Downloads 159
4581 The Necessity to Standardize Procedures of Providing Engineering Geological Data for Designing Road and Railway Tunneling Projects

Authors: Atefeh Saljooghi Khoshkar, Jafar Hassanpour

Abstract:

One of the main problems of the design stage relating to many tunneling projects is the lack of an appropriate standard for the provision of engineering geological data in a predefined format. In particular, this is more reflected in highway and railroad tunnel projects in which there is a number of tunnels and different professional teams involved. In this regard, comprehensive software needs to be designed using the accepted methods in order to help engineering geologists to prepare standard reports, which contain sufficient input data for the design stage. Regarding this necessity, applied software has been designed using macro capabilities and Visual Basic programming language (VBA) through Microsoft Excel. In this software, all of the engineering geological input data, which are required for designing different parts of tunnels, such as discontinuities properties, rock mass strength parameters, rock mass classification systems, boreability classification, the penetration rate, and so forth, can be calculated and reported in a standard format.

Keywords: engineering geology, rock mass classification, rock mechanic, tunnel

Procedia PDF Downloads 56
4580 Defect Classification of Hydrogen Fuel Pressure Vessels using Deep Learning

Authors: Dongju Kim, Youngjoo Suh, Hyojin Kim, Gyeongyeong Kim

Abstract:

Acoustic Emission Testing (AET) is widely used to test the structural integrity of an operational hydrogen storage container, and clustering algorithms are frequently used in pattern recognition methods to interpret AET results. However, the interpretation of AET results can vary from user to user as the tuning of the relevant parameters relies on the user's experience and knowledge of AET. Therefore, it is necessary to use a deep learning model to identify patterns in acoustic emission (AE) signal data that can be used to classify defects instead. In this paper, a deep learning-based model for classifying the types of defects in hydrogen storage tanks, using AE sensor waveforms, is proposed. As hydrogen storage tanks are commonly constructed using carbon fiber reinforced polymer composite (CFRP), a defect classification dataset is collected through a tensile test on a specimen of CFRP with an AE sensor attached. The performance of the classification model, using one-dimensional convolutional neural network (1-D CNN) and synthetic minority oversampling technique (SMOTE) data augmentation, achieved 91.09% accuracy for each defect. It is expected that the deep learning classification model in this paper, used with AET, will help in evaluating the operational safety of hydrogen storage containers.

Keywords: acoustic emission testing, carbon fiber reinforced polymer composite, one-dimensional convolutional neural network, smote data augmentation

Procedia PDF Downloads 69
4579 Classification of Manufacturing Data for Efficient Processing on an Edge-Cloud Network

Authors: Onyedikachi Ulelu, Andrew P. Longstaff, Simon Fletcher, Simon Parkinson

Abstract:

The widespread interest in 'Industry 4.0' or 'digital manufacturing' has led to significant research requiring the acquisition of data from sensors, instruments, and machine signals. In-depth research then identifies methods of analysis of the massive amounts of data generated before and during manufacture to solve a particular problem. The ultimate goal is for industrial Internet of Things (IIoT) data to be processed automatically to assist with either visualisation or autonomous system decision-making. However, the collection and processing of data in an industrial environment come with a cost. Little research has been undertaken on how to specify optimally what data to capture, transmit, process, and store at various levels of an edge-cloud network. The first step in this specification is to categorise IIoT data for efficient and effective use. This paper proposes the required attributes and classification to take manufacturing digital data from various sources to determine the most suitable location for data processing on the edge-cloud network. The proposed classification framework will minimise overhead in terms of network bandwidth/cost and processing time of machine tool data via efficient decision making on which dataset should be processed at the ‘edge’ and what to send to a remote server (cloud). A fast-and-frugal heuristic method is implemented for this decision-making. The framework is tested using case studies from industrial machine tools for machine productivity and maintenance.

Keywords: data classification, decision making, edge computing, industrial IoT, industry 4.0

Procedia PDF Downloads 156
4578 Local Directional Encoded Derivative Binary Pattern Based Coral Image Classification Using Weighted Distance Gray Wolf Optimization Algorithm

Authors: Annalakshmi G., Sakthivel Murugan S.

Abstract:

This paper presents a local directional encoded derivative binary pattern (LDEDBP) feature extraction method that can be applied for the classification of submarine coral reef images. The classification of coral reef images using texture features is difficult due to the dissimilarities in class samples. In coral reef image classification, texture features are extracted using the proposed method called local directional encoded derivative binary pattern (LDEDBP). The proposed approach extracts the complete structural arrangement of the local region using local binary batten (LBP) and also extracts the edge information using local directional pattern (LDP) from the edge response available in a particular region, thereby achieving extra discriminative feature value. Typically the LDP extracts the edge details in all eight directions. The process of integrating edge responses along with the local binary pattern achieves a more robust texture descriptor than the other descriptors used in texture feature extraction methods. Finally, the proposed technique is applied to an extreme learning machine (ELM) method with a meta-heuristic algorithm known as weighted distance grey wolf optimizer (GWO) to optimize the input weight and biases of single-hidden-layer feed-forward neural networks (SLFN). In the empirical results, ELM-WDGWO demonstrated their better performance in terms of accuracy on all coral datasets, namely RSMAS, EILAT, EILAT2, and MLC, compared with other state-of-the-art algorithms. The proposed method achieves the highest overall classification accuracy of 94% compared to the other state of art methods.

Keywords: feature extraction, local directional pattern, ELM classifier, GWO optimization

Procedia PDF Downloads 142
4577 Support Vector Regression for Retrieval of Soil Moisture Using Bistatic Scatterometer Data at X-Band

Authors: Dileep Kumar Gupta, Rajendra Prasad, Pradeep Kumar, Varun Narayan Mishra, Ajeet Kumar Vishwakarma, Prashant K. Srivastava

Abstract:

An approach was evaluated for the retrieval of soil moisture of bare soil surface using bistatic scatterometer data in the angular range of 200 to 700 at VV- and HH- polarization. The microwave data was acquired by specially designed X-band (10 GHz) bistatic scatterometer. The linear regression analysis was done between scattering coefficients and soil moisture content to select the suitable incidence angle for retrieval of soil moisture content. The 250 incidence angle was found more suitable. The support vector regression analysis was used to approximate the function described by the input-output relationship between the scattering coefficient and corresponding measured values of the soil moisture content. The performance of support vector regression algorithm was evaluated by comparing the observed and the estimated soil moisture content by statistical performance indices %Bias, root mean squared error (RMSE) and Nash-Sutcliffe Efficiency (NSE). The values of %Bias, root mean squared error (RMSE) and Nash-Sutcliffe Efficiency (NSE) were found 2.9451, 1.0986, and 0.9214, respectively at HH-polarization. At VV- polarization, the values of %Bias, root mean squared error (RMSE) and Nash-Sutcliffe Efficiency (NSE) were found 3.6186, 0.9373, and 0.9428, respectively.

Keywords: bistatic scatterometer, soil moisture, support vector regression, RMSE, %Bias, NSE

Procedia PDF Downloads 398
4576 A Family of Distributions on Learnable Problems without Uniform Convergence

Authors: César Garza

Abstract:

In supervised binary classification and regression problems, it is well-known that learnability is equivalent to a uniform convergence of the hypothesis class, and if a problem is learnable, it is learnable by empirical risk minimization. For the general learning setting of unsupervised learning tasks, there are non-trivial learning problems where uniform convergence does not hold. We present here the task of learning centers of mass with an extra feature that “activates” some of the coordinates over the unit ball in a Hilbert space. We show that the learning problem is learnable under a stable RLM rule. We introduce a family of distributions over the domain space with some mild restrictions for which the sample complexity of uniform convergence for these problems must grow logarithmically with the dimension of the Hilbert space. If we take this dimension to infinity, we obtain a learnable problem for which the uniform convergence property fails for a vast family of distributions.

Keywords: statistical learning theory, learnability, uniform convergence, stability, regularized loss minimization

Procedia PDF Downloads 101
4575 Kannada HandWritten Character Recognition by Edge Hinge and Edge Distribution Techniques Using Manhatan and Minimum Distance Classifiers

Authors: C. V. Aravinda, H. N. Prakash

Abstract:

In this paper, we tried to convey fusion and state of art pertaining to SIL character recognition systems. In the first step, the text is preprocessed and normalized to perform the text identification correctly. The second step involves extracting relevant and informative features. The third step implements the classification decision. The three stages which involved are Data acquisition and preprocessing, Feature extraction, and Classification. Here we concentrated on two techniques to obtain features, Feature Extraction & Feature Selection. Edge-hinge distribution is a feature that characterizes the changes in direction of a script stroke in handwritten text. The edge-hinge distribution is extracted by means of a windowpane that is slid over an edge-detected binary handwriting image. Whenever the mid pixel of the window is on, the two edge fragments (i.e. connected sequences of pixels) emerging from this mid pixel are measured. Their directions are measured and stored as pairs. A joint probability distribution is obtained from a large sample of such pairs. Despite continuous effort, handwriting identification remains a challenging issue, due to different approaches use different varieties of features, having different. Therefore, our study will focus on handwriting recognition based on feature selection to simplify features extracting task, optimize classification system complexity, reduce running time and improve the classification accuracy.

Keywords: word segmentation and recognition, character recognition, optical character recognition, hand written character recognition, South Indian languages

Procedia PDF Downloads 474
4574 Music Genre Classification Based on Non-Negative Matrix Factorization Features

Authors: Soyon Kim, Edward Kim

Abstract:

In order to retrieve information from the massive stream of songs in the music industry, music search by title, lyrics, artist, mood, and genre has become more important. Despite the subjectivity and controversy over the definition of music genres across different nations and cultures, automatic genre classification systems that facilitate the process of music categorization have been developed. Manual genre selection by music producers is being provided as statistical data for designing automatic genre classification systems. In this paper, an automatic music genre classification system utilizing non-negative matrix factorization (NMF) is proposed. Short-term characteristics of the music signal can be captured based on the timbre features such as mel-frequency cepstral coefficient (MFCC), decorrelated filter bank (DFB), octave-based spectral contrast (OSC), and octave band sum (OBS). Long-term time-varying characteristics of the music signal can be summarized with (1) the statistical features such as mean, variance, minimum, and maximum of the timbre features and (2) the modulation spectrum features such as spectral flatness measure, spectral crest measure, spectral peak, spectral valley, and spectral contrast of the timbre features. Not only these conventional basic long-term feature vectors, but also NMF based feature vectors are proposed to be used together for genre classification. In the training stage, NMF basis vectors were extracted for each genre class. The NMF features were calculated in the log spectral magnitude domain (NMF-LSM) as well as in the basic feature vector domain (NMF-BFV). For NMF-LSM, an entire full band spectrum was used. However, for NMF-BFV, only low band spectrum was used since high frequency modulation spectrum of the basic feature vectors did not contain important information for genre classification. In the test stage, using the set of pre-trained NMF basis vectors, the genre classification system extracted the NMF weighting values of each genre as the NMF feature vectors. A support vector machine (SVM) was used as a classifier. The GTZAN multi-genre music database was used for training and testing. It is composed of 10 genres and 100 songs for each genre. To increase the reliability of the experiments, 10-fold cross validation was used. For a given input song, an extracted NMF-LSM feature vector was composed of 10 weighting values that corresponded to the classification probabilities for 10 genres. An NMF-BFV feature vector also had a dimensionality of 10. Combined with the basic long-term features such as statistical features and modulation spectrum features, the NMF features provided the increased accuracy with a slight increase in feature dimensionality. The conventional basic features by themselves yielded 84.0% accuracy, but the basic features with NMF-LSM and NMF-BFV provided 85.1% and 84.2% accuracy, respectively. The basic features required dimensionality of 460, but NMF-LSM and NMF-BFV required dimensionalities of 10 and 10, respectively. Combining the basic features, NMF-LSM and NMF-BFV together with the SVM with a radial basis function (RBF) kernel produced the significantly higher classification accuracy of 88.3% with a feature dimensionality of 480.

Keywords: mel-frequency cepstral coefficient (MFCC), music genre classification, non-negative matrix factorization (NMF), support vector machine (SVM)

Procedia PDF Downloads 274
4573 Forecasting Equity Premium Out-of-Sample with Sophisticated Regression Training Techniques

Authors: Jonathan Iworiso

Abstract:

Forecasting the equity premium out-of-sample is a major concern to researchers in finance and emerging markets. The quest for a superior model that can forecast the equity premium with significant economic gains has resulted in several controversies on the choice of variables and suitable techniques among scholars. This research focuses mainly on the application of Regression Training (RT) techniques to forecast monthly equity premium out-of-sample recursively with an expanding window method. A broad category of sophisticated regression models involving model complexity was employed. The RT models include Ridge, Forward-Backward (FOBA) Ridge, Least Absolute Shrinkage and Selection Operator (LASSO), Relaxed LASSO, Elastic Net, and Least Angle Regression were trained and used to forecast the equity premium out-of-sample. In this study, the empirical investigation of the RT models demonstrates significant evidence of equity premium predictability both statistically and economically relative to the benchmark historical average, delivering significant utility gains. They seek to provide meaningful economic information on mean-variance portfolio investment for investors who are timing the market to earn future gains at minimal risk. Thus, the forecasting models appeared to guarantee an investor in a market setting who optimally reallocates a monthly portfolio between equities and risk-free treasury bills using equity premium forecasts at minimal risk.

Keywords: regression training, out-of-sample forecasts, expanding window, statistical predictability, economic significance, utility gains

Procedia PDF Downloads 79
4572 Self-Image of Police Officers

Authors: Leo Carlo B. Rondina

Abstract:

Self-image is an important factor to improve the self-esteem of the personnel. The purpose of the study is to determine the self-image of the police. The respondents were the 503 policemen assigned in different Police Station in Davao City, and they were chosen with the used of random sampling. With the used of Exploratory Factor Analysis (EFA), latent construct variables of police image were identified as follows; professionalism, obedience, morality and justice and fairness. Further, ordinal regression indicates statistical characteristics on ages 21-40 which means the age of the respondent statistically improves self-image.

Keywords: police image, exploratory factor analysis, ordinal regression, Galatea effect

Procedia PDF Downloads 262
4571 Quantitative Texture Analysis of Shoulder Sonography for Rotator Cuff Lesion Classification

Authors: Chung-Ming Lo, Chung-Chien Lee

Abstract:

In many countries, the lifetime prevalence of shoulder pain is up to 70%. In America, the health care system spends 7 billion per year about the healthy issues of shoulder pain. With respect to the origin, up to 70% of shoulder pain is attributed to rotator cuff lesions This study proposed a computer-aided diagnosis (CAD) system to assist radiologists classifying rotator cuff lesions with less operator dependence. Quantitative features were extracted from the shoulder ultrasound images acquired using an ALOKA alpha-6 US scanner (Hitachi-Aloka Medical, Tokyo, Japan) with linear array probe (scan width: 36mm) ranging from 5 to 13 MHz. During examination, the postures of the examined patients are standard sitting position and are followed by the regular routine. After acquisition, the shoulder US images were drawn out from the scanner and stored as 8-bit images with pixel value ranging from 0 to 255. Upon the sonographic appearance, the boundary of each lesion was delineated by a physician to indicate the specific pattern for analysis. The three lesion categories for classification were composed of 20 cases of tendon inflammation, 18 cases of calcific tendonitis, and 18 cases of supraspinatus tear. For each lesion, second-order statistics were quantified in the feature extraction. The second-order statistics were the texture features describing the correlations between adjacent pixels in a lesion. Because echogenicity patterns were expressed via grey-scale. The grey-scale co-occurrence matrixes with four angles of adjacent pixels were used. The texture metrics included the mean and standard deviation of energy, entropy, correlation, inverse different moment, inertia, cluster shade, cluster prominence, and Haralick correlation. Then, the quantitative features were combined in a multinomial logistic regression classifier to generate a prediction model of rotator cuff lesions. Multinomial logistic regression classifier is widely used in the classification of more than two categories such as the three lesion types used in this study. In the classifier, backward elimination was used to select a feature subset which is the most relevant. They were selected from the trained classifier with the lowest error rate. Leave-one-out cross-validation was used to evaluate the performance of the classifier. Each case was left out of the total cases and used to test the trained result by the remaining cases. According to the physician’s assessment, the performance of the proposed CAD system was shown by the accuracy. As a result, the proposed system achieved an accuracy of 86%. A CAD system based on the statistical texture features to interpret echogenicity values in shoulder musculoskeletal ultrasound was established to generate a prediction model for rotator cuff lesions. Clinically, it is difficult to distinguish some kinds of rotator cuff lesions, especially partial-thickness tear of rotator cuff. The shoulder orthopaedic surgeon and musculoskeletal radiologist reported greater diagnostic test accuracy than general radiologist or ultrasonographers based on the available literature. Consequently, the proposed CAD system which was developed according to the experiment of the shoulder orthopaedic surgeon can provide reliable suggestions to general radiologists or ultrasonographers. More quantitative features related to the specific patterns of different lesion types would be investigated in the further study to improve the prediction.

Keywords: shoulder ultrasound, rotator cuff lesions, texture, computer-aided diagnosis

Procedia PDF Downloads 261
4570 Decision Making System for Clinical Datasets

Authors: P. Bharathiraja

Abstract:

Computer Aided decision making system is used to enhance diagnosis and prognosis of diseases and also to assist clinicians and junior doctors in clinical decision making. Medical Data used for decision making should be definite and consistent. Data Mining and soft computing techniques are used for cleaning the data and for incorporating human reasoning in decision making systems. Fuzzy rule based inference technique can be used for classification in order to incorporate human reasoning in the decision making process. In this work, missing values are imputed using the mean or mode of the attribute. The data are normalized using min-ma normalization to improve the design and efficiency of the fuzzy inference system. The fuzzy inference system is used to handle the uncertainties that exist in the medical data. Equal-width-partitioning is used to partition the attribute values into appropriate fuzzy intervals. Fuzzy rules are generated using Class Based Associative rule mining algorithm. The system is trained and tested using heart disease data set from the University of California at Irvine (UCI) Machine Learning Repository. The data was split using a hold out approach into training and testing data. From the experimental results it can be inferred that classification using fuzzy inference system performs better than trivial IF-THEN rule based classification approaches. Furthermore it is observed that the use of fuzzy logic and fuzzy inference mechanism handles uncertainty and also resembles human decision making. The system can be used in the absence of a clinical expert to assist junior doctors and clinicians in clinical decision making.

Keywords: decision making, data mining, normalization, fuzzy rule, classification

Procedia PDF Downloads 494
4569 Regression Analysis of Travel Indicators and Public Transport Usage in Urban Areas

Authors: Mehdi Moeinaddini, Zohreh Asadi-Shekari, Muhammad Zaly Shah, Amran Hamzah

Abstract:

Currently, planners try to have more green travel options to decrease economic, social and environmental problems. Therefore, this study tries to find significant urban travel factors to be used to increase the usage of alternative urban travel modes. This paper attempts to identify the relationship between prominent urban mobility indicators and daily trips by public transport in 30 cities from various parts of the world. Different travel modes, infrastructures and cost indicators were evaluated in this research as mobility indicators. The results of multi-linear regression analysis indicate that there is a significant relationship between mobility indicators and the daily usage of public transport.

Keywords: green travel modes, urban travel indicators, daily trips by public transport, multi-linear regression analysis

Procedia PDF Downloads 528