Search results for: k nearest neighbor classifier
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 666

456 A Virtual Set-Up to Evaluate Augmented Reality Effect on Simulated Driving

Authors: Alicia Yanadira Nava Fuentes, Ilse Cervantes Camacho, Amadeo José Argüelles Cruz, Ana María Balboa Verduzco

Abstract:

Augmented reality is expected to play a role in future driving: its immersive technology can show directions and maps and mark important places with graphic elements when the driver requires the information. On the other hand, driving is a multitasking activity and, for some people, a complex one, in which situations commonly arise that demand the driver's immediate attention and decisions that help to avoid accidents. The main aim of the project is therefore to instrument a platform with biometric sensors that allows driving performance under the influence of augmented reality devices to be evaluated and the drivers' level of attention to be detected, since it is important to know the effect these devices produce. In this study, the physiological sensors EPOC X (EEG), ECG06 PRO and EMG Myoware are integrated into a driving test platform with a Logitech G29 steering wheel and the simulation software City Car Driving, in which the level of traffic and the number of pedestrians can be controlled, so that the driver interacts with the simulation in real time; an MSP430 microcontroller acquires the data for storage. The sensors deliver continuous-time analog signals that need conditioning. A signal amplifier is incorporated because the acquired signals have a sensitivity of 1.25 mm/mV, and filtering removes unwanted frequency bands so that the signal is interpretable and free of noise. The signal is then converted from analog to digital for analysis of the drivers' physiological signals, and the values are stored in a database.
Based on this compilation, we extract signal features and implement k-nearest neighbor (k-NN) and decision tree classifiers (supervised learning) to study the data, identify patterns, and determine the different effects of augmented reality on drivers. The expected results of this project are a test platform instrumented with biometric sensors for data acquisition during driving and a database with the variables required to determine the effect of augmented reality on people in simulated driving.
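
As a rough illustration of the k-NN step mentioned in this abstract, the sketch below classifies a feature vector by majority vote among its k nearest training samples. The feature values, the labels and the function name `knn_predict` are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair each training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples (for example, two normalised signal features).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_y = ["attentive", "attentive", "attentive",
           "distracted", "distracted", "distracted"]

prediction = knn_predict(train_X, train_y, (0.2, 0.2), k=3)
print(prediction)
```

An odd k is a common choice for two-class problems because it avoids tied votes.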

Keywords: augmented reality, driving, physiological signals, test platform

Procedia PDF Downloads 113
455 Classification Using Worldview-2 Imagery of Giant Panda Habitat in Wolong, Sichuan Province, China

Authors: Yunwei Tang, Linhai Jing, Hui Li, Qingjie Liu, Xiuxia Li, Qi Yan, Haifeng Ding

Abstract:

The giant panda (Ailuropoda melanoleuca) is an endangered species that lives mainly in central China, where bamboo is the main food source of wild giant pandas. Knowledge of the spatial distribution of bamboo is therefore important for identifying the habitat of giant pandas. There have been ongoing studies on mapping bamboo and other tree species using remote sensing. WorldView-2 (WV-2) is the first high resolution commercial satellite with eight Multi-Spectral (MS) bands. Recent studies demonstrated that WV-2 imagery has a high potential in classification of tree species. Advanced classification techniques are important for utilising high spatial resolution imagery. It is generally agreed that object-based image analysis is a more desirable method than pixel-based analysis in processing high spatial resolution remotely sensed data. Classifiers that use spatial information combined with spectral information are known as contextual classifiers. It is suggested that contextual classifiers can achieve greater accuracy than non-contextual classifiers. Thus, spatial correlation can be incorporated into classifiers to improve classification results. The study area is located in the Wuyipeng area in Wolong, Sichuan Province. The complex environment makes information extraction difficult, since bamboo is sparsely distributed, mixed with brush, and covered by other trees. Extensive fieldwork was carried out in Wuyipeng twice. The first campaign was on 11th June, 2014, aiming at sampling feature locations for geometric correction and collecting training samples for classification. The second was on 11th September, 2014, for the purpose of testing the classification results. In this study, spectral separability analysis was first performed to select appropriate MS bands for classification. The reflectance analysis also provided information for expanding the sample points when only a few were known.
Then, a spatially weighted object-based k-nearest neighbour (k-NN) classifier was applied to the selected MS bands to identify seven land cover types (bamboo, conifer, broadleaf, mixed forest, brush, bare land, and shadow), accounting for spatial correlation within classes using geostatistical modelling. The spatially weighted k-NN method was compared with three alternatives: the traditional k-NN classifier, the Support Vector Machine (SVM) method and the Classification and Regression Tree (CART). Through field validation, it was proved that the classification result obtained using the spatially weighted k-NN method has the highest overall classification accuracy (77.61%) and Kappa coefficient (0.729); the producer’s accuracy and user’s accuracy achieve 81.25% and 95.12% for the bamboo class, respectively, also higher than the other methods. Photos of tree crowns were taken at sample locations using a fisheye camera, so the canopy density could be estimated. It is found that it is difficult to identify bamboo in the areas with a large canopy density (over 0.70); it is possible to extract bamboos in the areas with a median canopy density (from 0.2 to 0.7) and in a sparse forest (canopy density is less than 0.2). In summary, this study explores the ability of WV-2 imagery for bamboo extraction in a mountainous region in Sichuan. The study successfully identified the bamboo distribution, providing supporting knowledge for assessing the habitats of giant pandas.
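
The accuracy measures quoted in this abstract (overall accuracy, Kappa coefficient, producer's and user's accuracy) are all derived from a confusion matrix. The sketch below shows the standard definitions on a small made-up two-class matrix; the numbers are illustrative only and are unrelated to the study's results.

```python
def overall_accuracy_and_kappa(cm):
    """cm[i][j] = number of samples of reference class i predicted as class j."""
    n = sum(sum(row) for row in cm)
    size = len(cm)
    observed = sum(cm[i][i] for i in range(size)) / n
    # Chance agreement: sum over classes of (row total * column total) / n^2.
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(size)
    ) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


def producers_accuracy(cm, i):
    """Correct samples of class i divided by all reference samples of class i."""
    return cm[i][i] / sum(cm[i])


def users_accuracy(cm, i):
    """Correct samples of class i divided by all samples predicted as class i."""
    return cm[i][i] / sum(row[i] for row in cm)


# Hypothetical confusion matrix (rows: reference, columns: predicted).
cm = [[45, 5],
      [10, 40]]
accuracy, kappa = overall_accuracy_and_kappa(cm)
print(accuracy, kappa, producers_accuracy(cm, 0), users_accuracy(cm, 0))
```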

Keywords: bamboo mapping, classification, geostatistics, k-NN, worldview-2

Procedia PDF Downloads 292
454 An Approach for Vocal Register Recognition Based on Spectral Analysis of Singing

Authors: Aleksandra Zysk, Pawel Badura

Abstract:

Recognizing and controlling vocal registers during singing is a difficult task for a beginner vocalist. It requires, among other things, identifying which part of the natural resonators is being used when a sound propagates through the body. Thus, an application has been designed that allows for sound recording and automatic vocal register recognition (VRR), with a graphical user interface providing real-time visualization of the signal and recognition results. Six spectral features are determined for each time frame and passed to a support vector machine classifier, which yields a binary decision on the head or chest register assignment of the segment. The classification training and testing data were recorded by ten professional female singers (soprano, aged 19-29) performing sounds in both chest and head registers. The classification accuracy exceeded 93% in each of the validation schemes used. Apart from a hard two-class decision, the support vector classifier also returns the distance between a particular feature vector and the discrimination hyperplane in the feature space. Such information reflects the level of certainty of the vocal register classification in a fuzzy way. Thus, the designed recognition and training application is able to assess and visualize the continuous trend in singing in a user-friendly graphical mode, providing an easy way to control the vocal emission.
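
To make the idea of using the distance to the hyperplane as a certainty measure concrete, the sketch below computes the signed distance for a linear decision function and squashes it into the interval (0, 1). The logistic mapping, the `scale` parameter, and the weights are assumptions chosen for illustration; the abstract does not state how the authors map distance to certainty.

```python
import math


def signed_distance(w, b, x):
    """Signed Euclidean distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def certainty(w, b, x, scale=1.0):
    """Map the signed distance to (0, 1); 0.5 means the point lies on the hyperplane.

    The logistic squashing is an illustrative assumption, not the authors' method.
    """
    return 1.0 / (1.0 + math.exp(-signed_distance(w, b, x) / scale))


# Hypothetical hyperplane in a two-feature space.
w, b = (3.0, 4.0), -5.0
print(signed_distance(w, b, (3.0, 4.0)))  # far on the positive side
print(certainty(w, b, (1.0, 0.5)))        # point on the hyperplane
```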

Keywords: classification, singing, spectral analysis, vocal emission, vocal register

Procedia PDF Downloads 282
453 An Efficient Algorithm for Solving the Transmission Network Expansion Planning Problem Integrating Machine Learning with Mathematical Decomposition

Authors: Pablo Oteiza, Ricardo Alvarez, Mehrdad Pirnia, Fuat Can

Abstract:

To effectively combat climate change, many countries around the world have committed to a decarbonisation of their electricity, along with promoting a large-scale integration of renewable energy sources (RES). While this trend represents a unique opportunity to effectively combat climate change, achieving a sound and cost-efficient energy transition towards low-carbon power systems poses significant challenges for the multi-year Transmission Network Expansion Planning (TNEP) problem. The objective of the multi-year TNEP is to determine the necessary network infrastructure to supply the projected demand in a cost-efficient way, considering the evolution of the new generation mix, including the integration of RES. The rapid integration of large-scale RES increases the variability and uncertainty in the power system operation, which in turn increases short-term flexibility requirements. To meet these requirements, flexible generating technologies such as energy storage systems must be considered within the TNEP as well, along with proper models for capturing the operational challenges of future power systems. As a consequence, TNEP formulations are becoming more complex and difficult to solve, especially for its application in realistic-sized power system models. To meet these challenges, there is an increasing need for developing efficient algorithms capable of solving the TNEP problem with reasonable computational time and resources. In this regard, a promising research area is the use of artificial intelligence (AI) techniques for solving large-scale mixed-integer optimization problems, such as the TNEP. In particular, the use of AI along with mathematical optimization strategies based on decomposition has shown great potential. In this context, this paper presents an efficient algorithm for solving the multi-year TNEP problem. The algorithm combines AI techniques with Column Generation, a traditional decomposition-based mathematical optimization method. 
One of the challenges of using Column Generation for solving the TNEP problem is that the subproblems are of mixed-integer nature, and therefore solving them requires significant amounts of time and resources. Hence, in this proposal we solve a linearly relaxed version of the subproblems and train a binary classifier that determines the value of the binary variables based on the results obtained from the linearized version. A key feature of the proposal is that we integrate the binary classifier into the optimization algorithm in such a way that the optimality of the solution can be guaranteed. The results of a case study based on the HRP 38-bus test system show that the binary classifier has an accuracy above 97% for estimating the value of the binary variables. Since the linearly relaxed version of the subproblems can be solved in significantly less time than the integer programming counterpart, the integration of the binary classifier into the Column Generation (CG) algorithm allowed us to reduce the computational time required for solving the problem by 50%. The final version of this paper will contain a detailed description of the proposed algorithm, the AI-based binary classifier technique and its integration into the CG algorithm. To demonstrate the capabilities of the proposal, we evaluate the algorithm in case studies with different scenarios, as well as in other power system models.

Keywords: integer optimization, machine learning, mathematical decomposition, transmission planning

Procedia PDF Downloads 54
452 Exploring the Role of Data Mining in Crime Classification: A Systematic Literature Review

Authors: Faisal Muhibuddin, Ani Dijah Rahajoe

Abstract:

This in-depth exploration, through a systematic literature review, scrutinizes the nuanced role of data mining in the classification of criminal activities. The research focuses on investigating various methodological aspects and recent developments in leveraging data mining techniques to enhance the effectiveness and precision of crime categorization. Commencing with an exposition of the foundational concepts of crime classification and its evolutionary dynamics, this study details the paradigm shift from conventional methods towards approaches supported by data mining, addressing the challenges and complexities inherent in the modern crime landscape. Specifically, the research delves into various data mining techniques, including K-means clustering, Naïve Bayes, K-nearest neighbour, and clustering methods. A comprehensive review of the strengths and limitations of each technique provides insights into their respective contributions to improving crime classification models. The integration of diverse data sources takes centre stage in this research. A detailed analysis explores how the amalgamation of structured data (such as criminal records) and unstructured data (such as social media) can offer a holistic understanding of crime, enriching classification models with more profound insights. Furthermore, the study explores the temporal implications in crime classification, emphasizing the significance of considering temporal factors to comprehend long-term trends and seasonality. The availability of real-time data is also elucidated as a crucial element in enhancing responsiveness and accuracy in crime classification.

Keywords: data mining, classification algorithm, naïve bayes, k-means clustering, k-nearest neighbor, crime, data analysis, systematic literature review

Procedia PDF Downloads 42
451 Effect of Carbide Precipitates in Tool Steel on Material Transfer: A Molecular Dynamics Study

Authors: Ahmed Tamer AlMotasem, Jens Bergström, Anders Gåård, Pavel Krakhmalev, Thijs Jan Holleboom

Abstract:

In sheet metal forming processes, accumulation and transfer of sheet material to tool surfaces, often referred to as galling, is the major cause of tool failure. Initiation of galling is assumed to occur due to local adhesive wear between two surfaces. Therefore, reducing adhesion between the tool and the work sheet has great potential to improve the tool material's galling resistance. Experimental observations and theoretical studies show that the presence of primary micro-sized carbides and/or nitrides in alloyed steels may significantly improve galling resistance. Generally, decreased adhesion between the ceramic precipitates and the sheet material counter-surface is regarded as the main reason for these observations. On the other hand, adhesion processes occur at an atomic scale and, hence, a fundamental understanding of galling can be obtained via atomic-scale simulations. In the present study, molecular dynamics simulations using a second nearest neighbor embedded atom method potential are employed to investigate the influence of nano-sized cementite precipitates embedded in the tool. The main aim of the simulations is to gain new fundamental knowledge on galling initiation mechanisms. Two tool/work-piece configurations, iron/iron and iron-cementite/iron, are studied under dry sliding conditions. We find that the average frictional force decreases whereas the normal force increases for the iron-cementite/iron system, in comparison to the iron/iron configuration. Moreover, the average friction coefficient between the tool and the work-piece decreases by about 10% for the iron-cementite/iron case. The increase of the normal force in the iron-cementite/iron system may be attributed to the high stiffness of cementite compared to bcc iron.
In order to qualitatively explain the effect of cementite on adhesion, the adhesion force between self-mated iron/iron and cementite/iron surfaces was determined, and we found that the iron/cementite surface exhibits a lower adhesive force than the iron/iron surface. The variation of adhesion force with temperature was investigated up to 600 K, and we found that the adhesive force generally decreases with increasing temperature. Structural analyses show that plastic deformation is the main deformation mechanism of the work-piece, accompanied by dislocation generation.

Keywords: adhesion, cementite, galling, molecular dynamics

Procedia PDF Downloads 279
450 Bias Prevention in Automated Diagnosis of Melanoma: Augmentation of a Convolutional Neural Network Classifier

Authors: Kemka Ihemelandu, Chukwuemeka Ihemelandu

Abstract:

Melanoma remains a public health crisis, with incidence rates increasing rapidly in the past decades. Improvements in diagnostic accuracy and reductions in misdiagnosis using artificial intelligence (AI) continue to be documented. Unfortunately, unintended racially biased outcomes, a product of a lack of diversity in the datasets used, with a noted class imbalance favoring lighter over darker skin tones, have increasingly been recognized as a problem, resulting in noted limitations in the accuracy of convolutional neural network (CNN) models. CNN models are prone to biased output due to biases in the dataset used to train them. Our aim in this study was to optimize convolutional neural network algorithms to mitigate bias in the automated diagnosis of melanoma. We hypothesized that our proposed training algorithms, based on a data augmentation method that optimizes the diagnostic accuracy of a CNN classifier by generating new training samples from the original ones, would reduce bias in the automated diagnosis of melanoma. We applied geometric transformations, including rotations, translations, scale changes, flipping, and shearing. This provided modified input data, making for a CNN model that could learn subtle racial features. Optimal selection of the momentum and batch hyperparameters increased our model accuracy. We show that our augmented model reduces bias while maintaining accuracy in the automated diagnosis of melanoma.

Keywords: bias, augmentation, melanoma, convolutional neural network

Procedia PDF Downloads 179
449 MIMIC: A Multi Input Micro-Influencers Classifier

Authors: Simone Leonardi, Luca Ardito

Abstract:

Micro-influencers are effective elements in the marketing strategies of companies and institutions because of their capability to create a hyper-engaged audience around a specific topic of interest. In recent years, many scientific approaches and commercial tools have handled the task of detecting this type of social media user. These strategies adopt solutions ranging from rule-based machine learning models to deep neural networks and graph analysis on text, images, and account information. This work compares the existing solutions and proposes an ensemble method to generalize them with different input data and social media platforms. The deployed solution combines deep learning models on unstructured data with statistical machine learning models on structured data. We retrieve both social media account information and multimedia posts on Twitter and Instagram. These data are mapped into feature vectors for an eXtreme Gradient Boosting (XGBoost) classifier. Sixty different topics have been analyzed to build a rule-based gold standard dataset and to compare the performance of our approach against baseline classifiers. We demonstrate the effectiveness of our work by comparing the accuracy, precision, recall, and F1 score of our model with different configurations and architectures. We obtained an accuracy of 0.91 with our best performing model.

Keywords: deep learning, gradient boosting, image processing, micro-influencers, NLP, social media

Procedia PDF Downloads 151
448 Quantitative Texture Analysis of Shoulder Sonography for Rotator Cuff Lesion Classification

Authors: Chung-Ming Lo, Chung-Chien Lee

Abstract:

In many countries, the lifetime prevalence of shoulder pain is up to 70%. In America, the health care system spends 7 billion per year on health issues related to shoulder pain. With respect to the origin, up to 70% of shoulder pain is attributed to rotator cuff lesions. This study proposed a computer-aided diagnosis (CAD) system to assist radiologists in classifying rotator cuff lesions with less operator dependence. Quantitative features were extracted from the shoulder ultrasound images acquired using an ALOKA alpha-6 US scanner (Hitachi-Aloka Medical, Tokyo, Japan) with a linear array probe (scan width: 36 mm) ranging from 5 to 13 MHz. During examination, the patients were in a standard sitting position and the regular routine was followed. After acquisition, the shoulder US images were exported from the scanner and stored as 8-bit images with pixel values ranging from 0 to 255. Based on the sonographic appearance, the boundary of each lesion was delineated by a physician to indicate the specific pattern for analysis. The three lesion categories for classification were composed of 20 cases of tendon inflammation, 18 cases of calcific tendonitis, and 18 cases of supraspinatus tear. For each lesion, second-order statistics were quantified in the feature extraction. The second-order statistics were the texture features describing the correlations between adjacent pixels in a lesion. Because echogenicity patterns are expressed in grey scale, grey-scale co-occurrence matrices at four angles of adjacent pixels were used. The texture metrics included the mean and standard deviation of energy, entropy, correlation, inverse difference moment, inertia, cluster shade, cluster prominence, and Haralick correlation. Then, the quantitative features were combined in a multinomial logistic regression classifier to generate a prediction model of rotator cuff lesions.
A multinomial logistic regression classifier is widely used for classification into more than two categories, such as the three lesion types in this study. In the classifier, backward elimination was used to select the most relevant feature subset, namely the one giving the trained classifier the lowest error rate. Leave-one-out cross-validation was used to evaluate the performance of the classifier: each case in turn was left out and used to test the classifier trained on the remaining cases. Performance was measured as accuracy against the physician’s assessment, and the proposed system achieved an accuracy of 86%. A CAD system based on statistical texture features to interpret echogenicity values in shoulder musculoskeletal ultrasound was thus established to generate a prediction model for rotator cuff lesions. Clinically, it is difficult to distinguish some kinds of rotator cuff lesions, especially partial-thickness tears of the rotator cuff. According to the available literature, shoulder orthopaedic surgeons and musculoskeletal radiologists report greater diagnostic test accuracy than general radiologists or ultrasonographers. Consequently, the proposed CAD system, which was developed according to the experience of the shoulder orthopaedic surgeon, can provide reliable suggestions to general radiologists or ultrasonographers. More quantitative features related to the specific patterns of different lesion types will be investigated in a further study to improve the prediction.
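
The grey-scale co-occurrence matrix behind the texture features can be sketched as follows. This is a minimal, non-symmetric version for a single pixel offset, with only two of the listed metrics (energy and entropy); the tiny 4-level image is made up, and the study's actual quantisation and offsets are not given in the abstract.

```python
import math


def glcm(image, dx, dy, levels):
    """Normalised grey-level co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Count the pair only if the neighbour lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def energy(p):
    """Sum of squared co-occurrence probabilities (angular second moment)."""
    return sum(v * v for row in p for v in row)


def entropy(p):
    """Shannon entropy of the co-occurrence probabilities, in bits."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)


# Hypothetical 4x4 image with grey levels 0..3.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, dx=1, dy=0, levels=4)  # horizontal neighbour (0 degrees)
print(energy(p), entropy(p))
```

Repeating the call with the offsets (1, -1), (0, -1) and (-1, -1) would give the 45, 90 and 135 degree matrices, and the mean and standard deviation of each metric over the four angles could then be taken as features.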

Keywords: shoulder ultrasound, rotator cuff lesions, texture, computer-aided diagnosis

Procedia PDF Downloads 257
447 Generalized Synchronization in Systems with a Complex Topology of Attractor

Authors: Olga I. Moskalenko, Vladislav A. Khanadeev, Anastasya D. Koloskova, Alexey A. Koronovskii, Anatoly A. Pivovarov

Abstract:

Generalized synchronization is one of the most intricate phenomena in nonlinear science. It can be observed in systems with both unidirectional and mutual coupling, including complex networks. The phenomenon has a number of practical applications, for example, secure information transmission through a communication channel with a high level of noise. Known methods of secure information transmission need greater privacy of data transmission, which raises the question of observing this phenomenon in systems with a complex topology of chaotic attractor possessing two or more positive Lyapunov exponents. The present report is devoted to the study of this phenomenon in two unidirectionally and mutually coupled dynamical systems in chaotic (with one positive Lyapunov exponent) and hyperchaotic (with two or more positive Lyapunov exponents) regimes, respectively. As the systems under study, we have used two mutually coupled modified Lorenz oscillators and two unidirectionally coupled time-delayed generators. We have shown that in both cases the generalized synchronization regime can be detected by calculating Lyapunov exponents and by the phase tube approach, whereas, due to the complex topology of the attractor, the nearest neighbor method is misleading. Moreover, the auxiliary system approach, the standard method for observing the synchronous regime, gives incorrect results for mutual coupling. To calculate the Lyapunov exponents in time-delayed systems, we have proposed an approach based on a modification of the Gram-Schmidt orthogonalization procedure for time-delayed systems. We have studied in detail the mechanisms leading to the onset of the generalized synchronization regime, paying particular attention to the region where one positive Lyapunov exponent has already become negative whereas the second one is still positive.
We have found the intermittency here and studied its characteristics. To detect the laminar phase lengths the method based on a calculation of local Lyapunov exponents has been proposed. The efficiency of the method has been verified using the example of two unidirectionally coupled Rössler systems being in the band chaos regime. We have revealed the main characteristics of intermittency, i.e. the distribution of the laminar phase lengths and dependence of the mean length of the laminar phases on the criticality parameter, for all systems studied in the report. This work has been supported by the Russian President's Council grant for the state support of young Russian scientists (project MK-531.2018.2).

Keywords: complex topology of attractor, generalized synchronization, hyperchaos, Lyapunov exponents

Procedia PDF Downloads 249
446 Benchmarking Machine Learning Approaches for Forecasting Hotel Revenue

Authors: Rachel Y. Zhang, Christopher K. Anderson

Abstract:

A critical aspect of revenue management is a firm’s ability to predict demand as a function of price. Historically, hotels have used simple time series models (regression and/or pick-up based models) owing to the complexities of trying to build causal models of demand. Machine learning approaches are slowly attracting attention owing to their flexibility in modeling relationships. This study provides an overview of approaches to forecasting hospitality demand, focusing on the opportunities created by machine learning approaches, including K-Nearest-Neighbors, Support Vector Machine, Regression Tree, and Artificial Neural Network algorithms. The out-of-sample performance of these approaches to forecasting hotel demand is illustrated using a proprietary sample of market-level (24 properties) transactional data for Las Vegas, NV. Causal predictive models can be built and evaluated owing to the availability of market-level (versus firm-level) data. This research also compares and contrasts the accuracy of firm-level models (i.e., predictive models for hotel A using only hotel A’s data) with that of models using market-level data (prices, review scores, location, chain scale, etc. for all hotels within the market). The prospective models will be valuable for hotel revenue prediction given the basic characteristics of a hotel property, or can be applied in performance evaluation for an existing hotel. The findings will unveil the features that play key roles in a hotel’s revenue performance, which would have considerable potential usefulness in both revenue prediction and evaluation.

Keywords: hotel revenue, k-nearest-neighbors, machine learning, neural network, prediction model, regression tree, support vector machine

Procedia PDF Downloads 108
445 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find an efficient way to predict the yield of corn based on meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with the environment as dynamical systems. But the calibration of the dynamic system is difficult, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of dataset (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process, but it has some strict requirements on the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods: methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-fold cross-validation process and two accuracy metrics, root mean square error of prediction (RMSEP) and mean absolute error of prediction (MAEP), were used to evaluate the crop prediction capacity.
The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method for calibrating the mechanistic model from easily accessible datasets offers several side perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.
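
The evaluation protocol described in this abstract (5-fold cross-validation scored with RMSEP and MAEP) can be sketched as below. The abstract reports MAEP as a percentage but does not say how it is normalised, so the functions here return plain absolute errors; the fold assignment and the sample values are illustrative assumptions.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def maep(y_true, y_pred):
    """Mean absolute error of prediction (not normalised to a percentage)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n, k=5):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


# Hypothetical observed and predicted yields.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
print(rmsep(y_true, y_pred), maep(y_true, y_pred))

splits = list(k_fold_indices(10, k=5))
```

In practice the records would usually be shuffled before being split into folds; the interleaved assignment above only keeps the sketch deterministic.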

Keywords: crop yield prediction, crop model, sensitivity analysis, parameter estimation, particle swarm optimization, random forest

Procedia PDF Downloads 209
444 The Application of Video Segmentation Methods for the Purpose of Action Detection in Videos

Authors: Nassima Noufail, Sara Bouhali

Abstract:

In this work, we develop a semi-supervised solution for the purpose of action detection in videos and propose an efficient algorithm for video segmentation. The approach is divided into video segmentation, feature extraction, and classification. In the first part, a video is segmented into clips, and we used the K-means algorithm for this segmentation; our goal is to find groups based on similarity in the video. The application of k-means clustering into all the frames is time-consuming; therefore, we started by the identification of transition frames where the scene in the video changes significantly, and then we applied K-means clustering into these transition frames. We used two image filters, the gaussian filter and the Laplacian of Gaussian. Each filter extracts a set of features from the frames. The Gaussian filter blurs the image and omits the higher frequencies, and the Laplacian of gaussian detects regions of rapid intensity changes; we then used this vector of filter responses as an input to our k-means algorithm. The output is a set of cluster centers. Each video frame pixel is then mapped to the nearest cluster center and painted with a corresponding color to form a visual map. The resulting visual map had similar pixels grouped. We then computed a cluster score indicating how clusters are near each other and plotted a signal representing frame number vs. clustering score. Our hypothesis was that the evolution of the signal would not change if semantically related events were happening in the scene. We marked the breakpoints at which the root mean square level of the signal changes significantly, and each breakpoint is an indication of the beginning of a new video segment. In the second part, for each segment from part 1, we randomly selected a 16-frame clip, then we extracted spatiotemporal features using convolutional 3D network C3D for every 16 frames using a pre-trained model. 
The C3D final output is a 512-dimensional feature vector; hence, we used principal component analysis (PCA) for dimensionality reduction. The final part is the classification. The C3D feature vectors are used as input to a multi-class linear support vector machine (SVM) for training, and we used a multi-class classifier to detect the action. We evaluated our approach on the UCF101 dataset, which consists of 101 human action categories, and achieved an accuracy that outperforms the state of the art by 1.2%.
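The clustering step described in this abstract (K-means on filter responses, then mapping each item to its nearest cluster center) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the two-dimensional `responses` values are made up, and seeding the centers with the first `k` points is an assumption chosen only to keep the sketch deterministic.

```python
import math

def nearest_center(point, centers):
    """Return the index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centers (assumption)."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group every point with its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = []
        for i, group in enumerate(groups):
            if group:
                new_centers.append(tuple(sum(c) / len(group) for c in zip(*group)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers

# Hypothetical filter responses (e.g. one Gaussian and one LoG value per pixel).
responses = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.1), (1.0, 0.9), (0.0, 0.2), (0.8, 0.8)]
centers = kmeans(responses, k=2)
labels = [nearest_center(r, centers) for r in responses]
```

Each entry of `labels` plays the role of the color index that the abstract paints into its visual map.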

Keywords: video segmentation, action detection, classification, Kmeans, C3D

Procedia PDF Downloads 51
443 Reduction of False Positives in Head-Shoulder Detection Based on Multi-Part Color Segmentation

Authors: Lae-Jeong Park

Abstract:

The paper presents a method that uses figure-ground color segmentation to extract an effective global feature for reducing false positives in head-shoulder detection. Conventional detectors that rely on local features such as HOG for the sake of real-time operation suffer from false positives. The color cue in an input image provides salient information on a global characteristic, which is necessary to alleviate the false positives of local-feature-based detectors. An effective approach that uses figure-ground color segmentation has previously been presented in an effort to reduce false positives in object detection. In this paper, an extended version of the approach is presented that adopts separate multipart foregrounds instead of a single prior foreground and performs the figure-ground color segmentation with each of the foregrounds. The multipart foregrounds include the parts of the head-shoulder shape and additional auxiliary foregrounds optimized by a search algorithm. A classifier is constructed with a feature that consists of the set of resulting segmentations. Experimental results show that the presented method can discriminate more false positives than the single-prior-shape-based classifier as well as detectors with local features. The improvement is possible because the presented approach can reduce the false positives that have the same colors in the head and shoulder foregrounds.

Keywords: pedestrian detection, color segmentation, false positive, feature extraction

Procedia PDF Downloads 257
442 Design and Implementation of Generative Models for Odor Classification Using Electronic Nose

Authors: Kumar Shashvat, Amol P. Bhondekar

Abstract:

Among the five senses, odor is the most evocative and least understood. Odor testing has been mysterious, and odor data legendary, to most practitioners. Recognizing and classifying odor is therefore an important problem. It concerns the ability to smell an item and predict whether it is still usable or has become undesirable for consumption; casting this problem into a model is the subject of this work. The general industrial standard for this classification is color based; however, odor can be a better classifier than color and, if incorporated in a machine, would be very useful. Various discriminative approaches have been used to classify the odor of peas, trees and cashews. Discriminative approaches offer good predictive performance and have been widely used in many applications, but they cannot make effective use of unlabeled information. In such scenarios, generative approaches have better applicability, as they are able to handle problems such as settings where the variability in the range of possible input vectors is enormous. Generative models are used in machine learning either to model data directly or as an intermediate step in forming a conditional probability density function. Linear Discriminant Analysis and the Naive Bayes classifier have been used to classify the odor of cashews. Linear Discriminant Analysis is a method used in data classification, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The Naive Bayes algorithm is a classification approach based on Bayes' rule and a set of conditional independence assumptions. Naive Bayes classifiers are highly scalable, requiring a number of parameters linear in the number of variables (features/predictors) in a learning problem.
The main advantage of generative models is that they make stronger assumptions about the data, specifically about the distribution of the predictors given the response variables. The electronic instrument used for artificial odor sensing and classification is an electronic nose. This device is designed to imitate the human sense of smell by providing an analysis of individual chemicals or chemical mixtures. The experimental results were evaluated using the performance measures accuracy, precision, and recall. The results show that Linear Discriminant Analysis performed better overall than the Naive Bayes classifier on the cashew dataset.

Keywords: odor classification, generative models, naive bayes, linear discriminant analysis

Procedia PDF Downloads 357
441 [Keynote Talk]: sEMG Interface Design for Locomotion Identification

Authors: Rohit Gupta, Ravinder Agarwal

Abstract:

Surface electromyographic (sEMG) signals have the potential to identify human activities and intention. This potential is further exploited to control artificial limbs using the sEMG signal from the residual limbs of amputees. The paper deals with the development of a multichannel, cost-efficient sEMG signal interface for research applications, along with the evaluation of a proposed class-dependent statistical approach to feature selection. The sEMG signal acquisition interface was developed using the ADS1298 from Texas Instruments, which is a front-end interface integrated circuit for ECG applications. Further, the sEMG signal is recorded from two lower limb muscles for three locomotion modes, namely Plane Walk (PW), Stair Ascending (SA), and Stair Descending (SD). A class-dependent statistical approach is proposed for feature selection, and its performance is compared with 12 preexisting feature vectors. To make the study more extensive, the performance of five different types of classifiers is compared. The outcome of this work demonstrates the suitability of the proposed feature selection algorithm for locomotion recognition, as compared to other existing feature vectors. The SVM classifier outperformed the other classifiers compared, with an average recognition accuracy of 97.40%. Feature vector selection emerges as the most dominant factor affecting classification performance, as it accounts for 51.51% of the total variance in classification accuracy. The results demonstrate the potential of the developed sEMG signal acquisition interface along with the proposed feature selection algorithm.

Keywords: classifiers, feature selection, locomotion, sEMG

Procedia PDF Downloads 269
440 Predictive Analysis of Chest X-rays Using NLP and Large Language Models with the Indiana University Dataset and Random Forest Classifier

Authors: Azita Ramezani, Ghazal Mashhadiagha, Bahareh Sanabakhsh

Abstract:

This study investigates the combination of Random Forest classifiers with large language models (LLMs) and natural language processing (NLP) to improve diagnostic accuracy in chest X-ray analysis using the Indiana University dataset. Utilizing advanced NLP techniques, the research preprocesses textual data from radiological reports to extract key features, which are then merged with image-derived data. This improved dataset is analyzed with Random Forest classifiers to predict specific clinical results, focusing on the identification of health issues and the estimation of case urgency. The findings reveal that the combination of NLP, LLMs, and machine learning not only increases diagnostic precision but also reliability, especially in quickly identifying critical conditions. Achieving an accuracy of 99.35%, the model shows significant advancements over conventional diagnostic techniques. The results emphasize the large potential of machine learning in medical imaging, suggesting that these technologies could greatly enhance clinician judgment and patient outcomes by offering quicker and more precise diagnostic approximations.

Keywords: natural language processing (NLP), large language models (LLMs), random forest classifier, chest x-ray analysis, medical imaging, diagnostic accuracy, indiana university dataset, machine learning in healthcare, predictive modeling, clinical decision support systems

Procedia PDF Downloads 12
439 Identification of Damage Mechanisms in Interlock Reinforced Composites Using a Pattern Recognition Approach of Acoustic Emission Data

Authors: M. Kharrat, G. Moreau, Z. Aboura

Abstract:

The latest advances in the weaving industry, combined with increasingly sophisticated means of materials processing, have made it possible to produce complex 3D composite structures. Mainly used in aeronautics, composite materials with 3D architecture offer better mechanical properties than 2D reinforced composites. Nevertheless, these materials require a good understanding of their behavior. Because of the complexity of such materials, the damage mechanisms are multiple, and the scenario of their appearance and evolution depends on the nature of the exerted solicitations. The AE technique is a well-established tool for discriminating between the damage mechanisms. Suitable sensors are used during the mechanical test to monitor the structural health of the material. Relevant AE-features are then extracted from the recorded signals, followed by a data analysis using pattern recognition techniques. In order to better understand the damage scenarios of interlock composite materials, a multi-instrumentation was set-up in this work for tracking damage initiation and development, especially in the vicinity of the first significant damage, called macro-damage. The deployed instrumentation includes video-microscopy, Digital Image Correlation, Acoustic Emission (AE) and micro-tomography. In this study, a multi-variable AE data analysis approach was developed for the discrimination between the different signal classes representing the different emission sources during testing. An unsupervised classification technique was adopted to perform AE data clustering without a priori knowledge. The multi-instrumentation and the clustered data served to label the different signal families and to build a learning database. This latter is useful to construct a supervised classifier that can be used for automatic recognition of the AE signals. Several materials with different ingredients were tested under various solicitations in order to feed and enrich the learning database. 
The methodology presented in this work was useful to refine the damage threshold for the new generation materials. The damage mechanisms around this threshold were highlighted. The obtained signal classes were assigned to the different mechanisms. The isolation of a 'noise' class makes it possible to discriminate between the signals emitted by damages without resorting to spatial filtering or increasing the AE detection threshold. The approach was validated on different material configurations. For the same material and the same type of solicitation, the identified classes are reproducible and little disturbed. The supervised classifier constructed based on the learning database was able to predict the labels of the classified signals.

Keywords: acoustic emission, classifier, damage mechanisms, first damage threshold, interlock composite materials, pattern recognition

Procedia PDF Downloads 135
438 Milk Yield and Fingerprinting of Beta-Casein Precursor (CSN2) Gene in Some Saudi Camel Breeds

Authors: Amr A. El Hanafy, Yasser M. Saad, Saleh A. Alkarim, Hussein A. Almehdar, Elrashdy M. Redwan

Abstract:

Camels are substantial providers of transport, milk, sport, meat, shelter, fuel, security and capital in many countries, particularly Saudi Arabia. Identification of animal breeds has progressed rapidly during the last decade. Advanced molecular techniques are playing a significant role in breeding or strain protection laws. In addition, fingerprinting of molecular markers related to productive traits in farm animals is, to our knowledge, among the most important lines of study; it aims to conserve these local genetic resources and to improve such local breeds genetically through selection programs based on gene markers. Milk records were taken on two days each week from female camels of the Majahem, Safara, Wathaha, and Hamara breeds from different private farms in northern Jeddah, Riyadh and Alwagh governorates, and average weekly yields were calculated. DNA sequencing of the CSN2 gene was used to evaluate the genetic variations and calculate the genetic distance values among four Saudi camel populations: Hamra (R), Safra (Y), Wadha (W) and Majaheim (M). In addition, this marker was analyzed to reconstruct the Neighbor Joining tree among the evaluated camel breeds. With respect to milk yield during the winter season, the results indicated that the average weekly milk yield of the Safara camel breed (30.05 kg/week) is significantly (p < 0.05) lower than that of the other three breeds, which ranged from 39.68 kg/week for Hamara to 42.42 kg/week for Majahem, while there are no significant differences among these three breeds. The Neighbor Joining analysis reconstructed from DNA variations showed that the samples cluster into two unique clades. The first clade includes Y (from Y4 to Y18) and M (from M1 to M9). The second clade includes all R (from R1 to R6) and W (from W1 to W6). The genetic distance values were 0.0068 (between the groups M&Y and R&W) and 0 (within each group).

Keywords: milk yield, beta-casein precursor (CSN2), Saudi camel, molecular markers

Procedia PDF Downloads 193
437 Assessing India’s Foreign Policy Towards Afghanistan

Authors: Saifurahman Fayiz

Abstract:

Afghanistan and India have close technical, political, economic, and diplomatic bilateral ties. These ties are not limited to the governments of the two countries; they also extend to their peoples. India is the most trustworthy regional partner and the biggest donor for the development of Afghanistan. The objective of this study is to assess India’s foreign policy towards Afghanistan since 9/11. The research is based on a qualitative, descriptive method. The research findings suggest that India should manage and build up its strategic relations with neighboring countries.

Keywords: strategy, policy, India, Afghanistan

Procedia PDF Downloads 306
436 Automatic Detection of Traffic Stop Locations Using GPS Data

Authors: Areej Salaymeh, Loren Schwiebert, Stephen Remias, Jonathan Waddell

Abstract:

Extracting information from new data sources has emerged as a crucial task in many traffic planning processes, such as identifying traffic patterns, route planning, traffic forecasting, and locating infrastructure improvements. Given the advanced technologies used to collect Global Positioning System (GPS) data from dedicated GPS devices, GPS-equipped phones, and navigation tools, intelligent data analysis methodologies are necessary to mine this raw data. In this research, an automatic detection framework is proposed to help identify and classify the locations of stopped GPS waypoints into two main categories: signalized intersections or highway congestion. The Delaunay triangulation is used to perform this assessment in the clustering phase. While most existing clustering algorithms need assumptions about the data distribution, the effectiveness of the Delaunay triangulation relies on triangulating geographical data points without such assumptions. Our proposed method starts by cleaning noise from the data and normalizing it. Next, the framework identifies stoppage points by calculating the traveled distance. Clustering is then used to form groups of waypoints for signalized traffic and highway congestion. Finally, a binary classifier is applied to distinguish highway congestion from signalized stop points. The binary classifier uses the length of the cluster to find congestion. The proposed framework identifies the stop positions and congestion points correctly in around 99.2% of trials. We show that it is possible, using limited GPS data, to distinguish between the two categories with high accuracy.
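The stoppage-detection and cluster-length steps in this abstract can be illustrated with a short sketch. It is not the authors' framework: grouping consecutive slow waypoints stands in for their Delaunay-triangulation clustering, and the thresholds `max_step_m` and `max_signal_length_m`, as well as the sample coordinates, are hypothetical values chosen for the example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def stopped_runs(track, max_step_m=2.0):
    """Group consecutive waypoints whose traveled distance stays below max_step_m."""
    runs, current = [], []
    for prev, cur in zip(track, track[1:]):
        if haversine_m(prev, cur) < max_step_m:
            if not current:
                current.append(prev)  # first point of a new stopped run
            current.append(cur)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs

def cluster_length_m(points):
    """Largest distance between any two points of a cluster."""
    return max((haversine_m(p, q) for p in points for q in points), default=0.0)

def classify_cluster(points, max_signal_length_m=200.0):
    """Binary rule on cluster length; the 200 m cut-off is an assumed value."""
    if cluster_length_m(points) > max_signal_length_m:
        return "highway congestion"
    return "signalized intersection"

# Hypothetical track: the vehicle moves, pauses for two samples, then moves on.
track = [(42.0000, -83.0), (42.0010, -83.0), (42.00101, -83.0),
         (42.00102, -83.0), (42.0020, -83.0)]
runs = stopped_runs(track)
```

A short cluster of stop points is labeled as a signalized intersection and a long one as congestion, which mirrors the length-based rule the abstract mentions.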

Keywords: Delaunay triangulation, clustering, intelligent transportation systems, GPS data

Procedia PDF Downloads 253
435 Neighbor Caring Environment System (NCE) Using Parallel Replication Mechanism

Authors: Ahmad Shukri Mohd Noor, Emma Ahmad Sirajudin, Rabiei Mamat

Abstract:

For a particular Marine interest, the process of data sampling can take years before a study can be concluded. Therefore, the need for a robust backup system for the data is invariably implicit. In recent Marine applications, more functionalities and tools are integrated to assist the work of researchers. It is anticipated that this trend will continue as research scope widens and intensifies, while at the same time following current technologies and lifestyles. The convenience of collecting and sharing information these days also applies to work in Marine research. Therefore, Marine system designers should be aware that high availability, as well as a robust backup system for the data, is a necessary attribute of Marine repository applications. In this paper, the approach to high availability relates to both hardware and software, but the focus is on software. We consider a NABTIC repository system that was initially built on a single server and does not have replicated components. First, the system is decomposed into separate modules. The modules are placed on multiple servers to create a distributed system. Redundancy is added by placing copies of the modules on different servers using the Neighbor Caring Environment System (NCES) technique. NCES uses a parallel replication mechanism. Background monitoring is established to check the servers’ heartbeats and confirm that they are alive. At the same time, a critical adaptive threshold is maintained to make sure a failure is detected in a timely manner using Adaptive Fault Detection (AFD). A confirmed failure sets the recovery mode, in which a selection process is carried out before a fail-over server is instructed. In effect, the Marine repository service continues as the fail-over masks a recent failure. The performance of the new prototype was tested and confirmed to be more highly available.
Furthermore, the downtime is not noticeable as service is immediately restored automatically. The Marine repository system is said to have achieved fault tolerance.
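The heartbeat monitoring with an adaptive threshold mentioned in this abstract might look roughly like the sketch below. The abstract does not give the AFD formula, so the rule used here (mean of recent heartbeat intervals plus a multiple of their standard deviation) is an assumption, and the class name `HeartbeatMonitor` and its parameters are hypothetical.

```python
import statistics
from collections import deque

class HeartbeatMonitor:
    """Suspects a server when its silence exceeds an adaptive timeout (assumed rule)."""

    def __init__(self, window=5, safety_factor=3.0, initial_timeout=2.0):
        self.intervals = deque(maxlen=window)  # most recent inter-arrival times
        self.safety_factor = safety_factor
        self.initial_timeout = initial_timeout
        self.last_beat = None

    def record(self, timestamp):
        """Register a heartbeat received at `timestamp` (seconds)."""
        if self.last_beat is not None:
            self.intervals.append(timestamp - self.last_beat)
        self.last_beat = timestamp

    def timeout(self):
        """Adaptive threshold: mean interval plus safety_factor times its spread."""
        if len(self.intervals) < 2:
            return self.initial_timeout
        values = list(self.intervals)
        return statistics.mean(values) + self.safety_factor * statistics.pstdev(values)

    def is_suspected(self, now):
        """True when no heartbeat has arrived within the current timeout."""
        if self.last_beat is None:
            return False
        return (now - self.last_beat) > self.timeout()

# Hypothetical heartbeat arrival times in seconds.
monitor = HeartbeatMonitor()
for t in (0.0, 1.0, 2.1, 3.0):
    monitor.record(t)
```

With these arrival times the timeout settles a little above one second, so a check at 4.0 s does not raise suspicion while a check at 4.5 s does. A production detector would also need a selection step for the fail-over server, which this sketch leaves out.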

Keywords: availability, fault detection, replication, fault tolerance, marine application

Procedia PDF Downloads 294
434 Modeling Engagement with Multimodal Multisensor Data: The Continuous Performance Test as an Objective Tool to Track Flow

Authors: Mohammad H. Taheri, David J. Brown, Nasser Sherkat

Abstract:

Engagement is one of the most important factors in determining successful outcomes and deep learning in students. Existing approaches to detect student engagement involve periodic human observations that are subject to inter-rater reliability. Our solution uses real-time multimodal multisensor data labeled by objective performance outcomes to infer the engagement of students. The study involves four students with a combined diagnosis of cerebral palsy and a learning disability who took part in a 3-month trial over 59 sessions. Multimodal multisensor data were collected while they participated in a continuous performance test. Eye gaze, electroencephalogram, body pose, and interaction data were used to create a model of student engagement through objective labeling from the continuous performance test outcomes. In order to achieve this, a type of continuous performance test is introduced, the Seek-X type. Nine features were extracted including high-level handpicked compound features. Using leave-one-out cross-validation, a series of different machine learning approaches were evaluated. Overall, the random forest classification approach achieved the best classification results. Using random forest, 93.3% classification for engagement and 42.9% accuracy for disengagement were achieved. We compared these results to outcomes from different models: AdaBoost, decision tree, k-Nearest Neighbor, naïve Bayes, neural network, and support vector machine. We showed that using a multisensor approach achieved higher accuracy than using features from any reduced set of sensors. We found that using high-level handpicked features can improve the classification accuracy in every sensor mode. Our approach is robust to both sensor fallout and occlusions. The single most important sensor feature to the classification of engagement and distraction was shown to be eye gaze. 
It has been shown that we can accurately predict the level of engagement of students with learning disabilities in a real-time approach that is not subject to inter-rater reliability, human observation or reliant on a single mode of sensor input. This will help teachers design interventions for a heterogeneous group of students, where teachers cannot possibly attend to each of their individual needs. Our approach can be used to identify those with the greatest learning challenges so that all students are supported to reach their full potential.
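The evaluation protocol named in this abstract, leave-one-out cross-validation, can be sketched with k-Nearest Neighbor, one of the models the authors compared. The feature vectors and labels below are invented for illustration, and `k=3` is an assumed setting, not a value reported in the abstract.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority label among the k training samples closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(samples, k=3):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(samples)

# Hypothetical two-feature samples (e.g. normalized gaze and EEG features).
samples = [
    ((0.0, 0.0), "engaged"), ((0.1, 0.2), "engaged"),
    ((0.2, 0.1), "engaged"), ((0.1, 0.1), "engaged"),
    ((1.0, 1.0), "disengaged"), ((0.9, 1.1), "disengaged"),
    ((1.1, 0.9), "disengaged"), ((1.0, 0.9), "disengaged"),
]
accuracy = leave_one_out_accuracy(samples, k=3)
```

Leave-one-out is a natural choice for a small cohort such as the four students described here, because every sample is used for testing exactly once.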

Keywords: affective computing in education, affect detection, continuous performance test, engagement, flow, HCI, interaction, learning disabilities, machine learning, multimodal, multisensor, physiological sensors, student engagement

Procedia PDF Downloads 72
433 India’s Strategy toward Afghanistan since 9\11

Authors: Saifurahman Fayiz

Abstract:

Overall, India had friendly relations with the different governments in Afghanistan, except for the Taliban regime between 1996 and 2001. The terrorist attacks in the United States gave India a chance to pursue its strategy in Afghanistan. India has supported Afghanistan since 9/11. The objective of this study is to examine India’s strategy towards Afghanistan and its implications for neighboring countries. The research is based on a qualitative, descriptive method. The research findings suggest that India has chosen a soft power policy to implement its strategy in Afghanistan.

Keywords: strategy, policy, soft power, Afghanistan

Procedia PDF Downloads 232
432 Radar on Bike: Coarse Classification based on Multi-Level Clustering for Cyclist Safety Enhancement

Authors: Asma Omri, Noureddine Benothman, Sofiane Sayahi, Fethi Tlili, Hichem Besbes

Abstract:

Cycling, a popular mode of transportation, can also be perilous due to cyclists' vulnerability to collisions with vehicles and obstacles. This paper presents an innovative cyclist safety system based on radar technology designed to offer real-time collision risk warnings to cyclists. The system incorporates a low-power radar sensor affixed to the bicycle and connected to a microcontroller. It leverages radar point cloud detections, a clustering algorithm, and a supervised classifier. These algorithms are optimized for efficiency to run on the TI’s AWR 1843 BOOST radar, utilizing a coarse classification approach distinguishing between cars, trucks, two-wheeled vehicles, and other objects. To enhance the performance of clustering techniques, we propose a 2-Level clustering approach. This approach builds on the state-of-the-art Density-based spatial clustering of applications with noise (DBSCAN). The objective is to first cluster objects based on their velocity, then refine the analysis by clustering based on position. The initial level identifies groups of objects with similar velocities and movement patterns. The subsequent level refines the analysis by considering the spatial distribution of these objects. The clusters obtained from the first level serve as input for the second level of clustering. Our proposed technique surpasses the classical DBSCAN algorithm in terms of geometrical metrics, including homogeneity, completeness, and V-score. Relevant cluster features are extracted and utilized to classify objects using an SVM classifier. Potential obstacles are identified based on their velocity and proximity to the cyclist. To optimize the system, we used the View of Delft dataset for hyperparameter selection and SVM classifier training. The system's performance was assessed using our collected dataset of radar point clouds synchronized with a camera on an Nvidia Jetson Nano board. 
The radar-based cyclist safety system is a practical solution that can be easily installed on any bicycle and connected to smartphones or other devices, offering real-time feedback and navigation assistance to cyclists. We conducted experiments to validate the system's feasibility, achieving an impressive 85% accuracy in the classification task. This system has the potential to significantly reduce the number of accidents involving cyclists and enhance their safety on the road.
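The 2-level clustering idea in this abstract (cluster by velocity first, then refine each velocity group by position) can be sketched as below. This is a plain-Python illustration with a minimal DBSCAN written for the example, not the authors' optimized radar code; the detections and the `vel_eps`, `pos_eps`, and `min_pts` values are assumptions.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels

def two_level_clusters(detections, vel_eps, pos_eps, min_pts):
    """detections: (x, y, velocity) tuples. Returns clusters as lists of indices."""
    vel_labels = dbscan([(d[2],) for d in detections], vel_eps, min_pts)
    clusters = []
    for v in sorted(set(vel_labels) - {-1}):
        idx = [i for i, lab in enumerate(vel_labels) if lab == v]
        pos_labels = dbscan([detections[i][:2] for i in idx], pos_eps, min_pts)
        for p in sorted(set(pos_labels) - {-1}):
            clusters.append([idx[n] for n, lab in enumerate(pos_labels) if lab == p])
    return clusters

# Hypothetical radar detections: (x [m], y [m], radial velocity [m/s]).
detections = [
    (0.0, 0.0, 5.0), (0.5, 0.0, 5.1), (0.0, 0.5, 4.9),        # object A
    (10.0, 10.0, 5.0), (10.5, 10.0, 5.1), (10.0, 10.5, 5.0),  # object B
    (0.2, 0.2, -3.0), (0.3, 0.1, -3.1), (0.1, 0.3, -2.9),     # object C
]
clusters = two_level_clusters(detections, vel_eps=0.5, pos_eps=1.0, min_pts=2)
```

In this made-up scene, objects A and C overlap in space, so position-only clustering with the same `pos_eps` would merge them; splitting by velocity first keeps them apart.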

Keywords: 2-level clustering, coarse classification, cyclist safety, warning system based on radar technology

Procedia PDF Downloads 55
431 Indian Premier League (IPL) Score Prediction: Comparative Analysis of Machine Learning Models

Authors: Rohini Hariharan, Yazhini R, Bhamidipati Naga Shrikarti

Abstract:

In the realm of cricket, particularly within the context of the Indian Premier League (IPL), the ability to predict team scores accurately holds significant importance for both cricket enthusiasts and stakeholders alike. This paper presents a comprehensive study on IPL score prediction utilizing various machine learning algorithms, including Support Vector Machines (SVM), XGBoost, Multiple Regression, Linear Regression, K-nearest neighbors (KNN), and Random Forest. Through meticulous data preprocessing, feature engineering, and model selection, we aimed to develop a robust predictive framework capable of forecasting team scores with high precision. Our experimentation involved the analysis of historical IPL match data encompassing diverse match and player statistics. Leveraging this data, we employed state-of-the-art machine learning techniques to train and evaluate the performance of each model. Notably, Multiple Regression emerged as the top-performing algorithm, achieving an impressive accuracy of 77.19% and a precision of 54.05% (within a threshold of +/- 10 runs). This research contributes to the advancement of sports analytics by demonstrating the efficacy of machine learning in predicting IPL team scores. The findings underscore the potential of advanced predictive modeling techniques to provide valuable insights for cricket enthusiasts, team management, and betting agencies. Additionally, this study serves as a benchmark for future research endeavors aimed at enhancing the accuracy and interpretability of IPL score prediction models.

Keywords: indian premier league (IPL), cricket, score prediction, machine learning, support vector machines (SVM), xgboost, multiple regression, linear regression, k-nearest neighbors (KNN), random forest, sports analytics

Procedia PDF Downloads 24
430 An Ensemble System of Classifiers for Computer-Aided Volcano Monitoring

Authors: Flavio Cannavo

Abstract:

Continuous evaluation of the status of potentially hazardous volcanoes plays a key role for civil protection purposes. Monitoring volcanic activity, especially energetic paroxysms that usually come with tephra emissions, is crucial not only because of the exposure of the local population but also for airline traffic. Presently, real-time surveillance of most volcanoes worldwide is essentially delegated to one or more human experts in volcanology, who interpret data coming from different kinds of monitoring networks. Unfortunately, the high nonlinearity of the complex and coupled volcanic dynamics leads to a large variety of different volcanic behaviors. Moreover, continuously measured parameters (e.g. seismic, deformation, infrasonic and geochemical signals) are often not able to fully explain the ongoing phenomenon, thus making fast volcano state assessment a very puzzling task for the personnel on duty in the control rooms. With the aim of aiding the personnel on duty in volcano surveillance, here we introduce a system based on an ensemble of data-driven classifiers to infer automatically the ongoing volcano status from all the different kinds of measurements available. The system consists of a heterogeneous set of independent classifiers, each one built with its own data and algorithm. Each classifier gives an output about the volcanic status. The ensemble technique weights the single classifier outputs to combine all the classifications into a single status that maximizes the performance. We tested the model on the Mt. Etna (Italy) case study by considering a long record of multivariate data from 2011 to 2015 and cross-validated it. Results indicate that the proposed model is effective and of great power for decision-making purposes.
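The weighting of individual classifier outputs described in this abstract can be illustrated with a simple weighted vote. This is only a sketch of the general idea: the abstract does not state how the weights are derived or combined, and the classifier names, status labels, and weights below are hypothetical.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-classifier status labels into one status by summed weight.

    predictions: {classifier_name: predicted_status}
    weights:     {classifier_name: non-negative weight, e.g. validation accuracy}
    """
    scores = defaultdict(float)
    for name, status in predictions.items():
        scores[status] += weights[name]
    return max(scores, key=scores.get)

# Hypothetical outputs of three independent classifiers.
predictions = {"seismic": "paroxysm", "deformation": "quiet", "infrasonic": "paroxysm"}
weights = {"seismic": 0.5, "deformation": 0.9, "infrasonic": 0.6}
status = weighted_vote(predictions, weights)
```

Here the two weaker classifiers together outweigh the single stronger one (1.1 against 0.9), which shows why the choice of weights matters for the combined status.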

Keywords: Bayesian networks, expert system, mount Etna, volcano monitoring

Procedia PDF Downloads 221
429 Comparison of the Effectiveness of Tree Algorithms in Classification of Spongy Tissue Texture

Authors: Roza Dzierzak, Waldemar Wojcik, Piotr Kacejko

Abstract:

Analysis of the texture of medical images consists of determining the parameters and characteristics of the examined tissue. The main goal is to assign the analyzed area to one of two basic groups: healthy tissue or tissue with pathological changes. CT images of the thoracic lumbar spine from 15 healthy patients and 15 with confirmed osteoporosis were used for the analysis. As a result, 120 samples with dimensions of 50x50 pixels were obtained. The set of features was obtained based on the histogram, gradient, run-length matrix, co-occurrence matrix, autoregressive model, and Haar wavelet. As a result of the image analysis, 290 descriptors of textural features were obtained. The dimension of the feature space was reduced by three selection methods: Fisher coefficient (FC), mutual information (MI), and minimization of the classification error probability combined with the average correlation coefficients between the chosen features (POE + ACC). Each of them returned the ten top-ranked features according to its own coefficient. The Fisher coefficient and mutual information selections returned the same features arranged in a different order. In both rankings, the 50% percentile (Perc.50%) was found in first place. The next selected features come from the co-occurrence matrix. The feature sets chosen in the selection process were evaluated using six classification tree methods. These were: decision stump (DS), Hoeffding tree (HT), logistic model trees (LMT), random forest (RF), random tree (RT) and reduced error pruning tree (REPT).
In order to assess the accuracy of the classifiers, the following parameters were used: overall classification accuracy (ACC), true positive rate (TPR, classification sensitivity), true negative rate (TNR, classification specificity), positive predictive value (PPV) and negative predictive value (NPV). Taking into account the classification results, the best results were obtained for the Hoeffding tree and logistic model trees classifiers, using the set of features selected by the POE + ACC method. In the case of the Hoeffding tree classifier, the highest values of three parameters were obtained: ACC = 90%, TPR = 93.3% and PPV = 93.3%. Additionally, the values of the other two parameters, i.e., TNR = 86.7% and NPV = 86.6%, were close to the maximum values obtained for the LMT classifier. In the case of the logistic model trees classifier, the same ACC value was obtained (ACC = 90%), along with the highest values for TNR = 88.3% and NPV = 88.3%. The values of the other two parameters remained at a level close to the highest: TPR = 91.7% and PPV = 91.6%. The results obtained in the experiment show that the use of classification trees is an effective method of classifying texture features. This allows the condition of the spongy tissue to be identified for healthy cases and those with osteoporosis.
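The five evaluation parameters listed in this abstract follow directly from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the counts passed in are made-up numbers for illustration and do not reproduce the study's results.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: pathological samples classified correctly/incorrectly
    tn/fp: healthy samples classified correctly/incorrectly
    """
    return {
        "ACC": (tp + tn) / (tp + fn + tn + fp),  # overall accuracy
        "TPR": tp / (tp + fn),                   # sensitivity
        "TNR": tn / (tn + fp),                   # specificity
        "PPV": tp / (tp + fp),                   # positive predictive value
        "NPV": tn / (tn + fn),                   # negative predictive value
    }

# Hypothetical counts: 10 pathological and 10 healthy samples.
metrics = binary_metrics(tp=9, fn=1, tn=8, fp=2)
```

Note that TPR and TNR are computed per true class, while PPV and NPV are computed per predicted class, so the two pairs can differ even when the classes are balanced.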

Keywords: classification, feature selection, texture analysis, tree algorithms

Procedia PDF Downloads 147
428 Strategies for Synchronizing Chocolate Conching Data Using Dynamic Time Warping

Authors: Fernanda A. P. Peres, Thiago N. Peres, Flavio S. Fogliatto, Michel J. Anzanello

Abstract:

Batch processes are widely used in the food industry and play an important role in the production of high added value products, such as chocolate. Process performance is usually described by variables that are monitored as the batch progresses. Data arising from these processes are likely to display a strong correlation-autocorrelation structure, and are usually monitored using control charts based on multiway principal components analysis (MPCA). Process control of a new batch is carried out by comparing the trajectories of its relevant process variables with those in a reference set of batches that yielded products within specifications; proper determination of the reference set is therefore key to correctly signaling non-conforming batches in such quality control schemes. In chocolate manufacturing, misclassification of non-conforming batches in the conching phase may lead to significant financial losses. In this context, the accuracy of process control grows in relevance. In addition, the main assumption in MPCA-based monitoring strategies is that all batches are synchronized in duration, both the new batch being monitored and those in the reference set. This assumption is often not satisfied in the chocolate manufacturing process. As a consequence, traditional techniques such as MPCA-based charts are not suitable for process control and monitoring. To address that issue, the objective of this work is to compare the performance of three dynamic time warping (DTW) methods in the alignment and synchronization of the trajectories of chocolate conching process variables, with the aim of properly determining the reference distribution for multivariate statistical process control. The power to classify batches into two categories (conforming and non-conforming) was evaluated using the k-nearest neighbor (KNN) algorithm.
Real data from a milk chocolate conching process were collected, and the following variables were monitored over time: frequency of soybean lecithin dosage, rotation speed of the shovels, current of the main motor of the conche, and chocolate temperature. A set of 62 batches with durations between 495 and 1,170 minutes was considered; 53% of the batches were known to be conforming based on lab test results and experts' evaluations. Results showed that all three DTW methods tested were able to align and synchronize the conching dataset. However, the synchronized datasets obtained from these methods performed differently when used as input to the KNN classification algorithm. The method of Kassidas, MacGregor and Taylor (KMT) was deemed the best DTW method for aligning and synchronizing the milk chocolate conching dataset and the best option for determining its reference set, with 93.7% accuracy, 97.2% sensitivity and 90.3% specificity in batch classification. This method was recommended because it required the lowest number of iterations to achieve convergence and gave the highest average accuracy in the testing portion using the KNN classification technique.
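The general idea of synchronizing batches of different duration and then classifying them with KNN can be sketched as follows. This is a minimal illustration using textbook DTW on a single variable, not the KMT method or any of the three methods compared in the study; the helper names (`dtw_path`, `synchronize`, `knn_predict`), the absolute-difference local cost, and the toy trajectories are all assumptions made for the example.

```python
import math
from collections import Counter


def dtw_path(ref, query):
    """Textbook DTW between two 1-D sequences: returns (total cost, warping path)."""
    n, m = len(ref), len(query)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - query[j - 1])  # assumed local cost
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the last cell to recover the alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def synchronize(ref, query):
    """Warp `query` onto the time axis of `ref`, averaging points mapped to the same index."""
    _, path = dtw_path(ref, query)
    buckets = [[] for _ in ref]
    for i, j in path:
        buckets[i].append(query[j])
    return [sum(b) / len(b) for b in buckets]


def knn_predict(train, labels, sample, k=3):
    """Majority vote among the k training trajectories closest in Euclidean distance."""
    nearest = sorted((math.dist(t, sample), lab) for t, lab in zip(train, labels))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


# Toy data: a reference trajectory and a longer batch with the same shape.
reference = [0, 1, 2, 3, 2, 1, 0]
longer_batch = [0, 0, 1, 2, 2, 3, 2, 1, 0]
aligned = synchronize(reference, longer_batch)  # now has len(reference) points

train = [[0, 1, 2, 3, 2, 1, 0], [0, 1, 2, 3, 2, 1, 0.5],
         [5, 5, 5, 5, 5, 5, 5], [5, 5, 5, 5, 5, 5, 6]]
labels = ["conforming", "conforming", "non-conforming", "non-conforming"]
print(knn_predict(train, labels, aligned, k=3))
```

Once every batch has been warped to the reference length, the trajectories become fixed-length vectors, which is what allows a distance-based classifier such as KNN to compare them directly.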

Keywords: batch process monitoring, chocolate conching, dynamic time warping, reference set distribution, variable duration

Procedia PDF Downloads 144
427 Assessing the Utility of Unmanned Aerial Vehicle-Borne Hyperspectral Image and Photogrammetry Derived 3D Data for Wetland Species Distribution Quick Mapping

Authors: Qiaosi Li, Frankie Kwan Kit Wong, Tung Fung

Abstract:

Lightweight unmanned aerial vehicles (UAVs) carrying novel sensors offer a low-cost approach to data acquisition in complex environments. This study established a framework for applying a UAV system to quick mapping of complex environments and assessed the performance of a UAV-based hyperspectral image and a digital surface model (DSM) derived from photogrammetric point clouds for classifying 13 species in the wetland area of the Mai Po Inner Deep Bay Ramsar Site, Hong Kong. The study area was part of a shallow bay with flat terrain, and the major species included reedbed and four mangroves: Kandelia obovata, Aegiceras corniculatum, Acrostichum aureum and Acanthus ilicifolius. Other species included various graminaceous plants, arbor, shrub and the invasive species Mikania micrantha. In particular, the invasive species climbed up to the mangrove canopy, causing damage and morphological change that might make the species harder to distinguish. Hyperspectral images were acquired by a Headwall Nano sensor with a spectral range from 400 nm to 1000 nm and 0.06 m spatial resolution. A sequence of multi-view RGB images was captured with 0.02 m spatial resolution and 75% overlap. The hyperspectral image was corrected for radiometric and geometric distortion, while the high-resolution RGB images were matched to generate maximally dense point clouds. Furthermore, a 5 cm grid digital surface model (DSM) was derived from the dense point clouds. Multiple feature reduction methods were compared to identify the most efficient method and to explore the spectral bands that are significant in distinguishing different species. The examined methods included stepwise discriminant analysis (DA), support vector machine (SVM) and minimum noise fraction (MNF) transformation.
Subsequently, spectral subsets composed of the 20 most important bands extracted by SVM, DA and MNF, and multi-source subsets adding the DSM to the 20 spectral bands, served as input to a maximum likelihood classifier (MLC) and an SVM classifier in order to compare the classification results. The classification results showed that the feature reduction methods, ranked from best to worst, were MNF transformation, DA and SVM. The accuracy with MNF transformation was even higher than that obtained with all bands as input. The selected bands frequently lay along the green peak, red edge and near infrared. Additionally, DA found that the chlorophyll absorption red band and the yellow band were also important for species classification. In terms of 3D data, the DSM enhanced the discriminant capacity among low plants, arbor and mangrove. Meanwhile, the DSM largely reduced misclassification due to the shadow effect and inter-species morphological variation. With respect to the classifier, the nonparametric SVM outperformed MLC for high-dimensional and multi-source data in this study. The SVM classifier tended to produce higher overall accuracy and fewer scattered patches, although it took more time than MLC. The best result was obtained by combining MNF components and the DSM in the SVM classifier. This study offered a precise species distribution survey solution for inaccessible wetland areas at low cost in time and labour. In addition, the findings on the positive effect of the DSM and on spectral feature identification indicate that UAV-borne hyperspectral and photogrammetry-derived 3D data are promising for further research on wetland species, such as bio-parameter modelling and biological invasion monitoring.

Keywords: digital surface model (DSM), feature reduction, hyperspectral, photogrammetric point cloud, species mapping, unmanned aerial vehicle (UAV)

Procedia PDF Downloads 232