Search results for: multiwavelenght dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1165

Search results for: multiwavelenght dataset

415 Burnout Recognition for Call Center Agents by Using Skin Color Detection with Hand Poses

Authors: El Sayed A. Sharara, A. Tsuji, K. Terada

Abstract:

Call centers have been expanding and they have influence on activation in various markets increasingly. A call center’s work is known as one of the most demanding and stressful jobs. In this paper, we propose the fatigue detection system in order to detect burnout of call center agents in the case of a neck pain and upper back pain. Our proposed system is based on the computer vision technique combined skin color detection with the Viola-Jones object detector. To recognize the gesture of hand poses caused by stress sign, the YCbCr color space is used to detect the skin color region including face and hand poses around the area related to neck ache and upper back pain. A cascade of clarifiers by Viola-Jones is used for face recognition to extract from the skin color region. The detection of hand poses is given by the evaluation of neck pain and upper back pain by using skin color detection and face recognition method. The system performance is evaluated using two groups of dataset created in the laboratory to simulate call center environment. Our call center agent burnout detection system has been implemented by using a web camera and has been processed by MATLAB. From the experimental results, our system achieved 96.3% for upper back pain detection and 94.2% for neck pain detection.

Keywords: call center agents, fatigue, skin color detection, face recognition

Procedia PDF Downloads 294
414 Quality Parameters of Offset Printing Wastewater

Authors: Kiurski S. Jelena, Kecić S. Vesna, Aksentijević M. Snežana

Abstract:

Samples of tap and wastewater were collected in three offset printing facilities in Novi Sad, Serbia. Ten physicochemical parameters were analyzed within all collected samples: pH, conductivity, m - alkalinity, p - alkalinity, acidity, carbonate concentration, hydrogen carbonate concentration, active oxygen content, chloride concentration and total alkali content. All measurements were conducted using the standard analytical and instrumental methods. Comparing the obtained results for tap water and wastewater, a clear quality difference was noticeable, since all physicochemical parameters were significantly higher within wastewater samples. The study also involves the application of simple linear regression analysis on the obtained dataset. By using software package ORIGIN 5 the pH value was mutually correlated with other physicochemical parameters. Based on the obtained values of Pearson coefficient of determination a strong positive correlation between chloride concentration and pH (r = -0.943), as well as between acidity and pH (r = -0.855) was determined. In addition, statistically significant difference was obtained only between acidity and chloride concentration with pH values, since the values of parameter F (247.634 and 182.536) were higher than Fcritical (5.59). In this way, results of statistical analysis highlighted the most influential parameter of water contamination in offset printing, in the form of acidity and chloride concentration. The results showed that variable dependence could be represented by the general regression model: y = a0 + a1x+ k, which further resulted with matching graphic regressions.

Keywords: pollution, printing industry, simple linear regression analysis, wastewater

Procedia PDF Downloads 235
413 A Bibliometric Analysis on Filter Bubble

Authors: Misbah Fatma, Anam Saiyeda

Abstract:

This analysis charts the introduction and expansion of research into the filter bubble phenomena over the last 10 years using a large dataset of academic publications. This bibliometric study demonstrates how interdisciplinary filter bubble research is. The identification of key authors and organizations leading the filter bubble study sheds information on collaborative networks and knowledge transfer. Relevant papers are organized based on themes including algorithmic bias, polarisation, social media, and ethical implications through a systematic examination of the literature. In order to shed light on how these patterns have changed over time, the study plots their historical history. The study also looks at how research is distributed globally, showing geographic patterns and discrepancies in scholarly output. The results of this bibliometric analysis let us fully comprehend the development and reach of filter bubble research. This study offers insights into the ongoing discussion surrounding information personalization and its implications for societal discourse, democratic participation, and the potential risks to an informed citizenry by exposing dominant themes, interdisciplinary collaborations, and geographic patterns. In order to solve the problems caused by filter bubbles and to advance a more diverse and inclusive information environment, this analysis is essential for scholars and researchers.

Keywords: bibliometric analysis, social media, social networking, algorithmic personalization, self-selection, content moderation policies and limited access to information, recommender system and polarization

Procedia PDF Downloads 118
412 Melanoma and Non-Melanoma, Skin Lesion Classification, Using a Deep Learning Model

Authors: Shaira L. Kee, Michael Aaron G. Sy, Myles Joshua T. Tan, Hezerul Abdul Karim, Nouar AlDahoul

Abstract:

Skin diseases are considered the fourth most common disease, with melanoma and non-melanoma skin cancer as the most common type of cancer in Caucasians. The alarming increase in Skin Cancer cases shows an urgent need for further research to improve diagnostic methods, as early diagnosis can significantly improve the 5-year survival rate. Machine Learning algorithms for image pattern analysis in diagnosing skin lesions can dramatically increase the accuracy rate of detection and decrease possible human errors. Several studies have shown the diagnostic performance of computer algorithms outperformed dermatologists. However, existing methods still need improvements to reduce diagnostic errors and generate efficient and accurate results. Our paper proposes an ensemble method to classify dermoscopic images into benign and malignant skin lesions. The experiments were conducted using the International Skin Imaging Collaboration (ISIC) image samples. The dataset contains 3,297 dermoscopic images with benign and malignant categories. The results show improvement in performance with an accuracy of 88% and an F1 score of 87%, outperforming other existing models such as support vector machine (SVM), Residual network (ResNet50), EfficientNetB0, EfficientNetB4, and VGG16.

Keywords: deep learning - VGG16 - efficientNet - CNN – ensemble – dermoscopic images - melanoma

Procedia PDF Downloads 81
411 When Sex Matters: A Comparative Generalized Structural Equation Model (GSEM) for the Determinants of Stunting Amongst Under-fives in Uganda

Authors: Vallence Ngabo M., Leonard Atuhaire, Peter Clever Rutayisire

Abstract:

The main aim of this study was to establish the differences in both the determinants of stunting and the causal mechanism through which the identified determinants influence stunting amongst male and female under-fives in Uganda. Literature shows that male children below the age of five years are at a higher risk of being stunted than their female counterparts. Specifically, studies in Uganda indicate that being a male child is positively associated with stunting, while being a female is negatively associated with stunting. Data for 904 males and 829 females under-fives was extracted form UDHS-2016 survey dataset. Key variables for this study were identified and used in generating relevant models and paths. Structural equation modeling techniques were used in their generalized form (GSEM). The generalized nature necessitated specifying both the family and link functions for each response variable in the system of the model. The sex of the child (b4) was used as a grouping factor and the height for age (HAZ) scores were used to construct the status for stunting of under-fives. The estimated models and path clearly indicated that the set of underlying factors that influence male and female under-fives respectively was different and the path through which they influence stunting was different. However, some of the determinants that influenced stunting amongst male under-fives also influenced stunting amongst the female under-fives. To reduce the stunting problem to the desirable state, it is important to consider the multifaceted and complex nature of the risk factors that influence stunting amongst the under-fives but, more importantly, consider the different sex-specific factors and their causal mechanism or paths through which they influence stunting.

Keywords: stunting, underfives, sex of the child, GSEM, causal mechanism

Procedia PDF Downloads 140
410 Unattended Crowdsensing Method to Monitor the Quality Condition of Dirt Roads

Authors: Matias Micheletto, Rodrigo Santos, Sergio F. Ochoa

Abstract:

In developing countries, the most roads in rural areas are dirt road. They require frequent maintenance since are affected by erosive events, such as rain or wind, and the transit of heavy-weight trucks and machinery. Early detection of damages on the road condition is a key aspect, since it allows to reduce the main-tenance time and cost, and also the limitations for other vehicles to travel through. Most proposals that help address this problem require the explicit participation of drivers, a permanent internet connection, or important instrumentation in vehicles or roads. These constraints limit the suitability of these proposals when applied into developing regions, like in Latin America. This paper proposes an alternative method, based on unattended crowdsensing, to determine the quality of dirt roads in rural areas. This method involves the use of a mobile application that complements the road condition surveys carried out by organizations in charge of the road network maintenance, giving them early warnings about road areas that could be requiring maintenance. Drivers can also take advantage of the early warnings while they move through these roads. The method was evaluated using information from a public dataset. Although they are preliminary, the results indicate the proposal is potentially suitable to provide awareness about dirt roads condition to drivers, transportation authority and road maintenance companies.

Keywords: dirt roads automatic quality assessment, collaborative system, unattended crowdsensing method, roads quality awareness provision

Procedia PDF Downloads 199
409 A Machine Learning Approach for Earthquake Prediction in Various Zones Based on Solar Activity

Authors: Viacheslav Shkuratskyy, Aminu Bello Usman, Michael O’Dea, Saifur Rahman Sabuj

Abstract:

This paper examines relationships between solar activity and earthquakes; it applied machine learning techniques: K-nearest neighbour, support vector regression, random forest regression, and long short-term memory network. Data from the SILSO World Data Center, the NOAA National Center, the GOES satellite, NASA OMNIWeb, and the United States Geological Survey were used for the experiment. The 23rd and 24th solar cycles, daily sunspot number, solar wind velocity, proton density, and proton temperature were all included in the dataset. The study also examined sunspots, solar wind, and solar flares, which all reflect solar activity and earthquake frequency distribution by magnitude and depth. The findings showed that the long short-term memory network model predicts earthquakes more correctly than the other models applied in the study, and solar activity is more likely to affect earthquakes of lower magnitude and shallow depth than earthquakes of magnitude 5.5 or larger with intermediate depth and deep depth.

Keywords: k-nearest neighbour, support vector regression, random forest regression, long short-term memory network, earthquakes, solar activity, sunspot number, solar wind, solar flares

Procedia PDF Downloads 73
408 Defect Classification of Hydrogen Fuel Pressure Vessels using Deep Learning

Authors: Dongju Kim, Youngjoo Suh, Hyojin Kim, Gyeongyeong Kim

Abstract:

Acoustic Emission Testing (AET) is widely used to test the structural integrity of an operational hydrogen storage container, and clustering algorithms are frequently used in pattern recognition methods to interpret AET results. However, the interpretation of AET results can vary from user to user as the tuning of the relevant parameters relies on the user's experience and knowledge of AET. Therefore, it is necessary to use a deep learning model to identify patterns in acoustic emission (AE) signal data that can be used to classify defects instead. In this paper, a deep learning-based model for classifying the types of defects in hydrogen storage tanks, using AE sensor waveforms, is proposed. As hydrogen storage tanks are commonly constructed using carbon fiber reinforced polymer composite (CFRP), a defect classification dataset is collected through a tensile test on a specimen of CFRP with an AE sensor attached. The performance of the classification model, using one-dimensional convolutional neural network (1-D CNN) and synthetic minority oversampling technique (SMOTE) data augmentation, achieved 91.09% accuracy for each defect. It is expected that the deep learning classification model in this paper, used with AET, will help in evaluating the operational safety of hydrogen storage containers.

Keywords: acoustic emission testing, carbon fiber reinforced polymer composite, one-dimensional convolutional neural network, smote data augmentation

Procedia PDF Downloads 93
407 Multimodal Deep Learning for Human Activity Recognition

Authors: Ons Slimene, Aroua Taamallah, Maha Khemaja

Abstract:

In recent years, human activity recognition (HAR) has been a key area of research due to its diverse applications. It has garnered increasing attention in the field of computer vision. HAR plays an important role in people’s daily lives as it has the ability to learn advanced knowledge about human activities from data. In HAR, activities are usually represented by exploiting different types of sensors, such as embedded sensors or visual sensors. However, these sensors have limitations, such as local obstacles, image-related obstacles, sensor unreliability, and consumer concerns. Recently, several deep learning-based approaches have been proposed for HAR and these approaches are classified into two categories based on the type of data used: vision-based approaches and sensor-based approaches. This research paper highlights the importance of multimodal data fusion from skeleton data obtained from videos and data generated by embedded sensors using deep neural networks for achieving HAR. We propose a deep multimodal fusion network based on a twostream architecture. These two streams use the Convolutional Neural Network combined with the Bidirectional LSTM (CNN BILSTM) to process skeleton data and data generated by embedded sensors and the fusion at the feature level is considered. The proposed model was evaluated on a public OPPORTUNITY++ dataset and produced a accuracy of 96.77%.

Keywords: human activity recognition, action recognition, sensors, vision, human-centric sensing, deep learning, context-awareness

Procedia PDF Downloads 101
406 Glaucoma Detection in Retinal Tomography Using the Vision Transformer

Authors: Sushish Baral, Pratibha Joshi, Yaman Maharjan

Abstract:

Glaucoma is a chronic eye condition that causes vision loss that is irreversible. Early detection and treatment are critical to prevent vision loss because it can be asymptomatic. For the identification of glaucoma, multiple deep learning algorithms are used. Transformer-based architectures, which use the self-attention mechanism to encode long-range dependencies and acquire extremely expressive representations, have recently become popular. Convolutional architectures, on the other hand, lack knowledge of long-range dependencies in the image due to their intrinsic inductive biases. The aforementioned statements inspire this thesis to look at transformer-based solutions and investigate the viability of adopting transformer-based network designs for glaucoma detection. Using retinal fundus images of the optic nerve head to develop a viable algorithm to assess the severity of glaucoma necessitates a large number of well-curated images. Initially, data is generated by augmenting ocular pictures. After that, the ocular images are pre-processed to make them ready for further processing. The system is trained using pre-processed images, and it classifies the input images as normal or glaucoma based on the features retrieved during training. The Vision Transformer (ViT) architecture is well suited to this situation, as it allows the self-attention mechanism to utilise structural modeling. Extensive experiments are run on the common dataset, and the results are thoroughly validated and visualized.

Keywords: glaucoma, vision transformer, convolutional architectures, retinal fundus images, self-attention, deep learning

Procedia PDF Downloads 191
405 Classification of Manufacturing Data for Efficient Processing on an Edge-Cloud Network

Authors: Onyedikachi Ulelu, Andrew P. Longstaff, Simon Fletcher, Simon Parkinson

Abstract:

The widespread interest in 'Industry 4.0' or 'digital manufacturing' has led to significant research requiring the acquisition of data from sensors, instruments, and machine signals. In-depth research then identifies methods of analysis of the massive amounts of data generated before and during manufacture to solve a particular problem. The ultimate goal is for industrial Internet of Things (IIoT) data to be processed automatically to assist with either visualisation or autonomous system decision-making. However, the collection and processing of data in an industrial environment come with a cost. Little research has been undertaken on how to specify optimally what data to capture, transmit, process, and store at various levels of an edge-cloud network. The first step in this specification is to categorise IIoT data for efficient and effective use. This paper proposes the required attributes and classification to take manufacturing digital data from various sources to determine the most suitable location for data processing on the edge-cloud network. The proposed classification framework will minimise overhead in terms of network bandwidth/cost and processing time of machine tool data via efficient decision making on which dataset should be processed at the ‘edge’ and what to send to a remote server (cloud). A fast-and-frugal heuristic method is implemented for this decision-making. The framework is tested using case studies from industrial machine tools for machine productivity and maintenance.

Keywords: data classification, decision making, edge computing, industrial IoT, industry 4.0

Procedia PDF Downloads 180
404 Unsupervised Feature Learning by Pre-Route Simulation of Auto-Encoder Behavior Model

Authors: Youngjae Jin, Daeshik Kim

Abstract:

This paper describes a cycle accurate simulation results of weight values learned by an auto-encoder behavior model in terms of pre-route simulation. Given the results we visualized the first layer representations with natural images. Many common deep learning threads have focused on learning high-level abstraction of unlabeled raw data by unsupervised feature learning. However, in the process of handling such a huge amount of data, the learning method’s computation complexity and time limited advanced research. These limitations came from the fact these algorithms were computed by using only single core CPUs. For this reason, parallel-based hardware, FPGAs, was seen as a possible solution to overcome these limitations. We adopted and simulated the ready-made auto-encoder to design a behavior model in Verilog HDL before designing hardware. With the auto-encoder behavior model pre-route simulation, we obtained the cycle accurate results of the parameter of each hidden layer by using MODELSIM. The cycle accurate results are very important factor in designing a parallel-based digital hardware. Finally this paper shows an appropriate operation of behavior model based pre-route simulation. Moreover, we visualized learning latent representations of the first hidden layer with Kyoto natural image dataset.

Keywords: auto-encoder, behavior model simulation, digital hardware design, pre-route simulation, Unsupervised feature learning

Procedia PDF Downloads 446
403 A Real-Time Snore Detector Using Neural Networks and Selected Sound Features

Authors: Stelios A. Mitilineos, Nicolas-Alexander Tatlas, Georgia Korompili, Lampros Kokkalas, Stelios M. Potirakis

Abstract:

Obstructive Sleep Apnea Hypopnea Syndrome (OSAHS) is a widespread chronic disease that mostly remains undetected, mainly due to the fact that it is diagnosed via polysomnography which is a time and resource-intensive procedure. Screening the disease’s symptoms at home could be used as an alternative approach in order to alert individuals that potentially suffer from OSAHS without compromising their everyday routine. Since snoring is usually linked to OSAHS, developing a snore detector is appealing as an enabling technology for screening OSAHS at home using ubiquitous equipment like commodity microphones (included in, e.g., smartphones). In this context, this study developed a snore detection tool and herein present the approach and selection of specific sound features that discriminate snoring vs. environmental sounds, as well as the performance of the proposed tool. Furthermore, a Real-Time Snore Detector (RTSD) is built upon the snore detection tool and employed in whole-night sleep sound recordings resulting to a large dataset of snoring sound excerpts that are made freely available to the public. The RTSD may be used either as a stand-alone tool that offers insight to an individual’s sleep quality or as an independent component of OSAHS screening applications in future developments.

Keywords: obstructive sleep apnea hypopnea syndrome, apnea screening, snoring detection, machine learning, neural networks

Procedia PDF Downloads 207
402 Health Trajectory Clustering Using Deep Belief Networks

Authors: Farshid Hajati, Federico Girosi, Shima Ghassempour

Abstract:

We present a Deep Belief Network (DBN) method for clustering health trajectories. Deep Belief Network (DBN) is a deep architecture that consists of a stack of Restricted Boltzmann Machines (RBM). In a deep architecture, each layer learns more complex features than the past layers. The proposed method depends on DBN in clustering without using back propagation learning algorithm. The proposed DBN has a better a performance compared to the deep neural network due the initialization of the connecting weights. We use Contrastive Divergence (CD) method for training the RBMs which increases the performance of the network. The performance of the proposed method is evaluated extensively on the Health and Retirement Study (HRS) database. The University of Michigan Health and Retirement Study (HRS) is a nationally representative longitudinal study that has surveyed more than 27,000 elderly and near-elderly Americans since its inception in 1992. Participants are interviewed every two years and they collect data on physical and mental health, insurance coverage, financial status, family support systems, labor market status, and retirement planning. The dataset is publicly available and we use the RAND HRS version L, which is easy to use and cleaned up version of the data. The size of sample data set is 268 and the length of the trajectories is equal to 10. The trajectories do not stop when the patient dies and represent 10 different interviews of live patients. Compared to the state-of-the-art benchmarks, the experimental results show the effectiveness and superiority of the proposed method in clustering health trajectories.

Keywords: health trajectory, clustering, deep learning, DBN

Procedia PDF Downloads 369
401 Efficient Credit Card Fraud Detection Based on Multiple ML Algorithms

Authors: Neha Ahirwar

Abstract:

In the contemporary digital era, the rise of credit card fraud poses a significant threat to both financial institutions and consumers. As fraudulent activities become more sophisticated, there is an escalating demand for robust and effective fraud detection mechanisms. Advanced machine learning algorithms have become crucial tools in addressing this challenge. This paper conducts a thorough examination of the design and evaluation of a credit card fraud detection system, utilizing four prominent machine learning algorithms: random forest, logistic regression, decision tree, and XGBoost. The surge in digital transactions has opened avenues for fraudsters to exploit vulnerabilities within payment systems. Consequently, there is an urgent need for proactive and adaptable fraud detection systems. This study addresses this imperative by exploring the efficacy of machine learning algorithms in identifying fraudulent credit card transactions. The selection of random forest, logistic regression, decision tree, and XGBoost for scrutiny in this study is based on their documented effectiveness in diverse domains, particularly in credit card fraud detection. These algorithms are renowned for their capability to model intricate patterns and provide accurate predictions. Each algorithm is implemented and evaluated for its performance in a controlled environment, utilizing a diverse dataset comprising both genuine and fraudulent credit card transactions.

Keywords: efficient credit card fraud detection, random forest, logistic regression, XGBoost, decision tree

Procedia PDF Downloads 66
400 Deep Learning-Based Object Detection on Low Quality Images: A Case Study of Real-Time Traffic Monitoring

Authors: Jean-Francois Rajotte, Martin Sotir, Frank Gouineau

Abstract:

The installation and management of traffic monitoring devices can be costly from both a financial and resource point of view. It is therefore important to take advantage of in-place infrastructures to extract the most information. Here we show how low-quality urban road traffic images from cameras already available in many cities (such as Montreal, Vancouver, and Toronto) can be used to estimate traffic flow. To this end, we use a pre-trained neural network, developed for object detection, to count vehicles within images. We then compare the results with human annotations gathered through crowdsourcing campaigns. We use this comparison to assess performance and calibrate the neural network annotations. As a use case, we consider six months of continuous monitoring over hundreds of cameras installed in the city of Montreal. We compare the results with city-provided manual traffic counting performed in similar conditions at the same location. The good performance of our system allows us to consider applications which can monitor the traffic conditions in near real-time, making the counting usable for traffic-related services. Furthermore, the resulting annotations pave the way for building a historical vehicle counting dataset to be used for analysing the impact of road traffic on many city-related issues, such as urban planning, security, and pollution.

Keywords: traffic monitoring, deep learning, image annotation, vehicles, roads, artificial intelligence, real-time systems

Procedia PDF Downloads 200
399 Human Immunodeficiency Virus (HIV) Test Predictive Modeling and Identify Determinants of HIV Testing for People with Age above Fourteen Years in Ethiopia Using Data Mining Techniques: EDHS 2011

Authors: S. Abera, T. Gidey, W. Terefe

Abstract:

Introduction: Testing for HIV is the key entry point to HIV prevention, treatment, and care and support services. Hence, predictive data mining techniques can greatly benefit to analyze and discover new patterns from huge datasets like that of EDHS 2011 data. Objectives: The objective of this study is to build a predictive modeling for HIV testing and identify determinants of HIV testing for adults with age above fourteen years using data mining techniques. Methods: Cross-Industry Standard Process for Data Mining (CRISP-DM) was used to predict the model for HIV testing and explore association rules between HIV testing and the selected attributes among adult Ethiopians. Decision tree, Naïve-Bayes, logistic regression and artificial neural networks of data mining techniques were used to build the predictive models. Results: The target dataset contained 30,625 study participants; of which 16, 515 (53.9%) were women. Nearly two-fifth; 17,719 (58%), have never been tested for HIV while the rest 12,906 (42%) had been tested. Ethiopians with higher wealth index, higher educational level, belonging 20 to 29 years old, having no stigmatizing attitude towards HIV positive person, urban residents, having HIV related knowledge, information about family planning on mass media and knowing a place where to get testing for HIV showed an increased patterns with respect to HIV testing. Conclusion and Recommendation: Public health interventions should consider the identified determinants to promote people to get testing for HIV.

Keywords: data mining, HIV, testing, ethiopia

Procedia PDF Downloads 496
398 Detection of Atrial Fibrillation Using Wearables via Attentional Two-Stream Heterogeneous Networks

Authors: Huawei Bai, Jianguo Yao, Fellow, IEEE

Abstract:

Atrial fibrillation (AF) is the most common form of heart arrhythmia and is closely associated with mortality and morbidity in heart failure, stroke, and coronary artery disease. The development of single spot optical sensors enables widespread photoplethysmography (PPG) screening, especially for AF, since it represents a more convenient and noninvasive approach. To our knowledge, most existing studies based on public and unbalanced datasets can barely handle the multiple noises sources in the real world and, also, lack interpretability. In this paper, we construct a large- scale PPG dataset using measurements collected from PPG wrist- watch devices worn by volunteers and propose an attention-based two-stream heterogeneous neural network (TSHNN). The first stream is a hybrid neural network consisting of a three-layer one-dimensional convolutional neural network (1D-CNN) and two-layer attention- based bidirectional long short-term memory (Bi-LSTM) network to learn representations from temporally sampled signals. The second stream extracts latent representations from the PPG time-frequency spectrogram using a five-layer CNN. The outputs from both streams are fed into a fusion layer for the outcome. Visualization of the attention weights learned demonstrates the effectiveness of the attention mechanism against noise. The experimental results show that the TSHNN outperforms all the competitive baseline approaches and with 98.09% accuracy, achieves state-of-the-art performance.

Keywords: PPG wearables, atrial fibrillation, feature fusion, attention mechanism, hyber network

Procedia PDF Downloads 121
397 Rank-Based Chain-Mode Ensemble for Binary Classification

Authors: Chongya Song, Kang Yen, Alexander Pons, Jin Liu

Abstract:

In the field of machine learning, the ensemble has been employed as a common methodology to improve the performance upon multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called “curse of correlation” which is represented as the strong interferences among the predictions produced by the base classifiers. In addition, the existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis on our experiment results, we conclude that the two problems are caused by some inherent deficiencies in the approach of consensus. Therefore, we create an enhanced ensemble algorithm which adopts a designed rank-based chain-mode consensus to overcome the two problems. In order to evaluate the proposed ensemble algorithm, we employ a well-known benchmark data set NSL-KDD (the improved version of dataset KDDCup99 produced by University of New Brunswick) to make comparisons between the proposed and 8 common ensemble algorithms. Particularly, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in terms of the improvements toward the accuracy and reliability upon the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is proved to be a more effective ensemble solution than the traditional consensus approach, which outperforms the 8 ensemble algorithms by 20% on almost all compared metrices which include accuracy, precision, recall, F1-score and area under receiver operating characteristic curve.

Keywords: consensus, curse of correlation, imbalance classification, rank-based chain-mode ensemble

Procedia PDF Downloads 138
396 Evaluating Surface Water Quality Using WQI, Trend Analysis, and Cluster Classification in Kebir Rhumel Basin, Algeria

Authors: Lazhar Belkhiri, Ammar Tiri, Lotfi Mouni, Fatma Elhadj Lakouas

Abstract:

This study evaluates the surface water quality in the Kebir Rhumel Basin by analyzing hydrochemical parameters. To assess spatial and temporal variations in water quality, we applied the Water Quality Index (WQI), Mann-Kendall (MK) trend analysis, and hierarchical cluster analysis (HCA). Monthly measurements of eleven hydrochemical parameters were collected across eight stations from January 2016 to December 2020. Calcium and sulfate emerged as the dominant cation and anion, respectively. WQI analysis indicated a high incidence of poor water quality at stations Ain Smara (AS), Beni Haroune (BH), Grarem (GR), and Sidi Khalifa (SK), where 89.5%, 90.6%, 78.2%, and 62.7% of samples, respectively, fell into this category. The MK trend analysis revealed a significant upward trend in WQI at Oued Boumerzoug (ON) and SK stations, signaling temporal deterioration in these areas. HCA grouped the dataset into three clusters, covering approximately 22%, 30%, and 48% of the months, respectively. Within these clusters, specific stations exhibited elevated WQI values: GR and ON in the first cluster, OB and SK in the second, and AS, BH, El Milia (EM), and Hammam Grouz (HG) in the third. Furthermore, approximately 38%, 41%, and 38% of samples in clusters one, two, and three, respectively, were classified as having poor water quality. These findings provide essential insights for policymakers in formulating strategies to restore and manage surface water quality in the region.

Keywords: surface water quality, water quality index (WQI), Mann-Kendall Trend Analysis, hierarchical cluster analysis (HCA), spatial-temporal distribution, Kebir Rhumel Basin

Procedia PDF Downloads 16
395 Improving Activity Recognition Classification of Repetitious Beginner Swimming Using a 2-Step Peak/Valley Segmentation Method with Smoothing and Resampling for Machine Learning

Authors: Larry Powell, Seth Polsley, Drew Casey, Tracy Hammond

Abstract:

Human activity recognition (HAR) systems have shown positive performance when recognizing repetitive activities like walking, running, and sleeping. Water-based activities are a reasonably new area for activity recognition. However, water-based activity recognition has largely focused on supporting the elite and competitive swimming population, which already has amazing coordination and proper form. Beginner swimmers are not perfect, and activity recognition needs to support the individual motions to help beginners. Activity recognition algorithms are traditionally built around short segments of timed sensor data. Using a time window input can cause performance issues in the machine learning model. The window’s size can be too small or large, requiring careful tuning and precise data segmentation. In this work, we present a method that uses a time window as the initial segmentation, then separates the data based on the change in the sensor value. Our system uses a multi-phase segmentation method that pulls all peaks and valleys for each axis of an accelerometer placed on the swimmer’s lower back. This results in high recognition performance using leave-one-subject-out validation on our study with 20 beginner swimmers, with our model optimized from our final dataset resulting in an F-Score of 0.95.

Keywords: time window, peak/valley segmentation, feature extraction, beginner swimming, activity recognition

Procedia PDF Downloads 123
394 MIMIC: A Multi Input Micro-Influencers Classifier

Authors: Simone Leonardi, Luca Ardito

Abstract:

Micro-influencers are effective elements in the marketing strategies of companies and institutions because of their capability to create an hyper-engaged audience around a specific topic of interest. In recent years, many scientific approaches and commercial tools have handled the task of detecting this type of social media users. These strategies adopt solutions ranging from rule based machine learning models to deep neural networks and graph analysis on text, images, and account information. This work compares the existing solutions and proposes an ensemble method to generalize them with different input data and social media platforms. The deployed solution combines deep learning models on unstructured data with statistical machine learning models on structured data. We retrieve both social media accounts information and multimedia posts on Twitter and Instagram. These data are mapped into feature vectors for an eXtreme Gradient Boosting (XGBoost) classifier. Sixty different topics have been analyzed to build a rule based gold standard dataset and to compare the performances of our approach against baseline classifiers. We prove the effectiveness of our work by comparing the accuracy, precision, recall, and f1 score of our model with different configurations and architectures. We obtained an accuracy of 0.91 with our best performing model.

Keywords: deep learning, gradient boosting, image processing, micro-influencers, NLP, social media

Procedia PDF Downloads 183
393 Deep Vision: A Robust Dominant Colour Extraction Framework for T-Shirts Based on Semantic Segmentation

Authors: Kishore Kumar R., Kaustav Sengupta, Shalini Sood Sehgal, Poornima Santhanam

Abstract:

Fashion is a human expression that is constantly changing. One of the prime factors that consistently influences fashion is the change in colour preferences. The role of colour in our everyday lives is very significant. It subconsciously explains a lot about one’s mindset and mood. Analyzing the colours by extracting them from the outfit images is a critical study to examine the individual’s/consumer behaviour. Several research works have been carried out on extracting colours from images, but to the best of our knowledge, there were no studies that extract colours to specific apparel and identify colour patterns geographically. This paper proposes a framework for accurately extracting colours from T-shirt images and predicting dominant colours geographically. The proposed method consists of two stages: first, a U-Net deep learning model is adopted to segment the T-shirts from the images. Second, the colours are extracted only from the T-shirt segments. The proposed method employs the iMaterialist (Fashion) 2019 dataset for the semantic segmentation task. The proposed framework also includes a mechanism for gathering data and analyzing India’s general colour preferences. From this research, it was observed that black and grey are the dominant colour in different regions of India. The proposed method can be adapted to study fashion’s evolving colour preferences.

Keywords: colour analysis in t-shirts, convolutional neural network, encoder-decoder, k-means clustering, semantic segmentation, U-Net model

Procedia PDF Downloads 111
392 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.

Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction

Procedia PDF Downloads 339
391 Enhanced Planar Pattern Tracking for an Outdoor Augmented Reality System

Authors: L. Yu, W. K. Li, S. K. Ong, A. Y. C. Nee

Abstract:

In this paper, a scalable augmented reality framework for handheld devices is presented. The presented framework is enabled by using a server-client data communication structure, in which the search for tracking targets among a database of images is performed on the server-side while pixel-wise 3D tracking is performed on the client-side, which, in this case, is a handheld mobile device. Image search on the server-side adopts a residual-enhanced image descriptors representation that gives the framework a scalability property. The tracking algorithm on the client-side is based on a gravity-aligned feature descriptor which takes the advantage of a sensor-equipped mobile device and an optimized intensity-based image alignment approach that ensures the accuracy of 3D tracking. Automatic content streaming is achieved by using a key-frame selection algorithm, client working phase monitoring and standardized rules for content communication between the server and client. The recognition accuracy test performed on a standard dataset shows that the method adopted in the presented framework outperforms the Bag-of-Words (BoW) method that has been used in some of the previous systems. Experimental test conducted on a set of video sequences indicated the real-time performance of the tracking system with a frame rate at 15-30 frames per second. The presented framework is exposed to be functional in practical situations with a demonstration application on a campus walk-around.

Keywords: augmented reality framework, server-client model, vision-based tracking, image search

Procedia PDF Downloads 275
390 Fast Approximate Bayesian Contextual Cold Start Learning (FAB-COST)

Authors: Jack R. McKenzie, Peter A. Appleby, Thomas House, Neil Walton

Abstract:

Cold-start is a notoriously difficult problem which can occur in recommendation systems, and arises when there is insufficient information to draw inferences for users or items. To address this challenge, a contextual bandit algorithm – the Fast Approximate Bayesian Contextual Cold Start Learning algorithm (FAB-COST) – is proposed, which is designed to provide improved accuracy compared to the traditionally used Laplace approximation in the logistic contextual bandit, while controlling both algorithmic complexity and computational cost. To this end, FAB-COST uses a combination of two moment projection variational methods: Expectation Propagation (EP), which performs well at the cold start, but becomes slow as the amount of data increases; and Assumed Density Filtering (ADF), which has slower growth of computational cost with data size but requires more data to obtain an acceptable level of accuracy. By switching from EP to ADF when the dataset becomes large, it is able to exploit their complementary strengths. The empirical justification for FAB-COST is presented, and systematically compared to other approaches on simulated data. In a benchmark against the Laplace approximation on real data consisting of over 670, 000 impressions from autotrader.co.uk, FAB-COST demonstrates at one point increase of over 16% in user clicks. On the basis of these results, it is argued that FAB-COST is likely to be an attractive approach to cold-start recommendation systems in a variety of contexts.

Keywords: cold-start learning, expectation propagation, multi-armed bandits, Thompson Sampling, variational inference

Procedia PDF Downloads 108
389 Determination of Direct Solar Radiation Using Atmospheric Physics Models

Authors: Pattra Pukdeekiat, Siriluk Ruangrungrote

Abstract:

This work was originated to precisely determine direct solar radiation by using atmospheric physics models since the accurate prediction of solar radiation is necessary and useful for solar energy applications including atmospheric research. The possible models and techniques for a calculation of regional direct solar radiation were challenging and compulsory for the case of unavailable instrumental measurement. The investigation was mathematically governed by six astronomical parameters i.e. declination (δ), hour angle (ω), solar time, solar zenith angle (θz), extraterrestrial radiation (Iso) and eccentricity (E0) along with two atmospheric parameters i.e. air mass (mr) and dew point temperature at Bangna meteorological station (13.67° N, 100.61° E) in Bangkok, Thailand. Analyses of five models of solar radiation determination with the assumption of clear sky were applied accompanied by three statistical tests: Mean Bias Difference (MBD), Root Mean Square Difference (RMSD) and Coefficient of determination (R2) in order to validate the accuracy of obtainable results. The calculated direct solar radiation was in a range of 491-505 Watt/m2 with relative percentage error 8.41% for winter and 532-540 Watt/m2 with relative percentage error 4.89% for summer 2014. Additionally, dataset of seven continuous days, representing both seasons were considered with the MBD, RMSD and R2 of -0.08, 0.25, 0.86 and -0.14, 0.35, 3.29, respectively, which belong to Kumar model for winter and CSR model for summer. In summary, the determination of direct solar radiation based on atmospheric models and empirical equations could advantageously provide immediate and reliable values of the solar components for any site in the region without a constraint of actual measurement.

Keywords: atmospheric physics models, astronomical parameters, atmospheric parameters, clear sky condition

Procedia PDF Downloads 409
388 A Convolutional Neural Network-Based Model for Lassa fever Virus Prediction Using Patient Blood Smear Image

Authors: A. M. John-Otumu, M. M. Rahman, M. C. Onuoha, E. P. Ojonugwa

Abstract:

A Convolutional Neural Network (CNN) model for predicting Lassa fever was built using Python 3.8.0 programming language, alongside Keras 2.2.4 and TensorFlow 2.6.1 libraries as the development environment in order to reduce the current high risk of Lassa fever in West Africa, particularly in Nigeria. The study was prompted by some major flaws in existing conventional laboratory equipment for diagnosing Lassa fever (RT-PCR), as well as flaws in AI-based techniques that have been used for probing and prognosis of Lassa fever based on literature. There were 15,679 blood smear microscopic image datasets collected in total. The proposed model was trained on 70% of the dataset and tested on 30% of the microscopic images in avoid overfitting. A 3x3x3 convolution filter was also used in the proposed system to extract features from microscopic images. The proposed CNN-based model had a recall value of 96%, a precision value of 93%, an F1 score of 95%, and an accuracy of 94% in predicting and accurately classifying the images into clean or infected samples. Based on empirical evidence from the results of the literature consulted, the proposed model outperformed other existing AI-based techniques evaluated. If properly deployed, the model will assist physicians, medical laboratory scientists, and patients in making accurate diagnoses for Lassa fever cases, allowing the mortality rate due to the Lassa fever virus to be reduced through sound decision-making.

Keywords: artificial intelligence, ANN, blood smear, CNN, deep learning, Lassa fever

Procedia PDF Downloads 120
387 Enhancing Rupture Pressure Prediction for Corroded Pipes Through Finite Element Optimization

Authors: Benkouiten Imene, Chabli Ouerdia, Boutoutaou Hamid, Kadri Nesrine, Bouledroua Omar

Abstract:

Algeria is actively enhancing gas productivity by augmenting the supply flow. However, this effort has led to increased internal pressure, posing a potential risk to the pipeline's integrity, particularly in the presence of corrosion defects. Sonatrach relies on a vast network of pipelines spanning 24,000 kilometers for the transportation of gas and oil. The aging of these pipelines raises the likelihood of corrosion both internally and externally, heightening the risk of ruptures. To address this issue, a comprehensive inspection is imperative, utilizing specialized scraping tools. These advanced tools furnish a detailed assessment of all pipeline defects. It is essential to recalculate the pressure parameters to safeguard the corroded pipeline's integrity while ensuring the continuity of production. In this context, Sonatrach employs symbolic pressure limit calculations, such as ASME B31G (2009) and the modified ASME B31G (2012). The aim of this study is to perform a comparative analysis of various limit pressure calculation methods documented in the literature, namely DNV RP F-101, SHELL, P-CORRC, NETTO, and CSA Z662. This comparative assessment will be based on a dataset comprising 329 burst tests published in the literature. Ultimately, we intend to introduce a novel approach grounded in the finite element method, employing ANSYS software.

Keywords: pipeline burst pressure, burst test, corrosion defect, corroded pipeline, finite element method

Procedia PDF Downloads 58
386 Fake News Detection for Korean News Using Machine Learning Techniques

Authors: Tae-Uk Yun, Pullip Chung, Kee-Young Kwahk, Hyunchul Ahn

Abstract:

Fake news is defined as the news articles that are intentionally and verifiably false, and could mislead readers. Spread of fake news may provoke anxiety, chaos, fear, or irrational decisions of the public. Thus, detecting fake news and preventing its spread has become very important issue in our society. However, due to the huge amount of fake news produced every day, it is almost impossible to identify it by a human. Under this context, researchers have tried to develop automated fake news detection using machine learning techniques over the past years. But, there have been no prior studies proposed an automated fake news detection method for Korean news to our best knowledge. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. Our proposed method consists of two steps. In the first step, the news contents to be analyzed is convert to quantified values using various text mining techniques (topic modeling, TF-IDF, and so on). After that, in step 2, classifiers are trained using the values produced in step 1. As the classifiers, machine learning techniques such as logistic regression, backpropagation network, support vector machine, and deep neural network can be applied. To validate the effectiveness of the proposed method, we collected about 200 short Korean news from Seoul National University’s FactCheck. which provides with detailed analysis reports from 20 media outlets and links to source documents for each case. Using this dataset, we will identify which text features are important as well as which classifiers are effective in detecting Korean fake news.

Keywords: fake news detection, Korean news, machine learning, text mining

Procedia PDF Downloads 275