Search results for: dataset production
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8206

Search results for: dataset production

8206 Data Gathering and Analysis for Arabic Historical Documents

Authors: Ali Dulla

Abstract:

This paper introduces a new dataset (and the methodology used to generate it) based on a wide range of historical Arabic documents containing clean data simple and homogeneous-page layouts. The experiments are implemented on printed and handwritten documents obtained respectively from some important libraries such as Qatar Digital Library, the British Library and the Library of Congress. We have gathered and commented on 150 archival document images from different locations and time periods. It is based on different documents from the 17th-19th century. The dataset comprises differing page layouts and degradations that challenge text line segmentation methods. Ground truth is produced using the Aletheia tool by PRImA and stored in an XML representation, in the PAGE (Page Analysis and Ground truth Elements) format. The dataset presented will be easily available to researchers world-wide for research into the obstacles facing various historical Arabic documents such as geometric correction of historical Arabic documents.

Keywords: dataset production, ground truth production, historical documents, arbitrary warping, geometric correction

Procedia PDF Downloads 141
8205 Reducing the Imbalance Penalty Through Artificial Intelligence Methods Geothermal Production Forecasting: A Case Study for Turkey

Authors: Hayriye Anıl, Görkem Kar

Abstract:

In addition to being rich in renewable energy resources, Turkey is one of the countries that promise potential in geothermal energy production with its high installed power, cheapness, and sustainability. Increasing imbalance penalties become an economic burden for organizations since geothermal generation plants cannot maintain the balance of supply and demand due to the inadequacy of the production forecasts given in the day-ahead market. A better production forecast reduces the imbalance penalties of market participants and provides a better imbalance in the day ahead market. In this study, using machine learning, deep learning, and, time series methods, the total generation of the power plants belonging to Zorlu Natural Electricity Generation, which has a high installed capacity in terms of geothermal, was estimated for the first one and two weeks of March, then the imbalance penalties were calculated with these estimates and compared with the real values. These modeling operations were carried out on two datasets, the basic dataset and the dataset created by extracting new features from this dataset with the feature engineering method. According to the results, Support Vector Regression from traditional machine learning models outperformed other models and exhibited the best performance. In addition, the estimation results in the feature engineering dataset showed lower error rates than the basic dataset. It has been concluded that the estimated imbalance penalty calculated for the selected organization is lower than the actual imbalance penalty, optimum and profitable accounts.

Keywords: machine learning, deep learning, time series models, feature engineering, geothermal energy production forecasting

Procedia PDF Downloads 74
8204 Machine Learning Approach for Yield Prediction in Semiconductor Production

Authors: Heramb Somthankar, Anujoy Chakraborty

Abstract:

This paper presents a classification study on yield prediction in semiconductor production using machine learning approaches. A complicated semiconductor production process is generally monitored continuously by signals acquired from sensors and measurement sites. A monitoring system contains a variety of signals, all of which contain useful information, irrelevant information, and noise. In the case of each signal being considered a feature, "Feature Selection" is used to find the most relevant signals. The open-source UCI SECOM Dataset provides 1567 such samples, out of which 104 fail in quality assurance. Feature extraction and selection are performed on the dataset, and useful signals were considered for further study. Afterward, common machine learning algorithms were employed to predict whether the signal yields pass or fail. The most relevant algorithm is selected for prediction based on the accuracy and loss of the ML model.

Keywords: deep learning, feature extraction, feature selection, machine learning classification algorithms, semiconductor production monitoring, signal processing, time-series analysis

Procedia PDF Downloads 71
8203 Distorted Document Images Dataset for Text Detection and Recognition

Authors: Ilia Zharikov, Philipp Nikitin, Ilia Vasiliev, Vladimir Dokholyan

Abstract:

With the increasing popularity of document analysis and recognition systems, text detection (TD) and optical character recognition (OCR) in document images become challenging tasks. However, according to our best knowledge, no publicly available datasets for these particular problems exist. In this paper, we introduce a Distorted Document Images dataset (DDI-100) and provide a detailed analysis of the DDI-100 in its current state. To create the dataset we collected 7000 unique document pages, and extend it by applying different types of distortions and geometric transformations. In total, DDI-100 contains more than 100,000 document images together with binary text masks, text and character locations in terms of bounding boxes. We also present an analysis of several state-of-the-art TD and OCR approaches on the presented dataset. Lastly, we demonstrate the usefulness of DDI-100 to improve accuracy and stability of the considered TD and OCR models.

Keywords: document analysis, open dataset, optical character recognition, text detection

Procedia PDF Downloads 132
8202 SAMRA: Dataset in Al-Soudani Arabic Maghrebi Script for Recognition of Arabic Ancient Words Handwritten

Authors: Sidi Ahmed Maouloud, Cheikh Ba

Abstract:

Much of West Africa’s cultural heritage is written in the Al-Soudani Arabic script, which was widely used in West Africa before the time of European colonization. This Al-Soudani Arabic script is an African version of the Maghrebi script, in particular, the Al-Mebssout script. However, the local African qualities were incorporated into the Al-Soudani script in a way that gave it a unique African diversity and character. Despite the existence of several Arabic datasets in Oriental script, allowing for the analysis, layout, and recognition of texts written in these calligraphies, many Arabic scripts and written traditions remain understudied. In this paper, we present a dataset of words from Al-Soudani calligraphy scripts. This dataset consists of 100 images selected from three different manuscripts written in Al-Soudani Arabic script by different copyists. The primary source for this database was the libraries of Boston University and Cambridge University. This dataset highlights the unique characteristics of the Al-Soudani Arabic script as well as the new challenges it presents in terms of automatic word recognition of Arabic manuscripts. An HTR system based on a hybrid ANN (CRNN-CTC) is also proposed to test this dataset. SAMRA is a dataset of annotated Arabic manuscript words in the Al-Soudani script that can help researchers automatically recognize and analyze manuscript words written in this script.

Keywords: dataset, CRNN-CTC, handwritten words recognition, Al-Soudani Arabic script, HTR, manuscripts

Procedia PDF Downloads 71
8201 Fuzzy-Machine Learning Models for the Prediction of Fire Outbreak: A Comparative Analysis

Authors: Uduak Umoh, Imo Eyoh, Emmauel Nyoho

Abstract:

This paper compares fuzzy-machine learning algorithms such as Support Vector Machine (SVM), and K-Nearest Neighbor (KNN) for the predicting cases of fire outbreak. The paper uses the fire outbreak dataset with three features (Temperature, Smoke, and Flame). The data is pre-processed using Interval Type-2 Fuzzy Logic (IT2FL) algorithm. Min-Max Normalization and Principal Component Analysis (PCA) are used to predict feature labels in the dataset, normalize the dataset, and select relevant features respectively. The output of the pre-processing is a dataset with two principal components (PC1 and PC2). The pre-processed dataset is then used in the training of the aforementioned machine learning models. K-fold (with K=10) cross-validation method is used to evaluate the performance of the models using the matrices – ROC (Receiver Operating Curve), Specificity, and Sensitivity. The model is also tested with 20% of the dataset. The validation result shows KNN is the better model for fire outbreak detection with an ROC value of 0.99878, followed by SVM with an ROC value of 0.99753.

Keywords: Machine Learning Algorithms , Interval Type-2 Fuzzy Logic, Fire Outbreak, Support Vector Machine, K-Nearest Neighbour, Principal Component Analysis

Procedia PDF Downloads 134
8200 A Ratio-Weighted Decision Tree Algorithm for Imbalance Dataset Classification

Authors: Doyin Afolabi, Phillip Adewole, Oladipupo Sennaike

Abstract:

Most well-known classifiers, including the decision tree algorithm, can make predictions on balanced datasets efficiently. However, the decision tree algorithm tends to be biased towards imbalanced datasets because of the skewness of the distribution of such datasets. To overcome this problem, this study proposes a weighted decision tree algorithm that aims to remove the bias toward the majority class and prevents the reduction of majority observations in imbalance datasets classification. The proposed weighted decision tree algorithm was tested on three imbalanced datasets- cancer dataset, german credit dataset, and banknote dataset. The specificity, sensitivity, and accuracy metrics were used to evaluate the performance of the proposed decision tree algorithm on the datasets. The evaluation results show that for some of the weights of our proposed decision tree, the specificity, sensitivity, and accuracy metrics gave better results compared to that of the ID3 decision tree and decision tree induced with minority entropy for all three datasets.

Keywords: data mining, decision tree, classification, imbalance dataset

Procedia PDF Downloads 89
8199 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting

Authors: Kemal Polat

Abstract:

In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) for classification of Diabetes disease dataset has been used. The aims of this study are to reduce the variance within attributes of diabetes dataset and to improve the classification accuracy of classifier algorithm transforming from non-linear separable datasets to linearly separable datasets. Pima Indians Diabetes dataset has two classes including normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of K-means clustering method and is one of most used clustering methods in data mining and machine learning applications. In this study, as the first stage, fuzzy C-means clustering process has been used for finding the centers of attributes in Pima Indians diabetes dataset and then weighted the dataset according to the ratios of the means of attributes to centers of theirs. Secondly, after weighting process, the classifier algorithms including support vector machine (SVM) and k-NN (k- nearest neighbor) classifiers have been used for classifying weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of Pima Indians diabetes dataset.

Keywords: fuzzy C-means clustering, fuzzy C-means clustering based attribute weighting, Pima Indians diabetes, SVM

Procedia PDF Downloads 378
8198 Optimizing the Capacity of a Convolutional Neural Network for Image Segmentation and Pattern Recognition

Authors: Yalong Jiang, Zheru Chi

Abstract:

In this paper, we study the factors which determine the capacity of a Convolutional Neural Network (CNN) model and propose the ways to evaluate and adjust the capacity of a CNN model for best matching to a specific pattern recognition task. Firstly, a scheme is proposed to adjust the number of independent functional units within a CNN model to make it be better fitted to a task. Secondly, the number of independent functional units in the capsule network is adjusted to fit it to the training dataset. Thirdly, a method based on Bayesian GAN is proposed to enrich the variances in the current dataset to increase its complexity. Experimental results on the PASCAL VOC 2010 Person Part dataset and the MNIST dataset show that, in both conventional CNN models and capsule networks, the number of independent functional units is an important factor that determines the capacity of a network model. By adjusting the number of functional units, the capacity of a model can better match the complexity of a dataset.

Keywords: CNN, convolutional neural network, capsule network, capacity optimization, character recognition, data augmentation, semantic segmentation

Procedia PDF Downloads 117
8197 Energy Complementary in Colombia: Imputation of Dataset

Authors: Felipe Villegas-Velasquez, Harold Pantoja-Villota, Sergio Holguin-Cardona, Alejandro Osorio-Botero, Brayan Candamil-Arango

Abstract:

Colombian electricity comes mainly from hydric resources, affected by environmental variations such as the El Niño phenomenon. That is why incorporating other types of resources is necessary to provide electricity constantly. This research seeks to fill the wind speed and global solar irradiance dataset for two years with the highest amount of information. A further result is the characterization of the data by region that led to infer which errors occurred and offered the incomplete dataset.

Keywords: energy, wind speed, global solar irradiance, Colombia, imputation

Procedia PDF Downloads 111
8196 A Deep Learning Model with Greedy Layer-Wise Pretraining Approach for Optimal Syngas Production by Dry Reforming of Methane

Authors: Maryam Zarabian, Hector Guzman, Pedro Pereira-Almao, Abraham Fapojuwo

Abstract:

Dry reforming of methane (DRM) has sparked significant industrial and scientific interest not only as a viable alternative for addressing the environmental concerns of two main contributors of the greenhouse effect, i.e., carbon dioxide (CO₂) and methane (CH₄), but also produces syngas, i.e., a mixture of hydrogen (H₂) and carbon monoxide (CO) utilized by a wide range of downstream processes as a feedstock for other chemical productions. In this study, we develop an AI-enable syngas production model to tackle the problem of achieving an equivalent H₂/CO ratio [1:1] with respect to the most efficient conversion. Firstly, the unsupervised density-based spatial clustering of applications with noise (DBSAN) algorithm removes outlier data points from the original experimental dataset. Then, random forest (RF) and deep neural network (DNN) models employ the error-free dataset to predict the DRM results. DNN models inherently would not be able to obtain accurate predictions without a huge dataset. To cope with this limitation, we employ reusing pre-trained layers’ approaches such as transfer learning and greedy layer-wise pretraining. Compared to the other deep models (i.e., pure deep model and transferred deep model), the greedy layer-wise pre-trained deep model provides the most accurate prediction as well as similar accuracy to the RF model with R² values 1.00, 0.999, 0.999, 0.999, 0.999, and 0.999 for the total outlet flow, H₂/CO ratio, H₂ yield, CO yield, CH₄ conversion, and CO₂ conversion outputs, respectively.

Keywords: artificial intelligence, dry reforming of methane, artificial neural network, deep learning, machine learning, transfer learning, greedy layer-wise pretraining

Procedia PDF Downloads 57
8195 The Clustering of Multiple Sclerosis Subgroups through L2 Norm Multifractal Denoising Technique

Authors: Yeliz Karaca, Rana Karabudak

Abstract:

Multifractal Denoising techniques are used in the identification of significant attributes by removing the noise of the dataset. Magnetic resonance (MR) image technique is the most sensitive method so as to identify chronic disorders of the nervous system such as Multiple Sclerosis. MRI and Expanded Disability Status Scale (EDSS) data belonging to 120 individuals who have one of the subgroups of MS (Relapsing Remitting MS (RRMS), Secondary Progressive MS (SPMS), Primary Progressive MS (PPMS)) as well as 19 healthy individuals in the control group have been used in this study. The study is comprised of the following stages: (i) L2 Norm Multifractal Denoising technique, one of the multifractal technique, has been used with the application on the MS data (MRI and EDSS). In this way, the new dataset has been obtained. (ii) The new MS dataset obtained from the MS dataset and L2 Multifractal Denoising technique has been applied to the K-Means and Fuzzy C Means clustering algorithms which are among the unsupervised methods. Thus, the clustering performances have been compared. (iii) In the identification of significant attributes in the MS dataset through the Multifractal denoising (L2 Norm) technique using K-Means and FCM algorithms on the MS subgroups and control group of healthy individuals, excellent performance outcome has been yielded. According to the clustering results based on the MS subgroups obtained in the study, successful clustering results have been obtained in the K-Means and FCM algorithms by applying the L2 norm of multifractal denoising technique for the MS dataset. Clustering performance has been more successful with the MS Dataset (L2_Norm MS Data Set) K-Means and FCM in which significant attributes are obtained by applying L2 Norm Denoising technique.

Keywords: clinical decision support, clustering algorithms, multiple sclerosis, multifractal techniques

Procedia PDF Downloads 132
8194 Corn Production in the Visayas: An Industry Study from 2002-2019

Authors: Julie Ann L. Gadin, Andrearose C. Igano, Carl Joseph S. Ignacio, Christopher C. Bacungan

Abstract:

Corn production has become an important and pervasive industry in the Visayas for many years. Its role as a substitute commodity to rice heightens demand for health-particular consumers. Unfortunately, the corn industry is confronted with several challenges, such as weak institutions. Considering these issues, the paper examined the factors that influence corn production in the three administrative regions in the Visayas, namely, Western Visayas, Central Visayas, and Eastern Visayas. The data used was retrieved from a variety of publicly available data sources such as the Philippine Statistics Authority, the Department of Agriculture, the Philippine Crop Insurance Corporation, and the International Disaster Database. Utilizing a dataset from 2002 to 2019, the indicators were tested using three multiple linear regression (MLR) models. Results showed that the land area harvested (p=0.02), and the value of corn production (p=0.00) are statistically significant variables that influence corn production in the Visayas. Given these findings, it is suggested that the policy of forest conversion and sustainable land management should be effective in enabling farmworkers to obtain land to grow corn crops, especially in rural regions. Furthermore, the Biofuels Act of 2006, the Livestock Industry Restructuring and Rationalization Act, and supported policy, Senate Bill No. 225, or an Act Establishing the Philippine Corn Research Institute and Appropriating Funds, should be enforced inclusively in order to improve the demand for the corn-allied industries which may lead to an increase in the value and volume of corn production in the Visayas.

Keywords: corn, industry, production, MLR, Visayas

Procedia PDF Downloads 146
8193 Facial Expression Phoenix (FePh): An Annotated Sequenced Dataset for Facial and Emotion-Specified Expressions in Sign Language

Authors: Marie Alaghband, Niloofar Yousefi, Ivan Garibay

Abstract:

Facial expressions are important parts of both gesture and sign language recognition systems. Despite the recent advances in both fields, annotated facial expression datasets in the context of sign language are still scarce resources. In this manuscript, we introduce an annotated sequenced facial expression dataset in the context of sign language, comprising over 3000 facial images extracted from the daily news and weather forecast of the public tv-station PHOENIX. Unlike the majority of currently existing facial expression datasets, FePh provides sequenced semi-blurry facial images with different head poses, orientations, and movements. In addition, in the majority of images, identities are mouthing the words, which makes the data more challenging. To annotate this dataset we consider primary, secondary, and tertiary dyads of seven basic emotions of "sad", "surprise", "fear", "angry", "neutral", "disgust", and "happy". We also considered the "None" class if the image’s facial expression could not be described by any of the aforementioned emotions. Although we provide FePh as a facial expression dataset of signers in sign language, it has a wider application in gesture recognition and Human Computer Interaction (HCI) systems.

Keywords: annotated facial expression dataset, gesture recognition, sequenced facial expression dataset, sign language recognition

Procedia PDF Downloads 128
8192 Recent Climate Variability and Crop Production in the Central Highlands of Ethiopia

Authors: Arragaw Alemayehu, Woldeamlak Bewket

Abstract:

The aim of this study was to understand the influence of current climate variability on crop production in the central highlands of Ethiopia. We used monthly rainfall and temperature data from 132 points each representing a pixel of 10×10 km. The data are reconstructions based on station records and meteorological satellite observations. Production data of the five major crops in the area were collected from the Central Statistical Agency for the period 2004-2013 and for the main cropping season, locally known as Meher. The production data are at the Enumeration Area (EA ) level and hence the best available dataset on crop production. The results show statistically significant decreasing trends in March–May (Belg) rainfall in the area. However, June – September (Kiremt) rainfall showed increasing trends in Efratana Gidim and Menz Gera Meder which the latter is statistically significant. Annual rainfall also showed positive trends in the area except Basona Werana where significant negative trends were observed. On the other hand, maximum and minimum temperatures showed warming trends in the study area. Correlation results have shown that crop production and area of cultivation have positive correlation with rainfall, and negative with temperature. When the trends in crop production are investigated, most crops showed negative trends and below average production was observed. Regression results have shown that rainfall was the most important determinant of crop production in the area. It is concluded that current climate variability has a significant influence on crop production in the area and any unfavorable change in the local climate in the future will have serious implications for household level food security. Efforts to adapt to the ongoing climate change should begin from tackling the current climate variability and take a climate risk management approach.

Keywords: central highlands, climate variability, crop production, Ethiopia, regression, trend

Procedia PDF Downloads 402
8191 Data Augmentation for Automatic Graphical User Interface Generation Based on Generative Adversarial Network

Authors: Xulu Yao, Moi Hoon Yap, Yanlong Zhang

Abstract:

As a branch of artificial neural network, deep learning is widely used in the field of image recognition, but the lack of its dataset leads to imperfect model learning. By analysing the data scale requirements of deep learning and aiming at the application in GUI generation, it is found that the collection of GUI dataset is a time-consuming and labor-consuming project, which is difficult to meet the needs of current deep learning network. To solve this problem, this paper proposes a semi-supervised deep learning model that relies on the original small-scale datasets to produce a large number of reliable data sets. By combining the cyclic neural network with the generated countermeasure network, the cyclic neural network can learn the sequence relationship and characteristics of data, make the generated countermeasure network generate reasonable data, and then expand the Rico dataset. Relying on the network structure, the characteristics of collected data can be well analysed, and a large number of reasonable data can be generated according to these characteristics. After data processing, a reliable dataset for model training can be formed, which alleviates the problem of dataset shortage in deep learning.

Keywords: GUI, deep learning, GAN, data augmentation

Procedia PDF Downloads 143
8190 Pose Normalization Network for Object Classification

Authors: Bingquan Shen

Abstract:

Convolutional Neural Networks (CNN) have demonstrated their effectiveness in synthesizing 3D views of object instances at various viewpoints. Given the problem where one have limited viewpoints of a particular object for classification, we present a pose normalization architecture to transform the object to existing viewpoints in the training dataset before classification to yield better classification performance. We have demonstrated that this Pose Normalization Network (PNN) can capture the style of the target object and is able to re-render it to a desired viewpoint. Moreover, we have shown that the PNN improves the classification result for the 3D chairs dataset and ShapeNet airplanes dataset when given only images at limited viewpoint, as compared to a CNN baseline.

Keywords: convolutional neural networks, object classification, pose normalization, viewpoint invariant

Procedia PDF Downloads 303
8189 Enhancing Fault Detection in Rotating Machinery Using Wiener-CNN Method

Authors: Mohamad R. Moshtagh, Ahmad Bagheri

Abstract:

Accurate fault detection in rotating machinery is of utmost importance to ensure optimal performance and prevent costly downtime in industrial applications. This study presents a robust fault detection system based on vibration data collected from rotating gears under various operating conditions. The considered scenarios include: (1) both gears being healthy, (2) one healthy gear and one faulty gear, and (3) introducing an imbalanced condition to a healthy gear. Vibration data was acquired using a Hentek 1008 device and stored in a CSV file. Python code implemented in the Spider environment was used for data preprocessing and analysis. Winner features were extracted using the Wiener feature selection method. These features were then employed in multiple machine learning algorithms, including Convolutional Neural Networks (CNN), Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), and Random Forest, to evaluate their performance in detecting and classifying faults in both the training and validation datasets. The comparative analysis of the methods revealed the superior performance of the Wiener-CNN approach. The Wiener-CNN method achieved a remarkable accuracy of 100% for both the two-class (healthy gear and faulty gear) and three-class (healthy gear, faulty gear, and imbalanced) scenarios in the training and validation datasets. In contrast, the other methods exhibited varying levels of accuracy. The Wiener-MLP method attained 100% accuracy for the two-class training dataset and 100% for the validation dataset. For the three-class scenario, the Wiener-MLP method demonstrated 100% accuracy in the training dataset and 95.3% accuracy in the validation dataset. The Wiener-KNN method yielded 96.3% accuracy for the two-class training dataset and 94.5% for the validation dataset. In the three-class scenario, it achieved 85.3% accuracy in the training dataset and 77.2% in the validation dataset. The Wiener-Random Forest method achieved 100% accuracy for the two-class training dataset and 85% for the validation dataset, while in the three-class training dataset, it attained 100% accuracy and 90.8% accuracy for the validation dataset. The exceptional accuracy demonstrated by the Wiener-CNN method underscores its effectiveness in accurately identifying and classifying fault conditions in rotating machinery. The proposed fault detection system utilizes vibration data analysis and advanced machine learning techniques to improve operational reliability and productivity. By adopting the Wiener-CNN method, industrial systems can benefit from enhanced fault detection capabilities, facilitating proactive maintenance and reducing equipment downtime.

Keywords: fault detection, gearbox, machine learning, wiener method

Procedia PDF Downloads 47
8188 Evaluating Models Through Feature Selection Methods Using Data Driven Approach

Authors: Shital Patil, Surendra Bhosale

Abstract:

Cardiac diseases are the leading causes of mortality and morbidity in the world, from recent few decades accounting for a large number of deaths have emerged as the most life-threatening disorder globally. Machine learning and Artificial intelligence have been playing key role in predicting the heart diseases. A relevant set of feature can be very helpful in predicting the disease accurately. In this study, we proposed a comparative analysis of 4 different features selection methods and evaluated their performance with both raw (Unbalanced dataset) and sampled (Balanced) dataset. The publicly available Z-Alizadeh Sani dataset have been used for this study. Four feature selection methods: Data Analysis, minimum Redundancy maximum Relevance (mRMR), Recursive Feature Elimination (RFE), Chi-squared are used in this study. These methods are tested with 8 different classification models to get the best accuracy possible. Using balanced and unbalanced dataset, the study shows promising results in terms of various performance metrics in accurately predicting heart disease. Experimental results obtained by the proposed method with the raw data obtains maximum AUC of 100%, maximum F1 score of 94%, maximum Recall of 98%, maximum Precision of 93%. While with the balanced dataset obtained results are, maximum AUC of 100%, F1-score 95%, maximum Recall of 95%, maximum Precision of 97%.

Keywords: cardio vascular diseases, machine learning, feature selection, SMOTE

Procedia PDF Downloads 79
8187 Static and Dynamic Hand Gesture Recognition Using Convolutional Neural Network Models

Authors: Keyi Wang

Abstract:

Similar to the touchscreen, hand gesture based human-computer interaction (HCI) is a technology that could allow people to perform a variety of tasks faster and more conveniently. This paper proposes a training method of an image-based hand gesture image and video clip recognition system using a CNN (Convolutional Neural Network) with a dataset. A dataset containing 6 hand gesture images is used to train a 2D CNN model. ~98% accuracy is achieved. Furthermore, a 3D CNN model is trained on a dataset containing 4 hand gesture video clips resulting in ~83% accuracy. It is demonstrated that a Cozmo robot loaded with pre-trained models is able to recognize static and dynamic hand gestures.

Keywords: deep learning, hand gesture recognition, computer vision, image processing

Procedia PDF Downloads 105
8186 Data Mining Approach: Classification Model Evaluation

Authors: Lubabatu Sada Sodangi

Abstract:

The rapid growth in exchange and accessibility of information via the internet makes many organisations acquire data on their own operation. The aim of data mining is to analyse the different behaviour of a dataset using observation. Although, the subset of the dataset being analysed may not display all the behaviours and relationships of the entire data and, therefore, may not represent other parts that exist in the dataset. There is a range of techniques used in data mining to determine the hidden or unknown information in datasets. In this paper, the performance of two algorithms Chi-Square Automatic Interaction Detection (CHAID) and multilayer perceptron (MLP) would be matched using an Adult dataset to find out the percentage of an/the adults that earn > 50k and those that earn <= 50k per year. The two algorithms were studied and compared using IBM SPSS statistics software. The result for CHAID shows that the most important predictors are relationship and education. The algorithm shows that those are married (husband) and have qualification: Bachelor, Masters, Doctorate or Prof-school whose their age is > 41<57 earn > 50k. Also, multilayer perceptron displays marital status and capital gain as the most important predictors of the income. It also shows that individuals that their capital gain is less than 6,849 and are single, separated or widow, earn <= 50K, whereas individuals with their capital gain is > 6,849, work > 35 hrs/wk, and > 27yrs their income will be > 50k. By comparing the two algorithms, it is observed that both algorithms are reliable but there is strong reliability in CHAID which clearly shows that relation and education contribute to the prediction as displayed in the data visualisation.

Keywords: data mining, CHAID, multi-layer perceptron, SPSS, Adult dataset

Procedia PDF Downloads 349
8185 Video Object Segmentation for Automatic Image Annotation of Ethernet Connectors with Environment Mapping and 3D Projection

Authors: Marrone Silverio Melo Dantas Pedro Henrique Dreyer, Gabriel Fonseca Reis de Souza, Daniel Bezerra, Ricardo Souza, Silvia Lins, Judith Kelner, Djamel Fawzi Hadj Sadok

Abstract:

The creation of a dataset is time-consuming and often discourages researchers from pursuing their goals. To overcome this problem, we present and discuss two solutions adopted for the automation of this process. Both optimize valuable user time and resources and support video object segmentation with object tracking and 3D projection. In our scenario, we acquire images from a moving robotic arm and, for each approach, generate distinct annotated datasets. We evaluated the precision of the annotations by comparing these with a manually annotated dataset, as well as the efficiency in the context of detection and classification problems. For detection support, we used YOLO and obtained for the projection dataset an F1-Score, accuracy, and mAP values of 0.846, 0.924, and 0.875, respectively. Concerning the tracking dataset, we achieved an F1-Score of 0.861, an accuracy of 0.932, whereas mAP reached 0.894. In order to evaluate the quality of the annotated images used for classification problems, we employed deep learning architectures. We adopted metrics accuracy and F1-Score, for VGG, DenseNet, MobileNet, Inception, and ResNet. The VGG architecture outperformed the others for both projection and tracking datasets. It reached an accuracy and F1-score of 0.997 and 0.993, respectively. Similarly, for the tracking dataset, it achieved an accuracy of 0.991 and an F1-Score of 0.981.

Keywords: RJ45, automatic annotation, object tracking, 3D projection

Procedia PDF Downloads 132
8184 Engagement Analysis Using DAiSEE Dataset

Authors: Naman Solanki, Souraj Mondal

Abstract:

With the world moving towards online communication, the video datastore has exploded in the past few years. Consequently, it has become crucial to analyse participant’s engagement levels in online communication videos. Engagement prediction of people in videos can be useful in many domains, like education, client meetings, dating, etc. Video-level or frame-level prediction of engagement for a user involves the development of robust models that can capture facial micro-emotions efficiently. For the development of an engagement prediction model, it is necessary to have a widely-accepted standard dataset for engagement analysis. DAiSEE is one of the datasets which consist of in-the-wild data and has a gold standard annotation for engagement prediction. Earlier research done using the DAiSEE dataset involved training and testing standard models like CNN-based models, but the results were not satisfactory according to industry standards. In this paper, a multi-level classification approach has been introduced to create a more robust model for engagement analysis using the DAiSEE dataset. This approach has recorded testing accuracies of 0.638, 0.7728, 0.8195, and 0.866 for predicting boredom level, engagement level, confusion level, and frustration level, respectively.

Keywords: computer vision, engagement prediction, deep learning, multi-level classification

Procedia PDF Downloads 85
8183 Advantages of a New Manufacturing Facility for the Production of Nanofiber

Authors: R. Knizek, D. Karhankova

Abstract:

The production of nanofibers and the machinery for their production is a current issue. The pioneer, in the industrial production of nanofibers, is the machinery with the sales descriptions NanospiderTM from the company Elmarco, which came into being in 2008. Most of the production facilities, like NanospiderTM, use electrospinning. There are also other methods of industrial production of nanofibers, such as the centrifugal spinning process, which is used by FibeRio Technology Corporation. However, each method and machine has its advantages, but also disadvantages and that is the reason why a new machine called as Nanomachine, which eliminates the disadvantages of other production facilities producing nanofibers, has been developed.

Keywords: nanomachine, nanospider, spinning slat, electrospinning

Procedia PDF Downloads 275
8182 Face Recognition Using Body-Worn Camera: Dataset and Baseline Algorithms

Authors: Ali Almadan, Anoop Krishnan, Ajita Rattani

Abstract:

Facial recognition is a widely adopted technology in surveillance, border control, healthcare, banking services, and lately, in mobile user authentication with Apple introducing “Face ID” moniker with iPhone X. A lot of research has been conducted in the area of face recognition on datasets captured by surveillance cameras, DSLR, and mobile devices. Recently, face recognition technology has also been deployed on body-worn cameras to keep officers safe, enabling situational awareness and providing evidence for trial. However, limited academic research has been conducted on this topic so far, without the availability of any publicly available datasets with a sufficient sample size. This paper aims to advance research in the area of face recognition using body-worn cameras. To this aim, the contribution of this work is two-fold: (1) collection of a dataset consisting of a total of 136,939 facial images of 102 subjects captured using body-worn cameras in in-door and daylight conditions and (2) evaluation of various deep-learning architectures for face identification on the collected dataset. Experimental results suggest a maximum True Positive Rate(TPR) of 99.86% at False Positive Rate(FPR) of 0.000 obtained by SphereFace based deep learning architecture in daylight condition. The collected dataset and the baseline algorithms will promote further research and development. A downloadable link of the dataset and the algorithms is available by contacting the authors.

Keywords: face recognition, body-worn cameras, deep learning, person identification

Procedia PDF Downloads 134
8181 Environmental Performance of Olive Oil Production in Greece

Authors: P. Tsarouhas, Ch. Achillas, D. Aidonis, D. Folinas, V. Maslis, N. Moussiopoulos

Abstract:

Agricultural production is a sector with high socioeconomic significance and key implications on employment and nutritional security. However, the impacts of agrifood production and consumption patterns on the environment are considerable, mainly due to the demand of large inputs of resources. This paper presents a case study of olive oil production in Greece, an important agri-product especially for countries in the Mediterranean basin. Life Cycle Analysis has been used to quantify the environmental performance of olive oil production. All key parameters that are associated with the life cycle of olive oil production are studied and environmental “hotspots” are diagnosed.

Keywords: LCA, olive oil production, environmental impact, case study, Greece

Procedia PDF Downloads 402
8180 An Approach of High Scalable Production Capacity by Adaption of the Concept 'Everything as a Service'

Authors: Johannes Atug, Stefan Braunreuther, Gunther Reinhart

Abstract:

Volatile markets, as well as increasing global competition in manufacturing, lead to a high demand of flexible and agile production systems. These advanced production systems in turn conduct to high capital expenditure along with high investment risks. Developments in production regarding digitalization and cyber-physical systems result to a merger of informational- and operational technology. The approach of this paper is to benefit from this merger and present a framework of a production network with scalable production capacity and low capital expenditure by adaptation of the IT concept 'everything as a service' into the production environment.

Keywords: digital manufacturing system, everything as a service, reconfigurable production, value network

Procedia PDF Downloads 310
8179 Design and Implementation a Platform for Adaptive Online Learning Based on Fuzzy Logic

Authors: Budoor Al Abid

Abstract:

Educational systems are increasingly provided as open online services, providing guidance and support for individual learners. To adapt the learning systems, a proper evaluation must be made. This paper builds the evaluation model Fuzzy C Means Adaptive System (FCMAS) based on data mining techniques to assess the difficulty of the questions. The following steps are implemented; first using a dataset from an online international learning system called (slepemapy.cz) the dataset contains over 1300000 records with 9 features for students, questions and answers information with feedback evaluation. Next, a normalization process as preprocessing step was applied. Then FCM clustering algorithms are used to adaptive the difficulty of the questions. The result is three cluster labeled data depending on the higher Wight (easy, Intermediate, difficult). The FCM algorithm gives a label to all the questions one by one. Then Random Forest (RF) Classifier model is constructed on the clustered dataset uses 70% of the dataset for training and 30% for testing; the result of the model is a 99.9% accuracy rate. This approach improves the Adaptive E-learning system because it depends on the student behavior and gives accurate results in the evaluation process more than the evaluation system that depends on feedback only.

Keywords: machine learning, adaptive, fuzzy logic, data mining

Procedia PDF Downloads 163
8178 Application of Production Planning to Improve Operation in Local Factory

Authors: Bashayer Al-Enezi, Budoor Al-Sabti, Eman Al-Durai, Fatmah Kalban, Meshael Ahmed

Abstract:

Production planning and control principles are concerned with planning, controlling and balancing all aspects of manufacturing including raw materials, finished goods, production schedules, and equipment requirements. Hence, an effective production planning and control system is very critical to the success of any factory. This project will focus on the application of production planning and control principles on “The National Canned Food Production and Trading Company (NCFP)” factory to find problems or areas for improvement.

Keywords: production planning, operations improvement, inventory management, National Canned Food Production and Trading Company (NCFP)

Procedia PDF Downloads 465
8177 Life Cycle Assessment of Bioethanol from Feedstocks in Thailand

Authors: Thanapat Chaireongsirikul, Apichit Svang-Ariyaskul

Abstract:

An analysis of mass balance, energy performance, and environmental impact assessment were performed to evaluate bioethanol production in Thailand. Thailand is an agricultural country. Thai government plans to increase the use of alternative energy to 20 percent by 2022. One of the primary campaigns is to promote a bioethanol production from abundant biomass resources such as bitter cassava, molasses and sugarcane. The bioethanol production is composed of three stages: cultivation, pretreatment, and bioethanol conversion. All of mass, material, fuel, and energy were calculated to determine the environmental impact of three types of bioethanol production: bioethanol production from cassava (CBP), bioethanol production from molasses (MBP), and bioethanol production from rice straw (RBP). The results showed that bioethanol production from cassava has the best environmental performance. CBP contributes less impact when compared to the other processes.

Keywords: bioethanol production, biofuel, LCA, chemical engineering

Procedia PDF Downloads 333