Search results for: categorical datasets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 809

Search results for: categorical datasets

689 Using Satellite Images Datasets for Road Intersection Detection in Route Planning

Authors: Fatma El-Zahraa El-Taher, Ayman Taha, Jane Courtney, Susan Mckeever

Abstract:

Understanding road networks plays an important role in navigation applications such as self-driving vehicles and route planning for individual journeys. Intersections of roads are essential components of road networks. Understanding the features of an intersection, from a simple T-junction to larger multi-road junctions, is critical to decisions such as crossing roads or selecting the safest routes. The identification and profiling of intersections from satellite images is a challenging task. While deep learning approaches offer the state-of-the-art in image classification and detection, the availability of training datasets is a bottleneck in this approach. In this paper, a labelled satellite image dataset for the intersection recognition problem is presented. It consists of 14,692 satellite images of Washington DC, USA. To support other users of the dataset, an automated download and labelling script is provided for dataset replication. The challenges of construction and fine-grained feature labelling of a satellite image dataset is examined, including the issue of how to address features that are spread across multiple images. Finally, the accuracy of the detection of intersections in satellite images is evaluated.

Keywords: satellite images, remote sensing images, data acquisition, autonomous vehicles

Procedia PDF Downloads 111
688 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features corresponding to the number of samples is high and the data suffer from being imbalanced. The feature selection method has been used to select features having more ability to distinguish classes and eliminating obscures features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to highlight optimized hyper-parameters of CNN. In order to make the process of classification by CNN faster, Graphics Processing Unit (GPU) is recommended for calculating the mathematic equation in a parallel way. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: cancer classification, feature selection, deep learning, genetic algorithm

Procedia PDF Downloads 88
687 Disaggregation the Daily Rainfall Dataset into Sub-Daily Resolution in the Temperate Oceanic Climate Region

Authors: Mohammad Bakhshi, Firas Al Janabi

Abstract:

High resolution rain data are very important to fulfill the input of hydrological models. Among models of high-resolution rainfall data generation, the temporal disaggregation was chosen for this study. The paper attempts to generate three different rainfall resolutions (4-hourly, hourly and 10-minutes) from daily for around 20-year record period. The process was done by DiMoN tool which is based on random cascade model and method of fragment. Differences between observed and simulated rain dataset are evaluated with variety of statistical and empirical methods: Kolmogorov-Smirnov test (K-S), usual statistics, and Exceedance probability. The tool worked well at preserving the daily rainfall values in wet days, however, the generated data are cumulated in a shorter time period and made stronger storms. It is demonstrated that the difference between generated and observed cumulative distribution function curve of 4-hourly datasets is passed the K-S test criteria while in hourly and 10-minutes datasets the P-value should be employed to prove that their differences were reasonable. The results are encouraging considering the overestimation of generated high-resolution rainfall data.

Keywords: DiMoN Tool, disaggregation, exceedance probability, Kolmogorov-Smirnov test, rainfall

Procedia PDF Downloads 178
686 Constructing a Semi-Supervised Model for Network Intrusion Detection

Authors: Tigabu Dagne Akal

Abstract:

While advances in computer and communications technology have made the network ubiquitous, they have also rendered networked systems vulnerable to malicious attacks devised from a distance. These attacks or intrusions start with attackers infiltrating a network through a vulnerable host and then launching further attacks on the local network or Intranet. Nowadays, system administrators and network professionals can attempt to prevent such attacks by developing intrusion detection tools and systems using data mining technology. In this study, the experiments were conducted following the Knowledge Discovery in Database Process Model. The Knowledge Discovery in Database Process Model starts from selection of the datasets. The dataset used in this study has been taken from Massachusetts Institute of Technology Lincoln Laboratory. After taking the data, it has been pre-processed. The major pre-processing activities include fill in missed values, remove outliers; resolve inconsistencies, integration of data that contains both labelled and unlabelled datasets, dimensionality reduction, size reduction and data transformation activity like discretization tasks were done for this study. A total of 21,533 intrusion records are used for training the models. For validating the performance of the selected model a separate 3,397 records are used as a testing set. For building a predictive model for intrusion detection J48 decision tree and the Naïve Bayes algorithms have been tested as a classification approach for both with and without feature selection approaches. The model that was created using 10-fold cross validation using the J48 decision tree algorithm with the default parameter values showed the best classification accuracy. The model has a prediction accuracy of 96.11% on the training datasets and 93.2% on the test dataset to classify the new instances as normal, DOS, U2R, R2L and probe classes. The findings of this study have shown that the data mining methods generates interesting rules that are crucial for intrusion detection and prevention in the networking industry. Future research directions are forwarded to come up an applicable system in the area of the study.

Keywords: intrusion detection, data mining, computer science, data mining

Procedia PDF Downloads 268
685 Generation of High-Quality Synthetic CT Images from Cone Beam CT Images Using A.I. Based Generative Networks

Authors: Heeba A. Gurku

Abstract:

Introduction: Cone Beam CT(CBCT) images play an integral part in proper patient positioning in cancer patients undergoing radiation therapy treatment. But these images are low in quality. The purpose of this study is to generate high-quality synthetic CT images from CBCT using generative models. Material and Methods: This study utilized two datasets from The Cancer Imaging Archive (TCIA) 1) Lung cancer dataset of 20 patients (with full view CBCT images) and 2) Pancreatic cancer dataset of 40 patients (only 27 patients having limited view images were included in the study). Cycle Generative Adversarial Networks (GAN) and its variant Attention Guided Generative Adversarial Networks (AGGAN) models were used to generate the synthetic CTs. Models were evaluated by visual evaluation and on four metrics, Structural Similarity Index Measure (SSIM), Peak Signal Noise Ratio (PSNR) Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), to compare the synthetic CT and original CT images. Results: For pancreatic dataset with limited view CBCT images, our study showed that in Cycle GAN model, MAE, RMSE, PSNR improved from 12.57to 8.49, 20.94 to 15.29 and 21.85 to 24.63, respectively but structural similarity only marginally increased from 0.78 to 0.79. Similar, results were achieved with AGGAN with no improvement over Cycle GAN. However, for lung dataset with full view CBCT images Cycle GAN was able to reduce MAE significantly from 89.44 to 15.11 and AGGAN was able to reduce it to 19.77. Similarly, RMSE was also decreased from 92.68 to 23.50 in Cycle GAN and to 29.02 in AGGAN. SSIM and PSNR also improved significantly from 0.17 to 0.59 and from 8.81 to 21.06 in Cycle GAN respectively while in AGGAN SSIM increased to 0.52 and PSNR increased to 19.31. In both datasets, GAN models were able to reduce artifacts, reduce noise, have better resolution, and better contrast enhancement. Conclusion and Recommendation: Both Cycle GAN and AGGAN were significantly able to reduce MAE, RMSE and PSNR in both datasets. However, full view lung dataset showed more improvement in SSIM and image quality than limited view pancreatic dataset.

Keywords: CT images, CBCT images, cycle GAN, AGGAN

Procedia PDF Downloads 55
684 Traumatic Brain Injury in Cameroon: A Prospective Observational Study in a Level 1 Trauma Centre

Authors: Franklin Chu Buh, Irene Ule Ngole Sumbele, Andrew I. R. Maas, Mathieu Motah, Jogi V. Pattisapu, Eric Youm, Basil Kum Meh, Firas H. Kobeissy, Kevin W. Wang, Peter J. A. Hutchinson, Germain Sotoing Taiwe

Abstract:

Introduction: Studying TBI characteristics and their relation to outcomes can identify initiatives to improve TBI prevention and care. The objective of this study was to define the features and outcomes of TBI patients seen over a 1-year period in a level-I trauma center in Cameroon. Methods: Data on demographics, causes, injury mechanisms, clinical aspects, and discharge status were prospectively collected over a period of 12 months. The Glasgow Outcome Scale-Extended (GOSE) and the Quality of Life Questionnaire after Brain Injury (QoLIBRI) were used to evaluate outcomes 6-months after TBI. Categorical variables were described as frequencies and percentages. Comparisons between 2 categorical variables were done using Pearson's Chi-square test or Fisher's exact test. Results: A total of 160 TBI patients participated in the study. The age group 15-45 years (78%; 125) was most represented. Males were more affected (90%; 144). Low educational level was recorded in 122 (76%) cases. Road traffic incidents (RTI) were the main cause of TBI (85%), with professional bike riders being frequently involved (27%, 43/160). Assaults (7.5%) and falls (2.5%) represent the second and third most common causes of TBI in Cameroon, respectively. Only 15 patients were transported to the hospital by ambulance, and 14 of these were from a referring hospital. CT-imaging was performed in 78% (125/160) of cases intracranial traumatic abnormality was identified in 77/125 (64%) cases. Financial constraints were the main reason for not performing a CT scan on 35 patients. A total of 46 (33%) patients were discharged against medical advice (DAMA) due to financial constraints. Mortality was 14% (22/160) but disproportionately high in patients with severe TBI (46%). DAMA had poor outcomes with QoLIBRI. Only 4 patients received post-injury physiotherapy services. Conclusion: TBI in Cameroon mainly results from RTIs and commonly affects young adult males, and low educational or socioeconomic status and commercial bike riding appear to be predisposing factors. Lack of pre-hospital care, financial constraints limiting both CT-scanning and medical care, and lack of acute physiotherapy services likely influenced care and outcomes adversely.

Keywords: characteristics, traumatic brain injury, outcome, disparities in care, prospective study

Procedia PDF Downloads 97
683 Emotion Mining and Attribute Selection for Actionable Recommendations to Improve Customer Satisfaction

Authors: Jaishree Ranganathan, Poonam Rajurkar, Angelina A. Tzacheva, Zbigniew W. Ras

Abstract:

In today’s world, business often depends on the customer feedback and reviews. Sentiment analysis helps identify and extract information about the sentiment or emotion of the of the topic or document. Attribute selection is a challenging problem, especially with large datasets in actionable pattern mining algorithms. Action Rule Mining is one of the methods to discover actionable patterns from data. Action Rules are rules that help describe specific actions to be made in the form of conditions that help achieve the desired outcome. The rules help to change from any undesirable or negative state to a more desirable or positive state. In this paper, we present a Lexicon based weighted scheme approach to identify emotions from customer feedback data in the area of manufacturing business. Also, we use Rough sets and explore the attribute selection method for large scale datasets. Then we apply Actionable pattern mining to extract possible emotion change recommendations. This kind of recommendations help business analyst to improve their customer service which leads to customer satisfaction and increase sales revenue.

Keywords: actionable pattern discovery, attribute selection, business data, data mining, emotion

Procedia PDF Downloads 168
682 The Hurricane 'Bump': Measuring the Effects of Hurricanes on Wages in Southern Louisiana

Authors: Jasmine Latiolais

Abstract:

Much of the disaster-related literature finds a positive relationship between the impact of a natural disaster and the growth of wages. Panel datasets are often used to explore these effects. However, natural disasters do not impact a single variable in the economy. Rather, natural disasters affect all facets of the economy, simultaneously, upon impact. It is difficult to control for all factors that would be influenced by the impact of a natural disaster, which can lead to lead to omitted variable bias in those studies employing panel datasets. To address this issue of omitted variable bias, an interrupted time series analysis is used to test the short-run relationship between the impact of Hurricanes Katrina and Rita on parish wage levels in Southern Louisiana, inherently controlling for economic conditions. This study provides evidence that natural disasters do increase wages in the very short term (one quarter following the impact of the hurricane) but that these results are not seen in the longer term and are not robust. In addition, the significance of the coefficients changes depending on the parish. Overall, this study finds that previous literature on this topic may not be robust when considered through a time-series lens.

Keywords: economic recovery, local economies, local wage growth, natural disasters

Procedia PDF Downloads 108
681 Modeling Geogenic Groundwater Contamination Risk with the Groundwater Assessment Platform (GAP)

Authors: Joel Podgorski, Manouchehr Amini, Annette Johnson, Michael Berg

Abstract:

One-third of the world’s population relies on groundwater for its drinking water. Natural geogenic arsenic and fluoride contaminate ~10% of wells. Prolonged exposure to high levels of arsenic can result in various internal cancers, while high levels of fluoride are responsible for the development of dental and crippling skeletal fluorosis. In poor urban and rural settings, the provision of drinking water free of geogenic contamination can be a major challenge. In order to efficiently apply limited resources in the testing of wells, water resource managers need to know where geogenically contaminated groundwater is likely to occur. The Groundwater Assessment Platform (GAP) fulfills this need by providing state-of-the-art global arsenic and fluoride contamination hazard maps as well as enabling users to create their own groundwater quality models. The global risk models were produced by logistic regression of arsenic and fluoride measurements using predictor variables of various soil, geological and climate parameters. The maps display the probability of encountering concentrations of arsenic or fluoride exceeding the World Health Organization’s (WHO) stipulated concentration limits of 10 µg/L or 1.5 mg/L, respectively. In addition to a reconsideration of the relevant geochemical settings, these second-generation maps represent a great improvement over the previous risk maps due to a significant increase in data quantity and resolution. For example, there is a 10-fold increase in the number of measured data points, and the resolution of predictor variables is generally 60 times greater. These same predictor variable datasets are available on the GAP platform for visualization as well as for use with a modeling tool. The latter requires that users upload their own concentration measurements and select the predictor variables that they wish to incorporate in their models. In addition, users can upload additional predictor variable datasets either as features or coverages. Such models can represent an improvement over the global models already supplied, since (a) users may be able to use their own, more detailed datasets of measured concentrations and (b) the various processes leading to arsenic and fluoride groundwater contamination can be isolated more effectively on a smaller scale, thereby resulting in a more accurate model. All maps, including user-created risk models, can be downloaded as PDFs. There is also the option to share data in a secure environment as well as the possibility to collaborate in a secure environment through the creation of communities. In summary, GAP provides users with the means to reliably and efficiently produce models specific to their region of interest by making available the latest datasets of predictor variables along with the necessary modeling infrastructure.

Keywords: arsenic, fluoride, groundwater contamination, logistic regression

Procedia PDF Downloads 314
680 Life Cycle Datasets for the Ornamental Stone Sector

Authors: Isabella Bianco, Gian Andrea Blengini

Abstract:

The environmental impact related to ornamental stones (such as marbles and granites) is largely debated. Starting from the industrial revolution, continuous improvements of machineries led to a higher exploitation of this natural resource and to a more international interaction between markets. As a consequence, the environmental impact of the extraction and processing of stones has increased. Nevertheless, if compared with other building materials, ornamental stones are generally more durable, natural, and recyclable. From the scientific point of view, studies on stone life cycle sustainability have been carried out, but these are often partial or not very significant because of the high percentage of approximations and assumptions in calculations. This is due to the lack, in life cycle databases (e.g. Ecoinvent, Thinkstep, and ELCD), of datasets about the specific technologies employed in the stone production chain. For example, databases do not contain information about diamond wires, chains or explosives, materials commonly used in quarries and transformation plants. The project presented in this paper aims to populate the life cycle databases with specific data of specific stone processes. To this goal, the methodology follows the standardized approach of Life Cycle Assessment (LCA), according to the requirements of UNI 14040-14044 and to the International Reference Life Cycle Data System (ILCD) Handbook guidelines of the European Commission. The study analyses the processes of the entire production chain (from-cradle-to-gate system boundaries), including the extraction of benches, the cutting of blocks into slabs/tiles and the surface finishing. Primary data have been collected in Italian quarries and transformation plants which use technologies representative of the current state-of-the-art. Since the technologies vary according to the hardness of the stone, the case studies comprehend both soft stones (marbles) and hard stones (gneiss). In particular, data about energy, materials and emissions were collected in marble basins of Carrara and in Beola and Serizzo basins located in the province of Verbano Cusio Ossola. Data were then elaborated through an appropriate software to build a life cycle model. The model was realized setting free parameters that allow an easy adaptation to specific productions. Through this model, the study aims to boost the direct participation of stone companies and encourage the use of LCA tool to assess and improve the stone sector environmental sustainability. At the same time, the realization of accurate Life Cycle Inventory data aims at making available, to researchers and stone experts, ILCD compliant datasets of the most significant processes and technologies related to the ornamental stone sector.

Keywords: life cycle assessment, LCA datasets, ornamental stone, stone environmental impact

Procedia PDF Downloads 199
679 Evaluating Machine Learning Techniques for Activity Classification in Smart Home Environments

Authors: Talal Alshammari, Nasser Alshammari, Mohamed Sedky, Chris Howard

Abstract:

With the widespread adoption of the Internet-connected devices, and with the prevalence of the Internet of Things (IoT) applications, there is an increased interest in machine learning techniques that can provide useful and interesting services in the smart home domain. The areas that machine learning techniques can help advance are varied and ever-evolving. Classifying smart home inhabitants’ Activities of Daily Living (ADLs), is one prominent example. The ability of machine learning technique to find meaningful spatio-temporal relations of high-dimensional data is an important requirement as well. This paper presents a comparative evaluation of state-of-the-art machine learning techniques to classify ADLs in the smart home domain. Forty-two synthetic datasets and two real-world datasets with multiple inhabitants are used to evaluate and compare the performance of the identified machine learning techniques. Our results show significant performance differences between the evaluated techniques. Such as AdaBoost, Cortical Learning Algorithm (CLA), Decision Trees, Hidden Markov Model (HMM), Multi-layer Perceptron (MLP), Structured Perceptron and Support Vector Machines (SVM). Overall, neural network based techniques have shown superiority over the other tested techniques.

Keywords: activities of daily living, classification, internet of things, machine learning, prediction, smart home

Procedia PDF Downloads 312
678 Narcissism and Kohut's Self-Psychology: Self Practices in Service of Self-Transcendence

Authors: Noelene Rose

Abstract:

The DSM has been plagued with conceptual issues since its inception, not least discriminant validity and comorbidity issues. An attempt to remain a-theoretical in the divide between the psycho-dynamicists and the behaviourists contributed to much of this, in particular relating to the Personality Disorders. With the DSM-5, although the criterion have remained unchanged, major conceptual and structural directions have been flagged and proposed in section III. The biggest changes concern the Personality Disorders. While Narcissistic Personality Disorder (NPD) was initially tagged for removal, instead the addition of section III proposes a move away from a categorical approach to a more dimensional approach, with a measure of Global Function of Personality. This global measure is an assessment of impairment of self-other relations; a measure of trait narcissism. In the same way mainstream psychology has struggled in its diagnosis of narcissism, so too in its treatment. Kohut’s self psychology represents the most significant inroad in theory and treatment for the narcissistic disorders. Kohut had moved away from a categorical system, towards disorders of the self. According to this theory, disorders of the self are the result of childhood trauma (impaired attunement) resulting in a developmental arrest. Self-psychological, Psychodynamic treatment of narcissism, however, is expensive, in time and money and outside the awareness or access of most people. There is more than a suggestion that narcissism is on the increase, created in trauma and worsened by a fearful world climate. A dimensional model of narcissism, from mild to severe, requires cut off points for diagnosis. But where do we draw the line? Mainstream psychology is inclined to set it high when there is some degree of impairment in functioning in daily life. Transpersonal Psychology is inclined to set it low, with the concept that we all have some degree of narcissism and that it is the point and the path of our life journey to transcend our focus on our selves. Mainstream psychology stops its focus on trait narcissism with a healthy level of self esteem, but it is at this point that Transpersonal Psychology can complement the discussion. From a Transpersonal point of view, failure to begin the process of self-transcendence will also create emotional symptoms of meaning or purpose, often later in our lives, and is also conceived of as a developmental arrest. The maps for this transcendence are hidden in plain sight; in the chakras of kundalini yoga, in the sacraments of the Catholic Church, in the Kabbalah tree of life of Judaism, in Maslow’s hierarchy of needs, to name a few. This paper outlines some proposed research exploring the use of daily practices that can be incorporated into the therapy room; practices that utilise meditation, visualisation and imagination: that are informed by spiritual technology and guided by the psychodynamic theory of Self Psychology.

Keywords: narcissism, self-psychology, self-practice, self-transcendence

Procedia PDF Downloads 234
677 Gene Expression Profile Reveals Breast Cancer Proliferation and Metastasis

Authors: Nandhana Vivek, Bhaskar Gogoi, Ayyavu Mahesh

Abstract:

Breast cancer metastasis plays a key role in cancer progression and fatality. The present study examines the potential causes of metastasis in breast cancer by investigating the novel interactions between genes and their pathways. The gene expression profile of GSE99394, GSE1246464, and GSE103865 was downloaded from the GEO data repository to analyze the differentially expressed genes (DEGs). Protein-protein interactions, target factor interactions, pathways and gene relationships, and functional enrichment networks were investigated. The proliferation pathway was shown to be highly expressed in breast cancer progression and metastasis in all three datasets. Gene Ontology analysis revealed 11 DEGs as gene targets to control breast cancer metastasis: LYN, DLGAP5, CXCR4, CDC6, NANOG, IFI30, TXP2, AGTR1, MKI67, and FTH1. Upon studying the function, genomic and proteomic data, and pathway involvement of the target genes, DLGAP5 proved to be a promising candidate due to it being highly differentially expressed in all datasets. The study takes a unique perspective on the avenues through which DLGAP5 promotes metastasis. The current investigation helps pave the way in understanding the role DLGAP5 plays in metastasis, which leads to an increased incidence of death among breast cancer patients.

Keywords: genomics, metastasis, microarray, cancer

Procedia PDF Downloads 69
676 Facial Expression Phoenix (FePh): An Annotated Sequenced Dataset for Facial and Emotion-Specified Expressions in Sign Language

Authors: Marie Alaghband, Niloofar Yousefi, Ivan Garibay

Abstract:

Facial expressions are important parts of both gesture and sign language recognition systems. Despite the recent advances in both fields, annotated facial expression datasets in the context of sign language are still scarce resources. In this manuscript, we introduce an annotated sequenced facial expression dataset in the context of sign language, comprising over 3000 facial images extracted from the daily news and weather forecast of the public tv-station PHOENIX. Unlike the majority of currently existing facial expression datasets, FePh provides sequenced semi-blurry facial images with different head poses, orientations, and movements. In addition, in the majority of images, identities are mouthing the words, which makes the data more challenging. To annotate this dataset we consider primary, secondary, and tertiary dyads of seven basic emotions of "sad", "surprise", "fear", "angry", "neutral", "disgust", and "happy". We also considered the "None" class if the image’s facial expression could not be described by any of the aforementioned emotions. Although we provide FePh as a facial expression dataset of signers in sign language, it has a wider application in gesture recognition and Human Computer Interaction (HCI) systems.

Keywords: annotated facial expression dataset, gesture recognition, sequenced facial expression dataset, sign language recognition

Procedia PDF Downloads 129
675 On the Fourth-Order Hybrid Beta Polynomial Kernels in Kernel Density Estimation

Authors: Benson Ade Eniola Afere

Abstract:

This paper introduces a family of fourth-order hybrid beta polynomial kernels developed for statistical analysis. The assessment of these kernels' performance centers on two critical metrics: asymptotic mean integrated squared error (AMISE) and kernel efficiency. Through the utilization of both simulated and real-world datasets, a comprehensive evaluation was conducted, facilitating a thorough comparison with conventional fourth-order polynomial kernels. The evaluation procedure encompassed the computation of AMISE and efficiency values for both the proposed hybrid kernels and the established classical kernels. The consistently observed trend was the superior performance of the hybrid kernels when compared to their classical counterparts. This trend persisted across diverse datasets, underscoring the resilience and efficacy of the hybrid approach. By leveraging these performance metrics and conducting evaluations on both simulated and real-world data, this study furnishes compelling evidence in favour of the superiority of the proposed hybrid beta polynomial kernels. The discernible enhancement in performance, as indicated by lower AMISE values and higher efficiency scores, strongly suggests that the proposed kernels offer heightened suitability for statistical analysis tasks when compared to traditional kernels.

Keywords: AMISE, efficiency, fourth-order Kernels, hybrid Kernels, Kernel density estimation

Procedia PDF Downloads 42
674 Correlation between Speech Emotion Recognition Deep Learning Models and Noises

Authors: Leah Lee

Abstract:

This paper examines the correlation between deep learning models and emotions with noises to see whether or not noises mask emotions. The deep learning models used are plain convolutional neural networks (CNN), auto-encoder, long short-term memory (LSTM), and Visual Geometry Group-16 (VGG-16). Emotion datasets used are Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D), Toronto Emotional Speech Set (TESS), and Surrey Audio-Visual Expressed Emotion (SAVEE). To make it four times bigger, audio set files, stretch, and pitch augmentations are utilized. From the augmented datasets, five different features are extracted for inputs of the models. There are eight different emotions to be classified. Noise variations are white noise, dog barking, and cough sounds. The variation in the signal-to-noise ratio (SNR) is 0, 20, and 40. In summation, per a deep learning model, nine different sets with noise and SNR variations and just augmented audio files without any noises will be used in the experiment. To compare the results of the deep learning models, the accuracy and receiver operating characteristic (ROC) are checked.

Keywords: auto-encoder, convolutional neural networks, long short-term memory, speech emotion recognition, visual geometry group-16

Procedia PDF Downloads 38
673 Interface between Personal Values and Social Entrepreneurship in Social Projects That Develop Sports Practice

Authors: Leticia Lengler, Jefferson Oliveira, Vania Estivalete, Jordana Marques Kneipp

Abstract:

The context of social, economic and environmental transformations has driven innumerable changes in the organizational environment, influencing the social interactions that occur in this scenario. In this sense, social entrepreneurship emerges as a unique opportunity to challenge, question, rethink certain concepts and traditional theories widely discussed in relation to entrepreneurship. Therefore, the interest in studying personal values has been based on the idea that they might be predictors of the behavior of individuals. As an attempt to relate personal values with the characteristics of social entrepreneurs, this study aims to investigate the salient values and the social entrepreneurship perceptions that occur in two social projects responsible for developing sports skills among the students. For purposes of analysis, it is intended to consider: (i) a description of both Social Projects and their respective institutions, considering their history and relevance in the context; (ii) analysis of the personal values of the idealizers and teachers responsible for the projects, (iii) identification of the characteristics of social entrepreneurship manifested in the two projects, and (iv) discussion of similarities and disparities of the categories identified among the participants of the projects. Therefore, this study will carry a qualitative analysis from the interviews with 10 participants of each social project (named Projeto Remar/ASENA and Projeto Mãos Dadas/JUDÔ SANTA MARIA): 2 projects coordinators, 2 students, 2 parents of students, 2 physical education internships and 2 businessmen who stablished a partnership with each project. The data collection will be done through semi-structured interviews that are going to last around 30 minutes each, being recorded, transcribed and later analyzed, through the categorical analysis. The option for categorical analysis is supported by the fact that it is the best alternative when one wants to study values, opinions, attitudes and beliefs, through qualitative ones. In the present research, the pre-analysis phase consisted of an organization of the material collected during the research with Remar and Mãos Dadas Project, and a dynamic reading of this material, seeking to identify the characteristics of social entrepreneurship and values addressed in the study. In the analytical description phase, a more in-depth analysis of the material collected in the research will be carried out. The third phase, referred to as referential interpretation or treatment of results obtained will allow to verify the homogeneity and the heterogeneity among the participants' perceptions of the projects. Some preliminary results coming from the first interviews revealed the projects are guided by values such as cooperation, respect, well-being and nature preservation. These values are linked to the social entrepreneurship perception of the projects managers, who established their activities in behalf of the local community.

Keywords: personal values, social entrepreneurship, social projects, sports participants

Procedia PDF Downloads 329
672 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting

Authors: Kemal Polat

Abstract:

In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) for classification of Diabetes disease dataset has been used. The aims of this study are to reduce the variance within attributes of diabetes dataset and to improve the classification accuracy of classifier algorithm transforming from non-linear separable datasets to linearly separable datasets. Pima Indians Diabetes dataset has two classes including normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of K-means clustering method and is one of most used clustering methods in data mining and machine learning applications. In this study, as the first stage, fuzzy C-means clustering process has been used for finding the centers of attributes in Pima Indians diabetes dataset and then weighted the dataset according to the ratios of the means of attributes to centers of theirs. Secondly, after weighting process, the classifier algorithms including support vector machine (SVM) and k-NN (k- nearest neighbor) classifiers have been used for classifying weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of Pima Indians diabetes dataset.

Keywords: fuzzy C-means clustering, fuzzy C-means clustering based attribute weighting, Pima Indians diabetes, SVM

Procedia PDF Downloads 382
671 Classification of Potential Biomarkers in Breast Cancer Using Artificial Intelligence Algorithms and Anthropometric Datasets

Authors: Aref Aasi, Sahar Ebrahimi Bajgani, Erfan Aasi

Abstract:

Breast cancer (BC) continues to be the most frequent cancer in females and causes the highest number of cancer-related deaths in women worldwide. Inspired by recent advances in studying the relationship between different patient attributes and features and the disease, in this paper, we have tried to investigate the different classification methods for better diagnosis of BC in the early stages. In this regard, datasets from the University Hospital Centre of Coimbra were chosen, and different machine learning (ML)-based and neural network (NN) classifiers have been studied. For this purpose, we have selected favorable features among the nine provided attributes from the clinical dataset by using a random forest algorithm. This dataset consists of both healthy controls and BC patients, and it was noted that glucose, BMI, resistin, and age have the most importance, respectively. Moreover, we have analyzed these features with various ML-based classifier methods, including Decision Tree (DT), K-Nearest Neighbors (KNN), eXtreme Gradient Boosting (XGBoost), Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machine (SVM) along with NN-based Multi-Layer Perceptron (MLP) classifier. The results revealed that among different techniques, the SVM and MLP classifiers have the most accuracy, with amounts of 96% and 92%, respectively. These results divulged that the adopted procedure could be used effectively for the classification of cancer cells, and also it encourages further experimental investigations with more collected data for other types of cancers.

Keywords: breast cancer, diagnosis, machine learning, biomarker classification, neural network

Procedia PDF Downloads 97
670 Towards Real-Time Classification of Finger Movement Direction Using Encephalography Independent Components

Authors: Mohamed Mounir Tellache, Hiroyuki Kambara, Yasuharu Koike, Makoto Miyakoshi, Natsue Yoshimura

Abstract:

This study explores the practicality of using electroencephalographic (EEG) independent components to predict eight-direction finger movements in pseudo-real-time. Six healthy participants with individual-head MRI images performed finger movements in eight directions with two different arm configurations. The analysis was performed in two stages. The first stage consisted of using independent component analysis (ICA) to separate the signals representing brain activity from non-brain activity signals and to obtain the unmixing matrix. The resulting independent components (ICs) were checked, and those reflecting brain-activity were selected. Finally, the time series of the selected ICs were used to predict eight finger-movement directions using Sparse Logistic Regression (SLR). The second stage consisted of using the previously obtained unmixing matrix, the selected ICs, and the model obtained by applying SLR to classify a different EEG dataset. This method was applied to two different settings, namely the single-participant level and the group-level. For the single-participant level, the EEG dataset used in the first stage and the EEG dataset used in the second stage originated from the same participant. For the group-level, the EEG datasets used in the first stage were constructed by temporally concatenating each combination without repetition of the EEG datasets of five participants out of six, whereas the EEG dataset used in the second stage originated from the remaining participants. The average test classification results across datasets (mean ± S.D.) were 38.62 ± 8.36% for the single-participant, which was significantly higher than the chance level (12.50 ± 0.01%), and 27.26 ± 4.39% for the group-level which was also significantly higher than the chance level (12.49% ± 0.01%). The classification accuracy within [–45°, 45°] of the true direction is 70.03 ± 8.14% for single-participant and 62.63 ± 6.07% for group-level which may be promising for some real-life applications. Clustering and contribution analyses further revealed the brain regions involved in finger movement and the temporal aspect of their contribution to the classification. These results showed the possibility of using the ICA-based method in combination with other methods to build a real-time system to control prostheses.

Keywords: brain-computer interface, electroencephalography, finger motion decoding, independent component analysis, pseudo real-time motion decoding

Procedia PDF Downloads 115
669 Video Object Segmentation for Automatic Image Annotation of Ethernet Connectors with Environment Mapping and 3D Projection

Authors: Marrone Silverio Melo Dantas Pedro Henrique Dreyer, Gabriel Fonseca Reis de Souza, Daniel Bezerra, Ricardo Souza, Silvia Lins, Judith Kelner, Djamel Fawzi Hadj Sadok

Abstract:

The creation of a dataset is time-consuming and often discourages researchers from pursuing their goals. To overcome this problem, we present and discuss two solutions adopted for the automation of this process. Both optimize valuable user time and resources and support video object segmentation with object tracking and 3D projection. In our scenario, we acquire images from a moving robotic arm and, for each approach, generate distinct annotated datasets. We evaluated the precision of the annotations by comparing these with a manually annotated dataset, as well as the efficiency in the context of detection and classification problems. For detection support, we used YOLO and obtained for the projection dataset an F1-Score, accuracy, and mAP values of 0.846, 0.924, and 0.875, respectively. Concerning the tracking dataset, we achieved an F1-Score of 0.861, an accuracy of 0.932, whereas mAP reached 0.894. In order to evaluate the quality of the annotated images used for classification problems, we employed deep learning architectures. We adopted metrics accuracy and F1-Score, for VGG, DenseNet, MobileNet, Inception, and ResNet. The VGG architecture outperformed the others for both projection and tracking datasets. It reached an accuracy and F1-score of 0.997 and 0.993, respectively. Similarly, for the tracking dataset, it achieved an accuracy of 0.991 and an F1-Score of 0.981.

Keywords: RJ45, automatic annotation, object tracking, 3D projection

Procedia PDF Downloads 133
668 Performance Assessment of Multi-Level Ensemble for Multi-Class Problems

Authors: Rodolfo Lorbieski, Silvia Modesto Nassar

Abstract:

Many supervised machine learning tasks require decision making across numerous different classes. Multi-class classification has several applications, such as face recognition, text recognition and medical diagnostics. The objective of this article is to analyze an adapted method of Stacking in multi-class problems, which combines ensembles within the ensemble itself. For this purpose, a training similar to Stacking was used, but with three levels, where the final decision-maker (level 2) performs its training by combining outputs from the tree-based pair of meta-classifiers (level 1) from Bayesian families. These are in turn trained by pairs of base classifiers (level 0) of the same family. This strategy seeks to promote diversity among the ensembles forming the meta-classifier level 2. Three performance measures were used: (1) accuracy, (2) area under the ROC curve, and (3) time for three factors: (a) datasets, (b) experiments and (c) levels. To compare the factors, ANOVA three-way test was executed for each performance measure, considering 5 datasets by 25 experiments by 3 levels. A triple interaction between factors was observed only in time. The accuracy and area under the ROC curve presented similar results, showing a double interaction between level and experiment, as well as for the dataset factor. It was concluded that level 2 had an average performance above the other levels and that the proposed method is especially efficient for multi-class problems when compared to binary problems.

Keywords: stacking, multi-layers, ensemble, multi-class

Procedia PDF Downloads 235
667 Power Iteration Clustering Based on Deflation Technique on Large Scale Graphs

Authors: Taysir Soliman

Abstract:

One of the current popular clustering techniques is Spectral Clustering (SC) because of its advantages over conventional approaches such as hierarchical clustering, k-means, etc. and other techniques as well. However, one of the disadvantages of SC is the time consuming process because it requires computing the eigenvectors. In the past to overcome this disadvantage, a number of attempts have been proposed such as the Power Iteration Clustering (PIC) technique, which is one of versions from SC; some of PIC advantages are: 1) its scalability and efficiency, 2) finding one pseudo-eigenvectors instead of computing eigenvectors, and 3) linear combination of the eigenvectors in linear time. However, its worst disadvantage is an inter-class collision problem because it used only one pseudo-eigenvectors which is not enough. Previous researchers developed Deflation-based Power Iteration Clustering (DPIC) to overcome problems of PIC technique on inter-class collision with the same efficiency of PIC. In this paper, we developed Parallel DPIC (PDPIC) to improve the time and memory complexity which is run on apache spark framework using sparse matrix. To test the performance of PDPIC, we compared it to SC, ESCG, ESCALG algorithms on four small graph benchmark datasets and nine large graph benchmark datasets, where PDPIC proved higher accuracy and better time consuming than other compared algorithms.

Keywords: spectral clustering, power iteration clustering, deflation-based power iteration clustering, Apache spark, large graph

Procedia PDF Downloads 150
666 Mixture statistical modeling for predecting mortality human immunodeficiency virus (HIV) and tuberculosis(TB) infection patients

Authors: Mohd Asrul Affendi Bi Abdullah, Nyi Nyi Naing

Abstract:

The purpose of this study was to identify comparable manner between negative binomial death rate (NBDR) and zero inflated negative binomial death rate (ZINBDR) with died patients with (HIV + T B+) and (HIV + T B−). HIV and TB is a serious world wide problem in the developing country. Data were analyzed with applying NBDR and ZINBDR to make comparison which a favorable model is better to used. The ZINBDR model is able to account for the disproportionately large number of zero within the data and is shown to be a consistently better fit than the NBDR model. Hence, as a results ZINBDR model is a superior fit to the data than the NBDR model and provides additional information regarding the died mechanisms HIV+TB. The ZINBDR model is shown to be a use tool for analysis death rate according age categorical.

Keywords: zero inflated negative binomial death rate, HIV and TB, AIC and BIC, death rate

Procedia PDF Downloads 401
665 Urban Heat Island Effects on Human Health in Birmingham and Its Mitigation

Authors: N. A. Parvin, E. B. Ferranti, L. A. Chapman, C. A. Pfrang

Abstract:

This study intends to investigate the effects of the Urban Heat Island on public health in Birmingham. Birmingham is located at the center of the West Midlands and its weather is Highly variable due to geographical factors. Residential developments, road networks and infrastructure often replace open spaces and vegetation. This transformation causes the temperature of urban areas to increase and creates an "island" of higher temperatures in the urban landscape. Extreme heat in the urban area is influencing public health in the UK as well as in the world. Birmingham is a densely built-up area with skyscrapers and congested buildings in the city center, which is a barrier to air circulation. We will investigate the city regarding heat and cold-related human mortality and other impacts. We are using primary and secondary datasets to examine the effect of population shift and land-use change on the UHI in Birmingham. We will also use freely available weather data from the Birmingham Urban Observatory and will incorporate satellite data to determine urban spatial expansion and its effect on the UHI. We have produced a temperature map based on summer datasets of 2020, which has covered 25 weather stations in Birmingham to show the differences between diurnal and nocturnal summer and annual temperature trends. Some impacts of the UHI may be beneficial, such as the lengthening of the plant growing season, but most of them are highly negative. We are looking for various effects of urban heat which is impacting human health and investigating mitigation options.

Keywords: urban heat, public health, climate change

Procedia PDF Downloads 69
664 The Identification of Combined Genomic Expressions as a Diagnostic Factor for Oral Squamous Cell Carcinoma

Authors: Ki-Yeo Kim

Abstract:

Trends in genetics are transforming in order to identify differential coexpressions of correlated gene expression rather than the significant individual gene. Moreover, it is known that a combined biomarker pattern improves the discrimination of a specific cancer. The identification of the combined biomarker is also necessary for the early detection of invasive oral squamous cell carcinoma (OSCC). To identify the combined biomarker that could improve the discrimination of OSCC, we explored an appropriate number of genes in a combined gene set in order to attain the highest level of accuracy. After detecting a significant gene set, including the pre-defined number of genes, a combined expression was identified using the weights of genes in a gene set. We used the Principal Component Analysis (PCA) for the weight calculation. In this process, we used three public microarray datasets. One dataset was used for identifying the combined biomarker, and the other two datasets were used for validation. The discrimination accuracy was measured by the out-of-bag (OOB) error. There was no relation between the significance and the discrimination accuracy in each individual gene. The identified gene set included both significant and insignificant genes. One of the most significant gene sets in the classification of normal and OSCC included MMP1, SOCS3 and ACOX1. Furthermore, in the case of oral dysplasia and OSCC discrimination, two combined biomarkers were identified. The combined genomic expression achieved better performance in the discrimination of different conditions than in a single significant gene. Therefore, it could be expected that accurate diagnosis for cancer could be possible with a combined biomarker.

Keywords: oral squamous cell carcinoma, combined biomarker, microarray dataset, correlated genes

Procedia PDF Downloads 392
663 The Impact of Innovation Efficiency on the Production of New Knowledge: A Manufacturing Firm Level Perspective

Authors: Vasilios Kanellopoulos

Abstract:

The present paper examines the effect of innovation efficiency on the production of new knowledge from a firm level perspective. It resorts to the Greek version of community innovation survey (CIS 2012-2014 microdata) and employs 1274 firms of the manufacturing, which constitutes the main sector of examination. It assumes a knowledge production function (KPF) and finds that R&D spillovers related to the expenditures on innovation activities, internal R&D, external R&D, skilled labor, and the expenditures in the acquisition of machinery have a positive and significant effect on the production of new knowledge when OLS techniques are applied. However, innovation efficiency comes from a Banker and Morey (1986) data envelopment analysis (DEA) with categorical variables has a statistically insignificant impact on the production of new knowledge measured by firm’s turnover.

Keywords: firms, innovation efficiency, production of new knowledge, R&D spillovers

Procedia PDF Downloads 105
662 RA-Apriori: An Efficient and Faster MapReduce-Based Algorithm for Frequent Itemset Mining on Apache Flink

Authors: Sanjay Rathee, Arti Kashyap

Abstract:

Extraction of useful information from large datasets is one of the most important research problems. Association rule mining is one of the best methods for this purpose. Finding possible associations between items in large transaction based datasets (finding frequent patterns) is most important part of the association rule mining. There exist many algorithms to find frequent patterns but Apriori algorithm always remains a preferred choice due to its ease of implementation and natural tendency to be parallelized. Many single-machine based Apriori variants exist but massive amount of data available these days is above capacity of a single machine. Therefore, to meet the demands of this ever-growing huge data, there is a need of multiple machines based Apriori algorithm. For these types of distributed applications, MapReduce is a popular fault-tolerant framework. Hadoop is one of the best open-source software frameworks with MapReduce approach for distributed storage and distributed processing of huge datasets using clusters built from commodity hardware. However, heavy disk I/O operation at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce-based platforms are being developed for parallel computing in recent years. Among them, two platforms, namely, Spark and Flink have attracted a lot of attention because of their inbuilt support to distributed computations. Earlier we proposed a reduced- Apriori algorithm on Spark platform which outperforms parallel Apriori, one because of use of Spark and secondly because of the improvement we proposed in standard Apriori. Therefore, this work is a natural sequel of our work and targets on implementing, testing and benchmarking Apriori and Reduced-Apriori and our new algorithm ReducedAll-Apriori on Apache Flink and compares it with Spark implementation. Flink, a streaming dataflow engine, overcomes disk I/O bottlenecks in MapReduce, providing an ideal platform for distributed Apriori. Flink's pipelining based structure allows starting a next iteration as soon as partial results of earlier iteration are available. Therefore, there is no need to wait for all reducers result to start a next iteration. We conduct in-depth experiments to gain insight into the effectiveness, efficiency and scalability of the Apriori and RA-Apriori algorithm on Flink.

Keywords: apriori, apache flink, Mapreduce, spark, Hadoop, R-Apriori, frequent itemset mining

Procedia PDF Downloads 258
661 Using Photogrammetric Techniques to Map the Mars Surface

Authors: Ahmed Elaksher, Islam Omar

Abstract:

For many years, Mars surface has been a mystery for scientists. Lately with the help of geospatial data and photogrammetric procedures researchers were able to capture some insights about this planet. Two of the most imperative data sources to explore Mars are the The High Resolution Imaging Science Experiment (HiRISE) and the Mars Orbiter Laser Altimeter (MOLA). HiRISE is one of six science instruments carried by the Mars Reconnaissance Orbiter, launched August 12, 2005, and managed by NASA. The MOLA sensor is a laser altimeter carried by the Mars Global Surveyor (MGS) and launched on November 7, 1996. In this project, we used MOLA-based DEMs to orthorectify HiRISE optical images for generating a more accurate and trustful surface of Mars. The MOLA data was interpolated using the kriging interpolation technique. Corresponding tie points were digitized from both datasets. These points were employed in co-registering both datasets using GIS analysis tools. In this project, we employed three different 3D to 2D transformation models. These are the parallel projection (3D affine) transformation model; the extended parallel projection transformation model; the Direct Linear Transformation (DLT) model. A set of tie-points was digitized from both datasets. These points were split into two sets: Ground Control Points (GCPs), used to evaluate the transformation parameters using least squares adjustment techniques, and check points (ChkPs) to evaluate the computed transformation parameters. Results were evaluated using the RMSEs between the precise horizontal coordinates of the digitized check points and those estimated through the transformation models using the computed transformation parameters. For each set of GCPs, three different configurations of GCPs and check points were tested, and average RMSEs are reported. It was found that for the 2D transformation models, average RMSEs were in the range of five meters. Increasing the number of GCPs from six to ten points improve the accuracy of the results with about two and half meters. Further increasing the number of GCPs didn’t improve the results significantly. Using the 3D to 2D transformation parameters provided three to two meters accuracy. Best results were reported using the DLT transformation model. However, increasing the number of GCPS didn’t have substantial effect. The results support the use of the DLT model as it provides the required accuracy for ASPRS large scale mapping standards. However, well distributed sets of GCPs is a key to provide such accuracy. The model is simple to apply and doesn’t need substantial computations.

Keywords: mars, photogrammetry, MOLA, HiRISE

Procedia PDF Downloads 39
660 Face Recognition Using Body-Worn Camera: Dataset and Baseline Algorithms

Authors: Ali Almadan, Anoop Krishnan, Ajita Rattani

Abstract:

Facial recognition is a widely adopted technology in surveillance, border control, healthcare, banking services, and lately, in mobile user authentication with Apple introducing “Face ID” moniker with iPhone X. A lot of research has been conducted in the area of face recognition on datasets captured by surveillance cameras, DSLR, and mobile devices. Recently, face recognition technology has also been deployed on body-worn cameras to keep officers safe, enabling situational awareness and providing evidence for trial. However, limited academic research has been conducted on this topic so far, without the availability of any publicly available datasets with a sufficient sample size. This paper aims to advance research in the area of face recognition using body-worn cameras. To this aim, the contribution of this work is two-fold: (1) collection of a dataset consisting of a total of 136,939 facial images of 102 subjects captured using body-worn cameras in in-door and daylight conditions and (2) evaluation of various deep-learning architectures for face identification on the collected dataset. Experimental results suggest a maximum True Positive Rate(TPR) of 99.86% at False Positive Rate(FPR) of 0.000 obtained by SphereFace based deep learning architecture in daylight condition. The collected dataset and the baseline algorithms will promote further research and development. A downloadable link of the dataset and the algorithms is available by contacting the authors.

Keywords: face recognition, body-worn cameras, deep learning, person identification

Procedia PDF Downloads 137