Search results for: statistical classifiers
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4052

3812 Parallel Fuzzy Rough Support Vector Machine for Data Classification in Cloud Environment

Authors: Arindam Chaudhuri

Abstract:

Classification of data has been actively used as an effective and efficient means of conveying knowledge and information to users. The primary focus has always been on techniques for extracting useful knowledge from data such that returns are maximized. With the emergence of huge datasets, existing classification techniques often fail to produce desirable results. The challenge lies in analyzing and understanding the characteristics of massive data sets by retrieving useful geometric and statistical patterns. We propose a supervised parallel fuzzy rough support vector machine (PFRSVM) for data classification in a cloud environment. The classification is performed by PFRSVM using a hyperbolic tangent kernel. The fuzzy rough set model takes care of sensitivity to noisy samples and handles imprecision in training samples, bringing robustness to the results. The membership function is a function of the center and radius of each class in feature space and is represented with a kernel. It plays an important role in sampling the decision surface. The success of PFRSVM is governed by choosing appropriate parameter values. The training samples are either linearly or nonlinearly separable. Different input points make unique contributions to the decision surface. The algorithm is parallelized with a view to reducing training times. The system is built on a support vector machine library using the Hadoop implementation of MapReduce. The algorithm is tested on large data sets to check its feasibility and convergence. The performance of the classifier is also assessed in terms of the number of support vectors. The challenges encountered in implementing big data classification in machine learning frameworks are also discussed. The experiments are done on the cloud environment available at University of Technology and Management, India. The results are illustrated for Gaussian RBF and Bayesian kernels.
The effect of variability in prediction and generalization of PFRSVM is examined with respect to values of parameter C. It effectively resolves the effects of outliers, imbalance and overlapping class problems, generalizes to unseen data and relaxes dependency between features and labels. The average classification accuracy for PFRSVM is better than that of other classifiers for both Gaussian RBF and Bayesian kernels. The experimental results on both synthetic and real data sets clearly demonstrate the superiority of the proposed technique.
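The abstract describes the membership function only as a function of each class's center and radius. A minimal sketch of one common form of such a function follows; the formula s_i = 1 - d_i / (r + delta), the function names, and the sample points are assumptions for illustration, and the distances are taken in input space rather than in the kernel-induced feature space the abstract mentions.

```python
import math

def class_center(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def fuzzy_memberships(points, delta=1e-6):
    """Membership of each point from its distance to the class center.

    Assumed form: s_i = 1 - d_i / (r + delta), where r is the class radius
    (the largest distance to the center) and delta > 0 keeps s_i positive.
    """
    center = class_center(points)
    dists = [math.dist(p, center) for p in points]
    radius = max(dists)
    return [1.0 - d / (radius + delta) for d in dists]

# Hypothetical samples of one class; the last point is a deliberate outlier.
positive_class = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [4.0, 4.0]]
memberships = fuzzy_memberships(positive_class)
```

Under this assumed form, the point farthest from the class center receives the smallest membership, so it would contribute least to the decision surface when memberships are used as per-sample weights.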

Keywords: FRSVM, Hadoop, MapReduce, PFRSVM

Procedia PDF Downloads 471
3811 Off-Topic Text Detection System Using a Hybrid Model

Authors: Usama Shahid

Abstract:

Be it written documents, news columns, or students' essays, verifying the content can be a time-consuming task. Apart from spelling and grammar mistakes, the proofreader is also supposed to verify whether the content included in the essay or document is relevant. Irrelevant content in a document or essay is referred to as off-topic text, and in this paper we address the problem of detecting off-topic text in a document using machine learning techniques. Our study aims to identify the off-topic content in a document using an echo state network model, and we also compare its results with those of other models. A previous study used Convolutional Neural Networks and TF-IDF to detect off-topic text. We rearrange the existing datasets and apply new classifiers along with new word embeddings to existing and new datasets in order to compare the results with the previously existing CNN model.

Keywords: off topic, text detection, echo state network, machine learning

Procedia PDF Downloads 60
3810 Recommendations Using Online Water Quality Sensors for Chlorinated Drinking Water Monitoring at Drinking Water Distribution Systems Exposed to Glyphosate

Authors: Angela Maria Fasnacht

Abstract:

The detection of anomalies caused by the presence of contaminants, performed by what are known as early detection systems in water treatment plants, has become a critical point that deserves in-depth study for the improvement of these systems and their adaptation to current requirements. The design of these systems requires detailed analysis and processing of the data in real time, so it is necessary to apply various statistical methods appropriate to the data generated, such as Spearman’s Correlation, Factor Analysis, Cross-Correlation, and k-fold Cross-validation. Statistical analysis and methods allow the evaluation of large data sets to model the behavior of variables; in this sense, statistical treatment or analysis could be considered a vital step in developing advanced machine learning models that allow optimized data management in real time, applied to early detection systems in water treatment processes. These techniques facilitate the development of new technologies used in advanced sensors. In this work, these methods were applied to identify possible correlations between the measured parameters and the presence of the glyphosate contaminant in the single-pass system. The effect of the interaction between the initial concentration of glyphosate and the location of the sensors on the readings of the reported parameters was studied.
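Of the methods listed above, Spearman's correlation is the simplest to illustrate: it is the Pearson correlation computed on ranks, so it detects monotonic relationships even when they are not linear. The sketch below is illustrative only; the dose and reading values are hypothetical and are not data from the study.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))

# Hypothetical glyphosate doses and sensor readings (monotonic, nonlinear).
dose = [0.0, 0.5, 1.0, 2.0, 4.0]
reading = [7.0, 7.1, 7.4, 8.6, 12.9]
rho = spearman(dose, reading)
```

Because the hypothetical readings rise with every increase in dose, the rank correlation is at its maximum even though the relationship is far from a straight line.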

Keywords: glyphosate, emergent contaminants, machine learning, probes, sensors, predictive

Procedia PDF Downloads 94
3809 A Statistical Approach to Classification of Agricultural Regions

Authors: Hasan Vural

Abstract:

Turkey is a favorable country for producing a great variety of agricultural products because of its different geographic and climatic conditions, which have been used to divide the country into four main and seven sub regions. This classification into seven regions has traditionally been used for data collection and publication, especially in relation to agricultural production. Afterwards, nine agricultural regions were considered. Recently, the governmental body responsible for data collection and dissemination (Turkish Institute of Statistics-TIS) has used 12 classes, which include 11 sub regions and Istanbul province. This study aims to evaluate these classification efforts based on the acreage of ten main crops over a ten-year period (1996-2005). The panel data, grouped into 11 subregions, have been evaluated by cluster and multivariate statistical methods. It was concluded that, from the agricultural production point of view, it would be more meaningful to consider three main and eight sub-agricultural regions throughout the country.

Keywords: agricultural region, factorial analysis, cluster analysis

Procedia PDF Downloads 384
3808 Foot Recognition Using Deep Learning for Knee Rehabilitation

Authors: Rakkrit Duangsoithong, Jermphiphut Jaruenpunyasak, Alba Garcia

Abstract:

Foot recognition can be applied in many medical fields, such as gait pattern analysis and the knee exercises of patients in rehabilitation. Generally, a camera-based foot recognition system is intended to capture a patient image in a controlled room and background in order to recognize the foot in limited views. However, such a system can be inconvenient for monitoring knee exercises at home. To overcome these problems, this paper proposes a deep learning method using Convolutional Neural Networks (CNNs) for foot recognition. The results are compared with the traditional classification method using LBP and HOG features with kNN and SVM classifiers. According to the results, the deep learning method recognizes foot images from online databases with better accuracy than the traditional classification method, but at higher complexity.

Keywords: foot recognition, deep learning, knee rehabilitation, convolutional neural network

Procedia PDF Downloads 132
3807 Evaluation of the Factors Affecting Violence Against Women (Case Study: Couples Referring to Family Counseling Centers in Tehran)

Authors: Hassan Manouchehri

Abstract:

The present study aimed to identify and evaluate the factors affecting violence against women. The statistical population included all couples referred to family counseling centers in Tehran due to domestic violence during the past year. A total of 305 people were selected as a statistical sample using simple random sampling and Cochran's formula in unlimited conditions. A researcher-made questionnaire including 110 items was used for data collection. The face validity and content validity of the questionnaire were confirmed by 30 experts, and its reliability was above 0.7 for all studied variables in a preliminary test with 30 subjects, which was acceptable. To analyze the data, descriptive statistical methods were used with SPSS software version 22, and inferential statistics were used for structural equation modeling in Smart PLS software version 2. Evaluating the theoretical framework and domestic and foreign studies indicated that, in general, four main factors, including cultural and social factors, economic factors, legal factors, and medical factors, underlie violence against women. In addition, structural equation modeling findings indicated that cultural and social factors, economic factors, legal factors, and medical factors affect violence against women.

Keywords: violence against women, cultural and social factors, economic factors, legal factors, medical factors

Procedia PDF Downloads 115
3806 Electroencephalogram Based Alzheimer Disease Classification using Machine and Deep Learning Methods

Authors: Carlos Roncero-Parra, Alfonso Parreño-Torres, Jorge Mateo Sotos, Alejandro L. Borja

Abstract:

In this research, different methods based on machine/deep learning algorithms are presented for the classification and diagnosis of patients with mental disorders such as Alzheimer's disease. For this purpose, the signals obtained from 32 unipolar electrodes identified by non-invasive EEG were examined, and their basic properties were obtained. More specifically, different well-known machine learning based classifiers have been used, i.e., support vector machine (SVM), Bayesian linear discriminant analysis (BLDA), decision tree (DT), Gaussian Naïve Bayes (GNB), K-nearest neighbor (KNN) and Convolutional Neural Network (CNN). A total of 668 patients from five different hospitals were studied in the period from 2011 to 2021. The best accuracy obtained was around 93% in both ADM and ADA classifications. It can be concluded that such a classification will enable the training of algorithms that can be used to identify and classify different mental disorders with high accuracy.

Keywords: alzheimer, machine learning, deep learning, EEG

Procedia PDF Downloads 93
3805 EEG-Based Screening Tool for School Student’s Brain Disorders Using Machine Learning Algorithms

Authors: Abdelrahman A. Ramzy, Bassel S. Abdallah, Mohamed E. Bahgat, Sarah M. Abdelkader, Sherif H. ElGohary

Abstract:

Attention-Deficit/Hyperactivity Disorder (ADHD), epilepsy, and autism affect millions of children worldwide, many of whom are undiagnosed despite the fact that all of these disorders are detectable in early childhood. Late diagnosis can cause severe problems due to late treatment and to the misconceptions and general lack of awareness about these disorders. Moreover, electroencephalography (EEG) has played a vital role in the assessment of neural function in children. Therefore, quantitative EEG measurement will be utilized as a tool in the evaluation of patients who may have ADHD, epilepsy, or autism. We propose a screening tool that uses EEG signals and machine learning algorithms to detect these disorders at an early age in an automated manner. In the work done so far, the proposed classifiers were applied to epilepsy as a first step and provided an accuracy of approximately 97% using SVM, Naïve Bayes and Decision tree, and 98% using KNN, which gives hope for the work yet to be conducted.

Keywords: ADHD, autism, epilepsy, EEG, SVM

Procedia PDF Downloads 166
3804 Predictive Maintenance of Industrial Shredders: Efficient Operation through Real-Time Monitoring Using Statistical Machine Learning

Authors: Federico Pittino, Thomas Arnold

Abstract:

The shredding of waste materials is a key step in the recycling process towards the circular economy. Industrial shredders for waste processing operate in very harsh conditions, leading to the need for frequent maintenance of critical components. Maintenance optimization is also particularly important for increasing the machine’s efficiency, thereby reducing operational costs. In this work, a monitoring system has been developed and deployed on an industrial shredder located at a waste recycling plant in Austria. The machine has been monitored for one year, and methods for predictive maintenance have been developed for two key components: the cutting knives and the drive belt. The large amount of collected data is leveraged by statistical machine learning techniques, thereby not requiring very detailed knowledge of the machine or its live operating conditions. The results show that, despite the wide range of operating conditions, a reliable estimate of the optimal time for maintenance can be derived. Moreover, the trade-off between the cost of maintenance and the increase in power consumption due to the wear state of the monitored components of the machine is investigated. This work proves the benefits of a real-time monitoring system for the efficient operation of industrial shredders.

Keywords: predictive maintenance, circular economy, industrial shredder, cost optimization, statistical machine learning

Procedia PDF Downloads 103
3803 Room Level Indoor Localization Using Relevant Channel Impulse Response Parameters

Authors: Raida Zouari, Iness Ahriz, Rafik Zayani, Ali Dziri, Ridha Bouallegue

Abstract:

This paper proposes a room level indoor localization algorithm based on the use of Multi-Layer Neural Network (MLNN) classifiers and a one-versus-one strategy. Seven parameters of the Channel Impulse Response (CIR) were used, and Gram-Schmidt orthogonalization was performed to study the relevance of the extracted parameters. Simulation results show that when relevant CIR parameters are used as the position fingerprint and an optimal MLNN architecture is selected, a good room level localization score can be achieved. The study also showed that some of the CIR parameters are not correlated with the location and can decrease the localization performance of the system.
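The abstract does not say how the orthogonalization was used to judge relevance. One widely used scheme, sketched here as an assumption, ranks features greedily: pick the feature most aligned with the target, then project that feature out of the target and of the remaining features, and repeat. The parameter names and numbers below are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(v, direction):
    """Remove from v its component along direction (one Gram-Schmidt step)."""
    scale = dot(v, direction) / dot(direction, direction)
    return [a - scale * b for a, b in zip(v, direction)]

def squared_cosine(u, v):
    denom = dot(u, u) * dot(v, v)
    return dot(u, v) ** 2 / denom if denom > 1e-12 else 0.0

def gram_schmidt_ranking(features, target):
    """Rank feature columns from most to least relevant to the target."""
    remaining = {name: list(col) for name, col in features.items()}
    residual = list(target)
    ranking = []
    while remaining:
        best = max(remaining, key=lambda n: squared_cosine(remaining[n], residual))
        best_col = remaining.pop(best)
        ranking.append(best)
        if dot(best_col, best_col) > 1e-12:
            # Orthogonalize the target and the other features against the winner.
            residual = project_out(residual, best_col)
            remaining = {n: project_out(c, best_col) for n, c in remaining.items()}
    return ranking

# Hypothetical data: "param_a" tracks the target, "param_b" is nearly unrelated.
target = [1.0, 2.0, 3.0, 4.0]
features = {"param_a": [1.0, 2.0, 3.0, 4.5], "param_b": [1.0, -1.0, 1.0, -1.0]}
ranking = gram_schmidt_ranking(features, target)
```

Features that land at the bottom of such a ranking are candidates for removal before training the classifiers, which matches the abstract's observation that some parameters do not help localization.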

Keywords: mobile indoor localization, multi-layer neural network (MLNN), channel impulse response (CIR), Gram-Schmidt orthogonalization

Procedia PDF Downloads 333
3802 Identification of Breast Anomalies Based on Deep Convolutional Neural Networks and K-Nearest Neighbors

Authors: Ayyaz Hussain, Tariq Sadad

Abstract:

Breast cancer (BC) is one of the most widespread ailments among females globally. The early prognosis of BC can decrease the mortality rate. Exact findings of benign tumors can avoid unnecessary biopsies and further treatments of patients under investigation. However, due to variations in images, it is a tough job to isolate cancerous cases from normal and benign ones. Machine learning techniques are widely employed in the classification of BC patterns and prognosis. In this research, a deep convolutional neural network (DCNN) with the AlexNet architecture is employed to extract more discriminative features from breast tissues. To achieve higher accuracy, K-nearest neighbor (KNN) classifiers are employed as a substitute for the softmax layer in deep learning. The proposed model was tested on a widely used breast image database, the MIAS dataset, and achieved 99% accuracy.
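Replacing the softmax layer with KNN means the network is used only as a feature extractor, and the class is decided by a vote among the nearest training feature vectors. The sketch below shows that final voting step alone; the short vectors stand in for DCNN features and are hypothetical, as are the function name and the choice of Euclidean distance and k = 3.

```python
import math
from collections import Counter

def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    nearest = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 3-dimensional stand-ins for extracted image features.
train_features = [
    [0.1, 0.2, 0.1], [0.2, 0.1, 0.0], [0.0, 0.3, 0.2],
    [0.9, 0.8, 1.0], [1.0, 0.9, 0.8], [0.8, 1.0, 0.9],
]
train_labels = ["benign"] * 3 + ["malignant"] * 3
prediction = knn_predict(train_features, train_labels, [0.85, 0.9, 0.95])
```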

Keywords: breast cancer, DCNN, KNN, mammography

Procedia PDF Downloads 110
3801 Dicotyledon Weed Quantification Algorithm for Selective Herbicide Application in Maize Crops: Statistical Evaluation of the Potential Herbicide Savings

Authors: Morten Stigaard Laursen, Rasmus Nyholm Jørgensen, Henrik Skov Midtiby, Anders Krogh Mortensen, Sanmohan Baby

Abstract:

This work contributes a statistical model and simulation framework yielding the best possible estimate of the potential herbicide reduction when using the MoDiCoVi algorithm, while requiring an efficacy comparable to conventional spraying. In June 2013 a maize field located in Denmark was seeded. The field was divided into parcels, which were assigned to one of two main groups: 1) Control, consisting of subgroups of no spray and full dose spray; 2) MoDiCoVi algorithm, subdivided into five different leaf cover thresholds for spray activation. In addition, approximately 25% of the parcels were seeded with additional weeds perpendicular to the maize rows. In total, 299 parcels were randomly assigned to the 28 different treatment combinations. In the statistical analysis, bootstrapping was used for balancing the number of replicates. The achieved potential herbicide savings were found to be 70% to 95%, depending on the initial weed coverage. However, additional field trials covering more seasons and locations are needed to verify the generalisation of these results. There is a potential for further herbicide savings, as the time interval between the first and second spraying sessions was not long enough for the weeds to turn yellow; instead they only stagnated in growth.
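Balancing replicates by bootstrapping can be pictured as resampling each treatment group with replacement until every group has the same size. The sketch below is one plausible reading of that step, not the authors' procedure; the group names, values, and target size are hypothetical.

```python
import random

def balance_by_bootstrap(groups, n_per_group, seed=0):
    """Resample each group with replacement to exactly n_per_group values."""
    rng = random.Random(seed)  # fixed seed so the resampling is repeatable
    return {
        name: rng.choices(values, k=n_per_group)
        for name, values in groups.items()
    }

# Hypothetical weed-coverage measurements with unequal replicate counts.
groups = {
    "no_spray": [12.1, 10.4, 11.8],
    "full_dose": [1.2, 0.9, 1.5, 1.1, 0.8],
    "threshold_1": [2.3, 3.1],
}
balanced = balance_by_bootstrap(groups, n_per_group=5)
```

In practice the resampling would be repeated many times and the statistic of interest averaged over the repetitions, so that no single draw dominates the estimate.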

Keywords: herbicide reduction, macrosprayer, weed crop discrimination, site-specific, sprayer boom

Procedia PDF Downloads 277
3800 A Molding Surface Auto-inspection System

Authors: Ssu-Han Chen, Der-Baau Perng

Abstract:

The molding process in IC manufacturing secures chips against harm done by heat, moisture or other external forces. While a chip is being molded, defects such as cracks, dilapidation, or voids may become embedded in the molding surface. The molding surfaces this study sets out to treat differ from those on the market, though, in that texture similar to defects is present everywhere on the surface. Manual inspection usually passes over low-contrast cracks or voids; hence an automatic optical inspection system for molding surfaces is necessary. The proposed system consists of a CCD, a coaxial light, a back light and a motion control unit. Based on the statistical texture properties of the molding surface, a series of digital image processing and classification procedures is carried out. After training the parameters associated with the above algorithm, the experimental results suggest that the accuracy rate is up to 93.75%, contributing to the inspection quality of IC molding surfaces.

Keywords: molding surface, machine vision, statistical texture, discrete Fourier transformation

Procedia PDF Downloads 407
3799 The Effect of Excel on Undergraduate Students’ Understanding of Statistics and the Normal Distribution

Authors: Masomeh Jamshid Nejad

Abstract:

Nowadays, statistical literacy is no longer an optional skill but an essential one, with broad applications across diverse fields, especially in operational decision areas such as business management, finance, and economics. As such, learning and deep understanding of statistical concepts are essential in the context of business studies. One of the crucial topics in statistical theory and its application is the normal distribution, often called the bell-shaped curve. To interpret data and conduct hypothesis tests, comprehending the properties of the normal distribution (the mean and standard deviation) is essential for business students. This requires undergraduate students in the fields of economics and business management to visualize and work with data following a normal distribution. Since technology is interconnected with education these days, it is important to teach statistics topics to undergraduate students in the context of Python, RStudio, and Microsoft Excel. This research endeavours to shed light on the effect of Excel-based instruction on learners’ knowledge of statistics, specifically the central concept of the normal distribution. To this end, two groups of undergraduate students (from the Business Management program) were compared. One group underwent Excel-based instruction and the other relied only on traditional teaching methods. We analyzed experiential data and BBA participants’ responses to statistics-related questions focusing on the normal distribution, including its key attributes, such as the mean and standard deviation. The results of our study indicate that exposing students to Excel-based learning helps learners comprehend statistical concepts more effectively than learners taught with the traditional method. In addition, students who received Excel-based instruction showed ability in picturing and interpreting data following a normal distribution.
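The role of the mean and standard deviation can be made concrete with a few lines of Python, one of the tools the abstract names. The example is illustrative only: the exam-score distribution (mean 70, standard deviation 10) is hypothetical. Excel's `NORM.DIST` and `NORM.INV` functions perform the comparable calculations.

```python
from statistics import NormalDist

# Hypothetical exam scores: mean 70, standard deviation 10.
scores = NormalDist(mu=70, sigma=10)

# Share of students within one and two standard deviations of the mean.
within_one_sd = scores.cdf(80) - scores.cdf(60)
within_two_sd = scores.cdf(90) - scores.cdf(50)

# Score separating the top 5% of students from the rest.
top_5_cutoff = scores.inv_cdf(0.95)
```

The first two values reproduce the familiar rule of thumb that roughly 68% and 95% of observations fall within one and two standard deviations of the mean.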

Keywords: statistics, excel-based instruction, data visualization, pedagogy

Procedia PDF Downloads 33
3798 Identifying and Ranking Environmental Risks of Oil and Gas Projects Using the VIKOR Method for Multi-Criteria Decision Making

Authors: Sasan Aryaee, Mahdi Ravanshadnia

Abstract:

Naturally, any activity is associated with risk; humans have understood this concept for a very long time and seek to identify its factors and sources. A lack of proper risk management can cause problems such as delays and unforeseen costs in development projects, temporary or permanent loss of services, loss or theft of information, complexity and limitations in processes, unreliable information caused by rework, holes in systems, and many similar problems. In the present study, a model is presented to rank the environmental risks of oil and gas projects. The statistical population of the study consists of all executives active in the oil and gas fields, from which the statistical sample was selected randomly. In the framework of the proposed method, the environmental risks of oil and gas projects were first extracted; then a questionnaire based on these indicators was designed on a Likert scale and distributed among the statistical sample. After assessing the validity and reliability of the questionnaire, the environmental risks of oil and gas projects were ranked using the VIKOR method of multiple-criteria decision-making. The results showed that the best options for HSE planning of oil and gas projects, namely those that reduce risks, personal injuries and casualties while costing the project less and adding less time to its duration than other options, concern the entry of dye into the environment when painting the generator pond and the presence of the rigger near the crane.
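For readers unfamiliar with VIKOR, the method scores each alternative by its weighted distance from the best value on every criterion: S is the sum of those distances, R is the largest single one, and Q blends the two with a weight v. The alternative with the lowest Q ranks first. The sketch below follows that standard formulation; the alternatives, criteria scores, weights, and v = 0.5 are hypothetical and assume that a higher score is better on every criterion.

```python
def scaled(value, low, high):
    """Position of value between low and high; 0 when the range is empty."""
    return (value - low) / (high - low) if high > low else 0.0

def vikor(scores, weights, v=0.5):
    """Return (ranking, Q) for alternatives scored on benefit criteria."""
    names = list(scores)
    n_criteria = len(weights)
    best = [max(scores[a][j] for a in names) for j in range(n_criteria)]
    worst = [min(scores[a][j] for a in names) for j in range(n_criteria)]
    S, R = {}, {}
    for a in names:
        # Weighted, normalized distance from the best value per criterion.
        terms = [
            weights[j] * scaled(best[j] - scores[a][j], 0.0, best[j] - worst[j])
            for j in range(n_criteria)
        ]
        S[a], R[a] = sum(terms), max(terms)
    s_min, s_max = min(S.values()), max(S.values())
    r_min, r_max = min(R.values()), max(R.values())
    Q = {
        a: v * scaled(S[a], s_min, s_max) + (1 - v) * scaled(R[a], r_min, r_max)
        for a in names
    }
    return sorted(names, key=Q.get), Q

# Hypothetical mean Likert scores on three criteria for three alternatives.
scores = {"risk_A": [5, 4, 5], "risk_B": [3, 3, 2], "risk_C": [1, 2, 1]}
weights = [0.5, 0.3, 0.2]
ranking, Q = vikor(scores, weights)
```

The full method also checks whether the top-ranked alternative has a sufficient advantage over the runner-up before accepting it as the compromise solution; that step is omitted here for brevity.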

Keywords: ranking, multi-criteria decision making, oil and gas projects, HSE management, environmental risks

Procedia PDF Downloads 130
3797 Emotional Security in Relation to Students' Emotional Efficiency

Authors: Ibtisam Mahmoud Mohammed Sultan

Abstract:

The present research aimed to identify the level of both emotional security and emotional competence among students at Tikrit University, to identify statistically significant differences in both variables by gender (male-female) and specialty (scientific-humanistic), and to determine the relationship between emotional security and emotional competence among Tikrit University students. The researcher built an emotional security measure (54) and an emotional competence measure (46), and extracted the full psychometric characteristics of both scales. The research sample consisted of (600) students selected at random. The scales were applied to the basic research sample, and the data were processed using a variety of statistical methods, including the t-test and the Pearson correlation coefficient. The researcher found the following results: 1. Tikrit University students possess a high level of emotional security. 2. Males enjoy more emotional security than females. 3. There is no difference between students of scientific and humanities specializations in emotional security. 4. Tikrit University students enjoy a high level of emotional competence. 5. Females outperform males in emotional competence. 6. Students of humanities specializations excel in emotional competence over those of the scientific specialty. 7. There is a positive correlation between the two variables. Based on these results, the researcher developed a set of conclusions, proposals, and recommendations.

Keywords: relation, emotional security, students, efficiency

Procedia PDF Downloads 97
3796 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications

Authors: K. P. Sandesh, M. H. Suman

Abstract:

Text processing plays an important role in information retrieval, data mining, and web search. Measuring the similarity between documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature, the proposed measure takes the following three cases into account: (1) the feature appears in both documents; (2) the feature appears in only one document; and (3) the feature appears in neither document. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in the banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.

Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Procedia PDF Downloads 491
3795 Statistical Analysis to Select Evacuation Route

Authors: Zaky Musyarof, Dwi Yono Sutarto, Dwima Rindy Atika, R. B. Fajriya Hakim

Abstract:

Each country should be responsible for the safety of its people, especially those living in disaster-prone areas. One such service is providing evacuation routes for them. So far, however, the selection of evacuation routes does not seem to be well organized: when a disaster happens, many people accumulate along the steps of the evacuation route. That condition is dangerous because it hampers the evacuation process. Using several methods of statistical analysis, the authors suggest how to prepare an evacuation route that is organized and based on people's habits. Those methods are association rules, sequential pattern mining, hierarchical cluster analysis and fuzzy logic.

Keywords: association rules, sequential pattern mining, cluster analysis, fuzzy logic, evacuation route

Procedia PDF Downloads 475
3794 Analytical and Statistical Study of the Parameters of Expansive Soil

Authors: A. Medjnoun, R. Bahar

Abstract:

The disorders caused by the shrinking-swelling phenomenon are prevalent in arid and semi-arid regions in the presence of swelling clay. This soil has the characteristic of changing state under the effect of water solicitation (wetting and drying). A set of geotechnical parameters is necessary for the characterization of this soil type, such as state parameters, physical and chemical parameters and mechanical parameters. Some of these tests are very long and some are very expensive, hence the use of prediction methods. The complexity of this phenomenon and the difficulty of its characterization have prompted researchers to use several identification parameters in the prediction of swelling potential. This document is an analytical and statistical study of the geotechnical parameters affecting the swelling potential of clays. The work is performed on a database obtained from investigations of swelling Algerian soils. The observations obtained have helped us to understand the structure of swelling soil and its behavior.

Keywords: analysis, estimated model, parameter identification, swelling of clay

Procedia PDF Downloads 382
3793 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to the rapid growth of text documents in digital form, automated text classification has become an important research area over the last two decades. The major challenges of text document representation are high dimensionality, sparsity, volume and semantics. Since terms are the only features that can be found in documents, the selection of good terms (features) plays a very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency and accuracy. In this paper, we present a quantitative analysis of the most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tfidf), Mutual Information (MI), Information Gain (IG), Chi-Square (χ²), Term Frequency-Relevance Frequency (tfrf), Term Strength (TS), Ambiguity Measure (AM) and Symbolic Feature Selection (SFS), to classify text documents. We evaluated all the feature selection methods on standard datasets such as 20 Newsgroups, 4 University dataset and Reuters-21578.
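As an illustration of the first method in that list, the sketch below computes tf-idf weights from scratch. It assumes one common variant (term frequency normalized by document length, idf = log(N / df)); the paper may use a different weighting, and the three short documents are hypothetical.

```python
import math
from collections import Counter

def tfidf(documents):
    """Per-document tf-idf weights using tf = count / length, idf = log(N / df)."""
    n_docs = len(documents)
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

documents = [
    "the bank approved the loan",
    "the patient scan was normal",
    "the loan rate rose",
]
weights = tfidf(documents)
```

A term that occurs in every document, such as "the" here, gets a weight of zero, which is why tf-idf can serve as a feature filter: terms are ranked by weight and only the highest-scoring ones are kept.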

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 429
3792 A Computer-Aided System for Detection and Classification of Liver Cirrhosis

Authors: Abdel Hadi N. Ebraheim, Eman Azomi, Nefisa A. Fahmy

Abstract:

This paper designs and implements a computer-aided system (CAS) to help detect and diagnose liver cirrhosis in patients with Chronic Hepatitis C. Our system reduces the features (tests) required of the patient to a minimal, most informative subset of tests, with a diagnostic accuracy above 99%, hence saving both time and costs. We use the Support Vector Machine (SVM) with cross-validation, a Multilayer Perceptron Neural Network (MLP), and a Generalized Regression Neural Network (GRNN) that employs radial basis functions for functional approximation, as classifiers. Our system is tested on 199 subjects, 99 of them with Chronic Hepatitis C. The subjects were selected from the outpatient clinic of the National Hepatology and Tropical Medicine Research Institute (NHTMRI).

Keywords: liver cirrhosis, artificial neural network, support vector machine, multi-layer perceptron, classification, accuracy

Procedia PDF Downloads 427
3791 Towards a Balancing Medical Database by Using the Least Mean Square Algorithm

Authors: Kamel Belammi, Houria Fatrim

Abstract:

The imbalanced data set, a problem often found in real-world applications, can have a seriously negative effect on the classification performance of machine learning algorithms. There have been many attempts at dealing with the classification of imbalanced data sets. In medical diagnosis classification, we often face an imbalanced number of data samples between the classes, in which there are not enough samples in rare classes. In this paper, we propose a learning method based on a cost-sensitive extension of the Least Mean Square (LMS) algorithm that penalizes errors of different samples with different weights, along with some rules of thumb to determine those weights. After the balancing phase, we apply different classifiers (support vector machine (SVM), k-nearest neighbor (KNN) and multilayer neural networks (MNN)) to the balanced data set. We have also compared the results obtained before and after the balancing method.
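The idea of a cost-sensitive LMS update can be sketched as the usual LMS rule with the step for each sample scaled by a per-class cost. The version below is an assumption-laden illustration, not the authors' algorithm: the inverse-class-frequency costs stand in for their unspecified rules of thumb, and the one-dimensional data, learning rate, and epoch count are hypothetical.

```python
def class_costs(labels):
    """Assumed rule of thumb: cost inversely proportional to class frequency."""
    n, classes = len(labels), set(labels)
    return {c: n / (len(classes) * labels.count(c)) for c in classes}

def train_cost_sensitive_lms(samples, labels, costs, lr=0.02, epochs=500):
    """LMS (Widrow-Hoff) training with each update scaled by the class cost."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - (sum(wi * xi for wi, xi in zip(w, x)) + b)
            step = lr * costs[y] * error  # rare-class errors weigh more
            w = [wi + step * xi for wi, xi in zip(w, x)]
            b += step
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical imbalanced data: six majority (-1) and two minority (+1) samples.
samples = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0], [2.0], [2.2]]
labels = [-1, -1, -1, -1, -1, -1, 1, 1]
costs = class_costs(labels)
w, b = train_cost_sensitive_lms(samples, labels, costs)
```

With these costs each class contributes the same total weight to the squared-error objective, so the rare class is not drowned out by the majority class during training.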

Keywords: multilayer neural networks, k-nearest neighbor, support vector machine, imbalanced medical data, least mean square algorithm, diabetes

Procedia PDF Downloads 506
3790 Comparing Deep Architectures for Selecting Optimal Machine Translation

Authors: Despoina Mouratidis, Katia Lida Kermanidis

Abstract:

Machine translation (MT) is a very important task in Natural Language Processing (NLP). MT evaluation is crucial in MT development, as it constitutes the means to assess the success of an MT system and also helps improve its performance. Several methods have been proposed for the evaluation of MT systems. Some of the most popular ones in automatic MT evaluation are score-based, such as the BLEU score, and others are based on lexical or syntactic similarity between the MT outputs and the reference, involving higher-level information such as part-of-speech (POS) tagging. This paper presents a language-independent machine learning framework for classifying pairwise translations. This framework uses vector representations of two machine-produced translations, one from a statistical machine translation model (SMT) and one from a neural machine translation model (NMT). The vector representations consist of automatically extracted word embeddings and string-like language-independent features. These vector representations are used as input to a multi-layer neural network (NN) that models the similarity between each MT output and the reference, as well as between the two MT outputs. To evaluate the proposed approach, a professional translation and a "ground-truth" annotation are used. The parallel corpora used are English-Greek (EN-GR) and English-Italian (EN-IT), in the educational domain and of informal genres (video lecture subtitles, course forum text, etc.) that are difficult to translate reliably. We tested three basic deep learning (DL) architectures in this schema: (i) fully-connected dense, (ii) Convolutional Neural Network (CNN), and (iii) Long Short-Term Memory (LSTM). Experiments show that all tested architectures achieved better results than some of the well-known basic approaches, such as Random Forest (RF) and Support Vector Machine (SVM).
Better accuracy results are obtained when LSTM layers are used in our schema. In terms of a balance between the results, better accuracy results are obtained when dense layers are used. The reason for this is that the model correctly classifies more sentences of the minority class (SMT). For a more integrated analysis of the accuracy results, a qualitative linguistic analysis is carried out. In this context, problems have been identified with some figures of speech, such as metaphors, and with certain etymology-related linguistic phenomena, such as paronyms. It would be interesting to find out why all the classifiers led to worse accuracy results in Italian than in Greek, given that the linguistic features employed are language-independent.

Keywords: machine learning, machine translation evaluation, neural network architecture, pairwise classification

Procedia PDF Downloads 106
3789 Fat-Tail Test of Regulatory DNA Sequences

Authors: Jian-Jun Shu

Abstract:

The statistical properties of cis-regulatory modules (CRMs) are explored by estimating the similar-word set occurrence distribution. It is observed that CRMs tend to have a fat-tail distribution for similar-word set occurrence. Thus, a fat-tail test with two fatness coefficients is proposed to distinguish CRMs from non-CRMs, especially from exons. For the first fatness coefficient, the separation accuracy between CRMs and exons is increased compared with the existing content-based CRM prediction method, the fluffy-tail test. For the second fatness coefficient, the computing time is reduced compared with the fluffy-tail test, making it very suitable for long sequences and large database analysis in the post-genome era. Moreover, these indexes may be used to predict CRMs that have not yet been observed experimentally. This can serve as a valuable filtering process for experiments.

Keywords: statistical approach, transcription factor binding sites, cis-regulatory modules, DNA sequences

Procedia PDF Downloads 267
3788 Native Language Identification with Cross-Corpus Evaluation Using Social Media Data: ’Reddit’

Authors: Yasmeen Bassas, Sandra Kuebler, Allen Riddell

Abstract:

Native language identification is one of the growing subfields in natural language processing (NLP). The task of native language identification (NLI) is mainly concerned with predicting the native language of an author from their writing in a second language. In this paper, we investigate the performance of two types of features, content-based features and content-independent features, when they are evaluated on a different corpus (using social media data from "Reddit"). In this NLI task, the predefined models are trained on one corpus (TOEFL), and the trained models are then evaluated on different data from an external corpus (Reddit). Three classifiers are used in this task: the baseline, linear SVM, and logistic regression. Results show that content-based features are more accurate and robust than content-independent ones when tested both within the corpus and across corpora.

Keywords: NLI, NLP, content-based features, content independent features, social media corpus, ML

Procedia PDF Downloads 106
3787 Wireless Sensor Anomaly Detection Using Soft Computing

Authors: Mouhammd Alkasassbeh, Alaa Lasasmeh

Abstract:

We live in an era of rapid development as a result of significant scientific growth. Like other technologies, wireless sensor networks (WSNs) are playing one of the main roles. Built on WSNs, ZigBee adds many features to devices, such as minimal cost and power consumption and increased range and connectivity of sensor nodes. ZigBee technology has come to be used in various fields, including science, engineering, and networks, and even in medical aspects of intelligent buildings. In this work, we generated two main datasets, the first based on a tree topology and the second on a star topology. The datasets were evaluated by three machine learning (ML) algorithms: J48, meta.j48, and multilayer perceptron (MLP). The traffic in each topology was classified as normal or abnormal (attack) network traffic. The datasets used in our work contained simulated data from Network Simulator 2 (NS2). For each dataset, the Bayesian network meta.j48 classifier achieved the highest accuracy among the classifiers, at 99.7% and 99.2% respectively.

Keywords: IDS, Machine learning, WSN, ZigBee technology

Procedia PDF Downloads 517
3786 A Framework for ERP Project Evaluation Based on BSC Model: A Study in Iran

Authors: Mohammad Reza Ostad Ali Naghi Kashani, Esfanji Elia

Abstract:

Nowadays, the number of companies adopting an Enterprise Resource Planning (ERP) application is increasing, particularly in developing countries such as Iran. ERP projects are expensive, time-consuming, and complex; in addition, their failure rate is high. It is therefore important to know whether these projects meet their goals and to identify the areas that should be improved. In this paper, we develop a framework to evaluate the success of ERP project implementation. First, based on a literature review, we built the framework on the BSC model with its financial, customer, processes, and learning and knowledge perspectives; because of its importance, change management was added to the model. Then the organization was divided into three layers: corporate, managerial, and operational. To find criteria for assessing each aspect, we used the Delphi method in two rounds. For the second round, we prepared a questionnaire and performed statistical analysis on the responses. Based on the statistical results, some criteria were accepted and others were rejected.

Keywords: ERP, BSC, ERP project evaluation, IT projects

Procedia PDF Downloads 300
3785 Experimental Investigation of On-Body Channel Modelling at 2.45 GHz

Authors: Hasliza A. Rahim, Fareq Malek, Nur A. M. Affendi, Azuwa Ali, Norshafinash Saudin, Latifah Mohamed

Abstract:

This paper presents an experimental investigation of on-body channel fading at 2.45 GHz, considering two states of user body movement: stationary and mobile. A pair of body-worn antennas was used in this measurement campaign. A statistical analysis was performed by comparing the measured on-body path loss to five well-known distributions: lognormal, normal, Nakagami, Weibull, and Rayleigh. The results showed that, for the upper-arm-to-left-chest link, the average path loss with a moving arm was up to 3.5 dB higher than the path loss in the sitting position. The analysis also concluded that the Nakagami distribution provided the best fit for most of the on-body static link path loss in the standing-still and sitting positions, while the arm movement is best described by the lognormal distribution.
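
Comparing measured samples against candidate distributions, as described above, can be illustrated with the sketch below. It assumes maximum-likelihood fitting and selection by log-likelihood; the abstract does not state which goodness-of-fit criterion was used. Only the normal and lognormal candidates are shown because their fits have closed forms, and the sample values are synthetic, not measurements from the paper.

```python
import math

def normal_log_likelihood(samples):
    """Log-likelihood of the samples under a normal fitted by maximum likelihood."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    # At the maximum-likelihood estimates the quadratic term sums to n / 2.
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def lognormal_log_likelihood(samples):
    """Same for a lognormal: a normal fit on log(samples) plus the Jacobian term."""
    logs = [math.log(s) for s in samples]
    return normal_log_likelihood(logs) - sum(logs)

def best_fit(samples):
    """Return the name of the candidate with the highest log-likelihood, and all scores."""
    scores = {
        "normal": normal_log_likelihood(samples),
        "lognormal": lognormal_log_likelihood(samples),
    }
    return max(scores, key=scores.get), scores

# Synthetic, right-skewed positive values (hypothetical, for illustration only).
samples = [math.exp(v) for v in (2.0, 3.0, 4.0, 5.0, 6.0)]
name, scores = best_fit(samples)
```

Both candidates here have two fitted parameters, so their log-likelihoods can be compared directly; candidates with different parameter counts would call for a penalized criterion instead.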

Keywords: on-body channel communications, fading characteristics, statistical model, body movement

Procedia PDF Downloads 327
3784 Social Anxiety Connection with Individual Characteristics: Theory of Mind, Verbal Irony Comprehension and Personal Traits

Authors: Anano Tenieshvili, Teona Lodia

Abstract:

Social anxiety disorder (SAD) is one of the most common mental health problems not only in adults but also in adolescents. Individuals with SAD exhibit difficulties in interpersonal relationships, understanding emotions, and regulating them as well. For social and emotional adaptation, it is crucial to identify, understand, accept, and manage emotions correctly. Researchers are actively studying the factors that contribute to the development and maintenance of this condition. Therefore, the main purpose of this study is to acquire knowledge about the association between social anxiety and individual characteristics, such as theory of mind (ToM), verbal irony comprehension, and personal traits. A total of 112 adolescents aged 12 to 18 were selected for this research, 15 of whom are diagnosed with social anxiety disorder. Statistical analysis was performed on the entire sample, and furthermore, two groups, adolescents with and without social anxiety disorder, were compared separately. Social anxiety and personal traits were assessed by questionnaires. Theory of mind and comprehension of verbal irony were measured using tests. Statistical analysis indicated a positive relationship between social anxiety and comprehension of ironic criticism. Moreover, social anxiety was significantly positively correlated with neuroticism and isolation tendency, whereas it was negatively related to extraversion and frustration tolerance. In addition, statistical analysis revealed a positive relationship between ToM and verbal irony comprehension. However, the relationship between social anxiety and ToM was not statistically significant. In conclusion, the current research expands knowledge about social anxiety and supports the results of some previous studies.

Keywords: personal traits, social anxiety, theory of mind, verbal irony comprehension

Procedia PDF Downloads 170
3783 Social Anxiety Connection with Individual Characteristics: Theory of Mind, Verbal Irony Comprehension and Personal Traits

Authors: Anano Tenieshvili, Teona Lodia

Abstract:

Social anxiety disorder (SAD) is one of the most common mental health problems not only in adults but also in adolescents. Individuals with SAD exhibit difficulties in interpersonal relationships, understanding emotions, and regulating them as well. For social and emotional adaptation, it is crucial to identify, understand, accept, and manage emotions correctly. Researchers are actively studying the factors that contribute to the development and maintenance of this condition. Therefore, the main purpose of this study is to acquire knowledge about the association between social anxiety and individual characteristics, such as theory of mind (ToM), verbal irony comprehension, and personal traits. A total of 112 adolescents aged 12 to 18 were selected for this research, 15 of whom are diagnosed with social anxiety disorder. Statistical analysis was performed on the entire sample, and furthermore, two groups, adolescents with and without social anxiety disorder, were compared separately. Social anxiety and personal traits were assessed by questionnaires. Theory of mind and comprehension of verbal irony were measured using tests. Statistical analysis indicated a positive relationship between social anxiety and comprehension of ironic criticism. Moreover, social anxiety was significantly positively correlated with neuroticism and isolation tendency, whereas it was negatively related to extraversion and frustration tolerance. In addition, statistical analysis revealed a positive relationship between ToM and verbal irony comprehension. However, the relationship between social anxiety and ToM was not statistically significant. In conclusion, the current research expands knowledge about social anxiety and supports the results of some previous studies.

Keywords: personal traits, social anxiety, theory of mind, verbal irony comprehension

Procedia PDF Downloads 96