Search results for: statistical classifiers
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4168

3898 Incorporating Information Gain in Regular Expressions Based Classifiers

Authors: Rosa L. Figueroa, Christopher A. Flores, Qing Zeng-Treitler

Abstract:

A regular expression consists of a sequence of characters that describes a text pattern. In clinical research, regular expressions are usually created manually by programmers together with domain experts. Lately, there have been several efforts to investigate how to generate them automatically. This article presents a text classification algorithm based on regexes. The algorithm, named REX, was designed and implemented as a simplified method to create regexes that classify Spanish text automatically. To classify ambiguous cases, such as when multiple labels are assigned to a testing example, REX includes an information gain method. Two sets of data were used to evaluate the algorithm’s effectiveness in clinical text classification tasks. The results indicate that the regular expression based classifier proposed in this work performs statistically better in terms of accuracy and F-measure than Support Vector Machine and Naïve Bayes on both datasets.
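The abstract does not say how REX computes information gain, so the following is only a minimal sketch of the standard definition (entropy reduction from splitting on a feature). The boolean feature is assumed here to be "does a given regex match the document"; the names `entropy`, `information_gain`, and the toy data are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(labels, feature_present):
    """Entropy reduction obtained by splitting `labels` on a boolean feature.

    Assumption: `feature_present[i]` says whether some regex matched
    training document i. This is the textbook definition, not REX itself.
    """
    matched = [y for y, f in zip(labels, feature_present) if f]
    unmatched = [y for y, f in zip(labels, feature_present) if not f]
    total = len(labels)
    # Weighted entropy of the two partitions (empty partitions contribute 0).
    remainder = sum(
        (len(part) / total) * entropy(part)
        for part in (matched, unmatched)
        if part
    )
    return entropy(labels) - remainder


# Toy data: four documents, two classes.
labels = ["pos", "pos", "neg", "neg"]
regex_a_matches = [True, True, False, False]  # separates the classes perfectly
regex_b_matches = [True, False, True, False]  # carries no class information

gain_a = information_gain(labels, regex_a_matches)
gain_b = information_gain(labels, regex_b_matches)
```

Under this sketch, a regex with higher gain would be the more trustworthy one when two regexes vote for different labels; whether REX resolves ties exactly this way is not stated in the abstract.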

Keywords: information gain, regular expressions, smith-waterman algorithm, text classification

Procedia PDF Downloads 319
3897 Fat-Tail Test of Regulatory DNA Sequences

Authors: Jian-Jun Shu

Abstract:

The statistical properties of cis-regulatory modules (CRMs) are explored by estimating the similar-word set occurrence distribution. It is observed that CRMs tend to have a fat-tail distribution for similar-word set occurrence. Thus, a fat-tail test with two fatness coefficients is proposed to distinguish CRMs from non-CRMs, especially from exons. For the first fatness coefficient, the separation accuracy between CRMs and exons is increased compared with the existing content-based CRM prediction method, the fluffy-tail test. For the second fatness coefficient, the computing time is reduced compared with the fluffy-tail test, making it well suited to long sequences and large database analysis in the post-genome era. Moreover, these indexes may be used to predict CRMs that have not yet been observed experimentally. This can serve as a valuable filtering step for experiments.

Keywords: statistical approach, transcription factor binding sites, cis-regulatory modules, DNA sequences

Procedia PDF Downloads 289
3896 Dimensionality Reduction in Modal Analysis for Structural Health Monitoring

Authors: Elia Favarelli, Enrico Testi, Andrea Giorgetti

Abstract:

Autonomous structural health monitoring (SHM) of structures and bridges has become a topic of paramount importance for maintenance purposes and safety reasons. This paper proposes a set of machine learning (ML) tools to perform automatic feature selection and anomaly detection in a bridge from vibrational data, and compares different feature extraction schemes to increase the accuracy and reduce the amount of data collected. As a case study, the Z-24 bridge is considered because of its extensive database of accelerometric data in both standard and damaged conditions. The proposed framework starts from the first four fundamental frequencies extracted through operational modal analysis (OMA) and clustering, followed by density-based time-domain filtering (tracking). The extracted fundamental frequencies are then fed to a dimensionality reduction block implemented through two different approaches: feature selection (an intelligent multiplexer) that tries to estimate the most reliable frequencies based on the evaluation of some statistical features (i.e., mean value, variance, kurtosis), and feature extraction (an auto-associative neural network (ANN)) that combines the fundamental frequencies to extract new damage-sensitive features in a low-dimensional feature space. Finally, one-class classifier (OCC) algorithms perform anomaly detection; they are trained with standard-condition points and tested with both normal and anomalous ones. In particular, a new anomaly detection strategy is proposed, namely one-class classifier neural network two (OCCNN2), which exploits the classification capability of standard classifiers in an anomaly detection problem, finding the standard class (the boundary of the feature space in normal operating conditions) through a two-step approach: coarse and fine boundary estimation. The coarse estimation uses classic OCC techniques, while the fine estimation is performed through a feedforward neural network (NN) trained on the boundaries estimated in the coarse step. The detection algorithms are then compared with known methods based on principal component analysis (PCA), kernel principal component analysis (KPCA), and auto-associative neural network (ANN). In many cases, the proposed solution increases the performance with respect to the standard OCC algorithms in terms of F1 score and accuracy. In particular, by evaluating the correct features, the anomaly can be detected with accuracy and F1 score greater than 96% with the proposed method.

Keywords: anomaly detection, frequencies selection, modal analysis, neural network, sensor network, structural health monitoring, vibration measurement

Procedia PDF Downloads 122
3895 Facial Pose Classification Using Hilbert Space Filling Curve and Multidimensional Scaling

Authors: Mekamı Hayet, Bounoua Nacer, Benabderrahmane Sidahmed, Taleb Ahmed

Abstract:

Pose estimation is an important task in computer vision. Though the majority of existing solutions provide good accuracy, they are often overly complex and computationally expensive. From this perspective, we propose the use of dimensionality reduction techniques to address the problem of facial pose estimation. First, a face image is converted into a one-dimensional time series using a Hilbert space-filling curve; the approach then converts this time series to a symbolic representation. Next, a distance matrix is calculated between the symbolic series of an input learning dataset of images, to generate classifiers of frontal vs. profile face pose. The proposed method is evaluated on three public datasets. Experimental results show that our approach achieves a correct classification rate exceeding 97% with the K-NN algorithm.
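The first step described above, flattening an image along a Hilbert space-filling curve, can be sketched as follows. This uses the well-known iterative index-to-coordinate mapping and assumes a square image whose side is a power of two; the function names and the tiny example image are hypothetical, and the later symbolic-representation and distance-matrix steps are not shown.

```python
def hilbert_d2xy(side, d):
    """Map index d along a Hilbert curve to (x, y) on a side x side grid.

    `side` must be a power of two. Standard iterative construction.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the current quadrant so sub-curves join up.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def image_to_series(image):
    """Read pixel values of a square image (list of rows) in Hilbert order."""
    side = len(image)
    series = []
    for d in range(side * side):
        x, y = hilbert_d2xy(side, d)
        series.append(image[y][x])
    return series


# Hypothetical 2x2 grayscale "image", indexed as image[row][column].
tiny = [[1, 2],
        [3, 4]]
series = image_to_series(tiny)
```

The appeal of the Hilbert ordering over a plain row-by-row scan is that consecutive samples in the series are always neighbouring pixels, so local image structure tends to stay local in the time series.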

Keywords: machine learning, pattern recognition, facial pose classification, time series

Procedia PDF Downloads 349
3894 A Framework for ERP Project Evaluation Based on BSC Model: A Study in Iran

Authors: Mohammad Reza Ostad Ali Naghi Kashani, Esfanji Elia

Abstract:

Nowadays, the number of companies that want an Enterprise Resource Planning (ERP) application is increasing, particularly in developing countries like Iran. ERP projects are expensive, time-consuming, and complex; in addition, the failure rate among these projects is high. It is important to know whether these projects meet their goals, and the areas that should be improved need to be identified. In this paper, we build a framework to evaluate the success of ERP project implementation. First, based on a literature review, we built the framework on the perspectives of the BSC model (financial, customer, processes, and learning and knowledge); because of its importance, change management was added to the model. Then the organization was divided into three layers: corporate, managerial, and operational. To find criteria for assessing each aspect, we used the Delphi method in two rounds. For the second round, we prepared a questionnaire and performed statistical analysis on the responses. Based on the statistical results, some criteria were accepted and others were rejected.

Keywords: ERP, BSC, ERP project evaluation, IT projects

Procedia PDF Downloads 322
3893 Experimental Investigation of On-Body Channel Modelling at 2.45 GHz

Authors: Hasliza A. Rahim, Fareq Malek, Nur A. M. Affendi, Azuwa Ali, Norshafinash Saudin, Latifah Mohamed

Abstract:

This paper presents an experimental investigation of on-body channel fading at 2.45 GHz considering two states of user body movement: stationary and mobile. A pair of body-worn antennas was used in this measurement campaign. A statistical analysis was performed by comparing the measured on-body path loss to five well-known distributions: lognormal, normal, Nakagami, Weibull and Rayleigh. The results showed that the average path loss with a moving arm was higher than the path loss in the sitting position for the upper-arm-to-left-chest link, by up to 3.5 dB. The analysis also concluded that the Nakagami distribution provided the best fit for most on-body static link path loss in the standing-still and sitting positions, while the arm movement is best described by the lognormal distribution.

Keywords: on-body channel communications, fading characteristics, statistical model, body movement

Procedia PDF Downloads 353
3892 Sub-Pixel Level Classification Using Remote Sensing For Arecanut Crop

Authors: S. Athiralakshmi, B.E. Bhojaraja, U. Pruthviraj

Abstract:

In agriculture, remote sensing is applied for monitoring plant development and evaluating physiological processes and growth conditions. The spatio-temporal aspects of remotely sensed data are especially valuable in detecting crop state differences and stress situations. In this study, Hyperion imagery is used to classify arecanut crops based on their age so that the resulting maps can be used for crop yield estimation, irrigation planning, fertilizer application, etc. Traditional hard classifiers assign mixed pixels to the dominant class. The proposed method uses a sub-pixel level classifier called linear spectral unmixing, available in the ENVI software. It provides the relative abundance of surface materials and the context within a pixel, which may be a potential solution for effectively identifying the land-cover distribution. Validation is done with reference to field spectra collected using a spectroradiometer and ground control points obtained from GPS.

Keywords: FLAASH, Hyperspectral remote sensing, Linear Spectral Unmixing, Spectral Angle Mapper Classifier.

Procedia PDF Downloads 518
3891 Social Anxiety Connection with Individual Characteristics: Theory of Mind, Verbal Irony Comprehension and Personal Traits

Authors: Anano Tenieshvili, Teona Lodia

Abstract:

Social anxiety disorder (SAD) is one of the most common mental health problems not only in adults but also in adolescents. Individuals with SAD exhibit difficulties in interpersonal relationships, understanding emotions, and regulating them as well. For social and emotional adaptation, it is crucial to identify, understand, accept and manage emotions correctly. Researchers actively study the factors that contribute to the development and maintenance of this condition. Therefore, the main purpose of this study is to acquire knowledge about the association between social anxiety and individual characteristics, such as theory of mind (ToM), verbal irony comprehension, and personal traits. 112 adolescents aged from 12 to 18 were selected for this research; 15 of them had been diagnosed with social anxiety disorder. Statistical analysis was performed on the entire sample, and furthermore, two groups, adolescents with and without social anxiety disorder, were compared separately. Social anxiety and personal traits were assessed by questionnaires. Theory of mind and comprehension of verbal irony were measured using tests. Statistical analysis indicated a positive relationship between social anxiety and comprehension of ironic criticism. Moreover, social anxiety was significantly positively correlated with neuroticism and isolation tendency, whereas it was negatively related to extraversion and frustration tolerance. On top of that, statistical analysis revealed a positive relationship between ToM and verbal irony comprehension. However, the relationship between social anxiety and ToM was not statistically significant. In conclusion, the current research expands knowledge about social anxiety and supports the results of some previous studies.

Keywords: personal traits, social anxiety, theory of mind, verbal irony comprehension

Procedia PDF Downloads 201
3890 Social Anxiety Connection with Individual Characteristics: Theory of Mind, Verbal Irony Comprehension and Personal Traits

Authors: Anano Tenieshvili, Teona Lodia

Abstract:

Social anxiety disorder (SAD) is one of the most common mental health problems not only in adults but also in adolescents. Individuals with SAD exhibit difficulties in interpersonal relationships, understanding emotions, and regulating them as well. For social and emotional adaptation, it is crucial to identify, understand, accept and manage emotions correctly. Researchers actively study the factors that contribute to the development and maintenance of this condition. Therefore, the main purpose of this study is to acquire knowledge about the association between social anxiety and individual characteristics, such as theory of mind (ToM), verbal irony comprehension, and personal traits. 112 adolescents aged from 12 to 18 were selected for this research; 15 of them had been diagnosed with social anxiety disorder. Statistical analysis was performed on the entire sample, and furthermore, two groups, adolescents with and without social anxiety disorder, were compared separately. Social anxiety and personal traits were assessed by questionnaires. Theory of mind and comprehension of verbal irony were measured using tests. Statistical analysis indicated a positive relationship between social anxiety and comprehension of ironic criticism. Moreover, social anxiety was significantly positively correlated with neuroticism and isolation tendency, whereas it was negatively related to extraversion and frustration tolerance. On top of that, statistical analysis revealed a positive relationship between ToM and verbal irony comprehension. However, the relationship between social anxiety and ToM was not statistically significant. In conclusion, the current research expands knowledge about social anxiety and supports the results of some previous studies.

Keywords: personal traits, social anxiety, theory of mind, verbal irony comprehension

Procedia PDF Downloads 122
3889 A Comparative Study of Malware Detection Techniques Using Machine Learning Methods

Authors: Cristina Vatamanu, Doina Cosovan, Dragos Gavrilut, Henri Luchian

Abstract:

In the past few years, the amount of malicious software has increased exponentially and, therefore, machine learning algorithms have become instrumental in distinguishing clean files from malware through semi-automated classification. When working with very large datasets, the major challenge is to reach both a very high malware detection rate and a very low false positive rate. Another challenge is to minimize the time needed for the machine learning algorithm to do so. This paper presents a comparative study of different machine learning techniques such as linear classifiers, ensembles, decision trees, and various hybrids thereof. The training dataset consists of approximately 2 million clean files and 200,000 infected files, which is a realistic quantitative mixture. The paper investigates the above-mentioned methods with respect to both their performance (detection rate and false positive rate) and their practicability.

Keywords: ensembles, false positives, feature selection, one side class algorithm

Procedia PDF Downloads 291
3888 A Ratio-Weighted Decision Tree Algorithm for Imbalance Dataset Classification

Authors: Doyin Afolabi, Phillip Adewole, Oladipupo Sennaike

Abstract:

Most well-known classifiers, including the decision tree algorithm, can make predictions on balanced datasets efficiently. However, the decision tree algorithm tends to be biased on imbalanced datasets because of the skewed class distribution of such datasets. To overcome this problem, this study proposes a weighted decision tree algorithm that aims to remove the bias toward the majority class and avoids reducing the number of majority observations in imbalanced dataset classification. The proposed weighted decision tree algorithm was tested on three imbalanced datasets: a cancer dataset, the German credit dataset, and a banknote dataset. The specificity, sensitivity, and accuracy metrics were used to evaluate the performance of the proposed decision tree algorithm on the datasets. The evaluation results show that, for some of the weights of the proposed decision tree, the specificity, sensitivity, and accuracy were better than those of the ID3 decision tree and the decision tree induced with minority entropy on all three datasets.
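To make the three evaluation metrics named above concrete, here is a minimal sketch that computes sensitivity, specificity, and accuracy from binary predictions. It does not reproduce the proposed weighted decision tree; the function name `binary_metrics` and the toy labels are hypothetical. The example shows why accuracy alone can mislead on an imbalanced dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, specificity and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        # True positive rate: share of minority (positive) cases found.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # True negative rate: share of majority (negative) cases kept.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy imbalanced data: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 10

metrics = binary_metrics(y_true, y_pred)
```

Here the always-majority classifier still reaches 80% accuracy while its sensitivity is zero, which is the kind of majority-class bias the abstract is concerned with.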

Keywords: data mining, decision tree, classification, imbalance dataset

Procedia PDF Downloads 133
3887 The Predictive Role of Attachment and Adjustment in the Decision-Making Process in Infertility

Authors: A. Luli, A. Santona

Abstract:

It is rare for individuals who are in a relationship to think about the possibility of having procreation problems now or in the future. However, infertility is a condition that affects millions of people all around the world. Often, infertile individuals have to deal with psychological, relational and social problems. In these cases, they have to review their choices and, if necessary, consider new ones. Different studies have examined the decisions that infertile individuals face in dealing with infertility and its treatment, but none of them has focused on the decision-making style used by infertile individuals to solve their problem or on the factors that influence it. The aim of this paper is to define the style of decision-making used by infertile persons to find a solution to the ‘problem’ and the potential predictive role of attachment and dyadic adjustment. The total sample is composed of 251 participants, divided into two groups: the experimental group, composed of 114 participants (62 males and 52 females, aged between 25 and 59 years), and the control group, composed of 137 participants (65 males and 72 females, aged between 22 and 49 years). The battery of instruments is composed of the General Decision Making Style (GDMS), the Experiences in Close Relationships Questionnaire Revised (ECR-R), the Dyadic Adjustment Scale (DAS), and the Symptom Checklist-90-R (SCL-90-R). The results from the analysis of the samples showed a prevalence of the rational decision-making style for both males and females. No statistically significant difference was found between the experimental and control groups. The analyses also showed a statistically significant relationship between the decision-making styles and the adult attachment styles for both males and females. In this case, only for males, there was a statistically significant difference between the experimental and the control group. Another statistically significant relationship was found between the decision-making styles and the adjustment scales for both males and females. Also in this case, the difference between the two groups was found to be significant only for males. These results enrich the literature on decision-making styles in infertile individuals, showing also the predictive role of attachment styles and adjustment, and in this way confirm the few results in the literature.

Keywords: adjustment, attachment, decision-making style, infertility

Procedia PDF Downloads 332
3886 Credit Risk Assessment Using Rule Based Classifiers: A Comparative Study

Authors: Salima Smiti, Ines Gasmi, Makram Soui

Abstract:

Credit risk is the most important issue for financial institutions. Its assessment is an important task used to predict defaulting customers and to classify customers as good or bad payers. To this end, numerous techniques have been applied to credit risk assessment. However, to our knowledge, several evaluation techniques are black-box models, such as neural networks and SVM. They generate applicants’ classes without any explanation. In this paper, we propose to assess credit risk using a rule-based classification method. Our output is a set of rules that describe and explain the decision. We compare seven classification algorithms (JRip, Decision Table, OneR, ZeroR, Fuzzy Rule, PART and Genetic programming (GP)), where the goal is to find the best rules satisfying several criteria: accuracy, sensitivity, and specificity. The obtained results confirm the efficiency of the GP algorithm on the German and Australian datasets compared to the other rule-based techniques for predicting credit risk.

Keywords: credit risk assessment, classification algorithms, data mining, rule extraction

Procedia PDF Downloads 181
3885 Comparing Machine Learning Estimation of Fuel Consumption of Heavy-Duty Vehicles

Authors: Victor Bodell, Lukas Ekstrom, Somayeh Aghanavesi

Abstract:

Fuel consumption (FC) is one of the key factors in determining the expenses of operating a heavy-duty vehicle. A customer may therefore request an estimate of the FC of a desired vehicle. The modular design of heavy-duty vehicles allows their construction by specifying the building blocks, such as gear box, engine and chassis type. If the combination of building blocks is unprecedented, it is unfeasible to measure the FC, since this would first require the construction of the vehicle. This paper proposes a machine learning approach to predict FC. This study uses information on the specifications and operational environmental conditions, such as road slopes and driver profiles, of around 40,000 vehicles. All vehicles have diesel engines and a mileage of more than 20,000 km. The data is used to investigate the accuracy of the machine learning algorithms linear regression (LR), K-nearest neighbor (KNN) and artificial neural networks (ANN) in predicting fuel consumption for heavy-duty vehicles. Performance of the algorithms is evaluated by reporting the prediction error on both simulated data and operational measurements. The performance of the algorithms is compared using nested cross-validation and statistical hypothesis testing. The statistical evaluation procedure finds that ANNs have the lowest prediction error compared to LR and KNN in estimating fuel consumption on both simulated and operational data. The models have a mean relative prediction error of 0.3% on simulated data, and 4.2% on operational data.

Keywords: artificial neural networks, fuel consumption, friedman test, machine learning, statistical hypothesis testing

Procedia PDF Downloads 178
3884 An AK-Chart for the Non-Normal Data

Authors: Chia-Hau Liu, Tai-Yue Wang

Abstract:

Traditional multivariate control charts assume that measurements from manufacturing processes follow a multivariate normal distribution. However, this assumption may not hold or may be difficult to verify because, in practice, not all measurements from manufacturing processes are normally distributed. This study develops a new multivariate control chart for monitoring processes with non-normal data. We propose a mechanism based on integrating the one-class classification method and the adaptive technique. The adaptive technique is used to improve the sensitivity of one-class classification to small shifts in statistical process control. In addition, this design provides an easy way to allocate the value of the type I error, so it is easier to implement. Finally, a simulation study and real data from industry are used to demonstrate the effectiveness of the proposed control charts.

Keywords: multivariate control chart, statistical process control, one-class classification method, non-normal data

Procedia PDF Downloads 421
3883 Jointly Optimal Statistical Process Control and Maintenance Policy for Deteriorating Processes

Authors: Lucas Paganin, Viliam Makis

Abstract:

With the advent of globalization, the market competition has become a major issue for most companies. One of the main strategies to overcome this situation is the quality improvement of the product at a lower cost to meet customers’ expectations. In order to achieve the desired quality of products, it is important to control the process to meet the specifications, and to implement the optimal maintenance policy for the machines and the production lines. Thus, the overall objective is to reduce process variation and the production and maintenance costs. In this paper, an integrated model involving Statistical Process Control (SPC) and maintenance is developed to achieve this goal. Therefore, the main focus of this paper is to develop the jointly optimal maintenance and statistical process control policy minimizing the total long run expected average cost per unit time. In our model, the production process can go out of control due to either the deterioration of equipment or other assignable causes. The equipment is also subject to failures in any of the operating states due to deterioration and aging. Hence, the process mean is controlled by an Xbar control chart using equidistant sampling epochs. We assume that the machine inspection epochs are the times when the control chart signals an out-of-control condition, considering both true and false alarms. At these times, the production process will be stopped, and an investigation will be conducted not only to determine whether it is a true or false alarm, but also to identify the causes of the true alarm, whether it was caused by the change in the machine setting, by other assignable causes, or by both. If the system is out of control, the proper actions will be taken to bring it back to the in-control state. At these epochs, a maintenance action can be taken, which can be no action, or preventive replacement of the unit. 
When the equipment is in the failure state, a corrective maintenance action is performed, which can be minimal repair or replacement of the machine, and the process is brought to the in-control state. A semi-Markov decision process (SMDP) framework is used to formulate and solve the joint control problem. A numerical example is developed to demonstrate the effectiveness of the control policy.
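As an illustration of the Xbar chart with equidistant sampling mentioned in the abstract, the sketch below flags samples whose mean falls outside the usual limits, target mean ± k·σ/√n. The in-control mean, the standard deviation, the multiplier k = 3, and the sample values are all hypothetical; the paper's contribution, choosing chart and maintenance parameters jointly through the SMDP, is not reproduced here.

```python
import math
from statistics import fmean


def xbar_signals(samples, mu0, sigma, k=3.0):
    """Return one boolean per sample: True if the chart signals out-of-control.

    samples: list of equally sized lists of measurements, one per sampling epoch.
    mu0, sigma: assumed in-control process mean and standard deviation.
    k: control-limit multiplier (3 is a common textbook default).
    """
    signals = []
    for sample in samples:
        half_width = k * sigma / math.sqrt(len(sample))
        lower, upper = mu0 - half_width, mu0 + half_width
        mean = fmean(sample)
        signals.append(mean < lower or mean > upper)
    return signals


# Hypothetical data: two sampling epochs of four measurements each.
samples = [
    [10.1, 9.9, 10.0, 10.0],   # close to the target mean
    [12.0, 12.1, 11.9, 12.0],  # shifted process mean
]
signals = xbar_signals(samples, mu0=10.0, sigma=0.2)
```

In the model described above, each signal would trigger a machine inspection to decide whether the alarm is true or false and which maintenance action, if any, to take.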

Keywords: maintenance, semi-Markov decision process, statistical process control, Xbar control chart

Procedia PDF Downloads 90
3882 Catalytic Thermodynamics of Nanocluster Adsorbates from Informational Statistical Mechanics

Authors: Forrest Kaatz, Adhemar Bultheel

Abstract:

We use an informational statistical mechanics approach to study the catalytic thermodynamics of platinum and palladium cuboctahedral nanoclusters. Nanoclusters and their adatoms are viewed as chemical graphs with a nearest neighbor adjacency matrix. We use the Morse potential to determine bond energies between cluster atoms in a coordination type calculation. We use adsorbate energies calculated from density functional theory (DFT) to study the adatom effects on the thermodynamic quantities, which are derived from a Hamiltonian. Oxygen radical and molecular adsorbates are studied on platinum clusters and hydrogen on palladium clusters. We calculate the entropy, free energy, and total energy as the coverage of adsorbates increases from bridge and hollow sites on the surface. Thermodynamic behavior versus adatom coverage is related to the structural distribution of adatoms on the nanocluster surfaces. The thermodynamic functions are characterized using a simple adsorption model, with linear trends as the coverage of adatoms increases. The data exhibit size effects for the measured thermodynamic properties with cluster diameters between 2 and 5 nm. Entropy and enthalpy calculations of Pt-O2 compare well with previous theoretical data for Pt(111)-O2, and our Pd-H results show trends similar to experimental measurements for Pd-H2 nanoclusters. Our methods are general and may be applied to a wide variety of nanocluster adsorbate systems.

Keywords: catalytic thermodynamics, palladium nanocluster adsorbates, platinum nanocluster adsorbates, statistical mechanics

Procedia PDF Downloads 165
3881 Therapeutic Effect of 12 Weeks of Sensorimotor Exercise on Pain, Functionality and Quality of Life in Non-athlete Women With Patellofemoral Pain Syndrome

Authors: Kasbparast Mehdi, Hassani Zainab

Abstract:

Aim: The purpose of this research was to investigate the effectiveness of therapeutic sensorimotor exercise. The statistical population consisted of women between the ages of 35 and 45 who had been diagnosed with patellofemoral pain syndrome by a doctor and had registered for the first time in a sports club in the 4th district of Tehran. Thirty people were selected by random sampling according to the inclusion and exclusion criteria and divided into two equal, homogeneous groups (in terms of height, weight and BMI): control and experimental. In both groups, pain was measured using a Visual Analog Scale (VAS), functionality was measured using the step-down test, and quality of life was measured using the World Health Organization Quality of Life Scale (WHOQOL-BREF) (pre-test). Then, only the experimental group performed sensorimotor exercises for 12 weeks and 3 sessions each week, a total of 24 sessions and each session for 1 hour, while the control group only continued their daily activities. After the end of the training period, the same factors were evaluated again (post-test) in both groups in the same way as in the pre-test. Findings: The statistical results showed that in the experimental group, pain, function and quality of life improved significantly (P≤0.05). Conclusion: Sensorimotor exercises not only improved functionality and quality of life but also reduced pain in people with patellofemoral pain syndrome.

Keywords: pain, PFPS, sensori motor training, functionality

Procedia PDF Downloads 75
3880 Ontological Modeling Approach for Statistical Databases Publication in Linked Open Data

Authors: Bourama Mane, Ibrahima Fall, Mamadou Samba Camara, Alassane Bah

Abstract:

National Statistical Institutes hold a large volume of data, generally in a format that constrains how the information it contains can be published. Each household or business data collection project includes its own dissemination platform. These existing dissemination methods do not promote rapid access to information and, in particular, do not offer the option of linking data for in-depth processing. In this paper, we present an approach to modeling these data so that they can be published in a format intended for the Semantic Web. Our objective is to publish all this data on a single platform and offer the option to link it with other external data sources. The approach will be applied to data from major national surveys, such as those on employment, poverty, and child labor, and the general census of the population of Senegal.

Keywords: Semantic Web, linked open data, database, statistic

Procedia PDF Downloads 174
3879 Secured Embedding of Patient’s Confidential Data in Electrocardiogram Using Chaotic Maps

Authors: Butta Singh

Abstract:

This paper presents a chaotic map based approach for secured embedding of a patient’s confidential data in an electrocardiogram (ECG) signal. The chaotic map generates predefined locations through the use of selective control parameters. The sample value difference method effectively hides the confidential data in ECG sample pairs at these predefined locations. Evaluation of the proposed method on all 48 records of the MIT-BIH arrhythmia ECG database demonstrates that the embedding does not alter the diagnostic features of the cover ECG. The imperceptibility of the secret data in the stego-ECG is evident through various statistical and clinical performance measures. The statistical metrics comprise Percentage Root Mean Square Difference (PRD) and Peak Signal to Noise Ratio (PSNR). Further, a comparative analysis between the proposed method and existing approaches was performed. The results clearly demonstrate the superiority of the proposed method.
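The two statistical metrics named above can be sketched as follows. The abstract does not give the formulas used, so these are common textbook forms and should be read as assumptions: PRD as 100 times the root of the squared-error energy over the original signal energy, and PSNR with the peak taken as the largest absolute sample of the original signal. The function names and sample values are hypothetical.

```python
import math


def prd(original, stego):
    """Percentage root-mean-square difference between two equal-length signals."""
    err = sum((x - y) ** 2 for x, y in zip(original, stego))
    energy = sum(x ** 2 for x in original)
    return 100.0 * math.sqrt(err / energy)


def psnr(original, stego):
    """Peak signal-to-noise ratio in dB.

    Assumption: the peak is max(|original|); other conventions exist.
    Returns infinity when the two signals are identical.
    """
    mse = sum((x - y) ** 2 for x, y in zip(original, stego)) / len(original)
    if mse == 0:
        return math.inf
    peak = max(abs(x) for x in original)
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical ECG samples before and after embedding.
cover = [0.0, 10.0]
stego = [1.0, 10.0]
distortion_prd = prd(cover, stego)
distortion_psnr = psnr(cover, stego)
```

A lower PRD and a higher PSNR both indicate that the stego-ECG stays closer to the cover ECG, which is the sense in which the abstract uses them as imperceptibility measures.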

Keywords: chaotic maps, ECG steganography, data embedding, electrocardiogram

Procedia PDF Downloads 193
3878 Road Accidents Bigdata Mining and Visualization Using Support Vector Machines

Authors: Usha Lokala, Srinivas Nowduri, Prabhakar K. Sharma

Abstract:

Useful information has been extracted from road accident data in the United Kingdom (UK) using data analytics methods, with the aim of avoiding possible accidents in rural and urban areas. This analysis makes use of several methodologies, such as data integration, support vector machines (SVM), correlation machines, and multinomial goodness. The entire datasets were imported from the traffic department of the UK with due permission. The information extracted from these huge datasets forms a basis for several predictions, which in turn avoid unnecessary memory lapses. Since the data are expected to grow continuously over time, this work primarily proposes a new framework model that can be trained, adapt itself to new data, and make accurate predictions. This work also sheds some light on the use of the SVM methodology for text classifiers built from the obtained traffic data. Finally, it emphasizes the uniqueness and adaptability of the SVM methodology for this kind of research work.

Keywords: support vector mechanism (SVM), machine learning (ML), support vector machines (SVM), department of transportation (DFT)

Procedia PDF Downloads 272
3877 Machine Learning-Enabled Classification of Climbing Using Small Data

Authors: Nicholas Milburn, Yu Liang, Dalei Wu

Abstract:

Athlete performance scoring within the climbing domain presents interesting challenges, as the sport does not have an objective way to assign skill. Assessing skill levels within any sport is valuable, as it can be used to mark progress while training and can help an athlete choose appropriate climbs to attempt. Machine learning-based methods are popular for complex problems like this. The available dataset was composed of dynamic force data recorded during climbing; however, this dataset came with challenges such as data scarcity, imbalance, and temporal heterogeneity. Investigated solutions to these challenges include data augmentation, temporal normalization, conversion of time series to the spectral domain, and cross-validation strategies. The investigated solutions to the classification problem included the lightweight classifiers KNN and SVM as well as deep learning with a CNN. The best-performing model had an accuracy of 80%. In conclusion, there seems to be enough information within climbing force data to accurately categorize climbers by skill.

Keywords: classification, climbing, data imbalance, data scarcity, machine learning, time sequence

Procedia PDF Downloads 140
3876 Statistical Analysis of Rainfall Change over the Blue Nile Basin

Authors: Hany Mustafa, Mahmoud Roushdi, Khaled Kheireldin

Abstract:

Rainfall variability is an important feature of semi-arid climates. Climate change is very likely to increase the frequency, magnitude, and variability of extreme weather events such as droughts, floods, and storms. The Blue Nile Basin is facing extreme climate change-related events such as floods and droughts, and their possible impacts on ecosystems, livelihoods, agriculture, livestock, and biodiversity are expected. Rainfall variability is a threat to food production in the Blue Nile Basin countries. This study investigates the long-term variations and trends of seasonal and annual precipitation over the Blue Nile Basin for a 102-year period (1901-2002). Six statistical trend analyses of precipitation were performed with the nonparametric Mann-Kendall test and Sen's slope estimator. In addition, four statistical absolute homogeneity tests (the Standard Normal Homogeneity Test, the Buishand Range test, the Pettitt test, and the Von Neumann ratio test) were applied to test the homogeneity of the rainfall data using XLSTAT software; results with a p-value less than alpha = 0.05 were considered significant. The percentages of significant trends obtained for each parameter in the different seasons are presented. The study recommends that adaptation strategies be streamlined into relevant policies, enhancing local farmers’ adaptive capacity for facing future climate change effects.
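The trend analysis described here can be sketched as follows. This is a minimal illustration of the Mann-Kendall test and Sen's slope estimator, not the XLSTAT procedure used in the study. It assumes equally spaced observations, uses the variance formula without a correction for tied values, and relies on the normal approximation for the p-value; the function names are hypothetical.

```python
import math
from statistics import median


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Assumption: no tie correction is applied to the variance of S.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return s, z, p_value


def sens_slope(values):
    """Median of all pairwise slopes, assuming unit spacing between observations."""
    n = len(values)
    slopes = [
        (values[j] - values[i]) / (j - i)
        for i in range(n - 1)
        for j in range(i + 1, n)
    ]
    return median(slopes)
```

With a significance level of alpha = 0.05, a series would be flagged as having a significant trend when the returned p-value is below 0.05, and the sign of Sen's slope gives the direction of that trend.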

Keywords: Blue Nile basin, climate change, Mann-Kendall test, trend analysis

Procedia PDF Downloads 548
3875 Various Advanced Statistical Analyses of Index Values Extracted from Outdoor Agricultural Workers Motion Data

Authors: Shinji Kawakura, Ryosuke Shibasaki

Abstract:

We have been grouping and developing various practical and promising applied sensing systems for agricultural advancement and the passing on of technical tradition (guidance). These include advanced devices for capturing real-time data on worker motion, which we analyze using various advanced statistical and human dynamics methods (e.g., principal component analysis, Ward's-method-based cluster analysis, and mapping). In addition, we have been considering workers' daily health and safety issues. The targeted fields are mainly common farms, meadows, and gardens. We then observed and discussed the timeline-style, changing data and made some suggestions. The overall plan makes it possible to improve both the aforementioned applied systems and the farms.

Keywords: advanced statistical analysis, wearable sensing system, tradition of skill, supporting for workers, detecting crisis

Procedia PDF Downloads 392
3874 Chemometric QSRR Evaluation of Behavior of s-Triazine Pesticides in Liquid Chromatography

Authors: Lidija R. Jevrić, Sanja O. Podunavac-Kuzmanović, Strahinja Z. Kovačević

Abstract:

This study considers the selection of the most suitable in silico molecular descriptors that could be used for the characterization of s-triazine pesticides. Suitable descriptors, chosen from among topological, geometrical, and physicochemical ones, are used to establish quantitative structure-retention relationship (QSRR) models. The models were obtained using linear regression (LR) and multiple linear regression (MLR) analysis. In this paper, the MLR models were established while avoiding multicollinearity among the selected molecular descriptors. The statistical quality of the established models was evaluated by standard and cross-validation statistical parameters. To detect similarity or dissimilarity among the investigated s-triazine pesticides and to classify them, principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used and gave similar groupings. This study is financially supported by COST action TD1305.

Keywords: chemometrics, classification analysis, molecular descriptors, pesticides, regression analysis

Procedia PDF Downloads 389
3873 Global Developmental Delay and Its Association with Risk Factors: Validation by Structural Equation Modelling

Authors: Bavneet Kaur Sidhu, Manoj Tiwari

Abstract:

Global Developmental Delay (GDD) is a common pediatric condition. Etiologies of GDD might, however, differ in developing countries. In the last decade, sporadic families have been reported in various countries. To the best of the authors’ knowledge, many risk factors and their relationship with the prevalence of GDD have been studied, but their statistical correlation has not been analyzed. We therefore propose the present study, targeting the risk factors, the prevalence, and their statistical correlation with GDD. The FMR1 gene was studied to confirm the disease and its penetrance. A complete questionnaire was designed for the statistical studies, covering personal, past, and present medical history as well as socio-economic status. Methods: We divided the children into four age groups with 5-year intervals and applied structural equation modeling (SEM) techniques, Spearman’s rank correlation coefficient, the Karl Pearson correlation coefficient, and the chi-square test. Result: A total of 1100 families were enrolled in this study; among them, 330 were clinically and biologically confirmed (radiological studies) for the disease, of whom 204 (61.8%) were male and 126 (38.18%) were female. We found that 27.87% of cases were genetic and 72.12% were sporadic; of the sporadic cases, 43.277% were from urban and 56.72% from rural localities. The mothers' literacy rate was 32.12%, and 41.21% of the mothers were working women. Conclusions: There is a significant association between mothers' age and GDD prevalence, followed by mothers' literacy rate and mothers' occupation, whereas there was no association between fathers' age and GDD.
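One of the measures listed under Methods, Spearman’s rank correlation coefficient, can be sketched as the Pearson correlation of the ranked data. The code below is a generic illustration with hypothetical function names; it assigns average ranks to tied values and is not the authors' analysis pipeline.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because it works on ranks, the coefficient equals 1 for any strictly increasing relationship, linear or not, which makes it suitable for ordinal questionnaire data.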

Keywords: global developmental delay, FMR1 gene, Spearman’s rank correlation coefficient, structural equation modeling

Procedia PDF Downloads 130
3872 A Cross-Dialect Statistical Analysis of Final Declarative Intonation in Tuvinian

Authors: D. Beziakina, E. Bulgakova

Abstract:

This study continues the research on Tuvinian intonation and presents a general cross-dialect analysis of intonation of Tuvinian declarative utterances, specifically the character of the tone movement in order to test the hypothesis about the prevalence of level tone in some Tuvinian dialects. The results of the analysis of basic pitch characteristics of Tuvinian speech (in general and in comparison with two other Turkic languages - Uzbek and Azerbaijani) are also given in this paper. The goal of our work was to obtain the ranges of pitch parameter values typical for Tuvinian speech. Such language-specific values can be used in speaker identification systems in order to get more accurate results of ethnic speech analysis. We also present the results of a cross-dialect analysis of declarative intonation in the poorly studied Tuvinian language.

Keywords: speech analysis, statistical analysis, speaker recognition, identification of person

Procedia PDF Downloads 468
3871 Statistic Regression and Open Data Approach for Identifying Economic Indicators That Influence e-Commerce

Authors: Apollinaire Barme, Simon Tamayo, Arthur Gaudron

Abstract:

This paper presents a statistical approach to identify explanatory variables linearly related to e-commerce sales. The proposed methodology makes it possible to specify a regression model that quantifies the relationship between openly available data (economic and demographic) and national e-commerce sales. It consists of collecting data, preselecting input variables, performing regressions to choose variables and models, and testing and validating. The usefulness of the proposed approach is twofold: on the one hand, it identifies the variables that influence e-commerce sales in an accessible way; on the other hand, it can be used to model future sales from the input variables. Results show that e-commerce is linearly dependent on 11 economic and demographic indicators.

Keywords: e-commerce, statistical modeling, regression, empirical research

Procedia PDF Downloads 224
3870 A Statistical Model for the Dynamics of Single Cathode Spot in Vacuum Cylindrical Cathode

Authors: Po-Wen Chen, Jin-Yu Wu, Md. Manirul Ali, Yang Peng, Chen-Te Chang, Der-Jun Jan

Abstract:

The dynamics of cathode spots have become a major topic in vacuum arc discharge research, owing to their high academic interest and wide application potential. In this article, using a three-dimensional statistical model, we simulate the distribution of the ignition probability of a new cathode spot occurring under different magnetic pressures on the old cathode spot surface and at different arcing times. This model for the ignition probability of a new cathode spot is proposed for two typical situations: a pure isotropic random walk in the absence of an external magnetic field, and retrograde motion in an external magnetic field parallel to the cathode surface. We mainly focus on the relationship developed between the ignition probability density distribution of a new cathode spot and the external magnetic field.

Keywords: cathode spot, vacuum arc discharge, transverse magnetic field, random walk

Procedia PDF Downloads 431
3869 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and can be used for data filtering in artificial intelligence. Its broad applicability to all kinds of data has led to its use in nearly every field of modern life. Classification helps us group different items according to features judged interesting and useful. In this paper, we compare two classification methods, Naïve Bayes and ADTree, used to detect spam e-mail. This choice is motivated by the fact that the Naive Bayes algorithm is based on probability calculus, while the ADTree algorithm is based on decision trees. The parameter settings of these classifiers are chosen to maximize the true positive rate and minimize the false positive rate. The experimental results present classification accuracy and cost analysis with a view to choosing the optimal classifier for spam detection. We also point out the number of attributes needed to obtain a tradeoff between the number of attributes and the classification accuracy.
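The evaluation criteria mentioned in this abstract (true positive rate, false positive rate, and accuracy) can be computed from a classifier's predictions as in the sketch below. The labels "spam" and "ham" and the function names are assumptions for illustration; the code is independent of whether the predictions come from Naive Bayes, ADTree, or any other classifier.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def rates(y_true, y_pred, positive="spam"):
    """Return (true positive rate, false positive rate, accuracy).

    A rate whose denominator is zero is reported as 0.0.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return tpr, fpr, accuracy
```

In spam filtering, a false positive means a legitimate message was discarded, so a comparison of classifiers would typically weigh a low false positive rate more heavily than raw accuracy.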

Keywords: classification, data mining, spam filtering, naive bayes, decision tree

Procedia PDF Downloads 408