Search results for: outlier.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 32

32 Anomaly Based On Frequent-Outlier for Outbreak Detection in Public Health Surveillance

Authors: Zalizah Awang Long, Abdul Razak Hamdan, Azuraliza Abu Bakar

Abstract:

Public health surveillance systems focus on outbreak detection and on the data sources used. Variations or aberrations in the frequency distribution of health data, compared with historical data, are often used to detect outbreaks. New techniques are needed to improve the detection rate and thereby reduce the waste of public health resources. The objective of this work is therefore to develop a technique that applies frequent-pattern mining and outlier mining to outbreak detection. The proposed technique was tested on 14 datasets from the UCI repository, and the effectiveness of each technique was measured with a t-test. The overall results show that DTK can be used to detect outliers within frequent datasets. In conclusion, outbreak detection based on the anomaly-based frequent-outlier technique can identify outliers within frequent datasets.
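
The abstract does not define DTK or its scoring rule. As a hedged illustration of how frequent-pattern mining can expose outliers, the Python sketch below implements the Frequent Pattern Outlier Factor (FPOF) from the FP-Outlier method of He et al., a related published technique rather than the authors' method: a record that contains few of the dataset's frequent itemsets receives a low score. The brute-force itemset search, the `min_support` and `max_len` values, and the symptom records are all illustrative assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support=0.5, max_len=3):
    """Brute-force frequent itemsets; adequate only for small, illustrative data."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, max_len + 1):
        found_any = False
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                frequent[cand] = support
                found_any = True
        if not found_any:  # no frequent k-itemsets means no frequent (k+1)-itemsets
            break
    return frequent


def fpof(transaction, frequent):
    """FPOF: summed support of the frequent itemsets contained in the record,
    divided by the number of frequent itemsets. Low values suggest outliers."""
    if not frequent:
        return 0.0
    covered = sum(s for itemset, s in frequent.items() if itemset <= transaction)
    return covered / len(frequent)


# Usage with invented case reports (one frozenset of symptoms per case).
records = [frozenset({"fever", "cough"})] * 4 + [
    frozenset({"fever", "cough", "rash"}),
    frozenset({"rash", "vomiting"}),
]
fis = frequent_itemsets(records, min_support=0.5)
scores = [fpof(r, fis) for r in records]
most_unusual = min(range(len(records)), key=scores.__getitem__)
```

A production version would use Apriori or FP-growth instead of enumerating every candidate itemset.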

Keywords: Outlier detection, frequent-outlier, outbreak, anomaly, surveillance, public health

PDF Downloads: 2228

31 Semi-Supervised Outlier Detection Using a Generative and Adversary Framework

Authors: Jindong Gu, Matthias Schubert, Volker Tresp

Abstract:

In many outlier detection tasks, only training data belonging to one class, i.e., the positive class, is available. The task is then to predict whether a new data point belongs to the positive class or to the negative class; in the latter case, the data point is considered an outlier. For this task, we propose a novel corrupted Generative Adversarial Network (CorGAN). In the adversarial training process of CorGAN, the Generator generates outlier samples for the negative class, and the Discriminator is trained to distinguish the positive training data from the generated negative data. The proposed framework is evaluated on an image dataset and a real-world network intrusion dataset. Our outlier detection method achieves state-of-the-art performance on both tasks.

Keywords: Outlier detection, generative adversarial networks, semi-supervised learning.

PDF Downloads: 1010

30 Unsupervised Outlier Detection in Streaming Data Using Weighted Clustering

Authors: Yogita, Durga Toshniwal

Abstract:

Outlier detection in streaming data is very challenging because streaming data cannot be scanned multiple times and new concepts may keep evolving. Irrelevant attributes, also termed noisy attributes, further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. The scheme is based on clustering, because clustering is an unsupervised data mining task that does not require labeled data; density-based and partitioning clustering are combined for outlier detection. Partitioning clustering is also used to assign adaptive weights to attributes according to their relevance, which helps reduce or remove the effect of noisy attributes. Given the challenges of streaming data, the proposed scheme is incremental and adapts to concept evolution. Experimental results on synthetic and real-world datasets show that the proposed approach outperforms an existing approach (CORM) in terms of outlier detection rate and false alarm rate, including under increasing percentages of outliers.

Keywords: Concept Evolution, Irrelevant Attributes, Streaming Data, Unsupervised Outlier Detection.

PDF Downloads: 2584

29 Class Outliers Mining: Distance-Based Approach

Authors: Nabil M. Hewahi, Motaz K. Saad

Abstract:

In large datasets, identifying exceptional or rare cases with respect to a group of similar cases is considered a very significant problem. The traditional problem (outlier mining) is to find exceptional or rare cases in a dataset irrespective of their class labels; such cases are considered rare events with respect to the whole dataset. In this research, we pose the problem of class outlier mining and propose a method to find such outliers. The general definition of this problem is: "given a set of observations with class labels, find those that arouse suspicion, taking into account the class labels". We introduce a novel definition of outlier, the class outlier, and propose the Class Outlier Factor (COF), which measures the degree to which a data object is a class outlier. Our work includes a new algorithm for mining class outliers, experimental results on real-world datasets from various domains, and a comparison with other related methods.
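
The abstract describes the Class Outlier Factor only qualitatively; its full definition is in the paper. The sketch below is a simplified, hypothetical stand-in that captures the core idea of judging an object against its neighbours' class labels: each point is scored by the fraction of its `k` nearest neighbours (Euclidean distance) that carry a different label. The function name, `k` and the toy data are assumptions.

```python
import math


def class_outlier_scores(points, labels, k=3):
    """Fraction of each point's k nearest neighbours whose label differs.

    A simplified stand-in for a class-outlier measure, not the paper's COF.
    """
    scores = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to p (ties broken by index).
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours = [j for _, j in others[:k]]
        disagree = sum(1 for j in neighbours if labels[j] != labels[i])
        scores.append(disagree / k)
    return scores


# Two labelled groups plus one "B" point sitting inside the "A" group.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]
scores = class_outlier_scores(points, labels, k=3)
suspect = max(range(len(points)), key=scores.__getitem__)  # index 4
```

Note that the suspicious point is not a global outlier at all; it is only unusual given its class label, which is the distinction the abstract draws.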

Keywords: Class Outliers, Distance-Based Approach, Outliers Mining.

PDF Downloads: 3338

28 On the Outlier Detection in Nonlinear Regression

Authors: Hossein Riazoshams, Midi Habshah, Jr., Mohamad Bakri Adam

Abstract:

The detection of outliers is essential because outliers can cause serious interpretive problems in linear as well as nonlinear regression analysis. Much work has been done on identifying outliers in linear regression, but not in nonlinear regression. In this article, we propose several outlier detection techniques for nonlinear regression. The main idea is to use the linear approximation of a nonlinear model and take the gradient as the design matrix; the detection techniques are then formulated on this basis. Six detection measures are developed and combined with three estimation techniques: the least-squares, M- and MM-estimators. The study shows that, among the six measures, only the studentized residual and Cook's distance, when combined with the MM-estimator, consistently identify the correct outliers.
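
The linearisation idea can be sketched in a few lines of NumPy. Under the assumptions stated here (a hypothetical exponential model, synthetic data, and a plain Gauss-Newton least-squares fit instead of the M- or MM-estimators the study recommends), the Jacobian evaluated at the fitted parameters plays the role of the design matrix: its hat matrix yields leverages, from which studentized residuals and Cook's distance follow as in linear regression.

```python
import numpy as np


# Hypothetical example model y = a * exp(b * x); any smooth model works
# as long as its Jacobian (gradient w.r.t. the parameters) is supplied.
def model(x, theta):
    a, b = theta
    return a * np.exp(b * x)


def jacobian(x, theta):
    a, b = theta
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e])  # columns: d/da, d/db


def gauss_newton(x, y, theta0, iters=100, tol=1e-12):
    """Plain least-squares fit (no M- or MM-estimation here)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        r = y - model(x, theta)
        step, *_ = np.linalg.lstsq(jacobian(x, theta), r, rcond=None)
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    return theta


def diagnostics(x, y, theta):
    """Studentized residuals and Cook's distance, using the Jacobian at the
    estimate as the design matrix of the linear approximation."""
    J = jacobian(x, theta)
    n, p = J.shape
    r = y - model(x, theta)
    h = np.diag(J @ np.linalg.solve(J.T @ J, J.T))  # leverages
    s2 = r @ r / (n - p)
    stud = r / np.sqrt(s2 * (1.0 - h))
    cook = stud**2 * h / (p * (1.0 - h))
    return stud, cook


# Synthetic data with one planted outlier at index 5.
x = np.arange(10.0)
y = 2.0 * np.exp(0.3 * x)
y[5] += 5.0
b0, log_a0 = np.polyfit(x, np.log(y), 1)  # log-linear starting values
theta_hat = gauss_newton(x, y, (np.exp(log_a0), b0))
stud, cook = diagnostics(x, y, theta_hat)
```

With a least-squares fit, a group of outliers can mask itself; the abstract's finding that only the MM-based combinations were consistently reliable suggests replacing `gauss_newton` with a robust estimator in practice.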

Keywords: Nonlinear Regression, outliers, Gradient, Least Squares, M-estimate, MM-estimate.

PDF Downloads: 3109

27 Upgraded Rough Clustering and Outlier Detection Method on Yeast Dataset by Entropy Rough K-Means Method

Authors: P. Ashok, G. M. Kadhar Nawaz

Abstract:

Rough set theory handles uncertainty and incomplete information by means of two precisely defined sets: the lower approximation and the upper approximation. In this paper, rough clustering algorithms are improved by adopting similarity-, dissimilarity-similarity- and entropy-based methods for selecting the initial centroids. Three clustering algorithms, Entropy-based Rough K-Means (ERKM), Similarity-based Rough K-Means (SRKM) and Dissimilarity-Similarity-based Rough K-Means (DSRKM), were developed and run on the yeast dataset. The rough clustering algorithms are validated with the Rand and Adjusted Rand cluster validity indexes. Experimental results show that the ERKM algorithm performs effectively and delivers better results than the other clustering methods. Outlier detection is an important task in data mining; outliers are objects that differ markedly from the rest of the objects in the clusters. The Entropy-based Rough Outlier Factor (EROF) method appears to detect outliers effectively on the yeast dataset. In the rough K-Means method, tuning the epsilon (ε) value from 0.8 to 1.08 detects outliers in the boundary region, and the RKM algorithm delivers better results when ε is chosen within this range. Experimental results show that the EROF method performs very well with the clustering algorithms and is suitable for detecting outliers effectively in all datasets. Further, the experiments show that the ERKM clustering method outperformed the other methods.
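
The boundary region mentioned above comes from the rough K-Means assignment step. The sketch below shows one common formulation, assumed here because the abstract does not give the exact rule: a point joins the upper approximation of its nearest centroid and of any other centroid whose distance is at most `eps` times the nearest distance; otherwise it also joins the nearest lower approximation. In this ratio form, any `eps` below 1 leaves every boundary empty, so the 0.8 to 1.08 range quoted in the abstract presumably refers to a differently defined parameter. Centroid updates, entropy-based initialisation and EROF are not shown.

```python
import math


def rough_assign(points, centroids, eps=1.2):
    """Rough K-Means assignment step (ratio-threshold form, an assumption).

    A point goes to the upper approximation of its nearest centroid and of
    every other centroid within eps times the nearest distance. If no other
    centroid qualifies, it also joins the nearest lower approximation.
    """
    k = len(centroids)
    lower = {j: [] for j in range(k)}
    upper = {j: [] for j in range(k)}
    for idx, p in enumerate(points):
        d = [math.dist(p, c) for c in centroids]
        nearest = min(range(k), key=d.__getitem__)
        close = [j for j in range(k) if j != nearest and d[j] <= eps * d[nearest]]
        upper[nearest].append(idx)
        for j in close:
            upper[j].append(idx)
        if not close:
            lower[nearest].append(idx)
    # Boundary region of cluster j = upper approximation minus lower approximation.
    boundary = {j: sorted(set(upper[j]) - set(lower[j])) for j in range(k)}
    return lower, upper, boundary


points = [(0.5, 0.0), (2.1, 0.0), (3.5, 0.0)]
centroids = [(0.0, 0.0), (4.0, 0.0)]
lower, upper, boundary = rough_assign(points, centroids, eps=1.2)
```

Here the middle point lies within 1.2 times its nearest distance of both centroids, so it lands in both boundary regions, which is where outlier candidates are then examined.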

Keywords: Clustering, Entropy, Outlier, Rough K-Means, validity index.

PDF Downloads: 1360

26 DCBOR: A Density Clustering Based on Outlier Removal

Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. We present an enhanced version of the well-known single-link clustering algorithm, which we refer to as DCBOR. The proposed algorithm alleviates the chain effect by removing outliers from the given dataset, so it performs outlier detection and data clustering simultaneously. It does not need to update the distance matrix, because it merges the k nearest objects in one step and each cluster continues to grow as long as possible under a specified condition. The algorithm therefore consists of two phases: the first phase removes the outliers from the input dataset, and the second performs the clustering. It discovers clusters of different shapes, sizes and densities and requires only one input parameter, a threshold for outlier points whose value ranges from 0 to 1; the algorithm supports the user in determining an appropriate value for it. We have tested the algorithm on different datasets containing outliers and clusters connected by chains of dense points, and it discovers the correct clusters. The results of our experiments demonstrate the effectiveness and efficiency of DCBOR.
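
The two phases can be sketched as follows. This is a hypothetical reading, not DCBOR itself: the outlier factor (mean distance to the `k` nearest neighbours, divided by its maximum so it lies in [0, 1]) and the merge rule (connected components of the k-nearest-neighbour graph among the remaining points) are assumptions, and `k` is an extra parameter that the published algorithm does not require.

```python
import math


def dcbor_like(points, threshold=0.5, k=3):
    """Two-phase sketch: (1) drop points whose normalised k-NN distance
    exceeds `threshold` (in [0, 1]); (2) link each remaining point to its
    remaining k nearest neighbours and return the connected components."""
    n = len(points)
    neigh = []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neigh.append(dists[:k])
    knn_dist = [sum(d for d, _ in nb) / len(nb) for nb in neigh]
    top = max(knn_dist) or 1.0                    # guard against all-identical points
    factor = [d / top for d in knn_dist]          # outlier factor in [0, 1]
    inliers = {i for i in range(n) if factor[i] <= threshold}

    parent = {i: i for i in inliers}              # union-find over inliers

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]         # path halving
            i = parent[i]
        return i

    for i in inliers:                             # phase 2: merge k-NN links
        for _, j in neigh[i]:
            if j in inliers:
                parent[find(i)] = find(j)

    clusters = {}
    for i in inliers:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values()), sorted(set(range(n)) - inliers)


pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
clusters, outliers = dcbor_like(pts, threshold=0.5, k=3)
```

Removing the far point before linking is what prevents it from bridging clusters, which mirrors the chain-effect argument in the abstract.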

Keywords: Data Clustering, Clustering Algorithms, Handling Noise, Arbitrary Shape of Clusters.

PDF Downloads: 1887

25 Outlier Pulse Detection and Feature Extraction for Wrist Pulse Analysis

Authors: Bhaskar Thakker, Anoop Lal Vyas

Abstract:

Wrist pulse analysis for assessing health status is described in ancient Indian as well as Chinese literature. Preprocessing of the wrist pulse is necessary to remove outlier pulses and fluctuations before the pulse pressure signal is analyzed. This paper discusses the identification of irregular pulses in the pulse series and the intricacies of extracting time-domain pulse features. Dynamic Time Warping (DTW) is used to identify outlier pulses in the wrist pulse series. Ambiguity in identifying pulse features is resolved using the first derivative of the ensemble average of the wrist pulse series. An algorithm for detecting the tidal wave and the dicrotic notch in individual wrist pulse segments is proposed.
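
A minimal sketch of DTW-based outlier-pulse screening, under stated assumptions: pulses are already segmented to a common length, the template is the point-wise ensemble average, and a pulse is flagged when its DTW distance to the template exceeds `factor` times the median distance. The decision rule, `factor` and the synthetic pulses are hypothetical; the abstract does not state how the paper thresholds DTW distances.

```python
import math
import statistics


def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def ensemble_average(pulses):
    """Point-wise mean; assumes the segmented pulses share one length."""
    return [sum(col) / len(pulses) for col in zip(*pulses)]


def outlier_pulses(pulses, factor=2.0):
    """Flag pulses whose DTW distance to the ensemble average exceeds
    `factor` times the median distance (hypothetical decision rule)."""
    template = ensemble_average(pulses)
    dist = [dtw(p, template) for p in pulses]
    cutoff = factor * statistics.median(dist)
    return [i for i, d in enumerate(dist) if d > cutoff]


beat = [math.sin(math.pi * t / 49) for t in range(50)]
pulses = [beat] * 4 + [[0.0] * 50]   # four regular pulses and one flat segment
flagged = outlier_pulses(pulses)
```

Because an outlier pulse also distorts the mean template, a median template is a reasonable variant when many irregular pulses are expected.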

Keywords: Wrist Pulse Segment, Ensemble Average, Dynamic Time Warping (DTW), Pulse Similarity Vector.

PDF Downloads: 2036

24 Semantic Spatial Objects Data Structure for Spatial Access Method

Authors: Kalum Priyanath Udagepola, Zuo Decheng, Wu Zhibo, Yang Xiaozong

Abstract:

Modern spatial database management systems require a specialized Spatial Access Method (SAM) to answer complex spatial queries efficiently, and the spatial data structure takes a prominent place in a SAM. An inadequate data structure leads to poor algorithmic choices and a deficient understanding of algorithm behavior on the spatial database. A key step in developing a better semantic spatial object data structure is to quantify the performance effects of semantics and outlier detection, which are not reflected in previous tree structures (the R-Tree and its variants). This paper explores a novel SSRO-Tree for SAM based on the topo-semantic approach. It shows how to identify and handle semantic spatial objects and outlier objects during page overflow/underflow using gain/loss metrics. We introduce a new SSRO-Tree algorithm that achieves better performance in practice than the R*-Tree and RO-Tree algorithms for selection queries.

Keywords: Outlier, semantic spatial object, spatial objects, SSRO-Tree, topo-semantic.

PDF Downloads: 1646

23 Techniques for Video Mosaicing

Authors: P. Saravanan, Narayanan C. K., P. V. S. S. Prakash, Prabhakara Rao G. V.

Abstract:

Video mosaicing is the stitching of selected frames of a video: the camera motion between frames is estimated, and successive frames are registered to build the mosaic. Various techniques for video mosaicing have been proposed in the literature. Despite the large number of papers on generating mosaics, only a few authors have investigated the conditions under which these techniques produce good estimates of the motion parameters. In this paper, these techniques are studied on different videos, and the reasons for their failures are identified. We propose algorithms that incorporate outlier removal for better estimation of the motion parameters.
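
The abstract does not name the outlier removal algorithms or the motion model. A common choice is RANSAC, sketched below for the simplest case of a pure 2-D translation between matched feature points; the translation-only model, tolerance, iteration count and synthetic matches are assumptions, and practical mosaicing usually estimates affine or projective motion instead.

```python
import math
import random


def ransac_translation(src, dst, iters=100, tol=1.0, seed=0):
    """Estimate a pure 2-D translation dst ~ src + t with RANSAC.

    One correspondence suffices per hypothesis; correspondences whose
    residual exceeds `tol` pixels are treated as outliers.
    """
    rng = random.Random(seed)
    pairs = list(zip(src, dst))
    best = []
    for _ in range(iters):
        (x, y), (u, v) = rng.choice(pairs)
        tx, ty = u - x, v - y
        inliers = [i for i, ((a, b), (c, d)) in enumerate(pairs)
                   if math.hypot(c - a - tx, d - b - ty) <= tol]
        if len(inliers) > len(best):
            best = inliers
    # Refit on the consensus set: mean displacement of the inliers.
    tx = sum(pairs[i][1][0] - pairs[i][0][0] for i in best) / len(best)
    ty = sum(pairs[i][1][1] - pairs[i][0][1] for i in best) / len(best)
    return (tx, ty), best


# Synthetic matches: 20 consistent with t = (5, -3), 5 gross mismatches.
src = [(i, (2 * i) % 7) for i in range(25)]
dst = [(x + 5, y - 3) for x, y in src[:20]]
dst += [(x + 40 + 3 * i, y + 25) for i, (x, y) in enumerate(src[20:])]
t, inliers = ransac_translation(src, dst)
```

The refit on the consensus set is what keeps the five mismatches from biasing the final motion estimate, as they would in a plain least-squares average.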

Keywords: Motion parameters, Outlier removal algorithms, Registering, Video Mosaicing.

PDF Downloads: 1206

22 Robust Variogram Fitting Using Non-Linear Rank-Based Estimators

Authors: Hazem M. Al-Mofleh, John E. Daniels, Joseph W. McKean

Abstract:

In this paper, numerous robust fitting procedures are considered for estimating spatial variograms. In spatial statistics, the conventional variogram fitting procedure (non-linear weighted least squares) suffers from the same outlier problem that has plagued it since its inception: even a 3-parameter model such as the variogram can be adversely affected by a single outlier. This paper uses Hogg-type adaptive procedures to select an optimal score function for a rank-based estimator of these non-linear models. Numerical examples and simulation studies demonstrate the robustness, utility, efficiency and validity of these estimates.
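
For reference, the quantities involved can be written out. The empirical semivariogram is the classical (Matheron) estimator over the set N(h) of location pairs separated by lag h; the spherical model is shown only as an example of a 3-parameter variogram (nugget c_0, partial sill c_s, range a), since the abstract does not name the models used; the weighted least-squares criterion is Cressie's; and the rank-based criterion is Jaeckel's dispersion, where R(e_k) is the rank of residual e_k among the K lag residuals and a(·) is the score function a Hogg-type adaptive procedure would select. Defining the residuals on the semivariogram scale is an assumption.

```latex
\hat{\gamma}(h) = \frac{1}{2\,|N(h)|} \sum_{(i,j) \in N(h)} \bigl( Z(s_i) - Z(s_j) \bigr)^2

\gamma(h;\theta) =
\begin{cases}
0, & h = 0, \\
c_0 + c_s \left( \dfrac{3h}{2a} - \dfrac{h^3}{2a^3} \right), & 0 < h \le a, \\
c_0 + c_s, & h > a,
\end{cases}
\qquad \theta = (c_0, c_s, a)

\text{WLS:}\quad \min_{\theta} \sum_{k=1}^{K} |N(h_k)| \left[ \frac{\hat{\gamma}(h_k)}{\gamma(h_k;\theta)} - 1 \right]^2

\text{Rank-based:}\quad \min_{\theta} D(\theta) = \sum_{k=1}^{K} a\bigl(R(e_k(\theta))\bigr)\, e_k(\theta),
\qquad e_k(\theta) = \hat{\gamma}(h_k) - \gamma(h_k;\theta)

\text{Wilcoxon scores:}\quad a(i) = \sqrt{12} \left( \frac{i}{K+1} - \frac{1}{2} \right)
```

Because the rank-based criterion grows only linearly in each residual, a single aberrant lag cannot dominate it the way it dominates the squared WLS terms.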

Keywords: Asymptotic relative efficiency, non-linear rank-based, robust, rank estimates, variogram.

PDF Downloads: 1537

21 EEG Signal Processing Methods to Differentiate Mental States

Authors: Sun H. Hwang, Young E. Lee, Yunhan Ga, Gilwon Yoon

Abstract:

EEG is a very complex signal contaminated by noise and other bio-potential interference. EOG is the most distinct interfering signal when EEG signals are measured and analyzed, so the way raw EEG signals are processed is critical for obtaining useful information. In this study, EEG signal processing techniques such as EOG filtering and outlier removal were examined to minimize unwanted EOG signals and other noise. Two mental states, resting and focusing, were examined through EEG analysis; a focused state was induced by having subjects watch a red dot on a white screen. EEG data were measured for 32 healthy subjects. After 60-Hz notch filtering, the EEG data were processed both by a commercially available EOG filter and by our algorithm based on outlier removal. The ratio of beta waves to theta waves was used as a parameter for determining the degree of focus. The results show that our algorithm was more appropriate than the existing EOG filtering.
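
The beta-to-theta measure with an outlier-removal step can be sketched as below. Everything specific here is an assumption rather than the paper's algorithm: a 256 Hz sampling rate, 1-second epochs, theta as 4 to 8 Hz and beta as 13 to 30 Hz, band power from the squared FFT magnitude, and rejection of epochs whose peak-to-peak amplitude exceeds `k` times the median. The 60-Hz notch filter and EOG filtering are not shown.

```python
import numpy as np


def band_power(epoch, fs, lo, hi):
    """Sum of FFT power in the band [lo, hi] Hz."""
    power = np.abs(np.fft.rfft(epoch)) ** 2
    freqs = np.fft.rfftfreq(len(epoch), d=1.0 / fs)
    return power[(freqs >= lo) & (freqs <= hi)].sum()


def beta_theta_ratio(signal, fs=256, epoch_sec=1.0, k=3.0):
    """Mean beta/theta power ratio over epochs, after rejecting epochs whose
    peak-to-peak amplitude exceeds k times the median (hypothetical rule).
    Returns (mean ratio, number of rejected epochs)."""
    n = int(fs * epoch_sec)
    epochs = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    ptp = np.array([e.max() - e.min() for e in epochs])
    keep = ptp <= k * np.median(ptp)
    ratios = [band_power(e, fs, 13, 30) / band_power(e, fs, 4, 8)
              for e, ok in zip(epochs, keep) if ok]
    return float(np.mean(ratios)), int((~keep).sum())


fs = 256
t = np.arange(10 * fs) / fs
eeg = np.sin(2 * np.pi * 6 * t) + np.sin(2 * np.pi * 20 * t)  # theta + beta tone
eeg[3 * fs + 10] += 50.0    # simulated artifact spike in the fourth epoch
ratio, rejected = beta_theta_ratio(eeg, fs)
```

Without the rejection step, the broadband spike would add power to both bands of its epoch and pull that epoch's ratio away from the others.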

Keywords: EEG, focus, mental state, outlier, signal processing.

PDF Downloads: 1478

20 Electron Density Discrepancy Analysis of Energy Metabolism Coenzymes

Authors: Alan Luo, Hunter N. B. Moseley

Abstract:

Many macromolecular structure entries in the Protein Data Bank (PDB) have a range of regional (localized) quality issues, whether they are derived from X-ray crystallography, Nuclear Magnetic Resonance (NMR) spectroscopy, or other experimental approaches. However, most PDB entries are judged by global quality metrics such as R-factor, R-free and resolution for X-ray crystallography, or backbone phi-psi distribution statistics and average restraint violations for NMR. Regional quality is often ignored when PDB entries are reused for a variety of structure-based analyses. The binding of ligands, especially ligands involved in energy metabolism, is of particular interest in many structurally focused protein studies. Using a regional quality metric that provides chemically interpretable information from electron density maps, a significant number of outliers in regional structural quality were detected across X-ray crystallographic PDB entries for proteins bound to biochemically critical ligands. In this study, a series of analyses was performed to evaluate both specific and general potential factors that could promote these outliers, namely the minimum distance to a metal ion, the minimum distance to a crystal contact, and the isotropic atomic b-factor. To evaluate these potential factors, Fisher's exact tests were performed, comparing regional quality classified as outlier (top 1%, 2.5%, 5% or 10%) versus non-outlier against a potential factor metric above versus below a certain outlier cutoff. The results revealed a consistent general effect from region-specific normalized b-factors, no specific effect from metal ion contact distances, and only a very weak effect from crystal contact distance compared with the b-factor results. These findings indicate that no single specific potential factor explains the majority of the outlier ligand-bound regions, implying that human error is likely as important as these other factors.
Thus, all factors, including human error, should be considered when regions of low structural quality are detected. Also, the downstream re-use of protein structures for studying ligand-bound conformations should screen the regional quality of the binding sites. Doing so prevents misinterpretation due to the presence of structural uncertainty or flaws in regions of interest.
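
Each of the Fisher's exact tests described above operates on a 2x2 table (outlier versus non-outlier region, factor above versus below the cutoff). A dependency-free sketch of the two-sided test follows; the counts in the usage line are hypothetical, and the abstract does not say whether one- or two-sided tests were used. SciPy users would normally call `scipy.stats.fisher_exact` instead.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # P(top-left cell = x | fixed row and column margins)
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: rows = outlier / non-outlier region,
# columns = b-factor above / below the cutoff.
p = fisher_exact_two_sided(3, 1, 1, 3)
```

For these counts the p-value is 34/70, about 0.486, so such a small table carries no evidence of association.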

Keywords: Biomacromolecular structure, coenzyme, electron density discrepancy analysis, X-ray crystallography.

PDF Downloads: 119

19 Evaluation of Graph-based Analysis for Forest Fire Detections

Authors: Young Gi Byun, Yong Huh, Kiyun Yu, Yong Il Kim

Abstract:

Spatial outliers in remotely sensed imagery are observed quantities whose values are unusual compared with those of neighboring pixels. Various methods in statistics and data mining detect spatial outliers based on spatial autocorrelation. These methods may be applied to detect forest fire pixels in MODIS imagery from NASA's Aqua satellite, because forest fire detection can be framed as finding spatial outliers in the spatial variation of brightness temperature. This is what distinguishes our approach from traditional fire detection methods. In this paper, we propose a graph-based forest fire detection algorithm built on spatial outlier detection methods and test it to evaluate its applicability. For this purpose, the ordinary scatter plot and Moran's scatter plot were used. To evaluate the proposed algorithm, the results were compared with the MODIS fire product provided by the NASA MODIS Science Team, which indicated the potential of the proposed algorithm for detecting fire pixels.
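
The Moran scatter plot idea can be sketched with NumPy: standardise the pixel values, compute each pixel's spatial lag as the mean of its neighbours, and flag pixels in the high-low quadrant (above-average value, below-average neighbourhood), which have negative local Moran's I. The 8-neighbour weights, the quadrant rule and the synthetic brightness-temperature grid are assumptions; the abstract does not describe the graph-based algorithm's exact criteria.

```python
import numpy as np


def spatial_lag(z):
    """Mean of the 8-neighbourhood (queen contiguity, row-standardised);
    edge pixels average only the neighbours that exist."""
    rows, cols = z.shape
    padded = np.pad(z, 1, constant_values=0.0)
    valid = np.pad(np.ones_like(z), 1, constant_values=0.0)
    total = np.zeros_like(z)
    count = np.zeros_like(z)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            total += padded[1 + di:1 + di + rows, 1 + dj:1 + dj + cols]
            count += valid[1 + di:1 + di + rows, 1 + dj:1 + dj + cols]
    return total / count


def high_low_outliers(image):
    """Pixels in the high-low quadrant of Moran's scatter plot: above-average
    value (z > 0) surrounded by below-average neighbours (lag < 0)."""
    z = (image - image.mean()) / image.std()
    lag = spatial_lag(z)
    local_moran = z * lag              # negative for spatial outliers
    mask = (z > 0) & (lag < 0)
    return np.argwhere(mask), local_moran


# Invented brightness-temperature grid (K) with one hot pixel.
bt = np.full((10, 10), 300.0)
bt[4, 5] = 400.0
hot, local_i = high_low_outliers(bt)
```

The hot pixel's neighbours fall into the low-high quadrant instead, which is why the quadrant test, not the sign of local Moran's I alone, isolates the fire candidate.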

Keywords: Spatial Outlier Detection, MODIS, Forest Fire

PDF Downloads: 2177

18 Using Data Mining Techniques for Finding Cardiac Outlier Patients

Authors: Farhan Ismaeel Dakheel, Raoof Smko, K. Negrat, Abdelsalam Almarimi

Abstract:

In this paper, we use data mining techniques to identify outlier patients who use large amounts of drugs over a long period of time. Any healthcare or health insurance system must deal with the quantities of drugs used by patients with chronic diseases. In the Kingdom of Bahrain, about 20% of the health budget is spent on medications. Managers of healthcare systems lack information about how patients with chronic diseases use drugs, whether there is misuse, and whether there are outlier patients. In this work, carried out in cooperation with the information department of the Bahrain Defence Force hospital, we selected the data for cardiac patients from 1 January 2008 to 31 December 2008 as the data for the model. We used three techniques to analyze drug utilization by cardiac patients: first a clustering technique, then a measure of clustering validity, and finally a decision tree as the classification algorithm. Clustering by drug utilization partitioned the 1603 patients, who received 15,806 prescriptions during this period, into three groups; 23 patients (2.59%), who received 1316 prescriptions (8.32%), were classified as outliers. The classification algorithm shows that average drug utilization, age and gender can be considered the main predictive factors in the induced model.

Keywords: Data Mining, Clustering, Classification, Drug Utilization.

PDF Downloads: 1849

17 Identification of Outliers in Flood Frequency Analysis: Comparison of Original and Multiple Grubbs-Beck Test

Authors: Ayesha S. Rahman, Khaled Haddad, Ataur Rahman

Abstract:

At-site flood frequency analysis is used to estimate flood quantiles when the at-site record is reasonably long. In Australia, the FLIKE software has been introduced for at-site flood frequency analysis. Its advantage is that, for a given application, the user can compare a number of the most commonly adopted probability distributions and parameter estimation methods relatively quickly through a Windows interface. The new version of FLIKE incorporates the multiple Grubbs-Beck test, which can identify multiple potentially influential low flows. This paper presents a case study of six catchments in eastern Australia that compares two outlier identification tests (the original Grubbs-Beck test and the multiple Grubbs-Beck test) and two commonly applied probability distributions (Generalized Extreme Value (GEV) and Log-Pearson Type 3 (LP3)) using FLIKE. The multiple Grubbs-Beck test used with the LP3 distribution was found to provide more accurate flood quantile estimates than the LP3 distribution used with the original Grubbs-Beck test; the differences in flood quantile estimates between these two methods were up to 61% for the six study catchments. The GEV distribution (with L-moments) and the LP3 distribution with the multiple Grubbs-Beck test were also found to give quite similar results in most cases; however, a difference of up to 38% was noted in the flood quantile for an annual exceedance probability (AEP) of 1 in 100 for one catchment. This finding needs to be confirmed with a greater number of stations across other Australian states.
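
For orientation, the original (single) Grubbs-Beck test can be sketched as follows. It flags flows below 10 raised to (mean − K_N × s), where mean and s are the mean and standard deviation of the log10 flows and K_N comes from a widely used polynomial approximation to the 10%-significance values tabulated in the US Bulletin 17B guidelines. The multiple Grubbs-Beck test in FLIKE is more involved and is not reproduced here; the peak-flow series is invented.

```python
import math


def grubbs_beck_low_threshold(flows):
    """Original (single) Grubbs-Beck low-outlier threshold on log10 flows.

    K_N uses the commonly cited Bulletin 17B approximation for a 10%
    significance level; flows below the returned value are low outliers.
    """
    logs = [math.log10(q) for q in flows]
    n = len(logs)
    mean = sum(logs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in logs) / (n - 1))
    k_n = -0.9043 + 3.345 * math.sqrt(math.log10(n)) - 0.4046 * math.log10(n)
    return 10 ** (mean - k_n * sd)


# Invented annual peak flows (m^3/s) with one very low year.
peaks = [100, 120, 90, 110, 95, 105, 115, 98, 102, 1]
threshold = grubbs_beck_low_threshold(peaks)
low_outliers = [q for q in peaks if q < threshold]
```

For n = 10 the approximation gives K_N of about 2.036, and the threshold here is roughly 3.3 m^3/s, so only the 1 m^3/s year is censored before fitting a distribution such as LP3.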

Keywords: Floods, FLIKE, probability distributions, flood frequency, outlier.

PDF Downloads: 3256

16 Characteristic Function in Estimation of Probability Distribution Moments

Authors: Vladimir S. Timofeev

Abstract:

In this article, the problem of estimating distributional moments is considered. A new approach to moment estimation based on the characteristic function is proposed. Using statistical simulation, the author shows that the new approach has some robustness properties. Numerical differentiation is used to calculate the derivatives of the characteristic function. The results confirm that the approach works efficiently and can be recommended for any statistical application.
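
The idea can be sketched with NumPy under explicit assumptions: the empirical characteristic function φ(t) is differentiated at 0 by central differences with step `h`, giving E[X] = φ'(0)/i and E[X²] = −φ''(0). The abstract does not give the author's differentiation scheme or step size. With a very small `h` the estimates reproduce the sample moments; with a larger `h`, each observation's contribution is bounded (by 1/h for the mean), which limits the pull of gross outliers at the cost of bias.

```python
import numpy as np


def empirical_cf(x, t):
    """Empirical characteristic function phi(t) = mean(exp(i t x))."""
    return np.mean(np.exp(1j * t * np.asarray(x, dtype=float)))


def cf_moments(x, h=1e-3):
    """First and second raw moments from central differences of phi at 0:
    E[X] = phi'(0) / i and E[X^2] = -phi''(0)."""
    phi_p, phi_0, phi_m = empirical_cf(x, h), empirical_cf(x, 0.0), empirical_cf(x, -h)
    m1 = ((phi_p - phi_m) / (2 * h) / 1j).real        # equals mean(sin(h x)) / h
    m2 = (-(phi_p - 2 * phi_0 + phi_m) / h**2).real   # equals 2 * mean(1 - cos(h x)) / h^2
    return m1, m2


# Small step: close to the sample moments (2 and 14/3 for this data).
small_h = cf_moments([1.0, 2.0, 3.0])

# Larger step on contaminated data: the single gross outlier barely moves m1.
rng = np.random.default_rng(0)
contaminated = np.append(rng.normal(0.0, 1.0, 99), 1000.0)
robust_m1, _ = cf_moments(contaminated, h=0.5)
```

The sample mean of `contaminated` is pulled to about 10 by the single outlier, while `robust_m1` stays near 0.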

Keywords: Characteristic function, distributional moments, robustness, outlier, statistical estimation problem, statistical simulation.

PDF Downloads: 2212

15 Latent Topic Based Medical Data Classification

Authors: Jian-hua Yeh, Shi-yi Kuo

Abstract:

This paper discusses the classification process for medical data. We use the data from the ACM KDD Cup 2008 to demonstrate a classification process based on latent topic discovery. In this dataset, the target set and the outliers are quite different in nature: the target set is only 0.6% of the total, while the outliers make up 99.4% of the dataset. We use this dataset as an example of how to deal with extremely biased data using latent topic discovery and noise reduction techniques. Our experiment faces two major challenges: (1) highly dispersed outliers, and (2) far fewer positive samples than negative ones. We propose a suitable process flow to deal with these issues and obtain a best AUC of 0.98.

Keywords: classification, latent topics, outlier adjustment, feature scaling

PDF Downloads: 1607

14 On Preprocessing of Speech Signals

Authors: Ayaz Keerio, Bhargav Kumar Mitra, Philip Birch, Rupert Young, Chris Chatwin

Abstract:

Preprocessing of speech signals is considered a crucial step in developing a robust and efficient speech or speaker recognition system. In this paper, we present some popular statistical outlier-detection strategies to separate the silence/unvoiced part of a speech signal from the voiced portion. The proposed methods are based on the 3σ edit rule and the Hampel identifier, and they are compared with conventional techniques: (i) short-time energy (STE) based methods, and (ii) distribution-based methods. The results obtained by applying the proposed strategies to some test voice signals are encouraging.
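
A sample-level sketch of both rules follows, assuming the first `noise_len` samples are silence and treating any sample that is an outlier relative to that segment as voiced. The paper may apply the rules to frame-level features instead, and the noise-segment assumption, `k = 3` and the toy signal are illustrative. The constant 1.4826 is the usual factor that makes the MAD consistent with the standard deviation for Gaussian data.

```python
import statistics


def voiced_mask(samples, noise_len, rule="hampel", k=3.0):
    """Label each sample as voiced (True) when it is an outlier relative to a
    leading silence segment of `noise_len` samples (an assumption).

    rule="3sigma": centre = mean,   scale = standard deviation.
    rule="hampel": centre = median, scale = 1.4826 * MAD.
    """
    ref = list(samples[:noise_len])
    if rule == "3sigma":
        centre, scale = statistics.fmean(ref), statistics.pstdev(ref)
    elif rule == "hampel":
        centre = statistics.median(ref)
        scale = 1.4826 * statistics.median([abs(v - centre) for v in ref])
    else:
        raise ValueError(f"unknown rule: {rule}")
    return [abs(v - centre) > k * scale for v in samples]


# Toy signal: 8 low-level "silence" samples followed by 3 voiced samples.
signal = [0.01, -0.02, 0.015, -0.01, 0.0, 0.02, -0.015, 0.005, 0.5, -0.6, 0.4]
hampel = voiced_mask(signal, 8)
sigma = voiced_mask(signal, 8, rule="3sigma")
```

The Hampel variant is preferable when the reference segment itself may contain clicks or breaths, because the median and MAD are not dragged by a few large values the way the mean and standard deviation are.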

Keywords: STE based methods, Mahalanobis distance, 3σ edit rule, Hampel Identifier.

PDF Downloads: 1651

13 Autonomously Determining the Parameters for SVDD with RBF Kernel from a One-Class Training Set

Authors: Andreas Theissler, Ian Dear

Abstract:

The one-class support vector machine “support vector data description” (SVDD) is an ideal approach for anomaly or outlier detection. However, for SVDD to be applicable in real-world settings, ease of use is crucial. The results of SVDD are largely determined by the choice of the regularisation parameter C and the width parameter of the widely used RBF kernel. While for two-class SVMs the parameters can be tuned by cross-validation based on the confusion matrix, this is not possible for a one-class SVM, because only true positives and false negatives can occur during training. This paper proposes an approach to find the optimal set of parameters for SVDD based solely on a training set from one class and without any user parameterisation. Results on artificial and real datasets are presented, underpinning the usefulness of the approach.

Keywords: Support vector data description, anomaly detection, one-class classification, parameter tuning.

PDF Downloads: 2884

12 An Efficient Fundamental Matrix Estimation for Moving Object Detection

Authors: Yeongyu Choi, Ju H. Park, S. M. Lee, Ho-Youl Jung

Abstract:

In this paper, an improved method for estimating the fundamental matrix is proposed and applied effectively to moving object detection with a monocular camera. The method consists of corner point detection, motion estimation of moving objects, and fundamental matrix calculation. Corner points are obtained with the Harris corner detector, and the motions of moving objects are calculated with the pyramidal Lucas-Kanade optical flow algorithm. The fundamental matrix is then calculated through epipolar geometry analysis using RANSAC. In this method, we improve moving object detection performance by using two threshold values that determine whether a point is an inlier or an outlier. In simulations, we compare performance for varying values of the two thresholds.

Keywords: Corner detection, optical flow, epipolar geometry, RANSAC.

PDF Downloads: 1076

11 One-Class Support Vector Machine for Sentiment Analysis of Movie Review Documents

Authors: Chothmal, Basant Agarwal

Abstract:

Sentiment analysis classifies a given review document as having positive or negative polarity. Research on sentiment analysis has increased tremendously in recent times because of its many applications in industry and academia. Sentiment analysis models can be used to determine users' opinions of any entity or product; e-commerce companies, for example, can use them to improve their products on the basis of users' opinions. In this paper, we propose a new One-class Support Vector Machine (One-class SVM) based sentiment analysis model for movie review documents. In the proposed approach, we first extract features from one class of documents and then use the one-class SVM model to test whether a new document lies within the model or is an outlier. Experimental results show the effectiveness of the proposed sentiment analysis model.

Keywords: Feature selection methods, Machine learning, NB, One-class SVM, Sentiment Analysis, Support Vector Machine.

PDF Downloads: 3248

10 Internet Purchases in European Union Countries: Multiple Linear Regression Approach

Authors: Ksenija Dumičić, Anita Čeh Časni, Irena Palić

Abstract:

This paper examines the influence of economic and Information and Communication Technology (ICT) development on the recently increasing Internet purchases by individuals in European Union member states. After a growing trend in Internet purchases in the EU27 was observed, all-possible-regressions analysis was applied to nine independent variables for 2011. Two linear regression models were then studied in detail. A simple linear regression analysis confirmed the research hypothesis that Internet purchases in the analyzed EU countries are positively correlated with the statistically significant variable Gross Domestic Product per capita (GDPpc). The multiple linear regression model with four regressors representing the level of ICT development also indicates that ICT development is crucial for explaining Internet purchases by individuals, confirming the research hypothesis.

Keywords: European Union, Internet purchases, multiple linear regression model, outlier

PDF Downloads: 2896

9 Effective Image and Video Error Concealment using RST-Invariant Partial Patch Matching Model and Exemplar-based Inpainting

Authors: Shiraz Ahmad, Zhe-Ming Lu

Abstract:

An effective visual error concealment method is presented that employs a robust rotation, scale and translation (RST) invariant partial patch matching model (RSTI-PPMM) and exemplar-based inpainting. The proposed robust, inherently feature-enhanced texture synthesis approach generates excellent and perceptually plausible error concealment results, and its outlier pruning property yields significant quality improvements, both quantitatively and qualitatively. No intermediate user interaction is required for pre-segmented media, and the method follows a bootstrapping approach for automatic visual loss recovery and for image and video error concealment.

Keywords: Exemplar-based image and video inpainting, outlier pruning, RST-invariant partial patch matching model (RSTI-PPMM), visual error concealment.

PDF Downloads 1374
8 A Machine Learning-based Analysis of Autism Prevalence Rates across US States against Multiple Potential Explanatory Variables

Authors: Ronit Chakraborty, Sugata Banerji

Abstract:

There has been a marked increase in the reported prevalence of Autism Spectrum Disorder (ASD) among children in the US over the past two decades. This research analyzes the growth in state-level ASD prevalence against 45 potentially explanatory factors, including socio-economic, demographic, healthcare, public policy and political factors. The goal was to understand whether these factors have adequate predictive power in modeling the differential growth in ASD prevalence across states and, if so, which factors are the most influential. The key findings of this study are: (1) confirmation that the chosen feature set has considerable power in predicting the growth in ASD prevalence; (2) identification of the most influential predictive factors; (3) given the nature of the most influential predictive variables, an indication that a considerable portion of the reported differences in ASD prevalence across states could be attributable to over- and under-diagnosis; and (4) identification of Florida as a key outlier state, pointing to potential under-diagnosis of ASD.
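
The abstract does not say how Florida was singled out. One common way to flag an outlier unit after fitting a predictive model is to standardize the model residuals; the sketch below assumes that approach, and the function name, state labels and values are hypothetical rather than the study's data.

```python
import statistics


def residual_outliers(observed, predicted, threshold=2.0):
    """Flag units whose standardized model residual exceeds `threshold`.

    observed, predicted: dicts keyed by unit (e.g. state) name.
    Returns {name: z} for flagged units; a negative z means the observed
    value is below the model's prediction (e.g. possible under-diagnosis).
    """
    names = list(observed)
    resid = [observed[n] - predicted[n] for n in names]
    mu = statistics.mean(resid)
    sd = statistics.stdev(resid)
    if sd == 0:
        return {}  # identical residuals: nothing stands out
    z = {n: (e - mu) / sd for n, e in zip(names, resid)}
    return {n: v for n, v in z.items() if abs(v) > threshold}


# Hypothetical: ten states, the model fits nine exactly, one falls short.
obs = {s: 10.0 for s in "ABCDEFGHIJ"}
pred = {s: 10.0 for s in "ABCDEFGHIJ"}
obs["J"] = 5.0
print(residual_outliers(obs, pred))
```

With n units the largest attainable |z| is (n - 1) / sqrt(n), so on very small samples a fixed threshold such as 2.0 may never be reached; this is one reason robust alternatives (median/MAD) are often preferred.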

Keywords: Autism Spectrum Disorder, ASD, clustering, Machine Learning, predictive modeling.

PDF Downloads 522
7 Moving Object Detection Using Histogram of Uniformly Oriented Gradient

Authors: Wei-Jong Yang, Yu-Siang Su, Pau-Choo Chung, Jar-Ferr Yang

Abstract:

Moving object detection (MOD) is an important issue in advanced driver assistance systems (ADAS). Two important classes of moving objects in ADAS are pedestrians and scooters. Real-world MOD systems face two key challenges: computational complexity and detection accuracy. Histogram of oriented gradient (HOG) features can readily detect object edges while remaining largely invariant to changes in illumination and shadowing. However, reducing execution time in real-time systems requires downsampling the image, which increases the influence of outliers. For this reason, we propose histogram of uniformly-oriented gradient (HUG) features to obtain a more accurate description of the human body contour. In the testing phase, a support vector machine (SVM) with a linear kernel is used. Experimental results show the correctness and effectiveness of the proposed method: with SVM classifiers, the proposed HUG features achieve better classification performance than HOG features in real tests.
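
The HUG descriptor itself is not specified in the abstract, so the sketch below shows only the conventional building block it is compared against: a magnitude-weighted histogram of unsigned gradient orientations for one HOG cell, followed by L2 normalization. It uses hard bin assignment rather than the interpolation used in full HOG implementations; the function names and the 9-bin default are assumptions.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0-180 degrees) over the interior pixels of a 2D grayscale cell."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # centred [-1, 0, 1] filter
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned
            hist[int(angle // bin_width) % n_bins] += mag      # hard binning
    return hist


def l2_normalize(hist, eps=1e-6):
    """Scale a histogram to unit L2 norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(h * h for h in hist) + eps * eps)
    return [h / norm for h in hist]


# Usage: a vertical edge puts all gradient energy into the 0-degree bin.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
print(l2_normalize(cell_orientation_histogram(vertical_edge)))
```

Downsampling shrinks the number of pixels per cell, so a single noisy gradient carries more weight in each histogram, which is consistent with the abstract's remark that downsampling increases the influence of outliers.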

Keywords: Moving object detection, histogram of oriented gradient, histogram of uniformly-oriented gradient, linear support vector machine.

PDF Downloads 1184
6 A Robust and Adaptive Unscented Kalman Filter for the Air Fine Alignment of the Strapdown Inertial Navigation System/GPS

Authors: Jian Shi, Baoguo Yu, Haonan Jia, Meng Liu, Ping Huang

Abstract:

To adapt to the flexibility of modern warfare, many guided weapons are launched from aircraft, so the inertial navigation system carried by the weapon must be aligned in the air. This article addresses three problems: inaccurate system modeling under large misalignment angles, reduced filtering accuracy caused by outliers, and noise changes in GPS signals. First, because of the large misalignment errors of the Strapdown Inertial Navigation System (SINS)/GPS, a more accurate model is built instead of relying on a small-angle approximation, and the Unscented Kalman Filter (UKF) is used to estimate the state. Second, to account for the impact of GPS noise changes on the fine alignment algorithm, an innovation-based adaptive filtering algorithm is introduced to estimate the GPS noise in real time. Third, to improve the anti-interference ability of the air fine alignment algorithm, a robust filtering algorithm based on outlier detection is combined with it. Simulations verify that the resulting algorithm improves alignment accuracy and robustness under interference conditions.
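
The sketch below is a simplified stand-in, not the authors' UKF: a scalar linear Kalman filter that rejects a measurement when its normalized innovation squared exceeds a chi-square gate, and re-estimates the measurement noise R from a sliding window of recent innovations. The random-walk state model, the 99% gate (6.635 for one degree of freedom), the window length and the class name are all assumptions.

```python
from collections import deque


class GatedScalarKF:
    """Scalar Kalman filter with innovation gating and adaptive R.

    State model (assumed): random walk x_k = x_{k-1} + w_k, w ~ N(0, q).
    Measurement model: z_k = x_k + v_k, v ~ N(0, r).
    """

    def __init__(self, x0, p0, q, r0, gate=6.635, window=20, r_min=1e-6):
        self.x, self.p, self.q, self.r = x0, p0, q, r0
        self.gate = gate      # chi-square, 1 degree of freedom, 99 % quantile
        self.r_min = r_min    # floor keeps the estimated R positive
        self.innov_sq = deque(maxlen=window)

    def step(self, z):
        """Process one measurement; returns (estimate, accepted)."""
        p_pred = self.p + self.q            # predict
        v = z - self.x                      # innovation
        s = p_pred + self.r                 # innovation variance
        if v * v / s > self.gate:           # normalized innovation squared test
            self.p = p_pred                 # outlier: skip the update
            return self.x, False
        # Innovation-based R estimate: E[v^2] is about P_pred + R.
        self.innov_sq.append(v * v)
        if len(self.innov_sq) == self.innov_sq.maxlen:
            mean_sq = sum(self.innov_sq) / len(self.innov_sq)
            self.r = max(mean_sq - p_pred, self.r_min)
        k = p_pred / (p_pred + self.r)      # Kalman gain
        self.x += k * v
        self.p = (1.0 - k) * p_pred
        return self.x, True


# Usage: five plausible readings, then a spike that the gate rejects.
kf = GatedScalarKF(x0=0.0, p0=1.0, q=0.01, r0=0.1, window=5)
for z in [0.1, -0.1, 0.05, 0.0, -0.05, 50.0]:
    print(kf.step(z))
```

With a window this short the R estimate can fall to its floor, which is why practical adaptive filters use longer windows or fading-memory weights; the gating step is independent of that choice.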

Keywords: Air alignment, fine alignment, inertial navigation system, integrated navigation system, UKF.

PDF Downloads 467
5 The Robust Clustering with Reduction Dimension

Authors: Dyah E. Herwindiati

Abstract:

Clustering is the process of identifying homogeneous groups of objects, called clusters, and is an active topic in data mining; objects in the same group or class share similar characteristics. This paper discusses a robust clustering process for image data with two dimension-reduction approaches: two-dimensional principal component analysis (2DPCA) and principal component analysis (PCA). A standard approach to the high dimensionality of image data is dimension reduction, which transforms high-dimensional data into a lower-dimensional space with limited loss of information. One of the most common forms of dimensionality reduction is PCA. 2DPCA is often described as a variant of PCA in which image matrices are treated directly as 2D matrices; they do not need to be transformed into vectors, so the image covariance matrix can be constructed directly from the original image matrices. The classical covariance matrix that is decomposed is very sensitive to outlying observations. The objective of this paper is to compare the performance of the robust minimum vector variance (MVV) estimator in 2DPCA and in PCA for clustering arbitrary image data when outliers are hidden in the data set. The robustness simulations and an illustration of image clustering are discussed at the end of the paper.
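
To make the 2DPCA and vector-variance ingredients concrete, the sketch below computes the 2DPCA image covariance G = (1/M) * sum_j (A_j - mean)^T (A_j - mean) directly from m x n image matrices, and the vector variance Tr(S^2), the dispersion measure that, as we understand it, the MVV criterion minimizes over subsets of observations. The subset search that makes MVV robust is omitted, and the function names are hypothetical.

```python
def image_covariance_2dpca(images):
    """2DPCA image covariance G = (1/M) * sum_j (A_j - mean)^T (A_j - mean),
    built from M image matrices of size m x n without vectorizing them.
    Returns an n x n matrix as nested lists."""
    M = len(images)
    m, n = len(images[0]), len(images[0][0])
    mean = [[sum(img[i][k] for img in images) / M for k in range(n)]
            for i in range(m)]
    G = [[0.0] * n for _ in range(n)]
    for img in images:
        D = [[img[i][k] - mean[i][k] for k in range(n)] for i in range(m)]
        for a in range(n):
            for b in range(n):
                # (D^T D)[a][b] = sum over rows i of D[i][a] * D[i][b]
                G[a][b] += sum(D[i][a] * D[i][b] for i in range(m)) / M
    return G


def vector_variance(S):
    """Vector variance Tr(S^2); for symmetric S this is the sum of
    squared entries."""
    return sum(v * v for row in S for v in row)


# Usage: two tiny 2x2 "images" symmetric about a zero mean.
imgs = [[[1, 0], [0, 1]], [[-1, 0], [0, -1]]]
G = image_covariance_2dpca(imgs)
print(G, vector_variance(G))
```

The eigenvectors of G give the 2DPCA projection axes; because G is only n x n, it is much smaller than the (m*n) x (m*n) covariance that classical PCA would need for the same images.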

Keywords: Breakdown point, Consistency, 2DPCA, PCA, Outlier, Vector Variance

PDF Downloads 1655
4 Battery Grading Algorithm in 2nd-Life Repurposing Li-ion Battery System

Authors: Ya Lv, Benjamin Ong Wei Lin, Wanli Niu, Benjamin Seah Chin Tat

Abstract:

This article presents a methodology that improves the reliability and cyclability of 2nd-life Li-ion battery systems repurposed as energy storage systems (ESS). Most 2nd-life retired battery systems on the market have a module- or pack-level state of health (SOH) indicator, which is used to guide an appropriate depth of discharge (DOD) in ESS applications. Because cell-level SOH indication is lacking, the different degradation behaviors of individual cells cannot be identified when they reach retired status; consequently, considering end-of-life (EOL) loss and pack-level DOD, the repurposed ESS has to be oversized by more than 1.5 times to meet the application's reliability and cyclability requirements. The proposed non-invasive battery grading algorithm detects outlier cells based on historical voltage data and calculates cell-level historical maximum temperature using a semi-analytic methodology. In this way, each battery cell in the 2nd-life system can be graded in terms of SOH on the basis of its historical voltage fluctuation and estimated historical maximum temperature variation. These grades map to corresponding DOD grades in the repurposed ESS to enhance system reliability and cyclability. In all, the introduced battery grading algorithm is non-invasive, is compatible with all kinds of retired Li-ion battery systems that lack cell-level SOH indication, and can potentially be embedded into battery management software for preventive maintenance and real-time cyclability optimization.
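
The abstract does not give the outlier rule applied to historical voltage data. As a hedged illustration only, the sketch below averages each cell's deviation from the pack median across voltage snapshots and flags cells with a large modified z-score (the median absolute deviation rule with the usual 3.5 cut-off). The data layout, threshold and function name are assumptions; the temperature estimation and DOD grading steps are not shown.

```python
import statistics


def voltage_outlier_cells(history, threshold=3.5):
    """history: list of snapshots, each a list of cell voltages in a fixed
    cell order. Returns {cell_index: modified_z} for cells whose average
    deviation from the snapshot median is anomalous under the MAD rule."""
    n_cells = len(history[0])
    dev = [0.0] * n_cells
    for snap in history:
        med = statistics.median(snap)
        for i, v in enumerate(snap):
            dev[i] += (v - med) / len(history)  # running mean deviation
    center = statistics.median(dev)
    mad = statistics.median(abs(d - center) for d in dev)
    if mad == 0:
        return {}  # no spread among cells: MAD rule is undefined
    z = [0.6745 * (d - center) / mad for d in dev]
    return {i: zi for i, zi in enumerate(z) if abs(zi) > threshold}


# Hypothetical pack of six cells; cell 5 sags further at every snapshot.
history = [
    [3.70, 3.71, 3.69, 3.70, 3.72, 3.60],
    [3.65, 3.66, 3.64, 3.65, 3.67, 3.52],
    [3.60, 3.61, 3.59, 3.60, 3.62, 3.45],
]
print(voltage_outlier_cells(history))
```

Comparing each cell with the pack median at the same instant cancels the common voltage trend during discharge, so only cell-specific behavior contributes to the score.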

Keywords: Battery grading algorithm, 2nd-life repurposing battery system, semi-analytic methodology, reliability and cyclability.

PDF Downloads 773
3 Development of Energy Benchmarks Using Mandatory Energy and Emissions Reporting Data: Ontario Post-Secondary Residences

Authors: C. Xavier Mendieta, J. J. McArthur

Abstract:

Governments are playing an increasingly active role in reducing carbon emissions, and a key strategy has been the introduction of mandatory energy disclosure policies. These policies have produced a significant amount of publicly available data, giving researchers a unique opportunity to develop location-specific energy and carbon emission benchmarks that can then be used to develop building archetypes and inform urban energy models. This study presents the development of such a benchmark using the public reporting data. Data from Ontario's Ministry of Energy for Post-Secondary Educational Institutions are being used to develop a series of building-archetype dynamic building loads and energy benchmarks to fill a gap in the currently available building databases. This paper presents the development of a benchmark for college and university residences within ASHRAE climate zone 6 areas in Ontario using the mandatory disclosure energy and greenhouse gas emissions data. The methodology includes data cleaning, statistical analysis, and benchmark development, and the lessons learned from this investigation are presented and discussed to inform the development of future energy benchmarks from this larger data set. The key findings from this initial benchmarking study are: (1) careful data screening and outlier identification are important for developing a valid dataset; (2) the key features for modeling the data are building age, size, and occupancy schedules, and these can be used to estimate energy consumption; and (3) policy changes affecting primary energy generation significantly affected greenhouse gas emissions, and considering these factors was critical to evaluating the validity of the reported data.
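
Data screening and outlier identification are named as a key finding, but the abstract does not specify the rule used. A common screen for benchmarking data is Tukey's interquartile-range fence on energy use intensity (EUI, annual energy per floor area); the sketch below assumes that rule, and the function name and building records are hypothetical, not the Ontario data.

```python
import statistics


def screen_eui(records, k=1.5):
    """records: {building_id: (annual_energy_kwh, floor_area_m2)}.
    Returns (kept, flagged) dicts of EUI in kWh/m2 using Tukey's fences
    [Q1 - k*IQR, Q3 + k*IQR]. Records with non-positive area are skipped."""
    eui = {b: e / a for b, (e, a) in records.items() if a > 0}
    q1, _, q3 = statistics.quantiles(eui.values(), n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    kept = {b: v for b, v in eui.items() if lo <= v <= hi}
    flagged = {b: v for b, v in eui.items() if not lo <= v <= hi}
    return kept, flagged


# Hypothetical residences, each 1000 m2; building H reports implausibly high use.
records = {b: (eui * 1000, 1000) for b, eui in
           zip("ABCDEFGH", [200, 210, 220, 230, 240, 250, 260, 900])}
kept, flagged = screen_eui(records)
print(sorted(kept), flagged)
```

Normalizing by floor area before screening keeps large buildings from being flagged merely for their size; flagged records would then be checked against the source reports rather than dropped automatically.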

Keywords: Building archetypes, data analysis, energy benchmarks, GHG emissions.

PDF Downloads 972