Search results for: Data warehouse
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7383

Search results for: Data warehouse

6933 Simulated Annealing Algorithm for Data Aggregation Trees in Wireless Sensor Networks and Comparison with Genetic Algorithm

Authors: Ladan Darougaran, Hossein Shahinzadeh, Hajar Ghotb, Leila Ramezanpour

Abstract:

In ad hoc networks, the main issue about designing of protocols is quality of service, so that in wireless sensor networks the main constraint in designing protocols is limited energy of sensors. In fact, protocols which minimize the power consumption in sensors are more considered in wireless sensor networks. One approach of reducing energy consumption in wireless sensor networks is to reduce the number of packages that are transmitted in network. The technique of collecting data that combines related data and prevent transmission of additional packages in network can be effective in the reducing of transmitted packages- number. According to this fact that information processing consumes less power than information transmitting, Data Aggregation has great importance and because of this fact this technique is used in many protocols [5]. One of the Data Aggregation techniques is to use Data Aggregation tree. But finding one optimum Data Aggregation tree to collect data in networks with one sink is a NP-hard problem. In the Data Aggregation technique, related information packages are combined in intermediate nodes and form one package. So the number of packages which are transmitted in network reduces and therefore, less energy will be consumed that at last results in improvement of longevity of network. Heuristic methods are used in order to solve the NP-hard problem that one of these optimization methods is to solve Simulated Annealing problems. In this article, we will propose new method in order to build data collection tree in wireless sensor networks by using Simulated Annealing algorithm and we will evaluate its efficiency whit Genetic Algorithm.

Keywords: Data aggregation, wireless sensor networks, energy efficiency, simulated annealing algorithm, genetic algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1648
6932 Encoding and Compressing Data for Decreasing Number of Switches in Baseline Networks

Authors: Mohammad Ali Jabraeil Jamali, Ahmad Khademzadeh, Hasan Asil, Amir Asil

Abstract:

This method decrease usage power (expenditure) in networks on chips (NOC). This method data coding for data transferring in order to reduces expenditure. This method uses data compression reduces the size. Expenditure calculation in NOC occurs inside of NOC based on grown models and transitive activities in entry ports. The goal of simulating is to weigh expenditure for encoding, decoding and compressing in Baseline networks and reduction of switches in this type of networks. KeywordsNetworks on chip, Compression, Encoding, Baseline networks, Banyan networks.

Keywords: Networks on chip, Compression, Encoding, Baseline networks, Banyan networks

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1945
6931 Sampled-Data Control for Fuel Cell Systems

Authors: H. Y. Jung, Ju H. Park, S. M. Lee

Abstract:

Sampled-data controller is presented for solid oxide fuel cell systems which is expressed by a sector bounded nonlinear model. The proposed control law is obtained by solving a convex problem satisfying several linear matrix inequalities. Simulation results are given to show the effectiveness of the proposed design method.

Keywords: Sampled-data control, Sector bound, Solid oxide fuel cell, Time-delay.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1685
6930 Automatic Detection and Spatio-temporal Analysis of Commercial Accumulations Using Digital Yellow Page Data

Authors: Yuki. Akiyama, Hiroaki. Sengoku, Ryosuke. Shibasaki

Abstract:

In this study, the locations and areas of commercial accumulations were detected by using digital yellow page data. An original buffering method that can accurately create polygons of commercial accumulations is proposed in this paper.; by using this method, distribution of commercial accumulations can be easily created and monitored over a wide area. The locations, areas, and time-series changes of commercial accumulations in the South Kanto region can be monitored by integrating polygons of commercial accumulations with the time-series data of digital yellow page data. The circumstances of commercial accumulations were shown to vary according to areas, that is, highly- urbanized regions such as the city center of Tokyo and prefectural capitals, suburban areas near large cities, and suburban and rural areas.

Keywords: Commercial accumulations, Spatio-temporal analysis, Urban monitoring, Yellow page data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1223
6929 EEG Waves Classifier using Wavelet Transform and Fourier Transform

Authors: Maan M. Shaker

Abstract:

The electroencephalograph (EEG) signal is one of the most widely signal used in the bioinformatics field due to its rich information about human tasks. In this work EEG waves classification is achieved using the Discrete Wavelet Transform DWT with Fast Fourier Transform (FFT) by adopting the normalized EEG data. The DWT is used as a classifier of the EEG wave's frequencies, while FFT is implemented to visualize the EEG waves in multi-resolution of DWT. Several real EEG data sets (real EEG data for both normal and abnormal persons) have been tested and the results improve the validity of the proposed technique.

Keywords: Bioinformatics, DWT, EEG waves, FFT.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5505
6928 Obstacle Classification Method Based On 2D LIDAR Database

Authors: Moohyun Lee, Soojung Hur, Yongwan Park

Abstract:

We propose obstacle classification method based on 2D LIDAR Database. The existing obstacle classification method based on 2D LIDAR, has an advantage in terms of accuracy and shorter calculation time. However, it was difficult to classifier the type of obstacle and therefore accurate path planning was not possible. In order to overcome this problem, a method of classifying obstacle type based on width data of obstacle was proposed. However, width data was not sufficient to improve accuracy. In this paper, database was established by width and intensity data; the first classification was processed by the width data; the second classification was processed by the intensity data; classification was processed by comparing to database; result of obstacle classification was determined by finding the one with highest similarity values. An experiment using an actual autonomous vehicle under real environment shows that calculation time declined in comparison to 3D LIDAR and it was possible to classify obstacle using single 2D LIDAR.

Keywords: Obstacle, Classification, LIDAR, Segmentation, Width, Intensity, Database.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3408
6927 An Empirical Mode Decomposition Based Method for Action Potential Detection in Neural Raw Data

Authors: Sajjad Farashi, Mohammadjavad Abolhassani, Mostafa Taghavi Kani

Abstract:

Information in the nervous system is coded as firing patterns of electrical signals called action potential or spike so an essential step in analysis of neural mechanism is detection of action potentials embedded in the neural data. There are several methods proposed in the literature for such a purpose. In this paper a novel method based on empirical mode decomposition (EMD) has been developed. EMD is a decomposition method that extracts oscillations with different frequency range in a waveform. The method is adaptive and no a-priori knowledge about data or parameter adjusting is needed in it. The results for simulated data indicate that proposed method is comparable with wavelet based methods for spike detection. For neural signals with signal-to-noise ratio near 3 proposed methods is capable to detect more than 95% of action potentials accurately.

Keywords: EMD, neural data processing, spike detection, wavelet decomposition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2336
6926 Platform-as-a-Service Sticky Policies for Privacy Classification in the Cloud

Authors: Maha Shamseddine, Amjad Nusayr, Wassim Itani

Abstract:

In this paper, we present a Platform-as-a-Service (PaaS) model for controlling the privacy enforcement mechanisms applied on user data when stored and processed in Cloud data centers. The proposed architecture consists of establishing user configurable ‘sticky’ policies on the Graphical User Interface (GUI) data-bound components during the application development phase to specify the details of privacy enforcement on the contents of these components. Various privacy classification classes on the data components are formally defined to give the user full control on the degree and scope of privacy enforcement including the type of execution containers to process the data in the Cloud. This not only enhances the privacy-awareness of the developed Cloud services, but also results in major savings in performance and energy efficiency due to the fact that the privacy mechanisms are solely applied on sensitive data units and not on all the user content. The proposed design is implemented in a real PaaS cloud computing environment on the Microsoft Azure platform.

Keywords: Privacy enforcement, Platform-as-a-Service privacy awareness, cloud computing privacy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 707
6925 DIFFER: A Propositionalization approach for Learning from Structured Data

Authors: Thashmee Karunaratne, Henrik Böstrom

Abstract:

Logic based methods for learning from structured data is limited w.r.t. handling large search spaces, preventing large-sized substructures from being considered by the resulting classifiers. A novel approach to learning from structured data is introduced that employs a structure transformation method, called finger printing, for addressing these limitations. The method, which generates features corresponding to arbitrarily complex substructures, is implemented in a system, called DIFFER. The method is demonstrated to perform comparably to an existing state-of-art method on some benchmark data sets without requiring restrictions on the search space. Furthermore, learning from the union of features generated by finger printing and the previous method outperforms learning from each individual set of features on all benchmark data sets, demonstrating the benefit of developing complementary, rather than competing, methods for structure classification.

Keywords: Machine learning, Structure classification, Propositionalization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1188
6924 Improving the Performance of Proxy Server by Using Data Mining Technique

Authors: P. Jomsri

Abstract:

Currently, web usage make a huge data from a lot of user attention. In general, proxy server is a system to support web usage from user and can manage system by using hit rates. This research tries to improve hit rates in proxy system by applying data mining technique. The data set are collected from proxy servers in the university and are investigated relationship based on several features. The model is used to predict the future access websites. Association rule technique is applied to get the relation among Date, Time, Main Group web, Sub Group web, and Domain name for created model. The results showed that this technique can predict web content for the next day, moreover the future accesses of websites increased from 38.15% to 85.57 %. This model can predict web page access which tends to increase the efficient of proxy servers as a result. In additional, the performance of internet access will be improved and help to reduce traffic in networks.

Keywords: Association rule, proxy server, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3019
6923 Performance Analysis of the Subgroup Method for Collective I/O

Authors: Kwangho Cha, Hyeyoung Cho, Sungho Kim

Abstract:

As many scientific applications require large data processing, the importance of parallel I/O has been increasingly recognized. Collective I/O is one of the considerable features of parallel I/O and enables application programmers to easily handle their large data volume. In this paper we measured and analyzed the performance of original collective I/O and the subgroup method, the way of using collective I/O of MPI effectively. From the experimental results, we found that the subgroup method showed good performance with small data size.

Keywords: Collective I/O, MPI, parallel file system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1532
6922 Statistical Analysis for Overdispersed Medical Count Data

Authors: Y. N. Phang, E. F. Loh

Abstract:

Many researchers have suggested the use of zero inflated Poisson (ZIP) and zero inflated negative binomial (ZINB) models in modeling overdispersed medical count data with extra variations caused by extra zeros and unobserved heterogeneity. The studies indicate that ZIP and ZINB always provide better fit than using the normal Poisson and negative binomial models in modeling overdispersed medical count data. In this study, we proposed the use of Zero Inflated Inverse Trinomial (ZIIT), Zero Inflated Poisson Inverse Gaussian (ZIPIG) and zero inflated strict arcsine models in modeling overdispered medical count data. These proposed models are not widely used by many researchers especially in the medical field. The results show that these three suggested models can serve as alternative models in modeling overdispersed medical count data. This is supported by the application of these suggested models to a real life medical data set. Inverse trinomial, Poisson inverse Gaussian and strict arcsine are discrete distributions with cubic variance function of mean. Therefore, ZIIT, ZIPIG and ZISA are able to accommodate data with excess zeros and very heavy tailed. They are recommended to be used in modeling overdispersed medical count data when ZIP and ZINB are inadequate.

Keywords: Zero inflated, inverse trinomial distribution, Poisson inverse Gaussian distribution, strict arcsine distribution, Pearson’s goodness of fit.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3267
6921 Student Satisfaction Data for Work Based Learners

Authors: Rosie Borup, Hanifa Shah

Abstract:

This paper aims to describe how student satisfaction is measured for work-based learners as these are non-traditional learners, conducting academic learning in the workplace, typically their curricula have a high degree of negotiation, and whose motivations are directly related to their employers- needs, as well as their own career ambitions. We argue that while increasing WBL participation, and use of SSD are both accepted as being of strategic importance to the HE agenda, the use of WBL SSD is rarely examined, and lessons can be learned from the comparison of SSD from a range of WBL programmes, and increased visibility of this type of data will provide insight into ways to improve and develop this type of delivery. The key themes that emerged from the analysis of the interview data were: learners profiles and needs, employers drivers, academic staff drivers, organizational approach, tools for collecting data and visibility of findings. The paper concludes with observations on best practice in the collection, analysis and use of WBL SSD, thus offering recommendations for both academic managers and practitioners.

Keywords: Student satisfaction data, work based learning, employer engagement, NSS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1448
6920 A Consistency Protocol Multi-Layer for Replicas Management in Large Scale Systems

Authors: Ghalem Belalem, Yahya Slimani

Abstract:

Large scale systems such as computational Grid is a distributed computing infrastructure that can provide globally available network resources. The evolution of information processing systems in Data Grid is characterized by a strong decentralization of data in several fields whose objective is to ensure the availability and the reliability of the data in the reason to provide a fault tolerance and scalability, which cannot be possible only with the use of the techniques of replication. Unfortunately the use of these techniques has a height cost, because it is necessary to maintain consistency between the distributed data. Nevertheless, to agree to live with certain imperfections can improve the performance of the system by improving competition. In this paper, we propose a multi-layer protocol combining the pessimistic and optimistic approaches conceived for the data consistency maintenance in large scale systems. Our approach is based on a hierarchical representation model with tree layers, whose objective is with double vocation, because it initially makes it possible to reduce response times compared to completely pessimistic approach and it the second time to improve the quality of service compared to an optimistic approach.

Keywords: Data Grid, replication, consistency, optimistic approach, pessimistic approach.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1536
6919 Analysis of a Population of Diabetic Patients Databases with Classifiers

Authors: Murat Koklu, Yavuz Unal

Abstract:

Data mining can be called as a technique to extract information from data. It is the process of obtaining hidden information and then turning it into qualified knowledge by statistical and artificial intelligence technique. One of its application areas is medical area to form decision support systems for diagnosis just by inventing meaningful information from given medical data. In this study a decision support system for diagnosis of illness that make use of data mining and three different artificial intelligence classifier algorithms namely Multilayer Perceptron, Naive Bayes Classifier and J.48. Pima Indian dataset of UCI Machine Learning Repository was used. This dataset includes urinary and blood test results of 768 patients. These test results consist of 8 different feature vectors. Obtained classifying results were compared with the previous studies. The suggestions for future studies were presented.

Keywords: Artificial Intelligence, Classifiers, Data Mining, Diabetic Patients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5390
6918 Discovery of Time Series Event Patterns based on Time Constraints from Textual Data

Authors: Shigeaki Sakurai, Ken Ueno, Ryohei Orihara

Abstract:

This paper proposes a method that discovers time series event patterns from textual data with time information. The patterns are composed of sequences of events and each event is extracted from the textual data, where an event is characteristic content included in the textual data such as a company name, an action, and an impression of a customer. The method introduces 7 types of time constraints based on the analysis of the textual data. The method also evaluates these constraints when the frequency of a time series event pattern is calculated. We can flexibly define the time constraints for interesting combinations of events and can discover valid time series event patterns which satisfy these conditions. The paper applies the method to daily business reports collected by a sales force automation system and verifies its effectiveness through numerical experiments.

Keywords: Text mining, sequential mining, time constraints, daily business reports.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1451
6917 A 3.125Gb/s Clock and Data Recovery Circuit Using 1/4-Rate Technique

Authors: Il-Do Jeong, Hang-Geun Jeong

Abstract:

This paper describes the design and fabrication of a clock and data recovery circuit (CDR). We propose a new clock and data recovery which is based on a 1/4-rate frequency detector (QRFD). The proposed frequency detector helps reduce the VCO frequency and is thus advantageous for high speed application. The proposed frequency detector can achieve low jitter operation and extend the pull-in range without using the reference clock. The proposed CDR was implemented using a 1/4-rate bang-bang type phase detector (PD) and a ring voltage controlled oscillator (VCO). The CDR circuit has been fabricated in a standard 0.18 CMOS technology. It occupies an active area of 1 x 1 and consumes 90 mW from a single 1.8V supply.

Keywords: Clock and data recovery, 1/4-rate frequency detector, 1/4-rate phase detector.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2878
6916 Very High Speed Data Driven Dynamic NAND Gate at 22nm High K Metal Gate Strained Silicon Technology Node

Authors: Shobha Sharma, Amita Dev

Abstract:

Data driven dynamic logic is the high speed dynamic circuit with low area. The clock of the dynamic circuit is removed and data drives the circuit instead of clock for precharging purpose. This data driven dynamic nand gate is given static forward substrate biasing of Vsupply/2 as well as the substrate bias is connected to the input data, resulting in dynamic substrate bias. The dynamic substrate bias gives the shortest propagation delay with a penalty on the power dissipation. Propagation delay is reduced by 77.8% compared to the normal reverse substrate bias Data driven dynamic nand. Also dynamic substrate biased D3nand’s propagation delay is reduced by 31.26% compared to data driven dynamic nand gate with static forward substrate biasing of Vdd/2. This data driven dynamic nand gate with dynamic body biasing gives us the highest speed with no area penalty and finds its applications where power penalty is acceptable. Also combination of Dynamic and static Forward body bias can be used with reduced propagation delay compared to static forward biased circuit and with comparable increase in an average power. The simulations were done on hspice simulator with 22nm High-k metal gate strained Si technology HP models of Arizona State University, USA.

Keywords: Data driven nand gate, dynamic substrate biasing, nand gate, static substrate biasing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1571
6915 Soft Computing based Retrieval System for Medical Applications

Authors: Pardeep Singh, Sanjay Sharma

Abstract:

With increasing data in medical databases, medical data retrieval is growing in popularity. Some of this analysis including inducing propositional rules from databases using many soft techniques, and then using these rules in an expert system. Diagnostic rules and information on features are extracted from clinical databases on diseases of congenital anomaly. This paper explain the latest soft computing techniques and some of the adaptive techniques encompasses an extensive group of methods that have been applied in the medical domain and that are used for the discovery of data dependencies, importance of features, patterns in sample data, and feature space dimensionality reduction. These approaches pave the way for new and interesting avenues of research in medical imaging and represent an important challenge for researchers.

Keywords: CBIR, GA, Rough sets, CBMIR, SVM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1693
6914 Secure Power Systems Against Malicious Cyber-Physical Data Attacks: Protection and Identification

Authors: Morteza Talebi, Jianan Wang, Zhihua Qu

Abstract:

The security of power systems against malicious cyberphysical data attacks becomes an important issue. The adversary always attempts to manipulate the information structure of the power system and inject malicious data to deviate state variables while evading the existing detection techniques based on residual test. The solutions proposed in the literature are capable of immunizing the power system against false data injection but they might be too costly and physically not practical in the expansive distribution network. To this end, we define an algebraic condition for trustworthy power system to evade malicious data injection. The proposed protection scheme secures the power system by deterministically reconfiguring the information structure and corresponding residual test. More importantly, it does not require any physical effort in either microgrid or network level. The identification scheme of finding meters being attacked is proposed as well. Eventually, a well-known IEEE 30-bus system is adopted to demonstrate the effectiveness of the proposed schemes.

Keywords: Algebraic Criterion, Malicious Cyber-Physical Data Injection, Protection and Identification, Trustworthy Power System.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1952
6913 NSBS: Design of a Network Storage Backup System

Authors: Xinyan Zhang, Zhipeng Tan, Shan Fan

Abstract:

The first layer of defense against data loss is the backup data. This paper implements an agent-based network backup system used the backup, server-storage and server-backup agent these tripartite construction, and the snapshot and hierarchical index are used in the NSBS. It realizes the control command and data flow separation, balances the system load, thereby improving efficiency of the system backup and recovery. The test results show the agent-based network backup system can effectively improve the task-based concurrency, reasonably allocate network bandwidth, the system backup performance loss costs smaller and improves data recovery efficiency by 20%.

Keywords: Agent, network backup system, three architecture model, NSBS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2184
6912 Estimating the Life-Distribution Parameters of Weibull-Life PV Systems Utilizing Non-Parametric Analysis

Authors: Saleem Z. Ramadan

Abstract:

In this paper, a model is proposed to determine the life distribution parameters of the useful life region for the PV system utilizing a combination of non-parametric and linear regression analysis for the failure data of these systems. Results showed that this method is dependable for analyzing failure time data for such reliable systems when the data is scarce.

Keywords: Masking, Bathtub model, reliability, non-parametric analysis, useful life.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1801
6911 Novel Security Strategy for Real Time Digital Videos

Authors: Prakash Devale, R. S. Prasad, Amol Dhumane, Pritesh Patil

Abstract:

Now a days video data embedding approach is a very challenging and interesting task towards keeping real time video data secure. We can implement and use this technique with high-level applications. As the rate-distortion of any image is not confirmed, because the gain provided by accurate image frame segmentation are balanced by the inefficiency of coding objects of arbitrary shape, with a lot factors like losses that depend on both the coding scheme and the object structure. By using rate controller in association with the encoder one can dynamically adjust the target bitrate. This paper discusses about to keep secure videos by mixing signature data with negligible distortion in the original video, and to keep steganographic video as closely as possible to the quality of the original video. In this discussion we propose the method for embedding the signature data into separate video frames by the use of block Discrete Cosine Transform. These frames are then encoded by real time encoding H.264 scheme concepts. After processing, at receiver end recovery of original video and the signature data is proposed.

Keywords: Data Hiding, Digital Watermarking, video coding H.264, Rate Control, Block DCT.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1533
6910 Web Search Engine Based Naming Procedure for Independent Topic

Authors: Takahiro Nishigaki, Takashi Onoda

Abstract:

In recent years, the number of document data has been increasing since the spread of the Internet. Many methods have been studied for extracting topics from large document data. We proposed Independent Topic Analysis (ITA) to extract topics independent of each other from large document data such as newspaper data. ITA is a method for extracting the independent topics from the document data by using the Independent Component Analysis. The topic represented by ITA is represented by a set of words. However, the set of words is quite different from the topics the user imagines. For example, the top five words with high independence of a topic are as follows. Topic1 = {"scor", "game", "lead", "quarter", "rebound"}. This Topic 1 is considered to represent the topic of "SPORTS". This topic name "SPORTS" has to be attached by the user. ITA cannot name topics. Therefore, in this research, we propose a method to obtain topics easy for people to understand by using the web search engine, topics given by the set of words given by independent topic analysis. In particular, we search a set of topical words, and the title of the homepage of the search result is taken as the topic name. And we also use the proposed method for some data and verify its effectiveness.

Keywords: Independent topic analysis, topic extraction, topic naming, web search engine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 443
6909 An Anomaly Detection Approach to Detect Unexpected Faults in Recordings from Test Drives

Authors: Andreas Theissler, Ian Dear

Abstract:

In the automotive industry test drives are being conducted during the development of new vehicle models or as a part of quality assurance of series-production vehicles. The communication on the in-vehicle network, data from external sensors, or internal data from the electronic control units is recorded by automotive data loggers during the test drives. The recordings are used for fault analysis. Since the resulting data volume is tremendous, manually analysing each recording in great detail is not feasible. This paper proposes to use machine learning to support domainexperts by preventing them from contemplating irrelevant data and rather pointing them to the relevant parts in the recordings. The underlying idea is to learn the normal behaviour from available recordings, i.e. a training set, and then to autonomously detect unexpected deviations and report them as anomalies. The one-class support vector machine “support vector data description” is utilised to calculate distances of feature vectors. SVDDSUBSEQ is proposed as a novel approach, allowing to classify subsequences in multivariate time series data. The approach allows to detect unexpected faults without modelling effort as is shown with experimental results on recordings from test drives.

Keywords: Anomaly detection, fault detection, test drive analysis, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2435
6908 Decision Tree for Competing Risks Survival Probability in Breast Cancer Study

Authors: N. A. Ibrahim, A. Kudus, I. Daud, M. R. Abu Bakar

Abstract:

Competing risks survival data that comprises of more than one type of event has been used in many applications, and one of these is in clinical study (e.g. in breast cancer study). The decision tree method can be extended to competing risks survival data by modifying the split function so as to accommodate two or more risks which might be dependent on each other. Recently, researchers have constructed some decision trees for recurrent survival time data using frailty and marginal modelling. We further extended the method for the case of competing risks. In this paper, we developed the decision tree method for competing risks survival time data based on proportional hazards for subdistribution of competing risks. In particular, we grow a tree by using deviance statistic. The application of breast cancer data is presented. Finally, to investigate the performance of the proposed method, simulation studies on identification of true group of observations were executed.

Keywords: Competing risks, Decision tree, Simulation, Subdistribution Proportional Hazard.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2339
6907 Spatially Random Sampling for Retail Food Risk Factors Study

Authors: Guilan Huang

Abstract:

In 2013 and 2014, the U.S. Food and Drug Administration (FDA) collected data from selected fast food restaurants and full service restaurants for tracking changes in the occurrence of foodborne illness risk factors. This paper discussed how we customized spatial random sampling method by considering financial position and availability of FDA resources, and how we enriched restaurants data with location. Location information of restaurants provides opportunity for quantitatively determining random sampling within non-government units (e.g.: 240 kilometers around each data-collector). Spatial analysis also could optimize data-collectors’ work plans and resource allocation. Spatial analytic and processing platform helped us handling the spatial random sampling challenges. Our method fits in FDA’s ability to pinpoint features of foodservice establishments, and reduced both time and expense on data collection.

Keywords: Geospatial technology, restaurant, retail food risk factors study, spatial random sampling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1422
6906 Anisotropic Total Fractional Order Variation Model in Seismic Data Denoising

Authors: Jianwei Ma, Diriba Gemechu

Abstract:

In seismic data processing, attenuation of random noise is the basic step to improve quality of data for further application of seismic data in exploration and development in different gas and oil industries. The signal-to-noise ratio of the data also highly determines quality of seismic data. This factor affects the reliability as well as the accuracy of seismic signal during interpretation for different purposes in different companies. To use seismic data for further application and interpretation, we need to improve the signal-to-noise ration while attenuating random noise effectively. To improve the signal-to-noise ration and attenuating seismic random noise by preserving important features and information about seismic signals, we introduce the concept of anisotropic total fractional order denoising algorithm. The anisotropic total fractional order variation model defined in fractional order bounded variation is proposed as a regularization in seismic denoising. The split Bregman algorithm is employed to solve the minimization problem of the anisotropic total fractional order variation model and the corresponding denoising algorithm for the proposed method is derived. We test the effectiveness of theproposed method for synthetic and real seismic data sets and the denoised result is compared with F-X deconvolution and non-local means denoising algorithm.

Keywords: Anisotropic total fractional order variation, fractional order bounded variation, seismic random noise attenuation, Split Bregman Algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 965
6905 Improving Academic Performance Prediction using Voting Technique in Data Mining

Authors: Ikmal Hisyam Mohamad Paris, Lilly Suriani Affendey, Norwati Mustapha

Abstract:

In this paper we compare the accuracy of data mining methods to classifying students in order to predicting student-s class grade. These predictions are more useful for identifying weak students and assisting management to take remedial measures at early stages to produce excellent graduate that will graduate at least with second class upper. Firstly we examine single classifiers accuracy on our data set and choose the best one and then ensembles it with a weak classifier to produce simple voting method. We present results show that combining different classifiers outperformed other single classifiers for predicting student performance.

Keywords: Classification, Data Mining, Prediction, Combination of Multiple Classifiers.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2715
6904 Extracting Terrain Points from Airborne Laser Scanning Data in Densely Forested Areas

Authors: Ziad Abdeldayem, Jakub Markiewicz, Kunal Kansara, Laura Edwards

Abstract:

Airborne Laser Scanning (ALS) is one of the main technologies for generating high-resolution digital terrain models (DTMs). DTMs are crucial to several applications, such as topographic mapping, flood zone delineation, geographic information systems (GIS), hydrological modelling, spatial analysis, etc. Laser scanning system generates irregularly spaced three-dimensional cloud of points. Raw ALS data are mainly ground points (that represent the bare earth) and non-ground points (that represent buildings, trees, cars, etc.). Removing all the non-ground points from the raw data is referred to as filtering. Filtering heavily forested areas is considered a difficult and challenging task as the canopy stops laser pulses from reaching the terrain surface. This research presents an approach for removing non-ground points from raw ALS data in densely forested areas. Smoothing splines are exploited to interpolate and fit the noisy ALS data. The presented filter utilizes a weight function to allocate weights for each point of the data. Furthermore, unlike most of the methods, the presented filtering algorithm is designed to be automatic. Three different forested areas in the United Kingdom are used to assess the performance of the algorithm. The results show that the generated DTMs from the filtered data are accurate (when compared against reference terrain data) and the performance of the method is stable for all the heavily forested data samples. The average root mean square error (RMSE) value is 0.35 m.

Keywords: Airborne laser scanning, digital terrain models, filtering, forested areas.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 670