Search results for: Data Streams
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7509

Search results for: Data Streams

6969 Improving Academic Performance Prediction using Voting Technique in Data Mining

Authors: Ikmal Hisyam Mohamad Paris, Lilly Suriani Affendey, Norwati Mustapha

Abstract:

In this paper we compare the accuracy of data mining methods to classifying students in order to predicting student-s class grade. These predictions are more useful for identifying weak students and assisting management to take remedial measures at early stages to produce excellent graduate that will graduate at least with second class upper. Firstly we examine single classifiers accuracy on our data set and choose the best one and then ensembles it with a weak classifier to produce simple voting method. We present results show that combining different classifiers outperformed other single classifiers for predicting student performance.

Keywords: Classification, Data Mining, Prediction, Combination of Multiple Classifiers.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2754
6968 Extracting Terrain Points from Airborne Laser Scanning Data in Densely Forested Areas

Authors: Ziad Abdeldayem, Jakub Markiewicz, Kunal Kansara, Laura Edwards

Abstract:

Airborne Laser Scanning (ALS) is one of the main technologies for generating high-resolution digital terrain models (DTMs). DTMs are crucial to several applications, such as topographic mapping, flood zone delineation, geographic information systems (GIS), hydrological modelling, spatial analysis, etc. Laser scanning system generates irregularly spaced three-dimensional cloud of points. Raw ALS data are mainly ground points (that represent the bare earth) and non-ground points (that represent buildings, trees, cars, etc.). Removing all the non-ground points from the raw data is referred to as filtering. Filtering heavily forested areas is considered a difficult and challenging task as the canopy stops laser pulses from reaching the terrain surface. This research presents an approach for removing non-ground points from raw ALS data in densely forested areas. Smoothing splines are exploited to interpolate and fit the noisy ALS data. The presented filter utilizes a weight function to allocate weights for each point of the data. Furthermore, unlike most of the methods, the presented filtering algorithm is designed to be automatic. Three different forested areas in the United Kingdom are used to assess the performance of the algorithm. The results show that the generated DTMs from the filtered data are accurate (when compared against reference terrain data) and the performance of the method is stable for all the heavily forested data samples. The average root mean square error (RMSE) value is 0.35 m.

Keywords: Airborne laser scanning, digital terrain models, filtering, forested areas.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 718
6967 Blockchain for IoT Security and Privacy in Healthcare Sector

Authors: Umair Shafique, Hafiz Usman Zia, Fiaz Majeed, Samina Naz, Javeria Ahmed, Maleeha Zainab

Abstract:

The Internet of Things (IoT) has become a hot topic for the last couple of years. This innovative technology has shown promising progress in various areas and the world has witnessed exponential growth in multiple application domains. Researchers are working to investigate its aptitudes to get the best from it by harnessing its true potential. But at the same time, IoT networks open up a new aspect of vulnerability and physical threats to data integrity, privacy, and confidentiality. It is due to centralized control, data silos approach for handling information, and a lack of standardization in the IoT networks. As we know, blockchain is a new technology that involves creating secure distributed ledgers to store and communicate data. Some of the benefits include resiliency, integrity, anonymity, decentralization, and autonomous control. The potential for blockchain technology to provide the key to managing and controlling IoT has created a new wave of excitement around the idea of putting that data back into the hands of the end-users. In this manuscript, we have proposed a model that combines blockchain and IoT networks to address potential security and privacy issues in the healthcare domain and how various stakeholders will interact with the system.

Keywords: Internet of Things, IoT, blockchain, data integrity, authentication, data privacy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 413
6966 Multidimensional Performance Management

Authors: David Wiese

Abstract:

In order to maximize efficiency of an information management platform and to assist in decision making, the collection, storage and analysis of performance-relevant data has become of fundamental importance. This paper addresses the merits and drawbacks provided by the OLAP paradigm for efficiently navigating large volumes of performance measurement data hierarchically. The system managers or database administrators navigate through adequately (re)structured measurement data aiming to detect performance bottlenecks, identify causes for performance problems or assessing the impact of configuration changes on the system and its representative metrics. Of particular importance is finding the root cause of an imminent problem, threatening availability and performance of an information system. Leveraging OLAP techniques, in contrast to traditional static reporting, this is supposed to be accomplished within moderate amount of time and little processing complexity. It is shown how OLAP techniques can help improve understandability and manageability of measurement data and, hence, improve the whole Performance Analysis process.

Keywords: Data Warehousing, OLAP, Multidimensional Navigation, Performance Diagnosis, Performance Management, Performance Tuning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2135
6965 A New Algorithm for Enhanced Robustness of Copyright Mark

Authors: Harsh Vikram Singh, S. P. Singh, Anand Mohan

Abstract:

This paper discusses a new heavy tailed distribution based data hiding into discrete cosine transform (DCT) coefficients of image, which provides statistical security as well as robustness against steganalysis attacks. Unlike other data hiding algorithms, the proposed technique does not introduce much effect in the stegoimage-s DCT coefficient probability plots, thus making the presence of hidden data statistically undetectable. In addition the proposed method does not compromise on hiding capacity. When compared to the generic block DCT based data-hiding scheme, our method found more robust against a variety of image manipulating attacks such as filtering, blurring, JPEG compression etc.

Keywords: Information Security, Robust Steganography, Steganalysis, Pareto Probability Distribution function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1797
6964 Liveability of Kuala Lumpur City Centre: An Evaluation of the Happiness Level of the Streets- Activities

Authors: Shuhana Shamsuddin, Nur Rasyiqah Abu Hassan, Ahmad Bashri Sulaiman

Abstract:

Liveable city is referred to as the quality of life in an area that contributes towards a safe, healthy and enjoyable place. This paper discusses the role of the streets- activities in making Kuala Lumpur a liveable city and the happiness level of the residents towards the city-s street activities. The study was conducted using the residents of Kuala Lumpur. A mixed method technique is used with the quantitative data as a main data and supported by the qualitative data. Data were collected using questionnaires, observation and also an interview session with a sample of residents of Kuala Lumpur. The sampling technique is based on multistage cluster data sampling. The findings revealed that, there is still no significant relationship between the length of stay of the resident in Kuala Lumpur with the happiness level towards the street activities that occurred in the city.

Keywords: Liveable city, activities, urban design quality, quality of life, happiness level.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2890
6963 Cloud Computing Support for Diagnosing Researches

Authors: A. Amirov, O. Gerget, V. Kochegurov

Abstract:

One of the main biomedical problem lies in detecting dependencies in semi structured data. Solution includes biomedical portal and algorithms (integral rating health criteria, multidimensional data visualization methods). Biomedical portal allows to process diagnostic and research data in parallel mode using Microsoft System Center 2012, Windows HPC Server cloud technologies. Service does not allow user to see internal calculations instead it provides practical interface. When data is sent for processing user may track status of task and will achieve results as soon as computation is completed. Service includes own algorithms and allows diagnosing and predicating medical cases. Approved methods are based on complex system entropy methods, algorithms for determining the energy patterns of development and trajectory models of biological systems and logical–probabilistic approach with the blurring of images.

Keywords: Biomedical portal, cloud computing, diagnostic and prognostic research, mathematical data analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1644
6962 Non-Invasive Data Extraction from Machine Display Units Using Video Analytics

Authors: Ravneet Kaur, Joydeep Acharya, Sudhanshu Gaur

Abstract:

Artificial Intelligence (AI) has the potential to transform manufacturing by improving shop floor processes such as production, maintenance and quality. However, industrial datasets are notoriously difficult to extract in a real-time, streaming fashion thus, negating potential AI benefits. The main example is some specialized industrial controllers that are operated by custom software which complicates the process of connecting them to an Information Technology (IT) based data acquisition network. Security concerns may also limit direct physical access to these controllers for data acquisition. To connect the Operational Technology (OT) data stored in these controllers to an AI application in a secure, reliable and available way, we propose a novel Industrial IoT (IIoT) solution in this paper. In this solution, we demonstrate how video cameras can be installed in a factory shop floor to continuously obtain images of the controller HMIs. We propose image pre-processing to segment the HMI into regions of streaming data and regions of fixed meta-data. We then evaluate the performance of multiple Optical Character Recognition (OCR) technologies such as Tesseract and Google vision to recognize the streaming data and test it for typical factory HMIs and realistic lighting conditions. Finally, we use the meta-data to match the OCR output with the temporal, domain-dependent context of the data to improve the accuracy of the output. Our IIoT solution enables reliable and efficient data extraction which will improve the performance of subsequent AI applications.

Keywords: Human machine interface, industrial internet of things, internet of things, optical character recognition, video analytic.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 739
6961 Estimating the Flow Velocity Using Flow Generated Sound

Authors: Saeed Hosseini, Ali Reza Tahavvor

Abstract:

Sound processing is one the subjects that newly attracts a lot of researchers. It is efficient and usually less expensive than other methods. In this paper the flow generated sound is used to estimate the flow speed of free flows. Many sound samples are gathered. After analyzing the data, a parameter named wave power is chosen. For all samples the wave power is calculated and averaged for each flow speed. A curve is fitted to the averaged data and a correlation between the wave power and flow speed is found. Test data are used to validate the method and errors for all test data were under 10 percent. The speed of the flow can be estimated by calculating the wave power of the flow generated sound and using the proposed correlation.

Keywords: Flow generated sound, sound processing, speed, wave power.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2369
6960 Exponential Particle Swarm Optimization Approach for Improving Data Clustering

Authors: Neveen I. Ghali, Nahed El-Dessouki, Mervat A. N., Lamiaa Bakrawi

Abstract:

In this paper we use exponential particle swarm optimization (EPSO) to cluster data. Then we compare between (EPSO) clustering algorithm which depends on exponential variation for the inertia weight and particle swarm optimization (PSO) clustering algorithm which depends on linear inertia weight. This comparison is evaluated on five data sets. The experimental results show that EPSO clustering algorithm increases the possibility to find the optimal positions as it decrease the number of failure. Also show that (EPSO) clustering algorithm has a smaller quantization error than (PSO) clustering algorithm, i.e. (EPSO) clustering algorithm more accurate than (PSO) clustering algorithm.

Keywords: Particle swarm optimization, data clustering, exponential PSO.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1690
6959 Biosignal Measurement using Personal Area Network based on Human Body Communication

Authors: Yong-Gyu Lee, Jin-Hee Park, Gilwon Yoon

Abstract:

In this study, we introduced a communication system where human body was used as medium through which data were transferred. Multiple biosignal sensing units were attached to a subject and wireless personal area network was formed. Data of the sensing units were shared among them. We used wideband pulse communication that was simple, low-power consuming and high data rated. Each unit functioned as independent communication device or node. A method of channel search and communication among the modes was developed. A protocol of carrier sense multiple access/collision detect was implemented in order to avoid data collision or interferences. Biosignal sensing units should be located at different locations due to the nature of biosignal origin. Our research provided a flexibility of collecting data without using electrical wires. More non-constrained measurement was accomplished which was more suitable for u-Health monitoring.

Keywords: Human body communication, wideband pulse communication, personal area network, biosignal.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2177
6958 Development and Evaluation of a Portable Ammonia Gas Detector

Authors: Jaheon Gu, Wooyong Chung, Mijung Koo, Seonbok Lee, Gyoutae Park, Sangguk Ahn, Hiesik Kim, Jungil Park

Abstract:

In this paper, we present a portable ammonia gas detector for performing the gas safety management efficiently. The display of the detector is separated from its body. The display module is received the data measured from the detector using ZigBee. The detector has a rechargeable li-ion battery which can be use for 11~12 hours, and a Bluetooth module for sending the data to the PC or the smart devices. The data are sent to the server and can access using the web browser or mobile application. The range of the detection concentration is 0~100ppm.

Keywords: Ammonia, detector, gas safety, portable.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1538
6957 LAYMOD; A Layered and Modular Platform for CAx Collaboration Management and Supporting Product data Integration based on STEP Standard

Authors: Omid F. Valilai, Mahmoud Houshmand

Abstract:

Nowadays companies strive to survive in a competitive global environment. To speed up product development/modifications, it is suggested to adopt a collaborative product development approach. However, despite the advantages of new IT improvements still many CAx systems work separately and locally. Collaborative design and manufacture requires a product information model that supports related CAx product data models. To solve this problem many solutions are proposed, which the most successful one is adopting the STEP standard as a product data model to develop a collaborative CAx platform. However, the improvement of the STEP-s Application Protocols (APs) over the time, huge number of STEP AP-s and cc-s, the high costs of implementation, costly process for conversion of older CAx software files to the STEP neutral file format; and lack of STEP knowledge, that usually slows down the implementation of the STEP standard in collaborative data exchange, management and integration should be considered. In this paper the requirements for a successful collaborative CAx system is discussed. The STEP standard capability for product data integration and its shortcomings as well as the dominant platforms for supporting CAx collaboration management and product data integration are reviewed. Finally a platform named LAYMOD to fulfil the requirements of CAx collaborative environment and integrating the product data is proposed. The platform is a layered platform to enable global collaboration among different CAx software packages/developers. It also adopts the STEP modular architecture and the XML data structures to enable collaboration between CAx software packages as well as overcoming the STEP standard limitations. The architecture and procedures of LAYMOD platform to manage collaboration and avoid contradicts in product data integration are introduced.

Keywords: CAx, Collaboration management, STEP applicationmodules, STEP standard, XML data structures

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2218
6956 Grocery Customer Behavior Analysis using RFID-based Shopping Paths Data

Authors: In-Chul Jung, Young S. Kwon

Abstract:

Knowing about the customer behavior in a grocery has been a long-standing issue in the retailing industry. The advent of RFID has made it easier to collect moving data for an individual shopper's behavior. Most of the previous studies used the traditional statistical clustering technique to find the major characteristics of customer behavior, especially shopping path. However, in using the clustering technique, due to various spatial constraints in the store, standard clustering methods are not feasible because moving data such as the shopping path should be adjusted in advance of the analysis, which is time-consuming and causes data distortion. To alleviate this problem, we propose a new approach to spatial pattern clustering based on the longest common subsequence. Experimental results using real data obtained from a grocery confirm the good performance of the proposed method in finding the hot spot, dead spot and major path patterns of customer movements.

Keywords: customer path, shopping behavior, exploratoryanalysis, LCS, RFID

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3149
6955 Soft-Sensor for Estimation of Gasoline Octane Number in Platforming Processes with Adaptive Neuro-Fuzzy Inference Systems (ANFIS)

Authors: Hamed.Vezvaei, Sepideh.Ordibeheshti, Mehdi.Ardjmand

Abstract:

Gasoline Octane Number is the standard measure of the anti-knock properties of a motor in platforming processes, that is one of the important unit operations for oil refineries and can be determined with online measurement or use CFR (Cooperative Fuel Research) engines. Online measurements of the Octane number can be done using direct octane number analyzers, that it is too expensive, so we have to find feasible analyzer, like ANFIS estimators. ANFIS is the systems that neural network incorporated in fuzzy systems, using data automatically by learning algorithms of NNs. ANFIS constructs an input-output mapping based both on human knowledge and on generated input-output data pairs. In this research, 31 industrial data sets are used (21 data for training and the rest of the data used for generalization). Results show that, according to this simulation, hybrid method training algorithm in ANFIS has good agreements between industrial data and simulated results.

Keywords: Adaptive Neuro-Fuzzy Inference Systems, GasolineOctane Number, Soft-sensor, Catalytic Naphtha Reforming

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2194
6954 Tool for Metadata Extraction and Content Packaging as Endorsed in OAIS Framework

Authors: Payal Abichandani, Rishi Prakash, Paras Nath Barwal, B. K. Murthy

Abstract:

Information generated from various computerization processes is a potential rich source of knowledge for its designated community. To pass this information from generation to generation without modifying the meaning is a challenging activity. To preserve and archive the data for future generations it’s very essential to prove the authenticity of the data. It can be achieved by extracting the metadata from the data which can prove the authenticity and create trust on the archived data. Subsequent challenge is the technology obsolescence. Metadata extraction and standardization can be effectively used to resolve and tackle this problem. Metadata can be categorized at two levels i.e. Technical and Domain level broadly. Technical metadata will provide the information that can be used to understand and interpret the data record, but only this level of metadata isn’t sufficient to create trustworthiness. We have developed a tool which will extract and standardize the technical as well as domain level metadata. This paper is about the different features of the tool and how we have developed this.  

Keywords: Digital Preservation, Metadata, OAIS, PDI, XML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1824
6953 Application of a New Hybrid Optimization Algorithm on Cluster Analysis

Authors: T. Niknam, M. Nayeripour, B.Bahmani Firouzi

Abstract:

Clustering techniques have received attention in many areas including engineering, medicine, biology and data mining. The purpose of clustering is to group together data points, which are close to one another. The K-means algorithm is one of the most widely used techniques for clustering. However, K-means has two shortcomings: dependency on the initial state and convergence to local optima and global solutions of large problems cannot found with reasonable amount of computation effort. In order to overcome local optima problem lots of studies done in clustering. This paper is presented an efficient hybrid evolutionary optimization algorithm based on combining Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO), called PSO-ACO, for optimally clustering N object into K clusters. The new PSO-ACO algorithm is tested on several data sets, and its performance is compared with those of ACO, PSO and K-means clustering. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handing data clustering.

Keywords: Ant Colony Optimization (ACO), Data clustering, Hybrid evolutionary optimization algorithm, K-means clustering, Particle Swarm Optimization (PSO).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2198
6952 Analysis of DNA Microarray Data using Association Rules: A Selective Study

Authors: M. Anandhavalli Gauthaman

Abstract:

DNA microarrays allow the measurement of expression levels for a large number of genes, perhaps all genes of an organism, within a number of different experimental samples. It is very much important to extract biologically meaningful information from this huge amount of expression data to know the current state of the cell because most cellular processes are regulated by changes in gene expression. Association rule mining techniques are helpful to find association relationship between genes. Numerous association rule mining algorithms have been developed to analyze and associate this huge amount of gene expression data. This paper focuses on some of the popular association rule mining algorithms developed to analyze gene expression data.

Keywords: DNA microarray, gene expression, association rule mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2145
6951 Prospects, Problems of Marketing Research and Data Mining in Turkey

Authors: Sema Kurtuluş, Kemal Kurtuluş

Abstract:

The objective of this paper is to review and assess the methodological issues and problems in marketing research, data and knowledge mining in Turkey. As a summary, academic marketing research publications in Turkey have significant problems. The most vital problem seems to be related with modeling. Most of the publications had major weaknesses in modeling. There were also, serious problems regarding measurement and scaling, sampling and analyses. Analyses myopia seems to be the most important problem for young academia in Turkey. Another very important finding is the lack of publications on data and knowledge mining in the academic world.

Keywords: Marketing research, data mining, knowledge mining, research modeling, analyses.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1968
6950 Analysis and Comparison of Image Encryption Algorithms

Authors: İsmet Öztürk, İbrahim Soğukpınar

Abstract:

With the fast progression of data exchange in electronic way, information security is becoming more important in data storage and transmission. Because of widely using images in industrial process, it is important to protect the confidential image data from unauthorized access. In this paper, we analyzed current image encryption algorithms and compression is added for two of them (Mirror-like image encryption and Visual Cryptography). Implementations of these two algorithms have been realized for experimental purposes. The results of analysis are given in this paper.

Keywords: image encryption, image cryptosystem, security, transmission

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4958
6949 Risk Classification of SMEs by Early Warning Model Based on Data Mining

Authors: Nermin Ozgulbas, Ali Serhan Koyuncugil

Abstract:

One of the biggest problems of SMEs is their tendencies to financial distress because of insufficient finance background. In this study, an Early Warning System (EWS) model based on data mining for financial risk detection is presented. CHAID algorithm has been used for development of the EWS. Developed EWS can be served like a tailor made financial advisor in decision making process of the firms with its automated nature to the ones who have inadequate financial background. Besides, an application of the model implemented which covered 7,853 SMEs based on Turkish Central Bank (TCB) 2007 data. By using EWS model, 31 risk profiles, 15 risk indicators, 2 early warning signals, and 4 financial road maps has been determined for financial risk mitigation.

Keywords: Early Warning Systems, Data Mining, Financial Risk, SMEs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3388
6948 Using Data from Foursquare Web Service to Represent the Commercial Activity of a City

Authors: Taras Agryzkov, Almudena Nolasco-Cirugeda, Jos´e L. Oliver, Leticia Serrano-Estrada, Leandro Tortosa, Jos´e F. Vicent

Abstract:

This paper aims to represent the commercial activity of a city taking as source data the social network Foursquare. The city of Murcia is selected as case study, and the location-based social network Foursquare is the main source of information. After carrying out a reorganisation of the user-generated data extracted from Foursquare, it is possible to graphically display on a map the various city spaces and venues especially those related to commercial, food and entertainment sector businesses. The obtained visualisation provides information about activity patterns in the city of Murcia according to the people‘s interests and preferences and, moreover, interesting facts about certain characteristics of the town itself.

Keywords: Social networks, Foursquare, spatial analysis, data visualization, geocomputation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2678
6947 Long-Range Dependence of Financial Time Series Data

Authors: Chatchai Pesee

Abstract:

This paper examines long-range dependence or longmemory of financial time series on the exchange rate data by the fractional Brownian motion (fBm). The principle of spectral density function in Section 2 is used to find the range of Hurst parameter (H) of the fBm. If 0< H <1/2, then it has a short-range dependence (SRD). It simulates long-memory or long-range dependence (LRD) if 1/2< H <1. The curve of exchange rate data is fBm because of the specific appearance of the Hurst parameter (H). Furthermore, some of the definitions of the fBm, long-range dependence and selfsimilarity are reviewed in Section II as well. Our results indicate that there exists a long-memory or a long-range dependence (LRD) for the exchange rate data in section III. Long-range dependence of the exchange rate data and estimation of the Hurst parameter (H) are discussed in Section IV, while a conclusion is discussed in Section V.

Keywords: Fractional Brownian motion, long-rangedependence, memory, short-range dependence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1884
6946 Meta Random Forests

Authors: Praveen Boinee, Alessandro De Angelis, Gian Luca Foresti

Abstract:

Leo Breimans Random Forests (RF) is a recent development in tree based classifiers and quickly proven to be one of the most important algorithms in the machine learning literature. It has shown robust and improved results of classifications on standard data sets. Ensemble learning algorithms such as AdaBoost and Bagging have been in active research and shown improvements in classification results for several benchmarking data sets with mainly decision trees as their base classifiers. In this paper we experiment to apply these Meta learning techniques to the random forests. We experiment the working of the ensembles of random forests on the standard data sets available in UCI data sets. We compare the original random forest algorithm with their ensemble counterparts and discuss the results.

Keywords: Random Forests [RF], ensembles, UCI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2711
6945 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain subgroups of time series data with normal distribution from the inflow into wastewater treatment plant data, composed of several groups differing by mean value. Two simple algorithms, K-mean and EM, were chosen as a clustering method. The Rand index was used to measure the similarity. After simple meta-clustering, a regression model was performed for each subgroups. The final model was a sum of the subgroups models. The quality of the obtained model was compared with the regression model made using the same explanatory variables, but with no clustering of data. Results were compared using determination coefficient (R2), measure of prediction accuracy- mean absolute percentage error (MAPE) and comparison on a linear chart. Preliminary results allow us to foresee the potential of the presented technique.

Keywords: Clustering, Data analysis, Data mining, Predictive models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1951
6944 Studies on Determination of the Optimum Distance Between the Tmotes for Optimum Data Transfer in a Network with WLL Capability

Authors: N C Santhosh Kumar, N K Kishore

Abstract:

Using mini modules of Tmotes, it is possible to automate a small personal area network. This idea can be extended to large networks too by implementing multi-hop routing. Linking the various Tmotes using Programming languages like Nesc, Java and having transmitter and receiver sections, a network can be monitored. It is foreseen that, depending on the application, a long range at a low data transfer rate or average throughput may be an acceptable trade-off. To reduce the overall costs involved, an optimum number of Tmotes to be used under various conditions (Indoor/Outdoor) is to be deduced. By analyzing the data rates or throughputs at various locations of Tmotes, it is possible to deduce an optimal number of Tmotes for a specific network. This paper deals with the determination of optimum distances to reduce the cost and increase the reliability of the entire sensor network with Wireless Local Loop (WLL) capability.

Keywords: Average throughput, data rate, multi-hop routing, optimum data transfer, throughput, Tmotes, wireless local loop.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1367
6943 Unified Structured Process for Health Analytics

Authors: Supunmali Ahangama, Danny Chiang Choon Poo

Abstract:

Health analytics (HA) is used in healthcare systems for effective decision making, management and planning of healthcare and related activities. However, user resistances, unique position of medical data content and structure (including heterogeneous and unstructured data) and impromptu HA projects have held up the progress in HA applications. Notably, the accuracy of outcomes depends on the skills and the domain knowledge of the data analyst working on the healthcare data. Success of HA depends on having a sound process model, effective project management and availability of supporting tools. Thus, to overcome these challenges through an effective process model, we propose a HA process model with features from rational unified process (RUP) model and agile methodology.

Keywords: Agile methodology, health analytics, unified process model, UML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2330
6942 The Classification Model for Hard Disk Drive Functional Tests under Sparse Data Conditions

Authors: S. Pattanapairoj, D. Chetchotsak

Abstract:

This paper proposed classification models that would be used as a proxy for hard disk drive (HDD) functional test equitant which required approximately more than two weeks to perform the HDD status classification in either “Pass" or “Fail". These models were constructed by using committee network which consisted of a number of single neural networks. This paper also included the method to solve the problem of sparseness data in failed part, which was called “enforce learning method". Our results reveal that the constructed classification models with the proposed method could perform well in the sparse data conditions and thus the models, which used a few seconds for HDD classification, could be used to substitute the HDD functional tests.

Keywords: Sparse data, Classifications, Committee network

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1736
6941 Accurate Position Electromagnetic Sensor Using Data Acquisition System

Authors: Z. Ezzouine, A. Nakheli

Abstract:

This paper presents a high position electromagnetic sensor system (HPESS) that is applicable for moving object detection. The authors have developed a high-performance position sensor prototype dedicated to students’ laboratory. The challenge was to obtain a highly accurate and real-time sensor that is able to calculate position, length or displacement. An electromagnetic solution based on a two coil induction principal was adopted. The HPESS converts mechanical motion to electric energy with direct contact. The output signal can then be fed to an electronic circuit. The voltage output change from the sensor is captured by data acquisition system using LabVIEW software. The displacement of the moving object is determined. The measured data are transmitted to a PC in real-time via a DAQ (NI USB -6281). This paper also describes the data acquisition analysis and the conditioning card developed specially for sensor signal monitoring. The data is then recorded and viewed using a user interface written using National Instrument LabVIEW software. On-line displays of time and voltage of the sensor signal provide a user-friendly data acquisition interface. The sensor provides an uncomplicated, accurate, reliable, inexpensive transducer for highly sophisticated control systems.

Keywords: Electromagnetic sensor, data acquisition, accurately, position measurement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 961
6940 Calculus Logarithmic Function for Image Encryption

Authors: Adil AL-Rammahi

Abstract:

When we prefer to make the data secure from various attacks and fore integrity of data, we must encrypt the data before it is transmitted or stored. This paper introduces a new effective and lossless image encryption algorithm using a natural logarithmic function. The new algorithm encrypts an image through a three stage process. In the first stage, a reference natural logarithmic function is generated as the foundation for the encryption image. The image numeral matrix is then analyzed to five integer numbers, and then the numbers’ positions are transformed to matrices. The advantages of this method is useful for efficiently encrypting a variety of digital images, such as binary images, gray images, and RGB images without any quality loss. The principles of the presented scheme could be applied to provide complexity and then security for a variety of data systems such as image and others.

Keywords: Linear Systems, Image Encryption, Calculus.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2401