Search results for: Multivariate Data Analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 13405

Search results for: Multivariate Data Analysis

13225 Characterization of Atmospheric Particulate Matter using PIXE Technique

Authors: P.Kothai, P. Prathibha, I.V.Saradhi, G.G. Pandit, V.D. Puranik

Abstract:

Coarse and fine particulate matter were collected at a residential area at Vashi, Navi Mumbai and the filter samples were analysed for trace elements using PIXE technique. The trend of particulate matter showed higher concentrations during winter than the summer and monsoon concentration levels. High concentrations of elements related to soil and sea salt were found in PM10 and PM2.5. Also high levels of zinc and sulphur found in the particulates of both the size fractions. EF analysis showed enrichment of Cu, Cr and Mn only in the fine fraction suggesting their origin from anthropogenic sources. The EF value was observed to be maximum for As, Pb and Zn in the fine particulates. However, crustal derived elements showed very low EF values indicating their origin from soil. The PCA based multivariate studies identified soil, sea salt, combustion and Se sources as common sources for coarse and additionally an industrial source has also been identified for fine particles.

Keywords: EF analysis, PM10, PM2.5, PIXE, PCA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1817
13224 Effects of Energy Consumption on Indoor Air Quality

Authors: M. Raatikainen, J-P. Skön, M. Johansson, K. Leiviskä, M. Kolehmainen

Abstract:

Continuous measurements and multivariate methods are applied in researching the effects of energy consumption on indoor air quality (IAQ) in a Finnish one-family house. Measured data used in this study was collected continuously in a house in Kuopio, Eastern Finland, during fourteen months long period. Consumption parameters measured were the consumptions of district heat, electricity and water. Indoor parameters gathered were temperature, relative humidity (RH), the concentrations of carbon dioxide (CO2) and carbon monoxide (CO) and differential air pressure. In this study, self-organizing map (SOM) and Sammon's mapping were applied to resolve the effects of energy consumption on indoor air quality. Namely, the SOM was qualified as a suitable method having a property to summarize the multivariable dependencies into easily observable two-dimensional map. Accompanying that, the Sammon's mapping method was used to cluster pre-processed data to find similarities of the variables, expressing distances and groups in the data. The methods used were able to distinguish 7 different clusters characterizing indoor air quality and energy efficiency in the study house. The results indicate, that the cost implications in euros of heating and electricity energy vary according to the differential pressure, concentration of carbon dioxide, temperature and season.

Keywords: Indoor air quality, Energy efficiency, Self- organizing map, Sammon's mapping

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1835
13223 Simulation Data Summarization Based on Spatial Histograms

Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura

Abstract:

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.

Keywords: Simulation data, data summarization, spatial histograms, exploration and visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 704
13222 Analysis of Web User Identification Methods

Authors: Renáta Iváncsy, Sándor Juhász

Abstract:

Web usage mining has become a popular research area, as a huge amount of data is available online. These data can be used for several purposes, such as web personalization, web structure enhancement, web navigation prediction etc. However, the raw log files are not directly usable; they have to be preprocessed in order to transform them into a suitable format for different data mining tasks. One of the key issues in the preprocessing phase is to identify web users. Identifying users based on web log files is not a straightforward problem, thus various methods have been developed. There are several difficulties that have to be overcome, such as client side caching, changing and shared IP addresses and so on. This paper presents three different methods for identifying web users. Two of them are the most commonly used methods in web log mining systems, whereas the third on is our novel approach that uses a complex cookie-based method to identify web users. Furthermore we also take steps towards identifying the individuals behind the impersonal web users. To demonstrate the efficiency of the new method we developed an implementation called Web Activity Tracking (WAT) system that aims at a more precise distinction of web users based on log data. We present some statistical analysis created by the WAT on real data about the behavior of the Hungarian web users and a comprehensive analysis and comparison of the three methods

Keywords: Data preparation, Tracking individuals, Web useridentification, Web usage mining

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4359
13221 Data Mining Classification Methods Applied in Drug Design

Authors: Mária Stachová, Lukáš Sobíšek

Abstract:

Data mining incorporates a group of statistical methods used to analyze a set of information, or a data set. It operates with models and algorithms, which are powerful tools with the great potential. They can help people to understand the patterns in certain chunk of information so it is obvious that the data mining tools have a wide area of applications. For example in the theoretical chemistry data mining tools can be used to predict moleculeproperties or improve computer-assisted drug design. Classification analysis is one of the major data mining methodologies. The aim of thecontribution is to create a classification model, which would be able to deal with a huge data set with high accuracy. For this purpose logistic regression, Bayesian logistic regression and random forest models were built using R software. TheBayesian logistic regression in Latent GOLD software was created as well. These classification methods belong to supervised learning methods. It was necessary to reduce data matrix dimension before construct models and thus the factor analysis (FA) was used. Those models were applied to predict the biological activity of molecules, potential new drug candidates.

Keywords: data mining, classification, drug design, QSAR

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2805
13220 A Meta-Analytic Path Analysis of e-Learning Acceptance Model

Authors: David W.S. Tai, Ren-Cheng Zhang, Sheng-Hung Chang, Chin-Pin Chen, Jia-Ling Chen

Abstract:

This study reports results of a meta-analytic path analysis e-learning Acceptance Model with k = 27 studies, Databases searched included Information Sciences Institute (ISI) website. Variables recorded included perceived usefulness, perceived ease of use, attitude toward behavior, and behavioral intention to use e-learning. A correlation matrix of these variables was derived from meta-analytic data and then analyzed by using structural path analysis to test the fitness of the e-learning acceptance model to the observed aggregated data. Results showed the revised hypothesized model to be a reasonable, good fit to aggregated data. Furthermore, discussions and implications are given in this article.

Keywords: E-learning, Meta Analytic Path Analysis, Technology Acceptance Model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2401
13219 Big Data: Concepts, Technologies and Applications in the Public Sector

Authors: A. Alexandru, C. A. Alexandru, D. Coardos, E. Tudora

Abstract:

Big Data (BD) is associated with a new generation of technologies and architectures which can harness the value of extremely large volumes of very varied data through real time processing and analysis. It involves changes in (1) data types, (2) accumulation speed, and (3) data volume. This paper presents the main concepts related to the BD paradigm, and introduces architectures and technologies for BD and BD sets. The integration of BD with the Hadoop Framework is also underlined. BD has attracted a lot of attention in the public sector due to the newly emerging technologies that allow the availability of network access. The volume of different types of data has exponentially increased. Some applications of BD in the public sector in Romania are briefly presented.

Keywords: Big data, big data Analytics, Hadoop framework, cloud computing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2275
13218 Extreme Temperature Forecast in Mbonge, Cameroon through Return Level Analysis of the Generalized Extreme Value (GEV) Distribution

Authors: Nkongho Ayuketang Arreyndip, Ebobenow Joseph

Abstract:

In this paper, temperature extremes are forecast by employing the block maxima method of the Generalized extreme value(GEV) distribution to analyse temperature data from the Cameroon Development Corporation (C.D.C). By considering two sets of data (Raw data and simulated data) and two (stationary and non-stationary) models of the GEV distribution, return levels analysis is carried out and it was found that in the stationary model, the return values are constant over time with the raw data while in the simulated data, the return values show an increasing trend but with an upper bound. In the non-stationary model, the return levels of both the raw data and simulated data show an increasing trend but with an upper bound. This clearly shows that temperatures in the tropics even-though show a sign of increasing in the future, there is a maximum temperature at which there is no exceedence. The results of this paper are very vital in Agricultural and Environmental research.

Keywords: Return level, Generalized extreme value (GEV), Meteorology, Forecasting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2066
13217 Analysis of Lead Time Delays in Supply Chain: A Case Study

Authors: Abdel-Aziz M. Mohamed, Nermeen Coutry

Abstract:

Lead time is a critical measure of a supply chain's performance. It impacts both the customer satisfactions as well as the total cost of inventory. This paper presents the result of a study on the analysis of the customer order lead-time for a multinational company. In the study, the lead time was divided into three stages respectively: order entry, order fulfillment, and order delivery. A sample of size 2,425 order lines was extracted from the company's records to use for this study. The sample data entails information regarding customer orders from the time of order entry until order delivery. Data regarding the lead time of each stage for different orders were also provided. Summary statistics on lead time data reveals that about 30% of the orders were delivered later than the scheduled due date. The result of the multiple linear regression analysis technique revealed that component type, logistics parameter, order size and the customer type have significant impacts on lead time. Data analysis on the stages of lead time indicates that stage 2 consumed over 50% of the lead time. Pareto analysis was made to study the reasons for the customer order delay in each stage. Recommendation was given to resolve the problem.

Keywords: Lead time reduction, customer satisfaction, service quality, statistical analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6616
13216 Enhanced Clustering Analysis and Visualization Using Kohonen's Self-Organizing Feature Map Networks

Authors: Kasthurirangan Gopalakrishnan, Siddhartha Khaitan, Anshu Manik

Abstract:

Cluster analysis is the name given to a diverse collection of techniques that can be used to classify objects (e.g. individuals, quadrats, species etc). While Kohonen's Self-Organizing Feature Map (SOFM) or Self-Organizing Map (SOM) networks have been successfully applied as a classification tool to various problem domains, including speech recognition, image data compression, image or character recognition, robot control and medical diagnosis, its potential as a robust substitute for clustering analysis remains relatively unresearched. SOM networks combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an a priori grid and provide a powerful tool for data visualization. In this paper, SOM is used for creating a toroidal mapping of two-dimensional lattice to perform cluster analysis on results of a chemical analysis of wines produced in the same region in Italy but derived from three different cultivators, referred to as the “wine recognition data" located in the University of California-Irvine database. The results are encouraging and it is believed that SOM would make an appealing and powerful decision-support system tool for clustering tasks and for data visualization.

Keywords: Artificial neural networks, cluster analysis, Kohonen maps, wine recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2090
13215 Analysis and Comparison of Image Encryption Algorithms

Authors: İsmet Öztürk, İbrahim Soğukpınar

Abstract:

With the fast progression of data exchange in electronic way, information security is becoming more important in data storage and transmission. Because of widely using images in industrial process, it is important to protect the confidential image data from unauthorized access. In this paper, we analyzed current image encryption algorithms and compression is added for two of them (Mirror-like image encryption and Visual Cryptography). Implementations of these two algorithms have been realized for experimental purposes. The results of analysis are given in this paper.

Keywords: image encryption, image cryptosystem, security, transmission

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4911
13214 Using TRACE and SNAP Codes to Establish the Model of Maanshan PWR for SBO Accident

Authors: B. R. Shen, J. R. Wang, J. H. Yang, S. W. Chen, C. Shih, Y. Chiang, Y. F. Chang, Y. H. Huang

Abstract:

In this research, TRACE code with the interface code-SNAP was used to simulate and analyze the SBO (station blackout) accident which occurred in Maanshan PWR (pressurized water reactor) nuclear power plant (NPP). There are four main steps in this research. First, the SBO accident data of Maanshan NPP were collected. Second, the TRACE/SNAP model of Maanshan NPP was established by using these data. Third, this TRACE/SNAP model was used to perform the simulation and analysis of SBO accident. Finally, the simulation and analysis of SBO with mitigation equipments was performed. The analysis results of TRACE are consistent with the data of Maanshan NPP. The mitigation equipments of Maanshan can maintain the safety of Maanshan in the SBO according to the TRACE predictions.

Keywords: PWR, TRACE, SBO, Maanshan.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 703
13213 Plant Varieties Selection System

Authors: Kitti Koonsanit, Chuleerat Jaruskulchai, Poonsak Miphokasap, Apisit Eiumnoh

Abstract:

In the end of the day, meteorological data and environmental data becomes widely used such as plant varieties selection system. Variety plant selection for planted area is of almost importance for all crops, including varieties of sugarcane. Since sugarcane have many varieties. Variety plant non selection for planting may not be adapted to the climate or soil conditions for planted area. Poor growth, bloom drop, poor fruit, and low price are to be from varieties which were not recommended for those planted area. This paper presents plant varieties selection system for planted areas in Thailand from meteorological data and environmental data by the use of decision tree techniques. With this software developed as an environmental data analysis tool, it can analyze resulting easier and faster. Our software is a front end of WEKA that provides fundamental data mining functions such as classify, clustering, and analysis functions. It also supports pre-processing, analysis, and decision tree output with exporting result. After that, our software can export and display data result to Google maps API in order to display result and plot plant icons effectively.

Keywords: Plant varieties selection system, decision tree, expert recommendation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1755
13212 A Quantitative Tool for Analyze Process Design

Authors: Andrés Carrión García, Aura López de Murillo, José Jabaloyes Vivas, Angela Grisales del Río

Abstract:

Some quality control tools use non metric subjective information coming from experts, who qualify the intensity of relations existing inside processes, but without quantifying them. In this paper we have developed a quality control analytic tool, measuring the impact or strength of the relationship between process operations and product characteristics. The tool includes two models: a qualitative model, allowing relationships description and analysis; and a formal quantitative model, by means of which relationship quantification is achieved. In the first one, concepts from the Graphs Theory were applied to identify those process elements which can be sources of variation, that is, those quality characteristics or operations that have some sort of prelacy over the others and that should become control items. Also the most dependent elements can be identified, that is those elements receiving the effects of elements identified as variation sources. If controls are focused in those dependent elements, efficiency of control is compromised by the fact that we are controlling effects, not causes. The second model applied adapts the multivariate statistical technique of Covariance Structural Analysis. This approach allowed us to quantify the relationships. The computer package LISREL was used to obtain statistics and to validate the model.

Keywords: Characteristics matrix, covariance structure analysis, LISREL.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1565
13211 The Comparison of Parental Childrearing Styles and Anxiety in Children with Stuttering and Normal Population

Authors: Pegah Farokhzad

Abstract:

Family has a crucial role in maintaining the physical, social and mental health of the children. Most of the mental and anxiety problems of children reflect the complex interpersonal situations among family members, especially parents. In other words, anxiety problems of the children are correlated with deficit relationships of family members and improper childrearing styles. The parental child rearing styles leads to positive and negative consequences which affect the children’s mental health. Therefore, the present research was aimed to compare the parental childrearing styles and anxiety of children with stuttering and normal population. It was also aimed to study the relationship between parental child rearing styles and anxiety of children. The research sample included 54 boys with stuttering and 54 normal boys who were selected from the children (boys) of Tehran, Iran in the age range of 5 to 8 years in 2013. In order to collect data, Baum-rind Childrearing Styles Inventory and Spence Parental Anxiety Inventory were used. Appropriate descriptive statistical methods and multivariate variance analysis and t test for independent groups were used to test the study hypotheses. Statistical data analyses demonstrated that there was a significant difference between stuttering boys and normal boys in anxiety (t = 7.601, p< 0.01); but there was no significant difference between stuttering boys and normal boys in parental childrearing styles (F = 0.129). There was also not found significant relationship between parental childrearing styles and children anxiety (F = 0.135, p< 0.05). It can be concluded that the influential factors of children’s society are parents, school, teachers, peers and media. So, parental childrearing styles are not the only influential factors on anxiety of children, and other factors including genetic, environment and child experiences are effective in anxiety as well. Details are discussed.

Keywords: Anxiety, Childrearing Styles, Stuttering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3031
13210 A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, M. Sharmila

Abstract:

Over the past epoch a rampant amount of work has been done in the data clustering research under the unsupervised learning technique in Data mining. Furthermore several algorithms and methods have been proposed focusing on clustering different data types, representation of cluster models, and accuracy rates of the clusters. However no single clustering algorithm proves to be the most efficient in providing best results. Accordingly in order to find the solution to this issue a new technique, called Cluster ensemble method was bloomed. This cluster ensemble is a good alternative approach for facing the cluster analysis problem. The main hope of the cluster ensemble is to merge different clustering solutions in such a way to achieve accuracy and to improve the quality of individual data clustering. Due to the substantial and unremitting development of new methods in the sphere of data mining and also the incessant interest in inventing new algorithms, makes obligatory to scrutinize a critical analysis of the existing techniques and the future novelty. This paper exposes the comparative study of different cluster ensemble methods along with their features, systematic working process and the average accuracy and error rates of each ensemble methods. Consequently this speculative and comprehensive analysis will be very useful for the community of clustering practitioners and also helps in deciding the most suitable one to rectify the problem in hand.

Keywords: Clustering, Cluster Ensemble methods, Co-association matrix, Consensus function, Median partition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2565
13209 A Survey of Semantic Integration Approaches in Bioinformatics

Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir

Abstract:

Technological advances of computer science and data analysis are helping to provide continuously huge volumes of biological data, which are available on the web. Such advances involve and require powerful techniques for data integration to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information-integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, we focus on those developed in the ontology community.

Keywords: Semantic data integration, biological ontology, linked data, semantic web, OWL, RDF.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1788
13208 Adaptive Kernel Principal Analysis for Online Feature Extraction

Authors: Mingtao Ding, Zheng Tian, Haixia Xu

Abstract:

The batch nature limits the standard kernel principal component analysis (KPCA) methods in numerous applications, especially for dynamic or large-scale data. In this paper, an efficient adaptive approach is presented for online extraction of the kernel principal components (KPC). The contribution of this paper may be divided into two parts. First, kernel covariance matrix is correctly updated to adapt to the changing characteristics of data. Second, KPC are recursively formulated to overcome the batch nature of standard KPCA.This formulation is derived from the recursive eigen-decomposition of kernel covariance matrix and indicates the KPC variation caused by the new data. The proposed method not only alleviates sub-optimality of the KPCA method for non-stationary data, but also maintains constant update speed and memory usage as the data-size increases. Experiments for simulation data and real applications demonstrate that our approach yields improvements in terms of both computational speed and approximation accuracy.

Keywords: adaptive method, kernel principal component analysis, online extraction, recursive algorithm

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1520
13207 A Comprehensive Review on Different Mixed Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, M. Sharmila

Abstract:

An extensive amount of work has been done in data clustering research under the unsupervised learning technique in Data Mining during the past two decades. Moreover, several approaches and methods have been emerged focusing on clustering diverse data types, features of cluster models and similarity rates of clusters. However, none of the single clustering algorithm exemplifies its best nature in extracting efficient clusters. Consequently, in order to rectify this issue, a new challenging technique called Cluster Ensemble method was bloomed. This new approach tends to be the alternative method for the cluster analysis problem. The main objective of the Cluster Ensemble is to aggregate the diverse clustering solutions in such a way to attain accuracy and also to improve the eminence the individual clustering algorithms. Due to the massive and rapid development of new methods in the globe of data mining, it is highly mandatory to scrutinize a vital analysis of existing techniques and the future novelty. This paper shows the comparative analysis of different cluster ensemble methods along with their methodologies and salient features. Henceforth this unambiguous analysis will be very useful for the society of clustering experts and also helps in deciding the most appropriate one to resolve the problem in hand.

Keywords: Clustering, Cluster Ensemble Methods, Coassociation matrix, Consensus Function, Median Partition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2075
13206 Analysis of Textual Data Based On Multiple 2-Class Classification Models

Authors: Shigeaki Sakurai, Ryohei Orihara

Abstract:

This paper proposes a new method for analyzing textual data. The method deals with items of textual data, where each item is described based on various viewpoints. The method acquires 2- class classification models of the viewpoints by applying an inductive learning method to items with multiple viewpoints. The method infers whether the viewpoints are assigned to the new items or not by using the models. The method extracts expressions from the new items classified into the viewpoints and extracts characteristic expressions corresponding to the viewpoints by comparing the frequency of expressions among the viewpoints. This paper also applies the method to questionnaire data given by guests at a hotel and verifies its effect through numerical experiments.

Keywords: Text mining, Multiple viewpoints, Differential analysis, Questionnaire data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1261
13205 Dynamical Analysis of Circadian Gene Expression

Authors: Carla Layana Luis Diambra

Abstract:

Microarrays technique allows the simultaneous measurements of the expression levels of thousands of mRNAs. By mining this data one can identify the dynamics of the gene expression time series. By recourse of principal component analysis, we uncover the circadian rhythmic patterns underlying the gene expression profiles from Cyanobacterium Synechocystis. We applied PCA to reduce the dimensionality of the data set. Examination of the components also provides insight into the underlying factors measured in the experiments. Our results suggest that all rhythmic content of data can be reduced to three main components.

Keywords: circadian rhythms, clustering, gene expression, PCA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1560
13204 Applying Hybrid Graph Drawing and Clustering Methods on Stock Investment Analysis

Authors: Mouataz Zreika, Maria Estela Varua

Abstract:

Stock investment decisions are often made based on current events of the global economy and the analysis of historical data. Conversely, visual representation could assist investors’ gain deeper understanding and better insight on stock market trends more efficiently. The trend analysis is based on long-term data collection. The study adopts a hybrid method that combines the Clustering algorithm and Force-directed algorithm to overcome the scalability problem when visualizing large data. This method exemplifies the potential relationships between each stock, as well as determining the degree of strength and connectivity, which will provide investors another understanding of the stock relationship for reference. Information derived from visualization will also help them make an informed decision. The results of the experiments show that the proposed method is able to produced visualized data aesthetically by providing clearer views for connectivity and edge weights.

Keywords: Clustering, force-directed, graph drawing, stock investment analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1564
13203 The Performance Analysis of Error Saturation Nonlinearity LMS in Impulsive Noise based on Weighted-Energy Conservation

Authors: T Panigrahi, G Panda, Mulgrew

Abstract:

This paper introduces a new approach for the performance analysis of adaptive filter with error saturation nonlinearity in the presence of impulsive noise. The performance analysis of adaptive filters includes both transient analysis which shows that how fast a filter learns and the steady-state analysis gives how well a filter learns. The recursive expressions for mean-square deviation(MSD) and excess mean-square error(EMSE) are derived based on weighted energy conservation arguments which provide the transient behavior of the adaptive algorithm. The steady-state analysis for co-related input regressor data is analyzed, so this approach leads to a new performance results without restricting the input regression data to be white.

Keywords: Error saturation nonlinearity, transient analysis, impulsive noise.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1746
13202 Risk-Management by Numerical Pattern Analysis in Data-Mining

Authors: M. Kargar, R. Mirmiran, F. Fartash, T. Saderi

Abstract:

In this paper a new method is suggested for risk management by the numerical patterns in data-mining. These patterns are designed using probability rules in decision trees and are cared to be valid, novel, useful and understandable. Considering a set of functions, the system reaches to a good pattern or better objectives. The patterns are analyzed through the produced matrices and some results are pointed out. By using the suggested method the direction of the functionality route in the systems can be controlled and best planning for special objectives be done.

Keywords: Analysis, Data-mining, Pattern, Risk Management.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1241
13201 Numerical Simulations of Flood and Inundation in Jobaru River Basin Using Laser Profiler Data

Authors: Hiroto Nakashima, Toshihiro Morita, Koichiro Ohgushi

Abstract:

Laser Profiler (LP) data from aerial laser surveys have been increasingly used as topographical inputs to numerical simulations of flooding and inundation in river basins. LP data has great potential for reproducing topography, but its effective usage has not yet been fully established. In this study, flooding and inundation are simulated numerically using LP data for the Jobaru River basin of Japan’s Saga Plain. The analysis shows that the topography is reproduced satisfactorily in the computational domain with urban and agricultural areas requiring different grid sizes. A 2-D numerical simulation shows that flood flow behavior changes as grid size is varied.

Keywords: LP data, numerical simulation, topological analysis, mesh size.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1496
13200 Big Brain: A Single Database System for a Federated Data Warehouse Architecture

Authors: X. Gumara Rigol, I. Martínez de Apellaniz Anzuola, A. Garcia Serrano, A. Franzi Cros, O. Vidal Calbet, A. Al Maruf

Abstract:

Traditional federated architectures for data warehousing work well when corporations have existing regional data warehouses and there is a need to aggregate data at a global level. Schibsted Media Group has been maturing from a decentralised organisation into a more globalised one and needed to build both some of the regional data warehouses for some brands at the same time as the global one. In this paper, we present the architectural alternatives studied and why a custom federated approach was the notable recommendation to go further with the implementation. Although the data warehouses are logically federated, the implementation uses a single database system which presented many advantages like: cost reduction and improved data access to global users allowing consumers of the data to have a common data model for detailed analysis across different geographies and a flexible layer for local specific needs in the same place.

Keywords: Data integration, data warehousing, federated architecture, online analytical processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 671
13199 dynr.mi: An R Program for Multiple Imputation in Dynamic Modeling

Authors: Yanling Li, Linying Ji, Zita Oravecz, Timothy R. Brick, Michael D. Hunter, Sy-Miin Chow

Abstract:

Assessing several individuals intensively over time yields intensive longitudinal data (ILD). Even though ILD provide rich information, they also bring other data analytic challenges. One of these is the increased occurrence of missingness with increased study length, possibly under non-ignorable missingness scenarios. Multiple imputation (MI) handles missing data by creating several imputed data sets, and pooling the estimation results across imputed data sets to yield final estimates for inferential purposes. In this article, we introduce dynr.mi(), a function in the R package, Dynamic Modeling in R (dynr). The package dynr provides a suite of fast and accessible functions for estimating and visualizing the results from fitting linear and nonlinear dynamic systems models in discrete as well as continuous time. By integrating the estimation functions in dynr and the MI procedures available from the R package, Multivariate Imputation by Chained Equations (MICE), the dynr.mi() routine is designed to handle possibly non-ignorable missingness in the dependent variables and/or covariates in a user-specified dynamic systems model via MI, with convergence diagnostic check. We utilized dynr.mi() to examine, in the context of a vector autoregressive model, the relationships among individuals’ ambulatory physiological measures, and self-report affect valence and arousal. The results from MI were compared to those from listwise deletion of entries with missingness in the covariates. When we determined the number of iterations based on the convergence diagnostics available from dynr.mi(), differences in the statistical significance of the covariate parameters were observed between the listwise deletion and MI approaches. These results underscore the importance of considering diagnostic information in the implementation of MI procedures.

Keywords: Dynamic modeling, missing data, multiple imputation, physiological measures.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 768
13198 Measured versus Default Interstate Traffic Data in New Mexico, USA

Authors: M. A. Hasan, M. R. Islam, R. A. Tarefder

Abstract:

This study investigates how the site specific traffic data differs from the Mechanistic Empirical Pavement Design Software default values. Two Weigh-in-Motion (WIM) stations were installed in Interstate-40 (I-40) and Interstate-25 (I-25) to developed site specific data. A computer program named WIM Data Analysis Software (WIMDAS) was developed using Microsoft C-Sharp (.Net) for quality checking and processing of raw WIM data. A complete year data from November 2013 to October 2014 was analyzed using the developed WIM Data Analysis Program. After that, the vehicle class distribution, directional distribution, lane distribution, monthly adjustment factor, hourly distribution, axle load spectra, average number of axle per vehicle, axle spacing, lateral wander distribution, and wheelbase distribution were calculated. Then a comparative study was done between measured data and AASHTOWare default values. It was found that the measured general traffic inputs for I-40 and I-25 significantly differ from the default values.

Keywords: AASHTOWare, Traffic, Weigh-in-Motion, Axle load Distribution.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1654
13197 Improving the Analytical Power of Dynamic DEA Models, by the Consideration of the Shape of the Distribution of Inputs/Outputs Data: A Linear Piecewise Decomposition Approach

Authors: Elias K. Maragos, Petros E. Maravelakis

Abstract:

In Dynamic Data Envelopment Analysis (DDEA), which is a subfield of Data Envelopment Analysis (DEA), the productivity of Decision Making Units (DMUs) is considered in relation to time. In this case, as it is accepted by the most of the researchers, there are outputs, which are produced by a DMU to be used as inputs in a future time. Those outputs are known as intermediates. The common models, in DDEA, do not take into account the shape of the distribution of those inputs, outputs or intermediates data, assuming that the distribution of the virtual value of them does not deviate from linearity. This weakness causes the limitation of the accuracy of the analytical power of the traditional DDEA models. In this paper, the authors, using the concept of piecewise linear inputs and outputs, propose an extended DDEA model. The proposed model increases the flexibility of the traditional DDEA models and improves the measurement of the dynamic performance of DMUs.

Keywords: Data envelopment analysis, Dynamic DEA, Piecewise linear inputs, Piecewise linear outputs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 624
13196 Microscopic Emission and Fuel Consumption Modeling for Light-duty Vehicles Using Portable Emission Measurement System Data

Authors: Wei Lei, Hui Chen, Lin Lu

Abstract:

Microscopic emission and fuel consumption models have been widely recognized as an effective method to quantify real traffic emission and energy consumption when they are applied with microscopic traffic simulation models. This paper presents a framework for developing the Microscopic Emission (HC, CO, NOx, and CO2) and Fuel consumption (MEF) models for light-duty vehicles. The variable of composite acceleration is introduced into the MEF model with the purpose of capturing the effects of historical accelerations interacting with current speed on emission and fuel consumption. The MEF model is calibrated by multivariate least-squares method for two types of light-duty vehicle using on-board data collected in Beijing, China by a Portable Emission Measurement System (PEMS). The instantaneous validation results shows the MEF model performs better with lower Mean Absolute Percentage Error (MAPE) compared to other two models. Moreover, the aggregate validation results tells the MEF model produces reasonable estimations compared to actual measurements with prediction errors within 12%, 10%, 19%, and 9% for HC, CO, NOx emissions and fuel consumption, respectively.

Keywords: Emission, Fuel consumption, Light-duty vehicle, Microscopic, Modeling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1954