Search results for: Data transformation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7764

Search results for: Data transformation

7434 Simulation of Sample Paths of Non Gaussian Stationary Random Fields

Authors: Fabrice Poirion, Benedicte Puig

Abstract:

Mathematical justifications are given for a simulation technique of multivariate nonGaussian random processes and fields based on Rosenblatt-s transformation of Gaussian processes. Different types of convergences are given for the approaching sequence. Moreover an original numerical method is proposed in order to solve the functional equation yielding the underlying Gaussian process autocorrelation function.

Keywords: Simulation, nonGaussian, random field, multivariate, stochastic process.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1784
7433 Peakwise Smoothing of Data Models using Wavelets

Authors: D Sudheer Reddy, N Gopal Reddy, P V Radhadevi, J Saibaba, Geeta Varadan

Abstract:

Smoothing or filtering of data is first preprocessing step for noise suppression in many applications involving data analysis. Moving average is the most popular method of smoothing the data, generalization of this led to the development of Savitzky-Golay filter. Many window smoothing methods were developed by convolving the data with different window functions for different applications; most widely used window functions are Gaussian or Kaiser. Function approximation of the data by polynomial regression or Fourier expansion or wavelet expansion also gives a smoothed data. Wavelets also smooth the data to great extent by thresholding the wavelet coefficients. Almost all smoothing methods destroys the peaks and flatten them when the support of the window is increased. In certain applications it is desirable to retain peaks while smoothing the data as much as possible. In this paper we present a methodology called as peak-wise smoothing that will smooth the data to any desired level without losing the major peak features.

Keywords: smoothing, moving average, peakwise smoothing, spatialdensity models, planar shape models, wavelets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1708
7432 A New Precautionary Method for Measurement and Improvement the Data Quality

Authors: Seyed Mohammad Hossein Moossavizadeh, Mehran Mohsenzadeh, Nasrin Arshadi

Abstract:

the data quality is a kind of complex and unstructured concept, which is concerned by information systems managers. The reason of this attention is the high amount of Expenses for maintenance and cleaning of the inefficient data. Such a data more than its expenses of lack of quality, cause wrong statistics, analysis and decisions in organizations. Therefor the managers intend to improve the quality of their information systems' data. One of the basic subjects of quality improvement is the evaluation of the amount of it. In this paper, we present a precautionary method, which with its application the data of information systems would have a better quality. Our method would cover different dimensions of data quality; therefor it has necessary integrity. The presented method has tested on three dimensions of accuracy, value-added and believability and the results confirm the improvement and integrity of this method.

Keywords: Data quality, precaution, information system, measurement, improvement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1428
7431 An Efficient Data Mining Approach on Compressed Transactions

Authors: Jia-Yu Dai, Don-Lin Yang, Jungpin Wu, Ming-Chuan Hung

Abstract:

In an era of knowledge explosion, the growth of data increases rapidly day by day. Since data storage is a limited resource, how to reduce the data space in the process becomes a challenge issue. Data compression provides a good solution which can lower the required space. Data mining has many useful applications in recent years because it can help users discover interesting knowledge in large databases. However, existing compression algorithms are not appropriate for data mining. In [1, 2], two different approaches were proposed to compress databases and then perform the data mining process. However, they all lack the ability to decompress the data to their original state and improve the data mining performance. In this research a new approach called Mining Merged Transactions with the Quantification Table (M2TQT) was proposed to solve these problems. M2TQT uses the relationship of transactions to merge related transactions and builds a quantification table to prune the candidate itemsets which are impossible to become frequent in order to improve the performance of mining association rules. The experiments show that M2TQT performs better than existing approaches.

Keywords: Association rule, data mining, merged transaction, quantification table.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1923
7430 Weigh-in-Motion Data Analysis Software for Developing Traffic Data for Mechanistic Empirical Pavement Design

Authors: M. A. Hasan, M. R. Islam, R. A. Tarefder

Abstract:

Currently, there are few user friendly Weigh-in- Motion (WIM) data analysis softwares available which can produce traffic input data for the recently developed AASHTOWare pavement Mechanistic-Empirical (ME) design software. However, these softwares have only rudimentary Quality Control (QC) processes. Therefore, they cannot properly deal with erroneous WIM data. As the pavement performance is highly sensible to the quality of WIM data, it is highly recommended to use more refined QC process on raw WIM data to get a good result. This study develops a userfriendly software, which can produce traffic input for the ME design software. This software takes the raw data (Class and Weight data) collected from the WIM station and processes it with a sophisticated QC procedure. Traffic data such as traffic volume, traffic distribution, axle load spectra, etc. can be obtained from this software; which can directly be used in the ME design software.

Keywords: Weigh-in-motion, software, axle load spectra, traffic distribution, AASHTOWare.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1849
7429 An Expansion Method for Schrödinger Equation of Quantum Billiards with Arbitrary Shapes

Authors: İnci M. Erhan

Abstract:

A numerical method for solving the time-independent Schrödinger equation of a particle moving freely in a three-dimensional axisymmetric region is developed. The boundary of the region is defined by an arbitrary analytic function. The method uses a coordinate transformation and an expansion in eigenfunctions. The effectiveness is checked and confirmed by applying the method to a particular example, which is a prolate spheroid.

Keywords: Bessel functions, Eigenfunction expansion, Quantum billiard, Schrödinger equation, Spherical harmonics

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5163
7428 Human Growth Curve Estimation through a Combination of Longitudinal and Cross-sectional Data

Authors: Sedigheh Mirzaei S., Debasis Sengupta

Abstract:

Parametric models have been quite popular for studying human growth, particularly in relation to biological parameters such as peak size velocity and age at peak size velocity. Longitudinal data are generally considered to be vital for fittinga parametric model to individual-specific data, and for studying the distribution of these biological parameters in a human population. However, cross-sectional data are easier to obtain than longitudinal data. In this paper, we present a method of combining longitudinal and cross-sectional data for the purpose of estimating the distribution of the biological parameters. We demonstrate, through simulations in the special case ofthePreece Baines model, how estimates based on longitudinal data can be improved upon by harnessing the information contained in cross-sectional data.We study the extent of improvement for different mixes of the two types of data, and finally illustrate the use of the method through data collected by the Indian Statistical Institute.

Keywords: Preece-Baines growth model, MCMC method, Mixed effect model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2092
7427 Semantic Support for Hypothesis-Based Research from Smart Environment Monitoring and Analysis Technologies

Authors: T. S. Myers, J. Trevathan

Abstract:

Improvements in the data fusion and data analysis phase of research are imperative due to the exponential growth of sensed data. Currently, there are developments in the Semantic Sensor Web community to explore efficient methods for reuse, correlation and integration of web-based data sets and live data streams. This paper describes the integration of remotely sensed data with web-available static data for use in observational hypothesis testing and the analysis phase of research. The Semantic Reef system combines semantic technologies (e.g., well-defined ontologies and logic systems) with scientific workflows to enable hypothesis-based research. A framework is presented for how the data fusion concepts from the Semantic Reef architecture map to the Smart Environment Monitoring and Analysis Technologies (SEMAT) intelligent sensor network initiative. The data collected via SEMAT and the inferred knowledge from the Semantic Reef system are ingested to the Tropical Data Hub for data discovery, reuse, curation and publication.

Keywords: Information architecture, Semantic technologies Sensor networks, Ontologies.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1675
7426 Data Migration between Document-Oriented and Relational Databases

Authors: Bogdan Walek, Cyril Klimes

Abstract:

Current tools for data migration between documentoriented and relational databases have several disadvantages. We propose a new approach for data migration between documentoriented and relational databases. During data migration the relational schema of the target (relational database) is automatically created from collection of XML documents. Proposed approach is verified on data migration between document-oriented database IBM Lotus/ Notes Domino and relational database implemented in relational database management system (RDBMS) MySQL.

Keywords: data migration, database, document-oriented database, XML, relational schema

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3472
7425 Identity Verification Using k-NN Classifiers and Autistic Genetic Data

Authors: Fuad M. Alkoot

Abstract:

DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN). 

Keywords: Biometrics, identity verification, genetic data, k-nearest neighbor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1072
7424 Simulating the Dynamics of Distribution of Hazardous Substances Emitted by Motor Engines in a Residential Quarter

Authors: S. Grishin

Abstract:

This article is dedicated to development of mathematical models for determining the dynamics of concentration of hazardous substances in urban turbulent atmosphere. Development of the mathematical models implied taking into account the time-space variability of the fields of meteorological items and such turbulent atmosphere data as vortex nature, nonlinear nature, dissipativity and diffusivity. Knowing the turbulent airflow velocity is not assumed when developing the model. However, a simplified model implies that the turbulent and molecular diffusion ratio is a piecewise constant function that changes depending on vertical distance from the earth surface. Thereby an important assumption of vertical stratification of urban air due to atmospheric accumulation of hazardous substances emitted by motor vehicles is introduced into the mathematical model. The suggested simplified non-linear mathematical model of determining the sought exhaust concentration at a priori unknown turbulent flow velocity through non-degenerate transformation is reduced to the model which is subsequently solved analytically.

Keywords: Urban ecology, time-dependent mathematical model, exhaust concentration, turbulent and molecular diffusion, airflow velocity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1374
7423 Organizational Involvement and Employees’ Consumption of New Work Practices in State-owned Enterprises: The Ghanaian Case

Authors: M. Aminu Sanda, K. Ewontumah

Abstract:

This paper explored the challenges faced by the management of a Ghanaian state enterprise in managing conflicts and disturbances associated with its attempt to implement new work practices to enhance its capability to operate as a commercial entity. The purpose was to understand the extent to which organizational involvement, consistency and adaptability influence employees’ consumption of new work practices in transforming the organization’s organizational activity system. Using selfadministered questionnaires, data were collected from one hundred and eighty (180) employees and analyzed using both descriptive and inferential statistics. The results showed that constraints in organizational involvement and adaptability prevented the positive consumption of new work practices by employees in the organization. It is also found that the organization’s employees failed to consume the new practices being implemented, because they perceived the process as non-involving, and as such, did not encourage the development of employee capability, empowerment, and teamwork. The study concluded that the failure of the organization’s management to create opportunities for organizational learning constrained its ability to get employees consume the new work practices, which situation could have facilitated the organization’s capabilities of operating as a commercial entity.

Keywords: Organizational transformation, new work practices, work practice consumption, organizational involvement, state-owned enterprise, Ghana.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1534
7422 Power Saving System in Green Data Center

Authors: Joon-young Jung, Dong-oh Kang, Chang-seok Bae

Abstract:

Power consumption is rapidly increased in data centers because the number of data center is increased and more the scale of data center become larger. Therefore, it is one of key research items to reduce power consumption in data center. The peak power of a typical server is around 250 watts. When a server is idle, it continues to use around 60% of the power consumed when in use, though vendors are putting effort into reducing this “idle" power load. Servers tend to work at only around a 5% to 20% utilization rate, partly because of response time concerns. An average of 10% of servers in their data centers was unused. In those reason, we propose dynamic power management system to reduce power consumption in green data center. Experiment result shows that about 55% power consumption is reduced at idle time.

Keywords: Data Center, Green IT, Management Server, Power Saving.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1583
7421 Spatial Econometric Approaches for Count Data: An Overview and New Directions

Authors: Paula Simões, Isabel Natário

Abstract:

This paper reviews a number of theoretical aspects for implementing an explicit spatial perspective in econometrics for modelling non-continuous data, in general, and count data, in particular. It provides an overview of the several spatial econometric approaches that are available to model data that are collected with reference to location in space, from the classical spatial econometrics approaches to the recent developments on spatial econometrics to model count data, in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework, necessary for structural consistent spatial econometric count models, incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures for different assumptions, to the constrains and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature as well as from hierarchical modeling and analysis of spatial data, in order to look for new possible directions on the processing of count data, in a spatial hierarchical Bayesian econometric context.

Keywords: Spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2636
7420 Territorial Availability of Social and Economic Infrastructure in Kazakhstan: Comparative Analysis of Urban and Rural Households

Authors: Nazym Shedenova, Aigul Beimisheva

Abstract:

The market transformation in Kazakhstan during the last two decades has essentially strengthened a gap between development of urban and rural areas. Implementation of market institutes, transition from public financing to paid rendering of social services, change of forms of financing of social and economic infrastructure have led to strengthening of an economic inequality of social groups, including growth of stratification of the city and the village. Sociological survey of urban and rural households in Almaty city and villages of Almaty region has been carried out within the international research project “Livelihoods Strategies of Private Households in Central Asia: A Rural–Urban Comparison in Kazakhstan and Kyrgyzstan" (Germany, Kazakhstan, Kyrgyzstan). The analysis of statistical data and results of sociological research of urban and rural households allows us to reveal issues of territorial development, to investigate an availability of medical, educational and other services in the city and the village, to reveal an evaluation urban and rural dwellers of living conditions, to compare economic strategies of households in the city and the village.

Keywords: Urban and rural households, social and economic infrastructure, territorial availability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2117
7419 MATLAB-Based Graphical User Interface (GUI) for Data Mining as a Tool for Environment Management

Authors: M. Awawdeh, A. Fedi

Abstract:

The application of data mining to environmental monitoring has become crucial for a number of tasks related to emergency management. Over recent years, many tools have been developed for decision support system (DSS) for emergency management. In this article a graphical user interface (GUI) for environmental monitoring system is presented. This interface allows accomplishing (i) data collection and observation and (ii) extraction for data mining. This tool may be the basis for future development along the line of the open source software paradigm.

Keywords: Data Mining, Environmental data, Mathematical Models, Matlab Graphical User Interface.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4693
7418 Storing OWL Ontologies in SQL Relational Databases

Authors: Irina Astrova, Nahum Korda, Ahto Kalja

Abstract:

Relational databases are often used as a basis for persistent storage of ontologies to facilitate rapid operations such as search and retrieval, and to utilize the benefits of relational databases management systems such as transaction management, security and integrity control. On the other hand, there appear more and more OWL files that contain ontologies. Therefore, this paper proposes to extract ontologies from OWL files and then store them in relational databases. A prerequisite for this storing is transformation of ontologies to relational databases, which is the purpose of this paper.

Keywords: Ontologies, relational databases, SQL, and OWL.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5170
7417 Tsunami Inundation Modeling in a Boundary Fitted Curvilinear Grid Model Using the Method of Lines Technique

Authors: M. Ashaque Meah, M. Shah Noor, M Asif Arefin, Md. Fazlul Karim

Abstract:

A numerical technique in a boundary-fitted curvilinear grid model is developed to simulate the extent of inland inundation along the coastal belts of Peninsular Malaysia and Southern Thailand due to 2004 Indian ocean tsunami. Tsunami propagation and run-up are also studied in this paper. The vertically integrated shallow water equations are solved by using the method of lines (MOL). For this purpose the boundary-fitted grids are generated along the coastal and island boundaries and the other open boundaries of the model domain. A transformation is used to the governing equations so that the transformed physical domain is converted into a rectangular one. The MOL technique is applied to the transformed shallow water equations and the boundary conditions so that the equations are converted into ordinary differential equations initial value problem. Finally the 4th order Runge-Kutta method is used to solve these ordinary differential equations. The moving boundary technique is applied instead of fixed sea side wall or fixed coastal boundary to ensure the movement of the coastal boundary. The extent of intrusion of water and associated tsunami propagation are simulated for the 2004 Indian Ocean tsunami along the west coast of Peninsular Malaysia and southern Thailand. The simulated results are compared with the results obtained from a finite difference model and the data available in the USGS website. All simulations show better approximation than earlier research and also show excellent agreement with the observed data.

Keywords: Open boundary condition, moving boundary condition, boundary-fitted curvilinear grids, far field tsunami, Shallow Water Equations, tsunami source, Indonesian tsunami of 2004.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 815
7416 Principal Component Analysis using Singular Value Decomposition of Microarray Data

Authors: Dong Hoon Lim

Abstract:

A series of microarray experiments produces observations of differential expression for thousands of genes across multiple conditions. Principal component analysis(PCA) has been widely used in multivariate data analysis to reduce the dimensionality of the data in order to simplify subsequent analysis and allow for summarization of the data in a parsimonious manner. PCA, which can be implemented via a singular value decomposition(SVD), is useful for analysis of microarray data. For application of PCA using SVD we use the DNA microarray data for the small round blue cell tumors(SRBCT) of childhood by Khan et al.(2001). To decide the number of components which account for sufficient amount of information we draw scree plot. Biplot, a graphic display associated with PCA, reveals important features that exhibit relationship between variables and also the relationship of variables with observations.

Keywords: Principal component analysis, singular value decomposition, microarray data, SRBCT

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3210
7415 Clustering Mixed Data Using Non-normal Regression Tree for Process Monitoring

Authors: Youngji Yoo, Cheong-Sool Park, Jun Seok Kim, Young-Hak Lee, Sung-Shick Kim, Jun-Geol Baek

Abstract:

In the semiconductor manufacturing process, large amounts of data are collected from various sensors of multiple facilities. The collected data from sensors have several different characteristics due to variables such as types of products, former processes and recipes. In general, Statistical Quality Control (SQC) methods assume the normality of the data to detect out-of-control states of processes. Although the collected data have different characteristics, using the data as inputs of SQC will increase variations of data, require wide control limits, and decrease performance to detect outof- control. Therefore, it is necessary to separate similar data groups from mixed data for more accurate process control. In the paper, we propose a regression tree using split algorithm based on Pearson distribution to handle non-normal distribution in parametric method. The regression tree finds similar properties of data from different variables. The experiments using real semiconductor manufacturing process data show improved performance in fault detecting ability.

Keywords: Semiconductor, non-normal mixed process data, clustering, Statistical Quality Control (SQC), regression tree, Pearson distribution system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1732
7414 Speech Data Compression using Vector Quantization

Authors: H. B. Kekre, Tanuja K. Sarode

Abstract:

Mostly transforms are used for speech data compressions which are lossy algorithms. Such algorithms are tolerable for speech data compression since the loss in quality is not perceived by the human ear. However the vector quantization (VQ) has a potential to give more data compression maintaining the same quality. In this paper we propose speech data compression algorithm using vector quantization technique. We have used VQ algorithms LBG, KPE and FCG. The results table shows computational complexity of these three algorithms. Here we have introduced a new performance parameter Average Fractional Change in Speech Sample (AFCSS). Our FCG algorithm gives far better performance considering mean absolute error, AFCSS and complexity as compared to others.

Keywords: Vector Quantization, Data Compression, Encoding, , Speech coding.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2356
7413 Ontology and CDSS Based Intelligent Health Data Management in Health Care Server

Authors: Eun-Jung Ko, Hyung-Jik Lee, Jeun-Woo Lee

Abstract:

In ubiqutious healthcare environment, user's health data are transfered to the remote healthcare server by the user's wearable system or mobile phone. These collected user's health data should be managed and analyzed in the healthcare server, so that care giver or user can monitor user's physiological state. In this paper, we designed and developed the intelligent Healthcare Server to manage the user's health data using CDSS and ontology. Our system can analyze user's health data semantically using CDSS and ontology, and report the result of user's physiological raw data to the user and care giver.

Keywords: u-healthcare, CDSS, healthcare server, health data, ontology.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2196
7412 A Genetic Algorithm for Clustering on Image Data

Authors: Qin Ding, Jim Gasvoda

Abstract:

Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP Hard. Genetic algorithms have been used in a wide variety of fields to perform clustering, however, the technique normally has a long running time in terms of input set size. This paper proposes an efficient genetic algorithm for clustering on very large data sets, especially on image data sets. The genetic algorithm uses the most time efficient techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real image data sets, both of which are of large size. The experimental results show that our algorithm outperforms the k-means algorithm in terms of running time as well as the quality of the clustering.

Keywords: Clustering, data mining, genetic algorithm, image data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1999
7411 A Holistic Framework for Unifying Data Security and Management in Modern Enterprises

Authors: Ashly Joseph

Abstract:

Modern businesses struggle significantly to secure and manage their data properly as the volume and complexity of their data both expand exponentially. Through the use of a multi-layered defense strategy, a centralized management platform, and cutting-edge technologies like AI, this research paper presents a comprehensive framework to integrate data security and management. The constraints of current data protection and management strategies, technological advancements, and the evolving threat landscape are all examined in this article. It suggests best practices for putting into practice integrated data security and governance models, placing an emphasis on ongoing adaptation. The advantages mentioned include a strengthened security posture, simpler procedures, lower costs, and reduced complexity. Additionally, issues including skill shortages, antiquated systems, and cultural obstacles are examined. Security executives and Chief Information Security Officers are given practical advice on how to evaluate, plan, and put into place strong data-centric security and management capabilities. The goal of the paper is to provide a thorough study of the data security and management landscape and to arm contemporary businesses with the knowledge they need to be proactive in protecting their data assets.

Keywords: Data security, security management, cloud computing, cybersecurity, data governance, security architecture, data management.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 167
7410 Post Mining- Discovering Valid Rules from Different Sized Data Sources

Authors: R. Nedunchezhian, K. Anbumani

Abstract:

A big organization may have multiple branches spread across different locations. Processing of data from these branches becomes a huge task when innumerable transactions take place. Also, branches may be reluctant to forward their data for centralized processing but are ready to pass their association rules. Local mining may also generate a large amount of rules. Further, it is not practically possible for all local data sources to be of the same size. A model is proposed for discovering valid rules from different sized data sources where the valid rules are high weighted rules. These rules can be obtained from the high frequency rules generated from each of the data sources. A data source selection procedure is considered in order to efficiently synthesize rules. Support Equalization is another method proposed which focuses on eliminating low frequency rules at the local sites itself thus reducing the rules by a significant amount.

Keywords: Association rules, multiple data stores, synthesizing, valid rules.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1363
7409 RFID-ready Master Data Management for Reverse Logistics

Authors: Jincheol Han, Hyunsun Ju, Jonghoon Chun

Abstract:

Sharing consistent and correct master data among disparate applications in a reverse-logistics chain has long been recognized as an intricate problem. Although a master data management (MDM) system can surely assume that responsibility, applications that need to co-operate with it must comply with proprietary query interfaces provided by the specific MDM system. In this paper, we present a RFID-ready MDM system which makes master data readily available for any participating applications in a reverse-logistics chain. We propose a RFID-wrapper as a part of our MDM. It acts as a gateway between any data retrieval request and query interfaces that process it. With the RFID-wrapper, any participating applications in a reverse-logistics chain can easily retrieve master data in a way that is analogous to retrieval of any other RFID-based logistics transactional data.

Keywords: Reverse Logistics, Master Data Management, RFID.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1927
7408 Dynamic Models versus Frailty Models for Recurrent Event Data

Authors: Entisar A. Elgmati

Abstract:

Recurrent event data is a special type of multivariate survival data. Dynamic and frailty models are one of the approaches that dealt with this kind of data. A comparison between these two models is studied using the empirical standard deviation of the standardized martingale residual processes as a way of assessing the fit of the two models based on the Aalen additive regression model. Here we found both approaches took heterogeneity into account and produce residual standard deviations close to each other both in the simulation study and in the real data set.

Keywords: Dynamic, frailty, misspecification, recurrent events.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2316
7407 Issues and Architecture for Supporting Data Warehouse Queries in Web Portals

Authors: Minsoo Lee, Yoon-kyung Lee, Hyejung Yoon, Soo-kyung Song, Sujeong Cheong

Abstract:

Data Warehousing tools have become very popular and currently many of them have moved to Web-based user interfaces to make it easier to access and use the tools. The next step is to enable these tools to be used within a portal framework. The portal framework consists of pages having several small windows that contain individual data warehouse query results. There are several issues that need to be considered when designing the architecture for a portal enabled data warehouse query tool. Some issues need special techniques that can overcome the limitations that are imposed by the nature of data warehouse queries. Issues such as single sign-on, query result caching and sharing, customization, scheduling and authorization need to be considered. This paper discusses such issues and suggests an architecture to support data warehouse queries within Web portal frameworks.

Keywords: Data Warehousing tools, data warehousing queries, web portal frameworks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2078
7406 Determinants of the Income of Household Level Coir Yarn Labourers in Sri Lanka

Authors: G. H. B. Dilhari, A. A. D. T. Saparamadu

Abstract:

Sri Lanka is one of the prominent countries for the coir production. The coir is one of the by-products of the coconut and the coir industry is considered to be one of the traditional industries in Sri Lanka. Because of the inherent nature of the coir industry, labourers play a significant role in the coir production process. The study has analyzed the determinants of the income of the household level coir yarn labourers. The study was conducted in the Kumarakanda Grama Niladhari division. Simple random sampling was used to generate a sample of 100 household level coir yarn labourers and structured questionnaire, personal interviews, and discussion were performed to gather the required data. The obtained data were statistically analyzed by using Statistical Package for Social Science (SPSS) software. Mann-Whitney U and Kruskal-Wallis test were performed for mean comparison. The findings revealed that the household level coir yarn industry is dominated by the female workers and it was identified that fewer numbers of workers have engaged in this industry as the main occupation. In addition to that, elderly participation in the industry is higher than the younger participation and most of them have engaged in the industry as a source of extra income. Level of education, the methods of engagement, satisfaction, engagement in the industry by the next generation, support from the government, method of government support, working hours per day, employed as a main job, number of completed units per day, suffering from job related diseases and type of the diseases were related with income level of household level coir yarn laboures. The recommendations as to flourish in future includes, technological transformation for coir yarn production, strengthening the raw material base and regulating the raw material supply, introduction of new technologies, markets and training programmes, the establishment of the labourers’ association, the initiation of micro credit schemes and better consideration about the job oriented diseases.

Keywords: Coir, Income, Sri Lanka.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1480
7405 Data Mining Using Learning Automata

Authors: M. R. Aghaebrahimi, S. H. Zahiri, M. Amiri

Abstract:

In this paper a data miner based on the learning automata is proposed and is called LA-miner. The LA-miner extracts classification rules from data sets automatically. The proposed algorithm is established based on the function optimization using learning automata. The experimental results on three benchmarks indicate that the performance of the proposed LA-miner is comparable with (sometimes better than) the Ant-miner (a data miner algorithm based on the Ant Colony optimization algorithm) and CNZ (a well-known data mining algorithm for classification).

Keywords: Data mining, Learning automata, Classification rules, Knowledge discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1898