Search results for: data annotation models
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9044

Search results for: data annotation models

9014 Statistical Analysis for Overdispersed Medical Count Data

Authors: Y. N. Phang, E. F. Loh

Abstract:

Many researchers have suggested the use of zero inflated Poisson (ZIP) and zero inflated negative binomial (ZINB) models in modeling overdispersed medical count data with extra variations caused by extra zeros and unobserved heterogeneity. The studies indicate that ZIP and ZINB always provide better fit than using the normal Poisson and negative binomial models in modeling overdispersed medical count data. In this study, we proposed the use of Zero Inflated Inverse Trinomial (ZIIT), Zero Inflated Poisson Inverse Gaussian (ZIPIG) and zero inflated strict arcsine models in modeling overdispered medical count data. These proposed models are not widely used by many researchers especially in the medical field. The results show that these three suggested models can serve as alternative models in modeling overdispersed medical count data. This is supported by the application of these suggested models to a real life medical data set. Inverse trinomial, Poisson inverse Gaussian and strict arcsine are discrete distributions with cubic variance function of mean. Therefore, ZIIT, ZIPIG and ZISA are able to accommodate data with excess zeros and very heavy tailed. They are recommended to be used in modeling overdispersed medical count data when ZIP and ZINB are inadequate.

Keywords: Zero inflated, inverse trinomial distribution, Poisson inverse Gaussian distribution, strict arcsine distribution, Pearson’s goodness of fit.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3272
9013 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

Authors: Nieto Bernal Wilson, Carmona Suarez Edgar

Abstract:

The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects.  Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.

Keywords: Data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1449
9012 Comparison of Different k-NN Models for Speed Prediction in an Urban Traffic Network

Authors: Seyoung Kim, Jeongmin Kim, Kwang Ryel Ryu

Abstract:

A database that records average traffic speeds measured at five-minute intervals for all the links in the traffic network of a metropolitan city. While learning from this data the models that can predict future traffic speed would be beneficial for the applications such as the car navigation system, building predictive models for every link becomes a nontrivial job if the number of links in a given network is huge. An advantage of adopting k-nearest neighbor (k-NN) as predictive models is that it does not require any explicit model building. Instead, k-NN takes a long time to make a prediction because it needs to search for the k-nearest neighbors in the database at prediction time. In this paper, we investigate how much we can speed up k-NN in making traffic speed predictions by reducing the amount of data to be searched for without a significant sacrifice of prediction accuracy. The rationale behind this is that we had a better look at only the recent data because the traffic patterns not only repeat daily or weekly but also change over time. In our experiments, we build several different k-NN models employing different sets of features which are the current and past traffic speeds of the target link and the neighbor links in its up/down-stream. The performances of these models are compared by measuring the average prediction accuracy and the average time taken to make a prediction using various amounts of data.

Keywords: Big data, k-NN, machine learning, traffic speed prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1341
9011 An Experiment on Personal Archiving and Retrieving Image System (PARIS)

Authors: Pei-Jeng Kuo, Terumasa Aoki, Hiroshi Yasuda

Abstract:

PARIS (Personal Archiving and Retrieving Image System) is an experiment personal photograph library, which includes more than 80,000 of consumer photographs accumulated within a duration of approximately five years, metadata based on our proposed MPEG-7 annotation architecture, Dozen Dimensional Digital Content (DDDC), and a relational database structure. The DDDC architecture is specially designed for facilitating the managing, browsing and retrieving of personal digital photograph collections. In annotating process, we also utilize a proposed Spatial and Temporal Ontology (STO) designed based on the general characteristic of personal photograph collections. This paper explains PRAIS system.

Keywords: Ontology, Databases and Information Retrieval, MPEG-7, Spatial-Temporal, Digital Library Designs l, metadata, Semantic Web, semi-automatic annotation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1091
9010 Image Ranking to Assist Object Labeling for Training Detection Models

Authors: Tonislav Ivanov, Oleksii Nedashkivskyi, Denis Babeshko, Vadim Pinskiy, Matthew Putman

Abstract:

Training a machine learning model for object detection that generalizes well is known to benefit from a training dataset with diverse examples. However, training datasets usually contain many repeats of common examples of a class and lack rarely seen examples. This is due to the process commonly used during human annotation where a person would proceed sequentially through a list of images labeling a sufficiently high total number of examples. Instead, the method presented involves an active process where, after the initial labeling of several images is completed, the next subset of images for labeling is selected by an algorithm. This process of algorithmic image selection and manual labeling continues in an iterative fashion. The algorithm used for the image selection is a deep learning algorithm, based on the U-shaped architecture, which quantifies the presence of unseen data in each image in order to find images that contain the most novel examples. Moreover, the location of the unseen data in each image is highlighted, aiding the labeler in spotting these examples. Experiments performed using semiconductor wafer data show that labeling a subset of the data, curated by this algorithm, resulted in a model with a better performance than a model produced from sequentially labeling the same amount of data. Also, similar performance is achieved compared to a model trained on exhaustive labeling of the whole dataset. Overall, the proposed approach results in a dataset that has a diverse set of examples per class as well as more balanced classes, which proves beneficial when training a deep learning model.

Keywords: Computer vision, deep learning, object detection, semiconductor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 762
9009 Use of Bayesian Network in Information Extraction from Unstructured Data Sources

Authors: Quratulain N. Rajput, Sajjad Haider

Abstract:

This paper applies Bayesian Networks to support information extraction from unstructured, ungrammatical, and incoherent data sources for semantic annotation. A tool has been developed that combines ontologies, machine learning, and information extraction and probabilistic reasoning techniques to support the extraction process. Data acquisition is performed with the aid of knowledge specified in the form of ontology. Due to the variable size of information available on different data sources, it is often the case that the extracted data contains missing values for certain variables of interest. It is desirable in such situations to predict the missing values. The methodology, presented in this paper, first learns a Bayesian network from the training data and then uses it to predict missing data and to resolve conflicts. Experiments have been conducted to analyze the performance of the presented methodology. The results look promising as the methodology achieves high degree of precision and recall for information extraction and reasonably good accuracy for predicting missing values.

Keywords: Information Extraction, Bayesian Network, ontology, Machine Learning

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2192
9008 Peakwise Smoothing of Data Models using Wavelets

Authors: D Sudheer Reddy, N Gopal Reddy, P V Radhadevi, J Saibaba, Geeta Varadan

Abstract:

Smoothing or filtering of data is first preprocessing step for noise suppression in many applications involving data analysis. Moving average is the most popular method of smoothing the data, generalization of this led to the development of Savitzky-Golay filter. Many window smoothing methods were developed by convolving the data with different window functions for different applications; most widely used window functions are Gaussian or Kaiser. Function approximation of the data by polynomial regression or Fourier expansion or wavelet expansion also gives a smoothed data. Wavelets also smooth the data to great extent by thresholding the wavelet coefficients. Almost all smoothing methods destroys the peaks and flatten them when the support of the window is increased. In certain applications it is desirable to retain peaks while smoothing the data as much as possible. In this paper we present a methodology called as peak-wise smoothing that will smooth the data to any desired level without losing the major peak features.

Keywords: smoothing, moving average, peakwise smoothing, spatialdensity models, planar shape models, wavelets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1719
9007 The Effect of Particle Porosity in Mixed Matrix Membrane Permeation Models

Authors: Z. Sadeghi, M. R. Omidkhah, M. E. Masoomi

Abstract:

The purpose of this paper is to examine gas transport behavior of mixed matrix membranes (MMMs) combined with porous particles. Main existing models are categorized in two main groups; two-phase (ideal contact) and three-phase (non-ideal contact). A new coefficient, J, was obtained to express equations for estimating effect of the particle porosity in two-phase and three-phase models. Modified models evaluates with existing models and experimental data using Matlab software. Comparison of gas permeability of proposed modified models with existing models in different MMMs shows a better prediction of gas permeability in MMMs.

Keywords: Mixed Matrix Membrane, Permeation Models, Porous particles, Porosity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2583
9006 System Identification Based on Stepwise Regression for Dynamic Market Representation

Authors: Alexander Efremov

Abstract:

A system for market identification (SMI) is presented. The resulting representations are multivariable dynamic demand models. The market specifics are analyzed. Appropriate models and identification techniques are chosen. Multivariate static and dynamic models are used to represent the market behavior. The steps of the first stage of SMI, named data preprocessing, are mentioned. Next, the second stage, which is the model estimation, is considered in more details. Stepwise linear regression (SWR) is used to determine the significant cross-effects and the orders of the model polynomials. The estimates of the model parameters are obtained by a numerically stable estimator. Real market data is used to analyze SMI performance. The main conclusion is related to the applicability of multivariate dynamic models for representation of market systems.

Keywords: market identification, dynamic models, stepwise regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1584
9005 Estimation of Missing or Incomplete Data in Road Performance Measurement Systems

Authors: Kristjan Kuhi, Kati K. Kaare, Ott Koppel

Abstract:

Modern management in most fields is performance based; both planning and implementation of maintenance and operational activities are driven by appropriately defined performance indicators. Continuous real-time data collection for management is becoming feasible due to technological advancements. Outdated and insufficient input data may result in incorrect decisions. When using deterministic models the uncertainty of the object state is not visible thus applying the deterministic models are more likely to give false diagnosis. Constructing structured probabilistic models of the performance indicators taking into consideration the surrounding indicator environment enables to estimate the trustworthiness of the indicator values. It also assists to fill gaps in data to improve the quality of the performance analysis and management decisions. In this paper authors discuss the application of probabilistic graphical models in the road performance measurement and propose a high-level conceptual model that enables analyzing and predicting more precisely future pavement deterioration based on road utilization.

Keywords: Probabilistic graphical models, performance indicators, road performance management, data collection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1793
9004 Analysis of Textual Data Based On Multiple 2-Class Classification Models

Authors: Shigeaki Sakurai, Ryohei Orihara

Abstract:

This paper proposes a new method for analyzing textual data. The method deals with items of textual data, where each item is described based on various viewpoints. The method acquires 2- class classification models of the viewpoints by applying an inductive learning method to items with multiple viewpoints. The method infers whether the viewpoints are assigned to the new items or not by using the models. The method extracts expressions from the new items classified into the viewpoints and extracts characteristic expressions corresponding to the viewpoints by comparing the frequency of expressions among the viewpoints. This paper also applies the method to questionnaire data given by guests at a hotel and verifies its effect through numerical experiments.

Keywords: Text mining, Multiple viewpoints, Differential analysis, Questionnaire data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1264
9003 Development of a Real-Time Energy Models for Photovoltaic Water Pumping System

Authors: Ammar Mahjoubi, Ridha Fethi Mechlouch, Belgacem Mahdhaoui, Ammar Ben Brahim

Abstract:

This purpose of this paper is to develop and validate a model to accurately predict the cell temperature of a PV module that adapts to various mounting configurations, mounting locations, and climates while only requiring readily available data from the module manufacturer. Results from this model are also compared to results from published cell temperature models. The models were used to predict real-time performance from a PV water pumping systems in the desert of Medenine, south of Tunisia using 60-min intervals of measured performance data during one complete year. Statistical analysis of the predicted results and measured data highlight possible sources of errors and the limitations and/or adequacy of existing models, to describe the temperature and efficiency of PV-cells and consequently, the accuracy of performance of PV water pumping systems prediction models.

Keywords: Temperature of a photovoltaic module, Predicted models, PV water pumping systems efficiency, Simulation, Desert of southern Tunisia.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1810
9002 Imputing Missing Data in Electronic Health Records: A Comparison of Linear and Non-Linear Imputation Models

Authors: Alireza Vafaei Sadr, Vida Abedi, Jiang Li, Ramin Zand

Abstract:

Missing data is a common challenge in medical research and can lead to biased or incomplete results. When the data bias leaks into models, it further exacerbates health disparities; biased algorithms can lead to misclassification and reduced resource allocation and monitoring as part of prevention strategies for certain minorities and vulnerable segments of patient populations, which in turn further reduce data footprint from the same population – thus, a vicious cycle. This study compares the performance of six imputation techniques grouped into Linear and Non-Linear models, on two different real-world electronic health records (EHRs) datasets, representing 17864 patient records. The mean absolute percentage error (MAPE) and root mean squared error (RMSE) are used as performance metrics, and the results show that the Linear models outperformed the Non-Linear models in terms of both metrics. These results suggest that sometimes Linear models might be an optimal choice for imputation in laboratory variables in terms of imputation efficiency and uncertainty of predicted values.

Keywords: EHR, Machine Learning, imputation, laboratory variables, algorithmic bias.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 77
9001 Students’ Perception of Using Dental e-Models in an Inquiry-Based Curriculum

Authors: Yanqi Yang, Chongshan Liao, Cheuk Hin Ho, Susan Bridges

Abstract:

Aim: To investigate students’ perceptions of using e-models in an inquiry-based curriculum. Approach: 52 second-year dental students completed a pre- and post-test questionnaire relating to their perceptions of e-models and their use in inquiry-based learning. The pre-test occurred prior to any learning with e-models. The follow-up survey was conducted after one year's experience of using e-models. Results: There was no significant difference between the two sets of questionnaires regarding students’ perceptions of the usefulness of e-models and their willingness to use e-models in future inquiry-based learning. Most students preferred using both plaster models and e-models in tandem. Conclusion: Students did not change their attitude towards e-models and most of them agreed or were neutral that e-models are useful in inquiry-based learning. Whilst recognizing the utility of 3D models for learning, students' preference for combining these with solid models has implications for the development of haptic sensibility in an operative discipline.

Keywords: E-models, inquiry-based curriculum, education.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1783
9000 Improve Safety Performance of Un-Signalized Intersections in Oman

Authors: Siham G. Farag

Abstract:

The main objective of this paper is to provide a new methodology for road safety assessment in Oman through the development of suitable accident prediction models. GLM technique with Poisson or NBR using SAS package was carried out to develop these models. The paper utilized the accidents data of 31 un-signalized T-intersections during three years. Five goodness-of-fit measures were used to assess the overall quality of the developed models. Two types of models were developed separately; the flow-based models including only traffic exposure functions, and the full models containing both exposure functions and other significant geometry and traffic variables. The results show that, traffic exposure functions produced much better fit to the accident data. The most effective geometric variables were major-road mean speed, minor-road 85th percentile speed, major-road lane width, distance to the nearest junction, and right-turn curb radius. The developed models can be used for intersection treatment or upgrading and specify the appropriate design parameters of T-intersections. Finally, the models presented in this thesis reflect the intersection conditions in Oman and could represent the typical conditions in several countries in the middle east area, especially gulf countries.

Keywords: Accidents Prediction Models (APMs), Generalized Linear Model (GLM), T-intersections, Oman.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2022
8999 Improving the Analytical Power of Dynamic DEA Models, by the Consideration of the Shape of the Distribution of Inputs/Outputs Data: A Linear Piecewise Decomposition Approach

Authors: Elias K. Maragos, Petros E. Maravelakis

Abstract:

In Dynamic Data Envelopment Analysis (DDEA), which is a subfield of Data Envelopment Analysis (DEA), the productivity of Decision Making Units (DMUs) is considered in relation to time. In this case, as it is accepted by the most of the researchers, there are outputs, which are produced by a DMU to be used as inputs in a future time. Those outputs are known as intermediates. The common models, in DDEA, do not take into account the shape of the distribution of those inputs, outputs or intermediates data, assuming that the distribution of the virtual value of them does not deviate from linearity. This weakness causes the limitation of the accuracy of the analytical power of the traditional DDEA models. In this paper, the authors, using the concept of piecewise linear inputs and outputs, propose an extended DDEA model. The proposed model increases the flexibility of the traditional DDEA models and improves the measurement of the dynamic performance of DMUs.

Keywords: Data envelopment analysis, Dynamic DEA, Piecewise linear inputs, Piecewise linear outputs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 626
8998 Comparison of Machine Learning Techniques for Single Imputation on Audiograms

Authors: Sarah Beaver, Renee Bryce

Abstract:

Audiograms detect hearing impairment, but missing values pose problems. This work explores imputations in an attempt to improve accuracy. This work implements Linear Regression, Lasso, Linear Support Vector Regression, Bayesian Ridge, K Nearest Neighbors (KNN), and Random Forest machine learning techniques to impute audiogram frequencies ranging from 125 Hz to 8000 Hz. The data contain patients who had or were candidates for cochlear implants. Accuracy is compared across two different Nested Cross-Validation k values. Over 4000 audiograms were used from 800 unique patients. Additionally, training on data combines and compares left and right ear audiograms versus single ear side audiograms. The accuracy achieved using Root Mean Square Error (RMSE) values for the best models for Random Forest ranges from 4.74 to 6.37. The R2 values for the best models for Random Forest ranges from .91 to .96. The accuracy achieved using RMSE values for the best models for KNN ranges from 5.00 to 7.72. The R2 values for the best models for KNN ranges from .89 to .95. The best imputation models received R2 between .89 to .96 and RMSE values less than 8dB. We also show that the accuracy of classification predictive models performed better with our imputation models versus constant imputations by a two percent increase.

Keywords: Machine Learning, audiograms, data imputations, single imputations.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 66
8997 Electricity Price Forecasting: A Comparative Analysis with Shallow-ANN and DNN

Authors: Fazıl Gökgöz, Fahrettin Filiz

Abstract:

Electricity prices have sophisticated features such as high volatility, nonlinearity and high frequency that make forecasting quite difficult. Electricity price has a volatile and non-random character so that, it is possible to identify the patterns based on the historical data. Intelligent decision-making requires accurate price forecasting for market traders, retailers, and generation companies. So far, many shallow-ANN (artificial neural networks) models have been published in the literature and showed adequate forecasting results. During the last years, neural networks with many hidden layers, which are referred to as DNN (deep neural networks) have been using in the machine learning community. The goal of this study is to investigate electricity price forecasting performance of the shallow-ANN and DNN models for the Turkish day-ahead electricity market. The forecasting accuracy of the models has been evaluated with publicly available data from the Turkish day-ahead electricity market. Both shallow-ANN and DNN approach would give successful result in forecasting problems. Historical load, price and weather temperature data are used as the input variables for the models. The data set includes power consumption measurements gathered between January 2016 and December 2017 with one-hour resolution. In this regard, forecasting studies have been carried out comparatively with shallow-ANN and DNN models for Turkish electricity markets in the related time period. The main contribution of this study is the investigation of different shallow-ANN and DNN models in the field of electricity price forecast. All models are compared regarding their MAE (Mean Absolute Error) and MSE (Mean Square) results. DNN models give better forecasting performance compare to shallow-ANN. Best five MAE results for DNN models are 0.346, 0.372, 0.392, 0,402 and 0.409.

Keywords: Deep learning, artificial neural networks, energy price forecasting, Turkey.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1047
8996 Analyzing Data on Breastfeeding Using Dispersed Statistical Models

Authors: Naushad Mamode Khan, Cheika Jahangeer, Maleika Heenaye-Mamode Khan

Abstract:

Exclusive breastfeeding is the feeding of a baby on no other milk apart from breast milk. Exclusive breastfeeding during the first 6 months of life is very important as it supports optimal growth and development during infancy and reduces the risk of obliterating diseases and problems. Moreover, it helps to reduce the incidence and/or severity of diarrhea, lower respiratory infection and urinary tract infection. In this paper, we make a survey of the factors that influence exclusive breastfeeding and use two dispersed statistical models to analyze data. The models are the Generalized Poisson regression model and the Com-Poisson regression models.

Keywords: Exclusive breastfeeding, regression model, generalized poisson, com-poisson.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1526
8995 Towards Development of Solution for Business Process-Oriented Data Analysis

Authors: M. Klimavicius

Abstract:

This paper proposes a modeling methodology for the development of data analysis solution. The Author introduce the approach to address data warehousing issues at the at enterprise level. The methodology covers the process of the requirements eliciting and analysis stage as well as initial design of data warehouse. The paper reviews extended business process model, which satisfy the needs of data warehouse development. The Author considers that the use of business process models is necessary, as it reflects both enterprise information systems and business functions, which are important for data analysis. The Described approach divides development into three steps with different detailed elaboration of models. The Described approach gives possibility to gather requirements and display them to business users in easy manner.

Keywords: Data warehouse, data analysis, business processmanagement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1366
8994 Dynamic Capitalization and Visualization Strategy in Collaborative Knowledge Management System for EI Process

Authors: Bolanle F. Oladejo, Victor T. Odumuyiwa, Amos A. David

Abstract:

Knowledge is attributed to human whose problemsolving behavior is subjective and complex. In today-s knowledge economy, the need to manage knowledge produced by a community of actors cannot be overemphasized. This is due to the fact that actors possess some level of tacit knowledge which is generally difficult to articulate. Problem-solving requires searching and sharing of knowledge among a group of actors in a particular context. Knowledge expressed within the context of a problem resolution must be capitalized for future reuse. In this paper, an approach that permits dynamic capitalization of relevant and reliable actors- knowledge in solving decision problem following Economic Intelligence process is proposed. Knowledge annotation method and temporal attributes are used for handling the complexity in the communication among actors and in contextualizing expressed knowledge. A prototype is built to demonstrate the functionalities of a collaborative Knowledge Management system based on this approach. It is tested with sample cases and the result showed that dynamic capitalization leads to knowledge validation hence increasing reliability of captured knowledge for reuse. The system can be adapted to various domains.

Keywords: Actors' communication, knowledge annotation, recursive knowledge capitalization, visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1344
8993 Using Perspective Schemata to Model the ETL Process

Authors: Valeria M. Pequeno, Joao Carlos G. M. Pires

Abstract:

Data Warehouses (DWs) are repositories which contain the unified history of an enterprise for decision support. The data must be Extracted from information sources, Transformed and integrated to be Loaded (ETL) into the DW, using ETL tools. These tools focus on data movement, where the models are only used as a means to this aim. Under a conceptual viewpoint, the authors want to innovate the ETL process in two ways: 1) to make clear compatibility between models in a declarative fashion, using correspondence assertions and 2) to identify the instances of different sources that represent the same entity in the real-world. This paper presents the overview of the proposed framework to model the ETL process, which is based on the use of a reference model and perspective schemata. This approach provides the designer with a better understanding of the semantic associated with the ETL process.

Keywords: conceptual data model, correspondence assertions, data warehouse, data integration, ETL process, object relational database.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1475
8992 Data-Driven Decision-Making in Digital Entrepreneurship

Authors: Abeba Nigussie Turi, Xiangming Samuel Li

Abstract:

Data-driven business models are more typical for established businesses than early-stage startups that strive to penetrate a market. This paper provided an extensive discussion on the principles of data analytics for early-stage digital entrepreneurial businesses. Here, we developed data-driven decision-making (DDDM) framework that applies to startups prone to multifaceted barriers in the form of poor data access, technical and financial constraints, to state some. The startup DDDM framework proposed in this paper is novel in its form encompassing startup data analytics enablers and metrics aligning with startups' business models ranging from customer-centric product development to servitization which is the future of modern digital entrepreneurship.

Keywords: Startup data analytics, data-driven decision-making, data acquisition, data generation, digital entrepreneurship.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 761
8991 Geopotential Models Evaluation in Algeria Using Stochastic Method, GPS/Leveling and Topographic Data

Authors: M. A. Meslem

Abstract:

For precise geoid determination, we use a reference field to subtract long and medium wavelength of the gravity field from observations data when we use the remove-compute-restore technique. Therefore, a comparison study between considered models should be made in order to select the optimal reference gravity field to be used. In this context, two recent global geopotential models have been selected to perform this comparison study over Northern Algeria. The Earth Gravitational Model (EGM2008) and the Global Gravity Model (GECO) conceived with a combination of the first model with anomalous potential derived from a GOCE satellite-only global model. Free air gravity anomalies in the area under study have been used to compute residual data using both gravity field models and a Digital Terrain Model (DTM) to subtract the residual terrain effect from the gravity observations. Residual data were used to generate local empirical covariance functions and their fitting to the closed form in order to compare their statistical behaviors according to both cases. Finally, height anomalies were computed from both geopotential models and compared to a set of GPS levelled points on benchmarks using least squares adjustment. The result described in details in this paper regarding these two models has pointed out a slight advantage of GECO global model globally through error degree variances comparison and ground-truth evaluation.

Keywords: Quasigeoid, gravity anomalies, covariance, GGM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 845
8990 IoT Device Cost Effective Storage Architecture and Real-Time Data Analysis/Data Privacy Framework

Authors: Femi Elegbeleye, Seani Rananga

Abstract:

This paper focused on cost effective storage architecture using fog and cloud data storage gateway, and presented the design of the framework for the data privacy model and data analytics framework on a real-time analysis when using machine learning method. The paper began with the system analysis, system architecture and its component design, as well as the overall system operations. Several results obtained from this study on data privacy models show that when two or more data privacy models are integrated via a fog storage gateway, we often have more secure data. Our main focus in the study is to design a framework for the data privacy model, data storage, and real-time analytics. This paper also shows the major system components and their framework specification. And lastly, the overall research system architecture was shown, including its structure, and its interrelationships.

Keywords: IoT, fog storage, cloud storage, data analysis, data privacy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 162
8989 Neuro-fuzzy Model and Regression Model a Comparison Study of MRR in Electrical Discharge Machining of D2 Tool Steel

Authors: M. K. Pradhan, C. K. Biswas,

Abstract:

In the current research, neuro-fuzzy model and regression model was developed to predict Material Removal Rate in Electrical Discharge Machining process for AISI D2 tool steel with copper electrode. Extensive experiments were conducted with various levels of discharge current, pulse duration and duty cycle. The experimental data are split into two sets, one for training and the other for validation of the model. The training data were used to develop the above models and the test data, which was not used earlier to develop these models were used for validation the models. Subsequently, the models are compared. It was found that the predicted and experimental results were in good agreement and the coefficients of correlation were found to be 0.999 and 0.974 for neuro fuzzy and regression model respectively

Keywords: Electrical discharge machining, material removal rate, neuro-fuzzy model, regression model, mountain clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1360
8988 Aggregation Scheduling Algorithms in Wireless Sensor Networks

Authors: Min Kyung An

Abstract:

In Wireless Sensor Networks which consist of tiny wireless sensor nodes with limited battery power, one of the most fundamental applications is data aggregation which collects nearby environmental conditions and aggregates the data to a designated destination, called a sink node. Important issues concerning the data aggregation are time efficiency and energy consumption due to its limited energy, and therefore, the related problem, named Minimum Latency Aggregation Scheduling (MLAS), has been the focus of many researchers. Its objective is to compute the minimum latency schedule, that is, to compute a schedule with the minimum number of timeslots, such that the sink node can receive the aggregated data from all the other nodes without any collision or interference. For the problem, the two interference models, the graph model and the more realistic physical interference model known as Signal-to-Interference-Noise-Ratio (SINR), have been adopted with different power models, uniform-power and non-uniform power (with power control or without power control), and different antenna models, omni-directional antenna and directional antenna models. In this survey article, as the problem has proven to be NP-hard, we present and compare several state-of-the-art approximation algorithms in various models on the basis of latency as its performance measure.

Keywords: Data aggregation, convergecast, gathering, approximation, interference, omni-directional, directional.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 768
8987 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain subgroups of time series data with normal distribution from the inflow into wastewater treatment plant data, composed of several groups differing by mean value. Two simple algorithms, K-mean and EM, were chosen as a clustering method. The Rand index was used to measure the similarity. After simple meta-clustering, a regression model was performed for each subgroups. The final model was a sum of the subgroups models. The quality of the obtained model was compared with the regression model made using the same explanatory variables, but with no clustering of data. Results were compared using determination coefficient (R2), measure of prediction accuracy- mean absolute percentage error (MAPE) and comparison on a linear chart. Preliminary results allow us to foresee the potential of the presented technique.

Keywords: Clustering, Data analysis, Data mining, Predictive models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1918
8986 Empirical Roughness Progression Models of Heavy Duty Rural Pavements

Authors: Nahla H. Alaswadko, Rayya A. Hassan, Bayar N. Mohammed

Abstract:

Empirical deterministic models have been developed to predict roughness progression of heavy duty spray sealed pavements for a dataset representing rural arterial roads. The dataset provides a good representation of the relevant network and covers a wide range of operating and environmental conditions. A sample with a large size of historical time series data for many pavement sections has been collected and prepared for use in multilevel regression analysis. The modelling parameters include road roughness as performance parameter and traffic loading, time, initial pavement strength, reactivity level of subgrade soil, climate condition, and condition of drainage system as predictor parameters. The purpose of this paper is to report the approaches adopted for models development and validation. The study presents multilevel models that can account for the correlation among time series data of the same section and to capture the effect of unobserved variables. Study results show that the models fit the data very well. The contribution and significance of relevant influencing factors in predicting roughness progression are presented and explained. The paper concludes that the analysis approach used for developing the models confirmed their accuracy and reliability by well-fitting to the validation data.

Keywords: Roughness progression, empirical model, pavement performance, heavy duty pavement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 767
8985 Mathematical Rescheduling Models for Railway Services

Authors: Zuraida Alwadood, Adibah Shuib, Norlida Abd Hamid

Abstract:

This paper presents the review of past studies concerning mathematical models for rescheduling passenger railway services, as part of delay management in the occurrence of railway disruption. Many past mathematical models highlighted were aimed at minimizing the service delays experienced by passengers during service disruptions. Integer programming (IP) and mixed-integer programming (MIP) models are critically discussed, focusing on the model approach, decision variables, sets and parameters. Some of them have been tested on real-life data of railway companies worldwide, while a few have been validated on fictive data. Based on selected literatures on train rescheduling, this paper is able to assist researchers in the model formulation by providing comprehensive analyses towards the model building. These analyses would be able to help in the development of new approaches in rescheduling strategies or perhaps to enhance the existing rescheduling models and make them more powerful or more applicable with shorter computing time.

Keywords: Mathematical modelling, Mixed-integer programming, Railway rescheduling, Service delays.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3200