Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 40634

Search results for: data envelopment analysis (DEA)

40214 “Octopub”: Geographical Sentiment Analysis Using Named Entity Recognition from Social Networks for Geo-Targeted Billboard Advertising

Authors: Oussama Hafferssas, Hiba Benyahia, Amina Madani, Nassima Zeriri

Abstract:

Although data nowadays has multiple forms; from text to images, and from audio to videos, yet text is still the most used one at a public level. At an academical and research level, and unlike other forms, text can be considered as the easiest form to process. Therefore, a brunch of Data Mining researches has been always under its shadow, called "Text Mining". Its concept is just like data mining’s, finding valuable patterns in data, from large collections and tremendous volumes of data, in this case: Text. Named entity recognition (NER) is one of Text Mining’s disciplines, it aims to extract and classify references such as proper names, locations, expressions of time and dates, organizations and more in a given text. Our approach "Octopub" does not aim to find new ways to improve named entity recognition process, rather than that it’s about finding a new, and yet smart way, to use NER in a way that we can extract sentiments of millions of people using Social Networks as a limitless information source, and Marketing for product promotion as the main domain of application.

Keywords: textmining, named entity recognition(NER), sentiment analysis, social media networks (SN, SMN), business intelligence(BI), marketing

Procedia PDF Downloads 559

40213 Data Presentation of Lane-Changing Events Trajectories Using HighD Dataset

Authors: Basma Khelfa, Antoine Tordeux, Ibrahima Ba

Abstract:

We present a descriptive analysis data of lane-changing events in multi-lane roads. The data are provided from The Highway Drone Dataset (HighD), which are microscopic trajectories in highway. This paper describes and analyses the role of the different parameters and their significance. Thanks to HighD data, we aim to find the most frequent reasons that motivate drivers to change lanes. We used the programming language R for the processing of these data. We analyze the involvement and relationship of different variables of each parameter of the ego vehicle and the four vehicles surrounding it, i.e., distance, speed difference, time gap, and acceleration. This was studied according to the class of the vehicle (car or truck), and according to the maneuver it undertook (overtaking or falling back).

Keywords: autonomous driving, physical traffic model, prediction model, statistical learning process

Procedia PDF Downloads 232

40212 AI Software Algorithms for Drivers Monitoring within Vehicles Traffic - SiaMOTO

Authors: Ioan Corneliu Salisteanu, Valentin Dogaru Ulieru, Mihaita Nicolae Ardeleanu, Alin Pohoata, Bogdan Salisteanu, Stefan Broscareanu

Abstract:

Creating a personalized statistic for an individual within the population using IT systems, based on the searches and intercepted spheres of interest they manifest, is just one 'atom' of the artificial intelligence analysis network. However, having the ability to generate statistics based on individual data intercepted from large demographic areas leads to reasoning like that issued by a human mind with global strategic ambitions. The DiaMOTO device is a technical sensory system that allows the interception of car events caused by a driver, positioning them in time and space. The device's connection to the vehicle allows the creation of a source of data whose analysis can create psychological, behavioural profiles of the drivers involved. The SiaMOTO system collects data from many vehicles equipped with DiaMOTO, driven by many different drivers with a unique fingerprint in their approach to driving. In this paper, we aimed to explain the software infrastructure of the SiaMOTO system, a system designed to monitor and improve driver driving behaviour, as well as the criteria and algorithms underlying the intelligent analysis process.

Keywords: artificial intelligence, data processing, driver behaviour, driver monitoring, SiaMOTO

Procedia PDF Downloads 51

40211 Operating Speed Models on Tangent Sections of Two-Lane Rural Roads

Authors: Dražen Cvitanić, Biljana Maljković

Abstract:

This paper presents models for predicting operating speeds on tangent sections of two-lane rural roads developed on continuous speed data. The data corresponds to 20 drivers of different ages and driving experiences, driving their own cars along an 18 km long section of a state road. The data were first used for determination of maximum operating speeds on tangents and their comparison with speeds in the middle of tangents i.e. speed data used in most of operating speed studies. Analysis of continuous speed data indicated that the spot speed data are not reliable indicators of relevant speeds. After that, operating speed models for tangent sections were developed. There was no significant difference between models developed using speed data in the middle of tangent sections and models developed using maximum operating speeds on tangent sections. All developed models have higher coefficient of determination then models developed on spot speed data. Thus, it can be concluded that the method of measuring has more significant impact on the quality of operating speed model than the location of measurement.

Keywords: operating speed, continuous speed data, tangent sections, spot speed, consistency

Procedia PDF Downloads 424

40210 A Neural Network Based Clustering Approach for Imputing Multivariate Values in Big Data

Authors: S. Nickolas, Shobha K.

Abstract:

The treatment of incomplete data is an important step in the data pre-processing. Missing values creates a noisy environment in all applications and it is an unavoidable problem in big data management and analysis. Numerous techniques likes discarding rows with missing values, mean imputation, expectation maximization, neural networks with evolutionary algorithms or optimized techniques and hot deck imputation have been introduced by researchers for handling missing data. Among these, imputation techniques plays a positive role in filling missing values when it is necessary to use all records in the data and not to discard records with missing values. In this paper we propose a novel artificial neural network based clustering algorithm, Adaptive Resonance Theory-2(ART2) for imputation of missing values in mixed attribute data sets. The process of ART2 can recognize learned models fast and be adapted to new objects rapidly. It carries out model-based clustering by using competitive learning and self-steady mechanism in dynamic environment without supervision. The proposed approach not only imputes the missing values but also provides information about handling the outliers.

Keywords: ART2, data imputation, clustering, missing data, neural network, pre-processing

Procedia PDF Downloads 249

40209 Techno-Economic Analysis of Solar Energy for Cathodic Protection of Oil and Gas Buried Pipelines in Southwestern of Iran

Authors: M. Goodarzi, M. Mohammadi, A. Gharib

Abstract:

Solar energy is a renewable energy which has attracted special attention in many countries. Solar cathodic protectionsystems harness the sun’senergy to protect underground pipelinesand tanks from galvanic corrosion. The object of this study is to design and the economic analysis a cathodic protection system by impressed current supplied with solar energy panels applied to underground pipelines. In the present study, the technical and economic analysis of using solar energy for cathodic protection system in southwestern of Iran (Khuzestan province) is investigated. For this purpose, the ecological conditions such as the weather data, air clearness and sunshine hours are analyzed. The economic analyses were done using computer code to investigate the feasibility analysis from the using of various energy sources in order to cathodic protection system. The overall research methodology is divided into four components: Data collection, design of elements, techno economical evaluation, and output analysis. According to the results, solar renewable energy systems can supply adequate power for cathodic protection system purposes.

Keywords: renewable energy, solar energy, solar cathodic protection station, lifecycle cost method

Procedia PDF Downloads 507

40208 Real-Time Big-Data Warehouse a Next-Generation Enterprise Data Warehouse and Analysis Framework

Authors: Abbas Raza Ali

Abstract:

Big Data technology is gradually becoming a dire need of large enterprises. These enterprises are generating massively large amount of off-line and streaming data in both structured and unstructured formats on daily basis. It is a challenging task to effectively extract useful insights from the large scale datasets, even though sometimes it becomes a technology constraint to manage transactional data history of more than a few months. This paper presents a framework to efficiently manage massively large and complex datasets. The framework has been tested on a communication service provider producing massively large complex streaming data in binary format. The communication industry is bound by the regulators to manage history of their subscribers’ call records where every action of a subscriber generates a record. Also, managing and analyzing transactional data allows service providers to better understand their customers’ behavior, for example, deep packet inspection requires transactional internet usage data to explain internet usage behaviour of the subscribers. However, current relational database systems limit service providers to only maintain history at semantic level which is aggregated at subscriber level. The framework addresses these challenges by leveraging Big Data technology which optimally manages and allows deep analysis of complex datasets. The framework has been applied to offload existing Intelligent Network Mediation and relational Data Warehouse of the service provider on Big Data. The service provider has 50+ million subscriber-base with yearly growth of 7-10%. The end-to-end process takes not more than 10 minutes which involves binary to ASCII decoding of call detail records, stitching of all the interrogations against a call (transformations) and aggregations of all the call records of a subscriber.

Keywords: big data, communication service providers, enterprise data warehouse, stream computing, Telco IN Mediation

Procedia PDF Downloads 150

40207 Syndromic Surveillance Framework Using Tweets Data Analytics

Authors: David Ming Liu, Benjamin Hirsch, Bashir Aden

Abstract:

Syndromic surveillance is to detect or predict disease outbreaks through the analysis of medical sources of data. Using social media data like tweets to do syndromic surveillance becomes more and more popular with the aid of open platform to collect data and the advantage of microblogging text and mobile geographic location features. In this paper, a Syndromic Surveillance Framework is presented with machine learning kernel using tweets data analytics. Influenza and the three cities Abu Dhabi, Al Ain and Dubai of United Arabic Emirates are used as the test disease and trial areas. Hospital cases data provided by the Health Authority of Abu Dhabi (HAAD) are used for the correlation purpose. In our model, Latent Dirichlet allocation (LDA) engine is adapted to do supervised learning classification and N-Fold cross validation confusion matrix are given as the simulation results with overall system recall 85.595% performance achieved.

Keywords: Syndromic surveillance, Tweets, Machine Learning, data mining, Latent Dirichlet allocation (LDA), Influenza

Procedia PDF Downloads 89

40206 Evaluating Performance of an Anomaly Detection Module with Artificial Neural Network Implementation

Authors: Edward Guillén, Jhordany Rodriguez, Rafael Páez

Abstract:

Anomaly detection techniques have been focused on two main components: data extraction and selection and the second one is the analysis performed over the obtained data. The goal of this paper is to analyze the influence that each of these components has over the system performance by evaluating detection over network scenarios with different setups. The independent variables are as follows: the number of system inputs, the way the inputs are codified and the complexity of the analysis techniques. For the analysis, some approaches of artificial neural networks are implemented with different number of layers. The obtained results show the influence that each of these variables has in the system performance.

Keywords: network intrusion detection, machine learning, artificial neural network, anomaly detection module

Procedia PDF Downloads 307

40205 Sensor Data Analysis for a Large Mining Major

Authors: Sudipto Shanker Dasgupta

Abstract:

One of the largest mining companies wanted to look at health analytics for their driverless trucks. These trucks were the key to their supply chain logistics. The automated trucks had multi-level sub-assemblies which would send out sensor information. The use case that was worked on was to capture the sensor signal from the truck subcomponents and analyze the health of the trucks from repair and replacement purview. Open source software was used to stream the data into a clustered Hadoop setup in Amazon Web Services cloud and Apache Spark SQL was used to analyze the data. All of this was achieved through a 10 node amazon 32 core, 64 GB RAM setup real-time analytics was achieved on ‘300 million records’. To check the scalability of the system, the cluster was increased to 100 node setup. This talk will highlight how Open Source software was used to achieve the above use case and the insights on the high data throughput on a cloud set up.

Keywords: streaming analytics, data science, big data, Hadoop, high throughput, sensor data

Procedia PDF Downloads 380

40204 Analysis of the Impact of Climate Change on Maize (Zea Mays) Yield in Central Ethiopia

Authors: Takele Nemomsa, Girma Mamo, Tesfaye Balemi

Abstract:

Climate change refers to a change in the state of the climate that can be identified (e.g. using statistical tests) by changes in the mean and/or variance of its properties and that persists for an extended period, typically decades or longer. In Ethiopia; Maize production in relation to climate change at regional and sub- regional scales have not been studied in detail. Thus, this study was aimed to analyse the impact of climate change on maize yield in Ambo Districts, Central Ethiopia. To this effect, weather data, soil data and maize experimental data for Arganne hybrid were used. APSIM software was used to investigate the response of maize (Zea mays) yield to different agronomic management practices using current and future (2020s–2080s) climate data. The climate change projections data which were downscaled using SDSM were used as input of climate data for the impact analysis. Compared to agronomic practices the impact of climate change on Arganne in Central Ethiopia is minute. However, within 2020s-2080s in Ambo area; the yield of Arganne hybrid is projected to reduce by 1.06% to 2.02%, and in 2050s it is projected to reduce by 1.56 While in 2080s; it is projected to increase by 1.03% to 2.07%. Thus, to adapt to the changing climate; farmers should consider increasing plant density and fertilizer rate per hectare.

Keywords: APSIM, downscaling, response, SDSM

Procedia PDF Downloads 351

40203 Distributed Perceptually Important Point Identification for Time Series Data Mining

Authors: Tak-Chung Fu, Ying-Kit Hung, Fu-Lai Chung

Abstract:

In the field of time series data mining, the concept of the Perceptually Important Point (PIP) identification process is first introduced in 2001. This process originally works for financial time series pattern matching and it is then found suitable for time series dimensionality reduction and representation. Its strength is on preserving the overall shape of the time series by identifying the salient points in it. With the rise of Big Data, time series data contributes a major proportion, especially on the data which generates by sensors in the Internet of Things (IoT) environment. According to the nature of PIP identification and the successful cases, it is worth to further explore the opportunity to apply PIP in time series ‘Big Data’. However, the performance of PIP identification is always considered as the limitation when dealing with ‘Big’ time series data. In this paper, two distributed versions of PIP identification based on the Specialized Binary (SB) Tree are proposed. The proposed approaches solve the bottleneck when running the PIP identification process in a standalone computer. Improvement in term of speed is obtained by the distributed versions.

Keywords: distributed computing, performance analysis, Perceptually Important Point identification, time series data mining

Procedia PDF Downloads 400

40202 Exploiting Kinetic and Kinematic Data to Plot Cyclograms for Managing the Rehabilitation Process of BKAs by Applying Neural Networks

Authors: L. Parisi

Abstract:

Kinematic data wisely correlate vector quantities in space to scalar parameters in time to assess the degree of symmetry between the intact limb and the amputated limb with respect to a normal model derived from the gait of control group participants. Furthermore, these particular data allow a doctor to preliminarily evaluate the usefulness of a certain rehabilitation therapy. Kinetic curves allow the analysis of ground reaction forces (GRFs) to assess the appropriateness of human motion. Electromyography (EMG) allows the analysis of the fundamental lower limb force contributions to quantify the level of gait asymmetry. However, the use of this technological tool is expensive and requires patient’s hospitalization. This research work suggests overcoming the above limitations by applying artificial neural networks.

Keywords: kinetics, kinematics, cyclograms, neural networks, transtibial amputation

Procedia PDF Downloads 422

40201 Does Level of Countries Corruption Affect Firms Working Capital Management?

Authors: Ebrahim Mansoori, Datin Joriah Muhammad

Abstract:

Recent studies in finance have focused on the effect of external variables on working capital management. This study investigates the effect of corruption indexes on firms' working capital management. A large data set that covers data from 2005 to 2013 from five ASEAN countries, namely, Malaysia, Indonesia, Singapore, Thailand, and the Philippines, was selected to investigate how the level of corruption in these countries affect working capital management. The results of panel data analysis include fixed effect estimations showed that a high level of countries' corruption indexes encourages managers to shorten the CCC length. Meanwhile, the managers reduce the level of investment in cash and cash equivalents when the levels of corruption indexes increase. Therefore, increasing the level of countries' corruption indexes encourages managers to select conservative working capital strategies by reducing the level of NLB.

Keywords: ASEAN, corruption indexes, panel data analysis, working capital management

Procedia PDF Downloads 407

40200 A Knee Modular Orthosis Design Based on Kinematic Considerations

Authors: C. Copilusi, C. Ploscaru

Abstract:

This paper addresses attention to a research regarding the design of a knee orthosis in a modular form used on children walking rehabilitation. This research is focused on the human lower limb kinematic analysis which will be used as input data on virtual simulations and prototype validation. From this analysis, important data will be obtained and used as input for virtual simulations of the knee modular orthosis. Thus, a knee orthosis concept was obtained and validated through virtual simulations by using MSC Adams software. Based on the obtained results, the modular orthosis prototype will be manufactured and presented in this article.

Keywords: human lower limb, children orthoses, kinematic analysis, knee orthosis

Procedia PDF Downloads 262

40199 Text Analysis to Support Structuring and Modelling a Public Policy Problem-Outline of an Algorithm to Extract Inferences from Textual Data

Authors: Claudia Ehrentraut, Osama Ibrahim, Hercules Dalianis

Abstract:

Policy making situations are real-world problems that exhibit complexity in that they are composed of many interrelated problems and issues. To be effective, policies must holistically address the complexity of the situation rather than propose solutions to single problems. Formulating and understanding the situation and its complex dynamics, therefore, is a key to finding holistic solutions. Analysis of text based information on the policy problem, using Natural Language Processing (NLP) and Text analysis techniques, can support modelling of public policy problem situations in a more objective way based on domain experts knowledge and scientific evidence. The objective behind this study is to support modelling of public policy problem situations, using text analysis of verbal descriptions of the problem. We propose a formal methodology for analysis of qualitative data from multiple information sources on a policy problem to construct a causal diagram of the problem. The analysis process aims at identifying key variables, linking them by cause-effect relationships and mapping that structure into a graphical representation that is adequate for designing action alternatives, i.e., policy options. This study describes the outline of an algorithm used to automate the initial step of a larger methodological approach, which is so far done manually. In this initial step, inferences about key variables and their interrelationships are extracted from textual data to support a better problem structuring. A small prototype for this step is also presented.

Keywords: public policy, problem structuring, qualitative analysis, natural language processing, algorithm, inference extraction

Procedia PDF Downloads 562

40198 Integration of Big Data to Predict Transportation for Smart Cities

Authors: Sun-Young Jang, Sung-Ah Kim, Dongyoun Shin

Abstract:

The Intelligent transportation system is essential to build smarter cities. Machine learning based transportation prediction could be highly promising approach by delivering invisible aspect visible. In this context, this research aims to make a prototype model that predicts transportation network by using big data and machine learning technology. In detail, among urban transportation systems this research chooses bus system. The research problem that existing headway model cannot response dynamic transportation conditions. Thus, bus delay problem is often occurred. To overcome this problem, a prediction model is presented to fine patterns of bus delay by using a machine learning implementing the following data sets; traffics, weathers, and bus statues. This research presents a flexible headway model to predict bus delay and analyze the result. The prototyping model is composed by real-time data of buses. The data are gathered through public data portals and real time Application Program Interface (API) by the government. These data are fundamental resources to organize interval pattern models of bus operations as traffic environment factors (road speeds, station conditions, weathers, and bus information of operating in real-time). The prototyping model is designed by the machine learning tool (RapidMiner Studio) and conducted tests for bus delays prediction. This research presents experiments to increase prediction accuracy for bus headway by analyzing the urban big data. The big data analysis is important to predict the future and to find correlations by processing huge amount of data. Therefore, based on the analysis method, this research represents an effective use of the machine learning and urban big data to understand urban dynamics.

Keywords: big data, machine learning, smart city, social cost, transportation network

Procedia PDF Downloads 230

40197 Language Errors Used in “The Space between Us” Movie and Their Effects on Translation Quality: Translation Study toward Discourse Analysis Approach

Authors: Mochamad Nuruz Zaman, Mangatur Rudolf Nababan, M. A. Djatmika

Abstract:

Both society and education areas teach to have good communication for building the interpersonal skills up. Everyone has the capacity to understand something new, either well comprehension or worst understanding. Worst understanding makes the language errors when the interactions are done by someone in the first meeting, and they do not know before it because of distance area. “The Space between Us” movie delivers the love-adventure story between Mars Boy and Earth Girl. They are so many missing conversations because of the different climate and environment. As the moviegoer also must be focused on the subtitle in order to enjoy well the movie. Furthermore, Indonesia subtitle and English conversation on the movie still have overlapping understanding in the translation. Translation hereby consists of source language -SL- (English conversation) and target language -TL- (Indonesia subtitle). These research gap above is formulated in research question by how the language errors happened in that movie and their effects on translation quality which is deepest analyzed by translation study toward discourse analysis approach. The research goal is to expand the language errors and their translation qualities in order to create a good atmosphere in movie media. The research is studied by embedded research in qualitative design. The research locations consist of setting, participant, and event as focused determined boundary. Sources of datum are “The Space between Us” movie and informant (translation quality rater). The sampling is criterion-based sampling (purposive sampling). Data collection techniques use content analysis and questioner. Data validation applies data source and method triangulation. Data analysis delivers domain, taxonomy, componential, and cultural theme analysis. Data findings on the language errors happened in the movie are referential, register, society, textual, receptive, expressive, individual, group, analogical, transfer, local, and global errors. Data discussions on their effects to translation quality are concentrated by translation techniques on their data findings; they are amplification, borrowing, description, discursive creation, established equivalent, generalization, literal, modulation, particularization, reduction, substitution, and transposition.

Keywords: discourse analysis, language errors, The Space between Us movie, translation techniques, translation quality instruments

Procedia PDF Downloads 187

40196 Simulations to Predict Solar Energy Potential by ERA5 Application at North Africa

Authors: U. Ali Rahoma, Nabil Esawy, Fawzia Ibrahim Moursy, A. H. Hassan, Samy A. Khalil, Ashraf S. Khamees

Abstract:

The design of any solar energy conversion system requires the knowledge of solar radiation data obtained over a long period. Satellite data has been widely used to estimate solar energy where no ground observation of solar radiation is available, yet there are limitations on the temporal coverage of satellite data. Reanalysis is a “retrospective analysis” of the atmosphere parameters generated by assimilating observation data from various sources, including ground observation, satellites, ships, and aircraft observation with the output of NWP (Numerical Weather Prediction) models, to develop an exhaustive record of weather and climate parameters. The evaluation of the performance of reanalysis datasets (ERA-5) for North Africa against high-quality surface measured data was performed using statistical analysis. The estimation of global solar radiation (GSR) distribution over six different selected locations in North Africa during ten years from the period time 2011 to 2020. The root means square error (RMSE), mean bias error (MBE) and mean absolute error (MAE) of reanalysis data of solar radiation range from 0.079 to 0.222, 0.0145 to 0.198, and 0.055 to 0.178, respectively. The seasonal statistical analysis was performed to study seasonal variation of performance of datasets, which reveals the significant variation of errors in different seasons—the performance of the dataset changes by changing the temporal resolution of the data used for comparison. The monthly mean values of data show better performance, but the accuracy of data is compromised. The solar radiation data of ERA-5 is used for preliminary solar resource assessment and power estimation. The correlation coefficient (R2) varies from 0.93 to 99% for the different selected sites in North Africa in the present research. The goal of this research is to give a good representation for global solar radiation to help in solar energy application in all fields, and this can be done by using gridded data from European Centre for Medium-Range Weather Forecasts ECMWF and producing a new model to give a good result.

Keywords: solar energy, solar radiation, ERA-5, potential energy

Procedia PDF Downloads 182

40195 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provision of insights to users, and for good governance. Government (Big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services like data storage, hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationship in the government (big) data ecosystem. We also discuss our research findings. We did not find too much published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we lent ideas for the government (big) data ecosystem from numerous areas that include scientific research data, humanitarian data, open government data, industry data, in the literature.

Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review

Procedia PDF Downloads 133

40194 Traditional Chinese Medicine Treatment for Coronary Heart Disease: a Meta-Analysis

Authors: Yuxi Wang, Xuan Gao

Abstract:

Traditional Chinese medicine has been used in the treatment of coronary heart disease (CHD) for centuries, and in recent years, the research data on the efficacy of traditional Chinese medicine through clinical trials has gradually increased to explore its real efficacy and internal pharmacology. However, due to the complexity of traditional Chinese medicine prescriptions, the efficacy of each component is difficult to clarify, and pharmacological research is challenging. This study aims to systematically review and clarify the clinical efficacy of traditional Chinese medicine in the treatment of coronary heart disease through a meta-analysis. Based on PubMed, CNKI database, Wanfang data, and other databases, eleven randomized controlled trials and 1091 CHD subjects were included. Two researchers conducted a systematic review of the papers and conducted a meta-analysis supporting the positive therapeutic effect of traditional Chinese medicine in the treatment of CHD.

Keywords: coronary heart disease, Chinese medicine, treatment, meta-analysis

Procedia PDF Downloads 92

40193 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

Authors: Nieto Bernal Wilson, Carmona Suarez Edgar

Abstract:

The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects. Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.

Keywords: data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse

Procedia PDF Downloads 384

40192 Studies of Rule Induction by STRIM from the Decision Table with Contaminated Attribute Values from Missing Data and Noise — in the Case of Critical Dataset Size —

Authors: Tetsuro Saeki, Yuichi Kato, Shoutarou Mizuno

Abstract:

STRIM (Statistical Test Rule Induction Method) has been proposed as a method to effectively induct if-then rules from the decision table which is considered as a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments specifying rules in advance, and by comparison with conventional methods. However, scope for future development remains before STRIM can be applied to the analysis of real-world data sets. The first requirement is to determine the size of the dataset needed for inducting true rules, since finding statistically significant rules is the core of the method. The second is to examine the capacity of rule induction from datasets with contaminated attribute values created by missing data and noise, since real-world datasets usually contain such contaminated data. This paper examines the first problem theoretically, in connection with the rule length. The second problem is then examined in a simulation experiment, utilizing the critical size of dataset derived from the first step. The experimental results show that STRIM is highly robust in the analysis of datasets with contaminated attribute values, and hence is applicable to realworld data.

Keywords: rule induction, decision table, missing data, noise

Procedia PDF Downloads 368

40191 Value Chain Analysis and Enhancement Added Value in Palm Oil Supply Chain

Authors: Juliza Hidayati, Sawarni Hasibuan

Abstract:

PT. XYZ is a manufacturing company that produces Crude Palm Oil (CPO). The fierce competition in the global markets not only between companies but also a competition between supply chains. This research aims to analyze the supply chain and value chain of Crude Palm Oil (CPO) in the company. Data analysis method used is qualitative analysis and quantitative analysis. The qualitative analysis describes supply chain and value chain, while the quantitative analysis is used to find out value added and the establishment of the value chain. Based on the analysis, the value chain of crude palm oil (CPO) in the company consists of four main actors that are suppliers of raw materials, processing, distributor, and customer. The value chain analysis consists of two actors; those are palm oil plantation and palm oil processing plant. The palm oil plantation activities include nurseries, planting, plant maintenance, harvesting, and shipping. The palm oil processing plant activities include reception, sterilizing, thressing, pressing, and oil classification. The value added of palm oil plantations was 72.42% and the palm oil processing plant was 10.13%.

Keywords: palm oil, value chain, value added, supply chain

Procedia PDF Downloads 339

40190 Measuring Development through Extreme Observations: An Archetypal Analysis Approach to Index Construction

Authors: Claudeline D. Cellan

Abstract:

Development is multifaceted, and efforts to hasten growth in all these facets have been gaining traction in recent years. Thus, producing a composite index that is reflective of these multidimensional impacts captures the interests of policymakers. The problem lies in going through a mixture of theoretical, methodological and empirical decisions and complexities which, when done carelessly, can lead to inconsistent and unreliable results. This study looks into index computation from a different and less complex perspective. Borrowing the idea of archetypes or ‘pure types’, archetypal analysis looks for points in the convex hull of the multivariate data set that captures as much information in the data as possible. The archetypes or 'pure types' are estimated such that they are convex combinations of all the observations, which in turn are convex combinations of the archetypes. This ensures that the archetypes are realistically observable, therefore achievable. In the sense of composite indices, we look for the best among these archetypes and use this as a benchmark for index computation. Its straightforward and simplistic approach does away with aggregation and substitutability problems which are commonly encountered in index computation. As an example of the application of archetypal analysis in index construction, the country data for the Human Development Index (HDI 2017) of the United Nations Development Programme (UNDP) is used. The goal of this exercise is not to replicate the result of the UNDP-computed HDI, but to illustrate the usability of archetypal analysis in index construction. Here best is defined in the context of life, education and gross national income sub-indices. Results show that the HDI from the archetypal analysis has a linear relationship with the UNDP-computed HDI.

Keywords: archetypes, composite index, convex combination, development

Procedia PDF Downloads 94

40189 Relationship between Driving under the Influence and Traffic Safety

Authors: Eun Hak Lee, Young-Hyun Seo, Hosuk Shin, Seung-Young Kho

Abstract:

Among traffic crashes, driving under the influence (DUI) of alcohol is the most dangerous behavior in Seoul, South Korea. In 2016 alone 40 deaths occurred on of 2,857 cases of DUI. Since DUI is one of the major factors in increasing the severity of crashes, the intensive management of DUI required to reduce traffic crash deaths and the crash damages. This study aims to investigate the relationship between DUI and traffic safety in order to establish countermeasures for traffic safety improvement. The analysis was conducted on the habitual drivers who drove under the influence. Information of habitual drivers is matched to crash data and fine data. The descriptive statistics on data used in this study, which consists of driver license acquisition, traffic fine, and crash data provided by the Korean National Police Agency, are described. The drivers under the influence are classified by statistically significant criteria, such as driver’s age, license type, driving experience, and crash reasons. With the results of the analysis, we propose some countermeasures to enhance traffic safety.

Keywords: driving under influence, traffic safety, traffic crash, traffic fine

Procedia PDF Downloads 199

40188 Reliability Analysis of Geometric Performance of Onboard Satellite Sensors: A Study on Location Accuracy

Authors: Ch. Sridevi, A. Chalapathi Rao, P. Srinivasulu

Abstract:

The location accuracy of data products is a critical parameter in assessing the geometric performance of satellite sensors. This study focuses on reliability analysis of onboard sensors to evaluate their performance in terms of location accuracy performance over time. The analysis utilizes field failure data and employs the weibull distribution to determine the reliability and in turn to understand the improvements or degradations over a period of time. The analysis begins by scrutinizing the location accuracy error which is the root mean square (RMS) error of differences between ground control point coordinates observed on the product and the map and identifying the failure data with reference to time. A significant challenge in this study is to thoroughly analyze the possibility of an infant mortality phase in the data. To address this, the Weibull distribution is utilized to determine if the data exhibits an infant stage or if it has transitioned into the operational phase. The shape parameter beta plays a crucial role in identifying this stage. Additionally, determining the exact start of the operational phase and the end of the infant stage poses another challenge as it is crucial to eliminate residual infant mortality or wear-out from the model, as it can significantly increase the total failure rate. To address this, an approach utilizing the well-established statistical Laplace test is applied to infer the behavior of sensors and to accurately ascertain the duration of different phases in the lifetime and the time required for stabilization. This approach also helps in understanding if the bathtub curve model, which accounts for the different phases in the lifetime of a product, is appropriate for the data and whether the thresholds for the infant period and wear-out phase are accurately estimated by validating the data in individual phases with Weibull distribution curve fitting analysis. Once the operational phase is determined, reliability is assessed using Weibull analysis. This analysis not only provides insights into the reliability of individual sensors with regards to location accuracy over the required period of time, but also establishes a model that can be applied to automate similar analyses for various sensors and parameters using field failure data. Furthermore, the identification of the best-performing sensor through this analysis serves as a benchmark for future missions and designs, ensuring continuous improvement in sensor performance and reliability. Overall, this study provides a methodology to accurately determine the duration of different phases in the life data of individual sensors. It enables an assessment of the time required for stabilization and provides insights into the reliability during the operational phase and the commencement of the wear-out phase. By employing this methodology, designers can make informed decisions regarding sensor performance with regards to location accuracy, contributing to enhanced accuracy in satellite-based applications.

Keywords: bathtub curve, geometric performance, Laplace test, location accuracy, reliability analysis, Weibull analysis

Procedia PDF Downloads 48

40187 Multivariate Assessment of Mathematics Test Scores of Students in Qatar

Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski

Abstract:

Data on various aspects of education are collected at the institutional and government level regularly. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is via the PISA study which collects data from several countries when students are approximately 15 years of age and enables comparisons in the performance of science, mathematics and English between countries as well as ranking of countries based on performance in these standardised tests. As well as student and school outcomes based on the tests taken as part of the PISA study, there is a wealth of other data collected in the study including parental demographics data and data related to teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics using data obtained from the PISA study.

Keywords: cluster analysis, education, mathematics, profiles

Procedia PDF Downloads 103

40186 A Comparison of Image Data Representations for Local Stereo Matching

Authors: André Smith, Amr Abdel-Dayem

Abstract:

The stereo matching problem, while having been present for several decades, continues to be an active area of research. The goal of this research is to find correspondences between elements found in a set of stereoscopic images. With these pairings, it is possible to infer the distance of objects within a scene, relative to the observer. Advancements in this field have led to experimentations with various techniques, from graph-cut energy minimization to artificial neural networks. At the basis of these techniques is a cost function, which is used to evaluate the likelihood of a particular match between points in each image. While at its core, the cost is based on comparing the image pixel data; there is a general lack of consistency as to what image data representation to use. This paper presents an experimental analysis to compare the effectiveness of more common image data representations. The goal is to determine the effectiveness of these data representations to reduce the cost for the correct correspondence relative to other possible matches.

Keywords: colour data, local stereo matching, stereo correspondence, disparity map

Procedia PDF Downloads 349

40185 Government Big Data Ecosystem: A Systematic Literature Review

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Data that is high in volume, velocity, veracity and comes from a variety of sources is usually generated in all sectors including the government sector. Globally public administrations are pursuing (big) data as new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can be led to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to the successful implementation of government (big) data ecosystem. The essential aspects of government (big) data ecosystem include definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the government data ecosystems literature. As this is a new topic, we did not find specific articles on government (big) data ecosystem and therefore focused our research on various relevant areas like humanitarian data, open government data, scientific research data, industry data, etc.

Keywords: applications of big data, big data, big data types. big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review

Procedia PDF Downloads 186

‹
1
2
...
12
13
14
15
16
17
18
...
1354
1355
›