Search results for: panel data analysis
41539 Real-Time Big-Data Warehouse: A Next-Generation Enterprise Data Warehouse and Analysis Framework
Authors: Abbas Raza Ali
Abstract:
Big Data technology is gradually becoming a dire need of large enterprises. These enterprises generate massive amounts of off-line and streaming data in both structured and unstructured formats on a daily basis. It is a challenging task to effectively extract useful insights from large-scale datasets, and sometimes it even becomes a technology constraint to manage a transactional data history of more than a few months. This paper presents a framework to efficiently manage massively large and complex datasets. The framework has been tested on a communication service provider producing massively large complex streaming data in binary format. The communication industry is bound by regulators to manage the history of their subscribers’ call records, where every action of a subscriber generates a record. Also, managing and analyzing transactional data allows service providers to better understand their customers’ behavior; for example, deep packet inspection requires transactional internet usage data to explain the internet usage behavior of subscribers. However, current relational database systems limit service providers to maintaining history only at a semantic level, aggregated at the subscriber level. The framework addresses these challenges by leveraging Big Data technology, which optimally manages and allows deep analysis of complex datasets. The framework has been applied to offload the service provider’s existing Intelligent Network Mediation and relational Data Warehouse onto Big Data. The service provider has a subscriber base of 50+ million with yearly growth of 7-10%. The end-to-end process takes no more than 10 minutes and involves binary-to-ASCII decoding of call detail records, stitching of all the interrogations against a call (transformations), and aggregation of all the call records of a subscriber.
Keywords: big data, communication service providers, enterprise data warehouse, stream computing, Telco IN Mediation
Procedia PDF Downloads 175
41538 Sensor Data Analysis for a Large Mining Major
Authors: Sudipto Shanker Dasgupta
Abstract:
One of the largest mining companies wanted to look at health analytics for their driverless trucks. These trucks were the key to their supply chain logistics. The automated trucks had multi-level sub-assemblies which would send out sensor information. The use case worked on was to capture the sensor signals from the truck subcomponents and analyze the health of the trucks from a repair and replacement perspective. Open-source software was used to stream the data into a clustered Hadoop setup in the Amazon Web Services cloud, and Apache Spark SQL was used to analyze the data. All of this was achieved through a 10-node Amazon setup with 32 cores and 64 GB RAM; real-time analytics was achieved on 300 million records. To check the scalability of the system, the cluster was increased to a 100-node setup. This talk will highlight how open-source software was used to achieve the above use case and share insights on the high data throughput on a cloud setup.
Keywords: streaming analytics, data science, big data, Hadoop, high throughput, sensor data
Procedia PDF Downloads 404
41537 Distributed Perceptually Important Point Identification for Time Series Data Mining
Authors: Tak-Chung Fu, Ying-Kit Hung, Fu-Lai Chung
Abstract:
In the field of time series data mining, the concept of the Perceptually Important Point (PIP) identification process was first introduced in 2001. The process originally worked for financial time series pattern matching and was then found suitable for time series dimensionality reduction and representation. Its strength lies in preserving the overall shape of the time series by identifying the salient points in it. With the rise of Big Data, time series data contributes a major proportion, especially data generated by sensors in the Internet of Things (IoT) environment. Given the nature of PIP identification and its successful cases, it is worth further exploring the opportunity to apply PIP to time series ‘Big Data’. However, the performance of PIP identification has always been considered a limitation when dealing with ‘Big’ time series data. In this paper, two distributed versions of PIP identification based on the Specialized Binary (SB) Tree are proposed. The proposed approaches solve the bottleneck of running the PIP identification process on a standalone computer, and the distributed versions obtain an improvement in terms of speed.
Keywords: distributed computing, performance analysis, Perceptually Important Point identification, time series data mining
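As background for the entry above, the sketch below shows the basic standalone PIP procedure that the proposed distributed SB-Tree versions are designed to speed up: the two endpoints are fixed as PIPs, and the point with the largest vertical distance to the chord joining its neighbouring PIPs is added repeatedly. This is a minimal illustrative sketch in Python, not the authors' distributed implementation, and the vertical-distance criterion is only one of several PIP distance measures in the literature.

```python
import numpy as np

def pip_identify(series, n_pips):
    """Identify n_pips perceptually important points in a 1-D series."""
    n = len(series)
    x = np.arange(n, dtype=float)
    pips = [0, n - 1]                      # endpoints are always PIPs
    while len(pips) < n_pips:
        best_idx, best_dist = None, -1.0
        pips_sorted = sorted(pips)
        for left, right in zip(pips_sorted, pips_sorted[1:]):
            # chord joining two adjacent PIPs
            slope = (series[right] - series[left]) / (x[right] - x[left])
            for i in range(left + 1, right):
                # vertical distance from candidate point to the chord
                line_y = series[left] + slope * (x[i] - x[left])
                dist = abs(series[i] - line_y)
                if dist > best_dist:
                    best_dist, best_idx = dist, i
        if best_idx is None:               # every point is already a PIP
            break
        pips.append(best_idx)
    return sorted(pips)

# toy usage: compress a noisy sine wave to 7 salient points
t = np.linspace(0, 4 * np.pi, 200)
y = np.sin(t) + 0.05 * np.random.randn(200)
print(pip_identify(y, 7))
```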
Procedia PDF Downloads 435
41536 Analysis of the Impact of Climate Change on Maize (Zea Mays) Yield in Central Ethiopia
Authors: Takele Nemomsa, Girma Mamo, Tesfaye Balemi
Abstract:
Climate change refers to a change in the state of the climate that can be identified (e.g., using statistical tests) by changes in the mean and/or variance of its properties and that persists for an extended period, typically decades or longer. In Ethiopia, maize production in relation to climate change at regional and sub-regional scales has not been studied in detail. Thus, this study aimed to analyse the impact of climate change on maize yield in Ambo Districts, Central Ethiopia. To this effect, weather data, soil data, and maize experimental data for the Arganne hybrid were used. APSIM software was used to investigate the response of maize (Zea mays) yield to different agronomic management practices using current and future (2020s–2080s) climate data. Climate change projection data, downscaled using SDSM, were used as the climate input for the impact analysis. Compared to agronomic practices, the impact of climate change on Arganne in Central Ethiopia is minute. However, in the Ambo area, the yield of the Arganne hybrid is projected to decline by 1.06% to 2.02% in the 2020s and by 1.56% in the 2050s, while in the 2080s it is projected to increase by 1.03% to 2.07%. Thus, to adapt to the changing climate, farmers should consider increasing plant density and fertilizer rate per hectare.
Keywords: APSIM, downscaling, response, SDSM
Procedia PDF Downloads 383
41535 Techno-Economic Analysis of Solar Energy for Cathodic Protection of Oil and Gas Buried Pipelines in Southwestern of Iran
Authors: M. Goodarzi, M. Mohammadi, A. Gharib
Abstract:
Solar energy is a renewable energy source which has attracted special attention in many countries. Solar cathodic protection systems harness the sun’s energy to protect underground pipelines and tanks from galvanic corrosion. The object of this study is the design and economic analysis of a cathodic protection system by impressed current, supplied by solar energy panels and applied to underground pipelines. In the present study, the technical and economic analysis of using solar energy for a cathodic protection system in southwestern Iran (Khuzestan province) is investigated. For this purpose, ecological conditions such as weather data, air clearness, and sunshine hours are analyzed. The economic analyses were done using computer code to investigate the feasibility of using various energy sources for the cathodic protection system. The overall research methodology is divided into four components: data collection, design of elements, techno-economic evaluation, and output analysis. According to the results, solar renewable energy systems can supply adequate power for cathodic protection purposes.
Keywords: renewable energy, solar energy, solar cathodic protection station, lifecycle cost method
Procedia PDF Downloads 542
41534 Role of Vocational Education and Training in Economic Excellence and Social Inclusion
Authors: Muhammad Ali Asadullah, Zafarullah Amir
Abstract:
In recent years, Vocational Education and Training (VET) has been under discussion by academic researchers and has remained in focus on political grounds. Due to the potential contribution of VET, the World Bank and the United Nations Educational, Scientific and Cultural Organization (UNESCO) support vocational education to reduce poverty, enhance economic growth, and increase competitiveness. This paper examines the impact of Vocational Education and Training on economic growth and social inclusion, with both a direct effect and a mediation effect through social inclusion. The basic purpose of this study is to assess the economic pay-offs of long-term investments in VET. Based on the review of Anderson Nilsson, we initially explored the increasing or decreasing trend in investment in VET. Further, the study explores whether the countries that invest more in VET tend to achieve more economic growth and are socially more ‘inclusive’. It is a longitudinal/panel data study with 12 years of registered data covering 24 OECD countries. The results of the study indicate that VET has a positive association with social inclusion and economic growth. Further, there is also a positive association between VET and economic growth through the mediation of social inclusion. The current study considers not only issues and challenges in developing VET systems but also contributes a theoretical framework for considering how VET can directly and indirectly improve economic growth and social inclusion. A wider appreciation of how VET’s benefits operate may influence a country’s decisions to invest in it. If policy makers increase investment in VET, the result would be positive for economic growth and social inclusion. It is also recommended that the same OECD model be implemented in developing countries like Pakistan.
Keywords: Vocational Education and Training (VET), Social Inclusion, Economic Growth, OECD countries
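Since the abstract describes a direct effect plus a mediated path through social inclusion in a country-year panel, a minimal fixed-effects sketch of that design is given below. All file and column names are hypothetical, and the two-step mediation check shown (with country and year dummies) is only one plausible reading of the analysis, not the authors' exact specification.

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical long-format panel: one row per country-year (names assumed)
df = pd.read_csv("vet_panel.csv")  # columns: country, year, vet_invest,
                                   # social_inclusion, gdp_growth

# step 1: does VET investment predict the mediator (social inclusion)?
m1 = smf.ols("social_inclusion ~ vet_invest + C(country) + C(year)",
             data=df).fit()

# step 2: growth on VET and the mediator; vet_invest picks up the direct
# effect, social_inclusion the mediated path
m2 = smf.ols("gdp_growth ~ vet_invest + social_inclusion + C(country) + C(year)",
             data=df).fit()

print(m1.params["vet_invest"])
print(m2.params[["vet_invest", "social_inclusion"]])
```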
Procedia PDF Downloads 310
41533 Investigation of Nutritional Values, Sensorial, Flesh Productivity of Parapenaus longirostris between Populations in the Sea of Marmara and in the Northern Aegean Sea
Authors: Onur Gönülal, Zafer Ceylan, Gülgün F. Unal Sengor
Abstract:
The differences between Parapenaus longirostris caught from the North Aegean Sea and the Marmara Sea in proximate composition, sensorial properties (for raw and cooked samples), and flesh productivity were investigated. The moisture, protein, lipid, ash, carbohydrate, and energy contents of shrimp caught from the North Aegean Sea were 74.92 ± 0.1, 20.32 ± 0.16, 2.55 ± 0.1, 2.13 ± 0.08, 0.08, and 110.1 kcal/100g, respectively. The corresponding values for shrimp caught from the Marmara Sea were 76.9 ± 0.02, 19.06 ± 0.03, 2.22 ± 0.08, 1.51 ± 0.04, 0.33, and 102.77 kcal/100g, respectively. The protein, lipid, ash, and energy values of the North Aegean Sea shrimp were higher than those of the Marmara Sea shrimp, while the moisture and carbohydrate values were lower. Sensorial analysis was done for raw and cooked samples. For raw samples, flesh color, connective tissue, and body parameters were found to differ according to the results of the panel. For cooked samples, cooked odour, flavour, and texture were found to differ as well; in particular, the flavour and textural properties of cooked North Aegean Sea shrimp scored higher than those of the Marmara Sea shrimp. Flesh productivity of North Aegean Sea shrimp was found to be 46.42%, while that of the Marmara Sea shrimp was 47.74%.
Keywords: shrimp, biological differences, proximate value, sensory, Parapenaus longirostris, flesh productivity
Procedia PDF Downloads 279
41532 Evaluating Performance of an Anomaly Detection Module with Artificial Neural Network Implementation
Authors: Edward Guillén, Jhordany Rodriguez, Rafael Páez
Abstract:
Anomaly detection techniques focus on two main components: the first is data extraction and selection, and the second is the analysis performed over the obtained data. The goal of this paper is to analyze the influence that each of these components has on system performance by evaluating detection over network scenarios with different setups. The independent variables are as follows: the number of system inputs, the way the inputs are codified, and the complexity of the analysis techniques. For the analysis, several artificial neural network approaches are implemented with different numbers of layers. The obtained results show the influence that each of these variables has on system performance.
Keywords: network intrusion detection, machine learning, artificial neural network, anomaly detection module
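A minimal sketch of the kind of experiment described, varying the depth of the network while holding the rest fixed, might look as follows in Python with scikit-learn. The synthetic data stands in for codified network-traffic features; the layer sizes and other settings are illustrative assumptions, not the authors' configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score

# synthetic stand-in for codified network-traffic features (1 = anomaly)
X, y = make_classification(n_samples=3000, n_features=20, weights=[0.9],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_tr)

# vary analysis complexity: a shallow vs. a deeper multilayer perceptron
for layers in [(32,), (64, 32, 16)]:
    clf = MLPClassifier(hidden_layer_sizes=layers, max_iter=500, random_state=0)
    clf.fit(scaler.transform(X_tr), y_tr)
    pred = clf.predict(scaler.transform(X_te))
    print(layers, "F1 =", round(f1_score(y_te, pred), 3))
```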
Procedia PDF Downloads 343
41531 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles
Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis
Abstract:
Organizations, including governments, generate (big) data that are high in volume, velocity, and veracity, and that come from a variety of sources. Public administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data-related integrated services, provide insights to users, and support good governance. Government (big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services such as data storage and hosting to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of the government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationships in the government (big) data ecosystem. We also discuss our research findings. We did not find many published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we drew ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data, and industry data.
Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review
Procedia PDF Downloads 162
41530 Exploiting Kinetic and Kinematic Data to Plot Cyclograms for Managing the Rehabilitation Process of BKAs by Applying Neural Networks
Authors: L. Parisi
Abstract:
Kinematic data correlate vector quantities in space with scalar parameters in time to assess the degree of symmetry between the intact limb and the amputated limb with respect to a normal model derived from the gait of control group participants. Furthermore, these data allow a doctor to preliminarily evaluate the usefulness of a certain rehabilitation therapy. Kinetic curves allow the analysis of ground reaction forces (GRFs) to assess the appropriateness of human motion. Electromyography (EMG) allows the analysis of the fundamental lower limb force contributions to quantify the level of gait asymmetry. However, the use of this technological tool is expensive and requires the patient’s hospitalization. This research work suggests overcoming the above limitations by applying artificial neural networks.
Keywords: kinetics, kinematics, cyclograms, neural networks, transtibial amputation
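For readers unfamiliar with cyclograms: a cyclogram plots one joint angle against another over the gait cycle, so limb symmetry shows up as overlap between the two closed curves. The sketch below generates synthetic angle traces purely for illustration; the shapes and offsets are assumptions, not patient data.

```python
import numpy as np
import matplotlib.pyplot as plt

# synthetic joint-angle traces over one gait cycle (degrees, invented shapes)
phase = np.linspace(0, 2 * np.pi, 101)
hip = 20 * np.cos(phase)
knee_intact = 35 + 30 * np.sin(phase)
knee_amputated = 33 + 24 * np.sin(phase + 0.15)   # smaller, phase-shifted

# overlay the hip-knee cyclograms of both limbs; divergence between the
# loops is a visual index of gait asymmetry
plt.plot(hip, knee_intact, label="intact limb")
plt.plot(hip, knee_amputated, label="amputated limb")
plt.xlabel("hip angle (deg)")
plt.ylabel("knee angle (deg)")
plt.legend()
plt.title("Hip-knee cyclogram")
plt.show()
```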
Procedia PDF Downloads 443
41529 Integration of Big Data to Predict Transportation for Smart Cities
Authors: Sun-Young Jang, Sung-Ah Kim, Dongyoun Shin
Abstract:
Intelligent transportation systems are essential to building smarter cities. Machine learning-based transportation prediction could be a highly promising approach, making invisible aspects visible. In this context, this research aims to build a prototype model that predicts the transportation network by using big data and machine learning technology. In detail, among urban transportation systems, this research focuses on the bus system. The research problem is that the existing headway model cannot respond to dynamic transportation conditions, so bus delays often occur. To overcome this problem, a prediction model is presented to find patterns of bus delay by using machine learning on the following data sets: traffic, weather, and bus status. This research presents a flexible headway model to predict bus delay and analyzes the result. The prototype model is composed of real-time bus data gathered through public data portals and a real-time Application Program Interface (API) provided by the government. These data are the fundamental resources for organizing interval pattern models of bus operations as traffic environment factors (road speeds, station conditions, weather, and real-time bus operating information). The prototype model was designed with a machine learning tool (RapidMiner Studio), and tests were conducted for bus delay prediction. This research presents experiments to increase the prediction accuracy for bus headway by analyzing urban big data. Big data analysis is important for predicting the future and finding correlations by processing huge amounts of data. Therefore, based on the analysis method, this research represents an effective use of machine learning and urban big data to understand urban dynamics.
Keywords: big data, machine learning, smart city, social cost, transportation network
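A compact sketch of the supervised setup described, one delay-prediction model trained on merged traffic, weather, and bus-status features, is given below. The abstract's experiments used RapidMiner Studio; this Python/scikit-learn version, with hypothetical file and column names, only illustrates the shape of the task.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# hypothetical merged feed: one row per stop event (all column names assumed,
# categorical fields already numerically encoded)
df = pd.read_csv("bus_records.csv")
features = ["road_speed", "station_load", "rain_mm", "temp_c", "hour", "route_id"]
X, y = df[features], df["delay_seconds"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("MAE (s):", mean_absolute_error(y_te, model.predict(X_te)))
```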
Procedia PDF Downloads 260
41528 A Knee Modular Orthosis Design Based on Kinematic Considerations
Authors: C. Copilusi, C. Ploscaru
Abstract:
This paper draws attention to research regarding the design of a knee orthosis in a modular form used in children’s walking rehabilitation. The research is focused on kinematic analysis of the human lower limb, which is used as input data for virtual simulations and prototype validation. From this analysis, important data are obtained and used as input for virtual simulations of the knee modular orthosis. Thus, a knee orthosis concept was obtained and validated through virtual simulations using MSC Adams software. Based on the obtained results, the modular orthosis prototype will be manufactured and is presented in this article.
Keywords: human lower limb, children orthoses, kinematic analysis, knee orthosis
Procedia PDF Downloads 287
41527 The European Pharmacy Market: The Density and its Influencing Factors
Authors: Selina Schwaabe
Abstract:
Community pharmacies deliver high-quality health care and are responsible for medication safety. During the pandemic, accessibility to the nearest pharmacy became more essential for getting vaccinated against COVID-19 and receiving medical aid. The government’s goal is to ensure nationwide, reachable, and affordable medical health care services through pharmacies. Therefore, the density of community pharmacies matters. Overall, the density of community pharmacies is fluctuating, with slightly decreasing tendencies in some countries. So far, the literature has shown that changes in the system affect prices and density. However, a European overview of the development of the density of community pharmacies and its triggers is still missing. This research is essential to counteract decreasing density, which results in a lack of professional health care through pharmacies. The analysis focuses on liberal versus regulated market structures, mail-order prescription drug regulation, and third-party ownership consequences. In a panel analysis, the relative influence of these measures is examined across 27 European countries over the last 21 years. In addition, the paper examines seven selected countries in depth, chosen for the substantial variance in their pharmacy systems: Germany, Austria, Portugal, Denmark, Sweden, Finland, and Poland. Overall, the results show that regulated pharmacy markets have over 10.75 pharmacies/100,000 inhabitants more than liberal markets. Further, mail-order prescription drugs decrease the density by 17.98 pharmacies/100,000 inhabitants. Countries allowing third-party ownership have 7.67 pharmacies/100,000 inhabitants more. The results are statistically significant at the 0.001 level. The output of this analysis recommends regulated pharmacy markets with a ban on mail-order prescription drugs and allowing third-party ownership, to support nationwide medical health care through community pharmacies.
Keywords: community pharmacy, market conditions, pharmacy, pharmacy market, pharmacy lobby, prescription, e-prescription, ownership structures
Procedia PDF Downloads 132
41526 Measurement of Influence of the COVID-19 Pandemic on Efficiency of Japan’s Railway Companies
Authors: Hideaki Endo, Mika Goto
Abstract:
The global outbreak of the COVID-19 pandemic has seriously affected railway businesses. The number of railway passengers decreased due to the decline in the number of commuters and business travelers seeking to avoid crowded trains, and a sharp drop in inbound tourists visiting Japan. This has affected not only railway businesses but also related businesses, including hotels, leisure businesses, and retail businesses at station buildings. In 2021, the companies were divided into profitable and loss-making companies. This division suggests that railway companies, particularly loss-making companies, needed to decrease operational inefficiency. To measure the impact of COVID-19 and discuss the sustainable management strategies of railway companies, we examine the cost inefficiency of Japanese listed railway companies by applying stochastic frontier analysis (SFA) to their operational and financial data. First, we employ the stochastic frontier cost function approach to measure inefficiency. The cost frontier function is formulated as a Cobb–Douglas type, and we estimate the parameters and the variables explaining inefficiency. This study uses panel data comprising 26 Japanese listed railway companies from 2005 to 2020. This period includes several events that deteriorated the business environment, such as the financial crisis of 2007 to 2008 and the Great East Japan Earthquake of 2011, and we compare those impacts with that of the COVID-19 pandemic after 2020. Second, we identify the characteristics of the best-practice railway companies and examine the drivers of cost inefficiencies. Third, we analyze the factors influencing cost inefficiency by comparing the profiles of the top 10 railway companies and the others before and during the pandemic. Finally, we examine the relationship between cost inefficiency and the implementation of efficiency measures for each railway company. We obtained the following four findings. First, most Japanese railway companies showed the lowest cost inefficiency (most efficient) in 2014 and the highest in 2020 (least efficient) during the COVID-19 pandemic. The second worst occurred in 2009, when they were affected by the financial crisis. However, we did not observe a significant impact of the 2011 Great East Japan Earthquake, because no railway company except JR-EAST had its operating area affected by the earthquake. Second, the best-practice railway companies are KEIO and TOKYU. The main reason for their good performance is that both operate in and near the Tokyo metropolitan area, which is densely populated. Third, we found that non-best-practice companies had a larger decrease in passenger kilometers than best-practice companies. This indicates that passengers made fewer long-distance trips because they refrained from inter-prefectural travel during the pandemic. Finally, we found that companies that implemented more efficiency improvement measures had higher cost efficiency and effectively used their customer databases through proactive DX investments in marketing and asset management.
Keywords: COVID-19 pandemic, stochastic frontier analysis, railway sector, cost efficiency
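The abstract names the ingredients of the estimation, a Cobb–Douglas stochastic cost frontier on panel data, without writing it out. A common form of such a frontier, given here as an assumed reconstruction rather than the authors' exact specification, is:

```latex
\ln C_{it} = \beta_0 + \sum_{k} \beta_k \ln x_{k,it} + v_{it} + u_{it},
\qquad v_{it} \sim N(0,\sigma_v^2), \quad u_{it} \ge 0,
```

where $C_{it}$ is the cost of company $i$ in year $t$, the $x_{k,it}$ are output and input-price variables, $v_{it}$ is statistical noise, and $u_{it}$ is a one-sided (e.g., half-normal) term capturing cost inefficiency; cost efficiency is then recovered as $\exp(-u_{it})$.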
Procedia PDF Downloads 74
41525 Language Errors Used in “The Space between Us” Movie and Their Effects on Translation Quality: Translation Study toward Discourse Analysis Approach
Authors: Mochamad Nuruz Zaman, Mangatur Rudolf Nababan, M. A. Djatmika
Abstract:
Both society and education teach good communication in order to build up interpersonal skills. Everyone has the capacity to understand something new, with either good comprehension or poor understanding. Poor understanding produces language errors when people interact for the first time without knowing each other beforehand because of the distance between them. The movie “The Space between Us” delivers a love-adventure story between a Mars boy and an Earth girl, with many misunderstood conversations caused by their different climates and environments. The moviegoer must also focus on the subtitles in order to fully enjoy the movie. Furthermore, the Indonesian subtitles and the English dialogue in the movie still show overlapping understanding in the translation. Translation here consists of a source language (SL: the English dialogue) and a target language (TL: the Indonesian subtitles). This research gap is formulated in the research question of how the language errors happened in the movie and what their effects are on translation quality, analyzed through a translation study with a discourse analysis approach. The research goal is to lay out the language errors and their translation qualities in order to create a good atmosphere in movie media. The research uses an embedded qualitative design. The research locations consist of setting, participant, and event as the focused, determined boundary. The sources of data are the movie “The Space between Us” and informants (translation quality raters). The sampling is criterion-based (purposive) sampling. Data collection techniques use content analysis and questionnaires. Data validation applies data source and method triangulation. Data analysis covers domain, taxonomy, componential, and cultural theme analysis. The language errors found in the movie are referential, register, society, textual, receptive, expressive, individual, group, analogical, transfer, local, and global errors. Their effects on translation quality are discussed through the translation techniques identified in the findings: amplification, borrowing, description, discursive creation, established equivalent, generalization, literal translation, modulation, particularization, reduction, substitution, and transposition.
Keywords: discourse analysis, language errors, The Space between Us movie, translation techniques, translation quality instruments
Procedia PDF Downloads 219
41524 Simulations to Predict Solar Energy Potential by ERA5 Application at North Africa
Authors: U. Ali Rahoma, Nabil Esawy, Fawzia Ibrahim Moursy, A. H. Hassan, Samy A. Khalil, Ashraf S. Khamees
Abstract:
The design of any solar energy conversion system requires knowledge of solar radiation data obtained over a long period. Satellite data have been widely used to estimate solar energy where no ground observation of solar radiation is available, yet there are limitations on the temporal coverage of satellite data. Reanalysis is a “retrospective analysis” of atmospheric parameters generated by assimilating observation data from various sources, including ground observations, satellites, ships, and aircraft, with the output of NWP (Numerical Weather Prediction) models, to develop an exhaustive record of weather and climate parameters. The performance of the reanalysis dataset (ERA-5) for North Africa was evaluated against high-quality surface-measured data using statistical analysis. The global solar radiation (GSR) distribution was estimated over six selected locations in North Africa during the ten-year period from 2011 to 2020. The root mean square error (RMSE), mean bias error (MBE), and mean absolute error (MAE) of the reanalysis solar radiation data range from 0.079 to 0.222, 0.0145 to 0.198, and 0.055 to 0.178, respectively. A seasonal statistical analysis was performed to study the seasonal variation in the performance of the datasets, which reveals significant variation of errors across seasons. The performance of the dataset also changes with the temporal resolution of the data used for comparison: monthly mean values show better performance, but the accuracy of the data is compromised. The solar radiation data of ERA-5 are used for preliminary solar resource assessment and power estimation. The correlation coefficient (R2) varies from 0.93 to 0.99 for the different selected sites in North Africa in the present research. The goal of this research is to give a good representation of global solar radiation to help in solar energy applications in all fields, using gridded data from the European Centre for Medium-Range Weather Forecasts (ECMWF) and producing a new model that gives good results.
Keywords: solar energy, solar radiation, ERA-5, potential energy
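The validation statistics quoted above are straightforward to compute; a small sketch follows. The toy numbers are invented, and the paper's figures are likely normalized variants of these metrics, so this only shows the definitions.

```python
import numpy as np

def validation_stats(obs, est):
    """RMSE, MBE, MAE and R^2 between observed and estimated radiation."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    err = est - obs
    rmse = np.sqrt(np.mean(err ** 2))
    mbe = np.mean(err)                     # sign shows over-/underestimation
    mae = np.mean(np.abs(err))
    r2 = np.corrcoef(obs, est)[0, 1] ** 2  # squared correlation coefficient
    return rmse, mbe, mae, r2

# invented daily GSR values (kWh/m^2) purely for illustration
ground = [5.1, 6.0, 6.4, 5.8, 4.9]
era5 = [5.0, 6.2, 6.3, 5.5, 5.1]
print(validation_stats(ground, era5))
```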
Procedia PDF Downloads 211
41523 Exploring Marine Bacteria in the Arabian Gulf Region for Antimicrobial Metabolites
Authors: Julie Connelly, Tanvi Toprani, Xin Xie, Dhinoth Kumar Bangarusamy, Kris C. Gunsalus
Abstract:
The overuse of antibiotics worldwide has contributed to the development of multi-drug resistant (MDR) pathogenic bacterial strains. There is an increasing urgency to discover antibiotics to combat MDR pathogens. The microbiome of the Arabian Gulf is a largely unexplored and potentially rich source of novel bioactive compounds. Microbes that inhabit the Abu Dhabi coastal regions adapt to extreme environments with high salinity, hot temperatures, large temperature fluctuations, and acute exposure to solar energy. The microbes native to this region may therefore produce unique metabolites with therapeutic potential as antibiotics and antifungals. We have isolated 200 pure bacterial strains from mangrove sediments, cyanobacterial mats, and coral reefs of the Abu Dhabi region. In this project, we aim to screen these marine bacterial strains to identify antibiotics, in particular undocumented compounds that show activity against existing antibiotic-resistant strains. We have acquired the ESKAPE pathogen panel, which consists of six antibiotic-resistant gram-positive and gram-negative bacterial pathogens that collectively cause most clinical infections. Our initial primary-screen efforts using a colony-picking co-culture assay have identified several candidate marine strains producing potential antibiotic compounds. We will next apply different assays, including disk-diffusion and broth turbidity growth assays, to confirm the results. This will be followed by bioactivity-guided purification and characterization of target compounds from scaled-up cultures of candidate strains, including SPE fractionation, HPLC fractionation, LC-MS, and NMR. For antimicrobial compounds with unknown structures, our final goal is to investigate their mode of action by identifying the molecular target.
Keywords: marine bacteria, natural products, drug discovery, ESKAPE panel
Procedia PDF Downloads 75
41522 Text Analysis to Support Structuring and Modelling a Public Policy Problem-Outline of an Algorithm to Extract Inferences from Textual Data
Authors: Claudia Ehrentraut, Osama Ibrahim, Hercules Dalianis
Abstract:
Policy-making situations are real-world problems that exhibit complexity in that they are composed of many interrelated problems and issues. To be effective, policies must holistically address the complexity of the situation rather than propose solutions to single problems. Formulating and understanding the situation and its complex dynamics, therefore, is key to finding holistic solutions. Analysis of text-based information on the policy problem, using Natural Language Processing (NLP) and text analysis techniques, can support modelling of public policy problem situations in a more objective way, based on domain experts’ knowledge and scientific evidence. The objective of this study is to support modelling of public policy problem situations using text analysis of verbal descriptions of the problem. We propose a formal methodology for analysing qualitative data from multiple information sources on a policy problem to construct a causal diagram of the problem. The analysis process aims at identifying key variables, linking them by cause-effect relationships, and mapping that structure into a graphical representation that is adequate for designing action alternatives, i.e., policy options. This study describes the outline of an algorithm used to automate the initial step of a larger methodological approach, which is so far done manually. In this initial step, inferences about key variables and their interrelationships are extracted from textual data to support better problem structuring. A small prototype for this step is also presented.
Keywords: public policy, problem structuring, qualitative analysis, natural language processing, algorithm, inference extraction
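As a flavour of what the initial automated step might do, the toy sketch below pulls (cause, effect) candidate pairs out of sentences using surface cue phrases. The real algorithm outlined in the paper relies on NLP techniques; this regex version is a deliberately naive stand-in with invented patterns and sentences.

```python
import re

# naive cue-phrase patterns for cause-effect extraction (illustrative only)
PATTERNS = [
    re.compile(r"(?P<cause>[\w\s]+?)\s+(?:leads to|causes|results in)\s+(?P<effect>[\w\s]+)", re.I),
    re.compile(r"(?P<effect>[\w\s]+?)\s+(?:is caused by|results from)\s+(?P<cause>[\w\s]+)", re.I),
]

def extract_inferences(sentences):
    """Return (cause, effect) candidate pairs found via cue phrases."""
    pairs = []
    for sentence in sentences:
        for pattern in PATTERNS:
            match = pattern.search(sentence)
            if match:
                pairs.append((match.group("cause").strip(),
                              match.group("effect").strip()))
    return pairs

text = ["Unemployment leads to poverty.",
        "Congestion is caused by urban growth."]
print(extract_inferences(text))   # candidate edges for the causal diagram
```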
Procedia PDF Downloads 589
41521 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-
Authors: Nieto Bernal Wilson, Carmona Suarez Edgar
Abstract:
Organizations have structured and unstructured information in different formats, sources, and systems. Part of this information comes from ERP systems under OLTP processing that support the information system; however, at the OLAP processing level these organizations present some deficiencies. Part of the problem lies in the lack of interest in extracting knowledge from their data sources, as well as the absence of operational capabilities to tackle these kinds of projects. Data warehouses and their applications are considered non-proprietary tools, which are of great interest to business intelligence, since they are the repository basis for creating models or patterns (behavior of customers, suppliers, products, social networks, and genomics) and facilitate corporate decision making and research. The following paper presents a simple, structured methodology inspired by agile development models such as Scrum, XP, and AUP. It also draws on object-relational models, spatial data models, and the baseline of data modeling under UML and Big Data, thereby seeking to deliver an agile methodology for the development of data warehouses that is simple and easy to apply. The methodology naturally takes into account processes for information analysis, visualization, and data mining, particularly for pattern generation and models derived from the structured fact objects.
Keywords: data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse
Procedia PDF Downloads 409
41520 Measurement of Operational and Environmental Performance of the Coal-Fired Power Plants in India by Using Data Envelopment Analysis
Authors: Vijay Kumar Bajpai, Sudhir Kumar Singh
Abstract:
In this study, performance analyses of twenty-five coal-fired power plants (CFPPs) used for electricity generation are carried out through various data envelopment analysis (DEA) models. Three efficiency indices are defined and pursued. During the calculation of the operational performance, energy and non-energy variables are used as inputs, and net electricity produced is used as the desired output. CO2 emitted to the environment is used as the undesired output in the computation of the pure environmental performance, while in Model 3 CO2 emissions are considered a detrimental input in the calculation of operational and environmental performance. Empirical results show that most of the plants are operating in the increasing returns-to-scale region and that the Mettur plant is efficient with regard to energy use and the environment. The results also indicate that the undesirable output effect is insignificant in the research sample. The present study will provide clues to plant operators towards raising the operational and environmental performance of CFPPs.
Keywords: coal fired power plants, environmental performance, data envelopment analysis, operational performance
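For concreteness, a minimal input-oriented CCR envelopment model, the basic building block behind DEA efficiency indices like those used here, can be solved as a linear program. The sketch below uses invented plant data and ignores the paper's specific treatment of CO2 as an undesirable output, so it illustrates the mechanics rather than reproduces the study's models.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, j0):
    """Input-oriented CCR efficiency of unit j0.

    X: (m, n) inputs, Y: (s, n) outputs for n units. Solves
      min theta  s.t.  X @ lam <= theta * X[:, j0],  Y @ lam >= Y[:, j0],
    with lam >= 0. Decision vector: [theta, lam_1 ... lam_n].
    """
    m, n = X.shape
    s = Y.shape[0]
    c = np.r_[1.0, np.zeros(n)]                 # minimise theta
    # input rows:  X @ lam - theta * x0 <= 0
    A_in = np.hstack([-X[:, [j0]], X])
    b_in = np.zeros(m)
    # output rows: -Y @ lam <= -y0
    A_out = np.hstack([np.zeros((s, 1)), -Y])
    b_out = -Y[:, j0]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]), b_ub=np.r_[b_in, b_out],
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.fun                              # efficiency score in (0, 1]

# toy data: 2 inputs (fuel, capacity), 1 output (net electricity), 4 plants
X = np.array([[4.0, 2.0, 5.0, 3.0], [3.0, 1.0, 4.0, 2.0]])
Y = np.array([[2.0, 1.0, 3.0, 2.0]])
print([round(ccr_efficiency(X, Y, j), 3) for j in range(4)])
```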
Procedia PDF Downloads 455
41519 Changing Subjective Well-Being and Social Trust in China: 2010-2020
Authors: Mengdie Ruan
Abstract:
The authors investigate how subjective well-being (SWB) and social trust changed in China over the period 2010–2020 by relying on data from six rounds of the China Family Panel Studies (CFPS). They then re-examine Easterlin’s hypothesis for China, with more focus on the role of social trust, and estimate income-compensating differentials for social trust. They find that the evolution of well-being is not sensitive to the measure of well-being one uses. Specifically, self-reported life satisfaction scores and hedonic happiness scores experienced a significant increase across all income groups from 2010 to 2020. Based on the CFPS, social trust in China seems to have increased for all socioeconomic classes in recent years, and male, urban-resident individuals with higher incomes have higher social trust both at a given point in time and over time. However, when an alternative measure of social trust is used, out-group trust, which is a more valid measure of generalized trust and genuinely represents “most people”, social trust in China actually declines, and its level is extremely low. In addition, this paper suggests that in the typical survey question on social trust, the term “most people” mostly denotes in-groups in China, which contrasts sharply with most Western countries, where it predominantly connotes out-groups. An individual fixed-effects analysis of well-being that controls for time-invariant variables reveals that social trust and relative social status are important correlates of life satisfaction and happiness, whereas absolute income plays a limited role in boosting an individual’s well-being. The income-equivalent value of social capital is approximately a tripling of income. Women, urban and coastal residents, people with higher incomes, young people, and those with higher education care more about social trust in China, irrespective of the SWB measure used. Policy aiming at preserving and enhancing SWB should focus on social capital besides economic growth.
Keywords: subjective well-being, life satisfaction, happiness, social trust, China
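The income-compensating differential mentioned above is conventionally derived from a fixed-effects well-being regression; the formulation below is a standard construction offered as an assumed reading of the method, not the authors' printed equations. With

```latex
SWB_{it} = \alpha_i + \beta_1 \ln Y_{it} + \beta_2\,Trust_{it} + \gamma' X_{it} + \varepsilon_{it},
```

the income change that compensates a trust change $\Delta Trust$ solves $\beta_1 \Delta \ln Y = \beta_2 \Delta Trust$, giving the income ratio $Y^{*}/Y = \exp(\beta_2 \Delta Trust / \beta_1)$; a value near 3 corresponds to the reported "approximately a tripling of income".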
Procedia PDF Downloads 77
41518 Traditional Chinese Medicine Treatment for Coronary Heart Disease: A Meta-Analysis
Abstract:
Traditional Chinese medicine has been used in the treatment of coronary heart disease (CHD) for centuries, and in recent years the volume of clinical trial data on its efficacy has gradually increased, aiming to explore its real efficacy and underlying pharmacology. However, due to the complexity of traditional Chinese medicine prescriptions, the efficacy of each component is difficult to clarify, and pharmacological research is challenging. This study aims to systematically review and clarify the clinical efficacy of traditional Chinese medicine in the treatment of coronary heart disease through a meta-analysis. Based on PubMed, the CNKI database, Wanfang Data, and other databases, eleven randomized controlled trials with 1091 CHD subjects were included. Two researchers conducted a systematic review of the papers and performed a meta-analysis supporting a positive therapeutic effect of traditional Chinese medicine in the treatment of CHD.
Keywords: coronary heart disease, Chinese medicine, treatment, meta-analysis
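The pooling step of such a meta-analysis is compact enough to sketch. Below is a fixed-effect inverse-variance combination of per-trial effect sizes in Python; the numbers are fabricated for illustration and are not the trials analyzed in the study, which may also have used a random-effects model.

```python
import numpy as np

def fixed_effect_pool(effects, ses):
    """Inverse-variance fixed-effect pooling of per-trial effect sizes.

    effects: per-trial effects (e.g., log risk ratios), ses: standard errors.
    Returns pooled effect, its SE, and a 95% confidence interval.
    """
    effects, ses = np.asarray(effects, float), np.asarray(ses, float)
    w = 1.0 / ses ** 2                 # weight = inverse variance
    pooled = np.sum(w * effects) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return pooled, se, (pooled - 1.96 * se, pooled + 1.96 * se)

# made-up log risk ratios from hypothetical trials (illustration only)
log_rr = [-0.22, -0.35, -0.10, -0.28]
se = [0.12, 0.15, 0.11, 0.18]
est, se_p, ci = fixed_effect_pool(log_rr, se)
print(np.exp(est), np.exp(ci[0]), np.exp(ci[1]))   # back to risk-ratio scale
```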
Procedia PDF Downloads 123
41517 Government Big Data Ecosystem: A Systematic Literature Review
Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis
Abstract:
Data that are high in volume, velocity, and veracity and come from a variety of sources are generated in all sectors, including the government sector. Globally, public administrations are pursuing (big) data as a new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can lead to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to its successful implementation. The essential aspects of the government (big) data ecosystem include its definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the government data ecosystem literature. As this is a new topic, we did not find specific articles on the government (big) data ecosystem and therefore focused our research on various relevant areas such as humanitarian data, open government data, scientific research data, and industry data.
Keywords: applications of big data, big data, big data types, big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review
Procedia PDF Downloads 229
41516 A Machine Learning Decision Support Framework for Industrial Engineering Purposes
Authors: Anli Du Preez, James Bekker
Abstract:
Data is currently one of the most critical and influential emerging technologies. However, the true potential of data is yet to be exploited since, currently, only about 1% of generated data is ever actually analyzed for value creation. There is a data gap where data is not explored due to the lack of data analytics infrastructure and the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen’s framework development methodology. The study focused on machine learning algorithms, a subset of data analytics. The developed framework is designed to assist data analysts with little experience in choosing the appropriate machine learning algorithm given the purpose of their application.
Keywords: Data analytics, Industrial engineering, Machine learning, Value creation
Procedia PDF Downloads 168
41515 Studies of Rule Induction by STRIM from the Decision Table with Contaminated Attribute Values from Missing Data and Noise — in the Case of Critical Dataset Size —
Authors: Tetsuro Saeki, Yuichi Kato, Shoutarou Mizuno
Abstract:
STRIM (Statistical Test Rule Induction Method) has been proposed as a method to effectively induct if-then rules from a decision table, which is considered a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments specifying rules in advance and by comparison with conventional methods. However, scope for development remains before STRIM can be applied to the analysis of real-world data sets. The first requirement is to determine the size of the dataset needed for inducting true rules, since finding statistically significant rules is the core of the method. The second is to examine the capacity for rule induction from datasets with attribute values contaminated by missing data and noise, since real-world datasets usually contain such contaminated data. This paper examines the first problem theoretically, in connection with the rule length. The second problem is then examined in a simulation experiment, utilizing the critical dataset size derived from the first step. The experimental results show that STRIM is highly robust in the analysis of datasets with contaminated attribute values and hence is applicable to real-world data.
Keywords: rule induction, decision table, missing data, noise
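The statistical-test core of STRIM can be illustrated with a crude stand-in: check whether the decision-class frequency among instances matching a rule's condition is significantly higher than the class's prior frequency. STRIM's actual test statistic and thresholds differ; the sketch below, with invented counts, only conveys the idea.

```python
from scipy.stats import binomtest

def rule_is_significant(n_match, n_hit, prior, alpha=0.01):
    """Crude STRIM-style check for an if-then rule candidate.

    n_match: instances matching the rule condition
    n_hit:   of those, instances with the rule's decision class
    prior:   overall frequency of that decision class
    """
    result = binomtest(n_hit, n_match, p=prior, alternative="greater")
    return result.pvalue < alpha, result.pvalue

# e.g. 120 of 150 matching instances share the class vs. a 1/3 prior
print(rule_is_significant(150, 120, prior=1 / 3))
```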
Procedia PDF Downloads 396
41514 Multivariate Assessment of Mathematics Test Scores of Students in Qatar
Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski
Abstract:
Data on various aspects of education are collected at the institutional and government level regularly. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is the PISA study, which collects data from several countries when students are approximately 15 years of age and enables comparisons of performance in science, mathematics, and English between countries, as well as rankings of countries based on performance in these standardised tests. Beyond the student and school outcomes based on the tests taken as part of the PISA study, a wealth of other data is collected in the study, including parental demographic data and data related to the teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and the teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics, using data obtained from the PISA study.
Keywords: cluster analysis, education, mathematics, profiles
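One common multivariate route to such student profiles is cluster analysis on standardized test and background variables. The sketch below uses k-means in Python with hypothetical file and column names standing in for a PISA extract; the number of clusters is an illustrative choice, not a finding of the study.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# hypothetical PISA-style extract: one row per student (column names assumed)
df = pd.read_csv("pisa_qatar.csv")
cols = ["math_score", "reading_score", "science_score",
        "parent_education", "teacher_support"]

Z = StandardScaler().fit_transform(df[cols])          # put variables on one scale
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(Z)

df["profile"] = km.labels_
print(df.groupby("profile")[cols].mean())             # per-profile means
```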
Procedia PDF Downloads 126
41513 Classifying and Predicting Efficiencies Using Interval DEA Grid Setting
Authors: Yiannis G. Smirlis
Abstract:
The classification and prediction of efficiencies in Data Envelopment Analysis (DEA) is an important issue, especially in large-scale problems or when new units frequently enter the under-assessment set. In this paper, we contribute to the subject by proposing a grid structure based on interval segmentations of the range of values for the inputs and outputs. Such intervals, combined, define hyper-rectangles that partition the space of the problem. This structure, exploited by interval DEA models and a dominance relation, acts as a DEA pre-processor, enabling the classification and prediction of efficiency scores without applying any DEA models.
Keywords: data envelopment analysis, interval DEA, efficiency classification, efficiency prediction
Procedia PDF Downloads 164
41512 A Comparison of Image Data Representations for Local Stereo Matching
Authors: André Smith, Amr Abdel-Dayem
Abstract:
The stereo matching problem, while having been present for several decades, continues to be an active area of research. The goal of this research is to find correspondences between elements found in a set of stereoscopic images. With these pairings, it is possible to infer the distance of objects within a scene relative to the observer. Advancements in this field have led to experimentation with various techniques, from graph-cut energy minimization to artificial neural networks. At the basis of these techniques is a cost function, which is used to evaluate the likelihood of a particular match between points in each image. While at its core the cost is based on comparing the image pixel data, there is a general lack of consistency as to which image data representation to use. This paper presents an experimental analysis comparing the effectiveness of the more common image data representations. The goal is to determine how well these data representations reduce the cost for the correct correspondence relative to other possible matches.
Keywords: colour data, local stereo matching, stereo correspondence, disparity map
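A minimal version of such a cost-function comparison is sketched below: the same sum-of-absolute-differences (SAD) matching cost is evaluated under two data representations (RGB and a grey-scale reduction) on a synthetic pair with a known disparity. The images and the choice of SAD are illustrative assumptions, since the paper does not commit to one cost here.

```python
import numpy as np

def sad_cost(left, right, x, y, d, window=3):
    """Sum-of-absolute-differences matching cost at pixel (y, x), disparity d.

    left/right: H x W (x C) arrays in any colour representation (RGB, grey, ...).
    Lower cost means a more likely correspondence.
    """
    h = window // 2
    patch_l = left[y - h:y + h + 1, x - h:x + h + 1]
    patch_r = right[y - h:y + h + 1, x - d - h:x - d + h + 1]
    return np.abs(patch_l.astype(float) - patch_r.astype(float)).sum()

# compare the same cost under two representations of one synthetic pair
rgb_l = np.random.randint(0, 255, (32, 32, 3))
rgb_r = np.roll(rgb_l, -4, axis=1)                   # simulates a disparity of 4
grey_l, grey_r = rgb_l.mean(axis=2), rgb_r.mean(axis=2)
costs_rgb = [sad_cost(rgb_l, rgb_r, 16, 16, d) for d in range(8)]
costs_grey = [sad_cost(grey_l, grey_r, 16, 16, d) for d in range(8)]
print(np.argmin(costs_rgb), np.argmin(costs_grey))   # both report disparity 4
```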
Procedia PDF Downloads 370
41511 Relationship between Driving under the Influence and Traffic Safety
Authors: Eun Hak Lee, Young-Hyun Seo, Hosuk Shin, Seung-Young Kho
Abstract:
Among traffic crashes, driving under the influence (DUI) of alcohol is the most dangerous behavior in Seoul, South Korea. In 2016 alone, 40 deaths occurred in 2,857 cases of DUI. Since DUI is one of the major factors increasing the severity of crashes, intensive management of DUI is required to reduce traffic crash deaths and crash damages. This study aims to investigate the relationship between DUI and traffic safety in order to establish countermeasures for traffic safety improvement. The analysis was conducted on habitual drivers who drove under the influence. Information on habitual drivers was matched to crash data and fine data. Descriptive statistics are presented for the data used in this study, which consist of driver license acquisition, traffic fine, and crash data provided by the Korean National Police Agency. The drivers under the influence are classified by statistically significant criteria, such as driver’s age, license type, driving experience, and crash reason. With the results of the analysis, we propose countermeasures to enhance traffic safety.
Keywords: driving under influence, traffic safety, traffic crash, traffic fine
Procedia PDF Downloads 222
41510 Timing and Noise Data Mining Algorithm and Software Tool in Very Large Scale Integration (VLSI) Design
Authors: Qing K. Zhu
Abstract:
Very Large Scale Integration (VLSI) design has become very complex due to the continuous integration of millions of gates on one chip, following Moore’s law. Designers encounter numerous report files from timing and noise analysis tools during design iterations. This paper presents our work using data mining techniques combined with HTML tables to extract and represent critical timing and noise data. When this data-mining tool is applied in real applications, running speed is important. The software employs table look-up techniques in its implementation to achieve reasonable running speed, based on performance testing results. We added several advanced features for the application to one industry chip design.
Keywords: VLSI design, data mining, big data, HTML forms, web, VLSI, EDA, timing, noise
Procedia PDF Downloads 254