Search results for: data validation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7719

Search results for: data validation

7479 ATM Service Analysis Using Predictive Data Mining

Authors: S. Madhavi, S. Abirami, C. Bharathi, B. Ekambaram, T. Krishna Sankar, A. Nattudurai, N. Vijayarangan

Abstract:

The high utilization rate of Automated Teller Machine (ATM) has inevitably caused the phenomena of waiting for a long time in the queue. This in turn has increased the out of stock situations. The ATM utilization helps to determine the usage level and states the necessity of the ATM based on the utilization of the ATM system. The time in which the ATM used more frequently (peak time) and based on the predicted solution the necessary actions are taken by the bank management. The analysis can be done by using the concept of Data Mining and the major part are analyzed based on the predictive data mining. The results are predicted from the historical data (past data) and track the relevant solution which is required. Weka tool is used for the analysis of data based on predictive data mining.

Keywords: ATM, Bank Management, Data Mining, Historical data, Predictive Data Mining, Weka tool.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5613
7478 Renewable Energy Trends Analysis: A Patents Study

Authors: Sepulveda Juan

Abstract:

This article explains the elements and considerations taken into account when implementing and applying patent evaluation and scientometric study in the identifications of technology trends, and the tools that led to the implementation of a software application for patent revision. Univariate analysis helped recognize the technological leaders in the field of energy, and steered the way for a multivariate analysis of this sample, which allowed for a graphical description of the techniques of mature technologies, as well as the detection of emerging technologies. This article ends with a validation of the methodology as applied to the case of fuel cells.

Keywords: Energy, technology mapping, patents.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2188
7477 File System-Based Data Protection Approach

Authors: Jaechun No

Abstract:

As data to be stored in storage subsystems tremendously increases, data protection techniques have become more important than ever, to provide data availability and reliability. In this paper, we present the file system-based data protection (WOWSnap) that has been implemented using WORM (Write-Once-Read-Many) scheme. In the WOWSnap, once WORM files have been created, only the privileged read requests to them are allowed to protect data against any intentional/accidental intrusions. Furthermore, all WORM files are related to their protection cycle that is a time period during which WORM files should securely be protected. Once their protection cycle is expired, the WORM files are automatically moved to the general-purpose data section without any user interference. This prevents the WORM data section from being consumed by unnecessary files. We evaluated the performance of WOWSnap on Linux cluster.

Keywords: Data protection, Protection cycle, WORM

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1680
7476 The Data Mining usage in Production System Management

Authors: Pavel Vazan, Pavol Tanuska, Michal Kebisek

Abstract:

The paper gives the pilot results of the project that is oriented on the use of data mining techniques and knowledge discoveries from production systems through them. They have been used in the management of these systems. The simulation models of manufacturing systems have been developed to obtain the necessary data about production. The authors have developed the way of storing data obtained from the simulation models in the data warehouse. Data mining model has been created by using specific methods and selected techniques for defined problems of production system management. The new knowledge has been applied to production management system. Gained knowledge has been tested on simulation models of the production system. An important benefit of the project has been proposal of the new methodology. This methodology is focused on data mining from the databases that store operational data about the production process.

Keywords: data mining, data warehousing, management of production system, simulation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3476
7475 Modeling a Multinomial Logit Model of Intercity Travel Mode Choice Behavior for All Trips in Libya

Authors: Manssour A. Abdulsalam Bin Miskeen, Ahmed Mohamed Alhodairi, Riza Atiq Abdullah Bin O. K. Rahmat

Abstract:

In the planning point of view, it is essential to have mode choice, due to the massive amount of incurred in transportation systems. The intercity travellers in Libya have distinct features, as against travellers from other countries, which includes cultural and socioeconomic factors. Consequently, the goal of this study is to recognize the behavior of intercity travel using disaggregate models, for projecting the demand of nation-level intercity travel in Libya. Multinomial Logit Model for all the intercity trips has been formulated to examine the national-level intercity transportation in Libya. The Multinomial logit model was calibrated using nationwide revealed preferences (RP) and stated preferences (SP) survey. The model was developed for deference purpose of intercity trips (work, social and recreational). The variables of the model have been predicted based on maximum likelihood method. The data needed for model development were obtained from all major intercity corridors in Libya. The final sample size consisted of 1300 interviews. About two-thirds of these data were used for model calibration, and the remaining parts were used for model validation. This study, which is the first of its kind in Libya, investigates the intercity traveler’s mode-choice behavior. The intercity travel mode-choice model was successfully calibrated and validated. The outcomes indicate that, the overall model is effective and yields higher precision of estimation. The proposed model is beneficial, due to the fact that, it is receptive to a lot of variables, and can be employed to determine the impact of modifications in the numerous characteristics on the need for various travel modes. Estimations of the model might also be of valuable to planners, who can estimate possibilities for various modes and determine the impact of unique policy modifications on the need for intercity travel.

Keywords: Multinomial logit model, improved intercity transport, intercity mode-choice behavior, disaggregate analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7869
7474 A Review: Comparative Study of Diverse Collection of Data Mining Tools

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, M. Sharmila

Abstract:

There have been a lot of efforts and researches undertaken in developing efficient tools for performing several tasks in data mining. Due to the massive amount of information embedded in huge data warehouses maintained in several domains, the extraction of meaningful pattern is no longer feasible. This issue turns to be more obligatory for developing several tools in data mining. Furthermore the major aspire of data mining software is to build a resourceful predictive or descriptive model for handling large amount of information more efficiently and user friendly. Data mining mainly contracts with excessive collection of data that inflicts huge rigorous computational constraints. These out coming challenges lead to the emergence of powerful data mining technologies. In this survey a diverse collection of data mining tools are exemplified and also contrasted with the salient features and performance behavior of each tool.

Keywords: Business Analytics, Data Mining, Data Analysis, Machine Learning, Text Mining, Predictive Analytics, Visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3364
7473 Landscape Data Transformation: Categorical Descriptions to Numerical Descriptors

Authors: Dennis A. Apuan

Abstract:

Categorical data based on description of the agricultural landscape imposed some mathematical and analytical limitations. This problem however can be overcome by data transformation through coding scheme and the use of non-parametric multivariate approach. The present study describes data transformation from qualitative to numerical descriptors. In a collection of 103 random soil samples over a 60 hectare field, categorical data were obtained from the following variables: levels of nitrogen, phosphorus, potassium, pH, hue, chroma, value and data on topography, vegetation type, and the presence of rocks. Categorical data were coded, and Spearman-s rho correlation was then calculated using PAST software ver. 1.78 in which Principal Component Analysis was based. Results revealed successful data transformation, generating 1030 quantitative descriptors. Visualization based on the new set of descriptors showed clear differences among sites, and amount of variation was successfully measured. Possible applications of data transformation are discussed.

Keywords: data transformation, numerical descriptors, principalcomponent analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1505
7472 A Survey of Semantic Integration Approaches in Bioinformatics

Authors: Chaimaa Messaoudi, Rachida Fissoune, Hassan Badir

Abstract:

Technological advances of computer science and data analysis are helping to provide continuously huge volumes of biological data, which are available on the web. Such advances involve and require powerful techniques for data integration to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information-integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, we focus on those developed in the ontology community.

Keywords: Semantic data integration, biological ontology, linked data, semantic web, OWL, RDF.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1819
7471 CFD Simulation and Validation of Flow Pattern Transition Boundaries during Moderately Viscous Oil-Water Two-Phase Flow through Horizontal Pipeline

Authors: Anand B. Desamala, Anjali Dasari, Vinayak Vijayan, Bharath K. Goshika, Ashok K. Dasmahapatra, Tapas K. Mandal

Abstract:

In the present study, computational fluid dynamics (CFD) simulation has been executed to investigate the transition boundaries of different flow patterns for moderately viscous oil-water (viscosity ratio 107, density ratio 0.89 and interfacial tension of 0.032 N/m.) two-phase flow through a horizontal pipeline with internal diameter and length of 0.025 m and 7.16 m respectively. Volume of Fluid (VOF) approach including effect of surface tension has been employed to predict the flow pattern. Geometry and meshing of the present problem has been drawn using GAMBIT and ANSYS FLUENT has been used for simulation. A total of 47037 quadrilateral elements are chosen for the geometry of horizontal pipeline. The computation has been performed by assuming unsteady flow, immiscible liquid pair, constant liquid properties, co-axial flow and a T-junction as entry section. The simulation correctly predicts the transition boundaries of wavy stratified to stratified mixed flow. Other transition boundaries are yet to be simulated. Simulated data has been validated with our own experimental results.

Keywords: CFD simulation, flow pattern transition, moderately viscous oil-water flow, prediction of flow transition boundary, VOF technique.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4250
7470 Optimisation of Structural Design by Integrating Genetic Algorithms in the Building Information Modelling Environment

Authors: Tofigh Hamidavi, Sepehr Abrishami, Pasquale Ponterosso, David Begg

Abstract:

Structural design and analysis is an important and time-consuming process, particularly at the conceptual design stage. Decisions made at this stage can have an enormous effect on the entire project, as it becomes ever costlier and more difficult to alter the choices made early on in the construction process. Hence, optimisation of the early stages of structural design can provide important efficiencies in terms of cost and time. This paper suggests a structural design optimisation (SDO) framework in which Genetic Algorithms (GAs) may be used to semi-automate the production and optimisation of early structural design alternatives. This framework has the potential to leverage conceptual structural design innovation in Architecture, Engineering and Construction (AEC) projects. Moreover, this framework improves the collaboration between the architectural stage and the structural stage. It will be shown that this SDO framework can make this achievable by generating the structural model based on the extracted data from the architectural model. At the moment, the proposed SDO framework is in the process of validation, involving the distribution of an online questionnaire among structural engineers in the UK.

Keywords: Building Information Modelling, BIM, Genetic Algorithm, GA, architecture-engineering-construction, AEC, Optimisation, structure, design, population, generation, selection, mutation, crossover, offspring.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 822
7469 Development of Manufacturing Simulation Model for Semiconductor Fabrication

Authors: Syahril Ridzuan Ab Rahim, Ibrahim Ahmad, Mohd Azizi Chik, Ahmad Zafir Md. Rejab, and U. Hashim

Abstract:

This research presents the development of simulation modeling for WIP management in semiconductor fabrication. Manufacturing simulation modeling is needed for productivity optimization analysis due to the complex process flows involved more than 35 percent re-entrance processing steps more than 15 times at same equipment. Furthermore, semiconductor fabrication required to produce high product mixed with total processing steps varies from 300 to 800 steps and cycle time between 30 to 70 days. Besides the complexity, expansive wafer cost that potentially impact the company profits margin once miss due date is another motivation to explore options to experiment any analysis using simulation modeling. In this paper, the simulation model is developed using existing commercial software platform AutoSched AP, with customized integration with Manufacturing Execution Systems (MES) and Advanced Productivity Family (APF) for data collections used to configure the model parameters and data source. Model parameters such as processing steps cycle time, equipment performance, handling time, efficiency of operator are collected through this customization. Once the parameters are validated, few customizations are made to ensure the prior model is executed. The accuracy for the simulation model is validated with the actual output per day for all equipments. The comparison analysis from result of the simulation model compared to actual for achieved 95 percent accuracy for 30 days. This model later was used to perform various what if analysis to understand impacts on cycle time and overall output. By using this simulation model, complex manufacturing environment like semiconductor fabrication (fab) now have alternative source of validation for any new requirements impact analysis.

Keywords: Advanced Productivity Family (APF), Complementary Metal Oxide Semiconductor (CMOS), Manufacturing Execution Systems (MES), Work In Progress (WIP).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3220
7468 Development and Validation of the Response to Stressful Situations Scale in the General Population

Authors: C. Barreto Carvalho, C. da Motta, M. Sousa, J. Cabral, A. L. Carvalho, E. B. Peixoto

Abstract:

The aim of the current study was to develop and validate a Response to Stressful Situations Scale (RSSS) for the Portuguese population. This scale assesses the degree of stress experienced in scenarios that can constitute positive, negative and more neutral stressors, and also describes the physiological, emotional and behavioral reactions to those events according to their intensity. These scenarios include typical stressor scenarios relevant to patients with schizophrenia, which are currently absent from most scales, assessing specific risks that these stressors may bring on subjects, which may prove useful in non-clinical and clinical populations (i.e. Patients with mood or anxiety disorders, schizophrenia). Results from Principal Components Analysis and Confirmatory Factor Analysis of two adult samples from general population allowed to confirm a three-factor model with good fit indices: χ2 (144)= 370.211, p = 0.000; GFI = 0.928; CFI = 0.927; TLI = 0.914, RMSEA = 0.055, P(rmsea ≤0.005) = .096; PCFI = .781. Further data analysis of the scale revealed that RSSS is an adequate assessment tool of stress response in adults to be used in further research and clinical settings, with good psychometric characteristics, adequate divergent and convergent validity, good temporal stability and high internal consistency.

Keywords: Assessment, stress events, stress response, stress vulnerability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2124
7467 Role of Lemna minor Lin. in Treating the Textile Industry Wastewater

Authors: D. Sivakumar

Abstract:

Textile industry processes are among the most environmentally unfriendly industrial processes; because, they produce color wastewater that is heavily polluted the environment. Therefore, textile industry wastewater has to be treated before being discharged into the environment. In this study, experiments were conducted for different process parameters like nutrient dosage and dilution ratio against the pH and contact time to remove COD and color in a textile industrial wastewater using aquatic macrophytes Lemna minor L. The experimental results showed that the maximum percentage reduction of COD and color in a textile industry wastewater by Lemna minor L. was obtained at an optimum nutrient dosage of 50g, dilution ratio of 8, pH of 8 and contact time of 4 days. Similarly, the results of validation experiments showed that the experiments were able to reproduce the obtained optimum process parameters. The maximum removal percentage of color in an aqueous solution (86.35%) is higher than the removal of color in a textile industry wastewater (82.85). Further, the first order kinetic model was fitted well with the experimental data of this present study. Finally, this study concluded that Lemna minor L. may be used for removing all types of parameters in any type of textile industry wastewater.

Keywords: Aquatic Macrophyte, Process Parameters, Textile Industry Wastewater.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2610
7466 Landslide Susceptibility Mapping: A Comparison between Logistic Regression and Multivariate Adaptive Regression Spline Models in the Municipality of Oudka, Northern of Morocco

Authors: S. Benchelha, H. C. Aoudjehane, M. Hakdaoui, R. El Hamdouni, H. Mansouri, T. Benchelha, M. Layelmam, M. Alaoui

Abstract:

The logistic regression (LR) and multivariate adaptive regression spline (MarSpline) are applied and verified for analysis of landslide susceptibility map in Oudka, Morocco, using geographical information system. From spatial database containing data such as landslide mapping, topography, soil, hydrology and lithology, the eight factors related to landslides such as elevation, slope, aspect, distance to streams, distance to road, distance to faults, lithology map and Normalized Difference Vegetation Index (NDVI) were calculated or extracted. Using these factors, landslide susceptibility indexes were calculated by the two mentioned methods. Before the calculation, this database was divided into two parts, the first for the formation of the model and the second for the validation. The results of the landslide susceptibility analysis were verified using success and prediction rates to evaluate the quality of these probabilistic models. The result of this verification was that the MarSpline model is the best model with a success rate (AUC = 0.963) and a prediction rate (AUC = 0.951) higher than the LR model (success rate AUC = 0.918, rate prediction AUC = 0.901).

Keywords: Landslide susceptibility mapping, regression logistic, multivariate adaptive regression spline, Oudka, Taounate, Morocco.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 992
7465 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering

Authors: Yunus Doğan, Ahmet Durap

Abstract:

Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.

Keywords: Clustering algorithms, coastal engineering, data mining, data summarization, statistical methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1244
7464 Dimensional Modeling of HIV Data Using Open Source

Authors: Charles D. Otine, Samuel B. Kucel, Lena Trojer

Abstract:

Selecting the data modeling technique for an information system is determined by the objective of the resultant data model. Dimensional modeling is the preferred modeling technique for data destined for data warehouses and data mining, presenting data models that ease analysis and queries which are in contrast with entity relationship modeling. The establishment of data warehouses as components of information system landscapes in many organizations has subsequently led to the development of dimensional modeling. This has been significantly more developed and reported for the commercial database management systems as compared to the open sources thereby making it less affordable for those in resource constrained settings. This paper presents dimensional modeling of HIV patient information using open source modeling tools. It aims to take advantage of the fact that the most affected regions by the HIV virus are also heavily resource constrained (sub-Saharan Africa) whereas having large quantities of HIV data. Two HIV data source systems were studied to identify appropriate dimensions and facts these were then modeled using two open source dimensional modeling tools. Use of open source would reduce the software costs for dimensional modeling and in turn make data warehousing and data mining more feasible even for those in resource constrained settings but with data available.

Keywords: About Database, Data Mining, Data warehouse, Dimensional Modeling, Open Source.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1959
7463 Fuzzy Uncertainty Theory for Stealth Fighter Aircraft Selection in Entropic Fuzzy TOPSIS Decision Analysis Process

Authors: C. Ardil

Abstract:

The purpose of this paper is to present fuzzy TOPSIS in an entropic fuzzy environment. Due to the ambiguous concepts often represented in decision data, exact values are insufficient to model real-life situations. In this paper, the rating of each alternative is defined in fuzzy linguistic terms, which can be expressed with triangular fuzzy numbers. The weight of each criterion is then derived from the decision matrix using the entropy weighting method. Next, a vertex method is proposed to calculate the distance between two triangular fuzzy numbers. According to the TOPSIS concept, a closeness coefficient is defined to determine the ranking order of all alternatives by simultaneously calculating the distances to both the fuzzy positive-ideal solution (FPIS) and the fuzzy negative-ideal solution (FNIS). Finally, an illustrative example of selecting stealth fighter aircraft is shown at the end of this article to highlight the procedure of the proposed method. Correlation analysis and validation analysis using TOPSIS, WSM, and WPM methods were performed to compare the ranking order of the alternatives.

Keywords: stealth fighter aircraft selection, fuzzy uncertainty theory (FUT), fuzzy entropic decision (FED), fuzzy linguistic variables, triangular fuzzy numbers, multiple criteria decision making analysis, MCDMA, TOPSIS, WSM, WPM

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 603
7462 Efficient Lossless Compression of Weather Radar Data

Authors: Wei-hua Ai, Wei Yan, Xiang Li

Abstract:

Data compression is used operationally to reduce bandwidth and storage requirements. An efficient method for achieving lossless weather radar data compression is presented. The characteristics of the data are taken into account and the optical linear prediction is used for the PPI images in the weather radar data in the proposed method. The next PPI image is identical to the current one and a dramatic reduction in source entropy is achieved by using the prediction algorithm. Some lossless compression methods are used to compress the predicted data. Experimental results show that for the weather radar data, the method proposed in this paper outperforms the other methods.

Keywords: Lossless compression, weather radar data, optical linear prediction, PPI image

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2258
7461 Conceptualizing the Knowledge to Manage and Utilize Data Assets in the Context of Digitization: Case Studies of Multinational Industrial Enterprises

Authors: Martin Böhmer, Agatha Dabrowski, Boris Otto

Abstract:

The trend of digitization significantly changes the role of data for enterprises. Data turn from an enabler to an intangible organizational asset that requires management and qualifies as a tradeable good. The idea of a networked economy has gained momentum in the data domain as collaborative approaches for data management emerge. Traditional organizational knowledge consequently needs to be extended by comprehensive knowledge about data. The knowledge about data is vital for organizations to ensure that data quality requirements are met and data can be effectively utilized and sovereignly governed. As this specific knowledge has been paid little attention to so far by academics, the aim of the research presented in this paper is to conceptualize it by proposing a “data knowledge model”. Relevant model entities have been identified based on a design science research (DSR) approach that iteratively integrates insights of various industry case studies and literature research.

Keywords: Data management, digitization, Industry 4.0, knowledge engineering, metamodel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1458
7460 A Methodology for Data Migration between Different Database Management Systems

Authors: Bogdan Walek, Cyril Klimes

Abstract:

In present days the area of data migration is very topical. Current tools for data migration in the area of relational database have several disadvantages that are presented in this paper. We propose a methodology for data migration of the database tables and their data between various types of relational database systems (RDBMS). The proposed methodology contains an expert system. The expert system contains a knowledge base that is composed of IFTHEN rules and based on the input data suggests appropriate data types of columns of database tables. The proposed tool, which contains an expert system, also includes the possibility of optimizing the data types in the target RDBMS database tables based on processed data of the source RDBMS database tables. The proposed expert system is shown on data migration of selected database of the source RDBMS to the target RDBMS.

Keywords: Expert system, fuzzy, data migration, database, relational database, data type, relational database management system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3492
7459 Developing a Conjugate Heat Transfer Solver

Authors: Mansour A. Al Qubeissi

Abstract:

The current paper presents a numerical approach in solving the conjugate heat transfer problems. A heat conduction code is coupled internally with a computational fluid dynamics solver for developing a couple conjugate heat transfer solver. Methodology of treating non-matching meshes at interface has also been proposed. The validation results of 1D and 2D cases for the developed conjugate heat transfer code have shown close agreement with the solutions given by analysis.

Keywords: Computational Fluid Dynamics, Conjugate Heat transfer, Heat Conduction, Heat Transfer

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1559
7458 Opening up Government Datasets for Big Data Analysis to Support Policy Decisions

Authors: K. Hardy, A. Maurushat

Abstract:

Policy makers are increasingly looking to make evidence-based decisions. Evidence-based decisions have historically used rigorous methodologies of empirical studies by research institutes, as well as less reliable immediate survey/polls often with limited sample sizes. As we move into the era of Big Data analytics, policy makers are looking to different methodologies to deliver reliable empirics in real-time. The question is not why did these people do this for the last 10 years, but why are these people doing this now, and if the this is undesirable, and how can we have an impact to promote change immediately. Big data analytics rely heavily on government data that has been released in to the public domain. The open data movement promises greater productivity and more efficient delivery of services; however, Australian government agencies remain reluctant to release their data to the general public. This paper considers the barriers to releasing government data as open data, and how these barriers might be overcome.

Keywords: Big data, open data, productivity, transparency.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1637
7457 Forthcoming Big Data on Smart Buildings and Cities: An Experimental Study on Correlations among Urban Data

Authors: Yu-Mi Song, Sung-Ah Kim, Dongyoun Shin

Abstract:

Cities are complex systems of diverse and inter-tangled activities. These activities and their complex interrelationships create diverse urban phenomena. And such urban phenomena have considerable influences on the lives of citizens. This research aimed to develop a method to reveal the causes and effects among diverse urban elements in order to enable better understanding of urban activities and, therefrom, to make better urban planning strategies. Specifically, this study was conducted to solve a data-recommendation problem found on a Korean public data homepage. First, a correlation analysis was conducted to find the correlations among random urban data. Then, based on the results of that correlation analysis, the weighted data network of each urban data was provided to people. It is expected that the weights of urban data thereby obtained will provide us with insights into cities and show us how diverse urban activities influence each other and induce feedback.

Keywords: Big data, correlation analysis, data recommendation system, urban data network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1105
7456 Thermographic Tests of Curved GFRP Structures with Delaminations: Numerical Modelling vs. Experimental Validation

Authors: P. D. Pastuszak

Abstract:

The present work is devoted to thermographic studies of curved composite panels (unidirectional GFRP) with subsurface defects. Various artificial defects, created by inserting PTFE stripe between individual layers of a laminate during manufacturing stage are studied. The analysis is conducted both with the use finite element method and experiments. To simulate transient heat transfer in 3D model with embedded various defect sizes, the ANSYS package is used. Pulsed Thermography combined with optical excitation source provides good results for flat surfaces. Composite structures are mostly used in complex components, e.g., pipes, corners and stiffeners. Local decrease of mechanical properties in these regions can have significant influence on strength decrease of the entire structure. Application of active procedures of thermography to defect detection and evaluation in this type of elements seems to be more appropriate that other NDT techniques. Nevertheless, there are various uncertainties connected with correct interpretation of acquired data. In this paper, important factors concerning Infrared Thermography measurements of curved surfaces in the form of cylindrical panels are considered. In addition, temperature effects on the surface resulting from complex geometry and embedded and real defect are also presented.

Keywords: Active thermography, finite element analysis, composite, curved structures, defects.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1711
7455 On the Combination of Patient-Generated Data with Data from a Secure Clinical Network Environment – A Practical Example

Authors: Jeroen S. de Bruin, Karin Schindler, Christian Schuh

Abstract:

With increasingly more mobile health applications appearing due to the popularity of smartphones, the possibility arises that these data can be used to improve the medical diagnostic process, as well as the overall quality of healthcare, while at the same time lowering costs. However, as of yet there have been no reports of a successful combination of patient-generated data from smartphones with data from clinical routine. In this paper we describe how these two types of data can be combined in a secure way without modification to hospital information systems, and how they can together be used in a medical expert system for automatic nutritional classification and triage.

Keywords: Data integration, disease-related malnutrition, expert systems, mobile health.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2200
7454 Half Model Testing for Canard of a Hybrid Buoyant Aircraft

Authors: A. U. Haque, W. Asrar, A. A. Omar, E. Sulaeman, J. S. Mohamed Ali

Abstract:

Due to the interference effects, the intrinsic aerodynamic parameters obtained from the individual component testing are always fundamentally different than those obtained for complete model testing. Consideration and limitation for such testing need to be taken into account in any design work related to the component buildup method. In this paper, the scaled model of a straight rectangular canard of a hybrid buoyant aircraft is tested at 50 m/s in IIUM-LSWT (Low Speed Wind Tunnel). Model and its attachment with the balance are kept rigid to have results free from the aeroelastic distortion. Based on the velocity profile of the test section’s floor; the height of the model is kept equal to the corresponding boundary layer displacement. Balance measurements provide valuable but limited information of overall aerodynamic behavior of the model. Zero lift coefficient is obtained at -2.2o and the corresponding drag coefficient was found to be less than that at zero angle of attack. As a part of the validation of low fidelity tool, plot of lift coefficient plot was verified by the experimental data and except the value of zero lift coefficients, the overall trend has under predicted the lift coefficient. Based on this comparative study, a correction factor of 1.36 is proposed for lift curve slope obtained from the panel method.

Keywords: Wind tunnel testing, boundary layer displacement, lift curve slope, canard, aerodynamics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2625
7453 An Evaluation of Algorithms for Single-Echo Biosonar Target Classification

Authors: Turgay Temel, John Hallam

Abstract:

A recent neurospiking coding scheme for feature extraction from biosonar echoes of various plants is examined with avariety of stochastic classifiers. Feature vectors derived are employedin well-known stochastic classifiers, including nearest-neighborhood,single Gaussian and a Gaussian mixture with EM optimization.Classifiers' performances are evaluated by using cross-validation and bootstrapping techniques. It is shown that the various classifers perform equivalently and that the modified preprocessing configuration yields considerably improved results.

Keywords: Classification, neuro-spike coding, non-parametricmodel, parametric model, Gaussian mixture, EM algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1671
7452 Comparison of Imputation Techniques for Efficient Prediction of Software Fault Proneness in Classes

Authors: Geeta Sikka, Arvinder Kaur Takkar, Moin Uddin

Abstract:

Missing data is a persistent problem in almost all areas of empirical research. The missing data must be treated very carefully, as data plays a fundamental role in every analysis. Improper treatment can distort the analysis or generate biased results. In this paper, we compare and contrast various imputation techniques on missing data sets and make an empirical evaluation of these methods so as to construct quality software models. Our empirical study is based on NASA-s two public dataset. KC4 and KC1. The actual data sets of 125 cases and 2107 cases respectively, without any missing values were considered. The data set is used to create Missing at Random (MAR) data Listwise Deletion(LD), Mean Substitution(MS), Interpolation, Regression with an error term and Expectation-Maximization (EM) approaches were used to compare the effects of the various techniques.

Keywords: Missing data, Imputation, Missing Data Techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1668
7451 Cluster Analysis for the Statistical Modeling of Aesthetic Judgment Data Related to Comics Artists

Authors: George E. Tsekouras, Evi Sampanikou

Abstract:

We compare three categorical data clustering algorithms with respect to the problem of classifying cultural data related to the aesthetic judgment of comics artists. Such a classification is very important in Comics Art theory since the determination of any classes of similarities in such kind of data will provide to art-historians very fruitful information of Comics Art-s evolution. To establish this, we use a categorical data set and we study it by employing three categorical data clustering algorithms. The performances of these algorithms are compared each other, while interpretations of the clustering results are also given.

Keywords: Aesthetic judgment, comics artists, cluster analysis, categorical data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1635
7450 Developing a Web-Based Tender Evaluation System Based on Fuzzy Multi-Attributes Group Decision Making for Nigerian Public Sector Tendering

Authors: Bello Abdullahi, Yahaya M. Ibrahim, Ahmed D. Ibrahim, Kabir Bala

Abstract:

Public sector tendering has traditionally been conducted using manual paper-based processes which are known to be inefficient, less transparent and more prone to manipulations and errors. The advent of the Internet and the World Wide Web has led to the development of numerous e-Tendering systems that addressed some of the problems associated with the manual paper-based tendering system. However, most of these systems rarely support the evaluation of tenders and where they do it is mostly based on the single decision maker which is not suitable in public sector tendering, where for the sake of objectivity, transparency, and fairness, it is required that the evaluation is conducted through a tender evaluation committee. Currently, in Nigeria, the public tendering process in general and the evaluation of tenders, in particular, are largely conducted using manual paper-based processes. Automating these manual-based processes to digital-based processes can help in enhancing the proficiency of public sector tendering in Nigeria. This paper is part of a larger study to develop an electronic tendering system that supports the whole tendering lifecycle based on Nigerian procurement law. Specifically, this paper presents the design and implementation of part of the system that supports group evaluation of tenders based on a technique called fuzzy multi-attributes group decision making. The system was developed using Object-Oriented methodologies and Unified Modelling Language and hypothetically applied in the evaluation of technical and financial proposals submitted by bidders. The system was validated by professionals with extensive experiences in public sector procurement. The results of the validation showed that the system called NPS-eTender has an average rating of 74% with respect to correct and accurate modelling of the existing manual tendering domain and an average rating of 67.6% with respect to its potential to enhance the proficiency of public sector tendering in Nigeria. Thus, based on the results of the validation, the automation of the evaluation process to support tender evaluation committee is achievable and can lead to a more proficient public sector tendering system.

Keywords: e-Tendering, e-Procurement, public tendering, tender evaluation, tender evaluation committee, web-based group decision support system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 746