Search results for: data validation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7719

Search results for: data validation

7539 Stature Estimation Based On Lower Limb Dimensions in the Malaysian Population

Authors: F. M. Nor, N. Abdullah, Al-M. Mustapa, L. Q. Wen, N. A. Faisal, D. A. A. Ahmad Nazari

Abstract:

Estimation of stature is an important step in developing a biological profile for human identification. It may provide a valuable indicator for unknown individual in a population. The aim of this study was to analyses the relationship between stature and lower limb dimensions in the Malaysian population. The sample comprised 100 corpses, which included 69 males and 31 females between age ranges of 20 to 90 years old. The parameters measured were stature, thigh length, lower leg length, leg length, foot length, foot height and foot breadth. Results showed that mean values in males were significantly higher than those in females (P < 0.05). There were significant correlations between lower limb dimensions and stature. Cross-validation of the equation on 100 individuals showed close approximation between known stature and estimated stature. It was concluded that lower limb dimensions were useful for estimation of stature, which should be validated in future studies. 

Keywords: Forensic anthropology population data, lower leg length, Malaysian, stature.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3205
7538 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analyzing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2794
7537 Geostatistical Analysis and Mapping of Groundlevel Ozone in a Medium Sized Urban Area

Authors: F. J. Moral García, P. Valiente González, F. López Rodríguez

Abstract:

Ground-level tropospheric ozone is one of the air pollutants of most concern. It is mainly produced by photochemical processes involving nitrogen oxides and volatile organic compounds in the lower parts of the atmosphere. Ozone levels become particularly high in regions close to high ozone precursor emissions and during summer, when stagnant meteorological conditions with high insolation and high temperatures are common. In this work, some results of a study about urban ozone distribution patterns in the city of Badajoz, which is the largest and most industrialized city in Extremadura region (southwest Spain) are shown. Fourteen sampling campaigns, at least one per month, were carried out to measure ambient air ozone concentrations, during periods that were selected according to favourable conditions to ozone production, using an automatic portable analyzer. Later, to evaluate the ozone distribution at the city, the measured ozone data were analyzed using geostatistical techniques. Thus, first, during the exploratory analysis of data, it was revealed that they were distributed normally, which is a desirable property for the subsequent stages of the geostatistical study. Secondly, during the structural analysis of data, theoretical spherical models provided the best fit for all monthly experimental variograms. The parameters of these variograms (sill, range and nugget) revealed that the maximum distance of spatial dependence is between 302-790 m and the variable, air ozone concentration, is not evenly distributed in reduced distances. Finally, predictive ozone maps were derived for all points of the experimental study area, by use of geostatistical algorithms (kriging). High prediction accuracy was obtained in all cases as cross-validation showed. Useful information for hazard assessment was also provided when probability maps, based on kriging interpolation and kriging standard deviation, were produced.

Keywords: Kriging, map, tropospheric ozone, variogram.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1869
7536 Force Statistics and Wake Structure Mechanism of Flow around a Square Cylinder at Low Reynolds Numbers

Authors: Shams-Ul-Islam, Waqas Sarwar Abbasi, Hamid Rahman

Abstract:

Numerical investigation of flow around a square cylinder are presented using the multi-relaxation-time lattice Boltzmann methods at different Reynolds numbers. A detail analysis are given in terms of time-trace analysis of drag and lift coefficients, power spectra analysis of lift coefficient, vorticity contours visualizations, streamlines and phase diagrams. A number of physical quantities mean drag coefficient, drag coefficient, Strouhal number and root-mean-square values of drag and lift coefficients are calculated and compared with the well resolved experimental data and numerical results available in open literature. The Reynolds numbers affected the physical quantities.

Keywords: Code validation, Force statistics, Multi-relaxation-time lattice Boltzmann method, Reynolds numbers, Square cylinder.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3122
7535 Automatic Real-Patient Medical Data De-Identification for Research Purposes

Authors: Petr Vcelak, Jana Kleckova

Abstract:

Our Medicine-oriented research is based on a medical data set of real patients. It is a security problem to share patient private data with peoples other than clinician or hospital staff. We have to remove person identification information from medical data. The medical data without private data are available after a de-identification process for any research purposes. In this paper, we introduce an universal automatic rule-based de-identification application to do all this stuff on an heterogeneous medical data. A patient private identification is replaced by an unique identification number, even in burnedin annotation in pixel data. The identical identification is used for all patient medical data, so it keeps relationships in a data. Hospital can take an advantage of a research feedback based on results.

Keywords: DASTA, De-identification, DICOM, Health Level Seven, Medical data, OCR, Personal data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1644
7534 Ground Heat Exchanger Modeling Developed for Energy Flows of an Incompressible Fluid

Authors: Paul Christodoulides, Georgios Florides, Panayiotis Pouloupatis, Vassilios Messaritis, Lazaros Lazari

Abstract:

Ground-source heat pumps achieve higher efficiencies than conventional air-source heat pumps because they exchange heat with the ground that is cooler in summer and hotter in winter than the air environment. Earth heat exchangers are essential parts of the ground-source heat pumps and the accurate prediction of their performance is of fundamental importance. This paper presents the development and validation of a numerical model through an incompressible fluid flow, for the simulation of energy and temperature changes in and around a U-tube borehole heat exchanger. The FlexPDE software is used to solve the resulting simultaneous equations that model the heat exchanger. The validated model (through a comparison with experimental data) is then used to extract conclusions on how various parameters like the U-tube diameter, the variation of the ground thermal conductivity and specific heat and the borehole filling material affect the temperature of the fluid.

Keywords: U-tube borehole, energy flow, incompressible fluid, numerical model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2004
7533 Analyzing Multi-Labeled Data Based on the Roll of a Concept against a Semantic Range

Authors: Masahiro Kuzunishi, Tetsuya Furukawa, Ke Lu

Abstract:

Classifying data hierarchically is an efficient approach to analyze data. Data is usually classified into multiple categories, or annotated with a set of labels. To analyze multi-labeled data, such data must be specified by giving a set of labels as a semantic range. There are some certain purposes to analyze data. This paper shows which multi-labeled data should be the target to be analyzed for those purposes, and discusses the role of a label against a set of labels by investigating the change when a label is added to the set of labels. These discussions give the methods for the advanced analysis of multi-labeled data, which are based on the role of a label against a semantic range.

Keywords: Classification Hierarchies, Data Analysis, Multilabeled Data, Orders of Sets of Labels

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1208
7532 Simulator Dynamic Positioning System with Azimuthal Thruster

Authors: Robson C. Santos, Christian N. Barreto, Gerson G. Cunha, Severino J. C. Neto

Abstract:

This paper aims to project the construction of a prototype azimuthal thruster, mounted with materials of low cost and easy access, testing in a controlled environment to measure their performance, characteristics and feasibility of future projects. The construction of the simulation of dynamic positioning software, responsible for simulating a vessel and reposition it when necessary. Validation tests were performed in the form of partial or complete system. These tests validate the system manually or automatically. The system provides an interface to the user and simulates the conditions unfavorable positioning of a vessel, accurately calculates the azimuth angle, the direction of rotation of the helix and the time that this should be turned on so that the vessel back to position original. A serial communication connects the Simulation Dynamic Positioning System with Embedded System causing the usergenerated data to simulate the DP system arrives in the form of control signals to the motors of the propellant. This article addresses issues in the marine industry employees.

Keywords: Azimuthal Thruster, Dynamic Positioning, Embedded System.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2778
7531 Bandwidth Estimation Algorithms for the Dynamic Adaptation of Voice Codec

Authors: Davide Pierattoni, Ivan Macor, Pier Luca Montessoro

Abstract:

In the recent years multimedia traffic and in particular VoIP services are growing dramatically. We present a new algorithm to control the resource utilization and to optimize the voice codec selection during SIP call setup on behalf of the traffic condition estimated on the network path. The most suitable methodologies and the tools that perform realtime evaluation of the available bandwidth on a network path have been integrated with our proposed algorithm: this selects the best codec for a VoIP call in function of the instantaneous available bandwidth on the path. The algorithm does not require any explicit feedback from the network, and this makes it easily deployable over the Internet. We have also performed intensive tests on real network scenarios with a software prototype, verifying the algorithm efficiency with different network topologies and traffic patterns between two SIP PBXs. The promising results obtained during the experimental validation of the algorithm are now the basis for the extension towards a larger set of multimedia services and the integration of our methodology with existing PBX appliances.

Keywords: Integrated voice-data communication, computernetwork performance, resource optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1693
7530 Feature Analysis of Predictive Maintenance Models

Authors: Zhaoan Wang

Abstract:

Research in predictive maintenance modeling has improved in the recent years to predict failures and needed maintenance with high accuracy, saving cost and improving manufacturing efficiency. However, classic prediction models provide little valuable insight towards the most important features contributing to the failure. By analyzing and quantifying feature importance in predictive maintenance models, cost saving can be optimized based on business goals. First, multiple classifiers are evaluated with cross-validation to predict the multi-class of failures. Second, predictive performance with features provided by different feature selection algorithms are further analyzed. Third, features selected by different algorithms are ranked and combined based on their predictive power. Finally, linear explainer SHAP (SHapley Additive exPlanations) is applied to interpret classifier behavior and provide further insight towards the specific roles of features in both local predictions and global model behavior. The results of the experiments suggest that certain features play dominant roles in predictive models while others have significantly less impact on the overall performance. Moreover, for multi-class prediction of machine failures, the most important features vary with type of machine failures. The results may lead to improved productivity and cost saving by prioritizing sensor deployment, data collection, and data processing of more important features over less importance features.

Keywords: Automated supply chain, intelligent manufacturing, predictive maintenance machine learning, feature engineering, model interpretation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2005
7529 Development of a Software System for Management and Genetic Analysis of Biological Samples for Forensic Laboratories

Authors: Mariana Lima, Rodrigo Silva, Victor Stange, Teodiano Bastos

Abstract:

Due to the high reliability reached by DNA tests, since the 1980s this kind of test has allowed the identification of a growing number of criminal cases, including old cases that were unsolved, now having a chance to be solved with this technology. Currently, the use of genetic profiling databases is a typical method to increase the scope of genetic comparison. Forensic laboratories must process, analyze, and generate genetic profiles of a growing number of samples, which require time and great storage capacity. Therefore, it is essential to develop methodologies capable to organize and minimize the spent time for both biological sample processing and analysis of genetic profiles, using software tools. Thus, the present work aims the development of a software system solution for laboratories of forensics genetics, which allows sample, criminal case and local database management, minimizing the time spent in the workflow and helps to compare genetic profiles. For the development of this software system, all data related to the storage and processing of samples, workflows and requirements that incorporate the system have been considered. The system uses the following software languages: HTML, CSS, and JavaScript in Web technology, with NodeJS platform as server, which has great efficiency in the input and output of data. In addition, the data are stored in a relational database (MySQL), which is free, allowing a better acceptance for users. The software system here developed allows more agility to the workflow and analysis of samples, contributing to the rapid insertion of the genetic profiles in the national database and to increase resolution of crimes. The next step of this research is its validation, in order to operate in accordance with current Brazilian national legislation.

Keywords: Database, forensic genetics, genetic analysis, sample management, software solution.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1169
7528 Integrating Decision Tree and Spatial Cluster Analysis for Landslide Susceptibility Zonation

Authors: Chien-Min Chu, Bor-Wen Tsai, Kang-Tsung Chang

Abstract:

Landslide susceptibility map delineates the potential zones for landslide occurrence. Previous works have applied multivariate methods and neural networks for mapping landslide susceptibility. This study proposed a new approach to integrate decision tree model and spatial cluster statistic for assessing landslide susceptibility spatially. A total of 2057 landslide cells were digitized for developing the landslide decision tree model. The relationships of landslides and instability factors were explicitly represented by using tree graphs in the model. The local Getis-Ord statistics were used to cluster cells with high landslide probability. The analytic result from the local Getis-Ord statistics was classed to create a map of landslide susceptibility zones. The map was validated using new landslide data with 482 cells. Results of validation show an accuracy rate of 86.1% in predicting new landslide occurrence. This indicates that the proposed approach is useful for improving landslide susceptibility mapping.

Keywords: Landslide susceptibility Zonation, Decision treemodel, Spatial cluster, Local Getis-Ord statistics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1941
7527 Extraction of Forest Plantation Resources in Selected Forest of San Manuel, Pangasinan, Philippines Using LiDAR Data for Forest Status Assessment

Authors: Mark Joseph Quinto, Roan Beronilla, Guiller Damian, Eliza Camaso, Ronaldo Alberto

Abstract:

Forest inventories are essential to assess the composition, structure and distribution of forest vegetation that can be used as baseline information for management decisions. Classical forest inventory is labor intensive and time-consuming and sometimes even dangerous. The use of Light Detection and Ranging (LiDAR) in forest inventory would improve and overcome these restrictions. This study was conducted to determine the possibility of using LiDAR derived data in extracting high accuracy forest biophysical parameters and as a non-destructive method for forest status analysis of San Manual, Pangasinan. Forest resources extraction was carried out using LAS tools, GIS, Envi and .bat scripts with the available LiDAR data. The process includes the generation of derivatives such as Digital Terrain Model (DTM), Canopy Height Model (CHM) and Canopy Cover Model (CCM) in .bat scripts followed by the generation of 17 composite bands to be used in the extraction of forest classification covers using ENVI 4.8 and GIS software. The Diameter in Breast Height (DBH), Above Ground Biomass (AGB) and Carbon Stock (CS) were estimated for each classified forest cover and Tree Count Extraction was carried out using GIS. Subsequently, field validation was conducted for accuracy assessment. Results showed that the forest of San Manuel has 73% Forest Cover, which is relatively much higher as compared to the 10% canopy cover requirement. On the extracted canopy height, 80% of the tree’s height ranges from 12 m to 17 m. CS of the three forest covers based on the AGB were: 20819.59 kg/20x20 m for closed broadleaf, 8609.82 kg/20x20 m for broadleaf plantation and 15545.57 kg/20x20m for open broadleaf. Average tree counts for the tree forest plantation was 413 trees/ha. As such, the forest of San Manuel has high percent forest cover and high CS.

Keywords: Carbon stock, forest inventory, LiDAR, tree count.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1281
7526 The Design of a Vehicle Traffic Flow Prediction Model for a Gauteng Freeway Based on an Ensemble of Multi-Layer Perceptron

Authors: Tebogo Emma Makaba, Barnabas Ndlovu Gatsheni

Abstract:

The cities of Johannesburg and Pretoria both located in the Gauteng province are separated by a distance of 58 km. The traffic queues on the Ben Schoeman freeway which connects these two cities can stretch for almost 1.5 km. Vehicle traffic congestion impacts negatively on the business and the commuter’s quality of life. The goal of this paper is to identify variables that influence the flow of traffic and to design a vehicle traffic prediction model, which will predict the traffic flow pattern in advance. The model will unable motorist to be able to make appropriate travel decisions ahead of time. The data used was collected by Mikro’s Traffic Monitoring (MTM). Multi-Layer perceptron (MLP) was used individually to construct the model and the MLP was also combined with Bagging ensemble method to training the data. The cross—validation method was used for evaluating the models. The results obtained from the techniques were compared using predictive and prediction costs. The cost was computed using combination of the loss matrix and the confusion matrix. The predicted models designed shows that the status of the traffic flow on the freeway can be predicted using the following parameters travel time, average speed, traffic volume and day of month. The implications of this work is that commuters will be able to spend less time travelling on the route and spend time with their families. The logistics industry will save more than twice what they are currently spending.

Keywords: Bagging ensemble methods, confusion matrix, multi-layer perceptron, vehicle traffic flow.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1777
7525 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.

Keywords: Crop yield prediction, crop model, sensitivity analysis, paramater estimation, particle swarm optimization, random forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1176
7524 An Extended Domain-Specific Modeling Language for Marine Observatory Relying on Enterprise Architecture

Authors: Charbel Geryes Aoun, Loic Lagadec

Abstract:

A Sensor Network (SN) is considered as an operation of two phases: (1) the observation/measuring, which means the accumulation of the gathered data at each sensor node; (2) transferring the collected data to some processing center (e.g. Fusion Servers) within the SN. Therefore, an underwater sensor network can be defined as a sensor network deployed underwater that monitors underwater activity. The deployed sensors, such as hydrophones, are responsible for registering underwater activity and transferring it to more advanced components. The process of data exchange between the aforementioned components perfectly defines the Marine Observatory (MO) concept which provides information on ocean state, phenomena and processes. The first step towards the implementation of this concept is defining the environmental constraints and the required tools and components (Marine Cables, Smart Sensors, Data Fusion Server, etc). The logical and physical components that are used in these observatories perform some critical functions such as the localization of underwater moving objects. These functions can be orchestrated with other services (e.g. military or civilian reaction). In this paper, we present an extension to our MO meta-model that is used to generate a design tool (ArchiMO). We propose constraints to be taken into consideration at design time. We illustrate our proposal with an example from the MO domain. Additionally, we generate the corresponding simulation code using our self-developed domain-specific model compiler. On the one hand, this illustrates our approach in relying on Enterprise Architecture (EA) framework that respects: multiple-views, perspectives of stakeholders, and domain specificity. On the other hand, it helps reducing both complexity and time spent in design activity, while preventing from design modeling errors during porting this activity in the MO domain. As conclusion, this work aims to demonstrate that we can improve the design activity of complex system based on the use of MDE technologies and a domain-specific modeling language with the associated tooling. The major improvement is to provide an early validation step via models and simulation approach to consolidate the system design.

Keywords: Smart sensors, data fusion, distributed fusion architecture, sensor networks, domain specific modeling language, enterprise architecture, underwater moving object, localization, marine observatory, NS-3, IMS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 258
7523 Prediction of Reusability of Object Oriented Software Systems using Clustering Approach

Authors: Anju Shri, Parvinder S. Sandhu, Vikas Gupta, Sanyam Anand

Abstract:

In literature, there are metrics for identifying the quality of reusable components but the framework that makes use of these metrics to precisely predict reusability of software components is still need to be worked out. These reusability metrics if identified in the design phase or even in the coding phase can help us to reduce the rework by improving quality of reuse of the software component and hence improve the productivity due to probabilistic increase in the reuse level. As CK metric suit is most widely used metrics for extraction of structural features of an object oriented (OO) software; So, in this study, tuned CK metric suit i.e. WMC, DIT, NOC, CBO and LCOM, is used to obtain the structural analysis of OO-based software components. An algorithm has been proposed in which the inputs can be given to K-Means Clustering system in form of tuned values of the OO software component and decision tree is formed for the 10-fold cross validation of data to evaluate the in terms of linguistic reusability value of the component. The developed reusability model has produced high precision results as desired.

Keywords: CK-Metric, Desicion Tree, Kmeans, Reusability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1913
7522 Applying Resilience Engineering to improve Safety Management in a Construction Site: Design and Validation of a Questionnaire

Authors: M. C. Pardo-Ferreira, J. C. Rubio-Romero, M. Martínez-Rojas

Abstract:

Resilience Engineering is a new paradigm of safety management that proposes to change the way of managing the safety to focus on the things that go well instead of the things that go wrong. Many complex and high-risk sectors such as air traffic control, health care, nuclear power plants, railways or emergencies, have applied this new vision of safety and have obtained very positive results. In the construction sector, safety management continues to be a problem as indicated by the statistics of occupational injuries worldwide. Therefore, it is important to improve safety management in this sector. For this reason, it is proposed to apply Resilience Engineering to the construction sector. The Construction Phase Health and Safety Plan emerges as a key element for the planning of safety management. One of the key tools of Resilience Engineering is the Resilience Assessment Grid that allows measuring the four essential abilities (respond, monitor, learn and anticipate) for resilient performance. The purpose of this paper is to develop a questionnaire based on the Resilience Assessment Grid, specifically on the ability to learn, to assess whether a Construction Phase Health and Safety Plans helps companies in a construction site to implement this ability. The research process was divided into four stages: (i) initial design of a questionnaire, (ii) validation of the content of the questionnaire, (iii) redesign of the questionnaire and (iii) application of the Delphi method. The questionnaire obtained could be used as a tool to help construction companies to evolve from Safety-I to Safety-II. In this way, companies could begin to develop the ability to learn, which will serve as a basis for the development of the other abilities necessary for resilient performance. The following steps in this research are intended to develop other questions that allow evaluating the rest of abilities for resilient performance such as monitoring, learning and anticipating.

Keywords: Resilience engineering, construction sector, resilience assessment grid, construction phase health and safety plan.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1004
7521 Feature Selection with Kohonen Self Organizing Classification Algorithm

Authors: Francesco Maiorana

Abstract:

In this paper a one-dimension Self Organizing Map algorithm (SOM) to perform feature selection is presented. The algorithm is based on a first classification of the input dataset on a similarity space. From this classification for each class a set of positive and negative features is computed. This set of features is selected as result of the procedure. The procedure is evaluated on an in-house dataset from a Knowledge Discovery from Text (KDT) application and on a set of publicly available datasets used in international feature selection competitions. These datasets come from KDT applications, drug discovery as well as other applications. The knowledge of the correct classification available for the training and validation datasets is used to optimize the parameters for positive and negative feature extractions. The process becomes feasible for large and sparse datasets, as the ones obtained in KDT applications, by using both compression techniques to store the similarity matrix and speed up techniques of the Kohonen algorithm that take advantage of the sparsity of the input matrix. These improvements make it feasible, by using the grid, the application of the methodology to massive datasets.

Keywords: Clustering algorithm, Data mining, Feature selection, Grid, Kohonen Self Organizing Map.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3052
7520 Steganalysis of Data Hiding via Halftoning and Coordinate Projection

Authors: Woong Hee Kim, Ilhwan Park

Abstract:

Steganography is the art of hiding and transmitting data through apparently innocuous carriers in an effort to conceal the existence of the data. A lot of steganography algorithms have been proposed recently. Many of them use the digital image data as a carrier. In data hiding scheme of halftoning and coordinate projection, still image data is used as a carrier, and the data of carrier image are modified for data embedding. In this paper, we present three features for analysis of data hiding via halftoning and coordinate projection. Also, we present a classifier using the proposed three features.

Keywords: Steganography, steganalysis, digital halftoning, data hiding.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1600
7519 Biological Data Integration using SOA

Authors: Noura Meshaan Al-Otaibi, Amin Yousef Noaman

Abstract:

Nowadays scientific data is inevitably digital and stored in a wide variety of formats in heterogeneous systems. Scientists need to access an integrated view of remote or local heterogeneous data sources with advanced data accessing, analyzing, and visualization tools. This research suggests the use of Service Oriented Architecture (SOA) to integrate biological data from different data sources. This work shows SOA will solve the problems that facing integration process and if the biologist scientists can access the biological data in easier way. There are several methods to implement SOA but web service is the most popular method. The Microsoft .Net Framework used to implement proposed architecture.

Keywords: Bioinformatics, Biological data, Data Integration, SOA and Web Services.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2474
7518 Geosynthetic Reinforced Unpaved Road: Literature Study and Design Example

Authors: D. Jayalakshmi, S. Bhosale

Abstract:

This paper, in its first part, presents the state-of-the-art literature of design approaches for geosynthetic reinforced unpaved roads. The literature starting since 1970 and the critical appraisal of flexible pavement design by Giroud and Han (2004) and Jonathan Fannin (2006) is presented. The design example is illustrated for Indian conditions. The example emphasizes the results computed by Giroud and Han's (2004) design method with the Indian road congress guidelines by IRC SP 72 -2015. The input data considered are related to the subgrade soil condition of Maharashtra State in India. The unified soil classification of the subgrade soil is inorganic clay with high plasticity (CH), which is expansive with a California bearing ratio (CBR) of 2% to 3%. The example exhibits the unreinforced case and geotextile as reinforcement by varying the rut depth from 25 mm to 100 mm. The present result reveals the base thickness for the unreinforced case from the IRC design catalogs is in good agreement with Giroud and Han (2004) approach for a range of 75 mm to 100 mm rut depth. Since Giroud and Han (2004) method is applicable for both reinforced and unreinforced cases, for the same data with appropriate Nc factor, for the same rut depth, the base thickness for the reinforced case has arrived for the Indian condition. From this trial, for the CBR of 2%, the base thickness reduction due to geotextile inclusion is 35%. For the CBR range of 2% to 5% with different stiffness in geosynthetics, the reduction in base course thickness will be evaluated, and the validation will be executed by the full-scale accelerated pavement testing set up at the College of Engineering Pune (COE), India.

Keywords: Base thickness, design approach, equation, full scale accelerated pavement set up, Indian condition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 655
7517 STATISTICA Software: A State of the Art Review

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, P. Ranjetha

Abstract:

Data mining idea is mounting rapidly in admiration and also in their popularity. The foremost aspire of data mining method is to extract data from a huge data set into several forms that could be comprehended for additional use. The data mining is a technology that contains with rich potential resources which could be supportive for industries and businesses that pay attention to collect the necessary information of the data to discover their customer’s performances. For extracting data there are several methods are available such as Classification, Clustering, Association, Discovering, and Visualization… etc., which has its individual and diverse algorithms towards the effort to fit an appropriate model to the data. STATISTICA mostly deals with excessive groups of data that imposes vast rigorous computational constraints. These results trials challenge cause the emergence of powerful STATISTICA Data Mining technologies. In this survey an overview of the STATISTICA software is illustrated along with their significant features.

Keywords: Data Mining, STATISTICA Data Miner, Text Miner, Enterprise Server, Classification, Association, Clustering, Regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2607
7516 PM10 Prediction and Forecasting Using CART: A Case Study for Pleven, Bulgaria

Authors: Snezhana G. Gocheva-Ilieva, Maya P. Stoimenova

Abstract:

Ambient air pollution with fine particulate matter (PM10) is a systematic permanent problem in many countries around the world. The accumulation of a large number of measurements of both the PM10 concentrations and the accompanying atmospheric factors allow for their statistical modeling to detect dependencies and forecast future pollution. This study applies the classification and regression trees (CART) method for building and analyzing PM10 models. In the empirical study, average daily air data for the city of Pleven, Bulgaria for a period of 5 years are used. Predictors in the models are seven meteorological variables, time variables, as well as lagged PM10 variables and some lagged meteorological variables, delayed by 1 or 2 days with respect to the initial time series, respectively. The degree of influence of the predictors in the models is determined. The selected best CART models are used to forecast future PM10 concentrations for two days ahead after the last date in the modeling procedure and show very accurate results.

Keywords: Cross-validation, decision tree, lagged variables, short-term forecasting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 739
7515 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles e.g. forests, hills, urban areas, the data collection is realized in several ways. The collection of data uses connection via wireless communication, LAN network, GSM network and in certain areas data are collected by using vehicles. In order to ensure the connection to the server most of the probes have ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select suitable communication channel.

Keywords: Communication, computer network, data collection, probe.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1782
7514 Aerodynamic Stall Control of a Generic Airfoil using Synthetic Jet Actuator

Authors: Basharat Ali Haider, Naveed Durrani, Nadeem Aizud, Salimuddin Zahir

Abstract:

The aerodynamic stall control of a baseline 13-percent thick NASA GA(W)-2 airfoil using a synthetic jet actuator (SJA) is presented in this paper. Unsteady Reynolds-averaged Navier-Stokes equations are solved on a hybrid grid using a commercial software to simulate the effects of a synthetic jet actuator located at 13% of the chord from the leading edge at a Reynolds number Re = 2.1x106 and incidence angles from 16 to 22 degrees. The experimental data for the pressure distribution at Re = 3x106 and aerodynamic coefficients at Re = 2.1x106 (angle of attack varied from -16 to 22 degrees) without SJA is compared with the computational fluid dynamic (CFD) simulation as a baseline validation. A good agreement of the CFD simulations is obtained for aerodynamic coefficients and pressure distribution. A working SJA has been integrated with the baseline airfoil and initial focus is on the aerodynamic stall control at angles of attack from 16 to 22 degrees. The results show a noticeable improvement in the aerodynamic performance with increase in lift and decrease in drag at these post stall regimes.

Keywords: Active flow control, Aerodynamic stall, Airfoilperformance, Synthetic jet actuator.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2311
7513 Real-time Haptic Modeling and Simulation for Prosthetic Insertion

Authors: Catherine A. Todd, Fazel Naghdy

Abstract:

In this work a surgical simulator is produced which enables a training otologist to conduct a virtual, real-time prosthetic insertion. The simulator provides the Ear, Nose and Throat surgeon with real-time visual and haptic responses during virtual cochlear implantation into a 3D model of the human Scala Tympani (ST). The parametric model is derived from measured data as published in the literature and accounts for human morphological variance, such as differences in cochlear shape, enabling patient-specific pre- operative assessment. Haptic modeling techniques use real physical data and insertion force measurements, to develop a force model which mimics the physical behavior of an implant as it collides with the ST walls during an insertion. Output force profiles are acquired from the insertion studies conducted in the work, to validate the haptic model. The simulator provides the user with real-time, quantitative insertion force information and associated electrode position as user inserts the virtual implant into the ST model. The information provided by this study may also be of use to implant manufacturers for design enhancements as well as for training specialists in optimal force administration, using the simulator. The paper reports on the methods for anatomical modeling and haptic algorithm development, with focus on simulator design, development, optimization and validation. The techniques may be transferrable to other medical applications that involve prosthetic device insertions where user vision is obstructed.

Keywords: Haptic modeling, medical device insertion, real-time visualization of prosthetic implantation, surgical simulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2046
7512 Prediction Modeling of Alzheimer’s Disease and Its Prodromal Stages from Multimodal Data with Missing Values

Authors: M. Aghili, S. Tabarestani, C. Freytes, M. Shojaie, M. Cabrerizo, A. Barreto, N. Rishe, R. E. Curiel, D. Loewenstein, R. Duara, M. Adjouadi

Abstract:

A major challenge in medical studies, especially those that are longitudinal, is the problem of missing measurements which hinders the effective application of many machine learning algorithms. Furthermore, recent Alzheimer's Disease studies have focused on the delineation of Early Mild Cognitive Impairment (EMCI) and Late Mild Cognitive Impairment (LMCI) from cognitively normal controls (CN) which is essential for developing effective and early treatment methods. To address the aforementioned challenges, this paper explores the potential of using the eXtreme Gradient Boosting (XGBoost) algorithm in handling missing values in multiclass classification. We seek a generalized classification scheme where all prodromal stages of the disease are considered simultaneously in the classification and decision-making processes. Given the large number of subjects (1631) included in this study and in the presence of almost 28% missing values, we investigated the performance of XGBoost on the classification of the four classes of AD, NC, EMCI, and LMCI. Using 10-fold cross validation technique, XGBoost is shown to outperform other state-of-the-art classification algorithms by 3% in terms of accuracy and F-score. Our model achieved an accuracy of 80.52%, a precision of 80.62% and recall of 80.51%, supporting the more natural and promising multiclass classification.

Keywords: eXtreme Gradient Boosting, missing data, Alzheimer disease, early mild cognitive impairment, late mild cognitive impairment, multiclass classification, ADNI, support vector machine, random forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 958
7511 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data have an increasingly important role in economic growth, innovation, technical advantages and business strategies and even in countries competitions. Analyzing of patent data is crucial since patents cover large part of all technological information of the world. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.

Keywords: Data mining, fuzzy sets, linguistic summarization, patent data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1218
7510 Comparison of Polynomial and Radial Basis Kernel Functions based SVR and MLR in Modeling Mass Transfer by Vertical and Inclined Multiple Plunging Jets

Authors: S. Deswal, M. Pal

Abstract:

Presently various computational techniques are used in modeling and analyzing environmental engineering data. In the present study, an intra-comparison of polynomial and radial basis kernel functions based on Support Vector Regression and, in turn, an inter-comparison with Multi Linear Regression has been attempted in modeling mass transfer capacity of vertical (θ = 90O) and inclined (θ multiple plunging jets (varying from 1 to 16 numbers). The data set used in this study consists of four input parameters with a total of eighty eight cases, forty four each for vertical and inclined multiple plunging jets. For testing, tenfold cross validation was used. Correlation coefficient values of 0.971 and 0.981 along with corresponding root mean square error values of 0.0025 and 0.0020 were achieved by using polynomial and radial basis kernel functions based Support Vector Regression respectively. An intra-comparison suggests improved performance by radial basis function in comparison to polynomial kernel based Support Vector Regression. Further, an inter-comparison with Multi Linear Regression (correlation coefficient = 0.973 and root mean square error = 0.0024) reveals that radial basis kernel functions based Support Vector Regression performs better in modeling and estimating mass transfer by multiple plunging jets.

Keywords: Mass transfer, multiple plunging jets, polynomial and radial basis kernel functions, Support Vector Regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1434