Search results for: data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7484

6314 Generalized Method for Estimating Best-Fit Vertical Alignments for Profile Data

Authors: Said M. Easa, Shinya Kikuchi

Abstract:

When the profile information of an existing road is missing or out of date and the parameters of the vertical alignment are needed for engineering analysis, the engineer has to recreate the geometric design features of the road alignment using collected profile data. The profile data may be collected using traditional surveying methods, global positioning systems, or digital imagery. This paper develops a method that estimates the parameters of the geometric features that best characterize the existing vertical alignment in terms of tangents and curve expressions, where the curves may be symmetrical, asymmetrical, reverse, or complex vertical curves. The method is implemented using an Excel-based optimization method that minimizes the differences between the observed profile and the profiles estimated from the equations of the vertical curve. The method uses a 'wireframe' representation of the profile that makes the proposed method applicable to all types of vertical curves. A secondary contribution of this paper is to introduce the properties of the equal-arc asymmetrical curve that has been recently developed in the highway geometric design field.

Keywords: Optimization, parameters, data, reverse, spreadsheet, vertical curves

Downloads: 2396
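As a rough illustration of the best-fit idea described in the abstract above (and not the authors' Excel-based method), the sketch below fits a single symmetrical parabolic vertical curve y = a + b*x + c*x^2 to profile points by ordinary least squares. The function name `fit_parabola` and the sample profile are hypothetical, chosen only for this example.

```python
def fit_parabola(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Normal equations (A^T A) p = A^T y for the basis [1, x, x^2].
    s = [sum(x ** k for x in xs) for k in range(5)]            # power sums of x
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]

    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [rv - factor * cv for rv, cv in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Hypothetical profile: stations (m) and elevations (m) sampled from
# y = 100 + 0.03*x - 0.0002*x**2, i.e. an entry grade of +3 %.
stations = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
elevations = [100 + 0.03 * x - 0.0002 * x ** 2 for x in stations]
a, b, c = fit_parabola(stations, elevations)
```

Here `a` is the elevation at the start of the curve, `b` the entry grade, and `2*c` the rate of change of grade. Asymmetrical, reverse, and complex curves would need a piecewise model, which this sketch does not attempt.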
6313 Dynamics of Mini Hydraulic Backhoe Excavator: A Lagrange-Euler (L-E) Approach

Authors: Bhaveshkumar P. Patel, J. M. Prajapati

Abstract:

Excavators are high power machines used in the mining, agricultural and construction industries whose principal functions are digging (material removal), ground leveling and material transport operations. During the digging task there are certain unknown forces exerted by the bucket on the soil, and the digging operation is repetitive in nature. Automation of the digging task can be performed by an automatically controlled excavator system, which not only controls the forces but also follows the planned digging trajectories. To develop such a controller for automated excavation, a dynamic model is required to describe the behavior of the control system during the digging operation and the motion of the excavator with time. The presented work describes a dynamic model needed for controller design, derived by applying the Lagrange-Euler approach. The developed dynamic model is intended for further development of an automated excavation control system for light duty construction work and can be applied to heavy duty or all types of backhoe excavators.

Keywords: Backhoe excavator, controller, digging, excavation, trajectory.

Downloads: 4404
6312 Aggregation Scheduling Algorithms in Wireless Sensor Networks

Authors: Min Kyung An

Abstract:

In Wireless Sensor Networks which consist of tiny wireless sensor nodes with limited battery power, one of the most fundamental applications is data aggregation which collects nearby environmental conditions and aggregates the data to a designated destination, called a sink node. Important issues concerning the data aggregation are time efficiency and energy consumption due to its limited energy, and therefore, the related problem, named Minimum Latency Aggregation Scheduling (MLAS), has been the focus of many researchers. Its objective is to compute the minimum latency schedule, that is, to compute a schedule with the minimum number of timeslots, such that the sink node can receive the aggregated data from all the other nodes without any collision or interference. For the problem, the two interference models, the graph model and the more realistic physical interference model known as Signal-to-Interference-Noise-Ratio (SINR), have been adopted with different power models, uniform-power and non-uniform power (with power control or without power control), and different antenna models, omni-directional antenna and directional antenna models. In this survey article, as the problem has proven to be NP-hard, we present and compare several state-of-the-art approximation algorithms in various models on the basis of latency as its performance measure.

Keywords: Data aggregation, convergecast, gathering, approximation, interference, omni-directional, directional.

Downloads: 761
6311 Health Assessment of Electronic Products using Mahalanobis Distance and Projection Pursuit Analysis

Authors: Sachin Kumar, Vasilis Sotiris, Michael Pecht

Abstract:

With increasing complexity in electronic systems there is a need for system level anomaly detection and fault isolation. Anomaly detection based on vector similarity to a training set is used in this paper through two approaches, one that preserves the original information, Mahalanobis Distance (MD), and the other that compresses the data into its principal components, Projection Pursuit Analysis. These methods have been used to detect deviations in system performance from normal operation and for critical parameter isolation in multivariate environments. The study evaluates the detection capability of each approach on a set of test data with known faults against a baseline set of data representative of such "healthy" systems.

Keywords: Mahalanobis distance, Principal components, Projection pursuit, Health assessment, Anomaly.

Downloads: 1638
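To make the Mahalanobis Distance mentioned in the abstract above concrete, here is a minimal sketch for two monitored parameters. It measures how far a test reading lies from a "healthy" baseline, scaled by the baseline's covariance. The function name `mahalanobis_2d` and the baseline values are assumptions for illustration and are not taken from the paper.

```python
import math


def mahalanobis_2d(point, baseline):
    """Mahalanobis distance of a 2-D point from a baseline sample."""
    n = len(baseline)
    mx = sum(p[0] for p in baseline) / n
    my = sum(p[1] for p in baseline) / n
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in baseline) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in baseline) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in baseline) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] * inverse(S) * [dx dy]^T, with the 2x2 inverse written out.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical healthy readings of two positively correlated parameters.
baseline = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
along = mahalanobis_2d((4.5, 4.5), baseline)    # follows the correlation
against = mahalanobis_2d((4.5, 0.5), baseline)  # breaks the correlation
```

Both test points are equally far from the baseline mean in Euclidean terms, but the one that breaks the correlation between the parameters gets the larger Mahalanobis distance, which is what makes the measure useful for multivariate anomaly detection.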
6310 Application of the Data Distribution Service for Flexible Manufacturing Automation

Authors: Marco Ryll, Svetan Ratchev

Abstract:

This paper discusses the applicability of the Data Distribution Service (DDS) for the development of automated and modular manufacturing systems which require a flexible and robust communication infrastructure. DDS is an emergent standard for datacentric publish/subscribe middleware systems that provides an infrastructure for platform-independent many-to-many communication. It particularly addresses the needs of real-time systems that require deterministic data transfer, have low memory footprints and high robustness requirements. After an overview of the standard, several aspects of DDS are related to current challenges for the development of modern manufacturing systems with distributed architectures. Finally, an example application is presented based on a modular active fixturing system to illustrate the described aspects.

Keywords: Flexible Manufacturing, Publish/Subscribe, Plug & Produce.

Downloads: 2311
6309 Impacts of Building Design Factors on Auckland School Energy Consumptions

Authors: Bin Su

Abstract:

This study focuses on the impact of school building design factors on winter extra energy consumption which mainly includes space heating, water heating and other appliances related to winter indoor thermal conditions. A number of Auckland schools were randomly selected for the study which introduces a method of using real monthly energy consumption data for a year to calculate winter extra energy data of school buildings. The study seeks to identify the relationships between winter extra energy data related to school building design data related to the main architectural features, building envelope and elements of the sample schools. The relationships can be used to estimate the approximate saving in winter extra energy consumption which would result from a changed design datum for future school development, and identify any major energy-efficient design problems. The relationships are also valuable for developing passive design guides for school energy efficiency.

Keywords: Building energy efficiency, Building thermal design, Building thermal performance, School building design.

Downloads: 1897
6308 Tree Based Data Aggregation to Resolve Funneling Effect in Wireless Sensor Network

Authors: G. Rajesh, B. Vinayaga Sundaram, C. Aarthi

Abstract:

In a wireless sensor network, sensor nodes periodically transmit the sensed data to the sink node using multi-hop communication. This high traffic induces congestion at the nodes located one hop from the sink node. The packet transmission and reception rate of these nodes must be very high compared to other sensor nodes in the network. Therefore, the energy consumption of these nodes is very high, and this effect is known as the "funneling effect". The tree-based data aggregation technique (TBDA) is used to reduce the energy consumption of these nodes. The throughput of the overall performance shows a considerable decrease in the number of packet transmissions to the sink node. The proposed scheme, TBDA, avoids the funneling effect and extends the lifetime of the wireless sensor network. The average-case time complexity for inserting the nodes in the tree is O(n log n) and the worst-case time complexity is O(n^2).

Keywords: Data Aggregation, Funneling Effect, Traffic Congestion, Wireless Sensor Network.

Downloads: 1270
6307 Product Features Extraction from Opinions According to Time

Authors: Kamal Amarouche, Houda Benbrahim, Ismail Kassou

Abstract:

Nowadays, e-commerce shopping websites have experienced noticeable growth. These websites have gained consumers' trust. After purchasing a product, many consumers share comments in which opinions about the given product are usually embedded. Research on the automatic management of opinions that gives suggestions to potential consumers and portrays an image of the product to manufacturers has been growing recently. Just after a product is launched, the reviews generated around it usually contain little helpful information, only generic opinions about the product (e.g. telephone: great phone...), because the product is still in the launching phase in the market. Over time, the product becomes old, and consumers perceive the advantages and disadvantages of each specific product feature. They then generate comments that contain their sentiments about these features. In this paper, we present an unsupervised method to extract the different product features hidden in the opinions which influence its purchase. The method combines Time Weighting (TW), which depends on the time at which opinions were expressed, with Term Frequency-Inverse Document Frequency (TF-IDF). We conduct several experiments using two different datasets about cell phones and hotels. The results show the effectiveness of our automatic feature extraction, as well as its domain-independent character.

Keywords: Opinion mining, product feature extraction, sentiment analysis, SentiWordNet.

Downloads: 1265
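The abstract above combines a time weight with TF-IDF but does not give the weighting formula. The sketch below is therefore only one possible reading: it assumes a linear weight that grows with the number of days since product launch, so later reviews (which the abstract says are more feature-specific) count more. The function name `time_weighted_tfidf`, the linear weight, and the sample reviews are all hypothetical.

```python
import math
from collections import Counter


def time_weighted_tfidf(reviews):
    """reviews: list of (tokens, days_since_launch). Returns {term: score}."""
    n_docs = len(reviews)
    max_age = max(days for _, days in reviews) or 1
    # Document frequency: number of reviews that contain each term.
    df = Counter(term for tokens, _ in reviews for term in set(tokens))
    scores = Counter()
    for tokens, days in reviews:
        weight = days / max_age  # ASSUMED linear time weight in [0, 1]
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[term])
            scores[term] += weight * tf * idf
    return dict(scores)


# Hypothetical tokenized reviews with their age in days since launch.
reviews = [
    (["great", "phone"], 1),
    (["great", "battery", "life"], 60),
    (["battery", "drains", "fast"], 90),
]
scores = time_weighted_tfidf(reviews)
```

With this toy data, a feature term such as "battery" outranks the generic term "great" because it appears in later, more heavily weighted reviews.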
6306 Sleep Scheduling Schemes Based on Location of Mobile User in Sensor-Cloud

Authors: N. Mahendran, R. Priya

Abstract:

Mobile cloud computing (MCC) combined with wireless sensor network (WSN) technology is attracting growing attention from researchers because it combines the sensors' data gathering ability with the cloud's data processing capacity. This approach overcomes the limited data storage capacity and computational ability of sensor nodes. Finally, the stored data are sent to mobile users on request. Most integrated sensor-cloud schemes fail to observe the following criteria: 1) The mobile users request specific data from the cloud based on their present location. 2) Power consumption, since most sensors are equipped with non-rechargeable batteries. Mostly, the sensors are deployed in hazardous and remote areas. This paper focuses on the above observations and introduces an approach known as the collaborative location-based sleep scheduling (CLSS) scheme. The awake or asleep status of each sensor node is dynamically devised by schedulers, and the scheduling is based purely on the mobile users' current location; in this manner, energy consumption in the WSN is greatly reduced. CLSS comprises two different methods: the CLSS1 scheme provides lower energy consumption, and CLSS2 provides scalability and robustness of the integrated WSN.

Keywords: Sleep scheduling, mobile cloud computing, wireless sensor network, integration, location, network lifetime.

Downloads: 939
6305 Impact of Safety and Quality Considerations of Housing Clients on the Construction Firms’ Intention to Adopt Quality Function Deployment: A Case of Construction Sector

Authors: Saif Ul Haq

Abstract:

The current study intends to examine the safety and quality considerations of clients of housing projects and their impact on the adoption of Quality Function Deployment (QFD) by the construction firm. Mixed method research technique has been used to collect and analyze the data wherein a survey was conducted to collect the data from 220 clients of housing projects in Saudi Arabia. Then, the telephonic and Skype interviews were conducted to collect data of 15 professionals working in the top ten real estate companies of Saudi Arabia. Data were analyzed by using partial least square (PLS) and thematic analysis techniques. Findings reveal that today’s customer prioritizes the safety and quality requirements of their houses and as a result, construction firms adopt QFD to address the needs of customers. The findings are of great importance for the clients of housing projects as well as for the construction firms as they could apply QFD in housing projects to address the safety and quality concerns of their clients.

Keywords: Construction industry, quality considerations, quality function deployment, safety considerations.

Downloads: 847
6304 Input Data Balancing in a Neural Network PM-10 Forecasting System

Authors: Suk-Hyun Yu, Heeyong Kwon

Abstract:

Recently, PM-10 has become a social and global issue. It is one of the major air pollutants that affect human health. Therefore, it needs to be forecasted rapidly and precisely. However, PM-10 comes from various emission sources, and its level of concentration is largely dependent on meteorological and geographical factors of the local and global region, so forecasting PM-10 concentration is very difficult. A neural network model can be used in this case, but there are few cases of high-concentration PM-10, which makes the learning of the neural network model difficult. In this paper, we suggest a simple input balancing method for when the data distribution is uneven. It is based on the probability of appearance of the data. Experimental results show that the input balancing makes the neural networks' learning easier and improves the forecasting rates.

Keywords: AI, air quality prediction, neural networks, pattern recognition, PM-10.

Downloads: 789
6303 Tree Based Data Fusion Clustering Routing Algorithm for Illimitable Network Administration in Wireless Sensor Network

Authors: Y. Harold Robinson, M. Rajaram, E. Golden Julie, S. Balaji

Abstract:

In wireless sensor networks, locality and positioning information can be captured using Global Positioning System (GPS). This message can be congregated initially from spot to identify the system. Users can retrieve information of interest from a wireless sensor network (WSN) by injecting queries and gathering results from the mobile sink nodes. Routing is the progression of choosing optimal path in a mobile network. Intermediate node employs permutation of device nodes into teams and generating cluster heads that gather the data from entity cluster’s node and encourage the collective data to base station. WSNs are widely used for gathering data. Since sensors are power-constrained devices, it is quite vital for them to reduce the power utilization. A tree-based data fusion clustering routing algorithm (TBDFC) is used to reduce energy consumption in wireless device networks. Here, the nodes in a tree use the cluster formation, whereas the elevation of the tree is decided based on the distance of the member nodes to the cluster-head. Network simulation shows that this scheme improves the power utilization by the nodes, and thus considerably improves the lifetime.

Keywords: WSN, TBDFC, LEACH, PEGASIS, TREEPSI.

Downloads: 1076
6302 Holistic Face Recognition using Multivariate Approximation, Genetic Algorithms and AdaBoost Classifier: Preliminary Results

Authors: C. Villegas-Quezada, J. Climent

Abstract:

Several works regarding facial recognition have dealt with methods which identify isolated characteristics of the face or with templates which encompass several regions of it. In this paper, a new technique is introduced which approaches the problem holistically, dispensing with the need to identify geometrical characteristics or regions of the face. The characterization of a face is achieved by randomly sampling selected attributes of the pixels of its image. From this information we construct a set of data corresponding to the values of low frequencies, gradient, entropy and several other characteristics of the pixels of the image, generating a set of "p" variables. The multivariate data set is approximated with different polynomials, minimizing the data fitness error in the minimax sense (L∞ norm). The use of a Genetic Algorithm (GA) makes it possible to circumvent the problem of dimensionality inherent to higher degree polynomial approximations. The GA yields the degree and values of a set of coefficients of the polynomials approximating the image of a face. The system is trained by finding, through a resampling process, a family of characteristic polynomials from several variables (pixel characteristics) for each face (say Fi) in the database. A face (say F) is recognized by finding its characteristic polynomials and using an AdaBoost Classifier from F's polynomials to each of the Fi's polynomials. The winner is the polynomial family closest to F's, corresponding to the target face in the database.

Keywords: AdaBoost Classifier, Holistic Face Recognition, Minimax Multivariate Approximation, Genetic Algorithm.

Downloads: 1446
6301 Application of Exact String Matching Algorithms towards SMILES Representation of Chemical Structure

Authors: Ahmad Fadel Klaib, Zurinahni Zainol, Nurul Hashimah Ahamed, Rosma Ahmad, Wahidah Hussin

Abstract:

Bioinformatics and cheminformatics are disciplines that use computers to provide tools for the acquisition, storage, processing, analysis, and integration of data and for the development of potential applications of biological and chemical data. A chemical database is a database that is exclusively designed to store chemical information. NMRShiftDB is one of the main databases used to represent chemical structures as 2D or 3D structures. The SMILES format is one of many ways to write a chemical structure in a linear format. In this study we extracted Antimicrobial Structures in SMILES format from NMRShiftDB and stored them in our Local Data Warehouse with their corresponding information. Additionally, we developed a searching tool that responds to a user's query using the JME Editor tool, which allows the user to draw or edit molecules and converts the drawn structure into SMILES format. We applied the Quick Search algorithm to search for Antimicrobial Structures in our Local Data Warehouse.

Keywords: Exact String-matching Algorithms, NMRShiftDB, SMILES Format, Antimicrobial Structures.

Downloads: 2173
6300 Intrusion Detection based on Distance Combination

Authors: Joffroy Beauquier, Yongjie Hu

Abstract:

The intrusion detection problem has been frequently studied, but intrusion detection methods are often based on a single point of view, which always limits the results. In this paper, we introduce a new intrusion detection model based on the combination of different current methods. First, we use a notion of distance to unify the different methods. Second, we combine these methods using the Pearson correlation coefficients, which measure the relationship between two methods, and we obtain a combined distance. If the combined distance is greater than a predetermined threshold, an intrusion is detected. We have implemented and tested the combination model with two different public data sets: the data set of masquerade detection collected by Schonlau et al., and the data set of program behaviors from the University of New Mexico. The results of the experiments show that the combination model performs better.

Keywords: Intrusion detection, combination, distance, Pearson correlation coefficients.

Downloads: 1804
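The abstract above relies on the Pearson correlation coefficient between two detection methods and on a threshold test of a combined distance. The sketch below computes the coefficient from scratch. Because the abstract does not state how the correlation enters the combined distance, the plain average used here is only a placeholder assumption, as are the scores and the threshold value.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical normalized distances reported by two methods for four sessions.
method_a = [0.1, 0.2, 0.4, 0.9]
method_b = [0.2, 0.1, 0.5, 0.8]
r = pearson(method_a, method_b)  # how strongly the two methods agree

# PLACEHOLDER combination rule (plain average); the paper's rule is not given.
THRESHOLD = 0.6
combined = [(a + b) / 2 for a, b in zip(method_a, method_b)]
alerts = [d > THRESHOLD for d in combined]
```

A coefficient near 1 indicates the two methods largely agree, so a real combination rule would presumably use it to avoid counting the same evidence twice.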
6299 Fault Tolerance in Distributed Database Systems

Authors: M. A. Adeboyejo, O. O. Adeosun

Abstract:

Pioneer networked systems assume that connections are reliable, and a faulty operation will be considered in case of losing a connection. Transient connections are typical of mobile devices. Areas of application of data sharing system such as these, lead to the conclusion that network connections may not always be reliable, and that the conventional approaches can be improved. Nigerian commercial banking industry is a critical system whose operation is increasingly becoming dependent on information technology (IT) driven information system. The proposed solution to this problem makes use of a hierarchically clustered network structure which we selected to reflect (as much as possible) the typical organizational structure of the Nigerian commercial banks. Representative transactions such as data updates and replication of the results of such updates were used to simulate the proposed model to show its applicability.

Keywords: Dependability, reliability, data redundancy.

Downloads: 3309
6298 Normalization Discriminant Independent Component Analysis

Authors: Liew Yee Ping, Pang Ying Han, Lau Siong Hoe, Ooi Shih Yin, Housam Khalifa Bashier Babiker

Abstract:

In face recognition, feature extraction techniques attempt to search for an appropriate representation of the data. However, when the feature dimension is larger than the sample size, performance degrades. Hence, we propose a method called Normalization Discriminant Independent Component Analysis (NDICA). The input data are regularized to obtain the most reliable features from the data and processed using Independent Component Analysis (ICA). The proposed method is evaluated on three face databases: Olivetti Research Ltd (ORL), Face Recognition Technology (FERET) and Face Recognition Grand Challenge (FRGC). NDICA showed its effectiveness compared with other unsupervised and supervised techniques.

Keywords: Face recognition, small sample size, regularization, independent component analysis.

Downloads: 1920
6297 Daily Global Solar Radiation Modeling Using Multi-Layer Perceptron (MLP) Neural Networks

Authors: Seyed Fazel Ziaei Asl, Ali Karami, Gholamreza Ashari, Azam Behrang, Arezoo Assareh, N. Hedayat

Abstract:

The main objective of this study is to predict daily global solar radiation (GSR) from meteorological variables using multi-layer perceptron (MLP) neural networks. Daily mean air temperature, relative humidity, sunshine hours, evaporation, wind speed, and soil temperature values between 2002 and 2006 for Dezful city in Iran (32° 16' N, 48° 25' E) are used in this study. The measured data between 2002 and 2005 are used to train the neural networks, while the data for 214 days from 2006 are used as testing data.

Keywords: Multi-layer Perceptron (MLP) Neural Networks, Global Solar Radiation (GSR), Meteorological Parameters, Prediction.

Downloads: 2929
6296 The Effect of CPU Location in Total Immersion of Microelectronics

Authors: A. Almaneea, N. Kapur, J. L. Summers, H. M. Thompson

Abstract:

Meeting the growth in demand for digital services such as social media, telecommunications, and business and cloud services requires large scale data centres, which has led to an increase in their end use energy demand. Generally, over 30% of data centre power is consumed by the necessary cooling overhead. Thus, energy use can be reduced by improving the cooling efficiency. Air and liquid can both be used as cooling media for the data centre. Traditional data centre cooling systems use air; however, liquid is recognised as a promising medium that can handle more densely packed data centres. Liquid cooling can be classified into three methods: rack heat exchanger, on-chip heat exchanger and full immersion of the microelectronics. This study quantifies the improvements in heat transfer specifically for the case of immersed microelectronics by varying the CPU and heat sink location. Immersion of the server is achieved by filling the gap between the microelectronics and a water jacket with a dielectric liquid which convects the heat from the CPU to the water jacket on the opposite side. Heat transfer is governed by two physical mechanisms: natural convection for the fixed enclosure filled with dielectric liquid, and forced convection for the water that is pumped through the water jacket. The model in this study is validated against published numerical and experimental work and shows good agreement with it. The results show that the heat transfer performance and Nusselt number (Nu) are improved by 89% by placing the CPU and heat sink on the bottom of the microelectronics enclosure.

Keywords: CPU location, data centre cooling, heat sink in enclosures, Immersed microelectronics, turbulent natural convection in enclosures.

Downloads: 2128
6295 A Study on the Cloud Simulation with a Network Topology Generator

Authors: Jun-Kwon Jung, Sung-Min Jung, Tae-Kyung Kim, Tai-Myoung Chung

Abstract:

CloudSim is a useful tool for simulating the cloud environment. It shows the service availability, the power consumption, and the network traffic of services in the cloud environment. Moreover, it supports easy calculation of network communication delay from network topology data. CloudSim accepts a file of topology data as input, but it does not provide any way to generate one. Thus, the topology data file must be generated by some other tool. BRITE is a typical network topology generator that supports various types of topology-generating algorithms. If CloudSim could incorporate BRITE, network simulation for clouds would be easier than with the existing version. This paper shows the potential of connecting BRITE and CloudSim, and proposes a direction for linking them.

Keywords: Cloud, simulation, topology, BRITE, network.

Downloads: 3731
6294 Low Power Circuit Architecture of AES Crypto Module for Wireless Sensor Network

Authors: MooSeop Kim, Juhan Kim, Yongje Choi

Abstract:

Recently, much research has been conducted on security for wireless sensor networks and ubiquitous computing. Security issues such as authentication and data integrity are major requirements for constructing sensor network systems. The Advanced Encryption Standard (AES) is considered one of the candidate algorithms for data encryption in wireless sensor networks. In this paper, we present a hardware architecture for implementing a low power AES crypto module. Our low power AES crypto module has an optimized architecture for the data encryption unit and key schedule unit, which could be applicable to wireless sensor networks. We also detail the low power design methods used to design our low power AES crypto module.

Keywords: Algorithm, Low Power Crypto Circuit, AES, Security.

Downloads: 2461
6293 Role of Credit on Production Efficiency of Farming Sector in Pakistan (A Data Envelopment Analysis)

Authors: Saima Ayaz, Zakir Hussain, Maqbool Hussain Sial

Abstract:

The study identified the sources of production inefficiency of the farming sector in district Faisalabad in the Punjab province of Pakistan. The Data Envelopment Analysis (DEA) technique was applied to farm level survey data of 300 farmers for the year 2009. The overall mean efficiency score was 0.78, indicating 22 percent inefficiency among the sample farmers. Computed efficiency scores were then regressed on farm specific variables using Tobit regression analysis. Farming experience, education, access to farming credit, herd size and number of cultivation practices showed a constructive and significant effect on the farmers' technical efficiency.

Keywords: Agricultural credit, DEA, Technical efficiency, Tobit analysis

Downloads: 2300
6292 Towards End-To-End Disease Prediction from Raw Metagenomic Data

Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker

Abstract:

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist of multiple steps including quality control, filtering, and alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often introduce variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps: (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come; and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome.
Using two public real-life data-sets as well as a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on data processed through mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.

Keywords: Metagenomics, phenotype prediction, deep learning, embeddings, multiple instance learning.

Downloads 833
6291 Establishing a Probabilistic Model of Extrapolated Wind Speed Data for Wind Energy Prediction

Authors: Mussa I. Mgwatu, Reuben R. M. Kainkwa

Abstract:

Wind is among the potential energy resources which can be harnessed to generate wind energy for conversion into electrical power. Due to the variability of wind speed with time and height, it is difficult to predict the generated wind energy accurately. In this paper, an attempt is made to establish a probabilistic model fitting the wind speed data recorded at the Makambako site in Tanzania. Wind speed and direction were measured using an anemometer (type AN1) and a wind vane (type WD1), respectively, both supplied by Delta-T-Devices, at a measurement height of 2 m. Wind speeds were then extrapolated to a height of 10 m using the power law equation with an exponent of 0.47. Data were analysed using MINITAB statistical software to show the variability of wind speeds with time and height, and to determine the underlying probability model of the extrapolated wind speed data. The results show that wind speeds at the Makambako site vary cyclically over time and conform to the Weibull probability distribution. From these results, the Weibull probability density function can be used to predict the wind energy.
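
The extrapolation and the distribution mentioned above can be sketched as follows. This is a minimal example assuming the standard power law form v2 = v1 * (h2 / h1)^alpha and the standard two-parameter Weibull density; the function names and the sample speed, shape, and scale values are hypothetical and are not results from the paper.

```python
import math


def extrapolate_wind_speed(v_ref: float, h_ref: float = 2.0,
                           h_target: float = 10.0, alpha: float = 0.47) -> float:
    """Power law: v_target = v_ref * (h_target / h_ref) ** alpha."""
    return v_ref * (h_target / h_ref) ** alpha


def weibull_pdf(v: float, shape: float, scale: float) -> float:
    """Two-parameter Weibull density for a wind speed v >= 0."""
    if v < 0:
        return 0.0
    ratio = v / scale
    return (shape / scale) * ratio ** (shape - 1) * math.exp(-(ratio ** shape))


if __name__ == "__main__":
    # Hypothetical 2 m reading in m/s; the heights and exponent follow the abstract.
    v_10m = extrapolate_wind_speed(3.0)
    print(f"Extrapolated speed at 10 m: {v_10m:.2f} m/s")
    # Hypothetical Weibull parameters, for illustration only.
    print(f"Density at that speed: {weibull_pdf(v_10m, shape=2.0, scale=7.0):.4f}")
```

With an exponent of 0.47 and heights of 2 m and 10 m, the extrapolation multiplies every measured speed by 5^0.47, roughly 2.13.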

Keywords: Probabilistic models, wind speed, wind energy

Downloads 2306
6290 Demographic Factors Influencing Employees’ Salary Expectations and Labor Turnover

Authors: M. Osipova

Abstract:

Thanks to the development of information technologies, every sphere of the economy is becoming more and more data-centric, as people generate huge datasets containing information on every aspect of their lives. Applying research on such data to human resources management yields otherwise scarce statistics on the state of the labor market, including salary expectations and the typical career behavior of potential employees, and this information can become a reliable basis for management decisions. This article presents the results of career behavior research based on freely accessible resume data. The information used in the study is much broader than that typically used in human resources surveys, so there is enough data for statistically significant results even in subgroup analysis.

Keywords: Human resources management, labor market, salary expectations, statistics, turnover.

Downloads 1804
6289 Mathematical Modeling to Predict Surface Roughness in CNC Milling

Authors: Ab. Rashid M.F.F., Gan S.Y., Muhammad N.Y.

Abstract:

Surface roughness (Ra) is one of the most important requirements in the machining process. In order to obtain better surface roughness, the proper setting of cutting parameters is crucial before the process takes place. This research presents the development of a mathematical model for predicting surface roughness before the milling process, in order to evaluate the fitness of the machining parameters: spindle speed, feed rate and depth of cut. 84 samples were run in this study using a FANUC CNC Milling α-T14iE. The samples were randomly divided into two data sets: a training set (m=60) and a testing set (m=24). ANOVA showed that at least one of the population regression coefficients was not zero. The Multiple Regression Method was used to determine the correlation between a criterion variable and a combination of predictor variables. It was established that the surface roughness is most influenced by the feed rate. Using the Multiple Regression Method equation, the average percentage deviation was 9.8% for the testing set and 9.7% for the training set. This showed that the statistical model could predict the surface roughness with about 90.2% accuracy on the testing data set and 90.3% accuracy on the training data set.
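
The relation between the reported deviation and accuracy figures can be sketched as follows. The abstract does not give the formula, so this example assumes the average percentage deviation is the mean of |measured - predicted| / measured, expressed in percent, and that accuracy is 100% minus that value; the function names and sample Ra values are hypothetical.

```python
def average_percentage_deviation(measured: list[float],
                                 predicted: list[float]) -> float:
    """Mean absolute deviation relative to the measured value, in percent."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    deviations = [abs(m - p) / m for m, p in zip(measured, predicted)]
    return 100.0 * sum(deviations) / len(deviations)


def accuracy_from_deviation(deviation_percent: float) -> float:
    """Accuracy taken as the complement of the average percentage deviation."""
    return 100.0 - deviation_percent


if __name__ == "__main__":
    # Hypothetical Ra values in micrometres, not data from the study.
    measured_ra = [2.0, 4.0]
    predicted_ra = [1.8, 4.4]
    deviation = average_percentage_deviation(measured_ra, predicted_ra)
    print(f"Deviation: {deviation:.1f}%")
    print(f"Accuracy: {accuracy_from_deviation(deviation):.1f}%")
```

Under this assumption, the reported 9.8% deviation on the testing set corresponds to 100 - 9.8 = 90.2% accuracy, matching the figures in the abstract.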

Keywords: Surface roughness, regression analysis.

Downloads 2076
6288 Parameter Estimation using Maximum Likelihood Method from Flight Data at High Angles of Attack

Authors: Rakesh Kumar, A. K. Ghosh

Abstract:

The paper presents the modeling of nonlinear longitudinal aerodynamics using flight data of the Hansa-3 aircraft at high angles of attack near stall. Kirchhoff's quasi-steady stall model has been used to incorporate nonlinear aerodynamic effects into the aerodynamic model used to estimate the parameters, thereby making the aerodynamic model nonlinear. The Maximum Likelihood method has been applied to the flight data (at high angles of attack) for the estimation of parameters (aerodynamic and stall characteristics) using the nonlinear aerodynamic model. To improve the accuracy of the estimates, an approach of fixing the strong parameters has also been presented.

Keywords: Maximum Likelihood, nonlinear, parameters, stall.

Downloads 2170
6287 Automatic Thresholding for Data Gap Detection for a Set of Sensors in Instrumented Buildings

Authors: Houda Najeh, Stéphane Ploix, Mahendra Pratap Singh, Karim Chabir, Mohamed Naceur Abdelkrim

Abstract:

Building systems are highly vulnerable to different kinds of faults and failures. In fact, various faults, failures and human behaviors can affect building performance. This paper tackles the detection of unreliable sensors in buildings. Several literature surveys on diagnosis techniques for sensor grids in buildings have been published, but all of them treat only bias and outliers. Occurrences of data gaps have also not received adequate attention in academia. The proposed methodology comprises automatic thresholding for data gap detection for a set of heterogeneous sensors in instrumented buildings. Sensor measurements are considered to be regular time series. In reality, however, sensor values are not uniformly sampled. The question to answer is therefore: beyond which delay should each sensor be considered faulty? Time series are required to detect abnormalities in the delays. The efficiency of the method is evaluated on measurements obtained from a real power plant: an office at Grenoble Institute of Technology equipped with 30 sensors.
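
The idea of flagging a gap from the delays between consecutive samples can be sketched as follows. The abstract does not describe how the threshold is derived, so the rule used here (a multiple of the sensor's median delay) is a stand-in chosen for illustration and is not the authors' method; the function name, the factor, and the timestamps are hypothetical.

```python
from statistics import median


def find_data_gaps(timestamps: list[float],
                   factor: float = 5.0) -> list[tuple[float, float]]:
    """Return (start, end) pairs whose delay exceeds factor * median delay.

    The threshold is computed per sensor from its own delays, so sensors
    with different reporting rates get different thresholds.
    """
    pairs = list(zip(timestamps, timestamps[1:]))
    if not pairs:
        return []
    delays = [end - start for start, end in pairs]
    threshold = factor * median(delays)
    return [(start, end) for start, end in pairs if end - start > threshold]


if __name__ == "__main__":
    # Hypothetical sample times in seconds for one sensor.
    samples = [0, 10, 20, 30, 200, 210]
    print(find_data_gaps(samples))
```

Because the threshold is derived from each sensor's own delays, irregularly sampled sensors are not compared against a single global cut-off.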

Keywords: Building system, time series, diagnosis, outliers, delay, data gap.

Downloads 840
6286 Daily and Seasonal Changes of Air Pollution in Kuwait

Authors: H. Ettouney, A. AL-Haddad, S. Saqer

Abstract:

This paper focuses on the assessment of air pollution in Umm-Alhyman, Kuwait, which is located south of oil refineries, a power station, an oil field, and highways. The measurements were made over a period of four days in March and July in 2001, 2004, and 2008. The measured pollutants included methanated and nonmethanated hydrocarbons (MHC, NMHC), CO, CO2, SO2, NOX, O3, and PM10. Meteorological parameters were also measured, including temperature, wind speed and direction, and solar radiation. Over the study period, data analysis showed an increase in measured SO2, NOX and CO by factors of 1.2, 5.5 and 2, respectively. This is explained in terms of increases in industrial activity, motor vehicle density, and power generation. Predictions of the measured data were made with the ISC-AERMOD software package using the ISCST3 model option. Finally, the measured data were compared against international standards.

Keywords: Air pollution, Emission inventory, ISCST3 model, Modeling

Downloads 2369
6285 Regular Data Broadcasting Plan with Grouping in Wireless Mobile Environment

Authors: John T. Tsiligaridis

Abstract:

The broadcast problem, including the plan design, is considered. The data are inserted and numbered in a predefined order into relations of customized size. The server's ability to create a full, regular Broadcast Plan (RBP) with single and multiple channels after some data transformations is examined. The Regular Geometric Algorithm (RGA) prepares an RBP and enables users to catch their items while avoiding energy waste on their devices. Moreover, the Grouping Dimensioning Algorithm (GDA), based on integrated relations, can guarantee the discrimination of services with a minimum number of channels. This last property, along with self-monitoring and self-organizing, can be offered by servers today, which also provide channel availability and lower energy consumption by using a smaller number of channels. Simulation results are provided.

Keywords: Broadcast, broadcast plan, mobile computing, wireless networks, scheduling.

Downloads 1408