Search results for: data preprocessing.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7522

6532 Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Authors: Shohei Maruyama, Yasuo Matsuyama, Sachiyo Aburatani

Abstract:

Developing methods to estimate gene functions is an important task in bioinformatics. One approach to annotation is to identify the metabolic pathway in which a gene is involved. Since gene expression data reflect various intracellular phenomena, they are considered to be related to gene function. However, estimating gene function with high accuracy has been difficult. The low accuracy is thought to stem from the difficulty of measuring gene expression accurately: even under identical conditions, measured expression levels usually vary. In this study, we propose a feature extraction method that focuses on the variability of gene expression in order to estimate a gene's metabolic pathway accurately. First, we estimated the distribution of each gene's expression from replicate data. Next, we calculated the similarity between all gene pairs using KL divergence, a measure for comparing distributions. Finally, we used the similarity vectors as feature vectors and trained a multiclass SVM to identify each gene's metabolic pathway. To evaluate the method, we applied it to budding yeast and trained the multiclass SVM to identify seven metabolic pathways. The accuracy obtained with our method was higher than that obtained from the raw gene expression data. Thus, our method combined with KL divergence is useful for identifying genes' metabolic pathways.

Keywords: Metabolic pathways, gene expression data, microarray, Kullback–Leibler divergence, KL divergence, support vector machines, SVM, machine learning.

Downloads: 2337
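
As a rough illustration of the KL-divergence feature step described in entry 6532, the sketch below fits a univariate normal distribution to each gene's replicates and uses the closed-form KL divergence between normals to build one feature vector per gene. The normal model, the gene names, and the expression values are assumptions made for illustration; the abstract does not state which distribution family the authors used, and the SVM training step is omitted.

```python
import math
from statistics import mean, stdev

def fit_normal(replicates):
    """Estimate (mean, standard deviation) from replicate measurements."""
    return mean(replicates), stdev(replicates)

def kl_normal(p, q):
    """Closed-form KL(p || q) for two univariate normals given as (mean, sd)."""
    (m1, s1), (m2, s2) = p, q
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

def kl_feature_vectors(expression):
    """Map each gene to its vector of KL divergences against every gene."""
    params = {gene: fit_normal(reps) for gene, reps in expression.items()}
    genes = sorted(params)
    return {g: [kl_normal(params[g], params[h]) for h in genes] for g in genes}

# Hypothetical replicate measurements (three replicates per gene).
expression = {
    "geneA": [1.0, 1.2, 0.8],
    "geneB": [1.1, 0.9, 1.0],
    "geneC": [3.0, 3.5, 2.5],
}
features = kl_feature_vectors(expression)
```

Each row of `features` could then serve as the input vector for a multiclass classifier. Note that KL divergence is asymmetric, so `features["geneA"][1]` and `features["geneB"][0]` generally differ.
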
6531 An Analysis of Compression Methods and Implementation of Medical Images in Wireless Network

Authors: C. Rajan, K. Geetha, S. Geetha

Abstract:

The purpose of image compression is to reduce irrelevance and redundancy in image data so that the data can be stored or transmitted efficiently. Several types of compression methods are available. Without compression, file sizes are considerably larger, usually several megabytes; with compression, it is possible to reduce the file size to as little as 10% of the original without noticeable loss in quality. Image compression can be lossless or lossy. Compression techniques can be applied to images, audio, video and text data. This research work concentrates mainly on encoding methods, DCT, compression methods, security, etc. Different methodologies and network simulations are analyzed. Various compression methodologies and their performance metrics are investigated and presented in tabular form.

Keywords: Image compression techniques, encoding, DCT, lossy compression, lossless compression, JPEG.

Downloads: 1189
6530 A Meta-Analytic Path Analysis of e-Learning Acceptance Model

Authors: David W.S. Tai, Ren-Cheng Zhang, Sheng-Hung Chang, Chin-Pin Chen, Jia-Ling Chen

Abstract:

This study reports the results of a meta-analytic path analysis of the e-learning acceptance model with k = 27 studies. Databases searched included the Information Sciences Institute (ISI) website. Variables recorded included perceived usefulness, perceived ease of use, attitude toward behavior, and behavioral intention to use e-learning. A correlation matrix of these variables was derived from the meta-analytic data and then analyzed using structural path analysis to test the fit of the e-learning acceptance model to the observed aggregated data. Results showed the revised hypothesized model to be a reasonably good fit to the aggregated data. Furthermore, discussions and implications are given in this article.

Keywords: E-learning, Meta Analytic Path Analysis, Technology Acceptance Model

Downloads: 2449
6529 AniMoveMineR: Animal Behavior Exploratory Analysis Using Association Rules Mining

Authors: Suelane Garcia Fontes, Silvio Luiz Stanzani, Pedro L. Pizzigatti Corrêa, Ronaldo G. Morato

Abstract:

Environmental changes and major natural disasters are increasingly prevalent in the world due to the damage that humanity has caused to nature, and this damage directly affects the lives of animals. Thus, the study of animal behavior and of animals' interactions with the environment can provide knowledge that guides researchers and public agencies in preservation and conservation actions. Exploratory analysis of animal movement can determine patterns of animal behavior, and technological advances have expanded the ability to track animals and, consequently, the scope of behavioral studies. There is a great deal of research on animal movement and behavior, but we note the lack of a proposal that combines resources, allows exploratory analysis of animal movement, and provides statistical measures of individual animal behavior and its interaction with the environment. The contribution of this paper is the AniMoveMineR framework, a unified solution that combines trajectory analysis and data mining techniques to explore animal movement data and provide a first step toward answering questions about individual animal behavior and interactions with other animals over time and space. We evaluated the framework using data from monitored jaguars in the city of Miranda Pantanal, Brazil, to verify whether AniMoveMineR makes it possible to identify the level of interaction between these jaguars. The results were positive and provided indications about the individual behavior of the jaguars and about which jaguars have the highest or lowest correlation.

Keywords: Data mining, data science, trajectory, animal behavior.

Downloads: 921
6528 The Integration of Patient Health Record Generated from Wearable and Internet of Things Devices into Health Information Exchanges

Authors: Dalvin D. Hill, Hector M. Castro Garcia

Abstract:

A growing number of individuals use wearable devices on a daily basis. The usage and functionality of these devices vary from user to user. One popular use is to track health-related activities, which are typically stored in the device's memory or uploaded to an account in the cloud; based on the current trend, the data accumulated from a wearable device are stored in a standalone location. In many of these cases, the health-related data are not taken into account when considering the holistic view of a user's health lifestyle or record. Health-related data generated from wearable and Internet of Things (IoT) devices can serve as empirical information for a medical provider, as the standalone data can add value to the holistic health record of a patient. This paper proposes a solution to incorporate the data gathered from these wearable and IoT devices into a patient's Personal Health Record (PHR) stored within the confines of a Health Information Exchange (HIE).

Keywords: Electronic health record, health information exchanges, Internet of Things, personal health records, wearable devices, wearables.

Downloads: 1763
6527 Cooperative Data Caching in WSN

Authors: Narottam Chand

Abstract:

Wireless sensor networks (WSNs) have gained tremendous attention in recent years due to their numerous applications. Due to the limited energy resource, energy efficient operation of sensor nodes is a key issue in wireless sensor networks. Cooperative caching which ensures sharing of data among various nodes reduces the number of communications over the wireless channels and thus enhances the overall lifetime of a wireless sensor network. In this paper, we propose a cooperative caching scheme called ZCS (Zone Cooperation at Sensors) for wireless sensor networks. In ZCS scheme, one-hop neighbors of a sensor node form a cooperative cache zone and share the cached data with each other. Simulation experiments show that the ZCS caching scheme achieves significant improvements in byte hit ratio and average query latency in comparison with other caching strategies.

Keywords: Admission control, cache replacement, cooperative caching, WSN, zone cooperation

Downloads: 2760
6526 Exploiting Kinetic and Kinematic Data to Plot Cyclograms for Managing the Rehabilitation Process of BKAs by Applying Neural Networks

Authors: L. Parisi

Abstract:

Kinematic data wisely correlate vector quantities in space to scalar parameters in time to assess the degree of symmetry between the intact limb and the amputated limb with respect to a normal model derived from the gait of control group participants. Furthermore, these particular data allow a doctor to preliminarily evaluate the usefulness of a certain rehabilitation therapy. Kinetic curves allow the analysis of ground reaction forces (GRFs) to assess the appropriateness of human motion. Electromyography (EMG) allows the analysis of the fundamental lower limb force contributions to quantify the level of gait asymmetry. However, the use of this technological tool is expensive and requires patient’s hospitalization. This research work suggests overcoming the above limitations by applying artificial neural networks.

Keywords: Kinetics, kinematics, cyclograms, neural networks.

Downloads: 2090
6525 The Automated Selective Acquisition System

Authors: Atisthan Wuttimanop, Suchada Rianmora

Abstract:

To support the design process and launch products on time, the reverse engineering (RE) process has been introduced for quickly generating a 3D CAD model from a physical object. The accuracy of the 3D CAD model depends on the data acquisition technique selected, whether contact or non-contact. To reduce the time spent acquiring surfaces and eliminating noise, the automated selective acquisition system has been developed and is presented in this research as an alternative non-contact acquisition technique in which the data are selectively and locally scanned contour by contour, without a data reduction process. The results are organized contour points that are used directly to generate a 3D virtual model. A comparison between the proposed technique and another non-contact scanning technique is presented and discussed.

Keywords: Automated selective acquisition system, Non-contact acquisition, Reverse engineering, 3D scanners.

Downloads: 1609
6524 Node Insertion in Coalescence Hidden-Variable Fractal Interpolation Surface

Authors: Srijanani Anurag Prasad

Abstract:

The Coalescence Hidden-variable Fractal Interpolation Surface (CHFIS) was built by combining interpolation data from the Iterated Function System (IFS). The interpolation data in a CHFIS comprise a row and/or column of uncertain values when a single point is entered. Alternatively, a row and/or column of additional points are placed in the given interpolation data to demonstrate the node added CHFIS. There are three techniques for inserting new points that correspond to the row and/or column of nodes inserted, and each method is further classified into four types based on the values of the inserted nodes. As a result, numerous forms of node insertion can be found in a CHFIS.

Keywords: Fractal, interpolation, iterated function system, coalescence, node insertion, knot insertion.

Downloads: 343
6523 Comparative Study on Swarm Intelligence Techniques for Biclustering of Microarray Gene Expression Data

Authors: R. Balamurugan, A. M. Natarajan, K. Premalatha

Abstract:

Microarray gene expression data play a vital role in the study of biological processes, gene regulation and disease mechanisms. A bicluster in gene expression data is a subset of genes showing consistent patterns under a subset of conditions. Finding a biclustering is an optimization problem. In recent years, swarm intelligence techniques have become popular because many real-world problems are increasingly large, complex and dynamic. Given the size and complexity of these problems, it is necessary to find an optimization technique that reaches a near-optimal solution within a reasonable amount of time. In this paper, the algorithmic concepts of the Particle Swarm Optimization (PSO), Shuffled Frog Leaping (SFL) and Cuckoo Search (CS) algorithms are analyzed on four benchmark gene expression datasets. The experimental results show that CS outperforms PSO and SFL on three datasets and SFL gives better performance on one dataset. This work also determines the biological relevance of the biclusters with Gene Ontology in terms of function, process and component.

Keywords: Particle swarm optimization, Shuffled frog leaping, Cuckoo search, biclustering, gene expression data.

Downloads: 2663
6522 Using Business Intelligence Capabilities to Improve the Quality of Decision-Making: A Case Study of Mellat Bank

Authors: Jalal Haghighat Monfared, Zahra Akbari

Abstract:

Today, business executives need useful information to make better decisions. Banks also use information tools, with the help of business intelligence, to extract information rapidly from their sources and steer decision-making toward their desired goals. This research investigates whether there is a relationship between the quality of decision-making and the business intelligence capabilities of Mellat Bank. Each factor studied is divided into several components, and these components and their relationships are measured by a questionnaire. The statistical population consists of all managers and experts of Mellat Bank's General Departments (190 people) who use business intelligence reports. A sample of 123 people was selected randomly using a statistical method. Relevant statistical inference was used for data analysis and hypothesis testing. In the first stage, the normality of the data was examined using the Kolmogorov-Smirnov test; in the next stage, the construct validity of both variables and their resulting indexes was verified using confirmatory factor analysis. Finally, the research hypotheses were tested using structural equation modeling and Pearson's correlation coefficient. The results confirmed a positive relationship between decision quality and business intelligence capabilities in Mellat Bank. Among the various capabilities, including data quality, correlation with other systems, user access, flexibility and risk management support, the flexibility of the business intelligence system was the most strongly correlated with the dependent variable of the present research. This shows that Mellat Bank needs to pay more attention to choosing business intelligence systems with high flexibility in terms of the ability to submit custom-formatted reports.
Subsequently, the quality of data in business intelligence systems showed the strongest relationship with the quality of decision-making. Therefore, improving the quality of data, including whether the data are sourced internally or externally, whether the data are quantitative or qualitative, the credibility of the data, and the perceptions of those who use the business intelligence system, improves the quality of decision-making in Mellat Bank.

Keywords: Business intelligence, business intelligence capability, decision making, decision quality.

Downloads: 1385
6521 Producing Outdoor Design Conditions Based on the Dependency between Meteorological Elements: Copula Approach

Authors: Zhichao Jiao, Craig Farnham, Jihui Yuan, Kazuo Emura

Abstract:

It is common to use outdoor design weather data to select the air-conditioning capacity at the building design stage. The meteorological elements of outdoor design weather data are usually selected separately based on their excess frequency, while the dependency between the elements is not well considered. This means that the simultaneous occurrence probability of these elements is smaller than the original excess frequency, which may lead to overestimating the required air-conditioning capacity. Therefore, the copula approach, which can capture the dependency between multivariate data, was used to model the joint distributions of meteorological elements such as air temperature and global solar radiation. We suggest a method for selecting more credible outdoor design conditions based on a specific simultaneous occurrence probability of these two elements. Hourly weather data at 12 noon from 2001 to 2010 in Tokyo, Japan, were used to analyze the dependency structure and joint distribution; the Gaussian copula represents the dependence of the data best. Comparing the air temperature and global solar radiation calculated at a specific simultaneous occurrence probability with those calculated by the common exceedance approach, the results show that both the air temperature and the global solar radiation based on simultaneous occurrence probability are lower than those based on the conventional method at the same probability.

Keywords: Copula approach, Design weather database, energy conservation, HVAC.

Downloads: 363
6520 Finding Fuzzy Association Rules Using FWFP-Growth with Linguistic Supports and Confidences

Authors: Chien-Hua Wang, Chin-Tzong Pang

Abstract:

In data mining, association rules are used to search for relations among the items of a transaction database. Once the data are collected and stored, association rules can reveal valuable rules and help managers carry out marketing strategy and plan the market framework. In this paper, we apply fuzzy partition methods and determine the membership function of the quantitative values of each transaction item. In addition, managers can express the importance of items as linguistic terms, which are transformed into fuzzy sets of weights. Next, fuzzy weighted frequent pattern growth (FWFP-Growth) is used to complete the data mining process. The method is expected to improve on the Apriori algorithm in the efficiency of mining the full set of association rules. An example is given to clearly illustrate the proposed approach.

Keywords: Association Rule, Fuzzy Partition Methods, FWFP-Growth, Apriori algorithm

Downloads: 1652
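
To make the fuzzy partition idea in entry 6520 more concrete, the sketch below computes fuzzy support and confidence for a rule by scanning the transactions directly. It is only an illustration under stated assumptions: the triangular partition `PARTITION`, the item names, the quantities, and the use of the minimum to combine memberships are all hypothetical, the linguistic item weights are omitted, and the pattern-growth tree of FWFP-Growth itself is not implemented.

```python
def triangular(x, a, b, c):
    """Triangular membership: rises from a to a peak of 1 at b, falls to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy partition of purchase quantities in [0, 10].
PARTITION = {"low": (-5, 0, 5), "middle": (0, 5, 10), "high": (5, 10, 15)}

def membership(transaction, item, term):
    """Degree to which the item's quantity matches the term (0 if item absent)."""
    if item not in transaction:
        return 0.0
    return triangular(transaction[item], *PARTITION[term])

def fuzzy_support(transactions, fuzzy_items):
    """Average over transactions of the minimum membership in the itemset."""
    total = sum(
        min(membership(t, item, term) for item, term in fuzzy_items)
        for t in transactions
    )
    return total / len(transactions)

def fuzzy_confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by support of its antecedent.

    Assumes the antecedent has non-zero support.
    """
    both = fuzzy_support(transactions, antecedent + consequent)
    return both / fuzzy_support(transactions, antecedent)

# Hypothetical transactions: item -> purchased quantity.
transactions = [
    {"milk": 2, "bread": 5},
    {"milk": 8},
    {"milk": 5, "bread": 10},
    {"bread": 5},
]
milk_mid = [("milk", "middle")]
bread_mid = [("bread", "middle")]
support = fuzzy_support(transactions, milk_mid + bread_mid)
confidence = fuzzy_confidence(transactions, milk_mid, bread_mid)
```

A weighted variant would multiply each membership by the item's weight before aggregation; how the paper derives those weights from linguistic terms is not specified in the abstract.
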
6519 Change Point Analysis in Average Ozone Layer Temperature Using Exponential Lomax Distribution

Authors: Amjad Abdullah, Amjad Yahya, Bushra Aljohani, Amani S. Alghamdi

Abstract:

Change point detection is an important part of data analysis. The presence of a change point refers to a significant change in the behavior of a time series. In this article, we examine the detection of multiple change points of parameters of the exponential Lomax distribution, which is broad and flexible compared with other distributions while fitting data. We used the Schwarz information criterion and binary segmentation to detect multiple change points in publicly available data on the average temperature in the ozone layer. The change points were successfully located.

Keywords: Binary segmentation, change point, exponential Lomax distribution, information criterion.

Downloads: 345
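
The combination of the Schwarz information criterion (SIC) with binary segmentation mentioned in entry 6519 can be sketched as follows. To keep the example short, the exponential Lomax likelihood is replaced here by a much simpler normal model with a change in mean and a common variance; that substitution, the function names, and the toy series are assumptions for illustration only.

```python
import math

def rss(xs):
    """Residual sum of squares around the segment mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def best_split(xs):
    """Return the split position k with the lowest SIC, or None if no split wins.

    Normal mean-change model: SIC without a change uses 2 parameters,
    SIC with a change at k uses 3 (two means, one common variance).
    Constant terms shared by both criteria are dropped.
    """
    n = len(xs)
    eps = 1e-12  # guards against log(0) on perfectly constant data
    best_k = None
    best_sic = n * math.log(max(rss(xs), eps) / n) + 2 * math.log(n)
    for k in range(1, n):
        pooled = rss(xs[:k]) + rss(xs[k:])
        sic_k = n * math.log(max(pooled, eps) / n) + 3 * math.log(n)
        if sic_k < best_sic:
            best_k, best_sic = k, sic_k
    return best_k

def binary_segmentation(xs, offset=0, min_len=4):
    """Recursively split the series wherever the SIC favours a change point."""
    if len(xs) < min_len:
        return []
    k = best_split(xs)
    if k is None:
        return []
    left = binary_segmentation(xs[:k], offset, min_len)
    right = binary_segmentation(xs[k:], offset + k, min_len)
    return left + [offset + k] + right

# Toy series: small fluctuations around 0, then the same fluctuations around 5.
base = [0.1, -0.2, 0.0, 0.2, -0.1, 0.05, -0.05, 0.1]
series = base + [x + 5.0 for x in base]
change_points = binary_segmentation(series)
```

Each reported value is the index of the first observation after a detected change. Swapping in a different distribution only requires replacing the two log-likelihood terms inside `best_split` and adjusting the parameter counts in the penalty.
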
6518 Two-Phase Optimization for Selecting Materialized Views in a Data Warehouse

Authors: Jiratta Phuboon-ob, Raweewan Auepanwiriyakul

Abstract:

A data warehouse (DW) is a system that supports decision-making through querying. Queries to a DW are demanding in terms of complexity and length: they often access millions of tuples and involve joins between relations and aggregations. Materialized views can provide better performance for DW queries. However, these views have a maintenance cost, so materializing all views is not possible. An important challenge in a DW environment is materialized view selection, because of the trade-off between query performance and view maintenance. Therefore, in this paper, we introduce a new approach to this challenge based on Two-Phase Optimization (2PO), which is a combination of Simulated Annealing (SA) and Iterative Improvement (II), with the use of a Multiple View Processing Plan (MVPP). Our experiments show that 2PO outperforms the original algorithms in terms of query processing cost and view maintenance cost.

Keywords: Data warehouse, materialized views, view selection problem, two-phase optimization.

Downloads: 1707
6517 Stakeholder Analysis of Agricultural Drone Policy: A Case Study of the Agricultural Drone Ecosystem of Thailand

Authors: Thanomsin Chakreeves, Atichat Preittigun, Ajchara Phu-ang

Abstract:

This paper presents a stakeholder analysis of agricultural drone policies that meet the government's goal of building an agricultural drone ecosystem in Thailand. Firstly, case studies from other countries are reviewed. The stakeholder analysis method and qualitative data from the interviews are then presented including data from the Institute of Innovation and Management, the Office of National Higher Education Science Research and Innovation Policy Council, agricultural entrepreneurs and farmers. Study and interview data are then employed to describe the current ecosystem and to guide the implementation of agricultural drone policies that are suitable for the ecosystem of Thailand. Finally, policy recommendations are then made that the Thai government should adopt in the future.

Keywords: Drone public policy, drone ecosystem, policy development, agricultural drone.

Downloads: 810
6516 Study and Analysis of Optical Intersatellite Links

Authors: Boudene Maamar, Xu Mai

Abstract:

Optical Intersatellite Links (OISLs) are wireless communication links that use optical signals to interconnect satellites. They are expected to be the next-generation wireless communication technology owing to inherent characteristics such as increased bandwidth, high data rate, data transmission security, immunity to interference, and unregulated spectrum. Optical space links are the best choice for the classical communication schemes due to their distinctive properties: high frequency, small antenna diameter and lowest transmitted power, which are critical factors in defining a space communication system. This paper discusses the development of free space technology and analyses the parameters and factors needed to establish reliable intersatellite links that use an optical signal to exchange data between satellites.

Keywords: Optical intersatellite links, optical wireless communications, free space optical communications, next generation wireless communication.

Downloads: 3038
6515 Ranking Genes from DNA Microarray Data of Cervical Cancer by a local Tree Comparison

Authors: Frank Emmert-Streib, Matthias Dehmer, Jing Liu, Max Muhlhauser

Abstract:

The major objective of this paper is to introduce a new method for selecting genes from DNA microarray data. As the selection criterion, we propose measuring the local changes in the correlation graph of each gene and selecting those genes whose local changes are largest. More precisely, we calculate the correlation networks from DNA microarray data of cervical cancer, where each network represents a tissue of a certain tumor stage and each node in the network represents a gene. From these networks we extract one tree for each gene by a local decomposition of the correlation network. A tree represents the n-nearest neighbor genes on its n-th level, measured by the Dijkstra distance, and hence gives the local embedding of a gene within the correlation network. For the obtained trees, we measure the pairwise similarity between trees rooted at the same gene from normal to cancerous tissues. This evaluates the modification of the tree topology due to tumor progression. Finally, we rank the similarity values obtained from all tissue comparisons and select the top-ranked genes. For these genes, the local neighborhood in the correlation networks changes most between normal and cancerous tissues. As a result, we find that the top-ranked genes are candidates suspected to be involved in tumor growth. This indicates that our method captures essential information from the underlying DNA microarray data of cervical cancer.

Keywords: Graph similarity, generalized trees, graph alignment, DNA microarray data, cervical cancer.

Downloads: 1755
6514 Parallelization of Ensemble Kalman Filter (EnKF) for Oil Reservoirs with Time-lapse Seismic Data

Authors: Md Khairullah, Hai-Xiang Lin, Remus G. Hanea, Arnold W. Heemink

Abstract:

In this paper we describe the design and implementation of a parallel algorithm for data assimilation with the ensemble Kalman filter (EnKF) for the oil reservoir history matching problem. The use of a large number of observations from time-lapse seismic data leads to a long turnaround time for the analysis step, in addition to the time-consuming simulations of the realizations. For efficient parallelization, it is important to consider parallel computation at the analysis step. Our experiments show that parallelizing the analysis step in addition to the forecast step gives good scalability, exploiting the same set of resources with some additional effort.

Keywords: EnKF, Data assimilation, Parallel computing, Parallel efficiency.

Downloads: 2282
6513 Bandwidth Allocation for ABR Service in Cellular Networks

Authors: Khaja Kamaluddin, Muhammed Yousoof

Abstract:

Available Bit Rate (ABR) service is the lower-priority service and the better service for data transmission. On wireline ATM networks, the ABR source continually receives feedback from switches about increases or decreases in bandwidth according to changing network conditions, and a minimum bandwidth is guaranteed. In wireless networks, guaranteeing the minimum bandwidth is a challenging task because the source is mobile and travels from one cell to another. Re-establishing virtual circuits from start to end every time causes delay in transmission. We propose a mechanism that provides more available bandwidth to the ABR source by reusing part of the old Virtual Channels and establishing new ones. We want the ABR source to transmit data continuously (non-stop) in order to avoid delay. In the worst-case scenario, at least the minimum bandwidth is to be allocated. To keep the data flowing continuously, priority is given to handoff ABR calls over new ABR calls.

Keywords: Bandwidth allocation, Virtual Channel (VC), CBR, ABR, MCR and QOS.

Downloads: 1601
6512 Data Acquisition System for Automotive Testing According to the European Directive 2004/104/EC

Authors: Herminio Martínez-García, Juan Gámiz, Yolanda Bolea, Antoni Grau

Abstract:

This article presents an interactive system for data acquisition in vehicle testing according to the test process defined in automotive directive 2004/104/EC. The project was designed and developed by the authors for the Spanish company Applus-LGAI. It will result in a new process that involves creating the braking cycle test defined in the aforementioned directive. It will also allow the analysis of new vehicle features that was not previously feasible, enabling greater interaction with the vehicle. In the short term, the potential users of this system will be vehicle manufacturers; in the medium term, the system can be extended to testing other automotive components and to EMC tests.

Keywords: Automotive process, data acquisition system, electromagnetic compatibility (EMC) testing, European Directive 2004/104/EC.

Downloads: 1465
6511 Blockchain-Based Assignment Management System

Authors: Amogh Katti, J. Sai Asritha, D. Nivedh, M. Kalyan Srinivas, B. Somnath Chakravarthi

Abstract:

Today's modern education system uses Learning Management System (LMS) portals for the scoring and grading of student performances, to maintain student records, and teachers are instructed to accept assignments through online submissions of .pdf, .doc, .ppt, etc. There is a risk of data tampering in the traditional portals; we will apply the Blockchain model instead of this traditional model to avoid data tampering and also provide a decentralized mechanism for overall fairness. Blockchain technology is a better and also recommended model because of the following features: consensus mechanism, decentralized system, cryptographic encryption, smart contracts, Ethereum blockchain. The proposed system ensures data integrity and tamper-proof assignment submission and grading, which will be helpful for both students and also educators.

Keywords: Education technology, learning management system, decentralized applications, blockchain.

Downloads: 166
6510 Predicting DHF Incidence in Northern Thailand using Time Series Analysis Technique

Authors: S. Wongkoon, M. Pollar, M. Jaroensutasinee, K. Jaroensutasinee

Abstract:

This study aimed to develop a forecasting model for the incidence of Dengue Haemorrhagic Fever (DHF) in Northern Thailand using time series analysis. We developed Seasonal Autoregressive Integrated Moving Average (SARIMA) models on data collected between 2003 and 2006 and then validated the models using data collected between January and September 2007. The results showed that the regressive forecast curves were consistent with the pattern of actual values. The most suitable model was the SARIMA(2,0,1)(0,2,0)12 model, with an Akaike Information Criterion (AIC) of 12.2931 and a Mean Absolute Percent Error (MAPE) of 8.91713. The fit of the SARIMA(2,0,1)(0,2,0)12 model was adequate for the data, with a Portmanteau statistic Q20 = 8.98644 (χ²(20, 95) = 27.5871, P > 0.05). This indicated that there was no significant autocorrelation between residuals at different lag times in the SARIMA(2,0,1)(0,2,0)12 model.

Keywords: Dengue, SARIMA, Time Series Analysis, Northern Thailand.

Downloads: 1992
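
Two ingredients of the evaluation in entry 6510 are simple enough to show directly: the seasonal differencing implied by the (0,2,0)12 part of the model, and the MAPE used to score forecasts. The sketch below covers only these two pieces and does not fit a SARIMA model; the function names and the numbers in the usage lines are made up for illustration.

```python
def seasonal_difference(series, period=12):
    """Return y[t] - y[t - period]; the result is `period` values shorter."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

def mape(actual, forecast):
    """Mean absolute percent error, in percent. Assumes no actual value is 0."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical monthly counts: three years of a repeating seasonal shape
# whose level rises by 10 cases each year.
year = [5, 8, 12, 20, 35, 60, 80, 70, 40, 20, 10, 6]
counts = [v + 10 * y for y in range(3) for v in year]

# Seasonal order D = 2 means the seasonal difference is applied twice.
once = seasonal_difference(counts, period=12)
twice = seasonal_difference(once, period=12)

# Hypothetical held-out values and forecasts.
score = mape([100, 200], [110, 180])
```

In this toy series the first seasonal difference removes the repeating yearly shape and leaves the constant yearly increase, and the second seasonal difference removes that increase as well.
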
6509 Comparing Machine Learning Estimation of Fuel Consumption of Heavy-Duty Vehicles

Authors: Victor Bodell, Lukas Ekstrom, Somayeh Aghanavesi

Abstract:

Fuel consumption (FC) is one of the key factors in determining expenses of operating a heavy-duty vehicle. A customer may therefore request an estimate of the FC of a desired vehicle. The modular design of heavy-duty vehicles allows their construction by specifying the building blocks, such as gear box, engine and chassis type. If the combination of building blocks is unprecedented, it is unfeasible to measure the FC, since this would first require the construction of the vehicle. This paper proposes a machine learning approach to predict FC. This study uses information on around 40,000 vehicles, covering vehicle-specific and operational environmental conditions such as road slopes and driver profiles. All vehicles have diesel engines and a mileage of more than 20,000 km. The data is used to investigate the accuracy of the machine learning algorithms Linear regression (LR), K-nearest neighbor (KNN) and Artificial neural networks (ANN) in predicting fuel consumption for heavy-duty vehicles. Performance of the algorithms is evaluated by reporting the prediction error on both simulated data and operational measurements. The performance of the algorithms is compared using nested cross-validation and statistical hypothesis testing. The statistical evaluation procedure finds that ANNs have the lowest prediction error compared to LR and KNN in estimating fuel consumption on both simulated and operational data. The models have a mean relative prediction error of 0.3% on simulated data, and 4.2% on operational data.

Keywords: Artificial neural networks, fuel consumption, machine learning, regression, statistical tests.

Downloads 834
6508 Migration of the Relational Data Base (RDB) to the Object Relational Data Base (ORDB)

Authors: Alae El Alami, Mohamed Bahaj

Abstract:

This paper proposes an approach for translating an existing relational database (RDB) schema into an object-relational database (ORDB) schema. The transition is done with methods that can extract various functions from an RDB, based on aggregations, associations between the various tables, and reflexive relationships. These methods can even extract inheritance, even though no reverse engineering process can determine that a relationship is an inheritance; our approach therefore goes beyond the previous studies on the transition from RDB to ORDB. In summary, we create a New Data Model (NDM) that stores the RDB in the form of a structured table, and from the NDM we create our navigational model in order to simplify the object implementation, from which we develop our different types. Through these types we proceed to the last step, the creation of tables.

The step mentioned above does not require any human intervention. All of this is done automatically, and a prototype has already been created which demonstrates the effectiveness of this approach.

Keywords: Relational databases, Object-relational databases, Semantic enrichment.

Downloads 1955
6507 Mining Correlated Bicluster from Web Usage Data Using Discrete Firefly Algorithm Based Biclustering Approach

Authors: K. Thangavel, R. Rathipriya

Abstract:

Over the past decade, biclustering has become a popular data mining technique, not only in the field of biological data analysis but also in other applications with high-dimensional two-way datasets, such as text mining and market data analysis. Biclustering clusters both rows and columns of a dataset simultaneously, as opposed to traditional clustering, which clusters either rows or columns of a dataset. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. The Firefly Algorithm (FA) is a recently proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of a discrete version of FA (DFA) applied to the task of mining coherent, large-volume biclusters from web usage datasets. The experiments were conducted on two web usage datasets from a public dataset repository, and the performance of FA was compared with that of another population-based metaheuristic, binary Particle Swarm Optimization (PSO). The results demonstrate the usefulness of DFA in tackling the biclustering problem.

Keywords: Biclustering, Binary Particle Swarm Optimization, Discrete Firefly Algorithm, Firefly Algorithm, Usage profile, Web usage mining.

Downloads 2134
6506 Morphology of Parts of the Middle Benue Trough of Nigeria from Spectral Analysis of Aeromagnetic Data (Akiri Sheet 232 and Lafia Sheet 231)

Authors: B. S. Jatau, Nandom Abu

Abstract:

Structural interpretation of aeromagnetic data and Landsat imagery over the Middle Benue Trough was carried out to determine the depth to basement and to delineate the basement morphology and relief and the structural features within the basin. The aeromagnetic and Landsat data were subjected to various image and data enhancement and transformation routines. Results of the study revealed lineaments with trend directions in the N-S, NE-SW, NW-SE and E-W directions, with the NE-SW trends being dominant. The depths to basement within the trough were established to be at 1.8, 0.3 and 0.8 km, as shown by the spectral analysis plot. The Source Parameter Imaging (SPI) plot showed the central-south/eastern portion of the study area as being deeper than the western-south-west portion. The basement morphology of the trough was interpreted as having parallel sets of micro-basins, which could be considered as grabens and horsts, in agreement with the general features interpreted by earlier workers.

Keywords: Morphology, Middle Benue Trough, Spectral Analysis, Source Parameter Imaging.

Downloads 4068
6505 Imputing Missing Data in Electronic Health Records: A Comparison of Linear and Non-Linear Imputation Models

Authors: Alireza Vafaei Sadr, Vida Abedi, Jiang Li, Ramin Zand

Abstract:

Missing data is a common challenge in medical research and can lead to biased or incomplete results. When the data bias leaks into models, it further exacerbates health disparities; biased algorithms can lead to misclassification and to reduced resource allocation and monitoring as part of prevention strategies for certain minorities and vulnerable segments of patient populations, which in turn further reduces the data footprint from the same population, creating a vicious cycle. This study compares the performance of six imputation techniques, grouped into Linear and Non-Linear models, on two different real-world electronic health records (EHRs) datasets, representing 17864 patient records. The mean absolute percentage error (MAPE) and root mean squared error (RMSE) are used as performance metrics, and the results show that the Linear models outperformed the Non-Linear models in terms of both metrics. These results suggest that Linear models can sometimes be an optimal choice for imputation of laboratory variables in terms of imputation efficiency and uncertainty of predicted values.
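
The evaluation idea in this abstract, scoring imputed values with MAPE and RMSE, can be sketched by masking known values and comparing them with what an imputer fills in. The two imputers below (column mean and a one-predictor least-squares line) are stand-ins chosen for illustration; the abstract does not name the six techniques studied, and the data are made up.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between true and imputed values."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no true value is zero."""
    return 100 * sum(abs((a - p) / a)
                     for a, p in zip(actual, predicted)) / len(actual)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Made-up complete records: (predictor, laboratory value).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
hidden = {1, 4}  # indices masked to play the role of missing values

train_x = [x for i, x in enumerate(xs) if i not in hidden]
train_y = [y for i, y in enumerate(ys) if i not in hidden]
truth = [ys[i] for i in sorted(hidden)]

# Imputer 1: fill every gap with the mean of the observed values.
mean_fill = [sum(train_y) / len(train_y)] * len(truth)

# Imputer 2: fill each gap from a line fitted on the observed rows.
slope, intercept = fit_line(train_x, train_y)
line_fill = [slope * xs[i] + intercept for i in sorted(hidden)]

mean_rmse, line_rmse = rmse(truth, mean_fill), rmse(truth, line_fill)
mean_mape, line_mape = mape(truth, mean_fill), mape(truth, line_fill)
```

Because the true values are known for the masked entries, both metrics can be computed directly, which is what makes this kind of comparison between imputers possible.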

Keywords: EHR, Machine Learning, imputation, laboratory variables, algorithmic bias.

Downloads 185
6504 Geospatial Network Analysis Using Particle Swarm Optimization

Authors: Varun Singh, Mainak Bandyopadhyay, Maharana Pratap Singh

Abstract:

The shortest path (SP) problem is concerned with finding the shortest path from a specific origin to a specified destination in a given network while minimizing the total cost associated with the path. This problem has widespread applications. Important applications of the SP problem include vehicle routing in transportation systems, particularly in the field of in-vehicle Route Guidance Systems (RGS), and the traffic assignment problem in transportation planning. Evolutionary methods such as Genetic Algorithms (GA), Ant Colony Optimization and Particle Swarm Optimization (PSO) have been applied to complex optimization problems to overcome the shortcomings of existing shortest path analysis methods. Various researchers have reported that PSO performs better than other evolutionary optimization algorithms in terms of success rate and solution quality. Further, Geographic Information Systems (GIS) have emerged as key information systems for geospatial data analysis and visualization. This research paper focuses on the application of PSO to solving the shortest path problem between multiple points of interest (POI), based on spatial data of Allahabad City and traffic speed data collected using GPS. Geovisualization of the results of the analysis is carried out in GIS.
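
To make the SP problem statement concrete, here is a minimal sketch of a classical exact baseline, Dijkstra's algorithm, on a small hypothetical road network. This is not the PSO approach the paper applies; it only illustrates the objective (minimum total cost from an origin to a destination). The node names and edge costs are invented.

```python
import heapq


def shortest_path(graph, origin, destination):
    """Dijkstra's algorithm; graph maps node -> {neighbor: non-negative cost}."""
    queue = [(0.0, origin, [origin])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in settled:
            continue  # a cheaper route to this node was already processed
        settled.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(
                    queue, (cost + weight, neighbor, path + [neighbor])
                )
    return float("inf"), []  # destination unreachable


# Hypothetical network; costs could stand for travel times in minutes.
roads = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 5},
    "C": {"B": 2, "D": 8},
    "D": {},
}
total_cost, route = shortest_path(roads, "A", "D")
```

A metaheuristic such as PSO searches over candidate routes instead of expanding nodes in cost order, so an exact baseline like this one is a common way to check solution quality on small networks.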

Keywords: GIS, Outliers, PSO, Traffic Data.

Downloads 2893
6503 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures

Authors: Salima Kouici, Abdelkader Khelladi

Abstract:

In this work, we begin by presenting the Tθ family of usual similarity measures for multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally, the impact of using different inter-element measures on the results of Agglomerative Hierarchical Clustering methods is studied.
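
The abstract does not define the Tθ family. The sketch below assumes the Gower-Legendre form, Tθ = a / (a + θ(b + c)), where a counts positions at which both binary vectors are 1 and b and c count the positions at which only one of them is 1. Under that assumption, θ = 1 gives the Jaccard index and θ = 0.5 gives the Dice coefficient. The function name is hypothetical.

```python
def t_theta(x, y, theta):
    """Assumed T-theta similarity for two equal-length binary vectors."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # joint presences
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # only in x
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # only in y
    denom = a + theta * (b + c)
    # With no 1s in either vector the ratio is undefined; 0.0 is a convention.
    return a / denom if denom else 0.0


x = [1, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1]
jaccard = t_theta(x, y, 1.0)  # a = 2, b = 1, c = 1
dice = t_theta(x, y, 0.5)
```

Because θ reweights mismatches relative to joint presences, different members of the family can rank the same pairs differently, which is why the choice of measure can change the merges made by an agglomerative clustering.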

Keywords: Binary data, similarity measure, Tθ measures, Agglomerative Hierarchical Clustering.

Downloads 3446