Search results for: Multidimensional Sequence Data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7807

Search results for: Multidimensional Sequence Data

7267 XML Data Management in Compressed Relational Database

Authors: Hongzhi Wang, Jianzhong Li, Hong Gao

Abstract:

XML is an important standard of data exchange and representation. As a mature database system, using relational database to support XML data may bring some advantages. But storing XML in relational database has obvious redundancy that wastes disk space, bandwidth and disk I/O when querying XML data. For the efficiency of storage and query XML, it is necessary to use compressed XML data in relational database. In this paper, a compressed relational database technology supporting XML data is presented. Original relational storage structure is adaptive to XPath query process. The compression method keeps this feature. Besides traditional relational database techniques, additional query process technologies on compressed relations and for special structure for XML are presented. In this paper, technologies for XQuery process in compressed relational database are presented..

Keywords: XML, compression, query processing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1779
7266 An Automatic Gridding and Contour Based Segmentation Approach Applied to DNA Microarray Image Analysis

Authors: Alexandra Oliveros, Miguel Sotaquirá

Abstract:

DNA microarray technology is widely used by geneticists to diagnose or treat diseases through gene expression. This technology is based on the hybridization of a tissue-s DNA sequence into a substrate and the further analysis of the image formed by the thousands of genes in the DNA as green, red or yellow spots. The process of DNA microarray image analysis involves finding the location of the spots and the quantification of the expression level of these. In this paper, a tool to perform DNA microarray image analysis is presented, including a spot addressing method based on the image projections, the spot segmentation through contour based segmentation and the extraction of relevant information due to gene expression.

Keywords: Contour segmentation, DNA microarrays, edge detection, image processing, segmentation, spot addressing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1371
7265 A System for Analyzing and Eliciting Public Grievances Using Cache Enabled Big Data

Authors: P. Kaladevi, N. Giridharan

Abstract:

The system for analyzing and eliciting public grievances serves its main purpose to receive and process all sorts of complaints from the public and respond to users. Due to the more number of complaint data becomes big data which is difficult to store and process. The proposed system uses HDFS to store the big data and uses MapReduce to process the big data. The concept of cache was applied in the system to provide immediate response and timely action using big data analytics. Cache enabled big data increases the response time of the system. The unstructured data provided by the users are efficiently handled through map reduce algorithm. The processing of complaints takes place in the order of the hierarchy of the authority. The drawbacks of the traditional database system used in the existing system are set forth by our system by using Cache enabled Hadoop Distributed File System. MapReduce framework codes have the possible to leak the sensitive data through computation process. We propose a system that add noise to the output of the reduce phase to avoid signaling the presence of sensitive data. If the complaints are not processed in the ample time, then automatically it is forwarded to the higher authority. Hence it ensures assurance in processing. A copy of the filed complaint is sent as a digitally signed PDF document to the user mail id which serves as a proof. The system report serves to be an essential data while making important decisions based on legislation.

Keywords: Big Data, Hadoop, HDFS, Caching, MapReduce, web personalization, e-governance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1574
7264 An Efficient Ant Colony Optimization Algorithm for Multiobjective Flow Shop Scheduling Problem

Authors: Ahmad Rabanimotlagh

Abstract:

In this paper an ant colony optimization algorithm is developed to solve the permutation flow shop scheduling problem. In the permutation flow shop scheduling problem which has been vastly studied in the literature, there are a set of m machines and a set of n jobs. All the jobs are processed on all the machines and the sequence of jobs being processed is the same on all the machines. Here this problem is optimized considering two criteria, makespan and total flow time. Then the results are compared with the ones obtained by previously developed algorithms. Finally it is visible that our proposed approach performs best among all other algorithms in the literature.

Keywords: Scheduling, Flow shop, Ant colony optimization, Makespan, Flow time

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2405
7263 Bifurcation Analysis of a Delayed Predator-prey Fishery Model with Prey Reserve in Frequency Domain

Authors: Changjin Xu

Abstract:

In this paper, applying frequency domain approach, a delayed predator-prey fishery model with prey reserve is investigated. By choosing the delay τ as a bifurcation parameter, It is found that Hopf bifurcation occurs as the bifurcation parameter τ passes a sequence of critical values. That is, a family of periodic solutions bifurcate from the equilibrium when the bifurcation parameter exceeds a critical value. The length of delay which preserves the stability of the positive equilibrium is calculated. Some numerical simulations are included to justify the theoretical analysis results. Finally, main conclusions are given.

Keywords: Predator-prey model, stability, Hopf bifurcation, frequency domain, Nyquist criterion.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1388
7262 Screen of MicroRNA Targets in Zebrafish Using Heterogeneous Data Sources: A Case Study for Dre-miR-10 and Dre-miR-196

Authors: Yanju Zhang, Joost M. Woltering, Fons J. Verbeek

Abstract:

It has been established that microRNAs (miRNAs) play an important role in gene expression by post-transcriptional regulation of messengerRNAs (mRNAs). However, the precise relationships between microRNAs and their target genes in sense of numbers, types and biological relevance remain largely unclear. Dissecting the miRNA-target relationships will render more insights for miRNA targets identification and validation therefore promote the understanding of miRNA function. In miRBase, miRanda is the key algorithm used for target prediction for Zebrafish. This algorithm is high-throughput but brings lots of false positives (noise). Since validation of a large scale of targets through laboratory experiments is very time consuming, several computational methods for miRNA targets validation should be developed. In this paper, we present an integrative method to investigate several aspects of the relationships between miRNAs and their targets with the final purpose of extracting high confident targets from miRanda predicted targets pool. This is achieved by using the techniques ranging from statistical tests to clustering and association rules. Our research focuses on Zebrafish. It was found that validated targets do not necessarily associate with the highest sequence matching. Besides, for some miRNA families, the frequency of their predicted targets is significantly higher in the genomic region nearby their own physical location. Finally, in a case study of dre-miR-10 and dre-miR-196, it was found that the predicted target genes hoxd13a, hoxd11a, hoxd10a and hoxc4a of dre-miR- 10 while hoxa9a, hoxc8a and hoxa13a of dre-miR-196 have similar characteristics as validated target genes and therefore represent high confidence target candidates.

Keywords: MicroRNA targets validation, microRNA-target relationships, dre-miR-10, dre-miR-196.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1960
7261 Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure

Authors: S.Aranganayagi, K.Thangavel

Abstract:

K-Modes is an extension of K-Means clustering algorithm, developed to cluster the categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. Weight of attribute values contribute much in clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency of attribute values in the cluster and in the data set. The new weighted measure is experimented with the data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative, which show that the new measure generates clusters with high purity.

Keywords: Clustering, categorical data, K-Modes, weighted dissimilarity measure

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3666
7260 Automatic Moment-Based Texture Segmentation

Authors: Tudor Barbu

Abstract:

An automatic moment-based texture segmentation approach is proposed in this paper. First, we describe the related work in this computer vision domain. Our texture feature extraction, the first part of the texture recognition process, produces a set of moment-based feature vectors. For each image pixel, a texture feature vector is computed as a sequence of area moments. Then, an automatic pixel classification approach is proposed. The feature vectors are clustered using an unsupervised classification algorithm, the optimal number of clusters being determined using a measure based on validation indexes. From the resulted pixel classes one determines easily the desired texture regions of the image.

Keywords: Image segmentation, moment-based texture analysis, automatic classification, validity indexes.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2362
7259 Public Key Cryptosystem based on Number Theoretic Transforms

Authors: C. Porkodi, R. Arumuganathan

Abstract:

In this paper a Public Key Cryptosystem is proposed using the number theoretic transforms (NTT) over a ring of integer modulo a composite number. The key agreement is similar to ElGamal public key algorithm. The security of the system is based on solution of multivariate linear congruence equations and discrete logarithm problem. In the proposed cryptosystem only fixed numbers of multiplications are carried out (constant complexity) and hence the encryption and decryption can be done easily. At the same time, it is very difficult to attack the cryptosystem, since the cipher text is a sequence of integers which are interrelated. The system provides authentication also. Using Mathematica version 5.0 the proposed algorithm is justified with a numerical example.

Keywords: Cryptography, decryption, discrete logarithm problem encryption, Integer Factorization problem, Key agreement, Number Theoretic Transform.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1662
7258 Complex-Valued Neural Network in Image Recognition: A Study on the Effectiveness of Radial Basis Function

Authors: Anupama Pande, Vishik Goel

Abstract:

A complex valued neural network is a neural network, which consists of complex valued input and/or weights and/or thresholds and/or activation functions. Complex-valued neural networks have been widening the scope of applications not only in electronics and informatics, but also in social systems. One of the most important applications of the complex valued neural network is in image and vision processing. In Neural networks, radial basis functions are often used for interpolation in multidimensional space. A Radial Basis function is a function, which has built into it a distance criterion with respect to a centre. Radial basis functions have often been applied in the area of neural networks where they may be used as a replacement for the sigmoid hidden layer transfer characteristic in multi-layer perceptron. This paper aims to present exhaustive results of using RBF units in a complex-valued neural network model that uses the back-propagation algorithm (called 'Complex-BP') for learning. Our experiments results demonstrate the effectiveness of a Radial basis function in a complex valued neural network in image recognition over a real valued neural network. We have studied and stated various observations like effect of learning rates, ranges of the initial weights randomly selected, error functions used and number of iterations for the convergence of error on a neural network model with RBF units. Some inherent properties of this complex back propagation algorithm are also studied and discussed.

Keywords: Complex valued neural network, Radial BasisFunction, Image recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2385
7257 Mobile Phone as a Tool for Data Collection in Field Research

Authors: Sandro Mourão, Karla Okada

Abstract:

The necessity of accurate and timely field data is shared among organizations engaged in fundamentally different activities, public services or commercial operations. Basically, there are three major components in the process of the qualitative research: data collection, interpretation and organization of data, and analytic process. Representative technological advancements in terms of innovation have been made in mobile devices (mobile phone, PDA-s, tablets, laptops, etc). Resources that can be potentially applied on the data collection activity for field researches in order to improve this process. This paper presents and discuss the main features of a mobile phone based solution for field data collection, composed of basically three modules: a survey editor, a server web application and a client mobile application. The data gathering process begins with the survey creation module, which enables the production of tailored questionnaires. The field workforce receives the questionnaire(s) on their mobile phones to collect the interviews responses and sending them back to a server for immediate analysis.

Keywords: Data Gathering, Field Research, Mobile Phone, Survey.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2036
7256 Vision Based People Tracking System

Authors: Boukerch Haroun, Luo Qing Sheng, Li Hua Shi, Boukraa Sebti

Abstract:

In this paper we present the design and the implementation of a target tracking system where the target is set to be a moving person in a video sequence. The system can be applied easily as a vision system for mobile robot. The system is composed of two major parts the first is the detection of the person in the video frame using the SVM learning machine based on the “HOG” descriptors. The second part is the tracking of a moving person it’s done by using a combination of the Kalman filter and a modified version of the Camshift tracking algorithm by adding the target motion feature to the color feature, the experimental results had shown that the new algorithm had overcame the traditional Camshift algorithm in robustness and in case of occlusion.

Keywords: Camshift Algorithm, Computer Vision, Kalman Filter, Object tracking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1297
7255 On Pooling Different Levels of Data in Estimating Parameters of Continuous Meta-Analysis

Authors: N. R. N. Idris, S. Baharom

Abstract:

A meta-analysis may be performed using aggregate data (AD) or an individual patient data (IPD). In practice, studies may be available at both IPD and AD level. In this situation, both the IPD and AD should be utilised in order to maximize the available information. Statistical advantages of combining the studies from different level have not been fully explored. This study aims to quantify the statistical benefits of including available IPD when conducting a conventional summary-level meta-analysis. Simulated meta-analysis were used to assess the influence of the levels of data on overall meta-analysis estimates based on IPD-only, AD-only and the combination of IPD and AD (mixed data, MD), under different study scenario. The percentage relative bias (PRB), root mean-square-error (RMSE) and coverage probability were used to assess the efficiency of the overall estimates. The results demonstrate that available IPD should always be included in a conventional meta-analysis using summary level data as they would significantly increased the accuracy of the estimates.On the other hand, if more than 80% of the available data are at IPD level, including the AD does not provide significant differences in terms of accuracy of the estimates. Additionally, combining the IPD and AD has moderating effects on the biasness of the estimates of the treatment effects as the IPD tends to overestimate the treatment effects, while the AD has the tendency to produce underestimated effect estimates. These results may provide some guide in deciding if significant benefit is gained by pooling the two levels of data when conducting meta-analysis.

Keywords: Aggregate data, combined-level data, Individual patient data, meta analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1720
7254 Sub-Impact Phenomenon of Elasto-Plastic Free-Free Beam during a Strike

Authors: H. Rong, X. C. Yin, J. Yang, Y. N. Shen

Abstract:

Based on Rayleigh beam theory, the sub-impacts of a free-free beam struck horizontally by a round-nosed rigid mass is simulated by the finite difference method and the impact-separation conditions. In order to obtain the sub-impact force, a uniaxial compression elastic-plastic contact model is employed to analyze the local deformation field on contact zone. It is found that the horizontal impact is a complicated process including the elastic plastic sub-impacts in sequence. There are two sub-zones of sub-impact. In addition, it found that the elastic energy of the free-free beam is more suitable for the Poisson collision hypothesis to explain compression and recovery processes.

Keywords: beam, sub-impact, elastic-plastic deformation, finite difference method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1827
7253 Analysis of DNA-Recognizing Enzyme Interaction using Deaminated Lesions

Authors: Seung Pil Pack

Abstract:

Deaminated lesions were produced via nitrosative oxidation of natural nucleobases; uracul (Ura, U) from cytosine (Cyt, C), hypoxanthine (Hyp, H) from adenine (Ade, A), and xanthine (Xan, X) and oxanine (Oxa, O) from guanine (Gua, G). Such damaged nucleobases may induce mutagenic problems, so that much attentions and efforts have been poured on the revealing of their mechanisms in vivo or in vitro. In this study, we employed these deaminated lesions as useful probes for analysis of DNA-binding/recognizing proteins or enzymes. Since the pyrimidine lesions such as Hyp, Oxa and Xan are employed as analogues of guanine, their comparative uses are informative for analyzing the role of Gua in DNA sequence in DNA-protein interaction. Several DNA oligomers containing such Hyp, Oxa or Xan substituted for Gua were designed to reveal the molecular interaction between DNA and protein. From this approach, we have got useful information to understand the molecular mechanisms of the DNA-recognizing enzymes, which have not ever been observed using conventional DNA oligomer composed of just natural nucleobases.

Keywords: Deaminated lesion, DNA-protein interaction, DNA-recognizing enzymes

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1261
7252 Multivariate Assessment of Mathematics Test Scores of Students in Qatar

Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski

Abstract:

Data on various aspects of education are collected at the institutional and government level regularly. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is via the PISA study which collects data from several countries when students are approximately 15 years of age and enables comparisons in the performance of science, mathematics and English between countries as well as ranking of countries based on performance in these standardised tests. As well as student and school outcomes based on the tests taken as part of the PISA study, there is a wealth of other data collected in the study including parental demographics data and data related to teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics using data obtained from the PISA study.

Keywords: Cluster analysis, education, mathematics, profiles.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 865
7251 DIVAD: A Dynamic and Interactive Visual Analytical Dashboard for Exploring and Analyzing Transport Data

Authors: Tin Seong Kam, Ketan Barshikar, Shaun Tan

Abstract:

The advances in location-based data collection technologies such as GPS, RFID etc. and the rapid reduction of their costs provide us with a huge and continuously increasing amount of data about movement of vehicles, people and goods in an urban area. This explosive growth of geospatially-referenced data has far outpaced the planner-s ability to utilize and transform the data into insightful information thus creating an adverse impact on the return on the investment made to collect and manage this data. Addressing this pressing need, we designed and developed DIVAD, a dynamic and interactive visual analytics dashboard to allow city planners to explore and analyze city-s transportation data to gain valuable insights about city-s traffic flow and transportation requirements. We demonstrate the potential of DIVAD through the use of interactive choropleth and hexagon binning maps to explore and analyze large taxi-transportation data of Singapore for different geographic and time zones.

Keywords: Geographic Information System (GIS), MovementData, GeoVisual Analytics, Urban Planning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2366
7250 Gene Expression Data Classification Using Discriminatively Regularized Sparse Subspace Learning

Authors: Chunming Xu

Abstract:

Sparse representation which can represent high dimensional data effectively has been successfully used in computer vision and pattern recognition problems. However, it doesn-t consider the label information of data samples. To overcome this limitation, we develop a novel dimensionality reduction algorithm namely dscriminatively regularized sparse subspace learning(DR-SSL) in this paper. The proposed DR-SSL algorithm can not only make use of the sparse representation to model the data, but also can effective employ the label information to guide the procedure of dimensionality reduction. In addition,the presented algorithm can effectively deal with the out-of-sample problem.The experiments on gene-expression data sets show that the proposed algorithm is an effective tool for dimensionality reduction and gene-expression data classification.

Keywords: sparse representation, dimensionality reduction, labelinformation, sparse subspace learning, gene-expression data classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1428
7249 Multiple Targets Classification and Fuzzy Logic Decision Fusion in Wireless Sensor Networks

Authors: Ahmad Aljaafreh

Abstract:

This paper proposes a hierarchical hidden Markov model (HHMM) to model the detection of M vehicles in a wireless sensor network (WSN). The HHMM model contains an extra level of hidden Markov model to model the temporal transitions of each state of the first HMM. By modeling the temporal transitions, only those hypothesis with nonzero transition probabilities needs to be tested. Thus, this method efficiently reduces the computation load, which is preferable in WSN applications.This paper integrates several techniques to optimize the detection performance. The output of the states of the first HMM is modeled as Gaussian Mixture Model (GMM), where the number of states and the number of Gaussians are experimentally determined, while the other parameters are estimated using Expectation Maximization (EM). HHMM is used to model the sequence of the local decisions which are based on multiple hypothesis testing with maximum likelihood approach. The states in the HHMM represent various combinations of vehicles of different types. Due to the statistical advantages of multisensor data fusion, we propose a heuristic based on fuzzy weighted majority voting to enhance cooperative classification of moving vehicles within a region that is monitored by a wireless sensor network. A fuzzy inference system weighs each local decision based on the signal to noise ratio of the acoustic signal for target detection and the signal to noise ratio of the radio signal for sensor communication. The spatial correlation among the observations of neighboring sensor nodes is efficiently utilized as well as the temporal correlation. Simulation results demonstrate the efficiency of this scheme.

Keywords: Classification, decision fusion, fuzzy logic, hidden Markov model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6227
7248 Modeling and Visualizing Seismic Wave Propagation in Elastic Medium Using Multi-Dimension Wave Digital Filtering Approach

Authors: Jason Chien-Hsun Tseng, Nguyen Dong-Thai Dao, Chong-Ching Chang

Abstract:

A novel PDE solver using the multidimensional wave digital filtering (MDWDF) technique to achieve the solution of a 2D seismic wave system is presented. In essence, the continuous physical system served by a linear Kirchhoff circuit is transformed to an equivalent discrete dynamic system implemented by a MD wave digital filtering (MDWDF) circuit. This amounts to numerically approximating the differential equations used to describe elements of a MD passive electronic circuit by a grid-based difference equations implemented by the so-called state quantities within the passive MDWDF circuit. So the digital model can track the wave field on a dense 3D grid of points. Details about how to transform the continuous system into a desired discrete passive system are addressed. In addition, initial and boundary conditions are properly embedded into the MDWDF circuit in terms of state quantities. Graphic results have clearly demonstrated some physical effects of seismic wave (P-wave and S–wave) propagation including radiation, reflection, and refraction from and across the hard boundaries. Comparison between the MDWDF technique and the finite difference time domain (FDTD) approach is also made in terms of the computational efficiency.

Keywords: Seismic Wave Propagation, Multi-dimension WaveDigital Filters, Partial Differential Equations.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1411
7247 Long Term Examination of the Profitability Estimation Focused on Benefits

Authors: Stephan Printz, Kristina Lahl, René Vossen, Sabina Jeschke

Abstract:

Strategic investment decisions are characterized by high innovation potential and long-term effects on the competitiveness of enterprises. Due to the uncertainty and risks involved in this complex decision making process, the need arises for well-structured support activities. A method that considers cost and the long-term added value is the cost-benefit effectiveness estimation. One of those methods is the “profitability estimation focused on benefits – PEFB”-method developed at the Institute of Management Cybernetics at RWTH Aachen University. The method copes with the challenges associated with strategic investment decisions by integrating long-term non-monetary aspects whilst also mapping the chronological sequence of an investment within the organization’s target system. Thus, this method is characterized as a holistic approach for the evaluation of costs and benefits of an investment. This participation-oriented method was applied to business environments in many workshops. The results of the workshops are a library of more than 96 cost aspects, as well as 122 benefit aspects. These aspects are preprocessed and comparatively analyzed with regards to their alignment to a series of risk levels. For the first time, an accumulation and a distribution of cost and benefit aspects regarding their impact and probability of occurrence are given. The results give evidence that the PEFB-method combines precise measures of financial accounting with the incorporation of benefits. Finally, the results constitute the basics for using information technology and data science for decision support when applying within the PEFB-method.

Keywords: Cost-benefit analysis, multi-criteria decision, profitability estimation focused on benefits, risk and uncertainty analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1471
7246 Determining Cluster Boundaries Using Particle Swarm Optimization

Authors: Anurag Sharma, Christian W. Omlin

Abstract:

Self-organizing map (SOM) is a well known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. PSO algorithm utilizes U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and k-means algorithm.

Keywords: Particle swarm optimization, self-organizing maps, clustering, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1696
7245 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm

Authors: Ameur Abdelkader, Abed Bouarfa Hafida

Abstract:

Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis in case of a large amount of data. In fact, because of their volumes, their nature (semi or unstructured) and their variety, it is impossible to analyze efficiently big data via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm, Classification And Regression Trees (CART), in order to adapt it for big data analysis. The major changes of this algorithm are presented and then a version of the extended algorithm is defined in order to make it applicable for a huge quantity of data.

Keywords: Predictive analysis, big data, predictive analysis algorithms. CART algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1044
7244 Entrepreneurship and the Discovery and Exploitation of Business Opportunities: Empirical Evidence from the Malawian Tourism Sector

Authors: Aravind Mohan Krishnan

Abstract:

This paper identifies a research gap in the literature on tourism entrepreneurship in Malawi, Africa, and investigates how entrepreneurs from the Malawian tourism sector discover and exploit business opportunities. In particular, the importance of prior experience and business networks in the opportunity development process is debated. Another area of empirical research examined here is the opportunity recognition-venture creation sequence. While Malawi presents fruitful business opportunities, exploiting these opportunities into fully realized business ideas is a real challenge due to the country’s difficult business environment and poor promotional and marketing efforts. The study concludes by calling for further research in Sub-Saharan Africa in order to develop our understanding of entrepreneurship in this (African) context.

Keywords: Tourism, entrepreneurship, Malawi, business opportunities.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2322
7243 A Business-to-Business Collaboration System That Promotes Data Utilization While Encrypting Information on the Blockchain

Authors: Hiroaki Nasu, Ryota Miyamoto, Yuta Kodera, Yasuyuki Nogami

Abstract:

To promote Industry 4.0 and Society 5.0 and so on, it is important to connect and share data so that every member can trust it. Blockchain (BC) technology is currently attracting attention as the most advanced tool and has been used in the financial field and so on. However, the data collaboration using BC has not progressed sufficiently among companies on the supply chain of the manufacturing industry that handle sensitive data such as product quality, manufacturing conditions, etc. There are two main reasons why data utilization is not sufficiently advanced in the industrial supply chain. The first reason is that manufacturing information is top secret and a source for companies to generate profits. It is difficult to disclose data even between companies with transactions in the supply chain. Blockchain mechanism such as Bitcoin using Public Key Infrastructure (PKI) requires plaintext to be shared between companies in order to verify the identity of the company that sent the data. Another reason is that the merits (scenarios) of collaboration data between companies are not specifically specified in the industrial supply chain. For these problems, this paper proposes a Business to Business (B2B) collaboration system using homomorphic encryption and BC technique. Using the proposed system, each company on the supply chain can exchange confidential information on encrypted data and utilize the data for their own business. In addition, this paper considers a scenario focusing on quality data, which was difficult to collaborate because it is top-secret. In this scenario, we show an implementation scheme and a benefit of concrete data collaboration by proposing a comparison protocol that can grasp the change in quality while hiding the numerical value of quality data.

Keywords: Business to business data collaboration, industrial supply chain, blockchain, homomorphic encryption.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 774
7242 An Approximation of Daily Rainfall by Using a Pixel Value Data Approach

Authors: Sarisa Pinkham, Kanyarat Bussaban

Abstract:

The research aims to approximate the amount of daily rainfall by using a pixel value data approach. The daily rainfall maps from the Thailand Meteorological Department in period of time from January to December 2013 were the data used in this study. The results showed that this approach can approximate the amount of daily rainfall with RMSE=3.343.

Keywords: Daily rainfall, Image processing, Approximation, Pixel value data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1735
7241 Automatic Generation of Ontology from Data Source Directed by Meta Models

Authors: Widad Jakjoud, Mohamed Bahaj, Jamal Bakkas

Abstract:

Through this paper we present a method for automatic generation of ontological model from any data source using Model Driven Architecture (MDA), this generation is dedicated to the cooperation of the knowledge engineering and software engineering. Indeed, reverse engineering of a data source generates a software model (schema of data) that will undergo transformations to generate the ontological model. This method uses the meta-models to validate software and ontological models.

Keywords: Meta model, model, ontology, data source.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1975
7240 Entomological Origin of Honey Discriminated by NMR Chloroform Extracts in Ecuadorian Honey

Authors: P. Vit, J. Uddin, V. Zuccato, F. Maza, E. Schievano

Abstract:

Honeys are produced by Apis mellifera and stingless bees (Meliponini) in Ecuador. We studied honey produced in beeswax combs by Apis mellifera, and honey produced in pots by Geotrigona and Scaptotrigona bees. Chloroform extracts of honey were obtained for fast NMR spectra. The 1D spectra were acquired at 298 K, with a 600 MHz NMR Bruker instrument, using a modified double pulsed field gradient spin echoes (DPFGSE) sequence. Signals of 1H NMR spectra were integrated and used as inputs for PCA, PLS-DA analysis, and labelled sets of classes were successfully identified, enhancing the separation between the three groups of honey according to the entomological origin: A. mellifera, Geotrigona and Scaptotrigona. This procedure is therefore recommended for authenticity test of honey in Ecuador.

Keywords: Apis mellifera, honey, 1H NMR, entomological origin, Meliponini.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3112
7239 Steps towards the Development of National Health Data Standards in Developing Countries: An Exploratory Qualitative Study in Saudi Arabia

Authors: Abdullah I. Alkraiji, Thomas W. Jackson, Ian R. Murray

Abstract:

The proliferation of health data standards today is somewhat overlapping and conflicting, resulting in market confusion and leading to increasing proprietary interests. The government role and support in standardization for health data are thought to be crucial in order to establish credible standards for the next decade, to maximize interoperability across the health sector, and to decrease the risks associated with the implementation of non-standard systems. The normative literature missed out the exploration of the different steps required to be undertaken by the government towards the development of national health data standards. Based on the lessons learned from a qualitative study investigating the different issues to the adoption of health data standards in the major tertiary hospitals in Saudi Arabia and the opinions and feedback from different experts in the areas of data exchange and standards and medical informatics in Saudi Arabia and UK, a list of steps required towards the development of national health data standards was constructed. Main steps are the existence of: a national formal reference for health data standards, an agreed national strategic direction for medical data exchange, a national medical information management plan and a national accreditation body, and more important is the change management at the national and organizational level. The outcome of this study can be used by academics and practitioners to develop the planning of health data standards, and in particular those in developing countries.

Keywords: Interoperability, Case Study, Health Data Standards, Medical Data Exchange, Saudi Arabia.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1982
7238 Test Data Compression Using a Hybrid of Bitmask Dictionary and 2n Pattern Runlength Coding Methods

Authors: C. Kalamani, K. Paramasivam

Abstract:

In VLSI, testing plays an important role. Major problem in testing are test data volume and test power. The important solution to reduce test data volume and test time is test data compression. The Proposed technique combines the bit maskdictionary and 2n pattern run length-coding method and provides a substantial improvement in the compression efficiency without introducing any additional decompression penalty. This method has been implemented using Mat lab and HDL Language to reduce test data volume and memory requirements. This method is applied on various benchmark test sets and compared the results with other existing methods. The proposed technique can achieve a compression ratio up to 86%.

Keywords: Bit Mask dictionary, 2n pattern run length code, system-on-chip, SOC, test data compression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1900