Search results for: hierarchical cluster analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8969

Search results for: hierarchical cluster analysis

8609 Hierarchical Checkpoint Protocol in Data Grids

Authors: Rahma Souli-Jbali, Minyar Sassi Hidri, Rahma Ben Ayed

Abstract:

Grid of computing nodes has emerged as a representative means of connecting distributed computers or resources scattered all over the world for the purpose of computing and distributed storage. Since fault tolerance becomes complex due to the availability of resources in decentralized grid environment, it can be used in connection with replication in data grids. The objective of our work is to present fault tolerance in data grids with data replication-driven model based on clustering. The performance of the protocol is evaluated with Omnet++ simulator. The computational results show the efficiency of our protocol in terms of recovery time and the number of process in rollbacks.

Keywords: Data grids, fault tolerance, chandy-lamport, clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 909
8608 Forecasting Malaria Cases in Bujumbura

Authors: Hermenegilde Nkurunziza, Albrecht Gebhardt, Juergen Pilz

Abstract:

The focus in this work is to assess which method allows a better forecasting of malaria cases in Bujumbura ( Burundi) when taking into account association between climatic factors and the disease. For the period 1996-2007, real monthly data on both malaria epidemiology and climate in Bujumbura are described and analyzed. We propose a hierarchical approach to achieve our objective. We first fit a Generalized Additive Model to malaria cases to obtain an accurate predictor, which is then used to predict future observations. Various well-known forecasting methods are compared leading to different results. Based on in-sample mean average percentage error (MAPE), the multiplicative exponential smoothing state space model with multiplicative error and seasonality performed better.

Keywords: Burundi, Forecasting, Malaria, Regressionmodel, State space model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1948
8607 Identifying Autism Spectrum Disorder Using Optimization-Based Clustering

Authors: Sharifah Mousli, Sona Taheri, Jiayuan He

Abstract:

Autism spectrum disorder (ASD) is a complex developmental condition involving persistent difficulties with social communication, restricted interests, and repetitive behavior. The challenges associated with ASD can interfere with an affected individual’s ability to function in social, academic, and employment settings. Although there is no effective medication known to treat ASD, to our best knowledge, early intervention can significantly improve an affected individual’s overall development. Hence, an accurate diagnosis of ASD at an early phase is essential. The use of machine learning approaches improves and speeds up the diagnosis of ASD. In this paper, we focus on the application of unsupervised clustering methods in ASD, as a large volume of ASD data generated through hospitals, therapy centers, and mobile applications has no pre-existing labels. We conduct a comparative analysis using seven clustering approaches, such as K-means, agglomerative hierarchical, model-based, fuzzy-C-means, affinity propagation, self organizing maps, linear vector quantisation – as well as the recently developed optimization-based clustering (COMSEP-Clust) approach. We evaluate the performances of the clustering methods extensively on real-world ASD datasets encompassing different age groups: toddlers, children, adolescents, and adults. Our experimental results suggest that the COMSEP-Clust approach outperforms the other seven methods in recognizing ASD with well-separated clusters.

Keywords: Autism spectrum disorder, clustering, optimization, unsupervised machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 196
8606 Multivariate Assessment of Mathematics Test Scores of Students in Qatar

Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski

Abstract:

Data on various aspects of education are collected at the institutional and government level regularly. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data collected internationally is via the PISA study which collects data from several countries when students are approximately 15 years of age and enables comparisons in the performance of science, mathematics and English between countries as well as ranking of countries based on performance in these standardised tests. As well as student and school outcomes based on the tests taken as part of the PISA study, there is a wealth of other data collected in the study including parental demographics data and data related to teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics using data obtained from the PISA study.

Keywords: Cluster analysis, education, mathematics, profiles.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 847
8605 Effect of Different Conditions on the Sorption Behavior of Co2+ Using Celatom- ZeoliteY Composite

Authors: Salam K. Al-Nasri, SM Holmes

Abstract:

Composite of Celatom-ZeoliteY (Cel-ZY) was used to remove cobalt ion from an aqueous solution using batch mode. ZeoliteY has successfully superimposed on Celatom FW-14 surface using hydrothermal treatment .The product was synthesized as a novel of hierarchical porous material. It was observed from the results that Cel-ZY has higher ability to remove cobalt ions than the pure ZeoliteY powder (PZY) synthesized under the same conditions. Several parameters were studied in this project to investigate the effect of removal cobalt ion such as pH and initial cobalt concentration. It was clearly observed that the uptake of cobalt ions was affected with increase these parameters. The results proved that the product can be used effectively to remove Co2+ ions from wastewater as an environmentally friendly alternative.

Keywords: Adsorption, Celatom-Zeolite, Cobalt ions, Isotherm models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2351
8604 Adaptive Transmission Scheme Based on Channel State in Dual-Hop System

Authors: Seung-Jun Yu, Yong-Jun Kim, Jung-In Baik, Hyoung-Kyu Song

Abstract:

In this paper, a dual-hop relay based on channel state is studied. In the conventional relay scheme, a relay uses the same modulation method without reference to channel state. But, a relay uses an adaptive modulation method with reference to channel state. If the channel state is poor, a relay eliminates latter 2 bits and uses Quadrature Phase Shift Keying (QPSK) modulation. If channel state is good, a relay modulates the received symbols with 16-QAM symbols by using 4 bits. The performance of the proposed scheme for Symbol Error Rate (SER) and throughput is analyzed.

Keywords: Adaptive transmission, channel state, dual-hop, hierarchical modulation, relay.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1768
8603 Predicting Foreign Direct Investment of IC Design Firms from Taiwan to East and South China Using Lotka-Volterra Model

Authors: Bi-Huei Tsai

Abstract:

This work explores the inter-region investment behaviors of Integrated Circuit (IC) design industry from Taiwan to China using the amount of foreign direct investment (FDI). According to the mutual dependence among different IC design industrial locations, Lotka-Volterra model is utilized to explore the FDI interactions between South and East China. Effects of inter-regional collaborations on FDI flows into China are considered. The analysis results show that FDIs into South China for IC design industry significantly inspire the subsequent FDIs into East China, while FDIs into East China for Taiwan’s IC design industry significantly hinder the subsequent FDIs into South China. Because the supply chain along IC industry includes upstream IC design, midstream manufacturing, as well as downstream packing and testing enterprises, IC design industry has to cooperate with IC manufacturing, packaging and testing industries in the same area to form a strong IC industrial cluster. Taiwan’s IC design industry implement the largest FDI amount into East China and the second largest FDI amount into South China among the four regions: North, East, Mid-West and South China. If IC design houses undertake more FDIs in South China, those in East China are urged to incrementally implement more FDIs into East China to maintain the competitive advantages of the IC supply chain in East China. On the other hand, as the FDIs in East China rise, the FDIs in South China will successively decline since capitals have concentrated in East China. In addition, this investigation proves that the prediction of Lotka-Volterra model in FDI trends is accurate because the industrial interactions between the two regions are included. Finally, this work confirms that the FDI flows cannot reach a stable equilibrium point, so the FDI inflows into East and South China will expand in the future.

Keywords: Lotka-Volterra model, Foreign direct investment, Competitive, Equilibrium analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1439
8602 A Flexible and Scalable Agent Platform for Multi-Agent Systems

Authors: Ae Hee Park, So Hyun Park, Hee Yong Youn

Abstract:

Multi-agent system is composed by several agents capable of reaching the goal cooperatively. The system needs an agent platform for efficient and stable interaction between intelligent agents. In this paper we propose a flexible and scalable agent platform by composing the containers with multiple hierarchical agent groups. It also allows efficient implementation of multiple domain presentations of the agents unlike JADE. The proposed platform provides both group management and individual management of agents for efficiency. The platform has been implemented and tested, and it can be used as a flexible foundation of the dynamic multi-agent system targeting seamless delivery of ubiquitous services.

Keywords: Agent platform, container, multi-agent system, services, ubiquitous computing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1939
8601 NSBS: Design of a Network Storage Backup System

Authors: Xinyan Zhang, Zhipeng Tan, Shan Fan

Abstract:

The first layer of defense against data loss is the backup data. This paper implements an agent-based network backup system used the backup, server-storage and server-backup agent these tripartite construction, and the snapshot and hierarchical index are used in the NSBS. It realizes the control command and data flow separation, balances the system load, thereby improving efficiency of the system backup and recovery. The test results show the agent-based network backup system can effectively improve the task-based concurrency, reasonably allocate network bandwidth, the system backup performance loss costs smaller and improves data recovery efficiency by 20%.

Keywords: Agent, network backup system, three architecture model, NSBS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2191
8600 Teaching Approach and Self-Confidence Effect Model Consistency between Taiwan and Singapore Multi-Group HLM

Authors: PeiWen Liao, Tsung Hau Jen

Abstract:

This study was conducted to explore the effects of two countries model comparison program in Taiwan and Singapore in TIMSS database. The researchers used Multi-Group Hierarchical Linear Modeling techniques to compare the effects of two different country models and we tested our hypotheses on 4,046 Taiwan students and 4,599 Singapore students in 2007 at two levels: the class level and student (individual) level. Design quality is a class level variable. Student level variables are achievement and self-confidence. The results challenge the widely held view that retention has a positive impact on self-confidence. Suggestions for future research are discussed.

Keywords: Teaching approach, self-confidence, achievement, multi-group HLM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1801
8599 Svision: Visual Identification of Scanning and Denial of Service Attacks

Authors: Iosif-Viorel Onut, Bin Zhu, Ali A. Ghorbani

Abstract:

We propose a novel graphical technique (SVision) for intrusion detection, which pictures the network as a community of hosts independently roaming in a 3D space defined by the set of services that they use. The aim of SVision is to graphically cluster the hosts into normal and abnormal ones, highlighting only the ones that are considered as a threat to the network. Our experimental results using DARPA 1999 and 2000 intrusion detection and evaluation datasets show the proposed technique as a good candidate for the detection of various threats of the network such as vertical and horizontal scanning, Denial of Service (DoS), and Distributed DoS (DDoS) attacks.

Keywords: Anomaly Visualization, Network Security, Intrusion Detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1669
8598 Corporate Social Responsibility in China Apparel Industry

Authors: Zhao Linfei, Gu Qingliang

Abstract:

China apparel industry, which is deeply embedded in the global production network (GPN), faces the dual pressures of social upgrading and economic upgrading. Based on the survey in Ningbo apparel cluster, the paper shows the state of corporate social responsibility (CSR) in China apparel industry is better than before. And the investigation indicates that the firms who practice CSR actively perform better both socially and economically than those who inactively. The research demonstrates that CSR can be an initial capital rather than cost, and “doing well by doing good" is also existed in labor intensive industry.

Keywords: Global production network, corporate social responsibility, China apparel industry.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2883
8597 Mining Genes Relations in Microarray Data Combined with Ontology in Colon Cancer Automated Diagnosis System

Authors: A. Gruzdz, A. Ihnatowicz, J. Siddiqi, B. Akhgar

Abstract:

MATCH project [1] entitle the development of an automatic diagnosis system that aims to support treatment of colon cancer diseases by discovering mutations that occurs to tumour suppressor genes (TSGs) and contributes to the development of cancerous tumours. The constitution of the system is based on a) colon cancer clinical data and b) biological information that will be derived by data mining techniques from genomic and proteomic sources The core mining module will consist of the popular, well tested hybrid feature extraction methods, and new combined algorithms, designed especially for the project. Elements of rough sets, evolutionary computing, cluster analysis, self-organization maps and association rules will be used to discover the annotations between genes, and their influence on tumours [2]-[11]. The methods used to process the data have to address their high complexity, potential inconsistency and problems of dealing with the missing values. They must integrate all the useful information necessary to solve the expert's question. For this purpose, the system has to learn from data, or be able to interactively specify by a domain specialist, the part of the knowledge structure it needs to answer a given query. The program should also take into account the importance/rank of the particular parts of data it analyses, and adjusts the used algorithms accordingly.

Keywords: Bioinformatics, gene expression, ontology, selforganizingmaps.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1941
8596 An HCI Template for Distributed Applications

Authors: Xizhi Li

Abstract:

Both software applications and their development environment are becoming more and more distributed. This trend impacts not only the way software computes, but also how it looks. This article proposes a Human Computer Interface (HCI) template from three representative applications we have developed. These applications include a Multi-Agent System based software, a 3D Internet computer game with distributed game world logic, and a programming language environment used in constructing distributed neural network and its visualizations. HCI concepts that are common to these applications are described in abstract terms in the template. These include off-line presentation of global entities, entities inside a hierarchical namespace, communication and languages, reconfiguration of entity references in a graph, impersonation and access right, etc. We believe the metaphor that underlies an HCI concept as well as the relationships between a bunch of HCI concepts are crucial to the design of software systems and vice versa.

Keywords: HCI, MAS, computer game, programming language

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1491
8595 Discovering the Dimension of Abstractness: Structure-Based Model that Learns New Categories and Categorizes on Different Levels of Abstraction

Authors: Georgi I. Petkov, Ivan I. Vankov, Yolina A. Petrova

Abstract:

A structure-based model of category learning and categorization at different levels of abstraction is presented. The model compares different structures and expresses their similarity implicitly in the forms of mappings. Based on this similarity, the model can categorize different targets either as members of categories that it already has or creates new categories. The model is novel using two threshold parameters to evaluate the structural correspondence. If the similarity between two structures exceeds the higher threshold, a new sub-ordinate category is created. Vice versa, if the similarity does not exceed the higher threshold but does the lower one, the model creates a new category on higher level of abstraction.

Keywords: Analogy-making, categorization, learning of categories, abstraction, hierarchical structure.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 744
8594 Increasing Performance of Autopilot Guided Small Unmanned Helicopter

Authors: Tugrul Oktay, Mehmet Konar, Mustafa Soylak, Firat Sal, Murat Onay, Orhan Kizilkaya

Abstract:

In this paper, autonomous performance of a small manufactured unmanned helicopter is tried to be increased. For this purpose, a small unmanned helicopter is manufactured in Erciyes University, Faculty of Aeronautics and Astronautics. It is called as ZANKA-Heli-I. For performance maximization, autopilot parameters are determined via minimizing a cost function consisting of flight performance parameters such as settling time, rise time, overshoot during trajectory tracking. For this purpose, a stochastic optimization method named as simultaneous perturbation stochastic approximation is benefited. Using this approach, considerable autonomous performance increase (around %23) is obtained.

Keywords: Small helicopters, hierarchical control, stochastic optimization, autonomous performance maximization, autopilots.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1596
8593 ICAM-2, A Protein of Antitumor Immune Response in Mekong Giant Catfish (Pangasianodon gigas)

Authors: Jiraporn Rojtinnakorn

Abstract:

ICAM-2 (intercellular adhesion molecule 2) or CD102 (Cluster of Differentiation 102) is type I transmembrane glycoproteins, composing 2-9 immunoglobulin-like C2-type domains. ICAM-2 plays the particular role in immune response and cell surveillance. It is concerned in innate and specific immunity, cell survival signal, apoptosis, and anticancer. EST clone of ICAM-2, from P. gigas blood cell EST libraries, showed high identity to human ICAM-2 (92%) with conserve region of ICAM N-terminal domain and part of Ig superfamily. Gene and protein of ICAM-2 has been founded in mammals. This is the first report of ICAM-2 in fish

Keywords: ICAM-2, CD102, Pangasianodon gigas, antitumor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1788
8592 Size-Reduction Strategies for Iris Codes

Authors: Jutta Hämmerle-Uhl, Georg Penn, Gerhard Pötzelsberger, Andreas Uhl

Abstract:

Iris codes contain bits with different entropy. This work investigates different strategies to reduce the size of iris code templates with the aim of reducing storage requirements and computational demand in the matching process. Besides simple subsampling schemes, also a binary multi-resolution representation as used in the JBIG hierarchical coding mode is assessed. We find that iris code template size can be reduced significantly while maintaining recognition accuracy. Besides, we propose a two-stage identification approach, using small-sized iris code templates in a pre-selection stage, and full resolution templates for final identification, which shows promising recognition behaviour.

Keywords: Iris recognition, compact iris code, fast matching, best bits, pre-selection identification, two-stage identification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1752
8591 Cluster-Based Multi-Path Routing Algorithm in Wireless Sensor Networks

Authors: Si-Gwan Kim

Abstract:

Small-size and low-power sensors with sensing, signal processing and wireless communication capabilities is suitable for the wireless sensor networks. Due to the limited resources and battery constraints, complex routing algorithms used for the ad-hoc networks cannot be employed in sensor networks. In this paper, we propose node-disjoint multi-path hexagon-based routing algorithms in wireless sensor networks. We suggest the details of the algorithm and compare it with other works. Simulation results show that the proposed scheme achieves better performance in terms of efficiency and message delivery ratio.

Keywords: Clustering, multi-path, routing protocol, sensor network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2425
8590 A Survey of Access Control Schemes in Wireless Sensor Networks

Authors: Youssou Faye, Ibrahima Niang, Thomas Noel

Abstract:

Access control is a critical security service in Wire- less Sensor Networks (WSNs). To prevent malicious nodes from joining the sensor network, access control is required. On one hand, WSN must be able to authorize and grant users the right to access to the network. On the other hand, WSN must organize data collected by sensors in such a way that an unauthorized entity (the adversary) cannot make arbitrary queries. This restricts the network access only to eligible users and sensor nodes, while queries from outsiders will not be answered or forwarded by nodes. In this paper we presentee different access control schemes so as to ?nd out their objectives, provision, communication complexity, limits, etc. Using the node density parameter, we also provide a comparison of these proposed access control algorithms based on the network topology which can be flat or hierarchical.

Keywords: Access Control, Authentication, Key Management, Wireless Sensor Networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2622
8589 A Hybrid P2P Storage Scheme Based on Erasure Coding and Replication

Authors: Usman Mahmood, Khawaja M. U. Suleman

Abstract:

A peer-to-peer storage system has challenges like; peer availability, data protection, churn rate. To address these challenges different redundancy, replacement and repair schemes are used. This paper presents a hybrid scheme of redundancy using replication and erasure coding. We calculate and compare the storage, access, and maintenance costs of our proposed scheme with existing redundancy schemes. For realistic behaviour of peers a trace of live peer-to-peer system is used. The effect of different replication, and repair schemes are also shown. The proposed hybrid scheme performs better than existing double coding hybrid scheme in all metrics and have an improved maintenance cost than hierarchical codes.

Keywords: Erasure Coding, P2P, Redundancy, Replication.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1680
8588 Experimental teaching, Perceived usefulness, Ease of use, Learning Interest and Science Achievement of Taiwan 8th Graders in TIMSS 2007 Database

Authors: Pei Wen Liao, Tsung Hau Jen

Abstract:

the data of Taiwanese 8th grader in the 4th cycle of Trends in International Mathematics and Science Study (TIMSS) are analyzed to examine the influence of the science teachers- preference in experimental teaching on the relationships between the affective variables ( the perceived usefulness of science, ease of using science and science learning interest) and the academic achievement in science. After dealing with the missing data, 3711 students and 145 science teacher-s data were analyzed through a Hierarchical Linear Modeling technique. The major objective of this study was to determine the role of the experimental teaching moderates the relationship between perceived usefulness and achievement.

Keywords: TIMSS database, Science achievement, Experimental teaching, Perceived Usefulness, Perceived Ease of Use

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1609
8587 Biomechanics Analysis When Delivering Baby

Authors: Kristyanto B.

Abstract:

Plenty of analyses based on Biomechanics were carried out on many jobs in manufactures or services. Now Biomechanics analysis is being applied on mothers who are giving birth. The analysis conducted in terms of normal condition of the birth process without Gyn Bed (Obstetric Bed). The aim of analysis is to study whether it is risky or not when choosing the position of mother’s postures when delivering the baby. This investigation was applied on two positions that generally appear in common birth process. Results will show the analysis of both positions to support the birth process based on the Biomechanics analysis (Ergonomic approaches). 

Keywords: Biomechanics analysis, Birth process, Position of postures analysis, Ergonomic approaches.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2252
8586 Selection Initial modes for Belief K-modes Method

Authors: Sarra Ben Hariz, Zied Elouedi, Khaled Mellouli

Abstract:

The belief K-modes method (BKM) approach is a new clustering technique handling uncertainty in the attribute values of objects in both the cluster construction task and the classification one. Like the standard version of this method, the BKM results depend on the chosen initial modes. So, one selection method of initial modes is developed, in this paper, aiming at improving the performances of the BKM approach. Experiments with several sets of real data show that by considered the developed selection initial modes method, the clustering algorithm produces more accurate results.

Keywords: Clustering, Uncertainty, Belief function theory, Belief K-modes Method, Initial modes selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1779
8585 An Ontology Model for Systems Engineering Derived from ISO/IEC/IEEE 15288: 2015: Systems and Software Engineering - System Life Cycle Processes

Authors: Lan Yang, Kathryn Cormican, Ming Yu

Abstract:

ISO/IEC/IEEE 15288: 2015, Systems and Software Engineering - System Life Cycle Processes is an international standard that provides generic top-level process descriptions to support systems engineering (SE). However, the processes defined in the standard needs improvement to lift integrity and consistency. The goal of this research is to explore the way by building an ontology model for the SE standard to manage the knowledge of SE. The ontology model gives a whole picture of the SE knowledge domain by building connections between SE concepts. Moreover, it creates a hierarchical classification of the concepts to fulfil different requirements of displaying and analysing SE knowledge.

Keywords: Knowledge management, model-based systems engineering, ontology modelling, systems engineering ontology.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1811
8584 An Improved Fast Search Method Using Histogram Features for DNA Sequence Database

Authors: Qiu Chen, Feifei Lee, Koji Kotani, Tadahiro Ohmi

Abstract:

In this paper, we propose an efficient hierarchical DNA sequence search method to improve the search speed while the accuracy is being kept constant. For a given query DNA sequence, firstly, a fast local search method using histogram features is used as a filtering mechanism before scanning the sequences in the database. An overlapping processing is newly added to improve the robustness of the algorithm. A large number of DNA sequences with low similarity will be excluded for latter searching. The Smith-Waterman algorithm is then applied to each remainder sequences. Experimental results using GenBank sequence data show the proposed method combining histogram information and Smith-Waterman algorithm is more efficient for DNA sequence search.

Keywords: Fast search, DNA sequence, Histogram feature, Smith-Waterman algorithm, Local search

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1297
8583 Modelling Peer Group Dieting Behaviour

Authors: M. J. Cunha

Abstract:

The aim of this paper is to understand how peers can influence adolescent girls- dieting behaviour and their body image. Departing from imitation and social learning theories, we study whether adolescent girls tend to model their peer group dieting behaviours, thus influencing their body image construction. Our study was conducted through an enquiry applied to a cluster sample of 466 adolescent high school girls in Lisbon city public schools. Our main findings point to an association between girls- and peers- dieting behaviours, thus reinforcing the modelling hypothesis.

Keywords: Modelling, Diet, Body image, Adolescent girls, Peer group.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1724
8582 Joint Use of Factor Analysis (FA) and Data Envelopment Analysis (DEA) for Ranking of Data Envelopment Analysis

Authors: Reza Nadimi, Fariborz Jolai

Abstract:

This article combines two techniques: data envelopment analysis (DEA) and Factor analysis (FA) to data reduction in decision making units (DMU). Data envelopment analysis (DEA), a popular linear programming technique is useful to rate comparatively operational efficiency of decision making units (DMU) based on their deterministic (not necessarily stochastic) input–output data and factor analysis techniques, have been proposed as data reduction and classification technique, which can be applied in data envelopment analysis (DEA) technique for reduction input – output data. Numerical results reveal that the new approach shows a good consistency in ranking with DEA.

Keywords: Effectiveness, Decision Making, Data EnvelopmentAnalysis, Factor Analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2385
8581 A Hybrid GMM/SVM System for Text Independent Speaker Identification

Authors: Rafik Djemili, Mouldi Bedda, Hocine Bourouba

Abstract:

This paper proposes a novel approach that combines statistical models and support vector machines. A hybrid scheme which appropriately incorporates the advantages of both the generative and discriminant model paradigms is described and evaluated. Support vector machines (SVMs) are trained to divide the whole speakers' space into small subsets of speakers within a hierarchical tree structure. During testing a speech token is assigned to its corresponding group and evaluation using gaussian mixture models (GMMs) is then processed. Experimental results show that the proposed method can significantly improve the performance of text independent speaker identification task. We report improvements of up to 50% reduction in identification error rate compared to the baseline statistical model.

Keywords: Speaker identification, Gaussian mixture model (GMM), support vector machine (SVM), hybrid GMM/SVM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2191
8580 Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

Authors: G. Candel, D. Naccache

Abstract:

t-SNE is an embedding method that the data science community has widely used. It helps two main tasks: to display results by coloring items according to the item class or feature value; and for forensic, giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are the structure preservation property and the answer to the crowding problem, where all neighbors in high dimensional space cannot be represented correctly in low dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to its size in number, and relationships between clusters are materialized by closeness on the embedding. This algorithm is non-parametric. The transformation from a high to low dimensional space is described but not learned. Two initializations of the algorithm would lead to two different embedding. In a forensic approach, analysts would like to compare two or more datasets using their embedding. A naive approach would be to embed all datasets together. However, this process is costly as the complexity of t-SNE is quadratic, and would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of data. While this approach is highly scalable, points could be mapped at the same exact position, making them indistinguishable. This type of model would be unable to adapt to new outliers nor concept drift. This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the support embedding’ match. The embedding with the support process can be repeated more than once, with the newly obtained embedding. The successive embedding can be used to study the impact of one variable over the dataset distribution or monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n2) to O(n2/k), and the memory requirement from n2 to 2(n/k)2 which enables computation on recent laptops. The method showed promising results on a real-world dataset, allowing to observe the birth, evolution and death of clusters. The proposed approach facilitates identifying significant trends and changes, which empowers the monitoring high dimensional datasets’ dynamics.

Keywords: Concept drift, data visualization, dimension reduction, embedding, monitoring, reusability, t-SNE, unsupervised learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 443