Search results for: Multivariate Data Analysis

13367 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analyzing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2785

13366 Model Discovery and Validation for the Qsar Problem using Association Rule Mining

Authors: Luminita Dumitriu, Cristina Segal, Marian Craciun, Adina Cocu, Lucian P. Georgescu

Abstract:

There are several approaches in trying to solve the Quantitative 1Structure-Activity Relationship (QSAR) problem. These approaches are based either on statistical methods or on predictive data mining. Among the statistical methods, one should consider regression analysis, pattern recognition (such as cluster analysis, factor analysis and principal components analysis) or partial least squares. Predictive data mining techniques use either neural networks, or genetic programming, or neuro-fuzzy knowledge. These approaches have a low explanatory capability or non at all. This paper attempts to establish a new approach in solving QSAR problems using descriptive data mining. This way, the relationship between the chemical properties and the activity of a substance would be comprehensibly modeled.

Keywords: association rules, classification, data mining, Quantitative Structure - Activity Relationship.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1783

13365 Tidal Data Analysis using ANN

Authors: Ritu Vijay, Rekha Govil

Abstract:

The design of a complete expansion that allows for compact representation of certain relevant classes of signals is a central problem in signal processing applications. Achieving such a representation means knowing the signal features for the purpose of denoising, classification, interpolation and forecasting. Multilayer Neural Networks are relatively a new class of techniques that are mathematically proven to approximate any continuous function arbitrarily well. Radial Basis Function Networks, which make use of Gaussian activation function, are also shown to be a universal approximator. In this age of ever-increasing digitization in the storage, processing, analysis and communication of information, there are numerous examples of applications where one needs to construct a continuously defined function or numerical algorithm to approximate, represent and reconstruct the given discrete data of a signal. Many a times one wishes to manipulate the data in a way that requires information not included explicitly in the data, which is done through interpolation and/or extrapolation. Tidal data are a very perfect example of time series and many statistical techniques have been applied for tidal data analysis and representation. ANN is recent addition to such techniques. In the present paper we describe the time series representation capabilities of a special type of ANN- Radial Basis Function networks and present the results of tidal data representation using RBF. Tidal data analysis & representation is one of the important requirements in marine science for forecasting.

Keywords: ANN, RBF, Tidal Data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1650

13364 2D Graphical Analysis of Wastewater Influent Capacity Time Series

Authors: Monika Chuchro, Maciej Dwornik

Abstract:

The extraction of meaningful information from image could be an alternative method for time series analysis. In this paper, we propose a graphical analysis of time series grouped into table with adjusted colour scale for numerical values. The advantages of this method are also discussed. The proposed method is easy to understand and is flexible to implement the standard methods of pattern recognition and verification, especially for noisy environmental data.

Keywords: graphical analysis, time series, seasonality, noisy environmental data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1444

13363 Kinematic Analysis of an Assistive Robotic Leg for Hemiplegic and Hemiparetic Patients

Authors: M.R. Safizadeh, M. Hussein, K. F. Samat, M.S. Che Kob, M.S. Yaacob, M.Z. Md Zain

Abstract:

The aim of this paper is to present the kinematic analysis and mechanism design of an assistive robotic leg for hemiplegic and hemiparetic patients. In this work, the priority is to design and develop the lightweight, effective and single driver mechanism on the basis of experimental hip and knee angles- data for walking speed of 1 km/h. A mechanism of cam-follower with three links is suggested for this purpose. The kinematic analysis is carried out and analysed using commercialized MATLAB software based on the prototype-s links sizes and kinematic relationships. In order to verify the kinematic analysis of the prototype, kinematic analysis data are compared with the experimental data. A good agreement between them proves that the anthropomorphic design of the lower extremity exoskeleton follows the human walking gait.

Keywords: Kinematic analysis, assistive robotic leg, lower extremity exoskeleton, cam-follower mechanism.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1902

13362 Application of Data Envelopment Analysis to Assess Quality Management Efficiency

Authors: Chuen Tse Kuah, Kuan Yew Wong, Farzad Behrouzi

Abstract:

This paper is aimed to give an illustration on the application of Data Envelopment Analysis (DEA) as a tool to assess Quality Management (QM) efficiency. A variant of DEA, slack based measure (SBM) is used for this purpose. From this study, it is found that DEA is suitable to measure QM efficiency and give improvement suggestions to the inefficient QM.

Keywords: Quality Management, Data Envelopment Analysis, Slack Based Measure, Efficiency Measurement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2082

13361 Statistical Analysis of Interferon-γ for the Effectiveness of an Anti-Tuberculous Treatment

Authors: Shishen Xie, Yingda L. Xie

Abstract:

Tuberculosis (TB) is a potentially serious infectious disease that remains a health concern. The Interferon Gamma Release Assay (IGRA) is a blood test to find out if an individual is tuberculous positive or negative. This study applies statistical analysis to the clinical data of interferon-gamma levels of seventy-three subjects who diagnosed pulmonary TB in an anti-tuberculous treatment. Data analysis is performed to determine if there is a significant decline in interferon-gamma levels for the subjects during a period of six months, and to infer if the anti-tuberculous treatment is effective.

Keywords: Data analysis, interferon gamma release assay, statistical methods, tuberculosis infection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1950

13360 The Non-Stationary BINARMA(1,1) Process with Poisson Innovations: An Application on Accident Data

Authors: Y. Sunecher, N. Mamode Khan, V. Jowaheer

Abstract:

This paper considers the modelling of a non-stationary bivariate integer-valued autoregressive moving average of order one (BINARMA(1,1)) with correlated Poisson innovations. The BINARMA(1,1) model is specified using the binomial thinning operator and by assuming that the cross-correlation between the two series is induced by the innovation terms only. Based on these assumptions, the non-stationary marginal and joint moments of the BINARMA(1,1) are derived iteratively by using some initial stationary moments. As regards to the estimation of parameters of the proposed model, the conditional maximum likelihood (CML) estimation method is derived based on thinning and convolution properties. The forecasting equations of the BINARMA(1,1) model are also derived. A simulation study is also proposed where BINARMA(1,1) count data are generated using a multivariate Poisson R code for the innovation terms. The performance of the BINARMA(1,1) model is then assessed through a simulation experiment and the mean estimates of the model parameters obtained are all efficient, based on their standard errors. The proposed model is then used to analyse a real-life accident data on the motorway in Mauritius, based on some covariates: policemen, daily patrol, speed cameras, traffic lights and roundabouts. The BINARMA(1,1) model is applied on the accident data and the CML estimates clearly indicate a significant impact of the covariates on the number of accidents on the motorway in Mauritius. The forecasting equations also provide reliable one-step ahead forecasts.

Keywords: Non-stationary, BINARMA(1, 1) model, Poisson Innovations, CML

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 582

13359 Analysis of Diverse Cluster Ensemble Techniques

Authors: S. Sarumathi, N. Shanthi, P. Ranjetha

Abstract:

Data mining is the procedure of determining interesting patterns from the huge amount of data. With the intention of accessing the data faster the most supporting processes needed is clustering. Clustering is the process of identifying similarity between data according to the individuality present in the data and grouping associated data objects into clusters. Cluster ensemble is the technique to combine various runs of different clustering algorithms to obtain a general partition of the original dataset, aiming for consolidation of outcomes from a collection of individual clustering outcomes. The performances of clustering ensembles are mainly affecting by two principal factors such as diversity and quality. This paper presents the overview about the different cluster ensemble algorithm along with their methods used in cluster ensemble to improve the diversity and quality in the several cluster ensemble related papers and shows the comparative analysis of different cluster ensemble also summarize various cluster ensemble methods. Henceforth this clear analysis will be very useful for the world of clustering experts and also helps in deciding the most appropriate one to determine the problem in hand.

Keywords: Cluster Ensemble, Consensus Function, CSPA, Diversity, HGPA, MCLA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1834

13358 Characteristics and Outcomes of COVID-19 Related Stroke: A Cohort Study

Authors: Kasra Afsahi, Maryam Soheilifar

Abstract:

Cerebrovascular accident (CVA) is a neurological side effect of COVID-19 disease wit high rate in pandemics. Effect of COVID-19 disease on disorder is unclear. In this cohort, patients with COVID-19 disease were assessed. 60 CVA cases were assessed in a referral hospital in 2020. The major factor was mortality and the cases were those with and without death. The groups were compared for all features about mortality in the patients with COVID-19 and CVA. Totally 23 out of 60 cases (38.3%) were expired. In univariate analysis there was significant association for death by ischemic heart disease (P = 0.015), high-severity stroke (P = 0.012), high C-reactive protein (CRP) (P = 0.001), high ESR (P = 0.009), pleural effusion (P = 0.005), pericardial effusion (P = 0.027), cardiomegaly (P = 0.005), ground glass opacity (P = 0.001), and consolidation (P = 0.001). Among these factors, there was significant association only for CRP (P = 0.001) and consolidation (P = 0.003) in multivariate analysis. Mortality in the cases with COVID-19-related CVA is one-third and it has relationship to elevated CRP and finding the consolidation in the computerized tomography scan of the lungs.

Keywords: COVID-19, stroke, prognosis, C-reactive protein, CRP.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 338

13357 Spatial Econometric Approaches for Count Data: An Overview and New Directions

Authors: Paula Simões, Isabel Natário

Abstract:

This paper reviews a number of theoretical aspects for implementing an explicit spatial perspective in econometrics for modelling non-continuous data, in general, and count data, in particular. It provides an overview of the several spatial econometric approaches that are available to model data that are collected with reference to location in space, from the classical spatial econometrics approaches to the recent developments on spatial econometrics to model count data, in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework, necessary for structural consistent spatial econometric count models, incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures for different assumptions, to the constrains and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature as well as from hierarchical modeling and analysis of spatial data, in order to look for new possible directions on the processing of count data, in a spatial hierarchical Bayesian econometric context.

Keywords: Spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2697

13356 Multinomial Dirichlet Gaussian Process Model for Classification of Multidimensional Data

Authors: Wanhyun Cho, Soonja Kang, Sangkyoon Kim, Soonyoung Park

Abstract:

We present probabilistic multinomial Dirichlet classification model for multidimensional data and Gaussian process priors. Here, we have considered efficient computational method that can be used to obtain the approximate posteriors for latent variables and parameters needed to define the multiclass Gaussian process classification model. We first investigated the process of inducing a posterior distribution for various parameters and latent function by using the variational Bayesian approximations and important sampling method, and next we derived a predictive distribution of latent function needed to classify new samples. The proposed model is applied to classify the synthetic multivariate dataset in order to verify the performance of our model. Experiment result shows that our model is more accurate than the other approximation methods.

Keywords: Multinomial dirichlet classification model, Gaussian process priors, variational Bayesian approximation, Importance sampling, approximate posterior distribution, Marginal likelihood evidence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1608

13355 Weigh-in-Motion Data Analysis Software for Developing Traffic Data for Mechanistic Empirical Pavement Design

Authors: M. A. Hasan, M. R. Islam, R. A. Tarefder

Abstract:

Currently, there are few user friendly Weigh-in- Motion (WIM) data analysis softwares available which can produce traffic input data for the recently developed AASHTOWare pavement Mechanistic-Empirical (ME) design software. However, these softwares have only rudimentary Quality Control (QC) processes. Therefore, they cannot properly deal with erroneous WIM data. As the pavement performance is highly sensible to the quality of WIM data, it is highly recommended to use more refined QC process on raw WIM data to get a good result. This study develops a userfriendly software, which can produce traffic input for the ME design software. This software takes the raw data (Class and Weight data) collected from the WIM station and processes it with a sophisticated QC procedure. Traffic data such as traffic volume, traffic distribution, axle load spectra, etc. can be obtained from this software; which can directly be used in the ME design software.

Keywords: Weigh-in-motion, software, axle load spectra, traffic distribution, AASHTOWare.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1892

13354 Characterization of Atmospheric Particulate Matter using PIXE Technique

Authors: P.Kothai, P. Prathibha, I.V.Saradhi, G.G. Pandit, V.D. Puranik

Abstract:

Coarse and fine particulate matter were collected at a residential area at Vashi, Navi Mumbai and the filter samples were analysed for trace elements using PIXE technique. The trend of particulate matter showed higher concentrations during winter than the summer and monsoon concentration levels. High concentrations of elements related to soil and sea salt were found in PM10 and PM2.5. Also high levels of zinc and sulphur found in the particulates of both the size fractions. EF analysis showed enrichment of Cu, Cr and Mn only in the fine fraction suggesting their origin from anthropogenic sources. The EF value was observed to be maximum for As, Pb and Zn in the fine particulates. However, crustal derived elements showed very low EF values indicating their origin from soil. The PCA based multivariate studies identified soil, sea salt, combustion and Se sources as common sources for coarse and additionally an industrial source has also been identified for fine particles.

Keywords: EF analysis, PM10, PM2.5, PIXE, PCA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1848

13353 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques

Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel

Abstract:

Sentiment analysis and opinion mining have become emerging topics of research in recent years but most of the work is focused on data in the English language. A comprehensive research and analysis are essential which considers multiple languages, machine translation techniques, and different classifiers. This paper presents, a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one using classification of text without language translation and second using the translation of testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or building language specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and accuracy of various classifiers for multilingual sentiment analysis is also discussed in this study.

Keywords: Cross-language analysis, machine learning, machine translation, sentiment analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1653

13352 Effects of Energy Consumption on Indoor Air Quality

Authors: M. Raatikainen, J-P. Skön, M. Johansson, K. Leiviskä, M. Kolehmainen

Abstract:

Continuous measurements and multivariate methods are applied in researching the effects of energy consumption on indoor air quality (IAQ) in a Finnish one-family house. Measured data used in this study was collected continuously in a house in Kuopio, Eastern Finland, during fourteen months long period. Consumption parameters measured were the consumptions of district heat, electricity and water. Indoor parameters gathered were temperature, relative humidity (RH), the concentrations of carbon dioxide (CO2) and carbon monoxide (CO) and differential air pressure. In this study, self-organizing map (SOM) and Sammon's mapping were applied to resolve the effects of energy consumption on indoor air quality. Namely, the SOM was qualified as a suitable method having a property to summarize the multivariable dependencies into easily observable two-dimensional map. Accompanying that, the Sammon's mapping method was used to cluster pre-processed data to find similarities of the variables, expressing distances and groups in the data. The methods used were able to distinguish 7 different clusters characterizing indoor air quality and energy efficiency in the study house. The results indicate, that the cost implications in euros of heating and electricity energy vary according to the differential pressure, concentration of carbon dioxide, temperature and season.

Keywords: Indoor air quality, Energy efficiency, Self- organizing map, Sammon's mapping

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1871

13351 Post Pandemic Mobility Analysis through Indexing and Sharding in MongoDB: Performance Optimization and Insights

Authors: Karan Vishavjit, Aakash Lakra, Shafaq Khan

Abstract:

The COVID-19 pandemic has pushed healthcare professionals to use big data analytics as a vital tool for tracking and evaluating the effects of contagious viruses. To effectively analyse huge datasets, efficient NoSQL databases are needed. The analysis of post-COVID-19 health and well-being outcomes and the evaluation of the effectiveness of government efforts during the pandemic is made possible by this research’s integration of several datasets, which cuts down on query processing time and creates predictive visual artifacts. We recommend applying sharding and indexing technologies to improve query effectiveness and scalability as the dataset expands. Effective data retrieval and analysis are made possible by spreading the datasets into a sharded database and doing indexing on individual shards. Analysis of connections between governmental activities, poverty levels, and post-pandemic wellbeing is the key goal. We want to evaluate the effectiveness of governmental initiatives to improve health and lower poverty levels. We will do this by utilising advanced data analysis and visualisations. The findings provide relevant data that support the advancement of UN sustainable objectives, future pandemic preparation, and evidence-based decision-making. This study shows how Big Data and NoSQL databases may be used to address problems with global health.

Keywords: COVID-19, big data, data analysis, indexing, NoSQL, sharding, scalability, poverty.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 52

13350 Steganalysis of Data Hiding via Halftoning and Coordinate Projection

Authors: Woong Hee Kim, Ilhwan Park

Abstract:

Steganography is the art of hiding and transmitting data through apparently innocuous carriers in an effort to conceal the existence of the data. A lot of steganography algorithms have been proposed recently. Many of them use the digital image data as a carrier. In data hiding scheme of halftoning and coordinate projection, still image data is used as a carrier, and the data of carrier image are modified for data embedding. In this paper, we present three features for analysis of data hiding via halftoning and coordinate projection. Also, we present a classifier using the proposed three features.

Keywords: Steganography, steganalysis, digital halftoning, data hiding.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1593

13349 Profitability Assessment of Granite Aggregate Production and the Development of a Profit Assessment Model

Authors: Melodi Mbuyi Mata, Blessing Olamide Taiwo, Afolabi Ayodele David

Abstract:

The purpose of this research is to create empirical models for assessing the profitability of granite aggregate production in Akure, Ondo state aggregate quarries. In addition, an Artificial Neural Network (ANN) model and multivariate predicting models for granite profitability were developed in the study. A formal survey questionnaire was used to collect data for the study. The data extracted from the case study mine for this study include granite marketing operations, royalty, production costs, and mine production information. The following methods were used to achieve the goal of this study: descriptive statistics, MATLAB 2017, and SPSS16.0 software in analyzing and modeling the data collected from granite traders in the study areas. The ANN and Multi Variant Regression models' prediction accuracy was compared using a coefficient of determination (R2), Root Mean Square Error (RMSE), and mean square error (MSE). Due to the high prediction error, the model evaluation indices revealed that the ANN model was suitable for predicting generated profit in a typical quarry. More quarries in Nigeria's southwest region and other geopolitical zones should be considered to improve ANN prediction accuracy.

Keywords: National development, granite, profitability assessment, ANN models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 68

13348 An Ant-based Clustering System for Knowledge Discovery in DNA Chip Analysis Data

Authors: Minsoo Lee, Yun-mi Kim, Yearn Jeong Kim, Yoon-kyung Lee, Hyejung Yoon

Abstract:

Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and continuously changes. Until recently business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With the advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding cellular functions of the DNA sequences. DNA chips are now being used to perform experiments and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms such as hierarchical clustering, self-organizing maps, K-means clustering and so on. In this paper, we propose a clustering algorithm that imitates the ecosystem taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with a test data of 100 to1000 genes and 24 samples and show promising results for applying this algorithm to clustering DNA chip data.

Keywords: Ant colony system, biological data, clustering, DNA chip.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1968

13347 Text Mining of Twitter Data Using a Latent Dirichlet Allocation Topic Model and Sentiment Analysis

Authors: Sidi Yang, Haiyi Zhang

Abstract:

Twitter is a microblogging platform, where millions of users daily share their attitudes, views, and opinions. Using a probabilistic Latent Dirichlet Allocation (LDA) topic model to discern the most popular topics in the Twitter data is an effective way to analyze a large set of tweets to find a set of topics in a computationally efficient manner. Sentiment analysis provides an effective method to show the emotions and sentiments found in each tweet and an efficient way to summarize the results in a manner that is clearly understood. The primary goal of this paper is to explore text mining, extract and analyze useful information from unstructured text using two approaches: LDA topic modelling and sentiment analysis by examining Twitter plain text data in English. These two methods allow people to dig data more effectively and efficiently. LDA topic model and sentiment analysis can also be applied to provide insight views in business and scientific fields.

Keywords: Text mining, Twitter, topic model, sentiment analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1800

13346 Comparing Data Analysis, Communication and Information Technologies Expertise Levels in Undergraduate Psychology Students

Authors: Ana Cázares

Abstract:

Aims for this study: first, to compare the expertise level in data analysis, communication and information technologies in undergraduate psychology students. Second, to verify the factor structure of E-ETICA (Escala de Experticia en Tecnologias de la Informacion, la Comunicacion y el Análisis or Data Analysis, Communication and Information'Expertise Scale) which had shown an excellent internal consistency (α= 0.92) as well as a simple factor structure. Three factors, Complex, Basic Information and Communications Technologies and E-Searching and Download Abilities, explains 63% of variance. In the present study, 260 students (119 juniors and 141 seniors) were asked to respond to ETICA (16 items Likert scale of five points 1: null domain to 5: total domain). The results show that both junior and senior students report having very similar expertise level; however, E-ETICA presents a different factor structure for juniors and four factors explained also 63% of variance: Information E-Searching, Download and Process; Data analysis; Organization; and Communication technologies.

Keywords: Data analysis, Information, Communications Technologies, Expertise'Levels.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1280

13345 Thailand National Biodiversity Database System with webMathematica and Google Earth

Authors: W. Katsarapong, W. Srisang, K. Jaroensutasinee, M. Jaroensutasinee

Abstract:

National Biodiversity Database System (NBIDS) has been developed for collecting Thai biodiversity data. The goal of this project is to provide advanced tools for querying, analyzing, modeling, and visualizing patterns of species distribution for researchers and scientists. NBIDS data record two types of datasets: biodiversity data and environmental data. Biodiversity data are specie presence data and species status. The attributes of biodiversity data can be further classified into two groups: universal and projectspecific attributes. Universal attributes are attributes that are common to all of the records, e.g. X/Y coordinates, year, and collector name. Project-specific attributes are attributes that are unique to one or a few projects, e.g., flowering stage. Environmental data include atmospheric data, hydrology data, soil data, and land cover data collecting by using GLOBE protocols. We have developed webbased tools for data entry. Google Earth KML and ArcGIS were used as tools for map visualization. webMathematica was used for simple data visualization and also for advanced data analysis and visualization, e.g., spatial interpolation, and statistical analysis. NBIDS will be used by park rangers at Khao Nan National Park, and researchers.

Keywords: GLOBE protocol, Biodiversity, Database System, ArcGIS, Google Earth and webMathematica.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1974

13344 Web Search Engine Based Naming Procedure for Independent Topic

Authors: Takahiro Nishigaki, Takashi Onoda

Abstract:

In recent years, the number of document data has been increasing since the spread of the Internet. Many methods have been studied for extracting topics from large document data. We proposed Independent Topic Analysis (ITA) to extract topics independent of each other from large document data such as newspaper data. ITA is a method for extracting the independent topics from the document data by using the Independent Component Analysis. The topic represented by ITA is represented by a set of words. However, the set of words is quite different from the topics the user imagines. For example, the top five words with high independence of a topic are as follows. Topic1 = {"scor", "game", "lead", "quarter", "rebound"}. This Topic 1 is considered to represent the topic of "SPORTS". This topic name "SPORTS" has to be attached by the user. ITA cannot name topics. Therefore, in this research, we propose a method to obtain topics easy for people to understand by using the web search engine, topics given by the set of words given by independent topic analysis. In particular, we search a set of topical words, and the title of the homepage of the search result is taken as the topic name. And we also use the proposed method for some data and verify its effectiveness.

Keywords: Independent topic analysis, topic extraction, topic naming, web search engine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 489

13343 Analysis of Urban Population Using Twitter Distribution Data: Case Study of Makassar City, Indonesia

Authors: Yuyun Wabula, B. J. Dewancker

Abstract:

In the past decade, the social networking app has been growing very rapidly. Geolocation data is one of the important features of social media that can attach the user's location coordinate in the real world. This paper proposes the use of geolocation data from the Twitter social media application to gain knowledge about urban dynamics, especially on human mobility behavior. This paper aims to explore the relation between geolocation Twitter with the existence of people in the urban area. Firstly, the study will analyze the spread of people in the particular area, within the city using Twitter social media data. Secondly, we then match and categorize the existing place based on the same individuals visiting. Then, we combine the Twitter data from the tracking result and the questionnaire data to catch the Twitter user profile. To do that, we used the distribution frequency analysis to learn the visitors’ percentage. To validate the hypothesis, we compare it with the local population statistic data and land use mapping released by the city planning department of Makassar local government. The results show that there is the correlation between Twitter geolocation and questionnaire data. Thus, integration the Twitter data and survey data can reveal the profile of the social media users.

Keywords: Geolocation, Twitter, distribution analysis, human mobility.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1182

13342 A Quantitative Tool for Analyze Process Design

Authors: Andrés Carrión García, Aura López de Murillo, José Jabaloyes Vivas, Angela Grisales del Río

Abstract:

Some quality control tools use non metric subjective information coming from experts, who qualify the intensity of relations existing inside processes, but without quantifying them. In this paper we have developed a quality control analytic tool, measuring the impact or strength of the relationship between process operations and product characteristics. The tool includes two models: a qualitative model, allowing relationships description and analysis; and a formal quantitative model, by means of which relationship quantification is achieved. In the first one, concepts from the Graphs Theory were applied to identify those process elements which can be sources of variation, that is, those quality characteristics or operations that have some sort of prelacy over the others and that should become control items. Also the most dependent elements can be identified, that is those elements receiving the effects of elements identified as variation sources. If controls are focused in those dependent elements, efficiency of control is compromised by the fact that we are controlling effects, not causes. The second model applied adapts the multivariate statistical technique of Covariance Structural Analysis. This approach allowed us to quantify the relationships. The computer package LISREL was used to obtain statistics and to validate the model.

Keywords: Characteristics matrix, covariance structure analysis, LISREL.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1591

13341 Rethinking the Analysis of Means-End Chain Data in Marketing Research

Authors: P. Puustinen, A. Kanto

Abstract:

This paper proposes a new procedure for analyzing means-end chain data in marketing research. Most commonly the collected data is summarized in the Hierarchical Value Map (HVM) illustrating the main attribute-consequence-value linkages. This paper argues that traditionally constructed HVM may give an erroneous impression of the results of a means-end study. To justify the arguments, an alternative procedure to (1) determine the dominant attribute-consequence-value linkages and (2) construct HVM in a precise manner is presented. The current approach makes a contribution to means-end analysis, allowing marketers to address a set of marketing problems, such as advertising strategy.

Keywords: Means-end chain analysis, Laddering, Hierarchical Value Map.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2784

13340 MONARC: A Case Study on Simulation Analysis for LHC Activities

Authors: Ciprian Dobre

Abstract:

The scale, complexity and worldwide geographical spread of the LHC computing and data analysis problems are unprecedented in scientific research. The complexity of processing and accessing this data is increased substantially by the size and global span of the major experiments, combined with the limited wide area network bandwidth available. We present the latest generation of the MONARC (MOdels of Networked Analysis at Regional Centers) simulation framework, as a design and modeling tool for large scale distributed systems applied to HEP experiments. We present simulation experiments designed to evaluate the capabilities of the current real-world distributed infrastructure to support existing physics analysis processes and the means by which the experiments bands together to meet the technical challenges posed by the storage, access and computing requirements of LHC data analysis within the CMS experiment.

Keywords: Modeling and simulation, evaluation, large scale distributed systems, LHC experiments, CMS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1806

13339 The Comparison of Parental Childrearing Styles and Anxiety in Children with Stuttering and Normal Population

Authors: Pegah Farokhzad

Abstract:

Family has a crucial role in maintaining the physical, social and mental health of the children. Most of the mental and anxiety problems of children reflect the complex interpersonal situations among family members, especially parents. In other words, anxiety problems of the children are correlated with deficit relationships of family members and improper childrearing styles. The parental child rearing styles leads to positive and negative consequences which affect the children’s mental health. Therefore, the present research was aimed to compare the parental childrearing styles and anxiety of children with stuttering and normal population. It was also aimed to study the relationship between parental child rearing styles and anxiety of children. The research sample included 54 boys with stuttering and 54 normal boys who were selected from the children (boys) of Tehran, Iran in the age range of 5 to 8 years in 2013. In order to collect data, Baum-rind Childrearing Styles Inventory and Spence Parental Anxiety Inventory were used. Appropriate descriptive statistical methods and multivariate variance analysis and t test for independent groups were used to test the study hypotheses. Statistical data analyses demonstrated that there was a significant difference between stuttering boys and normal boys in anxiety (t = 7.601, p< 0.01); but there was no significant difference between stuttering boys and normal boys in parental childrearing styles (F = 0.129). There was also not found significant relationship between parental childrearing styles and children anxiety (F = 0.135, p< 0.05). It can be concluded that the influential factors of children’s society are parents, school, teachers, peers and media. So, parental childrearing styles are not the only influential factors on anxiety of children, and other factors including genetic, environment and child experiences are effective in anxiety as well. Details are discussed.

Keywords: Anxiety, Childrearing Styles, Stuttering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3068

13338 Normalization Discriminant Independent Component Analysis

Authors: Liew Yee Ping, Pang Ying Han, Lau Siong Hoe, Ooi Shih Yin, Housam Khalifa Bashier Babiker

Abstract:

In face recognition, feature extraction techniques attempts to search for appropriate representation of the data. However, when the feature dimension is larger than the samples size, it brings performance degradation. Hence, we propose a method called Normalization Discriminant Independent Component Analysis (NDICA). The input data will be regularized to obtain the most reliable features from the data and processed using Independent Component Analysis (ICA). The proposed method is evaluated on three face databases, Olivetti Research Ltd (ORL), Face Recognition Technology (FERET) and Face Recognition Grand Challenge (FRGC). NDICA showed it effectiveness compared with other unsupervised and supervised techniques.

Keywords: Face recognition, small sample size, regularization, independent component analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1948