Search results for: data analysis
13492 Pattern Recognition Using Feature Based Die-Map Clusteringin the Semiconductor Manufacturing Process
Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek
Abstract:
Depending on the big data analysis becomes important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of the failure are closely related. The purpose of this study is to analyze pattern affects the final test results using a die map based clustering. Many researches have been conducted using die data from the semiconductor test process. However, analysis has limitation as the test data is less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. This study consists of three phases. In the first phase, die map is created through fail bit data in each sub-area of die. In the second phase, clustering using map data is performed. And the third stage is to find patterns that affect final test result. Finally, the proposed three steps are applied to actual industrial data and experimental results showed the potential field application.
Keywords: Die-Map Clustering, Feature Extraction, Pattern Recognition, Semiconductor Manufacturing Process.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 315113491 Multidimensional Visualization Tools for Analysis of Expression Data
Authors: Urska Cvek, Marjan Trutschl, Randolph Stone II, Zanobia Syed, John L. Clifford, Anita L. Sabichi
Abstract:
Expression data analysis is based mostly on the statistical approaches that are indispensable for the study of biological systems. Large amounts of multidimensional data resulting from the high-throughput technologies are not completely served by biostatistical techniques and are usually complemented with visual, knowledge discovery and other computational tools. In many cases, in biological systems we only speculate on the processes that are causing the changes, and it is the visual explorative analysis of data during which a hypothesis is formed. We would like to show the usability of multidimensional visualization tools and promote their use in life sciences. We survey and show some of the multidimensional visualization tools in the process of data exploration, such as parallel coordinates and radviz and we extend them by combining them with the self-organizing map algorithm. We use a time course data set of transitional cell carcinoma of the bladder in our examples. Analysis of data with these tools has the potential to uncover additional relationships and non-trivial structures.Keywords: microarrays, visualization, parallel coordinates, radviz, self-organizing maps.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 250813490 Landscape Data Transformation: Categorical Descriptions to Numerical Descriptors
Authors: Dennis A. Apuan
Abstract:
Categorical data based on description of the agricultural landscape imposed some mathematical and analytical limitations. This problem however can be overcome by data transformation through coding scheme and the use of non-parametric multivariate approach. The present study describes data transformation from qualitative to numerical descriptors. In a collection of 103 random soil samples over a 60 hectare field, categorical data were obtained from the following variables: levels of nitrogen, phosphorus, potassium, pH, hue, chroma, value and data on topography, vegetation type, and the presence of rocks. Categorical data were coded, and Spearman-s rho correlation was then calculated using PAST software ver. 1.78 in which Principal Component Analysis was based. Results revealed successful data transformation, generating 1030 quantitative descriptors. Visualization based on the new set of descriptors showed clear differences among sites, and amount of variation was successfully measured. Possible applications of data transformation are discussed.Keywords: data transformation, numerical descriptors, principalcomponent analysis
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 150513489 A Novel Web Metric for the Evaluation of Internet Trends
Authors: Radek Malinský, Ivan Jelínek
Abstract:
Web 2.0 (social networking, blogging and online forums) can serve as a data source for social science research because it contains vast amount of information from many different users. The volume of that information has been growing at a very high rate and becoming a network of heterogeneous data; this makes things difficult to find and is therefore not almost useful. We have proposed a novel theoretical model for gathering and processing data from Web 2.0, which would reflect semantic content of web pages in better way. This article deals with the analysis part of the model and its usage for content analysis of blogs. The introductory part of the article describes methodology for the gathering and processing data from blogs. The next part of the article is focused on the evaluation and content analysis of blogs, which write about specific trend.Keywords: Blog, Sentiment Analysis, Web 2.0, Webometrics
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 354413488 Comparative Analysis of the Public Funding for Greek Universities: An Ordinal DEA/MCDM Approach
Authors: Yiannis Smirlis, Dimitris K. Despotis
Abstract:
This study performs a comparative analysis of the 21 Greek Universities in terms of their public funding, awarded for covering their operating expenditure. First it introduces a DEA/MCDM model that allocates the fund into four expenditure factors in the most favorable way for each university. Then, it presents a common, consensual assessment model to reallocate the amounts, remaining in the same level of total public budget. From the analysis it derives that a number of universities cannot justify the public funding in terms of their size and operational workload. For them, the sufficient reduction of their public funding amount is estimated as a future target. Due to the lack of precise data for a number of expenditure criteria, the analysis is based on a mixed crisp-ordinal data set.Keywords: Data envelopment analysis, Greek universities, operating expenditures, ordinal data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 176713487 Comparison of Imputation Techniques for Efficient Prediction of Software Fault Proneness in Classes
Authors: Geeta Sikka, Arvinder Kaur Takkar, Moin Uddin
Abstract:
Missing data is a persistent problem in almost all areas of empirical research. The missing data must be treated very carefully, as data plays a fundamental role in every analysis. Improper treatment can distort the analysis or generate biased results. In this paper, we compare and contrast various imputation techniques on missing data sets and make an empirical evaluation of these methods so as to construct quality software models. Our empirical study is based on NASA-s two public dataset. KC4 and KC1. The actual data sets of 125 cases and 2107 cases respectively, without any missing values were considered. The data set is used to create Missing at Random (MAR) data Listwise Deletion(LD), Mean Substitution(MS), Interpolation, Regression with an error term and Expectation-Maximization (EM) approaches were used to compare the effects of the various techniques.Keywords: Missing data, Imputation, Missing Data Techniques.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 166813486 Estimating the Life-Distribution Parameters of Weibull-Life PV Systems Utilizing Non-Parametric Analysis
Authors: Saleem Z. Ramadan
Abstract:
In this paper, a model is proposed to determine the life distribution parameters of the useful life region for the PV system utilizing a combination of non-parametric and linear regression analysis for the failure data of these systems. Results showed that this method is dependable for analyzing failure time data for such reliable systems when the data is scarce.Keywords: Masking, Bathtub model, reliability, non-parametric analysis, useful life.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 184713485 Analysis of Cooperative Learning Behavior Based on the Data of Students' Movement
Authors: Wang Lin, Li Zhiqiang
Abstract:
The purpose of this paper is to analyze the cooperative learning behavior pattern based on the data of students' movement. The study firstly reviewed the cooperative learning theory and its research status, and briefly introduced the k-means clustering algorithm. Then, it used clustering algorithm and mathematical statistics theory to analyze the activity rhythm of individual student and groups in different functional areas, according to the movement data provided by 10 first-year graduate students. It also focused on the analysis of students' behavior in the learning area and explored the law of cooperative learning behavior. The research result showed that the cooperative learning behavior analysis method based on movement data proposed in this paper is feasible. From the results of data analysis, the characteristics of behavior of students and their cooperative learning behavior patterns could be found.Keywords: Behavior pattern, cooperative learning, data analyze, K-means clustering algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 81513484 What the Future Holds for Social Media Data Analysis
Authors: P. Wlodarczak, J. Soar, M. Ally
Abstract:
The dramatic rise in the use of Social Media (SM) platforms such as Facebook and Twitter provide access to an unprecedented amount of user data. Users may post reviews on products and services they bought, write about their interests, share ideas or give their opinions and views on political issues. There is a growing interest in the analysis of SM data from organisations for detecting new trends, obtaining user opinions on their products and services or finding out about their online reputations. A recent research trend in SM analysis is making predictions based on sentiment analysis of SM. Often indicators of historic SM data are represented as time series and correlated with a variety of real world phenomena like the outcome of elections, the development of financial indicators, box office revenue and disease outbreaks. This paper examines the current state of research in the area of SM mining and predictive analysis and gives an overview of the analysis methods using opinion mining and machine learning techniques.
Keywords: Social Media, text mining, knowledge discovery, predictive analysis, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 384913483 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering
Authors: Yunus Doğan, Ahmet Durap
Abstract:
Coastal regions are the one of the most commonly used places by the natural balance and the growing population. In coastal engineering, the most valuable data is wave behaviors. The amount of this data becomes very big because of observations that take place for periods of hours, days and months. In this study, some statistical methods such as the wave spectrum analysis methods and the standard statistical methods have been used. The goal of this study is the discovery profiles of the different coast areas by using these statistical methods, and thus, obtaining an instance based data set from the big data to analysis by using data mining algorithms. In the experimental studies, the six sample data sets about the wave behaviors obtained by 20 minutes of observations from Mersin Bay in Turkey and converted to an instance based form, while different clustering techniques in data mining algorithms were used to discover similar coastal places. Moreover, this study discusses that this summarization approach can be used in other branches collecting big data such as medicine.
Keywords: Clustering algorithms, coastal engineering, data mining, data summarization, statistical methods.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 124413482 Enhance the Power of Sentiment Analysis
Authors: Yu Zhang, Pedro Desouza
Abstract:
Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modeling and testing work was done in R and Greenplum in-database analytic tools.
Keywords: Sentiment Analysis, Social Media, Twitter, Amazon, Data Mining, Machine Learning, Text Mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 351813481 Data Quality Enhancement with String Length Distribution
Authors: Qi Xiu, Hiromu Hota, Yohsuke Ishii, Takuya Oda
Abstract:
Recently, collectable manufacturing data are rapidly increasing. On the other hand, mega recall is getting serious as a social problem. Under such circumstances, there are increasing needs for preventing mega recalls by defect analysis such as root cause analysis and abnormal detection utilizing manufacturing data. However, the time to classify strings in manufacturing data by traditional method is too long to meet requirement of quick defect analysis. Therefore, we present String Length Distribution Classification method (SLDC) to correctly classify strings in a short time. This method learns character features, especially string length distribution from Product ID, Machine ID in BOM and asset list. By applying the proposal to strings in actual manufacturing data, we verified that the classification time of strings can be reduced by 80%. As a result, it can be estimated that the requirement of quick defect analysis can be fulfilled.Keywords: Data quality, feature selection, probability distribution, string classification, string length.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 132813480 Detection Efficient Enterprises via Data Envelopment Analysis
Authors: S. Turkan
Abstract:
In this paper, the Turkey’s Top 500 Industrial Enterprises data in 2014 were analyzed by data envelopment analysis. Data envelopment analysis is used to detect efficient decision-making units such as universities, hospitals, schools etc. by using inputs and outputs. The decision-making units in this study are enterprises. To detect efficient enterprises, some financial ratios are determined as inputs and outputs. For this reason, financial indicators related to productivity of enterprises are considered. The efficient foreign weighted owned capital enterprises are detected via super efficiency model. According to the results, it is said that Mercedes-Benz is the most efficient foreign weighted owned capital enterprise in Turkey.Keywords: Data envelopment analysis, super efficiency, financial ratios, BCC model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 87613479 Revisiting the Concept of Risk Analysis within the Context of Geospatial Database Design: A Collaborative Framework
Authors: J. Grira, Y. Bédard, S. Roche
Abstract:
The aim of this research is to design a collaborative framework that integrates risk analysis activities into the geospatial database design (GDD) process. Risk analysis is rarely undertaken iteratively as part of the present GDD methods in conformance to requirement engineering (RE) guidelines and risk standards. Accordingly, when risk analysis is performed during the GDD, some foreseeable risks may be overlooked and not reach the output specifications especially when user intentions are not systematically collected. This may lead to ill-defined requirements and ultimately in higher risks of geospatial data misuse. The adopted approach consists of 1) reviewing risk analysis process within the scope of RE and GDD, 2) analyzing the challenges of risk analysis within the context of GDD, and 3) presenting the components of a risk-based collaborative framework that improves the collection of the intended/forbidden usages of the data and helps geo-IT experts to discover implicit requirements and risks.Keywords: Collaborative risk analysis, intention of use, Geospatial database design, Geospatial data misuse.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 166613478 Morphometric Analysis of Tor tambroides by Stepwise Discriminant and Neural Network Analysis
Authors: M. Pollar, M. Jaroensutasinee, K. Jaroensutasinee
Abstract:
The population structure of the Tor tambroides was investigated with morphometric data (i.e. morphormetric measurement and truss measurement). A morphometric analysis was conducted to compare specimens from three waterfalls: Sunanta, Nan Chong Fa and Wang Muang waterfalls at Khao Nan National Park, Nakhon Si Thammarat, Southern Thailand. The results of stepwise discriminant analysis on seven morphometric variables and 21 truss variables per individual were the same as from a neural network. Fish from three waterfalls were separated into three groups based on their morphometric measurements. The morphometric data shows that the nerual network model performed better than the stepwise discriminant analysis.Keywords: Morphometric, Tor tambroides, Stepwise Discriminant Analysis , Neural Network Analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 215013477 Imputation Technique for Feature Selection in Microarray Data Set
Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam
Abstract:
Analyzing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.
Keywords: DNA microarray, feature selection, missing data, bioinformatics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 279513476 Model Discovery and Validation for the Qsar Problem using Association Rule Mining
Authors: Luminita Dumitriu, Cristina Segal, Marian Craciun, Adina Cocu, Lucian P. Georgescu
Abstract:
There are several approaches in trying to solve the Quantitative 1Structure-Activity Relationship (QSAR) problem. These approaches are based either on statistical methods or on predictive data mining. Among the statistical methods, one should consider regression analysis, pattern recognition (such as cluster analysis, factor analysis and principal components analysis) or partial least squares. Predictive data mining techniques use either neural networks, or genetic programming, or neuro-fuzzy knowledge. These approaches have a low explanatory capability or non at all. This paper attempts to establish a new approach in solving QSAR problems using descriptive data mining. This way, the relationship between the chemical properties and the activity of a substance would be comprehensibly modeled.Keywords: association rules, classification, data mining, Quantitative Structure - Activity Relationship.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 178813475 Tidal Data Analysis using ANN
Authors: Ritu Vijay, Rekha Govil
Abstract:
The design of a complete expansion that allows for compact representation of certain relevant classes of signals is a central problem in signal processing applications. Achieving such a representation means knowing the signal features for the purpose of denoising, classification, interpolation and forecasting. Multilayer Neural Networks are relatively a new class of techniques that are mathematically proven to approximate any continuous function arbitrarily well. Radial Basis Function Networks, which make use of Gaussian activation function, are also shown to be a universal approximator. In this age of ever-increasing digitization in the storage, processing, analysis and communication of information, there are numerous examples of applications where one needs to construct a continuously defined function or numerical algorithm to approximate, represent and reconstruct the given discrete data of a signal. Many a times one wishes to manipulate the data in a way that requires information not included explicitly in the data, which is done through interpolation and/or extrapolation. Tidal data are a very perfect example of time series and many statistical techniques have been applied for tidal data analysis and representation. ANN is recent addition to such techniques. In the present paper we describe the time series representation capabilities of a special type of ANN- Radial Basis Function networks and present the results of tidal data representation using RBF. Tidal data analysis & representation is one of the important requirements in marine science for forecasting.Keywords: ANN, RBF, Tidal Data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 165613474 2D Graphical Analysis of Wastewater Influent Capacity Time Series
Authors: Monika Chuchro, Maciej Dwornik
Abstract:
The extraction of meaningful information from image could be an alternative method for time series analysis. In this paper, we propose a graphical analysis of time series grouped into table with adjusted colour scale for numerical values. The advantages of this method are also discussed. The proposed method is easy to understand and is flexible to implement the standard methods of pattern recognition and verification, especially for noisy environmental data.Keywords: graphical analysis, time series, seasonality, noisy environmental data
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 145013473 Kinematic Analysis of an Assistive Robotic Leg for Hemiplegic and Hemiparetic Patients
Authors: M.R. Safizadeh, M. Hussein, K. F. Samat, M.S. Che Kob, M.S. Yaacob, M.Z. Md Zain
Abstract:
The aim of this paper is to present the kinematic analysis and mechanism design of an assistive robotic leg for hemiplegic and hemiparetic patients. In this work, the priority is to design and develop the lightweight, effective and single driver mechanism on the basis of experimental hip and knee angles- data for walking speed of 1 km/h. A mechanism of cam-follower with three links is suggested for this purpose. The kinematic analysis is carried out and analysed using commercialized MATLAB software based on the prototype-s links sizes and kinematic relationships. In order to verify the kinematic analysis of the prototype, kinematic analysis data are compared with the experimental data. A good agreement between them proves that the anthropomorphic design of the lower extremity exoskeleton follows the human walking gait.Keywords: Kinematic analysis, assistive robotic leg, lower extremity exoskeleton, cam-follower mechanism.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 190613472 Application of Data Envelopment Analysis to Assess Quality Management Efficiency
Authors: Chuen Tse Kuah, Kuan Yew Wong, Farzad Behrouzi
Abstract:
This paper is aimed to give an illustration on the application of Data Envelopment Analysis (DEA) as a tool to assess Quality Management (QM) efficiency. A variant of DEA, slack based measure (SBM) is used for this purpose. From this study, it is found that DEA is suitable to measure QM efficiency and give improvement suggestions to the inefficient QM.Keywords: Quality Management, Data Envelopment Analysis, Slack Based Measure, Efficiency Measurement.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 209013471 Statistical Analysis of Interferon-γ for the Effectiveness of an Anti-Tuberculous Treatment
Authors: Shishen Xie, Yingda L. Xie
Abstract:
Tuberculosis (TB) is a potentially serious infectious disease that remains a health concern. The Interferon Gamma Release Assay (IGRA) is a blood test to find out if an individual is tuberculous positive or negative. This study applies statistical analysis to the clinical data of interferon-gamma levels of seventy-three subjects who diagnosed pulmonary TB in an anti-tuberculous treatment. Data analysis is performed to determine if there is a significant decline in interferon-gamma levels for the subjects during a period of six months, and to infer if the anti-tuberculous treatment is effective.Keywords: Data analysis, interferon gamma release assay, statistical methods, tuberculosis infection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 195613470 Analysis of Diverse Cluster Ensemble Techniques
Authors: S. Sarumathi, N. Shanthi, P. Ranjetha
Abstract:
Data mining is the procedure of determining interesting patterns from the huge amount of data. With the intention of accessing the data faster the most supporting processes needed is clustering. Clustering is the process of identifying similarity between data according to the individuality present in the data and grouping associated data objects into clusters. Cluster ensemble is the technique to combine various runs of different clustering algorithms to obtain a general partition of the original dataset, aiming for consolidation of outcomes from a collection of individual clustering outcomes. The performances of clustering ensembles are mainly affecting by two principal factors such as diversity and quality. This paper presents the overview about the different cluster ensemble algorithm along with their methods used in cluster ensemble to improve the diversity and quality in the several cluster ensemble related papers and shows the comparative analysis of different cluster ensemble also summarize various cluster ensemble methods. Henceforth this clear analysis will be very useful for the world of clustering experts and also helps in deciding the most appropriate one to determine the problem in hand.Keywords: Cluster Ensemble, Consensus Function, CSPA, Diversity, HGPA, MCLA.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 184113469 Spatial Econometric Approaches for Count Data: An Overview and New Directions
Authors: Paula Simões, Isabel Natário
Abstract:
This paper reviews a number of theoretical aspects for implementing an explicit spatial perspective in econometrics for modelling non-continuous data, in general, and count data, in particular. It provides an overview of the several spatial econometric approaches that are available to model data that are collected with reference to location in space, from the classical spatial econometrics approaches to the recent developments on spatial econometrics to model count data, in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework, necessary for structural consistent spatial econometric count models, incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures for different assumptions, to the constrains and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature as well as from hierarchical modeling and analysis of spatial data, in order to look for new possible directions on the processing of count data, in a spatial hierarchical Bayesian econometric context.Keywords: Spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 270513468 Weigh-in-Motion Data Analysis Software for Developing Traffic Data for Mechanistic Empirical Pavement Design
Authors: M. A. Hasan, M. R. Islam, R. A. Tarefder
Abstract:
Currently, there are few user friendly Weigh-in- Motion (WIM) data analysis softwares available which can produce traffic input data for the recently developed AASHTOWare pavement Mechanistic-Empirical (ME) design software. However, these softwares have only rudimentary Quality Control (QC) processes. Therefore, they cannot properly deal with erroneous WIM data. As the pavement performance is highly sensible to the quality of WIM data, it is highly recommended to use more refined QC process on raw WIM data to get a good result. This study develops a userfriendly software, which can produce traffic input for the ME design software. This software takes the raw data (Class and Weight data) collected from the WIM station and processes it with a sophisticated QC procedure. Traffic data such as traffic volume, traffic distribution, axle load spectra, etc. can be obtained from this software; which can directly be used in the ME design software.Keywords: Weigh-in-motion, software, axle load spectra, traffic distribution, AASHTOWare.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 189613467 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques
Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel
Abstract:
Sentiment analysis and opinion mining have become emerging topics of research in recent years but most of the work is focused on data in the English language. A comprehensive research and analysis are essential which considers multiple languages, machine translation techniques, and different classifiers. This paper presents, a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one using classification of text without language translation and second using the translation of testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or building language specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and accuracy of various classifiers for multilingual sentiment analysis is also discussed in this study.
Keywords: Cross-language analysis, machine learning, machine translation, sentiment analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 166713466 Steganalysis of Data Hiding via Halftoning and Coordinate Projection
Authors: Woong Hee Kim, Ilhwan Park
Abstract:
Steganography is the art of hiding and transmitting data through apparently innocuous carriers in an effort to conceal the existence of the data. A lot of steganography algorithms have been proposed recently. Many of them use the digital image data as a carrier. In data hiding scheme of halftoning and coordinate projection, still image data is used as a carrier, and the data of carrier image are modified for data embedding. In this paper, we present three features for analysis of data hiding via halftoning and coordinate projection. Also, we present a classifier using the proposed three features.Keywords: Steganography, steganalysis, digital halftoning, data hiding.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 160013465 Post Pandemic Mobility Analysis through Indexing and Sharding in MongoDB: Performance Optimization and Insights
Authors: Karan Vishavjit, Aakash Lakra, Shafaq Khan
Abstract:
The COVID-19 pandemic has pushed healthcare professionals to use big data analytics as a vital tool for tracking and evaluating the effects of contagious viruses. To effectively analyse huge datasets, efficient NoSQL databases are needed. The analysis of post-COVID-19 health and well-being outcomes and the evaluation of the effectiveness of government efforts during the pandemic is made possible by this research’s integration of several datasets, which cuts down on query processing time and creates predictive visual artifacts. We recommend applying sharding and indexing technologies to improve query effectiveness and scalability as the dataset expands. Effective data retrieval and analysis are made possible by spreading the datasets into a sharded database and doing indexing on individual shards. Analysis of connections between governmental activities, poverty levels, and post-pandemic wellbeing is the key goal. We want to evaluate the effectiveness of governmental initiatives to improve health and lower poverty levels. We will do this by utilising advanced data analysis and visualisations. The findings provide relevant data that support the advancement of UN sustainable objectives, future pandemic preparation, and evidence-based decision-making. This study shows how Big Data and NoSQL databases may be used to address problems with global health.
Keywords: COVID-19, big data, data analysis, indexing, NoSQL, sharding, scalability, poverty.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7013464 An Ant-based Clustering System for Knowledge Discovery in DNA Chip Analysis Data
Authors: Minsoo Lee, Yun-mi Kim, Yearn Jeong Kim, Yoon-kyung Lee, Hyejung Yoon
Abstract:
Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and continuously changes. Until recently business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With the advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding cellular functions of the DNA sequences. DNA chips are now being used to perform experiments and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms such as hierarchical clustering, self-organizing maps, K-means clustering and so on. In this paper, we propose a clustering algorithm that imitates the ecosystem taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with a test data of 100 to1000 genes and 24 samples and show promising results for applying this algorithm to clustering DNA chip data.
Keywords: Ant colony system, biological data, clustering, DNA chip.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 197413463 Comparing Data Analysis, Communication and Information Technologies Expertise Levels in Undergraduate Psychology Students
Authors: Ana Cázares
Abstract:
Aims for this study: first, to compare the expertise level in data analysis, communication and information technologies in undergraduate psychology students. Second, to verify the factor structure of E-ETICA (Escala de Experticia en Tecnologias de la Informacion, la Comunicacion y el Análisis or Data Analysis, Communication and Information'Expertise Scale) which had shown an excellent internal consistency (α= 0.92) as well as a simple factor structure. Three factors, Complex, Basic Information and Communications Technologies and E-Searching and Download Abilities, explains 63% of variance. In the present study, 260 students (119 juniors and 141 seniors) were asked to respond to ETICA (16 items Likert scale of five points 1: null domain to 5: total domain). The results show that both junior and senior students report having very similar expertise level; however, E-ETICA presents a different factor structure for juniors and four factors explained also 63% of variance: Information E-Searching, Download and Process; Data analysis; Organization; and Communication technologies.Keywords: Data analysis, Information, Communications Technologies, Expertise'Levels.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1286