Search results for: multivariate categorical data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7513

7393 Using Structural Equation Modeling in Causal Relationship Design for Balanced-Scorecards' Strategic Map

Authors: A. Saghaei, R. Ghasemi

Abstract:

Throughout the 1980s, management accounting researchers described the increasing irrelevance of traditional control and performance measurement systems. The Balanced Scorecard (BSC) is a critical business tool for many organizations. It is a performance measurement system that translates mission and strategy into objectives. The strategy map approach is a development of the BSC in which certain causal relations must be established. To identify these relations, experts usually rely on experience; regression can also be used for the same purpose. Structural Equation Modeling (SEM), one of the most powerful methods of multivariate data analysis, yields more appropriate results than traditional methods such as regression. In the present paper, we propose SEM for the first time to identify the relations between objectives in the strategy map, together with a test to measure the importance of those relations. In SEM, factor analysis and hypothesis testing are performed in the same analysis. SEM is known to be better than other techniques at supporting analysis and reporting. Our approach provides a framework that permits experts to design the strategy map by applying a comprehensive and scientific method together with their experience. This scheme is therefore more reliable than previously established methods.

Keywords: BSC, SEM, Strategy map.

Downloads: 2705
7392 Emotional Analysis for Text Search Queries on Internet

Authors: Gemma García López

Abstract:

The goal of this study is to analyze whether search queries submitted to search engines such as Google can offer emotional information about the user who performs them. Knowing the emotional state of the Internet user can be key to achieving maximum personalization of content and to detecting worrying behaviors. To this end, two studies were carried out using tools with advanced natural language processing techniques. The first study determines whether a query can be classified as positive, negative or neutral, while the second study extracts emotional content from words and applies the categorical and dimensional models for the representation of emotions. In addition, we use search queries in Spanish and English to establish similarities and differences between the two languages. The results revealed that text search queries performed by users on the Internet can be classified emotionally. This allows us to better understand the emotional state of the user at the time of the search, which could involve adapting the technology and personalizing the responses to different emotional states.

Keywords: Emotion classification, text search queries, emotional analysis, sentiment analysis in text, natural language processing.

Downloads: 713
7391 Data Preprocessing for Supervised Leaning

Authors: S. B. Kotsiantis, D. Kanellopoulos, P. E. Pintelas

Abstract:

Many factors affect the success of Machine Learning (ML) on a given task. The representation and quality of the instance data come first and foremost. If much irrelevant and redundant information is present, or the data are noisy and unreliable, then knowledge discovery during the training phase is more difficult. It is well known that data preparation and filtering steps take a considerable amount of processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. It would be convenient if a single sequence of data pre-processing algorithms had the best performance for every data set, but this is not the case. Thus, we present the most well-known algorithms for each step of data pre-processing so that one can achieve the best performance for a given data set.

Keywords: Data mining, feature selection, data cleaning.

Downloads: 6091
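Entry 7391 above lists data cleaning and normalization among the pre-processing steps. As a rough illustration only, the sketch below shows three common textbook operations: mean imputation of missing values, min-max scaling and z-score standardization. The function names are hypothetical and are not taken from the paper, and the paper may well survey different or additional algorithms.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Subtract the mean and divide by the population standard deviation.

    A constant column (zero standard deviation) maps to 0.0.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Illustrative column with one missing reading (made-up numbers).
column = impute_mean([2.0, None, 6.0])
scaled = min_max_scale(column)
standardized = z_score(column)
```

Which of the two scalings is preferable depends on the learner: min-max scaling keeps values in a fixed range, while z-scores are less sensitive to the exact minimum and maximum.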
7390 Alternative Computational Arrangements on g-Group (g > 2) Profile Analysis

Authors: Emmanuel U. Ohaegbulem, Felix N. Nwobi

Abstract:

Alternative and simple computational arrangements for carrying out multivariate profile analysis when more than two groups (populations) are involved are presented. These arrangements have been demonstrated not only to yield equivalent results for the test statistics (the Wilks lambdas), but also to require less computational effort than other arrangements so far presented in the literature, in addition to being simple and easy to apply.

Keywords: Coincident profiles, g-group profile analysis, level profiles, parallel profiles, repeated measures MANOVA.

Downloads: 1240
7389 Risk Management Analysis: An Empirical Study Using Bivariate GARCH

Authors: Chin Wen Cheong

Abstract:

This study employs a bivariate asymmetric GARCH model to reveal the hidden dynamics of price changes and volatility between the emerging markets of Thailand and Malaysia after the Asian financial crisis, from January 2001 to December 2008. Our results indicate that the equity markets share common information (shocks) that is transmitted between them. These empirical findings are used to demonstrate the importance of shock and volatility transmission dynamics in cross-market hedging and market risk.

Keywords: multivariate ARCH, structural change, value at risk.

Downloads: 1418
7388 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA), which may allow academic institutions to better understand learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitations before jumping on the Big Data bandwagon.

Keywords: Analytics, Big Data in Education, Hadoop, Learning Analytics.

Downloads: 4875
7387 Gender Justice and Feminist Self-Management Practices in the Solidarity Economy: A Quantitative Analysis of the Factors that Impact Enterprises Formed by Women in Brazil

Authors: Maria de Nazaré Moraes Soares, Silvia Maria Dias Pedro Rebouças, José Carlos Lázaro

Abstract:

The Solidarity Economy (SE) acts in the re-articulation of the economic field to the other spheres of social action. The significant participation of women in SE resulted in the formation of a national network of self-managed enterprises in Brazil: The Solidarity and Feminist Economy Network (SFEN). The objective of the research is to identify factors of gender justice and feminist self-management practices that adhere to the reality of women in SE enterprises. The conceptual apparatus related to feminist studies in this research covers Nancy Fraser's approaches to gender justice, Patricia Yancey Martin's approaches to feminist management practices, and authors of postcolonial feminism such as Mohanty and Maria Lugones, who lead the discussion to peripheral contexts, a necessary perspective when observing the women’s movement in SE. The research is quantitative in the phases of data collection and analysis. The data were collected from two sources: the database mapped in Brazil in 2010-2013 by the National Information System in Solidary Economy, and 150 questionnaires answered by women from 16 enterprises in SFEN, in a state of the Brazilian northeast. The data were analyzed using the multivariate statistical technique of Factor Analysis. The results show that the factors that define gender justice and feminist self-management practices in SE are interrelated at several levels, statistically supporting the intersectional condition of the issue of women. The evidence from the quantitative analysis allowed us to understand the intersectionality of the dimensions of gender justice and feminist management practices; in this sense, the non-distribution of domestic work interferes with the representation of women in public spaces, especially in peripheral contexts. The study contributes important reflections to the studies of this area and can be complemented in the future with qualitative research that approaches the perspective of women in the context of the SE self-management paradigm.

Keywords: Feminist management practices, gender justice, self-management, solidarity economy.

Downloads: 624
7386 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin

Abstract:

This paper introduces the concept and principles of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of a typical cleaning process. It puts forward a scalable and versatile data cleaning framework. For data with attribute dependency relations, it designs several violation-data discovery algorithms, expressed as formal formulas, that can find inconsistent data in all target columns that depend on condition attributes, whether the data is structured (SQL) or unstructured (NoSQL). It then gives 6 data cleaning methods based on these algorithms.

Keywords: Data cleaning, dependency rules, violation data discovery, data repair.

Downloads: 2612
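Entry 7386 above describes discovering records that violate attribute dependency rules. The paper's own algorithms are not reproduced in the abstract, so the following is only a minimal sketch of one familiar case, a functional dependency check: rows that agree on the condition columns but disagree on the target column are reported. The function name, the row representation (a list of dicts, which could come from either a SQL result set or a document store) and the sample columns are all assumptions made for illustration.

```python
from collections import defaultdict


def find_fd_violations(rows, condition_cols, target_col):
    """Return groups of rows that violate condition_cols -> target_col.

    rows: list of dicts with hashable values.
    condition_cols: tuple of column names on the left-hand side of the rule.
    target_col: column name that must be determined by condition_cols.
    The result maps each offending condition key to the rows sharing it.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in condition_cols)
        groups[key].append(row)

    violations = {}
    for key, members in groups.items():
        # More than one distinct target value under the same key breaks the rule.
        if len({member[target_col] for member in members}) > 1:
            violations[key] = members
    return violations


# Hypothetical sample data: a postal code is expected to determine the city.
sample_rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newark"},
    {"zip": "94105", "city": "San Francisco"},
]
sample_violations = find_fd_violations(sample_rows, ("zip",), "city")
```

A repair step would then have to decide which value in each offending group is correct, for example by majority vote or by consulting a trusted reference; that choice is outside this sketch.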
7385 Coalescing Data Marts

Authors: N. Parimala, P. Pahwa

Abstract:

OLAP uses multidimensional structures to provide access to data for analysis. Traditionally, OLAP operations focus on retrieving data from a single data mart. An exception is the drill-across operator. This, however, is restricted to retrieving facts on the common dimensions of multiple data marts. Our concern is to define further operations for retrieving data from multiple data marts. To this end, we have defined six operations that coalesce data marts. In doing so, we consider the common as well as the non-common dimensions of the data marts.

Keywords: Data warehouse, Dimension, OLAP, Star Schema.

Downloads: 1559
7384 Infrastructure Change Monitoring Using Multitemporal Multispectral Satellite Images

Authors: U. Datta

Abstract:

The main objective of this study is to find a suitable approach for monitoring land infrastructure growth over a period of time using multispectral satellite images. Bi-temporal change detection methods are unable to indicate continuous change occurring over a long period of time. To achieve this objective, the approach used here estimates a statistical model from a series of multispectral image data over a long period of time, assuming there is no considerable change during that period, and then compares it with multispectral image data obtained at a later time. The change is estimated pixel-wise. A statistical composite hypothesis technique is used for pixel-based change detection in a defined region. The generalized likelihood ratio test (GLRT) is used to detect a changed pixel from the estimated probabilistic model of the corresponding pixel. Changed pixels are detected assuming that the images have been co-registered prior to estimation. To minimize error due to co-registration, the 8-neighborhood pixels around the pixel under test are also considered. Multispectral images from Sentinel-2 and Landsat-8 from 2015 to 2018 are used for this purpose. The method faces several challenges. The first and foremost is obtaining a sufficiently large number of datasets for multivariate distribution modelling. A large number of images are always discarded due to cloud coverage. Imperfect modelling leads to a high probability of false alarm. The overall conclusion that can be drawn from this work is that the probabilistic method described in this paper has given some promising results, which need to be pursued further.

Keywords: Co-registration, GLRT, infrastructure growth, multispectral, multitemporal, pixel-based change detection.

Downloads: 730
7383 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Much research is concerned with mining massive amounts of data and big data streams. Mining big data faces many challenges, including scalability, speed, heterogeneity, accuracy, provenance and privacy. In the telecommunication industry, mining big data is like mining for gold; it represents a big opportunity for maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication, and the benefits and opportunities gained from it.

Keywords: Mining Big Data, Big Data, Machine learning, Data Streams, Telecommunication.

Downloads: 2480
7382 Effects of Different Meteorological Variables on Reference Evapotranspiration Modeling: Application of Principal Component Analysis

Authors: Akinola Ikudayisi, Josiah Adeyemo

Abstract:

The correct estimation of reference evapotranspiration (ETₒ) is required for effective irrigation water resources planning and management. However, there are some variables that must be considered while estimating and modeling ETₒ. This study therefore performs a multivariate analysis of the correlated variables involved in the estimation and modeling of ETₒ at the Vaalharts irrigation scheme (VIS) in South Africa using the Principal Component Analysis (PCA) technique. Weather and meteorological data between 1994 and 2014 were obtained from both the South African Weather Service (SAWS) and the Agricultural Research Council (ARC) in South Africa for this study. Average monthly data of minimum and maximum temperature (°C), rainfall (mm), relative humidity (%), and wind speed (m/s) were the inputs to the PCA-based model, while ETₒ is the output. The PCA technique was adopted to extract the most important information from the dataset and also to analyze the relationship between the five variables and ETₒ, in order to determine the most significant variables affecting ETₒ estimation at VIS. From the model performances, two principal components accounting for 82.7% of the variance were retained after the eigenvector extraction. The results of the two principal components were compared, and the model output shows that minimum temperature, maximum temperature and wind speed are the most important variables in ETₒ estimation and modeling at VIS. In other words, ETₒ increases with temperature and wind speed. Other variables such as rainfall and relative humidity are less important and cannot be used to provide enough information about ETₒ estimation at VIS. The outcome of this study has helped to reduce input variable dimensionality from five to the three most significant variables in ETₒ modelling at VIS, South Africa.

Keywords: Irrigation, principal component analysis, reference evapotranspiration, Vaalharts.

Downloads: 1061
7381 Renewable Energy Trends Analysis: A Patents Study

Authors: Sepulveda Juan

Abstract:

This article explains the elements and considerations taken into account when implementing and applying patent evaluation and scientometric study to the identification of technology trends, and the tools that led to the implementation of a software application for patent revision. Univariate analysis helped recognize the technological leaders in the field of energy, and steered the way for a multivariate analysis of this sample, which allowed for a graphical description of the techniques of mature technologies, as well as the detection of emerging technologies. This article ends with a validation of the methodology as applied to the case of fuel cells.

Keywords: Energy, technology mapping, patents.

Downloads: 2187
7380 Comparative Analysis of Diverse Collection of Big Data Analytics Tools

Authors: S. Vidhya, S. Sarumathi, N. Shanthi

Abstract:

Over the past era, a lot of effort and study has gone into developing proficient tools for performing various tasks in big data. Recently, big data has received a lot of publicity, for good reasons. Because the collections of datasets are large and complex, they are difficult to process with traditional data processing applications. This concern makes it all the more necessary to produce various tools for big data. Moreover, the main aim of big data analytics is to apply advanced analytic techniques to very large, diverse datasets that range in size from terabytes to zettabytes and in type from structured to unstructured and from batch to streaming. Big data is useful for data sets whose size or type is beyond the capability of traditional relational databases to capture, manage and process the data with low latency. The resulting challenges have led to the emergence of powerful big data tools. In this survey, a varied collection of big data tools is illustrated and compared in terms of their salient features.

Keywords: Big data, Big data analytics, Business analytics, Data analysis, Data visualization, Data discovery.

Downloads: 3775
7379 Multi-labeled Data Expressed by a Set of Labels

Authors: Tetsuya Furukawa, Masahiro Kuzunishi

Abstract:

Collected data must be organized to be utilized efficiently, and hierarchical classification of data is an efficient approach to organizing data. When data is classified into multiple categories or annotated with a set of labels, users request multi-labeled data by giving a set of labels. There are several interpretations of the data expressed by a set of labels. This paper discusses which data is expressed by a set of labels by introducing orders for sets of labels, and shows that there are four types of orders, which are characterized by whether the labels of the expressed data include every label of the given set of labels within the range of the set. Two desirable properties of the orders are discussed: that data is also expressed by the higher set of labels, and that different sets of labels express different data.

Keywords: Classification Hierarchies, Multi-labeled Data, Multiple Classification, Orders of Sets of Labels

Downloads: 1304
7378 The Comparison of Parental Childrearing Styles and Anxiety in Children with Stuttering and Normal Population

Authors: Pegah Farokhzad

Abstract:

The family has a crucial role in maintaining the physical, social and mental health of children. Most of the mental and anxiety problems of children reflect the complex interpersonal situations among family members, especially parents. In other words, anxiety problems of children are correlated with deficient relationships among family members and improper childrearing styles. Parental childrearing styles lead to positive and negative consequences which affect children’s mental health. Therefore, the present research aimed to compare the parental childrearing styles and anxiety of children with stuttering and of the normal population. It also aimed to study the relationship between parental childrearing styles and the anxiety of children. The research sample included 54 boys with stuttering and 54 normal boys who were selected from the children (boys) of Tehran, Iran, in the age range of 5 to 8 years in 2013. To collect data, the Baumrind Childrearing Styles Inventory and the Spence Parental Anxiety Inventory were used. Appropriate descriptive statistical methods, multivariate analysis of variance and the t test for independent groups were used to test the study hypotheses. Statistical data analyses demonstrated that there was a significant difference between stuttering boys and normal boys in anxiety (t = 7.601, p < 0.01), but there was no significant difference between stuttering boys and normal boys in parental childrearing styles (F = 0.129). No significant relationship was found between parental childrearing styles and children's anxiety either (F = 0.135, p < 0.05). It can be concluded that the influential factors in children’s society are parents, school, teachers, peers and media. So, parental childrearing styles are not the only factors that influence the anxiety of children; other factors, including genetics, environment and the child's experiences, affect anxiety as well. Details are discussed.

Keywords: Anxiety, Childrearing Styles, Stuttering.

Downloads: 3073
7377 Study on Optimal Control Strategy of PM2.5 in Wuhan, China

Authors: Qiuling Xie, Shanliang Zhu, Zongdi Sun

Abstract:

In this paper, we analyzed the correlation between PM2.5 and five other Air Quality Indices (AQIs) based on the grey relational degree, and built a multivariate nonlinear regression model of PM2.5 and the five monitoring indexes. For the optimal control problem of PM2.5, we took the partial large Cauchy distribution membership function as the satisfaction function. We established a nonlinear programming model with the goal of maximizing the performance-to-price ratio, and the optimal control scheme is given.

Keywords: Grey relational degree, multiple linear regression, membership function, nonlinear programming.

Downloads: 1408
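Entry 7377 above ranks air quality indices by their grey relational degree with PM2.5. The abstract does not state which variant the authors used, so the sketch below assumes the widely used formulation attributed to Deng: for each comparison sequence, a relational coefficient (Δmin + ρ·Δmax) / (Δ(k) + ρ·Δmax) is computed at every point k and then averaged, with distinguishing coefficient ρ = 0.5. The function name is hypothetical, the inputs are assumed to be already normalized to a comparable scale, and the sample numbers are invented.

```python
def grey_relational_degrees(reference, comparisons, rho=0.5):
    """Grey relational degree of each comparison sequence to the reference.

    reference: list of numbers (e.g. a normalized PM2.5 series).
    comparisons: list of equally long lists (e.g. other normalized indices).
    rho: distinguishing coefficient, conventionally 0.5.
    """
    if not comparisons:
        return []
    if any(len(seq) != len(reference) for seq in comparisons):
        raise ValueError("all sequences must have the same length")

    # Point-wise absolute differences between the reference and each sequence.
    deltas = [[abs(r - c) for r, c in zip(reference, seq)] for seq in comparisons]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)

    # Every sequence coincides with the reference: degree 1 by convention here.
    if d_max == 0:
        return [1.0 for _ in comparisons]

    degrees = []
    for row in deltas:
        coefficients = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coefficients) / len(coefficients))
    return degrees


# Made-up normalized series: the first candidate tracks the reference exactly.
sample_degrees = grey_relational_degrees(
    [1.0, 2.0, 3.0],
    [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
)
```

A degree closer to 1 indicates a sequence whose shape follows the reference more closely, which is how the indices would be ranked before building a regression model.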
7376 The Comparison of Data Replication in Distributed Systems

Authors: Iman Zangeneh, Mostafa Moradi, Ali Mokhtarbaf

Abstract:

The necessity of the ever-increasing use of distributed data in computer networks is obvious to all. One technique applied to distributed data to increase efficiency and reliability is data replication. In this paper, after introducing this technique and its advantages, we examine some dynamic data replication strategies. We examine their characteristics for some overus scenario and then propose some suggestions for their improvement.

Keywords: data replication, data hiding, consistency, dynamic data replication strategy

Downloads: 1635
7375 Testing the Validity of Maturity Model for E-Government Implementation in Indonesia

Authors: Darmawan Napitupulu, Dana Indra Sensuse, Aniati Murni

Abstract:

The research was conducted to empirically validate the proposed maturity model of e-Government implementation, composed of four dimensions further specified by 54 success factors as attributes. To do so, two steps were performed. First, expert judgment was used to test content validity. Second, a reliability study was performed to evaluate inter-rater agreement using the Fleiss Kappa approach. The kappa statistic (kappa coefficient) is the most commonly used method for testing consistency among raters. Fleiss Kappa is a generalization of Kappa to the case of more than two raters (multiple raters) with multi-categorical ratings. Our findings show that most attributes of the proposed model were related to their corresponding dimensions. According to our results, the percentage of "agree" answers given by the experts was 73.69% in dimension A, 89.76% in B, 81.5% in C and 60.37% in D. This means that more than half of the attributes of each dimension were appropriate or relevant to the dimension they were supposed to measure, while 85% of attributes were relevant enough to their corresponding dimensions. The inter-rater reliability coefficient also showed a satisfactory result, interpreted as substantial agreement among raters. Therefore, the model proposed in this paper was valid and reliable for measuring the maturity of e-Government implementation.

Keywords: E-Government, Model, Maturity, Validity, Reliability Kappa.

Downloads: 2201
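Entry 7375 above relies on Fleiss' kappa to measure agreement among several raters assigning categorical ratings. For readers unfamiliar with the statistic, here is a minimal sketch of the standard formula, κ = (P̄ − P̄ₑ) / (1 − P̄ₑ), where P̄ is the mean observed agreement per subject and P̄ₑ is the agreement expected by chance. The function name and the small count tables are illustrative assumptions; they are not the authors' data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters who put subject i in category j.
    Every subject must be rated by the same number of raters (at least 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per subject are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each subject.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    p_e = sum(p * p for p in p_j)

    if p_e == 1:
        # All ratings in a single category: the statistic is undefined.
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


# Three raters, two categories: complete agreement on both subjects.
kappa_perfect = fleiss_kappa([[3, 0], [0, 3]])
# Three raters split 2-1 on each subject: worse than chance agreement.
kappa_split = fleiss_kappa([[2, 1], [1, 2]])
```

On the commonly cited Landis and Koch scale, values between 0.61 and 0.80 are labelled "substantial" agreement, which may be the sense in which the abstract uses the term; the abstract itself does not name the scale.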
7374 Automatic Iterative Methods for the Multivariate Solution of Nonlinear Algebraic Equations

Authors: Rafat Alshorman, Safwan Al-Shara', I. Obeidat

Abstract:

Most real-world systems express themselves formally as a set of nonlinear algebraic equations. As applications grow, the size and complexity of these equations also increase. In this work, we highlight the key concepts in using the homotopy analysis method as a methodology for constructing efficient iteration formulas for solving nonlinear equations. The proposed method is experimentally characterized according to a set of determined parameters which affect the systems. The experimental results show the potential and limitations of the new method and imply directions for future work.

Keywords: Nonlinear Algebraic Equations, Iterative Methods, Homotopy Analysis Method.

Downloads: 1912
7373 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, a wide variety of data is being generated in many fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, a greater variety of information can be extracted. In addition, the development and dissemination of boards such as Arduino and Raspberry Pi have made it easy to test various sensors, and sensor data can be collected directly by using database application tools such as MySQL. These directly collected data can be used for various kinds of research and can be useful as data for data mining. However, there are many difficulties in using such a board to collect data, especially when the user is not a computer programmer or is using it for the first time. Even if data are collected, a lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: Clustering, data mining, DBSCAN, k-means, k-medoids, sensor data.

Downloads: 2010
7372 Diversity Analysis of a Quinoa (Chenopodium quinoa Willd.) Germplasm during Two Seasons

Authors: M. Mhada, E. N. Jellen, S. E. Jacobsen, O. Benlhabib

Abstract:

The present work has been carried out to evaluate the diversity of a collection of 78 quinoa accessions developed through recurrent selection from Andean germplasm introduced to Morocco in the winter of 2000. Twenty-three quantitative and qualitative characters were used for the evaluation of genetic diversity and the relationship between the accessions, and also for the establishment of a core collection in Morocco. Important variation was found among the accessions in terms of plant morphology and growth behavior. Data analysis showed a positive correlation of plant height and plant fresh and dry weight with grain yield, while days to flowering was found to be negatively correlated with grain yield. The first four PCs contributed 74.76% of the variability; the first PC showed significant variation with 42.86% of the total variation, PC2 with 15.37%, PC3 with 9.05% and PC4 contributed 7.49% of the total variation. Plant size, days to grain filling and days to maturity are correlated with PC1; seed size, inflorescence density and mildew resistance are correlated with PC2. Hierarchical cluster analysis rearranged the 78 quinoa accessions into four main groups and ten sub-clusters. Clustering was associated with days to maturity and also with plant size and seed-size traits.

Keywords: Character association, Chenopodium quinoa, Diversity analysis, Morphotypic cluster, Multivariate analysis.

Downloads: 2586
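Entry 7372 above reports the share of total variation carried by each of the first four principal components. A short sketch of how such shares are accumulated, and how one might pick the number of components needed to reach a target share, is given below. The helper names and the thresholds are illustrative assumptions; only the four percentages come from the abstract. Summing the rounded percentages gives 74.77, against the 74.76 quoted in the abstract, a difference that is presumably due to rounding of the individual shares.

```python
from itertools import accumulate


def cumulative_shares(shares):
    """Running total of per-component variance shares, rounded to 2 decimals."""
    return [round(total, 2) for total in accumulate(shares)]


def components_needed(shares, threshold):
    """Smallest number of leading components whose total reaches threshold.

    Returns None when the listed components never reach the threshold.
    A small tolerance guards against floating-point noise in the running sum.
    """
    for k, total in enumerate(accumulate(shares), start=1):
        if total >= threshold - 1e-9:
            return k
    return None


# Percentages of total variation for PC1..PC4 as quoted in the abstract.
pc_shares = [42.86, 15.37, 9.05, 7.49]
running = cumulative_shares(pc_shares)
k_for_70 = components_needed(pc_shares, 70.0)
```

With these figures, the running totals pass 70% only once the fourth component is included, which is consistent with the abstract retaining four components.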
7371 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity and veracity, and that come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, to provide insights to users, and to support good governance. Government (big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services such as data storage and hosting to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of the government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationships in the government (big) data ecosystem. We also discuss our research findings. We did not find many published research articles about the government (big) data ecosystem, including its definition and the classification of actors and their roles. Therefore, we borrowed ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data and industry data.

Keywords: Big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review.

Downloads: 2143
7370 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analyzing DNA microarray data sets is a great challenge for bioinformaticians because of the complexity of using statistical and machine learning techniques. The challenge is doubled if the microarray data sets contain missing data, which happens regularly, because these techniques cannot deal with missing data. One of the most important data analysis processes on a microarray data set is feature selection. This process finds the most important genes that affect a certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics.

Downloads 2791
7369 Automatic Real-Patient Medical Data De-Identification for Research Purposes

Authors: Petr Vcelak, Jana Kleckova

Abstract:

Our medicine-oriented research is based on a medical data set of real patients. Sharing private patient data with people other than clinicians or hospital staff is a security problem, so person-identifying information has to be removed from the medical data. After a de-identification process, the medical data without private data are available for any research purpose. In this paper, we introduce a universal, automatic, rule-based de-identification application that performs this process on heterogeneous medical data. A patient's private identification is replaced by a unique identification number, even in burned-in annotations in pixel data. The same identification number is used for all of a patient's medical data, so relationships within the data are preserved. The hospital can benefit from research feedback based on the results.
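
As a rough illustration of rule-based de-identification with a consistent replacement number, the sketch below may help. It is not the authors' application: the rule table, the field names (borrowed from common DICOM attribute names), the `R000001` numbering scheme, and the assumption that dates are `YYYYMMDD` strings are all hypothetical, and burned-in pixel annotations (which would need OCR) are not handled.

```python
class Pseudonymizer:
    """Map each patient identifier to a stable, meaningless research ID."""

    def __init__(self):
        self._ids = {}

    def research_id(self, patient_id):
        # The same input always yields the same research ID, so records
        # belonging to one patient stay linked after de-identification.
        if patient_id not in self._ids:
            self._ids[patient_id] = f"R{len(self._ids) + 1:06d}"
        return self._ids[patient_id]

# Hypothetical rule table: field name -> action. Unlisted fields are kept.
RULES = {
    "PatientID": "pseudonymize",
    "PatientName": "remove",
    "BirthDate": "year_only",
}

def deidentify(record, pseudonymizer, rules=RULES):
    """Apply the rules to one record (a dict) and return a de-identified copy."""
    out = {}
    for field, value in record.items():
        action = rules.get(field, "keep")
        if action == "remove":
            continue
        if action == "pseudonymize":
            out[field] = pseudonymizer.research_id(value)
        elif action == "year_only":
            out[field] = value[:4]   # assumes a YYYYMMDD string
        else:
            out[field] = value
    return out

# Two made-up records of the same patient from different modalities.
p = Pseudonymizer()
ct = deidentify({"PatientID": "750101/1234", "PatientName": "Doe^John",
                 "BirthDate": "19750101", "Modality": "CT"}, p)
mr = deidentify({"PatientID": "750101/1234", "PatientName": "Doe^John",
                 "BirthDate": "19750101", "Modality": "MR"}, p)
```

Both output records carry the same research ID, which is what keeps the relationships between a patient's data intact.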

Keywords: DASTA, De-identification, DICOM, Health Level Seven, Medical data, OCR, Personal data

Downloads 1642
7368 Analyzing Multi-Labeled Data Based on the Roll of a Concept against a Semantic Range

Authors: Masahiro Kuzunishi, Tetsuya Furukawa, Ke Lu

Abstract:

Classifying data hierarchically is an efficient approach to analyzing data. Data is usually classified into multiple categories, or annotated with a set of labels. To analyze multi-labeled data, the data must be specified by giving a set of labels as a semantic range. Data is analyzed for particular purposes. This paper shows which multi-labeled data should be the target of analysis for those purposes, and discusses the role of a label against a set of labels by investigating the change that occurs when a label is added to the set. These discussions yield methods for the advanced analysis of multi-labeled data, which are based on the role of a label against a semantic range.

Keywords: Classification Hierarchies, Data Analysis, Multilabeled Data, Orders of Sets of Labels

Downloads 1208
7367 Emotions in Health Tweets: Analysis of American Government Official Accounts

Authors: García López

Abstract:

Government departments of health have the task of informing and educating citizens about public health issues. To do so, they use channels such as Twitter, which is key in the search for health information and the propagation of content. Tweets, which are important to the virality of content, may contain emotions that influence the spread and exchange of knowledge. The goal of this study is to analyze the emotional projection of health information shared on Twitter by official American accounts: the disease control account CDCgov, the National Institutes of Health account NIH, the government agency account HHSGov, and the professional organization account PublicHealth. For this, we used Tone Analyzer, an International Business Machines Corporation (IBM) tool specialized in emotion detection in text, which corresponds to the categorical model of emotion representation. For 15 days, all tweets from these accounts were analyzed with this tool. The results showed that the tweets carry a substantial emotional load, a determining factor in the success of their communications. This shows that official accounts also use subjective language and convey emotions. The predominance of joy over sadness and the strong presence of emotions in their tweets stimulate the virality of content, which is key to the informing work of government health departments.

Keywords: Emotions in tweets, emotion detection in text, health information on Twitter, American health official accounts, emotions on Twitter, emotions and content.

Downloads 697
7366 Object-Oriented Multivariate Proportional-Integral-Derivative Control of Hydraulic Systems

Authors: J. Fernandez de Canete, S. Fernandez-Calvo, I. García-Moral

Abstract:

This paper presents and discusses the application of the object-oriented modelling software SIMSCAPE to hydraulic systems, with particular reference to multivariable proportional-integral-derivative (PID) control. A modelling approach for a coupled double cylinder-piston system is proposed and motivated, and the SIMULINK-based PID tuning tool is used to select the controller parameters. The paper demonstrates the usefulness of the object-oriented approach when both physical modelling and control are tackled.
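
For readers unfamiliar with PID control, a single discrete PID loop can be sketched as follows. This is a generic textbook form in Python, not the SIMSCAPE/SIMULINK model from the paper: the first-order plant, the gains, and the time step are assumptions chosen only for illustration, and a multivariable scheme would need one such loop (or a coupled controller) per controlled variable.

```python
class PID:
    """Textbook discrete PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                  # rectangular integration
        derivative = (error - self.prev_error) / self.dt  # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def simulate(pid, setpoint=1.0, steps=2000, tau=0.5):
    """Drive an assumed first-order plant (tau * dy/dt = -y + u) with forward Euler steps."""
    y = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, y)
        y += pid.dt * (u - y) / tau
    return y

# Hypothetical gains; a tuning tool would normally choose these.
final_output = simulate(PID(kp=2.0, ki=2.0, kd=0.05, dt=0.01))
```

With the integral term active, the simulated output settles close to the setpoint after the 20 simulated seconds.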

Keywords: Object-oriented modeling, multivariable hydraulic system, multivariable PID control, computer simulation.

Downloads 1106
7365 Investigation of the Effect of Teaching a Thinking and Research Lesson by Cooperative and Traditional Methods on the Creativity of Sixth Grade Students

Authors: Faroogh Khakzad, Marzieh Dehghani, Elahe Hejazi

Abstract:

The present study investigates the effect of teaching a Thinking and Research lesson by cooperative and traditional methods on the creativity of sixth-grade students in Piranshahr province. The statistical population includes all the sixth-grade students of Piranshahr province. The sample of this study was selected by convenience sampling from among male elementary schools of Piranshahr. The students were randomly assigned to two groups: the cooperative teaching method and the traditional teaching method. The design of the study is quasi-experimental with a control group. To assess students’ creativity, Abedi’s creativity questionnaire was used. Based on Cronbach’s alpha coefficient, the reliability of the flow factor was 0.74, innovation was 0.61, flexibility was 0.63, and expansion was 0.68. To analyze the data, t-tests and univariate and multivariate analysis of covariance were used to evaluate the difference of means and the pretest and posttest scores. The findings showed that the cooperative teaching method does not significantly increase creativity (p > 0.05). However, the cooperative teaching method was found to have a significant effect on the flow factor (p < 0.05), while no significant effect was observed on the innovation and expansion factors (p > 0.05).
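
The reliability figures above are Cronbach's alpha coefficients. As a reminder of how such a coefficient is computed, here is a small sketch using the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The response matrix is made up for illustration and has nothing to do with the study's data.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of items
    item_variances = scores.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of each respondent's total
    return k / (k - 1) * (1.0 - item_variances.sum() / total_variance)

# Made-up answers of 5 respondents to 3 questionnaire items.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
```

Values closer to 1 indicate that the items of a factor vary together, which is why the coefficient is used as a measure of internal consistency.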

Keywords: Cooperative teaching method, traditional teaching method, creativity, flow, innovation, flexibility, expansion, thinking and research lesson.

Downloads 692
7364 Steganalysis of Data Hiding via Halftoning and Coordinate Projection

Authors: Woong Hee Kim, Ilhwan Park

Abstract:

Steganography is the art of hiding and transmitting data through apparently innocuous carriers in an effort to conceal the existence of the data. Many steganography algorithms have been proposed recently, and many of them use digital image data as the carrier. In the data hiding scheme based on halftoning and coordinate projection, still image data is used as the carrier, and the carrier image data are modified to embed the data. In this paper, we present three features for the analysis of data hiding via halftoning and coordinate projection. We also present a classifier that uses the three proposed features.

Keywords: Steganography, steganalysis, digital halftoning, data hiding.

Downloads 1600