Search results for: Twitter data clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24495

Search results for: Twitter data clustering

24435 A Structured Mechanism for Identifying Political Influencers on Social Media Platforms: Top 10 Saudi Political Twitter Users

Authors: Ahmad Alsolami, Darren Mundy, Manuel Hernandez-Perez

Abstract:

Social media networks, such as Twitter, offer the perfect opportunity to either positively or negatively affect political attitudes on large audiences. The existence of influential users who have developed a reputation for their knowledge and experience of specific topics is a major factor contributing to this impact. Therefore, knowledge of the mechanisms to identify influential users on social media is vital for understanding their effect on their audience. The concept of the influential user is related to the concept of opinion leaders' to indicate that ideas first flow from mass media to opinion leaders and then to the rest of the population. Hence, the objective of this research was to provide reliable and accurate structural mechanisms to identify influential users, which could be applied to different platforms, places, and subjects. Twitter was selected as the platform of interest, and Saudi Arabia as the context for the investigation. These were selected because Saudi Arabia has a large number of Twitter users, some of whom are considerably active in setting agendas and disseminating ideas. The study considered the scientific methods that have been used to identify public opinion leaders before, utilizing metrics software on Twitter. The key findings propose multiple novel metrics to compare Twitter influencers, including the number of followers, social authority and the use of political hashtags, and four secondary filtering measures. Thus, using ratio and percentage calculations to classify the most influential users, Twitter accounts were filtered, analyzed and included. The structured approach is used as a mechanism to explore the top ten influencers on Twitter from the political domain in Saudi Arabia.

Keywords: Twitter, influencers, structured mechanism, Saudi Arabia

Procedia PDF Downloads 94
24434 Using Genetic Algorithms and Rough Set Based Fuzzy K-Modes to Improve Centroid Model Clustering Performance on Categorical Data

Authors: Rishabh Srivastav, Divyam Sharma

Abstract:

We propose an algorithm to cluster categorical data named as ‘Genetic algorithm initialized rough set based fuzzy K-Modes for categorical data’. We propose an amalgamation of the simple K-modes algorithm, the Rough and Fuzzy set based K-modes and the Genetic Algorithm to form a new algorithm,which we hypothesise, will provide better Centroid Model clustering results, than existing standard algorithms. In the proposed algorithm, the initialization and updation of modes is done by the use of genetic algorithms while the membership values are calculated using the rough set and fuzzy logic.

Keywords: categorical data, fuzzy logic, genetic algorithm, K modes clustering, rough sets

Procedia PDF Downloads 212
24433 Extracting Attributes for Twitter Hashtag Communities

Authors: Ashwaq Alsulami, Jianhua Shao

Abstract:

Various organisations often need to understand discussions on social media, such as what trending topics are and characteristics of the people engaged in the discussion. A number of approaches have been proposed to extract attributes that would characterise a discussion group. However, these approaches are largely based on supervised learning, and as such they require a large amount of labelled data. We propose an approach in this paper that does not require labelled data, but rely on lexical sources to detect meaningful attributes for online discussion groups. Our findings show an acceptable level of accuracy in detecting attributes for Twitter discussion groups.

Keywords: attributed community, attribute detection, community, social network

Procedia PDF Downloads 127
24432 The Experiences of Agency in the Utilization of Twitter for English Language Learning in a Saudi EFL Context

Authors: Fahd Hamad Alqasham

Abstract:

This longitudinal study investigates Saudi students’ use trajectory and experiences of Twitter as an innovative tool for in-class learning of the English language in a Saudi tertiary English as a foreign language (EFL) context for a 12-week semester. The study adopted van Lier’s agency theory (2008, 2010) as the analytical framework to obtain an in-depth analysis of how the learners’ could utilize Twitter to create innovative ways for them to engage in English learning inside the language classroom. The study implemented a mixed methods approach, including six data collection instruments consisting of a research log, observations, focus group participation, initial and post-project interviews, and a post-project questionnaire. The study was conducted at Qassim University, specifically at Preparatory Year Program (PYP) on the main campus. The sample included 25 male students studying in the first level of PYP. The findings results revealed that although Twitter’s affordances initially paled a crucial role in motivating the learners to initiate their agency inside the classroom to learn English, the contextual constraints, mainly anxiety, the university infrastructure, and the teacher’s role negatively influenced the sustainability of Twitter’s use past week nine of its implementation.

Keywords: CALL, agency, innovation, EFL, language learning

Procedia PDF Downloads 50
24431 Clustering Performance Analysis using New Correlation-Based Cluster Validity Indices

Authors: Nathakhun Wiroonsri

Abstract:

There are various cluster validity measures used for evaluating clustering results. One of the main objectives of using these measures is to seek the optimal unknown number of clusters. Some measures work well for clusters with different densities, sizes and shapes. Yet, one of the weaknesses that those validity measures share is that they sometimes provide only one clear optimal number of clusters. That number is actually unknown and there might be more than one potential sub-optimal option that a user may wish to choose based on different applications. We develop two new cluster validity indices based on a correlation between an actual distance between a pair of data points and a centroid distance of clusters that the two points are located in. Our proposed indices constantly yield several peaks at different numbers of clusters which overcome the weakness previously stated. Furthermore, the introduced correlation can also be used for evaluating the quality of a selected clustering result. Several experiments in different scenarios, including the well-known iris data set and a real-world marketing application, have been conducted to compare the proposed validity indices with several well-known ones.

Keywords: clustering algorithm, cluster validity measure, correlation, data partitions, iris data set, marketing, pattern recognition

Procedia PDF Downloads 81
24430 Dissimilarity Measure for General Histogram Data and Its Application to Hierarchical Clustering

Authors: K. Umbleja, M. Ichino

Abstract:

Symbolic data mining has been developed to analyze data in very large datasets. It is also useful in cases when entry specific details should remain hidden. Symbolic data mining is quickly gaining popularity as datasets in need of analyzing are becoming ever larger. One type of such symbolic data is a histogram, which enables to save huge amounts of information into a single variable with high-level of granularity. Other types of symbolic data can also be described in histograms, therefore making histogram a very important and general symbolic data type - a method developed for histograms - can also be applied to other types of symbolic data. Due to its complex structure, analyzing histograms is complicated. This paper proposes a method, which allows to compare two histogram-valued variables and therefore find a dissimilarity between two histograms. Proposed method uses the Ichino-Yaguchi dissimilarity measure for mixed feature-type data analysis as a base and develops a dissimilarity measure specifically for histogram data, which allows to compare histograms with different number of bins and bin widths (so called general histogram). Proposed dissimilarity measure is then used as a measure for clustering. Furthermore, linkage method based on weighted averages is proposed with the concept of cluster compactness to measure the quality of clustering. The method is then validated with application on real datasets. As a result, the proposed dissimilarity measure is found producing adequate and comparable results with general histograms without the loss of detail or need to transform the data.

Keywords: dissimilarity measure, hierarchical clustering, histograms, symbolic data analysis

Procedia PDF Downloads 132
24429 Analysis of Cooperative Learning Behavior Based on the Data of Students' Movement

Authors: Wang Lin, Li Zhiqiang

Abstract:

The purpose of this paper is to analyze the cooperative learning behavior pattern based on the data of students' movement. The study firstly reviewed the cooperative learning theory and its research status, and briefly introduced the k-means clustering algorithm. Then, it used clustering algorithm and mathematical statistics theory to analyze the activity rhythm of individual student and groups in different functional areas, according to the movement data provided by 10 first-year graduate students. It also focused on the analysis of students' behavior in the learning area and explored the law of cooperative learning behavior. The research result showed that the cooperative learning behavior analysis method based on movement data proposed in this paper is feasible. From the results of data analysis, the characteristics of behavior of students and their cooperative learning behavior patterns could be found.

Keywords: behavior pattern, cooperative learning, data analyze, k-means clustering algorithm

Procedia PDF Downloads 154
24428 The Clustering of Multiple Sclerosis Subgroups through L2 Norm Multifractal Denoising Technique

Authors: Yeliz Karaca, Rana Karabudak

Abstract:

Multifractal Denoising techniques are used in the identification of significant attributes by removing the noise of the dataset. Magnetic resonance (MR) image technique is the most sensitive method so as to identify chronic disorders of the nervous system such as Multiple Sclerosis. MRI and Expanded Disability Status Scale (EDSS) data belonging to 120 individuals who have one of the subgroups of MS (Relapsing Remitting MS (RRMS), Secondary Progressive MS (SPMS), Primary Progressive MS (PPMS)) as well as 19 healthy individuals in the control group have been used in this study. The study is comprised of the following stages: (i) L2 Norm Multifractal Denoising technique, one of the multifractal technique, has been used with the application on the MS data (MRI and EDSS). In this way, the new dataset has been obtained. (ii) The new MS dataset obtained from the MS dataset and L2 Multifractal Denoising technique has been applied to the K-Means and Fuzzy C Means clustering algorithms which are among the unsupervised methods. Thus, the clustering performances have been compared. (iii) In the identification of significant attributes in the MS dataset through the Multifractal denoising (L2 Norm) technique using K-Means and FCM algorithms on the MS subgroups and control group of healthy individuals, excellent performance outcome has been yielded. According to the clustering results based on the MS subgroups obtained in the study, successful clustering results have been obtained in the K-Means and FCM algorithms by applying the L2 norm of multifractal denoising technique for the MS dataset. Clustering performance has been more successful with the MS Dataset (L2_Norm MS Data Set) K-Means and FCM in which significant attributes are obtained by applying L2 Norm Denoising technique.

Keywords: clinical decision support, clustering algorithms, multiple sclerosis, multifractal techniques

Procedia PDF Downloads 133
24427 Resale Housing Development Board Price Prediction Considering Covid-19 through Sentiment Analysis

Authors: Srinaath Anbu Durai, Wang Zhaoxia

Abstract:

Twitter sentiment has been used as a predictor to predict price values or trends in both the stock market and housing market. The pioneering works in this stream of research drew upon works in behavioural economics to show that sentiment or emotions impact economic decisions. Latest works in this stream focus on the algorithm used as opposed to the data used. A literature review of works in this stream through the lens of data used shows that there is a paucity of work that considers the impact of sentiments caused due to an external factor on either the stock or the housing market. This is despite an abundance of works in behavioural economics that show that sentiment or emotions caused due to an external factor impact economic decisions. To address this gap, this research studies the impact of Twitter sentiment pertaining to the Covid-19 pandemic on resale Housing Development Board (HDB) apartment prices in Singapore. It leverages SNSCRAPE to collect tweets pertaining to Covid-19 for sentiment analysis, lexicon based tools VADER and TextBlob are used for sentiment analysis, Granger Causality is used to examine the relationship between Covid-19 cases and the sentiment score, and neural networks are leveraged as prediction models. Twitter sentiment pertaining to Covid-19 as a predictor of HDB price in Singapore is studied in comparison with the traditional predictors of housing prices i.e., the structural and neighbourhood characteristics. The results indicate that using Twitter sentiment pertaining to Covid19 leads to better prediction than using only the traditional predictors and performs better as a predictor compared to two of the traditional predictors. Hence, Twitter sentiment pertaining to an external factor should be considered as important as traditional predictors. This paper demonstrates the real world economic applications of sentiment analysis of Twitter data.

Keywords: sentiment analysis, Covid-19, housing price prediction, tweets, social media, Singapore HDB, behavioral economics, neural networks

Procedia PDF Downloads 81
24426 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications

Authors: K. P. Sandesh, M. H. Suman

Abstract:

Text processing plays an important role in information retrieval, data-mining, and web search. Measuring the similarity between the documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature the proposed measure takes the following three cases into account: (1) The feature appears in both documents; (2) The feature appears in only one document and; (3) The feature appears in none of the documents. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.

Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Procedia PDF Downloads 480
24425 A Structured Mechanism for Identifying Political Influencers on Social Media Platforms Top 10 Saudi Political Twitter Users

Authors: Ahmad Alsolami, Darren Mundy, Manuel Hernandez-Perez

Abstract:

Social media networks, such as Twitter, offer the perfect opportunity to either positively or negatively affect political attitudes on large audiences. A most important factor contributing to this effect is the existence of influential users, who have developed a reputation for their awareness and experience on specific subjects. Therefore, knowledge of the mechanisms to identify influential users on social media is vital for understanding their effect on their audience. The concept of the influential user is based on the pioneering work of Katz and Lazarsfeld (1959), who created the concept of opinion leaders' to indicate that ideas first flow from mass media to opinion leaders and then to the rest of the population. Hence, the objective of this research was to provide reliable and accurate structural mechanisms to identify influential users, which could be applied to different platforms, places, and subjects. Twitter was selected as the platform of interest, and Saudi Arabia as the context for the investigation. These were selected because Saudi Arabia has a large number of Twitter users, some of whom are considerably active in setting agendas and disseminating ideas. The study considered the scientific methods that have been used to identify public opinion leaders before, utilizing metrics software on Twitter. The key findings propose multiple novel metrics to compare Twitter influencers, including the number of followers, social authority and the use of political hashtags, and four secondary filtering measures. Thus, using ratio and percentage calculations to classify the most influential users, Twitter accounts were filtered, analyzed and included. The structured approach is used as a mechanism to explore the top ten influencers on Twitter from the political domain in Saudi Arabia.

Keywords: twitter, influencers, structured mechanism, Saudi Arabia

Procedia PDF Downloads 104
24424 Social Networking Sites and Employee Engagement

Authors: Sultan Ali Suleiman AlMazrouei

Abstract:

Purpose: The purpose of this paper is to examine the effect of communication through social networking sites (Facebook, Twitter) on employee engagement. Methodology: A quantitative survey was used to collect data from 440 employees from the Ministry of Education in Oman. SPSS software was used to analyze the data. Findings: The results revealed a positive significant relationship between communication via Facebook and employee engagement. However, communication via Twitter does not influence employee engagement significantly. Practical implications: Managers can benefit from the study by understanding the importance of communication via Facebook with employees in order to increase their engagement. They should post their views and thoughts on Facebook and encourage their employees to be members which would be reflected on their psychological side positively. That gives them a feeling of belonging to a network. Originality/value: The study enriches the human resources management literature by examining a theoretical framework about the influence of social networking sites usage on employee engagement. This is one of the few studies that focus on the relationship of social networking sites usage with employees' engagement. It is the first study in an Omani context.

Keywords: employee engagement, social networking sites, Facebook, Twitter

Procedia PDF Downloads 314
24423 Multimedia Data Fusion for Event Detection in Twitter by Using Dempster-Shafer Evidence Theory

Authors: Samar M. Alqhtani, Suhuai Luo, Brian Regan

Abstract:

Data fusion technology can be the best way to extract useful information from multiple sources of data. It has been widely applied in various applications. This paper presents a data fusion approach in multimedia data for event detection in twitter by using Dempster-Shafer evidence theory. The methodology applies a mining algorithm to detect the event. There are two types of data in the fusion. The first is features extracted from text by using the bag-ofwords method which is calculated using the term frequency-inverse document frequency (TF-IDF). The second is the visual features extracted by applying scale-invariant feature transform (SIFT). The Dempster - Shafer theory of evidence is applied in order to fuse the information from these two sources. Our experiments have indicated that comparing to the approaches using individual data source, the proposed data fusion approach can increase the prediction accuracy for event detection. The experimental result showed that the proposed method achieved a high accuracy of 0.97, comparing with 0.93 with texts only, and 0.86 with images only.

Keywords: data fusion, Dempster-Shafer theory, data mining, event detection

Procedia PDF Downloads 374
24422 Hybridized Approach for Distance Estimation Using K-Means Clustering

Authors: Ritu Vashistha, Jitender Kumar

Abstract:

Clustering using the K-means algorithm is a very common way to understand and analyze the obtained output data. When a similar object is grouped, this is called the basis of Clustering. There is K number of objects and C number of cluster in to single cluster in which k is always supposed to be less than C having each cluster to be its own centroid but the major problem is how is identify the cluster is correct based on the data. Formulation of the cluster is not a regular task for every tuple of row record or entity but it is done by an iterative process. Each and every record, tuple, entity is checked and examined and similarity dissimilarity is examined. So this iterative process seems to be very lengthy and unable to give optimal output for the cluster and time taken to find the cluster. To overcome the drawback challenge, we are proposing a formula to find the clusters at the run time, so this approach can give us optimal results. The proposed approach uses the Euclidian distance formula as well melanosis to find the minimum distance between slots as technically we called clusters and the same approach we have also applied to Ant Colony Optimization(ACO) algorithm, which results in the production of two and multi-dimensional matrix.

Keywords: ant colony optimization, data clustering, centroids, data mining, k-means

Procedia PDF Downloads 101
24421 GCM Based Fuzzy Clustering to Identify Homogeneous Climatic Regions of North-East India

Authors: Arup K. Sarma, Jayshree Hazarika

Abstract:

The North-eastern part of India, which receives heavier rainfall than other parts of the subcontinent, is of great concern now-a-days with regard to climate change. High intensity rainfall for short duration and longer dry spell, occurring due to impact of climate change, affects river morphology too. In the present study, an attempt is made to delineate the North-Eastern region of India into some homogeneous clusters based on the Fuzzy Clustering concept and to compare the resulting clusters obtained by using conventional methods and non conventional methods of clustering. The concept of clustering is adapted in view of the fact that, impact of climate change can be studied in a homogeneous region without much variation, which can be helpful in studies related to water resources planning and management. 10 IMD (Indian Meteorological Department) stations, situated in various regions of the North-east, have been selected for making the clusters. The results of the Fuzzy C-Means (FCM) analysis show different clustering patterns for different conditions. From the analysis and comparison it can be concluded that non conventional method of using GCM data is somehow giving better results than the others. However, further analysis can be done by taking daily data instead of monthly means to reduce the effect of standardization.

Keywords: climate change, conventional and nonconventional methods of clustering, FCM analysis, homogeneous regions

Procedia PDF Downloads 354
24420 Max-Entropy Feed-Forward Clustering Neural Network

Authors: Xiaohan Bookman, Xiaoyan Zhu

Abstract:

The outputs of non-linear feed-forward neural network are positive, which could be treated as probability when they are normalized to one. If we take Entropy-Based Principle into consideration, the outputs for each sample could be represented as the distribution of this sample for different clusters. Entropy-Based Principle is the principle with which we could estimate the unknown distribution under some limited conditions. As this paper defines two processes in Feed-Forward Neural Network, our limited condition is the abstracted features of samples which are worked out in the abstraction process. And the final outputs are the probability distribution for different clusters in the clustering process. As Entropy-Based Principle is considered into the feed-forward neural network, a clustering method is born. We have conducted some experiments on six open UCI data sets, comparing with a few baselines and applied purity as the measurement. The results illustrate that our method outperforms all the other baselines that are most popular clustering methods.

Keywords: feed-forward neural network, clustering, max-entropy principle, probabilistic models

Procedia PDF Downloads 407
24419 Unsupervised Sentiment Analysis for Indonesian Political Message on Twitter

Authors: Omar Abdillah, Mirna Adriani

Abstract:

In this work, we perform new approach for analyzing public sentiment towards the presidential candidate in the 2014 Indonesian election that expressed in Twitter. In this study we propose such procedure for analyzing sentiment over Indonesian political message by understanding the behavior of Indonesian society in sending message on Twitter. We took different approach from previous works by utilizing punctuation mark and Indonesian sentiment lexicon that completed with the new procedure in determining sentiment towards the candidates. Our experiment shows the performance that yields up to 83.31% of average precision. In brief, this work makes two contributions: first, this work is the preliminary study of sentiment analysis in the domain of political message that has not been addressed yet before. Second, we propose such method to conduct sentiment analysis by creating decision making procedure in which it is in line with the characteristic of Indonesian message on Twitter.

Keywords: unsupervised sentiment analysis, political message, lexicon based, user behavior understanding

Procedia PDF Downloads 449
24418 An Analysis on Clustering Based Gene Selection and Classification for Gene Expression Data

Authors: K. Sathishkumar, V. Thiagarasu

Abstract:

Due to recent advances in DNA microarray technology, it is now feasible to obtain gene expression profiles of tissue samples at relatively low costs. Many scientists around the world use the advantage of this gene profiling to characterize complex biological circumstances and diseases. Microarray techniques that are used in genome-wide gene expression and genome mutation analysis help scientists and physicians in understanding of the pathophysiological mechanisms, in diagnoses and prognoses, and choosing treatment plans. DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. This work presents an analysis of several clustering algorithms proposed to deals with the gene expression data effectively. The existing clustering algorithms like Support Vector Machine (SVM), K-means algorithm and evolutionary algorithm etc. are analyzed thoroughly to identify the advantages and limitations. The performance evaluation of the existing algorithms is carried out to determine the best approach. In order to improve the classification performance of the best approach in terms of Accuracy, Convergence Behavior and processing time, a hybrid clustering based optimization approach has been proposed.

Keywords: microarray technology, gene expression data, clustering, gene Selection

Procedia PDF Downloads 289
24417 An Energy Efficient Clustering Approach for Underwater ‎Wireless Sensor Networks

Authors: Mohammad Reza Taherkhani‎

Abstract:

Wireless sensor networks that are used to monitor a special environment, are formed from a large number of sensor nodes. The role of these sensors is to sense special parameters from ambient and to make a connection. In these networks, the most important challenge is the management of energy usage. Clustering is one of the methods that are broadly used to face this challenge. In this paper, a distributed clustering protocol based on learning automata is proposed for underwater wireless sensor networks. The proposed algorithm that is called LA-Clustering forms clusters in the same energy level, based on the energy level of nodes and the connection radius regardless of size and the structure of sensor network. The proposed approach is simulated and is compared with some other protocols with considering some metrics such as network lifetime, number of alive nodes, and number of transmitted data. The simulation results demonstrate the efficiency of the proposed approach.

Keywords: underwater sensor networks, clustering, learning automata, energy consumption

Procedia PDF Downloads 334
24416 Design of a Graphical User Interface for Data Preprocessing and Image Segmentation Process in 2D MRI Images

Authors: Enver Kucukkulahli, Pakize Erdogmus, Kemal Polat

Abstract:

The 2D image segmentation is a significant process in finding a suitable region in medical images such as MRI, PET, CT etc. In this study, we have focused on 2D MRI images for image segmentation process. We have designed a GUI (graphical user interface) written in MATLABTM for 2D MRI images. In this program, there are two different interfaces including data pre-processing and image clustering or segmentation. In the data pre-processing section, there are median filter, average filter, unsharp mask filter, Wiener filter, and custom filter (a filter that is designed by user in MATLAB). As for the image clustering, there are seven different image segmentations for 2D MR images. These image segmentation algorithms are as follows: PSO (particle swarm optimization), GA (genetic algorithm), Lloyds algorithm, k-means, the combination of Lloyds and k-means, mean shift clustering, and finally BBO (Biogeography Based Optimization). To find the suitable cluster number in 2D MRI, we have designed the histogram based cluster estimation method and then applied to these numbers to image segmentation algorithms to cluster an image automatically. Also, we have selected the best hybrid method for each 2D MR images thanks to this GUI software.

Keywords: image segmentation, clustering, GUI, 2D MRI

Procedia PDF Downloads 348
24415 A Concept of Data Mining with XML Document

Authors: Akshay Agrawal, Anand K. Srivastava

Abstract:

The increasing amount of XML datasets available to casual users increases the necessity of investigating techniques to extract knowledge from these data. Data mining is widely applied in the database research area in order to extract frequent correlations of values from both structured and semi-structured datasets. The increasing availability of heterogeneous XML sources has raised a number of issues concerning how to represent and manage these semi structured data. In recent years due to the importance of managing these resources and extracting knowledge from them, lots of methods have been proposed in order to represent and cluster them in different ways.

Keywords: XML, similarity measure, clustering, cluster quality, semantic clustering

Procedia PDF Downloads 346
24414 Data Clustering in Wireless Sensor Network Implemented on Self-Organization Feature Map (SOFM) Neural Network

Authors: Krishan Kumar, Mohit Mittal, Pramod Kumar

Abstract:

Wireless sensor network is one of the most promising communication networks for monitoring remote environmental areas. In this network, all the sensor nodes are communicated with each other via radio signals. The sensor nodes have capability of sensing, data storage and processing. The sensor nodes collect the information through neighboring nodes to particular node. The data collection and processing is done by data aggregation techniques. For the data aggregation in sensor network, clustering technique is implemented in the sensor network by implementing self-organizing feature map (SOFM) neural network. Some of the sensor nodes are selected as cluster head nodes. The information aggregated to cluster head nodes from non-cluster head nodes and then this information is transferred to base station (or sink nodes). The aim of this paper is to manage the huge amount of data with the help of SOM neural network. Clustered data is selected to transfer to base station instead of whole information aggregated at cluster head nodes. This reduces the battery consumption over the huge data management. The network lifetime is enhanced at a greater extent.

Keywords: artificial neural network, data clustering, self organization feature map, wireless sensor network

Procedia PDF Downloads 484
24413 Analysis of Production Forecasting in Unconventional Gas Resources Development Using Machine Learning and Data-Driven Approach

Authors: Dongkwon Han, Sangho Kim, Sunil Kwon

Abstract:

Unconventional gas resources have dramatically changed the future energy landscape. Unlike conventional gas resources, the key challenges in unconventional gas have been the requirement that applies to advanced approaches for production forecasting due to uncertainty and complexity of fluid flow. In this study, artificial neural network (ANN) model which integrates machine learning and data-driven approach was developed to predict productivity in shale gas. The database of 129 wells of Eagle Ford shale basin used for testing and training of the ANN model. The Input data related to hydraulic fracturing, well completion and productivity of shale gas were selected and the output data is a cumulative production. The performance of the ANN using all data sets, clustering and variables importance (VI) models were compared in the mean absolute percentage error (MAPE). ANN model using all data sets, clustering, and VI were obtained as 44.22%, 10.08% (cluster 1), 5.26% (cluster 2), 6.35%(cluster 3), and 32.23% (ANN VI), 23.19% (SVM VI), respectively. The results showed that the pre-trained ANN model provides more accurate results than the ANN model using all data sets.

Keywords: unconventional gas, artificial neural network, machine learning, clustering, variables importance

Procedia PDF Downloads 172
24412 Social Media and Political Expression: Examining Affordances and Spiral of Silence Theories

Authors: Mustafa Oz

Abstract:

This study compares how do people express their opinions on the Facebook versus on Twitter. It was sought to understand whether people were more willing to express their opinions on some social media channels than others. It was assumed that fear of isolation and affordances may influence users’ opinion expression behaviors on social media websites. Thus two most popular social media websites, Twitter and Facebook, were compared. This study aims to provide the comprehensive understanding of political expression on social media platforms. An online survey (N=535) was conducted to understand respondents’ opinion expression behaviors. Overall, the results suggested that people were more likely to express their opinion on Twitter than Facebook when they think the majority does not support their opinion. The study concluded that people operate differently on Facebook versus Twitter.

Keywords: social media, spiral of silence, affordances, political expression

Procedia PDF Downloads 109
24411 An Interpretable Data-Driven Approach for the Stratification of the Cardiorespiratory Fitness

Authors: D.Mendes, J. Henriques, P. Carvalho, T. Rocha, S. Paredes, R. Cabiddu, R. Trimer, R. Mendes, A. Borghi-Silva, L. Kaminsky, E. Ashley, R. Arena, J. Myers

Abstract:

The continued exploration of clinically relevant predictive models continues to be an important pursuit. Cardiorespiratory fitness (CRF) portends clinical vital information and as such its accurate prediction is of high importance. Therefore, the aim of the current study was to develop a data-driven model, based on computational intelligence techniques and, in particular, clustering approaches, to predict CRF. Two prediction models were implemented and compared: 1) the traditional Wasserman/Hansen Equations; and 2) an interpretable clustering approach. Data used for this analysis were from the 'FRIEND - Fitness Registry and the Importance of Exercise: The National Data Base'; in the present study a subset of 10690 apparently healthy individuals were utilized. The accuracy of the models was performed through the computation of sensitivity, specificity, and geometric mean values. The results show the superiority of the clustering approach in the accurate estimation of CRF (i.e., maximal oxygen consumption).

Keywords: cardiorespiratory fitness, data-driven models, knowledge extraction, machine learning

Procedia PDF Downloads 256
24410 Factors Promoting French-English Tweets in France

Authors: Taoues Hadour

Abstract:

Twitter has become a popular means of communication used in a variety of fields, such as politics, journalism, and academia. This widely used online platform has an impact on the way people express themselves and is changing language usage worldwide at an unprecedented pace. The language used online reflects the linguistic battle that has been going on for several decades in French society. This study enables a deeper understanding of users' linguistic behavior online. The implications are important and allow for a rise in awareness of intercultural and cross-language exchanges. This project investigates the mixing of French-English language usage among French users of Twitter using a topic analysis approach. This analysis draws on Gumperz's theory of conversational switching. In order to collect tweets at a large scale, the data was collected in R using the rtweet package to access and retrieve French tweets data through Twitter’s REST and stream APIs (Application Program Interface) using the software RStudio, the integrated development environment for R. The dataset was filtered manually and certain repetitions of themes were observed. A total of nine topic categories were identified and analyzed in this study: entertainment, internet/social media, events/community, politics/news, sports, sex/pornography, innovation/technology, fashion/make up, and business. The study reveals that entertainment is the most frequent topic discussed on Twitter. Entertainment includes movies, music, games, and books. Anglicisms such as trailer, spoil, and live are identified in the data. Change in language usage is inevitable and is a natural result of linguistic interactions. The use of different languages online is just an example of what the real world would look like without linguistic regulations. Social media reveals a multicultural and multilinguistic richness which can deepen and expand our understanding of contemporary human attitudes.

Keywords: code-switching, French, sociolinguistics, Twitter

Procedia PDF Downloads 98
24409 Saudi Twitter Corpus for Sentiment Analysis

Authors: Adel Assiri, Ahmed Emam, Hmood Al-Dossari

Abstract:

Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have yet to directly apply SA to Arabic due to lack of a publicly available dataset for this language. This paper partially bridges this gap due to its focus on one of the Arabic dialects which is the Saudi dialect. This paper presents annotated data set of 4700 for Saudi dialect sentiment analysis with (K= 0.807). Our next work is to extend this corpus and creation a large-scale lexicon for Saudi dialect from the corpus.

Keywords: Arabic, sentiment analysis, Twitter, annotation

Procedia PDF Downloads 592
24408 Enhance the Power of Sentiment Analysis

Authors: Yu Zhang, Pedro Desouza

Abstract:

Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modelling and testing work was done in R and Greenplum in-database analytic tools.

Keywords: sentiment analysis, social media, Twitter, Amazon, data mining, machine learning, text mining

Procedia PDF Downloads 325
24407 Hierarchical Checkpoint Protocol in Data Grids

Authors: Rahma Souli-Jbali, Minyar Sassi Hidri, Rahma Ben Ayed

Abstract:

Grid of computing nodes has emerged as a representative means of connecting distributed computers or resources scattered all over the world for the purpose of computing and distributed storage. Since fault tolerance becomes complex due to the availability of resources in decentralized grid environment, it can be used in connection with replication in data grids. The objective of our work is to present fault tolerance in data grids with data replication-driven model based on clustering. The performance of the protocol is evaluated with Omnet++ simulator. The computational results show the efficiency of our protocol in terms of recovery time and the number of process in rollbacks.

Keywords: data grids, fault tolerance, clustering, chandy-lamport

Procedia PDF Downloads 302
24406 An Observation of the Information Technology Research and Development Based on Article Data Mining: A Survey Study on Science Direct

Authors: Muhammet Dursun Kaya, Hasan Asil

Abstract:

One of the most important factors of research and development is the deep insight into the evolutions of scientific development. The state-of-the-art tools and instruments can considerably assist the researchers, and many of the world organizations have become aware of the advantages of data mining for the acquisition of the knowledge required for the unstructured data. This paper was an attempt to review the articles on the information technology published in the past five years with the aid of data mining. A clustering approach was used to study these articles, and the research results revealed that three topics, namely health, innovation, and information systems, have captured the special attention of the researchers.

Keywords: information technology, data mining, scientific development, clustering

Procedia PDF Downloads 246