Search results for: Twitter data clustering
24622 Pattern Recognition Using Feature Based Die-Map Clustering in the Semiconductor Manufacturing Process
Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek
Abstract:
Depending on the big data analysis becomes important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of the failure are closely related. The purpose of this study is to analyze pattern affects the final test results using a die map based clustering. Many researches have been conducted using die data from the semiconductor test process. However, analysis has limitation as the test data is less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. This study consists of three phases. In the first phase, die map is created through fail bit data in each sub-area of die. In the second phase, clustering using map data is performed. And the third stage is to find patterns that affect final test result. Finally, the proposed three steps are applied to actual industrial data and experimental results showed the potential field application.Keywords: die-map clustering, feature extraction, pattern recognition, semiconductor manufacturing process
Procedia PDF Downloads 37924621 Semi-Supervised Hierarchical Clustering Given a Reference Tree of Labeled Documents
Authors: Ying Zhao, Xingyan Bin
Abstract:
Semi-supervised clustering algorithms have been shown effective to improve clustering process with even limited supervision. However, semi-supervised hierarchical clustering remains challenging due to the complexities of expressing constraints for agglomerative clustering algorithms. This paper proposes novel semi-supervised agglomerative clustering algorithms to build a hierarchy based on a known reference tree. We prove that by enforcing distance constraints defined by a reference tree during the process of hierarchical clustering, the resultant tree is guaranteed to be consistent with the reference tree. We also propose a framework that allows the hierarchical tree generation be aware of levels of levels of the agglomerative tree under creation, so that metric weights can be learned and adopted at each level in a recursive fashion. The experimental evaluation shows that the additional cost of our contraint-based semi-supervised hierarchical clustering algorithm (HAC) is negligible, and our combined semi-supervised HAC algorithm outperforms the state-of-the-art algorithms on real-world datasets. The experiments also show that our proposed methods can improve clustering performance even with a small number of unevenly distributed labeled data.Keywords: semi-supervised clustering, hierarchical agglomerative clustering, reference trees, distance constraints
Procedia PDF Downloads 51524620 Application of Data Mining for Aquifer Environmental Assessment
Authors: Saman Javadi, Mehdi Hashemy, Mohahammad Mahmoodi
Abstract:
Vulnerability maps are employed as an important solution in order to handle entrance of pollution into the aquifers. The common way to provide vulnerability map is DRASTIC. Meanwhile, application of the method is not easy to apply for any aquifer due to choosing appropriate constant values of weights and ranks. In this study, a new approach using k-means clustering is applied to make vulnerability maps. Four features of depth to groundwater, hydraulic conductivity, recharge value and vadose zone were considered at the same time as features of clustering. Five regions are recognized out of the case study represent zones with different level of vulnerability. The finding results show that clustering provides a realistic vulnerability map so that, Pearson’s correlation coefficients between nitrate concentrations and clustering vulnerability is obtained 61%.Keywords: clustering, data mining, groundwater, vulnerability assessment
Procedia PDF Downloads 57224619 An Improved K-Means Algorithm for Gene Expression Data Clustering
Authors: Billel Kenidra, Mohamed Benmohammed
Abstract:
Data mining technique used in the field of clustering is a subject of active research and assists in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initiate K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results prove how the efficiency has been significantly improved.Keywords: microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization
Procedia PDF Downloads 16624618 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency
Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami
Abstract:
Clustering is a well known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can either contain categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are proposed: the k-means for clustering numeric datasets and the k-modes for categorical datasets. The main encountered problem in data mining applications is clustering categorical dataset so relevant in the datasets. One main issue to achieve the clustering process on categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead the k-modes. In this paper, it is proposed to experiment an approach based on the previous issue by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previously method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are experimented. The obtained results show that our proposed method outperforms the binary method in all cases.Keywords: clustering, unsupervised learning, pattern recognition, categorical datasets, knowledge discovery, k-means
Procedia PDF Downloads 24024617 Using Closed Frequent Itemsets for Hierarchical Document Clustering
Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu
Abstract:
Due to the rapid development of the Internet and the increased availability of digital documents, the excessive information on the Internet has led to information overflow problem. In order to solve these problems for effective information retrieval, document clustering in text mining becomes a popular research topic. Clustering is the unsupervised classification of data items into groups without the need of training data. Many conventional document clustering methods perform inefficiently for large document collections because they were originally designed for relational database. Therefore they are impractical in real-world document clustering and require special handling for high dimensionality and high volume. We propose the FIHC (Frequent Itemset-based Hierarchical Clustering) method, which is a hierarchical clustering method developed for document clustering, where the intuition of FIHC is that there exist some common words for each cluster. FIHC uses such words to cluster documents and builds hierarchical topic tree. In this paper, we combine FIHC algorithm with ontology to solve the semantic problem and mine the meaning behind the words in documents. Furthermore, we use the closed frequent itemsets instead of only use frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than those of well-known document clustering algorithms.Keywords: FIHC, documents clustering, ontology, closed frequent itemset
Procedia PDF Downloads 37124616 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures
Authors: Salima Kouici, Abdelkader Khelladi
Abstract:
In this work, we begin with the presentation of the Tθ family of usual similarity measures concerning multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally, the impact of the use of different inter-elements measures on the results of the Agglomerative Hierarchical Clustering Methods is studied.Keywords: binary data, similarity measure, Tθ measures, agglomerative hierarchical clustering
Procedia PDF Downloads 46124615 Time Series Regression with Meta-Clusters
Authors: Monika Chuchro
Abstract:
This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain a subgroups of time series data with normal distribution from inflow into waste water treatment plant data which Composed of several groups differing by mean value. Two simple algorithms: K-mean and EM were chosen as a clustering method. The rand index was used to measure the similarity. After simple meta-clustering, regression model was performed for each subgroups. The final model was a sum of subgroups models. The quality of obtained model was compared with the regression model made using the same explanatory variables but with no clustering of data. Results were compared by determination coefficient (R2), measure of prediction accuracy mean absolute percentage error (MAPE) and comparison on linear chart. Preliminary results allows to foresee the potential of the presented technique.Keywords: clustering, data analysis, data mining, predictive models
Procedia PDF Downloads 44524614 Hybrid Hierarchical Clustering Approach for Community Detection in Social Network
Authors: Radhia Toujani, Jalel Akaichi
Abstract:
Social Networks generally present a hierarchy of communities. To determine these communities and the relationship between them, detection algorithms should be applied. Most of the existing algorithms, proposed for hierarchical communities identification, are based on either agglomerative clustering or divisive clustering. In this paper, we present a hybrid hierarchical clustering approach for community detection based on both bottom-up and bottom-down clustering. Obviously, our approach provides more relevant community structure than hierarchical method which considers only divisive or agglomerative clustering to identify communities. Moreover, we performed some comparative experiments to enhance the quality of the clustering results and to show the effectiveness of our algorithm.Keywords: agglomerative hierarchical clustering, community structure, divisive hierarchical clustering, hybrid hierarchical clustering, opinion mining, social network, social network analysis
Procedia PDF Downloads 33624613 Tracing Digital Traces of Phatic Communion in #Mooc
Authors: Judith Enriquez-Gibson
Abstract:
This paper meddles with the notion of phatic communion introduced 90 years ago by Malinowski, who was a Polish-born British anthropologist. It explores the phatic in Twitter within the contents of tweets related to moocs (massive online open courses) as a topic or trend. It is not about moocs though. It is about practices that could easily be hidden or neglected if we let big or massive topics take the lead or if we simply follow the computational or secret codes behind Twitter itself and third party software analytics. It draws from media and cultural studies. Though at first it appears data-driven as I submitted data collection and analytics into the hands of a third party software, Twitonomy, the aim is to follow how phatic communion might be practised in a social media site, such as Twitter. Lurking becomes its research method to analyse mooc-related tweets. A total of 3,000 tweets were collected on 11 October 2013 (UK timezone). The emphasis of lurking is to engage with Twitter as a system of connectivity. One interesting finding is that a click is in fact a phatic practice. A click breaks the silence. A click in one of the mooc website is actually a tweet. A tweet was posted on behalf of a user who simply chose to click without formulating the text and perhaps without knowing that it contains #mooc. Surely, this mechanism is not about reciprocity. To break the silence, users did not use words. They just clicked the ‘tweet button’ on a mooc website. A click performs and maintains connectivity – and Twitter as the medium in attendance in our everyday, available when needed to be of service. In conclusion, the phatic culture of breaking silence in Twitter does not have to submit to the power of code and analytics. It is a matter of human code.Keywords: click, Twitter, phatic communion, social media data, mooc
Procedia PDF Downloads 38924612 A Comparative Study of Multi-SOM Algorithms for Determining the Optimal Number of Clusters
Authors: Imèn Khanchouch, Malika Charrad, Mohamed Limam
Abstract:
The interpretation of the quality of clusters and the determination of the optimal number of clusters is still a crucial problem in clustering. We focus in this paper on multi-SOM clustering method which overcomes the problem of extracting the number of clusters from the SOM map through the use of a clustering validity index. We then tested multi-SOM using real and artificial data sets with different evaluation criteria not used previously such as Davies Bouldin index, Dunn index and silhouette index. The developed multi-SOM algorithm is compared to k-means and Birch methods. Results show that it is more efficient than classical clustering methods.Keywords: clustering, SOM, multi-SOM, DB index, Dunn index, silhouette index
Procedia PDF Downloads 57524611 Self-Supervised Attributed Graph Clustering with Dual Contrastive Loss Constraints
Authors: Lijuan Zhou, Mengqi Wu, Changyong Niu
Abstract:
Attributed graph clustering can utilize the graph topology and node attributes to uncover hidden community structures and patterns in complex networks, aiding in the understanding and analysis of complex systems. Utilizing contrastive learning for attributed graph clustering can effectively exploit meaningful implicit relationships between data. However, existing attributed graph clustering methods based on contrastive learning suffer from the following drawbacks: 1) Complex data augmentation increases computational cost, and inappropriate data augmentation may lead to semantic drift. 2) The selection of positive and negative samples neglects the intrinsic cluster structure learned from graph topology and node attributes. Therefore, this paper proposes a method called self-supervised Attributed Graph Clustering with Dual Contrastive Loss constraints (AGC-DCL). Firstly, Siamese Multilayer Perceptron (MLP) encoders are employed to generate two views separately to avoid complex data augmentation. Secondly, the neighborhood contrastive loss is introduced to constrain node representation using local topological structure while effectively embedding attribute information through attribute reconstruction. Additionally, clustering-oriented contrastive loss is applied to fully utilize clustering information in global semantics for discriminative node representations, regarding the cluster centers from two views as negative samples to fully leverage effective clustering information from different views. Comparative clustering results with existing attributed graph clustering algorithms on six datasets demonstrate the superiority of the proposed method.Keywords: attributed graph clustering, contrastive learning, clustering-oriented, self-supervised learning
Procedia PDF Downloads 1924610 Charting Sentiments with Naive Bayes and Logistic Regression
Authors: Jummalla Aashrith, N. L. Shiva Sai, K. Bhavya Sri
Abstract:
The swift progress of web technology has not only amassed a vast reservoir of internet data but also triggered a substantial surge in data generation. The internet has metamorphosed into one of the dynamic hubs for online education, idea dissemination, as well as opinion-sharing. Notably, the widely utilized social networking platform Twitter is experiencing considerable expansion, providing users with the ability to share viewpoints, participate in discussions spanning diverse communities, and broadcast messages on a global scale. The upswing in online engagement has sparked a significant curiosity in subjective analysis, particularly when it comes to Twitter data. This research is committed to delving into sentiment analysis, focusing specifically on the realm of Twitter. It aims to offer valuable insights into deciphering information within tweets, where opinions manifest in a highly unstructured and diverse manner, spanning a spectrum from positivity to negativity, occasionally punctuated by neutrality expressions. Within this document, we offer a comprehensive exploration and comparative assessment of modern approaches to opinion mining. Employing a range of machine learning algorithms such as Naive Bayes and Logistic Regression, our investigation plunges into the domain of Twitter data streams. We delve into overarching challenges and applications inherent in the realm of subjectivity analysis over Twitter.Keywords: machine learning, sentiment analysis, visualisation, python
Procedia PDF Downloads 2924609 Decision Trees Constructing Based on K-Means Clustering Algorithm
Authors: Loai Abdallah, Malik Yousef
Abstract:
A domain space for the data should reflect the actual similarity between objects. Since objects belonging to the same cluster usually share some common traits even though their geometric distance might be relatively large. In general, the Euclidean distance of data points that represented by large number of features is not capturing the actual relation between those points. In this study, we propose a new method to construct a different space that is based on clustering to form a new distance metric. The new distance space is based on ensemble clustering (EC). The EC distance space is defined by tracking the membership of the points over multiple runs of clustering algorithm metric. Over this distance, we train the decision trees classifier (DT-EC). The results obtained by applying DT-EC on 10 datasets confirm our hypotheses that embedding the EC space as a distance metric would improve the performance.Keywords: ensemble clustering, decision trees, classification, K nearest neighbors
Procedia PDF Downloads 16424608 An Experimental Study on Some Conventional and Hybrid Models of Fuzzy Clustering
Authors: Jeugert Kujtila, Kristi Hoxhalli, Ramazan Dalipi, Erjon Cota, Ardit Murati, Erind Bedalli
Abstract:
Clustering is a versatile instrument in the analysis of collections of data providing insights of the underlying structures of the dataset and enhancing the modeling capabilities. The fuzzy approach to the clustering problem increases the flexibility involving the concept of partial memberships (some value in the continuous interval [0, 1]) of the instances in the clusters. Several fuzzy clustering algorithms have been devised like FCM, Gustafson-Kessel, Gath-Geva, kernel-based FCM, PCM etc. Each of these algorithms has its own advantages and drawbacks, so none of these algorithms would be able to perform superiorly in all datasets. In this paper we will experimentally compare FCM, GK, GG algorithm and a hybrid two-stage fuzzy clustering model combining the FCM and Gath-Geva algorithms. Firstly we will theoretically dis-cuss the advantages and drawbacks for each of these algorithms and we will describe the hybrid clustering model exploiting the advantages and diminishing the drawbacks of each algorithm. Secondly we will experimentally compare the accuracy of the hybrid model by applying it on several benchmark and synthetic datasets.Keywords: fuzzy clustering, fuzzy c-means algorithm (FCM), Gustafson-Kessel algorithm, hybrid clustering model
Procedia PDF Downloads 48424607 Sales Patterns Clustering Analysis on Seasonal Product Sales Data
Authors: Soojin Kim, Jiwon Yang, Sungzoon Cho
Abstract:
As a seasonal product is only in demand for a short time, inventory management is critical to profits. Both markdowns and stockouts decrease the return on perishable products; therefore, researchers have been interested in the distribution of seasonal products with the aim of maximizing profits. In this study, we propose a data-driven seasonal product sales pattern analysis method for individual retail outlets based on observed sales data clustering; the proposed method helps in determining distribution strategies.Keywords: clustering, distribution, sales pattern, seasonal product
Procedia PDF Downloads 57624606 A Neural Network Based Clustering Approach for Imputing Multivariate Values in Big Data
Authors: S. Nickolas, Shobha K.
Abstract:
The treatment of incomplete data is an important step in the data pre-processing. Missing values creates a noisy environment in all applications and it is an unavoidable problem in big data management and analysis. Numerous techniques likes discarding rows with missing values, mean imputation, expectation maximization, neural networks with evolutionary algorithms or optimized techniques and hot deck imputation have been introduced by researchers for handling missing data. Among these, imputation techniques plays a positive role in filling missing values when it is necessary to use all records in the data and not to discard records with missing values. In this paper we propose a novel artificial neural network based clustering algorithm, Adaptive Resonance Theory-2(ART2) for imputation of missing values in mixed attribute data sets. The process of ART2 can recognize learned models fast and be adapted to new objects rapidly. It carries out model-based clustering by using competitive learning and self-steady mechanism in dynamic environment without supervision. The proposed approach not only imputes the missing values but also provides information about handling the outliers.Keywords: ART2, data imputation, clustering, missing data, neural network, pre-processing
Procedia PDF Downloads 25424605 Women Hashtactivism: Civic Engagement in Saudi Arabia
Authors: Mohammed Ibahrine
Abstract:
One of the prominent trends in the Saudi digital space in recent years is the boom in the use of social networking sites such as Facebook, YouTube, and Twitter. As of 2016, Twitter has over six million users in Saudi Arabia. In the wake of the recent political instability in the Arab region, digital platforms have gained importance for both, personal and professional purposes. A conspicuously observable tide of social activism has risen, with Twitter playing an increasingly important role. One of their primary goals is to enforce the logic of public visibility, social mobility and civic participation in the Saudi society. Saudi women use Twitter to disseminate specific and relevant information and promote their social agenda that remained unrecognized and invisible in the mainstream media and thus in the public sphere. The question is to what extent does Twitter empower Saudi women or reinforces their social immobility and invisibility? This paper focuses on three kinds of empowerment through Twitter in the religiously conservative and socially patriarchal Saudi society. It traces and analyses how Saudi female hashtactivism is increasingly becoming a site of struggle over visibility, mobility, control, and civic participation. The underlying thesis is that Twitter makes a contribution to the development of participatory culture, especially in the lives of women.Keywords: civic, hashtactivism, Saudi Arabia, Twiterverse
Procedia PDF Downloads 30024604 Clustering of Extremes in Financial Returns: A Comparison between Developed and Emerging Markets
Authors: Sara Ali Alokley, Mansour Saleh Albarrak
Abstract:
This paper investigates the dependency or clustering of extremes in the financial returns data by estimating the extremal index value θ∈[0,1]. The smaller the value of θ the more clustering we have. Here we apply the method of Ferro and Segers (2003) to estimate the extremal index for a range of threshold values. We compare the dependency structure of extremes in the developed and emerging markets. We use the financial returns of the stock market index in the developed markets of US, UK, France, Germany and Japan and the emerging markets of Brazil, Russia, India, China and Saudi Arabia. We expect that more clustering occurs in the emerging markets. This study will help to understand the dependency structure of the financial returns data.Keywords: clustring, extremes, returns, dependency, extermal index
Procedia PDF Downloads 37924603 Interpretation and Clustering Framework for Analyzing ECG Survey Data
Authors: Irum Matloob, Shoab Ahmad Khan, Fahim Arif
Abstract:
As Indo-Pak has been the victim of heart diseases since many decades. Many surveys showed that percentage of cardiac patients is increasing in Pakistan day by day, and special attention is needed to pay on this issue. The framework is proposed for performing detailed analysis of ECG survey data which is conducted for measuring prevalence of heart diseases statistics in Pakistan. The ECG survey data is evaluated or filtered by using automated Minnesota codes and only those ECGs are used for further analysis which is fulfilling the standardized conditions mentioned in the Minnesota codes. Then feature selection is performed by applying proposed algorithm based on discernibility matrix, for selecting relevant features from the database. Clustering is performed for exposing natural clusters from the ECG survey data by applying spectral clustering algorithm using fuzzy c means algorithm. The hidden patterns and interesting relationships which have been exposed after this analysis are useful for further detailed analysis and for many other multiple purposes.Keywords: arrhythmias, centroids, ECG, clustering, discernibility matrix
Procedia PDF Downloads 44524602 Analysis of ECGs Survey Data by Applying Clustering Algorithm
Authors: Irum Matloob, Shoab Ahmad Khan, Fahim Arif
Abstract:
As Indo-pak has been the victim of heart diseases since many decades. Many surveys showed that percentage of cardiac patients is increasing in Pakistan day by day, and special attention is needed to pay on this issue. The framework is proposed for performing detailed analysis of ECG survey data which is conducted for measuring the prevalence of heart diseases statistics in Pakistan. The ECG survey data is evaluated or filtered by using automated Minnesota codes and only those ECGs are used for further analysis which is fulfilling the standardized conditions mentioned in the Minnesota codes. Then feature selection is performed by applying proposed algorithm based on discernibility matrix, for selecting relevant features from the database. Clustering is performed for exposing natural clusters from the ECG survey data by applying spectral clustering algorithm using fuzzy c means algorithm. The hidden patterns and interesting relationships which have been exposed after this analysis are useful for further detailed analysis and for many other multiple purposes.Keywords: arrhythmias, centroids, ECG, clustering, discernibility matrix
Procedia PDF Downloads 33024601 ACOPIN: An ACO Algorithm with TSP Approach for Clustering Proteins in Protein Interaction Networks
Authors: Jamaludin Sallim, Rozlina Mohamed, Roslina Abdul Hamid
Abstract:
In this paper, we proposed an Ant Colony Optimization (ACO) algorithm together with Traveling Salesman Problem (TSP) approach to investigate the clustering problem in Protein Interaction Networks (PIN). We named this combination as ACOPIN. The purpose of this work is two-fold. First, to test the efficacy of ACO in clustering PIN and second, to propose the simple generalization of the ACO algorithm that might allow its application in clustering proteins in PIN. We split this paper to three main sections. First, we describe the PIN and clustering proteins in PIN. Second, we discuss the steps involved in each phase of ACO algorithm. Finally, we present some results of the investigation with the clustering patterns.Keywords: ant colony optimization algorithm, searching algorithm, protein functional module, protein interaction network
Procedia PDF Downloads 57924600 Analysis of Expression Data Using Unsupervised Techniques
Authors: M. A. I Perera, C. R. Wijesinghe, A. R. Weerasinghe
Abstract:
his study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better subtypes of tumors. Identifying subtypes of cancer help in improving the efficacy and reducing the toxicity of the treatments by identifying clues to find target therapeutics. Process of gene expression data analysis described under three steps as preprocessing, clustering, and cluster validation. Feature selection is important since the genomic data are high dimensional with a large number of features compared to samples. Hierarchical clustering and K Means are often used in the analysis of gene expression data. There are several cluster validation techniques used in validating the clusters. Heatmaps are an effective external validation method that allows comparing the identified classes with clinical variables and visual analysis of the classes.Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation
Procedia PDF Downloads 12224599 Learning Grammars for Detection of Disaster-Related Micro Events
Authors: Josef Steinberger, Vanni Zavarella, Hristo Tanev
Abstract:
Natural disasters cause tens of thousands of victims and massive material damages. We refer to all those events caused by natural disasters, such as damage on people, infrastructure, vehicles, services and resource supply, as micro events. This paper addresses the problem of micro - event detection in online media sources. We present a natural language grammar learning algorithm and apply it to online news. The algorithm in question is based on distributional clustering and detection of word collocations. We also explore the extraction of micro-events from social media and describe a Twitter mining robot, who uses combinations of keywords to detect tweets which talk about effects of disasters.Keywords: online news, natural language processing, machine learning, event extraction, crisis computing, disaster effects, Twitter
Procedia PDF Downloads 46024598 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy
Authors: Kemal Polat
Abstract:
In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR). To classify the diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed them into the classifier algorithms. The dataset used in this study has been taken from University of California, Irvine (UCI) machine learning repository. Feature weighting methods including the fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, have been used and compered with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k- nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on combination of subtractive clustering based feature weighting and decision tree classifier has been obtained the classification accuracy of 100% in the screening of DR. These results have demonstrated that the proposed hybrid scheme is very promising in the medical data set classification.Keywords: machine learning, data weighting, classification, data mining
Procedia PDF Downloads 30724597 Spectral Clustering for Manufacturing Cell Formation
Authors: Yessica Nataliani, Miin-Shen Yang
Abstract:
Cell formation (CF) is an important step in group technology. It is used in designing cellular manufacturing systems using similarities between parts in relation to machines so that it can identify part families and machine groups. There are many CF methods in the literature, but there is less spectral clustering used in CF. In this paper, we propose a spectral clustering algorithm for machine-part CF. Some experimental examples are used to illustrate its efficiency. Overall, the spectral clustering algorithm can be used in CF with a wide variety of machine/part matrices.Keywords: group technology, cell formation, spectral clustering, grouping efficiency
Procedia PDF Downloads 38024596 Data Mining Spatial: Unsupervised Classification of Geographic Data
Authors: Chahrazed Zouaoui
Abstract:
In recent years, the volume of geospatial information is increasing due to the evolution of communication technologies and information, this information is presented often by geographic information systems (GIS) and stored on of spatial databases (BDS). The classical data mining revealed a weakness in knowledge extraction at these enormous amounts of data due to the particularity of these spatial entities, which are characterized by the interdependence between them (1st law of geography). This gave rise to spatial data mining. Spatial data mining is a process of analyzing geographic data, which allows the extraction of knowledge and spatial relationships from geospatial data, including methods of this process we distinguish the monothematic and thematic, geo- Clustering is one of the main tasks of spatial data mining, which is registered in the part of the monothematic method. It includes geo-spatial entities similar in the same class and it affects more dissimilar to the different classes. In other words, maximize intra-class similarity and minimize inter similarity classes. Taking account of the particularity of geo-spatial data. Two approaches to geo-clustering exist, the dynamic processing of data involves applying algorithms designed for the direct treatment of spatial data, and the approach based on the spatial data pre-processing, which consists of applying clustering algorithms classic pre-processed data (by integration of spatial relationships). This approach (based on pre-treatment) is quite complex in different cases, so the search for approximate solutions involves the use of approximation algorithms, including the algorithms we are interested in dedicated approaches (clustering methods for partitioning and methods for density) and approaching bees (biomimetic approach), our study is proposed to design very significant to this problem, using different algorithms for automatically detecting geo-spatial neighborhood in order to implement the method of geo- clustering by pre-treatment, and the application of the bees algorithm to this problem for the first time in the field of geo-spatial.Keywords: mining, GIS, geo-clustering, neighborhood
Procedia PDF Downloads 35924595 Interpretation of the Russia-Ukraine 2022 War via N-Gram Analysis
Authors: Elcin Timur Cakmak, Ayse Oguzlar
Abstract:
This study presents the results of the tweets sent by Twitter users on social media about the Russia-Ukraine war by bigram and trigram methods. On February 24, 2022, Russian President Vladimir Putin declared a military operation against Ukraine, and all eyes were turned to this war. Many people living in Russia and Ukraine reacted to this war and protested and also expressed their deep concern about this war as they felt the safety of their families and their futures were at stake. Most people, especially those living in Russia and Ukraine, express their views on the war in different ways. The most popular way to do this is through social media. Many people prefer to convey their feelings using Twitter, one of the most frequently used social media tools. Since the beginning of the war, it is seen that there have been thousands of tweets about the war from many countries of the world on Twitter. These tweets accumulated in data sources are extracted using various codes for analysis through Twitter API and analysed by Python programming language. The aim of the study is to find the word sequences in these tweets by the n-gram method, which is known for its widespread use in computational linguistics and natural language processing. The tweet language used in the study is English. The data set consists of the data obtained from Twitter between February 24, 2022, and April 24, 2022. The tweets obtained from Twitter using the #ukraine, #russia, #war, #putin, #zelensky hashtags together were captured as raw data, and the remaining tweets were included in the analysis stage after they were cleaned through the preprocessing stage. In the data analysis part, the sentiments are found to present what people send as a message about the war on Twitter. Regarding this, negative messages make up the majority of all the tweets as a ratio of %63,6. Furthermore, the most frequently used bigram and trigram word groups are found. Regarding the results, the most frequently used word groups are “he, is”, “I, do”, “I, am” for bigrams. Also, the most frequently used word groups are “I, do, not”, “I, am, not”, “I, can, not” for trigrams. In the machine learning phase, the accuracy of classifications is measured by Classification and Regression Trees (CART) and Naïve Bayes (NB) algorithms. The algorithms are used separately for bigrams and trigrams. We gained the highest accuracy and F-measure values by the NB algorithm and the highest precision and recall values by the CART algorithm for bigrams. On the other hand, the highest values for accuracy, precision, and F-measure values are achieved by the CART algorithm, and the highest value for the recall is gained by NB for trigrams.Keywords: classification algorithms, machine learning, sentiment analysis, Twitter
Procedia PDF Downloads 5424594 Communication of Sensors in Clustering for Wireless Sensor Networks
Authors: Kashish Sareen, Jatinder Singh Bal
Abstract:
The use of wireless sensor networks (WSNs) has grown vastly in the last era, pointing out the crucial need for scalable and energy-efficient routing and data gathering and aggregation protocols in corresponding large-scale environments. Wireless Sensor Networks have now recently emerged as a most important computing platform and continue to grow in diverse areas to provide new opportunities for networking and services. However, the energy constrained and limited computing resources of the sensor nodes present major challenges in gathering data. The sensors collect data about their surrounding and forward it to a command centre through a base station. The past few years have witnessed increased interest in the potential use of wireless sensor networks (WSNs) as they are very useful in target detecting and other applications. However, hierarchical clustering protocols have maximum been used in to overall system lifetime, scalability and energy efficiency. In this paper, the state of the art in corresponding hierarchical clustering approaches for large-scale WSN environments is shown.Keywords: clustering, DLCC, MLCC, wireless sensor networks
Procedia PDF Downloads 45424593 Quantitative Research on the Effects of Following Brands on Twitter on Consumer Brand Attitude
Authors: Yujie Wei
Abstract:
Twitter uses a variety of narrative methods (e.g., messages, featured videos, music, and actual events) to strengthen its cultivation effect. Consumers are receiving mass-produced brand stores or images made by brand managers according to strict market specifications. Drawing on the cultivation theory, this quantitative research investigates how following a brand on Twitter for 12 weeks can cultivate their attitude toward the brand and influence their purchase intentions. We conducted three field experiments on Twitter to test the cultivation effects of following a brand for 12 weeks on consumer attitude toward the followed brand. The cultivation effects were measured by comparing the changes in consumer attitudes before and after they have followed a brand over time. The findings of our experiments suggest that when consumers are exposed to a brand’s stable, pervasive, and recurrent tweets on Twitter for 12 weeks, their attitude toward a brand can be significantly changed, which confirms the cultivating effects on consumer attitude. Also, the results indicate that branding activities on Twitter, when properly implemented, can be very effective in changing consumer attitudes toward a brand, increasing the purchase intentions, and increasing their willingness to spread the word-of-mouth for the brand on social media. The cultivation effects are moderated by brand type and consumer age. The research provides three major marketing implications. First, Twitter marketers should create unique content to engage their brand followers to change their brand attitude through steady, cumulative exposure to the branding activities on Twitter. Second, there is a significant moderating effect of brand type on the cultivation effects, so Twitter marketers should align their branding content with the brand type to better meet the needs and wants of consumers for different types of brands. Finally, Twitter marketers should adapt their tweeting strategies according to the media consumption preferences of different age groups of their target markets. This empirical research proves that content is king.Keywords: tweeting, cultivation theory, consumer brand attitude, purchase intentions, word-of-mouth
Procedia PDF Downloads 90