Search results for: semantic clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1073

Search results for: semantic clustering

953 Spectral Clustering from the Discrepancy View and Generalized Quasirandomness

Authors: Marianna Bolla

Abstract:

The aim of this paper is to compare spectral, discrepancy, and degree properties of expanding graph sequences. As we can prove equivalences and implications between them and the definition of the generalized (multiclass) quasirandomness of Lovasz–Sos (2008), they can be regarded as generalized quasirandom properties akin to the equivalent quasirandom properties of the seminal Chung-Graham-Wilson paper (1989) in the one-class scenario. Since these properties are valid for deterministic graph sequences, irrespective of stochastic models, the partial implications also justify for low-dimensional embedding of large-scale graphs and for discrepancy minimizing spectral clustering.

Keywords: generalized random graphs, multiway discrepancy, normalized modularity spectra, spectral clustering

Procedia PDF Downloads 163
952 A Clustering Algorithm for Massive Texts

Authors: Ming Liu, Chong Wu, Bingquan Liu, Lei Chen

Abstract:

Internet users have to face the massive amount of textual data every day. Organizing texts into categories can help users dig the useful information from large-scale text collection. Clustering, in fact, is one of the most promising tools for categorizing texts due to its unsupervised characteristic. Unfortunately, most of traditional clustering algorithms lose their high qualities on large-scale text collection. This situation mainly attributes to the high- dimensional vectors generated from texts. To effectively and efficiently cluster large-scale text collection, this paper proposes a vector reconstruction based clustering algorithm. Only the features that can represent the cluster are preserved in cluster’s representative vector. This algorithm alternately repeats two sub-processes until it converges. One process is partial tuning sub-process, where feature’s weight is fine-tuned by iterative process. To accelerate clustering velocity, an intersection based similarity measurement and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The other process is overall tuning sub-process, where the features are reallocated among different clusters. In this sub-process, the features useless to represent the cluster are removed from cluster’s representative vector. Experimental results on the three text collections (including two small-scale and one large-scale text collections) demonstrate that our algorithm obtains high quality on both small-scale and large-scale text collections.

Keywords: vector reconstruction, large-scale text clustering, partial tuning sub-process, overall tuning sub-process

Procedia PDF Downloads 404
951 Effect of Bi-Dispersity on Particle Clustering in Sedimentation

Authors: Ali Abbas Zaidi

Abstract:

In free settling or sedimentation, particles form clusters at high Reynolds number and dilute suspensions. It is due to the entrapment of particles in the wakes of upstream particles. In this paper, the effect of bi-dispersity of settling particles on particle clustering is investigated using particle-resolved direct numerical simulation. Immersed boundary method is used for particle fluid interactions and discrete element method is used for particle-particle interactions. The solid volume fraction used in the simulation is 1% and the Reynolds number based on Sauter mean diameter is 350. Both solid volume fraction and Reynolds number lie in the clustering regime of sedimentation. In simulations, the particle diameter ratio (i.e. diameter of larger particle to smaller particle (d₁/d₂)) is varied from 2:1, 3:1 and 4:1. For each case of particle diameter ratio, solid volume fraction for each particle size (φ₁/φ₂) is varied from 1:1, 1:2 and 2:1. For comparison, simulations are also performed for monodisperse particles. For studying particles clustering, radial distribution function and instantaneous location of particles in the computational domain are studied. It is observed that the degree of particle clustering decreases with the increase in the bi-dispersity of settling particles. The smallest degree of particle clustering or dispersion of particles is observed for particles with d₁/d₂ equal to 4:1 and φ₁/φ₂ equal to 1:2. Simulations showed that the reduction in particle clustering by increasing bi-dispersity is due to the difference in settling velocity of particles. Particles with larger size settle faster and knockout the smaller particles from clustered regions of particles in the computational domain.

Keywords: dispersion in bi-disperse settling particles, particle microstructures in bi-disperse suspensions, particle resolved direct numerical simulations, settling of bi-disperse particles

Procedia PDF Downloads 174
950 A Learning-Based EM Mixture Regression Algorithm

Authors: Yi-Cheng Tian, Miin-Shen Yang

Abstract:

The mixture likelihood approach to clustering is a popular clustering method where the expectation and maximization (EM) algorithm is the most used mixture likelihood method. In the literature, the EM algorithm had been used for mixture regression models. However, these EM mixture regression algorithms are sensitive to initial values with a priori number of clusters. In this paper, to resolve these drawbacks, we construct a learning-based schema for the EM mixture regression algorithm such that it is free of initializations and can automatically obtain an approximately optimal number of clusters. Some numerical examples and comparisons demonstrate the superiority and usefulness of the proposed learning-based EM mixture regression algorithm.

Keywords: clustering, EM algorithm, Gaussian mixture model, mixture regression model

Procedia PDF Downloads 478
949 Communication of Sensors in Clustering for Wireless Sensor Networks

Authors: Kashish Sareen, Jatinder Singh Bal

Abstract:

The use of wireless sensor networks (WSNs) has grown vastly in the last era, pointing out the crucial need for scalable and energy-efficient routing and data gathering and aggregation protocols in corresponding large-scale environments. Wireless Sensor Networks have now recently emerged as a most important computing platform and continue to grow in diverse areas to provide new opportunities for networking and services. However, the energy constrained and limited computing resources of the sensor nodes present major challenges in gathering data. The sensors collect data about their surrounding and forward it to a command centre through a base station. The past few years have witnessed increased interest in the potential use of wireless sensor networks (WSNs) as they are very useful in target detecting and other applications. However, hierarchical clustering protocols have maximum been used in to overall system lifetime, scalability and energy efficiency. In this paper, the state of the art in corresponding hierarchical clustering approaches for large-scale WSN environments is shown.

Keywords: clustering, DLCC, MLCC, wireless sensor networks

Procedia PDF Downloads 449
948 Interpretation and Clustering Framework for Analyzing ECG Survey Data

Authors: Irum Matloob, Shoab Ahmad Khan, Fahim Arif

Abstract:

As Indo-Pak has been the victim of heart diseases since many decades. Many surveys showed that percentage of cardiac patients is increasing in Pakistan day by day, and special attention is needed to pay on this issue. The framework is proposed for performing detailed analysis of ECG survey data which is conducted for measuring prevalence of heart diseases statistics in Pakistan. The ECG survey data is evaluated or filtered by using automated Minnesota codes and only those ECGs are used for further analysis which is fulfilling the standardized conditions mentioned in the Minnesota codes. Then feature selection is performed by applying proposed algorithm based on discernibility matrix, for selecting relevant features from the database. Clustering is performed for exposing natural clusters from the ECG survey data by applying spectral clustering algorithm using fuzzy c means algorithm. The hidden patterns and interesting relationships which have been exposed after this analysis are useful for further detailed analysis and for many other multiple purposes.

Keywords: arrhythmias, centroids, ECG, clustering, discernibility matrix

Procedia PDF Downloads 441
947 An Extraction of Cancer Region from MR Images Using Fuzzy Clustering Means and Morphological Operations

Authors: Ramandeep Kaur, Gurjit Singh Bhathal

Abstract:

Cancer diagnosis is very difficult task. Magnetic resonance imaging (MRI) scan is used to produce image of any part of the body and provides an efficient way for diagnosis of cancer or tumor. In existing method, fuzzy clustering mean (FCM) is used for the diagnosis of the tumor. In the proposed method FCM is used to diagnose the cancer of the foot. FCM finds the centroids of the clusters of the foot cancer obtained from MRI images. FCM thresholding result shows the extract region of the cancer. Morphological operations are applied to get extracted region of cancer.

Keywords: magnetic resonance imaging (MRI), fuzzy C mean clustering, segmentation, morphological operations

Procedia PDF Downloads 362
946 Analysis of ECGs Survey Data by Applying Clustering Algorithm

Authors: Irum Matloob, Shoab Ahmad Khan, Fahim Arif

Abstract:

As Indo-pak has been the victim of heart diseases since many decades. Many surveys showed that percentage of cardiac patients is increasing in Pakistan day by day, and special attention is needed to pay on this issue. The framework is proposed for performing detailed analysis of ECG survey data which is conducted for measuring the prevalence of heart diseases statistics in Pakistan. The ECG survey data is evaluated or filtered by using automated Minnesota codes and only those ECGs are used for further analysis which is fulfilling the standardized conditions mentioned in the Minnesota codes. Then feature selection is performed by applying proposed algorithm based on discernibility matrix, for selecting relevant features from the database. Clustering is performed for exposing natural clusters from the ECG survey data by applying spectral clustering algorithm using fuzzy c means algorithm. The hidden patterns and interesting relationships which have been exposed after this analysis are useful for further detailed analysis and for many other multiple purposes.

Keywords: arrhythmias, centroids, ECG, clustering, discernibility matrix

Procedia PDF Downloads 327
945 Double Clustering as an Unsupervised Approach for Order Picking of Distributed Warehouses

Authors: Hsin-Yi Huang, Ming-Sheng Liu, Jiun-Yan Shiau

Abstract:

Planning the order picking lists of warehouses to achieve when the costs associated with logistics on the operational performance is a significant challenge. In e-commerce era, this task is especially important productive processes are high. Nowadays, many order planning techniques employ supervised machine learning algorithms. However, the definition of which features should be processed by such algorithms is not a simple task, being crucial to the proposed technique’s success. Against this background, we consider whether unsupervised algorithms can enhance the planning of order-picking lists. A Zone2 picking approach, which is based on using clustering algorithms twice, is developed. A simplified example is given to demonstrate the merit of our approach.

Keywords: order picking, warehouse, clustering, unsupervised learning

Procedia PDF Downloads 122
944 An Adaptive Dimensionality Reduction Approach for Hyperspectral Imagery Semantic Interpretation

Authors: Akrem Sellami, Imed Riadh Farah, Basel Solaiman

Abstract:

With the development of HyperSpectral Imagery (HSI) technology, the spectral resolution of HSI became denser, which resulted in large number of spectral bands, high correlation between neighboring, and high data redundancy. However, the semantic interpretation is a challenging task for HSI analysis due to the high dimensionality and the high correlation of the different spectral bands. In fact, this work presents a dimensionality reduction approach that allows to overcome the different issues improving the semantic interpretation of HSI. Therefore, in order to preserve the spatial information, the Tensor Locality Preserving Projection (TLPP) has been applied to transform the original HSI. In the second step, knowledge has been extracted based on the adjacency graph to describe the different pixels. Based on the transformation matrix using TLPP, a weighted matrix has been constructed to rank the different spectral bands based on their contribution score. Thus, the relevant bands have been adaptively selected based on the weighted matrix. The performance of the presented approach has been validated by implementing several experiments, and the obtained results demonstrate the efficiency of this approach compared to various existing dimensionality reduction techniques. Also, according to the experimental results, we can conclude that this approach can adaptively select the relevant spectral improving the semantic interpretation of HSI.

Keywords: band selection, dimensionality reduction, feature extraction, hyperspectral imagery, semantic interpretation

Procedia PDF Downloads 328
943 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data to be ready for data mining exploration take up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected from an actual harmonic monitoring system in a distribution system in Australia for three years. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes to better understanding of the data have been presented. Underlying classes in this data has then been identified using clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by the engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 218
942 Parallel Querying of Distributed Ontologies with Shared Vocabulary

Authors: Sharjeel Aslam, Vassil Vassilev, Karim Ouazzane

Abstract:

Ontologies and various semantic repositories became a convenient approach for implementing model-driven architectures of distributed systems on the Web. SPARQL is the standard query language for querying such. However, although SPARQL is well-established standard for querying semantic repositories in RDF and OWL format and there are commonly used APIs which supports it, like Jena for Java, its parallel option is not incorporated in them. This article presents a complete framework consisting of an object algebra for parallel RDF and an index-based implementation of the parallel query engine capable of dealing with the distributed RDF ontologies which share common vocabulary. It has been implemented in Java, and for validation of the algorithms has been applied to the problem of organizing virtual exhibitions on the Web.

Keywords: distributed ontologies, parallel querying, semantic indexing, shared vocabulary, SPARQL

Procedia PDF Downloads 164
941 From Shallow Semantic Representation to Deeper One: Verb Decomposition Approach

Authors: Aliaksandr Huminski

Abstract:

Semantic Role Labeling (SRL) as shallow semantic parsing approach includes recognition and labeling arguments of a verb in a sentence. Verb participants are linked with specific semantic roles (Agent, Patient, Instrument, Location, etc.). Thus, SRL can answer on key questions such as ‘Who’, ‘When’, ‘What’, ‘Where’ in a text and it is widely applied in dialog systems, question-answering, named entity recognition, information retrieval, and other fields of NLP. However, SRL has the following flaw: Two sentences with identical (or almost identical) meaning can have different semantic role structures. Let consider 2 sentences: (1) John put butter on the bread. (2) John buttered the bread. SRL for (1) and (2) will be significantly different. For the verb put in (1) it is [Agent + Patient + Goal], but for the verb butter in (2) it is [Agent + Goal]. It happens because of one of the most interesting and intriguing features of a verb: Its ability to capture participants as in the case of the verb butter, or their features as, say, in the case of the verb drink where the participant’s feature being liquid is shared with the verb. This capture looks like a total fusion of meaning and cannot be decomposed in direct way (in comparison with compound verbs like babysit or breastfeed). From this perspective, SRL looks really shallow to represent semantic structure. If the key point in semantic representation is an opportunity to use it for making inferences and finding hidden reasons, it assumes by default that two different but semantically identical sentences must have the same semantic structure. Otherwise we will have different inferences from the same meaning. To overcome the above-mentioned flaw, the following approach is suggested. Assume that: P is a participant of relation; F is a feature of a participant; Vcp is a verb that captures a participant; Vcf is a verb that captures a feature of a participant; Vpr is a primitive verb or a verb that does not capture any participant and represents only a relation. In another word, a primitive verb is a verb whose meaning does not include meanings from its surroundings. Then Vcp and Vcf can be decomposed as: Vcp = Vpr +P; Vcf = Vpr +F. If all Vcp and Vcf will be represented this way, then primitive verbs Vpr can be considered as a canonical form for SRL. As a result of that, there will be no hidden participants caught by a verb since all participants will be explicitly unfolded. An obvious example of Vpr is the verb go, which represents pure movement. In this case the verb drink can be represented as man-made movement of liquid into specific direction. Extraction and using primitive verbs for SRL create a canonical representation unique for semantically identical sentences. It leads to the unification of semantic representation. In this case, the critical flaw related to SRL will be resolved.

Keywords: decomposition, labeling, primitive verbs, semantic roles

Procedia PDF Downloads 338
940 Online Topic Model for Broadcasting Contents Using Semantic Correlation Information

Authors: Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park, Sang-Jo Lee

Abstract:

This paper proposes a method of learning topics for broadcasting contents. There are two kinds of texts related to broadcasting contents. One is a broadcasting script which is a series of texts including directions and dialogues. The other is blogposts which possesses relatively abstracted contents, stories and diverse information of broadcasting contents. Although two texts range over similar broadcasting contents, words in blogposts and broadcasting script are different. In order to improve the quality of topics, it needs a method to consider the word difference. In this paper, we introduce a semantic vocabulary expansion method to solve the word difference. We expand topics of the broadcasting script by incorporating the words in blogposts. Each word in blogposts is added to the most semantically correlated topics. We use word2vec to get the semantic correlation between words in blogposts and topics of scripts. The vocabularies of topics are updated and then posterior inference is performed to rearrange the topics. In experiments, we verified that the proposed method can learn more salient topics for broadcasting contents.

Keywords: broadcasting script analysis, topic expansion, semantic correlation analysis, word2vec

Procedia PDF Downloads 230
939 Lexical-Semantic Deficits in Sinhala Speaking Persons with Post Stroke Aphasia: Evidence from Single Word Auditory Comprehension Task

Authors: D. W. M. S. Samarathunga, Isuru Dharmarathne

Abstract:

In aphasia, various levels of symbolic language processing (semantics) are affected. It is shown that Persons with Aphasia (PWA) often experience more problems comprehending some categories of words than others. The study aimed to determine lexical semantic deficits seen in Auditory Comprehension (AC) and to describe lexical-semantic deficits across six selected word categories. Thirteen (n =13) persons diagnosed with post-stroke aphasia (PSA) were recruited to perform an AC task. Foods, objects, clothes, vehicles, body parts and animals were selected as the six categories. As the test stimuli, black and white line drawings were adapted from a picture set developed for semantic studies by Snodgrass and Vanderwart. A pilot study was conducted with five (n=5) healthy nonbrain damaged Sinhala speaking adults to decide familiarity and applicability of the test material. In the main study, participants were scored based on the accuracy and number of errors shown. The results indicate similar trends of lexical semantic deficits identified in the literature confirming ‘animals’ to be the easiest category to comprehend. Mann-Whitney U test was performed to determine the association between the selected variables and the participants’ performance on AC task. No statistical significance was found between the errors and the type of aphasia reflecting similar patterns described in aphasia literature in other languages. The current study indicates the presence of selectivity of lexical semantic deficits in AC and a hierarchy was developed based on the complexity of the categories to comprehend by Sinhala speaking PWA, which might be clinically beneficial when improving language skills of Sinhala speaking persons with post-stroke aphasia. However, further studies on aphasia should be conducted with larger samples for a longer period to study deficits in Sinhala and other Sri Lankan languages (Tamil and Malay).

Keywords: aphasia, auditory comprehension, selective lexical-semantic deficits, semantic categories

Procedia PDF Downloads 224
938 Identification of Biological Pathways Causative for Breast Cancer Using Unsupervised Machine Learning

Authors: Karthik Mittal

Abstract:

This study performs an unsupervised machine learning analysis to find clusters of related SNPs which highlight biological pathways that are important for the biological mechanisms of breast cancer. Studying genetic variations in isolation is illogical because these genetic variations are known to modulate protein production and function; the downstream effects of these modifications on biological outcomes are highly interconnected. After extracting the SNPs and their effect on different types of breast cancer using the MRBase library, two unsupervised machine learning clustering algorithms were implemented on the genetic variants: a k-means clustering algorithm and a hierarchical clustering algorithm; furthermore, principal component analysis was executed to visually represent the data. These algorithms specifically used the SNP’s beta value on the three different types of breast cancer tested in this project (estrogen-receptor positive breast cancer, estrogen-receptor negative breast cancer, and breast cancer in general) to perform this clustering. Two significant genetic pathways validated the clustering produced by this project: the MAPK signaling pathway and the connection between the BRCA2 gene and the ESR1 gene. This study provides the first proof of concept showing the importance of unsupervised machine learning in interpreting GWAS summary statistics.

Keywords: breast cancer, computational biology, unsupervised machine learning, k-means, PCA

Procedia PDF Downloads 116
937 Energy-Efficient Clustering Protocol in Wireless Sensor Networks for Healthcare Monitoring

Authors: Ebrahim Farahmand, Ali Mahani

Abstract:

Wireless sensor networks (WSNs) can facilitate continuous monitoring of patients and increase early detection of emergency conditions and diseases. High density WSNs helps us to accurately monitor a remote environment by intelligently combining the data from the individual nodes. Due to energy capacity limitation of sensors, enhancing the lifetime and the reliability of WSNs are important factors in designing of these networks. The clustering strategies are verified as effective and practical algorithms for reducing energy consumption in WSNs and can tackle WSNs limitations. In this paper, an Energy-efficient weight-based Clustering Protocol (EWCP) is presented. Artificial retina is selected as a case study of WSNs applied in body sensors. Cluster heads’ (CHs) selection is equipped with energy efficient parameters. Moreover, cluster members are selected based on their distance to the selected CHs. Comparing with the other benchmark protocols, the lifetime of EWCP is improved significantly.

Keywords: WSN, healthcare monitoring, weighted based clustering, lifetime

Procedia PDF Downloads 286
936 A Semantic and Concise Structure to Represent Human Actions

Authors: Tobias Strübing, Fatemeh Ziaeetabar

Abstract:

Humans usually manipulate objects with their hands. To represent these actions in a simple and understandable way, we need to use a semantic framework. For this purpose, the Semantic Event Chain (SEC) method has already been presented which is done by consideration of touching and non-touching relations between manipulated objects in a scene. This method was improved by a computational model, the so-called enriched Semantic Event Chain (eSEC), which incorporates the information of static (e.g. top, bottom) and dynamic spatial relations (e.g. moving apart, getting closer) between objects in an action scene. This leads to a better action prediction as well as the ability to distinguish between more actions. Each eSEC manipulation descriptor is a huge matrix with thirty rows and a massive set of the spatial relations between each pair of manipulated objects. The current eSEC framework has so far only been used in the category of manipulation actions, which eventually involve two hands. Here, we would like to extend this approach to a whole body action descriptor and make a conjoint activity representation structure. For this purpose, we need to do a statistical analysis to modify the current eSEC by summarizing while preserving its features, and introduce a new version called Enhanced eSEC or (e2SEC). This summarization can be done from two points of the view: 1) reducing the number of rows in an eSEC matrix, 2) shrinking the set of possible semantic spatial relations. To achieve these, we computed the importance of each matrix row in an statistical way, to see if it is possible to remove a particular one while all manipulations are still distinguishable from each other. On the other hand, we examined which semantic spatial relations can be merged without compromising the unity of the predefined manipulation actions. Therefore by performing the above analyses, we made the new e2SEC framework which has 20% fewer rows, 16.7% less static spatial and 11.1% less dynamic spatial relations. This simplification, while preserving the salient features of a semantic structure in representing actions, has a tremendous impact on the recognition and prediction of complex actions, as well as the interactions between humans and robots. It also creates a comprehensive platform to integrate with the body limbs descriptors and dramatically increases system performance, especially in complex real time applications such as human-robot interaction prediction.

Keywords: enriched semantic event chain, semantic action representation, spatial relations, statistical analysis

Procedia PDF Downloads 83
935 Clustering Based Level Set Evaluation for Low Contrast Images

Authors: Bikshalu Kalagadda, Srikanth Rangu

Abstract:

The important object of images segmentation is to extract objects with respect to some input features. One of the important methods for image segmentation is Level set method. Generally medical images and synthetic images with low contrast of pixel profile, for such images difficult to locate interested features in images. In conventional level set function, develops irregularity during its process of evaluation of contour of objects, this destroy the stability of evolution process. For this problem a remedy is proposed, a new hybrid algorithm is Clustering Level Set Evolution. Kernel fuzzy particles swarm optimization clustering with the Distance Regularized Level Set (DRLS) and Selective Binary, and Gaussian Filtering Regularized Level Set (SBGFRLS) methods are used. The ability of identifying different regions becomes easy with improved speed. Efficiency of the modified method can be evaluated by comparing with the previous method for similar specifications. Comparison can be carried out by considering medical and synthetic images.

Keywords: segmentation, clustering, level set function, re-initialization, Kernel fuzzy, swarm optimization

Procedia PDF Downloads 327
934 Component Based Testing Using Clustering and Support Vector Machine

Authors: Iqbaldeep Kaur, Amarjeet Kaur

Abstract:

Software Reusability is important part of software development. So component based software development in case of software testing has gained a lot of practical importance in the field of software engineering from academic researcher and also from software development industry perspective. Finding test cases for efficient reuse of test cases is one of the important problems aimed by researcher. Clustering reduce the search space, reuse test cases by grouping similar entities according to requirements ensuring reduced time complexity as it reduce the search time for retrieval the test cases. In this research paper we proposed approach for re-usability of test cases by unsupervised approach. In unsupervised learning we proposed k-mean and Support Vector Machine. We have designed the algorithm for requirement and test case document clustering according to its tf-idf vector space and the output is set of highly cohesive pattern groups.

Keywords: software testing, reusability, clustering, k-mean, SVM

Procedia PDF Downloads 398
933 Clustering Performance Analysis using New Correlation-Based Cluster Validity Indices

Authors: Nathakhun Wiroonsri

Abstract:

There are various cluster validity measures used for evaluating clustering results. One of the main objectives of using these measures is to seek the optimal unknown number of clusters. Some measures work well for clusters with different densities, sizes and shapes. Yet, one of the weaknesses that those validity measures share is that they sometimes provide only one clear optimal number of clusters. That number is actually unknown and there might be more than one potential sub-optimal option that a user may wish to choose based on different applications. We develop two new cluster validity indices based on a correlation between an actual distance between a pair of data points and a centroid distance of clusters that the two points are located in. Our proposed indices constantly yield several peaks at different numbers of clusters which overcome the weakness previously stated. Furthermore, the introduced correlation can also be used for evaluating the quality of a selected clustering result. Several experiments in different scenarios, including the well-known iris data set and a real-world marketing application, have been conducted to compare the proposed validity indices with several well-known ones.

Keywords: clustering algorithm, cluster validity measure, correlation, data partitions, iris data set, marketing, pattern recognition

Procedia PDF Downloads 81
932 Personalize E-Learning System Based on Clustering and Sequence Pattern Mining Approach

Authors: H. S. Saini, K. Vijayalakshmi, Rishi Sayal

Abstract:

Network-based education has been growing rapidly in size and quality. Knowledge clustering becomes more important in personalized information retrieval for web-learning. A personalized-Learning service after the learners’ knowledge has been classified with clustering. Through automatic analysis of learners’ behaviors, their partition with similar data level and interests may be discovered so as to produce learners with contents that best match educational needs for collaborative learning. We present a specific mining tool and a recommender engine that we have integrated in the online learning in order to help the teacher to carry out the whole e-learning process. We propose to use sequential pattern mining algorithms to discover the most used path by the students and from this information can recommend links to the new students automatically meanwhile they browse in the course. We have Developed a specific author tool in order to help the teacher to apply all the data mining process. We tend to report on many experiments with real knowledge so as to indicate the quality of using both clustering and sequential pattern mining algorithms together for discovering personalized e-learning systems.

Keywords: e-learning, cluster, personalization, sequence, pattern

Procedia PDF Downloads 399
931 Data Clustering Algorithm Based on Multi-Objective Periodic Bacterial Foraging Optimization with Two Learning Archives

Authors: Chen Guo, Heng Tang, Ben Niu

Abstract:

Clustering splits objects into different groups based on similarity, making the objects have higher similarity in the same group and lower similarity in different groups. Thus, clustering can be treated as an optimization problem to maximize the intra-cluster similarity or inter-cluster dissimilarity. In real-world applications, the datasets often have some complex characteristics: sparse, overlap, high dimensionality, etc. When facing these datasets, simultaneously optimizing two or more objectives can obtain better clustering results than optimizing one objective. However, except for the objectives weighting methods, traditional clustering approaches have difficulty in solving multi-objective data clustering problems. Due to this, evolutionary multi-objective optimization algorithms are investigated by researchers to optimize multiple clustering objectives. In this paper, the Data Clustering algorithm based on Multi-objective Periodic Bacterial Foraging Optimization with two Learning Archives (DC-MPBFOLA) is proposed. Specifically, first, to reduce the high computing complexity of the original BFO, periodic BFO is employed as the basic algorithmic framework. Then transfer the periodic BFO into a multi-objective type. Second, two learning strategies are proposed based on the two learning archives to guide the bacterial swarm to move in a better direction. On the one hand, the global best is selected from the global learning archive according to the convergence index and diversity index. On the other hand, the personal best is selected from the personal learning archive according to the sum of weighted objectives. According to the aforementioned learning strategies, a chemotaxis operation is designed. Third, an elite learning strategy is designed to provide fresh power to the objects in two learning archives. When the objects in these two archives do not change for two consecutive times, randomly initializing one dimension of objects can prevent the proposed algorithm from falling into local optima. Fourth, to validate the performance of the proposed algorithm, DC-MPBFOLA is compared with four state-of-art evolutionary multi-objective optimization algorithms and one classical clustering algorithm on evaluation indexes of datasets. To further verify the effectiveness and feasibility of designed strategies in DC-MPBFOLA, variants of DC-MPBFOLA are also proposed. Experimental results demonstrate that DC-MPBFOLA outperforms its competitors regarding all evaluation indexes and clustering partitions. These results also indicate that the designed strategies positively influence the performance improvement of the original BFO.

Keywords: data clustering, multi-objective optimization, bacterial foraging optimization, learning archives

Procedia PDF Downloads 110
930 Review: Wavelet New Tool for Path Loss Prediction

Authors: Danladi Ali, Abdullahi Mukaila

Abstract:

In this work, GSM signal strength (power) was monitored in an indoor environment. Samples of the GSM signal strength was measured on mobile equipment (ME). One-dimensional multilevel wavelet is used to predict the fading phenomenon of the GSM signal measured and neural network clustering to determine the average power received in the study area. The wavelet prediction revealed that the GSM signal is attenuated due to the fast fading phenomenon which fades about 7 times faster than the radio wavelength while the neural network clustering determined that -75dBm appeared more frequently followed by -85dBm. The work revealed that significant part of the signal measured is dominated by weak signal and the signal followed more of Rayleigh than Gaussian distribution. This confirmed the wavelet prediction.

Keywords: decomposition, clustering, propagation, model, wavelet, signal strength and spectral efficiency

Procedia PDF Downloads 422
929 Assessing the Structure of Non-Verbal Semantic Knowledge: The Evaluation and First Results of the Hungarian Semantic Association Test

Authors: Alinka Molnár-Tóth, Tímea Tánczos, Regina Barna, Katalin Jakab, Péter Klivényi

Abstract:

Supported by neuroscientific findings, the so-called Hub-and-Spoke model of the human semantic system is based on two subcomponents of semantic cognition, namely the semantic control process and semantic representation. Our semantic knowledge is multimodal in nature, as the knowledge system stored in relation to a conception is extensive and broad, while different aspects of the conception may be relevant depending on the purpose. The motivation of our research is to develop a new diagnostic measurement procedure based on the preservation of semantic representation, which is appropriate to the specificities of the Hungarian language and which can be used to compare the non-verbal semantic knowledge of healthy and aphasic persons. The development of the test will broaden the Hungarian clinical diagnostic toolkit, which will allow for more specific therapy planning. The sample of healthy persons (n=480) was determined by the last census data for the representativeness of the sample. Based on the concept of the Pyramids and Palm Tree Test, and according to the characteristics of the Hungarian language, we have elaborated a test based on different types of semantic information, in which the subjects are presented with three pictures: they have to choose the one that best fits the target word above from the two lower options, based on the semantic relation defined. We have measured 5 types of semantic knowledge representations: associative relations, taxonomy, motional representations, concrete as well as abstract verbs. As the first step in our data analysis, we examined the normal distribution of our results, and since it was not normally distributed (p < 0.05), we used nonparametric statistics further into the analysis. Using descriptive statistics, we could determine the frequency of the correct and incorrect responses, and with this knowledge, we could later adjust and remove the items of questionable reliability. The reliability was tested using Cronbach’s α, and it can be safely said that all the results were in an acceptable range of reliability (α = 0.6-0.8). We then tested for the potential gender differences using the Mann Whitney-U test, however, we found no difference between the two (p < 0.05). Likewise, we didn’t see that the age had any effect on the results using one-way ANOVA (p < 0.05), however, the level of education did influence the results (p > 0.05). The relationships between the subtests were observed by the nonparametric Spearman’s rho correlation matrix, showing statistically significant correlation between the subtests (p > 0.05), signifying a linear relationship between the measured semantic functions. A margin of error of 5% was used in all cases. The research will contribute to the expansion of the clinical diagnostic toolkit and will be relevant for the individualised therapeutic design of treatment procedures. The use of a non-verbal test procedure will allow an early assessment of the most severe language conditions, which is a priority in the differential diagnosis. The measurement of reaction time is expected to advance prodrome research, as the tests can be easily conducted in the subclinical phase.

Keywords: communication disorders, diagnostic toolkit, neurorehabilitation, semantic knowlegde

Procedia PDF Downloads 69
928 Semantic Processing in Chinese: Category Effects, Task Effects and Age Effects

Authors: Yi-Hsiu Lai

Abstract:

The present study aimed to elucidate the nature of semantic processing in Chinese. Language and cognition related to the issue of aging are examined from the perspective of picture naming and category fluency tasks. Twenty Chinese-speaking adults (ranging from 25 to 45 years old) and twenty Chinese-speaking seniors (ranging from 65 to 75 years old) in Taiwan participated in this study. Each of them individually completed two tasks: a picture naming task and a category fluency task. Instruments for the naming task were sixty black-and-white pictures: thirty-five object and twenty-five action pictures. Category fluency task also consisted of two semantic categories – objects (or nouns) and actions (or verbs). Participants were asked to report as many items within a category as possible in one minute. Scores of action fluency and of object fluency were a summation of correct responses in these two categories. Category effects (actions vs. objects) and age effects were examined in these tasks. Objects were further divided into two major types: living objects and non-living objects. Actions were also categorized into two major types: action verbs and process verbs. Reaction time to each picture/question was additionally calculated and analyzed. Results of the category fluency task indicated that the content of information in Chinese seniors was comparatively deteriorated, thus producing smaller number of semantic-lexical items. Significant group difference was also found in the results of reaction time. Category Effect was significant for both Chinese adults and seniors in the semantic fluency task. Findings in the present study helped characterize the nature of semantic processing in Chinese-speaking adults and seniors and contributed to the issue of language and aging.

Keywords: semantic processing, aging, Chinese, category effects

Procedia PDF Downloads 333
927 Semantic Search Engine Based on Query Expansion with Google Ranking and Similarity Measures

Authors: Ahmad Shahin, Fadi Chakik, Walid Moudani

Abstract:

Our study is about elaborating a potential solution for a search engine that involves semantic technology to retrieve information and display it significantly. Semantic search engines are not used widely over the web as the majorities are still in Beta stage or under construction. Many problems face the current applications in semantic search, the major problem is to analyze and calculate the meaning of query in order to retrieve relevant information. Another problem is the ontology based index and its updates. Ranking results according to concept meaning and its relation with query is another challenge. In this paper, we are offering a light meta-engine (QESM) which uses Google search, and therefore Google’s index, with some adaptations to its returned results by adding multi-query expansion. The mission was to find a reliable ranking algorithm that involves semantics and uses concepts and meanings to rank results. At the beginning, the engine finds synonyms of each query term entered by the user based on a lexical database. Then, query expansion is applied to generate different semantically analogous sentences. These are generated randomly by combining the found synonyms and the original query terms. Our model suggests the use of semantic similarity measures between two sentences. Practically, we used this method to calculate semantic similarity between each query and the description of each page’s content generated by Google. The generated sentences are sent to Google engine one by one, and ranked again all together with the adapted ranking method (QESM). Finally, our system will place Google pages with higher similarities on the top of the results. We have conducted experimentations with 6 different queries. We have observed that most ranked results with QESM were altered with Google’s original generated pages. With our experimented queries, QESM generates frequently better accuracy than Google. In some worst cases, it behaves like Google.

Keywords: semantic search engine, Google indexing, query expansion, similarity measures

Procedia PDF Downloads 400
926 Improved Color-Based K-Mean Algorithm for Clustering of Satellite Image

Authors: Sangeeta Yadav, Mantosh Biswas

Abstract:

In this paper, we proposed an improved color based K-mean algorithm for clustering of satellite Image (SAR). Our method comprises of two stages. The first step is an interactive selection process where users are required to input the number of colors (ncolor), number of clusters, and then they are prompted to select the points in each color cluster. In the second step these points are given as input to K-mean clustering algorithm that clusters the image based on color and Minimum Square Euclidean distance. The proposed method reduces the mixed pixel problem to a great extent.

Keywords: cluster, ncolor method, K-mean method, interactive selection process

Procedia PDF Downloads 258
925 An Ontology for Semantic Enrichment of RFID Systems

Authors: Haitham S. Hamza, Mohamed Maher, Shourok Alaa, Aya Khattab, Hadeal Ismail, Kamilia Hosny

Abstract:

Radio Frequency Identification (RFID) has become a key technology in the margining concept of Internet of Things (IoT). Naturally, business applications would require the deployment of various RFID systems that are developed by different vendors and use various data formats. This heterogeneity poses a real challenge in developing large-scale IoT systems with RFID as integration is becoming very complex and challenging. Semantic integration is a key approach to deal with this challenge. To do so, ontology for RFID systems need to be developed in order to annotated semantically RFID systems, and hence, facilitate their integration. Accordingly, in this paper, we propose ontology for RFID systems. The proposed ontology can be used to semantically enrich RFID systems, and hence, improve their usage and reasoning. The usage of the proposed ontology is explained through a simple scenario in the health care domain.

Keywords: RFID, semantic technology, ontology, sparql query language, heterogeneity

Procedia PDF Downloads 438
924 New Ways of Vocabulary Enlargement

Authors: S. Pesina, T. Solonchak

Abstract:

Lexical invariants, being a sort of stereotypes within the frames of ordinary consciousness, are created by the members of a language community as a result of uniform division of reality. The invariant meaning is formed in person’s mind gradually in the course of different actualizations of secondary meanings in various contexts. We understand lexical the invariant as abstract language essence containing a set of semantic components. In one of its configurations it is the basis or all or a number of the meanings making up the semantic structure of the word.

Keywords: lexical invariant, invariant theories, polysemantic word, cognitive linguistics

Procedia PDF Downloads 285