Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8

unsupervised learning Related Abstracts

8 Unsupervised Assistive and Adaptative Intelligent Agent in Smart Environment

Authors: Sebastião Pais, João Casal, Ricardo Ponciano, Sérgio Lourenço

Abstract:

The adaptation paradigm is a basic defining feature of pervasive computing systems. Adaptive systems must work efficiently in a smart environment while providing suitable information relevant to the user-system interaction. The key objective is to deduce the information that is needed as that information changes, so relying on fixed operational models would be inappropriate. This paper presents a study on developing an Intelligent Personal Assistant to help users interact with their Smart Environment. We propose Unsupervised and Language-Independent Adaptation through an Intelligent Speech Interface, together with a set of knowledge acquisition methods, namely Semantic Similarity and Unsupervised Learning.

Keywords: Knowledge Acquisition, unsupervised learning, intelligent personal assistants, intelligent speech interface, language-independent, association measures, symmetric word similarities, attributional word similarities

7 Unsupervised Assistive and Adaptive Intelligent Agent in Smart Environment

Authors: Sebastião Pais, João Casal, Ricardo Ponciano, Sérgio Lourenço

Abstract:

The adaptation paradigm is a basic defining feature of pervasive computing systems. Adaptive systems must work efficiently in a smart environment while providing suitable information relevant to the user-system interaction. The key objective is to deduce the information that is needed as that information changes, so relying on fixed operational models would be inappropriate. This paper presents a study on developing an Intelligent Personal Assistant to help users interact with their Smart Environment. We propose Unsupervised and Language-Independent Adaptation through an Intelligent Speech Interface, together with a set of knowledge acquisition methods, namely Semantic Similarity and Unsupervised Learning.
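The abstract does not say which association measure or similarity function is used. The keywords, however, name association measures and attributional word similarities. A common pairing of the two is sketched below, under stated assumptions: positive pointwise mutual information (PPMI) computed over sentence co-occurrence, and cosine similarity between PPMI context vectors. The approach is language-independent because it only compares token identities. The function names and the toy commands are hypothetical, and this is not presented as the authors' method.

```python
import math
from collections import Counter
from itertools import combinations

def sentence_ppmi(sentences):
    """Positive PMI for every word pair that co-occurs in at least one sentence."""
    n = len(sentences)
    token_sets = [set(s) for s in sentences]
    df = Counter(w for s in token_sets for w in s)  # number of sentences containing each word
    co = Counter(frozenset(p) for s in token_sets for p in combinations(sorted(s), 2))
    ppmi = {}
    for pair, c in co.items():
        x, y = tuple(pair)
        # PMI = log P(x, y) / (P(x) P(y)), probabilities estimated per sentence; clip at 0
        ppmi[pair] = max(0.0, math.log(c * n / (df[x] * df[y])))
    return ppmi

def context_vector(word, ppmi):
    """Attributional profile of `word`: its PPMI with every word it co-occurs with."""
    return {next(iter(pair - {word})): v for pair, v in ppmi.items() if word in pair}

def cosine(u, v):
    """Symmetric similarity between two sparse context vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical tokenized voice commands; any language's tokens would work the same way.
commands = [["turn", "on", "light"], ["turn", "off", "light"], ["open", "door"]]
ppmi = sentence_ppmi(commands)
sim_on_off = cosine(context_vector("on", ppmi), context_vector("off", ppmi))
sim_on_door = cosine(context_vector("on", ppmi), context_vector("door", ppmi))
```

Here "on" and "off" share the same contexts ("turn", "light"), so they come out as attributionally similar even though they never co-occur. "door" shares no context with "on", so its similarity to "on" is zero.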

Keywords: Knowledge Acquisition, unsupervised learning, intelligent personal assistants, intelligent speech interface, language-independent, association measures, symmetric word similarities, attributional word similarities

6 Hierarchical Clustering Algorithms in Data Mining

Authors: Z. Abdullah, A. R. Hamdan

Abstract:

Clustering is the process of grouping data objects into clusters so that objects in the same cluster are similar to each other. Clustering is one of the core areas of data mining, and clustering algorithms can be classified as partitioning, hierarchical, density-based, or grid-based. In this paper, we survey and review four major hierarchical clustering algorithms: CURE, ROCK, CHAMELEON, and BIRCH. This review of the state of the art of these algorithms is intended to help address current problems and to support the derivation of more robust and scalable clustering algorithms.
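The scalability of BIRCH, one of the four surveyed algorithms, rests on summarizing each subcluster as a clustering feature CF = (N, LS, SS). Here N is the number of points, LS is their linear sum, and SS is the sum of their squared norms. CFs are additive, so subclusters can be merged without revisiting the raw points. The sketch below covers only the CF arithmetic, not the CF-tree. The class and method names are hypothetical.

```python
import math
from dataclasses import dataclass
from functools import reduce

@dataclass
class ClusteringFeature:
    n: int             # number of points summarized
    ls: list[float]    # linear sum of the points (one entry per dimension)
    ss: float          # sum of squared Euclidean norms of the points

    @classmethod
    def from_point(cls, p):
        return cls(1, list(p), sum(x * x for x in p))

    def merge(self, other):
        # CF additivity: merging two subclusters just adds their summaries.
        return ClusteringFeature(self.n + other.n,
                                 [a + b for a, b in zip(self.ls, other.ls)],
                                 self.ss + other.ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # Root-mean-square distance of the points from the centroid,
        # computed from the summary alone: sqrt(SS/N - ||LS/N||^2).
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))

cf = reduce(ClusteringFeature.merge,
            map(ClusteringFeature.from_point, [(0.0, 0.0), (2.0, 0.0)]))
```

For the two points (0, 0) and (2, 0), the merged CF has centroid (1, 0) and radius 1. Both values are obtained without storing the points themselves.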

Keywords: Algorithms, Clustering, unsupervised learning, hierarchical

5 Classification of Forest Types Using Remote Sensing and Self-Organizing Maps

Authors: Wanderson Goncalves e Goncalves, José Alberto Silva de Sá

Abstract:

Human actions threaten the balance and conservation of the Amazon forest, so environmental monitoring services play an important role in preserving and maintaining this environment. This study classified forest types using three sources. The first is data from a forest inventory provided by the Instituto de Desenvolvimento Florestal e da Biodiversidade do Estado do Pará (IDEFLOR-BIO), covering approximately 600,000 hectares between the municipalities of Santarém, Juruti, and Aveiro in the state of Pará, Brazil. The second is bands 3, 4, and 5 of a Landsat TM satellite image, and the third is Self-Organizing Maps. The information in the satellite images was extracted with QGIS 2.8.1 Wien and used as the database for training the neural network. The midpoint of each forest inventory sample was linked to the images. The Digital Numbers of the corresponding pixels were then extracted to build the database used to train and test the classifier. The neural network was trained to classify two forest types in the Mamuru-Arapiuns glebes of Pará State: Lowland Rain Forest with Emergent Canopy (Dbe), and Lowland Rain Forest with Emergent Canopy plus Open Forest with palm trees (Dbe + Abp). The training set contained 400 examples (200 per class) and the test set 100 examples (50 per class), for a total of 500 examples. The classifier was built in Orange Data Mining 2.7 and evaluated using indicators derived from the confusion matrix. The results were considered satisfactory, with a global accuracy of 89%, a Kappa coefficient of 78%, and an F1 score of 0.88.
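The reported Kappa can be checked against the reported accuracy. Assume that Cohen's kappa was computed on the balanced 100-example test set, with confusion-matrix rows as the true classes (n_{k·} = 50 each) and columns as the predicted classes (n_{·k}, summing to N = 100). The chance agreement p_e is then 0.5 regardless of how the classifier distributes its predictions, so the 89% accuracy alone fixes kappa:

```latex
p_o = \frac{89}{100} = 0.89, \qquad
p_e = \sum_{k} \frac{n_{k\cdot}}{N}\,\frac{n_{\cdot k}}{N}
    = \frac{50}{100}\cdot\frac{n_{\cdot 1} + n_{\cdot 2}}{100} = 0.5, \qquad
\kappa = \frac{p_o - p_e}{1 - p_e} = \frac{0.89 - 0.5}{1 - 0.5} = 0.78
```

This agrees with the reported Kappa coefficient of 78%.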
The classifier's performance was also evaluated with a ROC (receiver operating characteristic) plot, which gave results close to ideal. This indicates a very good classifier and demonstrates the potential of this methodology to support ecosystem services, particularly in anthropogenic areas of the Amazon.
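The abstract does not give the SOM's configuration. The NumPy sketch below shows one common way to use a Self-Organizing Map as a classifier:

1. Train the map on the band Digital Number vectors without labels.
2. Label each neuron with the majority class of the training samples for which it is the best-matching unit.
3. Classify a new pixel by its nearest labeled neuron.

The map size, the learning-rate and neighbourhood schedules, and the synthetic data are all assumptions. This is not the Orange Data Mining 2.7 implementation the authors used.

```python
import numpy as np

def best_matching_unit(weights, x):
    """(row, col) of the neuron whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train_som(data, rows=5, cols=5, epochs=20, lr0=0.5, seed=0):
    """Train a rectangular SOM with a Gaussian neighbourhood that shrinks linearly."""
    rng = np.random.default_rng(seed)
    # Initialise weights uniformly inside the bounding box of the data.
    weights = rng.uniform(data.min(axis=0), data.max(axis=0),
                          size=(rows, cols, data.shape[1]))
    # Grid coordinates of every neuron, used for neighbourhood distances.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            frac = step / total
            lr = lr0 * (1.0 - frac)               # learning rate decays towards 0
            sigma = sigma0 * (1.0 - frac) + 0.5   # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def label_neurons(weights, data, labels):
    """Give each winning neuron the majority class of the samples it wins."""
    votes = {}
    for x, y in zip(data, labels):
        votes.setdefault(best_matching_unit(weights, x), []).append(y)
    return {k: max(set(v), key=v.count) for k, v in votes.items()}

def predict(weights, neuron_labels, x):
    """Class of the labeled neuron nearest to x."""
    keys = list(neuron_labels)
    dists = [np.linalg.norm(weights[k] - x) for k in keys]
    return neuron_labels[keys[int(np.argmin(dists))]]

# Synthetic stand-in for 3-band DN vectors (NOT the study's data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([40, 30, 60], 3, size=(200, 3)),
               rng.normal([70, 90, 110], 3, size=(200, 3))])
y = ["Dbe"] * 200 + ["Dbe + Abp"] * 200
W = train_som(X)
neuron_labels = label_neurons(W, X, y)
```

Strictly, the SOM training step is unsupervised. The class labels enter only when neurons are labeled, which is why a SOM-based classifier can still be scored with a confusion matrix.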

Keywords: Pattern Recognition, Computational Intelligence, unsupervised learning, Artificial Neural Network

4 Optimal Pricing Based on Real Estate Demand Data

Authors: Vanessa Kummer, Maik Meusel

Abstract:

Real estate demand estimates are typically derived from transaction data. However, in regions with excess demand, transactions are driven by supply and therefore do not indicate what people are actually looking for. To estimate the demand for housing in Switzerland, search subscriptions from all major Swiss real estate platforms are used. These data, however, suffer from missing information. For example, many users do not specify how many rooms they would like or what price they would be willing to pay. Economic analyses often use only the complete records. However, the proportion of complete data is usually rather small, so most of the information is discarded. Moreover, the complete records may be strongly biased, and the reason that data is missing may itself carry information, which this approach also ignores. An interesting question is therefore whether, for economic analyses such as this one, using the whole data set with imputed missing values adds value compared with using only the usually small share of complete data (the baseline). It is also interesting to see how the choice of algorithm affects that result. The missing data are imputed using unsupervised learning. Of the numerous unsupervised learning approaches, the most common ones are applied: clustering, principal component analysis, and neural network techniques. Because the model is trained iteratively on the imputed data and thereby incorporates the information from all the data, the bias of the first training set (the complete data) vanishes. Next, the performance of the algorithms is measured. This is done by randomly creating missing values in subsets of the data, estimating those values with the relevant algorithms under several parameter combinations, and comparing the estimates with the actual data.
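The evaluation protocol described above can be sketched as follows:

1. Hide a random fraction of known values.
2. Impute them.
3. Score the imputations against the hidden truth, here with RMSE, which is an assumed metric.

The imputer shown is iterative low-rank PCA/SVD imputation, one member of the PCA family the abstract mentions. At each round it refits on the currently imputed matrix. The function names, the rank grid, and the synthetic correlated columns are hypothetical.

```python
import numpy as np

def column_mean_impute(X):
    """Baseline: replace each missing entry with its column mean."""
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)

def iterative_pca_impute(X, rank=1, n_iter=50):
    """Iterative low-rank (PCA via SVD) imputation, refitted on the imputed matrix."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = column_mean_impute(X)  # start from the mean-imputed matrix
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        filled[missing] = approx[missing]  # only overwrite originally missing cells
    return filled

def masked_rmse(X_complete, imputer, frac=0.1, seed=0, **params):
    """Hide a random fraction of known values, impute them, score against the truth."""
    rng = np.random.default_rng(seed)
    hide = rng.random(X_complete.shape) < frac
    X_missing = X_complete.astype(float).copy()
    X_missing[hide] = np.nan
    X_hat = imputer(X_missing, **params)
    return float(np.sqrt(np.mean((X_hat[hide] - X_complete[hide]) ** 2)))

# Synthetic, strongly correlated columns standing in for complete search records.
rng = np.random.default_rng(0)
t = rng.uniform(1, 6, size=300)
X = np.column_stack([t, 2 * t + 1, 3 * t - 2]) + rng.normal(0, 0.05, size=(300, 3))
scores = {rank: masked_rmse(X, iterative_pca_impute, rank=rank) for rank in (1, 2)}
baseline = masked_rmse(X, column_mean_impute)
```

The parameter set with the lowest masked error would then be used to impute the genuinely missing values. For clustering or neural-network imputers, the same harness applies with a different `imputer` callable.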
After the optimal parameter set for each algorithm has been found, the missing values are imputed. The resulting data sets are then used to estimate the willingness to pay for real estate. This is done by fitting price distributions for properties with given characteristics, such as the region or the number of rooms. From these distributions, survival functions are computed to obtain the functional relationship between characteristics and selling probabilities. Comparing the survival functions shows that the estimates based on the imputed data sets do not differ significantly from each other, whereas the demand estimate derived from the baseline data does. This indicates that the baseline data set does not include all available information and is therefore not representative of the entire sample. Demand estimates derived from the whole data set are also much more accurate than the baseline estimate. Thus, to obtain optimal results, it is important to use all available data, even though this involves additional procedures such as data imputation.
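The abstract does not name the fitted distribution. The sketch below assumes a log-normal distribution for the willingness to pay within one segment, such as a region and room count. The survival function S(p) = P(WTP > p) is then read as the probability that a listing at price p finds a buyer. The pricing rule that maximizes p · S(p) is a hypothetical illustration of "optimal pricing", and the prices are invented.

```python
import math
import statistics

def fit_lognormal(prices):
    """Maximum-likelihood log-normal fit: mean and population std of log prices."""
    logs = [math.log(p) for p in prices]
    return statistics.fmean(logs), statistics.pstdev(logs)

def survival(price, mu, sigma):
    """S(p) = P(willingness to pay > p) = 1 - Phi((ln p - mu) / sigma)."""
    z = (math.log(price) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def revenue_maximising_price(mu, sigma, candidates):
    """Hypothetical pricing rule: maximise price x probability of sale."""
    return max(candidates, key=lambda p: p * survival(p, mu, sigma))

# Hypothetical stated maximum prices (CHF) for a single segment.
stated_max_prices = [780_000, 850_000, 920_000, 990_000, 1_050_000, 1_200_000, 1_350_000]
mu, sigma = fit_lognormal(stated_max_prices)
best = revenue_maximising_price(mu, sigma, range(600_000, 1_500_001, 10_000))
```

Fitting one such curve per data set, whether imputed or baseline, and overlaying them is one way to compare the survival functions as the abstract describes.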

Keywords: Real Estate, unsupervised learning, demand estimate, missing-data imputation

3 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well-known data mining technique used in pattern recognition and information retrieval. The dataset to be clustered can contain either categorical or numeric data, and each type of data has its own clustering algorithm: k-means for numeric datasets and k-modes for categorical datasets. A major problem in data mining applications is clustering categorical datasets, which is highly relevant in practice. One way to cluster categorical values is to transform the categorical attributes into numeric measures and apply the k-means algorithm directly instead of k-modes. In this paper, we experiment with such an approach, transforming categorical values into numeric ones using the relative frequency of each modality within its attribute. The proposed approach is compared with an existing method that transforms categorical datasets into binary values, and the scalability and accuracy of both methods are evaluated experimentally. The results show that the proposed method outperforms the binary method in all cases.
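The transformation described above replaces each categorical value by the fraction of records in which that value occurs for its attribute. Plain k-means is then run on the resulting numeric matrix. The sketch below is a minimal version under stated assumptions. The Lloyd's-style k-means is seeded from distinct rows, which is an assumption, and the records are hypothetical.

```python
from collections import Counter
import numpy as np

def relative_frequency_encode(rows):
    """Replace each categorical value by its relative frequency within its attribute."""
    n, n_attrs = len(rows), len(rows[0])
    freqs = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    return np.array([[freqs[j][row[j]] / n for j in range(n_attrs)] for row in rows])

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means, initialised from k distinct rows of X."""
    rng = np.random.default_rng(seed)
    distinct = np.unique(X, axis=0)
    if len(distinct) < k:
        raise ValueError("fewer distinct points than clusters")
    centers = distinct[rng.choice(len(distinct), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centre (squared Euclidean distance).
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Recompute centres; keep the old centre if a cluster ends up empty.
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

records = [("red", "S"), ("red", "S"), ("red", "S"), ("blue", "L")]
X = relative_frequency_encode(records)  # red and S -> 0.75; blue and L -> 0.25
labels, centers = kmeans(X, k=2)
```

One consequence of this encoding is worth keeping in mind. Two different modalities with the same frequency in an attribute map to the same number, so k-means cannot tell them apart. A binary (one-hot) encoding avoids this, at the cost of one column per modality.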

Keywords: Pattern Recognition, Knowledge Discovery, Clustering, unsupervised learning, k-means, categorical datasets

2 Unsupervised Part-of-Speech Tagging for Amharic Using K-Means Clustering

Authors: Zelalem Fantahun

Abstract:

Part-of-speech tagging is the process of assigning a part of speech or other lexical class marker to each word in naturally occurring text. It is one of the most fundamental tasks in almost all natural language processing. In natural language processing, the need for large amounts of manually annotated data is a knowledge acquisition bottleneck. Because Amharic is an under-resourced language, the limited availability of tagged corpora is a bottleneck for natural language processing, especially for POS tagging. A promising direction for tackling this problem is a system that does not require manually tagged data. In unsupervised learning, the learner is not provided with classifications. Instead, unsupervised algorithms look for similarities between pieces of data to determine whether they can be characterized as forming a group. This paper describes the development of an unsupervised part-of-speech tagger for Amharic using K-means clustering, motivated by the large amount of Amharic data produced in day-to-day activities. The tagger is developed in four phases. First, the unlabeled data (raw text) is divided into 10 folds and tokenized: the raw text is split into sentences and then into words. The second phase is feature extraction, which covers word frequency and the syntactic and morphological features of each word. The third phase is clustering. Among the available clustering algorithms, K-means was selected and implemented to group similar words together. The fourth phase is mapping, in which each cluster is inspected carefully and assigned its most common tag. The study identifies two features that can distinguish one part of speech from others, morphological features and positional information. It also shows that unsupervised learning can be used for Amharic POS tagging.
To improve the performance of the unsupervised part-of-speech tagger, further features that were not included in this study, such as semantic information, need to be incorporated. Finally, the experimental results show that the system achieves a maximum accuracy of 81%.
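The mapping phase above assigns each K-means cluster the tag that is most common among its members. Accuracy is then measured by comparing each token's mapped tag with its gold tag. A minimal sketch of those two steps follows. The function names and the five-token example are hypothetical.

```python
from collections import Counter, defaultdict

def majority_tag_mapping(cluster_ids, gold_tags):
    """Map each cluster to the tag that occurs most often among its members."""
    counts = defaultdict(Counter)
    for cluster, tag in zip(cluster_ids, gold_tags):
        counts[cluster][tag] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in counts.items()}

def tagging_accuracy(cluster_ids, gold_tags, mapping):
    """Share of tokens whose cluster's mapped tag equals the gold tag."""
    correct = sum(mapping.get(c) == t for c, t in zip(cluster_ids, gold_tags))
    return correct / len(gold_tags)

# Hypothetical cluster assignments (e.g. from K-means) and gold tags for 5 tokens.
clusters = [0, 0, 0, 1, 1]
gold = ["N", "N", "V", "V", "V"]
mapping = majority_tag_mapping(clusters, gold)
accuracy = tagging_accuracy(clusters, gold, mapping)
```

If the mapping is built and evaluated on the same tokens, the accuracy is optimistic. Building it on some folds and scoring on a held-out fold fits naturally with the 10-fold split the abstract mentions.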

Keywords: unsupervised learning, k-means, POS tagging, Amharic

1 Unsupervised Images Generation Based on Sloan Digital Sky Survey with Deep Convolutional Generative Neural Networks

Authors: Guanghua Zhang, Fubao Wang, Weijun Duan

Abstract:

Convolutional neural networks (CNNs) have attracted increasing attention in recent years, especially in computer vision and image classification. However, unsupervised learning with CNNs has received less attention than supervised learning. In this work, we use a powerful new tool, deep convolutional generative adversarial networks (DCGANs), to generate images from the Sloan Digital Sky Survey. Training on various star and galaxy images shows that both the generator and the discriminator are effective for unsupervised learning. We also ran several experiments to choose hyperparameter values that help stabilize the training process and ensure good output quality.
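The abstract does not list the architecture or the chosen hyperparameters. For reference, the standard DCGAN generator upsamples with transposed convolutions. With kernel 4, stride 2 and padding 1, each layer exactly doubles the spatial size, taking a 4×4 projection of the noise vector to a 64×64 image. The output-size formula below is the one documented for PyTorch's `ConvTranspose2d` with dilation 1 and no output padding. The settings dictionary records the values reported in the original DCGAN paper (Radford et al.), which are not necessarily those selected in this study. The function and variable names are hypothetical.

```python
def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a 2-D transposed convolution (dilation 1, output_padding 0)."""
    return (size - 1) * stride - 2 * padding + kernel

def generator_spatial_sizes(start=4, n_upsamples=4):
    """Feature-map sizes through a DCGAN-style generator, starting from a 4x4 projection."""
    sizes = [start]
    for _ in range(n_upsamples):
        sizes.append(conv_transpose_out(sizes[-1]))
    return sizes

# Settings reported in the original DCGAN paper; a common starting point for tuning.
DCGAN_PAPER_SETTINGS = {
    "optimizer": "Adam",
    "learning_rate": 2e-4,
    "adam_beta1": 0.5,
    "batch_size": 128,
    "leaky_relu_slope": 0.2,   # discriminator activations
    "weight_init_std": 0.02,   # zero-mean normal initialisation
}

sizes = generator_spatial_sizes()
```

The 1×1 noise input can itself be lifted to 4×4 by a transposed convolution with kernel 4, stride 1 and padding 0, which is how many DCGAN implementations begin the generator.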

Keywords: unsupervised learning, generator, convolutional neural network, discriminator
