Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5042

Search results for: unsupervised learning

5042 Unsupervised Images Generation Based on Sloan Digital Sky Survey with Deep Convolutional Generative Neural Networks

Authors: Guanghua Zhang, Fubao Wang, Weijun Duan


Convolution neural network (CNN) has attracted more and more attention on recent years. Especially in the field of computer vision and image classification. However, unsupervised learning with CNN has received less attention than supervised learning. In this work, we use a new powerful tool which is deep convolutional generative adversarial networks (DCGANs) to generate images from Sloan Digital Sky Survey. Training by various star and galaxy images, it shows that both the generator and the discriminator are good for unsupervised learning. In this paper, we also took several experiments to choose the best value for hyper-parameters and which could help to stabilize the training process and promise a good quality of the output.

Keywords: convolution neural network, discriminator, generator, unsupervised learning

Procedia PDF Downloads 172
5041 Modern Machine Learning Conniptions for Automatic Speech Recognition

Authors: S. Jagadeesh Kumar


This expose presents a luculent of recent machine learning practices as employed in the modern and as pertinent to prospective automatic speech recognition schemes. The aspiration is to promote additional traverse ablution among the machine learning and automatic speech recognition factions that have transpired in the precedent. The manuscript is structured according to the chief machine learning archetypes that are furthermore trendy by now or have latency for building momentous hand-outs to automatic speech recognition expertise. The standards offered and convoluted in this article embraces adaptive and multi-task learning, active learning, Bayesian learning, discriminative learning, generative learning, supervised and unsupervised learning. These learning archetypes are aggravated and conferred in the perspective of automatic speech recognition tools and functions. This manuscript bequeaths and surveys topical advances of deep learning and learning with sparse depictions; further limelight is on their incessant significance in the evolution of automatic speech recognition.

Keywords: automatic speech recognition, deep learning methods, machine learning archetypes, Bayesian learning, supervised and unsupervised learning

Procedia PDF Downloads 335
5040 Identification of Biological Pathways Causative for Breast Cancer Using Unsupervised Machine Learning

Authors: Karthik Mittal


This study performs an unsupervised machine learning analysis to find clusters of related SNPs which highlight biological pathways that are important for the biological mechanisms of breast cancer. Studying genetic variations in isolation is illogical because these genetic variations are known to modulate protein production and function; the downstream effects of these modifications on biological outcomes are highly interconnected. After extracting the SNPs and their effect on different types of breast cancer using the MRBase library, two unsupervised machine learning clustering algorithms were implemented on the genetic variants: a k-means clustering algorithm and a hierarchical clustering algorithm; furthermore, principal component analysis was executed to visually represent the data. These algorithms specifically used the SNP’s beta value on the three different types of breast cancer tested in this project (estrogen-receptor positive breast cancer, estrogen-receptor negative breast cancer, and breast cancer in general) to perform this clustering. Two significant genetic pathways validated the clustering produced by this project: the MAPK signaling pathway and the connection between the BRCA2 gene and the ESR1 gene. This study provides the first proof of concept showing the importance of unsupervised machine learning in interpreting GWAS summary statistics.

Keywords: breast cancer, computational biology, unsupervised machine learning, k-means, PCA

Procedia PDF Downloads 15
5039 Unsupervised Learning of Spatiotemporally Coherent Metrics

Authors: Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun


Current state-of-the-art classification and detection algorithms rely on supervised training. In this work we study unsupervised feature learning in the context of temporally coherent video data. We focus on feature learning from unlabeled video data, using the assumption that adjacent video frames contain semantically similar information. This assumption is exploited to train a convolutional pooling auto-encoder regularized by slowness and sparsity. We establish a connection between slow feature learning to metric learning and show that the trained encoder can be used to define a more temporally and semantically coherent metric.

Keywords: machine learning, pattern clustering, pooling, classification

Procedia PDF Downloads 348
5038 Double Clustering as an Unsupervised Approach for Order Picking of Distributed Warehouses

Authors: Hsin-Yi Huang, Ming-Sheng Liu, Jiun-Yan Shiau


Planning the order picking lists of warehouses to achieve when the costs associated with logistics on the operational performance is a significant challenge. In e-commerce era, this task is especially important productive processes are high. Nowadays, many order planning techniques employ supervised machine learning algorithms. However, the definition of which features should be processed by such algorithms is not a simple task, being crucial to the proposed technique’s success. Against this background, we consider whether unsupervised algorithms can enhance the planning of order-picking lists. A Zone2 picking approach, which is based on using clustering algorithms twice, is developed. A simplified example is given to demonstrate the merit of our approach.

Keywords: order picking, warehouse, clustering, unsupervised learning

Procedia PDF Downloads 72
5037 Unsupervised Assistive and Adaptative Intelligent Agent in Smart Enviroment

Authors: Sebastião Pais, João Casal, Ricardo Ponciano, Sérgio Lorenço


The adaptation paradigm is a basic defining feature for pervasive computing systems. Adaptation systems must work efficiently in a smart environment while providing suitable information relevant to the user system interaction. The key objective is to deduce the information needed information changes. Therefore relying on fixed operational models would be inappropriate. This paper presents a study on developing an Intelligent Personal Assistant to assist the user in interacting with their Smart Environment. We propose an Unsupervised and Language-Independent Adaptation through Intelligent Speech Interface and a set of methods of Acquiring Knowledge, namely Semantic Similarity and Unsupervised Learning.

Keywords: intelligent personal assistants, intelligent speech interface, unsupervised learning, language-independent, knowledge acquisition, association measures, symmetric word similarities, attributional word similarities

Procedia PDF Downloads 415
5036 Unsupervised Assistive and Adaptive Intelligent Agent in Smart Environment

Authors: Sebastião Pais, João Casal, Ricardo Ponciano, Sérgio Lourenço


The adaptation paradigm is a basic defining feature for pervasive computing systems. Adaptation systems must work efficiently in smart environment while providing suitable information relevant to the user system interaction. The key objective is to deduce the information needed information changes. Therefore, relying on fixed operational models would be inappropriate. This paper presents a study on developing a Intelligent Personal Assistant to assist the user in interacting with their Smart Environment. We propose a Unsupervised and Language-Independent Adaptation through Intelligent Speech Interface and a set of methods of Acquiring Knowledge, namely Semantic Similarity and Unsupervised Learning.

Keywords: intelligent personal assistants, intelligent speech interface, unsupervised learning, language-independent, knowledge acquisition, association measures, symmetric word similarities, attributional word similarities

Procedia PDF Downloads 493
5035 Unsupervised Feature Learning by Pre-Route Simulation of Auto-Encoder Behavior Model

Authors: Youngjae Jin, Daeshik Kim


This paper describes a cycle accurate simulation results of weight values learned by an auto-encoder behavior model in terms of pre-route simulation. Given the results we visualized the first layer representations with natural images. Many common deep learning threads have focused on learning high-level abstraction of unlabeled raw data by unsupervised feature learning. However, in the process of handling such a huge amount of data, the learning method’s computation complexity and time limited advanced research. These limitations came from the fact these algorithms were computed by using only single core CPUs. For this reason, parallel-based hardware, FPGAs, was seen as a possible solution to overcome these limitations. We adopted and simulated the ready-made auto-encoder to design a behavior model in Verilog HDL before designing hardware. With the auto-encoder behavior model pre-route simulation, we obtained the cycle accurate results of the parameter of each hidden layer by using MODELSIM. The cycle accurate results are very important factor in designing a parallel-based digital hardware. Finally this paper shows an appropriate operation of behavior model based pre-route simulation. Moreover, we visualized learning latent representations of the first hidden layer with Kyoto natural image dataset.

Keywords: auto-encoder, behavior model simulation, digital hardware design, pre-route simulation, Unsupervised feature learning

Procedia PDF Downloads 364
5034 Unsupervised Part-of-Speech Tagging for Amharic Using K-Means Clustering

Authors: Zelalem Fantahun


Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word into naturally occurring text. Part-of-speech tagging is the most fundamental and basic task almost in all natural language processing. In natural language processing, the problem of providing large amount of manually annotated data is a knowledge acquisition bottleneck. Since, Amharic is one of under-resourced language, the availability of tagged corpus is the bottleneck problem for natural language processing especially for POS tagging. A promising direction to tackle this problem is to provide a system that does not require manually tagged data. In unsupervised learning, the learner is not provided with classifications. Unsupervised algorithms seek out similarity between pieces of data in order to determine whether they can be characterized as forming a group. This paper explicates the development of unsupervised part-of-speech tagger using K-Means clustering for Amharic language since large amount of data is produced in day-to-day activities. In the development of the tagger, the following procedures are followed. First, the unlabeled data (raw text) is divided into 10 folds and tokenization phase takes place; at this level, the raw text is chunked at sentence level and then into words. The second phase is feature extraction which includes word frequency, syntactic and morphological features of a word. The third phase is clustering. Among different clustering algorithms, K-means is selected and implemented in this study that brings group of similar words together. The fourth phase is mapping, which deals with looking at each cluster carefully and the most common tag is assigned to a group. This study finds out two features that are capable of distinguishing one part-of-speech from others these are morphological feature and positional information and show that it is possible to use unsupervised learning for Amharic POS tagging. In order to increase performance of the unsupervised part-of-speech tagger, there is a need to incorporate other features that are not included in this study, such as semantic related information. Finally, based on experimental result, the performance of the system achieves a maximum of 81% accuracy.

Keywords: POS tagging, Amharic, unsupervised learning, k-means

Procedia PDF Downloads 314
5033 Contrastive Learning for Unsupervised Object Segmentation in Sequential Images

Authors: Tian Zhang


Unsupervised object segmentation aims at segmenting objects in sequential images and obtaining the mask of each object without any manual intervention. Unsupervised segmentation remains a challenging task due to the lack of prior knowledge about these objects. Previous methods often require manually specifying the action of each object, which is often difficult to obtain. Instead, this paper does not need action information of objects and automatically learns the actions and relations among objects from the structured environment. To obtain the object segmentation of sequential images, the relationships between objects and images are extracted to infer the action and interaction of objects based on the multi-head attention mechanism. Three types of objects’ relationships in the object segmentation task are proposed: the relationship between objects in the same frame, the relationship between objects in two frames, and the relationship between objects and historical information. Based on these relationships, the proposed model (1) is effective in multiple objects segmentation tasks, (2) just needs images as input, and (3) produces better segmentation results as more relationships are considered. The experimental results on multiple datasets show that this paper’s method achieves state-of-art performance. The quantitative and qualitative analyses of the result are conducted. The proposed method could be easily extended to other similar applications.

Keywords: unsupervised object segmentation, attention mechanism, contrastive learning, structured environment

Procedia PDF Downloads 10
5032 Deep Learning Based Unsupervised Sport Scene Recognition and Highlights Generation

Authors: Ksenia Meshkova


With increasing amount of multimedia data, it is very important to automate and speed up the process of obtaining meta. This process means not just recognition of some object or its movement, but recognition of the entire scene versus separate frames and having timeline segmentation as a final result. Labeling datasets is time consuming, besides, attributing characteristics to particular scenes is clearly difficult due to their nature. In this article, we will consider autoencoders application to unsupervised scene recognition and clusterization based on interpretable features. Further, we will focus on particular types of auto encoders that relevant to our study. We will take a look at the specificity of deep learning related to information theory and rate-distortion theory and describe the solutions empowering poor interpretability of deep learning in media content processing. As a conclusion, we will present the results of the work of custom framework, based on autoencoders, capable of scene recognition as was deeply studied above, with highlights generation resulted out of this recognition. We will not describe in detail the mathematical description of neural networks work but will clarify the necessary concepts and pay attention to important nuances.

Keywords: neural networks, computer vision, representation learning, autoencoders

Procedia PDF Downloads 50
5031 Feature Based Unsupervised Intrusion Detection

Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein


The goal of a network-based intrusion detection system is to classify activities of network traffics into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning plays an important role in many sciences; including intrusion detection system (IDS) using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection that helps in improving the efficiency, performance and prediction rate of proposed approach. This paper applies unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a 2 class classifications have been implemented (Normal, Attack). Weka framework which is a java based open source software consists of a collection of machine learning algorithms for data mining tasks has been used in the testing process. The experimental results show that the proposed approach is very accurate with low false positive rate and high true positive rate and it takes less learning time in comparison with using the full features of the dataset with the same algorithm.

Keywords: information gain (IG), intrusion detection system (IDS), k-means clustering, Weka

Procedia PDF Downloads 213
5030 Evolving Knowledge Extraction from Online Resources

Authors: Zhibo Xiao, Tharini Nayanika de Silva, Kezhi Mao


In this paper, we present an evolving knowledge extraction system named AKEOS (Automatic Knowledge Extraction from Online Sources). AKEOS consists of two modules, including a one-time learning module and an evolving learning module. The one-time learning module takes in user input query, and automatically harvests knowledge from online unstructured resources in an unsupervised way. The output of the one-time learning is a structured vector representing the harvested knowledge. The evolving learning module automatically schedules and performs repeated one-time learning to extract the newest information and track the development of an event. In addition, the evolving learning module summarizes the knowledge learned at different time points to produce a final knowledge vector about the event. With the evolving learning, we are able to visualize the key information of the event, discover the trends, and track the development of an event.

Keywords: evolving learning, knowledge extraction, knowledge graph, text mining

Procedia PDF Downloads 378
5029 A Family of Distributions on Learnable Problems without Uniform Convergence

Authors: César Garza


In supervised binary classification and regression problems, it is well-known that learnability is equivalent to a uniform convergence of the hypothesis class, and if a problem is learnable, it is learnable by empirical risk minimization. For the general learning setting of unsupervised learning tasks, there are non-trivial learning problems where uniform convergence does not hold. We present here the task of learning centers of mass with an extra feature that “activates” some of the coordinates over the unit ball in a Hilbert space. We show that the learning problem is learnable under a stable RLM rule. We introduce a family of distributions over the domain space with some mild restrictions for which the sample complexity of uniform convergence for these problems must grow logarithmically with the dimension of the Hilbert space. If we take this dimension to infinity, we obtain a learnable problem for which the uniform convergence property fails for a vast family of distributions.

Keywords: statistical learning theory, learnability, uniform convergence, stability, regularized loss minimization

Procedia PDF Downloads 4
5028 Biofilm Text Classifiers Developed Using Natural Language Processing and Unsupervised Learning Approach

Authors: Kanika Gupta, Ashok Kumar


Biofilms are dense, highly hydrated cell clusters that are irreversibly attached to a substratum, to an interface or to each other, and are embedded in a self-produced gelatinous matrix composed of extracellular polymeric substances. Research in biofilm field has become very significant, as biofilm has shown high mechanical resilience and resistance to antibiotic treatment and constituted as a significant problem in both healthcare and other industry related to microorganisms. The massive information both stated and hidden in the biofilm literature are growing exponentially therefore it is not possible for researchers and practitioners to automatically extract and relate information from different written resources. So, the current work proposes and discusses the use of text mining techniques for the extraction of information from biofilm literature corpora containing 34306 documents. It is very difficult and expensive to obtain annotated material for biomedical literature as the literature is unstructured i.e. free-text. Therefore, we considered unsupervised approach, where no annotated training is necessary and using this approach we developed a system that will classify the text on the basis of growth and development, drug effects, radiation effects, classification and physiology of biofilms. For this, a two-step structure was used where the first step is to extract keywords from the biofilm literature using a metathesaurus and standard natural language processing tools like Rapid Miner_v5.3 and the second step is to discover relations between the genes extracted from the whole set of biofilm literature using pubmed.mineR_v1.0.11. We used unsupervised approach, which is the machine learning task of inferring a function to describe hidden structure from 'unlabeled' data, in the above-extracted datasets to develop classifiers using WinPython-64 bit_v3.5.4.0Qt5 and R studio_v0.99.467 packages which will automatically classify the text by using the mentioned sets. The developed classifiers were tested on a large data set of biofilm literature which showed that the unsupervised approach proposed is promising as well as suited for a semi-automatic labeling of the extracted relations. The entire information was stored in the relational database which was hosted locally on the server. The generated biofilm vocabulary and genes relations will be significant for researchers dealing with biofilm research, making their search easy and efficient as the keywords and genes could be directly mapped with the documents used for database development.

Keywords: biofilms literature, classifiers development, text mining, unsupervised learning approach, unstructured data, relational database

Procedia PDF Downloads 72
5027 Visualization-Based Feature Extraction for Classification in Real-Time Interaction

Authors: Ágoston Nagy


This paper introduces a method of using unsupervised machine learning to visualize the feature space of a dataset in 2D, in order to find most characteristic segments in the set. After dimension reduction, users can select clusters by manual drawing. Selected clusters are recorded into a data model that is used for later predictions, based on realtime data. Predictions are made with supervised learning, using Gesture Recognition Toolkit. The paper introduces two example applications: a semantic audio organizer for analyzing incoming sounds, and a gesture database organizer where gestural data (recorded by a Leap motion) is visualized for further manipulation.

Keywords: gesture recognition, machine learning, real-time interaction, visualization

Procedia PDF Downloads 272
5026 Using a Neural Network to Identify Pornographic Materials Involving Minors: Case Study

Authors: Wojciech Oronowicz-Jaśkowiak, Piotr Girdwoyń, Piotr Wasilewski


The aim of the article is to present the application of a neural network (using both supervised and unsupervised learning) to identify pornographic materials involving minors. In judicial practice, visual and audiovisual materials depicting minors engaging in sexual activity or presenting various degrees of nudity are considered pornographic materials involving minors. This will be presented as a case study – a summary of a forensic sexological opinion with an appropriate theoretical commentary and description of research methodology. The subject of the proceedings was the storage of pornographic materials involving minors; therefore, a forensic opinion of a sexology expert was sought in order to determine whether the prosecuted could be diagnosed with sexual preference disorders. The character and type of the secured materials were an important clinical premise for the expert. As a result of the preliminary analysis, it was found that the secured material contained a very large number of photographs and videos, exceeding the possibility of their manual evaluation. For this reason, it was decided to use a neural network to identify photographs corresponding to a given type of pornography. A neural network was created using the library, an original architecture, and ResNet152 architecture, and both supervised learning (classification of pornographic materials into groups involving only adults and involving minors) and unsupervised learning techniques (using the modified SOM algorithm). This solution allowed to verify one of the theses put forward by the accused. To the authors’ knowledge, this is the first application of a neural network in forensic sexology.

Keywords: child sexual abuse material, forensic sexology, data science, CSAM, COPINE

Procedia PDF Downloads 13
5025 Feature Evaluation Based on Random Subspace and Multiple-K Ensemble

Authors: Jaehong Yu, Seoung Bum Kim


Clustering analysis can facilitate the extraction of intrinsic patterns in a dataset and reveal its natural groupings without requiring class information. For effective clustering analysis in high dimensional datasets, unsupervised dimensionality reduction is an important task. Unsupervised dimensionality reduction can generally be achieved by feature extraction or feature selection. In many situations, feature selection methods are more appropriate than feature extraction methods because of their clear interpretation with respect to the original features. The unsupervised feature selection can be categorized as feature subset selection and feature ranking method, and we focused on unsupervised feature ranking methods which evaluate the features based on their importance scores. Recently, several unsupervised feature ranking methods were developed based on ensemble approaches to achieve their higher accuracy and stability. However, most of the ensemble-based feature ranking methods require the true number of clusters. Furthermore, these algorithms evaluate the feature importance depending on the ensemble clustering solution, and they produce undesirable evaluation results if the clustering solutions are inaccurate. To address these limitations, we proposed an ensemble-based feature ranking method with random subspace and multiple-k ensemble (FRRM). The proposed FRRM algorithm evaluates the importance of each feature with the random subspace ensemble, and all evaluation results are combined with the ensemble importance scores. Moreover, FRRM does not require the determination of the true number of clusters in advance through the use of the multiple-k ensemble idea. Experiments on various benchmark datasets were conducted to examine the properties of the proposed FRRM algorithm and to compare its performance with that of existing feature ranking methods. The experimental results demonstrated that the proposed FRRM outperformed the competitors.

Keywords: clustering analysis, multiple-k ensemble, random subspace-based feature evaluation, unsupervised feature ranking

Procedia PDF Downloads 238
5024 Optimal Pricing Based on Real Estate Demand Data

Authors: Vanessa Kummer, Maik Meusel


Real estate demand estimates are typically derived from transaction data. However, in regions with excess demand, transactions are driven by supply and therefore do not indicate what people are actually looking for. To estimate the demand for housing in Switzerland, search subscriptions from all important Swiss real estate platforms are used. These data do, however, suffer from missing information—for example, many users do not specify how many rooms they would like or what price they would be willing to pay. In economic analyses, it is often the case that only complete data is used. Usually, however, the proportion of complete data is rather small which leads to most information being neglected. Also, the data might have a strong distortion if it is complete. In addition, the reason that data is missing might itself also contain information, which is however ignored with that approach. An interesting issue is, therefore, if for economic analyses such as the one at hand, there is an added value by using the whole data set with the imputed missing values compared to using the usually small percentage of complete data (baseline). Also, it is interesting to see how different algorithms affect that result. The imputation of the missing data is done using unsupervised learning. Out of the numerous unsupervised learning approaches, the most common ones, such as clustering, principal component analysis, or neural networks techniques are applied. By training the model iteratively on the imputed data and, thereby, including the information of all data into the model, the distortion of the first training set—the complete data—vanishes. In a next step, the performances of the algorithms are measured. This is done by randomly creating missing values in subsets of the data, estimating those values with the relevant algorithms and several parameter combinations, and comparing the estimates to the actual data. After having found the optimal parameter set for each algorithm, the missing values are being imputed. Using the resulting data sets, the next step is to estimate the willingness to pay for real estate. This is done by fitting price distributions for real estate properties with certain characteristics, such as the region or the number of rooms. Based on these distributions, survival functions are computed to obtain the functional relationship between characteristics and selling probabilities. Comparing the survival functions shows that estimates which are based on imputed data sets do not differ significantly from each other; however, the demand estimate that is derived from the baseline data does. This indicates that the baseline data set does not include all available information and is therefore not representative for the entire sample. Also, demand estimates derived from the whole data set are much more accurate than the baseline estimation. Thus, in order to obtain optimal results, it is important to make use of all available data, even though it involves additional procedures such as data imputation.

Keywords: demand estimate, missing-data imputation, real estate, unsupervised learning

Procedia PDF Downloads 202
5023 An Empirical Study to Predict Myocardial Infarction Using K-Means and Hierarchical Clustering

Authors: Md. Minhazul Islam, Shah Ashisul Abed Nipun, Majharul Islam, Md. Abdur Rakib Rahat, Jonayet Miah, Salsavil Kayyum, Anwar Shadaab, Faiz Al Faisal


The target of this research is to predict Myocardial Infarction using unsupervised Machine Learning algorithms. Myocardial Infarction Prediction related to heart disease is a challenging factor faced by doctors & hospitals. In this prediction, accuracy of the heart disease plays a vital role. From this concern, the authors have analyzed on a myocardial dataset to predict myocardial infarction using some popular Machine Learning algorithms K-Means and Hierarchical Clustering. This research includes a collection of data and the classification of data using Machine Learning Algorithms. The authors collected 345 instances along with 26 attributes from different hospitals in Bangladesh. This data have been collected from patients suffering from myocardial infarction along with other symptoms. This model would be able to find and mine hidden facts from historical Myocardial Infarction cases. The aim of this study is to analyze the accuracy level to predict Myocardial Infarction by using Machine Learning techniques.

Keywords: Machine Learning, K-means, Hierarchical Clustering, Myocardial Infarction, Heart Disease

Procedia PDF Downloads 128
5022 Component Based Testing Using Clustering and Support Vector Machine

Authors: Iqbaldeep Kaur, Amarjeet Kaur


Software Reusability is important part of software development. So component based software development in case of software testing has gained a lot of practical importance in the field of software engineering from academic researcher and also from software development industry perspective. Finding test cases for efficient reuse of test cases is one of the important problems aimed by researcher. Clustering reduce the search space, reuse test cases by grouping similar entities according to requirements ensuring reduced time complexity as it reduce the search time for retrieval the test cases. In this research paper we proposed approach for re-usability of test cases by unsupervised approach. In unsupervised learning we proposed k-mean and Support Vector Machine. We have designed the algorithm for requirement and test case document clustering according to its tf-idf vector space and the output is set of highly cohesive pattern groups.

Keywords: software testing, reusability, clustering, k-mean, SVM

Procedia PDF Downloads 334
5021 A Review of Machine Learning for Big Data

Authors: Devatha Kalyan Kumar, Aravindraj D., Sadathulla A.


Big data are now rapidly expanding in all engineering and science and many other domains. The potential of large or massive data is undoubtedly significant, make sense to require new ways of thinking and learning techniques to address the various big data challenges. Machine learning is continuously unleashing its power in a wide range of applications. In this paper, the latest advances and advancements in the researches on machine learning for big data processing. First, the machine learning techniques methods in recent studies, such as deep learning, representation learning, transfer learning, active learning and distributed and parallel learning. Then focus on the challenges and possible solutions of machine learning for big data.

Keywords: active learning, big data, deep learning, machine learning

Procedia PDF Downloads 309
5020 Consumer Load Profile Determination with Entropy-Based K-Means Algorithm

Authors: Ioannis P. Panapakidis, Marios N. Moschakis


With the continuous increment of smart meter installations across the globe, the need for processing of the load data is evident. Clustering-based load profiling is built upon the utilization of unsupervised machine learning tools for the purpose of formulating the typical load curves or load profiles. The most commonly used algorithm in the load profiling literature is the K-means. While the algorithm has been successfully tested in a variety of applications, its drawback is the strong dependence in the initialization phase. This paper proposes a novel modified form of the K-means that addresses the aforementioned problem. Simulation results indicate the superiority of the proposed algorithm compared to the K-means.

Keywords: clustering, load profiling, load modeling, machine learning, energy efficiency and quality

Procedia PDF Downloads 81
5019 Leveraging Learning Analytics to Inform Learning Design in Higher Education

Authors: Mingming Jiang


This literature review aims to offer an overview of existing research on learning analytics and learning design, the alignment between the two, and how learning analytics has been leveraged to inform learning design in higher education. Current research suggests a need to create more alignment and integration between learning analytics and learning design in order to not only ground learning analytics on learning sciences but also enable data-driven decisions in learning design to improve learning outcomes. In addition, multiple conceptual frameworks have been proposed to enhance the synergy and alignment between learning analytics and learning design. Future research should explore this synergy further in the unique context of higher education, identifying learning analytics metrics in higher education that can offer insight into learning processes, evaluating the effect of learning analytics outcomes on learning design decision-making in higher education, and designing learning environments in higher education that make the capturing and deployment of learning analytics outcomes more efficient.

Keywords: learning analytics, learning design, big data in higher education, online learning environments

Procedia PDF Downloads 8
5018 A Two-Step Framework for Unsupervised Speaker Segmentation Using BIC and Artificial Neural Network

Authors: Ahmad Alwosheel, Ahmed Alqaraawi


This work proposes a new speaker segmentation approach for two speakers. It is an online approach that does not require a prior information about speaker models. It has two phases, a conventional approach such as unsupervised BIC-based is utilized in the first phase to detect speaker changes and train a Neural Network, while in the second phase, the output trained parameters from the Neural Network are used to predict next incoming audio stream. Using this approach, a comparable accuracy to similar BIC-based approaches is achieved with a significant improvement in terms of computation time.

Keywords: artificial neural network, diarization, speaker indexing, speaker segmentation

Procedia PDF Downloads 362
5017 Advancing the Analysis of Physical Activity Behaviour in Diverse, Rapidly Evolving Populations: Using Unsupervised Machine Learning to Segment and Cluster Accelerometer Data

Authors: Christopher Thornton, Niina Kolehmainen, Kianoush Nazarpour


Background: Accelerometers are widely used to measure physical activity behavior, including in children. The traditional method for processing acceleration data uses cut points, relying on calibration studies that relate the quantity of acceleration to energy expenditure. As these relationships do not generalise across diverse populations, they must be parametrised for each subpopulation, including different age groups, which is costly and makes studies across diverse populations difficult. A data-driven approach that allows physical activity intensity states to emerge from the data under study without relying on parameters derived from external populations offers a new perspective on this problem and potentially improved results. We evaluated the data-driven approach in a diverse population with a range of rapidly evolving physical and mental capabilities, namely very young children (9-38 months old), where this new approach may be particularly appropriate. Methods: We applied an unsupervised machine learning approach (a hidden semi-Markov model - HSMM) to segment and cluster the accelerometer data recorded from 275 children with a diverse range of physical and cognitive abilities. The HSMM was configured to identify a maximum of six physical activity intensity states and the output of the model was the time spent by each child in each of the states. For comparison, we also processed the accelerometer data using published cut points with available thresholds for the population. This provided us with time estimates for each child’s sedentary (SED), light physical activity (LPA), and moderate-to-vigorous physical activity (MVPA). Data on the children’s physical and cognitive abilities were collected using the Paediatric Evaluation of Disability Inventory (PEDI-CAT). Results: The HSMM identified two inactive states (INS, comparable to SED), two lightly active long duration states (LAS, comparable to LPA), and two short-duration high-intensity states (HIS, comparable to MVPA). Overall, the children spent on average 237/392 minutes per day in INS/SED, 211/129 minutes per day in LAS/LPA, and 178/168 minutes in HIS/MVPA. We found that INS overlapped with 53% of SED, LAS overlapped with 37% of LPA and HIS overlapped with 60% of MVPA. We also looked at the correlation between the time spent by a child in either HIS or MVPA and their physical and cognitive abilities. We found that HIS was more strongly correlated with physical mobility (R²HIS =0.5, R²MVPA= 0.28), cognitive ability (R²HIS =0.31, R²MVPA= 0.15), and age (R²HIS =0.15, R²MVPA= 0.09), indicating increased sensitivity to key attributes associated with a child’s mobility. Conclusion: An unsupervised machine learning technique can segment and cluster accelerometer data according to the intensity of movement at a given time. It provides a potentially more sensitive, appropriate, and cost-effective approach to analysing physical activity behavior in diverse populations, compared to the current cut points approach. This, in turn, supports research that is more inclusive across diverse populations.

Keywords: physical activity, machine learning, under 5s, disability, accelerometer

Procedia PDF Downloads 96
5016 Unsupervised Neural Architecture for Saliency Detection

Authors: Natalia Efremova, Sergey Tarasenko


We propose a novel neural network architecture for visual saliency detections, which utilizes neuro physiologically plausible mechanisms for extraction of salient regions. The model has been significantly inspired by recent findings from neuro physiology and aimed to simulate the bottom-up processes of human selective attention. Two types of features were analyzed: color and direction of maximum variance. The mechanism we employ for processing those features is PCA, implemented by means of normalized Hebbian learning and the waves of spikes. To evaluate performance of our model we have conducted psychological experiment. Comparison of simulation results with those of experiment indicates good performance of our model.

Keywords: neural network models, visual saliency detection, normalized Hebbian learning, Oja's rule, psychological experiment

Procedia PDF Downloads 280
5015 Unsupervised Reciter Recognition Using Gaussian Mixture Models

Authors: Ahmad Alwosheel, Ahmed Alqaraawi


This work proposes an unsupervised text-independent probabilistic approach to recognize Quran reciter voice. It is an accurate approach that works on real time applications. This approach does not require a prior information about reciter models. It has two phases, where in the training phase the reciters' acoustical features are modeled using Gaussian Mixture Models, while in the testing phase, unlabeled reciter's acoustical features are examined among GMM models. Using this approach, a high accuracy results are achieved with efficient computation time process.

Keywords: Quran, speaker recognition, reciter recognition, Gaussian Mixture Model

Procedia PDF Downloads 291
5014 Self-Supervised Learning for Hate-Speech Identification

Authors: Shrabani Ghosh


Automatic offensive language detection in social media has become a stirring task in today's NLP. Manual Offensive language detection is tedious and laborious work where automatic methods based on machine learning are only alternatives. Previous works have done sentiment analysis over social media in different ways such as supervised, semi-supervised, and unsupervised manner. Domain adaptation in a semi-supervised way has also been explored in NLP, where the source domain and the target domain are different. In domain adaptation, the source domain usually has a large amount of labeled data, while only a limited amount of labeled data is available in the target domain. Pretrained transformers like BERT, RoBERTa models are fine-tuned to perform text classification in an unsupervised manner to perform further pre-train masked language modeling (MLM) tasks. In previous work, hate speech detection has been explored in, which is a free speech platform described as a platform of extremist in varying degrees in online social media. In domain adaptation process, Twitter data is used as the source domain, and Gab data is used as the target domain. The performance of domain adaptation also depends on the cross-domain similarity. Different distance measure methods such as L2 distance, cosine distance, Maximum Mean Discrepancy (MMD), Fisher Linear Discriminant (FLD), and CORAL have been used to estimate domain similarity. Certainly, in-domain distances are small, and between-domain distances are expected to be large. The previous work finding shows that pretrain masked language model (MLM) fine-tuned with a mixture of posts of source and target domain gives higher accuracy. However, in-domain performance of the hate classifier on Twitter data accuracy is 71.78%, and out-of-domain performance of the hate classifier on Gab data goes down to 56.53%. Recently self-supervised learning got a lot of attention as it is more applicable when labeled data are scarce. Few works have already been explored to apply self-supervised learning on NLP tasks such as sentiment classification. Self-supervised language representation model ALBERTA focuses on modeling inter-sentence coherence and helps downstream tasks with multi-sentence inputs. Self-supervised attention learning approach shows better performance as it exploits extracted context word in the training process. In this work, a self-supervised attention mechanism has been proposed to detect hate speech on This framework initially classifies the Gab dataset in an attention-based self-supervised manner. On the next step, a semi-supervised classifier trained on the combination of labeled data from the first step and unlabeled data. The performance of the proposed framework will be compared with the results described earlier and also with optimized outcomes obtained from different optimization techniques.

Keywords: attention learning, language model, offensive language detection, self-supervised learning

Procedia PDF Downloads 25
5013 How to Guide Students from Surface to Deep Learning: Applied Philosophy in Management Education

Authors: Lihong Wu, Raymond Young


The ability to learn is one of the most critical skills in the information age. However, many students do not have a clear understanding of what learning is, what they are learning, and why they are learning. Many students study simply to pass rather than to learn something useful for their career and their life. They have a misconception about learning and a wrong attitude towards learning. This research explores student attitudes to study in management education and explores how to intercede to lead students from shallow to deeper modes of learning.

Keywords: knowledge, surface learning, deep learning, education

Procedia PDF Downloads 83