Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 90

Search results for: k-means

90 Development of a Small-Group Teaching Method for Enhancing the Learning of Basic Acupuncture Manipulation Optimized with the Theory of Motor Learning

Authors: Wen-Chao Tang, Tang-Yi Liu, Ming Gao, Gang Xu, Hua-Yuan Yang

Abstract:

This study developed a method for teaching acupuncture manipulation in small groups optimized with the theory of motor learning. Sixty acupuncture students and their teacher participated in our research. Motion videos were recorded of their manipulations using the lifting-thrusting method. These videos were analyzed using Simi Motion software to acquire the movement parameters of the thumb tip. The parameter velocity curves along Y axis was used to generate small teaching groups clustered by a self-organized map (SOM) and K-means. Ten groups were generated. All the targeted instruction based on the comparative results groups as well as the videos of teacher and student was provided to the members of each group respectively. According to the theory and research of motor learning, the factors or technologies such as video instruction, observational learning, external focus and summary feedback were integrated into this teaching method. Such efforts were desired to improve and enhance the effectiveness of current acupuncture teaching methods in limited classroom teaching time and extracurricular training.

Keywords: Acupuncture, group teaching, video instruction, observational learning, external focus, summary feedback.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 45
89 An Automated Stock Investment System Using Machine Learning Techniques: An Application in Australia

Authors: Carol Anne Hargreaves

Abstract:

A key issue in stock investment is how to select representative features for stock selection. The objective of this paper is to firstly determine whether an automated stock investment system, using machine learning techniques, may be used to identify a portfolio of growth stocks that are highly likely to provide returns better than the stock market index. The second objective is to identify the technical features that best characterize whether a stock’s price is likely to go up and to identify the most important factors and their contribution to predicting the likelihood of the stock price going up. Unsupervised machine learning techniques, such as cluster analysis, were applied to the stock data to identify a cluster of stocks that was likely to go up in price – portfolio 1. Next, the principal component analysis technique was used to select stocks that were rated high on component one and component two – portfolio 2. Thirdly, a supervised machine learning technique, the logistic regression method, was used to select stocks with a high probability of their price going up – portfolio 3. The predictive models were validated with metrics such as, sensitivity (recall), specificity and overall accuracy for all models. All accuracy measures were above 70%. All portfolios outperformed the market by more than eight times. The top three stocks were selected for each of the three stock portfolios and traded in the market for one month. After one month the return for each stock portfolio was computed and compared with the stock market index returns. The returns for all three stock portfolios was 23.87% for the principal component analysis stock portfolio, 11.65% for the logistic regression portfolio and 8.88% for the K-means cluster portfolio while the stock market performance was 0.38%. This study confirms that an automated stock investment system using machine learning techniques can identify top performing stock portfolios that outperform the stock market.

Keywords: Machine learning, stock market trading, logistic principal component analysis, automated stock investment system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 62
88 Consumer Load Profile Determination with Entropy-Based K-Means Algorithm

Authors: Ioannis P. Panapakidis, Marios N. Moschakis

Abstract:

With the continuous increment of smart meter installations across the globe, the need for processing of the load data is evident. Clustering-based load profiling is built upon the utilization of unsupervised machine learning tools for the purpose of formulating the typical load curves or load profiles. The most commonly used algorithm in the load profiling literature is the K-means. While the algorithm has been successfully tested in a variety of applications, its drawback is the strong dependence in the initialization phase. This paper proposes a novel modified form of the K-means that addresses the aforementioned problem. Simulation results indicate the superiority of the proposed algorithm compared to the K-means.

Keywords: Clustering, load profiling, load modeling, machine learning, energy efficiency and quality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 366
87 Analysis of Cooperative Learning Behavior Based on the Data of Students' Movement

Authors: Wang Lin, Li Zhiqiang

Abstract:

The purpose of this paper is to analyze the cooperative learning behavior pattern based on the data of students' movement. The study firstly reviewed the cooperative learning theory and its research status, and briefly introduced the k-means clustering algorithm. Then, it used clustering algorithm and mathematical statistics theory to analyze the activity rhythm of individual student and groups in different functional areas, according to the movement data provided by 10 first-year graduate students. It also focused on the analysis of students' behavior in the learning area and explored the law of cooperative learning behavior. The research result showed that the cooperative learning behavior analysis method based on movement data proposed in this paper is feasible. From the results of data analysis, the characteristics of behavior of students and their cooperative learning behavior patterns could be found.

Keywords: Behavior pattern, cooperative learning, data analyze, K-means clustering algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 303
86 Road Traffic Accidents Analysis in Mexico City through Crowdsourcing Data and Data Mining Techniques

Authors: Gabriela V. Angeles Perez, Jose Castillejos Lopez, Araceli L. Reyes Cabello, Emilio Bravo Grajales, Adriana Perez Espinosa, Jose L. Quiroz Fabian

Abstract:

Road traffic accidents are among the principal causes of traffic congestion, causing human losses, damages to health and the environment, economic losses and material damages. Studies about traditional road traffic accidents in urban zones represents very high inversion of time and money, additionally, the result are not current. However, nowadays in many countries, the crowdsourced GPS based traffic and navigation apps have emerged as an important source of information to low cost to studies of road traffic accidents and urban congestion caused by them. In this article we identified the zones, roads and specific time in the CDMX in which the largest number of road traffic accidents are concentrated during 2016. We built a database compiling information obtained from the social network known as Waze. The methodology employed was Discovery of knowledge in the database (KDD) for the discovery of patterns in the accidents reports. Furthermore, using data mining techniques with the help of Weka. The selected algorithms was the Maximization of Expectations (EM) to obtain the number ideal of clusters for the data and k-means as a grouping method. Finally, the results were visualized with the Geographic Information System QGIS.

Keywords: Data mining, K-means, road traffic accidents, Waze, Weka.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 444
85 An Improved K-Means Algorithm for Gene Expression Data Clustering

Authors: Billel Kenidra, Mohamed Benmohammed

Abstract:

Data mining technique used in the field of clustering is a subject of active research and assists in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initiate K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results prove how the efficiency has been significantly improved.

Keywords: Microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 515
84 Lane Detection Using Labeling Based RANSAC Algorithm

Authors: Yeongyu Choi, Ju H. Park, Ho-Youl Jung

Abstract:

In this paper, we propose labeling based RANSAC algorithm for lane detection. Advanced driver assistance systems (ADAS) have been widely researched to avoid unexpected accidents. Lane detection is a necessary system to assist keeping lane and lane departure prevention. The proposed vision based lane detection method applies Canny edge detection, inverse perspective mapping (IPM), K-means algorithm, mathematical morphology operations and 8 connected-component labeling. Next, random samples are selected from each labeling region for RANSAC. The sampling method selects the points of lane with a high probability. Finally, lane parameters of straight line or curve equations are estimated. Through the simulations tested on video recorded at daytime and nighttime, we show that the proposed method has better performance than the existing RANSAC algorithm in various environments.

Keywords: Canny edge detection, k-means algorithm, RANSAC, inverse perspective mapping.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 531
83 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more various information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pie have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, and there are many difficulties in using it when the user is not a computer programmer, or when using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: Clustering, data mining, DBSCAN, k-means, k-medoids, sensor data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1209
82 A Parallel Implementation of k-Means in MATLAB

Authors: Dimitris Varsamis, Christos Talagkozis, Alkiviadis Tsimpiris, Paris Mastorocostas

Abstract:

The aim of this work is the parallel implementation of k-means in MATLAB, in order to reduce the execution time. Specifically, a new function in MATLAB for serial k-means algorithm is developed, which meets all the requirements for the conversion to a function in MATLAB with parallel computations. Additionally, two different variants for the definition of initial values are presented. In the sequel, the parallel approach is presented. Finally, the performance tests for the computation times respect to the numbers of features and classes are illustrated.

Keywords: K-means algorithm, clustering, parallel computations, MATLAB.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 555
81 K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors

Authors: Shao-Tzu Huang, Chen-Chien Hsu, Wei-Yen Wang

Abstract:

Matching high dimensional features between images is computationally expensive for exhaustive search approaches in computer vision. Although the dimension of the feature can be degraded by simplifying the prior knowledge of homography, matching accuracy may degrade as a tradeoff. In this paper, we present a feature matching method based on k-means algorithm that reduces the matching cost and matches the features between images instead of using a simplified geometric assumption. Experimental results show that the proposed method outperforms the previous linear exhaustive search approaches in terms of the inlier ratio of matched pairs.

Keywords: Feature matching, k-means clustering, scale invariant feature transform, linear exhaustive search.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 595
80 A Computational Cost-Effective Clustering Algorithm in Multidimensional Space Using the Manhattan Metric: Application to the Global Terrorism Database

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

The increasing amount of collected data has limited the performance of the current analyzing algorithms. Thus, developing new cost-effective algorithms in terms of complexity, scalability, and accuracy raised significant interests. In this paper, a modified effective k-means based algorithm is developed and experimented. The new algorithm aims to reduce the computational load without significantly affecting the quality of the clusterings. The algorithm uses the City Block distance and a new stop criterion to guarantee the convergence. Conducted experiments on a real data set show its high performance when compared with the original k-means version.

Keywords: Pattern recognition, partitional clustering, K-means clustering, Manhattan distance, terrorism data analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 698
79 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can either contain categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are proposed: the k-means for clustering numeric datasets and the k-modes for categorical datasets. The main encountered problem in data mining applications is clustering categorical dataset so relevant in the datasets. One main issue to achieve the clustering process on categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead the k-modes. In this paper, it is proposed to experiment an approach based on the previous issue by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previously method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are experimented. The obtained results show that our proposed method outperforms the binary method in all cases.

Keywords: Clustering, k-means, categorical datasets, pattern recognition, unsupervised learning, knowledge discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2799
78 A Minimum Spanning Tree-Based Method for Initializing the K-Means Clustering Algorithm

Authors: J. Yang, Y. Ma, X. Zhang, S. Li, Y. Zhang

Abstract:

The traditional k-means algorithm has been widely used as a simple and efficient clustering method. However, the algorithm often converges to local minima for the reason that it is sensitive to the initial cluster centers. In this paper, an algorithm for selecting initial cluster centers on the basis of minimum spanning tree (MST) is presented. The set of vertices in MST with same degree are regarded as a whole which is used to find the skeleton data points. Furthermore, a distance measure between the skeleton data points with consideration of degree and Euclidean distance is presented. Finally, MST-based initialization method for the k-means algorithm is presented, and the corresponding time complexity is analyzed as well. The presented algorithm is tested on five data sets from the UCI Machine Learning Repository. The experimental results illustrate the effectiveness of the presented algorithm compared to three existing initialization methods.

Keywords: Degree, initial cluster center, k-means, minimum spanning tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 862
77 Chemical Reaction Algorithm for Expectation Maximization Clustering

Authors: Li Ni, Pen ManMan, Li KenLi

Abstract:

Clustering is an intensive research for some years because of its multifaceted applications, such as biology, information retrieval, medicine, business and so on. The expectation maximization (EM) is a kind of algorithm framework in clustering methods, one of the ten algorithms of machine learning. Traditionally, optimization of objective function has been the standard approach in EM. Hence, research has investigated the utility of evolutionary computing and related techniques in the regard. Chemical Reaction Optimization (CRO) is a recently established method. So the property embedded in CRO is used to solve optimization problems. This paper presents an algorithm framework (EM-CRO) with modified CRO operators based on EM cluster problems. The hybrid algorithm is mainly to solve the problem of initial value sensitivity of the objective function optimization clustering algorithm. Our experiments mainly take the EM classic algorithm:k-means and fuzzy k-means as an example, through the CRO algorithm to optimize its initial value, get K-means-CRO and FKM-CRO algorithm. The experimental results of them show that there is improved efficiency for solving objective function optimization clustering problems.

Keywords: Chemical reaction optimization, expectation maximization, initial, objective function clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 876
76 A Neural Approach for Color-Textured Images Segmentation

Authors: Khalid Salhi, El Miloud Jaara, Mohammed Talibi Alaoui

Abstract:

In this paper, we present a neural approach for unsupervised natural color-texture image segmentation, which is based on both Kohonen maps and mathematical morphology, using a combination of the texture and the image color information of the image, namely, the fractal features based on fractal dimension are selected to present the information texture, and the color features presented in RGB color space. These features are then used to train the network Kohonen, which will be represented by the underlying probability density function, the segmentation of this map is made by morphological watershed transformation. The performance of our color-texture segmentation approach is compared first, to color-based methods or texture-based methods only, and then to k-means method.

Keywords: Segmentation, color-texture, neural networks, fractal, watershed.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1001
75 Case-Based Reasoning: A Hybrid Classification Model Improved with an Expert's Knowledge for High-Dimensional Problems

Authors: Bruno Trstenjak, Dzenana Donko

Abstract:

Data mining and classification of objects is the process of data analysis, using various machine learning techniques, which is used today in various fields of research. This paper presents a concept of hybrid classification model improved with the expert knowledge. The hybrid model in its algorithm has integrated several machine learning techniques (Information Gain, K-means, and Case-Based Reasoning) and the expert’s knowledge into one. The knowledge of experts is used to determine the importance of features. The paper presents the model algorithm and the results of the case study in which the emphasis was put on achieving the maximum classification accuracy without reducing the number of features.

Keywords: Case based reasoning, classification, expert's knowledge, hybrid model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1006
74 Electricity Generation from Renewables and Targets: An Application of Multivariate Statistical Techniques

Authors: Filiz Ersoz, Taner Ersoz, Tugrul Bayraktar

Abstract:

Renewable energy is referred to as "clean energy" and common popular support for the use of renewable energy (RE) is to provide electricity with zero carbon dioxide emissions. This study provides useful insight into the European Union (EU) RE, especially, into electricity generation obtained from renewables, and their targets. The objective of this study is to identify groups of European countries, using multivariate statistical analysis and selected indicators. The hierarchical clustering method is used to decide the number of clusters for EU countries. The conducted statistical hierarchical cluster analysis is based on the Ward’s clustering method and squared Euclidean distances. Hierarchical cluster analysis identified eight distinct clusters of European countries. Then, non-hierarchical clustering (k-means) method was applied. Discriminant analysis was used to determine the validity of the results with data normalized by Z score transformation. To explore the relationship between the selected indicators, correlation coefficients were computed. The results of the study reveal the current situation of RE in European Union Member States.

Keywords: Share of electricity generation, CO2 emission, targets, multivariate methods, hierarchical clustering, K-means clustering, discriminant analyzed, correlation, EU member countries.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 708
73 Approach Based on Fuzzy C-Means for Band Selection in Hyperspectral Images

Authors: Diego Saqui, José H. Saito, José R. Campos, Lúcio A. de C. Jorge

Abstract:

Hyperspectral images and remote sensing are important for many applications. A problem in the use of these images is the high volume of data to be processed, stored and transferred. Dimensionality reduction techniques can be used to reduce the volume of data. In this paper, an approach to band selection based on clustering algorithms is presented. This approach allows to reduce the volume of data. The proposed structure is based on Fuzzy C-Means (or K-Means) and NWHFC algorithms. New attributes in relation to other studies in the literature, such as kurtosis and low correlation, are also considered. A comparison of the results of the approach using the Fuzzy C-Means and K-Means with different attributes is performed. The use of both algorithms show similar good results but, particularly when used attributes variance and kurtosis in the clustering process, however applicable in hyperspectral images.

Keywords: Band selection, fuzzy C-means, K-means, hyperspectral image.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1252
72 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting

Authors: Kemal Polat

Abstract:

In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) for classification of Diabetes disease dataset has been used. The aims of this study are to reduce the variance within attributes of diabetes dataset and to improve the classification accuracy of classifier algorithm transforming from non-linear separable datasets to linearly separable datasets. Pima Indians Diabetes dataset has two classes including normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of K-means clustering method and is one of most used clustering methods in data mining and machine learning applications. In this study, as the first stage, fuzzy C-means clustering process has been used for finding the centers of attributes in Pima Indians diabetes dataset and then weighted the dataset according to the ratios of the means of attributes to centers of theirs. Secondly, after weighting process, the classifier algorithms including support vector machine (SVM) and k-NN (k- nearest neighbor) classifiers have been used for classifying weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of Pima Indians diabetes dataset.

Keywords: Fuzzy C-means clustering, Fuzzy C-means clustering based attribute weighting, Pima Indians diabetes dataset, SVM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1260
71 Customer Segmentation Model in E-commerce Using Clustering Techniques and LRFM Model: The Case of Online Stores in Morocco

Authors: Rachid Ait daoud, Abdellah Amine, Belaid Bouikhalene, Rachid Lbibb

Abstract:

Given the increase in the number of e-commerce sites, the number of competitors has become very important. This means that companies have to take appropriate decisions in order to meet the expectations of their customers and satisfy their needs. In this paper, we present a case study of applying LRFM (length, recency, frequency and monetary) model and clustering techniques in the sector of electronic commerce with a view to evaluating customers’ values of the Moroccan e-commerce websites and then developing effective marketing strategies. To achieve these objectives, we adopt LRFM model by applying a two-stage clustering method. In the first stage, the self-organizing maps method is used to determine the best number of clusters and the initial centroid. In the second stage, kmeans method is applied to segment 730 customers into nine clusters according to their L, R, F and M values. The results show that the cluster 6 is the most important cluster because the average values of L, R, F and M are higher than the overall average value. In addition, this study has considered another variable that describes the mode of payment used by customers to improve and strengthen clusters’ analysis. The clusters’ analysis demonstrates that the payment method is one of the key indicators of a new index which allows to assess the level of customers’ confidence in the company's Website.

Keywords: Customer value, LRFM model, Cluster analysis, Self-Organizing Maps method (SOM), K-means algorithm, loyalty.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4811
70 Upgraded Rough Clustering and Outlier Detection Method on Yeast Dataset by Entropy Rough K-Means Method

Authors: P. Ashok, G. M. Kadhar Nawaz

Abstract:

Rough set theory is used to handle uncertainty and incomplete information by applying two accurate sets, Lower approximation and Upper approximation. In this paper, the rough clustering algorithms are improved by adopting the Similarity, Dissimilarity–Similarity and Entropy based initial centroids selection method on three different clustering algorithms namely Entropy based Rough K-Means (ERKM), Similarity based Rough K-Means (SRKM) and Dissimilarity-Similarity based Rough K-Means (DSRKM) were developed and executed by yeast dataset. The rough clustering algorithms are validated by cluster validity indexes namely Rand and Adjusted Rand indexes. An experimental result shows that the ERKM clustering algorithm perform effectively and delivers better results than other clustering methods. Outlier detection is an important task in data mining and very much different from the rest of the objects in the clusters. Entropy based Rough Outlier Factor (EROF) method is seemly to detect outlier effectively for yeast dataset. In rough K-Means method, by tuning the epsilon (ᶓ) value from 0.8 to 1.08 can detect outliers on boundary region and the RKM algorithm delivers better results, when choosing the value of epsilon (ᶓ) in the specified range. An experimental result shows that the EROF method on clustering algorithm performed very well and suitable for detecting outlier effectively for all datasets. Further, experimental readings show that the ERKM clustering method outperformed the other methods.

Keywords: Clustering, Entropy, Outlier, Rough K-Means, validity index.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 965
69 A Study on the Relation among Primary Care Professionals Serving the Disadvantaged Community, Socioeconomic Status, and Adverse Health Outcome

Authors: Chau-Kuang Chen, Juanita Buford, Colette Davis, Raisha Allen, John Hughes, Jr., James Tyus, Dexter Samuels

Abstract:

During the post-Civil War era, the city of Nashville, Tennessee, had the highest mortality rate in the United States. The elevated death and disease rates among former slaves were attributable to lack of quality healthcare. To address the paucity of healthcare services, Meharry Medical College, an institution with the mission of educating minority professionals and serving the underserved population, was established in 1876. Purpose: The social ecological framework and partial least squares (PLS) path modeling were used to quantify the impact of socioeconomic status and adverse health outcome on primary care professionals serving the disadvantaged community. Thus, the study results could demonstrate the accomplishment of the College’s mission of training primary care professionals to serve in underserved areas. Methods: Various statistical methods were used to analyze alumni data from 1975 – 2013. K-means cluster analysis was utilized to identify individual medical and dental graduates in the cluster groups of the practice communities (Disadvantaged or Non-disadvantaged Communities). Discriminant analysis was implemented to verify the classification accuracy of cluster analysis. The independent t-test was performed to detect the significant mean differences of respective clustering and criterion variables. Chi-square test was used to test if the proportions of primary care and non-primary care specialists are consistent with those of medical and dental graduates practicing in the designated community clusters. Finally, the PLS path model was constructed to explore the construct validity of analytic model by providing the magnitude effects of socioeconomic status and adverse health outcome on primary care professionals serving the disadvantaged community. Results: Approximately 83% (3,192/3,864) of Meharry Medical College’s medical and dental graduates from 1975 to 2013 were practicing in disadvantaged communities. Independent t-test confirmed the content validity of the cluster analysis model. Also, the PLS path modeling demonstrated that alumni served as primary care professionals in communities with significantly lower socioeconomic status and higher adverse health outcome (p < .001). The PLS path modeling exhibited the meaningful interrelation between primary care professionals practicing communities and surrounding environments (socioeconomic statues and adverse health outcome), which yielded model reliability, validity, and applicability. Conclusion: This study applied social ecological theory and analytic modeling approaches to assess the attainment of Meharry Medical College’s mission of training primary care professionals to serve in underserved areas, particularly in communities with low socioeconomic status and high rates of adverse health outcomes. In summary, the majority of medical and dental graduates from Meharry Medical College provided primary care services to disadvantaged communities with low socioeconomic status and high adverse health outcome, which demonstrated that Meharry Medical College has fulfilled its mission. The high reliability, validity, and applicability of this model imply that it could be replicated for comparable universities and colleges elsewhere.

Keywords: Disadvantaged Community, K-means Cluster Analysis, PLS Path Modeling, Primary care.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1642
68 Online Forums Hotspot Detection and Analysis Using Aging Theory

Authors: K. Nirmala Devi, V. Murali Bhaskaran

Abstract:

The exponential growth of social media arouses much attention on public opinion information. The online forums, blogs, micro blogs are proving to be extremely valuable resources and are having bulk volume of information. However, most of the social media data is unstructured and semi structured form. So that it is more difficult to decipher automatically. Therefore, it is very much essential to understand and analyze those data for making a right decision. The online forums hotspot detection is a promising research field in the web mining and it guides to motivate the user to take right decision in right time. The proposed system consist of a novel approach to detect a hotspot forum for any given time period. It uses aging theory to find the hot terms and E-K-means for detecting the hotspot forum. Experimental results demonstrate that the proposed approach outperforms k-means for detecting the hotspot forums with the improved accuracy.

Keywords: Hotspot forums, Micro blog, Blog, Sentiment Analysis, Opinion Mining, Social media, Twitter, Web mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1424
67 Automatic LV Segmentation with K-means Clustering and Graph Searching on Cardiac MRI

Authors: Hae-Yeoun Lee

Abstract:

Quantification of cardiac function is performed by calculating blood volume and ejection fraction in routine clinical practice. However, these works have been performed by manual contouring, which requires computational costs and varies on the observer. In this paper, an automatic left ventricle segmentation algorithm on cardiac magnetic resonance images (MRI) is presented. Using knowledge on cardiac MRI, a K-mean clustering technique is applied to segment blood region on a coil-sensitivity corrected image. Then, a graph searching technique is used to correct segmentation errors from coil distortion and noises. Finally, blood volume and ejection fraction are calculated. Using cardiac MRI from 15 subjects, the presented algorithm is tested and compared with manual contouring by experts to show outstanding performance.

Keywords: Cardiac MRI, Graph searching, Left ventricle segmentation, K-means clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1734
66 MCOKE: Multi-Cluster Overlapping K-Means Extension Algorithm

Authors: Said Baadel, Fadi Thabtah, Joan Lu

Abstract:

Clustering involves the partitioning of n objects into k clusters. Many clustering algorithms use hard-partitioning techniques where each object is assigned to one cluster. In this paper we propose an overlapping algorithm MCOKE which allows objects to belong to one or more clusters. The algorithm is different from fuzzy clustering techniques because objects that overlap are assigned a membership value of 1 (one) as opposed to a fuzzy membership degree. The algorithm is also different from other overlapping algorithms that require a similarity threshold be defined a priori which can be difficult to determine by novice users.

Keywords: Data mining, k-means, MCOKE, overlapping.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2243
65 Unsupervised Segmentation Technique for Acute Leukemia Cells Using Clustering Algorithms

Authors: N. H. Harun, A. S. Abdul Nasir, M. Y. Mashor, R. Hassan

Abstract:

Leukaemia is a blood cancer disease that contributes to the increment of mortality rate in Malaysia each year. There are two main categories for leukaemia, which are acute and chronic leukaemia. The production and development of acute leukaemia cells occurs rapidly and uncontrollable. Therefore, if the identification of acute leukaemia cells could be done fast and effectively, proper treatment and medicine could be delivered. Due to the requirement of prompt and accurate diagnosis of leukaemia, the current study has proposed unsupervised pixel segmentation based on clustering algorithm in order to obtain a fully segmented abnormal white blood cell (blast) in acute leukaemia image. In order to obtain the segmented blast, the current study proposed three clustering algorithms which are k-means, fuzzy c-means and moving k-means algorithms have been applied on the saturation component image. Then, median filter and seeded region growing area extraction algorithms have been applied, to smooth the region of segmented blast and to remove the large unwanted regions from the image, respectively. Comparisons among the three clustering algorithms are made in order to measure the performance of each clustering algorithm on segmenting the blast area. Based on the good sensitivity value that has been obtained, the results indicate that moving kmeans clustering algorithm has successfully produced the fully segmented blast region in acute leukaemia image. Hence, indicating that the resultant images could be helpful to haematologists for further analysis of acute leukaemia.

Keywords: Acute Leukaemia Images, Clustering Algorithms, Image Segmentation, Moving k-Means.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2142
64 Feature Based Unsupervised Intrusion Detection

Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein

Abstract:

The goal of a network-based intrusion detection system is to classify activities of network traffics into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning plays an important role in many sciences; including intrusion detection system (IDS) using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection that helps in improving the efficiency, performance and prediction rate of proposed approach. This paper applies unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a 2 class classifications have been implemented (Normal, Attack). Weka framework which is a java based open source software consists of a collection of machine learning algorithms for data mining tasks has been used in the testing process. The experimental results show that the proposed approach is very accurate with low false positive rate and high true positive rate and it takes less learning time in comparison with using the full features of the dataset with the same algorithm.

Keywords: Information Gain (IG), Intrusion Detection System (IDS), K-means Clustering, Weka.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2330
63 Evaluation of Robust Feature Descriptors for Texture Classification

Authors: Jia-Hong Lee, Mei-Yi Wu, Hsien-Tsung Kuo

Abstract:

Texture is an important characteristic in real and synthetic scenes. Texture analysis plays a critical role in inspecting surfaces and provides important techniques in a variety of applications. Although several descriptors have been presented to extract texture features, the development of object recognition is still a difficult task due to the complex aspects of texture. Recently, many robust and scaling-invariant image features such as SIFT, SURF and ORB have been successfully used in image retrieval and object recognition. In this paper, we have tried to compare the performance for texture classification using these feature descriptors with k-means clustering. Different classifiers including K-NN, Naive Bayes, Back Propagation Neural Network , Decision Tree and Kstar were applied in three texture image sets - UIUCTex, KTH-TIPS and Brodatz, respectively. Experimental results reveal SIFTS as the best average accuracy rate holder in UIUCTex, KTH-TIPS and SURF is advantaged in Brodatz texture set. BP neuro network works best in the test set classification among all used classifiers.

Keywords: Texture classification, texture descriptor, SIFT, SURF, ORB.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1202
62 Enhancing Privacy-Preserving Cloud Database Querying by Preventing Brute Force Attacks

Authors: Ambika Vishal Pawar, Ajay Dani

Abstract:

Considering the complexities involved in Cloud computing, there are still plenty of issues that affect the privacy of data in cloud environment. Unless these problems get solved, we think that the problem of preserving privacy in cloud databases is still open. In tokenization and homomorphic cryptography based solutions for privacy preserving cloud database querying, there is possibility that by colluding with service provider adversary may run brute force attacks that will reveal the attribute values.

In this paper we propose a solution by defining the variant of K –means clustering algorithm that effectively detects such brute force attacks and enhances privacy of cloud database querying by preventing this attacks.

Keywords: Privacy, Database, Cloud Computing, Clustering, K-means, Cryptography.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2191
61 Automatic Detection of Breast Tumors in Sonoelastographic Images Using DWT

Authors: A. Sindhuja, V. Sadasivam

Abstract:

Breast Cancer is the most common malignancy in women and the second leading cause of death for women all over the world. Earlier the detection of cancer, better the treatment. The diagnosis and treatment of the cancer rely on segmentation of Sonoelastographic images. Texture features has not considered for Sonoelastographic segmentation. Sonoelastographic images of 15 patients containing both benign and malignant tumorsare considered for experimentation.The images are enhanced to remove noise in order to improve contrast and emphasize tumor boundary. It is then decomposed into sub-bands using single level Daubechies wavelets varying from single co-efficient to six coefficients. The Grey Level Co-occurrence Matrix (GLCM), Local Binary Pattern (LBP) features are extracted and then selected by ranking it using Sequential Floating Forward Selection (SFFS) technique from each sub-band. The resultant images undergo K-Means clustering and then few post-processing steps to remove the false spots. The tumor boundary is detected from the segmented image. It is proposed that Local Binary Pattern (LBP) from the vertical coefficients of Daubechies wavelet with two coefficients is best suited for segmentation of Sonoelastographic breast images among the wavelet members using one to six coefficients for decomposition. The results are also quantified with the help of an expert radiologist. The proposed work can be used for further diagnostic process to decide if the segmented tumor is benign or malignant.

Keywords: Breast Cancer, Segmentation, Sonoelastography, Tumor Detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1815