Search results for: grey clustering method
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8359

Search results for: grey clustering method

8119 Efficient Web Usage Mining Based on K-Medoids Clustering Technique

Authors: P. Sengottuvelan, T. Gopalakrishnan

Abstract:

Web Usage Mining is the application of data mining techniques to find usage patterns from web log data, so as to grasp required patterns and serve the requirements of Web-based applications. User’s expertise on the internet may be improved by minimizing user’s web access latency. This may be done by predicting the future search page earlier and the same may be prefetched and cached. Therefore, to enhance the standard of web services, it is needed topic to research the user web navigation behavior. Analysis of user’s web navigation behavior is achieved through modeling web navigation history. We propose this technique which cluster’s the user sessions, based on the K-medoids technique.

Keywords: Clustering, K-medoids, Recommendation, User Session, Web Usage Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1358
8118 Consumer Load Profile Determination with Entropy-Based K-Means Algorithm

Authors: Ioannis P. Panapakidis, Marios N. Moschakis

Abstract:

With the continuous increment of smart meter installations across the globe, the need for processing of the load data is evident. Clustering-based load profiling is built upon the utilization of unsupervised machine learning tools for the purpose of formulating the typical load curves or load profiles. The most commonly used algorithm in the load profiling literature is the K-means. While the algorithm has been successfully tested in a variety of applications, its drawback is the strong dependence in the initialization phase. This paper proposes a novel modified form of the K-means that addresses the aforementioned problem. Simulation results indicate the superiority of the proposed algorithm compared to the K-means.

Keywords: Clustering, load profiling, load modeling, machine learning, energy efficiency and quality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1147
8117 Preliminary Evaluation of Different Water Qualities on Leucaena Leucocephala Seed Germination and Seedling Growth

Authors: Maher J. Tadros, Naji K. Al-Mefleh

Abstract:

The evaluation of non-conventional water resources on seed germination and seedling growth performance at early growth stages is still in progress especially in forage crops. This study was designed to test the effect of four types of water qualities (treated wastewater (TWW), industrial water (IW), grey water (GW), and Distilled water (DW)) on germination and early seedling vigor of Leucaena leucocephala. The results showed that the germination was not significantly affected by the different water qualities. Seed germination reached maximum after 17, 14, 14, and 21 days under GW, IW, TWW, and DW treatments, respectively. The highest mean of shoot length was scored under the GW treatment. And, the highest mean of root length was scored under DW which was not significant from GW treatment. The means of shoot fresh was the highest under the TWW. The means of root fresh weight was not significantly different from each other's under different treatments. The growth performance was in progress with no mortality during 21 days of growth. Thus, the best non-conventional water qualities alternatives based on the cleanness, nutrients, and toxicity are the GW, TWW and IW, respectively.

Keywords: Seed germination, Growth performance, Leucaena, Multipurpose forest trees, Waste water, Grey water

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1824
8116 Automatic Detection of Breast Tumors in Sonoelastographic Images Using DWT

Authors: A. Sindhuja, V. Sadasivam

Abstract:

Breast Cancer is the most common malignancy in women and the second leading cause of death for women all over the world. Earlier the detection of cancer, better the treatment. The diagnosis and treatment of the cancer rely on segmentation of Sonoelastographic images. Texture features has not considered for Sonoelastographic segmentation. Sonoelastographic images of 15 patients containing both benign and malignant tumorsare considered for experimentation.The images are enhanced to remove noise in order to improve contrast and emphasize tumor boundary. It is then decomposed into sub-bands using single level Daubechies wavelets varying from single co-efficient to six coefficients. The Grey Level Co-occurrence Matrix (GLCM), Local Binary Pattern (LBP) features are extracted and then selected by ranking it using Sequential Floating Forward Selection (SFFS) technique from each sub-band. The resultant images undergo K-Means clustering and then few post-processing steps to remove the false spots. The tumor boundary is detected from the segmented image. It is proposed that Local Binary Pattern (LBP) from the vertical coefficients of Daubechies wavelet with two coefficients is best suited for segmentation of Sonoelastographic breast images among the wavelet members using one to six coefficients for decomposition. The results are also quantified with the help of an expert radiologist. The proposed work can be used for further diagnostic process to decide if the segmented tumor is benign or malignant.

Keywords: Breast Cancer, Segmentation, Sonoelastography, Tumor Detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2161
8115 Enhanced Clustering Analysis and Visualization Using Kohonen's Self-Organizing Feature Map Networks

Authors: Kasthurirangan Gopalakrishnan, Siddhartha Khaitan, Anshu Manik

Abstract:

Cluster analysis is the name given to a diverse collection of techniques that can be used to classify objects (e.g. individuals, quadrats, species etc). While Kohonen's Self-Organizing Feature Map (SOFM) or Self-Organizing Map (SOM) networks have been successfully applied as a classification tool to various problem domains, including speech recognition, image data compression, image or character recognition, robot control and medical diagnosis, its potential as a robust substitute for clustering analysis remains relatively unresearched. SOM networks combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an a priori grid and provide a powerful tool for data visualization. In this paper, SOM is used for creating a toroidal mapping of two-dimensional lattice to perform cluster analysis on results of a chemical analysis of wines produced in the same region in Italy but derived from three different cultivators, referred to as the “wine recognition data" located in the University of California-Irvine database. The results are encouraging and it is believed that SOM would make an appealing and powerful decision-support system tool for clustering tasks and for data visualization.

Keywords: Artificial neural networks, cluster analysis, Kohonen maps, wine recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2081
8114 Clustering Mixed Data Using Non-normal Regression Tree for Process Monitoring

Authors: Youngji Yoo, Cheong-Sool Park, Jun Seok Kim, Young-Hak Lee, Sung-Shick Kim, Jun-Geol Baek

Abstract:

In the semiconductor manufacturing process, large amounts of data are collected from various sensors of multiple facilities. The collected data from sensors have several different characteristics due to variables such as types of products, former processes and recipes. In general, Statistical Quality Control (SQC) methods assume the normality of the data to detect out-of-control states of processes. Although the collected data have different characteristics, using the data as inputs of SQC will increase variations of data, require wide control limits, and decrease performance to detect outof- control. Therefore, it is necessary to separate similar data groups from mixed data for more accurate process control. In the paper, we propose a regression tree using split algorithm based on Pearson distribution to handle non-normal distribution in parametric method. The regression tree finds similar properties of data from different variables. The experiments using real semiconductor manufacturing process data show improved performance in fault detecting ability.

Keywords: Semiconductor, non-normal mixed process data, clustering, Statistical Quality Control (SQC), regression tree, Pearson distribution system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1732
8113 Geographic Profiling Based on Multi-point Centrography with K-means Clustering

Authors: Jiaji Zhou, Le Liang, Long Chen

Abstract:

Geographic Profiling has successfully assisted investigations for serial crimes. Considering the multi-cluster feature of serial criminal spots, we propose a Multi-point Centrography model as a natural extension of Single-point Centrography for geographic profiling. K-means clustering is first performed on the data samples and then Single-point Centrography is adopted to derive a probability distribution on each cluster. Finally, a weighted combinations of each distribution is formed to make next-crime spot prediction. Experimental study on real cases demonstrates the effectiveness of our proposed model.

Keywords: Geographic profiling, Centrography model, K-means algorithm

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2027
8112 Identification of Nonlinear Systems Using Radial Basis Function Neural Network

Authors: C. Pislaru, A. Shebani

Abstract:

This paper uses the radial basis function neural network (RBFNN) for system identification of nonlinear systems. Five nonlinear systems are used to examine the activity of RBFNN in system modeling of nonlinear systems; the five nonlinear systems are dual tank system, single tank system, DC motor system, and two academic models. The feed forward method is considered in this work for modelling the non-linear dynamic models, where the KMeans clustering algorithm used in this paper to select the centers of radial basis function network, because it is reliable, offers fast convergence and can handle large data sets. The least mean square method is used to adjust the weights to the output layer, and Euclidean distance method used to measure the width of the Gaussian function.

Keywords: System identification, Nonlinear system, Neural networks, RBF neural network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2813
8111 Automatic Choice of Topics for Seminars by Clustering Students According to Their Profile

Authors: J.R. Quevedo, E. Montañés, J. Ranilla, A. Bahamonde

Abstract:

The new framework the Higher Education is immersed in involves a complete change in the way lecturers must teach and students must learn. Whereas the lecturer was the main character in traditional education, the essential goal now is to increase the students' participation in the process. Thus, one of the main tasks of lecturers in this new context is to design activities of different nature in order to encourage such participation. Seminars are one of the activities included in this environment. They are active sessions that enable going in depth into specific topics as support of other activities. They are characterized by some features such as favoring interaction between students and lecturers or improving their communication skills. Hence, planning and organizing strategic seminars is indeed a great challenge for lecturers with the aim of acquiring knowledge and abilities. This paper proposes a method using Artificial Intelligence techniques to obtain student profiles from their marks and preferences. The goal of building such profiles is twofold. First, it facilitates the task of splitting the students into different groups, each group with similar preferences and learning difficulties. Second, it makes it easy to select adequate topics to be a candidate for the seminars. The results obtained can be either a guarantee of what the lecturers could observe during the development of the course or a clue to reconsider new methodological strategies in certain topics.

Keywords: artificial intelligence, clustering, organizingseminars, student profile

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1323
8110 Hierarchical Checkpoint Protocol in Data Grids

Authors: Rahma Souli-Jbali, Minyar Sassi Hidri, Rahma Ben Ayed

Abstract:

Grid of computing nodes has emerged as a representative means of connecting distributed computers or resources scattered all over the world for the purpose of computing and distributed storage. Since fault tolerance becomes complex due to the availability of resources in decentralized grid environment, it can be used in connection with replication in data grids. The objective of our work is to present fault tolerance in data grids with data replication-driven model based on clustering. The performance of the protocol is evaluated with Omnet++ simulator. The computational results show the efficiency of our protocol in terms of recovery time and the number of process in rollbacks.

Keywords: Data grids, fault tolerance, chandy-lamport, clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 898
8109 Issue Reorganization Using the Measure of Relevance

Authors: William Wong Xiu Shun, Yoonjin Hyun, Mingyu Kim, Seongi Choi, Namgyu Kim

Abstract:

The need to extract R&D keywords from issues and use them to retrieve R&D information is increasing rapidly. However, it is difficult to identify related issues or distinguish them. Although the similarity between issues cannot be identified, with an R&D lexicon, issues that always share the same R&D keywords can be determined. In detail, the R&D keywords that are associated with a particular issue imply the key technology elements that are needed to solve a particular issue. Furthermore, the relationship among issues that share the same R&D keywords can be shown in a more systematic way by clustering them according to keywords. Thus, sharing R&D results and reusing R&D technology can be facilitated. Indirectly, redundant investment in R&D can be reduced as the relevant R&D information can be shared among corresponding issues and the reusability of related R&D can be improved. Therefore, a methodology to cluster issues from the perspective of common R&D keywords is proposed to satisfy these demands.

Keywords: Clustering, Social Network Analysis, Text Mining, Topic Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2003
8108 Generating Normally Distributed Clusters by Means of a Self-organizing Growing Neural Network– An Application to Market Segmentation –

Authors: Reinhold Decker, Christian Holsing, Sascha Lerke

Abstract:

This paper presents a new growing neural network for cluster analysis and market segmentation, which optimizes the size and structure of clusters by iteratively checking them for multivariate normality. We combine the recently published SGNN approach [8] with the basic principle underlying the Gaussian-means algorithm [13] and the Mardia test for multivariate normality [18, 19]. The new approach distinguishes from existing ones by its holistic design and its great autonomy regarding the clustering process as a whole. Its performance is demonstrated by means of synthetic 2D data and by real lifestyle survey data usable for market segmentation.

Keywords: Artificial neural network, clustering, multivariatenormality, market segmentation, self-organization

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1159
8107 Anomaly Detection and Characterization to Classify Traffic Anomalies Case Study: TOT Public Company Limited Network

Authors: O. Siriporn, S. Benjawan

Abstract:

This paper represents four unsupervised clustering algorithms namely sIB, RandomFlatClustering, FarthestFirst, and FilteredClusterer that previously works have not been used for network traffic classification. The methodology, the result, the products of the cluster and evaluation of these algorithms with efficiency of each algorithm from accuracy are shown. Otherwise, the efficiency of these algorithms considering form the time that it use to generate the cluster quickly and correctly. Our work study and test the best algorithm by using classify traffic anomaly in network traffic with different attribute that have not been used before. We analyses the algorithm that have the best efficiency or the best learning and compare it to the previously used (K-Means). Our research will be use to develop anomaly detection system to more efficiency and more require in the future.

Keywords: Unsupervised, clustering, anomaly, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2057
8106 Performance Evaluation of Clustered Routing Protocols for Heterogeneous Wireless Sensor Networks

Authors: Awatef Chniguir, Tarek Farah, Zouhair Ben Jemaa, Safya Belguith

Abstract:

Optimal routing allows minimizing energy consumption in wireless sensor networks (WSN). Clustering has proven its effectiveness in organizing WSN by reducing channel contention and packet collision and enhancing network throughput under heavy load. Therefore, nowadays, with the emergence of the Internet of Things, heterogeneity is essential. Stable election protocol (SEP) that has increased the network stability period and lifetime is the first clustering protocol for heterogeneous WSN. SEP and its descendants, namely SEP, Threshold Sensitive SEP (TSEP), Enhanced TSEP (ETSSEP) and Current Energy Allotted TSEP (CEATSEP), were studied. These algorithms’ performance was evaluated based on different metrics, especially first node death (FND), to compare their stability. Simulations were conducted on the MATLAB tool considering two scenarios: The first one demonstrates the fraction variation of advanced nodes by setting the number of total nodes. The second considers the interpretation of the number of nodes while keeping the number of advanced nodes permanent. CEATSEP outperforms its antecedents by increasing stability and, at the same time, keeping a low throughput. It also operates very well in a large-scale network. Consequently, CEATSEP has a useful lifespan and energy efficiency compared to the other routing protocol for heterogeneous WSN.

Keywords: Clustering, heterogeneous, stability, scalability, throughput, IoT, WSN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 363
8105 Implementation of A Photo-Curable 3D Additive Manufacturing Technology with Coloring Gray Capability by Using Piezo Ink-Jet

Authors: Ming-Jong Tsai, Y. L. Cheng, Y. L. Kuo, S. Y. Hsiao, J .W. Chen, P. H. Liu, D. H. Chen

Abstract:

The 3D printing is a combination of digital technology, material science, intelligent manufacturing and control of opto-mechatronics systems. It is called the third industrial revolution from the view of the Economist Journal. A color 3D printing machine may provide the necessary support for high value-added industrial and commercial design, architectural design, personal boutique, and 3D artist’s creation. The main goal of this paper is to develop photo-curable color 3D manufacturing technology and system implementation. The key technologies include (1) Photo-curable color 3D additive manufacturing processes development and materials research (2) Piezo type ink-jet head control and Opto-mechatronics integration technique of the photo-curable color 3D laminated manufacturing system. The proposed system is integrated with single Piezo type ink-jet head with two individual channels for two primary UV light curable color resins which can provide for future colorful 3D printing solutions. The main research results are 16 grey levels and grey resolution of 75 dpi. 

Keywords: 3d printing, additive manufacturing, color, photo-curable, Piezo type ink-jet, UV Resin.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2096
8104 Accelerating GLA with an M-Tree

Authors: Olli Luoma, Johannes Tuikkala, Olli Nevalainen

Abstract:

In this paper, we propose a novel improvement for the generalized Lloyd Algorithm (GLA). Our algorithm makes use of an M-tree index built on the codebook which makes it possible to reduce the number of distance computations when the nearest code words are searched. Our method does not impose the use of any specific distance function, but works with any metric distance, making it more general than many other fast GLA variants. Finally, we present the positive results of our performance experiments.

Keywords: Clustering, GLA, M-Tree, Vector Quantization .

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1464
8103 Effect of Clustering on Energy Efficiency and Network Lifetime in Wireless Sensor Networks

Authors: Prakash G L, Chaitra K Meti, Poojitha K, Divya R.K.

Abstract:

Wireless Sensor Network is Multi hop Self-configuring Wireless Network consisting of sensor nodes. The deployment of wireless sensor networks in many application areas, e.g., aggregation services, requires self-organization of the network nodes into clusters. Efficient way to enhance the lifetime of the system is to partition the network into distinct clusters with a high energy node as cluster head. The different methods of node clustering techniques have appeared in the literature, and roughly fall into two families; those based on the construction of a dominating set and those which are based solely on energy considerations. Energy optimized cluster formation for a set of randomly scattered wireless sensors is presented. Sensors within a cluster are expected to be communicating with cluster head only. The energy constraint and limited computing resources of the sensor nodes present the major challenges in gathering the data. In this paper we propose a framework to study how partially correlated data affect the performance of clustering algorithms. The total energy consumption and network lifetime can be analyzed by combining random geometry techniques and rate distortion theory. We also present the relation between compression distortion and data correlation.

Keywords: Clusters, multi hop, random geometry, rate distortion.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1589
8102 Skin Lesion Segmentation Using Color Channel Optimization and Clustering-based Histogram Thresholding

Authors: Rahil Garnavi, Mohammad Aldeen, M. Emre Celebi, Alauddin Bhuiyan, Constantinos Dolianitis, George Varigos

Abstract:

Automatic segmentation of skin lesions is the first step towards the automated analysis of malignant melanoma. Although numerous segmentation methods have been developed, few studies have focused on determining the most effective color space for melanoma application. This paper proposes an automatic segmentation algorithm based on color space analysis and clustering-based histogram thresholding, a process which is able to determine the optimal color channel for detecting the borders in dermoscopy images. The algorithm is tested on a set of 30 high resolution dermoscopy images. A comprehensive evaluation of the results is provided, where borders manually drawn by four dermatologists, are compared to automated borders detected by the proposed algorithm, applying three previously used metrics of accuracy, sensitivity, and specificity and a new metric of similarity. By performing ROC analysis and ranking the metrics, it is demonstrated that the best results are obtained with the X and XoYoR color channels, resulting in an accuracy of approximately 97%. The proposed method is also compared with two state-of-theart skin lesion segmentation methods.

Keywords: Border detection, Color space analysis, Dermoscopy, Histogram thresholding, Melanoma, Segmentation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2194
8101 K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors

Authors: Shao-Tzu Huang, Chen-Chien Hsu, Wei-Yen Wang

Abstract:

Matching high dimensional features between images is computationally expensive for exhaustive search approaches in computer vision. Although the dimension of the feature can be degraded by simplifying the prior knowledge of homography, matching accuracy may degrade as a tradeoff. In this paper, we present a feature matching method based on k-means algorithm that reduces the matching cost and matches the features between images instead of using a simplified geometric assumption. Experimental results show that the proposed method outperforms the previous linear exhaustive search approaches in terms of the inlier ratio of matched pairs.

Keywords: Feature matching, k-means clustering, scale invariant feature transform, linear exhaustive search.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1035
8100 Enhancing Privacy-Preserving Cloud Database Querying by Preventing Brute Force Attacks

Authors: Ambika Vishal Pawar, Ajay Dani

Abstract:

Considering the complexities involved in Cloud computing, there are still plenty of issues that affect the privacy of data in cloud environment. Unless these problems get solved, we think that the problem of preserving privacy in cloud databases is still open. In tokenization and homomorphic cryptography based solutions for privacy preserving cloud database querying, there is possibility that by colluding with service provider adversary may run brute force attacks that will reveal the attribute values.

In this paper we propose a solution by defining the variant of K –means clustering algorithm that effectively detects such brute force attacks and enhances privacy of cloud database querying by preventing this attacks.

Keywords: Privacy, Database, Cloud Computing, Clustering, K-means, Cryptography.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2511
8099 Energy Efficient Data Aggregation in Sensor Networks with Optimized Cluster Head Selection

Authors: D. Naga Ravi Kiran, C. G. Dethe

Abstract:

Wireless Sensor Network (WSN) routing is complex due to its dynamic nature, computational overhead, limited battery life, non-conventional addressing scheme, self-organization, and sensor nodes limited transmission range. An energy efficient routing protocol is a major concern in WSN. LEACH is a hierarchical WSN routing protocol to increase network life. It performs self-organizing and re-clustering functions for each round. This study proposes a better sensor networks cluster head selection for efficient data aggregation. The algorithm is based on Tabu search.

Keywords: Wireless Sensor Network (WSN), LEACH, Clustering, Tabu Search.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1987
8098 A Symbol by Symbol Clustering Based Blind Equalizer

Authors: Kristina Georgoulakis

Abstract:

A new blind symbol by symbol equalizer is proposed. The operation of the proposed equalizer is based on the geometric properties of the two dimensional data constellation. An unsupervised clustering technique is used to locate the clusters formed by the received data. The symmetric properties of the clusters labels are subsequently utilized in order to label the clusters. Following this step, the received data are compared to clusters and decisions are made on a symbol by symbol basis, by assigning to each data the label of the nearest cluster. The operation of the equalizer is investigated both in linear and nonlinear channels. The performance of the proposed equalizer is compared to the performance of a CMAbased blind equalizer.

Keywords: Blind equalization, channel equalization, cluster based equalisers

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1379
8097 An Educational Data Mining System for Advising Higher Education Students

Authors: Heba Mohammed Nagy, Walid Mohamed Aly, Osama Fathy Hegazy

Abstract:

Educational  data mining  is  a  specific  data   mining field applied to data originating from educational environments, it relies on different  approaches to discover hidden knowledge  from  the  available   data. Among these approaches are   machine   learning techniques which are used to build a system that acquires learning from previous data. Machine learning can be applied to solve different regression, classification, clustering and optimization problems.

In  our  research, we propose  a “Student  Advisory  Framework” that  utilizes  classification  and  clustering  to  build  an  intelligent system. This system can be used to provide pieces of consultations to a first year  university  student to  pursue a  certain   education   track   where  he/she  will  likely  succeed  in, aiming  to  decrease   the  high  rate   of  academic  failure   among these  students.  A real case study  in Cairo  Higher  Institute  for Engineering, Computer  Science  and  Management  is  presented using  real  dataset   collected  from  2000−2012.The dataset has two main components: pre-higher education dataset and first year courses results dataset. Results have proved the efficiency of the suggested framework.

Keywords: Classification, Clustering, Educational Data Mining (EDM), Machine Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5165
8096 Novel Hybrid Method for Gene Selection and Cancer Prediction

Authors: Liping Jing, Michael K. Ng, Tieyong Zeng

Abstract:

Microarray data profiles gene expression on a whole genome scale, therefore, it provides a good way to study associations between gene expression and occurrence or progression of cancer. More and more researchers realized that microarray data is helpful to predict cancer sample. However, the high dimension of gene expressions is much larger than the sample size, which makes this task very difficult. Therefore, how to identify the significant genes causing cancer becomes emergency and also a hot and hard research topic. Many feature selection algorithms have been proposed in the past focusing on improving cancer predictive accuracy at the expense of ignoring the correlations between the features. In this work, a novel framework (named by SGS) is presented for stable gene selection and efficient cancer prediction . The proposed framework first performs clustering algorithm to find the gene groups where genes in each group have higher correlation coefficient, and then selects the significant genes in each group with Bayesian Lasso and important gene groups with group Lasso, and finally builds prediction model based on the shrinkage gene space with efficient classification algorithm (such as, SVM, 1NN, Regression and etc.). Experiment results on real world data show that the proposed framework often outperforms the existing feature selection and prediction methods, say SAM, IG and Lasso-type prediction model.

Keywords: Gene Selection, Cancer Prediction, Lasso, Clustering, Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1997
8095 Designing Social Care Policies in the Long Term: A Study Using Regression, Clustering and Backpropagation Neural Nets

Authors: Sotirios Raptis

Abstract:

Linking social needs to social classes using different criteria may lead to social services misuse. The paper discusses using ML and Neural Networks (NNs) in linking public services in Scotland in the long term and advocates, this can result in a reduction of the services cost connecting resources needed in groups for similar services. The paper combines typical regression models with clustering and cross-correlation as complementary constituents to predict the demand. Insurance companies and public policymakers can pack linked services such as those offered to the elderly or to low-income people in the longer term. The work is based on public data from 22 services offered by Public Health Services (PHS) Scotland and from the Scottish Government (SG) from 1981 to 2019 that are broken into 110 years series called factors and uses Linear Regression (LR), Autoregression (ARMA) and 3 types of back-propagation (BP) Neural Networks (BPNN) to link them under specific conditions. Relationships found were between smoking related healthcare provision, mental health-related health services, and epidemiological weight in Primary 1(Education) Body Mass Index (BMI) in children. Primary component analysis (PCA) found 11 significant factors while C-Means (CM) clustering gave 5 major factors clusters.

Keywords: Probability, cohorts, data frames, services, prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 380
8094 Dempster-Shafer Evidence Theory for Image Segmentation: Application in Cells Images

Authors: S. Ben Chaabane, M. Sayadi, F. Fnaiech, E. Brassart

Abstract:

In this paper we propose a new knowledge model using the Dempster-Shafer-s evidence theory for image segmentation and fusion. The proposed method is composed essentially of two steps. First, mass distributions in Dempster-Shafer theory are obtained from the membership degrees of each pixel covering the three image components (R, G and B). Each membership-s degree is determined by applying Fuzzy C-Means (FCM) clustering to the gray levels of the three images. Second, the fusion process consists in defining three discernment frames which are associated with the three images to be fused, and then combining them to form a new frame of discernment. The strategy used to define mass distributions in the combined framework is discussed in detail. The proposed fusion method is illustrated in the context of image segmentation. Experimental investigations and comparative studies with the other previous methods are carried out showing thus the robustness and superiority of the proposed method in terms of image segmentation.

Keywords: Fuzzy C-means, Color image, data fusion, Dempster-Shafer's evidence theory

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2159
8093 A Monte Carlo Method to Data Stream Analysis

Authors: Kittisak Kerdprasop, Nittaya Kerdprasop, Pairote Sattayatham

Abstract:

Data stream analysis is the process of computing various summaries and derived values from large amounts of data which are continuously generated at a rapid rate. The nature of a stream does not allow a revisit on each data element. Furthermore, data processing must be fast to produce timely analysis results. These requirements impose constraints on the design of the algorithms to balance correctness against timely responses. Several techniques have been proposed over the past few years to address these challenges. These techniques can be categorized as either dataoriented or task-oriented. The data-oriented approach analyzes a subset of data or a smaller transformed representation, whereas taskoriented scheme solves the problem directly via approximation techniques. We propose a hybrid approach to tackle the data stream analysis problem. The data stream has been both statistically transformed to a smaller size and computationally approximated its characteristics. We adopt a Monte Carlo method in the approximation step. The data reduction has been performed horizontally and vertically through our EMR sampling method. The proposed method is analyzed by a series of experiments. We apply our algorithm on clustering and classification tasks to evaluate the utility of our approach.

Keywords: Data Stream, Monte Carlo, Sampling, DensityEstimation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1382
8092 Assessment of Ultra-High Cycle Fatigue Behavior of EN-GJL-250 Cast Iron Using Ultrasonic Fatigue Testing Machine

Authors: Saeedeh Bakhtiari, Johannes Depessemier, Stijn Hertelé, Wim De Waele

Abstract:

High cycle fatigue comprising up to 107 load cycles has been the subject of many studies, and the behavior of many materials was recorded adequately in this regime. However, many applications involve larger numbers of load cycles during the lifetime of machine components. In this ultra-high cycle regime, other failure mechanisms play, and the concept of a fatigue endurance limit (assumed for materials such as steel) is often an oversimplification of reality. When machine component design demands a high geometrical complexity, cast iron grades become interesting candidate materials. Grey cast iron is known for its low cost, high compressive strength, and good damping properties. However, the ultra-high cycle fatigue behavior of cast iron is poorly documented. The current work focuses on the ultra-high cycle fatigue behavior of EN-GJL-250 (GG25) grey cast iron by developing an ultrasonic (20 kHz) fatigue testing system. Moreover, the testing machine is instrumented to measure the temperature and the displacement of  the specimen, and to control the temperature. The high resonance frequency allowed to assess the  behavior of the cast iron of interest within a matter of days for ultra-high numbers of cycles, and repeat the tests to quantify the natural scatter in fatigue resistance.

Keywords: GG25, cast iron, ultra-high cycle fatigue, ultrasonic test.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 734
8091 Caffeine Content Investigation in the Turkish Black Teas

Authors: E. Moroydor Derun, A. S. Kipcak, O. Dere Ozdemir, F. Demir, M. Karakoc, S. Piskin

Abstract:

Tea is a widely consumed beverage that contains many components. Caffeine belongs to this group of components called alkaloids contain nitrogen. In this study caffeine contents of three types of Turkish teas are determined by using extraction method. After condensation process, residue of caffeine and oil are obtained with evaporation. The oil which is in the residue is removed by hot water. Extraction process performed by using chloroform and the crude caffeine is obtained. From the results of experiments, caffeine contents are found in black tea, green tea and earl grey tea as 3.57±0.43%, 3.11±0.02%, 4.29±0.27%, respectively. Caffeine contents which are found in 1, 5 and 10 cups of tea are calculated. Furthermore, the daily intake of caffeine from black teas that affects human health is investigated.

Keywords: Caffeine, extraction, tea, health.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8521
8090 Hybrid Hierarchical Routing Protocol for WSN Lifetime Maximization

Authors: H. Aoudia, Y. Touati, E. H. Teguig, A. Ali Cherif

Abstract:

Conceiving and developing routing protocols for wireless sensor networks requires considerations on constraints such as network lifetime and energy consumption. In this paper, we propose a hybrid hierarchical routing protocol named HHRP combining both clustering mechanism and multipath optimization taking into account residual energy and RSSI measures. HHRP consists of classifying dynamically nodes into clusters where coordinators nodes with extra privileges are able to manipulate messages, aggregate data and ensure transmission between nodes according to TDMA and CDMA schedules. The reconfiguration of the network is carried out dynamically based on a threshold value which is associated with the number of nodes belonging to the smallest cluster. To show the effectiveness of the proposed approach HHRP, a comparative study with LEACH protocol is illustrated in simulations.

Keywords: Routing protocols, energy optimization, clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 856