Search results for: grey clustering method
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 19064

Search results for: grey clustering method

18914 Water Detection in Aerial Images Using Fuzzy Sets

Authors: Caio Marcelo Nunes, Anderson da Silva Soares, Gustavo Teodoro Laureano, Clarimar Jose Coelho

Abstract:

This paper presents a methodology to pixel recognition in aerial images using fuzzy $c$-means algorithm. This algorithm is a alternative to recognize areas considering uncertainties and inaccuracies. Traditional clustering technics are used in recognizing of multispectral images of earth's surface. This technics recognize well-defined borders that can be easily discretized. However, in the real world there are many areas with uncertainties and inaccuracies which can be mapped by clustering algorithms that use fuzzy sets. The methodology presents in this work is applied to multispectral images obtained from Landsat-5/TM satellite. The pixels are joined using the $c$-means algorithm. After, a classification process identify the types of surface according the patterns obtained from spectral response of image surface. The classes considered are, exposed soil, moist soil, vegetation, turbid water and clean water. The results obtained shows that the fuzzy clustering identify the real type of the earth's surface.

Keywords: aerial images, fuzzy clustering, image processing, pattern recognition

Procedia PDF Downloads 451
18913 Using Genetic Algorithms and Rough Set Based Fuzzy K-Modes to Improve Centroid Model Clustering Performance on Categorical Data

Authors: Rishabh Srivastav, Divyam Sharma

Abstract:

We propose an algorithm to cluster categorical data named as ‘Genetic algorithm initialized rough set based fuzzy K-Modes for categorical data’. We propose an amalgamation of the simple K-modes algorithm, the Rough and Fuzzy set based K-modes and the Genetic Algorithm to form a new algorithm,which we hypothesise, will provide better Centroid Model clustering results, than existing standard algorithms. In the proposed algorithm, the initialization and updation of modes is done by the use of genetic algorithms while the membership values are calculated using the rough set and fuzzy logic.

Keywords: categorical data, fuzzy logic, genetic algorithm, K modes clustering, rough sets

Procedia PDF Downloads 226
18912 Hybridized Approach for Distance Estimation Using K-Means Clustering

Authors: Ritu Vashistha, Jitender Kumar

Abstract:

Clustering using the K-means algorithm is a very common way to understand and analyze the obtained output data. When a similar object is grouped, this is called the basis of Clustering. There is K number of objects and C number of cluster in to single cluster in which k is always supposed to be less than C having each cluster to be its own centroid but the major problem is how is identify the cluster is correct based on the data. Formulation of the cluster is not a regular task for every tuple of row record or entity but it is done by an iterative process. Each and every record, tuple, entity is checked and examined and similarity dissimilarity is examined. So this iterative process seems to be very lengthy and unable to give optimal output for the cluster and time taken to find the cluster. To overcome the drawback challenge, we are proposing a formula to find the clusters at the run time, so this approach can give us optimal results. The proposed approach uses the Euclidian distance formula as well melanosis to find the minimum distance between slots as technically we called clusters and the same approach we have also applied to Ant Colony Optimization(ACO) algorithm, which results in the production of two and multi-dimensional matrix.

Keywords: ant colony optimization, data clustering, centroids, data mining, k-means

Procedia PDF Downloads 109
18911 Clustering of Panels and Shade Diffusion Techniques for Partially Shaded PV Array-Review

Authors: Shahida Khatoon, Mohd. Faisal Jalil, Vaishali Gautam

Abstract:

The Photovoltaic (PV) generated power is mainly dependent on environmental factors. The PV array’s lifetime and overall systems effectiveness reduce due to the partial shading condition. Clustering the electrical connections between solar modules is a viable strategy for minimizing these power losses by shade diffusion. This article comprehensively evaluates various PV array clustering/reconfiguration models for PV systems. These are static and dynamic reconfiguration techniques for extracting maximum power in mismatch conditions. This paper explores and analyzes current breakthroughs in solar PV performance improvement strategies that merit further investigation. Altogether, researchers and academicians working in the field of dedicated solar power generation will benefit from this research.

Keywords: static reconfiguration, dynamic reconfiguration, photo voltaic array, partial shading, CTC configuration

Procedia PDF Downloads 87
18910 Probabilistic Graphical Model for the Web

Authors: M. Nekri, A. Khelladi

Abstract:

The world wide web network is a network with a complex topology, the main properties of which are the distribution of degrees in power law, A low clustering coefficient and a weak average distance. Modeling the web as a graph allows locating the information in little time and consequently offering a help in the construction of the research engine. Here, we present a model based on the already existing probabilistic graphs with all the aforesaid characteristics. This work will consist in studying the web in order to know its structuring thus it will enable us to modelize it more easily and propose a possible algorithm for its exploration.

Keywords: clustering coefficient, preferential attachment, small world, web community

Procedia PDF Downloads 246
18909 Enhancing the Network Security with Gray Code

Authors: Thomas Adi Purnomo Sidhi

Abstract:

Nowadays, network is an essential need in almost every part of human daily activities. People now can seamlessly connect to others through the Internet. With advanced technology, our personal data now can be more easily accessed. One of many components we are concerned for delivering the best network is a security issue. This paper is proposing a method that provides more options for security. This research aims to improve network security by focusing on the physical layer which is the first layer of the OSI model. The layer consists of the basic networking hardware transmission technologies of a network. With the use of observation method, the research produces a schematic design for enhancing the network security through the gray code converter.

Keywords: network, network security, grey code, physical layer

Procedia PDF Downloads 479
18908 A Decision Support System to Detect the Lumbar Disc Disease on the Basis of Clinical MRI

Authors: Yavuz Unal, Kemal Polat, H. Erdinc Kocer

Abstract:

In this study, a decision support system comprising three stages has been proposed to detect the disc abnormalities of the lumbar region. In the first stage named the feature extraction, T2-weighted sagittal and axial Magnetic Resonance Images (MRI) were taken from 55 people and then 27 appearance and shape features were acquired from both sagittal and transverse images. In the second stage named the feature weighting process, k-means clustering based feature weighting (KMCBFW) proposed by Gunes et al. Finally, in the third stage named the classification process, the classifier algorithms including multi-layer perceptron (MLP- neural network), support vector machine (SVM), Naïve Bayes, and decision tree have been used to classify whether the subject has lumbar disc or not. In order to test the performance of the proposed method, the classification accuracy (%), sensitivity, specificity, precision, recall, f-measure, kappa value, and computation times have been used. The best hybrid model is the combination of k-means clustering based feature weighting and decision tree in the detecting of lumbar disc disease based on both sagittal and axial MR images.

Keywords: lumbar disc abnormality, lumbar MRI, lumbar spine, hybrid models, hybrid features, k-means clustering based feature weighting

Procedia PDF Downloads 500
18907 Scattering Operator and Spectral Clustering for Ultrasound Images: Application on Deep Venous Thrombi

Authors: Thibaud Berthomier, Ali Mansour, Luc Bressollette, Frédéric Le Roy, Dominique Mottier, Léo Fréchier, Barthélémy Hermenault

Abstract:

Deep Venous Thrombosis (DVT) occurs when a thrombus is formed within a deep vein (most often in the legs). This disease can be deadly if a part or the whole thrombus reaches the lung and causes a Pulmonary Embolism (PE). This disorder, often asymptomatic, has multifactorial causes: immobilization, surgery, pregnancy, age, cancers, and genetic variations. Our project aims to relate the thrombus epidemiology (origins, patient predispositions, PE) to its structure using ultrasound images. Ultrasonography and elastography were collected using Toshiba Aplio 500 at Brest Hospital. This manuscript compares two classification approaches: spectral clustering and scattering operator. The former is based on the graph and matrix theories while the latter cascades wavelet convolutions with nonlinear modulus and averaging operators.

Keywords: deep venous thrombosis, ultrasonography, elastography, scattering operator, wavelet, spectral clustering

Procedia PDF Downloads 458
18906 The Optimization of an Industrial Recycling Line: Improving the Durability of Recycled Polyethyene Blends

Authors: Alae Lamtai, Said Elkoun, Hniya Kharmoudi, Mathieu Robert, Carl Diez

Abstract:

This study applies Taguchi's design of experiment methodology and grey relational analysis (GRA) for multi objective optimization of an industrial recycling line. This last is composed mainly of a mono and twin-screw extruder and a filtration system. Experiments were performed according to L₁₆ standard orthogonal array based on five process parameters, namely: mono screw design, screw speed of the mono and twin-screw extruder, melt pump pressure, and filter mesh size. The objective of this optimization is to improve the durability of the Polyethylene (PE) blend by decreasing the loss of Stress Crack resistance (SCR) using Notched Crack Ligament Stress (NCLS) test and Unnotched Crack Ligament Stress (UCLS) in parallel with increasing the gain of Izod impact strength of the Polyethylene (PE) blend before and after recycling. Based on Grey Relational Analysis (GRA), the optimal setting of process parameters was identified, and the results indicated that the mono-screw design and screw speed of both mono and twin-screw extruder impact significantly the mechanical properties of recycled Polyethylene (PE) blend.

Keywords: Taguchi, recycling line, polyethylene, stress crack resistance, Izod impact strength, grey relational analysis

Procedia PDF Downloads 50
18905 Clustering of Association Rules of ISIS & Al-Qaeda Based on Similarity Measures

Authors: Tamanna Goyal, Divya Bansal, Sanjeev Sofat

Abstract:

In world-threatening terrorist attacks, where early detection, distinction, and prediction are effective diagnosis techniques and for functionally accurate and precise analysis of terrorism data, there are so many data mining & statistical approaches to assure accuracy. The computational extraction of derived patterns is a non-trivial task which comprises specific domain discovery by means of sophisticated algorithm design and analysis. This paper proposes an approach for similarity extraction by obtaining the useful attributes from the available datasets of terrorist attacks and then applying feature selection technique based on the statistical impurity measures followed by clustering techniques on the basis of similarity measures. On the basis of degree of participation of attributes in the rules, the associative dependencies between the attacks are analyzed. Consequently, to compute the similarity among the discovered rules, we applied a weighted similarity measure. Finally, the rules are grouped by applying using hierarchical clustering. We have applied it to an open source dataset to determine the usability and efficiency of our technique, and a literature search is also accomplished to support the efficiency and accuracy of our results.

Keywords: association rules, clustering, similarity measure, statistical approaches

Procedia PDF Downloads 297
18904 An Analysis on Clustering Based Gene Selection and Classification for Gene Expression Data

Authors: K. Sathishkumar, V. Thiagarasu

Abstract:

Due to recent advances in DNA microarray technology, it is now feasible to obtain gene expression profiles of tissue samples at relatively low costs. Many scientists around the world use the advantage of this gene profiling to characterize complex biological circumstances and diseases. Microarray techniques that are used in genome-wide gene expression and genome mutation analysis help scientists and physicians in understanding of the pathophysiological mechanisms, in diagnoses and prognoses, and choosing treatment plans. DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. This work presents an analysis of several clustering algorithms proposed to deals with the gene expression data effectively. The existing clustering algorithms like Support Vector Machine (SVM), K-means algorithm and evolutionary algorithm etc. are analyzed thoroughly to identify the advantages and limitations. The performance evaluation of the existing algorithms is carried out to determine the best approach. In order to improve the classification performance of the best approach in terms of Accuracy, Convergence Behavior and processing time, a hybrid clustering based optimization approach has been proposed.

Keywords: microarray technology, gene expression data, clustering, gene Selection

Procedia PDF Downloads 303
18903 An Enhanced Distributed Weighted Clustering Algorithm for Intra and Inter Cluster Routing in MANET

Authors: K. Gomathi

Abstract:

Mobile Ad hoc Networks (MANET) is defined as collection of routable wireless mobile nodes with no centralized administration and communicate each other using radio signals. Especially MANETs deployed in hostile environments where hackers will try to disturb the secure data transfer and drain the valuable network resources. Since MANET is battery operated network, preserving the network resource is essential one. For resource constrained computation, efficient routing and to increase the network stability, the network is divided into smaller groups called clusters. The clustering architecture consists of Cluster Head(CH), ordinary node and gateway. The CH is responsible for inter and intra cluster routing. CH election is a prominent research area and many more algorithms are developed using many different metrics. The CH with longer life sustains network lifetime, for this purpose Secondary Cluster Head(SCH) also elected and it is more economical. To nominate efficient CH, a Enhanced Distributed Weighted Clustering Algorithm (EDWCA) has been proposed. This approach considers metrics like battery power, degree difference and speed of the node for CH election. The proficiency of proposed one is evaluated and compared with existing algorithm using Network Simulator(NS-2).

Keywords: MANET, EDWCA, clustering, cluster head

Procedia PDF Downloads 367
18902 Production Optimization under Geological Uncertainty Using Distance-Based Clustering

Authors: Byeongcheol Kang, Junyi Kim, Hyungsik Jung, Hyungjun Yang, Jaewoo An, Jonggeun Choe

Abstract:

It is important to figure out reservoir properties for better production management. Due to the limited information, there are geological uncertainties on very heterogeneous or channel reservoir. One of the solutions is to generate multiple equi-probable realizations using geostatistical methods. However, some models have wrong properties, which need to be excluded for simulation efficiency and reliability. We propose a novel method of model selection scheme, based on distance-based clustering for reliable application of production optimization algorithm. Distance is defined as a degree of dissimilarity between the data. We calculate Hausdorff distance to classify the models based on their similarity. Hausdorff distance is useful for shape matching of the reservoir models. We use multi-dimensional scaling (MDS) to describe the models on two dimensional space and group them by K-means clustering. Rather than simulating all models, we choose one representative model from each cluster and find out the best model, which has the similar production rates with the true values. From the process, we can select good reservoir models near the best model with high confidence. We make 100 channel reservoir models using single normal equation simulation (SNESIM). Since oil and gas prefer to flow through the sand facies, it is critical to characterize pattern and connectivity of the channels in the reservoir. After calculating Hausdorff distances and projecting the models by MDS, we can see that the models assemble depending on their channel patterns. These channel distributions affect operation controls of each production well so that the model selection scheme improves management optimization process. We use one of useful global search algorithms, particle swarm optimization (PSO), for our production optimization. PSO is good to find global optimum of objective function, but it takes too much time due to its usage of many particles and iterations. In addition, if we use multiple reservoir models, the simulation time for PSO will be soared. By using the proposed method, we can select good and reliable models that already matches production data. Considering geological uncertainty of the reservoir, we can get well-optimized production controls for maximum net present value. The proposed method shows one of novel solutions to select good cases among the various probabilities. The model selection schemes can be applied to not only production optimization but also history matching or other ensemble-based methods for efficient simulations.

Keywords: distance-based clustering, geological uncertainty, particle swarm optimization (PSO), production optimization

Procedia PDF Downloads 122
18901 Important Factors Affecting the Effectiveness of Quality Control Circles

Authors: Sogol Zarafshan

Abstract:

The present study aimed to identify important factors affecting the effectiveness of quality control circles in a hospital, as well as rank them using a combination of fuzzy VIKOR and Grey Relational Analysis (GRA). The study population consisted of five academic members and five experts in the field of nursing working in a hospital, who were selected using a purposive sampling method. Also, a sample of 107 nurses was selected through a simple random sampling method using their employee codes and the random-number table. The required data were collected using a researcher-made questionnaire which consisted of 12 factors. The validity of this questionnaire was confirmed through giving the opinions of experts and academic members who participated in the present study, as well as performing confirmatory factor analysis. Its reliability also was verified (α=0.796). The collected data were analyzed using SPSS 22.0 and LISREL 8.8, as well as VIKOR–GRA and IPA methods. The results of ranking the factors affecting the effectiveness of quality control circles showed that the highest and lowest ranks were related to ‘Managers’ and supervisors’ support’ and ‘Group leadership’. Also, the highest hospital performance was for factors such as ‘Clear goals and objectives’ and ‘Group cohesiveness and homogeneity’, and the lowest for ‘Reward system’ and ‘Feedback system’, respectively. The results showed that although ‘Training the members’, ‘Using the right tools’ and ‘Reward system’ were factors that were of great importance, the organization’s performance for these factors was poor. Therefore, these factors should be paid more attention by the studied hospital managers and should be improved as soon as possible.

Keywords: Quality control circles, Fuzzy VIKOR, Grey Relational Analysis, Importance–Performance Analysis

Procedia PDF Downloads 114
18900 Zero Energy Buildings in Hot-Humid Tropical Climates: Boundaries of the Energy Optimization Grey Zone

Authors: Nakul V. Naphade, Sandra G. L. Persiani, Yew Wah Wong, Pramod S. Kamath, Avinash H. Anantharam, Hui Ling Aw, Yann Grynberg

Abstract:

Achieving zero-energy targets in existing buildings is known to be a difficult task requiring important cuts in the building energy consumption, which in many cases clash with the functional necessities of the building wherever the on-site energy generation is unable to match the overall energy consumption. Between the building’s consumption optimization limit and the energy, target stretches a case-specific optimization grey zone, which requires tailored intervention and enhanced user’s commitment. In the view of the future adoption of more stringent energy-efficiency targets in the context of hot-humid tropical climates, this study aims to define the energy optimization grey zone by assessing the energy-efficiency limit in the state-of-the-art typical mid- and high-rise full AC office buildings, through the integration of currently available technologies. Energy models of two code-compliant generic office-building typologies were developed as a baseline, a 20-storey ‘high-rise’ and a 7-storey ‘mid-rise’. Design iterations carried out on the energy models with advanced market ready technologies in lighting, envelope, plug load management and ACMV systems and controls, lead to a representative energy model of the current maximum technical potential. The simulations showed that ZEB targets could be achieved in fully AC buildings under an average of seven floors only by compromising on energy-intense facilities (as full AC, unlimited power-supply, standard user behaviour, etc.). This paper argues that drastic changes must be made in tropical buildings to span the energy optimization grey zone and achieve zero energy. Fully air-conditioned areas must be rethought, while smart technologies must be integrated with an aggressive involvement and motivation of the users to synchronize with the new system’s energy savings goal.

Keywords: energy simulation, office building, tropical climate, zero energy buildings

Procedia PDF Downloads 160
18899 Automatic LV Segmentation with K-means Clustering and Graph Searching on Cardiac MRI

Authors: Hae-Yeoun Lee

Abstract:

Quantification of cardiac function is performed by calculating blood volume and ejection fraction in routine clinical practice. However, these works have been performed by manual contouring,which requires computational costs and varies on the observer. In this paper, an automatic left ventricle segmentation algorithm on cardiac magnetic resonance images (MRI) is presented. Using knowledge on cardiac MRI, a K-mean clustering technique is applied to segment blood region on a coil-sensitivity corrected image. Then, a graph searching technique is used to correct segmentation errors from coil distortion and noises. Finally, blood volume and ejection fraction are calculated. Using cardiac MRI from 15 subjects, the presented algorithm is tested and compared with manual contouring by experts to show outstanding performance.

Keywords: cardiac MRI, graph searching, left ventricle segmentation, K-means clustering

Procedia PDF Downloads 383
18898 An Energy-Balanced Clustering Method on Wireless Sensor Networks

Authors: Yu-Ting Tsai, Chiun-Chieh Hsu, Yu-Chun Chu

Abstract:

In recent years, due to the development of wireless network technology, many researchers have devoted to the study of wireless sensor networks. The applications of wireless sensor network mainly use the sensor nodes to collect the required information, and send the information back to the users. Since the sensed area is difficult to reach, there are many restrictions on the design of the sensor nodes, where the most important restriction is the limited energy of sensor nodes. Because of the limited energy, researchers proposed a number of ways to reduce energy consumption and balance the load of sensor nodes in order to increase the network lifetime. In this paper, we proposed the Energy-Balanced Clustering method with Auxiliary Members on Wireless Sensor Networks(EBCAM)based on the cluster routing. The main purpose is to balance the energy consumption on the sensed area and average the distribution of dead nodes in order to avoid excessive energy consumption because of the increasing in transmission distance. In addition, we use the residual energy and average energy consumption of the nodes within the cluster to choose the cluster heads, use the multi hop transmission method to deliver the data, and dynamically adjust the transmission radius according to the load conditions. Finally, we use the auxiliary cluster members to change the delivering path according to the residual energy of the cluster head in order to its load. Finally, we compare the proposed method with the related algorithms via simulated experiments and then analyze the results. It reveals that the proposed method outperforms other algorithms in the numbers of used rounds and the average energy consumption.

Keywords: auxiliary nodes, cluster, load balance, routing algorithm, wireless sensor network

Procedia PDF Downloads 257
18897 Machine Learning Approach for Lateralization of Temporal Lobe Epilepsy

Authors: Samira-Sadat JamaliDinan, Haidar Almohri, Mohammad-Reza Nazem-Zadeh

Abstract:

Lateralization of temporal lobe epilepsy (TLE) is very important for positive surgical outcomes. We propose a machine learning framework to ultimately identify the epileptogenic hemisphere for temporal lobe epilepsy (TLE) cases using magnetoencephalography (MEG) coherence source imaging (CSI) and diffusion tensor imaging (DTI). Unlike most studies that use classification algorithms, we propose an effective clustering approach to distinguish between normal and TLE cases. We apply the famous Minkowski weighted K-Means (MWK-Means) technique as the clustering framework. To overcome the problem of poor initialization of K-Means, we use particle swarm optimization (PSO) to effectively select the initial centroids of clusters prior to applying MWK-Means. We demonstrate that compared to K-means and MWK-means independently, this approach is able to improve the result of a benchmark data set.

Keywords: temporal lobe epilepsy, machine learning, clustering, magnetoencephalography

Procedia PDF Downloads 130
18896 Multi-Objective Evolutionary Computation Based Feature Selection Applied to Behaviour Assessment of Children

Authors: F. Jiménez, R. Jódar, M. Martín, G. Sánchez, G. Sciavicco

Abstract:

Abstract—Attribute or feature selection is one of the basic strategies to improve the performances of data classification tasks, and, at the same time, to reduce the complexity of classifiers, and it is a particularly fundamental one when the number of attributes is relatively high. Its application to unsupervised classification is restricted to a limited number of experiments in the literature. Evolutionary computation has already proven itself to be a very effective choice to consistently reduce the number of attributes towards a better classification rate and a simpler semantic interpretation of the inferred classifiers. We present a feature selection wrapper model composed by a multi-objective evolutionary algorithm, the clustering method Expectation-Maximization (EM), and the classifier C4.5 for the unsupervised classification of data extracted from a psychological test named BASC-II (Behavior Assessment System for Children - II ed.) with two objectives: Maximizing the likelihood of the clustering model and maximizing the accuracy of the obtained classifier. We present a methodology to integrate feature selection for unsupervised classification, model evaluation, decision making (to choose the most satisfactory model according to a a posteriori process in a multi-objective context), and testing. We compare the performance of the classifier obtained by the multi-objective evolutionary algorithms ENORA and NSGA-II, and the best solution is then validated by the psychologists that collected the data.

Keywords: evolutionary computation, feature selection, classification, clustering

Procedia PDF Downloads 344
18895 Visualization and Performance Measure to Determine Number of Topics in Twitter Data Clustering Using Hybrid Topic Modeling

Authors: Moulana Mohammed

Abstract:

Topic models are widely used in building clusters of documents for more than a decade, yet problems occurring in choosing optimal number of topics. The main problem is the lack of a stable metric of the quality of topics obtained during the construction of topic models. The authors analyzed from previous works, most of the models used in determining the number of topics are non-parametric and quality of topics determined by using perplexity and coherence measures and concluded that they are not applicable in solving this problem. In this paper, we used the parametric method, which is an extension of the traditional topic model with visual access tendency for visualization of the number of topics (clusters) to complement clustering and to choose optimal number of topics based on results of cluster validity indices. Developed hybrid topic models are demonstrated with different Twitter datasets on various topics in obtaining the optimal number of topics and in measuring the quality of clusters. The experimental results showed that the Visual Non-negative Matrix Factorization (VNMF) topic model performs well in determining the optimal number of topics with interactive visualization and in performance measure of the quality of clusters with validity indices.

Keywords: interactive visualization, visual mon-negative matrix factorization model, optimal number of topics, cluster validity indices, Twitter data clustering

Procedia PDF Downloads 114
18894 A Clustering-Based Approach for Weblog Data Cleaning

Authors: Amine Ganibardi, Cherif Arab Ali

Abstract:

This paper addresses the data cleaning issue as a part of web usage data preprocessing within the scope of Web Usage Mining. Weblog data recorded by web servers within log files reflect usage activity, i.e., End-users’ clicks and underlying user-agents’ hits. As Web Usage Mining is interested in End-users’ behavior, user-agents’ hits are referred to as noise to be cleaned-off before mining. Filtering hits from clicks is not trivial for two reasons, i.e., a server records requests interlaced in sequential order regardless of their source or type, website resources may be set up as requestable interchangeably by end-users and user-agents. The current methods are content-centric based on filtering heuristics of relevant/irrelevant items in terms of some cleaning attributes, i.e., website’s resources filetype extensions, website’s resources pointed by hyperlinks/URIs, http methods, user-agents, etc. These methods need exhaustive extra-weblog data and prior knowledge on the relevant and/or irrelevant items to be assumed as clicks or hits within the filtering heuristics. Such methods are not appropriate for dynamic/responsive Web for three reasons, i.e., resources may be set up to as clickable by end-users regardless of their type, website’s resources are indexed by frame names without filetype extensions, web contents are generated and cancelled differently from an end-user to another. In order to overcome these constraints, a clustering-based cleaning method centered on the logging structure is proposed. This method focuses on the statistical properties of the logging structure at the requested and referring resources attributes levels. It is insensitive to logging content and does not need extra-weblog data. The used statistical property takes on the structure of the generated logging feature by webpage requests in terms of clicks and hits. Since a webpage consists of its single URI and several components, these feature results in a single click to multiple hits ratio in terms of the requested and referring resources. Thus, the clustering-based method is meant to identify two clusters based on the application of the appropriate distance to the frequency matrix of the requested and referring resources levels. As the ratio clicks to hits is single to multiple, the clicks’ cluster is the smallest one in requests number. Hierarchical Agglomerative Clustering based on a pairwise distance (Gower) and average linkage has been applied to four logfiles of dynamic/responsive websites whose click to hits ratio range from 1/2 to 1/15. The optimal clustering set on the basis of average linkage and maximum inter-cluster inertia results always in two clusters. The evaluation of the smallest cluster referred to as clicks cluster under the terms of confusion matrix indicators results in 97% of true positive rate. The content-centric cleaning methods, i.e., conventional and advanced cleaning, resulted in a lower rate 91%. Thus, the proposed clustering-based cleaning outperforms the content-centric methods within dynamic and responsive web design without the need of any extra-weblog. Such an improvement in cleaning quality is likely to refine dependent analysis.

Keywords: clustering approach, data cleaning, data preprocessing, weblog data, web usage data

Procedia PDF Downloads 158
18893 Clustering-Based Threshold Model for Condition Rating of Concrete Bridge Decks

Authors: M. Alsharqawi, T. Zayed, S. Abu Dabous

Abstract:

To ensure safety and serviceability of bridge infrastructure, accurate condition assessment and rating methods are needed to provide basis for bridge Maintenance, Repair and Replacement (MRR) decisions. In North America, the common practices to assess condition of bridges are through visual inspection. These practices are limited to detect surface defects and external flaws. Further, the thresholds that define the severity of bridge deterioration are selected arbitrarily. The current research discusses the main deteriorations and defects identified during visual inspection and Non-Destructive Evaluation (NDE). NDE techniques are becoming popular in augmenting the visual examination during inspection to detect subsurface defects. Quality inspection data and accurate condition assessment and rating are the basis for determining appropriate MRR decisions. Thus, in this paper, a novel method for bridge condition assessment using the Quality Function Deployment (QFD) theory is utilized. The QFD model is designed to provide an integrated condition by evaluating both the surface and subsurface defects for concrete bridges. Moreover, an integrated condition rating index with four thresholds is developed based on the QFD condition assessment model and using K-means clustering technique. Twenty case studies are analyzed by applying the QFD model and implementing the developed rating index. The results from the analyzed case studies show that the proposed threshold model produces robust MRR recommendations consistent with decisions and recommendations made by bridge managers on these projects. The proposed method is expected to advance the state of the art of bridges condition assessment and rating.

Keywords: concrete bridge decks, condition assessment and rating, quality function deployment, k-means clustering technique

Procedia PDF Downloads 199
18892 Graph Clustering Unveiled: ClusterSyn - A Machine Learning Framework for Predicting Anti-Cancer Drug Synergy Scores

Authors: Babak Bahri, Fatemeh Yassaee Meybodi, Changiz Eslahchi

Abstract:

In the pursuit of effective cancer therapies, the exploration of combinatorial drug regimens is crucial to leverage synergistic interactions between drugs, thereby improving treatment efficacy and overcoming drug resistance. However, identifying synergistic drug pairs poses challenges due to the vast combinatorial space and limitations of experimental approaches. This study introduces ClusterSyn, a machine learning (ML)-powered framework for classifying anti-cancer drug synergy scores. ClusterSyn employs a two-step approach involving drug clustering and synergy score prediction using a fully connected deep neural network. For each cell line in the training dataset, a drug graph is constructed, with nodes representing drugs and edge weights denoting synergy scores between drug pairs. Drugs are clustered using the Markov clustering (MCL) algorithm, and vectors representing the similarity of drug pairs to each cluster are input into the deep neural network for synergy score prediction (synergy or antagonism). Clustering results demonstrate effective grouping of drugs based on synergy scores, aligning similar synergy profiles. Subsequently, neural network predictions and synergy scores of the two drugs on others within their clusters are used to predict the synergy score of the considered drug pair. This approach facilitates comparative analysis with clustering and regression-based methods, revealing the superior performance of ClusterSyn over state-of-the-art methods like DeepSynergy and DeepDDS on diverse datasets such as Oniel and Almanac. The results highlight the remarkable potential of ClusterSyn as a versatile tool for predicting anti-cancer drug synergy scores.

Keywords: drug synergy, clustering, prediction, machine learning., deep learning

Procedia PDF Downloads 51
18891 A Grey-Box Text Attack Framework Using Explainable AI

Authors: Esther Chiramal, Kelvin Soh Boon Kai

Abstract:

Explainable AI is a strong strategy implemented to understand complex black-box model predictions in a human-interpretable language. It provides the evidence required to execute the use of trustworthy and reliable AI systems. On the other hand, however, it also opens the door to locating possible vulnerabilities in an AI model. Traditional adversarial text attack uses word substitution, data augmentation techniques, and gradient-based attacks on powerful pre-trained Bidirectional Encoder Representations from Transformers (BERT) variants to generate adversarial sentences. These attacks are generally white-box in nature and not practical as they can be easily detected by humans e.g., Changing the word from “Poor” to “Rich”. We proposed a simple yet effective Grey-box cum Black-box approach that does not require the knowledge of the model while using a set of surrogate Transformer/BERT models to perform the attack using Explainable AI techniques. As Transformers are the current state-of-the-art models for almost all Natural Language Processing (NLP) tasks, an attack generated from BERT1 is transferable to BERT2. This transferability is made possible due to the attention mechanism in the transformer that allows the model to capture long-range dependencies in a sequence. Using the power of BERT generalisation via attention, we attempt to exploit how transformers learn by attacking a few surrogate transformer variants which are all based on a different architecture. We demonstrate that this approach is highly effective to generate semantically good sentences by changing as little as one word that is not detectable by humans while still fooling other BERT models.

Keywords: BERT, explainable AI, Grey-box text attack, transformer

Procedia PDF Downloads 117
18890 Altered Network Organization in Mild Alzheimer's Disease Compared to Mild Cognitive Impairment Using Resting-State EEG

Authors: Chia-Feng Lu, Yuh-Jen Wang, Shin Teng, Yu-Te Wu, Sui-Hing Yan

Abstract:

Brain functional networks based on resting-state EEG data were compared between patients with mild Alzheimer’s disease (mAD) and matched patients with amnestic subtype of mild cognitive impairment (aMCI). We integrated the time–frequency cross mutual information (TFCMI) method to estimate the EEG functional connectivity between cortical regions and the network analysis based on graph theory to further investigate the alterations of functional networks in mAD compared with aMCI group. We aimed at investigating the changes of network integrity, local clustering, information processing efficiency, and fault tolerance in mAD brain networks for different frequency bands based on several topological properties, including degree, strength, clustering coefficient, shortest path length, and efficiency. Results showed that the disruptions of network integrity and reductions of network efficiency in mAD characterized by lower degree, decreased clustering coefficient, higher shortest path length, and reduced global and local efficiencies in the delta, theta, beta2, and gamma bands were evident. The significant changes in network organization can be used in assisting discrimination of mAD from aMCI in clinical.

Keywords: EEG, functional connectivity, graph theory, TFCMI

Procedia PDF Downloads 415
18889 Genetic Variation among the Wild and Hatchery Raised Populations of Labeo rohita Revealed by RAPD Markers

Authors: Fayyaz Rasool, Shakeela Parveen

Abstract:

The studies on genetic diversity of Labeo rohita by using molecular markers were carried out to investigate the genetic structure by RAPAD marker and the levels of polymorphism and similarity amongst the different groups of five populations of wild and farmed types. The samples were collected from different five locations as representatives of wild and hatchery raised populations. RAPAD data for Jaccard’s coefficient by following the un-weighted Pair Group Method with Arithmetic Mean (UPGMA) for Hierarchical Clustering of the similar groups on the basis of similarity amongst the genotypes and the dendrogram generated divided the randomly selected individuals of the five populations into three classes/clusters. The variance decomposition for the optimal classification values remained as 52.11% for within class variation, while 47.89% for the between class differences. The Principal Component Analysis (PCA) for grouping of the different genotypes from the different environmental conditions was done by Spearman Varimax rotation method for bi-plot generation of the co-occurrence of the same genotypes with similar genetic properties and specificity of different primers indicated clearly that the increase in the number of factors or components was correlated with the decrease in eigenvalues. The Kaiser Criterion based upon the eigenvalues greater than one, first two main factors accounted for 58.177% of cumulative variability.

Keywords: variation, clustering, PCA, wild, hatchery, RAPAD, Labeo rohita

Procedia PDF Downloads 423
18888 Privacy Preserving Data Publishing Based on Sensitivity in Context of Big Data Using Hive

Authors: P. Srinivasa Rao, K. Venkatesh Sharma, G. Sadhya Devi, V. Nagesh

Abstract:

Privacy Preserving Data Publication is the main concern in present days because the data being published through the internet has been increasing day by day. This huge amount of data was named as Big Data by its size. This project deals the privacy preservation in the context of Big Data using a data warehousing solution called hive. We implemented Nearest Similarity Based Clustering (NSB) with Bottom-up generalization to achieve (v,l)-anonymity. (v,l)-Anonymity deals with the sensitivity vulnerabilities and ensures the individual privacy. We also calculate the sensitivity levels by simple comparison method using the index values, by classifying the different levels of sensitivity. The experiments were carried out on the hive environment to verify the efficiency of algorithms with Big Data. This framework also supports the execution of existing algorithms without any changes. The model in the paper outperforms than existing models.

Keywords: sensitivity, sensitive level, clustering, Privacy Preserving Data Publication (PPDP), bottom-up generalization, Big Data

Procedia PDF Downloads 270
18887 A Model Based Metaheuristic for Hybrid Hierarchical Community Structure in Social Networks

Authors: Radhia Toujani, Jalel Akaichi

Abstract:

In recent years, the study of community detection in social networks has received great attention. The hierarchical structure of the network leads to the emergence of the convergence to a locally optimal community structure. In this paper, we aim to avoid this local optimum in the introduced hybrid hierarchical method. To achieve this purpose, we present an objective function where we incorporate the value of structural and semantic similarity based modularity and a metaheuristic namely bees colonies algorithm to optimize our objective function on both hierarchical level divisive and agglomerative. In order to assess the efficiency and the accuracy of the introduced hybrid bee colony model, we perform an extensive experimental evaluation on both synthetic and real networks.

Keywords: social network, community detection, agglomerative hierarchical clustering, divisive hierarchical clustering, similarity, modularity, metaheuristic, bee colony

Procedia PDF Downloads 360
18886 Analysing Industry Clustering to Develop Competitive Advantage for Wualai Silver Handicraft

Authors: Khanita Tumphasuwan

Abstract:

The Wualai community of Northern Thailand represents important intellectual and social capital and their silver handicraft products are desirable tourist souvenirs within Chiang Mai Province. This community has been in danger of losing this social and intellectual capital due to the application of an improper tool, the Scottish Enterprise model of clustering. This research aims to analyze and increase its competitive advantages for preventing the loss of social and intellectual capital. To improve the Wualai’s competitive advantage, analysis is undertaken using a Porterian cluster approach, including the diamond model, five forces model and cluster mapping. Research results suggest that utilizing the community’s Buddhist beliefs can foster collaboration between community members and is the only way to improve cluster effectiveness, increase competitive advantage, and in turn conserve the Wualai community.

Keywords: industry clustering, silver handicraft, competitive advantage, intellectual capital, social capital

Procedia PDF Downloads 535
18885 An Intrusion Detection Systems Based on K-Means, K-Medoids and Support Vector Clustering Using Ensemble

Authors: A. Mohammadpour, Ebrahim Najafi Kajabad, Ghazale Ipakchi

Abstract:

Presently, computer networks’ security rise in importance and many studies have also been conducted in this field. By the penetration of the internet networks in different fields, many things need to be done to provide a secure industrial and non-industrial network. Fire walls, appropriate Intrusion Detection Systems (IDS), encryption protocols for information sending and receiving, and use of authentication certificated are among things, which should be considered for system security. The aim of the present study is to use the outcome of several algorithms, which cause decline in IDS errors, in the way that improves system security and prevents additional overload to the system. Finally, regarding the obtained result we can also detect the amount and percentage of more sub attacks. By running the proposed system, which is based on the use of multi-algorithmic outcome and comparing that by the proposed single algorithmic methods, we observed a 78.64% result in attack detection that is improved by 3.14% than the proposed algorithms.

Keywords: intrusion detection systems, clustering, k-means, k-medoids, SV clustering, ensemble

Procedia PDF Downloads 199