Search results for: Average Linkage Clustering (ALC)
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5374

Search results for: Average Linkage Clustering (ALC)

5254 Spatial Scale of Clustering of Residential Burglary and Its Dependence on Temporal Scale

Authors: Mohammed A. Alazawi, Shiguo Jiang, Steven F. Messner

Abstract:

Research has long focused on two main spatial aspects of crime: spatial patterns and spatial processes. When analyzing these patterns and processes, a key issue has been to determine the proper spatial scale. In addition, it is important to consider the possibility that these patterns and processes might differ appreciably for different temporal scales and might vary across geographic units of analysis. We examine the spatial-temporal dependence of residential burglary. This dependence is tested at varying geographical scales and temporal aggregations. The analyses are based on recorded incidents of crime in Columbus, Ohio during the 1994-2002 period. We implement point pattern analysis on the crime points using Ripley’s K function. The results indicate that spatial point patterns of residential burglary reveal spatial scales of clustering relatively larger than the average size of census tracts of the study area. Also, spatial scale is independent of temporal scale. The results of our analyses concerning the geographic scale of spatial patterns and processes can inform the development of effective policies for crime control.

Keywords: inhomogeneous K function, residential burglary, spatial point pattern, spatial scale, temporal scale

Procedia PDF Downloads 319
5253 Health Trajectory Clustering Using Deep Belief Networks

Authors: Farshid Hajati, Federico Girosi, Shima Ghassempour

Abstract:

We present a Deep Belief Network (DBN) method for clustering health trajectories. Deep Belief Network (DBN) is a deep architecture that consists of a stack of Restricted Boltzmann Machines (RBM). In a deep architecture, each layer learns more complex features than the past layers. The proposed method depends on DBN in clustering without using back propagation learning algorithm. The proposed DBN has a better a performance compared to the deep neural network due the initialization of the connecting weights. We use Contrastive Divergence (CD) method for training the RBMs which increases the performance of the network. The performance of the proposed method is evaluated extensively on the Health and Retirement Study (HRS) database. The University of Michigan Health and Retirement Study (HRS) is a nationally representative longitudinal study that has surveyed more than 27,000 elderly and near-elderly Americans since its inception in 1992. Participants are interviewed every two years and they collect data on physical and mental health, insurance coverage, financial status, family support systems, labor market status, and retirement planning. The dataset is publicly available and we use the RAND HRS version L, which is easy to use and cleaned up version of the data. The size of sample data set is 268 and the length of the trajectories is equal to 10. The trajectories do not stop when the patient dies and represent 10 different interviews of live patients. Compared to the state-of-the-art benchmarks, the experimental results show the effectiveness and superiority of the proposed method in clustering health trajectories.

Keywords: health trajectory, clustering, deep learning, DBN

Procedia PDF Downloads 347
5252 Clustering of Panels and Shade Diffusion Techniques for Partially Shaded PV Array-Review

Authors: Shahida Khatoon, Mohd. Faisal Jalil, Vaishali Gautam

Abstract:

The Photovoltaic (PV) generated power is mainly dependent on environmental factors. The PV array’s lifetime and overall systems effectiveness reduce due to the partial shading condition. Clustering the electrical connections between solar modules is a viable strategy for minimizing these power losses by shade diffusion. This article comprehensively evaluates various PV array clustering/reconfiguration models for PV systems. These are static and dynamic reconfiguration techniques for extracting maximum power in mismatch conditions. This paper explores and analyzes current breakthroughs in solar PV performance improvement strategies that merit further investigation. Altogether, researchers and academicians working in the field of dedicated solar power generation will benefit from this research.

Keywords: static reconfiguration, dynamic reconfiguration, photo voltaic array, partial shading, CTC configuration

Procedia PDF Downloads 88
5251 Fast Short-Term Electrical Load Forecasting under High Meteorological Variability with a Multiple Equation Time Series Approach

Authors: Charline David, Alexandre Blondin Massé, Arnaud Zinflou

Abstract:

In 2016, Clements, Hurn, and Li proposed a multiple equation time series approach for the short-term load forecasting, reporting an average mean absolute percentage error (MAPE) of 1.36% on an 11-years dataset for the Queensland region in Australia. We present an adaptation of their model to the electrical power load consumption for the whole Quebec province in Canada. More precisely, we take into account two additional meteorological variables — cloudiness and wind speed — on top of temperature, as well as the use of multiple meteorological measurements taken at different locations on the territory. We also consider other minor improvements. Our final model shows an average MAPE score of 1:79% over an 8-years dataset.

Keywords: short-term load forecasting, special days, time series, multiple equations, parallelization, clustering

Procedia PDF Downloads 78
5250 Clustering-Based Computational Workload Minimization in Ontology Matching

Authors: Mansir Abubakar, Hazlina Hamdan, Norwati Mustapha, Teh Noranis Mohd Aris

Abstract:

In order to build a matching pattern for each class correspondences of ontology, it is required to specify a set of attribute correspondences across two corresponding classes by clustering. Clustering reduces the size of potential attribute correspondences considered in the matching activity, which will significantly reduce the computation workload; otherwise, all attributes of a class should be compared with all attributes of the corresponding class. Most existing ontology matching approaches lack scalable attributes discovery methods, such as cluster-based attribute searching. This problem makes ontology matching activity computationally expensive. It is therefore vital in ontology matching to design a scalable element or attribute correspondence discovery method that would reduce the size of potential elements correspondences during mapping thereby reduce the computational workload in a matching process as a whole. The objective of this work is 1) to design a clustering method for discovering similar attributes correspondences and relationships between ontologies, 2) to discover element correspondences by classifying elements of each class based on element’s value features using K-medoids clustering technique. Discovering attribute correspondence is highly required for comparing instances when matching two ontologies. During the matching process, any two instances across two different data sets should be compared to their attribute values, so that they can be regarded to be the same or not. Intuitively, any two instances that come from classes across which there is a class correspondence are likely to be identical to each other. Besides, any two instances that hold more similar attribute values are more likely to be matched than the ones with less similar attribute values. Most of the time, similar attribute values exist in the two instances across which there is an attribute correspondence. This work will present how to classify attributes of each class with K-medoids clustering, then, clustered groups to be mapped by their statistical value features. We will also show how to map attributes of a clustered group to attributes of the mapped clustered group, generating a set of potential attribute correspondences that would be applied to generate a matching pattern. The K-medoids clustering phase would largely reduce the number of attribute pairs that are not corresponding for comparing instances as only the coverage probability of attributes pairs that reaches 100% and attributes above the specified threshold can be considered as potential attributes for a matching. Using clustering will reduce the size of potential elements correspondences to be considered during mapping activity, which will in turn reduce the computational workload significantly. Otherwise, all element of the class in source ontology have to be compared with all elements of the corresponding classes in target ontology. K-medoids can ably cluster attributes of each class, so that a proportion of attribute pairs that are not corresponding would not be considered when constructing the matching pattern.

Keywords: attribute correspondence, clustering, computational workload, k-medoids clustering, ontology matching

Procedia PDF Downloads 230
5249 Analysis of Expression Data Using Unsupervised Techniques

Authors: M. A. I Perera, C. R. Wijesinghe, A. R. Weerasinghe

Abstract:

his study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better subtypes of tumors. Identifying subtypes of cancer help in improving the efficacy and reducing the toxicity of the treatments by identifying clues to find target therapeutics. Process of gene expression data analysis described under three steps as preprocessing, clustering, and cluster validation. Feature selection is important since the genomic data are high dimensional with a large number of features compared to samples. Hierarchical clustering and K Means are often used in the analysis of gene expression data. There are several cluster validation techniques used in validating the clusters. Heatmaps are an effective external validation method that allows comparing the identified classes with clinical variables and visual analysis of the classes.

Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation

Procedia PDF Downloads 128
5248 An Integrated Label Propagation Network for Structural Condition Assessment

Authors: Qingsong Xiong, Cheng Yuan, Qingzhao Kong, Haibei Xiong

Abstract:

Deep-learning-driven approaches based on vibration responses have attracted larger attention in rapid structural condition assessment while obtaining sufficient measured training data with corresponding labels is relevantly costly and even inaccessible in practical engineering. This study proposes an integrated label propagation network for structural condition assessment, which is able to diffuse the labels from continuously-generating measurements by intact structure to those of missing labels of damage scenarios. The integrated network is embedded with damage-sensitive features extraction by deep autoencoder and pseudo-labels propagation by optimized fuzzy clustering, the architecture and mechanism which are elaborated. With a sophisticated network design and specified strategies for improving performance, the present network achieves to extends the superiority of self-supervised representation learning, unsupervised fuzzy clustering and supervised classification algorithms into an integration aiming at assessing damage conditions. Both numerical simulations and full-scale laboratory shaking table tests of a two-story building structure were conducted to validate its capability of detecting post-earthquake damage. The identifying accuracy of a present network was 0.95 in numerical validations and an average 0.86 in laboratory case studies, respectively. It should be noted that the whole training procedure of all involved models in the network stringently doesn’t rely upon any labeled data of damage scenarios but only several samples of intact structure, which indicates a significant superiority in model adaptability and feasible applicability in practice.

Keywords: autoencoder, condition assessment, fuzzy clustering, label propagation

Procedia PDF Downloads 78
5247 Scattering Operator and Spectral Clustering for Ultrasound Images: Application on Deep Venous Thrombi

Authors: Thibaud Berthomier, Ali Mansour, Luc Bressollette, Frédéric Le Roy, Dominique Mottier, Léo Fréchier, Barthélémy Hermenault

Abstract:

Deep Venous Thrombosis (DVT) occurs when a thrombus is formed within a deep vein (most often in the legs). This disease can be deadly if a part or the whole thrombus reaches the lung and causes a Pulmonary Embolism (PE). This disorder, often asymptomatic, has multifactorial causes: immobilization, surgery, pregnancy, age, cancers, and genetic variations. Our project aims to relate the thrombus epidemiology (origins, patient predispositions, PE) to its structure using ultrasound images. Ultrasonography and elastography were collected using Toshiba Aplio 500 at Brest Hospital. This manuscript compares two classification approaches: spectral clustering and scattering operator. The former is based on the graph and matrix theories while the latter cascades wavelet convolutions with nonlinear modulus and averaging operators.

Keywords: deep venous thrombosis, ultrasonography, elastography, scattering operator, wavelet, spectral clustering

Procedia PDF Downloads 458
5246 Clustering of Association Rules of ISIS & Al-Qaeda Based on Similarity Measures

Authors: Tamanna Goyal, Divya Bansal, Sanjeev Sofat

Abstract:

In world-threatening terrorist attacks, where early detection, distinction, and prediction are effective diagnosis techniques and for functionally accurate and precise analysis of terrorism data, there are so many data mining & statistical approaches to assure accuracy. The computational extraction of derived patterns is a non-trivial task which comprises specific domain discovery by means of sophisticated algorithm design and analysis. This paper proposes an approach for similarity extraction by obtaining the useful attributes from the available datasets of terrorist attacks and then applying feature selection technique based on the statistical impurity measures followed by clustering techniques on the basis of similarity measures. On the basis of degree of participation of attributes in the rules, the associative dependencies between the attacks are analyzed. Consequently, to compute the similarity among the discovered rules, we applied a weighted similarity measure. Finally, the rules are grouped by applying using hierarchical clustering. We have applied it to an open source dataset to determine the usability and efficiency of our technique, and a literature search is also accomplished to support the efficiency and accuracy of our results.

Keywords: association rules, clustering, similarity measure, statistical approaches

Procedia PDF Downloads 297
5245 A Technique for Image Segmentation Using K-Means Clustering Classification

Authors: Sadia Basar, Naila Habib, Awais Adnan

Abstract:

The paper presents the Technique for Image Segmentation Using K-Means Clustering Classification. The presented algorithms were specific, however, missed the neighboring information and required high-speed computerized machines to run the segmentation algorithms. Clustering is the process of partitioning a group of data points into a small number of clusters. The proposed method is content-aware and feature extraction method which is able to run on low-end computerized machines, simple algorithm, required low-quality streaming, efficient and used for security purpose. It has the capability to highlight the boundary and the object. At first, the user enters the data in the representation of the input. Then in the next step, the digital image is converted into groups clusters. Clusters are divided into many regions. The same categories with same features of clusters are assembled within a group and different clusters are placed in other groups. Finally, the clusters are combined with respect to similar features and then represented in the form of segments. The clustered image depicts the clear representation of the digital image in order to highlight the regions and boundaries of the image. At last, the final image is presented in the form of segments. All colors of the image are separated in clusters.

Keywords: clustering, image segmentation, K-means function, local and global minimum, region

Procedia PDF Downloads 357
5244 Automatic Classification for the Degree of Disc Narrowing from X-Ray Images Using CNN

Authors: Kwangmin Joo

Abstract:

Automatic detection of lumbar vertebrae and classification method is proposed for evaluating the degree of disc narrowing. Prior to classification, deep learning based segmentation is applied to detect individual lumbar vertebra. M-net is applied to segment five lumbar vertebrae and fine-tuning segmentation is employed to improve the accuracy of segmentation. Using the features extracted from previous step, clustering technique, k-means clustering, is applied to estimate the degree of disc space narrowing under four grade scoring system. As preliminary study, techniques proposed in this research could help building an automatic scoring system to diagnose the severity of disc narrowing from X-ray images.

Keywords: Disc space narrowing, Degenerative disc disorders, Deep learning based segmentation, Clustering technique

Procedia PDF Downloads 109
5243 An Analysis on Clustering Based Gene Selection and Classification for Gene Expression Data

Authors: K. Sathishkumar, V. Thiagarasu

Abstract:

Due to recent advances in DNA microarray technology, it is now feasible to obtain gene expression profiles of tissue samples at relatively low costs. Many scientists around the world use the advantage of this gene profiling to characterize complex biological circumstances and diseases. Microarray techniques that are used in genome-wide gene expression and genome mutation analysis help scientists and physicians in understanding of the pathophysiological mechanisms, in diagnoses and prognoses, and choosing treatment plans. DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. This work presents an analysis of several clustering algorithms proposed to deals with the gene expression data effectively. The existing clustering algorithms like Support Vector Machine (SVM), K-means algorithm and evolutionary algorithm etc. are analyzed thoroughly to identify the advantages and limitations. The performance evaluation of the existing algorithms is carried out to determine the best approach. In order to improve the classification performance of the best approach in terms of Accuracy, Convergence Behavior and processing time, a hybrid clustering based optimization approach has been proposed.

Keywords: microarray technology, gene expression data, clustering, gene Selection

Procedia PDF Downloads 303
5242 An Enhanced Distributed Weighted Clustering Algorithm for Intra and Inter Cluster Routing in MANET

Authors: K. Gomathi

Abstract:

Mobile Ad hoc Networks (MANET) is defined as collection of routable wireless mobile nodes with no centralized administration and communicate each other using radio signals. Especially MANETs deployed in hostile environments where hackers will try to disturb the secure data transfer and drain the valuable network resources. Since MANET is battery operated network, preserving the network resource is essential one. For resource constrained computation, efficient routing and to increase the network stability, the network is divided into smaller groups called clusters. The clustering architecture consists of Cluster Head(CH), ordinary node and gateway. The CH is responsible for inter and intra cluster routing. CH election is a prominent research area and many more algorithms are developed using many different metrics. The CH with longer life sustains network lifetime, for this purpose Secondary Cluster Head(SCH) also elected and it is more economical. To nominate efficient CH, a Enhanced Distributed Weighted Clustering Algorithm (EDWCA) has been proposed. This approach considers metrics like battery power, degree difference and speed of the node for CH election. The proficiency of proposed one is evaluated and compared with existing algorithm using Network Simulator(NS-2).

Keywords: MANET, EDWCA, clustering, cluster head

Procedia PDF Downloads 367
5241 Data Mining Spatial: Unsupervised Classification of Geographic Data

Authors: Chahrazed Zouaoui

Abstract:

In recent years, the volume of geospatial information is increasing due to the evolution of communication technologies and information, this information is presented often by geographic information systems (GIS) and stored on of spatial databases (BDS). The classical data mining revealed a weakness in knowledge extraction at these enormous amounts of data due to the particularity of these spatial entities, which are characterized by the interdependence between them (1st law of geography). This gave rise to spatial data mining. Spatial data mining is a process of analyzing geographic data, which allows the extraction of knowledge and spatial relationships from geospatial data, including methods of this process we distinguish the monothematic and thematic, geo- Clustering is one of the main tasks of spatial data mining, which is registered in the part of the monothematic method. It includes geo-spatial entities similar in the same class and it affects more dissimilar to the different classes. In other words, maximize intra-class similarity and minimize inter similarity classes. Taking account of the particularity of geo-spatial data. Two approaches to geo-clustering exist, the dynamic processing of data involves applying algorithms designed for the direct treatment of spatial data, and the approach based on the spatial data pre-processing, which consists of applying clustering algorithms classic pre-processed data (by integration of spatial relationships). This approach (based on pre-treatment) is quite complex in different cases, so the search for approximate solutions involves the use of approximation algorithms, including the algorithms we are interested in dedicated approaches (clustering methods for partitioning and methods for density) and approaching bees (biomimetic approach), our study is proposed to design very significant to this problem, using different algorithms for automatically detecting geo-spatial neighborhood in order to implement the method of geo- clustering by pre-treatment, and the application of the bees algorithm to this problem for the first time in the field of geo-spatial.

Keywords: mining, GIS, geo-clustering, neighborhood

Procedia PDF Downloads 360
5240 Automatic LV Segmentation with K-means Clustering and Graph Searching on Cardiac MRI

Authors: Hae-Yeoun Lee

Abstract:

Quantification of cardiac function is performed by calculating blood volume and ejection fraction in routine clinical practice. However, these works have been performed by manual contouring,which requires computational costs and varies on the observer. In this paper, an automatic left ventricle segmentation algorithm on cardiac magnetic resonance images (MRI) is presented. Using knowledge on cardiac MRI, a K-mean clustering technique is applied to segment blood region on a coil-sensitivity corrected image. Then, a graph searching technique is used to correct segmentation errors from coil distortion and noises. Finally, blood volume and ejection fraction are calculated. Using cardiac MRI from 15 subjects, the presented algorithm is tested and compared with manual contouring by experts to show outstanding performance.

Keywords: cardiac MRI, graph searching, left ventricle segmentation, K-means clustering

Procedia PDF Downloads 383
5239 Multi-Environment Quantitative Trait Loci Mapping for Grain Iron and Zinc Content Using Bi-Parental Recombinant Inbred Lines in Pearl Millet

Authors: Tripti Singhal, C. Tara Satyavathi, S. P. Singh, Aruna Kumar, Mukesh Sankar S., C. Bhardwaj, Mallik M., Jayant Bhat, N. Anuradha, Nirupma Singh

Abstract:

Pearl millet is a climate-resilient nutritious crop. We report iron and zinc content QTLs from 3 divergent locations. The content of grain Fe in the RILs ranged between 36 and 114 mg/kg, and that of Zn from 20 to 106 mg/kg across the three years at over 3 locations (Delhi, Dharwad, and Jodhpur). We used SSRs to generate a linkage map using 210 F₆ RIL derived from the (PPMI 683 × PPMI 627) cross. The linkage map of 151 loci was 3403.6 cM in length. QTL analysis revealed a total of 22 QTLs for both traits at all locations. Inside QTLs, candidate genes were identified using bioinformatics approaches.

Keywords: yield, pearl millet, QTL mapping, multi-environment, RILs

Procedia PDF Downloads 119
5238 Machine Learning Approach for Lateralization of Temporal Lobe Epilepsy

Authors: Samira-Sadat JamaliDinan, Haidar Almohri, Mohammad-Reza Nazem-Zadeh

Abstract:

Lateralization of temporal lobe epilepsy (TLE) is very important for positive surgical outcomes. We propose a machine learning framework to ultimately identify the epileptogenic hemisphere for temporal lobe epilepsy (TLE) cases using magnetoencephalography (MEG) coherence source imaging (CSI) and diffusion tensor imaging (DTI). Unlike most studies that use classification algorithms, we propose an effective clustering approach to distinguish between normal and TLE cases. We apply the famous Minkowski weighted K-Means (MWK-Means) technique as the clustering framework. To overcome the problem of poor initialization of K-Means, we use particle swarm optimization (PSO) to effectively select the initial centroids of clusters prior to applying MWK-Means. We demonstrate that compared to K-means and MWK-means independently, this approach is able to improve the result of a benchmark data set.

Keywords: temporal lobe epilepsy, machine learning, clustering, magnetoencephalography

Procedia PDF Downloads 130
5237 Electricity Generation from Renewables and Targets: An Application of Multivariate Statistical Techniques

Authors: Filiz Ersoz, Taner Ersoz, Tugrul Bayraktar

Abstract:

Renewable energy is referred to as "clean energy" and common popular support for the use of renewable energy (RE) is to provide electricity with zero carbon dioxide emissions. This study provides useful insight into the European Union (EU) RE, especially, into electricity generation obtained from renewables, and their targets. The objective of this study is to identify groups of European countries, using multivariate statistical analysis and selected indicators. The hierarchical clustering method is used to decide the number of clusters for EU countries. The conducted statistical hierarchical cluster analysis is based on the Ward’s clustering method and squared Euclidean distances. Hierarchical cluster analysis identified eight distinct clusters of European countries. Then, non-hierarchical clustering (k-means) method was applied. Discriminant analysis was used to determine the validity of the results with data normalized by Z score transformation. To explore the relationship between the selected indicators, correlation coefficients were computed. The results of the study reveal the current situation of RE in European Union Member States.

Keywords: share of electricity generation, k-means clustering, discriminant, CO2 emission

Procedia PDF Downloads 398
5236 A Feature Clustering-Based Sequential Selection Approach for Color Texture Classification

Authors: Mohamed Alimoussa, Alice Porebski, Nicolas Vandenbroucke, Rachid Oulad Haj Thami, Sana El Fkihi

Abstract:

Color and texture are highly discriminant visual cues that provide an essential information in many types of images. Color texture representation and classification is therefore one of the most challenging problems in computer vision and image processing applications. Color textures can be represented in different color spaces by using multiple image descriptors which generate a high dimensional set of texture features. In order to reduce the dimensionality of the feature set, feature selection techniques can be used. The goal of feature selection is to find a relevant subset from an original feature space that can improve the accuracy and efficiency of a classification algorithm. Traditionally, feature selection is focused on removing irrelevant features, neglecting the possible redundancy between relevant ones. This is why some feature selection approaches prefer to use feature clustering analysis to aid and guide the search. These techniques can be divided into two categories. i) Feature clustering-based ranking algorithm uses feature clustering as an analysis that comes before feature ranking. Indeed, after dividing the feature set into groups, these approaches perform a feature ranking in order to select the most discriminant feature of each group. ii) Feature clustering-based subset search algorithms can use feature clustering following one of three strategies; as an initial step that comes before the search, binded and combined with the search or as the search alternative and replacement. In this paper, we propose a new feature clustering-based sequential selection approach for the purpose of color texture representation and classification. Our approach is a three step algorithm. First, irrelevant features are removed from the feature set thanks to a class-correlation measure. Then, introducing a new automatic feature clustering algorithm, the feature set is divided into several feature clusters. Finally, a sequential search algorithm, based on a filter model and a separability measure, builds a relevant and non redundant feature subset: at each step, a feature is selected and features of the same cluster are removed and thus not considered thereafter. This allows to significantly speed up the selection process since large number of redundant features are eliminated at each step. The proposed algorithm uses the clustering algorithm binded and combined with the search. Experiments using a combination of two well known texture descriptors, namely Haralick features extracted from Reduced Size Chromatic Co-occurence Matrices (RSCCMs) and features extracted from Local Binary patterns (LBP) image histograms, on five color texture data sets, Outex, NewBarktex, Parquet, Stex and USPtex demonstrate the efficiency of our method compared to seven of the state of the art methods in terms of accuracy and computation time.

Keywords: feature selection, color texture classification, feature clustering, color LBP, chromatic cooccurrence matrix

Procedia PDF Downloads 111
5235 Graph Clustering Unveiled: ClusterSyn - A Machine Learning Framework for Predicting Anti-Cancer Drug Synergy Scores

Authors: Babak Bahri, Fatemeh Yassaee Meybodi, Changiz Eslahchi

Abstract:

In the pursuit of effective cancer therapies, the exploration of combinatorial drug regimens is crucial to leverage synergistic interactions between drugs, thereby improving treatment efficacy and overcoming drug resistance. However, identifying synergistic drug pairs poses challenges due to the vast combinatorial space and limitations of experimental approaches. This study introduces ClusterSyn, a machine learning (ML)-powered framework for classifying anti-cancer drug synergy scores. ClusterSyn employs a two-step approach involving drug clustering and synergy score prediction using a fully connected deep neural network. For each cell line in the training dataset, a drug graph is constructed, with nodes representing drugs and edge weights denoting synergy scores between drug pairs. Drugs are clustered using the Markov clustering (MCL) algorithm, and vectors representing the similarity of drug pairs to each cluster are input into the deep neural network for synergy score prediction (synergy or antagonism). Clustering results demonstrate effective grouping of drugs based on synergy scores, aligning similar synergy profiles. Subsequently, neural network predictions and synergy scores of the two drugs on others within their clusters are used to predict the synergy score of the considered drug pair. This approach facilitates comparative analysis with clustering and regression-based methods, revealing the superior performance of ClusterSyn over state-of-the-art methods like DeepSynergy and DeepDDS on diverse datasets such as Oniel and Almanac. The results highlight the remarkable potential of ClusterSyn as a versatile tool for predicting anti-cancer drug synergy scores.

Keywords: drug synergy, clustering, prediction, machine learning., deep learning

Procedia PDF Downloads 52
5234 Evaluating the Understanding of the University Students (Basic Sciences and Engineering) about the Numerical Representation of the Average Rate of Change

Authors: Saeid Haghjoo, Ebrahim Reyhani, Fahimeh Kolahdouz

Abstract:

The present study aimed to evaluate the understanding of the students in Tehran universities (Iran) about the numerical representation of the average rate of change based on the Structure of Observed Learning Outcomes (SOLO). In the present descriptive-survey research, the statistical population included undergraduate students (basic sciences and engineering) in the universities of Tehran. The samples were 604 students selected by random multi-stage clustering. The measurement tool was a task whose face and content validity was confirmed by math and mathematics education professors. Using Cronbach's Alpha criterion, the reliability coefficient of the task was obtained 0.95, which verified its reliability. The collected data were analyzed by descriptive statistics and inferential statistics (chi-squared and independent t-tests) under SPSS-24 software. According to the SOLO model in the prestructural, unistructural, and multistructural levels, basic science students had a higher percentage of understanding than that of engineering students, although the outcome was inverse at the relational level. However, there was no significant difference in the average understanding of both groups. The results indicated that students failed to have a proper understanding of the numerical representation of the average rate of change, in addition to missconceptions when using physics formulas in solving the problem. In addition, multiple solutions were derived along with their dominant methods during the qualitative analysis. The current research proposed to focus on the context problems with approximate calculations and numerical representation, using software and connection common relations between math and physics in the teaching process of teachers and professors.

Keywords: average rate of change, context problems, derivative, numerical representation, SOLO taxonomy

Procedia PDF Downloads 79
5233 Analysis of Cooperative Learning Behavior Based on the Data of Students' Movement

Authors: Wang Lin, Li Zhiqiang

Abstract:

The purpose of this paper is to analyze the cooperative learning behavior pattern based on the data of students' movement. The study firstly reviewed the cooperative learning theory and its research status, and briefly introduced the k-means clustering algorithm. Then, it used clustering algorithm and mathematical statistics theory to analyze the activity rhythm of individual student and groups in different functional areas, according to the movement data provided by 10 first-year graduate students. It also focused on the analysis of students' behavior in the learning area and explored the law of cooperative learning behavior. The research result showed that the cooperative learning behavior analysis method based on movement data proposed in this paper is feasible. From the results of data analysis, the characteristics of behavior of students and their cooperative learning behavior patterns could be found.

Keywords: behavior pattern, cooperative learning, data analyze, k-means clustering algorithm

Procedia PDF Downloads 161
5232 Analysing Industry Clustering to Develop Competitive Advantage for Wualai Silver Handicraft

Authors: Khanita Tumphasuwan

Abstract:

The Wualai community of Northern Thailand represents important intellectual and social capital and their silver handicraft products are desirable tourist souvenirs within Chiang Mai Province. This community has been in danger of losing this social and intellectual capital due to the application of an improper tool, the Scottish Enterprise model of clustering. This research aims to analyze and increase its competitive advantages for preventing the loss of social and intellectual capital. To improve the Wualai’s competitive advantage, analysis is undertaken using a Porterian cluster approach, including the diamond model, five forces model and cluster mapping. Research results suggest that utilizing the community’s Buddhist beliefs can foster collaboration between community members and is the only way to improve cluster effectiveness, increase competitive advantage, and in turn conserve the Wualai community.

Keywords: industry clustering, silver handicraft, competitive advantage, intellectual capital, social capital

Procedia PDF Downloads 540
5231 An Intrusion Detection Systems Based on K-Means, K-Medoids and Support Vector Clustering Using Ensemble

Authors: A. Mohammadpour, Ebrahim Najafi Kajabad, Ghazale Ipakchi

Abstract:

Presently, computer networks’ security rise in importance and many studies have also been conducted in this field. By the penetration of the internet networks in different fields, many things need to be done to provide a secure industrial and non-industrial network. Fire walls, appropriate Intrusion Detection Systems (IDS), encryption protocols for information sending and receiving, and use of authentication certificated are among things, which should be considered for system security. The aim of the present study is to use the outcome of several algorithms, which cause decline in IDS errors, in the way that improves system security and prevents additional overload to the system. Finally, regarding the obtained result we can also detect the amount and percentage of more sub attacks. By running the proposed system, which is based on the use of multi-algorithmic outcome and comparing that by the proposed single algorithmic methods, we observed a 78.64% result in attack detection that is improved by 3.14% than the proposed algorithms.

Keywords: intrusion detection systems, clustering, k-means, k-medoids, SV clustering, ensemble

Procedia PDF Downloads 199
5230 Filtering Intrusion Detection Alarms Using Ant Clustering Approach

Authors: Ghodhbani Salah, Jemili Farah

Abstract:

With the growth of cyber attacks, information safety has become an important issue all over the world. Many firms rely on security technologies such as intrusion detection systems (IDSs) to manage information technology security risks. IDSs are considered to be the last line of defense to secure a network and play a very important role in detecting large number of attacks. However the main problem with today’s most popular commercial IDSs is generating high volume of alerts and huge number of false positives. This drawback has become the main motivation for many research papers in IDS area. Hence, in this paper we present a data mining technique to assist network administrators to analyze and reduce false positive alarms that are produced by an IDS and increase detection accuracy. Our data mining technique is unsupervised clustering method based on hybrid ANT algorithm. This algorithm discovers clusters of intruders’ behavior without prior knowledge of a possible number of classes, then we apply K-means algorithm to improve the convergence of the ANT clustering. Experimental results on real dataset show that our proposed approach is efficient with high detection rate and low false alarm rate.

Keywords: intrusion detection system, alarm filtering, ANT class, ant clustering, intruders’ behaviors, false alarms

Procedia PDF Downloads 388
5229 Linkage Disequilibrium and Haplotype Blocks Study from Two High-Density Panels and a Combined Panel in Nelore Beef Cattle

Authors: Priscila A. Bernardes, Marcos E. Buzanskas, Luciana C. A. Regitano, Ricardo V. Ventura, Danisio P. Munari

Abstract:

Genotype imputation has been used to reduce genomic selections costs. In order to increase haplotype detection accuracy in methods that considers the linkage disequilibrium, another approach could be used, such as combined genotype data from different panels. Therefore, this study aimed to evaluate the linkage disequilibrium and haplotype blocks in two high-density panels before and after the imputation to a combined panel in Nelore beef cattle. A total of 814 animals were genotyped with the Illumina BovineHD BeadChip (IHD), wherein 93 animals (23 bulls and 70 progenies) were also genotyped with the Affymetrix Axion Genome-Wide BOS 1 Array Plate (AHD). After the quality control, 809 IHD animals (509,107 SNPs) and 93 AHD (427,875 SNPs) remained for analyses. The combined genotype panel (CP) was constructed by merging both panels after quality control, resulting in 880,336 SNPs. Imputation analysis was conducted using software FImpute v.2.2b. The reference (CP) and target (IHD) populations consisted of 23 bulls and 786 animals, respectively. The linkage disequilibrium and haplotype blocks studies were carried out for IHD, AHD, and imputed CP. Two linkage disequilibrium measures were considered; the correlation coefficient between alleles from two loci (r²) and the |D’|. Both measures were calculated using the software PLINK. The haplotypes' blocks were estimated using the software Haploview. The r² measurement presented different decay when compared to |D’|, wherein AHD and IHD had almost the same decay. For r², even with possible overestimation by the sample size for AHD (93 animals), the IHD presented higher values when compared to AHD for shorter distances, but with the increase of distance, both panels presented similar values. The r² measurement is influenced by the minor allele frequency of the pair of SNPs, which can cause the observed difference comparing the r² decay and |D’| decay. As a sum of the combinations between Illumina and Affymetrix panels, the CP presented a decay equivalent to a mean of these combinations. The estimated haplotype blocks detected for IHD, AHD, and CP were 84,529, 63,967, and 140,336, respectively. The IHD were composed by haplotype blocks with mean of 137.70 ± 219.05kb, the AHD with mean of 102.10kb ± 155.47, and the CP with mean of 107.10kb ± 169.14. The majority of the haplotype blocks of these three panels were composed by less than 10 SNPs, with only 3,882 (IHD), 193 (AHD) and 8,462 (CP) haplotype blocks composed by 10 SNPs or more. There was an increase in the number of chromosomes covered with long haplotypes when CP was used as well as an increase in haplotype coverage for short chromosomes (23-29), which can contribute for studies that explore haplotype blocks. In general, using CP could be an alternative to increase density and number of haplotype blocks, increasing the probability to obtain a marker close to a quantitative trait loci of interest.

Keywords: Bos taurus indicus, decay, genotype imputation, single nucleotide polymorphism

Procedia PDF Downloads 257
5228 A Comparative Analysis of Clustering Approaches for Understanding Patterns in Health Insurance Uptake: Evidence from Sociodemographic Kenyan Data

Authors: Nelson Kimeli Kemboi Yego, Juma Kasozi, Joseph Nkruzinza, Francis Kipkogei

Abstract:

The study investigated the low uptake of health insurance in Kenya despite efforts to achieve universal health coverage through various health insurance schemes. Unsupervised machine learning techniques were employed to identify patterns in health insurance uptake based on sociodemographic factors among Kenyan households. The aim was to identify key demographic groups that are underinsured and to provide insights for the development of effective policies and outreach programs. Using the 2021 FinAccess Survey, the study clustered Kenyan households based on their health insurance uptake and sociodemographic features to reveal patterns in health insurance uptake across the country. The effectiveness of k-prototypes clustering, hierarchical clustering, and agglomerative hierarchical clustering in clustering based on sociodemographic factors was compared. The k-prototypes approach was found to be the most effective at uncovering distinct and well-separated clusters in the Kenyan sociodemographic data related to health insurance uptake based on silhouette, Calinski-Harabasz, Davies-Bouldin, and Rand indices. Hence, it was utilized in uncovering the patterns in uptake. The results of the analysis indicate that inclusivity in health insurance is greatly related to affordability. The findings suggest that targeted policy interventions and outreach programs are necessary to increase health insurance uptake in Kenya, with the ultimate goal of achieving universal health coverage. The study provides important insights for policymakers and stakeholders in the health insurance sector to address the low uptake of health insurance and to ensure that healthcare services are accessible and affordable to all Kenyans, regardless of their socio-demographic status. The study highlights the potential of unsupervised machine learning techniques to provide insights into complex health policy issues and improve decision-making in the health sector.

Keywords: health insurance, unsupervised learning, clustering algorithms, machine learning

Procedia PDF Downloads 106
5227 Remote Assessment and Change Detection of GreenLAI of Cotton Crop Using Different Vegetation Indices

Authors: Ganesh B. Shinde, Vijaya B. Musande

Abstract:

Cotton crop identification based on the timely information has significant advantage to the different implications of food, economic and environment. Due to the significant advantages, the accurate detection of cotton crop regions using supervised learning procedure is challenging problem in remote sensing. Here, classifiers on the direct image are played a major role but the results are not much satisfactorily. In order to further improve the effectiveness, variety of vegetation indices are proposed in the literature. But, recently, the major challenge is to find the better vegetation indices for the cotton crop identification through the proposed methodology. Accordingly, fuzzy c-means clustering is combined with neural network algorithm, trained by Levenberg-Marquardt for cotton crop classification. To experiment the proposed method, five LISS-III satellite images was taken and the experimentation was done with six vegetation indices such as Simple Ratio, Normalized Difference Vegetation Index, Enhanced Vegetation Index, Green Atmospherically Resistant Vegetation Index, Wide-Dynamic Range Vegetation Index, Green Chlorophyll Index. Along with these indices, Green Leaf Area Index is also considered for investigation. From the research outcome, Green Atmospherically Resistant Vegetation Index outperformed with all other indices by reaching the average accuracy value of 95.21%.

Keywords: Fuzzy C-Means clustering (FCM), neural network, Levenberg-Marquardt (LM) algorithm, vegetation indices

Procedia PDF Downloads 294
5226 Applying Hybrid Graph Drawing and Clustering Methods on Stock Investment Analysis

Authors: Mouataz Zreika, Maria Estela Varua

Abstract:

Stock investment decisions are often made based on current events of the global economy and the analysis of historical data. Conversely, visual representation could assist investors’ gain deeper understanding and better insight on stock market trends more efficiently. The trend analysis is based on long-term data collection. The study adopts a hybrid method that combines the Clustering algorithm and Force-directed algorithm to overcome the scalability problem when visualizing large data. This method exemplifies the potential relationships between each stock, as well as determining the degree of strength and connectivity, which will provide investors another understanding of the stock relationship for reference. Information derived from visualization will also help them make an informed decision. The results of the experiments show that the proposed method is able to produced visualized data aesthetically by providing clearer views for connectivity and edge weights.

Keywords: clustering, force-directed, graph drawing, stock investment analysis

Procedia PDF Downloads 285
5225 An Interpretable Data-Driven Approach for the Stratification of the Cardiorespiratory Fitness

Authors: D.Mendes, J. Henriques, P. Carvalho, T. Rocha, S. Paredes, R. Cabiddu, R. Trimer, R. Mendes, A. Borghi-Silva, L. Kaminsky, E. Ashley, R. Arena, J. Myers

Abstract:

The continued exploration of clinically relevant predictive models continues to be an important pursuit. Cardiorespiratory fitness (CRF) portends clinical vital information and as such its accurate prediction is of high importance. Therefore, the aim of the current study was to develop a data-driven model, based on computational intelligence techniques and, in particular, clustering approaches, to predict CRF. Two prediction models were implemented and compared: 1) the traditional Wasserman/Hansen Equations; and 2) an interpretable clustering approach. Data used for this analysis were from the 'FRIEND - Fitness Registry and the Importance of Exercise: The National Data Base'; in the present study a subset of 10690 apparently healthy individuals were utilized. The accuracy of the models was performed through the computation of sensitivity, specificity, and geometric mean values. The results show the superiority of the clustering approach in the accurate estimation of CRF (i.e., maximal oxygen consumption).

Keywords: cardiorespiratory fitness, data-driven models, knowledge extraction, machine learning

Procedia PDF Downloads 264