Search results for: data clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24389

Search results for: data clustering

24209 An Embarrassingly Simple Semi-supervised Approach to Increase Recall in Online Shopping Domain to Match Structured Data with Unstructured Data

Authors: Sachin Nagargoje

Abstract:

Complete labeled data is often difficult to obtain in a practical scenario. Even if one manages to obtain the data, the quality of the data is always in question. In shopping vertical, offers are the input data, which is given by advertiser with or without a good quality of information. In this paper, an author investigated the possibility of using a very simple Semi-supervised learning approach to increase the recall of unhealthy offers (has badly written Offer Title or partial product details) in shopping vertical domain. The author found that the semisupervised learning method had improved the recall in the Smart Phone category by 30% on A=B testing on 10% traffic and increased the YoY (Year over Year) number of impressions per month by 33% at production. This also made a significant increase in Revenue, but that cannot be publicly disclosed.

Keywords: semi-supervised learning, clustering, recall, coverage

Procedia PDF Downloads 92
24208 Multidimensional Item Response Theory Models for Practical Application in Large Tests Designed to Measure Multiple Constructs

Authors: Maria Fernanda Ordoñez Martinez, Alvaro Mauricio Montenegro

Abstract:

This work presents a statistical methodology for measuring and founding constructs in Latent Semantic Analysis. This approach uses the qualities of Factor Analysis in binary data with interpretations present on Item Response Theory. More precisely, we propose initially reducing dimensionality with specific use of Principal Component Analysis for the linguistic data and then, producing axes of groups made from a clustering analysis of the semantic data. This approach allows the user to give meaning to previous clusters and found the real latent structure presented by data. The methodology is applied in a set of real semantic data presenting impressive results for the coherence, speed and precision.

Keywords: semantic analysis, factorial analysis, dimension reduction, penalized logistic regression

Procedia PDF Downloads 411
24207 The Analyzer: Clustering Based System for Improving Business Productivity by Analyzing User Profiles to Enhance Human Computer Interaction

Authors: Dona Shaini Abhilasha Nanayakkara, Kurugamage Jude Pravinda Gregory Perera

Abstract:

E-commerce platforms have revolutionized the shopping experience, offering convenient ways for consumers to make purchases. To improve interactions with customers and optimize marketing strategies, it is essential for businesses to understand user behavior, preferences, and needs on these platforms. This paper focuses on recommending businesses to customize interactions with users based on their behavioral patterns, leveraging data-driven analysis and machine learning techniques. Businesses can improve engagement and boost the adoption of e-commerce platforms by aligning behavioral patterns with user goals of usability and satisfaction. We propose TheAnalyzer, a clustering-based system designed to enhance business productivity by analyzing user-profiles and improving human-computer interaction. The Analyzer seamlessly integrates with business applications, collecting relevant data points based on users' natural interactions without additional burdens such as questionnaires or surveys. It defines five key user analytics as features for its dataset, which are easily captured through users' interactions with e-commerce platforms. This research presents a study demonstrating the successful distinction of users into specific groups based on the five key analytics considered by TheAnalyzer. With the assistance of domain experts, customized business rules can be attached to each group, enabling The Analyzer to influence business applications and provide an enhanced personalized user experience. The outcomes are evaluated quantitatively and qualitatively, demonstrating that utilizing TheAnalyzer’s capabilities can optimize business outcomes, enhance customer satisfaction, and drive sustainable growth. The findings of this research contribute to the advancement of personalized interactions in e-commerce platforms. By leveraging user behavioral patterns and analyzing both new and existing users, businesses can effectively tailor their interactions to improve customer satisfaction, loyalty and ultimately drive sales.

Keywords: data clustering, data standardization, dimensionality reduction, human computer interaction, user profiling

Procedia PDF Downloads 36
24206 Dissimilarity-Based Coloring for Symbolic and Multivariate Data Visualization

Authors: K. Umbleja, M. Ichino, H. Yaguchi

Abstract:

In this paper, we propose a coloring method for multivariate data visualization by using parallel coordinates based on dissimilarity and tree structure information gathered during hierarchical clustering. The proposed method is an extension for proximity-based coloring that suffers from a few undesired side effects if hierarchical tree structure is not balanced tree. We describe the algorithm by assigning colors based on dissimilarity information, show the application of proposed method on three commonly used datasets, and compare the results with proximity-based coloring. We found our proposed method to be especially beneficial for symbolic data visualization where many individual objects have already been aggregated into a single symbolic object.

Keywords: data visualization, dissimilarity-based coloring, proximity-based coloring, symbolic data

Procedia PDF Downloads 137
24205 Enhanced Cluster Based Connectivity Maintenance in Vehicular Ad Hoc Network

Authors: Manverpreet Kaur, Amarpreet Singh

Abstract:

The demand of Vehicular ad hoc networks is increasing day by day, due to offering the various applications and marvelous benefits to VANET users. Clustering in VANETs is most important to overcome the connectivity problems of VANETs. In this paper, we proposed a new clustering technique Enhanced cluster based connectivity maintenance in vehicular ad hoc network. Our objective is to form long living clusters. The proposed approach is grouping the vehicles, on the basis of the longest list of neighbors to form clusters. The cluster formation and cluster head selection process done by the RSU that may results it reduces the chances of overhead on to the network. The cluster head selection procedure is the vehicle which has closest speed to average speed will elect as a cluster Head by the RSU and if two vehicles have same speed which is closest to average speed then they will be calculate by one of the new parameter i.e. distance to their respective destination. The vehicle which has largest distance to their destination will be choosing as a cluster Head by the RSU. Our simulation outcomes show that our technique performs better than the existing technique.

Keywords: VANETs, clustering, connectivity, cluster head, intelligent transportation system (ITS)

Procedia PDF Downloads 210
24204 Radar on Bike: Coarse Classification based on Multi-Level Clustering for Cyclist Safety Enhancement

Authors: Asma Omri, Noureddine Benothman, Sofiane Sayahi, Fethi Tlili, Hichem Besbes

Abstract:

Cycling, a popular mode of transportation, can also be perilous due to cyclists' vulnerability to collisions with vehicles and obstacles. This paper presents an innovative cyclist safety system based on radar technology designed to offer real-time collision risk warnings to cyclists. The system incorporates a low-power radar sensor affixed to the bicycle and connected to a microcontroller. It leverages radar point cloud detections, a clustering algorithm, and a supervised classifier. These algorithms are optimized for efficiency to run on the TI’s AWR 1843 BOOST radar, utilizing a coarse classification approach distinguishing between cars, trucks, two-wheeled vehicles, and other objects. To enhance the performance of clustering techniques, we propose a 2-Level clustering approach. This approach builds on the state-of-the-art Density-based spatial clustering of applications with noise (DBSCAN). The objective is to first cluster objects based on their velocity, then refine the analysis by clustering based on position. The initial level identifies groups of objects with similar velocities and movement patterns. The subsequent level refines the analysis by considering the spatial distribution of these objects. The clusters obtained from the first level serve as input for the second level of clustering. Our proposed technique surpasses the classical DBSCAN algorithm in terms of geometrical metrics, including homogeneity, completeness, and V-score. Relevant cluster features are extracted and utilized to classify objects using an SVM classifier. Potential obstacles are identified based on their velocity and proximity to the cyclist. To optimize the system, we used the View of Delft dataset for hyperparameter selection and SVM classifier training. The system's performance was assessed using our collected dataset of radar point clouds synchronized with a camera on an Nvidia Jetson Nano board. The radar-based cyclist safety system is a practical solution that can be easily installed on any bicycle and connected to smartphones or other devices, offering real-time feedback and navigation assistance to cyclists. We conducted experiments to validate the system's feasibility, achieving an impressive 85% accuracy in the classification task. This system has the potential to significantly reduce the number of accidents involving cyclists and enhance their safety on the road.

Keywords: 2-level clustering, coarse classification, cyclist safety, warning system based on radar technology

Procedia PDF Downloads 46
24203 Performance Analysis of Deterministic Stable Election Protocol Using Fuzzy Logic in Wireless Sensor Network

Authors: Sumanpreet Kaur, Harjit Pal Singh, Vikas Khullar

Abstract:

In Wireless Sensor Network (WSN), the sensor containing motes (nodes) incorporate batteries that can lament at some extent. To upgrade the energy utilization, clustering is one of the prototypical approaches for split sensor motes into a number of clusters where one mote (also called as node) proceeds as a Cluster Head (CH). CH selection is one of the optimization techniques for enlarging stability and network lifespan. Deterministic Stable Election Protocol (DSEP) is an effectual clustering protocol that makes use of three kinds of nodes with dissimilar residual energy for CH election. Fuzzy Logic technology is used to expand energy level of DSEP protocol by using fuzzy inference system. This paper presents protocol DSEP using Fuzzy Logic (DSEP-FL) CH by taking into account four linguistic variables such as energy, concentration, centrality and distance to base station. Simulation results show that our proposed method gives more effective results in term of a lifespan of network and stability as compared to the performance of other clustering protocols.

Keywords: DSEP, fuzzy logic, energy model, WSN

Procedia PDF Downloads 170
24202 Regression Analysis in Estimating Stream-Flow and the Effect of Hierarchical Clustering Analysis: A Case Study in Euphrates-Tigris Basin

Authors: Goksel Ezgi Guzey, Bihrat Onoz

Abstract:

The scarcity of streamflow gauging stations and the increasing effects of global warming cause designing water management systems to be very difficult. This study is a significant contribution to assessing regional regression models for estimating streamflow. In this study, simulated meteorological data was related to the observed streamflow data from 1971 to 2020 for 33 stream gauging stations of the Euphrates-Tigris Basin. Ordinary least squares regression was used to predict flow for 2020-2100 with the simulated meteorological data. CORDEX- EURO and CORDEX-MENA domains were used with 0.11 and 0.22 grids, respectively, to estimate climate conditions under certain climate scenarios. Twelve meteorological variables simulated by two regional climate models, RCA4 and RegCM4, were used as independent variables in the ordinary least squares regression, where the observed streamflow was the dependent variable. The variability of streamflow was then calculated with 5-6 meteorological variables and watershed characteristics such as area and height prior to the application. Of the regression analysis of 31 stream gauging stations' data, the stations were subjected to a clustering analysis, which grouped the stations in two clusters in terms of their hydrometeorological properties. Two streamflow equations were found for the two clusters of stream gauging stations for every domain and every regional climate model, which increased the efficiency of streamflow estimation by a range of 10-15% for all the models. This study underlines the importance of homogeneity of a region in estimating streamflow not only in terms of the geographical location but also in terms of the meteorological characteristics of that region.

Keywords: hydrology, streamflow estimation, climate change, hydrologic modeling, HBV, hydropower

Procedia PDF Downloads 95
24201 Progressive Multimedia Collection Structuring via Scene Linking

Authors: Aman Berhe, Camille Guinaudeau, Claude Barras

Abstract:

In order to facilitate information seeking in large collections of multimedia documents with long and progressive content (such as broadcast news or TV series), one can extract the semantic links that exist between semantically coherent parts of documents, i.e., scenes. The links can then create a coherent collection of scenes from which it is easier to perform content analysis, topic extraction, or information retrieval. In this paper, we focus on TV series structuring and propose two approaches for scene linking at different levels of granularity (episode and season): a fuzzy online clustering technique and a graph-based community detection algorithm. When evaluated on the two first seasons of the TV series Game of Thrones, we found that the fuzzy online clustering approach performed better compared to graph-based community detection at the episode level, while graph-based approaches show better performance at the season level.

Keywords: multimedia collection structuring, progressive content, scene linking, fuzzy clustering, community detection

Procedia PDF Downloads 70
24200 Clustering Using Cooperative Multihop Mini-Groups in Wireless Sensor Network: A Novel Approach

Authors: Virender Ranga, Mayank Dave, Anil Kumar Verma

Abstract:

Recently wireless sensor networks (WSNs) are used in many real life applications like environmental monitoring, habitat monitoring, health monitoring etc. Due to power constraint cheaper devices used in these applications, the energy consumption of each device should be kept as low as possible such that network operates for longer period of time. One of the techniques to prolong the network lifetime is an intelligent grouping of sensor nodes such that they can perform their operation in cooperative and energy efficient manner. With this motivation, we propose a novel approach by organize the sensor nodes in cooperative multihop mini-groups so that the total global energy consumption of the network can be reduced and network lifetime can be improved. Our proposed approach also reduces the number of transmitted messages inside the WSNs, which further minimizes the energy consumption of the whole network. The experimental simulations show that our proposed approach outperforms over the state-of-the-art approach in terms of stability period and aggregated data.

Keywords: clustering, cluster-head, mini-group, stability period

Procedia PDF Downloads 323
24199 Bioinformatic Approaches in Population Genetics and Phylogenetic Studies

Authors: Masoud Sheidai

Abstract:

Biologists with a special field of population genetics and phylogeny have different research tasks such as populations’ genetic variability and divergence, species relatedness, the evolution of genetic and morphological characters, and identification of DNA SNPs with adaptive potential. To tackle these problems and reach a concise conclusion, they must use the proper and efficient statistical and bioinformatic methods as well as suitable genetic and morphological characteristics. In recent years application of different bioinformatic and statistical methods, which are based on various well-documented assumptions, are the proper analytical tools in the hands of researchers. The species delineation is usually carried out with the use of different clustering methods like K-means clustering based on proper distance measures according to the studied features of organisms. A well-defined species are assumed to be separated from the other taxa by molecular barcodes. The species relationships are studied by using molecular markers, which are analyzed by different analytical methods like multidimensional scaling (MDS) and principal coordinate analysis (PCoA). The species population structuring and genetic divergence are usually investigated by PCoA and PCA methods and a network diagram. These are based on bootstrapping of data. The Association of different genes and DNA sequences to ecological and geographical variables is determined by LFMM (Latent factor mixed model) and redundancy analysis (RDA), which are based on Bayesian and distance methods. Molecular and morphological differentiating characters in the studied species may be identified by linear discriminant analysis (DA) and discriminant analysis of principal components (DAPC). We shall illustrate these methods and related conclusions by giving examples from different edible and medicinal plant species.

Keywords: GWAS analysis, K-Means clustering, LFMM, multidimensional scaling, redundancy analysis

Procedia PDF Downloads 90
24198 Geographic Legacies for Modern Day Disease Research: Autism Spectrum Disorder as a Case-Control Study

Authors: Rebecca Richards Steed, James Van Derslice, Ken Smith, Richard Medina, Amanda Bakian

Abstract:

Elucidating gene-environment interactions for heritable disease outcomes is an emerging area of disease research, with genetic studies informing hypotheses for environment and gene interactions underlying some of the most confounding diseases of our time, like autism spectrum disorder (ASD). Geography has thus far played a key role in identifying environmental factors contributing to disease, but its use can be broadened to include genetic and environmental factors that have a synergistic effect on disease. Through the use of family pedigrees and disease outcomes with life-course residential histories, space-time clustering of generations at critical developmental windows can provide further understanding of (1) environmental factors that contribute to disease patterns in families, (2) susceptible critical windows of development most impacted by environment, (3) and that are most likely to lead to an ASD diagnosis. This paper introduces a retrospective case-control study that utilizes pedigree data, health data, and residential life-course location points to find space-time clustering of ancestors with a grandchild/child with a clinical diagnosis of ASD. Finding space-time clusters of ancestors at critical developmental windows serves as a proxy for shared environmental exposures. The authors refer to geographic life-course exposures as geographic legacies. Identifying space-time clusters of ancestors creates a bridge for researching exposures of past generations that may impact modern-day progeny health. Results from the space-time cluster analysis show multiple clusters for the maternal and paternal pedigrees. The paternal grandparent pedigree resulted in the most space-time clustering for birth and childhood developmental windows. No statistically significant clustering was found for adolescent years. These results will be further studied to identify the specific share of space-time environmental exposures. In conclusion, this study has found significant space-time clusters of parents, and grandparents for both maternal and paternal lineage. These results will be used to identify what environmental exposures have been shared with family members at critical developmental windows of time, and additional analysis will be applied.

Keywords: family pedigree, environmental exposure, geographic legacy, medical geography, transgenerational inheritance

Procedia PDF Downloads 88
24197 Fault-Detection and Self-Stabilization Protocol for Wireless Sensor Networks

Authors: Ather Saeed, Arif Khan, Jeffrey Gosper

Abstract:

Sensor devices are prone to errors and sudden node failures, which are difficult to detect in a timely manner when deployed in real-time, hazardous, large-scale harsh environments and in medical emergencies. Therefore, the loss of data can be life-threatening when the sensed phenomenon is not disseminated due to sudden node failure, battery depletion or temporary malfunctioning. We introduce a set of partial differential equations for localizing faults, similar to Green’s and Maxwell’s equations used in Electrostatics and Electromagnetism. We introduce a node organization and clustering scheme for self-stabilizing sensor networks. Green’s theorem is applied to regions where the curve is closed and continuously differentiable to ensure network connectivity. Experimental results show that the proposed GTFD (Green’s Theorem fault-detection and Self-stabilization) protocol not only detects faulty nodes but also accurately generates network stability graphs where urgent intervention is required for dynamically self-stabilizing the network.

Keywords: Green’s Theorem, self-stabilization, fault-localization, RSSI, WSN, clustering

Procedia PDF Downloads 40
24196 Proposing an Algorithm to Cluster Ad Hoc Networks, Modulating Two Levels of Learning Automaton and Nodes Additive Weighting

Authors: Mohammad Rostami, Mohammad Reza Forghani, Elahe Neshat, Fatemeh Yaghoobi

Abstract:

An Ad Hoc network consists of wireless mobile equipment which connects to each other without any infrastructure, using connection equipment. The best way to form a hierarchical structure is clustering. Various methods of clustering can form more stable clusters according to nodes' mobility. In this research we propose an algorithm, which allocates some weight to nodes based on factors, i.e. link stability and power reduction rate. According to the allocated weight in the previous phase, the cellular learning automaton picks out in the second phase nodes which are candidates for being cluster head. In the third phase, learning automaton selects cluster head nodes, member nodes and forms the cluster. Thus, this automaton does the learning from the setting and can form optimized clusters in terms of power consumption and link stability. To simulate the proposed algorithm we have used omnet++4.2.2. Simulation results indicate that newly formed clusters have a longer lifetime than previous algorithms and decrease strongly network overload by reducing update rate.

Keywords: mobile Ad Hoc networks, clustering, learning automaton, cellular automaton, battery power

Procedia PDF Downloads 376
24195 EnumTree: An Enumerative Biclustering Algorithm for DNA Microarray Data

Authors: Haifa Ben Saber, Mourad Elloumi

Abstract:

In a number of domains, like in DNA microarray data analysis, we need to cluster simultaneously rows (genes) and columns (conditions) of a data matrix to identify groups of constant rows with a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. We introduce a new algorithm called, Enumerative tree (EnumTree) for biclustering of binary microarray data. is an algorithm adopting the approach of enumerating biclusters. This algorithm extracts all biclusters consistent good quality. The main idea of ​​EnumLat is the construction of a new tree structure to represent adequately different biclusters discovered during the process of enumeration. This algorithm adopts the strategy of all biclusters at a time. The performance of the proposed algorithm is assessed using both synthetic and real DNA micryarray data, our algorithm outperforms other biclustering algorithms for binary microarray data. Biclusters with different numbers of rows. Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevent biclusters.

Keywords: DNA microarray, biclustering, gene expression data, tree, datamining.

Procedia PDF Downloads 344
24194 Data-Driven Market Segmentation in Hospitality Using Unsupervised Machine Learning

Authors: Rik van Leeuwen, Ger Koole

Abstract:

Within hospitality, marketing departments use segmentation to create tailored strategies to ensure personalized marketing. This study provides a data-driven approach by segmenting guest profiles via hierarchical clustering based on an extensive set of features. The industry requires understandable outcomes that contribute to adaptability for marketing departments to make data-driven decisions and ultimately driving profit. A marketing department specified a business question that guides the unsupervised machine learning algorithm. Features of guests change over time; therefore, there is a probability that guests transition from one segment to another. The purpose of the study is to provide steps in the process from raw data to actionable insights, which serve as a guideline for how hospitality companies can adopt an algorithmic approach.

Keywords: hierarchical cluster analysis, hospitality, market segmentation

Procedia PDF Downloads 75
24193 The Data Quality Model for the IoT based Real-time Water Quality Monitoring Sensors

Authors: Rabbia Idrees, Ananda Maiti, Saurabh Garg, Muhammad Bilal Amin

Abstract:

IoT devices are the basic building blocks of IoT network that generate enormous volume of real-time and high-speed data to help organizations and companies to take intelligent decisions. To integrate this enormous data from multisource and transfer it to the appropriate client is the fundamental of IoT development. The handling of this huge quantity of devices along with the huge volume of data is very challenging. The IoT devices are battery-powered and resource-constrained and to provide energy efficient communication, these IoT devices go sleep or online/wakeup periodically and a-periodically depending on the traffic loads to reduce energy consumption. Sometime these devices get disconnected due to device battery depletion. If the node is not available in the network, then the IoT network provides incomplete, missing, and inaccurate data. Moreover, many IoT applications, like vehicle tracking and patient tracking require the IoT devices to be mobile. Due to this mobility, If the distance of the device from the sink node become greater than required, the connection is lost. Due to this disconnection other devices join the network for replacing the broken-down and left devices. This make IoT devices dynamic in nature which brings uncertainty and unreliability in the IoT network and hence produce bad quality of data. Due to this dynamic nature of IoT devices we do not know the actual reason of abnormal data. If data are of poor-quality decisions are likely to be unsound. It is highly important to process data and estimate data quality before bringing it to use in IoT applications. In the past many researchers tried to estimate data quality and provided several Machine Learning (ML), stochastic and statistical methods to perform analysis on stored data in the data processing layer, without focusing the challenges and issues arises from the dynamic nature of IoT devices and how it is impacting data quality. A comprehensive review on determining the impact of dynamic nature of IoT devices on data quality is done in this research and presented a data quality model that can deal with this challenge and produce good quality of data. This research presents the data quality model for the sensors monitoring water quality. DBSCAN clustering and weather sensors are used in this research to make data quality model for the sensors monitoring water quality. An extensive study has been done in this research on finding the relationship between the data of weather sensors and sensors monitoring water quality of the lakes and beaches. The detailed theoretical analysis has been presented in this research mentioning correlation between independent data streams of the two sets of sensors. With the help of the analysis and DBSCAN, a data quality model is prepared. This model encompasses five dimensions of data quality: outliers’ detection and removal, completeness, patterns of missing values and checks the accuracy of the data with the help of cluster’s position. At the end, the statistical analysis has been done on the clusters formed as the result of DBSCAN, and consistency is evaluated through Coefficient of Variation (CoV).

Keywords: clustering, data quality, DBSCAN, and Internet of things (IoT)

Procedia PDF Downloads 107
24192 Empirical Study of Partitions Similarity Measures

Authors: Abdelkrim Alfalah, Lahcen Ouarbya, John Howroyd

Abstract:

This paper investigates and compares the performance of four existing distances and similarity measures between partitions. The partition measures considered are Rand Index (RI), Adjusted Rand Index (ARI), Variation of Information (VI), and Normalised Variation of Information (NVI). This work investigates the ability of these partition measures to capture three predefined intuitions: the variation within randomly generated partitions, the sensitivity to small perturbations, and finally the independence from the dataset scale. It has been shown that the Adjusted Rand Index performed well overall, with regards to these three intuitions.

Keywords: clustering, comparing partitions, similarity measure, partition distance, partition metric, similarity between partitions, clustering comparison.

Procedia PDF Downloads 147
24191 Design and Implementation of Machine Learning Model for Short-Term Energy Forecasting in Smart Home Management System

Authors: R. Ramesh, K. K. Shivaraman

Abstract:

The main aim of this paper is to handle the energy requirement in an efficient manner by merging the advanced digital communication and control technologies for smart grid applications. In order to reduce user home load during peak load hours, utility applies several incentives such as real-time pricing, time of use, demand response for residential customer through smart meter. However, this method provides inconvenience in the sense that user needs to respond manually to prices that vary in real time. To overcome these inconvenience, this paper proposes a convolutional neural network (CNN) with k-means clustering machine learning model which have ability to forecast energy requirement in short term, i.e., hour of the day or day of the week. By integrating our proposed technique with home energy management based on Bluetooth low energy provides predicted value to user for scheduling appliance in advanced. This paper describes detail about CNN configuration and k-means clustering algorithm for short-term energy forecasting.

Keywords: convolutional neural network, fuzzy logic, k-means clustering approach, smart home energy management

Procedia PDF Downloads 281
24190 Optical Flow Direction Determination for Railway Crossing Occupancy Monitoring

Authors: Zdenek Silar, Martin Dobrovolny

Abstract:

This article deals with the obstacle detection on a railway crossing (clearance detection). Detection is based on the optical flow estimation and classification of the flow vectors by K-means clustering algorithm. For classification of passing vehicles is used optical flow direction determination. The optical flow estimation is based on a modified Lucas-Kanade method.

Keywords: background estimation, direction of optical flow, K-means clustering, objects detection, railway crossing monitoring, velocity vectors

Procedia PDF Downloads 485
24189 Wavelet Based Residual Method of Detecting GSM Signal Strength Fading

Authors: Danladi Ali, Onah Festus Iloabuchi

Abstract:

In this paper, GSM signal strength was measured in order to detect the type of the signal fading phenomenon using one-dimensional multilevel wavelet residual method and neural network clustering to determine the average GSM signal strength received in the study area. The wavelet residual method predicted that the GSM signal experienced slow fading and attenuated with MSE of 3.875dB. The neural network clustering revealed that mostly -75dB, -85dB and -95dB were received. This means that the signal strength received in the study is a weak signal.

Keywords: one-dimensional multilevel wavelets, path loss, GSM signal strength, propagation, urban environment

Procedia PDF Downloads 314
24188 The Phylogenetic Investigation of Candidate Genes Related to Type II Diabetes in Man and Other Species

Authors: Srijoni Banerjee

Abstract:

Sequences of some of the candidate genes (e.g., CPE, CDKAL1, GCKR, HSD11B1, IGF2BP2, IRS1, LPIN1, PKLR, TNF, PPARG) implicated in some of the complex disease, e.g. Type II diabetes in man has been compared with other species to investigate phylogenetic affinity. Based on mRNA sequence of these genes of 7 to 8 species, using bioinformatics tools Mega 5, Bioedit, Clustal W, distance matrix was obtained. Phylogenetic trees were obtained by NJ and UPGMA clustering methods. The results of the phylogenetic analyses show that of the species compared: Xenopus l., Danio r., Macaca m., Homo sapiens s., Rattus n., Mus m. and Gallus g., Bos taurus, both NJ and UPGMA clustering show close affinity between clustering of Homo sapiens s. (Man) with Rattus n. (Rat), Mus m. species for the candidate genes, except in case of Lipin1 gene. The results support the functional similarity of these genes in physiological and biochemical process involving man and mouse/rat. Therefore, in understanding the complex etiology and treatment of the complex disease mouse/rate model is the best laboratory choice for experimentation.

Keywords: phylogeny, candidate gene of type-2 diabetes, CPE, CDKAL1, GCKR, HSD11B1, IGF2BP2, IRS1, LPIN1, PKLR, TNF, PPARG

Procedia PDF Downloads 287
24187 Predicting Open Chromatin Regions in Cell-Free DNA Whole Genome Sequencing Data by Correlation Clustering  

Authors: Fahimeh Palizban, Farshad Noravesh, Amir Hossein Saeidian, Mahya Mehrmohamadi

Abstract:

In the recent decade, the emergence of liquid biopsy has significantly improved cancer monitoring and detection. Dying cells, including those originating from tumors, shed their DNA into the blood and contribute to a pool of circulating fragments called cell-free DNA. Accordingly, identifying the tissue origin of these DNA fragments from the plasma can result in more accurate and fast disease diagnosis and precise treatment protocols. Open chromatin regions are important epigenetic features of DNA that reflect cell types of origin. Profiling these features by DNase-seq, ATAC-seq, and histone ChIP-seq provides insights into tissue-specific and disease-specific regulatory mechanisms. There have been several studies in the area of cancer liquid biopsy that integrate distinct genomic and epigenomic features for early cancer detection along with tissue of origin detection. However, multimodal analysis requires several types of experiments to cover the genomic and epigenomic aspects of a single sample, which will lead to a huge amount of cost and time. To overcome these limitations, the idea of predicting OCRs from WGS is of particular importance. In this regard, we proposed a computational approach to target the prediction of open chromatin regions as an important epigenetic feature from cell-free DNA whole genome sequence data. To fulfill this objective, local sequencing depth will be fed to our proposed algorithm and the prediction of the most probable open chromatin regions from whole genome sequencing data can be carried out. Our method integrates the signal processing method with sequencing depth data and includes count normalization, Discrete Fourie Transform conversion, graph construction, graph cut optimization by linear programming, and clustering. To validate the proposed method, we compared the output of the clustering (open chromatin region+, open chromatin region-) with previously validated open chromatin regions related to human blood samples of the ATAC-DB database. The percentage of overlap between predicted open chromatin regions and the experimentally validated regions obtained by ATAC-seq in ATAC-DB is greater than 67%, which indicates meaningful prediction. As it is evident, OCRs are mostly located in the transcription start sites (TSS) of the genes. In this regard, we compared the concordance between the predicted OCRs and the human genes TSS regions obtained from refTSS and it showed proper accordance around 52.04% and ~78% with all and the housekeeping genes, respectively. Accurately detecting open chromatin regions from plasma cell-free DNA-seq data is a very challenging computational problem due to the existence of several confounding factors, such as technical and biological variations. Although this approach is in its infancy, there has already been an attempt to apply it, which leads to a tool named OCRDetector with some restrictions like the need for highly depth cfDNA WGS data, prior information about OCRs distribution, and considering multiple features. However, we implemented a graph signal clustering based on a single depth feature in an unsupervised learning manner that resulted in faster performance and decent accuracy. Overall, we tried to investigate the epigenomic pattern of a cell-free DNA sample from a new computational perspective that can be used along with other tools to investigate genetic and epigenetic aspects of a single whole genome sequencing data for efficient liquid biopsy-related analysis.

Keywords: open chromatin regions, cancer, cell-free DNA, epigenomics, graph signal processing, correlation clustering

Procedia PDF Downloads 110
24186 An Approach for Pattern Recognition and Prediction of Information Diffusion Model on Twitter

Authors: Amartya Hatua, Trung Nguyen, Andrew Sung

Abstract:

In this paper, we study the information diffusion process on Twitter as a multivariate time series problem. Our model concerns three measures (volume, network influence, and sentiment of tweets) based on 10 features, and we collected 27 million tweets to build our information diffusion time series dataset for analysis. Then, different time series clustering techniques with Dynamic Time Warping (DTW) distance were used to identify different patterns of information diffusion. Finally, we built the information diffusion prediction models for new hashtags which comprise two phrases: The first phrase is recognizing the pattern using k-NN with DTW distance; the second phrase is building the forecasting model using the traditional Autoregressive Integrated Moving Average (ARIMA) model and the non-linear recurrent neural network of Long Short-Term Memory (LSTM). Preliminary results of performance evaluation between different forecasting models show that LSTM with clustering information notably outperforms other models. Therefore, our approach can be applied in real-world applications to analyze and predict the information diffusion characteristics of selected topics or memes (hashtags) in Twitter.

Keywords: ARIMA, DTW, information diffusion, LSTM, RNN, time series clustering, time series forecasting, Twitter

Procedia PDF Downloads 361
24185 Clustering-Based Threshold Model for Condition Rating of Concrete Bridge Decks

Authors: M. Alsharqawi, T. Zayed, S. Abu Dabous

Abstract:

To ensure safety and serviceability of bridge infrastructure, accurate condition assessment and rating methods are needed to provide basis for bridge Maintenance, Repair and Replacement (MRR) decisions. In North America, the common practices to assess condition of bridges are through visual inspection. These practices are limited to detect surface defects and external flaws. Further, the thresholds that define the severity of bridge deterioration are selected arbitrarily. The current research discusses the main deteriorations and defects identified during visual inspection and Non-Destructive Evaluation (NDE). NDE techniques are becoming popular in augmenting the visual examination during inspection to detect subsurface defects. Quality inspection data and accurate condition assessment and rating are the basis for determining appropriate MRR decisions. Thus, in this paper, a novel method for bridge condition assessment using the Quality Function Deployment (QFD) theory is utilized. The QFD model is designed to provide an integrated condition by evaluating both the surface and subsurface defects for concrete bridges. Moreover, an integrated condition rating index with four thresholds is developed based on the QFD condition assessment model and using K-means clustering technique. Twenty case studies are analyzed by applying the QFD model and implementing the developed rating index. The results from the analyzed case studies show that the proposed threshold model produces robust MRR recommendations consistent with decisions and recommendations made by bridge managers on these projects. The proposed method is expected to advance the state of the art of bridges condition assessment and rating.

Keywords: concrete bridge decks, condition assessment and rating, quality function deployment, k-means clustering technique

Procedia PDF Downloads 195
24184 Improving Cell Type Identification of Single Cell Data by Iterative Graph-Based Noise Filtering

Authors: Annika Stechemesser, Rachel Pounds, Emma Lucas, Chris Dawson, Julia Lipecki, Pavle Vrljicak, Jan Brosens, Sean Kehoe, Jason Yap, Lawrence Young, Sascha Ott

Abstract:

Advances in technology make it now possible to retrieve the genetic information of thousands of single cancerous cells. One of the key challenges in single cell analysis of cancerous tissue is to determine the number of different cell types and their characteristic genes within the sample to better understand the tumors and their reaction to different treatments. For this analysis to be possible, it is crucial to filter out background noise as it can severely blur the downstream analysis and give misleading results. In-depth analysis of the state-of-the-art filtering methods for single cell data showed that they do, in some cases, not separate noisy and normal cells sufficiently. We introduced an algorithm that filters and clusters single cell data simultaneously without relying on certain genes or thresholds chosen by eye. It detects communities in a Shared Nearest Neighbor similarity network, which captures the similarities and dissimilarities of the cells by optimizing the modularity and then identifies and removes vertices with a weak clustering belonging. This strategy is based on the fact that noisy data instances are very likely to be similar to true cell types but do not match any of these wells. Once the clustering is complete, we apply a set of evaluation metrics on the cluster level and accept or reject clusters based on the outcome. The performance of our algorithm was tested on three datasets and led to convincing results. We were able to replicate the results on a Peripheral Blood Mononuclear Cells dataset. Furthermore, we applied the algorithm to two samples of ovarian cancer from the same patient before and after chemotherapy. Comparing the standard approach to our algorithm, we found a hidden cell type in the ovarian postchemotherapy data with interesting marker genes that are potentially relevant for medical research.

Keywords: cancer research, graph theory, machine learning, single cell analysis

Procedia PDF Downloads 78
24183 Performance Evaluation of Clustered Routing Protocols for Heterogeneous Wireless Sensor Networks

Authors: Awatef Chniguir, Tarek Farah, Zouhair Ben Jemaa, Safya Belguith

Abstract:

Optimal routing allows minimizing energy consumption in wireless sensor networks (WSN). Clustering has proven its effectiveness in organizing WSN by reducing channel contention and packet collision and enhancing network throughput under heavy load. Therefore, nowadays, with the emergence of the Internet of Things, heterogeneity is essential. Stable election protocol (SEP) that has increased the network stability period and lifetime is the first clustering protocol for heterogeneous WSN. SEP and its descendants, namely SEP, Threshold Sensitive SEP (TSEP), Enhanced TSEP (ETSSEP) and Current Energy Allotted TSEP (CEATSEP), were studied. These algorithms’ performance was evaluated based on different metrics, especially first node death (FND), to compare their stability. Simulations were conducted on the MATLAB tool considering two scenarios: The first one demonstrates the fraction variation of advanced nodes by setting the number of total nodes. The second considers the interpretation of the number of nodes while keeping the number of advanced nodes permanent. CEATSEP outperforms its antecedents by increasing stability and, at the same time, keeping a low throughput. It also operates very well in a large-scale network. Consequently, CEATSEP has a useful lifespan and energy efficiency compared to the other routing protocol for heterogeneous WSN.

Keywords: clustering, heterogeneous, stability, scalability, IoT, WSN

Procedia PDF Downloads 99
24182 Improved Qualitative Modeling of the Magnetization Curve B(H) of the Ferromagnetic Materials for a Transformer Used in the Power Supply for Magnetron

Authors: M. Bassoui, M. Ferfra, M. Chrayagne

Abstract:

This paper presents a qualitative modeling for the nonlinear B-H curve of the saturable magnetic materials for a transformer with shunts used in the power supply for the magnetron. This power supply is composed of a single phase leakage flux transformer supplying a cell composed of a capacitor and a diode, which double the voltage and stabilize the current, and a single magnetron at the output of the cell. A procedure consisting of a fuzzy clustering method and a rule processing algorithm is then employed for processing the constructed fuzzy modeling rules to extract the qualitative properties of the curve.

Keywords: B(H) curve, fuzzy clustering, magnetron, power supply

Procedia PDF Downloads 205
24181 Neural Network Approach For Clustering Host Community: Based on Perceptions Toward Tourism, Their Satisfaction Level and Demographic Attributes in Iran (Lahijan)

Authors: Nasibeh Mohammadpour, Ali Rajabzadeh, Adel Azar, Hamid Zargham Borujeni,

Abstract:

Generally, various industries development depends on their stakeholders and beneficiaries supports. One of the most important stakeholders in tourism industry ( which has become one of the most important lucrative and employment-generating activities at the international level these days) are host communities in tourist destination which are affected and effect on this industry development. Recognizing host community and its segmentations can be important to get their support for future decisions and policy making. In order to identify these segments, in this study, clustering of the residents has been done by using some tools that are designed to encounter human complexities and have ability to model and generalize complex systems without any needs for the initial clusters’ seeds like classic methods. Neural networks can help to meet these expectations. The research have been planned to design neural networks-based mathematical model for clustering the host community effectively according to multi criteria, and identifies differences among segments. In order to achieve this goal, the residents’ segmentation has been done by demographic characteristics, their attitude towards the tourism development, the level of satisfaction and the type of their support in this field. The applied method is self-organized neural networks and the results have compared with K-means. As the results show, the use of Self- Organized Map (SOM) method provides much better results by considering the Cophenetic correlation and between clusters variance coefficients. Based on these criteria, the host community is divided into five sections with unique and distinctive features, which are in the best condition (in comparison other modes) according to Cophenetic correlation coefficient of 0.8769 and between clusters variance of 0.1412.

Keywords: Artificial Nural Network, Clustering , Resident, SOM, Tourism

Procedia PDF Downloads 140
24180 The Optimum Mel-Frequency Cepstral Coefficients (MFCCs) Contribution to Iranian Traditional Music Genre Classification by Instrumental Features

Authors: M. Abbasi Layegh, S. Haghipour, K. Athari, R. Khosravi, M. Tafkikialamdari

Abstract:

An approach to find the optimum mel-frequency cepstral coefficients (MFCCs) for the Radif of Mirzâ Ábdollâh, which is the principal emblem and the heart of Persian music, performed by most famous Iranian masters on two Iranian stringed instruments ‘Tar’ and ‘Setar’ is proposed. While investigating the variance of MFCC for each record in themusic database of 1500 gushe of the repertoire belonging to 12 modal systems (dastgâh and âvâz), we have applied the Fuzzy C-Mean clustering algorithm on each of the 12 coefficient and different combinations of those coefficients. We have applied the same experiment while increasing the number of coefficients but the clustering accuracy remained the same. Therefore, we can conclude that the first 7 MFCCs (V-7MFCC) are enough for classification of The Radif of Mirzâ Ábdollâh. Classical machine learning algorithms such as MLP neural networks, K-Nearest Neighbors (KNN), Gaussian Mixture Model (GMM), Hidden Markov Model (HMM) and Support Vector Machine (SVM) have been employed. Finally, it can be realized that SVM shows a better performance in this study.

Keywords: radif of Mirzâ Ábdollâh, Gushe, mel frequency cepstral coefficients, fuzzy c-mean clustering algorithm, k-nearest neighbors (KNN), gaussian mixture model (GMM), hidden markov model (HMM), support vector machine (SVM)

Procedia PDF Downloads 414