Search results for: correlation clustering
4390 Cleaning of Scientific References in Large Patent Databases Using Rule-Based Scoring and Clustering
Authors: Emiel Caron
Abstract:
Patent databases contain patent related data, organized in a relational data model, and are used to produce various patent statistics. These databases store raw data about scientific references cited by patents. For example, Patstat holds references to tens of millions of scientific journal publications and conference proceedings. These references might be used to connect patent databases with bibliographic databases, e.g. to study the relation between science, technology, and innovation in various domains. Problematic in such studies is the low data quality of the references, i.e. they are often ambiguous, unstructured, and incomplete. Moreover, a complete bibliographic reference is stored in only one attribute. Therefore, a computerized cleaning and disambiguation method for large patent databases is developed in this work. The method uses rule-based scoring and clustering. The rules are based on bibliographic metadata, retrieved from the raw data by regular expressions, and are transparent and adaptable. The rules in combination with string similarity measures are used to detect pairs of records that are potential duplicates. Due to the scoring, different rules can be combined to join scientific references, i.e. the rules reinforce each other. The scores are based on expert knowledge and initial method evaluation. After the scoring, pairs of scientific references that are above a certain threshold are clustered by means of a single-linkage clustering algorithm to form connected components. The method is designed to disambiguate all the scientific references in the Patstat database. The performance evaluation of the clustering method, on a large golden set with highly cited papers, shows on average a 99% precision and a 95% recall. The method is therefore accurate but careful, i.e. it weighs precision over recall. Consequently, separate clusters of high precision are sometimes formed when there is not enough evidence for connecting scientific references, e.g. in the case of missing year and journal information for a reference. The clusters produced by the method can be used to directly link the Patstat database with bibliographic databases such as the Web of Science or Scopus.
Keywords: clustering, data cleaning, data disambiguation, data mining, patent analysis, scientometrics
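The clustering step described in this abstract (pairs scoring above a threshold are merged into connected components) can be illustrated with a small union-find sketch. This is only an illustration under stated assumptions: the function name `cluster_pairs`, the integer record ids, and the example scores are hypothetical and are not taken from the paper.

```python
def cluster_pairs(n_records, scored_pairs, threshold):
    """Group record ids 0..n_records-1 into connected components.

    scored_pairs is a list of (id_a, id_b, score); only pairs whose
    score is strictly above the threshold are linked (assumed convention).
    """
    parent = list(range(n_records))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for record in range(n_records):
        clusters.setdefault(find(record), []).append(record)
    return sorted(clusters.values())


# Hypothetical scores: records 0, 1, 2 are chained together; 3 and 4 stay apart.
example_pairs = [(0, 1, 9.0), (1, 2, 7.5), (3, 4, 2.0)]
example_clusters = cluster_pairs(5, example_pairs, threshold=5.0)
```

Records 0 and 2 end up in the same cluster even though they were never compared directly, which is the chaining behaviour typical of single-linkage clustering.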
Procedia PDF Downloads 194
4389 Automatic Detection of Traffic Stop Locations Using GPS Data
Authors: Areej Salaymeh, Loren Schwiebert, Stephen Remias, Jonathan Waddell
Abstract:
Extracting information from new data sources has emerged as a crucial task in many traffic planning processes, such as identifying traffic patterns, route planning, traffic forecasting, and locating infrastructure improvements. Given the advanced technologies used to collect Global Positioning System (GPS) data from dedicated GPS devices, GPS equipped phones, and navigation tools, intelligent data analysis methodologies are necessary to mine this raw data. In this research, an automatic detection framework is proposed to help identify and classify the locations of stopped GPS waypoints into two main categories: signalized intersections or highway congestion. The Delaunay triangulation is used to perform this assessment in the clustering phase. While most of the existing clustering algorithms need assumptions about the data distribution, the effectiveness of the Delaunay triangulation relies on triangulating geographical data points without such assumptions. Our proposed method starts by cleaning noise from the data and normalizing it. Next, the framework identifies stoppage points by calculating the traveled distance. The following step uses clustering to form groups of waypoints for signalized traffic and highway congestion. Finally, a binary classifier is applied to distinguish highway congestion from signalized stop points. The binary classifier uses the length of the cluster to find congestion. The proposed framework shows high accuracy, identifying the stop positions and congestion points in around 99.2% of trials. We show that it is possible, using limited GPS data, to distinguish between the two categories with high accuracy.
Keywords: Delaunay triangulation, clustering, intelligent transportation systems, GPS data
Procedia PDF Downloads 276
4388 Feature Selection of Personal Authentication Based on EEG Signal for K-Means Cluster Analysis Using Silhouettes Score
Authors: Jianfeng Hu
Abstract:
Personal authentication based on electroencephalography (EEG) signals is one of the important fields in biometric technology. More and more researchers have used EEG signals as a data source for biometrics. However, there are some disadvantages to biometrics based on EEG signals. The proposed method employs entropy measures for feature extraction from EEG signals. Four types of entropy measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE) and spectral entropy (PE), were deployed as the feature set. In a silhouette calculation, the distances from each data point in a cluster to all other points within the same cluster and to all data points in the closest cluster are determined. Thus silhouettes provide a measure of how well a data point was classified when it was assigned to a cluster, and of the separation between clusters. This feature renders silhouettes potentially well suited for assessing cluster quality in personal authentication methods. In this study, silhouette scores were used to assess the cluster quality of the k-means clustering algorithm, which makes them well suited for comparing the performance of each EEG dataset. The main goals of this study are: (1) to represent each target as a tuple of multiple feature sets, (2) to assign a suitable measure to each feature set, (3) to combine different feature sets, (4) to determine the optimal feature weighting. Using precision/recall evaluations, the effectiveness of feature weighting in clustering was analyzed. EEG data from 22 subjects were collected. Results showed that: (1) It is possible to use fewer electrodes (3-4) for personal authentication. (2) There was a difference between electrodes for personal authentication (p<0.01). (3) There is no significant difference in authentication performance among feature sets (except feature PE).
Conclusion: The combination of the k-means clustering algorithm and the silhouette approach proved to be an accurate method for personal authentication based on EEG signals.
Keywords: personal authentication, K-mean clustering, electroencephalogram, EEG, silhouettes
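The silhouette calculation summarized in this abstract can be sketched in a few lines of plain Python. For each point, a is the mean distance to the other points of its own cluster and b is the mean distance to the points of the closest other cluster; the score is (b - a) / max(a, b). The function name, the Euclidean distance, and the toy points are assumptions made for illustration and are not the study's EEG features.

```python
import math


def silhouette_scores(points, labels):
    """Return one silhouette value per point, using Euclidean distance."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            # Singleton cluster or only one cluster: score 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Two well-separated toy clusters, so every score should be close to 1.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
toy_scores = silhouette_scores(toy_points, toy_labels)
```

Averaging the per-point values gives a single score per clustering, which is the usual way to compare clusterings of different datasets or feature sets.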
Procedia PDF Downloads 285
4387 Proposing an Algorithm to Cluster Ad Hoc Networks, Modulating Two Levels of Learning Automaton and Nodes Additive Weighting
Authors: Mohammad Rostami, Mohammad Reza Forghani, Elahe Neshat, Fatemeh Yaghoobi
Abstract:
An Ad Hoc network consists of wireless mobile equipment which connects to each other without any infrastructure, using connection equipment. The best way to form a hierarchical structure is clustering. Various methods of clustering can form more stable clusters according to nodes' mobility. In this research we propose an algorithm which, in the first phase, allocates a weight to nodes based on factors such as link stability and power reduction rate. In the second phase, according to the weight allocated in the previous phase, the cellular learning automaton picks out nodes which are candidates for being cluster head. In the third phase, the learning automaton selects cluster head nodes and member nodes and forms the cluster. Thus, this automaton learns from the setting and can form optimized clusters in terms of power consumption and link stability. To simulate the proposed algorithm we have used OMNeT++ 4.2.2. Simulation results indicate that the newly formed clusters have a longer lifetime than those of previous algorithms and strongly decrease network overload by reducing the update rate.
Keywords: mobile Ad Hoc networks, clustering, learning automaton, cellular automaton, battery power
Procedia PDF Downloads 412
4386 Consumer Load Profile Determination with Entropy-Based K-Means Algorithm
Authors: Ioannis P. Panapakidis, Marios N. Moschakis
Abstract:
With the continuous increase of smart meter installations across the globe, the need for processing of the load data is evident. Clustering-based load profiling is built upon the utilization of unsupervised machine learning tools for the purpose of formulating the typical load curves or load profiles. The most commonly used algorithm in the load profiling literature is the K-means. While the algorithm has been successfully tested in a variety of applications, its drawback is the strong dependence on the initialization phase. This paper proposes a novel modified form of the K-means that addresses the aforementioned problem. Simulation results indicate the superiority of the proposed algorithm compared to the K-means.
Keywords: clustering, load profiling, load modeling, machine learning, energy efficiency and quality
Procedia PDF Downloads 165
4385 Empirical Study of Partitions Similarity Measures
Authors: Abdelkrim Alfalah, Lahcen Ouarbya, John Howroyd
Abstract:
This paper investigates and compares the performance of four existing distances and similarity measures between partitions. The partition measures considered are Rand Index (RI), Adjusted Rand Index (ARI), Variation of Information (VI), and Normalised Variation of Information (NVI). This work investigates the ability of these partition measures to capture three predefined intuitions: the variation within randomly generated partitions, the sensitivity to small perturbations, and finally the independence from the dataset scale. It has been shown that the Adjusted Rand Index performed well overall with regard to these three intuitions.
Keywords: clustering, comparing partitions, similarity measure, partition distance, partition metric, similarity between partitions, clustering comparison.
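As a reminder of what the simplest of these measures computes, the Rand Index is the fraction of item pairs on which two partitions agree, i.e. pairs placed together in both partitions or apart in both. The sketch below is a direct pairwise implementation written for illustration; the function name and the toy labelings are assumptions and do not come from the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs that both partitions treat the same way."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # zero or one item: the partitions trivially agree
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Same grouping under renamed labels, then a grouping that cuts across it.
ri_identical = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ri_crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The Rand Index is invariant to relabeling of clusters but is not corrected for chance agreement, which is the gap the Adjusted Rand Index is designed to close.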
Procedia PDF Downloads 204
4384 Effect of Correlation of Random Variables on Structural Reliability Index
Authors: Agnieszka Dudzik
Abstract:
The problem of correlation between random variables in structural reliability analysis has been extensively discussed in the literature on the subject. The cases considered were usually related to correlation between random variables on one side of the ultimate limit state: correlation between particular loads applied to a structure, or correlation between the resistance of particular members of a structure as a system. It has been proved that positive correlation between these random variables reduces the reliability of the structure and increases the probability of failure. In this paper, the problem of correlation between random variables on both sides of the limit state equation will be considered. The simplest case, in which these random variables are normally distributed, will be addressed. The degree of correlation will be described by the covariance or the coefficient of correlation. Special attention will be paid to two questions: how much does the correlation change the reliability level, and can it be ignored? Well-known methods for assessing the failure probability will be used in the reliability analysis: the Hasofer-Lind reliability index and the Monte Carlo method, adapted to the problem of correlation. The main purpose of this work is to present how correlation of random variables influences the reliability index of steel bar structures. Structural design parameters will be defined as deterministic values and random variables. The latter will be correlated. The criterion of structural failure will be expressed by limit functions related to the ultimate and serviceability limit states. Only the normal distribution will be used to describe the random variables. The sensitivity of the reliability index to the random variables will be defined. If the sensitivity of the reliability index to a random variable X is low compared with other variables, it can be stated that the impact of this variable on the failure probability is small. Therefore, in successive computations, it can be treated as a deterministic parameter. Sensitivity analysis makes it possible to simplify the description of the mathematical model and to determine new limit functions and values of the Hasofer-Lind reliability index. In the examples, the NUMPRESS software will be used in the reliability analysis.
Keywords: correlation of random variables, reliability index, sensitivity of reliability index, steel structure
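For the simplest textbook case of a linear limit state g = R - S with normally distributed resistance R and load effect S, the effect of correlation across the two sides of the limit state can be written in closed form: beta = (mu_R - mu_S) / sqrt(sigma_R^2 + sigma_S^2 - 2 rho sigma_R sigma_S), with failure probability Phi(-beta). The sketch below evaluates this formula; it is not the paper's NUMPRESS model, and the function names and numerical values are hypothetical.

```python
import math


def reliability_index(mu_r, sigma_r, mu_s, sigma_s, rho):
    """Reliability index for g = R - S with correlated normal R and S."""
    variance = sigma_r ** 2 + sigma_s ** 2 - 2.0 * rho * sigma_r * sigma_s
    if variance <= 0.0:
        raise ValueError("g = R - S has no spread for these parameters")
    return (mu_r - mu_s) / math.sqrt(variance)


def failure_probability(beta):
    """Phi(-beta), the standard normal tail, expressed through erfc."""
    return 0.5 * math.erfc(beta / math.sqrt(2.0))


# Hypothetical values: mean resistance 10, mean load 5, in the same units.
beta_independent = reliability_index(10.0, 3.0, 5.0, 4.0, rho=0.0)
beta_correlated = reliability_index(10.0, 3.0, 5.0, 4.0, rho=0.5)
pf_independent = failure_probability(beta_independent)
pf_correlated = failure_probability(beta_correlated)
```

In this linear case a positive correlation between R and S shrinks the variance of g and therefore raises beta, the opposite of the effect reported for positively correlated variables on the same side of the limit state.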
Procedia PDF Downloads 238
4383 Data Clustering in Wireless Sensor Network Implemented on Self-Organization Feature Map (SOFM) Neural Network
Authors: Krishan Kumar, Mohit Mittal, Pramod Kumar
Abstract:
The wireless sensor network is one of the most promising communication networks for monitoring remote environmental areas. In this network, all the sensor nodes communicate with each other via radio signals. The sensor nodes are capable of sensing, data storage and processing. The sensor nodes collect information through neighboring nodes and pass it to a particular node. Data collection and processing are done by data aggregation techniques. For data aggregation, a clustering technique is implemented in the sensor network using a self-organizing feature map (SOFM) neural network. Some of the sensor nodes are selected as cluster head nodes. The information is aggregated at cluster head nodes from non-cluster head nodes, and this information is then transferred to the base station (or sink nodes). The aim of this paper is to manage the huge amount of data with the help of the SOFM neural network. Clustered data is selected for transfer to the base station instead of the whole information aggregated at cluster head nodes. This reduces the battery consumption caused by managing the huge amount of data. The network lifetime is enhanced to a great extent.
Keywords: artificial neural network, data clustering, self organization feature map, wireless sensor network
Procedia PDF Downloads 518
4382 Design and Implementation of Machine Learning Model for Short-Term Energy Forecasting in Smart Home Management System
Authors: R. Ramesh, K. K. Shivaraman
Abstract:
The main aim of this paper is to handle the energy requirement in an efficient manner by merging advanced digital communication and control technologies for smart grid applications. In order to reduce user home load during peak load hours, the utility applies several incentives, such as real-time pricing, time of use, and demand response, for residential customers through the smart meter. However, this method is inconvenient in the sense that the user needs to respond manually to prices that vary in real time. To overcome this inconvenience, this paper proposes a convolutional neural network (CNN) with k-means clustering machine learning model which is able to forecast the energy requirement in the short term, i.e., hour of the day or day of the week. Integrating our proposed technique with home energy management based on Bluetooth Low Energy provides predicted values to the user for scheduling appliances in advance. This paper describes in detail the CNN configuration and the k-means clustering algorithm for short-term energy forecasting.
Keywords: convolutional neural network, fuzzy logic, k-means clustering approach, smart home energy management
Procedia PDF Downloads 305
4381 Optical Flow Direction Determination for Railway Crossing Occupancy Monitoring
Authors: Zdenek Silar, Martin Dobrovolny
Abstract:
This article deals with obstacle detection on a railway crossing (clearance detection). Detection is based on optical flow estimation and classification of the flow vectors by the K-means clustering algorithm. Optical flow direction determination is used to classify passing vehicles. The optical flow estimation is based on a modified Lucas-Kanade method.
Keywords: background estimation, direction of optical flow, K-means clustering, objects detection, railway crossing monitoring, velocity vectors
Procedia PDF Downloads 519
4380 Wavelet Based Residual Method of Detecting GSM Signal Strength Fading
Authors: Danladi Ali, Onah Festus Iloabuchi
Abstract:
In this paper, GSM signal strength was measured in order to detect the type of signal fading phenomenon using a one-dimensional multilevel wavelet residual method, and neural network clustering was used to determine the average GSM signal strength received in the study area. The wavelet residual method predicted that the GSM signal experienced slow fading and attenuated with an MSE of 3.875 dB. The neural network clustering revealed that mostly -75 dB, -85 dB and -95 dB were received. This means that the signal strength received in the study area is weak.
Keywords: one-dimensional multilevel wavelets, path loss, GSM signal strength, propagation, urban environment
Procedia PDF Downloads 338
4379 Statistical Correlation between Ply Mechanical Properties of Composite and Its Effect on Structure Reliability
Authors: S. Zhang, L. Zhang, X. Chen
Abstract:
Due to the large uncertainty in the mechanical properties of FRP (fibre reinforced plastic), the reliability evaluation of FRP structures is currently receiving much attention in industry. However, possible statistical correlation between ply mechanical properties has so far been overlooked, and they are mostly assumed to be independent random variables. In this study, the statistical correlation between ply mechanical properties of uni-directional and plain weave composites is first analyzed by a combination of Monte-Carlo simulation and finite element modeling of the FRP unit cell. Large linear correlation coefficients between the in-plane mechanical properties are observed, and the correlation coefficients are heavily dependent on the uncertainty of the fibre volume ratio. It is also observed that the correlation coefficients related to Poisson’s ratio are negative while the others are positive. To obtain the statistical correlation coefficients between in-plane mechanical properties of FRP experimentally, all the in-plane mechanical properties concerned need to be known for the same specimen. The in-plane shear modulus of FRP is experimentally derived by the approach suggested in the ASTM standard D5379M. Tensile tests are conducted using the same specimens used for the shear test, and because of non-uniform tensile deformation a modification factor is derived by finite element modeling. Digital image correlation is adopted to characterize the non-uniform deformation of the specimen. The preliminary experimental results show good agreement with the numerical analysis of the statistical correlation. Then, the failure probability of laminate plates is calculated in cases considering and not considering the statistical correlation, using the Monte-Carlo and Markov Chain Monte-Carlo methods, respectively. The results highlight the importance of accounting for the statistical correlation between ply mechanical properties to achieve an accurate failure probability of laminate plates. Furthermore, it is found that for the multi-layer laminate plate, the statistical correlation between the ply elastic properties significantly affects the laminate reliability, while the effect of statistical correlation between the ply strengths is minimal.
Keywords: failure probability, FRP, reliability, statistical correlation
Procedia PDF Downloads 162
4378 The Phylogenetic Investigation of Candidate Genes Related to Type II Diabetes in Man and Other Species
Authors: Srijoni Banerjee
Abstract:
Sequences of some of the candidate genes (e.g., CPE, CDKAL1, GCKR, HSD11B1, IGF2BP2, IRS1, LPIN1, PKLR, TNF, PPARG) implicated in complex diseases such as Type II diabetes in man have been compared with those of other species to investigate phylogenetic affinity. Based on the mRNA sequences of these genes in 7 to 8 species, a distance matrix was obtained using the bioinformatics tools Mega 5, Bioedit, and Clustal W. Phylogenetic trees were obtained by the NJ and UPGMA clustering methods. The species compared were Xenopus l., Danio r., Macaca m., Homo sapiens s., Rattus n., Mus m., Gallus g., and Bos taurus. The results of the phylogenetic analyses show that both NJ and UPGMA clustering place Homo sapiens s. (Man) in close affinity with Rattus n. (Rat) and Mus m. for the candidate genes, except in the case of the Lipin1 gene. The results support the functional similarity of these genes in physiological and biochemical processes involving man and mouse/rat. Therefore, for understanding the complex etiology and treatment of the complex disease, the mouse/rat model is the best laboratory choice for experimentation.
Keywords: phylogeny, candidate gene of type-2 diabetes, CPE, CDKAL1, GCKR, HSD11B1, IGF2BP2, IRS1, LPIN1, PKLR, TNF, PPARG
Procedia PDF Downloads 321
4377 Energy Efficient Clustering with Adaptive Particle Swarm Optimization
Authors: KumarShashvat, ArshpreetKaur, RajeshKumar, Raman Chadha
Abstract:
Wireless sensor networks have the principal characteristic of restricted energy, with the limitation that the energy of the nodes cannot be replenished. To increase the lifetime in this scenario, the WSN route for data transmission is chosen such that the utilization of energy along the selected route is negligible. For such an energy efficient network, a sound infrastructure is needed because it affects the network lifespan. Clustering is a technique in which nodes are grouped into disjoint, non-overlapping sets. In this technique, data is collected at the cluster head. In this paper, an Adaptive-PSO algorithm is proposed which forms energy aware clusters by minimizing the cost of locating the cluster head. The main concern is the suitability of the swarms, which is addressed by adjusting the learning parameters of PSO. Particle Swarm Optimization converges quickly at the beginning stage of the search, but over time it becomes stable and may be trapped in local optima. In the suggested network model, swarms are given the intelligence of spiders, which makes them capable of avoiding early convergence and also helps them to escape from local optima. Comparison with traditional PSO shows that the new algorithm considerably enhances performance where multi-dimensional functions are taken into consideration.
Keywords: Particle Swarm Optimization, adaptive-PSO, comparison between PSO and A-PSO, energy efficient clustering
Procedia PDF Downloads 249
4376 An Approach for Pattern Recognition and Prediction of Information Diffusion Model on Twitter
Authors: Amartya Hatua, Trung Nguyen, Andrew Sung
Abstract:
In this paper, we study the information diffusion process on Twitter as a multivariate time series problem. Our model concerns three measures (volume, network influence, and sentiment of tweets) based on 10 features, and we collected 27 million tweets to build our information diffusion time series dataset for analysis. Then, different time series clustering techniques with Dynamic Time Warping (DTW) distance were used to identify different patterns of information diffusion. Finally, we built the information diffusion prediction models for new hashtags, which comprise two phases: the first phase is recognizing the pattern using k-NN with DTW distance; the second phase is building the forecasting model using the traditional Autoregressive Integrated Moving Average (ARIMA) model and the non-linear recurrent neural network of Long Short-Term Memory (LSTM). Preliminary results of performance evaluation between different forecasting models show that LSTM with clustering information notably outperforms other models. Therefore, our approach can be applied in real-world applications to analyze and predict the information diffusion characteristics of selected topics or memes (hashtags) on Twitter.
Keywords: ARIMA, DTW, information diffusion, LSTM, RNN, time series clustering, time series forecasting, Twitter
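The Dynamic Time Warping distance that this abstract relies on for both clustering and k-NN pattern recognition can be sketched with the classic dynamic programming recurrence. The version below handles univariate series with absolute difference as the local cost and no warping window; these choices, the function name, and the toy series are assumptions for illustration and are not the authors' multivariate setup.

```python
def dtw_distance(series_a, series_b):
    """Classic O(n*m) DTW between two non-empty numeric sequences."""
    n, m = len(series_a), len(series_b)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost of aligning the first i items
    # of series_a with the first j items of series_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(series_a[i - 1] - series_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat the current item of series_b
                cost[i][j - 1],      # repeat the current item of series_a
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]


# A stretched copy of the same shape aligns at zero cost; a shifted level does not.
dtw_stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
dtw_shifted = dtw_distance([0, 0], [1, 1])
```

Because the stretched series aligns at zero cost, DTW groups hashtags whose diffusion curves share a shape even when they unfold at different speeds, which a pointwise Euclidean distance would not do.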
Procedia PDF Downloads 392
4375 Combining the Dynamic Conditional Correlation and Range-GARCH Models to Improve Covariance Forecasts
Authors: Piotr Fiszeder, Marcin Fałdziński, Peter Molnár
Abstract:
The dynamic conditional correlation model of Engle (2002) is one of the most popular multivariate volatility models. However, this model is based solely on closing prices. It has been documented in the literature that the high and low price of the day can be used in an efficient volatility estimation. We, therefore, suggest a model which incorporates high and low prices into the dynamic conditional correlation framework. Empirical evaluation of this model is conducted on three datasets: currencies, stocks, and commodity exchange-traded funds. The utilisation of realized variances and covariances as proxies for true variances and covariances allows us to reach a strong conclusion that our model outperforms not only the standard dynamic conditional correlation model but also a competing range-based dynamic conditional correlation model.
Keywords: volatility, DCC model, high and low prices, range-based models, covariance forecasting
Procedia PDF Downloads 184
4374 Hierarchical Checkpoint Protocol in Data Grids
Authors: Rahma Souli-Jbali, Minyar Sassi Hidri, Rahma Ben Ayed
Abstract:
The grid of computing nodes has emerged as a representative means of connecting distributed computers or resources scattered all over the world for the purpose of computing and distributed storage. Since fault tolerance becomes complex due to the availability of resources in a decentralized grid environment, it can be used in connection with replication in data grids. The objective of our work is to present fault tolerance in data grids with a data replication-driven model based on clustering. The performance of the protocol is evaluated with the OMNeT++ simulator. The computational results show the efficiency of our protocol in terms of recovery time and the number of processes in rollbacks.
Keywords: data grids, fault tolerance, clustering, chandy-lamport
Procedia PDF Downloads 342
4373 An Observation of the Information Technology Research and Development Based on Article Data Mining: A Survey Study on Science Direct
Authors: Muhammet Dursun Kaya, Hasan Asil
Abstract:
One of the most important factors of research and development is deep insight into the evolution of scientific development. State-of-the-art tools and instruments can considerably assist researchers, and many organizations around the world have become aware of the advantages of data mining for acquiring the knowledge contained in unstructured data. This paper is an attempt to review the articles on information technology published in the past five years with the aid of data mining. A clustering approach was used to study these articles, and the research results revealed that three topics, namely health, innovation, and information systems, have captured the special attention of researchers.
Keywords: information technology, data mining, scientific development, clustering
Procedia PDF Downloads 278
4372 Performance Evaluation of Clustered Routing Protocols for Heterogeneous Wireless Sensor Networks
Authors: Awatef Chniguir, Tarek Farah, Zouhair Ben Jemaa, Safya Belguith
Abstract:
Optimal routing allows minimizing energy consumption in wireless sensor networks (WSN). Clustering has proven its effectiveness in organizing WSN by reducing channel contention and packet collision and enhancing network throughput under heavy load. Moreover, nowadays, with the emergence of the Internet of Things, heterogeneity is essential. The stable election protocol (SEP), which has increased the network stability period and lifetime, is the first clustering protocol for heterogeneous WSN. SEP and its descendants, namely Threshold Sensitive SEP (TSEP), Enhanced TSEP (ETSSEP) and Current Energy Allotted TSEP (CEATSEP), were studied. The performance of these algorithms was evaluated based on different metrics, especially first node death (FND), to compare their stability. Simulations were conducted in MATLAB considering two scenarios: the first varies the fraction of advanced nodes while fixing the total number of nodes; the second varies the number of nodes while keeping the number of advanced nodes constant. CEATSEP outperforms its antecedents by increasing stability while, at the same time, keeping a low throughput. It also operates very well in a large-scale network. Consequently, CEATSEP has a useful lifespan and energy efficiency compared to the other routing protocols for heterogeneous WSN.
Keywords: clustering, heterogeneous, stability, scalability, IoT, WSN
Procedia PDF Downloads 133
4371 The Correlation between Air Pollution and Tourette Syndrome
Authors: Mengnan Sun
Abstract:
The association between air pollution and Tourette Syndrome (TS) is unclear, although people have suspected that air pollution might trigger TS. TS is a type of neural system disease usually found among children. The number of TS patients has significantly increased in recent decades, which makes it important and urgent to examine the possible triggers or conditions that are associated with TS. In this study, the correlation between air pollution and three allergic diseases (asthma, allergic conjunctivitis (AC), and allergic rhinitis (AR)) is examined. Then, a correlation between these allergic diseases and TS is shown. In this way, this study establishes a positive correlation between air pollution and TS. Measures the public can take to help TS patients are also analyzed at the end of this article. The article hopes to raise people’s awareness of the need to reduce air pollution for the good of TS patients and people with other disorders that are associated with air pollution.
Keywords: air pollution, allergic diseases, climate change, Tourette Syndrome
Procedia PDF Downloads 64
4370 Improved Qualitative Modeling of the Magnetization Curve B(H) of the Ferromagnetic Materials for a Transformer Used in the Power Supply for Magnetron
Authors: M. Bassoui, M. Ferfra, M. Chrayagne
Abstract:
This paper presents a qualitative model of the nonlinear B-H curve of the saturable magnetic materials for a transformer with shunts used in the power supply for the magnetron. This power supply is composed of a single-phase leakage flux transformer supplying a cell composed of a capacitor and a diode, which doubles the voltage and stabilizes the current, and a single magnetron at the output of the cell. A procedure consisting of a fuzzy clustering method and a rule processing algorithm is then employed for processing the constructed fuzzy modeling rules to extract the qualitative properties of the curve.
Keywords: B(H) curve, fuzzy clustering, magnetron, power supply
Procedia PDF Downloads 241
4369 Human Digital Twin for Personal Conversation Automation Using Supervised Machine Learning Approaches
Authors: Aya Salama
Abstract:
Digital Twin is an emerging research topic that has attracted researchers in the last decade. It is used in many fields, such as smart manufacturing and smart healthcare, because it saves time and money. It is usually related to other technologies such as Data Mining, Artificial Intelligence, and Machine Learning. However, the Human digital twin (HDT), specifically, is still a novel idea that needs to prove its feasibility. HDT expands the idea of Digital Twin to human beings, which are living beings and different from inanimate physical entities. The goal of this research was to create a Human digital twin that is responsible for automating real-time human replies by simulating human behavior. For this reason, clustering, supervised classification, topic extraction, and sentiment analysis were studied in this paper. The feasibility of the HDT for generating personal replies on social messaging applications was demonstrated in this work. The overall accuracy of the proposed approach was 63%, which is a promising result that can open the way for researchers to expand the idea of HDT. This was achieved by using Random Forest for clustering the question database and matching new questions. K-nearest neighbor was also applied for sentiment analysis.
Keywords: human digital twin, sentiment analysis, topic extraction, supervised machine learning, unsupervised machine learning, classification, clustering
Procedia PDF Downloads 894368 A Comparison of South East Asian Face Emotion Classification based on Optimized Ellipse Data Using Clustering Technique
Authors: M. Karthigayan, M. Rizon, Sazali Yaacob, R. Nagarajan, M. Muthukumaran, Thinaharan Ramachandran, Sargunam Thirugnanam
Abstract:
In this paper, a set of irregular and regular ellipse fitting equations, optimized using a genetic algorithm (GA), is applied to the lip and eye features to classify human emotions. Two South East Asian (SEA) faces are considered in this work for the emotion classification. Six emotions and one neutral state are considered as the output. Each subject shows unique characteristics of the lip and eye features for the various emotions. GA is adopted to optimize the irregular ellipse characteristics of the lip and eye features in each emotion. That is, the top portion of the lip configuration is part of one ellipse and the bottom portion is part of a different ellipse. Two ellipse-based fitness equations are proposed for the lip configuration, and the relevant parameters that define the emotions are listed. The GA method achieved reasonably successful classification of emotion. For some emotions, the optimized data values of one emotion overlap with the ranges of other emotions. To overcome this overlap between the optimized emotion values and at the same time improve the classification, a fuzzy clustering method (FCM) has been implemented. The GA-FCM approach offers reasonably good classification within the ranges of the clusters; this was demonstrated on two SEA subjects, for whom the classification rate improved.
Keywords: ellipse fitness function, genetic algorithm, emotion recognition, fuzzy clustering
Procedia PDF Downloads 551
4367 The Optimum Mel-Frequency Cepstral Coefficients (MFCCs) Contribution to Iranian Traditional Music Genre Classification by Instrumental Features
Authors: M. Abbasi Layegh, S. Haghipour, K. Athari, R. Khosravi, M. Tafkikialamdari
Abstract:
An approach is proposed to find the optimum mel-frequency cepstral coefficients (MFCCs) for the Radif of Mirzâ Ábdollâh, which is the principal emblem and the heart of Persian music, performed by the most famous Iranian masters on two Iranian stringed instruments, ‘Tar’ and ‘Setar’. While investigating the variance of MFCC for each record in the music database of 1500 gushe of the repertoire belonging to 12 modal systems (dastgâh and âvâz), we applied the Fuzzy C-Mean clustering algorithm to each of the 12 coefficients and to different combinations of those coefficients. We repeated the experiment while increasing the number of coefficients, but the clustering accuracy remained the same. Therefore, we conclude that the first 7 MFCCs (V-7MFCC) are enough for classification of the Radif of Mirzâ Ábdollâh. Classical machine learning algorithms such as MLP neural networks, K-Nearest Neighbors (KNN), Gaussian Mixture Model (GMM), Hidden Markov Model (HMM) and Support Vector Machine (SVM) have been employed. Finally, SVM shows better performance in this study.
Keywords: radif of Mirzâ Ábdollâh, Gushe, mel frequency cepstral coefficients, fuzzy c-mean clustering algorithm, k-nearest neighbors (KNN), gaussian mixture model (GMM), hidden markov model (HMM), support vector machine (SVM)
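As an illustration of the fuzzy c-means step named in this abstract, the following is a minimal sketch on one-dimensional data, not the authors' implementation. The function name `fuzzy_c_means_1d`, the fuzzifier `m = 2.0`, the fixed iteration count, and the sample values are all assumptions made for this example; the abstract does not state the settings that were used, and real MFCC features would be multi-dimensional.

```python
def fuzzy_c_means_1d(data, initial_centers, m=2.0, iterations=50):
    """Standard fuzzy c-means updates on 1-D data (hypothetical helper).

    Returns (centers, memberships), where memberships[i][j] is the degree
    to which data[j] belongs to cluster i. Requires m > 1, iterations >= 1.
    """
    centers = list(initial_centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        memberships = [[0.0] * len(data) for _ in centers]
        for j, x in enumerate(data):
            dists = [abs(x - c) for c in centers]
            coincident = [i for i, d in enumerate(dists) if d == 0.0]
            if coincident:
                # The point sits exactly on a center: share full membership
                # among those centers to avoid dividing by zero.
                for i in coincident:
                    memberships[i][j] = 1.0 / len(coincident)
                continue
            for i, d_i in enumerate(dists):
                # u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1))
                memberships[i][j] = 1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for i in range(len(centers)):
            # Each center is the mean of the data weighted by u_ij^m.
            weights = [u ** m for u in memberships[i]]
            centers[i] = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return centers, memberships


# Hypothetical values standing in for one coefficient measured on six records.
samples = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
centers, memberships = fuzzy_c_means_1d(samples, initial_centers=[0.0, 5.0])
print(centers)
```

Unlike hard clustering, every record keeps a membership degree in each cluster, and the degrees for one record sum to 1.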
Procedia PDF Downloads 448
4366 Multi-Level Clustering Based Congestion Control Protocol for Cyber Physical Systems
Authors: Manpreet Kaur, Amita Rani, Sanjay Kumar
Abstract:
The Internet of Things (IoT), a cyber-physical paradigm, allows a large number of devices to connect and send sensory data over the network simultaneously. The tremendous amount of data generated leads to a very high network load, which in turn results in network congestion. Congestion further leads to frequent loss of useful information and depletes a significant amount of the nodes’ energy. Therefore, there is a need to control congestion in IoT so as to prolong network lifetime and improve the quality of service (QoS). Hence, we propose a two-level clustering-based routing algorithm that considers congestion score and packet priority metrics and focuses on minimizing network congestion. In the proposed Priority based Congestion Control (PBCC) protocol, the sensor nodes in the IoT network form clusters that reduce the amount of traffic, and the nodes are prioritized to emphasize important data. Simultaneously, a congestion score determines the occurrence of congestion at a particular node. The proposed protocol outperforms the existing Packet Discard Network Clustering (PDNC) protocol in terms of buffer size, packet transmission range, network region and number of nodes, under various simulation scenarios.
Keywords: internet of things, cyber-physical systems, congestion control, priority, transmission rate
Procedia PDF Downloads 308
4365 Correlation Mapping for Measuring Platelet Adhesion
Authors: Eunseop Yeom
Abstract:
Platelets can be activated by the surrounding blood flow where a blood vessel is narrowed as a result of atherosclerosis. Numerous studies have been conducted to identify the relation between platelet activation and thrombus formation. To measure platelet adhesion, this study proposes an image analysis technique. Blood samples are delivered into a microfluidic channel, and platelets are then activated by a stenotic micro-channel with 90% severity. By applying the proposed correlation mapping, which visualizes decorrelation of the streaming blood flow, the area of adhered platelets (APlatelet) was estimated without labeling platelets. To evaluate the performance of correlation mapping in detecting platelet adhesion, the effect of tile size was investigated by calculating 2D correlation coefficients between binary images obtained by manual labeling and those obtained by the correlation mapping method, with square tile sizes ranging from 3 to 50 pixels. The maximum 2D correlation coefficient is observed at the optimum tile size of 5×5 pixels. As the area of platelet adhesion increases, the platelets plug the channel and only a small amount of blood flows. This image analysis could provide new insights for better understanding of the interactions between platelet aggregation and blood flows in various physiological conditions.
Keywords: platelet activation, correlation coefficient, image analysis, shear rate
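The 2D correlation coefficient used in this abstract to compare binary images can be sketched as follows. This is a generic Pearson-style coefficient computed over all pixels, offered as an assumption about the kind of measure meant; the abstract does not give its exact formula. The function name `corr2` and the tiny masks are hypothetical.

```python
from math import sqrt


def corr2(image_a, image_b):
    """2D correlation coefficient between two equally sized images.

    Images are lists of rows of numbers (for example 0/1 binary masks).
    The result lies in [-1, 1]; a constant image has no variance and is rejected.
    """
    flat_a = [value for row in image_a for value in row]
    flat_b = [value for row in image_b for value in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must contain the same number of pixels")
    n = len(flat_a)
    mean_a = sum(flat_a) / n
    mean_b = sum(flat_b) / n
    numerator = sum((a - mean_a) * (b - mean_b) for a, b in zip(flat_a, flat_b))
    denominator = sqrt(
        sum((a - mean_a) ** 2 for a in flat_a) * sum((b - mean_b) ** 2 for b in flat_b)
    )
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant image")
    return numerator / denominator


# Hypothetical 3x3 masks: a manual label and an automatic estimate.
manual = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
estimated = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
print(corr2(manual, estimated))
```

Repeating this comparison for masks produced with different tile sizes, and keeping the size with the highest coefficient, mirrors the selection procedure the abstract describes.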
Procedia PDF Downloads 335
4364 Fusion Models for Cyber Threat Defense: Integrating Clustering, Random Forests, and Support Vector Machines to Against Windows Malware
Authors: Azita Ramezani, Atousa Ramezani
Abstract:
In the ever-escalating landscape of Windows malware, the necessity for pioneering defense strategies becomes undeniable. This study introduces an avant-garde approach fusing the capabilities of clustering, random forests, and support vector machines (SVM) to combat the intricate web of cyber threats. Our fusion model triumphs with a staggering accuracy of 98.67 and an equally formidable F1 score of 98.68, a testament to its effectiveness in the realm of Windows malware defense. By deciphering the intricate patterns within malicious code, our model not only raises the bar for detection precision but also redefines the paradigm of cybersecurity preparedness. This breakthrough underscores the potential embedded in the fusion of diverse analytical methodologies and signals a paradigm shift in fortifying against the relentless evolution of Windows malicious threats. As we traverse the dynamic cybersecurity terrain, this research serves as a beacon illuminating the path toward a resilient future where innovative fusion models stand at the forefront of cyber threat defense.
Keywords: fusion models, cyber threat defense, windows malware, clustering, random forests, support vector machines (SVM), accuracy, f1-score, cybersecurity, malicious code detection
Procedia PDF Downloads 72
4363 The Investigation of Correlation between Body Composition and Physical Activity in University Students
Authors: Ferruh Taspinar, Gulce K. Seyyar, Gamze Kurt, Eda O. Okur, Emrah Afsar, Ismail Saracoglu, Betul Taspinar
Abstract:
Alterations of physical activity can affect body composition (especially body fat ratio); however, body mass index may not be sufficient to indicate these minimal differences. The aim of this study was to evaluate the relationship between body composition and physical activity in university students. In this study, 132 university students (mean age: 21.21±1.51) were included. Tanita BC-418 and the International Physical Activity Questionnaire (IPAQ) were used to evaluate participants. The correlation between the parameters was analysed via Spearman correlation analysis. The significance level in statistical analyses was set at 0.05. The results showed that there was no correlation between body mass index and physical activity (p>0.05). There was a positive correlation between body muscle ratio and physical activity, and a negative correlation between body fat ratio and physical activity (p<0.05). This study showed that body fat and muscle ratio affect the level of physical activity in healthy university students. Therefore, we thought that physical activity might reduce the effects of diseases caused by disturbed body composition. Further studies are required to support this idea.
Keywords: body composition, body mass index, physical activity, university student
Procedia PDF Downloads 356
4362 Employing GIS to Analyze Areas Prone to Flooding: Case Study of Thailand
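The Spearman correlation analysis mentioned in this abstract can be sketched as a Pearson correlation computed on ranks. This is a minimal illustration, not the authors' analysis: the helper names `average_ranks`, `pearson`, and `spearman_rho` and the sample numbers are hypothetical, and the sketch returns only the coefficient, not the p-value that a significance test at the 0.05 level would require.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values, not data from the study.
activity_score = [600, 1200, 1800, 2500, 3100]
body_fat_ratio = [31.0, 27.5, 28.0, 22.0, 19.5]
print(spearman_rho(activity_score, body_fat_ratio))
```

Because it works on ranks, the coefficient captures any monotonic relationship and does not assume the variables are normally distributed, which suits questionnaire scores such as those from IPAQ.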
Authors: Sanpachai Huvanandana, Settapong Malisuwan, Soparwan Tongyuak, Prust Pannachet, Anong Phoepueak, Navneet Madan
Abstract:
Many regions of Thailand are prone to flooding due to the tropical climate. Commonly increasing precipitation in this region results in a risk of flooding. Many measures have been implemented, such as drainage control systems, multiple dams, and irrigation canals. In order to decide where the drainage systems, dams, and canals should be located, the flooding risk area should be determined. This paper aims to identify the appropriate features that can be used to classify the flooding risk area in Thailand. Several features have been analyzed and used to classify the area. Unsupervised clustering techniques have been used, and the results have been compared with the ten-year average actual flooding area.
Keywords: flood area clustering, geographical information system, flood features
Procedia PDF Downloads 296
4361 Detecting of Crime Hot Spots for Crime Mapping
Authors: Somayeh Nezami
Abstract:
The management of the financial and human resources of police in metropolitan areas requires much information and exact plans to reduce the crime rate and increase the safety of society. Geographical Information Systems have an important role in providing crime maps and their analysis. By using them to identify crime hot spots and present the results spatially, it is possible to allocate resources optimally while providing effective methods for decision making and preventive solutions. In this paper, we explain and compare some of the methods of hot spot analysis, such as Mode, Fuzzy Mode and Nearest Neighbour Hierarchical spatial clustering (NNH). Then the spots with the highest rates of drug smuggling crime are obtained for one province in Iran that borders Afghanistan. We show that, among these three methods, NNH leads to the best result.
Keywords: GIS, Hot spots, nearest neighbor hierarchical spatial clustering, NNH, spatial analysis of crime
Procedia PDF Downloads 331