Search results for: clustering algorithms
2224 A Model Based Metaheuristic for Hybrid Hierarchical Community Structure in Social Networks
Authors: Radhia Toujani, Jalel Akaichi
Abstract:
In recent years, the study of community detection in social networks has received great attention. The hierarchical structure of the network leads to the emergence of the convergence to a locally optimal community structure. In this paper, we aim to avoid this local optimum in the introduced hybrid hierarchical method. To achieve this purpose, we present an objective function where we incorporate the value of structural and semantic similarity based modularity and a metaheuristic namely bees colonies algorithm to optimize our objective function on both hierarchical level divisive and agglomerative. In order to assess the efficiency and the accuracy of the introduced hybrid bee colony model, we perform an extensive experimental evaluation on both synthetic and real networks.Keywords: social network, community detection, agglomerative hierarchical clustering, divisive hierarchical clustering, similarity, modularity, metaheuristic, bee colony
Procedia PDF Downloads 3792223 CoP-Networks: Virtual Spaces for New Faculty’s Professional Development in the 21st Higher Education
Authors: Eman AbuKhousa, Marwan Z. Bataineh
Abstract:
The 21st century higher education and globalization challenge new faculty members to build effective professional networks and partnership with industry in order to accelerate their growth and success. This creates the need for community of practice (CoP)-oriented development approaches that focus on cognitive apprenticeship while considering individual predisposition and future career needs. This work adopts data mining, clustering analysis, and social networking technologies to present the CoP-Network as a virtual space that connects together similar career-aspiration individuals who are socially influenced to join and engage in a process for domain-related knowledge and practice acquisitions. The CoP-Network model can be integrated into higher education to extend traditional graduate and professional development programs.Keywords: clustering analysis, community of practice, data mining, higher education, new faculty challenges, social network, social influence, professional development
Procedia PDF Downloads 1832222 Unlocking E-commerce: Analyzing User Behavior and Segmenting Customers for Strategic Insights
Authors: Aditya Patil, Arun Patil, Vaishali Patil, Sudhir Chitnis, Anjum Patel
Abstract:
Rapid growth has given e-commerce platforms a lot of client behavior and spending data. To maximize their strategy, businesses must understand how customers utilize online shopping platforms and what influences their purchases. Our research focuses on e-commerce user behavior and purchasing trends. This extensive study examines spending and user behavior. Regression and grouping disclose relevant data from the dataset. We can understand user spending trends via multilevel regression. We can analyze how pricing, user demographics, and product categories affect customer purchase decisions with this technique. Clustering groups consumers by spending. Important information was found. Purchase habits vary by user group. Our analysis illuminates the complex world of e-commerce consumer behavior and purchase trends. Understanding user behavior helps create effective e-commerce marketing strategies. This market can benefit from K-means clustering. This study focuses on tailoring strategies to user groups and improving product and price effectiveness. Customer buying behaviors across categories were shown via K-means clusters. Average spending is highest in Cluster 4 and lowest in Cluster 3. Clothing is less popular than gadgets and appliances around the holidays. Cluster spending distribution is examined using average variables. Our research enhances e-commerce analytics. Companies can improve customer service and decision-making with this data.Keywords: e-commerce, regression, clustering, k-means
Procedia PDF Downloads 182221 The Effect of Feature Selection on Pattern Classification
Authors: Chih-Fong Tsai, Ya-Han Hu
Abstract:
The aim of feature selection (or dimensionality reduction) is to filter out unrepresentative features (or variables) making the classifier perform better than the one without feature selection. Since there are many well-known feature selection algorithms, and different classifiers based on different selection results may perform differently, very few studies consider examining the effect of performing different feature selection algorithms on the classification performances by different classifiers over different types of datasets. In this paper, two widely used algorithms, which are the genetic algorithm (GA) and information gain (IG), are used to perform feature selection. On the other hand, three well-known classifiers are constructed, which are the CART decision tree (DT), multi-layer perceptron (MLP) neural network, and support vector machine (SVM). Based on 14 different types of datasets, the experimental results show that in most cases IG is a better feature selection algorithm than GA. In addition, the combinations of IG with DT and IG with SVM perform best and second best for small and large scale datasets.Keywords: data mining, feature selection, pattern classification, dimensionality reduction
Procedia PDF Downloads 6692220 A Minimum Spanning Tree-Based Method for Initializing the K-Means Clustering Algorithm
Authors: J. Yang, Y. Ma, X. Zhang, S. Li, Y. Zhang
Abstract:
The traditional k-means algorithm has been widely used as a simple and efficient clustering method. However, the algorithm often converges to local minima for the reason that it is sensitive to the initial cluster centers. In this paper, an algorithm for selecting initial cluster centers on the basis of minimum spanning tree (MST) is presented. The set of vertices in MST with same degree are regarded as a whole which is used to find the skeleton data points. Furthermore, a distance measure between the skeleton data points with consideration of degree and Euclidean distance is presented. Finally, MST-based initialization method for the k-means algorithm is presented, and the corresponding time complexity is analyzed as well. The presented algorithm is tested on five data sets from the UCI Machine Learning Repository. The experimental results illustrate the effectiveness of the presented algorithm compared to three existing initialization methods.Keywords: degree, initial cluster center, k-means, minimum spanning tree
Procedia PDF Downloads 4112219 Probabilistic Gathering of Agents with Simple Sensors: Distributed Algorithm for Aggregation of Robots Equipped with Binary On-Board Detectors
Authors: Ariel Barel, Rotem Manor, Alfred M. Bruckstein
Abstract:
We present a probabilistic gathering algorithm for agents that can only detect the presence of other agents in front of or behind them. The agents act in the plane and are identical and indistinguishable, oblivious, and lack any means of direct communication. They do not have a common frame of reference in the plane and choose their orientation (direction of possible motion) at random. The analysis of the gathering process assumes that the agents act synchronously in selecting random orientations that remain fixed during each unit time-interval. Two algorithms are discussed. The first one assumes discrete jumps based on the sensing results given the randomly selected motion direction, and in this case, extensive experimental results exhibit probabilistic clustering into a circular region with radius equal to the step-size in time proportional to the number of agents. The second algorithm assumes agents with continuous sensing and motion, and in this case, we can prove gathering into a very small circular region in finite expected time.Keywords: control, decentralized, gathering, multi-agent, simple sensors
Procedia PDF Downloads 1642218 Efficient Credit Card Fraud Detection Based on Multiple ML Algorithms
Authors: Neha Ahirwar
Abstract:
In the contemporary digital era, the rise of credit card fraud poses a significant threat to both financial institutions and consumers. As fraudulent activities become more sophisticated, there is an escalating demand for robust and effective fraud detection mechanisms. Advanced machine learning algorithms have become crucial tools in addressing this challenge. This paper conducts a thorough examination of the design and evaluation of a credit card fraud detection system, utilizing four prominent machine learning algorithms: random forest, logistic regression, decision tree, and XGBoost. The surge in digital transactions has opened avenues for fraudsters to exploit vulnerabilities within payment systems. Consequently, there is an urgent need for proactive and adaptable fraud detection systems. This study addresses this imperative by exploring the efficacy of machine learning algorithms in identifying fraudulent credit card transactions. The selection of random forest, logistic regression, decision tree, and XGBoost for scrutiny in this study is based on their documented effectiveness in diverse domains, particularly in credit card fraud detection. These algorithms are renowned for their capability to model intricate patterns and provide accurate predictions. Each algorithm is implemented and evaluated for its performance in a controlled environment, utilizing a diverse dataset comprising both genuine and fraudulent credit card transactions.Keywords: efficient credit card fraud detection, random forest, logistic regression, XGBoost, decision tree
Procedia PDF Downloads 672217 Proposing a Boundary Coverage Algorithm for Underwater Sensor Network
Authors: Seyed Mohsen Jameii
Abstract:
Wireless underwater sensor networks are a type of sensor networks that are located in underwater environments and linked together by acoustic waves. The application of these kinds of network includes monitoring of pollutants (chemical, biological, and nuclear), oil fields detection, prediction of the likelihood of a tsunami in coastal areas, the use of wireless sensor nodes to monitor the passing submarines, and determination of appropriate locations for anchoring ships. This paper proposes a boundary coverage algorithm for intrusion detection in underwater sensor networks. In the first phase of the proposed algorithm, optimal deployment of nodes is done in the water. In the second phase, after the employment of nodes at the proper depth, clustering is executed to reduce the exchanges of messages between the sensors. In the third phase, the algorithm of "divide and conquer" is used to save energy and increase network efficiency. The simulation results demonstrate the efficiency of the proposed algorithm.Keywords: boundary coverage, clustering, divide and conquer, underwater sensor nodes
Procedia PDF Downloads 3412216 Power Aware Modified I-LEACH Protocol Using Fuzzy IF Then Rules
Authors: Gagandeep Singh, Navdeep Singh
Abstract:
Due to limited battery of sensor nodes, so energy efficiency found to be main constraint in WSN. Therefore the main focus of the present work is to find the ways to minimize the energy consumption problem and will results; enhancement in the network stability period and life time. Many researchers have proposed different kind of the protocols to enhance the network lifetime further. This paper has evaluated the issues which have been neglected in the field of the WSNs. WSNs are composed of multiple unattended ultra-small, limited-power sensor nodes. Sensor nodes are deployed randomly in the area of interest. Sensor nodes have limited processing, wireless communication and power resource capabilities Sensor nodes send sensed data to sink or Base Station (BS). I-LEACH gives adaptive clustering mechanism which very efficiently deals with energy conservations. This paper ends up with the shortcomings of various adaptive clustering based WSNs protocols.Keywords: WSN, I-Leach, MATLAB, sensor
Procedia PDF Downloads 2752215 Application of Data Mining Techniques for Tourism Knowledge Discovery
Authors: Teklu Urgessa, Wookjae Maeng, Joong Seek Lee
Abstract:
Application of five implementations of three data mining classification techniques was experimented for extracting important insights from tourism data. The aim was to find out the best performing algorithm among the compared ones for tourism knowledge discovery. Knowledge discovery process from data was used as a process model. 10-fold cross validation method is used for testing purpose. Various data preprocessing activities were performed to get the final dataset for model building. Classification models of the selected algorithms were built with different scenarios on the preprocessed dataset. The outperformed algorithm tourism dataset was Random Forest (76%) before applying information gain based attribute selection and J48 (C4.5) (75%) after selection of top relevant attributes to the class (target) attribute. In terms of time for model building, attribute selection improves the efficiency of all algorithms. Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented, which showed intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis with mediocre accuracy of the machine using classification algorithms.Keywords: classification algorithms, data mining, knowledge discovery, tourism
Procedia PDF Downloads 2952214 An Experimental Study for Assessing Email Classification Attributes Using Feature Selection Methods
Authors: Issa Qabaja, Fadi Thabtah
Abstract:
Email phishing classification is one of the vital problems in the online security research domain that have attracted several scholars due to its impact on the users payments performed daily online. One aspect to reach a good performance by the detection algorithms in the email phishing problem is to identify the minimal set of features that significantly have an impact on raising the phishing detection rate. This paper investigate three known feature selection methods named Information Gain (IG), Chi-square and Correlation Features Set (CFS) on the email phishing problem to separate high influential features from low influential ones in phishing detection. We measure the degree of influentially by applying four data mining algorithms on a large set of features. We compare the accuracy of these algorithms on the complete features set before feature selection has been applied and after feature selection has been applied. After conducting experiments, the results show 12 common significant features have been chosen among the considered features by the feature selection methods. Further, the average detection accuracy derived by the data mining algorithms on the reduced 12-features set was very slight affected when compared with the one derived from the 47-features set.Keywords: data mining, email classification, phishing, online security
Procedia PDF Downloads 4322213 An Optimal Steganalysis Based Approach for Embedding Information in Image Cover Media with Security
Authors: Ahlem Fatnassi, Hamza Gharsellaoui, Sadok Bouamama
Abstract:
This paper deals with the study of interest in the fields of Steganography and Steganalysis. Steganography involves hiding information in a cover media to obtain the stego media in such a way that the cover media is perceived not to have any embedded message for its unintended recipients. Steganalysis is the mechanism of detecting the presence of hidden information in the stego media and it can lead to the prevention of disastrous security incidents. In this paper, we provide a critical review of the steganalysis algorithms available to analyze the characteristics of an image stego media against the corresponding cover media and understand the process of embedding the information and its detection. We anticipate that this paper can also give a clear picture of the current trends in steganography so that we can develop and improvise appropriate steganalysis algorithms.Keywords: optimization, heuristics and metaheuristics algorithms, embedded systems, low-power consumption, steganalysis heuristic approach
Procedia PDF Downloads 2922212 Cyber Attacks Management in IoT Networks Using Deep Learning and Edge Computing
Authors: Asmaa El Harat, Toumi Hicham, Youssef Baddi
Abstract:
This survey delves into the complex realm of Internet of Things (IoT) security, highlighting the urgent need for effective cybersecurity measures as IoT devices become increasingly common. It explores a wide array of cyber threats targeting IoT devices and focuses on mitigating these attacks through the combined use of deep learning and machine learning algorithms, as well as edge and cloud computing paradigms. The survey starts with an overview of the IoT landscape and the various types of attacks that IoT devices face. It then reviews key machine learning and deep learning algorithms employed in IoT cybersecurity, providing a detailed comparison to assist in selecting the most suitable algorithms. Finally, the survey provides valuable insights for cybersecurity professionals and researchers aiming to enhance security in the intricate world of IoT.Keywords: internet of things (IoT), cybersecurity, machine learning, deep learning
Procedia PDF Downloads 312211 Semantic Based Analysis in Complaint Management System with Analytics
Authors: Francis Alterado, Jennifer Enriquez
Abstract:
Semantic Based Analysis in Complaint Management System with Analytics is an enhanced tool of providing complaints by the clients as well as a mechanism for Palawan Polytechnic College to gather, process, and monitor status of these complaints. The study has a mobile application that serves as a remote facility of communication between the students and the school management on the issues encountered by the student and the solution of every complaint received. In processing the complaints, text mining and clustering algorithms were utilized. Every module of the systems was tested and based on the results; these are 100% free from error before integration was done. A system testing was also done by checking the expected functionality of the system which was 100% functional. The system was tested by 10 students by forwarding complaints to 10 departments. Based on results, the students were able to submit complaints, the system was able to process accordingly by identifying to which department the complaints are intended, and the concerned department was able to give feedback on the complaint received to the student. With this, the system gained 4.7 rating which means Excellent.Keywords: technology adoption, emerging technology, issues challenges, algorithm, text mining, mobile technology
Procedia PDF Downloads 1992210 The Impact of Temporal Impairment on Quality of Experience (QoE) in Video Streaming: A No Reference (NR) Subjective and Objective Study
Authors: Muhammad Arslan Usman, Muhammad Rehan Usman, Soo Young Shin
Abstract:
Live video streaming is one of the most widely used service among end users, yet it is a big challenge for the network operators in terms of quality. The only way to provide excellent Quality of Experience (QoE) to the end users is continuous monitoring of live video streaming. For this purpose, there are several objective algorithms available that monitor the quality of the video in a live stream. Subjective tests play a very important role in fine tuning the results of objective algorithms. As human perception is considered to be the most reliable source for assessing the quality of a video stream, subjective tests are conducted in order to develop more reliable objective algorithms. Temporal impairments in a live video stream can have a negative impact on the end users. In this paper we have conducted subjective evaluation tests on a set of video sequences containing temporal impairment known as frame freezing. Frame Freezing is considered as a transmission error as well as a hardware error which can result in loss of video frames on the reception side of a transmission system. In our subjective tests, we have performed tests on videos that contain a single freezing event and also for videos that contain multiple freezing events. We have recorded our subjective test results for all the videos in order to give a comparison on the available No Reference (NR) objective algorithms. Finally, we have shown the performance of no reference algorithms used for objective evaluation of videos and suggested the algorithm that works better. The outcome of this study shows the importance of QoE and its effect on human perception. The results for the subjective evaluation can serve the purpose for validating objective algorithms.Keywords: objective evaluation, subjective evaluation, quality of experience (QoE), video quality assessment (VQA)
Procedia PDF Downloads 6022209 Neuroevolution Based on Adaptive Ensembles of Biologically Inspired Optimization Algorithms Applied for Modeling a Chemical Engineering Process
Authors: Sabina-Adriana Floria, Marius Gavrilescu, Florin Leon, Silvia Curteanu, Costel Anton
Abstract:
Neuroevolution is a subfield of artificial intelligence used to solve various problems in different application areas. Specifically, neuroevolution is a technique that applies biologically inspired methods to generate neural network architectures and optimize their parameters automatically. In this paper, we use different biologically inspired optimization algorithms in an ensemble strategy with the aim of training multilayer perceptron neural networks, resulting in regression models used to simulate the industrial chemical process of obtaining bricks from silicone-based materials. Installations in the raw ceramics industry, i.e., bricks, are characterized by significant energy consumption and large quantities of emissions. In addition, the initial conditions that were taken into account during the design and commissioning of the installation can change over time, which leads to the need to add new mixes to adjust the operating conditions for the desired purpose, e.g., material properties and energy saving. The present approach follows the study by simulation of a process of obtaining bricks from silicone-based materials, i.e., the modeling and optimization of the process. Optimization aims to determine the working conditions that minimize the emissions represented by nitrogen monoxide. We first use a search procedure to find the best values for the parameters of various biologically inspired optimization algorithms. Then, we propose an adaptive ensemble strategy that uses only a subset of the best algorithms identified in the search stage. The adaptive ensemble strategy combines the results of selected algorithms and automatically assigns more processing capacity to the more efficient algorithms. Their efficiency may also vary at different stages of the optimization process. In a given ensemble iteration, the most efficient algorithms aim to maintain good convergence, while the less efficient algorithms can improve population diversity. The proposed adaptive ensemble strategy outperforms the individual optimizers and the non-adaptive ensemble strategy in convergence speed, and the obtained results provide lower error values.Keywords: optimization, biologically inspired algorithm, neuroevolution, ensembles, bricks, emission minimization
Procedia PDF Downloads 1162208 Analysis and Modeling of Photovoltaic System with Different Research Methods of Maximum Power Point Tracking
Authors: Mehdi Ameur, Ahmed Essakdi, Tamou Nasser
Abstract:
The purpose of this paper is the analysis and modeling of the photovoltaic system with MPPT techniques. This system is developed by combining the models of established solar module and DC-DC converter with the algorithms of perturb and observe (P&O), incremental conductance (INC) and fuzzy logic controller(FLC). The system is simulated under different climate conditions and MPPT algorithms to determine the influence of these conditions on characteristic power-voltage of PV system. According to the comparisons of the simulation results, the photovoltaic system can extract the maximum power with precision and rapidity using the MPPT algorithms discussed in this paper.Keywords: photovoltaic array, maximum power point tracking, MPPT, perturb and observe, P&O, incremental conductance, INC, hill climbing, HC, fuzzy logic controller, FLC
Procedia PDF Downloads 4292207 Evaluation of Dual Polarization Rainfall Estimation Algorithm Applicability in Korea: A Case Study on Biseulsan Radar
Authors: Chulsang Yoo, Gildo Kim
Abstract:
Dual polarization radar provides comprehensive information about rainfall by measuring multiple parameters. In Korea, for the rainfall estimation, JPOLE and CSU-HIDRO algorithms are generally used. This study evaluated the local applicability of JPOLE and CSU-HIDRO algorithms in Korea by using the observed rainfall data collected on August, 2014 by the Biseulsan dual polarization radar data and KMA AWS. A total of 11,372 pairs of radar-ground rain rate data were classified according to thresholds of synthetic algorithms into suitable and unsuitable data. Then, evaluation criteria were derived by comparing radar rain rate and ground rain rate, respectively, for entire, suitable, unsuitable data. The results are as follows: (1) The radar rain rate equation including KDP, was found better in the rainfall estimation than the other equations for both JPOLE and CSU-HIDRO algorithms. The thresholds were found to be adequately applied for both algorithms including specific differential phase. (2) The radar rain rate equation including horizontal reflectivity and differential reflectivity were found poor compared to the others. The result was not improved even when only the suitable data were applied. Acknowledgments: This work was supported by the Basic Science Research Program through the National Research Foundation of Korea, funded by the Ministry of Education (NRF-2013R1A1A2011012).Keywords: CSU-HIDRO algorithm, dual polarization radar, JPOLE algorithm, radar rainfall estimation algorithm
Procedia PDF Downloads 2142206 LiDAR Based Real Time Multiple Vehicle Detection and Tracking
Authors: Zhongzhen Luo, Saeid Habibi, Martin v. Mohrenschildt
Abstract:
Self-driving vehicle require a high level of situational awareness in order to maneuver safely when driving in real world condition. This paper presents a LiDAR based real time perception system that is able to process sensor raw data for multiple target detection and tracking in dynamic environment. The proposed algorithm is nonparametric and deterministic that is no assumptions and priori knowledge are needed from the input data and no initializations are required. Additionally, the proposed method is working on the three-dimensional data directly generated by LiDAR while not scarifying the rich information contained in the domain of 3D. Moreover, a fast and efficient for real time clustering algorithm is applied based on a radially bounded nearest neighbor (RBNN). Hungarian algorithm procedure and adaptive Kalman filtering are used for data association and tracking algorithm. The proposed algorithm is able to run in real time with average run time of 70ms per frame.Keywords: lidar, segmentation, clustering, tracking
Procedia PDF Downloads 4232205 Heterogenous Dimensional Super Resolution of 3D CT Scans Using Transformers
Authors: Helen Zhang
Abstract:
Accurate segmentation of the airways from CT scans is crucial for early diagnosis of lung cancer. However, the existing airway segmentation algorithms often rely on thin-slice CT scans, which can be inconvenient and costly. This paper presents a set of machine learning-based 3D super-resolution algorithms along heterogeneous dimensions to improve the resolution of thicker CT scans to reduce the reliance on thin-slice scans. To evaluate the efficacy of the super-resolution algorithms, quantitative assessments using PSNR (Peak Signal to Noise Ratio) and SSIM (Structural SIMilarity index) were performed. The impact of super-resolution on airway segmentation accuracy is also studied. The proposed approach has the potential to make airway segmentation more accessible and affordable, thereby facilitating early diagnosis and treatment of lung cancer.Keywords: 3D super-resolution, airway segmentation, thin-slice CT scans, machine learning
Procedia PDF Downloads 1172204 Research on the Risks of Railroad Receiving and Dispatching Trains Operators: Natural Language Processing Risk Text Mining
Authors: Yangze Lan, Ruihua Xv, Feng Zhou, Yijia Shan, Longhao Zhang, Qinghui Xv
Abstract:
Receiving and dispatching trains is an important part of railroad organization, and the risky evaluation of operating personnel is still reflected by scores, lacking further excavation of wrong answers and operating accidents. With natural language processing (NLP) technology, this study extracts the keywords and key phrases of 40 relevant risk events about receiving and dispatching trains and reclassifies the risk events into 8 categories, such as train approach and signal risks, dispatching command risks, and so on. Based on the historical risk data of personnel, the K-Means clustering method is used to classify the risk level of personnel. The result indicates that the high-risk operating personnel need to strengthen the training of train receiving and dispatching operations towards essential trains and abnormal situations.Keywords: receiving and dispatching trains, natural language processing, risk evaluation, K-means clustering
Procedia PDF Downloads 912203 The Use of Appeals in Green Printed Advertisements: A Case of Product Orientation and Organizational Image Orientation Ads
Authors: Chutima Ruanguttamanun
Abstract:
Despite the relatively large number of studies that have examined the use of appeals in advertisements, research on the use of appeals in green advertisements is still underdeveloped and needs to be investigated further, as it is definitely a tool for marketers to create illustrious ads. In this study, content analysis was employed to examine the nature of green advertising appeals and to match the appeals with the green advertisements. Two different types of green print advertisings, product orientation and organizational image orientation were used. Thirty highly educated participants with different backgrounds were asked individually to ascertain three appeals out of thirty-four given appeals found among forty real green advertisements. To analyze participant responses and to group them based on common appeals, two-step K-mean clustering is used. The clustering solution indicates that eye-catching graphics and imaginative appeals are highly notable in both types of green ads. Depressed, meaningful and sad appeals are found to be highly used in organizational image orientation ads, whereas, corporate image, informative and natural appeals are found to be essential for product orientation ads.Keywords: advertising appeals, green marketing, green advertisement, printed advertisement
Procedia PDF Downloads 2782202 A Decision Support System for the Detection of Illicit Substance Production Sites
Authors: Krystian Chachula, Robert Nowak
Abstract:
Manufacturing home-made explosives and synthetic drugs is an increasing problem in Europe. To combat that, a data fusion system is proposed for the detection and localization of production sites in urban environments. The data consists of measurements of properties of wastewater performed by various sensors installed in a sewage network. A four-stage fusion strategy allows detecting sources of waste products from known chemical reactions. First, suspicious measurements are used to compute the amount and position of discharged compounds. Then, this information is propagated through the sewage network to account for missing sensors. The next step is clustering and the formation of tracks. Eventually, tracks are used to reconstruct discharge events. Sensor measurements are simulated by a subsystem based on real-world data. In this paper, different discharge scenarios are considered to show how the parameters of used algorithms affect the effectiveness of the proposed system. This research is a part of the SYSTEM project (SYnergy of integrated Sensors and Technologies for urban sEcured environMent).Keywords: continuous monitoring, information fusion and sensors, internet of things, multisensor fusion
Procedia PDF Downloads 1152201 Exploring the Role of Data Mining in Crime Classification: A Systematic Literature Review
Authors: Faisal Muhibuddin, Ani Dijah Rahajoe
Abstract:
This in-depth exploration, through a systematic literature review, scrutinizes the nuanced role of data mining in the classification of criminal activities. The research focuses on investigating various methodological aspects and recent developments in leveraging data mining techniques to enhance the effectiveness and precision of crime categorization. Commencing with an exposition of the foundational concepts of crime classification and its evolutionary dynamics, this study details the paradigm shift from conventional methods towards approaches supported by data mining, addressing the challenges and complexities inherent in the modern crime landscape. Specifically, the research delves into various data mining techniques, including K-means clustering, Naïve Bayes, K-nearest neighbour, and clustering methods. A comprehensive review of the strengths and limitations of each technique provides insights into their respective contributions to improving crime classification models. The integration of diverse data sources takes centre stage in this research. A detailed analysis explores how the amalgamation of structured data (such as criminal records) and unstructured data (such as social media) can offer a holistic understanding of crime, enriching classification models with more profound insights. Furthermore, the study explores the temporal implications in crime classification, emphasizing the significance of considering temporal factors to comprehend long-term trends and seasonality. The availability of real-time data is also elucidated as a crucial element in enhancing responsiveness and accuracy in crime classification.Keywords: data mining, classification algorithm, naïve bayes, k-means clustering, k-nearest neigbhor, crime, data analysis, sistematic literature review
Procedia PDF Downloads 662200 Bioinformatic Approaches in Population Genetics and Phylogenetic Studies
Authors: Masoud Sheidai
Abstract:
Biologists with a special field of population genetics and phylogeny have different research tasks such as populations’ genetic variability and divergence, species relatedness, the evolution of genetic and morphological characters, and identification of DNA SNPs with adaptive potential. To tackle these problems and reach a concise conclusion, they must use the proper and efficient statistical and bioinformatic methods as well as suitable genetic and morphological characteristics. In recent years application of different bioinformatic and statistical methods, which are based on various well-documented assumptions, are the proper analytical tools in the hands of researchers. The species delineation is usually carried out with the use of different clustering methods like K-means clustering based on proper distance measures according to the studied features of organisms. A well-defined species are assumed to be separated from the other taxa by molecular barcodes. The species relationships are studied by using molecular markers, which are analyzed by different analytical methods like multidimensional scaling (MDS) and principal coordinate analysis (PCoA). The species population structuring and genetic divergence are usually investigated by PCoA and PCA methods and a network diagram. These are based on bootstrapping of data. The Association of different genes and DNA sequences to ecological and geographical variables is determined by LFMM (Latent factor mixed model) and redundancy analysis (RDA), which are based on Bayesian and distance methods. Molecular and morphological differentiating characters in the studied species may be identified by linear discriminant analysis (DA) and discriminant analysis of principal components (DAPC). We shall illustrate these methods and related conclusions by giving examples from different edible and medicinal plant species.Keywords: GWAS analysis, K-Means clustering, LFMM, multidimensional scaling, redundancy analysis
Procedia PDF Downloads 1252199 A Clustering-Based Approach for Weblog Data Cleaning
Authors: Amine Ganibardi, Cherif Arab Ali
Abstract:
This paper addresses the data cleaning issue as a part of web usage data preprocessing within the scope of Web Usage Mining. Weblog data recorded by web servers within log files reflect usage activity, i.e., End-users’ clicks and underlying user-agents’ hits. As Web Usage Mining is interested in End-users’ behavior, user-agents’ hits are referred to as noise to be cleaned-off before mining. Filtering hits from clicks is not trivial for two reasons, i.e., a server records requests interlaced in sequential order regardless of their source or type, website resources may be set up as requestable interchangeably by end-users and user-agents. The current methods are content-centric based on filtering heuristics of relevant/irrelevant items in terms of some cleaning attributes, i.e., website’s resources filetype extensions, website’s resources pointed by hyperlinks/URIs, http methods, user-agents, etc. These methods need exhaustive extra-weblog data and prior knowledge on the relevant and/or irrelevant items to be assumed as clicks or hits within the filtering heuristics. Such methods are not appropriate for dynamic/responsive Web for three reasons, i.e., resources may be set up to as clickable by end-users regardless of their type, website’s resources are indexed by frame names without filetype extensions, web contents are generated and cancelled differently from an end-user to another. In order to overcome these constraints, a clustering-based cleaning method centered on the logging structure is proposed. This method focuses on the statistical properties of the logging structure at the requested and referring resources attributes levels. It is insensitive to logging content and does not need extra-weblog data. The used statistical property takes on the structure of the generated logging feature by webpage requests in terms of clicks and hits. Since a webpage consists of its single URI and several components, these feature results in a single click to multiple hits ratio in terms of the requested and referring resources. Thus, the clustering-based method is meant to identify two clusters based on the application of the appropriate distance to the frequency matrix of the requested and referring resources levels. As the ratio clicks to hits is single to multiple, the clicks’ cluster is the smallest one in requests number. Hierarchical Agglomerative Clustering based on a pairwise distance (Gower) and average linkage has been applied to four logfiles of dynamic/responsive websites whose click to hits ratio range from 1/2 to 1/15. The optimal clustering set on the basis of average linkage and maximum inter-cluster inertia results always in two clusters. The evaluation of the smallest cluster referred to as clicks cluster under the terms of confusion matrix indicators results in 97% of true positive rate. The content-centric cleaning methods, i.e., conventional and advanced cleaning, resulted in a lower rate 91%. Thus, the proposed clustering-based cleaning outperforms the content-centric methods within dynamic and responsive web design without the need of any extra-weblog. Such an improvement in cleaning quality is likely to refine dependent analysis.Keywords: clustering approach, data cleaning, data preprocessing, weblog data, web usage data
Procedia PDF Downloads 1702198 AM/E/c Queuing Hub Maximal Covering Location Model with Fuzzy Parameter
Authors: M. H. Fazel Zarandi, N. Moshahedi
Abstract:
The hub location problem appears in a variety of applications such as medical centers, firefighting facilities, cargo delivery systems and telecommunication network design. The location of service centers has a strong influence on the congestion at each of them, and, consequently, on the quality of service. This paper presents a fuzzy maximal hub covering location problem (FMCHLP) in which travel costs between any pair of nodes is considered as a fuzzy variable. In order to consider the quality of service, we model each hub as a queue. Arrival rate follows Poisson distribution and service rate follows Erlang distribution. In this paper, at first, a nonlinear mathematical programming model is presented. Then, we convert it to the linear one. We solved the linear model using GAMS software up to 25 nodes and for large sizes due to the complexity of hub covering location problems, and simulated annealing algorithm is developed to solve and test the model. Also, we used possibilistic c-means clustering method in order to find an initial solution.Keywords: fuzzy modeling, location, possibilistic clustering, queuing
Procedia PDF Downloads 3942197 Virtual 3D Environments for Image-Based Navigation Algorithms
Authors: V. B. Bastos, M. P. Lima, P. R. G. Kurka
Abstract:
This paper applies to the creation of virtual 3D environments for the study and development of mobile robot image based navigation algorithms and techniques, which need to operate robustly and efficiently. The test of these algorithms can be performed in a physical way, from conducting experiments on a prototype, or by numerical simulations. Current simulation platforms for robotic applications do not have flexible and updated models for image rendering, being unable to reproduce complex light effects and materials. Thus, it is necessary to create a test platform that integrates sophisticated simulated applications of real environments for navigation, with data and image processing. This work proposes the development of a high-level platform for building 3D model’s environments and the test of image-based navigation algorithms for mobile robots. Techniques were used for applying texture and lighting effects in order to accurately represent the generation of rendered images regarding the real world version. The application will integrate image processing scripts, trajectory control, dynamic modeling and simulation techniques for physics representation and picture rendering with the open source 3D creation suite - Blender.Keywords: simulation, visual navigation, mobile robot, data visualization
Procedia PDF Downloads 2552196 Dimensionality Reduction in Modal Analysis for Structural Health Monitoring
Authors: Elia Favarelli, Enrico Testi, Andrea Giorgetti
Abstract:
Autonomous structural health monitoring (SHM) of many structures and bridges became a topic of paramount importance for maintenance purposes and safety reasons. This paper proposes a set of machine learning (ML) tools to perform automatic feature selection and detection of anomalies in a bridge from vibrational data and compare different feature extraction schemes to increase the accuracy and reduce the amount of data collected. As a case study, the Z-24 bridge is considered because of the extensive database of accelerometric data in both standard and damaged conditions. The proposed framework starts from the first four fundamental frequencies extracted through operational modal analysis (OMA) and clustering, followed by density-based time-domain filtering (tracking). The fundamental frequencies extracted are then fed to a dimensionality reduction block implemented through two different approaches: feature selection (intelligent multiplexer) that tries to estimate the most reliable frequencies based on the evaluation of some statistical features (i.e., mean value, variance, kurtosis), and feature extraction (auto-associative neural network (ANN)) that combine the fundamental frequencies to extract new damage sensitive features in a low dimensional feature space. Finally, one class classifier (OCC) algorithms perform anomaly detection, trained with standard condition points, and tested with normal and anomaly ones. In particular, a new anomaly detector strategy is proposed, namely one class classifier neural network two (OCCNN2), which exploit the classification capability of standard classifiers in an anomaly detection problem, finding the standard class (the boundary of the features space in normal operating conditions) through a two-step approach: coarse and fine boundary estimation. The coarse estimation uses classics OCC techniques, while the fine estimation is performed through a feedforward neural network (NN) trained that exploits the boundaries estimated in the coarse step. The detection algorithms vare then compared with known methods based on principal component analysis (PCA), kernel principal component analysis (KPCA), and auto-associative neural network (ANN). In many cases, the proposed solution increases the performance with respect to the standard OCC algorithms in terms of F1 score and accuracy. In particular, by evaluating the correct features, the anomaly can be detected with accuracy and an F1 score greater than 96% with the proposed method.Keywords: anomaly detection, frequencies selection, modal analysis, neural network, sensor network, structural health monitoring, vibration measurement
Procedia PDF Downloads 1232195 Transfer Knowledge From Multiple Source Problems to a Target Problem in Genetic Algorithm
Authors: Terence Soule, Tami Al Ghamdi
Abstract:
To study how to transfer knowledge from multiple source problems to the target problem, we modeled the Transfer Learning (TL) process using Genetic Algorithms as the model solver. TL is the process that aims to transfer learned data from one problem to another problem. The TL process aims to help Machine Learning (ML) algorithms find a solution to the problems. The Genetic Algorithms (GA) give researchers access to information that we have about how the old problem is solved. In this paper, we have five different source problems, and we transfer the knowledge to the target problem. We studied different scenarios of the target problem. The results showed combined knowledge from multiple source problems improves the GA performance. Also, the process of combining knowledge from several problems results in promoting diversity of the transferred population.Keywords: transfer learning, genetic algorithm, evolutionary computation, source and target
Procedia PDF Downloads 140