Search results for: ensemble clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 759

Search results for: ensemble clustering

489 Clustering-Based Threshold Model for Condition Rating of Concrete Bridge Decks

Authors: M. Alsharqawi, T. Zayed, S. Abu Dabous

Abstract:

To ensure safety and serviceability of bridge infrastructure, accurate condition assessment and rating methods are needed to provide basis for bridge Maintenance, Repair and Replacement (MRR) decisions. In North America, the common practices to assess condition of bridges are through visual inspection. These practices are limited to detect surface defects and external flaws. Further, the thresholds that define the severity of bridge deterioration are selected arbitrarily. The current research discusses the main deteriorations and defects identified during visual inspection and Non-Destructive Evaluation (NDE). NDE techniques are becoming popular in augmenting the visual examination during inspection to detect subsurface defects. Quality inspection data and accurate condition assessment and rating are the basis for determining appropriate MRR decisions. Thus, in this paper, a novel method for bridge condition assessment using the Quality Function Deployment (QFD) theory is utilized. The QFD model is designed to provide an integrated condition by evaluating both the surface and subsurface defects for concrete bridges. Moreover, an integrated condition rating index with four thresholds is developed based on the QFD condition assessment model and using K-means clustering technique. Twenty case studies are analyzed by applying the QFD model and implementing the developed rating index. The results from the analyzed case studies show that the proposed threshold model produces robust MRR recommendations consistent with decisions and recommendations made by bridge managers on these projects. The proposed method is expected to advance the state of the art of bridges condition assessment and rating.

Keywords: concrete bridge decks, condition assessment and rating, quality function deployment, k-means clustering technique

Procedia PDF Downloads 195
488 Fault-Detection and Self-Stabilization Protocol for Wireless Sensor Networks

Authors: Ather Saeed, Arif Khan, Jeffrey Gosper

Abstract:

Sensor devices are prone to errors and sudden node failures, which are difficult to detect in a timely manner when deployed in real-time, hazardous, large-scale harsh environments and in medical emergencies. Therefore, the loss of data can be life-threatening when the sensed phenomenon is not disseminated due to sudden node failure, battery depletion or temporary malfunctioning. We introduce a set of partial differential equations for localizing faults, similar to Green’s and Maxwell’s equations used in Electrostatics and Electromagnetism. We introduce a node organization and clustering scheme for self-stabilizing sensor networks. Green’s theorem is applied to regions where the curve is closed and continuously differentiable to ensure network connectivity. Experimental results show that the proposed GTFD (Green’s Theorem fault-detection and Self-stabilization) protocol not only detects faulty nodes but also accurately generates network stability graphs where urgent intervention is required for dynamically self-stabilizing the network.

Keywords: Green’s Theorem, self-stabilization, fault-localization, RSSI, WSN, clustering

Procedia PDF Downloads 40
487 Design of Agricultural Machinery Factory Facility Layout

Authors: Nilda Tri Putri, Muhammad Taufik

Abstract:

Tools and agricultural machinery (Alsintan) is a tool used in agribusiness activities. Alsintan used to change the traditional farming systems generally use manual equipment into modern agriculture with mechanization. CV Nugraha Chakti Consultant make an action plan for industrial development Alsintan West Sumatra in 2012 to develop medium industries of Alsintan become a major industry of Alsintan, one of efforts made is increase the production capacity of the industry Alsintan. Production capacity for superior products as hydrotiller and threshers set each for 2.000 units per year. CV Citra Dragon as one of the medium industry alsintan in West Sumatra has a plan to relocate the existing plant to meet growing consumer demand each year. Increased production capacity and plant relocation plan has led to a change in the layout; therefore need to design the layout of the plant facility CV Citra Dragon. First step the to design of plant layout is design the layout of the production floor. The design of the production floor layout is done by applying group technology layout. The initial step is to do a machine grouping and part family using the Average Linkage Clustering (ALC) and Rank Order Clustering (ROC). Furthermore done independent work station design and layout design using the Modified Spanning Tree (MST). Alternative selection layout is done to select the best production floor layout between ALC and ROC cell grouping. Furthermore, to design the layout of warehouses, offices and other production support facilities. Activity Relationship Chart methods used to organize the placement of factory facilities has been designed. After structuring plan facilities, calculated cost manufacturing facility plant establishment. Type of layout is used on the production floor layout technology group. The production floor is composed of four cell machinery, assembly area and painting area. The total distance of the displacement of material in a single production amounted to 1120.16 m which means need 18,7minutes of transportation time for one time production. Alsintan Factory has designed a circular flow pattern with 11 facilities. The facilities were designed consisting of 10 rooms and 1 parking space. The measure of factory building is 84 m x 52 m.

Keywords: Average Linkage Clustering (ALC), Rank Order Clustering (ROC), Modified Spanning Tree (MST), Activity Relationship Chart (ARC)

Procedia PDF Downloads 466
486 Towards a Distributed Computation Platform Tailored for Educational Process Discovery and Analysis

Authors: Awatef Hicheur Cairns, Billel Gueni, Hind Hafdi, Christian Joubert, Nasser Khelifa

Abstract:

Given the ever changing needs of the job markets, education and training centers are increasingly held accountable for student success. Therefore, education and training centers have to focus on ways to streamline their offers and educational processes in order to achieve the highest level of quality in curriculum contents and managerial decisions. Educational process mining is an emerging field in the educational data mining (EDM) discipline, concerned with developing methods to discover, analyze and provide a visual representation of complete educational processes. In this paper, we present our distributed computation platform which allows different education centers and institutions to load their data and access to advanced data mining and process mining services. To achieve this, we present also a comparative study of the different clustering techniques developed in the context of process mining to partition efficiently educational traces. Our goal is to find the best strategy for distributing heavy analysis computations on many processing nodes of our platform.

Keywords: educational process mining, distributed process mining, clustering, distributed platform, educational data mining, ProM

Procedia PDF Downloads 423
485 Unseen Classes: The Paradigm Shift in Machine Learning

Authors: Vani Singhal, Jitendra Parmar, Satyendra Singh Chouhan

Abstract:

Unseen class discovery has now become an important part of a machine-learning algorithm to judge new classes. Unseen classes are the classes on which the machine learning model is not trained on. With the advancement in technology and AI replacing humans, the amount of data has increased to the next level. So while implementing a model on real-world examples, we come across unseen new classes. Our aim is to find the number of unseen classes by using a hierarchical-based active learning algorithm. The algorithm is based on hierarchical clustering as well as active sampling. The number of clusters that we will get in the end will give the number of unseen classes. The total clusters will also contain some clusters that have unseen classes. Instead of first discovering unseen classes and then finding their number, we directly calculated the number by applying the algorithm. The dataset used is for intent classification. The target data is the intent of the corresponding query. We conclude that when the machine learning model will encounter real-world data, it will automatically find the number of unseen classes. In the future, our next work would be to label these unseen classes correctly.

Keywords: active sampling, hierarchical clustering, open world learning, unseen class discovery

Procedia PDF Downloads 136
484 Study for an Optimal Cable Connection within an Inner Grid of an Offshore Wind Farm

Authors: Je-Seok Shin, Wook-Won Kim, Jin-O Kim

Abstract:

The offshore wind farm needs to be designed carefully considering economics and reliability aspects. There are many decision-making problems for designing entire offshore wind farm, this paper focuses on an inner grid layout which means the connection between wind turbines as well as between wind turbines and an offshore substation. A methodology proposed in this paper determines the connections and the cable type for each connection section using K-clustering, minimum spanning tree and cable selection algorithms. And then, a cost evaluation is performed in terms of investment, power loss and reliability. Through the cost evaluation, an optimal layout of inner grid is determined so as to have the lowest total cost. In order to demonstrate the validity of the methodology, the case study is conducted on 240MW offshore wind farm, and the results show that it is helpful to design optimally offshore wind farm.

Keywords: offshore wind farm, optimal layout, k-clustering algorithm, minimum spanning algorithm, cable type selection, power loss cost, reliability cost

Procedia PDF Downloads 358
483 Optimal Maintenance Clustering for Rail Track Components Subject to Possession Capacity Constraints

Authors: Cuong D. Dao, Rob J.I. Basten, Andreas Hartmann

Abstract:

This paper studies the optimal maintenance planning of preventive maintenance and renewal activities for components in a single railway track when the available time for maintenance is limited. The rail-track system consists of several types of components, such as rail, ballast, and switches with different preventive maintenance and renewal intervals. To perform maintenance or renewal on the track, a train free period for maintenance, called a possession, is required. Since a major possession directly affects the regular train schedule, maintenance and renewal activities are clustered as much as possible. In a highly dense and utilized railway network, the possession time on the track is critical since the demand for train operations is very high and a long possession has a severe impact on the regular train schedule. We present an optimization model and investigate the maintenance schedules with and without the possession capacity constraint. In addition, we also integrate the social-economic cost related to the effects of the maintenance time to the variable possession cost into the optimization model. A numerical example is provided to illustrate the model.

Keywords: rail-track components, maintenance, optimal clustering, possession capacity

Procedia PDF Downloads 232
482 Spatial Scale of Clustering of Residential Burglary and Its Dependence on Temporal Scale

Authors: Mohammed A. Alazawi, Shiguo Jiang, Steven F. Messner

Abstract:

Research has long focused on two main spatial aspects of crime: spatial patterns and spatial processes. When analyzing these patterns and processes, a key issue has been to determine the proper spatial scale. In addition, it is important to consider the possibility that these patterns and processes might differ appreciably for different temporal scales and might vary across geographic units of analysis. We examine the spatial-temporal dependence of residential burglary. This dependence is tested at varying geographical scales and temporal aggregations. The analyses are based on recorded incidents of crime in Columbus, Ohio during the 1994-2002 period. We implement point pattern analysis on the crime points using Ripley’s K function. The results indicate that spatial point patterns of residential burglary reveal spatial scales of clustering relatively larger than the average size of census tracts of the study area. Also, spatial scale is independent of temporal scale. The results of our analyses concerning the geographic scale of spatial patterns and processes can inform the development of effective policies for crime control.

Keywords: inhomogeneous K function, residential burglary, spatial point pattern, spatial scale, temporal scale

Procedia PDF Downloads 310
481 Evaluation of the Effect of Learning Disabilities and Accommodations on the Prediction of the Exam Performance: Ordinal Decision-Tree Algorithm

Authors: G. Singer, M. Golan

Abstract:

Providing students with learning disabilities (LD) with extra time to grant them equal access to the exam is a necessary but insufficient condition to compensate for their LD; there should also be a clear indication that the additional time was actually used. For example, if students with LD use more time than students without LD and yet receive lower grades, this may indicate that a different accommodation is required. If they achieve higher grades but use the same amount of time, then the effectiveness of the accommodation has not been demonstrated. The main goal of this study is to evaluate the effect of including parameters related to LD and extended exam time, along with other commonly-used characteristics (e.g., student background and ability measures such as high-school grades), on the ability of ordinal decision-tree algorithms to predict exam performance. We use naturally-occurring data collected from hundreds of undergraduate engineering students. The sub-goals are i) to examine the improvement in prediction accuracy when the indicator of exam performance includes 'actual time used' in addition to the conventional indicator (exam grade) employed in most research; ii) to explore the effectiveness of extended exam time on exam performance for different courses and for LD students with different profiles (i.e., sets of characteristics). This is achieved by using the patterns (i.e., subgroups) generated by the algorithms to identify pairs of subgroups that differ in just one characteristic (e.g., course or type of LD) but have different outcomes in terms of exam performance (grade and time used). Since grade and time used to exhibit an ordering form, we propose a method based on ordinal decision-trees, which applies a weighted information-gain ratio (WIGR) measure for selecting the classifying attributes. Unlike other known ordinal algorithms, our method does not assume monotonicity in the data. The proposed WIGR is an extension of an information-theoretic measure, in the sense that it adjusts to the case of an ordinal target and takes into account the error severity between two different target classes. Specifically, we use ordinal C4.5, random-forest, and AdaBoost algorithms, as well as an ensemble technique composed of ordinal and non-ordinal classifiers. Firstly, we find that the inclusion of LD and extended exam-time parameters improves prediction of exam performance (compared to specifications of the algorithms that do not include these variables). Secondly, when the indicator of exam performance includes 'actual time used' together with grade (as opposed to grade only), the prediction accuracy improves. Thirdly, our subgroup analyses show clear differences in the effect of extended exam time on exam performance among different courses and different student profiles. From a methodological perspective, we find that the ordinal decision-tree based algorithms outperform their conventional, non-ordinal counterparts. Further, we demonstrate that the ensemble-based approach leverages the strengths of each type of classifier (ordinal and non-ordinal) and yields better performance than each classifier individually.

Keywords: actual exam time usage, ensemble learning, learning disabilities, ordinal classification, time extension

Procedia PDF Downloads 68
480 Clustering Color Space, Time Interest Points for Moving Objects

Authors: Insaf Bellamine, Hamid Tairi

Abstract:

Detecting moving objects in sequences is an essential step for video analysis. This paper mainly contributes to the Color Space-Time Interest Points (CSTIP) extraction and detection. We propose a new method for detection of moving objects. Two main steps compose the proposed method. First, we suggest to apply the algorithm of the detection of Color Space-Time Interest Points (CSTIP) on both components of the Color Structure-Texture Image Decomposition which is based on a Partial Differential Equation (PDE): a color geometric structure component and a color texture component. A descriptor is associated to each of these points. In a second stage, we address the problem of grouping the points (CSTIP) into clusters. Experiments and comparison to other motion detection methods on challenging sequences show the performance of the proposed method and its utility for video analysis. Experimental results are obtained from very different types of videos, namely sport videos and animation movies.

Keywords: Color Space-Time Interest Points (CSTIP), Color Structure-Texture Image Decomposition, Motion Detection, clustering

Procedia PDF Downloads 350
479 Optical Flow Based System for Cross Traffic Alert

Authors: Giuseppe Spampinato, Salvatore Curti, Ivana Guarneri, Arcangelo Bruna

Abstract:

This document describes an advanced system and methodology for Cross Traffic Alert (CTA), able to detect vehicles that move into the vehicle driving path from the left or right side. The camera is supposed to be not only on a vehicle still, e.g. at a traffic light or at an intersection, but also moving slowly, e.g. in a car park. In all of the aforementioned conditions, a driver’s short loss of concentration or distraction can easily lead to a serious accident. A valid support to avoid these kinds of car crashes is represented by the proposed system. It is an extension of our previous work, related to a clustering system, which only works on fixed cameras. Just a vanish point calculation and simple optical flow filtering, to eliminate motion vectors due to the car relative movement, is performed to let the system achieve high performances with different scenarios, cameras and resolutions. The proposed system just uses as input the optical flow, which is hardware implemented in the proposed platform and since the elaboration of the whole system is really speed and power consumption, it is inserted directly in the camera framework, allowing to execute all the processing in real-time.

Keywords: clustering, cross traffic alert, optical flow, real time, vanishing point

Procedia PDF Downloads 171
478 Building User Behavioral Models by Processing Web Logs and Clustering Mechanisms

Authors: Madhuka G. P. D. Udantha, Gihan V. Dias, Surangika Ranathunga

Abstract:

Today Websites contain very interesting applications. But there are only few methodologies to analyze User navigations through the Websites and formulating if the Website is put to correct use. The web logs are only used if some major attack or malfunctioning occurs. Web Logs contain lot interesting dealings on users in the system. Analyzing web logs has become a challenge due to the huge log volume. Finding interesting patterns is not as easy as it is due to size, distribution and importance of minor details of each log. Web logs contain very important data of user and site which are not been put to good use. Retrieving interesting information from logs gives an idea of what the users need, group users according to their various needs and improve site to build an effective and efficient site. The model we built is able to detect attacks or malfunctioning of the system and anomaly detection. Logs will be more complex as volume of traffic and the size and complexity of web site grows. Unsupervised techniques are used in this solution which is fully automated. Expert knowledge is only used in validation. In our approach first clean and purify the logs to bring them to a common platform with a standard format and structure. After cleaning module web session builder is executed. It outputs two files, Web Sessions file and Indexed URLs file. The Indexed URLs file contains the list of URLs accessed and their indices. Web Sessions file lists down the indices of each web session. Then DBSCAN and EM Algorithms are used iteratively and recursively to get the best clustering results of the web sessions. Using homogeneity, completeness, V-measure, intra and inter cluster distance and silhouette coefficient as parameters these algorithms self-evaluate themselves to input better parametric values to run the algorithms. If a cluster is found to be too large then micro-clustering is used. Using Cluster Signature Module the clusters are annotated with a unique signature called finger-print. In this module each cluster is fed to Associative Rule Learning Module. If it outputs confidence and support as value 1 for an access sequence it would be a potential signature for the cluster. Then the access sequence occurrences are checked in other clusters. If it is found to be unique for the cluster considered then the cluster is annotated with the signature. These signatures are used in anomaly detection, prevent cyber attacks, real-time dashboards that visualize users, accessing web pages, predict actions of users and various other applications in Finance, University Websites, News and Media Websites etc.

Keywords: anomaly detection, clustering, pattern recognition, web sessions

Procedia PDF Downloads 257
477 Geographic Legacies for Modern Day Disease Research: Autism Spectrum Disorder as a Case-Control Study

Authors: Rebecca Richards Steed, James Van Derslice, Ken Smith, Richard Medina, Amanda Bakian

Abstract:

Elucidating gene-environment interactions for heritable disease outcomes is an emerging area of disease research, with genetic studies informing hypotheses for environment and gene interactions underlying some of the most confounding diseases of our time, like autism spectrum disorder (ASD). Geography has thus far played a key role in identifying environmental factors contributing to disease, but its use can be broadened to include genetic and environmental factors that have a synergistic effect on disease. Through the use of family pedigrees and disease outcomes with life-course residential histories, space-time clustering of generations at critical developmental windows can provide further understanding of (1) environmental factors that contribute to disease patterns in families, (2) susceptible critical windows of development most impacted by environment, (3) and that are most likely to lead to an ASD diagnosis. This paper introduces a retrospective case-control study that utilizes pedigree data, health data, and residential life-course location points to find space-time clustering of ancestors with a grandchild/child with a clinical diagnosis of ASD. Finding space-time clusters of ancestors at critical developmental windows serves as a proxy for shared environmental exposures. The authors refer to geographic life-course exposures as geographic legacies. Identifying space-time clusters of ancestors creates a bridge for researching exposures of past generations that may impact modern-day progeny health. Results from the space-time cluster analysis show multiple clusters for the maternal and paternal pedigrees. The paternal grandparent pedigree resulted in the most space-time clustering for birth and childhood developmental windows. No statistically significant clustering was found for adolescent years. These results will be further studied to identify the specific share of space-time environmental exposures. In conclusion, this study has found significant space-time clusters of parents, and grandparents for both maternal and paternal lineage. These results will be used to identify what environmental exposures have been shared with family members at critical developmental windows of time, and additional analysis will be applied.

Keywords: family pedigree, environmental exposure, geographic legacy, medical geography, transgenerational inheritance

Procedia PDF Downloads 88
476 Climate Change Effects in a Mediterranean Island and Streamflow Changes for a Small Basin Using Euro-Cordex Regional Climate Simulations Combined with the SWAT Model

Authors: Pier Andrea Marras, Daniela Lima, Pedro Matos Soares, Rita Maria Cardoso, Daniela Medas, Elisabetta Dore, Giovanni De Giudici

Abstract:

Climate change effects on the hydrologic cycle are the main concern for the evaluation of water management strategies. Climate models project scenarios of precipitation changes in the future, considering greenhouse emissions. In this study, the EURO-CORDEX (European Coordinated Regional Downscaling Experiment) climate models were first evaluated in a Mediterranean island (Sardinia) against observed precipitation for a historical reference period (1976-2005). A weighted multi-model ensemble (ENS) was built, weighting the single models based on their ability to reproduce observed rainfall. Future projections (2071-2100) were carried out using the 8.5 RCP emissions scenario to evaluate changes in precipitations. ENS was then used as climate forcing for the SWAT model (Soil and Water Assessment Tool), with the aim to assess the consequences of such projected changes on streamflow and runoff of two small catchments located in the South-West Sardinia. Results showed that a decrease of mean rainfall values, up to -25 % at yearly scale, is expected for the future, along with an increase of extreme precipitation events. Particularly in the eastern and southern areas, extreme events are projected to increase by 30%. Such changes reflect on the hydrologic cycle with a decrease of mean streamflow and runoff, except in spring, when runoff is projected to increase by 20-30%. These results stress that the Mediterranean is a hotspot for climate change, and the use of model tools can provide very useful information to adopt water and land management strategies to deal with such changes.

Keywords: EURO-CORDEX, climate change, hydrology, SWAT model, Sardinia, multi-model ensemble

Procedia PDF Downloads 185
475 Molecular Dynamic Simulation of CO2 Absorption into Mixed Aqueous Solutions MDEA/PZ

Authors: N. Harun, E. E. Masiren, W. H. W. Ibrahim, F. Adam

Abstract:

Amine absorption process is an approach for mitigation of CO2 from flue gas that produces from power plant. This process is the most common system used in chemical and oil industries for gas purification to remove acid gases. On the challenges of this process is high energy requirement for solvent regeneration to release CO2. In the past few years, mixed alkanolamines have received increasing attention. In most cases, the mixtures contain N-methyldiethanolamine (MDEA) as the base amine with the addition of one or two more reactive amines such as PZ. The reason for the application of such blend amine is to take advantage of high reaction rate of CO2 with the activator combined with the advantages of the low heat of regeneration of MDEA. Several experimental and simulation studies have been undertaken to understand this process using blend MDEA/PZ solvent. Despite those studies, the mechanism of CO2 absorption into the aqueous MDEA is not well understood and available knowledge within the open literature is limited. The aim of this study is to investigate the intermolecular interaction of the blend MDEA/PZ using Molecular Dynamics (MD) simulation. MD simulation was run under condition 313K and 1 atm using NVE ensemble at 200ps and NVT ensemble at 1ns. The results were interpreted in term of Radial Distribution Function (RDF) analysis through two system of interest i.e binary and tertiary. The binary system will explain the interaction between amine and water molecule while tertiary system used to determine the interaction between the amine and CO2 molecule. For the binary system, it was observed that the –OH group of MDEA is more attracted to water molecule compared to –NH group of MDEA. The –OH group of MDEA can form the hydrogen bond with water that will assist the solubility of MDEA in water. The intermolecular interaction probability of –OH and –NH group of MDEA with CO2 in blended MDEA/PZ is higher than using single MDEA. This findings show that PZ molecule act as an activator to promote the intermolecular interaction between MDEA and CO2.Thus, blend of MDEA with PZ is expecting to increase the absorption rate of CO2 and reduce the heat regeneration requirement.

Keywords: amine absorption process, blend MDEA/PZ, CO2 capture, molecular dynamic simulation, radial distribution function

Procedia PDF Downloads 263
474 Multimodal Biometric Cryptography Based Authentication in Cloud Environment to Enhance Information Security

Authors: D. Pugazhenthi, B. Sree Vidya

Abstract:

Cloud computing is one of the emerging technologies that enables end users to use the services of cloud on ‘pay per usage’ strategy. This technology grows in a fast pace and so is its security threat. One among the various services provided by cloud is storage. In this service, security plays a vital factor for both authenticating legitimate users and protection of information. This paper brings in efficient ways of authenticating users as well as securing information on the cloud. Initial phase proposed in this paper deals with an authentication technique using multi-factor and multi-dimensional authentication system with multi-level security. Unique identification and slow intrusive formulates an advanced reliability on user-behaviour based biometrics than conventional means of password authentication. By biometric systems, the accounts are accessed only by a legitimate user and not by a nonentity. The biometric templates employed here do not include single trait but multiple, viz., iris and finger prints. The coordinating stage of the authentication system functions on Ensemble Support Vector Machine (SVM) and optimization by assembling weights of base SVMs for SVM ensemble after individual SVM of ensemble is trained by the Artificial Fish Swarm Algorithm (AFSA). Thus it helps in generating a user-specific secure cryptographic key of the multimodal biometric template by fusion process. Data security problem is averted and enhanced security architecture is proposed using encryption and decryption system with double key cryptography based on Fuzzy Neural Network (FNN) for data storing and retrieval in cloud computing . The proposing scheme aims to protect the records from hackers by arresting the breaking of cipher text to original text. This improves the authentication performance that the proposed double cryptographic key scheme is capable of providing better user authentication and better security which distinguish between the genuine and fake users. Thus, there are three important modules in this proposed work such as 1) Feature extraction, 2) Multimodal biometric template generation and 3) Cryptographic key generation. The extraction of the feature and texture properties from the respective fingerprint and iris images has been done initially. Finally, with the help of fuzzy neural network and symmetric cryptography algorithm, the technique of double key encryption technique has been developed. As the proposed approach is based on neural networks, it has the advantage of not being decrypted by the hacker even though the data were hacked already. The results prove that authentication process is optimal and stored information is secured.

Keywords: artificial fish swarm algorithm (AFSA), biometric authentication, decryption, encryption, fingerprint, fusion, fuzzy neural network (FNN), iris, multi-modal, support vector machine classification

Procedia PDF Downloads 227
473 Gradient Boosted Trees on Spark Platform for Supervised Learning in Health Care Big Data

Authors: Gayathri Nagarajan, L. D. Dhinesh Babu

Abstract:

Health care is one of the prominent industries that generate voluminous data thereby finding the need of machine learning techniques with big data solutions for efficient processing and prediction. Missing data, incomplete data, real time streaming data, sensitive data, privacy, heterogeneity are few of the common challenges to be addressed for efficient processing and mining of health care data. In comparison with other applications, accuracy and fast processing are of higher importance for health care applications as they are related to the human life directly. Though there are many machine learning techniques and big data solutions used for efficient processing and prediction in health care data, different techniques and different frameworks are proved to be effective for different applications largely depending on the characteristics of the datasets. In this paper, we present a framework that uses ensemble machine learning technique gradient boosted trees for data classification in health care big data. The framework is built on Spark platform which is fast in comparison with other traditional frameworks. Unlike other works that focus on a single technique, our work presents a comparison of six different machine learning techniques along with gradient boosted trees on datasets of different characteristics. Five benchmark health care datasets are considered for experimentation, and the results of different machine learning techniques are discussed in comparison with gradient boosted trees. The metric chosen for comparison is misclassification error rate and the run time of the algorithms. The goal of this paper is to i) Compare the performance of gradient boosted trees with other machine learning techniques in Spark platform specifically for health care big data and ii) Discuss the results from the experiments conducted on datasets of different characteristics thereby drawing inference and conclusion. The experimental results show that the accuracy is largely dependent on the characteristics of the datasets for other machine learning techniques whereas gradient boosting trees yields reasonably stable results in terms of accuracy without largely depending on the dataset characteristics.

Keywords: big data analytics, ensemble machine learning, gradient boosted trees, Spark platform

Procedia PDF Downloads 216
472 Experimental Modeling of Spray and Water Sheet Formation Due to Wave Interactions with Vertical and Slant Bow-Shaped Model

Authors: Armin Bodaghkhani, Bruce Colbourne, Yuri S. Muzychka

Abstract:

The process of spray-cloud formation and flow kinematics produced from breaking wave impact on vertical and slant lab-scale bow-shaped models were experimentally investigated. Bubble Image Velocimetry (BIV) and Image Processing (IP) techniques were applied to study the various types of wave-model impacts. Different wave characteristics were generated in a tow tank to investigate the effects of wave characteristics, such as wave phase velocity, wave steepness on droplet velocities, and behavior of the process of spray cloud formation. The phase ensemble-averaged vertical velocity and turbulent intensity were computed. A high-speed camera and diffused LED backlights were utilized to capture images for further post processing. Various pressure sensors and capacitive wave probes were used to measure the wave impact pressure and the free surface profile at different locations of the model and wave-tank, respectively. Droplet sizes and velocities were measured using BIV and IP techniques to trace bubbles and droplets in order to measure their velocities and sizes by correlating the texture in these images. The impact pressure and droplet size distributions were compared to several previously experimental models, and satisfactory agreements were achieved. The distribution of droplets in front of both models are demonstrated. Due to the highly transient process of spray formation, the drag coefficient for several stages of this transient displacement for various droplet size ranges and different Reynolds number were calculated based on the ensemble average method. From the experimental results, the slant model produces less spray in comparison with the vertical model, and the droplet velocities generated from the wave impact with the slant model have a lower velocity as compared with the vertical model.

Keywords: spray charachteristics, droplet size and velocity, wave-body interactions, bubble image velocimetry, image processing

Procedia PDF Downloads 277
471 Decision Support System in Air Pollution Using Data Mining

Authors: E. Fathallahi Aghdam, V. Hosseini

Abstract:

Environmental pollution is not limited to a specific region or country; that is why sustainable development, as a necessary process for improvement, pays attention to issues such as destruction of natural resources, degradation of biological system, global pollution, and climate change in the world, especially in the developing countries. According to the World Health Organization, as a developing city, Tehran (capital of Iran) is one of the most polluted cities in the world in terms of air pollution. In this study, three pollutants including particulate matter less than 10 microns, nitrogen oxides, and sulfur dioxide were evaluated in Tehran using data mining techniques and through Crisp approach. The data from 21 air pollution measuring stations in different areas of Tehran were collected from 1999 to 2013. Commercial softwares Clementine was selected for this study. Tehran was divided into distinct clusters in terms of the mentioned pollutants using the software. As a data mining technique, clustering is usually used as a prologue for other analyses, therefore, the similarity of clusters was evaluated in this study through analyzing local conditions, traffic behavior, and industrial activities. In fact, the results of this research can support decision-making system, help managers improve the performance and decision making, and assist in urban studies.

Keywords: data mining, clustering, air pollution, crisp approach

Procedia PDF Downloads 401
470 Identification of Disease Causing DNA Motifs in Human DNA Using Clustering Approach

Authors: G. Tamilpavai, C. Vishnuppriya

Abstract:

Studying DNA (deoxyribonucleic acid) sequence is useful in biological processes and it is applied in the fields such as diagnostic and forensic research. DNA is the hereditary information in human and almost all other organisms. It is passed to their generations. Earlier stage detection of defective DNA sequence may lead to many developments in the field of Bioinformatics. Nowadays various tedious techniques are used to identify defective DNA. The proposed work is to analyze and identify the cancer-causing DNA motif in a given sequence. Initially the human DNA sequence is separated as k-mers using k-mer separation rule. The separated k-mers are clustered using Self Organizing Map (SOM). Using Levenshtein distance measure, cancer associated DNA motif is identified from the k-mer clusters. Experimental results of this work indicate the presence or absence of cancer causing DNA motif. If the cancer associated DNA motif is found in DNA, it is declared as the cancer disease causing DNA sequence. Otherwise the input human DNA is declared as normal sequence. Finally, elapsed time is calculated for finding the presence of cancer causing DNA motif using clustering formation. It is compared with normal process of finding cancer causing DNA motif. Locating cancer associated motif is easier in cluster formation process than the other one. The proposed work will be an initiative aid for finding genetic disease related research.

Keywords: bioinformatics, cancer motif, DNA, k-mers, Levenshtein distance, SOM

Procedia PDF Downloads 158
469 Solar Power Forecasting for the Bidding Zones of the Italian Electricity Market with an Analog Ensemble Approach

Authors: Elena Collino, Dario A. Ronzio, Goffredo Decimi, Maurizio Riva

Abstract:

The rapid increase of renewable energy in Italy is led by wind and solar installations. The 2017 Italian energy strategy foresees a further development of these sustainable technologies, especially solar. This fact has resulted in new opportunities, challenges, and different problems to deal with. The growth of renewables allows to meet the European requirements regarding energy and environmental policy, but these types of sources are difficult to manage because they are intermittent and non-programmable. Operationally, these characteristics can lead to instability on the voltage profile and increasing uncertainty on energy reserve scheduling. The increasing renewable production must be considered with more and more attention especially by the Transmission System Operator (TSO). The TSO, in fact, every day provides orders on energy dispatch, once the market outcome has been determined, on extended areas, defined mainly on the basis of power transmission limitations. In Italy, six market zone are defined: Northern-Italy, Central-Northern Italy, Central-Southern Italy, Southern Italy, Sardinia, and Sicily. An accurate hourly renewable power forecasting for the day-ahead on these extended areas brings an improvement both in terms of dispatching and reserve management. In this study, an operational forecasting tool of the hourly solar output for the six Italian market zones is presented, and the performance is analysed. The implementation is carried out by means of a numerical weather prediction model, coupled with a statistical post-processing in order to derive the power forecast on the basis of the meteorological projection. The weather forecast is obtained from the limited area model RAMS on the Italian territory, initialized with IFS-ECMWF boundary conditions. The post-processing calculates the solar power production with the Analog Ensemble technique (AN). This statistical approach forecasts the production using a probability distribution of the measured production registered in the past when the weather scenario looked very similar to the forecasted one. The similarity is evaluated for the components of the solar radiation: global (GHI), diffuse (DIF) and direct normal (DNI) irradiation, together with the corresponding azimuth and zenith solar angles. These are, in fact, the main factors that affect the solar production. Considering that the AN performance is strictly related to the length and quality of the historical data a training period of more than one year has been used. The training set is made by historical Numerical Weather Prediction (NWP) forecasts at 12 UTC for the GHI, DIF and DNI variables over the Italian territory together with corresponding hourly measured production for each of the six zones. The AN technique makes it possible to estimate the aggregate solar production in the area, without information about the technologic characteristics of the all solar parks present in each area. Besides, this information is often only partially available. Every day, the hourly solar power forecast for the six Italian market zones is made publicly available through a website.

Keywords: analog ensemble, electricity market, PV forecast, solar energy

Procedia PDF Downloads 124
468 Genetic Variation among the Wild and Hatchery Raised Populations of Labeo rohita Revealed by RAPD Markers

Authors: Fayyaz Rasool, Shakeela Parveen

Abstract:

The studies on genetic diversity of Labeo rohita by using molecular markers were carried out to investigate the genetic structure by RAPAD marker and the levels of polymorphism and similarity amongst the different groups of five populations of wild and farmed types. The samples were collected from different five locations as representatives of wild and hatchery raised populations. RAPAD data for Jaccard’s coefficient by following the un-weighted Pair Group Method with Arithmetic Mean (UPGMA) for Hierarchical Clustering of the similar groups on the basis of similarity amongst the genotypes and the dendrogram generated divided the randomly selected individuals of the five populations into three classes/clusters. The variance decomposition for the optimal classification values remained as 52.11% for within class variation, while 47.89% for the between class differences. The Principal Component Analysis (PCA) for grouping of the different genotypes from the different environmental conditions was done by Spearman Varimax rotation method for bi-plot generation of the co-occurrence of the same genotypes with similar genetic properties and specificity of different primers indicated clearly that the increase in the number of factors or components was correlated with the decrease in eigenvalues. The Kaiser Criterion based upon the eigenvalues greater than one, first two main factors accounted for 58.177% of cumulative variability.

Keywords: variation, clustering, PCA, wild, hatchery, RAPAD, Labeo rohita

Procedia PDF Downloads 415
467 Molecular Survey and Genetic Diversity of Bartonella henselae Strains Infecting Stray Cats from Algeria

Authors: Naouelle Azzag, Nadia Haddad, Benoit Durand, Elisabeth Petit, Ali Ammouche, Bruno Chomel, Henri J. Boulouis

Abstract:

Bartonella henselae is a small, gram negative, arthropod-borne bacterium that has been shown to cause multiple clinical manifestations in humans including cat scratch disease, bacillary angiomatosis, endocarditis, and bacteremia. In this research, we report the results of a cross sectional study of Bartonella henselae bacteremia in stray cats from Algiers. Whole blood of 227 stray cats from Algiers was tested for the presence of Bartonella species by culture and for the evaluation of the genetic diversity of B. henselae strains by multi-locus variable number of tandem repeats assay (MLVA). Bacteremia prevalence was 17% and only B. henselae was identified. Type I was the predominant type (64%). MLVA typing of 259 strains from 30 bacteremic cats revealed 52 different profiles. 51 of these profiles were specific to Algerian cats/identified for the first time. 20/30 cats (67%) harbored 2 to 7 MLVA profiles simultaneously. The similarity of MLVA profiles obtained from the same cat, neighbor-joining clustering and structure-neighbor clustering showed that such a diversity likely results from two different mechanisms occurring either independently or simultaneously independent infections and genetic drift from a primary strain.

Keywords: Bartonella, cat, MLVA, genetic

Procedia PDF Downloads 120
466 Unsupervised Detection of Burned Area from Remote Sensing Images Using Spatial Correlation and Fuzzy Clustering

Authors: Tauqir A. Moughal, Fusheng Yu, Abeer Mazher

Abstract:

Land-cover and land-use change information are important because of their practical uses in various applications, including deforestation, damage assessment, disasters monitoring, urban expansion, planning, and land management. Therefore, developing change detection methods for remote sensing images is an important ongoing research agenda. However, detection of change through optical remote sensing images is not a trivial task due to many factors including the vagueness between the boundaries of changed and unchanged regions and spatial dependence of the pixels to its neighborhood. In this paper, we propose a binary change detection technique for bi-temporal optical remote sensing images. As in most of the optical remote sensing images, the transition between the two clusters (change and no change) is overlapping and the existing methods are incapable of providing the accurate cluster boundaries. In this regard, a methodology has been proposed which uses the fuzzy c-means clustering to tackle the problem of vagueness in the changed and unchanged class by formulating the soft boundaries between them. Furthermore, in order to exploit the neighborhood information of the pixels, the input patterns are generated corresponding to each pixel from bi-temporal images using 3×3, 5×5 and 7×7 window. The between images and within image spatial dependence of the pixels to its neighborhood is quantified by using Pearson product moment correlation and Moran’s I statistics, respectively. The proposed technique consists of two phases. At first, between images and within image spatial correlation is calculated to utilize the information that the pixels at different locations may not be independent. Second, fuzzy c-means technique is used to produce two clusters from input feature by not only taking care of vagueness between the changed and unchanged class but also by exploiting the spatial correlation of the pixels. To show the effectiveness of the proposed technique, experiments are conducted on multispectral and bi-temporal remote sensing images. A subset (2100×1212 pixels) of a pan-sharpened, bi-temporal Landsat 5 thematic mapper optical image of Los Angeles, California, is used in this study which shows a long period of the forest fire continued from July until October 2009. Early forest fire and later forest fire optical remote sensing images were acquired on July 5, 2009 and October 25, 2009, respectively. The proposed technique is used to detect the fire (which causes change on earth’s surface) and compared with the existing K-means clustering technique. Experimental results showed that proposed technique performs better than the already existing technique. The proposed technique can be easily extendable for optical hyperspectral images and is suitable for many practical applications.

Keywords: burned area, change detection, correlation, fuzzy clustering, optical remote sensing

Procedia PDF Downloads 139
465 Image Segmentation Techniques: Review

Authors: Lindani Mbatha, Suvendi Rimer, Mpho Gololo

Abstract:

Image segmentation is the process of dividing an image into several sections, such as the object's background and the foreground. It is a critical technique in both image-processing tasks and computer vision. Most of the image segmentation algorithms have been developed for gray-scale images and little research and algorithms have been developed for the color images. Most image segmentation algorithms or techniques vary based on the input data and the application. Nearly all of the techniques are not suitable for noisy environments. Most of the work that has been done uses the Markov Random Field (MRF), which involves the computations and is said to be robust to noise. In the past recent years' image segmentation has been brought to tackle problems such as easy processing of an image, interpretation of the contents of an image, and easy analysing of an image. This article reviews and summarizes some of the image segmentation techniques and algorithms that have been developed in the past years. The techniques include neural networks (CNN), edge-based techniques, region growing, clustering, and thresholding techniques and so on. The advantages and disadvantages of medical ultrasound image segmentation techniques are also discussed. The article also addresses the applications and potential future developments that can be done around image segmentation. This review article concludes with the fact that no technique is perfectly suitable for the segmentation of all different types of images, but the use of hybrid techniques yields more accurate and efficient results.

Keywords: clustering-based, convolution-network, edge-based, region-growing

Procedia PDF Downloads 55
464 FracXpert: Ensemble Machine Learning Approach for Localization and Classification of Bone Fractures in Cricket Athletes

Authors: Madushani Rodrigo, Banuka Athuraliya

Abstract:

In today's world of medical diagnosis and prediction, machine learning stands out as a strong tool, transforming old ways of caring for health. This study analyzes the use of machine learning in the specialized domain of sports medicine, with a focus on the timely and accurate detection of bone fractures in cricket athletes. Failure to identify bone fractures in real time can result in malunion or non-union conditions. To ensure proper treatment and enhance the bone healing process, accurately identifying fracture locations and types is necessary. When interpreting X-ray images, it relies on the expertise and experience of medical professionals in the identification process. Sometimes, radiographic images are of low quality, leading to potential issues. Therefore, it is necessary to have a proper approach to accurately localize and classify fractures in real time. The research has revealed that the optimal approach needs to address the stated problem and employ appropriate radiographic image processing techniques and object detection algorithms. These algorithms should effectively localize and accurately classify all types of fractures with high precision and in a timely manner. In order to overcome the challenges of misidentifying fractures, a distinct model for fracture localization and classification has been implemented. The research also incorporates radiographic image enhancement and preprocessing techniques to overcome the limitations posed by low-quality images. A classification ensemble model has been implemented using ResNet18 and VGG16. In parallel, a fracture segmentation model has been implemented using the enhanced U-Net architecture. Combining the results of these two implemented models, the FracXpert system can accurately localize exact fracture locations along with fracture types from the available 12 different types of fracture patterns, which include avulsion, comminuted, compressed, dislocation, greenstick, hairline, impacted, intraarticular, longitudinal, oblique, pathological, and spiral. This system will generate a confidence score level indicating the degree of confidence in the predicted result. Using ResNet18 and VGG16 architectures, the implemented fracture segmentation model, based on the U-Net architecture, achieved a high accuracy level of 99.94%, demonstrating its precision in identifying fracture locations. Simultaneously, the classification ensemble model achieved an accuracy of 81.0%, showcasing its ability to categorize various fracture patterns, which is instrumental in the fracture treatment process. In conclusion, FracXpert has become a promising ML application in sports medicine, demonstrating its potential to revolutionize fracture detection processes. By leveraging the power of ML algorithms, this study contributes to the advancement of diagnostic capabilities in cricket athlete healthcare, ensuring timely and accurate identification of bone fractures for the best treatment outcomes.

Keywords: multiclass classification, object detection, ResNet18, U-Net, VGG16

Procedia PDF Downloads 32
463 Tree-Based Inference for Regionalization: A Comparative Study of Global Topological Perturbation Methods

Authors: Orhun Aydin, Mark V. Janikas, Rodrigo Alves, Renato Assuncao

Abstract:

In this paper, a tree-based perturbation methodology for regionalization inference is presented. Regionalization is a constrained optimization problem that aims to create groups with similar attributes while satisfying spatial contiguity constraints. Similar to any constrained optimization problem, the spatial constraint may hinder convergence to some global minima, resulting in spatially contiguous members of a group with dissimilar attributes. This paper presents a general methodology for rigorously perturbing spatial constraints through the use of random spanning trees. The general framework presented can be used to quantify the effect of the spatial constraints in the overall regionalization result. We compare several types of stochastic spanning trees used in inference problems such as fuzzy regionalization and determining the number of regions. Performance of stochastic spanning trees is juxtaposed against the traditional permutation-based hypothesis testing frequently used in spatial statistics. Inference results for fuzzy regionalization and determining the number of regions is presented on the Local Area Personal Incomes for Texas Counties provided by the Bureau of Economic Analysis.

Keywords: regionalization, constrained clustering, probabilistic inference, fuzzy clustering

Procedia PDF Downloads 191
462 Resistance and Sub-Resistances of RC Beams Subjected to Multiple Failure Modes

Authors: F. Sangiorgio, J. Silfwerbrand, G. Mancini

Abstract:

Geometric and mechanical properties all influence the resistance of RC structures and may, in certain combination of property values, increase the risk of a brittle failure of the whole system. This paper presents a statistical and probabilistic investigation on the resistance of RC beams designed according to Eurocodes 2 and 8, and subjected to multiple failure modes, under both the natural variation of material properties and the uncertainty associated with cross-section and transverse reinforcement geometry. A full probabilistic model based on JCSS Probabilistic Model Code is derived. Different beams are studied through material nonlinear analysis via Monte Carlo simulations. The resistance model is consistent with Eurocode 2. Both a multivariate statistical evaluation and the data clustering analysis of outcomes are then performed. Results show that the ultimate load behaviour of RC beams subjected to flexural and shear failure modes seems to be mainly influenced by the combination of the mechanical properties of both longitudinal reinforcement and stirrups, and the tensile strength of concrete, of which the latter appears to affect the overall response of the system in a nonlinear way. The model uncertainty of the resistance model used in the analysis plays undoubtedly an important role in interpreting results.

Keywords: modelling, Monte Carlo simulations, probabilistic models, data clustering, reinforced concrete members, structural design

Procedia PDF Downloads 446
461 A Single-Channel BSS-Based Method for Structural Health Monitoring of Civil Infrastructure under Environmental Variations

Authors: Yanjie Zhu, André Jesus, Irwanda Laory

Abstract:

Structural Health Monitoring (SHM), involving data acquisition, data interpretation and decision-making system aim to continuously monitor the structural performance of civil infrastructures under various in-service circumstances. The main value and purpose of SHM is identifying damages through data interpretation system. Research on SHM has been expanded in the last decades and a large volume of data is recorded every day owing to the dramatic development in sensor techniques and certain progress in signal processing techniques. However, efficient and reliable data interpretation for damage detection under environmental variations is still a big challenge. Structural damages might be masked because variations in measured data can be the result of environmental variations. This research reports a novel method based on single-channel Blind Signal Separation (BSS), which extracts environmental effects from measured data directly without any prior knowledge of the structure loading and environmental conditions. Despite the successful application in audio processing and bio-medical research fields, BSS has never been used to detect damage under varying environmental conditions. This proposed method optimizes and combines Ensemble Empirical Mode Decomposition (EEMD), Principal Component Analysis (PCA) and Independent Component Analysis (ICA) together to separate structural responses due to different loading conditions respectively from a single channel input signal. The ICA is applying on dimension-reduced output of EEMD. Numerical simulation of a truss bridge, inspired from New Joban Line Arakawa Railway Bridge, is used to validate this method. All results demonstrate that the single-channel BSS-based method can recover temperature effects from mixed structural response recorded by a single sensor with a convincing accuracy. This will be the foundation of further research on direct damage detection under varying environment.

Keywords: damage detection, ensemble empirical mode decomposition (EEMD), environmental variations, independent component analysis (ICA), principal component analysis (PCA), structural health monitoring (SHM)

Procedia PDF Downloads 280
460 Embedded Hybrid Intuition: A Deep Learning and Fuzzy Logic Approach to Collective Creation and Computational Assisted Narratives

Authors: Roberto Cabezas H

Abstract:

The current work shows the methodology developed to create narrative lighting spaces for the multimedia performance piece 'cluster: the vanished paradise.' This empirical research is focused on exploring unconventional roles for machines in subjective creative processes, by delving into the semantics of data and machine intelligence algorithms in hybrid technological, creative contexts to expand epistemic domains trough human-machine cooperation. The creative process in scenic and performing arts is guided mostly by intuition; from that idea, we developed an approach to embed collective intuition in computational creative systems, by joining the properties of Generative Adversarial Networks (GAN’s) and Fuzzy Clustering based on a semi-supervised data creation and analysis pipeline. The model makes use of GAN’s to learn from phenomenological data (data generated from experience with lighting scenography) and algorithmic design data (augmented data by procedural design methods), fuzzy logic clustering is then applied to artificially created data from GAN’s to define narrative transitions built on membership index; this process allowed for the creation of simple and complex spaces with expressive capabilities based on position and light intensity as the parameters to guide the narrative. Hybridization comes not only from the human-machine symbiosis but also on the integration of different techniques for the implementation of the aided design system. Machine intelligence tools as proposed in this work are well suited to redefine collaborative creation by learning to express and expand a conglomerate of ideas and a wide range of opinions for the creation of sensory experiences. We found in GAN’s and Fuzzy Logic an ideal tool to develop new computational models based on interaction, learning, emotion and imagination to expand the traditional algorithmic model of computation.

Keywords: fuzzy clustering, generative adversarial networks, human-machine cooperation, hybrid collective data, multimedia performance

Procedia PDF Downloads 115