Search results for: constrained clustering
718 AM/E/c Queuing Hub Maximal Covering Location Model with Fuzzy Parameter
Authors: M. H. Fazel Zarandi, N. Moshahedi
Abstract:
The hub location problem appears in a variety of applications, such as medical centers, firefighting facilities, cargo delivery systems and telecommunication network design. The location of service centers strongly influences the congestion at each of them and, consequently, the quality of service. This paper presents a fuzzy maximal covering hub location problem (FMCHLP) in which the travel cost between any pair of nodes is treated as a fuzzy variable. To account for the quality of service, each hub is modeled as a queue in which arrivals follow a Poisson process and service times follow an Erlang distribution. A nonlinear mathematical programming model is first presented and then converted into a linear one. The linear model was solved with GAMS for instances of up to 25 nodes; for larger instances, because of the complexity of hub covering location problems, a simulated annealing algorithm was developed to solve and test the model. The possibilistic c-means clustering method was also used to find an initial solution.
Keywords: fuzzy modeling, location, possibilistic clustering, queuing
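The abstract names possibilistic c-means (PCM) as the source of the initial solution. The sketch below shows the standard PCM iteration (typicality update, then weighted center update) on 2-D node coordinates. It assumes Euclidean distance, a fixed fuzzifier m = 2 and fixed bandwidths `etas` supplied by the caller; how the authors choose these values and turn the centers into candidate hubs is not stated, so all names here are hypothetical.

```python
def typicality(point, center, eta, m=2.0):
    """Possibilistic membership (typicality) of one point in one cluster."""
    d2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))


def possibilistic_c_means(points, centers, etas, m=2.0, iters=50):
    """Alternate typicality and center updates; etas are held fixed (assumption)."""
    centers = [list(c) for c in centers]
    dim = len(points[0])
    for _ in range(iters):
        # T[i][j] is the typicality of point j in cluster i; unlike fuzzy c-means,
        # the typicalities of one point need not sum to 1 across clusters.
        T = [[typicality(x, c, eta, m) for x in points] for c, eta in zip(centers, etas)]
        for i, row in enumerate(T):
            w = [t ** m for t in row]
            total = sum(w)
            centers[i] = [sum(wj * x[k] for wj, x in zip(w, points)) / total
                          for k in range(dim)]
    return centers, T
```

Because typicalities are not normalized across clusters, distant nodes contribute almost nothing to a center, which is what makes PCM less sensitive to outlying nodes than fuzzy c-means.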
717 Summarizing Data Sets for Data Mining by Using Statistical Methods in Coastal Engineering
Authors: Yunus Doğan, Ahmet Durap
Abstract:
Coastal regions are among the most heavily used places, owing to their natural balance and growing populations. In coastal engineering, the most valuable data concern wave behavior. The volume of such data becomes very large because observations are made over periods of hours, days and months. In this study, statistical methods, including wave spectrum analysis methods and standard statistical methods, were applied. The goal is to discover profiles of different coastal areas with these methods and thereby derive, from the big data, an instance-based data set that can be analyzed with data mining algorithms. In the experimental studies, six sample data sets of wave behavior, obtained from 20-minute observations in Mersin Bay, Turkey, were converted into an instance-based form, and different data mining clustering techniques were used to discover similar coastal places. The study also argues that this summarization approach can be used in other fields that collect big data, such as medicine.
Keywords: clustering algorithms, coastal engineering, data mining, data summarization, statistical methods
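As a concrete illustration of turning a long wave record into one instance for clustering, the sketch below computes standard time-domain statistics, including the significant wave height H1/3 (mean of the highest one-third of waves), from one observation window. It assumes individual wave heights have already been extracted (for example by zero-crossing analysis); the spectral methods mentioned in the abstract are not shown, and the feature names are hypothetical.

```python
import statistics


def summarize_wave_record(heights):
    """Turn one observation window of individual wave heights (m) into one feature instance."""
    ordered = sorted(heights, reverse=True)
    n_third = max(1, len(ordered) // 3)
    return {
        "H_mean": statistics.fmean(ordered),
        "H_max": ordered[0],
        # significant wave height: mean of the highest one-third of waves
        "H_1/3": statistics.fmean(ordered[:n_third]),
        "H_std": statistics.pstdev(ordered),
    }
```

Each 20-minute window then becomes one row of features, and these rows are what a clustering algorithm would compare across coastal sites.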
716 Observationally Constrained Estimates of Aerosol Indirect Radiative Forcing over Indian Ocean
Authors: Sofiya Rao, Sagnik Dey
Abstract:
Aerosol-cloud-precipitation interaction remains one of the largest sources of uncertainty in quantifying aerosol climate forcing, and the uncertainty increases from global to regional scales. The problem remains unresolved because of large discrepancies in how cloud processes are represented in climate models. Most studies of aerosol-cloud-climate and aerosol-cloud-precipitation interactions over the Indian Ocean (such as the INDOEX and CAIPEEX campaigns) are restricted to a single season or a single region. Here we developed a theoretical framework to quantify aerosol indirect radiative forcing using 15 years (2000-2015) of Moderate Resolution Imaging Spectroradiometer (MODIS) aerosol and cloud products over the Indian Ocean. The framework relies on an observationally constrained estimate of the aerosol-induced change in cloud albedo. We partitioned the change in cloud albedo into changes in Liquid Water Path (LWP) and cloud Effective Radius (Reff) in response to aerosol optical depth (AOD). Cloud albedo is most sensitive to an increase in AOD for LWP between 120 and 300 g/m² and Reff between 8 and 24 micrometers, which means that aerosols have the strongest effect in this range of LWP and Reff. Using this framework, aerosol forcing during a transition from the indirect to the semi-direct effect is also calculated. The analysis gives the best results over the Arabian Sea, compared with the Bay of Bengal and the South Indian Ocean, because of the heterogeneity of aerosol species over the Arabian Sea. Over the Arabian Sea, more absorbing aerosols dominate in winter, while dust (coarse-mode aerosol particles) dominates in the pre-monsoon season. In winter and pre-monsoon, aerosol forcing is dominant, whereas in the monsoon and post-monsoon seasons meteorological forcing dominates.
Over the South Indian Ocean, largely the same type of aerosol (sea salt) is present. Over the Arabian Sea, the aerosol indirect radiative forcing is -5 ± 4.5 W/m² in winter and is reduced in the other seasons. The results provide observationally constrained estimates of aerosol indirect forcing over the Indian Ocean, which can help in evaluating climate model performance in the context of such complex interactions.
Keywords: aerosol-cloud-precipitation interaction, aerosol-cloud-climate interaction, indirect radiative forcing, climate model
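The partition of the albedo change into LWP and Reff contributions can be written as a chain-rule decomposition. This is a generic sketch, assuming cloud albedo A depends on AOD only through LWP and Reff; it is not necessarily the exact formulation used in the study.

```latex
\frac{dA}{d\ln \mathrm{AOD}} =
  \underbrace{\frac{\partial A}{\partial \ln \mathrm{LWP}}\,
              \frac{d\ln \mathrm{LWP}}{d\ln \mathrm{AOD}}}_{\text{LWP pathway}}
  + \underbrace{\frac{\partial A}{\partial \ln R_{\mathrm{eff}}}\,
              \frac{d\ln R_{\mathrm{eff}}}{d\ln \mathrm{AOD}}}_{R_{\mathrm{eff}}\ \text{pathway}}
```

Under this reading, the reported sensitivity window (LWP of 120-300 g/m², Reff of 8-24 micrometers) is where the partial derivatives of A are largest.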
715 Efficient Principal Components Estimation of Large Factor Models
Authors: Rachida Ouysse
Abstract:
This paper proposes a constrained principal components (CnPC) estimator for efficient estimation of large-dimensional factor models when errors are cross-sectionally correlated and the number of cross-sections (N) may be larger than the number of observations (T). Although the principal components (PC) method is consistent for any path of the panel dimensions, it is inefficient because it treats the errors as homoskedastic and uncorrelated. The new CnPC exploits the assumption of bounded cross-sectional dependence, which defines Chamberlain and Rothschild’s (1983) approximate factor structure, as an explicit constraint and solves a constrained PC problem. The CnPC method is computationally equivalent to the PC method applied to a regularized form of the data covariance matrix. Unlike maximum likelihood-type methods, the CnPC method does not require inverting a large covariance matrix and is thus valid for panels with N ≥ T. The paper derives a convergence rate and an asymptotic normality result for the CnPC estimators of the common factors. We provide feasible estimators and show in a simulation study that they are more accurate than the PC estimator, especially for panels with N larger than T, and than generalized PC-type estimators, especially for panels with N almost as large as T.
Keywords: high dimensionality, unknown factors, principal components, cross-sectional correlation, shrinkage regression, regularization, pseudo-out-of-sample forecasting
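Since CnPC is described as PC applied to a regularized covariance matrix, the sketch below shows the plain PC factor estimator with a pluggable `regularize` hook (identity by default). The hook name and the normalization (loadings scaled so that their cross-product divided by N is the identity) are assumptions for illustration; the specific CnPC regularization derived in the paper is not reproduced here.

```python
import numpy as np


def pc_factors(X, r, regularize=lambda S: S):
    """Principal-components factor estimates from a T x N panel X (rows = time).

    `regularize` maps the N x N sample covariance to the matrix whose top-r
    eigenvectors define the loadings; the identity gives ordinary PC.
    """
    T, N = X.shape
    Xc = X - X.mean(axis=0)                  # demean each series
    S = regularize(Xc.T @ Xc / T)            # (possibly regularized) covariance
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:r]
    loadings = np.sqrt(N) * eigvecs[:, top]  # normalization: L'L / N = I_r
    factors = Xc @ loadings / N              # project data onto the loadings
    return factors, loadings
```

Swapping in a different `regularize` function changes only the covariance fed to the eigen-decomposition, which mirrors the "computationally equivalent" remark in the abstract.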
714 Time-Series Load Data Analysis for User Power Profiling
Authors: Mahdi Daghmhehci Firoozjaei, Minchang Kim, Dima Alhadidi
Abstract:
In this paper, we present a power profiling model for smart grid consumers based on real-time load data acquired from smart meters. It profiles consumers’ power consumption behaviour using the dynamic time warping (DTW) clustering algorithm. Because this algorithm is invariant to signal warping, time-misaligned load data can be profiled and consumption features extracted. Two load types are defined, and the related load patterns are extracted to classify consumption behaviour by DTW. The classification methodology is discussed in detail. To evaluate the performance of the method, we analyze time-series load data measured by a smart meter in a real case. The results verify the effectiveness of the proposed profiling method, with a true positive rate of 90.91% for load type clustering in the best case.
Keywords: power profiling, user privacy, dynamic time warping, smart grid
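The warping invariance the abstract relies on comes from the DTW recurrence, sketched below with an absolute-difference local cost. The nearest-prototype assignment and the profile names ("morning_peak", "evening_peak") are hypothetical illustrations, not the authors' two load types.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: step in a only, step in b only, step in both
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def assign_profile(series, prototypes):
    """Return the name of the prototype load profile nearest to `series` under DTW."""
    return min(prototypes, key=lambda name: dtw_distance(series, prototypes[name]))
```

For example, a load curve whose peak arrives one sample late still has DTW distance 0 to the "morning_peak" prototype, whereas a point-by-point Euclidean comparison would penalize the shift.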
713 A Local Tensor Clustering Algorithm to Annotate Uncharacterized Genes with Many Biological Networks
Authors: Paul Shize Li, Frank Alber
Abstract:
A fundamental task of clinical genomics is to unravel the functions of genes and their associations with disorders. Although experimental biology has worked over the past decades to discover and elucidate the molecular mechanisms of individual genes, about 40% of human genes still have unknown functions, let alone known disease associations. Biologists interested in a particular gene of unknown function therefore need a powerful computational method tailored to inferring the functions and disease relevance of uncharacterized genes. Studies have shown that genes strongly linked to each other in multiple biological networks are more likely to have similar functions. This indicates that densely connected subgraphs in multiple biological networks are useful for the functional and phenotypic annotation of uncharacterized genes. In this work, we therefore developed an integrative network approach to identify frequent local clusters, defined as densely connected subgraphs that occur frequently in multiple biological networks and contain a query gene that has few or no disease or function annotations. This local clustering algorithm models multiple biological networks sharing the same gene set as a three-dimensional array, or tensor, and employs a tensor-based optimization method to find the frequent local clusters efficiently. Specifically, massive public gene expression data sets that comprehensively cover dynamic, physiological and environmental conditions are used to generate hundreds of gene co-expression networks. By integrating these networks, the proposed method can identify the frequent local clusters that contain a given uncharacterized gene of interest to a biologist. These frequent local clusters are then used to annotate the function and disease relevance of that gene.
This local tensor clustering algorithm outperformed the competing tensor-based algorithm in both module discovery and running time. We also demonstrated the method on real data from hundreds of gene co-expression data sets and showed that it can comprehensively characterize the query gene. This study therefore provides a new tool for annotating uncharacterized genes and has great potential to assist clinical genomic diagnostics.
Keywords: local tensor clustering, query gene, gene co-expression network, gene annotation
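To make the tensor representation concrete, the sketch below stacks one adjacency matrix per network into an (n_genes, n_genes, n_networks) array and checks whether a candidate gene set is a "frequent local cluster": it must contain the query gene and be dense in enough networks. It only scores a given candidate; the authors' tensor-based optimization that searches for such clusters is not reproduced, and the thresholds `density_min` and `freq_min` are hypothetical.

```python
import numpy as np


def subgraph_densities(tensor, nodes):
    """Per-network edge density of the subgraph induced by `nodes`.

    tensor has shape (n_genes, n_genes, n_networks): one weighted adjacency slice per network.
    """
    idx = np.asarray(nodes)
    sub = tensor[np.ix_(idx, idx, np.arange(tensor.shape[2]))]
    k = len(idx)
    # total weight minus self-loops, divided by the number of ordered gene pairs
    off_diag = sub.sum(axis=(0, 1)) - np.trace(sub, axis1=0, axis2=1)
    return off_diag / (k * (k - 1))


def is_frequent_local_cluster(tensor, nodes, query, density_min=0.6, freq_min=0.5):
    """Hypothetical criterion: contains the query gene and is dense in enough networks."""
    if query not in nodes:
        return False
    dens = subgraph_densities(tensor, nodes)
    return bool(np.mean(dens >= density_min) >= freq_min)
```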
712 An Integrated Label Propagation Network for Structural Condition Assessment
Authors: Qingsong Xiong, Cheng Yuan, Qingzhao Kong, Haibei Xiong
Abstract:
Deep-learning-driven approaches based on vibration responses have attracted increasing attention for rapid structural condition assessment, but obtaining sufficient measured training data with corresponding labels is relatively costly, and often impossible, in practical engineering. This study proposes an integrated label propagation network for structural condition assessment that propagates labels from measurements continuously generated by the intact structure to unlabeled measurements from damage scenarios. The network embeds damage-sensitive feature extraction by a deep autoencoder and pseudo-label propagation by optimized fuzzy clustering; its architecture and mechanism are described in detail. With a sophisticated network design and specific strategies for improving performance, the network combines the strengths of self-supervised representation learning, unsupervised fuzzy clustering and supervised classification into an integrated approach for assessing damage conditions. Both numerical simulations and full-scale laboratory shaking table tests of a two-story building structure were conducted to validate its capability to detect post-earthquake damage. The identification accuracy of the network was 0.95 in numerical validation and 0.86 on average in the laboratory case studies. Notably, the whole training procedure of all models in the network does not rely on any labeled data from damage scenarios, only on several samples from the intact structure, which indicates strong model adaptability and practical applicability.
Keywords: autoencoder, condition assessment, fuzzy clustering, label propagation
711 Automatic Detection of Proliferative Cells in Immunohistochemically Images of Meningioma Using Fuzzy C-Means Clustering and HSV Color Space
Authors: Vahid Anari, Mina Bakhshi
Abstract:
Visual search and identification of immunohistochemically stained meningioma tissue is performed manually in pathology laboratories to detect and diagnose the type of meningioma. This task is tedious and time-consuming. Moreover, because of the complex nature of cells, segmenting cells from their background and analyzing them automatically remains challenging. In this paper, we develop and test a computerized scheme that automatically identifies cells in microscopic images of meningioma and classifies them as positive (proliferative) or negative (normal). A data set of 150 images is used to test the scheme. The scheme uses the fuzzy c-means algorithm as a color clustering method in the hue, saturation, value (HSV) color space. Because the cells are distinguishable by the human eye, the accuracy and stability of the algorithm are assessed quantitatively by applying it to a wide variety of real images.
Keywords: positive cell, color segmentation, HSV color space, immunohistochemistry, meningioma, thresholding, fuzzy c-means
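A minimal sketch of fuzzy c-means color clustering in HSV follows. It assumes, as is common in IHC staining, that positive nuclei appear brown and negative nuclei blue; the pixel values in the usage are made up for illustration. It also treats hue as a linear coordinate, which ignores hue's circular wrap-around at red; a real implementation may need to handle that.

```python
import colorsys

import numpy as np


def rgb_pixels_to_hsv(pixels):
    """Convert 0-255 RGB tuples to an (n, 3) array of HSV values in [0, 1]."""
    return np.array([colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels])


def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard FCM: returns (c x d centers, c x n memberships summing to 1 per pixel)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                                # each pixel's memberships sum to 1
    for _ in range(iters):
        W = U ** m
        centers = W @ X / W.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                         # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)                 # u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Hardening the memberships with `U.argmax(axis=0)` gives one color cluster per pixel, which is the step that would separate brown (positive) from blue (negative) regions.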
710 A Comprehensive Study and Evaluation on Image Fashion Features Extraction
Authors: Yuanchao Sang, Zhihao Gong, Longsheng Chen, Long Chen
Abstract:
Clothing fashion represents a person’s aesthetic appreciation of everyday outfits and appetite for fashion, and it reflects social, human and economic development. However, modeling fashion by machine is extremely challenging because fashion is too abstract to be described efficiently by machines; even humans can hardly reach a consensus about it. In this paper, we address a fundamental fashion-related question: which image feature best describes clothing fashion? To answer it, we designed and evaluated various image features, ranging from traditional low-level hand-crafted features, to mid-level style-aware features, to popular deep neural network-based features that have shown state-of-the-art performance in various vision tasks. In total, we tested nine feature representations: color, texture, shape, style, convolutional neural networks (CNNs), CNNs with distance metric learning (CNNs&DML), AutoEncoder, CNNs with multiple layer combination (CNNs&MLC) and CNNs with dynamic feature clustering (CNNs&DFC). We validated the performance of these features on two publicly available data sets. Quantitative and qualitative experimental results on both intra-domain and inter-domain fashion clothing image retrieval showed that deep learning-based feature representations far outperform traditional hand-crafted ones. Moreover, among all deep learning-based methods, CNNs with explicit feature clustering perform best, which shows that feature clustering is essential for discriminative fashion feature representation.
Keywords: convolutional neural network, feature representation, image processing, machine modelling
709 The European Research and Development Project Improved Nuclear Site Characterization for Waste Minimization in Decommissioning under Constrained Environment: Focus on Performance Analysis and Overall Uncertainty
Authors: M. Crozet, D. Roudil, T. Branger, S. Boden, P. Peerani, B. Russell, M. Herranz, L. Aldave de la Heras
Abstract:
The EURATOM work program project INSIDER (Improved Nuclear Site Characterization for Waste minimization in Decommissioning under Constrained Environment) was launched in June 2017. This 4-year project has 18 partners and aims to improve the management of contaminated materials arising from decommissioning and dismantling (D&D) operations by proposing an integrated characterization methodology. The methodology is based on advanced statistical processing and modeling, coupled with adapted and innovative analytical and measurement methods, with respect to sustainability and economic objectives. To achieve these objectives, the approaches will then be applied to common case studies in the form of inter-laboratory comparisons on matrix-representative reference samples and benchmarking. Work Package 6 (WP6), ‘Performance analysis and overall uncertainty’, is in charge of analyzing the benchmarking on real samples, organizing the inter-laboratory comparison on synthetic certified reference materials and establishing the overall uncertainty budget. The assessment of the outcome will be used to provide recommendations and guidance leading to pre-standardization tests.
Keywords: decommissioning, sampling strategy, research and development, characterization, European project
708 Artificial Intelligent Methodology for Liquid Propellant Engine Design Optimization
Authors: Hassan Naseh, Javad Roozgard
Abstract:
This paper presents a methodology based on artificial intelligence (AI) applied to liquid propellant engine (LPE) optimization. The AI methodology uses an adaptive neuro-fuzzy inference system (ANFIS). In this methodology, the objective is to maximize performance (specific impulse). The independent design variables in the ANFIS model are combustion chamber pressure, combustion chamber temperature and oxidizer-to-fuel ratio, and the output is specific impulse, which can be combined with other objective functions in LPE design optimization. To this end, the LPE parameters were modeled with ANFIS by generating the fuzzy inference system structure using grid partitioning, subtractive clustering and fuzzy c-means (FCM) clustering, for both Mamdani and Sugeno inference and for various types of membership functions. The final comparison of the optimization results reports the accuracy and processing run time of the Gaussian ANFIS methodology relative to all the other methods.
Keywords: ANFIS methodology, artificial intelligent, liquid propellant engine, optimization
707 Clustering Using Cooperative Multihop Mini-Groups in Wireless Sensor Network: A Novel Approach
Authors: Virender Ranga, Mayank Dave, Anil Kumar Verma
Abstract:
Wireless sensor networks (WSNs) are now used in many real-life applications, such as environmental monitoring, habitat monitoring and health monitoring. Because the inexpensive devices used in these applications are power-constrained, the energy consumption of each device should be kept as low as possible so that the network operates for a longer period. One technique for prolonging network lifetime is to group sensor nodes intelligently so that they operate cooperatively and energy-efficiently. With this motivation, we propose a novel approach that organizes sensor nodes into cooperative multihop mini-groups so that the total energy consumption of the network is reduced and network lifetime is improved. The approach also reduces the number of messages transmitted inside the WSN, which further minimizes the energy consumption of the whole network. Simulation experiments show that the proposed approach outperforms the state-of-the-art approach in terms of stability period and aggregated data.
Keywords: clustering, cluster-head, mini-group, stability period
706 Clustering-Based Threshold Model for Condition Rating of Concrete Bridge Decks
Authors: M. Alsharqawi, T. Zayed, S. Abu Dabous
Abstract:
To ensure the safety and serviceability of bridge infrastructure, accurate condition assessment and rating methods are needed as a basis for bridge Maintenance, Repair and Replacement (MRR) decisions. In North America, bridge condition is commonly assessed through visual inspection. These practices are limited to detecting surface defects and external flaws. Further, the thresholds that define the severity of bridge deterioration are selected arbitrarily. The current research discusses the main deteriorations and defects identified during visual inspection and Non-Destructive Evaluation (NDE). NDE techniques are becoming popular for augmenting visual examination during inspection to detect subsurface defects. Quality inspection data and accurate condition assessment and rating are the basis for determining appropriate MRR decisions. Thus, this paper proposes a novel method for bridge condition assessment using Quality Function Deployment (QFD) theory. The QFD model is designed to provide an integrated condition by evaluating both surface and subsurface defects of concrete bridges. Moreover, an integrated condition rating index with four thresholds is developed based on the QFD condition assessment model and the K-means clustering technique. Twenty case studies are analyzed by applying the QFD model and implementing the developed rating index. The results of the analyzed case studies show that the proposed threshold model produces robust MRR recommendations consistent with the decisions and recommendations made by bridge managers on these projects. The proposed method is expected to advance the state of the art in bridge condition assessment and rating.
Keywords: concrete bridge decks, condition assessment and rating, quality function deployment, k-means clustering technique
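One common way to turn K-means into rating thresholds is to cluster the one-dimensional condition index and place each boundary at the midpoint between adjacent sorted cluster centers. The sketch below does that; the quantile initialization, the midpoint rule and the toy index values are assumptions. Note that k clusters yield k - 1 boundaries, and the abstract does not say whether its "four thresholds" bound four or five rating classes.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalar values; returns the sorted cluster centers."""
    xs = sorted(values)
    # initialize centers at evenly spaced quantiles of the data
    centers = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def thresholds_from_centers(centers):
    """Class boundaries at midpoints between adjacent cluster centers."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]
```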
705 Fault-Detection and Self-Stabilization Protocol for Wireless Sensor Networks
Authors: Ather Saeed, Arif Khan, Jeffrey Gosper
Abstract:
Sensor devices are prone to errors and sudden node failures, which are difficult to detect in a timely manner when the devices are deployed in real-time, hazardous, large-scale harsh environments or in medical emergencies. The loss of data can therefore be life-threatening when the sensed phenomenon is not disseminated because of sudden node failure, battery depletion or temporary malfunction. We introduce a set of partial differential equations for localizing faults, similar to Green’s and Maxwell’s equations used in electrostatics and electromagnetism, together with a node organization and clustering scheme for self-stabilizing sensor networks. Green’s theorem is applied to regions bounded by closed, continuously differentiable curves to ensure network connectivity. Experimental results show that the proposed GTFD (Green’s Theorem Fault-Detection and Self-stabilization) protocol not only detects faulty nodes but also accurately generates network stability graphs showing where urgent intervention is required to dynamically self-stabilize the network.
Keywords: Green’s Theorem, self-stabilization, fault-localization, RSSI, WSN, clustering
704 Design of Agricultural Machinery Factory Facility Layout
Authors: Nilda Tri Putri, Muhammad Taufik
Abstract:
Agricultural tools and machinery (Alsintan) are equipment used in agribusiness activities. Alsintan is used to move from traditional farming systems, which generally rely on manual equipment, to modern mechanized agriculture. In 2012, CV Nugraha Chakti Consultant drew up an action plan for developing the Alsintan industry in West Sumatra, with the aim of growing medium-sized Alsintan manufacturers into major ones; one of the measures is to increase the production capacity of the Alsintan industry. The production capacity for the leading products, hydrotillers and threshers, is set at 2,000 units per year each. CV Citra Dragon, one of the medium-sized Alsintan manufacturers in West Sumatra, plans to relocate its existing plant to meet growing consumer demand. The increased production capacity and the relocation plan require a new layout; therefore, the layout of the CV Citra Dragon plant facility needs to be designed. The first step in designing the plant layout is to design the production floor layout, which is done by applying a group technology layout. The initial step is to group machines and part families using Average Linkage Clustering (ALC) and Rank Order Clustering (ROC). Independent workstation design and layout design are then carried out using the Modified Spanning Tree (MST), and the better production floor layout is selected from the ALC and ROC cell groupings. Next, the layouts of the warehouses, offices and other production support facilities are designed, and the Activity Relationship Chart (ARC) method is used to organize the placement of the designed factory facilities. After the facility plan is structured, the cost of establishing the plant is calculated. The production floor uses a group technology layout and consists of four machine cells, an assembly area and a painting area.
The total material travel distance for a single production run is 1120.16 m, which requires 18.7 minutes of transportation time per run. The Alsintan factory is designed with a circular flow pattern and 11 facilities, consisting of 10 rooms and 1 parking space. The factory building measures 84 m x 52 m.
Keywords: Average Linkage Clustering (ALC), Rank Order Clustering (ROC), Modified Spanning Tree (MST), Activity Relationship Chart (ARC)
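Rank Order Clustering, one of the two cell-formation methods named above, repeatedly sorts the rows and columns of a binary machine-part incidence matrix by the binary number each row or column forms, until the order stops changing; machine cells and part families then appear as diagonal blocks. The sketch below follows that classic procedure; the 4 x 5 matrix in the usage is a made-up example, not the CV Citra Dragon data.

```python
def _binary_value(bits):
    """Read a sequence of 0/1 entries as a binary number (first entry = most significant)."""
    return int("".join(str(b) for b in bits), 2)


def rank_order_clustering(matrix, max_iter=20):
    """Return (row order, column order) for a binary machine x part incidence matrix."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    for _ in range(max_iter):
        # rank machines by the binary value of their row under the current part order
        new_rows = sorted(rows, key=lambda r: _binary_value(matrix[r][c] for c in cols),
                          reverse=True)
        # then rank parts by the binary value of their column under the new machine order
        new_cols = sorted(cols, key=lambda c: _binary_value(matrix[r][c] for r in new_rows),
                          reverse=True)
        if new_rows == rows and new_cols == cols:
            break  # order is stable: block structure found
        rows, cols = new_rows, new_cols
    return rows, cols
```

For the matrix [[1,0,0,1,0], [0,1,1,0,1], [1,0,0,1,0], [0,1,1,0,0]], the result groups machines {0, 2} with parts {0, 3} and machines {1, 3} with parts {1, 2, 4}. Python's `sorted` is stable, so machines with identical rows keep their relative order.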
703 Production Optimization under Geological Uncertainty Using Distance-Based Clustering
Authors: Byeongcheol Kang, Junyi Kim, Hyungsik Jung, Hyungjun Yang, Jaewoo An, Jonggeun Choe
Abstract:
It is important to understand reservoir properties for better production management. Because of limited information, there are geological uncertainties in very heterogeneous or channelized reservoirs. One solution is to generate multiple equiprobable realizations using geostatistical methods. However, some models have incorrect properties and need to be excluded for simulation efficiency and reliability. We propose a novel model selection scheme, based on distance-based clustering, for reliable application of a production optimization algorithm. Distance is defined as the degree of dissimilarity between data. We calculate the Hausdorff distance to classify the models by similarity; the Hausdorff distance is useful for shape matching of reservoir models. We use multidimensional scaling (MDS) to represent the models in two-dimensional space and group them by K-means clustering. Rather than simulating all models, we choose one representative model from each cluster and identify the best model, whose production rates are closest to the true values. From this process, we can select good reservoir models near the best model with high confidence. We generate 100 channel reservoir models using single normal equation simulation (SNESIM). Since oil and gas preferentially flow through the sand facies, it is critical to characterize the pattern and connectivity of the channels in the reservoir. After the Hausdorff distances are calculated and the models are projected by MDS, the models are seen to group according to their channel patterns. These channel distributions affect the operational controls of each production well, so the model selection scheme improves the optimization process. We use particle swarm optimization (PSO), a widely used global search algorithm, for production optimization. PSO is good at finding the global optimum of an objective function, but it is time-consuming because it uses many particles and iterations.
In addition, if multiple reservoir models are used, the simulation time for PSO increases sharply. With the proposed method, we can select good and reliable models that already match the production data. Considering the geological uncertainty of the reservoir, we obtain well-optimized production controls for maximum net present value. The proposed method offers a novel way to select good cases among various probable realizations. The model selection scheme can be applied not only to production optimization but also to history matching and other ensemble-based methods for efficient simulation.
Keywords: distance-based clustering, geological uncertainty, particle swarm optimization (PSO), production optimization
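The selection pipeline described above (pairwise Hausdorff distances, MDS to two dimensions, K-means, one representative per cluster) can be sketched as follows. Representing each reservoir model as a point set of sand-facies cell coordinates, using classical (Torgerson) MDS and initializing K-means farthest-first are all assumptions made for this illustration; SNESIM realization and the PSO step are not shown.

```python
import numpy as np


def hausdorff(A, B):
    """Symmetric Hausdorff distance between point sets A (n x d) and B (m x d)."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(D.min(axis=1).max(), D.min(axis=0).max())


def classical_mds(D, dims=2):
    """Embed a distance matrix in `dims` dimensions (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dims]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))


def kmeans(Y, k, iters=100):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def representatives(Y, labels, centers):
    """For each non-empty cluster, the index of the model closest to its center."""
    reps = []
    for j, c in enumerate(centers):
        members = np.flatnonzero(labels == j)
        if members.size:
            reps.append(int(members[np.argmin(((Y[members] - c) ** 2).sum(axis=1))]))
    return reps
```

Only the returned representatives would then be simulated, which is where the saving in flow-simulation time comes from.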
702 Segmental Dynamics of Poly(Alkyl Methacrylate) Chain in Ultra-Thin Spin-Cast Films
Authors: Hiroyuki Aoki
Abstract:
Polymeric materials are often used in the form of thin films, as in food wrap and surface coatings. In such applications, polymer films thinner than 100 nm are often used. The thickness of such ultra-thin films is less than the unperturbed size of a polymer chain; therefore, the polymer chain in an ultra-thin film is strongly constrained. However, the details of the constrained dynamics of polymer molecules in ultra-thin films are still unclear. In the current study, the segmental dynamics of a single polymer chain was investigated directly by fluorescence microscopy. Individual poly(alkyl methacrylate) chains labeled with a perylenediimide dye molecule were observed with a highly sensitive fluorescence microscope under defocused conditions. The translational and rotational diffusion of the center segment of a single polymer chain was analyzed directly. The segmental motion in a 10-nm-thick film was found to be suppressed compared with that in the bulk. Detailed analysis of the molecular motion revealed that the in-plane rotational diffusion rate was similar in the thin film and in the bulk, whereas out-of-plane motion was restricted in the thin film. This result indicates that the spatial restriction in an ultra-thin film thinner than the unperturbed chain dimension alters the dynamics of individual molecules in a polymer system.
Keywords: polymer materials, single molecule, molecular motion, fluorescence microscopy, super-resolution techniques
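Comparing in-plane rotational diffusion rates, as done above, requires turning an angle trajectory into a diffusion coefficient. A common approach, sketched here under the assumption of free one-dimensional rotational diffusion of the in-plane angle, fits the mean squared angular displacement MSAD(τ) = 2 D_r τ over the first few lags. Extracting angles from defocused images and analyzing out-of-plane motion are not shown, and `max_lag` is an arbitrary choice.

```python
import numpy as np


def rotational_diffusion_coefficient(angles, dt, max_lag=5):
    """Estimate in-plane rotational diffusion D_r from MSAD(tau) = 2 * D_r * tau."""
    theta = np.unwrap(np.asarray(angles, dtype=float))  # remove artificial 2*pi jumps
    lags = np.arange(1, max_lag + 1)
    msad = np.array([np.mean((theta[l:] - theta[:-l]) ** 2) for l in lags])
    tau = lags * dt
    # least-squares slope of MSAD versus tau, constrained through the origin
    slope = np.sum(msad * tau) / np.sum(tau * tau)
    return slope / 2.0
```

The unwrap step matters because measured angles are usually reported modulo 2π; without it, each wrap would appear as a huge jump and inflate the estimate.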
701 Towards a Distributed Computation Platform Tailored for Educational Process Discovery and Analysis
Authors: Awatef Hicheur Cairns, Billel Gueni, Hind Hafdi, Christian Joubert, Nasser Khelifa
Abstract:
Given the ever-changing needs of job markets, education and training centers are increasingly held accountable for student success. They therefore have to find ways to streamline their offerings and educational processes to achieve the highest quality in curriculum content and managerial decisions. Educational process mining is an emerging field within educational data mining (EDM) concerned with developing methods to discover, analyze and visually represent complete educational processes. In this paper, we present our distributed computation platform, which allows education centers and institutions to load their data and access advanced data mining and process mining services. To this end, we also present a comparative study of the clustering techniques developed in the context of process mining for efficiently partitioning educational traces. Our goal is to find the best strategy for distributing heavy analysis computations across the many processing nodes of our platform.
Keywords: educational process mining, distributed process mining, clustering, distributed platform, educational data mining, ProM
Procedia PDF Downloads 454
700 Unseen Classes: The Paradigm Shift in Machine Learning
Authors: Vani Singhal, Jitendra Parmar, Satyendra Singh Chouhan
Abstract:
Unseen class discovery has become an important part of machine-learning systems that must handle new classes. Unseen classes are classes on which the machine learning model was not trained. With advances in technology and AI increasingly replacing humans, the amount of data has grown dramatically, so when a model is applied to real-world examples, it encounters new, unseen classes. Our aim is to find the number of unseen classes using a hierarchical active learning algorithm. The algorithm is based on hierarchical clustering as well as active sampling. The number of clusters obtained at the end gives the number of unseen classes, and the resulting clusters include those that contain the unseen classes. Instead of first discovering unseen classes and then finding their number, we calculate the number directly by applying the algorithm. The dataset used is an intent classification dataset, in which the target is the intent of the corresponding query. We conclude that when the machine learning model encounters real-world data, it can automatically find the number of unseen classes. In future work, we plan to label these unseen classes correctly.
Keywords: active sampling, hierarchical clustering, open world learning, unseen class discovery
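The abstract does not spell out the linkage or stopping rule, so the following is only a minimal sketch of the counting idea: it assumes single-linkage agglomerative clustering over feature vectors (for example, query embeddings) and a hypothetical distance `threshold` at which the dendrogram is cut, and it omits the active-sampling component entirely. Cutting a single-linkage tree at height t yields the connected components of the graph that links every pair within distance t, which a union-find structure computes directly.

```python
import math


def count_clusters(points, threshold):
    """Count clusters from cutting a single-linkage dendrogram at `threshold`.

    Equivalent to counting connected components of the graph that joins every
    pair of points whose Euclidean distance is <= threshold.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    return len({find(i) for i in range(len(points))})


# Toy 2-D embeddings (illustrative only): two tight pairs and one outlier.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
print(count_clusters(embeddings, threshold=0.5))  # 3
```

The threshold controls the answer directly (0.0 gives one cluster per point, a very large value gives one cluster overall), so in practice it would have to be tuned or chosen by the active-learning loop.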
Procedia PDF Downloads 173
699 Study for an Optimal Cable Connection within an Inner Grid of an Offshore Wind Farm
Authors: Je-Seok Shin, Wook-Won Kim, Jin-O Kim
Abstract:
An offshore wind farm needs to be designed carefully, considering both economics and reliability. Among the many decision-making problems involved in designing an entire offshore wind farm, this paper focuses on the inner grid layout, that is, the connections between wind turbines as well as between the wind turbines and an offshore substation. The methodology proposed in this paper determines the connections and the cable type for each connection section using K-clustering, minimum spanning tree, and cable selection algorithms. A cost evaluation is then performed in terms of investment, power loss, and reliability, and the inner grid layout with the lowest total cost is selected as optimal. To demonstrate the validity of the methodology, a case study is conducted on a 240 MW offshore wind farm, and the results show that the methodology is helpful for designing offshore wind farms optimally.
Keywords: offshore wind farm, optimal layout, k-clustering algorithm, minimum spanning algorithm, cable type selection, power loss cost, reliability cost
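Of the three steps named above, the minimum spanning tree is the most self-contained. A minimal sketch, assuming straight-line (Euclidean) cable lengths between 2-D coordinates with the substation as node 0, is Prim's algorithm below; the K-clustering step, cable capacity limits, and cable-type selection are not modeled, and the function name `prim_mst` is hypothetical.

```python
import math


def prim_mst(nodes):
    """Return MST edges (i, j) over 2-D coordinates using O(n^2) Prim's algorithm."""
    n = len(nodes)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection cost from each node to the tree
    link = [-1] * n        # tree node that provides that cheapest connection
    best[0] = 0.0          # grow the tree from node 0 (assumed to be the substation)
    edges = []
    for _ in range(n):
        # Pick the cheapest node not yet connected.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] != -1:
            edges.append((link[u], u))
        # Relax connection costs of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(nodes[u], nodes[v])
                if d < best[v]:
                    best[v] = d
                    link[v] = u
    return edges


# Substation at the origin and three turbines (coordinates in km, illustrative only).
layout = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
cables = prim_mst(layout)
total_km = sum(math.dist(layout[i], layout[j]) for i, j in cables)
print(cables, total_km)  # [(0, 1), (1, 2), (2, 3)] 3.0
```

In a real inner grid, each feeder's cable rating limits how many turbines it can carry, so a plain MST is better viewed as a starting point for the cost evaluation than as a feasible layout.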
Procedia PDF Downloads 386
698 Path Planning for Unmanned Aerial Vehicles in Constrained Environments for Locust Elimination
Authors: Aadiv Shah, Hari Nair, Vedant Mittal, Alice Cheeran
Abstract:
Present-day agricultural practices such as blanket spraying not only lead to excessive use of pesticides but also harm the overall crop yield. This paper introduces an algorithm to optimize the traversal of an unmanned aerial vehicle (UAV) in constrained environments. The proposed system focuses on the agricultural application of targeted spraying for locust elimination. Given a satellite image of a farm, target zones that are prone to locust swarm formation are detected by calculating the normalized difference vegetation index (NDVI). The optimal path for a UAV through these target zones is then determined using the proposed algorithm, so that pesticide spraying is performed as efficiently as possible. Unlike the classic travelling salesman problem, which involves point-to-point optimization, the proposed algorithm determines an optimal path through multiple regions, independent of their geometry. Finally, the paper explores the idea of using reinforcement learning to model complex environmental behaviour and make the UAV path planning mechanism agnostic to changes in the external environment. This system not only offers a solution to the enormous losses caused by locust attacks but also an efficient way to automate agricultural practices across the globe and improve farmer ergonomics.
Keywords: locust, NDVI, optimization, path planning, reinforcement learning, UAV
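NDVI is computed per pixel as (NIR - Red) / (NIR + Red) from the near-infrared and red bands. The sketch below assumes the two bands are already co-registered 2-D lists of reflectance or digital-number values; the 0.5 cutoff used to flag candidate target zones is a hypothetical placeholder, since the abstract does not give the rule used to mark locust-prone areas.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); 0.0 where both bands are zero."""
    return [
        [(n - r) / (n + r) if (n + r) != 0 else 0.0 for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir, red)
    ]


def target_mask(index, threshold=0.5):
    """Hypothetical rule: flag pixels with NDVI >= threshold as candidate target zones."""
    return [[value >= threshold for value in row] for row in index]


# Toy 2x2 bands (digital numbers, illustrative only).
nir = [[200, 60], [150, 0]]
red = [[40, 60], [50, 0]]
print(target_mask(ndvi(nir, red)))  # [[True, False], [True, False]]
```

The resulting boolean mask would then be grouped into regions that the path planner visits; that region extraction and the path optimization itself are outside this sketch.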
Procedia PDF Downloads 251
697 Optimal Maintenance Clustering for Rail Track Components Subject to Possession Capacity Constraints
Authors: Cuong D. Dao, Rob J.I. Basten, Andreas Hartmann
Abstract:
This paper studies the optimal planning of preventive maintenance and renewal activities for components of a single railway track when the time available for maintenance is limited. The rail-track system consists of several types of components, such as rail, ballast, and switches, with different preventive maintenance and renewal intervals. To perform maintenance or renewal on the track, a train-free period for maintenance, called a possession, is required. Since a major possession directly affects the regular train schedule, maintenance and renewal activities are clustered as much as possible. In a dense, heavily utilized railway network, possession time on the track is critical, since the demand for train operations is very high and a long possession has a severe impact on the regular train schedule. We present an optimization model and investigate the maintenance schedules with and without the possession capacity constraint. In addition, we integrate into the optimization model the socio-economic cost of the maintenance time as part of the variable possession cost. A numerical example is provided to illustrate the model.
Keywords: rail-track components, maintenance, optimal clustering, possession capacity
Procedia PDF Downloads 264
696 Spatial Scale of Clustering of Residential Burglary and Its Dependence on Temporal Scale
Authors: Mohammed A. Alazawi, Shiguo Jiang, Steven F. Messner
Abstract:
Research has long focused on two main spatial aspects of crime: spatial patterns and spatial processes. When analyzing these patterns and processes, a key issue has been determining the proper spatial scale. In addition, it is important to consider the possibility that these patterns and processes might differ appreciably across temporal scales and geographic units of analysis. We examine the spatial-temporal dependence of residential burglary, testing this dependence at varying geographical scales and temporal aggregations. The analyses are based on recorded incidents of crime in Columbus, Ohio, during the 1994-2002 period. We implement point pattern analysis on the crime points using Ripley’s K function. The results indicate that residential burglary clusters at spatial scales relatively larger than the average size of census tracts in the study area. Also, the spatial scale of clustering is independent of the temporal scale. Our findings on the geographic scale of spatial patterns and processes can inform the development of effective crime control policies.
Keywords: inhomogeneous K function, residential burglary, spatial point pattern, spatial scale, temporal scale
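For readers unfamiliar with Ripley's K function, a naive estimator for n points in a study region of area A is K(r) = A / (n(n-1)) times the number of ordered pairs i != j with distance d_ij <= r. Under complete spatial randomness K(r) is close to pi r^2, and larger values indicate clustering at scale r. The sketch below implements this homogeneous form without edge correction; the study itself uses the inhomogeneous K function, which additionally weights pairs by an estimated intensity, so this is illustrative only and the function names are hypothetical.

```python
import math


def ripley_k(points, r, area):
    """Naive Ripley's K at distance r (no edge correction, homogeneous intensity)."""
    n = len(points)
    close_pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= r
    )
    return area * close_pairs / (n * (n - 1))


def ripley_l(points, r, area):
    """The common L transform, sqrt(K / pi); roughly equal to r under spatial randomness."""
    return math.sqrt(ripley_k(points, r, area) / math.pi)


# Four incidents at the corners of a unit square inside a 2 x 2 study window.
incidents = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for r in (0.5, 1.0, 2.0):
    print(r, ripley_k(incidents, r, area=4.0), ripley_l(incidents, r, area=4.0))
```

Evaluating K (or L - r) over a range of r and comparing against simulated random patterns is how a "spatial scale of clustering" is typically read off.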
Procedia PDF Downloads 347
695 Clustering Color Space, Time Interest Points for Moving Objects
Authors: Insaf Bellamine, Hamid Tairi
Abstract:
Detecting moving objects in sequences is an essential step in video analysis. This paper mainly contributes to the extraction and detection of Color Space-Time Interest Points (CSTIP). We propose a new method for detecting moving objects, composed of two main steps. First, we apply the Color Space-Time Interest Points (CSTIP) detection algorithm to both components of the Color Structure-Texture Image Decomposition, which is based on a Partial Differential Equation (PDE): a color geometric structure component and a color texture component. A descriptor is associated with each of these points. In the second stage, we address the problem of grouping the points (CSTIP) into clusters. Experiments and comparisons with other motion detection methods on challenging sequences demonstrate the performance of the proposed method and its utility for video analysis. Experimental results are obtained from very different types of videos, namely sports videos and animated movies.
Keywords: Color Space-Time Interest Points (CSTIP), Color Structure-Texture Image Decomposition, Motion Detection, clustering
Procedia PDF Downloads 379
694 Optical Flow Based System for Cross Traffic Alert
Authors: Giuseppe Spampinato, Salvatore Curti, Ivana Guarneri, Arcangelo Bruna
Abstract:
This paper describes an advanced system and methodology for Cross Traffic Alert (CTA) that can detect vehicles moving into the vehicle's driving path from the left or right side. The camera is assumed to be mounted on a vehicle that is either stationary, e.g. at a traffic light or at an intersection, or moving slowly, e.g. in a car park. In all of these conditions, a driver's brief loss of concentration or distraction can easily lead to a serious accident, and the proposed system is intended as support for avoiding such crashes. It extends our previous work on a clustering system, which only works with fixed cameras. Only a vanishing point calculation and simple optical flow filtering, which removes motion vectors caused by the car's own movement, are performed, allowing the system to achieve high performance across different scenarios, cameras and resolutions. The system uses only the optical flow as input, which is implemented in hardware on the proposed platform; since the whole processing chain is very fast and has low power consumption, it is embedded directly in the camera framework, allowing all processing to be executed in real time.
Keywords: clustering, cross traffic alert, optical flow, real time, vanishing point
Procedia PDF Downloads 203
693 Building User Behavioral Models by Processing Web Logs and Clustering Mechanisms
Authors: Madhuka G. P. D. Udantha, Gihan V. Dias, Surangika Ranathunga
Abstract:
Today, websites contain very interesting applications, but there are only a few methodologies for analyzing user navigation through websites and determining whether a website is being put to correct use. Web logs are used only when a major attack or malfunction occurs, yet they contain a lot of interesting information about users' dealings with the system. Analyzing web logs has become a challenge due to the huge log volume. Finding interesting patterns is not easy because of the size and distribution of the logs and the importance of minor details in each log entry. Web logs contain very important data about users and the site that has not been put to good use. Retrieving interesting information from logs gives an idea of what users need, makes it possible to group users according to their various needs, and helps improve the site to make it more effective and efficient. The model we built is able to detect attacks, system malfunctions, and other anomalies. Logs will become more complex as the volume of traffic and the size and complexity of the website grow. The solution is fully automated and uses unsupervised techniques; expert knowledge is used only for validation. In our approach, the logs are first cleaned and purified to bring them to a common platform with a standard format and structure. After the cleaning module, the web session builder is executed. It outputs two files: a Web Sessions file and an Indexed URLs file. The Indexed URLs file contains the list of URLs accessed and their indices, and the Web Sessions file lists the URL indices of each web session. The DBSCAN and EM algorithms are then applied iteratively and recursively to obtain the best clustering of the web sessions. Using homogeneity, completeness, V-measure, intra- and inter-cluster distance, and the silhouette coefficient as criteria, the algorithms evaluate their own results and select better parameter values for subsequent runs. If a cluster is found to be too large, micro-clustering is applied.
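The session-building step can be sketched as follows. This is a minimal illustration, not the authors' module: it assumes cleaned log entries have already been reduced to (client id, Unix timestamp, URL) tuples and that a session ends after a hypothetical 30-minute inactivity gap. It returns the two outputs described above, an index for each URL and each session as a list of URL indices; the DBSCAN/EM clustering stage is not shown.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # assumption: a 30-minute inactivity gap ends a session


def build_sessions(entries, timeout=SESSION_TIMEOUT):
    """entries: iterable of (client_id, unix_time, url) tuples from cleaned logs.

    Returns (url_index, sessions): url_index maps each URL to an integer index
    (the "Indexed URLs" output); sessions is a list of URL-index sequences
    (the "Web Sessions" output).
    """
    url_index = {}
    by_client = defaultdict(list)
    for client, ts, url in entries:
        by_client[client].append((ts, url))

    sessions = []
    for client in sorted(by_client):
        current, last_ts = [], None
        for ts, url in sorted(by_client[client]):  # chronological order per client
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append(current)  # gap too long: close the session
                current = []
            current.append(url_index.setdefault(url, len(url_index)))
            last_ts = ts
        if current:
            sessions.append(current)
    return url_index, sessions


# Illustrative entries only (hypothetical clients and URLs).
entries = [
    ("10.0.0.1", 0, "/home"),
    ("10.0.0.1", 60, "/products"),
    ("10.0.0.2", 30, "/home"),
    ("10.0.0.1", 4000, "/home"),
]
url_index, sessions = build_sessions(entries)
print(url_index, sessions)  # {'/home': 0, '/products': 1} [[0, 1], [0], [0]]
```

The integer-sequence form is what makes the later steps convenient: sessions can be turned into feature vectors for clustering, and subsequences can be mined as candidate signatures.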
Using the Cluster Signature Module, the clusters are annotated with a unique signature called a fingerprint. In this module, each cluster is fed to the Associative Rule Learning Module. If this module outputs a confidence and support of 1 for an access sequence, that sequence is a potential signature for the cluster. The occurrences of the access sequence are then checked in the other clusters; if it is found to be unique to the cluster under consideration, the cluster is annotated with that signature. These signatures are used for anomaly detection, cyber-attack prevention, real-time dashboards that visualize users accessing web pages, prediction of user actions, and various other applications on finance, university, and news and media websites.
Keywords: anomaly detection, clustering, pattern recognition, web sessions
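The signature test can be approximated as below. As a simplifying assumption, access sequences are taken to be consecutive page-to-page transitions (bigrams of URL indices), and only the support condition is checked: a bigram is a signature candidate if it occurs in every session of a cluster, and it is kept only if it appears in no session of any other cluster. The confidence computation of the Associative Rule Learning Module is not reproduced, and the function names are hypothetical.

```python
def bigrams(session):
    """Consecutive page-to-page transitions in one session of URL indices."""
    return {tuple(session[i:i + 2]) for i in range(len(session) - 1)}


def cluster_signatures(clusters):
    """clusters: dict mapping cluster id -> list of sessions (lists of URL indices).

    Returns cluster id -> sorted list of bigrams that occur in every session of
    that cluster (support 1) and in no session of any other cluster (unique).
    """
    signatures = {}
    for cid, sessions in clusters.items():
        # Support 1 inside the cluster: transitions shared by all its sessions.
        common = set.intersection(*(bigrams(s) for s in sessions)) if sessions else set()
        # Transitions seen anywhere in the other clusters.
        others = set()
        for other_id, other_sessions in clusters.items():
            if other_id != cid:
                for s in other_sessions:
                    others |= bigrams(s)
        signatures[cid] = sorted(common - others)
    return signatures


# Illustrative clusters of sessions (URL indices only).
clusters = {"A": [[0, 1, 2], [3, 1, 2]], "B": [[0, 1], [4, 5]]}
print(cluster_signatures(clusters))  # {'A': [(1, 2)], 'B': []}
```

A cluster with an empty list, like "B" here, simply receives no fingerprint under this rule.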
Procedia PDF Downloads 288
692 Geographic Legacies for Modern Day Disease Research: Autism Spectrum Disorder as a Case-Control Study
Authors: Rebecca Richards Steed, James Van Derslice, Ken Smith, Richard Medina, Amanda Bakian
Abstract:
Elucidating gene-environment interactions for heritable disease outcomes is an emerging area of disease research, with genetic studies informing hypotheses about the environment and gene interactions underlying some of the most confounding diseases of our time, such as autism spectrum disorder (ASD). Geography has thus far played a key role in identifying environmental factors contributing to disease, but its use can be broadened to include genetic and environmental factors that have a synergistic effect on disease. By combining family pedigrees and disease outcomes with life-course residential histories, space-time clustering of generations at critical developmental windows can further our understanding of (1) environmental factors that contribute to disease patterns in families, (2) the critical developmental windows most susceptible to environmental influence, and (3) which of those windows are most likely to lead to an ASD diagnosis. This paper introduces a retrospective case-control study that uses pedigree data, health data, and residential life-course location points to find space-time clusters of ancestors of a grandchild or child with a clinical diagnosis of ASD. Space-time clusters of ancestors at critical developmental windows serve as a proxy for shared environmental exposures. The authors refer to geographic life-course exposures as geographic legacies. Identifying space-time clusters of ancestors creates a bridge for researching exposures of past generations that may affect the health of modern-day progeny. The space-time cluster analysis shows multiple clusters for the maternal and paternal pedigrees. The paternal grandparent pedigree showed the most space-time clustering for the birth and childhood developmental windows. No statistically significant clustering was found for the adolescent years. These results will be studied further to identify the specific shared space-time environmental exposures.
In conclusion, this study found significant space-time clusters of parents and grandparents in both the maternal and paternal lineages. These results will be used to identify which environmental exposures family members shared at critical developmental windows, and additional analyses will be applied.
Keywords: family pedigree, environmental exposure, geographic legacy, medical geography, transgenerational inheritance
Procedia PDF Downloads 116
691 The Role of Metaheuristic Approaches in Engineering Problems
Authors: Ferzat Anka
Abstract:
Many types of problems can be solved using traditional analytical methods. However, these methods can take a long time and use resources inefficiently. In particular, different approaches may be required to solve the complex and global engineering problems that we frequently encounter in real life. The bigger and more complex a problem, the harder it is to solve; many such problems are classified as NP-hard (non-deterministic polynomial-time hard) in the literature. The main reasons for recommending metaheuristic algorithms for various problems are their simple concepts, simple mathematical equations and structures, derivative-free mechanisms, ability to avoid local optima, and fast convergence. They are also flexible, as they can be applied to different problems without very specific modifications. Thanks to these features, they can easily be embedded even in many hardware devices. Accordingly, this approach can also be used in trending application areas such as IoT, big data, and parallel structures. Indeed, metaheuristic approaches are algorithms that return near-optimal results for large-scale optimization problems. This study focuses on a new metaheuristic method that has been merged with a chaotic approach. It is based on chaos theory and helps the relevant algorithms improve population diversity and converge quickly. This approach is based on the Chimp Optimization Algorithm (ChOA), a recently introduced nature-inspired metaheuristic algorithm. This algorithm identifies four types of chimpanzee groups (attacker, barrier, chaser, and driver) and proposes a suitable mathematical model for them based on the various intelligence and sexual motivations of chimpanzees. However, this algorithm converges slowly and has difficulty escaping local optima when solving high-dimensional problems.
Although it and some of its variants use strategies to overcome these problems, these strategies have been observed to be insufficient. Therefore, this study describes a new extended variant. In the algorithm, called Ex-ChOA, hybrid models are proposed for the position updates of search agents, and a dynamic switching mechanism is provided for the transitions between phases. This flexible structure solves the slow convergence problem of ChOA and improves its accuracy on multidimensional problems, with the aim of successfully solving global, complex, and constrained problems. The main contributions of this study are: 1) it improves the accuracy of ChOA and solves its slow convergence problem; 2) it proposes new hybrid movement strategy models for the position updates of search agents; 3) it succeeds in solving global, complex, and constrained problems; and 4) it provides a dynamic switching mechanism between phases. The performance of Ex-ChOA is analyzed on 8 benchmark functions as well as 2 classical constrained engineering problems. The proposed algorithm is compared with ChOA and several well-known variants (Weighted-ChOA, Enhanced-ChOA). In addition, the Improved Grey Wolf Optimizer (I-GWO) is included in the comparison because its working model is similar. The results show that the proposed algorithm performs better than or equivalently to the compared algorithms.
Keywords: optimization, metaheuristic, chimp optimization algorithm, engineering constrained problems
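The abstract does not specify which chaotic map Ex-ChOA uses. As a generic illustration of how chaos is commonly injected into metaheuristics to diversify the population, the sketch below initializes search agents from the logistic map x_{k+1} = mu * x_k * (1 - x_k) with mu = 4 instead of a uniform random generator. The function name, seed value, and bounds are hypothetical, and the ChOA position-update rules are not shown.

```python
def logistic_map_population(n_agents, dim, lower, upper, x0=0.7, mu=4.0):
    """Initialise an n_agents x dim population from a logistic chaotic sequence.

    x_{k+1} = mu * x_k * (1 - x_k); with mu = 4 and a seed such as 0.7 the
    sequence is chaotic on (0, 1). Seeds 0, 0.25, 0.5, 0.75 and 1 must be avoided
    because they fall onto fixed points. Each value is scaled into [lower, upper].
    """
    x = x0
    population = []
    for _ in range(n_agents):
        agent = []
        for _ in range(dim):
            x = mu * x * (1.0 - x)                    # next chaotic value in [0, 1]
            agent.append(lower + (upper - lower) * x)  # map into the search bounds
        population.append(agent)
    return population


# Five agents in a 3-D search space bounded by [-10, 10] (illustrative only).
for agent in logistic_map_population(5, 3, -10.0, 10.0):
    print([round(v, 3) for v in agent])
```

Unlike a pseudo-random generator, the sequence is deterministic for a given seed, which makes runs reproducible; whether it actually improves diversity for a given problem has to be checked empirically.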
Procedia PDF Downloads 77
690 Decision Support System in Air Pollution Using Data Mining
Authors: E. Fathallahi Aghdam, V. Hosseini
Abstract:
Environmental pollution is not limited to a specific region or country; that is why sustainable development, as a necessary process for improvement, pays attention to issues such as the destruction of natural resources, degradation of biological systems, global pollution, and climate change, especially in developing countries. According to the World Health Organization, Tehran, the capital of Iran and a developing city, is one of the most air-polluted cities in the world. In this study, three pollutants, namely particulate matter smaller than 10 microns, nitrogen oxides, and sulfur dioxide, were evaluated in Tehran using data mining techniques following the CRISP approach. Data from 21 air pollution monitoring stations in different areas of Tehran were collected from 1999 to 2013. The commercial software Clementine was used for this study, and Tehran was divided into distinct clusters according to these pollutants. Because clustering is usually used as a prelude to other analyses, the similarity of the clusters was then evaluated by analyzing local conditions, traffic behavior, and industrial activities. The results of this research can support decision-making systems, help managers improve performance and decision making, and assist in urban studies.
Keywords: data mining, clustering, air pollution, crisp approach
Procedia PDF Downloads 428
689 Identification of Disease Causing DNA Motifs in Human DNA Using Clustering Approach
Authors: G. Tamilpavai, C. Vishnuppriya
Abstract:
Studying DNA (deoxyribonucleic acid) sequences is useful for understanding biological processes and is applied in fields such as diagnostic and forensic research. DNA carries the hereditary information in humans and almost all other organisms and is passed on to subsequent generations. Early detection of defective DNA sequences may lead to many developments in the field of bioinformatics. Nowadays, various tedious techniques are used to identify defective DNA. The proposed work analyzes a given sequence and identifies cancer-causing DNA motifs in it. First, the human DNA sequence is separated into k-mers using a k-mer separation rule. The resulting k-mers are clustered using a Self-Organizing Map (SOM). Using the Levenshtein distance measure, the cancer-associated DNA motif is identified from the k-mer clusters. The experimental results indicate the presence or absence of the cancer-causing DNA motif: if the cancer-associated motif is found, the input is declared a cancer-causing DNA sequence; otherwise, the input human DNA is declared a normal sequence. Finally, the elapsed time for detecting the cancer-causing motif with the clustering approach is measured and compared with that of the normal (non-clustered) process. Locating the cancer-associated motif is easier with the clustering approach than with the other process. The proposed work is intended as an initial aid for research on genetic diseases.
Keywords: bioinformatics, cancer motif, DNA, k-mers, Levenshtein distance, SOM
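The k-mer and edit-distance steps can be sketched as follows. The SOM clustering stage is omitted, so every k-mer is compared with the motif directly, roughly corresponding to the unclustered process mentioned above. The motif `ACGT` and the `max_distance` tolerance are placeholders, not real cancer-associated motifs or the authors' matching rule.

```python
def kmers(sequence, k):
    """Split a DNA sequence into overlapping k-mers (sliding window, step 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via row-wise dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def contains_motif(sequence, motif, max_distance=0):
    """True if any k-mer (k = len(motif)) is within max_distance edits of the motif."""
    return any(levenshtein(km, motif) <= max_distance
               for km in kmers(sequence, len(motif)))


print(kmers("ACGTAC", 3))                                   # ['ACG', 'CGT', 'GTA', 'TAC']
print(contains_motif("TTGACGTT", "ACGT"))                   # True
print(contains_motif("TTGACCTT", "ACGT", max_distance=1))  # True
```

Clustering the k-mers first, as the abstract describes, would let the search skip clusters whose representatives are far from the motif instead of scanning every k-mer.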
Procedia PDF Downloads 188