Search results for: space-time clustering analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 27123

26883 Predicting Open Chromatin Regions in Cell-Free DNA Whole Genome Sequencing Data by Correlation Clustering  

Authors: Fahimeh Palizban, Farshad Noravesh, Amir Hossein Saeidian, Mahya Mehrmohamadi

Abstract:

In the last decade, the emergence of liquid biopsy has significantly improved cancer monitoring and detection. Dying cells, including those originating from tumors, shed their DNA into the blood and contribute to a pool of circulating fragments called cell-free DNA (cfDNA). Accordingly, identifying the tissue of origin of these DNA fragments in plasma can lead to faster and more accurate disease diagnosis and more precise treatment protocols. Open chromatin regions (OCRs) are important epigenetic features of DNA that reflect the cell types of origin. Profiling these features by DNase-seq, ATAC-seq, and histone ChIP-seq provides insights into tissue-specific and disease-specific regulatory mechanisms. Several studies in cancer liquid biopsy have integrated distinct genomic and epigenomic features for early cancer detection and tissue-of-origin detection. However, multimodal analysis requires several types of experiments to cover the genomic and epigenomic aspects of a single sample, which is costly and time-consuming. To overcome these limitations, predicting OCRs from whole genome sequencing (WGS) data is of particular importance. We therefore propose a computational approach that predicts open chromatin regions, an important epigenetic feature, from cell-free DNA whole genome sequencing data. Local sequencing depth is fed to the proposed algorithm, which then predicts the most probable open chromatin regions. Our method combines signal processing with sequencing depth data and includes count normalization, Discrete Fourier Transform conversion, graph construction, graph cut optimization by linear programming, and clustering.
To validate the proposed method, we compared the output of the clustering (open chromatin region+, open chromatin region-) with previously validated open chromatin regions from human blood samples in the ATAC-DB database. The overlap between the predicted open chromatin regions and the regions experimentally validated by ATAC-seq in ATAC-DB is greater than 67%, which indicates a meaningful prediction. Because OCRs are mostly located at the transcription start sites (TSS) of genes, we also compared the predicted OCRs with human gene TSS regions obtained from refTSS; the concordance was about 52.04% for all genes and ~78% for housekeeping genes. Accurately detecting open chromatin regions from plasma cell-free DNA-seq data is a challenging computational problem because of several confounding factors, such as technical and biological variation. Although this approach is in its infancy, it has already been attempted in a tool named OCRDetector, which has some restrictions, such as the need for high-depth cfDNA WGS data, prior information about the OCR distribution, and the use of multiple features. In contrast, we implemented an unsupervised graph signal clustering based on a single depth feature, which resulted in faster performance and decent accuracy. Overall, we investigated the epigenomic pattern of a cell-free DNA sample from a new computational perspective that can be used alongside other tools to study the genetic and epigenetic aspects of a single whole genome sequencing dataset for efficient liquid biopsy analysis.
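The processing chain listed above (count normalization, DFT conversion, graph construction, clustering) can be illustrated with a toy sketch. It assumes depth has already been binned per candidate region, represents each region by the DFT magnitude spectrum of its normalized depth profile, links regions whose spectra are correlated, and, in place of the paper's linear-programming graph cut, uses the simpler greedy pivot heuristic for correlation clustering. All function names, the threshold, and the toy profiles are hypothetical.

```python
import cmath
import math

# Illustrative sketch only: names, threshold and the pivot heuristic are assumptions,
# not the authors' implementation (which uses an LP-based graph cut).


def normalize_counts(depths):
    """Scale raw per-position read depths so they sum to 1 (library-size normalization)."""
    total = sum(depths)
    return [d / total for d in depths] if total else [0.0] * len(depths)


def spectrum(signal):
    """DFT magnitudes for frequencies 1..n//2 (DC term dropped), via a direct O(n^2) DFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(1, n // 2 + 1)
    ]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def pivot_correlation_clustering(similar):
    """Greedy pivot heuristic: each still-unassigned node becomes a pivot and absorbs
    every unassigned node joined to it by a positive edge."""
    labels = [-1] * len(similar)
    next_label = 0
    for pivot in range(len(similar)):
        if labels[pivot] != -1:
            continue
        for j in range(len(similar)):
            if labels[j] == -1 and (j == pivot or similar[pivot][j]):
                labels[j] = next_label
        next_label += 1
    return labels


def cluster_regions(region_depths, threshold=0.5):
    """Cluster genomic regions by the shape of their normalized depth spectra."""
    features = [spectrum(normalize_counts(d)) for d in region_depths]
    n = len(features)
    # Positive edge when two regions' spectra are strongly correlated.
    similar = [[pearson(features[i], features[j]) > threshold for j in range(n)] for i in range(n)]
    return pivot_correlation_clustering(similar)
```

Dropping the DC term makes the features depend on the shape of the coverage profile rather than on overall depth. A real pipeline would use an FFT library and much longer windows.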

Keywords: open chromatin regions, cancer, cell-free DNA, epigenomics, graph signal processing, correlation clustering

26882 The Analyzer: Clustering Based System for Improving Business Productivity by Analyzing User Profiles to Enhance Human Computer Interaction

Authors: Dona Shaini Abhilasha Nanayakkara, Kurugamage Jude Pravinda Gregory Perera

Abstract:

E-commerce platforms have revolutionized the shopping experience, offering consumers convenient ways to make purchases. To improve interactions with customers and optimize marketing strategies, businesses need to understand user behavior, preferences, and needs on these platforms. This paper focuses on helping businesses customize interactions with users based on their behavioral patterns, leveraging data-driven analysis and machine learning techniques. Businesses can improve engagement and boost the adoption of e-commerce platforms by aligning behavioral patterns with users' goals of usability and satisfaction. We propose TheAnalyzer, a clustering-based system designed to enhance business productivity by analyzing user profiles and improving human-computer interaction. TheAnalyzer integrates with business applications, collecting relevant data points from users' natural interactions without additional burdens such as questionnaires or surveys. It defines five key user analytics as features for its dataset, all of which are easily captured through users' interactions with e-commerce platforms. This research presents a study demonstrating that users can be successfully separated into specific groups based on the five key analytics considered by TheAnalyzer. With the assistance of domain experts, customized business rules can be attached to each group, enabling TheAnalyzer to influence business applications and provide an enhanced, personalized user experience. The outcomes are evaluated quantitatively and qualitatively, demonstrating that TheAnalyzer's capabilities can optimize business outcomes, enhance customer satisfaction, and drive sustainable growth. The findings of this research contribute to the advancement of personalized interactions on e-commerce platforms.
By leveraging user behavioral patterns and analyzing both new and existing users, businesses can effectively tailor their interactions to improve customer satisfaction and loyalty and, ultimately, drive sales.
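The abstract does not name the five analytics or the clustering algorithm. As a hedged illustration of the standardize-then-cluster idea (the keywords mention data standardization), the sketch below assumes five numeric per-user features, z-score standardizes each column, and groups users with plain k-means. The feature meanings and function names are hypothetical.

```python
import math
import statistics

# Illustrative sketch: the five features and the choice of k-means are assumptions,
# not TheAnalyzer's documented design.


def standardize(rows):
    """Z-score each column so features on different scales contribute equally."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm, using the first k points as initial centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [statistics.fmean(dim) for dim in zip(*members)]
    return labels, centroids
```

For example, rows such as `[sessions_per_week, avg_session_minutes, pages_per_session, cart_adds, purchases]` (hypothetical columns) would separate casual from highly engaged users; business rules could then be attached per cluster label.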

Keywords: data clustering, data standardization, dimensionality reduction, human computer interaction, user profiling

26881 A Local Tensor Clustering Algorithm to Annotate Uncharacterized Genes with Many Biological Networks

Authors: Paul Shize Li, Frank Alber

Abstract:

A fundamental task of clinical genomics is to unravel the functions of genes and their associations with disorders. Although experimental biology has worked over the past decades to discover and elucidate the molecular mechanisms of individual genes, about 40% of human genes still have unknown functions, let alone known disease associations. Biologists interested in a particular gene of unknown function therefore need a powerful computational method tailored to inferring the functions and disease relevance of uncharacterized genes. Studies have shown that genes strongly linked to each other in multiple biological networks are more likely to have similar functions. This indicates that densely connected subgraphs shared across multiple biological networks are useful for the functional and phenotypic annotation of uncharacterized genes. Therefore, in this work, we developed an integrative network approach to identify frequent local clusters, defined as densely connected subgraphs that occur frequently in multiple biological networks and contain a query gene with few or no disease or function annotations. This local clustering algorithm models multiple biological networks sharing the same gene set as a three-dimensional array, the so-called tensor, and employs a tensor-based optimization method to find the frequent local clusters efficiently. Specifically, massive public gene expression data sets that comprehensively cover dynamic, physiological, and environmental conditions are used to generate hundreds of gene co-expression networks. By integrating these networks, the proposed method can identify the frequent local clusters that contain a given uncharacterized gene of interest to a biologist. Finally, these frequent local clusters are used to annotate the function and disease relevance of the uncharacterized gene.
This local tensor clustering algorithm outperformed the competing tensor-based algorithm in both module discovery and running time. We also demonstrated the proposed method on real data from hundreds of gene co-expression networks and showed that it can comprehensively characterize the query gene. This study therefore provides a new tool for annotating uncharacterized genes, with great potential to assist clinical genomic diagnostics.
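The abstract does not give the tensor optimization itself. As a rough, hypothetical stand-in for the idea of a "frequent local cluster", the sketch below stacks the networks as a gene × gene × network tensor (`tensor[k][i][j]`) and greedily grows a cluster around the query gene, adding a gene only if the cluster stays dense (edge density at least `min_density`) in as many networks as before. This greedy heuristic and all names are assumptions, not the published algorithm.

```python
# Illustrative greedy heuristic (assumption): not the authors' tensor-based optimization.


def density(adj, nodes):
    """Fraction of possible edges present among `nodes` in one network."""
    nodes = sorted(nodes)
    pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
    return sum(adj[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0


def supporting_networks(tensor, nodes, min_density):
    """Indices of networks (tensor slices) in which `nodes` form a dense subgraph."""
    if len(nodes) < 2:
        return list(range(len(tensor)))  # a lone gene is trivially supported everywhere
    return [k for k, adj in enumerate(tensor) if density(adj, nodes) >= min_density]


def frequent_local_cluster(tensor, query, min_density=0.6, max_size=10):
    """Grow a cluster containing `query` while it stays dense in as many networks."""
    n_genes = len(tensor[0])
    cluster = {query}
    while len(cluster) < max_size:
        current = len(supporting_networks(tensor, cluster, min_density))
        best_gene, best_support = None, -1
        for g in range(n_genes):
            if g in cluster:
                continue
            s = len(supporting_networks(tensor, cluster | {g}, min_density))
            if s > best_support:
                best_gene, best_support = g, s
        # Stop when every possible extension would be dense in fewer networks.
        if best_gene is None or best_support < current:
            break
        cluster.add(best_gene)
    return sorted(cluster), supporting_networks(tensor, cluster, min_density)
```

The returned support list indicates in which co-expression networks the cluster recurs, which is the quantity a "frequent" local cluster is meant to maximize.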

Keywords: local tensor clustering, query gene, gene co-expression network, gene annotation

26880 Analysis of Ozone Episodes in the Forest and Vegetation Areas with Using HYSPLIT Model: A Case Study of the North-West Side of Biga Peninsula, Turkey

Authors: Deniz Sari, Selahattin İncecik, Nesimi Ozkurt

Abstract:

Surface ozone, named as one of the most critical pollutants of the 21st century, threatens human health, forests, and vegetation. In rural areas specifically, surface ozone significantly affects agricultural production and trees. In this study, to understand surface ozone levels in rural areas, we focus on the north-western side of the Biga Peninsula, which is covered by mountainous and forested terrain. Ozone concentrations were measured for the first time in this rural area, using passive sampling at 10 sites and two online monitoring stations, between 2013 and 2015. The AOT40 cumulative index (Accumulated hourly O3 concentrations Over a Threshold of 40 ppb) was calculated from daytime hourly O3 measurements (08:00–20:00) exceeding 40 ppb, accumulated over three months (May, June, and July) for agricultural crops and over six months (April to September) for forest trees. AOT40 is defined by EU Directive 2008/50/EC to evaluate whether ozone pollution is a risk for vegetation, and it is calculated from hourly ozone concentrations recorded by monitoring systems. In the present study, we performed trajectory analysis with the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model, developed by NOAA-ARL, to trace the long-range transport sources contributing to high ozone levels in the region during the ozone episodes observed between 2013 and 2015. In addition, cluster analysis was used to identify homogeneous groups of air mass transport patterns by grouping trajectories with similar air mass movement. Backward trajectories produced by the HYSPLIT model for the three years were assigned to clusters according to their speed and direction using the k-means clustering algorithm. According to the cluster analysis, northerly flows into the study area cause high ozone levels in the region.
The results show that ozone levels in the study area are above the critical levels for forests and vegetation set by EU Directive 2008/50/EC.
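The AOT40 index described above is a simple accumulation: AOT40 = sum over the selected daylight hours h of max(C_h − 40 ppb, 0), in ppb·h. A minimal sketch, assuming each hourly mean is labelled by its starting hour and that "08:00–20:00" means start hours 8 through 19; the Directive itself states the index in (µg/m³)·h with an 80 µg/m³ threshold, which corresponds to 40 ppb. Function and constant names are hypothetical.

```python
from datetime import datetime

# Illustrative sketch (assumed conventions): ppb units, readings labelled by start hour.
THRESHOLD_PPB = 40.0
CROP_MONTHS = {5, 6, 7}  # May-July (agricultural crops)
FOREST_MONTHS = set(range(4, 10))  # April-September (forest trees)


def aot40(readings, months, start_hour=8, end_hour=20):
    """Accumulated hourly O3 exceedance over 40 ppb, in ppb*h.

    readings: iterable of (datetime, ozone_ppb) hourly means.
    months:   month numbers to include, e.g. CROP_MONTHS.
    Only hours with start_hour <= hour < end_hour (daylight) are counted.
    """
    total = 0.0
    for ts, ppb in readings:
        if ts.month in months and start_hour <= ts.hour < end_hour:
            total += max(ppb - THRESHOLD_PPB, 0.0)  # only the excess above 40 ppb counts
    return total


example = [(datetime(2014, 6, 1, 9), 55.0), (datetime(2014, 6, 1, 12), 38.0)]
print(aot40(example, CROP_MONTHS))  # 15.0: only the 55 ppb hour exceeds the threshold
```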

Keywords: AOT40, Biga Peninsula, HYSPLIT, surface ozone

26879 An Integrated Label Propagation Network for Structural Condition Assessment

Authors: Qingsong Xiong, Cheng Yuan, Qingzhao Kong, Haibei Xiong

Abstract:

Deep-learning approaches based on vibration responses have attracted growing attention for rapid structural condition assessment, but obtaining sufficient measured training data with corresponding labels is relatively costly, or even impossible, in practical engineering. This study proposes an integrated label propagation network for structural condition assessment, which propagates labels from the continuously generated measurements of the intact structure to the unlabeled measurements of damage scenarios. The integrated network combines damage-sensitive feature extraction by a deep autoencoder with pseudo-label propagation by optimized fuzzy clustering; its architecture and mechanism are described in detail. Through careful network design and specific performance-improving strategies, the network combines the advantages of self-supervised representation learning, unsupervised fuzzy clustering, and supervised classification into an integrated framework for assessing damage conditions. Both numerical simulations and full-scale laboratory shaking table tests of a two-story building structure were conducted to validate its capability of detecting post-earthquake damage. The identification accuracy of the network was 0.95 in the numerical validations and 0.86 on average in the laboratory case studies. Notably, training all models in the network does not rely on any labeled data from damage scenarios, only on several samples of the intact structure, which indicates significant advantages in model adaptability and practical applicability.
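The abstract names "optimized fuzzy clustering" for pseudo-labelling without giving details. The sketch below assumes standard fuzzy c-means (fuzzifier m = 2) applied to already-extracted feature vectors, such as autoencoder codes, and assigns a pseudo-label only when a sample's largest membership exceeds a confidence threshold. The threshold, the deterministic initialization, and all names are hypothetical.

```python
import math

# Illustrative sketch: plain fuzzy c-means plus confidence-thresholded pseudo-labels.
# This is an assumed stand-in, not the paper's optimized fuzzy clustering.


def fuzzy_c_means(points, c, m=2.0, iterations=100, eps=1e-9):
    """Standard fuzzy c-means. Returns (centers, memberships[j][i])."""
    centers = [list(p) for p in points[:c]]  # deterministic init: first c points
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in points:
            d = [max(math.dist(x, ctr), eps) for ctr in centers]
            # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
            memberships.append(
                [1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c)) for i in range(c)]
            )
        for i in range(c):
            # Centers are membership^m weighted means of the points.
            w = [u[i] ** m for u in memberships]
            total = sum(w)
            centers[i] = [
                sum(wj * x[dim] for wj, x in zip(w, points)) / total for dim in range(len(points[0]))
            ]
    return centers, memberships


def pseudo_labels(memberships, threshold=0.8):
    """Dominant cluster index if its membership is confident enough, otherwise None."""
    labels = []
    for u in memberships:
        best = max(range(len(u)), key=u.__getitem__)
        labels.append(best if u[best] >= threshold else None)
    return labels
```

Ambiguous samples stay unlabelled rather than receiving a noisy label. Cluster indices would still have to be mapped to condition classes, for example by treating the cluster that contains the intact-structure samples as "intact".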

Keywords: autoencoder, condition assessment, fuzzy clustering, label propagation

26878 The Application of Video Segmentation Methods for the Purpose of Action Detection in Videos

Authors: Nassima Noufail, Sara Bouhali

Abstract:

In this work, we develop a semi-supervised solution for action detection in videos and propose an efficient algorithm for video segmentation. The approach is divided into video segmentation, feature extraction, and classification. In the first part, a video is segmented into clips using the k-means algorithm, with the goal of finding groups of similar content in the video. Applying k-means clustering to all frames is time-consuming; therefore, we first identified transition frames, where the scene changes significantly, and then applied k-means clustering to these transition frames. We used two image filters, the Gaussian filter and the Laplacian of Gaussian, each of which extracts a set of features from the frames. The Gaussian filter blurs the image and removes higher frequencies, and the Laplacian of Gaussian detects regions of rapid intensity change; the vector of filter responses was then used as input to our k-means algorithm. The output is a set of cluster centers. Each video frame pixel is then mapped to the nearest cluster center and painted with a corresponding color to form a visual map in which similar pixels are grouped. We then computed a cluster score indicating how close the clusters are to each other and plotted a signal of clustering score versus frame number. Our hypothesis was that the signal would not change while semantically related events were happening in the scene. We marked the breakpoints at which the root mean square (RMS) level of the signal changes significantly; each breakpoint indicates the beginning of a new video segment. In the second part, for each segment from part 1, we randomly selected a 16-frame clip and extracted spatiotemporal features with a pre-trained convolutional 3D network (C3D) for every 16 frames.
The final C3D output is a 512-dimensional feature vector, so we used principal component analysis (PCA) for dimensionality reduction. The final part is classification: the C3D feature vectors are used to train a multi-class linear support vector machine (SVM), which is then used to detect the action. We evaluated our approach on the UCF101 dataset, which consists of 101 human action categories, and achieved an accuracy that outperforms the state of the art by 1.2%.
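The breakpoint step (marking where the RMS level of the clustering-score signal changes significantly) can be approximated with sliding windows. In this assumed, simplified variant, each time index gets a change score equal to the ratio of the RMS of the next `window` samples to the RMS of the previous `window` samples (larger over smaller), and a breakpoint is kept where that score exceeds `ratio` and is the first local peak in its neighbourhood. The window length, ratio, and names are hypothetical parameters, not values from the paper.

```python
import math

# Illustrative sketch: sliding-window RMS change detection (assumed parameters).


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def rms_breakpoints(signal, window=5, ratio=2.0):
    """Indices where the RMS level of `signal` changes by more than `ratio`."""
    scores = {}
    for t in range(window, len(signal) - window + 1):
        before = rms(signal[t - window:t])
        after = rms(signal[t:t + window])
        lo, hi = sorted((before, after))
        scores[t] = hi / lo if lo > 0 else (math.inf if hi > 0 else 1.0)
    breaks = []
    for t, s in scores.items():
        if s <= ratio:
            continue
        neighbours = [scores[u] for u in range(t - window, t + window + 1) if u in scores]
        # Keep t only if it is the first peak of the change score in its neighbourhood.
        if s == max(neighbours) and all(scores[u] < s for u in range(t - window, t) if u in scores):
            breaks.append(t)
    return breaks
```

Each returned index would mark the first frame of a new segment; in practice the window length should be tuned to the frame rate.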

Keywords: video segmentation, action detection, classification, Kmeans, C3D

26877 An Infinite Mixture Model for Modelling Stutter Ratio in Forensic Data Analysis

Authors: M. A. C. S. Sampath Fernando, James M. Curran, Renate Meyer

Abstract:

Forensic DNA analysis has received much attention over the last three decades because of its great usefulness in human identification. The statistical interpretation of DNA evidence is recognised as one of the most mature fields in forensic science. Peak heights in an electropherogram (EPG) are approximately proportional to the amount of template DNA in the original sample being tested. A stutter is a minor peak in an EPG that is not an allele of a potential contributor; it is considered an artefact, presumed to arise from miscopying or slippage during PCR. Stutter peaks are mostly analysed in terms of the stutter ratio, which is calculated relative to the height of the corresponding parent allele. Analysis of mixture profiles has always been problematic in evidence interpretation, especially in the presence of PCR artefacts such as stutters. Unlike binary and semi-continuous models, continuous models assign a probability (as a continuous weight) to each possible genotype combination and make much fuller use of continuous peak height information, resulting in more efficient and reliable interpretations. Therefore, a sound methodology for distinguishing between stutters and real alleles is essential for accurate interpretation, and any such method must be able to model stutter peaks. Bayesian nonparametric methods provide increased flexibility in applied statistical modelling. Mixture models are frequently employed as fundamental data analysis tools for clustering and classification, and they assume that the data arise from unidentified heterogeneous sources. In model-based clustering, each unknown source is represented by a cluster, and the clusters are modelled using parametric models. Specifying the number of components in finite mixture models, however, is difficult in practice, even though the calculations are relatively simple.
Infinite mixture models, in contrast, do not require the user to specify the number of components. Instead, a Dirichlet process, an infinite-dimensional generalization of the Dirichlet distribution, is used to handle the unknown number of components. The Chinese restaurant process (CRP), the stick-breaking process, and the Pólya urn scheme are frequently used to represent Dirichlet process priors in Bayesian mixture models. In this study, we present an infinite mixture of simple linear regression models for modelling the stutter ratio and introduce some modifications to overcome weaknesses associated with the CRP.
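To make the Chinese restaurant process concrete: after n customers are seated, the next customer joins existing table k with probability n_k / (n + α) or opens a new table with probability α / (n + α), so the number of clusters grows with the data instead of being fixed in advance. The sketch below only draws cluster assignments from this prior; it does not include the regression likelihood or the authors' modifications, and the concentration α and random seed are arbitrary choices.

```python
import random

# Illustrative sketch: sampling cluster assignments from a CRP prior only.


def chinese_restaurant_process(n_customers, alpha, rng=None):
    """Draw table (cluster) assignments from a CRP with concentration alpha."""
    rng = rng or random.Random(0)
    assignments = []
    table_sizes = []
    for _ in range(n_customers):
        # Existing table k has weight n_k; a new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)  # open a new table (new mixture component)
        else:
            table_sizes[k] += 1
        assignments.append(k)
    return assignments, table_sizes
```

Larger α favours more, smaller clusters; in a full infinite mixture of regressions, each table would also carry its own regression parameters for the stutter ratio.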

Keywords: Chinese restaurant process, Dirichlet prior, infinite mixture model, PCR stutter

26876 Automatic Detection of Proliferative Cells in Immunohistochemically Images of Meningioma Using Fuzzy C-Means Clustering and HSV Color Space

Authors: Vahid Anari, Mina Bakhshi

Abstract:

In pathology laboratories, immunohistochemically stained meningioma tissue is searched and identified visually by hand to detect and diagnose the type of meningioma. This task is tedious and time-consuming. Moreover, because of the complex nature of cells, segmenting cells from the background and analyzing them automatically remains challenging. In this paper, we develop and test a computerized scheme that automatically identifies cells in microscopic images of meningioma and classifies them as positive (proliferative) or negative (normal). A dataset of 150 images is used to test the scheme. The scheme uses the fuzzy c-means algorithm as a color clustering method in the hue, saturation, value (HSV) color space. Since the cells are distinguishable by the human eye, the accuracy and stability of the algorithm are quantitatively compared through application to a wide variety of real images.
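Before color clustering, pixels must be mapped from RGB to HSV; Python's standard `colorsys.rgb_to_hsv` does this for values in [0, 1]. Because hue is an angle (hues 0 and 1 are both red), the sketch below, as an assumption rather than the authors' stated encoding, represents hue as a point on the unit circle. This way a Euclidean clustering method such as fuzzy c-means does not treat reddish hues near 0 and near 1 as far apart. The function name is hypothetical.

```python
import colorsys
import math

# Illustrative sketch: the (cos, sin) hue encoding is an assumption, not the paper's method.


def hsv_features(rgb_pixels):
    """Convert 8-bit RGB pixels to HSV feature vectors for Euclidean clustering.

    Hue is encoded as (cos, sin) of its angle so hues near 0 and 1 stay close;
    saturation and value are kept in [0, 1].
    """
    features = []
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        angle = 2.0 * math.pi * h
        features.append((math.cos(angle), math.sin(angle), s, v))
    return features
```

The resulting 4-D vectors could be passed directly to a fuzzy c-means routine, with clusters then assigned to stain classes (e.g. brownish positive versus bluish negative nuclei) by inspecting their centers.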

Keywords: positive cell, color segmentation, HSV color space, immunohistochemistry, meningioma, thresholding, fuzzy c-means

26875 A Comprehensive Study and Evaluation on Image Fashion Features Extraction

Authors: Yuanchao Sang, Zhihao Gong, Longsheng Chen, Long Chen

Abstract:

Clothing fashion represents a person's aesthetic appreciation of everyday outfits and appetite for fashion, and it reflects the development status of society, humanity, and the economy. However, modelling fashion by machine is extremely challenging because fashion is too abstract to be described efficiently by machines; even humans can hardly reach a consensus about it. In this paper, we address a fundamental fashion-related question: which image feature best describes clothing fashion? To answer it, we designed and evaluated various image features, ranging from traditional low-level hand-crafted features, through mid-level style-aware features, to currently popular deep neural network-based features that have shown state-of-the-art performance in many vision tasks. In total, we tested nine feature representations: color, texture, shape, style, convolutional neural networks (CNNs), CNNs with distance metric learning (CNNs&DML), AutoEncoder, CNNs with multiple layer combination (CNNs&MLC), and CNNs with dynamic feature clustering (CNNs&DFC). We validated the performance of these features on two publicly available datasets. Quantitative and qualitative experimental results on both intra-domain and inter-domain fashion clothing image retrieval showed that deep learning-based feature representations far outperform traditional hand-crafted ones. Additionally, among all deep learning-based methods, CNNs with explicit feature clustering perform best, which suggests that feature clustering is essential for discriminative fashion feature representation.
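Retrieval experiments like these typically rank gallery images by the similarity of their feature vectors to a query. The sketch below assumes the features have already been extracted (by any of the nine representations), ranks gallery items by cosine similarity, and reports top-k accuracy. The metric and function names are assumptions, since the abstract does not specify the evaluation protocol.

```python
import math

# Illustrative sketch: cosine-similarity retrieval and top-k accuracy (assumed protocol).


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, gallery, k=5):
    """Indices of the k gallery vectors most similar to the query, best first."""
    ranked = sorted(range(len(gallery)), key=lambda i: cosine(query, gallery[i]), reverse=True)
    return ranked[:k]


def top_k_accuracy(queries, query_ids, gallery, gallery_ids, k=5):
    """Fraction of queries whose matching item (same id) appears among the top-k results."""
    hits = 0
    for q, qid in zip(queries, query_ids):
        hits += any(gallery_ids[i] == qid for i in retrieve(q, gallery, k))
    return hits / len(queries)
```

Comparing feature representations then amounts to running the same retrieval with each representation's vectors and comparing the resulting accuracies.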

Keywords: convolutional neural network, feature representation, image processing, machine modelling

26874 A Mixing Matrix Estimation Algorithm for Speech Signals under the Under-Determined Blind Source Separation Model

Authors: Jing Wu, Wei Lv, Yibing Li, Yuanfan You

Abstract:

The separation of speech signals has become a research hotspot in signal processing in recent years. It has many applications, including teleconferencing, hearing aids, and machine speech recognition. The sounds received are usually noisy, and identifying the sounds of interest and recovering clear sounds in such an environment is the problem of blind source separation. This paper focuses on under-determined blind source separation (UBSS), which is generally addressed with sparse component analysis. The method has two main parts: first, a clustering algorithm estimates the mixing matrix from the observed signals; then the signals are separated using the estimated mixing matrix. This paper studies the mixing matrix estimation problem and proposes an improved algorithm for estimating the mixing matrix of speech signals in the UBSS model. The traditional potential function algorithm is inaccurate for mixing matrix estimation, especially at low signal-to-noise ratio (SNR). In response, this paper uses an improved potential function method to estimate the mixing matrix. The algorithm not only avoids the influence of insufficient prior information in traditional clustering algorithms but also improves the estimation accuracy of the mixing matrix. This paper takes the mixing of four speech signals into two channels as an example. Simulation results show that the approach not only improves estimation accuracy but also applies to any mixing matrix.
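For a two-channel mixture, the clustering idea behind mixing-matrix estimation can be sketched as follows. At points dominated by a single source, the observation vector (x1, x2) lies along that source's mixing column, so the angles θ = atan2(x2, x1), taken modulo π, cluster around the column directions. A potential function J(θ) = Σ_i |x_i| exp(−λ d(θ, θ_i)²) then peaks at those directions. This is a generic potential function with an assumed Gaussian kernel and λ, not the improved function proposed in the paper; names are hypothetical.

```python
import math

# Illustrative sketch: generic potential-function estimation of 2 x N mixing columns.


def angular_gap(a, b):
    """Distance between two line directions, with angles taken modulo pi."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def estimate_mixing_matrix(x1, x2, n_sources, lam=200.0, grid=720):
    """Return n_sources unit columns (cos t, sin t) at the strongest potential peaks."""
    # Direction (mod pi) and magnitude of every non-zero observation.
    points = [(math.atan2(b, a) % math.pi, math.hypot(a, b)) for a, b in zip(x1, x2) if a or b]
    thetas = [math.pi * g / grid for g in range(grid)]
    potential = [
        sum(r * math.exp(-lam * angular_gap(t, th) ** 2) for th, r in points) for t in thetas
    ]
    # Local maxima on the circular grid, strongest first.
    peaks = [
        g for g in range(grid)
        if potential[g] > potential[g - 1] and potential[g] >= potential[(g + 1) % grid]
    ]
    peaks.sort(key=lambda g: potential[g], reverse=True)
    chosen = sorted(thetas[g] for g in peaks[:n_sources])
    return [(math.cos(t), math.sin(t)) for t in chosen]
```

With four sources and two channels, as in the paper's example, `n_sources=4` would return four columns. Real speech data would first be moved to a sparse (e.g. time-frequency) domain and filtered for single-source points, which this sketch omits.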

Keywords: DBSCAN, potential function, speech signal, the UBSS model

26873 Artificial Intelligent Methodology for Liquid Propellant Engine Design Optimization

Authors: Hassan Naseh, Javad Roozgard

Abstract:

This paper presents a methodology based on artificial intelligence (AI) for liquid propellant engine (LPE) optimization. The AI methodology uses an adaptive neuro-fuzzy inference system (ANFIS). In this methodology, the objective function is maximum performance (specific impulse). The independent design variables in the ANFIS model are combustion chamber pressure, combustion chamber temperature, and oxidizer-to-fuel ratio, and the model output is specific impulse, which can be combined with other objective functions in LPE design optimization. To this end, the LPE parameters were modeled with ANFIS by generating fuzzy inference system structures using grid partitioning, subtractive clustering, and fuzzy c-means (FCM) clustering, for both Mamdani and Sugeno inference and various types of membership functions. The final comparison of the optimization results highlights the accuracy and processing run time of the Gaussian ANFIS methodology relative to all the other methods.
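As a hedged illustration of what a first-order Sugeno fuzzy system (the structure ANFIS tunes) computes with Gaussian membership functions, the sketch below evaluates each rule's firing strength as the product of input memberships and returns the strength-weighted average of linear rule outputs. The rule parameters are made up and unrelated to real engine data; in ANFIS they would be learned from training data, with the rules seeded by grid partitioning, subtractive clustering, or FCM.

```python
import math

# Illustrative sketch: first-order Sugeno inference with made-up Gaussian rules.


def gaussian_mf(x, center, sigma):
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def sugeno_predict(inputs, rules):
    """Weighted average of linear rule outputs.

    Each rule is (antecedents, coefficients, bias): one (center, sigma) per input;
    rule output = bias + sum(coefficient * input).
    """
    num = den = 0.0
    for antecedents, coefficients, bias in rules:
        # Firing strength: product (fuzzy AND) of the Gaussian memberships.
        w = 1.0
        for x, (center, sigma) in zip(inputs, antecedents):
            w *= gaussian_mf(x, center, sigma)
        y = bias + sum(c * x for c, x in zip(coefficients, inputs))
        num += w * y
        den += w
    return num / den if den else 0.0
```

For the LPE case the three inputs would be chamber pressure, chamber temperature, and oxidizer-to-fuel ratio (suitably scaled), and the output an estimate of specific impulse that an outer optimizer could maximize.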

Keywords: ANFIS methodology, artificial intelligent, liquid propellant engine, optimization

26872 Clustering Using Cooperative Multihop Mini-Groups in Wireless Sensor Network: A Novel Approach

Authors: Virender Ranga, Mayank Dave, Anil Kumar Verma

Abstract:

Wireless sensor networks (WSNs) are now used in many real-life applications, such as environmental, habitat, and health monitoring. Because these applications use cheap, power-constrained devices, the energy consumption of each device should be kept as low as possible so that the network operates for a longer period. One technique for prolonging network lifetime is the intelligent grouping of sensor nodes so that they operate cooperatively and energy-efficiently. With this motivation, we propose a novel approach that organizes the sensor nodes into cooperative multihop mini-groups to reduce the total energy consumption of the network and improve its lifetime. Our approach also reduces the number of messages transmitted inside the WSN, which further minimizes the energy consumption of the whole network. Simulation experiments show that our approach outperforms the state-of-the-art approach in terms of stability period and aggregated data.
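Energy arguments of this kind are often made with the first-order radio model popularized by LEACH. In this model, transmitting k bits over distance d costs E_elec·k + ε_amp·k·d², and receiving k bits costs E_elec·k. The sketch below uses that model, with commonly quoted constants taken as assumptions, to compare a direct transmission with a multihop relay. It is not the authors' mini-group protocol, and the function names are hypothetical.

```python
# Illustrative sketch: first-order radio model; constants are assumed typical values.
E_ELEC = 50e-9  # J/bit for transmitter/receiver electronics
EPS_AMP = 100e-12  # J/bit/m^2 for the transmit amplifier


def tx_energy(bits, distance_m):
    return E_ELEC * bits + EPS_AMP * bits * distance_m ** 2


def rx_energy(bits):
    return E_ELEC * bits


def direct_cost(bits, distance_m):
    """One node sends straight to the sink."""
    return tx_energy(bits, distance_m)


def multihop_cost(bits, hop_distances):
    """Relay via intermediate nodes: every hop transmits; every relay also receives."""
    relays = len(hop_distances) - 1
    return sum(tx_energy(bits, d) for d in hop_distances) + relays * rx_energy(bits)
```

Because amplifier energy grows with d², splitting a 100 m link into two 50 m hops saves energy. For very short links the extra receive cost dominates, which is why the grouping strategy matters.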

Keywords: clustering, cluster-head, mini-group, stability period

26871 Clustering-Based Threshold Model for Condition Rating of Concrete Bridge Decks

Authors: M. Alsharqawi, T. Zayed, S. Abu Dabous

Abstract:

To ensure the safety and serviceability of bridge infrastructure, accurate condition assessment and rating methods are needed to provide a basis for bridge Maintenance, Repair and Replacement (MRR) decisions. In North America, bridge condition is commonly assessed by visual inspection, which is limited to detecting surface defects and external flaws. Furthermore, the thresholds that define the severity of bridge deterioration are selected arbitrarily. This research discusses the main deteriorations and defects identified during visual inspection and Non-Destructive Evaluation (NDE). NDE techniques are increasingly used to augment visual examination during inspection to detect subsurface defects. Quality inspection data and accurate condition assessment and rating are the basis for determining appropriate MRR decisions. Thus, this paper proposes a novel bridge condition assessment method based on Quality Function Deployment (QFD) theory. The QFD model is designed to provide an integrated condition assessment by evaluating both the surface and subsurface defects of concrete bridges. Moreover, an integrated condition rating index with four thresholds is developed from the QFD condition assessment model using the k-means clustering technique. Twenty case studies are analyzed by applying the QFD model and the developed rating index. The results show that the proposed threshold model produces robust MRR recommendations consistent with the decisions and recommendations made by bridge managers on these projects. The proposed method is expected to advance the state of the art in bridge condition assessment and rating.
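The abstract gives no formulas for turning k-means output into rating thresholds. One plausible reading, assumed here, is to cluster the integrated condition indices in one dimension into five groups and place each of the four thresholds midway between adjacent sorted centroids. The index values in the test are made up, and the function names are hypothetical.

```python
import statistics

# Illustrative sketch: 1-D k-means with midpoint thresholds (assumed procedure).


def kmeans_1d(values, k, iterations=100):
    """1-D Lloyd's k-means (k >= 2); centroids start at evenly spaced order statistics."""
    data = sorted(values)
    centroids = [data[round(i * (len(data) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        new = [statistics.fmean(g) if g else centroids[c] for c, g in enumerate(groups)]
        if new == centroids:
            break  # converged
        centroids = new
    return sorted(centroids)


def rating_thresholds(condition_indices, n_categories=5):
    """n_categories - 1 thresholds at midpoints between adjacent cluster centroids."""
    c = kmeans_1d(condition_indices, n_categories)
    return [(a + b) / 2 for a, b in zip(c, c[1:])]
```

Five clusters yield the four thresholds mentioned in the abstract. Each bridge's integrated index can then be mapped to a rating category and an associated MRR action.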

Keywords: concrete bridge decks, condition assessment and rating, quality function deployment, k-means clustering technique

26870 Fault-Detection and Self-Stabilization Protocol for Wireless Sensor Networks

Authors: Ather Saeed, Arif Khan, Jeffrey Gosper

Abstract:

Sensor devices are prone to errors and sudden node failures, which are difficult to detect in a timely manner when the devices are deployed in real-time, hazardous, large-scale harsh environments or in medical emergencies. Data loss can therefore be life-threatening when the sensed phenomenon is not disseminated because of sudden node failure, battery depletion, or temporary malfunction. We introduce a set of partial differential equations for localizing faults, similar to Green's and Maxwell's equations used in electrostatics and electromagnetism, together with a node organization and clustering scheme for self-stabilizing sensor networks. Green's theorem is applied to regions bounded by closed, continuously differentiable curves to ensure network connectivity. Experimental results show that the proposed GTFD (Green's Theorem Fault-Detection and Self-stabilization) protocol not only detects faulty nodes but also accurately generates network stability graphs that show where urgent intervention is required to dynamically self-stabilize the network.
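As background for the Green's-theorem idea, here is a worked special case. For a closed polygonal boundary, Green's theorem reduces the enclosed area, (1/2)∮(x dy − y dx), to the shoelace formula. The sketch below computes the area enclosed by a ring of node positions and, as a purely hypothetical check, measures how much enclosed area is lost when one boundary node drops out. It is not the GTFD protocol itself, and the function names are assumptions.

```python
# Illustrative sketch: Green's theorem on a polygon (shoelace formula); the coverage
# check is a hypothetical use, not the GTFD protocol.


def enclosed_area(boundary):
    """Signed area of a closed polygon via Green's theorem (shoelace formula).

    Positive for counter-clockwise vertex order. `boundary` is a list of (x, y);
    the last vertex connects back to the first.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:] + boundary[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def coverage_loss(boundary, failed_index):
    """Fraction of enclosed area lost if the boundary node at failed_index drops out."""
    full = abs(enclosed_area(boundary))
    reduced = boundary[:failed_index] + boundary[failed_index + 1:]
    return 1.0 - abs(enclosed_area(reduced)) / full if full else 0.0
```

For example, removing the node at (5, −5) from the ring (0, 0), (5, −5), (10, 0), (10, 10), (0, 10) shrinks the enclosed area from 125 to 100, a 20% loss.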

Keywords: Green’s Theorem, self-stabilization, fault-localization, RSSI, WSN, clustering

26869 Design of Agricultural Machinery Factory Facility Layout

Authors: Nilda Tri Putri, Muhammad Taufik

Abstract:

Agricultural tools and machinery (Alsintan) are used in agribusiness activities. Alsintan is used to convert traditional farming systems, which generally rely on manual equipment, into modern mechanized agriculture. In 2012, CV Nugraha Chakti Consultant prepared an action plan for developing the Alsintan industry in West Sumatra, aiming to grow medium-sized Alsintan industries into a major industry; one of the efforts made is to increase the production capacity of the Alsintan industry. The production capacity for the flagship products, the hydrotiller and the thresher, is set at 2,000 units per year each. CV Citra Dragon, one of the medium-sized Alsintan industries in West Sumatra, plans to relocate its existing plant to meet growing consumer demand each year. The increased production capacity and the relocation plan require a change in layout, so the layout of the CV Citra Dragon plant facility must be designed. The first step in designing the plant layout is to design the production floor layout, which is done by applying a group technology layout. The initial step is to group machines and part families using Average Linkage Clustering (ALC) and Rank Order Clustering (ROC). Next, individual workstations are designed, and the layout is designed using the Modified Spanning Tree (MST). Layout alternatives are then compared to select the best production floor layout between the ALC and ROC cell groupings. Next, the layouts of warehouses, offices, and other production support facilities are designed, and the Activity Relationship Chart method is used to organize the placement of the designed factory facilities. After the facility plan is structured, the cost of establishing the manufacturing plant is calculated. The production floor uses a group technology layout and is composed of four machine cells, an assembly area, and a painting area.
The total distance of the displacement of material in a single production amounted to 1120.16 m which means need 18,7minutes of transportation time for one time production. Alsintan Factory has designed a circular flow pattern with 11 facilities. The facilities were designed consisting of 10 rooms and 1 parking space. The measure of factory building is 84 m x 52 m.
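The machine grouping step with Rank Order Clustering can be illustrated with the classic ROC procedure: read each row of a 0/1 machine-part incidence matrix as a binary number and sort rows in descending order, do the same for columns, and repeat until neither ordering changes. This is a minimal sketch assuming machines are rows and parts are columns; the incidence matrix below is hypothetical, not data from the CV Citra Dragon plant.

```python
def binary_value(bits):
    """Read a sequence of 0/1 entries as a binary number (first entry most significant)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def rank_order_clustering(matrix, max_iter=100):
    """Return (row_order, col_order) after iterating ROC row and column sorts to a fixed point."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    for _ in range(max_iter):
        # Row pass: rank machines by their binary value read across the current column order.
        new_rows = sorted(rows, key=lambda r: binary_value(matrix[r][c] for c in cols), reverse=True)
        # Column pass: rank parts by their binary value read down the new row order.
        new_cols = sorted(cols, key=lambda c: binary_value(matrix[r][c] for r in new_rows), reverse=True)
        if new_rows == rows and new_cols == cols:
            break  # neither ordering changed, so ROC has converged
        rows, cols = new_rows, new_cols
    return rows, cols


# Hypothetical machine-part incidence matrix: rows = machines M0..M3, columns = parts P0..P4.
incidence = [
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
]
row_order, col_order = rank_order_clustering(incidence)
print(row_order, col_order)  # [0, 2, 1, 3] [0, 3, 1, 2, 4]
```

Reading the reordered matrix, machines {M0, M2} with parts {P0, P3} and machines {M1, M3} with parts {P1, P2, P4} form two block-diagonal cells. Python's `sorted` is stable, so ties keep their previous order between passes.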

Keywords: Average Linkage Clustering (ALC), Rank Order Clustering (ROC), Modified Spanning Tree (MST), Activity Relationship Chart (ARC)

Procedia PDF Downloads 466
26868 Production Optimization under Geological Uncertainty Using Distance-Based Clustering

Authors: Byeongcheol Kang, Junyi Kim, Hyungsik Jung, Hyungjun Yang, Jaewoo An, Jonggeun Choe

Abstract:

Understanding reservoir properties is important for better production management. Because information is limited, highly heterogeneous or channelized reservoirs carry geological uncertainty. One solution is to generate multiple equiprobable realizations using geostatistical methods. However, some of these models have wrong properties and must be excluded for simulation efficiency and reliability. We propose a novel model selection scheme based on distance-based clustering for the reliable application of a production optimization algorithm. Distance is defined as the degree of dissimilarity between data. We calculate the Hausdorff distance to classify the models by their similarity; the Hausdorff distance is useful for shape matching of reservoir models. We use multidimensional scaling (MDS) to project the models onto a two-dimensional space and group them by K-means clustering. Rather than simulating all models, we choose one representative model from each cluster and identify the best model, i.e., the one whose production rates are most similar to the true values. From this process, we can select good reservoir models near the best model with high confidence. We generate 100 channel reservoir models using single normal equation simulation (SNESIM). Since oil and gas flow preferentially through the sand facies, it is critical to characterize the pattern and connectivity of the channels in the reservoir. After calculating Hausdorff distances and projecting the models by MDS, we observe that the models group according to their channel patterns. These channel distributions affect the operating controls of each production well, so the model selection scheme improves the optimization process. We use particle swarm optimization (PSO), a widely used global search algorithm, for production optimization. PSO is effective at finding the global optimum of an objective function, but it is time-consuming because it uses many particles and iterations.
In addition, if multiple reservoir models are used, the simulation time for PSO increases sharply. The proposed method selects good and reliable models that already match the production data. Accounting for the geological uncertainty of the reservoir, we obtain well-optimized production controls that maximize net present value. The proposed method offers a novel way to select good cases among many equiprobable realizations. The model selection scheme can be applied not only to production optimization but also to history matching and other ensemble-based methods for efficient simulation.
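The distance-based selection step can be sketched as follows. This minimal sketch assumes binary 2D facies grids (1 = sand) and computes the symmetric Hausdorff distance between the sets of sand cells. The MDS projection and K-means step described above are not reproduced, so the cluster labels are supplied by hand, and choosing each cluster's medoid as its representative is an assumption rather than the authors' stated rule. The grids are hypothetical.

```python
import math


def sand_cells(grid):
    """Coordinates (row, col) of sand-facies cells, coded here as 1, in a 2D facies grid."""
    return [(i, j) for i, row in enumerate(grid) for j, v in enumerate(row) if v == 1]


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Largest distance from a point in p to its nearest neighbour in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def representatives(models, labels):
    """Per cluster label, pick the member with the smallest summed Hausdorff distance (a medoid)."""
    cells = [sand_cells(m) for m in models]
    n = len(models)
    dist = [[hausdorff(cells[i], cells[j]) for j in range(n)] for i in range(n)]
    reps = {}
    for label in sorted(set(labels)):
        members = [i for i in range(n) if labels[i] == label]
        reps[label] = min(members, key=lambda i: sum(dist[i][j] for j in members))
    return reps


# Hypothetical 3x3 channel models: two horizontal channels and one vertical channel.
model_a = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
model_b = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]
model_c = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(hausdorff(sand_cells(model_a), sand_cells(model_b)))           # 1.0
print(representatives([model_a, model_b, model_c], [0, 0, 1]))      # {0: 0, 1: 2}
```

Only the representatives would then be flow-simulated, which is where the savings over simulating all 100 realizations come from.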

Keywords: distance-based clustering, geological uncertainty, particle swarm optimization (PSO), production optimization

Procedia PDF Downloads 112
26867 Unseen Classes: The Paradigm Shift in Machine Learning

Authors: Vani Singhal, Jitendra Parmar, Satyendra Singh Chouhan

Abstract:

Unseen class discovery has become an important capability for machine learning algorithms that must recognize new classes. Unseen classes are classes on which the machine learning model was not trained. As technology advances and AI replaces humans in more tasks, the amount of data has grown dramatically, so when a model is deployed on real-world examples, it encounters new, unseen classes. Our aim is to find the number of unseen classes using a hierarchical active learning algorithm based on hierarchical clustering and active sampling. The number of clusters obtained at the end gives the number of unseen classes; the resulting clusters will also include some that contain unseen classes. Instead of first discovering unseen classes and then counting them, we directly calculate their number by applying the algorithm. The dataset is for intent classification, where the target is the intent of the corresponding query. We conclude that when the machine learning model encounters real-world data, it can automatically find the number of unseen classes. In future work, we will label these unseen classes correctly.
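As a rough illustration of how a cluster count can fall out of hierarchical clustering, the sketch below runs average-linkage agglomerative clustering on 2D feature vectors and stops merging once the closest pair of clusters is farther apart than a threshold. The active-sampling component and the intent-classification features of the paper are not modelled; the points and the `threshold` value are hypothetical.

```python
import math


def count_clusters(points, threshold):
    """Average-linkage agglomerative clustering; stop when the closest clusters exceed `threshold`."""
    clusters = [[p] for p in points]

    def avg_link(a, b):
        # Mean pairwise distance between members of clusters a and b.
        return sum(math.dist(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = avg_link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is still valid
    return len(clusters)


# Hypothetical embedded queries forming three well-separated groups.
queries = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
print(count_clusters(queries, threshold=3.0))  # 3
```

The threshold plays the role of the stopping criterion: a larger value merges more aggressively and reports fewer classes.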

Keywords: active sampling, hierarchical clustering, open world learning, unseen class discovery

Procedia PDF Downloads 136
26866 Study for an Optimal Cable Connection within an Inner Grid of an Offshore Wind Farm

Authors: Je-Seok Shin, Wook-Won Kim, Jin-O Kim

Abstract:

An offshore wind farm needs to be designed carefully with both economics and reliability in mind. Designing an entire offshore wind farm involves many decision-making problems; this paper focuses on the inner grid layout, meaning the connections between wind turbines and between the wind turbines and an offshore substation. The proposed methodology determines the connections and the cable type for each connection section using K-clustering, minimum spanning tree, and cable selection algorithms. A cost evaluation is then performed in terms of investment, power loss, and reliability, and the inner grid layout with the lowest total cost is selected as optimal. To demonstrate the validity of the methodology, a case study is conducted on a 240 MW offshore wind farm, and the results show that the methodology helps in the optimal design of offshore wind farms.
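The minimum spanning tree step can be sketched with Prim's algorithm on Euclidean distances. This minimal sketch assumes the K-clustering step has already assigned a group of turbines to the substation (node 0), treats straight-line distance as cable length, and ignores cable capacity, losses, and reliability; the coordinates are hypothetical.

```python
import math


def prim_mst(nodes):
    """Return MST edges (parent, child, length), growing the tree from node 0 (the substation)."""
    n = len(nodes)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Attach the closest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # Update connection costs of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(nodes[u], nodes[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


# Hypothetical layout in km: substation at the origin plus three turbines.
layout = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
cables = prim_mst(layout)
print(cables)                           # [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
print(sum(length for *_, length in cables))  # 3.0
```

In a fuller version, each edge would then be assigned a cable type according to the power it carries, and the total investment, loss, and reliability costs compared across candidate layouts.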

Keywords: offshore wind farm, optimal layout, k-clustering algorithm, minimum spanning tree algorithm, cable type selection, power loss cost, reliability cost

Procedia PDF Downloads 358
26865 A Literature Review on the Role of Local Potential for Creative Industries

Authors: Maya Irjayanti

Abstract:

Utilizing local creativity has become a strategic investment to be expanded into creative industries because of their significant contribution to national gross domestic product. Many developed and developing countries look to creative industries as part of their economic growth agenda. This study aims to identify the role of local potential in creative industries across various empirical studies. The method involves reviewing peer-reviewed journal articles and conference papers that address local potential and creative industries. The literature review analysis will include several steps: material collection, descriptive analysis, category selection, and material evaluation. The expected outcome is a clustering of creative industries based on the local potential of various nations. In addition, the findings of this study will serve as a reference for future research exploring particular areas with well-known aspects of local potential for creative industry products.

Keywords: business, creativity, local potential, local wisdom

Procedia PDF Downloads 347
26864 Optimal Maintenance Clustering for Rail Track Components Subject to Possession Capacity Constraints

Authors: Cuong D. Dao, Rob J.I. Basten, Andreas Hartmann

Abstract:

This paper studies the optimal planning of preventive maintenance and renewal activities for components in a single railway track when the available time for maintenance is limited. The rail-track system consists of several types of components, such as rail, ballast, and switches, with different preventive maintenance and renewal intervals. Performing maintenance or renewal on the track requires a train-free period, called a possession. Since a major possession directly affects the regular train schedule, maintenance and renewal activities are clustered as much as possible. In a dense, heavily utilized railway network, possession time on the track is critical, because the demand for train operations is very high and a long possession severely disrupts the regular train schedule. We present an optimization model and investigate maintenance schedules with and without the possession capacity constraint. In addition, we incorporate into the optimization model the socio-economic cost of maintenance time, expressed as a variable possession cost. A numerical example is provided to illustrate the model.

Keywords: rail-track components, maintenance, optimal clustering, possession capacity

Procedia PDF Downloads 232
26863 Analytical Study of Data Mining Techniques for Software Quality Assurance

Authors: Mariam Bibi, Rubab Mehboob, Mehreen Sirshar

Abstract:

Satisfying customer requirements is the ultimate goal of producing or developing any product. Product quality is judged by the level of customer satisfaction. The survey found several reported techniques that enhance product quality through software defect prediction and by locating missing software requirements. Some mining techniques were proposed to assess individual performance indicators in a collaborative environment in order to reduce errors at the individual level. The basic intention is to produce a product with zero or few defects, thereby achieving the best product quality. The survey analyzes techniques such as genetic algorithms, artificial neural networks, classification and clustering techniques, and decision trees. The analysis found that these techniques have contributed substantially to improving product quality.

Keywords: data mining, defect prediction, missing requirements, software quality

Procedia PDF Downloads 427
26862 Optical Flow Based System for Cross Traffic Alert

Authors: Giuseppe Spampinato, Salvatore Curti, Ivana Guarneri, Arcangelo Bruna

Abstract:

This paper describes an advanced system and methodology for Cross Traffic Alert (CTA) that detects vehicles moving into the vehicle's driving path from the left or right side. The camera is assumed to be on a vehicle that is either stationary, e.g., at a traffic light or an intersection, or moving slowly, e.g., in a car park. In all of these conditions, a brief loss of concentration or distraction by the driver can easily lead to a serious accident, and the proposed system provides valid support for avoiding such crashes. It extends our previous work on a clustering system that only works with fixed cameras. Only a vanishing point calculation and simple optical flow filtering, which eliminates motion vectors caused by the car's own movement, are performed, allowing the system to achieve high performance across different scenarios, cameras, and resolutions. The proposed system uses only the optical flow as input, which is implemented in hardware on the proposed platform. Because the whole system is very fast and has low power consumption, it is embedded directly in the camera framework, allowing all processing to run in real time.

Keywords: clustering, cross traffic alert, optical flow, real time, vanishing point

Procedia PDF Downloads 171
26861 Building User Behavioral Models by Processing Web Logs and Clustering Mechanisms

Authors: Madhuka G. P. D. Udantha, Gihan V. Dias, Surangika Ranathunga

Abstract:

Today's websites contain very interesting applications, but there are few methodologies for analyzing user navigation through websites and determining whether a website is being used correctly. Web logs are typically examined only when a major attack or malfunction occurs, yet they contain many interesting records of user interactions with the system. Analyzing web logs has become a challenge because of the huge log volume: finding interesting patterns is difficult due to the size, distribution, and importance of minor details in each log. Web logs contain very important data about users and the site that are not put to good use. Retrieving interesting information from logs reveals what users need, allows users to be grouped according to their needs, and helps improve the site to make it effective and efficient. The model we built can detect attacks or malfunctions of the system and perform anomaly detection. Logs will become more complex as traffic volume and website size and complexity grow. The solution uses unsupervised techniques and is fully automated; expert knowledge is used only in validation. In our approach, we first clean and purify the logs to bring them to a common platform with a standard format and structure. After the cleaning module, the web session builder is executed. It outputs two files: a Web Sessions file and an Indexed URLs file. The Indexed URLs file contains the list of URLs accessed and their indices, and the Web Sessions file lists the URL indices of each web session. DBSCAN and EM algorithms are then used iteratively and recursively to obtain the best clustering of the web sessions. Using homogeneity, completeness, V-measure, intra- and inter-cluster distance, and the silhouette coefficient as criteria, these algorithms evaluate themselves and select better parameter values for subsequent runs. If a cluster is found to be too large, micro-clustering is applied.
Using the Cluster Signature Module, each cluster is annotated with a unique signature called a fingerprint. In this module, each cluster is fed to the Association Rule Learning Module. If it outputs a confidence and support of 1 for an access sequence, that sequence is a potential signature for the cluster. The occurrences of the access sequence are then checked in the other clusters; if it is unique to the cluster under consideration, the cluster is annotated with that signature. These signatures are used for anomaly detection, cyber-attack prevention, real-time dashboards that visualize users accessing web pages, prediction of user actions, and various other applications on finance, university, and news and media websites.
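The signature check can be approximated with a simple set-based test. This minimal sketch assumes an access sequence is a contiguous run of `n` URL indices, treats "support of 1" as the sequence appearing in every session of the cluster, and does not model the confidence condition or the association rule learner itself; `ngrams`, `cluster_signatures`, and the session data are hypothetical.

```python
def ngrams(session, n):
    """Set of contiguous length-n access sequences in one session of URL indices."""
    return {tuple(session[i:i + n]) for i in range(len(session) - n + 1)}


def cluster_signatures(clusters, n=2):
    """Map each cluster label to one sequence found in all its sessions and in no other cluster."""
    signatures = {}
    for label, sessions in clusters.items():
        # Candidate signatures: sequences present in every session of this cluster.
        common = set.intersection(*(ngrams(s, n) for s in sessions))
        # Every sequence seen anywhere in the other clusters.
        elsewhere = set()
        for other_label, other_sessions in clusters.items():
            if other_label != label:
                for s in other_sessions:
                    elsewhere |= ngrams(s, n)
        unique = sorted(common - elsewhere)
        if unique:
            signatures[label] = unique[0]  # annotate the cluster with its fingerprint
    return signatures


# Hypothetical clustered sessions (lists of indices from the Indexed URLs file).
sessions_by_cluster = {
    "browse": [[1, 2, 3], [0, 1, 2]],
    "admin": [[7, 8], [7, 8, 1]],
}
print(cluster_signatures(sessions_by_cluster))  # {'browse': (1, 2), 'admin': (7, 8)}
```

A new session that contains a known fingerprint can then be attributed to that cluster, while sessions matching no fingerprint are candidates for anomaly review.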

Keywords: anomaly detection, clustering, pattern recognition, web sessions

Procedia PDF Downloads 257
26860 Microbial Biogeography of Greek Olive Varieties Assessed by Amplicon-Based Metagenomics Analysis

Authors: Lena Payati, Maria Kazou, Effie Tsakalidou

Abstract:

Table olives are one of the most popular fermented vegetables worldwide and, along with olive oil, play a crucial role in the world economy. Consumers value them for their characteristic taste and pleasant aromas, and several health and nutritional benefits have also been reported. Until recently, microbial biogeography, i.e., the study of microbial diversity over time and space, was mainly associated with wine. Nowadays, however, the term 'terroir' has been extended to other crops and food products to link geographical origin and environmental conditions to quality aspects of fermented foods. With this in mind, the present study focuses on the microbial fingerprinting of the most important olive varieties of Greece using state-of-the-art amplicon-based metagenomics analysis. To this end, in 2019, 61 samples from 38 different olive varieties were collected at the final stage of ripening from 13 geographically dispersed regions in Greece. For the metagenomics analysis, total DNA was extracted from the olive samples, and the 16S rRNA gene and the ITS DNA region were sequenced and analyzed with bioinformatics tools to identify bacterial and yeast/fungal diversity, respectively. Principal component analysis (PCA) was also performed to cluster the data based on the average microbial composition of all samples from each region of origin. According to the composition results, when samples were analyzed separately, most of the bacterial genera (such as Pantoea, Enterobacter, Rosenbergiella, and Pseudomonas) and yeast/fungal genera (such as Aureobasidium, Debaryomyces, Candida, and Cladosporium) identified were found in all 61 samples. Although interesting differences were observed in the relative abundance of the identified genera, the bacterial genus Pantoea and the yeast/fungal genus Aureobasidium were dominant in 35 and 40 samples, respectively.
Notably, olive samples collected from the same region had similar fingerprints (genera identified and relative abundance levels) regardless of variety, indicating a potential association between the relative abundance of certain taxa and the geographical region. When samples were grouped by region of origin, distinct bacterial profiles per region were observed, which was also evident from the PCA. This was not the case for the yeast/fungal profiles, since 10 of the 13 regions grouped together, mainly due to the dominance of the genus Aureobasidium. A second cluster was formed by the islands of Crete and Rhodes, both located in the southeastern Aegean Sea. These two regions clustered together mainly because the genus Toxicocladosporium was identified in relatively high abundance. Finally, the Agrinio region was separated from the others, as it showed a completely different microbial fingerprint. However, because of the limited number of olive samples from some regions, a subsequent PCA with more samples from these regions is expected to yield clearer clustering. The present study is part of a larger project, the first of its kind in Greece, whose ultimate goal is to analyze a larger set of olive samples of different varieties from different regions of Greece in order to establish a reliable microbial biogeography of olives.

Keywords: amplicon-based metagenomics analysis, bacteria, microbial biogeography, olive microbiota, yeasts/fungi

Procedia PDF Downloads 87
26859 Combined Analysis of m⁶A and m⁵C Modulators on the Prognosis of Hepatocellular Carcinoma

Authors: Hongmeng Su, Luyu Zhao, Yanyan Qian, Hong Fan

Abstract:

Aim: Hepatocellular carcinoma (HCC) is one of the most common malignant tumors and a serious threat to human health. RNA methylation, especially N6-methyladenosine (m⁶A) and 5-methylcytosine (m⁵C), is a crucial epigenetic transcriptional regulatory mechanism that plays an important role in tumorigenesis, progression, and prognosis. This research aims to systematically evaluate the prognostic value of m⁶A and m⁵C modulators in HCC patients. Methods: Twenty-four m⁶A and m⁵C modulators were selected as candidates, and their expression levels and contribution to predicting HCC prognosis were analyzed. Consensus clustering analysis was applied to classify HCC patients. Cox and LASSO regression were used to construct the risk model. Based on the risk score, HCC patients were divided into high-risk and low/medium-risk groups. The clinicopathological factors of HCC patients were analyzed by univariate and multivariate Cox regression. Results: The HCC patients were classified into two clusters with significant differences in overall survival and clinical characteristics. A nine-gene risk model was constructed, comprising METTL3, VIRMA, YTHDF1, YTHDF2, NOP2, NSUN4, NSUN5, DNMT3A, and ALYREF. The results indicated that the risk score could serve as an independent prognostic factor for patients with HCC. Conclusion: This study constructed a nine-gene risk model from m⁶A and m⁵C modulators and investigated its effect on the clinical prognosis of HCC. This model may inform therapeutic strategies and prognostic evaluation for patients with HCC.
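A LASSO-Cox risk score is commonly computed as a weighted sum of gene expression values, risk = Σ βᵢ·xᵢ. The sketch below assumes that form. The fitted coefficients are not reported in the abstract, so the two coefficients used here are hypothetical, and the split that labels the top third of scores as high risk is also an assumption.

```python
def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over the model genes."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def stratify(patients, coefficients):
    """Label the top third of risk scores 'high' and the rest 'low/medium' (assumed split)."""
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    ranked = sorted(scores, key=scores.get)  # patient IDs in ascending order of risk
    cutoff = len(ranked) * 2 // 3
    return {pid: ("high" if rank >= cutoff else "low/medium") for rank, pid in enumerate(ranked)}


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"METTL3": 0.5, "YTHDF2": 0.25}
patients = {
    "A": {"METTL3": 1.0, "YTHDF2": 0.0},
    "B": {"METTL3": 4.0, "YTHDF2": 4.0},
    "C": {"METTL3": 2.0, "YTHDF2": 0.0},
}
print(stratify(patients, coefficients))  # {'A': 'low/medium', 'C': 'low/medium', 'B': 'high'}
```

With all nine genes, the coefficient dictionary would simply contain nine entries; the scoring and grouping logic stays the same.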

Keywords: hepatocellular carcinoma, m⁶A, m⁵C, prognosis, RNA methylation

Procedia PDF Downloads 34
26858 Cluster Analysis and Benchmarking for Performance Optimization of a Pyrochlore Processing Unit

Authors: Ana C. R. P. Ferreira, Adriano H. P. Pereira

Abstract:

Because mineral properties vary frequently throughout the Araxá pyrochlore deposit, highly variable quality and performance are expected in operation even when the ore is well homogenized before it is fed to the processing plants. These results could be improved and standardized by determining the blend composition parameters that most influence the processing route, grouping the raw material types by these parameters, and then providing a reference set of operational settings for each group. Associating the physical and chemical parameters of a unit operation with a benchmark, i.e., an optimal reference for metallurgical recovery and product quality, reduces production costs, optimizes use of the mineral resource, and ensures greater stability in the downstream processes of the production chain that uses the mineral of interest. A comprehensive exploratory data analysis to identify which ore characteristics are most relevant to the process route, combined with machine learning algorithms that group the raw material (ore) and associate each group with reference variables in the process benchmark, is a reasonable approach to standardizing and improving mineral processing units. Clustering via Decision Tree and K-Means was employed, together with algorithms based on benchmarking theory, using criteria defined by the process team to identify the best settings for processing the ore piles of each cluster. A clean user interface was created to obtain the outputs of the algorithm. The results were measured by the average time needed to adjust and stabilize the process after a new pile of homogenized ore enters the plant, and by the average time needed to achieve the best processing result. Direct gains in the metallurgical recovery of the process were also measured.
The results were promising, with shorter adjustment and stabilization times when processing of a new ore pile starts, as well as faster attainment of the benchmark. Also noteworthy are the gains in metallurgical recovery, which translate into significant savings in ore consumption and a consequent reduction in production costs, and hence more rational use of the tailings dams and an extended life of the mineral deposit.
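The grouping-plus-benchmark idea can be sketched as K-means over ore-pile features followed by picking, within each cluster, the historical run with the highest metallurgical recovery as that cluster's reference settings. This is a minimal sketch: the Decision Tree step and the team-defined benchmarking criteria are not reproduced, "benchmark = best recovery" is an assumption, and the features, recoveries, and setting names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        # Assign each ore pile to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def benchmark_settings(runs, labels):
    """For each cluster, return the settings of the historical run with the highest recovery."""
    best = {}
    for run, label in zip(runs, labels):
        if label not in best or run["recovery"] > best[label]["recovery"]:
            best[label] = run
    return {label: run["settings"] for label, run in best.items()}


# Hypothetical ore-pile features (two normalized blend indicators) and the runs that processed them.
piles = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.0), (8.1, 7.9)]
runs = [
    {"recovery": 0.61, "settings": "A"},
    {"recovery": 0.66, "settings": "B"},
    {"recovery": 0.58, "settings": "C"},
    {"recovery": 0.55, "settings": "D"},
]
labels, _ = kmeans(piles, k=2)
print(benchmark_settings(runs, labels))
```

When a new pile arrives, it would be assigned to its nearest center and started on that cluster's benchmark settings, which is the mechanism by which adjustment time could shrink.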

Keywords: mineral clustering, machine learning, process optimization, pyrochlore processing

Procedia PDF Downloads 107
26857 Spatio-Temporal Analysis of Rabies Incidence in Herbivores of Economic Interest in Brazil

Authors: Francisco Miroslav Ulloa-Stanojlovic, Gina Polo, Ricardo Augusto Dias

Abstract:

In Brazil, there is a high incidence of rabies in herbivores of economic interest (HEI), transmitted by the common vampire bat Desmodus rotundus. Given this incidence, the occurrence of human rabies cases, and the huge economic losses in the world's largest cattle industry, it is important to support Brazil's National Program for the Control of Rabies in Herbivores, which aims to reduce rabies incidence in HEI populations mainly through epidemiological surveillance, vaccination of herbivores, and control of vampire bat roosts. Materials and Methods: A retrospective spatiotemporal Kulldorff scan statistic based on a Poisson model with Monte Carlo simulation, together with Anselin's Local Moran's I statistic, was used to uncover spatial clustering of HEI rabies from 2000 to 2014. Results: Three important clusters with significant year-to-year variation were identified (Figure 1). In 2000, one cluster was identified in the North region, specifically in the State of Tocantins. Between 2000 and 2004, a cluster centered in the Midwest and Southeast regions, including the States of Goiás, Minas Gerais, Rio de Janeiro, Espírito Santo, and São Paulo, was prominent. Finally, between 2000 and 2005, an important cluster was found in the North, Midwest, and South regions. Conclusions: HEI rabies is endemic in the country. In addition, there appear to be significant differences among the States in their surveillance services, which may be hindering control of the disease. Other factors may also contribute to the persistence of the problem, such as the lack of information for identifying vampire bat roosts and limited human resources for field monitoring. A review of the control program by the authorities is necessary.
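Anselin's Local Moran's I flags areas whose value is similar to that of their neighbours: Iᵢ = (zᵢ / m₂) · Σⱼ wᵢⱼ zⱼ, where zᵢ = xᵢ − x̄ and m₂ = Σ zᵢ² / n. The sketch below computes it with row-standardized binary contiguity weights; the Kulldorff scan statistic and the Monte Carlo significance test are not reproduced, and the case counts and adjacency are hypothetical.

```python
def local_morans_i(values, neighbors):
    """Anselin's Local Moran's I with row-standardized binary contiguity weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    m2 = sum(d * d for d in z) / n          # second moment of the deviations
    result = []
    for i in range(n):
        nbrs = neighbors[i]
        # Row standardization: each neighbour of area i gets weight 1 / (number of neighbours).
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] / m2 * lag)
    return result


# Hypothetical rabies case counts for four areas arranged in a line: 0-1-2-3.
cases = [10, 9, 1, 0]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print([round(v, 3) for v in local_morans_i(cases, adjacency)])  # [0.976, 0.098, 0.098, 0.976]
```

Positive values at both ends mark a high-high cluster (areas 0 and 1) and a low-low cluster (areas 2 and 3); in practice, significance would be assessed by permutation.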

Keywords: Brazil, Desmodus rotundus, herbivores, rabies

Procedia PDF Downloads 383
26856 Sentiment Classification of Documents

Authors: Swarnadip Ghosh

Abstract:

Sentiment analysis is the process of detecting the contextual polarity of text; in other words, it determines whether a piece of writing is positive, negative, or neutral. Sentiment analysis of documents is very important today, when vast amounts of information are stored in databases and on the World Wide Web. An efficient algorithm to elicit such information would be beneficial for social, economic, and medical purposes. In this project, we developed an algorithm that classifies a document as positive or negative. Using our algorithm, we obtained a feature set from the data and classified the documents based on this feature set. Notably, the classification does not use the independence assumption made by many procedures such as Naive Bayes, which makes the algorithm more general in scope. Moreover, because such data are sparse and high-dimensional, we did not use the empirical distribution for estimation but developed a method based on the degree of close clustering of the data points. We applied our algorithm to a movie review data set obtained from IMDb and obtained satisfactory results.

Keywords: sentiment, runs test, cross validation, higher dimensional pmf estimation

Procedia PDF Downloads 372
26855 Decision Support System in Air Pollution Using Data Mining

Authors: E. Fathallahi Aghdam, V. Hosseini

Abstract:

Environmental pollution is not limited to a specific region or country; that is why sustainable development, as a necessary process for improvement, addresses issues such as the destruction of natural resources, degradation of biological systems, global pollution, and climate change worldwide, especially in developing countries. According to the World Health Organization, Tehran, the capital of Iran and a developing city, is one of the most polluted cities in the world in terms of air pollution. In this study, three pollutants, namely particulate matter smaller than 10 microns, nitrogen oxides, and sulfur dioxide, were evaluated in Tehran using data mining techniques following the CRISP approach. Data were collected from 21 air pollution monitoring stations in different areas of Tehran from 1999 to 2013. The commercial software Clementine was selected for this study, and Tehran was divided into distinct clusters in terms of these pollutants using the software. Because clustering is usually used as a prelude to other analyses, the similarity of the clusters was evaluated by analyzing local conditions, traffic behavior, and industrial activities. The results of this research can support decision-making systems, help managers improve performance and decision making, and assist in urban studies.

Keywords: data mining, clustering, air pollution, crisp approach

Procedia PDF Downloads 401
26854 Identification of Disease Causing DNA Motifs in Human DNA Using Clustering Approach

Authors: G. Tamilpavai, C. Vishnuppriya

Abstract:

Studying DNA (deoxyribonucleic acid) sequences is useful for understanding biological processes and is applied in fields such as diagnostics and forensic research. DNA carries the hereditary information in humans and almost all other organisms and is passed on to subsequent generations. Early detection of defective DNA sequences may lead to many developments in the field of bioinformatics. Currently, various tedious techniques are used to identify defective DNA. The proposed work analyzes a given sequence and identifies cancer-causing DNA motifs in it. First, the human DNA sequence is split into k-mers using a k-mer separation rule. The k-mers are then clustered using a Self-Organizing Map (SOM). Using the Levenshtein distance measure, the cancer-associated DNA motif is identified from the k-mer clusters. The experimental results indicate the presence or absence of the cancer-causing DNA motif: if the motif is found, the input is declared a cancer-causing DNA sequence; otherwise, the input human DNA is declared a normal sequence. Finally, the elapsed time for finding the cancer-causing DNA motif with cluster formation is calculated and compared with that of the normal search process. Locating the cancer-associated motif is easier with the cluster formation process than with the normal process. The proposed work will be an initial aid for research on genetic diseases.
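The k-mer separation and Levenshtein matching steps can be sketched as follows. This minimal sketch splits a sequence into overlapping k-mers with k equal to the motif length and reports a hit if any k-mer is within `max_edits` edits of the motif. The SOM clustering step, which the paper uses to narrow the search, is not reproduced, and the motif and sequences are hypothetical, not known cancer-associated motifs.

```python
def kmers(sequence, k):
    """Overlapping substrings of length k (the k-mer separation step)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def motif_present(sequence, motif, max_edits=0):
    """True if some k-mer of the sequence is within max_edits of the motif."""
    return any(levenshtein(kmer, motif) <= max_edits for kmer in kmers(sequence, len(motif)))


# Hypothetical motif and input sequences.
print(kmers("ACGTA", 3))                                 # ['ACG', 'CGT', 'GTA']
print(motif_present("TTGACGTT", "ACGT"))                 # True
print(motif_present("TTGACCTT", "ACGT", max_edits=1))    # True
print(motif_present("TTTTTTTT", "ACGT"))                 # False
```

Clustering the k-mers first, as the paper does, would let the distance comparison run only against the cluster closest to the motif rather than against every k-mer.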

Keywords: bioinformatics, cancer motif, DNA, k-mers, Levenshtein distance, SOM

Procedia PDF Downloads 158