Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 11555

Search results for: objective function clustering

11375 Spatial-Temporal Clustering Characteristics of Dengue in the Northern Region of Sri Lanka, 2010-2013

Authors: Sumiko Anno, Keiji Imaoka, Takeo Tadono, Tamotsu Igarashi, Subramaniam Sivaganesh, Selvam Kannathasan, Vaithehi Kumaran, Sinnathamby Noble Surendran

Abstract:

Dengue outbreaks are affected by biological, ecological, socio-economic and demographic factors that vary over time and space. These factors have been examined separately and still require systematic clarification. The present study aimed to investigate the spatial-temporal clustering relationships between these factors and dengue outbreaks in the northern region of Sri Lanka. Remote sensing (RS) data gathered from a plurality of satellites were used to develop an index comprising rainfall, humidity and temperature data. RS data gathered by ALOS/AVNIR-2 were used to detect urbanization, and a digital land cover map was used to extract land cover information. Other data on relevant factors and dengue outbreaks were collected through institutions and extant databases. The analyzed RS data and databases were integrated into geographic information systems, enabling temporal analysis, spatial statistical analysis and space-time clustering analysis. Our present results showed that increases in the number of the combination of ecological factor and socio-economic and demographic factors with above the average or the presence contribute to significantly high rates of space-time dengue clusters.

Keywords: ALOS/AVNIR-2, dengue, space-time clustering analysis, Sri Lanka

Procedia PDF Downloads 462

11374 Lung Function, Urinary Heavy Metals And ITS Other Influencing Factors Among Community In Klang Valley

Authors: Ammar Amsyar Abdul Haddi, Mohd Hasni Jaafar

Abstract:

Heavy metals are elements naturally presented in the environment that can cause adverse effect to health. But not much literature was found on effects toward lung function, where impairment of lung function may lead to various lung diseases. The objective of the study is to explore the lung function impairment, urinary heavy metal level, and its associated factors among the community in Klang valley, Malaysia. Sampling was done in Kuala Lumpur suburb public and housing areas during community events throughout March 2019 till October 2019. respondents who gave the consent were given a questionnaire to answer and was proceeded with a lung function test. Urine samples were obtained at the end of the session and sent for Inductively coupled plasma mass spectrometry (ICP-MS) analysis for heavy metal cadmium (Cd) and lead (Pb) concentration. A total of 200 samples were analysed, and of all, 52% of respondents were male, Age ranging from 18 years old to 74 years old with a mean age of 38.44. Urinary samples show that 12% of the respondent (n=22) has Cd level above than average, and 1.5 % of the respondent (n=3) has urinary Pb at an above normal level. Bivariate analysis show that there was a positive correlation between urinary Cd and urinary Pb (r= 0.309; p<0.001). Furthermore, there was a negative correlation between urinary Cd level and full vital capacity (FVC) (r=-0.202, p=0.004), Force expiratory volume at 1 second (FEV1) (r = -0.225, p=0.001), and also with Force expiratory flow between 25-75% FVC (FEF25%-75%) (r= -0.187, p=0.008). however, urinary Pb did not show any association with FVC, FEV1, FEV1/FVC, or FEF25%-75%. Multiple linear regression analysis shows that urinary Cd remained significant and negatively affect FVC% (p=0.025) and FEV1% (p=0.004) achieved from the predicted value. On top of that, other factors such as education level (p=0.013) and duration of smoking(p=0.003) may influencing both urinary Cd and performance in lung function as well, suggesting Cd as a potential mediating factor between smoking and impairment of lung function. however, there was no interaction detected between heavy metal or other influencing factor in this study. In short, there is a negative linear relationship detected between urinary Cd and lung function, and urinary Cd is likely to affects lung function in a restrictive pattern. Since smoking is also an influencing factor for urinary Cd and lung function impairment, it is highly suggested that smokers should be screened for lung function and urinary Cd level in the future for early disease prevention.

Keywords: lung function, heavy metals, community

Procedia PDF Downloads 138

11373 Non-Differentiable Mond-Weir Type Symmetric Duality under Generalized Invexity

Authors: Jai Prakash Verma, Khushboo Verma

Abstract:

In the present paper, a pair of Mond-Weir type non-differentiable multiobjective second-order programming problems, involving two kernel functions, where each of the objective functions contains support function, is formulated. We prove weak, strong and converse duality theorem for the second-order symmetric dual programs under η-pseudoinvexity conditions.

Keywords: non-differentiable multiobjective programming, second-order symmetric duality, efficiency, support function, eta-pseudoinvexity

Procedia PDF Downloads 235

11372 The Modality of Multivariate Skew Normal Mixture

Authors: Bader Alruwaili, Surajit Ray

Abstract:

Finite mixtures are a flexible and powerful tool that can be used for univariate and multivariate distributions, and a wide range of research analysis has been conducted based on the multivariate normal mixture and multivariate of a t-mixture. Determining the number of modes is an important activity that, in turn, allows one to determine the number of homogeneous groups in a population. Our work currently being carried out relates to the study of the modality of the skew normal distribution in the univariate and multivariate cases. For the skew normal distribution, the aims are associated with studying the modality of the skew normal distribution and providing the ridgeline, the ridgeline elevation function, the $\Pi$ function, and the curvature function, and this will be conducive to an exploration of the number and location of mode when mixing the two components of skew normal distribution. The subsequent objective is to apply these results to the application of real world data sets, such as flow cytometry data.

Keywords: mode, modality, multivariate skew normal, finite mixture, number of mode

Procedia PDF Downloads 471

11371 Uncertainty Quantification of Corrosion Anomaly Length of Oil and Gas Steel Pipelines Based on Inline Inspection and Field Data

Authors: Tammeen Siraj, Wenxing Zhou, Terry Huang, Mohammad Al-Amin

Abstract:

The high resolution inline inspection (ILI) tool is used extensively in the pipeline industry to identify, locate, and measure metal-loss corrosion anomalies on buried oil and gas steel pipelines. Corrosion anomalies may occur singly (i.e. individual anomalies) or as clusters (i.e. a colony of corrosion anomalies). Although the ILI technology has advanced immensely, there are measurement errors associated with the sizes of corrosion anomalies reported by ILI tools due limitations of the tools and associated sizing algorithms, and detection threshold of the tools (i.e. the minimum detectable feature dimension). Quantifying the measurement error in the ILI data is crucial for corrosion management and developing maintenance strategies that satisfy the safety and economic constraints. Studies on the measurement error associated with the length of the corrosion anomalies (in the longitudinal direction of the pipeline) has been scarcely reported in the literature and will be investigated in the present study. Limitations in the ILI tool and clustering process can sometimes cause clustering error, which is defined as the error introduced during the clustering process by including or excluding a single or group of anomalies in or from a cluster. Clustering error has been found to be one of the biggest contributory factors for relatively high uncertainties associated with ILI reported anomaly length. As such, this study focuses on developing a consistent and comprehensive framework to quantify the measurement errors in the ILI-reported anomaly length by comparing the ILI data and corresponding field measurements for individual and clustered corrosion anomalies. The analysis carried out in this study is based on the ILI and field measurement data for a set of anomalies collected from two segments of a buried natural gas pipeline currently in service in Alberta, Canada. Data analyses showed that the measurement error associated with the ILI-reported length of the anomalies without clustering error, denoted as Type I anomalies is markedly less than that for anomalies with clustering error, denoted as Type II anomalies. A methodology employing data mining techniques is further proposed to classify the Type I and Type II anomalies based on the ILI-reported corrosion anomaly information.

Keywords: clustered corrosion anomaly, corrosion anomaly assessment, corrosion anomaly length, individual corrosion anomaly, metal-loss corrosion, oil and gas steel pipeline

Procedia PDF Downloads 294

11370 Agglomerative Hierarchical Clustering Based on Morphmetric Parameters of the Populations of Labeo rohita

Authors: Fayyaz Rasool, Naureen Aziz Qureshi, Shakeela Parveen

Abstract:

Labeo rohita populations from five geographical locations from the hatchery and riverine system of Punjab-Pakistan were studied for the clustering on the basis of similarities and differences based on morphometric parameters within the species. Agglomerative Hierarchical Clustering (AHC) was done by using Pearson Correlation Coefficient and Unweighted Pair Group Method with Arithmetic Mean (UPGMA) as Agglomeration method by XLSTAT 2012 version 1.02. A dendrogram with the data on the morphometrics of the representative samples of each site divided the populations of Labeo rohita in to five major clusters or classes. The variance decomposition for the optimal classification values remained as 19.24% for within class variation, while 80.76% for the between class differences. The representative central objects of the each class, the distances between the class centroids and also the distance between the central objects of the classes were generated by the analysis. A measurable distinction between the classes of the populations of the Labeo rohita was indicated in this study which determined the impacts of changing environment and other possible factors influencing the variation level among the populations of the same species.

Keywords: AHC, Labeo rohita, hatchery, riverine, morphometric

Procedia PDF Downloads 438

11369 Switched System Diagnosis Based on Intelligent State Filtering with Unknown Models

Authors: Nada Slimane, Foued Theljani, Faouzi Bouani

Abstract:

The paper addresses the problem of fault diagnosis for systems operating in several modes (normal or faulty) based on states assessment. We use, for this purpose, a methodology consisting of three main processes: 1) sequential data clustering, 2) linear model regression and 3) state filtering. Typically, Kalman Filter (KF) is an algorithm that provides estimation of unknown states using a sequence of I/O measurements. Inevitably, although it is an efficient technique for state estimation, it presents two main weaknesses. First, it merely predicts states without being able to isolate/classify them according to their different operating modes, whether normal or faulty modes. To deal with this dilemma, the KF is endowed with an extra clustering step based fully on sequential version of the k-means algorithm. Second, to provide state estimation, KF requires state space models, which can be unknown. A linear regularized regression is used to identify the required models. To prove its effectiveness, the proposed approach is assessed on a simulated benchmark.

Keywords: clustering, diagnosis, Kalman Filtering, k-means, regularized regression

Procedia PDF Downloads 165

11368 Routing and Energy Efficiency through Data Coupled Clustering in Large Scale Wireless Sensor Networks (WSNs)

Authors: Jainendra Singh, Zaheeruddin

Abstract:

A typical wireless sensor networks (WSNs) consists of several tiny and low-power sensors which use radio frequency to perform distributed sensing tasks. The longevity of wireless sensor networks (WSNs) is a major issue that impacts the application of such networks. While routing protocols are striving to save energy by acting on sensor nodes, recent studies show that network lifetime can be enhanced by further involving sink mobility. A common approach for energy efficiency is partitioning the network into clusters with correlated data, where the representative nodes simply transmit or average measurements inside the cluster. In this paper, we propose an energy- efficient homogenous clustering (EHC) technique. In this technique, the decision of each sensor is based on their residual energy and an estimate of how many of its neighboring cluster heads (CHs) will benefit from it being a CH. We, also explore the routing algorithm in clustered WSNs. We show that the proposed schemes significantly outperform current approaches in terms of packet delay, hop count and energy consumption of WSNs.

Keywords: wireless sensor network, energy efficiency, clustering, routing

Procedia PDF Downloads 246

11367 Tuning Fractional Order Proportional-Integral-Derivative Controller Using Hybrid Genetic Algorithm Particle Swarm and Differential Evolution Optimization Methods for Automatic Voltage Regulator System

Authors: Fouzi Aboura

Abstract:

The fractional order proportional-integral-derivative (FOPID) controller or fractional order (PIλDµ) is a proportional-integral-derivative (PID) controller where integral order (λ) and derivative order (µ) are fractional, one of the important application of classical PID is the Automatic Voltage Regulator (AVR).The FOPID controller needs five parameters optimization while the design of conventional PID controller needs only three parameters to be optimized. In our paper we have proposed a comparison between algorithms Differential Evolution (DE) and Hybrid Genetic Algorithm Particle Swarm Optimization (HGAPSO) ,we have studied theirs characteristics and performance analysis to find an optimum parameters of the FOPID controller, a new objective function is also proposed to take into account the relation between the performance criteria’s.

Keywords: FOPID controller, fractional order, AVR system, objective function, optimization, GA, PSO, HGAPSO

Procedia PDF Downloads 78

11366 Multi Objective Optimization for Two-Sided Assembly Line Balancing

Authors: Srushti Bhatt, M. B. Kiran

Abstract:

Two-sided assembly line balancing problem is yet to be addressed simply to compete for the global market for manufacturers. The task assigned in an ordered sequence to get optimum performance of the system is known as assembly line balancing problem mainly classified as single and two sided. It is very challenging in manufacturing industries to balance two-sided assembly line, wherein the set of sequential workstations the task operations are performed in two sides of the line. The conflicting major objective in two-sided assembly line balancing problem is either to maximize /minimize the performance parameters. The present study emphases on combining different evolutionary algorithm; ant colony, Tabu search and petri net method; and compares their results of an algorithm for solving two-sided assembly line balancing problem. The concept of multi objective optimization of performance parameters is now a day adopted to make a decision involving more than one objective function to be simultaneously optimized. The optimum result can be expected among the selected methods using multi-objective optimization. The performance parameters considered in the present study are a number of workstation, slickness and smoothness index. The simulation of the assembly line balancing problem provides optimal results of classical and practical problems.

Keywords: Ant colony, petri net, tabu search, two sided ALBP

Procedia PDF Downloads 261

11365 Performance Analysis of Deterministic Stable Election Protocol Using Fuzzy Logic in Wireless Sensor Network

Authors: Sumanpreet Kaur, Harjit Pal Singh, Vikas Khullar

Abstract:

In Wireless Sensor Network (WSN), the sensor containing motes (nodes) incorporate batteries that can lament at some extent. To upgrade the energy utilization, clustering is one of the prototypical approaches for split sensor motes into a number of clusters where one mote (also called as node) proceeds as a Cluster Head (CH). CH selection is one of the optimization techniques for enlarging stability and network lifespan. Deterministic Stable Election Protocol (DSEP) is an effectual clustering protocol that makes use of three kinds of nodes with dissimilar residual energy for CH election. Fuzzy Logic technology is used to expand energy level of DSEP protocol by using fuzzy inference system. This paper presents protocol DSEP using Fuzzy Logic (DSEP-FL) CH by taking into account four linguistic variables such as energy, concentration, centrality and distance to base station. Simulation results show that our proposed method gives more effective results in term of a lifespan of network and stability as compared to the performance of other clustering protocols.

Keywords: DSEP, fuzzy logic, energy model, WSN

Procedia PDF Downloads 181

11364 Technical, Environmental and Financial Assessment for Optimal Sizing of Run-of-River Small Hydropower Project: Case Study in Colombia

Authors: David Calderon Villegas, Thomas Kaltizky

Abstract:

Run-of-river (RoR) hydropower projects represent a viable, clean, and cost-effective alternative to dam-based plants and provide decentralized power production. However, RoR schemes cost-effectiveness depends on the proper selection of site and design flow, which is a challenging task because it requires multivariate analysis. In this respect, this study presents the development of an investment decision support tool for assessing the optimal size of an RoR scheme considering the technical, environmental, and cost constraints. The net present value (NPV) from a project perspective is used as an objective function for supporting the investment decision. The tool has been tested by applying it to an actual RoR project recently proposed in Colombia. The obtained results show that the optimum point in financial terms does not match the flow that maximizes energy generation from exploiting the river's available flow. For the case study, the flow that maximizes energy corresponds to a value of 5.1 m3/s. In comparison, an amount of 2.1 m3/s maximizes the investors NPV. Finally, a sensitivity analysis is performed to determine the NPV as a function of the debt rate changes and the electricity prices and the CapEx. Even for the worst-case scenario, the optimal size represents a positive business case with an NPV of 2.2 USD million and an IRR 1.5 times higher than the discount rate.

Keywords: small hydropower, renewable energy, RoR schemes, optimal sizing, objective function

Procedia PDF Downloads 116

11363 Closed Forms of Trigonometric Series Interms of Riemann’s ζ Function and Dirichlet η, λ, β Functions or the Hurwitz Zeta Function and Harmonic Numbers

Authors: Slobodan B. Tričković

Abstract:

We present the results concerned with trigonometric series that include sine and cosine functions with a parameter appearing in the denominator. We derive two types of closed-form formulas for trigonometric series. At first, for some integer values, as we know that Riemann’s ζ function and Dirichlet η, λ equal zero at negative even integers, whereas Dirichlet’s β function equals zero at negative odd integers, after a certain number of members, the rest of the series vanishes. Thus, a trigonometric series becomes a polynomial with coefficients involving Riemann’s ζ function and Dirichlet η, λ, β functions. On the other hand, in some cases, one cannot immediately replace the parameter with any positive integer because we shall encounter singularities. So it is necessary to take a limit, so in the process, we apply L’Hospital’s rule and, after a series of rearrangements, we bring a trigonometric series to a form suitable for the application of Choi-Srivastava’s theorem dealing with Hurwitz’s zeta function and Harmonic numbers. In this way, we express a trigonometric series as a polynomial over Hurwitz’s zeta function derivative.

Keywords: Dirichlet eta lambda beta functions, Riemann's zeta function, Hurwitz zeta function, Harmonic numbers

Procedia PDF Downloads 82

11362 Optimal Power Distribution and Power Trading Control among Loads in a Smart Grid Operated Industry

Authors: Vivek Upadhayay, Siddharth Deshmukh

Abstract:

In recent years utilization of renewable energy sources has increased majorly because of the increase in global warming concerns. Organization these days are generally operated by Micro grid or smart grid on a small level. Power optimization and optimal load tripping is possible in a smart grid based industry. In any plant or industry loads can be divided into different categories based on their importance to the plant and power requirement pattern in the working days. Coming up with an idea to divide loads in different such categories and providing different power management algorithm to each category of load can reduce the power cost and can come handy in balancing stability and reliability of power. An objective function is defined which is subjected to a variable that we are supposed to minimize. Constraint equations are formed taking difference between the power usages pattern of present day and same day of previous week. By considering the objectives of minimal load tripping and optimal power distribution the proposed problem formulation is a multi-object optimization problem. Through normalization of each objective function, the multi-objective optimization is transformed to single-objective optimization. As a result we are getting the optimized values of power required to each load for present day by use of the past values of the required power for the same day of last week. It is quite a demand response scheduling of power. These minimized values then will be distributed to each load through an algorithm used to optimize the power distribution at a greater depth. In case of power storage exceeding the power requirement, profit can be made by selling exceeding power to the main grid.

Keywords: power flow optimization, power trading enhancement, smart grid, multi-object optimization

Procedia PDF Downloads 512

11361 A Concept of Data Mining with XML Document

Authors: Akshay Agrawal, Anand K. Srivastava

Abstract:

The increasing amount of XML datasets available to casual users increases the necessity of investigating techniques to extract knowledge from these data. Data mining is widely applied in the database research area in order to extract frequent correlations of values from both structured and semi-structured datasets. The increasing availability of heterogeneous XML sources has raised a number of issues concerning how to represent and manage these semi structured data. In recent years due to the importance of managing these resources and extracting knowledge from them, lots of methods have been proposed in order to represent and cluster them in different ways.

Keywords: XML, similarity measure, clustering, cluster quality, semantic clustering

Procedia PDF Downloads 357

11360 Progressive Multimedia Collection Structuring via Scene Linking

Authors: Aman Berhe, Camille Guinaudeau, Claude Barras

Abstract:

In order to facilitate information seeking in large collections of multimedia documents with long and progressive content (such as broadcast news or TV series), one can extract the semantic links that exist between semantically coherent parts of documents, i.e., scenes. The links can then create a coherent collection of scenes from which it is easier to perform content analysis, topic extraction, or information retrieval. In this paper, we focus on TV series structuring and propose two approaches for scene linking at different levels of granularity (episode and season): a fuzzy online clustering technique and a graph-based community detection algorithm. When evaluated on the two first seasons of the TV series Game of Thrones, we found that the fuzzy online clustering approach performed better compared to graph-based community detection at the episode level, while graph-based approaches show better performance at the season level.

Keywords: multimedia collection structuring, progressive content, scene linking, fuzzy clustering, community detection

Procedia PDF Downloads 79

11359 Analysis of Production Forecasting in Unconventional Gas Resources Development Using Machine Learning and Data-Driven Approach

Authors: Dongkwon Han, Sangho Kim, Sunil Kwon

Abstract:

Unconventional gas resources have dramatically changed the future energy landscape. Unlike conventional gas resources, the key challenges in unconventional gas have been the requirement that applies to advanced approaches for production forecasting due to uncertainty and complexity of fluid flow. In this study, artificial neural network (ANN) model which integrates machine learning and data-driven approach was developed to predict productivity in shale gas. The database of 129 wells of Eagle Ford shale basin used for testing and training of the ANN model. The Input data related to hydraulic fracturing, well completion and productivity of shale gas were selected and the output data is a cumulative production. The performance of the ANN using all data sets, clustering and variables importance (VI) models were compared in the mean absolute percentage error (MAPE). ANN model using all data sets, clustering, and VI were obtained as 44.22%, 10.08% (cluster 1), 5.26% (cluster 2), 6.35%(cluster 3), and 32.23% (ANN VI), 23.19% (SVM VI), respectively. The results showed that the pre-trained ANN model provides more accurate results than the ANN model using all data sets.

Keywords: unconventional gas, artificial neural network, machine learning, clustering, variables importance

Procedia PDF Downloads 178

11358 Implementation of Algorithm K-Means for Grouping District/City in Central Java Based on Macro Economic Indicators

Authors: Nur Aziza Luxfiati

Abstract:

Clustering is partitioning data sets into sub-sets or groups in such a way that elements certain properties have shared property settings with a high level of similarity within one group and a low level of similarity between groups. . The K-Means algorithm is one of thealgorithmsclustering as a grouping tool that is most widely used in scientific and industrial applications because the basic idea of the kalgorithm is-means very simple. In this research, applying the technique of clustering using the k-means algorithm as a method of solving the problem of national development imbalances between regions in Central Java Province based on macroeconomic indicators. The data sample used is secondary data obtained from the Central Java Provincial Statistics Agency regarding macroeconomic indicator data which is part of the publication of the 2019 National Socio-Economic Survey (Susenas) data. score and determine the number of clusters (k) using the elbow method. After the clustering process is carried out, the validation is tested using themethodsBetween-Class Variation (BCV) and Within-Class Variation (WCV). The results showed that detection outlier using z-score normalization showed no outliers. In addition, the results of the clustering test obtained a ratio value that was not high, namely 0.011%. There are two district/city clusters in Central Java Province which have economic similarities based on the variables used, namely the first cluster with a high economic level consisting of 13 districts/cities and theclustersecondwith a low economic level consisting of 22 districts/cities. And in the cluster second, namely, between low economies, the authors grouped districts/cities based on similarities to macroeconomic indicators such as 20 districts of Gross Regional Domestic Product, with a Poverty Depth Index of 19 districts, with 5 districts in Human Development, and as many as Open Unemployment Rate. 10 districts.

Keywords: clustering, K-Means algorithm, macroeconomic indicators, inequality, national development

Procedia PDF Downloads 145

11357 Stability Analysis of SEIR Epidemic Model with Treatment Function

Authors: Sasiporn Rattanasupha, Settapat Chinviriyasit

Abstract:

The treatment function adopts a continuous and differentiable function which can describe the effect of delayed treatment when the number of infected individuals increases and the medical condition is limited. In this paper, the SEIR epidemic model with treatment function is studied to investigate the dynamics of the model due to the effect of treatment. It is assumed that the treatment rate is proportional to the number of infective patients. The stability of the model is analyzed. The model is simulated to illustrate the analytical results and to investigate the effects of treatment on the spread of infection.

Keywords: basic reproduction number, local stability, SEIR epidemic model, treatment function

Procedia PDF Downloads 504

11356 An Empirical Study to Predict Myocardial Infarction Using K-Means and Hierarchical Clustering

Authors: Md. Minhazul Islam, Shah Ashisul Abed Nipun, Majharul Islam, Md. Abdur Rakib Rahat, Jonayet Miah, Salsavil Kayyum, Anwar Shadaab, Faiz Al Faisal

Abstract:

The target of this research is to predict Myocardial Infarction using unsupervised Machine Learning algorithms. Myocardial Infarction Prediction related to heart disease is a challenging factor faced by doctors & hospitals. In this prediction, accuracy of the heart disease plays a vital role. From this concern, the authors have analyzed on a myocardial dataset to predict myocardial infarction using some popular Machine Learning algorithms K-Means and Hierarchical Clustering. This research includes a collection of data and the classification of data using Machine Learning Algorithms. The authors collected 345 instances along with 26 attributes from different hospitals in Bangladesh. This data have been collected from patients suffering from myocardial infarction along with other symptoms. This model would be able to find and mine hidden facts from historical Myocardial Infarction cases. The aim of this study is to analyze the accuracy level to predict Myocardial Infarction by using Machine Learning techniques.

Keywords: Machine Learning, K-means, Hierarchical Clustering, Myocardial Infarction, Heart Disease

Procedia PDF Downloads 185

11355 Cleaning of Scientific References in Large Patent Databases Using Rule-Based Scoring and Clustering

Authors: Emiel Caron

Abstract:

Patent databases contain patent related data, organized in a relational data model, and are used to produce various patent statistics. These databases store raw data about scientific references cited by patents. For example, Patstat holds references to tens of millions of scientific journal publications and conference proceedings. These references might be used to connect patent databases with bibliographic databases, e.g. to study to the relation between science, technology, and innovation in various domains. Problematic in such studies is the low data quality of the references, i.e. they are often ambiguous, unstructured, and incomplete. Moreover, a complete bibliographic reference is stored in only one attribute. Therefore, a computerized cleaning and disambiguation method for large patent databases is developed in this work. The method uses rule-based scoring and clustering. The rules are based on bibliographic metadata, retrieved from the raw data by regular expressions, and are transparent and adaptable. The rules in combination with string similarity measures are used to detect pairs of records that are potential duplicates. Due to the scoring, different rules can be combined, to join scientific references, i.e. the rules reinforce each other. The scores are based on expert knowledge and initial method evaluation. After the scoring, pairs of scientific references that are above a certain threshold, are clustered by means of single-linkage clustering algorithm to form connected components. The method is designed to disambiguate all the scientific references in the Patstat database. The performance evaluation of the clustering method, on a large golden set with highly cited papers, shows on average a 99% precision and a 95% recall. The method is therefore accurate but careful, i.e. it weighs precision over recall. Consequently, separate clusters of high precision are sometimes formed, when there is not enough evidence for connecting scientific references, e.g. in the case of missing year and journal information for a reference. The clusters produced by the method can be used to directly link the Patstat database with bibliographic databases as the Web of Science or Scopus.

Keywords: clustering, data cleaning, data disambiguation, data mining, patent analysis, scientometrics

Procedia PDF Downloads 174

11354 Integration of Quality Function Deployment and Modular Function Deployment in Product Development

Authors: Naga Velamakuri, Jyothi K. Reddy

Abstract:

Quality must be designed into a product and not inspected has become the main motto of all the companies globally. Due to the rapidly increasing technology in the past few decades, the nature of demands from the consumers has become more sophisticated. To sustain this global revolution of innovation in production systems, companies have to take steps to accommodate this technology growth. In this process of understanding the customers' expectations, all the firms globally take steps to deliver a perfect output. Most of these techniques also concentrate on the consistent development and optimization of the product to exceed the expectations. Quality Function Deployment(QFD) and Modular Function Deployment(MFD) are such techniques which rely on the voice of the customer and help deliver the needs. In this paper, Quality Function Deployment and Modular Function Deployment techniques which help in converting the quantitative descriptions to qualitative outcomes are discussed. The area of interest would be to understand the scope of each of the techniques and the application range in product development when these are applied together to any problem. The research question would be mainly aimed at comprehending the limitations using modularity in product development.

Keywords: quality function deployment, modular function deployment, house of quality, methodology

Procedia PDF Downloads 307

11353 Improvement of Process Competitiveness Using Intelligent Reference Models

Authors: Julio Macedo

Abstract:

Several methodologies are now available to conceive the improvements of a process so that it becomes competitive as for example total quality, process reengineering, six sigma, define measure analysis improvement control method. These improvements are of different nature and can be external to the process represented by an optimization model or a discrete simulation model. In addition, the process stakeholders are several and have different desired performances for the process. Hence, the methodologies above do not have a tool to aid in the conception of the required improvements. In order to fill this void we suggest the use of intelligent reference models. A reference model is a set of qualitative differential equations and an objective function that minimizes the gap between the current and the desired performance indexes of the process. The reference models are intelligent so when they receive the current state of the problematic process and the desired performance indexes they generate the required improvements for the problematic process. The reference models are fuzzy cognitive maps added with an objective function and trained using the improvements implemented by the high performance firms. Experiments done in a set of students show the reference models allow them to conceive more improvements than students that do not use these models.

Keywords: continuous improvement, fuzzy cognitive maps, process competitiveness, qualitative simulation, system dynamics

Procedia PDF Downloads 71

11352 Optimization of Solar Rankine Cycle by Exergy Analysis and Genetic Algorithm

Authors: R. Akbari, M. A. Ehyaei, R. Shahi Shavvon

Abstract:

Nowadays, solar energy is used for energy purposes such as the use of thermal energy for domestic, industrial and power applications, as well as the conversion of the sunlight into electricity by photovoltaic cells. In this study, the thermodynamic simulation of the solar Rankin cycle with phase change material (paraffin) was first studied. Then energy and exergy analyses were performed. For optimization, a single and multi-objective genetic optimization algorithm to maximize thermal and exergy efficiency was used. The parameters discussed in this paper included the effects of input pressure on turbines, input mass flow to turbines, the surface of converters and collector angles on thermal and exergy efficiency. In the organic Rankin cycle, where solar energy is used as input energy, the fluid selection is considered as a necessary factor to achieve reliable and efficient operation. Therefore, silicon oil is selected for a high-temperature cycle and water for a low-temperature cycle as an operating fluid. The results showed that increasing the mass flow to turbines 1 and 2 would increase thermal efficiency, while it reduces and increases the exergy efficiency in turbines 1 and 2, respectively. Increasing the inlet pressure to the turbine 1 decreases the thermal and exergy efficiency, and increasing the inlet pressure to the turbine 2 increases the thermal efficiency and exergy efficiency. Also, increasing the angle of the collector increased thermal efficiency and exergy. The thermal efficiency of the system was 22.3% which improves to 33.2 and 27.2% in single-objective and multi-objective optimization, respectively. Also, the exergy efficiency of the system was 1.33% which has been improved to 1.719 and 1.529% in single-objective and multi-objective optimization, respectively. These results showed that the thermal and exergy efficiency in a single-objective optimization is greater than the multi-objective optimization.

Keywords: exergy analysis, genetic algorithm, rankine cycle, single and multi-objective function

Procedia PDF Downloads 131

11351 Automatic Detection of Traffic Stop Locations Using GPS Data

Authors: Areej Salaymeh, Loren Schwiebert, Stephen Remias, Jonathan Waddell

Abstract:

Extracting information from new data sources has emerged as a crucial task in many traffic planning processes, such as identifying traffic patterns, route planning, traffic forecasting, and locating infrastructure improvements. Given the advanced technologies used to collect Global Positioning System (GPS) data from dedicated GPS devices, GPS equipped phones, and navigation tools, intelligent data analysis methodologies are necessary to mine this raw data. In this research, an automatic detection framework is proposed to help identify and classify the locations of stopped GPS waypoints into two main categories: signalized intersections or highway congestion. The Delaunay triangulation is used to perform this assessment in the clustering phase. While most of the existing clustering algorithms need assumptions about the data distribution, the effectiveness of the Delaunay triangulation relies on triangulating geographical data points without such assumptions. Our proposed method starts by cleaning noise from the data and normalizing it. Next, the framework will identify stoppage points by calculating the traveled distance. The last step is to use clustering to form groups of waypoints for signalized traffic and highway congestion. Next, a binary classifier was applied to find distinguish highway congestion from signalized stop points. The binary classifier uses the length of the cluster to find congestion. The proposed framework shows high accuracy for identifying the stop positions and congestion points in around 99.2% of trials. We show that it is possible, using limited GPS data, to distinguish with high accuracy.

Keywords: Delaunay triangulation, clustering, intelligent transportation systems, GPS data

Procedia PDF Downloads 258

11350 Feature Selection of Personal Authentication Based on EEG Signal for K-Means Cluster Analysis Using Silhouettes Score

Authors: Jianfeng Hu

Abstract:

Personal authentication based on electroencephalography (EEG) signals is one of the important field for the biometric technology. More and more researchers have used EEG signals as data source for biometric. However, there are some disadvantages for biometrics based on EEG signals. The proposed method employs entropy measures for feature extraction from EEG signals. Four type of entropies measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE) and spectral entropy (PE), were deployed as feature set. In a silhouettes calculation, the distance from each data point in a cluster to all another point within the same cluster and to all other data points in the closest cluster are determined. Thus silhouettes provide a measure of how well a data point was classified when it was assigned to a cluster and the separation between them. This feature renders silhouettes potentially well suited for assessing cluster quality in personal authentication methods. In this study, “silhouettes scores” was used for assessing the cluster quality of k-means clustering algorithm is well suited for comparing the performance of each EEG dataset. The main goals of this study are: (1) to represent each target as a tuple of multiple feature sets, (2) to assign a suitable measure to each feature set, (3) to combine different feature sets, (4) to determine the optimal feature weighting. Using precision/recall evaluations, the effectiveness of feature weighting in clustering was analyzed. EEG data from 22 subjects were collected. Results showed that: (1) It is possible to use fewer electrodes (3-4) for personal authentication. (2) There was the difference between each electrode for personal authentication (p<0.01). (3) There is no significant difference for authentication performance among feature sets (except feature PE). Conclusion: The combination of k-means clustering algorithm and silhouette approach proved to be an accurate method for personal authentication based on EEG signals.

Keywords: personal authentication, K-mean clustering, electroencephalogram, EEG, silhouettes

Procedia PDF Downloads 267

11349 Proposing an Algorithm to Cluster Ad Hoc Networks, Modulating Two Levels of Learning Automaton and Nodes Additive Weighting

Authors: Mohammad Rostami, Mohammad Reza Forghani, Elahe Neshat, Fatemeh Yaghoobi

Abstract:

An Ad Hoc network consists of wireless mobile equipment which connects to each other without any infrastructure, using connection equipment. The best way to form a hierarchical structure is clustering. Various methods of clustering can form more stable clusters according to nodes' mobility. In this research we propose an algorithm, which allocates some weight to nodes based on factors, i.e. link stability and power reduction rate. According to the allocated weight in the previous phase, the cellular learning automaton picks out in the second phase nodes which are candidates for being cluster head. In the third phase, learning automaton selects cluster head nodes, member nodes and forms the cluster. Thus, this automaton does the learning from the setting and can form optimized clusters in terms of power consumption and link stability. To simulate the proposed algorithm we have used omnet++4.2.2. Simulation results indicate that newly formed clusters have a longer lifetime than previous algorithms and decrease strongly network overload by reducing update rate.

Keywords: mobile Ad Hoc networks, clustering, learning automaton, cellular automaton, battery power

Procedia PDF Downloads 393

11348 Finding the Longest Common Subsequence in Normal DNA and Disease Affected Human DNA Using Self Organizing Map

Authors: G. Tamilpavai, C. Vishnuppriya

Abstract:

Bioinformatics is an active research area which combines biological matter as well as computer science research. The longest common subsequence (LCSS) is one of the major challenges in various bioinformatics applications. The computation of the LCSS plays a vital role in biomedicine and also it is an essential task in DNA sequence analysis in genetics. It includes wide range of disease diagnosing steps. The objective of this proposed system is to find the longest common subsequence which presents in a normal and various disease affected human DNA sequence using Self Organizing Map (SOM) and LCSS. The human DNA sequence is collected from National Center for Biotechnology Information (NCBI) database. Initially, the human DNA sequence is separated as k-mer using k-mer separation rule. Mean and median values are calculated from each separated k-mer. These calculated values are fed as input to the Self Organizing Map for the purpose of clustering. Then obtained clusters are given to the Longest Common Sub Sequence (LCSS) algorithm for finding common subsequence which presents in every clusters. It returns nx(n-1)/2 subsequence for each cluster where n is number of k-mer in a specific cluster. Experimental outcomes of this proposed system produce the possible number of longest common subsequence of normal and disease affected DNA data. Thus the proposed system will be a good initiative aid for finding disease causing sequence. Finally, performance analysis is carried out for different DNA sequences. The obtained values show that the retrieval of LCSS is done in a shorter time than the existing system.

Keywords: clustering, k-mers, longest common subsequence, SOM

Procedia PDF Downloads 245

11347 Consumer Load Profile Determination with Entropy-Based K-Means Algorithm

Authors: Ioannis P. Panapakidis, Marios N. Moschakis

Abstract:

With the continuous increment of smart meter installations across the globe, the need for processing of the load data is evident. Clustering-based load profiling is built upon the utilization of unsupervised machine learning tools for the purpose of formulating the typical load curves or load profiles. The most commonly used algorithm in the load profiling literature is the K-means. While the algorithm has been successfully tested in a variety of applications, its drawback is the strong dependence in the initialization phase. This paper proposes a novel modified form of the K-means that addresses the aforementioned problem. Simulation results indicate the superiority of the proposed algorithm compared to the K-means.

Keywords: clustering, load profiling, load modeling, machine learning, energy efficiency and quality

Procedia PDF Downloads 142

11346 Empirical Study of Partitions Similarity Measures

Authors: Abdelkrim Alfalah, Lahcen Ouarbya, John Howroyd

Abstract:

This paper investigates and compares the performance of four existing distances and similarity measures between partitions. The partition measures considered are Rand Index (RI), Adjusted Rand Index (ARI), Variation of Information (VI), and Normalised Variation of Information (NVI). This work investigates the ability of these partition measures to capture three predefined intuitions: the variation within randomly generated partitions, the sensitivity to small perturbations, and finally the independence from the dataset scale. It has been shown that the Adjusted Rand Index performed well overall, with regards to these three intuitions.

Keywords: clustering, comparing partitions, similarity measure, partition distance, partition metric, similarity between partitions, clustering comparison.

Procedia PDF Downloads 175