Search results for: k-means clustering based feature weighting
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 29262

Search results for: k-means clustering based feature weighting

28752 Machine Vision System for Measuring the Quality of Bulk Sun-dried Organic Raisins

Authors: Navab Karimi, Tohid Alizadeh

Abstract:

An intelligent vision-based system was designed to measure the quality and purity of raisins. A machine vision setup was utilized to capture the images of bulk raisins in ranges of 5-50% mixed pure-impure berries. The textural features of bulk raisins were extracted using Grey-level Histograms, Co-occurrence Matrix, and Local Binary Pattern (a total of 108 features). Genetic Algorithm and neural network regression were used for selecting and ranking the best features (21 features). As a result, the GLCM features set was found to have the highest accuracy (92.4%) among the other sets. Followingly, multiple feature combinations of the previous stage were fed into the second regression (linear regression) to increase accuracy, wherein a combination of 16 features was found to be the optimum. Finally, a Support Vector Machine (SVM) classifier was used to differentiate the mixtures, producing the best efficiency and accuracy of 96.2% and 97.35%, respectively.

Keywords: sun-dried organic raisin, genetic algorithm, feature extraction, ann regression, linear regression, support vector machine, south azerbaijan.

Procedia PDF Downloads 73
28751 Institutional Segmantation and Country Clustering: Implications for Multinational Enterprises Over Standardized Management

Authors: Jung-Hoon Han, Jooyoung Kwak

Abstract:

Distances between cultures, institutions are gaining academic attention once again since the classical debate on the validity of globalization. Despite the incessant efforts to define international segments with various concepts, no significant attempts have been made considering the institutional dimensions. Resource-based theory and institutional theory provides useful insights in assessing market environment and understanding when and how MNEs loose or gain advantages. This study consists of two parts: identifying institutional clusters and predicting the effect of MNEs’ origin on the applicability of competitive advantages. MNEs in one country cluster are expected to use similar management systems.

Keywords: institutional theory, resource-based theory, institutional environment, cultural dimensions, cluster analysis, standardized management

Procedia PDF Downloads 489
28750 Multidisciplinary and Multilevel Design Methodology of Unmanned Aerial Vehicles using Enhanced Collaborative Optimization

Authors: Pedro F. Albuquerque, Pedro V. Gamboa, Miguel A. Silvestre

Abstract:

The present work describes the implementation of the Enhanced Collaborative Optimization (ECO) multilevel architecture with a gradient-based optimization algorithm with the aim of performing a multidisciplinary design optimization of a generic unmanned aerial vehicle with morphing technologies. The concepts of weighting coefficient and a dynamic compatibility parameter are presented for the ECO architecture. A routine that calculates the aircraft performance for the user defined mission profile and vehicle’s performance requirements has been implemented using low fidelity models for the aerodynamics, stability, propulsion, weight, balance and flight performance. A benchmarking case study for evaluating the advantage of using a variable span wing within the optimization methodology developed is presented.

Keywords: multidisciplinary, multilevel, morphing, enhanced collaborative optimization

Procedia PDF Downloads 929
28749 Clustering Using Cooperative Multihop Mini-Groups in Wireless Sensor Network: A Novel Approach

Authors: Virender Ranga, Mayank Dave, Anil Kumar Verma

Abstract:

Recently wireless sensor networks (WSNs) are used in many real life applications like environmental monitoring, habitat monitoring, health monitoring etc. Due to power constraint cheaper devices used in these applications, the energy consumption of each device should be kept as low as possible such that network operates for longer period of time. One of the techniques to prolong the network lifetime is an intelligent grouping of sensor nodes such that they can perform their operation in cooperative and energy efficient manner. With this motivation, we propose a novel approach by organize the sensor nodes in cooperative multihop mini-groups so that the total global energy consumption of the network can be reduced and network lifetime can be improved. Our proposed approach also reduces the number of transmitted messages inside the WSNs, which further minimizes the energy consumption of the whole network. The experimental simulations show that our proposed approach outperforms over the state-of-the-art approach in terms of stability period and aggregated data.

Keywords: clustering, cluster-head, mini-group, stability period

Procedia PDF Downloads 357
28748 Surface Flattening Assisted with 3D Mannequin Based on Minimum Energy

Authors: Shih-Wen Hsiao, Rong-Qi Chen, Chien-Yu Lin

Abstract:

The topic of surface flattening plays a vital role in the field of computer aided design and manufacture. Surface flattening enables the production of 2D patterns and it can be used in design and manufacturing for developing a 3D surface to a 2D platform, especially in fashion design. This study describes surface flattening based on minimum energy methods according to the property of different fabrics. Firstly, through the geometric feature of a 3D surface, the less transformed area can be flattened on a 2D platform by geodesic. Then, strain energy that has accumulated in mesh can be stably released by an approximate implicit method and revised error function. In some cases, cutting mesh to further release the energy is a common way to fix the situation and enhance the accuracy of the surface flattening, and this makes the obtained 2D pattern naturally generate significant cracks. When this methodology is applied to a 3D mannequin constructed with feature lines, it enhances the level of computer-aided fashion design. Besides, when different fabrics are applied to fashion design, it is necessary to revise the shape of a 2D pattern according to the properties of the fabric. With this model, the outline of 2D patterns can be revised by distributing the strain energy with different results according to different fabric properties. Finally, this research uses some common design cases to illustrate and verify the feasibility of this methodology.

Keywords: surface flattening, strain energy, minimum energy, approximate implicit method, fashion design

Procedia PDF Downloads 334
28747 TMBCoI-SIOT: Trust Management System Based on the Community of Interest for the Social Internet of Things

Authors: Oumaima Ben Abderrahim, Mohamed Houcine Elhedhili, Leila Saidane

Abstract:

In this paper, we propose a trust management system based on clustering architecture for the social internet of things called TMBCO-SIOT. The proposed model integrates numerous factors such as direct and indirect trust; transaction factor; precaution factor; and social modeling of trust. The novelty of our approach can be summed up in two aspects. The first aspect concerns the architecture based on the community of interest (CoT) where each community is headed by an administrator (admin). However, the second aspect is the trust management system that tries to prevent On-Off attacks and mitigates dishonest recommendations using the k-means algorithm and guarantor things. The effectiveness of the proposed system is proved by simulation against malicious nodes.

Keywords: IoT, trust management system, attacks, trust, dishonest recommendations, K-means algorithm

Procedia PDF Downloads 212
28746 Verbal Prefix Selection in Old Japanese: A Corpus-Based Study

Authors: Zixi You

Abstract:

There are a number of verbal prefixes in Old Japanese. However, the selection or the compatibility of verbs and verbal prefixes is among the least investigated topics on Old Japanese language. Unlike other types of prefixes, verbal prefixes in dictionaries are more often than not listed with very brief information such as ‘unknown meaning’ or ‘rhythmic function only’. To fill in a part of this knowledge gap, this paper presents an exhaustive investigation based on the newly developed ‘Oxford Corpus of Old Japanese’ (OCOJ), which included nearly all existing resource of Old Japanese language, with detailed linguistics information in TEI-XML tags. In this paper, we propose the possibility that the following three prefixes, i-, sa-, ta- (with ta- being considered as a variation of sa-), are relevant to split intransitivity in Old Japanese, with evidence that unergative verbs favor i- and that unergative verbs favor sa-(ta-). This might be undermined by the fact that transitives are also found to follow i-. However, with several manifestations of split intransitivity in Old Japanese discussed, the behavior of transitives in verbal prefix selection is no longer as surprising as it may seem to be when one look at the selection of verbal prefix in isolation. It is possible that there are one or more features that played essential roles in determining the selection of i-, and the attested transitive verbs happen to have these features. The data suggest that this feature is a sense of ‘change’ of location or state involved in the event donated by the verb, which is a feature of typical unaccusatives. This is further discussed in the ‘affectedness’ hierarchy. The presentation of this paper, which includes a brief demonstration of the OCOJ, is expected to be of the interest of both specialists and general audiences.

Keywords: old Japanese, split intransitivity, unaccusatives, unergatives, verbal prefix selection

Procedia PDF Downloads 415
28745 Fault-Detection and Self-Stabilization Protocol for Wireless Sensor Networks

Authors: Ather Saeed, Arif Khan, Jeffrey Gosper

Abstract:

Sensor devices are prone to errors and sudden node failures, which are difficult to detect in a timely manner when deployed in real-time, hazardous, large-scale harsh environments and in medical emergencies. Therefore, the loss of data can be life-threatening when the sensed phenomenon is not disseminated due to sudden node failure, battery depletion or temporary malfunctioning. We introduce a set of partial differential equations for localizing faults, similar to Green’s and Maxwell’s equations used in Electrostatics and Electromagnetism. We introduce a node organization and clustering scheme for self-stabilizing sensor networks. Green’s theorem is applied to regions where the curve is closed and continuously differentiable to ensure network connectivity. Experimental results show that the proposed GTFD (Green’s Theorem fault-detection and Self-stabilization) protocol not only detects faulty nodes but also accurately generates network stability graphs where urgent intervention is required for dynamically self-stabilizing the network.

Keywords: Green’s Theorem, self-stabilization, fault-localization, RSSI, WSN, clustering

Procedia PDF Downloads 75
28744 Methods of Livable Goal-Oriented Master Urban Design: A Case Study on Zibo City

Authors: Xiaoping Zhang, Fengying Yan

Abstract:

The implementation of the 'Urban Design Management Measures' requires that the master urban design should aim at creating a livable urban space. However, to our best knowledge, the existing researches and practices of master urban design not only focus less on the livable space but also face a number of problems such as paying more attention to the image of the city, ignoring the people-oriented and lacking dynamic continuity. In order to make the master urban design can better guide the construction of city. Firstly, the paper proposes the livable city hierarchy system to meet the needs of different groups of people and then constructs the framework of livable goal-oriented master urban design based on the theory of livable content and the ideological origin of people-oriented. Secondly, the paper takes the master urban design practice of Zibo as a sample and puts forward the design strategy of strengthening the pattern, improve the quality of space, shape the feature, and establish a series of action plans based on the strategy of urban space development. Finally, the paper explores the method system of livable goal-oriented master urban design from the aspects of safety pattern, morphology pattern, neighborhood scale, open space, street space, public interface, style feature, public participation and action plans.

Keywords: livable, master urban design, public participation, zibo city

Procedia PDF Downloads 316
28743 Design of Agricultural Machinery Factory Facility Layout

Authors: Nilda Tri Putri, Muhammad Taufik

Abstract:

Tools and agricultural machinery (Alsintan) is a tool used in agribusiness activities. Alsintan used to change the traditional farming systems generally use manual equipment into modern agriculture with mechanization. CV Nugraha Chakti Consultant make an action plan for industrial development Alsintan West Sumatra in 2012 to develop medium industries of Alsintan become a major industry of Alsintan, one of efforts made is increase the production capacity of the industry Alsintan. Production capacity for superior products as hydrotiller and threshers set each for 2.000 units per year. CV Citra Dragon as one of the medium industry alsintan in West Sumatra has a plan to relocate the existing plant to meet growing consumer demand each year. Increased production capacity and plant relocation plan has led to a change in the layout; therefore need to design the layout of the plant facility CV Citra Dragon. First step the to design of plant layout is design the layout of the production floor. The design of the production floor layout is done by applying group technology layout. The initial step is to do a machine grouping and part family using the Average Linkage Clustering (ALC) and Rank Order Clustering (ROC). Furthermore done independent work station design and layout design using the Modified Spanning Tree (MST). Alternative selection layout is done to select the best production floor layout between ALC and ROC cell grouping. Furthermore, to design the layout of warehouses, offices and other production support facilities. Activity Relationship Chart methods used to organize the placement of factory facilities has been designed. After structuring plan facilities, calculated cost manufacturing facility plant establishment. Type of layout is used on the production floor layout technology group. The production floor is composed of four cell machinery, assembly area and painting area. The total distance of the displacement of material in a single production amounted to 1120.16 m which means need 18,7minutes of transportation time for one time production. Alsintan Factory has designed a circular flow pattern with 11 facilities. The facilities were designed consisting of 10 rooms and 1 parking space. The measure of factory building is 84 m x 52 m.

Keywords: Average Linkage Clustering (ALC), Rank Order Clustering (ROC), Modified Spanning Tree (MST), Activity Relationship Chart (ARC)

Procedia PDF Downloads 496
28742 Customer Churn Prediction by Using Four Machine Learning Algorithms Integrating Features Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial component of maintaining a customer-oriented business as in the telecom industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years. It has become more important to understand customers’ needs in this strong market of telecom industries, especially for those who are looking to turn over their service providers. So, predictive churn is now a mandatory requirement for retaining those customers. Machine learning can be utilized to accomplish this. Churn Prediction has become a very important topic in terms of machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify factors of customers’ churn based on their past service usage history. Aiming at this objective, the study makes use of feature selection, normalization, and feature engineering. Then, this study compared the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Evaluation of the performance was conducted by using the F1 score and ROC-AUC. Comparing the results of this study with existing models has proven to produce better results. The results showed the Gradients Boosting with feature selection technique outperformed in this study by achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.

Keywords: machine learning, gradient boosting, logistic regression, churn, random forest, decision tree, ROC, AUC, F1-score

Procedia PDF Downloads 134
28741 A Mixing Matrix Estimation Algorithm for Speech Signals under the Under-Determined Blind Source Separation Model

Authors: Jing Wu, Wei Lv, Yibing Li, Yuanfan You

Abstract:

The separation of speech signals has become a research hotspot in the field of signal processing in recent years. It has many applications and influences in teleconferencing, hearing aids, speech recognition of machines and so on. The sounds received are usually noisy. The issue of identifying the sounds of interest and obtaining clear sounds in such an environment becomes a problem worth exploring, that is, the problem of blind source separation. This paper focuses on the under-determined blind source separation (UBSS). Sparse component analysis is generally used for the problem of under-determined blind source separation. The method is mainly divided into two parts. Firstly, the clustering algorithm is used to estimate the mixing matrix according to the observed signals. Then the signal is separated based on the known mixing matrix. In this paper, the problem of mixing matrix estimation is studied. This paper proposes an improved algorithm to estimate the mixing matrix for speech signals in the UBSS model. The traditional potential algorithm is not accurate for the mixing matrix estimation, especially for low signal-to noise ratio (SNR).In response to this problem, this paper considers the idea of an improved potential function method to estimate the mixing matrix. The algorithm not only avoids the inuence of insufficient prior information in traditional clustering algorithm, but also improves the estimation accuracy of mixing matrix. This paper takes the mixing of four speech signals into two channels as an example. The results of simulations show that the approach in this paper not only improves the accuracy of estimation, but also applies to any mixing matrix.

Keywords: DBSCAN, potential function, speech signal, the UBSS model

Procedia PDF Downloads 135
28740 Code Embedding for Software Vulnerability Discovery Based on Semantic Information

Authors: Joseph Gear, Yue Xu, Ernest Foo, Praveen Gauravaran, Zahra Jadidi, Leonie Simpson

Abstract:

Deep learning methods have been seeing an increasing application to the long-standing security research goal of automatic vulnerability detection for source code. Attention, however, must still be paid to the task of producing vector representations for source code (code embeddings) as input for these deep learning models. Graphical representations of code, most predominantly Abstract Syntax Trees and Code Property Graphs, have received some use in this task of late; however, for very large graphs representing very large code snip- pets, learning becomes prohibitively computationally expensive. This expense may be reduced by intelligently pruning this input to only vulnerability-relevant information; however, little research in this area has been performed. Additionally, most existing work comprehends code based solely on the structure of the graph at the expense of the information contained by the node in the graph. This paper proposes Semantic-enhanced Code Embedding for Vulnerability Discovery (SCEVD), a deep learning model which uses semantic-based feature selection for its vulnerability classification model. It uses information from the nodes as well as the structure of the code graph in order to select features which are most indicative of the presence or absence of vulnerabilities. This model is implemented and experimentally tested using the SARD Juliet vulnerability test suite to determine its efficacy. It is able to improve on existing code graph feature selection methods, as demonstrated by its improved ability to discover vulnerabilities.

Keywords: code representation, deep learning, source code semantics, vulnerability discovery

Procedia PDF Downloads 158
28739 E-Hailing Taxi Industry Management Mode Innovation Based on the Credit Evaluation

Authors: Yuan-lin Liu, Ye Li, Tian Xia

Abstract:

There are some shortcomings in Chinese existing taxi management modes. This paper suggests to establish the third-party comprehensive information management platform and put forward an evaluation model based on credit. Four indicators are used to evaluate the drivers’ credit, they are passengers’ evaluation score, driving behavior evaluation, drivers’ average bad record number, and personal credit score. A weighted clustering method is used to achieve credit level evaluation for taxi drivers. The management of taxi industry is based on the credit level, while the grade of the drivers is accorded to their credit rating. Credit rating determines the cost, income levels, the market access, useful period of license and the level of wage and bonus, as well as violation fine. These methods can make the credit evaluation effective. In conclusion, more credit data will help to set up a more accurate and detailed classification standard library.

Keywords: credit, mobile internet, e-hailing taxi, management mode, weighted cluster

Procedia PDF Downloads 325
28738 Identification of Watershed Landscape Character Types in Middle Yangtze River within Wuhan Metropolitan Area

Authors: Huijie Wang, Bin Zhang

Abstract:

In China, the middle reaches of the Yangtze River are well-developed, boasting a wealth of different types of watershed landscape. In this regard, landscape character assessment (LCA) can serve as a basis for protection, management and planning of trans-regional watershed landscape types. For this study, we chose the middle reaches of the Yangtze River in Wuhan metropolitan area as our study site, wherein the water system consists of rich variety in landscape types. We analyzed trans-regional data to cluster and identify types of landscape characteristics at two levels. 55 basins were analyzed as variables with topography, land cover and river system features in order to identify the watershed landscape character types. For watershed landscape, drainage density and degree of curvature were specified as special variables to directly reflect the regional differences of river system features. Then, we used the principal component analysis (PCA) method and hierarchical clustering algorithm based on the geographic information system (GIS) and statistical products and services solution (SPSS) to obtain results for clusters of watershed landscape which were divided into 8 characteristic groups. These groups highlighted watershed landscape characteristics of different river systems as well as key landscape characteristics that can serve as a basis for targeted protection of watershed landscape characteristics, thus helping to rationally develop multi-value landscape resources and promote coordinated development of trans-regions.

Keywords: GIS, hierarchical clustering, landscape character, landscape typology, principal component analysis, watershed

Procedia PDF Downloads 228
28737 The Relationship between Human Pose and Intention to Fire a Handgun

Authors: Joshua van Staden, Dane Brown, Karen Bradshaw

Abstract:

Gun violence is a significant problem in modern-day society. Early detection of carried handguns through closed-circuit television (CCTV) can aid in preventing potential gun violence. However, CCTV operators have a limited attention span. Machine learning approaches to automating the detection of dangerous gun carriers provide a way to aid CCTV operators in identifying these individuals. This study provides insight into the relationship between human key points extracted using human pose estimation (HPE) and their intention to fire a weapon. We examine the feature importance of each keypoint and their correlations. We use principal component analysis (PCA) to reduce the feature space and optimize detection. Finally, we run a set of classifiers to determine what form of classifier performs well on this data. We find that hips, shoulders, and knees tend to be crucial aspects of the human pose when making these predictions. Furthermore, the horizontal position plays a larger role than the vertical position. Of the 66 key points, nine principal components could be used to make nonlinear classifications with 86% accuracy. Furthermore, linear classifications could be done with 85% accuracy, showing that there is a degree of linearity in the data.

Keywords: feature engineering, human pose, machine learning, security

Procedia PDF Downloads 93
28736 Occupational Attainment of Second Generation of Ethnic Minority Immigrants in the UK

Authors: Rukhsana Kausar, Issam Malki

Abstract:

The integration and assimilation of ethnic minority immigrants (EMIs) and their subsequent generations remains a serious unsettled issue in most of the host countries. This study conducts the labour market gender analysis to investigate specifically whether second generation of ethnic minority immigrants in the UK is gaining access to professional and managerial employment and advantaged occupational positions on par with their native counterparts. The data used to examine the labour market achievements of EMIs is taken from Labour Force Survey (LFS) for the period 2014-2018. We apply a multivalued treatment under ignorability as proposed by Cattaneo (2010), which refers to treatment effects under the assumptions of (i) selection – on – observables and (ii) common support. We report estimates of Average Treatment Effect (ATE), Average Treatment Effect on the Treated (ATET), and Potential Outcomes Means (POM) using three estimators, including the Regression Adjustment (RA), Augmented Inverse Probability Weighting (AIPW) and Inverse Probability Weighting- Regression Adjustment (IPWRA). We consider two cases: the case with four categories where the first-generation natives are the base category, the second case combine all natives as a base group. Our findings suggest the following. Under Case 1, the estimated probabilities and differences across groups are consistently similar and highly significant. As expected, first generation natives have the highest probability for higher career attainment among both men and women. The findings also suggest that first generation immigrants perform better than the remaining two groups, including the second-generation natives and immigrants. Furthermore, second generation immigrants have higher probability to attain higher professional career, while this is lower for a managerial career. Similar conclusions are reached under Case 2. That is to say that both first – generation and second – generation immigrants have a lower probability for higher career and managerial attainment. First – generation immigrants are found to perform better than second – generation immigrants.

Keywords: immigrnats, second generation, occupational attainment, ethnicity

Procedia PDF Downloads 108
28735 GBKMeans: A Genetic Based K-Means Applied to the Capacitated Planning of Reading Units

Authors: Anderson S. Fonseca, Italo F. S. Da Silva, Robert D. A. Santos, Mayara G. Da Silva, Pedro H. C. Vieira, Antonio M. S. Sobrinho, Victor H. B. Lemos, Petterson S. Diniz, Anselmo C. Paiva, Eliana M. G. Monteiro

Abstract:

In Brazil, the National Electric Energy Agency (ANEEL) establishes that electrical energy companies are responsible for measuring and billing their customers. Among these regulations, it’s defined that a company must bill your customers within 27-33 days. If a relocation or a change of period is required, the consumer must be notified in writing, in advance of a billing period. To make it easier to organize a workday’s measurements, these companies create a reading plan. These plans consist of grouping customers into reading groups, which are visited by an employee responsible for measuring consumption and billing. The creation process of a plan efficiently and optimally is a capacitated clustering problem with constraints related to homogeneity and compactness, that is, the employee’s working load and the geographical position of the consuming unit. This process is a work done manually by several experts who have experience in the geographic formation of the region, which takes a large number of days to complete the final planning, and because it’s human activity, there is no guarantee of finding the best optimization for planning. In this paper, the GBKMeans method presents a technique based on K-Means and genetic algorithms for creating a capacitated cluster that respects the constraints established in an efficient and balanced manner, that minimizes the cost of relocating consumer units and the time required for final planning creation. The results obtained by the presented method are compared with the current planning of a real city, showing an improvement of 54.71% in the standard deviation of working load and 11.97% in the compactness of the groups.

Keywords: capacitated clustering, k-means, genetic algorithm, districting problems

Procedia PDF Downloads 198
28734 Static vs. Stream Mining Trajectories Similarity Measures

Authors: Musaab Riyadh, Norwati Mustapha, Dina Riyadh

Abstract:

Trajectory similarity can be defined as the cost of transforming one trajectory into another based on certain similarity method. It is the core of numerous mining tasks such as clustering, classification, and indexing. Various approaches have been suggested to measure similarity based on the geometric and dynamic properties of trajectory, the overlapping between trajectory segments, and the confined area between entire trajectories. In this article, an evaluation of these approaches has been done based on computational cost, usage memory, accuracy, and the amount of data which is needed in advance to determine its suitability to stream mining applications. The evaluation results show that the stream mining applications support similarity methods which have low computational cost and memory, single scan on data, and free of mathematical complexity due to the high-speed generation of data.

Keywords: global distance measure, local distance measure, semantic trajectory, spatial dimension, stream data mining

Procedia PDF Downloads 396
28733 The Capacity of Mel Frequency Cepstral Coefficients for Speech Recognition

Authors: Fawaz S. Al-Anzi, Dia AbuZeina

Abstract:

Speech recognition is of an important contribution in promoting new technologies in human computer interaction. Today, there is a growing need to employ speech technology in daily life and business activities. However, speech recognition is a challenging task that requires different stages before obtaining the desired output. Among automatic speech recognition (ASR) components is the feature extraction process, which parameterizes the speech signal to produce the corresponding feature vectors. Feature extraction process aims at approximating the linguistic content that is conveyed by the input speech signal. In speech processing field, there are several methods to extract speech features, however, Mel Frequency Cepstral Coefficients (MFCC) is the popular technique. It has been long observed that the MFCC is dominantly used in the well-known recognizers such as the Carnegie Mellon University (CMU) Sphinx and the Markov Model Toolkit (HTK). Hence, this paper focuses on the MFCC method as the standard choice to identify the different speech segments in order to obtain the language phonemes for further training and decoding steps. Due to MFCC good performance, the previous studies show that the MFCC dominates the Arabic ASR research. In this paper, we demonstrate MFCC as well as the intermediate steps that are performed to get these coefficients using the HTK toolkit.

Keywords: speech recognition, acoustic features, mel frequency, cepstral coefficients

Procedia PDF Downloads 259
28732 Genomic Prediction Reliability Using Haplotypes Defined by Different Methods

Authors: Sohyoung Won, Heebal Kim, Dajeong Lim

Abstract:

Genomic prediction is an effective way to measure the abilities of livestock for breeding based on genomic estimated breeding values, statistically predicted values from genotype data using best linear unbiased prediction (BLUP). Using haplotypes, clusters of linked single nucleotide polymorphisms (SNPs), as markers instead of individual SNPs can improve the reliability of genomic prediction since the probability of a quantitative trait loci to be in strong linkage disequilibrium (LD) with markers is higher. To efficiently use haplotypes in genomic prediction, finding optimal ways to define haplotypes is needed. In this study, 770K SNP chip data was collected from Hanwoo (Korean cattle) population consisted of 2506 cattle. Haplotypes were first defined in three different ways using 770K SNP chip data: haplotypes were defined based on 1) length of haplotypes (bp), 2) the number of SNPs, and 3) k-medoids clustering by LD. To compare the methods in parallel, haplotypes defined by all methods were set to have comparable sizes; in each method, haplotypes defined to have an average number of 5, 10, 20 or 50 SNPs were tested respectively. A modified GBLUP method using haplotype alleles as predictor variables was implemented for testing the prediction reliability of each haplotype set. Also, conventional genomic BLUP (GBLUP) method, which uses individual SNPs were tested to evaluate the performance of the haplotype sets on genomic prediction. Carcass weight was used as the phenotype for testing. As a result, using haplotypes defined by all three methods showed increased reliability compared to conventional GBLUP. There were not many differences in the reliability between different haplotype defining methods. The reliability of genomic prediction was highest when the average number of SNPs per haplotype was 20 in all three methods, implying that haplotypes including around 20 SNPs can be optimal to use as markers for genomic prediction. When the number of alleles generated by each haplotype defining methods was compared, clustering by LD generated the least number of alleles. Using haplotype alleles for genomic prediction showed better performance, suggesting improved accuracy in genomic selection. The number of predictor variables was decreased when the LD-based method was used while all three haplotype defining methods showed similar performances. This suggests that defining haplotypes based on LD can reduce computational costs and allows efficient prediction. Finding optimal ways to define haplotypes and using the haplotype alleles as markers can provide improved performance and efficiency in genomic prediction.

Keywords: best linear unbiased predictor, genomic prediction, haplotype, linkage disequilibrium

Procedia PDF Downloads 141
28731 ANOVA-Based Feature Selection and Machine Learning System for IoT Anomaly Detection

Authors: Muhammad Ali

Abstract:

Cyber-attacks and anomaly detection on the Internet of Things (IoT) infrastructure is emerging concern in the domain of data-driven intrusion. Rapidly increasing IoT risk is now making headlines around the world. denial of service, malicious control, data type probing, malicious operation, DDos, scan, spying, and wrong setup are attacks and anomalies that can affect an IoT system failure. Everyone talks about cyber security, connectivity, smart devices, and real-time data extraction. IoT devices expose a wide variety of new cyber security attack vectors in network traffic. For further than IoT development, and mainly for smart and IoT applications, there is a necessity for intelligent processing and analysis of data. So, our approach is too secure. We train several machine learning models that have been compared to accurately predicting attacks and anomalies on IoT systems, considering IoT applications, with ANOVA-based feature selection with fewer prediction models to evaluate network traffic to help prevent IoT devices. The machine learning (ML) algorithms that have been used here are KNN, SVM, NB, D.T., and R.F., with the most satisfactory test accuracy with fast detection. The evaluation of ML metrics includes precision, recall, F1 score, FPR, NPV, G.M., MCC, and AUC & ROC. The Random Forest algorithm achieved the best results with less prediction time, with an accuracy of 99.98%.

Keywords: machine learning, analysis of variance, Internet of Thing, network security, intrusion detection

Procedia PDF Downloads 125
28730 Drug-Drug Interaction Prediction in Diabetes Mellitus

Authors: Rashini Maduka, C. R. Wijesinghe, A. R. Weerasinghe

Abstract:

Drug-drug interactions (DDIs) can happen when two or more drugs are taken together. Today DDIs have become a serious health issue due to adverse drug effects. In vivo and in vitro methods for identifying DDIs are time-consuming and costly. Therefore, in-silico-based approaches are preferred in DDI identification. Most machine learning models for DDI prediction are used chemical and biological drug properties as features. However, some drug features are not available and costly to extract. Therefore, it is better to make automatic feature engineering. Furthermore, people who have diabetes already suffer from other diseases and take more than one medicine together. Then adverse drug effects may happen to diabetic patients and cause unpleasant reactions in the body. In this study, we present a model with a graph convolutional autoencoder and a graph decoder using a dataset from DrugBank version 5.1.3. The main objective of the model is to identify unknown interactions between antidiabetic drugs and the drugs taken by diabetic patients for other diseases. We considered automatic feature engineering and used Known DDIs only as the input for the model. Our model has achieved 0.86 in AUC and 0.86 in AP.

Keywords: drug-drug interaction prediction, graph embedding, graph convolutional networks, adverse drug effects

Procedia PDF Downloads 100
28729 Towards a Distributed Computation Platform Tailored for Educational Process Discovery and Analysis

Authors: Awatef Hicheur Cairns, Billel Gueni, Hind Hafdi, Christian Joubert, Nasser Khelifa

Abstract:

Given the ever changing needs of the job markets, education and training centers are increasingly held accountable for student success. Therefore, education and training centers have to focus on ways to streamline their offers and educational processes in order to achieve the highest level of quality in curriculum contents and managerial decisions. Educational process mining is an emerging field in the educational data mining (EDM) discipline, concerned with developing methods to discover, analyze and provide a visual representation of complete educational processes. In this paper, we present our distributed computation platform which allows different education centers and institutions to load their data and access to advanced data mining and process mining services. To achieve this, we present also a comparative study of the different clustering techniques developed in the context of process mining to partition efficiently educational traces. Our goal is to find the best strategy for distributing heavy analysis computations on many processing nodes of our platform.

Keywords: educational process mining, distributed process mining, clustering, distributed platform, educational data mining, ProM

Procedia PDF Downloads 454
28728 Study for an Optimal Cable Connection within an Inner Grid of an Offshore Wind Farm

Authors: Je-Seok Shin, Wook-Won Kim, Jin-O Kim

Abstract:

The offshore wind farm needs to be designed carefully considering economics and reliability aspects. There are many decision-making problems for designing entire offshore wind farm, this paper focuses on an inner grid layout which means the connection between wind turbines as well as between wind turbines and an offshore substation. A methodology proposed in this paper determines the connections and the cable type for each connection section using K-clustering, minimum spanning tree and cable selection algorithms. And then, a cost evaluation is performed in terms of investment, power loss and reliability. Through the cost evaluation, an optimal layout of inner grid is determined so as to have the lowest total cost. In order to demonstrate the validity of the methodology, the case study is conducted on 240MW offshore wind farm, and the results show that it is helpful to design optimally offshore wind farm.

Keywords: offshore wind farm, optimal layout, k-clustering algorithm, minimum spanning algorithm, cable type selection, power loss cost, reliability cost

Procedia PDF Downloads 385
28727 An Improved Tracking Approach Using Particle Filter and Background Subtraction

Authors: Amir Mukhtar, Dr. Likun Xia

Abstract:

An improved, robust and efficient visual target tracking algorithm using particle filtering is proposed. Particle filtering has been proven very successful in estimating non-Gaussian and non-linear problems. In this paper, the particle filter is used with color feature to estimate the target state with time. Color distributions are applied as this feature is scale and rotational invariant, shows robustness to partial occlusion and computationally efficient. The performance is made more robust by choosing the different (YIQ) color scheme. Tracking is performed by comparison of chrominance histograms of target and candidate positions (particles). Color based particle filter tracking often leads to inaccurate results when light intensity changes during a video stream. Furthermore, background subtraction technique is used for size estimation of the target. The qualitative evaluation of proposed algorithm is performed on several real-world videos. The experimental results demonstrate that the improved algorithm can track the moving objects very well under illumination changes, occlusion and moving background.

Keywords: tracking, particle filter, histogram, corner points, occlusion, illumination

Procedia PDF Downloads 381
28726 Assessment of the Groundwater Agricultural Pollution Risk: Case of the Semi-Arid Region (Batna-East Algeria)

Authors: Dib Imane, Chettah Wahid, Khedidja Abdelhamid

Abstract:

The plain of Gadaïne - Ain Yaghout, located in the wilaya of Batna (Eastern Algeria), experiences intensive human activities, particularly in agricultural practices which are accompanied by an increasing use of chemical fertilizers and manure. These activities lead to a degradation of the quality of water resources. In order to protect the quality of groundwater in this plain and formulate effective strategies to mitigate or avoid any contamination of groundwater, a risk assessment using the European method known as “COSTE Action 620” was applied to the mio-. plio-quaternary aquifer of this plain. Risk assessment requires the identification of existing dangers and their potential impact on groundwater by using a system of evaluation and weighting. In addition, it also requires the integration of the hydrogeological factors that influence the movement of contaminants by means of the intrinsic vulnerability maps of groundwater, which were produced according to the modified DRASTIC method. The overall danger on the plain ranges from very low to high. Farms containing stables, houses detached from the public sewer system, and sometimes manure piles were assigned a weighting factor expressing the highest degree of harmfulness; this created a medium to high danger index. Large areas for agricultural practice and grazing are characterized, successively, by low to very low danger. Therefore, the risks present at the study site are classified according to a range from medium to very high-risk intensity. These classes successively represent 3%, 49%, and 0.2% of the surface of the plain. Cultivated land and farms present a high to very high level of risk successively. In addition, with the exception of the salt mine, which presents a very high level of risk, the gas stations and cemeteries, as well as the railway line, represent a high level of risk.

Keywords: semi-arid, quality of water resources, risk assessment, vulnerability, contaminants

Procedia PDF Downloads 49
28725 Semantic Based Analysis in Complaint Management System with Analytics

Authors: Francis Alterado, Jennifer Enriquez

Abstract:

Semantic Based Analysis in Complaint Management System with Analytics is an enhanced tool of providing complaints by the clients as well as a mechanism for Palawan Polytechnic College to gather, process, and monitor status of these complaints. The study has a mobile application that serves as a remote facility of communication between the students and the school management on the issues encountered by the student and the solution of every complaint received. In processing the complaints, text mining and clustering algorithms were utilized. Every module of the systems was tested and based on the results; these are 100% free from error before integration was done. A system testing was also done by checking the expected functionality of the system which was 100% functional. The system was tested by 10 students by forwarding complaints to 10 departments. Based on results, the students were able to submit complaints, the system was able to process accordingly by identifying to which department the complaints are intended, and the concerned department was able to give feedback on the complaint received to the student. With this, the system gained 4.7 rating which means Excellent.

Keywords: technology adoption, emerging technology, issues challenges, algorithm, text mining, mobile technology

Procedia PDF Downloads 199
28724 Adaptive Anchor Weighting for Improved Localization with Levenberg-Marquardt Optimization

Authors: Basak Can

Abstract:

This paper introduces an iterative and weighted localization method that utilizes a unique cost function formulation to significantly enhance the performance of positioning systems. The system employs locators, such as Gateways (GWs), to estimate and track the position of an End Node (EN). Performance is evaluated relative to the number of locators, with known locations determined through calibration. Performance evaluation is presented utilizing low cost single-antenna Bluetooth Low Energy (BLE) devices. The proposed approach can be applied to alternative Internet of Things (IoT) modulation schemes, as well as Ultra WideBand (UWB) or millimeter-wave (mmWave) based devices. In non-line-of-sight (NLOS) scenarios, using four or eight locators yields a 95th percentile localization performance of 2.2 meters and 1.5 meters, respectively, in a 4,305 square feet indoor area with BLE 5.1 devices. This method outperforms conventional RSSI-based techniques, achieving a 51% improvement with four locators and a 52 % improvement with eight locators. Future work involves modeling interference impact and implementing data curation across multiple channels to mitigate such effects.

Keywords: lateration, least squares, Levenberg-Marquardt algorithm, localization, path-loss, RMS error, RSSI, sensors, shadow fading, weighted localization

Procedia PDF Downloads 25
28723 Optimal Maintenance Clustering for Rail Track Components Subject to Possession Capacity Constraints

Authors: Cuong D. Dao, Rob J.I. Basten, Andreas Hartmann

Abstract:

This paper studies the optimal maintenance planning of preventive maintenance and renewal activities for components in a single railway track when the available time for maintenance is limited. The rail-track system consists of several types of components, such as rail, ballast, and switches with different preventive maintenance and renewal intervals. To perform maintenance or renewal on the track, a train free period for maintenance, called a possession, is required. Since a major possession directly affects the regular train schedule, maintenance and renewal activities are clustered as much as possible. In a highly dense and utilized railway network, the possession time on the track is critical since the demand for train operations is very high and a long possession has a severe impact on the regular train schedule. We present an optimization model and investigate the maintenance schedules with and without the possession capacity constraint. In addition, we also integrate the social-economic cost related to the effects of the maintenance time to the variable possession cost into the optimization model. A numerical example is provided to illustrate the model.

Keywords: rail-track components, maintenance, optimal clustering, possession capacity

Procedia PDF Downloads 262