Search results for: unsupervised feature ranking
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1987

Search results for: unsupervised feature ranking

1957 Fake News Detection Based on Fusion of Domain Knowledge and Expert Knowledge

Authors: Yulan Wu

Abstract:

The spread of fake news on social media has posed significant societal harm to the public and the nation, with its threats spanning various domains, including politics, economics, health, and more. News on social media often covers multiple domains, and existing models studied by researchers and relevant organizations often perform well on datasets from a single domain. However, when these methods are applied to social platforms with news spanning multiple domains, their performance significantly deteriorates. Existing research has attempted to enhance the detection performance of multi-domain datasets by adding single-domain labels to the data. However, these methods overlook the fact that a news article typically belongs to multiple domains, leading to the loss of domain knowledge information contained within the news text. To address this issue, research has found that news records in different domains often use different vocabularies to describe their content. In this paper, we propose a fake news detection framework that combines domain knowledge and expert knowledge. Firstly, it utilizes an unsupervised domain discovery module to generate a low-dimensional vector for each news article, representing domain embeddings, which can retain multi-domain knowledge of the news content. Then, a feature extraction module uses the domain embeddings discovered through unsupervised domain knowledge to guide multiple experts in extracting news knowledge for the total feature representation. Finally, a classifier is used to determine whether the news is fake or not. Experiments show that this approach can improve multi-domain fake news detection performance while reducing the cost of manually labeling domain labels.

Keywords: fake news, deep learning, natural language processing, multiple domains

Procedia PDF Downloads 37
1956 Combining Shallow and Deep Unsupervised Machine Learning Techniques to Detect Bad Actors in Complex Datasets

Authors: Jun Ming Moey, Zhiyaun Chen, David Nicholson

Abstract:

Bad actors are often hard to detect in data that imprints their behaviour patterns because they are comparatively rare events embedded in non-bad actor data. An unsupervised machine learning framework is applied here to detect bad actors in financial crime datasets that record millions of transactions undertaken by hundreds of actors (<0.01% bad). Specifically, the framework combines ‘shallow’ (PCA, Isolation Forest) and ‘deep’ (Autoencoder) methods to detect outlier patterns. Detection performance analysis for both the individual methods and their combination is reported.

Keywords: detection, machine learning, deep learning, unsupervised, outlier analysis, data science, fraud, financial crime

Procedia PDF Downloads 66
1955 Feature-Based Summarizing and Ranking from Customer Reviews

Authors: Dim En Nyaung, Thin Lai Lai Thein

Abstract:

Due to the rapid increase of Internet, web opinion sources dynamically emerge which is useful for both potential customers and product manufacturers for prediction and decision purposes. These are the user generated contents written in natural languages and are unstructured-free-texts scheme. Therefore, opinion mining techniques become popular to automatically process customer reviews for extracting product features and user opinions expressed over them. Since customer reviews may contain both opinionated and factual sentences, a supervised machine learning technique applies for subjectivity classification to improve the mining performance. In this paper, we dedicate our work is the task of opinion summarization. Therefore, product feature and opinion extraction is critical to opinion summarization, because its effectiveness significantly affects the identification of semantic relationships. The polarity and numeric score of all the features are determined by Senti-WordNet Lexicon. The problem of opinion summarization refers how to relate the opinion words with respect to a certain feature. Probabilistic based model of supervised learning will improve the result that is more flexible and effective.

Keywords: opinion mining, opinion summarization, sentiment analysis, text mining

Procedia PDF Downloads 307
1954 The Effect of Feature Selection on Pattern Classification

Authors: Chih-Fong Tsai, Ya-Han Hu

Abstract:

The aim of feature selection (or dimensionality reduction) is to filter out unrepresentative features (or variables) making the classifier perform better than the one without feature selection. Since there are many well-known feature selection algorithms, and different classifiers based on different selection results may perform differently, very few studies consider examining the effect of performing different feature selection algorithms on the classification performances by different classifiers over different types of datasets. In this paper, two widely used algorithms, which are the genetic algorithm (GA) and information gain (IG), are used to perform feature selection. On the other hand, three well-known classifiers are constructed, which are the CART decision tree (DT), multi-layer perceptron (MLP) neural network, and support vector machine (SVM). Based on 14 different types of datasets, the experimental results show that in most cases IG is a better feature selection algorithm than GA. In addition, the combinations of IG with DT and IG with SVM perform best and second best for small and large scale datasets.

Keywords: data mining, feature selection, pattern classification, dimensionality reduction

Procedia PDF Downloads 635
1953 A Survey of Feature Selection and Feature Extraction Techniques in Machine Learning

Authors: Samina Khalid, Shamila Nasreen

Abstract:

Dimensionality reduction as a preprocessing step to machine learning is effective in removing irrelevant and redundant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase of dimensionality of data poses a severe challenge to many existing feature selection and feature extraction methods with respect to efficiency and effectiveness. In the field of machine learning and pattern recognition, dimensionality reduction is important area, where many approaches have been proposed. In this paper, some widely used feature selection and feature extraction techniques have analyzed with the purpose of how effectively these techniques can be used to achieve high performance of learning algorithms that ultimately improves predictive accuracy of classifier. An endeavor to analyze dimensionality reduction techniques briefly with the purpose to investigate strengths and weaknesses of some widely used dimensionality reduction methods is presented.

Keywords: age related macular degeneration, feature selection feature subset selection feature extraction/transformation, FSA’s, relief, correlation based method, PCA, ICA

Procedia PDF Downloads 454
1952 Contrastive Learning for Unsupervised Object Segmentation in Sequential Images

Authors: Tian Zhang

Abstract:

Unsupervised object segmentation aims at segmenting objects in sequential images and obtaining the mask of each object without any manual intervention. Unsupervised segmentation remains a challenging task due to the lack of prior knowledge about these objects. Previous methods often require manually specifying the action of each object, which is often difficult to obtain. Instead, this paper does not need action information of objects and automatically learns the actions and relations among objects from the structured environment. To obtain the object segmentation of sequential images, the relationships between objects and images are extracted to infer the action and interaction of objects based on the multi-head attention mechanism. Three types of objects’ relationships in the object segmentation task are proposed: the relationship between objects in the same frame, the relationship between objects in two frames, and the relationship between objects and historical information. Based on these relationships, the proposed model (1) is effective in multiple objects segmentation tasks, (2) just needs images as input, and (3) produces better segmentation results as more relationships are considered. The experimental results on multiple datasets show that this paper’s method achieves state-of-art performance. The quantitative and qualitative analyses of the result are conducted. The proposed method could be easily extended to other similar applications.

Keywords: unsupervised object segmentation, attention mechanism, contrastive learning, structured environment

Procedia PDF Downloads 84
1951 Empirical Study on Factors Influencing SEO

Authors: Pakinee Aimmanee, Phoom Chokratsamesiri

Abstract:

Search engine has become an essential tool nowadays for people to search for their needed information on the internet. In this work, we evaluate the performance of the search engine from three factors: the keyword frequency, the number of inbound links, and the difficulty of the keyword. The evaluations are based on the ranking position and the number of days that Google has seen or detect the webpage. We find that the keyword frequency and the difficulty of the keyword do not affect the Google ranking where the number of inbound links gives remarkable improvement of the ranking position. The optimal number of inbound links found in the experiment is 10.

Keywords: SEO, information retrieval, web search, knowledge technologies

Procedia PDF Downloads 259
1950 Unsupervised Domain Adaptive Text Retrieval with Query Generation

Authors: Rui Yin, Haojie Wang, Xun Li

Abstract:

Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to only a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and then the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. Experiments show that our approach is more robust than previous methods in target domains that require less unlabeled data.

Keywords: dense retrieval, query generation, unsupervised training, text retrieval

Procedia PDF Downloads 37
1949 Influential Health Care System Rankings Can Conceal Maximal Inequities: A Simulation Study

Authors: Samuel Reisman

Abstract:

Background: Comparative rankings are increasingly used to evaluate health care systems. These rankings combine discrete attribute rankings into a composite overall ranking. Health care equity is a component of overall rankings, but excelling in other categories can counterbalance low inequity grades. Highly ranked inequitable health care would commend systems that disregard human rights. We simulated the ranking of a maximally inequitable health care system using a published, influential ranking methodology. Methods: We used The Commonwealth Fund’s ranking of eleven health care systems to simulate the rank of a maximally inequitable system. Eighty performance indicators were simulated, assuming maximal ineptitude in equity benchmarks. Maximal rankings in all non-equity subcategories were assumed. Subsequent stepwise simulations lowered all non-equity rank positions by one. Results: The maximally non-equitable health care system ranked first overall. Three subsequent stepwise simulations, lowering non-equity rankings by one, each resulted in an overall ranking within the top three. Discussion: Our results demonstrate that grossly inequitable health care systems can rank highly in comparative health care system rankings. These findings challenge the validity of ranking methodologies that subsume equity under broader benchmarks. We advocate limiting maximum overall rankings of health care systems to their individual equity rankings. Such limits are logical given the insignificance of health care system improvements to those lacking adequate health care.

Keywords: global health, health equity, healthcare systems, international health

Procedia PDF Downloads 367
1948 Hybrid Feature Selection Method for Sentiment Classification of Movie Reviews

Authors: Vishnu Goyal, Basant Agarwal

Abstract:

Sentiment analysis research provides methods for identifying the people’s opinion written in blogs, reviews, social networking websites etc. Sentiment analysis is to understand what opinion people have about any given entity, object or thing. Sentiment analysis research can be broadly categorised into three types of approaches i.e. semantic orientation, machine learning and lexicon based approaches. Feature selection methods improve the performance of the machine learning algorithms by eliminating the irrelevant features. Information gain feature selection method has been considered best method for sentiment analysis; however, it has the drawback of selection of threshold. Therefore, in this paper, we propose a hybrid feature selection methods comprising of information gain and proposed feature selection method. Initially, features are selected using Information Gain (IG) and further more noisy features are eliminated using the proposed feature selection method. Experimental results show the efficiency of the proposed feature selection methods.

Keywords: feature selection, sentiment analysis, hybrid feature selection

Procedia PDF Downloads 301
1947 Feature Location Restoration for Under-Sampled Photoplethysmogram Using Spline Interpolation

Authors: Hangsik Shin

Abstract:

The purpose of this research is to restore the feature location of under-sampled photoplethysmogram using spline interpolation and to investigate feasibility for feature shape restoration. We obtained 10 kHz-sampled photoplethysmogram and decimated it to generate under-sampled dataset. Decimated dataset has 5 kHz, 2.5 k Hz, 1 kHz, 500 Hz, 250 Hz, 25 Hz and 10 Hz sampling frequency. To investigate the restoration performance, we interpolated under-sampled signals with 10 kHz, then compared feature locations with feature locations of 10 kHz sampled photoplethysmogram. Features were upper and lower peak of photplethysmography waveform. Result showed that time differences were dramatically decreased by interpolation. Location error was lesser than 1 ms in both feature types. In 10 Hz sampled cases, location error was also deceased a lot, however, they were still over 10 ms.

Keywords: peak detection, photoplethysmography, sampling, signal reconstruction

Procedia PDF Downloads 336
1946 Classification of Political Affiliations by Reduced Number of Features

Authors: Vesile Evrim, Aliyu Awwal

Abstract:

By the evolvement in technology, the way of expressing opinions switched the direction to the digital world. The domain of politics as one of the hottest topics of opinion mining research merged together with the behavior analysis for affiliation determination in text which constitutes the subject of this paper. This study aims to classify the text in news/blogs either as Republican or Democrat with the minimum number of features. As an initial set, 68 features which 64 are constituted by Linguistic Inquiry and Word Count (LIWC) features are tested against 14 benchmark classification algorithms. In the later experiments, the dimensions of the feature vector reduced based on the 7 feature selection algorithms. The results show that Decision Tree, Rule Induction and M5 Rule classifiers when used with SVM and IGR feature selection algorithms performed the best up to 82.5% accuracy on a given dataset. Further tests on a single feature and the linguistic based feature sets showed the similar results. The feature “function” as an aggregate feature of the linguistic category, is obtained as the most differentiating feature among the 68 features with 81% accuracy by itself in classifying articles either as Republican or Democrat.

Keywords: feature selection, LIWC, machine learning, politics

Procedia PDF Downloads 353
1945 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is one of the emerging technology, which collects the data from various sensors and those data will be used in many fields. Data retrieval is one of the major issue where there is a need to extract the exact data as per the need. In this paper, large amount of data set is processed by using the feature selection. Feature selection helps to choose the data which are actually needed to process and execute the task. The key value is the one which helps to point out exact data available in the storage space. Here the available data is streamed and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

Procedia PDF Downloads 308
1944 Ranking of Inventory Policies Using Distance Based Approach Method

Authors: Gupta Amit, Kumar Ramesh, P. C. Tewari

Abstract:

Globalization is putting enormous pressure on the business organizations specially manufacturing one to rethink the supply chain in innovative manners. Inventory consumes major portion of total sale revenue. Effective and efficient inventory management plays a vital role for the successful functioning of any organization. Selection of inventory policy is one of the important purchasing activities. This paper focuses on selection and ranking of alternative inventory policies. A deterministic quantitative model-based on Distance Based Approach (DBA) method has been developed for evaluation and ranking of inventory policies. We have employed this concept first time for this type of the selection problem. Four inventory policies Economic Order Quantity (EOQ), Just in Time (JIT), Vendor Managed Inventory (VMI) and monthly policy are considered. Improper selection could affect a company’s competitiveness in terms of the productivity of its facilities and quality of its products. The ranking of inventory policies is a multi-criteria problem. There is a need to first identify the selection criteria and then processes the information with reference to relative importance of attributes for comparison. Criteria values for each inventory policy can be obtained either analytically or by using a simulation technique or they are linguistic subjective judgments defined by fuzzy sets, like, for example, the values of criteria. A methodology is developed and applied to rank the inventory policies.

Keywords: inventory policy, ranking, DBA, selection criteria

Procedia PDF Downloads 352
1943 An Efficiency Measurement of E-Government Performance for United Nation Ranking Index

Authors: Yassine Jadi, Lin Jie

Abstract:

In order to serve the society in an electronic manner, many developing countries have launched tremendous e-government projects. The strategies of development and implementation e-government system have reached different levels, and to ensure consistency of development, the governments need to evaluate e-government performance. The United nation has design e-government development ranking index (EGDI) that rely on three indexes, Online service index (OSI), Telecommunication Infrastructure index (TII), and human capital index( HCI) which are not reflecting the interaction between a government and their citizens. Based on data envelopment analyses (DEA) technique, we are using E-participating index (EPI) as an output of government effort to evaluate the performance of e-government system. Therefore, the ranking index can be achieved in efficiency manner.

Keywords: e-government, DEA, efficiency measurement, EGDI

Procedia PDF Downloads 347
1942 A Two-Step Framework for Unsupervised Speaker Segmentation Using BIC and Artificial Neural Network

Authors: Ahmad Alwosheel, Ahmed Alqaraawi

Abstract:

This work proposes a new speaker segmentation approach for two speakers. It is an online approach that does not require a prior information about speaker models. It has two phases, a conventional approach such as unsupervised BIC-based is utilized in the first phase to detect speaker changes and train a Neural Network, while in the second phase, the output trained parameters from the Neural Network are used to predict next incoming audio stream. Using this approach, a comparable accuracy to similar BIC-based approaches is achieved with a significant improvement in terms of computation time.

Keywords: artificial neural network, diarization, speaker indexing, speaker segmentation

Procedia PDF Downloads 468
1941 A Deep Learning Approach to Online Social Network Account Compromisation

Authors: Edward K. Boahen, Brunel E. Bouya-Moko, Changda Wang

Abstract:

The major threat to online social network (OSN) users is account compromisation. Spammers now spread malicious messages by exploiting the trust relationship established between account owners and their friends. The challenge in detecting a compromised account by service providers is validating the trusted relationship established between the account owners, their friends, and the spammers. Another challenge is the increase in required human interaction with the feature selection. Research available on supervised learning (machine learning) has limitations with the feature selection and accounts that cannot be profiled, like application programming interface (API). Therefore, this paper discusses the various behaviours of the OSN users and the current approaches in detecting a compromised OSN account, emphasizing its limitations and challenges. We propose a deep learning approach that addresses and resolve the constraints faced by the previous schemes. We detailed our proposed optimized nonsymmetric deep auto-encoder (OPT_NDAE) for unsupervised feature learning, which reduces the required human interaction levels in the selection and extraction of features. We evaluated our proposed classifier using the NSL-KDD and KDDCUP'99 datasets in a graphical user interface enabled Weka application. The results obtained indicate that our proposed approach outperformed most of the traditional schemes in OSN compromised account detection with an accuracy rate of 99.86%.

Keywords: computer security, network security, online social network, account compromisation

Procedia PDF Downloads 90
1940 K-Means Clustering-Based Infinite Feature Selection Method

Authors: Seyyedeh Faezeh Hassani Ziabari, Sadegh Eskandari, Maziar Salahi

Abstract:

Infinite Feature Selection (IFS) algorithm is an efficient feature selection algorithm that selects a subset of features of all sizes (including infinity). In this paper, we present an improved version of it, called clustering IFS (CIFS), by clustering the dataset in advance. To do so, first, we apply the K-means algorithm to cluster the dataset, then we apply IFS. In the CIFS method, the spatial and temporal complexities are reduced compared to the IFS method. Experimental results on 6 datasets show the superiority of CIFS compared to IFS in terms of accuracy, running time, and memory consumption.

Keywords: feature selection, infinite feature selection, clustering, graph

Procedia PDF Downloads 94
1939 Product Feature Modelling for Integrating Product Design and Assembly Process Planning

Authors: Baha Hasan, Jan Wikander

Abstract:

This paper describes a part of the integrating work between assembly design and assembly process planning domains (APP). The work is based, in its first stage, on modelling assembly features to support APP. A multi-layer architecture, based on feature-based modelling, is proposed to establish a dynamic and adaptable link between product design using CAD tools and APP. The proposed approach is based on deriving “specific function” features from the “generic” assembly and form features extracted from the CAD tools. A hierarchal structure from “generic” to “specific” and from “high level geometrical entities” to “low level geometrical entities” is proposed in order to integrate geometrical and assembly data extracted from geometrical and assembly modelers to the required processes and resources in APP. The feature concept, feature-based modelling, and feature recognition techniques are reviewed.

Keywords: assembly feature, assembly process planning, feature, feature-based modelling, form feature, ontology

Procedia PDF Downloads 274
1938 Analysis and Identification of Different Factors Affecting Students’ Performance Using a Correlation-Based Network Approach

Authors: Jeff Chak-Fu Wong, Tony Chun Yin Yip

Abstract:

The transition from secondary school to university seems exciting for many first-year students but can be more challenging than expected. Enabling instructors to know students’ learning habits and styles enhances their understanding of the students’ learning backgrounds, allows teachers to provide better support for their students, and has therefore high potential to improve teaching quality and learning, especially in any mathematics-related courses. The aim of this research is to collect students’ data using online surveys, to analyze students’ factors using learning analytics and educational data mining and to discover the characteristics of the students at risk of falling behind in their studies based on students’ previous academic backgrounds and collected data. In this paper, we use correlation-based distance methods and mutual information for measuring student factor relationships. We then develop a factor network using the Minimum Spanning Tree method and consider further study for analyzing the topological properties of these networks using social network analysis tools. Under the framework of mutual information, two graph-based feature filtering methods, i.e., unsupervised and supervised infinite feature selection algorithms, are used to analyze the results for students’ data to rank and select the appropriate subsets of features and yield effective results in identifying the factors affecting students at risk of failing. This discovered knowledge may help students as well as instructors enhance educational quality by finding out possible under-performers at the beginning of the first semester and applying more special attention to them in order to help in their learning process and improve their learning outcomes.

Keywords: students' academic performance, correlation-based distance method, social network analysis, feature selection, graph-based feature filtering method

Procedia PDF Downloads 93
1937 Optimal Selection of Replenishment Policies Using Distance Based Approach

Authors: Amit Gupta, Deepak Juneja, Sorabh Gupta

Abstract:

This paper presents a model based on distance based approach (DBA) method employed for evaluation, selection, and ranking of replenishment policies for a single location inventory, which hitherto not developed in the literature. This work recognizes the significance of the selection problem, identifies the selection criteria, the relative importance of selection criteria for this research problem. The developed model is capable of comparing any number of alternate inventory policies for various selection criteria where cardinal values are assigned as a rating to alternate inventory polices for selection criteria and weights of selection criteria. The illustrated example demonstrates the model and presents the result in terms of ranking of replenishment policies.

Keywords: DBA, ranking, replenishment policies, selection criteria

Procedia PDF Downloads 122
1936 An Elaborated Software Solution: The Tennis Ranking System

Authors: Dionysios Kakaroumpas, Jesseka Farago, Stephen Webber

Abstract:

Athletes and spectators depend on the tennis ranking system to represent the truest caliber of athletic prowess; a careful look at the current ranking system though, reveals its main weakness: it undermines expectations of fans and players. Our study proposes several key changes to the existing ranking formula that provide a fair and accurate approach to measure player performance. The study proposes a modification of the system to value: participation, continued advancement, and overall achievement. The new ranking formula facilitates closing the trust gap, encouraging competition equality, engaging the fan base, attracting investment, and promoting tennis involvement worldwide. To probe the crux of our main contention we performed week-by-week comparisons between results procured from the current and proposed formulae. After performing this rigorous case-study of top players of each gender, the findings strongly indicated that there is identifiable inflation in the ranks and enhanced the conviction that the current system should be updated. The new system is accompanied by a web-based software package freely available to anyone involved or interested in tennis rankings. The software package is designed to automatically calculate new player rankings based on a responsive, multi-faceted formula that also generates projected point scenarios and provides separate rankings for the three different court surfaces. By taking a critical look at the current tennis ranking system with consideration to the perspective of fans, players, and businesses involved, an upgrade is in order for it to maintain the balance of trust between fans and the evaluation process. In closure, this proposed solution increases fair play competition, eliminates rank inflation, and better engages fans, players, and sponsors by bringing in a new era of professional tennis.

Keywords: measurement and evaluation, rules and regulations, sports management and marketing, tennis ranking system

Procedia PDF Downloads 243
1935 Ranking of Performance Measures of GSCM towards Sustainability: Using Analytic Hierarchy Process

Authors: Dixit Garg, S. Luthra, A. Haleem

Abstract:

During recent years, the natural environment has become a challenging topic that business organizations must consider due to the economic and ecological impacts and increasing awareness of environment protection among society. Organizations are trying to achieve the goals of improvement in environment, low cost, high quality, flexibility and more customer satisfaction. Performance measurement frameworks are very useful to monitor the performance of any organization. The basic goal of this paper is to identify performance measures and ranking of these performance measures of GSCM performance measurement towards sustainability framework. Five perspectives (Environment, Economic, Social, Operational and Cost performances) and nineteen performance measures of GSCM performance towards sustainability have been have been identified from extensive literature review. Analytical Hierarchy Process (AHP) technique has been utilized for ranking of these performance perspectives and measures. All pair comparisons in AHP have been made on the basis on the experts’ opinions (selected from academia and industry). Ranking of these performance perspectives and measures will help to understand the importance of environmental, economic, social, operational performances, and cost performances in the supply chain.

Keywords: analytical hierarchy process, green supply chain management, performance measures, sustainability

Procedia PDF Downloads 489
1934 Machine Vision System for Measuring the Quality of Bulk Sun-dried Organic Raisins

Authors: Navab Karimi, Tohid Alizadeh

Abstract:

An intelligent vision-based system was designed to measure the quality and purity of raisins. A machine vision setup was utilized to capture the images of bulk raisins in ranges of 5-50% mixed pure-impure berries. The textural features of bulk raisins were extracted using Grey-level Histograms, Co-occurrence Matrix, and Local Binary Pattern (a total of 108 features). Genetic Algorithm and neural network regression were used for selecting and ranking the best features (21 features). As a result, the GLCM features set was found to have the highest accuracy (92.4%) among the other sets. Followingly, multiple feature combinations of the previous stage were fed into the second regression (linear regression) to increase accuracy, wherein a combination of 16 features was found to be the optimum. Finally, a Support Vector Machine (SVM) classifier was used to differentiate the mixtures, producing the best efficiency and accuracy of 96.2% and 97.35%, respectively.

Keywords: sun-dried organic raisin, genetic algorithm, feature extraction, ann regression, linear regression, support vector machine, south azerbaijan.

Procedia PDF Downloads 44
1933 Supervised/Unsupervised Mahalanobis Algorithm for Improving Performance for Cyberattack Detection over Communications Networks

Authors: Radhika Ranjan Roy

Abstract:

Deployment of machine learning (ML)/deep learning (DL) algorithms for cyberattack detection in operational communications networks (wireless and/or wire-line) is being delayed because of low-performance parameters (e.g., recall, precision, and f₁-score). If datasets become imbalanced, which is the usual case for communications networks, the performance tends to become worse. Complexities in handling reducing dimensions of the feature sets for increasing performance are also a huge problem. Mahalanobis algorithms have been widely applied in scientific research because Mahalanobis distance metric learning is a successful framework. In this paper, we have investigated the Mahalanobis binary classifier algorithm for increasing cyberattack detection performance over communications networks as a proof of concept. We have also found that high-dimensional information in intermediate features that are not utilized as much for classification tasks in ML/DL algorithms are the main contributor to the state-of-the-art of improved performance of the Mahalanobis method, even for imbalanced and sparse datasets. With no feature reduction, MD offers uniform results for precision, recall, and f₁-score for unbalanced and sparse NSL-KDD datasets.

Keywords: Mahalanobis distance, machine learning, deep learning, NS-KDD, local intrinsic dimensionality, chi-square, positive semi-definite, area under the curve

Procedia PDF Downloads 50
1932 Unsupervised Reciter Recognition Using Gaussian Mixture Models

Authors: Ahmad Alwosheel, Ahmed Alqaraawi

Abstract:

This work proposes an unsupervised text-independent probabilistic approach to recognize Quran reciter voice. It is an accurate approach that works on real time applications. This approach does not require a prior information about reciter models. It has two phases, where in the training phase the reciters' acoustical features are modeled using Gaussian Mixture Models, while in the testing phase, unlabeled reciter's acoustical features are examined among GMM models. Using this approach, a high accuracy results are achieved with efficient computation time process.

Keywords: Quran, speaker recognition, reciter recognition, Gaussian Mixture Model

Procedia PDF Downloads 352
1931 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR). To classify the diabetic retinopathy, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been used and fed them into the classifier algorithms. The dataset used in this study has been taken from University of California, Irvine (UCI) machine learning repository. Feature weighting methods including the fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, have been used and compered with each other in the classification of DR. After feature weighting, five different classifier algorithms comprising multi-layer perceptron (MLP), k- nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes have been used. The hybrid method based on combination of subtractive clustering based feature weighting and decision tree classifier has been obtained the classification accuracy of 100% in the screening of DR. These results have demonstrated that the proposed hybrid scheme is very promising in the medical data set classification.

Keywords: machine learning, data weighting, classification, data mining

Procedia PDF Downloads 301
1930 Image Retrieval Based on Multi-Feature Fusion for Heterogeneous Image Databases

Authors: N. W. U. D. Chathurani, Shlomo Geva, Vinod Chandran, Proboda Rajapaksha

Abstract:

Selecting an appropriate image representation is the most important factor in implementing an effective Content-Based Image Retrieval (CBIR) system. This paper presents a multi-feature fusion approach for efficient CBIR, based on the distance distribution of features and relative feature weights at the time of query processing. It is a simple yet effective approach, which is free from the effect of features' dimensions, ranges, internal feature normalization and the distance measure. This approach can easily be adopted in any feature combination to improve retrieval quality. The proposed approach is empirically evaluated using two benchmark datasets for image classification (a subset of the Corel dataset and Oliva and Torralba) and compared with existing approaches. The performance of the proposed approach is confirmed with the significantly improved performance in comparison with the independently evaluated baseline of the previously proposed feature fusion approaches.

Keywords: feature fusion, image retrieval, membership function, normalization

Procedia PDF Downloads 321
1929 Triangular Geometric Feature for Offline Signature Verification

Authors: Zuraidasahana Zulkarnain, Mohd Shafry Mohd Rahim, Nor Anita Fairos Ismail, Mohd Azhar M. Arsad

Abstract:

Handwritten signature is accepted widely as a biometric characteristic for personal authentication. The use of appropriate features plays an important role in determining accuracy of signature verification; therefore, this paper presents a feature based on the geometrical concept. To achieve the aim, triangle attributes are exploited to design a new feature since the triangle possesses orientation, angle and transformation that would improve accuracy. The proposed feature uses triangulation geometric set comprising of sides, angles and perimeter of a triangle which is derived from the center of gravity of a signature image. For classification purpose, Euclidean classifier along with Voting-based classifier is used to verify the tendency of forgery signature. This classification process is experimented using triangular geometric feature and selected global features. Based on an experiment that was validated using Grupo de Senales 960 (GPDS-960) signature database, the proposed triangular geometric feature achieves a lower Average Error Rates (AER) value with a percentage of 34% as compared to 43% of the selected global feature. As a conclusion, the proposed triangular geometric feature proves to be a more reliable feature for accurate signature verification.

Keywords: biometrics, euclidean classifier, features extraction, offline signature verification, voting-based classifier

Procedia PDF Downloads 347
1928 Empirical Decomposition of Time Series of Power Consumption

Authors: Noura Al Akkari, Aurélie Foucquier, Sylvain Lespinats

Abstract:

Load monitoring is a management process for energy consumption towards energy savings and energy efficiency. Non Intrusive Load Monitoring (NILM) is one method of load monitoring used for disaggregation purposes. NILM is a technique for identifying individual appliances based on the analysis of the whole residence data retrieved from the main power meter of the house. Our NILM framework starts with data acquisition, followed by data preprocessing, then event detection, feature extraction, then general appliance modeling and identification at the final stage. The event detection stage is a core component of NILM process since event detection techniques lead to the extraction of appliance features. Appliance features are required for the accurate identification of the household devices. In this research work, we aim at developing a new event detection methodology with accurate load disaggregation to extract appliance features. Time-domain features extracted are used for tuning general appliance models for appliance identification and classification steps. We use unsupervised algorithms such as Dynamic Time Warping (DTW). The proposed method relies on detecting areas of operation of each residential appliance based on the power demand. Then, detecting the time at which each selected appliance changes its states. In order to fit with practical existing smart meters capabilities, we work on low sampling data with a frequency of (1/60) Hz. The data is simulated on Load Profile Generator software (LPG), which was not previously taken into consideration for NILM purposes in the literature. LPG is a numerical software that uses behaviour simulation of people inside the house to generate residential energy consumption data. The proposed event detection method targets low consumption loads that are difficult to detect. Also, it facilitates the extraction of specific features used for general appliance modeling. In addition to this, the identification process includes unsupervised techniques such as DTW. To our best knowledge, there exist few unsupervised techniques employed with low sampling data in comparison to the many supervised techniques used for such cases. We extract a power interval at which falls the operation of the selected appliance along with a time vector for the values delimiting the state transitions of the appliance. After this, appliance signatures are formed from extracted power, geometrical and statistical features. Afterwards, those formed signatures are used to tune general model types for appliances identification using unsupervised algorithms. This method is evaluated using both simulated data on LPG and real-time Reference Energy Disaggregation Dataset (REDD). For that, we compute performance metrics using confusion matrix based metrics, considering accuracy, precision, recall and error-rate. The performance analysis of our methodology is then compared with other detection techniques previously used in the literature review, such as detection techniques based on statistical variations and abrupt changes (Variance Sliding Window and Cumulative Sum).

Keywords: general appliance model, non intrusive load monitoring, events detection, unsupervised techniques;

Procedia PDF Downloads 47