Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 5973

Search results for: supervised dimension reduction

5943 Experiments on Weakly-Supervised Learning on Imperfect Data

Authors: Yan Cheng, Yijun Shao, James Rudolph, Charlene R. Weir, Beth Sahlmann, Qing Zeng-Treitler

Abstract:

Supervised predictive models require labeled data for training purposes. Complete and accurate labeled data, i.e., a ‘gold standard’, is not always available, and imperfectly labeled data may need to serve as an alternative. An important question is if the accuracy of the labeled data creates a performance ceiling for the trained model. In this study, we trained several models to recognize the presence of delirium in clinical documents using data with annotations that are not completely accurate (i.e., weakly-supervised learning). In the external evaluation, the support vector machine model with a linear kernel performed best, achieving an area under the curve of 89.3% and accuracy of 88%, surpassing the 80% accuracy of the training sample. We then generated a set of simulated data and carried out a series of experiments which demonstrated that models trained on imperfect data can (but do not always) outperform the accuracy of the training data, e.g., the area under the curve for some models is higher than 80% when trained on the data with an error rate of 40%. Our experiments also showed that the error resistance of linear modeling is associated with larger sample size, error type, and linearity of the data (all p-values < 0.001). In conclusion, this study sheds light on the usefulness of imperfect data in clinical research via weakly-supervised learning.

Keywords: weakly-supervised learning, support vector machine, prediction, delirium, simulation

Procedia PDF Downloads 168

5942 A Review of Fractal Dimension Computing Methods Applied to Wear Particles

Authors: Manish Kumar Thakur, Subrata Kumar Ghosh

Abstract:

Various types of particles found in lubricant may be characterized by their fractal dimension. Some of the available methods are: yard-stick method or structured walk method, box-counting method. This paper presents a review of the developments and progress in fractal dimension computing methods as applied to characteristics the surface of wear particles. An overview of these methods, their implementation, their advantages and their limits is also present here. It has been accepted that wear particles contain major information about wear and friction of materials. Morphological analysis of wear particles from a lubricant is a very effective way for machine condition monitoring. Fractal dimension methods are used to characterize the morphology of the found particles. It is very useful in the analysis of complexity of irregular substance. The aim of this review is to bring together the fractal methods applicable for wear particles.

Keywords: fractal dimension, morphological analysis, wear, wear particles

Procedia PDF Downloads 451

5941 PathoPy2.0: Application of Fractal Geometry for Early Detection and Histopathological Analysis of Lung Cancer

Authors: Rhea Kapoor

Abstract:

Fractal dimension provides a way to characterize non-geometric shapes like those found in nature. The purpose of this research is to estimate Minkowski fractal dimension of human lung images for early detection of lung cancer. Lung cancer is the leading cause of death among all types of cancer and an early histopathological analysis will help reduce deaths primarily due to late diagnosis. A Python application program, PathoPy2.0, was developed for analyzing medical images in pixelated format and estimating Minkowski fractal dimension using a new box-counting algorithm that allows windowing of images for more accurate calculation in the suspected areas of cancerous growth. Benchmark geometric fractals were used to validate the accuracy of the program and changes in fractal dimension of lung images to indicate the presence of issues in the lung. The accuracy of the program for the benchmark examples was between 93-99% of known values of the fractal dimensions. Fractal dimension values were then calculated for lung images, from National Cancer Institute, taken over time to correctly detect the presence of cancerous growth. For example, as the fractal dimension for a given lung increased from 1.19 to 1.27 due to cancerous growth, it represents a significant change in fractal dimension which lies between 1 and 2 for 2-D images. Based on the results obtained on many lung test cases, it was concluded that fractal dimension of human lungs can be used to diagnose lung cancer early. The ideas behind PathoPy2.0 can also be applied to study patterns in the electrical activity of the human brain and DNA matching.

Keywords: fractals, histopathological analysis, image processing, lung cancer, Minkowski dimension

Procedia PDF Downloads 144

5940 Fractal Analysis of Some Bifurcations of Discrete Dynamical Systems in Higher Dimensions

Authors: Lana Horvat Dmitrović

Abstract:

The main purpose of this paper is to study the box dimension as fractal property of bifurcations of discrete dynamical systems in higher dimensions. The paper contains the fractal analysis of the orbits near the hyperbolic and non-hyperbolic fixed points in discrete dynamical systems. It is already known that in one-dimensional case the orbit near the hyperbolic fixed point has the box dimension equal to zero. On the other hand, the orbit near the non-hyperbolic fixed point has strictly positive box dimension which is connected to the non-degeneracy condition of certain bifurcation. One of the main results in this paper is the generalisation of results about box dimension near the hyperbolic and non-hyperbolic fixed points to higher dimensions. In the process of determining box dimension, the restriction of systems to stable, unstable and center manifolds, Lipschitz property of box dimension and the notion of projective box dimension are used. The analysis of the bifurcations in higher dimensions with one multiplier on the unit circle is done by using the normal forms on one-dimensional center manifolds. This specific change in box dimension of an orbit at the moment of bifurcation has already been explored for some bifurcations in one and two dimensions. It was shown that specific values of box dimension are connected to appropriate bifurcations such as fold, flip, cusp or Neimark-Sacker bifurcation. This paper further explores this connection of box dimension as fractal property to some specific bifurcations in higher dimensions, such as fold-flip and flip-Neimark-Sacker. Furthermore, the application of the results to the unit time map of continuous dynamical system near hyperbolic and non-hyperbolic singularities is presented. In that way, box dimensions which are specific for certain bifurcations of continuous systems can be obtained. The approach to bifurcation analysis by using the box dimension as specific fractal property of orbits can lead to better understanding of bifurcation phenomenon. It could also be useful in detecting the existence or nonexistence of bifurcations of discrete and continuous dynamical systems.

Keywords: bifurcation, box dimension, invariant manifold, orbit near fixed point

Procedia PDF Downloads 223

5939 A Note on the Fractal Dimension of Mandelbrot Set and Julia Sets in Misiurewicz Points

Authors: O. Boussoufi, K. Lamrini Uahabi, M. Atounti

Abstract:

The main purpose of this paper is to calculate the fractal dimension of some Julia Sets and Mandelbrot Set in the Misiurewicz Points. Using Matlab to generate the Julia Sets images that match the Misiurewicz points and using a Fractal software, we were able to find different measures that characterize those fractals in textures and other features. We are actually focusing on fractal dimension and the error calculated by the software. When executing the given equation of regression or the log-log slope of image a Box Counting method is applied to the entire image, and chosen settings are available in a FracLAc Program. Finally, a comparison is done for each image corresponding to the area (boundary) where Misiurewicz Point is located.

Keywords: box counting, FracLac, fractal dimension, Julia Sets, Mandelbrot Set, Misiurewicz Points

Procedia PDF Downloads 184

5938 A Framework for Chinese Domain-Specific Distant Supervised Named Entity Recognition

Authors: Qin Long, Li Xiaoge

Abstract:

The Knowledge Graphs have now become a new form of knowledge representation. However, there is no consensus in regard to a plausible and definition of entities and relationships in the domain-specific knowledge graph. Further, in conjunction with several limitations and deficiencies, various domain-specific entities and relationships recognition approaches are far from perfect. Specifically, named entity recognition in Chinese domain is a critical task for the natural language process applications. However, a bottleneck problem with Chinese named entity recognition in new domains is the lack of annotated data. To address this challenge, a domain distant supervised named entity recognition framework is proposed. The framework is divided into two stages: first, the distant supervised corpus is generated based on the entity linking model of graph attention neural network; secondly, the generated corpus is trained as the input of the distant supervised named entity recognition model to train to obtain named entities. The link model is verified in the ccks2019 entity link corpus, and the F1 value is 2% higher than that of the benchmark method. The re-pre-trained BERT language model is added to the benchmark method, and the results show that it is more suitable for distant supervised named entity recognition tasks. Finally, it is applied in the computer field, and the results show that this framework can obtain domain named entities.

Keywords: distant named entity recognition, entity linking, knowledge graph, graph attention neural network

Procedia PDF Downloads 69

5937 Self-Supervised Learning for Hate-Speech Identification

Authors: Shrabani Ghosh

Abstract:

Automatic offensive language detection in social media has become a stirring task in today's NLP. Manual Offensive language detection is tedious and laborious work where automatic methods based on machine learning are only alternatives. Previous works have done sentiment analysis over social media in different ways such as supervised, semi-supervised, and unsupervised manner. Domain adaptation in a semi-supervised way has also been explored in NLP, where the source domain and the target domain are different. In domain adaptation, the source domain usually has a large amount of labeled data, while only a limited amount of labeled data is available in the target domain. Pretrained transformers like BERT, RoBERTa models are fine-tuned to perform text classification in an unsupervised manner to perform further pre-train masked language modeling (MLM) tasks. In previous work, hate speech detection has been explored in Gab.ai, which is a free speech platform described as a platform of extremist in varying degrees in online social media. In domain adaptation process, Twitter data is used as the source domain, and Gab data is used as the target domain. The performance of domain adaptation also depends on the cross-domain similarity. Different distance measure methods such as L2 distance, cosine distance, Maximum Mean Discrepancy (MMD), Fisher Linear Discriminant (FLD), and CORAL have been used to estimate domain similarity. Certainly, in-domain distances are small, and between-domain distances are expected to be large. The previous work finding shows that pretrain masked language model (MLM) fine-tuned with a mixture of posts of source and target domain gives higher accuracy. However, in-domain performance of the hate classifier on Twitter data accuracy is 71.78%, and out-of-domain performance of the hate classifier on Gab data goes down to 56.53%. Recently self-supervised learning got a lot of attention as it is more applicable when labeled data are scarce. Few works have already been explored to apply self-supervised learning on NLP tasks such as sentiment classification. Self-supervised language representation model ALBERTA focuses on modeling inter-sentence coherence and helps downstream tasks with multi-sentence inputs. Self-supervised attention learning approach shows better performance as it exploits extracted context word in the training process. In this work, a self-supervised attention mechanism has been proposed to detect hate speech on Gab.ai. This framework initially classifies the Gab dataset in an attention-based self-supervised manner. On the next step, a semi-supervised classifier trained on the combination of labeled data from the first step and unlabeled data. The performance of the proposed framework will be compared with the results described earlier and also with optimized outcomes obtained from different optimization techniques.

Keywords: attention learning, language model, offensive language detection, self-supervised learning

Procedia PDF Downloads 82

5936 An Embarrassingly Simple Semi-supervised Approach to Increase Recall in Online Shopping Domain to Match Structured Data with Unstructured Data

Authors: Sachin Nagargoje

Abstract:

Complete labeled data is often difficult to obtain in a practical scenario. Even if one manages to obtain the data, the quality of the data is always in question. In shopping vertical, offers are the input data, which is given by advertiser with or without a good quality of information. In this paper, an author investigated the possibility of using a very simple Semi-supervised learning approach to increase the recall of unhealthy offers (has badly written Offer Title or partial product details) in shopping vertical domain. The author found that the semisupervised learning method had improved the recall in the Smart Phone category by 30% on A=B testing on 10% traffic and increased the YoY (Year over Year) number of impressions per month by 33% at production. This also made a significant increase in Revenue, but that cannot be publicly disclosed.

Keywords: semi-supervised learning, clustering, recall, coverage

Procedia PDF Downloads 96

5935 Discover a New Technique for Cancer Recognition by Analysis and Determination of Fractal Dimension Images in Matlab Software

Authors: Saeedeh Shahbazkhany

Abstract:

Cancer is a terrible disease that, if not diagnosed early, therapy can be difficult while it is easily medicable if it is diagnosed in early stages. So it is very important for cancer diagnosis that medical procedures are performed. In this paper we introduce a new method. In this method, we only need pictures of healthy cells and cancer cells. In fact, where we suspect cancer, we take a picture of cells or tissue in that area, and then take some pictures of the surrounding tissues. Then, fractal dimension of images are calculated and compared. Cancer can be easily detected by comparing the fractal dimension of images. In this method, we use Matlab software.

Keywords: Matlab software, fractal dimension, cancer, surrounding tissues, cells or tissue, new method

Procedia PDF Downloads 326

5934 Early Stage Suicide Ideation Detection Using Supervised Machine Learning and Neural Network Classifier

Authors: Devendra Kr Tayal, Vrinda Gupta, Aastha Bansal, Khushi Singh, Sristi Sharma, Hunny Gaur

Abstract:

In today's world, suicide is a serious problem. In order to save lives, early suicide attempt detection and prevention should be addressed. A good number of at-risk people utilize social media platforms to talk about their issues or find knowledge on related chores. Twitter and Reddit are two of the most common platforms that are used for expressing oneself. Extensive research has already been done in this field. Through supervised classification techniques like Nave Bayes, Bernoulli Nave Bayes, and Multiple Layer Perceptron on a Reddit dataset, we demonstrate the early recognition of suicidal ideation. We also performed comparative analysis on these approaches and used accuracy, recall score, F1 score, and precision score for analysis.

Keywords: machine learning, suicide ideation detection, supervised classification, natural language processing

Procedia PDF Downloads 65

5933 Spatial Rank-Based High-Dimensional Monitoring through Random Projection

Authors: Chen Zhang, Nan Chen

Abstract:

High-dimensional process monitoring becomes increasingly important in many application domains, where usually the process distribution is unknown and much more complicated than the normal distribution, and the between-stream correlation can not be neglected. However, since the process dimension is generally much bigger than the reference sample size, most traditional nonparametric multivariate control charts fail in high-dimensional cases due to the curse of dimensionality. Furthermore, when the process goes out of control, the influenced variables are quite sparse compared with the whole dimension, which increases the detection difficulty. Targeting at these issues, this paper proposes a new nonparametric monitoring scheme for high-dimensional processes. This scheme first projects the high-dimensional process into several subprocesses using random projections for dimension reduction. Then, for every subprocess with the dimension much smaller than the reference sample size, a local nonparametric control chart is constructed based on the spatial rank test to detect changes in this subprocess. Finally, the results of all the local charts are fused together for decision. Furthermore, after an out-of-control (OC) alarm is triggered, a diagnostic framework is proposed. using the square-root LASSO. Numerical studies demonstrate that the chart has satisfactory detection power for sparse OC changes and robust performance for non-normally distributed data, The diagnostic framework is also effective to identify truly changed variables. Finally, a real-data example is presented to demonstrate the application of the proposed method.

Keywords: random projection, high-dimensional process control, spatial rank, sequential change detection

Procedia PDF Downloads 275

5932 Machine Learning Techniques for COVID-19 Detection: A Comparative Analysis

Authors: Abeer A. Aljohani

Abstract:

COVID-19 virus spread has been one of the extreme pandemics across the globe. It is also referred to as coronavirus, which is a contagious disease that continuously mutates into numerous variants. Currently, the B.1.1.529 variant labeled as omicron is detected in South Africa. The huge spread of COVID-19 disease has affected several lives and has surged exceptional pressure on the healthcare systems worldwide. Also, everyday life and the global economy have been at stake. This research aims to predict COVID-19 disease in its initial stage to reduce the death count. Machine learning (ML) is nowadays used in almost every area. Numerous COVID-19 cases have produced a huge burden on the hospitals as well as health workers. To reduce this burden, this paper predicts COVID-19 disease is based on the symptoms and medical history of the patient. This research presents a unique architecture for COVID-19 detection using ML techniques integrated with feature dimensionality reduction. This paper uses a standard UCI dataset for predicting COVID-19 disease. This dataset comprises symptoms of 5434 patients. This paper also compares several supervised ML techniques to the presented architecture. The architecture has also utilized 10-fold cross validation process for generalization and the principal component analysis (PCA) technique for feature reduction. Standard parameters are used to evaluate the proposed architecture including F1-Score, precision, accuracy, recall, receiver operating characteristic (ROC), and area under curve (AUC). The results depict that decision tree, random forest, and neural networks outperform all other state-of-the-art ML techniques. This achieved result can help effectively in identifying COVID-19 infection cases.

Keywords: supervised machine learning, COVID-19 prediction, healthcare analytics, random forest, neural network

Procedia PDF Downloads 66

5931 Mathematical Based Forecasting of Heart Attack

Authors: Razieh Khalafi

Abstract:

Myocardial infarction (MI) or acute myocardial infarction (AMI), commonly known as a heart attack, occurs when blood flow stops to part of the heart causing damage to the heart muscle. An ECG can often show evidence of a previous heart attack or one that's in progress. The patterns on the ECG may indicate which part of your heart has been damaged, as well as the extent of the damage. In chaos theory, the correlation dimension is a measure of the dimensionality of the space occupied by a set of random points, often referred to as a type of fractal dimension. In this research by considering ECG signal as a random walk we work on forecasting the oncoming heart attack by analyzing the ECG signals using the correlation dimension. In order to test the model a set of ECG signals for patients before and after heart attack was used and the strength of model for forecasting the behavior of these signals were checked. Results shows this methodology can forecast the ECG and accordingly heart attack with high accuracy.

Keywords: heart attack, ECG, random walk, correlation dimension, forecasting

Procedia PDF Downloads 506

5930 Microkinetic Modelling of NO Reduction on Pt Catalysts

Authors: Vishnu S. Prasad, Preeti Aghalayam

Abstract:

The major harmful automobile exhausts are nitric oxide (NO) and unburned hydrocarbon (HC). Reduction of NO using unburned fuel HC as a reductant is the technique used in hydrocarbon-selective catalytic reduction (HC-SCR). In this work, we study the microkinetic modelling of NO reduction using propene as a reductant on Pt catalysts. The selectivity of NO reduction to N₂O is detected in some ranges of operating conditions, whereas the effect of inlet O₂% causes a number of changes in the feasible regimes of operation.

Keywords: microkinetic modelling, NOx, platinum on alumina catalysts, selective catalytic reduction

Procedia PDF Downloads 427

5929 Thermal Reduction of Perfect Well Identified Hexagonal Graphene Oxide Nano-Sheets for Super-Capacitor Applications

Authors: A. N. Fouda

Abstract:

A novel well identified hexagonal graphene oxide (GO) nano-sheets were synthesized using modified Hummer method. Low temperature thermal reduction at 350°C in air ambient was performed. After thermal reduction, typical few layers of thermal reduced GO (TRGO) with dimension of few hundreds nanometers were observed using high resolution transmission electron microscopy (HRTEM). GO has a lot of structure models due to variation of the preparation process. Determining the atomic structure of GO is essential for a better understanding of its fundamental properties and for realization of the future technological applications. Structural characterization was identified by x-ray diffraction (XRD), Fourier transform infra-red spectroscopy (FTIR) measurements. A comparison between exper- imental and theoretical IR spectrum were done to confirm the match between experimentally and theoretically proposed GO structure. Partial overlap of the experimental IR spectrum with the theoretical IR was confirmed. The electrochemical properties of TRGO nano-sheets as electrode materials for supercapacitors were investigated by cyclic voltammetry and electrochemical impedance spectroscopy (EIS) measurements. An enhancement in supercapacitance after reduction was confirmed and the area of the CV curve for the TRGO electrode is larger than those for the GO electrode indicating higher specific capacitance which is promising in super-capacitor applications

Keywords: hexagonal graphene oxide, thermal reduction, cyclic voltammetry

Procedia PDF Downloads 476

5928 A New Mathematical Method for Heart Attack Forecasting

Authors: Razi Khalafi

Abstract:

Myocardial Infarction (MI) or acute Myocardial Infarction (AMI), commonly known as a heart attack, occurs when blood flow stops to part of the heart causing damage to the heart muscle. An ECG can often show evidence of a previous heart attack or one that's in progress. The patterns on the ECG may indicate which part of your heart has been damaged, as well as the extent of the damage. In chaos theory, the correlation dimension is a measure of the dimensionality of the space occupied by a set of random points, often referred to as a type of fractal dimension. In this research by considering ECG signal as a random walk we work on forecasting the oncoming heart attack by analysing the ECG signals using the correlation dimension. In order to test the model a set of ECG signals for patients before and after heart attack was used and the strength of model for forecasting the behaviour of these signals were checked. Results show this methodology can forecast the ECG and accordingly heart attack with high accuracy.

Keywords: heart attack, ECG, random walk, correlation dimension, forecasting

Procedia PDF Downloads 468

5927 Single-Cell Visualization with Minimum Volume Embedding

Authors: Zhenqiu Liu

Abstract:

Visualizing the heterogeneity within cell-populations for single-cell RNA-seq data is crucial for studying the functional diversity of a cell. However, because of the high level of noises, outlier, and dropouts, it is very challenging to measure the cell-to-cell similarity (distance), visualize and cluster the data in a low-dimension. Minimum volume embedding (MVE) projects the data into a lower-dimensional space and is a promising tool for data visualization. However, it is computationally inefficient to solve a semi-definite programming (SDP) when the sample size is large. Therefore, it is not applicable to single-cell RNA-seq data with thousands of samples. In this paper, we develop an efficient algorithm with an accelerated proximal gradient method and visualize the single-cell RNA-seq data efficiently. We demonstrate that the proposed approach separates known subpopulations more accurately in single-cell data sets than other existing dimension reduction methods.

Keywords: single-cell RNA-seq, minimum volume embedding, visualization, accelerated proximal gradient method

Procedia PDF Downloads 199

5926 Methods for Distinction of Cattle Using Supervised Learning

Authors: Radoslav Židek, Veronika Šidlová, Radovan Kasarda, Birgit Fuerst-Waltl

Abstract:

Machine learning represents a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. The data can present identification patterns which are used to classify into groups. The result of the analysis is the pattern which can be used for identification of data set without the need to obtain input data used for creation of this pattern. An important requirement in this process is careful data preparation validation of model used and its suitable interpretation. For breeders, it is important to know the origin of animals from the point of the genetic diversity. In case of missing pedigree information, other methods can be used for traceability of animal´s origin. Genetic diversity written in genetic data is holding relatively useful information to identify animals originated from individual countries. We can conclude that the application of data mining for molecular genetic data using supervised learning is an appropriate tool for hypothesis testing and identifying an individual.

Keywords: genetic data, Pinzgau cattle, supervised learning, machine learning

Procedia PDF Downloads 518

5925 Bipolar Reduction and Lithic Miniaturization: Experimental Results and Archaeological Implications

Authors: Justin Pargeter, Metin Eren

Abstract:

Lithic miniaturization, the systematic production and use of small tools from small cores, was a consequential development in Pleistocene lithic technology. The bipolar reduction is an important, but often overlooked and misidentified, strategy for lithic miniaturization. This experiment addresses the role of axial bipolar reduction in processes of lithic miniaturization. The experiments answer two questions: what benefits does axial bipolar reduction provide, and can we distinguish axial bipolar reduction from freehand reduction? Our experiments demonstrate the numerous advantages of bipolar reduction in contexts of lithic miniaturization. Bipolar reduction produces more cutting edge per gram and is more economical than freehand reduction. Our cutting edge to mass values exceeds even those obtained with pressure blade production on high-quality obsidian. The experimental results show that bipolar reduction produces cutting edge quicker and is more efficient than freehand reduction. We show that bipolar reduction can be distinguished from freehand reduction with a high degree of confidence using the quantitative criteria in these experiments. These observations overturn long-held perceptions about bipolar reduction. We conclude by discussing the role of bipolar reduction in lithic miniaturization and Stone Age economics more broadly.

Keywords: lithic miniaturization, bipolar reduction, late Pleistocene, Southern Africa

Procedia PDF Downloads 692

5924 Identification of Hepatocellular Carcinoma Using Supervised Learning Algorithms

Authors: Sagri Sharma

Abstract:

Analysis of diseases integrating multi-factors increases the complexity of the problem and therefore, development of frameworks for the analysis of diseases is an issue that is currently a topic of intense research. Due to the inter-dependence of the various parameters, the use of traditional methodologies has not been very effective. Consequently, newer methodologies are being sought to deal with the problem. Supervised Learning Algorithms are commonly used for performing the prediction on previously unseen data. These algorithms are commonly used for applications in fields ranging from image analysis to protein structure and function prediction and they get trained using a known dataset to come up with a predictor model that generates reasonable predictions for the response to new data. Gene expression profiles generated by DNA analysis experiments can be quite complex since these experiments can involve hypotheses involving entire genomes. The application of well-known machine learning algorithm - Support Vector Machine - to analyze the expression levels of thousands of genes simultaneously in a timely, automated and cost effective way is thus used. The objectives to undertake the presented work are development of a methodology to identify genes relevant to Hepatocellular Carcinoma (HCC) from gene expression dataset utilizing supervised learning algorithms and statistical evaluations along with development of a predictive framework that can perform classification tasks on new, unseen data.

Keywords: artificial intelligence, biomarker, gene expression datasets, hepatocellular carcinoma, machine learning, supervised learning algorithms, support vector machine

Procedia PDF Downloads 401

5923 Amharic Text News Classification Using Supervised Learning

Authors: Misrak Assefa

Abstract:

The Amharic language is the second most widely spoken Semitic language in the world. There are several new overloaded on the web. Searching some useful documents from the web on a specific topic, which is written in the Amharic language, is a challenging task. Hence, document categorization is required for managing and filtering important information. In the classification of Amharic text news, there is still a gap in the domain of information that needs to be launch. This study attempts to design an automatic Amharic news classification using a supervised learning mechanism on four un-touch classes. To achieve this research, 4,182 news articles were used. Naive Bayes (NB) and Decision tree (j48) algorithms were used to classify the given Amharic dataset. In this paper, k-fold cross-validation is used to estimate the accuracy of the classifier. As a result, it shows those algorithms can be applicable in Amharic news categorization. The best average accuracy result is achieved by j48 decision tree and naïve Bayes is 95.2345 %, and 94.6245 % respectively using three categories. This research indicated that a typical decision tree algorithm is more applicable to Amharic news categorization.

Keywords: text categorization, supervised machine learning, naive Bayes, decision tree

Procedia PDF Downloads 163

5922 The Effects of Affective Dimension of Face on Facial Attractiveness

Authors: Kyung-Ja Cho, Sun Jin Park

Abstract:

This study examined what effective dimension affects facial attractiveness. Two orthogonal dimensions, sharp-soft and babyish-mature, were used to rate the levels of facial attractiveness in 20’s women. This research also investigated the sex difference on the effect of effective dimension of face on attractiveness. The test subjects composed of 15 males and 18 females. They looked 330 photos of women in 20s. Then they rated the levels of the effective dimensions of faces with sharp-soft and babyish-mature, and the attraction with charmless-charming. The respond forms were Likert scales, the answer was scored from 1 to 9. As a result of multiple regression analysis, the subject reported the milder and younger appearance as more attractive. Both male and female subjects showed the same evaluation. This result means that two effective dimensions have the effect on estimating attractiveness.

Keywords: affective dimension of faces, facial attractiveness, sharp-soft, babyish-mature

Procedia PDF Downloads 309

5921 Calculation of Fractal Dimension and Its Relation to Some Morphometric Characteristics of Iranian Landforms

Authors: Mitra Saberi, Saeideh Fakhari, Amir Karam, Ali Ahmadabadi

Abstract:

Geomorphology is the scientific study of the characteristics of form and shape of the Earth's surface. The existence of types of landforms and their variation is mainly controlled by changes in the shape and position of land and topography. In fact, the interest and application of fractal issues in geomorphology is due to the fact that many geomorphic landforms have fractal structures and their formation and transformation can be explained by mathematical relations. The purpose of this study is to identify and analyze the fractal behavior of landforms of macro geomorphologic regions of Iran, as well as studying and analyzing topographic and landform characteristics based on fractal relationships. In this study, using the Iranian digital elevation model in the form of slopes, coefficients of deposition and alluvial fan, the fractal dimensions of the curves were calculated through the box counting method. The morphometric characteristics of the landforms and their fractal dimension were then calculated for 4criteria (height, slope, profile curvature and planimetric curvature) and indices (maximum, Average, standard deviation) using ArcMap software separately. After investigating their correlation with fractal dimension, two-way regression analysis was performed and the relationship between fractal dimension and morphometric characteristics of landforms was investigated. The results show that the fractal dimension in different pixels size of 30, 90 and 200m, topographic curves of different landform units of Iran including mountain, hill, plateau, plain of Iran, from1.06in alluvial fans to1.17in The mountains are different. Generally, for all pixels of different sizes, the fractal dimension is reduced from mountain to plain. The fractal dimension with the slope criterion and the standard deviation index has the highest correlation coefficient, with the curvature of the profile and the mean index has the lowest correlation coefficient, and as the pixels become larger, the correlation coefficient between the indices and the fractal dimension decreases.

Keywords: box counting method, fractal dimension, geomorphology, Iran, landform

Procedia PDF Downloads 65

5920 Incorporating Multiple Supervised Learning Algorithms for Effective Intrusion Detection

Authors: Umar Albalawi, Sang C. Suh, Jinoh Kim

Abstract:

As internet continues to expand its usage with an enormous number of applications, cyber-threats have significantly increased accordingly. Thus, accurate detection of malicious traffic in a timely manner is a critical concern in today’s Internet for security. One approach for intrusion detection is to use Machine Learning (ML) techniques. Several methods based on ML algorithms have been introduced over the past years, but they are largely limited in terms of detection accuracy and/or time and space complexity to run. In this work, we present a novel method for intrusion detection that incorporates a set of supervised learning algorithms. The proposed technique provides high accuracy and outperforms existing techniques that simply utilizes a single learning method. In addition, our technique relies on partial flow information (rather than full information) for detection, and thus, it is light-weight and desirable for online operations with the property of early identification. With the mid-Atlantic CCDC intrusion dataset publicly available, we show that our proposed technique yields a high degree of detection rate over 99% with a very low false alarm rate (0.4%).

Keywords: intrusion detection, supervised learning, traffic classification, computer networks

Procedia PDF Downloads 319

5919 Using of the Fractal Dimensions for the Analysis of Hyperkinetic Movements in the Parkinson's Disease

Authors: Sadegh Marzban, Mohamad Sobhan Sheikh Andalibi, Farnaz Ghassemi, Farzad Towhidkhah

Abstract:

Parkinson's disease (PD), which is characterized by the tremor at rest, rigidity, akinesia or bradykinesia and postural instability, affects the quality of life of involved individuals. The concept of a fractal is most often associated with irregular geometric objects that display self-similarity. Fractal dimension (FD) can be used to quantify the complexity and the self-similarity of an object such as tremor. In this work, we are aimed to propose a new method for evaluating hyperkinetic movements such as tremor, by using the FD and other correlated parameters in patients who are suffered from PD. In this study, we used 'the tremor data of Physionet'. The database consists of fourteen participants, diagnosed with PD including six patients with high amplitude tremor and eight patients with low amplitude. We tried to extract features from data, which can distinguish between patients before and after medication. We have selected fractal dimensions, including correlation dimension, box dimension, and information dimension. Lilliefors test has been used for normality test. Paired t-test or Wilcoxon signed rank test were also done to find differences between patients before and after medication, depending on whether the normality is detected or not. In addition, two-way ANOVA was used to investigate the possible association between the therapeutic effects and features extracted from the tremor. Just one of the extracted features showed significant differences between patients before and after medication. According to the results, correlation dimension was significantly different before and after the patient's medication (p=0.009). Also, two-way ANOVA demonstrates significant differences just in medication effect (p=0.033), and no significant differences were found between subject's differences (p=0.34) and interaction (p=0.97). The most striking result emerged from the data is that correlation dimension could quantify medication treatment based on tremor. This study has provided a technique to evaluate a non-linear measure for quantifying medication, nominally the correlation dimension. Furthermore, this study supports the idea that fractal dimension analysis yields additional information compared with conventional spectral measures in the detection of poor prognosis patients.

Keywords: correlation dimension, non-linear measure, Parkinson’s disease, tremor

Procedia PDF Downloads 220

5918 Framework for Detecting External Plagiarism from Monolingual Documents: Use of Shallow NLP and N-Gram Frequency Comparison

Authors: Saugata Bose, Ritambhra Korpal

Abstract:

The internet has increased the copy-paste scenarios amongst students as well as amongst researchers leading to different levels of plagiarized documents. For this reason, much of research is focused on for detecting plagiarism automatically. In this paper, an initiative is discussed where Natural Language Processing (NLP) techniques as well as supervised machine learning algorithms have been combined to detect plagiarized texts. Here, the major emphasis is on to construct a framework which detects external plagiarism from monolingual texts successfully. For successfully detecting the plagiarism, n-gram frequency comparison approach has been implemented to construct the model framework. The framework is based on 120 characteristics which have been extracted during pre-processing the documents using NLP approach. Afterwards, filter metrics has been applied to select most relevant characteristics and then supervised classification learning algorithm has been used to classify the documents in four levels of plagiarism. Confusion matrix was built to estimate the false positives and false negatives. Our plagiarism framework achieved a very high the accuracy score.

Keywords: lexical matching, shallow NLP, supervised machine learning algorithm, word n-gram

Procedia PDF Downloads 334

5917 The Effect of Soil Fractal Dimension on the Performance of Cement Stabilized Soil

Authors: Nkiru I. Ibeakuzie, Paul D. J. Watson, John F. Pescatore

Abstract:

In roadway construction, the cost of soil-cement stabilization per unit area is significantly influenced by the binder content, hence the need to optimise cement usage. This research work will characterize the influence of soil fractal geometry on properties of cement-stabilized soil, and strive to determine a correlation between mechanical proprieties of cement-stabilized soil and the mass fractal dimension Dₘ indicated by particle size distribution (PSD) of aggregate mixtures. Since strength development in cemented soil relies not only on cement content but also on soil PSD, this study will investigate the possibility of reducing cement content by changing the PSD of soil, without compromising on strength, reduced permeability, and compressibility. A series of soil aggregate mixes will be prepared in the laboratory. The mass fractal dimension Dₘ of each mix will be determined from sieve analysis data prior to stabilization with cement. Stabilized soil samples will be tested for strength, permeability, and compressibility.

Keywords: fractal dimension, particle size distribution, cement stabilization, cement content

Procedia PDF Downloads 186

5916 Semi-Supervised Learning Using Pseudo F Measure

Authors: Mahesh Balan U, Rohith Srinivaas Mohanakrishnan, Venkat Subramanian

Abstract:

Positive and unlabeled learning (PU) has gained more attention in both academic and industry research literature recently because of its relevance to existing business problems today. Yet, there still seems to be some existing challenges in terms of validating the performance of PU learning, as the actual truth of unlabeled data points is still unknown in contrast to a binary classification where we know the truth. In this study, we propose a novel PU learning technique based on the Pseudo-F measure, where we address this research gap. In this approach, we train the PU model to discriminate the probability distribution of the positive and unlabeled in the validation and spy data. The predicted probabilities of the PU model have a two-fold validation – (a) the predicted probabilities of reliable positives and predicted positives should be from the same distribution; (b) the predicted probabilities of predicted positives and predicted unlabeled should be from a different distribution. We experimented with this approach on a credit marketing case study in one of the world’s biggest fintech platforms and found evidence for benchmarking performance and backtested using historical data. This study contributes to the existing literature on semi-supervised learning.

Keywords: PU learning, semi-supervised learning, pseudo f measure, classification

Procedia PDF Downloads 205

5915 Application of Remote Sensing and GIS in Assessing Land Cover Changes within Granite Quarries around Brits Area, South Africa

Authors: Refilwe Moeletsi

Abstract:

Dimension stone quarrying around Brits and Belfast areas started in the early 1930s and has been growing rapidly since then. Environmental impacts associated with these quarries have not been documented, and hence this study aims at detecting any change in the environment that might have been caused by these activities. Landsat images that were used to assess land use/land cover changes in Brits quarries from 1998 - 2015. A supervised classification using maximum likelihood classifier was applied to classify each image into different land use/land cover types. Classification accuracy was assessed using Google Earth™ as a source of reference data. Post-classification change detection method was used to determine changes. The results revealed significant increase in granite quarries and corresponding decrease in vegetation cover within the study region.

Keywords: remote sensing, GIS, change detection, granite quarries

Procedia PDF Downloads 283

5914 Predicting Blockchain Technology Installation Cost in Supply Chain System through Supervised Learning

Authors: Hossein Havaeji, Tony Wong, Thien-My Dao

Abstract:

1. Research Problems and Research Objectives: Blockchain Technology-enabled Supply Chain System (BT-enabled SCS) is the system using BT to drive SCS transparency, security, durability, and process integrity as SCS data is not always visible, available, or trusted. The costs of operating BT in the SCS are a common problem in several organizations. The costs must be estimated as they can impact existing cost control strategies. To account for system and deployment costs, it is necessary to overcome the following hurdle. The problem is that the costs of developing and running a BT in SCS are not yet clear in most cases. Many industries aiming to use BT have special attention to the importance of BT installation cost which has a direct impact on the total costs of SCS. Predicting BT installation cost in SCS may help managers decide whether BT is to be an economic advantage. The purpose of the research is to identify some main BT installation cost components in SCS needed for deeper cost analysis. We then identify and categorize the main groups of cost components in more detail to utilize them in the prediction process. The second objective is to determine the suitable Supervised Learning technique in order to predict the costs of developing and running BT in SCS in a particular case study. The last aim is to investigate how the running BT cost can be involved in the total cost of SCS. 2. Work Performed: Applied successfully in various fields, Supervised Learning is a method to set the data frame, treat the data, and train/practice the method sort. It is a learning model directed to make predictions of an outcome measurement based on a set of unforeseen input data. The following steps must be conducted to search for the objectives of our subject. The first step is to make a literature review to identify the different cost components of BT installation in SCS. Based on the literature review, we should choose some Supervised Learning methods which are suitable for BT installation cost prediction in SCS. According to the literature review, some Supervised Learning algorithms which provide us with a powerful tool to classify BT installation components and predict BT installation cost are the Support Vector Regression (SVR) algorithm, Back Propagation (BP) neural network, and Artificial Neural Network (ANN). Choosing a case study to feed data into the models comes into the third step. Finally, we will propose the best predictive performance to find the minimum BT installation costs in SCS. 3. Expected Results and Conclusion: This study tends to propose a cost prediction of BT installation in SCS with the help of Supervised Learning algorithms. At first attempt, we will select a case study in the field of BT-enabled SCS, and then use some Supervised Learning algorithms to predict BT installation cost in SCS. We continue to find the best predictive performance for developing and running BT in SCS. Finally, the paper will be presented at the conference.

Keywords: blockchain technology, blockchain technology-enabled supply chain system, installation cost, supervised learning

Procedia PDF Downloads 97