Search results for: classification of factors
12315 Discerning Divergent Nodes in Social Networks
Authors: Mehran Asadi, Afrand Agah
Abstract:
In data mining, partitioning is used as a fundamental tool for classification. With the help of partitioning, we study the structure of data, which allows us to envision decision rules, which can be applied to classification trees. In this research, we used online social network dataset and all of its attributes (e.g., Node features, labels, etc.) to determine what constitutes an above average chance of being a divergent node. We used the R statistical computing language to conduct the analyses in this report. The data were found on the UC Irvine Machine Learning Repository. This research introduces the basic concepts of classification in online social networks. In this work, we utilize overfitting and describe different approaches for evaluation and performance comparison of different classification methods. In classification, the main objective is to categorize different items and assign them into different groups based on their properties and similarities. In data mining, recursive partitioning is being utilized to probe the structure of a data set, which allow us to envision decision rules and apply them to classify data into several groups. Estimating densities is hard, especially in high dimensions, with limited data. Of course, we do not know the densities, but we could estimate them using classical techniques. First, we calculated the correlation matrix of the dataset to see if any predictors are highly correlated with one another. By calculating the correlation coefficients for the predictor variables, we see that density is strongly correlated with transitivity. We initialized a data frame to easily compare the quality of the result classification methods and utilized decision trees (with k-fold cross validation to prune the tree). The method performed on this dataset is decision trees. 
A decision tree is a non-parametric classification method that uses a set of rules to predict that each observation belongs to the most commonly occurring class label of the training data. Our method aggregates many decision trees to create an optimized model that is not susceptible to overfitting. When using a decision tree, however, it is important to use cross-validation to prune the tree in order to narrow it down to the most important variables.
Keywords: online social networks, data mining, social cloud computing, interaction and collaboration
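The correlation screening step described in the abstract can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code; the function names, the 0.8 threshold, and the toy values are assumptions made for the example and are not taken from the paper's dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

# Toy values invented for illustration only (not the paper's data).
toy = {
    "density": [0.10, 0.20, 0.30, 0.40, 0.50],
    "transitivity": [0.12, 0.21, 0.33, 0.38, 0.52],
    "degree": [5, 1, 4, 2, 3],
}
flagged = correlated_pairs(toy)
for a, b, r in flagged:
    print(f"{a} ~ {b}: r = {r:.3f}")
```

Flagging such pairs before fitting a tree is one common way to decide which predictors are redundant; whether the authors dropped or kept correlated predictors is not stated in the abstract.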
Procedia PDF Downloads 154
12314 Identification of High-Rise Buildings Using Object Based Classification and Shadow Extraction Techniques
Authors: Subham Kharel, Sudha Ravindranath, A. Vidya, B. Chandrasekaran, K. Ganesha Raj, T. Shesadri
Abstract:
Digitization of urban features is a tedious and time-consuming process when done manually. In addition, Indian cities have complex habitat patterns and convoluted clustering patterns, which make it even more difficult to map features. This paper attempts to classify urban objects in a satellite image using object-oriented classification techniques, in which classes such as vegetation, water bodies, buildings, and shadows adjacent to the buildings were mapped semi-automatically. The building layer obtained from object-oriented classification was used along with already available building layers. The main focus, however, lay in the extraction of high-rise buildings using spatial technology, digital image processing, and modeling, which would otherwise be a very difficult task to carry out manually. Results indicated a considerable rise in the total number of buildings in the city. High-rise buildings were successfully mapped using satellite imagery and spatial technology along with logical reasoning and mathematical considerations. The results clearly depict the ability of Remote Sensing and GIS to solve complex problems in urban scenarios, such as studying urban sprawl and identifying more complex features in an urban area like high-rise buildings and multi-dwelling units. The object-oriented technique has proven effective and has yielded an overall efficiency of 80 percent in the classification of high-rise buildings.
Keywords: object oriented classification, shadow extraction, high-rise buildings, satellite imagery, spatial technology
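The abstract does not spell out its "mathematical considerations", but a relation often used with extracted shadows is height = shadow length × tan(sun elevation). The sketch below assumes flat terrain, a shadow measured along the sun azimuth, and a near-nadir view; the function names and the 15 m high-rise threshold are hypothetical and not taken from the paper.

```python
from math import radians, tan

def building_height(shadow_length_m, sun_elevation_deg):
    """Estimate building height from shadow length and sun elevation angle.

    Assumes flat ground and a shadow measured along the sun azimuth.
    """
    if not 0 < sun_elevation_deg < 90:
        raise ValueError("sun elevation must be between 0 and 90 degrees")
    return shadow_length_m * tan(radians(sun_elevation_deg))

def is_high_rise(height_m, threshold_m=15.0):
    """Hypothetical rule: treat buildings at or above threshold_m as high-rise."""
    return height_m >= threshold_m

# Example values chosen for illustration only.
height = building_height(shadow_length_m=20.0, sun_elevation_deg=45.0)
print(round(height, 2), is_high_rise(height))
```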
Procedia PDF Downloads 154
12313 Factors Associated with Hand Functional Disability in People with Rheumatoid Arthritis: A Systematic Review and Best-Evidence Synthesis
Authors: Hisham Arab Alkabeya, A. M. Hughes, J. Adams
Abstract:
Background: People with Rheumatoid Arthritis (RA) continue to experience problems with hand function despite new drug advances and targeted medical treatment. Consequently, it is important to identify the factors that influence the impact of RA disease on hand function. This systematic review identified observational studies that reported factors that influenced the impact of RA on hand function. Methods: The MEDLINE, EMBASE, CINAHL, AMED, PsycINFO, and Web of Science databases were searched from January 1990 up to March 2017. Full-text articles published in English that described factors related to hand functional disability in people with RA were selected following predetermined inclusion and exclusion criteria. Pertinent data were thoroughly extracted and documented using a pre-designed data extraction form by the lead author, and cross-checked by the review team for completeness and accuracy. Factors related to hand function were classified under the domains of the International Classification of Functioning, Disability, and Health (ICF) framework and health-related factors. Three reviewers independently assessed the methodological quality of the included articles using the quality of cross-sectional studies (AXIS) tool. Factors related to hand function that were investigated in two or more studies were explored using a best-evidence synthesis. Results: Twenty articles from 19 studies met the inclusion criteria from 1,271 citations; all presented cross-sectional data (five high quality and 15 low quality studies), resulting in at best limited evidence in the best-evidence synthesis. For the factors classified under the ICF domains, the best-evidence synthesis indicates that there was a range of body structure and function factors that were related to hand functional disability. However, key factors were hand strength, disease activity, and pain intensity. Low functional status (physical, emotional and social) was found to be related to limited hand function.
For personal factors, there is limited evidence that gender is not related to hand function, whereas conflicting evidence was found regarding the relationship between age and hand function. In the domain of environmental factors, there was limited evidence that work activity was not related to hand function. Regarding health-related factors, there was limited evidence that the level of the rheumatoid factor (RF) was not related to hand function. Finally, conflicting evidence was found regarding the relationship between hand function and disease duration and general health status. Conclusion: Studies focused on body structure and function factors, highlighting a lack of investigation into personal and environmental factors when considering the impact of RA on hand function. The level of existing evidence was limited, but it identified that modifiable factors such as grip or pinch strength, disease activity and pain are the most influential factors on hand function in people with RA. The review findings suggest that important personal and environmental factors that impact on hand function in people with RA are not yet considered or reported in clinical research. Well-designed longitudinal, preferably cohort, studies are now needed to better understand the causality between personal and environmental factors and hand functional disability in people with RA.
Keywords: factors, hand function, rheumatoid arthritis, systematic review
Procedia PDF Downloads 145
12312 Design and Implementation of Generative Models for Odor Classification Using Electronic Nose
Authors: Kumar Shashvat, Amol P. Bhondekar
Abstract:
Among the five senses, odor is the most evocative and the least understood. Odor testing has been mysterious, and odor data fabled, to most practitioners. The problem of recognizing and classifying odor is an important one. The ability to smell and predict whether an article is still usable or has become undesirable for consumption is the problem that this work seeks to capture in a model. The general industrial standard for this classification is color based; however, odor can be a better classifier than color, and incorporating it in a machine would be very useful. Various discriminative approaches have been used for classifying the odor of peas, trees and cashews. Discriminative approaches offer good predictive performance and have been widely used in many applications, but they are unable to make effective use of unlabeled information. In such scenarios, generative approaches have better applicability, as they are able to handle problems such as settings where the variability in the range of possible input vectors is enormous. Generative models are used in machine learning either for modeling data directly or as an intermediate step in forming a conditional probability density function. Linear Discriminant Analysis and the Naive Bayes classifier have been used for classification of the odor of cashews. Linear Discriminant Analysis is a method used in data classification, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The Naive Bayes algorithm is a classification approach based on Bayes' rule and a set of conditional independence assumptions. Naive Bayes classifiers are highly scalable, requiring a number of parameters linear in the number of variables (features/predictors) in a learning problem.
The main advantages of using generative models stem from the stronger assumptions they make about the data, specifically about the distribution of the predictors given the response variables. The electronic instrument used for artificial odor sensing and classification is an electronic nose. This device is designed to imitate the human sense of smell by providing an analysis of individual chemicals or chemical mixtures. The experimental results have been evaluated using the performance measures accuracy, precision and recall. The experimental results show that the overall performance of Linear Discriminant Analysis was better than that of the Naive Bayes classifier on the cashew dataset.
Keywords: odor classification, generative models, naive bayes, linear discriminant analysis
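To make the Naive Bayes description concrete, here is a minimal Gaussian Naive Bayes sketch in plain Python. It assumes continuous sensor readings modeled with one Gaussian per feature and class; the two-feature toy readings and the "fresh"/"spoiled" labels are hypothetical and do not come from the paper's cashew dataset.

```python
from collections import defaultdict
from math import log, pi

def fit_gaussian_nb(X, y):
    """Estimate per-class prior, feature means and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Pick the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, float("-inf")
    for label, (prior, means, variances) in model.items():
        score = log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * log(2 * pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical two-sensor readings, invented for illustration.
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
y = ["fresh", "fresh", "fresh", "spoiled", "spoiled", "spoiled"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 1.9]), predict_gaussian_nb(model, [3.1, 0.5]))
```

Each class stores one prior plus a mean and a variance per feature, which is the linear parameter count the abstract refers to.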
Procedia PDF Downloads 387
12311 A Comparative Study for Various Techniques Using WEKA for Red Blood Cells Classification
Authors: Jameela Ali, Hamid A. Jalab, Loay E. George, Abdul Rahim Ahmad, Azizah Suliman, Karim Al-Jashamy
Abstract:
Red blood cells (RBCs) are the most common type of blood cell and are the most intensively studied in cell biology. A lack of RBCs is a condition in which the hemoglobin level is lower than normal and is referred to as “anemia”. Abnormalities in RBCs affect the exchange of oxygen. This paper presents a comparative study of various techniques for classifying red blood cells as normal or abnormal (anemic) using WEKA. WEKA is an open-source tool consisting of different machine learning algorithms for data mining applications. The algorithms tested are the Radial Basis Function neural network, the Support Vector Machine, and the K-Nearest Neighbors algorithm. Two sets of combined features were utilized for classification of blood cell images. The first set, consisting exclusively of geometrical features, was used to identify whether the tested blood cell has a spherical or non-spherical shape, while the second set, consisting mainly of textural features, was used to recognize the types of the spherical cells. We have provided an evaluation based on applying these classification methods to our RBC image dataset, which was obtained from Serdang Hospital, Malaysia, and measuring the accuracy of the test results. The best achieved classification rates are 97%, 98%, and 79% for the Support Vector Machine, the Radial Basis Function neural network, and the K-Nearest Neighbors algorithm, respectively.
Keywords: red blood cells, classification, radial basis function neural networks, support vector machine, k-nearest neighbors algorithm
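Of the three algorithms compared, K-Nearest Neighbors is the simplest to illustrate. The sketch below is a plain-Python version, not the WEKA implementation used in the study; the two features (a circularity score and a normalized area) and their values are hypothetical stand-ins for the paper's geometrical features.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical (circularity, normalized area) features, invented for illustration.
train_X = [(0.95, 0.50), (0.92, 0.55), (0.97, 0.48),
           (0.60, 0.30), (0.55, 0.35), (0.65, 0.28)]
train_y = ["spherical", "spherical", "spherical",
           "non-spherical", "non-spherical", "non-spherical"]

print(knn_predict(train_X, train_y, (0.93, 0.52)))
print(knn_predict(train_X, train_y, (0.58, 0.31)))
```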
Procedia PDF Downloads 479
12310 A Spatial Hypergraph Based Semi-Supervised Band Selection Method for Hyperspectral Imagery Semantic Interpretation
Authors: Akrem Sellami, Imed Riadh Farah
Abstract:
Hyperspectral imagery (HSI) typically provides a wealth of information captured in a wide range of the electromagnetic spectrum for each pixel in the image. Hence, a pixel in HSI is a high-dimensional vector of intensities with a large spectral range and a high spectral resolution. Semantic interpretation is therefore a challenging task in HSI analysis. In this paper, we focus on object classification as HSI semantic interpretation. However, HSI classification still faces some issues, among which are the spatial variability of spectral signatures, the high number of spectral bands, and the high cost of true sample labeling. The high number of spectral bands combined with the low number of training samples gives rise to the curse of dimensionality. To resolve this problem, we propose introducing a dimensionality reduction step to improve the classification of HSI. The presented approach is a semi-supervised band selection method based on a spatial hypergraph embedding model that represents higher-order relationships, with different weights for the spatial neighbors corresponding to the centroid pixel. This semi-supervised band selection has been developed to select useful bands for object classification. The presented approach is evaluated on AVIRIS and ROSIS HSIs and compared to other dimensionality reduction methods. The experimental results demonstrate the efficacy of our approach compared to many existing dimensionality reduction methods for HSI classification.
Keywords: dimensionality reduction, hyperspectral image, semantic interpretation, spatial hypergraph
Procedia PDF Downloads 305
12309 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles
Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis
Abstract:
Organizations, including governments, generate (big) data that are high in volume, velocity, and veracity, and that come from a variety of sources. Public administrations are using (big) data, implementing base registries, and enforcing data sharing across the entire government to deliver (big) data related integrated services, provide insights to users, and support good governance. Government (big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services, such as data storage and hosting services, to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of the government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationships in the government (big) data ecosystem. We also discuss our research findings. We did not find many published research articles about the government (big) data ecosystem, including its definition and the classification of actors and their roles. Therefore, we drew ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data, and industry data.
Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review
Procedia PDF Downloads 161
12308 Proposed Organizational Development Interventions in Managing Occupational Stressors for Business Schools in Batangas City
Authors: Marlon P. Perez
Abstract:
The study intended to determine the level of occupational stress experienced by faculty members of private and public business schools in Batangas City, with the end in view of proposing organizational development interventions for managing occupational stressors. Stressors such as factors intrinsic to the job, role in the organization, relationships at work, career development, and organizational structure and climate were used as determinants of occupational stress level. The descriptive method of research was used as the research design. There were only 64 full-time faculty members from private and public business schools in Batangas City: University of Batangas, Lyceum of the Philippines University-Batangas, Golden Gate Colleges, Batangas State University and Colegio ng Lungsod ng Batangas. A survey questionnaire was used as the data gathering instrument. It was found that all occupational stressors were assessed as stressful when respondents were grouped according to classification of tertiary school, while individual respondents differed in their assessments of occupational stressors. Age was significantly related to respondents’ assessments of factors intrinsic to the job and career development; however, it was not significantly related to role in the organization, relationships at work, or organizational structure and climate. On the other hand, gender, marital status, highest educational attainment, employment status, length of service, area of specialization and classification of tertiary school were found not to be significantly related to any of the occupational stressors. Various organizational development interventions have been proposed to manage the occupational stressors experienced by business faculty members in the institution.
Keywords: occupational stress, business school, organizational development, intervention, stressors, faculty members, assessment, manage
Procedia PDF Downloads 431
12307 Towards a Balancing Medical Database by Using the Least Mean Square Algorithm
Authors: Kamel Belammi, Houria Fatrim
Abstract:
An imbalanced data set, a problem often found in real-world applications, can have a seriously negative effect on the classification performance of machine learning algorithms. There have been many attempts at dealing with the classification of imbalanced data sets. In medical diagnosis classification, we often face an imbalanced number of data samples between the classes, in which there are not enough samples in rare classes. In this paper, we propose a learning method based on a cost-sensitive extension of the Least Mean Square (LMS) algorithm that penalizes errors of different samples with different weights, along with some rules of thumb to determine those weights. After the balancing phase, we apply different classifiers (support vector machine (SVM), k-nearest neighbor (KNN) and multilayer neural networks (MNN)) to the balanced data set. We have also compared the results obtained before and after the balancing method.
Keywords: multilayer neural networks, k-nearest neighbor, support vector machine, imbalanced medical data, least mean square algorithm, diabetes
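A cost-sensitive LMS update can be sketched as below. This is only an illustration of the general idea: the abstract does not give the authors' weighting rules, so the inverse-class-frequency cost used here is an assumed rule of thumb, and the sketch uses the LMS model directly as a linear classifier on invented one-dimensional data.

```python
from collections import Counter

def class_costs(labels):
    """Assumed rule of thumb: cost inversely proportional to class frequency."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}

def lms_train(X, y, costs, lr=0.05, epochs=200):
    """LMS (Widrow-Hoff) training where each error is scaled by its class cost."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias weight
    for _ in range(epochs):
        for row, target in zip(X, y):
            x = list(row) + [1.0]
            err = target - sum(wi * xi for wi, xi in zip(w, x))
            step = lr * costs[target] * err
            w = [wi + step * xi for wi, xi in zip(w, x)]
    return w

def lms_predict(w, row):
    """Threshold the linear output at zero to obtain a class in {-1, +1}."""
    x = list(row) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# Invented imbalanced toy data: one rare positive sample, three negatives.
X = [[2.0], [-1.0], [-1.5], [-2.0]]
y = [1, -1, -1, -1]
costs = class_costs(y)
w = lms_train(X, y, costs)
print(costs, [lms_predict(w, row) for row in X])
```

With these costs, an error on the rare class moves the weights three times as far as an error on the majority class, which is the effect the cost-sensitive extension is meant to achieve.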
Procedia PDF Downloads 531
12306 Anemia Among Pregnant Women in Kuwait: Findings from Kuwait Birth Cohort Study
Authors: Majeda Hammoud
Abstract:
Background: Anemia during pregnancy increases the risk of delivery by cesarean section, low birth weight, preterm birth, perinatal mortality, stillbirth, and maternal mortality. In this study, we aimed to assess the prevalence of anemia in pregnant women and its associated factors in the Kuwait birth cohort study. Methods: The Kuwait birth cohort (N=1108) was a prospective cohort study in which pregnant women were recruited in the third trimester. Data were collected through personal interviews with mothers who attend antenatal care visits, including data on socio-economic status and lifestyle factors. Blood samples were taken after the recruitment to measure multiple laboratory indicators. Clinical data were extracted from the medical records by a clinician including data on comorbidities. Anemia was defined as having Hemoglobin (Hb) <110 g/L with further classification as mild (100-109 g/L), moderate (70-99 g/L), or severe (<70 g/L). Predictors of anemia were classified as underlying or direct factors, and logistic regression was used to investigate their association with anemia. Results: The mean Hb level in the study group was 115.21 g/L (95%CI: 114.56- 115.87 g/L), with significant differences between age groups (p=0.034). The prevalence of anemia was 28.16% (95%CI: 25.53-30.91%), with no significant difference by age group (p=0.164). Of all 1108 pregnant women, 8.75% had moderate anemia, and 19.40% had mild anemia, but no pregnant women had severe anemia. In multivariable analysis, getting pregnant while using contraception, adjusted odds ratio (AOR) 1.73(95%CI:1.01-2.96); p=0.046 and current use of supplements, AOR 0.50 (95%CI: 0.26-0.95); p=0.035 were significantly associated with anemia (underlying factors). From the direct factors group, only iron and ferritin levels were significantly associated with anemia (P<0.001). 
Conclusion: Although severe anemia is rare among pregnant women in Kuwait, mild and moderate anemia remain a significant health problem despite free access to antenatal care.
Keywords: anemia, pregnancy, hemoglobin, ferritin
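The hemoglobin thresholds given in the Methods translate directly into a small classification rule. The sketch below uses those stated cut-offs; the function name is hypothetical, and treating non-integer values between the printed ranges (for example 99.5 g/L) as belonging to the lower band is an assumption.

```python
def classify_anemia(hb_g_per_l):
    """Classify anemia severity from hemoglobin in g/L using the stated cut-offs."""
    if hb_g_per_l < 70:
        return "severe"
    if hb_g_per_l < 100:
        return "moderate"
    if hb_g_per_l < 110:
        return "mild"
    return "none"

# The reported cohort mean (115.21 g/L) falls above the anemia threshold.
for hb in (115.21, 105, 85, 65):
    print(hb, classify_anemia(hb))
```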
Procedia PDF Downloads 49
12305 Unsupervised Classification of DNA Barcodes Species Using Multi-Library Wavelet Networks
Authors: Abdesselem Dakhli, Wajdi Bellil, Chokri Ben Amar
Abstract:
DNA Barcode, a short mitochondrial DNA fragment, made up of three subunits; a phosphate group, sugar and nucleic bases (A, T, C, and G). They provide good sources of information needed to classify living species. Such intuition has been confirmed by many experimental results. Species classification with DNA Barcode sequences has been studied by several researchers. The classification problem assigns unknown species to known ones by analyzing their Barcode. This task has to be supported with reliable methods and algorithms. To analyze species regions or entire genomes, it becomes necessary to use similarity sequence methods. A large set of sequences can be simultaneously compared using Multiple Sequence Alignment which is known to be NP-complete. To make this type of analysis feasible, heuristics, like progressive alignment, have been developed. Another tool for similarity search against a database of sequences is BLAST, which outputs shorter regions of high similarity between a query sequence and matched sequences in the database. However, all these methods are still computationally very expensive and require significant computational infrastructure. Our goal is to build predictive models that are highly accurate and interpretable. This method permits to avoid the complex problem of form and structure in different classes of organisms. On empirical data and their classification performances are compared with other methods. Our system consists of three phases. The first is called transformation, which is composed of three steps; Electron-Ion Interaction Pseudopotential (EIIP) for the codification of DNA Barcodes, Fourier Transform and Power Spectrum Signal Processing. 
The second is called approximation, which is empowered by the use of Multi Library Wavelet Neural Networks (MLWNN). The third is called the classification of DNA Barcodes, which is realized by applying the algorithm of hierarchical classification.
Keywords: DNA barcode, electron-ion interaction pseudopotential, Multi Library Wavelet Neural Networks (MLWNN)
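The transformation phase (EIIP coding, Fourier transform, power spectrum) can be sketched as follows. The EIIP values are the commonly cited ones for the four nucleotides; the example sequence is invented, and the direct DFT is used only to keep the sketch dependency-free, not because the authors computed it this way.

```python
import cmath

# Commonly cited EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

def eiip_encode(sequence):
    """Map a DNA string to its numeric EIIP signal."""
    return [EIIP[base] for base in sequence.upper()]

def power_spectrum(signal):
    """Squared magnitude of the discrete Fourier transform (direct O(n^2) DFT)."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

# Invented short fragment, for illustration only.
signal = eiip_encode("ACGTTGCA")
spectrum = power_spectrum(signal)
print([round(p, 5) for p in spectrum])
```

The resulting spectrum is a fixed numeric representation that a later approximation stage, such as a wavelet network, could take as input.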
Procedia PDF Downloads 316
12304 Enhanced Arabic Semantic Information Retrieval System Based on Arabic Text Classification
Authors: A. Elsehemy, M. Abdeen , T. Nazmy
Abstract:
Since the appearance of the Semantic Web, many semantic search techniques and models have been proposed to exploit the information in ontologies to enhance traditional keyword-based search. Many advances have been made in languages such as English, German, French and Spanish. However, other languages such as Arabic are not yet fully supported. In this paper, we present a framework for ontology-based information retrieval for the Arabic language. Our system consists of four main modules, namely a query parser, an indexer, a search module and a ranking module. Our approach includes building a semantic index by linking ontology concepts to documents, including an annotation weight for each link, to be used in ranking the results. We also augmented the framework with an automatic document categorizer, which enhances the overall document ranking. We have built three Arabic domain ontologies, Sports, Economics and Politics, as examples for the Arabic language. We built a knowledge base that consists of 79 classes and more than 1456 instances. The system is evaluated using the precision and recall metrics. We have performed many retrieval operations on a sample of 40,316 documents with a size of 320 MB of pure text. The results show that semantic search enhanced with text classification gives better performance than the system without classification.
Keywords: Arabic text classification, ontology based retrieval, Arabic semantic web, information retrieval, Arabic ontology
Procedia PDF Downloads 523
12303 Estimating Tree Height and Forest Classification from Multi Temporal Risat-1 HH and HV Polarized Satellite Aperture Radar Interferometric Phase Data
Authors: Saurav Kumar Suman, P. Karthigayani
Abstract:
In this paper, tree height is estimated and forest types are classified from multi-temporal RISAT-1 Horizontal-Horizontal (HH) and Horizontal-Vertical (HV) polarised Satellite Aperture Radar (SAR) data. The novelty of the proposed project is the combined use of the back-scattering coefficients (sigma naught) and the coherence. It uses the Water Cloud Model (WCM). The approach uses three main steps: (a) extraction of the different forest parameter data from the Product.xml, BAND-META and Grid-xxx.txt files that come with the HH and HV polarized data from ISRO (Indian Space Research Organisation); these files contain the parameters required during height estimation; (b) calculation of the vegetation and ground backscattering, coherence and other forest parameters; (c) classification of forest types using the ENVI 5.0 tool and ROI (Region of Interest) calculation.
Keywords: RISAT-1, classification, forest, SAR data
Procedia PDF Downloads 404
12302 Ensemble-Based SVM Classification Approach for miRNA Prediction
Authors: Sondos M. Hammad, Sherin M. ElGokhy, Mahmoud M. Fahmy, Elsayed A. Sallam
Abstract:
In this paper, an ensemble-based Support Vector Machine (SVM) classification approach is proposed. It is used for miRNA prediction. Three problems commonly associated with previous approaches are alleviated. These problems arise from imposing assumptions on the secondary structure of pre-miRNA, from the imbalance between the numbers of laboratory-checked miRNAs and pseudo-hairpins, and from using a training data set that does not consider all the varieties of samples in different species. We aggregate the predicted outputs of three well-known SVM classifiers, namely Triplet-SVM, Virgo and Mirident, weighted by their variant features without any structural assumptions. An additional SVM layer is used in aggregating the final output. The proposed approach is trained and then tested with balanced data sets. The proposed approach outperforms the three base classifiers. Improved values for the metrics are achieved: 88.88% f-score, 92.73% accuracy, 90.64% precision, 96.64% specificity, 87.2% sensitivity, and an area under the ROC curve of 0.91.
Keywords: MiRNAs, SVM classification, ensemble algorithm, assumption problem, imbalance data
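The reported metrics can be cross-checked against one another: the f-score is the harmonic mean of precision and sensitivity (recall). The short sketch below recomputes it from the two reported values; the function and variable names are illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

reported_precision = 0.9064    # 90.64% precision, as reported
reported_sensitivity = 0.872   # 87.2% sensitivity (recall), as reported

f1 = f_score(reported_precision, reported_sensitivity)
print(f"{100 * f1:.2f}%")
```

The recomputed value agrees with the reported 88.88% f-score to within rounding.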
Procedia PDF Downloads 347
12301 Use of Gaussian-Euclidean Hybrid Function Based Artificial Immune System for Breast Cancer Diagnosis
Authors: Cuneyt Yucelbas, Seral Ozsen, Sule Yucelbas, Gulay Tezel
Abstract:
Because only a small number of the complex systems in artificial immune system (AIS) research can solve nonlinear problems, nonlinear AIS approaches, among the well-known solution techniques, need to be developed. The Gaussian function is commonly used for similarity estimation in classification problems and pattern recognition. In this study, diagnosis of breast cancer, the second most widespread type of cancer in women, was performed with different distance calculation functions, namely Euclidean, Gaussian and Gaussian-Euclidean hybrid functions, in the clonal selection model of classical AIS on the Wisconsin Breast Cancer Dataset (WBCD), which was taken from the University of California, Irvine Machine Learning Repository. We used the 3-fold cross-validation method to train and test on the dataset. According to the results, the maximum test classification accuracy was 97.35%, obtained with the Gaussian-Euclidean hybrid function for fold 3. The mean test classification accuracies were 94.78%, 94.45% and 95.31% for the Euclidean, Gaussian and Gaussian-Euclidean functions, respectively. With these results, the Gaussian-Euclidean hybrid function appears to be a promising distance calculation method, and it may be considered as an alternative for hard nonlinear classification problems.
Keywords: artificial immune system, breast cancer diagnosis, Euclidean function, Gaussian function
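The two base measures and the 3-fold split can be sketched as below. The abstract does not define how the Gaussian and Euclidean functions are combined into the hybrid, so no hybrid is attempted here; the `sigma` parameter and the interleaved fold assignment are assumptions made for the example.

```python
from math import dist, exp

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return dist(a, b)

def gaussian_similarity(a, b, sigma=1.0):
    """Gaussian similarity: 1 for identical vectors, decaying with distance."""
    return exp(-dist(a, b) ** 2 / (2 * sigma ** 2))

def k_fold_indices(n, k=3):
    """Return (train_indices, test_indices) pairs for k interleaved folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [(sorted(set(range(n)) - set(fold)), fold) for fold in folds]

print(euclidean((0, 0), (3, 4)))
print(gaussian_similarity((0, 0), (1, 0)))
for train_idx, test_idx in k_fold_indices(6, k=3):
    print(train_idx, test_idx)
```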
Procedia PDF Downloads 433
12300 Causes of Variation Orders in the Egyptian Construction Industry: Time and Cost Impacts
Authors: A. Samer Ezeldin, Jwanda M. El Sarag
Abstract:
Variation orders are of great importance in any construction project. A variation order is defined as any change in the scope of works of a project, which can be an addition, an omission, or a modification. This paper investigates the variation orders that occur during construction projects in Egypt. The literature review presents a comparison of the causes of variation orders among Egypt, Tanzania, Nigeria, Malaysia and the United Kingdom. It also classifies the occurrence of variation orders into owner-related factors, consultant-related factors and other factors. The classified events that lead to variation orders were introduced in a survey of 19 events to observe their frequency of occurrence and their time and cost impacts. The survey data were obtained from 87 participants, including clients, consultants, and contractors, and a database of 42 scenarios was created. A model was then developed to help project managers predict the frequency of variations, budget for any additional costs, and minimize any delays that may take place. Two experts with more than 25 years of experience were given the model to verify that it was working effectively. The model was then validated on a residential compound that was completed in July 2016 to show that it produces acceptable results.
Keywords: construction, cost impact, Egypt, time impact, variation orders
Procedia PDF Downloads 179
12299 Incorporating Information Gain in Regular Expressions Based Classifiers
Authors: Rosa L. Figueroa, Christopher A. Flores, Qing Zeng-Treitler
Abstract:
A regular expression consists of a sequence of characters that describes a text pattern. Usually, in clinical research, regular expressions are manually created by programmers together with domain experts. Lately, there have been several efforts to investigate how to generate them automatically. This article presents a text classification algorithm based on regexes. The algorithm, named REX, was designed and then implemented as a simplified method to create regexes to classify Spanish text automatically. In order to classify ambiguous cases, such as when multiple labels are assigned to a testing example, REX includes an information gain method. Two sets of data were used to evaluate the algorithm’s effectiveness in clinical text classification tasks. The results indicate that the regular expression based classifier proposed in this work performs statistically better regarding accuracy and F-measure than Support Vector Machine and Naïve Bayes for both datasets.
Keywords: information gain, regular expressions, smith-waterman algorithm, text classification
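Information gain for a regex can be computed by treating "the regex matches" as a binary feature and measuring how much it reduces label entropy. The sketch below illustrates that calculation only; it is not the REX algorithm itself, and the Spanish snippets, labels, and patterns are invented for the example.

```python
import re
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(pattern, texts, labels):
    """Entropy reduction obtained by splitting texts on whether pattern matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    hit = [label for text, label in zip(texts, labels) if regex.search(text)]
    miss = [label for text, label in zip(texts, labels) if not regex.search(text)]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (hit, miss) if part)
    return entropy(labels) - remainder

# Invented examples, for illustration only.
texts = ["paciente fumador activo", "fuma 10 cigarrillos al dia",
         "no fuma", "niega tabaquismo"]
labels = ["smoker", "smoker", "non-smoker", "non-smoker"]

specific = information_gain(r"cigarrillos|fumador", texts, labels)
ambiguous = information_gain(r"fum", texts, labels)
print(round(specific, 3), round(ambiguous, 3))
```

When several regexes with different labels match the same text, preferring the label of the regex with the higher information gain is one plausible way to break the tie.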
Procedia PDF Downloads 319
12298 One-Class Classification Approach Using Fukunaga-Koontz Transform and Selective Multiple Kernel Learning
Authors: Abdullah Bal
Abstract:
This paper presents a one-class classification (OCC) technique based on the Fukunaga-Koontz Transform (FKT) for binary classification problems. The FKT is originally a powerful tool for feature selection and ordering in two-class problems. To apply the standard FKT to the data domain description problem (i.e., one-class classification), a set of non-class samples lying outside the boundary that describes the positive (target) class, formed with limited training data, is constructed synthetically. The tunnel-like decision boundary around the upper and lower borders of the target class samples is designed using statistical properties of the feature vectors belonging to the training data. To capture higher-order statistics of the data and increase discrimination ability, the proposed method, termed one-class FKT (OC-FKT), is extended to a nonlinear version via kernel machines, referred to as OC-KFKT for short. Multiple kernel learning (MKL) is a family of machine learning methods that tries to find an optimal combination of a set of sub-kernels to achieve a better result. However, the discriminative ability of some of the base kernels may be low, and an OC-KFKT built on such kernels leads to unsatisfactory classification performance. To address this problem, the quality of the sub-kernels should be evaluated, and the weak kernels must be discarded before the final decision-making process. MKL/OC-FKT and selective MKL/OC-FKT frameworks, inspired by ensemble learning (EL), have been designed to weight and then select the sub-classifiers using the discriminability and diversity measured by eigenvalue ratios. The eigenvalue ratios are assessed based on their regions in the FKT subspaces. 
The comparative experiments, performed on various low- and high-dimensional data against state-of-the-art algorithms, confirm the effectiveness of our techniques, especially under small sample size (SSS) conditions. Keywords: ensemble methods, fukunaga-koontz transform, kernel-based methods, multiple kernel learning, one-class classification
Procedia PDF Downloads 19
12297 Online Handwritten Character Recognition for South Indian Scripts Using Support Vector Machines
Authors: Steffy Maria Joseph, Abdu Rahiman V, Abdul Hameed K. M.
Abstract:
Online handwritten character recognition is a challenging field in Artificial Intelligence. The classification success rate of current techniques decreases when the dataset involves similarity and complexity in stroke styles, number of strokes and stroke characteristic variations. Malayalam is a complex South Indian language spoken by about 35 million people, especially in Kerala and the Lakshadweep islands. In this paper, we consider significant feature extraction for the similar stroke styles of Malayalam. The extracted feature set is also suitable for the recognition of other handwritten South Indian languages such as Tamil, Telugu and Kannada. A classification scheme based on support vector machines (SVM) is proposed to improve the accuracy of classification and recognition of online Malayalam handwritten characters. SVM classifiers are well suited to real-world applications. The contribution of various features towards recognition accuracy is analysed. Performance for different SVM kernels is also studied. A graphical user interface has been developed for reading and displaying the character. Different writing styles are taken for each of the 44 alphabets. Various features are extracted and used for classification after preprocessing of the input data samples. The highest recognition accuracy, 97%, is obtained experimentally with the best feature combination and a polynomial kernel in the SVM. Keywords: SVM, MATLAB, Malayalam, South Indian scripts, online handwritten character recognition
Procedia PDF Downloads 574
12296 A t-SNE and UMAP Based Neural Network Image Classification Algorithm
Authors: Shelby Simpson, William Stanley, Namir Naba, Xiaodi Wang
Abstract:
Both t-SNE and UMAP are state-of-the-art tools that predominantly preserve local structure, that is, they group neighboring data points together, which provides a very informative visualization of heterogeneity in the data. In this research, we develop a t-SNE and UMAP based neural network image classification algorithm that embeds the original dataset into a corresponding low-dimensional dataset as a preprocessing step, then uses this embedded dataset as input to our specially designed neural network classifier for image classification. We use the Fashion-MNIST data set, a labeled data set of images of clothing objects, in our experiments. t-SNE and UMAP are used for dimensionality reduction of the data set and thus produce low-dimensional embeddings. Furthermore, we feed the embeddings from t-SNE and UMAP into two neural networks. The accuracy of the models from the two neural networks is then compared to that of a dense neural network that does not use an embedding as input, to show which model classifies the images of clothing objects more accurately. Keywords: t-SNE, UMAP, fashion MNIST, neural networks
Procedia PDF Downloads 197
12295 Classifying Affective States in Virtual Reality Environments Using Physiological Signals
Authors: Apostolos Kalatzis, Ashish Teotia, Vishnunarayan Girishan Prabhu, Laura Stanley
Abstract:
Emotions are functional behaviors influenced by thoughts, stimuli, and other factors that induce neurophysiological changes in the human body. Understanding and classifying emotions is challenging, as individuals have varying perceptions of their environments. Therefore, it is crucial that there are publicly available databases and virtual reality (VR) based environments that have been scientifically validated for assessing emotional classification. This study utilized two commercially available VR applications (Guided Meditation VR™ and Richie’s Plank Experience™) to induce acute stress and calm states among participants. Subjective and objective measures were collected to create a validated multimodal dataset and classification scheme for affective state classification. Participants’ subjective measures included the Self-Assessment Manikin, emotional cards and a 9-point Visual Analogue Scale for perceived stress, collected using a Virtual Reality Assessment Tool developed by our team. Participants’ objective measures included electrocardiogram and respiration data collected from 25 participants (15 M, 10 F, Mean = 22.28 ± 4.92). The features extracted from these data included heart rate variability components and respiration rate, both of which were used to train two machine learning models. Subjective responses validated the efficacy of the VR applications in eliciting the two desired affective states; for classifying the affective states, a logistic regression (LR) model and a support vector machine (SVM) with a linear kernel were developed. The LR outperformed the SVM and achieved 93.8%, 96.2%, and 93.8% leave-one-subject-out cross-validation accuracy, precision and recall, respectively. The VR assessment tool and data collected in this study are publicly available for other researchers. Keywords: affective computing, biosignals, machine learning, stress database
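The leave-one-subject-out evaluation mentioned above can be illustrated with a minimal sketch. It only shows how folds are formed (all samples of one participant are held out together) and how accuracy, precision and recall are computed. The subject IDs and labels are hypothetical and no classifier is trained here.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def accuracy_precision_recall(y_true, y_pred, positive=1):
    """Binary accuracy, precision and recall with respect to the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical data: two samples (e.g. stress and calm) per subject.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subjects))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Holding out whole subjects, rather than random samples, keeps a participant's physiological signature from appearing in both the training and the test data.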
Procedia PDF Downloads 140
12294 Classification of Health Information Needs of Hypertensive Patients in the Online Health Community Based on Content Analysis
Authors: Aijing Luo, Zirui Xin, Yifeng Yuan
Abstract:
Background: With the rapid development of the online health community, more and more patients and families are seeking health information on the Internet. Objective: This study aimed to discuss how to fully reveal the health information needs expressed by hypertensive patients in their questions in the online environment. Methods: This study randomly selected 1,000 text records from the question data of hypertensive patients from 2008 to 2018 collected from the website www.haodf.com and constructed a classification system through literature research and content analysis. This paper identified the background characteristics and questioning intention of each hypertensive patient based on the patient’s question and used co-occurrence network analysis to explore the features of the health information needs of hypertensive patients. Results: The classification system for health information needs of patients with hypertension is composed of 9 parts: 355 kinds of drugs, 395 kinds of symptoms and signs, 545 kinds of tests and examinations, 526 kinds of demographic data, 80 kinds of diseases, 37 kinds of risk factors, 43 kinds of emotions, 6 kinds of lifestyles, 49 kinds of questions. The characteristics of the explored online health information needs of the hypertensive patients include: i) more than 49% of patients describe features such as drugs, symptoms and signs, tests and examinations, demographic data, diseases, etc.; ii) these groups are most concerned about treatment (77.8%), followed by diagnosis (32.3%); iii) 65.8% of hypertensive patients ask doctors several questions online at the same time. 
28.3% of the patients are very concerned about how to adjust their medication, and they ask other treatment-related questions at the same time, including drug side effects, whether to take drugs, how to treat a disease, etc.; secondly, 17.6% of the patients consult doctors online about the causes of the clinical findings, including the relationship between the clinical findings and a disease, the treatment of a disease, medication, and examinations. Conclusion: In the online environment, the health information needs expressed by Chinese hypertensive patients to doctors are personalized; that is, patients with different background features express different questioning intentions to doctors. The classification system constructed in this study can guide health information service providers in the construction of online health resources, to help solve the problem of information asymmetry in communication between doctors and patients. Keywords: online health community, health information needs, hypertensive patients, doctor-patient communication
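As an illustration of the co-occurrence network analysis mentioned in the Methods, the sketch below counts how often two categories are coded in the same question. Each pair then becomes a weighted edge in a co-occurrence network. The category names and records are hypothetical and are not drawn from the study's data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count, for every unordered pair of categories, the records containing both."""
    counts = Counter()
    for categories in records:
        # Sort so that each pair has one canonical (alphabetical) orientation.
        for a, b in combinations(sorted(set(categories)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical coded questions: each set lists the categories found in one question.
records = [
    {"drugs", "symptoms", "treatment"},
    {"drugs", "treatment"},
    {"symptoms", "diagnosis"},
]
edges = cooccurrence_counts(records)
for pair, weight in edges.most_common():
    print(pair, weight)
```

The edge weights can then be used to find which background features and questioning intentions tend to appear together.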
Procedia PDF Downloads 118
12293 Factor Influencing the Certification to ISO 9000:2008 among SME in Malaysia
Authors: Dolhadi Bin Zainudin
Abstract:
The study attempts to predict the relationship between influencing factors in the adoption of ISO 9000:2008 and to identify which of these factors play the main role in achieving the ISO 9000 standard. A survey using a structured questionnaire was employed. A total of 255 respondents from 255 small and medium enterprises participated in this study. With regard to influencing factors, a discriminant analysis was conducted, and the results showed that three of the nine critical success factors differ significantly between ISO 9000:2008 certified and non-certified companies: communication for quality, information and analysis, and organizational culture. Keywords: ISO 9000, quality management, factors, small and medium enterprise, Malaysia, influencing factors
Procedia PDF Downloads 336
12292 A Systemic Review and Comparison of Non-Isolated Bi-Directional Converters
Authors: Rahil Bahrami, Kaveh Ashenayi
Abstract:
This paper presents a systematic classification and comparative analysis of non-isolated bi-directional DC-DC converters. The increasing demand for efficient energy conversion in diverse applications has spurred the development of various converter topologies. In this study, we categorize bi-directional converters into three distinct classes: Inverting, Non-Inverting, and Interleaved. Each category is characterized by its unique operational characteristics and benefits. Furthermore, a practical comparison is conducted by evaluating the simulation results of each bi-directional converter. Bi-directional converters (BDCs) can be classified into isolated and non-isolated topologies. Non-isolated converters share a common ground between input and output, making them suitable for applications with minimal voltage change. They are easy to integrate, lightweight, and cost-effective but have limitations such as limited voltage gain, switching losses, and no protection against high voltages. Isolated converters use transformers to separate input and output, offering safety benefits, high voltage gain, and noise reduction. They are larger and more costly but are essential for automotive designs where safety is crucial. The paper focuses on non-isolated systems. The paper discusses the classification of non-isolated bidirectional converters based on several criteria. Common factors used for classification include topology, voltage conversion, control strategy, power capacity, voltage range, and application. These factors serve as a foundation for categorizing converters, although the specific scheme might vary depending on contextual, application, or system-specific requirements. The paper presents a three-category classification for non-isolated bi-directional DC-DC converters: inverting, non-inverting, and interleaved. 
In the inverting category, converters produce an output voltage with reversed polarity compared to the input voltage, achieved through specific circuit configurations and control strategies. This is valuable in applications such as motor control and grid-tied solar systems. The non-inverting category consists of converters that maintain the same voltage polarity, useful in scenarios like battery equalization. Lastly, the interleaved category employs parallel converter stages to enhance power delivery and reduce current ripple. This classification framework enhances comprehension and analysis of non-isolated bi-directional DC-DC converters. The findings contribute to a deeper understanding of the trade-offs and merits associated with different converter types. As a result, this work aids researchers, practitioners, and engineers in selecting appropriate bi-directional converter solutions for specific energy conversion requirements. The proposed classification framework and experimental assessment collectively enhance the comprehension of non-isolated bi-directional DC-DC converters, fostering advancements in efficient power management and utilization. The simulation process uses PSIM to model and simulate non-isolated bi-directional converters from both the inverting and non-inverting categories. The aim is to conduct a comprehensive comparative analysis of these converters, considering key performance indicators such as rise time, efficiency, ripple factor, and maximum error. This systematic evaluation provides valuable insights into the dynamic response, energy efficiency, output stability, and overall precision of the converters. The results of this comparison facilitate informed decision-making and potential optimizations, ensuring that the chosen converter configuration aligns effectively with the designated operational criteria and performance goals. Keywords: bi-directional, DC-DC converter, non-isolated, energy conversion
Procedia PDF Downloads 98
12291 Autonomous Vehicle Detection and Classification in High Resolution Satellite Imagery
Authors: Ali J. Ghandour, Houssam A. Krayem, Abedelkarim A. Jezzini
Abstract:
High-resolution satellite images and remote sensing can provide global information quickly compared to traditional methods of data collection. At such high resolution, a road is no longer a thin line, and objects such as cars and trees are easily identifiable. Automatic vehicle enumeration can be considered one of the most important applications in traffic management. In this paper, an autonomous vehicle detection and classification approach for highway environments is proposed. This approach consists mainly of three stages: (i) First, a set of preprocessing operations is applied, including soil, vegetation, and water suppression. (ii) Then, road network detection and delineation is implemented using a built-up area index, followed by several morphological operations. This step plays an important role in increasing the overall detection accuracy, since vehicle candidates are objects contained within the road networks only. (iii) Multi-level Otsu segmentation is implemented in the last stage, resulting in vehicle detection and classification, where detected vehicles are classified into cars and trucks. Accuracy assessment analysis is conducted over different study areas to show the efficiency of the proposed method, especially in highway environments. Keywords: remote sensing, object identification, vehicle and road extraction, vehicle and road features-based classification
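The segmentation stage can be illustrated with a simplified sketch of Otsu's method. Note the assumption: the abstract uses multi-level Otsu segmentation, whereas the code below implements only the basic single-threshold variant, which picks the gray level that maximizes the between-class variance. The pixel values are hypothetical (dark road surface with a few bright vehicle pixels).

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance.

    Pixels with value <= t form class 0, the rest form class 1.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0      # number of pixels in class 0 so far
    sum0 = 0    # sum of gray levels in class 0 so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


# Hypothetical road patch: 100 dark asphalt pixels and 40 bright vehicle pixels.
pixels = [10] * 50 + [12] * 50 + [200] * 20 + [210] * 20
t = otsu_threshold(pixels)
vehicle_pixels = [p for p in pixels if p > t]
print(t, len(vehicle_pixels))
```

A multi-level version would search for several thresholds at once, which is one plausible way to separate background, cars and trucks by intensity; that extension is not shown here.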
Procedia PDF Downloads 230
12290 Listening Anxiety in Iranian EFL Learners
Authors: Samaneh Serraj
Abstract:
Listening anxiety has a detrimental effect on language learners. Through a qualitative study on Iranian EFL learners, several factors were identified as influencing their listening anxiety. These factors were divided into three categories: individual factors (nerves and emotionality, using inappropriate strategies and lack of practice), input factors (lack of time to process, lack of visual support, nature of speech and level of difficulty) and environmental factors (instructors, peers and class environment). Keywords: listening comprehension, listening anxiety, foreign language learners
Procedia PDF Downloads 468
12289 Dynamic Distribution Calibration for Improved Few-Shot Image Classification
Authors: Majid Habib Khan, Jinwei Zhao, Xinhong Hei, Liu Jiedong, Rana Shahzad Noor, Muhammad Imran
Abstract:
Deep learning is increasingly employed in image classification, yet the scarcity and high cost of labeled data for training remain a challenge. Limited samples often lead to overfitting due to biased sample distribution. This paper introduces a dynamic distribution calibration method for few-shot learning. Initially, base and new class samples undergo normalization to mitigate disparate feature magnitudes. A pre-trained model then extracts feature vectors from both classes. The method dynamically selects distribution characteristics from base classes (both adjacent and remote) in the embedding space, using a threshold value approach for new class samples. Given the propensity of similar classes to share feature distributions like mean and variance, this research assumes a Gaussian distribution for feature vectors. Subsequently, distributional features of new class samples are calibrated using a corrected hyperparameter, derived from the distribution features of both adjacent and distant base classes. This calibration augments the new class sample set. The technique demonstrates significant improvements, with up to 4% accuracy gains in few-shot classification challenges, as evidenced by tests on miniImagenet and CUB datasets. Keywords: deep learning, computer vision, image classification, few-shot learning, threshold
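The general idea of threshold-based distribution calibration can be sketched as follows. This is an illustrative simplification under stated assumptions, not the authors' method: only base classes within a distance threshold of the new sample are used (the abstract also draws on remote classes), variances are per-dimension, and the names `calibrate`, `augment`, `threshold` and `alpha` as well as all numbers are hypothetical.

```python
import math
import random


def calibrate(sample, base_stats, threshold, alpha=0.1):
    """Return a calibrated (mean, variance) for one new-class feature vector.

    base_stats maps class name -> (mean vector, per-dimension variance vector).
    Base classes whose mean lies within `threshold` of the sample contribute.
    """
    selected = [
        (mean, var)
        for mean, var in base_stats.values()
        if math.dist(sample, mean) <= threshold
    ]
    if not selected:
        # No base class is close enough: fall back to the sample itself.
        return list(sample), [alpha] * len(sample)
    k = len(selected)
    dims = range(len(sample))
    mean = [(sample[d] + sum(m[d] for m, _ in selected)) / (k + 1) for d in dims]
    # alpha is an assumed spread hyperparameter added to the averaged variance.
    var = [sum(v[d] for _, v in selected) / k + alpha for d in dims]
    return mean, var


def augment(mean, var, n, rng):
    """Draw n synthetic feature vectors from the calibrated Gaussian."""
    return [[rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)] for _ in range(n)]


# Hypothetical base-class statistics in a 2-D embedding space.
base_stats = {
    "wolf": ([1.0, 1.0], [0.2, 0.2]),
    "fox": ([2.0, 1.0], [0.4, 0.2]),
    "bus": ([9.0, 9.0], [1.0, 1.0]),
}
sample = [1.5, 1.0]
mean, var = calibrate(sample, base_stats, threshold=2.0)
synthetic = augment(mean, var, n=5, rng=random.Random(0))
print(mean, var, len(synthetic))
```

The synthetic vectors would be added to the few labeled samples of the new class before training a classifier, which is how calibration enlarges the sample set.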
Procedia PDF Downloads 64
12288 Facial Pose Classification Using Hilbert Space Filling Curve and Multidimensional Scaling
Authors: Mekamı Hayet, Bounoua Nacer, Benabderrahmane Sidahmed, Taleb Ahmed
Abstract:
Pose estimation is an important task in computer vision. Though the majority of existing solutions provide good accuracy, they are often overly complex and computationally expensive. From this perspective, we propose the use of dimensionality reduction techniques to address the problem of facial pose estimation. First, a face image is converted into a one-dimensional time series using a Hilbert space-filling curve; the approach then converts this time series into a symbolic representation. Next, a distance matrix is calculated between the symbolic series of an input learning dataset of images, to generate classifiers of frontal vs. profile face pose. The proposed method is evaluated with three public datasets. Experimental results show that our approach achieves a correct classification rate exceeding 97% with the K-NN algorithm. Keywords: machine learning, pattern recognition, facial pose classification, time series
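The first step, turning an image into a one-dimensional series along a Hilbert curve, can be sketched as below. The index-to-coordinate conversion is the standard iterative algorithm for a square grid whose side is a power of two. The symbolic conversion and distance matrix described in the abstract are not shown, and the tiny image is hypothetical.

```python
def hilbert_d2xy(n, d):
    """Map position d along the Hilbert curve to (x, y) on an n-by-n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Reflect the current quadrant.
                x, y = s - 1 - x, s - 1 - y
            # Transpose the current quadrant.
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def image_to_series(image):
    """Read a square image (list of rows) in Hilbert-curve order."""
    n = len(image)
    series = []
    for d in range(n * n):
        x, y = hilbert_d2xy(n, d)
        series.append(image[y][x])
    return series


# Hypothetical 2x2 grayscale image.
image = [[1, 2],
         [3, 4]]
print(image_to_series(image))  # [1, 3, 4, 2]
```

Compared with a row-by-row scan, the Hilbert ordering keeps consecutive samples spatially adjacent in the image, which is the locality property that motivates its use here.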
Procedia PDF Downloads 349
12287 COVID-19 Detection from Computed Tomography Images Using UNet Segmentation, Region Extraction, and Classification Pipeline
Authors: Kenan Morani, Esra Kaya Ayana
Abstract:
This study aimed to develop a novel pipeline for COVID-19 detection using a large and rigorously annotated database of computed tomography (CT) images. The pipeline consists of UNet-based segmentation, lung extraction, and a classification part, with the addition of optional slice removal techniques following the segmentation part. In this work, batch normalization was added to the original UNet model to produce a lighter model with better localization, which is then used to build a full pipeline for COVID-19 diagnosis. To evaluate the effectiveness of the proposed pipeline, various segmentation methods were compared in terms of their performance and complexity. The proposed segmentation method with batch normalization outperformed traditional methods and other alternatives, resulting in a higher Dice score on a publicly available dataset. Moreover, at the slice level, the proposed pipeline demonstrated high validation accuracy, indicating its efficiency in predicting 2D slices. At the patient level, the full approach exhibited higher validation accuracy and macro F1 score than other alternatives, surpassing the baseline. The classification component of the proposed pipeline uses a convolutional neural network (CNN) to make final diagnosis decisions. The COV19-CT-DB dataset, which contains a large number of CT scans with various types of slices, rigorously annotated for COVID-19 detection, was used for classification. The proposed pipeline outperformed many other alternatives on the dataset. Keywords: classification, computed tomography, lung extraction, macro F1 score, UNet segmentation
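For reference, the Dice score used to compare segmentation methods measures the overlap between a predicted mask and a ground-truth mask as twice the intersection divided by the sum of the two mask sizes. The sketch below assumes binary masks stored as nested lists of 0 and 1. The masks are hypothetical, and returning 1.0 when both masks are empty is a convention chosen here.

```python
def dice_score(pred, target):
    """Dice coefficient 2*|A & B| / (|A| + |B|) for two binary masks."""
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    if total == 0:
        # Both masks are empty: treat as perfect agreement (a convention).
        return 1.0
    return 2 * intersection / total


# Hypothetical 2x2 lung masks.
pred = [[1, 1],
        [0, 0]]
target = [[1, 0],
          [0, 0]]
print(dice_score(pred, target))
```

Here one pixel overlaps and the masks contain three foreground pixels in total, so the score is 2/3.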
Procedia PDF Downloads 129
12286 Exploring Multi-Feature Based Action Recognition Using Multi-Dimensional Dynamic Time Warping
Authors: Guoliang Lu, Changhou Lu, Xueyong Li
Abstract:
In action recognition, previous studies have demonstrated the effectiveness of using multiple features to improve recognition performance. We focus on two practical issues: i) most studies directly concatenate or accumulate multiple features to evaluate the similarity between two actions. This approach could be too strong, since each kind of feature can have different dimensions, quantities, etc.; ii) in many studies, the employed classification methods lack a flexible and effective mechanism for adding new feature(s) to the classification. In this paper, we explore a unified scheme based on the recently proposed multi-dimensional dynamic time warping (MD-DTW). Experiments demonstrated the scheme's effectiveness in combining multiple features and its flexibility in adding new feature(s) to increase recognition performance. In addition, the explored scheme provides an open architecture for using new, advanced classification methods in the future to enhance action recognition. Keywords: action recognition, multi features, dynamic time warping, feature combination
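A minimal sketch of multi-dimensional dynamic time warping is given below. It assumes each action is a sequence of frames, each frame is a feature vector of fixed length, and the local cost between two frames is their Euclidean distance. The paper's specific feature normalization and combination scheme is not reproduced, and the sequences are hypothetical.

```python
import math


def md_dtw(seq_a, seq_b):
    """DTW distance between two sequences of equal-dimension feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of the first i and first j frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of seq_a repeated
                cost[i][j - 1],      # frame of seq_b repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical actions described by two features per frame.
action_a = [[0, 0], [1, 1], [2, 2]]
action_b = [[0, 0], [1, 1], [1, 1], [2, 2]]   # same motion, performed more slowly
action_c = [[0, 1], [1, 2], [2, 3]]           # second feature shifted by one

print(md_dtw(action_a, action_b))
print(md_dtw(action_a, action_c))
```

In this formulation, adding a new feature only lengthens each frame vector, while the warping recursion stays unchanged, which is one way to read the flexibility claimed in the abstract.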
Procedia PDF Downloads 436