Search results for: annotated labels.
63 Multi-labeled Data Expressed by a Set of Labels
Authors: Tetsuya Furukawa, Masahiro Kuzunishi
Abstract:
Collected data must be organized to be utilized efficiently, and hierarchical classification is an efficient approach to organizing data. When data is classified into multiple categories or annotated with a set of labels, users request multi-labeled data by giving a set of labels. There are several interpretations of the data expressed by a set of labels. This paper discusses which data is expressed by a set of labels by introducing orders for sets of labels, and shows that there are four types of orders, which are characterized by whether the labels of the expressed data include every label of the given set of labels within the range of the set. Two desirable properties of the orders are also discussed: that data is also expressed by the higher set of labels, and that different sets of labels express different data.
Keywords: Classification Hierarchies, Multi-labeled Data, Multiple Classification, Orders of Sets of Labels
Downloads: 1304
62 Analyzing Multi-Labeled Data Based on the Roll of a Concept against a Semantic Range
Authors: Masahiro Kuzunishi, Tetsuya Furukawa, Ke Lu
Abstract:
Classifying data hierarchically is an efficient approach to analyzing data. Data is usually classified into multiple categories, or annotated with a set of labels. To analyze multi-labeled data, such data must be specified by giving a set of labels as a semantic range. Data is analyzed for a number of specific purposes. This paper shows which multi-labeled data should be the target of analysis for those purposes, and discusses the role of a label against a set of labels by investigating the change when a label is added to the set of labels. These discussions give methods for the advanced analysis of multi-labeled data, which are based on the role of a label against a semantic range.
Keywords: Classification Hierarchies, Data Analysis, Multilabeled Data, Orders of Sets of Labels
Downloads: 1208
61 Food Quality Labels and their Perception by Consumers in the Czech Republic
Authors: Sarka Velcovska
Abstract:
The paper deals with quality labels used in the food products market, especially with labels of quality, labels of origin, and labels of organic farming. The aim of the paper is to identify the perception of these labels by consumers in the Czech Republic. The first part refers to the definition and specification of food quality labels that are relevant in the Czech Republic. The second part includes the discussion of marketing research results. Data were collected with the personal questioning method. Empirical findings on 150 respondents are related to consumer awareness and perception of national and European food quality labels used in the Czech Republic, attitudes to purchases of labelled products, and interest in information regarding the labels. Statistical methods, specifically Pearson's chi-square test of independence, the coefficient of contingency, and the coefficient of association, are used to determine whether significant differences exist among selected demographic categories of Czech consumers.
Keywords: Food quality labels, quality labels awareness, quality labels perception, marketing research.
Downloads: 2327
60 Facial Expression Phoenix (FePh): An Annotated Sequenced Dataset for Facial and Emotion-Specified Expressions in Sign Language
Authors: Marie Alaghband, Niloofar Yousefi, Ivan Garibay
Abstract:
Facial expressions are important parts of both gesture and sign language recognition systems. Despite the recent advances in both fields, annotated facial expression datasets in the context of sign language are still scarce resources. In this manuscript, we introduce an annotated sequenced facial expression dataset in the context of sign language, comprising over 3000 facial images extracted from the daily news and weather forecast of the public tv-station PHOENIX. Unlike the majority of currently existing facial expression datasets, FePh provides sequenced semi-blurry facial images with different head poses, orientations, and movements. In addition, in the majority of images, identities are mouthing the words, which makes the data more challenging. To annotate this dataset we consider primary, secondary, and tertiary dyads of seven basic emotions of "sad", "surprise", "fear", "angry", "neutral", "disgust", and "happy". We also considered the "None" class if the image’s facial expression could not be described by any of the aforementioned emotions. Although we provide FePh as a facial expression dataset of signers in sign language, it has a wider application in gesture recognition and Human Computer Interaction (HCI) systems.
Keywords: Annotated Facial Expression Dataset, Sign Language Recognition, Gesture Recognition, Sequenced Facial Expression Dataset.
Downloads: 719
59 OPEN_EmoRec_II- A Multimodal Corpus of Human-Computer Interaction
Authors: Stefanie Rukavina, Sascha Gruss, Steffen Walter, Holger Hoffmann, Harald C. Traue
Abstract:
OPEN_EmoRec_II is an open multimodal corpus with experimentally induced emotions. In the first half of the experiment, emotions were induced with standardized picture material, and in the second half during a human-computer interaction (HCI), realized with a wizard-of-oz design. The induced emotions are based on the dimensional theory of emotions (valence, arousal and dominance). With these emotional sequences, recorded as multimodal data (facial reactions, speech, audio and physiological reactions) in a naturalistic HCI environment, one can improve classification methods on a multimodal level. This database is the result of an HCI experiment, for which 30 subjects in total agreed to the publication of their data, including the video material, for research purposes. The now available open corpus contains sensory signals for video, audio and physiology (SCL, respiration, BVP, EMG Corrugator supercilii, EMG Zygomaticus Major), and annotations of facial reactions.
Keywords: Open multimodal emotion corpus, annotated labels.
Downloads: 389
58 OPEN_EmoRec_II- A Multimodal Corpus of Human-Computer Interaction
Authors: Stefanie Rukavina, Sascha Gruss, Steffen Walter, Holger Hoffmann, Harald C. Traue
Abstract:
OPEN_EmoRec_II is an open multimodal corpus with experimentally induced emotions. In the first half of the experiment, emotions were induced with standardized picture material, and in the second half during a human-computer interaction (HCI), realized with a wizard-of-oz design. The induced emotions are based on the dimensional theory of emotions (valence, arousal and dominance). With these emotional sequences, recorded as multimodal data (facial reactions, speech, audio and physiological reactions) in a naturalistic HCI environment, one can improve classification methods on a multimodal level. This database is the result of an HCI experiment, for which 30 subjects in total agreed to the publication of their data, including the video material, for research purposes. The now available open corpus contains sensory signals for video, audio and physiology (SCL, respiration, BVP, EMG Corrugator supercilii, EMG Zygomaticus Major), and annotations of facial reactions.
Keywords: Open multimodal emotion corpus, annotated labels.
Downloads: 1820
57 An Algorithm for the Map Labeling Problem with Two Kinds of Priorities
Authors: Noboru Abe, Yoshinori Amai, Toshinori Nakatake, Sumio Masuda, Kazuaki Yamaguchi
Abstract:
We consider the problem of placing labels of the points on a plane. For each point, its position, the size of its label and a priority are given. Moreover, several candidates of its label positions are prespecified, and each of such label positions is assigned a priority. The objective of our problem is to maximize the total sum of priorities of placed labels and their points. By refining a labeling algorithm that can use these priorities, we propose a new heuristic algorithm which is more suitable for treating the assigned priorities.
Keywords: Map labeling, greedy algorithm, heuristic algorithm, priority.
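The abstract does not give the algorithm itself, so the following is only a generic greedy baseline for the problem as stated, not the authors' heuristic. It assumes axis-aligned rectangular labels, scores each candidate by the sum of its point priority and its position priority, and places candidates in descending score order whenever they do not overlap an already placed label. The names Candidate, overlaps and greedy_place are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One prespecified label position for a point (illustrative structure)."""
    point_id: int
    x: float                  # lower-left corner of the label rectangle
    y: float
    w: float                  # label width
    h: float                  # label height
    point_priority: float     # priority of the point itself
    position_priority: float  # priority of this candidate position

    @property
    def score(self) -> float:
        # Assumption: a placed label contributes both priorities to the objective.
        return self.point_priority + self.position_priority


def overlaps(a: Candidate, b: Candidate) -> bool:
    """True if the interiors of the two rectangles intersect."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def greedy_place(candidates: list[Candidate]) -> list[Candidate]:
    """Place at most one label per point, best score first, without overlaps."""
    placed: list[Candidate] = []
    labeled_points: set[int] = set()
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if cand.point_id in labeled_points:
            continue  # this point already has a label
        if any(overlaps(cand, p) for p in placed):
            continue  # would collide with a placed label
        placed.append(cand)
        labeled_points.add(cand.point_id)
    return placed
```

A greedy pass like this is fast but can be far from optimal, which is presumably why refined heuristics such as the one proposed in the paper are of interest.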
Downloads: 1451
56 Meta-Learning for Hierarchical Classification and Applications in Bioinformatics
Authors: Fabio Fabris, Alex A. Freitas
Abstract:
Hierarchical classification is a special type of classification task where the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a dataset, based on the past performance of the candidate algorithms in other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for the more challenging task of hierarchical classification, and evaluates it in a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. This work proposes a meta-learning approach for recommending the best hierarchical classification algorithm to a hierarchical classification dataset. This work’s contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.
Keywords: Algorithm recommendation, meta-learning, bioinformatics, hierarchical classification.
Downloads: 1370
55 The Use of Electronic Shelf Labels in the Retail Food Sector
Authors: Brent McKenzie, Victoria Taylor
Abstract:
The use of QR (Quick Response) codes for customer scanning with mobile phones is a rapidly growing trend. The QR code can provide the consumer with product information, user guides, product use, competitive pricing, etc. One sector for QR use has been in retail, through the use of Electronic Shelf Labeling (henceforth, ESL). In Europe, the use of ESL for pricing has been in practice for a number of years but continues to lag in acceptance in North America. Stated concerns include costs as a key constraint, but there is also evidence that consumer acceptance represents a limitation as well. The purpose of this study is to present the findings of a consumer-based study to gauge the impact on their use in the retail food sector.
Keywords: Electronic shelf labels (ESL), consumer insights, retail food sector.
Downloads: 3309
54 Skolem Sequences and Erdosian Labellings of m Paths with 2 and 3 Vertices
Authors: H. V. Chen
Abstract:
Assume that we have m identical graphs, each consisting of a path with k vertices, where k is a positive integer. In this paper, we discuss certain labellings of the m graphs called c-Erdösian for some positive integers c. We regard labellings of the vertices of the graphs by positive integers, which induce the edge labels for the paths as the sum of the two incident vertex labels. They have the property that each vertex label and edge label appears only once in the set of positive integers {c, ..., c+6m-1}. Here, we show how to construct certain c-Erdösian labellings of m paths with 2 and 3 vertices by using Skolem sequences.
Keywords: c-Erdösian, Skolem sequences, magic labelling
Downloads: 1163
53 Featured based Segmentation of Color Textured Images using GLCM and Markov Random Field Model
Authors: Dipti Patra, Mridula J
Abstract:
In this paper, we propose a new image segmentation approach for colour textured images. The proposed method for image segmentation consists of two stages. In the first stage, textural features using the gray level co-occurrence matrix (GLCM) are computed for regions of interest (ROI) considered for each class. The ROI acts as ground truth for the classes. The Ohta model (I1, I2, I3) is the colour model used for segmentation. The statistical mean feature at a certain inter pixel distance (IPD) of the I2 component was considered to be the optimized textural feature for further segmentation. In the second stage, the feature matrix obtained is assumed to be the degraded version of the image labels, and the unknown image labels are modeled as a Markov Random Field (MRF). The labels are estimated through the maximum a posteriori (MAP) estimation criterion using the ICM algorithm. The performance of the proposed approach is compared with that of the existing schemes, JSEG and another scheme which uses GLCM and MRF in RGB colour space. The proposed method is found to outperform the existing ones in terms of segmentation accuracy, with an acceptable rate of convergence. The results are validated with synthetic and real textured images.
Keywords: Texture Image Segmentation, Gray Level Cooccurrence Matrix, Markov Random Field Model, Ohta colour space, ICM algorithm.
Downloads: 2173
52 Grammatically Coded Corpus of Spoken Lithuanian: Methodology and Development
Authors: L. Kamandulytė-Merfeldienė
Abstract:
The paper deals with the main methodological issues of the Corpus of Spoken Lithuanian, whose development began in 2006. At present, the corpus consists of 300,000 grammatically annotated word forms. The creation of the corpus consists of three main stages: collecting the data, transcribing the recorded data, and grammatical annotation. Data collection was based on the principles of balance and naturalness. The recorded speech was transcribed according to the CHAT requirements of CHILDES. The transcripts were double-checked and annotated grammatically using CHILDES. The development of the Corpus of Spoken Lithuanian has led to a constant increase in studies on spontaneous communication, and various papers have dealt with the distribution of parts of speech, the use of different grammatical forms, variation of inflectional paradigms, the distribution of fillers, syntactic functions of adjectives, and the mean length of utterances.
Keywords: CHILDES, Corpus of Spoken Lithuanian, grammatical annotation, grammatical disambiguation, lexicon, Lithuanian.
Downloads: 948
51 Impovement of a Label Extraction Method for a Risk Search System
Authors: Shigeaki Sakurai, Ryohei Orihara
Abstract:
This paper proposes a method for improving the classification efficiency of a classification model. The model is used in a risk search system and extracts specific labels from articles posted at bulletin board sites. The system can analyze the important discussions composed of the articles. The improvement method introduces ensemble learning methods that use multiple classification models. It also introduces expressions related to the specific labels into the generation of word vectors. The paper applies the improvement method to articles collected from three bulletin board sites selected by users and verifies the effectiveness of the improvement method.
Keywords: Text mining, Risk search system, Corporate reputation, Bulletin board site, Ensemble learning
Downloads: 1325
50 An Improvement of Multi-Label Image Classification Method Based on Histogram of Oriented Gradient
Authors: Ziad Abdallah, Mohamad Oueidat, Ali El-Zaart
Abstract:
Image Multi-label Classification (IMC) assigns a label or a set of labels to an image. The high demand for image annotation and archiving on the web has led researchers to develop many algorithms for this application domain. The existing techniques for IMC have two drawbacks: neither the description of the elementary characteristics of the image nor the correlation between labels is taken into account. In this paper, we present an algorithm (MIML-HOGLPP) which handles both limitations simultaneously. The algorithm uses the histogram of gradients as feature descriptor. It applies the Label Priority Power-set as multi-label transformation to solve the problem of label correlation. The experiment shows that the results of MIML-HOGLPP are better in terms of some of the evaluation metrics compared with the two existing techniques.
Keywords: Data mining, information retrieval system, multi-label, problem transformation, histogram of gradients.
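The abstract does not define the Label Priority Power-set, so the sketch below shows only the plain label power-set transformation that such problem-transformation methods build on, not the priority-based variant of the paper. Each distinct set of labels becomes one class of a single-label problem, which is how label co-occurrence is preserved. The function name label_powerset_fit and the example labels are hypothetical.

```python
def label_powerset_fit(label_sets):
    """Map every distinct label set to one integer class id.

    Returns the class id of each sample and the mapping that was learned.
    """
    mapping = {}   # frozenset of labels -> class id
    classes = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in mapping:
            mapping[key] = len(mapping)  # next unused class id
        classes.append(mapping[key])
    return classes, mapping


# Three images; the first and third carry the same set of labels.
classes, mapping = label_powerset_fit([{"sky", "sea"}, {"sky"}, {"sea", "sky"}])
print(classes)  # [0, 1, 0]
```

Any single-label classifier can then be trained on the class ids, with predictions mapped back to label sets through the inverse of the mapping.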
Downloads: 1315
49 Bayesian Online Learning of Corresponding Points of Objects with Sequential Monte Carlo
Authors: Miika Toivanen, Jouko Lampinen
Abstract:
This paper presents an online method that learns the corresponding points of an object from un-annotated grayscale images containing instances of the object. In the first image being processed, an ensemble of node points is automatically selected which is matched in the subsequent images. A Bayesian posterior distribution for the locations of the nodes in the images is formed. The likelihood is formed from Gabor responses and the prior assumes the mean shape of the node ensemble to be similar in a translation and scale free space. An association model is applied for separating the object nodes and background nodes. The posterior distribution is sampled with Sequential Monte Carlo method. The matched object nodes are inferred to be the corresponding points of the object instances. The results show that our system matches the object nodes as accurately as other methods that train the model with annotated training images.
Keywords: Bayesian modeling, Gabor filters, Online learning, Sequential Monte Carlo.
Downloads: 1582
48 Class Outliers Mining: Distance-Based Approach
Authors: Nabil M. Hewahi, Motaz K. Saad
Abstract:
In large datasets, identifying exceptional or rare cases with respect to a group of similar cases is considered a very significant problem. The traditional problem (Outlier Mining) is to find exceptional or rare cases in a dataset irrespective of the class label of these cases; they are considered rare events with respect to the whole dataset. In this research, we pose the problem of Class Outliers Mining and a method to find those outliers. The general definition of this problem is "given a set of observations with class labels, find those that arouse suspicions, taking into account the class labels". We introduce a novel definition of outlier, the Class Outlier, and propose the Class Outlier Factor (COF), which measures the degree of being a Class Outlier for a data object. Our work includes a proposal of a new algorithm for mining Class Outliers, experimental results on real world datasets from various domains, and a comparison study with other related methods.
Keywords: Class Outliers, Distance-Based Approach, Outliers Mining.
Downloads: 3388
47 Implementation of a Paraconsistent-Fuzzy Digital PID Controller in a Level Control Process
Authors: H. M. Côrtes, J. I. Da Silva Filho, M. F. Blos, B. S. Zanon
Abstract:
In a modern society, the increase in the level of quality in industrial production demands new techniques of control and machinery automation. In this context, this work presents the implementation of a Paraconsistent-Fuzzy Digital PID controller. The controller is based on the treatment of inconsistencies both in the Paraconsistent Logic and in the Fuzzy Logic. Paraconsistent analysis is performed on the signals applied to the system inputs using concepts from the Paraconsistent Annotated Logic with annotation of two values (PAL2v). The signals resulting from the paraconsistent analysis are two values defined as Dc - Degree of Certainty and Dct - Degree of Contradiction, which receive a treatment according to the Fuzzy Logic theory, and the resulting output of the logic actions is a single value called the crisp value, which is used to control a dynamic system. The application of the proposed model was demonstrated through an example. Initially, the Paraconsistent-Fuzzy Digital PID controller was built and tested in an isolated MATLAB environment and then compared to the equivalent Digital PID function of this software for standard step excitation. After this step, a level control plant was modeled to execute the controller function on a physical model, bringing the tests closer to actual conditions. For this, the control parameters (proportional, integral and derivative) were determined for the configuration of the conventional Digital PID controller and of the Paraconsistent-Fuzzy Digital PID, and the control loops in MATLAB were assembled with the respective transfer function of the plant. Finally, the results of the comparison of the level control process between the Paraconsistent-Fuzzy Digital PID controller and the conventional Digital PID controller were presented.
Keywords: Fuzzy logic, paraconsistent annotated logic, level control, digital PID.
Downloads: 1237
46 A Psychophysiological Evaluation of an Effective Recognition Technique Using Interactive Dynamic Virtual Environments
Authors: Mohammadhossein Moghimi, Robert Stone, Pia Rotshtein
Abstract:
Recording psychological and physiological correlates of human performance within virtual environments and interpreting their impacts on human engagement, ‘immersion’ and related emotional or ‘effective’ states is both academically and technologically challenging. By exposing participants to an effective, real-time (game-like) virtual environment, designed and evaluated in an earlier study, a psychophysiological database containing the EEG, GSR and Heart Rate of 30 male and female gamers, exposed to 10 games, was constructed. Some 174 features were subsequently identified and extracted from a number of windows, with 28 different timing lengths (e.g. 2, 3, 5, etc. seconds). After reducing the number of features to 30, using a feature selection technique, K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) methods were subsequently employed for the classification process. The classifiers categorised the psychophysiological database into four effective clusters (defined based on a 3-dimensional space – valence, arousal and dominance) and eight emotion labels (relaxed, content, happy, excited, angry, afraid, sad, and bored). The KNN and SVM classifiers achieved average cross-validation accuracies of 97.01% (±1.3%) and 92.84% (±3.67%), respectively. However, no significant differences were found in the classification process based on effective clusters or emotion labels.
Keywords: Virtual Reality, effective computing, effective VR, emotion-based effective physiological database.
Downloads: 994
45 Speech Enhancement Using Wavelet Coefficients Masking with Local Binary Patterns
Authors: Christian Arcos, Marley Vellasco, Abraham Alcaim
Abstract:
In this paper, we present a wavelet coefficients masking based on Local Binary Patterns (WLBP) approach to enhance the temporal spectra of the wavelet coefficients for speech enhancement. This technique exploits the wavelet denoising scheme, which splits the degraded speech into pyramidal subband components and extracts frequency information without losing temporal information. Speech enhancement in each high-frequency subband is performed by binary labels through the local binary pattern masking that encodes the ratio between the original value of each coefficient and the values of the neighbour coefficients. This approach enhances the high-frequency spectra of the wavelet transform instead of eliminating them through a threshold. A comparative analysis is carried out with conventional speech enhancement algorithms, demonstrating that the proposed technique achieves significant improvements in terms of PESQ, an international recommendation of objective measure for estimating subjective speech quality. Informal listening tests also show that the proposed method in an acoustic context improves the quality of speech, avoiding the annoying musical noise present in other speech enhancement techniques. Experimental results obtained with a DNN based speech recognizer in noisy environments corroborate the superiority of the proposed scheme in the robust speech recognition scenario.
Keywords: Binary labels, local binary patterns, mask, wavelet coefficients, speech enhancement, speech recognition.
Downloads: 1017
44 Computational Method for Annotation of Protein Sequence According to Gene Ontology Terms
Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias
Abstract:
Annotation of a protein sequence is pivotal for the understanding of its function. The accuracy of manual annotation provided by curators is still questionable because of its lesser evidence strength, and manual annotation remains a hard and time-consuming task. A number of computational methods, including tools, have been developed to tackle this challenging task. However, they require high-cost hardware, are difficult for bioscientists to set up, or depend on time-intensive and blind sequence similarity search like the Basic Local Alignment Search Tool. This paper introduces a new method of assigning highly correlated Gene Ontology terms of annotated protein sequences to partially annotated or newly discovered protein sequences. This method is fully based on Gene Ontology data and annotations. Two problems had to be solved to achieve this method. The first problem relates to splitting the single monolithic Gene Ontology RDF/XML file into a set of smaller files that are easy to access and process. These files can then be enriched with protein sequences and Inferred from Electronic Annotation evidence associations. The second problem involves searching for a set of Gene Ontology terms that are semantically similar to a given query. The details of the macro and micro problems involved and their solutions, including the objective of this study, are described. This paper also describes protein sequence annotation and the Gene Ontology. The methodology of this study and a Gene Ontology based protein sequence annotation tool, namely extended UTMGO, are presented. Furthermore, its basic version, a Gene Ontology browser based on semantic similarity search, is also introduced.
Keywords: automatic clustering, bioinformatics tool, gene ontology, protein sequence annotation, semantic similarity search
Downloads: 3128
43 Packet Forwarding with Multiprotocol Label Switching
Authors: R.N.Pise, S.A.Kulkarni, R.V.Pawar
Abstract:
MultiProtocol Label Switching (MPLS) is an emerging technology that aims to address many of the existing issues associated with packet forwarding in today's internetworking environment. It provides a method of forwarding packets at a high rate of speed by combining the speed and performance of Layer 2 with the scalability and IP intelligence of Layer 3. In a traditional IP (Internet Protocol) routing network, a router analyzes the destination IP address contained in the packet header. The router independently determines the next hop for the packet using the destination IP address and the interior gateway protocol. This process is repeated at each hop to deliver the packet to its final destination. In contrast, in the MPLS forwarding paradigm, routers on the edge of the network (label edge routers) attach labels to packets based on the forwarding equivalence class (FEC). Packets are then forwarded through the MPLS domain, based on their associated FECs, by routers in the core of the network, called label switch routers, which swap the labels. Simply swapping the label at each hop, instead of looking up the IP header of the packet in the routing table, is a more efficient way of forwarding packets, which in turn allows traffic to be forwarded at tremendous speeds and gives granular control over the path taken by a packet. This paper deals with the MPLS forwarding mechanism, the implementation of the MPLS datapath, and test results comparing the performance of MPLS and IP routing. The discussion focuses primarily on MPLS IP packet networks, by far the most common application of MPLS today.
Keywords: Forwarding equivalence class, incoming label map, label, next hop label forwarding entry.
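The forwarding paradigm described in this abstract can be illustrated with a toy model. This is a minimal sketch, not the datapath implemented in the paper: the router names, prefixes and label values are invented, the FEC is assumed to be a destination prefix, and each core router's incoming label map is reduced to a dictionary from incoming label to (outgoing label, next hop).

```python
import ipaddress

# Hypothetical table at the ingress label edge router:
# FEC (here a destination prefix) -> (label to attach, next hop).
FEC_TABLE = {
    ipaddress.ip_network("10.1.0.0/16"): (100, "LSR-A"),
    ipaddress.ip_network("10.2.0.0/16"): (200, "LSR-A"),
}

# Hypothetical incoming label maps of the core label switch routers:
# incoming label -> (outgoing label, next hop); None means the label is removed.
INCOMING_LABEL_MAP = {
    "LSR-A": {100: (101, "LSR-B"), 200: (201, "LSR-B")},
    "LSR-B": {101: (None, "egress"), 201: (None, "egress")},
}


def classify(dst: str):
    """Ingress step: the only place where the IP header is examined."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FEC_TABLE if addr in net]
    if not matches:
        raise LookupError(f"no FEC for {dst}")
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return FEC_TABLE[best]


def forward(dst: str):
    """Return the (router, label) pairs a packet passes through."""
    label, router = classify(dst)
    path = [(router, label)]
    while label is not None:
        # Core step: a single exact-match lookup on the label, then a swap.
        label, router = INCOMING_LABEL_MAP[router][label]
        path.append((router, label))
    return path


print(forward("10.1.2.3"))
# [('LSR-A', 100), ('LSR-B', 101), ('egress', None)]
```

The point of the sketch is the asymmetry between the two lookups: classification by prefix happens once at the edge, while every core hop performs only a dictionary lookup on the label.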
Downloads: 2693
42 Named Entity Recognition using Support Vector Machine: A Language Independent Approach
Authors: Asif Ekbal, Sivaji Bandyopadhyay
Abstract:
Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is nowadays considered to be fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports on the development of a NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state of the art machine learning technique has been widely applied to NER in several well-studied languages, its use for Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with a variety of features that are helpful in predicting the four different named entity (NE) classes, namely Person name, Location name, Organization name and Miscellaneous name. We have used the annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL). In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web-archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm in order to generate the lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as features of the SVM in order to improve the system performance. The NER system has been tested with gold standard test sets of 35K and 60K tokens for Bengali and Hindi, respectively. Evaluation results have demonstrated recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show an improvement in the f-score of 5.13% with the use of context patterns. A statistical analysis (ANOVA) is also performed to compare the performance of the proposed NER system with that of the existing HMM based system for both languages.
Keywords: Named Entity (NE), Named Entity Recognition (NER), Support Vector Machine (SVM), Bengali, Hindi.
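As a quick consistency check on the figures reported in this abstract, the f-scores can be recomputed from the stated precision and recall. The sketch assumes the f-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name f_score is hypothetical.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (same units as inputs)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall in percent, as reported in the abstract.
bengali = f_score(precision=80.12, recall=88.61)
hindi = f_score(precision=74.34, recall=80.23)
print(round(bengali, 2), round(hindi, 2))  # 84.15 77.17
```

Both values agree with the reported 84.15% and 77.17%, which supports the reading that the f-score here is the balanced F1.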
Downloads: 3403
41 CAGE Questionnaire as a Screening Tool for Hazardous Drinking in an Acute Admissions Ward: Frequency of Application and Comparison with AUDIT-C Questionnaire
Authors: Ammar Ayad Issa Al-Rifaie, Zuhreya Muazu, Maysam Ali Abdulwahid, Dermot Gleeson
Abstract:
The aim of this audit was to examine the efficiency of alcohol history documentation and screening for hazardous drinkers at the Medical Admission Unit (MAU) of Northern General Hospital (NGH), Sheffield, and to identify any potential for enhancing clinical practice. Three junior doctors collected data from medical clerking sheets, the ICE system, and directly from 82 patients newly admitted to the MAU at NGH between January and March 2015, using both the CAGE questionnaire and the AUDIT-C tool. Alcohol consumption was documented in around two-thirds of the patient sample, and this was documented fairly accurately by health care professionals. Some used subjective terms such as 'social drinking' in the alcohol units section of the history. The CAGE questionnaire was applied to only four patients, and none of the patients had documented advice, education or referral to an alcohol liaison team. The AUDIT-C tool identified 30.4% of patients admitted to the NGH MAU as hazardous drinkers, whereas CAGE identified 10.9%. The amount of alcohol a patient consumed correlated positively with the AUDIT-C score (Pearson correlation 0.83). A re-audit is planned after integrating the AUDIT-C tool as labels in the notes and presenting a brief teaching session to junior doctors. Alcohol misuse screening is not adequately undertaken and no appropriate action is being offered to hazardous drinkers. The CAGE questionnaire is poorly applied to patients and, even when used adequately, has low sensitivity for detecting hazardous drinkers in comparison with the AUDIT-C tool. A re-audit of alcohol screening practice after introducing the AUDIT-C tool in clerking sheets (as labels) is required to compare the findings and conclude the audit cycle.
Keywords: Alcohol screening, AUDIT-C, CAGE, Hazardous drinking.
Downloads 1910
40 Saudi Twitter Corpus for Sentiment Analysis
Authors: Adel Assiri, Ahmed Emam, Hmood Al-Dossari
Abstract:
Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have directly applied SA to Arabic, owing to the lack of a publicly available dataset for this language. This paper partially bridges this gap by focusing on one Arabic dialect, the Saudi dialect. It presents an annotated data set of size 4700 for Saudi dialect sentiment analysis (K = 0.807). Our future work is to extend this corpus and create a large-scale lexicon for the Saudi dialect from it.
Keywords: Arabic, Sentiment Analysis, Twitter, annotation.
Downloads 4044
39 Sensitivity Analysis of Real-Time Systems
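The abstract does not say how K was computed. Assuming it denotes a kappa-style inter-annotator agreement statistic, the sketch below shows Cohen's kappa for two annotators on a tiny made-up label set. The function name, the labels, and the data are hypothetical and are not drawn from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations, for illustration only.
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is commonly reported alongside annotated corpora.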
Authors: Benjamin Gorry, Andrew Ireland, Peter King
Abstract:
Verification of real-time software systems can be expensive in terms of time and resources. Testing is the main method of proving correctness but has been shown to be a long and time-consuming process. Everyday engineers are usually unwilling to adopt formal approaches to correctness because of the overhead associated with developing their knowledge of such techniques. Performance modelling techniques allow systems to be evaluated with respect to timing constraints. This paper describes PARTES, a framework which guides the extraction of performance models from programs written in an annotated subset of C.
Keywords: Performance Modelling, Real-time, Sensitivity Analysis.
Downloads 1513
38 Semi-Automatic Approach for Semantic Annotation
Authors: Mohammad Yasrebi, Mehran Mohsenzadeh
Abstract:
The third phase of the web, the semantic web, requires many web pages that are annotated with metadata. Thus, a crucial question is where to acquire these metadata. In this paper we propose a semi-automatic method that annotates the texts of documents and web pages and employs a fairly comprehensive knowledge base to categorize instances with regard to an ontology. The approach is evaluated against manual annotations and against one of the most popular annotation tools, which works in the same way as our tool. The approach, an annotation tool for the Semantic Web, is implemented in the .NET Framework and uses WordNet as its knowledge base.
Keywords: Semantic Annotation, Metadata, Information Extraction, Semantic Web, knowledge base.
Downloads 1867
37 Part of Speech Tagging Using Statistical Approach for Nepali Text
Authors: Archit Yajnik
Abstract:
Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using the Hidden Markov Model (HMM) and the Viterbi algorithm. Training and testing data sets are randomly separated from the annotated Nepali text corpus. Both methods are employed on the data sets. The Viterbi algorithm is found to be computationally faster and more accurate than the HMM. An accuracy of 95.43% is achieved using the Viterbi algorithm. An error analysis of where the mismatches occurred is discussed in detail.
Keywords: Hidden Markov model, Viterbi algorithm, POS tagging, natural language processing.
Downloads 1708
36 Ontology for Semantic Enrichment of Radio Frequency Identification Systems
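As a rough illustration of Viterbi decoding over an HMM for tagging, the sketch below finds the most probable tag sequence for a short sentence. It is not the authors' implementation: the two-tag tagset, the English toy words, and all probabilities are hypothetical values chosen for the example, whereas the paper works with an annotated Nepali corpus.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under an HMM."""
    if not words:
        return []
    # best[i][t] = (probability of the best path ending in tag t at word i,
    #               the previous tag on that path)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            column[t] = max(
                (best[-1][p][0] * trans_p[p][t] * emit_p[t].get(word, 0.0), p)
                for p in tags
            )
        best.append(column)
    # Backtrack from the most probable final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical toy model, for illustration only.
tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.7, "VERB": 0.3}
trans_p = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.6, "VERB": 0.4},
}
emit_p = {
    "NOUN": {"dogs": 0.6, "bark": 0.1},
    "VERB": {"dogs": 0.1, "bark": 0.7},
}
tagged = viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p)
print(tagged)  # ['NOUN', 'VERB']
```

In practice the start, transition, and emission probabilities would be estimated from the training portion of the annotated corpus, and log-probabilities would be used to avoid underflow on long sentences.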
Authors: Haitham S. Hamza, Mohamed Maher, Shourok Alaa, Aya Khattab, Hadeal Ismail, Kamilia Hosny
Abstract:
Radio Frequency Identification (RFID) has become a key technology in the emerging concept of the Internet of Things (IoT). Naturally, business applications would require the deployment of various RFID systems developed by different vendors that use different data formats and structures. This heterogeneity poses a challenge in developing real-life IoT systems with RFID, as integration is becoming very complex and challenging. Semantic integration is a key approach to deal with this challenge. To do so, an ontology for RFID systems needs to be developed in order to semantically annotate RFID systems and hence facilitate their integration. Accordingly, in this paper, we propose an ontology for RFID systems. The proposed ontology can be used to semantically enrich RFID systems and hence improve their usage and reasoning.
Keywords: IoT, RFID, Semantic, SPARQL, Ontology.
Downloads 1873
35 Using Radio Frequency Identification Technology in Supply Chain Management
Authors: Eleonora Tudora, Adriana Alexandru
Abstract:
Radio frequency identification (RFID) is a technology for automatic identification of items, particularly in the supply chain, but it is becoming increasingly important for industrial applications. Unlike barcode technology, which detects the optical signals reflected from barcode labels, RFID uses radio waves to transmit information from an RFID tag affixed to the physical object. In contrast to today's most common use of this technology, in warehouse inventory and the supply chain, this paper gives an overview of the structure of RFID systems and presents a solution based on the application of RFID for brand authentication, traceability and tracking, by implementing a production management system and extending its use to traders.
Keywords: RFID, RFID Tag, Electronic Product Code (EPC), EPC network, Object Naming Service (ONS), Authentication, Traceability.
Downloads 1702
34 Fuzzy Trust for Peer-to-Peer Based Systems
Authors: Farag Azzedin, Ahmad Ridha, Ali Rizvi
Abstract:
Trust management is one of the weaknesses of Peer-to-Peer (P2P) systems. The lack of centralized control makes it difficult to control the behavior of the peers. A reputation system is one approach to providing trust assessment in a P2P system. In this paper, we use fuzzy logic to model trust in a P2P environment. Our trust model combines first-hand (direct experience) and second-hand (reputation) information to allow peers to represent and reason with uncertainty regarding other peers' trustworthiness. Fuzzy logic can help in handling the imprecise nature and uncertainty of trust. Linguistic labels are used to enable peers to assign a trust level intuitively. Our fuzzy trust model is flexible in that inference rules are used to weight first-hand and second-hand information accordingly.
Keywords: P2P Systems, Trust, Reputation, Fuzzy Logic.
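To make the idea of linguistic trust labels concrete, the sketch below maps a numeric trust score to fuzzy labels through triangular membership functions. It is a deliberate simplification and not the authors' model: the labels, the membership shapes, and the fixed `direct_weight` are all hypothetical, and a plain weighted average stands in for the fuzzy inference rules the abstract mentions.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at the peak."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical linguistic labels over a trust score in [0, 1].
LABELS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def fuzzify(score):
    """Degree of membership of `score` in each linguistic label."""
    return {label: triangular(score, *shape) for label, shape in LABELS.items()}


def combine(direct, reputation, direct_weight=0.7):
    """Weighted mix of first-hand and second-hand trust (assumed weighting)."""
    return direct_weight * direct + (1 - direct_weight) * reputation


score = combine(direct=0.9, reputation=0.5)
memberships = fuzzify(score)
label = max(memberships, key=memberships.get)
print(f"score={score:.2f}, memberships={memberships}, label={label}")
```

A score can belong partly to two neighbouring labels at once, which is how fuzzy sets capture the imprecision of statements such as "fairly trustworthy".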
Downloads 2158