Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 255

Search results for: annotated labels

165 Incorporating Information Gain in Regular Expressions Based Classifiers

Authors: Rosa L. Figueroa, Christopher A. Flores, Qing Zeng-Treitler

Abstract:

A regular expression (regex) is a sequence of characters that describes a text pattern. In clinical research, regular expressions are usually created manually by programmers working with domain experts. Lately, there have been several efforts to investigate how to generate them automatically. This article presents a text classification algorithm based on regexes. The algorithm, named REX, was designed and implemented as a simplified method for creating regexes that classify Spanish text automatically. To classify ambiguous cases, such as when multiple labels are assigned to a testing example, REX includes an information gain method. Two datasets were used to evaluate the algorithm’s effectiveness in clinical text classification tasks. The results indicate that the proposed regular expression based classifier performs statistically better in accuracy and F-measure than Support Vector Machine and Naïve Bayes on both datasets.

Keywords: information gain, regular expressions, smith-waterman algorithm, text classification

Procedia PDF Downloads 292
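
The information gain step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each regex is tied to one label and that, when regexes of several labels match the same text, the label of the regex with the highest information gain on the training data wins. The abstract does not specify how REX applies information gain, so this tie-break rule, the function names, and the toy Spanish examples are all hypothetical.

```python
import math
import re
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(pattern, texts, labels):
    """Entropy reduction from splitting the examples on whether `pattern` matches."""
    hit = [y for x, y in zip(texts, labels) if re.search(pattern, x)]
    miss = [y for x, y in zip(texts, labels) if not re.search(pattern, x)]
    total = len(labels)
    conditional = (len(hit) / total) * entropy(hit) + (len(miss) / total) * entropy(miss)
    return entropy(labels) - conditional


def classify(text, rules, texts, labels):
    """Return the label of the matching rule with the highest information gain.

    `rules` is a list of (regex, label) pairs; returns None if no rule matches.
    This tie-break is an assumption, not the published REX procedure.
    """
    matching = [(p, lab) for p, lab in rules if re.search(p, text)]
    if not matching:
        return None
    best = max(matching, key=lambda rule: information_gain(rule[0], texts, labels))
    return best[1]


# Toy training data (hypothetical, for illustration only).
train_texts = ["paciente fuma diariamente", "no fuma", "nunca ha fumado", "fuma ocasionalmente"]
train_labels = ["smoker", "non-smoker", "non-smoker", "smoker"]
rules = [(r"\bfuma\b", "smoker"), (r"\b(no|nunca)\b", "non-smoker")]

# Both rules match this text; the negation rule separates the classes better.
prediction = classify("no fuma actualmente", rules, train_texts, train_labels)
```

In this toy setup the negation regex splits the training labels perfectly, so it has the larger information gain and decides the ambiguous case.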

164 A Mutually Exclusive Task Generation Method Based on Data Augmentation

Authors: Haojie Wang, Xun Li, Rui Yin

Abstract:

To solve memorization overfitting in the model-agnostic meta-learning (MAML) algorithm, a method of generating mutually exclusive (mutex) tasks based on data augmentation is proposed. The method generates a mutex task by mapping one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution of the initial dataset. Because generating mutex tasks for all data would produce a large amount of invalid data and, in the worst case, lead to exponential growth in computation, this paper also proposes a key data extraction method that extracts only part of the data to generate the mutex tasks. The experiments show that generating mutually exclusive tasks can effectively solve memorization overfitting in the MAML algorithm.

Keywords: mutex task generation, data augmentation, meta-learning, text classification

Procedia PDF Downloads 97
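
One simple way to picture "one feature mapped to multiple labels" is label remapping: the same examples are reused in a new task, but every example receives a different label than before. The sketch below assumes an N-way task with integer labels and uses a non-zero cyclic shift. It is an illustration of the general idea only, not the authors' generation or key data extraction method, and all names are hypothetical.

```python
import random


def make_mutex_task(task, n_way, rng):
    """Return a copy of `task` whose labels are shifted by a random non-zero offset.

    `task` is a list of (feature, label) pairs with labels in range(n_way).
    Because the offset is never 0 (mod n_way), every feature ends up with a
    label different from its original one, so a model cannot solve both tasks
    by memorizing a single feature-to-label mapping.
    """
    if n_way < 2:
        raise ValueError("a mutex task needs at least two classes")
    shift = rng.randrange(1, n_way)
    return [(x, (y + shift) % n_way) for x, y in task]


rng = random.Random(0)  # fixed seed so the example is repeatable
original_task = [("doc_a", 0), ("doc_b", 1), ("doc_c", 2)]
mutex_task = make_mutex_task(original_task, n_way=3, rng=rng)
```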

163 A Meta Regression Analysis to Detect Price Premium Threshold for Eco-Labeled Seafood

Authors: Cristina Giosuè, Federica Biondo, Sergio Vitale

Abstract:

In recent years, consumers' awareness of environmental concerns has been increasing, and seafood eco-labels are considered a possible instrument for improving both seafood markets and sustainable fishing management. Accordingly, the aim of this study was to carry out a meta-analysis of consumers’ willingness to pay (WTP) for eco-labeled wild seafood by means of a meta-regression. Only papers published in ISI journals were searched for on the “Web of Knowledge” and “SciVerse Scopus” platforms, using combinations of the following keywords: seafood, ecolabel, eco-label, willingness, WTP and premium. The dataset was built from the following variables: paper and survey codes, year of publication, first author’s nationality, species taxon and family, sample size, survey continent and country, data collection (where and how), gender and age of consumers, brand and ΔWTP. The analysis showed clear interest in eco-labeled seafood, particularly in developed countries. In general, consumers declared a greater willingness to pay than the premium actually applied to eco-labeled products, with differences related to taxa and brand.

Keywords: eco label, meta regression, seafood, willingness to pay

Procedia PDF Downloads 94

162 Meta Mask Correction for Nuclei Segmentation in Histopathological Image

Authors: Jiangbo Shi, Zeyu Gao, Chen Li

Abstract:

Nuclei segmentation is a fundamental task in digital pathology analysis and can be automated by deep learning-based methods. However, the development of such an automated method requires a large amount of data with precisely annotated masks, which is hard to obtain. Training with weakly labeled data is a popular solution for reducing the annotation workload. In this paper, we propose a novel meta-learning-based nuclei segmentation method which follows the label correction paradigm to leverage data with noisy masks. Specifically, we design a fully conventional meta-model that can correct noisy masks by using a small amount of clean meta-data. The corrected masks are then used to supervise the training of the segmentation model. Meanwhile, a bi-level optimization method is adopted to alternately update the parameters of the main segmentation model and the meta-model. Extensive experimental results on two nuclei segmentation datasets show that our method achieves state-of-the-art results. In particular, in some noise scenarios, it even exceeds the performance of training on supervised data.

Keywords: deep learning, histopathological image, meta-learning, nuclei segmentation, weak annotations

Procedia PDF Downloads 114

161 Characterization of an Isopropanol-Butanol Clostridium

Authors: Chen Zhang, Fengxue Xin, Jianzhong He

Abstract:

A unique Clostridium beijerinckii strain, BGS1, was obtained from grassland samples. It is capable of producing 8.43 g/L butanol and 3.21 g/L isopropanol from 60 g/L glucose, while generating 4.68 g/L volatile fatty acids (VFAs) from 30 g/L xylan. The concentration of isopropanol produced by culture BGS1 is ~15% higher than that previously reported for wild-type Clostridium beijerinckii under similar conditions. Compared to traditional Acetone-Butanol-Ethanol (ABE) fermentation species, culture BGS1 generates only negligible amounts of ethanol and acetone, but produces butanol and isopropanol as biosolvent end-products, which are pure alcohols and more economical than ABE. More importantly, culture BGS1 can consume acetone to produce isopropanol, e.g., 1.84 g/L isopropanol from 0.81 g/L acetone in 60 g/L glucose medium containing 6.15 g/L acetone. Analysis of the BGS1 draft genome annotated by the RAST server indicates that the absence of ethanol production is caused by the lack of the pyruvate decarboxylase gene, which is related to ethanol production. In addition, an alcohol dehydrogenase (adhe gene) was found in BGS1, which could be a gene responsible for isopropanol generation. This is the first report on Isopropanol-Butanol (IB) fermentation by a wild-type Clostridium strain and its application for isopropanol and butanol production.

Keywords: acetone conversion, butanol, clostridium, isopropanol

Procedia PDF Downloads 261

160 Highly-Sensitive Nanopore-Based Sensors for Point-Of-Care Medical Diagnostics

Authors: Leyla Esfandiari

Abstract:

Rapid, sensitive detection of nucleic acid (NA) molecules of a specific sequence is of interest for a range of health-related applications, such as screening for genetic diseases, detecting pathogenic microbes in food and water, and identifying biological warfare agents in homeland security. Sequence-specific nucleic acid detection platforms rely on the base-pairing interaction between two complementary single-stranded NAs, which can be detected by optical, mechanical, or electrochemical readout. However, many of the existing platforms require amplification by polymerase chain reaction (PCR), fluorescent or enzymatic labels, and expensive or bulky instrumentation. To address these shortcomings, our research focuses on using cutting-edge nanotechnology and microfluidics, along with resistive pulse electrical measurements, to design and develop a cost-effective, handheld and highly sensitive nanopore-based sensor for point-of-care medical diagnostics.

Keywords: diagnostics, nanopore, nucleic acids, sensor

Procedia PDF Downloads 436

159 A Machine Learning Based Method to Detect System Failure in Resource Constrained Environment

Authors: Payel Datta, Abhishek Das, Abhishek Roychoudhury, Dhiman Chattopadhyay, Tanushyam Chattopadhyay

Abstract:

Machine learning (ML) and deep learning (DL) are predominantly used in image/video processing, natural language processing (NLP), and audio and speech recognition, but much less in system performance evaluation. In this paper, the authors describe the architecture of an abstraction layer constructed using ML/DL to detect system failure. The proposed system detects system failure by evaluating the performance metrics of an IoT service deployment in a constrained infrastructure environment. The system has been tested on a manually annotated dataset containing different system metrics, such as number of threads, throughput, average response time, CPU usage, memory usage, and network input/output, captured in different hardware environments such as edge (Atom-based gateway) and cloud (AWS EC2). The main challenge in developing such a system is that the classification accuracy should be 100%, because errors in the system degrade service performance and consequently affect the reliability and high availability that are mandatory for an IoT system. The proposed ML/DL classifiers achieve 100% accuracy on a dataset of nearly 4,000 samples captured within the organization.

Keywords: machine learning, system performance, performance metrics, IoT, edge

Procedia PDF Downloads 167

158 Towards a Large Scale Deep Semantically Analyzed Corpus for Arabic: Annotation and Evaluation

Authors: S. Alansary, M. Nagi

Abstract:

This paper presents an approach to the semantic annotation of an Arabic corpus using the Universal Networking Language (UNL) framework. UNL is intended as a strategy for providing a large collection of semantically annotated texts with formal, deep semantics rather than shallow semantics. The result would constitute an editable semantic resource (semantic graphs) that integrates various phenomena, including predicate-argument structure, scope, tense, thematic roles and rhetorical relations, into a single semantic formalism for knowledge representation. The paper also presents the Interactive Analysis tool for automatic semantic annotation (IAN). In addition, the cornerstones of the proposed methodology, namely the disambiguation and transformation rules, are presented. Semantic annotation using UNL has been applied to a corpus of 20,000 Arabic sentences representing the most frequent structures in the Arabic Wikipedia. The representation is illustrated at different linguistic levels, starting from the morphological level and passing through the syntactic level until the semantic representation is reached. The output has been evaluated using the F-measure and is 90% accurate. This demonstrates how powerful the formal environment is, as it enables intelligent text processing and search.

Keywords: semantic analysis, semantic annotation, Arabic, universal networking language

Procedia PDF Downloads 561

157 SNR Classification Using Multiple CNNs

Authors: Thinh Ngo, Paul Rad, Brian Kelley

Abstract:

Noise estimation is essential in today's wireless systems for power control, adaptive modulation, interference suppression and quality of service. Deep learning (DL) has already been applied in the physical layer for modulation and signal classification. However, an unacceptably low accuracy of less than 50% is found to undermine the traditional application of DL classification to SNR prediction. In this paper, we use a divide-and-conquer algorithm and a classifier fusion method to simplify SNR classification and thereby enhance DL learning and prediction. Specifically, multiple CNNs are used for classification rather than a single CNN. Each CNN performs a binary classification of a single SNR with two labels: less than, or greater than or equal to. Together, the multiple CNNs are combined to classify effectively over a range of SNR values, −20 ≤ SNR ≤ 32 dB. We use pre-trained CNNs to predict SNR over a wide range of joint channel parameters, including multiple Doppler shifts (0, 60, 120 Hz), power-delay profiles, and signal-modulation types (QPSK, 16QAM, 64QAM). The approach achieves an individual SNR prediction accuracy of 92%, a composite accuracy of 70%, and prediction convergence one order of magnitude faster than that of traditional estimation.

Keywords: classification, CNN, deep learning, prediction, SNR

Procedia PDF Downloads 100
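
The divide-and-conquer idea in the abstract above, where each classifier only answers "is the SNR below, or at or above, my threshold?", can be sketched without any neural network. In the snippet below the CNNs are replaced by trivial stand-in functions, the threshold grid is hypothetical, and the fusion rule (counting the "greater than or equal" votes) is an assumption, since the abstract does not describe how the binary outputs are combined.

```python
from typing import Callable, List, Tuple

# Hypothetical decision thresholds in dB, sorted ascending (not taken from the paper).
THRESHOLDS: List[float] = [-20.0, -10.0, 0.0, 10.0, 20.0, 30.0]


def fuse(votes: List[bool], thresholds: List[float] = THRESHOLDS) -> Tuple[float, float]:
    """Fuse binary votes into an SNR interval [low, high).

    votes[i] is True when classifier i judged the SNR to be >= thresholds[i].
    Counting the True votes, rather than stopping at the first False, still
    yields an interval when a single classifier disagrees with its neighbours.
    """
    k = sum(votes)
    low = thresholds[k - 1] if k > 0 else float("-inf")
    high = thresholds[k] if k < len(thresholds) else float("inf")
    return (low, high)


def make_stub(threshold: float) -> Callable[[float], bool]:
    """Stand-in for a trained binary CNN: compares a known SNR to one threshold."""
    return lambda snr_db: snr_db >= threshold


def classify_snr(snr_db: float, classifiers: List[Callable[[float], bool]]) -> Tuple[float, float]:
    """Run every binary classifier and fuse their votes into one interval."""
    return fuse([clf(snr_db) for clf in classifiers])


classifiers = [make_stub(t) for t in THRESHOLDS]
interval = classify_snr(7.5, classifiers)
```

With real classifiers the votes would come from model inference on received samples; only the fusion step is meant to carry over from this sketch.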

156 Short Text Classification for Saudi Tweets

Authors: Asma A. Alsufyani, Maram A. Alharthi, Maha J. Althobaiti, Manal S. Alharthi, Huda Rizq

Abstract:

Twitter is one of the most popular microblogging sites; it allows users to publish short text messages called 'tweets'. Increasing the number of accounts a user follows (followings) increases the number of tweets on different topics that are displayed, unclassified, in the user's timeline. Therefore, a valuable solution for many Twitter users is to have the tweets in their timeline classified into general categories, saving the user’s time and providing easy and quick access to tweets by topic. In this paper, we developed a classifier for timeline tweets trained on a dataset of 3600 tweets in total, which were collected from Saudi Twitter and annotated manually. We experimented with the well-known Bag-of-Words approach to text classification, and we used support vector machines (SVM) in the training process. The trained classifier performed well on a test dataset, with an average F1-measure of 92.3%. The classifier has been integrated into an application, which demonstrated in practice the classifier’s ability to classify the user's timeline tweets.

Keywords: corpus creation, feature extraction, machine learning, short text classification, social media, support vector machine, Twitter

Procedia PDF Downloads 124

155 Large-Scale Electroencephalogram Biometrics through Contrastive Learning

Authors: Mostafa ‘Neo’ Mohsenvand, Mohammad Rasool Izadi, Pattie Maes

Abstract:

EEG-based biometrics (user identification) has been explored on small datasets of no more than 157 subjects. Here we show that the accuracy of modern supervised methods falls rapidly as the number of users increases to a few thousand. Moreover, supervised methods require a large amount of labeled data for training, which limits their application in real-world scenarios where acquiring training data should not take more than a few minutes. We show that, by using contrastive learning for pre-training, it is possible to maintain high accuracy on a dataset of 2130 subjects while using only a fraction of the labels. We compare 5 different self-supervised tasks for pre-training the encoder; our proposed method achieves an accuracy of 96.4%, improving on the baseline supervised models by 22.75% and on the competing self-supervised model by 3.93%. We also study the effects of signal length and number of channels on the accuracy of the user-identification models. Our results reveal that signals from temporal and frontal channels contain more identifying features than other channels.

Keywords: brainprint, contrastive learning, electroencephalogram, self-supervised learning, user identification

Procedia PDF Downloads 131

154 Pre-Analysis of Printed Circuit Boards Based on Multispectral Imaging for Vision Based Recognition of Electronics Waste

Authors: Florian Kleber, Martin Kampel

Abstract:

The increasing demand for gallium, indium and rare-earth elements for the production of electronics, e.g. solid-state lighting, photovoltaics, integrated circuits, and liquid crystal displays, will exceed the worldwide supply according to current forecasts. Recycling systems to reclaim these materials are not yet in place, which challenges the sustainability of these technologies. This paper proposes a multispectral imaging system as a basis for a vision-based recognition system for valuable components of electronics waste. Multispectral images are intended to enhance the contrast of images of printed circuit boards (single components, as well as labels) for further analysis, such as optical character recognition and entire printed circuit board recognition. The results show that a higher contrast is achieved in the near infrared than in ultraviolet and visible light.

Keywords: electronics waste, multispectral imaging, printed circuit boards, rare-earth elements

Procedia PDF Downloads 390

153 The Effect of Fast Food Globalisation on Students’ Food Choice

Authors: Ijeoma Chinyere Ukonu

Abstract:

This research investigates how the globalisation of fast food has affected students’ food choice. A mixed-method approach was used, combining quantitative and qualitative methods. The quantitative method used a self-completion questionnaire to randomly sample one hundred and four students, while the qualitative method used a semi-structured interview technique to survey four students on their knowledge of fast food and their choice to consume it. A cross tabulation of variables and the Kruskal-Wallis nonparametric test were used to analyse the quantitative data, while the qualitative data were analysed by deducing themes and trends from the interview transcripts. The findings revealed that globalisation has amplified the evolution of fast food, popularising it among students. Its global presence has affected students’ food choice and preference. Price, convenience, taste, and peer influence are some of the major factors affecting students’ choice of fast food. Although students are familiar with the health effects of fast food and the significance of using food information labels to make healthy choices, they prefer fast food to homemade food.

Keywords: fast food, food choice, globalisation, students

Procedia PDF Downloads 266
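
For readers unfamiliar with the Kruskal-Wallis test named in the abstract above, the following minimal sketch computes its H statistic from scratch, using average ranks for tied values and the standard tie correction. The sample scores are invented for illustration and are not data from the study. A p-value would additionally require comparing H against a chi-squared distribution with (number of groups − 1) degrees of freedom, which is omitted here.

```python
from collections import Counter


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction.

    Each argument is a sequence of numeric observations for one group.
    """
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    counts = Counter(pooled)

    # Assign each distinct value the mean of the ranks it occupies (1-based).
    rank_of = {}
    start = 1
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = start + (t - 1) / 2
        start += t

    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical questionnaire scores for three groups of students.
h_stat = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```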

152 Combined Proteomic and Metabolomic Analysis Approaches to Investigate the Modification in the Proteome and Metabolome of in vitro Models Treated with Gold Nanoparticles (AuNPs)

Authors: H. Chassaigne, S. Gioria, J. Lobo Vicente, D. Carpi, P. Barboro, G. Tomasi, A. Kinsner-Ovaskainen, F. Rossi

Abstract:

Emerging approaches in the area of exposure to nanomaterials and assessment of human health effects combine the use of in vitro systems and analytical techniques to study the perturbation of the proteome and/or the metabolome. We investigated the modification in the cytoplasmic compartment of the Balb/3T3 cell line exposed to gold nanoparticles. On the one hand, the proteomic approach is quite standardized, even if it requires precautions when dealing with in vitro systems. On the other hand, metabolomic analysis is challenging because of the chemical diversity of cellular metabolites, which complicates data elaboration and interpretation. Differentially expressed proteins were found to cover a range of functions including stress response, cell metabolism, cell growth and cytoskeleton organization. In addition, de-regulated metabolites were annotated using the HMDB database. The "omics" fields hold great promise for investigating the interaction of nanoparticles with biological systems. Combining proteomics and metabolomics data is possible but challenging.

Keywords: data processing, gold nanoparticles, in vitro systems, metabolomics, proteomics

Procedia PDF Downloads 477

151 Accuracy Improvement of Traffic Participant Classification Using Millimeter-Wave Radar by Leveraging Simulator Based on Domain Adaptation

Authors: Tokihiko Akita, Seiichi Mita

Abstract:

Millimeter-wave radar is the sensor most robust against adverse environments, making it an essential environment recognition sensor for automated driving. However, the reflection signal is sparse and unstable, so it is difficult to obtain high recognition accuracy. Deep learning provides high recognition accuracy even for such signals, but requires large-scale datasets with ground truth. In particular, annotating millimeter-wave radar data is costly. As a solution, using a simulator that can generate a huge annotated dataset is effective. However, radar simulation is more difficult to match with real-world data than camera image simulation, and recognition by deep learning with higher-order features using the simulator causes further deviation. We have attempted to improve the accuracy of traffic participant classification by fusing simulator and real-world data with a domain adaptation technique. Experimental results with the domain adaptation network we created show that classification accuracy can be improved even with little real-world data.

Keywords: millimeter-wave radar, object classification, deep learning, simulation, domain adaptation

Procedia PDF Downloads 61

150 Network Based Molecular Profiling of Intracranial Ependymoma over Spinal Ependymoma

Authors: Hyeon Su Kim, Sungjin Park, Hae Ryung Chang, Hae Rim Jung, Young Zoo Ahn, Yon Hui Kim, Seungyoon Nam

Abstract:

Ependymoma, one of the most common parenchymal spinal cord tumors, represents 3-6% of all CNS tumors. In particular, intracranial ependymomas, which are more frequent in childhood, have a poorer prognosis and are more malignant than spinal ependymomas. Although there is a growing need to understand the pathogenesis, a detailed molecular understanding of it remains to be explored. A cancer cell is composed of complex signaling pathway networks, and identifying interactions between genes and/or proteins is crucial for understanding these pathways. Therefore, we explored each ependymoma in terms of differentially expressed genes and signaling networks. We used Microsoft Excel™ to manipulate microarray data gathered from NCBI’s GEO Database. To analyze and visualize signaling networks, we used the web-based PATHOME algorithm and Cytoscape. We show that the HOX family and NEFL are down-regulated but the SCL family is up-regulated in cerebrum and posterior fossa cancers relative to spinal cancer, and that the JAK/STAT signaling pathway and the chemokine signaling pathway are significantly different in both intracranial ependymomas compared to spinal ependymoma. We consider that there may be an age-dependent mechanism underlying the different histological pathogenesis. We subsequently annotated the mutation data of each gene in order to find potential target genes.

Keywords: systems biology, ependymoma, deg, network analysis

Procedia PDF Downloads 272

149 VIAN-DH: Computational Multimodal Conversation Analysis Software and Infrastructure

Authors: Teodora Vukovic, Christoph Hottiger, Noah Bubenhofer

Abstract:

The development of VIAN-DH aims at bridging two linguistic approaches: conversation analysis/interactional linguistics (IL), so far a predominantly qualitative field, and computational/corpus linguistics and its quantitative and automated methods. Contemporary IL investigates the systematic organization of conversations and interactions composed of speech, gaze, gestures, and body positioning, among others. This highly integrated multimodal behaviour is analysed on the basis of video data, with the aim of uncovering so-called “multimodal gestalts”, patterns of linguistic and embodied conduct that reoccur in specific sequential positions and are employed for specific purposes. Multimodal analyses (and other disciplines using videos) are so far dependent on time- and resource-intensive processes of manual transcription of each component from video materials. Automating these tasks requires advanced programming skills, which are often outside the scope of IL. Moreover, the use of different tools makes the integration and analysis of different formats challenging. Consequently, IL research often deals with relatively small samples of annotated data, which are suitable for qualitative analysis but not sufficient for making generalized empirical claims derived quantitatively. VIAN-DH aims to create a workspace where the many annotation layers required for the multimodal analysis of videos can be created, processed, and correlated in one platform. VIAN-DH will provide a graphical interface that operates state-of-the-art tools for automating parts of the data processing. The integration of tools that already exist in computational linguistics and computer vision facilitates data processing for researchers lacking programming skills, speeds up the overall research process, and enables the processing of large amounts of data.
The main features to be introduced are automatic speech recognition for the transcription of language, automatic image recognition for the extraction of gestures and other visual cues, and grammatical annotation for adding morphological and syntactic information to the verbal content. In the ongoing instance of VIAN-DH, we focus on gesture extraction (pointing gestures, in particular), making use of existing models created for sign language and adapting them for this specific purpose. In order to view and search the data, VIAN-DH will provide a unified format and enable the import of the main existing formats of annotated video data and the export to other formats used in the field, while integrating different data source formats in such a way that they can be combined in research. VIAN-DH will adapt querying methods from corpus linguistics to enable parallel search of many annotation levels, combining token-level and chronological search for various types of data. VIAN-DH strives to bring crucial and potentially revolutionary innovation to the field of IL (which can also extend to other fields using video materials). It will allow the automatic processing of large amounts of data and the implementation of quantitative analyses, combined with the qualitative approach. It will facilitate the investigation of correlations between linguistic patterns (lexical or grammatical) and conversational aspects (turn-taking or gestures). Users will be able to automatically transcribe and annotate visual, spoken and grammatical information from videos, to correlate those different levels, and to perform queries and analyses.

Keywords: multimodal analysis, corpus linguistics, computational linguistics, image recognition, speech recognition

Procedia PDF Downloads 73

148 The Vocality of Sibyl Sanderson in Massenet’s Manon and Esclarmonde: Musical Training and Critical Response

Authors: Tamara Thompson

Abstract:

This presentation will address the vocality of American soprano Sibyl Sanderson (1865–1903) in Massenet’s Manon and Esclarmonde as discernible from documentary sources such as vocal treatises, annotated scores, and correspondence. These sources will then be compared and contrasted with Sanderson’s reception in the French press. In 1888 Sanderson sang Manon, which Massenet revised for her. She then created the role of Esclarmonde for the 1889 Exposition Universelle in Paris. The soprano appeared as the Byzantine Empress more than 100 times in the nine months following the premiere, which secured her fame and an international operatic career fraught with controversy and criticism as well as adulation. Before her débuts as Manon and Esclarmonde, Sanderson received musical training in California and Paris from multiple teachers with varied and opposing methods. The presentation will explore the ways in which disparate pedagogic influences, such as those of Giovanni Sbriglia and Jean de Reszké, may have guided Sanderson’s vocal strategies and possibly caused or promoted the severe vocal pathologies she battled in subsequent years. In addition, the vocal writing and the revisions made to the titular roles for Sanderson will be examined in order to assess how these factors may have affected her technique and vocal health.

Keywords: French, nineteenth-century, opera, pedagogy, vocality

Procedia PDF Downloads 255

147 Local Boundary Analysis for Generative Theory of Tonal Music: From the Aspect of Classic Music Melody Analysis

Authors: Po-Chun Wang, Yan-Ru Lai, Sophia I. C. Lin, Alvin W. Y. Su

Abstract:

The Generative Theory of Tonal Music (GTTM) provides systematic approaches to recognizing local boundaries in music. Its rules have been implemented in some automated melody segmentation algorithms. In addition, deep learning methods with GTTM features have been applied to boundary detection tasks. However, these studies might face constraints such as missing or inconsistent label data. The GTTM database, which includes manually labeled GTTM rules and local boundaries, is currently the most widely used database of its kind. Even so, we found some problems with these labels. They are sometimes inconsistent with the GTTM rules. In addition, since the database was labeled at different times by multiple musicians, the labels are not within the same scope in some cases. Therefore, in this paper, we examine this database with musicians from the perspective of classical music and relabel the scores. The relabeled database, GTTM Database v2.0, will be released for academic research use. Although the experimental and statistical results show that the relabeled database is more consistent, the improvement in boundary detection is not substantial. It seems that future boundary detection will need more clues than the GTTM rules alone.

Keywords: dataset, GTTM, local boundary, neural network

Procedia PDF Downloads 104

146 The Advancements of Transformer Models in Part-of-Speech Tagging System for Low-Resource Tigrinya Language

Authors: Shamm Kidane, Ibrahim Abdella, Fitsum Gaim, Simon Mulugeta, Sirak Asmerom, Natnael Ambasager, Yoel Ghebrihiwot

Abstract:

The call for natural language processing (NLP) systems for low-resource languages has become more apparent than ever in the past few years, while arduous challenges remain in preparing such systems. This paper presents an improved version of the Nagaoka Tigrinya Corpus dataset for a Parts-of-Speech (POS) classification system for the Tigrinya language. The size of the initial Nagaoka dataset was increased, bringing the new tagged corpus to 118K tokens annotated with the 12 basic POS tags used previously. The additional content was annotated manually in a stringent manner, following rules similar to those of the former dataset, and was formatted in CoNLL format. The system makes use of a novel approach in NLP tasks: the monolingually pre-trained TiELECTRA, TiBERT and TiRoBERTa transformer models. The highest score achieved is a weighted F1-score of 94.2%, which surpasses the previous systems by a significant margin. The system will prove useful for progress in NLP-related tasks for Tigrinya and similar low-resource languages, with room for cross-referencing higher-resource languages.

Keywords: Tigrinya POS corpus, TiBERT, TiRoBERTa, conditional random fields

Procedia PDF Downloads 57
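
The weighted F1-score reported in the abstract above averages per-tag F1 scores, weighting each tag by how often it occurs in the gold annotations. The sketch below shows that computation in plain Python; the tag sequences are made up for illustration and are unrelated to the Tigrinya corpus.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Per-class F1 is computed as 2*TP / (2*TP + FP + FN); a class with no true
    positives, false positives, or false negatives contributes 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted tag was wrong
            fn[gold] += 1  # gold tag was missed

    total = 0.0
    for tag, count in support.items():
        denom = 2 * tp[tag] + fp[tag] + fn[tag]
        f1 = 2 * tp[tag] / denom if denom else 0.0
        total += count * f1
    return total / len(y_true)


# Hypothetical gold and predicted POS tags for four tokens.
gold_tags = ["N", "N", "V", "ADJ"]
pred_tags = ["N", "V", "V", "ADJ"]
score = weighted_f1(gold_tags, pred_tags)
```

Here the tags N and V each reach an F1 of 2/3 and ADJ reaches 1, so the support-weighted average is 0.75.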

145 SAMRA: Dataset in Al-Soudani Arabic Maghrebi Script for Recognition of Arabic Ancient Words Handwritten

Authors: Sidi Ahmed Maouloud, Cheikh Ba

Abstract:

Much of West Africa’s cultural heritage is written in the Al-Soudani Arabic script, which was widely used in West Africa before the time of European colonization. This Al-Soudani Arabic script is an African version of the Maghrebi script, in particular, the Al-Mebssout script. However, the local African qualities were incorporated into the Al-Soudani script in a way that gave it a unique African diversity and character. Despite the existence of several Arabic datasets in Oriental script, allowing for the analysis, layout, and recognition of texts written in these calligraphies, many Arabic scripts and written traditions remain understudied. In this paper, we present a dataset of words from Al-Soudani calligraphy scripts. This dataset consists of 100 images selected from three different manuscripts written in Al-Soudani Arabic script by different copyists. The primary source for this database was the libraries of Boston University and Cambridge University. This dataset highlights the unique characteristics of the Al-Soudani Arabic script as well as the new challenges it presents in terms of automatic word recognition of Arabic manuscripts. An HTR system based on a hybrid ANN (CRNN-CTC) is also proposed to test this dataset. SAMRA is a dataset of annotated Arabic manuscript words in the Al-Soudani script that can help researchers automatically recognize and analyze manuscript words written in this script.

Keywords: dataset, CRNN-CTC, handwritten words recognition, Al-Soudani Arabic script, HTR, manuscripts

Procedia PDF Downloads 74

144 Image Classification with Localization Using Convolutional Neural Networks

Authors: Bhuyain Mobarok Hossain

Abstract:

Image classification and localization is currently an important research area in the field of computer vision. The evolution and advancement of deep learning and convolutional neural networks (CNNs) have greatly improved the capabilities of object detection and image-based classification. Target detection is an important research topic in computer vision, especially in video surveillance systems. To solve this problem, we apply a convolutional neural network at multiple scales and at multiple locations in the image using a sliding window. Most translation networks move away from the bounding box around the area of interest. In contrast to this architecture, we consider the problem to be a classification problem where each pixel of the image is a separate section. Image classification is the task of predicting a category for a given set of data points. It is one kind of classification problem, in which a label is assigned to the image as a whole. For example, an image can be classified as a day or night shot; likewise, images of cars and motorbikes can be automatically placed in their respective collections. Deep learning models for image classification generally include convolutional layers; such a model is referred to as a convolutional neural network (CNN).
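The multi-scale sliding-window scan mentioned in this abstract can be sketched as follows. This is only an illustration of how candidate windows might be enumerated before each crop is passed to a classifier; the function names, window sizes and stride fraction are assumptions, not values taken from the paper.

```python
from typing import Iterator, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(img_w: int, img_h: int, win: int, stride: int) -> Iterator[Window]:
    """Yield square windows of side `win` that fit fully inside the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, win, win)


def multiscale_windows(
    img_w: int,
    img_h: int,
    sizes: Sequence[int] = (32, 64),   # hypothetical window sizes
    stride_frac: float = 0.5,          # hypothetical overlap: stride = half a window
) -> Iterator[Window]:
    """Enumerate windows at several scales; each would be cropped and classified."""
    for win in sizes:
        stride = max(1, int(win * stride_frac))
        yield from sliding_windows(img_w, img_h, win, stride)


if __name__ == "__main__":
    boxes = list(multiscale_windows(128, 96))
    print(len(boxes), boxes[0], boxes[-1])
```

In a full pipeline, each window would be cropped from the image, resized to the network's input size and scored by the CNN; that step is omitted here.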

Keywords: image classification, object detection, localization, particle filter

Procedia PDF Downloads 266

143 Association of Non Synonymous SNP in DC-SIGN Receptor Gene with Tuberculosis (Tb)

Authors: Saima Suleman, Kalsoom Sughra, Naeem Mahmood Ashraf

Abstract:

Tuberculosis, caused by Mycobacterium tuberculosis, is a communicable chronic illness. The disease receives considerable attention from researchers, as it is present in approximately one third of the world population in either active or latent form. The genetic makeup of a person plays an important part in producing immunity against disease, and one important associated factor is single nucleotide polymorphism (SNP) in relevant genes. In this study, we examine the association between single nucleotide polymorphisms of the CD-209 gene (which encodes the DC-SIGN receptor) and tuberculosis patients. Both dry lab (in silico) and wet lab (RFLP) analyses were carried out. The GWAS catalogue and the GEO database were searched for previous association data. No association study related to CD-209 nsSNPs was found, but the role of CD-209 in pulmonary tuberculosis has been addressed in the GEO database. Therefore, CD-209 was selected for this study. Databases such as Ensembl and the 1000 Genomes Project were used to retrieve SNP data in the form of a VCF file, which was then submitted to different software tools to sort the SNPs into benign and deleterious. Selected SNPs were further annotated with 3-D modeling techniques using the I-TASSER online software. Furthermore, selected nsSNPs were checked in the Gujrat and Faisalabad populations through RFLP analysis. In this study population, two SNPs were found to be associated with tuberculosis, while one nsSNP was not found to be associated with the disease.
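The idea behind the RFLP step can be illustrated with a toy digest: a SNP that destroys (or creates) a restriction site changes the fragment lengths seen after digestion. The abstract does not name the enzyme or the sequences used, so the EcoRI site (G^AATTC) and both sequences below are purely hypothetical stand-ins.

```python
from typing import List


def digest(seq: str, site: str, cut_offset: int) -> List[int]:
    """Return fragment lengths after cutting `seq` at every occurrence of `site`.

    `cut_offset` is the position inside the recognition site where the cut falls.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical example: EcoRI recognises GAATTC and cuts after the first G.
ECORI_SITE = "GAATTC"
ECORI_CUT = 1

reference = "ATCGGAATTCGTTA"  # toy sequence containing the site
variant = "ATCGGAGTTCGTTA"    # toy sequence where a SNP (A>G) removes the site

if __name__ == "__main__":
    print(digest(reference, ECORI_SITE, ECORI_CUT))  # two fragments
    print(digest(variant, ECORI_SITE, ECORI_CUT))    # one uncut fragment
```

On a gel, the two alleles would therefore show different banding patterns, which is what allows genotyping of the sampled individuals.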

Keywords: association, CD209, DC-SIGN, tuberculosis

Procedia PDF Downloads 282

142 Students Reading and Viewing the American Novel in a University EFL/ESL Context: A Picture of Real Life

Authors: Nola Nahla Bacha

Abstract:

Research has indicated that ESL/EFL students (nonnative students of English) have difficulty reading at the university level, as the required readings are often long texts in which both cultural and linguistic factors impede their understanding and thus their motivation. This is especially the case in literature courses. It is the author’s view that if readings are selected according to the students’ interests and linguistic level, related to life situations, and coupled with film study, students will not only be motivated to read but will also find reading interesting and exciting. They will view novels, and thus literature, as a picture of life. Students will also widen their vocabulary repertoire and overcome many of their linguistic problems. This study describes the procedure used in a 20th Century American Novel class at one English medium university in Lebanon and explores students’ views on the novels assigned and their recommendations. Findings indicate that students significantly like to read novels, contrary to what some faculty claim, and view the inclusion of novels as helping them expand their vocabulary repertoire and learn about real life, which helps them linguistically, pedagogically, and above all personally during their life in and out of the university. Annotated texts, pictures and film will be used through technological aids to show how the class was conducted and how the students interacted with the novels assigned. Implications for teaching reading in the classroom are made.

Keywords: language, literature, novels, reading, university teaching

Procedia PDF Downloads 358

141 Autogenous Diabetic Retinopathy Censor for Ophthalmologists - AKSHI

Authors: Asiri Wijesinghe, N. D. Kodikara, Damitha Sandaruwan

Abstract:

Diabetic Retinopathy (DR) is a rapidly growing concern around the world. It is characterized by impaired glucose metabolism that causes long-term damage to the human retina, and it is one of the primary causes of visual impairment and blindness in adults. Information on retinal pathological changes can be recognized using ocular fundus images. In this research, we focus on developing an automated diagnosis system to detect DR anomalies, covering severity level classification of DR patients (Non-proliferative Diabetic Retinopathy approach) and vessel tortuosity measurement of untwisted vessels to assess vessel anomalies (Proliferative Diabetic Retinopathy approach). The severity classification method obtained good results in terms of precision, recall, F-measure and accuracy (exceeding 94%) in all formats of cross validation. The ROC (Receiver Operating Characteristic) curves also showed a high AUC (Area Under Curve) percentage (exceeding 95%). User-level evaluation of severity capturing obtained a high accuracy (85%) and fairly good values for each evaluation measure. Untwisted vessel detection for tortuosity measurement also produced good results with respect to sensitivity (85%), specificity (89%) and accuracy (87%).
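For readers less familiar with the evaluation measures quoted in this abstract, the sketch below shows how precision, recall (sensitivity), specificity, accuracy and F-measure follow from a binary confusion matrix. The counts are invented for illustration and are not taken from the study.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard evaluation measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    for name, value in binary_metrics(tp=40, fp=5, tn=45, fn=10).items():
        print(f"{name}: {value:.3f}")
```

For a multi-class severity scale, these measures would typically be computed per class and then averaged; the abstract does not say which averaging scheme was used.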

Keywords: fundus image, exudates, microaneurysms, hemorrhages, tortuosity, diabetic retinopathy, optic disc, fovea

Procedia PDF Downloads 301

140 Deep Learning Approach to Trademark Design Code Identification

Authors: Girish J. Showkatramani, Arthi M. Krishna, Sashi Nareddi, Naresh Nula, Aaron Pepe, Glen Brown, Greg Gabel, Chris Doninger

Abstract:

Trademark examination and approval is a complex process that involves analysis and review of the design components of the marks, such as the visual representation, as well as the textual data associated with the marks, such as their descriptions. Currently, the process of identifying marks with similar visual representation is done manually at the United States Patent and Trademark Office (USPTO) and takes a considerable amount of time. Moreover, the accuracy of these searches depends heavily on the experts determining the trademark design codes used to catalog the visual design codes in the mark. In this study, we explore several methods to automate trademark design code classification. Based on recent successes of convolutional neural networks in image classification, we have used several different convolutional neural networks such as Google’s Inception v3, Inception-ResNet-v2, and Xception net. The study also looks into other techniques to augment the results from CNNs, such as using the Open Source Computer Vision Library (OpenCV) to pre-process the images. This paper reports the results of the various models trained on year of annotated trademark images.

Keywords: trademark design code, convolutional neural networks, trademark image classification, trademark image search, Inception-ResNet-v2

Procedia PDF Downloads 198

139 Hyperspectral Data Classification Algorithm Based on the Deep Belief and Self-Organizing Neural Network

Authors: Li Qingjian, Li Ke, He Chun, Huang Yong

Abstract:

In this paper, a method combining the Pohl Seidman's deep belief network with a self-organizing neural network is proposed to classify the target. This method is mainly aimed at the high nonlinearity of hyperspectral images, the high sample dimension, and the difficulty of designing the classifier. The main features of the original data are extracted by the deep belief network. During feature extraction, samples with known labels are added to fine-tune the network, enriching the main features. Then, the extracted feature vectors are classified by the self-organizing neural network. This method effectively reduces the spectral dimensionality of the data while preserving most of the raw data information, addresses the long training time of traditional clustering and the difficulty of training deep learning algorithms when labeled samples are scarce, and improves classification accuracy and robustness. Simulation results show that the proposed network structure achieves a higher classification precision when only a small number of labeled samples are known.
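The second stage described in this abstract, a self-organizing map (SOM) applied to extracted feature vectors, can be sketched in a few lines. This is a generic one-dimensional SOM under assumed settings (number of units, learning rate, neighbourhood width, linear decay), and the toy vectors merely stand in for the deep-belief-network features; none of it is taken from the paper.

```python
import math
import random
from typing import List

Vector = List[float]


def best_matching_unit(weights: List[Vector], x: Vector) -> int:
    """Index of the unit whose weight vector is closest to `x` (squared Euclidean)."""
    dists = [sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_som(
    data: List[Vector],
    n_units: int = 4,      # hypothetical map size (units on a 1-D line)
    epochs: int = 50,
    lr0: float = 0.5,      # hypothetical initial learning rate
    sigma0: float = 1.0,   # hypothetical initial neighbourhood width
    seed: int = 0,
) -> List[Vector]:
    """Train a 1-D SOM with linearly decaying learning rate and neighbourhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 1e-3)
        for x in data:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood: units near the winner move more.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


if __name__ == "__main__":
    # Toy 2-D "features" forming two loose groups.
    features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
    som = train_som(features)
    print([best_matching_unit(som, f) for f in features])
```

After training, each input is assigned to its best-matching unit, and units can then be mapped to classes using the small set of labeled samples.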

Keywords: DBN, SOM, pattern classification, hyperspectral, data compression

Procedia PDF Downloads 311

138 Fuzzy-Machine Learning Models for the Prediction of Fire Outbreak: A Comparative Analysis

Authors: Uduak Umoh, Imo Eyoh, Emmauel Nyoho

Abstract:

This paper compares fuzzy-machine learning algorithms, namely Support Vector Machine (SVM) and K-Nearest Neighbor (KNN), for predicting cases of fire outbreak. The paper uses a fire outbreak dataset with three features (Temperature, Smoke, and Flame). The data is pre-processed using the Interval Type-2 Fuzzy Logic (IT2FL) algorithm, Min-Max Normalization, and Principal Component Analysis (PCA), which are used to predict feature labels in the dataset, normalize the dataset, and select relevant features, respectively. The output of the pre-processing is a dataset with two principal components (PC1 and PC2). The pre-processed dataset is then used to train the aforementioned machine learning models. The K-fold (with K=10) cross-validation method is used to evaluate the performance of the models using the following metrics: ROC (Receiver Operating Characteristic), Specificity, and Sensitivity. The models are also tested with 20% of the dataset. The validation result shows that KNN is the better model for fire outbreak detection with an ROC value of 0.99878, followed by SVM with an ROC value of 0.99753.
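Part of the pipeline in this abstract (min-max normalization, a KNN classifier and 10-fold cross-validation) can be sketched in plain Python. The fuzzy-logic labeling and PCA steps are omitted, accuracy is used in place of ROC for brevity, and the data, the value of k and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every feature column to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def knn_predict(train_x, train_y, x, k: int = 3):
    """Majority vote among the k training points nearest to `x`."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def k_fold_accuracy(xs, ys, folds: int = 10, k: int = 3) -> float:
    """Average accuracy over `folds` folds (fold f holds indices f, f+folds, ...)."""
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(xs), folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        for i in test_idx:
            correct += knn_predict(train_x, train_y, xs[i], k) == ys[i]
    return correct / len(xs)


if __name__ == "__main__":
    # Hypothetical readings: [temperature, smoke, flame]; label 1 means fire.
    rows = [[20 + i, 5 + i, 0] for i in range(10)] + [[70 + i, 60 + i, 1] for i in range(10)]
    labels = [0] * 10 + [1] * 10
    print(k_fold_accuracy(min_max_normalize(rows), labels))
```

For simplicity the normalization here uses the whole dataset; a stricter evaluation would fit the minimum and maximum on each training fold only.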

Keywords: Machine Learning Algorithms, Interval Type-2 Fuzzy Logic, Fire Outbreak, Support Vector Machine, K-Nearest Neighbour, Principal Component Analysis

Procedia PDF Downloads 136

137 Number Variation of the Personal Pronoun we Used by Chinese English Learners

Authors: Qiong Hu, Ming Yue

Abstract:

Language variation signals the newest usage of a language community, which might become the developmental trend of that language. However, language textbooks cannot keep up with these emergent usages. Most Chinese English learners nowadays are still exposed to the traditional grammar prescribed in textbooks, so some variational usages cannot be acquired. The personal pronoun we is prescribed as a plural pronoun in textbook grammar, but its number value is more flexible in actual use. Based on the Chinese Learner English Corpus (CLEC), and with the homemade Friends corpus as reference, the present research explores the number value of the first person pronoun we as used by Chinese English learners. Taking the subjectivity of we into consideration, this paper annotates the number value of all instances of we in “we + PCU (perception-cognition-utterance) verbs” collocations. Results show that although learners are exposed to traditional textbooks which prescribe the plural reference of we, some unconventional usage (singular or vague in reference) still exists in the writings of Chinese English learners, though it is less frequent than in native speech. Corpus data and results from manual semantic annotation show that this could be due to the impact of formulaic sequences on the learners and positive transfer from their native language. An improved SLA model of native language, target language and interlanguage is put forward to recognize the existence of variation in second language acquisition, which should be given more attention during teaching.

Keywords: Chinese English learners, number, PCU verbs, Personal pronoun we

Procedia PDF Downloads 328

136 Building a Dynamic News Category Network for News Sources Recommendations

Authors: Swati Gupta, Shagun Sodhani, Dhaval Patel, Biplab Banerjee

Abstract:

It is common for news sources to publish news in different broad categories. These categories can be generic, such as Business or Sports, time-specific, such as World Cup 2015 or Nepal Earthquake, or both. It is up to the news agencies to build the categories. Extracting news categories automatically from numerous online news sources is expected to be helpful in many applications, including news source recommendation and time-specific news category extraction. To address this issue, existing systems such as the DMOZ directory and the Yahoo directory are usually considered, though they are mostly human annotated and do not account for how the categories of news websites change over time. As a remedy, we propose in this paper an approach to automatically extract news category URLs from news websites. A news category URL is a link which points to a category in a news website. We use the news category URLs as prior knowledge to develop a news source recommendation system which lists news sources in various categories in ranked order. In addition, we propose an approach to rank numerous news sources in different categories using various parameters such as Traffic Based Website Importance, Social Media Analysis and Category Wise Article Freshness. Experimental results on category URLs captured from the GDELT project between April 2016 and December 2016 show the adequacy of the proposed method.
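One simple way to combine the three ranking parameters named in this abstract is a weighted sum of per-source scores. The abstract does not state how the parameters are actually combined, so the equal weights, the assumption that scores are already scaled to [0, 1], and the source names below are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Scores per source: (traffic importance, social media score, article freshness),
# each assumed to be pre-scaled to the range [0, 1].
Scores = Tuple[float, float, float]


def rank_sources(
    scores: Dict[str, Scores],
    weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),  # hypothetical equal weights
) -> List[Tuple[str, float]]:
    """Return (source, combined score) pairs sorted from best to worst."""
    combined = {
        name: sum(w * s for w, s in zip(weights, vals))
        for name, vals in scores.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented sources and scores for a single category, e.g. "Sports".
    sports = {
        "source-a": (0.9, 0.2, 0.5),
        "source-b": (0.4, 0.8, 0.9),
        "source-c": (0.1, 0.1, 0.2),
    }
    for name, score in rank_sources(sports):
        print(f"{name}: {score:.3f}")
```

Running the ranking once per extracted category would yield the per-category ordered lists that a recommendation system of this kind presents.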

Keywords: news category, category network, news sources, ranking

Procedia PDF Downloads 358