Search results for: POS12 Tagging for Urdu
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 27

Search results for: POS12 Tagging for Urdu

27 AGHAZ : An Expert System Based approach for the Translation of English to Urdu

Authors: Uzair Muhammad, Kashif Bilal, Atif Khan, M. Nasir Khan

Abstract:

Machine Translation (MT 3) of English text to its Urdu equivalent is a difficult challenge. Lot of attempts has been made, but a few limited solutions are provided till now. We present a direct approach, using an expert system to translate English text into its equivalent Urdu, using The Unicode Standard, Version 4.0 (ISBN 0-321-18578-1) Range: 0600–06FF. The expert system works with a knowledge base that contains grammatical patterns of English and Urdu, as well as a tense and gender-aware dictionary of Urdu words (with their English equivalents).

Keywords: Machine Translation, Multiword Expressions, Urdulanguage processing, POS12 Tagging for Urdu, Expert Systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2323
26 Segmentation Free Nastalique Urdu OCR

Authors: Sobia T. Javed, Sarmad Hussain, Ameera Maqbool, Samia Asloob, Sehrish Jamil, Huma Moin

Abstract:

The electronically available Urdu data is in image form which is very difficult to process. Printed Urdu data is the root cause of problem. So for the rapid progress of Urdu language we need an OCR systems, which can help us to make Urdu data available for the common person. Research has been carried out for years to automata Arabic and Urdu script. But the biggest hurdle in the development of Urdu OCR is the challenge to recognize Nastalique Script which is taken as standard for writing Urdu language. Nastalique script is written diagonally with no fixed baseline which makes the script somewhat complex. Overlap is present not only in characters but in the ligatures as well. This paper proposes a method which allows successful recognition of Nastalique Script.

Keywords: HMM, Image processing, Optical CharacterRecognition, Urdu OCR.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2132
25 Urdu Nastaleeq Optical Character Recognition

Authors: Zaheer Ahmad, Jehanzeb Khan Orakzai, Inam Shamsher, Awais Adnan

Abstract:

This paper discusses the Urdu script characteristics, Urdu Nastaleeq and a simple but a novel and robust technique to recognize the printed Urdu script without a lexicon. Urdu being a family of Arabic script is cursive and complex script in its nature, the main complexity of Urdu compound/connected text is not its connections but the forms/shapes the characters change when it is placed at initial, middle or at the end of a word. The characters recognition technique presented here is using the inherited complexity of Urdu script to solve the problem. A word is scanned and analyzed for the level of its complexity, the point where the level of complexity changes is marked for a character, segmented and feeded to Neural Networks. A prototype of the system has been tested on Urdu text and currently achieves 93.4% accuracy on the average.

Keywords: Cursive Script, OCR, Urdu.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2747
24 Extracting Multiword Expressions in Machine Translation from English to Urdu using Relational Data Approach

Authors: Kashif Bilal, Uzair Muhammad, Atif Khan, M. Nasir Khan

Abstract:

Machine Translation, (hereafter in this document referred to as the "MT") faces a lot of complex problems from its origination. Extracting multiword expressions is also one of the complex problems in MT. Finding multiword expressions during translating a sentence from English into Urdu, through existing solutions, takes a lot of time and occupies system resources. We have designed a simple relational data approach, in which we simply set a bit in dictionary (database) for multiword, to find and handle multiword expression. This approach handles multiword efficiently.

Keywords: Machine Translation, Multiword Expressions, Urdulanguage processing, POS (stands for Parts of Speech) Tagging forUrdu, Expert Systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2327
23 FCA-based Conceptual Knowledge Discovery in Folksonomy

Authors: Yu-Kyung Kang, Suk-Hyung Hwang, Kyoung-Mo Yang

Abstract:

The tagging data of (users, tags and resources) constitutes a folksonomy that is the user-driven and bottom-up approach to organizing and classifying information on the Web. Tagging data stored in the folksonomy include a lot of very useful information and knowledge. However, appropriate approach for analyzing tagging data and discovering hidden knowledge from them still remains one of the main problems on the folksonomy mining researches. In this paper, we have proposed a folksonomy data mining approach based on FCA for discovering hidden knowledge easily from folksonomy. Also we have demonstrated how our proposed approach can be applied in the collaborative tagging system through our experiment. Our proposed approach can be applied to some interesting areas such as social network analysis, semantic web mining and so on.

Keywords: Folksonomy data mining, formal concept analysis, collaborative tagging, conceptual knowledge discovery, classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1999
22 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using Hidden Markov Model and Viterbi algorithm. From the Nepali text, annotated corpus training and testing data set are randomly separated. Both methods are employed on the data sets. Viterbi algorithm is found to be computationally faster and accurate as compared to HMM. The accuracy of 95.43% is achieved using Viterbi algorithm. Error analysis where the mismatches took place is elaborately discussed.

Keywords: Hidden Markov model, Viterbi algorithm, POS tagging, natural language processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1673
21 OCR For Printed Urdu Script Using Feed Forward Neural Network

Authors: Inam Shamsher, Zaheer Ahmad, Jehanzeb Khan Orakzai, Awais Adnan

Abstract:

This paper deals with an Optical Character Recognition system for printed Urdu, a popular Pakistani/Indian script and is the third largest understandable language in the world, especially in the subcontinent but fewer efforts are made to make it understandable to computers. Lot of work has been done in the field of literature and Islamic studies in Urdu, which has to be computerized. In the proposed system individual characters are recognized using our own proposed method/ algorithms. The feature detection methods are simple and robust. Supervised learning is used to train the feed forward neural network. A prototype of the system has been tested on printed Urdu characters and currently achieves 98.3% character level accuracy on average .Although the system is script/ language independent but we have designed it for Urdu characters only.

Keywords: Algorithm, Feed Forward Neural Networks, Supervised learning, Pattern Matching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2998
20 A Framework for Urdu Language Translation using LESSA

Authors: Imran Sarwar Bajwa

Abstract:

Internet is one of the major sources of information for the person belonging to almost all the fields of life. Major language that is used to publish information on internet is language. This thing becomes a problem in a country like Pakistan, where Urdu is the national language. Only 10% of Pakistan mass can understand English. The reason is millions of people are deprived of precious information available on internet. This paper presents a system for translation from English to Urdu. A module LESSA is used that uses a rule based algorithm to read the input text in English language, understand it and translate it into Urdu language. The designed approach was further incorporated to translate the complete website from English language o Urdu language. An option appears in the browser to translate the webpage in a new window. The designed system will help the millions of users of internet to get benefit of the internet and approach the latest information and knowledge posted daily on internet.

Keywords: Natural Language Translation, Text Understanding, Knowledge extraction, Text Processing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2632
19 Multi Language Text Editor for Burushaski and Urdu through Unicode

Authors: Irfan Qadir Baig, Muhammad Sharif, Aman Ullah Khan

Abstract:

This paper introduces an isolated and unique ancient language Burushaski, spoken in Hunza, Nagar, Yasin and parts of Gilgit in the Northern Areas of Pakistan. It explains the working mechanism of Multi Language Text Editor for Urdu and Burushaski. It is developed under the use of ISO/IEC 10646 Unicode standards for Urdu and Burushaski open-type fonts. It gives an ample opportunity to this regional ancient language to have a modern Information technology for its promotion and preservation. The main objective of this research paper is to help preserve the heritage of such rare languages and give smart way of automation. It also facilitates to those who are interested in undertaking research on Burushaski or keen to trace fonatic relationship between the national Urdu language and Burushaski. Since this editor covers both Burushaski and Urdu so it can play an important role to introduce Burusho linguistic culture to the world at large. Precisely, as a result of this research paper, Burushaski publication through IT means would be possible.

Keywords: Burushaski, Bri Naqsh, Unicode, Burusho, Hunza, Meshaski.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2061
18 Touch Interaction through Tagging Context

Authors: Gabriel Chavira, Jorge Orozco, Salvador Nava, Eduardo Álvarez, Julio Rolón, Roberto Pichardo

Abstract:

Ambient Intelligence promotes a shift in computing which involves fitting-out the environments with devices to support context-aware applications. One of main objectives is the reduction to a minimum of the user’s interactive effort, the diversity and quantity of devices with which people are surrounded with, in existing environments; increase the level of difficulty to achieve this goal. The mobile phones and their amazing global penetration, makes it an excellent device for delivering new services to the user, without requiring a learning effort. The environment will have to be able to perceive all of the interaction techniques. In this paper, we present the PICTAC model (Perceiving touch Interaction through TAgging Context), which similarly delivers service to members of a research group.

Keywords: Ambient Intelligence, Tagging Context, Touch Interaction, Touching Services.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1807
17 Self-adaptation of Ontologies to Folksonomies in Semantic Web

Authors: Francisco Echarte, José Javier Astrain, Alberto Córdoba, Jesús Villadangos

Abstract:

Ontologies and tagging systems are two different ways to organize the knowledge present in the current Web. In this paper we propose a simple method to model folksonomies, as tagging systems, with ontologies. We show the scalability of the method using real data sets. The modeling method is composed of a generic ontology that represents any folksonomy and an algorithm to transform the information contained in folksonomies to the generic ontology. The method allows representing folksonomies at any instant of time.

Keywords: Folksonomies, ontologies, OWL, semantic web.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1598
16 Tagging by Combining Rules- Based Method and Memory-Based Learning

Authors: Tlili-Guiassa Yamina

Abstract:

Many natural language expressions are ambiguous, and need to draw on other sources of information to be interpreted. Interpretation of the e word تعاون to be considered as a noun or a verb depends on the presence of contextual cues. To interpret words we need to be able to discriminate between different usages. This paper proposes a hybrid of based- rules and a machine learning method for tagging Arabic words. The particularity of Arabic word that may be composed of stem, plus affixes and clitics, a small number of rules dominate the performance (affixes include inflexional markers for tense, gender and number/ clitics include some prepositions, conjunctions and others). Tagging is closely related to the notion of word class used in syntax. This method is based firstly on rules (that considered the post-position, ending of a word, and patterns), and then the anomaly are corrected by adopting a memory-based learning method (MBL). The memory_based learning is an efficient method to integrate various sources of information, and handling exceptional data in natural language processing tasks. Secondly checking the exceptional cases of rules and more information is made available to the learner for treating those exceptional cases. To evaluate the proposed method a number of experiments has been run, and in order, to improve the importance of the various information in learning.

Keywords: Arabic language, Based-rules, exceptions, Memorybased learning, Tagging.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1593
15 Concept Indexing using Ontology and Supervised Machine Learning

Authors: Rossitza M. Setchi, Qiao Tang

Abstract:

Nowadays, ontologies are the only widely accepted paradigm for the management of sharable and reusable knowledge in a way that allows its automatic interpretation. They are collaboratively created across the Web and used to index, search and annotate documents. The vast majority of the ontology based approaches, however, focus on indexing texts at document level. Recently, with the advances in ontological engineering, it became clear that information indexing can largely benefit from the use of general purpose ontologies which aid the indexing of documents at word level. This paper presents a concept indexing algorithm, which adds ontology information to words and phrases and allows full text to be searched, browsed and analyzed at different levels of abstraction. This algorithm uses a general purpose ontology, OntoRo, and an ontologically tagged corpus, OntoCorp, both developed for the purpose of this research. OntoRo and OntoCorp are used in a two-stage supervised machine learning process aimed at generating ontology tagging rules. The first experimental tests show a tagging accuracy of 78.91% which is encouraging in terms of the further improvement of the algorithm.

Keywords: Concepts, indexing, machine learning, ontology, tagging.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1648
14 A NXM Version of 5X5 Playfair Cipher for any Natural Language (Urdu as Special Case)

Authors: Muhammad Salam, Nasir Rashid, Shah Khalid, Muhammad Raees Khan

Abstract:

In this paper a modified version NXM of traditional 5X5 playfair cipher is introduced which enable the user to encrypt message of any Natural language by taking appropriate size of the matrix depending upon the size of the natural language. 5X5 matrix has the capability of storing only 26 characters of English language and unable to store characters of any language having more than 26 characters. To overcome this limitation NXM matrix is introduced which solve this limitation. In this paper a special case of Urdu language is discussed. Where # is used for completing odd pair and * is used for repeating letters.

Keywords: cryptography, decryption, encryption, playfair cipher, traditional cipher.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2133
13 Machine Learning for Music Aesthetic Annotation Using MIDI Format: A Harmony-Based Classification Approach

Authors: Lin Yang, Zhian Mi, Jiacheng Xiao, Rong Li

Abstract:

Swimming with the tide of deep learning, the field of music information retrieval (MIR) experiences parallel development and a sheer variety of feature-learning models has been applied to music classification and tagging tasks. Among those learning techniques, the deep convolutional neural networks (CNNs) have been widespreadly used with better performance than the traditional approach especially in music genre classification and prediction. However, regarding the music recommendation, there is a large semantic gap between the corresponding audio genres and the various aspects of a song that influence user preference. In our study, aiming to bridge the gap, we strive to construct an automatic music aesthetic annotation model with MIDI format for better comparison and measurement of the similarity between music pieces in the way of harmonic analysis. We use the matrix of qualification converted from MIDI files as input to train two different classifiers, support vector machine (SVM) and Decision Tree (DT). Experimental results in performance of a tag prediction task have shown that both learning algorithms are capable of extracting high-level properties in an end-to end manner from music information. The proposed model is helpful to learn the audience taste and then the resulting recommendations are likely to appeal to a niche consumer.

Keywords: Harmonic analysis, machine learning, music classification and tagging, MIDI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 707
12 Highlighting Document's Structure

Authors: Sylvie Ratté, Wilfried Njomgue, Pierre-André Ménard

Abstract:

In this paper, we present symbolic recognition models to extract knowledge characterized by document structures. Focussing on the extraction and the meticulous exploitation of the semantic structure of documents, we obtain a meaningful contextual tagging corresponding to different unit types (title, chapter, section, enumeration, etc.).

Keywords: Information retrieval, document structures, symbolic grammars.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1203
11 eMedI: Web-Based E-Training for Multimodal Breast Imaging

Authors: Ioannis Pratikakis, Anna Karahaliou, Katerina Vassiou, Vassilis Virvilis, Dimitrios Kosmopoulos, Stavros Perantonis

Abstract:

In this paper, a Web-based e-Training platform that is dedicated to multimodal breast imaging is presented. The assets of this platform are summarised in (i) the efficient representation of the curriculum flow that will permit efficient training; (ii) efficient tagging of multimodal content appropriate for the completion of realistic cases and (iii) ubiquitous accessibility and platform independence via a web-based approach.

Keywords: Breast imaging, e-Training, web-based learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1658
10 A Novel Arabic Text Steganography Method Using Letter Points and Extensions

Authors: Adnan Abdul-Aziz Gutub, Manal Mohammad Fattani

Abstract:

This paper presents a new steganography approach suitable for Arabic texts. It can be classified under steganography feature coding methods. The approach hides secret information bits within the letters benefiting from their inherited points. To note the specific letters holding secret bits, the scheme considers the two features, the existence of the points in the letters and the redundant Arabic extension character. We use the pointed letters with extension to hold the secret bit 'one' and the un-pointed letters with extension to hold 'zero'. This steganography technique is found attractive to other languages having similar texts to Arabic such as Persian and Urdu.

Keywords: Arabic text, Cryptography, Feature coding, Information security, Text steganography, Text watermarking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3479
9 Folksonomy-based Recommender Systems with User-s Recent Preferences

Authors: Cheng-Lung Huang, Han-Yu Chien, Michael Conyette

Abstract:

Social bookmarking is an environment in which the user gradually changes interests over time so that the tag data associated with the current temporal period is usually more important than tag data temporally far from the current period. This implies that in the social tagging system, the newly tagged items by the user are more relevant than older items. This study proposes a novel recommender system that considers the users- recent tag preferences. The proposed system includes the following stages: grouping similar users into clusters using an E-M clustering algorithm, finding similar resources based on the user-s bookmarks, and recommending the top-N items to the target user. The study examines the system-s information retrieval performance using a dataset from del.icio.us, which is a famous social bookmarking web site. Experimental results show that the proposed system is better and more effective than traditional approaches.

Keywords: Recommender systems, Social bookmarking, Tag

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1377
8 Data about Loggerhead Sea Turtle (Caretta caretta) and Green Turtle (Chelonia mydas) in Vlora Bay, Albania

Authors: Enerit Saçdanaku, Idriz Haxhiu

Abstract:

This study was conducted in the area of Vlora Bay, Albania. Data about Sea Turtles Caretta caretta and Chelonia mydas, belonging to two periods of time (1984 – 1991; 2008 – 2014) are given. All data gathered were analyzed using recent methodologies. For all turtles captured (as by catch), the Curve Carapace Length (CCL) and Curved Carapace Width (CCW) were measured. These data were statistically analyzed, where the mean was 67.11 cm for CCL and 57.57 cm for CCW of all individuals studied (n=13). All untagged individuals of marine turtles were tagged using metallic tags (Stockbrand’s titanium tag) with an Albanian address. Sex was determined and resulted that 45.4% of individuals were females, 27.3% males and 27.3% juveniles. All turtles were studied for the presence of the epibionts. The area of Vlora Bay is used from marine turtles (Caretta caretta) as a migratory corridor to pass from Mediterranean to the northern part of the Adriatic Sea.

Keywords: Caretta caretta, Chelonia mydas, CCL, CCW, tagging, Vlora Bay.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2101
7 Housing Defect of Newly Completed House: An Analysis Using Condition Survey Protocol (CSP) 1 Matrix

Authors: I. Ismail, A.I. Che-Ani, N.M. Tawil, H. Yahaya, M.Z. Abd-Razak

Abstract:

Housing is a basic human right. The provision of new house shall be free from any defects, even for the defects that people do normally considered as 'cosmetic defects'. This paper studies about the building defects of newly completed house of 72 unit of double-storey terraced located in Bangi, Selangor. The building survey implemented using protocol 1 (visual inspection). As for new house, the survey work is very stringent in determining the defects condition and priority. Survey and reporting procedure is carried out based on CSP1 Matrix that involved scoring system, photographs and plan tagging. The analysis is done using Statistical Package for Social Sciences (SPSS). The finding reveals that there are 2119 defects recorded in 72 terraced houses. The cumulative score obtained was 27644 while the overall rating is 13.05. These results indicate that the construction quality of the newly terraced houses is low and not up to an acceptable standard as the new house should be.

Keywords: terraced houses, building defects, construction, CSP1 Matrix, Malaysia.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2381
6 Collaborative and Content-based Recommender System for Social Bookmarking Website

Authors: Cheng-Lung Huang, Cheng-Wei Lin

Abstract:

This study proposes a new recommender system based on the collaborative folksonomy. The purpose of the proposed system is to recommend Internet resources (such as books, articles, documents, pictures, audio and video) to users. The proposed method includes four steps: creating the user profile based on the tags, grouping the similar users into clusters using an agglomerative hierarchical clustering, finding similar resources based on the user-s past collections by using content-based filtering, and recommending similar items to the target user. This study examines the system-s performance for the dataset collected from “del.icio.us," which is a famous social bookmarking website. Experimental results show that the proposed tag-based collaborative and content-based filtering hybridized recommender system is promising and effectiveness in the folksonomy-based bookmarking website.

Keywords: Collaborative recommendation, Folksonomy, Social tagging

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2210
5 Neuro-Fuzzy Based Model for Phrase Level Emotion Understanding

Authors: Vadivel Ayyasamy

Abstract:

The present approach deals with the identification of Emotions and classification of Emotional patterns at Phrase-level with respect to Positive and Negative Orientation. The proposed approach considers emotion triggered terms, its co-occurrence terms and also associated sentences for recognizing emotions. The proposed approach uses Part of Speech Tagging and Emotion Actifiers for classification. Here sentence patterns are broken into phrases and Neuro-Fuzzy model is used to classify which results in 16 patterns of emotional phrases. Suitable intensities are assigned for capturing the degree of emotion contents that exist in semantics of patterns. These emotional phrases are assigned weights which supports in deciding the Positive and Negative Orientation of emotions. The approach uses web documents for experimental purpose and the proposed classification approach performs well and achieves good F-Scores.

Keywords: Emotions, sentences, phrases, classification, patterns, fuzzy, positive orientation, negative orientation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1052
4 Gender and Advertisements: A Content Analysis of Pakistani Prime Time Advertisements

Authors: Aaminah Hassan

Abstract:

Advertisements carry a great potential to influence our lives because they are crafted to meet particular ends. Stereotypical representation in advertisements is capable of forming unconscious attitudes among people towards any gender and their abilities. This study focuses on gender representation in Pakistani prime time advertisements. For this purpose, 13 advertisements were selected from three different categories of foods and beverages, cosmetics, cell phones and cellular networks from the prime time slots of one of the leading Pakistani entertainment channel, ‘Urdu 1’. Both quantitative and qualitative analyses are carried out for range of variables like gender, age, roles, activities, setting, appearance and voice overs. The results revealed that gender representation in advertisements is stereotypical. Moreover, in few instances, the portrayal of women is not only culturally inappropriate but is demeaning to the image of women as well. Their bodily charm is used to promote products. Comparing different entertainment channels for their prime time advertisements and broadening the scope of this research will yield greater implications for the researchers who want to carry out the similar research. It is hoped that the current study would help in the promotion of media literacy among the viewers and media authorities in Pakistan.

Keywords: Advertisements, content analysis, gender, prime time.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1068
3 Opinion Mining Framework in the Education Domain

Authors: A. M. H. Elyasir, K. S. M. Anbananthen

Abstract:

The internet is growing larger and becoming the most popular platform for the people to share their opinion in different interests. We choose the education domain specifically comparing some Malaysian universities against each other. This comparison produces benchmark based on different criteria shared by the online users in various online resources including Twitter, Facebook and web pages. The comparison is accomplished using opinion mining framework to extract, process the unstructured text and classify the result to positive, negative or neutral (polarity). Hence, we divide our framework to three main stages; opinion collection (extraction), unstructured text processing and polarity classification. The extraction stage includes web crawling, HTML parsing, Sentence segmentation for punctuation classification, Part of Speech (POS) tagging, the second stage processes the unstructured text with stemming and stop words removal and finally prepare the raw text for classification using Named Entity Recognition (NER). Last phase is to classify the polarity and present overall result for the comparison among the Malaysian universities. The final result is useful for those who are interested to study in Malaysia, in which our final output declares clear winners based on the public opinions all over the web.

Keywords: Entity Recognition, Education Domain, Opinion Mining, Unstructured Text.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2937
2 Addressing Scalability Issues of Named Entity Recognition Using Multi-Class Support Vector Machines

Authors: Mona Soliman Habib

Abstract:

This paper explores the scalability issues associated with solving the Named Entity Recognition (NER) problem using Support Vector Machines (SVM) and high-dimensional features. The performance results of a set of experiments conducted using binary and multi-class SVM with increasing training data sizes are examined. The NER domain chosen for these experiments is the biomedical publications domain, especially selected due to its importance and inherent challenges. A simple machine learning approach is used that eliminates prior language knowledge such as part-of-speech or noun phrase tagging thereby allowing for its applicability across languages. No domain-specific knowledge is included. The accuracy measures achieved are comparable to those obtained using more complex approaches, which constitutes a motivation to investigate ways to improve the scalability of multiclass SVM in order to make the solution more practical and useable. Improving training time of multi-class SVM would make support vector machines a more viable and practical machine learning solution for real-world problems with large datasets. An initial prototype results in great improvement of the training time at the expense of memory requirements.

Keywords: Named entity recognition, support vector machines, language independence, bioinformatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1664
1 Detecting Tomato Flowers in Greenhouses Using Computer Vision

Authors: Dor Oppenheim, Yael Edan, Guy Shani

Abstract:

This paper presents an image analysis algorithm to detect and count yellow tomato flowers in a greenhouse with uneven illumination conditions, complex growth conditions and different flower sizes. The algorithm is designed to be employed on a drone that flies in greenhouses to accomplish several tasks such as pollination and yield estimation. Detecting the flowers can provide useful information for the farmer, such as the number of flowers in a row, and the number of flowers that were pollinated since the last visit to the row. The developed algorithm is designed to handle the real world difficulties in a greenhouse which include varying lighting conditions, shadowing, and occlusion, while considering the computational limitations of the simple processor in the drone. The algorithm identifies flowers using an adaptive global threshold, segmentation over the HSV color space, and morphological cues. The adaptive threshold divides the images into darker and lighter images. Then, segmentation on the hue, saturation and volume is performed accordingly, and classification is done according to size and location of the flowers. 1069 images of greenhouse tomato flowers were acquired in a commercial greenhouse in Israel, using two different RGB Cameras – an LG G4 smartphone and a Canon PowerShot A590. The images were acquired from multiple angles and distances and were sampled manually at various periods along the day to obtain varying lighting conditions. Ground truth was created by manually tagging approximately 25,000 individual flowers in the images. Sensitivity analyses on the acquisition angle of the images, periods throughout the day, different cameras and thresholding types were performed. Precision, recall and their derived F1 score were calculated. Results indicate better performance for the view angle facing the flowers than any other angle. Acquiring images in the afternoon resulted with the best precision and recall results. Applying a global adaptive threshold improved the median F1 score by 3%. Results showed no difference between the two cameras used. Using hue values of 0.12-0.18 in the segmentation process provided the best results in precision and recall, and the best F1 score. The precision and recall average for all the images when using these values was 74% and 75% respectively with an F1 score of 0.73. Further analysis showed a 5% increase in precision and recall when analyzing images acquired in the afternoon and from the front viewpoint.

Keywords: Agricultural engineering, computer vision, image processing, flower detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2329