Search results for: text structuring
1370 Text Localization in Fixed-Layout Documents Using Convolutional Networks in a Coarse-to-Fine Manner
Authors: Beier Zhu, Rui Zhang, Qi Song
Abstract:
Text in fixed-layout documents such as ID cards, invoices, cheques, and passports can be of great semantic value and therefore requires high localization accuracy. Recently, algorithms based on deep convolutional networks have achieved high performance on text detection tasks. However, for text localization in fixed-layout documents, such algorithms detect word bounding boxes individually, which ignores the layout information. This paper presents a novel architecture built on convolutional neural networks (CNNs). A global text localization network and a regional bounding-box regression network are introduced to tackle the problem in a coarse-to-fine manner. The text localization network locates all word bounding points simultaneously, which takes the layout information into account. The bounding-box regression network takes as input the features pooled from arbitrarily sized RoIs and refines the localizations. The two networks share their convolutional features and are trained jointly. A typical type of fixed-layout document, the ID card, is selected to evaluate the effectiveness of the proposed system. The networks are trained on data cropped from natural scene images and on synthetic data produced by a synthetic text generation engine. Experiments show that our approach locates word bounding boxes with high accuracy and achieves state-of-the-art performance.
Keywords: bounding box regression, convolutional networks, fixed-layout documents, text localization
Procedia PDF Downloads 154
1369 Recognition of Cursive Arabic Handwritten Text Using Embedded Training Based on Hidden Markov Models (HMMs)
Authors: Rabi Mouhcine, Amrouch Mustapha, Mahani Zouhir, Mammass Driss
Abstract:
In this paper, we present a system for offline recognition of cursive Arabic handwritten text based on Hidden Markov Models (HMMs). The system is analytical, works without explicit segmentation, and uses embedded training to build and enhance the character models. Feature extraction, preceded by baseline estimation, yields statistical and geometric features that integrate both the peculiarities of the text and the pixel distribution characteristics of the word image. These features are modelled using hidden Markov models and trained by embedded training. Experiments on images from the benchmark IFN/ENIT database show that the proposed system improves recognition.
Keywords: recognition, handwriting, Arabic text, HMMs, embedded training
Procedia PDF Downloads 316
1368 Poetics of the Connecting ha’: A Textual Study in the Poetry of Al-Husari Al-Qayrawani
Authors: Mahmoud al-Ashiriy
Abstract:
This paper begins from the idea that the real history of literature is the history of its style. Rhyme, as is well known, is not merely the final letter, which has received a great deal of analysis and investigation, but a collection of other values in addition to its different markings. This paper explores the work of the connecting ha’ and its effectiveness in shaping the poetic text: it establishes vocal rhythms and also indicates references through the pronoun, both vertically, through the sequence of verses across the poem, and horizontally, through the sentences surrounding a single verse. If the scientific formulation of prosody stopped at possibilities and prohibitions, literary criticism and poetry studies should explore what lies beyond the rule, in the aesthetic horizon of poetic effectiveness, which varies from one text to another, one poet to another, one literary period to another, or one poetic taste to another. The paper then explores this poetic essence in the texts of the famous Andalusian poet Al-Husari Al-Qayrawani through his well-known Daliyya (a poem whose verses end with the letter D) and the role of the connecting ha’ in completing its text and accomplishing its poetics. From there it moves to the diwan (the large collection of poems) as a higher text that surpasses the individual text/poem, and to the role this phenomenon plays in accomplishing the poetics of Al-Husari Al-Qayrawani, one of the pillars of Arabic poetics in Andalusia.
Keywords: Al-Husari Al-Qayrawani, poetics, rhyme, stylistics, science of the text
Procedia PDF Downloads 515
1367 A Clustering Algorithm for Massive Texts
Authors: Ming Liu, Chong Wu, Bingquan Liu, Lei Chen
Abstract:
Internet users face a massive amount of textual data every day. Organizing texts into categories can help users dig useful information out of large-scale text collections. Clustering is one of the most promising tools for categorizing texts because it is unsupervised. Unfortunately, most traditional clustering algorithms lose quality on large-scale text collections, mainly because of the high-dimensional vectors generated from texts. To cluster large-scale text collections effectively and efficiently, this paper proposes a clustering algorithm based on vector reconstruction. Only the features that can represent the cluster are preserved in the cluster’s representative vector. The algorithm alternates between two sub-processes until it converges. The first is the partial tuning sub-process, in which each feature’s weight is fine-tuned iteratively. To accelerate clustering, an intersection-based similarity measure and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The second is the overall tuning sub-process, in which the features are reallocated among different clusters and features that are useless for representing a cluster are removed from its representative vector. Experimental results on three text collections (two small-scale and one large-scale) demonstrate that our algorithm obtains high quality on both small-scale and large-scale text collections.
Keywords: vector reconstruction, large-scale text clustering, partial tuning sub-process, overall tuning sub-process
Procedia PDF Downloads 397
1366 A Text Classification Approach Based on Natural Language Processing and Machine Learning Techniques
Authors: Rim Messaoudi, Nogaye-Gueye Gning, François Azelart
Abstract:
Automatic text classification mostly applies natural language processing (NLP) and other AI-guided techniques to classify text faster and more accurately. This paper discusses the use of predictive maintenance to manage incident tickets within the company. It proposes a tool that processes and analyses the comments and notes written by administrators after resolving an incident ticket. The goal is to increase the quality of these comments. The tool relies on NLP and machine learning techniques to perform textual analytics on the extracted data. The approach was tested on real data from the French National Railways (SNCF) company and gave high-quality results.
Keywords: machine learning, text classification, NLP techniques, semantic representation
Procedia PDF Downloads 50
1365 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text
Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni
Abstract:
The problem of entity relation discovery in structured data, a well-covered topic in the literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. The entities can be a whole dictionary or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be infeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations: a cooccurrence graph. Each cooccurrence indicates some degree of semantic correlation between the words, because related words are more likely to appear close to each other than at opposite ends of the text. Some authors have used sliding windows for this problem: they count all the cooccurrences within a sliding window running over the whole text. In this paper we generalise this technique, arriving at a Weighted-Distance Sliding Window, where each cooccurrence of two named items within the window is counted with a weight that depends on the distance between the items: a shorter distance implies stronger evidence of a relationship. We develop an experiment to support this intuition by applying the technique to a data set consisting of the text of the Bible, split into verses.
Keywords: cooccurrence graph, entity relation graph, unstructured text, weighted distance
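Illustrative sketch (not taken from the paper): the weighted-distance idea described in this abstract can be approximated as follows. The function name `weighted_cooccurrences`, the `1 / distance` weighting, and the choice to count each pair of positions once (rather than once per window placement) are assumptions made for illustration; the abstract does not specify the weighting function.

```python
from collections import defaultdict


def weighted_cooccurrences(tokens, entities, window=5):
    """Accumulate distance-weighted cooccurrence weights for entity pairs.

    Two entity mentions cooccur when they are fewer than `window` tokens apart.
    Each cooccurrence adds 1 / distance to the pair's edge weight (assumed form).
    """
    weights = defaultdict(float)
    # Keep only the positions of tokens that are named items of interest.
    positions = [(i, tok) for i, tok in enumerate(tokens) if tok in entities]
    for idx, (i, a) in enumerate(positions):
        for j, b in positions[idx + 1:]:
            distance = j - i
            if distance >= window:
                break  # positions are sorted, so later mentions are even farther away
            if a != b:
                pair = tuple(sorted((a, b)))
                weights[pair] += 1.0 / distance
    return dict(weights)


if __name__ == "__main__":
    verse = "moses spoke to aaron and aaron answered moses".split()
    print(weighted_cooccurrences(verse, {"moses", "aaron"}, window=4))
```

The resulting dictionary can be read as the weighted edge list of a cooccurrence graph; running the function per verse and summing the weights would mirror the verse-split setup mentioned in the abstract.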
Procedia PDF Downloads 106
1364 Symmetric Key Encryption Algorithm Using Indian Traditional Musical Scale for Information Security
Authors: Aishwarya Talapuru, Sri Silpa Padmanabhuni, B. Jyoshna
Abstract:
Cryptography helps prevent threats to information security by providing various algorithms. This study introduces a new symmetric key encryption algorithm for information security that is linked with "raagas", the traditional Indian scales and patterns of musical notes. The algorithm takes the plain text as input and starts its encryption process. It then randomly selects a raaga from the list of raagas that is assumed to be available to both the sender and the receiver. The plain text is associated with the selected raaga, and an intermediate cipher text is formed as the algorithm converts the plain text characters into other characters according to its rules. This intermediate cipher text is arranged in various patterns over three different rounds of encryption. The total number of rounds in the algorithm is a multiple of 3: the output of the first sequence of three rounds is passed again as input to the same sequence of rounds, recursively, until the total number of rounds of encryption has been performed. The selected raaga and the number of rounds performed are specified at an arbitrary location in the key, along with other important information about the rounds of encryption; the key is known by the sender and interpreted only by the receiver, thereby making the algorithm hack-proof. The key can be constructed with any number of bits, without any restriction on its size. A software application was also developed to demonstrate this encryption process; it dynamically takes the plain text as input and readily generates the cipher text as output. Therefore, this algorithm stands as one of the strongest tools for information security.
Keywords: cipher text, cryptography, plaintext, raaga
Procedia PDF Downloads 250
1363 Direct Synthesis of Composite Materials Type MCM-41/ZSM-5 by Hydrothermal at Atmospheric Pressure in Sealed Pyrex Tubes
Authors: Zoubida Lounis, Naouel Boumesla, Abd El Kader Bengueddach
Abstract:
The main objective of this study is to synthesize composite materials having both the MFI structure and the MCM-41 structure by direct synthesis at atmospheric pressure, using two structuring agents. The first part of this work studies the synthesis parameters, including temperature, crystallization time, and pH. The second part varies the ratio of the concentrations of the two structuring agents, C9 [C9H19(CH3)3NBr] and C16 [C16H33(CH3)3NBr], and determines the region in which the two materials form (microporous and mesoporous at the same time); for this reason we performed a battery of experiments ranging from 0 to 100% for both structuring agents. To strengthen the economic case of this study, the experiments were carried out using a very cheap and simple process: Pyrex tubes were used instead of reactors, and the syntheses were done at atmospheric pressure and moderate temperature. The final products (composite materials) were obtained with high quality and purity.
Keywords: composite materials, synthesis, catalysts, mesoporous materials, microporous materials
Procedia PDF Downloads 342
1362 The Effects of Watching Text-Relevant Video Segments with/without Subtitles on Vocabulary Development of Arabic as a Foreign Language Learners
Authors: Amirreza Karami, Hawraa Nafea Hameed Alzouwain, Freddie A. Bowles
Abstract:
This study investigates the effects of watching text-relevant video segments with/without subtitles on the vocabulary development of Arabic as a Foreign Language (AFL) learners. The participants of the study were assigned to two groups: one control group and one experimental group. The control group received no video-based instruction, while the experimental group watched a text-relevant video segment in three stages: pre-, while-, and post-instruction. The preliminary results of the pre-test and post-test show that watching text-relevant video segments following a pre-while-post procedure can help the vocabulary development of AFL learners more than non-video-based instruction.
Keywords: text-relevant video segments, vocabulary development, Arabic as a Foreign Language, AFL, pre-while-post instruction
Procedia PDF Downloads 123
1361 A Study of Various Ontology Learning Systems from Text and a Look into Future
Authors: Fatima Al-Aswadi, Chan Yong
Abstract:
As the volume of unstructured data on the web increases day by day, so does the motivation to represent the knowledge in this data in machine-processable form. Ontology is one of the major cornerstones of representing information in a more meaningful way on the semantic web. The goal of ontology learning from text is to elicit and represent domain knowledge in machine-readable form. This paper aims to give a follow-up review of systems for ontology learning from text and some of their defects. Furthermore, it discusses how far the ontology learning process may improve in the future.
Keywords: concept discovery, deep learning, ontology learning, semantic relation, semantic web
Procedia PDF Downloads 470
1360 Principle Components Updates via Matrix Perturbations
Authors: Aiman Elragig, Hanan Dreiwi, Dung Ly, Idriss Elmabrook
Abstract:
This paper highlights a new way of looking at online principal component analysis (OPCA). Given a data matrix X ∈ R^(m×n), we characterise the online updates of its covariance as a matrix perturbation problem. Up to the principal components, it turns out that online updates of the batch PCA can be captured by a symmetric matrix perturbation of the batch covariance matrix. We show that as n → n0 ≫ 1, the batch covariance and its update become almost similar. Finally, we utilize our new setup of online updates to find a bound on the angle distance between the principal components of X and those of its update.
Keywords: online data updates, covariance matrix, online principal component analysis, matrix perturbation
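Illustrative derivation (an assumption-laden sketch, not the paper's own formulation): suppose the columns x_1, ..., x_n of X are zero-mean samples in R^m and the batch covariance is C_n = (1/n) X X^T. Under these assumptions, appending one sample x_{n+1} changes the covariance by a symmetric perturbation E_n whose norm shrinks like 1/(n+1), which is consistent with the abstract's remark that the batch covariance and its update become almost the same for large n. The symbols C_n and E_n are introduced here for illustration only.

```latex
\begin{align*}
C_{n+1}
  &= \frac{1}{n+1}\sum_{i=1}^{n+1} x_i x_i^{\top}
   = \frac{n}{n+1}\,C_n + \frac{1}{n+1}\,x_{n+1}x_{n+1}^{\top} \\
  &= C_n + E_n,
  \qquad
  E_n = \frac{1}{n+1}\left(x_{n+1}x_{n+1}^{\top} - C_n\right), \\
\|E_n\|_2
  &\le \frac{\|x_{n+1}\|_2^{2} + \|C_n\|_2}{n+1}.
\end{align*}
```

Since E_n is a difference of symmetric matrices it is itself symmetric, so perturbation results for symmetric eigenproblems can be applied to compare the eigenvectors of C_n and C_{n+1}.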
Procedia PDF Downloads 159
1359 Teaching Pragmatic Coherence in Literary Text: Analysis of Chimamanda Adichie’s Americanah
Authors: Joy Aworo-Okoroh
Abstract:
Literary texts are mirrors of real-life situations. Thus, authors choose the linguistic items that best encode their intended meanings and messages. However, words mean more than they seem to. The meaning of words is not static; rather, it is dynamic, as words constantly enter into relationships within a context. Literary texts can only be meaningful if all pragmatic cues are identified and interpreted. Drawing upon Teun van Dijk's theory of local pragmatic coherence, it is established that words enter into relations in a text and that these relations account for sequential speech acts in the text. Comprehension of the text depends on the interpretation of these relations. To show the relevance of pragmatic coherence in literary text analysis, ten conversations were selected from Americanah in order to give a clear idea of the pragmatic relations used. The conversations were analysed to identify the speech act and epistemic relations inherent in them. A close analysis of the structure of the conversations was also carried out. It was found that justification is the most commonly used relation and that the meaning of the text depends on the interpretation of the pragmatic coherence of these instances. The study concludes that, to teach literature in English effectively, pragmatic coherence should be incorporated, as words mean more than they say.
Keywords: pragmatic coherence, epistemic coherence, speech act, Americanah
Procedia PDF Downloads 92
1358 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications
Authors: K. P. Sandesh, M. H. Suman
Abstract:
Text processing plays an important role in information retrieval, data mining, and web search. Measuring the similarity between documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature, the proposed measure takes the following three cases into account: (1) the feature appears in both documents; (2) the feature appears in only one document; and (3) the feature appears in neither document. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in the banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.
Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms
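Illustrative sketch (not the measure proposed in the paper, whose formula the abstract does not give): one simple way to treat the three per-feature cases separately is shown below. The function names, the `min/max` score for shared features, and the fixed `penalty` for features found in only one document are all hypothetical choices made for illustration.

```python
def feature_score(x, y, penalty=1.0):
    """Score one feature from its weight in each document (0 means absent)."""
    if x > 0 and y > 0:
        # Case 1: present in both documents; closer weights score higher, in (0, 1].
        return min(x, y) / max(x, y)
    if x > 0 or y > 0:
        # Case 2: present in only one document; apply a fixed penalty.
        return -penalty
    # Case 3: absent from both documents; contributes nothing.
    return 0.0


def document_similarity(doc_a, doc_b, penalty=1.0):
    """Average the per-feature scores over features present in at least one document."""
    features = set(doc_a) | set(doc_b)
    active = [f for f in features if doc_a.get(f, 0) > 0 or doc_b.get(f, 0) > 0]
    if not active:
        return 0.0
    total = sum(feature_score(doc_a.get(f, 0), doc_b.get(f, 0), penalty) for f in active)
    return total / len(active)


if __name__ == "__main__":
    a = {"loan": 2, "rate": 1}
    b = {"loan": 1}
    print(document_similarity(a, b))
```

Documents are represented here as dictionaries mapping a term to its weight (for example, a term count). Features absent from both documents are excluded from the average, so adding irrelevant vocabulary does not change the score.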
Procedia PDF Downloads 472
1357 Structuring After-School Physical Education Programs That are Engaging, Diverse, and Inclusive
Authors: Micah J. Dobson
Abstract:
After-school programs of physical education provide children with opportunities to engage in physical activities while developing healthy habits. To ensure that these programs are inclusive, diverse, and engaging, however, schools must consider various factors when designing and implementing them. This study sought to identify efficient strategies for structuring after-school programs of physical education. The literature review was conducted using various databases and search engines, including ERIC, Google Scholar, Scopus, Web of Science, and EBSCOhost. The search terms were combinations of keywords such as “after-school,” “physical education,” “inclusion,” “diversity,” “engagement,” “program design,” “program implementation,” “program effectiveness,” and “best practices.” The findings of this study suggest that schools that desire inclusivity must consider four key factors when designing and implementing after-school physical education programs. First, the programs must be designed with variety and fun by incorporating activities such as dance, sports, and games that appeal to all students. Second, instructors must be trained to create supportive and positive environments that foster student engagement while promoting physical literacy. Third, schools must collaborate with community stakeholders and organizations to ensure that programs are culturally inclusive and responsive. Fourth, schools can incorporate technology into their programs to enhance engagement and provide additional growth and learning opportunities. In conclusion, this study provides valuable insights into efficient strategies for structuring after-school programs of physical education that are inclusive, diverse, and engaging for all students. By considering these factors when designing and implementing their programs, schools can promote physical activity while supporting students’ overall well-being and health.
Keywords: after-school programs of physical education, community partnership, inclusivity, instructor training, technology
Procedia PDF Downloads 35
1356 Psychoanalytic Understanding of the Autistic Self
Authors: Aastha Chaudhry
Abstract:
The continuous structuring of the ego through the developmental ages, starting with the body, has been understood through various perspectives from the object-relations world. Klein, Ogden, and Winnicott, to name a few, have been masters at helping mark a trajectory for the self to come to fruition. However, what constitutes those states, those relational structures, the dynamics of transference, and the concept of inner objects has been left more or less unexplored in psychoanalytic developmental theory. In this paper, with the help of a case study, Ogden’s idea of an autistic-contiguous position and the Kleinian theory of object relations are used to propose a lens that helps to understand the relationship between the autistic self and the body and allows us to look at object relations through countertransference. With the help of case vignettes, experience in the autistic-contiguous position is seen as dominated by defensive structuring that is not only self-fulfilling and sensorially oriented but is also a pre-symbolic mode of relating to the other. The aim of this clinical, experiential study is to better understand the self-body and self-other relationships, or the absence thereof, in autistic worlds and states. The goal was to find a relationship between play, body, the structuring of experience, and an autistic self in these individuals, so that psychotherapy is brought to the fore in the world of autism. The method was a case study with one-on-one intervention that was psychodynamically informed and based on play therapy. Among the findings after a year of work with these individuals: in the absence of a shared vocabulary, communication between two contrasting individuals happens primarily through the body. Somatic countertransference, for instance, is how one can be with someone in a therapeutic relationship, and with autistic adolescents it is a further complicated relationship. With a mind somewhere in infanthood and a body experiencing adulthood, it becomes a challenge for the therapist to meet the client where they are. With pre-verbal states, play becomes a potential space where two individuals can meet, a safe ground for forces to be contained. Play, then, becomes a mode of communication with such a population.
Keywords: autism, psychoanalytic, play, self
Procedia PDF Downloads 93
1355 Visual Text Analytics Technologies for Real-Time Big Data: Chronological Evolution and Issues
Authors: Siti Azrina B. A. Aziz, Siti Hafizah A. Hamid
Abstract:
New approaches to analyzing and visualizing data streams in real time are important for prompt decision-making. Financial market trading and surveillance, large-scale emergency response, and crowd control are example scenarios that require real-time analytics and data visualization. This need has led to the development of techniques and tools that support humans in analyzing the source data. With the emergence of Big Data and social media, new techniques and tools are required to process streaming data. Today, a range of tools that implement some of these functionalities is available. In this paper, we present a chronological evaluation of the evolution of technologies that support real-time analytics and visualization of data streams. Based on research papers published from 2002 to 2014, we gathered general information, main techniques, challenges, and open issues. The techniques for streaming text visualization are identified, in chronological order, based on the Text Visualization Browser. This paper aims to review the evolution of streaming text visualization techniques and tools, as well as to discuss the problems and challenges of each identified tool.
Keywords: information visualization, visual analytics, text mining, visual text analytics tools, big data visualization
Procedia PDF Downloads 364
1354 Automatic Assignment of Geminate and Epenthetic Vowel for Amharic Text-to-Speech System
Authors: Tadesse Anberbir, Bankole Felix, Tomio Takara
Abstract:
In the development of a text-to-speech synthesizer, automatic derivation of the correct pronunciation from the grapheme form of a text is a central problem. Deriving phonological features that are not shown in the orthography is particularly challenging. In the Amharic language, geminates and epenthetic vowels are crucial for proper pronunciation, but neither is shown in the orthography. In this paper, we propose and integrate a morphological analyzer into an Amharic Text-to-Speech system, mainly to predict the positions of geminates and epenthetic vowels, and we prepare a duration modeling method. The Amharic Text-to-Speech system (AmhTTS) is a parametric and rule-based system that adopts a cepstral method and uses a source-filter model for speech production and a Log Magnitude Approximation (LMA) filter as the vocal tract filter. The naturalness of the system after employing the duration modeling was evaluated by a sentence listening test, and we achieved an average Mean Opinion Score (MOS) of 3.4 (68%), which is moderate. By modeling the duration of geminates and controlling the locations of epenthetic vowels, we are able to synthesize good-quality speech. Our system is mainly suitable for customization to other Ethiopian languages with limited resources.
Keywords: Amharic, gemination, speech synthesis, morphology, epenthesis
Procedia PDF Downloads 39
1353 Assessment of the Validity of Sentiment Analysis as a Tool to Analyze the Emotional Content of Text
Authors: Trisha Malhotra
Abstract:
Sentiment analysis is a recent field of study that computationally assesses the emotional nature of a body of text. To assess its test validity, sentiment analysis was carried out on the emotional corpus of text from a personal 15-day mood diary. Self-reported mood scores tracked the daily mood evaluation scores given by the software more or less accurately. On further assessment, it was found that while sentiment analysis was good at assessing ‘global’ mood, it was not able to ‘locally’ identify and differentially score synonyms of various emotional words. It is further critiqued for treating the intensity of an emotion as universal across cultures. Finally, the software is shown not to account for emotional complexity in sentences, because it treats emotions as strictly positive or negative. Hence, it is posited that a better output could be two affect scores (positive and negative) for the same body of text.
Keywords: analysis, data, diary, emotions, mood, sentiment
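Illustrative sketch (not the software evaluated in the paper): the closing suggestion of this abstract, reporting separate positive and negative affect scores instead of a single net polarity, could look roughly like this. The tiny `POSITIVE` and `NEGATIVE` lexicons, their weights, and the function name `affect_scores` are hypothetical placeholders.

```python
# Hypothetical toy lexicons; a real system would need larger, validated word lists.
POSITIVE = {"happy": 1.0, "calm": 0.5, "hopeful": 0.8}
NEGATIVE = {"sad": 1.0, "anxious": 0.8, "tired": 0.4}


def affect_scores(text):
    """Return a (positive, negative) pair of scores for the text.

    Keeping the two sums separate preserves mixed emotions that a single
    net score (positive minus negative) would cancel out.
    """
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    positive = sum(POSITIVE.get(w, 0.0) for w in words)
    negative = sum(NEGATIVE.get(w, 0.0) for w in words)
    return positive, negative


if __name__ == "__main__":
    print(affect_scores("Happy but anxious."))
```

A diary entry that is both happy and anxious then yields two non-zero scores rather than a value near zero, which is the kind of emotional complexity the abstract says a single polarity score misses.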
Procedia PDF Downloads 227
1352 3D Text Toys: Creative Approach to Experiential and Immersive Learning for World Literacy
Authors: Azyz Sharafy
Abstract:
3D Text Toys is an innovative and creative approach that utilizes 3D text objects to enhance creativity, literacy, and basic learning in an enjoyable and gamified manner. By using 3D Text Toys, children can develop their creativity, visually learn words and texts, and apply their artistic talents within their creative abilities. This process incorporates haptic engagement with 2D and 3D texts, word building, and mechanical construction of everyday objects, thereby facilitating better word and text retention. The concept involves constructing visual objects made entirely out of 3D text/words, where each component of the object represents a word or text element. For instance, a bird can be recreated using words or text shaped like its wings, beak, legs, head, and body, resulting in a 3D representation of the bird purely composed of text. This can serve as an art piece or a learning tool in the form of a 3D text toy. These 3D text objects or toys can be crafted using natural materials such as leaves, twigs, strings, or ropes, or they can be made from various physical materials using traditional crafting tools. Digital versions of these objects can be created using 2D or 3D software on devices like phones, laptops, iPads, or computers. To transform digital designs into physical objects, computerized machines such as CNC routers, laser cutters, and 3D printers can be utilized. Once the parts are printed or cut out, students can assemble the 3D texts by gluing them together, resulting in natural or everyday 3D text objects. These objects can be painted to create artistic pieces or text toys, and the addition of wheels can transform them into moving toys. One of the significant advantages of this visual and creative object-based learning process is that students not only learn words but also derive enjoyment from the process of creating, painting, and playing with these objects. The ownership and creation process further enhances comprehension and word retention. 
Moreover, for individuals with learning disabilities such as dyslexia, ADD (Attention Deficit Disorder), or other learning difficulties, the visual and haptic approach of 3D Text Toys can serve as an additional creative and personalized learning aid. The application of 3D Text Toys extends to both the English language and any other global written language. The adaptation and creative application may vary depending on the country, space, and native written language. Furthermore, the implementation of this visual and haptic learning tool can be tailored to teach foreign languages based on age level and comprehension requirements. In summary, this creative, haptic, and visual approach has the potential to serve as a global literacy tool.
Keywords: 3D text toys, creative, artistic, visual learning for world literacy
Procedia PDF Downloads 24
1351 Motion Effects of Arabic Typography on Screen-Based Media
Authors: Ibrahim Hassan
Abstract:
Motion typography is one of the most important types of display-based visual communication. Through digital display media, we can control the properties of the text (size, direction, thickness, color, etc.). The use of motion typography in visual communication has led it to take several forms. We need to refine the terminology and clarify the differences between these forms, because relying on the term motion typography, which is a general term, is not enough to distinguish the different communicative functions of moving text. In this paper, we discuss the different effects of motion typography on Arabic writing and how harmony can be achieved between the movement and the letterform, and, in the course of our experiments, we present a new type of text movement.
Keywords: Arabic typography, motion typography, kinetic typography, fluid typography, temporal typography
Procedia PDF Downloads 112
1350 Structuring the Role of Indonesia's Dilemma Position in ASEAN to Combat Human Trafficking
Authors: Febi Eka Putri, Prabowo Anggorono
Abstract:
Human trafficking has become a global threat, including in Indonesia, a country that has adopted democracy to uphold human rights values. Indonesia is classified as a source country for trafficking in persons, which is dominated by the trafficking of women and children for sexual exploitation and forced labor. Indonesia has committed to combating trafficking in persons by enacting domestic law that criminalizes all types of human trafficking at the domestic and international levels. These efforts should not be oversimplified; nevertheless, in 2016 Indonesia was placed as a Tier 2 country because the government did not fully meet the minimum standards of the U.S. Trafficking Victims Protection Act, although it was making efforts toward progress. As an ASEAN member, Indonesia has signed the ASEAN Human Rights Declaration, but when it comes to human trafficking, only a few ASEAN members, such as Singapore, Cambodia, and Thailand, have ratified the ASEAN Convention on Trafficking in Persons, in particular Women and Children. This provides the basis for structuring Indonesia's role in combating human trafficking.
Keywords: Indonesia, Association of Southeast Asian Nations (ASEAN), human trafficking, Tier 2 country
Procedia PDF Downloads 317
1349 Recognition of Grocery Products in Images Captured by Cellular Phones
Authors: Farshideh Einsele, Hassan Foroosh
Abstract:
In this paper, we present a robust algorithm to recognize text extracted from grocery product images captured by mobile phone cameras. Recognition of such text is challenging since text in grocery product images varies in size, orientation, style, and illumination and can suffer from perspective distortion. Pre-processing is performed to make the characters scale and rotation invariant. Since text degradations cannot be appropriately defined using well-known geometric transformations such as translation, rotation, affine transformation, and shearing, we use the whole character's black pixels as our feature vector. Classification is performed with a minimum distance classifier using the maximum likelihood criterion, which delivers a very promising Character Recognition Rate (CRR) of 89%. We achieve a considerably higher Word Recognition Rate (WRR) of 99% when using lower-level linguistic knowledge about product words during the recognition process.
Keywords: camera-based OCR, feature extraction, document, image processing, grocery products
Procedia PDF Downloads 364
1348 Pragmatic Survey of Precedence as Linguistic 'Déjà Vu' in Political Text and Talk
Authors: Zarine Avetisyan
Abstract:
In both language and literature there exists a theory of the recurrence of chunks of text and talk, which brings us to the notion of precedence. Precedence as a pragma-linguistic phenomenon is still little known, and the main objective of the present research is to revisit it and reveal it thoroughly. In line with this objective, the analysis of political text and talk provides abundant relevant data to illustrate the phenomenon of precedence. The analysis focuses on certain pragmatic universals (e.g. intention) and categories (e.g. speech techniques) which lead to the disclosure of the present object of study.
Keywords: intention, precedence, political discourse, pragmatic universals
Procedia PDF Downloads 385
1347 Automatic Assignment of Geminate and Epenthetic Vowel for Amharic Text-to-Speech System
Authors: Tadesse Anberbir, Felix Bankole, Tomio Takara, Girma Mamo
Abstract:
In the development of a text-to-speech synthesizer, automatic derivation of correct pronunciation from the grapheme form of a text is a central problem. In particular, deriving phonological features that are not shown in orthography is challenging. In the Amharic language, geminates and epenthetic vowels are crucial for proper pronunciation, but neither is shown in orthography. In this paper, we propose and integrate a morphological analyzer into an Amharic Text-to-Speech system, mainly to predict geminates and epenthetic vowel positions, and prepare a duration modeling method. The Amharic Text-to-Speech system (AmhTTS) is a parametric and rule-based system that adopts a cepstral method and uses a source-filter model for speech production and a Log Magnitude Approximation (LMA) filter as the vocal tract filter. The naturalness of the system after employing the duration modeling was evaluated by a sentence listening test, and we achieved an average Mean Opinion Score (MOS) of 3.4 (68%), which is moderate. By modeling the duration of geminates and controlling the locations of epenthetic vowels, we are able to synthesize good quality speech. Our system is mainly suitable to be customized for other Ethiopian languages with limited resources.
Keywords: Amharic, gemination, speech synthesis, morphology, epenthesis
Procedia PDF Downloads 39
1346 Part of Speech Tagging Using Statistical Approach for Nepali Text
Authors: Archit Yajnik
Abstract:
Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using a Hidden Markov Model (HMM) and the Viterbi algorithm. Training and testing data sets are randomly separated from an annotated Nepali text corpus. Both methods are employed on the data sets. The Viterbi algorithm is found to be computationally faster and more accurate than the HMM. An accuracy of 95.43% is achieved using the Viterbi algorithm. An error analysis of where the mismatches took place is discussed in detail.
Keywords: hidden Markov model, natural language processing, POS tagging, Viterbi algorithm
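As an illustration of the kind of decoding this abstract refers to, the following is a minimal sketch of Viterbi decoding over a first-order HMM tagger. It is not the authors' implementation: the function name, the two-tag tag set, the English toy vocabulary, and all probabilities are assumptions made up for the example, and unseen events are simply floored to a small constant rather than smoothed.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for `words` under a first-order HMM."""
    if not words:
        return []

    def lp(p):
        # Log-probability with a floor so that unseen events do not give log(0).
        return math.log(max(p, floor))

    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{t: (lp(start_p[t]) + lp(emit_p[t].get(words[0], 0.0)), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[-1][p][0] + lp(trans_p[p][t])) for p in tags),
                key=lambda item: item[1],
            )
            column[t] = (score + lp(emit_p[t].get(word, 0.0)), prev)
        best.append(column)

    # Backtrack from the best final tag to recover the whole path.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical toy model (invented numbers, for illustration only).
TAGS = ["NOUN", "VERB"]
START_P = {"NOUN": 0.8, "VERB": 0.2}
TRANS_P = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.6, "VERB": 0.4},
}
EMIT_P = {
    "NOUN": {"dogs": 0.6, "bark": 0.1, "cats": 0.3},
    "VERB": {"bark": 0.7, "chase": 0.3},
}

print(viterbi(["dogs", "chase", "cats"], TAGS, START_P, TRANS_P, EMIT_P))
```

In a real tagger, the start, transition, and emission tables would be estimated from the annotated training corpus, and working in log space, as here, avoids numeric underflow on long sentences.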
Procedia PDF Downloads 295
1345 Deep Learning Based-Object-classes Semantic Classification of Arabic Texts
Authors: Imen Elleuch, Wael Ouarda, Gargouri Bilel
Abstract:
In this paper, we propose a deep learning based approach to classify text in order to enrich an Arabic ontology based on the object classes of Gaston Gross. These object classes are defined by taking into account the syntactic and semantic features of the treated language. Our proposed approach is therefore a hybrid one. On the one hand, it is based on the object classes, which represent a knowledge-based approach to the classification of text; on the other hand, it uses a deep learning approach based on word embeddings to classify text. We applied our proposed approach to a corpus constructed from an Arabic dictionary. The obtained semantic classification of text will enrich the Arabic object classes ontology: new classes can be added to the ontology, or the features that characterize each object class can be expanded. The obtained results are compared to a similar work that treats the same object with a classical linguistic approach to the semantic classification of text. This comparison highlights our proposed hybrid approach, which can be improved by broadening the dataset used in the deep learning process.
Keywords: deep-learning approach, object-classes, semantic classification, Arabic
Procedia PDF Downloads 31
1344 Electric Field-Induced Deformation of Particle-Laden Drops and Structuring of Surface Particles
Authors: Alexander Mikkelsen, Khobaib Khobaib, Zbigniew Rozynek
Abstract:
Drops covered by particles have found important uses in various fields, ranging from stabilization of emulsions to production of new advanced materials. Particles at drop interfaces can be interlocked to form solid capsules with properties tailored for a myriad of applications. Despite the huge potential of particle-laden drops and capsules, knowledge of their deformation and stability is limited. In this regard, we contribute experimental studies on the deformation and manipulation of silicone oil drops covered with micrometer-sized particles and subjected to electric fields. A mixture of silicone oil and particles was immersed in castor oil using a mechanical pipette, forming millimeter-sized drops. The particles moved to and adsorbed at the drop interfaces by sedimentation and were structured at the interface by electric field-induced electrohydrodynamic flows. When a direct current electric field was applied, free charges accumulated at the drop interfaces, yielding electric stress that deformed the drops. In our experiments, we investigated how particle properties affected drop deformation, break-up, and particle structuring. We found that by increasing the size of weakly conductive clay particles, the drop shape can go from compressed to stretched out in the direction of the electric field. Increasing the particle size and electrical properties was also found to weaken electrohydrodynamic flows, induce break-up of drops at weaker electric field strengths, and structure particles in chains. These particle parameters determine the dipolar force between the interfacial particles, which can yield particle chaining. We conclude that the balance between particle chaining and electrohydrodynamic flows governs the observed drop mechanics.
Keywords: drop deformation, electric field induced stress, electrohydrodynamic flows, particle structuring at drop interfaces
Procedia PDF Downloads 165
1343 Towards a Deconstructive Text: Beyond Language and the Politics of Absences in Samuel Beckett’s Waiting for Godot
Authors: Afia Shahid
Abstract:
The writing of Samuel Beckett is associated with meaning in meaninglessness and the production of what he calls ‘literature of unword’. The casual escape from the world of words in the form of silences and pauses in his play Waiting for Godot prompts questions about their existence and ultimately leads us to investigate the theory behind their use in the play. This paper proposes that these absences (silence and pause) in Beckett’s play force us to think ‘beyond’ language. This paper asks how silence and pause in Beckett’s text speak for the emergence of the poststructuralist text. It aims to identify the significant features of the philosophy of deconstruction in Beckett’s play in order to demystify the hostile complicity between literature and philosophy. Within the interpretive paradigm of poststructuralism, this research treats the text as its research data. It attempts to delineate the relationship between poststructuralist theoretical concerns and Beckett’s text. Keeping in view the theoretical concerns of the poststructuralist theorist Jacques Derrida, the discussion is directed towards the notion of going ‘beyond’ language into the absences that aim at silencing the existing discourse with the ‘radical irony’ of this anti-formal art, which contains its own denial and thus represents the idea of ceaseless questioning and radical contradiction in art and in any text. This article asks how Beckett’s text vibrates with loud silence and disrupts language to demonstrate the emptiness of words, thus exploring the limitless void of absences. Beckett’s text resonates with silence and pause that are neither negation nor affirmation but rather a poststructuralist suspension of reality that is ever changing with the undecidability of all meanings. Within the theoretical notion of Derrida’s Différance, this study interprets silence and pause in Beckett’s art. The silence and pause behave like Derrida’s Différance and question their own existence in the text in order to deconstruct any definiteness and finality of reality and to extend an undecidable threshold of poststructuralism that aims to evade the ‘labyrinth of language’.
Keywords: Différance, language, pause, poststructuralism, silence, text
Procedia PDF Downloads 166
1342 The Platform for Digitization of Georgian Documents
Authors: Erekle Magradze, Davit Soselia, Levan Shughliashvili, Irakli Koberidze, Shota Tsiskaridze, Victor Kakhniashvili, Tamar Chaghiashvili
Abstract:
Since active publishing began in Georgia, a large volume of printed material has accumulated, and its digitization is an important task. Digitized materials will be available to the public, and it will be possible to search their text and conduct various kinds of factual research. Digitization here means scanning documents, extracting text from the scanned documents, and processing the text with a corresponding language model to detect inaccuracies and grammatical errors. Implementing these stages requires a unified, scalable, and automated platform in which the digital service developed for each stage performs the task assigned to it; at the same time, it must be possible to develop these services dynamically so that the work of the platform is not interrupted.
Keywords: NLP, OCR, BERT, Kubernetes, transformers
Procedia PDF Downloads 104
1341 Structuring and Visualizing Healthcare Claims Data Using Systems Architecture Methodology
Authors: Inas S. Khayal, Weiping Zhou, Jonathan Skinner
Abstract:
Healthcare delivery systems around the world are in crisis. The need to improve health outcomes while decreasing healthcare costs has led to an urgent call to action to transform the healthcare delivery system. While Bioinformatics and Biomedical Engineering have primarily focused on biological-level data and biomedical technology, there is clear evidence of the importance of the delivery of care to patient outcomes. Classic singular decomposition approaches from reductionist science are not capable of explaining complex systems. Approaches and methods from systems science and systems engineering are utilized to structure healthcare delivery system data. Specifically, systems architecture is used to develop a multi-scale and multi-dimensional characterization of the healthcare delivery system, defined here as the Healthcare Delivery System Knowledge Base. This paper is the first to contribute a new method of structuring and visualizing a multi-dimensional and multi-scale healthcare delivery system using systems architecture in order to better understand healthcare delivery.
Keywords: health informatics, systems thinking, systems architecture, healthcare delivery system, data analytics
Procedia PDF Downloads 307