Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 1462

Search results for: text preprocessing

952 Web Data Scraping Technology Using Term Frequency Inverse Document Frequency to Enhance the Big Data Quality on Sentiment Analysis

Authors: Sangita Pokhrel, Nalinda Somasiri, Rebecca Jeyavadhanam, Swathi Ganesan

Abstract:

Tourism is a booming industry with huge future potential for global wealth and employment. Vast amounts of data are generated on social media sites every day, creating numerous opportunities to bring more insights to decision-makers. The integration of Big Data technology into the tourism industry will allow companies to determine where their customers have been and what they like. This information can then be used by businesses, such as those in charge of managing visitor centers or hotels, and tourists can get a clear idea of places before visiting. From a technical perspective, natural language is processed by analysing the sentiment features of online reviews from tourists, and we then supply an enhanced long short-term memory (LSTM) framework for sentiment feature extraction of travel reviews. We have constructed a web review database using a crawler and web scraping technique for experimental validation to evaluate the effectiveness of our methodology. The sentences were first classified with the VADER and RoBERTa models to obtain the polarity of the reviews. In this paper, we have studied feature extraction methods, such as Count Vectorization and TF-IDF Vectorization, and implemented a Convolutional Neural Network (CNN) classifier for sentiment analysis to decide whether the tourist’s attitude towards the destinations is positive, negative, or simply neutral based on the review text that they posted online. The results demonstrated that, after pre-processing and cleaning the dataset, the CNN algorithm achieved an accuracy of 96.12% for the positive and negative sentiment analysis.
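
The abstract names TF-IDF vectorization as one of its feature extraction methods but does not say which variant was used. The following is a minimal sketch only, assuming whitespace tokenisation, lowercasing, and a smoothed inverse document frequency; the function name `tf_idf` and the sample reviews are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumptions (not from the paper): whitespace tokenisation, lowercasing,
    relative term frequency, and the smoothed IDF log((1 + N) / (1 + df)) + 1.
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


# Made-up reviews, for illustration only.
reviews = ["great beach great food", "terrible food", "great view"]
scores = tf_idf(reviews)
```

Each dictionary in `scores` can then be turned into a fixed-length vector over a shared vocabulary before it is passed to a classifier.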

Keywords: count vectorization, convolutional neural network, crawler, data technology, long short-term memory, web scraping, sentiment analysis

Procedia PDF Downloads 88

951 Emotions Triggered by Children’s Literature Images

Authors: Ana Maria Reis d'Azevedo Breda, Catarina Maria Neto da Cruz

Abstract:

The role of images/illustrations in communicating meanings and triggering emotions assumes an increasingly relevant role in contemporary texts, regardless of the age group for which they are intended or the nature of the texts that host them. It is no coincidence that children's books are full of illustrations and that the image/text ratio decreases as the age group grows. The vast majority of children's books can be considered multimodal texts containing text and images/illustrations interacting with each other to provide the young reader with a broader and more creative understanding of the book's narrative. This interaction is very diverse, ranging from images/illustrations that are not essential for understanding the storytelling to those that contribute significantly to the meaning of the story. Usually, these books are also read by adults, namely by parents, educators, and teachers who act as mediators between the book and the children, explaining aspects that are or seem to be too complex for the child's context. It should be noted that there are books labeled as children's books that are clearly intended for both children and adults. In this work, following a qualitative and interpretative methodology based on written productions, participant observation, and field notes, we will describe the perceptions of future teachers of the 1st cycle of basic education, attending a master's degree at a Portuguese university, about the role of the image in literary and non-literary texts, namely in mathematical texts, and how these can constitute precious resources for emotional regulation and for the design of creative didactic situations. 
The analysis of the collected data allowed us to obtain evidence regarding the evolution of the participants' perception regarding the crucial role of images in children's literature, not only as an emotional regulator for young readers but also as a creative source for the design of meaningful didactical situations, crossing other scientific areas, other than the mother tongue, namely mathematics.

Keywords: children’s literature, emotions, multimodal texts, soft skills

Procedia PDF Downloads 94

950 Multi-source Question Answering Framework Using Transformers for Attribute Extraction

Authors: Prashanth Pillai, Purnaprajna Mangsuli

Abstract:

Oil exploration and production companies invest considerable time and effort to extract essential well attributes (like well status, surface and target coordinates, wellbore depths, event timelines, etc.) from unstructured data sources like technical reports, which are often non-standardized, multimodal, and highly domain-specific by nature. It is also important to consider the context when extracting attribute values from reports that contain information on multiple wells/wellbores. Moreover, semantically similar information may often be depicted in different data syntax representations across multiple pages and document sources. We propose a hierarchical multi-source fact extraction workflow based on a deep learning framework to extract essential well attributes at scale. An information retrieval module based on the transformer architecture was used to rank relevant pages in a document source utilizing the page image embeddings and semantic text embeddings. A question answering framework utilizing the LayoutLM transformer was used to extract attribute-value pairs incorporating the text semantics and layout information from the top relevant pages in a document. To better handle context while dealing with multi-well reports, we incorporate a dynamic query generation module to resolve ambiguities. The extracted attribute information from various pages and documents is standardized to a common representation using a parser module to facilitate information comparison and aggregation. Finally, we use a probabilistic approach to fuse information extracted from multiple sources into a coherent well record. The applicability of the proposed approach and related performance was studied on several real-life well technical reports.

Keywords: natural language processing, deep learning, transformers, information retrieval

Procedia PDF Downloads 193

949 Experiences Using Autoethnography as a Methodology for Research in Education

Authors: Sarah Amodeo

Abstract:

Drawing on the author’s research about the experiences of female immigrant students in academic Adult Education in Montreal, Quebec, this paper deconstructs the benefits of autoethnography as a methodology for educators in Adult Education. Autoethnography is an advantageous methodology for teachers in Adult Education because it allows for deep engagement: educators can reflect on student experiences and their day-to-day realities, which in turn supports professional development, improved andragogy, and changes to classroom practices. Autoethnography is a qualitative research methodology that cultivates strategies for improving adult learning. The paper begins by outlining the context that inspired autoethnography for the author’s work, highlighting the emergence of autoethnography as a method, while examining how it is evolving and drawing on foundational work that continues to inspire research. The basic autoethnographic methodologies that are explored in this paper include the use of memory work in episode formation, the use of personal photographs, and textual readings of artworks. Memory work allows the researcher to use their professional experience and the lived/shared experiences of their students in their research, drawing on episodes from their past. Personal photographs and descriptions of artwork allow researchers to explore images of learning environments/realities in ways that complement student experiences. Major findings of the text are examined through the analysis of categories of autoethnography. Specific categories include realism, impressionism, and conceptualism, which aid in orienting the analysis and the emergent themes that develop through self-study. Finally, the text presents a discussion of the limitations of autoethnography, with attention to trustworthiness and ethical issues. 
The paper concludes with a consideration of the implications of autoethnography for adult educators in juxtaposition with youth sector work.

Keywords: artwork, autoethnography, conceptualism, episode formation, impressionism, memory work, personal photographs, realism

Procedia PDF Downloads 195

948 Morphology Operation and Discrete Wavelet Transform for Blood Vessels Segmentation in Retina Fundus

Authors: Rita Magdalena, N. K. Caecar Pratiwi, Yunendah Nur Fuadah, Sofia Saidah, Bima Sakti

Abstract:

Vessel segmentation of the retinal fundus is important for biomedical sciences in diagnosing ailments related to the eye. Segmentation can make it easier for medical experts to assess the state of a retinal fundus image. Therefore, in this study, we designed software using MATLAB which enables the segmentation of the retinal blood vessels on retinal fundus images. There are two main steps in the process of segmentation. The first step is image preprocessing, which aims to improve the quality of the image so that it can be segmented optimally. The second step is the image segmentation itself, which extracts the retina’s blood vessels from the eye fundus image. The image segmentation methods analyzed in this study are Morphology Operation, Discrete Wavelet Transform and a combination of both. The data used in this project comprise 40 retinal images and 40 manually segmented images. After several testing scenarios, the average accuracy for the Morphology Operation method is 88.46% while for Discrete Wavelet Transform it is 89.28%. By combining the two methods, the average accuracy was increased to 89.53%. The result of this study is an image processing system that can segment the blood vessels in the retinal fundus with high accuracy and low computation time.
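
The abstract reports average accuracy against manually segmented images but does not define the metric. One plausible reading, sketched below as an assumption rather than the authors' definition, is pixel-wise agreement between a predicted binary vessel mask and the manual mask. The function name `pixel_accuracy` and the tiny masks are hypothetical.

```python
def pixel_accuracy(predicted, reference):
    """Fraction of pixels on which two binary masks agree.

    Both masks are lists of rows holding 0 (background) or 1 (vessel).
    This pixel-wise definition is an assumption; the abstract does not
    state how accuracy was computed.
    """
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of rows")
    matches = 0
    total = 0
    for pred_row, ref_row in zip(predicted, reference):
        if len(pred_row) != len(ref_row):
            raise ValueError("mask rows must have the same length")
        matches += sum(1 for p, r in zip(pred_row, ref_row) if p == r)
        total += len(ref_row)
    return matches / total


# Tiny made-up masks, for illustration only.
predicted_mask = [[1, 0], [0, 1]]
manual_mask = [[1, 0], [1, 1]]
accuracy = pixel_accuracy(predicted_mask, manual_mask)
```

Averaging this value over all image pairs would give a dataset-level figure comparable in form to the percentages quoted above.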

Keywords: discrete wavelet transform, fundus retina, morphology operation, segmentation, vessel

Procedia PDF Downloads 197

947 Brain Tumor Detection and Classification Using Pre-Trained Deep Learning Models

Authors: Aditya Karade, Sharada Falane, Dhananjay Deshmukh, Vijaykumar Mantri

Abstract:

Brain tumours pose a significant challenge in healthcare due to their complex nature and impact on patient outcomes. The application of deep learning (DL) algorithms in medical imaging has shown promise in accurate and efficient brain tumour detection. This paper explores the performance of various pre-trained DL models (ResNet50, Xception, InceptionV3, EfficientNetB0, DenseNet121, NASNetMobile, VGG19, VGG16, and MobileNet) on a brain tumour dataset sourced from Figshare. The dataset consists of MRI scans categorized into different types of brain tumour, including meningioma, pituitary, and glioma, as well as no tumour. The study involves a comprehensive evaluation of these models’ accuracy and effectiveness in classifying brain tumour images. Data preprocessing, augmentation, and fine-tuning techniques are employed to optimize model performance. Among the evaluated deep learning models for brain tumour detection, ResNet50 emerges as the top performer with an accuracy of 98.86%. Following closely is Xception, exhibiting a strong accuracy of 97.33%. These models showcase robust capabilities in accurately classifying brain tumour images. On the other end of the spectrum, VGG16 trails with the lowest accuracy at 89.02%.

Keywords: brain tumour, MRI image, detecting and classifying tumour, pre-trained models, transfer learning, image segmentation, data augmentation

Procedia PDF Downloads 74

946 The Prevalence of Organized Retail Crime in Riyadh, Saudi Arabia

Authors: Saleh Dabil

Abstract:

This study investigates the extent of organized retail crime in supermarkets in Riyadh, Saudi Arabia. Store managers, security managers and general employees were asked about the types of retail crime that occur in their stores. Three independent variables were related to reports of organized retail theft: (1) the supermarket profile (volume, location, standard and type of store), (2) the social and physical environment of the store (maintenance, cleanliness and overall organizational cooperation), and (3) the security techniques and electronic loss-prevention techniques used. The theoretical framework of this study is based on social disorganization theory. The study concluded that organized retail theft is moderately apparent in Riyadh stores. The general results showed that the store environment has an effect on the prevalence of organized retail theft in relation to the gender of thieves, age groups, working shift, type of stolen items and the number of thieves in one case. Among other factors, organized theft is influenced by the economic pressure on customers, which depends on the location of the store. How stores deal with theft was also investigated to give a clear picture of their response to organized retail theft. The results showed that thieves were mostly sent away without any action and were sometimes given a written warning; very few cases were dealt with by the police. Other factors are discussed in the full text. To address the problem of organized theft, this study suggests, first, ‘the well distributing of the duties and responsibilities between the employees especially for security purposes’; second, ‘installation of strong security system’ and ‘making well-designed store layout’; and third, ‘giving training for general employees’ and ‘to give periodically security skills training of employees’. Other suggestions are given in the full text.

Keywords: organized crime, retail, theft, loss prevention, store environment

Procedia PDF Downloads 198

945 Archaeological Study of Statues of King Thutmosis III from Luxor

Authors: Mahmoud Abualsoud

Abstract:

The era of Thutmosis III represents a transitional period between Thutmoside art and the Amarna period, so we intend to show that it serves as the cradle of Amarna art. The study will examine the statues of King Thutmosis III that were discovered in Luxor by an Egyptian mission. These statues have been transferred to the Conservation Center of the Grand Egyptian Museum (GEM) to be conserved and made ready for display at the new museum (the project of the century). We focus on three statues chosen because they relate to different years of the king's reign. These statues were all made of granite. The first is a kneeling statue representing the god Amun showing King Thutmosis III offering to the goddess Hathor. The second is decorated with King Thutmosis III wearing the red crown, between the goddess Hathor and the royal wife, Nefertari. The third shows the king offering NW vessels and bread to the god Seker. Each statue is divided into registers containing a description and decorated with scenes of the king presenting offerings to gods. The proposed study will focus on the development that happened sequentially, according to the differences that occur in each statue. We will use comparative research to determine the workshops of these statues, whether one or several, and the distinguishing features of each one. We will examine what innovations the artisans added to royal art. The description and the texts will be translated with linguistic comments. Paleographic information found on these objects includes the names and titles of the king. This research focuses on text analyses and technology. The study aims to create a manual that may help in dating the artwork of Thutmosis III. This research will be beneficial for the study of heritage and ancient civilizations, particularly in the context of opening museums like the Grand Egyptian Museum, which will exhibit a collection of statues. 
Indeed, this kind of study will open a new destination in order to know how to identify these collections and how to exhibit them commensurate with the nature of ancient Egyptian history and heritage.

Keywords: archaeological study, Giza, new kingdom, statues, royal art

Procedia PDF Downloads 71

944 PsyVBot: Chatbot for Accurate Depression Diagnosis using Long Short-Term Memory and NLP

Authors: Thaveesha Dheerasekera, Dileeka Sandamali Alwis

Abstract:

The escalating prevalence of mental health issues, such as depression and suicidal ideation, is a matter of significant global concern. It is plausible that a variety of factors, such as life events, social isolation, and preexisting physiological or psychological health conditions, could instigate or exacerbate these conditions. Traditional approaches to diagnosing depression entail a considerable amount of time and necessitate the involvement of adept practitioners. This underscores the necessity for automated systems capable of promptly detecting and diagnosing symptoms of depression. The PsyVBot system employs sophisticated natural language processing and machine learning methodologies, including the use of the NLTK toolkit for dataset preprocessing and the utilization of a Long Short-Term Memory (LSTM) model. The PsyVBot exhibits a remarkable ability to diagnose depression with a 94% accuracy rate through the analysis of user input. Consequently, this resource proves to be efficacious for individuals, particularly those enrolled in academic institutions, who may encounter challenges pertaining to their psychological well-being. The PsyVBot employs a Long Short-Term Memory (LSTM) model that comprises a total of three layers, namely an embedding layer, an LSTM layer, and a dense layer. The stratification of these layers facilitates a precise examination of linguistic patterns that are associated with the condition of depression. The PsyVBot has the capability to accurately assess an individual's level of depression through the identification of linguistic and contextual cues. The task is achieved via a rigorous training regimen, which is executed by utilizing a dataset comprising information sourced from the subreddit r/SuicideWatch. The diverse data present in the dataset ensures precise and delicate identification of symptoms linked with depression, thereby guaranteeing accuracy. 
PsyVBot not only possesses diagnostic capabilities but also enhances the user experience through the utilization of audio outputs. This feature enables users to engage in more captivating and interactive interactions. The PsyVBot platform offers individuals the opportunity to conveniently diagnose mental health challenges through a confidential and user-friendly interface. Regarding the advancement of PsyVBot, maintaining user confidentiality and upholding ethical principles are of paramount significance. It is imperative to note that diligent efforts are undertaken to adhere to ethical standards, thereby safeguarding the confidentiality of user information and ensuring its security. Moreover, the chatbot fosters a conducive atmosphere that is supportive and compassionate, thereby promoting psychological welfare. In brief, PsyVBot is an automated conversational agent that utilizes an LSTM model to assess the level of depression in accordance with the input provided by the user. The demonstrated accuracy rate of 94% serves as a promising indication of the potential efficacy of employing natural language processing and machine learning techniques in tackling challenges associated with mental health. The reliability of PsyVBot is further improved by the fact that it makes use of the Reddit dataset and incorporates Natural Language Toolkit (NLTK) for preprocessing. PsyVBot represents a pioneering and user-centric solution that furnishes an easily accessible and confidential medium for seeking assistance. The present platform is offered as a modality to tackle the pervasive issue of depression and the contemplation of suicide.

Keywords: chatbot, depression diagnosis, LSTM model, natural language processing

Procedia PDF Downloads 71

943 Online Handwritten Character Recognition for South Indian Scripts Using Support Vector Machines

Authors: Steffy Maria Joseph, Abdu Rahiman V, Abdul Hameed K. M.

Abstract:

Online handwritten character recognition is a challenging field in Artificial Intelligence. The classification success rate of current techniques decreases when the dataset involves similarity and complexity in stroke styles, number of strokes and variations in stroke characteristics. Malayalam is a complex South Indian language spoken by about 35 million people, especially in Kerala and the Lakshadweep islands. In this paper, we consider significant feature extraction for the similar stroke styles of Malayalam. The extracted feature set is also suitable for the recognition of other handwritten South Indian languages like Tamil, Telugu and Kannada. A classification scheme based on support vector machines (SVM) is proposed to improve the accuracy in classification and recognition of online Malayalam handwritten characters. SVM classifiers are well suited to real-world applications. The contribution of various features towards the recognition accuracy is analysed. The performance of different SVM kernels is also studied. A graphical user interface has been developed for reading and displaying the character. Different writing styles are taken for each of the 44 alphabets. Various features are extracted and used for classification after the preprocessing of input data samples. The highest recognition accuracy of 97% is obtained experimentally with the best feature combination and a polynomial kernel in the SVM.
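
The best result above is reported with a polynomial kernel. As a reminder of what that kernel computes, here is a minimal sketch of the standard form K(x, y) = (gamma * x·y + coef0)^degree. The parameter names follow common convention, and the default values and sample vectors are assumptions; the abstract does not give the settings used.

```python
def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Standard polynomial kernel: (gamma * <x, y> + coef0) ** degree.

    The default degree, gamma and coef0 are illustrative assumptions,
    not values reported in the abstract.
    """
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def gram_matrix(samples, **kernel_params):
    """Pairwise kernel values between all feature vectors."""
    return [[polynomial_kernel(a, b, **kernel_params) for b in samples]
            for a in samples]


# Made-up two-dimensional feature vectors, for illustration only.
features = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]
gram = gram_matrix(features, degree=2)
```

An SVM trained with this kernel operates on such pairwise values rather than on the raw stroke features directly.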

Keywords: SVM, MATLAB, Malayalam, South Indian scripts, online handwritten character recognition

Procedia PDF Downloads 576

942 Visualization of Taiwan's Religious Social Networking Sites

Authors: Jia-Jane Shuai

Abstract:

This research aims to improve understanding of the nature of online religion by examining religious social websites. It asks what motivates individual users to use online religious social websites and which factors affect those motivations. We survey various online religious social websites provided by different religions, especially the Taiwanese folk religion. Based on the theory of Content Analysis and Social Network Analysis, religious social websites and religious web activities are examined. This research examined the presentation and contents of folk religion websites that promote the religious use of the Internet in Taiwan. Differences among religions and religious websites are also compared. First, this study used keywords to examine what types of messages gained the most “Like” clicks, “Share” clicks and comments on Facebook. Dividing the messages into four media types, namely text, link, video, and photo, reveals which category receives more likes and comments than the others. Meanwhile, this study analyzed the five dialogic principles of religious websites accessed from mobile phones and also assessed their mobile readiness. Using the five principles of dialogic theory as a basis, we conducted a general survey of websites with elements of online religion. Second, the project analyzed the characteristics of Taiwanese participants in online religious activities. Grounded in social network analysis and text mining, this study comparatively explores the network structure, interaction pattern, and geographic distribution of users involved in communication networks of the folk religion on social websites and mobile sites. We studied the linkage preference of different religious groups. We examined the reasons for the success of these websites, as well as the reasons why young users accept new religious media. 
The outcome of the research will be useful for online religious service providers and non-profit organizations to manage social websites and internet marketing.

Keywords: content analysis, online religion, social network analysis, social websites

Procedia PDF Downloads 169

941 Integrating Natural Language Processing (NLP) and Machine Learning in Lung Cancer Diagnosis

Authors: Mehrnaz Mostafavi

Abstract:

The assessment and categorization of incidental lung nodules present a considerable challenge in healthcare, often necessitating resource-intensive multiple computed tomography (CT) scans for growth confirmation. This research addresses this issue by introducing a distinct computational approach leveraging radiomics and deep-learning methods. However, understanding local services is essential before implementing these advancements. With diverse tracking methods in place, there is a need for efficient and accurate identification approaches, especially in the context of managing lung nodules alongside pre-existing cancer scenarios. This study explores the integration of text-based algorithms in medical data curation, indicating their efficacy in conjunction with machine learning and deep-learning models for identifying lung nodules. Combining medical images with text data has demonstrated superior data retrieval compared to using each modality independently. While deep learning and text analysis show potential in detecting previously missed nodules, challenges persist, such as increased false positives. The presented research introduces a Structured-Query-Language (SQL) algorithm designed for identifying pulmonary nodules in a tertiary cancer center, externally validated at another hospital. Leveraging natural language processing (NLP) and machine learning, the algorithm categorizes lung nodule reports based on sentence features, aiming to facilitate research and assess clinical pathways. The hypothesis posits that the algorithm can accurately identify lung nodule CT scans and predict concerning nodule features using machine-learning classifiers. Through a retrospective observational study spanning a decade, CT scan reports were collected, and an algorithm was developed to extract and classify data. Results underscore the complexity of lung nodule cohorts in cancer centers, emphasizing the importance of careful evaluation before assuming a metastatic origin. 
The SQL and NLP algorithms demonstrated high accuracy in identifying lung nodule sentences, indicating potential for local service evaluation and research dataset creation. Machine-learning models exhibited strong accuracy in predicting concerning changes in lung nodule scan reports. While limitations include variability in disease group attribution, the potential for correlation rather than causality in clinical findings, and the need for further external validation, the algorithm's accuracy and potential to support clinical decision-making and healthcare automation represent a significant stride in lung nodule management and research.
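
The abstract describes identifying lung nodule sentences in CT reports but does not publish the SQL or NLP rules. Purely as an illustration of sentence-level keyword screening, and not the authors' algorithm, the sketch below splits a report into sentences and keeps those that mention a nodule. The regular expressions, the function name `nodule_sentences`, and the sample report are all hypothetical.

```python
import re

# Hypothetical keyword pattern; the study's actual rules are not given.
NODULE_PATTERN = re.compile(r"\bnodules?\b", re.IGNORECASE)


def nodule_sentences(report):
    """Return the sentences of a free-text report that mention a nodule.

    Sentences are split naively on whitespace that follows '.', '!' or '?',
    which is an assumption that will mis-split abbreviations.
    """
    sentences = re.split(r"(?<=[.!?])\s+", report.strip())
    return [s for s in sentences if NODULE_PATTERN.search(s)]


# Made-up report text, for illustration only.
report = ("No pleural effusion. A 6 mm nodule is seen in the right upper lobe. "
          "Follow-up advised.")
hits = nodule_sentences(report)
```

Sentences selected this way could then be passed to a classifier, which is the role the abstract assigns to its machine-learning models.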

Keywords: lung cancer diagnosis, structured-query-language (SQL), natural language processing (NLP), machine learning, CT scans

Procedia PDF Downloads 103

940 Contextual Toxicity Detection with Data Augmentation

Authors: Julia Ive, Lucia Specia

Abstract:

Understanding and detecting toxicity is an important problem to support safer human interactions online. Our work focuses on the important problem of contextual toxicity detection, where automated classifiers are tasked with determining whether a short textual segment (usually a sentence) is toxic within its conversational context. We use “toxicity” as an umbrella term to denote a number of variants commonly named in the literature, including hate, abuse, offence, among others. Detecting toxicity in context is a non-trivial problem and has been addressed by very few previous studies. These previous studies have analysed the influence of conversational context in human perception of toxicity in controlled experiments and concluded that humans rarely change their judgements in the presence of context. They have also evaluated contextual detection models based on state-of-the-art Deep Learning and Natural Language Processing (NLP) techniques. Counterintuitively, they reached the general conclusion that computational models tend to suffer performance degradation in the presence of context. We challenge these empirical observations by devising better contextual predictive models that also rely on NLP data augmentation techniques to create larger and better data. In our study, we start by further analysing the human perception of toxicity in conversational data (i.e., tweets), in the absence versus presence of context, in this case, previous tweets in the same conversational thread. We observed that the conclusions of previous work on human perception are mainly due to data issues: The contextual data available does not provide sufficient evidence that context is indeed important (even for humans). The data problem is common in current toxicity datasets: cases labelled as toxic are either obviously toxic (i.e., overt toxicity with swear, racist, etc. 
words), and thus context is not needed for a decision, or are ambiguous, vague or unclear even in the presence of context; in addition, the data contains labelling inconsistencies. To address this problem, we propose to automatically generate contextual samples where toxicity is not obvious (i.e., covert cases) without context or where different contexts can lead to different toxicity judgements for the same tweet. We generate toxic and non-toxic utterances conditioned on the context or on target tweets using a range of techniques for controlled text generation (e.g., Generative Adversarial Networks and steering techniques). On the contextual detection models, we posit that their poor performance is due to limitations of both the data they are trained on (the same problems stated above) and the architectures they use, which are not able to leverage context in effective ways. To improve on that, we propose text classification architectures that take the hierarchy of conversational utterances into account. In experiments benchmarking our models against previous models on existing and automatically generated data, we show that both data and architectural choices are very important. Our model achieves substantial performance improvements compared to baselines that are non-contextual or contextual but agnostic of the conversation structure.

Keywords: contextual toxicity detection, data augmentation, hierarchical text classification models, natural language processing

Procedia PDF Downloads 171

939 Autonomous Vehicle Detection and Classification in High Resolution Satellite Imagery

Authors: Ali J. Ghandour, Houssam A. Krayem, Abedelkarim A. Jezzini

Abstract:

High-resolution satellite images and remote sensing can provide global information quickly compared to traditional methods of data collection. At such high resolution, a road is no longer a thin line, and objects such as cars and trees are easily identifiable. Automatic vehicle enumeration can be considered one of the most important applications in traffic management. In this paper, an autonomous vehicle detection and classification approach for highway environments is proposed. This approach consists mainly of three stages: (i) first, a set of preprocessing operations is applied, including soil, vegetation and water suppression. (ii) Then, road network detection and delineation is implemented using a built-up area index, followed by several morphological operations. This step plays an important role in increasing the overall detection accuracy since vehicle candidates are objects contained within the road networks only. (iii) Multi-level Otsu segmentation is implemented in the last stage, resulting in vehicle detection and classification, where detected vehicles are classified into cars and trucks. Accuracy assessment analysis is conducted over different study areas to show the efficiency of the proposed method, especially in highway environments.
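
Stage (iii) relies on multi-level Otsu segmentation. The sketch below shows only the single-threshold case, which picks the grey level that maximises between-class variance; the multi-level variant searches for several thresholds with the same criterion. It assumes integer pixel values in the range 0 to 255, and the function name and sample values are hypothetical rather than taken from the paper.

```python
def otsu_threshold(pixels, levels=256):
    """Single-threshold Otsu: maximise between-class variance.

    `pixels` is a flat list of integers in [0, levels). Pixels with a
    value <= threshold form one class, the rest form the other. This is
    the single-level case only, not the multi-level method in the paper.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight0 = 0   # number of pixels at or below the candidate threshold
    sum0 = 0      # sum of their grey levels
    for t in range(levels):
        weight0 += hist[t]
        sum0 += t * hist[t]
        weight1 = total - weight0
        if weight0 == 0:
            continue
        if weight1 == 0:
            break
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        variance = weight0 * weight1 * (mean0 - mean1) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, t
    return best_threshold


# Made-up bimodal data: dark road surface and bright vehicles.
sample = [10] * 50 + [200] * 50
threshold = otsu_threshold(sample)
```

With more than one threshold, the resulting intensity bands could be mapped to classes such as background, cars and trucks, although the abstract does not describe that mapping.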

Keywords: remote sensing, object identification, vehicle and road extraction, vehicle and road features-based classification

Procedia PDF Downloads 233

938 Evaluation of the Internal Quality for Pineapple Based on the Spectroscopy Approach and Neural Network

Authors: Nonlapun Meenil, Pisitpong Intarapong, Thitima Wongsheree, Pranchalee Samanpiboon

Abstract:

In Thailand, once pineapples are harvested, they must be classified into two classes based on their sweetness: sweet and unsweet. This paper studies the assessment of the internal quality of pineapples using a low-cost compact spectroscopy sensor and a Neural Network (NN). The experiments used Batavia pineapples, yielding 100 samples. The juice extracted from each sample was used to determine the Soluble Solid Content (SSC), on which the labeling into sweet and unsweet classes was based. In terms of experimental equipment, the sensor cover was specifically designed to hold the sensor and light source so as to read the reflectance at a depth of 5 mm from the pineapple flesh. Using the spectroscopy sensor, data on visible and near-infrared (Vis-NIR) reflectance were collected. The NN was used to classify the pineapple classes. Before the classification step, three preprocessing methods were applied: class balancing, data shuffling, and standardization. The 510 nm and 900 nm reflectance values of the middle parts of the pineapples were used as features of the NN. With the Sequential model and the ReLU activation function, 100% accuracy on the training set and 76.67% accuracy on the test set were achieved. These results indicate that a low-cost compact spectroscopy sensor can achieve favorable results in classifying the sweetness of the two classes of pineapples.
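
The three preprocessing steps named above (class balancing, data shuffling, standardization) can be sketched as follows. The abstract does not say how balancing was done, so random oversampling of the minority class is an assumption here, as are the function names and the sample values.

```python
import random
from statistics import mean, pstdev


def balance_and_shuffle(samples, labels, seed=0):
    """Oversample minority classes to the majority size, then shuffle.

    Oversampling is an assumed balancing strategy, not the paper's.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    pairs = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        pairs.extend((x, y) for x in xs + extra)
    rng.shuffle(pairs)
    return pairs


def standardize(rows):
    """Scale every feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical reflectance pairs (510 nm, 900 nm) with an unbalanced label set.
features = [[0.21, 0.55], [0.25, 0.60], [0.30, 0.52], [0.40, 0.70]]
labels = ["sweet", "sweet", "sweet", "unsweet"]
pairs = balance_and_shuffle(features, labels)
scaled = standardize([x for x, _ in pairs])
```

In practice the scaling statistics would usually be computed on the training split only and then reused for the test split, to avoid leaking test information.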

Keywords: neural network, pineapple, soluble solid content, spectroscopy

Procedia PDF Downloads 79

937 Regression of Hand Kinematics from Surface Electromyography Data Using a Long Short-Term Memory-Transformer Model

Authors: Anita Sadat Sadati Rostami, Reza Almasi Ghaleh

Abstract:

Surface electromyography (sEMG) offers important insights into muscle activation and has applications in fields including rehabilitation and human-computer interaction. The purpose of this work is to predict the degree of activation of two joints in the index finger using an LSTM-Transformer architecture trained on sEMG data from the Ninapro DB8 dataset. We apply advanced preprocessing techniques, such as multi-band filtering and customizable rectification methods, to enhance the encoding of sEMG data into features that are beneficial for regression tasks. The processed data is converted into spike patterns and simulated using Leaky Integrate-and-Fire (LIF) neuron models, allowing for neuromorphic-inspired processing. Our findings demonstrate that adjusting filtering parameters and neuron dynamics and employing the LSTM-Transformer model improves joint angle prediction performance. This study contributes to the ongoing development of deep learning frameworks for sEMG analysis, which could lead to improvements in motor control systems.
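
As a rough illustration of the Leaky Integrate-and-Fire (LIF) encoding mentioned above, the sketch below integrates an input signal with a forward-Euler step and emits a spike whenever the membrane potential crosses a threshold. All parameter values and the function name `lif_spikes` are assumptions; the abstract does not give the authors' neuron settings.

```python
def lif_spikes(current, dt=1e-3, tau=0.02, v_rest=0.0,
               v_thresh=1.0, v_reset=0.0, resistance=1.0):
    """Simulate one LIF neuron and return a 0/1 spike train.

    Integrates tau * dv/dt = -(v - v_rest) + resistance * i(t)
    with a forward-Euler step of size dt.
    """
    v = v_rest
    spikes = []
    for i_t in current:
        v += dt / tau * (-(v - v_rest) + resistance * i_t)
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset  # reset the membrane potential after a spike
        else:
            spikes.append(0)
    return spikes


# A stronger (hypothetical) rectified sEMG envelope produces a higher spike rate.
weak = lif_spikes([2.0] * 1000)
strong = lif_spikes([4.0] * 1000)
```

With these assumed parameters an input below the threshold (for example a constant 0.5) never produces a spike, because the potential settles at the input level; this is the sense in which adjusting neuron dynamics changes what information the spike pattern carries.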

Keywords: surface electromyography, LSTM-transformer, spiking neural networks, hand kinematics, leaky integrate-and-fire neuron, band-pass filtering, muscle activity decoding

Procedia PDF Downloads 18

936 Lexical Semantic Analysis to Support Ontology Modeling of Maintenance Activities – Case Study of Offshore Riser Integrity

Authors: Vahid Ebrahimipour

Abstract:

Word representation and context meaning of text-based documents play an essential role in knowledge modeling. Business procedures written in natural language are meant to store technical and engineering information, management decisions and operation experience during the production system life cycle. Context meaning representation is highly dependent upon word sense, lexical relativity, and semantic features of the argument. This paper proposes a method for lexical semantic analysis and context meaning representation of maintenance activity in a mass production system. Our straightforward lexical semantic approach analyzes the semantic and syntactic features of the context structure of a maintenance report to facilitate translation, interpretation, and conversion of human-readable text into a computer-readable and understandable representation with less heterogeneity and ambiguity. The methodology will enable users to obtain a representation format that maximizes shareability and accessibility for multi-purpose usage. It provides a contextualized structure to obtain a generic context model that can be utilized during the system life cycle. At first, it employs a co-occurrence-based clustering framework to recognize a group of highly frequent contextual features that correspond to a maintenance report text. Then the keywords are identified for syntactic and semantic extraction analysis. The analysis exercises causality-driven logic of keywords’ senses to divulge the structural and meaning dependency relationships between the words in a context. The output is a contextualized word representation of maintenance activity accommodating computer-based representation and inference using OWL/RDF.

Keywords: lexical semantic analysis, metadata modeling, contextual meaning extraction, ontology modeling, knowledge representation

Procedia PDF Downloads 105

935 Arabic Lexicon Learning to Analyze Sentiment in Microblogs

Authors: Mahmoud B. Rokaya

Abstract:

The study of opinion mining and sentiment analysis includes analysis of opinions, sentiments, evaluations, attitudes, and emotions. The rapid growth of social media, social networks, reviews, forum discussions, microblogs, and Twitter has led to a parallel growth in the field of sentiment analysis. The field of sentiment analysis tries to develop effective tools that make it possible to capture the trends of people. There are two approaches in the field: lexicon-based and corpus-based methods. A lexicon-based method uses a sentiment lexicon which includes sentiment words and phrases with assigned numeric scores. These scores reveal whether sentiment phrases are positive or negative, their intensity, and/or their emotional orientations. Creating lexicons manually is hard, which brings the need for adaptive automated methods for generating a lexicon. The proposed method generates dynamic lexicons based on the corpus and then classifies text using these lexicons. In the proposed method, different approaches are combined to generate lexicons from text. The proposed method classifies the tweets into 5 classes instead of only positive or negative classes. The sentiment classification problem is written as an optimization problem in which finding the optimum sentiment lexicons is the goal. The solution was produced based on mathematical programming approaches to find the best lexicon to classify texts. A genetic algorithm was written to find the optimal lexicon. Then, extraction of a meta-level feature was done based on the optimal lexicon. The experiments were conducted on several datasets. Results, in terms of accuracy, recall and F measure, outperformed the state-of-the-art methods proposed in the literature on some of the datasets. A better understanding of the Arabic language and culture of Arab Twitter users and the sentiment orientation of words in different contexts can be achieved based on the sentiment lexicons proposed by the algorithm.

Keywords: social media, Twitter sentiment, sentiment analysis, lexicon, genetic algorithm, evolutionary computation

Procedia PDF Downloads 190

934 Automatic Motion Trajectory Analysis for Dual Human Interaction Using Video Sequences

Authors: Yuan-Hsiang Chang, Pin-Chi Lin, Li-Der Jeng

Abstract:

Advances in image and video processing techniques have enabled the development of intelligent video surveillance systems. This study aimed to automatically detect moving human objects and to analyze events of dual human interaction in a surveillance scene. Our system was developed in four major steps: image preprocessing, human object detection, human object tracking, and motion trajectory analysis. Adaptive background subtraction and image processing techniques were used to detect and track moving human objects. To solve the occlusion problem during the interaction, the Kalman filter was used to retain a complete trajectory for each human object. Finally, the motion trajectory analysis was developed to distinguish between interaction and non-interaction events based on derivatives of trajectories related to the speed of the moving objects. Using a database of 60 video sequences, our system achieved classification accuracies of 80% in interaction events and 95% in non-interaction events, respectively. In summary, we have explored a system for the automatic classification of interaction and non-interaction events using surveillance cameras. Ultimately, this system could be incorporated in an intelligent surveillance system for the detection and/or classification of abnormal or criminal events (e.g., theft, snatching, fighting, etc.).
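
How a Kalman filter can keep a trajectory alive through an occlusion can be sketched with a one-dimensional constant-velocity model. This is an illustrative assumption, not the authors' tracker: the function name `track_1d`, the noise values `q` and `r`, and the use of `None` to mark occluded frames are all hypothetical. During occluded frames only the prediction step runs, so the object keeps moving at its last estimated velocity.

```python
def track_1d(measurements, dt=1.0, q=1e-3, r=1.0):
    """Constant-velocity Kalman filter over one coordinate.

    `measurements` holds observed positions; None marks an occluded frame.
    The first frame is assumed to be observed. Returns filtered positions.
    """
    pos, vel = float(measurements[0]), 0.0
    p00, p01, p10, p11 = 1.0, 0.0, 0.0, 1.0  # state covariance P
    track = [pos]
    for z in measurements[1:]:
        # Predict: x = F x and P = F P F^T + Q, with F = [[1, dt], [0, 1]].
        pos += vel * dt
        n00 = p00 + dt * (p10 + p01) + dt * dt * p11 + q
        n01 = p01 + dt * p11
        n10 = p10 + dt * p11
        n11 = p11 + q
        p00, p01, p10, p11 = n00, n01, n10, n11
        if z is not None:
            # Update with a position-only measurement (H = [1, 0]).
            s = p00 + r
            k0, k1 = p00 / s, p10 / s
            residual = z - pos
            pos += k0 * residual
            vel += k1 * residual
            p00, p01, p10, p11 = ((1 - k0) * p00, (1 - k0) * p01,
                                  p10 - k1 * p00, p11 - k1 * p01)
        track.append(pos)
    return track


# A person walks at a steady pace and is hidden in frames 5 and 6.
observed = [0, 1, 2, 3, 4, None, None, 7, 8]
trajectory = track_1d(observed)
```

A two-dimensional tracker would run the same recursion on both image coordinates; the speed-related derivatives mentioned in the abstract could then be approximated by differencing consecutive filtered positions.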

Keywords: motion detection, motion tracking, trajectory analysis, video surveillance

Procedia PDF Downloads 548

933 The Role of Smart Educational Aids in Learning Listening Among Pupils with Attention and Listening Problems

Authors: Sadeq Al Yaari, Muhammad Alkhunayn, Adham Al Yaari, Aayah Al Yaari, Montaha Al Yaari, Ayman Al Yaari, Sajedah Al Yaari, Fatehi Eissa

Abstract:

The recent rise of smart educational aids and the move away from traditional listening aids are leading to a fundamental shift in the way in which individuals with attention and listening problems (ALP) manipulate listening inputs and/or act appropriately to the spoken information presented to them. A total sample of twenty-six ALP pupils (m=20 and f=6) between 7 and 12 years old was selected from different strata based on gender, region and school. Of this sample, thirteen (10 males and 3 females) received the treatment in the form of smart classes provided with smart educational aids in a listening course that lasted for four months, while the others did not (they studied the same course with the same instructor but in an ordinary class). A pretest was administered to assess participants’ levels, and a posttest was given to evaluate their attention and listening comprehension performance, namely in phonetic and phonological tests with sociolinguistic themes that have been designed for this purpose. Test results were analyzed both psychoneurolinguistically and statistically. Results reveal a remarkable change in pupils’ listening behavior, with scores showing a significant difference in the performance of the experimental ALP group in the pretest compared to the posttest (pupils performed better at the pretest-posttest on phonetics than at the two tests on phonology). It is concluded that smart educational aids designed for listening skills not only help increase the listening command of pupils with ALP to understand what they listen to but also develop their interactive listening capability and, at the same rate, are responsible for increasing concentrated and in-depth listening capacity. In addition, ALP pupils become able to grasp the audio content of text recordings, including educational audio recordings, news, oral stories and tales, views, spiritual/religious text and general knowledge.
However, the pupils have not experienced individual smart audio-visual aids that connect listening to other language receptive and productive skills, which could be the future area of research.

Keywords: smart aids, attention, listening, problems

Procedia PDF Downloads 44

932 Using Computer Vision to Detect and Localize Fractures in Wrist X-ray Images

Authors: John Paul Q. Tomas, Mark Wilson L. de los Reyes, Kirsten Joyce P. Vasquez

Abstract:

Wrist fractures are the most frequent type of fracture, yet they are often difficult for medical professionals to find and locate. In this study, fractures in wrist X-ray images were located and identified using deep learning and computer vision. The researchers used image filtering, masking, morphological operations, and data augmentation for the image preprocessing and trained the RetinaNet and Faster R-CNN models with ResNet50 backbones and Adam optimizers separately for each image filtering technique and projection. The RetinaNet model with the Anisotropic Diffusion Smoothing filter trained for 50 epochs obtained the greatest accuracy of 99.14%, precision of 100%, sensitivity/recall of 98.41%, specificity of 100%, and an IoU score of 56.44% for the Posteroanterior projection utilizing augmented data. For the Lateral projection using augmented data, the RetinaNet model with an Anisotropic Diffusion filter trained for 50 epochs produced the highest accuracy of 98.40%, precision of 98.36%, sensitivity/recall of 98.36%, specificity of 98.43%, and an IoU score of 58.69%. When comparing the test results of the different individual projections, models, and image filtering techniques, the Anisotropic Diffusion filter with models trained for 50 epochs produced the best classification and regression scores for both projections.
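
For readers unfamiliar with the reported metrics, the sketch below shows the standard definitions of intersection over union (IoU) for two boxes and of accuracy, precision, recall and specificity from confusion-matrix counts. The box format `(x_min, y_min, x_max, y_max)`, the function names and the example numbers are assumptions for illustration and are not taken from the study.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall (sensitivity) and specificity."""
    def ratio(num, den):
        return num / den if den else 0.0
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical predicted vs. annotated fracture box, and hypothetical counts.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
scores = classification_metrics(tp=8, fp=0, tn=10, fn=2)
```

The definitions make clear why a model can score very high on classification while its IoU stays moderate: the classification metrics only ask whether a fracture was flagged, whereas IoU measures how tightly the predicted box matches the annotated one.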

Keywords: Artificial Intelligence, Computer Vision, Wrist Fracture, Deep Learning

Procedia PDF Downloads 74

931 Archaeological Study of Statues of King Thutmosis III from Luxor

Authors: Ahmed Mamdouh

Abstract:

Introduction: The era of Thutmosis III represents a transitional period between Thutmoside art and the Amarna period, so we propose that it serves as the cradle of Amarna art. The study will examine the Statues of king Thutmose III that were discovered in Luxor by an Egyptian mission. These Statues have been transferred to the Conservation Center of the Grand Egyptian Museum (GEM) to be conserved and made ready to be displayed at the new museum (the project of the century). We focus upon three Statues (GEM numbers 45863, 45864, 45865), chosen because they relate to different years of the king's reign. These Statues were all made of granite. The first one is a Kneeling statue representing the god Amun showing king Thutmose III offering to the goddess Hathor. The second is decorated with king Thutmose III with the red crown, between the goddess Hathor and the royal wife, Nefertari. The third shows the king offering NW vessels and bread to the god Seker. Each Statue is divided into registers containing a description and decorated with scenes of the king presenting offerings to gods. Methodology: The proposed study will focus on the development which happened sequentially according to differences that occur in each Statue. We will use comparative research to determine the workshops of these statues, whether one or several, and what the distinguishing features of each one are. We will examine what innovations the artisans added to royal art. The description and the texts will be translated with linguistic comments. This research focuses on text analyses and technology. Paleographic information found on these objects includes the names and titles of the king. Conclusion: The study aims to create a manual that may help in dating the artwork of Thutmosis III.
This research will be beneficial and useful for heritage and ancient civilizations, particularly when we talk about opening museums like the Grand Egyptian Museum, which will exhibit a collection of statues. Indeed, this kind of study will open a new direction for learning how to identify these collections and how to exhibit them in a manner commensurate with the nature of ancient Egyptian history and heritage.

Keywords: archaeological study, Giza, new kingdom, statues, royal art

Procedia PDF Downloads 69

930 The Impact of Smart Educational Aids in Learning Listening Among Pupils with Attention and Listening Problems

Authors: Sadeq Al Yaari, Muhammad Alkhunayn, Adham Al Yaari, Ayah Al Yaari, Ayman Al Yaari, Montaha Al Yaari, Sajedah Al Yaari, Fatehi Eissa

Abstract:

Keywords: smart educational aids, listening attention, pupils, problems

Procedia PDF Downloads 52

929 A Study of the Use of Arguments in Nominalizations as Instantiations of Grammatical Metaphors Finished in -TION in Academic Texts of Native Speakers

Authors: Giovana Perini-Loureiro

Abstract:

The purpose of this research was to identify whether the nominalizations ending in -TION in the academic discourse of native English speakers contain the arguments required by their input verbs. In the perspective of functional linguistics, ideational metaphors, with nominalization as their most pervasive realization, are lexically dense, and therefore frequent in formal texts. Ideational metaphors allow the academic genre to instantiate objectification, de-personalization, and the ability to construct a chain of arguments. The valence of the nouns present in nominalizations tends to maintain the same elements as the valence of their original verbs, but these arguments are not always expressed. The initial hypothesis was that these arguments would also be present alongside the nominalizations, through anaphora or cataphora. In this study, a qualitative analysis of the occurrences of the five most frequent nominalizations ending in -TION in academic texts was carried out, verifying the occurrences of the arguments required by the original verbs. The concordance lines were assembled through COCA (Corpus of Contemporary American English). After identifying the five most frequent nominalizations (attention, action, participation, instruction, intervention), the concordance lines were selected at random to be analyzed, assuring the representativeness and reliability of the sample. It was possible to verify, in all the analyzed instances, the presence of arguments. In most instances, the arguments were not expressed, but recoverable, either in the context or in the shared knowledge among the interactants. It was concluded that the realizations of the arguments which were not expressed alongside the nominalizations are part of a continuum, starting from the immediate context with anaphora and cataphora, up to knowledge shared outside the text, such as specific area knowledge.
The study also has implications for the teaching of academic writing, especially with regard to the impact of nominalizations on the thematic and informational flow of the text. Grammatical metaphors are essential to academic writing; hence, acknowledging the occurrence of their arguments is paramount to achieving linguistic awareness and the writing prestige required by the academy.

Keywords: corpus, functional linguistics, grammatical metaphors, nominalizations, academic English

Procedia PDF Downloads 149

928 Writing the Roaming Female Self: Identity and Romantic Selfhood in Mary Wollstonecraft’s Letters Written during a Short Stay in Sweden, Denmark, and Norway (1796)

Authors: Kalyani Gandhi

Abstract:

The eighteenth century in Britain saw a great burst of activity in writing (letters, journals, newspapers, essays); often these modes of writing had a public-spirited bent in step with the prevailing intellectual atmosphere. Mary Wollstonecraft was one of the leading intellectuals of that period who utilized letter writing to convey her thoughts on the exciting political developments of the late eighteenth century. Fusing together her anxieties and concerns about humanity in general and herself in particular, Wollstonecraft’s views of the world around her are filtered through the lens of her subjectivity. Thus, Wollstonecraft’s letters covered a wide range of topics on both the personal and political level (for the two are often entwined in Wollstonecraft’s characteristic style of analysis) such as sentiment, gender, nature, peasantry, the class system, the legal system, political duties and rights of both rulers and subjects, death, immortality, religion, family and education. Therefore, this paper intends to examine the manner in which Wollstonecraft utilizes letter-writing to constitute and develop Romantic selfhood, understand the world around her and illustrate her ideas on the political and social happenings in Europe. The primary text analyzed will be Mary Wollstonecraft's Letters Written During a Short Stay in Sweden, Denmark and Norway (1796) and the analysis of this text will be supplemented by researching 18th-century British letter writing culture, with a special emphasis on the epistolary habits of women. Within this larger framework, this paper intends to examine the manner in which this hybrid of travel and epistolary writing aided Mary Wollstonecraft's expression of Romantic selfhood and how it was complicated by ideas of gender.
This paper reveals Wollstonecraft's text to be fraught with anxiety about the world around her and within her; thus, the personal-public nature of the epistolary format particularly suits her characteristic point of view that looks within and without. That is to say, Wollstonecraft’s anxieties about gender and self are as much about the women she sees in the world around her as they are about her young daughter and herself. Wollstonecraft constantly explores and examines this anxiety within the different but interconnected realms of politics, economics, history and society. In fact, it is her complex technique of entwining these aforementioned concerns with a closer look at interpersonal relationships among men and women (she often mentions specific anecdotes and instances) that makes Wollstonecraft's Letters so engaging and insightful. Thus, Wollstonecraft’s Letters is an exemplar of British Romantic writing due to the manner in which it explores the bond between the individual and society. Mary Wollstonecraft nuances this exploration by incorporating her concerns about women and the playing out of gender in society. Thus, Wollstonecraft’s Letters is an invaluable contribution to the field of British Romanticism, particularly as it offers crucial insight on female Romantic writing that can broaden and enrich the current academic understanding of the field.

Keywords: British romanticism, letters, feminism, travel writing

Procedia PDF Downloads 218

927 Information and Communication Technology (ICT) Education Improvement for Enhancing Learning Performance and Social Equality

Authors: Heichia Wang, Yalan Chao

Abstract:

Social inequality is a persistent problem. One of the ways to solve this problem is through education. At present, vulnerable groups often have less geographic access to educational resources. However, compared with educational resources, communication equipment is easier for vulnerable groups to obtain. Now that information and communication technology (ICT) has entered the field of education, we can take advantage of the convenience that ICT provides in education, and the mobility that it brings makes learning independent of time and place. With mobile learning, teachers and students can start discussions in an online chat room without the limitations of time or place. However, because mobile learning is so convenient, people tend to address problems in short online texts that lack detailed information, in an online environment that is not well suited to expressing ideas. Therefore, the ICT education environment may cause misunderstanding between teachers and students. To help teachers and students better understand each other's views, this study aims to analyze these short texts and classify students' questions into several types of learning problems. In addition, this study attempts to extend the description of possible omissions in short texts by using external resources prior to classification. In short, by applying short text classification, this study can point out each student's learning problems and inform the instructor where the main focus of the future course should be, thus improving the ICT education environment. To achieve these goals, this research uses a convolutional neural network (CNN) to analyze short discussion content between teachers and students in an ICT education environment. Students are divided into several main groups of learning problems to facilitate answering their questions.
In addition, this study will further cluster sub-categories of each major learning type to indicate specific problems for each student. Unlike most neural network programs, this study attempts to extend short texts with external resources before classifying them to improve classification performance. In the empirical process, the chat records between teachers and students and the course materials will be pre-processed. An action system will be set up to compare the most similar parts of the teaching material with each student's chat history to improve future classification performance. Later, the short text classification function uses the CNN to classify the enriched chat records into several major learning problems based on theory-driven titles. By applying these modules, this research hopes to clarify the main learning problems of students and inform teachers what they should focus on in future teaching.
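
The idea of extending a short chat message with the most similar part of the course material before classification can be sketched as follows. The abstract does not describe the matching method, so bag-of-words cosine similarity is an assumption here, and the function names, passages and message are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_message(message, passages):
    """Append the most similar course passage to a short message.

    The message is returned unchanged when no passage shares a token.
    """
    query = bag_of_words(message)
    best = max(passages, key=lambda p: cosine(query, bag_of_words(p)))
    if cosine(query, bag_of_words(best)) == 0.0:
        return message
    return message + " " + best


course = [
    "A for loop repeats a block of code a fixed number of times.",
    "A dictionary maps keys to values.",
]
enriched = expand_message("why does my loop repeat forever", course)
```

The enriched text, rather than the raw message, would then be passed to the classifier; a real system would likely use stemming or embeddings so that forms such as "repeat" and "repeats" also match.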

Keywords: ICT education improvement, social equality, short text analysis, convolutional neural network

Procedia PDF Downloads 129

926 From Type-I to Type-II Fuzzy System Modeling for Diagnosis of Hepatitis

Authors: Shahabeddin Sotudian, M. H. Fazel Zarandi, I. B. Turksen

Abstract:

Hepatitis is one of the most common and dangerous diseases affecting humankind, and it exposes millions of people to serious health risks every year. Diagnosis of hepatitis has always been a challenge for physicians. This paper presents an effective method for the diagnosis of hepatitis based on interval Type-II fuzzy logic. The proposed system includes three steps: pre-processing (feature selection), Type-I and Type-II fuzzy classification, and system evaluation. KNN-FD feature selection is used as the preprocessing step in order to exclude irrelevant features and to improve classification performance and efficiency in generating the classification model. In the fuzzy classification step, an “indirect approach” is used for fuzzy system modeling by implementing the exponential compactness and separation index for determining the number of rules in the fuzzy clustering approach. We first proposed a Type-I fuzzy system that had an accuracy of approximately 90.9%. In this system, the process of diagnosis faces vagueness and uncertainty in the final decision. Thus, the imprecise knowledge was managed by using interval Type-II fuzzy logic. The results show that the interval Type-II fuzzy system is able to diagnose hepatitis with an average accuracy of 93.94%. The classification accuracy obtained is the highest one reached thus far. This accuracy demonstrates that the Type-II fuzzy system performs better than the Type-I system and indicates a higher capability of the Type-II fuzzy system for modeling uncertainty.
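
The difference between Type-I and interval Type-II fuzzy sets can be illustrated with a membership function. The abstract does not state which membership functions were used, so the Gaussian shape with an uncertain standard deviation, the function name `it2_gaussian` and the numeric values below are assumptions. A Type-I set assigns each input a single membership grade; an interval Type-II set assigns a lower and an upper grade, and the gap between them represents uncertainty about the membership function itself.

```python
import math


def it2_gaussian(x, mean, sigma_lo, sigma_hi):
    """Interval Type-II Gaussian membership with uncertain width.

    Returns (lower, upper) membership grades of x, assuming the standard
    deviation is only known to lie in [sigma_lo, sigma_hi].
    """
    def gauss(sigma):
        return math.exp(-0.5 * ((x - mean) / sigma) ** 2)
    return gauss(sigma_lo), gauss(sigma_hi)


# Hypothetical lab value tested against a fuzzy set "elevated" centred at 5.0.
lower, upper = it2_gaussian(7.0, mean=5.0, sigma_lo=1.0, sigma_hi=2.0)

# With sigma_lo == sigma_hi the interval collapses to a Type-I grade.
t1_lower, t1_upper = it2_gaussian(7.0, mean=5.0, sigma_lo=1.5, sigma_hi=1.5)
```

A complete interval Type-II classifier would additionally need rule firing intervals and a type-reduction step, which are beyond this sketch.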

Keywords: hepatitis disease, medical diagnosis, type-I fuzzy logic, type-II fuzzy logic, feature selection

Procedia PDF Downloads 307

925 An Interactive Online Academic Writing Resource for Research Students in Engineering

Authors: Eleanor K. P. Kwan

Abstract:

English academic writing, it has been argued, is an acquired language even for English speakers. For research students whose first language is not English, however, the acquisition process is often more challenging. Instead of hoping that students would acquire the conventions themselves through extensive reading, there is a need for the explicit teaching of linguistic conventions in academic writing, as explicit teaching could help students to be more aware of the different generic conventions in different disciplines in science. This paper presents an interuniversity effort to develop an online academic writing resource for research students in five subdisciplines in engineering, following the completion of a needs analysis which indicates that students and faculty members are more concerned about students’ ability to organize an extended text than about grammatical accuracy per se. In particular, this paper focuses on the materials developed for thesis writing (also called dissertation writing in some tertiary institutions), as theses form an essential graduation requirement for all research students and this genre is also expected to demonstrate the writer’s competence in research and contributions to the research community. Drawing on Swalesian move analysis of research articles, this online resource includes authentic materials written by students and faculty members from the participating institutes. Emphasis will be given to several aspects and challenges of developing this online resource. First, as the online resource aims at moving beyond providing instructions on academic writing, a range of interactive activities needs to be designed to engage the users, which is one feature that differentiates this online resource from other equally informative websites on academic writing. Second, it will also include discussion on divergent textual practices in different subdisciplines, which helps to illustrate different practices among these subdisciplines.
Third, since theses, probably one of the most extended texts a research student will complete, require effective use of signposting devices to facilitate readers’ understanding, this online resource will also provide both explanation and activities on different components that contribute to text coherence. Finally, results from piloting will also be included to shed light on the effectiveness of the materials, which could be useful for future development.

Keywords: academic writing, English for academic purposes, online language learning materials, scientific writing

Procedia PDF Downloads 270

924 Shared Decision Making in Oropharyngeal Cancer: The Development of a Decision Aid for Resectable Oropharyngeal Carcinoma, a Mixed Methods Study

Authors: Anne N. Heirman, Lisette van der Molen, Richard Dirven, Gyorgi B. Halmos, Michiel W.M. van den Brekel

Abstract:

Background: Due to the rising incidence of oropharyngeal squamous cell cancer (OPSCC), many patients are challenged with choosing between transoral (robotic) surgery and radiotherapy, which have equal survival and oncological outcomes. Functional outcomes also differ little over the years. In this study, the wants and needs of patients and caregivers are identified to develop a comprehensible patient decision aid (PDA). Methods: The development of this PDA is based on the International Patient Decision Aid Standards criteria. In phase 1, relevant literature was reviewed and compared to current counseling papers. We interviewed ten post-treatment patients and ten doctors from four head and neck centers in the Netherlands; the interviews were transcribed verbatim and analyzed. With these results, the first draft of the PDA was developed. Phase 2 involved testing the first draft for comprehensibility and usability. Phase 3 involved testing for feasibility. After this phase, the final version of the PDA was developed. Results: All doctors and patients agreed a PDA was needed. Phase 1 showed that 50% of patients felt well-informed after standard care and 35% missed information about treatment possibilities. Side effects and functional outcomes were rated as the most important for decision-making. With this information, the first version was developed. Doctors and patients stated (phase 2) that they were satisfied with the comprehensibility and usability, but that there was too much text. The PDA underwent text reduction revisions and was given more graphics. After revisions, all doctors found the PDA feasible and said it would contribute to regular counseling. Patients were satisfied with the results and wished they had seen it before their treatment. Conclusion: Decision-making for OPSCC should focus on differences in side-effects and functional outcomes. Patients and doctors found the PDA to be of great value. Future research will explore the benefits of the PDA in clinical practice.

Keywords: head-and-neck oncology, oropharyngeal cancer, patient decision aid, development, shared decision making

Procedia PDF Downloads 144

923 Training a Neural Network to Segment, Detect and Recognize Numbers

Authors: Abhisek Dash

Abstract:

This study used three coupled neural networks: one for number segmentation, one for number detection, and one for number recognition. All networks were convolutional and were trained on the MNIST dataset. It was assumed that the images had a lighter background and a darker foreground. The segmentation network took 28x28 images as input and had sixteen outputs. Segmentation training started when a dark pixel was encountered. Taking a 7x7 window over that pixel as the focus, the eight neighborhoods of the focus were checked for further dark pixels. The segmentation network was then trained to move in those directions that had dark pixels. To this end, its 16 outputs were arranged as “go east”, “don’t go east”, “go south east”, “don’t go south east”, “go south”, “don’t go south”, and so on, relative to the focus window. The focus window was resized into a 28x28 image, and the network was trained to consider those neighborhoods that had dark pixels. The neighborhoods that had dark pixels were pushed into a queue in a particular order. The neighborhoods were then popped one at a time and stitched to the existing partial image of the number, and the network was trained on which neighborhoods to consider when the new partial image was presented. This process was repeated until the image was fully covered by the 7x7 neighborhoods and there were no more uncovered dark pixels. During testing, the network scans for the first dark pixel. From there, the network predicts which neighborhoods to consider and segments the image. After this step, the group of neighborhoods is passed to the detection network. The detection network took 28x28 images as input and had two outputs denoting whether a number was detected or not. Since the ground-truth bounds of a number were known during training, the detection network was trained to output “number not found” until the bounds were met and “number found” once they were.
The recognition network was a standard CNN that also took 28x28 images and had 10 outputs for recognizing the digits 0 to 9. This network was activated only when the detection network voted in favor of “number detected”. This methodology could segment connected and overlapping numbers. Additionally, the recognition unit was invoked only when a number was detected, which minimized false positives. It also eliminated the need for rules of thumb, as segmentation is learned. The strategy can also be extended to other characters.
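
The neighborhood labelling and queue traversal described in the abstract can be illustrated with a minimal sketch. It is not the author's code: it only generates the kind of 16-element "go / don't go" targets the segmentation network might be trained on, using a fixed rule instead of a learned model. The names `has_dark`, `direction_targets` and `segment` are hypothetical, the image is assumed to be binarized (1 = dark), and the eight neighborhoods are assumed to be non-overlapping 7x7 windows offset by one window width, which the abstract does not state.

```python
from collections import deque

WINDOW = 7  # side of the focus window, as in the abstract
# Eight directions as (row step, col step), clockwise from east:
# E, SE, S, SW, W, NW, N, NE. The ordering beyond "east, south east, south" is assumed.
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def has_dark(image, top, left):
    """True if the WINDOW x WINDOW patch at (top, left) holds a dark pixel (value 1)."""
    rows, cols = len(image), len(image[0])
    for r in range(max(top, 0), min(top + WINDOW, rows)):
        for c in range(max(left, 0), min(left + WINDOW, cols)):
            if image[r][c]:
                return True
    return False


def direction_targets(image, top, left):
    """16-element target: a (go, don't go) pair for each of the eight directions."""
    targets = []
    for dr, dc in DIRECTIONS:
        go = has_dark(image, top + dr * WINDOW, left + dc * WINDOW)
        targets += [1, 0] if go else [0, 1]
    return targets


def first_dark_pixel(image):
    """Scan row by row and return (row, col) of the first dark pixel, or None."""
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value:
                return r, c
    return None


def segment(image):
    """Return top-left corners of the windows visited, in queue order."""
    start = first_dark_pixel(image)
    if start is None:
        return []
    # Center the first focus window on the first dark pixel.
    first = (start[0] - WINDOW // 2, start[1] - WINDOW // 2)
    queue, seen, order = deque([first]), {first}, []
    while queue:
        top, left = queue.popleft()
        order.append((top, left))
        targets = direction_targets(image, top, left)
        for i, (dr, dc) in enumerate(DIRECTIONS):
            nxt = (top + dr * WINDOW, left + dc * WINDOW)
            # Follow only the directions labelled "go" that are not yet queued.
            if targets[2 * i] == 1 and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order
```

In the described system, the rule in `direction_targets` would be replaced at test time by the segmentation network's prediction for the resized focus window; the queue logic in `segment` would stay the same under these assumptions.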

Keywords: convolutional neural networks, OCR, text detection, text segmentation

Procedia PDF Downloads 163