Search results for: word segmentation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1170

Search results for: word segmentation

930 Neural Machine Translation for Low-Resource African Languages: Benchmarking State-of-the-Art Transformer for Wolof

Authors: Cheikh Bamba Dione, Alla Lo, Elhadji Mamadou Nguer, Siley O. Ba

Abstract:

In this paper, we propose two neural machine translation (NMT) systems (French-to-Wolof and Wolof-to-French) based on sequence-to-sequence with attention and transformer architectures. We trained our models on a parallel French-Wolof corpus of about 83k sentence pairs. Because of the low-resource setting, we experimented with advanced methods for handling data sparsity, including subword segmentation, back translation, and the copied corpus method. We evaluate the models using the BLEU score and find that transformer outperforms the classic seq2seq model in all settings, in addition to being less sensitive to noise. In general, the best scores are achieved when training the models on word-level-based units. For subword-level models, using back translation proves to be slightly beneficial in low-resource (WO) to high-resource (FR) language translation for the transformer (but not for the seq2seq) models. A slight improvement can also be observed when injecting copied monolingual text in the target language. Moreover, combining the copied method data with back translation leads to a substantial improvement of the translation quality.

Keywords: backtranslation, low-resource language, neural machine translation, sequence-to-sequence, transformer, Wolof

Procedia PDF Downloads 129
929 A Morphological Analysis of Swardspeak in the Philippines

Authors: Carlo Gadingan

Abstract:

Swardspeak, as a language, highlights the exclusive identity of the Filipino gay men and the oppression they are confronted in the society. This paper presents a morphological analysis of swardspeak in the Philippines. Specifically, it aims to find out the common morphological processes involved in the construction of codes that may unmask the nature of swardspeak as a language. 30 purposively selected expert users of swardspeak from Luzon, Visayas, and Mindanao were asked to codify 30 natural words through the Facebook Messenger application. The results of the structural analysis affirm that swardspeak follows no specific rules revealing complicated combinations of clipping/stylized clipping, borrowing, connotation through images, connotation through actions, connotation through sounds, affixation, repetition, substitution, and simple reversal. Moreover, it was also found out that most of these word formation processes occur in all word classes which indicate that swardspeak is very unpredictable. Although different codes are used for the same words, there are still codes that are really common to all homosexuals and these are Chaka (ugly), Crayola (cry), and Aida (referring to a person with AIDS). Hence, the prevailing word formation processes explored may be termed as observed time-specific patterns because the codes documented in this study may turn obsolete and may be replaced with novel ones in a matter of weeks to month, knowing the creativity of homosexuals and the multiplicity of societal resources which can be used to make the codes more opaque and more confusing for non-homosexuals.

Keywords: codes, homosexuals, morphological processes, swardspeak

Procedia PDF Downloads 141
928 Analysis of Secondary School Students' Perceptions about Information Technologies through a Word Association Test

Authors: Fetah Eren, Ismail Sahin, Ismail Celik, Ahmet Oguz Akturk

Abstract:

The aim of this study is to discover secondary school students’ perceptions related to information technologies and the connections between concepts in their cognitive structures. A word association test consisting of six concepts related to information technologies is used to collect data from 244 secondary school students. Concept maps that present students’ cognitive structures are drawn with the help of frequency data. Data are analyzed and interpreted according to the connections obtained as a result of the concept maps. It is determined students associate most with these concepts—computer, Internet, and communication of the given concepts, and associate least with these concepts—computer-assisted education and information technologies. These results show the concepts, Internet, communication, and computer, are an important part of students’ cognitive structures. In addition, students mostly answer computer, phone, game, Internet and Facebook as the key concepts. These answers show students regard information technologies as a means for entertainment and free time activity, not as a means for education.

Keywords: word association test, cognitive structure, information technology, secondary school

Procedia PDF Downloads 396
927 Optimization Techniques for Microwave Structures

Authors: Malika Ourabia

Abstract:

A new and efficient method is presented for the analysis of arbitrarily shaped discontinuities. The discontinuities is characterized using a hybrid spectral/numerical technique. This structure presents an arbitrary number of ports, each one with different orientation and dimensions. This article presents a hybrid method based on multimode contour integral and mode matching techniques. The process is based on segmentation and dividing the structure into key building blocks. We use the multimode contour integral method to analyze the blocks including irregular shape discontinuities. Finally, the multimode scattering matrix of the whole structure can be found by cascading the blocks. Therefore, the new method is suitable for analysis of a wide range of waveguide problems. Therefore, the present approach can be applied easily to the analysis of any multiport junctions and cascade blocks. The accuracy of the method is validated comparing with results for several complex problems found in the literature. CPU times are also included to show the efficiency of the new method proposed.

Keywords: segmentation, s parameters, simulation, optimization

Procedia PDF Downloads 512
926 Investigating the Influences of Long-Term, as Compared to Short-Term, Phonological Memory on the Word Recognition Abilities of Arabic Readers vs. Arabic Native Speakers: A Word-Recognition Study

Authors: Insiya Bhalloo

Abstract:

It is quite common in the Muslim faith for non-Arabic speakers to be able to convert written Arabic, especially Quranic Arabic, into a phonological code without significant semantic or syntactic knowledge. This is due to prior experience learning to read the Quran (a religious text written in Classical Arabic), from a very young age such as via enrolment in Quranic Arabic classes. As compared to native speakers of Arabic, these Arabic readers do not have a comprehensive morpho-syntactic knowledge of the Arabic language, nor can understand, or engage in Arabic conversation. The study seeks to investigate whether mere phonological experience (as indicated by the Arabic readers’ experience with Arabic phonology and the sound-system) is sufficient to cause phonological-interference during word recognition of previously-heard words, despite the participants’ non-native status. Both native speakers of Arabic and non-native speakers of Arabic, i.e., those individuals that learned to read the Quran from a young age, will be recruited. Each experimental session will include two phases: An exposure phase and a test phase. During the exposure phase, participants will be presented with Arabic words (n=40) on a computer screen. Half of these words will be common words found in the Quran while the other half will be words commonly found in Modern Standard Arabic (MSA) but either non-existent or prevalent at a significantly lower frequency within the Quran. During the test phase, participants will then be presented with both familiar (n = 20; i.e., those words presented during the exposure phase) and novel Arabic words (n = 20; i.e., words not presented during the exposure phase. ½ of these presented words will be common Quranic Arabic words and the other ½ will be common MSA words but not Quranic words. Moreover, ½ the Quranic Arabic and MSA words presented will be comprised of nouns, while ½ the Quranic Arabic and MSA will be comprised of verbs, thereby eliminating word-processing issues affected by lexical category. Participants will then determine if they had seen that word during the exposure phase. This study seeks to investigate whether long-term phonological memory, such as via childhood exposure to Quranic Arabic orthography, has a differential effect on the word-recognition capacities of native Arabic speakers and Arabic readers; we seek to compare the effects of long-term phonological memory in comparison to short-term phonological exposure (as indicated by the presentation of familiar words from the exposure phase). The researcher’s hypothesis is that, despite the lack of lexical knowledge, early experience with converting written Quranic Arabic text into a phonological code will help participants recall the familiar Quranic words that appeared during the exposure phase more accurately than those that were not presented during the exposure phase. Moreover, it is anticipated that the non-native Arabic readers will also report more false alarms to the unfamiliar Quranic words, due to early childhood phonological exposure to Quranic Arabic script - thereby causing false phonological facilitatory effects.

Keywords: modern standard arabic, phonological facilitation, phonological memory, Quranic arabic, word recognition

Procedia PDF Downloads 338
925 Technique and Use of Machine Readable Dictionary: In Special Reference to Hindi-Marathi Machine Translation

Authors: Milind Patil

Abstract:

Present paper is a discussion on Hindi-Marathi Morphological Analysis and generating rules for Machine Translation on the basis of Machine Readable Dictionary (MRD). This used Transformative Generative Grammar (TGG) rules to design the MRD. As per TGG rules, the suffix of a particular root word is based on its Tense, Aspect, Modality and Voice. That's why the suffix is very important for the word meanings (or root meanings). The Hindi and Marathi Language both have relation with Indo-Aryan language family. Both have been derived from Sanskrit language and their script is 'Devnagari'. But there are lots of differences in terms of semantics and grammatical level too. In Marathi, there are three genders, but in Hindi only two (Masculine and Feminine), the Natural gender is absent in Hindi. Likewise other grammatical categories also differ in their level of use. For MRD the suffixes (or Morpheme) are of particular root word for GNP (Gender, Number and Person) are based on its natural phenomena. A particular Suffix and Morphine change as per the need of person, number and gender. The design of MRD also based on this format. In first, Person, Number, Gender and Tense are key points than root words and suffix of particular Person, Number Gender (PNG). After that the inferences are drawn on the basis of rules that is (V.stem) (Pre.T/Past.T) (x) + (Aux-Pre.T) (x) → (V.Stem.) + (SP.TM) (X).

Keywords: MRD, TGG, stem, morph, morpheme, suffix, PNG, TAM&V, root

Procedia PDF Downloads 304
924 A Self-Built Corpus-Based Study of Four-Word Lexical Bundles in Native English Teachers’ EFL Classroom Discourse in Northeast China: The Significance of Stance

Authors: Fang Tan

Abstract:

This research focuses on the appropriate use of lexical bundles in spoken discourse, particularly in English as a Foreign Language (EFL) classrooms in Northeast China. While previous studies have mainly examined lexical bundles in written discourse, there is a need to investigate their usage in spoken discourse due to the limited availability of spoken discourse corpora. English teachers’ use of lexical bundles is crucial for effective teaching and communication in the EFL classroom. The aim of this study is to investigate the functions of four-word lexical bundles in native English teachers’ EFL oral English classes in Northeast China. Specifically, the research focuses on the usage of stance bundles, which were found to be the most significant type of bundle in the analyzed corpus. By comparing the self-built university spoken English classroom discourse corpus with the other self-built university English for General Purposes (EGP) corpus, the study aims to highlight the difference in bundle usage between native and non-native teachers in EFL classrooms. The research employs a corpus-based study. The observed corpus consists of more than 300,000 tokens, in which the data has been collected in the past five years. The reference corpus is composed of over 800,000 tokens, in which the data has been collected over 12 years. All the primary data collection involved transcribing and annotating spoken English classes taught by native English teachers. The analysis procedures included identifying and categorizing four-word lexical bundles, with specific emphasis on stance bundles. Frequency counts, and comparisons with the Chinese English teachers’ corpus were conducted to identify patterns and differences in bundle usage. The research addresses the following questions: 1) What are the functions of four-word lexical bundles in native English teachers’ EFL oral English classes? 2) How do stance bundles differ in usage between native and non-native English teachers’ classes? 3) What implications can be drawn for English teachers’ professional development based on the findings? In conclusion, this study provides valuable insights into the usage of four-word lexical bundles, particularly stance bundles, in native English teachers’ EFL oral English classes in Northeast China. The research highlights the difference in bundle usage between native and non-native English teachers’ classes and provides implications for English teachers’ professional development. The findings contribute to the understanding of lexical bundle usage in EFL classroom discourse and have theoretical importance for language teaching methodologies. The self-built university English classroom discourse corpus used in this research is a valuable resource for future studies in this field.

Keywords: EFL classroom discourse, four-word lexical bundles, stance, implication

Procedia PDF Downloads 48
923 Embedded Visual Perception for Autonomous Agricultural Machines Using Lightweight Convolutional Neural Networks

Authors: René A. Sørensen, Søren Skovsen, Peter Christiansen, Henrik Karstoft

Abstract:

Autonomous agricultural machines act in stochastic surroundings and therefore, must be able to perceive the surroundings in real time. This perception can be achieved using image sensors combined with advanced machine learning, in particular Deep Learning. Deep convolutional neural networks excel in labeling and perceiving color images and since the cost of high-quality RGB-cameras is low, the hardware cost of good perception depends heavily on memory and computation power. This paper investigates the possibility of designing lightweight convolutional neural networks for semantic segmentation (pixel wise classification) with reduced hardware requirements, to allow for embedded usage in autonomous agricultural machines. Using compression techniques, a lightweight convolutional neural network is designed to perform real-time semantic segmentation on an embedded platform. The network is trained on two large datasets, ImageNet and Pascal Context, to recognize up to 400 individual classes. The 400 classes are remapped into agricultural superclasses (e.g. human, animal, sky, road, field, shelterbelt and obstacle) and the ability to provide accurate real-time perception of agricultural surroundings is studied. The network is applied to the case of autonomous grass mowing using the NVIDIA Tegra X1 embedded platform. Feeding case-specific images to the network results in a fully segmented map of the superclasses in the image. As the network is still being designed and optimized, only a qualitative analysis of the method is complete at the abstract submission deadline. Proceeding this deadline, the finalized design is quantitatively evaluated on 20 annotated grass mowing images. Lightweight convolutional neural networks for semantic segmentation can be implemented on an embedded platform and show competitive performance with regards to accuracy and speed. It is feasible to provide cost-efficient perceptive capabilities related to semantic segmentation for autonomous agricultural machines.

Keywords: autonomous agricultural machines, deep learning, safety, visual perception

Procedia PDF Downloads 373
922 A Comparative Approach to the Concept of Incarnation of God in Hinduism and Christianity

Authors: Cemil Kutluturk

Abstract:

This is a comparative study of the incarnation of God according to Hinduism and Christianity. After dealing with their basic ideas on the concept of the incarnation of God, the main similarities and differences between each other will be examined by quoting references from their sacred texts. In Hinduism, the term avatara is used in order to indicate the concept of the incarnation of God. The word avatara is derived from ava (down) and tri (to cross, to save, attain). Thus avatara means to come down or to descend. Although an avatara is commonly considered as an appearance of any deity on earth, the term refers particularly to descents of Vishnu. According to Hinduism, God becomes an avatara in every age and entering into diverse wombs for the sake of establishing righteousness. On the Christian side, the word incarnation means enfleshment. In Christianity, it is believed that the Logos or Word, the Second Person of Trinity, presumed human reality. Incarnation refers both to the act of God becoming a human being and to the result of his action, namely the permanent union of the divine and human natures in the one Person of the Word. When the doctrines of incarnation and avatara are compared some similarities and differences can be found between each other. The basic similarity is that both doctrines are not bound by the laws of nature as human beings are. They reveal God’s personal love and concern, and emphasize loving devotion. Their entry into the world is generally accompanied by extraordinary signs. In both cases, the descent of God allows for human beings to ascend to God. On the other hand, there are some distinctions between two religious traditions. For instance, according to Hinduism there are many and repeated avataras, while Christ comes only once. Indeed, this is related to the respective cyclic and linear worldviews of the two religions. Another difference is that in Hinduism avataras are real and perfect, while in Christianity Christ is also real, yet imperfect; that is, he has human imperfections, except sin. While Christ has never been thought of as a partial incarnation, in Hinduism there are some partial and full avataras. The other difference is that while the purpose of Christ is primarily ultimate salvation, not every avatara grants ultimate liberation, some of them come only to save a devotee from a specific predicament.

Keywords: Avatara, Christianity, Hinduism, incarnation

Procedia PDF Downloads 238
921 Adaptation of Projection Profile Algorithm for Skewed Handwritten Text Line Detection

Authors: Kayode A. Olaniyi, Tola. M. Osifeko, Adeola A. Ogunleye

Abstract:

Text line segmentation is an important step in document image processing. It represents a labeling process that assigns the same label using distance metric probability to spatially aligned units. Text line detection techniques have successfully been implemented mainly in printed documents. However, processing of the handwritten texts especially unconstrained documents has remained a key problem. This is because the unconstrained hand-written text lines are often not uniformly skewed. The spaces between text lines may not be obvious, complicated by the nature of handwriting and, overlapping ascenders and/or descenders of some characters. Hence, text lines detection and segmentation represents a leading challenge in handwritten document image processing. Text line detection methods that rely on the traditional global projection profile of the text document cannot efficiently confront with the problem of variable skew angles between different text lines. Hence, the formulation of a horizontal line as a separator is often not efficient. This paper presents a technique to segment a handwritten document into distinct lines of text. The proposed algorithm starts, by partitioning the initial text image into columns, across its width into chunks of about 5% each. At each vertical strip of 5%, the histogram of horizontal runs is projected. We have worked with the assumption that text appearing in a single strip is almost parallel to each other. The algorithm developed provides a sliding window through the first vertical strip on the left side of the page. It runs through to identify the new minimum corresponding to a valley in the projection profile. Each valley would represent the starting point of the orientation line and the ending point is the minimum point on the projection profile of the next vertical strip. The derived text-lines traverse around any obstructing handwritten vertical strips of connected component by associating it to either the line above or below. A decision of associating such connected component is made by the probability obtained from a distance metric decision. The technique outperforms the global projection profile for text line segmentation and it is robust to handle skewed documents and those with lines running into each other.

Keywords: connected-component, projection-profile, segmentation, text-line

Procedia PDF Downloads 106
920 Markov Random Field-Based Segmentation Algorithm for Detection of Land Cover Changes Using Uninhabited Aerial Vehicle Synthetic Aperture Radar Polarimetric Images

Authors: Mehrnoosh Omati, Mahmod Reza Sahebi

Abstract:

The information on land use/land cover changing plays an essential role for environmental assessment, planning and management in regional development. Remotely sensed imagery is widely used for providing information in many change detection applications. Polarimetric Synthetic aperture radar (PolSAR) image, with the discrimination capability between different scattering mechanisms, is a powerful tool for environmental monitoring applications. This paper proposes a new boundary-based segmentation algorithm as a fundamental step for land cover change detection. In this method, first, two PolSAR images are segmented using integration of marker-controlled watershed algorithm and coupled Markov random field (MRF). Then, object-based classification is performed to determine changed/no changed image objects. Compared with pixel-based support vector machine (SVM) classifier, this novel segmentation algorithm significantly reduces the speckle effect in PolSAR images and improves the accuracy of binary classification in object-based level. The experimental results on Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) polarimetric images show a 3% and 6% improvement in overall accuracy and kappa coefficient, respectively. Also, the proposed method can correctly distinguish homogeneous image parcels.

Keywords: coupled Markov random field (MRF), environment, object-based analysis, polarimetric SAR (PolSAR) images

Procedia PDF Downloads 201
919 Information Disclosure And Financial Sentiment Index Using a Machine Learning Approach

Authors: Alev Atak

Abstract:

In this paper, we aim to create a financial sentiment index by investigating the company’s voluntary information disclosures. We retrieve structured content from BIST 100 companies’ financial reports for the period 1998-2018 and extract relevant financial information for sentiment analysis through Natural Language Processing. We measure strategy-related disclosures and their cross-sectional variation and classify report content into generic sections using synonym lists divided into four main categories according to their liquidity risk profile, risk positions, intra-annual information, and exposure to risk. We use Word Error Rate and Cosin Similarity for comparing and measuring text similarity and derivation in sets of texts. In addition to performing text extraction, we will provide a range of text analysis options, such as the readability metrics, word counts using pre-determined lists (e.g., forward-looking, uncertainty, tone, etc.), and comparison with reference corpus (word, parts of speech and semantic level). Therefore, we create an adequate analytical tool and a financial dictionary to depict the importance of granular financial disclosure for investors to identify correctly the risk-taking behavior and hence make the aggregated effects traceable.

Keywords: financial sentiment, machine learning, information disclosure, risk

Procedia PDF Downloads 82
918 Computational Cell Segmentation in Immunohistochemically Image of Meningioma Tumor Using Fuzzy C-Means and Adaptive Vector Directional Filter

Authors: Vahid Anari, Leila Shahmohammadi

Abstract:

Diagnosing and interpreting manually from a large cohort dataset of immunohistochemically stained tissue of tumors using an optical microscope involves subjectivity and also is tedious for pathologist specialists. Moreover, digital pathology today represents more of an evolution than a revolution in pathology. In this paper, we develop and test an unsupervised algorithm that can automatically enhance the IHC image of a meningioma tumor and classify cells into positive (proliferative) and negative (normal) cells. A dataset including 150 images is used to test the scheme. In addition, a new adaptive color image enhancement method is proposed based on a vector directional filter (VDF) and statistical properties of filtering the window. Since the cells are distinguishable by the human eye, the accuracy and stability of the algorithm are quantitatively compared through application to a wide variety of real images.

Keywords: digital pathology, cell segmentation, immunohistochemically, noise reduction

Procedia PDF Downloads 50
917 A U-Net Based Architecture for Fast and Accurate Diagram Extraction

Authors: Revoti Prasad Bora, Saurabh Yadav, Nikita Katyal

Abstract:

In the context of educational data mining, the use case of extracting information from images containing both text and diagrams is of high importance. Hence, document analysis requires the extraction of diagrams from such images and processes the text and diagrams separately. To the author’s best knowledge, none among plenty of approaches for extracting tables, figures, etc., suffice the need for real-time processing with high accuracy as needed in multiple applications. In the education domain, diagrams can be of varied characteristics viz. line-based i.e. geometric diagrams, chemical bonds, mathematical formulas, etc. There are two broad categories of approaches that try to solve similar problems viz. traditional computer vision based approaches and deep learning approaches. The traditional computer vision based approaches mainly leverage connected components and distance transform based processing and hence perform well in very limited scenarios. The existing deep learning approaches either leverage YOLO or faster-RCNN architectures. These approaches suffer from a performance-accuracy tradeoff. This paper proposes a U-Net based architecture that formulates the diagram extraction as a segmentation problem. The proposed method provides similar accuracy with a much faster extraction time as compared to the mentioned state-of-the-art approaches. Further, the segmentation mask in this approach allows the extraction of diagrams of irregular shapes.

Keywords: computer vision, deep-learning, educational data mining, faster-RCNN, figure extraction, image segmentation, real-time document analysis, text extraction, U-Net, YOLO

Procedia PDF Downloads 114
916 Multi-Stage Classification for Lung Lesion Detection on CT Scan Images Applying Medical Image Processing Technique

Authors: Behnaz Sohani, Sahand Shahalinezhad, Amir Rahmani, Aliyu Aliyu

Abstract:

Recently, medical imaging and specifically medical image processing is becoming one of the most dynamically developing areas of medical science. It has led to the emergence of new approaches in terms of the prevention, diagnosis, and treatment of various diseases. In the process of diagnosis of lung cancer, medical professionals rely on computed tomography (CT) scans, in which failure to correctly identify masses can lead to incorrect diagnosis or sampling of lung tissue. Identification and demarcation of masses in terms of detecting cancer within lung tissue are critical challenges in diagnosis. In this work, a segmentation system in image processing techniques has been applied for detection purposes. Particularly, the use and validation of a novel lung cancer detection algorithm have been presented through simulation. This has been performed employing CT images based on multilevel thresholding. The proposed technique consists of segmentation, feature extraction, and feature selection and classification. More in detail, the features with useful information are selected after featuring extraction. Eventually, the output image of lung cancer is obtained with 96.3% accuracy and 87.25%. The purpose of feature extraction applying the proposed approach is to transform the raw data into a more usable form for subsequent statistical processing. Future steps will involve employing the current feature extraction method to achieve more accurate resulting images, including further details available to machine vision systems to recognise objects in lung CT scan images.

Keywords: lung cancer detection, image segmentation, lung computed tomography (CT) images, medical image processing

Procedia PDF Downloads 73
915 Method for Improving ICESAT-2 ATL13 Altimetry Data Utility on Rivers

Authors: Yun Chen, Qihang Liu, Catherine Ticehurst, Chandrama Sarker, Fazlul Karim, Dave Penton, Ashmita Sengupta

Abstract:

The application of ICESAT-2 altimetry data in river hydrology critically depends on the accuracy of the mean water surface elevation (WSE) at a virtual station (VS) where satellite observations intersect with water. The ICESAT-2 track generates multiple VSs as it crosses the different water bodies. The difficulties are particularly pronounced in large river basins where there are many tributaries and meanders often adjacent to each other. One challenge is to split photon segments along a beam to accurately partition them to extract only the true representative water height for individual elements. As far as we can establish, there is no automated procedure to make this distinction. Earlier studies have relied on human intervention or river masks. Both approaches are unsatisfactory solutions where the number of intersections is large, and river width/extent changes over time. We describe here an automated approach called “auto-segmentation”. The accuracy of our method was assessed by comparison with river water level observations at 10 different stations on 37 different dates along the Lower Murray River, Australia. The congruence is very high and without detectable bias. In addition, we compared different outlier removal methods on the mean WSE calculation at VSs post the auto-segmentation process. All four outlier removal methods perform almost equally well with the same R2 value (0.998) and only subtle variations in RMSE (0.181–0.189m) and MAE (0.130–0.142m). Overall, the auto-segmentation method developed here is an effective and efficient approach to deriving accurate mean WSE at river VSs. It provides a much better way of facilitating the application of ICESAT-2 ATL13 altimetry to rivers compared to previously reported studies. Therefore, the findings of our study will make a significant contribution towards the retrieval of hydraulic parameters, such as water surface slope along the river, water depth at cross sections, and river channel bathymetry for calculating flow velocity and discharge from remotely sensed imagery at large spatial scales.

Keywords: lidar sensor, virtual station, cross section, mean water surface elevation, beam/track segmentation

Procedia PDF Downloads 45
914 Probing Language Models for Multiple Linguistic Information

Authors: Bowen Ding, Yihao Kuang

Abstract:

In recent years, large-scale pre-trained language models have achieved state-of-the-art performance on a variety of natural language processing tasks. The word vectors produced by these language models can be viewed as dense encoded presentations of natural language that in text form. However, it is unknown how much linguistic information is encoded and how. In this paper, we construct several corresponding probing tasks for multiple linguistic information to clarify the encoding capabilities of different language models and performed a visual display. We firstly obtain word presentations in vector form from different language models, including BERT, ELMo, RoBERTa and GPT. Classifiers with a small scale of parameters and unsupervised tasks are then applied on these word vectors to discriminate their capability to encode corresponding linguistic information. The constructed probe tasks contain both semantic and syntactic aspects. The semantic aspect includes the ability of the model to understand semantic entities such as numbers, time, and characters, and the grammatical aspect includes the ability of the language model to understand grammatical structures such as dependency relationships and reference relationships. We also compare encoding capabilities of different layers in the same language model to infer how linguistic information is encoded in the model.

Keywords: language models, probing task, text presentation, linguistic information

Procedia PDF Downloads 84
913 Contextual Senses of Ambiguous Words Based on Cognitive Semantics

Authors: Madhavi

Abstract:

All linguistic units are context-dependent. They occur in particular settings, from which they derive much of their import, and are recognized by speakers as distinct entities only through a process of abstraction. Most of the words have several concepts associated with them and convey a number of meanings in different contexts in any language. For instance, there are different uses of the word good as an adjective from English. The adjective good expresses many senses like (1) ‘high quality of someone or something’ (2) ‘efficient’ (3) ‘virtuous’ (4) ‘reliable’ etc. These senses will be analyzed by using cognitive semantics framework. The context has the power to insulate one meaning from all the other meanings in communication. This paper will provide a cognitive semantic analysis. The basic tenet of cognitive semantics is the sense of a word is the way we conceptualize it. Our conceptualization is based on the physical experience we go through. Cognitive semantics tries to capture this conceptualization in terms of some categories like schema, frame, and domain. Cognitive semantics is a subfield of cognitive linguistics. Cognitive linguistics studies the language creation, learning, and usage by the reference to human cognition. The semantic structure is conceptual structure which is related to the concepts which are the elements of reason and constitute the meanings of words and linguistic expressions. Cognitive semantics studies how our mind works for the meaning of any word and how it perceives meaning from the environment through senses and works to map with the knowledge which already exists in our mind through experience. In the present paper, the senses are further classified into some categories.

Keywords: cognitive, contexts, semantics, senses

Procedia PDF Downloads 203
912 Selection of Strategic Suppliers for Partnership: A Model with Two Stages Approach

Authors: Safak Isik, Ozalp Vayvay

Abstract:

Strategic partnerships with suppliers play a vital role for the long-term value-based supply chain. This strategic collaboration keeps still being one of the top priority of many business organizations in order to create more additional value; benefiting mainly from supplier’s specialization, capacity and innovative power, securing supply and better managing costs and quality. However, many organizations encounter difficulties in initiating, developing and managing those partnerships and many attempts result in failures. One of the reasons for such failure is the incompatibility of members of this partnership or in other words wrong supplier selection which emphasize the significance of the selection process since it is the beginning stage. An effective selection process of strategic suppliers is critical to the success of the partnership. Although there are several research studies to select the suppliers in literature, only a few of them is related to strategic supplier selection for long-term partnership. The purpose of this study is to propose a conceptual model for the selection of strategic partnership suppliers. A two-stage approach has been used in proposed model incorporating first segmentation and second selection. In the first stage; considering the fact that not all suppliers are strategically equal and instead of a long list of potential suppliers, Kraljic’s purchasing portfolio matrix can be used for segmentation. This supplier segmentation is the process of categorizing suppliers based on a defined set of criteria in order to identify types of suppliers and determine potential suppliers for strategic partnership. In the second stage, from a pool of potential suppliers defined at first phase, a comprehensive evaluation and selection can be performed to finally define strategic suppliers considering various tangible and intangible criteria. Since a long-term relationship with strategic suppliers is anticipated, criteria should consider both current and future status of the supplier. Based on an extensive literature review; strategical, operational and organizational criteria have been determined and elaborated. The result of the selection can also be used to determine suppliers who are not ready for a partnership but to be developed for strategic partnership. Since the model is based on multiple criteria for both stages, it provides a framework for further utilization of Multi-Criteria Decision Making (MCDM) techniques. The model may also be applied to a wide range of industries and involve managerial features in business organizations.

Keywords: Kraljic’s matrix, purchasing portfolio, strategic supplier selection, supplier collaboration, supplier partnership, supplier segmentation

Procedia PDF Downloads 227
911 The Facilitatory Effect of Phonological Priming on Visual Word Recognition in Arabic as a Function of Lexicality and Overlap Positions

Authors: Ali Al Moussaoui

Abstract:

An experiment was designed to assess the performance of 24 Lebanese adults (mean age 29:5 years) in a lexical decision making (LDM) task to find out how the facilitatory effect of phonological priming (PP) affects the speed of visual word recognition in Arabic as lexicality (wordhood) and phonological overlap positions (POP) vary. The experiment falls in line with previous research on phonological priming in the light of the cohort theory and in relation to visual word recognition. The experiment also departs from the research on the Arabic language in which the importance of the consonantal root as a distinct morphological unit is confirmed. Based on previous research, it is hypothesized that (1) PP has a facilitating effect in LDM with words but not with nonwords and (2) final phonological overlap between the prime and the target is more facilitatory than initial overlap. An LDM task was programmed on PsychoPy application. Participants had to decide if a target (e.g., bayn ‘between’) preceded by a prime (e.g., bayt ‘house’) is a word or not. There were 4 conditions: no PP (NP), nonwords priming nonwords (NN), nonwords priming words (NW), and words priming words (WW). The conditions were simultaneously controlled for word length, wordhood, and POP. The interstimulus interval was 700 ms. Within the PP conditions, POP was controlled for in which there were 3 overlap positions between the primes and the targets: initial (e.g., asad ‘lion’ and asaf ‘sorrow’), final (e.g., kattab ‘cause to write’ 2sg-mas and rattab ‘organize’ 2sg-mas), or two-segmented (e.g., namle ‘ant’ and naħle ‘bee’). There were 96 trials, 24 in each condition, using a within-subject design. The results show that concerning (1), the highest average reaction time (RT) is that in NN, followed firstly by NW and finally by WW. There is statistical significance only between the pairs NN-NW and NN-WW. Regarding (2), the shortest RT is that in the two-segmented overlap condition, followed by the final POP in the first place and the initial POP in the last place. The difference between the two-segmented and the initial overlap is significant, while other pairwise comparisons are not. Based on these results, PP emerges as a facilitatory phenomenon that is highly sensitive to lexicality and POP. While PP can have a facilitating effect under lexicality, it shows no facilitation in its absence, which intersects with several previous findings. Participants are found to be more sensitive to the final phonological overlap than the initial overlap, which also coincides with a body of earlier literature. The results contradict the cohort theory’s stress on the onset overlap position and, instead, give more weight to final overlap, and even heavier weight to the two-segmented one. In conclusion, this study confirms the facilitating effect of PP with words but not when stimuli (at least the primes and at most both the primes and targets) are nonwords. It also shows that the two-segmented priming is the most influential in LDM in Arabic.

Keywords: lexicality, phonological overlap positions, phonological priming, visual word recognition

Procedia PDF Downloads 167
910 The Conflict of Grammaticality and Meaningfulness of the Corrupt Words: A Cross-lingual Sociolinguistic Study

Authors: Jayashree Aanand, Gajjam

Abstract:

The grammatical tradition in Sanskrit literature emphasizes the importance of the correct use of Sanskrit words or linguistic units (sādhu śabda) that brings the meritorious values, denying the attribution of the same religious merit to the incorrect use of Sanskrit words (asādhu śabda) or the vernacular or corrupt forms (apa-śabda or apabhraṁśa), even though they may help in communication. The current research, the culmination of the doctoral research on sentence definition, studies the difference among the comprehension of both correct and incorrect word forms in Sanskrit and Marathi languages in India. Based on the total of 19 experiments (both web-based and classroom-controlled) on approximately 900 Indian readers, it is found that while the incorrect forms in Sanskrit are comprehended with lesser accuracy than the correct word forms, no such difference can be seen for the Marathi language. It is interpreted that the incorrect word forms in the native language or in the language which is spoken daily (such as Marathi) will pose a lesser cognitive load as compared to the language that is not spoken on a daily basis but only used for reading (such as Sanskrit). The theoretical base for the research problem is as follows: among the three main schools of Language Science in ancient India, the Vaiyākaraṇas (Grammarians) hold that the corrupt word forms do have their own expressive power since they convey meaning, while as the Mimāṁsakas (the Exegesists) and the Naiyāyikas (the Logicians) believe that the corrupt forms can only convey the meaning indirectly, by recalling their association and similarity with the correct forms. The grammarians argue that the vernaculars that are born of the speaker’s inability to speak proper Sanskrit are regarded as degenerate versions or fallen forms of the ‘divine’ Sanskrit language and speakers who could not use proper Sanskrit or the standard language were considered as Śiṣṭa (‘elite’). The different ideas of different schools strictly adhere to their textual dispositions. For the last few years, sociolinguists have agreed that no variety of language is inherently better than any other; they are all the same as long as they serve the need of people that use them. Although the standard form of a language may offer the speakers some advantages, the non-standard variety is considered the most natural style of speaking. This is visible in the results. If the incorrect word forms incur the recall of the correct word forms in the reader as the theory suggests, it would have added one extra step in the process of sentential cognition leading to more cognitive load and less accuracy. This has not been the case for the Marathi language. Although speaking and listening to the vernaculars is the common practice and reading the vernacular is not, Marathi readers have readily and accurately comprehended the incorrect word forms in the sentences, as against the Sanskrit readers. The primary reason being Sanskrit is spoken and also read in the standard form only and the vernacular forms in Sanskrit are not found in the conversational data.

Keywords: experimental sociolinguistics, grammaticality and meaningfulness, Marathi, Sanskrit

Procedia PDF Downloads 110
909 Morphological Rules of Bangla Repetition Words for UNL Based Machine Translation

Authors: Nawab Yousuf Ali, S. Golam, A. Ameer, Ashok Toru Roy

Abstract:

This paper develops new morphological rules suitable for Bangla repetition words to be incorporated into an inter lingua representation called Universal Networking Language (UNL). The proposed rules are to be used to combine verb roots and their inflexions to produce words which are then combined with other similar types of words to generate repetition words. This paper outlines the format of morphological rules for different types of repetition words that come from verb roots based on the framework of UNL provided by the UNL centre of the Universal Networking Digital Language (UNDL) foundation.

Keywords: Universal Networking Language (UNL), universal word (UW), head word (HW), Bangla-UNL Dictionary, morphological rule, enconverter (EnCo)

Procedia PDF Downloads 296
908 Geographic Information System and Dynamic Segmentation of Very High Resolution Images for the Semi-Automatic Extraction of Sandy Accumulation

Authors: A. Bensaid, T. Mostephaoui, R. Nedjai

Abstract:

A considerable area of Algerian lands is threatened by the phenomenon of wind erosion. For a long time, wind erosion and its associated harmful effects on the natural environment have posed a serious threat, especially in the arid regions of the country. In recent years, as a result of increases in the irrational exploitation of natural resources (fodder) and extensive land clearing, wind erosion has particularly accentuated. The extent of degradation in the arid region of the Algerian Mecheria department generated a new situation characterized by the reduction of vegetation cover, the decrease of land productivity, as well as sand encroachment on urban development zones. In this study, we attempt to investigate the potential of remote sensing and geographic information systems for detecting the spatial dynamics of the ancient dune cords based on the numerical processing of LANDSAT images (5, 7, and 8) of three scenes 197/37, 198/36 and 198/37 for the year 2020. As a second step, we prospect the use of geospatial techniques to monitor the progression of sand dunes on developed (urban) lands as well as on the formation of sandy accumulations (dune, dunes fields, nebkha, barkhane, etc.). For this purpose, this study made use of the semi-automatic processing method for the dynamic segmentation of images with very high spatial resolution (SENTINEL-2 and Google Earth). This study was able to demonstrate that urban lands under current conditions are located in sand transit zones that are mobilized by the winds from the northwest and southwest directions.

Keywords: land development, GIS, segmentation, remote sensing

Procedia PDF Downloads 133
907 Optical Coherence Tomography in Parkinson’s Disease: A Potential in-vivo Retinal α-Synuclein Biomarker in Parkinson’s Disease

Authors: Jessica Chorostecki, Aashka Shah, Fen Bao, Ginny Bao, Edwin George, Navid Seraji-Bozorgzad, Veronica Gorden, Christina Caon, Elliot Frohman

Abstract:

Background: Parkinson’s Disease (PD) is a neuro degenerative disorder associated with the loss of dopaminergic cells and the presence α-synuclein (AS) aggregation in of Lewy bodies. Both dopaminergic cells and AS are found in the retina. Optical coherence tomography (OCT) allows high-resolution in-vivo examination of retinal structure injury in neuro degenerative disorders including PD. Methods: We performed a cross-section OCT study in patients with definite PD and healthy controls (HC) using Spectral Domain SD-OCT platform to measure the peripapillary retinal nerve fiber layer (pRNFL) thickness and total macular volume (TMV). We performed intra-retinal segmentation with fully automated segmentation software to measure the volume of the RNFL, ganglion cell layer (GCL), inner plexiform layer (IPL), inner nuclear layer (INL), outer plexiform layer (OPL), and the outer nuclear layer (ONL). Segmentation was performed blinded to the clinical status of the study participants. Results: 101 eyes from 52 PD patients (mean age 65.8 years) and 46 eyes from 24 HC subjects (mean age 64.1 years) were included in the study. The mean pRNFL thickness was not significantly different (96.95 μm vs 94.42 μm, p=0.07) but the TMV was significantly lower in PD compared to HC (8.33 mm3 vs 8.58 mm3 p=0.0002). Intra-retinal segmentation showed no significant difference in the RNFL volume between the PD and HC groups (0.95 mm3 vs 0.92 mm3 p=0.454). However, GCL, IPL, INL, and ONL volumes were significantly reduced in PD compared to HC. In contrast, the volume of OPL was significantly increased in PD compared to HC. Conclusions: Our finding of the enlarged OPL corresponds with mRNA expression studies showing localization of AS in the OPL across vertebrate species and autopsy studies demonstrating AS aggregation in the deeper layers of retina in PD. We propose that the enlargement of the OPL may represent a potential biomarker of AS aggregation in PD. Longitudinal studies in larger cohorts are warranted to confirm our observations that may have significant implications in disease monitoring and therapeutic development.

Keywords: Optical Coherence Tomography, biomarker, Parkinson's disease, alpha-synuclein, retina

Procedia PDF Downloads 418
906 The Development of Chinese-English Homophonic Word Pairs Databases for English Teaching and Learning

Authors: Yuh-Jen Wu, Chun-Min Lin

Abstract:

Homophonic words are common in Mandarin Chinese which belongs to the tonal language family. Using homophonic cues to study foreign languages is one of the learning techniques of mnemonics that can aid the retention and retrieval of information in the human memory. When learning difficult foreign words, some learners transpose them with words in a language they are familiar with to build an association and strengthen working memory. These phonological clues are beneficial means for novice language learners. In the classroom, if mnemonic skills are used at the appropriate time in the instructional sequence, it may achieve their maximum effectiveness. For Chinese-speaking students, proper use of Chinese-English homophonic word pairs may help them learn difficult vocabulary. In this study, a database program is developed by employing Visual Basic. The database contains two corpora, one with Chinese lexical items and the other with English ones. The Chinese corpus contains 59,053 Chinese words that were collected by a web crawler. The pronunciations of this group of words are compared with words in an English corpus based on WordNet, a lexical database for the English language. Words in both databases with similar pronunciation chunks and batches are detected. A total of approximately 1,000 Chinese lexical items are located in the preliminary comparison. These homophonic word pairs can serve as a valuable tool to assist Chinese-speaking students in learning and memorizing new English vocabulary.

Keywords: Chinese, corpus, English, homophonic words, vocabulary

Procedia PDF Downloads 159
905 Morpheme Based Parts of Speech Tagger for Kannada Language

Authors: M. C. Padma, R. J. Prathibha

Abstract:

Parts of speech tagging is the process of assigning appropriate parts of speech tags to the words in a given text. The critical or crucial information needed for tagging a word come from its internal structure rather from its neighboring words. The internal structure of a word comprises of its morphological features and grammatical information. This paper presents a morpheme based parts of speech tagger for Kannada language. This proposed work uses hierarchical tag set for assigning tags. The system is tested on some Kannada words taken from EMILLE corpus. Experimental result shows that the performance of the proposed system is above 90%.

Keywords: hierarchical tag set, morphological analyzer, natural language processing, paradigms, parts of speech

Procedia PDF Downloads 273
904 Robust Segmentation of Salient Features in Automatic Breast Ultrasound (ABUS) Images

Authors: Lamees Nasser, Yago Diez, Robert Martí, Joan Martí, Ibrahim Sadek

Abstract:

Automated 3D breast ultrasound (ABUS) screening is a novel modality in medical imaging because of its common characteristics shared with other ultrasound modalities in addition to the three orthogonal planes (i.e., axial, sagittal, and coronal) that are useful in analysis of tumors. In the literature, few automatic approaches exist for typical tasks such as segmentation or registration. In this work, we deal with two problems concerning ABUS images: nipple and rib detection. Nipple and ribs are the most visible and salient features in ABUS images. Determining the nipple position plays a key role in some applications for example evaluation of registration results or lesion follow-up. We present a nipple detection algorithm based on color and shape of the nipple, besides an automatic approach to detect the ribs. In point of fact, rib detection is considered as one of the main stages in chest wall segmentation. This approach consists of four steps. First, images are normalized in order to minimize the intensity variability for a given set of regions within the same image or a set of images. Second, the normalized images are smoothed by using anisotropic diffusion filter. Next, the ribs are detected in each slice by analyzing the eigenvalues of the 3D Hessian matrix. Finally, a breast mask and a probability map of regions detected as ribs are used to remove false positives (FP). Qualitative and quantitative evaluation obtained from a total of 22 cases is performed. For all cases, the average and standard deviation of the root mean square error (RMSE) between manually annotated points placed on the rib surface and detected points on rib borders are 15.1188 mm and 14.7184 mm respectively.

Keywords: Automated 3D Breast Ultrasound, Eigenvalues of Hessian matrix, Nipple detection, Rib detection

Procedia PDF Downloads 315
903 A Combined Feature Extraction and Thresholding Technique for Silence Removal in Percussive Sounds

Authors: B. Kishore Kumar, Pogula Rakesh, T. Kishore Kumar

Abstract:

The music analysis is a part of the audio content analysis used to analyze the music by using the different features of audio signal. In music analysis, the first step is to divide the music signal to different sections based on the feature profiles of the music signal. In this paper, we present a music segmentation technique that will effectively segmentize the signal and thresholding technique to remove silence from the percussive sounds produced by percussive instruments, which uses two features of music, namely signal energy and spectral centroid. The proposed method impose thresholds on both the features which will vary depends on the music signal. Depends on the threshold, silence part is removed and the segmentation is done. The effectiveness of the proposed method is analyzed using MATLAB.

Keywords: percussive sounds, spectral centroid, spectral energy, silence removal, feature extraction

Procedia PDF Downloads 570
902 Segmentation of Piecewise Polynomial Regression Model by Using Reversible Jump MCMC Algorithm

Authors: Suparman

Abstract:

Piecewise polynomial regression model is very flexible model for modeling the data. If the piecewise polynomial regression model is matched against the data, its parameters are not generally known. This paper studies the parameter estimation problem of piecewise polynomial regression model. The method which is used to estimate the parameters of the piecewise polynomial regression model is Bayesian method. Unfortunately, the Bayes estimator cannot be found analytically. Reversible jump MCMC algorithm is proposed to solve this problem. Reversible jump MCMC algorithm generates the Markov chain that converges to the limit distribution of the posterior distribution of piecewise polynomial regression model parameter. The resulting Markov chain is used to calculate the Bayes estimator for the parameters of piecewise polynomial regression model.

Keywords: piecewise regression, bayesian, reversible jump MCMC, segmentation

Procedia PDF Downloads 354
901 Polish Catholic Discourse on Gender Equality in the Face of Social and Cultural Changes in Poland

Authors: Anna Jagielska

Abstract:

Five years ago, the word ‘gender’ was discussed in Poland exclusively in academic contexts. One year later, it was chosen as the word of the year and omnipresent in the Polish media. The rapid career of this word is due to the involvement of the Polish church hierarchy who strategically brought this term into relation with abortion, pornography and paedophilia. ‘Gender’ is more than a political slogan. It is a symbol of social anxiety and moral panic in Poland which need to be historically considered. The aim of this paper is to present selected rhetorical strategies used by the Polish Catholic clergy who strive to have an impact on the current gender discourse in Poland. In particular, the gender debate, culminated in the pastoral letter of the Bishops' Conference of Poland, will be discussed. The church’s protest against the Council of Europe’s Convention on Preventing and Combating Violence against Women and Domestic Violence will be analyzed and the recent heated debates in Poland on contraception, abortion, in vitro fertilization, and sex education will be mentioned. To provide explanations on the specificity of Polish gender debates the role of the Catholic Church in the fall of communism in Poland as well as the charismatisation of Polish society by Pope John Paul II will be explained. The social constructions of communism and feminism which are manifested in both written and symbolic contracts on gender equality between the Church and the State will be demonstrated. At the end of the paper, theories about the changing role of religion in society will be applied.

Keywords: gender, Poland, religion, catholicism, feminism

Procedia PDF Downloads 272