Search results for: speech language pathologist
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4171

4111 Automatic Assignment of Geminate and Epenthetic Vowel for Amharic Text-to-Speech System

Authors: Tadesse Anberbir, Felix Bankole, Tomio Takara, Girma Mamo

Abstract:

In the development of a text-to-speech synthesizer, automatic derivation of correct pronunciation from the grapheme form of a text is a central problem. Deriving phonological features that are not shown in the orthography is particularly challenging. In the Amharic language, geminates and epenthetic vowels are crucial for proper pronunciation, but neither is shown in the orthography. In this paper, we proposed and integrated a morphological analyzer into an Amharic Text-to-Speech system, mainly to predict geminates and epenthetic vowel positions, and prepared a duration modeling method. The Amharic Text-to-Speech system (AmhTTS) is a parametric and rule-based system that adopts a cepstral method and uses a source-filter model for speech production and a Log Magnitude Approximation (LMA) filter as the vocal tract filter. The naturalness of the system after employing the duration modeling was evaluated by a sentence listening test, and we achieved an average Mean Opinion Score (MOS) of 3.4 (68%), which is moderate. By modeling the duration of geminates and controlling the locations of epenthetic vowels, we are able to synthesize good quality speech. Our system is mainly suitable to be customized for other Ethiopian languages with limited resources.
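As a small worked check of the reported figure, the percentage quoted next to the MOS is consistent with dividing the score by the top of a 5-point scale. The 5-point scale and the helper name `mos_to_percent` are assumptions made for illustration; the abstract does not state how the percentage was derived.

```python
def mos_to_percent(mos, scale_max=5.0):
    """Express a Mean Opinion Score as a percentage of the scale maximum.

    Assumes the common 5-point MOS scale; this is an illustration,
    not the authors' stated procedure.
    """
    return round(mos / scale_max * 100, 1)


print(mos_to_percent(3.4))  # 68.0
```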

Keywords: Amharic, gemination, speech synthesis, morphology, epenthesis

Procedia PDF Downloads 87
4110 Effects of Exposing Learners to Speech Acts in the German Teaching Material Schritte International: The Case of Requests

Authors: Wan-Lin Tsai

Abstract:

The speech act of requesting is an important issue in the field of language learning and teaching because we cannot avoid making requests in our daily life. This study examined whether the subjects, who were freshmen majoring in German at Wenzao University of Languages, were able to use the linguistic forms they had learned from their course book Schritte International to make appropriate requests in dialogue completion tasks (DCT). The results revealed that the majority of the subjects were unable to use the forms to make appropriate requests in German due to the lack of explicit instruction. Furthermore, Chinese interference was observed in students' productions. Explicit instruction in speech acts is strongly recommended.

Keywords: Chinese interference, German pragmatics, German teaching, make appropriate requests in German, speech act of requesting

Procedia PDF Downloads 465
4109 Evaluating Perceived Usability of ProxTalker App Using Arabic Standard Usability Scale: A Student's Perspective

Authors: S. AlBustan, B. AlGhannam

Abstract:

This oral presentation discusses a proposal for a study that evaluates the usability of an evidence-based application named ProxTalker App. The study will inform administration and faculty staff at the Department of Communication Sciences Disorders (CDS), College of Life Sciences, Kuwait University, whether the app is a suitable tool for CDS students. A case study will be used involving a sample of CDS students taking practicum and internship courses during the academic year 2018/2019. The study will follow a process used by a previous study; the process of calculating SUS is well documented and will be followed. ProxTalker App is an alternative and augmentative tool that speech language pathologists (SLPs) can use to customize boards for their clients. SLPs can customize different boards using this app for various activities. A board can be created by the SLP to improve and support receptive and expressive language. Using technology to support therapy can help SLPs integrate the ProxTalker App into their clients' therapy. Supported tools, games and motivation are some advantages of incorporating apps during therapy sessions. A quantitative methodology will be used. It involves a standard tool that was adapted to the Arabic language to accommodate native Arabic speakers. The tool that will be utilized in this research is the Arabic Standard Usability Scale (A-SUS) questionnaire, which is an adaptation of the System Usability Scale (SUS). Standard usability questionnaires are reliable and valid, and their process is properly documented. This study builds upon the development of A-SUS, which is a psychometrically evaluated questionnaire that targets native Arabic speakers. Results of the usability evaluation will give a preliminary indication of whether the ProxTalker App under investigation is appropriate to be integrated within the practicum and internship curriculum of CDS.
The results of this study will inform the CDS department whether this specific app is an appropriate tool for our specific students within our environment, because usability depends on the product, environment, and users.
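The abstract refers to the documented process of calculating SUS. A minimal sketch of the standard System Usability Scale scoring is given here for orientation: ten items answered on a 1 to 5 scale, odd-numbered items contributing (response - 1), even-numbered items contributing (5 - response), and the sum multiplied by 2.5 to give a 0 to 100 score. It is an assumption that A-SUS is scored the same way as the original SUS; the function name `sus_score` is hypothetical.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 responses.

    responses[0] is item 1, responses[9] is item 10.
    Assumption: A-SUS uses the same scoring rule as the standard SUS.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("each response must be between 1 and 5")
        if item_number % 2 == 1:
            # Odd items are positively worded: higher agreement is better.
            total += response - 1
        else:
            # Even items are negatively worded: lower agreement is better.
            total += 5 - response
    # The raw sum ranges from 0 to 40; scale it to 0-100.
    return total * 2.5


# A respondent who answers "neutral" (3) on every item scores 50.
print(sus_score([3] * 10))  # 50.0
```

Scores from several respondents are usually averaged to give one usability figure for the product in a given environment and user group.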

Keywords: A-SUS, communication disorders practicum, evidence based app, Standard Usability Scale

Procedia PDF Downloads 156
4108 Google Translate: AI Application

Authors: Shaima Almalhan, Lubna Shukri, Miriam Talal, Safaa Teskieh

Abstract:

Since artificial intelligence is a rapidly evolving topic that has had a significant impact on technical growth and innovation, this paper examines people's awareness of, use of, and engagement with the Google Translate application. To see how familiar users are with the app and its features, quantitative and qualitative research was conducted. The findings revealed that consumers have a high level of confidence in the application, and showed how far they benefit from this sort of innovation and how convenient it makes communication.

Keywords: artificial intelligence, google translate, speech recognition, language translation, camera translation, speech to text, text to speech

Procedia PDF Downloads 154
4107 Management of Dysphagia after Supra Glottic Laryngectomy

Authors: Premalatha B. S., Shenoy A. M.

Abstract:

Background: Rehabilitation of swallowing is as vital as rehabilitation of speech in surgically treated head and neck cancer patients to maintain nutritional support, enhance wound healing and improve quality of life. Aspiration following supraglottic laryngectomy is very common, and its rehabilitation is crucial, requiring the involvement of a speech therapist in close contact with the head and neck surgeon. Objectives: To examine swallowing outcomes after intensive therapy in supraglottic laryngectomy. Materials: Thirty-nine supraglottic laryngectomees participated in the study. Of them, 36 subjects were males and 3 were females, in the age range of 32-68 years. Eighteen subjects had undergone standard supraglottic laryngectomy (Group 1) for supraglottic lesions, whereas 21 had undergone extended supraglottic laryngectomy (Group 2) for base of tongue and lateral pharyngeal wall lesions. A visit by the speech pathologist prior to surgery was mandatory to assess suitability for surgery and rehabilitation. Dysphagia rehabilitation started after decannulation of the tracheostoma and focused on orientation about anatomy and physiological variation before and after surgery, tailored to each individual based on the type and extent of surgery. A supraglottic diet (soft solids) with the supraglottic swallow method was advocated to prevent aspiration. The success of the intervention was documented as the number of sessions taken to swallow different food consistencies and as the percentage of subjects who achieved satisfactory swallow in terms of number of weeks, in both groups. Results: Statistical data were computed in two ways in both groups: 1) the percentage (%) of subjects who swallowed satisfactorily within time frames ranging from less than 3 weeks to more than 6 weeks, and 2) the number of sessions taken to swallow each food consistency without aspiration.
The study indicated that in Group 1 (standard supraglottic laryngectomy), 61% (n=11) of the subjects were successfully rehabilitated, but their swallowing normalcy was delayed until, on average, the 29th postoperative day (3-6 weeks). Thirty-three percent (33%, n=6) of the subjects could swallow satisfactorily without aspiration even before 3 weeks, and only 5% (n=1) needed more than 6 weeks to achieve normal swallowing ability. In Group 2 (extended supraglottic laryngectomy), only 47% (n=10) of the subjects achieved satisfactory swallow by 3-6 weeks, and 24% (n=5) achieved normal swallowing ability before 3 weeks. Around 4% (n=1) needed more than 6 weeks, and as many as 24% (n=5) continued to be supplemented with nasogastric feeding even 8-10 months postoperatively, as they exhibited severe aspiration. As far as food consistencies were concerned, Group 1 subjects were able to swallow all types without aspiration much earlier than Group 2 subjects. Group 1 needed only 8 swallowing therapy sessions for thickened soft solids and 15 sessions for liquids, whereas Group 2 required 14 sessions for soft solids and 17 sessions for liquids to achieve swallowing normalcy without aspiration. Conclusion: The study highlights the importance of dysphagia intervention by the speech pathologist in supraglottic laryngectomees.

Keywords: dysphagia management, supraglottic diet, supraglottic laryngectomy, supraglottic swallow

Procedia PDF Downloads 231
4106 Effect of Classroom Acoustic Factors on Language and Cognition in Bilinguals and Children with Mild to Moderate Hearing Loss

Authors: Douglas MacCutcheon, Florian Pausch, Robert Ljung, Lorna Halliday, Stuart Rosen

Abstract:

Contemporary classrooms are increasingly inclusive of children with mild to moderate disabilities and children from different language backgrounds (bilinguals, multilinguals), but classroom environments and standards have not yet been adapted adequately to meet the challenges brought about by this inclusivity. Additionally, classrooms are becoming noisier as a learner-centered as opposed to teacher-centered teaching paradigm is adopted, which prioritizes group work and peer-to-peer learning. Challenging listening conditions with distracting sound sources and background noise are known to have potentially negative effects on children, particularly those who are prone to struggle with speech perception in noise. Therefore, this research investigates two groups vulnerable to these environmental effects, namely children with a mild to moderate hearing loss (MMHLs) and sequential bilinguals learning in their second language. In the MMHL study, this group was assessed on speech-in-noise perception and a number of receptive language and cognitive measures (auditory working memory, auditory attention), and correlations were evaluated. Speech reception thresholds were found to be predictive of language and cognitive ability, and the nature of the correlations is discussed. In the bilinguals study, sequential bilingual children’s listening comprehension, speech-in-noise perception, listening effort and release from masking were evaluated under a number of different ecologically valid acoustic scenarios in order to pinpoint the extent of the ‘native language benefit’ for Swedish children learning in English, their second language. Scene manipulations included target-to-distractor ratios and introducing spatially separated noise. This research will contribute to the body of findings from which educational institutions can draw when designing or adapting educational environments in inclusive schools.

Keywords: sequential bilinguals, classroom acoustics, mild to moderate hearing loss, speech-in-noise, release from masking

Procedia PDF Downloads 324
4105 A Corpus-Based Contrastive Analysis of Directive Speech Act Verbs in English and Chinese Legal Texts

Authors: Wujian Han

Abstract:

In the process of human interaction and communication, speech act verbs are considered to be the most active component and the main means for information transmission, and are also taken as an indication of the structure of linguistic behavior. The theoretical value and practical significance of such everyday built-in metalanguage have long been recognized. This paper, which is part of a bigger study, aims to provide useful insights for a more precise and systematic approach to speech act verb translation between English and Chinese, especially with regard to the degree to which generic integrity is maintained in the practice of translation of legal documents. In this study, the corpus, i.e. Chinese legal texts and their English translations, English legal texts, ordinary Chinese texts, and ordinary English texts, serves as a testing ground for examining contrastively the usage of English and Chinese directive speech act verbs in the legal genre. The scope of this paper is relatively wide and essentially covers all directive speech act verbs which are used in ordinary English and Chinese, such as order, command, request, prohibit, threaten, advise, warn and permit. The researcher, by combining the corpus methodology with a contrastive perspective, explored a range of characteristics of English and Chinese directive speech act verbs including their semantic, syntactic and pragmatic features, and then contrasted them in a structured way. It has been found that there are similarities between English and Chinese directive speech act verbs in the legal genre, such as similar semantic components between English speech act verbs and their translation equivalents in Chinese, and the formal and accurate usage of English and Chinese directive speech act verbs in legal contexts. But notable differences have been identified between their usage in the original Chinese and English legal texts, such as in valency patterns and frequency of occurrence.
For example, the subjects of some directive speech act verbs are very frequently omitted in Chinese legal texts, but this is not the case in English legal texts. One of the practicable methods to achieve adequacy and conciseness in speech act verb translation from Chinese into English in legal genre is to repeat the subjects or the message with discrepancy, and vice versa. In addition, translation effects such as overuse and underuse of certain directive speech act verbs are also found in the translated English texts compared to the original English texts. Legal texts constitute a particularly valuable material for speech act verb study. Building up such a contrastive picture of the Chinese and English speech act verbs in legal language would yield results of value and interest to legal translators and students of language for legal purposes and have practical application to legal translation between English and Chinese.

Keywords: contrastive analysis, corpus-based, directive speech act verbs, legal texts, translation between English and Chinese

Procedia PDF Downloads 499
4104 Cortical and Subcortical Dementias: A Psychoneurolinguistic Perspective

Authors: Sadeq Al Yaari, Fayza Alhammadi, Ayman Al Yaari, Montaha Al Yaari, Aayah Al Yaari, Adham Al Yaari, Sajedah Al Yaari, Saleh Al Yami

Abstract:

Background: A rapidly increasing number of studies that focus on the relationship between language and cortical (CD) and subcortical dementias (SCD) have recently shown that such a correlation exists. Mounting evidence suggests that cognitive impairments should be investigated against language disorders. Aims: This study aims at investigating how language is associated with dementia diseases, namely CD and SCD, in light of a psychoneurolinguistic approach. Method: Data from multiple sources (e.g., theses, dissertations, articles, research, medical records, direct testing, staff reports, and client observations) have been integrated to provide a detailed analysis of the relationship between language and CD and SCD. The researchers identified over 20 dementia types and described them. Having collected and described the data, the researchers then analyzed these data independently to see to what extent CD and SCD are involved in matters concerning language. Results: Results of the present study demonstrate that language and CD and SCD are undoubtedly correlated with each other. The loss of the ability of some organs to perform certain functions (due to any of the dementia diseases) results, in one way or another, in the loss of some language aspects and/or speech skills. In clearer terms, it is rare to find a patient with dementia who is not suffering from partial or complete linguistic difficulties. Many deficits run through the current interpretation of linguistic disorders: language disorders, speech disorders, articulation disorders, or voice disorders.

Keywords: cortical dementia, subcortical dementia, diseases, psychoneurolinguistics, language, impairments, relationship

Procedia PDF Downloads 49
4103 Speech Impact Realization via Manipulative Argumentation Techniques in Modern American Political Discourse

Authors: Zarine Avetisyan

Abstract:

The paper presents the discussion of scholars concerning speech impact, peculiarities of its realization, speech strategies, and techniques. Departing from the viewpoints of many prominent linguists, the paper suggests that manipulative argumentation be viewed as a most pervasive speech strategy with a certain set of techniques which are to be found in modern American political discourse. The precedence of their occurrence allows us to regard them as pragmatic patterns of speech impact realization in effective public speaking.

Keywords: speech impact, manipulative argumentation, political discourse, technique

Procedia PDF Downloads 508
4102 Speech Enhancement Using Kalman Filter in Communication

Authors: Eng. Alaa K. Satti Salih

Abstract:

In applications such as telecommunications, hands-free communication, and recording, which need at least one microphone, the signal is usually corrupted by noise and echo. An important application is speech enhancement, which suppresses the noise and echoes picked up by a microphone alongside the desired speech. Accordingly, the microphone signal has to be cleaned using digital signal processing (DSP) tools before it is played out, transmitted, or stored. Engineers have so far tried different approaches to improving the speech by recovering the desired speech signal from the noisy observations. With mobile communication especially in mind, this paper reconstructs the speech signal, observed in additive background noise, using the Kalman filter technique to estimate the parameters of the autoregressive (AR) process in the state-space model; the output speech signal is obtained in MATLAB. Accurate estimation by the Kalman filter enhances the speech and reduces the noise. The results are then compared and discussed in terms of the actual values and the estimated values that produce the reconstructed signals.
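The state-space idea described in this abstract can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' MATLAB implementation: it assumes the AR coefficients, the process-noise variance `q`, and the observation-noise variance `r` are already known, whereas the paper estimates the AR parameters. The function name `kalman_ar_denoise` and all parameter names are hypothetical.

```python
import numpy as np


def kalman_ar_denoise(y, a, q, r):
    """Kalman-filter a noisy signal modeled as an AR(p) process.

    Model (assumed for illustration):
        s[k] = a[0]*s[k-1] + ... + a[p-1]*s[k-p] + w[k],  Var(w) = q
        y[k] = s[k] + v[k],                               Var(v) = r
    Returns the filtered estimate of s.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    p = len(a)

    # Companion-form transition matrix; state is [s[k-p+1], ..., s[k]].
    F = np.zeros((p, p))
    F[:-1, 1:] = np.eye(p - 1)
    F[-1, :] = a[::-1]

    H = np.zeros((1, p))
    H[0, -1] = 1.0            # we observe only the newest sample
    Q = np.zeros((p, p))
    Q[-1, -1] = q             # process noise drives only the newest sample

    x = np.zeros((p, 1))      # state estimate
    P = np.eye(p)             # state covariance
    out = np.empty_like(y)

    for k, yk in enumerate(y):
        # Predict step.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step.
        S = (H @ P @ H.T).item() + r
        K = P @ H.T / S
        x = x + K * (yk - (H @ x).item())
        P = (np.eye(p) - K @ H) @ P
        out[k] = x[-1, 0]
    return out
```

In practice the AR coefficients are not known in advance and are typically re-estimated frame by frame from the noisy signal; that step is omitted here.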

Keywords: autoregressive process, Kalman filter, Matlab, noise speech

Procedia PDF Downloads 344
4101 Comparative Methods for Speech Enhancement and the Effects on Text-Independent Speaker Identification Performance

Authors: R. Ajgou, S. Sbaa, S. Ghendir, A. Chemsa, A. Taleb-Ahmed

Abstract:

Speech enhancement algorithms aim to improve speech quality. In this paper, we review some speech enhancement methods and evaluate their performance based on Perceptual Evaluation of Speech Quality (PESQ, ITU-T P.862) scores. All methods were evaluated in the presence of different kinds of noise using the TIMIT database and the NOIZEUS noisy speech corpus. The noise was taken from the AURORA database and includes suburban train noise, babble, car, exhibition hall, restaurant, street, airport and train station noise. Simulation results showed improved performance of speech enhancement for the tracking of non-stationary noise approach in comparison with various methods in terms of the PESQ measure. Moreover, we have evaluated the effects of the speech enhancement technique on a speaker identification system based on the autoregressive (AR) model and Mel-frequency cepstral coefficients (MFCC).

Keywords: speech enhancement, pesq, speaker recognition, MFCC

Procedia PDF Downloads 424
4100 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using the Hidden Markov Model (HMM) and the Viterbi algorithm. From the annotated Nepali text corpus, training and testing data sets are randomly separated. Both methods are employed on the data sets. The Viterbi algorithm is found to be computationally faster and more accurate than HMM. An accuracy of 95.43% is achieved using the Viterbi algorithm. Error analysis of where the mismatches took place is discussed in detail.
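For readers unfamiliar with the technique, Viterbi decoding over an HMM tagger can be sketched as below. This is a generic illustration, not the author's implementation: the probability tables are toy English values invented for the example (no Nepali data is reproduced), and the function name `viterbi` and the `floor` smoothing constant are assumptions.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under an HMM.

    start_p[t]      : P(first tag = t)
    trans_p[s][t]   : P(next tag = t | current tag = s)
    emit_p[t][word] : P(word | tag = t)
    Missing entries are floored to avoid log(0).
    """
    if not words:
        return []

    def lg(p):
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best path ending in tag t at position i.
    best = [{t: lg(start_p.get(t, 0.0)) + lg(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    back = [{}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags,
                       key=lambda s: best[i - 1][s] + lg(trans_p[s].get(t, 0.0)))
            best[i][t] = (best[i - 1][prev]
                          + lg(trans_p[prev].get(t, 0.0))
                          + lg(emit_p[t].get(words[i], 0.0)))
            back[i][t] = prev

    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model with two tags; all numbers are illustrative only.
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"dogs": 0.9, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.9}}

print(viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p))  # ['N', 'V']
```

In a real tagger the three tables would be estimated from counts in the annotated training split, and accuracy would be measured on the held-out test split.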

Keywords: hidden markov model, natural language processing, POS tagging, viterbi algorithm

Procedia PDF Downloads 327
4099 Visual Speech Perception of Arabic Emphatics

Authors: Maha Saliba Foster

Abstract:

Speech perception has been recognized as a bi-sensory process involving the auditory and visual channels. Compared to the auditory modality, the contribution of the visual signal to speech perception is not very well understood. Studying how the visual modality affects speech recognition can have pedagogical implications in second language learning, as well as clinical application in speech therapy. The current investigation explores the potential effect of visual speech cues on the perception of Arabic emphatics (AEs). The corpus consists of 36 minimal pairs, each containing two contrasting consonants, an AE versus a non-emphatic (NE). Movies of four Lebanese speakers were edited to allow perceivers to have a partial view of facial regions: lips only, lips-cheeks, lips-chin, lips-cheeks-chin, lips-cheeks-chin-neck. In the absence of any auditory information and relying solely on visual speech, perceivers were above chance at correctly identifying AEs or NEs across vowel contexts; moreover, the models were able to predict the probability of perceivers’ accuracy in identifying some of the COIs produced by certain speakers; additionally, results showed an overlap between the measurements selected by the computer and those selected by human perceivers. The lack of a significant face effect on the perception of AEs seems to point to the lips, present in all of the videos, as the most important and often sufficient facial feature for emphasis recognition. Future investigations will aim at refining the analyses of visual cues used by perceivers by using Principal Component Analysis and including the time evolution of facial feature measurements.

Keywords: Arabic emphatics, machine learning, speech perception, visual speech perception

Procedia PDF Downloads 306
4098 Trends of Code-Mixing in a Bilingual Nigerian Child: An Investigation of a Three-Year-Old Child

Authors: Salamatu Sani

Abstract:

This study is an investigation of how code-mixing manifests in the language development of a Nigerian child, especially in the Hausa-speaking environment. It is hinged on the fact that the environment influences the first language acquired by a child regardless of the cultural and/or linguistic background of the parents. The child under investigation has been subjected to close monitoring of her speech hitherto. It is a longitudinal study covering a period of twelve months (January 2018 to December 2018); that was when the subject was between twenty-four and thirty months of age. The speech has been recorded by means of a tape recorder, video, and a diary. The study employs as its theoretical framework for analysis emergentism, which is an eclectic blend of the behaviourist and the mentalist theories of language development. This is in agreement with the positions of Skinner and Watson. Following this investigation, it was discovered that the environment is a major factor that influences the exposure of a child to a language, more than the other factors, and that, if a child is exposed to more than one language, there is a great tendency for such a child to code-mix and code-switch in her speech production. The child under investigation, in spite of the linguistic background of her parents, speaks the Hausa language much better than the other languages around her, though with remarkable code-mixing with other languages around her such as English and Ebira. The study concludes that although a child is born with the innate ability to acquire a particular language, the environment plays a key role in triggering the innate ability and, consequently, the child is exposed to the acquisition of the dominant language around her at a particular given time.

Keywords: bilingual, code-mixing, emergentism, environment, Hausa

Procedia PDF Downloads 161
4097 Articles, Delimitation of Speech and Perception

Authors: Nataliya L. Ogurechnikova

Abstract:

The paper aims to clarify the function of articles in English speech and specify their place and role in the English language, taking into account the use of articles for delimitation of speech. A focus of the paper is the use of the definite and the indefinite articles with different types of noun phrases which comprise either one noun with or without attributes, such as the King, the Queen, the Lion, the Unicorn, a dimple, a smile, a new language, an unknown dialect, or several nouns with or without attributes, such as the King and Queen of Hearts, the Lion and Unicorn, a dimple or smile, a completely isolated language or dialect. It is stated that the function of delimitation is related to perception: the number of speech units in a text correlates with the way the speaker perceives and segments the denotation. The two following combinations of words, the house and garden and the house and the garden, contain different numbers of speech units, one and two respectively, and reveal two different perception modes which correspond to the use of the definite article in the examples given. Thus, the function of delimitation is twofold: it is related to perception and cognition, on the one hand, and, on the other hand, to grammar, if the subject of grammar is the structure of speech. Analysis of speech units in the paper is not limited to noun phrases and is amplified by discussion of peripheral phenomena which are nevertheless important because they make it possible to qualify articles as a syntactic phenomenon, whereas they are not infrequently described in terms of noun morphology. In this regard, attention is given to the history of linguistic studies, specifically to the description of English articles by Niels Haislund, a disciple of Otto Jespersen.
A discrepancy is noted between the initial plan of Jespersen who intended to describe articles as a syntactic phenomenon in ‘A Modern English Grammar on Historical Principles’ and the interpretation of articles in terms of noun morphology, finally given by Haislund. Another issue of the paper is correlation between description and denotation, being a traditional aspect of linguistic studies focused on articles. An overview of relevant studies, given in the paper, goes back to the works of G. Frege, which gave rise to a series of scientific works where the meaning of articles was described within the scope of logical semantics. Correlation between denotation and description is treated in the paper as the meaning of article, i.e. a component in its semantic structure, which differs from the function of delimitation and is similar to the meaning of other quantifiers. The paper further explains why the relation between description and denotation, i.e. the meaning of English article, is irrelevant for noun morphology and has nothing to do with nominal categories of the English language.

Keywords: delimitation of speech, denotation, description, perception, speech units, syntax

Procedia PDF Downloads 240
4096 Speech Rhythm Variation in Languages and Dialects: F0, Natural and Inverted Speech

Authors: Imen Ben Abda

Abstract:

Languages have been classified into different rhythm classes. 'Stress-timed' languages are exemplified by English, 'syllable-timed' languages by French and 'mora-timed' languages by Japanese. However, to the best of our knowledge, acoustic studies have not been unanimous in strictly establishing which rhythm category a given language belongs to and have failed to show empirical evidence for isochrony. Perception seems to be a good approach to categorizing languages into different rhythm classes. This study, within the scope of experimental phonetics, includes an account of different perceptual experiments using cues from natural and inverted speech, as well as pitch extracted from speech data. It is an attempt to categorize speech rhythm over a large set of Arabic (Tunisian, Algerian, Lebanese and Moroccan) and English dialects (Welsh, Irish, Scottish and Texan) as well as other languages such as Chinese, Japanese, French, and German. Listeners managed to classify the different languages and dialects into different rhythm classes using suprasegmental cues, mainly rhythm and pitch (F0). They also perceived rhythmic differences even among languages and dialects belonging to the same rhythm class. This may show that there are different subclasses within very broad rhythmic typologies.

Keywords: F0, inverted speech, mora-timing, rhythm variation, stress-timing, syllable-timing

Procedia PDF Downloads 526
4095 Freedom of Speech and Involvement in Hatred Speech on Social Media Networks

Authors: Sara Chinnasamy, Michelle Gun, M. Adnan Hashim

Abstract:

The Federal Constitution guarantees Malaysians the right to free speech and expression; yet hatred speech can be commonly found on social media platforms such as Facebook, Twitter, and Instagram. In the Malaysian social media sphere, most hatred speech involves religion, race and politics. Recent cases of racial attacks on social media have created social tensions among Malaysians. Many Malaysians argue for their right to freedom of speech. However, there are laws that limit their expression in public and protect social media users from becoming victims of hate speech. This paper aims to explore the attitude and involvement of Malaysian netizens towards freedom of speech and hatred speech on social media. It also examines the relationship between involvement in hatred speech among Malaysian netizens and attitude towards freedom of speech. For most Malaysians, practicing total freedom of speech in the open is unthinkable. As a result, the best channel to articulate their feelings and opinions liberally is the internet. With the advent of the internet medium, more and more Malaysians are conveying their viewpoints using the various internet channels, although the sensitivity of the audience is seldom taken into account. Consequently, this situation has led to pockets of social disharmony among the citizens. Although this unhealthy activity is denounced by the authorities, netizens are generally of the view that they have the right to write anything they want. Using the quantitative method, a survey was conducted among Malaysians aged between 18 and 50 years who are active social media users. Results from the survey reveal that despite a weak relationship between hatred speech involvement on social media and attitude towards freedom of speech, the association is still statistically significant. As such, it can be safely presumed that hatred speech on social media occurs due to the freedom of speech that exists by way of social media channels.

Keywords: freedom of speech, hatred speech, social media, Malaysia, netizens

Procedia PDF Downloads 457
4094 Possibilities, Challenges and the State of the Art of Automatic Speech Recognition in Air Traffic Control

Authors: Van Nhan Nguyen, Harald Holone

Abstract:

Over the past few years, a lot of research has been conducted to bring Automatic Speech Recognition (ASR) into various areas of Air Traffic Control (ATC), such as air traffic control simulation and training, monitoring live operators with the aim of improving safety, measuring air traffic controller workload, and analyzing large quantities of controller-pilot speech. Due to the high accuracy requirements of the ATC context and its unique challenges, automatic speech recognition has not been widely adopted in this field. With the aim of providing a good starting point for researchers who are interested in bringing automatic speech recognition into ATC, this paper gives an overview of the possibilities and challenges of applying automatic speech recognition in air traffic control. To provide this overview, we present an updated literature review of speech recognition technologies in general, as well as specific approaches relevant to the ATC context. Based on this literature review, criteria for selecting speech recognition approaches for the ATC domain are presented, and remaining challenges and possible solutions are discussed.

Keywords: automatic speech recognition, asr, air traffic control, atc

Procedia PDF Downloads 399
4093 Gender Difference in the Use of Request Strategies by Urdu/Punjabi Native Speakers

Authors: Muzaffar Hussain

Abstract:

Request strategies are considered a part of speech acts, which are frequently used in everyday communication. Each language provides speech acts to its speakers; therefore, the selection of an appropriate form seems more culture-specific than language-specific. The present paper investigates the gender-based difference in the use of request strategies by male and female native speakers of Urdu/Punjabi who are learning English as a second language. The data for the present study were collected from 68 graduate students who are learning English as an L2 in Pakistan. They were given an online close-ended questionnaire based on a Discourse Completion Test (DCT). After analyzing the data, it was found that the L1 male Urdu/Punjabi speakers were inclined to use more direct request strategies, while the female Urdu/Punjabi speakers used indirect request strategies. This paper also found that in some situations female participants used more direct strategies than male participants. The present study concludes that the use of request strategies is influenced by culture, social status, and power distribution in a society.

Keywords: gender variation, request strategies, face-threatening, second language pragmatics, language competence

Procedia PDF Downloads 189
4092 Minimum Data of a Speech Signal as Special Indicators of Identification in Phonoscopy

Authors: Nazaket Gazieva

Abstract:

Voice biometric data associated with physiological, psychological and other factors are widely used in forensic phonoscopy. There are various methods for identifying and verifying a person by voice. This article explores the minimum speech signal data as individual parameters of a speech signal. Monozygotic twins are believed to be genetically identical. Using the minimum data of the speech signal, we came to the conclusion that the voice imprint of monozygotic twins is individual. Based on the experiment, we conclude that the minimum indicators of the speech signal are more stable and reliable for phonoscopic examinations.

Keywords: phonogram, speech signal, temporal characteristics, fundamental frequency, biometric fingerprints

Procedia PDF Downloads 144
4091 Efficacy of Phonological Awareness Intervention for People with Language Impairment

Authors: I. Wardana Ketut, I. Suparwa Nyoman

Abstract:

This study investigated the form and characteristics of speech sounds produced by three Balinese subjects who had recovered from aphasia, and applied an intervention for their language impairment from both linguistic and neurological points of view. The failure to judge speech sounds was caused by impairment of the motor cortex, which indicated lesions in the left-hemisphere language zone. Articulation phenomena took the form of phoneme deletion, replacement or assimilation in individual words, and of meaning building for anomic aphasia. Therefore, the Balinese sound patterns were elicited by showing pictures to the subjects and recorded in order to recognize which individual consonants or vowels they produced unclearly and to find out how the sound disorder occurred. The physiology of sound production by the subjects' speech organs could show not only the accuracy of articulation but also the level of severity of the lesion they suffered from. The subjects' speech sounds were investigated, classified and analyzed to determine how impaired the lingual units were, and observed to clarify which weaknesses occurred in place or manner of articulation. Many fricative and stop consonants were replaced by glottal or palatal sounds because cranial nerves such as the facial, trigeminal, and hypoglossal nerves were impaired after the stroke. The phonological intervention was applied through a technique called phonemic articulation drill, and an examination was conducted to determine whether any change had been obtained. The findings showed that some weak articulations turned into clearer sounds and that simple meaning could be conveyed. The hierarchy of functional parts of the brain played an important role in language formulation and processing. These findings support the view that the role of the right hemisphere in recovery from aphasia is associated with functional brain reorganization.

Keywords: aphasia, intervention, phonology, stroke

Procedia PDF Downloads 196
4090 A Comparative Study on Vowel Articulation in Malayalam Speaking Children Using Cochlear Implant

Authors: Deepthy Ann Joy, N. Sreedevi

Abstract:

Early identification of hearing impairment (HI), before the onset of language development, can reduce its negative effect on the speech and language development of children. Early rehabilitation is very important in the improvement of speech production in children with HI. Besides conventional hearing aids, Cochlear Implants are being used in the rehabilitation of children with HI. However, delays in the acquisition of speech and language milestones persist in children with a Cochlear Implant (CI). Delays in speech milestones are reflected in speech sound errors. These errors reflect the temporal and spectral characteristics of speech. Hence, acoustic analysis of the speech sounds will provide a better representation of speech production skills in children with CI. The present study aimed at investigating the acoustic characteristics of vowels in Malayalam speaking children with a cochlear implant. The participants of the study consisted of 20 Malayalam speaking children aged four to seven years. The experimental group consisted of 10 children with CI, and the control group consisted of 10 typically developing children. Acoustic analysis was carried out for 5 short (/a/, /i/, /u/, /e/, /o/) and 5 long vowels (/a:/, /i:/, /u:/, /e:/, /o:/) in word-initial position. The responses were recorded and analyzed for acoustic parameters such as vowel duration, ratio of the duration of a short and long vowel, formant frequencies (F₁ and F₂) and Formant Centralization Ratio (FCR), computed using the formula (F₂u+F₂a+F₁i+F₁u)/(F₂i+F₁a). Findings of the present study indicated that the values for vowel duration were higher in the experimental group compared to the control group for all the vowels except /u/. The ratio of the duration of short and long vowels was also found to be higher in the experimental group compared to the control group, except for /i/. Further, F₁ for all vowels was found to be higher in the experimental group, with variability noticed in F₂ values. FCR was found to be higher in the experimental group, indicating vowel centralization. Further, the results of an independent t-test revealed no significant difference between the two groups across the parameters. It was found that the spectral and temporal measures in children with CI moved towards the normal range. The result emphasizes the significance of early rehabilitation in children with hearing impairment. The role of rehabilitation-related aspects is also discussed in detail; these can be clinically incorporated to improve speech therapy services for children with CI.
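The FCR formula quoted in the abstract can be written out as a short Python sketch. The function name and the formant values below are illustrative assumptions, not data or code from the study; a higher ratio is read in the abstract as indicating vowel centralization.

```python
def formant_centralization_ratio(f1, f2):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a).

    f1 and f2 map the corner vowels 'a', 'i', 'u' to their first and
    second formant frequencies in Hz.
    """
    numerator = f2["u"] + f2["a"] + f1["i"] + f1["u"]
    denominator = f2["i"] + f1["a"]
    return numerator / denominator


# Made-up formant values (Hz) for illustration only.
f1_example = {"a": 850.0, "i": 300.0, "u": 350.0}
f2_example = {"a": 1400.0, "i": 2600.0, "u": 900.0}
fcr_example = formant_centralization_ratio(f1_example, f2_example)

# A hypothetical, more centralized vowel space: /i/ and /u/ move towards /a/.
f1_central = {"a": 750.0, "i": 400.0, "u": 450.0}
f2_central = {"a": 1400.0, "i": 2200.0, "u": 1100.0}
fcr_central = formant_centralization_ratio(f1_central, f2_central)
```

With these invented numbers the second, more centralized vowel space yields the larger ratio, which is the direction of effect the abstract reports for the experimental group.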

Keywords: acoustics, cochlear implant, Malayalam, vowels

Procedia PDF Downloads 144
4089 Audio-Visual Co-Data Processing Pipeline

Authors: Rita Chattopadhyay, Vivek Anand Thoutam

Abstract:

Speech is the most widely accepted means of communication, allowing us to exchange feelings and thoughts quickly. Quite often, people can communicate orally but cannot interact or work with computers or devices. It is easier and quicker to give speech commands than to type commands to computers. In the same way, it is easier to listen to audio played from a device than to extract output from computers or devices. Especially with robotics being an emerging market with applications in warehouses, the hospitality industry, consumer electronics, assistive technology, etc., speech-based human-machine interaction is emerging as a lucrative feature for robot manufacturers. Considering this factor, the objective of this paper is to design the “Audio-Visual Co-Data Processing Pipeline.” This pipeline integrates automatic speech recognition, a natural language model for text understanding, object detection, and text-to-speech modules. There are many deep learning models for each type of module mentioned above, but OpenVINO Model Zoo models are used because the OpenVINO toolkit covers both computer vision and non-computer vision workloads across Intel hardware, maximizes performance, and accelerates application development. A speech command is given as input; it carries information about the target objects to be detected and the start and end times used to extract the required interval from the video. Speech is converted to text using the QuartzNet automatic speech recognition model. The summary is extracted from the text using a natural language model, Generative Pre-Trained Transformer-3 (GPT-3). Based on the summary, essential frames from the video are extracted, and the You Only Look Once (YOLO) object detection model detects objects in these extracted frames. Frame numbers that contain target objects (the objects specified in the speech command) are saved as text. Finally, this text (frame numbers) is converted to speech using a text-to-speech model and played from the device. This project is developed for 80 YOLO labels, and the user can extract frames based on only one or two target labels. This pipeline can easily be extended to more than two target labels by making appropriate changes in the object detection module. This project is developed for four different speech command formats by including sample examples in the prompt used by the GPT-3 model. Based on user preference, one can come up with a new speech command format by including some examples of the respective format in the prompt used by the GPT-3 model. This pipeline can be used in many projects such as human-machine interfaces, human-robot interaction, and surveillance through speech commands. All object detection projects can be upgraded using this pipeline so that one can give speech commands and have the output played from the device.
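The frame-selection step described above (time interval in, frame numbers out) might look roughly like the following Python sketch. Everything here is hypothetical: the function name, the per-frame detection dictionary, and the choice to keep a frame when it contains at least one target label are assumptions made for illustration, and no OpenVINO, QuartzNet, GPT-3 or YOLO call is shown.

```python
def frames_with_targets(detections, fps, start_s, end_s, targets):
    """Return frame numbers in [start_s, end_s] that contain a target label.

    detections: dict mapping frame number -> set of detected label strings
                (assumed to come from some object detector, not shown here).
    fps:        frames per second of the video.
    targets:    labels requested in the speech command.
    """
    first = int(start_s * fps)
    last = int(end_s * fps)
    wanted = set(targets)
    return [
        frame
        for frame in range(first, last + 1)
        # Keep the frame if it shares at least one label with the targets.
        if wanted & detections.get(frame, set())
    ]


# Invented detections for a 2 fps video.
example_detections = {
    2: {"person"},
    3: {"dog"},
    4: {"person", "car"},
    7: {"car"},
}
selected = frames_with_targets(example_detections, fps=2, start_s=1, end_s=3,
                               targets=["car", "person"])
# Text that a text-to-speech module could then read out.
spoken_text = ", ".join(str(frame) for frame in selected)
```

Frame 7 is excluded because it lies outside the requested interval, and frame 3 because it holds no target label.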

Keywords: OpenVINO, automatic speech recognition, natural language processing, object detection, text to speech

Procedia PDF Downloads 80
4088 Automatic Segmentation of the Clean Speech Signal

Authors: M. A. Ben Messaoud, A. Bouzid, N. Ellouze

Abstract:

Speech segmentation is the detection of change points for partitioning an input speech signal into regions, each of which corresponds to only one speaker. In this paper, we apply two features based on the multi-scale product (MP) of the clean speech, namely the spectral centroid of the MP and the zero crossings rate of the MP. We focus on multi-scale product analysis as an important tool for segmentation extraction. The multi-scale product is formed by multiplying the speech wavelet transform coefficients at three successive dyadic scales. We have evaluated our method on the Keele database. Experimental results show the effectiveness of our method, which achieves good performance. They show that the two simple features can find word boundaries and extract the segments of the clean speech.
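The two features named in the abstract can be illustrated with a small, standard-library Python sketch. It computes a zero crossings rate and a spectral centroid for an arbitrary frame of samples; in the paper these would be applied to the multi-scale product, whose wavelet computation is not reproduced here. The function names, the naive DFT, and the test tone are assumptions for illustration.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz, using a naive DFT."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(coeff))
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    weighted = sum(k * sample_rate / n * m for k, m in enumerate(magnitudes))
    return weighted / total


# A 1000 Hz tone sampled at 8000 Hz, 64 samples, with a small phase offset
# so that no sample falls exactly on a zero crossing.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE + 0.3) for t in range(64)]
tone_zcr = zero_crossing_rate(tone)
tone_centroid = spectral_centroid(tone, SAMPLE_RATE)
```

For a pure tone the centroid sits at the tone frequency; a segmentation method would track how such features change from frame to frame.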

Keywords: multiscale product, spectral centroid, speech segmentation, zero crossings rate

Procedia PDF Downloads 499
4087 Childhood Apraxia of Speech and Autism: Interaction Influences and Treatment

Authors: Elad Vashdi

Abstract:

It is common to find speech deficits among children diagnosed with Autism. This is seen in clinical practice and, more recently, in research. One of the DSM-V criteria suggests a speech delay (Delay in, or total lack of, the development of spoken language), but doesn't explain its cause. A common perception among professionals and families is that the inability to talk results from the autism. Autism is a name for a syndrome which just describes a phenomenon and is defined behaviorally. Since it is not yet based on a physiological gold standard, one cannot conclude the nature of a deficit from the name of the syndrome. A large retrospective study (n=270) which included children with motor speech difficulties was conducted in Israel. The study analyzed entry evaluations in a private clinic during the years 2006-2013. The data were extracted from the reports. A high percentage of children diagnosed with Autism (60%) was found. This result demonstrates the strong relationship between Autism and motor speech problems. It also supports recent research findings on the occurrence of Childhood apraxia of speech (CAS) among children with ASD. Only a small percentage of the participants in this research (10%) were diagnosed with CAS, even though their verbal deficits fitted well the guidelines for CAS diagnosis set by ASHA in 2007. This fact raises questions regarding the diagnostic procedure in Israel. The understanding that CAS may be highly prevalent within Autism and can have a remarkable influence on the course of early development should be a guiding tool within the diagnostic procedure. CAS can explain the nature of the speech problem among some autistic children and guide the treatment in a more accurate way. Calculating the prevalence of CAS including its comorbidity with ASD reveals new numbers and suggests treating the CAS population differently.

Keywords: childhood apraxia of speech, Autism, treatment, speech

Procedia PDF Downloads 275
4086 Eisenhower’s Farewell Speech: Initial and Continuing Communication Effects

Authors: B. Kuiper

Abstract:

When Dwight D. Eisenhower delivered his final Presidential speech in 1961, he was using the opportunity to bid farewell to America, but he was also trying to warn his fellow countrymen about deeper challenges threatening the country. In this analysis, Eisenhower’s speech is examined in light of the impact it had on American culture, communication concepts, and political ramifications. The paper initially highlights the previous literature on the speech, especially in light of its 50th anniversary, and reveals a man whose main concern was how the speech’s words would affect his beloved country. The painstaking approach to the wording of the speech to reveal the intent is key, particularly in light of analyzing the motivations according to “virtuous communication.” This philosophical construct indicates that Eisenhower’s Farewell Address was crafted carefully according to a departing President’s deepest values and concerns, concepts that he wanted to pass along to his successor, to his country, and even to the world.

Keywords: Eisenhower, mass communication, political speech, rhetoric

Procedia PDF Downloads 274
4085 A Sparse Representation Speech Denoising Method Based on Adapted Stopping Residue Error

Authors: Qianhua He, Weili Zhou, Aiwu Chen

Abstract:

A sparse representation speech denoising method based on adapted stopping residue error is presented in this paper. Firstly, the cross-correlation between the clean speech spectrum and the noise spectrum was analyzed, and an estimation method was proposed. In the denoising method, an over-complete dictionary of the clean speech power spectrum was learned with the K-singular value decomposition (K-SVD) algorithm. In the sparse representation stage, the stopping residue error was adaptively set according to the estimated cross-correlation and the adjusted noise spectrum, and the orthogonal matching pursuit (OMP) approach was applied to reconstruct the clean speech spectrum from the noisy speech. Finally, the clean speech was re-synthesised via the inverse Fourier transform with the reconstructed speech spectrum and the noisy speech phase. The experimental results show that the proposed method outperforms the conventional methods in terms of subjective and objective measures.
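To make the role of the stopping residue error concrete, here is a minimal, standard-library Python sketch of orthogonal matching pursuit that stops once the residual norm falls below a threshold. It is a toy under stated assumptions: the threshold is passed in as a plain parameter `stop_error` (the paper derives it adaptively, which is not reproduced), the dictionary is a tiny hand-made one rather than one learned with K-SVD, and all names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular (selected atoms linearly independent).
    """
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def omp(atoms, signal, stop_error, max_atoms=None):
    """Greedy sparse coding; stops when ||residual|| <= stop_error."""
    if max_atoms is None:
        max_atoms = len(atoms)
    support, coeffs = [], []
    residual = signal[:]
    while math.sqrt(dot(residual, residual)) > stop_error and len(support) < max_atoms:
        # Select the unused atom most correlated with the current residual.
        best = max(
            (i for i in range(len(atoms)) if i not in support),
            key=lambda i: abs(dot(atoms[i], residual)),
        )
        support.append(best)
        # Least-squares fit on the selected atoms via the normal equations.
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        target = [dot(atoms[i], signal) for i in support]
        coeffs = solve_linear(gram, target)
        approx = [
            sum(c * atoms[i][t] for c, i in zip(coeffs, support))
            for t in range(len(signal))
        ]
        residual = [s - a for s, a in zip(signal, approx)]
    return dict(zip(support, coeffs)), residual


# Toy dictionary of unit-norm atoms; the signal is 2*atom0 + 3*atom2.
toy_atoms = [[1, 0, 0, 0], [0, 1, 0, 0], [0.5, 0.5, 0.5, 0.5], [0, 0, 0, 1]]
toy_signal = [3.5, 1.5, 1.5, 1.5]
tight_code, tight_residual = omp(toy_atoms, toy_signal, stop_error=1e-6)
loose_code, loose_residual = omp(toy_atoms, toy_signal, stop_error=2.0)
```

A small threshold recovers both atoms, while a larger one stops after the first atom, leaving more of the signal unexplained; choosing that threshold well is what an adaptive stopping rule is meant to address.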

Keywords: speech denoising, sparse representation, k-singular value decomposition, orthogonal matching pursuit

Procedia PDF Downloads 499
4084 Combined Automatic Speech Recognition and Machine Translation in Business Correspondence Domain for English-Croatian

Authors: Sanja Seljan, Ivan Dunđer

Abstract:

The paper presents combined automatic speech recognition (ASR) for English and machine translation (MT) for English and Croatian in the domain of business correspondence. The first part presents the results of training a commercial ASR system on two English data sets, enriched by error analysis. The second part presents the results of machine translation performed by the online tool Google Translate for the English-Croatian and Croatian-English language pairs. Human evaluation in terms of usability is conducted and internal consistency is calculated by Cronbach's alpha coefficient, enriched by error analysis. Automatic evaluation is performed using the WER (Word Error Rate) and PER (Position-independent word Error Rate) metrics, followed by an investigation of Pearson’s correlation with human evaluation.
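The two automatic metrics mentioned above can be sketched in a few lines of Python. WER is computed here as word-level edit distance divided by reference length. For PER, one common bag-of-words formulation is assumed; other definitions exist, and the abstract does not say which one was used. The function names and the example sentences are invented for illustration.

```python
from collections import Counter


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def position_independent_error_rate(reference, hypothesis):
    """Bag-of-words error rate: word order is ignored (assumed formulation)."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


example_ref = "please send the invoice today"
example_hyp = "send please the invoice"
example_wer = word_error_rate(example_ref, example_hyp)
example_per = position_independent_error_rate(example_ref, example_hyp)
```

In this example the hypothesis swaps two words and drops one, so WER penalizes the reordering while PER only counts the missing word.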

Keywords: automatic machine translation, integrated language technologies, quality evaluation, speech recognition

Procedia PDF Downloads 484
4083 Detecting Hate Speech And Cyberbullying Using Natural Language Processing

Authors: Nádia Pereira, Paula Ferreira, Sofia Francisco, Sofia Oliveira, Sidclay Souza, Paula Paulino, Ana Margarida Veiga Simão

Abstract:

Social media has progressed into a platform for hate speech among its users, and thus, there is an increasing need to develop automatic detection classifiers of offense and conflicts to help decrease the prevalence of such incidents. Online communication can be used to intentionally harm someone, which is why such classifiers could be essential in social networks. A possible application of these classifiers is the automatic detection of cyberbullying. Even though identifying the aggressive language used in online interactions could be important to build cyberbullying datasets, there are other criteria that must be considered. Being able to capture the language which is indicative of the intent to harm others in a specific context of online interaction is fundamental. Offense and hate speech may be the foundation of online conflicts, which have become common in social media and are an emergent research focus in machine learning and natural language processing. This study presents two Portuguese language offense-related datasets which serve as examples for future research and extend the study of the topic. The first is similar to other offense detection related datasets and is entitled the Aggressiveness dataset. The second is a novelty because of the use of the history of the interaction between users and is entitled the Conflicts/Attacks dataset. Both datasets were developed in different phases. Firstly, we performed a content analysis of verbal aggression witnessed by adolescents in situations of cyberbullying. Secondly, we computed frequency analyses from the previous phase to gather lexical and linguistic cues used to identify potentially aggressive conflicts and attacks which were posted on Twitter. Thirdly, thorough annotation of real tweets was performed by independent postgraduate educational psychologists with experience in cyberbullying research. Lastly, we benchmarked these datasets with other machine learning classifiers.

Keywords: aggression, classifiers, cyberbullying, datasets, hate speech, machine learning

Procedia PDF Downloads 228
4082 Cross-Language Variation and the ‘Fused’ Zone in Bilingual Mental Lexicon: An Experimental Research

Authors: Yuliya E. Leshchenko, Tatyana S. Ostapenko

Abstract:

Language variation is a widespread linguistic phenomenon which can affect different levels of a language system: phonological, morphological, lexical, syntactic, etc. It is obvious that the scope of possible standard alternations within a particular language is limited by a variety of its norms and regulations which set more or less clear boundaries for what is possible and what is not possible for the speakers. The possibility of lexical variation (alternate usage of lexical items within the same contexts) is based on the fact that the meanings of words are not clearly and rigidly defined in the consciousness of the speakers. Therefore, lexical variation is usually connected with unstable relationship between words and their referents: a case when a particular lexical item refers to different types of referents, or when a particular referent can be named by various lexical items. We assume that the scope of lexical variation in bilingual speech is generally wider than that observed in monolingual speech due to the fact that, besides ‘lexical item – referent’ relations it involves the possibility of cross-language variation of L1 and L2 lexical items. We use the term ‘cross-language variation’ to denote a case when two equivalent words of different languages are treated by a bilingual speaker as freely interchangeable within the common linguistic context. As distinct from code-switching which is traditionally defined as the conscious use of more than one language within one communicative act, in case of cross-language lexical variation the speaker does not perceive the alternate lexical items as belonging to different languages and, therefore, does not realize the change of language code. In the paper, the authors present research of lexical variation of adult Komi-Permyak – Russian bilingual speakers. 
The two languages co-exist on the territory of the Komi-Permyak District in Russia (Komi-Permyak as the ethnic language and Russian as the official state language), are usually acquired from birth in natural linguistic environment and, according to the data of sociolinguistic surveys, are both identified by the speakers as coordinate mother tongues. The experimental research demonstrated that alternation of Komi-Permyak and Russian words within one utterance/phrase is highly frequent both in speech perception and production. Moreover, our participants estimated cross-language word combinations like ‘маленькая /Russian/ нывка /Komi-Permyak/’ (‘a little girl’) or ‘мунны /Komi-Permyak/ домой /Russian/’ (‘go home’) as regular/habitual, containing no violation of any linguistic rules and being equally possible in speech as the equivalent intra-language word combinations (‘учöтик нывка’ /Komi-Permyak/ or ‘идти домой’ /Russian/). All the facts considered, we claim that constant concurrent use of the two languages results in the fact that a large number of their words tend to be intuitively interpreted by the speakers as lexical variants not only related to the same referent, but also referring to both languages or, more precisely, to none of them in particular. Consequently, we can suppose that bilingual mental lexicon includes an extensive ‘fused’ zone of lexical representations that provide the basis for cross-language variation in bilingual speech.

Keywords: bilingualism, bilingual mental lexicon, code-switching, lexical variation

Procedia PDF Downloads 148