Search results for: part of speech
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7316

Search results for: part of speech

7196 A Mixing Matrix Estimation Algorithm for Speech Signals under the Under-Determined Blind Source Separation Model

Authors: Jing Wu, Wei Lv, Yibing Li, Yuanfan You

Abstract:

The separation of speech signals has become a research hotspot in the field of signal processing in recent years. It has many applications and influences in teleconferencing, hearing aids, speech recognition of machines and so on. The sounds received are usually noisy. The issue of identifying the sounds of interest and obtaining clear sounds in such an environment becomes a problem worth exploring, that is, the problem of blind source separation. This paper focuses on the under-determined blind source separation (UBSS). Sparse component analysis is generally used for the problem of under-determined blind source separation. The method is mainly divided into two parts. Firstly, the clustering algorithm is used to estimate the mixing matrix according to the observed signals. Then the signal is separated based on the known mixing matrix. In this paper, the problem of mixing matrix estimation is studied. This paper proposes an improved algorithm to estimate the mixing matrix for speech signals in the UBSS model. The traditional potential algorithm is not accurate for the mixing matrix estimation, especially for low signal-to noise ratio (SNR).In response to this problem, this paper considers the idea of an improved potential function method to estimate the mixing matrix. The algorithm not only avoids the inuence of insufficient prior information in traditional clustering algorithm, but also improves the estimation accuracy of mixing matrix. This paper takes the mixing of four speech signals into two channels as an example. The results of simulations show that the approach in this paper not only improves the accuracy of estimation, but also applies to any mixing matrix.

Keywords: DBSCAN, potential function, speech signal, the UBSS model

Procedia PDF Downloads 105
7195 A Comprehensive Methodology for Voice Segmentation of Large Sets of Speech Files Recorded in Naturalistic Environments

Authors: Ana Londral, Burcu Demiray, Marcus Cheetham

Abstract:

Speech recording is a methodology used in many different studies related to cognitive and behaviour research. Modern advances in digital equipment brought the possibility of continuously recording hours of speech in naturalistic environments and building rich sets of sound files. Speech analysis can then extract from these files multiple features for different scopes of research in Language and Communication. However, tools for analysing a large set of sound files and automatically extract relevant features from these files are often inaccessible to researchers that are not familiar with programming languages. Manual analysis is a common alternative, with a high time and efficiency cost. In the analysis of long sound files, the first step is the voice segmentation, i.e. to detect and label segments containing speech. We present a comprehensive methodology aiming to support researchers on voice segmentation, as the first step for data analysis of a big set of sound files. Praat, an open source software, is suggested as a tool to run a voice detection algorithm, label segments and files and extract other quantitative features on a structure of folders containing a large number of sound files. We present the validation of our methodology with a set of 5000 sound files that were collected in the daily life of a group of voluntary participants with age over 65. A smartphone device was used to collect sound using the Electronically Activated Recorder (EAR): an app programmed to record 30-second sound samples that were randomly distributed throughout the day. Results demonstrated that automatic segmentation and labelling of files containing speech segments was 74% faster when compared to a manual analysis performed with two independent coders. Furthermore, the methodology presented allows manual adjustments of voiced segments with visualisation of the sound signal and the automatic extraction of quantitative information on speech. In conclusion, we propose a comprehensive methodology for voice segmentation, to be used by researchers that have to work with large sets of sound files and are not familiar with programming tools.

Keywords: automatic speech analysis, behavior analysis, naturalistic environments, voice segmentation

Procedia PDF Downloads 258
7194 Frequency of Consonant Production Errors in Children with Speech Sound Disorder: A Retrospective-Descriptive Study

Authors: Amulya P. Rao, Prathima S., Sreedevi N.

Abstract:

Speech sound disorders (SSD) encompass the major concern in younger population of India with highest prevalence rate among the speech disorders. Children with SSD if not identified and rehabilitated at the earliest, are at risk for academic difficulties. This necessitates early identification using screening tools assessing the frequently misarticulated speech sounds. The literature on frequently misarticulated speech sounds is ample in English and other western languages targeting individuals with various communication disorders. Articulation is language specific, and there are limited studies reporting the same in Kannada, a Dravidian Language. Hence, the present study aimed to identify the frequently misarticulated consonants in Kannada and also to examine the error type. A retrospective, descriptive study was carried out using secondary data analysis of 41 participants (34-phonetic type and 7-phonemic type) with SSD in the age range 3-to 12-years. All the consonants of Kannada were analyzed by considering three words for each speech sound from the Kannada Diagnostic Photo Articulation test (KDPAT). Picture naming task was carried out, and responses were audio recorded. The recorded data were transcribed using IPA 2018 broad transcription. A criterion of 2/3 or 3/3 error productions was set to consider the speech sound to be an error. Number of error productions was calculated for each consonant in each participant. Then, the percentage of participants meeting the criteria were documented for each consonant to identify the frequently misarticulated speech sound. Overall results indicated that velar /k/ (48.78%) and /g/ (43.90%) were frequently misarticulated followed by voiced retroflex /ɖ/ (36.58%) and trill /r/ (36.58%). The lateral retroflex /ɭ/ was misarticulated by 31.70% of the children with SSD. Dentals (/t/, /n/), bilabials (/p/, /b/, /m/) and labiodental /v/ were produced correctly by all the participants. The highly misarticulated velars /k/ and /g/ were frequently substituted by dentals /t/ and /d/ respectively or omitted. Participants with SSD-phonemic type had multiple substitutions for one speech sound whereas, SSD-phonetic type had consistent single sound substitutions. Intra- and inter-judge reliability for 10% of the data using Cronbach’s Alpha revealed good reliability (0.8 ≤ α < 0.9). Analyzing a larger sample by replicating such studies will validate the present study results.

Keywords: consonant, frequently misarticulated, Kannada, SSD

Procedia PDF Downloads 92
7193 The Effect of Speech-Shaped Noise and Speaker’s Voice Quality on First-Grade Children’s Speech Perception and Listening Comprehension

Authors: I. Schiller, D. Morsomme, A. Remacle

Abstract:

Children’s ability to process spoken language develops until the late teenage years. At school, where efficient spoken language processing is key to academic achievement, listening conditions are often unfavorable. High background noise and poor teacher’s voice represent typical sources of interference. It can be assumed that these factors particularly affect primary school children, because their language and literacy skills are still low. While it is generally accepted that background noise and impaired voice impede spoken language processing, there is an increasing need for analyzing impacts within specific linguistic areas. Against this background, the aim of the study was to investigate the effect of speech-shaped noise and imitated dysphonic voice on first-grade primary school children’s speech perception and sentence comprehension. Via headphones, 5 to 6-year-old children, recruited within the French-speaking community of Belgium, listened to and performed a minimal-pair discrimination task and a sentence-picture matching task. Stimuli were randomly presented according to four experimental conditions: (1) normal voice / no noise, (2) normal voice / noise, (3) impaired voice / no noise, and (4) impaired voice / noise. The primary outcome measure was task score. How did performance vary with respect to listening condition? Preliminary results will be presented with respect to speech perception and sentence comprehension and carefully interpreted in the light of past findings. This study helps to support our understanding of children’s language processing skills under adverse conditions. Results shall serve as a starting point for probing new measures to optimize children’s learning environment.

Keywords: impaired voice, sentence comprehension, speech perception, speech-shaped noise, spoken language processing

Procedia PDF Downloads 164
7192 Programmed Speech to Text Summarization Using Graph-Based Algorithm

Authors: Hamsini Pulugurtha, P. V. S. L. Jagadamba

Abstract:

Programmed Speech to Text and Text Summarization Using Graph-based Algorithms can be utilized in gatherings to get the short depiction of the gathering for future reference. This gives signature check utilizing Siamese neural organization to confirm the personality of the client and convert the client gave sound record which is in English into English text utilizing the discourse acknowledgment bundle given in python. At times just the outline of the gathering is required, the answer for this text rundown. Thus, the record is then summed up utilizing the regular language preparing approaches, for example, solo extractive text outline calculations

Keywords: Siamese neural network, English speech, English text, natural language processing, unsupervised extractive text summarization

Procedia PDF Downloads 182
7191 Reconstructed Phase Space Features for Estimating Post Traumatic Stress Disorder

Authors: Andre Wittenborn, Jarek Krajewski

Abstract:

Trauma-related sadness in speech can alter the voice in several ways. The generation of non-linear aerodynamic phenomena within the vocal tract is crucial when analyzing trauma-influenced speech production. They include non-laminar flow and formation of jets rather than well-behaved laminar flow aspects. Especially state-space reconstruction methods based on chaotic dynamics and fractal theory have been suggested to describe these aerodynamic turbulence-related phenomena of the speech production system. To extract the non-linear properties of the speech signal, we used the time delay embedding method to reconstruct from a scalar time series (reconstructed phase space, RPS). This approach results in the extraction of 7238 Features per .wav file (N= 47, 32 m, 15 f). The speech material was prompted by telling about autobiographical related sadness-inducing experiences (sampling rate 16 kHz, 8-bit resolution). After combining these features in a support vector machine based machine learning approach (leave-one-sample out validation), we achieved a correlation of r = .41 with the well-established, self-report ground truth measure (RATS) of post-traumatic stress disorder (PTSD).

Keywords: non-linear dynamics features, post traumatic stress disorder, reconstructed phase space, support vector machine

Procedia PDF Downloads 79
7190 Speech Perception by Video Hosting Services Actors: Urban Planning Conflicts

Authors: M. Pilgun

Abstract:

The report presents the results of a study of the specifics of speech perception by actors of video hosting services on the material of urban planning conflicts. To analyze the content, the multimodal approach using neural network technologies is employed. Analysis of word associations and associative networks of relevant stimulus revealed the evaluative reactions of the actors. Analysis of the data identified key topics that generated negative and positive perceptions from the participants. The calculation of social stress and social well-being indices based on user-generated content made it possible to build a rating of road transport construction objects according to the degree of negative and positive perception by actors.

Keywords: social media, speech perception, video hosting, networks

Procedia PDF Downloads 121
7189 Functions and Pragmatic Aspects of English Nonsense

Authors: Natalia V. Ursul

Abstract:

In linguistic studies, the question of nonsense is attracting increasing interest. Nonsense is usually defined as spoken or written words that have no meaning. However, this definition is likely to be outdated as any speech act is generated due to the speaker’s pragmatic reasons, thus it cannot be purely illogical or meaningless. In the current paper a new working definition of nonsense as a linguistic medium will be formulated; moreover, the pragmatic peculiarities of newly coined linguistic patterns and possible ways of their interpretation will be discussed.

Keywords: nonsense, nonse verse, pragmatics, speech act

Procedia PDF Downloads 488
7188 Effects of Therapeutic Horseback Riding in Speech and Communication Skills of Children with Autism

Authors: Aristi Alopoudi, Sofia Beloka, Vassiliki Pliogou

Abstract:

Autism is a complex neuro-developmental disorder with a variety of difficulties in many aspects such as social interaction, communication skills and verbal communication (speech). The aim of this study was to examine the impact of therapeutic horseback riding in improving the verbal and communication skills of children diagnosed with autism during 16 sessions. The researcher examined whether the expression of speech, the use of vocabulary, semantics, pragmatics, echolalia and communication skills were influenced by the therapeutic horseback riding when we increase the frequency of the sessions. The researcher observed two subjects of primary-school aged, in a two case observation design, with autism during 16 therapeutic horseback riding sessions (one riding session per week). Compared to baseline, at the end of the 16th therapeutic session, therapeutic horseback riding increased both verbal skills such as vocabulary, semantics, pragmatics, formation of sentences and communication skills such as eye contact, greeting, participation in dialogue and spontaneous speech. It was noticeable that echolalia remained stable. Increased frequency of therapeutic horseback riding was beneficial for significant improvement in verbal and communication skills. More specifically, from the first to the last riding session there was a great increase of vocabulary, semantics, and formation of sentences. Pragmatics reached a lower level than semantics but the same as the right usage of the first person (for example, I make a hug) and echolalia used for that. A great increase of spontaneous speech was noticed. The eye contact was presented in a lower level, and there was a slow but important raise at the greeting as well as the participation in dialogue. Last but not least; this is a first study conducted in therapeutic horseback riding studying the verbal communication and communication skills in autistic children. According to the references, therapeutic horseback riding is a therapy with a variety of benefits, thus; this research made clear that in the benefits of this therapy there should be included the improvement of verbal speech and communication.

Keywords: Autism, communication skills, speech, therapeutic horseback riding

Procedia PDF Downloads 239
7187 Co-Design of Accessible Speech Recognition for Users with Dysarthric Speech

Authors: Elizabeth Howarth, Dawn Green, Sean Connolly, Geena Vabulas, Sara Smolley

Abstract:

Through the EU Horizon 2020 Nuvoic Project, the project team recruited 70 individuals in the UK and Ireland to test the Voiceitt speech recognition app and provide user feedback to developers. The app is designed for people with dysarthric speech, to support communication with unfamiliar people and access to speech-driven technologies such as smart home equipment and smart assistants. Participants with atypical speech, due to a range of conditions such as cerebral palsy, acquired brain injury, Down syndrome, stroke and hearing impairment, were recruited, primarily through organisations supporting disabled people. Most had physical or learning disabilities in addition to dysarthric speech. The project team worked with individuals, their families and local support teams, to provide access to the app, including through additional assistive technologies where needed. Testing was user-led, with participants asked to identify and test use cases most relevant to their daily lives over a period of three months or more. Ongoing technical support and training were provided remotely and in-person throughout the testing period. Structured interviews were used to collect feedback on users' experiences, with delivery adapted to individuals' needs and preferences. Informal feedback was collected through ongoing contact between participants, their families and support teams and the project team. Focus groups were held to collect feedback on specific design proposals. User feedback shared with developers has led to improvements to the user interface and functionality, including faster voice training, simplified navigation, the introduction of gamification elements and of switch access as an alternative to touchscreen access, with other feature requests from users still in development. This work offers a case-study in successful and inclusive co-design with the disabled community.

Keywords: co-design, assistive technology, dysarthria, inclusive speech recognition

Procedia PDF Downloads 76
7186 Low-Income African-American Fathers' Gendered Relationships with Their Children: A Study Examining the Impact of Child Gender on Father-Child Interactions

Authors: M. Lim Haslip

Abstract:

This quantitative study explores the correlation between child gender and father-child interactions. The author analyzes data from videotaped interactions between African-American fathers and their boy or girl toddler to explain how African-American fathers and toddlers interact with each other and whether these interactions differ by child gender. The purpose of this study is to investigate the research question: 'How, if at all, do fathers’ speech and gestures differ when interacting with their two-year-old sons versus daughters during free play?' The objectives of this study are to describe how child gender impacts African-American fathers’ verbal communication, examine how fathers gesture and speak to their toddler by gender, and to guide interventions for low-income African-American families and their children in early language development. This study involves a sample of 41 low-income African-American fathers and their 24-month-old toddlers. The videotape data will be used to observe 10-minute father-child interactions during free play. This study uses the already transcribed and coded data provided by Dr. Meredith Rowe, who did her study on the impact of African-American fathers’ verbal input on their children’s language development. The Child Language Data Exchange System (CHILDES program), created to study conversational interactions, was used for transcription and coding of the videotape data. The findings focus on the quantity of speech, diversity of speech, complexity of speech, and the quantity of gesture to inform the vocabulary usage, number of spoken words, length of speech, and the number of object pointings observed during father-toddler interactions in a free play setting. This study will help intervention and prevention scientists understand early language development in the African-American population. It will contribute to knowledge of the role of African-American fathers’ interactions on their children’s language development. It will guide interventions for the early language development of African-American children.

Keywords: parental engagement, early language development, African-American families, quantity of speech, diversity of speech, complexity of speech and the quantity of gesture

Procedia PDF Downloads 82
7185 Influence of Loudness Compression on Hearing with Bone Anchored Hearing Implants

Authors: Anja Kurz, Marc Flynn, Tobias Good, Marco Caversaccio, Martin Kompis

Abstract:

Bone Anchored Hearing Implants (BAHI) are routinely used in patients with conductive or mixed hearing loss, e.g. if conventional air conduction hearing aids cannot be used. New sound processors and new fitting software now allow the adjustment of parameters such as loudness compression ratios or maximum power output separately. Today it is unclear, how the choice of these parameters influences aided speech understanding in BAHI users. In this prospective experimental study, the effect of varying the compression ratio and lowering the maximum power output in a BAHI were investigated. Twelve experienced adult subjects with a mixed hearing loss participated in this study. Four different compression ratios (1.0; 1.3; 1.6; 2.0) were tested along with two different maximum power output settings, resulting in a total of eight different programs. Each participant tested each program during two weeks. A blinded Latin square design was used to minimize bias. For each of the eight programs, speech understanding in quiet and in noise was assessed. For speech in quiet, the Freiburg number test and the Freiburg monosyllabic word test at 50, 65, and 80 dB SPL were used. For speech in noise, the Oldenburg sentence test was administered. Speech understanding in quiet and in noise was improved significantly in the aided condition in any program, when compared to the unaided condition. However, no significant differences were found between any of the eight programs. In contrast, on a subjective level there was a significant preference for medium compression ratios of 1.3 to 1.6 and higher maximum power output.

Keywords: Bone Anchored Hearing Implant, baha, compression, maximum power output, speech understanding

Procedia PDF Downloads 353
7184 Forensic Speaker Verification in Noisy Environmental by Enhancing the Speech Signal Using ICA Approach

Authors: Ahmed Kamil Hasan Al-Ali, Bouchra Senadji, Ganesh Naik

Abstract:

We propose a system to real environmental noise and channel mismatch for forensic speaker verification systems. This method is based on suppressing various types of real environmental noise by using independent component analysis (ICA) algorithm. The enhanced speech signal is applied to mel frequency cepstral coefficients (MFCC) or MFCC feature warping to extract the essential characteristics of the speech signal. Channel effects are reduced using an intermediate vector (i-vector) and probabilistic linear discriminant analysis (PLDA) approach for classification. The proposed algorithm is evaluated by using an Australian forensic voice comparison database, combined with car, street and home noises from QUT-NOISE at a signal to noise ratio (SNR) ranging from -10 dB to 10 dB. Experimental results indicate that the MFCC feature warping-ICA achieves a reduction in equal error rate about (48.22%, 44.66%, and 50.07%) over using MFCC feature warping when the test speech signals are corrupted with random sessions of street, car, and home noises at -10 dB SNR.

Keywords: noisy forensic speaker verification, ICA algorithm, MFCC, MFCC feature warping

Procedia PDF Downloads 381
7183 Speech Recognition Performance by Adults: A Proposal for a Battery for Marathi

Authors: S. B. Rathna Kumar, Pranjali A Ujwane, Panchanan Mohanty

Abstract:

The present study aimed to develop a battery for assessing speech recognition performance by adults in Marathi. A total of four word lists were developed by considering word frequency, word familiarity, words in common use, and phonemic balance. Each word list consists of 25 words (15 monosyllabic words in CVC structure and 10 monosyllabic words in CVCV structure). Equivalence analysis and performance-intensity function testing was carried using the four word lists on a total of 150 native speakers of Marathi belonging to different regions of Maharashtra (Vidarbha, Marathwada, Khandesh and Northern Maharashtra, Pune, and Konkan). The subjects were further equally divided into five groups based on above mentioned regions. It was found that there was no significant difference (p > 0.05) in the speech recognition performance between groups for each word list and between word lists for each group. Hence, the four word lists developed were equally difficult for all the groups and can be used interchangeably. The performance-intensity (PI) function curve showed semi-linear function, and the groups’ mean slope of the linear portions of the curve indicated an average linear slope of 4.64%, 4.73%, 4.68%, and 4.85% increase in word recognition score per dB for list 1, list 2, list 3 and list 4 respectively. Although, there is no data available on speech recognition tests for adults in Marathi, most of the findings of the study are in line with the findings of research reports on other languages. The four word lists, thus developed, were found to have sufficient reliability and validity in assessing speech recognition performance by adults in Marathi.

Keywords: speech recognition performance, phonemic balance, equivalence analysis, performance-intensity function testing, reliability, validity

Procedia PDF Downloads 328
7182 A Comparative Study on Vowel Articulation in Malayalam Speaking Children Using Cochlear Implant

Authors: Deepthy Ann Joy, N. Sreedevi

Abstract:

Hearing impairment (HI) at an early age, identified before the onset of language development can reduce the negative effect on speech and language development of children. Early rehabilitation is very important in the improvement of speech production in children with HI. Other than conventional hearing aids, Cochlear Implants are being used in the rehabilitation of children with HI. However, delay in acquisition of speech and language milestones persist in children with Cochlear Implant (CI). Delay in speech milestones are reflected through speech sound errors. These errors reflect the temporal and spectral characteristics of speech. Hence, acoustical analysis of the speech sounds will provide a better representation of speech production skills in children with CI. The present study aimed at investigating the acoustic characteristics of vowels in Malayalam speaking children with a cochlear implant. The participants of the study consisted of 20 Malayalam speaking children in the age range of four and seven years. The experimental group consisted of 10 children with CI, and the control group consisted of 10 typically developing children. Acoustic analysis was carried out for 5 short (/a/, /i/, /u/, /e/, /o/) and 5 long vowels (/a:/, /i:/, /u:/, /e:/, /o:/) in word-initial position. The responses were recorded and analyzed for acoustic parameters such as Vowel duration, Ratio of the duration of a short and long vowel, Formant frequencies (F₁ and F₂) and Formant Centralization Ratio (FCR) computed using the formula (F₂u+F₂a+F₁i+F₁u)/(F₂i+F₁a). Findings of the present study indicated that the values for vowel duration were higher in experimental group compared to the control group for all the vowels except for /u/. Ratio of duration of short and long vowel was also found to be higher in experimental group compared to control group except for /i/. Further F₁ for all vowels was found to be higher in experimental group with variability noticed in F₂ values. FCR was found be higher in experimental group, indicating vowel centralization. Further, the results of independent t-test revealed no significant difference across the parameters in both the groups. It was found that the spectral and temporal measures in children with CI moved towards normal range. The result emphasizes the significance of early rehabilitation in children with hearing impairment. The role of rehabilitation related aspects are also discussed in detail which can be clinically incorporated for the betterment of speech therapeutic services in children with CI.

Keywords: acoustics, cochlear implant, Malayalam, vowels

Procedia PDF Downloads 116
7181 EEG and ABER Abnormalities in Children with Speech and Language Delay

Authors: Bharati Mehta, Manish Parakh, Bharti Bhandari, Sneha Ambwani

Abstract:

Speech and language delay (SLD) is seen commonly as a co-morbidity in children having severe resistant focal and generalized, syndromic and symptomatic epilepsies. It is however not clear whether epilepsy contributes to or is a mere association in the pathogenesis of SLD. Also, it is acknowledged that Auditory Brainstem Evoked Responses (ABER), besides used for evaluating hearing threshold, also aid in prognostication of neurological disorders and abnormalities in the hearing pathway in the brainstem. There is no circumscribed or surrogate neurophysiologic laboratory marker to adjudge the extent of SLD. The current study was designed to evaluate the abnormalities in Electroencephalography (EEG) and ABER in children with SLD who do not have an overt hearing deficit or autism. 94 children of age group 2-8 years with predominant SLD and without any gross motor developmental delay, head injury, gross hearing disorder, cleft lip/palate and autism were selected. Standard video Electroencephalography using the 10:20 international system and ABER after click stimulus with intensities 110 db until 40 db was performed in all children. EEG was abnormal in 47.9% (n= 45; 36 boys and 9 girls) children. In the children with abnormal EEG, 64.5% (n=29) had an abnormal background, 57.8% (n=27) had presence of generalized interictal epileptiform discharges (IEDs), 20% (n=9) had focal epileptiform discharges exclusively from left side and 33.3% (n=15) had multifocal IEDs occurring both in isolation or associated with generalised abnormalities. In ABER, surprisingly, the peak latencies for waves I, III & V, inter-peak latencies I-III & I-V, III-V and wave amplitude ratio V/I, were found within normal limits in both ears of all the children. Thus in the current study it is certain that presence of generalized IEDs in EEG are seen in higher frequency with SLD and focal IEDs are seen exclusively in left hemisphere in these children. It may be possible that even with generalized EEG abnormalities present in these children, left hemispheric abnormalities as a part of this generalized dysfunction may be responsible for the speech and language dysfunction. The current study also emphasizes that ABER may not be routinely recommended as diagnostic or prognostic tool in children with SLD without frank hearing deficit or autism, thus reducing the burden on electro physiologists, laboratories and saving time and financial resources.

Keywords: ABER, EEG, speech, language delay

Procedia PDF Downloads 489
7180 Exploring Pre-Trained Automatic Speech Recognition Model HuBERT for Early Alzheimer’s Disease and Mild Cognitive Impairment Detection in Speech

Authors: Monica Gonzalez Machorro

Abstract:

Dementia is hard to diagnose because of the lack of early physical symptoms. Early dementia recognition is key to improving the living condition of patients. Speech technology is considered a valuable biomarker for this challenge. Recent works have utilized conventional acoustic features and machine learning methods to detect dementia in speech. BERT-like classifiers have reported the most promising performance. One constraint, nonetheless, is that these studies are either based on human transcripts or on transcripts produced by automatic speech recognition (ASR) systems. This research contribution is to explore a method that does not require transcriptions to detect early Alzheimer’s disease (AD) and mild cognitive impairment (MCI). This is achieved by fine-tuning a pre-trained ASR model for the downstream early AD and MCI tasks. To do so, a subset of the thoroughly studied Pitt Corpus is customized. The subset is balanced for class, age, and gender. Data processing also involves cropping the samples into 10-second segments. For comparison purposes, a baseline model is defined by training and testing a Random Forest with 20 extracted acoustic features using the librosa library implemented in Python. These are: zero-crossing rate, MFCCs, spectral bandwidth, spectral centroid, root mean square, and short-time Fourier transform. The baseline model achieved a 58% accuracy. To fine-tune HuBERT as a classifier, an average pooling strategy is employed to merge the 3D representations from audio into 2D representations, and a linear layer is added. The pre-trained model used is ‘hubert-large-ls960-ft’. Empirically, the number of epochs selected is 5, and the batch size defined is 1. Experiments show that our proposed method reaches a 69% balanced accuracy. This suggests that the linguistic and speech information encoded in the self-supervised ASR-based model is able to learn acoustic cues of AD and MCI.

Keywords: automatic speech recognition, early Alzheimer’s recognition, mild cognitive impairment, speech impairment

Procedia PDF Downloads 97
7179 A Semantic Analysis of Modal Verbs in Barak Obama’s 2012 Presidential Campaign Speech

Authors: Kais A. Kadhim

Abstract:

This paper is a semantic analysis of the English modals in Obama’s speech. The main objective of this study is to analyze selected modal auxiliaries identified in selected speeches of Obama’s campaign based on Coates’ (1983) semantic clusters. A total of fifteen speeches of Obama’s campaign were selected as the primary data and the modal auxiliaries selected for analysis include will, would, can, could, should, must, ought, shall, may and might. All the modal auxiliaries taken from the speeches of Barack Obama were analyzed based on the framework of Coates’ semantic clusters. Such analytical framework was carried out to examine how modal auxiliaries are used in the context of persuading people in Obama’s campaign speeches. The findings reveal that modals of intention, prediction, futurity and modals of possibility, ability, permission are mostly used in Obama’s campaign speeches.

Keywords: modals, meaning, persuasion, speech

Procedia PDF Downloads 377
7178 Cross-Cultural Pragmatics: Apology Strategies by Libyans

Authors: Ahmed Elgadri

Abstract:

In the last thirty years, studies on cross-cultural pragmatics in general and apology strategies in specific have focused on western and East-Asian societies. A small volume of research has been conducted in investigating speech acts production by Arabic dialect speakers. Therefore, this study investigated the apology strategies used by Libyan Arabic speakers using an online Discourse Completion Task (DCT) questionnaire. The DCT consisted of six situations covering different social contexts. The survey was written in Libyan Arabic dialect to help generate vernacular speech as much as possible. The participants were 25 Libyan nationals, 12 females, and 13 males. Also, to get a deeper understanding of the motivation behind the use of certain strategies, the researcher interviewed four participants using the Libyan Arabic dialect as well. The results revealed a high use of IFID, offer of repair, and explanation. Although this might support the universality claim of speech acts strategies, it was clear that cultural norms and religion determined the choice of apology strategies significantly. This led to the discovery of new culture-specific strategies, as outlined later in this paper. This study gives an insight into politeness strategies in Libyan society, and it is hoped to contribute to the field of cross-cultural pragmatics.

Keywords: apologies, cross-cultural pragmatics, language and culture, Libyan Arabic, politeness, pragmatics, socio-pragmatics, speech acts

Procedia PDF Downloads 122
7177 Functional Outcome of Speech, Voice and Swallowing Following Excision of Glomus Jugulare Tumor

Authors: B. S. Premalatha, Kausalya Sahani

Abstract:

Background: Glomus jugulare tumors arise within the jugular foramen and are commonly seen in females particularly on the left side. Surgical excision of the tumor may cause lower cranial nerve deficits. Cranial nerve involvement produces hoarseness of voice, slurred speech, and dysphagia along with other physical symptoms, thereby affecting the quality of life of individuals. Though oncological clearance is mainly emphasized on while treating these individuals, little importance is given to their communication, voice and swallowing problems, which play a crucial part in daily functioning. Objective: To examine the functions of voice, speech and swallowing outcomes of the subjects, following excision of glomus jugulare tumor. Methods: Two female subjects aged 56 and 62 years had come with a complaint of change in voice, inability to swallow and reduced clarity of speech following surgery for left glomus jugulare tumor were participants of the study. Their surgical information revealed multiple cranial nerve palsies involving the left facial, left superior and recurrent branches of the vagus nerve, left pharyngeal, left soft palate, left hypoglossal and vestibular nerves. Functional outcomes of voice, speech and swallowing were evaluated by perceptual and objective assessment procedures. Assessment included the examination of oral structures and functions, dysarthria by Frenchey dysarthria assessment, cranial nerve functions and swallowing functions. MDVP and Dr. Speech software were used to evaluate acoustic parameters of voice and quality of voice respectively. Results: The study revealed that both the subjects, subsequent to excision of glomus jugulare tumor, showed a varied picture of affected oral structure and functions, articulation, voice and swallowing functions. The cranial nerve assessment showed impairment of the vagus, hypoglossal, facial and glossopharyngeal nerves. Voice examination indicated vocal cord paralysis associated with breathy quality of voice, weak voluntary cough, reduced pitch and loudness range, and poor respiratory support. Perturbation parameters as jitter, shimmer were affected along with s/z ratio indicative of voice fold pathology. Reduced MPD(Maximum Phonation Duration) of vowels indicated that disturbed coordination between respiratory and laryngeal systems. Hypernasality was found to be a prominent feature which reduced speech intelligibility. Imprecise articulation was seen in both the subjects as the hypoglossal nerve was affected following surgery. Injury to vagus, hypoglossal, gloss pharyngeal and facial nerves disturbed the function of swallowing. All the phases of swallow were affected. Aspiration was observed before and during the swallow, confirming the oropharyngeal dysphagia. All the subsystems were affected as per Frenchey Dysarthria Assessment signifying the diagnosis of flaccid dysarthria. Conclusion: There is an observable communication and swallowing difficulty seen following excision of glomus jugulare tumor. Even with complete resection, extensive rehabilitation may be necessary due to significant lower cranial nerve dysfunction. The finding of the present study stresses the need for involvement of as speech and swallowing therapist for pre-operative counseling and assessment of functional outcomes.

Keywords: functional outcome, glomus jugulare tumor excision, multiple cranial nerve impairment, speech and swallowing

Procedia PDF Downloads 228
7176 Intertextuality in Choreography: Investigation of Text and Movements in Making Choreography

Authors: Muhammad Fairul Azreen Mohd Zahid

Abstract:

Speech, text, and movement intensify aspects of creating choreography by connecting with emotional entanglements, tradition, literature, and other texts. This research focuses on the practice as research that will prioritise the choreography process as an inquiry approach. With the driven context, the study intervenes in critical conjunctions of choreographic theory, bringing together new reflections on the moving body, spaces of action, as well as intertextuality between text and movements in making choreography. Throughout the process, the researcher will introduce the level of deliberation from speech through movements and text to express emotion within a narrative context of an “illocutionary act.” This practice as research will produce a different meaning from the “utterance text” to “utterance movements” in the perspective of speech acts theory by J.L Austin based on fragmented text from “pidato adat” which has been used as opening speech in Randai. Looking at the theory of deconstruction by Jacque Derrida also will give a different meaning from the text. Nevertheless, the process of creating the choreography will also help to lay the basic normative structure implicit in “constative” (statement text/movement) and “performative” (command text/movement). Through this process, the researcher will also look at several methods of using text from two works by Joseph Gonzales, “Becoming King-The Pakyung Revisited” and Crystal Pite's “The Statement,” as references to produce different methods in making choreography. The perspective from the semiotic foundation will support how occurrences within dance discourses as texts through a semiotic lens. The method used in this research is qualitative, which includes an interview and simulation of the concept to get an outcome.

Keywords: intertextuality, choreography, speech act, performative, deconstruction

Procedia PDF Downloads 65
7175 Gender Difference in the Use of Request Strategies by Urdu/Punjabi Native Speakers

Authors: Muzaffar Hussain

Abstract:

Requests strategies are considered as a part of the speech acts, which are frequently used in everyday communication. Each language provides speech acts to the speakers; therefore, the selection of appropriate form seems more culture-specific rather than language. The present paper investigates the gender-based difference in the use of request strategies by native speakers of Urdu/Punjabi male and female who are learning English as a second language. The data for the present study were collected from 68 graduate students, who are learning English as an L2 in Pakistan. They were given an online close-ended questionnaire, based on Discourse Completion Test (DCT). After analyzing the data, it was found that the L1 male Urdu/Punjabi speakers were inclined to use more direct request strategies while the female Urdu/Punjabi speakers used indirect request strategies. This paper also found that in some situations female participants used more direct strategies than male participants. The present study concludes that the use of request strategies is influenced by culture, social status, and power distribution in a society.

Keywords: gender variation, request strategies, face-threatening, second language pragmatics, language competence

Procedia PDF Downloads 159
7174 Robust Features for Impulsive Noisy Speech Recognition Using Relative Spectral Analysis

Authors: Hajer Rahali, Zied Hajaiej, Noureddine Ellouze

Abstract:

The goal of speech parameterization is to extract the relevant information about what is being spoken from the audio signal. In speech recognition systems Mel-Frequency Cepstral Coefficients (MFCC) and Relative Spectral Mel-Frequency Cepstral Coefficients (RASTA-MFCC) are the two main techniques used. It will be shown in this paper that it presents some modifications to the original MFCC method. In our work the effectiveness of proposed changes to MFCC called Modified Function Cepstral Coefficients (MODFCC) were tested and compared against the original MFCC and RASTA-MFCC features. The prosodic features such as jitter and shimmer are added to baseline spectral features. The above-mentioned techniques were tested with impulsive signals under various noisy conditions within AURORA databases.

Keywords: auditory filter, impulsive noise, MFCC, prosodic features, RASTA filter

Procedia PDF Downloads 396
7173 A New Dual Forward Affine Projection Adaptive Algorithm for Speech Enhancement in Airplane Cockpits

Authors: Djendi Mohmaed

Abstract:

In this paper, we propose a dual adaptive algorithm, which is based on the combination between the forward blind source separation (FBSS) structure and the affine projection algorithm (APA). This proposed algorithm combines the advantages of the source separation properties of the FBSS structure and the fast convergence characteristics of the APA algorithm. The proposed algorithm needs two noisy observations to provide an enhanced speech signal. This process is done in a blind manner without the need for ant priori information about the source signals. The proposed dual forward blind source separation affine projection algorithm is denoted (DFAPA) and used for the first time in an airplane cockpit context to enhance the communication from- and to- the airplane. Intensive experiments were carried out in this sense to evaluate the performance of the proposed DFAPA algorithm.

Keywords: adaptive algorithm, speech enhancement, system mismatch, SNR

Procedia PDF Downloads 108
7172 Searching Linguistic Synonyms through Parts of Speech Tagging

Authors: Faiza Hussain, Usman Qamar

Abstract:

Synonym-based searching is recognized to be a complicated problem as text mining from unstructured data of web is challenging. Finding useful information which matches user need from bulk of web pages is a cumbersome task. In this paper, a novel and practical synonym retrieval technique is proposed for addressing this problem. For replacement of semantics, user intent is taken into consideration to realize the technique. Parts-of-Speech tagging is applied for pattern generation of the query and a thesaurus for this experiment was formed and used. Comparison with Non-Context Based Searching, Context Based searching proved to be a more efficient approach while dealing with linguistic semantics. This approach is very beneficial in doing intent based searching. Finally, results and future dimensions are presented.

Keywords: natural language processing, text mining, information retrieval, parts-of-speech tagging, grammar, semantics

Procedia PDF Downloads 279
7171 Hindi Speech Synthesis by Concatenation of Recognized Hand Written Devnagri Script Using Support Vector Machines Classifier

Authors: Saurabh Farkya, Govinda Surampudi

Abstract:

Optical Character Recognition is one of the current major research areas. This paper is focussed on recognition of Devanagari script and its sound generation. This Paper consists of two parts. First, Optical Character Recognition of Devnagari handwritten Script. Second, speech synthesis of the recognized text. This paper shows an implementation of support vector machines for the purpose of Devnagari Script recognition. The Support Vector Machines was trained with Multi Domain features; Transform Domain and Spatial Domain or Structural Domain feature. Transform Domain includes the wavelet feature of the character. Structural Domain consists of Distance Profile feature and Gradient feature. The Segmentation of the text document has been done in 3 levels-Line Segmentation, Word Segmentation, and Character Segmentation. The pre-processing of the characters has been done with the help of various Morphological operations-Otsu's Algorithm, Erosion, Dilation, Filtration and Thinning techniques. The Algorithm was tested on the self-prepared database, a collection of various handwriting. Further, Unicode was used to convert recognized Devnagari text into understandable computer document. The document so obtained is an array of codes which was used to generate digitized text and to synthesize Hindi speech. Phonemes from the self-prepared database were used to generate the speech of the scanned document using concatenation technique.

Keywords: Character Recognition (OCR), Text to Speech (TTS), Support Vector Machines (SVM), Library of Support Vector Machines (LIBSVM)

Procedia PDF Downloads 468
7170 Mistranslation in Cross Cultural Communication: A Discourse Analysis on Former President Bush’s Speech in 2001

Authors: Lowai Abed

Abstract:

The differences in languages play a big role in cross-cultural communication. If meanings are not translated accurately, the risk can be crucial not only on an interpersonal level, but also on the international and political levels. The use of metaphorical language by politicians can cause great confusion, often leading to statements being misconstrued. In these situations, it is the translators who struggle to put forward the intended meaning with clarity and this makes translation an important field to study and analyze when it comes to cross-cultural communication. Owing to the growing importance of language and the power of translation in politics, this research analyzes part of President Bush’s speech in 2001 in which he used the word “Crusade” which caused his statement to be misconstrued. The research uses a discourse analysis of cross-cultural communication literature which provides answers supported by historical, linguistic, and communicative perspectives. The first finding indicates that the word ‘crusade’ carries different meaning and significance in the narratives of the Western world when compared to the Middle East. The second one is that, linguistically, maintaining cultural meanings through translation is quite difficult and challenging. Third, when it comes to the cross-cultural communication perspective, the common and frequent usage of literal translation is a sign of poor strategies being followed in translation training. Based on the example of Bush’s speech, this paper hopes to highlight the weak practices in translation in cross-cultural communication which are still commonly used across the world. Translation studies have to take issues such as this seriously and attempt to find a solution. In every language, there are words and phrases that have cultural, historical and social meanings that are woven into the language. Literal translation is not the solution for this problem because that strategy is unable to convey these meanings in the target language.

Keywords: crusade, metaphor, mistranslation, war in terror

Procedia PDF Downloads 81
7169 A Simple Adaptive Atomic Decomposition Voice Activity Detector Implemented by Matching Pursuit

Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic

Abstract:

A simple adaptive voice activity detector (VAD) is implemented using Gabor and gammatone atomic decomposition of speech for high Gaussian noise environments. Matching pursuit is used for atomic decomposition, and is shown to achieve optimal speech detection capability at high data compression rates for low signal to noise ratios. The most active dictionary elements found by matching pursuit are used for the signal reconstruction so that the algorithm adapts to the individual speakers dominant time-frequency characteristics. Speech has a high peak to average ratio enabling matching pursuit greedy heuristic of highest inner products to isolate high energy speech components in high noise environments. Gabor and gammatone atoms are both investigated with identical logarithmically spaced center frequencies, and similar bandwidths. The algorithm performs equally well for both Gabor and gammatone atoms with no significant statistical differences. The algorithm achieves 70% accuracy at a 0 dB SNR, 90% accuracy at a 5 dB SNR and 98% accuracy at a 20dB SNR using 30dB SNR as a reference for voice activity.

Keywords: atomic decomposition, gabor, gammatone, matching pursuit, voice activity detection

Procedia PDF Downloads 266
7168 The Resistance Reader Program Based on Image Processing

Authors: Janpen Srijan, Nahathai Tanmang, Thanit Purathanang, Anun Dowchern, Saksit Summart, Seangduan Kampimpa

Abstract:

This paper presents the resistance reader program based on image processing by using MATLAB. The proposed program is divided into six parts; the first part is the web camera; the second part is a watt selection before shooting the resistor; the third part is a part of finding the position of the color on the mid-point of resistor; the fourth part is a part of identifying color code of the resistor; the fifth part is a part of taking the number of values for each color for resistance calculation and the last part is a part of displaying result of resistance value. The experimental result of the resistance reader program based on image processing was able to display the resistance value of resistor. The accuracy of proposed program is 85 percent for 1 watt resistor. It has 15 percent of reading error because a problem with the color code of some resistor was too bright.

Keywords: resistance reader program, image processing, resistor, MATLAB

Procedia PDF Downloads 347
7167 Articles, Delimitation of Speech and Perception

Authors: Nataliya L. Ogurechnikova

Abstract:

The paper aims to clarify the function of articles in the English speech and specify their place and role in the English language, taking into account the use of articles for delimitation of speech. A focus of the paper is the use of the definite and the indefinite articles with different types of noun phrases which comprise either one noun with or without attributes, such as the King, the Queen, the Lion, the Unicorn, a dimple, a smile, a new language, an unknown dialect, or several nouns with or without attributes, such as the King and Queen of Hearts, the Lion and Unicorn, a dimple or smile, a completely isolated language or dialect. It is stated that the function of delimitation is related to perception: the number of speech units in a text correlates with the way the speaker perceives and segments the denotation. The two following combinations of words the house and garden and the house and the garden contain different numbers of speech units, one and two respectively, and reveal two different perception modes which correspond to the use of the definite article in the examples given. Thus, the function of delimitation is twofold, it is related to perception and cognition, on the one hand, and, on the other hand, to grammar, if the subject of grammar is the structure of speech. Analysis of speech units in the paper is not limited by noun phrases and is amplified by discussion of peripheral phenomena which are nevertheless important because they enable to qualify articles as a syntactic phenomenon whereas they are not infrequently described in terms of noun morphology. With this regard attention is given to the history of linguistic studies, specifically to the description of English articles by Niels Haislund, a disciple of Otto Jespersen. A discrepancy is noted between the initial plan of Jespersen who intended to describe articles as a syntactic phenomenon in ‘A Modern English Grammar on Historical Principles’ and the interpretation of articles in terms of noun morphology, finally given by Haislund. Another issue of the paper is correlation between description and denotation, being a traditional aspect of linguistic studies focused on articles. An overview of relevant studies, given in the paper, goes back to the works of G. Frege, which gave rise to a series of scientific works where the meaning of articles was described within the scope of logical semantics. Correlation between denotation and description is treated in the paper as the meaning of article, i.e. a component in its semantic structure, which differs from the function of delimitation and is similar to the meaning of other quantifiers. The paper further explains why the relation between description and denotation, i.e. the meaning of English article, is irrelevant for noun morphology and has nothing to do with nominal categories of the English language.

Keywords: delimitation of speech, denotation, description, perception, speech units, syntax

Procedia PDF Downloads 217