Search results for: speech
654 Influence of Loudness Compression on Hearing with Bone Anchored Hearing Implants
Authors: Anja Kurz, Marc Flynn, Tobias Good, Marco Caversaccio, Martin Kompis
Abstract:
Bone Anchored Hearing Implants (BAHI) are routinely used in patients with conductive or mixed hearing loss, e.g. if conventional air conduction hearing aids cannot be used. New sound processors and new fitting software now allow the adjustment of parameters such as loudness compression ratios or maximum power output separately. Today it is unclear, how the choice of these parameters influences aided speech understanding in BAHI users. In this prospective experimental study, the effect of varying the compression ratio and lowering the maximum power output in a BAHI were investigated. Twelve experienced adult subjects with a mixed hearing loss participated in this study. Four different compression ratios (1.0; 1.3; 1.6; 2.0) were tested along with two different maximum power output settings, resulting in a total of eight different programs. Each participant tested each program during two weeks. A blinded Latin square design was used to minimize bias. For each of the eight programs, speech understanding in quiet and in noise was assessed. For speech in quiet, the Freiburg number test and the Freiburg monosyllabic word test at 50, 65, and 80 dB SPL were used. For speech in noise, the Oldenburg sentence test was administered. Speech understanding in quiet and in noise was improved significantly in the aided condition in any program, when compared to the unaided condition. However, no significant differences were found between any of the eight programs. In contrast, on a subjective level there was a significant preference for medium compression ratios of 1.3 to 1.6 and higher maximum power output.Keywords: Bone Anchored Hearing Implant, baha, compression, maximum power output, speech understanding
Procedia PDF Downloads 387653 Hate Speech Detection in Tunisian Dialect
Authors: Helmi Baazaoui, Mounir Zrigui
Abstract:
This study addresses the challenge of hate speech detection in Tunisian Arabic text, a critical issue for online safety and moderation. Leveraging the strengths of the AraBERT model, we fine-tuned and evaluated its performance against the Bi-LSTM model across four distinct datasets: T-HSAB, TNHS, TUNIZI-Dataset, and a newly compiled dataset with diverse labels such as Offensive Language, Racism, and Religious Intolerance. Our experimental results demonstrate that AraBERT significantly outperforms Bi-LSTM in terms of Recall, Precision, F1-Score, and Accuracy across all datasets. The findings underline the robustness of AraBERT in capturing the nuanced features of Tunisian Arabic and its superior capability in classification tasks. This research not only advances the technology for hate speech detection but also provides practical implications for social media moderation and policy-making in Tunisia. Future work will focus on expanding the datasets and exploring more sophisticated architectures to further enhance detection accuracy, thus promoting safer online interactions.Keywords: hate speech detection, Tunisian Arabic, AraBERT, Bi-LSTM, Gemini annotation tool, social media moderation
Procedia PDF Downloads 16652 Forensic Speaker Verification in Noisy Environmental by Enhancing the Speech Signal Using ICA Approach
Authors: Ahmed Kamil Hasan Al-Ali, Bouchra Senadji, Ganesh Naik
Abstract:
We propose a system to real environmental noise and channel mismatch for forensic speaker verification systems. This method is based on suppressing various types of real environmental noise by using independent component analysis (ICA) algorithm. The enhanced speech signal is applied to mel frequency cepstral coefficients (MFCC) or MFCC feature warping to extract the essential characteristics of the speech signal. Channel effects are reduced using an intermediate vector (i-vector) and probabilistic linear discriminant analysis (PLDA) approach for classification. The proposed algorithm is evaluated by using an Australian forensic voice comparison database, combined with car, street and home noises from QUT-NOISE at a signal to noise ratio (SNR) ranging from -10 dB to 10 dB. Experimental results indicate that the MFCC feature warping-ICA achieves a reduction in equal error rate about (48.22%, 44.66%, and 50.07%) over using MFCC feature warping when the test speech signals are corrupted with random sessions of street, car, and home noises at -10 dB SNR.Keywords: noisy forensic speaker verification, ICA algorithm, MFCC, MFCC feature warping
Procedia PDF Downloads 408651 Speech Recognition Performance by Adults: A Proposal for a Battery for Marathi
Authors: S. B. Rathna Kumar, Pranjali A Ujwane, Panchanan Mohanty
Abstract:
The present study aimed to develop a battery for assessing speech recognition performance by adults in Marathi. A total of four word lists were developed by considering word frequency, word familiarity, words in common use, and phonemic balance. Each word list consists of 25 words (15 monosyllabic words in CVC structure and 10 monosyllabic words in CVCV structure). Equivalence analysis and performance-intensity function testing was carried using the four word lists on a total of 150 native speakers of Marathi belonging to different regions of Maharashtra (Vidarbha, Marathwada, Khandesh and Northern Maharashtra, Pune, and Konkan). The subjects were further equally divided into five groups based on above mentioned regions. It was found that there was no significant difference (p > 0.05) in the speech recognition performance between groups for each word list and between word lists for each group. Hence, the four word lists developed were equally difficult for all the groups and can be used interchangeably. The performance-intensity (PI) function curve showed semi-linear function, and the groups’ mean slope of the linear portions of the curve indicated an average linear slope of 4.64%, 4.73%, 4.68%, and 4.85% increase in word recognition score per dB for list 1, list 2, list 3 and list 4 respectively. Although, there is no data available on speech recognition tests for adults in Marathi, most of the findings of the study are in line with the findings of research reports on other languages. The four word lists, thus developed, were found to have sufficient reliability and validity in assessing speech recognition performance by adults in Marathi.Keywords: speech recognition performance, phonemic balance, equivalence analysis, performance-intensity function testing, reliability, validity
Procedia PDF Downloads 358650 A Comparative Study on Vowel Articulation in Malayalam Speaking Children Using Cochlear Implant
Authors: Deepthy Ann Joy, N. Sreedevi
Abstract:
Hearing impairment (HI) at an early age, identified before the onset of language development can reduce the negative effect on speech and language development of children. Early rehabilitation is very important in the improvement of speech production in children with HI. Other than conventional hearing aids, Cochlear Implants are being used in the rehabilitation of children with HI. However, delay in acquisition of speech and language milestones persist in children with Cochlear Implant (CI). Delay in speech milestones are reflected through speech sound errors. These errors reflect the temporal and spectral characteristics of speech. Hence, acoustical analysis of the speech sounds will provide a better representation of speech production skills in children with CI. The present study aimed at investigating the acoustic characteristics of vowels in Malayalam speaking children with a cochlear implant. The participants of the study consisted of 20 Malayalam speaking children in the age range of four and seven years. The experimental group consisted of 10 children with CI, and the control group consisted of 10 typically developing children. Acoustic analysis was carried out for 5 short (/a/, /i/, /u/, /e/, /o/) and 5 long vowels (/a:/, /i:/, /u:/, /e:/, /o:/) in word-initial position. The responses were recorded and analyzed for acoustic parameters such as Vowel duration, Ratio of the duration of a short and long vowel, Formant frequencies (F₁ and F₂) and Formant Centralization Ratio (FCR) computed using the formula (F₂u+F₂a+F₁i+F₁u)/(F₂i+F₁a). Findings of the present study indicated that the values for vowel duration were higher in experimental group compared to the control group for all the vowels except for /u/. Ratio of duration of short and long vowel was also found to be higher in experimental group compared to control group except for /i/. Further F₁ for all vowels was found to be higher in experimental group with variability noticed in F₂ values. FCR was found be higher in experimental group, indicating vowel centralization. Further, the results of independent t-test revealed no significant difference across the parameters in both the groups. It was found that the spectral and temporal measures in children with CI moved towards normal range. The result emphasizes the significance of early rehabilitation in children with hearing impairment. The role of rehabilitation related aspects are also discussed in detail which can be clinically incorporated for the betterment of speech therapeutic services in children with CI.Keywords: acoustics, cochlear implant, Malayalam, vowels
Procedia PDF Downloads 144649 Exploring Pre-Trained Automatic Speech Recognition Model HuBERT for Early Alzheimer’s Disease and Mild Cognitive Impairment Detection in Speech
Authors: Monica Gonzalez Machorro
Abstract:
Dementia is hard to diagnose because of the lack of early physical symptoms. Early dementia recognition is key to improving the living condition of patients. Speech technology is considered a valuable biomarker for this challenge. Recent works have utilized conventional acoustic features and machine learning methods to detect dementia in speech. BERT-like classifiers have reported the most promising performance. One constraint, nonetheless, is that these studies are either based on human transcripts or on transcripts produced by automatic speech recognition (ASR) systems. This research contribution is to explore a method that does not require transcriptions to detect early Alzheimer’s disease (AD) and mild cognitive impairment (MCI). This is achieved by fine-tuning a pre-trained ASR model for the downstream early AD and MCI tasks. To do so, a subset of the thoroughly studied Pitt Corpus is customized. The subset is balanced for class, age, and gender. Data processing also involves cropping the samples into 10-second segments. For comparison purposes, a baseline model is defined by training and testing a Random Forest with 20 extracted acoustic features using the librosa library implemented in Python. These are: zero-crossing rate, MFCCs, spectral bandwidth, spectral centroid, root mean square, and short-time Fourier transform. The baseline model achieved a 58% accuracy. To fine-tune HuBERT as a classifier, an average pooling strategy is employed to merge the 3D representations from audio into 2D representations, and a linear layer is added. The pre-trained model used is ‘hubert-large-ls960-ft’. Empirically, the number of epochs selected is 5, and the batch size defined is 1. Experiments show that our proposed method reaches a 69% balanced accuracy. This suggests that the linguistic and speech information encoded in the self-supervised ASR-based model is able to learn acoustic cues of AD and MCI.Keywords: automatic speech recognition, early Alzheimer’s recognition, mild cognitive impairment, speech impairment
Procedia PDF Downloads 127648 A Semantic Analysis of Modal Verbs in Barak Obama’s 2012 Presidential Campaign Speech
Authors: Kais A. Kadhim
Abstract:
This paper is a semantic analysis of the English modals in Obama’s speech. The main objective of this study is to analyze selected modal auxiliaries identified in selected speeches of Obama’s campaign based on Coates’ (1983) semantic clusters. A total of fifteen speeches of Obama’s campaign were selected as the primary data and the modal auxiliaries selected for analysis include will, would, can, could, should, must, ought, shall, may and might. All the modal auxiliaries taken from the speeches of Barack Obama were analyzed based on the framework of Coates’ semantic clusters. Such analytical framework was carried out to examine how modal auxiliaries are used in the context of persuading people in Obama’s campaign speeches. The findings reveal that modals of intention, prediction, futurity and modals of possibility, ability, permission are mostly used in Obama’s campaign speeches.Keywords: modals, meaning, persuasion, speech
Procedia PDF Downloads 409647 Cross-Cultural Pragmatics: Apology Strategies by Libyans
Authors: Ahmed Elgadri
Abstract:
In the last thirty years, studies on cross-cultural pragmatics in general and apology strategies in specific have focused on western and East-Asian societies. A small volume of research has been conducted in investigating speech acts production by Arabic dialect speakers. Therefore, this study investigated the apology strategies used by Libyan Arabic speakers using an online Discourse Completion Task (DCT) questionnaire. The DCT consisted of six situations covering different social contexts. The survey was written in Libyan Arabic dialect to help generate vernacular speech as much as possible. The participants were 25 Libyan nationals, 12 females, and 13 males. Also, to get a deeper understanding of the motivation behind the use of certain strategies, the researcher interviewed four participants using the Libyan Arabic dialect as well. The results revealed a high use of IFID, offer of repair, and explanation. Although this might support the universality claim of speech acts strategies, it was clear that cultural norms and religion determined the choice of apology strategies significantly. This led to the discovery of new culture-specific strategies, as outlined later in this paper. This study gives an insight into politeness strategies in Libyan society, and it is hoped to contribute to the field of cross-cultural pragmatics.Keywords: apologies, cross-cultural pragmatics, language and culture, Libyan Arabic, politeness, pragmatics, socio-pragmatics, speech acts
Procedia PDF Downloads 150646 Intertextuality in Choreography: Investigation of Text and Movements in Making Choreography
Authors: Muhammad Fairul Azreen Mohd Zahid
Abstract:
Speech, text, and movement intensify aspects of creating choreography by connecting with emotional entanglements, tradition, literature, and other texts. This research focuses on the practice as research that will prioritise the choreography process as an inquiry approach. With the driven context, the study intervenes in critical conjunctions of choreographic theory, bringing together new reflections on the moving body, spaces of action, as well as intertextuality between text and movements in making choreography. Throughout the process, the researcher will introduce the level of deliberation from speech through movements and text to express emotion within a narrative context of an “illocutionary act.” This practice as research will produce a different meaning from the “utterance text” to “utterance movements” in the perspective of speech acts theory by J.L Austin based on fragmented text from “pidato adat” which has been used as opening speech in Randai. Looking at the theory of deconstruction by Jacque Derrida also will give a different meaning from the text. Nevertheless, the process of creating the choreography will also help to lay the basic normative structure implicit in “constative” (statement text/movement) and “performative” (command text/movement). Through this process, the researcher will also look at several methods of using text from two works by Joseph Gonzales, “Becoming King-The Pakyung Revisited” and Crystal Pite's “The Statement,” as references to produce different methods in making choreography. The perspective from the semiotic foundation will support how occurrences within dance discourses as texts through a semiotic lens. The method used in this research is qualitative, which includes an interview and simulation of the concept to get an outcome.Keywords: intertextuality, choreography, speech act, performative, deconstruction
Procedia PDF Downloads 99645 Robust Features for Impulsive Noisy Speech Recognition Using Relative Spectral Analysis
Authors: Hajer Rahali, Zied Hajaiej, Noureddine Ellouze
Abstract:
The goal of speech parameterization is to extract the relevant information about what is being spoken from the audio signal. In speech recognition systems Mel-Frequency Cepstral Coefficients (MFCC) and Relative Spectral Mel-Frequency Cepstral Coefficients (RASTA-MFCC) are the two main techniques used. It will be shown in this paper that it presents some modifications to the original MFCC method. In our work the effectiveness of proposed changes to MFCC called Modified Function Cepstral Coefficients (MODFCC) were tested and compared against the original MFCC and RASTA-MFCC features. The prosodic features such as jitter and shimmer are added to baseline spectral features. The above-mentioned techniques were tested with impulsive signals under various noisy conditions within AURORA databases.Keywords: auditory filter, impulsive noise, MFCC, prosodic features, RASTA filter
Procedia PDF Downloads 425644 A New Dual Forward Affine Projection Adaptive Algorithm for Speech Enhancement in Airplane Cockpits
Authors: Djendi Mohmaed
Abstract:
In this paper, we propose a dual adaptive algorithm, which is based on the combination between the forward blind source separation (FBSS) structure and the affine projection algorithm (APA). This proposed algorithm combines the advantages of the source separation properties of the FBSS structure and the fast convergence characteristics of the APA algorithm. The proposed algorithm needs two noisy observations to provide an enhanced speech signal. This process is done in a blind manner without the need for ant priori information about the source signals. The proposed dual forward blind source separation affine projection algorithm is denoted (DFAPA) and used for the first time in an airplane cockpit context to enhance the communication from- and to- the airplane. Intensive experiments were carried out in this sense to evaluate the performance of the proposed DFAPA algorithm.Keywords: adaptive algorithm, speech enhancement, system mismatch, SNR
Procedia PDF Downloads 137643 Searching Linguistic Synonyms through Parts of Speech Tagging
Authors: Faiza Hussain, Usman Qamar
Abstract:
Synonym-based searching is recognized to be a complicated problem as text mining from unstructured data of web is challenging. Finding useful information which matches user need from bulk of web pages is a cumbersome task. In this paper, a novel and practical synonym retrieval technique is proposed for addressing this problem. For replacement of semantics, user intent is taken into consideration to realize the technique. Parts-of-Speech tagging is applied for pattern generation of the query and a thesaurus for this experiment was formed and used. Comparison with Non-Context Based Searching, Context Based searching proved to be a more efficient approach while dealing with linguistic semantics. This approach is very beneficial in doing intent based searching. Finally, results and future dimensions are presented.Keywords: natural language processing, text mining, information retrieval, parts-of-speech tagging, grammar, semantics
Procedia PDF Downloads 308642 Hindi Speech Synthesis by Concatenation of Recognized Hand Written Devnagri Script Using Support Vector Machines Classifier
Authors: Saurabh Farkya, Govinda Surampudi
Abstract:
Optical Character Recognition is one of the current major research areas. This paper is focussed on recognition of Devanagari script and its sound generation. This Paper consists of two parts. First, Optical Character Recognition of Devnagari handwritten Script. Second, speech synthesis of the recognized text. This paper shows an implementation of support vector machines for the purpose of Devnagari Script recognition. The Support Vector Machines was trained with Multi Domain features; Transform Domain and Spatial Domain or Structural Domain feature. Transform Domain includes the wavelet feature of the character. Structural Domain consists of Distance Profile feature and Gradient feature. The Segmentation of the text document has been done in 3 levels-Line Segmentation, Word Segmentation, and Character Segmentation. The pre-processing of the characters has been done with the help of various Morphological operations-Otsu's Algorithm, Erosion, Dilation, Filtration and Thinning techniques. The Algorithm was tested on the self-prepared database, a collection of various handwriting. Further, Unicode was used to convert recognized Devnagari text into understandable computer document. The document so obtained is an array of codes which was used to generate digitized text and to synthesize Hindi speech. Phonemes from the self-prepared database were used to generate the speech of the scanned document using concatenation technique.Keywords: Character Recognition (OCR), Text to Speech (TTS), Support Vector Machines (SVM), Library of Support Vector Machines (LIBSVM)
Procedia PDF Downloads 500641 Bangladesh’s July Revolution: Analyzing the 2024 Movement for Free Speech and Democracy
Authors: Abu Bakar Siddik
Abstract:
The July Movement in Bangladesh marked a pivotal moment in the nation’s struggle for democratic freedom and the right to free speech. This movement, driven by citizens, intellectuals, and activists, opposed authoritarian governance and the violation of civil liberties. By encouraging support for democratic reforms, it significantly changed the political landscape and highlighted the importance of grassroots activism for human rights. This essay examines the sociopolitical dynamics of the July Movement and its roots in popular resistance to authoritarian rule. It explores the movement's beginnings, emphasizing how citizens, scholars, and activists united to challenge the regime that restricted freedom of speech. In order to show how the movement gathered support for democratic reforms and ultimately helped bring about the overthrow of the regime, the article examines significant demonstrations, speeches, and government acts. This book offers a thorough examination of how the July Movement changed Bangladesh's political landscape by acting as a revolution for free speech and a trigger for the overthrow of autocratic authority, using historical documents, media coverage, and firsthand recollections. This study provides insightful information about how grassroots activism advances human rights.Keywords: July movement, Bangladesh, free speech, democracy, authoritarianism, civil liberties, political change, human rights, social movements, protests, political landscape, regime change, activism, socio-political dynamics
Procedia PDF Downloads 26640 A Simple Adaptive Atomic Decomposition Voice Activity Detector Implemented by Matching Pursuit
Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic
Abstract:
A simple adaptive voice activity detector (VAD) is implemented using Gabor and gammatone atomic decomposition of speech for high Gaussian noise environments. Matching pursuit is used for atomic decomposition, and is shown to achieve optimal speech detection capability at high data compression rates for low signal to noise ratios. The most active dictionary elements found by matching pursuit are used for the signal reconstruction so that the algorithm adapts to the individual speakers dominant time-frequency characteristics. Speech has a high peak to average ratio enabling matching pursuit greedy heuristic of highest inner products to isolate high energy speech components in high noise environments. Gabor and gammatone atoms are both investigated with identical logarithmically spaced center frequencies, and similar bandwidths. The algorithm performs equally well for both Gabor and gammatone atoms with no significant statistical differences. The algorithm achieves 70% accuracy at a 0 dB SNR, 90% accuracy at a 5 dB SNR and 98% accuracy at a 20dB SNR using 30dB SNR as a reference for voice activity.Keywords: atomic decomposition, gabor, gammatone, matching pursuit, voice activity detection
Procedia PDF Downloads 294639 Articles, Delimitation of Speech and Perception
Authors: Nataliya L. Ogurechnikova
Abstract:
The paper aims to clarify the function of articles in the English speech and specify their place and role in the English language, taking into account the use of articles for delimitation of speech. A focus of the paper is the use of the definite and the indefinite articles with different types of noun phrases which comprise either one noun with or without attributes, such as the King, the Queen, the Lion, the Unicorn, a dimple, a smile, a new language, an unknown dialect, or several nouns with or without attributes, such as the King and Queen of Hearts, the Lion and Unicorn, a dimple or smile, a completely isolated language or dialect. It is stated that the function of delimitation is related to perception: the number of speech units in a text correlates with the way the speaker perceives and segments the denotation. The two following combinations of words the house and garden and the house and the garden contain different numbers of speech units, one and two respectively, and reveal two different perception modes which correspond to the use of the definite article in the examples given. Thus, the function of delimitation is twofold, it is related to perception and cognition, on the one hand, and, on the other hand, to grammar, if the subject of grammar is the structure of speech. Analysis of speech units in the paper is not limited by noun phrases and is amplified by discussion of peripheral phenomena which are nevertheless important because they enable to qualify articles as a syntactic phenomenon whereas they are not infrequently described in terms of noun morphology. With this regard attention is given to the history of linguistic studies, specifically to the description of English articles by Niels Haislund, a disciple of Otto Jespersen. A discrepancy is noted between the initial plan of Jespersen who intended to describe articles as a syntactic phenomenon in ‘A Modern English Grammar on Historical Principles’ and the interpretation of articles in terms of noun morphology, finally given by Haislund. Another issue of the paper is correlation between description and denotation, being a traditional aspect of linguistic studies focused on articles. An overview of relevant studies, given in the paper, goes back to the works of G. Frege, which gave rise to a series of scientific works where the meaning of articles was described within the scope of logical semantics. Correlation between denotation and description is treated in the paper as the meaning of article, i.e. a component in its semantic structure, which differs from the function of delimitation and is similar to the meaning of other quantifiers. The paper further explains why the relation between description and denotation, i.e. the meaning of English article, is irrelevant for noun morphology and has nothing to do with nominal categories of the English language.Keywords: delimitation of speech, denotation, description, perception, speech units, syntax
Procedia PDF Downloads 243638 Motor Speech Profile of Marathi Speaking Adults and Children
Authors: Anindita Banik, Anjali Kant, Aninda Duti Banik, Arun Banik
Abstract:
Speech is a complex, dynamic unique motor activity through which we express thoughts and emotions and respond to and control our environment. The aim was based to compare select Motor Speech parameters and their sub parameters across typical Marathi speaking adults and children. The subjects included a total of 300 divided into Group I, II, III including males and females. Subjects included were reported of no significant medical history and had a rating of 0-1 on GRBAS scale. The recordings were obtained utilizing three stimuli for the acoustic analysis of Diadochokinetic rate (DDK), Second Formant Transition, Voice and Tremor and its sub parameters. And these aforementioned parameters were acoustically analyzed in Motor Speech Profile software in VisiPitch IV. The statistical analyses were done by applying descriptive statistics and Two- Way ANOVA.The results obtained showed statistically significant difference across age groups and gender for the aforementioned parameters and its sub parameters.In DDK, for avp (ms) there was a significant difference only across age groups. However, for avr (/s) there was a significant difference across age groups and gender. It was observed that there was an increase in rate with an increase in age groups. The second formant transition sub parameter F2 magn (Hz) also showed a statistically significant difference across both age groups and gender. There was an increase in mean value with an increase in age. Females had a higher mean when compared to males. For F2 rate (/s) a statistically significant difference was observed across age groups. There was an increase in mean value with increase in age. It was observed for Voice and Tremor MFTR (%) that a statistically significant difference was present across age groups and gender. Also for RATR (Hz) there was statistically significant difference across both age groups and gender. In other words, the values of MFTR and RATR increased with an increase in age. Thus, this study highlights the variation of the motor speech parameters amongst the typical population which would be beneficial for comparison with the individuals with motor speech disorders for assessment and management.Keywords: adult, children, diadochokinetic rate, second formant transition, tremor, voice
Procedia PDF Downloads 309637 Unsupervised Part-of-Speech Tagging for Amharic Using K-Means Clustering
Authors: Zelalem Fantahun
Abstract:
Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word into naturally occurring text. Part-of-speech tagging is the most fundamental and basic task almost in all natural language processing. In natural language processing, the problem of providing large amount of manually annotated data is a knowledge acquisition bottleneck. Since, Amharic is one of under-resourced language, the availability of tagged corpus is the bottleneck problem for natural language processing especially for POS tagging. A promising direction to tackle this problem is to provide a system that does not require manually tagged data. In unsupervised learning, the learner is not provided with classifications. Unsupervised algorithms seek out similarity between pieces of data in order to determine whether they can be characterized as forming a group. This paper explicates the development of unsupervised part-of-speech tagger using K-Means clustering for Amharic language since large amount of data is produced in day-to-day activities. In the development of the tagger, the following procedures are followed. First, the unlabeled data (raw text) is divided into 10 folds and tokenization phase takes place; at this level, the raw text is chunked at sentence level and then into words. The second phase is feature extraction which includes word frequency, syntactic and morphological features of a word. The third phase is clustering. Among different clustering algorithms, K-means is selected and implemented in this study that brings group of similar words together. The fourth phase is mapping, which deals with looking at each cluster carefully and the most common tag is assigned to a group. This study finds out two features that are capable of distinguishing one part-of-speech from others these are morphological feature and positional information and show that it is possible to use unsupervised learning for Amharic POS tagging. In order to increase performance of the unsupervised part-of-speech tagger, there is a need to incorporate other features that are not included in this study, such as semantic related information. Finally, based on experimental result, the performance of the system achieves a maximum of 81% accuracy.Keywords: POS tagging, Amharic, unsupervised learning, k-means
Procedia PDF Downloads 452636 Detection of Phoneme [S] Mispronounciation for Sigmatism Diagnosis in Adults
Authors: Michal Krecichwost, Zauzanna Miodonska, Pawel Badura
Abstract:
The diagnosis of sigmatism is mostly based on the observation of articulatory organs. It is, however, not always possible to precisely observe the vocal apparatus, in particular in the oral cavity of the patient. Speech processing can allow to objectify the therapy and simplify the verification of its progress. In the described study the methodology for classification of incorrectly pronounced phoneme [s] is proposed. The recordings come from adults. They were registered with the speech recorder at the sampling rate of 44.1 kHz and the resolution of 16 bit. The database of pathological and normative speech has been collected for the study including reference assessments provided by the speech therapy experts. Ten adult subjects were asked to simulate a certain type of stigmatism under the speech therapy expert supervision. In the recordings, the analyzed phone [s] was surrounded by vowels, viz: ASA, ESE, ISI, SPA, USU, YSY. Thirteen MFCC (mel-frequency cepstral coefficients) and RMS (root mean square) values are calculated within each frame being a part of the analyzed phoneme. Additionally, 3 fricative formants along with corresponding amplitudes are determined for the entire segment. In order to aggregate the information within the segment, the average value of each MFCC coefficient is calculated. All features of other types are aggregated by means of their 75th percentile. The proposed method of features aggregation reduces the size of the feature vector used in the classification. Binary SVM (support vector machine) classifier is employed at the phoneme recognition stage. The first group consists of pathological phones, while the other of the normative ones. The proposed feature vector yields classification sensitivity and specificity measures above 90% level in case of individual logo phones. The employment of a fricative formants-based information improves the sole-MFCC classification results average of 5 percentage points. The study shows that the employment of specific parameters for the selected phones improves the efficiency of pathology detection referred to the traditional methods of speech signal parameterization.Keywords: computer-aided pronunciation evaluation, sibilants, sigmatism diagnosis, speech processing
Procedia PDF Downloads 284635 Recognition of Noisy Words Using the Time Delay Neural Networks Approach
Authors: Khenfer-Koummich Fatima, Mesbahi Larbi, Hendel Fatiha
Abstract:
This paper presents a recognition system for isolated words like robot commands. It’s carried out by Time Delay Neural Networks; TDNN. To teleoperate a robot for specific tasks as turn, close, etc… In industrial environment and taking into account the noise coming from the machine. The choice of TDNN is based on its generalization in terms of accuracy, in more it acts as a filter that allows the passage of certain desirable frequency characteristics of speech; the goal is to determine the parameters of this filter for making an adaptable system to the variability of speech signal and to noise especially, for this the back propagation technique was used in learning phase. The approach was applied on commands pronounced in two languages separately: The French and Arabic. The results for two test bases of 300 spoken words for each one are 87%, 97.6% in neutral environment and 77.67%, 92.67% when the white Gaussian noisy was added with a SNR of 35 dB.Keywords: TDNN, neural networks, noise, speech recognition
Procedia PDF Downloads 290634 Grammatical Interference in Russian-Spanish Bilingualism
Authors: Olga A. Gnatyuk
Abstract:
The article is devoted to the phenomenon of interference that occurs in the case of the Russian-Spanish language contact. The questions of the definition of the term and levels, as well as prerequisites of interference occurrence, are considered. Interference, which is an essential part of bilingualism, may become apparent at different linguistic levels. Interference is especially evident in oral speech. The article reviews some examples of grammatical interference in Russian-Spanish bilingualism of Russian immigrants living in Spain. According to the results of the research, some cases of mother-tongue interference in Russian-Speaking Spanish language learners’ speech were revealed. Special attention is paid to such key spheres of grammatical interference as articles, personal pronouns, gender, and number of nouns. In the research, the drop of a link-verb, as well as its usage in some incorrect form, are observed in Russian immigrants’ speech. Conclusions are drawn that in the Spanish language, interference errors appear because of a consequence of both the absence in the Russian language of certain phenomena and categories of the Spanish language and the discrepancy of the linguistic systems of the two languages.Keywords: bilingualism, interference, grammatical interference, Russian language, Spanish language
Procedia PDF Downloads 160633 Role of Speech Language Pathologists in Vocational Rehabilitation
Authors: Marlyn Mathew
Abstract:
Communication is the key factor in any vocational /job set-up. However many persons with disabilities suffer a deficit in this very area in terms of comprehension, expression and cognitive skills making it difficult for them to get employed appropriately or stay employed. Vocational Rehabilitation is a continuous and coordinated process which involves the provision of vocational related services designed to enable a person with disability to obtain and maintain employment. Therefore the role of the speech language pathologist is crucial in assessing the communication deficits and needs of the individual at the various phases of employment- right from the time of seeking a job and attending interview with suitable employers and also at regular intervals of the employment. This article discusses the various communication deficits and the obstacles faced by individuals with special needs including but not limited to cognitive- linguistic deficits, execution function deficits, speech and language processing difficulties and strategies that can be introduced in the workplace to overcome these obstacles including use of visual cues, checklists, flow charts. The paper also throws light on the importance of educating colleagues and work partners about the communication difficulties faced by the individual. This would help to reduce the communication barriers in the workplace, help colleagues develop an empathetic approach and also reduce misunderstandings that can arise as a result of the communication impairment.Keywords: vocational rehabilitation, disability, speech language pathologist, cognitive, linguistics
Procedia PDF Downloads 135632 Attention-based Adaptive Convolution with Progressive Learning in Speech Enhancement
Authors: Tian Lan, Yixiang Wang, Wenxin Tai, Yilan Lyu, Zufeng Wu
Abstract:
The monaural speech enhancement task in the time-frequencydomain has a myriad of approaches, with the stacked con-volutional neural network (CNN) demonstrating superiorability in feature extraction and selection. However, usingstacked single convolutions method limits feature represen-tation capability and generalization ability. In order to solvethe aforementioned problem, we propose an attention-basedadaptive convolutional network that integrates the multi-scale convolutional operations into a operation-specific blockvia input dependent attention to adapt to complex auditoryscenes. In addition, we introduce a two-stage progressivelearning method to enlarge the receptive field without a dra-matic increase in computation burden. We conduct a series ofexperiments based on the TIMIT corpus, and the experimen-tal results prove that our proposed model is better than thestate-of-art models on all metrics.Keywords: speech enhancement, adaptive convolu-tion, progressive learning, time-frequency domain
Procedia PDF Downloads 124631 Reed: An Approach Towards Quickly Bootstrapping Multilingual Acoustic Models
Authors: Bipasha Sen, Aditya Agarwal
Abstract:
Multilingual automatic speech recognition (ASR) system is a single entity capable of transcribing multiple languages sharing a common phone space. Performance of such a system is highly dependent on the compatibility of the languages. State of the art speech recognition systems are built using sequential architectures based on recurrent neural networks (RNN) limiting the computational parallelization in training. This poses a significant challenge in terms of time taken to bootstrap and validate the compatibility of multiple languages for building a robust multilingual system. Complex architectural choices based on self-attention networks are made to improve the parallelization thereby reducing the training time. In this work, we propose Reed, a simple system based on 1D convolutions which uses very short context to improve the training time. To improve the performance of our system, we use raw time-domain speech signals directly as input. This enables the convolutional layers to learn feature representations rather than relying on handcrafted features such as MFCC. We report improvement on training and inference times by atleast a factor of 4x and 7.4x respectively with comparable WERs against standard RNN based baseline systems on SpeechOcean's multilingual low resource dataset.Keywords: convolutional neural networks, language compatibility, low resource languages, multilingual automatic speech recognition
Procedia PDF Downloads 124630 Effect of Palatal Lift Prosthesis on Speech Clarity in Flaccid Dysarthria
Authors: Firas Alfwaress, Abdelraheem Bebers Abdelhadi Hamasha, Maha Abu Awaad
Abstract:
Objectives: The aim of the present study was to investigate the effect of Palatal Lift Prosthesis (PLP) on speech clarity in patients with Flaccid Dysarthria. Five speech measures were investigated including Nasalance Scores, Diadchokinetic (DDK), Vowel Duration, airflow, and Sound Intensity. Participants: Twelve (7 Males and 5 females) native speakers of Jordanian Arabic with Flaccid Dysarthria following stroke, traumatic brain injury, and amyotrophic lateral sclerosis were included. The age of the participants ranged from 8–65 years with an average of 31.75 years. Design: Nasalance Scores, Diadchokinetic rate, Vowel Duration, and Sound Intensity were obtained using the Nasometer II, Model 6450 in three conditions. The first condition included obtaining the five measures without wearing the customized Palatal Lift Prosthesis. The second and third conditions included obtaining the five measures immediately after wearing the Palatal Lift Prosthesis and three months later. Results: Palatal lift prosthesis was found to be effective in individuals with flaccid dysarthria. Results showed decrease in the Nasalance Scores for the syllable repetition tasks and vowel prolongation tasks when comparing the means in the pre PLP with the post PLP at p≤0.001 except for the /m/ prolongation task. Results showed increased DDK repetition task, airflow amount, and sound intensity, and a decrease in vowel length at p≤0.001. Conclusions: The use of palatal lift prosthesis is effective in improving the speech of patients with flaccid dysarthria.Keywords: palatal lift prosthesis, flaccid dysarthria, hypernasality, speech clarity, diadchokinetic rate
Procedia PDF Downloads 387629 Setswana Speech Rhythm Development in High-Socioeconomic Status Setswana-English Bilingual Children
Authors: Boikanyego Sebina
Abstract:
The present study investigates the effects of socioeconomic status (SES) and bilingualism on the Setswana speech rhythm of Batswana (citizens) children aged 6-7 years with typical development born and residing in Botswana. Botswana is a country in which there is a diglossic Setswana/English language setting, where English is the dominant high-status language in educational and public contexts. Generally, children from low SES have lower linguistic and cognitive profiles than their age-matched peers from high SES. A greater understanding of these variables would allow educators to distinguish between underdeveloped language skills in children due to impairment and environmental issues for them to successfully enroll children in language development enhancement programs specific to the child’s needs. There are 20 participants: 10 high SES private English-medium educated early sequential Setswana-English bilingual children, taught full-time in English (L2) from the age of 3 years, and for whom English has become dominant; and 10 low SES children who are educated in public schools for whom English is considered a learner language, i.e., L1 Setswana is dominant. The aim is to see whether SES and bilingualism, have had an effect on the Setswana speech rhythm of children in either group. The study primarily uses semi-spontaneous speech based on the telling of the wordless picture storybook. A questionnaire is used to elicit the language use pattern of the children and that of their parents, as well as the education level of the parents and the school the children attend. A comparison of the rhythm shows that children from high SES have a lower durational variability than those from low SES. The findings of the study are that the low durational variability by children from high SES may suggest an underdeveloped rhythm. In conclusion, the results of the present study are against the notion that children from high SES outperform those from low SES in linguistic development.Keywords: bilingualism, Setswana English, socio-economic status, speech-rhythm
Procedia PDF Downloads 68628 Critical Discourse Analysis of President Mamnoon Hussain Speech in the Joint Session of Parliament.
Authors: Saeed Qaisrani
Abstract:
This article briefly reviews the rise of Critical Discourse Analysis about the Pakistani President Mamnoon Hussain speech which delivered in the joint session of Parliament and teases out a detailed analysis of the various critiques that have been levelled at CDA and its practitioners over the last twenty years, both by scholars working within the “critical” paradigm and by other critics. A range of criticisms are discussed which target the underlying premises, the analytical methodology and the disputed areas of reader response and the integration of contextual factors. Controversial issues such as the predominantly negative focus of much CDA scholarship, and the status of CDA as an emergent “intellectual orthodoxy”, are also reviewed. The conclusions offer a summary of the principal criticisms that emerge from this overview, and suggest some ways in which these problems could be attenuated. It also focused on the different views about president speech and how it is presented in the Pakistani print and electronic media.Keywords: Critical Discourse Analysis, Analytical methodology, Corpus linguistics, Reader response theory, Critical paradigm, Contextualization.
Procedia PDF Downloads 486627 Lip Localization Technique for Myanmar Consonants Recognition Based on Lip Movements
Authors: Thein Thein, Kalyar Myo San
Abstract:
Lip reading system is one of the different supportive technologies for hearing impaired, or elderly people or non-native speakers. For normal hearing persons in noisy environments or in conditions where the audio signal is not available, lip reading techniques can be used to increase their understanding of spoken language. Hearing impaired persons have used lip reading techniques as important tools to find out what was said by other people without hearing voice. Thus, visual speech information is important and become active research area. Using visual information from lip movements can improve the accuracy and robustness of a speech recognition system and the need for lip reading system is ever increasing for every language. However, the recognition of lip movement is a difficult task because of the region of interest (ROI) is nonlinear and noisy. Therefore, this paper proposes method to detect the accurate lips shape and to localize lip movement towards automatic lip tracking by using the combination of Otsu global thresholding technique and Moore Neighborhood Tracing Algorithm. Proposed method shows how accurate lip localization and tracking which is useful for speech recognition. In this work of study and experiments will be carried out the automatic lip localizing the lip shape for Myanmar consonants using the only visual information from lip movements which is useful for visual speech of Myanmar languages.Keywords: lip reading, lip localization, lip tracking, Moore neighborhood tracing algorithm
Procedia PDF Downloads 353626 Speech Disorders as Predictors of Social Participation of Children with Cerebral Palsy in the Primary Schools of the Czech Republic
Authors: Marija Zulić, Vanda Hájková, Nina Brkić–Jovanović, Srećko Potić, Sanja Tomić
Abstract:
The name cerebral palsy comes from the word cerebrum, which means the brain and the word palsy, which means seizure, and essentially refers to the movement disorder. In the clinical picture of cerebral palsy, basic neuromotor disorders are associated with other various disorders: behavioural, intellectual, speech, sensory, epileptic seizures, and bone and joint deformities. Motor speech disorders are among the most common difficulties present in people with cerebral palsy. Social participation represents an interaction between an individual and their social environment. Quality of social participation of the students with cerebral palsy at school is an important indicator of their successful participation in adulthood. One of the most important skills for the undisturbed social participation is ability of good communication. The aim of the study was to determine relation between social participation of students with cerebral palsy and presence of their speech impairment in primary schools in the Czech Republic. The study was performed in the Czech Republic in mainstream schools and schools established for the pupils with special education needs. We analysed 75 children with cerebral palsy aged between six and twelve years attending up to sixth grade by using the first and the third part of the school function assessment questionnaire as the main instrument. The other instrument we used in the research is the Gross motor function classification system–five–level classification system, which measures degree of motor functions of children and youth with cerebral palsy. Funding for this study was provided by the Grant Agency of Charles University in Prague.Keywords: cerebral palsy, social participation, speech disorders, The Czech Republic, the school function assessment
Procedia PDF Downloads 287625 Tensor Deep Stacking Neural Networks and Bilinear Mapping Based Speech Emotion Classification Using Facial Electromyography
Authors: P. S. Jagadeesh Kumar, Yang Yung, Wenli Hu
Abstract:
Speech emotion classification is a dominant research field in finding a sturdy and profligate classifier appropriate for different real-life applications. This effort accentuates on classifying different emotions from speech signal quarried from the features related to pitch, formants, energy contours, jitter, shimmer, spectral, perceptual and temporal features. Tensor deep stacking neural networks were supported to examine the factors that influence the classification success rate. Facial electromyography signals were composed of several forms of focuses in a controlled atmosphere by means of audio-visual stimuli. Proficient facial electromyography signals were pre-processed using moving average filter, and a set of arithmetical features were excavated. Extracted features were mapped into consistent emotions using bilinear mapping. With facial electromyography signals, a database comprising diverse emotions will be exposed with a suitable fine-tuning of features and training data. A success rate of 92% can be attained deprived of increasing the system connivance and the computation time for sorting diverse emotional states.Keywords: speech emotion classification, tensor deep stacking neural networks, facial electromyography, bilinear mapping, audio-visual stimuli
Procedia PDF Downloads 256