Search results for: cetaceans and human speech
8960 Subband Coding and Glottal Closure Instant (GCI) Using SEDREAMS Algorithm
Authors: Harisudha Kuresan, Dhanalakshmi Samiappan, T. Rama Rao
Abstract:
In modern telecommunication applications, Glottal Closure Instants location finding is important and is directly evaluated from the speech waveform. Here, we study the GCI using Speech Event Detection using Residual Excitation and the Mean Based Signal (SEDREAMS) algorithm. Speech coding uses parameter estimation using audio signal processing techniques to model the speech signal combined with generic data compression algorithms to represent the resulting modeled in a compact bit stream. This paper proposes a sub-band coder SBC, which is a type of transform coding and its performance for GCI detection using SEDREAMS are evaluated. In SBCs code in the speech signal is divided into two or more frequency bands and each of these sub-band signal is coded individually. The sub-bands after being processed are recombined to form the output signal, whose bandwidth covers the whole frequency spectrum. Then the signal is decomposed into low and high-frequency components and decimation and interpolation in frequency domain are performed. The proposed structure significantly reduces error, and precise locations of Glottal Closure Instants (GCIs) are found using SEDREAMS algorithm.Keywords: SEDREAMS, GCI, SBC, GOI
Procedia PDF Downloads 3588959 A Comparative Analysis on the Impact of the Prevention and Combating of Hate Crimes and Hate Speech Bill of 2016 on the Rights to Human Dignity, Equality, and Freedom in South Africa
Authors: Tholaine Matadi
Abstract:
South Africa is a democratic country with a historical record of racially-motivated marginalisation and exclusion of the majority. During the apartheid era the country was run along pieces of legislation and policies based on racial segregation. The system held a tight clamp on interracial mixing which forced people to remain in segregated areas. For example, a citizen from the Indian community could not own property in an area allocated to white people. In this way, a great majority of people were denied basic human rights. Now, there is a supreme constitution with an entrenched justiciable Bill of Rights founded on democratic values of social justice, human dignity, equality and the advancement of human rights and freedoms. The Constitution also enshrines the values of non-racialism and non-sexism. The Constitutional Court has the power to declare unconstitutional any law or conduct considered to be inconsistent with it. Now, more than two decades down the line, despite the abolition of apartheid, there is evidence that South Africa still experiences hate crimes which violate the entrenched right of vulnerable groups not to be discriminated against on the basis of race, sexual orientation, gender, national origin, occupation, or disability. To remedy this mischief parliament has responded by drafting the Prevention and Combatting of Hate Crimes and Hate Speech Bill. The Bill has been disseminated for public comment and suggestions. It is intended to combat hate crimes and hate speech based on sheer prejudice. The other purpose of the Bill is to bring South Africa in line with international human rights instruments against racism, racial discrimination, xenophobia and related expressions of intolerance identified in several international instruments. It is against this backdrop that this paper intends to analyse the impact of the Bill on the rights to human dignity, equality, and freedom. This study is significant because the Bill was highly contested and creates a huge debate. This study relies on a qualitative evaluative approach based on desktop and library research. The article recurs to primary and secondary sources. For comparative purpose, the paper compares South Africa with countries such as Australia, Canada, Kenya, Cuba, and United Kingdom which have criminalised hate crimes and hate speech. The finding from this study is that despite the Bill’s expressed positive intentions, this draft legislation is problematic for several reasons. The main reason is that it generates considerable controversy mostly because it is considered to infringe the right to freedom of expression. Though the author suggests that the Bill should not be rejected in its entirety, she notes the brutal psychological effect of hate crimes on their direct victims and the writer emphasises that a legislature can succeed to combat hate-crimes only if it provides for them as a separate stand-alone category of offences. In view of these findings, the study recommended that since hate speech clauses have a negative impact on freedom of expression it can be promulgated, subject to the legislature enacting the Prevention and Combatting of Hate-Crimes Bill as a stand-alone law which criminalises hate crimes.Keywords: freedom of expression, hate crimes, hate speech, human dignity
Procedia PDF Downloads 1758958 Google Translate: AI Application
Authors: Shaima Almalhan, Lubna Shukri, Miriam Talal, Safaa Teskieh
Abstract:
Since artificial intelligence is a rapidly evolving topic that has had a significant impact on technical growth and innovation, this paper examines people's awareness, use, and engagement with the Google Translate application. To see how familiar aware users are with the app and its features, quantitative and qualitative research was conducted. The findings revealed that consumers have a high level of confidence in the application and how far people they benefit from this sort of innovation and how convenient it makes communication.Keywords: artificial intelligence, google translate, speech recognition, language translation, camera translation, speech to text, text to speech
Procedia PDF Downloads 1558957 Speech Emotion Recognition: A DNN and LSTM Comparison in Single and Multiple Feature Application
Authors: Thiago Spilborghs Bueno Meyer, Plinio Thomaz Aquino Junior
Abstract:
Through speech, which privileges the functional and interactive nature of the text, it is possible to ascertain the spatiotemporal circumstances, the conditions of production and reception of the discourse, the explicit purposes such as informing, explaining, convincing, etc. These conditions allow bringing the interaction between humans closer to the human-robot interaction, making it natural and sensitive to information. However, it is not enough to understand what is said; it is necessary to recognize emotions for the desired interaction. The validity of the use of neural networks for feature selection and emotion recognition was verified. For this purpose, it is proposed the use of neural networks and comparison of models, such as recurrent neural networks and deep neural networks, in order to carry out the classification of emotions through speech signals to verify the quality of recognition. It is expected to enable the implementation of robots in a domestic environment, such as the HERA robot from the RoboFEI@Home team, which focuses on autonomous service robots for the domestic environment. Tests were performed using only the Mel-Frequency Cepstral Coefficients, as well as tests with several characteristics of Delta-MFCC, spectral contrast, and the Mel spectrogram. To carry out the training, validation and testing of the neural networks, the eNTERFACE’05 database was used, which has 42 speakers from 14 different nationalities speaking the English language. The data from the chosen database are videos that, for use in neural networks, were converted into audios. It was found as a result, a classification of 51,969% of correct answers when using the deep neural network, when the use of the recurrent neural network was verified, with the classification with accuracy equal to 44.09%. The results are more accurate when only the Mel-Frequency Cepstral Coefficients are used for the classification, using the classifier with the deep neural network, and in only one case, it is possible to observe a greater accuracy by the recurrent neural network, which occurs in the use of various features and setting 73 for batch size and 100 training epochs.Keywords: emotion recognition, speech, deep learning, human-robot interaction, neural networks
Procedia PDF Downloads 1718956 Compensatory Articulation of Pressure Consonants in Telugu Cleft Palate Speech: A Spectrographic Analysis
Authors: Indira Kothalanka
Abstract:
For individuals born with a cleft palate (CP), there is no separation between the nasal cavity and the oral cavity, due to which they cannot build up enough air pressure in the mouth for speech. Therefore, it is common for them to have speech problems. Common cleft type speech errors include abnormal articulation (compensatory or obligatory) and abnormal resonance (hyper, hypo and mixed nasality). These are generally resolved after palate repair. However, in some individuals, articulation problems do persist even after the palate repair. Such individuals develop variant articulations in an attempt to compensate for the inability to produce the target phonemes. A spectrographic analysis is used to investigate the compensatory articulatory behaviours of pressure consonants in the speech of 10 Telugu speaking individuals aged between 7-17 years with a history of cleft palate. Telugu is a Dravidian language which is spoken in Andhra Pradesh and Telangana states in India. It is a language with the third largest number of native speakers in India and the most spoken Dravidian language. The speech of the informants is analysed using single word list, sentences, passage and conversation. Spectrographic analysis is carried out using PRAAT, speech analysis software. The place and manner of articulation of consonant sounds is studied through spectrograms with the help of various acoustic cues. The types of compensatory articulation identified are glottal stops, palatal stops, uvular, velar stops and nasal fricatives which are non-native in Telugu.Keywords: cleft palate, compensatory articulation, spectrographic analysis, PRAAT
Procedia PDF Downloads 4468955 Virtual Reality Based 3D Video Games and Speech-Lip Synchronization Superseding Algebraic Code Excited Linear Prediction
Authors: P. S. Jagadeesh Kumar, S. Meenakshi Sundaram, Wenli Hu, Yang Yung
Abstract:
In 3D video games, the dominance of production is unceasingly growing with a protruding level of affordability in terms of budget. Afterward, the automation of speech-lip synchronization technique is customarily onerous and has advanced a critical research subject in virtual reality based 3D video games. This paper presents one of these automatic tools, precisely riveted on the synchronization of the speech and the lip movement of the game characters. A robust and precise speech recognition segment that systematized with Algebraic Code Excited Linear Prediction method is developed which unconventionally delivers lip sync results. The Algebraic Code Excited Linear Prediction algorithm is constructed on that used in code-excited linear prediction, but Algebraic Code Excited Linear Prediction codebooks have an explicit algebraic structure levied upon them. This affords a quicker substitute to the software enactments of lip sync algorithms and thus advances the superiority of service factors abridged production cost.Keywords: algebraic code excited linear prediction, speech-lip synchronization, video games, virtual reality
Procedia PDF Downloads 4748954 A Cross-Dialect Statistical Analysis of Final Declarative Intonation in Tuvinian
Authors: D. Beziakina, E. Bulgakova
Abstract:
This study continues the research on Tuvinian intonation and presents a general cross-dialect analysis of intonation of Tuvinian declarative utterances, specifically the character of the tone movement in order to test the hypothesis about the prevalence of level tone in some Tuvinian dialects. The results of the analysis of basic pitch characteristics of Tuvinian speech (in general and in comparison with two other Turkic languages - Uzbek and Azerbaijani) are also given in this paper. The goal of our work was to obtain the ranges of pitch parameter values typical for Tuvinian speech. Such language-specific values can be used in speaker identification systems in order to get more accurate results of ethnic speech analysis. We also present the results of a cross-dialect analysis of declarative intonation in the poorly studied Tuvinian language.Keywords: speech analysis, statistical analysis, speaker recognition, identification of person
Procedia PDF Downloads 4728953 A Profile of the Patients at the Hearing and Speech Clinic at the University of Jordan: A Retrospective Study
Authors: Maisa Haj-Tas, Jehad Alaraifi
Abstract:
The significance of the study: This retrospective study examined the speech and language profiles of patients who received clinical services at the University of Jordan Hearing and Speech Clinic (UJ-HSC) from 2009 to 2014. The UJ-HSC clinic is located in the capital Amman and was established in the late 1990s. It is the first hearing and speech clinic in Jordan and one of first speech and hearing clinics in the Middle East. This clinic provides services to an annual average of 2000 patients who are diagnosed with different communication disorders. Examining the speech and language profiles of patients in this clinic could provide an insight about the most common disorders seen in patients who attend similar clinics in Jordan. It could also provide information about community awareness of the role of speech therapists in the management of speech and language disorders. Methodology: The researchers examined the clinical records of 1140 patients (797 males and 343 females) who received clinical services at the UJ-HSC between the years 2009 and 2014 for the purpose of data analysis for this study. The main variables examined in the study were disorder type and gender. Participants were divided into four age groups: children, adolescents, adults, and older adults. The examined disorders were classified as either speech disorders, language disorders, or dysphagia (i.e., swallowing problems). The disorders were further classified as childhood language impairments, articulation disorders, stuttering, cluttering, voice disorders, aphasia, and dysphagia. Results: The results indicated that the prevalence for language disorders was the highest (50.7%) followed by speech disorders (48.3%), and dysphagia (0.9%). The majority of patients who were seen at the JU-HSC were diagnosed with childhood language impairments (47.3%) followed consecutively by articulation disorders (21.1%), stuttering (16.3%), voice disorders (12.1%), aphasia (2.2%), dysphagia (0.9%), and cluttering (0.2%). As for gender, the majority of patients seen at the clinic were males in all disorders except for voice disorders and cluttering. Discussion: The results of the present study indicate that the majority of examined patients were diagnosed with childhood language impairments. Based on this result, the researchers suggest that there seems to be a high prevalence of childhood language impairments among children in Jordan compared to other types of speech and language disorders. The researchers also suggest that there is a need for further examination of the actual prevalence data on speech and language disorders in Jordan. The fact that many of the children seen at the UJ-HSC were brought to the clinic either as a result of parental concern or teacher referral indicates that there seems to an increased awareness among parents and teachers about the services speech pathologists can provide about assessment and treatment of childhood speech and language disorders. The small percentage of other disorders (i.e., stuttering, cluttering, dysphasia, aphasia, and voice disorders) seen at the UJ-HSC may indicate a little awareness by the local community about the role of speech pathologists in the assessment and treatment of these disorders.Keywords: clinic, disorders, language, profile, speech
Procedia PDF Downloads 3168952 Environmentally Adaptive Acoustic Echo Suppression for Barge-in Speech Recognition
Authors: Jong Han Joo, Jung Hoon Lee, Young Sun Kim, Jae Young Kang, Seung Ho Choi
Abstract:
In this study, we propose a novel technique for acoustic echo suppression (AES) during speech recognition under barge-in conditions. Conventional AES methods based on spectral subtraction apply fixed weights to the estimated echo path transfer function (EPTF) at the current signal segment and to the EPTF estimated until the previous time interval. We propose a new approach that adaptively updates weight parameters in response to abrupt changes in the acoustic environment due to background noises or double-talk. Furthermore, we devised a voice activity detector and an initial time-delay estimator for barge-in speech recognition in communication networks. The initial time delay is estimated using log-spectral distance measure, as well as cross-correlation coefficients. The experimental results show that the developed techniques can be successfully applied in barge-in speech recognition systems.Keywords: acoustic echo suppression, barge-in, speech recognition, echo path transfer function, initial delay estimator, voice activity detector
Procedia PDF Downloads 3758951 Role of Speech Articulation in English Language Learning
Authors: Khadija Rafi, Neha Jamil, Laiba Khalid, Meerub Nawaz, Mahwish Farooq
Abstract:
Speech articulation is a complex process to produce intelligible sounds with the help of precise movements of various structures within the vocal tract. All these structures in the vocal tract are named as articulators, which comprise lips, teeth, tongue, and palate. These articulators work together to produce a range of distinct phonemes, which happen to be the basis of language. It starts with the airstream from the lungs passing through the trachea and into oral and nasal cavities. When the air passes through the mouth, the tongue and the muscles around it form such coordination it creates certain sounds. It can be seen when the tongue is placed in different positions- sometimes near the alveolar ridge, soft palate, roof of the mouth or the back of the teeth which end up creating unique qualities of each phoneme. We can articulate vowels with open vocal tracts, but the height and position of the tongue is different every time depending upon each vowel, while consonants can be pronounced when we create obstructions in the airflow. For instance, the alphabet ‘b’ is a plosive and can be produced only by briefly closing the lips. Articulation disorders can not only affect communication but can also be a hurdle in speech production. To improve articulation skills for such individuals, doctors often recommend speech therapy, which involves various kinds of exercises like jaw exercises and tongue twisters. However, this disorder is more common in children who are going through developmental articulation issues right after birth, but in adults, it can be caused by injury, neurological conditions, or other speech-related disorders. In short, speech articulation is an essential aspect of productive communication, which also includes coordination of the specific articulators to produce different intelligible sounds, which are a vital part of spoken language.Keywords: linguistics, speech articulation, speech therapy, language learning
Procedia PDF Downloads 638950 Hate Speech in Selected Nigerian Newspapers
Authors: Laurel Chikwado Madumere, Kevin O. Ugorji
Abstract:
A speech is said to be full of hate when it appropriates disparaging and vituperative locutions and/or appellations, which are riddled with prejudices and misconceptions about an antagonizing party on the grounds of gender, race, political orientation, religious affiliations, tribe, etc. Due largely to the dichotomies and polarities that exist in Nigeria across political ideological spectrum, tribal affiliations, and gender contradistinctions, there are possibilities for the existence of socioeconomic, religious and political conditions that would induce, provoke and catalyze hate speeches in Nigeria’s mainstream media. Therefore the aim of this paper is to investigate, using select daily newspapers in Nigeria, the extent and complexity of those likely hate speeches that emanate from the pluralism in Nigeria and to set in to relief, the discrepancies and contrariety in the interpretation of those hate words. To achieve the above, the paper shall be qualitative in orientation as it shall be using the Speech Act Theory of J. L. Austin and J. R. Searle to interpret and evaluate the hate speeches in the select Nigerian daily newspapers. Also this paper shall help to elucidate the conditions that generate hate, and inform the government and NGOs how best to approach those conditions and put an end to the possible violence and extremism that emanate from extreme cases of hate.Keywords: extremism, gender, hate speech, pluralism, prejudice, speech act theory
Procedia PDF Downloads 1498949 Absence of Developmental Change in Epenthetic Vowel Duration in Japanese Speakers’ English
Authors: Takayuki Konishi, Kakeru Yazawa, Mariko Kondo
Abstract:
This study examines developmental change in the production of epenthetic vowels by Japanese learners of English in relation to acquisition of L2 English speech rhythm. Seventy-two Japanese learners of English in the J-AESOP corpus were divided into lower- and higher-level learners according to their proficiency score and the frequency of vowel epenthesis. Three learners were excluded because no vowel epenthesis was observed in their utterances. The analysis of their read English speech data showed no statistical difference between lower- and higher-level learners, implying the absence of any developmental change in durations of epenthetic vowels. This result, together with the findings of previous studies, will be discussed in relation to the transfer of L1 phonology and manifestation of L2 English rhythm.Keywords: vowel epenthesis, Japanese learners of English, L2 speech corpus, speech rhythm
Procedia PDF Downloads 2688948 Grammatical and Lexical Cohesion in the Japan’s Prime Minister Shinzo Abe’s Speech Text ‘Nihon wa Modottekimashita’
Authors: Nadya Inda Syartanti
Abstract:
This research aims to identify, classify, and analyze descriptively the aspects of grammatical and lexical cohesion in the speech text of Japan’s Prime Minister Shinzo Abe entitled Nihon wa Modotte kimashita delivered in Washington DC, the United States on February 23, 2013, as a research data source. The method used is qualitative research, which uses descriptions through words that are applied by analyzing aspects of grammatical and lexical cohesion proposed by Halliday and Hasan (1976). The aspects of grammatical cohesion consist of references (personal, demonstrative, interrogative pronouns), substitution, ellipsis, and conjunction. In contrast, lexical cohesion consists of reiteration (repetition, synonym, antonym, hyponym, meronym) and collocation. Data classification is based on the 6 aspects of the cohesion. Through some aspects of cohesion, this research tries to find out the frequency of using grammatical and lexical cohesion in Shinzo Abe's speech text entitled Nihon wa Modotte kimashita. The results of this research are expected to help overcome the difficulty of understanding speech texts in Japanese. Therefore, this research can be a reference for learners, researchers, and anyone who is interested in the field of discourse analysis.Keywords: cohesion, grammatical cohesion, lexical cohesion, speech text, Shinzo Abe
Procedia PDF Downloads 1628947 Speech and Swallowing Function after Tonsillo-Lingual Sulcus Resection with PMMC Flap Reconstruction: A Case Study
Authors: K. Rhea Devaiah, B. S. Premalatha
Abstract:
Background: Tonsillar Lingual sulcus is the area between the tonsils and the base of the tongue. The surgical resection of the lesions in the head and neck results in changes in speech and swallowing functions. The severity of the speech and swallowing problem depends upon the site and extent of the lesion, types and extent of surgery and also the flexibility of the remaining structures. Need of the study: This paper focuses on the importance of speech and swallowing rehabilitation in an individual with the lesion in the Tonsillar Lingual Sulcus and post-operative functions. Aim: Evaluating the speech and swallow functions post-intensive speech and swallowing rehabilitation. The objectives are to evaluate the speech intelligibility and swallowing functions after intensive therapy and assess the quality of life. Method: The present study describes a report of an individual aged 47years male, with the diagnosis of basaloid squamous cell carcinoma, left tonsillar lingual sulcus (pT2n2M0) and underwent wide local excision with left radical neck dissection with PMMC flap reconstruction. Post-surgery the patient came with a complaint of reduced speech intelligibility, and difficulty in opening the mouth and swallowing. Detailed evaluation of the speech and swallowing functions were carried out such as OPME, articulation test, speech intelligibility, different phases of swallowing and trismus evaluation. Self-reported questionnaires such as SHI-E(Speech handicap Index- Indian English), DHI (Dysphagia handicap Index) and SESEQ -K (Self Evaluation of Swallowing Efficiency in Kannada) were also administered to know what the patient feels about his problem. Based on the evaluation, the patient was diagnosed with pharyngeal phase dysphagia associated with trismus and reduced speech intelligibility. Intensive speech and swallowing therapy was advised weekly twice for the duration of 1 hour. Results: Totally the patient attended 10 intensive speech and swallowing therapy sessions. Results indicated misarticulation of speech sounds such as lingua-palatal sounds. Mouth opening was restricted to one finger width with difficulty chewing, masticating, and swallowing the bolus. Intervention strategies included Oro motor exercise, Indirect swallowing therapy, usage of a trismus device to facilitate mouth opening, and change in the food consistency to help to swallow. A practice session was held with articulation drills to improve the production of speech sounds and also improve speech intelligibility. Significant changes in articulatory production and speech intelligibility and swallowing abilities were observed. The self-rated quality of life measures such as DHI, SHI and SESE Q-K revealed no speech handicap and near-normal swallowing ability indicating the improved QOL after the intensive speech and swallowing therapy. Conclusion: Speech and swallowing therapy post carcinoma in the tonsillar lingual sulcus is crucial as the tongue plays an important role in both speech and swallowing. The role of Speech-language and swallowing therapists in oral cancer should be highlighted in treating these patients and improving the overall quality of life. With intensive speech-language and swallowing therapy post-surgery for oral cancer, there can be a significant change in the speech outcome and swallowing functions depending on the site and extent of lesions which will thereby improve the individual’s QOL.Keywords: oral cancer, speech and swallowing therapy, speech intelligibility, trismus, quality of life
Procedia PDF Downloads 1128946 The Communicative Nature of Linguistic Interference in Learning and Teaching of Slavic Languages
Authors: Kseniia Fedorova
Abstract:
The article is devoted to interlinguistic homonymy and enantiosemy analysis. These phenomena belong to the process of linguistic interference, which leads to violation of the communicative utterances integrity and causes misunderstanding between foreign interlocutors - native speakers of different Slavic languages. More attention is paid to investigation of non-typical speech situations, which occurred spontaneously or created by somebody intentionally being based on described phenomenon mechanism. The classification of typical students' mistakes connected with the paradox of interference is being represented in the article. The survey contributes to speech act theory, contemporary linguodidactics, translation science and comparative lexicology of Slavonic languages.Keywords: adherent enantiosemy, interference, interslavonic homonymy, speech act
Procedia PDF Downloads 2468945 Speech Emotion Recognition with Bi-GRU and Self-Attention based Feature Representation
Authors: Bubai Maji, Monorama Swain
Abstract:
Speech is considered an essential and most natural medium for the interaction between machines and humans. However, extracting effective features for speech emotion recognition (SER) is remains challenging. The present studies show that the temporal information captured but high-level temporal-feature learning is yet to be investigated. In this paper, we present an efficient novel method using the Self-attention (SA) mechanism in a combination of Convolutional Neural Network (CNN) and Bi-directional Gated Recurrent Unit (Bi-GRU) network to learn high-level temporal-feature. In order to further enhance the representation of the high-level temporal-feature, we integrate a Bi-GRU output with learnable weights features by SA, and improve the performance. We evaluate our proposed method on our created SITB-OSED and IEMOCAP databases. We report that the experimental results of our proposed method achieve state-of-the-art performance on both databases.Keywords: Bi-GRU, 1D-CNNs, self-attention, speech emotion recognition
Procedia PDF Downloads 1148944 Investigating the Online Effect of Language on Gesture in Advanced Bilinguals of Two Structurally Different Languages in Comparison to L1 Native Speakers of L2 and Explores Whether Bilinguals Will Follow Target L2 Patterns in Speech and Co-speech
Authors: Armita Ghobadi, Samantha Emerson, Seyda Ozcaliskan
Abstract:
Being a bilingual involves mastery of both speech and gesture patterns in a second language (L2). We know from earlier work in first language (L1) production contexts that speech and co-speech gesture form a tightly integrated system: co-speech gesture mirrors the patterns observed in speech, suggesting an online effect of language on nonverbal representation of events in gesture during the act of speaking (i.e., “thinking for speaking”). Relatively less is known about the online effect of language on gesture in bilinguals speaking structurally different languages. The few existing studies—mostly with small sample sizes—suggests inconclusive findings: some show greater achievement of L2 patterns in gesture with more advanced L2 speech production, while others show preferences for L1 gesture patterns even in advanced bilinguals. In this study, we focus on advanced bilingual speakers of two structurally different languages (Spanish L1 with English L2) in comparison to L1 English speakers. We ask whether bilingual speakers will follow target L2 patterns not only in speech but also in gesture, or alternatively, follow L2 patterns in speech but resort to L1 patterns in gesture. We examined this question by studying speech and gestures produced by 23 advanced adult Spanish (L1)-English (L2) bilinguals (Mage=22; SD=7) and 23 monolingual English speakers (Mage=20; SD=2). Participants were shown 16 animated motion event scenes that included distinct manner and path components (e.g., "run over the bridge"). We recorded and transcribed all participant responses for speech and segmented it into sentence units that included at least one motion verb and its associated arguments. We also coded all gestures that accompanied each sentence unit. We focused on motion event descriptions as it shows strong crosslinguistic differences in the packaging of motion elements in speech and co-speech gesture in first language production contexts. English speakers synthesize manner and path into a single clause or gesture (he runs over the bridge; running fingers forward), while Spanish speakers express each component separately (manner-only: el corre=he is running; circle arms next to body conveying running; path-only: el cruza el puente=he crosses the bridge; trace finger forward conveying trajectory). We tallied all responses by group and packaging type, separately for speech and co-speech gesture. Our preliminary results (n=4/group) showed that productions in English L1 and Spanish L1 differed, with greater preference for conflated packaging in L1 English and separated packaging in L1 Spanish—a pattern that was also largely evident in co-speech gesture. Bilinguals’ production in L2 English, however, followed the patterns of the target language in speech—with greater preference for conflated packaging—but not in gesture. Bilinguals used separated and conflated strategies in gesture in roughly similar rates in their L2 English, showing an effect of both L1 and L2 on co-speech gesture. Our results suggest that online production of L2 language has more limited effects on L2 gestures and that mastery of native-like patterns in L2 gesture might take longer than native-like L2 speech patterns.Keywords: bilingualism, cross-linguistic variation, gesture, second language acquisition, thinking for speaking hypothesis
Procedia PDF Downloads 768943 Cognitive Semantics Study of Conceptual and Metonymical Expressions in Johnson's Speeches about COVID-19
Authors: Hussain Hameed Mayuuf
Abstract:
The study is an attempt to investigate the conceptual metonymies is used in political discourse about COVID-19. Thus, this study tries to analyze and investigate how the conceptual metonymies in Johnson's speech about coronavirus are constructed. This study aims at: Identifying how are metonymies relevant to understand the messages in Boris Johnson speeches and to find out how can conceptual blending theory help people to understand the messages in the political speech about COVID-19. Lastly, it tries to Point out the kinds of integration networks are common in political speech. The study is based on the hypotheses that conceptual blending theory is a powerful tool for investigating the intended messages in Johnson's speech and there are different processes of blending networks and conceptual mapping that enable the listeners to identify the messages in political speech. This study presents a qualitative and quantitative analysis of four speeches about COVID-19; they are said by Boris Johnson. The selected data have been tackled from the cognitive-semantic perspective by adopting Conceptual Blending Theory as a model for the analysis. It concludes that CBT is applicable to the analysis of metonymies in political discourse. Its mechanisms enable listeners to analyze and understand these speeches. Also the listener can identify and understand the hidden messages in Biden and Johnson's discourse about COVID-19 by using different conceptual networks. Finally, it is concluded that the double scope networks are the most common types of blending of metonymies in the political speech.Keywords: cognitive, semantics, conceptual, metonymical, Covid-19
Procedia PDF Downloads 1308942 Bidirectional Dynamic Time Warping Algorithm for the Recognition of Isolated Words Impacted by Transient Noise Pulses
Authors: G. Tamulevičius, A. Serackis, T. Sledevič, D. Navakauskas
Abstract:
We consider the biggest challenge in speech recognition – noise reduction. Traditionally detected transient noise pulses are removed with the corrupted speech using pulse models. In this paper we propose to cope with the problem directly in Dynamic Time Warping domain. Bidirectional Dynamic Time Warping algorithm for the recognition of isolated words impacted by transient noise pulses is proposed. It uses simple transient noise pulse detector, employs bidirectional computation of dynamic time warping and directly manipulates with warping results. Experimental investigation with several alternative solutions confirms effectiveness of the proposed algorithm in the reduction of impact of noise on recognition process – 3.9% increase of the noisy speech recognition is achieved.Keywords: transient noise pulses, noise reduction, dynamic time warping, speech recognition
Procedia PDF Downloads 5598941 Human Security and Human Trafficking Related Corruption
Authors: Ekin D. Horzum
Abstract:
The aim of the proposal is to examine the relationship between human trafficking related corruption and human security. The proposal suggests that the human trafficking related corruption is about willingness of the states to turn a blind eye to the human trafficking cases. Therefore, it is important to approach human trafficking related corruption in terms of human security and human rights violation to find an effective way to fight against human trafficking. In this context, the purpose of this proposal is to examine the human trafficking related corruption as a safe haven in which trafficking thrives for perpetrators.Keywords: human trafficking, human security, human rights, corruption, organized crime
Procedia PDF Downloads 4768940 The Combination of the Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), JITTER and SHIMMER Coefficients for the Improvement of Automatic Recognition System for Dysarthric Speech
Authors: Brahim-Fares Zaidi, Malika Boudraa, Sid-Ahmed Selouani
Abstract:
Our work aims to improve our Automatic Recognition System for Dysarthria Speech (ARSDS) based on the Hidden Models of Markov (HMM) and the Hidden Markov Model Toolkit (HTK) to help people who are sick. With pronunciation problems, we applied two techniques of speech parameterization based on Mel Frequency Cepstral Coefficients (MFCC's) and Perceptual Linear Prediction (PLP's) and concatenated them with JITTER and SHIMMER coefficients in order to increase the recognition rate of a dysarthria speech. For our tests, we used the NEMOURS database that represents speakers with dysarthria and normal speakers.Keywords: hidden Markov model toolkit (HTK), hidden models of Markov (HMM), Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP’s)
Procedia PDF Downloads 1648939 The International Prohibition of Religiously-Motivated 'Incitement' to Violence
Authors: J. D. Temperman
Abstract:
Introduction: In particular, in relation to religion, the meaning and scope of freedom of expression have been tested in recent times. This paper investigates the legal justifications for restrictions that have been suggested in this area and asks whether they are sustainable from an international human rights perspective. The universal human rights instruments, particularly the UN International Covenant on Civil and Political Rights (ICCPR), are increasingly geared towards eradicating ‘incitement’ to contingent harms like violence or discrimination, whilst forms of extreme speech that fall short of such incitement are to be protected rather than countered by states. Human Rights Committee’s draft-General Comment on freedom of expression, adopted in 2011, provides another strong indication that this is the envisaged way forward: repealing anti-blasphemy and anti-religious defamation laws, whilst simultaneously increasing efforts to combat ‘incitement’. Within regional human rights frameworks, notably the European Convention system, judgments have in fact supported legal restrictions on both hate speech, holocaust denial, and blasphemy or religious defamation. Major contributions to scholarship: This paper proposes an actus reus for the offense of ‘advocacy of religious hatred that constitutes incitement to discrimination or violence’, as enshrined in Article 20(2) of the UN ICCPR. In underscoring the high threshold of ‘incitement’, the author distinguishes this offense from such notions as ‘blasphemy’ or ‘defamation of religions’. In addition to treating the said provision as a sui generis prohibition, the question is addresses whether a ‘right to be protected against incitement’ may be distilled from the ICCPR. Furthermore, the author will discuss the question of how to judge incitement; notably, is mens rea required to convict someone of incitement, and if so, what degree of mens rea? This analysis also includes the question how to balance content and context factors when addressing alleged instances of incitement, notably what factors make provide for a likelihood that imminent acts of violence or discrimination will ensue from an inciteful speech act? Methodology: This paper takes a double comparative approach: (i) it endeavours to compare and contrast monitoring bodies’ approach to incitement (notably, the UN Human Rights Committee, but also the UN Committee on the Elimination of Racial Discrimination which monitors states’ compliance with Article 4 of ICERD on incitement); and (ii) it endeavours to chart and compare and analyse from an international human rights perspective recent forms of state practice in the field of dealing with incitement (i.e. a comparative legal analysis and vertical human rights analysis of newly emerging incitement legislation in the light of the said international standards). Conclusion: This paper conceptualizes a legal notion – ‘incitement’ – encapsulated in international human rights law that may have a profound bearing on contemporary challenges of radicalization and religious strife.Keywords: incitement, international human rights law, religious hatred, violence
Procedia PDF Downloads 3088938 Cultural-Creative Design with Language Figures of Speech
Authors: Wei Chen Chang, Ming Yu Hsiao
Abstract:
The commodity takes one kind of mark, the designer how to construction and interpretation the user how to use the process and effectively convey message in design education has always been an important issue. Cultural-creative design refers to signifying cultural heritage for product design. In terms of Peirce’s Semiotic Triangle: signifying elements-object-interpretant, signifying elements are the outcomes of design, the object is cultural heritage, and the interpretant is the positioning and description of product design. How to elaborate the positioning, design, and development of a product is a narrative issue of the interpretant, and how to shape the signifying elements of a product by modifying and adapting styles is a rhetoric matter. This study investigated the rhetoric of elements signifying products to develop a rhetoric model with cultural style. Figures of speech are a rhetoric method in narrative. By adapting figures of speech to the interpretant, this study developed the rhetoric context of cultural context by narrative means. In this two-phase study, phase I defines figures of speech and phase II analyzes existing cultural-creative products in terms of figures of speech to develop a rhetoric of style model. We expect it can reference for the future development of Cultural-creative design.Keywords: cultural-creative design, cultural-creative products, figures of speech, Peirce’s semiotic triangle, rhetoric of style model
Procedia PDF Downloads 3738937 Exploratory Analysis of A Review of Nonexistence Polarity in Native Speech
Authors: Deawan Rakin Ahamed Remal, Sinthia Chowdhury, Sharun Akter Khushbu, Sheak Rashed Haider Noori
Abstract:
Native Speech to text synthesis has its own leverage for the purpose of mankind. The extensive nature of art to speaking different accents is common but the purpose of communication between two different accent types of people is quite difficult. This problem will be motivated by the extraction of the wrong perception of language meaning. Thus, many existing automatic speech recognition has been placed to detect text. Overall study of this paper mentions a review of NSTTR (Native Speech Text to Text Recognition) synthesis compared with Text to Text recognition. Review has exposed many text to text recognition systems that are at a very early stage to comply with the system by native speech recognition. Many discussions started about the progression of chatbots, linguistic theory another is rule based approach. In the Recent years Deep learning is an overwhelming chapter for text to text learning to detect language nature. To the best of our knowledge, In the sub continent a huge number of people speak in Bangla language but they have different accents in different regions therefore study has been elaborate contradictory discussion achievement of existing works and findings of future needs in Bangla language acoustic accent.Keywords: TTR, NSTTR, text to text recognition, deep learning, natural language processing
Procedia PDF Downloads 1338936 Performance Analysis of VoIP Coders for Different Modulations Under Pervasive Environment
Authors: Jasbinder Singh, Harjit Pal Singh, S. A. Khan
Abstract:
The work, in this paper, presents the comparison of encoded speech signals by different VoIP narrow-band and wide-band codecs for different modulation schemes. The simulation results indicate that codec has an impact on the speech quality and also effected by modulation schemes.Keywords: VoIP, coders, modulations, BER, MOS
Procedia PDF Downloads 5198935 Multimodal Data Fusion Techniques in Audiovisual Speech Recognition
Authors: Hadeer M. Sayed, Hesham E. El Deeb, Shereen A. Taie
Abstract:
In the big data era, we are facing a diversity of datasets from different sources in different domains that describe a single life event. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. Multimodal fusion is the concept of integrating information from multiple modalities in a joint representation with the goal of predicting an outcome through a classification task or regression task. In this paper, multimodal fusion techniques are classified into two main classes: model-agnostic techniques and model-based approaches. It provides a comprehensive study of recent research in each class and outlines the benefits and limitations of each of them. Furthermore, the audiovisual speech recognition task is expressed as a case study of multimodal data fusion approaches, and the open issues through the limitations of the current studies are presented. This paper can be considered a powerful guide for interested researchers in the field of multimodal data fusion and audiovisual speech recognition particularly.Keywords: multimodal data, data fusion, audio-visual speech recognition, neural networks
Procedia PDF Downloads 1148934 Analysis of Linguistic Disfluencies in Bilingual Children’s Discourse
Authors: Sheena Christabel Pravin, M. Palanivelan
Abstract:
Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or on an early onset of developmental stuttering at childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses and interjections in the speech corpus encompassing bilingual spontaneous utterances from school going children – English and Tamil. Two classifiers including Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), which is a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children to distinguish between Children Who Stutter (CWS) and Children with Language Impairment CLI). The ability of the models in classifying the disfluencies was measured in terms of F-measure, Recall, and Precision.Keywords: bi-lingual, children who stutter, children with language impairment, hidden markov models, multi-layer perceptron, linguistic disfluencies, stuttering disfluencies
Procedia PDF Downloads 2178933 Emotional and Physiological Reaction While Listening the Speech of Adults Who Stutter
Authors: Xharavina V., Gallopeni F., Ahmeti K.
Abstract:
Stuttered speech is filled with intermittent sound prolongations and/or rapid part word repetitions. Oftentimes, these aberrant acoustic behaviors are associated with intermittent physical tension and struggle behaviors such as head jerks, arm jerks, finger tapping, excessive eye-blinks, etc. Additionally, the jarring nature of acoustic and physical manifestations that often accompanies moderate-severe stuttering may induce negative emotional responses in listeners, which alters communication between the person who stutters and their listeners. However, researches for the influence of negative emotions in the communication and for physical reaction are limited. Therefore, to compare psycho-physiological responses of fluent adults, while listening the speech of adults who speak fluency and adults who stutter, are necessary. This study comprises the experimental method, with total of 104 participants (average age-20 years old, SD=2.1), divided into 3 groups. All participants self-reported no impairments in speech, language, or hearing. Exploring the responses of the participants, there were used two records speeches; a voice who speaks fluently and the voice who stutters. Heartbeats and the pulse were measured by the digital blood pressure monitor called 'Tensoval', as a physiological response to the fluent and stuttering sample. Meanwhile, the emotional responses of participants were measured by the self-reporting questionnaire (Steenbarger, 2001). Results showed an increase in heartbeats during the stuttering speech compared with the fluent sample (p < 0.5). The listeners also self-reported themselves as more alive, unhappy, nervous, repulsive, sad, tense, distracted and upset when listening the stuttering words versus the words of the fluent adult (where it was reported to experience positive emotions). These data support the notions that speech with stuttering can bring a psycho-physical reaction to the listeners. Speech pathologists should be aware that listeners show intolerable physiological reactions to stuttering that remain visible over time.Keywords: emotional, physiological, stuttering, fluent speech
Procedia PDF Downloads 1438932 Comparison Study of Machine Learning Classifiers for Speech Emotion Recognition
Authors: Aishwarya Ravindra Fursule, Shruti Kshirsagar
Abstract:
In the intersection of artificial intelligence and human-centered computing, this paper delves into speech emotion recognition (SER). It presents a comparative analysis of machine learning models such as K-Nearest Neighbors (KNN),logistic regression, support vector machines (SVM), decision trees, ensemble classifiers, and random forests, applied to SER. The research employs four datasets: Crema D, SAVEE, TESS, and RAVDESS. It focuses on extracting salient audio signal features like Zero Crossing Rate (ZCR), Chroma_stft, Mel Frequency Cepstral Coefficients (MFCC), root mean square (RMS) value, and MelSpectogram. These features are used to train and evaluate the models’ ability to recognize eight types of emotions from speech: happy, sad, neutral, angry, calm, disgust, fear, and surprise. Among the models, the Random Forest algorithm demonstrated superior performance, achieving approximately 79% accuracy. This suggests its suitability for SER within the parameters of this study. The research contributes to SER by showcasing the effectiveness of various machine learning algorithms and feature extraction techniques. The findings hold promise for the development of more precise emotion recognition systems in the future. This abstract provides a succinct overview of the paper’s content, methods, and results.Keywords: comparison, ML classifiers, KNN, decision tree, SVM, random forest, logistic regression, ensemble classifiers
Procedia PDF Downloads 458931 Speech Acts of Selected Classroom Encounters: Analyzing the Speech Acts of a Career Technology Lesson
Authors: Michael Amankwaa Adu
Abstract:
Effective communication in the classroom plays a vital role in ensuring successful teaching and learning. In particular, the types of language and speech acts teachers use shape classroom interactions and influence student engagement. This study aims to analyze the speech acts employed by a Career Technology teacher in a junior high school. While much research has focused on speech acts in language classrooms, less attention has been given to how these acts operate in non-language subject areas like technical education. The study explores how different types of speech acts—directives, assertives, expressives, and commissives—are used during three classroom encounters: lesson introduction, content delivery, and classroom management. This research seeks to fill the gap in understanding how teachers of non-language subjects use speech acts to manage classroom dynamics and facilitate learning. The study employs a mixed-methods design, combining qualitative and quantitative approaches. Data was collected through direct classroom observation and audio recordings of a one-hour Career Technology lesson. The transcriptions of the lesson were analyzed using John Searle’s taxonomy of speech acts, classifying the teacher’s utterances into directives, assertives, expressives, and commissives. Results show that directives were the most frequently used speech act, accounting for 59.3% of the teacher's utterances. These speech acts were essential in guiding student behavior, giving instructions, and maintaining classroom control. Assertives made up 20.4% of the speech acts, primarily used for stating facts and reinforcing content. Expressives, at 14.2%, expressed emotions such as approval or frustration, helping to manage the emotional atmosphere of the classroom. Commissives were the least used, representing 6.2% of the speech acts, often used to set expectations or outline future actions. No declarations were observed during the lesson. The findings of this study reveal the critical role that speech acts play in managing classroom behavior and delivering content in technical subjects. Directives were crucial for ensuring students followed instructions and completed tasks, while assertives helped in reinforcing lesson objectives. Expressives contributed to motivating or disciplining students, and commissives, though less frequent, helped set clear expectations for students’ future actions. The absence of declarations suggests that the teacher prioritized guiding students over making formal pronouncements. These insights can inform teaching strategies across various subject areas, demonstrating that a diverse use of speech acts can create a balanced and interactive learning environment. This study contributes to the growing field of pragmatics in education and offers practical recommendations for educators, particularly in non-language classrooms, on how to utilize speech acts to enhance both classroom management and student engagement.Keywords: classroom interaction, pragmatics, speech acts, teacher communication, career technology
Procedia PDF Downloads 22