Search results for: speech detection
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4026

Search results for: speech detection

3996 Detection of Phoneme [S] Mispronounciation for Sigmatism Diagnosis in Adults

Authors: Michal Krecichwost, Zauzanna Miodonska, Pawel Badura

Abstract:

The diagnosis of sigmatism is mostly based on the observation of articulatory organs. It is, however, not always possible to precisely observe the vocal apparatus, in particular in the oral cavity of the patient. Speech processing can allow to objectify the therapy and simplify the verification of its progress. In the described study the methodology for classification of incorrectly pronounced phoneme [s] is proposed. The recordings come from adults. They were registered with the speech recorder at the sampling rate of 44.1 kHz and the resolution of 16 bit. The database of pathological and normative speech has been collected for the study including reference assessments provided by the speech therapy experts. Ten adult subjects were asked to simulate a certain type of stigmatism under the speech therapy expert supervision. In the recordings, the analyzed phone [s] was surrounded by vowels, viz: ASA, ESE, ISI, SPA, USU, YSY. Thirteen MFCC (mel-frequency cepstral coefficients) and RMS (root mean square) values are calculated within each frame being a part of the analyzed phoneme. Additionally, 3 fricative formants along with corresponding amplitudes are determined for the entire segment. In order to aggregate the information within the segment, the average value of each MFCC coefficient is calculated. All features of other types are aggregated by means of their 75th percentile. The proposed method of features aggregation reduces the size of the feature vector used in the classification. Binary SVM (support vector machine) classifier is employed at the phoneme recognition stage. The first group consists of pathological phones, while the other of the normative ones. The proposed feature vector yields classification sensitivity and specificity measures above 90% level in case of individual logo phones. The employment of a fricative formants-based information improves the sole-MFCC classification results average of 5 percentage points. The study shows that the employment of specific parameters for the selected phones improves the efficiency of pathology detection referred to the traditional methods of speech signal parameterization.

Keywords: computer-aided pronunciation evaluation, sibilants, sigmatism diagnosis, speech processing

Procedia PDF Downloads 256
3995 Speech Acts and Politeness Strategies in an EFL Classroom in Georgia

Authors: Tinatin Kurdghelashvili

Abstract:

The paper deals with the usage of speech acts and politeness strategies in an EFL classroom in Georgia (Rep of). It explores the students’ and the teachers’ practice of the politeness strategies and the speech acts of apology, thanking, request, compliment/encouragement, command, agreeing/disagreeing, addressing and code switching. The research method includes observation as well as a questionnaire. The target group involves the students from Georgian public schools and two certified, experienced local English teachers. The analysis is based on Searle’s Speech Act Theory and Brown and Levinson’s politeness strategies. The findings show that the students have certain knowledge regarding politeness yet they fail to apply them in English communication. In addition, most of the speech acts from the classroom interaction are used by the teachers and not the students. Thereby, it is suggested that teachers should cultivate the students’ communicative competence and attempt to give them opportunities to practice more English speech acts than they do today.

Keywords: english as a foreign language, Georgia, politeness principles, speech acts

Procedia PDF Downloads 609
3994 The Influence of Advertising Captions on the Internet through the Consumer Purchasing Decision

Authors: Suwimol Apapol, Punrapha Praditpong

Abstract:

The objectives of the study were to find out the frequencies of figures of speech in fragrance advertising captions as well as the types of figures of speech most commonly applied in captions. The relation between figures of speech and fragrance was also examined in order to analyze how figures of speech were used to represent fragrance. Thirty-five fragrance advertisements were randomly selected from the Internet. Content analysis was applied in order to consider the relation between figures of speech and fragrance. The results showed that figures of speech were found in almost every fragrance advertisement except one advertisement of several Goods service. Thirty-four fragrance advertising captions used at least one kind of figure of speech. Metaphor was most frequently found and also most frequently applied in fragrance advertising captions, followed by alliteration, rhyme, simile and personification, and hyperbole respectively which is in harmony with the research hypotheses as well.

Keywords: advertising captions, captions on internet, consumer purchasing decision, e-commerce

Procedia PDF Downloads 245
3993 A Novel RLS Based Adaptive Filtering Method for Speech Enhancement

Authors: Pogula Rakesh, T. Kishore Kumar

Abstract:

Speech enhancement is a long standing problem with numerous applications like teleconferencing, VoIP, hearing aids, and speech recognition. The motivation behind this research work is to obtain a clean speech signal of higher quality by applying the optimal noise cancellation technique. Real-time adaptive filtering algorithms seem to be the best candidate among all categories of the speech enhancement methods. In this paper, we propose a speech enhancement method based on Recursive Least Squares (RLS) adaptive filter of speech signals. Experiments were performed on noisy data which was prepared by adding AWGN, Babble and Pink noise to clean speech samples at -5dB, 0dB, 5dB, and 10dB SNR levels. We then compare the noise cancellation performance of proposed RLS algorithm with existing NLMS algorithm in terms of Mean Squared Error (MSE), Signal to Noise ratio (SNR), and SNR loss. Based on the performance evaluation, the proposed RLS algorithm was found to be a better optimal noise cancellation technique for speech signals.

Keywords: adaptive filter, adaptive noise canceller, mean squared error, noise reduction, NLMS, RLS, SNR, SNR loss

Procedia PDF Downloads 448
3992 Detecting Hate Speech And Cyberbullying Using Natural Language Processing

Authors: Nádia Pereira, Paula Ferreira, Sofia Francisco, Sofia Oliveira, Sidclay Souza, Paula Paulino, Ana Margarida Veiga Simão

Abstract:

Social media has progressed into a platform for hate speech among its users, and thus, there is an increasing need to develop automatic detection classifiers of offense and conflicts to help decrease the prevalence of such incidents. Online communication can be used to intentionally harm someone, which is why such classifiers could be essential in social networks. A possible application of these classifiers is the automatic detection of cyberbullying. Even though identifying the aggressive language used in online interactions could be important to build cyberbullying datasets, there are other criteria that must be considered. Being able to capture the language, which is indicative of the intent to harm others in a specific context of online interaction is fundamental. Offense and hate speech may be the foundation of online conflicts, which have become commonly used in social media and are an emergent research focus in machine learning and natural language processing. This study presents two Portuguese language offense-related datasets which serve as examples for future research and extend the study of the topic. The first is similar to other offense detection related datasets and is entitled Aggressiveness dataset. The second is a novelty because of the use of the history of the interaction between users and is entitled the Conflicts/Attacks dataset. Both datasets were developed in different phases. Firstly, we performed a content analysis of verbal aggression witnessed by adolescents in situations of cyberbullying. Secondly, we computed frequency analyses from the previous phase to gather lexical and linguistic cues used to identify potentially aggressive conflicts and attacks which were posted on Twitter. Thirdly, thorough annotation of real tweets was performed byindependent postgraduate educational psychologists with experience in cyberbullying research. Lastly, we benchmarked these datasets with other machine learning classifiers.

Keywords: aggression, classifiers, cyberbullying, datasets, hate speech, machine learning

Procedia PDF Downloads 195
3991 Prosodic Characteristics of Post Traumatic Stress Disorder Induced Speech Changes

Authors: Jarek Krajewski, Andre Wittenborn, Martin Sauerland

Abstract:

This abstract describes a promising approach for estimating post-traumatic stress disorder (PTSD) based on prosodic speech characteristics. It illustrates the validity of this method by briefly discussing results from an Arabic refugee sample (N= 47, 32 m, 15 f). A well-established standardized self-report scale “Reaction of Adolescents to Traumatic Stress” (RATS) was used to determine the ground truth level of PTSD. The speech material was prompted by telling about autobiographical related sadness inducing experiences (sampling rate 16 kHz, 8 bit resolution). In order to investigate PTSD-induced speech changes, a self-developed set of 136 prosodic speech features was extracted from the .wav files. This set was adapted to capture traumatization related speech phenomena. An artificial neural network (ANN) machine learning model was applied to determine the PTSD level and reached a correlation of r = .37. These results indicate that our classifiers can achieve similar results to those seen in speech-based stress research.

Keywords: speech prosody, PTSD, machine learning, feature extraction

Procedia PDF Downloads 65
3990 An Algorithm Based on the Nonlinear Filter Generator for Speech Encryption

Authors: A. Belmeguenai, K. Mansouri, R. Djemili

Abstract:

This work present a new algorithm based on the nonlinear filter generator for speech encryption and decryption. The proposed algorithm consists on the use a linear feedback shift register (LFSR) whose polynomial is primitive and nonlinear Boolean function. The purpose of this system is to construct Keystream with good statistical properties, but also easily computable on a machine with limited capacity calculated. This proposed speech encryption scheme is very simple, highly efficient, and fast to implement the speech encryption and decryption. We conclude the paper by showing that this system can resist certain known attacks.

Keywords: nonlinear filter generator, stream ciphers, speech encryption, security analysis

Procedia PDF Downloads 266
3989 Modern Machine Learning Conniptions for Automatic Speech Recognition

Authors: S. Jagadeesh Kumar

Abstract:

This expose presents a luculent of recent machine learning practices as employed in the modern and as pertinent to prospective automatic speech recognition schemes. The aspiration is to promote additional traverse ablution among the machine learning and automatic speech recognition factions that have transpired in the precedent. The manuscript is structured according to the chief machine learning archetypes that are furthermore trendy by now or have latency for building momentous hand-outs to automatic speech recognition expertise. The standards offered and convoluted in this article embraces adaptive and multi-task learning, active learning, Bayesian learning, discriminative learning, generative learning, supervised and unsupervised learning. These learning archetypes are aggravated and conferred in the perspective of automatic speech recognition tools and functions. This manuscript bequeaths and surveys topical advances of deep learning and learning with sparse depictions; further limelight is on their incessant significance in the evolution of automatic speech recognition.

Keywords: automatic speech recognition, deep learning methods, machine learning archetypes, Bayesian learning, supervised and unsupervised learning

Procedia PDF Downloads 414
3988 Prosody Generation in Neutral Speech Storytelling Application Using Tilt Model

Authors: Manjare Chandraprabha A., S. D. Shirbahadurkar, Manjare Anil S., Paithne Ajay N.

Abstract:

This paper proposes Intonation Modeling for Prosody generation in Neutral speech for Marathi (language spoken in Maharashtra, India) story telling applications. Nowadays audio story telling devices are very eminent for children. In this paper, we proposed tilt model for stressed words in Marathi for speech modification. Tilt model predicts modification in tone of neutral speech. GMM is used to identify stressed words for modification.

Keywords: tilt model, fundamental frequency, statistical parametric speech synthesis, GMM

Procedia PDF Downloads 359
3987 The Importance of Right Speech in Buddhism and Its Relevance Today

Authors: Gautam Sharda

Abstract:

The concept of right speech is the third stage of the noble eightfold path as prescribed by the Buddha and followed by millions of practicing Buddhists. The Buddha lays a lot of importance on the notion of right speech (Samma Vacca). In the Angutara Nikaya, the Buddha mentioned what constitutes right speech, which is basically four kinds of abstentions; namely abstaining from false speech, abstaining from slanderous speech, abstaining from harsh or hateful speech and abstaining from idle chatter. The Buddha gives reasons in support of his view as to why abstaining from these four kinds of speeches is favourable not only for maintaining the peace and equanimity within an individual but also within a society. It is a known fact that when we say something harsh or slanderous to others, it eventually affects our individual peace of mind too. We also know about the many examples of hate speeches which have led to senseless cases of violence and which are well documented within our country and the world. Also, indulging in false speech is not a healthy sign for individuals within a group as this kind of a social group which is based on falsities and lies cannot really survive for long and will eventually lead to chaos. Buddha also told us to refrain from idle chatter or gossip as generally we have seen that idle chatter or gossip does more harm than any good to the individual and the society. Hence, if most of us actually inculcate this third stage (namely, right speech) of the noble eightfold path of the Buddha in our daily life, it would be highly beneficial both for the individual and for the harmony of the society.

Keywords: Buddhism, speech, individual, society

Procedia PDF Downloads 230
3986 Advances in Artificial intelligence Using Speech Recognition

Authors: Khaled M. Alhawiti

Abstract:

This research study aims to present a retrospective study about speech recognition systems and artificial intelligence. Speech recognition has become one of the widely used technologies, as it offers great opportunity to interact and communicate with automated machines. Precisely, it can be affirmed that speech recognition facilitates its users and helps them to perform their daily routine tasks, in a more convenient and effective manner. This research intends to present the illustration of recent technological advancements, which are associated with artificial intelligence. Recent researches have revealed the fact that speech recognition is found to be the utmost issue, which affects the decoding of speech. In order to overcome these issues, different statistical models were developed by the researchers. Some of the most prominent statistical models include acoustic model (AM), language model (LM), lexicon model, and hidden Markov models (HMM). The research will help in understanding all of these statistical models of speech recognition. Researchers have also formulated different decoding methods, which are being utilized for realistic decoding tasks and constrained artificial languages. These decoding methods include pattern recognition, acoustic phonetic, and artificial intelligence. It has been recognized that artificial intelligence is the most efficient and reliable methods, which are being used in speech recognition.

Keywords: speech recognition, acoustic phonetic, artificial intelligence, hidden markov models (HMM), statistical models of speech recognition, human machine performance

Procedia PDF Downloads 445
3985 Speech Enhancement Using Wavelet Coefficients Masking with Local Binary Patterns

Authors: Christian Arcos, Marley Vellasco, Abraham Alcaim

Abstract:

In this paper, we present a wavelet coefficients masking based on Local Binary Patterns (WLBP) approach to enhance the temporal spectra of the wavelet coefficients for speech enhancement. This technique exploits the wavelet denoising scheme, which splits the degraded speech into pyramidal subband components and extracts frequency information without losing temporal information. Speech enhancement in each high-frequency subband is performed by binary labels through the local binary pattern masking that encodes the ratio between the original value of each coefficient and the values of the neighbour coefficients. This approach enhances the high-frequency spectra of the wavelet transform instead of eliminating them through a threshold. A comparative analysis is carried out with conventional speech enhancement algorithms, demonstrating that the proposed technique achieves significant improvements in terms of PESQ, an international recommendation of objective measure for estimating subjective speech quality. Informal listening tests also show that the proposed method in an acoustic context improves the quality of speech, avoiding the annoying musical noise present in other speech enhancement techniques. Experimental results obtained with a DNN based speech recognizer in noisy environments corroborate the superiority of the proposed scheme in the robust speech recognition scenario.

Keywords: binary labels, local binary patterns, mask, wavelet coefficients, speech enhancement, speech recognition

Procedia PDF Downloads 197
3984 Application of the Bionic Wavelet Transform and Psycho-Acoustic Model for Speech Compression

Authors: Chafik Barnoussi, Mourad Talbi, Adnane Cherif

Abstract:

In this paper we propose a new speech compression system based on the application of the Bionic Wavelet Transform (BWT) combined with the psychoacoustic model. This compression system is a modified version of the compression system using a MDCT (Modified Discrete Cosine Transform) filter banks of 32 filters each and the psychoacoustic model. This modification consists in replacing the banks of the MDCT filter banks by the bionic wavelet coefficients which are obtained from the application of the BWT to the speech signal to be compressed. These two methods are evaluated and compared with each other by computing bits before and bits after compression. They are tested on different speech signals and the obtained simulation results show that the proposed technique outperforms the second technique and this in term of compressed file size. In term of SNR, PSNR and NRMSE, the outputs speech signals of the proposed compression system are with acceptable quality. In term of PESQ and speech signal intelligibility, the proposed speech compression technique permits to obtain reconstructed speech signals with good quality.

Keywords: speech compression, bionic wavelet transform, filterbanks, psychoacoustic model

Procedia PDF Downloads 355
3983 Speech Intelligibility Improvement Using Variable Level Decomposition DWT

Authors: Samba Raju, Chiluveru, Manoj Tripathy

Abstract:

Intelligibility is an essential characteristic of a speech signal, which is used to help in the understanding of information in speech signal. Background noise in the environment can deteriorate the intelligibility of a recorded speech. In this paper, we presented a simple variance subtracted - variable level discrete wavelet transform, which improve the intelligibility of speech. The proposed algorithm does not require an explicit estimation of noise, i.e., prior knowledge of the noise; hence, it is easy to implement, and it reduces the computational burden. The proposed algorithm decides a separate decomposition level for each frame based on signal dominant and dominant noise criteria. The performance of the proposed algorithm is evaluated with speech intelligibility measure (STOI), and results obtained are compared with Universal Discrete Wavelet Transform (DWT) thresholding and Minimum Mean Square Error (MMSE) methods. The experimental results revealed that the proposed scheme outperformed competing methods

Keywords: discrete wavelet transform, speech intelligibility, STOI, standard deviation

Procedia PDF Downloads 114
3982 The Language Use of Middle Eastern Freedom Activists' Speeches: A Gender Perspective

Authors: Sulistyaningtyas

Abstract:

Examining the role of Middle Eastern freedom activists’ speech based on gender perspective is considered noteworthy because the society in the Middle East is patriarchal. This research aims to examine the language use of the Middle Eastern freedom activists’ speeches through gender perspective. The data sources are from male and female Middle Eastern freedom activists’ speech videos. In analyzing the data, the theories employed are about Language Style from Gender Perspective and The Language for Speech. The result reveals that there are sets of spoken language differences between male and female speakers. In using the language for speech, both male and female speakers produce metaphor, euphemism, the ‘rule of three’, parallelism, and pronouns in random frequency of production, which cannot be separated by genders. Moreover, it cannot be concluded that one gender is more potential than the other to influence the audience in delivering speech. There are other factors, particularly non-verbal factors, existing to give impacts on how a speech can influence the audience.

Keywords: gender perspective, language use, Middle Eastern freedom activists, speech

Procedia PDF Downloads 398
3981 Considering Cultural and Linguistic Variables When Working as a Speech-Language Pathologist with Multicultural Students

Authors: Gabriela Smeckova

Abstract:

The entire world is becoming more and more diverse. The reasons why people migrate are different and unique for each family /individual. Professionals delivering services (including speech-language pathologists) must be prepared to work with clients coming from different cultural and/or linguistic backgrounds. Well-educated speech-language pathologists will consider many factors when delivering services. Some of them will be discussed during the presentation (language spoken, beliefs about health care and disabilities, reasons for immigration, etc.). The communication styles of the client can be different than the styles of the speech-language pathologist. The goal is to become culturally responsive in service delivery.

Keywords: culture, cultural competence, culturallly responsive practices, speech-language pathologist, cultural and linguistical variables, communication styles

Procedia PDF Downloads 46
3980 Effect of Noise Reduction Algorithms on Temporal Splitting of Speech Signal to Improve Speech Perception for Binaural Hearing Aids

Authors: Rajani S. Pujar, Pandurangarao N. Kulkarni

Abstract:

Increased temporal masking affects the speech perception in persons with sensorineural hearing impairment especially under adverse listening conditions. This paper presents a cascaded scheme, which employs a noise reduction algorithm as well as temporal splitting of the speech signal. Earlier investigations have shown that by splitting the speech temporally and presenting alternate segments to the two ears help in reducing the effect of temporal masking. In this technique, the speech signal is processed by two fading functions, complementary to each other, and presented to left and right ears for binaural dichotic presentation. In the present study, half cosine signal is used as a fading function with crossover gain of 6 dB for the perceptual balance of loudness. Temporal splitting is combined with noise reduction algorithm to improve speech perception in the background noise. Two noise reduction schemes, namely spectral subtraction and Wiener filter are used. Listening tests were conducted on six normal-hearing subjects, with sensorineural loss simulated by adding broadband noise to the speech signal at different signal-to-noise ratios (∞, 3, 0, and -3 dB). Objective evaluation using PESQ was also carried out. The MOS score for VCV syllable /asha/ for SNR values of ∞, 3, 0, and -3 dB were 5, 4.46, 4.4 and 4.05 respectively, while the corresponding MOS scores for unprocessed speech were 5, 1.2, 0.9 and 0.65, indicating significant improvement in the perceived speech quality for the proposed scheme compared to the unprocessed speech.

Keywords: MOS, PESQ, spectral subtraction, temporal splitting, wiener filter

Procedia PDF Downloads 303
3979 Efficacy of a Wiener Filter Based Technique for Speech Enhancement in Hearing Aids

Authors: Ajish K. Abraham

Abstract:

Hearing aid is the most fundamental technology employed towards rehabilitation of persons with sensory neural hearing impairment. Hearing in noise is still a matter of major concern for many hearing aid users and thus continues to be a challenging issue for the hearing aid designers. Several techniques are being currently used to enhance the speech at the hearing aid output. Most of these techniques, when implemented, result in reduction of intelligibility of the speech signal. Thus the dissatisfaction of the hearing aid user towards comprehending the desired speech amidst noise is prevailing. Multichannel Wiener Filter is widely implemented in binaural hearing aid technology for noise reduction. In this study, Wiener filter based noise reduction approach is experimented for a single microphone based hearing aid set up. This method checks the status of the input speech signal in each frequency band and then selects the relevant noise reduction procedure. Results showed that the Wiener filter based algorithm is capable of enhancing speech even when the input acoustic signal has a very low Signal to Noise Ratio (SNR). Performance of the algorithm was compared with other similar algorithms on the basis of improvement in intelligibility and SNR of the output, at different SNR levels of the input speech. Wiener filter based algorithm provided significant improvement in SNR and intelligibility compared to other techniques.

Keywords: hearing aid output speech, noise reduction, SNR improvement, Wiener filter, speech enhancement

Procedia PDF Downloads 225
3978 A Two-Stage Adaptation towards Automatic Speech Recognition System for Malay-Speaking Children

Authors: Mumtaz Begum Mustafa, Siti Salwah Salim, Feizal Dani Rahman

Abstract:

Recently, Automatic Speech Recognition (ASR) systems were used to assist children in language acquisition as it has the ability to detect human speech signal. Despite the benefits offered by the ASR system, there is a lack of ASR systems for Malay-speaking children. One of the contributing factors for this is the lack of continuous speech database for the target users. Though cross-lingual adaptation is a common solution for developing ASR systems for under-resourced language, it is not viable for children as there are very limited speech databases as a source model. In this research, we propose a two-stage adaptation for the development of ASR system for Malay-speaking children using a very limited database. The two stage adaptation comprises the cross-lingual adaptation (first stage) and cross-age adaptation. For the first stage, a well-known speech database that is phonetically rich and balanced, is adapted to the medium-sized Malay adults using supervised MLLR. The second stage adaptation uses the speech acoustic model generated from the first adaptation, and the target database is a small-sized database of the target users. We have measured the performance of the proposed technique using word error rate, and then compare them with the conventional benchmark adaptation. The two stage adaptation proposed in this research has better recognition accuracy as compared to the benchmark adaptation in recognizing children’s speech.

Keywords: Automatic Speech Recognition System, children speech, adaptation, Malay

Procedia PDF Downloads 366
3977 The Complaint Speech Act Set Produced by Arab Students in the UAE

Authors: Tanju Deveci

Abstract:

It appears that the speech act of complaint has not received as much attention as other speech acts. However, the face-threatening nature of this speech act requires a special attention in multicultural contexts in particular. The teaching context in the UAE universities, where a big majority of teaching staff comes from other cultures, requires investigations into this speech act in order to improve communication between students and faculty. This session will outline the results of a study conducted with this purpose. The realization of complaints by Freshman English students in Communication courses at Petroleum Institute was investigated to identify communication patterns that seem to cause a strain. Data were collected using a role-play between a teacher and students, and a judgment scale completed by two of the instructors in the Communications Department. The initial findings reveal that the students had difficulty putting their case, produced the speech act of criticism along with a complaint and that they produced both requests and demands as candidate solutions. The judgement scales revealed that the students’ attitude was not appropriate most of the time and that the judges would behave differently from students. It is concluded that speech acts, in general, and complaint, in particular, need to be taught to learners explicitly to improve interpersonal communication in multicultural societies. Some teaching ideas are provided to help increase foreign language learners’ sociolinguistic competence.

Keywords: speech act, complaint, pragmatics, sociolinguistics, language teaching

Procedia PDF Downloads 480
3976 A Comprehensive Methodology for Voice Segmentation of Large Sets of Speech Files Recorded in Naturalistic Environments

Authors: Ana Londral, Burcu Demiray, Marcus Cheetham

Abstract:

Speech recording is a methodology used in many different studies related to cognitive and behaviour research. Modern advances in digital equipment brought the possibility of continuously recording hours of speech in naturalistic environments and building rich sets of sound files. Speech analysis can then extract from these files multiple features for different scopes of research in Language and Communication. However, tools for analysing a large set of sound files and automatically extract relevant features from these files are often inaccessible to researchers that are not familiar with programming languages. Manual analysis is a common alternative, with a high time and efficiency cost. In the analysis of long sound files, the first step is the voice segmentation, i.e. to detect and label segments containing speech. We present a comprehensive methodology aiming to support researchers on voice segmentation, as the first step for data analysis of a big set of sound files. Praat, an open source software, is suggested as a tool to run a voice detection algorithm, label segments and files and extract other quantitative features on a structure of folders containing a large number of sound files. We present the validation of our methodology with a set of 5000 sound files that were collected in the daily life of a group of voluntary participants with age over 65. A smartphone device was used to collect sound using the Electronically Activated Recorder (EAR): an app programmed to record 30-second sound samples that were randomly distributed throughout the day. Results demonstrated that automatic segmentation and labelling of files containing speech segments was 74% faster when compared to a manual analysis performed with two independent coders. Furthermore, the methodology presented allows manual adjustments of voiced segments with visualisation of the sound signal and the automatic extraction of quantitative information on speech. In conclusion, we propose a comprehensive methodology for voice segmentation, to be used by researchers that have to work with large sets of sound files and are not familiar with programming tools.

Keywords: automatic speech analysis, behavior analysis, naturalistic environments, voice segmentation

Procedia PDF Downloads 258
3975 On Overcoming Common Oral Speech Problems through Authentic Films

Authors: Tamara Matevosyan

Abstract:

The present paper discusses the main problems that students face while developing oral skills through authentic films. It states that special attention should be paid not only to the study of verbal speech but also to non-verbal communication. Authentic films serve as an important tool to understand both native speaker’s gestures and their culture of pausing while speaking. Various phonetic difficulties causing phonetic interference in actual speech are covered in the paper emphasizing the role of authentic films in overcoming them.

Keywords: compressive speech, filled pauses, unfilled pauses, pausing culture

Procedia PDF Downloads 318
3974 Morpheme Based Parts of Speech Tagger for Kannada Language

Authors: M. C. Padma, R. J. Prathibha

Abstract:

Parts of speech tagging is the process of assigning appropriate parts of speech tags to the words in a given text. The critical or crucial information needed for tagging a word come from its internal structure rather from its neighboring words. The internal structure of a word comprises of its morphological features and grammatical information. This paper presents a morpheme based parts of speech tagger for Kannada language. This proposed work uses hierarchical tag set for assigning tags. The system is tested on some Kannada words taken from EMILLE corpus. Experimental result shows that the performance of the proposed system is above 90%.

Keywords: hierarchical tag set, morphological analyzer, natural language processing, paradigms, parts of speech

Procedia PDF Downloads 263
3973 The Convolution Recurrent Network of Using Residual LSTM to Process the Output of the Downsampling for Monaural Speech Enhancement

Authors: Shibo Wei, Ting Jiang

Abstract:

Convolutional-recurrent neural networks (CRN) have achieved much success recently in the speech enhancement field. The common processing method is to use the convolution layer to compress the feature space by multiple upsampling and then model the compressed features with the LSTM layer. At last, the enhanced speech is obtained by deconvolution operation to integrate the global information of the speech sequence. However, the feature space compression process may cause the loss of information, so we propose to model the upsampling result of each step with the residual LSTM layer, then join it with the output of the deconvolution layer and input them to the next deconvolution layer, by this way, we want to integrate the global information of speech sequence better. The experimental results show the network model (RES-CRN) we introduce can achieve better performance than LSTM without residual and overlaying LSTM simply in the original CRN in terms of scale-invariant signal-to-distortion ratio (SI-SNR), speech quality (PESQ), and intelligibility (STOI).

Keywords: convolutional-recurrent neural networks, speech enhancement, residual LSTM, SI-SNR

Procedia PDF Downloads 171
3972 Exploring Pre-Trained Automatic Speech Recognition Model HuBERT for Early Alzheimer’s Disease and Mild Cognitive Impairment Detection in Speech

Authors: Monica Gonzalez Machorro

Abstract:

Dementia is hard to diagnose because of the lack of early physical symptoms. Early dementia recognition is key to improving the living condition of patients. Speech technology is considered a valuable biomarker for this challenge. Recent works have utilized conventional acoustic features and machine learning methods to detect dementia in speech. BERT-like classifiers have reported the most promising performance. One constraint, nonetheless, is that these studies are either based on human transcripts or on transcripts produced by automatic speech recognition (ASR) systems. This research contribution is to explore a method that does not require transcriptions to detect early Alzheimer’s disease (AD) and mild cognitive impairment (MCI). This is achieved by fine-tuning a pre-trained ASR model for the downstream early AD and MCI tasks. To do so, a subset of the thoroughly studied Pitt Corpus is customized. The subset is balanced for class, age, and gender. Data processing also involves cropping the samples into 10-second segments. For comparison purposes, a baseline model is defined by training and testing a Random Forest with 20 extracted acoustic features using the librosa library implemented in Python. These are: zero-crossing rate, MFCCs, spectral bandwidth, spectral centroid, root mean square, and short-time Fourier transform. The baseline model achieved a 58% accuracy. To fine-tune HuBERT as a classifier, an average pooling strategy is employed to merge the 3D representations from audio into 2D representations, and a linear layer is added. The pre-trained model used is ‘hubert-large-ls960-ft’. Empirically, the number of epochs selected is 5, and the batch size defined is 1. Experiments show that our proposed method reaches a 69% balanced accuracy. This suggests that the linguistic and speech information encoded in the self-supervised ASR-based model is able to learn acoustic cues of AD and MCI.

Keywords: automatic speech recognition, early Alzheimer’s recognition, mild cognitive impairment, speech impairment

Procedia PDF Downloads 98
3971 Developing an Intonation Labeled Dataset for Hindi

Authors: Esha Banerjee, Atul Kumar Ojha, Girish Nath Jha

Abstract:

This study aims to develop an intonation labeled database for Hindi. Although no single standard for prosody labeling exists in Hindi, researchers in the past have employed perceptual and statistical methods in literature to draw inferences about the behavior of prosody patterns in Hindi. Based on such existing research and largely agreed upon intonational theories in Hindi, this study attempts to develop a manually annotated prosodic corpus of Hindi speech data, which can be used for training speech models for natural-sounding speech in the future. 100 sentences ( 500 words) each for declarative and interrogative types have been labeled using Praat.

Keywords: speech dataset, Hindi, intonation, labeled corpus

Procedia PDF Downloads 160
3970 Distant Speech Recognition Using Laser Doppler Vibrometer

Authors: Yunbin Deng

Abstract:

Most existing applications of automatic speech recognition relies on cooperative subjects at a short distance to a microphone. Standoff speech recognition using microphone arrays can extend the subject to sensor distance somewhat, but it is still limited to only a few feet. As such, most deployed applications of standoff speech recognitions are limited to indoor use at short range. Moreover, these applications require air passway between the subject and the sensor to achieve reasonable signal to noise ratio. This study reports long range (50 feet) automatic speech recognition experiments using a Laser Doppler Vibrometer (LDV) sensor. This study shows that the LDV sensor modality can extend the speech acquisition standoff distance far beyond microphone arrays to hundreds of feet. In addition, LDV enables 'listening' through the windows for uncooperative subjects. This enables new capabilities in automatic audio and speech intelligence, surveillance, and reconnaissance (ISR) for law enforcement, homeland security and counter terrorism applications. The Polytec LDV model OFV-505 is used in this study. To investigate the impact of different vibrating materials, five parallel LDV speech corpora, each consisting of 630 speakers, are collected from the vibrations of a glass window, a metal plate, a plastic box, a wood slate, and a concrete wall. These are the common materials the application could encounter in a daily life. These data were compared with the microphone counterpart to manifest the impact of various materials on the spectrum of the LDV speech signal. State of the art deep neural network modeling approaches is used to conduct continuous speaker independent speech recognition on these LDV speech datasets. Preliminary phoneme recognition results using time-delay neural network, bi-directional long short term memory, and model fusion shows great promise of using LDV for long range speech recognition. To author’s best knowledge, this is the first time an LDV is reported for long distance speech recognition application.

Keywords: covert speech acquisition, distant speech recognition, DSR, laser Doppler vibrometer, LDV, speech intelligence surveillance and reconnaissance, ISR

Procedia PDF Downloads 150
3969 The Philippines’ War on Drugs: a Pragmatic Analysis on Duterte's Commemorative Speeches

Authors: Ericson O. Alieto, Aprillete C. Devanadera

Abstract:

The main objective of the study is to determine the dominant speech acts in five commemorative speeches of President Duterte. This study employed Speech Act Theory and Discourse analysis to determine how the speech acts features connote the pragmatic meaning of Duterte’s speeches. Identifying the speech acts is significant in elucidating the underlying message or the pragmatic meaning of the speeches. From the 713 sentences or utterances from the speeches, assertive with 208 occurrences from the corpus or 29% is the dominant speech acts. It was followed by expressive with 177 or 25% occurrences, directive accounts for 152 or 15% occurrences. While commisive accounts for 104 or 15% occurrences and declarative got the lowest percentage of occurrences with 72 or 10% only. These sentences when uttered by Duterte carry a certain power of language to move or influence people. Thus, the present study shows the fundamental message perceived by the listeners. Moreover, the frequent use of assertive and expressive not only explains the pragmatic message of the speeches but also reflects the personality of President Duterte.

Keywords: commemorative speech, discourse analysis, duterte, pragmatics

Procedia PDF Downloads 254
3968 Excitation Modeling for Hidden Markov Model-Based Speech Synthesis Based on Wavelet Analysis

Authors: M. Kiran Reddy, K. Sreenivasa Rao

Abstract:

The conventional Hidden Markov Model (HMM)-based speech synthesis system (HTS) uses only a pulse excitation model, which significantly differs from natural excitation signal. Hence, buzziness can be perceived in the speech generated using HTS. This paper proposes an efficient excitation modeling method that can significantly reduce the buzziness, and improve the quality of HMM-based speech synthesis. The proposed approach models the pitch-synchronous residual frames extracted from the residual excitation signal. Each pitch synchronous residual frame is parameterized using 30 wavelet coefficients. These 30 wavelet coefficients are found to accurately capture the perceptually important information present in the residual waveform. In synthesis phase, the residual frames are reconstructed from the generated wavelet coefficients and are pitch-synchronously overlap-added to generate the excitation signal. The proposed excitation modeling method is integrated into HMM-based speech synthesis system. Evaluation results indicate that the speech synthesized by the proposed excitation model is significantly better than the speech generated using state-of-the-art excitation modeling methods.

Keywords: excitation modeling, hidden Markov models, pitch-synchronous frames, speech synthesis, wavelet coefficients

Procedia PDF Downloads 220
3967 Theory and Practice of Wavelets in Signal Processing

Authors: Jalal Karam

Abstract:

The methods of Fourier, Laplace, and Wavelet Transforms provide transfer functions and relationships between the input and the output signals in linear time invariant systems. This paper shows the equivalence among these three methods and in each case presenting an application of the appropriate (Fourier, Laplace or Wavelet) to the convolution theorem. In addition, it is shown that the same holds for a direct integration method. The Biorthogonal wavelets Bior3.5 and Bior3.9 are examined and the zeros distribution of their polynomials associated filters are located. This paper also presents the significance of utilizing wavelets as effective tools in processing speech signals for common multimedia applications in general, and for recognition and compression in particular. Theoretically and practically, wavelets have proved to be effective and competitive. The practical use of the Continuous Wavelet Transform (CWT) in processing and analysis of speech is then presented along with explanations of how the human ear can be thought of as a natural wavelet transformer of speech. This generates a variety of approaches for applying the (CWT) to many paradigms analysing speech, sound and music. For perception, the flexibility of implementation of this transform allows the construction of numerous scales and we include two of them. Results for speech recognition and speech compression are then included.

Keywords: continuous wavelet transform, biorthogonal wavelets, speech perception, recognition and compression

Procedia PDF Downloads 379