Search results for: speaker individuality
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 49

Search results for: speaker individuality

19 Performance Evaluation of Acoustic-Spectrographic Voice Identification Method in Native and Non-Native Speech

Authors: E. Krasnova, E. Bulgakova, V. Shchemelinin

Abstract:

The paper deals with acoustic-spectrographic voice identification method in terms of its performance in non-native language speech. Performance evaluation is conducted by comparing the result of the analysis of recordings containing native language speech with recordings that contain foreign language speech. Our research is based on Tajik and Russian speech of Tajik native speakers due to the character of the criminal situation with drug trafficking. We propose a pilot experiment that represents a primary attempt enter the field.

Keywords: Speaker identification, acoustic-spectrographic method, non-native speech.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 820
18 Automatic Recognition of Emotionally Coloured Speech

Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou

Abstract:

Emotion in speech is an issue that has been attracting the interest of the speech community for many years, both in the context of speech synthesis as well as in automatic speech recognition (ASR). In spite of the remarkable recent progress in Large Vocabulary Recognition (LVR), it is still far behind the ultimate goal of recognising free conversational speech uttered by any speaker in any environment. Current experimental tests prove that using state of the art large vocabulary recognition systems the error rate increases substantially when applied to spontaneous/emotional speech. This paper shows that recognition rate for emotionally coloured speech can be improved by using a language model based on increased representation of emotional utterances.

Keywords: Statistical language model, N-grams, emotionallycoloured speech

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1557
17 On Preprocessing of Speech Signals

Authors: Ayaz Keerio, Bhargav Kumar Mitra, Philip Birch, Rupert Young, Chris Chatwin

Abstract:

Preprocessing of speech signals is considered a crucial step in the development of a robust and efficient speech or speaker recognition system. In this paper, we present some popular statistical outlier-detection based strategies to segregate the silence/unvoiced part of the speech signal from the voiced portion. The proposed methods are based on the utilization of the 3 σ edit rule, and the Hampel Identifier which are compared with the conventional techniques: (i) short-time energy (STE) based methods, and (ii) distribution based methods. The results obtained after applying the proposed strategies on some test voice signals are encouraging.

Keywords: STE based methods, Mahalanobis distance, 3 edit σ rule, Hampel Identifier.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1649
16 Automatic Segmentation of the Clean Speech Signal

Authors: M. A. Ben Messaoud, A. Bouzid, N. Ellouze

Abstract:

Speech Segmentation is the measure of the change point detection for partitioning an input speech signal into regions each of which accords to only one speaker. In this paper, we apply two features based on multi-scale product (MP) of the clean speech, namely the spectral centroid of MP, and the zero crossings rate of MP. We focus on multi-scale product analysis as an important tool for segmentation extraction. The MP is based on making the product of the speech wavelet transform coefficients (WTC). We have estimated our method on the Keele database. The results show the effectiveness of our method. It indicates that the two features can find word boundaries, and extracted the segments of the clean speech.

Keywords: Speech segmentation, Multi-scale product, Spectral centroid, Zero crossings rate.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2454
15 Stop Consonants in Chinese and Slovak: Contrastive Analysis by Using Praat

Authors: Maria Istvanova

Abstract:

The acquisition of the correct pronunciation in Chinese is closely linked to the initial phase of the study. Based on the contrastive analysis, we determine the differences in the pronunciation of stop consonants in Chinese and Slovak taking into consideration the place and manner of articulation to gain a better understanding of the students' main difficulties in the process of acquiring correct pronunciation of Chinese stop consonants. We employ the software Praat for the analysis of the recorded samples with an emphasis on the pronunciation of the students with a varying command of Chinese. The comparison of the voice onset time (VOT) length for the individual consonants in the students' pronunciation and the pronunciation of the native speaker exposes the differences between the correct pronunciation and the deviant pronunciation of the students.

Keywords: Chinese, contrastive analysis, Praat, pronunciation, Slovak.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 436
14 Blind Speech Separation Using SRP-PHAT Localization and Optimal Beamformer in Two-Speaker Environments

Authors: Hai Quang Hong Dam, Hai Ho, Minh Hoang Le Ngo

Abstract:

This paper investigates the problem of blind speech separation from the speech mixture of two speakers. A voice activity detector employing the Steered Response Power - Phase Transform (SRP-PHAT) is presented for detecting the activity information of speech sources and then the desired speech signals are extracted from the speech mixture by using an optimal beamformer. For evaluation, the algorithm effectiveness, a simulation using real speech recordings had been performed in a double-talk situation where two speakers are active all the time. Evaluations show that the proposed blind speech separation algorithm offers a good interference suppression level whilst maintaining a low distortion level of the desired signal.

Keywords: Blind speech separation, voice activity detector, SRP-PHAT, optimal beamformer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1336
13 Comparative Study of Affricate Initial Consonants in Chinese and Slovak

Authors: Maria Istvanova

Abstract:

The purpose of the comparative study of the affricate consonants in Chinese and Slovak is to increase the awareness of the main distinguishing features between these two languages taking into consideration this particular group of consonants. We determine the main difficulties of the Slovak learners in the process of acquiring correct pronunciation of affricate initial consonants in Chinese based on the understanding of the distinguishing features of Chinese and Slovak affricates in combination with the experimental measuring of voice onset time (VOT) values. The software tool Praat is used for the analysis of the recorded language samples. The language samples contain recordings of a Chinese native speaker and Slovak students of Chinese with different language proficiency levels. Based on the results of the analysis in Praat, we identify erroneous pronunciation and provide clarification of its cause.

Keywords: Chinese, comparative study, initial consonants, pronunciation, Slovak

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 394
12 Comparative Study of Filter Characteristics as Statistical Vocal Correlates of Clinical Psychiatric State in Human

Authors: Thaweesak Yingthawornsuk, Chusak Thanawattano

Abstract:

Acoustical properties of speech have been shown to be related to mental states of speaker with symptoms: depression and remission. This paper describes way to address the issue of distinguishing depressed patients from remitted subjects based on measureable acoustics change of their spoken sound. The vocal-tract related frequency characteristics of speech samples from female remitted and depressed patients were analyzed via speech processing techniques and consequently, evaluated statistically by cross-validation with Support Vector Machine. Our results comparatively show the classifier's performance with effectively correct separation of 93% determined from testing with the subjectbased feature model and 88% from the frame-based model based on the same speech samples collected from hospital visiting interview sessions between patients and psychiatrists.

Keywords: Depression, SVM, Vocal Extract, Vocal Tract

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1500
11 Identification of Ice Hockey World Championship International Sports Event through Brand Personality

Authors: Eva Čáslavová, Andrej Višněvský

Abstract:

This research focused on the dimensions of brand personality of the Ice Hockey World Championship sporting event. The authors compared the elements in relation to different demographic groups including gender, age, level of education and student status of the population of Prague. Moreover, the differences of opinions of respondents who had experience of visiting a sports event and those who had not were assessed. In the research, the modified brand personality scale was used. This modified scale consists of five dimensions: responsibility, activity, toughness, individuality and emotionality, none of which was previously tested. The authors had an intentional sample of 291 respondents from Prague available, ranging in age from 18 years to 75 years, with either a high school or university education. The respondents rated the characteristic features in a seven-point Likert Scale and the data was collected in November 2012. The results suggest that the Ice Hockey World Championship is most identified with these dimensions: responsibility, emotionality and activity. Men had higher mean scores (4.93) on the Likert Scale in the emotionality dimension, while women had higher mean scores (4.91) in the activity dimension. Those respondents with experience visiting an Ice Hockey World Championship match had the highest mean score (5.10) in the emotionality dimension. This research had expected to show more pronounced mean values (above six) on the Likert scale in the emotionality and activity dimensions that more strongly characterize the brand personality of the Ice Hockey World Championship, however this expectation was not confirmed.

Keywords: Brand personality dimensions, ice hockey, international sport event, sports marketing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1210
10 A Cross-Gender Statistical Analysis of Tuvinian Intonation Features in Comparison With Uzbek and Azerbaijani

Authors: D. Beziakina, E. Bulgakova

Abstract:

The paper deals with cross-gender and cross-linguistic comparison of pitch characteristics for Tuvinian with two other Turkic languages - Uzbek and Azerbaijani, based on the results of statistical analysis of pitch parameter values and intonation patterns used by male and female speakers.

The main goal of our work is to obtain the ranges of pitch parameter values typical for Tuvinian speakers for the purpose of automatic language identification. We also propose a cross-gender analysis of declarative intonation in the poorly studied Tuvinian language.

The ranges of pitch parameter values were obtained by means of specially developed software that deals with the distribution of pitch values and allows us to obtain statistical language-specific pitch intervals.

Keywords: Speech analysis, Statistical analysis, Speaker recognition, Identification of person.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1806
9 Automatic Lip Contour Tracking and Visual Character Recognition for Computerized Lip Reading

Authors: Harshit Mehrotra, Gaurav Agrawal, M.C. Srivastava

Abstract:

Computerized lip reading has been one of the most actively researched areas of computer vision in recent past because of its crime fighting potential and invariance to acoustic environment. However, several factors like fast speech, bad pronunciation, poor illumination, movement of face, moustaches and beards make lip reading difficult. In present work, we propose a solution for automatic lip contour tracking and recognizing letters of English language spoken by speakers using the information available from lip movements. Level set method is used for tracking lip contour using a contour velocity model and a feature vector of lip movements is then obtained. Character recognition is performed using modified k nearest neighbor algorithm which assigns more weight to nearer neighbors. The proposed system has been found to have accuracy of 73.3% for character recognition with speaker lip movements as the only input and without using any speech recognition system in parallel. The approach used in this work is found to significantly solve the purpose of lip reading when size of database is small.

Keywords: Contour Velocity Model, Lip Contour Tracking, LipReading, Visual Character Recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2346
8 Efficient System for Speech Recognition using General Regression Neural Network

Authors: Abderrahmane Amrouche, Jean Michel Rouvaen

Abstract:

In this paper we present an efficient system for independent speaker speech recognition based on neural network approach. The proposed architecture comprises two phases: a preprocessing phase which consists in segmental normalization and features extraction and a classification phase which uses neural networks based on nonparametric density estimation namely the general regression neural network (GRNN). The relative performances of the proposed model are compared to the similar recognition systems based on the Multilayer Perceptron (MLP), the Recurrent Neural Network (RNN) and the well known Discrete Hidden Markov Model (HMM-VQ) that we have achieved also. Experimental results obtained with Arabic digits have shown that the use of nonparametric density estimation with an appropriate smoothing factor (spread) improves the generalization power of the neural network. The word error rate (WER) is reduced significantly over the baseline HMM method. GRNN computation is a successful alternative to the other neural network and DHMM.

Keywords: Speech Recognition, General Regression NeuralNetwork, Hidden Markov Model, Recurrent Neural Network, ArabicDigits.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2130
7 Trusting Smart Speakers: Analysing the Different Levels of Trust between Technologies

Authors: Alec Wells, Aminu Bello Usman, Justin McKeown

Abstract:

The growing usage of smart speakers raises many privacy and trust concerns compared to other technologies such as smart phones and computers. In this study, a proxy measure of trust is used to gauge users’ opinions on three different technologies based on an empirical study, and to understand which technology most people are most likely to trust. The collected data were analysed using the Kruskal-Wallis H test to determine the statistical differences between the users’ trust level of the three technologies: smart speaker, computer and smart phone. The findings of the study revealed that despite the wide acceptance, ease of use and reputation of smart speakers, people find it difficult to trust smart speakers with their sensitive information via the Direct Voice Input (DVI) and would prefer to use a keyboard or touchscreen offered by computers and smart phones. Findings from this study can inform future work on users’ trust in technology based on perceived ease of use, reputation, perceived credibility and risk of using technologies via DVI.

Keywords: Direct voice input, risk, security, technology and trust.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 525
6 Speech Recognition Using Scaly Neural Networks

Authors: Akram M. Othman, May H. Riadh

Abstract:

This research work is aimed at speech recognition using scaly neural networks. A small vocabulary of 11 words were established first, these words are “word, file, open, print, exit, edit, cut, copy, paste, doc1, doc2". These chosen words involved with executing some computer functions such as opening a file, print certain text document, cutting, copying, pasting, editing and exit. It introduced to the computer then subjected to feature extraction process using LPC (linear prediction coefficients). These features are used as input to an artificial neural network in speaker dependent mode. Half of the words are used for training the artificial neural network and the other half are used for testing the system; those are used for information retrieval. The system components are consist of three parts, speech processing and feature extraction, training and testing by using neural networks and information retrieval. The retrieve process proved to be 79.5-88% successful, which is quite acceptable, considering the variation to surrounding, state of the person, and the microphone type.

Keywords: Feature extraction, Liner prediction coefficients, neural network, Speech Recognition, Scaly ANN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1691
5 Evaluation of Features Extraction Algorithms for a Real-Time Isolated Word Recognition System

Authors: Tomyslav Sledevič, Artūras Serackis, Gintautas Tamulevičius, Dalius Navakauskas

Abstract:

Paper presents an comparative evaluation of features extraction algorithm for a real-time isolated word recognition system based on FPGA. The Mel-frequency cepstral, linear frequency cepstral, linear predictive and their cepstral coefficients were implemented in hardware/software design. The proposed system was investigated in speaker dependent mode for 100 different Lithuanian words. The robustness of features extraction algorithms was tested recognizing the speech records at different signal to noise rates. The experiments on clean records show highest accuracy for Mel-frequency cepstral and linear frequency cepstral coefficients. For records with 15 dB signal to noise rate the linear predictive cepstral coefficients gives best result. The hard and soft part of the system is clocked on 50 MHz and 100 MHz accordingly. For the classification purpose the pipelined dynamic time warping core was implemented. The proposed word recognition system satisfy the real-time requirements and is suitable for applications in embedded systems.

Keywords: Isolated word recognition, features extraction, MFCC, LFCC, LPCC, LPC, FPGA, DTW.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3499
4 Multimodal Biometric System Based on Near- Infra-Red Dorsal Hand Geometry and Fingerprints for Single and Whole Hands

Authors: Mohamed K. Shahin, Ahmed M. Badawi, Mohamed E. M. Rasmy

Abstract:

Prior research evidenced that unimodal biometric systems have several tradeoffs like noisy data, intra-class variations, restricted degrees of freedom, non-universality, spoof attacks, and unacceptable error rates. In order for the biometric system to be more secure and to provide high performance accuracy, more than one form of biometrics are required. Hence, the need arise for multimodal biometrics using combinations of different biometric modalities. This paper introduces a multimodal biometric system (MMBS) based on fusion of whole dorsal hand geometry and fingerprints that acquires right and left (Rt/Lt) near-infra-red (NIR) dorsal hand geometry (HG) shape and (Rt/Lt) index and ring fingerprints (FP). Database of 100 volunteers were acquired using the designed prototype. The acquired images were found to have good quality for all features and patterns extraction to all modalities. HG features based on the hand shape anatomical landmarks were extracted. Robust and fast algorithms for FP minutia points feature extraction and matching were used. Feature vectors that belong to similar biometric traits were fused using feature fusion methodologies. Scores obtained from different biometric trait matchers were fused using the Min-Max transformation-based score fusion technique. Final normalized scores were merged using the sum of scores method to obtain a single decision about the personal identity based on multiple independent sources. High individuality of the fused traits and user acceptability of the designed system along with its experimental high performance biometric measures showed that this MMBS can be considered for med-high security levels biometric identification purposes.

Keywords: Unimodal, Multi-Modal, Biometric System, NIR Imaging, Dorsal Hand Geometry, Fingerprint, Whole Hands, Feature Extraction, Feature Fusion, Score Fusion

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2165
3 Continuous Feature Adaptation for Non-Native Speech Recognition

Authors: Y. Deng, X. Li, C. Kwan, B. Raj, R. Stern

Abstract:

The current speech interfaces in many military applications may be adequate for native speakers. However, the recognition rate drops quite a lot for non-native speakers (people with foreign accents). This is mainly because the nonnative speakers have large temporal and intra-phoneme variations when they pronounce the same words. This problem is also complicated by the presence of large environmental noise such as tank noise, helicopter noise, etc. In this paper, we proposed a novel continuous acoustic feature adaptation algorithm for on-line accent and environmental adaptation. Implemented by incremental singular value decomposition (SVD), the algorithm captures local acoustic variation and runs in real-time. This feature-based adaptation method is then integrated with conventional model-based maximum likelihood linear regression (MLLR) algorithm. Extensive experiments have been performed on the NATO non-native speech corpus with baseline acoustic model trained on native American English. The proposed feature-based adaptation algorithm improved the average recognition accuracy by 15%, while the MLLR model based adaptation achieved 11% improvement. The corresponding word error rate (WER) reduction was 25.8% and 2.73%, as compared to that without adaptation. The combined adaptation achieved overall recognition accuracy improvement of 29.5%, and WER reduction of 31.8%, as compared to that without adaptation.

Keywords: speaker adaptation; environment adaptation; robust speech recognition; SVD; non-native speech recognition

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3169
2 Child Sexual Abuse Prevention: Evaluation of the Program “Sharing Mouth to Mouth: My Body, Nobody Can Touch It”

Authors: Faride Peña, Teresita Castillo, Concepción Campo

Abstract:

Sexual violence, and particularly child sexual abuse, is a serious problem all over the world, México included. Given its importance, there are several preventive and care programs done by the government and the civil society all over the country but most of them are developed in urban areas even though these problems are especially serious in rural areas. Yucatán, a state in southern México, occupies one of the first places in child sexual abuse. Considering the above, the University Unit of Clinical Research and Victimological Attention (UNIVICT) of the Autonomous University of Yucatan, designed, implemented and is currently evaluating the program named “Sharing Mouth to Mouth: My Body, Nobody Can Touch It”, a program to prevent child sexual abuse in rural communities of Yucatán, México. Its aim was to develop skills for the detection of risk situations, providing protection strategies and mechanisms for prevention through culturally relevant psycho-educative strategies to increase personal resources in children, in collaboration with parents, teachers, police and municipal authorities. The diagnosis identified that a particularly vulnerable population were children between 4 and 10 years. The program run during 2015 in primary schools in the municipality whose inhabitants are mostly Mayan. The aim of this paper is to present its evaluation in terms of its effectiveness and efficiency. This evaluation included documental analysis of the work done in the field, psycho-educational and recreational activities with children, evaluation of knowledge by participating children and interviews with parents and teachers. The results show high efficiency in fulfilling the tasks and achieving primary objectives. The efficiency shows satisfactory results but also opportunity areas that can be resolved with minor adjustments to the program. The results also show the importance of including culturally relevant strategies and activities otherwise it minimizes possible achievements. Another highlight is the importance of participatory action research in preventive approaches to child sexual abuse since by becoming aware of the importance of the subject people participate more actively; in addition to design culturally appropriate strategies and measures so that the proposal may not be distant to the people. Discussion emphasizes the methodological implications of prevention programs (convenience of using participatory action research (PAR), importance of monitoring and mediation during implementation, developing detection skills tools in creative ways using psycho-educational interactive techniques and working assessment issued by the participants themselves). As well, it is important to consider the holistic character this type of program should have, in terms of incorporating social and culturally relevant characteristics, according to the community individuality and uniqueness, consider type of communication to be used and children’ language skills considering that there should be variations strongly linked to a specific cultural context.

Keywords: Child sexual abuse, evaluation, PAR, prevention.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1202
1 Teaching Turn-Taking Rules and Pragmatic Principles to Empower EFL Students and Enhance Their Learning in Speaking Modules

Authors: O. F. Elkommos

Abstract:

Teaching and learning EFL speaking modules is one of the most challenging productive modules for both instructors and learners. In a student-centered interactive communicative language teaching approach, learners and instructors should be aware of the fact that the target language must be taught as/for communication. The student must be empowered by tools that will work on more than one level of their communicative competence. Communicative learning will need a teaching and learning methodology that will address the goal. Teaching turn-taking rules, pragmatic principles and speech acts will enhance students' sociolinguistic competence, strategic competence together with discourse competence. Sociolinguistic competence entails the mastering of speech act conventions and illocutionary acts of refusing, agreeing/disagreeing; emotive acts like, thanking, apologizing, inviting, offering; directives like, ordering, requesting, advising, and hinting, among others. Strategic competence includes enlightening students’ consciousness of the various particular turn-taking systemic rules of organizing techniques of opening and closing conversation, adjacency pairs, interrupting, back-channeling, asking for/giving opinion, agreeing/disagreeing, using natural fillers for pauses, gaps, speaker select, self-select, and silence among others. Students will have the tools to manage a conversation. Students are engaged in opportunities of experiencing the natural language not as a mere extra student talking time but rather an empowerment of knowing and using the strategies. They will have the component items they need to use as well as the opportunity to communicate in the target language using topics of their interest and choice. This enhances students' communicative abilities. Available websites and textbooks now use one or more of these tools of turn-taking or pragmatics. These will be students' support in self-study in their independent learning study hours. This will be their reinforcement practice on e-Learning interactive activities. The students' target is to be able to communicate the intended meaning to an addressee that is in turn able to infer that intended meaning. The combination of these tools will be assertive and encouraging to the student to beat the struggle with what to say, how to say it, and when to say it. Teaching the rules, principles and techniques is an act of awareness raising method engaging students in activities that will lead to their pragmatic discourse competence. The aim of the paper is to show how the suggested pragmatic model will empower students with tools and systems that would support their learning. Supporting students with turn taking rules, speech act theory, applying both to texts and practical analysis and using it in speaking classes empowers students’ pragmatic discourse competence and assists them to understand language and its context. They become more spontaneous and ready to learn the discourse pragmatic dimension of the speaking techniques and suitable content. Students showed a better performance and a good motivation to learn. The model is therefore suggested for speaking modules in EFL classes.

Keywords: Communicative competence, EFL, empowering learners, enhance learning, speech acts, teaching speaking, turn-taking, learner centered, pragmatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1321