Search results for: audio-visual speech recognition
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2252

Search results for: audio-visual speech recognition

2162 Speech Acts and Politeness Strategies in an EFL Classroom in Georgia

Authors: Tinatin Kurdghelashvili

Abstract:

The paper deals with the usage of speech acts and politeness strategies in an EFL classroom in Georgia (Rep of). It explores the students’ and the teachers’ practice of the politeness strategies and the speech acts of apology, thanking, request, compliment/encouragement, command, agreeing/disagreeing, addressing and code switching. The research method includes observation as well as a questionnaire. The target group involves the students from Georgian public schools and two certified, experienced local English teachers. The analysis is based on Searle’s Speech Act Theory and Brown and Levinson’s politeness strategies. The findings show that the students have certain knowledge regarding politeness yet they fail to apply them in English communication. In addition, most of the speech acts from the classroom interaction are used by the teachers and not the students. Thereby, it is suggested that teachers should cultivate the students’ communicative competence and attempt to give them opportunities to practice more English speech acts than they do today.

Keywords: english as a foreign language, Georgia, politeness principles, speech acts

Procedia PDF Downloads 608
2161 The Influence of Advertising Captions on the Internet through the Consumer Purchasing Decision

Authors: Suwimol Apapol, Punrapha Praditpong

Abstract:

The objectives of the study were to find out the frequencies of figures of speech in fragrance advertising captions as well as the types of figures of speech most commonly applied in captions. The relation between figures of speech and fragrance was also examined in order to analyze how figures of speech were used to represent fragrance. Thirty-five fragrance advertisements were randomly selected from the Internet. Content analysis was applied in order to consider the relation between figures of speech and fragrance. The results showed that figures of speech were found in almost every fragrance advertisement except one advertisement of several Goods service. Thirty-four fragrance advertising captions used at least one kind of figure of speech. Metaphor was most frequently found and also most frequently applied in fragrance advertising captions, followed by alliteration, rhyme, simile and personification, and hyperbole respectively which is in harmony with the research hypotheses as well.

Keywords: advertising captions, captions on internet, consumer purchasing decision, e-commerce

Procedia PDF Downloads 245
2160 The Stylistic Representation of Subjectivity in Exemplary Written and Audiovisual Biographical Records about the Brazilian Modernist Artist Tarsila Do Amaral

Authors: Juliane Noack Napoles, Vivian Martins Nogueira Napoles

Abstract:

This paper operates at the stylistic level of biographical records about the artist Tarsila do Amaral (1886-1973) and the various biographical modes of representation of her subjectivity. Tarsila do Amaral was a Brazilian nationalistic painter, who took part in the first half of the last century in the Antoprofágico Moviment and in the Surrealistic Movement - artistic movements that emerged in the 1920’s. The paper will be developed in the field of Cultural and Media Science and based on an understanding of biography as a subgenre of historical records that will be discussed. Doing that, the theoretical principles about the history genre will also be discussed. In this context, the analytical focus of the present project is the stylistic forms of representation of subjectivity in the postmodern period as expressed in written and audiovisual biographical representation of Tarsila do Amaral. Some exemplary audiovisual biographical records about Tarsila do Amaral will be first analyzed on their own. Then, they will be related to some written biographical records about the painter. At the end, both written and audiovisual records and their stylistic forms of representation of Tarsila do Amaral’s subjectivity are going be analyzed. Tarsila do Amaral will be considered as a Subject Form, following actual concepts about this term in Cultural Studies. For these purposes, it will also be discussed about cultural identity – gender and national identity – and developed a heuristic model so that different understandings and conceptual proposals correlate, including those pertaining to the terms biography, gender, identity, mediality, style, subject and subjectivity. This model will finally be used for the analysis of the selected biographical records.

Keywords: biography, gender, identity, modernism, postmodernism, style, subject, subjectivity, surrealism, Tarsila do Amaral

Procedia PDF Downloads 147
2159 Using Augmented Reality to Enhance Doctor Patient Communication

Authors: Rutusha Bhutada, Gaurav Chavan, Sarvesh Kasat, Varsha Mujumdar

Abstract:

This software system will be an Augmented Reality application designed to maximize the doctor’s productivity by providing tools to assist in automating the patient recognition and updating patient’s records using face and voice recognition features, which would otherwise have to be performed manually. By maximizing the doctor’s work efficiency and production, the application will meet the doctor’s needs while remaining easy to understand and use. More specifically, this application is designed to allow a doctor to manage his productive time in handling the patient without losing eye-contact with him and communicate with a group of other doctors for consultation, for in-place treatments through video streaming, as a video study. The system also contains a relational database containing a list of doctor, patient and display techniques.

Keywords: augmented reality, hand-held devices, head-mounted devices, marker based systems, speech recognition, face detection

Procedia PDF Downloads 409
2158 Facial Recognition on the Basis of Facial Fragments

Authors: Tetyana Baydyk, Ernst Kussul, Sandra Bonilla Meza

Abstract:

There are many articles that attempt to establish the role of different facial fragments in face recognition. Various approaches are used to estimate this role. Frequently, authors calculate the entropy corresponding to the fragment. This approach can only give approximate estimation. In this paper, we propose to use a more direct measure of the importance of different fragments for face recognition. We propose to select a recognition method and a face database and experimentally investigate the recognition rate using different fragments of faces. We present two such experiments in the paper. We selected the PCNC neural classifier as a method for face recognition and parts of the LFW (Labeled Faces in the Wild) face database as training and testing sets. The recognition rate of the best experiment is comparable with the recognition rate obtained using the whole face.

Keywords: face recognition, labeled faces in the wild (LFW) database, random local descriptor (RLD), random features

Procedia PDF Downloads 329
2157 A Mixing Matrix Estimation Algorithm for Speech Signals under the Under-Determined Blind Source Separation Model

Authors: Jing Wu, Wei Lv, Yibing Li, Yuanfan You

Abstract:

The separation of speech signals has become a research hotspot in the field of signal processing in recent years. It has many applications and influences in teleconferencing, hearing aids, speech recognition of machines and so on. The sounds received are usually noisy. The issue of identifying the sounds of interest and obtaining clear sounds in such an environment becomes a problem worth exploring, that is, the problem of blind source separation. This paper focuses on the under-determined blind source separation (UBSS). Sparse component analysis is generally used for the problem of under-determined blind source separation. The method is mainly divided into two parts. Firstly, the clustering algorithm is used to estimate the mixing matrix according to the observed signals. Then the signal is separated based on the known mixing matrix. In this paper, the problem of mixing matrix estimation is studied. This paper proposes an improved algorithm to estimate the mixing matrix for speech signals in the UBSS model. The traditional potential algorithm is not accurate for the mixing matrix estimation, especially for low signal-to noise ratio (SNR).In response to this problem, this paper considers the idea of an improved potential function method to estimate the mixing matrix. The algorithm not only avoids the inuence of insufficient prior information in traditional clustering algorithm, but also improves the estimation accuracy of mixing matrix. This paper takes the mixing of four speech signals into two channels as an example. The results of simulations show that the approach in this paper not only improves the accuracy of estimation, but also applies to any mixing matrix.

Keywords: DBSCAN, potential function, speech signal, the UBSS model

Procedia PDF Downloads 105
2156 Using Maximization Entropy in Developing a Filipino Phonetically Balanced Wordlist for a Phoneme-Level Speech Recognition System

Authors: John Lorenzo Bautista, Yoon-Joong Kim

Abstract:

In this paper, a set of Filipino Phonetically Balanced Word list consisting of 250 words (PBW250) were constructed for a phoneme-level ASR system for the Filipino language. The Entropy Maximization is used to obtain phonological balance in the list. Entropy of phonemes in a word is maximized, providing an optimal balance in each word’s phonological distribution using the Add-Delete Method (PBW algorithm) and is compared to the modified PBW algorithm implemented in a dynamic algorithm approach to obtain optimization. The gained entropy score of 4.2791 and 4.2902 for the PBW and modified algorithm respectively. The PBW250 was recorded by 40 respondents, each with 2 sets data. Recordings from 30 respondents were trained to produce an acoustic model that were tested using recordings from 10 respondents using the HMM Toolkit (HTK). The results of test gave the maximum accuracy rate of 97.77% for a speaker dependent test and 89.36% for a speaker independent test.

Keywords: entropy maximization, Filipino language, Hidden Markov Model, phonetically balanced words, speech recognition

Procedia PDF Downloads 430
2155 Prosodic Characteristics of Post Traumatic Stress Disorder Induced Speech Changes

Authors: Jarek Krajewski, Andre Wittenborn, Martin Sauerland

Abstract:

This abstract describes a promising approach for estimating post-traumatic stress disorder (PTSD) based on prosodic speech characteristics. It illustrates the validity of this method by briefly discussing results from an Arabic refugee sample (N= 47, 32 m, 15 f). A well-established standardized self-report scale “Reaction of Adolescents to Traumatic Stress” (RATS) was used to determine the ground truth level of PTSD. The speech material was prompted by telling about autobiographical related sadness inducing experiences (sampling rate 16 kHz, 8 bit resolution). In order to investigate PTSD-induced speech changes, a self-developed set of 136 prosodic speech features was extracted from the .wav files. This set was adapted to capture traumatization related speech phenomena. An artificial neural network (ANN) machine learning model was applied to determine the PTSD level and reached a correlation of r = .37. These results indicate that our classifiers can achieve similar results to those seen in speech-based stress research.

Keywords: speech prosody, PTSD, machine learning, feature extraction

Procedia PDF Downloads 64
2154 An Algorithm Based on the Nonlinear Filter Generator for Speech Encryption

Authors: A. Belmeguenai, K. Mansouri, R. Djemili

Abstract:

This work present a new algorithm based on the nonlinear filter generator for speech encryption and decryption. The proposed algorithm consists on the use a linear feedback shift register (LFSR) whose polynomial is primitive and nonlinear Boolean function. The purpose of this system is to construct Keystream with good statistical properties, but also easily computable on a machine with limited capacity calculated. This proposed speech encryption scheme is very simple, highly efficient, and fast to implement the speech encryption and decryption. We conclude the paper by showing that this system can resist certain known attacks.

Keywords: nonlinear filter generator, stream ciphers, speech encryption, security analysis

Procedia PDF Downloads 266
2153 Semi-Supervised Learning for Spanish Speech Recognition Using Deep Neural Networks

Authors: B. R. Campomanes-Alvarez, P. Quiros, B. Fernandez

Abstract:

Automatic Speech Recognition (ASR) is a machine-based process of decoding and transcribing oral speech. A typical ASR system receives acoustic input from a speaker or an audio file, analyzes it using algorithms, and produces an output in the form of a text. Some speech recognition systems use Hidden Markov Models (HMMs) to deal with the temporal variability of speech and Gaussian Mixture Models (GMMs) to determine how well each state of each HMM fits a short window of frames of coefficients that represents the acoustic input. Another way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition systems. Acoustic models for state-of-the-art ASR systems are usually training on massive amounts of data. However, audio files with their corresponding transcriptions can be difficult to obtain, especially in the Spanish language. Hence, in the case of these low-resource scenarios, building an ASR model is considered as a complex task due to the lack of labeled data, resulting in an under-trained system. Semi-supervised learning approaches arise as necessary tasks given the high cost of transcribing audio data. The main goal of this proposal is to develop a procedure based on acoustic semi-supervised learning for Spanish ASR systems by using DNNs. This semi-supervised learning approach consists of: (a) Training a seed ASR model with a DNN using a set of audios and their respective transcriptions. A DNN with a one-hidden-layer network was initialized; increasing the number of hidden layers in training, to a five. A refinement, which consisted of the weight matrix plus bias term and a Stochastic Gradient Descent (SGD) training were also performed. The objective function was the cross-entropy criterion. (b) Decoding/testing a set of unlabeled data with the obtained seed model. (c) Selecting a suitable subset of the validated data to retrain the seed model, thereby improving its performance on the target test set. To choose the most precise transcriptions, three confidence scores or metrics, regarding the lattice concept (based on the graph cost, the acoustic cost and a combination of both), was performed as selection technique. The performance of the ASR system will be calculated by means of the Word Error Rate (WER). The test dataset was renewed in order to extract the new transcriptions added to the training dataset. Some experiments were carried out in order to select the best ASR results. A comparison between a GMM-based model without retraining and the DNN proposed system was also made under the same conditions. Results showed that the semi-supervised ASR-model based on DNNs outperformed the GMM-model, in terms of WER, in all tested cases. The best result obtained an improvement of 6% relative WER. Hence, these promising results suggest that the proposed technique could be suitable for building ASR models in low-resource environments.

Keywords: automatic speech recognition, deep neural networks, machine learning, semi-supervised learning

Procedia PDF Downloads 316
2152 Prosody Generation in Neutral Speech Storytelling Application Using Tilt Model

Authors: Manjare Chandraprabha A., S. D. Shirbahadurkar, Manjare Anil S., Paithne Ajay N.

Abstract:

This paper proposes Intonation Modeling for Prosody generation in Neutral speech for Marathi (language spoken in Maharashtra, India) story telling applications. Nowadays audio story telling devices are very eminent for children. In this paper, we proposed tilt model for stressed words in Marathi for speech modification. Tilt model predicts modification in tone of neutral speech. GMM is used to identify stressed words for modification.

Keywords: tilt model, fundamental frequency, statistical parametric speech synthesis, GMM

Procedia PDF Downloads 358
2151 The Importance of Right Speech in Buddhism and Its Relevance Today

Authors: Gautam Sharda

Abstract:

The concept of right speech is the third stage of the noble eightfold path as prescribed by the Buddha and followed by millions of practicing Buddhists. The Buddha lays a lot of importance on the notion of right speech (Samma Vacca). In the Angutara Nikaya, the Buddha mentioned what constitutes right speech, which is basically four kinds of abstentions; namely abstaining from false speech, abstaining from slanderous speech, abstaining from harsh or hateful speech and abstaining from idle chatter. The Buddha gives reasons in support of his view as to why abstaining from these four kinds of speeches is favourable not only for maintaining the peace and equanimity within an individual but also within a society. It is a known fact that when we say something harsh or slanderous to others, it eventually affects our individual peace of mind too. We also know about the many examples of hate speeches which have led to senseless cases of violence and which are well documented within our country and the world. Also, indulging in false speech is not a healthy sign for individuals within a group as this kind of a social group which is based on falsities and lies cannot really survive for long and will eventually lead to chaos. Buddha also told us to refrain from idle chatter or gossip as generally we have seen that idle chatter or gossip does more harm than any good to the individual and the society. Hence, if most of us actually inculcate this third stage (namely, right speech) of the noble eightfold path of the Buddha in our daily life, it would be highly beneficial both for the individual and for the harmony of the society.

Keywords: Buddhism, speech, individual, society

Procedia PDF Downloads 229
2150 Application of the Bionic Wavelet Transform and Psycho-Acoustic Model for Speech Compression

Authors: Chafik Barnoussi, Mourad Talbi, Adnane Cherif

Abstract:

In this paper we propose a new speech compression system based on the application of the Bionic Wavelet Transform (BWT) combined with the psychoacoustic model. This compression system is a modified version of the compression system using a MDCT (Modified Discrete Cosine Transform) filter banks of 32 filters each and the psychoacoustic model. This modification consists in replacing the banks of the MDCT filter banks by the bionic wavelet coefficients which are obtained from the application of the BWT to the speech signal to be compressed. These two methods are evaluated and compared with each other by computing bits before and bits after compression. They are tested on different speech signals and the obtained simulation results show that the proposed technique outperforms the second technique and this in term of compressed file size. In term of SNR, PSNR and NRMSE, the outputs speech signals of the proposed compression system are with acceptable quality. In term of PESQ and speech signal intelligibility, the proposed speech compression technique permits to obtain reconstructed speech signals with good quality.

Keywords: speech compression, bionic wavelet transform, filterbanks, psychoacoustic model

Procedia PDF Downloads 354
2149 Hate Speech Detection Using Deep Learning and Machine Learning Models

Authors: Nabil Shawkat, Jamil Saquer

Abstract:

Social media has accelerated our ability to engage with others and eliminated many communication barriers. On the other hand, the widespread use of social media resulted in an increase in online hate speech. This has drastic impacts on vulnerable individuals and societies. Therefore, it is critical to detect hate speech to prevent innocent users and vulnerable communities from becoming victims of hate speech. We investigate the performance of different deep learning and machine learning algorithms on three different datasets. Our results show that the BERT model gives the best performance among all the models by achieving an F1-score of 90.6% on one of the datasets and F1-scores of 89.7% and 88.2% on the other two datasets.

Keywords: hate speech, machine learning, deep learning, abusive words, social media, text classification

Procedia PDF Downloads 102
2148 DBN-Based Face Recognition System Using Light Field

Authors: Bing Gu

Abstract:

Abstract—Most of Conventional facial recognition systems are based on image features, such as LBP, SIFT. Recently some DBN-based 2D facial recognition systems have been proposed. However, we find there are few DBN-based 3D facial recognition system and relative researches. 3D facial images include all the individual biometric information. We can use these information to build more accurate features, So we present our DBN-based face recognition system using Light Field. We can see Light Field as another presentation of 3D image, and Light Field Camera show us a way to receive a Light Field. We use the commercially available Light Field Camera to act as the collector of our face recognition system, and the system receive a state-of-art performance as convenient as conventional 2D face recognition system.

Keywords: DBN, face recognition, light field, Lytro

Procedia PDF Downloads 430
2147 Speech Intelligibility Improvement Using Variable Level Decomposition DWT

Authors: Samba Raju, Chiluveru, Manoj Tripathy

Abstract:

Intelligibility is an essential characteristic of a speech signal, which is used to help in the understanding of information in speech signal. Background noise in the environment can deteriorate the intelligibility of a recorded speech. In this paper, we presented a simple variance subtracted - variable level discrete wavelet transform, which improve the intelligibility of speech. The proposed algorithm does not require an explicit estimation of noise, i.e., prior knowledge of the noise; hence, it is easy to implement, and it reduces the computational burden. The proposed algorithm decides a separate decomposition level for each frame based on signal dominant and dominant noise criteria. The performance of the proposed algorithm is evaluated with speech intelligibility measure (STOI), and results obtained are compared with Universal Discrete Wavelet Transform (DWT) thresholding and Minimum Mean Square Error (MMSE) methods. The experimental results revealed that the proposed scheme outperformed competing methods

Keywords: discrete wavelet transform, speech intelligibility, STOI, standard deviation

Procedia PDF Downloads 114
2146 The Language Use of Middle Eastern Freedom Activists' Speeches: A Gender Perspective

Authors: Sulistyaningtyas

Abstract:

Examining the role of Middle Eastern freedom activists’ speech based on gender perspective is considered noteworthy because the society in the Middle East is patriarchal. This research aims to examine the language use of the Middle Eastern freedom activists’ speeches through gender perspective. The data sources are from male and female Middle Eastern freedom activists’ speech videos. In analyzing the data, the theories employed are about Language Style from Gender Perspective and The Language for Speech. The result reveals that there are sets of spoken language differences between male and female speakers. In using the language for speech, both male and female speakers produce metaphor, euphemism, the ‘rule of three’, parallelism, and pronouns in random frequency of production, which cannot be separated by genders. Moreover, it cannot be concluded that one gender is more potential than the other to influence the audience in delivering speech. There are other factors, particularly non-verbal factors, existing to give impacts on how a speech can influence the audience.

Keywords: gender perspective, language use, Middle Eastern freedom activists, speech

Procedia PDF Downloads 398
2145 Considering Cultural and Linguistic Variables When Working as a Speech-Language Pathologist with Multicultural Students

Authors: Gabriela Smeckova

Abstract:

The entire world is becoming more and more diverse. The reasons why people migrate are different and unique for each family /individual. Professionals delivering services (including speech-language pathologists) must be prepared to work with clients coming from different cultural and/or linguistic backgrounds. Well-educated speech-language pathologists will consider many factors when delivering services. Some of them will be discussed during the presentation (language spoken, beliefs about health care and disabilities, reasons for immigration, etc.). The communication styles of the client can be different than the styles of the speech-language pathologist. The goal is to become culturally responsive in service delivery.

Keywords: culture, cultural competence, culturallly responsive practices, speech-language pathologist, cultural and linguistical variables, communication styles

Procedia PDF Downloads 45
2144 Effect of Noise Reduction Algorithms on Temporal Splitting of Speech Signal to Improve Speech Perception for Binaural Hearing Aids

Authors: Rajani S. Pujar, Pandurangarao N. Kulkarni

Abstract:

Increased temporal masking affects the speech perception in persons with sensorineural hearing impairment especially under adverse listening conditions. This paper presents a cascaded scheme, which employs a noise reduction algorithm as well as temporal splitting of the speech signal. Earlier investigations have shown that by splitting the speech temporally and presenting alternate segments to the two ears help in reducing the effect of temporal masking. In this technique, the speech signal is processed by two fading functions, complementary to each other, and presented to left and right ears for binaural dichotic presentation. In the present study, half cosine signal is used as a fading function with crossover gain of 6 dB for the perceptual balance of loudness. Temporal splitting is combined with noise reduction algorithm to improve speech perception in the background noise. Two noise reduction schemes, namely spectral subtraction and Wiener filter are used. Listening tests were conducted on six normal-hearing subjects, with sensorineural loss simulated by adding broadband noise to the speech signal at different signal-to-noise ratios (∞, 3, 0, and -3 dB). Objective evaluation using PESQ was also carried out. The MOS score for VCV syllable /asha/ for SNR values of ∞, 3, 0, and -3 dB were 5, 4.46, 4.4 and 4.05 respectively, while the corresponding MOS scores for unprocessed speech were 5, 1.2, 0.9 and 0.65, indicating significant improvement in the perceived speech quality for the proposed scheme compared to the unprocessed speech.

Keywords: MOS, PESQ, spectral subtraction, temporal splitting, wiener filter

Procedia PDF Downloads 303
2143 Efficacy of a Wiener Filter Based Technique for Speech Enhancement in Hearing Aids

Authors: Ajish K. Abraham

Abstract:

Hearing aid is the most fundamental technology employed towards rehabilitation of persons with sensory neural hearing impairment. Hearing in noise is still a matter of major concern for many hearing aid users and thus continues to be a challenging issue for the hearing aid designers. Several techniques are being currently used to enhance the speech at the hearing aid output. Most of these techniques, when implemented, result in reduction of intelligibility of the speech signal. Thus the dissatisfaction of the hearing aid user towards comprehending the desired speech amidst noise is prevailing. Multichannel Wiener Filter is widely implemented in binaural hearing aid technology for noise reduction. In this study, Wiener filter based noise reduction approach is experimented for a single microphone based hearing aid set up. This method checks the status of the input speech signal in each frequency band and then selects the relevant noise reduction procedure. Results showed that the Wiener filter based algorithm is capable of enhancing speech even when the input acoustic signal has a very low Signal to Noise Ratio (SNR). Performance of the algorithm was compared with other similar algorithms on the basis of improvement in intelligibility and SNR of the output, at different SNR levels of the input speech. Wiener filter based algorithm provided significant improvement in SNR and intelligibility compared to other techniques.

Keywords: hearing aid output speech, noise reduction, SNR improvement, Wiener filter, speech enhancement

Procedia PDF Downloads 225
2142 The Complaint Speech Act Set Produced by Arab Students in the UAE

Authors: Tanju Deveci

Abstract:

It appears that the speech act of complaint has not received as much attention as other speech acts. However, the face-threatening nature of this speech act requires a special attention in multicultural contexts in particular. The teaching context in the UAE universities, where a big majority of teaching staff comes from other cultures, requires investigations into this speech act in order to improve communication between students and faculty. This session will outline the results of a study conducted with this purpose. The realization of complaints by Freshman English students in Communication courses at Petroleum Institute was investigated to identify communication patterns that seem to cause a strain. Data were collected using a role-play between a teacher and students, and a judgment scale completed by two of the instructors in the Communications Department. The initial findings reveal that the students had difficulty putting their case, produced the speech act of criticism along with a complaint and that they produced both requests and demands as candidate solutions. The judgement scales revealed that the students’ attitude was not appropriate most of the time and that the judges would behave differently from students. It is concluded that speech acts, in general, and complaint, in particular, need to be taught to learners explicitly to improve interpersonal communication in multicultural societies. Some teaching ideas are provided to help increase foreign language learners’ sociolinguistic competence.

Keywords: speech act, complaint, pragmatics, sociolinguistics, language teaching

Procedia PDF Downloads 480
2141 Evaluation of Features Extraction Algorithms for a Real-Time Isolated Word Recognition System

Authors: Tomyslav Sledevič, Artūras Serackis, Gintautas Tamulevičius, Dalius Navakauskas

Abstract:

This paper presents a comparative evaluation of features extraction algorithm for a real-time isolated word recognition system based on FPGA. The Mel-frequency cepstral, linear frequency cepstral, linear predictive and their cepstral coefficients were implemented in hardware/software design. The proposed system was investigated in the speaker-dependent mode for 100 different Lithuanian words. The robustness of features extraction algorithms was tested recognizing the speech records at different signals to noise rates. The experiments on clean records show highest accuracy for Mel-frequency cepstral and linear frequency cepstral coefficients. For records with 15 dB signal to noise rate the linear predictive cepstral coefficients give best result. The hard and soft part of the system is clocked on 50 MHz and 100 MHz accordingly. For the classification purpose, the pipelined dynamic time warping core was implemented. The proposed word recognition system satisfies the real-time requirements and is suitable for applications in embedded systems.

Keywords: isolated word recognition, features extraction, MFCC, LFCC, LPCC, LPC, FPGA, DTW

Procedia PDF Downloads 467
2140 Experimenting the Influence of Input Modality on Involvement Load Hypothesis

Authors: Mohammad Hassanzadeh

Abstract:

As far as incidental vocabulary learning is concerned, the basic contention of the Involvement Load Hypothesis (ILH) is that retention of unfamiliar words is, generally, conditional upon the degree of involvement in processing them. This study examined input modality and incidental vocabulary uptake in a task-induced setting whereby three variously loaded task types (marginal glosses, fill-in-task, and sentence-writing) were alternately assigned to one group of students at Allameh Tabataba’i University (n=2l) during six classroom sessions. While one round of exposure was comprised of the audiovisual medium (TV talk shows), the second round consisted of textual materials with approximately similar subject matter (reading texts). In both conditions, however, the tasks were equivalent to one another. Taken together, the study pursued the dual objectives of establishing a litmus test for the ILH and its proposed values of ‘need’, ‘search’ and ‘evaluation’ in the first place. Secondly, it sought to bring to light the superiority issue of exposure to audiovisual input versus the written input as far as the incorporation of tasks is concerned. At the end of each treatment session, a vocabulary active recall test was administered to measure their incidental gains. Running a one-way analysis of variance revealed that the audiovisual intervention yielded higher gains than the written version even when differing tasks were included. Meanwhile, task 'three' (sentence-writing) turned out the most efficient in tapping learners' active recall of the target vocabulary items. In addition to shedding light on the superiority of audiovisual input over the written input when circumstances are relatively held constant, this study for the most part, did support the underlying tenets of ILH.

Keywords: Keywords— Evaluation, incidental vocabulary learning, input mode, Involvement Load Hypothesis, need, search.

Procedia PDF Downloads 246
2139 Manipulation of Ideological Items in the Audiovisual Translation of Voiced-Over Documentaries in the Arab World

Authors: S. Chabbak

Abstract:

In a widely globalized world, the influence of audiovisual translation on the culture and identity of audiences is unmistakable. However, in the Arab World, there is a noticeable disproportion between this growing influence and the research carried out in the field. As a matter of fact, the voiced-over documentary is one of the most abundantly translated genres in the Arab World that carries lots of ideological elements which are in many cases rendered by manipulation. However, voiced-over documentaries have hardly received any focused attention from researchers in the Arab World. This paper attempts to scrutinize the process of translation of voiced-over documentaries in the Arab World, from French into Arabic in the present case study, by sub-categorizing the ideological items subject to manipulation, identifying the techniques utilized in their translation and exploring the potential extra-linguistic factors that prompt translation agents to opt for manipulative translation. The investigation is based on a corpus of 94 episodes taken from a series entitled 360° GEO Reports, produced by the French German network ARTE in French, and acquired, translated and aired by Al Jazeera Documentary Channel for Arab audiences. The results yielded 124 cases of manipulation in four sub-categories of ideological items, and the use of 10 different oblique procedures in the process of manipulative translation. The study also revealed that manipulation is in most of the instances dictated by the editorial line of the broadcasting channel, in addition to the religious, geopolitical and socio-cultural peculiarities of the target culture.

Keywords: audiovisual translation, ideological items, manipulation, voiced-over documentaries

Procedia PDF Downloads 177
2138 Analysis of Feminist Translation in Subtitling from Arabic into English: A Case Study

Authors: Ghada Ahmed

Abstract:

Feminist translation is one of the strategies adopted in the field of translation studies when a gendered content is being rendered from one language to another, and this strategy has been examined in previous studies on written texts. This research, however, addresses the practice of feminist translation in audiovisual texts that are concerned with the screen, dialogue, image and visual aspects. In this thesis, the objectives are studying feminist translation and its adaptation in subtitling from Arabic into English. It addresses the connections between gender and translation as one domain and feminist translation practices with particular consideration of feminist translation strategies in English subtitles. It examines the visibility of the translator throughout the process, assuming that feminist translation is a product directed by the translator’s feminist position, culture, and ideology as a means of helping unshadow women. It also discusses how subtitling constraints impact feminist translation and how the image that has a narrative value can be integrated into the content of the English subtitles. The reasons for conducting this research project are to study language sexism in English and look into Arabic into English gendered content, taking into consideration the Arabic cultural concepts that may lose their connotations when they are translated into English. This research is also analysing the image in an audiovisual text and its contribution to the written dialogue in subtitling. Thus, this research attempts to answer the following questions: To what extent is there a form of affinity between a gendered content and translation? Is feminist translation an act of merely working on a feminist text or feminising the language of any text, by incorporating the translator’s ideology? How can feminist translation practices be applied in an audiovisual text? How likely is it to adapt feminist translation looking into visual components as well as subtitling constraints? Moreover, the paper searches into the fields of gender and translation; feminist translation, language sexism, media studies, and the gap in the literature related to feminist translation practice in visual texts. For my case study, the "Speed Sisters" film has been chosen so as to analyze its English subtitles for my research. The film is a documentary that was produced in 2015 and directed by Amber Fares. It is about five Palestinian women who try to break the stereotypes about women, and have taken their passion about car-racing forward to be the first all-women car-racing driving team in the Middle East. It tackles the issue of gender in both content and language and this is reflected in the translation. As the research topic is semiotic-channelled, the choice for the theoretical approaches varies and combines between translation studies, audiovisual translation, gender studies, and media studies. Each of which will contribute to understanding a specific field of the research and the results will eventually be integrated to achieve the intended objectives in a way that demonstrates rendering a gendered content in one of the audiovisual translation modes from a language into another.

Keywords: audiovisual translation, feminist translation, films gendered content, subtitling conventions and constraints

Procedia PDF Downloads 260
2137 Face Tracking and Recognition Using Deep Learning Approach

Authors: Degale Desta, Cheng Jian

Abstract:

The most important factor in identifying a person is their face. Even identical twins have their own distinct faces. As a result, identification and face recognition are needed to tell one person from another. A face recognition system is a verification tool used to establish a person's identity using biometrics. Nowadays, face recognition is a common technique used in a variety of applications, including home security systems, criminal identification, and phone unlock systems. This system is more secure because it only requires a facial image instead of other dependencies like a key or card. Face detection and face identification are the two phases that typically make up a human recognition system.The idea behind designing and creating a face recognition system using deep learning with Azure ML Python's OpenCV is explained in this paper. Face recognition is a task that can be accomplished using deep learning, and given the accuracy of this method, it appears to be a suitable approach. To show how accurate the suggested face recognition system is, experimental results are given in 98.46% accuracy using Fast-RCNN Performance of algorithms under different training conditions.

Keywords: deep learning, face recognition, identification, fast-RCNN

Procedia PDF Downloads 91
2136 On Overcoming Common Oral Speech Problems through Authentic Films

Authors: Tamara Matevosyan

Abstract:

The present paper discusses the main problems that students face while developing oral skills through authentic films. It states that special attention should be paid not only to the study of verbal speech but also to non-verbal communication. Authentic films serve as an important tool to understand both native speaker’s gestures and their culture of pausing while speaking. Various phonetic difficulties causing phonetic interference in actual speech are covered in the paper emphasizing the role of authentic films in overcoming them.

Keywords: compressive speech, filled pauses, unfilled pauses, pausing culture

Procedia PDF Downloads 315
2135 ViraPart: A Text Refinement Framework for Automatic Speech Recognition and Natural Language Processing Tasks in Persian

Authors: Narges Farokhshad, Milad Molazadeh, Saman Jamalabbasi, Hamed Babaei Giglou, Saeed Bibak

Abstract:

The Persian language is an inflectional subject-object-verb language. This fact makes Persian a more uncertain language. However, using techniques such as Zero-Width Non-Joiner (ZWNJ) recognition, punctuation restoration, and Persian Ezafe construction will lead us to a more understandable and precise language. In most of the works in Persian, these techniques are addressed individually. Despite that, we believe that for text refinement in Persian, all of these tasks are necessary. In this work, we proposed a ViraPart framework that uses embedded ParsBERT in its core for text clarifications. First, used the BERT variant for Persian followed by a classifier layer for classification procedures. Next, we combined models outputs to output cleartext. In the end, the proposed model for ZWNJ recognition, punctuation restoration, and Persian Ezafe construction performs the averaged F1 macro scores of 96.90%, 92.13%, and 98.50%, respectively. Experimental results show that our proposed approach is very effective in text refinement for the Persian language.

Keywords: Persian Ezafe, punctuation, ZWNJ, NLP, ParsBERT, transformers

Procedia PDF Downloads 171
2134 Morpheme Based Parts of Speech Tagger for Kannada Language

Authors: M. C. Padma, R. J. Prathibha

Abstract:

Parts of speech tagging is the process of assigning appropriate parts of speech tags to the words in a given text. The critical or crucial information needed for tagging a word come from its internal structure rather from its neighboring words. The internal structure of a word comprises of its morphological features and grammatical information. This paper presents a morpheme based parts of speech tagger for Kannada language. This proposed work uses hierarchical tag set for assigning tags. The system is tested on some Kannada words taken from EMILLE corpus. Experimental result shows that the performance of the proposed system is above 90%.

Keywords: hierarchical tag set, morphological analyzer, natural language processing, paradigms, parts of speech

Procedia PDF Downloads 262
2133 The Convolution Recurrent Network of Using Residual LSTM to Process the Output of the Downsampling for Monaural Speech Enhancement

Authors: Shibo Wei, Ting Jiang

Abstract:

Convolutional-recurrent neural networks (CRN) have achieved much success recently in the speech enhancement field. The common processing method is to use the convolution layer to compress the feature space by multiple upsampling and then model the compressed features with the LSTM layer. At last, the enhanced speech is obtained by deconvolution operation to integrate the global information of the speech sequence. However, the feature space compression process may cause the loss of information, so we propose to model the upsampling result of each step with the residual LSTM layer, then join it with the output of the deconvolution layer and input them to the next deconvolution layer, by this way, we want to integrate the global information of speech sequence better. The experimental results show the network model (RES-CRN) we introduce can achieve better performance than LSTM without residual and overlaying LSTM simply in the original CRN in terms of scale-invariant signal-to-distortion ratio (SI-SNR), speech quality (PESQ), and intelligibility (STOI).

Keywords: convolutional-recurrent neural networks, speech enhancement, residual LSTM, SI-SNR

Procedia PDF Downloads 169