Search results for: Speech understanding.
1291 Speaker Independent Quranic Recognizer Based on Maximum Likelihood Linear Regression
Authors: Ehab Mourtaga, Ahmad Sharieh, Mousa Abdallah
Abstract:
An automatic speech recognition system for the formal Arabic language is needed. The Quran is the most formal spoken book in Arabic, and it is recited all over the world. In this research, a speaker-independent automatic speech recognizer for Quranic recitation was developed and tested. The system is based on tri-phone Hidden Markov Models and Maximum Likelihood Linear Regression (MLLR). MLLR computes a set of transformations that reduces the mismatch between an initial model set and the adaptation data. It uses a regression class tree and estimates a set of linear transformations for the mean and variance parameters of a Gaussian mixture HMM system. The 30th Chapter of the Quran, recited by five of the most famous readers of the Quran, was used for training and testing. The chapter includes about 2000 distinct words. The advantages of using Quranic verses as the database for this recognizer are the uniqueness of the words and the high level of orderliness between verses. The accuracy on the test data ranged from 68% to 85%.
Keywords: Hidden Markov Model (HMM), Maximum Likelihood Linear Regression (MLLR), Quran, Regression Class Tree, Speech Recognition, Speaker-independent.
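Illustration (not taken from the paper): the core of the MLLR mean update mentioned in the abstract is an affine map, mu' = A mu + b, shared by all Gaussians in one regression class. The sketch below only applies such a transform; estimating A and b from adaptation data is the hard part and is not shown. The function name `adapt_mean` and all numeric values are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def adapt_mean(A: Matrix, b: Vector, mu: Vector) -> Vector:
    """Return the adapted mean A @ mu + b for one Gaussian component."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)]


# Hypothetical 2-D transform shared by every Gaussian in one regression class.
A = [[1.0, 0.0],
     [0.0, 1.0]]
b = [0.5, -0.5]

# Hypothetical means of two Gaussians that belong to that class.
class_means = [[1.0, 2.0], [3.0, 4.0]]
adapted = [adapt_mean(A, b, mu) for mu in class_means]
```

Sharing one transform per regression class is what lets a small amount of adaptation data move many Gaussians at once.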
1290 Face Localization Using Illumination-dependent Face Model for Visual Speech Recognition
Authors: Robert E. Hursig, Jane X. Zhang
Abstract:
A robust still image face localization algorithm capable of operating in an unconstrained visual environment is proposed. First, construction of a robust skin classifier within a shifted HSV color space is described. Then various filtering operations are performed to better isolate face candidates and mitigate the effect of substantial non-skin regions. Finally, a novel Bhattacharyya-based face detection algorithm is used to compare candidate regions of interest with a unique illumination-dependent face model probability distribution function approximation. Experimental results show a 90% face detection success rate despite the demands of the visually noisy environment.
Keywords: Audio-visual speech recognition, Bhattacharyya coefficient, face detection
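Illustration (not taken from the paper): the Bhattacharyya coefficient named in the keywords measures the overlap of two discrete probability distributions, BC(p, q) = sum_i sqrt(p_i * q_i). The sketch below computes it for two histograms. How the paper builds its illumination-dependent face model is not stated in the abstract, so the histograms here are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale a non-negative histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """Overlap of two distributions: 1.0 if identical, 0.0 if disjoint."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(a * b) for a, b in zip(normalize(p), normalize(q)))


# Hypothetical colour histograms: a face model and two candidate regions.
face_model = [4, 4, 0, 0]
candidate_same = [2, 2, 0, 0]
candidate_other = [0, 0, 3, 5]

score_same = bhattacharyya_coefficient(face_model, candidate_same)
score_other = bhattacharyya_coefficient(face_model, candidate_other)
```

A detector built on this score would accept candidate regions whose coefficient exceeds some threshold; the threshold value is a design choice that the abstract does not give.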
1289 Assessment of the Occupancy’s Effect on Speech Intelligibility in Al-Madinah Holy Mosque
Authors: Wasim Orfali, Hesham Tolba
Abstract:
This research investigates the acoustical characteristics of Al-Madinah Holy Mosque. Extensive field measurements were conducted at different locations in the mosque to characterize its acoustics. Because of practical considerations, acoustical characteristics are usually evaluated with objective parameters in unoccupied rooms. Under normal conditions, however, room occupancy can change these characteristics through the additional sound absorption present in the room or through a change in the signal-to-noise ratio. Based on acoustic measurements carried out in Al-Madinah Holy Mosque with and without occupancy, and on the analysis of those measurements, the existence of acoustical deficiencies has been confirmed.
Keywords: Worship sound, Al-Madinah Holy Mosque, mosque acoustics, speech intelligibility.
1288 A Sociolinguistic Study of the Outcomes of Arabic-French Contact in the Algerian Dialect: Tlemcen Speech Community as a Case Study
Authors: R. Rahmoun-Mrabet
Abstract:
It is acknowledged that our style of speaking changes according to a wide range of variables such as gender, setting, the age of both the addresser and the addressee, the conversation topic, and the aim of the interaction. These differences in style are noticeable in monolingual and multilingual speech communities. Yet, they are more observable in speech communities where two or more codes coexist. The linguistic situation in Algeria reflects a state of bilingualism because of the coexistence of Arabic and French. Nevertheless, like all Arab countries, it is characterized by diglossia, i.e. the concomitance of Modern Standard Arabic (MSA) and Algerian Arabic (AA), the former standing for the ‘high variety’ and the latter for the ‘low variety’. The two varieties are derived from the same source but are used to fulfil distinct functions: MSA is used in the domains of religion, literature, education and formal settings, whereas AA is used in informal settings and in everyday speech. French has strongly affected the Algerian language and culture because of the historical background of Algeria; thus, everyday speech in Algeria is visibly characterized by code-switching between dialectal Arabic and French or by the use of borrowings. Tamazight is also very present in many regions of Algeria and is the mother tongue of many Algerians. Yet, it is not used in the west of Algeria, where the study was conducted. The present work, which was carried out in the speech community of Tlemcen, Algeria, aims at depicting some of the outcomes of the contact of Arabic with French, such as code-switching, borrowing and interference. The question asked is whether Algerians are aware of their use of borrowings or not. Three steps are followed in this research. The first is to depict the sociolinguistic situation in Algeria and to describe the linguistic characteristics of the dialect of Tlemcen, which are specific to this city.
The second one is concerned with data collection. Data have been collected from 57 informants who were given questionnaires and who have then been classified according to their age, gender and level of education. Information has also been collected through observation, and note taking. The third step is devoted to analysis. The results obtained reveal that most Algerians are aware of their use of borrowings. The present work clarifies how words are borrowed from French, and then adapted to Arabic. It also illustrates the way in which singular words inflect into plural. The results expose the main characteristics of borrowing as opposed to code-switching. The study also clarifies how interference occurs at the level of nouns, verbs and adjectives.
Keywords: Bilingualism, borrowing, code-switching, interference, language contact.
1287 Tele-Operated Anthropomorphic Arm and Hand Design
Authors: Namal A. Senanayake, Khoo B. How, Quah W. Wai
Abstract:
In this project, a tele-operated anthropomorphic robotic arm and hand is designed and built as a versatile robotic arm system. The robot can manipulate objects, for example in pick-and-place operations. It can also function by itself, in standalone mode. First, the robotic arm is built to interface with a personal computer via a serial servo controller circuit board. The circuit board enables the user to completely control the robotic arm and, moreover, enables feedback from the user. The control circuit board uses a powerful integrated microcontroller, a PIC (Programmable Interface Controller). The PIC is programmed using BASIC (Beginner's All-purpose Symbolic Instruction Code) and is used as the 'brain' of the robot. In addition, a user-friendly Graphical User Interface (GUI) is developed as the serial servo interface software using Microsoft's Visual Basic 6. The second part of the project is to use speech recognition control on the robotic arm. A speech recognition circuit board is constructed with onboard components such as a PIC and other integrated circuits. It replaces the computer's Graphical User Interface. The robotic arm is able to receive instructions as spoken commands through a microphone and to perform the corresponding operations, such as picking and placing.
Keywords: Tele-operated Anthropomorphic Robotic Arm and Hand, Robot Motion System, Serial Servo Controller, Speech Recognition Controller.
1286 On a Pitch Duration Technique for Prosody Control
Authors: JongKuk Kim, HernSoo Hahn, Uei-Joong Yoo, MyungJin Bae
Abstract:
In this paper, we propose a method of altering duration in the frequency domain that controls prosody in real time after pitch alteration. If there were a method to alter duration freely among the prosody information, it could be used in several fields, such as pronunciation correction for people with speech impediments or language study. The pitch alteration method used here controls prosody with the PSOLA synthesis method, which is a time-domain processing method. The duration of the pitch-altered speech, however, is changed in the frequency domain: we altered the duration using the Fast Fourier Transform. Consequently, the intelligibility when both pitch and duration are controlled is slightly lower than when only pitch is changed, but the proposed algorithm obtained a higher MOS score for naturalness.
Keywords: PSOLA, Pitch Alteration, Duration Control.
1285 Recognition by Online Modeling – a New Approach of Recognizing Voice Signals in Linear Time
Authors: Jyh-Da Wei, Hsin-Chen Tsai
Abstract:
This work presents a novel means of extracting fixed-length parameters from voice signals, such that words can be recognized in linear time. The power and the zero crossing rate are first calculated segment by segment from a voice signal; by doing so, two feature sequences are generated. We then construct an FIR system across these two sequences. The parameters of this FIR system, used as the input of a multilayer perceptron recognizer, can be derived by recursive LSE (least-square estimation), implying that the complexity of the overall process is linear in the signal size. In the second part of this work, we introduce a weighting factor λ to emphasize recent input; therefore, we can further recognize continuous speech signals. Experiments employ the voice signals of numbers, from zero to nine, spoken in Mandarin Chinese. The proposed method is verified to recognize voice signals efficiently and accurately.
Keywords: Speech Recognition, FIR system, Recursive LSE, Multilayer Perceptron
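Illustration (not taken from the paper): the first step described in the abstract, computing power and zero crossing rate segment by segment, can be sketched as follows. Non-overlapping segments and the name `frame_features` are assumptions; the FIR fitting and recursive LSE steps are not shown.

```python
from typing import List, Sequence, Tuple


def frame_features(signal: Sequence[float],
                   frame_len: int) -> Tuple[List[float], List[float]]:
    """Return per-segment mean power and zero crossing rate.

    Segments are non-overlapping (an assumption); a trailing partial
    segment is dropped. frame_len must be at least 2.
    """
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    powers: List[float] = []
    zcrs: List[float] = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        powers.append(sum(s * s for s in frame) / frame_len)
        # Count sign changes between neighbouring samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        zcrs.append(crossings / (frame_len - 1))
    return powers, zcrs


# Toy signal: a rapidly alternating segment followed by a constant one.
toy_signal = [1.0, -1.0, 1.0, -1.0, 2.0, 2.0, 2.0, 2.0]
powers, zcrs = frame_features(toy_signal, frame_len=4)
```

Both sequences are produced in a single pass over the samples, which is consistent with the linear-time claim for this stage.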
1284 DHT-LMS Algorithm for Sensorineural Loss Patients
Authors: Sunitha S. L., V. Udayashankara
Abstract:
Hearing impairment is the number one chronic disability affecting many people in the world. Background noise is particularly damaging to speech intelligibility for people with hearing loss, especially for sensorineural loss patients. Several investigations on speech intelligibility have demonstrated that sensorineural loss patients need a 5-15 dB higher SNR than normal-hearing subjects. This paper describes the Discrete Hartley Transform Power Normalized Least Mean Square algorithm (DHT-LMS), which improves the SNR and reduces the convergence time of the Least Mean Square (LMS) algorithm for sensorineural loss patients. The DHT transforms n real numbers to n real numbers, and has the convenient property of being its own inverse. It can be effectively used for noise cancellation with less convergence time. The simulated results show superior characteristics: an SNR improvement of at least 9 dB for an input SNR of 0 dB, and a faster convergence rate (eigenvalue ratio 12) compared to the time-domain method and DFT-LMS.
Keywords: Hearing Impairment, DHT-LMS, Convergence rate, SNR improvement.
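Illustration (not taken from the paper): the self-inverse property of the DHT mentioned in the abstract can be checked directly. With the unnormalized kernel cas(t) = cos(t) + sin(t), applying the transform twice returns the input scaled by n, so the inverse is the same transform divided by n. The direct O(n^2) form below is for clarity only.

```python
import math
from typing import List, Sequence


def dht(x: Sequence[float]) -> List[float]:
    """Discrete Hartley transform: H[k] = sum_t x[t] * cas(2*pi*k*t/n)."""
    n = len(x)
    out = []
    for k in range(n):
        acc = 0.0
        for t in range(n):
            angle = 2.0 * math.pi * k * t / n
            acc += x[t] * (math.cos(angle) + math.sin(angle))
        out.append(acc)
    return out


def idht(h: Sequence[float]) -> List[float]:
    """Inverse DHT: the same transform, scaled by 1/n."""
    n = len(h)
    return [v / n for v in dht(h)]


samples = [1.0, 2.0, 3.0, 4.0]
spectrum = dht(samples)       # real input gives a real output
restored = idht(spectrum)     # matches samples up to rounding error
```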
1283 Optimized Brain Computer Interface System for Unspoken Speech Recognition: Role of Wernicke Area
Authors: Nassib Abdallah, Pierre Chauvet, Abd El Salam Hajjar, Bassam Daya
Abstract:
In this paper, we propose an optimized brain computer interface (BCI) system for unspoken speech recognition, based on the fact that the construction of unspoken words relies strongly on the Wernicke area, situated in the temporal lobe. Our BCI system has four modules: (i) the EEG Acquisition module, based on a non-invasive headset with 14 electrodes; (ii) the Preprocessing module, which removes noise and artifacts using the Common Average Reference method; (iii) the Features Extraction module, using the Wavelet Packet Transform (WPT); (iv) the Classification module, based on a one-hidden-layer artificial neural network. The present study compares the recognition accuracy for 5 Arabic words when using all the headset electrodes or only the 4 electrodes situated near the Wernicke area, as well as the effect of selecting among the subbands produced by the WPT module. After applying the artificial neural network to the produced database, we obtain, on the test dataset, an accuracy of 83.4% with all the electrodes and all the subbands of 8 levels of the WPT decomposition. However, by using only the 4 electrodes near the Wernicke area and the 6 middle subbands of the WPT, we obtain a large reduction of the dataset size, equal to approximately 19% of the total dataset, with an accuracy rate of 67.5%. This reduction appears particularly important for improving the design of a low-cost and simple-to-use BCI trained for several words.
Keywords: Brain-computer interface, speech recognition, electroencephalography EEG, Wernicke area, artificial neural network.
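Illustration (not taken from the paper): Common Average Reference, the preprocessing method named in module (ii), subtracts the instantaneous mean over all electrodes from every electrode. The sketch assumes samples are stored as one list of channel values per time step; the three-channel data are hypothetical (the headset in the abstract has 14 electrodes).

```python
from typing import List, Sequence


def common_average_reference(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Re-reference EEG so that the channels sum to zero at each time step.

    samples[t][c] is the value of channel c at time step t.
    """
    referenced = []
    for row in samples:
        mean = sum(row) / len(row)
        referenced.append([value - mean for value in row])
    return referenced


# Hypothetical recording: 2 time steps, 3 channels.
raw = [[1.0, 2.0, 3.0],
       [10.0, 10.0, 13.0]]
car = common_average_reference(raw)
```

Activity common to all electrodes, such as a shared reference drift, cancels out, while differences between electrodes are preserved.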
1282 Autistic Children and Different Tense Forms
Authors: Ameneh Zare, Shahin Nematzadeh, Shahla Raghibdoust, Iran Kalbassi
Abstract:
Autism spectrum disorder is characterized by abnormalities in social communication, language abilities and repetitive behaviors. The present study focused on some grammatical deficits in autistic children. We evaluated the impairment in the correct use of different Persian verb tenses in autistic children's speech. Two standardized language tests were administered, and the gathered data were then analyzed. The main result of this study was a significant difference between the mean scores of correct responses for the present tense in comparison with the past tense in Persian. This study demonstrated that tense is severely impaired in autistic children's speech. Our findings indicated that autistic children's production of the simple present/past tense opposition is better than their production of future and past periphrastic forms (past perfect, present perfect, past progressive).
Keywords: Autism, Past, Persian Language, Present, Tense
1281 Multidisciplinary Approach to Diagnosis of Primary Progressive Aphasia in a Younger Middle Aged Patient
Authors: Robert Krause
Abstract:
Primary progressive aphasia (PPA) is a neurodegenerative disease similar to frontotemporal and semantic dementia, while having a different clinical image and anatomic pathology topography. Nonetheless, they are often included under an umbrella term: frontotemporal lobar degeneration (FTLD). In the study, examples of diagnosing PPA are presented through the multidisciplinary lens of specialists from different fields (neurologists, psychiatrists, clinical speech therapists, clinical neuropsychologists and others) using a variety of diagnostic tools such as MR, PET/CT, genetic screening and neuropsychological and logopedic methods. Thanks to that, specialists can get a better and clearer understanding of PPA diagnosis. The study summarizes the concrete procedures and results of different specialists while diagnosing PPA in a patient of younger middle age and illustrates the importance of multidisciplinary approach to differential diagnosis of PPA.
Keywords: Primary progressive aphasia, etiology, diagnosis, younger middle age.
1280 A Review in Advanced Digital Signal Processing Systems
Authors: Roza Dastres, Mohsen Soori
Abstract:
Digital Signal Processing (DSP) is the use of digital processing systems by computers in order to perform a variety of signal processing operations. It is the mathematical manipulation of a digital signal's numerical values in order to increase the quality as well as the effects of signals. DSP can include linear or nonlinear operators in order to process and analyze the input signals. Nonlinear DSP is closely related to nonlinear system detection and can be implemented in the time, frequency and space-time domains. Applications of DSP include control systems, digital image processing, biomedical engineering, speech recognition systems, industrial engineering, health care systems, radar signal processing and telecommunication systems. In this study, advanced methods and different applications of DSP are reviewed in order to advance this research field.
Keywords: Digital signal processing, advanced telecommunication, nonlinear signal processing, speech recognition systems.
1279 Fast Factored DCT-LMS Speech Enhancement for Performance Enhancement of Digital Hearing Aid
Authors: Sunitha. S.L., V. Udayashankara
Abstract:
Background noise is particularly damaging to speech intelligibility for people with hearing loss, especially for sensorineural loss patients. Several investigations on speech intelligibility have demonstrated that sensorineural loss patients need a 5-15 dB higher SNR than normal-hearing subjects. This paper describes the Discrete Cosine Transform Power Normalized Least Mean Square algorithm, which improves the SNR and reduces the convergence time of the LMS for sensorineural loss patients. Since it requires only real arithmetic, it achieves a faster convergence rate than time-domain LMS, and the transformation also improves the eigenvalue distribution of the input autocorrelation matrix of the LMS filter. The DCT has good orthonormality, separability, and energy compaction properties. Although the DCT does not separate frequencies, it is a powerful signal decorrelator. It is a real-valued function and thus can be effectively used in real-time operation. The advantages of DCT-LMS compared to the standard LMS algorithm are shown via SNR and eigenvalue ratio computations. Exploiting the symmetry of the basis functions, the DCT transform matrix [AN] can be factored into a series of ±1 butterflies and rotation angles. This factorization results in one of the fastest DCT implementations. There are different ways to obtain factorizations. This work uses the fast factored DCT algorithm developed by Chen and colleagues. The computer simulation results show superior convergence characteristics of the proposed algorithm: an SNR improvement of at least 10 dB for input SNR less than or equal to 0 dB, faster convergence speed, and better time and frequency characteristics.
Keywords: Hearing Impairment, DCT Adaptive filter, Sensorineural loss patients, Convergence rate.
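Illustration (not taken from the paper): a generic power-normalized transform-domain LMS filter can be sketched as below. The tap-input vector is passed through an orthonormal DCT-II, and each transformed coefficient gets its own step size, divided by a running estimate of that coefficient's power. This sketch uses a dense DCT matrix instead of the fast factored DCT the abstract refers to, and the filter length, step size `mu`, smoothing factor `beta`, and the toy system `h` are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def dct_matrix(n: int) -> List[List[float]]:
    """Orthonormal DCT-II matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                     for t in range(n)])
    return rows


def dct_lms(x: Sequence[float], d: Sequence[float], n_taps: int = 4,
            mu: float = 0.05, beta: float = 0.9,
            eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Adapt transform-domain weights so the filter output tracks d."""
    T = dct_matrix(n_taps)
    w = [0.0] * n_taps          # weights in the DCT domain
    power = [1.0] * n_taps      # running power estimate per coefficient
    taps = [0.0] * n_taps       # most recent input samples, newest first
    errors = []
    for x_n, d_n in zip(x, d):
        taps = [x_n] + taps[:-1]
        u = [sum(T[k][t] * taps[t] for t in range(n_taps))
             for k in range(n_taps)]
        e = d_n - sum(w_k * u_k for w_k, u_k in zip(w, u))
        for k in range(n_taps):
            power[k] = beta * power[k] + (1.0 - beta) * u[k] * u[k]
            w[k] += mu * e * u[k] / (power[k] + eps)   # normalized step
        errors.append(e)
    return w, errors


# Toy system identification: recover a hypothetical 4-tap FIR system.
random.seed(0)
h = [0.5, -0.3, 0.2, 0.1]
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]
w, errors = dct_lms(x, d)
```

Normalizing each coefficient by its own power is what compensates for an unfavourable eigenvalue spread of the input autocorrelation matrix, which is the property the abstract attributes to the transformation.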
1278 On-line Speech Enhancement by Time-Frequency Masking under Prior Knowledge of Source Location
Authors: Min Ah Kang, Sangbae Jeong, Minsoo Hahn
Abstract:
This paper presents a source extraction system that extracts only target signals, under constraints on source localization, in on-line systems. The proposed system is a method for enhancing a target signal and suppressing other interference signals. The performance of the proposed system is superior to that of other methods, and the extraction of the target source is comparatively complete. The method has a beamforming concept and uses an improved time-frequency (TF) mask-based blind source separation (BSS) algorithm to separate a target signal from multiple noise sources. The target sources are assumed to be in front, and the test data were recorded in a reverberant room. The experimental results of the proposed method were evaluated by the PESQ score of real-recording sentences and showed a noticeable speech enhancement.
Keywords: Beam forming, Non-stationary noise reduction, Source separation, TF mask.
1277 Identifying Understanding Expectations of School Administrators Regarding School Assessment
Authors: Eftah Bte. Moh Hj Abdullah, Izazol Binti Idris, Abd Aziz Bin Abd Shukor
Abstract:
This study aims to identify the understanding expectations of school administrators concerning school assessment. The researcher used a qualitative descriptive study of 19 administrators from three secondary schools in the North Kinta district. The respondents were interviewed on their understanding expectations of school assessment using the focus group discussion method. Overall findings showed that the administrators’ understanding expectations of school assessment were weak, especially in terms of content focus, articulation across age and grade, transparency and fairness, as well as the pedagogical implications. Findings from interviews indicated that administrators explained their understanding expectations of school assessment from the aspect of school management, and not from the aspect of instructional leadership or specifically as assessment leaders. The study implications from the administrators’ understanding expectations may hint at the difficulty administrators have in functioning as assessment leaders, reducing their focus as managers, and moving towards their primary role in the process of teaching and learning. Administrators, as assessment leaders, would be able to reach assessment goals via collaboration in identifying and listing teacher assessment competencies, how to construct assessment capacity, how to interpret assessment correctly, the use of assessment and how to use assessment information to communicate confidently and effectively to the public.
Keywords: Assessment leaders, assessment goals, instructional leadership, understanding expectation of assessment.
1276 Grammatically Coded Corpus of Spoken Lithuanian: Methodology and Development
Authors: L. Kamandulytė-Merfeldienė
Abstract:
The paper deals with the main methodological issues of the Corpus of Spoken Lithuanian, whose development began in 2006. At present, the corpus consists of 300,000 grammatically annotated word forms. The creation of the corpus consists of three main stages: collecting the data, transcribing the recorded data, and grammatical annotation. Data collection was based on the principles of balance and naturalness. The recorded speech was transcribed according to the CHAT requirements of CHILDES. The transcripts were double-checked and annotated grammatically using CHILDES. The development of the Corpus of Spoken Lithuanian has led to a constant increase in studies on spontaneous communication, and various papers have dealt with the distribution of parts of speech, the use of different grammatical forms, the variation of inflectional paradigms, the distribution of fillers, the syntactic functions of adjectives, and the mean length of utterances.
Keywords: CHILDES, Corpus of Spoken Lithuanian, grammatical annotation, grammatical disambiguation, lexicon, Lithuanian.
1275 Investigating Medical Students’ Perspectives toward University Teachers’ Talking Features in an English as a Foreign Language Context in Urmia, Iran
Authors: Ismail Baniadam, Nafisa Tadayyon, Javid Fereidoni
Abstract:
This study aimed to investigate medical students’ attitudes toward some teachers’ talking features regarding their gender in the Iranian context. To do so, 60 male and 60 female medical students of Urmia University of Medical Sciences (UMSU) participated in the research. A researcher-made Likert-type questionnaire, which was initially piloted, was used to gather the data. Comparing the four factors of teacher talk revealed that the visual and extra-linguistic information factor, lexical and syntactic familiarity, speed of speech, and the use of the Persian language had the highest to the lowest mean scores, respectively. The results also indicated that female students were significantly more in favor of speed of speech and lexical and syntactic familiarity than male students.
Keywords: Attitude, gender, medical student, teacher talk.
1274 Connectionist Approach to Generic Text Summarization
Authors: Rajesh S. Prasad, U. V. Kulkarni, Jayashree R. Prasad
Abstract:
As the enormous amount of on-line text on the World-Wide Web grows, the development of methods for automatically summarizing this text becomes more important. The primary goal of this research is to create an efficient tool that is able to summarize large documents automatically. We propose an Evolving Connectionist System, an adaptive, incremental learning and knowledge representation system that evolves its structure and functionality. In this paper, we propose a novel approach for part-of-speech disambiguation using a recurrent neural network, a paradigm capable of dealing with sequential data. We observed that the connectionist approach to text summarization has a natural way of learning grammatical structures through experience. Experimental results show that our approach achieves acceptable performance.
1273 The Code-Mixing of Japanese, English and Thai in Line Chat
Authors: Premvadee Na Nakornpanom
Abstract:
Code-mixing in spontaneous speech has been widely discussed, but not in virtual situations, especially in the context of third-language learners. Thus, this study is an attempt to explore the linguistic characteristics of the mixing of Japanese, English and Thai in a mobile Line chat room by students with English as L2, Japanese as L3 and Thai as their mother tongue. The results show that the insertion of Thai content words into sentences in the other two languages is a very common linguistic phenomenon. As chatting is ‘relational’ or ‘interactional’, the style of lexical choices became speech-like, more personal and emotionally related. A personal pronoun in Japanese is often mixed into the sentences. The Japanese sentence-final question particle か “ka” was added to the end of sentences based on Thai grammar rules. Some unique characteristics were created while chatting.
Keywords: Code-mixing, Japanese, English, Thai, Line chat.
1272 Freedom with Limitations: The Nature of Free Expression in the European Case-Law
Authors: Laszlo Vari
Abstract:
In the digital age, the spread of the mobile world and the nature of cyberspace offer many new opportunities for the prevalence of the fundamental right to free expression, and therefore for free speech and freedom of the press; however, these new information communication technologies carry many new challenges. Defamation, censorship, fake news, misleading information, hate speech, breach of copyright, etc. are only some of the violations that can be derived from the harmful exercise of freedom of expression, all of which become more salient on the internet. This raises the question: how can we eliminate these problems and practice our fundamental freedom rightfully? To answer this question, we should understand the elements and characteristics of freedom of expression, and the role of the actors whose duties and responsibilities are crucial to the prevalence of this fundamental freedom. To achieve this goal, this paper explores European practice to understand the instructions found in the case-law of the European Court of Human Rights for the rightful exercise of freedom of expression.
Keywords: Collision of rights, European case-law, freedom of opinion and expression, media law, freedom of information, online expression
1271 Hand Gesture Recognition: Sign to Voice System (S2V)
Authors: Oi Mean Foong, Tan Jung Low, Satrio Wibowo
Abstract:
Hand gesture is one of the typical methods used in sign language for non-verbal communication. It is most commonly used by people who have hearing or speech problems to communicate among themselves or with normal people. Various sign language systems have been developed by manufacturers around the globe, but they are neither flexible nor cost-effective for end users. This paper presents a system prototype that automatically recognizes sign language to help normal people communicate more effectively with hearing- or speech-impaired people. The Sign to Voice system prototype, S2V, was developed using a Feed Forward Neural Network for two-sequence sign detection. Different sets of universal hand gestures were captured from a video camera and used to train the neural network for classification. The experimental results show that the neural network achieved satisfactory results for sign-to-voice translation.
Keywords: Hand gesture detection, neural network, sign language, sequence detection.
1270 A New Vector Quantization Front-End Process for Discrete HMM Speech Recognition System
Authors: M. Debyeche, J.P. Haton, A. Houacine
Abstract:
The paper presents a complete discrete statistical framework based on a novel vector quantization (VQ) front-end process. This new VQ approach performs an optimal distribution of VQ codebook components over HMM states. This technique, which we named the distributed vector quantization (DVQ) of hidden Markov models, succeeds in unifying acoustic micro-structure and phonetic macro-structure when the HMM parameters are estimated. The DVQ technique is implemented in two variants. The first variant uses the K-means algorithm (K-means-DVQ) to optimize the VQ, while the second variant exploits the classification behavior of neural networks (NN-DVQ) for the same purpose. The proposed variants are compared with the HMM-based baseline system in recognition experiments on specific Arabic consonants. The results show that the distributed vector quantization technique increases the performance of the discrete HMM system.
Keywords: Hidden Markov Model, Vector Quantization, Neural Network, Speech Recognition, Arabic Language
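Illustration (not taken from the paper): in a conventional discrete-HMM front end, each acoustic feature vector is replaced by the index of its nearest codebook entry, and the HMM is trained on those indices. The sketch below shows only this plain VQ lookup, not the distributed VQ (DVQ) that the abstract proposes; the codebook and feature vectors are hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(vector: Sequence[float],
                     codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def quantize(vectors: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map a sequence of feature vectors to discrete symbol indices."""
    return [nearest_codeword(v, codebook) for v in vectors]


# Hypothetical 2-D codebook with two entries, and three feature vectors.
codebook = [[0.0, 0.0], [10.0, 10.0]]
features = [[1.0, 1.0], [9.0, 8.0], [4.0, 4.0]]
symbols = quantize(features, codebook)
```

The resulting symbol sequence is what a discrete HMM consumes as observations; the codebook itself would normally be trained, for example with K-means.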
1269 Alignment between Understanding and Assessment Practice among Secondary School Teachers
Authors: Eftah Bte. Moh @ Hj Abdullah, Izazol Binti Idris, Abd Aziz Bin Abd Shukor
Abstract:
This study aimed to identify the alignment of understanding and assessment practices among secondary school teachers. The study was carried out as a quantitative descriptive study. The sample consisted of 164 teachers who taught Form 1 and 2 at 11 secondary schools in the district of North Kinta, Perak, Malaysia. Data were obtained from the 164 respondents, who answered the Expectation Alignment Understanding and Practices of School Assessment (PEKDAPS) questionnaire. The data were analysed using SPSS 17.0+. The Cronbach’s alpha value obtained in the PEKDAPS questionnaire pilot study was 0.86. The results showed that teachers' mean performance in PEKDAPS was less than 3, which means that perfect alignment does not occur between the understanding and practices of school assessment. Two major PEKDAPS sub-constructs, articulation across grade and age and usability of the system, were higher than moderate alignment of the understanding and practices of school assessment (mean = 2.0). The content focus sub-construct of PEKDAPS showed lower than moderate alignment (mean = 2.0). Another two PEKDAPS sub-constructs, transparency and fairness and pedagogical implications, showed moderate alignment (2.0). The implication of the study is that teachers need to fully understand the importance of alignment among the components of assessment, learning and teaching, and learning objectives as strategies to achieve a quality assessment process.
Keywords: Alignment, assessment practices, School Based Assessment, understanding.
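The abstract reports a Cronbach's alpha of 0.86 for the pilot questionnaire. As a reminder of how that reliability coefficient is computed, here is a minimal sketch using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy responses are assumptions for illustration; they are not the PEKDAPS data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per respondent, one column per questionnaire item."""
    k = len(responses[0])                               # number of items
    item_vars = [variance(col) for col in zip(*responses)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical toy data: two items that rise together across three respondents.
scores = [[1, 2], [2, 3], [3, 4]]
alpha = cronbach_alpha(scores)
print(alpha)
```

Because the two toy items move in lockstep, alpha reaches its maximum here; real questionnaire data would give a lower value.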
Downloads 2002
1268 An Intelligent Text Independent Speaker Identification Using VQ-GMM Model Based Multiple Classifier System
Authors: Cheima Ben Soltane, Ittansa Yonas Kelbesa
Abstract:
Speaker Identification (SI) is the task of establishing the identity of an individual based on his/her voice characteristics. The SI task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker-specific feature parameters from the speech and generates speaker models accordingly. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Even though the performance of speaker identification systems has improved due to recent advances in speech processing techniques, there is still a need for improvement. In this paper, a Closed-Set Text-Independent Speaker Identification System (CISI) based on a Multiple Classifier System (MCS) is proposed, using Mel Frequency Cepstrum Coefficients (MFCC) for feature extraction and a suitable combination of vector quantization (VQ) and Gaussian Mixture Model (GMM) together with the Expectation Maximization (EM) algorithm for speaker modeling. The use of a Voice Activity Detector (VAD) with a hybrid approach based on Short Time Energy (STE) and statistical modeling of background noise in the pre-processing step of the feature extraction yields a better and more robust automatic speaker identification system. In addition, using the Linde-Buzo-Gray (LBG) clustering algorithm to initialize the GMM parameters estimated in the EM step improved the convergence rate and system performance. The system also uses a relative index as a confidence measure when the GMM and VQ classifiers disagree in the identification process. Simulation results obtained on the voxforge.org speech database using MATLAB highlight the efficacy of the proposed method compared to earlier work.
Keywords: Feature Extraction, Speaker Modeling, Feature Matching, Mel Frequency Cepstrum Coefficient (MFCC), Gaussian mixture model (GMM), Vector Quantization (VQ), Linde-Buzo-Gray (LBG), Expectation Maximization (EM), pre-processing, Voice Activity Detection (VAD), Short Time Energy (STE), Background Noise Statistical Modeling, Closed-Set Text-Independent Speaker Identification System (CISI).
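The pre-processing step described in the abstract relies on Short Time Energy for voice activity detection. The sketch below shows one simple way such a detector could work; it is not the authors' hybrid method. The statistical background-noise model is reduced here to the mean energy of the first few frames, and the function names, the non-overlapping framing, the `noise_frames` count, and the threshold `factor` are all assumptions.

```python
def short_time_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_speech(samples, frame_len, noise_frames=2, factor=4.0):
    """Flag frames whose energy exceeds `factor` times the leading-noise energy."""
    energy = short_time_energy(samples, frame_len)
    # Assumption: the recording starts with `noise_frames` frames of background noise.
    noise_floor = sum(energy[:noise_frames]) / noise_frames
    threshold = factor * max(noise_floor, 1e-12)
    return [e > threshold for e in energy]

# Hypothetical signal: quiet noise, a louder burst, then quiet noise again.
signal = [0.01, -0.01] * 4 + [0.5, -0.5] * 4 + [0.01, -0.01] * 2
flags = detect_speech(signal, frame_len=4)
print(flags)
```

Only the frames flagged as speech would then be passed on to MFCC extraction.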
Downloads 1886
1267 Reading and Teaching Poetry as Communicative Discourse: A Pragma-Linguistic Approach
Authors: Omnia Elkommos
Abstract:
Language is communication on several discourse levels. The target of teaching a language and the literature of a foreign language is to communicate a message. Reading, appreciating, analysing, and interpreting poetry as a sophisticated rhetorical expression of human thoughts, emotions, and philosophical messages is more feasible through the use of linguistic pragmatic tools from a communicative discourse perspective. The poet's intention, speech act, illocutionary act, and perlocutionary goal can be better understood when communicative situational context as well as linguistic discourse structure theories are employed. The use of linguistic theories in the teaching of poetry is, therefore, intrinsic to students' comprehension, interpretation, and appreciation of poetry of the different ages. It is the purpose of this study to show how both teachers and students can apply these linguistic theories and tools to dramatic poetic texts for an engaging, enlightening, and effective interpretation and appreciation of the language. Theories drawn from the areas of pragmatics, discourse analysis, embedded discourse level, communicative situational context, and other linguistic approaches were applied to selected poetry texts from the different centuries. Further, in a simple statistical count across different anthologies, the number of poems with dialogic dramatic discourse embedding two or three levels of discourse outweighs the number of descriptive poems with a single level of discourse, between the poet and the reader. Poetry is thus discourse on one, two, or three levels. It is, therefore, recommended that teachers and students in the area of ESL/EFL use linguistic theories for a better understanding of poetry as communicative discourse. The practice of applying these linguistic theories in classrooms and in research will allow them to perceive the language and its linguistic, social, and cultural aspects. Texts will become live illocutionary acts with a perlocutionary goal rather than mere literary texts in anthologies.
Keywords: Coda, commissives, communicative situation, context of culture, context of reference, context of utterance, dialogue, directives, discourse analysis, dramatic discourse interaction, duologue, embedded discourse levels, language for communication, linguistic structures, literary texts, poetry, pragmatic theories, reader response, speech acts (macro/micro), stylistics, teaching literature, TEFL, terms of address, turn-taking.
Downloads 1732
1266 Architecture of Speech-based Registration System
Authors: Mayank Kumar, D B Mahesh Kumar, Ashwin S Kumar, N K Srinath
Abstract:
In this era of technology, fueled by the pervasive usage of the internet, security is a prime concern. The number of new attacks by so-called “bots”, which are automated programs, is increasing at an alarming rate. They are most likely to attack online registration systems. A technology called “CAPTCHA” (Completely Automated Public Turing test to tell Computers and Humans Apart) exists that can differentiate between automated programs and humans and prevent replay attacks. Traditionally, CAPTCHAs have been implemented with a challenge that involves recognizing textual images and reproducing the text. We propose an approach in which the visual challenge has to be read aloud, and randomly selected keywords from it are used to verify the correctness of the spoken text and, in turn, detect the presence of a human. This is supplemented with a speaker recognition system that can also identify the speaker. Thus, this framework fulfills both objectives: it can determine whether the user is a human and, if so, verify the user's identity.
Keywords: CAPTCHA, automatic speech recognition, keyword spotting.
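The keyword check described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' architecture: it assumes a speech recognizer has already produced a text transcript, and the function names, the tokenizer, the challenge sentence, and the all-keywords-must-match rule are hypothetical choices.

```python
import random
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def pick_keywords(challenge, count, rng):
    """Choose `count` distinct words from the challenge text to check later."""
    words = sorted(set(tokenize(challenge)))
    return rng.sample(words, count)

def verify_transcript(keywords, transcript):
    """Accept only if every selected keyword occurs in the recognized speech."""
    spoken = set(tokenize(transcript))
    return all(k in spoken for k in keywords)

# Hypothetical challenge text shown to the user as an image.
challenge = "the quick brown fox jumps over the lazy dog"
keywords = pick_keywords(challenge, 3, random.Random(7))

# Transcripts would come from an ASR system; here they are hard-coded strings.
accepted = verify_transcript(keywords, "the quick brown fox jumps over the lazy dog")
rejected = verify_transcript(keywords, "unrelated words from some other recording")
print(accepted, rejected)
```

Because the keywords are drawn at random for each challenge, a response recorded for one challenge is unlikely to satisfy the next one.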
Downloads 1547
1265 Mistranslation in Cross Cultural Communication: A Discourse Analysis on Former President Bush’s Speech in 2001
Authors: Lowai Abed
Abstract:
The differences in languages play a big role in cross-cultural communication. If meanings are not translated accurately, the consequences can be serious not only on an interpersonal level, but also on the international and political levels. The use of metaphorical language by politicians can cause great confusion, often leading to statements being misconstrued. In these situations, it is the translators who struggle to put forward the intended meaning with clarity, and this makes translation an important field to study and analyze when it comes to cross-cultural communication. Owing to the growing importance of language and the power of translation in politics, this research analyzes part of President Bush’s speech in 2001 in which he used the word “Crusade”, which caused his statement to be misconstrued. The research uses a discourse analysis of cross-cultural communication literature, which provides answers supported by historical, linguistic, and communicative perspectives. The first finding indicates that the word ‘crusade’ carries a different meaning and significance in the narratives of the Western world than in those of the Middle East. The second is that, linguistically, maintaining cultural meanings through translation is quite difficult and challenging. Third, from the cross-cultural communication perspective, the common and frequent use of literal translation is a sign of poor strategies being followed in translation training. Based on the example of Bush’s speech, this paper hopes to highlight the weak practices in translation in cross-cultural communication which are still commonly used across the world. Translation studies have to take issues such as this seriously and attempt to find a solution. In every language, there are words and phrases that have cultural, historical and social meanings that are woven into the language. Literal translation is not the solution for this problem because that strategy is unable to convey these meanings in the target language.
Keywords: Crusade, metaphor, mistranslation, war in terror.
Downloads 844
1264 Automatic Lip Contour Tracking and Visual Character Recognition for Computerized Lip Reading
Authors: Harshit Mehrotra, Gaurav Agrawal, M.C. Srivastava
Abstract:
Computerized lip reading has been one of the most actively researched areas of computer vision in the recent past because of its crime-fighting potential and invariance to the acoustic environment. However, several factors such as fast speech, bad pronunciation, poor illumination, movement of the face, moustaches, and beards make lip reading difficult. In the present work, we propose a solution for automatic lip contour tracking and for recognizing letters of the English language spoken by speakers using the information available from lip movements. A level set method is used to track the lip contour using a contour velocity model, and a feature vector of lip movements is then obtained. Character recognition is performed using a modified k-nearest-neighbor algorithm which assigns more weight to nearer neighbors. The proposed system has been found to have an accuracy of 73.3% for character recognition with speaker lip movements as the only input and without using any speech recognition system in parallel. The approach used in this work is found to serve the purpose of lip reading well when the size of the database is small.
Keywords: Contour Velocity Model, Lip Contour Tracking, Lip Reading, Visual Character Recognition.
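The abstract mentions a modified k-nearest-neighbor classifier that gives nearer neighbors more weight. A minimal sketch of one common way to do this, inverse-distance weighting, is shown below. The exact weighting used by the authors is not stated, so the 1/distance rule, the function name, and the toy 2-D feature vectors are assumptions.

```python
import math
from collections import defaultdict

def weighted_knn(train, query, k=3, eps=1e-9):
    """Classify `query` by inverse-distance-weighted votes of its k nearest neighbors."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        # Closer neighbors contribute larger votes; eps avoids division by zero.
        votes[label] += 1.0 / (math.dist(features, query) + eps)
    return max(votes, key=votes.get)

# Hypothetical lip-movement feature vectors labeled with the spoken letter.
train = [((0.0, 0.0), "A"), ((3.0, 0.0), "B"), ((3.0, 1.0), "B")]
label = weighted_knn(train, (1.0, 0.0), k=3)
print(label)
```

In this toy case an unweighted majority vote over the three neighbors would pick "B", while the weighted vote favors the single, much closer "A" sample.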
Downloads 2401
1263 Effect of Teaching Games for Understanding Approach on Students' Cognitive Learning Outcome
Authors: Malathi Balakrishnan, Shabeshan Rengasamy, Mohd Salleh Aman
Abstract:
The study investigated the effects of the Teaching Games for Understanding (TGfU) approach on students' cognitive learning outcome. The study used a quasi-experimental non-equivalent pretest-posttest control group design in which 10-year-old primary school students (n=72) were randomly assigned to an experimental and a control group. The experimental group was exposed to the TGfU approach and the control group to the Traditional Skill approach to the game of handball. The Game Performance Assessment Instrument (GPAI) was used to measure students' tactical understanding and decision making in 3 versus 3 handball game situations. Analysis of covariance (ANCOVA) was used to analyze the data. The results reveal a significant difference between the TGfU approach group and the traditional skill approach group on post-test score (F (1, 69) = 248.83, p < .05). The findings of this study suggest the importance of the TGfU approach for improving primary students’ tactical understanding and decision making in handball.
Keywords: Constructivism, learning outcome, tactical understanding, and Teaching Games for Understanding (TGfU)
Downloads 4603
1262 Neuro-Fuzzy Based Model for Phrase Level Emotion Understanding
Authors: Vadivel Ayyasamy
Abstract:
The present approach deals with the identification of emotions and the classification of emotional patterns at phrase level with respect to positive and negative orientation. The proposed approach considers emotion-triggered terms, their co-occurring terms, and the associated sentences for recognizing emotions. It uses Part of Speech Tagging and Emotion Actifiers for classification. Sentence patterns are broken into phrases, and a Neuro-Fuzzy model is used to classify them, which results in 16 patterns of emotional phrases. Suitable intensities are assigned to capture the degree of emotional content that exists in the semantics of the patterns. These emotional phrases are assigned weights, which support deciding the positive or negative orientation of emotions. The approach uses web documents for experiments, and the proposed classification approach performs well and achieves good F-Scores.
Keywords: Emotions, sentences, phrases, classification, patterns, fuzzy, positive orientation, negative orientation.
Downloads 1079