Search results for: Kazakh speech dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1933

Search results for: Kazakh speech dataset

1933 Speech Detection Model Based on Deep Neural Networks Classifier for Speech Emotions Recognition

Authors: A. Shoiynbek, K. Kozhakhmet, P. Menezes, D. Kuanyshbay, D. Bayazitov

Abstract:

Speech emotion recognition has received increasing research interest all through current years. There was used emotional speech that was collected under controlled conditions in most research work. Actors imitating and artificially producing emotions in front of a microphone noted those records. There are four issues related to that approach, namely, (1) emotions are not natural, and it means that machines are learning to recognize fake emotions. (2) Emotions are very limited by quantity and poor in their variety of speaking. (3) There is language dependency on SER. (4) Consequently, each time when researchers want to start work with SER, they need to find a good emotional database on their language. In this paper, we propose the approach to create an automatic tool for speech emotion extraction based on facial emotion recognition and describe the sequence of actions of the proposed approach. One of the first objectives of the sequence of actions is a speech detection issue. The paper gives a detailed description of the speech detection model based on a fully connected deep neural network for Kazakh and Russian languages. Despite the high results in speech detection for Kazakh and Russian, the described process is suitable for any language. To illustrate the working capacity of the developed model, we have performed an analysis of speech detection and extraction from real tasks.

Keywords: deep neural networks, speech detection, speech emotion recognition, Mel-frequency cepstrum coefficients, collecting speech emotion corpus, collecting speech emotion dataset, Kazakh speech dataset

Procedia PDF Downloads 99
1932 Speech Detection Model Based on Deep Neural Networks Classifier for Speech Emotions Recognition

Authors: Aisultan Shoiynbek, Darkhan Kuanyshbay, Paulo Menezes, Akbayan Bekarystankyzy, Assylbek Mukhametzhanov, Temirlan Shoiynbek

Abstract:

Speech emotion recognition (SER) has received increasing research interest in recent years. It is a common practice to utilize emotional speech collected under controlled conditions recorded by actors imitating and artificially producing emotions in front of a microphone. There are four issues related to that approach: emotions are not natural, meaning that machines are learning to recognize fake emotions; emotions are very limited in quantity and poor in variety of speaking; there is some language dependency in SER; consequently, each time researchers want to start work with SER, they need to find a good emotional database in their language. This paper proposes an approach to create an automatic tool for speech emotion extraction based on facial emotion recognition and describes the sequence of actions involved in the proposed approach. One of the first objectives in the sequence of actions is the speech detection issue. The paper provides a detailed description of the speech detection model based on a fully connected deep neural network for Kazakh and Russian. Despite the high results in speech detection for Kazakh and Russian, the described process is suitable for any language. To investigate the working capacity of the developed model, an analysis of speech detection and extraction from real tasks has been performed.

Keywords: deep neural networks, speech detection, speech emotion recognition, Mel-frequency cepstrum coefficients, collecting speech emotion corpus, collecting speech emotion dataset, Kazakh speech dataset

Procedia PDF Downloads 26
1931 Developing Kazakh Language Fluency Test in Nazarbayev University

Authors: Saule Mussabekova, Samal Abzhanova

Abstract:

The Kazakh Language Fluency Test, based on the IELTS exam, was implemented in 2012 at Nazarbayev University in Astana, Kazakhstan. We would like to share our experience in developing this exam and some exam results with other language instructors. In this paper, we will cover all these peculiarities and their related issues. The Kazakh Language Fluency Test is a young exam. During its development, we faced many difficulties. One of the goals of the university and the country is to encourage fluency in the Kazakh language for all citizens of the Republic. Nazarbayev University has introduced a Kazakh language program to assist in achieving this goal. This policy is one-step in ensuring that NU students have a thorough understanding of the Kazakh language through a fluency test based on the International English Language Testing System (IELTS). The Kazakh Language Fluency Test exam aims to determine student’s knowledge of Kazakh language. The fact is that there are three types of students at Nazarbayev University: Kazakh-speaking heritage learners, Russian-speaking and English-speaking students. Unfortunately, we have Kazakh students who do not speak Kazakh. All students who finished school with Russian language instruction are given Kazakh Language Fluency Test in order to determine their Kazakh level. After the test exam, all students can choose appropriate Kazakh course: Basic Kazakh, Intermediate Kazakh and Upper-Intermediate Kazakh. The Kazakh Language Fluency Test consists of four parts: Listening, Reading, Writing and Speaking. They are taken on the same day in the abovementioned order.

Keywords: diagnostic test, kazakh language, placement test, test result

Procedia PDF Downloads 406
1930 Spoken Subcorpus of the Kazakh Language: History, Content, Methodology

Authors: Kuralay Bimoldaevna Kuderinova, Beisenkhan Samal

Abstract:

The history of creating a linguistic corpus in Kazakh linguistics begins only in 2016. Though within this short period of time, the linguistic corpus has become a national corpus and its several subcorpora, namely historical, cultural, spoken, dialectological, writers’ subcorpus, proverbs subcorpus and poetic texts subcorpus, have appeared and are working effectively. Among them, the spoken corpus has its own characteristics. The Kazakh language is one of the languages belonging to the Kypchak-Nogai group of Turkic peoples. The Kazakh language is a language that, as a part of the former Soviet Union, was directly influenced by the Russian language and underwent major changes in its spoken and written forms. After the Republic of Kazakhstan gained independence, the Kazakh language received the status of the state language in 1991. However, today, the prestige of the Russian language is still higher than that of the Kazakh language. Therefore, the direct influence of the Russian language on the structure, style, and vocabulary of the Kazakh language continues. In particular, it can be said that the national practice of the spoken language is disappearing, as the spoken form of Kazakh is not used in official gatherings and events of state importance. In this regard, it is very important to collect and preserve examples of spoken language. Recording exemplary spoken texts, converting them into written form, and providing their audio along with orphoepic explanations will serve as a valuable tool for teaching and learning the Kazakh language. Therefore, the report will cover interesting aspects and scientific foundations related to the creation, content, and methodology of the oral subcorpus of the Kazakh language.

Keywords: spoken corpus, Kazakh language, orthoepic norm, LLM

Procedia PDF Downloads 8
1929 Developing an Intonation Labeled Dataset for Hindi

Authors: Esha Banerjee, Atul Kumar Ojha, Girish Nath Jha

Abstract:

This study aims to develop an intonation labeled database for Hindi. Although no single standard for prosody labeling exists in Hindi, researchers in the past have employed perceptual and statistical methods in literature to draw inferences about the behavior of prosody patterns in Hindi. Based on such existing research and largely agreed upon intonational theories in Hindi, this study attempts to develop a manually annotated prosodic corpus of Hindi speech data, which can be used for training speech models for natural-sounding speech in the future. 100 sentences ( 500 words) each for declarative and interrogative types have been labeled using Praat.

Keywords: speech dataset, Hindi, intonation, labeled corpus

Procedia PDF Downloads 196
1928 The Application of a Hybrid Neural Network for Recognition of a Handwritten Kazakh Text

Authors: Almagul Assainova , Dariya Abykenova, Liudmila Goncharenko, Sergey Sybachin, Saule Rakhimova, Abay Aman

Abstract:

The recognition of a handwritten Kazakh text is a relevant objective today for the digitization of materials. The study presents a model of a hybrid neural network for handwriting recognition, which includes a convolutional neural network and a multi-layer perceptron. Each network includes 1024 input neurons and 42 output neurons. The model is implemented in the program, written in the Python programming language using the EMNIST database, NumPy, Keras, and Tensorflow modules. The neural network training of such specific letters of the Kazakh alphabet as ә, ғ, қ, ң, ө, ұ, ү, h, і was conducted. The neural network model and the program created on its basis can be used in electronic document management systems to digitize the Kazakh text.

Keywords: handwriting recognition system, image recognition, Kazakh font, machine learning, neural networks

Procedia PDF Downloads 261
1927 Kazakh Language Assessment in a New Multilingual Kazakhstan

Authors: Karlygash Adamova

Abstract:

This article is focused on the KazTest as one of the most important high-stakes tests and the key tool in Kazakh language assessment. The research will also include the brief introduction to the language policy in Kazakhstan. Particularly, it is going to be changed significantly and turn from bilingualism (Kazakh, Russian) to multilingual policy (three languages - Kazakh, Russian, English). Therefore, the current status of the abovementioned languages will be described. Due to the various educational reforms in the country, the language evaluation system should also be improved and moderated. The research will present the most significant test of Kazakhstan – the KazTest, which is aimed to evaluate the Kazakh language proficiency. Assessment is an ongoing process that encompasses a wide area of knowledge upon the productive performance of the learners. Test is widely defined as a standardized or standard method of research, testing, diagnostics, verification, etc. The two most important characteristics of any test, as the main element of the assessment - validity and reliability - will also be described in this paper. Therefore, the preparation and design of the test, which is assumed to be an indicator of knowledge, and it is highly important to take into account all these properties.

Keywords: multilingualism, language assessment, testing, language policy

Procedia PDF Downloads 136
1926 The Influence of English Learning on Ethnic Kazakh Minority Students’ Identity (Re)Construction at Chinese Universities

Authors: Sharapat Sharapat

Abstract:

English language is perceived as cultural capital in many non-native English-speaking countries, and minority groups in these social contexts seem to invest in the language to be empowered and reposition themselves from the imbalanced power relation with the dominant group. This study is devoted to explore how English learning influence minority Kazakh students’ identity (re)construction at Chinese universities from the scope of ‘imagined community, investment, and identity’ theory of Norton (2013). To this end the three research questions were designed as follows: 1) Kazakh minority students’ English learning experiences at Chinese universities; 2) Kazakh minority students’ views about benefits and opportunities of English learning; 3) the influence of English learning on Kazakh minority students’ identity (re)construction. The study employs an interview-based qualitative research method by interviewing nine Kazakh minority students in universities in Xinjiang and other inland cities in China. The findings suggest that through English learning, some students have reconstructed multiple identities as multicultural and global identities, which created ‘a third space’ to break limits of their ethnic and national identities and confused identity as someone in-between. Meanwhile, most minority students were empowered by the English language to resist inferior or marginalized positions and reconstruct imagined elite identity. However, English learning disempowered students who have little previous English education in school and placed them on unequal footing with other students, which further escalated the educational inequities.

Keywords: minority in China, identity construction, multilingual education, language empowerment

Procedia PDF Downloads 231
1925 Hate Speech Detection in Tunisian Dialect

Authors: Helmi Baazaoui, Mounir Zrigui

Abstract:

This study addresses the challenge of hate speech detection in Tunisian Arabic text, a critical issue for online safety and moderation. Leveraging the strengths of the AraBERT model, we fine-tuned and evaluated its performance against the Bi-LSTM model across four distinct datasets: T-HSAB, TNHS, TUNIZI-Dataset, and a newly compiled dataset with diverse labels such as Offensive Language, Racism, and Religious Intolerance. Our experimental results demonstrate that AraBERT significantly outperforms Bi-LSTM in terms of Recall, Precision, F1-Score, and Accuracy across all datasets. The findings underline the robustness of AraBERT in capturing the nuanced features of Tunisian Arabic and its superior capability in classification tasks. This research not only advances the technology for hate speech detection but also provides practical implications for social media moderation and policy-making in Tunisia. Future work will focus on expanding the datasets and exploring more sophisticated architectures to further enhance detection accuracy, thus promoting safer online interactions.

Keywords: hate speech detection, Tunisian Arabic, AraBERT, Bi-LSTM, Gemini annotation tool, social media moderation

Procedia PDF Downloads 11
1924 Language Factor in the Formation of National and Cultural Identity of Kazakhstan

Authors: Andabayeva Dina, Avakova Raushangul, Kortabayeva Gulzhamal, Rakhymbay Bauyrzhan

Abstract:

This article attempts to give an overview of the language situation and language planning in Kazakhstan. Statistical data is given and excursion to history of languages in Kazakhstan is done. Particular emphasis is placed on the national- cultural component of the Kazakh people, namely the impact of the specificity of the Kazakh language on ethnic identity. Language is one of the basic aspects of national identity. Recently, in the Republic of Kazakhstan purposeful work on language development has been conducted. Optimal solution of language problems is a factor of interethnic relations harmonization, strengthening and consolidation of the peoples and public consent. Development of languages - one of the important directions of the state policy in the Republic of Kazakhstan. The problem of the state language, as part of national (civil) identification play a huge role in the successful integration process of Kazakh society. And quite rightly assume that one of the foundations of a new civic identity is knowing Kazakh language by all citizens of Kazakhstan. The article is an analysis of the language situation in Kazakhstan in close connection with the peculiarities of cultural identity.

Keywords: Kazakhstan, mentality, language policy, ethnolinguistics, language planning, language personality

Procedia PDF Downloads 634
1923 Robust Noisy Speech Identification Using Frame Classifier Derived Features

Authors: Punnoose A. K.

Abstract:

This paper presents an approach for identifying noisy speech recording using a multi-layer perception (MLP) trained to predict phonemes from acoustic features. Characteristics of the MLP posteriors are explored for clean speech and noisy speech at the frame level. Appropriate density functions are used to fit the softmax probability of the clean and noisy speech. A function that takes into account the ratio of the softmax probability density of noisy speech to clean speech is formulated. These phoneme independent scoring is weighted using a phoneme-specific weightage to make the scoring more robust. Simple thresholding is used to identify the noisy speech recording from the clean speech recordings. The approach is benchmarked on standard databases, with a focus on precision.

Keywords: noisy speech identification, speech pre-processing, noise robustness, feature engineering

Procedia PDF Downloads 126
1922 An Analysis of Illocutioary Act in Martin Luther King Jr.'s Propaganda Speech Entitled 'I Have a Dream'

Authors: Mahgfirah Firdaus Soberatta

Abstract:

Language cannot be separated from human life. Humans use language to convey ideas, thoughts, and feelings. We can use words for different things for example like asserted, advising, promise, give opinions, hopes, etc. Propaganda is an attempt which seeks to obtain stable behavior to adopt everyone to his everyday life. It also controls the thoughts and attitudes of individuals in social settings permanent. In this research, the writer will discuss about the speech act in a propaganda speech delivered by Martin Luther King Jr. in Washington at Lincoln Memorial on August 28, 1963. 'I Have a Dream' is a public speech delivered by American civil rights activist MLK, he calls from an end to racism in USA. In this research, the writer uses Searle theory to analyze the types of illocutionary speech act that used by Martin Luther King Jr. in his propaganda speech. In this research, the writer uses a qualitative method described in descriptive, because the research wants to describe and explain the types of illocutionary speech acts used by Martin Luther King Jr. in his propaganda speech. The findings indicate that there are five types of speech acts in Martin Luther King Jr. speech. MLK also used direct speech and indirect speech in his propaganda speech. However, direct speech is the dominant speech act that MLK used in his propaganda speech. It is hoped that this research is useful for the readers to enrich their knowledge in a particular field of pragmatic speech acts.

Keywords: speech act, propaganda, Martin Luther King Jr., speech

Procedia PDF Downloads 440
1921 The Online Advertising Speech that Effect to the Thailand Internet User Decision Making

Authors: Panprae Bunyapukkna

Abstract:

This study investigated figures of speech used in fragrance advertising captions on the Internet. The objectives of the study were to find out the frequencies of figures of speech in fragrance advertising captions and the types of figures of speech most commonly applied in captions. The relation between figures of speech and fragrance was also examined in order to analyze how figures of speech were used to represent fragrance. Thirty-five fragrance advertisements were randomly selected from the Internet. Content analysis was applied in order to consider the relation between figures of speech and fragrance. The results showed that figures of speech were found in almost every fragrance advertisement except one advertisement of Lancôme. Thirty-four fragrance advertising captions used at least one kind of figure of speech. Metaphor was most frequently found and also most frequently applied in fragrance advertising captions, followed by alliteration, rhyme, simile and personification, and hyperbole respectively.

Keywords: advertising speech, fragrance advertisements, figures of speech, metaphor

Procedia PDF Downloads 240
1920 Optimized Brain Computer Interface System for Unspoken Speech Recognition: Role of Wernicke Area

Authors: Nassib Abdallah, Pierre Chauvet, Abd El Salam Hajjar, Bassam Daya

Abstract:

In this paper, we propose an optimized brain computer interface (BCI) system for unspoken speech recognition, based on the fact that the constructions of unspoken words rely strongly on the Wernicke area, situated in the temporal lobe. Our BCI system has four modules: (i) the EEG Acquisition module based on a non-invasive headset with 14 electrodes; (ii) the Preprocessing module to remove noise and artifacts, using the Common Average Reference method; (iii) the Features Extraction module, using Wavelet Packet Transform (WPT); (iv) the Classification module based on a one-hidden layer artificial neural network. The present study consists of comparing the recognition accuracy of 5 Arabic words, when using all the headset electrodes or only the 4 electrodes situated near the Wernicke area, as well as the selection effect of the subbands produced by the WPT module. After applying the articial neural network on the produced database, we obtain, on the test dataset, an accuracy of 83.4% with all the electrodes and all the subbands of 8 levels of the WPT decomposition. However, by using only the 4 electrodes near Wernicke Area and the 6 middle subbands of the WPT, we obtain a high reduction of the dataset size, equal to approximately 19% of the total dataset, with 67.5% of accuracy rate. This reduction appears particularly important to improve the design of a low cost and simple to use BCI, trained for several words.

Keywords: brain-computer interface, speech recognition, artificial neural network, electroencephalography, EEG, wernicke area

Procedia PDF Downloads 269
1919 TeleMe Speech Booster: Web-Based Speech Therapy and Training Program for Children with Articulation Disorders

Authors: C. Treerattanaphan, P. Boonpramuk, P. Singla

Abstract:

Frequent, continuous speech training has proven to be a necessary part of a successful speech therapy process, but constraints of traveling time and employment dispensation become key obstacles especially for individuals living in remote areas or for dependent children who have working parents. In order to ameliorate speech difficulties with ample guidance from speech therapists, a website has been developed that supports speech therapy and training for people with articulation disorders in the standard Thai language. This web-based program has the ability to record speech training exercises for each speech trainee. The records will be stored in a database for the speech therapist to investigate, evaluate, compare and keep track of all trainees’ progress in detail. Speech trainees can request live discussions via video conference call when needed. Communication through this web-based program facilitates and reduces training time in comparison to walk-in training or appointments. This type of training also allows people with articulation disorders to practice speech lessons whenever or wherever is convenient for them, which can lead to a more regular training processes.

Keywords: web-based remote training program, Thai speech therapy, articulation disorders, speech booster

Procedia PDF Downloads 375
1918 Development of Non-Intrusive Speech Evaluation Measure Using S-Transform and Light-Gbm

Authors: Tusar Kanti Dash, Ganapati Panda

Abstract:

The evaluation of speech quality and intelligence is critical to the overall effectiveness of the Speech Enhancement Algorithms. Several intrusive and non-intrusive measures are employed to calculate these parameters. Non-Intrusive Evaluation is most challenging as, very often, the reference clean speech data is not available. In this paper, a novel non-intrusive speech evaluation measure is proposed using audio features derived from the Stockwell transform. These features are used with the Light Gradient Boosting Machine for the effective prediction of speech quality and intelligibility. The proposed model is analyzed using noisy and reverberant speech from four databases, and the results are compared with the standard Intrusive Evaluation Measures. It is observed from the comparative analysis that the proposed model is performing better than the standard Non-Intrusive models.

Keywords: non-Intrusive speech evaluation, S-transform, light GBM, speech quality, and intelligibility

Procedia PDF Downloads 258
1917 Annexation (Al-Iḍāfah) in Thariq bin Ziyad’s Speech

Authors: Annisa D. Febryandini

Abstract:

Annexation is a typical construction that commonly used in Arabic language. The use of the construction appears in Arabic speech such as the speech of Thariq bin Ziyad. The speech as one of the most famous speeches in the history of Islam uses many annexations. This qualitative research paper uses the secondary data by library method. Based on the data, this paper concludes that the speech has two basic structures with some variations and has some grammatical relationship. Different from the other researches that identify the speech in sociology field, the speech in this paper will be analyzed in linguistic field to take a look at the structure of its annexation as well as the grammatical relationship.

Keywords: annexation, Thariq bin Ziyad, grammatical relationship, Arabic syntax

Procedia PDF Downloads 317
1916 Blind Speech Separation Using SRP-PHAT Localization and Optimal Beamformer in Two-Speaker Environments

Authors: Hai Quang Hong Dam, Hai Ho, Minh Hoang Le Ngo

Abstract:

This paper investigates the problem of blind speech separation from the speech mixture of two speakers. A voice activity detector employing the Steered Response Power - Phase Transform (SRP-PHAT) is presented for detecting the activity information of speech sources and then the desired speech signals are extracted from the speech mixture by using an optimal beamformer. For evaluation, the algorithm effectiveness, a simulation using real speech recordings had been performed in a double-talk situation where two speakers are active all the time. Evaluations show that the proposed blind speech separation algorithm offers a good interference suppression level whilst maintaining a low distortion level of the desired signal.

Keywords: blind speech separation, voice activity detector, SRP-PHAT, optimal beamformer

Procedia PDF Downloads 282
1915 The Advancements of Transformer Models in Part-of-Speech Tagging System for Low-Resource Tigrinya Language

Authors: Shamm Kidane, Ibrahim Abdella, Fitsum Gaim, Simon Mulugeta, Sirak Asmerom, Natnael Ambasager, Yoel Ghebrihiwot

Abstract:

The call for natural language processing (NLP) systems for low-resource languages has become more apparent than ever in the past few years, with the arduous challenges still present in preparing such systems. This paper presents an improved dataset version of the Nagaoka Tigrinya Corpus for Parts-of-Speech (POS) classification system in the Tigrinya language. The size of the initial Nagaoka dataset was incremented, totaling the new tagged corpus to 118K tokens, which comprised the 12 basic POS annotations used previously. The additional content was also annotated manually in a stringent manner, followed similar rules to the former dataset and was formatted in CONLL format. The system made use of the novel approach in NLP tasks and use of the monolingually pre-trained TiELECTRA, TiBERT and TiRoBERTa transformer models. The highest achieved score is an impressive weighted F1-score of 94.2%, which surpassed the previous systems by a significant measure. The system will prove useful in the progress of NLP-related tasks for Tigrinya and similarly related low-resource languages with room for cross-referencing higher-resource languages.

Keywords: Tigrinya POS corpus, TiBERT, TiRoBERTa, conditional random fields

Procedia PDF Downloads 101
1914 Speech Impact Realization via Manipulative Argumentation Techniques in Modern American Political Discourse

Authors: Zarine Avetisyan

Abstract:

Paper presents the discussion of scholars concerning speech impact, peculiarities of its realization, speech strategies, and techniques. Departing from the viewpoints of many prominent linguists, the paper suggests manipulative argumentation be viewed as a most pervasive speech strategy with a certain set of techniques which are to be found in modern American political discourse. The precedence of their occurrence allows us to regard them as pragmatic patterns of speech impact realization in effective public speaking.

Keywords: speech impact, manipulative argumentation, political discourse, technique

Procedia PDF Downloads 508
1913 Speech Enhancement Using Kalman Filter in Communication

Authors: Eng. Alaa K. Satti Salih

Abstract:

Revolutions Applications such as telecommunications, hands-free communications, recording, etc. which need at least one microphone, the signal is usually infected by noise and echo. The important application is the speech enhancement, which is done to remove suppressed noises and echoes taken by a microphone, beside preferred speech. Accordingly, the microphone signal has to be cleaned using digital signal processing DSP tools before it is played out, transmitted, or stored. Engineers have so far tried different approaches to improving the speech by get back the desired speech signal from the noisy observations. Especially Mobile communication, so in this paper will do reconstruction of the speech signal, observed in additive background noise, using the Kalman filter technique to estimate the parameters of the Autoregressive Process (AR) in the state space model and the output speech signal obtained by the MATLAB. The accurate estimation by Kalman filter on speech would enhance and reduce the noise then compare and discuss the results between actual values and estimated values which produce the reconstructed signals.

Keywords: autoregressive process, Kalman filter, Matlab, noise speech

Procedia PDF Downloads 344
1912 Comparative Methods for Speech Enhancement and the Effects on Text-Independent Speaker Identification Performance

Authors: R. Ajgou, S. Sbaa, S. Ghendir, A. Chemsa, A. Taleb-Ahmed

Abstract:

The speech enhancement algorithm is to improve speech quality. In this paper, we review some speech enhancement methods and we evaluated their performance based on Perceptual Evaluation of Speech Quality scores (PESQ, ITU-T P.862). All method was evaluated in presence of different kind of noise using TIMIT database and NOIZEUS noisy speech corpus.. The noise was taken from the AURORA database and includes suburban train noise, babble, car, exhibition hall, restaurant, street, airport and train station noise. Simulation results showed improved performance of speech enhancement for Tracking of non-stationary noise approach in comparison with various methods in terms of PESQ measure. Moreover, we have evaluated the effects of the speech enhancement technique on Speaker Identification system based on autoregressive (AR) model and Mel-frequency Cepstral coefficients (MFCC).

Keywords: speech enhancement, pesq, speaker recognition, MFCC

Procedia PDF Downloads 423
1911 Freedom of Speech and Involvement in Hatred Speech on Social Media Networks

Authors: Sara Chinnasamy, Michelle Gun, M. Adnan Hashim

Abstract:

Federal Constitution guarantees Malaysians the right to free speech and expression; yet hatred speech can be commonly found on social media platforms such as Facebook, Twitter, and Instagram. In Malaysia social media sphere, most hatred speech involves religion, race and politics. Recent cases of racial attacks on social media have created social tensions among Malaysians. Many Malaysians always argue on their rights to freedom of speech. However, there are laws that limit their expression to the public and protecting social media users from being a victim of hate speech. This paper aims to explore the attitude and involvement of Malaysian netizens towards freedom of speech and hatred speech on social media. It also examines the relationship between involvement in hatred speech among Malaysian netizens and attitude towards freedom of speech. For most Malaysians, practicing total freedom of speech in the open is unthinkable. As a result, the best channel to articulate their feelings and opinions liberally is the internet. With the advent of the internet medium, more and more Malaysians are conveying their viewpoints using the various internet channels although sensitivity of the audience is seldom taken into account. Consequently, this situation has led to pockets of social disharmony among the citizens. Although this unhealthy activity is denounced by the authority, netizens are generally of the view that they have the right to write anything they want. Using the quantitative method, survey was conducted among Malaysians aged between 18 and 50 years who are active social media users. Results from the survey reveal that despite a weak relationship level between hatred speech involvement on social media and attitude towards freedom of speech, the association is still considerably significant. As such, it can be safely presumed that hatred speech on social media occurs due to the freedom of speech that exists by way of social media channels.

Keywords: freedom of speech, hatred speech, social media, Malaysia, netizens

Procedia PDF Downloads 455
1910 Possibilities, Challenges and the State of the Art of Automatic Speech Recognition in Air Traffic Control

Authors: Van Nhan Nguyen, Harald Holone

Abstract:

Over the past few years, a lot of research has been conducted to bring Automatic Speech Recognition (ASR) into various areas of Air Traffic Control (ATC), such as air traffic control simulation and training, monitoring live operators for with the aim of safety improvements, air traffic controller workload measurement and conducting analysis on large quantities controller-pilot speech. Due to the high accuracy requirements of the ATC context and its unique challenges, automatic speech recognition has not been widely adopted in this field. With the aim of providing a good starting point for researchers who are interested bringing automatic speech recognition into ATC, this paper gives an overview of possibilities and challenges of applying automatic speech recognition in air traffic control. To provide this overview, we present an updated literature review of speech recognition technologies in general, as well as specific approaches relevant to the ATC context. Based on this literature review, criteria for selecting speech recognition approaches for the ATC domain are presented, and remaining challenges and possible solutions are discussed.

Keywords: automatic speech recognition, asr, air traffic control, atc

Procedia PDF Downloads 398
1909 Minimum Data of a Speech Signal as Special Indicators of Identification in Phonoscopy

Authors: Nazaket Gazieva

Abstract:

Voice biometric data associated with physiological, psychological and other factors are widely used in forensic phonoscopy. There are various methods for identifying and verifying a person by voice. This article explores the minimum speech signal data as individual parameters of a speech signal. Monozygotic twins are believed to be genetically identical. Using the minimum data of the speech signal, we came to the conclusion that the voice imprint of monozygotic twins is individual. According to the conclusion of the experiment, we can conclude that the minimum indicators of the speech signal are more stable and reliable for phonoscopic examinations.

Keywords: phonogram, speech signal, temporal characteristics, fundamental frequency, biometric fingerprints

Procedia PDF Downloads 141
1908 Wolof Voice Response Recognition System: A Deep Learning Model for Wolof Audio Classification

Authors: Krishna Mohan Bathula, Fatou Bintou Loucoubar, FNU Kaleemunnisa, Christelle Scharff, Mark Anthony De Castro

Abstract:

Voice recognition algorithms such as automatic speech recognition and text-to-speech systems with African languages can play an important role in bridging the digital divide of Artificial Intelligence in Africa, contributing to the establishment of a fully inclusive information society. This paper proposes a Deep Learning model that can classify the user responses as inputs for an interactive voice response system. A dataset with Wolof language words ‘yes’ and ‘no’ is collected as audio recordings. A two stage Data Augmentation approach is adopted for enhancing the dataset size required by the deep neural network. Data preprocessing and feature engineering with Mel-Frequency Cepstral Coefficients are implemented. Convolutional Neural Networks (CNNs) have proven to be very powerful in image classification and are promising for audio processing when sounds are transformed into spectra. For performing voice response classification, the recordings are transformed into sound frequency feature spectra and then applied image classification methodology using a deep CNN model. The inference model of this trained and reusable Wolof voice response recognition system can be integrated with many applications associated with both web and mobile platforms.

Keywords: automatic speech recognition, interactive voice response, voice response recognition, wolof word classification

Procedia PDF Downloads 115
1907 Intervention of Self-Limiting L1 Inner Speech during L2 Presentations: A Study of Bangla-English Bilinguals

Authors: Abdul Wahid

Abstract:

Inner speech, also known as verbal thinking, self-talk or private speech, is characterized by the subjective language experience in the absence of overt or audible speech. It is a psychological form of verbal activity which is being rehearsed without the articulation of any sound wave. In Psychology, self-limiting speech means the type of speech which contains information that inhibits the development of the self. People, in most cases, experience inner speech in their first language. It is very frequent in Bangladesh where the Bangla (L1) speaking students lose track of speech during their presentations in English (L2). This paper investigates into the long pauses (more than 0.4 seconds long) in English (L2) presentations by Bangla speaking students (18-21 year old) and finds the intervention of Bangla (L1) inner speech as one of its causes. The overt speeches of the presenters are placed on Audacity Audio Editing software where the length of pauses are measured in milliseconds. Varieties of inner speech questionnaire (VISQ) have been conducted randomly amongst the participants out of whom 20 were selected who have similar phenomenology of inner speech. They have been interviewed to describe the type and content of the voices that went on in their head during the long pauses. The qualitative interview data are then codified and converted into quantitative data. It was observed that in more than 80% cases students experience self-limiting inner speech/self-talk during their unwanted pauses in L2 presentations.

Keywords: Bangla-English Bilinguals, inner speech, L1 intervention in bilingualism, motor schema, pauses, phonological loop, phonological store, working memory

Procedia PDF Downloads 151
1906 Performance Evaluation of Acoustic-Spectrographic Voice Identification Method in Native and Non-Native Speech

Authors: E. Krasnova, E. Bulgakova, V. Shchemelinin

Abstract:

The paper deals with acoustic-spectrographic voice identification method in terms of its performance in non-native language speech. Performance evaluation is conducted by comparing the result of the analysis of recordings containing native language speech with recordings that contain foreign language speech. Our research is based on Tajik and Russian speech of Tajik native speakers due to the character of the criminal situation with drug trafficking. We propose a pilot experiment that represents a primary attempt enter the field.

Keywords: speaker identification, acoustic-spectrographic method, non-native speech, performance evaluation

Procedia PDF Downloads 445
1905 Automatic Segmentation of the Clean Speech Signal

Authors: M. A. Ben Messaoud, A. Bouzid, N. Ellouze

Abstract:

Speech Segmentation is the measure of the change point detection for partitioning an input speech signal into regions each of which accords to only one speaker. In this paper, we apply two features based on multi-scale product (MP) of the clean speech, namely the spectral centroid of MP, and the zero crossings rate of MP. We focus on multi-scale product analysis as an important tool for segmentation extraction. The multi-scale product is based on making the product of the speech wavelet transform coefficients at three successive dyadic scales. We have evaluated our method on the Keele database. Experimental results show the effectiveness of our method presenting a good performance. It shows that the two simple features can find word boundaries, and extracted the segments of the clean speech.

Keywords: multiscale product, spectral centroid, speech segmentation, zero crossings rate

Procedia PDF Downloads 498
1904 Philosophical Foundations of Education at the Kazakh Languages by Aiding Communicative Methods

Authors: Duisenova Marzhan

Abstract:

This paper considers the looking from a philosophical point of view the interactive technology and tiered developing Kazakh language teaching primary school pupils through the method of linguistic communication, content and teaching methods formed in the education system. The values determined by the formation of new practical ways that could lead to a novel qualitative level and solving the problem. In the formation of the communicative competence of elementary school students would be to pay attention to other competencies. It helps to understand the motives and needs socialization of students, the development of their cognitive abilities and participate in language relations arising from different situations. Communicative competence is the potential of its own in pupils creative language activity. In this article, the Kazakh language teaching in primary school communicative method is presented. The purpose of learning communicative method, personal development, effective psychological development of the child, himself-education, expansion and growth of language skills and vocabulary, socialization of children, the adoption of the laws of life in the social environment, analyzed the development of vocabulary richness of the language that forms the erudition to ensure continued improvement of education of the child.

Keywords: communicative, culture, training, process, method, primary, competence

Procedia PDF Downloads 339