Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1462

Search results for: speech to text

1462 Programmed Speech to Text Summarization Using Graph-Based Algorithm

Authors: Hamsini Pulugurtha, P. V. S. L. Jagadamba

Abstract:

Programmed Speech to Text and Text Summarization Using Graph-based Algorithms can be utilized in gatherings to get the short depiction of the gathering for future reference. This gives signature check utilizing Siamese neural organization to confirm the personality of the client and convert the client gave sound record which is in English into English text utilizing the discourse acknowledgment bundle given in python. At times just the outline of the gathering is required, the answer for this text rundown. Thus, the record is then summed up utilizing the regular language preparing approaches, for example, solo extractive text outline calculations

Keywords: Siamese neural network, English speech, English text, natural language processing, unsupervised extractive text summarization

Procedia PDF Downloads 88
1461 Exploratory Analysis of A Review of Nonexistence Polarity in Native Speech

Authors: Deawan Rakin Ahamed Remal, Sinthia Chowdhury, Sharun Akter Khushbu, Sheak Rashed Haider Noori

Abstract:

Native Speech to text synthesis has its own leverage for the purpose of mankind. The extensive nature of art to speaking different accents is common but the purpose of communication between two different accent types of people is quite difficult. This problem will be motivated by the extraction of the wrong perception of language meaning. Thus, many existing automatic speech recognition has been placed to detect text. Overall study of this paper mentions a review of NSTTR (Native Speech Text to Text Recognition) synthesis compared with Text to Text recognition. Review has exposed many text to text recognition systems that are at a very early stage to comply with the system by native speech recognition. Many discussions started about the progression of chatbots, linguistic theory another is rule based approach. In the Recent years Deep learning is an overwhelming chapter for text to text learning to detect language nature. To the best of our knowledge, In the sub continent a huge number of people speak in Bangla language but they have different accents in different regions therefore study has been elaborate contradictory discussion achievement of existing works and findings of future needs in Bangla language acoustic accent.

Keywords: TTR, NSTTR, text to text recognition, deep learning, natural language processing

Procedia PDF Downloads 18
1460 Grammatical and Lexical Cohesion in the Japan’s Prime Minister Shinzo Abe’s Speech Text ‘Nihon wa Modottekimashita’

Authors: Nadya Inda Syartanti

Abstract:

This research aims to identify, classify, and analyze descriptively the aspects of grammatical and lexical cohesion in the speech text of Japan’s Prime Minister Shinzo Abe entitled Nihon wa Modotte kimashita delivered in Washington DC, the United States on February 23, 2013, as a research data source. The method used is qualitative research, which uses descriptions through words that are applied by analyzing aspects of grammatical and lexical cohesion proposed by Halliday and Hasan (1976). The aspects of grammatical cohesion consist of references (personal, demonstrative, interrogative pronouns), substitution, ellipsis, and conjunction. In contrast, lexical cohesion consists of reiteration (repetition, synonym, antonym, hyponym, meronym) and collocation. Data classification is based on the 6 aspects of the cohesion. Through some aspects of cohesion, this research tries to find out the frequency of using grammatical and lexical cohesion in Shinzo Abe's speech text entitled Nihon wa Modotte kimashita. The results of this research are expected to help overcome the difficulty of understanding speech texts in Japanese. Therefore, this research can be a reference for learners, researchers, and anyone who is interested in the field of discourse analysis.

Keywords: cohesion, grammatical cohesion, lexical cohesion, speech text, Shinzo Abe

Procedia PDF Downloads 68
1459 Google Translate: AI Application

Authors: Shaima Almalhan, Lubna Shukri, Miriam Talal, Safaa Teskieh

Abstract:

Since artificial intelligence is a rapidly evolving topic that has had a significant impact on technical growth and innovation, this paper examines people's awareness, use, and engagement with the Google Translate application. To see how familiar aware users are with the app and its features, quantitative and qualitative research was conducted. The findings revealed that consumers have a high level of confidence in the application and how far people they benefit from this sort of innovation and how convenient it makes communication.

Keywords: artificial intelligence, google translate, speech recognition, language translation, camera translation, speech to text, text to speech

Procedia PDF Downloads 56
1458 Systemic Functional Grammar Analysis of Barack Obama's Second Term Inaugural Speech

Authors: Sadiq Aminu, Ahmed Lamido

Abstract:

This research studies Barack Obama’s second inaugural speech using Halliday’s Systemic Functional Grammar (SFG). SFG is a text grammar which describes how language is used, so that the meaning of the text can be better understood. The primary source of data in this research work is Barack Obama’s second inaugural speech which was obtained from the internet. The analysis of the speech was based on the ideational and textual metafunctions of Systemic Functional Grammar. Specifically, the researcher analyses the Process Types and Participants (ideational) and the Theme/Rheme (textual). It was found that material process (process of doing) was the most frequently used ‘Process type’ and ‘We’ which refers to the people of America was the frequently used ‘Theme’. Application of the SFG theory, therefore, gives a better meaning to Barack Obama’s speech.

Keywords: ideational, metafunction, rheme, textual, theme

Procedia PDF Downloads 69
1457 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using Hidden Markov Model and Viterbi algorithm. From the Nepali text, annotated corpus training and testing data set are randomly separated. Both methods are employed on the data sets. Viterbi algorithm is found to be computationally faster and accurate as compared to HMM. The accuracy of 95.43% is achieved using Viterbi algorithm. Error analysis where the mismatches took place is elaborately discussed.

Keywords: hidden markov model, natural language processing, POS tagging, viterbi algorithm

Procedia PDF Downloads 259
1456 Hindi Speech Synthesis by Concatenation of Recognized Hand Written Devnagri Script Using Support Vector Machines Classifier

Authors: Saurabh Farkya, Govinda Surampudi

Abstract:

Optical Character Recognition is one of the current major research areas. This paper is focussed on recognition of Devanagari script and its sound generation. This Paper consists of two parts. First, Optical Character Recognition of Devnagari handwritten Script. Second, speech synthesis of the recognized text. This paper shows an implementation of support vector machines for the purpose of Devnagari Script recognition. The Support Vector Machines was trained with Multi Domain features; Transform Domain and Spatial Domain or Structural Domain feature. Transform Domain includes the wavelet feature of the character. Structural Domain consists of Distance Profile feature and Gradient feature. The Segmentation of the text document has been done in 3 levels-Line Segmentation, Word Segmentation, and Character Segmentation. The pre-processing of the characters has been done with the help of various Morphological operations-Otsu's Algorithm, Erosion, Dilation, Filtration and Thinning techniques. The Algorithm was tested on the self-prepared database, a collection of various handwriting. Further, Unicode was used to convert recognized Devnagari text into understandable computer document. The document so obtained is an array of codes which was used to generate digitized text and to synthesize Hindi speech. Phonemes from the self-prepared database were used to generate the speech of the scanned document using concatenation technique.

Keywords: Character Recognition (OCR), Text to Speech (TTS), Support Vector Machines (SVM), Library of Support Vector Machines (LIBSVM)

Procedia PDF Downloads 414
1455 Comparative Methods for Speech Enhancement and the Effects on Text-Independent Speaker Identification Performance

Authors: R. Ajgou, S. Sbaa, S. Ghendir, A. Chemsa, A. Taleb-Ahmed

Abstract:

The speech enhancement algorithm is to improve speech quality. In this paper, we review some speech enhancement methods and we evaluated their performance based on Perceptual Evaluation of Speech Quality scores (PESQ, ITU-T P.862). All method was evaluated in presence of different kind of noise using TIMIT database and NOIZEUS noisy speech corpus.. The noise was taken from the AURORA database and includes suburban train noise, babble, car, exhibition hall, restaurant, street, airport and train station noise. Simulation results showed improved performance of speech enhancement for Tracking of non-stationary noise approach in comparison with various methods in terms of PESQ measure. Moreover, we have evaluated the effects of the speech enhancement technique on Speaker Identification system based on autoregressive (AR) model and Mel-frequency Cepstral coefficients (MFCC).

Keywords: speech enhancement, pesq, speaker recognition, MFCC

Procedia PDF Downloads 332
1454 Intonation Salience as an Underframe to Text Intonation Models

Authors: Tatiana Stanchuliak

Abstract:

It is common knowledge that intonation is not laid over a ready text. On the contrary, intonation forms and accompanies the text on the level of its birth in the speaker’s mind. As a result, intonation plays one of the fundamental roles in the process of transferring a thought into external speech. Intonation structure can highlight the semantic significance of textual elements and become a ranging mark in understanding the information structure of the text. Intonation functions by means of prosodic characteristics, one of which is intonation salience, whose function in texts results in making some textual elements more prominent than others. This function of intonation, therefore, performs as organizing. It helps to form the frame of key elements of the text. The study under consideration made an attempt to look into the inner nature of salience and create a sort of a text intonation model. This general goal brought to some more specific intermediate results. First, there were established degrees of salience on the level of the smallest semantic element - intonation group, as well as prosodic means of creating salience, were examined. Second, the most frequent combinations of prosodic means made it possible to distinguish patterns of salience, which then became constituent elements of a text intonation model. Third, the analysis of the predicate structure allowed to divide the whole text into smaller parts, or units, which performed a specific function in the developing of the general communicative intention. It appeared that such units can be found in any text and they have common characteristics of their intonation arrangement. These findings are certainly very important both for the theory of intonation and their practical application.

Keywords: accentuation , inner speech, intention, intonation, intonation functions, models, patterns, predicate, salience, semantics, sentence stress, text

Procedia PDF Downloads 186
1453 Searching Linguistic Synonyms through Parts of Speech Tagging

Authors: Faiza Hussain, Usman Qamar

Abstract:

Synonym-based searching is recognized to be a complicated problem as text mining from unstructured data of web is challenging. Finding useful information which matches user need from bulk of web pages is a cumbersome task. In this paper, a novel and practical synonym retrieval technique is proposed for addressing this problem. For replacement of semantics, user intent is taken into consideration to realize the technique. Parts-of-Speech tagging is applied for pattern generation of the query and a thesaurus for this experiment was formed and used. Comparison with Non-Context Based Searching, Context Based searching proved to be a more efficient approach while dealing with linguistic semantics. This approach is very beneficial in doing intent based searching. Finally, results and future dimensions are presented.

Keywords: natural language processing, text mining, information retrieval, parts-of-speech tagging, grammar, semantics

Procedia PDF Downloads 213
1452 Morpheme Based Parts of Speech Tagger for Kannada Language

Authors: M. C. Padma, R. J. Prathibha

Abstract:

Parts of speech tagging is the process of assigning appropriate parts of speech tags to the words in a given text. The critical or crucial information needed for tagging a word come from its internal structure rather from its neighboring words. The internal structure of a word comprises of its morphological features and grammatical information. This paper presents a morpheme based parts of speech tagger for Kannada language. This proposed work uses hierarchical tag set for assigning tags. The system is tested on some Kannada words taken from EMILLE corpus. Experimental result shows that the performance of the proposed system is above 90%.

Keywords: hierarchical tag set, morphological analyzer, natural language processing, paradigms, parts of speech

Procedia PDF Downloads 208
1451 Robust Noisy Speech Identification Using Frame Classifier Derived Features

Authors: Punnoose A. K.

Abstract:

This paper presents an approach for identifying noisy speech recording using a multi-layer perception (MLP) trained to predict phonemes from acoustic features. Characteristics of the MLP posteriors are explored for clean speech and noisy speech at the frame level. Appropriate density functions are used to fit the softmax probability of the clean and noisy speech. A function that takes into account the ratio of the softmax probability density of noisy speech to clean speech is formulated. These phoneme independent scoring is weighted using a phoneme-specific weightage to make the scoring more robust. Simple thresholding is used to identify the noisy speech recording from the clean speech recordings. The approach is benchmarked on standard databases, with a focus on precision.

Keywords: noisy speech identification, speech pre-processing, noise robustness, feature engineering

Procedia PDF Downloads 9
1450 An Analysis of Illocutioary Act in Martin Luther King Jr.'s Propaganda Speech Entitled 'I Have a Dream'

Authors: Mahgfirah Firdaus Soberatta

Abstract:

Language cannot be separated from human life. Humans use language to convey ideas, thoughts, and feelings. We can use words for different things for example like asserted, advising, promise, give opinions, hopes, etc. Propaganda is an attempt which seeks to obtain stable behavior to adopt everyone to his everyday life. It also controls the thoughts and attitudes of individuals in social settings permanent. In this research, the writer will discuss about the speech act in a propaganda speech delivered by Martin Luther King Jr. in Washington at Lincoln Memorial on August 28, 1963. 'I Have a Dream' is a public speech delivered by American civil rights activist MLK, he calls from an end to racism in USA. In this research, the writer uses Searle theory to analyze the types of illocutionary speech act that used by Martin Luther King Jr. in his propaganda speech. In this research, the writer uses a qualitative method described in descriptive, because the research wants to describe and explain the types of illocutionary speech acts used by Martin Luther King Jr. in his propaganda speech. The findings indicate that there are five types of speech acts in Martin Luther King Jr. speech. MLK also used direct speech and indirect speech in his propaganda speech. However, direct speech is the dominant speech act that MLK used in his propaganda speech. It is hoped that this research is useful for the readers to enrich their knowledge in a particular field of pragmatic speech acts.

Keywords: speech act, propaganda, Martin Luther King Jr., speech

Procedia PDF Downloads 310
1449 Unsupervised Part-of-Speech Tagging for Amharic Using K-Means Clustering

Authors: Zelalem Fantahun

Abstract:

Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word into naturally occurring text. Part-of-speech tagging is the most fundamental and basic task almost in all natural language processing. In natural language processing, the problem of providing large amount of manually annotated data is a knowledge acquisition bottleneck. Since, Amharic is one of under-resourced language, the availability of tagged corpus is the bottleneck problem for natural language processing especially for POS tagging. A promising direction to tackle this problem is to provide a system that does not require manually tagged data. In unsupervised learning, the learner is not provided with classifications. Unsupervised algorithms seek out similarity between pieces of data in order to determine whether they can be characterized as forming a group. This paper explicates the development of unsupervised part-of-speech tagger using K-Means clustering for Amharic language since large amount of data is produced in day-to-day activities. In the development of the tagger, the following procedures are followed. First, the unlabeled data (raw text) is divided into 10 folds and tokenization phase takes place; at this level, the raw text is chunked at sentence level and then into words. The second phase is feature extraction which includes word frequency, syntactic and morphological features of a word. The third phase is clustering. Among different clustering algorithms, K-means is selected and implemented in this study that brings group of similar words together. The fourth phase is mapping, which deals with looking at each cluster carefully and the most common tag is assigned to a group. This study finds out two features that are capable of distinguishing one part-of-speech from others these are morphological feature and positional information and show that it is possible to use unsupervised learning for Amharic POS tagging. In order to increase performance of the unsupervised part-of-speech tagger, there is a need to incorporate other features that are not included in this study, such as semantic related information. Finally, based on experimental result, the performance of the system achieves a maximum of 81% accuracy.

Keywords: POS tagging, Amharic, unsupervised learning, k-means

Procedia PDF Downloads 314
1448 An Early Attempt of Artificial Intelligence-Assisted Language Oral Practice and Assessment

Authors: Paul Lam, Kevin Wong, Chi Him Chan

Abstract:

Constant practicing and accurate, immediate feedback are the keys to improving students’ speaking skills. However, traditional oral examination often fails to provide such opportunities to students. The traditional, face-to-face oral assessment is often time consuming – attending the oral needs of one student often leads to the negligence of others. Hence, teachers can only provide limited opportunities and feedback to students. Moreover, students’ incentive to practice is also reduced by their anxiety and shyness in speaking the new language. A mobile app was developed to use artificial intelligence (AI) to provide immediate feedback to students’ speaking performance as an attempt to solve the above-mentioned problems. Firstly, it was thought that online exercises would greatly increase the learning opportunities of students as they can now practice more without the needs of teachers’ presence. Secondly, the automatic feedback provided by the AI would enhance students’ motivation to practice as there is an instant evaluation of their performance. Lastly, students should feel less anxious and shy compared to directly practicing oral in front of teachers. Technically, the program made use of speech-to-text functions to generate feedback to students. To be specific, the software analyzes students’ oral input through certain speech-to-text AI engine and then cleans up the results further to the point that can be compared with the targeted text. The mobile app has invited English teachers for the pilot use and asked for their feedback. Preliminary trials indicated that the approach has limitations. Many of the users’ pronunciation were automatically corrected by the speech recognition function as wise guessing is already integrated into many of such systems. Nevertheless, teachers have confidence that the app can be further improved for accuracy. It has the potential to significantly improve oral drilling by giving students more chances to practice. Moreover, they believe that the success of this mobile app confirms the potential to extend the AI-assisted assessment to other language skills, such as writing, reading, and listening.

Keywords: artificial Intelligence, mobile learning, oral assessment, oral practice, speech-to-text function

Procedia PDF Downloads 40
1447 Extraction of Text Subtitles in Multimedia Systems

Authors: Amarjit Singh

Abstract:

In this paper, a method for extraction of text subtitles in large video is proposed. The video data needs to be annotated for many multimedia applications. Text is incorporated in digital video for the motive of providing useful information about that video. So need arises to detect text present in video to understanding and video indexing. This is achieved in two steps. First step is text localization and the second step is text verification. The method of text detection can be extended to text recognition which finds applications in automatic video indexing; video annotation and content based video retrieval. The method has been tested on various types of videos.

Keywords: video, subtitles, extraction, annotation, frames

Procedia PDF Downloads 503
1446 Extracting Actions with Improved Part of Speech Tagging for Social Networking Texts

Authors: Yassine Jamoussi, Ameni Youssfi, Henda Ben Ghezala

Abstract:

With the growing interest in social networking, the interaction of social actors evolved to a source of knowledge in which it becomes possible to perform context aware-reasoning. The information extraction from social networking especially Twitter and Facebook is one of the problems in this area. To extract text from social networking, we need several lexical features and large scale word clustering. We attempt to expand existing tokenizer and to develop our own tagger in order to support the incorrect words currently in existence in Facebook and Twitter. Our goal in this work is to benefit from the lexical features developed for Twitter and online conversational text in previous works, and to develop an extraction model for constructing a huge knowledge based on actions

Keywords: social networking, information extraction, part-of-speech tagging, natural language processing

Procedia PDF Downloads 224
1445 Urdu Text Extraction Method from Images

Authors: Samabia Tehsin, Sumaira Kausar

Abstract:

Due to the vast increase in the multimedia data in recent years, efficient and robust retrieval techniques are needed to retrieve and index images/ videos. Text embedded in the images can serve as the strong retrieval tool for images. This is the reason that text extraction is an area of research with increasing attention. English text extraction is the focus of many researchers but very less work has been done on other languages like Urdu. This paper is focusing on Urdu text extraction from video frames. This paper presents a text detection feature set, which has the ability to deal up with most of the problems connected with the text extraction process. To test the validity of the method, it is tested on Urdu news dataset, which gives promising results.

Keywords: caption text, content-based image retrieval, document analysis, text extraction

Procedia PDF Downloads 407
1444 The Online Advertising Speech that Effect to the Thailand Internet User Decision Making

Authors: Panprae Bunyapukkna

Abstract:

This study investigated figures of speech used in fragrance advertising captions on the Internet. The objectives of the study were to find out the frequencies of figures of speech in fragrance advertising captions and the types of figures of speech most commonly applied in captions. The relation between figures of speech and fragrance was also examined in order to analyze how figures of speech were used to represent fragrance. Thirty-five fragrance advertisements were randomly selected from the Internet. Content analysis was applied in order to consider the relation between figures of speech and fragrance. The results showed that figures of speech were found in almost every fragrance advertisement except one advertisement of Lancôme. Thirty-four fragrance advertising captions used at least one kind of figure of speech. Metaphor was most frequently found and also most frequently applied in fragrance advertising captions, followed by alliteration, rhyme, simile and personification, and hyperbole respectively.

Keywords: advertising speech, fragrance advertisements, figures of speech, metaphor

Procedia PDF Downloads 167
1443 Pragmatic Survey of Precedence as Linguistic 'Déjà Vu' in Political Text and Talk

Authors: Zarine Avetisyan

Abstract:

Both in language and literature there exists the theory of recurrence of text and talk chunks which brings us to the notion of precedence. It must be stated that precedence as a pragma-linguistic phenomenon is yet underknown and it is the main objective of the present research to revisit and reveal it thoroughly. In line with the main research objective, analysis of political text and talk provides abundant relevant data for the illustration of the phenomenon of precedence. The analysis focuses on certain pragmatic universals (e.g. intention) and categories (e.g. speech techniques) which lead to the disclosure of the present object of study.

Keywords: intention, precedence, political discourse, pragmatic universals

Procedia PDF Downloads 341
1442 TeleMe Speech Booster: Web-Based Speech Therapy and Training Program for Children with Articulation Disorders

Authors: C. Treerattanaphan, P. Boonpramuk, P. Singla

Abstract:

Frequent, continuous speech training has proven to be a necessary part of a successful speech therapy process, but constraints of traveling time and employment dispensation become key obstacles especially for individuals living in remote areas or for dependent children who have working parents. In order to ameliorate speech difficulties with ample guidance from speech therapists, a website has been developed that supports speech therapy and training for people with articulation disorders in the standard Thai language. This web-based program has the ability to record speech training exercises for each speech trainee. The records will be stored in a database for the speech therapist to investigate, evaluate, compare and keep track of all trainees’ progress in detail. Speech trainees can request live discussions via video conference call when needed. Communication through this web-based program facilitates and reduces training time in comparison to walk-in training or appointments. This type of training also allows people with articulation disorders to practice speech lessons whenever or wherever is convenient for them, which can lead to a more regular training processes.

Keywords: web-based remote training program, Thai speech therapy, articulation disorders, speech booster

Procedia PDF Downloads 281
1441 Development of Non-Intrusive Speech Evaluation Measure Using S-Transform and Light-Gbm

Authors: Tusar Kanti Dash, Ganapati Panda

Abstract:

The evaluation of speech quality and intelligence is critical to the overall effectiveness of the Speech Enhancement Algorithms. Several intrusive and non-intrusive measures are employed to calculate these parameters. Non-Intrusive Evaluation is most challenging as, very often, the reference clean speech data is not available. In this paper, a novel non-intrusive speech evaluation measure is proposed using audio features derived from the Stockwell transform. These features are used with the Light Gradient Boosting Machine for the effective prediction of speech quality and intelligibility. The proposed model is analyzed using noisy and reverberant speech from four databases, and the results are compared with the standard Intrusive Evaluation Measures. It is observed from the comparative analysis that the proposed model is performing better than the standard Non-Intrusive models.

Keywords: non-Intrusive speech evaluation, S-transform, light GBM, speech quality, and intelligibility

Procedia PDF Downloads 118
1440 Small Text Extraction From Documents and Chart Images

Authors: Rominkumar Busa, Shahira K. C., Lijiya A.

Abstract:

Text recognition is an important area in computer vision which deals with detecting and recognising text from an image. The Optical Character Recognition (OCR) is a saturated area these days and with very good text recognition accuracy. However the same OCR methods when applied on text with small font sizes like the text data of chart images, the recognition rate is less than 30\%. In this work, aims to extract small text in images using the deep learning model, CRNN with CTC loss. The text recognition accuracy is found to improve by applying image enhancement by super resolution prior to CRNN model. We also observe the text recognition rate further increases by 18\% by applying the proposed method, which involves super resolution and character segmentation followed by CRNN with CTC loss. The efficiency of the proposed method shows that further pre-processing on chart image text and other small text images will improve the accuracy further, thereby helping text extraction from chart images.

Keywords: small text extraction, OCR, scene text recognition, CRNN

Procedia PDF Downloads 2
1439 Annexation (Al-Iḍāfah) in Thariq bin Ziyad’s Speech

Authors: Annisa D. Febryandini

Abstract:

Annexation is a typical construction that commonly used in Arabic language. The use of the construction appears in Arabic speech such as the speech of Thariq bin Ziyad. The speech as one of the most famous speeches in the history of Islam uses many annexations. This qualitative research paper uses the secondary data by library method. Based on the data, this paper concludes that the speech has two basic structures with some variations and has some grammatical relationship. Different from the other researches that identify the speech in sociology field, the speech in this paper will be analyzed in linguistic field to take a look at the structure of its annexation as well as the grammatical relationship.

Keywords: annexation, Thariq bin Ziyad, grammatical relationship, Arabic syntax

Procedia PDF Downloads 197
1438 Blind Speech Separation Using SRP-PHAT Localization and Optimal Beamformer in Two-Speaker Environments

Authors: Hai Quang Hong Dam, Hai Ho, Minh Hoang Le Ngo

Abstract:

This paper investigates the problem of blind speech separation from the speech mixture of two speakers. A voice activity detector employing the Steered Response Power - Phase Transform (SRP-PHAT) is presented for detecting the activity information of speech sources and then the desired speech signals are extracted from the speech mixture by using an optimal beamformer. For evaluation, the algorithm effectiveness, a simulation using real speech recordings had been performed in a double-talk situation where two speakers are active all the time. Evaluations show that the proposed blind speech separation algorithm offers a good interference suppression level whilst maintaining a low distortion level of the desired signal.

Keywords: blind speech separation, voice activity detector, SRP-PHAT, optimal beamformer

Procedia PDF Downloads 210
1437 Human Computer Interaction Using Computer Vision and Speech Processing

Authors: Shreyansh Jain Jeetmal, Shobith P. Chadaga, Shreyas H. Srinivas

Abstract:

Internet of Things (IoT) is seen as the next major step in the ongoing revolution in the Information Age. It is predicted that in the near future billions of embedded devices will be communicating with each other to perform a plethora of tasks with or without human intervention. One of the major ongoing hotbed of research activity in IoT is Human Computer Interaction (HCI). HCI is used to facilitate communication between an intelligent system and a user. An intelligent system typically comprises of a system consisting of various sensors, actuators and embedded controllers which communicate with each other to monitor data collected from the environment. Communication by the user to the system is typically done using voice. One of the major ongoing applications of HCI is in home automation as a personal assistant. The prime objective of our project is to implement a use case of HCI for home automation. Our system is designed to detect and recognize the users and personalize the appliances in the house according to their individual preferences. Our HCI system is also capable of speaking with the user when certain commands are spoken such as searching on the web for information and controlling appliances. Our system can also monitor the environment in the house such as air quality and gas leakages for added safety.

Keywords: human computer interaction, internet of things, computer vision, sensor networks, speech to text, text to speech, android

Procedia PDF Downloads 284
1436 OCR/ICR Text Recognition Using ABBYY FineReader as an Example Text

Authors: A. R. Bagirzade, A. Sh. Najafova, S. M. Yessirkepova, E. S. Albert

Abstract:

This article describes a text recognition method based on Optical Character Recognition (OCR). The features of the OCR method were examined using the ABBYY FineReader program. It describes automatic text recognition in images. OCR is necessary because optical input devices can only transmit raster graphics as a result. Text recognition describes the task of recognizing letters shown as such, to identify and assign them an assigned numerical value in accordance with the usual text encoding (ASCII, Unicode). The peculiarity of this study conducted by the authors using the example of the ABBYY FineReader, was confirmed and shown in practice, the improvement of digital text recognition platforms developed by Electronic Publication.

Keywords: ABBYY FineReader system, algorithm symbol recognition, OCR/ICR techniques, recognition technologies

Procedia PDF Downloads 68
1435 Speech Impact Realization via Manipulative Argumentation Techniques in Modern American Political Discourse

Authors: Zarine Avetisyan

Abstract:

Paper presents the discussion of scholars concerning speech impact, peculiarities of its realization, speech strategies, and techniques. Departing from the viewpoints of many prominent linguists, the paper suggests manipulative argumentation be viewed as a most pervasive speech strategy with a certain set of techniques which are to be found in modern American political discourse. The precedence of their occurrence allows us to regard them as pragmatic patterns of speech impact realization in effective public speaking.

Keywords: speech impact, manipulative argumentation, political discourse, technique

Procedia PDF Downloads 421
1434 Speech Enhancement Using Kalman Filter in Communication

Authors: Eng. Alaa K. Satti Salih

Abstract:

Revolutions Applications such as telecommunications, hands-free communications, recording, etc. which need at least one microphone, the signal is usually infected by noise and echo. The important application is the speech enhancement, which is done to remove suppressed noises and echoes taken by a microphone, beside preferred speech. Accordingly, the microphone signal has to be cleaned using digital signal processing DSP tools before it is played out, transmitted, or stored. Engineers have so far tried different approaches to improving the speech by get back the desired speech signal from the noisy observations. Especially Mobile communication, so in this paper will do reconstruction of the speech signal, observed in additive background noise, using the Kalman filter technique to estimate the parameters of the Autoregressive Process (AR) in the state space model and the output speech signal obtained by the MATLAB. The accurate estimation by Kalman filter on speech would enhance and reduce the noise then compare and discuss the results between actual values and estimated values which produce the reconstructed signals.

Keywords: autoregressive process, Kalman filter, Matlab, noise speech

Procedia PDF Downloads 245
1433 ViraPart: A Text Refinement Framework for Automatic Speech Recognition and Natural Language Processing Tasks in Persian

Authors: Narges Farokhshad, Milad Molazadeh, Saman Jamalabbasi, Hamed Babaei Giglou, Saeed Bibak

Abstract:

The Persian language is an inflectional subject-object-verb language. This fact makes Persian a more uncertain language. However, using techniques such as Zero-Width Non-Joiner (ZWNJ) recognition, punctuation restoration, and Persian Ezafe construction will lead us to a more understandable and precise language. In most of the works in Persian, these techniques are addressed individually. Despite that, we believe that for text refinement in Persian, all of these tasks are necessary. In this work, we proposed a ViraPart framework that uses embedded ParsBERT in its core for text clarifications. First, used the BERT variant for Persian followed by a classifier layer for classification procedures. Next, we combined models outputs to output cleartext. In the end, the proposed model for ZWNJ recognition, punctuation restoration, and Persian Ezafe construction performs the averaged F1 macro scores of 96.90%, 92.13%, and 98.50%, respectively. Experimental results show that our proposed approach is very effective in text refinement for the Persian language.

Keywords: Persian Ezafe, punctuation, ZWNJ, NLP, ParsBERT, transformers

Procedia PDF Downloads 90