Search results for: intelligent speech interface
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2824

Search results for: intelligent speech interface

2794 Speech Detection Model Based on Deep Neural Networks Classifier for Speech Emotions Recognition

Authors: A. Shoiynbek, K. Kozhakhmet, P. Menezes, D. Kuanyshbay, D. Bayazitov

Abstract:

Speech emotion recognition has received increasing research interest all through current years. There was used emotional speech that was collected under controlled conditions in most research work. Actors imitating and artificially producing emotions in front of a microphone noted those records. There are four issues related to that approach, namely, (1) emotions are not natural, and it means that machines are learning to recognize fake emotions. (2) Emotions are very limited by quantity and poor in their variety of speaking. (3) There is language dependency on SER. (4) Consequently, each time when researchers want to start work with SER, they need to find a good emotional database on their language. In this paper, we propose the approach to create an automatic tool for speech emotion extraction based on facial emotion recognition and describe the sequence of actions of the proposed approach. One of the first objectives of the sequence of actions is a speech detection issue. The paper gives a detailed description of the speech detection model based on a fully connected deep neural network for Kazakh and Russian languages. Despite the high results in speech detection for Kazakh and Russian, the described process is suitable for any language. To illustrate the working capacity of the developed model, we have performed an analysis of speech detection and extraction from real tasks.

Keywords: deep neural networks, speech detection, speech emotion recognition, Mel-frequency cepstrum coefficients, collecting speech emotion corpus, collecting speech emotion dataset, Kazakh speech dataset

Procedia PDF Downloads 74
2793 The Influence of Advertising Captions on the Internet through the Consumer Purchasing Decision

Authors: Suwimol Apapol, Punrapha Praditpong

Abstract:

The objectives of the study were to find out the frequencies of figures of speech in fragrance advertising captions as well as the types of figures of speech most commonly applied in captions. The relation between figures of speech and fragrance was also examined in order to analyze how figures of speech were used to represent fragrance. Thirty-five fragrance advertisements were randomly selected from the Internet. Content analysis was applied in order to consider the relation between figures of speech and fragrance. The results showed that figures of speech were found in almost every fragrance advertisement except one advertisement of several Goods service. Thirty-four fragrance advertising captions used at least one kind of figure of speech. Metaphor was most frequently found and also most frequently applied in fragrance advertising captions, followed by alliteration, rhyme, simile and personification, and hyperbole respectively which is in harmony with the research hypotheses as well.

Keywords: advertising captions, captions on internet, consumer purchasing decision, e-commerce

Procedia PDF Downloads 247
2792 A Novel RLS Based Adaptive Filtering Method for Speech Enhancement

Authors: Pogula Rakesh, T. Kishore Kumar

Abstract:

Speech enhancement is a long standing problem with numerous applications like teleconferencing, VoIP, hearing aids, and speech recognition. The motivation behind this research work is to obtain a clean speech signal of higher quality by applying the optimal noise cancellation technique. Real-time adaptive filtering algorithms seem to be the best candidate among all categories of the speech enhancement methods. In this paper, we propose a speech enhancement method based on Recursive Least Squares (RLS) adaptive filter of speech signals. Experiments were performed on noisy data which was prepared by adding AWGN, Babble and Pink noise to clean speech samples at -5dB, 0dB, 5dB, and 10dB SNR levels. We then compare the noise cancellation performance of proposed RLS algorithm with existing NLMS algorithm in terms of Mean Squared Error (MSE), Signal to Noise ratio (SNR), and SNR loss. Based on the performance evaluation, the proposed RLS algorithm was found to be a better optimal noise cancellation technique for speech signals.

Keywords: adaptive filter, adaptive noise canceller, mean squared error, noise reduction, NLMS, RLS, SNR, SNR loss

Procedia PDF Downloads 450
2791 An Analysis of OpenSim Graphical User Interface Effectiveness

Authors: Sina Saadati

Abstract:

OpenSim is a well-known software in biomechanical studies. There are worthy algorithms developed in this program which are used for modeling and simulation of human motions. In this research, we analyze the OpenSim application from the computer science perspective. It is important that every application have a user-friendly interface. An effective user interface can decrease the time, costs, and energy needed to learn how to use a program. In this paper, we survey the user interface of OpenSim as an important factor of the software. Finally, we infer that there are many challenges to be addressed in the development of OpenSim.

Keywords: biomechanics, computer engineering, graphical user interface, modeling and simulation, interface effectiveness

Procedia PDF Downloads 59
2790 Prosodic Characteristics of Post Traumatic Stress Disorder Induced Speech Changes

Authors: Jarek Krajewski, Andre Wittenborn, Martin Sauerland

Abstract:

This abstract describes a promising approach for estimating post-traumatic stress disorder (PTSD) based on prosodic speech characteristics. It illustrates the validity of this method by briefly discussing results from an Arabic refugee sample (N= 47, 32 m, 15 f). A well-established standardized self-report scale “Reaction of Adolescents to Traumatic Stress” (RATS) was used to determine the ground truth level of PTSD. The speech material was prompted by telling about autobiographical related sadness inducing experiences (sampling rate 16 kHz, 8 bit resolution). In order to investigate PTSD-induced speech changes, a self-developed set of 136 prosodic speech features was extracted from the .wav files. This set was adapted to capture traumatization related speech phenomena. An artificial neural network (ANN) machine learning model was applied to determine the PTSD level and reached a correlation of r = .37. These results indicate that our classifiers can achieve similar results to those seen in speech-based stress research.

Keywords: speech prosody, PTSD, machine learning, feature extraction

Procedia PDF Downloads 70
2789 An Algorithm Based on the Nonlinear Filter Generator for Speech Encryption

Authors: A. Belmeguenai, K. Mansouri, R. Djemili

Abstract:

This work present a new algorithm based on the nonlinear filter generator for speech encryption and decryption. The proposed algorithm consists on the use a linear feedback shift register (LFSR) whose polynomial is primitive and nonlinear Boolean function. The purpose of this system is to construct Keystream with good statistical properties, but also easily computable on a machine with limited capacity calculated. This proposed speech encryption scheme is very simple, highly efficient, and fast to implement the speech encryption and decryption. We conclude the paper by showing that this system can resist certain known attacks.

Keywords: nonlinear filter generator, stream ciphers, speech encryption, security analysis

Procedia PDF Downloads 268
2788 Modern Machine Learning Conniptions for Automatic Speech Recognition

Authors: S. Jagadeesh Kumar

Abstract:

This expose presents a luculent of recent machine learning practices as employed in the modern and as pertinent to prospective automatic speech recognition schemes. The aspiration is to promote additional traverse ablution among the machine learning and automatic speech recognition factions that have transpired in the precedent. The manuscript is structured according to the chief machine learning archetypes that are furthermore trendy by now or have latency for building momentous hand-outs to automatic speech recognition expertise. The standards offered and convoluted in this article embraces adaptive and multi-task learning, active learning, Bayesian learning, discriminative learning, generative learning, supervised and unsupervised learning. These learning archetypes are aggravated and conferred in the perspective of automatic speech recognition tools and functions. This manuscript bequeaths and surveys topical advances of deep learning and learning with sparse depictions; further limelight is on their incessant significance in the evolution of automatic speech recognition.

Keywords: automatic speech recognition, deep learning methods, machine learning archetypes, Bayesian learning, supervised and unsupervised learning

Procedia PDF Downloads 417
2787 Prosody Generation in Neutral Speech Storytelling Application Using Tilt Model

Authors: Manjare Chandraprabha A., S. D. Shirbahadurkar, Manjare Anil S., Paithne Ajay N.

Abstract:

This paper proposes Intonation Modeling for Prosody generation in Neutral speech for Marathi (language spoken in Maharashtra, India) story telling applications. Nowadays audio story telling devices are very eminent for children. In this paper, we proposed tilt model for stressed words in Marathi for speech modification. Tilt model predicts modification in tone of neutral speech. GMM is used to identify stressed words for modification.

Keywords: tilt model, fundamental frequency, statistical parametric speech synthesis, GMM

Procedia PDF Downloads 363
2786 The Importance of Right Speech in Buddhism and Its Relevance Today

Authors: Gautam Sharda

Abstract:

The concept of right speech is the third stage of the noble eightfold path as prescribed by the Buddha and followed by millions of practicing Buddhists. The Buddha lays a lot of importance on the notion of right speech (Samma Vacca). In the Angutara Nikaya, the Buddha mentioned what constitutes right speech, which is basically four kinds of abstentions; namely abstaining from false speech, abstaining from slanderous speech, abstaining from harsh or hateful speech and abstaining from idle chatter. The Buddha gives reasons in support of his view as to why abstaining from these four kinds of speeches is favourable not only for maintaining the peace and equanimity within an individual but also within a society. It is a known fact that when we say something harsh or slanderous to others, it eventually affects our individual peace of mind too. We also know about the many examples of hate speeches which have led to senseless cases of violence and which are well documented within our country and the world. Also, indulging in false speech is not a healthy sign for individuals within a group as this kind of a social group which is based on falsities and lies cannot really survive for long and will eventually lead to chaos. Buddha also told us to refrain from idle chatter or gossip as generally we have seen that idle chatter or gossip does more harm than any good to the individual and the society. Hence, if most of us actually inculcate this third stage (namely, right speech) of the noble eightfold path of the Buddha in our daily life, it would be highly beneficial both for the individual and for the harmony of the society.

Keywords: Buddhism, speech, individual, society

Procedia PDF Downloads 233
2785 Advances in Artificial intelligence Using Speech Recognition

Authors: Khaled M. Alhawiti

Abstract:

This research study aims to present a retrospective study about speech recognition systems and artificial intelligence. Speech recognition has become one of the widely used technologies, as it offers great opportunity to interact and communicate with automated machines. Precisely, it can be affirmed that speech recognition facilitates its users and helps them to perform their daily routine tasks, in a more convenient and effective manner. This research intends to present the illustration of recent technological advancements, which are associated with artificial intelligence. Recent researches have revealed the fact that speech recognition is found to be the utmost issue, which affects the decoding of speech. In order to overcome these issues, different statistical models were developed by the researchers. Some of the most prominent statistical models include acoustic model (AM), language model (LM), lexicon model, and hidden Markov models (HMM). The research will help in understanding all of these statistical models of speech recognition. Researchers have also formulated different decoding methods, which are being utilized for realistic decoding tasks and constrained artificial languages. These decoding methods include pattern recognition, acoustic phonetic, and artificial intelligence. It has been recognized that artificial intelligence is the most efficient and reliable methods, which are being used in speech recognition.

Keywords: speech recognition, acoustic phonetic, artificial intelligence, hidden markov models (HMM), statistical models of speech recognition, human machine performance

Procedia PDF Downloads 448
2784 Brain Computer Interface Implementation for Affective Computing Sensing: Classifiers Comparison

Authors: Ramón Aparicio-García, Gustavo Juárez Gracia, Jesús Álvarez Cedillo

Abstract:

A research line of the computer science that involve the study of the Human-Computer Interaction (HCI), which search to recognize and interpret the user intent by the storage and the subsequent analysis of the electrical signals of the brain, for using them in the control of electronic devices. On the other hand, the affective computing research applies the human emotions in the HCI process helping to reduce the user frustration. This paper shows the results obtained during the hardware and software development of a Brain Computer Interface (BCI) capable of recognizing the human emotions through the association of the brain electrical activity patterns. The hardware involves the sensing stage and analogical-digital conversion. The interface software involves algorithms for pre-processing of the signal in time and frequency analysis and the classification of patterns associated with the electrical brain activity. The methods used for the analysis and classification of the signal have been tested separately, by using a database that is accessible to the public, besides to a comparison among classifiers in order to know the best performing.

Keywords: affective computing, interface, brain, intelligent interaction

Procedia PDF Downloads 358
2783 Speech Enhancement Using Wavelet Coefficients Masking with Local Binary Patterns

Authors: Christian Arcos, Marley Vellasco, Abraham Alcaim

Abstract:

In this paper, we present a wavelet coefficients masking based on Local Binary Patterns (WLBP) approach to enhance the temporal spectra of the wavelet coefficients for speech enhancement. This technique exploits the wavelet denoising scheme, which splits the degraded speech into pyramidal subband components and extracts frequency information without losing temporal information. Speech enhancement in each high-frequency subband is performed by binary labels through the local binary pattern masking that encodes the ratio between the original value of each coefficient and the values of the neighbour coefficients. This approach enhances the high-frequency spectra of the wavelet transform instead of eliminating them through a threshold. A comparative analysis is carried out with conventional speech enhancement algorithms, demonstrating that the proposed technique achieves significant improvements in terms of PESQ, an international recommendation of objective measure for estimating subjective speech quality. Informal listening tests also show that the proposed method in an acoustic context improves the quality of speech, avoiding the annoying musical noise present in other speech enhancement techniques. Experimental results obtained with a DNN based speech recognizer in noisy environments corroborate the superiority of the proposed scheme in the robust speech recognition scenario.

Keywords: binary labels, local binary patterns, mask, wavelet coefficients, speech enhancement, speech recognition

Procedia PDF Downloads 200
2782 Application of the Bionic Wavelet Transform and Psycho-Acoustic Model for Speech Compression

Authors: Chafik Barnoussi, Mourad Talbi, Adnane Cherif

Abstract:

In this paper we propose a new speech compression system based on the application of the Bionic Wavelet Transform (BWT) combined with the psychoacoustic model. This compression system is a modified version of the compression system using a MDCT (Modified Discrete Cosine Transform) filter banks of 32 filters each and the psychoacoustic model. This modification consists in replacing the banks of the MDCT filter banks by the bionic wavelet coefficients which are obtained from the application of the BWT to the speech signal to be compressed. These two methods are evaluated and compared with each other by computing bits before and bits after compression. They are tested on different speech signals and the obtained simulation results show that the proposed technique outperforms the second technique and this in term of compressed file size. In term of SNR, PSNR and NRMSE, the outputs speech signals of the proposed compression system are with acceptable quality. In term of PESQ and speech signal intelligibility, the proposed speech compression technique permits to obtain reconstructed speech signals with good quality.

Keywords: speech compression, bionic wavelet transform, filterbanks, psychoacoustic model

Procedia PDF Downloads 358
2781 A Pervasive System Architecture for Smart Environments in Internet of Things Context

Authors: Patrick Santos, João Casal, João Santos Luis Varandas, Tiago Alves, Carlos Romeiro, Sérgio Lourenço

Abstract:

Nowadays, technology makes it possible to, in one hand, communicate with various objects of the daily life through the Internet, and in the other, put these objects interacting with each other through this channel. Simultaneously, with the raise of smartphones as the most ubiquitous technology on persons lives, emerge new agents for these devices - Intelligent Personal Assistants. These agents have the goal of helping the user manage and organize his information as well as supporting the user in his/her day-to-day tasks. Moreover, other emergent concept is the Cloud Computing, which allows computation and storage to get out of the users devices, bringing benefits in terms of performance, security, interoperability and others. Connecting these three paradigms, in this work we propose an architecture for an intelligent system which provides an interface that assists the user on smart environments, informing, suggesting actions and allowing to manage the objects of his/her daily life.

Keywords: internet of things, cloud, intelligent personal assistant, architecture

Procedia PDF Downloads 483
2780 Interactive Multiple Functions User Interface

Authors: Manjit Singh Sidhu, Waleed Maqableh, Jee Geak Ying

Abstract:

Tangible user interfaces (TUI) that employ markers in the augmented reality (AR) environment has hampered the interactivity between the user and the software application. This is because the user lacks focus on visualizing the contents due to the interaction mechanisms whereby multiple markers may need to be used to perform a particular function. In this research, we have designed a novel TUI user interface where multiple functions could be triggered similar to a natural keyboard thus allowing user to focus more on its digital contents such as 2D/3D, text input, animation and sound. Test results of the user interface with potential users and HCI experts revealed that the multiple functions user interface was new, preferred and appreciated more as opposed to marker based user interface.

Keywords: multimedia, augmented reality, engineering, user interface, visualization

Procedia PDF Downloads 419
2779 Hate Speech Detection Using Deep Learning and Machine Learning Models

Authors: Nabil Shawkat, Jamil Saquer

Abstract:

Social media has accelerated our ability to engage with others and eliminated many communication barriers. On the other hand, the widespread use of social media resulted in an increase in online hate speech. This has drastic impacts on vulnerable individuals and societies. Therefore, it is critical to detect hate speech to prevent innocent users and vulnerable communities from becoming victims of hate speech. We investigate the performance of different deep learning and machine learning algorithms on three different datasets. Our results show that the BERT model gives the best performance among all the models by achieving an F1-score of 90.6% on one of the datasets and F1-scores of 89.7% and 88.2% on the other two datasets.

Keywords: hate speech, machine learning, deep learning, abusive words, social media, text classification

Procedia PDF Downloads 105
2778 Co-Design of Accessible Speech Recognition for Users with Dysarthric Speech

Authors: Elizabeth Howarth, Dawn Green, Sean Connolly, Geena Vabulas, Sara Smolley

Abstract:

Through the EU Horizon 2020 Nuvoic Project, the project team recruited 70 individuals in the UK and Ireland to test the Voiceitt speech recognition app and provide user feedback to developers. The app is designed for people with dysarthric speech, to support communication with unfamiliar people and access to speech-driven technologies such as smart home equipment and smart assistants. Participants with atypical speech, due to a range of conditions such as cerebral palsy, acquired brain injury, Down syndrome, stroke and hearing impairment, were recruited, primarily through organisations supporting disabled people. Most had physical or learning disabilities in addition to dysarthric speech. The project team worked with individuals, their families and local support teams, to provide access to the app, including through additional assistive technologies where needed. Testing was user-led, with participants asked to identify and test use cases most relevant to their daily lives over a period of three months or more. Ongoing technical support and training were provided remotely and in-person throughout the testing period. Structured interviews were used to collect feedback on users' experiences, with delivery adapted to individuals' needs and preferences. Informal feedback was collected through ongoing contact between participants, their families and support teams and the project team. Focus groups were held to collect feedback on specific design proposals. User feedback shared with developers has led to improvements to the user interface and functionality, including faster voice training, simplified navigation, the introduction of gamification elements and of switch access as an alternative to touchscreen access, with other feature requests from users still in development. This work offers a case-study in successful and inclusive co-design with the disabled community.

Keywords: co-design, assistive technology, dysarthria, inclusive speech recognition

Procedia PDF Downloads 82
2777 Speech Intelligibility Improvement Using Variable Level Decomposition DWT

Authors: Samba Raju, Chiluveru, Manoj Tripathy

Abstract:

Intelligibility is an essential characteristic of a speech signal, which is used to help in the understanding of information in speech signal. Background noise in the environment can deteriorate the intelligibility of a recorded speech. In this paper, we presented a simple variance subtracted - variable level discrete wavelet transform, which improve the intelligibility of speech. The proposed algorithm does not require an explicit estimation of noise, i.e., prior knowledge of the noise; hence, it is easy to implement, and it reduces the computational burden. The proposed algorithm decides a separate decomposition level for each frame based on signal dominant and dominant noise criteria. The performance of the proposed algorithm is evaluated with speech intelligibility measure (STOI), and results obtained are compared with Universal Discrete Wavelet Transform (DWT) thresholding and Minimum Mean Square Error (MMSE) methods. The experimental results revealed that the proposed scheme outperformed competing methods

Keywords: discrete wavelet transform, speech intelligibility, STOI, standard deviation

Procedia PDF Downloads 118
2776 The Language Use of Middle Eastern Freedom Activists' Speeches: A Gender Perspective

Authors: Sulistyaningtyas

Abstract:

Examining the role of Middle Eastern freedom activists’ speech based on gender perspective is considered noteworthy because the society in the Middle East is patriarchal. This research aims to examine the language use of the Middle Eastern freedom activists’ speeches through gender perspective. The data sources are from male and female Middle Eastern freedom activists’ speech videos. In analyzing the data, the theories employed are about Language Style from Gender Perspective and The Language for Speech. The result reveals that there are sets of spoken language differences between male and female speakers. In using the language for speech, both male and female speakers produce metaphor, euphemism, the ‘rule of three’, parallelism, and pronouns in random frequency of production, which cannot be separated by genders. Moreover, it cannot be concluded that one gender is more potential than the other to influence the audience in delivering speech. There are other factors, particularly non-verbal factors, existing to give impacts on how a speech can influence the audience.

Keywords: gender perspective, language use, Middle Eastern freedom activists, speech

Procedia PDF Downloads 400
2775 Evaluation of the Matching Optimization of Human-Machine Interface Matching in the Cab

Authors: Yanhua Ma, Lu Zhai, Xinchen Wang, Hongyu Liang

Abstract:

In this paper, by understanding the development status of the human-machine interface in today's automobile cab, a subjective and objective evaluation system for evaluating the optimization of human-machine interface matching in automobile cab was established. The man-machine interface of the car cab was divided into a software interface and a hard interface. Objective evaluation method of software human factor analysis is used to evaluate the hard interface matching; The analytic hierarchy process is used to establish the evaluation index system for the software interface matching optimization, and the multi-level fuzzy comprehensive evaluation method is used to evaluate hard interface machine. This article takes Dongfeng Sokon (DFSK) C37 model automobile as an example. The evaluation method given in the paper is used to carry out relevant analysis and evaluation, and corresponding optimization suggestions are given, which have certain reference value for designers.

Keywords: analytic hierarchy process, fuzzy comprehension evaluation method, human-machine interface, matching optimization, software human factor analysis

Procedia PDF Downloads 119
2774 Considering Cultural and Linguistic Variables When Working as a Speech-Language Pathologist with Multicultural Students

Authors: Gabriela Smeckova

Abstract:

The entire world is becoming more and more diverse. The reasons why people migrate are different and unique for each family /individual. Professionals delivering services (including speech-language pathologists) must be prepared to work with clients coming from different cultural and/or linguistic backgrounds. Well-educated speech-language pathologists will consider many factors when delivering services. Some of them will be discussed during the presentation (language spoken, beliefs about health care and disabilities, reasons for immigration, etc.). The communication styles of the client can be different than the styles of the speech-language pathologist. The goal is to become culturally responsive in service delivery.

Keywords: culture, cultural competence, culturallly responsive practices, speech-language pathologist, cultural and linguistical variables, communication styles

Procedia PDF Downloads 49
2773 English Learning Speech Assistant Speak Application in Artificial Intelligence

Authors: Albatool Al Abdulwahid, Bayan Shakally, Mariam Mohamed, Wed Almokri

Abstract:

Artificial intelligence has infiltrated every part of our life and every field we can think of. With technical developments, artificial intelligence applications are becoming more prevalent. We chose ELSA speak because it is a magnificent example of Artificial intelligent applications, ELSA speak is a smartphone application that is free to download on both IOS and Android smartphones. ELSA speak utilizes artificial intelligence to help non-native English speakers pronounce words and phrases similar to a native speaker, as well as enhance their English skills. It employs speech-recognition technology that aids the application to excel the pronunciation of its users. This remarkable feature distinguishes ELSA from other voice recognition algorithms and increase the efficiency of the application. This study focused on evaluating ELSA speak application, by testing the degree of effectiveness based on survey questions. The results of the questionnaire were variable. The generality of the participants strongly agreed that ELSA has helped them enhance their pronunciation skills. However, a few participants were unconfident about the application’s ability to assist them in their learning journey.

Keywords: ELSA speak application, artificial intelligence, speech-recognition technology, language learning, english pronunciation

Procedia PDF Downloads 76
2772 Audio-Visual Co-Data Processing Pipeline

Authors: Rita Chattopadhyay, Vivek Anand Thoutam

Abstract:

Speech is the most acceptable means of communication where we can quickly exchange our feelings and thoughts. Quite often, people can communicate orally but cannot interact or work with computers or devices. It’s easy and quick to give speech commands than typing commands to computers. In the same way, it’s easy listening to audio played from a device than extract output from computers or devices. Especially with Robotics being an emerging market with applications in warehouses, the hospitality industry, consumer electronics, assistive technology, etc., speech-based human-machine interaction is emerging as a lucrative feature for robot manufacturers. Considering this factor, the objective of this paper is to design the “Audio-Visual Co-Data Processing Pipeline.” This pipeline is an integrated version of Automatic speech recognition, a Natural language model for text understanding, object detection, and text-to-speech modules. There are many Deep Learning models for each type of the modules mentioned above, but OpenVINO Model Zoo models are used because the OpenVINO toolkit covers both computer vision and non-computer vision workloads across Intel hardware and maximizes performance, and accelerates application development. A speech command is given as input that has information about target objects to be detected and start and end times to extract the required interval from the video. Speech is converted to text using the Automatic speech recognition QuartzNet model. The summary is extracted from text using a natural language model Generative Pre-Trained Transformer-3 (GPT-3). Based on the summary, essential frames from the video are extracted, and the You Only Look Once (YOLO) object detection model detects You Only Look Once (YOLO) objects on these extracted frames. Frame numbers that have target objects (specified objects in the speech command) are saved as text. Finally, this text (frame numbers) is converted to speech using text to speech model and will be played from the device. This project is developed for 80 You Only Look Once (YOLO) labels, and the user can extract frames based on only one or two target labels. This pipeline can be extended for more than two target labels easily by making appropriate changes in the object detection module. This project is developed for four different speech command formats by including sample examples in the prompt used by Generative Pre-Trained Transformer-3 (GPT-3) model. Based on user preference, one can come up with a new speech command format by including some examples of the respective format in the prompt used by the Generative Pre-Trained Transformer-3 (GPT-3) model. This pipeline can be used in many projects like human-machine interface, human-robot interaction, and surveillance through speech commands. All object detection projects can be upgraded using this pipeline so that one can give speech commands and output is played from the device.

Keywords: OpenVINO, automatic speech recognition, natural language processing, object detection, text to speech

Procedia PDF Downloads 53
2771 Effect of Noise Reduction Algorithms on Temporal Splitting of Speech Signal to Improve Speech Perception for Binaural Hearing Aids

Authors: Rajani S. Pujar, Pandurangarao N. Kulkarni

Abstract:

Increased temporal masking affects the speech perception in persons with sensorineural hearing impairment especially under adverse listening conditions. This paper presents a cascaded scheme, which employs a noise reduction algorithm as well as temporal splitting of the speech signal. Earlier investigations have shown that by splitting the speech temporally and presenting alternate segments to the two ears help in reducing the effect of temporal masking. In this technique, the speech signal is processed by two fading functions, complementary to each other, and presented to left and right ears for binaural dichotic presentation. In the present study, half cosine signal is used as a fading function with crossover gain of 6 dB for the perceptual balance of loudness. Temporal splitting is combined with noise reduction algorithm to improve speech perception in the background noise. Two noise reduction schemes, namely spectral subtraction and Wiener filter are used. Listening tests were conducted on six normal-hearing subjects, with sensorineural loss simulated by adding broadband noise to the speech signal at different signal-to-noise ratios (∞, 3, 0, and -3 dB). Objective evaluation using PESQ was also carried out. The MOS score for VCV syllable /asha/ for SNR values of ∞, 3, 0, and -3 dB were 5, 4.46, 4.4 and 4.05 respectively, while the corresponding MOS scores for unprocessed speech were 5, 1.2, 0.9 and 0.65, indicating significant improvement in the perceived speech quality for the proposed scheme compared to the unprocessed speech.

Keywords: MOS, PESQ, spectral subtraction, temporal splitting, wiener filter

Procedia PDF Downloads 305
2770 Delamination of Scale in a Fe Carbon Steel Surface by Effect of Interface Roughness and Oxide Scale Thickness

Authors: J. M. Lee, W. R. Noh, C. Y. Kim, M. G. Lee

Abstract:

Delamination of oxide scale has been often discovered at the interface between Fe carbon steel and oxide scale. Among several mechanisms of this delamination behavior, the normal tensile stress to the substrate-scale interface has been described as one of the main factors. The stress distribution at the interface is also known to be affected by thermal expansion mismatch between substrate and oxide scale, creep behavior during cooling and the geometry of the interface. In this study, stress states near the interface in a Fe carbon steel with oxide scale have been investigated using FE simulations. The thermal and mechanical properties of oxide scales are indicated in literature and Fe carbon steel is measured using tensile testing machine. In particular, the normal and shear stress components developed at the interface during bending are investigated. Preliminary numerical sensitivity analyses are provided to explain the effects of the interface geometry and oxide thickness on the delamination behavior.

Keywords: oxide scale, delamination, Fe analysis, roughness, thickness, stress state

Procedia PDF Downloads 319
2769 Design of Intelligent Scaffolding Learning Management System for Vocational Education

Authors: Seree Chadcham, Niphon Sukvilai

Abstract:

This study is the research and development which is intended to: 1) design of the Intelligent Scaffolding Learning Management System (ISLMS) for vocational education, 2) assess the suitability of the Design of Intelligent Scaffolding Learning Management System for Vocational Education. Its methods are divided into 2 phases. Phase 1 is the design of the ISLMS for Vocational Education and phase 2 is the assessment of the suitability of the design. The samples used in this study are work done by 15 professionals in the field of Intelligent Scaffolding, Learning Management System, Vocational Education, and Information and Communication Technology in education selected using the purposive sampling method. Data analyzed by arithmetic mean and standard deviation. The results showed that the ISLMS for vocational education consists of 2 main components which are: 1) the Intelligent Learning Management System for Vocational Education, 2) the Intelligent Scaffolding Management System. The result of the system suitability assessment from the professionals is in the highest range.

Keywords: intelligent, scaffolding, learning management system, vocational education

Procedia PDF Downloads 771
2768 Efficacy of a Wiener Filter Based Technique for Speech Enhancement in Hearing Aids

Authors: Ajish K. Abraham

Abstract:

Hearing aid is the most fundamental technology employed towards rehabilitation of persons with sensory neural hearing impairment. Hearing in noise is still a matter of major concern for many hearing aid users and thus continues to be a challenging issue for the hearing aid designers. Several techniques are being currently used to enhance the speech at the hearing aid output. Most of these techniques, when implemented, result in reduction of intelligibility of the speech signal. Thus the dissatisfaction of the hearing aid user towards comprehending the desired speech amidst noise is prevailing. Multichannel Wiener Filter is widely implemented in binaural hearing aid technology for noise reduction. In this study, Wiener filter based noise reduction approach is experimented for a single microphone based hearing aid set up. This method checks the status of the input speech signal in each frequency band and then selects the relevant noise reduction procedure. Results showed that the Wiener filter based algorithm is capable of enhancing speech even when the input acoustic signal has a very low Signal to Noise Ratio (SNR). Performance of the algorithm was compared with other similar algorithms on the basis of improvement in intelligibility and SNR of the output, at different SNR levels of the input speech. Wiener filter based algorithm provided significant improvement in SNR and intelligibility compared to other techniques.

Keywords: hearing aid output speech, noise reduction, SNR improvement, Wiener filter, speech enhancement

Procedia PDF Downloads 228
2767 A Two-Stage Adaptation towards Automatic Speech Recognition System for Malay-Speaking Children

Authors: Mumtaz Begum Mustafa, Siti Salwah Salim, Feizal Dani Rahman

Abstract:

Recently, Automatic Speech Recognition (ASR) systems were used to assist children in language acquisition as it has the ability to detect human speech signal. Despite the benefits offered by the ASR system, there is a lack of ASR systems for Malay-speaking children. One of the contributing factors for this is the lack of continuous speech database for the target users. Though cross-lingual adaptation is a common solution for developing ASR systems for under-resourced language, it is not viable for children as there are very limited speech databases as a source model. In this research, we propose a two-stage adaptation for the development of ASR system for Malay-speaking children using a very limited database. The two stage adaptation comprises the cross-lingual adaptation (first stage) and cross-age adaptation. For the first stage, a well-known speech database that is phonetically rich and balanced, is adapted to the medium-sized Malay adults using supervised MLLR. The second stage adaptation uses the speech acoustic model generated from the first adaptation, and the target database is a small-sized database of the target users. We have measured the performance of the proposed technique using word error rate, and then compare them with the conventional benchmark adaptation. The two stage adaptation proposed in this research has better recognition accuracy as compared to the benchmark adaptation in recognizing children’s speech.

Keywords: Automatic Speech Recognition System, children speech, adaptation, Malay

Procedia PDF Downloads 368
2766 The Complaint Speech Act Set Produced by Arab Students in the UAE

Authors: Tanju Deveci

Abstract:

It appears that the speech act of complaint has not received as much attention as other speech acts. However, the face-threatening nature of this speech act requires a special attention in multicultural contexts in particular. The teaching context in the UAE universities, where a big majority of teaching staff comes from other cultures, requires investigations into this speech act in order to improve communication between students and faculty. This session will outline the results of a study conducted with this purpose. The realization of complaints by Freshman English students in Communication courses at Petroleum Institute was investigated to identify communication patterns that seem to cause a strain. Data were collected using a role-play between a teacher and students, and a judgment scale completed by two of the instructors in the Communications Department. The initial findings reveal that the students had difficulty putting their case, produced the speech act of criticism along with a complaint and that they produced both requests and demands as candidate solutions. The judgement scales revealed that the students’ attitude was not appropriate most of the time and that the judges would behave differently from students. It is concluded that speech acts, in general, and complaint, in particular, need to be taught to learners explicitly to improve interpersonal communication in multicultural societies. Some teaching ideas are provided to help increase foreign language learners’ sociolinguistic competence.

Keywords: speech act, complaint, pragmatics, sociolinguistics, language teaching

Procedia PDF Downloads 482
2765 A Three Tier Secure KQML Interface with Novel Performatives

Authors: Dimple Juneja, Aarti Singh, Renu Hooda

Abstract:

Knowledge Query Manipulation Language (KQML) and FIPA ACL are two prime communication languages existing in multi agent systems (MAS). Both languages are more or less similar in terms of semantics (based on speech act theory) and offer cutting edge competition while establishing agent communication across Internet. In contrast to the fact that software agents operating on the internet are required to be more safeguarded from their counter-peer, both protocols lack security performatives. The paper proposes a three tier security interface with few novel security related performatives enhancing the basic architecture of KQML. The three levels are attestation, certification and trust establishment which enforces a tight security and hence reduces the security breeches.

Keywords: multiagent systems, KQML, FIPA ACL, performatives

Procedia PDF Downloads 390