Search results for: intelligent speech interface
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2824

Search results for: intelligent speech interface

2764 On Overcoming Common Oral Speech Problems through Authentic Films

Authors: Tamara Matevosyan

Abstract:

The present paper discusses the main problems that students face while developing oral skills through authentic films. It states that special attention should be paid not only to the study of verbal speech but also to non-verbal communication. Authentic films serve as an important tool to understand both native speaker’s gestures and their culture of pausing while speaking. Various phonetic difficulties causing phonetic interference in actual speech are covered in the paper emphasizing the role of authentic films in overcoming them.

Keywords: compressive speech, filled pauses, unfilled pauses, pausing culture

Procedia PDF Downloads 321
2763 Intelligent IT Infrastructure in the Gas and Oil Industry

Authors: Ahmad Fahad Alotaibi, Khalid Hamed Hajri, Humoud Hudiban Rashidi

Abstract:

Intelligent information technology infrastructure is considered one of the enablers to enhance digital transformation in the gas and oil fields to optimize IT infrastructure reliability by supporting operations and maintenance in a safe and secure method to optimize resources. Smart IT buildings, communication rooms and shelters with intelligent technologies can strengthen the performance and profitability of gas and oil companies by ensuring business continuity. This paper describes the advantages of deploying intelligent IT infrastructure in the oil and gas industry by illustrating its positive impacts on some development aspects, for instance, operations, maintenance, safety, security and resource optimization. Moreover, it highlights the challenges and difficulties of providing smart IT services in a remote area and proposes solutions to overcome such difficulties.

Keywords: intelligent IT infrastructure, remote areas, oil and gas field, digitalization

Procedia PDF Downloads 28
2762 An Intelligent Cloud Radio Access Network (RAN) Architecture for Future 5G Heterogeneous Wireless Network

Authors: Jin Xu

Abstract:

5G network developers need to satisfy the necessary requirements of additional capacity from massive users and spectrally efficient wireless technologies. Therefore, the significant amount of underutilized spectrum in network is motivating operators to combine long-term evolution (LTE) with intelligent spectrum management technology. This new LTE intelligent spectrum management in unlicensed band (LTE-U) has the physical layer topology to access spectrum, specifically the 5-GHz band. We proposed a new intelligent cloud RAN for 5G.

Keywords: cloud radio access network, wireless network, cloud computing, multi-agent

Procedia PDF Downloads 395
2761 Morpheme Based Parts of Speech Tagger for Kannada Language

Authors: M. C. Padma, R. J. Prathibha

Abstract:

Parts of speech tagging is the process of assigning appropriate parts of speech tags to the words in a given text. The critical or crucial information needed for tagging a word come from its internal structure rather from its neighboring words. The internal structure of a word comprises of its morphological features and grammatical information. This paper presents a morpheme based parts of speech tagger for Kannada language. This proposed work uses hierarchical tag set for assigning tags. The system is tested on some Kannada words taken from EMILLE corpus. Experimental result shows that the performance of the proposed system is above 90%.

Keywords: hierarchical tag set, morphological analyzer, natural language processing, paradigms, parts of speech

Procedia PDF Downloads 266
2760 The Convolution Recurrent Network of Using Residual LSTM to Process the Output of the Downsampling for Monaural Speech Enhancement

Authors: Shibo Wei, Ting Jiang

Abstract:

Convolutional-recurrent neural networks (CRN) have achieved much success recently in the speech enhancement field. The common processing method is to use the convolution layer to compress the feature space by multiple upsampling and then model the compressed features with the LSTM layer. At last, the enhanced speech is obtained by deconvolution operation to integrate the global information of the speech sequence. However, the feature space compression process may cause the loss of information, so we propose to model the upsampling result of each step with the residual LSTM layer, then join it with the output of the deconvolution layer and input them to the next deconvolution layer, by this way, we want to integrate the global information of speech sequence better. The experimental results show the network model (RES-CRN) we introduce can achieve better performance than LSTM without residual and overlaying LSTM simply in the original CRN in terms of scale-invariant signal-to-distortion ratio (SI-SNR), speech quality (PESQ), and intelligibility (STOI).

Keywords: convolutional-recurrent neural networks, speech enhancement, residual LSTM, SI-SNR

Procedia PDF Downloads 174
2759 An Approach towards Intelligent Urbanism in New Communities

Authors: Sherine Shafik Aly, Farida Ahmed El Mallah

Abstract:

Technology is a quoted keyword nowadays in all fields; it has been recently thought of and integrated into urban development. This research explains the role of technology in establishing intelligent urbanism to create a convivial and sustainable environment for people to live in. Cities are downgrading socially, economically and environmentally. A framework is to be developed where these three pillars are involved in the planning, design, and spreading of technology to create convivial environments. The aim of this research is achieved by highlighting the importance and approaches of intelligent urbanism, it’s characteristics and principles, then analyzing some relevant examples to achieve a set of guidelines.

Keywords: convivial, intelligent, technology, urban development

Procedia PDF Downloads 226
2758 Detection of Clipped Fragments in Speech Signals

Authors: Sergei Aleinik, Yuri Matveev

Abstract:

In this paper a novel method for the detection of clipping in speech signals is described. It is shown that the new method has better performance than known clipping detection methods, is easy to implement, and is robust to changes in signal amplitude, size of data, etc. Statistical simulation results are presented.

Keywords: clipping, clipped signal, speech signal processing, digital signal processing

Procedia PDF Downloads 369
2757 Developing an Intonation Labeled Dataset for Hindi

Authors: Esha Banerjee, Atul Kumar Ojha, Girish Nath Jha

Abstract:

This study aims to develop an intonation labeled database for Hindi. Although no single standard for prosody labeling exists in Hindi, researchers in the past have employed perceptual and statistical methods in literature to draw inferences about the behavior of prosody patterns in Hindi. Based on such existing research and largely agreed upon intonational theories in Hindi, this study attempts to develop a manually annotated prosodic corpus of Hindi speech data, which can be used for training speech models for natural-sounding speech in the future. 100 sentences ( 500 words) each for declarative and interrogative types have been labeled using Praat.

Keywords: speech dataset, Hindi, intonation, labeled corpus

Procedia PDF Downloads 166
2756 Intelligent System of the Grinding Robot for Spiral Welded Pipe

Authors: Getachew Demeissie Ayalew, Yongtao Sun, Yang Yang

Abstract:

The spiral welded pipe manufacturing industry requires strict production standards for automated grinders for welding seams. However, traditional grinding machines in this sector are insufficient due to a lack of quality control protocols and inconsistent performance. This research aims to improve the quality of spiral welded pipes by developing intelligent automated abrasive belt grinding equipment. The system has equipped with six degrees of freedom (6 DOF) KUKA KR360 industrial robots, enabling concurrent grinding operations on both internal and external welds. The grinding robot control system is designed with a PLC, and a human-machine interface (HMI) system is employed for operations. The system includes an electric speed controller, data connection card, DC driver, analog amplifier, and HMI for input data. This control system enables the grinding of spiral welded pipe. It ensures consistent production quality and cost-effectiveness by reducing the product life cycle and minimizing risks in the working environment.

Keywords: Intelligent Systems, Spiral Welded Pipe, Grinding, Industrial Robot, End-Effector, PLC Controller System, 3D Laser Sensor, HMI.

Procedia PDF Downloads 251
2755 Distant Speech Recognition Using Laser Doppler Vibrometer

Authors: Yunbin Deng

Abstract:

Most existing applications of automatic speech recognition relies on cooperative subjects at a short distance to a microphone. Standoff speech recognition using microphone arrays can extend the subject to sensor distance somewhat, but it is still limited to only a few feet. As such, most deployed applications of standoff speech recognitions are limited to indoor use at short range. Moreover, these applications require air passway between the subject and the sensor to achieve reasonable signal to noise ratio. This study reports long range (50 feet) automatic speech recognition experiments using a Laser Doppler Vibrometer (LDV) sensor. This study shows that the LDV sensor modality can extend the speech acquisition standoff distance far beyond microphone arrays to hundreds of feet. In addition, LDV enables 'listening' through the windows for uncooperative subjects. This enables new capabilities in automatic audio and speech intelligence, surveillance, and reconnaissance (ISR) for law enforcement, homeland security and counter terrorism applications. The Polytec LDV model OFV-505 is used in this study. To investigate the impact of different vibrating materials, five parallel LDV speech corpora, each consisting of 630 speakers, are collected from the vibrations of a glass window, a metal plate, a plastic box, a wood slate, and a concrete wall. These are the common materials the application could encounter in a daily life. These data were compared with the microphone counterpart to manifest the impact of various materials on the spectrum of the LDV speech signal. State of the art deep neural network modeling approaches is used to conduct continuous speaker independent speech recognition on these LDV speech datasets. Preliminary phoneme recognition results using time-delay neural network, bi-directional long short term memory, and model fusion shows great promise of using LDV for long range speech recognition. To author’s best knowledge, this is the first time an LDV is reported for long distance speech recognition application.

Keywords: covert speech acquisition, distant speech recognition, DSR, laser Doppler vibrometer, LDV, speech intelligence surveillance and reconnaissance, ISR

Procedia PDF Downloads 154
2754 The Philippines’ War on Drugs: a Pragmatic Analysis on Duterte's Commemorative Speeches

Authors: Ericson O. Alieto, Aprillete C. Devanadera

Abstract:

The main objective of the study is to determine the dominant speech acts in five commemorative speeches of President Duterte. This study employed Speech Act Theory and Discourse analysis to determine how the speech acts features connote the pragmatic meaning of Duterte’s speeches. Identifying the speech acts is significant in elucidating the underlying message or the pragmatic meaning of the speeches. From the 713 sentences or utterances from the speeches, assertive with 208 occurrences from the corpus or 29% is the dominant speech acts. It was followed by expressive with 177 or 25% occurrences, directive accounts for 152 or 15% occurrences. While commisive accounts for 104 or 15% occurrences and declarative got the lowest percentage of occurrences with 72 or 10% only. These sentences when uttered by Duterte carry a certain power of language to move or influence people. Thus, the present study shows the fundamental message perceived by the listeners. Moreover, the frequent use of assertive and expressive not only explains the pragmatic message of the speeches but also reflects the personality of President Duterte.

Keywords: commemorative speech, discourse analysis, duterte, pragmatics

Procedia PDF Downloads 257
2753 Excitation Modeling for Hidden Markov Model-Based Speech Synthesis Based on Wavelet Analysis

Authors: M. Kiran Reddy, K. Sreenivasa Rao

Abstract:

The conventional Hidden Markov Model (HMM)-based speech synthesis system (HTS) uses only a pulse excitation model, which significantly differs from natural excitation signal. Hence, buzziness can be perceived in the speech generated using HTS. This paper proposes an efficient excitation modeling method that can significantly reduce the buzziness, and improve the quality of HMM-based speech synthesis. The proposed approach models the pitch-synchronous residual frames extracted from the residual excitation signal. Each pitch synchronous residual frame is parameterized using 30 wavelet coefficients. These 30 wavelet coefficients are found to accurately capture the perceptually important information present in the residual waveform. In synthesis phase, the residual frames are reconstructed from the generated wavelet coefficients and are pitch-synchronously overlap-added to generate the excitation signal. The proposed excitation modeling method is integrated into HMM-based speech synthesis system. Evaluation results indicate that the speech synthesized by the proposed excitation model is significantly better than the speech generated using state-of-the-art excitation modeling methods.

Keywords: excitation modeling, hidden Markov models, pitch-synchronous frames, speech synthesis, wavelet coefficients

Procedia PDF Downloads 224
2752 Electronic States at SnO/SnO2 Heterointerfaces

Authors: A. Albar, U. Schwingenschlogel

Abstract:

Device applications of transparent conducting oxides require a thorough understanding of the physical and chemical properties of the involved interfaces. We use ab-initio calculations within density functional theory to investigate the electronic states at the SnO/SnO2 hetero-interface. Tin dioxide and monoxide are transparent materials with high n-type and p-type mobilities, respectively. This work aims at exploring the modifications of the electronic states, in particular the charge transfer, in the vicinity of the hetero-interface. The (110) interface is modeled by a super-cell approach in order to minimize the mismatch between the lattice parameters of the two compounds. We discuss the electronic density of states as a function of the distance to the interface.

Keywords: density of states, ab-initio calculations, interface states, charge transfer

Procedia PDF Downloads 385
2751 Green Thumb Engineering - Explainable Artificial Intelligence for Managing IoT Enabled Houseplants

Authors: Antti Nurminen, Avleen Malhi

Abstract:

Significant progress in intelligent systems in combination with exceedingly wide application domains having machine learning as the core technology are usually opaque, non-intuitive, and commonly complex for human users. We use innovative IoT technology which monitors and analyzes moisture, humidity, luminosity and temperature levels to assist end users for optimization of environmental conditions for their houseplants. For plant health monitoring, we construct a system yielding the Normalized Difference Vegetation Index (NDVI), supported by visual validation by users. We run the system for a selected plant, basil, in varying environmental conditions to cater for typical home conditions, and bootstrap our AI with the acquired data. For end users, we implement a web based user interface which provides both instructions and explanations.

Keywords: explainable artificial intelligence, intelligent agent, IoT, NDVI

Procedia PDF Downloads 136
2750 Theory and Practice of Wavelets in Signal Processing

Authors: Jalal Karam

Abstract:

The methods of Fourier, Laplace, and Wavelet Transforms provide transfer functions and relationships between the input and the output signals in linear time invariant systems. This paper shows the equivalence among these three methods and in each case presenting an application of the appropriate (Fourier, Laplace or Wavelet) to the convolution theorem. In addition, it is shown that the same holds for a direct integration method. The Biorthogonal wavelets Bior3.5 and Bior3.9 are examined and the zeros distribution of their polynomials associated filters are located. This paper also presents the significance of utilizing wavelets as effective tools in processing speech signals for common multimedia applications in general, and for recognition and compression in particular. Theoretically and practically, wavelets have proved to be effective and competitive. The practical use of the Continuous Wavelet Transform (CWT) in processing and analysis of speech is then presented along with explanations of how the human ear can be thought of as a natural wavelet transformer of speech. This generates a variety of approaches for applying the (CWT) to many paradigms analysing speech, sound and music. For perception, the flexibility of implementation of this transform allows the construction of numerous scales and we include two of them. Results for speech recognition and speech compression are then included.

Keywords: continuous wavelet transform, biorthogonal wavelets, speech perception, recognition and compression

Procedia PDF Downloads 381
2749 Automatic Assignment of Geminate and Epenthetic Vowel for Amharic Text-to-Speech System

Authors: Tadesse Anberbir, Bankole Felix, Tomio Takara

Abstract:

In the development of a text-to-speech synthesizer, automatic derivation of correct pronunciation from the grapheme form of a text is a central problem. Particularly deriving phonological features which are not shown in orthography is challenging. In the Amharic language, geminates and epenthetic vowels are very crucial for proper pronunciation, but neither is shown in orthography. In this paper, to proposed and integrated a morphological analyzer into an Amharic Text-to-Speech system, mainly to predict geminates and epenthetic vowel positions and prepared a duration modeling method. Amharic Text-to-Speech system (AmhTTS) is a parametric and rule-based system that adopts a cepstral method and uses a source filter model for speech production and a Log Magnitude Approximation (LMA) filter as the vocal tract filter. The naturalness of the system after employing the duration modeling was evaluated by sentence listening test, and we achieved an average Mean Opinion Score (MOS) 3.4 (68%), which is moderate. By modeling the duration of geminates and controlling the locations of epenthetic vowel, we are able to synthesize good quality speech. Our system is mainly suitable to be customized for other Ethiopian languages with limited resources.

Keywords: amharic, gemination, Speech synthesis, morphology, epenthesis

Procedia PDF Downloads 58
2748 Hate Speech Detection Using Machine Learning: A Survey

Authors: Edemealem Desalegn Kingawa, Kafte Tasew Timkete, Mekashaw Girmaw Abebe, Terefe Feyisa, Abiyot Bitew Mihretie, Senait Teklemarkos Haile

Abstract:

Currently, hate speech is a growing challenge for society, individuals, policymakers, and researchers, as social media platforms make it easy to anonymously create and grow online friends and followers and provide an online forum for debate about specific issues of community life, culture, politics, and others. Despite this, research on identifying and detecting hate speech is not satisfactory performance, and this is why future research on this issue is constantly called for. This paper provides a systematic review of the literature in this field, with a focus on approaches like word embedding techniques, machine learning, deep learning technologies, hate speech terminology, and other state-of-the-art technologies with challenges. In this paper, we have made a systematic review of the last six years of literature from Research Gate and Google Scholar. Furthermore, limitations, along with algorithm selection and use challenges, data collection, and cleaning challenges, and future research directions, are discussed in detail.

Keywords: Amharic hate speech, deep learning approach, hate speech detection review, Afaan Oromo hate speech detection

Procedia PDF Downloads 143
2747 An Intelligent Decision Support System Approach for New Product Development by Using QFD and Its Application in Metal Plating Industry

Authors: Ufuk Cebeci, Onur Doğan

Abstract:

New product becomes critical in competitive environment shortening a product's lifecycle due to the rapidly changing technology and increasing consumer requirements. Quality Function Deployment is one of the first steps of NPD process. The study presents an intelligent QFD application in metal plating industry. For application, an intelligent decision support system was developed. By intelligent system, house of quality was drawn and some calculations were shown. According to the results, some recommendations are given to end user. One of the purposes of this system is to give some advices to firms which do not know technical details of QFD and guide them about first steps of the new product development process.

Keywords: intelligent decision support systems, metal plating, quality function deployment, QFD software, new product development

Procedia PDF Downloads 371
2746 Automatic Assignment of Geminate and Epenthetic Vowel for Amharic Text-to-Speech System

Authors: Tadesse Anberbir, Felix Bankole, Tomio Takara, Girma Mamo

Abstract:

In the development of a text-to-speech synthesizer, automatic derivation of correct pronunciation from the grapheme form of a text is a central problem. Particularly deriving phonological features which are not shown in orthography is challenging. In the Amharic language, geminates and epenthetic vowels are very crucial for proper pronunciation but neither is shown in orthography. In this paper, we proposed and integrated a morphological analyzer into an Amharic Text-to-Speech system, mainly to predict geminates and epenthetic vowel positions, and prepared a duration modeling method. Amharic Text-to-Speech system (AmhTTS) is a parametric and rule-based system that adopts a cepstral method and uses a source filter model for speech production and a Log Magnitude Approximation (LMA) filter as the vocal tract filter. The naturalness of the system after employing the duration modeling was evaluated by sentence listening test and we achieved an average Mean Opinion Score (MOS) 3.4 (68%) which is moderate. By modeling the duration of geminates and controlling the locations of epenthetic vowel, we are able to synthesize good quality speech. Our system is mainly suitable to be customized for other Ethiopian languages with limited resources.

Keywords: Amharic, gemination, speech synthesis, morphology, epenthesis

Procedia PDF Downloads 54
2745 Systemic Functional Grammar Analysis of Barack Obama's Second Term Inaugural Speech

Authors: Sadiq Aminu, Ahmed Lamido

Abstract:

This research studies Barack Obama’s second inaugural speech using Halliday’s Systemic Functional Grammar (SFG). SFG is a text grammar which describes how language is used, so that the meaning of the text can be better understood. The primary source of data in this research work is Barack Obama’s second inaugural speech which was obtained from the internet. The analysis of the speech was based on the ideational and textual metafunctions of Systemic Functional Grammar. Specifically, the researcher analyses the Process Types and Participants (ideational) and the Theme/Rheme (textual). It was found that material process (process of doing) was the most frequently used ‘Process type’ and ‘We’ which refers to the people of America was the frequently used ‘Theme’. Application of the SFG theory, therefore, gives a better meaning to Barack Obama’s speech.

Keywords: ideational, metafunction, rheme, textual, theme

Procedia PDF Downloads 129
2744 An Automatic Speech Recognition Tool for the Filipino Language Using the HTK System

Authors: John Lorenzo Bautista, Yoon-Joong Kim

Abstract:

This paper presents the development of a Filipino speech recognition tool using the HTK System. The system was trained from a subset of the Filipino Speech Corpus developed by the DSP Laboratory of the University of the Philippines-Diliman. The speech corpus was both used in training and testing the system by estimating the parameters for phonetic HMM-based (Hidden-Markov Model) acoustic models. Experiments on different mixture-weights were incorporated in the study. The phoneme-level word-based recognition of a 5-state HMM resulted in an average accuracy rate of 80.13 for a single-Gaussian mixture model, 81.13 after implementing a phoneme-alignment, and 87.19 for the increased Gaussian-mixture weight model. The highest accuracy rate of 88.70% was obtained from a 5-state model with 6 Gaussian mixtures.

Keywords: Filipino language, Hidden Markov Model, HTK system, speech recognition

Procedia PDF Downloads 445
2743 IoT Based Approach to Healthcare System for a Quadriplegic Patient Using EEG

Authors: R. Gautam, P. Sastha Kanagasabai, G. N. Rathna

Abstract:

The proposed healthcare system enables quadriplegic patients, people with severe motor disabilities to send commands to electronic devices and monitor their vitals. The growth of Brain-Computer-Interface (BCI) has led to rapid development in 'assistive systems' for the disabled called 'assistive domotics'. Brain-Computer-Interface is capable of reading the brainwaves of an individual and analyse it to obtain some meaningful data. This processed data can be used to assist people having speech disorders and sometimes people with limited locomotion to communicate. In this Project, Emotiv EPOC Headset is used to obtain the electroencephalogram (EEG). The obtained data is processed to communicate pre-defined commands over the internet to the desired mobile phone user. Other Vital Information like the heartbeat, blood pressure, ECG and body temperature are monitored and uploaded to the server. Data analytics enables physicians to scan databases for a specific illness. The Data is processed in Intel Edison, system on chip (SoC). Patient metrics are displayed via Intel IoT Analytics cloud service.

Keywords: brain computer interface, Intel Edison, Emotiv EPOC, IoT analytics, electroencephalogram

Procedia PDF Downloads 166
2742 The Effect of Culture on User Interface Design of Social Media- A Case Study on Preferences of Saudi Arabian on the Arabic User Interface of Facebook

Authors: Hana Almakky, Reza Sahandi, Jacqui Taylor

Abstract:

Social media continue to grow, and user interfaces may become more appealing if cultural characteristics are incorporated into their design. Facebook was designed in the west, and the original language was English. Subsequently, the words in the user interface were translated to other languages, including Arabic. Arabic words are written from right to left, and English is written from left to right. The translated version may misrepresent the original design and users preferences may influence their culture, which should be considered in the user interface design. Previous research indicates that users are more comfortable when interacting with a user interface, which relates to their own culture. Therefore, this paper, using a survey investigates the preferences of Saudi Arabian on the Arabic version of user interface of Facebook.

Keywords: culture, social media, user interface design, Facebook, Saudi Arabia

Procedia PDF Downloads 369
2741 Element-Independent Implementation for Method of Lagrange Multipliers

Authors: Gil-Eon Jeong, Sung-Kie Youn, K. C. Park

Abstract:

Treatment for the non-matching interface is an important computational issue. To handle this problem, the method of Lagrange multipliers including classical and localized versions are the most popular technique. It essentially imposes the interface compatibility conditions by introducing Lagrange multipliers. However, the numerical system becomes unstable and inefficient due to the Lagrange multipliers. The interface element-independent formulation that does not include the Lagrange multipliers can be obtained by modifying the independent variables mathematically. Through this modification, more efficient and stable system can be achieved while involving equivalent accuracy comparing with the conventional method. A numerical example is conducted to verify the validity of the presented method.

Keywords: element-independent formulation, interface coupling, methods of Lagrange multipliers, non-matching interface

Procedia PDF Downloads 386
2740 A Survey on Intelligent Connected-Vehicle Applications Based on Intercommunication Techniques in Smart Cities

Authors: B. Karabuluter, O. Karaduman

Abstract:

Connected-Vehicles consists of intelligent vehicles, each of which can communicate with each other. Smart Cities are the most prominent application area of intelligent vehicles that can communicate with each other. The most important goal that is desired to be realized in Smart Cities planned for facilitating people's lives is to make transportation more comfortable and safe with intelligent/autonomous/driverless vehicles communicating with each other. In order to ensure these, the city must have communication infrastructure in the first place, and the vehicles must have the features to communicate with this infrastructure and with each other. In this context, intelligent transport studies to solve all transportation and traffic problems in classical cities continue to increase rapidly. In this study, current connected-vehicle applications developed for smart cities are considered in terms of communication techniques, vehicular networking, IoT, urban transportation implementations, intelligent traffic management, road safety, self driving. Taxonomies and assessments performed in the work show the trend of studies in inter-vehicle communication systems in smart cities and they are contributing to by ensuring that the requirements in this area are revealed.

Keywords: smart city, connected vehicles, infrastructures, VANET, wireless communication, intelligent traffic management

Procedia PDF Downloads 499
2739 Automatic Speech Recognition Systems Performance Evaluation Using Word Error Rate Method

Authors: João Rato, Nuno Costa

Abstract:

The human verbal communication is a two-way process which requires a mutual understanding that will result in some considerations. This kind of communication, also called dialogue, besides the supposed human agents it can also be performed between human agents and machines. The interaction between Men and Machines, by means of a natural language, has an important role concerning the improvement of the communication between each other. Aiming at knowing the performance of some speech recognition systems, this document shows the results of the accomplished tests according to the Word Error Rate evaluation method. Besides that, it is also given a set of information linked to the systems of Man-Machine communication. After this work has been made, conclusions were drawn regarding the Speech Recognition Systems, among which it can be mentioned their poor performance concerning the voice interpretation in noisy environments.

Keywords: automatic speech recognition, man-machine conversation, speech recognition, spoken dialogue systems, word error rate

Procedia PDF Downloads 294
2738 Multi-Granularity Feature Extraction and Optimization for Pathological Speech Intelligibility Evaluation

Authors: Chunying Fang, Haifeng Li, Lin Ma, Mancai Zhang

Abstract:

Speech intelligibility assessment is an important measure to evaluate the functional outcomes of surgical and non-surgical treatment, speech therapy and rehabilitation. The assessment of pathological speech plays an important role in assisting the experts. Pathological speech usually is non-stationary and mutational, in this paper, we describe a multi-granularity combined feature schemes, and which is optimized by hierarchical visual method. First of all, the difference granularity level pathological features are extracted which are BAFS (Basic acoustics feature set), local spectral characteristics MSCC (Mel s-transform cepstrum coefficients) and nonlinear dynamic characteristics based on chaotic analysis. Latterly, radar chart and F-score are proposed to optimize the features by the hierarchical visual fusion. The feature set could be optimized from 526 to 96-dimensions.The experimental results denote that new features by support vector machine (SVM) has the best performance, with a recognition rate of 84.4% on NKI-CCRT corpus. The proposed method is thus approved to be effective and reliable for pathological speech intelligibility evaluation.

Keywords: pathological speech, multi-granularity feature, MSCC (Mel s-transform cepstrum coefficients), F-score, radar chart

Procedia PDF Downloads 261
2737 Status of Communication and Swallowing Therapy in Patient with a Tracheostomy

Authors: Ya-Hui Wang

Abstract:

Lower speech therapy rate of tracheostomized patient was noted in comparison with previous researches. This study is aim to shed light on the referral status of speech therapy in those patients in Taiwan. This study developed an analysis for the size and key characteristics of the population of tracheostomized in-patient in the Taiwan. Method: We analyzed National Healthcare Insurance data (The Collaboration Center of Health Information Application, CCHIA) from Jan 1 2010 to Dec 31 2010. Result: over ages 3, number of tracheostomized in-patient is directly proportional to age. A high service loading was observed in North region in comparison with other regions. Only 4.87% of the tracheostomized in-patients were referred for speech therapy, and 1.9% for swallow examination, 2.5% for communication evaluation.

Keywords: refer, speech therapy, training, rehabilitation

Procedia PDF Downloads 417
2736 Visual Speech Perception of Arabic Emphatics

Authors: Maha Saliba Foster

Abstract:

Speech perception has been recognized as a bi-sensory process involving the auditory and visual channels. Compared to the auditory modality, the contribution of the visual signal to speech perception is not very well understood. Studying how the visual modality affects speech recognition can have pedagogical implications in second language learning, as well as clinical application in speech therapy. The current investigation explores the potential effect of speech visual cues on the perception of Arabic emphatics (AEs). The corpus consists of 36 minimal pairs each containing two contrasting consonants, an AE versus a non-emphatic (NE). Movies of four Lebanese speakers were edited to allow perceivers to have partial view of facial regions: lips only, lips-cheeks, lips-chin, lips-cheeks-chin, lips-cheeks-chin-neck. In the absence of any auditory information and relying solely on visual speech, perceivers were above chance at correctly identifying AEs or NEs across vowel contexts; moreover, the models were able to predict the probability of perceivers’ accuracy in identifying some of the COIs produced by certain speakers; additionally, results showed an overlap between the measurements selected by the computer and those selected by human perceivers. The lack of significant face effect on the perception of AEs seems to point to the lips, present in all of the videos, as the most important and often sufficient facial feature for emphasis recognition. Future investigations will aim at refining the analyses of visual cues used by perceivers by using Principal Component Analysis and including time evolution of facial feature measurements.

Keywords: Arabic emphatics, machine learning, speech perception, visual speech perception

Procedia PDF Downloads 278
2735 Theoretical Analysis of Graded Interface CdS/CIGS Solar Cell

Authors: Hassane Ben Slimane, Dennai Benmoussa, Abderrachid Helmaoui

Abstract:

We have theoretically calculated the photovoltaic conversion efficiency of a graded interface CdS/CIGS solar cell, which can be experimentally fabricated. Because the conduction band discontinuity or spike in an abrupt heterojunction CdS/CIGS solar cell can hinder the separation of hole-electron by electric field, a graded interface layer is uses to eliminate the spike and reduces recombination in space charge region. This paper describes the role of the graded band gap interface layer in decreasing the performance of the heterojunction cell. By optimizing the thickness of the graded region, an improvement of conversion efficiency has been observed in comparison to the conventional CIGS system.

Keywords: heterojunction, solar cell, graded interface, CIGS

Procedia PDF Downloads 375