Search results for: Speech denoising
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 290

Search results for: Speech denoising

110 Improved Closed Set Text-Independent Speaker Identification by Combining MFCC with Evidence from Flipped Filter Banks

Authors: Sandipan Chakroborty, Anindya Roy, Goutam Saha

Abstract:

A state of the art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC) modeled on the human auditory system has been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, it captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker specific cues present in the higher frequency zone. Unlike high level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases namely YOHO (microphone speech) and POLYCOST (telephone speech) with Gaussian Mixture Models (GMM) as a Classifier for various model orders.

Keywords: Complementary Information, Filter Bank, GMM, IMFCC, MFCC, Speaker Identification, Speaker Recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2294
109 Multi Switched Split Vector Quantizer

Authors: M. Satya Sai Ram, P. Siddaiah, M. Madhavi Latha

Abstract:

Vector quantization is a powerful tool for speech coding applications. This paper deals with LPC Coding of speech signals which uses a new technique called Multi Switched Split Vector Quantization, This is a hybrid of two product code vector quantization techniques namely the Multi stage vector quantization technique, and Switched split vector quantization technique,. Multi Switched Split Vector Quantization technique quantizes the linear predictive coefficients in terms of line spectral frequencies. From results it is proved that Multi Switched Split Vector Quantization provides better trade off between bitrate and spectral distortion performance, computational complexity and memory requirements when compared to Switched Split Vector Quantization, Multi stage vector quantization, and Split Vector Quantization techniques. By employing the switching technique at each stage of the vector quantizer the spectral distortion, computational complexity and memory requirements were greatly reduced. Spectral distortion was measured in dB, Computational complexity was measured in floating point operations (flops), and memory requirements was measured in (floats).

Keywords: Unconstrained vector quantization, Linear predictiveCoding, Split vector quantization, Multi stage vector quantization, Switched Split vector quantization, Line Spectral Frequencies.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1740
108 Denoising by Spatial Domain Averaging for Wireless Local Area Network Terminal Localization

Authors: Diego Felix, Eugene Hyun, Michael McGuire, Mihai Sima

Abstract:

Terminal localization for indoor Wireless Local Area Networks (WLANs) is critical for the deployment of location-aware computing inside of buildings. A major challenge is obtaining high localization accuracy in presence of fluctuations of the received signal strength (RSS) measurements caused by multipath fading. This paper focuses on reducing the effect of the distance-varying noise by spatial filtering of the measured RSS. Two different survey point geometries are tested with the noise reduction technique: survey points arranged in sets of clusters and survey points uniformly distributed over the network area. The results show that the location accuracy improves by 16% when the filter is used and by 18% when the filter is applied to a clustered survey set as opposed to a straight-line survey set. The estimated locations are within 2 m of the true location, which indicates that clustering the survey points provides better localization accuracy due to superior noise removal.

Keywords: Position measurement, Wireless LAN, Radio navigation, Filtering

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1520
107 Comparison among Various Question Generations for Decision Tree Based State Tying in Persian Language

Authors: Nasibeh Nasiri, Dawood Talebi Khanmiri

Abstract:

Performance of any continuous speech recognition system is highly dependent on performance of the acoustic models. Generally, development of the robust spoken language technology relies on the availability of large amounts of data. Common way to cope with little data for training each state of Markov models is treebased state tying. This tying method applies contextual questions to tie states. Manual procedure for question generation suffers from human errors and is time consuming. Various automatically generated questions are used to construct decision tree. There are three approaches to generate questions to construct HMMs based on decision tree. One approach is based on misrecognized phonemes, another approach basically uses feature table and the other is based on state distributions corresponding to context-independent subword units. In this paper, all these methods of automatic question generation are applied to the decision tree on FARSDAT corpus in Persian language and their results are compared with those of manually generated questions. The results show that automatically generated questions yield much better results and can replace manually generated questions in Persian language.

Keywords: Decision Tree, Markov Models, Speech Recognition, State Tying.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1722
106 Robust Statistics Based Algorithm to Remove Salt and Pepper Noise in Images

Authors: V.R.Vijaykumar, P.T.Vanathi, P.Kanagasabapathy, D.Ebenezer

Abstract:

In this paper, a robust statistics based filter to remove salt and pepper noise in digital images is presented. The function of the algorithm is to detect the corrupted pixels first since the impulse noise only affect certain pixels in the image and the remaining pixels are uncorrupted. The corrupted pixels are replaced by an estimated value using the proposed robust statistics based filter. The proposed method perform well in removing low to medium density impulse noise with detail preservation upto a noise density of 70% compared to standard median filter, weighted median filter, recursive weighted median filter, progressive switching median filter, signal dependent rank ordered mean filter, adaptive median filter and recently proposed decision based algorithm. The visual and quantitative results show the proposed algorithm outperforms in restoring the original image with superior preservation of edges and better suppression of impulse noise

Keywords: Image denoising, Nonlinear filter, Robust Statistics, and Salt and Pepper Noise.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2201
105 Hand Gesture Interpretation Using Sensing Glove Integrated with Machine Learning Algorithms

Authors: Aqsa Ali, Aleem Mushtaq, Attaullah Memon, Monna

Abstract:

In this paper, we present a low cost design for a smart glove that can perform sign language recognition to assist the speech impaired people. Specifically, we have designed and developed an Assistive Hand Gesture Interpreter that recognizes hand movements relevant to the American Sign Language (ASL) and translates them into text for display on a Thin-Film-Transistor Liquid Crystal Display (TFT LCD) screen as well as synthetic speech. Linear Bayes Classifiers and Multilayer Neural Networks have been used to classify 11 feature vectors obtained from the sensors on the glove into one of the 27 ASL alphabets and a predefined gesture for space. Three types of features are used; bending using six bend sensors, orientation in three dimensions using accelerometers and contacts at vital points using contact sensors. To gauge the performance of the presented design, the training database was prepared using five volunteers. The accuracy of the current version on the prepared dataset was found to be up to 99.3% for target user. The solution combines electronics, e-textile technology, sensor technology, embedded system and machine learning techniques to build a low cost wearable glove that is scrupulous, elegant and portable.

Keywords: American sign language, assistive hand gesture interpreter, human-machine interface, machine learning, sensing glove.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2731
104 Text-independent Speaker Identification Based on MAP Channel Compensation and Pitch-dependent Features

Authors: Jiqing Han, Rongchun Gao

Abstract:

One major source of performance decline in speaker recognition system is channel mismatch between training and testing. This paper focuses on improving channel robustness of speaker recognition system in two aspects of channel compensation technique and channel robust features. The system is text-independent speaker identification system based on two-stage recognition. In the aspect of channel compensation technique, this paper applies MAP (Maximum A Posterior Probability) channel compensation technique, which was used in speech recognition, to speaker recognition system. In the aspect of channel robust features, this paper introduces pitch-dependent features and pitch-dependent speaker model for the second stage recognition. Based on the first stage recognition to testing speech using GMM (Gaussian Mixture Model), the system uses GMM scores to decide if it needs to be recognized again. If it needs to, the system selects a few speakers from all of the speakers who participate in the first stage recognition for the second stage recognition. For each selected speaker, the system obtains 3 pitch-dependent results from his pitch-dependent speaker model, and then uses ANN (Artificial Neural Network) to unite the 3 pitch-dependent results and 1 GMM score for getting a fused result. The system makes the second stage recognition based on these fused results. The experiments show that the correct rate of two-stage recognition system based on MAP channel compensation technique and pitch-dependent features is 41.7% better than the baseline system for closed-set test.

Keywords: Channel Compensation, Channel Robustness, MAP, Speaker Identification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1544
103 Teaching Turn-Taking Rules and Pragmatic Principles to Empower EFL Students and Enhance Their Learning in Speaking Modules

Authors: O. F. Elkommos

Abstract:

Teaching and learning EFL speaking modules is one of the most challenging productive modules for both instructors and learners. In a student-centered interactive communicative language teaching approach, learners and instructors should be aware of the fact that the target language must be taught as/for communication. The student must be empowered by tools that will work on more than one level of their communicative competence. Communicative learning will need a teaching and learning methodology that will address the goal. Teaching turn-taking rules, pragmatic principles and speech acts will enhance students' sociolinguistic competence, strategic competence together with discourse competence. Sociolinguistic competence entails the mastering of speech act conventions and illocutionary acts of refusing, agreeing/disagreeing; emotive acts like, thanking, apologizing, inviting, offering; directives like, ordering, requesting, advising, and hinting, among others. Strategic competence includes enlightening students’ consciousness of the various particular turn-taking systemic rules of organizing techniques of opening and closing conversation, adjacency pairs, interrupting, back-channeling, asking for/giving opinion, agreeing/disagreeing, using natural fillers for pauses, gaps, speaker select, self-select, and silence among others. Students will have the tools to manage a conversation. Students are engaged in opportunities of experiencing the natural language not as a mere extra student talking time but rather an empowerment of knowing and using the strategies. They will have the component items they need to use as well as the opportunity to communicate in the target language using topics of their interest and choice. This enhances students' communicative abilities. Available websites and textbooks now use one or more of these tools of turn-taking or pragmatics. These will be students' support in self-study in their independent learning study hours. This will be their reinforcement practice on e-Learning interactive activities. The students' target is to be able to communicate the intended meaning to an addressee that is in turn able to infer that intended meaning. The combination of these tools will be assertive and encouraging to the student to beat the struggle with what to say, how to say it, and when to say it. Teaching the rules, principles and techniques is an act of awareness raising method engaging students in activities that will lead to their pragmatic discourse competence. The aim of the paper is to show how the suggested pragmatic model will empower students with tools and systems that would support their learning. Supporting students with turn taking rules, speech act theory, applying both to texts and practical analysis and using it in speaking classes empowers students’ pragmatic discourse competence and assists them to understand language and its context. They become more spontaneous and ready to learn the discourse pragmatic dimension of the speaking techniques and suitable content. Students showed a better performance and a good motivation to learn. The model is therefore suggested for speaking modules in EFL classes.

Keywords: Communicative competence, EFL, empowering learners, enhance learning, speech acts, teaching speaking, turn-taking, learner centered, pragmatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1403
102 An Automatic Pipeline Monitoring System Based on PCA and SVM

Authors: C. Wan, A. Mita

Abstract:

This paper proposes a novel system for monitoring the health of underground pipelines. Some of these pipelines transport dangerous contents and any damage incurred might have catastrophic consequences. However, most of these damage are unintentional and usually a result of surrounding construction activities. In order to prevent these potential damages, monitoring systems are indispensable. This paper focuses on acoustically recognizing road cutters since they prelude most construction activities in modern cities. Acoustic recognition can be easily achieved by installing a distributed computing sensor network along the pipelines and using smart sensors to “listen" for potential threat; if there is a real threat, raise some form of alarm. For efficient pipeline monitoring, a novel monitoring approach is proposed. Principal Component Analysis (PCA) was studied and applied. Eigenvalues were regarded as the special signature that could characterize a sound sample, and were thus used for the feature vector for sound recognition. The denoising ability of PCA could make it robust to noise interference. One class SVM was used for classifier. On-site experiment results show that the proposed PCA and SVM based acoustic recognition system will be very effective with a low tendency for raising false alarms.

Keywords: One class SVM, pipeline monitoring system, principal component analysis, sound recognition, third party damage.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2016
101 Tidal Data Analysis using ANN

Authors: Ritu Vijay, Rekha Govil

Abstract:

The design of a complete expansion that allows for compact representation of certain relevant classes of signals is a central problem in signal processing applications. Achieving such a representation means knowing the signal features for the purpose of denoising, classification, interpolation and forecasting. Multilayer Neural Networks are relatively a new class of techniques that are mathematically proven to approximate any continuous function arbitrarily well. Radial Basis Function Networks, which make use of Gaussian activation function, are also shown to be a universal approximator. In this age of ever-increasing digitization in the storage, processing, analysis and communication of information, there are numerous examples of applications where one needs to construct a continuously defined function or numerical algorithm to approximate, represent and reconstruct the given discrete data of a signal. Many a times one wishes to manipulate the data in a way that requires information not included explicitly in the data, which is done through interpolation and/or extrapolation. Tidal data are a very perfect example of time series and many statistical techniques have been applied for tidal data analysis and representation. ANN is recent addition to such techniques. In the present paper we describe the time series representation capabilities of a special type of ANN- Radial Basis Function networks and present the results of tidal data representation using RBF. Tidal data analysis & representation is one of the important requirements in marine science for forecasting.

Keywords: ANN, RBF, Tidal Data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1656
100 Discrete and Stationary Adaptive Sub-Band Threshold Method for Improving Image Resolution

Authors: P. Joyce Beryl Princess, Y. Harold Robinson

Abstract:

Image Processing is a structure of Signal Processing for which the input is the image and the output is also an image or parameter of the image. Image Resolution has been frequently referred as an important aspect of an image. In Image Resolution Enhancement, images are being processed in order to obtain more enhanced resolution. To generate highly resoluted image for a low resoluted input image with high PSNR value. Stationary Wavelet Transform is used for Edge Detection and minimize the loss occurs during Downsampling. Inverse Discrete Wavelet Transform is to get highly resoluted image. Highly resoluted output is generated from the Low resolution input with high quality. Noisy input will generate output with low PSNR value. So Noisy resolution enhancement technique has been used for adaptive sub-band thresholding is used. Downsampling in each of the DWT subbands causes information loss in the respective subbands. SWT is employed to minimize this loss. Inverse Discrete wavelet transform (IDWT) is to convert the object which is downsampled using DWT into a highly resoluted object. Used Image denoising and resolution enhancement techniques will generate image with high PSNR value. Our Proposed method will improve Image Resolution and reached the optimized threshold.

Keywords: Image Processing, Inverse Discrete wavelet transform, PSNR.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1790
99 A Trainable Neural Network Ensemble for ECG Beat Classification

Authors: Atena Sajedin, Shokoufeh Zakernejad, Soheil Faridi, Mehrdad Javadi, Reza Ebrahimpour

Abstract:

This paper illustrates the use of a combined neural network model for classification of electrocardiogram (ECG) beats. We present a trainable neural network ensemble approach to develop customized electrocardiogram beat classifier in an effort to further improve the performance of ECG processing and to offer individualized health care. We process a three stage technique for detection of premature ventricular contraction (PVC) from normal beats and other heart diseases. This method includes a denoising, a feature extraction and a classification. At first we investigate the application of stationary wavelet transform (SWT) for noise reduction of the electrocardiogram (ECG) signals. Then feature extraction module extracts 10 ECG morphological features and one timing interval feature. Then a number of multilayer perceptrons (MLPs) neural networks with different topologies are designed. The performance of the different combination methods as well as the efficiency of the whole system is presented. Among them, Stacked Generalization as a proposed trainable combined neural network model possesses the highest recognition rate of around 95%. Therefore, this network proves to be a suitable candidate in ECG signal diagnosis systems. ECG samples attributing to the different ECG beat types were extracted from the MIT-BIH arrhythmia database for the study.

Keywords: ECG beat Classification; Combining Classifiers;Premature Ventricular Contraction (PVC); Multi Layer Perceptrons;Wavelet Transform

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2215
98 Performance Evaluation of an ANC-based Hybrid Algorithm for Multi-target Wideband Active Sonar Echolocation System

Authors: Jason Chien-Hsun Tseng

Abstract:

This paper evaluates performances of an adaptive noise cancelling (ANC) based target detection algorithm on a set of real test data supported by the Defense Evaluation Research Agency (DERA UK) for multi-target wideband active sonar echolocation system. The hybrid algorithm proposed is a combination of an adaptive ANC neuro-fuzzy scheme in the first instance and followed by an iterative optimum target motion estimation (TME) scheme. The neuro-fuzzy scheme is based on the adaptive noise cancelling concept with the core processor of ANFIS (adaptive neuro-fuzzy inference system) to provide an effective fine tuned signal. The resultant output is then sent as an input to the optimum TME scheme composed of twogauge trimmed-mean (TM) levelization, discrete wavelet denoising (WDeN), and optimal continuous wavelet transform (CWT) for further denosing and targets identification. Its aim is to recover the contact signals in an effective and efficient manner and then determine the Doppler motion (radial range, velocity and acceleration) at very low signal-to-noise ratio (SNR). Quantitative results have shown that the hybrid algorithm have excellent performance in predicting targets- Doppler motion within various target strength with the maximum false detection of 1.5%.

Keywords: Wideband Active Sonar Echolocation, ANC Neuro-Fuzzy, Wavelet Denoise, CWT, Hybrid Algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2058
97 Evaluation of Pragmatic Information in an English Textbook: Focus on Requests

Authors: Israa A. Qari

Abstract:

Learning to request in a foreign language is a key ability within pragmatics language teaching. This paper examines how requests are taught in English Unlimited Book 3 (Cambridge University Press), an EFL textbook series employed by King Abdulaziz University in Jeddah, Saudi Arabia to teach advanced foundation year students English. The focus of analysis is the evaluation of the request linguistic strategies present in the textbook, frequency of the use of these strategies, and the contextual information provided on the use of these linguistic forms. The researcher collected all the linguistic forms which consisted of the request speech act and divided them into levels employing the CCSARP request coding manual. Findings demonstrated that simple and commonly employed request strategies are introduced. Looking closely at the exercises throughout the chapters, it was noticeable that the book exclusively employed the most direct form of requesting (the imperative) when giving learners instructions: e.g. listen, write, ask, answer, read, look, complete, choose, talk, think, etc. The book also made use of some other request strategies such as ‘hedged performatives’ and ‘query preparatory’. However, it was also found that many strategies were not dealt with in the book, specifically strategies with combined functions (e.g. possibility, ability). On a sociopragmatic level, a strong focus was found to exist on standard situations in which relations between the requester and requestee are clear. In general, contextual information was communicated implicitly only. The textbook did not seem to differentiate between formal and informal request contexts (register) which might consequently impel students to overgeneralize. The paper closes with some recommendations for textbook and curriculum designers. Findings are also contrasted with previous results from similar body of research on EFL requests.

Keywords: EFL, Requests, Saudi, speech acts, textbook evaluation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 452
96 Enhancing Temporal Extrapolation of Wind Speed Using a Hybrid Technique: A Case Study in West Coast of Denmark

Authors: B. Elshafei, X. Mao

Abstract:

The demand for renewable energy is significantly increasing, major investments are being supplied to the wind power generation industry as a leading source of clean energy. The wind energy sector is entirely dependable and driven by the prediction of wind speed, which by the nature of wind is very stochastic and widely random. This s0tudy employs deep multi-fidelity Gaussian process regression, used to predict wind speeds for medium term time horizons. Data of the RUNE experiment in the west coast of Denmark were provided by the Technical University of Denmark, which represent the wind speed across the study area from the period between December 2015 and March 2016. The study aims to investigate the effect of pre-processing the data by denoising the signal using empirical wavelet transform (EWT) and engaging the vector components of wind speed to increase the number of input data layers for data fusion using deep multi-fidelity Gaussian process regression (GPR). The outcomes were compared using root mean square error (RMSE) and the results demonstrated a significant increase in the accuracy of predictions which demonstrated that using vector components of the wind speed as additional predictors exhibits more accurate predictions than strategies that ignore them, reflecting the importance of the inclusion of all sub data and pre-processing signals for wind speed forecasting models.

Keywords: Data fusion, Gaussian process regression, signal denoise, temporal extrapolation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 501
95 Ant Colony Optimization for Feature Subset Selection

Authors: Ahmed Al-Ani

Abstract:

The Ant Colony Optimization (ACO) is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It has recently attracted a lot of attention and has been successfully applied to a number of different optimization problems. Due to the importance of the feature selection problem and the potential of ACO, this paper presents a novel method that utilizes the ACO algorithm to implement a feature subset search procedure. Initial results obtained using the classification of speech segments are very promising.

Keywords: Ant Colony Optimization, ant systems, feature selection, pattern recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3141
94 Medical Image Segmentation Based On Vigorous Smoothing and Edge Detection Ideology

Authors: Jagadish H. Pujar, Pallavi S. Gurjal, Shambhavi D. S, Kiran S. Kunnur

Abstract:

Medical image segmentation based on image smoothing followed by edge detection assumes a great degree of importance in the field of Image Processing. In this regard, this paper proposes a novel algorithm for medical image segmentation based on vigorous smoothening by identifying the type of noise and edge diction ideology which seems to be a boom in medical image diagnosis. The main objective of this algorithm is to consider a particular medical image as input and make the preprocessing to remove the noise content by employing suitable filter after identifying the type of noise and finally carrying out edge detection for image segmentation. The algorithm consists of three parts. First, identifying the type of noise present in the medical image as additive, multiplicative or impulsive by analysis of local histograms and denoising it by employing Median, Gaussian or Frost filter. Second, edge detection of the filtered medical image is carried out using Canny edge detection technique. And third part is about the segmentation of edge detected medical image by the method of Normalized Cut Eigen Vectors. The method is validated through experiments on real images. The proposed algorithm has been simulated on MATLAB platform. The results obtained by the simulation shows that the proposed algorithm is very effective which can deal with low quality or marginal vague images which has high spatial redundancy, low contrast and biggish noise, and has a potential of certain practical use of medical image diagnosis.

Keywords: Image Segmentation, Image smoothing, Edge Detection, Impulsive noise, Gaussian noise, Median filter, Canny edge, Eigen values, Eigen vector.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1910
93 Numerical Simulations of Acoustic Imaging in Hydrodynamic Tunnel with Model Adaptation and Boundary Layer Noise Reduction

Authors: Sylvain Amailland, Jean-Hugh Thomas, Charles PĂ©zerat, Romuald Boucheron, Jean-Claude Pascal

Abstract:

The noise requirements for naval and research vessels have seen an increasing demand for quieter ships in order to fulfil current regulations and to reduce the effects on marine life. Hence, new methods dedicated to the characterization of propeller noise, which is the main source of noise in the far-field, are needed. The study of cavitating propellers in closed-section is interesting for analyzing hydrodynamic performance but could involve significant difficulties for hydroacoustic study, especially due to reverberation and boundary layer noise in the tunnel. The aim of this paper is to present a numerical methodology for the identification of hydroacoustic sources on marine propellers using hydrophone arrays in a large hydrodynamic tunnel. The main difficulties are linked to the reverberation of the tunnel and the boundary layer noise that strongly reduce the signal-to-noise ratio. In this paper it is proposed to estimate the reflection coefficients using an inverse method and some reference transfer functions measured in the tunnel. This approach allows to reduce the uncertainties of the propagation model used in the inverse problem. In order to reduce the boundary layer noise, a cleaning algorithm taking advantage of the low rank and sparse structure of the cross-spectrum matrices of the acoustic and the boundary layer noise is presented. This approach allows to recover the acoustic signal even well under the boundary layer noise. The improvement brought by this method is visible on acoustic maps resulting from beamforming and DAMAS algorithms.

Keywords: Acoustic imaging, boundary layer noise denoising, inverse problems, model adaptation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 974
92 Contextual SenSe Model: Word Sense Disambiguation Using Sense and Sense Value of Context Surrounding the Target

Authors: Vishal Raj, Noorhan Abbas

Abstract:

Ambiguity in NLP (Natural Language Processing) refers to the ability of a word, phrase, sentence, or text to have multiple meanings. This results in various kinds of ambiguities such as lexical, syntactic, semantic, anaphoric and referential. This study is focused mainly on solving the issue of Lexical ambiguity. Word Sense Disambiguation (WSD) is an NLP technique that aims to resolve lexical ambiguity by determining the correct meaning of a word within a given context. Most WSD solutions rely on words for training and testing, but we have used lemma and Part of Speech (POS) tokens of words for training and testing. Lemma adds generality and POS adds properties of word into token. We have designed a method to create an affinity matrix to calculate the affinity between any pair of lemma_POS (a token where lemma and POS of word are joined by underscore) of given training set. Additionally, we have devised an algorithm to create the sense clusters of tokens using affinity matrix under hierarchy of POS of lemma. Furthermore, three different mechanisms to predict the sense of target word using the affinity/similarity value are devised. Each contextual token contributes to the sense of target word with some value and whichever sense gets higher value becomes the sense of target word. So, contextual tokens play a key role in creating sense clusters and predicting the sense of target word, hence, the model is named Contextual SenSe Model (CSM). CSM exhibits a noteworthy simplicity and explication lucidity in contrast to contemporary deep learning models characterized by intricacy, time-intensive processes, and challenging explication. CSM is trained on SemCor training data and evaluated on SemEval test dataset. The results indicate that despite the naivety of the method, it achieves promising results when compared to the Most Frequent Sense (MFS) model.

Keywords: Word Sense Disambiguation, WSD, Contextual SenSe Model, Most Frequent Sense, part of speech, POS, Natural Language Processing, NLP, OOV, out of vocabulary, ELMo, Embeddings from Language Model, BERT, Bidirectional Encoder Representations from Transformers, Word2Vec, lemma_POS, Algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 382
91 A Study on the Differential Diagnostic Model for Newborn Hearing Loss Screening

Authors: Chun-Lang Chang

Abstract:

According to the statistics, the prevalence of congenital hearing loss in Taiwan is approximately six thousandths; furthermore, one thousandths of infants have severe hearing impairment. Hearing ability during infancy has significant impact in the development of children-s oral expressions, language maturity, cognitive performance, education ability and social behaviors in the future. Although most children born with hearing impairment have sensorineural hearing loss, almost every child more or less still retains some residual hearing. If provided with a hearing aid or cochlear implant (a bionic ear) timely in addition to hearing speech training, even severely hearing-impaired children can still learn to talk. On the other hand, those who failed to be diagnosed and thus unable to begin hearing and speech rehabilitations on a timely manner might lose an important opportunity to live a complete and healthy life. Eventually, the lack of hearing and speaking ability will affect the development of both mental and physical functions, intelligence, and social adaptability. Not only will this problem result in an irreparable regret to the hearing-impaired child for the life time, but also create a heavy burden for the family and society. Therefore, it is necessary to establish a set of computer-assisted predictive model that can accurately detect and help diagnose newborn hearing loss so that early interventions can be provided timely to eliminate waste of medical resources. This study uses information from the neonatal database of the case hospital as the subjects, adopting two different analysis methods of using support vector machine (SVM) for model predictions and using logistic regression to conduct factor screening prior to model predictions in SVM to examine the results. The results indicate that prediction accuracy is as high as 96.43% when the factors are screened and selected through logistic regression. Hence, the model constructed in this study will have real help in clinical diagnosis for the physicians and actually beneficial to the early interventions of newborn hearing impairment.

Keywords: Data mining, Hearing impairment, Logistic regression analysis, Support vector machines

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1801
90 Hand Gesture Detection via EmguCV Canny Pruning

Authors: N. N. Mosola, S. J. Molete, L. S. Masoebe, M. Letsae

Abstract:

Hand gesture recognition is a technique used to locate, detect, and recognize a hand gesture. Detection and recognition are concepts of Artificial Intelligence (AI). AI concepts are applicable in Human Computer Interaction (HCI), Expert systems (ES), etc. Hand gesture recognition can be used in sign language interpretation. Sign language is a visual communication tool. This tool is used mostly by deaf societies and those with speech disorder. Communication barriers exist when societies with speech disorder interact with others. This research aims to build a hand recognition system for Lesotho’s Sesotho and English language interpretation. The system will help to bridge the communication problems encountered by the mentioned societies. The system has various processing modules. The modules consist of a hand detection engine, image processing engine, feature extraction, and sign recognition. Detection is a process of identifying an object. The proposed system uses Canny pruning Haar and Haarcascade detection algorithms. Canny pruning implements the Canny edge detection. This is an optimal image processing algorithm. It is used to detect edges of an object. The system employs a skin detection algorithm. The skin detection performs background subtraction, computes the convex hull, and the centroid to assist in the detection process. Recognition is a process of gesture classification. Template matching classifies each hand gesture in real-time. The system was tested using various experiments. The results obtained show that time, distance, and light are factors that affect the rate of detection and ultimately recognition. Detection rate is directly proportional to the distance of the hand from the camera. Different lighting conditions were considered. The more the light intensity, the faster the detection rate. Based on the results obtained from this research, the applied methodologies are efficient and provide a plausible solution towards a light-weight, inexpensive system which can be used for sign language interpretation.

Keywords: Canny pruning, hand recognition, machine learning, skin tracking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1308
89 Structural Parsing of Natural Language Text in Tamil Using Phrase Structure Hybrid Language Model

Authors: Selvam M, Natarajan. A M, Thangarajan R

Abstract:

Parsing is important in Linguistics and Natural Language Processing to understand the syntax and semantics of a natural language grammar. Parsing natural language text is challenging because of the problems like ambiguity and inefficiency. Also the interpretation of natural language text depends on context based techniques. A probabilistic component is essential to resolve ambiguity in both syntax and semantics thereby increasing accuracy and efficiency of the parser. Tamil language has some inherent features which are more challenging. In order to obtain the solutions, lexicalized and statistical approach is to be applied in the parsing with the aid of a language model. Statistical models mainly focus on semantics of the language which are suitable for large vocabulary tasks where as structural methods focus on syntax which models small vocabulary tasks. A statistical language model based on Trigram for Tamil language with medium vocabulary of 5000 words has been built. Though statistical parsing gives better performance through tri-gram probabilities and large vocabulary size, it has some disadvantages like focus on semantics rather than syntax, lack of support in free ordering of words and long term relationship. To overcome the disadvantages a structural component is to be incorporated in statistical language models which leads to the implementation of hybrid language models. This paper has attempted to build phrase structured hybrid language model which resolves above mentioned disadvantages. In the development of hybrid language model, new part of speech tag set for Tamil language has been developed with more than 500 tags which have the wider coverage. A phrase structured Treebank has been developed with 326 Tamil sentences which covers more than 5000 words. A hybrid language model has been trained with the phrase structured Treebank using immediate head parsing technique. Lexicalized and statistical parser which employs this hybrid language model and immediate head parsing technique gives better results than pure grammar and trigram based model.

Keywords: Hybrid Language Model, Immediate Head Parsing, Lexicalized and Statistical Parsing, Natural Language Processing, Parts of Speech, Probabilistic Context Free Grammar, Tamil Language, Tree Bank.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3642
88 Reading and Teaching Poetry as Communicative Discourse: A Pragma-Linguistic Approach

Authors: Omnia Elkommos

Abstract:

Language is communication on several discourse levels. The target of teaching a language and the literature of a foreign language is to communicate a message. Reading, appreciating, analysing, and interpreting poetry as a sophisticated rhetorical expression of human thoughts, emotions, and philosophical messages is more feasible through the use of linguistic pragmatic tools from a communicative discourse perspective. The poet's intention, speech act, illocutionary act, and perlocutionary goal can be better understood when communicative situational context as well as linguistic discourse structure theories are employed. The use of linguistic theories in the teaching of poetry is, therefore, intrinsic to students' comprehension, interpretation, and appreciation of poetry of the different ages. It is the purpose of this study to show how both teachers as well as students can apply these linguistic theories and tools to dramatic poetic texts for an engaging, enlightening, and effective interpretation and appreciation of the language. Theories drawn from areas of pragmatics, discourse analysis, embedded discourse level, communicative situational context, and other linguistic approaches were applied to selected poetry texts from the different centuries. Further, in a simple statistical count of the number of poems with dialogic dramatic discourse with embedded two or three levels of discourse in different anthologies outweighs the number of descriptive poems with a one level of discourse, between the poet and the reader. Poetry is thus discourse on one, two, or three levels. It is, therefore, recommended that teachers and students in the area of ESL/EFL use the linguistics theories for a better understanding of poetry as communicative discourse. The practice of applying these linguistic theories in classrooms and in research will allow them to perceive the language and its linguistic, social, and cultural aspect. Texts will become live illocutionary acts with a perlocutionary acts goal rather than mere literary texts in anthologies.

Keywords: Coda, commissives, communicative situation, context of culture, context of reference, context of utterance, dialogue, directives, discourse analysis, dramatic discourse interaction, duologue, embedded discourse levels, language for communication, linguistic structures, literary texts, poetry, pragmatic theories, reader response, speech acts (macro/micro), stylistics, teaching literature, TEFL, terms of address, turn-taking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1731
87 Signal Reconstruction Using Cepstrum of Higher Order Statistics

Authors: Adnan Al-Smadi, Mahmoud Smadi

Abstract:

This paper presents an algorithm for reconstructing phase and magnitude responses of the impulse response when only the output data are available. The system is driven by a zero-mean independent identically distributed (i.i.d) non-Gaussian sequence that is not observed. The additive noise is assumed to be Gaussian. This is an important and essential problem in many practical applications of various science and engineering areas such as biomedical, seismic, and speech processing signals. The method is based on evaluating the bicepstrum of the third-order statistics of the observed output data. Simulations results are presented that demonstrate the performance of this method.

Keywords: Cepstrum, bicepstrum, third order statistics

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2036
86 Trispectral Analysis of Voiced Sounds Defective Audition and Tracheotomisian Cases

Authors: H. Maalem, F. Marir

Abstract:

This paper presents the cepstral and trispectral analysis of a speech signal produced by normal men, men with defective audition (deaf, deep deaf) and others affected by tracheotomy, the trispectral analysis based on parametric methods (Autoregressive AR) using the fourth order cumulant. These analyses are used to detect and compare the pitches and the formants of corresponding voiced sounds (vowel \a\, \i\ and \u\). The first results appear promising, since- it seems after several experimentsthere is no deformation of the spectrum as one could have supposed it at the beginning, however these pathologies influenced the two characteristics: The defective audition influences to the formants contrary to the tracheotomy, which influences the fundamental frequency (pitch).

Keywords: Cepstrum, cumulant, defective audition, tracheotomisy, trispectrum.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1406
85 A Method for Quality Inspection of Motors by Detecting Abnormal Sound

Authors: Tadatsugu Kitamoto

Abstract:

Recently, a quality of motors is inspected by human ears. In this paper, I propose two systems using a method of speech recognition for automation of the inspection. The first system is based on a method of linear processing which uses K-means and Nearest Neighbor method, and the second is based on a method of non-linear processing which uses neural networks. I used motor sounds in these systems, and I successfully recognize 86.67% of motor sounds in the linear processing system and 97.78% in the non-linear processing system.

Keywords: Acoustical diagnosis, Neural networks, K-means, Short-time Fourier transformation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1699
84 Virtual Reality in COVID-19 Stroke Rehabilitation: Preliminary Outcomes

Authors: Kasra Afsahi, Maryam Soheilifar, S. Hossein Hosseini

Abstract:

Background: There is growing evidence that Cerebral Vascular Accident (CVA) can be a consequence of COVID-19 infection. Understanding novel treatment approaches is important in optimizing patient outcomes. Case: This case explores the use of Virtual Reality (VR) in the treatment of a 23-year-old COVID-positive female presenting with left hemiparesis in August 2020. Imaging showed right globus pallidus, thalamus, and internal capsule ischemic stroke. Conventional rehabilitation was started two weeks later, with VR included. This game-based VR technology developed for stroke patients was based on upper extremity exercises and functions for stroke. Physical examination showed left hemiparesis with muscle strength 3/5 in the upper extremity and 4/5 in the lower extremity. The range of motion of the shoulder was 90-100 degrees. The speech exam showed a mild decrease in fluency. Mild lower lip dynamic asymmetry was seen. Babinski was positive on the left. Gait speed was decreased (75 steps per minute). Intervention: Our game-based VR system was developed based on upper extremity physiotherapy exercises for post-stroke patients to increase the active, voluntary movement of the upper extremity joints and improve the function. The conventional program was initiated with active exercises, shoulder sanding for joint ROMs, walking shoulder, shoulder wheel, and combination movements of the shoulder, elbow, and wrist joints, alternative flexion-extension, pronation-supination movements, Pegboard and Purdo pegboard exercises. Also, fine movements included smart gloves, biofeedback, finger ladder, and writing. The difficulty of the game increased at each stage of the practice with progress in patient performances. Outcome: After 6 weeks of treatment, gait and speech were normal and upper extremity strength was improved to near normal status. No adverse effects were noted. Conclusion: This case suggests that VR is a useful tool in the treatment of a patient with COVID-19 related CVA. The safety of developed instruments for such cases provides approaches to improve the therapeutic outcomes and prognosis as well as increased satisfaction rate among patients.

Keywords: COVID-19, stroke, virtual reality, rehabilitation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 446
83 Improvement of MLLR Speaker Adaptation Using a Novel Method

Authors: Ing-Jr Ding

Abstract:

This paper presents a technical speaker adaptation method called WMLLR, which is based on maximum likelihood linear regression (MLLR). In MLLR, a linear regression-based transform which adapted the HMM mean vectors was calculated to maximize the likelihood of adaptation data. In this paper, the prior knowledge of the initial model is adequately incorporated into the adaptation. A series of speaker adaptation experiments are carried out at a 30 famous city names database to investigate the efficiency of the proposed method. Experimental results show that the WMLLR method outperforms the conventional MLLR method, especially when only few utterances from a new speaker are available for adaptation.

Keywords: hidden Markov model, maximum likelihood linearregression, speech recognition, speaker adaptation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1841
82 Extracting Multiword Expressions in Machine Translation from English to Urdu using Relational Data Approach

Authors: Kashif Bilal, Uzair Muhammad, Atif Khan, M. Nasir Khan

Abstract:

Machine Translation, (hereafter in this document referred to as the "MT") faces a lot of complex problems from its origination. Extracting multiword expressions is also one of the complex problems in MT. Finding multiword expressions during translating a sentence from English into Urdu, through existing solutions, takes a lot of time and occupies system resources. We have designed a simple relational data approach, in which we simply set a bit in dictionary (database) for multiword, to find and handle multiword expression. This approach handles multiword efficiently.

Keywords: Machine Translation, Multiword Expressions, Urdulanguage processing, POS (stands for Parts of Speech) Tagging forUrdu, Expert Systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2357
81 An Artificial Emotion Model For Visualizing Emotion of Characters

Authors: Junseok Ham, Chansun Jung, Junhyung Park, Jihye Ryeo, Ilju Ko

Abstract:

It is hard to express emotion through only speech when we watch a character in a movie or a play because we cannot estimate the size, kind, and quantity of emotion. So this paper proposes an artificial emotion model for visualizing current emotion with color and location in emotion model. The artificial emotion model is designed considering causality of generated emotion, difference of personality, difference of continual emotional stimulus, and co-relation of various emotions. This paper supposed the Emotion Field for visualizing current emotion with location, and current emotion is expressed by location and color in the Emotion Field. For visualizing changes within current emotion, the artificial emotion model is adjusted to characters in Hamlet.

Keywords: Emotion, Artificial Emotion, Visualizing, EmotionModel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1249