Search results for: Speech steganography
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 287

167 Extracting Tongue Shape Dynamics from Magnetic Resonance Image Sequences

Authors: María S. Avila-García, John N. Carter, Robert I. Damper

Abstract:

An important problem in speech research is the automatic extraction of information about the shape and dimensions of the vocal tract during real-time speech production. We have previously developed Southampton dynamic magnetic resonance imaging (SDMRI) as an approach to the solution of this problem. However, the SDMRI images are very noisy, so shape extraction is a major challenge. In this paper, we address the problem of tongue shape extraction, which poses difficulties because the tongue is a highly deforming non-parametric shape. We show that combining active shape models with the dynamic Hough transform allows the tongue shape to be reliably tracked in the image sequence.

Keywords: Vocal tract imaging, speech production, active shape models, dynamic Hough transform, object tracking.

Downloads: 1689
166 Influence of Loudness Compression on Hearing with Bone Anchored Hearing Implants

Authors: Anja Kurz, Marc Flynn, Tobias Good, Marco Caversaccio, Martin Kompis

Abstract:

Bone Anchored Hearing Implants (BAHI) are routinely used in patients with conductive or mixed hearing loss, e.g. if conventional air conduction hearing aids cannot be used. New sound processors and new fitting software now allow parameters such as loudness compression ratios or maximum power output to be adjusted separately. It is currently unclear how the choice of these parameters influences aided speech understanding in BAHI users. In this prospective experimental study, the effect of varying the compression ratio and lowering the maximum power output in a BAHI was investigated. Twelve experienced adult subjects with a mixed hearing loss participated in this study. Four different compression ratios (1.0; 1.3; 1.6; 2.0) were tested along with two different maximum power output settings, resulting in a total of eight different programs. Each participant tested each program for two weeks. A blinded Latin square design was used to minimize bias. For each of the eight programs, speech understanding in quiet and in noise was assessed. For speech in quiet, the Freiburg number test and the Freiburg monosyllabic word test at 50, 65, and 80 dB SPL were used. For speech in noise, the Oldenburg sentence test was administered. Speech understanding in quiet and in noise was improved significantly in the aided condition with every program when compared to the unaided condition. However, no significant differences were found between any of the eight programs. In contrast, on a subjective level there was a significant preference for medium compression ratios of 1.3 to 1.6 and higher maximum power output.

Keywords: Bone Anchored Hearing Implant, Compression, Maximum Power Output, Speech understanding.

Downloads: 2010
165 Comparison of Fricative Vocal Tract Transfer Functions Derived using Two Different Segmentation Techniques

Authors: K. S. Subari, C. H. Shadle, A. Barney, R. I. Damper

Abstract:

The acoustic and articulatory properties of fricative speech sounds are being studied using magnetic resonance imaging (MRI) and acoustic recordings from a single subject. Area functions were derived from a complete set of axial and coronal MR slices using two different methods: the Mermelstein technique and the Blum transform. Area functions derived from the two techniques were shown to differ significantly in some cases. Such differences will lead to different acoustic predictions and it is important to know which is the more accurate. The vocal tract acoustic transfer function (VTTF) was derived from these area functions for each fricative and compared with measured speech signals for the same fricative and same subject. The VTTFs for /f/ in two vowel contexts and the corresponding acoustic spectra are derived here; the Blum transform appears to show a better match between prediction and measurement than the Mermelstein technique.

Keywords: Area functions, fricatives, vocal tract transfer function, MRI, speech.

Downloads: 1613
164 Automotive 3-Microphone Noise Canceller in a Frequently Moving Noise Source Environment

Authors: Z. Qi, T. J. Moir

Abstract:

A combined three-microphone voice activity detector (VAD) and noise-canceling system is studied to enhance speech recognition in an automobile environment. A previous experiment clearly shows the ability of the composite system to cancel a single noise source outside of a defined zone. This paper investigates the performance of the composite system when there are frequently moving noise sources (noise sources that come from different locations but are not always present at the same time), e.g. speech from another passenger or from a radio while the desired speech is present. To work in an environment with frequently moving noise sources, the three-microphone VAD detects voice from a “VAD valid zone”, while the three-microphone noise canceller uses a “noise canceller valid zone” defined in free space around the user's head. A desired voice should therefore lie in the intersection of the noise canceller valid zone and the VAD valid zone, and all noise outside this intersection is suppressed. Experiments are shown for a real environment: all results were recorded in a car using omnidirectional electret condenser microphones.

Keywords: Signal processing, voice activity detection, noise canceller, microphone array beam forming.

Downloads: 1566
163 Robust Features for Impulsive Noisy Speech Recognition Using Relative Spectral Analysis

Authors: Hajer Rahali, Zied Hajaiej, Noureddine Ellouze

Abstract:

The goal of speech parameterization is to extract the relevant information about what is being spoken from the audio signal. In speech recognition systems, Mel-Frequency Cepstral Coefficients (MFCC) and Relative Spectral Mel-Frequency Cepstral Coefficients (RASTA-MFCC) are the two main techniques used. This paper presents some modifications to the original MFCC method. The effectiveness of the proposed changes to MFCC, called Modified Function Cepstral Coefficients (MODFCC), was tested and compared against the original MFCC and RASTA-MFCC features. Prosodic features such as jitter and shimmer are added to the baseline spectral features. The above-mentioned techniques were tested with impulsive signals under various noisy conditions within the AURORA databases.

Keywords: Auditory filter, impulsive noise, MFCC, prosodic features, RASTA filter.

Downloads: 2277
162 A Simple Adaptive Atomic Decomposition Voice Activity Detector Implemented by Matching Pursuit

Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic

Abstract:

A simple adaptive voice activity detector (VAD) is implemented using Gabor and gammatone atomic decomposition of speech for high Gaussian noise environments. Matching pursuit is used for atomic decomposition, and is shown to achieve optimal speech detection capability at high data compression rates for low signal-to-noise ratios. The most active dictionary elements found by matching pursuit are used for the signal reconstruction so that the algorithm adapts to the individual speaker's dominant time-frequency characteristics. Speech has a high peak-to-average ratio, enabling the greedy matching pursuit heuristic of highest inner products to isolate high-energy speech components in high noise environments. Gabor and gammatone atoms are both investigated with identical logarithmically spaced center frequencies and similar bandwidths. The algorithm performs equally well for both Gabor and gammatone atoms, with no statistically significant differences. The algorithm achieves 70% accuracy at a 0 dB SNR, 90% accuracy at a 5 dB SNR and 98% accuracy at a 20 dB SNR, using 30 dB SNR as a reference for voice activity.

Keywords: Atomic Decomposition, Gabor, Gammatone, Matching Pursuit, Voice Activity Detection.

Downloads: 1747
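
The greedy selection step described in entry 162 above can be sketched as follows. This is a minimal sketch, assuming a tiny hypothetical dictionary of unit-norm atoms rather than the Gabor or gammatone atoms used in the paper. The function name `matching_pursuit` and its parameters are illustrative, and the voice activity decision itself is not reproduced here.

```python
def matching_pursuit(signal, atoms, n_iter):
    """Greedy matching pursuit over a dictionary of unit-norm atoms.

    Returns the chosen (atom index, coefficient) pairs and the final residual.
    Assumption: every atom has Euclidean norm 1, so the inner product is
    directly the projection coefficient.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        # Inner product of the current residual with every atom.
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        # Greedy step: keep the atom with the largest absolute inner product.
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        picks.append((k, scores[k]))
        # Remove that atom's contribution from the residual.
        residual = [r - scores[k] * a for r, a in zip(residual, atoms[k])]
    return picks, residual


# Toy dictionary (hypothetical): the standard basis of R^3.
toy_atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_picks, toy_residual = matching_pursuit([0.0, 3.0, 1.0], toy_atoms, 2)
```

With this toy dictionary the highest-energy component is selected first, which mirrors the abstract's point that a high peak-to-average ratio lets the greedy heuristic isolate high-energy speech components.
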
161 Online Collaborative Learning System Using Speech Technology

Authors: Sid-Ahmed. Selouani, Tang-Ho Lê, Chadia Moghrabi, Benoit Lanteigne, Jean Roy

Abstract:

A Web-based learning tool, the Learn IN Context (LINC) system, designed for and used in some of the institution's courses in mixed-mode learning, is presented in this paper. This mode combines face-to-face and distance approaches to education. LINC can support both collaborative and competitive learning. In order to provide both learners and tutors with a more natural way to interact with e-learning applications, a conversational interface has been included in LINC. Hence, the components and essential features of LINC+, the voice-enhanced version of LINC, are described. We report evaluation experiments of LINC/LINC+ in a real use context of a computer programming course taught at the Université de Moncton (Canada). The findings show that when the learning material is delivered in the form of a collaborative and voice-enabled presentation, the majority of learners seem to be satisfied with this new medium, and confirm that it does not negatively affect their cognitive load.

Keywords: E-learning, Knowledge Network, Speech recognition, Speech synthesis.

Downloads: 1662
160 Hybrid Method Using Wavelets and Predictive Method for Compression of Speech Signal

Authors: Karima Siham Aoubid, Mohamed Boulemden

Abstract:

Signal compression algorithms are making impressive progress. These algorithms are continuously improved by new tools and aim to reduce, on average, the number of bits needed to represent the signal while minimizing the reconstruction error. This article proposes compressing Arabic speech signals with a hybrid method combining the wavelet transform and linear prediction. The adopted approach rests, on the one hand, on decomposing the original signal by means of analysis filters, followed by the compression stage, and, on the other hand, on applying order-5 linear prediction to the compressed signal coefficients. The aim of this approach is to estimate the prediction error, which is then coded and transmitted. The decoding operation is then used to reconstruct the original signal. An adequate choice of the filter bank for the transform is therefore necessary to increase the compression rate while introducing a distortion that is imperceptible from an auditory point of view.

Keywords: Compression, linear prediction analysis, multiresolution analysis, speech signal.

Downloads: 1280
159 Recognition of Noisy Words Using the Time Delay Neural Networks Approach

Authors: Khenfer-Koummich Fatima, Mesbahi Larbi, Hendel Fatiha

Abstract:

This paper presents a recognition system for isolated words such as robot commands, implemented with Time Delay Neural Networks (TDNN). The aim is to teleoperate a robot for specific tasks such as turn, close, etc. in an industrial environment, taking into account the noise coming from the machine. The choice of TDNN is based on its generalization in terms of accuracy; moreover, it acts as a filter that passes certain desirable frequency characteristics of speech. The goal is to determine the parameters of this filter so that the system adapts to the variability of the speech signal and especially to noise; for this, the back-propagation technique was used in the learning phase. The approach was applied to commands pronounced in two languages separately: French and Arabic. The results for two test bases of 300 spoken words each are 87% and 97.6% in a neutral environment, and 77.67% and 92.67% when white Gaussian noise was added with an SNR of 35 dB.

Keywords: Neural networks, Noise, Speech Recognition.

Downloads: 1892
158 High Securing Cover-File of Hidden Data Using Statistical Technique and AES Encryption Algorithm

Authors: A. A. Zaidan, Anas Majeed, B. B. Zaidan

Abstract:

Nowadays, the rapid development of multimedia and the internet allows for wide distribution of digital media data. It has become much easier to edit, modify and duplicate digital information. Digital documents are also easy to copy and distribute, and therefore face many threats. With the large flood of information and the development of digital formats, this is a major security and privacy issue, and it has become necessary to find appropriate protection because of the significance, accuracy and sensitivity of the information. Protection systems can now be classified more specifically as information hiding, information encryption, and a combination of hiding and encryption to increase information security. The strength of information hiding is due to the non-existence of standard algorithms for hiding secret messages. There is also randomness in hiding methods, such as combining several media (covers) with different methods to pass a secret message. In addition, there are no formal methods to follow to discover the hidden data. For this reason, the task of this research becomes difficult. In this paper, a new system of information hiding is presented. The proposed system aims to hide information (a data file) in any executable (EXE) file and to detect the hidden file; we present the implementation of a steganography system that embeds information in an executable file. EXE files have been investigated. The system tries to find a solution to the size of the cover file and to make it undetectable by anti-virus software.
The system includes two main functions. The first is hiding the information in a Portable Executable (EXE) file through four processes (specifying the cover file, specifying the information file, encrypting the information, and hiding the information). The second is extracting the hidden information through three processes (specifying the stego file, extracting the information, and decrypting the information). The system has achieved its main goals: the size of the information is independent of the size of the cover file, and the resulting file does not conflict with anti-virus software.

Keywords: Cryptography, Steganography, Portable Executable File.

Downloads: 1748
157 Speech Encryption and Decryption Using Linear Feedback Shift Register (LFSR)

Authors: Tin Lai Win, Nant Christina Kyaw

Abstract:

This paper considers the problem of cryptanalysis of stream ciphers. There is a need to improve the existing attacks on stream ciphers and to attempt to distinguish the portions of cipher text obtained by encrypting plain text in which some parts are random and the rest are non-random. This paper presents a tutorial introduction to symmetric cryptography. The basic information-theoretic and computational properties of classic and modern cryptographic systems are presented, followed by an examination of the application of cryptography to the security of VoIP systems in computer networks using the LFSR algorithm. The implementation program will be developed in Java 2. The LFSR algorithm is appropriate for the encryption and decryption of online streaming data, e.g. VoIP (voice chatting over IP). This paper implements an encryption module that converts speech signals to cipher text and a decryption module that converts cipher text back to speech signals.

Keywords: Linear Feedback Shift Register.

Downloads: 3065
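
To illustrate the kind of LFSR stream encryption that entry 157 above describes, here is a minimal sketch in Python (the paper itself reports a Java implementation, which is not reproduced here). The 16-bit register width, the tap positions, the seed value and the function names are all assumptions chosen for illustration. A single LFSR is linear and is not secure on its own, so this only shows the mechanism.

```python
def lfsr_keystream(seed, taps=(0, 2, 3, 5), nbits=16):
    """Yield keystream bytes from a Fibonacci LFSR.

    Assumed parameters: a 16-bit register whose feedback bit is the XOR of
    the state bits at positions `taps`. The seed must be non-zero, because
    an all-zero state never changes.
    """
    state = seed & ((1 << nbits) - 1)
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        byte = 0
        for _ in range(8):
            out = state & 1                      # output bit leaves on the right
            feedback = 0
            for t in taps:
                feedback ^= (state >> t) & 1     # XOR of the tapped bits
            state = (state >> 1) | (feedback << (nbits - 1))
            byte = (byte << 1) | out
        yield byte


def xor_cipher(data, seed):
    """Encrypt or decrypt bytes by XOR with the LFSR keystream."""
    return bytes(b ^ k for b, k in zip(data, lfsr_keystream(seed)))


# Hypothetical speech samples packed as bytes; the same call decrypts.
frame = bytes([12, 200, 37, 99, 5, 180])
cipher = xor_cipher(frame, seed=0xACE1)
plain = xor_cipher(cipher, seed=0xACE1)
```

Because XOR is its own inverse, the encryption and decryption modules can share one routine as long as both ends start from the same seed, which suits streamed data such as VoIP frames.
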
156 Matrix-Interleaved Serially Concatenated Block Codes for Speech Transmission in Fixed Wireless Communication Systems

Authors: F. Mehran

Abstract:

In this paper, we study a class of serially concatenated block codes (SCBC) based on matrix interleavers, to be employed in fixed wireless communication systems. The performance of SCBC-coded systems is investigated under various interleaver dimensions. Numerical results reveal that the matrix interleaver could be a competitive candidate over the conventional block interleaver for frame lengths of 200 bits; hence, SCBC coding based on the matrix interleaver is a promising technique for speech transmission applications in many international standards such as the pan-European Global System for Mobile communications (GSM), Digital Cellular Systems (DCS) 1800, and Joint Detection Code Division Multiple Access (JD-CDMA) mobile radio systems, where the speech frame contains around 200 bits.

Keywords: Matrix Interleaver, serial concatenated block codes (SCBC), turbo codes, wireless communications.

Downloads: 1885
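
As background for entry 156 above, the sketch below shows one common interleaver construction: bits are written into a matrix row by row and read out column by column. Whether this matches the exact matrix interleaver studied in the paper is an assumption, and the function names and the 10 x 20 dimensions (chosen only because they give a 200-bit frame) are illustrative.

```python
def interleave(bits, rows, cols):
    """Write `bits` row by row into a rows x cols matrix, read column by column."""
    if len(bits) != rows * cols:
        raise ValueError("frame length must equal rows * cols")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows, cols):
    """Invert `interleave` for the same matrix dimensions."""
    if len(bits) != rows * cols:
        raise ValueError("frame length must equal rows * cols")
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[i]
            i += 1
    return out


# A 200-bit frame (hypothetical contents) through a 10 x 20 interleaver.
frame = [(i * 7) % 2 for i in range(200)]
shuffled = interleave(frame, 10, 20)
restored = deinterleave(shuffled, 10, 20)
```

Neighbouring input bits end up `rows` positions apart in the output, so a short burst of channel errors is spread across the frame after de-interleaving.
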
155 Multimodal Database of Emotional Speech, Video and Gestures

Authors: Tomasz Sapiński, Dorota Kamińska, Adam Pelikant, Egils Avots, Cagri Ozcinar, Gholamreza Anbarjafari

Abstract:

People express emotions through different modalities. Integration of verbal and non-verbal communication channels creates a system in which the message is easier to understand. Expanding the focus to several forms of expression can facilitate research on emotion recognition as well as human-machine interaction. In this article, the authors present a Polish emotional database composed of three modalities: facial expressions, body movement and gestures, and speech. The corpus contains recordings registered in studio conditions, acted out by 16 professional actors (8 male and 8 female). The data is labeled with six basic emotion categories, according to Ekman’s emotion categories. To check the quality of performance, all recordings are evaluated by experts and volunteers. The database is available to the academic community and might be useful in the study of audio-visual emotion recognition.

Keywords: Body movement, emotion recognition, emotional corpus, facial expressions, gestures, multimodal database, speech.

Downloads: 1066
154 SySRA: A System of a Continuous Speech Recognition in Arab Language

Authors: Samir Abdelhamid, Noureddine Bouguechal

Abstract:

We report in this paper the model adopted by SySRA, our system for continuous speech recognition in the Arabic language, and the results obtained so far. This system uses the Arabdic-10 database, a manually segmented corpus of words for the Arabic language. Phonetic decoding is represented by an expert system whose knowledge base is expressed in the form of production rules. This expert system transforms a vocal signal into a phonetic lattice. The higher level of the system handles the recognition of the lattice thus obtained, rendering it in the form of written sentences (orthographic form). This level initially contains the lexical analyzer, which is none other than the recognition module. We subjected this analyzer to a set of spectrograms obtained by dictating a score of sentences in Arabic. The recognition rate for these sentences is about 70%, which is, to our knowledge, the best result for the recognition of the Arabic language. The test set consists of twenty sentences from four speakers who did not take part in the training.

Keywords: Continuous speech recognition, lexical analyzer, phonetic decoding, phonetic lattice, vocal signal.

Downloads: 1342
153 Efficient System for Speech Recognition using General Regression Neural Network

Authors: Abderrahmane Amrouche, Jean Michel Rouvaen

Abstract:

In this paper we present an efficient system for speaker-independent speech recognition based on a neural network approach. The proposed architecture comprises two phases: a preprocessing phase, which consists of segmental normalization and feature extraction, and a classification phase, which uses neural networks based on nonparametric density estimation, namely the general regression neural network (GRNN). The relative performance of the proposed model is compared to similar recognition systems based on the Multilayer Perceptron (MLP), the Recurrent Neural Network (RNN) and the well-known Discrete Hidden Markov Model (HMM-VQ), which we also implemented. Experimental results obtained with Arabic digits have shown that the use of nonparametric density estimation with an appropriate smoothing factor (spread) improves the generalization power of the neural network. The word error rate (WER) is reduced significantly over the baseline HMM method. GRNN computation is a successful alternative to the other neural networks and DHMM.

Keywords: Speech Recognition, General Regression Neural Network, Hidden Markov Model, Recurrent Neural Network, Arabic Digits.

Downloads: 2132
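
The role of the smoothing factor (spread) mentioned in entry 153 above can be seen in a small sketch of the standard GRNN estimate, which is a Gaussian-kernel weighted average of the training targets. This is a minimal sketch under assumptions: the one-dimensional toy features, the one-hot targets and the name `grnn_predict` are illustrative and are not taken from the paper.

```python
import math


def grnn_predict(x, train_x, train_y, spread):
    """Kernel-weighted average of training targets (standard GRNN estimate).

    Each training pattern contributes with weight exp(-d^2 / (2 * spread^2)),
    where d is its Euclidean distance to x. Note: with a very small spread
    and a distant x, all weights can underflow to zero.
    """
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2 * spread ** 2))
        for xi in train_x
    ]
    total = sum(weights)
    dims = len(train_y[0])
    return [sum(w * y[k] for w, y in zip(weights, train_y)) / total
            for k in range(dims)]


# Toy data (hypothetical): two patterns with one-hot class targets.
train_x = [(0.0,), (1.0,)]
train_y = [[1.0, 0.0], [0.0, 1.0]]
near_first = grnn_predict((0.0,), train_x, train_y, spread=0.1)
midpoint = grnn_predict((0.5,), train_x, train_y, spread=0.1)
```

For classification, the predicted class is the index of the largest output. A smaller spread makes the estimate follow the nearest training pattern, while a larger spread averages over more patterns.
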
152 A Supervised Text-Independent Speaker Recognition Approach

Authors: Tudor Barbu

Abstract:

We provide a supervised speech-independent voice recognition technique in this paper. In the feature extraction stage we propose a mel-cepstral based approach. Our feature vector classification method uses a special nonlinear metric, derived from the Hausdorff distance for sets, and a minimum mean distance classifier.

Keywords: Text-independent speaker recognition, mel cepstral analysis, speech feature vector, Hausdorff-based metric, supervised classification.

Downloads: 1783
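
Entry 152 above classifies sets of feature vectors with a metric derived from the Hausdorff distance and a minimum mean distance classifier. The sketch below uses the classical Hausdorff distance as a stand-in, since the abstract does not give the paper's modified metric. The function names and the two-dimensional toy vectors are assumptions for illustration.

```python
import math


def hausdorff(a, b):
    """Classical Hausdorff distance between two finite sets of vectors."""
    def directed(x, y):
        # Largest distance from a point of x to its nearest point in y.
        return max(min(math.dist(p, q) for q in y) for p in x)
    return max(directed(a, b), directed(b, a))


def classify(sample, training):
    """Return the speaker whose training sets have the smallest mean distance.

    `training` maps a speaker label to a list of feature-vector sets.
    """
    def mean_distance(sets):
        return sum(hausdorff(sample, s) for s in sets) / len(sets)
    return min(training, key=lambda speaker: mean_distance(training[speaker]))


# Toy feature sets (hypothetical values, not real mel-cepstral vectors).
training = {
    "speaker_a": [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]],
    "speaker_b": [[(5.0, 5.0), (6.0, 5.0)], [(5.0, 6.0), (6.0, 6.0)]],
}
label = classify([(0.2, 0.1), (0.9, 0.3)], training)
```

A set-to-set distance is convenient here because utterances of different lengths produce feature sets of different sizes, and no frame-by-frame alignment is needed.
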
151 Entropy Based Data Hiding for Document Images

Authors: Swetha Kurup, Sridhar G., Sridhar V.

Abstract:

In this paper we present a novel technique for data hiding in binary document images. We use the concept of entropy to identify document-specific, least distortive areas throughout the binary document image. The document image is treated as any other image, and the proposed method utilizes the standard document characteristics for the embedding process. The proposed method minimizes perceptual distortion due to embedding and allows watermark extraction without requiring any side information at the decoder end.

Keywords: Entropy, Steganography, Watermarking.

Downloads: 1479
150 Combined Automatic Speech Recognition and Machine Translation in Business Correspondence Domain for English-Croatian

Authors: Sanja Seljan, Ivan Dunđer

Abstract:

The paper presents combined automatic speech recognition (ASR) of English and machine translation (MT) for the English-Croatian and Croatian-English language pairs in the domain of business correspondence. The first part presents results of training a commercial ASR system on English data sets, enriched by error analysis. The second part presents results of machine translation performed by a free online tool for the English-Croatian and Croatian-English language pairs. Human evaluation in terms of usability is conducted and internal consistency is calculated by Cronbach's alpha coefficient, enriched by error analysis. Automatic evaluation is performed by the WER (Word Error Rate) and PER (Position-independent word Error Rate) metrics, followed by investigation of Pearson’s correlation with human evaluation.

Keywords: Automatic machine translation, integrated language technologies, quality evaluation, speech recognition.

Downloads: 2858
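
The WER metric used for automatic evaluation in entry 150 above is conventionally the word-level edit distance between hypothesis and reference, divided by the reference length. The sketch below follows that usual definition. The function name, the whitespace tokenization and the example sentences are assumptions, and PER is not covered.

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


# Hypothetical business-correspondence example.
score = wer("please send the invoice", "please send invoice")
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.
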
149 Using Speech Emotion Recognition as a Longitudinal Biomarker for Alzheimer’s Disease

Authors: Yishu Gong, Liangliang Yang, Jianyu Zhang, Zhengyu Chen, Sihong He, Xusheng Zhang, Wei Zhang

Abstract:

Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that affects millions of people worldwide and is characterized by cognitive decline and behavioral changes. People living with Alzheimer’s disease often find it hard to complete routine tasks. However, there are limited objective assessments that aim to quantify the difficulty of certain tasks for AD patients compared to non-AD people. In this study, we propose to use speech emotion recognition (SER), especially the frustration level, as a potential biomarker for quantifying the difficulty patients experience when describing a picture. We build an SER model using data from the IEMOCAP dataset and apply the model to the DementiaBank data to detect the AD/non-AD group difference and perform longitudinal analysis to track AD progression. Our results show that the frustration level detected by the SER model can possibly be used as a cost-effective tool for objective tracking of AD progression in addition to the Mini-Mental State Examination (MMSE) score.

Keywords: Alzheimer’s disease, Speech Emotion Recognition, longitudinal biomarker, machine learning.

Downloads: 157
148 Spectral Analysis of Speech: A New Technique

Authors: Neeta Awasthy, J.P.Saini, D.S.Chauhan

Abstract:

ICA, which is generally used for blind source separation, has been tested for feature extraction in speech recognition systems to replace the phoneme-based approach of MFCC. Applying the cepstral coefficients generated to ICA as preprocessing has yielded a new signal processing approach. This gives much better results than MFCC and ICA separately, both for word and speaker recognition. The mixing matrix A is different before and after MFCC, as expected, since Mel is a nonlinear scale. However, cepstral coefficients generated from Linear Predictive Coefficients, being independent, prove to be the right candidate for ICA. Matlab is the tool used for all comparisons. The database used consists of samples from ISOLET.

Keywords: Cepstral Coefficient, Distance measures, Independent Component Analysis, Linear Predictive Coefficients.

Downloads: 1908
147 Multi Switched Split Vector Quantization of Narrowband Speech Signals

Authors: M. Satya Sai Ram, P. Siddaiah, M. Madhavi Latha

Abstract:

Vector quantization is a powerful tool for speech coding applications. This paper deals with LPC coding of speech signals using a new technique called Multi Switched Split Vector Quantization (MSSVQ), which is a hybrid of the multi-stage, switched, and split vector quantization techniques. The spectral distortion performance, computational complexity, and memory requirements of MSSVQ are compared to the split vector quantization (SVQ), multi-stage vector quantization (MSVQ) and switched split vector quantization (SSVQ) techniques. The results show that MSSVQ has better spectral distortion performance, lower computational complexity and lower memory requirements than all of the above-mentioned product code vector quantization techniques. Computational complexity is measured in floating point operations (flops), and memory requirements are measured in floats.

Keywords: Linear predictive coding, Multi stage vector quantization, Switched Split vector quantization, Split vector quantization, Line Spectral Frequencies (LSF).

Downloads: 1617
146 Formant Tracking Linear Prediction Model using HMMs for Noisy Speech Processing

Authors: Zaineb Ben Messaoud, Dorra Gargouri, Saida Zribi, Ahmed Ben Hamida

Abstract:

This paper presents a formant-tracking linear prediction (FTLP) model for speech processing in noise. The main focus of this work is the detection of formant trajectories based on Hidden Markov Models (HMM), for improved formant estimation in noise. The approach proposed in this paper provides a systematic framework for modelling and utilizing a time sequence of peaks which satisfies continuity constraints on parameters; the within peaks are modelled by the LP parameters. The formant-tracking LP model estimation is composed of three stages: (1) a pre-cleaning multi-band spectral subtraction stage to reduce the effect of residual noise on formants; (2) an estimation stage where an initial estimate of the LP model of speech for each frame is obtained; (3) a formant classification stage using probability models of formants and Viterbi decoders. The evaluation results for the estimation of the formant-tracking LP model, tested in a Gaussian white noise background, demonstrate that the proposed combination of the initial noise reduction stage with formant tracking and variable-order LPC analysis results in a significant reduction in errors and distortions. The performance was evaluated with noisy natural vowels extracted from international French and English vocabulary speech signals at an SNR value of 10 dB. In each case, the estimated formants are compared to reference formants.

Keywords: Formants Estimation, HMM, Multi Band Spectral Subtraction, Variable order LPC coding, White Gaussian Noise.

Downloads: 1926
145 The Analysis of Deceptive and Truthful Speech: A Computational Linguistic Based Method

Authors: Seham El Kareh, Miramar Etman

Abstract:

Recently, detecting liars and extracting features which distinguish them from truth-tellers have been the focus of a wide range of disciplines. To the best of the authors’ knowledge, most of the work has been done on facial expressions and body gestures, but only a few works have addressed the language used by both liars and truth-tellers. This paper sheds light on four axes. The first axis deals with building an audio corpus of deceptive and truthful speech for Egyptian Arabic speakers. The second axis focuses on examining the human perception of lies and demonstrating the need for computational linguistic-based methods to extract features which characterize truthful and deceptive speech. The third axis is concerned with building a linguistic analysis program that can extract from the corpus the inter- and intra-linguistic cues for deceptive and truthful speech. The program built here is based on selected categories from the Linguistic Inquiry and Word Count program. Our results demonstrated that, when lying, Egyptian Arabic speakers preferred to use first-person pronouns and the present tense rather than the past tense, and their lies lacked second-person pronouns; when telling the truth, they preferred to use verbs related to motion and nouns related to time. The results also showed that more data is needed to establish the significance of words related to emotions and numbers.

Keywords: Egyptian Arabic corpus, computational analysis, deceptive features, forensic linguistics, human perception, truthful features.

Downloads: 1134
144 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part of Speech Tagging has always been a challenging task in the area of Natural Language Processing. This article presents POS tagging for Nepali text using the Hidden Markov Model and the Viterbi algorithm. Training and testing data sets are randomly separated from the annotated Nepali text corpus. Both methods are employed on the data sets. The Viterbi algorithm is found to be computationally faster and more accurate than HMM. An accuracy of 95.43% is achieved using the Viterbi algorithm. Error analysis of where the mismatches took place is discussed in detail.

Keywords: Hidden Markov model, Viterbi algorithm, POS tagging, natural language processing.

Downloads 1648
143 Conspiracy Theory in Discussions of the Coronavirus Pandemic in the Gulf Region

Authors: Rasha Salameh

Abstract:

In light of the tense relationship between Saudi Arabia and Iran, this research paper sheds light on the reporting of the Coronavirus in the Gulf region by the Saudi-owned television network Al-Arabiya. Because most of the early cases came from Iran, some programs on this Saudi channel embraced a conspiracy theory, and hate speech was used in discussions of the topic. The results are detailed in this paper as percentages of the research sample, which includes five programs on the Al-Arabiya channel: ‘DNA’, ‘Marraya’ (Mirrors), ‘Panorama’, ‘Tafaolcom’ (Your Interaction) and ‘Diplomatic Street’, in the period between January 19, that is, the date of the first case in Iran, and April 10, 2020. The research shows the use of a conspiracy theory in the programs, in addition to some professional violations. The surveyed sample also shows that the matter receded as the Arab Gulf states became preoccupied with the successively increasing cases that have appeared there since the start of the pandemic. The results indicate that hate speech was present in the sample at a rate of 98.1%, and that most of the programs on Al-Arabiya that dealt with the Iranian issue during the Coronavirus pandemic used the conspiracy theory, at a rate of 75.5%.

Keywords: Al-Arabiya, Iran, COVID-19, hate speech, conspiracy theory, politicization of the pandemic

Downloads 406
142 A Blind Digital Watermark in Hadamard Domain

Authors: Saeid Saryazdi, Hossein Nezamabadi-pour

Abstract:

A new blind gray-level watermarking scheme is described. In the proposed method, the host image is first divided into 4×4 non-overlapping blocks. For each block, the first two AC coefficients of its Hadamard transform are then estimated using the DC coefficients of its neighboring blocks. A gray-level watermark is then added to the estimated values. Since embedding the watermark does not change the DC coefficients, the watermark can be extracted by estimating the AC coefficients and comparing them with their actual values. Several experiments were carried out, and the results suggest that the proposed algorithm is robust.
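The property the abstract above relies on, that changing an AC coefficient leaves the DC coefficient untouched, can be checked with a small sketch of the 4×4 Hadamard transform. The pixel values, the choice of coefficient `[0][1]` as an AC coefficient, and the added value 8 are assumptions made for illustration; the paper's estimation of AC coefficients from neighboring DC values is not specified in the abstract and is not reproduced here.

```python
# 4x4 Hadamard matrix (natural order). It is symmetric and H4 * H4 = 4 * I.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def hadamard2d(block):
    """2-D Hadamard transform of a 4x4 block, scaled so it is its own inverse."""
    return [[v / 4 for v in row] for row in matmul(matmul(H4, block), H4)]


# Hypothetical gray-level block used only for the demonstration.
block = [
    [52, 55, 61, 66],
    [63, 59, 55, 90],
    [62, 59, 68, 113],
    [63, 58, 71, 122],
]

coeffs = hadamard2d(block)
dc_before = coeffs[0][0]

coeffs[0][1] += 8                # perturb one AC coefficient (illustrative value)
marked = hadamard2d(coeffs)      # back to the pixel domain
dc_after = hadamard2d(marked)[0][0]

print(dc_before, dc_after)       # the two DC values are identical
```

Because the DC coefficient survives embedding, a detector that sees only the marked image can recompute the same neighbor-based estimates, which is what makes a blind scheme of this kind possible.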

Keywords: Digital Watermarking, Image watermarking, Information Hiding, Steganography.

Downloads 2216
141 A Novel Steganographic Method for Gray-Level Images

Authors: Ahmad T. Al-Taani, Abdullah M. AL-Issa

Abstract:

In this work we propose a novel steganographic method for hiding information within the spatial domain of a gray-scale image. The proposed approach divides the cover into blocks of equal size and then embeds the message in the edge of each block, depending on the number of ones in the left four bits of the pixel. The proposed approach is tested on a database consisting of 100 different images. Experimental results, compared with other methods, showed that the proposed approach hides more information and produces a stego-image of good visual quality as judged by the human eye.
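The idea of letting the upper four bits of a pixel control the embedding can be sketched as follows. The abstract above does not give the actual mapping or the block and edge selection, so the rule in `capacity` (two bits if the upper nibble has at least three ones, otherwise one bit) and the flat pixel list are purely hypothetical stand-ins, not the authors' method.

```python
def capacity(pixel):
    """Hypothetical rule: number of low bits to overwrite, from the upper four bits."""
    ones = bin(pixel >> 4).count("1")
    return 2 if ones >= 3 else 1


def embed(pixels, bits):
    """Hide the bit string `bits` in the low bits of 8-bit gray-level pixels."""
    out, pos = [], 0
    for p in pixels:
        n = capacity(p)
        if pos < len(bits):
            chunk = bits[pos:pos + n].ljust(n, "0")   # pad the final chunk with zeros
            p = ((p >> n) << n) | int(chunk, 2)       # overwrite the n lowest bits
            pos += n
        out.append(p)
    if pos < len(bits):
        raise ValueError("message does not fit in the cover")
    return out


def extract(pixels, n_bits):
    """Recover the first `n_bits` hidden bits from stego pixels."""
    bits = ""
    for p in pixels:
        if len(bits) >= n_bits:
            break
        n = capacity(p)
        bits += format(p & ((1 << n) - 1), "0{}b".format(n))
    return bits[:n_bits]


cover = [200, 17, 255, 96, 130, 240]   # made-up pixel values
message = "10110100"
stego = embed(cover, message)
print(stego, extract(stego, len(message)))
```

Only the low bits are modified, so the upper four bits, and therefore the capacity of each pixel, are the same in the cover and the stego-image. That is what lets the receiver recompute the rule without access to the original.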

Keywords: Data Embedding, Cryptography, Watermarking, Steganography, Least Significant Bit, Information Hiding.

Downloads 2207
140 Improved Text-Independent Speaker Identification using Fused MFCC and IMFCC Feature Sets based on Gaussian Filter

Authors: Sandipan Chakroborty, Goutam Saha

Abstract:

A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for speech-related applications. In a recent contribution by the authors, it was shown that the Inverted Mel-Frequency Cepstral Coefficients (IMFCC) are a useful feature set for SI, containing complementary information present in the high-frequency region. This paper introduces a Gaussian-shaped filter (GF) for calculating MFCC and IMFCC in place of the typical triangular-shaped bins. The objective is to introduce a higher amount of correlation between subband outputs. The performance of both MFCC and IMFCC improves with GF over the conventional triangular filter (TF) based implementation, individually as well as in combination. With GMM as the speaker modeling paradigm, the performance of the proposed GF-based MFCC and IMFCC in individual and fused modes has been verified on two standard databases, YOHO (microphone speech) and POLYCOST (telephone speech), each of which has more than 130 speakers.
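To make the contrast between the two filter shapes concrete, the sketch below compares the weight a triangular and a Gaussian subband filter assign to a frequency bin. The band edges and the choice of standard deviation (`sigma`, set here to a quarter of the band width) are assumptions for illustration; the abstract above does not state how the authors set the Gaussian spread.

```python
import math


def triangular(f, lo, centre, hi):
    """Weight of a triangular filter with support (lo, hi) and peak at `centre`."""
    if lo < f <= centre:
        return (f - lo) / (centre - lo)
    if centre < f < hi:
        return (hi - f) / (hi - centre)
    return 0.0


def gaussian(f, centre, sigma):
    """Weight of a Gaussian-shaped filter centred on `centre`."""
    return math.exp(-0.5 * ((f - centre) / sigma) ** 2)


# Hypothetical band: the edges coincide with the centres of the neighbouring filters.
lo, centre, hi = 300.0, 400.0, 500.0
sigma = (hi - lo) / 4   # assumed spread, not taken from the paper

for f in (400.0, 450.0, 500.0, 550.0):
    print(f, triangular(f, lo, centre, hi), round(gaussian(f, centre, sigma), 4))
```

The triangular weight drops to exactly zero at the neighbouring centres, while the Gaussian tail remains positive there, so adjacent subbands share more energy. That overlap is one way to read the "higher amount of correlation between subband outputs" mentioned in the abstract.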

Keywords: Gaussian Filter, Triangular Filter, Subbands, Correlation, MFCC, IMFCC, GMM.

Downloads 2376
139 Speaker Independent Quranic Recognizer Based on Maximum Likelihood Linear Regression

Authors: Ehab Mourtaga, Ahmad Sharieh, Mousa Abdallah

Abstract:

An automatic speech recognition system for the formal Arabic language is needed. The Quran is the most formal spoken book in Arabic, and it is spoken all over the world. In this research, a Quran-based, speaker-independent automatic speech recognizer was developed and tested. The system was developed based on the tri-phone Hidden Markov Model and Maximum Likelihood Linear Regression (MLLR). MLLR computes a set of transformations that reduces the mismatch between an initial model set and the adaptation data. It uses a regression class tree and estimates a set of linear transformations for the mean and variance parameters of a Gaussian mixture HMM system. The 30th Chapter of the Quran, recited by five of the most famous readers of the Quran, was used for training and testing. The chapter includes about 2000 distinct words. The advantages of using the Quranic verses as the database in this recognizer are the uniqueness of the words and the high level of orderliness between verses. The accuracy on the test data ranged from 68% to 85%.
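For reference, the mean transformation that MLLR estimates is usually written as below. This is the standard textbook form, given here as background; the symbols are not taken from the abstract above, and the variance transformation is omitted because several variants of it exist.

```latex
% MLLR mean adaptation for a Gaussian with mean \mu (dimension n):
% A is an n x n matrix and b an n-dimensional bias, shared by all
% Gaussians in the same regression class.
\hat{\mu} = A\mu + b = W\xi,
\qquad
W = \begin{bmatrix} b & A \end{bmatrix},
\qquad
\xi = \begin{bmatrix} 1 \\ \mu \end{bmatrix}
```

Each regression class in the tree shares one matrix W, which is chosen to maximize the likelihood of the adaptation data. This is how a small amount of speech from a new reader can shift many Gaussians at once.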

Keywords: Hidden Markov Model (HMM), Maximum Likelihood Linear Regression (MLLR), Quran, Regression Class Tree, Speech Recognition, Speaker-independent.

Downloads 1867
138 Face Localization Using Illumination-dependent Face Model for Visual Speech Recognition

Authors: Robert E. Hursig, Jane X. Zhang

Abstract:

A robust still image face localization algorithm capable of operating in an unconstrained visual environment is proposed. First, construction of a robust skin classifier within a shifted HSV color space is described. Then various filtering operations are performed to better isolate face candidates and mitigate the effect of substantial non-skin regions. Finally, a novel Bhattacharyya-based face detection algorithm is used to compare candidate regions of interest with a unique illumination-dependent face model probability distribution function approximation. Experimental results show a 90% face detection success rate despite the demands of the visually noisy environment.
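The similarity measure behind a Bhattacharyya-based comparison can be sketched in a few lines. The function below computes the Bhattacharyya coefficient between two histograms; the histogram values are invented, and how the paper builds its candidate-region and face-model distributions is not described in the abstract above.

```python
import math


def bhattacharyya_coefficient(hist_p, hist_q):
    """Bhattacharyya coefficient of two histograms with the same number of bins.

    Returns a value in [0, 1]: 1 for identical distributions, 0 for no overlap.
    """
    if len(hist_p) != len(hist_q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(hist_p), sum(hist_q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive mass")
    # Normalise the counts to probabilities before comparing.
    return sum(math.sqrt((p / total_p) * (q / total_q))
               for p, q in zip(hist_p, hist_q))


# Hypothetical histograms: a face model and two candidate regions.
model = [1, 1, 2]
candidate_close = [2, 2, 4]   # same shape as the model, different scale
candidate_far = [0, 0, 0, 5]  # different bin count, used only to show the check

print(bhattacharyya_coefficient(model, candidate_close))
print(bhattacharyya_coefficient([1, 0], [0, 1]))
```

A detector of this kind would typically accept a candidate region when the coefficient exceeds a threshold; any such threshold would be a tuning choice, not something stated in the abstract.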

Keywords: Audio-visual speech recognition, Bhattacharyya coefficient, face detection

Downloads 1575