Search results for: Visual speech.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 802

712 Artificial Generation of Visual Evoked Potential to Enhance Visual Ability

Authors: A. Vani, M. N. Mamatha

Abstract:

Visual signal processing in human beings occurs in the occipital lobe of the brain. The signals generated there are common to all human beings and are called Visual Evoked Potentials (VEPs). Generally, visually impaired people lose their sight because of severe damage to the eye's natural photosensors only, while the occipital lobe still functions. In this paper, a technique for artificially generating VEPs is proposed to enhance the visual ability of the subject. The system uses electrical photoreceptors to capture an image and processes it to detect and recognize the subject or object. The resulting voltage is further processed and can be transmitted wirelessly to a BioMEMS implanted into the occipital lobe of the patient’s brain. The proposed BioMEMS consists of an array of electrodes that generate a neuron potential similar to the VEP of normally sighted people. Thus, the neurons receive visual data from the BioMEMS, which helps generate partial vision for the visually challenged patient.

Keywords: Visual evoked potential, OpenViBe, BioMEMS, Neuro prosthesis.

Downloads: 1405
711 Digital Watermarking Based on Visual Cryptography and Histogram

Authors: R. Rama Kishore, Sunesh

Abstract:

Nowadays, a robust and secure watermarking algorithm and its optimization are the need of the hour. A watermarking algorithm based on visual cryptography, the histogram shape property, and entropy is presented to protect the owner's copyright. Both the host image and the watermark are preprocessed: the host image with a Butterworth filter, and the watermark with visual cryptography. Applying visual cryptography to the watermark generates two shares. One share is used for embedding the watermark, and the other is used for resolving any dispute with the aid of a trusted authority. Using the histogram shape makes the process more robust against geometric and signal processing attacks. The combination of visual cryptography, Butterworth filter, histogram, and entropy can make the algorithm more robust and imperceptible and can protect the owner's copyright.
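
As a rough illustration of how visual cryptography can split a binary watermark into two shares, the following Python sketch implements the classic (2, 2) scheme with two subpixels per watermark bit. It is not the authors' implementation: the function names, the 1-D bit layout, and the subpixel patterns are assumptions made for illustration, and the Butterworth filtering, histogram embedding, and entropy steps are omitted.

```python
import random

# Two complementary subpixel patterns (1 = black, 0 = transparent).
PATTERNS = ((1, 0), (0, 1))


def make_shares(secret_bits, rng=None):
    """Split a binary watermark (1 = black) into two shares.

    Each bit expands to two subpixels. Share 1 gets a random pattern;
    share 2 gets the same pattern for a white bit and the complementary
    pattern for a black bit, so either share alone looks random.
    """
    rng = rng or random.Random()
    share1, share2 = [], []
    for bit in secret_bits:
        p = rng.choice(PATTERNS)
        q = p if bit == 0 else (1 - p[0], 1 - p[1])
        share1.extend(p)
        share2.extend(q)
    return share1, share2


def stack(share1, share2):
    """Overlay two shares: a subpixel is black if it is black in either."""
    return [a | b for a, b in zip(share1, share2)]


def decode(stacked):
    """Read back each bit: fully black pair = 1, half-black pair = 0."""
    return [1 if stacked[i] + stacked[i + 1] == 2 else 0
            for i in range(0, len(stacked), 2)]
```

In a dispute-resolution setting such as the one the abstract describes, the share held by the trusted authority would be stacked with the share recovered from the image; here `decode(stack(s1, s2))` simply recovers the original bits.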

Keywords: Butterworth filter, digital watermarking, histogram, visual cryptography.

Downloads: 1615
710 The Effects of Visual Elements and Cognitive Styles on Students Learning in Hypermedia Environment

Authors: Rishi Ruttun

Abstract:

One of the major features of hypermedia learning is its non-linear structure, which allows learners the opportunity of flexible navigation to accommodate their own needs. Nevertheless, such flexibility can also cause problems such as insufficient navigation and disorientation for some learners, especially those with Field Dependent (FD) cognitive styles. As a result, students' learning performance can deteriorate and, in turn, they can develop negative attitudes towards hypermedia learning systems. It has been suggested that visual elements can be used to compensate for these dilemmas. However, it is unclear whether these visual elements improve learning or whether problems still exist. The aim of this study is to investigate the effect of students' cognitive styles and visual elements on students' learning performance and attitudes in a hypermedia learning environment. Cognitive Style Analysis (CSA), learning outcome measures in terms of pre- and post-tests, a practical task, and an Attitude Questionnaire (AQ) were administered to a sample of 60 university students. The findings revealed that FD students performed as well as Field Independent (FI) students. Also, FD students experienced more disorientation in the hypermedia learning system, where they depended heavily on the visual elements for navigation and orientation. Furthermore, they had more positive attitudes towards the visual elements, which spared them from navigation and disorientation dilemmas. In contrast, FI students were more comfortable, were not disturbed, and did not need some of the visual elements in the hypermedia learning system.

Keywords: Hypermedia learning, cognitive styles, visual elements, support, learning performance, attitudes and perceptions

Downloads: 1628
709 A Case of Study for 3D Stereoscopic Conversion in Visual Effects Industry

Authors: Jin Zhi

Abstract:

This paper covers a series of key points in 2D to 3D stereoscopic conversion and presents a conversion approach that has been applied successfully in the current visual effects industry. Its purpose is to describe a detailed workflow and concept that has been used successfully in 3D stereoscopic conversion for feature films, and thereby to clarify the stereoscopic conversion production process. It aims to give entry-level artists a clear idea of the process and a better overall understanding of 3D stereoscopy in the digital compositing field, to inform higher education in visual effects, and to inspire further collaboration and participation, particularly between academia and industry.

Keywords: Clean plates, Mattes, Stereoscopic conversion, 3D projection, Z-depth.

Downloads: 2189
708 Assamese Numeral Speech Recognition using Multiple Features and Cooperative LVQ -Architectures

Authors: Manash Pratim Sarma, Kandarpa Kumar Sarma

Abstract:

A set of Artificial Neural Network (ANN) based methods is presented for the design of an effective system for speech recognition of Assamese numerals captured under varied recording conditions and moods. The work concerns the formulation of several ANN models configured to use Linear Predictive Code (LPC), Principal Component Analysis (PCA), and other features to tackle mood and gender variations in uttered numbers as part of an Automatic Speech Recognition (ASR) system for Assamese. The ANN models are designed using a combination of Self Organizing Map (SOM) and Multi Layer Perceptron (MLP), constituting a Learning Vector Quantization (LVQ) block trained in a cooperative environment to handle male and female speech samples of numerals of Assamese, a language spoken by a sizable population in the North-Eastern part of India. The work provides a comparative evaluation of several such combinations when handling speech samples with gender-based differences captured by a microphone in four different conditions, viz. noiseless, noise mixed, stressed, and stress-free.

Keywords: Assamese, Recognition, LPC, Spectral, ANN.

Downloads: 1946
707 Laser Welded Ni-Cr Dental Alloys Inspection

Authors: Porojan S., Sandu L., Topală F.

Abstract:

Minor problems arising from optimizations by welding of fixed prosthesis frameworks can be identified by macroscopic and microscopic visual inspection. The purpose of this study was to highlight the visible discontinuities present in the laser welds of dental Ni-Cr alloys. Ni-Cr base metal alloys designated for fixed prosthesis manufacture were selected for the experiments. Using cast plates, preliminary tests were conducted by laser welding. Careful macroscopic visual inspection was carried out to assess the defects of the welding rib. Electron microscopy images allowed visualization of small discontinuities that escape visual inspection. Among the laser-welded Ni-Cr alloys used in the experiment, visual analysis showed the best welds for the Heraenium NA alloy.

Keywords: macroscopic visual inspection, electron microscopy images, Ni-Cr dental alloys, laser welding.

Downloads: 1507
706 An Improved Algorithm of SPIHT based on the Human Visual Characteristics

Authors: Meng Wang, Qi-rui Han

Abstract:

Because of its excellent properties, the SPIHT algorithm, which is based on traditional wavelet transform theory, has attracted increasing attention, but it also has shortcomings. Combining recent progress in the wavelet domain with human visual characteristics, and building on an analysis of the SPIHT algorithm, we propose an improved SPIHT algorithm based on human visual characteristics. Experiments indicate that coding speed and quality are improved compared with the original SPIHT algorithm, and that quality is also improved when transmission is cut off.

Keywords: Lifted wavelet transform, SPIHT, Human Visual Characteristics.

Downloads: 1497
705 SMaTTS: Standard Malay Text to Speech System

Authors: Othman O. Khalifa, Zakiah Hanim Ahmad, Teddy Surya Gunawan

Abstract:

This paper presents a rule-based text-to-speech (TTS) synthesis system for Standard Malay (SM), namely SMaTTS. The proposed system uses a sinusoidal method and some pre-recorded wave files to generate speech. The use of a phone database significantly decreases the amount of computer memory used, making the system very light and embeddable. The overall system comprises two phases. The first is Natural Language Processing (NLP), which consists of the high-level processing of text analysis, phonetic analysis, text normalization, and a morphophonemic module. This module was designed specifically for SM to overcome a few problems in defining the rules of the SM orthography system before the text is passed to the DSP module. The second phase is Digital Signal Processing (DSP), which performs the low-level process of speech waveform generation. An intelligible and adequately natural-sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) is introduced. An SM phoneme set and an inclusive phone database have been constructed carefully for this phone-based speech synthesizer. By applying generative phonology, a comprehensive set of letter-to-sound (LTS) rules and a pronunciation lexicon have been developed for SMaTTS. For evaluation, a Diagnostic Rhyme Test (DRT) word list was compiled and several experiments were performed to evaluate the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system and the room for improvement are thoroughly discussed.

Keywords: Natural Language Processing, Text-To-Speech (TTS), Diphone, source filter, low-/high-level synthesis.

Downloads: 1928
704 Comparative Study of Filter Characteristics as Statistical Vocal Correlates of Clinical Psychiatric State in Human

Authors: Thaweesak Yingthawornsuk, Chusak Thanawattano

Abstract:

Acoustical properties of speech have been shown to be related to the mental state of the speaker, including depression and remission. This paper describes a way to distinguish depressed patients from remitted subjects based on measurable acoustic changes in their speech. The vocal-tract related frequency characteristics of speech samples from female remitted and depressed patients were analyzed via speech processing techniques and then evaluated statistically by cross-validation with a Support Vector Machine. Our comparative results show that the classifier correctly separated 93% of cases when tested with the subject-based feature model and 88% with the frame-based model, using the same speech samples collected from hospital interview sessions between patients and psychiatrists.

Keywords: Depression, SVM, Vocal Extract, Vocal Tract

Downloads: 1503
703 Virtual Speaking Head for Hearing Impaired Students

Authors: Eva Pajorová, Ladislav Hluchý

Abstract:

The developed tool is one of a set of system tools for easier access to various scientific areas and for real-time interactive learning between a lecturer and hearing-impaired students. The lecturer is not required to know Sign Language (SL). Instead, the new software tools translate regular speech into SL, which is then transferred to the student. Conversely, the student's questions (in SL) are translated and transferred to the lecturer as text or speech. The tool presented here is one of those tools. It is a tool for developing correct speech visemes as the basis of the total communication method for hearing-impaired students.

Keywords: Impaired people, sign language, communication methods.

Downloads: 1803
702 Visual and Clinical Outcome in Patients with Corneal Lacerations

Authors: Avantika Verma

Abstract:

In industrialized nations, corneal lacerations are one of the most common reasons for hospitalization. This study was designed to examine visual and clinical outcomes in patients presenting with full-thickness corneal lacerations in an Indian population and to ascertain the impact of various preoperative and operative factors on prognosis after repair of corneal lacerations. Males in their third decade injured at work by metallic objects were common. Lens damage, hyphema, vitreous hemorrhage, retinal detachment, and endophthalmitis were seen. All patients underwent primary repair within the first 24 hours of presentation. At 3 months, 74.3% had a good visual outcome. About 5.7% of patients had no perception of light. In conclusion, various demographic and preoperative factors such as age, time of presentation, vision at presentation, length of corneal wound, involvement of the visual axis, and associated ocular features such as hyphema, lenticular changes, vitreous hemorrhage, and retinal detachment are significant prognostic indicators for final visual outcome.

Keywords: Cornea, laceration, visual outcome, wound repair.

Downloads: 1048
701 Noise Estimation for Speech Enhancement in Non-Stationary Environments-A New Method

Authors: Ch.V.Rama Rao, Gowthami., Harsha., Rajkumar., M.B.Rama Murthy, K.Srinivasa Rao, K.AnithaSheela

Abstract:

This paper presents a new method for estimating the non-stationary noise power spectral density given a noisy signal. The method is based on averaging the noisy speech power spectrum using time- and frequency-dependent smoothing factors. These factors are adjusted based on the signal-presence probability in individual frequency bins. Signal presence is determined by computing the ratio of the noisy speech power spectrum to its local minimum, which is updated continuously by averaging past values of the noisy speech power spectra with a look-ahead factor. This method adapts very quickly to highly non-stationary noise environments. The proposed method achieves significant improvements over a system that uses a voice activity detector (VAD) in noise estimation.
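
The update described in this abstract can be sketched for a single frame as follows. This is a minimal illustration in the spirit of minima-controlled recursive averaging, not the authors' exact method: the threshold `delta`, the smoothing constants `alpha_d` and `alpha_p`, and the function name are assumed values chosen for the example, and the local minimum is taken as an input rather than tracked with the look-ahead averaging the abstract mentions.

```python
def update_noise_estimate(noise_prev, noisy_psd, local_min, p_prev,
                          delta=5.0, alpha_d=0.85, alpha_p=0.2):
    """Update the per-bin noise PSD estimate for one frame.

    noise_prev : previous noise PSD estimate, one value per frequency bin
    noisy_psd  : noisy speech power spectrum of the current frame
    local_min  : tracked local minimum of the noisy power spectrum
    p_prev     : previous signal-presence probability per bin
    delta, alpha_d, alpha_p are assumed tuning constants.
    """
    noise_new, p_new = [], []
    for n_prev, y, y_min, p_old in zip(noise_prev, noisy_psd, local_min, p_prev):
        # Ratio of the noisy power to its local minimum signals speech presence.
        ratio = y / y_min if y_min > 0 else float("inf")
        present = 1.0 if ratio > delta else 0.0
        # Smooth the hard decision into a signal-presence probability.
        p = alpha_p * p_old + (1.0 - alpha_p) * present
        # Time- and frequency-dependent smoothing factor: close to 1 when
        # speech is likely, so the noise estimate is nearly frozen.
        alpha_s = alpha_d + (1.0 - alpha_d) * p
        noise_new.append(alpha_s * n_prev + (1.0 - alpha_s) * y)
        p_new.append(p)
    return noise_new, p_new
```

Bins judged to contain speech update slowly, while speech-absent bins follow the noisy spectrum more closely, which is what lets an estimator of this kind track changing noise without a separate VAD.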

Keywords: Noise estimation, Non-stationary noise, Speech enhancement.

Downloads: 2294
700 Automatic Distance Compensation for Robust Voice-based Human-Computer Interaction

Authors: Randy Gomez, Keisuke Nakamura, Kazuhiro Nakadai

Abstract:

Distant-talking voice-based HCI systems suffer from performance degradation due to mismatch between the acoustic speech (runtime) and the acoustic model (training). The mismatch is caused by the change in the power of the speech signal as observed at the microphones. This change is greatly influenced by the change in distance, which affects speech dynamics inside the room before the signal reaches the microphones. Moreover, as the speech signal is reflected, its acoustical characteristics are also altered by the room properties. In general, power mismatch due to distance is a complex problem. This paper presents a novel approach to dealing with distance-induced mismatch by intelligently sensing instantaneous voice power variation and compensating model parameters. First, the distant-talking speech signal is processed through microphone array processing, and the corresponding distance information is extracted. Distance-sensitive Gaussian Mixture Models (GMMs), pre-trained to capture both speech power and room properties, are used to predict the optimal distance of the speech source. Consequently, pre-computed statistic priors corresponding to the optimal distance are selected to correct the statistics of the generic model, which was frozen during training. Thus, model combinatorics are post-conditioned to match the power of instantaneous speech acoustics at runtime. This results in an improved likelihood of predicting the correct speech command at farther distances. We experiment using real data recorded inside two rooms. Experimental evaluation shows that voice recognition performance using our method is more robust to changes in distance than the conventional approach. In our experiment, under the most acoustically challenging environment (i.e., Room 2: 2.5 meters), our method achieved a 24.2% improvement in recognition performance over the best-performing conventional method.

Keywords: Human Machine Interaction, Human Computer Interaction, Voice Recognition, Acoustic Model Compensation, Acoustic Speech Enhancement.

Downloads: 1836
699 Absence of Developmental Change in Epenthetic Vowel Duration in Japanese Speakers’ English

Authors: Takayuki Konishi, Kakeru Yazawa, Mariko Kondo

Abstract:

This study examines developmental change in the production of epenthetic vowels by Japanese learners of English in relation to acquisition of L2 English speech rhythm. Seventy-two Japanese learners of English in the J-AESOP corpus were divided into lower- and higher-level learners according to their proficiency score and the frequency of vowel epenthesis. Three learners were excluded because no vowel epenthesis was observed in their utterances. The analysis of their read English speech data showed no statistical difference between lower- and higher-level learners, implying the absence of any developmental change in durations of epenthetic vowels. This result, together with the findings of previous studies, will be discussed in relation to the transfer of L1 phonology and manifestation of L2 English rhythm.

Keywords: Vowel epenthesis, Japanese learners of English, L2 speech corpus, speech rhythm.

Downloads: 1076
698 Accent Identification by Clustering and Scoring Formants

Authors: Dejan Stantic, Jun Jo

Abstract:

There have been significant improvements in automatic voice recognition technology. However, existing systems still face difficulties, particularly when used by non-native speakers with accents. In this paper we address the problem of identifying the English accented speech of speakers from different backgrounds. Once an accent is identified, the speech recognition software can utilise a training set for the appropriate accent and thereby improve the efficiency and accuracy of the speech recognition system. We introduce the Q factor, which is defined by the sum of relationships between frequencies of the formants. Four different accents were considered in the experiments for this research. A scoring method was introduced in order to analyse accents effectively. The proposed concept indicates that an accent can be identified by analysing its formants.

Keywords: Accent Identification, Formants, Q Factor.

Downloads: 2043
697 Delineating Students’ Speaking Anxieties and Assessment Gaps in Online Speech Performances

Authors: Mary Jane B. Suarez

Abstract:

Speech anxiety is pervasive in traditional communication classes, especially for ESL students. It intensifies when communication skills assessments take their toll in an online mode of learning due to the perils of the COVID-19 virus. Teachers and students have experienced great uncertainty about how to teach and learn various speaking skills effectively amidst the pandemic. This mixed-method study determined the factors that affected the public speaking skills of students in online performances, delineated the assessment gaps in assessing speaking skills in an online setup, and recommended ways to address students’ speech anxieties. Using a convergent parallel design, quantitative data were gathered by examining the desired learning competencies of the English course, including a review of the teacher’s class record to analyze how students’ performances reflected a significantly high level of anxiety in online speech delivery. A focus group discussion was also conducted for qualitative data describing students’ public speaking anxiety and assessment gaps. Results showed a significantly high level of students’ speech anxiety affected by time constraints, use of technology, lack of audience response, being conscious of making mistakes, and the use of English as a second language. The study presented recommendations to redesign curricular assessments of English teachers and to have a robust diagnosis of students’ speaking anxiety to better cater to the needs of learners, in an attempt to bridge any gaps in cultivating students’ public speaking skills as educational institutions segue from the pandemic to the post-pandemic milieu.

Keywords: Blended learning, communication skills assessment, online speech delivery, public speaking anxiety, speech anxiety.

Downloads: 103
696 A Collaborative Framework for Visual Modeling on Web 2.0

Authors: Song Meng, Dianfu Ma, Yongwang Zhao, Jianxin Li

Abstract:

Cooperative visual modeling is increasingly necessary in our complicated world. A collaborative environment that supports interactive operation and communication is required to increase work efficiency. We present a collaborative visual modeling framework on which a collaborative platform can be built. On this platform, cooperation and communication are available to designers from different regions. Unlike other collaborative frameworks, this framework contains a uniform message format, a message handling mechanism, and other functions such as message pretreatment and Role-Communication-Token Access Control (RCTAC). We also show our implementation of this framework, called Orchestra Designer, which supports cooperative online BPEL workflow modeling.

Keywords: collaborative framework; visual modeling; message handling mechanism

Downloads: 1509
695 Environmentally Adaptive Acoustic Echo Suppression for Barge-in Speech Recognition

Authors: Jong Han Joo, Jeong Hun Lee, Young Sun Kim, Jae Young Kang, Seung Ho Choi

Abstract:

In this study, we propose a novel technique for acoustic echo suppression (AES) during speech recognition under barge-in conditions. Conventional AES methods based on spectral subtraction apply fixed weights to the echo path transfer function (EPTF) estimated at the current signal segment and to the EPTF estimated up to the previous time interval. However, the effects of echo path changes should be considered when eliminating undesired echoes. We describe a new approach that adaptively updates the weight parameters in response to abrupt changes in the acoustic environment due to background noise or double-talk. Furthermore, we devised a voice activity detector and an initial time-delay estimator for barge-in speech recognition in communication networks. The initial time delay is estimated using a log-spectral distance measure as well as cross-correlation coefficients. The experimental results show that the developed techniques can be successfully applied in barge-in speech recognition systems.

Keywords: Acoustic echo suppression, barge-in, speech recognition, echo path transfer function, initial delay estimator, voice activity detector.

Downloads: 2262
694 Event Related Potentials in Terms of Visual and Auditory Stimuli

Authors: Seokbeen Lim, KyeongSeok Sim, DaKyeong Shin, Gilwon Yoon

Abstract:

The event-related potential (ERP) is a useful tool for investigating cognitive reactions. In this study, the potential of ERP components detected after auditory and visual stimuli was examined. Subjects were asked to respond to stimuli of three categories: Target, Non-Target, and Standard. The ERP after each stimulus was measured. In the visual evoked potential (VEP) experiment, the subjects were asked to gaze at a center point on the monitor screen, where the stimuli were provided by the reversal pattern of a checkerboard. In the VEP experiments, we observed consistent reactions. Each peak voltage could be measured when the ensemble average was applied. Responses to visual stimuli had a smaller amplitude and a longer latency than responses to auditory stimuli. The amplitude was the highest with Target and the smallest with Standard for both kinds of stimuli.
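
The ensemble averaging mentioned in this abstract can be illustrated with a short Python sketch. It assumes the EEG has already been cut into equal-length epochs time-locked to stimulus onset; the function names and the choice of the largest absolute value as "the peak" are assumptions for illustration, not details taken from the paper.

```python
def ensemble_average(epochs):
    """Average stimulus-locked epochs sample by sample.

    epochs: list of equal-length lists of voltages, one list per trial.
    Activity that is not time-locked to the stimulus tends to cancel out,
    leaving the event-related potential.
    """
    if not epochs:
        raise ValueError("at least one epoch is required")
    length = len(epochs[0])
    if any(len(e) != length for e in epochs):
        raise ValueError("all epochs must have the same length")
    n = len(epochs)
    return [sum(e[i] for e in epochs) / n for i in range(length)]


def peak_amplitude_and_latency(average, sample_rate_hz):
    """Return (amplitude, latency in seconds) of the largest absolute peak."""
    idx = max(range(len(average)), key=lambda i: abs(average[i]))
    return average[idx], idx / sample_rate_hz
```

Comparing the amplitude and latency returned for the averaged Target, Non-Target, and Standard epochs is one simple way to reproduce the kind of comparison the abstract reports.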

Keywords: Auditory stimulus, EEG, event related potential, oddball task, visual stimulus.

Downloads: 1203
693 HSV Image Watermarking Scheme Based on Visual Cryptography

Authors: Rawan I. Zaghloul, Enas F. Al-Rawashdeh

Abstract:

In this paper, a simple watermarking method for color images is proposed. The proposed method embeds the watermark in the histograms of the HSV planes using visual cryptography watermarking. The method has been shown to be robust to various image processing operations such as filtering, compression, and additive noise, and to various geometrical attacks such as rotation, scaling, cropping, flipping, and shearing.

Keywords: Histogram, HSV image, Visual Cryptography, Watermark.

Downloads: 1926
692 Orchestra/Percussion Classification Algorithm for United Speech Audio Coding System

Authors: Yueming Wang, Rendong Ying, Sumxin Jiang, Peilin Liu

Abstract:

Unified Speech Audio Coding (USAC), the latest MPEG standard for unified speech and audio coding, uses a speech/audio classification algorithm to distinguish speech and audio segments of the input signal. Owing to a shortcoming of this system, introducing a well-designed orchestra/percussion classification and modifying the subsequent processing can greatly increase the quality of the recovered audio. This paper proposes an orchestra/percussion classification algorithm for the USAC system that extracts only 3 scales of Mel-Frequency Cepstral Coefficients (MFCCs) rather than the traditional 13 and uses an Iterative Dichotomiser 3 (ID3) decision tree rather than a more complex learning method; the proposed algorithm therefore has lower computational complexity than most existing algorithms. Considering that frequent changes of attributes may lead to quality loss in the recovered audio signal, this paper also designs a modified subsequent process that helps the whole classification system reach an accuracy as high as 97%, which is comparable to the classical 99%.
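
For readers unfamiliar with ID3, the core of the algorithm is choosing the attribute with the highest information gain at each node. The Python sketch below shows that selection step only. It assumes the MFCC values have already been discretized into bins (ID3 works on categorical attributes); the attribute names, the binning, and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on a categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """ID3 node selection: the attribute with the largest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))
```

A full ID3 tree applies `best_attribute` recursively to each resulting subset until the labels in a subset are pure or no attributes remain.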

Keywords: ID3 Decision Tree, MFCC, Orchestra/Percussion Classification, USAC

Downloads: 1632
691 A High Quality Speech Coder at 600 bps

Authors: Yong Zhang, Ruimin Hu

Abstract:

This paper presents a vocoder that obtains high quality synthetic speech at 600 bps. To reduce the bit rate, the algorithm is based on a sinusoidally excited linear prediction model that extracts few coding parameters; three consecutive frames are grouped into a superframe, and joint vector quantization is used to obtain high coding efficiency. The inter-frame redundancy is exploited with distinct quantization schemes for different unvoiced/voiced frame combinations in the superframe. Experimental results show that the quality of the proposed coder is better than that of 2.4 kbps LPC10e and approximately the same as that of 2.4 kbps MELP, with high robustness.

Keywords: Speech coding, Vector quantization, linear prediction, Mixed sinusoidal excitation

Downloads: 2147
690 An ICA Algorithm for Separation of Convolutive Mixture of Speech Signals

Authors: Rajkishore Prasad, Hiroshi Saruwatari, Kiyohiro Shikano

Abstract:

This paper describes an Independent Component Analysis (ICA) based fixed-point algorithm for the blind separation of a convolutive mixture of speech picked up by a linear microphone array. The proposed algorithm extracts independent sources by non-Gaussianizing the Time-Frequency Series of Speech (TFSS) in a deflationary way. The degree of non-Gaussianization is measured by negentropy. The relative performance of the algorithm under random initialization and Null beamformer (NBF) based initialization is studied. It has been found that an NBF-based initial value gives speedy convergence as well as better separation performance.

Keywords: Blind signal separation, independent component analysis, negentropy, convolutive mixture.

Downloads: 1726
689 An Advanced Method for Speech Recognition

Authors: Meysam Mohamad pour, Fardad Farokhi

Abstract:

In this paper, in view of the deficiencies of the available speech recognition techniques, an advanced method is presented that is able to classify speech signals with high accuracy (98%) in minimum time. In the presented method, the recorded signal is first preprocessed; this stage includes denoising with Mel Frequency Cepstral Analysis and feature extraction using discrete wavelet transform (DWT) coefficients. These features are then fed to a Multilayer Perceptron (MLP) network for classification. Finally, after training the neural network, effective features are selected with the UTA algorithm.

Keywords: Multilayer perceptron (MLP) neural network, Discrete Wavelet Transform (DWT), Mels Scale Frequency Filter, UTA algorithm.

Downloads: 2305
688 Ultrasonic Echo Image Adaptive Watermarking Using the Just-Noticeable Difference Estimation

Authors: Amnach Khawne, Kazuhiko Hamamoto, Orachat Chitsobhuk

Abstract:

Many image watermarking methods that use the properties of the human visual system (HVS) have been proposed in the literature. The component of the visual threshold is usually related to either the spatial contrast sensitivity function (CSF) or visual masking. For contrast masking in particular, most methods have not addressed the effect near edge regions, even though the HVS is sensitive to what happens in edge areas. This paper proposes ultrasound image watermarking using a visual threshold corresponding to the HVS, in which the coefficients in a DCT block are classified as texture, edge, or plain area. This classification is useful not only for imperceptibility when the watermark is inserted into an image but also for robust watermark detection. A comparison with other methods shows that the proposed method is robust to blockwise memoryless manipulations and to noise addition.

Keywords: Medical image watermarking, Human Visual System, Image Adaptive Watermark

Downloads 1554
687 The Effect of Different Compression Schemes on Speech Signals

Authors: Jalal Karam, Raed Saad

Abstract:

This paper studies the effect of different compression constraints and schemes presented in a new and flexible paradigm to achieve high compression ratios and acceptable signal to noise ratios of Arabic speech signals. Compression parameters are computed for variable frame sizes of a level 5 to 7 Discrete Wavelet Transform (DWT) representation of the signals for different analyzing mother wavelet functions. Results are obtained and compared for Global threshold and level dependent threshold techniques. The results obtained also include comparisons with Signal to Noise Ratios, Peak Signal to Noise Ratios and Normalized Root Mean Square Error.
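The thresholding and quality measures named in this abstract can be illustrated with a minimal sketch. It assumes a single-level Haar transform and one global hard threshold, which is far simpler than the level 5 to 7 decompositions and the range of mother wavelets the paper studies. The function names and the SNR and NRMSE formulas are common textbook definitions chosen for illustration, not taken from the paper.

```python
import math


def haar_forward(x):
    """One-level Haar DWT of an even-length sequence (illustrative only)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def compress(x, threshold):
    """Zero every detail coefficient whose magnitude is <= threshold.

    Returns the reconstructed signal and the number of detail
    coefficients that are zero after thresholding.
    """
    approx, detail = haar_forward(x)
    kept = [d if abs(d) > threshold else 0.0 for d in detail]
    zeroed = sum(1 for d in kept if d == 0.0)
    return haar_inverse(approx, kept), zeroed


def snr_db(x, r):
    """Signal-to-noise ratio in dB between original x and reconstruction r."""
    noise = sum((a - b) ** 2 for a, b in zip(x, r))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(sum(a * a for a in x) / noise)


def nrmse(x, r):
    """Root mean square error normalized by the spread of x around its mean."""
    mean = sum(x) / len(x)
    num = sum((a - b) ** 2 for a, b in zip(x, r))
    den = sum((a - mean) ** 2 for a in x)
    return math.sqrt(num / den)


if __name__ == "__main__":
    signal = [1.0, 1.2, 5.0, 1.0]  # made-up samples, not speech data
    recon, zeroed = compress(signal, threshold=0.5)
    print(zeroed, snr_db(signal, recon), nrmse(signal, recon))
```

Raising the threshold zeroes more coefficients, which raises the compression ratio and lowers the SNR; that trade-off is what the compared threshold schemes explore.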

Keywords: Speech Compression, Wavelets.

Downloads 1688
686 Analysis of Event-related Response in Human Visual Cortex with fMRI

Authors: Ayesha Zaman, Tanvir Atahary, Shahida Rafiq

Abstract:

Functional Magnetic Resonance Imaging (fMRI) is a noninvasive imaging technique that measures the hemodynamic response related to neural activity in the human brain. Event-related functional magnetic resonance imaging (efMRI) is a form of fMRI in which a series of fMRI images are time-locked to a stimulus presentation and averaged together over many trials. An event-related potential (ERP), in turn, is a measured brain response that is directly the result of a thought or perception. Here, the neuronal response of the human visual cortex in normal healthy patients has been studied. The patients were asked to perform a visual three-choice reaction task; from the relative response of each patient, the corresponding neuronal activity in the visual cortex was imaged. The average number of neurons in the adult human primary visual cortex in each hemisphere has been estimated at around 140 million. Statistical analysis of this experiment was done with SPM5 (Statistical Parametric Mapping version 5) software. The result shows a robust design for imaging the neuronal activity of the human visual cortex.
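The time-locked averaging described in the abstract can be sketched on a one-dimensional signal. This is a simplified illustration under stated assumptions: a single time series stands in for the image series, and the function name `event_locked_average`, the onset indices, and the window length are hypothetical. It is not the SPM5 analysis used in the paper.

```python
def event_locked_average(signal, onsets, window):
    """Average fixed-length epochs of `signal` that start at each onset.

    Epochs that would run past the end of the signal are skipped.
    """
    epochs = [signal[t:t + window] for t in onsets if t + window <= len(signal)]
    if not epochs:
        raise ValueError("no complete epoch fits inside the signal")
    # zip(*epochs) groups the samples that share the same post-stimulus lag
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]


if __name__ == "__main__":
    series = [0, 1, 2, 0, 0, 3, 4, 0]   # made-up samples
    print(event_locked_average(series, onsets=[1, 5], window=2))
```

Averaging over many trials suppresses activity that is not locked to the stimulus, which is why the response becomes visible only after many epochs are combined.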

Keywords: Echo Planar Imaging, Event related Response, General Linear Model, Visual Neuronal Response.

Downloads 1401
685 Bidirectional Dynamic Time Warping Algorithm for the Recognition of Isolated Words Impacted by Transient Noise Pulses

Authors: G. Tamulevičius, A. Serackis, T. Sledevič, D. Navakauskas

Abstract:

We consider the biggest challenge in speech recognition: noise reduction. Traditionally, detected transient noise pulses are removed together with the corrupted speech using pulse models. In this paper we propose to cope with the problem directly in the Dynamic Time Warping domain. A bidirectional Dynamic Time Warping algorithm for the recognition of isolated words impacted by transient noise pulses is proposed. It uses a simple transient noise pulse detector, employs bidirectional computation of dynamic time warping, and directly manipulates the warping results. Experimental investigation with several alternative solutions confirms the effectiveness of the proposed algorithm in reducing the impact of noise on the recognition process: a 3.9% increase in noisy speech recognition is achieved.
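For readers unfamiliar with the underlying technique, the following is a minimal sketch of classic dynamic time warping on scalar sequences. It is not the authors' bidirectional algorithm, whose pulse detector and manipulation of warping results are not specified in the abstract. The function name `dtw_distance` and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences.

    Uses the symmetric step pattern (match, insertion, deletion) and an
    absolute-difference local cost; both are illustrative choices.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] holds the cheapest cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


if __name__ == "__main__":
    template = [1, 2, 3]
    spoken = [1, 2, 2, 3]  # same shape, stretched in time
    forward = dtw_distance(template, spoken)
    # Running the same recursion from the end of both sequences
    backward = dtw_distance(template[::-1], spoken[::-1])
    print(forward, backward)
```

With this step pattern the forward and backward passes give the same total cost on clean input. A bidirectional scheme becomes interesting when partial results from each direction are combined around a corrupted segment, which is the part this sketch does not attempt.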

Keywords: Transient noise pulses, noise reduction, dynamic time warping, speech recognition.

Downloads 1860
684 Controlling 6R Robot by Visionary System

Authors: Azamossadat Nourbakhsh, Moharram Habibnezhad Korayem

Abstract:

In visual servoing systems, the data obtained by the Visionary system is used for controlling robots. In this project, the simulator previously proposed for simulating the performance of a 6R robot was first examined in terms of software and testing, and its existing defects were corrected. In the first version of the simulation, the robot was directed toward the target object using only a Position-based method with two cameras in the environment. In the new version of the software, three cameras are used simultaneously. The camera installed as eye-in-hand on the end-effector of the robot is used for visual servoing in a Feature-based method. The target object is recognized according to its characteristics, and the robot is directed toward the object by an algorithm similar to the function of the human eye. Then, the function and accuracy of the robot's operation are examined through the Position-based visual servoing method using two cameras installed as eye-to-hand in the environment. Finally, the obtained results are tested under the ANSI-RIA R15.05-2 standard.

Keywords: 6R Robot, camera, visual servoing, Feature-based visual servoing, Position-based visual servoing, Performance tests.

Downloads 1349
683 Pictorial Multimodal Analysis of Selected Paintings of Salvador Dali

Authors: Shaza Melies, Abeer Refky, Nihad Mansoor

Abstract:

Multimodality involves the communication between verbal and visual components in various discourses. A painting represents a form of communication between the artist and the viewer in terms of colors, shades, objects, and the title. This paper aims to present how multimodality can be used to decode the verbal and visual dimensions a painting holds. For that purpose, this study uses Kress and van Leeuwen’s theoretical framework of visual grammar to analyze the multimodal semiotic resources of selected paintings of Salvador Dali. The study investigates the visual decoding of the selected paintings and analyzes their social and political meanings within that framework. The paper attempts to answer the following questions: 1. How far can multimodality decode the verbal and non-verbal meanings of surrealistic art? 2. How can Kress and van Leeuwen’s theoretical framework of visual grammar be applied to analyze Dali’s paintings? 3. To what extent is Kress and van Leeuwen’s theoretical framework of visual grammar apt to deliver political and social messages of Dali? The paper reached the following findings: the framework’s descriptive tools (representational, interactive, and compositional meanings) can be used to analyze the paintings’ titles and their visual elements. Social and political messages were delivered by appropriate usage of color, gesture, vectors, modality, and the way social actors were represented.

Keywords: Multimodality, multimodal analysis, paintings analysis, Salvador Dali, visual grammar.

Downloads 679