Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 43

Search results for: sounds

13 Comparison of MFCC and Cepstral Coefficients as a Feature Set for PCG Biometric Systems

Authors: Justin Leo Cheang Loong, Khazaimatol S Subari, Muhammad Kamil Abdullah, Nurul Nadia Ahmad, Rosli Besar

Abstract:

Heart sound is an acoustic signal, and many techniques used nowadays for human recognition tasks are borrowed from speech recognition. One popular choice for feature extraction from acoustic signals is the Mel Frequency Cepstral Coefficients (MFCC), which map the signal onto a non-linear Mel-Scale that mimics human hearing. However, the Mel-Scale is almost linear in the frequency region of heart sounds and should therefore produce results similar to those of the standard cepstral coefficients (CC). In this paper, MFCC is investigated to see whether it produces superior results for a PCG-based human identification system compared to CC. Results show that the MFCC system is still superior to CC despite linear filter-banks in the lower frequency range, giving up to 95% correct recognition rate for MFCC and 90% for CC. Further experiments show that the high recognition rate is due to the implementation of filter-banks and not to Mel-Scaling.

Keywords: Biometric, Phonocardiogram, Cepstral Coefficients, Mel Frequency

Downloads: 3505
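
The abstract's claim that the Mel-Scale is almost linear at heart-sound frequencies can be checked numerically. The sketch below assumes the widely used formula mel = 2595 * log10(1 + f / 700) and an illustrative heart-sound band of 20 to 200 Hz; neither the formula variant nor the band limits are stated in the abstract.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Common mel-scale formula (assumed variant): 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_slope(f_lo: float, f_hi: float) -> float:
    """Average number of mels per Hz over the band [f_lo, f_hi]."""
    return (hz_to_mel(f_hi) - hz_to_mel(f_lo)) / (f_hi - f_lo)


# Assumed heart-sound band, split into two halves to compare local slopes.
slope_lower_half = mel_slope(20.0, 110.0)
slope_upper_half = mel_slope(110.0, 200.0)

# A speech-range band for contrast, where the scale is clearly compressive.
slope_speech_band = mel_slope(4000.0, 8000.0)

print(f"20-110 Hz:    {slope_lower_half:.3f} mel/Hz")
print(f"110-200 Hz:   {slope_upper_half:.3f} mel/Hz")
print(f"4000-8000 Hz: {slope_speech_band:.3f} mel/Hz")
```

Under these assumptions the slope changes only modestly across the low band, while it drops several-fold at speech frequencies, which is consistent with the abstract's reasoning.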

12 Vocal Communication in Sooty-headed Bulbul; Pycnonotus aurigaster

Authors: Surakan Payakkhabut

Abstract:

Studies of vocal communication in the Sooty-headed Bulbul were carried out from January to December 2011. Vocal recordings and behavioral observations were made in the birds' natural habitats at some localities in Lampang, Thailand. After editing, cuts of high-quality recordings were analyzed with Avisoft-SASLab Pro (version 4.40) software. More than one thousand element repertoires in five groups were found within two vocal structures. The two structures were short sounds consisting of a single element and phrases composed of several elements; frequencies ranged from 1 to 10 kHz. Most phrases were composed of 2 to 5 elements that were often dissimilar in structure; however, these phrases were not as complex as song phrases. The elements and phrases were combined to form many patterns. The species used ten types of calls: alert, alarm, aggressive, begging, contact, courtship, distress, exciting, flying and invitation. Alert and contact calls were used more frequently than other calls. Aggressive, alarm and distress calls could be used for interspecific communication with some other bird species in the same habitats.

Keywords: Vocal communication, Call, Bird, Sooty-headed Bulbul

Downloads: 2576

11 Voice Disorders Identification Using Hybrid Approach: Wavelet Analysis and Multilayer Neural Networks

Authors: L. Salhi, M. Talbi, A. Cherif

Abstract:

This paper presents a new strategy for the identification and classification of pathological voices using a hybrid method based on the wavelet transform and neural networks. After speech acquisition from a patient, the speech signal is analysed in order to extract acoustic parameters such as the pitch, the formants, jitter, and shimmer. The results obtained are compared to normal and standard values by means of a programmable database. Sounds are collected from normal people and patients, and then classified into two different categories. The speech database consists of several pathological and normal voices collected from the national hospital "Rabta-Tunis". The speech processing algorithm is run in a supervised mode to discriminate between normal and pathological voices and then to classify neural and vocal pathologies (Parkinson, Alzheimer, laryngeal, dyslexia...). Several simulation results will be presented as a function of the disease and compared with the clinical diagnosis in order to obtain an objective evaluation of the developed tool.

Keywords: Formants, Neural Networks, Pathological Voices, Pitch, Wavelet Transform.

Downloads: 2796
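
Jitter and shimmer, two of the acoustic parameters named in the abstract, are commonly computed as cycle-to-cycle perturbation ratios. The sketch below uses the usual "local" definition (mean absolute difference between consecutive cycles divided by the mean); the abstract does not say which definition the authors used, and the sample values are made up for illustration.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods_s):
    """Local jitter: perturbation of consecutive pitch periods (seconds)."""
    return local_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Local shimmer: perturbation of consecutive cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)


# Hypothetical pitch periods and peak amplitudes for four glottal cycles.
periods = [0.010, 0.011, 0.010, 0.011]
amplitudes = [1.0, 0.9, 1.0, 0.9]

print(f"jitter  = {100 * local_jitter(periods):.2f} %")
print(f"shimmer = {100 * local_shimmer(amplitudes):.2f} %")
```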

10 Using Teager Energy Cepstrum and HMM Distances in Automatic Speech Recognition and Analysis of Unvoiced Speech

Authors: Panikos Heracleous

Abstract:

In this study, the use of a silicon NAM (Non-Audible Murmur) microphone in automatic speech recognition is presented. NAM microphones are special acoustic sensors that are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (non-audible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones are robust against noise and might be used in special systems (speech recognition, speech conversion, etc.) for sound-impaired people. Using a small amount of training data and adaptation approaches, 93.9% word accuracy was achieved for a 20k Japanese vocabulary dictation task. Non-audible murmur recognition in noisy environments is also investigated. In this study, further analysis of NAM speech has been carried out using distance measures between hidden Markov model (HMM) pairs. A metric distance shows that the spectral space of NAM speech is reduced; however, the locations of the different NAM phonemes are similar to the locations of the phonemes of normal speech, and the NAM sounds are well discriminated. Promising results using nonlinear features are also introduced, especially under noisy conditions.

Keywords: Speech recognition, unvoiced speech, nonlinear features, HMM distance measures

Downloads: 1609
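
The title names a Teager energy cepstrum, but the abstract does not describe how the nonlinear features are built. As a minimal sketch of the underlying nonlinear operator only (not the authors' feature pipeline), the discrete Teager-Kaiser energy operator is psi[n] = x[n]^2 - x[n-1] * x[n+1]; for a pure sinusoid A*cos(w*n) it returns the constant A^2 * sin^2(w). The test signal below is an assumption chosen to show that property.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output has two fewer samples than the input (no value at the edges).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative sinusoid: amplitude 2.0, digital frequency 0.3 rad/sample.
amplitude, omega = 2.0, 0.3
signal = [amplitude * math.cos(omega * n) for n in range(50)]

energy = teager_energy(signal)
expected = (amplitude * math.sin(omega)) ** 2  # closed form for a sinusoid

print(f"operator output (first sample): {energy[0]:.6f}")
print(f"closed-form value:              {expected:.6f}")
```

Because the operator tracks both amplitude and frequency, it is often used as a nonlinear front end before a cepstral analysis stage.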

9 An Approach for Vocal Register Recognition Based on Spectral Analysis of Singing

Authors: Aleksandra Zysk, Pawel Badura

Abstract:

Recognizing and controlling vocal registers during singing is a difficult task for beginner vocalists. It requires, among other things, identifying which part of the natural resonators is being used when a sound propagates through the body. Thus, an application has been designed that allows for sound recording and automatic vocal register recognition (VRR), with a graphical user interface providing real-time visualization of the signal and recognition results. Six spectral features are determined for each time frame and passed to a support vector machine classifier, yielding a binary decision on the head or chest register assignment of the segment. The classification training and testing data were recorded by ten professional female singers (soprano, aged 19-29) performing sounds in both the chest and head registers. The classification accuracy exceeded 93% in each of the various validation schemes. Apart from a hard two-class classification, the support vector classifier also returns information on the distance between a particular feature vector and the discrimination hyperplane in the feature space. Such information reflects the level of certainty of the vocal register classification in a fuzzy way. Thus, the designed recognition and training application is able to assess and visualize the continuous trend in singing in a user-friendly graphical mode, providing an easy way to control the vocal emission.

Keywords: Classification, singing, spectral analysis, vocal emission, vocal register.

Downloads: 1269
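
The idea of using the distance to the discrimination hyperplane as a certainty measure can be sketched for the simplest case, a linear decision function f(x) = w.x + b, where the signed distance is f(x) / ||w||. The abstract does not state the kernel, the trained weights, or which sign corresponds to which register, so the weights, bias, and label mapping below are purely hypothetical.

```python
import math


def signed_distance(weights, bias, features):
    """Signed Euclidean distance from a feature vector to the plane w.x + b = 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    norm = math.sqrt(sum(w * w for w in weights))
    return score / norm


def classify_register(weights, bias, features):
    """Return (label, certainty): hard decision plus distance to the hyperplane."""
    distance = signed_distance(weights, bias, features)
    label = "head" if distance >= 0 else "chest"  # assumed sign convention
    return label, abs(distance)


# Hypothetical model over six spectral features (values are not from the paper).
W = [3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
B = -5.0

frame_far = [3.0, 4.0, 0.0, 0.0, 0.0, 0.0]   # well inside one class
frame_near = [1.0, 0.4, 0.0, 0.0, 0.0, 0.0]  # close to the boundary

print(classify_register(W, B, frame_far))
print(classify_register(W, B, frame_near))
```

A frame far from the plane gets a large certainty value, while a frame near the boundary gets a value close to zero, which is the kind of graded output the abstract describes.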

8 A Prevalence of Phonological Disorder in Children with Specific Language Impairment

Authors: Etim, Victoria Enefiok, Dada, Oluseyi Akintunde, Bassey Okon

Abstract:

Phonological disorder is a serious and disturbing issue for many parents and teachers. Efforts towards resolving the problem have been undermined by other specific disabilities that are hidden from many regular and special education teachers. It is against this background that this study was motivated to provide data on the prevalence of phonological disorders in children with specific language impairment (CWSLI) as the first step towards critical intervention. The study was a survey of 15 CWSLI from St. Louise Inclusive schools, Ikot Ekpene, in Akwa Ibom State of Nigeria. The Phonological Processes Diagnostic Scale (PPDS), comprising 17 short sentences that cut across the five phonological processes examined, was validated by experts in test measurement, phonology and special education. The respondents were asked to read the sentences with emphasis on the targeted sounds. Their utterances were recorded and analyzed in the language laboratory using Praat software. Data were also collected through friendly interactions with the clients at different times. The theory of generative phonology was adopted for the descriptive analysis of the phonological processes. The data collected were analyzed using simple percentages and a composite bar chart for better understanding of the results. The study found that CWSLI exhibited the five phonological processes under investigation. It was revealed that 66.7%, 80%, 73.3%, 80%, and 86.7% of the respondents have a severe deficit in fricative stopping, velar fronting, liquid gliding, final consonant deletion and cluster reduction, respectively. It was therefore recommended that a nationwide survey be carried out to obtain national statistics on CWSLI with phonological deficits and to develop intervention strategies for effective therapy to remediate the disorder.

Keywords: Language disorders, phonology, phonological processes, specific language impairment.

Downloads: 1027
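
The reported percentages can be related back to the sample of 15 respondents with simple arithmetic. The sketch below infers the number of children behind each percentage; these counts are derived here for illustration and are not reported in the abstract.

```python
RESPONDENTS = 15  # sample size stated in the abstract

# Percentages reported in the abstract, by phonological process.
reported_percent = {
    "fricative stopping": 66.7,
    "velar fronting": 80.0,
    "liquid gliding": 73.3,
    "final consonant deletion": 80.0,
    "cluster reduction": 86.7,
}


def implied_count(percent, total=RESPONDENTS):
    """Nearest whole number of respondents matching a reported percentage."""
    return round(percent / 100.0 * total)


# Inferred counts (an assumption: the abstract gives only percentages).
counts = {name: implied_count(p) for name, p in reported_percent.items()}

# Recompute the simple percentage from each inferred count as a cross-check.
recomputed = {name: round(100.0 * c / RESPONDENTS, 1) for name, c in counts.items()}

for name in reported_percent:
    print(f"{name}: {counts[name]} of {RESPONDENTS} = {recomputed[name]}%")
```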

7 Sperm Whale Signal Analysis: Comparison using the Auto Regressive model and the Daubechies 15 Wavelets Transform

Authors: Olivier Adam, Maciej Lopatka, Christophe Laplanche, Jean-François Motsch

Abstract:

This article presents the results of using a parametric approach and a Wavelet Transform to analyse signals emitted by the sperm whale. Extracting the intrinsic characteristics of these unique signals emitted by marine mammals remains a difficult exercise for various reasons: firstly, the signals are non-stationary, and secondly, they are obscured by interfering background noise. In this article, we compare the advantages and disadvantages of both methods: Auto Regressive models and the Wavelet Transform. These approaches serve as an alternative to the commonly used estimators based on the Fourier Transform, for which the hypotheses necessary for its application are, in certain cases, not sufficiently verified. These modern approaches provide effective results, particularly for the periodic tracking of the signal's characteristics and notably when the signal-to-noise ratio negatively affects signal tracking. Our objectives are twofold. Our first goal is to identify the animal through its acoustic signature. This includes recognition of the marine mammal species and ultimately of the individual animal (within the species). The second is much more ambitious and directly involves the intervention of cetologists to study the sounds emitted by marine mammals in an effort to characterize their behaviour. We are working on an approach based on recordings of marine mammal signals, and the findings from these data result from the Wavelet Transform. This article will explore the reasons for using this approach. In addition, thanks to the use of new processors, these algorithms, once heavy in calculation time, can be integrated into a real-time system.

Keywords: Autoregressive model, Daubechies Wavelet, Fourier Transform, marine mammals, signal processing, spectrogram, sperm whale, Wavelet Transform.

Downloads: 1961
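
To make the parametric (Auto Regressive) side of the comparison concrete, the sketch below fits AR coefficients from an autocorrelation sequence with the Levinson-Durbin recursion. The abstract does not say how the authors estimate their model or which order they use; the recursion, the order of 2, and the toy coefficients are assumptions chosen for illustration.

```python
def autocorrelation(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of a real sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for AR coefficients.

    Returns (a, err) for the predictor x[n] ~ sum(a[k] * x[n - 1 - k]),
    where err is the final prediction-error power.
    """
    a, err = [], r[0]
    for m in range(order):
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        reflection = acc / err
        a = [a[k] - reflection * a[m - 1 - k] for k in range(m)] + [reflection]
        err *= 1.0 - reflection * reflection
    return a, err


# Toy check with the theoretical (normalized) autocorrelation of the process
# x[n] = 0.5 * x[n-1] - 0.3 * x[n-2] + e[n]; the recursion should recover
# the two coefficients.
rho1 = 0.5 / (1.0 + 0.3)
rho2 = 0.5 * rho1 - 0.3
coeffs, error_power = levinson_durbin([1.0, rho1, rho2], order=2)

print("estimated AR coefficients:", [round(c, 6) for c in coeffs])
```

On recorded data, `autocorrelation(samples, order)` would supply the sequence `r`, typically computed over short frames because the signals are non-stationary.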

6 Design, Manufacture and Test of a Solar Powered Audible Bird Scarer

Authors: Turhan Koyuncu, Fuat Lule

Abstract:

The most common domestic birds living in Turkey are crows (Corvus corone), pigeons (Columba livia), sparrows (Passer domesticus), starlings (Sturnus vulgaris) and blackbirds (Turdus merula). These birds damage agricultural areas and foul human living areas. Various materials and methods, such as chemicals, treatments, colored lights, flash and audible scarers, are used to drive these birds away. Many studies on chemical methods can be found in the literature. However, few works on audible bird scarers have been reported. Therefore, a solar powered bird scarer was designed, manufactured and tested in this experimental investigation. First, a series of preliminary studies was conducted to understand how sensitive these domestic birds are to the audible scarer. These studies showed that crows are the most resistant to the audible bird scarer when compared with pigeons, sparrows, starlings and blackbirds. Therefore, the solar powered audible bird scarer was tested on crows. The scarer was tested for about one month during April-May 2007. Eighteen commonly known predator sounds (voices or calls), from Falcon (Falco eleonorae), Falcon (Buteo lagopus), Eagle (Aquila chrysaetos), Montagu's harrier (Circus pygargus) and Owl (Glaucidium passerinum), were selected for testing the scarer. The results showed that the reaction of the birds changed depending on the predator sound type, the camouflage of the scarer, the sound quality and volume, and the loudspeaker play and pause periods in one application. In addition, the sound from Falcon (Buteo lagopus) was the most effective on crows, and the scarer was sufficiently efficient.

Keywords: Bird damage, Audible scarer, Solar powered scarer, Predator sound

Downloads: 3616

5 Carnatic Music Ragas and Their Role in Music Therapy

Authors: Raghavi Janaswamy, Saraswathi K. Vasudev

Abstract:

Raga, as the soul and base, is a distinctive musical entity in the music system, with a unique structure in its construction from srutis (musical sounds) and in its application. One of the essential components of the music system is the 'tala', which defines the rhythm of a song. There are seven basic swaras (notes), Sa, Ri, Ga, Ma, Pa, Da and Ni, in the Carnatic music system that are analogous to the C, D, E, F, G, A and B of the western system. Carnatic music further builds on the conscious use of microtones, gamakams (oscillation) and rendering styles. It has 72 basic ragas known as melakarta ragas, and a plethora of ragas have been developed from them with permutations and combinations of the basic swaras. Among them, some ragas derived from the same melakarta raga are distinctly different from each other and can evoke a profound difference in the raga bhava (emotion) during rendering. Although these may have similar arohana and avarohana swaras, their quintessential differences in gamaka usage and in the srutis present offer varied melodic feelings; variations in the intonation and stress given to certain swara phrases are the root causes. This article sheds light on a group of such allied ragas (AR) from the perspectives of their schema and raga alapana (improvisation), ranjaka prayogas (signature phrases), differences in rendering tempo, gamakas and delicate srutis, along with the range of sancharas (musical phrases). The intricate differences in the sruti frequencies and the use of AR in composing kritis (musical compositions) toward emotive accomplishments such as moods of valor, kindness, love, humor, anger and mercy, to name a few, have also been explored. A brief review of the existing scientific research on music therapy using some of the Carnatic ragas is presented. Studying and comprehending the AR enables music aspirants to gain a thorough knowledge of the subtle nuances among the ragas.
Such knowledge helps leave a long-lasting melodic impression on listeners and enables further research on music therapy.

Keywords: Carnatic music, Allied ragas, Raga analysis, Music therapy.

Downloads: 1416
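
The count of 72 melakarta ragas follows from a small combinatorial rule. The sketch below uses the commonly taught scheme, which the abstract itself does not spell out: Sa and Pa are fixed, Ma has two variants, and Ri, Ga, Da and Ni each have three variants, with the constraint that Ga must lie above Ri and Ni above Da. The variant labels (R1, G2, and so on) and semitone offsets are the usual textbook convention, introduced here as an assumption.

```python
from itertools import product

# Semitone offsets above Sa for each swara variant (textbook convention).
RI = {"R1": 1, "R2": 2, "R3": 3}
GA = {"G1": 2, "G2": 3, "G3": 4}
MA = {"M1": 5, "M2": 6}
DA = {"D1": 8, "D2": 9, "D3": 10}
NI = {"N1": 9, "N2": 10, "N3": 11}


def rising_pairs(lower, upper):
    """All (lower, upper) variant pairs in which the upper note is strictly higher."""
    return [(lo, hi) for lo, hi in product(lower, upper) if lower[lo] < upper[hi]]


ri_ga = rising_pairs(RI, GA)  # valid Ri-Ga combinations
da_ni = rising_pairs(DA, NI)  # valid Da-Ni combinations

# Every parent scale: Sa, one Ri-Ga pair, one Ma, Pa, one Da-Ni pair.
melakartas = [
    ("Sa", ri, ga, ma, "Pa", da, ni)
    for ma in MA
    for ri, ga in ri_ga
    for da, ni in da_ni
]

print(len(ri_ga), "Ri-Ga pairs x", len(MA), "Ma x", len(da_ni), "Da-Ni pairs =",
      len(melakartas), "scales")
```

The enumeration order here is arbitrary and is not meant to reproduce the traditional melakarta numbering.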

4 Development System for Emotion Detection Based on Brain Signals and Facial Images

Authors: Suprijanto, Linda Sari, Vebi Nadhira, IGN. Merthayasa, Farida I.M

Abstract:

Detection of human emotions has many potential applications. One application is to quantify audience attentiveness in order to evaluate the acoustic quality of a concert hall, using the subjective audio preference of the audience. To obtain a fair evaluation of acoustic quality, this research proposes a system for multimodal emotion detection: one modality is based on brain signals measured using an electroencephalogram (EEG), and the second modality is sequences of facial images. In the experiment, an audio signal consisting of normal and disorder sounds was customized. The audio signal was then played in order to stimulate positive or negative emotional feedback in volunteers. The EEG signal from the temporal lobes, i.e. T3 and T4, was used to measure the brain response, and a sequence of facial images was used to monitor facial expression while the volunteer listened to the audio signal. For the EEG signal, features were extracted from changes in the brain waves, particularly the alpha and beta waves. Features of facial expression were extracted based on analysis of motion images. We implemented an advanced optical flow method to detect the most active facial muscles in the transition from a normal to another emotional expression, represented in vector flow maps. To simplify the detection of the emotional state, the vector flow maps are transformed into a compass mapping that represents the major directions and velocities of facial movement. The results showed that the power of the beta wave increased when the disorder sound stimulation was given; however, each volunteer gave different emotional feedback. Based on features derived from facial images, the optical flow compass mapping is promising as additional information for making decisions about emotional feedback.

Keywords: Multimodal Emotion Detection, EEG, Facial Image, Optical Flow, compass mapping, Brain Wave

Downloads: 2253
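
One plausible reading of the compass mapping step is a reduction of a dense flow map to a few direction bins that accumulate movement magnitude. The sketch below is an illustration under stated assumptions, not the authors' method: it assumes eight 45-degree sectors, a y axis pointing up (image coordinates usually point down), and flow vectors given as hypothetical (dx, dy) pairs.

```python
import math

# Eight assumed compass sectors, counter-clockwise starting from east.
COMPASS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def compass_map(flow_vectors):
    """Sum flow magnitudes into eight 45-degree direction sectors."""
    totals = dict.fromkeys(COMPASS, 0.0)
    for dx, dy in flow_vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0.0:
            continue  # a zero vector has no direction
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        sector = int((angle + 22.5) // 45.0) % 8  # centre each bin on its axis
        totals[COMPASS[sector]] += magnitude
    return totals


def dominant_direction(flow_vectors):
    """Compass label with the largest accumulated movement."""
    totals = compass_map(flow_vectors)
    return max(totals, key=totals.get)


# Hypothetical flow vectors for a small facial region.
flow = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0), (1.0, 1.0), (0.0, 0.0)]

print(compass_map(flow))
print("dominant:", dominant_direction(flow))
```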

3 Speaker Identification by Atomic Decomposition of Learned Features Using Computational Auditory Scene Analysis Principals in Noisy Environments

Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic

Abstract:

Speaker recognition is performed in high Additive White Gaussian Noise (AWGN) environments using principles of Computational Auditory Scene Analysis (CASA). CASA methods often classify sounds from images in the time-frequency (T-F) plane, using spectrograms or cochleagrams as the image. In this paper, atomic decomposition implemented by matching pursuit performs a transform from time-series speech signals to the T-F plane. The atomic decomposition creates a sparsely populated T-F vector in "weight space", where each populated T-F position contains an amplitude weight. The weight space vector, along with the atomic dictionary, represents a denoised, compressed version of the original signal. The arrangement of the atomic indices in the T-F vector is used for classification. Unsupervised feature learning implemented by a sparse autoencoder learns a single dictionary of basis features from a collection of envelope samples from all speakers. The approach is demonstrated using pairs of speakers from the TIMIT data set. Pairs of speakers are selected randomly from a single district. Each speaker has 10 sentences. Two are used for training and 8 for testing. Atomic index probabilities are created for each training sentence and also for each test sentence. Classification is performed by finding the lowest Euclidean distance between the probabilities from the training sentences and the test sentences. Training is done at a 30 dB Signal-to-Noise Ratio (SNR). Testing is performed at SNRs of 0 dB, 5 dB, 10 dB and 30 dB. The algorithm has a baseline classification accuracy of ~93% averaged over 10 pairs of speakers from the TIMIT data set. The baseline accuracy is attributable to the short sequences of training and test data as well as the overall simplicity of the classification algorithm. The accuracy is not affected by AWGN, and the algorithm produces ~93% accuracy at 0 dB SNR.

Keywords: Time-frequency plane, atomic decomposition, envelope sampling, Gabor atoms, matching pursuit, sparse dictionary learning, sparse autoencoder.

Downloads: 1522
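
The final classification step described in the abstract (atomic index probabilities compared by lowest Euclidean distance) can be sketched as a nearest-neighbour rule over histograms. The matching pursuit and autoencoder stages are not reproduced here; the dictionary size, speaker labels, and index sequences below are toy assumptions standing in for the output of those stages.

```python
import math
from collections import Counter


def index_probabilities(atom_indices, dictionary_size):
    """Relative frequency of each dictionary atom index in one sentence."""
    counts = Counter(atom_indices)
    total = len(atom_indices)
    return [counts[i] / total for i in range(dictionary_size)]


def euclidean(p, q):
    """Euclidean distance between two equal-length probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def classify(test_indices, training_set, dictionary_size):
    """Label of the training sentence whose probabilities are closest."""
    test_p = index_probabilities(test_indices, dictionary_size)
    label, _ = min(training_set, key=lambda item: euclidean(test_p, item[1]))
    return label


# Toy setup: a 4-atom dictionary and two training sentences per speaker,
# mirroring the two-training-sentence split mentioned in the abstract.
SIZE = 4
training = [
    ("speaker_a", index_probabilities([0, 0, 1, 2], SIZE)),
    ("speaker_a", index_probabilities([0, 1, 1, 0], SIZE)),
    ("speaker_b", index_probabilities([3, 3, 2, 3], SIZE)),
    ("speaker_b", index_probabilities([2, 3, 3, 3], SIZE)),
]

print(classify([0, 0, 1, 1], training, SIZE))
print(classify([3, 3, 3, 2], training, SIZE))
```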

2 The Evolution of Traditional Rhythms in Redefining the West African Country of Guinea

Authors: Janice Haworth, Karamoko Camara, Marie-Therèse Dramou, Kokoly Haba, Daniel Léno, Augustin Mara, Adama Noël Oulari, Silafa Tolno, Noël Zoumanigui

Abstract:

The traditional rhythms of the West African country of Guinea have played a centuries-long role in defining the different people groups that make up the country. Throughout their history, before and since colonization by the French, the different ethnicities have used their traditional music as a distinct part of their historical identities. That is starting to change. Guinea is an impoverished nation created in the early twentieth century with little regard for the history and cultures of the people who were included. The traditional rhythms of the different people groups and their heritages have remained. Fifteen individual traditional Guinean rhythms were chosen to represent popular rhythms from the four geographical regions of Guinea. Each rhythm was traced back to its native village and video recorded on-site, performed by as many different local performing groups as could be located. The cyclical rhythm patterns were transcribed via a circular, spatial design and then copied into a box notation system where sounds happening at the same time could be studied. These rhythms were analyzed for their consistency-over-performance in a Fundamental Rhythm Pattern analysis so that rhythms could be compared for how they are changing through different performances. The analysis showed that the traditional rhythm performances of the Middle and Forest Guinea regions were the most cohesive and showed the least evidence of change between performances. The role of music in each of these regions is both limited and focused. The Coastal and High Guinea regions have much in common historically through their ethnic history and modern-day trade connections, but the rhythm performances seem to be less consistent and demonstrate more changes in how they are performed today. In each of these regions the role and usage of music is much freer and more widespread.
In spite of advances being made as a country, different ethnic groups still frequently respond to and participate in (dance and sing to) only the music of their native ethnicity. There is some evidence that this self-imposed musical barrier is beginning to change and evolve, partially through the development of better roads, more access to electricity and technology, the nationwide Ebola health crisis, and a growing self-identification as a unified nation.

Keywords: Cultural identity, Guinea, traditional rhythms, West Africa.

Downloads: 1453

1 Sound Selection for Gesture Sonification and Manipulation of Virtual Objects

Authors: Benjamin Bressolette, Sébastien Denjean, Vincent Roussarie, Mitsuko Aramaki, Sølvi Ystad, Richard Kronland-Martinet

Abstract:

New sensors and technologies, such as microphones, touchscreens or infrared sensors, are currently making their appearance in the automotive sector, introducing new kinds of Human-Machine Interfaces (HMIs). Interactions with such tools might be cognitively expensive and thus unsuitable for driving tasks. It could, for instance, be dangerous to use touchscreens with visual feedback while driving, as this draws the driver's visual attention away from the road. Furthermore, new technologies in car cockpits modify the interactions of the users with the central system. In particular, touchscreens are preferred to arrays of buttons for space-saving and design purposes. However, the buttons' tactile feedback is no longer available to the driver, which makes such interfaces more difficult to manipulate while driving. Gestures combined with auditory feedback might therefore constitute an interesting alternative for interacting with the HMI. Indeed, gestures can be performed without vision, which means that the driver's visual attention can be fully dedicated to the driving task. In fact, the auditory feedback can inform the driver both about the task performed on the interface and about the performed gesture, which might constitute a possible solution to the lack of tactile information. As audition is a relatively unused sense in automotive contexts, gesture sonification can contribute to reducing the cognitive load thanks to the proposed multisensory exploitation. Our approach consists in using a virtual object (VO) to sonify the consequences of the gesture rather than the gesture itself. This approach is motivated by an ecological point of view: gestures do not make sound, but their consequences do. In this experiment, the aim was to identify efficient sound strategies for transmitting dynamic information about VOs to users through sound. The swipe gesture was chosen for this purpose, as it is commonly used in current and new interfaces.
We chose two VO parameters to sonify: the hand-VO distance and the VO velocity. Two kinds of sound parameters can be chosen to sonify the VO behavior: spectral or temporal parameters. Pitch and brightness were tested as spectral parameters, and amplitude modulation as a temporal parameter. Performance showed a positive effect of sound compared to a no-sound situation, revealing the usefulness of sounds in accomplishing the task.

Keywords: Auditory feedback, gesture, sonification, sound perception, virtual object.

Downloads: 924
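
A parameter-mapping strategy of the kind described (VO parameters driving pitch or amplitude modulation) could be sketched as follows. Every detail here is hypothetical, since the abstract gives no mapping functions: the direction of the mappings, the exponential pitch curve, the frequency range, and the modulation-rate range are all assumptions made for illustration.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return min(max(value, low), high)


def distance_to_pitch_hz(distance, max_distance=1.0, f_min=220.0, f_max=880.0):
    """Map hand-VO distance to pitch: a closer hand gives a higher pitch.

    The mapping is exponential so that equal distance steps produce
    equal musical intervals (an assumed design choice).
    """
    closeness = 1.0 - clamp(distance, 0.0, max_distance) / max_distance
    return f_min * (f_max / f_min) ** closeness


def velocity_to_am_rate_hz(velocity, max_velocity=2.0, rate_min=2.0, rate_max=20.0):
    """Map VO speed linearly to an amplitude-modulation rate."""
    fraction = clamp(abs(velocity), 0.0, max_velocity) / max_velocity
    return rate_min + fraction * (rate_max - rate_min)


# Hypothetical swipe: the hand approaches the VO while the VO speeds up.
for distance, velocity in [(1.0, 0.0), (0.5, 1.0), (0.0, 2.0)]:
    pitch = distance_to_pitch_hz(distance)
    rate = velocity_to_am_rate_hz(velocity)
    print(f"distance={distance:.1f} velocity={velocity:.1f} "
          f"-> pitch={pitch:.1f} Hz, AM rate={rate:.1f} Hz")
```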