Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 287

Search results for: audio

287 Spatial Audio Player Using Musical Genre Classification

Authors: Jun-Yong Lee, Hyoung-Gook Kim

Abstract:

In this paper, we propose a smart music player that combines the musical genre classification and the spatial audio processing. The musical genre is classified based on content analysis of the musical segment detected from the audio stream. In parallel with the classification, the spatial audio quality is achieved by adding an artificial reverberation in a virtual acoustic space to the input mono sound. Thereafter, the spatial sound is boosted with the given frequency gains based on the musical genre when played back. Experiments measured the accuracy of detecting the musical segment from the audio stream and its musical genre classification. A listening test was performed based on the virtual acoustic space based spatial audio processing.

Keywords: automatic equalization, genre classification, music segment detection, spatial audio processing

Procedia PDF Downloads 341
286 Mathematical Model That Using Scrambling and Message Integrity Methods in Audio Steganography

Authors: Mohammed Salem Atoum

Abstract:

The success of audio steganography is to ensure imperceptibility of the embedded message in stego file and withstand any form of intentional or un-intentional degradation of message (robustness). Audio steganographic that utilized LSB of audio stream to embed message gain a lot of popularity over the years in meeting the perceptual transparency, robustness and capacity. This research proposes an XLSB technique in order to circumvent the weakness observed in LSB technique. Scrambling technique is introduce in two steps; partitioning the message into blocks followed by permutation each blocks in order to confuse the contents of the message. The message is embedded in the MP3 audio sample. After extracting the message, the permutation codebook is used to re-order it into its original form. Md5sum and SHA-256 are used to verify whether the message is altered or not during transmission. Experimental result shows that the XLSB performs better than LSB.

Keywords: XLSB, scrambling, audio steganography, security

Procedia PDF Downloads 301
285 Freedom of Expression and Its Restriction in Audiovisual Media

Authors: Sevil Yildiz

Abstract:

Audio visual communication is a type of collective expression. Collective expression activity informs the masses, gives direction to opinions and establishes public opinion. Due to these characteristics, audio visual communication must be subjected to special restrictions. This has been stipulated in both the Constitution and the European Human Rights Agreement. This paper aims to review freedom of expression and its restriction in audio visual media. For this purpose, the authorisation of the Radio and Television Supreme Council to impose sanctions as an independent administrative authority empowered to regulate the field of audio visual communication has been reviewed with regard to freedom of expression and its limits.

Keywords: audio visual media, freedom of expression, its limits, radio and television supreme council

Procedia PDF Downloads 241
284 The Influence of Audio on Perceived Quality of Segmentation

Authors: Silvio Ricardo Rodrigues Sanches, Bianca Cogo Barbosa, Beatriz Regina Brum, Cléber Gimenez Corrêa

Abstract:

To evaluate the quality of a segmentation algorithm the authors use subjective or objective metrics. Although subjective metrics are more accurate, objective metrics do not require user feedback to test an algorithm. Objective metrics require subjective experiments only during their development. Subjective experiments typically display to users some videos (generated from frames with segmentation errors) that simulate the environment of an application domain. This user feedback is the key information for metric definition. In the subjective experiments applied to develop some state-of-the-art metrics used to test segmentation algorithms, the videos displayed during the experiments did not contain audio. In applications such as videoconference and augmented reality, audio is an important component. If the audio influences the user’s perception, using only videos without audio in subjective experiments can compromise the efficiency of an objective metric generated using data from these experiments. The aim of this work is to identify if the audio influences the user’s perception of segmentation quality in background substitution applications with audio. For this, a subjective method based on formal methods of video quality assessment was applied. The results showed audio influences the quality of segmentation perceived by user.

Keywords: background substitution, influence of audio, segmentation evaluation, segmentation quality

Procedia PDF Downloads 21
283 Audio Information Retrieval in Mobile Environment with Fast Audio Classifier

Authors: Bruno T. Gomes, José A. Menezes, Giordano Cabral

Abstract:

With the popularity of smartphones, mobile apps emerge to meet the diverse needs, however the resources at the disposal are limited, either by the hardware, due to the low computing power, or the software, that does not have the same robustness of desktop environment. For example, in automatic audio classification (AC) tasks, musical information retrieval (MIR) subarea, is required a fast processing and a good success rate. However the mobile platform has limited computing power and the best AC tools are only available for desktop. To solve these problems the fast classifier suits, to mobile environments, the most widespread MIR technologies, seeking a balance in terms of speed and robustness. At the end we found that it is possible to enjoy the best of MIR for mobile environments. This paper presents the results obtained and the difficulties encountered.

Keywords: audio classification, audio extraction, environment mobile, musical information retrieval

Procedia PDF Downloads 458
282 Genetic Algorithms for Feature Generation in the Context of Audio Classification

Authors: José A. Menezes, Giordano Cabral, Bruno T. Gomes

Abstract:

Choosing good features is an essential part of machine learning. Recent techniques aim to automate this process. For instance, feature learning intends to learn the transformation of raw data into a useful representation to machine learning tasks. In automatic audio classification tasks, this is interesting since the audio, usually complex information, needs to be transformed into a computationally convenient input to process. Another technique tries to generate features by searching a feature space. Genetic algorithms, for instance, have being used to generate audio features by combining or modifying them. We find this approach particularly interesting and, despite the undeniable advances of feature learning approaches, we wanted to take a step forward in the use of genetic algorithms to find audio features, combining them with more conventional methods, like PCA, and inserting search control mechanisms, such as constraints over a confusion matrix. This work presents the results obtained on particular audio classification problems.

Keywords: feature generation, feature learning, genetic algorithm, music information retrieval

Procedia PDF Downloads 354
281 Mood Recognition Using Indian Music

Authors: Vishwa Joshi

Abstract:

The study of mood recognition in the field of music has gained a lot of momentum in the recent years with machine learning and data mining techniques and many audio features contributing considerably to analyze and identify the relation of mood plus music. In this paper we consider the same idea forward and come up with making an effort to build a system for automatic recognition of mood underlying the audio song’s clips by mining their audio features and have evaluated several data classification algorithms in order to learn, train and test the model describing the moods of these audio songs and developed an open source framework. Before classification, Preprocessing and Feature Extraction phase is necessary for removing noise and gathering features respectively.

Keywords: music, mood, features, classification

Procedia PDF Downloads 408
280 Audio-Visual Aids and the Secondary School Teaching

Authors: Shrikrishna Mishra, Badri Yadav

Abstract:

In this complex society of today where experiences are innumerable and varied, it is not at all possible to present every situation in its original colors hence the opportunities for learning by actual experiences always are not at all possible. It is only through the use of proper audio visual aids that the life situation can be trough in the class room by an enlightened teacher in their simplest form and representing the original to the highest point of similarity which is totally absent in the verbal or lecture method. In the presence of audio aids, the attention is attracted interest roused and suitable atmosphere for proper understanding is automatically created, but in the existing traditional method greater efforts are to be made in order to achieve the aforesaid essential requisite. Inspire of the best and sincere efforts on the side of the teacher the net effect as regards understanding or learning in general is quite negligible.

Keywords: Audio-Visual Aids, the secondary school teaching, complex society, audio

Procedia PDF Downloads 407
279 Effective Parameter Selection for Audio-Based Music Mood Classification for Christian Kokborok Song: A Regression-Based Approach

Authors: Sanchali Das, Swapan Debbarma

Abstract:

Music mood classification is developing in both the areas of music information retrieval (MIR) and natural language processing (NLP). Some of the Indian languages like Hindi English etc. have considerable exposure in MIR. But research in mood classification in regional language is very less. In this paper, powerful audio based feature for Kokborok Christian song is identified and mood classification task has been performed. Kokborok is an Indo-Burman language especially spoken in the northeastern part of India and also some other countries like Bangladesh, Myanmar etc. For performing audio-based classification task, useful audio features are taken out by jMIR software. There are some standard audio parameters are there for the audio-based task but as known to all that every language has its unique characteristics. So here, the most significant features which are the best fit for the database of Kokborok song is analysed. The regression-based model is used to find out the independent parameters that act as a predictor and predicts the dependencies of parameters and shows how it will impact on overall classification result. For classification WEKA 3.5 is used, and selected parameters create a classification model. And another model is developed by using all the standard audio features that are used by most of the researcher. In this experiment, the essential parameters that are responsible for effective audio based mood classification and parameters that do not significantly change for each of the Christian Kokborok songs are analysed, and a comparison is also shown between the two above model.

Keywords: Christian Kokborok song, mood classification, music information retrieval, regression

Procedia PDF Downloads 114
278 A Study on the Improvement of Mobile Device Call Buzz Noise Caused by Audio Frequency Ground Bounce

Authors: Jangje Park, So Young Kim

Abstract:

The market demand for audio quality in mobile devices continues to increase, and audible buzz noise generated in time division communication is a chronic problem that goes against the market demand. In the case of time-division type communication, the RF Power Amplifier (RF PA) is driven at the audio frequency cycle, and it has various influences on the audio signal. In this paper, we measured the ground bounce noise generated by the peak current flowing through the ground network in the RF PA with the audio frequency; it was confirmed that the noise is the cause of the audible buzz noise during a call. In addition, a grounding method of the microphone device that can improve the buzzing noise was proposed. Considering that the level of the audio signal generated by the microphone device is -38dBV based on 94dB Sound Pressure Level(SPL), even ground bounce noise of several hundred uV will fall within the range of audible noise if it is induced by the audio amplifier. Through the grounding method of the microphone device proposed in this paper, it was confirmed that the audible buzz noise power density at the RF PA driving frequency was improved by more than 5dB under the conditions of the Printed Circuit Board (PCB) used in the experiment. A fundamental improvement method was presented regarding the buzzing noise during a mobile phone call.

Keywords: audio frequency, buzz noise, ground bounce, microphone grounding

Procedia PDF Downloads 12
277 Audio-Visual Entrainment and Acupressure Therapy for Insomnia

Authors: Mariya Yeldhos, G. Hema, Sowmya Narayanan, L. Dhiviyalakshmi

Abstract:

Insomnia is one of the most prevalent psychological disorders worldwide. Some of the deficiencies of the current treatments of insomnia are: side effects in the case of sleeping pills and high costs in the case of psychotherapeutic treatment. In this paper, we propose a device which provides a combination of audio visual entrainment and acupressure based compression therapy for insomnia. This device provides drug-free treatment of insomnia through a user friendly and portable device that enables relaxation of brain and muscles, with certain advantages such as low cost, and wide accessibility to a large number of people. Tools adapted towards the treatment of insomnia: -Audio -Continuous exposure to binaural beats of a particular frequency of audible range -Visual -Flash of LED light -Acupressure points -GB-20 -GV-16 -B-10

Keywords: insomnia, acupressure, entrainment, audio-visual entrainment

Procedia PDF Downloads 345
276 Perception of Value Affecting Engagement Through Online Audio Communication

Authors: Apipol Penkitti

Abstract:

The new normal or a new way of life stemmed from the COVID-19 outbreak, gave rise to a new form of social media: audio-based social platforms (ABSPs), known as Clubhouse, Twitter space, and Facebook live audio room. These platforms, on which audio-based communication is featured, became popular in a short span of time. The objective of the research study is to understand ABSPs users’ behaviors in Thailand. The study, in which functional attitude theory, uses and gratifications theory, and social influence theory are referred to, is conducted through consumer perceived utilitarian, hedonic, and social value that affect engagement. This research study is mixed method paradigm, utilizing Model of Triangulation as its framework. The data acquisition is proceeded through questionnaires from a sample of 384 male, female and LGBTQA+ individuals aged 25 - 34 who, from various occupations, have used audio-based social platform applications. This research study employs the structural equation modeling to analyze the relationships between variables, and it uses the semi - structured interviewing to comprehend the rationality of the variables in the study. The study found that hedonic value directly affects engagement.

Keywords: audio based social platform, engagement, hedonic, perceived value, social, utilitarian

Procedia PDF Downloads 29
275 A Non-Parametric Based Mapping Algorithm for Use in Audio Fingerprinting

Authors: Analise Borg, Paul Micallef

Abstract:

Over the past few years, the online multimedia collection has grown at a fast pace. Several companies showed interest to study the different ways to organize the amount of audio information without the need of human intervention to generate metadata. In the past few years, many applications have emerged on the market which are capable of identifying a piece of music in a short time. Different audio effects and degradation make it much harder to identify the unknown piece. In this paper, an audio fingerprinting system which makes use of a non-parametric based algorithm is presented. Parametric analysis is also performed using Gaussian Mixture Models (GMMs). The feature extraction methods employed are the Mel Spectrum Coefficients and the MPEG-7 basic descriptors. Bin numbers replaced the extracted feature coefficients during the non-parametric modelling. The results show that non-parametric analysis offer potential results as the ones mentioned in the literature.

Keywords: audio fingerprinting, mapping algorithm, Gaussian Mixture Models, MFCC, MPEG-7

Procedia PDF Downloads 352
274 Digital Recording System Identification Based on Audio File

Authors: Michel Kulhandjian, Dimitris A. Pados

Abstract:

The objective of this work is to develop a theoretical framework for reliable digital recording system identification from digital audio files alone, for forensic purposes. A digital recording system consists of a microphone and a digital sound processing card. We view the cascade as a system of unknown transfer function. We expect same manufacturer and model microphone-sound card combinations to have very similar/near identical transfer functions, bar any unique manufacturing defect. Input voice (or other) signals are modeled as non-stationary processes. The technical problem under consideration becomes blind deconvolution with non-stationary inputs as it manifests itself in the specific application of digital audio recording equipment classification.

Keywords: blind system identification, audio fingerprinting, blind deconvolution, blind dereverberation

Procedia PDF Downloads 202
273 Audio-Visual Recognition Based on Effective Model and Distillation

Authors: Heng Yang, Tao Luo, Yakun Zhang, Kai Wang, Wei Qin, Liang Xie, Ye Yan, Erwei Yin

Abstract:

Recent years have seen that audio-visual recognition has shown great potential in a strong noise environment. The existing method of audio-visual recognition has explored methods with ResNet and feature fusion. However, on the one hand, ResNet always occupies a large amount of memory resources, restricting the application in engineering. On the other hand, the feature merging also brings some interferences in a high noise environment. In order to solve the problems, we proposed an effective framework with bidirectional distillation. At first, in consideration of the good performance in extracting of features, we chose the light model, Efficientnet as our extractor of spatial features. Secondly, self-distillation was applied to learn more information from raw data. Finally, we proposed a bidirectional distillation in decision-level fusion. In more detail, our experimental results are based on a multi-model dataset from 24 volunteers. Eventually, the lipreading accuracy of our framework was increased by 2.3% compared with existing systems, and our framework made progress in audio-visual fusion in a high noise environment compared with the system of audio recognition without visual.

Keywords: lipreading, audio-visual, Efficientnet, distillation

Procedia PDF Downloads 27
272 Satisfaction of Distance Education University Students with the Use of Audio Media as a Medium of Instruction: The Case of Mountains of the Moon University in Uganda

Authors: Mark Kaahwa, Chang Zhu, Moses Muhumuza

Abstract:

This study investigates the satisfaction of distance education university students (DEUS) with the use of audio media as a medium of instruction. Studying students’ satisfaction is vital because it shows whether learners are comfortable with a certain instructional strategy or not. Although previous studies have investigated the use of audio media, the satisfaction of students with an instructional strategy that combines radio teaching and podcasts as an independent teaching strategy has not been fully investigated. In this study, all lectures were delivered through the radio and students had no direct contact with their instructors. No modules or any other material in form of text were given to the students. They instead, revised the taught content by listening to podcasts saved on their mobile electronic gadgets. Prior to data collection, DEUS received orientation through workshops on how to use audio media in distance education. To achieve objectives of the study, a survey, naturalistic observations and face-to-face interviews were used to collect data from a sample of 211 undergraduate and graduate students. Findings indicate that there was no statistically significant difference in the levels of satisfaction between male and female students. The results from post hoc analysis show that there is a statistically significant difference in the levels of satisfaction regarding the use of audio media between diploma and graduate students. Diploma students are more satisfied compared to their graduate counterparts. T-test results reveal that there was no statistically significant difference in the general satisfaction with audio media between rural and urban-based students. And ANOVA results indicate that there is no statistically significant difference in the levels of satisfaction with the use of audio media across age groups. Furthermore, results from observations and interviews reveal that DEUS found learning using audio media a pleasurable medium of instruction. This is an indication that audio media can be considered as an instructional strategy on its own merit.

Keywords: audio media, distance education, distance education university students, medium of instruction, satisfaction

Procedia PDF Downloads 58
271 Robust and Transparent Spread Spectrum Audio Watermarking

Authors: Ali Akbar Attari, Ali Asghar Beheshti Shirazi

Abstract:

In this paper, we propose a blind and robust audio watermarking scheme based on spread spectrum in Discrete Wavelet Transform (DWT) domain. Watermarks are embedded in the low-frequency coefficients, which is less audible. The key idea is dividing the audio signal into small frames, and magnitude of the 6th level of DWT approximation coefficients is modifying based upon the Direct Sequence Spread Spectrum (DSSS) technique. Also, the psychoacoustic model for enhancing in imperceptibility, as well as Savitsky-Golay filter for increasing accuracy in extraction, is used. The experimental results illustrate high robustness against most common attacks, i.e. Gaussian noise addition, Low pass filter, Resampling, Requantizing, MP3 compression, without significant perceptual distortion (ODG is higher than -1). The proposed scheme has about 83 bps data payload.

Keywords: audio watermarking, spread spectrum, discrete wavelet transform, psychoacoustic, Savitsky-Golay filter

Procedia PDF Downloads 132
270 Multi-Level Pulse Width Modulation to Boost the Power Efficiency of Switching Amplifiers for Analog Signals with Very High Crest Factor

Authors: Jan Doutreloigne

Abstract:

The main goal of this paper is to develop a switching amplifier with optimized power efficiency for analog signals with a very high crest factor such as audio or DSL signals. Theoretical calculations show that a switching amplifier architecture based on multi-level pulse width modulation outperforms all other types of linear or switching amplifiers in that respect. Simulations on a 2 W multi-level switching audio amplifier, designed in a 50 V 0.35 mm IC technology, confirm its superior performance in terms of power efficiency. A real silicon implementation of this audio amplifier design is currently underway to provide experimental validation.

Keywords: audio amplifier, multi-level switching amplifier, power efficiency, pulse width modulation, PWM, self-oscillating amplifier

Procedia PDF Downloads 261
269 Implementation and Performance Analysis of Data Encryption Standard and RSA Algorithm with Image Steganography and Audio Steganography

Authors: S. C. Sharma, Ankit Gambhir, Rajeev Arya

Abstract:

In today’s era data security is an important concern and most demanding issues because it is essential for people using online banking, e-shopping, reservations etc. The two major techniques that are used for secure communication are Cryptography and Steganography. Cryptographic algorithms scramble the data so that intruder will not able to retrieve it; however steganography covers that data in some cover file so that presence of communication is hidden. This paper presents the implementation of Ron Rivest, Adi Shamir, and Leonard Adleman (RSA) Algorithm with Image and Audio Steganography and Data Encryption Standard (DES) Algorithm with Image and Audio Steganography. The coding for both the algorithms have been done using MATLAB and its observed that these techniques performed better than individual techniques. The risk of unauthorized access is alleviated up to a certain extent by using these techniques. These techniques could be used in Banks, RAW agencies etc, where highly confidential data is transferred. Finally, the comparisons of such two techniques are also given in tabular forms.

Keywords: audio steganography, data security, DES, image steganography, intruder, RSA, steganography

Procedia PDF Downloads 211
268 Agricultural Education by Media in Yogyakarta, Indonesia

Authors: Retno Dwi Wahyuningrum, Sunarru Samsi Hariadi

Abstract:

Education in agriculture is very significant; in a way that it can support farmers to improve their business. This can be done through certain media, such as printed, audio, and audio-visual media. To find out the effects of the media toward the knowledge, attitude, and motivation of farmers in order to adopt innovation, the study was conducted on 342 farmers, randomly selected from 12 farmer-groups, in the districts of Sleman and Bantul, Special Region of Yogyakarta Province. The study started from October 2014 to November 2015 by interviewing the respondents using a questionnaire which included 20 questions on knowledge, 20 questions on attitude, and 20 questions on adopting motivation. The data for the attitude and the adopting motivation were processed into Likert scale, then it was tested for validity and reliability. Differences in the levels of knowledge, attitude, and motivation were tested based on percentage of average score intervals of them and categorized into five interpretation levels. The results show that printed, audio, and audio-visual media give different impacts to the farmers. First, all media make farmers very aware to agricultural innovation, but the highest percentage is on theatrical play. Second, the most effective media to raise the attitude is interactive dialogue on Radio. Finally, printed media, especially comic, is the most effective way to improve the adopting motivation of farmers.

Keywords: agricultural education, printed media, audio media, audio-visual media, farmer knowledge, farmer attitude, farmer adopting motivation

Procedia PDF Downloads 139
267 On Musical Information Geometry with Applications to Sonified Image Analysis

Authors: Shannon Steinmetz, Ellen Gethner

Abstract:

In this paper, a theoretical foundation is developed for patterned segmentation of audio using the geometry of music and statistical manifold. We demonstrate image content clustering using conic space sonification. The algorithm takes a geodesic curve as a model estimator of the three-parameter Gamma distribution. The random variable is parameterized by musical centricity and centric velocity. Model parameters predict audio segmentation in the form of duration and frame count based on the likelihood of musical geometry transition. We provide an example using a database of randomly selected images, resulting in statistically significant clusters of similar image content.

Keywords: sonification, musical information geometry, image, content extraction, automated quantification, audio segmentation, pattern recognition

Procedia PDF Downloads 117
266 Carrier Communication through Power Lines

Authors: Pavuluri Gopikrishna, B. Neelima

Abstract:

Power line carrier communication means audio power transmission via power line and reception of the amplified audio power at the receiver as in the form of speaker output signal using power line as the channel medium. The main objective of this suggested work is to transmit our message signal after frequency modulation by the help of FM modulator IC LM565 which gives output proportional to the input voltage of the input message signal. And this audio power is received from the power line by the help of isolation circuit and demodulated from IC LM565 which uses the concept of the PLL and produces FM demodulated signal to the listener. Message signal will be transmitted over the carrier signal that will be generated from the FM modulator IC LM565. Using this message signal will not damage because of no direct contact of message signal from the power line, but noise can disturb our information.

Keywords: amplification, fm demodulator ic 565, fm modulator ic 565, phase locked loop, power isolation

Procedia PDF Downloads 454
265 The Implication of News Segments and Movies for Enhancing Listening Comprehension of Language Learners

Authors: Taher Bahrani

Abstract:

Armed with technological development, the present study aimed at gauging the effectiveness of exposure to news and movies as two types of audio-visual programs on improving language learners’ listening comprehension at the intermediate level. To this end, a listening comprehension test was administered to 108 language learners and finally 60 language learners were selected as intermediate language learners and randomly divided into group one and group two. During the experiment, group one participants had exposure to audio-visual news stories to work on in-and out-side the classroom. On the contrary, the participants in group two had only exposure to a sample selected utterances extracted from different kinds of movies. At the end of the experiment, both groups took another sample listening test to find out to what extent the participants in each group could enhance their listening comprehension. The results obtained from the post-test were indicative of the fact that the participants who had exposure to news outperformed the participants who had exposure to movies. The findings of the present research seem to indicate that the language input embedded in the type of audio-visual programs which language learners are exposed to is more important than the amount of exposure.

Keywords: audio-visual news, movies, listening comprehension, intermediate level

Procedia PDF Downloads 310
264 Atomic Decomposition Audio Data Compression and Denoising Using Sparse Dictionary Feature Learning

Authors: T. Bryan , V. Kepuska, I. Kostnaic

Abstract:

A method of data compression and denoising is introduced that is based on atomic decomposition of audio data using “basis vectors” that are learned from the audio data itself. The basis vectors are shown to have higher data compression and better signal-to-noise enhancement than the Gabor and gammatone “seed atoms” that were used to generate them. The basis vectors are the input weights of a Sparse AutoEncoder (SAE) that is trained using “envelope samples” of windowed segments of the audio data. The envelope samples are extracted from the audio data by performing atomic decomposition with Gabor or gammatone seed atoms. This process identifies segments of audio data that are locally coherent with the seed atoms. Envelope samples are extracted by identifying locally coherent audio data segments with Gabor or gammatone seed atoms, found by matching pursuit. The envelope samples are formed by taking the kronecker products of the atomic envelopes with the locally coherent data segments. Oracle signal-to-noise ratio (SNR) verses data compression curves are generated for the seed atoms as well as the basis vectors learned from Gabor and gammatone seed atoms. SNR data compression curves are generated for speech signals as well as early American music recordings. The basis vectors are shown to have higher denoising capability for data compression rates ranging from 90% to 99.84% for speech as well as music. Envelope samples are displayed as images by folding the time series into column vectors. This display method is used to compare of the output of the SAE with the envelope samples that produced them. The basis vectors are also displayed as images. Sparsity is shown to play an important role in producing the highest denoising basis vectors.

Keywords: sparse dictionary learning, autoencoder, sparse autoencoder, basis vectors, atomic decomposition, envelope sampling, envelope samples, Gabor, gammatone, matching pursuit

Procedia PDF Downloads 185
263 Mapping the Sonic Spectrum of Traditional Music and Instruments Used in Malaysian Kavadi Rituals

Authors: Ainolnaim Azizol, Valerie Ross

Abstract:

Music is as old as mankind and rituals using music such as Kavadi have been associated with social, cultural, and spiritual practices in many traditional and modern societies. Recent literature has provided scientific evidence that music affects psychological and physical changes through stimulation of brainwave. Despite such advances, the scientific study of the sonic qualities peculiar to traditional instruments and how it impacts on ritualistic activities is still lacking. This study addresses one such phenomenon. Devotees in Kavadi rituals are known to be in a state of trance state and do not experience pain nor suffer injury despite the hundreds of needles pierced through their skins. Although scientists have sought to understand how this is possible, lesser is known about the music that is used to prepare devotees to enter into the trance state. This study fills this gap of knowledge by providing scientific evidence through the identification and mapping of the sonic spectrum or sound fingerprint of the instruments and the repertoire used in these ritualistic forms in their ethnographic environment and in audio-controlled situations. The objectives are to identify and categorize the different types of traditional music used in Kavadi rituals; to record, transcribe and digitally score the musical repertoire used in the oral tradition of Kavadi rituals; to map the sonic spectrum of ritual music using spectromography and advanced music analytical software a mixed methodology will be used. This comprises ethnographic field studies using interviews, participant observation, audio-video recordings and audio-methodology using spectromography and advanced audio-technology for sonic mapping and the transcription of audio recordings into digital scores.

Keywords: sonic, traditional, ritual, Kavadi, music

Procedia PDF Downloads 172
262 Illumina MiSeq Sequencing for Bacteria Identification on Audio-Visual Materials

Authors: Tereza Branyšová, Martina Kračmarová, Kateřina Demnerová, Michal Ďurovič, Hana Stiborová

Abstract:

Microbial deterioration threatens all objects of cultural heritage, including audio-visual materials. Fungi are commonly known to be the main factor in audio-visual material deterioration. However, although being neglected, bacteria also play a significant role. In addition to microbial contamination of materials, it is also essential to analyse air as a possible contamination source. This work aims to identify bacterial species in the archives of the Czech Republic that occur on audio-visual materials as well as in the air in the archives. For sampling purposes, the smears from the materials were taken by sterile polyurethane sponges, and the air was collected using a MAS-100 aeroscope. Metagenomic DNA from all collected samples was immediately isolated and stored at -20 °C. DNA library for the 16S rRNA gene was prepared using two-step PCR and specific primers and the concentration step was included due to meagre yields of the DNA. After that, the samples were sent to the University of Fairbanks, Alaska, for Illumina MiSeq sequencing. Subsequently, the analysis of the sequences was conducted in R software. The obtained sequences were assigned to the corresponding bacterial species using the DADA2 package. The impact of air contamination and the impact of different photosensitive layers that audio-visual materials were made of, such as gelatine, albumen, and collodion, were evaluated. As a next step, we will take a deeper focus on air contamination. We will select an appropriate culture-dependent approach along with a culture-independent approach to observe a metabolically active species in the air. Acknowledgment: This project is supported by grant no. DG18P02OVV062 of the Ministry of Culture of the Czech Republic.

Keywords: cultural heritage, Illumina MiSeq, metagenomics, microbial identification

Procedia PDF Downloads 65
261 The Audio-Visual and Syntactic Priming Effect on Specific Language Impairment and Gender in Modern Standard Arabic

Authors: Mohammad Al-Dawoody

Abstract:

This study aims at exploring if priming is affected by gender in Modern Standard Arabic and if it is restricted solely to subjects with no specific language impairment (SLI). The sample in this study consists of 74 subjects, between the ages of 11;1 and 11;10, distributed into (a) 2 SLI experimental groups of 38 subjects divided into two gender groups of 18 females and 20 males and (b) 2 non-SLI control groups of 36 subjects divided into two gender groups of 17 females and 19 males. Employing a mixed research design, the researcher conducted this study within the framework of the relevance theory (RT) whose main assumption is that human beings are endowed with a biological ability to magnify the relevance of the incoming stimuli. Each of the four groups was given two different priming stimuli: audio-visual priming (T1) and syntactic priming (T2). The results showed that the priming effect was sheer distinct among SLI participants especially when retrieving typical responses (TR) in T1 and T2 with slight superiority of males over females. The results also revealed that non-SLI females showed stronger original response (OR) priming in T1 than males and that non-SLI males in T2 excelled in OR priming than females. Furthermore, the results suggested that the audio-visual priming has a stronger effect on SLI females than non-SLI females and that syntactic priming seems to have the same effect on the two groups (non-SLI and SLI females). The conclusion is that the priming effect varies according to gender and is not confined merely to non-SLI subjects.

Keywords: specific language impairment, relevance theory, audio-visual priming, syntactic priming, modern standard Arabic

Procedia PDF Downloads 94
260 Multimodal Convolutional Neural Network for Musical Instrument Recognition

Authors: Yagya Raj Pandeya, Joonwhoan Lee

Abstract:

The dynamic behavior of music and video makes it difficult to evaluate musical instrument playing in a video by computer system. Any television or film video clip with music information are rich sources for analyzing musical instruments using modern machine learning technologies. In this research, we integrate the audio and video information sources using convolutional neural network (CNN) and pass network learned features through recurrent neural network (RNN) to preserve the dynamic behaviors of audio and video. We use different pre-trained CNN for music and video feature extraction and then fine tune each model. The music network use 2D convolutional network and video network use 3D convolution (C3D). Finally, we concatenate each music and video feature by preserving the time varying features. The long short term memory (LSTM) network is used for long-term dynamic feature characterization and then use late fusion with generalized mean. The proposed network performs better performance to recognize the musical instrument using audio-video multimodal neural network.

Keywords: multimodal, 3D convolution, music-video feature extraction, generalized mean

Procedia PDF Downloads 129
259 A Comparison of Proxemics and Postural Head Movements during Pop Music versus Matched Music Videos

Authors: Harry J. Witchel, James Ackah, Carlos P. Santos, Nachiappan Chockalingam, Carina E. I. Westling

Abstract:

Introduction: Proxemics is the study of how people perceive and use space. It is commonly proposed that when people like or engage with a person/object, they will move slightly closer to it, often quite subtly and subconsciously. Music videos are known to add entertainment value to a pop song. Our hypothesis was that by adding appropriately matched video to a pop song, it would lead to a net approach of the head to the monitor screen compared to simply listening to an audio-only version of the song. Methods: We presented to 27 participants (ages 21.00 ± 2.89, 15 female) seated in front of 47.5 x 27 cm monitor two musical stimuli in a counterbalanced order; all stimuli were based on music videos by the band OK Go: Here It Goes Again (HIGA, boredom ratings (0-100) = 15.00 ± 4.76, mean ± SEM, standard-error-of-the-mean) and Do What You Want (DWYW, boredom ratings = 23.93 ± 5.98), which did not differ in boredom elicited (P = 0.21, rank-sum test). Each participant experienced each song only once, and one song (counterbalanced) as audio-only versus the other song as a music video. The movement was measured by video-tracking using Kinovea 0.8, based on recording from a lateral aspect; before beginning, each participant had a reflective motion tracking marker placed on the outer canthus of the left eye. Analysis of the Kinovea X-Y coordinate output in comma-separated-variables format was performed in Matlab, as were non-parametric statistical tests. Results: We found that the audio-only stimuli (combined for both HIGA and DWYW, mean ± SEM, 35.71 ± 5.36) were significantly more boring than the music video versions (19.46 ± 3.83, P = 0.0066 Wilcoxon Signed Rank Test (WSRT), Cohen's d = 0.658, N = 28). We also found that participants' heads moved around twice as much during the audio-only versions (speed = 0.590 ± 0.095 mm/sec) compared to the video versions (0.301 ± 0.063 mm/sec, P = 0.00077, WSRT). However, the participants' mean head-to-screen distances were not detectably smaller (i.e. head closer to the screen) during the music videos (74.4 ± 1.8 cm) compared to the audio-only stimuli (73.9 ± 1.8 cm, P = 0.37, WSRT). If anything, during the audio-only condition, they were slightly closer. Interestingly, the ranges of the head-to-screen distances were smaller during the music video (8.6 ± 1.4 cm) compared to the audio-only (12.9 ± 1.7 cm, P = 0.0057, WSRT), the standard deviations were also smaller (P = 0.0027, WSRT), and their heads were held 7 mm higher (video 116.1 ± 0.8 vs. audio-only 116.8 ± 0.8 cm above floor, P = 0.049, WSRT). Discussion: As predicted, sitting and listening to experimenter-selected pop music was more boring than when the music was accompanied by a matched, professionally-made video. However, we did not find that the proxemics of the situation led to approaching the screen. Instead, adding video led to efforts to control the head to a more central and upright viewing position and to suppress head fidgeting.

Keywords: boredom, engagement, music videos, posture, proxemics

Procedia PDF Downloads 62
258 Method Comprising One to One Web Based Real Time Communications

Authors: Lata Kiran Dey, Rajendra Kumar, Biren Karmakar

Abstract:

Web Real Time Communications is a collection of standards, protocols, which provides real-time communications capabilities between web browsers and devices. This paper outlines the design and further implementation of web real-time communications on secure web applications having audio and video call capabilities. This proposed application may put up a system that will be able to work over both desktops as well as the mobile browser. Though, WebRTC also gives a set of JavaScript standard RTC APIs, which primarily works over the real-time communication framework. This helps to build a suitable communication application, which enables the audio, video, and message transfer in between the today’s modern browsers having WebRTC support.

Keywords: WebRTC, SIP, RTC, JavaScript, SRTP, secure web sockets, browser

Procedia PDF Downloads 8