Search results for: audio watermarking
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 426

336 Security System for Safe Transmission of Medical Image

Authors: Mohammed Jamal Al-Mansor, Kok Beng Gan

Abstract:

This paper develops an optimized method for embedding a payload in medical images using genetic optimization. The goal is to keep the region of interest from being distorted by the watermark. With the developed system, experts no longer need to define the region of interest manually, because the system applies genetic optimization to select the parts of the image that can carry the watermark with the least distortion. The experimental results confirm that genetic-based optimization is useful for performing steganography with a lower mean square error percentage.

Keywords: AES, DWT, genetic algorithm, watermarking

Procedia PDF Downloads 382
335 1D Convolutional Networks to Compute Mel-Spectrogram, Chromagram, and Cochleogram for Audio Networks

Authors: Elias Nemer, Greg Vines

Abstract:

Time-frequency transformations and spectral representations of audio signals are commonly used in various machine learning applications. Training networks on frequency features such as the Mel-Spectrogram or Cochleogram has been proven more effective and convenient than training on time samples. In practical realizations, these features are created on a different processor and/or pre-computed and stored on disk, which requires additional effort and makes it difficult to experiment with different features. In this paper, we provide a PyTorch framework for creating various spectral features as well as time-frequency transformations and time-domain filter-banks using the built-in trainable conv1d() layer. This allows these features to be computed on the fly as part of a larger network and enables easier experimentation with various combinations and parameters. Our work extends the work in the literature developed for that end: first, by adding more of these features, and second, by allowing the possibility of either starting from initialized kernels or training them from random values. The code is written as a template of classes and scripts that users may integrate into their own PyTorch classes or simply use as is and add more layers for various applications.
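The idea of computing a spectral feature with a convolution layer can be illustrated with a short sketch. This is not the authors' framework: it uses plain Python instead of PyTorch so that it runs without dependencies, and the names `dft_kernels`, `conv1d`, and `magnitude_spectrogram` are hypothetical. It assumes fixed cosine and sine kernels (the "initialized kernels" case) and shows that a strided 1-D convolution with those kernels yields a magnitude spectrogram.

```python
import math

def dft_kernels(n_fft):
    """Build one cosine and one sine kernel per frequency bin 0..n_fft//2."""
    bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    sin_k = [[-math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    return cos_k, sin_k

def conv1d(signal, kernel, stride):
    """Valid-mode strided cross-correlation, the operation a conv1d layer applies."""
    width = len(kernel)
    return [sum(signal[start + i] * kernel[i] for i in range(width))
            for start in range(0, len(signal) - width + 1, stride)]

def magnitude_spectrogram(signal, n_fft=16, hop=8):
    """Return spec[bin][frame], the magnitude of the short-time DFT."""
    cos_k, sin_k = dft_kernels(n_fft)
    spec = []
    for c, s in zip(cos_k, sin_k):
        real = conv1d(signal, c, hop)   # real part of each frame's DFT bin
        imag = conv1d(signal, s, hop)   # imaginary part of each frame's DFT bin
        spec.append([math.hypot(r, i) for r, i in zip(real, imag)])
    return spec

# Toy input (an assumption): a sinusoid with exactly 2 cycles per 16-sample frame.
tone = [math.sin(2 * math.pi * 2 * n / 16) for n in range(64)]
spec = magnitude_spectrogram(tone)
peak_bin = max(range(len(spec)), key=lambda k: spec[k][0])
```

In a trainable setting the same kernels would be the initial weights of the convolution layer, so they could either stay fixed or be updated during training.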

Keywords: neural networks, Mel-Spectrogram, chromagram, cochleogram, discrete Fourier transform, PyTorch conv1d()

Procedia PDF Downloads 196
334 New Methods to Acquire Grammatical Skills in A Foreign Language

Authors: Indu ray

Abstract:

In today’s digital world, the internet is already flooded with information on how to master grammar in a foreign language. It is well known that one cannot master a language without grammar. Grammar is the backbone of any language. Without grammar there would be no structure to help you speak, write, listen, or read. Successful communication is only possible if the form and function of linguistic utterances are firmly related to one another. Grammar has its own rules of use that make language easier to understand. Like a tool, grammar formulates our thoughts and knowledge in a meaningful way. Every language has its own grammar. With grammar, we can quickly analyze whether there is any action in a text and when it takes place (present, past, future). Knowledge of grammar is an important prerequisite for mastering a foreign language. What is most important is how teachers can make grammar lessons more interesting for students and thus promote grammar skills more successfully. In this paper, we discuss a few important methods: interactive grammar exercises (IGE) between students, interactive grammar exercises between student and teacher, the grammar translation method, the audio-visual method, the deductive method, and the inductive method. This paper is divided into two sections. In the first part, brief definitions and principles of these approaches are provided. Then the possibility and the case for combining these approaches are analyzed. In the last section of the paper, I present the results of a survey conducted at my university on a few methods to quickly learn grammar in a foreign language. We divided grammatical skills into six parts (1. grammatical competence, 2. speaking skills, 3. phonology, 4. syntax and semantics, 5. rules, 6. cognitive function) and conducted a survey among students.
From our survey results, we can observe that phonology, speaking ability, syntax, and semantics can be improved by the inductive method, the audio-visual method, and the grammar translation method, while for grammar rules and cognitive functions we should choose the IGE (teacher-student) method and the IGE (pupil-pupil) method. The study’s findings revealed that teachers’ delivery methods should be blended according to the content of the grammar being taught.

Keywords: innovative method, grammatical skills, audio-visual, translation

Procedia PDF Downloads 40
333 A Peer-Produced Community of Learning: The Case of Second-Year Algerian Masters Students at a Distance

Authors: Nihad Alem

Abstract:

Nowadays, distance learning (DL) is widely perceived as a reformed type of education that takes advantage of technology to give more appealing opportunities, especially for learners whose life conditions impede their attendance in regular classrooms. However, creating an interactional environment for students to expand their learning community and alleviate feelings of loneliness and isolation should receive more attention when designing a distance learning course. This research aims to explore whether audio/video peer learning can offer pedagogical add-ons to Algerian distance learners and what the pros and cons of its application are as an educational experience in a synchronous environment mediated by Skype. Data were collected using video recordings of six sessions, reflective logs, and in-depth semi-structured interviews and will be analyzed by qualitatively identifying and measuring the three constitutional elements of the educational experience of peer learning, namely social presence, cognitive presence, and facilitation presence, using a modified community of inquiry coding template. The findings from this study will provide recommendations for an effective peer learning educational experience using the facilitation presence concept.

Keywords: audio/visual peer learning, community of inquiry, distance learning, facilitation presence

Procedia PDF Downloads 110
332 Heuristic Classification of Hydrophone Recordings

Authors: Daniel M. Wolff, Patricia Gray, Rafael de la Parra Venegas

Abstract:

An unsupervised machine listening system is constructed and applied to a dataset of 17,195 30-second marine hydrophone recordings. The system is then heuristically supplemented with anecdotal listening, contextual recording information, and supervised learning techniques to reduce the number of false positives. Features for classification are assembled by extracting the following data from each of the audio files: the spectral centroid, root-mean-squared values for each frequency band of a 10-octave filter bank, and mel-frequency cepstral coefficients in 5-second frames. In this way both time- and frequency-domain information are contained in the features to be passed to a clustering algorithm. Classification is performed using the k-means algorithm and then a k-nearest neighbors search. Different values of k are experimented with, in addition to different combinations of the available feature sets. Hypothesized class labels are 'primarily anthrophony' and 'primarily biophony', where the best class result conforming to the former label has 104 members after heuristic pruning. This demonstrates how a large audio dataset has been made more tractable with machine learning techniques, forming the foundation of a framework designed to acoustically monitor and gauge biological and anthropogenic activity in a marine environment.
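The two-stage procedure described above (k-means clustering followed by a k-nearest neighbors search) can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the two-dimensional feature vectors are made-up stand-ins for normalized values such as spectral centroid and band RMS, the initialization from the first k points is an assumption chosen for reproducibility, and the function names are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with deterministic initialization from the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

def nearest_neighbours(points, query, k):
    """Indices of the k points closest to the query vector."""
    return sorted(range(len(points)), key=lambda i: math.dist(points[i], query))[:k]

# Hypothetical per-recording features: (normalized spectral centroid, normalized RMS).
features = [(0.10, 0.20), (0.90, 0.80), (0.15, 0.25),
            (0.85, 0.90), (0.20, 0.10), (0.80, 0.70)]
centroids, labels = kmeans(features, k=2)
closest_to_second_cluster = nearest_neighbours(features, centroids[1], k=2)
```

Searching for the recordings nearest to a cluster centroid is one way to pick representative files for the kind of anecdotal listening the abstract mentions.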

Keywords: anthrophony, hydrophone, k-means, machine learning

Procedia PDF Downloads 134
331 Boundary Alert System for Powered Wheelchair in Confined Area Training

Authors: Tsoi Kim Ming, Yu King Pong

Abstract:

Background: With a powered wheelchair, patients can travel more easily and conveniently. However, some patients suffer from other difficulties, such as visual impairment, cognitive disorder, or psychological issues, which make them unable to control a powered wheelchair safely. Purpose: Those patients are therefore required to complete comprehensive driving training with therapists in a confined area that simulates narrow paths in daily life. During the training, therapists give a series of driving instructions to patients and may not notice when a patient crosses the boundary of the area. To facilitate the training, a device is needed that warns patients during training. Method: We adopt a LIDAR for distance sensing, positioned at the center of the confined area. We then program the LIDAR with linear geometry to remember each side of the area. The LIDAR senses the location of the wheelchair continuously. Once the wheelchair is driven out of the boundary, an audio alert is given to the patient. Result: Patients can focus on the particular driving situation that triggered the audio alert during training and learn how to avoid leaving the boundary in a similar situation next time. Conclusion: Rather than relying only on the therapist’s instructions, the LIDAR-based system facilitates powered wheelchair training by prompting patients to pay active attention to the driving situation. After training, they are able to control the powered wheelchair safely when facing difficult and narrow paths in real life.
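The boundary test described in the Method can be sketched as a simple geometric check. The abstract gives no implementation details, so everything here is an assumption: a rectangular training area centered on the sensor, each side stored as a line `a*x + b*y = c`, and wheelchair readings arriving as a distance in meters and an angle in degrees. The function names are hypothetical.

```python
import math

def rectangle_sides(width, height):
    """Sides of a rectangle centered on the sensor, as (a, b, c) with a*x + b*y <= c inside."""
    half_w, half_h = width / 2, height / 2
    return [(1, 0, half_w), (-1, 0, half_w), (0, 1, half_h), (0, -1, half_h)]

def polar_to_xy(distance, angle_deg):
    """Convert a range/bearing reading into Cartesian coordinates."""
    angle = math.radians(angle_deg)
    return distance * math.cos(angle), distance * math.sin(angle)

def crossed_sides(distance, angle_deg, sides):
    """Indices of the sides that the measured position lies beyond."""
    x, y = polar_to_xy(distance, angle_deg)
    return [i for i, (a, b, c) in enumerate(sides) if a * x + b * y > c]

def should_alert(distance, angle_deg, sides):
    """True when the wheelchair position is outside at least one side."""
    return bool(crossed_sides(distance, angle_deg, sides))

# Hypothetical 4 m x 2 m training area.
sides = rectangle_sides(4.0, 2.0)
inside = should_alert(1.0, 0, sides)      # 1 m to the right of the sensor
outside = should_alert(2.5, 0, sides)     # 2.5 m to the right, past the 2 m side
```

Storing each side as a linear inequality keeps the per-reading check to four multiplications and comparisons, which suits continuous sensing.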

Keywords: PWC, training, rehab, AT

Procedia PDF Downloads 71
330 The Role of Student Culture in Beginning Music Teachers’ Instruction in Urban School Settings

Authors: Kiana Williams

Abstract:

The purpose of this case study was to examine beginning music teachers’ perspectives of cultural relevance in relation to music instruction in urban school settings within a large Southwestern city. Research questions focused on the role of student culture in beginning music teachers’ instruction. Data were collected based on Seidman’s (2013) three interview series, consisting of audio recordings from two semi-structured individual interviews for each participant, a 15-20-minute video recording from each participant teaching in their classroom, and an audio recording of one focus group interview. Participants included three beginning music teachers currently employed in urban schools in a major metropolitan city in the Southern United States. In this study, a teacher was considered a beginning teacher if they had zero to three years of experience teaching music in urban school settings. The results revealed three broad themes related to connectivity and relatability, concerts, and differentiated instruction. Implications for current music educators as well as music teacher educators in higher education are included in this study. Further research should consider examining the effect of culturally relevant pedagogy on student retention in urban school music programs.

Keywords: culture, instruction, music, pedagogy, teacher, urban

Procedia PDF Downloads 109
329 A Qualitative Study on Metacognitive Patterns among High and Low Performance Problem Based on Learning Groups

Authors: Zuhairah Abdul Hadi, Mohd Nazir bin Md. Zabit, Zuriadah Ismail

Abstract:

Metacognition has been empirically shown to be an important element influencing learning outcomes. Expert learners engage in metacognition by monitoring and controlling their thinking, and by listing, considering, and selecting the best strategies to achieve desired goals. Studies have also found that good critical thinkers engage in more metacognition and that people tend to activate more metacognition when solving complex problems. This study extends past studies by performing a qualitative analysis to understand metacognitive patterns among two high-performing and two low-performing groups by carefully examining video and audio records taken during problem-based learning activities. High-performing groups are groups in which the majority of members scored well on the Watson Glaser II Critical Thinking Appraisal (WGCTA II) and academic achievement tests. Low-performing groups are groups in which the majority of members failed to perform well on the two tests. Audio records are transcribed and analyzed using schemas adopted from past studies. Metacognitive statements are analyzed using a three-stage model, and metacognitive patterns are described by context, component, and level for each high- and low-performing group.

Keywords: academic achievement, critical thinking, metacognitive, problem-based learning

Procedia PDF Downloads 254
328 English Pronunciation Materials on TikTok

Authors: Sebastian Leal-Arenas

Abstract:

TikTok’s influence on contemporary society is undeniable. The impact of the mobile app transcends entertainment, as shown by the growing presence of specialized accounts dedicated to providing educational content, particularly as it pertains to language learning. However, the prevailing trend on the platform is vocabulary and grammar acquisition, neglecting a critical component: pronunciation. This study examines English pronunciation materials available on TikTok by taking a comprehensive approach that incorporates established assessment tools, such as the Learning Object Review Instrument and the Framework for Language Learning App Evaluation. Furthermore, novel evaluation categories are introduced to provide a more holistic assessment of these educational resources. 60 English pronunciation videos were part of the analysis. The findings reveal that these audio-visual materials present clear audio bolstered by high-quality video content and automatically generated closed captions. These three components enhance the comprehensibility of the input, making these concise videos valuable assets for language learners. Nevertheless, certain deficiencies are observed, such as the lack of emphasis on specific segments and their relationship with articulators. Improvements and refinements are discussed, as well as their potential utility within the language classroom. This study contributes to the ongoing investigation of multimedia materials used for language teaching and emphasizes the need to adapt pronunciation instruction methods to today’s technology.

Keywords: pronunciation, segments, teaching materials, technology

Procedia PDF Downloads 46
327 Tensor Deep Stacking Neural Networks and Bilinear Mapping Based Speech Emotion Classification Using Facial Electromyography

Authors: P. S. Jagadeesh Kumar, Yang Yung, Wenli Hu

Abstract:

Speech emotion classification is a dominant research field in finding a sturdy and profligate classifier appropriate for different real-life applications. This effort focuses on classifying different emotions from the speech signal using features related to pitch, formants, energy contours, jitter, and shimmer, as well as spectral, perceptual, and temporal features. Tensor deep stacking neural networks were used to examine the factors that influence the classification success rate. Facial electromyography signals were composed of several forms of focuses in a controlled atmosphere by means of audio-visual stimuli. Proficient facial electromyography signals were pre-processed using a moving average filter, and a set of arithmetical features was extracted. Extracted features were mapped into consistent emotions using bilinear mapping. With facial electromyography signals, a database comprising diverse emotions will be exposed with a suitable fine-tuning of features and training data. A success rate of 92% can be attained without increasing the system connivance and the computation time for sorting diverse emotional states.

Keywords: speech emotion classification, tensor deep stacking neural networks, facial electromyography, bilinear mapping, audio-visual stimuli

Procedia PDF Downloads 220
326 Digital Curriculum Preservation Planning, Actions, and Challenges

Authors: Misook Ahn

Abstract:

This study examined the Digital Curriculum Repository (DCR) project initiated at the Defense Language Institute Foreign Language Center (DLIFLC). The purpose of the DCR is to build a centralized curriculum infrastructure, preserve all curriculum materials, and provide academic service to users (faculty, students, or other agencies). The DCR collection includes core language curriculum materials developed by each language school: foreign language textbooks, language survival kits, and audio files currently in or not in use at the schools. All core curriculum materials with audio and video files have been coded, collected, and preserved at the DCR. The DCR website was designed with MS SharePoint for easy accessibility by the DLIFLC’s faculty and students. All metadata for the collected curriculum materials have been input by language, code, year, book type, level, user, version, and current status (in use/not in use). The study documents digital curriculum preservation planning, actions, and challenges, including collecting, coding, collaborating, designing the DCR SharePoint site, and policymaking. DCR survey data were also collected and analyzed for this research. Based on the findings, the study concludes that a mandatory policy for the DCR system and collaboration with school leadership are critical elements of a successful repository system. Sample collected items, metadata, and the DCR SharePoint site are presented in the evaluation section.

Keywords: MS SharePoint, digital preservation, repository, policy

Procedia PDF Downloads 127
325 Working Memory and Audio-Motor Synchronization in Children with Different Degrees of Central Nervous System's Lesions

Authors: Anastasia V. Kovaleva, Alena A. Ryabova, Vladimir N. Kasatkin

Abstract:

Background: The simplest form of entrainment to a sensory (typically auditory) rhythmic stimulus involves perceiving and synchronizing movements with an isochronous beat with one level of periodicity, such as that produced by a metronome. Children with pediatric cancer are usually treated with chemo- and radiotherapy. Psychologists and health professionals report that cognitive and motor abilities decline in cancer patients as a result of such treatment. The purpose of our study was to measure working memory characteristics in association with audio-motor synchronization tasks, which also involve some memory resources, in children with different degrees of central nervous system lesions: posterior fossa tumors, acute lymphoblastic leukemia, and healthy controls. Methods: Our sample consisted of three groups of children: children treated for posterior fossa tumors (PFT-group, n=42, mean age 12.23), children treated for acute lymphoblastic leukemia (ALL-group, n=11, mean age 11.57) and neurologically healthy children (control group, n=36, mean age 11.67). Participants were tested for working memory characteristics with the Cambridge Neuropsychological Test Automated Battery (CANTAB). Pattern recognition memory (PRM) and spatial working memory (SWM) tests were applied. Outcome measures of the PRM test include the number and percentage of correct trials and latency (speed of the participant’s response), and measures of SWM include errors, strategy, and latency. In the synchronization tests, the instruction was to tap out a regular beat (40, 60, 90 and 120 beats per minute) in synchrony with the rhythmic sequences that were played. This meant that for the sequences with an isochronous beat, participants were required to tap to every auditory event. Variations of inter-tap intervals and deviations of children’s taps from the metronome were assessed.
Results: Analysis of variance revealed a significant effect of group (ALL, PFT and control) on parameters such as short-term PRM, SWM strategy, and errors. Healthy controls demonstrated more correctly retained elements and a better working memory strategy than cancer patients. Interestingly, ALL patients chose a poor strategy but committed significantly fewer errors in the SWM test than PFT patients and controls did. As for rhythmic ability, significant associations with working memory were found only for the 40 bpm rhythm: the less variable a child’s inter-tap intervals were, the more elements he/she could retain in memory. The ability to perform audio-motor synchronization may be related to working memory processes mediated by the prefrontal cortex whereby each sensory event is actively retrieved and monitored during rhythmic sequencing. Conclusion: Our results suggest that working memory, tested with appropriate cognitive methods, is associated with the ability to synchronize movements with rhythmic sounds, especially in sub-second intervals (40 per minute).
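The two tapping measures named in the Methods (variation of inter-tap intervals and deviation of taps from the metronome) can be computed as in the sketch below. The abstract does not say which statistics were used, so the coefficient of variation and the mean signed asynchrony are assumptions, as are the function names and the tap times. Beats are assumed to fall at multiples of the metronome period, starting at time zero.

```python
import statistics

def inter_tap_intervals(tap_times):
    """Differences between consecutive tap times, in seconds."""
    return [later - earlier for earlier, later in zip(tap_times, tap_times[1:])]

def iti_coefficient_of_variation(tap_times):
    """Standard deviation of the inter-tap intervals divided by their mean."""
    itis = inter_tap_intervals(tap_times)
    return statistics.stdev(itis) / statistics.mean(itis)

def mean_asynchrony(tap_times, bpm):
    """Mean signed offset of each tap from the nearest metronome beat, in seconds."""
    period = 60.0 / bpm  # 40 bpm corresponds to one beat every 1.5 s
    offsets = [t - round(t / period) * period for t in tap_times]
    return statistics.mean(offsets)

# Hypothetical taps at 40 bpm: perfectly regular, but each about 0.1 s late.
late_taps = [0.1, 1.6, 3.1, 4.6]
variability = iti_coefficient_of_variation(late_taps)
lag = mean_asynchrony(late_taps, bpm=40)
```

Separating the two measures matters because a child can tap very regularly (low variability) while still being consistently early or late relative to the beat.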

Keywords: acute lymphoblastic leukemia (ALL), audio-motor synchronization, posterior fossa tumor, working memory

Procedia PDF Downloads 277
324 Investigation of Verbal Feedback and Learning Process for Oral Presentation

Authors: Nattawadee Sinpattanawong

Abstract:

Oral presentation is used mostly in business communication. Business presentations are carried out using audio and visual presentation materials such as statistical documents, projectors, etc. Common examples of business presentations are intra-organization and sales presentations. The study aims to investigate the functions, strategies, and content of assessors’ verbal feedback on presenters’ oral presentations and to explore presenters’ learning process and their specific views and expectations concerning assessors’ verbal feedback related to the delivery of the oral presentation. This study is designed as descriptive qualitative research; four master’s students and one teacher in an English for Business and Industry Presentation Techniques class at a public university will be selected. The researcher hopes that an understanding of assessors’ verbal feedback on oral presentations and of the learning process may illuminate issues for other people. The data from this research may help to expand and facilitate readers’ understanding of assessors’ verbal feedback on oral presentations and the learning process in their own situations. The research instruments include an audio recorder, a video recorder, and an interview. The students will be interviewed about their views and expectations concerning assessors’ verbal feedback related to the delivery of the oral presentation. After data collection, the data will be transcribed and analyzed. The findings of this study are significant because they can provide presenters with knowledge to enhance their learning process and provide teachers with knowledge about giving verbal feedback on students’ oral presentations in a business context.

Keywords: business context, learning process, oral presentation, verbal feedback

Procedia PDF Downloads 161
323 Ear Protectors and Their Action in Protecting Hearing System of Workers against Occupational Noise

Authors: F. Forouharmajd, S. Pourabdian, N. Ziayi Ghahnavieh

Abstract:

For many years, ear protectors have been used to prevent the auditory and non-auditory effects of noise received in occupational environments. Despite hearing protection programs, many people still suffer from noise-induced hearing loss. This study was conducted to determine the response of the human hearing system to received noise and the effectiveness of ear protectors in preventing noise-induced hearing loss. Sound pressure microphones were placed in a simulated ear canal. The severity of noise was measured inside and outside the ear canal. The noise reduction values due to installing ear protectors were calculated in octave band frequencies using LabVIEW. The results of noise measurement inside and outside the ear canal showed a difference in the sound levels received by the ear canal. The effectiveness of ear protectors was considerably reduced at low frequencies. A change in resonance frequency was also observed after using ear protectors. The study indicated that the ear canal structure may affect the received noise, which may lead to a difference between the sound received by the hearing system and the sound measured by a sound level meter. This means the human hearing system may respond differently from a sound level meter. Hearing protectors’ efficiency declines as noise levels increase, and thus they are not suitable for protecting workers against industrial noise, particularly low-frequency noise. Hearing protectors may themselves be a cause of damage to the hearing system at a particular frequency by changing the acoustical structure of the human hearing system. A subjective method of testing hearing protectors needs to be developed, because their current evaluation is not based on industrial noise or field conditions.
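The per-band noise reduction mentioned above is the difference between the level measured outside the ear canal and the level measured inside it. The sketch below shows that arithmetic together with the standard logarithmic summation of band levels into an overall level. The band centers and all decibel values are hypothetical illustration data, not measurements from the study, and the function names are assumptions.

```python
import math

# Standard octave band center frequencies in Hz (the study's exact bands are not stated).
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]

def noise_reduction(outside_db, inside_db):
    """Per-band reduction in dB: level outside the protector minus level inside."""
    return [out - ins for out, ins in zip(outside_db, inside_db)]

def overall_level(band_levels_db):
    """Combine band levels into one level by summing their energies."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in band_levels_db))

# Hypothetical measurements in dB for each band above.
outside = [90.0, 92.0, 95.0, 94.0, 91.0, 88.0, 85.0]
inside = [84.0, 82.0, 78.0, 70.0, 62.0, 55.0, 50.0]

reduction = noise_reduction(outside, inside)
overall_reduction = overall_level(outside) - overall_level(inside)
weakest_band_hz = OCTAVE_BANDS_HZ[reduction.index(min(reduction))]
```

With these made-up values the smallest reduction falls in the lowest band, which mirrors the abstract's observation that protection is weakest at low frequencies.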

Keywords: ear protector, hearing system, occupational noise, workers

Procedia PDF Downloads 143
322 Improving Medication Understanding, Use and Self-Efficacy among Stroke Patients: A Randomised Controlled Trial; Study Protocol

Authors: Jamunarani Appalasamy, Tha Kyi Kyi, Quek Kia Fatt, Joyce Pauline Joseph, Anuar Zaini M. Zain

Abstract:

Background: The Health Belief Theory has long been associated with chronic disease management. Various health behaviour concepts and perceptions branching from the Health Belief Theory have been linked to medication understanding, use, and self-efficacy, which directly affect medication adherence. In a previous quantitative and qualitative study, stroke patients in Malaysia were found to strongly believe information obtained from various sources such as the internet and social communication. This leads to a lower perception of the benefit of their stroke preventative medication, which in the long term creates non-adherence. Hence, this study intends to pilot an intervention that uses an audio-visual concept incorporated with an mHealth service to enhance learning and self-reflection among stroke patients in managing their disease. Methods/Design: Twenty patients will be allocated to the proposed intervention, whereas another twenty patients will be allocated to the usual treatment. The intervention involves a series of developed audio-visual videos sent via mobile phone, followed by responses and feedback from the receiver (patient) via SMS or recorded calls. The primary outcome will be medication understanding, use, and self-efficacy measured over two months pre- and post-intervention. Secondary outcomes will be measured from changes in blood parameters and other self-reported questionnaires. Discussion: This study will also assess uptake/attrition, feasibility, and acceptability of this intervention. Trial Registration: NMRR-15-851-24737 (IIR)

Keywords: health belief, medication understanding, medication use, self-efficacy

Procedia PDF Downloads 186
321 Exploring Teacher Verbal Feedback on Postgraduate Students' Performances in Presentations in English

Authors: Nattawadee Sinpattanawong, Yaowaret Tharawoot

Abstract:

This is an analytic and descriptive classroom-centered study, the purpose of which is to explore teacher verbal feedback on postgraduate students’ performances in presentations in English in an English for Specific Purposes (ESP) postgraduate classroom. The participants are a Thai female teacher, two Thai female postgraduate students, and two foreign male postgraduate students. The current study draws on both classroom observation and interview data. The class, which focused on the students’ presentations and the teacher’s verbal feedback on them, was observed nine times with audio recording and note-taking. For the interviews, the teacher was interviewed about linkages between her verbal feedback and each student’s presentation skills in English. For the data analysis, the audio files from the observations were transcribed and analyzed both quantitatively and qualitatively. The quantitative approach addressed the frequencies and percentages of the content of the teacher’s verbal feedback on each student’s performance based on eight presentation factors (content, structure, grammar, coherence, vocabulary, speaking skills, involving the audience, and self-presentation). Based on the quantitative data and the interview data, a qualitative analysis of the transcripts was made to describe the occurrences of the various kinds of verbal feedback content for each student’s presentation performance. The study’s findings may help teachers to reflect on how they provide verbal feedback based on various students’ performances in presentations in English. They may also help students with characteristics similar to those of the students in the present study to improve their presentation performance in English by applying the content of the teacher’s verbal feedback.

Keywords: teacher verbal feedback, presentation factors, presentation in English, presentation performances

Procedia PDF Downloads 127
320 Governance of Social Media Using the Principles of Community Radio

Authors: Ken Zakreski

Abstract:

This paper considers regulating Canadian Facebook Groups of a certain size and type once they reach a threshold of audio/video content. Consider the evolution of the Streaming Act, Parl GC Bill C-11 (44-1), and the regulations that will certainly follow. The Canadian Heritage Minister's office stipulates, “the Broadcasting Act only applies to audio and audiovisual content, not written journalism.” Governance: After 10 years, a community radio station for Gabriola Island, BC, which was approved by the Canadian Radio-television and Telecommunications Commission (“CRTC”) but never started, became a Facebook Group, “Community Bulletin Board - Life on Gabriola”, referred to as CBBlog. After CBBlog started and began to gather real traction, a member of the Group cloned the membership and ran a competing Facebook group under the banner of “free speech”. Here we see an inflection point [change of cultural stewardship] with two different mathematical results [engagement and membership growth]. Canada's telecommunication history of “portability” and “interoperability” made the Facebook Group CBBlog a better option than broadcast FM radio for a community pandemic information sharing service for Gabriola Island, BC. A culture of ignorance flourishes in social media. Often people do not understand their own experience, or the experience of others, because they do not have the concepts needed for understanding. It is thus important that they are not denied the concepts required for their full understanding. For example, legislators need to know something about gay culture before they can make any decisions about it. Community media policies and CRTC regulations are known, and regulators can use that history to forge ahead with regulations for internet platforms of a size and content type that reach a threshold of audio/video content. Mostly volunteer-run media services offer order-of-magnitude lower costs than commercial media. Should Facebook Groups be treated as new media?
Cathy Edwards, executive director of the Canadian Association of Community Television Users and Stations (“CACTUS”), calls it new media in that the distribution platform is not the issue. What does make community groups community media? Cathy responded, "... it's bylaws, articles of incorporation that state they are community media, they have accessibility, commitments to skills training, any member of the community can be a member, and there is accountability to a board of directors". Eligibility for funding through CACTUS requires these same commitments. It is risky for a community to invest in a platform, as ownership has not been litigated. Is a Facebook Group an asset of a not-for-profit society? A memo from law student Jared Hubbard summarizes, “Rights and interests in a Facebook group could, in theory, be transferred as property... This theory is currently unconfirmed by Canadian courts.”

Keywords: social media, governance, community media, Canadian radio

Procedia PDF Downloads 39
319 Emotion-Convolutional Neural Network for Perceiving Stress from Audio Signals: A Brain Chemistry Approach

Authors: Anup Anand Deshmukh, Catherine Soladie, Renaud Seguier

Abstract:

Emotion plays a key role in many applications like healthcare, to gather patients’ emotional behavior. Unlike typical ASR (Automated Speech Recognition) problems, which focus on 'what was said', it is equally important to understand 'how it was said.' Certain emotions are given more importance due to their effectiveness in understanding human feelings. In this paper, we propose an approach that models human stress from audio signals. The research challenge in speech emotion detection is finding the appropriate set of acoustic features corresponding to an emotion. Another difficulty lies in defining the very meaning of emotion and being able to categorize it in a precise manner. Supervised Machine Learning models, including state-of-the-art Deep Learning classification methods, rely on the availability of clean and labelled data. One of the problems in affective computation is the limited amount of annotated data. The existing labelled emotion datasets are highly subject to the perception of the annotator. We address the first issue of feature selection by exploiting the use of traditional MFCC (Mel-Frequency Cepstral Coefficients) features in a Convolutional Neural Network. Our proposed Emo-CNN (Emotion-CNN) architecture treats speech representations in a manner similar to how CNNs treat images in a vision problem. Our experiments show that Emo-CNN consistently and significantly outperforms the popular existing methods over multiple datasets. It achieves 90.2% categorical accuracy on the Emo-DB dataset. We claim that Emo-CNN is robust to speaker variations and environmental distortions. The proposed approach achieves 85.5% speaker-dependent categorical accuracy for the SAVEE (Surrey Audio-Visual Expressed Emotion) dataset, beating the existing CNN-based approach by 10.2%. To tackle the second problem of subjectivity in stress labels, we use Lovheim’s cube, which is a 3-dimensional projection of emotions.
Monoamine neurotransmitters are a type of chemical messenger in the brain that transmit signals on perceiving emotions. The cube aims at explaining the relationship between these neurotransmitters and the positions of emotions in 3D space. The learnt emotion representations from the Emo-CNN are mapped to the cube using three-component PCA (Principal Component Analysis), and the mapping is then used to model human stress. This proposed approach not only circumvents the need for labelled stress data but also complies with the psychological theory of emotions given by Lovheim’s cube. We believe that this work is the first step towards creating a connection between Artificial Intelligence and the chemistry of human emotions.

Keywords: deep learning, brain chemistry, emotion perception, Lovheim's cube

Procedia PDF Downloads 124
318 Innovation Outcomes and Competing Agendas in Higher Education: Experimenting with Audio-Video Feedback

Authors: Adina Dudau, Georgios Kominis, Melinda Szocs

Abstract:

This paper links distinct bodies of literature around innovation and public services by examining a case of perceived innovation failure. Through a mixed methodology investigating student attitudes to, and behaviour around, technological innovation in higher education, the paper makes a contribution to the public service innovation literature by focusing on the duality of innovation outcomes, suggestive of an innovation typology in public services. The study was conducted in a UK Russell Group university and focused on a technological process innovation. The innovation consisted of the provision of feedback to students in the form of a digital video (mp4), tailored to each individual submission, with extended voice-over commentary from the course coordinator and visual cues intended to help students see the relevance of comments to their submissions. The sample of the study consisted of a class of 79 undergraduate students. To investigate student attainment, we designed a field (also known as quasi or natural) experiment, essentially a manipulation of a social setting (in this case, the form of feedback given to students), but as part of a naturally occurring social arrangement (a real course which students attend and in which they are assessed). A two-group control group design (see figure 3) was utilised to examine the effectiveness of the feedback innovation (video feedback, VF). Two outcome variables of the service innovation were measured: student satisfaction and student attainment. In other words, the study examined not only students’ perceptions of whether VF was beneficial towards their subsequent assignments, but also evidence of actual incremental benefits in students’ performance from one assignment to the next after VF was provided. The results were baffling and indicated competing agendas in higher education.

Keywords: higher education, audio-video, feedback, innovation

Procedia PDF Downloads 335
317 Another Beautiful Sounds: Building the Memory of Sound of Peddling in Beijing with Digital Technology

Authors: Dan Wang, Qing Ma, Xiaodan Wang, Tianjiao Qi

Abstract:

The sound of peddling in Beijing, also called “yo-heave-ho” or “cry of one's ware”, is a unique folk culture usually found in Beijing hutong. For the civilians of Beijing, the sound of peddling is part of their childhood. And for those who love the traditional culture of Beijing, it is an old song singing the local conditions and customs of the ancient city. For example, because of his great appreciation, the British poet Osbert Stewart once described the sound of peddling he had heard in Beijing as a street orchestra performance in the article named "Beijing's sound and color". This research aims to collect and integrate the voice/photo resources and historical materials concerning the sound of peddling in Beijing by digital technology in order to protect the intangible cultural heritage and pass on the city memory. With this goal in mind, the next stage is to collect and record all the materials and resources based on the study of historical documents and interviews with civilians or performers. A metadata scheme (which refers to domestic and international standards such as "Audio Data Processing Standards in the National Library", DC, VRA, and CDWA, etc.) is then set up to describe, process and organize the sound of peddling into a database. In order to fully show the traditional culture of the sound of peddling in Beijing, web design and GIS technology are utilized to establish a website and to plan offline exhibitions and events where people can simulate and learn the sound of peddling by using VR/AR technology. All resources are open to the public, and civilians can share the digital memory through not only the offline experiential activities but also the online interaction. With all these attempts, a multi-media narrative platform has been established to multi-dimensionally record the sound of peddling in old Beijing with text, images, audio, video and so on.

Keywords: sound of peddling, GIS, metadata scheme, VR/AR technology

Procedia PDF Downloads 276
316 Creative Radio Advertising in Turkey

Authors: Mehmet Sinan Erguven

Abstract:

A number of authorities argue that radio is an outdated medium for advertising and does not have the same impact on consumers as it did in the past. This grim outlook on the future of radio has its basis in the audio-visual world that consumers now live in and the popularity of Internet-based marketing tools among advertising professionals. Nonetheless, consumers still appear to overwhelmingly prefer radio as an entertainment tool. Today, in Canada, 90% of all adults (18+) tune into the radio on a weekly basis, and they listen for 17 hours. Teens are the most challenging group for radio to capture as an audience, but still, almost 75% tune in weekly. One online radio station reaches more than 250 million registered listeners worldwide, and revenues from radio advertising in Australia are expected to grow at an annual rate of 3% for the foreseeable future. Radio is also starting to become popular again in Turkey, with a 5% increase in the listening rates compared to 2014. A major matter of concern always affecting radio advertising is creativity. As radio generally serves as a background medium for listeners, the creativity of radio commercials is important in terms of attracting the attention of the listener and directing their focus to the advertising message. This cannot simply be done by using audio tools like sound effects and jingles. This study aims to identify the creative elements (execution formats, appeals, and approaches) and creativity factors of radio commercials in Turkey. As part of the study, all of the award-winning radio commercials produced throughout the history of the Kristal Elma Advertising Festival were analyzed using the content analysis technique. Two judges (an advertising agency copywriter and an academic) coded the commercials. The reliability was measured according to the proportional agreement.
The results showed that sound effects, jingles, testimonials, slices of life and announcements were the most common execution formats in creative Turkish radio ads. Humor and excitement were the most commonly used creative appeals while award-winning ads featured various approaches, such as surprise musical performances, audio wallpaper, product voice, and theater of the mind. Some ads, however, were found to not contain any creativity factors. In order to be accepted as creative, an ad must have at least one divergence factor, such as originality, flexibility, unusual/empathic perspective, and provocative questions. These findings, as well as others from the study, hold great value for the history of creative radio advertising in Turkey. Today, the nature of radio and its listeners is changing. As more and more people are tuning into online radio channels, brands will need to focus more on this relatively cheap advertising medium in the very near future. This new development will require that advertising agencies focus their attention on creativity in order to produce radio commercials for their customers that will differentiate them from their competitors.
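The proportional agreement mentioned above as the reliability measure can be illustrated with a minimal sketch: the share of commercials to which both judges assigned the same code. The function name and the sample codes below are hypothetical and are not data from the study; the abstract does not say how ties or multiple codes per ad were handled.

```python
def proportional_agreement(codes_a, codes_b):
    """Share of items to which two coders assigned the same category."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must rate the same items")
    if not codes_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


# Hypothetical codes for five commercials (illustrative only).
copywriter = ["humor", "jingle", "testimonial", "humor", "announcement"]
academic = ["humor", "jingle", "slice of life", "humor", "announcement"]

agreement = proportional_agreement(copywriter, academic)
print(f"{agreement:.2f}")  # 4 of 5 codes match -> 0.80
```

Proportional agreement does not correct for chance agreement, which is one reason some studies report a chance-corrected statistic alongside it.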

Keywords: advertising, creativity, radio, Turkey

Procedia PDF Downloads 354
315 A Review of Blog Assisted Language Learning Research: Based on Bibliometric Analysis

Authors: Bo Ning Lyu

Abstract:

Blog assisted language learning (BALL) has been trialed by educators in language teaching with the development of Web 2.0 technology. Understanding the development trend of related research helps grasp the whole picture of the use of blogs in language education. This paper reviews current research related to blog-enhanced language learning based on bibliometric analysis, aiming at (1) identifying the most frequently used keywords and their co-occurrence, (2) clustering research topics based on co-citation analysis, (3) finding the most frequently cited studies and authors and (4) constructing the co-authorship network. 330 articles were retrieved from Web of Science, and 225 peer-reviewed journal papers were finally collected according to the selection criteria. Bibexcel and VOSviewer were used to visualize the results. The studies reviewed were published between 2005 and 2016, most in 2014 and 2015 (35 papers each). The top 10 most frequently appearing keywords are learning, language, blog, teaching, writing, social, web 2.0, technology, English, communication. Eight research themes could be clustered by co-citation analysis: blogging for collaborative learning, blogging for writing skills, blogging in higher education, feedback via blogs, blogging for self-regulated learning, implementation of using blogs in the classroom, comparative studies and audio/video blogs. Early studies focused on the introduction of classroom implementation, while recent studies moved from the traditional usage of blogs to audio/video blogs. By reviewing the research related to BALL quantitatively and objectively, this paper reveals the evolution and development trends and identifies influential research, helping researchers and educators quickly grasp this field overall and conduct further studies.
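Aim (1) above, counting the most frequent keywords and their co-occurrence, can be sketched in a few lines. This is only an illustration of the counting idea, not the Bibexcel or VOSviewer workflow the paper used; the function name and the keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(papers):
    """Count keyword frequencies and unordered keyword pairs per paper."""
    frequency = Counter()
    pairs = Counter()
    for keywords in papers:
        # Normalise case and drop duplicates within one paper.
        unique = sorted({k.strip().lower() for k in keywords})
        frequency.update(unique)
        # Each pair is stored in alphabetical order, so (a, b) == (b, a).
        pairs.update(combinations(unique, 2))
    return frequency, pairs


# Hypothetical keyword lists for three papers (illustrative only).
papers = [
    ["blog", "writing", "feedback"],
    ["Blog", "writing", "higher education"],
    ["blog", "collaborative learning"],
]

frequency, pairs = keyword_cooccurrence(papers)
print(frequency.most_common(2))    # [('blog', 3), ('writing', 2)]
print(pairs[("blog", "writing")])  # 2
```

The pair counts form the edge weights of a co-occurrence network, which is the kind of structure that visualization tools then lay out and cluster.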

Keywords: blog, bibliometric analysis, language learning, literature review

Procedia PDF Downloads 184
314 Authentication of Physical Objects with Dot-Based 2D Code

Authors: Michał Glet, Kamil Kaczyński

Abstract:

Counterfeit goods and documents are a global problem that requires increasingly sophisticated countermeasures. Existing techniques using watermarking or embedding symbols on objects are not suitable for all use cases. To address those special needs, we created a complete system allowing authentication of paper documents and physical objects with a flat surface. Objects are marked using 2D graphic codes, named DotAuth, that are orientation-independent and resistant to camera noise. Based on the identifier stored in the 2D code, the system is able to perform basic authentication and supports more sophisticated analysis methods, e.g., relying on augmented reality and physical properties of the object. In this paper, we present the complete architecture, algorithms and applications of the proposed system. Results of a feature comparison between the proposed solution and other products are presented as well, pointing to many advantages that increase usability and efficiency in protecting physical objects.

Keywords: anti-forgery, authentication, paper documents, security

Procedia PDF Downloads 105
313 Building a Comprehensive Repository for Montreal Gamelan Archives

Authors: Laurent Bellemare

Abstract:

After the showcase of traditional Indonesian performing arts at the Vancouver Expo 1986, Canadian universities inherited sets of Indonesian gamelan orchestras and soon began offering courses for music students interested in learning these diverse traditions. Among them, Université de Montréal was offered two sets of Balinese orchestras, a novelty that allowed a community of Montreal gamelan enthusiasts to form and engage with this music. A few generations later, a large body of archives has amassed, framing the history of this niche community’s achievements. This data, scattered in public and private archive collections, comes in various formats: Digital Audio Tape, audio cassettes, Video Home System videotape, digital files, photos, reel-to-reel audiotape, posters, concert programs, letters, TV shows, reports and more. Attempting to study these documents in order to unearth a chronology of gamelan in Montreal has proven to be challenging since no suitable platform for preservation, storage, and research currently exists. These files are, therefore, hard to find due to their decentralized locations. Additionally, most of the documents in older formats have yet to be digitized. In the case of recent digital files, such as pictures or rehearsal recordings, their locations can be even messier and their quantity overwhelming. Aside from the basic issue of choosing a suitable repository platform, questions of legal rights and methodology arise. For posterity, these documents should nonetheless be digitized, organized, and stored in an easily accessible online repository. This paper aims to underline the various challenges encountered in the early stages of such a project as well as to suggest ways of overcoming the obstacles to a thorough archival investigation.

Keywords: archival work, archives, Balinese gamelan, Canada, Gamelan, Indonesia, Javanese gamelan, Montreal

Procedia PDF Downloads 91
312 Colloquialism in Audiovisual Translation: English Subtitling of the Lebanese Film Capernaum as a Case Study

Authors: Fatima Saab

Abstract:

This paper attempts to study colloquialism in audio-visual translation, with particular emphasis given to investigating the difficulties and challenges encountered by subtitlers in translating Lebanese colloquial into English. To achieve the main objectives of this study, ample and thorough cultural and translational analysis of examples drawn from the subtitled movie Capernaum are presented in order to identify the strategies used to overcome cultural barriers and differences and to show the process of decision-making by the translator. Also, special attention is given to explain the technicalities in translating subtitles and how they affect the translation process. The research is a descriptive analytical study whereby the writer sets out empirical observations, consisting of descriptive and analytical examination of the difficulties and problems associated with translating Arabic colloquialisms, specifically Lebanese, into English in the subtitled film, Capernaum. The research methodology utilizes a qualitative approach to group the selected data into the subtitling strategies presented by Gottlieb under the domesticating or foreignizing strategies according to Venuti's Model. It is shown that producing the same meanings to a foreign audience is not an easy task. The background of cultural elements and the stories that make up the history and mindset of the Lebanese and Arabic peoples leads to the use of the transfer and paraphrase methodologies most of the time (81% of the sample used for analysis). The research shows that translating and subtitling colloquialism needs special skills by the translators to overcome the challenges imposed by the limited presentation space as well as cultural differences. Translation of colloquial Arabic/Lebanese can be achieved to a certain extent and delivering the meaning and effect of the source language culture is accomplished in as much as the translator investigates and relates to the target culture.

Keywords: Lebanese colloquial, audio-visual translation, subtitling, Capernaum

Procedia PDF Downloads 120
311 Linguistic Accessibility and Audiovisual Translation: Corpus Linguistics as a Tool for Analysis

Authors: Juan-Pedro Rica-Peromingo

Abstract:

The important change taking place with respect to the media and the audiovisual world in Europe needs to benefit all populations, in particular those with special needs, such as the deaf and hard-of-hearing population (SDH) and blind and partially-sighted population (AD). This recent interest in the field of audiovisual translation (AVT) can be observed in the teaching and learning of the different modes of AVT in the degree and post-degree courses at Spanish universities, which expand the interest and practice of AVT linguistic accessibility. We present a research project led at the UCM which consists of the compilation of AVT activities for teaching purposes and tries to analyze the creation and reception of SDH and AD: the AVLA Project (Audiovisual Learning Archive), which includes audiovisual materials carried out by the university students on different AVT modes and evaluations from the blind and deaf informants. In this study, we present the materials created by the students. A group of the deaf and blind population has been in charge of testing the student's SDH and AD corpus of audiovisual materials through some questionnaires used to evaluate the students’ production. These questionnaires include information about the reception of the subtitles and the audio descriptions from linguistic and technical points of view. With all the materials compiled in the research project, a corpus with both the students’ production and the recipients’ evaluations is being compiled: the CALING (Corpus de Accesibilidad Lingüística) corpus. Preliminary results will be presented with respect to those aspects, difficulties, and deficiencies in the SDH and AD included in the corpus, specifically with respect to the length of subtitles, the position of the contextual information on the screen, and the text included in the audio descriptions and tone of voice used. These results may suggest some changes and improvements in the quality of the SDH and AD analyzed. 
In the end, demand for the teaching and learning of AVT and linguistic accessibility at a university level and some important changes in the norms which regulate SDH and AD nationally and internationally will be suggested.

Keywords: audiovisual translation, corpus linguistics, linguistic accessibility, teaching

Procedia PDF Downloads 51
310 The Relationship between Spindle Sound and Tool Performance in Turning

Authors: N. Seemuang, T. McLeay, T. Slatter

Abstract:

Worn tools have a direct effect on the surface finish and part accuracy. Tool condition monitoring systems have been developed over a long period and used to avoid a loss of productivity resulting from using a worn tool. However, the majority of tool monitoring research has applied expensive sensing systems not suitable for production. In this work, the cutting sound in a turning machine was studied using a microphone. Machining trials using seven cutting conditions were conducted until the observable flank wear width (FWW) on the main cutting edge exceeded 0.4 mm. The cutting inserts were removed from the tool holder and the flank wear width was measured optically. A microphone with a built-in preamplifier was used to record the machining sound of EN24 steel being face turned by a CNC lathe in a wet cutting condition using constant surface speed control. The sound was sampled at 50 kS/s, and all sound signals recorded from the microphone were transformed into the frequency domain by FFT in order to establish the frequency content in the audio signature that could then be used for tool condition monitoring. The feature extracted from the audio signal was compared to the flank wear progression on the cutting inserts. The spectrogram reveals a promising feature, named ‘spindle noise’, which is emitted by the main spindle motor of the turning machine. The spindle noise frequency was detected at 5.86 kHz regardless of the cutting conditions used on this particular CNC lathe. Varying the cutting speed and feed rate influences the magnitude of the power spectrum of the spindle noise. The magnitude at the spindle noise frequency changes in conjunction with the tool wear progression. The magnitude increases significantly in the transition state between steady-state wear and severe wear. This could be used as a warning signal to prepare for tool replacement or to adapt cutting parameters to extend tool life.
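Tracking the magnitude of one spectral line, as described above, can be sketched as follows. The abstract uses an FFT over the whole spectrum; this sketch swaps in a single-bin DFT, which gives the same value for that one bin and needs only the standard library. The 50 kS/s rate and the 5.86 kHz line come from the abstract; the function name, frame length, and synthetic test signal are assumptions for illustration.

```python
import cmath
import math

SAMPLE_RATE_HZ = 50_000  # 50 kS/s, as stated in the abstract
SPINDLE_HZ = 5_860       # 5.86 kHz spindle noise line from the abstract


def tone_magnitude(samples, freq_hz, sample_rate_hz):
    """Amplitude of the DFT bin nearest freq_hz (single-bin DFT)."""
    n = len(samples)
    k = round(freq_hz * n / sample_rate_hz)  # index of the nearest bin
    acc = sum(
        x * cmath.exp(-2j * math.pi * k * i / n)
        for i, x in enumerate(samples)
    )
    return 2 * abs(acc) / n  # scale so a pure sine returns its amplitude


# Synthetic 50 ms frame (assumed length): 2500 samples give 20 Hz bins,
# so 5860 Hz falls exactly on bin 293 and 1000 Hz on bin 50.
n = 2500
frame = [
    0.5 * math.sin(2 * math.pi * SPINDLE_HZ * i / SAMPLE_RATE_HZ)
    + 0.2 * math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE_HZ)
    for i in range(n)
]

magnitude = tone_magnitude(frame, SPINDLE_HZ, SAMPLE_RATE_HZ)
print(round(magnitude, 3))  # 0.5
```

Computing this value frame by frame over a recording would give a magnitude trend that could be compared against measured flank wear; on real signals a window function and a frequency that does not fall exactly on a bin would change the scaling.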

Keywords: tool wear, flank wear, condition monitoring, spindle noise

Procedia PDF Downloads 306
309 Digi-Buddy: A Smart Cane with Artificial Intelligence and Real-Time Assistance

Authors: Amaladhithyan Krishnamoorthy, Ruvaitha Banu

Abstract:

Vision is considered the most important human sense, and without it leading a normal life can often be difficult. There are many existing smart canes for the visually impaired that use an ultrasonic transducer for obstacle detection to help users navigate. Though the basic smart cane increases the safety of its users, it does not help fill the void of visual loss. This paper introduces the concept of Digi-Buddy, an evolved smart cane for the visually impaired. The cane consists of several modules. Apart from the basic obstacle detection features, the Digi-Buddy assists the user by capturing video/images with a wide-angled camera and streaming them to a server, which then detects objects using a Deep Convolutional Neural Network. In addition to determining what the particular image/object is, the distance of the object is assessed by the ultrasonic transducer. A sound generation application, modelled with the help of Natural Language Processing, is used to convert the processed images/objects into audio. The detected object is signified by its name, which is transmitted to the user through Bluetooth headphones. The object detection is extended to facial recognition, which matches the faces of people the user meets against a database of face images and alerts the user about the person. Another crucial function is an automatic intimation alarm, which is triggered when the user is in an emergency. If the user recovers within a set time, a button provided on the cane stops the alarm. Otherwise, an automatic intimation is sent to friends and family about the whereabouts of the user using GPS. In addition to the safety and security offered by existing smart canes, the proposed concept is to be implemented as a prototype that helps the visually impaired visualize their surroundings through audio in a more amicable way.

Keywords: artificial intelligence, facial recognition, natural language processing, internet of things

Procedia PDF Downloads 320
308 Audio-Visual Co-Data Processing Pipeline

Authors: Rita Chattopadhyay, Vivek Anand Thoutam

Abstract:

Speech is the most acceptable means of communication, through which we can quickly exchange our feelings and thoughts. Quite often, people can communicate orally but cannot interact or work with computers or devices. It is easier and quicker to give speech commands than to type commands to computers. In the same way, it is easier to listen to audio played from a device than to extract output from computers or devices. Especially with robotics being an emerging market with applications in warehouses, the hospitality industry, consumer electronics, assistive technology, etc., speech-based human-machine interaction is emerging as a lucrative feature for robot manufacturers. Considering this factor, the objective of this paper is to design the “Audio-Visual Co-Data Processing Pipeline.” This pipeline integrates automatic speech recognition, a natural language model for text understanding, object detection, and text-to-speech modules. There are many Deep Learning models for each type of module mentioned above, but OpenVINO Model Zoo models are used because the OpenVINO toolkit covers both computer vision and non-computer vision workloads across Intel hardware, maximizes performance, and accelerates application development. A speech command is given as input that has information about the target objects to be detected and the start and end times used to extract the required interval from the video. Speech is converted to text using the QuartzNet automatic speech recognition model. The summary is extracted from the text using a natural language model, Generative Pre-Trained Transformer-3 (GPT-3). Based on the summary, essential frames from the video are extracted, and the You Only Look Once (YOLO) object detection model detects objects in these extracted frames. Frame numbers that have target objects (the objects specified in the speech command) are saved as text.
Finally, this text (frame numbers) is converted to speech using text to speech model and will be played from the device. This project is developed for 80 You Only Look Once (YOLO) labels, and the user can extract frames based on only one or two target labels. This pipeline can be extended for more than two target labels easily by making appropriate changes in the object detection module. This project is developed for four different speech command formats by including sample examples in the prompt used by Generative Pre-Trained Transformer-3 (GPT-3) model. Based on user preference, one can come up with a new speech command format by including some examples of the respective format in the prompt used by the Generative Pre-Trained Transformer-3 (GPT-3) model. This pipeline can be used in many projects like human-machine interface, human-robot interaction, and surveillance through speech commands. All object detection projects can be upgraded using this pipeline so that one can give speech commands and output is played from the device.
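The frame-selection step described above (keep the frame numbers inside the requested interval that contain a target label, then hand them to text-to-speech as text) can be sketched as below. This is an illustration only: the function name, the frame rate, and the per-frame label dictionary are hypothetical stand-ins for the output of the object detection module, and no OpenVINO, QuartzNet, GPT-3, or YOLO calls are shown.

```python
def frames_with_targets(detections, targets, start_s, end_s, fps):
    """Frame numbers in [start_s, end_s] whose labels include any target."""
    first = int(start_s * fps)
    last = int(end_s * fps)
    wanted = set(targets)
    return [
        frame
        for frame in range(first, last + 1)
        if wanted & set(detections.get(frame, ()))
    ]


# Hypothetical detector output: frame number -> labels found in that frame.
detections = {
    0: ["person"],
    30: ["dog", "person"],
    45: ["car"],
    60: ["dog"],
    120: ["dog"],
}

# Command summary (assumed): find "dog" or "car" between 1 s and 2 s at 30 fps.
hits = frames_with_targets(detections, ["dog", "car"], start_s=1, end_s=2, fps=30)
text_for_tts = ", ".join(str(frame) for frame in hits)
print(text_for_tts)  # 30, 45, 60
```

Because the targets are handled as a set, the same function works for one, two, or more labels, which matches the abstract's note that extending beyond two target labels mainly requires changes in the object detection module.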

Keywords: OpenVINO, automatic speech recognition, natural language processing, object detection, text to speech

Procedia PDF Downloads 49
307 Causes and Consequences of Intuitive Animal Communication: A Case Study at Panthera Africa

Authors: Cathrine Scharning Cornwall-Nyquist, David Rafael Vaz Fernandes

Abstract:

Since its origins, mankind has been dreaming of communicating directly with other animals. Past civilizations interacted on different levels with other species and recognized them in their rituals and daily activities. However, recent scientific developments have limited the ability of humans to consider deeper levels of interaction beyond observation and/or physical behavior. In recent years, animal caretakers and facilities such as sanctuaries or rescue centers have been introducing new techniques based on intuition. Most of those initiatives are related to specific cases, such as the incapacity to understand an animal’s behavior. Respected organizations also include intuitive animal communication (IAC) sessions to follow up on past interventions with their animals. Despite the lack of credibility of this discipline, some animal caring structures have opted to integrate IAC into their daily routines and approaches to animal welfare. At this stage, animal communication will be generally defined as the ability of humans to communicate with animals on an intuitive level. The trend in the field remains to be explored. The lack of theory and previous research urges the scientific community to improve the description of the phenomenon and its consequences. Considering the current scenario, qualitative approaches may become a suitable pathway to explore this topic. The purpose of this case study is to explore the beliefs behind and the consequences of an approach based on intuitive animal communication techniques for Panthera Africa (PA), an ethical sanctuary located in South Africa. Due to their personal experience, the Sanctuary’s founders have developed a philosophy based on IAC while respecting the world's highest standards for big cat welfare. Their dual approach is reflected in their rescues, daily activities, and healing animals’ trauma. The case study's main research questions will be: (i) Why do they choose to apply IAC in their work? 
(ii) What consequences does IAC bring to their activities? (iii) What effects do IAC techniques bring to their interactions with the outside world? Data will be gathered on-site via: (i) Complete participation (field notes); (ii) Semi-structured interviews (audio transcriptions); (iii) Document analysis (internal procedures and policies); (iv) Audio-visual material (communication with third parties). The main researcher shall become an active member of the Sanctuary during a 30-day period and have full access to the site. Access to documents and audio-visual materials will be granted on a request basis. Interviews are expected to be held with PA founders and staff members and with IAC practitioners related to the facility. The information gathered shall enable the researcher to provide an extended description of the phenomenon and explore its internal and external consequences for Panthera Africa.

Keywords: animal welfare, intuitive animal communication, Panthera Africa, rescue

Procedia PDF Downloads 65