Search results for: audio
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 386

326 English Pronunciation Materials on TikTok

Authors: Sebastian Leal-Arenas

Abstract:

TikTok’s influence on contemporary society is undeniable. The impact of the mobile app transcends entertainment, as shown by the growing presence of specialized accounts dedicated to providing educational content, particularly as it pertains to language learning. However, the prevailing trend on the platform is vocabulary and grammar acquisition, neglecting a critical component: pronunciation. This study examines English pronunciation materials available on TikTok by taking a comprehensive approach that incorporates established assessment tools, such as the Learning Object Review Instrument and the Framework for Language Learning App Evaluation. Furthermore, novel evaluation categories are introduced to provide a more holistic assessment of these educational resources. 60 English pronunciation videos were part of the analysis. The findings reveal that these audio-visual materials present clear audio bolstered by high-quality video content and automatically generated closed captions. These three components enhance the comprehensibility of the input, making these concise videos valuable assets for language learners. Nevertheless, certain deficiencies are observed, such as the lack of emphasis on specific segments and their relationship with articulators. Improvements and refinements are discussed, as well as their potential utility within the language classroom. This study contributes to the ongoing investigation of multimedia materials used for language teaching and emphasizes the need to adapt pronunciation instruction methods to today’s technology.

Keywords: pronunciation, segments, teaching materials, technology

Procedia PDF Downloads 42
325 Tensor Deep Stacking Neural Networks and Bilinear Mapping Based Speech Emotion Classification Using Facial Electromyography

Authors: P. S. Jagadeesh Kumar, Yang Yung, Wenli Hu

Abstract:

Speech emotion classification is a prominent research field concerned with finding a robust and efficient classifier suitable for different real-life applications. This work focuses on classifying different emotions from speech signals using features related to pitch, formants, energy contours, jitter, and shimmer, as well as spectral, perceptual, and temporal features. Tensor deep stacking neural networks were used to examine the factors that influence the classification success rate. Facial electromyography signals were collected from several subjects in a controlled environment by means of audio-visual stimuli. The acquired facial electromyography signals were pre-processed using a moving average filter, and a set of statistical features was extracted. The extracted features were mapped to the corresponding emotions using bilinear mapping. With facial electromyography signals, a database comprising diverse emotions is presented, with suitable fine-tuning of features and training data. A success rate of 92% can be attained without increasing the system complexity or the computation time for classifying diverse emotional states.
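
The abstract does not give the filter's parameters, so the following is only a minimal sketch of a moving average filter of the kind mentioned for pre-processing. The function name, the window length, and the sample values are assumptions made for illustration, not details from the paper.

```python
def moving_average(samples, window):
    """Smooth a 1-D sequence with a sliding mean.

    Returns len(samples) - window + 1 values (no padding at the edges).
    The window length is a free parameter; the abstract does not state one.
    """
    if window < 1 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    running = sum(samples[:window])
    smoothed = [running / window]
    for i in range(window, len(samples)):
        # Slide the window: add the newest sample, drop the oldest one.
        running += samples[i] - samples[i - window]
        smoothed.append(running / window)
    return smoothed


# Hypothetical sample values, for illustration only.
print(moving_average([1, 2, 3, 4, 5], 3))
```

A running sum keeps the cost linear in the signal length, which matters little here but is the usual way such a filter is written.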

Keywords: speech emotion classification, tensor deep stacking neural networks, facial electromyography, bilinear mapping, audio-visual stimuli

Procedia PDF Downloads 216
324 Digital Curriculum Preservation Planning, Actions, and Challenges

Authors: Misook Ahn

Abstract:

This study examined the Digital Curriculum Repository (DCR) project initiated at the Defense Language Institute Foreign Language Center (DLIFLC). The purpose of the DCR is to build a centralized curriculum infrastructure, preserve all curriculum materials, and provide academic service to users (faculty, students, or other agencies). The DCR collection includes core language curriculum materials developed by each language school: foreign language textbooks, language survival kits, and audio files, whether or not currently in use at the schools. All core curriculum materials with audio and video files have been coded, collected, and preserved at the DCR. The DCR website was designed with MS SharePoint for easy accessibility by the DLIFLC’s faculty and students. All metadata for the collected curriculum materials have been input by language, code, year, book type, level, user, version, and current status (in use/not in use). The study documents digital curriculum preservation planning, actions, and challenges, including collecting, coding, collaborating, designing the DCR SharePoint site, and policymaking. DCR survey data were also collected and analyzed for this research. Based on the findings, the study concludes that a mandatory policy for the DCR system and collaboration with school leadership are critical elements of a successful repository system. Sample collected items, metadata, and the DCR SharePoint site are presented in the evaluation section.

Keywords: MS SharePoint, digital preservation, repository, policy

Procedia PDF Downloads 125
323 Working Memory and Audio-Motor Synchronization in Children with Different Degrees of Central Nervous System's Lesions

Authors: Anastasia V. Kovaleva, Alena A. Ryabova, Vladimir N. Kasatkin

Abstract:

Background: The simplest form of entrainment to a sensory (typically auditory) rhythmic stimulus involves perceiving and synchronizing movements with an isochronous beat with one level of periodicity, such as that produced by a metronome. Children with pediatric cancer are usually treated with chemo- and radiotherapy. Because of such treatment, psychologists and health professionals report a decline in cognitive and motor abilities in cancer patients. The purpose of our study was to measure working memory characteristics in association with audio-motor synchronization tasks, which also involve some memory resources, in children with different degrees of central nervous system lesions: posterior fossa tumors, acute lymphoblastic leukemia, and healthy controls. Methods: Our sample consisted of three groups of children: children treated for posterior fossa tumors (PFT-group, n=42, mean age 12.23), children treated for acute lymphoblastic leukemia (ALL-group, n=11, mean age 11.57) and neurologically healthy children (control group, n=36, mean age 11.67). Participants were tested for working memory characteristics with the Cambridge Neuropsychological Test Automated Battery (CANTAB). Pattern recognition memory (PRM) and spatial working memory (SWM) tests were applied. Outcome measures of the PRM test include the number and percentage of correct trials and latency (speed of the participant’s response), and measures of SWM include errors, strategy, and latency. In the synchronization tests, the instruction was to tap out a regular beat (40, 60, 90 and 120 beats per minute) in synchrony with the rhythmic sequences that were played. This meant that for the sequences with an isochronous beat, participants were required to tap to every auditory event. Variations of inter-tap intervals and deviations of children’s taps from the metronome were assessed.
Results: Analysis of variance revealed a significant effect of group (ALL, PFT and control) on such parameters as short-term PRM and SWM strategy and errors. Healthy controls demonstrated more correctly retained elements and a better working memory strategy than cancer patients. Interestingly, ALL patients chose a poor strategy but committed significantly fewer errors in the SWM test than the PFT and control groups did. As to rhythmic ability, significant associations with working memory were found only for the 40 bpm rhythm: the less variable the child's inter-tap intervals were, the more elements he/she could retain in memory. The ability to perform audio-motor synchronization may be related to working memory processes mediated by the prefrontal cortex whereby each sensory event is actively retrieved and monitored during rhythmic sequencing. Conclusion: Our results suggest that working memory, tested with appropriate cognitive methods, is associated with the ability to synchronize movements with rhythmic sounds, especially in sub-second intervals (40 per minute).
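
The two synchronization measures named in the Methods (variation of inter-tap intervals and deviation of taps from the metronome) can be sketched as follows. This is an illustrative sketch only: the abstract does not say which statistics were used, so the coefficient of variation and the mean signed deviation, along with all function names and the `first_beat` parameter, are assumptions.

```python
from statistics import mean, stdev


def inter_tap_intervals(tap_times):
    """Differences between consecutive tap times (seconds)."""
    return [later - earlier for earlier, later in zip(tap_times, tap_times[1:])]


def iti_variability(tap_times):
    """Coefficient of variation of the inter-tap intervals.

    Assumed measure; needs at least three taps (two intervals).
    """
    intervals = inter_tap_intervals(tap_times)
    return stdev(intervals) / mean(intervals)


def mean_asynchrony(tap_times, bpm, first_beat=0.0):
    """Mean signed deviation of each tap from the nearest metronome beat.

    Negative values mean the taps tend to precede the beat.
    """
    period = 60.0 / bpm  # seconds between metronome beats
    deviations = []
    for tap in tap_times:
        nearest = round((tap - first_beat) / period)
        deviations.append(tap - (first_beat + nearest * period))
    return mean(deviations)
```

Under these assumptions, a child tapping perfectly on a 60 bpm beat would yield a variability of 0 and a mean asynchrony of 0; larger values indicate less stable or less accurate tapping.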

Keywords: acute lymphoblastic leukemia (ALL), audio-motor synchronization, posterior fossa tumor, working memory

Procedia PDF Downloads 274
322 Investigation of Verbal Feedback and Learning Process for Oral Presentation

Authors: Nattawadee Sinpattanawong

Abstract:

Oral presentation is used mostly in business communication. A business presentation is carried out with audio and visual presentation materials such as statistical documents, projectors, etc. Common examples of business presentations are intra-organization and sales presentations. The study aims to investigate the functions, strategies, and contents of assessors’ verbal feedback on presenters’ oral presentations and to explore presenters’ learning process and their specific views and expectations concerning assessors’ verbal feedback related to the delivery of the oral presentation. This study is designed as descriptive qualitative research; four master's students and one teacher in an English for Business and Industry Presentation Techniques class at a public university will be selected. The researcher hopes that an understanding of assessors’ verbal feedback on oral presentations and of the learning process may illuminate issues for other people. The data from this research may help to expand and facilitate the readers’ understanding of assessors’ verbal feedback on oral presentations and the learning process in their own situations. The research instruments include an audio recorder, a video recorder, and an interview. The students will be interviewed about their views and expectations concerning assessors’ verbal feedback related to the delivery of the oral presentation. After data collection is finished, the data will be transcribed and analyzed. The findings of this study are significant because they can provide presenters with knowledge to enhance their learning process and provide teachers with knowledge about giving verbal feedback on students’ oral presentations in a business context.

Keywords: business context, learning process, oral presentation, verbal feedback

Procedia PDF Downloads 159
321 Ear Protectors and Their Action in Protecting Hearing System of Workers against Occupational Noise

Authors: F. Forouharmajd, S. Pourabdian, N. Ziayi Ghahnavieh

Abstract:

For many years, ear protectors have been used to prevent the audio and non-audio effects of noise received in occupational environments. Despite the implementation of hearing protection programs, many people still suffer from noise-induced hearing loss. This study was conducted to determine the response of the human hearing system to received noise and the effectiveness of ear protectors in preventing noise-induced hearing loss. Sound pressure microphones were placed in a simulated ear canal. The noise level was measured inside and outside the ear canal. The noise reduction values due to installing ear protectors were calculated in octave frequency bands using the LabVIEW program. The noise measurements inside and outside the ear canal showed a difference in the sound levels received by the ear canal. The effectiveness of ear protectors was considerably reduced at low frequencies. A change in resonance frequency was also observed after using ear protectors. The study indicated that the ear canal structure may affect the received noise, which may lead to a difference between the sound received by the hearing system and the sound measured by a sound level meter. This means the human hearing system may respond differently from a sound level meter. Hearing protectors’ efficiency declines as noise levels increase, and thus they are not suitable for protecting workers against industrial noise, particularly low-frequency noise. Hearing protectors may themselves be a cause of damage to the hearing system at a particular frequency by changing the acoustical structure of the human hearing system. A subjective method of testing hearing protectors needs to be developed, because their evaluation is not designed based on industrial noise or field conditions.
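
As a rough illustration of the per-band calculation described above, the sketch below takes noise reduction in each octave band to be the level measured outside the protector minus the level measured inside the ear canal. That definition, the function name, and the decibel values are assumptions for illustration; they are not the study's LabVIEW procedure or its measurements.

```python
def noise_reduction(outside_db, inside_db):
    """Per-band noise reduction in dB: level outside minus level inside.

    Both arguments map an octave-band centre frequency (Hz) to a sound
    pressure level (dB). The two measurements must cover the same bands.
    """
    if outside_db.keys() != inside_db.keys():
        raise ValueError("both measurements must cover the same bands")
    return {band: outside_db[band] - inside_db[band] for band in sorted(outside_db)}


# Illustrative values only, not taken from the study.
outside = {125: 90.0, 1000: 90.0, 4000: 90.0}
inside = {125: 85.0, 1000: 60.0, 4000: 55.0}
for band, reduction in noise_reduction(outside, inside).items():
    print(f"{band} Hz: {reduction:.1f} dB")
```

With made-up values like these, a small reduction in the lowest band would correspond to the kind of weak low-frequency protection the abstract reports.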

Keywords: ear protector, hearing system, occupational noise, workers

Procedia PDF Downloads 140
320 Improving Medication Understanding, Use and Self-Efficacy among Stroke Patients: A Randomised Controlled Trial; Study Protocol

Authors: Jamunarani Appalasamy, Tha Kyi Kyi, Quek Kia Fatt, Joyce Pauline Joseph, Anuar Zaini M. Zain

Abstract:

Background: The Health Belief Theory has long been associated with chronic disease management. Various health behaviour concepts and perceptions branching from this theory are involved in medication understanding, use, and self-efficacy, which link directly to medication adherence. In a previous quantitative and qualitative study, stroke patients in Malaysia were found to strongly believe information obtained from various sources such as the internet and social communication. This leads to a lower perception of the benefit of their stroke preventative medication, which in the long term creates non-adherence. Hence, this study intends to pilot an intervention that uses an audio-visual concept incorporated with an mHealth service to enhance learning and self-reflection among stroke patients in managing their disease. Methods/Design: Twenty patients will be allocated to the proposed intervention, whereas another twenty patients will be allocated to the usual treatment. The intervention involves a series of purpose-developed audio-visual videos sent via mobile phone, followed by responses and feedback from the receiver (patient) via SMS or recorded calls. The primary outcome will be medication understanding, use, and self-efficacy measured over two months pre- and post-intervention. Secondary outcomes are measured from changes in blood parameters and other self-reported questionnaires. Discussion: This study will also assess uptake/attrition, feasibility, and acceptability of this intervention. Trial Registration: NMRR-15-851-24737 (IIR)

Keywords: health belief, medication understanding, medication use, self-efficacy

Procedia PDF Downloads 184
319 Exploring Teacher Verbal Feedback on Postgraduate Students' Performances in Presentations in English

Authors: Nattawadee Sinpattanawong, Yaowaret Tharawoot

Abstract:

This is analytic and descriptive classroom-centered research, the purpose of which is to explore teacher verbal feedback on postgraduate students’ performances in presentations in English in an English for Specific Purposes (ESP) postgraduate classroom. The participants are a Thai female teacher, two Thai female postgraduate students, and two foreign male postgraduate students. The current study draws on both classroom observation and interview data. The class, which focused on the students’ presentations and the teacher’s verbal feedback on them, was observed nine times with audio recording and note-taking. For the interviews, the teacher was interviewed about linkages between her verbal feedback and each student’s presentation skills in English. For the data analysis, the audio files from the observations were transcribed and analyzed both quantitatively and qualitatively. The quantitative approach addressed the frequencies and percentages of the content of the teacher’s verbal feedback on each student’s performances based on eight presentation factors (content, structure, grammar, coherence, vocabulary, speaking skills, involving the audience, and self-presentation). Based on the quantitative data and the interview data, a qualitative analysis of the transcripts was made to describe the occurrences of the various types of verbal feedback content for each student’s presentation performances. The study’s findings may help teachers to reflect on how they provide verbal feedback based on various students’ performances in presentations in English. They may also help students with characteristics similar to those of the students in the present study to improve their presentation performances in English by applying the content of the teacher’s verbal feedback.

Keywords: teacher verbal feedback, presentation factors, presentation in English, presentation performances

Procedia PDF Downloads 124
318 Governance of Social Media Using the Principles of Community Radio

Authors: Ken Zakreski

Abstract:

This work concerns the regulation of Canadian Facebook Groups of a certain size and type once they reach a threshold of audio/video content. Consider the evolution of the Streaming Act, Parl GC Bill C-11 (44-1), and the regulations that will certainly follow. The Canadian Heritage Minister's office stipulates, “the Broadcasting Act only applies to audio and audiovisual content, not written journalism.” Governance: After 10 years, a community radio station for Gabriola Island, BC, which was approved by the Canadian Radio-television and Telecommunications Commission (“CRTC”) but never started, became a Facebook Group, “Community Bulletin Board - Life on Gabriola”, referred to as CBBlog. After CBBlog started and began to gather real traction, a member of the Group cloned the membership and ran their own competing Facebook group under the banner of “free speech”. Here we see an inflection point [change of cultural stewardship] with two different mathematical results [engagement and membership growth]. Canada's telecommunication history of “portability” and “interoperability” made the Facebook Group CBBlog a better option than broadcast FM radio for a community pandemic information sharing service for Gabriola Island, BC. A culture of ignorance flourishes in social media. Often people do not understand their own experience, or the experience of others, because they do not have the concepts needed for understanding. It is thus important that they are not denied the concepts required for their full understanding. For example, legislators need to know something about gay culture before they can make any decisions about it. Community media policies and CRTC regulations are known, and regulators can use that history to forge ahead with regulations for internet platforms of a size and content type that reach a threshold of audio/video content. Mostly volunteer-run media services provide costs an order of magnitude lower than those of commercial media. Should Facebook Groups be treated as new media?
Cathy Edwards, executive director of the Canadian Association of Community Television Users and Stations (“CACTUS”), calls it new media in that the distribution platform is not the issue. What makes community groups community media? Cathy responded, “... it's bylaws, articles of incorporation that state they are community media, they have accessibility, commitments to skills training, any member of the community can be a member, and there is accountability to a board of directors”. Eligibility for funding through CACTUS requires these same commitments. It is risky for a community to invest in a platform, as ownership has not been litigated. Is a Facebook Group an asset of a not-for-profit society? A memo from law student Jared Hubbard summarizes, “Rights and interests in a Facebook group could, in theory, be transferred as property... This theory is currently unconfirmed by Canadian courts.”

Keywords: social media, governance, community media, Canadian radio

Procedia PDF Downloads 36
317 Emotion-Convolutional Neural Network for Perceiving Stress from Audio Signals: A Brain Chemistry Approach

Authors: Anup Anand Deshmukh, Catherine Soladie, Renaud Seguier

Abstract:

Emotion plays a key role in many applications like healthcare, to gather patients’ emotional behavior. Unlike typical ASR (Automated Speech Recognition) problems which focus on 'what was said', it is equally important to understand 'how it was said.' There are certain emotions which are given more importance due to their effectiveness in understanding human feelings. In this paper, we propose an approach that models human stress from audio signals. The research challenge in speech emotion detection is finding the appropriate set of acoustic features corresponding to an emotion. Another difficulty lies in defining the very meaning of emotion and being able to categorize it in a precise manner. Supervised Machine Learning models, including state-of-the-art Deep Learning classification methods, rely on the availability of clean and labelled data. One of the problems in affective computation is the limited amount of annotated data. The existing labelled emotion datasets are highly subject to the perception of the annotator. We address the first issue of feature selection by exploiting the use of traditional MFCC (Mel-Frequency Cepstral Coefficients) features in a Convolutional Neural Network. Our proposed Emo-CNN (Emotion-CNN) architecture treats speech representations in a manner similar to how CNNs treat images in a vision problem. Our experiments show that Emo-CNN consistently and significantly outperforms the popular existing methods over multiple datasets. It achieves 90.2% categorical accuracy on the Emo-DB dataset. We claim that Emo-CNN is robust to speaker variations and environmental distortions. The proposed approach achieves 85.5% speaker-dependent categorical accuracy for the SAVEE (Surrey Audio-Visual Expressed Emotion) dataset, beating the existing CNN-based approach by 10.2%. To tackle the second problem of subjectivity in stress labels, we use Lovheim’s cube, which is a 3-dimensional projection of emotions.
Monoamine neurotransmitters are a type of chemical messenger in the brain that transmits signals on perceiving emotions. The cube aims at explaining the relationship between these neurotransmitters and the positions of emotions in 3D space. The learnt emotion representations from the Emo-CNN are mapped to the cube using three-component PCA (Principal Component Analysis), which is then used to model human stress. This proposed approach not only circumvents the need for labelled stress data but also complies with the psychological theory of emotions given by Lovheim’s cube. We believe that this work is the first step towards creating a connection between Artificial Intelligence and the chemistry of human emotions.

Keywords: deep learning, brain chemistry, emotion perception, Lovheim's cube

Procedia PDF Downloads 123
316 Innovation Outcomes and Competing Agendas in Higher Education: Experimenting with Audio-Video Feedback

Authors: Adina Dudau, Georgios Kominis, Melinda Szocs

Abstract:

This paper links distinct bodies of literature around innovation and public services by examining a case of perceived innovation failure. Through a mixed methodology investigating student attitudes to, and behaviour around, technological innovation in higher education, the paper makes a contribution to the public service innovation literature by focusing on the duality of innovation outcomes, suggestive of an innovation typology in public services. The study was conducted in a UK Russell Group university and focused on a technological process innovation. The innovation consisted of the provision of feedback to students in the form of a digital video (mp4), tailored to each individual submission, with extended voice-over commentary from the course coordinator and visual cues intended to help students see the relevance of comments to their submissions. The sample of the study consisted of a class of 79 undergraduate students. To investigate student attainment, we designed a field (also known as quasi or natural) experiment, essentially a manipulation of a social setting (in this case, the form of feedback given to students), but as part of a naturally occurring social arrangement (a real course which students attend and in which they are assessed). A two-group control-group design (see figure 3) was utilised to examine the effectiveness of the feedback innovation (video feedback, VF). Two outcome variables of the service innovation were measured: student satisfaction and student attainment. In other words, the study examined not only students’ perceptions of whether VF was beneficial for their subsequent assignments, but also evidence of actual incremental benefits in students’ performance from one assignment to the next after VF was provided. The results were puzzling and indicated competing agendas in higher education.

Keywords: higher education, audio-video, feedback, innovation

Procedia PDF Downloads 333
315 Another Beautiful Sounds: Building the Memory of Sound of Peddling in Beijing with Digital Technology

Authors: Dan Wang, Qing Ma, Xiaodan Wang, Tianjiao Qi

Abstract:

The sound of peddling in Beijing, also called “yo-heave-ho” or “cry of one's ware”, is a unique form of folk culture usually found in Beijing's hutongs. For the residents of Beijing, the sound of peddling is part of their childhood. For those who love the traditional culture of Beijing, it is an old song about the local conditions and customs of the ancient city. For example, out of great appreciation, the British poet Osbert Stewart once described the sound of peddling he had heard in Beijing as a street orchestra performance in the article named "Beijing's sound and color". This research aims to collect and integrate the voice/photo resources and historical materials concerning the sound of peddling in Beijing by means of digital technology in order to protect this intangible cultural heritage and pass on the memory of the city. With this goal in mind, the next stage is to collect and record all the materials and resources based on the study of historical documents and interviews with residents or performers. A metadata scheme (which refers to domestic and international standards such as "Audio Data Processing Standards in the National Library", DC, VRA, and CDWA) is then set up to describe, process, and organize the sound of peddling into a database. In order to fully show the traditional culture of the sound of peddling in Beijing, web design and GIS technology are utilized to establish a website, and offline exhibitions and events are planned in which people can simulate and learn the sound of peddling using VR/AR technology. All resources are open to the public, and residents can share the digital memory not only through the offline experiential activities but also through online interaction. With all these efforts, a multimedia narrative platform has been established to record the sound of peddling in old Beijing in multiple dimensions with text, images, audio, video, and so on.

Keywords: sound of peddling, GIS, metadata scheme, VR/AR technology

Procedia PDF Downloads 273
314 Creative Radio Advertising in Turkey

Authors: Mehmet Sinan Erguven

Abstract:

A number of authorities argue that radio is an outdated medium for advertising and does not have the same impact on consumers as it did in the past. This grim outlook on the future of radio has its basis in the audio-visual world that consumers now live in and the popularity of Internet-based marketing tools among advertising professionals. Nonetheless, consumers still appear to overwhelmingly prefer radio as an entertainment tool. Today, in Canada, 90% of all adults (18+) tune into the radio on a weekly basis, and they listen for 17 hours. Teens are the most challenging group for radio to capture as an audience, but still, almost 75% tune in weekly. One online radio station reaches more than 250 million registered listeners worldwide, and revenues from radio advertising in Australia are expected to grow at an annual rate of 3% for the foreseeable future. Radio is also starting to become popular again in Turkey, with a 5% increase in the listening rates compared to 2014. A major matter of concern always affecting radio advertising is creativity. As radio generally serves as a background medium for listeners, the creativity of the radio commercials is important in terms of attracting the attention of the listener and directing their focus on the advertising message. This cannot simply be done by using audio tools like sound effects and jingles. This study aims to identify the creative elements (execution formats, appeals, and approaches) and creativity factors of radio commercials in Turkey. As part of the study, all of the award-winning radio commercials produced throughout the history of the Kristal Elma Advertising Festival were analyzed using the content analysis technique. Two judges (an advertising agency copywriter and an academic) coded the commercials. The reliability was measured according to the proportional agreement.
The results showed that sound effects, jingles, testimonials, slices of life and announcements were the most common execution formats in creative Turkish radio ads. Humor and excitement were the most commonly used creative appeals while award-winning ads featured various approaches, such as surprise musical performances, audio wallpaper, product voice, and theater of the mind. Some ads, however, were found to not contain any creativity factors. In order to be accepted as creative, an ad must have at least one divergence factor, such as originality, flexibility, unusual/empathic perspective, and provocative questions. These findings, as well as others from the study, hold great value for the history of creative radio advertising in Turkey. Today, the nature of radio and its listeners is changing. As more and more people are tuning into online radio channels, brands will need to focus more on this relatively cheap advertising medium in the very near future. This new development will require that advertising agencies focus their attention on creativity in order to produce radio commercials for their customers that will differentiate them from their competitors.
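
The inter-coder reliability measure mentioned above, proportional agreement, is simply the share of items that both judges coded identically. The sketch below assumes one code per commercial per judge; the function name and the example codes are hypothetical and are not data from the study.

```python
def proportional_agreement(codes_a, codes_b):
    """Fraction of items to which two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


# Hypothetical codes for four commercials, for illustration only.
copywriter = ["humor", "jingle", "humor", "testimonial"]
academic = ["humor", "jingle", "excitement", "testimonial"]
print(proportional_agreement(copywriter, academic))
```

Proportional agreement does not correct for agreement expected by chance, which is why chance-corrected statistics are sometimes reported alongside it.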

Keywords: advertising, creativity, radio, Turkey

Procedia PDF Downloads 352
313 A Review of Blog Assisted Language Learning Research: Based on Bibliometric Analysis

Authors: Bo Ning Lyu

Abstract:

Blog assisted language learning (BALL) has been trialed by educators in language teaching with the development of Web 2.0 technology. Understanding the development trend of related research helps grasp the whole picture of the use of blogs in language education. This paper reviews current research related to blog-enhanced language learning based on bibliometric analysis, aiming at (1) identifying the most frequently used keywords and their co-occurrence, (2) clustering research topics based on co-citation analysis, (3) finding the most frequently cited studies and authors and (4) constructing the co-authorship network. Of 330 articles retrieved from Web of Science, 225 peer-reviewed journal papers were finally selected according to the selection criteria. Bibexcel and VOSviewer were used to visualize the results. The studies reviewed were published between 2005 and 2016, most of them in 2014 and 2015 (35 papers each). The top 10 most frequent keywords are learning, language, blog, teaching, writing, social, web 2.0, technology, English, and communication. Eight research themes could be clustered by co-citation analysis: blogging for collaborative learning, blogging for writing skills, blogging in higher education, feedback via blogs, blogging for self-regulated learning, implementation of blogs in the classroom, comparative studies, and audio/video blogs. Early studies focused on the introduction of blogs and their classroom implementation, while recent studies moved from the traditional usage of blogs to audio/video blogs. By reviewing the research related to BALL quantitatively and objectively, this paper reveals the evolution and development trends and identifies influential research, helping researchers and educators quickly grasp this field as a whole and conduct further studies.
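
Step (1) of the analysis, counting keyword frequencies and their co-occurrence, can be sketched in a few lines. This is an illustrative sketch rather than the Bibexcel/VOSviewer workflow used in the paper; the function name, the lower-casing of keywords, and the sample keyword lists are assumptions.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(papers):
    """Count keyword frequencies and pairwise co-occurrences.

    `papers` is a list of keyword lists, one list per paper. Keywords are
    lower-cased and de-duplicated within a paper, so each paper contributes
    at most 1 to any keyword count or keyword-pair count.
    """
    frequency = Counter()
    pairs = Counter()
    for keywords in papers:
        unique = sorted({keyword.lower() for keyword in keywords})
        frequency.update(unique)
        # Sorting first makes each pair appear in one canonical order.
        pairs.update(combinations(unique, 2))
    return frequency, pairs


# Hypothetical keyword lists, for illustration only.
sample = [["blog", "writing"], ["Blog", "feedback", "writing"]]
frequency, pairs = keyword_cooccurrence(sample)
print(frequency.most_common(2))
print(pairs[("blog", "writing")])
```

The pair counts form the weighted edges of a co-occurrence network, which is the kind of structure that network visualization tools then lay out and cluster.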

Keywords: blog, bibliometric analysis, language learning, literature review

Procedia PDF Downloads 181
312 Building a Comprehensive Repository for Montreal Gamelan Archives

Authors: Laurent Bellemare

Abstract:

After the showcase of traditional Indonesian performing arts at the Vancouver Expo 1986, Canadian universities inherited sets of Indonesian gamelan orchestras and soon began offering courses for music students interested in learning these diverse traditions. Among them, Université de Montréal was offered two sets of Balinese orchestras, a novelty that allowed a community of Montreal gamelan enthusiasts to form and engage with this music. A few generations later, a large body of archives has amassed, framing the history of this niche community’s achievements. This data, scattered across public and private archive collections, comes in various formats: Digital Audio Tape, audio cassettes, Video Home System videotape, digital files, photos, reel-to-reel audiotape, posters, concert programs, letters, TV shows, reports, and more. Attempting to study these documents in order to unearth a chronology of gamelan in Montreal has proven challenging, since no suitable platform for preservation, storage, and research currently exists. These files are also hard to find because of their decentralized locations. Additionally, most of the documents in older formats have yet to be digitized. In the case of recent digital files, such as pictures or rehearsal recordings, their locations can be even messier and their quantity overwhelming. Aside from the basic issue of choosing a suitable repository platform, questions of legal rights and methodology arise. For posterity, these documents should nonetheless be digitized, organized, and stored in an easily accessible online repository. This paper aims to underline the various challenges encountered in the early stages of such a project as well as to suggest ways of overcoming the obstacles to a thorough archival investigation.

Keywords: archival work, archives, Balinese gamelan, Canada, Gamelan, Indonesia, Javanese gamelan, Montreal

Procedia PDF Downloads 87
311 Colloquialism in Audiovisual Translation: English Subtitling of the Lebanese Film Capernaum as a Case Study

Authors: Fatima Saab

Abstract:

This paper studies colloquialism in audio-visual translation, with particular emphasis on the difficulties and challenges encountered by subtitlers in translating colloquial Lebanese Arabic into English. To achieve the main objectives of this study, an ample and thorough cultural and translational analysis of examples drawn from the subtitled movie Capernaum is presented in order to identify the strategies used to overcome cultural barriers and differences and to show the translator's decision-making process. Special attention is also given to explaining the technicalities of translating subtitles and how they affect the translation process. The research is a descriptive analytical study in which the writer sets out empirical observations, consisting of a descriptive and analytical examination of the difficulties and problems associated with translating Arabic colloquialisms, specifically Lebanese, into English in the subtitled film Capernaum. The research methodology uses a qualitative approach to group the selected data into the subtitling strategies presented by Gottlieb under the domesticating or foreignizing strategies according to Venuti's model. It is shown that conveying the same meanings to a foreign audience is not an easy task. The background of cultural elements and the stories that make up the history and mindset of the Lebanese and Arabic peoples lead to the use of the transfer and paraphrase strategies most of the time (81% of the sample used for analysis). The research shows that translating and subtitling colloquialisms requires special skills on the part of translators to overcome the challenges imposed by the limited presentation space as well as cultural differences. Translation of colloquial Arabic/Lebanese can be achieved to a certain extent, and the meaning and effect of the source language culture are delivered inasmuch as the translator investigates and relates to the target culture.

Keywords: Lebanese colloquial, audio-visual translation, subtitling, Capernaum

Procedia PDF Downloads 119
310 Linguistic Accessibility and Audiovisual Translation: Corpus Linguistics as a Tool for Analysis

Authors: Juan-Pedro Rica-Peromingo

Abstract:

The important change taking place with respect to the media and the audiovisual world in Europe needs to benefit all populations, in particular those with special needs, such as the deaf and hard-of-hearing population, served by subtitling for the deaf and hard-of-hearing (SDH), and the blind and partially-sighted population, served by audio description (AD). This recent interest in the field of audiovisual translation (AVT) can be observed in the teaching and learning of the different modes of AVT in the degree and post-degree courses at Spanish universities, which expand the interest in and practice of AVT linguistic accessibility. We present a research project led at the UCM which consists of the compilation of AVT activities for teaching purposes and tries to analyze the creation and reception of SDH and AD: the AVLA Project (Audiovisual Learning Archive), which includes audiovisual materials produced by university students in different AVT modes and evaluations from blind and deaf informants. In this study, we present the materials created by the students. A group of deaf and blind people has been in charge of testing the students’ SDH and AD corpus of audiovisual materials through questionnaires used to evaluate the students’ production. These questionnaires include information about the reception of the subtitles and the audio descriptions from linguistic and technical points of view. With all the materials compiled in the research project, a corpus with both the students’ production and the recipients’ evaluations is being compiled: the CALING (Corpus de Accesibilidad Lingüística) corpus. Preliminary results will be presented with respect to those aspects, difficulties, and deficiencies in the SDH and AD included in the corpus, specifically with respect to the length of subtitles, the position of the contextual information on the screen, and the text included in the audio descriptions and the tone of voice used. These results may suggest some changes and improvements in the quality of the SDH and AD analyzed.
In the end, demand for the teaching and learning of AVT and linguistic accessibility at a university level and some important changes in the norms which regulate SDH and AD nationally and internationally will be suggested.

Keywords: audiovisual translation, corpus linguistics, linguistic accessibility, teaching

Procedia PDF Downloads 47
309 The Relationship between Spindle Sound and Tool Performance in Turning

Authors: N. Seemuang, T. McLeay, T. Slatter

Abstract:

Worn tools have a direct effect on surface finish and part accuracy. Tool condition monitoring systems have been developed over a long period and are used to avoid the loss of productivity that results from using a worn tool. However, the majority of tool monitoring research has applied expensive sensing systems not suitable for production. In this work, the cutting sound in a turning machine was studied using a microphone. Machining trials using seven cutting conditions were conducted until the observable flank wear width (FWW) on the main cutting edge exceeded 0.4 mm. The cutting inserts were removed from the tool holder and the flank wear width was measured optically. A microphone with a built-in preamplifier was used to record the machining sound of EN24 steel being face turned by a CNC lathe in a wet cutting condition using constant surface speed control. The sound was sampled at 50 kS/s, and all sound signals recorded from the microphone were transformed into the frequency domain by FFT in order to establish the frequency content in the audio signature that could then be used for tool condition monitoring. The feature extracted from the audio signal was compared to the flank wear progression on the cutting inserts. The spectrogram reveals a promising feature, named ‘spindle noise’, which is emitted by the main spindle motor of the turning machine. The spindle noise frequency was detected at 5.86 kHz regardless of the cutting conditions used on this particular CNC lathe. Varying the cutting speed and feed rate influences the magnitude of the power spectrum of the spindle noise. The magnitude at the spindle noise frequency changes in conjunction with the tool wear progression. The magnitude increases significantly in the transition state between steady-state wear and severe wear. This could be used as a warning signal to prepare for tool replacement or to adapt cutting parameters to extend tool life.
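As an illustration of tracking the magnitude of one spectral component, the sketch below evaluates a direct single-bin DFT at 5.86 kHz for a signal sampled at 50 kS/s. It is not the authors' processing chain, which used an FFT over the full spectrum; the synthetic test signal, the function names, and the 0.1 s window length are assumptions for the example.

```python
import cmath
import math

SAMPLE_RATE_HZ = 50_000   # 50 kS/s, as stated in the abstract
SPINDLE_FREQ_HZ = 5_860   # 5.86 kHz spindle-noise component


def dft_bin_magnitude(samples, freq_hz, sample_rate_hz):
    """Amplitude of one frequency component via a direct single-bin DFT."""
    n = len(samples)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
        for k, x in enumerate(samples)
    )
    # Factor 2/n converts the one-sided bin value to a sine amplitude.
    return 2.0 * abs(acc) / n


def synthetic_cut(spindle_amplitude, n=5000):
    """Hypothetical stand-in for a recording: spindle tone plus a 1 kHz tone."""
    return [
        spindle_amplitude * math.sin(2 * math.pi * SPINDLE_FREQ_HZ * k / SAMPLE_RATE_HZ)
        + 0.5 * math.sin(2 * math.pi * 1000 * k / SAMPLE_RATE_HZ)
        for k in range(n)
    ]


if __name__ == "__main__":
    for amplitude in (0.1, 0.2, 0.4):  # invented values mimicking wear progression
        signal = synthetic_cut(amplitude)
        print(amplitude, dft_bin_magnitude(signal, SPINDLE_FREQ_HZ, SAMPLE_RATE_HZ))
```

The window of 5000 samples holds a whole number of cycles of both tones, so the 1 kHz component does not leak into the 5.86 kHz bin. A real recording would normally need a window function to limit leakage.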

Keywords: tool wear, flank wear, condition monitoring, spindle noise

Procedia PDF Downloads 303
308 Digi-Buddy: A Smart Cane with Artificial Intelligence and Real-Time Assistance

Authors: Amaladhithyan Krishnamoorthy, Ruvaitha Banu

Abstract:

Vision is considered the most important human sense, and leading a normal life without it can often be difficult. Many existing smart canes for the visually impaired use an ultrasonic transducer for obstacle detection to help users navigate. Although the basic smart cane increases the safety of its users, it does not help fill the void left by visual loss. This paper introduces the concept of Digi-Buddy, an evolved smart cane for the visually impaired. The cane consists of several modules. Apart from the basic obstacle detection features, the Digi-Buddy assists the user by capturing video/images with a wide-angle camera and streaming them to a server, which then detects objects using a Deep Convolutional Neural Network. In addition to determining what the particular image/object is, the distance of the object is assessed by the ultrasonic transducer. A sound generation application, modelled with the help of Natural Language Processing, converts the processed images/objects into audio. The detected object is identified by its name, which is transmitted to the user via Bluetooth headphones. Object detection is extended to facial recognition, which matches the faces of people the user meets against a database of face images and alerts the user to who the person is. Another crucial function is an automatic intimation alarm, which is triggered when the user is in an emergency. If the user recovers within a set time, a button provided on the cane stops the alarm. Otherwise, an automatic intimation about the user's whereabouts, obtained via GPS, is sent to friends and family. In addition to the safety and security offered by existing smart canes, the proposed concept, to be implemented as a prototype, helps the visually impaired visualize their surroundings through audio in a more amicable way.
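The distance measurement by ultrasonic transducer can be sketched as a time-of-flight calculation combined with the detected object name. This is a minimal sketch under stated assumptions: the speed of sound is taken as roughly 343 m/s (air at about 20 °C), and the function names and message format are hypothetical rather than taken from the Digi-Buddy design.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at about 20 degrees C (assumption)


def echo_distance_m(round_trip_s):
    """Distance to an obstacle from the ultrasonic echo round-trip time."""
    # The pulse travels to the object and back, hence the division by two.
    return SPEED_OF_SOUND_M_S * round_trip_s / 2.0


def announcement(label, round_trip_s):
    """Hypothetical text that a text-to-speech stage could read out."""
    return f"{label}, {echo_distance_m(round_trip_s):.1f} metres ahead"


if __name__ == "__main__":
    print(announcement("chair", 0.010))
```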

Keywords: artificial intelligence, facial recognition, natural language processing, internet of things

Procedia PDF Downloads 317
307 Audio-Visual Co-Data Processing Pipeline

Authors: Rita Chattopadhyay, Vivek Anand Thoutam

Abstract:

Speech is the most widely accepted means of communication, allowing us to exchange feelings and thoughts quickly. Quite often, people can communicate orally but cannot interact or work with computers or devices. It is easier and quicker to give speech commands than to type commands into a computer. In the same way, it is easier to listen to audio played from a device than to extract output from computers or devices. Especially with robotics being an emerging market with applications in warehouses, the hospitality industry, consumer electronics, assistive technology, etc., speech-based human-machine interaction is emerging as a lucrative feature for robot manufacturers. Considering this factor, the objective of this paper is to design the “Audio-Visual Co-Data Processing Pipeline.” This pipeline integrates automatic speech recognition, a natural language model for text understanding, object detection, and text-to-speech modules. There are many deep learning models for each type of module mentioned above, but OpenVINO Model Zoo models are used because the OpenVINO toolkit covers both computer vision and non-computer vision workloads across Intel hardware, maximizes performance, and accelerates application development. A speech command is given as input; it contains information about the target objects to be detected and the start and end times used to extract the required interval from the video. Speech is converted to text using the QuartzNet automatic speech recognition model. A summary is extracted from the text using the natural language model Generative Pre-Trained Transformer-3 (GPT-3). Based on the summary, essential frames are extracted from the video, and the You Only Look Once (YOLO) object detection model detects objects in these extracted frames. The numbers of the frames that contain target objects (the objects specified in the speech command) are saved as text. Finally, this text (the frame numbers) is converted to speech using a text-to-speech model and played from the device. This project is developed for 80 YOLO labels, and the user can extract frames based on only one or two target labels. The pipeline can easily be extended to more than two target labels by making appropriate changes in the object detection module. The project supports four different speech command formats, achieved by including sample examples in the prompt used by the GPT-3 model. Based on user preference, one can define a new speech command format by including some examples of that format in the prompt used by the GPT-3 model. This pipeline can be used in many projects, such as human-machine interfaces, human-robot interaction, and surveillance through speech commands. All object detection projects can be upgraded using this pipeline so that one can give speech commands and have the output played from the device.
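The frame-selection step described above can be sketched as follows, leaving the speech recognition, GPT-3, YOLO, and text-to-speech stages out. The per-frame detections are represented by a plain dictionary standing in for detector output, and the function names, the fixed frame rate, and the "any target label" matching rule are assumptions made only for this example.

```python
def frame_range(start_s, end_s, fps):
    """Frame indices covering the interval [start_s, end_s] at a fixed frame rate."""
    return range(int(start_s * fps), int(end_s * fps) + 1)


def frames_with_targets(detections, targets, start_s, end_s, fps):
    """Frame numbers in the interval whose detected labels include any target.

    `detections` is a hypothetical mapping: frame index -> list of labels.
    """
    wanted = set(targets)
    return [
        i
        for i in frame_range(start_s, end_s, fps)
        if wanted & set(detections.get(i, ()))
    ]


if __name__ == "__main__":
    fake_detections = {30: ["person"], 45: ["dog", "car"], 90: ["dog"]}
    frames = frames_with_targets(fake_detections, ["dog"], 1.0, 2.0, fps=30)
    # The resulting text is what a text-to-speech stage would read out.
    print("Target found in frames: " + ", ".join(map(str, frames)))
```

Supporting more than two target labels only requires passing a longer `targets` list here. In the pipeline described in the abstract, the corresponding change sits in the object detection module.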

Keywords: OpenVINO, automatic speech recognition, natural language processing, object detection, text to speech

Procedia PDF Downloads 44
306 Causes and Consequences of Intuitive Animal Communication: A Case Study at Panthera Africa

Authors: Cathrine Scharning Cornwall-Nyquist, David Rafael Vaz Fernandes

Abstract:

Since its origins, mankind has been dreaming of communicating directly with other animals. Past civilizations interacted on different levels with other species and recognized them in their rituals and daily activities. However, recent scientific developments have limited the ability of humans to consider deeper levels of interaction beyond observation and/or physical behavior. In recent years, animal caretakers and facilities such as sanctuaries or rescue centers have been introducing new techniques based on intuition. Most of those initiatives are related to specific cases, such as the incapacity to understand an animal’s behavior. Respected organizations also include intuitive animal communication (IAC) sessions to follow up on past interventions with their animals. Despite the lack of credibility of this discipline, some animal caring structures have opted to integrate IAC into their daily routines and approaches to animal welfare. At this stage, animal communication will be generally defined as the ability of humans to communicate with animals on an intuitive level. The trend in the field remains to be explored. The lack of theory and previous research urges the scientific community to improve the description of the phenomenon and its consequences. Considering the current scenario, qualitative approaches may become a suitable pathway to explore this topic. The purpose of this case study is to explore the beliefs behind and the consequences of an approach based on intuitive animal communication techniques for Panthera Africa (PA), an ethical sanctuary located in South Africa. Due to their personal experience, the Sanctuary’s founders have developed a philosophy based on IAC while respecting the world's highest standards for big cat welfare. Their dual approach is reflected in their rescues, daily activities, and healing animals’ trauma. The case study's main research questions will be: (i) Why do they choose to apply IAC in their work? 
(ii) What consequences does IAC bring to their activities? (iii) What effects do IAC techniques have on their interactions with the outside world? Data will be collected on-site via: (i) Complete participation (field notes); (ii) Semi-structured interviews (audio transcriptions); (iii) Document analysis (internal procedures and policies); (iv) Audio-visual material (communication with third parties). The main researcher shall become an active member of the Sanctuary during a 30-day period and have full access to the site. Access to documents and audio-visual materials will be granted on a request basis. Interviews are expected to be held with PA founders and staff members and with IAC practitioners related to the facility. The information gathered shall enable the researcher to provide an extended description of the phenomenon and explore its internal and external consequences for Panthera Africa.

Keywords: animal welfare, intuitive animal communication, Panthera Africa, rescue

Procedia PDF Downloads 62
305 Sound Analysis of Young Broilers Reared under Different Stocking Densities in Intensive Poultry Farming

Authors: Xiaoyang Zhao, Kaiying Wang

Abstract:

The choice of stocking density in poultry farming is a potential way of determining the welfare level of poultry. However, it is difficult to measure stocking densities in poultry farming because many variables, such as species, age and weight, feeding method, house structure, and geographical location, differ between broiler houses. This paper proposes a method that uses sound analysis to measure the differences between young broilers reared under different stocking densities. Vocalisations of broilers were recorded and analysed under different stocking densities to identify the relationship between sounds and stocking densities. Recordings were made continuously for three-week-old chickens in order to evaluate the variation of sounds emitted by the animals at the beginning. The experimental trial was carried out in an indoor-reared broiler farm; the audio recording procedures lasted for 5 days. Broilers were divided into 5 groups with stocking density treatments of 8/m², 10/m², 12/m² (96 birds/pen), 14/m², and 16/m²; all conditions, including ventilation and feed, were kept the same in every group except for stocking density. The recording and analysis of chicken sounds were performed noninvasively. Sound recordings were manually analysed and labelled using the sound analysis software GoldWave Digital Audio Editor. After sound acquisition, Mel Frequency Cepstrum Coefficients (MFCC) were extracted from the sound data, and a Support Vector Machine (SVM) was used as an early detector and classifier. This preliminary study, conducted in an indoor-reared broiler farm, shows that the method can classify the sounds of chickens under different densities economically (only a cheap microphone and recorder are needed), with a classification accuracy of 85.7%. Complemented by animal welfare indicators, animal productivity indicators, and so on, this method can predict the optimum stocking density of broilers.
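MFCC extraction rests on the mel frequency scale, which spaces filters more densely at low frequencies. The sketch below shows only that one ingredient: a commonly used Hz-to-mel conversion and the centre frequencies of an evenly mel-spaced filterbank. The abstract does not state which mel formula, filter count, or frequency range was used, so these choices are assumptions, and the remaining MFCC stages (framing, power spectrum, log, DCT) and the SVM are omitted.

```python
import math


def hz_to_mel(f_hz):
    """One common Hz-to-mel formula (other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_centres(n_filters, f_min_hz, f_max_hz):
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Hypothetical settings: 10 filters between 100 Hz and 8 kHz.
    for centre in mel_centres(10, 100.0, 8000.0):
        print(round(centre, 1))
```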

Keywords: broiler, stocking density, poultry farming, sound monitoring, Mel Frequency Cepstrum Coefficients (MFCC), Support Vector Machine (SVM)

Procedia PDF Downloads 115
304 Using E-learning in a Tertiary Institution during Community Outbreak of COVID-19 in Hong Kong

Authors: Susan Ka Yee Chow

Abstract:

The coronavirus disease (COVID-19) reached Hong Kong in 2019, resulting in an epidemic in late January 2020. Considering the development of the epidemic, tertiary institutions announced that all on-campus classes were suspended from January 29, 2020. In Tung Wah College, e-learning was adopted in all courses for all programmes. For undergraduate nursing students, the contact hours and curriculum are regulated by the Nursing Council of Hong Kong to ensure core competence after graduation. Unlike the usual e-learning, where students are allowed flexibility of time and place in their learning, a real-time learning mode using Blackboard was used to mimic the actual classroom learning environment. Students were required to attend classes according to the timetable using the online platform. For lectures, voice-over PowerPoint files were the initial step for mass lecturing. Real-time lectures were then adopted to improve interactions between teacher and students. Post-lecture quizzes were developed to monitor the effectiveness of lecture delivery. The seminars and tutorials were conducted in real-time mode, with students separated into small groups for interactive discussions with the teacher within the group. Live demonstrations were conducted during laboratory sessions. All teaching sessions were audio/video recorded for students’ reference. The assessments, including seminar presentations and debates, were retained. The learning mode allows students to display their visual, audio, and written works in a non-threatening atmosphere. Other students could comment using text or direct voice as they desired. Real-time online learning is a pedagogy that can replace classroom contact in emergent and unforeseeable circumstances. The learning pace and the interaction among students and between students and teacher are maintained. The learning mode has the advantage of creating an effective and beneficial learning experience.

Keywords: e-learning, nursing curriculum, real time mode, teaching and learning

Procedia PDF Downloads 90
303 The Current Level of Shared Decision-Making in Head-And-Neck Oncology: An Exploratory Study – Preliminary Results

Authors: Anne N. Heirman, Song Duimel, Rob van Son, Lisette van der Molen, Richard Dirven, Gyorgi B. Halmos, Julia van Weert, Michiel W.M. van den Brekel

Abstract:

Objectives: Treatments for head-and-neck cancer are drastic and often significantly impact the quality of life and appearance of patients. Shared decision-making (SDM) entails a collaboration between patient and doctor in which the most suitable treatment can be chosen by integrating patient preferences, values, and medical information. SDM has many advantages that would be useful in making difficult treatment choices. The objective of this study was to determine the current level of SDM among patients and head-and-neck surgeons. Methods: Consultations of patients with a non-cutaneous head-and-neck malignancy facing a treatment decision were selected and included. If informed consent was given, the consultation was recorded with an audio recorder, and the patient and surgeon filled in a questionnaire immediately after the consultation. The SDM level of the consultation was scored objectively by independent observers, who judged audio recordings of the consultation using the OPTION5 scale, ranging from 0% (no SDM) to 100% (optimum SDM), as well as subjectively by patients (using the SDM-Q-9 and Control Preference Scale) and clinicians (SDM-Q-Doc and modified Control Preference Scale), expressed as percentages. Preliminary results: Five head-and-neck surgeons each have at least seven recorded conversations with different patients. One of them was trained in SDM. The other four had no experience with SDM. Most patients were male (74%), and oropharyngeal carcinoma was the most common diagnosis (41%), followed by oral cancer (33%). Five patients received palliative treatment, of whom two were not treated according to guidelines. At this moment, all recordings are scored by the two independent observers. Analysis of the results will follow soon. Conclusion: The current study will determine to what extent there is a discrepancy between the objective and subjective level of shared decision-making (SDM) during a doctor-patient consultation in head-and-neck surgery.
The results of the analysis will follow shortly.

Keywords: head-and-neck oncology, patient involvement, physician-patient relations, shared decision making

Procedia PDF Downloads 71
302 Analyzing the Sound of Space - The Glissando of the Planets and the Spiral Movement on the Sound of Earth, Saturn and Jupiter

Authors: L. Tonia, I. Daglis, W. Kurth

Abstract:

The sound of the universe has an affinity with the sounds of music. The analysis of the sound of space focuses on the existence of tone material, the microstructure and macrostructure, and the form of the sound, using the signals recorded during the flights of the Van Allen Probes spacecraft and the Cassini mission. The sound derives from the frequencies of electromagnetic waves. The Plasma Wave Science Instrument and the Electric and Magnetic Field Instrument Suite and Integrated Science (EMFISIS) recorded the signals from space. Transforming those signals into audio made it possible to study and analyze the sound. Because every musical pitch has a frequency and every electromagnetic wave has a frequency too, creating a musical score that represents the sound of space can give information about the form, the symmetry, and the harmony of the sound. The conversion of space radio emissions to audio provides a number of tone pitches corresponding to the original frequencies. By processing these sounds, we are able to present a music score “composed” from space. In this score, we can see some basic features associated with the musical form, the structure, the tone center of the musical material, and the construction and deconstruction of the sound. The structure, which is built through a harmonic world, includes tone centers, major and minor scales, sequences of chords, and types of cadences. The form of the sound represents the symmetry of a spiral movement at both the micro-structural and the macro-structural level. Multiple glissando sounds, in linear and polyphonic processes of the sound, were found in the magnetic fields around Earth, Saturn, and Jupiter, and a spiral movement also appeared on the spectrogram of the sound. Whistles, Auroral Kilometric Radiations, and Chorus emissions reveal movements similar to musical excerpts of works by contemporary composers like Sofia Gubaidulina, Iannis Xenakis, and Einojuhani Rautavaara.
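The correspondence between a frequency and a tone pitch can be illustrated with the standard equal-temperament mapping. This is only a sketch of that one relationship, not the authors' transcription procedure; the A4 = 440 Hz reference and the function name are assumptions.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_pitch(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note name (e.g. 'A4') for a frequency in Hz."""
    # MIDI note 69 is A4; each semitone is a factor of 2**(1/12) in frequency.
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


if __name__ == "__main__":
    # A rising sweep, as in a glissando, passes through successive pitches.
    for f in (220.0, 261.63, 329.63, 440.0, 880.0):
        print(f, nearest_pitch(f))
```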

Keywords: space sound analysis, spiral, space music, analysis

Procedia PDF Downloads 143
301 Older Adults’ Coping during a Pandemic

Authors: Aditya Jayadas

Abstract:

During a pandemic such as COVID-19, older adults, especially those who live in a senior retirement facility, experience even greater challenges, as they are often dependent on other individuals for care. Many older adults depend on caregivers to assist with their instrumental activities of daily living (IADL). With travel restrictions imposed during a pandemic, there is a critical need to ensure that older adults who are homebound continue to be able to participate in physical exercise, cognitive exercise, and social interaction programs. The objective of this study was to better understand the challenges that older adults faced during the pandemic and what they were doing specifically to cope with the pandemic physically, mentally, and through social interaction. A focus group was conducted with ten older adults (age: 82.70 ± 7.81 years; nine female and one male) who resided in a senior retirement facility. During the course of one hour, seven open-ended questions were posed to the participants: a) What has changed in your life since the start of the pandemic? b) What has been most challenging for you? c) What are you doing to take care of yourself? d) Are you doing anything specifically as it relates to your physical health? e) Are you doing anything specifically as it relates to your mental health? f) What did you do for social interaction during the pandemic? g) Is there anything else you would like to share as it relates to your experience during the pandemic? The focus group session was audio-taped, and verbatim transcripts were created to evaluate the responses of the participants. The transcript consisted of 4,698 words and 293 lines of text. The data were analyzed using content analysis. The unit of analysis was the transcribed text from the audio recordings. From the review of the transcribed text, themes and sub-themes were identified, along with salient quotes under each sub-theme.
The major themes that emerged from the data were: having a routine, engaging in activities, attending exercise classes, use of technology, family, community, and prayer. The quotes under the sub-themes provided compelling evidence of how older adults coped during the pandemic while addressing the challenges they faced and developing strategies to address their physical and mental health while interacting with others. Lessons learned from this focus group can be used to develop specific physical exercise, cognitive exercise, and social interaction programs that benefit the health and well-being of older adults.

Keywords: cognitive exercise, pandemic, physical exercise, social interaction

Procedia PDF Downloads 41
300 Statistical Investigation Projects: A Way for Pre-Service Mathematics Teachers to Actively Solve a Campus Problem

Authors: Muhammet Şahal, Oğuz Köklü

Abstract:

As statistical thinking and problem-solving processes have become increasingly important, teachers need to be more rigorously prepared with statistical knowledge to teach their students effectively. This study examined preservice mathematics teachers' development of statistical investigation projects using data and exploratory data analysis tools, following a design-based research perspective and statistical investigation cycle. A total of 26 pre-service senior mathematics teachers from a public university in Turkiye participated in the study. They formed groups of 3-4 members voluntarily and worked on their statistical investigation projects for six weeks. The data sources were audio recordings of pre-service teachers' group discussions while working on their projects in class, whole-class video recordings, and each group’s weekly and final reports. As part of the study, we reviewed weekly reports, provided timely feedback specific to each group, and revised the following week's class work based on the groups’ needs and development in their project. We used content analysis to analyze groups’ audio and classroom video recordings. The participants encountered several difficulties, which included formulating a meaningful statistical question in the early phase of the investigation, securing the most suitable data collection strategy, and deciding on the data analysis method appropriate for their statistical questions. The data collection and organization processes were challenging for some groups and revealed the importance of comprehensive planning. Overall, preservice senior mathematics teachers were able to work on a statistical project that contained the formulation of a statistical question, planning, data collection, analysis, and reaching a conclusion holistically, even though they faced challenges because of their lack of experience. 
The study suggests that preservice senior mathematics teachers have the potential to apply statistical knowledge and techniques in a real-world context, and they could proceed with the project with the support of the researchers. We provided implications for the statistical education of teachers and future research.

Keywords: design-based study, pre-service mathematics teachers, statistical investigation projects, statistical model

Procedia PDF Downloads 42
299 A Simulation-Based Study of Dust Ingression into Microphone of Indoor Consumer Electronic Devices

Authors: Zhichao Song, Swanand Vaidya

Abstract:

Nowadays, most portable (e.g., smartphones) and wearable (e.g., smartwatches and earphones) consumer hardware is designed to be dustproof following IP5 or IP6 ratings, so that the product can handle potentially dusty outdoor environments. The design guidelines for indoor devices (e.g., smart displays and speakers), on the other hand, are relatively vague. While the indoor environment is generally believed to be much less dusty, in certain circumstances dust ingression can still cause functional failures, such as microphone frequency response shift and camera black spots, or cosmetic dissatisfaction, mainly dust build-up in visible pockets and gaps that are hard to clean. In this paper, we developed a simulation methodology to analyze dust settlement and ingression into known ports of a device. A closed system is initialized with dust particles whose sizes follow a Weibull distribution based on data collected in a user study, and dust particle movement is approximated as settlement in a stationary fluid, which is governed by Stokes’ law. Following this method, we simulated dust ingression into a MEMS microphone through the acoustic port and protective mesh. Various design and environmental parameters are evaluated, including mesh pore size, acoustic port depth-to-diameter ratio, mass density of the dust material, and inclination angle of the microphone port. The dependence of dust resistance on each of these parameters is monotonic: a smaller mesh pore size, a larger acoustic port depth-to-diameter ratio, and a more inclined microphone placement (towards the horizontal direction) are preferred for dust resistance. These preferences, however, may involve trade-offs in audio performance and compromises in industrial design. The simulation results suggest the quantitative ranges of these parameters over which the improvement in dust resistance is more pronounced. Based on the simulation results, we proposed several design guidelines intended to achieve an overall balance among audio performance, dust resistance, and flexibility in industrial design.
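
The settling model named in the abstract can be illustrated with a minimal sketch: particle diameters are drawn from a Weibull distribution, and each particle's terminal velocity in still air follows Stokes' law, v = (ρp − ρf) g d² / (18 μ). The air properties, Weibull scale and shape values, and all function names below are assumptions chosen for illustration; they are not taken from the paper, and the mesh check is only a crude size comparison rather than the authors' simulation.

```python
import random

G = 9.81                # gravitational acceleration, m/s^2
AIR_DENSITY = 1.2       # kg/m^3, assumed value for room-temperature air
AIR_VISCOSITY = 1.8e-5  # Pa*s, assumed dynamic viscosity of air


def stokes_settling_velocity(diameter_m, particle_density):
    """Terminal velocity (m/s) of a small sphere settling in still air.

    Stokes' law holds only at low Reynolds number, i.e. for small particles.
    """
    buoyant_density = particle_density - AIR_DENSITY
    return buoyant_density * G * diameter_m ** 2 / (18.0 * AIR_VISCOSITY)


def sample_diameters(n, scale_m, shape, seed=0):
    """Draw n particle diameters (m) from a Weibull distribution.

    scale_m and shape are hypothetical; the paper fits them to user-study data.
    """
    rng = random.Random(seed)
    return [rng.weibullvariate(scale_m, shape) for _ in range(n)]


def fraction_passing_mesh(diameters, pore_size_m):
    """Fraction of particles smaller than the mesh pore (size check only)."""
    return sum(d < pore_size_m for d in diameters) / len(diameters)


if __name__ == "__main__":
    diameters = sample_diameters(1000, scale_m=20e-6, shape=1.5)
    for pore in (10e-6, 20e-6, 40e-6):
        print(f"pore {pore * 1e6:.0f} um: "
              f"{fraction_passing_mesh(diameters, pore):.2%} of particles fit")
    print("10 um particle, density 1000 kg/m^3:",
          stokes_settling_velocity(10e-6, 1000.0), "m/s")
```

Because the velocity scales with the square of the diameter, doubling the particle size quadruples the settling speed, which is one reason the size distribution matters in such a model.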

Keywords: dust settlement, numerical simulation, microphone design, Weibull distribution, Stokes' equation

Procedia PDF Downloads 81
298 A Two-Step Framework for Unsupervised Speaker Segmentation Using BIC and Artificial Neural Network

Authors: Ahmad Alwosheel, Ahmed Alqaraawi

Abstract:

This work proposes a new speaker segmentation approach for two speakers. It is an online approach that does not require prior information about speaker models. It has two phases: in the first phase, a conventional approach, such as unsupervised BIC-based segmentation, is used to detect speaker changes and train a neural network; in the second phase, the trained parameters of the neural network are used to predict the next incoming audio stream. With this approach, accuracy comparable to that of similar BIC-based approaches is achieved, with a significant improvement in computation time.
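
The first phase relies on BIC-based change detection, which can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes one-dimensional features modelled by a single Gaussian per segment, whereas practical systems typically use multi-dimensional features such as MFCCs with full covariance matrices. The function names, the `margin` parameter, and the `penalty_weight` parameter are hypothetical.

```python
import math
import random


def _log_variance(values):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.log(max(variance, 1e-12))


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for splitting `frames` at index `split` (1-D Gaussian models).

    A positive value favours two separate models, i.e. a speaker change.
    """
    n = len(frames)
    left, right = frames[:split], frames[split:]
    gain = 0.5 * (n * _log_variance(frames)
                  - len(left) * _log_variance(left)
                  - len(right) * _log_variance(right))
    # The two-model hypothesis has 2 extra parameters (one mean, one variance).
    penalty = 0.5 * penalty_weight * 2 * math.log(n)
    return gain - penalty


def best_change_point(frames, margin=10, penalty_weight=1.0):
    """Return (index, score) of the best split, or (None, score) if none is positive."""
    candidates = range(margin, len(frames) - margin)
    best = max(candidates, key=lambda t: delta_bic(frames, t, penalty_weight))
    score = delta_bic(frames, best, penalty_weight)
    return (best, score) if score > 0 else (None, score)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "two-speaker" signal: the feature mean shifts at frame 100.
    frames = ([rng.gauss(0.0, 1.0) for _ in range(100)]
              + [rng.gauss(5.0, 1.0) for _ in range(100)])
    print(best_change_point(frames))
```

In a two-phase scheme like the one described, change points found this way could serve as training labels for the neural network, so that the costlier BIC search is not repeated on every incoming window.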

Keywords: artificial neural network, diarization, speaker indexing, speaker segmentation

Procedia PDF Downloads 465
297 Semi-Supervised Learning for Spanish Speech Recognition Using Deep Neural Networks

Authors: B. R. Campomanes-Alvarez, P. Quiros, B. Fernandez

Abstract:

Automatic Speech Recognition (ASR) is a machine-based process of decoding and transcribing oral speech. A typical ASR system receives acoustic input from a speaker or an audio file, analyzes it using algorithms, and produces output in the form of text. Some speech recognition systems use Hidden Markov Models (HMMs) to deal with the temporal variability of speech and Gaussian Mixture Models (GMMs) to determine how well each state of each HMM fits a short window of frames of coefficients that represents the acoustic input. Another way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition systems. Acoustic models for state-of-the-art ASR systems are usually trained on massive amounts of data. However, audio files with their corresponding transcriptions can be difficult to obtain, especially in the Spanish language. Hence, in these low-resource scenarios, building an ASR model is considered a complex task because the lack of labeled data results in an under-trained system. Semi-supervised learning approaches therefore become necessary, given the high cost of transcribing audio data. The main goal of this proposal is to develop a procedure based on acoustic semi-supervised learning for Spanish ASR systems using DNNs. This semi-supervised learning approach consists of: (a) Training a seed ASR model with a DNN using a set of audio recordings and their respective transcriptions. The DNN was initialized with one hidden layer, and the number of hidden layers was increased to five during training. A refinement, which consisted of the weight matrix plus bias term, and Stochastic Gradient Descent (SGD) training were also performed. The objective function was the cross-entropy criterion. (b) Decoding/testing a set of unlabeled data with the obtained seed model. (c) Selecting a suitable subset of the validated data to retrain the seed model, thereby improving its performance on the target test set. To choose the most precise transcriptions, three lattice-based confidence scores or metrics (based on the graph cost, the acoustic cost, and a combination of both) were used as the selection technique. The performance of the ASR system was calculated by means of the Word Error Rate (WER). The test dataset was renewed in order to extract the new transcriptions added to the training dataset. Several experiments were carried out in order to select the best ASR results. A comparison between a GMM-based model without retraining and the proposed DNN system was also made under the same conditions. Results showed that the semi-supervised ASR model based on DNNs outperformed the GMM model, in terms of WER, in all tested cases. The best result was a 6% relative improvement in WER. Hence, these promising results suggest that the proposed technique could be suitable for building ASR models in low-resource environments.
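
The evaluation metric mentioned above, the Word Error Rate, can be computed as the word-level edit distance between a reference transcription and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation tooling used by the authors; the function names and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Dynamic-programming edit distance over words, keeping one row at a time.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + substitution_cost))
        previous = current
    return previous[-1] / len(ref_words)


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative improvement of new_wer over baseline_wer (0.06 means 6%)."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    print(word_error_rate("el gato come pescado", "el gato comió pescado"))
    print(relative_wer_reduction(0.50, 0.47))
```

A relative improvement is measured against the baseline WER, so a 6% relative gain from a hypothetical baseline of 50% WER corresponds to an absolute WER of 47%, not 44%.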

Keywords: automatic speech recognition, deep neural networks, machine learning, semi-supervised learning

Procedia PDF Downloads 315