Search results for: audio fingerprinting
281 Robust Features for Impulsive Noisy Speech Recognition Using Relative Spectral Analysis
Authors: Hajer Rahali, Zied Hajaiej, Noureddine Ellouze
Abstract:
The goal of speech parameterization is to extract the relevant information about what is being spoken from the audio signal. In speech recognition systems, Mel-Frequency Cepstral Coefficients (MFCC) and Relative Spectral Mel-Frequency Cepstral Coefficients (RASTA-MFCC) are the two main techniques used. This paper presents some modifications to the original MFCC method. The effectiveness of the proposed changes to MFCC, called Modified Function Cepstral Coefficients (MODFCC), was tested and compared against the original MFCC and RASTA-MFCC features. Prosodic features such as jitter and shimmer are added to the baseline spectral features. The above-mentioned techniques were tested with impulsive signals under various noisy conditions within the AURORA databases.
Keywords: auditory filter, impulsive noise, MFCC, prosodic features, RASTA filter
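The abstract above names jitter and shimmer as prosodic features but does not say how they are computed. The following is a minimal sketch under the common "local" definition (mean absolute difference between consecutive cycles, divided by the mean), which may differ from the paper's own variant. The function names and the input format (lists of pitch periods and per-cycle peak amplitudes) are assumptions made for illustration.

```python
from statistics import mean


def _relative_cycle_variation(values):
    """Mean absolute difference of consecutive values, divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods_s):
    """Cycle-to-cycle variation of the pitch period (hypothetical helper)."""
    return _relative_cycle_variation(periods_s)


def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (hypothetical helper)."""
    return _relative_cycle_variation(peak_amplitudes)


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    periods = [0.010, 0.011, 0.010, 0.011]
    amplitudes = [0.80, 0.78, 0.82, 0.79]
    print(f"jitter  = {local_jitter(periods):.4f}")
    print(f"shimmer = {local_shimmer(amplitudes):.4f}")
```

Both values are dimensionless ratios and could be appended to a per-utterance feature vector alongside the cepstral coefficients.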
Procedia PDF Downloads 425
280 Characterization of Internet Exchange Points by Using Quantitative Data
Authors: Yamba Dabone, Tounwendyam Frédéric Ouedraogo, Pengwendé Justin Kouraogo, Oumarou Sie
Abstract:
Reliable data transport over the Internet is one of the goals of researchers in the field of computer science. Data such as video and audio files are becoming increasingly large, and as a result, transporting them over the Internet is becoming difficult. It has therefore been important to establish a method to locally interconnect autonomous systems (AS) to facilitate traffic exchange. It is in this context that Internet Exchange Points (IXPs) are set up to facilitate local and even regional traffic. They are now the lifeblood of the Internet. Therefore, it is important to think about the factors that can characterize IXPs. However, other more quantifiable characteristics can help determine the quality of an IXP. In addition, these characteristics may allow ISPs to have a clearer view of the exchange node and may also convince other networks to connect to an IXP. To that end, we define five new IXP characteristics: the attraction rate (τₐₜₜᵣ); the peering rate (τₚₑₑᵣ); the target rate of an IXP (Objₐₜₜ); the number of IXP links (Nₗᵢₙₖ); the resistance rate (τₑ𝒻𝒻); and the attraction failure rate (τ𝒻).
Keywords: characteristic, autonomous system, internet service provider, internet exchange point, rate
Procedia PDF Downloads 94
279 Comparative Analysis of Universal Filtered Multi Carrier and Filtered Orthogonal Frequency Division Multiplexing Systems for Wireless Communications
Authors: Raja Rajeswari K
Abstract:
Orthogonal Frequency Division Multiplexing (OFDM) is a multicarrier transmission technique that has been used in implementing the majority of wireless applications, such as wireless network protocol standards (e.g., IEEE 802.11a, IEEE 802.11n), telecommunications (e.g., LTE, LTE-Advanced), and digital audio and video broadcast standards. The latest research and development in the area of OFDM, namely Universal Filtered Multi Carrier (UFMC) and Filtered OFDM (F-OFDM), has attracted much attention for wideband wireless communications. In this paper, UFMC and F-OFDM systems are implemented, and a comparative analysis is carried out in terms of the M-ary QAM modulation scheme with a Dolph-Chebyshev filter and a rectangular window filter, estimating the Bit Error Rate (BER) over a Rayleigh fading channel.
Keywords: UFMC, F-OFDM, BER, M-ary QAM
Procedia PDF Downloads 169
278 Multimodal Characterization of Emotion within Multimedia Space
Authors: Dayo Samuel Banjo, Connice Trimmingham, Niloofar Yousefi, Nitin Agarwal
Abstract:
Technological advancement and its omnipresent connection have pushed humans past the boundaries and limitations of a computer screen, physical state, or geographical location. It has provided a depth of avenues that facilitate human-computer interaction that was once inconceivable, such as audio and body language detection. Given the complex modularities of emotions, it becomes vital to study human-computer interaction, as it is the starting point for a thorough understanding of the emotional state of users and, in the context of social networks, the producers of multimodal information. This study first acknowledges the accuracy of classification found within multimodal emotion detection systems compared to unimodal solutions. Second, it explores the characterization of multimedia content based on the emotions it conveys and the coherence of emotion across modalities by utilizing deep learning models to classify emotion in each modality.
Keywords: affective computing, deep learning, emotion recognition, multimodal
Procedia PDF Downloads 156
277 Musical Composition by Computer with Inspiration from Files of Different Media Types
Authors: Cassandra Pratt Romero, Andres Gomez de Silva Garza
Abstract:
This paper describes a computational system designed to imitate human inspiration during musical composition. The system is called MIS (Musical Inspiration Simulator). The MIS system is inspired by media to which human beings are exposed daily (visual, textual, or auditory) to create new musical compositions based on the emotions detected in said media. After building the system we carried out a series of evaluations with volunteer users who used MIS to compose music based on images, texts, and audio files. The volunteers were asked to judge the harmoniousness and innovation in the system's compositions. An analysis of the results points to the difficulty of computational analysis of the characteristics of the media to which we are exposed daily, as human emotions have a subjective character. This observation will direct future improvements in the system.
Keywords: human inspiration, musical composition, musical composition by computer, theory of sensation and human perception
Procedia PDF Downloads 183
276 Screen Casting Instead of Illegible Scribbles: Making a Mini Movie for Feedback on Students’ Scholarly Papers
Authors: Kerri Alderson
Abstract:
There is pervasive awareness among post-secondary faculty that written feedback on course assignments is inconsistently reviewed by students. In order to support student success and growth, a novel method of providing feedback was sought, and screen casting (short, narrated “movies” of audiovisual instructor feedback on students’ scholarly papers) was provided as an alternative to traditional means. An overview of the teaching and learning experience, as well as the user-friendly software utilized, will be presented. This study gives an overview of this more direct, student-centered medium for providing feedback using technology familiar to post-secondary students. Reminiscent of direct personal contact, the personalized video feedback is positively evaluated by students as a formative medium for student growth in scholarly writing.
Keywords: education, pedagogy, screen casting, student feedback, teaching and learning
Procedia PDF Downloads 119
275 Evaluation of Genetic Fidelity and Phytochemical Profiling of Micropropagated Plants of Cephalantheropsis obcordata: An Endangered Medicinal Orchid
Authors: Gargi Prasad, Ashiho A. Mao, Deepu Vijayan, S. Mandal
Abstract:
The main objective of the present study was to optimize and develop an efficient protocol for in vitro propagation of a medicinally important orchid, Cephalantheropsis obcordata (Lindl.) Ormerod, along with genetic stability analysis of regenerated plants. This plant has been traditionally used in Chinese folk medicine, and the decoction of the whole plant is known to possess anticancer activity. Nodal segments used as explants were inoculated on Murashige and Skoog (MS) medium supplemented with various concentrations of isopentenyl adenine (2iP). The rooted plants were successfully acclimatized in the greenhouse with a 100% survival rate. Inter-simple sequence repeat (ISSR) markers were used to assess the genetic fidelity of in vitro raised plants and the mother plant. The analysis revealed monomorphic bands, showing the absence of polymorphism in all in vitro raised plantlets analyzed and confirming the genetic uniformity among the regenerants. Phytochemical analysis was done to compare the antioxidant activities and HPLC fingerprinting assay of 80% aqueous ethanol extracts of the leaves and stem of in vitro and in vivo grown C. obcordata. The extracts of the plants were examined for their antioxidant activities by using the free radical 1,1-diphenyl-2-picryl hydrazyl (DPPH) scavenging method, 2,2’-azino-bis (3-ethylbenzothiazoline-6-sulfonic acid) (ABTS) radical scavenging ability, reducing power capacity, and estimation of total phenolic content, flavonoid content and flavonol content. A simplified method for the detection of ascorbic acid, phenolic acids and flavonoid content was also developed by using reversed phase high-performance liquid chromatography (HPLC). This is the first report on the micropropagation, genetic integrity study and quantitative phytochemical analysis of in vitro regenerated plants of C. obcordata.
Keywords: Cephalantheropsis obcordata, genetic fidelity, ISSR markers, HPLC
Procedia PDF Downloads 156
274 Applications of Visual Ethnography in Public Anthropology
Authors: Subramaniam Panneerselvam, Gunanithi Perumal, KP Subin
Abstract:
Visual ethnography is used to document the culture of a community through visual means, either photography or audio-visual documentation. Visual ethnographic techniques are widely used in visual anthropology, where anthropologists use the camera to capture the cultural image of the studied community. There is scope for subjectivity when a culture is documented by an external person. The emergence of public anthropology, however, provides an opportunity for participants to document their own culture. There is a need to equip the participants with the skills of visual ethnography. Mobile phone technology gives everyone a means of visual documentation to capture moments instantly. Visual ethnography facilitates multiple interpretations by audiences. This study explores the effectiveness of visual ethnography among tribal youth from a public anthropology perspective. A case study was conducted to equip the tribal youth of the Nilgiris in visual ethnography, and the outcome of the experiment is shared in this paper.
Keywords: visual ethnography, visual anthropology, public anthropology, multiple-interpretation, case study
Procedia PDF Downloads 183
273 Using Electronic Books to Enhance the Museum Visitors' Experience
Authors: Elvin Karaaslan Klose
Abstract:
Museums are important sites of informal, often semi-structured and self-paced learning. Challenged by digital alternatives and increased expectations from their visitors, museums have to adapt to the digital age by enriching their collection and educational content with additional options for interactivity. One such option lies in the concept of the electronic book, which can be used either on dedicated devices or downloaded by visitors before entering the exhibition area. These electronic books serve as an alternative or supplement to the classic audio guide and provide visitors with information about artifacts as well as background stories and factoids about the subjects of the exhibition. Bringing such interactive elements into the museum experience has been shown to increase information retention and enjoyment among both young visitors and adults. This article aims to bring together both theoretical frameworks and practical examples of how interactive media in the form of electronic books can be used to enhance the experience of the museum visitor.
Keywords: electronic books, interactive media, arts education, museum education
Procedia PDF Downloads 213
272 Passive Attenuation with Multiple Resonator Rings for Musical Instruments Equalization
Authors: Lorenzo Bonoldi, Gianluca Memoli, Abdelhalim Azbaid El Ouahabi
Abstract:
In this paper, a series of ring-shaped attenuators utilizing Helmholtz and quarter-wavelength resonators in variable, fixed, and combined configurations has been manufactured using a 3D printer. We illustrate possible uses by incorporating such devices into musical instruments (e.g. in acoustic guitar sound holes) and audio speakers with a view to controlling their tonal emissions without electronic equalization systems. Numerical investigations into the transmission loss values of these ring-shaped attenuators using finite element method simulations (COMSOL Multiphysics) are presented in the frequency range of 100–1000 Hz. We compare these results for each attenuator model with experimental measurements using different driving sources such as white noise, a maximum-length sequence (MLS), square and sine sweep pulses, and point scans in the frequency domain. Finally, we present a preliminary discussion on the comparison of numerical and experimental results.
Keywords: equaliser, metamaterials, musical, instruments
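The abstract above relies on Helmholtz and quarter-wavelength resonators without stating their tuning formulas. As a rough sketch, the textbook lumped-element expressions below estimate the resonance frequency of each type. They ignore end corrections and losses, the speed of sound is an assumed 343 m/s, and all dimensions are illustrative rather than taken from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 °C (assumed)


def quarter_wave_frequency(tube_length_m, c=SPEED_OF_SOUND):
    """Fundamental resonance of a tube closed at one end: f = c / (4 L)."""
    return c / (4.0 * tube_length_m)


def helmholtz_frequency(neck_area_m2, neck_length_m, cavity_volume_m3,
                        c=SPEED_OF_SOUND):
    """Ideal Helmholtz resonance: f = (c / 2*pi) * sqrt(A / (V * L)).

    End corrections to the neck length are deliberately left out.
    """
    ratio = neck_area_m2 / (cavity_volume_m3 * neck_length_m)
    return (c / (2.0 * math.pi)) * math.sqrt(ratio)


if __name__ == "__main__":
    # Illustrative dimensions only.
    print(f"quarter-wave, L = 0.25 m: {quarter_wave_frequency(0.25):.1f} Hz")
    print(f"Helmholtz: {helmholtz_frequency(1e-4, 0.01, 1e-3):.1f} Hz")
```

Under these idealized formulas, covering 100–1000 Hz with quarter-wave tubes alone would need lengths from about 0.86 m down to about 0.086 m.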
Procedia PDF Downloads 174
271 Bit Error Rate (BER) Performance of Coherent Homodyne BPSK-OCDMA Network for Multimedia Applications
Authors: Morsy Ahmed Morsy Ismail
Abstract:
In this paper, the structure of a coherent homodyne receiver for the Binary Phase Shift Keying (BPSK) Optical Code Division Multiple Access (OCDMA) network is introduced based on the Multi-Length Weighted Modified Prime Code (ML-WMPC) for multimedia applications. The Bit Error Rate (BER) of this homodyne detection is evaluated as a function of the number of active users and the signal-to-noise ratio for different code lengths according to the multimedia application such as audio, voice, and video. Besides, the Mach-Zehnder interferometer is used as an external phase modulator in homodyne detection. Furthermore, the Multiple Access Interference (MAI) and the receiver noise in a shot-noise limited regime are taken into consideration in the BER calculations.
Keywords: OCDMA networks, bit error rate, multiple access interference, binary phase-shift keying, multimedia
Procedia PDF Downloads 175
270 Comparison of the Effect of Heart Rate Variability Biofeedback and Slow Breathing Training on Promoting Autonomic Nervous Function Related Performance
Authors: Yi Jen Wang, Yu Ju Chen
Abstract:
Background: Heart rate variability (HRV) biofeedback can promote autonomic nervous function and sleep quality and reduce psychological stress. In HRV biofeedback training, machine video or audio guidance is intended to help the patient breathe slowly in step with his or her own heart rate changes so that the heart and lungs achieve resonance, thereby promoting autonomic nervous function. It has also been pointed out that slow breathing at 6 breaths per minute can likewise guide subjects to achieve cardiopulmonary resonance. However, no research has compared the effectiveness of achieving cardiopulmonary resonance through video or audio HRV biofeedback training with that of metronome-guided slow breathing. Purpose: To compare the promotion of autonomic nervous function performance between HRV biofeedback and slow breathing guided by a metronome. Method: This research uses an experimental design with convenience sampling; the subjects are randomly divided into the HRV biofeedback training group and the slow breathing training group. The HRV biofeedback training group conducts HRV biofeedback training in the laboratory for four weeks and uses a home training device for autonomous training, while the slow breathing training group conducts slow breathing training in the laboratory for four weeks, guided by a mobile phone app breathing metronome, and uses the mobile phone app for autonomous training at home. Autonomic nervous function-related performance was measured at enrollment and again after the four-week intervention. The chi-square test, Student’s t-test and other statistical methods were used to analyze the results, with p < 0.05 as the threshold for statistical significance. Results: A total of 27 subjects were included in the analysis.
After four weeks of training, the HRV biofeedback training group showed significant improvement in the HRV indexes (SDNN, RMSSD, HF, TP) and sleep quality. The stress index also decreased but did not reach statistical significance. In the slow breathing training group, only sleep quality improved significantly after four weeks of training; the HRV indexes (SDNN, RMSSD, TP) all increased and the HF and stress indexes decreased, but these changes were not statistically significant. Comparing the two groups after training, the HF index was found to improve significantly in the HRV biofeedback training group. Although sleep quality improved in both groups, the difference between them was not statistically significant. Conclusion: HRV biofeedback training is more effective in promoting autonomic nervous function than slow breathing training, but the effects on reducing stress and promoting sleep quality need to be explored with a larger sample. The results of this study can provide a reference for clinical or community health promotion. In the future, HRV biofeedback training could also be integrated into artificial intelligence (AI) wearable devices, making it more convenient for people to train independently and get effective feedback in time.
Keywords: autonomic nervous function, HRV biofeedback, heart rate variability, slow breathing
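Two of the time-domain HRV indexes reported above, SDNN and RMSSD, have standard definitions that are easy to state in code. The sketch below assumes the input is a list of successive normal-to-normal (RR) intervals in milliseconds and uses the sample standard deviation for SDNN. The function names and sample values are illustrative and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def sdnn(rr_ms):
    """Standard deviation of the RR intervals (sample form, n - 1)."""
    return stdev(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    squared_diffs = [(b - a) ** 2 for a, b in zip(rr_ms, rr_ms[1:])]
    return sqrt(mean(squared_diffs))


if __name__ == "__main__":
    # Illustrative RR intervals in milliseconds, not data from the study.
    rr = [800, 810, 790, 805, 795]
    print(f"SDNN  = {sdnn(rr):.2f} ms")
    print(f"RMSSD = {rmssd(rr):.2f} ms")
```

The frequency-domain indexes (HF, TP) additionally require resampling and spectral estimation, which this sketch leaves out.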
Procedia PDF Downloads 175
269 Health Literacy in Jordan: Obstacles for Doctors and Quality Patient Care
Authors: Etaf Alkhlaifat
Abstract:
This study drew conceptually on Communication Accommodation Theory to describe and analyze conversations between doctors and patients and to examine the extent to which patients’ level of literacy represents one of the linguistic obstacles that may adversely influence the quality of healthcare services in Jordan. A thematic qualitative approach was employed to interpret the phenomena under study, which required direct observation and interviews with doctors (n=6) and patients (n=15) in natural Jordanian medical settings. This generated a comprehensive corpus of audio and videotaped data, which revealed that most doctors expressed dissatisfaction with patients’ ability to express themselves and to comprehend them, as a result of a lack of medical awareness and limited health education. The significance of this study rests on its detailed investigation of the impact of health literacy on patients’ health outcomes, while providing unique insights into how low health literacy could contribute to misunderstanding and potential ill-health.
Keywords: doctor-patient communication, health literacy, medical knowledge, communication accommodation theory, qualitative research
Procedia PDF Downloads 6
268 Hear Me: The Learning Experience on “Zoom” of Students With Deafness or Hard of Hearing Impairments
Authors: H. Weigelt-Marom
Abstract:
Over the years and up to the outbreak of the COVID-19 pandemic, deaf or hard of hearing students studying in higher education institutions attended lectures on campus using hearing aids and strategies adapted for frontal learning in a classroom. Usually, these aids were well known to them from their earlier study experience in school. However, the transition to online lessons due to the pandemic led deaf or hard of hearing students to study outside of their physical, well-known learning environment. The change of learning environment and structure raised new challenges for these students. The present study examined the learning experience, limitations, challenges and benefits of learning online with lecturers and classmates via the “Zoom” video conference program among deaf or hard of hearing students in an academic setting. In addition, emotional and social aspects related to learning in general versus on “Zoom” were examined. The study included 18 students diagnosed as deaf or hard of hearing, studying in various higher education institutions in Israel. All students had experienced lessons on “Zoom”. Following identification of the study group through the deaf and hard of hearing non-profit organization “Ma’agalei Shema”, and after receiving the participants’ informed consent, students were requested to answer a Google Forms questionnaire and participate in an interview. The questionnaire included background information (e.g., age, year of studying, faculty etc.), level of computer literacy, and level of hearing and forms of communication (e.g., lip reading, sign language etc.). The interviews were one-on-one, semi-structured, in-depth interviews conducted by the main researcher of the study (interview duration: up to 60 minutes). The interviews were held on “Zoom” using specific adaptations for each interviewee: a clear view of the interviewer’s face for lip and face reading, and/or professional sign language or a live text transcript of the conversation.
Additionally, interviewees used their audio devices if needed. Questions addressed the learning experience, difficulties and advantages of studying using “Zoom”, learning in a classroom versus on “Zoom”, and emotional and social aspects related to learning. Thematic analysis of the interviews revealed severe difficulties regarding the ability of deaf or hard of hearing students to comprehend during “Zoom” lessons without adaptive aids. For example, interviewees indicated difficulties understanding “Zoom” lessons due to their inability to use hearing devices they commonly use in the classroom (e.g., FM systems). 80% indicated that they could not comprehend “Zoom” lessons since they could not see the lecturer’s face, either because lecturers did not agree to open their cameras or because they did not keep their face clearly visible and facing the camera while teaching. However, not all descriptions of learning via “Zoom” were negative. For example, 20% reported the recording of “Zoom” lessons as a main advantage, enabling them to repeatedly watch the lessons at their own pace, mostly assisted by friends and family who translated the audio output into an accessible input. These findings and others regarding the learning experience of the study group on “Zoom”, as well as their recommendations for enabling deaf or hard of hearing students to study inclusively online, will be presented at the conference.
Keywords: deaf or hard of hearing, learning experience, Zoom, qualitative research
Procedia PDF Downloads 116
267 A Hybrid Watermarking Model Based on Frequency of Occurrence
Authors: Hamza A. A. Al-Sewadi, Adnan H. M. Al-Helali, Samaa A. K. Khamis
Abstract:
Ownership of multimedia such as text, image, audio or video files can be proven by embedding a watermark in them. This is achieved by introducing modifications into these files that are imperceptible to the human senses but easily recoverable by a computer program. These modifications may be in the time domain, the frequency domain, or both. This paper presents a procedure for watermarking by mixing amplitude modulation with a frequency transformation histogram; namely, a specific value is used to modulate the intensity component Y of the YIQ components of the carrier image. This scheme is referred to as the histogram embedding technique (HET). Comparison of the results with those of other techniques such as the discrete wavelet transform (DWT), discrete cosine transform (DCT) and singular value decomposition (SVD) has shown enhanced efficiency in terms of ease and performance. The scheme has shown a good degree of robustness against various environmental effects such as resizing, rotation and different kinds of noise. This method would prove a very useful technique for copyright protection and ownership judgment.
Keywords: authentication, copyright protection, information hiding, ownership, watermarking
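To make the idea of modulating the Y component concrete, here is a toy sketch that is not the HET scheme from the paper, whose details the abstract does not give. It shifts the luminance of each pixel up or down by a small assumed `strength` to encode one bit, using the standard YIQ luminance weights (0.299, 0.587, 0.114). Adding the same offset to R, G and B changes only Y, because the I and Q rows of the YIQ transform sum to zero. Detection here is non-blind (it needs the original image), and clipping to the valid pixel range is ignored.

```python
Y_WEIGHTS = (0.299, 0.587, 0.114)  # standard YIQ luminance coefficients


def luminance(rgb):
    """Y component of an (R, G, B) pixel."""
    return sum(w * c for w, c in zip(Y_WEIGHTS, rgb))


def embed_bits(pixels, bits, strength=2.0):
    """Raise Y by `strength` for a 1 bit and lower it for a 0 bit.

    Hypothetical helper: an equal offset on R, G and B moves only Y.
    """
    marked = []
    for (r, g, b), bit in zip(pixels, bits):
        delta = strength if bit else -strength
        marked.append((r + delta, g + delta, b + delta))
    return marked


def extract_bits(original, marked):
    """Non-blind detection: compare luminance against the original image."""
    return [1 if luminance(m) > luminance(o) else 0
            for o, m in zip(original, marked)]


if __name__ == "__main__":
    carrier = [(120.0, 130.0, 140.0), (60.0, 200.0, 50.0), (90.0, 90.0, 90.0)]
    watermark = [1, 0, 1]
    stamped = embed_bits(carrier, watermark)
    print(extract_bits(carrier, stamped))
```

A practical scheme would key the embedding positions and add redundancy so the mark survives resizing, rotation and noise; none of that is attempted here.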
Procedia PDF Downloads 565
266 The Museum of Museums: A Mobile Augmented Reality Application
Authors: Qian Jin
Abstract:
Museums have been using interactive technology to spark visitor interest and improve understanding. These technologies can play a crucial role in helping visitors understand more about an exhibition site by using multimedia to provide information. Google Arts and Culture and Smartify are two very successful digital heritage products. They used mobile augmented reality to visualise the museum's 3D models and heritage images but did not include 3D models of the collection or audio information. In this research, a service-oriented mobile augmented reality application was developed for users to access collections from multiple museums (including the V&A, the British Museum, and the British Library). A third-party API (Application Programming Interface) is queried to collect metadata (including images, 3D models, videos, and text) for the three museums' collections. The acquired content is then visualized in AR environments. This product will help users who cannot visit the museum in person for various reasons (inconvenient transportation, physical disability, time constraints).
Keywords: digital heritage, augmented reality, museum, flutter, ARcore
Procedia PDF Downloads 78
265 Frequency of Occurrence Hybrid Watermarking Scheme
Authors: Hamza A. Ali, Adnan H. M. Al-Helali
Abstract:
Generally, a watermark is information that identifies the ownership of multimedia (text, image, audio or video files). It is embedded by introducing modifications into these files that are imperceptible to the human senses but easily recoverable by a computer program. These modifications are made according to a secret key in a descriptive model that may be in the time domain, the frequency domain, or both. This paper presents a procedure for watermarking by mixing amplitude modulation with a frequency transformation histogram; namely, a specific value is used to modulate the intensity component Y of the YIQ components of the carrier image. This scheme is referred to as the histogram embedding technique (HET). Comparison of the results with those of other techniques such as the discrete wavelet transform (DWT), discrete cosine transform (DCT) and singular value decomposition (SVD) has shown enhanced efficiency in terms of ease and performance. The scheme has shown a good degree of robustness against various environmental effects such as resizing, rotation and different kinds of noise. This method would prove a very useful technique for copyright protection and ownership judgment.
Keywords: watermarking, ownership, copyright protection, steganography, information hiding, authentication
Procedia PDF Downloads 368
264 Voice Signal Processing and Coding in MATLAB Generating a Plasma Signal in a Tesla Coil for a Security System
Authors: Juan Jimenez, Erika Yambay, Dayana Pilco, Brayan Parra
Abstract:
This paper presents an investigation of voice signal processing and coding using MATLAB, with the objective of generating a plasma signal on a Tesla coil within a security system. The approach focuses on using advanced voice signal processing techniques to encode and modulate the audio signal, which is then amplified and applied to a Tesla coil. The result is the creation of a striking visual effect of voice-controlled plasma with specific applications in security systems. The article explores the technical aspects of voice signal processing, the generation of the plasma signal, and its relationship to security. The implications and creative potential of this technology are discussed, highlighting its relevance at the forefront of research in signal processing and visual effect generation in the field of security systems.
Keywords: voice signal processing, voice signal coding, MATLAB, plasma signal, Tesla coil, security system, visual effects, audiovisual interaction
Procedia PDF Downloads 93
263 Multimodal Database of Emotional Speech, Video and Gestures
Authors: Tomasz Sapiński, Dorota Kamińska, Adam Pelikant, Egils Avots, Cagri Ozcinar, Gholamreza Anbarjafari
Abstract:
People express emotions through different modalities. Integration of verbal and non-verbal communication channels creates a system in which the message is easier to understand. Expanding the focus to several forms of expression can facilitate research on emotion recognition as well as human-machine interaction. In this article, the authors present a Polish emotional database composed of three modalities: facial expressions, body movement and gestures, and speech. The corpus contains recordings registered in studio conditions, acted out by 16 professional actors (8 male and 8 female). The data is labeled with six basic emotion categories, according to Ekman’s emotion categories. To check the quality of performance, all recordings are evaluated by experts and volunteers. The database is available to the academic community and might be useful in the study of audio-visual emotion recognition.
Keywords: body movement, emotion recognition, emotional corpus, facial expressions, gestures, multimodal database, speech
Procedia PDF Downloads 349
262 “Moves” for Guiding Presentations in French
Authors: Nuchanat Handumrongkul, Suwaree Yordchim, Anantachai Aeka
Abstract:
Despite four years of study in the tourism industry, Bachelor’s graduates cannot perform their jobs as well as experienced tour guides. This research aimed to develop the teaching and study of French for tourism, with two main purposes: to analyze the ‘moves’ used in oral presentations at tourist attractions, and to study the content of guiding presentations, or 'Guide Speak'. The study employed audio recording of these presentations as an interview method in authentic situations, with four tour guides as respondents and information providers. The data was analyzed via move analysis and content analysis. The results showed that eight moves were used, namely: welcoming, introducing oneself, drawing someone’s attention, giving information, explaining, highlighting, persuading, and saying goodbye. In terms of content, the information presented covered the outstanding characteristics of the places and was well integrated with other related content. The findings were used as guidelines for curriculum development, in particular for the core content and the presentation forming the basis for students to meet the standard requirements of the labor market and professional schemes.
Keywords: moves, guiding presentation, french, tourism
Procedia PDF Downloads 232
261 Text-to-Speech in Azerbaijani Language via Transfer Learning in a Low Resource Environment
Authors: Dzhavidan Zeinalov, Bugra Sen, Firangiz Aslanova
Abstract:
Most text-to-speech models cannot operate well in low-resource languages and require a great amount of high-quality training data to be considered good enough. Yet, with the improvements made in ASR systems, it is now easier than ever to collect data for the design of custom text-to-speech models. This work outlines our use of an ASR model to collect data to build a viable text-to-speech system for one of the leading financial institutions of Azerbaijan. NVIDIA’s implementation of the Tacotron 2 model was utilized along with the HiFiGAN vocoder. The model was first trained with high-quality audio data collected from the Internet, then fine-tuned on the bank’s single-speaker call center data. The results were then evaluated by 50 different listeners and received a mean opinion score of 4.17, indicating that our method is indeed viable. With this, we have successfully designed the first text-to-speech model in Azerbaijani and publicly shared 12 hours of audiobook data for everyone to use.
Keywords: Azerbaijani language, HiFiGAN, Tacotron 2, text-to-speech, transfer learning, whisper
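The mean opinion score quoted above is an average of listener ratings. As a small sketch of that arithmetic, assuming each listener gives an integer rating on the usual 1 to 5 scale, the code below computes the MOS together with a normal-approximation 95% confidence half-width. The ratings shown are made up for illustration and the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Average of listener ratings on an assumed 1-5 scale."""
    if not ratings:
        raise ValueError("no ratings given")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


def confidence_half_width(ratings, z=1.96):
    """Half-width of a normal-approximation confidence interval for the MOS."""
    return z * stdev(ratings) / sqrt(len(ratings))


if __name__ == "__main__":
    # Made-up ratings, not the study's data.
    ratings = [5, 4, 4, 3, 5, 4, 5, 4]
    mos = mean_opinion_score(ratings)
    print(f"MOS = {mos:.2f} ± {confidence_half_width(ratings):.2f}")
```

Reporting the interval alongside the mean shows how much a score could move with a different panel of listeners.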
Procedia PDF Downloads 44260 Metaphorical Perceptions of Middle School Students regarding Computer Games
Authors: Ismail Celik, Ismail Sahin, Fetah Eren
Abstract:
The computer, among the most important inventions of the twentieth century, has become an increasingly important component of our everyday lives. Computer games have also become increasingly popular, owing to their realistic virtual environments, their audio and visual features, and the roles they offer players. The present study investigates the metaphors students have for computer games, in an effort to fill a gap in the literature. To determine middle school students’ metaphorical images of the concept ‘computer game’, students were asked to complete the sentence ‘Computer game is like/similar to….because….’. The metaphors created by the students were grouped into six categories, based on the source of the metaphor. Ordered by the number of metaphors they included, these categories were ‘computer game as a means of entertainment’, ‘computer game as a beneficial means’, ‘computer game as a basic need’, ‘computer game as a source of evil’, ‘computer game as a means of withdrawal’, and ‘computer game as a source of addiction’.
Keywords: computer game, metaphor, middle school students, virtual environments
Procedia PDF Downloads 535
259 Multimodal Data Fusion Techniques in Audiovisual Speech Recognition
Authors: Hadeer M. Sayed, Hesham E. El Deeb, Shereen A. Taie
Abstract:
In the big data era, we face a diversity of datasets from different sources and domains that describe a single life event. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. Multimodal fusion is the concept of integrating information from multiple modalities into a joint representation with the goal of predicting an outcome through a classification or regression task. In this paper, multimodal fusion techniques are classified into two main classes: model-agnostic techniques and model-based approaches. The paper provides a comprehensive study of recent research in each class and outlines the benefits and limitations of each. Furthermore, the audiovisual speech recognition task is presented as a case study of multimodal data fusion approaches, and the open issues arising from the limitations of current studies are discussed. This paper can serve as a guide for researchers interested in multimodal data fusion, and in audiovisual speech recognition in particular.
Keywords: multimodal data, data fusion, audio-visual speech recognition, neural networks
Procedia PDF Downloads 111
258 The Use of Videoconferencing in a Task-Based Beginners' Chinese Class
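As an illustration only: the abstract does not say which model-agnostic techniques the paper covers, so the sketch below assumes the two strategies most often described under that label, early (feature-level) fusion and late (decision-level) fusion. The function names, the equal default weighting, and the example values are hypothetical and are not taken from the paper.

```python
def early_fusion(audio_feats, visual_feats):
    """Feature-level fusion: concatenate the per-modality feature vectors
    into one joint vector that a single downstream model would consume."""
    return list(audio_feats) + list(visual_feats)


def late_fusion(audio_probs, visual_probs, audio_weight=0.5):
    """Decision-level fusion: weighted average of the class probabilities
    produced independently by an audio model and a visual model."""
    if len(audio_probs) != len(visual_probs):
        raise ValueError("both modalities must score the same classes")
    w = audio_weight
    return [w * a + (1.0 - w) * v for a, v in zip(audio_probs, visual_probs)]


def predict(probs):
    """Return the index of the highest-scoring class."""
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Hypothetical per-modality outputs for a two-class problem.
    audio_probs = [0.6, 0.4]
    visual_probs = [0.2, 0.8]
    fused = late_fusion(audio_probs, visual_probs)
    print(fused, predict(fused))
    print(early_fusion([0.1, 0.2], [0.3]))
```

Early fusion leaves it to one model to learn cross-modal interactions, while late fusion keeps the per-modality models independent and only combines their outputs; which is preferable depends on the task and data.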
Authors: Sijia Guo
Abstract:
The development of new technologies and the falling cost of high-speed Internet access have made it easier for institutes and language teachers to opt for different ways of communicating with students at a distance. The emergence of web-conferencing applications, which integrate text, chat, audio/video and graphic facilities, offers great opportunities for language learning through the multimodal environment. This paper reports on data elicited from a Ph.D. study of using web-conferencing in the teaching of a first-year Chinese class in order to promote learners’ collaborative learning. Firstly, a comparison of four desktop videoconferencing (DVC) tools was conducted to determine the pedagogical value of the videoconferencing tool Blackboard Collaborate. Secondly, the evaluation of 14 campus-based Chinese learners who conducted five one-hour online sessions via the multimodal environment reveals the users’ choice of modes and their learning preferences. The findings show that the tasks designed for the web-conferencing environment contributed to the learners’ collaborative learning and second language acquisition.
Keywords: computer-mediated communication (CMC), CALL evaluation, TBLT, web-conferencing, online Chinese teaching
Procedia PDF Downloads 309
257 The Development of Educational Video Games Aimed at Enhancing Academic Motivation and Learning Among African American Males
Authors: Kenneth Philip Jones
Abstract:
This dissertation investigates the potential of developing educational video games to motivate and engage African American males. The study employed a qualitative methodological approach, investigating African American males who are avid video game players and are currently enrolled at a college or university. The participants were video and audio recorded, individually and collectively, during the interviews and observations. Situated Learning theory was used to analyze how motivation and engagement can transfer from a video game to an educational context. The research aims to address the disparities in our educational systems when it comes to providing a culture, climate, and atmosphere that will enable the academic development of African American males. The primary objective is to use the participants’ responses and the data collected to provide recommendations to educators and scholars on how to address the issues that have demoralized African American males in education, and to provide a platform that will allow for equality in educational development and advancement.
Keywords: video games, motivation, behavioral, learning transfer
Procedia PDF Downloads 121
256 Robust Medical Image Watermarking Using Frequency Domain and Least Significant Bits Algorithms
Authors: Volkan Kaya, Ersin Elbasi
Abstract:
Watermarking and steganography have recently gained importance because of copyright protection and authentication. In watermarking, we embed a stamp, logo, noise or image into multimedia elements such as image, video, audio, animation and text. Several works have addressed watermarking for different purposes. In this research work, we used watermarking techniques to embed patient information into medical magnetic resonance (MR) images. Two methods were used: frequency domain (Discrete Wavelet Transform, DWT; Discrete Cosine Transform, DCT; and Discrete Fourier Transform, DFT) and spatial domain (Least Significant Bits, LSB). Experimental results show that embedding in the frequency domain resists one type of attack, while embedding in the spatial domain resists another group of attacks. Peak Signal-to-Noise Ratio (PSNR) and Similarity Ratio (SR) are the two measurement values used for testing. These two values give very promising results for information hiding in medical MR images.
Keywords: watermarking, medical image, frequency domain, least significant bits, security
Procedia PDF Downloads 287
255 Subband Coding and Glottal Closure Instant (GCI) Using SEDREAMS Algorithm
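The spatial-domain idea and the PSNR measure can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes 8-bit grayscale pixels held in a flat list, writes one payload bit into the least significant bit of each pixel, and uses the standard PSNR definition. The function names and sample pixel values are hypothetical.

```python
import math


def embed_lsb(pixels, bits):
    """Write each payload bit into the least significant bit of one pixel."""
    if len(bits) > len(pixels):
        raise ValueError("payload is longer than the cover image")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | (bit & 1)  # clear the LSB, then set it
    return out


def extract_lsb(pixels, n_bits):
    """Read the payload back from the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


def psnr(original, modified, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length images."""
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    cover = [100, 101, 102, 103]   # hypothetical pixel values
    payload = [1, 0, 1, 1]         # hypothetical payload bits
    marked = embed_lsb(cover, payload)
    print(marked, extract_lsb(marked, len(payload)))
    print(psnr(cover, marked))
```

Because each pixel changes by at most one gray level, the distortion is small and the PSNR stays high, but the payload is easily destroyed by operations that alter low-order bits, which is consistent with the abstract's remark that each domain resists a different group of attacks.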
Authors: Harisudha Kuresan, Dhanalakshmi Samiappan, T. Rama Rao
Abstract:
In modern telecommunication applications, locating Glottal Closure Instants (GCIs) is important, and they are evaluated directly from the speech waveform. Here, we study GCIs using the Speech Event Detection using Residual Excitation and the Mean Based Signal (SEDREAMS) algorithm. Speech coding uses parameter estimation based on audio signal processing techniques to model the speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bit stream. This paper proposes a sub-band coder (SBC), a type of transform coding, and evaluates its performance for GCI detection using SEDREAMS. In an SBC, the speech signal is divided into two or more frequency bands, and each sub-band signal is coded individually. After processing, the sub-bands are recombined to form the output signal, whose bandwidth covers the whole frequency spectrum. The signal is then decomposed into low- and high-frequency components, and decimation and interpolation in the frequency domain are performed. The proposed structure significantly reduces error, and precise locations of GCIs are found using the SEDREAMS algorithm.
Keywords: SEDREAMS, GCI, SBC, GOI
Procedia PDF Downloads 356
254 [Keynote Talk]: Computer-Assisted Language Learning (CALL) for Teaching English to Speakers of Other Languages (TESOL/ESOL) as a Foreign Language (TEFL/EFL), Second Language (TESL/ESL), or Additional Language (TEAL/EAL)
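The band-splitting step can be illustrated with a two-band analysis/synthesis filter bank. The abstract does not specify the filters used, so the sketch below assumes the simplest perfect-reconstruction pair (Haar filters) as a stand-in; per-band quantization and coding, as well as the SEDREAMS stage, are omitted. The function names are hypothetical.

```python
import math

_SQRT2 = math.sqrt(2.0)


def analyze(signal):
    """Split a signal into low and high sub-bands.

    Filtering with the Haar pair and decimating by 2 are done in one
    step by processing the samples in non-overlapping pairs.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = range(0, len(signal), 2)
    low = [(signal[i] + signal[i + 1]) / _SQRT2 for i in pairs]
    high = [(signal[i] - signal[i + 1]) / _SQRT2 for i in pairs]
    return low, high


def synthesize(low, high):
    """Recombine the sub-bands (interpolate by 2 and filter) into one signal."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / _SQRT2)
        out.append((l - h) / _SQRT2)
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]          # hypothetical speech samples
    low, high = analyze(x)            # each band has half the samples
    print(low, high)
    print(synthesize(low, high))      # matches x up to rounding error
```

In an actual coder, each band would be quantized with its own bit allocation between `analyze` and `synthesize`; without quantization the reconstruction is exact up to floating-point rounding.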
Authors: Andrew Laghos
Abstract:
Computer-assisted language learning (CALL) is defined as the use of computers to help learn languages. In this study we look at several different types of CALL tools and applications and how they can assist adults and young learners in learning the English language as a foreign, second or additional language. It is important to identify the roles of the teacher and the learners, and what the learners’ motivations are for learning the language. Audio, video, interactive multimedia games, online translation services, conferencing, chat rooms, discussion forums, social networks, social media, email communication, songs and music video clips are just some of the many ways computers are currently being used to enhance language learning. CALL may be used for classroom teaching as well as for online and mobile learning. Advantages and disadvantages of CALL are discussed and the study ends with future predictions of CALL.
Keywords: computer-assisted language learning (CALL), teaching English as a foreign language (TEFL/EFL), adult learners, young learners
Procedia PDF Downloads 434
253 Building Teacher Capacity: Including All Students in Mathematics Experiences
Authors: Jay-R M. Mendoza
Abstract:
In almost all mathematics classrooms, students demonstrate discrepancies in their knowledge, skills, and understanding. OECD reports predicted that this would continue to worsen, as not all teachers were sufficiently trained to handle it. In response, the paper explored the potential of reSolve’s professional learning module 3 (PLM3) as an affordable and accessible professional development (PD) resource. Participants’ hands-on experience with and exposure to PLM3 were audio recorded. After the recordings were transcribed and examined and the participants’ work samples were analysed, four issues emerged: (1) the criticality of conducting preliminary data collections and increasing the validity of inferences about what students can and cannot do by addressing the probabilistic nature of their performance; (2) the criticality of the conclusion a > b and/or (a-b) ∈ Z⁺ in students’ algebraic reasoning; (3) the enabling and extending prompts provided by reSolve were found useful; and (4) dynamic adaptation of reSolve PLM3 through developing transferable skills and collaboration among teachers. PLM3 provided valuable insights on assessment, teaching, and planning to include all students in mathematics experiences.
Keywords: algebraic reasoning, building teacher capacity, including all students in mathematics experiences, professional development
Procedia PDF Downloads 124
252 A Genre Analysis of University Lectures
Authors: Lee Kok Yueh, Fatin Hamadah Rahman, David Hassell, Au Thien Wan
Abstract:
This work reports on a genre-based study of lectures at Universiti Teknologi Brunei, a university in Brunei, to explore their communicative functions and to gain insight into the discourse. It explores these in three different domains: Social Science, Engineering and Computing. Audio recordings from four lecturers, comprising 20 lectures, were transcribed and analysed, with the duration of each lecture varying between 20 and 90 minutes. This qualitative study found patterns and functions similar to those reported in existing research, including greetings, housekeeping, and recapping of previous lectures in the lecture introductions. In the lecture content, comprehension checks and the use of examples or analogies are very prevalent. However, the use of examples largely depends on the lecture content: the more technical the content, the harder it was for lecturers to provide examples or analogies. Three functional moves are identified in the lecture conclusions: announcement, summary and future plan, all of which are optional. Despite the relatively small sample size, the present study shows that lectures are interactive and that there are some consistencies in the delivery of lectures in relation to their communicative functions and genre.
Keywords: communicative functions, genre analysis, higher education, lectures
Procedia PDF Downloads 191