Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 16646

Search results for: audio tactile model

16586 Prosody Generation in Neutral Speech Storytelling Application Using Tilt Model

Authors: Manjare Chandraprabha A., S. D. Shirbahadurkar, Manjare Anil S., Paithne Ajay N.

Abstract:

This paper proposes intonation modeling for prosody generation in neutral speech for Marathi (a language spoken in Maharashtra, India) storytelling applications. Audio storytelling devices are now very popular with children. We propose a tilt model of stressed words in Marathi for speech modification; the tilt model predicts the required modification in the tone of neutral speech. A Gaussian mixture model (GMM) is used to identify the stressed words to be modified.
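The abstract does not give the tilt equations. The sketch below is hedged and assumes the standard formulation of Taylor's Tilt model. In that model each pitch accent is described by the F0 amplitude and duration of its rise and fall, and these are combined into one tilt value. The inverse mapping, which turns a target tilt into rise and fall parts for synthesis, is also shown. Both function names are hypothetical. The GMM stress detector is not shown.

```python
def tilt_parameters(a_rise, a_fall, d_rise, d_fall):
    """Tilt description of one pitch accent (Taylor's Tilt model).

    a_rise, a_fall: F0 excursions of the rise and fall (Hz; sign ignored)
    d_rise, d_fall: durations of the rise and fall (s)
    Returns (tilt_amp, tilt_dur, tilt), each in [-1, 1]:
    +1 is a pure rise, -1 a pure fall, 0 an equal rise and fall.
    """
    amp_sum = abs(a_rise) + abs(a_fall)
    dur_sum = d_rise + d_fall
    tilt_amp = (abs(a_rise) - abs(a_fall)) / amp_sum if amp_sum else 0.0
    tilt_dur = (d_rise - d_fall) / dur_sum if dur_sum else 0.0
    return tilt_amp, tilt_dur, 0.5 * tilt_amp + 0.5 * tilt_dur


def rise_fall_from_tilt(amplitude, duration, tilt):
    """Inverse mapping for synthesis: split an accent's total F0
    excursion (Hz) and total duration (s) into rise and fall parts."""
    a_rise = amplitude * (1 + tilt) / 2
    a_fall = amplitude * (1 - tilt) / 2
    d_rise = duration * (1 + tilt) / 2
    d_fall = duration * (1 - tilt) / 2
    return a_rise, a_fall, d_rise, d_fall
```

Example: tilt_parameters(*rise_fall_from_tilt(40.0, 0.3, 0.5)) returns values close to (0.5, 0.5, 0.5). Splitting the accent and measuring it again therefore recovers the requested tilt.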

Keywords: tilt model, fundamental frequency, statistical parametric speech synthesis, GMM

Procedia PDF Downloads 359

16585 Cost-Effective Mechatronic Gaming Device for Post-Stroke Hand Rehabilitation

Authors: A. Raj Kumar, S. Bilaloglu

Abstract:

Stroke is a leading cause of adult disability worldwide. We depend on our hands for our activities of daily living (ADL). Although many patients regain the ability to walk, they continue to experience long-term hand motor impairments. As the number of individuals with young-onset stroke increases, there is a critical need for effective approaches to rehabilitating hand function after stroke. Motor relearning for dexterity requires task-specific kinesthetic, tactile, and visual feedback. However, when a stroke results in both sensory and motor impairment, it becomes difficult to ascertain when, and what type of, sensory substitution can facilitate motor relearning. Ideally, real-time, task-specific data on the ability to learn, together with data-driven feedback to support such learning, would greatly aid rehabilitation for dexterity. We have found that kinesthetic and tactile information from the unaffected hand can help patients relearn the use of optimal fingertip forces during a grasp-and-lift task. Fingertip grip force (GF), load force (LF), their corresponding rates (GFR and LFR), and other metrics can be measured to gauge the level of impairment and progress during learning. Currently, ATI mini force-torque sensors are used in research settings to measure and compute LF, GF, and their rates while objects of different weights and textures are grasped. The ATI sensor is cost-prohibitive for deployment in clinical or at-home rehabilitation. A cost-effective mechatronic device was therefore developed to quantify GF, LF, and their rates for stroke rehabilitation, using off-the-shelf components such as load cells, FlexiForce sensors, and an Arduino UNO microcontroller. A salient feature of the device is its integration with an interactive gaming environment that provides a highly engaging user experience.
This paper describes the integration of kinesthetic and tactile sensing through real-time computation of LF, GF, and their corresponding rates, the information processing, and the interactive interfacing through augmented reality for visual feedback.
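The abstract does not say how the rates are computed. One hedged possibility is shown below: GFR and LFR are backward differences over a short sliding window of timestamped samples, updated as each sample arrives, with running peak rates kept as simple progress metrics. The class name, window length, and sample format are hypothetical. Serial communication with the Arduino and load-cell calibration are not shown.

```python
from collections import deque


class ForceRateEstimator:
    """Streaming grip-force (GF) and load-force (LF) rate estimator.

    Rates are backward differences over a short sliding window of
    timestamped samples, which smooths sensor noise a little.
    """

    def __init__(self, window=5):
        self.samples = deque(maxlen=window)  # (t [s], gf [N], lf [N])
        self.peak_gfr = 0.0
        self.peak_lfr = 0.0

    def update(self, t, gf, lf):
        """Add one sample; return (GFR, LFR) in N/s."""
        self.samples.append((t, gf, lf))
        t0, gf0, lf0 = self.samples[0]  # oldest sample still in the window
        dt = t - t0
        if dt <= 0:  # first sample or repeated timestamp
            return 0.0, 0.0
        gfr = (gf - gf0) / dt
        lfr = (lf - lf0) / dt
        self.peak_gfr = max(self.peak_gfr, gfr)
        self.peak_lfr = max(self.peak_lfr, lfr)
        return gfr, lfr
```

A longer window gives smoother rates but adds lag. That trade-off would have to be tuned against the latency the game's visual feedback can tolerate.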

Keywords: feedback, gaming, kinesthetic, rehabilitation, tactile

Procedia PDF Downloads 219

16584 Mapping the Sonic Spectrum of Traditional Music and Instruments Used in Malaysian Kavadi Rituals

Authors: Ainolnaim Azizol, Valerie Ross

Abstract:

Music is as old as mankind, and rituals using music, such as Kavadi, have been associated with social, cultural, and spiritual practices in many traditional and modern societies. Recent literature has provided scientific evidence that music induces psychological and physical changes through the stimulation of brainwaves. Despite such advances, scientific study of the sonic qualities peculiar to traditional instruments, and of how these qualities affect ritualistic activities, is still lacking. This study addresses one such phenomenon. Devotees in Kavadi rituals are known to enter a trance state in which they neither experience pain nor suffer injury, despite the hundreds of needles pierced through their skin. Although scientists have sought to understand how this is possible, less is known about the music used to prepare devotees to enter the trance state. This study fills this knowledge gap by providing scientific evidence through the identification and mapping of the sonic spectrum, or sound fingerprint, of the instruments and repertoire used in these ritual forms, both in their ethnographic environment and in audio-controlled situations. The objectives are to identify and categorize the different types of traditional music used in Kavadi rituals; to record, transcribe, and digitally score the musical repertoire of the oral tradition of Kavadi rituals; and to map the sonic spectrum of ritual music using spectrography and advanced music-analysis software. A mixed methodology will be used, comprising ethnographic field studies (interviews, participant observation, and audio-video recordings) and an audio methodology that uses spectrography and advanced audio technology for sonic mapping and for transcribing audio recordings into digital scores.
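The abstract does not name the analysis software. The sketch below is a hedged illustration of the basic spectrographic step behind a "sonic spectrum": a Hann-windowed short-time Fourier transform in NumPy. It also computes a long-term average spectrum, which serves as one crude per-recording fingerprint. The frame length, hop size, and function names are assumptions, not the authors' settings.

```python
import numpy as np


def spectrogram_db(signal, sample_rate, frame_len=2048, hop=512):
    """Magnitude spectrogram in dB via a Hann-windowed short-time Fourier transform.

    Returns (freqs [Hz], times [s], spec_db) with spec_db shaped (n_frames, n_bins).
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate  # frame centres
    return freqs, times, 20.0 * np.log10(mag + 1e-12)  # small offset avoids log(0)


def average_spectrum(spec_db):
    """Long-term average spectrum (dB per frequency bin): a crude 'fingerprint'."""
    return spec_db.mean(axis=0)
```

Comparing average spectra across instruments, or across field and audio-controlled recordings, is one simple starting point. A real mapping would probably add peak picking and time-resolved descriptors.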

Keywords: sonic, traditional, ritual, Kavadi, music

Procedia PDF Downloads 217

16583 Illumina MiSeq Sequencing for Bacteria Identification on Audio-Visual Materials

Authors: Tereza Branyšová, Martina Kračmarová, Kateřina Demnerová, Michal Ďurovič, Hana Stiborová

Abstract:

Microbial deterioration threatens all objects of cultural heritage, including audio-visual materials. Fungi are commonly known to be the main factor in the deterioration of audio-visual materials. However, bacteria, although often neglected, also play a significant role. In addition to the microbial contamination of the materials themselves, it is also essential to analyse the air as a possible source of contamination. This work aims to identify the bacterial species that occur on audio-visual materials, as well as in the air, in archives of the Czech Republic. For sampling, swabs were taken from the materials with sterile polyurethane sponges, and air was collected using a MAS-100 aeroscope. Metagenomic DNA from all collected samples was immediately isolated and stored at -20 °C. A DNA library for the 16S rRNA gene was prepared using two-step PCR with specific primers, and a concentration step was included because of the meagre DNA yields. The samples were then sent to the University of Alaska Fairbanks for Illumina MiSeq sequencing. The sequences were analysed in R, and the DADA2 package was used to assign them to the corresponding bacterial species. The impact of air contamination, and of the different photosensitive layers of which the audio-visual materials were made (such as gelatine, albumen, and collodion), was evaluated. As a next step, we will focus more closely on air contamination. We will select an appropriate culture-dependent approach, together with a culture-independent approach, to observe metabolically active species in the air. Acknowledgment: This project is supported by grant no. DG18P02OVV062 of the Ministry of Culture of the Czech Republic.

Keywords: cultural heritage, Illumina MiSeq, metagenomics, microbial identification

Procedia PDF Downloads 125

16582 Stability Analysis and Experimental Evaluation on Maxwell Model of Impedance Control

Authors: Le Fu, Rui Wu, Gang Feng Liu, Jie Zhao

Abstract:

Impedance control methods are normally based on a model that connects a spring and a damper in parallel. The series connection, known as the Maxwell model, has emerged as a counterpart and has drawn the attention of robotics researchers. Theoretical analysis shows that the two patterns are equivalent to some extent, but notable differences in response characteristics exist, especially in the effect of damping viscosity. However, this novel impedance control design has not yet been validated on real robot platforms. In this study, stability analysis and experimental evaluation are performed using a three-fingered Barrett® robotic hand BH8-282 with tactile sensing, mounted on a torque-controlled, lightweight collaborative robot, the KUKA® LBR iiwa 14 R820. Object-handover and incoming-object-catching tasks are executed for validation and analysis. Experimental results show that the series connection pattern performs much better at natural impact or shock absorption. This indicates promising applications in the safe physical interaction of robots with humans and objects in various environments.
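The parallel and series patterns can be contrasted with a small simulation. The sketch below is hedged and treats the controller as an ideal passive impedance catching a point mass. It ignores robot dynamics, and the mass, stiffness, damping, and speed values are hypothetical. With the parallel (Kelvin-Voigt) model the damper produces an instantaneous force jump of b*v0 at contact. With the series (Maxwell) model both elements carry the same force, and the force builds up from zero through dF/dt = k(dx/dt - F/b).

```python
def simulate_catch(model, m=1.0, k=500.0, b=20.0, v0=1.0, dt=1e-4, t_end=0.5):
    """Force history when a mass m arriving at speed v0 is caught by a
    virtual impedance (spring k [N/m], damper b [N*s/m]).

    model = "parallel": Kelvin-Voigt, F = k*x + b*dx/dt
    model = "series":   Maxwell, spring and damper in series, so the same
                        force F passes through both: dF/dt = k*(dx/dt - F/b)
    The object is assumed to stay in contact (it has been caught).
    """
    x, v, f = 0.0, v0, 0.0  # penetration [m], velocity [m/s], force [N]
    forces = []
    for _ in range(int(round(t_end / dt))):
        if model == "parallel":
            f = k * x + b * v
        elif model == "series":
            f += dt * k * (v - f / b)  # explicit Euler on the force state
        else:
            raise ValueError(f"unknown model: {model!r}")
        v -= (f / m) * dt  # semi-implicit Euler: update v, then x
        x += v * dt
        forces.append(f)
    return forces
```

To compare peak contact forces, call max(simulate_catch("series")) and max(simulate_catch("parallel")) with the stiffness and damping of interest. The sketch makes no claim about which peak is smaller for any particular parameter set.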

Keywords: impedance control, Maxwell model, force control, dexterous manipulation

Procedia PDF Downloads 476

16581 A Study on the Effect of Design Factors of Slim Keyboard’s Tactile Feedback

Authors: Kai-Chieh Lin, Chih-Fu Wu, Hsiang Ling Hsu, Yung-Hsiang Tu, Chia-Chen Wu

Abstract:

With the rapid development of computer technology, the design of computers and keyboards is trending towards slimness. Changes in mobile input devices directly influence user behavior. Multi-touch applications allow text entry through a virtual keyboard, but the performance, feedback, and comfort of this technology are inferior to those of a traditional keyboard. Manufacturers have also launched mobile touch keyboards and projection keyboards, but their performance has not been satisfactory. This study therefore examined the design factors of slim pressure-sensitive keyboards. The factors were evaluated objectively (accuracy and speed) and subjectively (operability, recognition, feedback, and difficulty) according to the shape (circle, rectangle, and L-shaped), thickness (flat, 3 mm, and 6 mm), and force (35±10 g, 60±10 g, and 85±10 g) of the keys. MANOVA and Taguchi methods (based on signal-to-noise ratios) were used to find the optimal level of each design factor. Participants were divided into two groups by typing speed, with 30 words/minute as the threshold. Given the number of variables and levels, the experiments were implemented using a fractional factorial design. A representative model of the research samples was built for input-task testing. The findings showed that participants with low typing speed relied primarily on vision to recognize the keys. In contrast, those with high typing speed relied on tactile feedback, which was affected by the thickness and force of the keys. In the objective and subjective evaluations, the combination L-shaped, 3 mm, and 60±10 g was identified as optimal, as the combination of design factors most likely to yield higher performance and satisfaction. The learning curve was analyzed and compared with that of a traditional standard keyboard to investigate the influence of user experience on keyboard operation.
The research results indicated that the optimal combination provided input performance inferior to that of a standard keyboard. The results could serve as a reference for the development of related products in industry, and could be applied more broadly to touch devices and other input interfaces with which people interact.
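The abstract names Taguchi signal-to-noise ratios without giving them. The sketch below is hedged. It uses the standard larger-the-better and smaller-the-better S/N formulas, plus a main-effects step that picks, for each factor, the level with the highest mean S/N across the fractional factorial runs. The function names are hypothetical, and the example runs in the test are made up, not the study's data.

```python
import math
import statistics
from collections import defaultdict


def sn_larger_is_better(values):
    """Taguchi S/N ratio (dB) when larger responses are better, e.g. accuracy or speed."""
    return -10.0 * math.log10(statistics.mean(1.0 / (y * y) for y in values))


def sn_smaller_is_better(values):
    """Taguchi S/N ratio (dB) when smaller responses are better, e.g. error count."""
    return -10.0 * math.log10(statistics.mean(y * y for y in values))


def best_levels(runs):
    """runs: list of (levels, sn_db) pairs, where levels maps factor -> level.

    Returns the level of each factor with the highest mean S/N ratio
    (a main-effects analysis over the fractional factorial runs).
    """
    by_factor = defaultdict(lambda: defaultdict(list))
    for levels, sn in runs:
        for factor, level in levels.items():
            by_factor[factor][level].append(sn)
    return {
        factor: max(sn_by_level, key=lambda lv: statistics.mean(sn_by_level[lv]))
        for factor, sn_by_level in by_factor.items()
    }
```

With several responses, as in this study (accuracy, speed, and subjective ratings), each response would get its own S/N analysis. Conflicting optima would then have to be reconciled, for example with the MANOVA results.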

Keywords: input performance, mobile device, slim keyboard, tactile feedback

Procedia PDF Downloads 273

16580 Hydrotherapy with Dual Sensory Impairment (Dsi)-Deaf and Blind

Authors: M. Warburton

Abstract:

Background: Case study examining hydrotherapy for a person with DSI, a 46-year-old woman left completely deaf and blind by congenital rubella syndrome. Touch becomes the primary information-gathering sense for optimising function in life. Communication is achieved via tactile finger spelling and signals on her hand and skin. Hydrotherapy may provide a suitable environment for mobility and somato-sensory input for people in general, and for DSI persons in particular. Buoyancy, warmth, hydrostatic pressure, viscosity, and turbulence are elements of hydrotherapy that may offer a DSI person somato-sensory input to stimulate the mechanoreceptors, thermoreceptors, and proprioceptors. Together they offer a unique hydro-therapeutic environment. Purpose: The purpose of this case study was to establish what measurable benefits could be achieved from hydrotherapy with a DSI person. Methods: Hydrotherapy was provided twice weekly for 8 weeks, in 35-minute sessions. The pool temperature was 32.5 degrees centigrade and the pool length 25 metres. Each session consisted of mobility encouragement and supervision, and activities to stimulate the somato-sensory system using the aquatic properties of buoyancy, turbulence, viscosity, warmth, and hydrostatic pressure. Somato-sensory activities focused on stimulating touch and tactile exploration, including objects of various shape, size, weight, contour, texture, elasticity, pliability, softness, and hardness. Outcomes were measured using the Goal Attainment Scale (GAS) and included mobility distance, attendance, and timed tactile responsiveness to varying objects. Results: Mobility distance and attendance exceeded baseline expectations. Timed tactile responsiveness to varying objects also changed positively from baseline. Average scale scores were 1.00, with an overall GAS T-score of 63.69. Conclusions: Hydrotherapy can be a quantifiable physiotherapeutic option for persons with DSI.
It provides a relatively safe environment for mobility and allows the somato-sensory system to be fully engaged, which is important for the DSI population. Implications: Hydrotherapy can be a measurable therapeutic option for a DSI person, and physiotherapists should consider it for DSI people. Hydrotherapy can offer the DSI population unique physical properties that are not available on land.
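The abstract reports an average score of 1.00 and a T-score of 63.69 but not the formula. As a hedged check, the sketch assumes the standard Kiresuk and Sherman GAS summary formula with the conventional inter-goal correlation rho = 0.3. It also treats the three outcome measures (mobility distance, attendance, and timed tactile responsiveness) as equally weighted goals, each scored +1. Under these assumptions the formula reproduces the reported value. The function name is hypothetical.

```python
import math


def gas_t_score(scores, weights=None, rho=0.3):
    """Goal Attainment Scaling summary T-score (Kiresuk and Sherman formula).

    scores: attainment level per goal, on the -2..+2 scale (0 = expected outcome)
    weights: optional per-goal weights (default: all 1)
    rho: assumed correlation between goal scores, conventionally 0.3
    """
    if weights is None:
        weights = [1.0] * len(scores)
    weighted_sum = sum(w * x for w, x in zip(weights, scores))
    denom = math.sqrt((1 - rho) * sum(w * w for w in weights) + rho * sum(weights) ** 2)
    return 50.0 + 10.0 * weighted_sum / denom
```

For three goals each scored +1, the result is 50 + 30 / sqrt(0.7 * 3 + 0.3 * 9) = 50 + 30 / sqrt(4.8), which is approximately 63.69. A T-score of 50 corresponds to every goal being met exactly as expected.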

Keywords: chronic, disability, disease, rehabilitation

Procedia PDF Downloads 318

16579 The Audio-Visual and Syntactic Priming Effect on Specific Language Impairment and Gender in Modern Standard Arabic

Authors: Mohammad Al-Dawoody

Abstract:

This study explores whether priming is affected by gender in Modern Standard Arabic and whether it is restricted to subjects without specific language impairment (SLI). The sample consists of 74 subjects aged between 11;1 and 11;10. They were distributed into (a) two SLI experimental groups of 38 subjects in total (18 females and 20 males) and (b) two non-SLI control groups of 36 subjects in total (17 females and 19 males). Employing a mixed research design, the researcher conducted this study within the framework of relevance theory (RT). The main assumption of RT is that human beings are endowed with a biological ability to maximize the relevance of incoming stimuli. Each of the four groups was given two different priming stimuli: audio-visual priming (T1) and syntactic priming (T2). The results showed that the priming effect was clearly distinct among SLI participants, especially when retrieving typical responses (TR) in T1 and T2, with a slight superiority of males over females. The results also revealed that non-SLI females showed stronger original response (OR) priming than males in T1, and that non-SLI males outperformed females in OR priming in T2. Furthermore, the results suggested that audio-visual priming has a stronger effect on SLI females than on non-SLI females. Syntactic priming, by contrast, seems to have the same effect on both groups of females (SLI and non-SLI). The conclusion is that the priming effect varies according to gender and is not confined to non-SLI subjects.

Keywords: specific language impairment, relevance theory, audio-visual priming, syntactic priming, modern standard Arabic

Procedia PDF Downloads 143

16578 A Parallel Computation Based on GPU Programming for a 3D Compressible Fluid Flow Simulation

Authors: Sugeng Rianto, P.W. Arinto Yudi, Soemarno Muhammad Nurhuda

Abstract:

Computing 3D compressible fluid flow for a virtual environment with haptic interaction is non-trivial. The main difficulty is achieving good performance and a balance between visualization, tactile feedback interaction, and computation. In this paper, we describe computation methods based on parallel programming on a GPU. The 3D fluid flow solvers were developed for smoke dispersion simulation by combining cubic interpolated propagation (CIP) based fluid flow solvers with the parallelism and programmability of the GPU. The fluid flow solver is built on a GPU-CPU message-passing scheme to enable rapid development of haptic feedback modes for fluid dynamic data. Applying CIP fluid flow solvers gives a rapid solution, and with this scheme multiphase fluid flow equations can be solved simultaneously. For further acceleration, the Navier-Stokes equations (NSEs) are packed into texel channels, and the computation is performed on pixels that can be treated as a grid of cells. Therefore, despite the complexity of the obstacle geometry, multiple vertices and pixels can be processed simultaneously in parallel. The data are also shared in global memory so that the CPU can control the haptic device, providing kinaesthetic interaction and feeling. The results show that GPU-based parallel computation provides effective simulation of a compressible fluid flow model for real-time interaction in 3D computer graphics on a PC platform. This report demonstrates the feasibility of a new approach to solving the compressible fluid flow equations on the GPU. The experimental tests showed that compressible fluid flowing around various obstacles, with haptic interaction on a few model obstacles, can be simulated effectively and efficiently at a reasonable frame rate with realistic visualization.
These results confirm that good performance and a balance between visualization, tactile feedback interaction, and computation can be achieved.
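The abstract names the CIP scheme but gives no equations. As a hedged sketch, the function below implements the standard one-dimensional CIP advection step (Yabe's cubic interpolated propagation) for f_t + u f_x = 0 in NumPy on the CPU, assuming a constant velocity u and periodic boundaries. It is not the authors' 3D GPU solver; the whole-array operations only stand in for the per-cell work a GPU would do per texel. The function name and interface are hypothetical. With a spatially varying velocity, the derivative update would need an extra -g du/dx term, which is omitted here.

```python
import numpy as np


def cip_advect(f, g, u, dx, dt):
    """One CIP step for f_t + u f_x = 0 with constant u on a periodic grid.

    f: values at grid nodes; g: their spatial derivatives df/dx.
    A cubic through node i and its upwind neighbour, matching both values
    and derivatives, is evaluated at the departure point x_i - u*dt.
    """
    s = 1.0 if u >= 0 else -1.0
    shift = 1 if u >= 0 else -1  # upwind neighbour is i-1 for u>0, i+1 for u<0
    f_up = np.roll(f, shift)
    g_up = np.roll(g, shift)
    D = -s * dx   # signed offset from node i to its upwind node
    xi = -u * dt  # signed offset from node i to the departure point
    a = (g + g_up) / D**2 + 2.0 * (f - f_up) / D**3
    b = 3.0 * (f_up - f) / D**2 - (2.0 * g + g_up) / D
    f_new = ((a * xi + b) * xi + g) * xi + f  # cubic profile at xi
    g_new = (3.0 * a * xi + 2.0 * b) * xi + g  # its derivative at xi
    return f_new, g_new
```

At a Courant number of 1 (u*dt = dx), the cubic is evaluated exactly at the upwind node. One step should then shift both f and its derivative by one cell, which makes a convenient correctness check.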

Keywords: CIP, compressible fluid, GPU programming, parallel computation, real-time visualisation

Procedia PDF Downloads 403

16577 A Comparison of Proxemics and Postural Head Movements during Pop Music versus Matched Music Videos

Authors: Harry J. Witchel, James Ackah, Carlos P. Santos, Nachiappan Chockalingam, Carina E. I. Westling

Abstract:

Introduction: Proxemics is the study of how people perceive and use space. It is commonly proposed that when people like or engage with a person or object, they move slightly closer to it, often subtly and subconsciously. Music videos are known to add entertainment value to a pop song. Our hypothesis was that adding an appropriately matched video to a pop song would produce a net approach of the head towards the monitor screen, compared with simply listening to an audio-only version of the song. Methods: We presented two musical stimuli, in counterbalanced order, to 27 participants (ages 21.00 ± 2.89, 15 female) seated in front of a 47.5 × 27 cm monitor. Both stimuli were based on music videos by the band OK Go. The first was Here It Goes Again (HIGA; boredom rating (0-100) = 15.00 ± 4.76, mean ± SEM, standard error of the mean). The second was Do What You Want (DWYW; boredom rating = 23.93 ± 5.98). The two did not differ in the boredom they elicited (P = 0.21, rank-sum test). Each participant experienced each song only once: one song (counterbalanced) as audio-only and the other as a music video. Movement was measured by video tracking with Kinovea 0.8, using a recording made from a lateral aspect. Before beginning, each participant had a reflective motion-tracking marker placed on the outer canthus of the left eye. The Kinovea X-Y coordinate output, in comma-separated values (CSV) format, was analysed in Matlab, which was also used for the non-parametric statistical tests. Results: The audio-only stimuli (combined for both HIGA and DWYW, mean ± SEM, 35.71 ± 5.36) were significantly more boring than the music video versions (19.46 ± 3.83, P = 0.0066, Wilcoxon Signed Rank Test (WSRT), Cohen's d = 0.658, N = 28). Participants' heads also moved around twice as much during the audio-only versions (speed = 0.590 ± 0.095 mm/sec) as during the video versions (0.301 ± 0.063 mm/sec, P = 0.00077, WSRT).
However, the participants' mean head-to-screen distances were not detectably smaller (i.e., head closer to the screen) during the music videos (74.4 ± 1.8 cm) than during the audio-only stimuli (73.9 ± 1.8 cm, P = 0.37, WSRT). If anything, participants were slightly closer during the audio-only condition. Interestingly, the ranges of head-to-screen distances were smaller during the music video (8.6 ± 1.4 cm) than during the audio-only condition (12.9 ± 1.7 cm, P = 0.0057, WSRT), and the standard deviations were also smaller (P = 0.0027, WSRT). Heads were also held 7 mm higher (video 116.1 ± 0.8 vs. audio-only 116.8 ± 0.8 cm above floor, P = 0.049, WSRT). Discussion: As predicted, sitting and listening to experimenter-selected pop music was more boring than listening to the same music accompanied by a matched, professionally made video. However, we did not find that the proxemics of the situation led participants to approach the screen. Instead, adding video led to efforts to hold the head in a more central and upright viewing position and to suppress head fidgeting.
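The study analysed the Kinovea X-Y output in Matlab. Below is a minimal Python stand-in for the movement and distance metrics. It assumes a CSV with hypothetical columns t (s), x and y (calibrated to cm in the lateral camera view) and a screen plane at a fixed x coordinate. Kinovea's actual export layout may differ, and the column names, the screen_x parameter, and the metric names are assumptions for illustration. Speed is total marker path length divided by elapsed time, in coordinate units per second.

```python
import csv
import io
import math
import statistics


def head_metrics(csv_text, screen_x):
    """Head-movement metrics from a marker track in the lateral view.

    csv_text: CSV with columns t, x, y (hypothetical names; x and y in cm).
    screen_x: x coordinate of the screen plane in the same calibrated frame.
    """
    rows = [(float(r["t"]), float(r["x"]), float(r["y"]))
            for r in csv.DictReader(io.StringIO(csv_text))]
    if len(rows) < 2:
        raise ValueError("need at least two tracked samples")
    # Total path length of the marker between consecutive samples.
    path = sum(math.hypot(x1 - x0, y1 - y0)
               for (_, x0, y0), (_, x1, y1) in zip(rows, rows[1:]))
    duration = rows[-1][0] - rows[0][0]
    dist = [abs(screen_x - x) for _, x, _ in rows]  # head-to-screen distance
    return {
        "mean_speed": path / duration,
        "mean_distance": statistics.mean(dist),
        "distance_range": max(dist) - min(dist),
        "distance_sd": statistics.stdev(dist),
    }
```

Per-participant metrics from the two conditions could then be compared with a paired non-parametric test such as the Wilcoxon signed-rank test, as in the study.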

Keywords: boredom, engagement, music videos, posture, proxemics

Procedia PDF Downloads 139

16576 Audio-Visual Co-Data Processing Pipeline

Authors: Rita Chattopadhyay, Vivek Anand Thoutam

Abstract:

Speech is the most widely accepted means of communication, allowing us to exchange feelings and thoughts quickly. Quite often, people can communicate orally but cannot interact or work with computers or devices. It is easier and quicker to give spoken commands to computers than to type them. Likewise, it is easier to listen to audio played from a device than to extract output from computers or devices. Robotics is an emerging market, with applications in warehouses, the hospitality industry, consumer electronics, assistive technology, etc., and speech-based human-machine interaction is becoming a lucrative feature for robot manufacturers. With this in mind, the objective of this paper is to design the “Audio-Visual Co-Data Processing Pipeline.” This pipeline integrates automatic speech recognition, a natural language model for text understanding, object detection, and text-to-speech modules. There are many deep learning models for each of these module types, but OpenVINO Model Zoo models are used because the OpenVINO toolkit covers both computer vision and non-computer-vision workloads across Intel hardware, maximizes performance, and accelerates application development. The input is a speech command specifying the target objects to be detected and the start and end times of the interval to be extracted from the video. Speech is converted to text using the QuartzNet automatic speech recognition model. A summary is extracted from the text using the Generative Pre-Trained Transformer-3 (GPT-3) natural language model. Based on the summary, the essential frames are extracted from the video, and the You Only Look Once (YOLO) object detection model detects objects in these frames. The numbers of the frames that contain the target objects (the objects specified in the speech command) are saved as text.
Finally, this text (the frame numbers) is converted to speech using a text-to-speech model and played from the device. The project supports the 80 YOLO labels, and the user can extract frames based on one or two target labels. The pipeline can easily be extended to more than two target labels by making appropriate changes in the object detection module. Four different speech command formats are supported by including examples of each in the prompt used by the GPT-3 model. Users can define a new speech command format by including a few examples of that format in the prompt. This pipeline can be used in many projects, such as human-machine interfaces, human-robot interaction, and surveillance through speech commands. Any object detection project can be upgraded with this pipeline, so that users can give speech commands and hear the output played from the device.
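The frame-selection step between the object detector and the text-to-speech module can be sketched independently of the models. The sketch below is hedged. The QuartzNet, GPT-3, YOLO, and text-to-speech stages are not shown: the detector is passed in as a hypothetical callable that returns the labels found in a frame. Whether a frame must contain all targets or any target is not stated in the abstract, so that choice is exposed as a parameter. All names here are hypothetical.

```python
from typing import Callable, Iterable, List, Set


def select_frames(detect: Callable[[int], Set[str]], fps: float,
                  start_s: float, end_s: float, targets: Iterable[str],
                  require_all: bool = True) -> List[int]:
    """Frame numbers in [start_s, end_s] whose detected labels match the targets.

    detect(frame_no) stands in for the object detector (e.g. YOLO) and
    returns the set of class labels found in that frame.
    """
    wanted = set(targets)
    first, last = int(start_s * fps), int(end_s * fps)
    hits = []
    for frame_no in range(first, last + 1):
        labels = detect(frame_no)
        # all targets present, or at least one of them
        matched = wanted <= labels if require_all else bool(wanted & labels)
        if matched:
            hits.append(frame_no)
    return hits


def frames_to_sentence(frames: List[int]) -> str:
    """Text handed to the text-to-speech stage."""
    if not frames:
        return "No target objects found."
    return "Target objects found in frames " + ", ".join(map(str, frames)) + "."
```

Keeping the detector behind a callable makes it straightforward to swap models, or to extend beyond two target labels, without touching the selection logic.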

Keywords: OpenVINO, automatic speech recognition, natural language processing, object detection, text to speech

Procedia PDF Downloads 49

16575 A Guide to the Implementation of Ambisonics Super Stereo

Authors: Alessio Mastrorillo, Giuseppe Silvi, Francesco Scagliola

Abstract:

In this work, we introduce an Ambisonics decoder with an implementation of the C-format, also called Super Stereo. This format is an alternative to conventional stereo and binaural decoding. Unlike those formats, it conveys audio information from the horizontal plane and works with both stereo speakers and headphones. The two C-format channels can also be used to reconstruct a planar B-format. This work provides an open-source implementation of this format. We implement an all-pass filter for signal quadrature (a 90° phase relationship), as required by the decoding equations. The filter is a cascade of six biquads, with control frequencies and quality factors determined experimentally. The phase response of the filter shows a small error in the 20 Hz to 14 kHz range. The decoder has been tested with audio sources at sample rates of up to 192 kHz, delivering pristine sound quality and a detailed stereo image. It has been included in the Envelop for Live suite and is available as an open-source repository. The decoder has applications in virtual reality and 360° audio production, music composition, and online streaming.
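The abstract specifies a cascade of six biquad all-pass sections but not their coefficients, which were found experimentally. As a hedged sketch, each section below uses the second-order all-pass from Robert Bristow-Johnson's Audio EQ Cookbook. The helpers compute the cascade's frequency response and filter a signal (transposed direct form II). The centre frequencies and Q values used in the test are placeholders, not the authors' values. How the sections are arranged to reach the 90° phase relationship the decoder needs is also not shown. The helper names are hypothetical.

```python
import cmath
import math


def allpass_biquad(f0, q, fs):
    """RBJ cookbook second-order all-pass; returns normalized (b, a) with a[0] = 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cosw = math.cos(w0)
    a0 = 1.0 + alpha
    b = ((1.0 - alpha) / a0, -2.0 * cosw / a0, (1.0 + alpha) / a0)
    a = (1.0, -2.0 * cosw / a0, (1.0 - alpha) / a0)
    return b, a


def cascade_response(sections, f, fs):
    """Complex frequency response of the cascade at frequency f (Hz)."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1 on the unit circle
    h = 1.0 + 0j
    for b, a in sections:
        h *= (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return h


def process(sections, x):
    """Filter a sequence through the cascade, one section at a time."""
    y = list(x)
    for b, a in sections:
        s1 = s2 = 0.0  # transposed direct form II state
        out = []
        for xn in y:
            yn = b[0] * xn + s1
            s1 = b[1] * xn - a[1] * yn + s2
            s2 = b[2] * xn - a[2] * yn
            out.append(yn)
        y = out
    return y
```

Each section has unit magnitude at every frequency, so only the phase changes. Phase-error curves like the one the authors report over 20 Hz to 14 kHz can be obtained from cmath.phase(cascade_response(...)).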

Keywords: ambisonics, UHJ, quadrature filter, virtual reality, Gerzon, decoder, stereo, binaural, biquad

Procedia PDF Downloads 64

16574 Subtitled Based-Approach for Learning Foreign Arabic Language

Authors: Elleuch Imen

Abstract:

In this paper, we propose a new approach to learning Arabic as a foreign language via audio-visual translation, particularly subtitling. The approach consists of developing video sequences appropriate to different levels of learning (from A1 to C2), containing conversations, quizzes, games, and other material. Each video aims to achieve a specific objective, such as the correct pronunciation of Arabic words, the correct syntactic structuring of Arabic sentences, the recognition of the morphological characteristics of terms, or the semantic understanding of statements. The resulting subtitled videos can be incorporated into various tools for learning Arabic as a second language, such as MOOCs, websites, and platforms.

Keywords: Arabic foreign language, learning, audio-visual translation, subtitled videos

Procedia PDF Downloads 30

16573 Method Comprising One to One Web Based Real Time Communications

Authors: Lata Kiran Dey, Rajendra Kumar, Biren Karmakar

Abstract:

Web Real-Time Communication (WebRTC) is a collection of standards and protocols that provides real-time communication capabilities between web browsers and devices. This paper outlines the design and implementation of WebRTC in secure web applications with audio and video call capabilities. The proposed application is designed to work in both desktop and mobile browsers. WebRTC also provides a set of standard JavaScript RTC APIs that operate on top of this real-time communication framework. These APIs are used to build a communication application that enables audio, video, and message transfer between modern browsers that support WebRTC.
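WebRTC leaves signalling to the application. Before media can flow, the two browsers must exchange SDP offers and answers and ICE candidates over a channel the application chooses, such as the secure WebSockets listed in the keywords. The class below is a hypothetical, transport-agnostic sketch of the core of a one-to-one signalling relay. The JSON message schema, the class name, and the send callables are assumptions, not the authors' design or any library's API. The media itself normally travels directly between the peers, encrypted with SRTP, rather than through this relay.

```python
import json


class OneToOneRoom:
    """In-memory signalling relay for a single two-party call (hypothetical sketch)."""

    RELAYED = {"offer", "answer", "candidate", "bye"}

    def __init__(self):
        self.peers = {}  # peer_id -> send(str) callable, e.g. a WebSocket send

    def join(self, peer_id, send):
        if peer_id not in self.peers and len(self.peers) >= 2:
            raise RuntimeError("room full: one-to-one calls only")
        self.peers[peer_id] = send

    def leave(self, peer_id):
        self.peers.pop(peer_id, None)

    def handle(self, sender_id, raw):
        """Forward one JSON signalling message to the other peer unchanged."""
        if sender_id not in self.peers:
            raise KeyError(f"unknown sender: {sender_id!r}")
        msg = json.loads(raw)
        if msg.get("type") not in self.RELAYED:
            raise ValueError(f"unsupported message type: {msg.get('type')!r}")
        for peer_id, send in self.peers.items():
            if peer_id != sender_id:
                send(json.dumps(msg))
```

In a deployment, each send callable would wrap a secure WebSocket connection. The relay never inspects the SDP, so it stays independent of codecs and media settings.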

Keywords: WebRTC, SIP, RTC, JavaScript, SRTP, secure web sockets, browser

Procedia PDF Downloads 109

16572 Assessment of Post-surgical Donor-Site Morbidity in Vastus lateralis Free Flap for Head and Neck Reconstructive Surgery: An Observational Study

Authors: Ishith Seth, Lyndel Hewitt, Takako Yabe, James Wykes, Jonathan Clark, Bruce Ashford

Abstract:

Background: The vastus lateralis (VL) can be used to reconstruct defects of the head and neck. Whilst its advantages are documented, donor-site morbidity is not well described. This study aimed to assess donor-site morbidity after VL flap harvest. The results will determine future directions for preventive and post-operative care to improve patient health outcomes. Methods: Ten participants (mean age 55 years) were assessed for donor-site morbidity after VL harvest. Three sets of outcomes were assessed using validated, standardized procedures: musculoskeletal outcomes (pain, muscle strength, muscle length, tactile sensation), quality of life (SF-12), and lower limb function (lower extremity function, gait function and speed, and sit-to-stand). Outcomes were compared with age-matched healthy reference values or with the non-operated side. Analyses were conducted using descriptive statistics and non-parametric tests. Results: There was no difference in muscle strength (knee extension), muscle length, ability to sit-to-stand, or gait function (all P > 0.05). Knee flexor muscle strength was significantly lower in the operated leg than in the non-operated leg (P = 0.02), and walking speed was slower than age-matched healthy values (P < 0.001). Thigh tactile sensation was impaired in 89% of participants. Quality of life was significantly lower for the physical health component of the SF-12 (P < 0.001), whereas the mental health component was similar to that of healthy controls (P = 0.26). Conclusion: VL harvest had no effect on donor-site knee extensor strength, pain, walking function, ability to sit-to-stand, or muscle length. It did affect donor-site knee flexion strength, walking speed, tactile sensation, and physical health-related quality of life.

Keywords: vastus lateralis, morbidity, head and neck, surgery, donor-site morbidity

Procedia PDF Downloads 212

16571 Teaching Speaking Skills to Adult English Language Learners through ALM

Authors: Wichuda Kunnu, Aungkana Sukwises

Abstract:

The audio-lingual method (ALM) is a teaching approach that has been claimed to be ineffective for teaching second/foreign languages, because some linguists and second/foreign language teachers believe that ALM is a rote learning style. However, this study was conducted in the belief that ALM can solve Thai learners’ English-speaking problems. This paper reports the findings of teaching English speaking to adult learners with an “adapted ALM”, one distinctive feature of which is the use of Thai as the language of instruction. The participants were 9 adult learners. They were allowed to speak English more freely, using both the materials presented in class and their background knowledge of English. By the end of the course, they spoke English more fluently and more confidently, to the extent that they applied what they had learnt both in and outside the class.

Keywords: teaching English, audio lingual method, cognitive science, psychology

Procedia PDF Downloads 387

16570 Parkinson’s Disease Hand-Eye Coordination and Dexterity Evaluation System

Authors: Wann-Yun Shieh, Chin-Man Wang, Ya-Cheng Shieh

Abstract:

This study aims to develop an objective scoring system to evaluate hand-eye coordination and hand dexterity in Parkinson’s disease. The system consists of three boards, each equipped with sensors that detect a user’s finger operations. The operations include the peg test, the block test, and the blind block test. Users must use their vision, hearing, and tactile abilities to complete these operations, and the boards record the results automatically. These results can help physicians evaluate a user’s reaction, coordination, and dexterity. The results are collected in a cloud database for further analysis and statistics, and researchers can use the system to obtain systematic, graphical reports for an individual or a group of users. In particular, a deep learning model is developed to learn the features of the data from different users. This model is intended to help physicians assess Parkinson’s disease symptoms with a more intelligent algorithm.

Keywords: deep learning, hand-eye coordination, reaction, hand dexterity

Procedia PDF Downloads 37

16569 Analysis of Tactile Perception of Textiles by Fingertip Skin Model

Authors: Izabela L. Ciesielska-Wrόbel

Abstract:

This paper presents finite element models of the fingertip skin, created to simulate the contact of textile objects with the skin in order to better understand how textiles are perceived through the skin, the so-called Hand of Textiles (HoT). Many objective and subjective techniques have been developed to analyze HoT; however, none of them provides complete information on the sensation of textiles through the skin. Because human skin is a complex, heterogeneous, hyperelastic body composed of many constituents, some simplifications had to be made when building the models. The same applies to the models of woven structures; however, their practical value was maintained. The models reflect only the friction between the skin and woven textiles, the deformation of the skin and fabrics when “touching” textiles, and heat transfer from the skin surface toward the textiles.
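To make "hyperelastic" concrete, the sketch below evaluates the uniaxial stress-stretch response of an incompressible neo-Hookean solid, one of the simplest hyperelastic models. The abstract does not say which constitutive law the skin models use. The neo-Hookean choice and any shear-modulus value passed in are illustrative assumptions only.

```python
def neo_hookean_uniaxial(stretch: float, shear_modulus: float) -> tuple[float, float]:
    """Uniaxial response of an incompressible neo-Hookean solid (illustrative model).

    Strain energy W = (mu / 2) * (I1 - 3); incompressibility gives lateral
    stretches of 1 / sqrt(lambda). Returns (nominal_stress, cauchy_stress).
    """
    if stretch <= 0:
        raise ValueError("stretch ratio must be positive")
    lam, mu = stretch, shear_modulus
    nominal = mu * (lam - 1.0 / lam ** 2)   # force per undeformed area
    cauchy = mu * (lam ** 2 - 1.0 / lam)    # force per deformed area (= lam * nominal)
    return nominal, cauchy


# Normalized (hypothetical) modulus mu = 1: stress is zero at lambda = 1 and nonlinear in lambda.
for lam in (0.8, 1.0, 1.2, 1.5):
    print(lam, neo_hookean_uniaxial(lam, 1.0))
```

The nonlinear stress-stretch relation is what separates such a model from linear elasticity. A finite element code evaluates the same strain-energy function at every integration point.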

Keywords: fingertip skin models, finite element models, modelling of textiles, sensation of textiles through the skin

Procedia PDF Downloads 438

16568 A Qualitative Study on Metacognitive Patterns among High and Low Performance Problem Based on Learning Groups

Authors: Zuhairah Abdul Hadi, Mohd Nazir bin Md. Zabit, Zuriadah Ismail

Abstract:

Metacognition has been empirically shown to be an important element influencing learning outcomes. Expert learners engage in metacognition by monitoring and controlling their thinking, and by listing, considering, and selecting the best strategies to achieve desired goals. Studies have also found that good critical thinkers engage in more metacognition and that people tend to activate more metacognition when solving complex problems. This study extends past studies by performing a qualitative analysis of metacognitive patterns in two high-performing and two low-performing groups, carefully examining video and audio recordings taken during problem-based learning activities. High-performing groups are groups in which the majority of members scored well in the Watson-Glaser II Critical Thinking Appraisal (WGCTA II) and academic achievement tests. Low-performing groups are groups in which the majority of members performed poorly in the two tests. Audio recordings are transcribed and analyzed using schemas adopted from past studies. Metacognitive statements are analyzed using a three-stage model, and metacognitive patterns are described by context, component, and level for each high- and low-performing group.

Keywords: academic achievement, critical thinking, metacognitive, problem-based learning

Procedia PDF Downloads 254

16567 Semi-Supervised Learning for Spanish Speech Recognition Using Deep Neural Networks

Authors: B. R. Campomanes-Alvarez, P. Quiros, B. Fernandez

Abstract:

Automatic Speech Recognition (ASR) is a machine-based process of decoding and transcribing oral speech. A typical ASR system receives acoustic input from a speaker or an audio file, analyzes it using algorithms, and produces text as output. Some speech recognition systems use Hidden Markov Models (HMMs) to deal with the temporal variability of speech and Gaussian Mixture Models (GMMs) to determine how well each state of each HMM fits a short window of frames of coefficients that represents the acoustic input. Another way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks. Acoustic models for state-of-the-art ASR systems are usually trained on massive amounts of data. However, audio files with their corresponding transcriptions can be difficult to obtain, especially in Spanish. In such low-resource scenarios, building an ASR model is a complex task because the lack of labeled data results in an under-trained system. Semi-supervised learning approaches therefore become necessary, given the high cost of transcribing audio data. The main goal of this proposal is to develop a procedure based on acoustic semi-supervised learning for Spanish ASR systems using DNNs. This semi-supervised learning approach consists of: (a) Training a seed ASR model with a DNN using a set of audio recordings and their respective transcriptions. A DNN with one hidden layer was initialized, and the number of hidden layers was increased to five during training. A refinement of the weight matrix plus bias term and Stochastic Gradient Descent (SGD) training were also performed. The objective function was the cross-entropy criterion. (b) Decoding/testing a set of unlabeled data with the obtained seed model. (c) Selecting a suitable subset of the validated data to retrain the seed model, thereby improving its performance on the target test set. To choose the most accurate transcriptions, three lattice-based confidence scores (based on the graph cost, the acoustic cost, and a combination of both) were used as the selection technique. The performance of the ASR system is measured by the Word Error Rate (WER). The test dataset was renewed in order to extract the new transcriptions added to the training dataset. Several experiments were carried out to select the best ASR results. A comparison between a GMM-based model without retraining and the proposed DNN system was also made under the same conditions. Results showed that the semi-supervised DNN-based ASR model outperformed the GMM model, in terms of WER, in all tested cases. The best result was a relative WER improvement of 6%. These promising results suggest that the proposed technique could be suitable for building ASR models in low-resource environments.
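The WER cited above is the word-level edit distance (substitutions, deletions, and insertions) between a hypothesis and its reference, divided by the number of reference words. The sketch below computes WER and the relative improvement between two WER values. It also shows a simple confidence-threshold filter in the spirit of step (c). The per-utterance confidence values and the threshold are hypothetical stand-ins: the lattice-based scores (graph cost, acoustic cost, or their combination) are not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, e.g. 0.06 means 6% relative."""
    return (baseline_wer - new_wer) / baseline_wer


def select_confident(decoded, threshold):
    """Keep automatically transcribed utterances whose (hypothetical) confidence >= threshold.

    decoded: iterable of (utterance_id, transcript, confidence) tuples.
    """
    return [(utt, text) for utt, text, conf in decoded if conf >= threshold]
```

The selected pairs would be added to the training set before retraining the seed model. How the threshold is chosen is a design decision that the abstract does not specify.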

Keywords: automatic speech recognition, deep neural networks, machine learning, semi-supervised learning

Procedia PDF Downloads 317

16566 Finite Element Method Analysis of Occluded-Ear Simulator and Natural Human Ear Canal

Authors: M. Sasajima, T. Yamaguchi, Y. Hu, Y. Koike

Abstract:

In this paper, we discuss the propagation of sound in the narrow pathways of an occluded-ear simulator typically used for measuring insert-type earphones. The simulator has a standardized frequency response conforming to the international standard IEC 60318-4. In narrow pathways, the speed and phase of sound waves are modified by viscous air damping. In our previous paper, we proposed a new finite element method (FEM) that considers the effects of air viscosity in this type of audio equipment. In this study, we compare the results from the ear simulator FEM model and from a three-dimensional human ear canal FEM model built from computed tomography images with frequency response data measured in the ear canals of 18 people.
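One way to see why viscosity matters in narrow pathways is to estimate the acoustic viscous boundary-layer thickness, δ_v = sqrt(2ν/ω). The sketch below uses approximate properties of air at about 20 °C. These values are assumptions and are not taken from the paper. The sketch is not the authors' FEM formulation.

```python
import math


def viscous_boundary_layer(frequency_hz: float,
                           dynamic_viscosity: float = 1.81e-5,  # Pa*s, air ~20 degC (assumed)
                           density: float = 1.204) -> float:     # kg/m^3, air ~20 degC (assumed)
    """Acoustic viscous boundary-layer thickness delta_v = sqrt(2 * nu / omega), in metres."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    nu = dynamic_viscosity / density       # kinematic viscosity, m^2/s
    omega = 2.0 * math.pi * frequency_hz   # angular frequency, rad/s
    return math.sqrt(2.0 * nu / omega)


for f in (100, 1000, 10000):
    print(f"{f:>6} Hz: {viscous_boundary_layer(f) * 1e3:.3f} mm")
```

The layer is a fraction of a millimetre at audio frequencies and shrinks as 1/sqrt(f). In a pathway whose width is comparable to this thickness, viscous losses affect a large part of the cross-section and cannot be neglected.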

Keywords: ear simulator, FEM, viscosity, human ear canal

Procedia PDF Downloads 376

16565 An Automated Bender Element System Used for S-Wave Velocity Tomography during Model Pile Installation

Authors: Yuxin Wu, Yu-Shing Wang, Zitao Zhang

Abstract:

A high-speed, time-lapse S-wave velocity measurement system has been built for S-wave tomography in sand. The system is based on bender elements and is applied to model pile tests in a tailor-made pressurized chamber to monitor the shear wave velocity distribution during pile installation in sand. Tactile pressure sensors are used in parallel with the bender elements to monitor stress changes during the tests. Strain gages are used to monitor the shaft resistance and toe resistance of the pile. The shear wave velocity (Vs) is determined by the shear modulus of the sand, and the shaft resistance of the pile is also influenced by the shear modulus of the sand around the pile. The study therefore has two purposes: to monitor, in time-lapse, the change in S-wave velocity distribution at a given horizontal section during pile installation, and to correlate the S-wave velocity distribution with the shaft resistance of the pile in sand.
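In bender element testing, Vs is commonly taken as the tip-to-tip distance between transmitter and receiver divided by the picked S-wave arrival time. The small-strain shear modulus then follows from G = ρVs², which is the link between Vs and shear modulus that the abstract relies on. The sketch below assumes the arrival time has already been picked from the received signal. The distance, time, and density in the example are hypothetical.

```python
def shear_wave_velocity(tip_to_tip_m: float, arrival_time_s: float) -> float:
    """Vs = L_tt / t_s, with L_tt the tip-to-tip distance between bender elements."""
    if arrival_time_s <= 0:
        raise ValueError("arrival time must be positive")
    return tip_to_tip_m / arrival_time_s


def small_strain_shear_modulus(density_kg_m3: float, vs_m_s: float) -> float:
    """G = rho * Vs**2, in pascals."""
    return density_kg_m3 * vs_m_s ** 2


# Hypothetical path: 0.10 m tip-to-tip, 0.5 ms arrival time, sand density 1650 kg/m^3.
vs = shear_wave_velocity(0.10, 0.5e-3)
g = small_strain_shear_modulus(1650.0, vs)
print(f"Vs = {vs:.0f} m/s, G = {g / 1e6:.1f} MPa")
```

For tomography, the same travel-time calculation is repeated for many transmitter-receiver pairs crossing the section. The resulting set of travel times is then inverted for a velocity map.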

Keywords: bender element, pile, shaft resistance, shear wave velocity, tomography

Procedia PDF Downloads 387

16564 Online Delivery Approaches of Post Secondary Virtual Inclusive Media Education

Authors: Margot Whitfield, Andrea Ducent, Marie Catherine Rombaut, Katia Iassinovskaia, Deborah Fels

Abstract:

In North America, learning how to create inclusive media, such as closed captioning (CC) and audio description (AD), is restricted to proprietary, company-based training in the private sector. We are delivering, through synchronous and asynchronous online learning, the first Canadian post-secondary, practice-based continuing education course package in inclusive media for broadcast production and processes. Although CC and AD are commonly taught within translation studies in Europe, North America has no comparable field of study. This novel approach to audiovisual translation (AVT) education develops evidence-based methodological innovations stemming from user studies with blind/low vision and Deaf/hard of hearing audiences for television and theatre, undertaken at Ryerson University. Knowledge outcomes from the courses include: a) understanding how CC/AD fit within disability and regulatory frameworks in Canada; b) knowledge of how CC/AD could be employed in the initial stages of production development within broadcasting; c) writing and/or speaking techniques designed for media; d) hands-on practice in captioning re-speaking techniques and open-source technologies, or in AD techniques; e) understanding of audio production technologies and editing techniques. The case study of the curriculum development and deployment, involving first-time online course delivery by academic and practitioner-based instructors in the introductory Captioning and Audio Description courses (CDIM 101 and 102), will compare the two instructors' approaches to learning design. The comparison covers the ratio of synchronous to asynchronous classroom time and engagement tools on the meeting software platform, such as breakout rooms and polling. Student reception of these two approaches will be analysed using qualitative thematic analysis and quantitative survey analysis. Thus far, anecdotal conversations with students suggest that they prefer synchronous to asynchronous learning within our hands-on online course delivery method.

Keywords: inclusive media theory, broadcasting practices, AVT post secondary education, respeaking, audio description, learning design, virtual education

Procedia PDF Downloads 161

16563 Getting Out of the Box: Tangible Music Production in the Age of Virtual Technological Abundance

Authors: Tim Nikolsky

Abstract:

This paper explores the different ways in which music producers choose to embrace various levels of technology based on musical values, objectives, affordability, access, and workflow benefits. The current digital audio production workflow is questioned. Engineers and music producers today are increasingly divorced from the tangibility of music production. Making music no longer requires you to reach over and turn a knob. Ideas of authenticity in music production are being redefined. Calculations from the mathematical algorithm with the pretty pictures are increasingly being chosen over hardware containing transformers and tubes. Are mouse clicks and movements equivalent or inferior to the master brush strokes we are seeking to conjure? We are making audio production decisions visually, by constantly looking at a screen rather than listening. Have we compromised our musical objectives and values by removing the ‘hands-on’ nature of music making? DAW interfaces are making our musical decisions for us, not necessarily in our best interests. Technological innovation has presented opportunities as well as challenges for education. What do music production students actually need to learn in a formal education environment, and to what extent do they need to know it? In this brave new world of omnipresent music creation tools, do we still need tangibility in music production? Interviews with prominent Australian music producers who work in a variety of fields will be featured in this paper. They will provide insight into these questions and move towards an understanding of how tangibility can be rediscovered in the next generation of music production.

Keywords: analogue, digital, digital audio workstation, music production, plugins, tangibility, technology, workflow

Procedia PDF Downloads 245

16562 Interactive Fun Activities for Blind and Sighted Teenagers

Authors: Haif Alharthy, Samar Altarteer

Abstract:

Blind and sighted teenagers might find it challenging to communicate and interact enjoyably with each other. Previous studies emphasize the importance of interactive communication between blind and sighted people in developing the interpersonal and social skills of blind people. Playing games is one effective way to engage blind and sighted people with each other and to help enhance the social skills of blind people. However, it is difficult to find a fun game designed to encourage interaction between blind and sighted teenagers. Such a game should let blind players play independently, without help, and should have a design that sighted players find attractive and satisfying. The aim of this paper is to examine how challenging it is for blind and sighted people to interact enjoyably, and to offer an interactive tabletop game solution in which both can participate independently and have fun. The paper discusses the importance and impact of fun, interactive communication between blind and sighted people and how to involve them with each other through games. The paper investigates several approaches to designing a universal game. A survey was conducted among family members of blind teenagers to discover what difficulties they face while playing and communicating with their blind family member, and to identify the blind teenagers’ needs and interests in games. The study reveals that although families like to play tabletop games with their blind family member, they have difficulty finding universal games that are interesting and suitable for both. In addition, qualitative interviews conducted with blind teenagers show a shortage of tabletop games that can be played without help from another family member. The results suggest that an effective approach is to develop an interactive tabletop game embedded with audio and tactile techniques. The findings of the pilot study highlighted the necessary information, such as tools, visuals, and game concepts, that should be considered in developing an interactive card game for blind and sighted teenagers.

Keywords: Blind, card game, communication, interaction, play, tabletop game, teenager

Procedia PDF Downloads 184

16561 Development of Non-Intrusive Speech Evaluation Measure Using S-Transform and Light-Gbm

Authors: Tusar Kanti Dash, Ganapati Panda

Abstract:

The evaluation of speech quality and intelligibility is critical to the overall effectiveness of speech enhancement algorithms. Several intrusive and non-intrusive measures are employed to calculate these parameters. Non-intrusive evaluation is the most challenging because the clean reference speech is very often unavailable. In this paper, a novel non-intrusive speech evaluation measure is proposed using audio features derived from the Stockwell transform. These features are used with the Light Gradient Boosting Machine (LightGBM) for effective prediction of speech quality and intelligibility. The proposed model is analyzed using noisy and reverberant speech from four databases, and the results are compared with standard intrusive evaluation measures. The comparative analysis shows that the proposed model performs better than the standard non-intrusive models.
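The Stockwell transform (S-transform) localizes each frequency voice with a Gaussian window whose width scales inversely with frequency. Below is a minimal, dependency-free sketch of the standard discrete S-transform computed in the frequency domain. For readability it uses a naive O(N²) DFT instead of an FFT. The abstract does not describe the paper's actual feature set or LightGBM configuration. `voice_magnitude_features` is therefore a hypothetical example of the kind of vector a gradient-boosting regressor could receive.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (no 1/N factor)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spectrum):
    """Naive inverse DFT (includes the 1/N factor)."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n
            for k in range(n)]


def stockwell_transform(x):
    """Discrete S-transform of a real signal.

    Returns N//2 + 1 rows (frequency voices 0..N//2), each with N time samples.
    """
    n = len(x)
    spectrum = dft(x)
    # Frequency offsets in FFT order (0, 1, ..., -1) so each Gaussian is centred on its voice.
    offsets = [m if m < (n + 1) // 2 else m - n for m in range(n)]
    rows = [[sum(x) / n] * n]  # voice 0 is defined as the signal mean
    for voice in range(1, n // 2 + 1):
        shifted = [spectrum[(m + voice) % n] for m in range(n)]
        window = [math.exp(-2.0 * math.pi ** 2 * off ** 2 / voice ** 2) for off in offsets]
        rows.append(idft([s * w for s, w in zip(shifted, window)]))
    return rows


def voice_magnitude_features(x):
    """Hypothetical feature vector: mean |S| over time for each frequency voice."""
    return [sum(abs(v) for v in row) / len(row) for row in stockwell_transform(x)]


signal = [math.cos(2 * math.pi * 8 * k / 64) for k in range(64)]
features = voice_magnitude_features(signal)
print(max(range(len(features)), key=features.__getitem__))  # prints 8, the tone's voice
```

For real speech, the transform would be applied to short frames, and summary statistics over time and frequency would form the input to the regressor. Those choices are not specified in the abstract.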

Keywords: non-Intrusive speech evaluation, S-transform, light GBM, speech quality, and intelligibility

Procedia PDF Downloads 228

16560 Musical Instrument Recognition in Polyphonic Audio Through Convolutional Neural Networks and Spectrograms

Authors: Rujia Chen, Akbar Ghobakhlou, Ajit Narayanan

Abstract:

This study investigates the task of identifying musical instruments in polyphonic compositions using Convolutional Neural Networks (CNNs) from spectrogram inputs, focusing on binary classification. The model showed promising results, with an accuracy of 97% on solo instrument recognition. When applied to polyphonic combinations of 1 to 10 instruments, the overall accuracy was 64%, reflecting the increasing challenge with larger ensembles. These findings contribute to the field of Music Information Retrieval (MIR) by highlighting the potential and limitations of current approaches in handling complex musical arrangements. Future work aims to include a broader range of musical sounds, including electronic and synthetic sounds, to improve the model's robustness and applicability in real-time MIR systems.
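The CNN described above takes spectrograms as input. The sketch below computes a magnitude spectrogram (short-time Fourier transform with a Hann window) in plain Python, plus a dB conversion. The abstract does not give the frame length, hop size, or scaling (linear, log, or mel). The defaults here are hypothetical, and the CNN itself is not shown.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude STFT: a list of frames, each holding frame_len // 2 + 1 frequency bins."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        bins = []
        for f in range(frame_len // 2 + 1):
            acc = sum(seg[k] * cmath.exp(-2j * math.pi * f * k / frame_len)
                      for k in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def to_db(frames, floor=1e-10):
    """Convert magnitudes to decibels, clamping at `floor` to avoid log(0)."""
    return [[20 * math.log10(max(v, floor)) for v in frame] for frame in frames]
```

In a binary-classification setup like the one described, each time-frequency image would be labeled present or absent for a given instrument. One such classifier would be trained per instrument.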

Keywords: binary classifier, CNN, spectrogram, instrument

Procedia PDF Downloads 7

16559 Crosssampler: A Digital Convolution Cross Synthesis Instrument

Authors: Jimmy Eadie

Abstract:

Convolutional Cross Synthesis (CCS) has emerged as a powerful technique for blending input signals to create hybrid sounds. It has significantly expanded the horizons of digital signal processing, enabling artists to explore audio effects. However, the conventional applications of CCS primarily revolve around reverberation and room simulation rather than being utilized as a creative synthesis method. In this paper, we present the design of a digital instrument called CrossSampler that harnesses a parametric approach to convolution cross-synthesis, which involves using adjustable parameters to control the blending of audio signals through convolution. These parameters allow for customization of the resulting sound, offering greater creative control and flexibility. It enables users to shape the output by manipulating factors such as duration, intensity, and spectral characteristics. This approach facilitates experimentation and exploration in sound design and opens new sonic possibilities.
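At its core, convolution cross-synthesis convolves one signal with another, so the spectrum of the result is the product of the two spectra. The abstract names duration, intensity, and spectral characteristics as controls but does not define them. The mapping below is a hypothetical interpretation: duration truncates the second signal, and intensity sets a wet/dry mix. Spectral shaping is omitted. This is not CrossSampler's actual implementation.

```python
def convolve(a, b):
    """Direct (linear) convolution of two sample sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def cross_synthesize(source, impulse, duration=1.0, intensity=1.0):
    """Hypothetical parametric cross-synthesis.

    duration  -- fraction (0, 1] of `impulse` kept before convolving
    intensity -- mix between the dry source (0.0) and the convolved hybrid (1.0)
    """
    if not 0.0 < duration <= 1.0 or not 0.0 <= intensity <= 1.0:
        raise ValueError("duration must be in (0, 1], intensity in [0, 1]")
    kept = impulse[:max(1, int(len(impulse) * duration))]
    wet = convolve(source, kept)
    peak = max(abs(v) for v in wet) or 1.0
    wet = [v / peak for v in wet]  # peak-normalize the hybrid to avoid clipping
    dry = list(source) + [0.0] * (len(wet) - len(source))  # pad dry signal to equal length
    return [(1.0 - intensity) * d + intensity * w for d, w in zip(dry, wet)]
```

Direct convolution is O(N·M). A practical instrument would typically use FFT-based (fast) convolution for long samples.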

Keywords: convolution, synthesis, sampling, virtual instrument

Procedia PDF Downloads 22

16558 Enhancing VR Exposure Therapy for the Treatment of Phobias with the Use of Photorealistic VR Environments and Stimuli, and the Use of Tactile Feedback Suits and Responsive Systems

Authors: Vardan Melkonyan, Arman Azizyan, Astghik Boyajyan

Abstract:

Virtual reality (VR) exposure therapy is a form of cognitive-behavioral therapy that uses immersive virtual environments to expose individuals to the feared stimuli or situations that trigger their phobia. VR exposure therapy has become an increasingly popular treatment for phobias, including fear of heights, public speaking, and flying. It provides a controlled and safe environment in which individuals can confront their fears, while allowing therapists to tailor the virtual exposure to the specific needs and goals of each individual. It is also a cost-effective and accessible treatment option, as it can be delivered remotely and does not require the use of drugs. Overall, VR exposure therapy has the potential to be a valuable tool for therapists in the treatment of phobias. However, current methods may be improved by incorporating advanced technology such as photorealistic VR environments, tactile feedback suits, and responsive systems. The aim of this study was to identify the most effective approach for enhancing VR exposure therapy for the treatment of phobias. Photorealistic VR environments and stimuli can greatly enhance the effectiveness of VR exposure therapy. Immersive, realistic virtual environments that closely mimic the real-life situations that trigger phobic responses allow patients to engage more fully in the therapeutic process and confront their fears in a controlled and safe manner. This can help to reduce the severity of phobia symptoms and improve treatment outcomes. Tactile feedback suits and responsive systems can further enhance the VR exposure therapy experience by adding a physical element to the virtual environment. These suits, which can mimic the sensations of touch, pressure, and movement, allow patients to immerse themselves fully in the virtual world and feel as if they were physically present in the situation. This can increase the realism of the virtual environment and make it more effective in reducing phobia symptoms. Additionally, responsive systems can trigger specific events or responses within the virtual environment based on the patient's actions, providing a more interactive and personalized treatment experience. A comprehensive literature review was conducted, including studies on VR exposure therapy for phobias and on the use of advanced technology to enhance the therapy. Results indicate that incorporating these enhancements may significantly increase the effectiveness of VR exposure therapy for phobias. Further research is needed to fully understand the potential of these enhancements and to determine their optimal combination and implementation.

Keywords: virtual reality, mental health, phobias, fears, treatment, photorealistic, immersive, phobia

Procedia PDF Downloads 53

16557 A Measurement and Motor Control System for Free Throw Shots in Basketball Using Gyroscope Sensor

Authors: Niloofar Zebarjad

Abstract:

This research aims to find a tool that provides basketball players with real-time audio feedback on their shooting form in free throws. Free throws play a pivotal role in taking the lead in fierce competitions. The major problem in performing an accurate free throw seems to be improper training. Since the arm movement during a free throw is complex, the coach or the athlete might miss details of the movement during practice. Hence, a system is needed that measures the critical characteristics of arm movements and controls for improper kinematics. The setup proposed in this study, which consists of a gyroscope sensor, quantifies arm kinematics and provides real-time feedback as an audio signal. Spatial shoulder angle data are transmitted to a mobile application in real time and can be saved and processed for statistical analysis. The proposed system is easy to use, inexpensive, portable, and applicable in real time. Objectives: This research aims to modify and control the free throw using audio feedback, to determine whether and to what extent the new setup reduces errors in arm formation during throws, and finally to assess the successful throw rate. Methods: One group of elite basketball athletes and two groups of novice athletes (control and study groups) participated in this study. Each group contained 5 participants, who were studied in three separate sessions over a week. Results: Empirical results showed improvements in the free throw shooting style, shot pocket (SP), and locked position (LP). The mean shoulder angles were controlled at 25° and 45° for SP and LP, respectively, as recommended by FIBA references. Conclusion: Throughout the experiments, the system helped correct and control the shoulder angles toward the target pattern of shot pocket (SP) and locked position (LP). Based on the results for arm motion, adding another sensor to measure and control the elbow angle is recommended.
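The abstract does not say how the shoulder angle is derived from the gyroscope. A common approach, assumed here, is to integrate the angular-rate samples over time. The resulting angle can then be compared with the phase targets from the abstract (25° for SP, 45° for LP) to select an audio cue. The tolerance and the cue labels are hypothetical.

```python
def integrate_angle(rates_deg_s, dt_s, initial_deg=0.0):
    """Integrate gyroscope angular-rate samples (deg/s) into angles (deg), trapezoidal rule."""
    angles = [initial_deg]
    for prev, cur in zip(rates_deg_s, rates_deg_s[1:]):
        angles.append(angles[-1] + 0.5 * (prev + cur) * dt_s)
    return angles


# Target shoulder angles taken from the abstract.
TARGETS_DEG = {"shot_pocket": 25.0, "locked_position": 45.0}


def feedback(phase, angle_deg, tolerance_deg=5.0):
    """Return a cue label for the audio feedback; the tolerance is a hypothetical value."""
    error = angle_deg - TARGETS_DEG[phase]
    if abs(error) <= tolerance_deg:
        return "ok"
    return "too_high" if error > 0 else "too_low"
```

Pure integration accumulates gyroscope bias as drift. A practical system would typically reset the angle at a known pose or fuse the gyroscope with an accelerometer.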

Keywords: audio-feedback, basketball, free-throw, locked-position, motor-control, shot-pocket

Procedia PDF Downloads 255