Search results for: audio tactile model
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 16879

16789 Development of Non-Intrusive Speech Evaluation Measure Using S-Transform and Light-Gbm

Authors: Tusar Kanti Dash, Ganapati Panda

Abstract:

The evaluation of speech quality and intelligibility is critical to the overall effectiveness of speech enhancement algorithms. Several intrusive and non-intrusive measures are employed to calculate these parameters. Non-intrusive evaluation is the most challenging because, very often, the clean reference speech is not available. In this paper, a novel non-intrusive speech evaluation measure is proposed using audio features derived from the Stockwell transform. These features are used with the Light Gradient Boosting Machine for the effective prediction of speech quality and intelligibility. The proposed model is analyzed using noisy and reverberant speech from four databases, and the results are compared with the standard intrusive evaluation measures. The comparative analysis shows that the proposed model performs better than the standard non-intrusive models.

Keywords: non-Intrusive speech evaluation, S-transform, light GBM, speech quality, and intelligibility

Procedia PDF Downloads 248
16788 CrossSampler: A Digital Convolution Cross Synthesis Instrument

Authors: Jimmy Eadie

Abstract:

Convolutional Cross Synthesis (CCS) has emerged as a powerful technique for blending input signals to create hybrid sounds. It has significantly expanded the horizons of digital signal processing, enabling artists to explore audio effects. However, the conventional applications of CCS primarily revolve around reverberation and room simulation rather than being utilized as a creative synthesis method. In this paper, we present the design of a digital instrument called CrossSampler that harnesses a parametric approach to convolution cross-synthesis, which involves using adjustable parameters to control the blending of audio signals through convolution. These parameters allow for customization of the resulting sound, offering greater creative control and flexibility. It enables users to shape the output by manipulating factors such as duration, intensity, and spectral characteristics. This approach facilitates experimentation and exploration in sound design and opens new sonic possibilities.

Keywords: convolution, synthesis, sampling, virtual instrument

Procedia PDF Downloads 48
16787 Musical Instrument Recognition in Polyphonic Audio Through Convolutional Neural Networks and Spectrograms

Authors: Rujia Chen, Akbar Ghobakhlou, Ajit Narayanan

Abstract:

This study investigates the task of identifying musical instruments in polyphonic compositions using Convolutional Neural Networks (CNNs) from spectrogram inputs, focusing on binary classification. The model showed promising results, with an accuracy of 97% on solo instrument recognition. When applied to polyphonic combinations of 1 to 10 instruments, the overall accuracy was 64%, reflecting the increasing challenge with larger ensembles. These findings contribute to the field of Music Information Retrieval (MIR) by highlighting the potential and limitations of current approaches in handling complex musical arrangements. Future work aims to include a broader range of musical sounds, including electronic and synthetic sounds, to improve the model's robustness and applicability in real-time MIR systems.

Keywords: binary classifier, CNN, spectrogram, instrument

Procedia PDF Downloads 49
16786 Enhancing VR Exposure Therapy for the Treatment of Phobias with the Use of Photorealistic VR Environments and Stimuli, and the Use of Tactile Feedback Suits and Responsive Systems

Authors: Vardan Melkonyan, Arman Azizyan, Astghik Boyajyan

Abstract:

Virtual reality (VR) exposure therapy is a form of cognitive-behavioral therapy that uses immersive virtual environments to expose individuals to the feared stimuli or situations that trigger their phobia. It has become an increasingly popular treatment for phobias, including fear of heights, public speaking, and flying, because it provides a controlled and safe environment in which individuals can confront their fears while allowing therapists to tailor the virtual exposure to the specific needs and goals of each individual. It is also a cost-effective and accessible treatment option, as it can be delivered remotely and does not require the use of drugs. Overall, VR exposure therapy has the potential to be a valuable tool for therapists in the treatment of phobias, but current methods may be improved by incorporating advanced technology such as photorealistic VR environments, tactile feedback suits, and responsive systems. The aim of this study was to identify the most effective approach for enhancing VR exposure therapy for the treatment of phobias. Photorealistic VR environments and stimuli can greatly enhance the effectiveness of the therapy. Immersive, realistic virtual environments that closely mimic the real-life situations that trigger phobia responses allow patients to engage more fully in the therapeutic process and confront their fears in a controlled and safe manner. This can help to reduce the severity of phobia symptoms and improve treatment outcomes. The use of tactile feedback suits and responsive systems can further enhance the VR exposure therapy experience by adding a physical element to the virtual environment. These suits, which can mimic the sensations of touch, pressure, and movement, allow patients to fully immerse themselves in the virtual world and feel as if they are physically present in the situation. This can help to increase the realism of the virtual environment and make it more effective in reducing phobia symptoms. Additionally, responsive systems can be used to trigger specific events or responses within the virtual environment based on the patient's actions, providing a more interactive and personalized treatment experience. A comprehensive literature review was conducted, including studies on VR exposure therapy for phobias and the use of advanced technology to enhance the therapy. Results indicate that incorporating these enhancements may significantly increase the effectiveness of VR exposure therapy for phobias. Further research is needed to fully understand the potential of these enhancements and to determine the optimal combination and implementation.

Keywords: virtual reality, mental health, phobias, fears, treatment, photorealistic, immersive, phobia

Procedia PDF Downloads 77
16785 A Measurement and Motor Control System for Free Throw Shots in Basketball Using Gyroscope Sensor

Authors: Niloofar Zebarjad

Abstract:

This research aims to develop a tool that gives basketball players real-time audio feedback on their shooting form in free throw shots. Free throws play a pivotal role in taking the lead in fierce competitions. The major obstacle to an accurate free throw seems to be improper training. Since the arm movement during the free throw shot is complex, the coach or the athlete might miss movement details during practice. Hence, there is a need for a system that measures the critical characteristics of arm movement and corrects improper kinematics. The setup proposed in this study uses a gyroscope sensor to quantify arm kinematics and provides real-time feedback as an audio signal. Spatial shoulder angle data are transmitted to a mobile application in real time and can be saved and processed for statistical analysis. The proposed system is easy to use, inexpensive, portable, and applicable in real time. Objectives: This research aims to modify and control the free throw using audio feedback, to determine whether and to what extent the new setup reduces errors in arm formation during throws, and finally to assess the successful throw rate. Methods: One group of elite basketball athletes and two groups of novice athletes (a control group and a study group) participated in this study. Each group contained 5 participants, who were studied in three separate sessions over a week. Results: Empirical results showed improvements in free throw shooting style, shot pocket (SP), and locked position (LP). The mean shoulder angles were controlled at 25° and 45° for SP and LP, respectively, as recommended by valid FIBA references. Conclusion: Throughout the experiments, the system helped correct and control the shoulder angles toward the targeted pattern of shot pocket (SP) and locked position (LP). Given the desired results for arm motion, adding another sensor to measure and control the elbow angle is recommended.

Keywords: audio-feedback, basketball, free-throw, locked-position, motor-control, shot-pocket

Procedia PDF Downloads 271
16784 OPEN-EmoRec-II: A Multimodal Corpus of Human-Computer Interaction

Authors: Stefanie Rukavina, Sascha Gruss, Steffen Walter, Holger Hoffmann, Harald C. Traue

Abstract:

OPEN-EmoRec-II is an open multimodal corpus with experimentally induced emotions. In the first half of the experiment, emotions were induced with standardized picture material, and in the second half during a human-computer interaction (HCI) realized with a Wizard-of-Oz design. The induced emotions are based on the dimensional theory of emotions (valence, arousal, and dominance). With these emotional sequences, recorded as multimodal data (facial reactions, speech, audio, and physiological reactions) in a naturalistic HCI environment, one can improve classification methods on a multimodal level. This database is the result of an HCI experiment for which 30 subjects in total agreed to the publication of their data, including the video material, for research purposes. The open corpus now available contains the following sensor signals: video, audio, physiology (SCL, respiration, BVP, EMG Corrugator supercilii, EMG Zygomaticus major), and facial expression annotations.

Keywords: open multimodal emotion corpus, annotated labels, intelligent interaction

Procedia PDF Downloads 401
16783 Broadcast Routing in Vehicular Ad hoc Networks (VANETs)

Authors: Muazzam A. Khan, Muhammad Wasim

Abstract:

A vehicular ad hoc network (VANET) is a technology that allows vehicles to communicate with one another, with the aim of building a robust network of mobile vehicles. In VANETs, vehicles are equipped with special devices that can collect information and share it with the environment and with other vehicles in the network. Based on this data, the security and safety of the vehicles can be enhanced. Broadcasting is the distribution of audio and video content to a dispersed audience through any audio or visual mass-communication medium, usually using electromagnetic radiation (waves). Because VANETs lack servers and fixed infrastructure, message broadcasting plays an important role for every individual application. Message broadcasting in VANETs is still an open research challenge and requires further effort to reach good solutions. This paper starts with a brief introduction to VANETs, their applications, and the trends in message dissemination in this network. This work provides an important and comprehensive study of reliable broadcast routing in the VANET scenario.

Keywords: vehicular ad-hoc network, broadcasting, networking protocols, traffic pattern, low intensity conflict

Procedia PDF Downloads 515
16782 Unveiling Game Designers’ Designing Practices: Five-Essential-Steps Model

Authors: Mifrah Ahmad

Abstract:

Game design processes vary with the intentions of the game. Digital games have versatile starting and finishing processes, and these have been reported throughout the literature over decades. However, little is known about how game designers in the industry approach designing games in practice, and whether they draw on existing models or frameworks to assist their design process. Therefore, this paper discusses the perspectives of 17 game designers on how they approach designing games and how their experience of designing various games influences their practice. This research was conducted in an Australian context through a phenomenological approach, in which semi-structured interviews were designed and grounded in John Dewey's theory of experience. The audio data collected were analyzed using NVivo and interpreted within the interpretivist paradigm to contextualize the essence of game designers’ experiences in their practice and to unfold their designing, developing, and iterative methodologies. As a result, a generic game design model is proposed that lays out a sequence of steps supporting game designers’ work toward a successful game design process. The ‘Five-Essential-Steps’ model (5ESM) for designing digital games may assist early-career game designers, gaming researchers, and academics studying the design process of games, educational games, or serious games.

Keywords: game designers practice, experiential design, designing models, game design approaches, designing process, software design, top-down model

Procedia PDF Downloads 36
16781 The Effectiveness of Using MS SharePoint for the Curriculum Repository System

Authors: Misook Ahn

Abstract:

This study examines the Institutional Curriculum Repository (ICR) developed with MS SharePoint. The purpose of using MS SharePoint is to organize, share, and manage the curriculum data. The ICR aims to build a centralized curriculum infrastructure, preserve all curriculum materials, and provide academic service to users (faculty, students, or other agencies). The ICR collection includes core language curriculum materials developed by each language school—foreign language textbooks, language survival kits, and audio files currently in or not in use at the schools. All core curriculum materials with audio and video files have been coded, collected, and preserved at the ICR. All metadata for the collected curriculum materials have been input by language, code, year, book type, level, user, version, and current status (in use/not in use). The qualitative content analysis, including the survey data, is used to evaluate the effectiveness of using MS SharePoint for the repository system. This study explains how to manage and preserve curriculum materials with MS SharePoint, along with challenges and suggestions for further research. This study will be beneficial to other universities or organizations considering archiving or preserving educational materials.

Keywords: digital preservation, ms sharepoint, repository, curriculum materials

Procedia PDF Downloads 92
16780 A Combined Feature Extraction and Thresholding Technique for Silence Removal in Percussive Sounds

Authors: B. Kishore Kumar, Pogula Rakesh, T. Kishore Kumar

Abstract:

Music analysis is a part of audio content analysis that analyzes music using different features of the audio signal. The first step in music analysis is to divide the music signal into sections based on its feature profiles. In this paper, we present a technique that segments the music signal and applies thresholding to remove silence from the sounds produced by percussive instruments, using two features of music: signal energy and spectral centroid. The proposed method imposes thresholds on both features, and these thresholds vary depending on the music signal. Based on the thresholds, the silent parts are removed and the segmentation is performed. The effectiveness of the proposed method is analyzed using MATLAB.
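
The two-feature thresholding described in this abstract can be illustrated with a minimal sketch. It is not the authors' MATLAB implementation: the function names, the non-overlapping framing, and the rule that each threshold is a fixed fraction of the per-signal mean are all assumptions made here for illustration, since the abstract only says that the thresholds depend on the signal.

```python
import cmath
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz, using a naive DFT."""
    n = len(frame)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        weighted += (k * sample_rate / n) * abs(coeff)
        total += abs(coeff)
    return weighted / total if total > 0 else 0.0


def remove_silence(signal, sample_rate, frame_len=64,
                   energy_ratio=0.5, centroid_ratio=0.5):
    """Return (kept_samples, keep_mask) over non-overlapping frames."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    if not frames:
        return [], []
    energies = [frame_energy(f) for f in frames]
    centroids = [spectral_centroid(f, sample_rate) for f in frames]
    # Assumed rule (not from the paper): each threshold is a fraction of
    # the per-signal mean, so it adapts to the recording.
    e_thr = energy_ratio * sum(energies) / len(energies)
    c_thr = centroid_ratio * sum(centroids) / len(centroids)
    mask = [e > e_thr and c > c_thr for e, c in zip(energies, centroids)]
    kept = [x for f, keep in zip(frames, mask) if keep for x in f]
    return kept, mask


# Synthetic example: silence, a 1 kHz burst, then silence again.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(128)]
signal = [0.0] * 128 + tone + [0.0] * 128
kept, mask = remove_silence(signal, rate)
```

In this example only the two frames containing the burst pass both thresholds, so `kept` holds just the tone. Real percussive recordings would likely need overlapping frames and smoothing of the mask, which this sketch leaves out.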

Keywords: percussive sounds, spectral centroid, spectral energy, silence removal, feature extraction

Procedia PDF Downloads 572
16779 Error Analysis of the Pronunciation of English Consonants and Arabic Consonants by Egyptian Learners

Authors: Marwa A. Nasser

Abstract:

This empirical study investigates the most significant errors that Egyptian learners make in producing English consonants and Arabic consonants, and offers advice on how these can be remedied. The study adopts a descriptive approach, and the analysis is based on audio recordings of two groups of people. The first group consists of six volunteer Egyptian learners from the English Department at the Faculty of Women who are learning English as a foreign language. The other group consists of six Egyptian learners who are studying Tajweed (how to recite the Quran correctly). The audio recordings were examined, and sounds were analyzed in an attempt to highlight the most common errors made by the learners while reading English or reading (or reciting) the Quran. Results show that the two groups of learners have problems with certain phonemic contrasts. Both groups share common errors even though the two languages are different and unrelated (e.g., pre-aspiration of fortis stops, incorrect articulation of consonants, and velarization of certain sounds).

Keywords: consonant articulations, Egyptian learners of English, Egyptian learners of Quran, empirical study, error analysis, pronunciation problems

Procedia PDF Downloads 262
16778 Cloud Shield: Model to Secure User Data While Using Content Delivery Network Services

Authors: Rachna Jain, Sushila Madan, Bindu Garg

Abstract:

Cloud computing has become a key powerhouse in numerous organizations as they shift their data to the cloud environment. In recent years, cloud-based services have been used on a large scale for content storage, distribution, and processing. Various issues in the cloud computing environment still need to be addressed, and security and privacy are the topmost concerns. In this paper, a novel security model is proposed to secure data while utilizing CDN services such as image-to-icon conversion. Here, a CDN service is a content delivery service that performs conversions such as image to icon, Word to PDF, and LaTeX to PDF. The presented model converts an image into an icon while keeping the image secret. Security is imparted so that the image can be encrypted and decrypted by the data owner only. The paper also discusses how the server performs multiplication and selection on encrypted data without decryption. The data can be an image file, a Word file, or an audio or video file. Moreover, the proposed model is able to multiply images, encrypt them, and send them to a server application for conversion. Ultimately, the prime objective is to encrypt an image and convert the encrypted image to an icon by utilizing homomorphic encryption.
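
The idea of a server multiplying encrypted values without decrypting them can be shown with a toy example. The abstract does not name the homomorphic scheme used, so the sketch below substitutes unpadded textbook RSA, which happens to be multiplicatively homomorphic. The tiny primes and the function names are illustrative assumptions, and the code is not secure.

```python
# Toy textbook RSA with tiny primes: insecure, for illustration only.
# This is a stand-in scheme; the paper's actual scheme is not specified here.
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+)


def encrypt(m):
    """Data owner: encrypt a small integer m < n."""
    return pow(m, e, n)


def decrypt(c):
    """Data owner: decrypt with the private exponent."""
    return pow(c, d, n)


def server_multiply(c1, c2):
    """Untrusted server: multiplies ciphertexts, never sees plaintext."""
    return (c1 * c2) % n


a, b = 7, 6
product = decrypt(server_multiply(encrypt(a), encrypt(b)))
```

Because Enc(a) · Enc(b) ≡ (a·b)^e (mod n), decrypting the server's result yields a·b as long as the product stays below n. Applying this to pixel values, as the abstract suggests, would need a scheme and parameters chosen for that purpose.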

Keywords: cloud computing, user data security, homomorphic encryption, image multiplication, CDN service

Procedia PDF Downloads 325
16777 FlameCens: Visualization of Expressive Deviations in Music Performance

Authors: Y. Trantafyllou, C. Alexandraki

Abstract:

Music interpretation refers to the way musicians shape their performance by deliberately deviating from composers’ intentions, which are commonly communicated via some form of music transcription, such as a music score. For transcribed and non-improvised music, musical expression is manifested by introducing subtle deviations in tempo, dynamics, and articulation as the performance unfolds. This paper presents an application named FlameCens which, given two recordings of the same piece of music, presumably performed by different musicians, allows visualising deviations in tempo and dynamics during playback. The application may also compare a given performance to the music score of that piece (i.e., a MIDI file), which may be thought of as an expression-neutral representation of the piece, hence depicting the expressive cues employed by certain performers. FlameCens uses the Dynamic Time Warping algorithm to compare two audio sequences, based on CENS (Chroma Energy distribution Normalized Statistics) audio features. Expressive deviations are illustrated as a moving flame, which is generated by an animation of particles. The length of the flame is mapped to deviations in dynamics, while the slope of the flame is mapped to tempo deviations, so that a faster tempo tilts the flame to the right and a slower tempo tilts it to the left. A constant slope signifies no tempo deviation. The detected deviations in tempo and dynamics can additionally be recorded in a text file, which allows for offline investigation. Moreover, in the case of monophonic music, the color of the particles is used to convey the pitch of the notes during performance. FlameCens has been implemented in Python and is openly available via GitHub. The application has been experimentally validated for different music genres, including classical, contemporary, jazz, and popular music. These experiments revealed that FlameCens can be a valuable tool for music specialists (i.e., musicians or musicologists) to investigate the expressive performance strategies employed by different musicians, as well as for music audiences to enhance their listening experience.
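
The alignment step named in this abstract, Dynamic Time Warping over per-frame feature vectors, can be sketched as follows. This is a generic textbook DTW, not the FlameCens code: the function name, the Euclidean frame distance, and the one-dimensional stand-in features (real CENS vectors have 12 chroma dimensions) are assumptions for illustration.

```python
import math


def dtw_path(a, b):
    """Align two sequences of feature vectors.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs matching frame i of `a` to frame j of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = []
        if i > 1 and j > 1:
            steps.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 1:
            steps.append((cost[i - 1][j], i - 1, j))
        if j > 1:
            steps.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


# Hypothetical features: the second "performance" plays at half tempo.
reference = [[0.0], [1.0], [2.0]]
slower = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
total, path = dtw_path(reference, slower)
```

Here each reference frame is matched to two frames of the slower performance, so the path advances twice as fast along the second axis. The local slope of such a path is the kind of quantity a tempo-deviation display could be driven by.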

Keywords: audio synchronization, computational music analysis, expressive music performance, information visualization

Procedia PDF Downloads 118
16776 Two Kinds of Self-Oscillating Circuits Mechanically Demonstrated

Authors: Shiang-Hwua Yu, Po-Hsun Wu

Abstract:

This study introduces two types of self-oscillating circuits that are frequently found in power electronics applications. Special effort is made to relate the circuits to the analogous mechanical systems of some important scientific inventions: Galileo’s pendulum clock and Coulomb’s friction model. A little touch of related history and philosophy of science will hopefully encourage curiosity, advance the understanding of self-oscillating systems and satisfy the aspiration of some students for scientific literacy. Finally, the two self-oscillating circuits are applied to design a simple class-D audio amplifier.

Keywords: self-oscillation, sigma-delta modulator, pendulum clock, Coulomb friction, class-D amplifier

Procedia PDF Downloads 342
16775 1D Convolutional Networks to Compute Mel-Spectrogram, Chromagram, and Cochleogram for Audio Networks

Authors: Elias Nemer, Greg Vines

Abstract:

Time-frequency transformations and spectral representations of audio signals are commonly used in various machine learning applications. Training networks on frequency features such as the Mel-Spectrogram or Cochleogram has proven more effective and convenient than training on time-domain samples. In practical realizations, these features are created on a different processor and/or pre-computed and stored on disk, which requires additional effort and makes it difficult to experiment with different features. In this paper, we provide a PyTorch framework for creating various spectral features, as well as time-frequency transformations and time-domain filter banks, using the built-in trainable conv1d() layer. This allows these features to be computed on the fly as part of a larger network and enables easier experimentation with various combinations and parameters. Our work extends the work in the literature developed for that end, first by adding more of these features and second by allowing the kernels either to start from initialized values or to be trained from random values. The code is written as a template of classes and scripts that users may integrate into their own PyTorch classes or simply use as is and add more layers for various applications.
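
The underlying idea, that a short-time Fourier transform is a strided 1-D convolution with fixed cosine and sine kernels, can be shown without any framework. The sketch below is a pure-Python stand-in and not the authors' PyTorch code: `conv1d` here is a hypothetical helper that mimics an unpadded, bias-free convolution layer, and the kernel and hop sizes are arbitrary.

```python
import math


def dft_kernels(n_fft):
    """One cosine and one sine kernel per frequency bin 0..n_fft//2."""
    bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * t / n_fft) for t in range(n_fft)]
             for k in range(bins)]
    sin_k = [[-math.sin(2 * math.pi * k * t / n_fft) for t in range(n_fft)]
             for k in range(bins)]
    return cos_k, sin_k


def conv1d(signal, kernel, stride):
    """Valid cross-correlation with a stride (no padding, no bias)."""
    n = len(kernel)
    return [sum(signal[s + t] * kernel[t] for t in range(n))
            for s in range(0, len(signal) - n + 1, stride)]


def power_spectrogram(signal, n_fft=16, hop=8):
    """Return spec[bin][frame] = |STFT|^2 computed by convolution."""
    cos_k, sin_k = dft_kernels(n_fft)
    spec = []
    for kc, ks in zip(cos_k, sin_k):
        re = conv1d(signal, kc, hop)   # real part per frame
        im = conv1d(signal, ks, hop)   # imaginary part per frame
        spec.append([r * r + i * i for r, i in zip(re, im)])
    return spec


# A cosine that completes 2 cycles per 16-sample frame lands in bin 2.
signal = [math.cos(2 * math.pi * 2 * t / 16) for t in range(32)]
spec = power_spectrogram(signal)
peak_bins = [max(range(len(spec)), key=lambda k: spec[k][f])
             for f in range(len(spec[0]))]
```

In a trainable setting the same kernels would serve as the initial weights of a convolution layer, which matches the abstract's option of starting from initialized kernels. A mel or chroma representation would then apply a further fixed or learned weighting across the bins.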

Keywords: neural networks, Mel-Spectrogram, chromagram, cochleogram, discrete Fourier transform, PyTorch conv1d()

Procedia PDF Downloads 220
16774 When You Change The Business Model ~ You Change The World

Authors: H. E. Amb. Terry Earthwind Nichols

Abstract:

Over the years, Ambassador Nichols has observed that successful companies all have one thing in common: belief in people. His observations of people in many companies, industries, and countries have also led to one conclusion: groups of achievers far exceed the expectations and timelines of their superiors. His experience with achieving this has brought forth a model for the 21st century that will not only exceed the expectations of companies but also set visions for the future of business globally. It is time for a real discussion about the future of work and the business model that will set the example for the world. Methodologies: In-person observations over 40 years, with Ambassador Nichols present during the observations. Audio-visual observations: TV, cinema, social media (YouTube, etc.), and various news outlets. Reading the autobiographies of successful leaders of the last 75 years who led their companies from the distinct perspective that your people are your commodity. Major findings: People believe in the leader’s vision for the company so much that they remain excited about the future of the company and want to do anything in their power to ethically achieve that vision. People who achieve regularly in groups, divisions, companies, etc. live more healthfully, lowering both sick time off and on-the-job accidents. They cannot wait to get to work physically as often as they can to feed off the high energy present in these companies. They are fully respected and supported, resulting in near-zero attrition. Simply put, they do not “burn out”. Conclusion: To the best of the author’s knowledge, 20th-century business practices are no longer valid, and people are not going to work in those environments any longer. The average worker in the post-COVID world is better educated than 50 years ago and, most importantly, has real-time information about any subject and can stream injustices as they happen. The Consortium Model is the model for the evolution of both humankind and business in the 21st century.

Keywords: business model, future of work, people, paradigm shift, business management

Procedia PDF Downloads 63
16773 Multimodal Data Fusion Techniques in Audiovisual Speech Recognition

Authors: Hadeer M. Sayed, Hesham E. El Deeb, Shereen A. Taie

Abstract:

In the big data era, we face a diversity of datasets from different sources in different domains that describe a single life event. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. Multimodal fusion is the concept of integrating information from multiple modalities into a joint representation with the goal of predicting an outcome through a classification or regression task. In this paper, multimodal fusion techniques are classified into two main classes: model-agnostic techniques and model-based approaches. The paper provides a comprehensive study of recent research in each class and outlines the benefits and limitations of each. Furthermore, the audiovisual speech recognition task is presented as a case study of multimodal data fusion approaches, and the open issues arising from the limitations of current studies are discussed. This paper can serve as a guide for researchers interested in multimodal data fusion, and in audiovisual speech recognition in particular.

Keywords: multimodal data, data fusion, audio-visual speech recognition, neural networks

Procedia PDF Downloads 95
16772 Colloquialism in Audiovisual Translation: English Subtitling of the Lebanese Film Capernaum as a Case Study

Authors: Fatima Saab

Abstract:

This paper studies colloquialism in audio-visual translation, with particular emphasis on the difficulties and challenges that subtitlers encounter in translating colloquial Lebanese into English. To achieve the main objectives of this study, a thorough cultural and translational analysis of examples drawn from the subtitled movie Capernaum is presented in order to identify the strategies used to overcome cultural barriers and differences and to show the translator's decision-making process. Special attention is also given to explaining the technicalities of translating subtitles and how they affect the translation process. The research is a descriptive analytical study in which the writer sets out empirical observations, consisting of a descriptive and analytical examination of the difficulties and problems associated with translating Arabic colloquialisms, specifically Lebanese, into English in the subtitled film Capernaum. The research methodology uses a qualitative approach to group the selected data into the subtitling strategies presented by Gottlieb, under the domesticating or foreignizing strategies of Venuti's model. It is shown that conveying the same meanings to a foreign audience is not an easy task. The background of cultural elements and the stories that make up the history and mindset of the Lebanese and Arabic peoples lead to the use of the transfer and paraphrase strategies most of the time (81% of the sample used for analysis). The research shows that translating and subtitling colloquialism requires special skills from translators to overcome the challenges imposed by the limited presentation space as well as cultural differences. Translation of colloquial Arabic/Lebanese can be achieved to a certain extent, and the meaning and effect of the source-language culture are delivered inasmuch as the translator investigates and relates to the target culture.

Keywords: Lebanese colloquial, audio-visual translation, subtitling, Capernaum

Procedia PDF Downloads 136
16771 New Methods to Acquire Grammatical Skills in A Foreign Language

Authors: Indu ray

Abstract:

In today’s digital world, the internet is already flooded with information on how to master grammar in a foreign language. It is well known that one cannot master a language without grammar. Grammar is the backbone of any language. Without grammar, there would be no structure to help you speak and write or listen and read. Successful communication is only possible if the form and function of linguistic utterances are firmly related to one another. Grammar has its own rules of use that make language easier to understand. Like a tool, grammar lets us formulate our thoughts and knowledge in a meaningful way. Every language has its own grammar. With grammar, we can quickly analyze when the action in a text takes place (present, past, future). Knowledge of grammar is an important prerequisite for mastering a foreign language. What is most important is how teachers can make grammar lessons more interesting for students and thus promote grammar skills more successfully. In this paper, we discuss a few important methods: interactive grammar exercises (IGE) between students, interactive grammar exercises between student and teacher, the grammar-translation method, the audio-visual method, the deductive method, and the inductive method. The paper is divided into two sections. In the first, brief definitions and principles of these approaches are provided, and the possibility of combining them is analyzed. In the last section, I present the results of a survey conducted at my university on a few methods for quickly learning grammar in a foreign language. We divided grammatical skills into six parts: 1. grammatical competence, 2. speaking skills, 3. phonology, 4. syntax and semantics, 5. rules, and 6. cognitive function, and conducted a survey among students. The survey results indicate that phonology, speaking ability, syntax, and semantics can be improved by the inductive method, the audio-visual method, and the grammar-translation method, while for grammar rules and cognitive functions the IGE (teacher-student) and IGE (pupil-pupil) methods should be chosen. The study’s findings reveal that teachers' delivery methods should be blended according to the grammar content being taught.

Keywords: innovative method, grammatical skills, audio-visual, translation

Procedia PDF Downloads 57
16770 A Peer-Produced Community of Learning: The Case of Second-Year Algerian Masters Students at a Distance

Authors: Nihad Alem

Abstract:

Nowadays, distance learning (DL) is widely perceived as a reformed type of education that takes advantage of technology to give more appealing opportunities, especially for learners whose life conditions impede their attendance in regular classrooms. However, creating an interactional environment in which students can expand their learning community and alleviate feelings of loneliness and isolation should receive more attention when designing a distance learning course. This research aims to explore whether audio/video peer learning can offer pedagogical add-ons to Algerian distance learners and what the pros and cons of its application are as an educational experience in a synchronous environment mediated by Skype. Data were collected using video recordings of six sessions, reflective logs, and in-depth semi-structured interviews and will be analyzed by qualitatively identifying and measuring the three constitutional elements of the educational experience of peer learning, namely social presence, cognitive presence, and facilitation presence, using a modified community of inquiry coding template. The findings from this study will provide recommendations for an effective peer learning educational experience using the facilitation presence concept.

Keywords: audio/visual peer learning, community of inquiry, distance learning, facilitation presence

Procedia PDF Downloads 131
16769 Emotion-Convolutional Neural Network for Perceiving Stress from Audio Signals: A Brain Chemistry Approach

Authors: Anup Anand Deshmukh, Catherine Soladie, Renaud Seguier

Abstract:

Emotion plays a key role in many applications, such as healthcare, where it is used to assess patients’ emotional behavior. While typical ASR (Automated Speech Recognition) problems focus on 'what was said', it is equally important to understand 'how it was said.' Certain emotions are given more importance due to their effectiveness in understanding human feelings. In this paper, we propose an approach that models human stress from audio signals. The research challenge in speech emotion detection is finding the appropriate set of acoustic features corresponding to an emotion. Another difficulty lies in defining the very meaning of emotion and being able to categorize it in a precise manner. Supervised Machine Learning models, including state-of-the-art Deep Learning classification methods, rely on the availability of clean and labelled data. One of the problems in affective computation is the limited amount of annotated data. Existing labelled emotion datasets are highly subject to the perception of the annotator. We address the first issue of feature selection by using traditional MFCC (Mel-Frequency Cepstral Coefficients) features in a Convolutional Neural Network. Our proposed Emo-CNN (Emotion-CNN) architecture treats speech representations in a manner similar to how CNNs treat images in a vision problem. Our experiments show that Emo-CNN consistently and significantly outperforms the popular existing methods over multiple datasets. It achieves 90.2% categorical accuracy on the Emo-DB dataset. We claim that Emo-CNN is robust to speaker variations and environmental distortions. The proposed approach achieves 85.5% speaker-dependent categorical accuracy on the SAVEE (Surrey Audio-Visual Expressed Emotion) dataset, beating the existing CNN-based approach by 10.2%. To tackle the second problem of subjectivity in stress labels, we use Lovheim’s cube, which is a 3-dimensional projection of emotions.
Monoamine neurotransmitters are chemical messengers in the brain that transmit signals on perceiving emotions. The cube aims at explaining the relationship between these neurotransmitters and the positions of emotions in 3D space. The learnt emotion representations from the Emo-CNN are mapped to the cube using three-component PCA (Principal Component Analysis), and this mapping is then used to model human stress. The proposed approach not only circumvents the need for labelled stress data but also complies with the psychological theory of emotions given by Lovheim’s cube. We believe that this work is the first step towards creating a connection between Artificial Intelligence and the chemistry of human emotions.

Keywords: deep learning, brain chemistry, emotion perception, Lovheim's cube

Procedia PDF Downloads 136
16768 A New Nonlinear State-Space Model and Its Application

Authors: Abdullah Eqal Al Mazrooei

Abstract:

In this work, a new nonlinear model is introduced. The model is in state-space form. The nonlinearity of this model lies in the state equation, where the state vector is multiplied by itself. This technique allows our model to generalize many famous models, such as the Lotka-Volterra model and the Lorenz model, which have many real-life applications. We apply our new model to estimate wind speed by using a new nonlinear estimator that is suitable for our model.
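The abstract does not give the state equation itself. As a rough sketch only, and assuming (from the "Kronecker product" keyword) a quadratic form dx/dt = A x + B (x ⊗ x), the following shows how the Lotka-Volterra model can be written that way. The matrices A and B, the parameter values, and all function names are illustrative assumptions, not the author's formulation.

```python
# Hypothetical illustration: Lotka-Volterra written as dx/dt = A x + B (x ⊗ x).
# The quadratic form, matrices, and parameter values are assumptions for this sketch.

def kron(u, v):
    """Kronecker product of two vectors given as flat lists."""
    return [ui * vj for ui in u for vj in v]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def quadratic_rhs(A, B, x):
    """Right-hand side A x + B (x ⊗ x) of the assumed quadratic state equation."""
    linear = matvec(A, x)
    quadratic = matvec(B, kron(x, x))
    return [l + q for l, q in zip(linear, quadratic)]


# Assumed parameters: prey growth, predation rate, predator gain, predator death.
alpha, beta, delta, gamma = 1.0, 0.5, 0.2, 0.8
A = [[alpha, 0.0], [0.0, -gamma]]
# x ⊗ x = [x1*x1, x1*x2, x2*x1, x2*x2]; only the x1*x2 entry is needed here.
B = [[0.0, -beta, 0.0, 0.0], [0.0, delta, 0.0, 0.0]]


def lotka_volterra_rhs(x):
    """Textbook Lotka-Volterra equations, used only for comparison."""
    prey, predator = x
    return [alpha * prey - beta * prey * predator,
            delta * prey * predator - gamma * predator]


def euler_step(x, dt):
    """One forward-Euler step of the quadratic state equation."""
    return [xi + dt * fi for xi, fi in zip(x, quadratic_rhs(A, B, x))]
```

Under these assumptions, `quadratic_rhs(A, B, x)` and `lotka_volterra_rhs(x)` agree for any state `x`, which is the sense in which a state vector "multiplied by itself" can reproduce a familiar model.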

Keywords: nonlinear systems, state-space model, Kronecker product, nonlinear estimator

Procedia PDF Downloads 675
16767 The Role of Smart Educational Aids in Learning Listening Among Pupils with Attention and Listening Problems

Authors: Sadeq Al Yaari, Muhammad Alkhunayn, Adham Al Yaari, Aayah Al Yaari, Montaha Al Yaari, Ayman Al Yaari, Sajedah Al Yaari, Fatehi Eissa

Abstract:

The recent rise of smart educational aids and the move away from traditional listening aids are leading to a fundamental shift in the way in which individuals with attention and listening problems (ALP) manipulate listening inputs and/or act appropriately to the spoken information presented to them. A total sample of twenty-six ALP pupils (m=20 and f=6) between 7 and 12 years old was selected from different strata based on gender, region and school. Of this sample, thirteen pupils (10 males and 3 females) received the treatment in the form of smart classes provided with smart educational aids in a listening course that lasted four months, while the others did not (they studied the same course with the same instructor but in an ordinary class). A pretest was administered to assess participants’ levels, and a posttest was given to evaluate their attention and listening comprehension performance, namely in phonetic and phonological tests with sociolinguistic themes that were designed for this purpose. Test results were analyzed both psychoneurolinguistically and statistically. Results reveal a remarkable change in pupils’ listening behavior: scores showed a significant difference in the performance of the experimental ALP group between the pretest and the posttest (pupils performed better in the pretest-posttest on phonetics than in the two tests on phonology). It is concluded that smart educational aids designed for listening skills not only help increase the listening command of pupils with ALP so that they understand what they listen to but also develop their interactive listening capability and, at the same rate, increase their capacity for concentrated and in-depth listening. In addition, ALP pupils become able to grasp the audio content of text recordings, including educational audio recordings, news, oral stories and tales, views, spiritual/religious text and general knowledge.
However, the pupils have not experienced individual smart audio-visual aids that connect listening to other receptive and productive language skills, which could be a future area of research.

Keywords: smart aids, attention, listening, problems

Procedia PDF Downloads 30
16766 An Automatic Speech Recognition of Conversational Telephone Speech in Malay Language

Authors: M. Draman, S. Z. Muhamad Yassin, M. S. Alias, Z. Lambak, M. I. Zulkifli, S. N. Padhi, K. N. Baharim, F. Maskuriy, A. I. A. Rahim

Abstract:

The performance of a Malay automatic speech recognition (ASR) system for the call centre environment is presented. The system utilizes the Kaldi toolkit as the platform for the libraries and algorithms used in performing the ASR task. The acoustic model implemented in this system uses a deep neural network (DNN) to model the acoustic signal and a standard n-gram model for language modelling. With 80 hours of training data from call centre recordings, the ASR system achieves 72% accuracy, which corresponds to a word error rate (WER) of 28%. Testing was done using 20 hours of audio data. Despite the use of a DNN, the system shows low accuracy owing to the variety of noise, accents, and dialects that typically occur in the Malaysian call centre environment. This significant variation among speakers is reflected in the large standard deviation of the average word error rate (WERav) (i.e., ~10%). The lowest WER (13.8%) was obtained from a recording sample of a native speaker with a standard Malay dialect (central Malaysia), compared with 49% for the sample with the highest WER, which contains a conversation with a speaker who uses a non-standard Malay dialect.
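For readers unfamiliar with the metric, WER is commonly computed as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a generic illustration of that definition, not the scoring procedure used in the paper; the function name and example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four words.
wer = word_error_rate("a b c d", "a x c")
accuracy = 1.0 - wer  # the abstract's accuracy figure follows this convention
```

Because insertions are counted, WER can exceed 100% on very poor output, so "accuracy = 100% minus WER" is a convention rather than a strict percentage of correct words.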

Keywords: conversational speech recognition, deep neural network, Malay language, speech recognition

Procedia PDF Downloads 311
16765 The Impact of Smart Educational Aids in Learning Listening Among Pupils with Attention and Listening Problems

Authors: Sadeq Al Yaari, Muhammad Alkhunayn, Adham Al Yaari, Ayah Al Yaari, Ayman Al Yaari, Montaha Al Yaari, Sajedah Al Yaari, Fatehi Eissa

Abstract:

The recent rise of smart educational aids and the move away from traditional listening aids are leading to a fundamental shift in the way in which individuals with attention and listening problems (ALP) manipulate listening inputs and/or act appropriately to the spoken information presented to them. A total sample of twenty-six ALP pupils (m=20 and f=6) between 7 and 12 years old was selected from different strata based on gender, region and school. Of this sample, thirteen pupils (10 males and 3 females) received the treatment in the form of smart classes provided with smart educational aids in a listening course that lasted four months, while the others did not (they studied the same course with the same instructor but in an ordinary class). A pretest was administered to assess participants’ levels, and a posttest was given to evaluate their attention and listening comprehension performance, namely in phonetic and phonological tests with sociolinguistic themes that were designed for this purpose. Test results were analyzed both psychoneurolinguistically and statistically. Results reveal a remarkable change in pupils’ listening behavior: scores showed a significant difference in the performance of the experimental ALP group between the pretest and the posttest (pupils performed better in the pretest-posttest on phonetics than in the two tests on phonology). It is concluded that smart educational aids designed for listening skills not only help increase the listening command of pupils with ALP so that they understand what they listen to but also develop their interactive listening capability and, at the same rate, increase their capacity for concentrated and in-depth listening. In addition, ALP pupils become able to grasp the audio content of text recordings, including educational audio recordings, news, oral stories and tales, views, spiritual/religious text and general knowledge.
However, the pupils have not experienced individual smart audio-visual aids that connect listening to other receptive and productive language skills, which could be a future area of research.

Keywords: smart educational aids, listening attention, pupils, problems

Procedia PDF Downloads 33
16764 The Impact of Smart Educational Aids in Learning Listening Among Pupils with Attention and Listening Problems

Authors: Sadeq Al Yaari, Muhammad Alkhunayn, Aayah Al Yaari, Ayman Al Yaari, Montaha Al Yaari, Sajedah Al Yaari, Fatehi Eissa

Abstract:

The recent rise of smart educational aids and the move away from traditional listening aids are leading to a fundamental shift in the way in which individuals with attention and listening problems (ALP) manipulate listening inputs and/or act appropriately to the spoken information presented to them. A total sample of twenty-six ALP pupils (m=20 and f=6) between 7 and 12 years old was selected from different strata based on gender, region and school. Of this sample, thirteen pupils (10 males and 3 females) received the treatment in the form of smart classes provided with smart educational aids in a listening course that lasted for a four-month semester, while the others did not (they studied the same course with the same instructor but in an ordinary class). A pretest was administered to assess participants’ levels, and a posttest was given to evaluate their attention and listening comprehension performance, namely in phonetic and phonological tests with sociolinguistic themes that were designed for this purpose. Test results were analyzed both psychoneurolinguistically and statistically. Results reveal a remarkable change in pupils’ listening behavior: scores showed a significant difference in the performance of the experimental ALP group between the pretest and the posttest (pupils performed better in the pretest-posttest on phonetics than in the two tests on phonology). It is concluded that smart educational aids designed for listening skills not only help increase the listening command of pupils with ALP so that they understand what they listen to but also develop their interactive listening capability and, at the same rate, increase their capacity for concentrated and in-depth listening. In addition, ALP pupils become able to grasp the audio content of text recordings, including educational audio recordings, news, oral stories and tales, views, spiritual/religious text and general knowledge.
However, the pupils have not experienced individual smart audio-visual aids that connect listening to other receptive and productive language skills, which could be a future area of research.

Keywords: language skills, implementing, listening skill, attention, smart aids

Procedia PDF Downloads 30
16763 Heuristic Classification of Hydrophone Recordings

Authors: Daniel M. Wolff, Patricia Gray, Rafael de la Parra Venegas

Abstract:

An unsupervised machine listening system is constructed and applied to a dataset of 17,195 30-second marine hydrophone recordings. The system is then heuristically supplemented with anecdotal listening, contextual recording information, and supervised learning techniques to reduce the number of false positives. Features for classification are assembled by extracting the following data from each of the audio files: the spectral centroid, root-mean-squared values for each frequency band of a 10-octave filter bank, and mel-frequency cepstral coefficients in 5-second frames. In this way both time- and frequency-domain information are contained in the features to be passed to a clustering algorithm. Classification is performed using the k-means algorithm and then a k-nearest neighbors search. Different values of k are experimented with, in addition to different combinations of the available feature sets. Hypothesized class labels are 'primarily anthrophony' and 'primarily biophony', where the best class result conforming to the former label has 104 members after heuristic pruning. This demonstrates how a large audio dataset has been made more tractable with machine learning techniques, forming the foundation of a framework designed to acoustically monitor and gauge biological and anthropogenic activity in a marine environment.
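Two of the building blocks named above, the spectral centroid and k-means clustering, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses plain Python lists, a deterministic "first k points" initialisation chosen only for reproducibility, and omits the filter-bank RMS features, the MFCCs, and the k-nearest neighbors step. All function names are hypothetical.

```python
def spectral_centroid(frequencies, magnitudes):
    """Magnitude-weighted mean frequency of one spectrum frame."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    return sum(f * m for f, m in zip(frequencies, magnitudes)) / total


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def k_means(points, k, iterations=20):
    """Basic k-means; initial centroids are the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical two-dimensional feature vectors for four recordings.
features = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
centroids, labels = k_means(features, k=2)
```

In practice each recording would be represented by a much longer feature vector, and the choice of k and of initialisation would be part of the experimentation the abstract describes.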

Keywords: anthrophony, hydrophone, k-means, machine learning

Procedia PDF Downloads 157
16762 Effects of Gym-Based and Audio-Visual Guided Home-Based Exercise Programmes on Some Anthropometric and Cardiovascular Parameters Among Overweight and Obese College Students

Authors: Abiodun Afolabi, Rufus Adesoji Adedoyin

Abstract:

This study investigated and compared the effects of a gym-based exercise programme (GBEP) and an audio-visual guided home-based exercise programme (AVGHBEP) on selected anthropometric variables (Weight (W), Body Mass Index (BMI), Waist Circumference (WC), Hip Circumference (HC), Thigh Circumference (TC), Waist-Hip-Ratio (WHR), Waist-Height-Ratio (WHtR), Waist-Thigh-Ratio (WTR), Biceps Skinfold Thickness (BSFT), Triceps Skinfold Thickness (TSFT), Suprailiac Skinfold Thickness (SISFT), Subscapular Skinfold Thickness (SSSFT) and Percent Body Fat (PBF)) and cardiovascular variables (Systolic Blood Pressure (SBP), Diastolic Blood Pressure (DBP) and Heart Rate (HR)) of overweight and obese students of Federal College of Education (Special), Oyo, Oyo State, Nigeria, with a view to providing information and evidence for GBEP and AVGHBEP in reducing overweight and obesity and promoting cardiovascular fitness. Eighty overweight and obese students (BMI ≥ 25 kg/m²) were involved in this pretest-posttest quasi-experimental study. Participants were randomly assigned to GBEP (n = 40) and AVGHBEP (n = 40) groups. Anthropometric and cardiovascular variables were measured using a weighing scale, height meter, tape measure, skinfold caliper and electronic sphygmomanometer following standard protocols. GBEP and AVGHBEP were implemented following a circuit training (aerobic and resistance training) pattern with a duration of 40-60 minutes, thrice weekly for twelve weeks. GBEP consisted of a gymnasium-supervised exercise programme, while AVGHBEP was a visual display guided exercise programme conducted in the home setting. Data were analyzed using descriptive and inferential statistics. The mean ages of the participants were 22.55 ± 2.55 and 23.65 ± 2.89 years for the GBEP group and AVGHBEP group, respectively.
Findings showed that in the GBEP group, there were significant reductions in the anthropometric variables and adiposity measures of Weight, BMI, BSFT, TSFT, SISFT, SSSFT, WC, HC, TC, WHtR, and PBF at week 12 of the study. Similarly, in the AVGHBEP group, there were significant reductions in Weight, BMI, BSFT, TSFT, SISFT, SSSFT, WC, HC, TC, WHtR and PBF at the 12th week of intervention. Comparison of the effects of GBEP and AVGHBEP on anthropometric variables and measures of adiposity showed that there was no significant difference between the two groups in Weight, BMI, BSFT, TSFT, SISFT, SSSFT, WC, HC, TC, WHR, WHtR, WTR and PBF at week 12 of the study. Furthermore, findings on the effects of the exercise programmes on cardiovascular variables revealed that significant reductions occurred in SBP in both the GBEP group and the AVGHBEP group. Comparison of the effects of GBEP and AVGHBEP on cardiovascular variables showed that there was no significant difference in SBP, DBP and HR between the two groups at week 12 of the study. It was concluded that the Audio-Visual Guided Home-Based Exercise Programme was as effective as the Gym-Based Exercise Programme in causing a significant reduction in anthropometric variables and body fat among overweight and obese college students over a period of twelve weeks. Both the Gym-Based Exercise Programme and the Audio-Visual Guided Home-Based Exercise Programme led to a significant reduction in Systolic Blood Pressure over the twelve-week period. The Audio-Visual Guided Home-Based Exercise Programme can, therefore, be used as an alternative therapy in the non-pharmacological management of people who are overweight and obese.

Keywords: gym-based exercises, audio-visual guided home-based exercises, anthropometric parameters, cardiovascular parameters, overweight students, obese students

Procedia PDF Downloads 24
16761 The Modification of Convolutional Neural Network in Fin Whale Identification

Authors: Jiahao Cui

Abstract:

Over the past centuries, due to climate change and intense whaling, the global whale population has dramatically declined. Among the various whale species, the fin whale experienced the most drastic drop in number due to its popularity in whaling. Against this background, identifying fin whale calls could be immensely beneficial to the preservation of the species. This paper uses feature extraction to process the input audio signal; a network based on AlexNet and three networks based on the ResNet model were then constructed to classify fin whale calls. A mixture of the DOSITS database and the Watkins database was used during training. The results demonstrate that a modified ResNet network has the best performance considering precision and network complexity.

Keywords: convolutional neural network, ResNet, AlexNet, fin whale preservation, feature extraction

Procedia PDF Downloads 104
16760 Force Sensor for Robotic Graspers in Minimally Invasive Surgery

Authors: Naghmeh M. Bandari, Javad Dargahi, Muthukumaran Packirisamy

Abstract:

Robot-assisted minimally invasive surgery (RMIS) has been widely performed around the world during the last two decades. RMIS demonstrates significant advantages over conventional surgery, e.g., improving the accuracy and dexterity of a surgeon, providing 3D vision, motion scaling, hand-eye coordination, decreasing tremor, and reducing x-ray exposure for surgeons. Despite these benefits, surgeons cannot touch the surgical site and perceive tactile information, because the robots are controlled remotely. The literature survey identified the lack of force feedback as the riskiest limitation of the existing technology. Without the perception of tool-tissue contact force, the surgeon might apply an excessive force, causing tissue laceration, or an insufficient force, causing tissue slippage. The primary use of force sensors has been to measure the tool-tissue interaction force in real time and in situ. The design of a tactile sensor is subject to a set of design requirements, e.g., biocompatibility, electrical passivity, MRI compatibility, miniaturization, and the ability to measure static and dynamic force. In this study, a planar optical fiber-based sensor is proposed for mounting on the surgical grasper. It was developed based on the light intensity modulation principle. The deflectable part of the sensor was a beam modeled as a cantilever Euler-Bernoulli beam on rigid substrates. A semi-cylindrical indenter was attached to the bottom surface of the beam at the mid-span. An optical fiber was secured at both ends on the same rigid substrates. The indenter was in contact with the fiber. External force on the sensor caused deflection in the beam and optical fiber simultaneously. The micro-bending of the optical fiber would consequently result in light power loss. The sensor was simulated and studied using finite element methods. A laser light beam with 800 nm wavelength and 5 mW power was used as the input to the optical fiber. The output power was measured using a photodetector.
The voltage from the photodetector was calibrated to the external force for a chirp input (0.1-5 Hz). The range, resolution, and hysteresis of the sensor were studied under monotonic and harmonic external forces of 0-2.0 N at 0 and 5 Hz, respectively. The results confirmed the validity of the proposed sensing principle. The sensor also demonstrated acceptable linearity (R² > 0.9). A minimum external force was observed below which no power loss was detectable. It is postulated that this phenomenon is attributable to the critical angle required for total internal reflection in the optical fiber. The experimental results showed negligible hysteresis (R² > 0.9) and were in fair agreement with the simulations. In conclusion, the proposed planar sensor is assessed to be a cost-effective, feasible, and easy-to-use solution that can be miniaturized and integrated at the tip of robotic graspers. Geometrical and optical factors affecting the minimum sensible force and the working range of the sensor should be studied and optimized. This design is intrinsically scalable and meets all the design requirements. Therefore, it has significant potential for industrialization and mass production.
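The calibration and linearity check described above can be illustrated with an ordinary least-squares line fit and its coefficient of determination. The sketch below is generic: the function names are hypothetical, the voltage and force values are made-up placeholders rather than measurements from the study, and the abstract does not state which fitting method the authors actually used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Placeholder calibration data (NOT from the paper): photodetector voltage in
# volts against applied force in newtons over an assumed 0-2.0 N range.
voltages = [0.0, 1.0, 2.0, 3.0, 4.0]
forces = [0.0, 0.5, 1.0, 1.5, 2.0]

slope, intercept = fit_line(voltages, forces)
linearity = r_squared(voltages, forces, slope, intercept)


def voltage_to_force(voltage):
    """Convert a photodetector reading to force using the fitted line."""
    return slope * voltage + intercept
```

With real measurements, the dead zone below the minimum detectable force would probably need to be excluded from the fit, or handled with a separate offset, before the line is used for conversion.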

Keywords: force sensor, minimally invasive surgery, optical sensor, robotic surgery, tactile sensor

Procedia PDF Downloads 213