Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 1629

Search results for: speech compression

1479 Speech Emotion Recognition with Bi-GRU and Self-Attention based Feature Representation

Abstract:

Speech is considered an essential and most natural medium for the interaction between machines and humans. However, extracting effective features for speech emotion recognition (SER) is remains challenging. The present studies show that the temporal information captured but high-level temporal-feature learning is yet to be investigated. In this paper, we present an efficient novel method using the Self-attention (SA) mechanism in a combination of Convolutional Neural Network (CNN) and Bi-directional Gated Recurrent Unit (Bi-GRU) network to learn high-level temporal-feature. In order to further enhance the representation of the high-level temporal-feature, we integrate a Bi-GRU output with learnable weights features by SA, and improve the performance. We evaluate our proposed method on our created SITB-OSED and IEMOCAP databases. We report that the experimental results of our proposed method achieve state-of-the-art performance on both databases.

Keywords: Bi-GRU, 1D-CNNs, self-attention, speech emotion recognition

Procedia PDF Downloads 102

1478 Experimental Study of Different Types of Concrete in Uniaxial Compression Test

Authors: Khashayar Jafari, Mostafa Jafarian Abyaneh, Vahab Toufigh

Abstract:

Polymer concrete (PC) is a distinct concrete with superior characteristics in comparison to ordinary cement concrete. It has become well-known for its applications in thin overlays, floors and precast components. In this investigation, the mechanical properties of PC with different epoxy resin contents, ordinary cement concrete (OCC) and lightweight concrete (LC) have been studied under uniaxial compression test. The study involves five types of concrete, with each type being tested four times. Their complete elastic-plastic behavior was compared with each other through the measurement of volumetric strain during the tests. According to the results, PC showed higher strength, ductility and energy absorption with respect to OCC and LC.

Keywords: polymer concrete, ordinary cement concrete, lightweight concrete, uniaxial compression test, volumetric strain

Procedia PDF Downloads 380

1477 Comparing ITV Definitions From 4D CT-PET and Breath-Hold Technique with Abdominal Compression

Authors: R. D. Esposito, P. Dorado Rodriguez, D. Planes Meseguer

Abstract:

In this work, we compare the contour of Internal Target Volume (ITV), for Stereotactic Body Radiation Therapy (SBRT) of a patient affected by a single liver metastasis, obtained from two different patient data acquisition techniques. The first technique consists in a free breathing Computer Tomography (CT) scan acquisition, followed by exhalation breath-hold and inhalation breath-hold CT scans, all of them applying abdominal compression while the second technique consists in a free breathing 4D CT-PET (Positron Emission Tomography) scan. Results obtained with these two methods are consistent, which demonstrate that at least for this specific case, both techniques are adequate for ITV contouring in SBRT treatments.

Keywords: 4D CT-PET, abdominal compression, ITV, SBRT

Procedia PDF Downloads 429

1476 The Impact of Direct and Indirect Pressure Measuring Systems on the Pressure Mapping for the Medical Compression Garments

Authors: Arash M. Shahidi, Tilak Dias, Gayani K. Nandasiri

Abstract:

While graduated compression is the foundation of treatment and management of many medical complications such as leg ulcer, varicose veins, and lymphedema, monitoring the interface pressure has been conducted using different sensors that operate based on diverse approaches. The variations existed from the pressure readings collected using different interface pressure measurement systems would cause difficulties in taking a decision regarding the compression therapy. It is crucial to acknowledge the differences existing between direct and indirect pressure measurement systems while considering the commercially available systems such as AMI, Picopress and OPM which are under direct measurements systems, and HATRA (BSI), HOSY (RAL-GZ) and FlexiForce which comes under the indirect measurement system. Furthermore, Piezo-resistive sensors (Flexiforce) can measure the changes in resistance corresponding to the applied force on the sensing area. Direct pressure measuring systems are capable of measuring interface pressure on the three-dimensional states, while the indirect pressure measuring systems stretch the fabric in the two-dimensional direction and extrapolate pressure from surface tension measured on the device and neglect the vital factor which is the radius of curvature. In this study, a leg mannequin of known dimensions is selected with a knitted class 3 compression stocking. It has been decided to evaluate the data collected from different available systems (AMI, PicoPress, FlexiForce, and HATRA) and compare the results. The results showed a discrepancy between Hatra, AMI, Picopress, and Flexiforce against the pressure standard used to generate class 3 compression stocking. As predicted a higher pressure value with direct interface measuring systems were monitored against HATRA due to the effect of the radius of curvature.

Keywords: AMI, FlexiForce, graduated compression, HATRA, interface pressure, PicoPress

Procedia PDF Downloads 327

1475 An Investigation into the Influence of Compression on 3D Woven Preform Thickness and Architecture

Authors: Calvin Ralph, Edward Archer, Alistair McIlhagger

Abstract:

3D woven textile composites continue to emerge as an advanced material for structural applications and composite manufacture due to their bespoke nature, through thickness reinforcement and near net shape capabilities. When 3D woven preforms are produced, they are in their optimal physical state. As 3D weaving is a dry preforming technology it relies on compression of the preform to achieve the desired composite thickness, fibre volume fraction (Vf) and consolidation. This compression of the preform during manufacture results in changes to its thickness and architecture which can often lead to under-performance or changes of the 3D woven composite. Unlike traditional 2D fabrics, the bespoke nature and variability of 3D woven architectures makes it difficult to know exactly how each 3D preform will behave during processing. Therefore, the focus of this study is to investigate the effect of compression on differing 3D woven architectures in terms of structure, crimp or fibre waviness and thickness as well as analysing the accuracy of available software to predict how 3D woven preforms behave under compression. To achieve this, 3D preforms are modelled and compression simulated in Wisetex with varying architectures of binder style, pick density, thickness and tow size. These architectures have then been woven with samples dry compression tested to determine the compressibility of the preforms under various pressures. Additional preform samples were manufactured using Resin Transfer Moulding (RTM) with varying compressive force. Composite samples were cross sectioned, polished and analysed using microscopy to investigate changes in architecture and crimp. Data from dry fabric compression and composite samples were then compared alongside the Wisetex models to determine accuracy of the prediction and identify architecture parameters that can affect the preform compressibility and stability. Results indicate that binder style/pick density, tow size and thickness have a significant effect on compressibility of 3D woven preforms with lower pick density allowing for greater compression and distortion of the architecture. It was further highlighted that binder style combined with pressure had a significant effect on changes to preform architecture where orthogonal binders experienced highest level of deformation, but highest overall stability, with compression while layer to layer indicated a reduction in fibre crimp of the binder. In general, simulations showed a relative comparison to experimental results; however, deviation is evident due to assumptions present within the modelled results.

Keywords: 3D woven composites, compression, preforms, textile composites

Procedia PDF Downloads 119

1474 Comparative Exergy Analysis of Vapor Compression Refrigeration System Using Alternative Refrigerants

Authors: Gulshan Sachdeva, Vaibhav Jain

Abstract:

In present paper, the performance of various alternative refrigerants is compared to find the substitute of R22, the widely used hydrochlorofluorocarbon refrigerant in developing countries. These include the environmentally friendly hydrofluorocarbon (HFC) refrigerants such as R134A, R410A, R407C and M20. In the present study, a steady state thermodynamic model (includes both first and second law analysis) which simulates the working of an actual vapor-compression system is developed. The model predicts the performance of system with alternative refrigerants. Considering the recent trends of replacement of ozone depleting refrigerants and improvement in system efficiency, R407C is found to be potential candidate to replace R22 refrigerant in the present study.

Keywords: refrigeration, compression system, performance study, modeling, R407C

Procedia PDF Downloads 296

1473 Exploring Deep Neural Network Compression: An Overview

Authors: Ghorab Sara, Meziani Lila, Rubin Harvey Stuart

Abstract:

The rapid growth of deep learning has led to intricate and resource-intensive deep neural networks widely used in computer vision tasks. However, their complexity results in high computational demands and memory usage, hindering real-time application. To address this, research focuses on model compression techniques. The paper provides an overview of recent advancements in compressing neural networks and categorizes the various methods into four main approaches: network pruning, quantization, network decomposition, and knowledge distillation. This paper aims to provide a comprehensive outline of both the advantages and limitations of each method.

Keywords: model compression, deep neural network, pruning, knowledge distillation, quantization, low-rank decomposition

Procedia PDF Downloads 24

1472 Objective Evaluation on Medical Image Compression Using Wavelet Transformation

Authors: Amhimmid Mohammed Saffour, Mustafa Mohamed Abdullah

Abstract:

The use of computers for handling image data in the healthcare is growing. However, the amount of data produced by modern image generating techniques is vast. This data might be a problem from a storage point of view or when the data is sent over a network. This paper using wavelet transform technique for medical images compression. MATLAB program, are designed to evaluate medical images storage and transmission time problem at Sebha Medical Center Libya. In this paper, three different Computed Tomography images which are abdomen, brain and chest have been selected and compressed using wavelet transform. Objective evaluation has been performed to measure the quality of the compressed images. For this evaluation, the results show that the Peak Signal to Noise Ratio (PSNR) which indicates the quality of the compressed image is ranging from (25.89db to 34.35db for abdomen images, 23.26db to 33.3db for brain images and 25.5db to 36.11db for chest images. These values shows that the compression ratio is nearly to 30:1 is acceptable.

Keywords: medical image, Matlab, image compression, wavelet's, objective evaluation

Procedia PDF Downloads 270

1471 Investigating the Online Effect of Language on Gesture in Advanced Bilinguals of Two Structurally Different Languages in Comparison to L1 Native Speakers of L2 and Explores Whether Bilinguals Will Follow Target L2 Patterns in Speech and Co-speech

Authors: Armita Ghobadi, Samantha Emerson, Seyda Ozcaliskan

Abstract:

Being a bilingual involves mastery of both speech and gesture patterns in a second language (L2). We know from earlier work in first language (L1) production contexts that speech and co-speech gesture form a tightly integrated system: co-speech gesture mirrors the patterns observed in speech, suggesting an online effect of language on nonverbal representation of events in gesture during the act of speaking (i.e., “thinking for speaking”). Relatively less is known about the online effect of language on gesture in bilinguals speaking structurally different languages. The few existing studies—mostly with small sample sizes—suggests inconclusive findings: some show greater achievement of L2 patterns in gesture with more advanced L2 speech production, while others show preferences for L1 gesture patterns even in advanced bilinguals. In this study, we focus on advanced bilingual speakers of two structurally different languages (Spanish L1 with English L2) in comparison to L1 English speakers. We ask whether bilingual speakers will follow target L2 patterns not only in speech but also in gesture, or alternatively, follow L2 patterns in speech but resort to L1 patterns in gesture. We examined this question by studying speech and gestures produced by 23 advanced adult Spanish (L1)-English (L2) bilinguals (Mage=22; SD=7) and 23 monolingual English speakers (Mage=20; SD=2). Participants were shown 16 animated motion event scenes that included distinct manner and path components (e.g., "run over the bridge"). We recorded and transcribed all participant responses for speech and segmented it into sentence units that included at least one motion verb and its associated arguments. We also coded all gestures that accompanied each sentence unit. We focused on motion event descriptions as it shows strong crosslinguistic differences in the packaging of motion elements in speech and co-speech gesture in first language production contexts. English speakers synthesize manner and path into a single clause or gesture (he runs over the bridge; running fingers forward), while Spanish speakers express each component separately (manner-only: el corre=he is running; circle arms next to body conveying running; path-only: el cruza el puente=he crosses the bridge; trace finger forward conveying trajectory). We tallied all responses by group and packaging type, separately for speech and co-speech gesture. Our preliminary results (n=4/group) showed that productions in English L1 and Spanish L1 differed, with greater preference for conflated packaging in L1 English and separated packaging in L1 Spanish—a pattern that was also largely evident in co-speech gesture. Bilinguals’ production in L2 English, however, followed the patterns of the target language in speech—with greater preference for conflated packaging—but not in gesture. Bilinguals used separated and conflated strategies in gesture in roughly similar rates in their L2 English, showing an effect of both L1 and L2 on co-speech gesture. Our results suggest that online production of L2 language has more limited effects on L2 gestures and that mastery of native-like patterns in L2 gesture might take longer than native-like L2 speech patterns.

Keywords: bilingualism, cross-linguistic variation, gesture, second language acquisition, thinking for speaking hypothesis

Procedia PDF Downloads 60

1470 Cognitive Semantics Study of Conceptual and Metonymical Expressions in Johnson's Speeches about COVID-19

Authors: Hussain Hameed Mayuuf

Abstract:

The study is an attempt to investigate the conceptual metonymies is used in political discourse about COVID-19. Thus, this study tries to analyze and investigate how the conceptual metonymies in Johnson's speech about coronavirus are constructed. This study aims at: Identifying how are metonymies relevant to understand the messages in Boris Johnson speeches and to find out how can conceptual blending theory help people to understand the messages in the political speech about COVID-19. Lastly, it tries to Point out the kinds of integration networks are common in political speech. The study is based on the hypotheses that conceptual blending theory is a powerful tool for investigating the intended messages in Johnson's speech and there are different processes of blending networks and conceptual mapping that enable the listeners to identify the messages in political speech. This study presents a qualitative and quantitative analysis of four speeches about COVID-19; they are said by Boris Johnson. The selected data have been tackled from the cognitive-semantic perspective by adopting Conceptual Blending Theory as a model for the analysis. It concludes that CBT is applicable to the analysis of metonymies in political discourse. Its mechanisms enable listeners to analyze and understand these speeches. Also the listener can identify and understand the hidden messages in Biden and Johnson's discourse about COVID-19 by using different conceptual networks. Finally, it is concluded that the double scope networks are the most common types of blending of metonymies in the political speech.

Keywords: cognitive, semantics, conceptual, metonymical, Covid-19

Procedia PDF Downloads 104

1469 Bidirectional Dynamic Time Warping Algorithm for the Recognition of Isolated Words Impacted by Transient Noise Pulses

Authors: G. Tamulevičius, A. Serackis, T. Sledevič, D. Navakauskas

Abstract:

We consider the biggest challenge in speech recognition – noise reduction. Traditionally detected transient noise pulses are removed with the corrupted speech using pulse models. In this paper we propose to cope with the problem directly in Dynamic Time Warping domain. Bidirectional Dynamic Time Warping algorithm for the recognition of isolated words impacted by transient noise pulses is proposed. It uses simple transient noise pulse detector, employs bidirectional computation of dynamic time warping and directly manipulates with warping results. Experimental investigation with several alternative solutions confirms effectiveness of the proposed algorithm in the reduction of impact of noise on recognition process – 3.9% increase of the noisy speech recognition is achieved.

Keywords: transient noise pulses, noise reduction, dynamic time warping, speech recognition

Procedia PDF Downloads 539

1468 Economical Analysis of Optimum Insulation Thickness for HVAC Duct

Authors: D. Kumar, S. Kumar, A. G. Memon, R. A. Memon, K. Harijan

Abstract:

A considerable amount of energy is usually lost due to compression of insulation in Heating, ventilation, and air conditioning (HVAC) duct. In this paper, the economic impact of compression of insulation is estimated. Relevant mathematical models were used to estimate the optimal thickness at the points of compression. Furthermore, the payback period is calculated for the optimal thickness at the critical parts of supply air duct (SAD) and return air duct (RAD) considering natural gas (NG) and liquefied petroleum gas (LPG) as fuels for chillier operation. The mathematical model is developed using preliminary data obtained for an HVAC system of a pharmaceutical company. The higher heat gain and cooling loss, due to compression of thermal insulation, is estimated using relevant heat transfer equations. The results reveal that maximum energy savings (ES) in SAD is 34.5 and 40%, while in RAD is 22.9% and 29% for NG and LPG, respectively. Moreover, the minimum payback period (PP) for SAD is 2 and 1.6years, while in RAD is 4.3 and 2.7years for NG and LPG, respectively. The optimum insulation thickness (OIT) corresponding to maximum ES and minimum PP is estimated to be 35 and 42mm for SAD, while 30 and 38mm for RAD in case of NG and LPG, respectively.

Keywords: optimum insulation thickness, life cycle cost analysis, payback period, HVAC system

Procedia PDF Downloads 196

1467 The Combination of the Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), JITTER and SHIMMER Coefficients for the Improvement of Automatic Recognition System for Dysarthric Speech

Authors: Brahim-Fares Zaidi, Malika Boudraa, Sid-Ahmed Selouani

Abstract:

Our work aims to improve our Automatic Recognition System for Dysarthria Speech (ARSDS) based on the Hidden Models of Markov (HMM) and the Hidden Markov Model Toolkit (HTK) to help people who are sick. With pronunciation problems, we applied two techniques of speech parameterization based on Mel Frequency Cepstral Coefficients (MFCC's) and Perceptual Linear Prediction (PLP's) and concatenated them with JITTER and SHIMMER coefficients in order to increase the recognition rate of a dysarthria speech. For our tests, we used the NEMOURS database that represents speakers with dysarthria and normal speakers.

Keywords: hidden Markov model toolkit (HTK), hidden models of Markov (HMM), Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP’s)

Procedia PDF Downloads 139

1466 A Computational Investigation of Knocking Tendency in a Hydrogen-Fueled SI Engine

Authors: Hammam Aljabri, Hong G. Im

Abstract:

Hydrogen is a promising future fuel to support the transition of the energy sector toward carbon neutrality. The direct utilization of H2 in Internal Combustion Engines (ICEs) is possible, and this technology faces mainly two challenges; high NOx emissions and severe knocking at mid to high loads. In this study, we numerically investigated the potential of H2 combustion in a truck-size engine operated in SI mode. To mitigate the knocking nature of H2 combustion, we have focused on studying the effects of three primary parameters; the compression ratio (CR), the air-fuel ratio, and the spark time. The baseline case was set using a CR of 16.5 and an equivalence ratio of 0.35. In simulations, the auto-ignition tendency was evaluated based on the maximum pressure rise rate and the local pressure fluctuations at the monitoring points set along the wall of the combustion chamber. To mitigate the auto-ignition tendency while enabling a wider range of engine operation, the effect of lowering the compression ratio was assessed. The results indicate that by lowering the compression ratio from 16.5:1 to 12.5:1, an indicated thermal efficiency of 47.5% can be achieved. Aiming to restrain the auto-ignition while maintaining good efficiency, a reduction in the equivalence ratio was examined under different compression ratios. The result indicates that higher compression ratios will require lower equivalence ratios, and due to practical limitations, a lower equivalence ratio of 0.25 was set as the limit. Using a compression ratio of 13.5 combined with an equivalence ratio of 0.3 resulted in an indicated thermal efficiency of 48.6%, that is, at a fixed spark time. It is found that under such lean conditions, the incomplete combustion losses and exhaust losses were high. Thus, advancing the spark time was assessed as a possible solution. The results demonstrated the advantages of advancing the spark time, where an indicated thermal efficiency exceeding 50% was achieved using a compression ratio of 14.5:1 and an equivalence ratio of 0.25.

Keywords: hydrogen, combustion, engine knock, SI engine

Procedia PDF Downloads 114

1465 Mechanical Characterization of Brain Tissue in Compression

Authors: Abbas Shafiee, Mohammad Taghi Ahmadian, Maryam Hoviattalab

Abstract:

The biomechanical behavior of brain tissue is needed for predicting the traumatic brain injury (TBI). Each year over 1.5 million people sustain a TBI in the USA. The appropriate coefficients for injury prediction can be evaluated using experimental data. In this study, an experimental setup on brain soft tissue was developed to perform unconfined compression tests at quasistatic strain rates ∈0.0004 s-1 and 0.008 s-1 and 0.4 stress relaxation test under unconfined uniaxial compression with ∈ 0.67 s-1 ramp rate. The fitted visco-hyperelastic parameters were utilized by using obtained stress-strain curves. The experimental data was validated using finite element analysis (FEA) and previous findings. Also, influence of friction coefficient on unconfined compression and relaxation test and effect of ramp rate in relaxation test is investigated. Results of the findings are implemented on the analysis of a human brain under high acceleration due to impact.

Keywords: brain soft tissue, visco-hyperelastic, finite element analysis (FEA), friction, quasistatic strain rate

Procedia PDF Downloads 642

1464 Cultural-Creative Design with Language Figures of Speech

Authors: Wei Chen Chang, Ming Yu Hsiao

Abstract:

The commodity takes one kind of mark, the designer how to construction and interpretation the user how to use the process and effectively convey message in design education has always been an important issue. Cultural-creative design refers to signifying cultural heritage for product design. In terms of Peirce’s Semiotic Triangle: signifying elements-object-interpretant, signifying elements are the outcomes of design, the object is cultural heritage, and the interpretant is the positioning and description of product design. How to elaborate the positioning, design, and development of a product is a narrative issue of the interpretant, and how to shape the signifying elements of a product by modifying and adapting styles is a rhetoric matter. This study investigated the rhetoric of elements signifying products to develop a rhetoric model with cultural style. Figures of speech are a rhetoric method in narrative. By adapting figures of speech to the interpretant, this study developed the rhetoric context of cultural context by narrative means. In this two-phase study, phase I defines figures of speech and phase II analyzes existing cultural-creative products in terms of figures of speech to develop a rhetoric of style model. We expect it can reference for the future development of Cultural-creative design.

Keywords: cultural-creative design, cultural-creative products, figures of speech, Peirce’s semiotic triangle, rhetoric of style model

Procedia PDF Downloads 358

1463 Exploratory Analysis of A Review of Nonexistence Polarity in Native Speech

Authors: Deawan Rakin Ahamed Remal, Sinthia Chowdhury, Sharun Akter Khushbu, Sheak Rashed Haider Noori

Abstract:

Native Speech to text synthesis has its own leverage for the purpose of mankind. The extensive nature of art to speaking different accents is common but the purpose of communication between two different accent types of people is quite difficult. This problem will be motivated by the extraction of the wrong perception of language meaning. Thus, many existing automatic speech recognition has been placed to detect text. Overall study of this paper mentions a review of NSTTR (Native Speech Text to Text Recognition) synthesis compared with Text to Text recognition. Review has exposed many text to text recognition systems that are at a very early stage to comply with the system by native speech recognition. Many discussions started about the progression of chatbots, linguistic theory another is rule based approach. In the Recent years Deep learning is an overwhelming chapter for text to text learning to detect language nature. To the best of our knowledge, In the sub continent a huge number of people speak in Bangla language but they have different accents in different regions therefore study has been elaborate contradictory discussion achievement of existing works and findings of future needs in Bangla language acoustic accent.

Keywords: TTR, NSTTR, text to text recognition, deep learning, natural language processing

Procedia PDF Downloads 113

1462 Quantum Cum Synaptic-Neuronal Paradigm and Schema for Human Speech Output and Autism

Authors: Gobinathan Devathasan, Kezia Devathasan

Abstract:

Objective: To improve the current modified Broca-Wernicke-Lichtheim-Kussmaul speech schema and provide insight into autism. Methods: We reviewed the pertinent literature. Current findings, involving Brodmann areas 22, 46, 9,44,45,6,4 are based on neuropathology and functional MRI studies. However, in primary autism, there is no lucid explanation and changes described, whether neuropathology or functional MRI, appear consequential. Findings: We forward an enhanced model which may explain the enigma related to autism. Vowel output is subcortical and does need cortical representation whereas consonant speech is cortical in origin. Left lateralization is needed to commence the circuitry spin as our life have evolved with L-amino acids and left spin of electrons. A fundamental species difference is we are capable of three syllable-consonants and bi-syllable expression whereas cetaceans and songbirds are confined to single or dual consonants. The 4 key sites for speech are superior auditory cortex, Broca’s two areas, and the supplementary motor cortex. Using the Argand’s diagram and Reimann’s projection, we theorize that the Euclidean three dimensional synaptic neuronal circuits of speech are quantized to coherent waves, and then decoherence takes place at area 6 (spherical representation). In this quantum state complex, 3-consonant languages are instantaneously integrated and multiple languages can be learned, verbalized and differentiated. Conclusion: We postulate that evolutionary human speech is elevated to quantum interaction unlike cetaceans and birds to achieve the three consonants/bi-syllable speech. In classical primary autism, the sudden speech switches off and on noted in several cases could now be explained not by any anatomical lesion but failure of coherence. Area 6 projects directly into prefrontal saccadic area (8); and this further explains the second primary feature in autism: lack of eye contact. The third feature which is repetitive finger gestures, located adjacent to the speech/motor areas, are actual attempts to communicate with the autistic child akin to sign language for the deaf.

Keywords: quantum neuronal paradigm, cetaceans and human speech, autism and rapid magnetic stimulation, coherence and decoherence of speech

Procedia PDF Downloads 174

1461 Determination of the Friction Coefficient of AL5754 Alloy by Ring Compression Test: Experimental and Numerical Survey

Authors: P. M. Keshtiban, M. Zadshakoyan

Abstract:

One of the important factors that alter different process and geometrical parameters on metal forming processes is friction between contacting surfaces. Some important factors that effected directly by friction are: stress, strain, required load, wear of surfaces and then geometrical parameters. In order to control friction effects permanent lubrication is necessary. In this article, the friction coefficient is elicited by the most effective method, ring compression tests. The tests were done by both finite element method and practical tests. Different friction curves that extracted by finite element simulations and has good conformity with published results, used for obtaining final friction coefficient. In this study Mos2 is used as the lubricant and Al5754 alloy used as the specimens material.

Keywords: experiment, FEM, friction coefficient, ring compression

Procedia PDF Downloads 444

1460 Performance Analysis of VoIP Coders for Different Modulations Under Pervasive Environment

Authors: Jasbinder Singh, Harjit Pal Singh, S. A. Khan

Abstract:

The work, in this paper, presents the comparison of encoded speech signals by different VoIP narrow-band and wide-band codecs for different modulation schemes. The simulation results indicate that codec has an impact on the speech quality and also effected by modulation schemes.

Keywords: VoIP, coders, modulations, BER, MOS

Procedia PDF Downloads 490

1459 Audio-Visual Co-Data Processing Pipeline

Authors: Rita Chattopadhyay, Vivek Anand Thoutam

Abstract:

Speech is the most acceptable means of communication where we can quickly exchange our feelings and thoughts. Quite often, people can communicate orally but cannot interact or work with computers or devices. It’s easy and quick to give speech commands than typing commands to computers. In the same way, it’s easy listening to audio played from a device than extract output from computers or devices. Especially with Robotics being an emerging market with applications in warehouses, the hospitality industry, consumer electronics, assistive technology, etc., speech-based human-machine interaction is emerging as a lucrative feature for robot manufacturers. Considering this factor, the objective of this paper is to design the “Audio-Visual Co-Data Processing Pipeline.” This pipeline is an integrated version of Automatic speech recognition, a Natural language model for text understanding, object detection, and text-to-speech modules. There are many Deep Learning models for each type of the modules mentioned above, but OpenVINO Model Zoo models are used because the OpenVINO toolkit covers both computer vision and non-computer vision workloads across Intel hardware and maximizes performance, and accelerates application development. A speech command is given as input that has information about target objects to be detected and start and end times to extract the required interval from the video. Speech is converted to text using the Automatic speech recognition QuartzNet model. The summary is extracted from text using a natural language model Generative Pre-Trained Transformer-3 (GPT-3). Based on the summary, essential frames from the video are extracted, and the You Only Look Once (YOLO) object detection model detects You Only Look Once (YOLO) objects on these extracted frames. Frame numbers that have target objects (specified objects in the speech command) are saved as text. Finally, this text (frame numbers) is converted to speech using text to speech model and will be played from the device. This project is developed for 80 You Only Look Once (YOLO) labels, and the user can extract frames based on only one or two target labels. This pipeline can be extended for more than two target labels easily by making appropriate changes in the object detection module. This project is developed for four different speech command formats by including sample examples in the prompt used by Generative Pre-Trained Transformer-3 (GPT-3) model. Based on user preference, one can come up with a new speech command format by including some examples of the respective format in the prompt used by the Generative Pre-Trained Transformer-3 (GPT-3) model. This pipeline can be used in many projects like human-machine interface, human-robot interaction, and surveillance through speech commands. All object detection projects can be upgraded using this pipeline so that one can give speech commands and output is played from the device.

Keywords: OpenVINO, automatic speech recognition, natural language processing, object detection, text to speech

Procedia PDF Downloads 63

1458 Multimodal Data Fusion Techniques in Audiovisual Speech Recognition

Authors: Hadeer M. Sayed, Hesham E. El Deeb, Shereen A. Taie

Abstract:

In the big data era, we are facing a diversity of datasets from different sources in different domains that describe a single life event. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. Multimodal fusion is the concept of integrating information from multiple modalities in a joint representation with the goal of predicting an outcome through a classification task or regression task. In this paper, multimodal fusion techniques are classified into two main classes: model-agnostic techniques and model-based approaches. It provides a comprehensive study of recent research in each class and outlines the benefits and limitations of each of them. Furthermore, the audiovisual speech recognition task is expressed as a case study of multimodal data fusion approaches, and the open issues through the limitations of the current studies are presented. This paper can be considered a powerful guide for interested researchers in the field of multimodal data fusion and audiovisual speech recognition particularly.

Keywords: multimodal data, data fusion, audio-visual speech recognition, neural networks

Procedia PDF Downloads 91

1457 Analysis of Linguistic Disfluencies in Bilingual Children’s Discourse

Authors: Sheena Christabel Pravin, M. Palanivelan

Abstract:

Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or on an early onset of developmental stuttering at childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses and interjections in the speech corpus encompassing bilingual spontaneous utterances from school going children – English and Tamil. Two classifiers including Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), which is a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children to distinguish between Children Who Stutter (CWS) and Children with Language Impairment CLI). The ability of the models in classifying the disfluencies was measured in terms of F-measure, Recall, and Precision.

Keywords: bi-lingual, children who stutter, children with language impairment, hidden markov models, multi-layer perceptron, linguistic disfluencies, stuttering disfluencies

Procedia PDF Downloads 200

1456 Compressed Suffix Arrays to Self-Indexes Based on Partitioned Elias-Fano

Authors: Guo Wenyu, Qu Youli

Abstract:

A practical and simple self-indexing data structure, Partitioned Elias-Fano (PEF) - Compressed Suffix Arrays (CSA), is built in linear time for the CSA based on PEF indexes. Moreover, the PEF-CSA is compared with two classical compressed indexing methods, Ferragina and Manzini implementation (FMI) and Sad-CSA on different type and size files in Pizza & Chili. The PEF-CSA performs better on the existing data in terms of the compression ratio, count, and locates time except for the evenly distributed data such as proteins data. The observations of the experiments are that the distribution of the φ is more important than the alphabet size on the compression ratio. Unevenly distributed data φ makes better compression effect, and the larger the size of the hit counts, the longer the count and locate time.

Keywords: compressed suffix array, self-indexing, partitioned Elias-Fano, PEF-CSA

Procedia PDF Downloads 234

1455 Emotional and Physiological Reaction While Listening the Speech of Adults Who Stutter

Authors: Xharavina V., Gallopeni F., Ahmeti K.

Abstract:

Stuttered speech is filled with intermittent sound prolongations and/or rapid part word repetitions. Oftentimes, these aberrant acoustic behaviors are associated with intermittent physical tension and struggle behaviors such as head jerks, arm jerks, finger tapping, excessive eye-blinks, etc. Additionally, the jarring nature of acoustic and physical manifestations that often accompanies moderate-severe stuttering may induce negative emotional responses in listeners, which alters communication between the person who stutters and their listeners. However, researches for the influence of negative emotions in the communication and for physical reaction are limited. Therefore, to compare psycho-physiological responses of fluent adults, while listening the speech of adults who speak fluency and adults who stutter, are necessary. This study comprises the experimental method, with total of 104 participants (average age-20 years old, SD=2.1), divided into 3 groups. All participants self-reported no impairments in speech, language, or hearing. Exploring the responses of the participants, there were used two records speeches; a voice who speaks fluently and the voice who stutters. Heartbeats and the pulse were measured by the digital blood pressure monitor called 'Tensoval', as a physiological response to the fluent and stuttering sample. Meanwhile, the emotional responses of participants were measured by the self-reporting questionnaire (Steenbarger, 2001). Results showed an increase in heartbeats during the stuttering speech compared with the fluent sample (p < 0.5). The listeners also self-reported themselves as more alive, unhappy, nervous, repulsive, sad, tense, distracted and upset when listening the stuttering words versus the words of the fluent adult (where it was reported to experience positive emotions). These data support the notions that speech with stuttering can bring a psycho-physical reaction to the listeners. Speech pathologists should be aware that listeners show intolerable physiological reactions to stuttering that remain visible over time.

Keywords: emotional, physiological, stuttering, fluent speech

Procedia PDF Downloads 125

1454 Effect of Signal Acquisition Procedure on Imagined Speech Classification Accuracy

Authors: M.R Asghari Bejestani, Gh. R. Mohammad Khani, V.R. Nafisi

Abstract:

Imagined speech recognition is one of the most interesting approaches to BCI development and a lot of works have been done in this area. Many different experiments have been designed and hundreds of combinations of feature extraction methods and classifiers have been examined. Reported classification accuracies range from the chance level to more than 90%. Based on non-stationary nature of brain signals, we have introduced 3 classification modes according to time difference in inter and intra-class samples. The modes can explain the diversity of reported results and predict the range of expected classification accuracies from the brain signal accusation procedure. In this paper, a few samples are illustrated by inspecting results of some previous works.

Keywords: brain computer interface, silent talk, imagined speech, classification, signal processing

Procedia PDF Downloads 134

1453 The Importance of the Historical Approach in the Linguistic Research

Authors: Zoran Spasovski

Abstract:

The paper shortly discusses the significance and the benefits of the historical approach in the research of languages by presenting examples of it in the fields of phonetics and phonology, lexicology, morphology, syntax, and even in the onomastics (toponomy and anthroponomy). The examples from the field of phonetics/phonology include insights into animal speech and its evolution into human speech, the evolution of the sounds of human speech from vocals to glides and consonants and from velar consonants to palatal, etc., on well-known examples of former researchers. Those from the field of lexicology show shortly the formation of the lexemes and their evolution; the morphology and syntax are explained by examples of the development of grammar and syntax forms, and the importance of the historical approach in the research of place-names and personal names is briefly outlined through examples of place-names and personal names and surnames, and the conclusions that come from it, in different languages.

Keywords: animal speech, glotogenesis, grammar forms, lexicology, place-names, personal names, surnames, syntax categories

Procedia PDF Downloads 62

1452 An Automatic Speech Recognition of Conversational Telephone Speech in Malay Language

Authors: M. Draman, S. Z. Muhamad Yassin, M. S. Alias, Z. Lambak, M. I. Zulkifli, S. N. Padhi, K. N. Baharim, F. Maskuriy, A. I. A. Rahim

Abstract:

The performance of Malay automatic speech recognition (ASR) system for the call centre environment is presented. The system utilizes Kaldi toolkit as the platform to the entire library and algorithm used in performing the ASR task. The acoustic model implemented in this system uses a deep neural network (DNN) method to model the acoustic signal and the standard (n-gram) model for language modelling. With 80 hours of training data from the call centre recordings, the ASR system can achieve 72% of accuracy that corresponds to 28% of word error rate (WER). The testing was done using 20 hours of audio data. Despite the implementation of DNN, the system shows a low accuracy owing to the varieties of noises, accent and dialect that typically occurs in Malaysian call centre environment. This significant variation of speakers is reflected by the large standard deviation of the average word error rate (WERav) (i.e., ~ 10%). It is observed that the lowest WER (13.8%) was obtained from recording sample with a standard Malay dialect (central Malaysia) of native speaker as compared to 49% of the sample with the highest WER that contains conversation of the speaker that uses non-standard Malay dialect.

Keywords: conversational speech recognition, deep neural network, Malay language, speech recognition

Procedia PDF Downloads 307

1451 A Mixing Matrix Estimation Algorithm for Speech Signals under the Under-Determined Blind Source Separation Model

Authors: Jing Wu, Wei Lv, Yibing Li, Yuanfan You

Abstract:

The separation of speech signals has become a research hotspot in the field of signal processing in recent years. It has many applications and influences in teleconferencing, hearing aids, speech recognition of machines and so on. The sounds received are usually noisy. The issue of identifying the sounds of interest and obtaining clear sounds in such an environment becomes a problem worth exploring, that is, the problem of blind source separation. This paper focuses on the under-determined blind source separation (UBSS). Sparse component analysis is generally used for the problem of under-determined blind source separation. The method is mainly divided into two parts. Firstly, the clustering algorithm is used to estimate the mixing matrix according to the observed signals. Then the signal is separated based on the known mixing matrix. In this paper, the problem of mixing matrix estimation is studied. This paper proposes an improved algorithm to estimate the mixing matrix for speech signals in the UBSS model. The traditional potential algorithm is not accurate for the mixing matrix estimation, especially for low signal-to noise ratio (SNR).In response to this problem, this paper considers the idea of an improved potential function method to estimate the mixing matrix. The algorithm not only avoids the inuence of insufficient prior information in traditional clustering algorithm, but also improves the estimation accuracy of mixing matrix. This paper takes the mixing of four speech signals into two channels as an example. The results of simulations show that the approach in this paper not only improves the accuracy of estimation, but also applies to any mixing matrix.

Keywords: DBSCAN, potential function, speech signal, the UBSS model

Procedia PDF Downloads 119

1450 Design of an Active Compression System for Treating Vascular Disease Using a Series of Silicone Based Inflatable Mini Bladders

Authors: Gayani K. Nandasiri, Tilak Dias, William Hurley

Abstract:

Venous disease of human lower limb could range from minor asymptomatic incompetence of venous valves to chronic venous ulceration. The sheer prevalence of varicose veins and its associated significant costs of treating late complications such as chronic ulcers contribute to a higher burden on health care resources. In most of western countries with developed health care systems, treatment costs associated with Venous disease accounts for a considerable portion of their total health care budget, and it has become a high-cost burden to National Health Service (NHS), UK. The established gold standard of treatment for the venous disease is the graduated compression, where the pressure at the ankle being highest and decreasing towards the knee and thigh. Currently, medical practitioners use two main methods to treat venous disease; i.e. compression bandaging and compression stockings. Both these systems have their own disadvantages which lead to the current programme of research. The aim of the present study is to revolutionize the compression therapy by using a novel active compression system to deliver a controllable and more accurate pressure profiles using a series of inflatable mini bladders. Two types of commercially available silicones were tested for the application. The mini bladders were designed with a special fabrication procedure to provide required pressure profiles, and a series of experiments were conducted to characterise the mini bladders. The inflation/deflation heights of these mini bladders were investigated experimentally and using a finite element model (FEM), and the experimental data were compared to the results obtained from FEM simulations, which showed 70-80% agreement. Finally, the mini bladders were tested for its pressure transmittance characteristics, and the results showed a 70-80% of inlet air pressure transmitted onto the treated surface.

Keywords: finite element analysis, graduated compression, inflatable bladders, venous disease

Procedia PDF Downloads 169