Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 25

Search results for: spectrogram

25 1D Convolutional Networks to Compute Mel-Spectrogram, Chromagram, and Cochleogram for Audio Networks

Abstract:

Time-frequency transformation and spectral representations of audio signals are commonly used in various machine learning applications. Training networks on frequency features such as the Mel-Spectrogram or Cochleogram have been proven more effective and convenient than training on-time samples. In practical realizations, these features are created on a different processor and/or pre-computed and stored on disk, requiring additional efforts and making it difficult to experiment with different features. In this paper, we provide a PyTorch framework for creating various spectral features as well as time-frequency transformation and time-domain filter-banks using the built-in trainable conv1d() layer. This allows computing these features on the fly as part of a larger network and enabling easier experimentation with various combinations and parameters. Our work extends the work in the literature developed for that end: First, by adding more of these features and also by allowing the possibility of either starting from initialized kernels or training them from random values. The code is written as a template of classes and scripts that users may integrate into their own PyTorch classes or simply use as is and add more layers for various applications.

Keywords: neural networks Mel-Spectrogram, chromagram, cochleogram, discrete Fourrier transform, PyTorch conv1d()

Procedia PDF Downloads 233

24 Slice Bispectrogram Analysis-Based Classification of Environmental Sounds Using Convolutional Neural Network

Authors: Katsumi Hirata

Abstract:

Certain systems can function well only if they recognize the sound environment as humans do. In this research, we focus on sound classification by adopting a convolutional neural network and aim to develop a method that automatically classifies various environmental sounds. Although the neural network is a powerful technique, the performance depends on the type of input data. Therefore, we propose an approach via a slice bispectrogram, which is a third-order spectrogram and is a slice version of the amplitude for the short-time bispectrum. This paper explains the slice bispectrogram and discusses the effectiveness of the derived method by evaluating the experimental results using the ESC‑50 sound dataset. As a result, the proposed scheme gives high accuracy and stability. Furthermore, some relationship between the accuracy and non-Gaussianity of sound signals was confirmed.

Keywords: environmental sound, bispectrum, spectrogram, slice bispectrogram, convolutional neural network

Procedia PDF Downloads 125

23 Musical Instrument Recognition in Polyphonic Audio Through Convolutional Neural Networks and Spectrograms

Authors: Rujia Chen, Akbar Ghobakhlou, Ajit Narayanan

Abstract:

This study investigates the task of identifying musical instruments in polyphonic compositions using Convolutional Neural Networks (CNNs) from spectrogram inputs, focusing on binary classification. The model showed promising results, with an accuracy of 97% on solo instrument recognition. When applied to polyphonic combinations of 1 to 10 instruments, the overall accuracy was 64%, reflecting the increasing challenge with larger ensembles. These findings contribute to the field of Music Information Retrieval (MIR) by highlighting the potential and limitations of current approaches in handling complex musical arrangements. Future work aims to include a broader range of musical sounds, including electronic and synthetic sounds, to improve the model's robustness and applicability in real-time MIR systems.

Keywords: binary classifier, CNN, spectrogram, instrument

Procedia PDF Downloads 77

22 Classification of Coughing and Breathing Activities Using Wearable and a Light-Weight DL Model

Authors: Subham Ghosh, Arnab Nandi

Abstract:

Background: The proliferation of Wireless Body Area Networks (WBAN) and Internet of Things (IoT) applications demonstrates the potential for continuous monitoring of physical changes in the body. These technologies are vital for health monitoring tasks, such as identifying coughing and breathing activities, which are necessary for disease diagnosis and management. Monitoring activities such as coughing and deep breathing can provide valuable insights into a variety of medical issues. Wearable radio-based antenna sensors, which are lightweight and easy to incorporate into clothing or portable goods, provide continuous monitoring. This mobility gives it a substantial advantage over stationary environmental sensors like as cameras and radar, which are constrained to certain places. Furthermore, using compressive techniques provides benefits such as reduced data transmission speeds and memory needs. These wearable sensors offer more advanced and diverse health monitoring capabilities. Methodology: This study analyzes the feasibility of using a semi-flexible antenna operating at 2.4 GHz (ISM band) and positioned around the neck and near the mouth to identify three activities: coughing, deep breathing, and idleness. Vector network analyzer (VNA) is used to collect time-varying complex reflection coefficient data from perturbed antenna nearfield. The reflection coefficient (S11) conveys nuanced information caused by simultaneous variations in the nearfield radiation of three activities across time. The signatures are sparsely represented with gaussian windowed Gabor spectrograms. The Gabor spectrogram is used as a sparse representation approach, which reassigns the ridges of the spectrogram images to improve their resolution and focus on essential components. The antenna is biocompatible in terms of specific absorption rate (SAR). The sparsely represented Gabor spectrogram pictures are fed into a lightweight deep learning (DL) model for feature extraction and classification. Two antenna locations are investigated in order to determine the most effective localization for three different activities. Findings: Cross-validation techniques were used on data from both locations. Due to the complex form of the recorded S11, separate analyzes and assessments were performed on the magnitude, phase, and their combination. The combination of magnitude and phase fared better than the separate analyses. Various sliding window sizes, ranging from 1 to 5 seconds, were tested to find the best window for activity classification. It was discovered that a neck-mounted design was effective at detecting the three unique behaviors.

Keywords: activity recognition, antenna, deep-learning, time-frequency

Procedia PDF Downloads 6

21 Experimental Research and Analyses of Yoruba Native Speakers’ Chinese Phonetic Errors

Authors: Obasa Joshua Ifeoluwa

Abstract:

Phonetics is the foundation and most important part of language learning. This article, through an acoustic experiment as well as using Praat software, uses Yoruba students’ Chinese consonants, vowels, and tones pronunciation to carry out a visual comparison with that of native Chinese speakers. This article is aimed at Yoruba native speakers learning Chinese phonetics; therefore, Yoruba students are selected. The students surveyed are required to be at an elementary level and have learned Chinese for less than six months. The students selected are all undergraduates majoring in Chinese Studies at the University of Lagos. These students have already learned Chinese Pinyin and are all familiar with the pinyin used in the provided questionnaire. The Chinese students selected are those that have passed the level two Mandarin proficiency examination, which serves as an assurance that their pronunciation is standard. It is discovered in this work that in terms of Mandarin’s consonants pronunciation, Yoruba students cannot distinguish between the voiced and voiceless as well as the aspirated and non-aspirated phonetics features. For instance, while pronouncing [ph] it is clearly shown in the spectrogram that the Voice Onset Time (VOT) of a Chinese speaker is higher than that of a Yoruba native speaker, which means that the Yoruba speaker is pronouncing the unaspirated counterpart [p]. Another difficulty is to pronounce some affricates like [tʂ]、[tʂʰ]、[ʂ]、[ʐ]、 [tɕ]、[tɕʰ]、[ɕ]. This is because these sounds are not in the phonetic system of the Yoruba language. In terms of vowels, some students find it difficult to pronounce some allophonic high vowels such as [ɿ] and [ʅ], therefore pronouncing them as their phoneme [i]; another pronunciation error is pronouncing [y] as [u], also as shown in the spectrogram, a student pronounced [y] as [iu]. In terms of tone, it is most difficult for students to differentiate between the second (rising) and third (falling and rising) tones because these tones’ emphasis is on the rising pitch. This work concludes that the major error made by Yoruba students while pronouncing Chinese sounds is caused by the interference of their first language (LI) and sometimes by their lingua franca.

Keywords: Chinese, Yoruba, error analysis, experimental phonetics, consonant, vowel, tone

Procedia PDF Downloads 111

20 Brainwave Classification for Brain Balancing Index (BBI) via 3D EEG Model Using k-NN Technique

Authors: N. Fuad, M. N. Taib, R. Jailani, M. E. Marwan

Abstract:

In this paper, the comparison between k-Nearest Neighbor (kNN) algorithms for classifying the 3D EEG model in brain balancing is presented. The EEG signal recording was conducted on 51 healthy subjects. Development of 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images. Then, maximum PSD values were extracted as features from the model. There are three indexes for the balanced brain; index 3, index 4 and index 5. There are significant different of the EEG signals due to the brain balancing index (BBI). Alpha-α (8–13 Hz) and beta-β (13–30 Hz) were used as input signals for the classification model. The k-NN classification result is 88.46% accuracy. These results proved that k-NN can be used in order to predict the brain balancing application.

Keywords: power spectral density, 3D EEG model, brain balancing, kNN

Procedia PDF Downloads 485

19 Italian Speech Vowels Landmark Detection through the Legacy Tool 'xkl' with Integration of Combined CNNs and RNNs

Authors: Kaleem Kashif, Tayyaba Anam, Yizhi Wu

Abstract:

This paper introduces a methodology for advancing Italian speech vowels landmark detection within the distinctive feature-based speech recognition domain. Leveraging the legacy tool 'xkl' by integrating combined convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the study presents a comprehensive enhancement to the 'xkl' legacy software. This integration incorporates re-assigned spectrogram methodologies, enabling meticulous acoustic analysis. Simultaneously, our proposed model, integrating combined CNNs and RNNs, demonstrates unprecedented precision and robustness in landmark detection. The augmentation of re-assigned spectrogram fusion within the 'xkl' software signifies a meticulous advancement, particularly enhancing precision related to vowel formant estimation. This augmentation catalyzes unparalleled accuracy in landmark detection, resulting in a substantial performance leap compared to conventional methods. The proposed model emerges as a state-of-the-art solution in the distinctive feature-based speech recognition systems domain. In the realm of deep learning, a synergistic integration of combined CNNs and RNNs is introduced, endowed with specialized temporal embeddings, harnessing self-attention mechanisms, and positional embeddings. The proposed model allows it to excel in capturing intricate dependencies within Italian speech vowels, rendering it highly adaptable and sophisticated in the distinctive feature domain. Furthermore, our advanced temporal modeling approach employs Bayesian temporal encoding, refining the measurement of inter-landmark intervals. Comparative analysis against state-of-the-art models reveals a substantial improvement in accuracy, highlighting the robustness and efficacy of the proposed methodology. Upon rigorous testing on a database (LaMIT) speech recorded in a silent room by four Italian native speakers, the landmark detector demonstrates exceptional performance, achieving a 95% true detection rate and a 10% false detection rate. A majority of missed landmarks were observed in proximity to reduced vowels. These promising results underscore the robust identifiability of landmarks within the speech waveform, establishing the feasibility of employing a landmark detector as a front end in a speech recognition system. The synergistic integration of re-assigned spectrogram fusion, CNNs, RNNs, and Bayesian temporal encoding not only signifies a significant advancement in Italian speech vowels landmark detection but also positions the proposed model as a leader in the field. The model offers distinct advantages, including unparalleled accuracy, adaptability, and sophistication, marking a milestone in the intersection of deep learning and distinctive feature-based speech recognition. This work contributes to the broader scientific community by presenting a methodologically rigorous framework for enhancing landmark detection accuracy in Italian speech vowels. The integration of cutting-edge techniques establishes a foundation for future advancements in speech signal processing, emphasizing the potential of the proposed model in practical applications across various domains requiring robust speech recognition systems.

Keywords: landmark detection, acoustic analysis, convolutional neural network, recurrent neural network

Procedia PDF Downloads 62

18 Spectrogram Pre-Processing to Improve Isotopic Identification to Discriminate Gamma and Neutrons Sources

Authors: Mustafa Alhamdi

Abstract:

Industrial application to classify gamma rays and neutron events is investigated in this study using deep machine learning. The identification using a convolutional neural network and recursive neural network showed a significant improvement in predication accuracy in a variety of applications. The ability to identify the isotope type and activity from spectral information depends on feature extraction methods, followed by classification. The features extracted from the spectrum profiles try to find patterns and relationships to present the actual spectrum energy in low dimensional space. Increasing the level of separation between classes in feature space improves the possibility to enhance classification accuracy. The nonlinear nature to extract features by neural network contains a variety of transformation and mathematical optimization, while principal component analysis depends on linear transformations to extract features and subsequently improve the classification accuracy. In this paper, the isotope spectrum information has been preprocessed by finding the frequencies components relative to time and using them as a training dataset. Fourier transform implementation to extract frequencies component has been optimized by a suitable windowing function. Training and validation samples of different isotope profiles interacted with CdTe crystal have been simulated using Geant4. The readout electronic noise has been simulated by optimizing the mean and variance of normal distribution. Ensemble learning by combing voting of many models managed to improve the classification accuracy of neural networks. The ability to discriminate gamma and neutron events in a single predication approach using deep machine learning has shown high accuracy using deep learning. The paper findings show the ability to improve the classification accuracy by applying the spectrogram preprocessing stage to the gamma and neutron spectrums of different isotopes. Tuning deep machine learning models by hyperparameter optimization of neural network models enhanced the separation in the latent space and provided the ability to extend the number of detected isotopes in the training database. Ensemble learning contributed significantly to improve the final prediction.

Keywords: machine learning, nuclear physics, Monte Carlo simulation, noise estimation, feature extraction, classification

Procedia PDF Downloads 150

17 Characterization of 3D-MRP for Analyzing of Brain Balancing Index (BBI) Pattern

Authors: N. Fuad, M. N. Taib, R. Jailani, M. E. Marwan

Abstract:

This paper discusses on power spectral density (PSD) characteristics which are extracted from three-dimensional (3D) electroencephalogram (EEG) models. The EEG signal recording was conducted on 150 healthy subjects. Development of 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images. Then, the values of maximum PSD were extracted as features from the model. These features are analysed using mean relative power (MRP) and different mean relative power (DMRP) technique to observe the pattern among different brain balancing indexes. The results showed that by implementing these techniques, the pattern of brain balancing indexes can be clearly observed. Some patterns are indicates between index 1 to index 5 for left frontal (LF) and right frontal (RF).

Keywords: power spectral density, 3D EEG model, brain balancing, mean relative power, different mean relative power

Procedia PDF Downloads 473

16 Times2D: A Time-Frequency Method for Time Series Forecasting

Authors: Reza Nematirad, Anil Pahwa, Balasubramaniam Natarajan

Abstract:

Time series data consist of successive data points collected over a period of time. Accurate prediction of future values is essential for informed decision-making in several real-world applications, including electricity load demand forecasting, lifetime estimation of industrial machinery, traffic planning, weather prediction, and the stock market. Due to their critical relevance and wide application, there has been considerable interest in time series forecasting in recent years. However, the proliferation of sensors and IoT devices, real-time monitoring systems, and high-frequency trading data introduce significant intricate temporal variations, rapid changes, noise, and non-linearities, making time series forecasting more challenging. Classical methods such as Autoregressive integrated moving average (ARIMA) and Exponential Smoothing aim to extract pre-defined temporal variations, such as trends and seasonality. While these methods are effective for capturing well-defined seasonal patterns and trends, they often struggle with more complex, non-linear patterns present in real-world time series data. In recent years, deep learning has made significant contributions to time series forecasting. Recurrent Neural Networks (RNNs) and their variants, such as Long short-term memory (LSTMs) and Gated Recurrent Units (GRUs), have been widely adopted for modeling sequential data. However, they often suffer from the locality, making it difficult to capture local trends and rapid fluctuations. Convolutional Neural Networks (CNNs), particularly Temporal Convolutional Networks (TCNs), leverage convolutional layers to capture temporal dependencies by applying convolutional filters along the temporal dimension. Despite their advantages, TCNs struggle with capturing relationships between distant time points due to the locality of one-dimensional convolution kernels. Transformers have revolutionized time series forecasting with their powerful attention mechanisms, effectively capturing long-term dependencies and relationships between distant time points. However, the attention mechanism may struggle to discern dependencies directly from scattered time points due to intricate temporal patterns. Lastly, Multi-Layer Perceptrons (MLPs) have also been employed, with models like N-BEATS and LightTS demonstrating success. Despite this, MLPs often face high volatility and computational complexity challenges in long-horizon forecasting. To address intricate temporal variations in time series data, this study introduces Times2D, a novel framework that parallelly integrates 2D spectrogram and derivative heatmap techniques. The spectrogram focuses on the frequency domain, capturing periodicity, while the derivative patterns emphasize the time domain, highlighting sharp fluctuations and turning points. This 2D transformation enables the utilization of powerful computer vision techniques to capture various intricate temporal variations. To evaluate the performance of Times2D, extensive experiments were conducted on standard time series datasets and compared with various state-of-the-art algorithms, including DLinear (2023), TimesNet (2023), Non-stationary Transformer (2022), PatchTST (2023), N-HiTS (2023), Crossformer (2023), MICN (2023), LightTS (2022), FEDformer (2022), FiLM (2022), SCINet (2022a), Autoformer (2021), and Informer (2021) under the same modeling conditions. The initial results demonstrated that Times2D achieves consistent state-of-the-art performance in both short-term and long-term forecasting tasks. Furthermore, the generality of the Times2D framework allows it to be applied to various tasks such as time series imputation, clustering, classification, and anomaly detection, offering potential benefits in any domain that involves sequential data analysis.

Keywords: derivative patterns, spectrogram, time series forecasting, times2D, 2D representation

Procedia PDF Downloads 41

15 Adsorption of Methylene Blue by Pectin from Durian (Durio zibethinus) Seeds

Authors: Siti Nurkhalimah, Devita Wijiyanti, Kuntari

Abstract:

Methylene blue is a popular water-soluble dye that is used for dyeing a variety of substrates such as bacteria, wool, and silk. Methylene blue discharged into the aquatic environment will cause health problems for living things. Treatment method for industrial wastewater may be divided into three main categories: physical, chemical, and biological. Among them, adsorption technology is generally considered to be an effective method for quickly lowering the concentration of dissolved dyes in a wastewater. This has attracted considerable research into low-cost alternative adsorbents for adsorbing or removing coloring matter. In this research, pectin from durian seeds was utilized here to assess their ability for the removal of methylene blue. Adsorption parameters are contact time and dye concentration were examined in the batch adsorption processes. Pectin characterization was performed by FTIR spectrometry. Methylene blue concentration was determined by using UV-Vis spectrophotometer. FTIR results show that the samples showed the typical fingerprint in IR spectrogram. The adsorption result on 10 mL of 5 mg/L methylene blue solution achieved 95.12% when contact time 10 minutes and pectin 0.2 g.

Keywords: pectin, methylene blue, adsorption, durian seed

Procedia PDF Downloads 184

14 Deep Learning Approaches for Accurate Detection of Epileptic Seizures from Electroencephalogram Data

Authors: Ramzi Rihane, Yassine Benayed

Abstract:

Epilepsy is a chronic neurological disorder characterized by recurrent, unprovoked seizures resulting from abnormal electrical activity in the brain. Timely and accurate detection of these seizures is essential for improving patient care. In this study, we leverage the UK Bonn University open-source EEG dataset and employ advanced deep-learning techniques to automate the detection of epileptic seizures. By extracting key features from both time and frequency domains, as well as Spectrogram features, we enhance the performance of various deep learning models. Our investigation includes architectures such as Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), 1D Convolutional Neural Networks (1D-CNN), and hybrid CNN-LSTM and CNN-BiLSTM models. The models achieved impressive accuracies: LSTM (98.52%), Bi-LSTM (98.61%), CNN-LSTM (98.91%), CNN-BiLSTM (98.83%), and CNN (98.73%). Additionally, we utilized a data augmentation technique called SMOTE, which yielded the following results: CNN (97.36%), LSTM (97.01%), Bi-LSTM (97.23%), CNN-LSTM (97.45%), and CNN-BiLSTM (97.34%). These findings demonstrate the effectiveness of deep learning in capturing complex patterns in EEG signals, providing a reliable and scalable solution for real-time seizure detection in clinical environments.

Keywords: electroencephalogram, epileptic seizure, deep learning, LSTM, CNN, BI-LSTM, seizure detection

Procedia PDF Downloads 12

13 Detection of Atrial Fibrillation Using Wearables via Attentional Two-Stream Heterogeneous Networks

Authors: Huawei Bai, Jianguo Yao, Fellow, IEEE

Abstract:

Atrial fibrillation (AF) is the most common form of heart arrhythmia and is closely associated with mortality and morbidity in heart failure, stroke, and coronary artery disease. The development of single spot optical sensors enables widespread photoplethysmography (PPG) screening, especially for AF, since it represents a more convenient and noninvasive approach. To our knowledge, most existing studies based on public and unbalanced datasets can barely handle the multiple noises sources in the real world and, also, lack interpretability. In this paper, we construct a large- scale PPG dataset using measurements collected from PPG wrist- watch devices worn by volunteers and propose an attention-based two-stream heterogeneous neural network (TSHNN). The first stream is a hybrid neural network consisting of a three-layer one-dimensional convolutional neural network (1D-CNN) and two-layer attention- based bidirectional long short-term memory (Bi-LSTM) network to learn representations from temporally sampled signals. The second stream extracts latent representations from the PPG time-frequency spectrogram using a five-layer CNN. The outputs from both streams are fed into a fusion layer for the outcome. Visualization of the attention weights learned demonstrates the effectiveness of the attention mechanism against noise. The experimental results show that the TSHNN outperforms all the competitive baseline approaches and with 98.09% accuracy, achieves state-of-the-art performance.

Keywords: PPG wearables, atrial fibrillation, feature fusion, attention mechanism, hyber network

Procedia PDF Downloads 120

12 Accentuation Moods of Blaming Utterances in Egyptian Arabic: A Pragmatic Study of Prosodic Focus

Authors: Reda A. H. Mahmoud

Abstract:

This paper investigates the pragmatic meaning of prosodic focus through four accentuation moods of blaming utterances in Egyptian Arabic. Prosodic focus results in various pragmatic meanings when the speaker utters the same blaming expression in different emotional moods: the angry, the mocking, the frustrated, and the informative moods. The main objective of this study is to interpret the meanings of these four accentuation moods in relation to their illocutionary forces and pre-locutionary effects, the integrated features of prosodic focus (e.g., tone movement distributions, pitch accents, lengthening of vowels, deaccentuation of certain syllables/words, and tempo), and the consonance between the former prosodic features and certain lexico-grammatical components to communicate the intentions of the speaker. The data on blaming utterances has been collected via elicitation and pre-recorded material, and the selection of blaming utterances is based on the criteria of lexical and prosodic regularity to be processed and verified by three computer programs, Praat, Speech Analyzer, and Spectrogram Freeware. A dual pragmatic approach is established to interpret expressive blaming utterance and their lexico-grammatical distributions into intonational focus structure units. The pragmatic component of this approach explains the variable psychological attitudes through the expressions of blaming and their effects whereas the analysis of prosodic focus structure is used to describe the intonational contours of blaming utterances and other prosodic features. The study concludes that every accentuation mood has its different prosodic configuration which influences the listener’s interpretation of the pragmatic meanings of blaming utterances.

Keywords: pragmatics, pragmatic interpretation, prosody, prosodic focus

Procedia PDF Downloads 151

11 UEMG-FHR Coupling Analysis in Pregnancies Complicated by Pre-Eclampsia and Small for Gestational Age

Authors: Kun Chen, Yan Wang, Yangyu Zhao, Shufang Li, Lian Chen, Xiaoyue Guo, Jue Zhang, Jing Fang

Abstract:

The coupling strength between uterine electromyography (UEMG) and Fetal heart rate (FHR) signals during peripartum reflects the fetal biophysical activities. Therefore, UEMG-FHR coupling characterization is instructive in assessing placenta function. This study introduced a physiological marker named elevated frequency of UEMG-FHR coupling (E-UFC) and explored its predictive value for pregnancies complicated by pre-eclampsia and small for gestational age (SGA). Placental insufficiency patients (n=12) and healthy volunteers (n=24) were recruited and participated. UEMG and FHR were recorded non-invasively by a trans-abdominal device in women at term with singleton pregnancy (32-37 weeks) from 10:00 pm to 8:00 am. The product of the wavelet coherence and the wavelet cross-spectral power between UEMG and FHR was used to weight these two effects in order to quantify the degree of the UEMG-FHR coupling. E-UFC was exacted from the resultant spectrogram by calculating the mean value of the high-coherence (r > 0.5) frequency band. Results showed the high-coherence between UEMG and FHR was observed in the frequency band (1/512-1/16Hz). In addition, E-UFC in placental insufficiency patients was weaker compared to healthy controls (p < 0.001) at group level. These findings suggested the proposed approach could be used to quantitatively characterize the fetal biophysical activities, which is beneficial for early detection of placental insufficiency and reduces the occurrence of adverse pregnancy.

Keywords: uterine electromyography, fetal heart rate, coupling analysis, wavelet analysis

Procedia PDF Downloads 201

10 Adhesive Based upon Polyvinyl Alcohol And Chemical Modified Oca (Oxalis tuberosa) Starch

Authors: Samantha Borja, Vladimir Valle, Pamela Molina

Abstract:

The development of adhesives from renewable raw materials attracts the attention of the scientific community, due to it promises the reduction of the dependence with materials derived from oil. This work proposes the use of modified 'oca (Oxalis tuberosa)' starch and polyvinyl alcohol (PVA) in the elaboration of adhesives for lignocellulosic substrates. The investigation focused on the formulation of adhesives with 3 different PVA:starch (modified and native) ratios (of 1,0:0,33; 1,0:1,0; 1,0:1,67). The first step to perform it was the chemical modification of starch through acid hydrolysis and a subsequent urea treatment to get carbamate starch. Then, the adhesive obtained was characterized in terms of instantaneous viscosity, Fourier-transform infrared spectroscopy (FTIR) and shear strength. The results showed that viscosity and mechanical tests exhibit data with the same tendency in relation to the native and modified starch concentration. It was observed that the data started to reduce its values to a certain concentration, where the values began to grow. On the other hand, two relevant bands were found in the FTIR spectrogram. The first in 3300 cm⁻¹ of OH group with the same intensity for all the essays and the other one in 2900 cm⁻¹, belonging to the group of alkanes with a different intensity for each adhesive. On the whole, the ratio PVA:starch (1:1) will not favor crosslinking in the adhesive structure and causes the viscosity reduction, whereas, in the others ones, the viscosity is higher. It was also observed that adhesives made with modified starch had better characteristics, but the adhesives with high concentrations of native starch could equal the properties of the adhesives made with low concentrations of modified starch.

Keywords: polyvinyl alcohol, PVA, chemical modification, starch, FTIR, viscosity, shear strength

Procedia PDF Downloads 153

9 The Relationship between Spindle Sound and Tool Performance in Turning

Authors: N. Seemuang, T. McLeay, T. Slatter

Abstract:

Worn tools have a direct effect on the surface finish and part accuracy. Tool condition monitoring systems have been developed over a long period and used to avoid a loss of productivity resulting from using a worn tool. However, the majority of tool monitoring research has applied expensive sensing systems not suitable for production. In this work, the cutting sound in turning machine was studied using microphone. Machining trials using seven cutting conditions were conducted until the observable flank wear width (FWW) on the main cutting edge exceeded 0.4 mm. The cutting inserts were removed from the tool holder and the flank wear width was measured optically. A microphone with built-in preamplifier was used to record the machining sound of EN24 steel being face turned by a CNC lathe in a wet cutting condition using constant surface speed control. The sound was sampled at 50 kS/s and all sound signals recorded from microphone were transformed into the frequency domain by FFT in order to establish the frequency content in the audio signature that could be then used for tool condition monitoring. The extracted feature from audio signal was compared to the flank wear progression on the cutting inserts. The spectrogram reveals a promising feature, named as ‘spindle noise’, which emits from the main spindle motor of turning machine. The spindle noise frequency was detected at 5.86 kHz of regardless of cutting conditions used on this particular CNC lathe. Varying cutting speed and feed rate have an influence on the magnitude of power spectrum of spindle noise. The magnitude of spindle noise frequency alters in conjunction with the tool wear progression. The magnitude increases significantly in the transition state between steady-state wear and severe wear. This could be used as a warning signal to prepare for tool replacement or adapt cutting parameters to extend tool life.

Keywords: tool wear, flank wear, condition monitoring, spindle noise

Procedia PDF Downloads 338

8 Speech Emotion Recognition: A DNN and LSTM Comparison in Single and Multiple Feature Application

Authors: Thiago Spilborghs Bueno Meyer, Plinio Thomaz Aquino Junior

Abstract:

Through speech, which privileges the functional and interactive nature of the text, it is possible to ascertain the spatiotemporal circumstances, the conditions of production and reception of the discourse, the explicit purposes such as informing, explaining, convincing, etc. These conditions allow bringing the interaction between humans closer to the human-robot interaction, making it natural and sensitive to information. However, it is not enough to understand what is said; it is necessary to recognize emotions for the desired interaction. The validity of the use of neural networks for feature selection and emotion recognition was verified. For this purpose, it is proposed the use of neural networks and comparison of models, such as recurrent neural networks and deep neural networks, in order to carry out the classification of emotions through speech signals to verify the quality of recognition. It is expected to enable the implementation of robots in a domestic environment, such as the HERA robot from the RoboFEI@Home team, which focuses on autonomous service robots for the domestic environment. Tests were performed using only the Mel-Frequency Cepstral Coefficients, as well as tests with several characteristics of Delta-MFCC, spectral contrast, and the Mel spectrogram. To carry out the training, validation and testing of the neural networks, the eNTERFACE’05 database was used, which has 42 speakers from 14 different nationalities speaking the English language. The data from the chosen database are videos that, for use in neural networks, were converted into audios. It was found as a result, a classification of 51,969% of correct answers when using the deep neural network, when the use of the recurrent neural network was verified, with the classification with accuracy equal to 44.09%. The results are more accurate when only the Mel-Frequency Cepstral Coefficients are used for the classification, using the classifier with the deep neural network, and in only one case, it is possible to observe a greater accuracy by the recurrent neural network, which occurs in the use of various features and setting 73 for batch size and 100 training epochs.

Keywords: emotion recognition, speech, deep learning, human-robot interaction, neural networks

Procedia PDF Downloads 169

7 Analyzing the Sound of Space - The Glissando of the Planets and the Spiral Movement on the Sound of Earth, Saturn and Jupiter

Authors: L. Tonia, I. Daglis, W. Kurth

Abstract:

The sound of the universe creates an affinity with the sounds of music. The analysis of the sound of space focuses on the existence of a tone material, the microstructure and macrostructure, and the form of the sound through the signals recorded during the flight of the spacecraft Van Allen Probes and Cassini’s mission. The sound becomes from the frequencies that belong to electromagnetic waves. Plasma Wave Science Instrument and Electric and Magnetic Field Instrument Suite and Integrated Science (EMFISIS) recorded the signals from space. A transformation of that signals to audio gave the opportunity to study and analyze the sound. Due to the fact that the musical tone pitch has a frequency and every electromagnetic wave produces a frequency too, the creation of a musical score, which appears as the sound of space, can give information about the form, the symmetry, and the harmony of the sound. The conversion of space radio emissions to audio provides a number of tone pitches corresponding to the original frequencies. Through the process of these sounds, we have the opportunity to present a music score that “composed” from space. In this score, we can see some basic features associated with the music form, the structure, the tone center of music material, the construction and deconstruction of the sound. The structure, which was built through a harmonic world, includes tone centers, major and minor scales, sequences of chords, and types of cadences. The form of the sound represents the symmetry of a spiral movement not only in micro-structural but also to macro-structural shape. Multiple glissando sounds in linear and polyphonic process of the sound, founded in magnetic fields around Earth, Saturn, and Jupiter, but also a spiral movement appeared on the spectrogram of the sound. Whistles, Auroral Kilometric Radiations, and Chorus emissions reveal movements similar to musical excerpts of works by contemporary composers like Sofia Gubaidulina, Iannis Xenakis, EinojuhamiRautavara.

Keywords: space sound analysis, spiral, space music, analysis

Procedia PDF Downloads 175

6 Robustness of the Deep Chroma Extractor and Locally-Normalized Quarter Tone Filters in Automatic Chord Estimation under Reverberant Conditions

Authors: Luis Alvarado, Victor Poblete, Isaac Gonzalez, Yetzabeth Gonzalez

Abstract:

In MIREX 2016 (http://www.music-ir.org/mirex), the deep neural network (DNN)-Deep Chroma Extractor, proposed by Korzeniowski and Wiedmer, reached the highest score in an audio chord recognition task. In the present paper, this tool is assessed under acoustic reverberant environments and distinct source-microphone distances. The evaluation dataset comprises The Beatles and Queen datasets. These datasets are sequentially re-recorded with a single microphone in a real reverberant chamber at four reverberation times (0 -anechoic-, 1, 2, and 3 s, approximately), as well as four source-microphone distances (32, 64, 128, and 256 cm). It is expected that the performance of the trained DNN will dramatically decrease under these acoustic conditions with signals degraded by room reverberation and distance to the source. Recently, the effect of the bio-inspired Locally-Normalized Cepstral Coefficients (LNCC), has been assessed in a text independent speaker verification task using speech signals degraded by additive noise at different signal-to-noise ratios with variations of recording distance, and it has also been assessed under reverberant conditions with variations of recording distance. LNCC showed a performance so high as the state-of-the-art Mel Frequency Cepstral Coefficient filters. Based on these results, this paper proposes a variation of locally-normalized triangular filters called Locally-Normalized Quarter Tone (LNQT) filters. By using the LNQT spectrogram, robustness improvements of the trained Deep Chroma Extractor are expected, compared with classical triangular filters, and thus compensating the music signal degradation improving the accuracy of the chord recognition system.

Keywords: chord recognition, deep neural networks, feature extraction, music information retrieval

Procedia PDF Downloads 231

5 Duration of Isolated Vowels in Infants with Cochlear Implants

Authors: Paris Binos

Abstract:

The present work investigates developmental aspects of the duration of isolated vowels in infants with normal hearing compared to those who received cochlear implants (CIs) before two years of age. Infants with normal hearing produced shorter vowel duration since this find related with more mature production abilities. First isolated vowels are transparent during the protophonic stage as evidence of an increased motor and linguistic control. Vowel duration is a crucial factor for the transition of prelexical speech to normal adult speech. Despite current knowledge of data for infants with normal hearing more research is needed to unravel productions skills in early implanted children. Thus, isolated vowel productions by two congenitally hearing-impaired Greek infants (implantation ages 1:4-1:11; post-implant ages 0:6-1:3) were recorded and sampled for six months after implantation with a Nucleus-24. The results compared with the productions of three normal hearing infants (chronological ages 0:8-1:1). Vegetative data and vocalizations masked by external noise or sounds were excluded. Participants had no other disabilities and had unknown deafness etiology. Prior to implantation the infants had an average unaided hearing loss of 95-110 dB HL while the post-implantation PTA decreased to 10-38 dB HL. The current research offers a methodology for the processing of the prelinguistic productions based on a combination of acoustical and auditory analyses. Based on the current methodological framework, duration measured through spectrograms based on wideband analysis, from the voicing onset to the end of the vowel. The end marked by two co-occurring events: 1) The onset of aperiodicity with a rapid change in amplitude in the waveform and 2) a loss in formant’s energy. Cut-off levels of significance were set at 0.05 for all tests. Bonferroni post hoc tests indicated that difference was significant between the mean duration of vowels of infants wearing CIs and their normal hearing peers. Thus, the mean vowel duration of CIs measured longer compared to the normal hearing peers (0.000). The current longitudinal findings contribute to the existing data for the performance of children wearing CIs at a very young age and enrich also the data of the Greek language. The above described weakness for CI’s performance is a challenge for future work in speech processing and CI’s processing strategies.

Keywords: cochlear implant, duration, spectrogram, vowel

Procedia PDF Downloads 261

4 Theta-Phase Gamma-Amplitude Coupling as a Neurophysiological Marker in Neuroleptic-Naive Schizophrenia

Authors: Jun Won Kim

Abstract:

Objective: Theta-phase gamma-amplitude coupling (TGC) was used as a novel evidence-based tool to reflect the dysfunctional cortico-thalamic interaction in patients with schizophrenia. However, to our best knowledge, no studies have reported the diagnostic utility of the TGC in the resting-state electroencephalographic (EEG) of neuroleptic-naive patients with schizophrenia compared to healthy controls. Thus, the purpose of this EEG study was to understand the underlying mechanisms in patients with schizophrenia by comparing the TGC at rest between two groups and to evaluate the diagnostic utility of TGC. Method: The subjects included 90 patients with schizophrenia and 90 healthy controls. All patients were diagnosed with schizophrenia according to the criteria of Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) by two independent psychiatrists using semi-structured clinical interviews. Because patients were either drug-naïve (first episode) or had not been taking psychoactive drugs for one month before the study, we could exclude the influence of medications. Five frequency bands were defined for spectral analyses: delta (1–4 Hz), theta (4–8 Hz), slow alpha (8–10 Hz), fast alpha (10–13.5 Hz), beta (13.5–30 Hz), and gamma (30-80 Hz). The spectral power of the EEG data was calculated with fast Fourier Transformation using the 'spectrogram.m' function of the signal processing toolbox in Matlab. An analysis of covariance (ANCOVA) was performed to compare the TGC results between the groups, which were adjusted using a Bonferroni correction (P < 0.05/19 = 0.0026). Receiver operator characteristic (ROC) analysis was conducted to examine the discriminating ability of the TGC data for schizophrenia diagnosis. Results: The patients with schizophrenia showed a significant increase in the resting-state TGC at all electrodes. The delta, theta, slow alpha, fast alpha, and beta powers showed low accuracies of 62.2%, 58.4%, 56.9%, 60.9%, and 59.0%, respectively, in discriminating the patients with schizophrenia from the healthy controls. The ROC analysis performed on the TGC data generated the most accurate result among the EEG measures, displaying an overall classification accuracy of 92.5%. Conclusion: As TGC includes phase, which contains information about neuronal interactions from the EEG recording, TGC is expected to be useful for understanding the mechanisms the dysfunctional cortico-thalamic interaction in patients with schizophrenia. The resting-state TGC value was increased in the patients with schizophrenia compared to that in the healthy controls and had a higher discriminating ability than the other parameters. These findings may be related to the compensatory hyper-arousal patterns of the dysfunctional default-mode network (DMN) in schizophrenia. Further research exploring the association between TGC and medical or psychiatric conditions that may confound EEG signals will help clarify the potential utility of TGC.

Keywords: quantitative electroencephalography (QEEG), theta-phase gamma-amplitude coupling (TGC), schizophrenia, diagnostic utility

Procedia PDF Downloads 141

3 Development of an EEG-Based Real-Time Emotion Recognition System on Edge AI

Authors: James Rigor Camacho, Wansu Lim

Abstract:

Over the last few years, the development of new wearable and processing technologies has accelerated in order to harness physiological data such as electroencephalograms (EEGs) for EEG-based applications. EEG has been demonstrated to be a source of emotion recognition signals with the highest classification accuracy among physiological signals. However, when emotion recognition systems are used for real-time classification, the training unit is frequently left to run offline or in the cloud rather than working locally on the edge. That strategy has hampered research, and the full potential of using an edge AI device has yet to be realized. Edge AI devices are computers with high performance that can process complex algorithms. It is capable of collecting, processing, and storing data on its own. It can also analyze and apply complicated algorithms like localization, detection, and recognition on a real-time application, making it a powerful embedded device. The NVIDIA Jetson series, specifically the Jetson Nano device, was used in the implementation. The cEEGrid, which is integrated to the open-source brain computer-interface platform (OpenBCI), is used to collect EEG signals. An EEG-based real-time emotion recognition system on Edge AI is proposed in this paper. To perform graphical spectrogram categorization of EEG signals and to predict emotional states based on input data properties, machine learning-based classifiers were used. Until the emotional state was identified, the EEG signals were analyzed using the K-Nearest Neighbor (KNN) technique, which is a supervised learning system. In EEG signal processing, after each EEG signal has been received in real-time and translated from time to frequency domain, the Fast Fourier Transform (FFT) technique is utilized to observe the frequency bands in each EEG signal. To appropriately show the variance of each EEG frequency band, power density, standard deviation, and mean are calculated and employed. The next stage is to identify the features that have been chosen to predict emotion in EEG data using the K-Nearest Neighbors (KNN) technique. Arousal and valence datasets are used to train the parameters defined by the KNN technique.Because classification and recognition of specific classes, as well as emotion prediction, are conducted both online and locally on the edge, the KNN technique increased the performance of the emotion recognition system on the NVIDIA Jetson Nano. Finally, this implementation aims to bridge the research gap on cost-effective and efficient real-time emotion recognition using a resource constrained hardware device, like the NVIDIA Jetson Nano. On the cutting edge of AI, EEG-based emotion identification can be employed in applications that can rapidly expand the research and implementation industry's use.

Keywords: edge AI device, EEG, emotion recognition system, supervised learning algorithm, sensors

Procedia PDF Downloads 104

2 Categorical Metadata Encoding Schemes for Arteriovenous Fistula Blood Flow Sound Classification: Scaling Numerical Representations Leads to Improved Performance

Authors: George Zhou, Yunchan Chen, Candace Chien

Abstract:

Kidney replacement therapy is the current standard of care for end-stage renal diseases. In-center or home hemodialysis remains an integral component of the therapeutic regimen. Arteriovenous fistulas (AVF) make up the vascular circuit through which blood is filtered and returned. Naturally, AVF patency determines whether adequate clearance and filtration can be achieved and directly influences clinical outcomes. Our aim was to build a deep learning model for automated AVF stenosis screening based on the sound of blood flow through the AVF. A total of 311 patients with AVF were enrolled in this study. Blood flow sounds were collected using a digital stethoscope. For each patient, blood flow sounds were collected at 6 different locations along the patient’s AVF. The 6 locations are artery, anastomosis, distal vein, middle vein, proximal vein, and venous arch. A total of 1866 sounds were collected. The blood flow sounds are labeled as “patent” (normal) or “stenotic” (abnormal). The labels are validated from concurrent ultrasound. Our dataset included 1527 “patent” and 339 “stenotic” sounds. We show that blood flow sounds vary significantly along the AVF. For example, the blood flow sound is loudest at the anastomosis site and softest at the cephalic arch. Contextualizing the sound with location metadata significantly improves classification performance. How to encode and incorporate categorical metadata is an active area of research1. Herein, we study ordinal (i.e., integer) encoding schemes. The numerical representation is concatenated to the flattened feature vector. We train a vision transformer (ViT) on spectrogram image representations of the sound and demonstrate that using scalar multiples of our integer encodings improves classification performance. Models are evaluated using a 10-fold cross-validation procedure. The baseline performance of our ViT without any location metadata achieves an AuROC and AuPRC of 0.68 ± 0.05 and 0.28 ± 0.09, respectively. Using the following encodings of Artery:0; Arch: 1; Proximal: 2; Middle: 3; Distal 4: Anastomosis: 5, the ViT achieves an AuROC and AuPRC of 0.69 ± 0.06 and 0.30 ± 0.10, respectively. Using the following encodings of Artery:0; Arch: 10; Proximal: 20; Middle: 30; Distal 40: Anastomosis: 50, the ViT achieves an AuROC and AuPRC of 0.74 ± 0.06 and 0.38 ± 0.10, respectively. Using the following encodings of Artery:0; Arch: 100; Proximal: 200; Middle: 300; Distal 400: Anastomosis: 500, the ViT achieves an AuROC and AuPRC of 0.78 ± 0.06 and 0.43 ± 0.11. respectively. Interestingly, we see that using increasing scalar multiples of our integer encoding scheme (i.e., encoding “venous arch” as 1,10,100) results in progressively improved performance. In theory, the integer values do not matter since we are optimizing the same loss function; the model can learn to increase or decrease the weights associated with location encodings and converge on the same solution. However, in the setting of limited data and computation resources, increasing the importance at initialization either leads to faster convergence or helps the model escape a local minimum.

Keywords: arteriovenous fistula, blood flow sounds, metadata encoding, deep learning

Procedia PDF Downloads 86

1 Analysis of Vibration and Shock Levels during Transport and Handling of Bananas within the Post-Harvest Supply Chain in Australia

Authors: Indika Fernando, Jiangang Fei, Roger Stanley, Hossein Enshaei

Abstract:

Delicate produce such as fresh fruits are increasingly susceptible to physiological damage during the essential post-harvest operations such as transport and handling. Vibration and shock during the distribution are identified factors for produce damage within post-harvest supply chains. Mechanical damages caused during transit may significantly diminish the quality of fresh produce which may also result in a substantial wastage. Bananas are one of the staple fruit crops and the most sold supermarket produce in Australia. It is also the largest horticultural industry in the state of Queensland where 95% of the total production of bananas are cultivated. This results in significantly lengthy interstate supply chains where fruits are exposed to prolonged vibration and shocks. This paper is focused on determining the shock and vibration levels experienced by packaged bananas during transit from the farm gate to the retail market. Tri-axis acceleration data were captured by custom made accelerometer based data loggers which were set to a predetermined sampling rate of 400 Hz. The devices recorded data continuously for 96 Hours in the interstate journey of nearly 3000 Km from the growing fields in far north Queensland to the central distribution centre in Melbourne in Victoria. After the bananas were ripened at the ripening facility in Melbourne, the data loggers were used to capture the transport and handling conditions from the central distribution centre to three retail outlets within the outskirts of Melbourne. The quality of bananas were assessed before and after transport at each location along the supply chain. Time series vibration and shock data were used to determine the frequency and the severity of the transient shocks experienced by the packages. Frequency spectrogram was generated to determine the dominant frequencies within each segment of the post-harvest supply chain. Root Mean Square (RMS) acceleration levels were calculated to characterise the vibration intensity during transport. Data were further analysed by Fast Fourier Transform (FFT) and the Power Spectral Density (PSD) profiles were generated to determine the critical frequency ranges. It revealed the frequency range in which the escalated energy levels were transferred to the packages. It was found that the vertical vibration was the highest and the acceleration levels mostly oscillated between ± 1g during transport. Several shock responses were recorded exceeding this range which were mostly attributed to package handling. These detrimental high impact shocks may eventually lead to mechanical damages in bananas such as impact bruising, compression bruising and neck injuries which affect their freshness and visual quality. It was revealed that the frequency range between 0-5 Hz and 15-20 Hz exert an escalated level of vibration energy to the packaged bananas which may result in abrasion damages such as scuffing, fruit rub and blackened rub. Further research is indicated specially in the identified critical frequency ranges to minimise exposure of fruits to the harmful effects of vibration. Improving the handling conditions and also further study on package failure mechanisms when exposed to transient shock excitation will be crucial to improve the visual quality of bananas within the post-harvest supply chain in Australia.

Keywords: bananas, handling, post-harvest, supply chain, shocks, transport, vibration

Procedia PDF Downloads 190