Search results for: formant
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 16

Search results for: formant

16 Automatic Vowel and Consonant's Target Formant Frequency Detection

Authors: Othmane Bouferroum, Malika Boudraa

Abstract:

In this study, a dual exponential model for CV formant transition is derived from locus theory of speech perception. Then, an algorithm for automatic vowel and consonant’s target formant frequency detection is developed and tested on real speech. The results show that vowels and consonants are detected through transitions rather than their small stable portions. Also, vowel reduction is clearly observed in our data. These results are confirmed by the observations made in perceptual experiments in the literature.

Keywords: acoustic invariance, coarticulation, formant transition, locus equation

Procedia PDF Downloads 236
15 Acoustic Characteristics of Ḫijaiyaḫ Letters Pronunciation by Indonesian Native Speaker

Authors: Romi Hardiyansyah, Raden Sugeng Joko Sarwono, Agus Samsi

Abstract:

Indonesian people have a mother language but not Arabic. Meanwhile, they must be able to pronounce the Arabic because Islam is the biggest religion in Indonesia. Arabic is composed by ḫijaiyaḫ letters which has its own pronunciation. Sound production process in humans can be divided into three physiological processes, namely: the formation of airflow from the lungs, the change in airflow from the lungs into the sound, and articulation (the modulation/sound setting into a specific sound). Ḫijaiyaḫ letters has its own articulation, some of which seem strange for most people in Indonesia. Those letters come out from the middle and upper throat so that the letters has its own acoustic characteristics. Acoustic characteristics of voice can be observed by source-filter approach that has parameters: pitch, formant, and formant bandwidth. Pitch is the basic tone in every human being. Formant is the resonance frequency of the human voice. Formant bandwidth is the time-width of a formant. After recording the sound from 21 subjects, data is processed by software Praat version 5.3.39. The analysis showed that each pronunciation, syakal (vowel changer), and the place of discharge letters has the same timbre which are determined by third and fourth formant.

Keywords: ḫijaiyaḫ, articulation, pitch, formant, formant bandwidth, timbre

Procedia PDF Downloads 367
14 Numerical Calculation and Analysis of Fine Echo Characteristics of Underwater Hemispherical Cylindrical Shell

Authors: Hongjian Jia

Abstract:

A finite-length cylindrical shell with a spherical cap is a typical engineering approximation model of actual underwater targets. The research on the omni-directional acoustic scattering characteristics of this target model can provide a favorable basis for the detection and identification of actual underwater targets. The elastic resonance characteristics of the target are the results of the comprehensive effect of the target length, shell-thickness ratio and materials. Under the conditions of different materials and geometric dimensions, the coincidence resonance characteristics of the target have obvious differences. Aiming at this problem, this paper obtains the omni-directional acoustic scattering field of the underwater hemispherical cylindrical shell by numerical calculation and studies the influence of target geometric parameters (length, shell-thickness ratio) and material parameters on the coincidence resonance characteristics of the target in turn. The study found that the formant interval is not a stable value and changes with the incident angle. Among them, the formant interval is less affected by the target length and shell-thickness ratio and is significantly affected by the material properties, which is an effective feature for classifying and identifying targets of different materials. The quadratic polynomial is utilized to fully fit the change relationship between the formant interval and the angle. The results show that the three fitting coefficients of the stainless steel and aluminum targets are significantly different, which can be used as an effective feature parameter to characterize the target materials.

Keywords: hemispherical cylindrical shell;, fine echo characteristics;, geometric and material parameters;, formant interval

Procedia PDF Downloads 69
13 The Acoustic Features of Ulu Terengganu Malay Monophthongs

Authors: Siti Nadiah Nuwawi, Roshidah Hassan

Abstract:

Dialect is one of the language variants emerge due to certain factors. One of the distinctive dialects spoken by people in Malaysia is the one spoken by those who reside in the inland area of the East Peninsular Malaysia; Hulu Terengganu, which is known as Ulu Terengganu Malay dialect. This dialect is unique since it possesses ancient elements in its phonology elements, which makes it is hard to be understood by people who come from other states. There is dearth of acoustic studies of the dialect in which this paper aims to attain by describing the quality of the monophthongs found in the dialect instrumentally based on their first and second formant values. The hertz values are observed and recorded from the waveforms and spectrograms depicted in PRAAT version 6.0.43 software. The findings show that Ulu Terengganu Malay speakers produced ten monophthongs namely /ɛ/, /e/, /a/, /ɐ/, /ɞ/, /ɔ/, /i/, /o/, /ɵ/ and /ɘ/ which applauds a few monophthongs suggested by past researchers which were based on auditory impression namely /ɛ/, /e/, /a/, ɔ/, and /i/. It also discovers the other five monophthongs of the dialect which are unknown before namely /ɐ/, /ɞ/, /o/, /ɵ/ and /ɘ/.

Keywords: acoustic analysis, dialect, formant values, monophthongs, Ulu Terengganu Malay

Procedia PDF Downloads 144
12 Comparing Sounds of the Singing Voice

Authors: Christel Elisabeth Bonin

Abstract:

This experiment aims at showing that classical singing and belting have both different singing qualities, but singing with a speaking voice has no singing quality. For this purpose, a singing female voice was recorded on four different tone pitches, singing the vowel ‘a’ by using 3 different kinds of singing - classical trained voice, belting voice and speaking voice. The recordings have been entered in the Software Praat. Then the formants of each recorded tone were compared to each other and put in relationship to the singer’s formant. The visible results are taken as an indicator of comparable sound qualities of a classical trained female voice and a belting female voice concerning the concentration of overtones in F1 to F5 and a lack of sound quality in the speaking voice for singing purpose. The results also show that classical singing and belting are both valuable vocal techniques for singing due to their richness of overtones and that belting is not comparable to shouting or screaming. Singing with a speaking voice in contrast should not be called singing due to the lack of overtones which means by definition that there is no musical tone.

Keywords: formants, overtone, singer’s formant, singing voice, belting, classical singing, singing with the speaking voice

Procedia PDF Downloads 301
11 Motor Speech Profile of Marathi Speaking Adults and Children

Authors: Anindita Banik, Anjali Kant, Aninda Duti Banik, Arun Banik

Abstract:

Speech is a complex, dynamic unique motor activity through which we express thoughts and emotions and respond to and control our environment. The aim was based to compare select Motor Speech parameters and their sub parameters across typical Marathi speaking adults and children. The subjects included a total of 300 divided into Group I, II, III including males and females. Subjects included were reported of no significant medical history and had a rating of 0-1 on GRBAS scale. The recordings were obtained utilizing three stimuli for the acoustic analysis of Diadochokinetic rate (DDK), Second Formant Transition, Voice and Tremor and its sub parameters. And these aforementioned parameters were acoustically analyzed in Motor Speech Profile software in VisiPitch IV. The statistical analyses were done by applying descriptive statistics and Two- Way ANOVA.The results obtained showed statistically significant difference across age groups and gender for the aforementioned parameters and its sub parameters.In DDK, for avp (ms) there was a significant difference only across age groups. However, for avr (/s) there was a significant difference across age groups and gender. It was observed that there was an increase in rate with an increase in age groups. The second formant transition sub parameter F2 magn (Hz) also showed a statistically significant difference across both age groups and gender. There was an increase in mean value with an increase in age. Females had a higher mean when compared to males. For F2 rate (/s) a statistically significant difference was observed across age groups. There was an increase in mean value with increase in age. It was observed for Voice and Tremor MFTR (%) that a statistically significant difference was present across age groups and gender. Also for RATR (Hz) there was statistically significant difference across both age groups and gender. In other words, the values of MFTR and RATR increased with an increase in age. Thus, this study highlights the variation of the motor speech parameters amongst the typical population which would be beneficial for comparison with the individuals with motor speech disorders for assessment and management.

Keywords: adult, children, diadochokinetic rate, second formant transition, tremor, voice

Procedia PDF Downloads 275
10 Imprecise Vowel Articulation in Down Syndrome: An Acoustic Study

Authors: Anitha Naittee Abraham, N. Sreedevi

Abstract:

Individuals with Down syndrome (DS) have relatively better expressive language compared to other individuals with intellectual disabilities. Reduced speech intelligibility is one of the major concerns of this group of individuals due to their anatomical and physiological differences. The study investigated the vowel articulation of Malayalam speaking children with DS in the age range of 5-10 years. The vowel production of 10 children with DS was compared with typically developing children in the same age range. Vowels were extracted from 3 words with the corner vowels /a/, /i/ and /u/ in the word-initial position, using Praat (version 5.3.23) software. Acoustic analysis was based on vowel space area (VSA), Formant centralization ration (FCR) and F2i/F2u. The findings revealed increased formant values for the control group except for F2a and F2u. Also, the experimental group had higher FCR, lower VSA, and F2i/F2u values suggestive of imprecise vowel articulation due to restricted tongue movements. The results of the independent t-test revealed a significant difference in F1a, F2i, F2u, VSA, FCR and F2i/F2u values between the experimental and control group. These findings support the fact that children with DS have imprecise vowel articulation that interferes with the overall speech intelligibility. Hence it is essential to target the oromotor skills to enhance the speech intelligibility which in turn benefit in the social and vocational domains of these individuals.

Keywords: Down syndrome, FCR, vowel articulation, vowel space

Procedia PDF Downloads 149
9 Interaction between Breathiness and Nasality: An Acoustic Analysis

Authors: Pamir Gogoi, Ratree Wayland

Abstract:

This study investigates the acoustic measures of breathiness when coarticulated with nasality. The acoustic correlates of breathiness and nasality that has already been well established after years of empirical research. Some of these acoustic parameters - like low frequency peaks and wider bandwidths- are common for both nasal and breathy voice. Therefore, it is likely that these parameters interact when a sound is coarticulated with breathiness and nasality. This leads to the hypothesis that the acoustic parameters, which usually act as robust cues in differentiating between breathy and modal voice, might not be reliable cues for differentiating between breathy and modal voice when breathiness is coarticulated with nasality. The effect of nasality on the perception of breathiness has been explored in earlier studies using synthesized speech. The results showed that perceptually, nasality and breathiness do interact. The current study investigates if a similar pattern is observed in natural speech. The study is conducted on Marathi, an Indo-Aryan language which has a three-way contrast between nasality and breathiness. That is, there is a phonemic distinction between nasals, breathy voice and breathy-nasals. Voice quality parameters like – H1-H2 (Difference between the amplitude of first and second harmonic), H1-A3 (Difference between the amplitude of first harmonic and third formant, CPP (Cepstral Peak Prominence), HNR (Harmonics to Noise ratio) and B1 (Bandwidth of first formant) were extracted. Statistical models like linear mixed effects regression and Random Forest classifiers show that measures that capture the noise component in the signal- like CPP and HNR- can classify breathy voice from modal voice better than spectral measures when breathy voice is coarticulated with nasality.

Keywords: breathiness, marathi, nasality, voice quality

Procedia PDF Downloads 58
8 A Comparative Study on Vowel Articulation in Malayalam Speaking Children Using Cochlear Implant

Authors: Deepthy Ann Joy, N. Sreedevi

Abstract:

Hearing impairment (HI) at an early age, identified before the onset of language development can reduce the negative effect on speech and language development of children. Early rehabilitation is very important in the improvement of speech production in children with HI. Other than conventional hearing aids, Cochlear Implants are being used in the rehabilitation of children with HI. However, delay in acquisition of speech and language milestones persist in children with Cochlear Implant (CI). Delay in speech milestones are reflected through speech sound errors. These errors reflect the temporal and spectral characteristics of speech. Hence, acoustical analysis of the speech sounds will provide a better representation of speech production skills in children with CI. The present study aimed at investigating the acoustic characteristics of vowels in Malayalam speaking children with a cochlear implant. The participants of the study consisted of 20 Malayalam speaking children in the age range of four and seven years. The experimental group consisted of 10 children with CI, and the control group consisted of 10 typically developing children. Acoustic analysis was carried out for 5 short (/a/, /i/, /u/, /e/, /o/) and 5 long vowels (/a:/, /i:/, /u:/, /e:/, /o:/) in word-initial position. The responses were recorded and analyzed for acoustic parameters such as Vowel duration, Ratio of the duration of a short and long vowel, Formant frequencies (F₁ and F₂) and Formant Centralization Ratio (FCR) computed using the formula (F₂u+F₂a+F₁i+F₁u)/(F₂i+F₁a). Findings of the present study indicated that the values for vowel duration were higher in experimental group compared to the control group for all the vowels except for /u/. Ratio of duration of short and long vowel was also found to be higher in experimental group compared to control group except for /i/. Further F₁ for all vowels was found to be higher in experimental group with variability noticed in F₂ values. FCR was found be higher in experimental group, indicating vowel centralization. Further, the results of independent t-test revealed no significant difference across the parameters in both the groups. It was found that the spectral and temporal measures in children with CI moved towards normal range. The result emphasizes the significance of early rehabilitation in children with hearing impairment. The role of rehabilitation related aspects are also discussed in detail which can be clinically incorporated for the betterment of speech therapeutic services in children with CI.

Keywords: acoustics, cochlear implant, Malayalam, vowels

Procedia PDF Downloads 117
7 Effect of Helium and Sulfur Hexafluoride Gas Inhalation on Voice Resonances

Authors: Pallavi Marathe

Abstract:

Voice is considered to be a unique biometric property of human beings. Unlike other biometric evidence, for example, fingerprints and retina scans, etc., voice can be easily changed or mimicked. The present paper talks about how the inhalation of helium and sulfur hexafluoride (SF6) gas affects the voice formant frequencies that are the resonant frequencies of the vocal tract. Helium gas is low-density gas; hence, the voice travels with a higher speed than that of air. On the other side in SF6 gas voice travels with lower speed than that of air due to its higher density. These results in decreasing the resonant frequencies of voice in helium and increasing in SF6. Results are presented with the help of Praat software, which is used for voice analysis.

Keywords: voice formants, helium, sulfur hexafluoride, gas inhalation

Procedia PDF Downloads 93
6 The Impact of Speech Style on the Production of Spanish Vowels by Spanish-English Bilinguals and Spanish Monolinguals

Authors: Vivian Franco

Abstract:

There has been a great deal of research about vowel production of second language learners of Spanish, vowel variation across Spanish dialects, and more recently, research related to Spanish heritage speakers’ vowel production based on speech style. However, there is little investigation reported on Spanish heritage speakers’ vowel production in regard to task modality by incorporating own comparison groups of monolinguals and late bilinguals. Thus, the present study investigates the influence of speech style on Spanish heritage speakers’ vowel production by comparing Spanish-English early and late bilinguals and Spanish monolinguals. The study was guided by the following research question: How do early bilinguals (heritage speakers) differ/relate to advanced L2 speakers of Spanish (late bilinguals) and Spanish monolinguals in their vowel quality (acoustic distribution) and quantity (duration) based on speech style? The participants were a total of 11 speakers of Spanish: 7 early Spanish-English bilinguals with a similar linguistic background (simultaneous bilinguals of the second generation); 2 advanced L2 speakers of Spanish; and 2 Spanish monolinguals from Mexico. The study consisted of two tasks. The first one adopted a semi-spontaneous style by a solicited narration of life experiences and a description of a favorite movie with the purpose to collect spontaneous speech. The second task was a reading activity in which the participants read two paragraphs of a Mexican literary essay 'La nuez.' This task aimed to obtain a more controlled speech style. From this study, it can be concluded that early bilinguals and monolinguals show a smaller formant vowel space overall compared to the late bilinguals in both speech styles. In terms of formant values by stress, the early bilinguals and the late bilinguals resembled in the semi-spontaneous speech style as their unstressed vowel space overlapped with that of the unstressed vowels different from the monolinguals who displayed a slightly reduced unstressed vowel space. For the controlled data, the early bilinguals were similar to the monolinguals as their stressed and unstressed vowel spaces overlapped in comparison to the late bilinguals who showed a more clear reduction of unstressed vowel space. In regard to stress, the monolinguals revealed longer vowel duration overall. However, findings of duration by stress showed that the early bilinguals and the monolinguals remained stable with shorter values of unstressed vowels in the semi-spontaneous data and longer duration in the controlled data when compared to the late bilinguals who displayed opposite results. These findings suggest an implication for Spanish heritage speakers and L2 Spanish vowels research as it has been frequently argued that Spanish bilinguals differ from the Spanish monolinguals by their vowel reduction and centralized vowel space influenced by English. However, some Spanish varieties are characterized by vowel reduction especially in certain phonetic contexts so that some vowels present more weakening than others. Consequently, it would not be conclusive to affirm an English influence on the Spanish of these bilinguals.

Keywords: Spanish-English bilinguals, Spanish monolinguals, spontaneous and controlled speech, vowel production.

Procedia PDF Downloads 104
5 Still Pictures for Learning Foreign Language Sounds

Authors: Kaoru Tomita

Abstract:

This study explores how visual information helps us to learn foreign language pronunciation. Visual assistance and its effect for learning foreign language have been discussed widely. For example, simplified illustrations in textbooks are used for telling learners which part of the articulation organs are used for pronouncing sounds. Vowels are put into a chart that depicts a vowel space. Consonants are put into a table that contains two axes of place and manner of articulation. When comparing a still picture and a moving picture for visualizing learners’ pronunciation, it becomes clear that the former works better than the latter. The visualization of vowels was applied to class activities in which native and non-native speakers’ English was compared and the learners’ feedback was collected: the positions of six vowels did not scatter as much as they were expected to do. Specifically, two vowels were not discriminated and were arranged very close in the vowel space. It was surprising for the author to find that learners liked analyzing their own pronunciation by linking formant ones and twos on a sheet of paper with a pencil. Even a simple method works well if it leads learners to think about their pronunciation analytically.

Keywords: feedback, pronunciation, visualization, vowel

Procedia PDF Downloads 217
4 Pharyngealization Spread in Ibbi Dialect of Yemeni Arabic: An Acoustic Study

Authors: Fadhl Qutaish

Abstract:

This paper examines the pharyngealization spread in one of the Yemeni Arabic dialects, namely, Ibbi Arabic (IA). It investigates how pharyngealized sounds spread their acoustic features onto the neighboring vowels and change their default features. This feature has been investigated quietly well in MSA but still has to be deeply studied in the different dialect of Arabic which will bring about a clearer picture of the similarities and the differences among these dialects and help in mapping them based on the way this feature is utilized. Though the studies are numerous, no one of them has illustrated how far in the multi-syllabic word the spread can be and whether it takes a steady or gradient manner. This study tries to fill this gap and give a satisfactory explanation of the pharyngealization spread in Ibbi Dialect. This study is the first step towards a larger investigation of the different dialects of Yemeni Arabic in the future. The data recorded are represented in minimal pairs in which the trigger (pharyngealized or the non-pharyngealized sound) is in the initial or final position of monosyllabic and multisyllabic words. A group of 24 words were divided into four groups and repeated three times by three subjects which will yield 216 tokens that are tested and analyzed. The subjects are three male speakers aged between 28 and 31 with no history of neurological, speaking or hearing problems. All of them are bilingual speakers of Arabic and English and native speakers of Ibbi-Dialect. Recordings were done in a sound-proof room and praat software was used for the analysis and coding of the trajectories of F1 and F2 for the low vowel /a/ to see the effect of pharyngealization on the formant trajectory within the same syllable and in other syllables of the same word by comparing the F1 and F2 formants to the non-pharyngealized environment. The results show that pharyngealization spread is gradient (progressively and regressively). The spread is reflected in the gradual raising of F1 as we move closer towards the trigger and the gradual lowering of F2 as well. The results of the F1 mean values in tri-syllabic words when the trigger is word initially show that there is a raise of 37.9 HZ in the first syllable, 26.8HZ in the second syllable and 14.2HZ in the third syllable. F2 mean values undergo a lowering of 239 HZ in the first syllable, 211.7 HZ in the second syllable and 176.5 in the third syllable. This gradual decrease in the difference of F2 values in the non-pharyngealized and pharyngealized context illustrates that the spread is gradient. A similar result was found when the trigger is word-final which proves that the spread is gradient (progressively and regressively.

Keywords: pharyngealization, Yemeni Arabic, Ibbi dialect, pharyngealization spread

Procedia PDF Downloads 194
3 Duration of Isolated Vowels in Infants with Cochlear Implants

Authors: Paris Binos

Abstract:

The present work investigates developmental aspects of the duration of isolated vowels in infants with normal hearing compared to those who received cochlear implants (CIs) before two years of age. Infants with normal hearing produced shorter vowel duration since this find related with more mature production abilities. First isolated vowels are transparent during the protophonic stage as evidence of an increased motor and linguistic control. Vowel duration is a crucial factor for the transition of prelexical speech to normal adult speech. Despite current knowledge of data for infants with normal hearing more research is needed to unravel productions skills in early implanted children. Thus, isolated vowel productions by two congenitally hearing-impaired Greek infants (implantation ages 1:4-1:11; post-implant ages 0:6-1:3) were recorded and sampled for six months after implantation with a Nucleus-24. The results compared with the productions of three normal hearing infants (chronological ages 0:8-1:1). Vegetative data and vocalizations masked by external noise or sounds were excluded. Participants had no other disabilities and had unknown deafness etiology. Prior to implantation the infants had an average unaided hearing loss of 95-110 dB HL while the post-implantation PTA decreased to 10-38 dB HL. The current research offers a methodology for the processing of the prelinguistic productions based on a combination of acoustical and auditory analyses. Based on the current methodological framework, duration measured through spectrograms based on wideband analysis, from the voicing onset to the end of the vowel. The end marked by two co-occurring events: 1) The onset of aperiodicity with a rapid change in amplitude in the waveform and 2) a loss in formant’s energy. Cut-off levels of significance were set at 0.05 for all tests. Bonferroni post hoc tests indicated that difference was significant between the mean duration of vowels of infants wearing CIs and their normal hearing peers. Thus, the mean vowel duration of CIs measured longer compared to the normal hearing peers (0.000). The current longitudinal findings contribute to the existing data for the performance of children wearing CIs at a very young age and enrich also the data of the Greek language. The above described weakness for CI’s performance is a challenge for future work in speech processing and CI’s processing strategies.

Keywords: cochlear implant, duration, spectrogram, vowel

Procedia PDF Downloads 232
2 Recursion, Merge and Event Sequence: A Bio-Mathematical Perspective

Authors: Noury Bakrim

Abstract:

Formalization is indeed a foundational Mathematical Linguistics as demonstrated by the pioneering works. While dialoguing with this frame, we nonetheless propone, in our approach of language as a real object, a mathematical linguistics/biosemiotics defined as a dialectical synthesis between induction and computational deduction. Therefore, relying on the parametric interaction of cycles, rules, and features giving way to a sub-hypothetic biological point of view, we first hypothesize a factorial equation as an explanatory principle within Category Mathematics of the Ergobrain: our computation proposal of Universal Grammar rules per cycle or a scalar determination (multiplying right/left columns of the determinant matrix and right/left columns of the logarithmic matrix) of the transformable matrix for rule addition/deletion and cycles within representational mapping/cycle heredity basing on the factorial example, being the logarithmic exponent or power of rule deletion/addition. It enables us to propone an extension of minimalist merge/label notions to a Language Merge (as a computing principle) within cycle recursion relying on combinatorial mapping of rules hierarchies on external Entax of the Event Sequence. Therefore, to define combinatorial maps as language merge of features and combinatorial hierarchical restrictions (governing, commanding, and other rules), we secondly hypothesize from our results feature/hierarchy exponentiation on graph representation deriving from Gromov's Symbolic Dynamics where combinatorial vertices from Fe are set to combinatorial vertices of Hie and edges from Fe to Hie such as for all combinatorial group, there are restriction maps representing different derivational levels that are subgraphs: the intersection on I defines pullbacks and deletion rules (under restriction maps) then under disjunction edges H such that for the combinatorial map P belonging to Hie exponentiation by intersection there are pullbacks and projections that are equal to restriction maps RM₁ and RM₂. The model will draw on experimental biomathematics as well as structural frames with focus on Amazigh and English (cases from phonology/micro-semantics, Syntax) shift from Structure to event (especially Amazigh formant principle resolving its morphological heterogeneity).

Keywords: rule/cycle addition/deletion, bio-mathematical methodology, general merge calculation, feature exponentiation, combinatorial maps, event sequence

Procedia PDF Downloads 98
1 Italian Speech Vowels Landmark Detection through the Legacy Tool 'xkl' with Integration of Combined CNNs and RNNs

Authors: Kaleem Kashif, Tayyaba Anam, Yizhi Wu

Abstract:

This paper introduces a methodology for advancing Italian speech vowels landmark detection within the distinctive feature-based speech recognition domain. Leveraging the legacy tool 'xkl' by integrating combined convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the study presents a comprehensive enhancement to the 'xkl' legacy software. This integration incorporates re-assigned spectrogram methodologies, enabling meticulous acoustic analysis. Simultaneously, our proposed model, integrating combined CNNs and RNNs, demonstrates unprecedented precision and robustness in landmark detection. The augmentation of re-assigned spectrogram fusion within the 'xkl' software signifies a meticulous advancement, particularly enhancing precision related to vowel formant estimation. This augmentation catalyzes unparalleled accuracy in landmark detection, resulting in a substantial performance leap compared to conventional methods. The proposed model emerges as a state-of-the-art solution in the distinctive feature-based speech recognition systems domain. In the realm of deep learning, a synergistic integration of combined CNNs and RNNs is introduced, endowed with specialized temporal embeddings, harnessing self-attention mechanisms, and positional embeddings. The proposed model allows it to excel in capturing intricate dependencies within Italian speech vowels, rendering it highly adaptable and sophisticated in the distinctive feature domain. Furthermore, our advanced temporal modeling approach employs Bayesian temporal encoding, refining the measurement of inter-landmark intervals. Comparative analysis against state-of-the-art models reveals a substantial improvement in accuracy, highlighting the robustness and efficacy of the proposed methodology. Upon rigorous testing on a database (LaMIT) speech recorded in a silent room by four Italian native speakers, the landmark detector demonstrates exceptional performance, achieving a 95% true detection rate and a 10% false detection rate. A majority of missed landmarks were observed in proximity to reduced vowels. These promising results underscore the robust identifiability of landmarks within the speech waveform, establishing the feasibility of employing a landmark detector as a front end in a speech recognition system. The synergistic integration of re-assigned spectrogram fusion, CNNs, RNNs, and Bayesian temporal encoding not only signifies a significant advancement in Italian speech vowels landmark detection but also positions the proposed model as a leader in the field. The model offers distinct advantages, including unparalleled accuracy, adaptability, and sophistication, marking a milestone in the intersection of deep learning and distinctive feature-based speech recognition. This work contributes to the broader scientific community by presenting a methodologically rigorous framework for enhancing landmark detection accuracy in Italian speech vowels. The integration of cutting-edge techniques establishes a foundation for future advancements in speech signal processing, emphasizing the potential of the proposed model in practical applications across various domains requiring robust speech recognition systems.

Keywords: landmark detection, acoustic analysis, convolutional neural network, recurrent neural network

Procedia PDF Downloads 18