Search results for: speech enhancement
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2053

1873 Image Enhancement of Histological Slides by Using Nonlinear Transfer Function

Authors: D. Suman, B. Nikitha, J. Sarvani, V. Archana

Abstract:

Histological slides have long provided clinical diagnostic information about subjects. Even with the advent of high-resolution imaging cameras, the images tend to have some background noise, which complicates the analysis. This study examines histological slides using an image enhancement method based on a nonlinear transfer function. The method processes the raw color images acquired from a biological microscope, which are generally affected by background noise. The images usually appear blurred and do not convey the intended information. To address this, an enhancement method is proposed and applied to 50 histological slides of human tissue. The histological image is first converted into an HSV color image. Only the luminance value (V component) is enhanced, because changes in the H and S components could alter the color balance between the HSV components. The HSV image is divided into smaller blocks for carrying out dynamic range compression by using a linear transformation function. Each pixel in the block is enhanced based on the contrast between the center pixel and its neighborhood. After the V component is processed, the HSV image is transformed back into a color image. The study showed an improvement in the characteristics of the images, so that the significant details of the histological images were enhanced.
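The V-only enhancement described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual transfer function or the block-wise contrast step, so the power-law curve and the `gamma` parameter below are assumptions chosen purely for illustration; the function names are hypothetical as well. The sketch only shows the general idea of converting to HSV, reshaping V, and converting back while leaving H and S untouched.

```python
import colorsys


def enhance_pixel(rgb, gamma=0.6):
    """Enhance one RGB pixel (floats in [0, 1]) by reshaping only V in HSV.

    The power-law curve is an assumed stand-in for the paper's nonlinear
    transfer function, which the abstract does not specify.
    """
    r, g, b = rgb
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    v_out = v ** gamma  # gamma < 1 lifts dark values more than bright ones
    # H and S are passed through unchanged so the color balance is preserved.
    return colorsys.hsv_to_rgb(h, s, v_out)


def enhance_image(pixels, gamma=0.6):
    """Apply enhance_pixel to a 2-D list of RGB tuples."""
    return [[enhance_pixel(p, gamma) for p in row] for row in pixels]


if __name__ == "__main__":
    dark_patch = [[(0.2, 0.1, 0.1), (0.05, 0.05, 0.1)]]
    print(enhance_image(dark_patch))
```

Because only V changes, the hue and saturation of each pixel are the same before and after, which matches the abstract's stated reason for leaving H and S alone.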

Keywords: HSV space, histology, enhancement, image

Procedia PDF Downloads 304
1872 Color Image Enhancement Using Multiscale Retinex and Image Fusion Techniques

Authors: Chang-Hsing Lee, Cheng-Chang Lien, Chin-Chuan Han

Abstract:

In this paper, an edge-strength guided multiscale retinex (EGMSR) approach is proposed for color image contrast enhancement. In EGMSR, the pixel-dependent weight associated with each pixel in the single-scale retinex output image is computed according to the edge strength around that pixel, in order to avoid over-enhancing the noise contained in smooth dark/bright regions. Further, by fusing the enhanced results of EGMSR and adaptive multiscale retinex (AMSR), we obtain a natural fused image with high contrast and proper tonal rendition. Experimental results on several low-contrast images show that the proposed approach can produce natural and appealing enhanced images.

Keywords: image enhancement, multiscale retinex, image fusion, EGMSR

Procedia PDF Downloads 427
1871 Influence of Auditory Visual Information in Speech Perception in Children with Normal Hearing and Cochlear Implant

Authors: Sachin, Shantanu Arya, Gunjan Mehta, Md. Shamim Ansari

Abstract:

The cross-modal influence of visual information on speech perception can be illustrated by the McGurk effect, an illusion in which a listener hears the syllable /ta/ when listening to one syllable, e.g., /pa/, while watching a synchronized video recording of another syllable, /ka/. The McGurk effect is an excellent tool for investigating multisensory integration in speech perception in both normal-hearing and hearing-impaired populations. As the visual cue is unaffected by noise, individuals with hearing impairment rely more than normal listeners on visual cues. However, when non-congruent visual and auditory cues are processed together, audiovisual interaction seems to occur differently in persons with normal hearing and persons with hearing impairment. Therefore, this study aims to observe the audiovisual interaction in speech perception in cochlear implant users and to compare it with that of normal-hearing children. Auditory stimuli were routed through a calibrated clinical audiometer in the sound-field condition, and visual stimuli were presented on a laptop screen placed at a distance of 1 m at 0 degrees azimuth. Out of 4 presentations, if 3 responses were a fusion, then the McGurk effect was considered to be present. The congruent audiovisual stimuli /pa/ /pa/ and /ka/ /ka/ were perceived correctly as “pa” and “ka,” respectively, by both groups. For the non-congruent stimuli /da/ /pa/, 23 children out of 35 with normal hearing and 9 children out of 35 with cochlear implants had a fusion of sounds, i.e., the McGurk effect was present. For the non-congruent stimulus /pa/ /ka/, 25 children out of 35 with normal hearing and 8 children out of 35 with cochlear implants had a fusion of sounds. The children who had used cochlear implants for less than three years did not exhibit fusion of sounds, i.e., the McGurk effect was absent in this group of children.
To conclude, the results demonstrate that consistent fusion of visual with auditory information for speech perception is shaped by experience with bimodal spoken language during early life. When auditory experience with speech is mediated by cochlear implant, the likelihood of acquiring bimodal fusion is increased and it greatly depends on the age of implantation. All the above results strongly support the need for screening children for hearing capabilities and providing cochlear implants and aural rehabilitation as early as possible.

Keywords: cochlear implant, congruent stimuli, mcgurk effect, non-congruent stimuli

Procedia PDF Downloads 277
1870 Combined Automatic Speech Recognition and Machine Translation in Business Correspondence Domain for English-Croatian

Authors: Sanja Seljan, Ivan Dunđer

Abstract:

The paper presents combined automatic speech recognition (ASR) for English and machine translation (MT) for English and Croatian in the domain of business correspondence. The first part presents results of training the ASR commercial system on two English data sets, enriched by error analysis. The second part presents results of machine translation performed by online tool Google Translate for English and Croatian and Croatian-English language pairs. Human evaluation in terms of usability is conducted and internal consistency calculated by Cronbach's alpha coefficient, enriched by error analysis. Automatic evaluation is performed by WER (Word Error Rate) and PER (Position-independent word Error Rate) metrics, followed by investigation of Pearson’s correlation with human evaluation.
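The two automatic metrics named in this abstract can be sketched as follows. This is a minimal illustration and not the evaluation code used in the paper: WER is computed as the word-level edit distance divided by the reference length, and PER uses one common bag-of-words formulation that ignores word order. The function names and the example sentences are hypothetical, and whitespace tokenization is an assumption.

```python
from collections import Counter


def wer(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def per(reference, hypothesis):
    """Position-independent Error Rate (assumed bag-of-words formulation)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Multiset intersection counts words shared regardless of position.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


if __name__ == "__main__":
    ref = "please send the invoice today"
    hyp = "please the send invoice today"
    print(wer(ref, hyp), per(ref, hyp))
```

For a hypothesis that contains the right words in the wrong order, WER penalizes the reordering while PER does not, which is why the two metrics are often reported together.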

Keywords: automatic machine translation, integrated language technologies, quality evaluation, speech recognition

Procedia PDF Downloads 452
1869 Perceiving Casual Speech: A Gating Experiment with French Listeners of L2 English

Authors: Naouel Zoghlami

Abstract:

Spoken-word recognition involves the simultaneous activation of potential word candidates which compete with each other for final correct recognition. In continuous speech, the activation-competition process gets more complicated due to speech reductions existing at word boundaries. Lexical processing is more difficult in L2 than in L1 because L2 listeners often lack phonetic, lexico-semantic, syntactic, and prosodic knowledge in the target language. In this study, we investigate the on-line lexical segmentation hypotheses that French listeners of L2 English form and then revise as subsequent perceptual evidence is revealed. Our purpose is to shed further light on the processes of L2 spoken-word recognition in context and better understand L2 listening difficulties through a comparison of skilled and unskilled listeners' reactions at the point where their working hypothesis is rejected. We use a variant of the gating experiment in which subjects transcribe an English sentence presented in increments of progressively greater duration. The spoken sentence was “And this amazing athlete has just broken another world record”, chosen mainly because it included common reductions and phonetic features in English, such as elision and assimilation. Our preliminary results show that there is an important difference in the manner in which proficient and less-proficient L2 listeners handle connected speech. Less-proficient listeners delay recognition of words as they wait for lexical and syntactic evidence to appear in the gates. Further statistical analyses are currently being undertaken.

Keywords: gating paradigm, spoken word recognition, online lexical segmentation, L2 listening

Procedia PDF Downloads 441
1868 Epistemological and Ethical Dimensions of Current Concepts of Human Resilience in the Neurosciences

Authors: Norbert W. Paul

Abstract:

For a number of years, scientific interest in human resilience has been rapidly increasing, especially in psychology and, more recently and highly visibly, in neurobiological research. Concepts of resilience are regularly discussed in the light of liminal experiences and existential challenges in human life. Resilience research is providing both explanatory models and strategies to promote or foster human resilience. Surprisingly, these approaches have so far attracted little attention in philosophy in general and in ethics in particular. This is even more astonishing given that the neurosciences as such have been and still are of major interest to philosophy and ethics and even brought about the specialized field of neuroethics, which, however, has so far not been concerned with concepts of resilience. As a result of the little attention given to the topic of resilience, the whole concept has to date been philosophically under-theorized. This abstinence of ethics and philosophy in resilience research is lamentable because resilience as a concept, as well as resilience interventions based on neurobiological findings, undoubtedly pose philosophical, social and ethical questions. In this paper, we will argue that particular notions of resilience cross the sometimes fine line between maintaining a person’s mental health despite the impact of severe psychological or physical adverse events and ethically more debatable discourses of enhancement. While we neither argue for or against enhancement nor re-interpret resilience research and interventions by subsuming them under strategies of psychological and/or neuro-enhancement, we encourage those who see social or ethical problems with enhancement technologies to also take a closer look at resilience and the related neurobiological concepts. We will proceed in three steps. In our first step, we will describe the concept of resilience in general and its neurobiological study in particular.
Here, we will point out some important differences in the way ‘resilience’ is conceptualized and how neurobiological research understands resilience. In what follows we will try to show that a one-sided concept of resilience, as it is often presented in neurobiological research on resilience, does pose social and ethical problems. Secondly, we will identify and explore the social and ethical challenges of (neurobiological) enhancement. In the last and final step of this paper, we will argue that a one-sided reading of resilience can be understood as a latent form of enhancement in transition and poses ethical questions similar to those discussed in relation to other approaches to the biomedical enhancement of humans.

Keywords: resilience, neurosciences, epistemology, bioethics

Procedia PDF Downloads 125
1867 Comparison of Bismuth-Based Nanoparticles as Radiosensitization Agents for Radiotherapy

Authors: Merfat Algethami, Anton Blencowe, Bryce Feltis, Stephen Best, Moshi Geso

Abstract:

Nano-materials with high atomic number atoms have been demonstrated to enhance the effective radiation dose and thus could potentially improve therapeutic efficacy in radiotherapy. The optimal nanoparticulate agents require high X-ray absorption coefficients and low toxicity, and should be cost-effective. The focus of our research is the development of a nanoparticle therapeutic agent that can be used in radiotherapy to provide optimal enhancement of the radiation effects on the target. In this study, we used bismuth (Bi) nanoparticles coated with starch and bismuth sulphide (Bi2S3) nanoparticles coated with polyvinylpyrrolidone (PVP). These NPs are of low toxicity and are among the least expensive heavy metal-based nanoparticles. The aims of this study were to synthesise Bi2S3 and Bi NPs and to examine their cytotoxicity to human lung adenocarcinoma epithelial cells (A549). The dose-enhancing effects of the NPs on A549 cells were examined at both kV and MV energies. The preliminary results revealed that bismuth-based nanoparticles show increased radio-sensitisation of cells, displaying dose enhancement with kV X-ray energies and, to a lesser degree, with MV energies. We also observed that Bi NPs generated a greater dose enhancement effect than Bi2S3 NPs in irradiated A549 cells. The maximum Dose Enhancement Factor (DEF) was obtained in the lower-energy kV range when cells were treated with Bi NPs (1.5), compared to a DEF of 1.2 when cells were treated with Bi2S3 NPs. Less radiation dose enhancement was observed with the high-energy MV beam, again with a higher DEF for Bi NP treatment (1.26) than for Bi2S3 NPs (1.06). The greater dose enhancement was achieved in the kV energy range due to the photoelectric effect, which is the dominant process of X-ray interaction. The effect of Bi NPs on enhancing the X-ray dose was higher due to the higher amount of elemental bismuth present in Bi NPs compared to Bi2S3 NPs.
The results suggest that Bismuth based NPs can be considered as valuable dose enhancing agents when used in clinical applications.

Keywords: A549 lung cancer cells, Bi2S3 nanoparticles, dose enhancement effect, radio-sensitising agents

Procedia PDF Downloads 242
1866 Limiting Freedom of Expression to Fight Radicalization: The 'Silencing' of Terrorists Does Not Always Allow Rights to 'Speak Loudly'

Authors: Arianna Vedaschi

Abstract:

This paper addresses the relationship between freedom of expression, national security and radicalization. Is it still possible to talk about a balance between the first two elements? Or, due to the intrusion of the third, is it more appropriate to consider freedom of expression as “permanently disfigured” by securitarian concerns? In this study, both the legislative and the judicial level are taken into account and the comparative method is employed in order to provide the reader with a complete framework of relevant issues and a workable set of solutions. The analysis moves from the finding according to which the tension between free speech and national security has become a major issue in democratic countries, whose very essence is continuously endangered by the ever-changing and multi-faceted threat of international terrorism. In particular, a change in terrorist groups’ recruiting pattern, attracting more and more people by way of a cutting-edge communicative strategy, often employing sophisticated technology as a radicalization tool, has called on law-makers to modify their approach to dangerous speech. While traditional constitutional and criminal law used to punish speech only if it explicitly and directly incited the commission of a criminal action (“cause-effect” model), so-called glorification offences – punishing mere ideological support for terrorism, often on the web – are becoming commonplace in the comparative scenario. Although this is direct, and even somehow understandable, consequence of the impending terrorist menace, this research shows many problematic issues connected to such a preventive approach. First, from a predominantly theoretical point of view, this trend negatively impacts on the already blurred line between permissible and prohibited speech. Second, from a pragmatic point of view, such legislative tools are not always suitable to keep up with ongoing developments of both terrorist groups and their use of technology. 
In other words, there is a risk that such measures become outdated even before their application. Indeed, it seems hard to still talk about a proper balance: what was previously clearly perceived as a balancing of values (freedom of speech v. public security) has turned, in many cases, into a hierarchy with security at its apex. In light of these findings, this paper concludes that such a complex issue would perhaps be better dealt with through a combination of policies: not only criminalizing ‘terrorist speech,’ which should be relegated to a last resort tool, but acting at an even earlier stage, i.e., trying to prevent dangerous speech itself. This might be done by promoting social cohesion and the inclusion of minorities, so as to reduce the probability of people considering terrorist groups as a “viable option” to deal with the lack of identification within their social contexts.

Keywords: radicalization, free speech, international terrorism, national security

Procedia PDF Downloads 175
1865 Effect of Timing and Contributing Factors for Early Language Intervention in Toddlers with Repaired Cleft Lip and Palate

Authors: Pushpavathi M., Kavya V., Akshatha V.

Abstract:

Introduction: Cleft lip and palate (CLP) is a congenital condition which hinders effectual communication due to associated speech and language difficulties. Expressive language delay (ELD) is a feature seen in this population which is influenced by factors such as type and severity of CLP, age at surgical and linguistic intervention and also the type and intensity of speech and language therapy (SLT). Since CLP is the most common congenital abnormality seen in Indian children, early intervention is a necessity which plays a critical role in enhancing their speech and language skills. The interaction between the timing of intervention and factors which contribute to effective intervention by caregivers is an area which needs to be explored. Objectives: The present study attempts to determine the effect of timing of intervention on the contributing maternal factors for effective linguistic intervention in toddlers with repaired CLP with respect to the awareness, home training patterns, speech and non-speech behaviors of the mothers. Participants: Thirty six toddlers in the age range of 1 to 4 years diagnosed as ELD secondary to repaired CLP, along with their mothers served as participants. Group I (Early Intervention Group, EIG) included 19 mother-child pairs who came to seek SLT soon after corrective surgery and group II (Delayed Intervention Group, DIG) included 16 mother-child pairs who received SLT after the age of 3 years. Further, the groups were divided into group A, and group B. Group ‘A’ received SLT for 60 sessions by Speech Language Pathologist (SLP), while Group B received SLT for 30 sessions by SLP and 30 sessions only by mother without supervision of SLP. Method: The mothers were enrolled for the Early Language Intervention Program and following this, their awareness about CLP was assessed through the Parental awareness questionnaire. The quality of home training was assessed through Mohite’s Inventory. 
Subsequently, the speech and non-speech behaviors of the mothers were assessed using a Mother’s behavioral checklist. Detailed counseling and orientation was done to the mothers, and SLT was initiated for toddlers. After 60 sessions of intensive SLT, the questionnaire and checklists were re-administered to find out the changes in scores between the pre- and posttest measurements. Results: The scores obtained under different domains in the awareness questionnaire, Mohite’s inventory and Mothers behavior checklist were tabulated and subjected to statistical analysis. Since the data did not follow normal distribution (i.e. p > 0.05), Mann-Whitney U test was conducted which revealed that there was no significant difference between groups I and II as well as groups A and B. Further, Wilcoxon Signed Rank test revealed that mothers had better awareness regarding issues related to CLP and improved home-training abilities post-orientation (p ≤ 0.05). A statistically significant difference was also noted for speech and non-speech behaviors of the mothers (p ≤ 0.05). Conclusions: Extensive orientation and counseling helped mothers of both EI and DI groups to improve their knowledge about CLP. Intensive SLT using focused stimulation and a parent-implemented approach enabled them to carry out the intervention in an effectual manner.

Keywords: awareness, cleft lip and palate, early language intervention program, home training, orientation, timing of intervention

Procedia PDF Downloads 100
1864 Clinical Profile of Oral Sensory Abilities in Developmental Dysarthria

Authors: Swapna N., Deepthy Ann Joy

Abstract:

One of the major causes of communication disorders in pediatric population is Motor speech disorders. These disorders which affect the motor aspects of speech articulators can have an adverse effect on the communication abilities of children in their developmental period. The motor aspects are dependent on the sensory abilities of children with motor speech disorders. Hence, oral sensorimotor evaluation is an important component in the assessment of children with motor speech disorders. To our knowledge, the importance of oral motor examination has been well established, yet the sensory assessment of the oral structures has received less focus. One of the most common motor speech disorders seen in children is developmental dysarthria. The present study aimed to assess the orosensory aspects in children with developmental dysarthria (CDD). The control group consisted of 240 children in the age range of four and eight years which was divided into four subgroups (4-4.11, 5-5.11, 6-6.11 and 7-7.11 years). The experimental group consisted of 15 children who were diagnosed with developmental dysarthria secondary to cerebral palsy who belonged in the age range of four and eight years. The oro-sensory aspects such as response to touch, temperature, taste, texture, and orofacial sensitivity were evaluated and profiled. For this purpose, the authors used the ‘Oral Sensorimotor Evaluation Protocol- Children’ which was developed by the authors. The oro-sensory section of the protocol was administered and the clinical profile of oro-sensory abilities of typically developing children and CDD was obtained for each of the sensory abilities. The oro-sensory abilities of speech articulators such as lips, tongue, palate, jaw, and cheeks were assessed in detail and scored. The results indicated that experimental group had poorer scores on oro-sensory aspects such as light static touch, kinetic touch, deep pressure, vibration and double simultaneous touch. 
However, it was also found that the experimental group performed similarly to the control group on a few aspects, such as temperature, taste, texture and orofacial sensitivity. Apart from the oro-motor abilities, which have received the most interest, the variation in the oro-sensory abilities of the experimental and control groups is highlighted and discussed in the present study. This emphasizes the need for assessing oro-sensory abilities in children with developmental dysarthria in addition to oro-motor abilities.

Keywords: cerebral palsy, developmental dysarthria, orosensory assessment, touch

Procedia PDF Downloads 134
1863 Heat Transfer Studies on CNT Nanofluids in a Turbulent Flow Heat Exchanger

Authors: W. Rashmi, M. Khalid, O. Seiksan, R. Saidur, A. F. Ismail

Abstract:

Nanofluids have received much attention since their discovery. They are believed to be promising coolants in heat transfer applications due to their enhanced thermal conductivity and heat transfer characteristics. In this study, the enhancement in heat transfer of CNT nanofluids under turbulent flow conditions is investigated experimentally. The carbon nanotube (CNT) concentration was varied between 0.051 and 0.085 wt%. The nanofluid suspension was stabilized by gum arabic (GA) through a process of homogenisation and sonication. The flow rate of the cold fluid (water) was varied from 1.7 to 3 L/min and the flow rate of the hot fluid was varied between 2 and 3.5 L/min. Thermal conductivity, density and viscosity of the nanofluids were also measured as a function of temperature and CNT concentration. The experimental results are validated against theoretical correlations for turbulent flow available in the literature. Results showed an enhancement in heat transfer in the range of 9-67% as a function of temperature and CNT concentration.

Keywords: nanofluids, carbon nanotubes (CNT), heat transfer enhancement, heat transfer

Procedia PDF Downloads 470
1862 The Impact of Speech Style on the Production of Spanish Vowels by Spanish-English Bilinguals and Spanish Monolinguals

Authors: Vivian Franco

Abstract:

There has been a great deal of research about vowel production of second language learners of Spanish, vowel variation across Spanish dialects, and more recently, research related to Spanish heritage speakers’ vowel production based on speech style. However, there is little investigation reported on Spanish heritage speakers’ vowel production in regard to task modality by incorporating own comparison groups of monolinguals and late bilinguals. Thus, the present study investigates the influence of speech style on Spanish heritage speakers’ vowel production by comparing Spanish-English early and late bilinguals and Spanish monolinguals. The study was guided by the following research question: How do early bilinguals (heritage speakers) differ/relate to advanced L2 speakers of Spanish (late bilinguals) and Spanish monolinguals in their vowel quality (acoustic distribution) and quantity (duration) based on speech style? The participants were a total of 11 speakers of Spanish: 7 early Spanish-English bilinguals with a similar linguistic background (simultaneous bilinguals of the second generation); 2 advanced L2 speakers of Spanish; and 2 Spanish monolinguals from Mexico. The study consisted of two tasks. The first one adopted a semi-spontaneous style by a solicited narration of life experiences and a description of a favorite movie with the purpose to collect spontaneous speech. The second task was a reading activity in which the participants read two paragraphs of a Mexican literary essay 'La nuez.' This task aimed to obtain a more controlled speech style. From this study, it can be concluded that early bilinguals and monolinguals show a smaller formant vowel space overall compared to the late bilinguals in both speech styles. 
In terms of formant values by stress, the early bilinguals and the late bilinguals resembled in the semi-spontaneous speech style as their unstressed vowel space overlapped with that of the unstressed vowels different from the monolinguals who displayed a slightly reduced unstressed vowel space. For the controlled data, the early bilinguals were similar to the monolinguals as their stressed and unstressed vowel spaces overlapped in comparison to the late bilinguals who showed a more clear reduction of unstressed vowel space. In regard to stress, the monolinguals revealed longer vowel duration overall. However, findings of duration by stress showed that the early bilinguals and the monolinguals remained stable with shorter values of unstressed vowels in the semi-spontaneous data and longer duration in the controlled data when compared to the late bilinguals who displayed opposite results. These findings suggest an implication for Spanish heritage speakers and L2 Spanish vowels research as it has been frequently argued that Spanish bilinguals differ from the Spanish monolinguals by their vowel reduction and centralized vowel space influenced by English. However, some Spanish varieties are characterized by vowel reduction especially in certain phonetic contexts so that some vowels present more weakening than others. Consequently, it would not be conclusive to affirm an English influence on the Spanish of these bilinguals.

Keywords: Spanish-English bilinguals, Spanish monolinguals, spontaneous and controlled speech, vowel production.

Procedia PDF Downloads 101
1861 Short Text Classification Using Part of Speech Feature to Analyze Students' Feedback of Assessment Components

Authors: Zainab Mutlaq Ibrahim, Mohamed Bader-El-Den, Mihaela Cocea

Abstract:

Students' textual feedback can hold unique patterns and useful information about the learning process, including the advantages and disadvantages of teaching methods, assessment components, facilities, and other aspects of teaching. The results of analysing such feedback can form a key input for institutions’ decision makers to advance and update their systems accordingly. This paper proposes a data mining framework for analysing end-of-unit general textual feedback using the part of speech (PoS) feature with four machine learning algorithms: support vector machines, decision tree, random forest, and naive Bayes. The proposed framework has two tasks. The first is to use the above algorithms to build an optimal model that automatically classifies the whole data set into two subsets: one subset related to assessment practices (assessment related), and the other containing the non-assessment related data. The second task is to use the same algorithms to build an optimal model for the whole data set and the new data subsets to automatically detect their sentiment. The significance of this paper is that it compares the performance of the above four algorithms using the part of speech feature with the performance of the same algorithms using the n-grams feature. The paper follows the Knowledge Discovery and Data Mining (KDDM) framework to construct the classification and sentiment analysis models, which consists of understanding the assessment domain, cleaning and pre-processing the data set, selecting and running the data mining algorithm, interpreting mined patterns, and consolidating the discovered knowledge. The experimental results show that the models using either feature performed very well on the first task. On the second task, however, models that used the part of speech feature underperformed in comparison with models that used unigrams and bigrams.

Keywords: assessment, part of speech, sentiment analysis, student feedback

Procedia PDF Downloads 106
1860 An Event-Related Potential Investigation of Speech-in-Noise Recognition in Native and Nonnative Speakers of English

Authors: Zahra Fotovatnia, Jeffery A. Jones, Alexandra Gottardo

Abstract:

Speech communication often occurs in environments where noise conceals part of a message. Listeners should compensate for the lack of auditory information by picking up distinct acoustic cues and using semantic and sentential context to recreate the speaker’s intended message. This situation seems to be more challenging in a nonnative than native language. On the other hand, early bilinguals are expected to show an advantage over the late bilingual and monolingual speakers of a language due to their better executive functioning components. In this study, English monolingual speakers were compared with early and late nonnative speakers of English to understand speech in noise processing (SIN) and the underlying neurobiological features of this phenomenon. Auditory mismatch negativities (MMNs) were recorded using a double-oddball paradigm in response to a minimal pair that differed in their middle vowel (beat/bit) at Wilfrid Laurier University in Ontario, Canada. The results did not show any significant structural and electroneural differences across groups. However, vocabulary knowledge correlated positively with performance on tests that measured SIN processing in participants who learned English after age 6. Moreover, their performance on the test negatively correlated with the integral area amplitudes in the left superior temporal gyrus (STG). In addition, the STG was engaged before the inferior frontal gyrus (IFG) in noise-free and low-noise test conditions in all groups. We infer that the pre-attentive processing of words engages temporal lobes earlier than the fronto-central areas and that vocabulary knowledge helps the nonnative perception of degraded speech.

Keywords: degraded speech perception, event-related brain potentials, mismatch negativities, brain regions

Procedia PDF Downloads 70
1859 Using Speech Emotion Recognition as a Longitudinal Biomarker for Alzheimer’s Diseases

Authors: Yishu Gong, Liangliang Yang, Jianyu Zhang, Zhengyu Chen, Sihong He, Xusheng Zhang, Wei Zhang

Abstract:

Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that affects millions of people worldwide and is characterized by cognitive decline and behavioral changes. People living with Alzheimer’s disease often find it hard to complete routine tasks. However, there are limited objective assessments that aim to quantify the difficulty of certain tasks for AD patients compared to non-AD people. In this study, we propose to use speech emotion recognition (SER), especially the frustration level, as a potential biomarker for quantifying the difficulty patients experience when describing a picture. We build an SER model using data from the IEMOCAP dataset and apply the model to the DementiaBank data to detect the AD/non-AD group difference and perform longitudinal analysis to track the AD disease progression. Our results show that the frustration level detected from the SER model can possibly be used as a cost-effective tool for objective tracking of AD progression in addition to the Mini-Mental State Examination (MMSE) score.

Keywords: Alzheimer’s disease, speech emotion recognition, longitudinal biomarker, machine learning

Procedia PDF Downloads 81
1858 Teaching Pragmatic Coherence in Literary Text: Analysis of Chimamanda Adichie’s Americanah

Authors: Joy Aworo-Okoroh

Abstract:

Literary texts are mirrors of real-life situations. Thus, authors choose the linguistic items that would best encode their intended meanings and messages. However, words mean more than they seem. The meaning of words is not static; rather, it is dynamic, as words constantly enter into relationships within a context. Literary texts can only be meaningful if all pragmatic cues are identified and interpreted. Drawing upon Teun van Dijk's theory of local pragmatic coherence, it is established that words enter into relations in a text and that these relations account for sequential speech acts in the text. Comprehension of the text is dependent on the interpretation of these relations. To show the relevance of pragmatic coherence in literary text analysis, ten conversations were selected from Americanah in order to give a clear idea of the pragmatic relations used. The conversations were analysed, identifying the speech act and epistemic relations inherent in them. A subtle analysis of the structure of the conversations was also carried out. It was discovered that justification is the most commonly used relation and that the meaning of the text is dependent on the interpretation of the pragmatic coherence of these instances. The study concludes that, to teach literature in English effectively, pragmatic coherence should be incorporated, as words mean more than they say.

Keywords: pragmatic coherence, epistemic coherence, speech act, Americanah

Procedia PDF Downloads 106
1857 Deep-Learning to Generation of Weights for Image Captioning Using Part-of-Speech Approach

Authors: Tiago do Carmo Nogueira, Cássio Dener Noronha Vinhal, Gélson da Cruz Júnior, Matheus Rudolfo Diedrich Ullmann

Abstract:

Generating automatic image descriptions through natural language is a challenging task. Image captioning is a task that consistently describes an image by combining computer vision and natural language processing techniques. To accomplish this task, cutting-edge models use encoder-decoder structures. Thus, Convolutional Neural Networks (CNN) are used to extract the characteristics of the images, and Recurrent Neural Networks (RNN) generate the descriptive sentences of the images. However, cutting-edge approaches still suffer from problems of generating incorrect captions and accumulating errors in the decoders. To solve this problem, we propose a model based on the encoder-decoder structure, introducing a module that generates the weights according to the importance of the word to form the sentence, using the part-of-speech (PoS). Thus, the results demonstrate that our model surpasses state-of-the-art models.

Keywords: gated recurrent units, caption generation, convolutional neural network, part-of-speech

Procedia PDF Downloads 68
1856 Complications and Outcomes of Cochlear Implantation in Children Younger than 12 Months: A Multicenter Study

Authors: Alimohamad Asghari, Ahmad Daneshi, Mohammad Farhadi, Arash Bayat, Mohammad Ajalloueyan, Marjan Mirsalehi, Mohsen Rajati, Seyed Basir Hashemi, Nader Saki, Ali Omidvari

Abstract:

Evidence suggests that Cochlear Implantation (CI) is a beneficial approach for auditory and speech skills improvement in children with severe to profound hearing loss. However, it remains controversial whether implantation in children <12 months is safe and effective compared to older children. The present study aimed to determine whether children's ages affect surgical complications and auditory and speech development. The current multicenter study enrolled 86 children who underwent CI surgery at <12 months of age (group A) and 362 children who underwent implantation between 12 and 24 months of age (group B). The Categories of Auditory Performance (CAP) and Speech Intelligibility Rating (SIR) scores were determined pre-implantation, and "one-year" and "two-year" post-implantation. Four complications (overall rate: 4.65%; three minor) occurred in group A and 12 complications (overall rate: 4.41%; nine minor) occurred in group B. We found no statistically significant difference in the complication rates between the groups (p>0.05). The mean SIR and CAP scores improved over time following CI activation in both groups. However, we did not find significant differences in CAP and SIR scores between the groups across different time points. Cochlear implantation is a safe and efficient procedure in children younger than 12 months, providing substantial auditory and speech benefits comparable to children undergoing implantation at 12 to 24 months of age. Furthermore, surgical complications in younger children are similar to those of children undergoing the CI at an older age.

Keywords: cochlear implant, Infant, complications, outcome

Procedia PDF Downloads 76
1855 English Learning Speech Assistant Speak Application in Artificial Intelligence

Authors: Albatool Al Abdulwahid, Bayan Shakally, Mariam Mohamed, Wed Almokri

Abstract:

Artificial intelligence has infiltrated every part of our life and every field we can think of. With technical developments, artificial intelligence applications are becoming more prevalent. We chose ELSA Speak because it is a good example of an artificial intelligence application. ELSA Speak is a smartphone application that is free to download on both iOS and Android smartphones. It utilizes artificial intelligence to help non-native English speakers pronounce words and phrases similarly to a native speaker, as well as enhance their English skills. It employs speech-recognition technology that helps the application improve the pronunciation of its users. This feature distinguishes ELSA from other voice recognition algorithms and increases the efficiency of the application. This study focused on evaluating the ELSA Speak application by testing its degree of effectiveness based on survey questions. The results of the questionnaire were variable. The majority of the participants strongly agreed that ELSA has helped them enhance their pronunciation skills. However, a few participants were not confident about the application’s ability to assist them in their learning journey.

Keywords: ELSA speak application, artificial intelligence, speech-recognition technology, language learning, english pronunciation

Procedia PDF Downloads 73
1854 Thermal Analysis on Heat Transfer Enhancement and Fluid Flow for Al2O3 Water-Ethylene Glycol Nano Fluid in Single PEMFC Mini Channel

Authors: Irnie Zakaria, W. A. N. W. Mohamed, W. H. Azmi

Abstract:

Thermal enhancement of a single mini channel in a Proton Exchange Membrane Fuel Cell (PEMFC) cooling plate is numerically investigated. In this study, a low concentration of Al2O3 in water-ethylene glycol mixtures is used as coolant in a mini channel of a carbon graphite plate to mimic the PEMFC cooling plate. A steady and incompressible flow with constant heat flux is assumed in the channel of 1mm x 5mm x 100mm. Al2O3 nanoparticles at concentrations of 0.1, 0.3 and 0.5 vol % are dispersed in a 60:40 (water: Ethylene Glycol) mixture. The effect of different flow rates on fluid flow and heat transfer enhancement in the Re number range of 20 to 140 was observed. The result showed that the heat transfer coefficient was improved by 18.11%, 9.86% and 5.37% for 0.5, 0.3 and 0.1 vol % Al2O3 in 60:40 (water: EG) as compared to the base fluid of 60:40 (water: EG). It also showed that the higher vol % concentration of Al2O3 performed better in terms of thermal enhancement, but at the expense of higher pumping power due to the increased pressure drop. A maximum additional pumping power of 0.0012W was required for 0.5 vol % Al2O3 in 60:40 (water: EG) at Re number 140.

Keywords: heat transfer, mini channel, nanofluid, PEMFC

Procedia PDF Downloads 311
1853 Myanmar Consonants Recognition System Based on Lip Movements Using Active Contour Model

Authors: T. Thein, S. Kalyar Myo

Abstract:

Humans use visual information to understand speech content in noisy conditions or in situations where the audio signal is not available. The primary advantage of visual information is that it is not affected by acoustic noise and cross talk among speakers. Using visual information from lip movements can improve the accuracy and robustness of automatic speech recognition. However, a major challenge with most automatic lip reading systems is to find a robust and efficient method for extracting the linguistically relevant speech information from a lip image sequence. This is a difficult task due to variation caused by different speakers, illumination, camera setting and the inherent low luminance and chrominance contrast between lip and non-lip regions. Several researchers have been developing methods to overcome these problems; one of them is lip reading. Moreover, it is well known that visual information about speech obtained through lip reading is very useful for human speech recognition. Lip reading is the technique of comprehensively understanding underlying speech by processing the movement of the lips. Therefore, a lip reading system is one of the supportive technologies for hearing impaired or elderly people, and it is an active research area. The need for lip reading systems is ever increasing for every language. This research aims to develop a visual teaching system that shows hearing impaired persons in Myanmar how to pronounce words precisely by identifying the features of lip movement. The proposed research will develop a lip reading system for Myanmar consonants: one syllable consonants (င (Nga)၊ ည (Nya)၊ မ (Ma)၊ လ (La)၊ ၀ (Wa)၊ သ (Tha)၊ ဟ (Ha)၊ အ (Ah)) and two syllable consonants (က(Ka Gyi)၊ ခ (Kha Gway)၊ ဂ (Ga Nge)၊ ဃ (Ga Gyi)၊ စ (Sa Lone)၊ ဆ (Sa Lain)၊ ဇ (Za Gwe) ၊ ဒ (Da Dway)၊ ဏ (Na Gyi)၊ န (Na Nge)၊ ပ (Pa Saug)၊ ဘ (Ba Gone)၊ ရ (Ya Gaug)၊ ဠ (La Gyi)).
The proposed system has three subsystems. The first is the lip localization system, which localizes the lips in the digital inputs. The next is the feature extraction system, which extracts features of lip movement suitable for visual speech recognition. The final one is the classification system. In the proposed research, Two Dimensional Discrete Cosine Transform (2D-DCT) and Linear Discriminant Analysis (LDA) with Active Contour Model (ACM) will be used for lip movement feature extraction. A Support Vector Machine (SVM) classifier is used for finding the class parameter and class number in the training set and testing set. Then, experiments will be carried out on the recognition accuracy of Myanmar consonants using only the visual information on lip movements, which is useful for visual speech in the Myanmar language. The result will show the effectiveness of lip movement recognition for Myanmar consonants. This system will help hearing impaired persons by serving as a language learning application. It can also be useful for persons with normal hearing in noisy environments or conditions where they need to find out what was said by other people without hearing the voice.
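The 2D-DCT feature extraction step can be sketched as follows. This is only an illustration under stated assumptions: it applies an orthonormal DCT-II to a square grayscale block and keeps the top-left (low-frequency) coefficients as features. The function names, the block size, and the choice of how many coefficients to keep are hypothetical; the abstract does not specify them, and the ACM, LDA and SVM stages are not shown.

```python
import math


def dct_2d(block):
    """Orthonormal 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        # Scaling that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def low_frequency_features(block, k):
    """Keep the k x k top-left DCT coefficients as a flat feature vector."""
    coeffs = dct_2d(block)
    return [coeffs[u][v] for u in range(k) for v in range(k)]


# Hypothetical 4x4 lip-region patch with uniform intensity.
patch = [[1.0] * 4 for _ in range(4)]
print(low_frequency_features(patch, 2))
```

For a uniform patch all the energy lands in the first (DC) coefficient, which is why a small number of low-frequency coefficients is often enough to summarise a smooth image region.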

Keywords: feature extraction, lip reading, lip localization, Active Contour Model (ACM), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), Two Dimensional Discrete Cosine Transform (2D-DCT)

Procedia PDF Downloads 260
1852 Medical Images Enhancement Using New Dynamic Band Pass Filter

Authors: Abdellatif Baba

Abstract:

In order to facilitate medical image analysis by improving image quality and readability, we present in this paper a new dynamic band pass filter as a general and suitable operator for different types of medical images. Our objective is to enrich the details of any treated medical image so that it is clear enough to convey an understandable and simplified meaning even to people who are not specialized in the medical domain.

Keywords: medical image enhancement, dynamic band pass filter, analysis improvement

Procedia PDF Downloads 260
1851 Conspiracy Theory in Discussions of the Coronavirus Pandemic in the Gulf Region

Authors: Rasha Salameh

Abstract:

In light of the tense relationship between Saudi Arabia and Iran, this research paper sheds some light on Al-Arabiya’s reporting of Coronavirus in the Gulf. Particularly because most of the cases, in the beginning, were coming from Iran, some programs of this Saudi channel embraced a conspiracy theory. Hate speech has been used in talking about the topic and discussing it. The results of these discussions will be detailed in this paper in percentages with regard to the research sample, which includes five programs on Al-Arabiya channel: ‘DNA’, ‘Marraya’ (Mirrors), ‘Panorama’, ‘Tafaolcom’ (Your Interaction) and the ‘Diplomatic Street’, in the period between January 19, that is, the date of the first case in Iran, and April 10, 2020. The research shows the use of a conspiracy theory in the programs, in addition to some professional violations. The surveyed sample also shows that the matter receded due to the Arab Gulf states' preoccupation with the successively increasing cases that have appeared there since the start of the pandemic. The results indicate that hate speech was present in the sample at a rate of 98.1% and that most of the programs that dealt with the Iranian issue under the Corona pandemic on Al Arabiya used the conspiracy theory at a rate of 75.5%.

Keywords: Al-Arabiya, Iran, Corona, hate speech, conspiracy theory, politicization of the pandemic

Procedia PDF Downloads 103
1850 Reduced Lung Volume: A Possible Cause of Stuttering

Authors: Shantanu Arya, Sachin Sakhuja, Gunjan Mehta, Sanjay Munjal

Abstract:

Stuttering may be defined as a speech disorder affecting the fluency domain of speech and characterized by covert features like word substitution, omission and circumlocution and overt features like prolongation of sounds and syllables, blocks, etc. Many etiologies have been postulated to explain stuttering based on various experiments and research. Moreover, breathlessness has also been reported by many individuals with stuttering, for which breathing exercises are generally advised. However, no studies reporting objective evaluation of the pulmonary capacity and further objective assessment of the efficacy of breathing exercises have been conducted. The Pulmonary Function Test (PFT), which evaluates parameters like Forced Vital Capacity, Peak Expiratory Flow Rate and Forced Expiratory Flow Rate, can be used to study the pulmonary behavior of individuals with stuttering. The study aimed: a) to identify speech motor and physiologic behaviours associated with stuttering by administering PFT; b) to recognize possible reasons for an association between speech motor behaviour and stuttering severity. In this regard, PFT tests were administered to individuals who reported signs and symptoms of stuttering and showed abnormal scores on the Stuttering Severity Index. Parameters like Forced Vital Capacity, Forced Expiratory Volume, Peak Expiratory Flow Rate (L/min) and Forced Expiratory Flow Rate (L/min) were evaluated and correlated with scores on the Stuttering Severity Index. Results showed a significant decrease in the parameters (lower than normal scores) in individuals with established stuttering. A strong correlation was also found between the degree of stuttering and the degree of decrease in the pulmonary volumes. Thus, it is evident that fluent speech requires strong support of lung pressure and requisite volumes. Further research demonstrating the efficacy of abdominal breathing exercises in this regard is needed.

Keywords: forced expiratory flow rate, forced expiratory volume, forced vital capacity, peak expiratory flow rate, stuttering

Procedia PDF Downloads 243
1849 The Analysis of Deceptive and Truthful Speech: A Computational Linguistic Based Method

Authors: Seham El Kareh, Miramar Etman

Abstract:

Recently, detecting liars and extracting features which distinguish them from truth-tellers have been the focus of a wide range of disciplines. To the authors' best knowledge, most of the work has been done on facial expressions and body gestures, but only a few works have addressed the language used by liars and truth-tellers. This paper sheds light on four axes. The first axis deals with building an audio corpus of deceptive and truthful speech for Egyptian Arabic speakers. The second axis focuses on examining the human perception of lies and demonstrating the need for computational linguistic-based methods to extract features which characterize truthful and deceptive speech. The third axis is concerned with building a linguistic analysis program that can extract from the corpus the inter- and intra-linguistic cues for deceptive and truthful speech. The program built here is based on selected categories from the Linguistic Inquiry and Word Count program. Our results demonstrated that, when lying, Egyptian Arabic speakers preferred first-person pronouns and the present tense over the past tense, and their lies lacked second-person pronouns; when telling the truth, they preferred verbs related to motion and nouns related to time. The results also showed that more data are needed to establish the significance of words related to emotions and numbers.

Keywords: Egyptian Arabic corpus, computational analysis, deceptive features, forensic linguistics, human perception, truthful features

Procedia PDF Downloads 180
1848 The Use of Image Processing Responses Tools Applied to Analysing Bouguer Gravity Anomaly Map (Tangier-Tetuan's Area-Morocco)

Authors: Saad Bakkali

Abstract:

Image processing is a powerful tool for the enhancement of edges in images used in the interpretation of geophysical potential field data. Aerial and terrestrial gravimetric surveys were carried out in the region of Tangier-Tetuan. A Bouguer gravity anomaly map was prepared from the observed and measured gravity data. This paper reports the results and interpretations of the transformed maps of the Bouguer gravity anomaly of the Tangier-Tetuan area using image processing. Filtering analysis based on classical image processing was applied, using operators such as logarithmic and gamma correction. This paper also presents the results obtained from this image processing analysis for edge enhancement of the Bouguer gravity anomaly map of the Tangier-Tetuan zone.
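The logarithmic and gamma correction operators named above are standard point transforms, and a minimal sketch is given here. It assumes the anomaly grid is first rescaled to [0, 1], since gravity anomalies can be negative; the function names, the `scale` parameter, and the sample values are hypothetical and do not come from the paper.

```python
import math


def normalize(values):
    """Rescale values linearly to [0, 1]; assumes they are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def log_transform(values, scale=10.0):
    """s = log(1 + scale * r) / log(1 + scale) for r in [0, 1]; lifts low values."""
    return [math.log1p(scale * v) / math.log1p(scale) for v in values]


def gamma_correct(values, gamma):
    """s = r ** gamma for r in [0, 1]; gamma < 1 brightens, gamma > 1 darkens."""
    return [v ** gamma for v in values]


# Hypothetical anomaly values along one profile of the map.
profile = [-20.0, -5.0, 0.0, 12.0, 20.0]
scaled = normalize(profile)
print(log_transform(scaled))
print(gamma_correct(scaled, 0.5))
```

Both transforms are monotonic, so they change contrast without reordering the anomaly values; edge detection or filtering would be applied afterwards as a separate step.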

Keywords: bouguer, tangier, filtering, gamma correction, logarithmic enhancement edges

Procedia PDF Downloads 398
1847 Atomic Decomposition Audio Data Compression and Denoising Using Sparse Dictionary Feature Learning

Authors: T. Bryan, V. Kepuska, I. Kostnaic

Abstract:

A method of data compression and denoising is introduced that is based on atomic decomposition of audio data using “basis vectors” that are learned from the audio data itself. The basis vectors are shown to have higher data compression and better signal-to-noise enhancement than the Gabor and gammatone “seed atoms” that were used to generate them. The basis vectors are the input weights of a Sparse AutoEncoder (SAE) that is trained using “envelope samples” of windowed segments of the audio data. The envelope samples are extracted from the audio data by performing atomic decomposition with Gabor or gammatone seed atoms, found by matching pursuit. This process identifies segments of audio data that are locally coherent with the seed atoms. The envelope samples are formed by taking the Kronecker products of the atomic envelopes with the locally coherent data segments. Oracle signal-to-noise ratio (SNR) versus data compression curves are generated for the seed atoms as well as the basis vectors learned from Gabor and gammatone seed atoms. SNR data compression curves are generated for speech signals as well as early American music recordings. The basis vectors are shown to have higher denoising capability for data compression rates ranging from 90% to 99.84% for speech as well as music. Envelope samples are displayed as images by folding the time series into column vectors. This display method is used to compare the output of the SAE with the envelope samples that produced it. The basis vectors are also displayed as images. Sparsity is shown to play an important role in producing the highest denoising basis vectors.
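The matching pursuit step referred to above can be illustrated with a generic greedy sketch. It assumes a small dictionary of unit-norm atoms supplied as plain lists; the toy atoms below are not Gabor or gammatone atoms, and the function names are hypothetical rather than the authors' implementation.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy matching pursuit over unit-norm atoms.

    Returns a list of (atom_index, coefficient) picks and the final residual.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        # Choose the atom most correlated with the current residual.
        scores = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coeff = scores[best]
        # Subtract that atom's contribution from the residual.
        residual = [r - coeff * a for r, a in zip(residual, atoms[best])]
        picks.append((best, coeff))
    return picks, residual


# Toy dictionary (assumption): the standard basis of R^3, already unit-norm.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
picks, residual = matching_pursuit([3.0, 0.0, -1.0], atoms, n_iter=2)
print(picks, residual)
```

Keeping only a few (index, coefficient) pairs is what yields compression, and stopping before the residual is fully explained is what discards noise that is incoherent with the dictionary.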

Keywords: sparse dictionary learning, autoencoder, sparse autoencoder, basis vectors, atomic decomposition, envelope sampling, envelope samples, Gabor, gammatone, matching pursuit

Procedia PDF Downloads 226
1846 Features of Normative and Pathological Realizations of Sibilant Sounds for Computer-Aided Pronunciation Evaluation in Children

Authors: Zuzanna Miodonska, Michal Krecichwost, Pawel Badura

Abstract:

Sigmatism (lisping) is a speech disorder in which sibilant consonants are mispronounced. The diagnosis of this phenomenon is usually based on auditory assessment. However, progress in speech analysis techniques creates the possibility of developing computer-aided sigmatism diagnosis tools. The aim of the study is to statistically verify whether specific acoustic features of sibilant sounds may be related to pronunciation correctness. Such knowledge can be of great importance when implementing classifiers and designing novel tools for automatic sibilant pronunciation evaluation. The study covers the analysis of various speech signal measures, including features proposed in the literature for the description of normative sibilant realization. Amplitudes and frequencies of three fricative formants (FF) are extracted based on local spectral maxima of the friction noise. Skewness, kurtosis, four normalized spectral moments (SM) and 13 mel-frequency cepstral coefficients (MFCC) with their 1st and 2nd derivatives (13 Delta and 13 Delta-Delta MFCC) are included in the analysis as well. The resulting feature vector contains 51 measures. The experiments are performed on a speech corpus containing words with selected sibilant sounds (/ʃ, ʒ/) pronounced by 60 preschool children with proper pronunciation or with natural pathologies. In total, 224 /ʃ/ segments and 191 /ʒ/ segments are employed in the study. The Mann-Whitney U test is employed to compare sigmatism and normative pronunciation. Statistically significant differences between these two groups of children are obtained for most of the proposed features at p < 0.05. All spectral moments and fricative formants appear to be distinctive between pathology and proper pronunciation. These metrics describe the friction noise characteristic of sibilants, which makes them particularly promising for use in sibilant evaluation tools.
The correspondences found between phoneme feature values and an expert evaluation of pronunciation correctness encourage the use of speech analysis tools in the diagnosis and therapy of sigmatism. The proposed feature extraction methods could be used in computer-assisted sigmatism diagnosis or therapy systems.
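The Mann-Whitney U test used above compares two groups by ranking their pooled values. A minimal sketch of the U statistic itself follows; it uses average ranks for ties and does not compute a p-value. The function name and the sample feature values are hypothetical and are not data from the study.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two samples, using average ranks for ties."""
    # Pool the samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical values of one acoustic feature for the two pronunciation groups.
normative = [1.0, 2.0, 3.0]
pathological = [4.0, 5.0, 6.0]
print(mann_whitney_u(normative, pathological))
```

A U value near zero for one group means its values almost never exceed those of the other group; turning U into a p-value requires exact tables or a normal approximation, which a statistics library would normally provide.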

Keywords: computer-aided pronunciation evaluation, sigmatism diagnosis, speech signal analysis, statistical verification

Procedia PDF Downloads 273
1845 Silver Nanoparticles Enhanced Visible and Near Infrared Emission of Er³+ Ions Doped Lithium Tungsten Tellurite Glasses

Authors: Sachin Mahajan, Ghizal Ansari

Abstract:

TeO2-WO3-Li2O glass doped with erbium ions (1 mol %) and embedded with silver nanoparticles (Ag NPs) has been successfully prepared by the melt quenching technique with increasing heat-treatment duration. The amorphous nature of the glass is determined by the X-ray diffraction method, and the presence of silver nanoparticles is confirmed using Transmission Electron Microscopy (TEM) analysis. The TEM image reveals that the Ag NPs are dispersed homogeneously with an average size of 18 nm. From the UV-Vis absorption spectra, the surface plasmon resonance (SPR) peaks are detected at 550 and 578 nm. Under 980 nm excitation, enhancement of the red upconversion fluorescence and of the near-infrared broadband emission around 1550 nm of Er3+ ions in tellurite glasses containing Ag NPs has been observed. The observed enhancement of Er3+ emission is mainly attributed to the local field effects of the Ag NPs, which cause an intensified electromagnetic field around the NPs. The mechanisms involved in the observed enhancement are discussed.

Keywords: erbium ions, silver nanoparticle, surface plasmon resonance, upconversion emission

Procedia PDF Downloads 563
1844 Performance Analysis of Heterogeneous Cellular Networks with Multiple Connectivity

Authors: Sungkyung Kim, Jee-Hyeon Na, Dong-Seung Kwon

Abstract:

Future mobile networks following 5th generation will be characterized by one thousand times higher gains in capacity; connections for at least one hundred billion devices; user experience capable of extremely low latency and response times. To be close to the capacity requirements and higher reliability, advanced technologies have been studied, such as multiple connectivity, small cell enhancement, heterogeneous networking, and advanced interference and mobility management. This paper is focused on the multiple connectivity in heterogeneous cellular networks. We investigate the performance of coverage and user throughput in several deployment scenarios. Using the stochastic geometry approach, the SINR distributions and the coverage probabilities are derived in case of dual connection. Also, to compare the user throughput enhancement among the deployment scenarios, we calculate the spectral efficiency and discuss our results.
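The two quantities discussed above, coverage probability and spectral efficiency, can be estimated empirically from SINR samples, as in the minimal sketch below. This is not the paper's stochastic geometry derivation: it assumes linear-scale SINR samples are already available (for example from a simulation), and the function names and sample values are hypothetical.

```python
import math


def coverage_probability(sinr_samples, threshold_db):
    """Fraction of linear-scale SINR samples above a threshold given in dB."""
    threshold = 10 ** (threshold_db / 10)
    return sum(1 for s in sinr_samples if s > threshold) / len(sinr_samples)


def mean_spectral_efficiency(sinr_samples):
    """Average Shannon spectral efficiency log2(1 + SINR) in bit/s/Hz."""
    return sum(math.log2(1 + s) for s in sinr_samples) / len(sinr_samples)


# Hypothetical linear-scale SINR samples for one deployment scenario.
samples = [0.5, 1.0, 3.0, 7.0]
print(coverage_probability(samples, threshold_db=0.0))
print(mean_spectral_efficiency(samples))
```

Running the same two functions on samples from different deployment scenarios gives a simple way to compare them, alongside whatever closed-form expressions the analysis provides.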

Keywords: heterogeneous networks, multiple connectivity, small cell enhancement, stochastic geometry

Procedia PDF Downloads 298