Search results for: automatic spontaneous speech analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 27993

Search results for: automatic spontaneous speech analysis

27873 Spontaneous and Posed Smile Detection: Deep Learning, Traditional Machine Learning, and Human Performance

Authors: Liang Wang, Beste F. Yuksel, David Guy Brizan

Abstract:

A computational model of affect that can distinguish between spontaneous and posed smiles with no errors on a large, popular data set using deep learning techniques is presented in this paper. A Long Short-Term Memory (LSTM) classifier, a type of Recurrent Neural Network, is utilized and compared to human classification. Results showed that while human classification (mean of 0.7133) was above chance, the LSTM model was more accurate than human classification and other comparable state-of-the-art systems. Additionally, a high accuracy rate was maintained with small amounts of training videos (70 instances). The derivation of important features to further understand the success of our computational model were analyzed, and it was inferred that thousands of pairs of points within the eyes and mouth are important throughout all time segments in a smile. This suggests that distinguishing between a posed and spontaneous smile is a complex task, one which may account for the difficulty and lower accuracy of human classification compared to machine learning models.

Keywords: affective computing, affect detection, computer vision, deep learning, human-computer interaction, machine learning, posed smile detection, spontaneous smile detection

Procedia PDF Downloads 101
27872 A Profile of the Patients at the Hearing and Speech Clinic at the University of Jordan: A Retrospective Study

Authors: Maisa Haj-Tas, Jehad Alaraifi

Abstract:

The significance of the study: This retrospective study examined the speech and language profiles of patients who received clinical services at the University of Jordan Hearing and Speech Clinic (UJ-HSC) from 2009 to 2014. The UJ-HSC clinic is located in the capital Amman and was established in the late 1990s. It is the first hearing and speech clinic in Jordan and one of first speech and hearing clinics in the Middle East. This clinic provides services to an annual average of 2000 patients who are diagnosed with different communication disorders. Examining the speech and language profiles of patients in this clinic could provide an insight about the most common disorders seen in patients who attend similar clinics in Jordan. It could also provide information about community awareness of the role of speech therapists in the management of speech and language disorders. Methodology: The researchers examined the clinical records of 1140 patients (797 males and 343 females) who received clinical services at the UJ-HSC between the years 2009 and 2014 for the purpose of data analysis for this study. The main variables examined in the study were disorder type and gender. Participants were divided into four age groups: children, adolescents, adults, and older adults. The examined disorders were classified as either speech disorders, language disorders, or dysphagia (i.e., swallowing problems). The disorders were further classified as childhood language impairments, articulation disorders, stuttering, cluttering, voice disorders, aphasia, and dysphagia. Results: The results indicated that the prevalence for language disorders was the highest (50.7%) followed by speech disorders (48.3%), and dysphagia (0.9%). The majority of patients who were seen at the JU-HSC were diagnosed with childhood language impairments (47.3%) followed consecutively by articulation disorders (21.1%), stuttering (16.3%), voice disorders (12.1%), aphasia (2.2%), dysphagia (0.9%), and cluttering (0.2%). As for gender, the majority of patients seen at the clinic were males in all disorders except for voice disorders and cluttering. Discussion: The results of the present study indicate that the majority of examined patients were diagnosed with childhood language impairments. Based on this result, the researchers suggest that there seems to be a high prevalence of childhood language impairments among children in Jordan compared to other types of speech and language disorders. The researchers also suggest that there is a need for further examination of the actual prevalence data on speech and language disorders in Jordan. The fact that many of the children seen at the UJ-HSC were brought to the clinic either as a result of parental concern or teacher referral indicates that there seems to an increased awareness among parents and teachers about the services speech pathologists can provide about assessment and treatment of childhood speech and language disorders. The small percentage of other disorders (i.e., stuttering, cluttering, dysphasia, aphasia, and voice disorders) seen at the UJ-HSC may indicate a little awareness by the local community about the role of speech pathologists in the assessment and treatment of these disorders.

Keywords: clinic, disorders, language, profile, speech

Procedia PDF Downloads 290
27871 Research and Development of Intelligent Cooling Channels Design System

Authors: Q. Niu, X. H. Zhou, W. Liu

Abstract:

The cooling channels of injection mould play a crucial role in determining the productivity of moulding process and the product quality. It’s not a simple task to design high quality cooling channels. In this paper, an intelligent cooling channels design system including automatic layout of cooling channels, interference checking and assembly of accessories is studied. Automatic layout of cooling channels using genetic algorithm is analyzed. Through integrating experience criteria of designing cooling channels, considering the factors such as the mould temperature and interference checking, the automatic layout of cooling channels is implemented. The method of checking interference based on distance constraint algorithm and the function of automatic and continuous assembly of accessories are developed and integrated into the system. Case studies demonstrate the feasibility and practicality of the intelligent design system.

Keywords: injection mould, cooling channel, intelligent design, automatic layout, interference checking

Procedia PDF Downloads 405
27870 A Retrospective Study to Evaluate Verbal Scores of Autistic Children Who Received Hyperbaric Oxygen Therapy

Authors: Tami Peterson

Abstract:

Hyperbaric oxygen therapy (HBOT) has been hypothesized as an effective treatment for increasing verbal language skills in individuals on the autism spectrum. A child’s ability to effectively communicate with peers, parents, and caregivers impacts their level of independence and quality of personal relationships. This retrospective study will compare the speech development of participants aged 2-17 years that received 40 sessions of HBOT at 2.0 ATA to those who had not. Both groups will have a verbal assessment every six months. There were 31 subjects in the HBO group and 32 subjects in the non-HBO group. The statistical analysis will focus on whether hyperbaric oxygen therapy made a significant difference in Verbal Behavior Milestones Assessment and Placement Program (VB-MAPP) or Assessment of Basic Language and Learning Skills (ABLLS) results. The evidence demonstrates a strong correlation between HBOT and an increased change from baseline verbal scores compared to the control group, even in difficult to grasp areas such as spontaneous vocalization. We suggest this is due to the anti-inflammatory effects of hyperbaric oxygen therapy. Neuroinflammation causes hypoperfusion of critical central nervous system areas responsible for the symptoms described within the autism spectrum, such as problems with thought processing, memory, and speech. Decreasing the inflammation allows the brain to function properly, which results in improved verbal scores for the participants that underwent HBOT.

Keywords: assessment of basic language and learning skills, autism spectrum disorder, hyperbaric oxygen therapy, verbal behavior milestones assessment and placement program

Procedia PDF Downloads 180
27869 Teaching Pragmatic Coherence in Literary Text: Analysis of Chimamanda Adichie’s Americanah

Authors: Joy Aworo-Okoroh

Abstract:

Literary texts are mirrors of a real-life situation. Thus, authors choose the linguistic items that would best encode their intended meanings and messages. However, words mean more than they seem. The meaning of words is not static rather, it is dynamic as they constantly enter into relationships within a context. Literary texts can only be meaningful if all pragmatic cues are identified and interpreted. Drawing upon Teun Van Djik's theory of local pragmatic coherence, it is established that words enter into relations in a text and these relations account for sequential speech acts in the texts. Comprehension of the text is dependent on the interpretation of these relations.To show the relevance of pragmatic coherence in literary text analysis, ten conversations were selected in Americanah in order to give a clear idea of the pragmatic relations used. The conversations were analysed, identifying the speech act and epistemic relations inherent in them. A subtle analysis of the structure of the conversations was also carried out. It was discovered that justification is the most commonly used relation and the meaning of the text is dependent on the interpretation of these instances' pragmatic coherence. The study concludes that to effectively teach literature in English, pragmatic coherence should be incorporated as words mean more than they say.

Keywords: pragmatic coherence, epistemic coherence, speech act, Americanah

Procedia PDF Downloads 107
27868 Speech Perception by Monolingual and Bilingual Dravidian Speakers under Adverse Listening Conditions

Authors: S. B. Rathna Kumar, Sale Kranthi, Sandya K. Varudhini

Abstract:

The precise perception of spoken language is influenced by several variables, including the listeners’ native language, distance between speaker and listener, reverberation and background noise. When noise is present in an acoustic environment, it masks the speech signal resulting in reduction in the redundancy of the acoustic and linguistic cues of speech. There is strong evidence that bilinguals face difficulty in speech perception for their second language compared with monolingual speakers under adverse listening conditions such as presence of background noise. This difficulty persists even for speakers who are highly proficient in their second language and is greater in those who have learned the second language later in life. The present study aimed to assess the performance of monolingual (Telugu speaking) and bilingual (Tamil as first language and Telugu as second language) speakers on Telugu speech perception task under quiet and noisy environments. The results indicated that both the groups performed similar in both quiet and noisy environments. The findings of the present study are not in accordance with the findings of previous studies which strongly report poorer speech perception in adverse listening conditions such as noise with bilingual speakers for their second language compared with monolinguals.

Keywords: monolingual, bilingual, second language, speech perception, quiet, noise

Procedia PDF Downloads 363
27867 Dual-Channel Multi-Band Spectral Subtraction Algorithm Dedicated to a Bilateral Cochlear Implant

Authors: Fathi Kallel, Ahmed Ben Hamida, Christian Berger-Vachon

Abstract:

In this paper, a Speech Enhancement Algorithm based on Multi-Band Spectral Subtraction (MBSS) principle is evaluated for Bilateral Cochlear Implant (BCI) users. Specifically, dual-channel noise power spectral estimation algorithm using Power Spectral Densities (PSD) and Cross Power Spectral Densities (CPSD) of the observed signals is studied. The enhanced speech signal is obtained using Dual-Channel Multi-Band Spectral Subtraction ‘DC-MBSS’ algorithm. For performance evaluation, objective speech assessment test relying on Perceptual Evaluation of Speech Quality (PESQ) score is performed to fix the optimal number of frequency bands needed in DC-MBSS algorithm. In order to evaluate the speech intelligibility, subjective listening tests are assessed with 3 deafened BCI patients. Experimental results obtained using French Lafon database corrupted by an additive babble noise at different Signal-to-Noise Ratios (SNR) showed that DC-MBSS algorithm improves speech understanding for single and multiple interfering noise sources.

Keywords: speech enhancement, spectral substracion, noise estimation, cochlear impalnt

Procedia PDF Downloads 518
27866 Freedom of Speech, Dissent and the Right to be Governed By Consensus are Inherent Rights Under Classical Islamic Law

Authors: Ziyad Motala

Abstract:

It is often proclaimed by leasers in Muslim majority countries that Islamic Law does not permit dissent against a ruler. This paper will evaluate and discuss freedom of speech and dissent as found in concrete prophetic examples during the time of the Prophet Muhammad. It will further look at the examples and practices during the time of the four Noble Caliphs, the immediate successors to the Prophet Muhammad. It will argue that the positivist position of absolute obedience to a ruler is inconsistent with the prophetic tradition. The examples of the Prophet and his immediate four successors (whose lessons Sunni Islam considers to be a source of Islamic Law) demonstrates among the earliest example of freedom of speech and dissent in human history. That tradition frowned upon an inert and uninvolved citizenry. It will conclude with lessons for modern day Muslim majority countries arguing with empirical evidence that freedom of speech, dissent and the right to be governed by consensus versus coercion are fundamental requisites of Islamic law.

Keywords: islamic law, demoracy, freedom of speech, right to dissent

Procedia PDF Downloads 49
27865 Semi-Supervised Learning for Spanish Speech Recognition Using Deep Neural Networks

Authors: B. R. Campomanes-Alvarez, P. Quiros, B. Fernandez

Abstract:

Automatic Speech Recognition (ASR) is a machine-based process of decoding and transcribing oral speech. A typical ASR system receives acoustic input from a speaker or an audio file, analyzes it using algorithms, and produces an output in the form of a text. Some speech recognition systems use Hidden Markov Models (HMMs) to deal with the temporal variability of speech and Gaussian Mixture Models (GMMs) to determine how well each state of each HMM fits a short window of frames of coefficients that represents the acoustic input. Another way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition systems. Acoustic models for state-of-the-art ASR systems are usually training on massive amounts of data. However, audio files with their corresponding transcriptions can be difficult to obtain, especially in the Spanish language. Hence, in the case of these low-resource scenarios, building an ASR model is considered as a complex task due to the lack of labeled data, resulting in an under-trained system. Semi-supervised learning approaches arise as necessary tasks given the high cost of transcribing audio data. The main goal of this proposal is to develop a procedure based on acoustic semi-supervised learning for Spanish ASR systems by using DNNs. This semi-supervised learning approach consists of: (a) Training a seed ASR model with a DNN using a set of audios and their respective transcriptions. A DNN with a one-hidden-layer network was initialized; increasing the number of hidden layers in training, to a five. A refinement, which consisted of the weight matrix plus bias term and a Stochastic Gradient Descent (SGD) training were also performed. The objective function was the cross-entropy criterion. (b) Decoding/testing a set of unlabeled data with the obtained seed model. (c) Selecting a suitable subset of the validated data to retrain the seed model, thereby improving its performance on the target test set. To choose the most precise transcriptions, three confidence scores or metrics, regarding the lattice concept (based on the graph cost, the acoustic cost and a combination of both), was performed as selection technique. The performance of the ASR system will be calculated by means of the Word Error Rate (WER). The test dataset was renewed in order to extract the new transcriptions added to the training dataset. Some experiments were carried out in order to select the best ASR results. A comparison between a GMM-based model without retraining and the DNN proposed system was also made under the same conditions. Results showed that the semi-supervised ASR-model based on DNNs outperformed the GMM-model, in terms of WER, in all tested cases. The best result obtained an improvement of 6% relative WER. Hence, these promising results suggest that the proposed technique could be suitable for building ASR models in low-resource environments.

Keywords: automatic speech recognition, deep neural networks, machine learning, semi-supervised learning

Procedia PDF Downloads 317
27864 Studies on the Spontaneous Reductive Decomposition Behavior of Permanganate in the Water

Authors: Hyun Kyu Lee, Won Zin Oh, June Hyun Kim, Jin Hee Kim, Sang June Choi, Hak Soo Kim

Abstract:

The oxidative dissolution of chromium oxide by manganese oxides including permanganate have been widely studied not only for the chemical decontamination of nuclear power plant, but also for the environmental control of the toxic chromate caused by naturally occurring manganese dioxide. However, little attention has been made for the spontaneous reductive decomposition of permanganate in the water, which is a competing reaction with the oxidation of the chromium oxide by permanganate. The objective of this study is to investigate the spontaneous reductive decomposition behavior of permanganate in the water, depending on the variation of acidity, temperature and concentration. Results of the experiments showed that the permanganate reductive decomposition product is manganese dioxide, and this reaction accompanies with the same molar amount of hydrogen ion consumption. Therefore, at the neutral condition (ex. potassium permanganate solution without acidic chemicals), the permanganate do not reduce by itself at any condition of temperature, concentration within the experimental range. From the results, we confirmed that the oxidation reaction for the permanganate reduction is the water oxidation that is accompanying the oxygen evolution. The experimental results on the reductive decomposition behavior of permanganate in the water also showed that the degree and rate of permanganate reduction increases with the temperature, acidity and concentration. The spontaneous decomposition of the permanganates obtained in the studies would become a good reference to select the operational condition, such as temperature, acidity and concentration, for the chemical decontamination of nuclear power plants.

Keywords: permanganate reduction, spontaneous decomposition, water oxidation, acidity, temperature, permanganate concentration, chemical decontamination, nuclear power plant

Procedia PDF Downloads 315
27863 Formulating a Definition of Hate Speech: From Divergence to Convergence

Authors: Avitus A. Agbor

Abstract:

Numerous incidents, ranging from trivial to catastrophic, do come to mind when one reflects on hate. The victims of these belong to specific identifiable groups within communities. These experiences evoke discussions on Islamophobia, xenophobia, homophobia, anti-Semitism, racism, ethnic hatred, atheism, and other brutal forms of bigotry. Common to all these is an invisible but portent force that drives all of them: hatred. Such hatred is usually fueled by a profound degree of intolerance (to diversity) and the zeal to impose on others their beliefs and practices which they consider to be the conventional norm. More importantly, the perpetuation of these hateful acts is the unfortunate outcome of an overplay of invectives and hate speech which, to a greater extent, cannot be divorced from hate. From a legal perspective, acknowledging the existence of an undeniable link between hate speech and hate is quite easy. However, both within and without legal scholarship, the notion of “hate speech” remains a conundrum: a phrase that is quite easily explained through experiences than propounding a watertight definition that captures the entire essence and nature of what it is. The problem is further compounded by a few factors: first, within the international human rights framework, the notion of hate speech is not used. In limiting the right to freedom of expression, the ICCPR simply excludes specific kinds of speeches (but does not refer to them as hate speech). Regional human rights instruments are not so different, except for the subsequent developments that took place in the European Union in which the notion has been carefully delineated, and now a much clearer picture of what constitutes hate speech is provided. The legal architecture in domestic legal systems clearly shows differences in approaches and regulation: making it more difficult. In short, what may be hate speech in one legal system may very well be acceptable legal speech in another legal system. Lastly, the cornucopia of academic voices on the issue of hate speech exude the divergence thereon. Yet, in the absence of a well-formulated and universally acceptable definition, it is important to consider how hate speech can be defined. Taking an evidence-based approach, this research looks into the issue of defining hate speech in legal scholarship and how and why such a formulation is of critical importance in the prohibition and prosecution of hate speech.

Keywords: hate speech, international human rights law, international criminal law, freedom of expression

Procedia PDF Downloads 39
27862 Design of an Augmented Automatic Choosing Control with Constrained Input by Lyapunov Functions Using Gradient Optimization Automatic Choosing Functions

Authors: Toshinori Nawata

Abstract:

In this paper a nonlinear feedback control called augmented automatic choosing control (AACC) for a class of nonlinear systems with constrained input is presented. When designing the control, a constant term which arises from linearization of a given nonlinear system is treated as a coefficient of a stable zero dynamics. Parameters of the control are suboptimally selected by maximizing the stable region in the sense of Lyapunov with the aid of a genetic algorithm. This approach is applied to a field excitation control problem of power system to demonstrate the splendidness of the AACC. Simulation results show that the new controller can improve performance remarkably well.

Keywords: augmented automatic choosing control, nonlinear control, genetic algorithm, zero dynamics

Procedia PDF Downloads 447
27861 A Comparative Study on Vowel Articulation in Malayalam Speaking Children Using Cochlear Implant

Authors: Deepthy Ann Joy, N. Sreedevi

Abstract:

Hearing impairment (HI) at an early age, identified before the onset of language development can reduce the negative effect on speech and language development of children. Early rehabilitation is very important in the improvement of speech production in children with HI. Other than conventional hearing aids, Cochlear Implants are being used in the rehabilitation of children with HI. However, delay in acquisition of speech and language milestones persist in children with Cochlear Implant (CI). Delay in speech milestones are reflected through speech sound errors. These errors reflect the temporal and spectral characteristics of speech. Hence, acoustical analysis of the speech sounds will provide a better representation of speech production skills in children with CI. The present study aimed at investigating the acoustic characteristics of vowels in Malayalam speaking children with a cochlear implant. The participants of the study consisted of 20 Malayalam speaking children in the age range of four and seven years. The experimental group consisted of 10 children with CI, and the control group consisted of 10 typically developing children. Acoustic analysis was carried out for 5 short (/a/, /i/, /u/, /e/, /o/) and 5 long vowels (/a:/, /i:/, /u:/, /e:/, /o:/) in word-initial position. The responses were recorded and analyzed for acoustic parameters such as Vowel duration, Ratio of the duration of a short and long vowel, Formant frequencies (F₁ and F₂) and Formant Centralization Ratio (FCR) computed using the formula (F₂u+F₂a+F₁i+F₁u)/(F₂i+F₁a). Findings of the present study indicated that the values for vowel duration were higher in experimental group compared to the control group for all the vowels except for /u/. Ratio of duration of short and long vowel was also found to be higher in experimental group compared to control group except for /i/. Further F₁ for all vowels was found to be higher in experimental group with variability noticed in F₂ values. FCR was found be higher in experimental group, indicating vowel centralization. Further, the results of independent t-test revealed no significant difference across the parameters in both the groups. It was found that the spectral and temporal measures in children with CI moved towards normal range. The result emphasizes the significance of early rehabilitation in children with hearing impairment. The role of rehabilitation related aspects are also discussed in detail which can be clinically incorporated for the betterment of speech therapeutic services in children with CI.

Keywords: acoustics, cochlear implant, Malayalam, vowels

Procedia PDF Downloads 117
27860 Multivariate Data Analysis for Automatic Atrial Fibrillation Detection

Authors: Zouhair Haddi, Stephane Delliaux, Jean-Francois Pons, Ismail Kechaf, Jean-Claude De Haro, Mustapha Ouladsine

Abstract:

Atrial fibrillation (AF) has been considered as the most common cardiac arrhythmia, and a major public health burden associated with significant morbidity and mortality. Nowadays, telemedical approaches targeting cardiac outpatients situate AF among the most challenged medical issues. The automatic, early, and fast AF detection is still a major concern for the healthcare professional. Several algorithms based on univariate analysis have been developed to detect atrial fibrillation. However, the published results do not show satisfactory classification accuracy. This work was aimed at resolving this shortcoming by proposing multivariate data analysis methods for automatic AF detection. Four publicly-accessible sets of clinical data (AF Termination Challenge Database, MIT-BIH AF, Normal Sinus Rhythm RR Interval Database, and MIT-BIH Normal Sinus Rhythm Databases) were used for assessment. All time series were segmented in 1 min RR intervals window and then four specific features were calculated. Two pattern recognition methods, i.e., Principal Component Analysis (PCA) and Learning Vector Quantization (LVQ) neural network were used to develop classification models. PCA, as a feature reduction method, was employed to find important features to discriminate between AF and Normal Sinus Rhythm. Despite its very simple structure, the results show that the LVQ model performs better on the analyzed databases than do existing algorithms, with high sensitivity and specificity (99.19% and 99.39%, respectively). The proposed AF detection holds several interesting properties, and can be implemented with just a few arithmetical operations which make it a suitable choice for telecare applications.

Keywords: atrial fibrillation, multivariate data analysis, automatic detection, telemedicine

Procedia PDF Downloads 238
27859 Voice Commands Recognition of Mentor Robot in Noisy Environment Using HTK

Authors: Khenfer-Koummich Fatma, Hendel Fatiha, Mesbahi Larbi

Abstract:

this paper presents an approach based on Hidden Markov Models (HMM: Hidden Markov Model) using HTK tools. The goal is to create a man-machine interface with a voice recognition system that allows the operator to tele-operate a mentor robot to execute specific tasks as rotate, raise, close, etc. This system should take into account different levels of environmental noise. This approach has been applied to isolated words representing the robot commands spoken in two languages: French and Arabic. The recognition rate obtained is the same in both speeches, Arabic and French in the neutral words. However, there is a slight difference in favor of the Arabic speech when Gaussian white noise is added with a Signal to Noise Ratio (SNR) equal to 30 db, the Arabic speech recognition rate is 69% and 80% for French speech recognition rate. This can be explained by the ability of phonetic context of each speech when the noise is added.

Keywords: voice command, HMM, TIMIT, noise, HTK, Arabic, speech recognition

Procedia PDF Downloads 351
27858 The Istrian Istrovenetian-Croatian Bilingual Corpus

Authors: Nada Poropat Jeletic, Gordana Hrzica

Abstract:

Bilingual conversational corpora represent a meaningful and the most comprehensive data source for investigating the genuine contact phenomena in non-monitored bi-lingual speech productions. They can be particularly useful for bilingual research since some features of bilingual interaction can hardly be accessed with more traditional methodologies (e.g., elicitation tasks). The method of language sampling provides the resources for describing language interaction in a bilingual community and/or in bilingual situations (e.g. code-switching, amount of languages used, number of languages used, etc.). To capture these phenomena in genuine communication situations, such sampling should be as close as possible to spontaneous communication. Bilingual spoken corpus design is methodologically demanding. Therefore this paper aims at describing the methodological challenges that apply to the corpus design of the conversational corpus design of the Istrian Istrovenetian-Croatian Bilingual Corpus. Croatian is the first official language of the Croatian-Italian officially bilingual Istria County, while Istrovenetian is a diatopic subvariety of Venetian, a longlasting lingua franca in the Istrian peninsula, the mother tongue of the members of the Italian National Community in Istria and the primary code of informal everyday communication among the Istrian Italophone population. Within the CLARIN infrastructure, TalkBank is being used, as it provides relevant procedures for designing and analyzing bilingual corpora. Furthermore, it allows public availability allows for easy replication of studies and cumulative progress as a research community builds up around the corpus, while the tools developed within the field of corpus linguistics enable easy retrieval and analysis of information. The method of language sampling employed is kept at the level of spontaneous communication, in order to maximise the naturalness of the collected conversational data. All speakers have provided written informed consent in which they agree to be recorded at a random point within the period of one month after signing the consent. Participants are administered a background questionnaire providing information about the socioeconomic status and the exposure and language usage in the participants social networks. Recording data are being transcribed, phonologically adapted within a standard-sized orthographic form, coded and segmented (speech streams are being segmented into communication units based on syntactic criteria) and are being marked following the CHAT transcription system and its associated CLAN suite of programmes within the TalkBank toolkit. The corpus consists of transcribed sound recordings of 36 bilingual speakers, while the target is to publish the whole corpus by the end of 2020, by sampling spontaneous conversations among approximately 100 speakers from all the bilingual areas of Istria for ensuring representativeness (the participants are being recruited across three generations of native bilingual speakers in all the bilingual areas of the peninsula). Conversational corpora are still rare in TalkBank, so the Corpus will contribute to BilingBank as a highly relevant and scientifically reliable resource for an internationally established and active research community. The impact of the research of communities with societal bilingualism will contribute to the growing body of research on bilingualism and multilingualism, especially regarding topics of language dominance, language attrition and loss, interference and code-switching etc.

Keywords: conversational corpora, bilingual corpora, code-switching, language sampling, corpus design methodology

Procedia PDF Downloads 109
27857 Speech Rhythm Variation in Languages and Dialects: F0, Natural and Inverted Speech

Authors: Imen Ben Abda

Abstract:

Languages have been classified into different rhythm classes. 'Stress-timed' languages are exemplified by English, 'syllable-timed' languages by French and 'mora-timed' languages by Japanese. However, to our best knowledge, acoustic studies have not been unanimous in strictly establishing which rhythm category a given language belongs to and failed to show empirical evidence for isochrony. Perception seems to be a good approach to categorize languages into different rhythm classes. This study, within the scope of experimental phonetics, includes an account of different perceptual experiments using cues from natural and inverted speech, as well as pitch extracted from speech data. It is an attempt to categorize speech rhythm over a large set of Arabic (Tunisian, Algerian, Lebanese and Moroccan) and English dialects (Welsh, Irish, Scottish and Texan) as well as other languages such as Chinese, Japanese, French, and German. Listeners managed to classify the different languages and dialects into different rhythm classes using suprasegmental cues mainly rhythm and pitch (F0). They also perceived rhythmic differences even among languages and dialects belonging to the same rhythm class. This may show that there are different subclasses within very broad rhythmic typologies.

Keywords: F0, inverted speech, mora-timing, rhythm variation, stress-timing, syllable-timing

Procedia PDF Downloads 479
27856 Effects of Exposing Learners to Speech Acts in the German Teaching Material Schritte International: The Case of Requests

Authors: Wan-Lin Tsai

Abstract:

Speech act of requests is an important issue in the field of language learning and teaching because we cannot avoid making requesting in our daily life. This study examined whether or not the subjects who were freshmen and majored in German at Wenzao University of Languages were able to use the linguistic forms which they had learned from their course book Schritte International to make appropriate requests through dialogue completed tasks (DCT). The results revealed that the majority of the subjects were unable to use the forms to make appropriate requests in German due to the lack of explicit instructions. Furthermore, Chinese interference was observed in students' productions. Explicit instructions in speech acts are strongly recommended.

Keywords: Chinese interference, German pragmatics, German teaching, make appropriate requests in German, speech act of requesting

Procedia PDF Downloads 437
27855 Improved Feature Extraction Technique for Handling Occlusion in Automatic Facial Expression Recognition

Authors: Khadijat T. Bamigbade, Olufade F. W. Onifade

Abstract:

The field of automatic facial expression analysis has been an active research area in the last two decades. Its vast applicability in various domains has drawn so much attention into developing techniques and dataset that mirror real life scenarios. Many techniques such as Local Binary Patterns and its variants (CLBP, LBP-TOP) and lately, deep learning techniques, have been used for facial expression recognition. However, the problem of occlusion has not been sufficiently handled, making their results not applicable in real life situations. This paper develops a simple, yet highly efficient method tagged Local Binary Pattern-Histogram of Gradient (LBP-HOG) with occlusion detection in face image, using a multi-class SVM for Action Unit and in turn expression recognition. Our method was evaluated on three publicly available datasets which are JAFFE, CK, SFEW. Experimental results showed that our approach performed considerably well when compared with state-of-the-art algorithms and gave insight to occlusion detection as a key step to handling expression in wild.

Keywords: automatic facial expression analysis, local binary pattern, LBP-HOG, occlusion detection

Procedia PDF Downloads 139
27854 Childhood Apraxia of Speech and Autism: Interaction Influences and Treatment

Authors: Elad Vashdi

Abstract:

It is common to find speech deficit among children diagnosed with Autism. It can be found in the clinical field and recently in research. One of the DSM-V criteria suggests a speech delay (Delay in, or total lack of, the development of spoken language), but doesn't explain the cause of it. A common perception among professionals and families is that the inability to talk results from the autism. Autism is a name for a syndrome which just describes a phenomenon and is defined behaviorally. Since it is not based yet on a physiological gold standard, one can not conclude the nature of a deficit based on the name of the syndrome. A wide retrospective research (n=270) which included children with motor speech difficulties was conducted in Israel. The study analyzed entry evaluations in a private clinic during the years 2006-2013. The data was extracted from the reports. High percentage of children diagnosed with Autism (60%) was found. This result demonstrates the high relationship between Autism and motor speech problem. It also supports recent findings in research of Childhood apraxia of speech (CAS) occurrence among children with ASD. Only small percentage of the participants in this research (10%) were diagnosed with CAS even though their verbal deficits well fitted the guidelines for CAS diagnosis set by ASHA in 2007. This fact raises questions regarding the diagnostic procedure in Israel. The understanding that CAS might highly exist within Autism and can have a remarkable influence on the course of early development should be a guiding tool within the diagnosis procedure. CAS can explain the nature of the speech problem among some of the autistic children and guide the treatment in a more accurate way. Calculating the prevalence of CAS which includes the comorbidity with ASD reveals new numbers and suggests treating differently the CAS population.

Keywords: childhood apraxia of speech, Autism, treatment, speech

Procedia PDF Downloads 249
27853 Automatic Early Breast Cancer Segmentation Enhancement by Image Analysis and Hough Transform

Authors: David Jurado, Carlos Ávila

Abstract:

Detection of early signs of breast cancer development is crucial to quickly diagnose the disease and to define adequate treatment to increase the survival probability of the patient. Computer Aided Detection systems (CADs), along with modern data techniques such as Machine Learning (ML) and Neural Networks (NN), have shown an overall improvement in digital mammography cancer diagnosis, reducing the false positive and false negative rates becoming important tools for the diagnostic evaluations performed by specialized radiologists. However, ML and NN-based algorithms rely on datasets that might bring issues to the segmentation tasks. In the present work, an automatic segmentation and detection algorithm is described. This algorithm uses image processing techniques along with the Hough transform to automatically identify microcalcifications that are highly correlated with breast cancer development in the early stages. Along with image processing, automatic segmentation of high-contrast objects is done using edge extraction and circle Hough transform. This provides the geometrical features needed for an automatic mask design which extracts statistical features of the regions of interest. The results shown in this study prove the potential of this tool for further diagnostics and classification of mammographic images due to the low sensitivity to noisy images and low contrast mammographies.

Keywords: breast cancer, segmentation, X-ray imaging, hough transform, image analysis

Procedia PDF Downloads 44
27852 An Early Attempt of Artificial Intelligence-Assisted Language Oral Practice and Assessment

Authors: Paul Lam, Kevin Wong, Chi Him Chan

Abstract:

Constant practicing and accurate, immediate feedback are the keys to improving students’ speaking skills. However, traditional oral examination often fails to provide such opportunities to students. The traditional, face-to-face oral assessment is often time consuming – attending the oral needs of one student often leads to the negligence of others. Hence, teachers can only provide limited opportunities and feedback to students. Moreover, students’ incentive to practice is also reduced by their anxiety and shyness in speaking the new language. A mobile app was developed to use artificial intelligence (AI) to provide immediate feedback to students’ speaking performance as an attempt to solve the above-mentioned problems. Firstly, it was thought that online exercises would greatly increase the learning opportunities of students as they can now practice more without the needs of teachers’ presence. Secondly, the automatic feedback provided by the AI would enhance students’ motivation to practice as there is an instant evaluation of their performance. Lastly, students should feel less anxious and shy compared to directly practicing oral in front of teachers. Technically, the program made use of speech-to-text functions to generate feedback to students. To be specific, the software analyzes students’ oral input through certain speech-to-text AI engine and then cleans up the results further to the point that can be compared with the targeted text. The mobile app has invited English teachers for the pilot use and asked for their feedback. Preliminary trials indicated that the approach has limitations. Many of the users’ pronunciation were automatically corrected by the speech recognition function as wise guessing is already integrated into many of such systems. Nevertheless, teachers have confidence that the app can be further improved for accuracy. It has the potential to significantly improve oral drilling by giving students more chances to practice. Moreover, they believe that the success of this mobile app confirms the potential to extend the AI-assisted assessment to other language skills, such as writing, reading, and listening.

Keywords: artificial Intelligence, mobile learning, oral assessment, oral practice, speech-to-text function

Procedia PDF Downloads 80
27851 Personal Information Classification Based on Deep Learning in Automatic Form Filling System

Authors: Shunzuo Wu, Xudong Luo, Yuanxiu Liao

Abstract:

Recently, the rapid development of deep learning makes artificial intelligence (AI) penetrate into many fields, replacing manual work there. In particular, AI systems also become a research focus in the field of automatic office. To meet real needs in automatic officiating, in this paper we develop an automatic form filling system. Specifically, it uses two classical neural network models and several word embedding models to classify various relevant information elicited from the Internet. When training the neural network models, we use less noisy and balanced data for training. We conduct a series of experiments to test my systems and the results show that our system can achieve better classification results.

Keywords: artificial intelligence and office, NLP, deep learning, text classification

Procedia PDF Downloads 151
27850 VIAN-DH: Computational Multimodal Conversation Analysis Software and Infrastructure

Authors: Teodora Vukovic, Christoph Hottiger, Noah Bubenhofer

Abstract:

The development of VIAN-DH aims at bridging two linguistic approaches: conversation analysis/interactional linguistics (IL), so far a dominantly qualitative field, and computational/corpus linguistics and its quantitative and automated methods. Contemporary IL investigates the systematic organization of conversations and interactions composed of speech, gaze, gestures, and body positioning, among others. These highly integrated multimodal behaviour is analysed based on video data aimed at uncovering so called “multimodal gestalts”, patterns of linguistic and embodied conduct that reoccur in specific sequential positions employed for specific purposes. Multimodal analyses (and other disciplines using videos) are so far dependent on time and resource intensive processes of manual transcription of each component from video materials. Automating these tasks requires advanced programming skills, which is often not in the scope of IL. Moreover, the use of different tools makes the integration and analysis of different formats challenging. Consequently, IL research often deals with relatively small samples of annotated data which are suitable for qualitative analysis but not enough for making generalized empirical claims derived quantitatively. VIAN-DH aims to create a workspace where many annotation layers required for the multimodal analysis of videos can be created, processed, and correlated in one platform. VIAN-DH will provide a graphical interface that operates state-of-the-art tools for automating parts of the data processing. The integration of tools that already exist in computational linguistics and computer vision, facilitates data processing for researchers lacking programming skills, speeds up the overall research process, and enables the processing of large amounts of data. The main features to be introduced are automatic speech recognition for the transcription of language, automatic image recognition for extraction of gestures and other visual cues, as well as grammatical annotation for adding morphological and syntactic information to the verbal content. In the ongoing instance of VIAN-DH, we focus on gesture extraction (pointing gestures, in particular), making use of existing models created for sign language and adapting them for this specific purpose. In order to view and search the data, VIAN-DH will provide a unified format and enable the import of the main existing formats of annotated video data and the export to other formats used in the field, while integrating different data source formats in a way that they can be combined in research. VIAN-DH will adapt querying methods from corpus linguistics to enable parallel search of many annotation levels, combining token-level and chronological search for various types of data. VIAN-DH strives to bring crucial and potentially revolutionary innovation to the field of IL, (that can also extend to other fields using video materials). It will allow the processing of large amounts of data automatically and, the implementation of quantitative analyses, combining it with the qualitative approach. It will facilitate the investigation of correlations between linguistic patterns (lexical or grammatical) with conversational aspects (turn-taking or gestures). Users will be able to automatically transcribe and annotate visual, spoken and grammatical information from videos, and to correlate those different levels and perform queries and analyses.

Keywords: multimodal analysis, corpus linguistics, computational linguistics, image recognition, speech recognition

Procedia PDF Downloads 74
27849 A Mixing Matrix Estimation Algorithm for Speech Signals under the Under-Determined Blind Source Separation Model

Authors: Jing Wu, Wei Lv, Yibing Li, Yuanfan You

Abstract:

The separation of speech signals has become a research hotspot in the field of signal processing in recent years. It has many applications and influences in teleconferencing, hearing aids, speech recognition of machines and so on. The sounds received are usually noisy. The issue of identifying the sounds of interest and obtaining clear sounds in such an environment becomes a problem worth exploring, that is, the problem of blind source separation. This paper focuses on the under-determined blind source separation (UBSS). Sparse component analysis is generally used for the problem of under-determined blind source separation. The method is mainly divided into two parts. Firstly, the clustering algorithm is used to estimate the mixing matrix according to the observed signals. Then the signal is separated based on the known mixing matrix. In this paper, the problem of mixing matrix estimation is studied. This paper proposes an improved algorithm to estimate the mixing matrix for speech signals in the UBSS model. The traditional potential algorithm is not accurate for the mixing matrix estimation, especially for low signal-to noise ratio (SNR).In response to this problem, this paper considers the idea of an improved potential function method to estimate the mixing matrix. The algorithm not only avoids the inuence of insufficient prior information in traditional clustering algorithm, but also improves the estimation accuracy of mixing matrix. This paper takes the mixing of four speech signals into two channels as an example. The results of simulations show that the approach in this paper not only improves the accuracy of estimation, but also applies to any mixing matrix.

Keywords: DBSCAN, potential function, speech signal, the UBSS model

Procedia PDF Downloads 105
27848 Understanding the Motivations behind the Assassination of Turkish Armenian Journalist, Hrant Dink

Authors: Nusret Mesut Sahin

Abstract:

Hrant Dink, a prominent Turkish-Armenian journalist, and editor-in-chief of the bilingual Turkish-Armenian newspaper Agos was assassinated in Istanbul on January 19th, 2007 by a nationalist extremist, Ogun Samast. Dink had been voicing the atrocities against the Armenians between 1915 and 1922 during the Ottoman rule, and his comments on the issue appeared in the Turkish media many times before his assassination. It has been argued that the suffocating atmosphere created by the Turkish news media targeting Mr. Dink made him a target of an extremist Turkish juvenile. This study analyzes the media news to understand and explain why Hrant Dink became the target of a nationalist extremist. In this research, content analysis of news articles (N= 170) is conducted to identify whether there is a link between hate speech against Hrant Dink in the Turkish media and his assassination. The content of the newspaper articles is categorized and coded according to the hate language being used. The analysis suggested that Turkish media paved the way for Dink’s assassination. Hate speech against Hrant Dink on the media had risen gradually before the assassination. The study also found that the number of news stories covering hate speech and racist discourse against non-Muslim citizens of Turkey also increased dramatically before the assassination. Therefore, hate speech against minorities in media narratives and news reports should be monitored, and political figures or leaders of social groups who are targeted by some media outlets should be protected.

Keywords: Hrant Dink, assassination, Turkish Armenian journalist, media

Procedia PDF Downloads 134
27847 Speech Motor Processing and Animal Sound Communication

Authors: Ana Cleide Vieira Gomes Guimbal de Aquino

Abstract:

Sound communication is present in most vertebrates, from fish, mainly in species that live in murky waters, to some species of reptiles, anuran amphibians, birds, and mammals, including primates. There are, in fact, relevant similarities between human language and animal sound communication, and among these similarities are the vocalizations called calls. The first specific call in human babies is crying, which has a characteristic prosodic contour and is motivated most of the time by the need for food and by affecting the puppy-caregiver interaction, with a view to communicating the necessities and food requests and guaranteeing the survival of the species. The present work aims to articulate speech processing in the motor context with aspects of the project entitled emotional states and vocalization: a comparative study of the prosodic contours of crying in human and non-human animals. First, concepts of speech motor processing and general aspects of speech evolution will be presented to relate these two approaches to animal sound communication.

Keywords: speech motor processing, animal communication, animal behaviour, language acquisition

Procedia PDF Downloads 57
27846 Fallacies of Argumentation in Modern American Political Discourse

Authors: Zarine Avetisyan

Abstract:

The process of speech production and transmission naturally implies the occurrence of certain defective assumptions and erroneous formulations which may be both spontaneous, caused by haste, carelessness, etc., or deliberate. Whether deliberate or not, fallacies always act by way of “faux pas”. In the latter case, we deal with fake or deceptive arguments which are the focus of the given paper. The paper departs from the assumption that fallacies are arguments that prove nothing. Additionally and more importantly, political discourse becomes the main domain for scholarly “cultivation” while pinning down fallacies. The fallacy of telling the truth but deliberately omitting important key details in order to falsify the larger picture called “the half truth” captures special attention in the given paper.

Keywords: break in the information chain, fallacy, half truth, political discourse

Procedia PDF Downloads 353
27845 Preliminary Study of the Phonological Development in Three and Four Year Old Bulgarian Children

Authors: Tsvetomira Braynova, Miglena Simonska

Abstract:

The article presents the results of research on phonological processes in three and four-year-old children. For the purpose of the study, an author's test was developed and conducted among 120 children. The study included three areas of research - at the level of words (96 words), at the level of sentence repetition (10 sentences) and at the level of generating own speech from a picture (15 pictures). The test also gives us additional information about the articulation errors of the assessed children. The main purpose of the icing is to analyze all phonological processes that occur at this age in Bulgarian children and to identify which are typical and atypical for this age. The results show that the most common phonology errors that children make are: sound substitution, an elision of sound, metathesis of sound, elision of a syllable, and elision of consonants clustered in a syllable. All examined children were identified with the articulatory disorder from type bilabial lambdacism. Measuring the correlation between the average length of repeated speech and the average length of generated speech, the analysis proves that the more words a child can repeat in part “repeated speech,” the more words they can be expected to generate in part “generating sentence.” The results of this study show that the task of naming a word provides sufficient and representative information to assess the child's phonology.

Keywords: assessment, phonology, articulation, speech-language development

Procedia PDF Downloads 149
27844 The Analysis of Deceptive and Truthful Speech: A Computational Linguistic Based Method

Authors: Seham El Kareh, Miramar Etman

Abstract:

Recently, detecting liars and extracting features which distinguish them from truth-tellers have been the focus of a wide range of disciplines. To the author’s best knowledge, most of the work has been done on facial expressions and body gestures but only few works have been done on the language used by both liars and truth-tellers. This paper sheds light on four axes. The first axis copes with building an audio corpus for deceptive and truthful speech for Egyptian Arabic speakers. The second axis focuses on examining the human perception of lies and proving our need for computational linguistic-based methods to extract features which characterize truthful and deceptive speech. The third axis is concerned with building a linguistic analysis program that could extract from the corpus the inter- and intra-linguistic cues for deceptive and truthful speech. The program built here is based on selected categories from the Linguistic Inquiry and Word Count program. Our results demonstrated that Egyptian Arabic speakers on one hand preferred to use first-person pronouns and present tense compared to the past tense when lying and their lies lacked of second-person pronouns, and on the other hand, when telling the truth, they preferred to use the verbs related to motion and the nouns related to time. The results also showed that there is a need for bigger data to prove the significance of words related to emotions and numbers.

Keywords: Egyptian Arabic corpus, computational analysis, deceptive features, forensic linguistics, human perception, truthful features

Procedia PDF Downloads 180