Search results for: audio-visual speech recognition
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2264

Search results for: audio-visual speech recognition

1904 Deficient Multisensory Integration with Concomitant Resting-State Connectivity in Adult Attention Deficit/Hyperactivity Disorder (ADHD)

Authors: Marcel Schulze, Behrem Aslan, Silke Lux, Alexandra Philipsen

Abstract:

Objective: Patients with Attention Deficit/Hyperactivity Disorder (ADHD) often report that they are being flooded by sensory impressions. Studies investigating sensory processing show hypersensitivity for sensory inputs across the senses in children and adults with ADHD. Especially the auditory modality is affected by deficient acoustical inhibition and modulation of signals. While studying unimodal signal-processing is relevant and well-suited in a controlled laboratory environment, everyday life situations occur multimodal. A complex interplay of the senses is necessary to form a unified percept. In order to achieve this, the unimodal sensory modalities are bound together in a process called multisensory integration (MI). In the current study we investigate MI in an adult ADHD sample using the McGurk-effect – a well-known illusion where incongruent speech like phonemes lead in case of successful integration to a new perceived phoneme via late top-down attentional allocation . In ADHD neuronal dysregulation at rest e.g., aberrant within or between network functional connectivity may also account for difficulties in integrating across the senses. Therefore, the current study includes resting-state functional connectivity to investigate a possible relation of deficient network connectivity and the ability of stimulus integration. Method: Twenty-five ADHD patients (6 females, age: 30.08 (SD:9,3) years) and twenty-four healthy controls (9 females; age: 26.88 (SD: 6.3) years) were recruited. MI was examined using the McGurk effect, where - in case of successful MI - incongruent speech-like phonemes between visual and auditory modality are leading to a perception of a new phoneme. Mann-Whitney-U test was applied to assess statistical differences between groups. Echo-planar imaging-resting-state functional MRI was acquired on a 3.0 Tesla Siemens Magnetom MR scanner. A seed-to-voxel analysis was realized using the CONN toolbox. Results: Susceptibility to McGurk was significantly lowered for ADHD patients (ADHDMdn:5.83%, ControlsMdn:44.2%, U= 160.5, p=0.022, r=-0.34). When ADHD patients integrated phonemes, reaction times were significantly longer (ADHDMdn:1260ms, ControlsMdn:582ms, U=41.0, p<.000, r= -0.56). In functional connectivity medio temporal gyrus (seed) was negatively associated with primary auditory cortex, inferior frontal gyrus, precentral gyrus, and fusiform gyrus. Conclusion: MI seems to be deficient for ADHD patients for stimuli that need top-down attentional allocation. This finding is supported by stronger functional connectivity from unimodal sensory areas to polymodal, MI convergence zones for complex stimuli in ADHD patients.

Keywords: attention-deficit hyperactivity disorder, audiovisual integration, McGurk-effect, resting-state functional connectivity

Procedia PDF Downloads 100
1903 Teaching Pragmatic Coherence in Literary Text: Analysis of Chimamanda Adichie’s Americanah

Authors: Joy Aworo-Okoroh

Abstract:

Literary texts are mirrors of a real-life situation. Thus, authors choose the linguistic items that would best encode their intended meanings and messages. However, words mean more than they seem. The meaning of words is not static rather, it is dynamic as they constantly enter into relationships within a context. Literary texts can only be meaningful if all pragmatic cues are identified and interpreted. Drawing upon Teun Van Djik's theory of local pragmatic coherence, it is established that words enter into relations in a text and these relations account for sequential speech acts in the texts. Comprehension of the text is dependent on the interpretation of these relations.To show the relevance of pragmatic coherence in literary text analysis, ten conversations were selected in Americanah in order to give a clear idea of the pragmatic relations used. The conversations were analysed, identifying the speech act and epistemic relations inherent in them. A subtle analysis of the structure of the conversations was also carried out. It was discovered that justification is the most commonly used relation and the meaning of the text is dependent on the interpretation of these instances' pragmatic coherence. The study concludes that to effectively teach literature in English, pragmatic coherence should be incorporated as words mean more than they say.

Keywords: pragmatic coherence, epistemic coherence, speech act, Americanah

Procedia PDF Downloads 110
1902 Deep-Learning to Generation of Weights for Image Captioning Using Part-of-Speech Approach

Authors: Tiago do Carmo Nogueira, Cássio Dener Noronha Vinhal, Gélson da Cruz Júnior, Matheus Rudolfo Diedrich Ullmann

Abstract:

Generating automatic image descriptions through natural language is a challenging task. Image captioning is a task that consistently describes an image by combining computer vision and natural language processing techniques. To accomplish this task, cutting-edge models use encoder-decoder structures. Thus, Convolutional Neural Networks (CNN) are used to extract the characteristics of the images, and Recurrent Neural Networks (RNN) generate the descriptive sentences of the images. However, cutting-edge approaches still suffer from problems of generating incorrect captions and accumulating errors in the decoders. To solve this problem, we propose a model based on the encoder-decoder structure, introducing a module that generates the weights according to the importance of the word to form the sentence, using the part-of-speech (PoS). Thus, the results demonstrate that our model surpasses state-of-the-art models.

Keywords: gated recurrent units, caption generation, convolutional neural network, part-of-speech

Procedia PDF Downloads 73
1901 Logistic Model Tree and Expectation-Maximization for Pollen Recognition and Grouping

Authors: Endrick Barnacin, Jean-Luc Henry, Jack Molinié, Jimmy Nagau, Hélène Delatte, Gérard Lebreton

Abstract:

Palynology is a field of interest for many disciplines. It has multiple applications such as chronological dating, climatology, allergy treatment, and even honey characterization. Unfortunately, the analysis of a pollen slide is a complicated and time-consuming task that requires the intervention of experts in the field, which is becoming increasingly rare due to economic and social conditions. So, the automation of this task is a necessity. Pollen slides analysis is mainly a visual process as it is carried out with the naked eye. That is the reason why a primary method to automate palynology is the use of digital image processing. This method presents the lowest cost and has relatively good accuracy in pollen retrieval. In this work, we propose a system combining recognition and grouping of pollen. It consists of using a Logistic Model Tree to classify pollen already known by the proposed system while detecting any unknown species. Then, the unknown pollen species are divided using a cluster-based approach. Success rates for the recognition of known species have been achieved, and automated clustering seems to be a promising approach.

Keywords: pollen recognition, logistic model tree, expectation-maximization, local binary pattern

Procedia PDF Downloads 156
1900 Complications and Outcomes of Cochlear Implantation in Children Younger than 12 Months: A Multicenter Study

Authors: Alimohamad Asghari, Ahmad Daneshi, Mohammad Farhadi, Arash Bayat, Mohammad Ajalloueyan, Marjan Mirsalehi, Mohsen Rajati, Seyed Basir Hashemi, Nader Saki, Ali Omidvari

Abstract:

Evidence suggests that Cochlear Implantation (CI) is a beneficial approach for auditory and speech skills improvement in children with severe to profound hearing loss. However, it remains controversial if implantation in children <12 months is safe and effective compared to older children. The present study aimed to determine whether children's ages affect surgical complications and auditory and speech development. The current multicenter study enrolled 86 children who underwent CI surgery at <12 months of age (group A) and 362 children who underwent implantation between 12 and 24 months of age (group B). The Categories of Auditory Performance (CAP) and Speech Intelligibility Rating (SIR) scores were determined pre-impanation, and "one-year" and "two-year" post-implantation. Four complications (overall rate: 4.65%; three minor) occurred in group A and 12 complications (overall rate: 4.41%; nine minor) occurred in group B. We found no statistically significant difference in the complication rates between the groups (p>0.05). The mean SIR and CAP scores improved over time following CI activation in both groups. However, we did not find significant differences in CAP and SIR scores between the groups across different time points. Cochlear implantation is a safe and efficient procedure in children younger than 12 months, providing substantial auditory and speech benefits comparable to children undergoing implantation at 12 to 24 months of age. Furthermore, surgical complications in younger children are similar to those of children undergoing the CI at an older age.

Keywords: cochlear implant, Infant, complications, outcome

Procedia PDF Downloads 78
1899 Tetracycline as Chemosensor for Simultaneous Recognition of Al³⁺: Application to Bio-Imaging for Living Cells

Authors: Jesus Alfredo Ortega Granados, Pandiyan Thangarasu

Abstract:

Antibiotic tetracycline presents as a micro-contaminant in fresh water, wastewater and soils, causing environmental and health problems. In this work, tetracycline (TC) has been employed as chemo-sensor for the recognition of Al³⁺ without interring other ions, and the results show that it enhances the fluorescence intensity for Al³⁺ and there is no interference from other coexisting cation ions (Cd²⁺, Ni²⁺, Co²⁺, Sr²⁺, Mg²⁺, Fe³⁺, K⁺, Sm³⁺, Ag⁺, Na⁺, Ba²⁺, Zn²⁺, and Mn²⁺). For the addition of Cu²⁺ to [TET-Al³⁺], it appears that the intensity of fluorescence has been quenched. Other combinations of metal ions in addition to TC do not change the fluorescence behavior. The stoichiometry determined by Job´s plot for the interaction of TC with Al³⁺ was found to be 1:1. Importantly, the detection of Al³⁺⁺ successfully employed in the real samples like living cells, and it was found that TC efficiently performs as a fluorescent probe for Al³⁺ ion in living systems, especially in Saccharomyces cerevisiae; this is confirmed by confocal laser scanning microscopy.

Keywords: chemo-sensor, recognition of Al³⁺ ion, Saccharomyces cerevisiae, tetracycline,

Procedia PDF Downloads 157
1898 Recognition of Objects in a Maritime Environment Using a Combination of Pre- and Post-Processing of the Polynomial Fit Method

Authors: R. R. Hordijk, O. J. G. Somsen

Abstract:

Traditionally, radar systems are the eyes and ears of a ship. However, these systems have their drawbacks and nowadays they are extended with systems that work with video and photos. Processing of data from these videos and photos is however very labour-intensive and efforts are being made to automate this process. A major problem when trying to recognize objects in water is that the 'background' is not homogeneous so that traditional image recognition technics do not work well. Main question is, can a method be developed which automate this recognition process. There are a large number of parameters involved to facilitate the identification of objects on such images. One is varying the resolution. In this research, the resolution of some images has been reduced to the extreme value of 1% of the original to reduce clutter before the polynomial fit (pre-processing). It turned out that the searched object was clearly recognizable as its grey value was well above the average. Another approach is to take two images of the same scene shortly after each other and compare the result. Because the water (waves) fluctuates much faster than an object floating in the water one can expect that the object is the only stable item in the two images. Both these methods (pre-processing and comparing two images of the same scene) delivered useful results. Though it is too early to conclude that with these methods all image problems can be solved they are certainly worthwhile for further research.

Keywords: image processing, image recognition, polynomial fit, water

Procedia PDF Downloads 510
1897 Stereotypical Motor Movement Recognition Using Microsoft Kinect with Artificial Neural Network

Authors: M. Jazouli, S. Elhoufi, A. Majda, A. Zarghili, R. Aalouane

Abstract:

Autism spectrum disorder is a complex developmental disability. It is defined by a certain set of behaviors. Persons with Autism Spectrum Disorders (ASD) frequently engage in stereotyped and repetitive motor movements. The objective of this article is to propose a method to automatically detect this unusual behavior. Our study provides a clinical tool which facilitates for doctors the diagnosis of ASD. We focus on automatic identification of five repetitive gestures among autistic children in real time: body rocking, hand flapping, fingers flapping, hand on the face and hands behind back. In this paper, we present a gesture recognition system for children with autism, which consists of three modules: model-based movement tracking, feature extraction, and gesture recognition using artificial neural network (ANN). The first one uses the Microsoft Kinect sensor, the second one chooses points of interest from the 3D skeleton to characterize the gestures, and the last one proposes a neural connectionist model to perform the supervised classification of data. The experimental results show that our system can achieve above 93.3% recognition rate.

Keywords: ASD, artificial neural network, kinect, stereotypical motor movements

Procedia PDF Downloads 281
1896 Accuracy Improvement of Traffic Participant Classification Using Millimeter-Wave Radar by Leveraging Simulator Based on Domain Adaptation

Authors: Tokihiko Akita, Seiichi Mita

Abstract:

A millimeter-wave radar is the most robust against adverse environments, making it an essential environment recognition sensor for automated driving. However, the reflection signal is sparse and unstable, so it is difficult to obtain the high recognition accuracy. Deep learning provides high accuracy even for them in recognition, but requires large scale datasets with ground truth. Specially, it takes a lot of cost to annotate for a millimeter-wave radar. For the solution, utilizing a simulator that can generate an annotated huge dataset is effective. Simulation of the radar is more difficult to match with real world data than camera image, and recognition by deep learning with higher-order features using the simulator causes further deviation. We have challenged to improve the accuracy of traffic participant classification by fusing simulator and real-world data with domain adaptation technique. Experimental results with the domain adaptation network created by us show that classification accuracy can be improved even with a few real-world data.

Keywords: millimeter-wave radar, object classification, deep learning, simulation, domain adaptation

Procedia PDF Downloads 66
1895 On the Network Packet Loss Tolerance of SVM Based Activity Recognition

Authors: Gamze Uslu, Sebnem Baydere, Alper K. Demir

Abstract:

In this study, data loss tolerance of Support Vector Machines (SVM) based activity recognition model and multi activity classification performance when data are received over a lossy wireless sensor network is examined. Initially, the classification algorithm we use is evaluated in terms of resilience to random data loss with 3D acceleration sensor data for sitting, lying, walking and standing actions. The results show that the proposed classification method can recognize these activities successfully despite high data loss. Secondly, the effect of differentiated quality of service performance on activity recognition success is measured with activity data acquired from a multi hop wireless sensor network, which introduces high data loss. The effect of number of nodes on the reliability and multi activity classification success is demonstrated in simulation environment. To the best of our knowledge, the effect of data loss in a wireless sensor network on activity detection success rate of an SVM based classification algorithm has not been studied before.

Keywords: activity recognition, support vector machines, acceleration sensor, wireless sensor networks, packet loss

Procedia PDF Downloads 447
1894 Conspiracy Theory in Discussions of the Coronavirus Pandemic in the Gulf Region

Authors: Rasha Salameh

Abstract:

In light of the tense relationship between Saudi Arabia and Iran, this research paper sheds some light on Al-Arabiya’s reporting of Coronavirus in the Gulf. Particularly because most of the cases, in the beginning, were coming from Iran, some programs of this Saudi channel embraced a conspiracy theory. Hate speech has been used in talking about the topic and discussing it. The results of these discussions will be detailed in this paper in percentages with regard to the research sample, which includes five programs on Al-Arabiya channel: ‘DNA’, ‘Marraya’ (Mirrors), ‘Panorama’, ‘Tafaolcom’ (Your Interaction) and the ‘Diplomatic Street’, in the period between January 19, that is, the date of the first case in Iran, and April 10, 2020. The research shows the use of a conspiracy theory in the programs, in addition to some professional violations. The surveyed sample also shows that the matter receded due to the Arab Gulf states' preoccupation with the successively increasing cases that have appeared there since the start of the pandemic. The results indicate that hate speech was present in the sample at a rate of 98.1% and that most of the programs that dealt with the Iranian issue under the Corona pandemic on Al Arabiya used the conspiracy theory at a rate of 75.5%.

Keywords: Al-Arabiya, Iran, Corona, hate speech, conspiracy theory, politicization of the pandemic

Procedia PDF Downloads 107
1893 Keyframe Extraction Using Face Quality Assessment and Convolution Neural Network

Authors: Rahma Abed, Sahbi Bahroun, Ezzeddine Zagrouba

Abstract:

Due to the huge amount of data in videos, extracting the relevant frames became a necessity and an essential step prior to performing face recognition. In this context, we propose a method for extracting keyframes from videos based on face quality and deep learning for a face recognition task. This method has two steps. We start by generating face quality scores for each face image based on the use of three face feature extractors, including Gabor, LBP, and HOG. The second step consists in training a Deep Convolutional Neural Network in a supervised manner in order to select the frames that have the best face quality. The obtained results show the effectiveness of the proposed method compared to the methods of the state of the art.

Keywords: keyframe extraction, face quality assessment, face in video recognition, convolution neural network

Procedia PDF Downloads 198
1892 Reduced Lung Volume: A Possible Cause of Stuttering

Authors: Shantanu Arya, Sachin Sakhuja, Gunjan Mehta, Sanjay Munjal

Abstract:

Stuttering may be defined as a speech disorder affecting the fluency domain of speech and characterized by covert features like word substitution, omittance and circumlocution and overt features like prolongation of sound, syllables and blocks etc. Many etiologies have been postulated to explain stuttering based on various experiments and research. Moreover, Breathlessness has also been reported by many individuals with stuttering for which breathing exercises are generally advised. However, no studies reporting objective evaluation of the pulmonary capacity and further objective assessment of the efficacy of breathing exercises have been conducted. Pulmonary Function Test which evaluates parameters like Forced Vital Capacity, Peak Expiratory Flow Rate, Forced expiratory flow Rate can be used to study the pulmonary behavior of individuals with stuttering. The study aimed: a) To identify speech motor & physiologic behaviours associated with stuttering by administering PFT. b) To recognize possible reasons for an association between speech motor behaviour & stuttering severity. In this regard, PFT tests were administered on individuals who reported signs and symptoms of stuttering and showed abnormal scores on Stuttering Severity Index. Parameters like Forced Vital Capacity, Forced Expiratory Volume, Peak Expiratory Flow Rate (L/min), Forced Expiratory Flow Rate (L/min) were evaluated and correlated with scores of Stuttering Severity Index. Results showed significant decrease in the parameters (lower than normal scores) in individuals with established stuttering. Strong correlation was also found between degree of stuttering and the degree of decrease in the pulmonary volumes. Thus, it is evident that fluent speech requires strong support of lung pressure and requisite volumes. Further research in demonstrating the efficacy of abdominal breathing exercises in this regard is needed.

Keywords: forced expiratory flow rate, forced expiratory volume, forced vital capacity, peak expiratory flow rate, stuttering

Procedia PDF Downloads 245
1891 Arabic Light Word Analyser: Roles with Deep Learning Approach

Authors: Mohammed Abu Shquier

Abstract:

This paper introduces a word segmentation method using the novel BP-LSTM-CRF architecture for processing semantic output training. The objective of web morphological analysis tools is to link a formal morpho-syntactic description to a lemma, along with morpho-syntactic information, a vocalized form, a vocalized analysis with morpho-syntactic information, and a list of paradigms. A key objective is to continuously enhance the proposed system through an inductive learning approach that considers semantic influences. The system is currently under construction and development based on data-driven learning. To evaluate the tool, an experiment on homograph analysis was conducted. The tool also encompasses the assumption of deep binary segmentation hypotheses, the arbitrary choice of trigram or n-gram continuation probabilities, language limitations, and morphology for both Modern Standard Arabic (MSA) and Dialectal Arabic (DA), which provide justification for updating this system. Most Arabic word analysis systems are based on the phonotactic morpho-syntactic analysis of a word transmitted using lexical rules, which are mainly used in MENA language technology tools, without taking into account contextual or semantic morphological implications. Therefore, it is necessary to have an automatic analysis tool taking into account the word sense and not only the morpho-syntactic category. Moreover, they are also based on statistical/stochastic models. These stochastic models, such as HMMs, have shown their effectiveness in different NLP applications: part-of-speech tagging, machine translation, speech recognition, etc. As an extension, we focus on language modeling using Recurrent Neural Network (RNN); given that morphological analysis coverage was very low in dialectal Arabic, it is significantly important to investigate deeply how the dialect data influence the accuracy of these approaches by developing dialectal morphological processing tools to show that dialectal variability can support to improve analysis.

Keywords: NLP, DL, ML, analyser, MSA, RNN, CNN

Procedia PDF Downloads 9
1890 The Analysis of Deceptive and Truthful Speech: A Computational Linguistic Based Method

Authors: Seham El Kareh, Miramar Etman

Abstract:

Recently, detecting liars and extracting features which distinguish them from truth-tellers have been the focus of a wide range of disciplines. To the author’s best knowledge, most of the work has been done on facial expressions and body gestures but only few works have been done on the language used by both liars and truth-tellers. This paper sheds light on four axes. The first axis copes with building an audio corpus for deceptive and truthful speech for Egyptian Arabic speakers. The second axis focuses on examining the human perception of lies and proving our need for computational linguistic-based methods to extract features which characterize truthful and deceptive speech. The third axis is concerned with building a linguistic analysis program that could extract from the corpus the inter- and intra-linguistic cues for deceptive and truthful speech. The program built here is based on selected categories from the Linguistic Inquiry and Word Count program. Our results demonstrated that Egyptian Arabic speakers on one hand preferred to use first-person pronouns and present tense compared to the past tense when lying and their lies lacked of second-person pronouns, and on the other hand, when telling the truth, they preferred to use the verbs related to motion and the nouns related to time. The results also showed that there is a need for bigger data to prove the significance of words related to emotions and numbers.

Keywords: Egyptian Arabic corpus, computational analysis, deceptive features, forensic linguistics, human perception, truthful features

Procedia PDF Downloads 182
1889 Automatic Number Plate Recognition System Based on Deep Learning

Authors: T. Damak, O. Kriaa, A. Baccar, M. A. Ben Ayed, N. Masmoudi

Abstract:

In the last few years, Automatic Number Plate Recognition (ANPR) systems have become widely used in the safety, the security, and the commercial aspects. Forethought, several methods and techniques are computing to achieve the better levels in terms of accuracy and real time execution. This paper proposed a computer vision algorithm of Number Plate Localization (NPL) and Characters Segmentation (CS). In addition, it proposed an improved method in Optical Character Recognition (OCR) based on Deep Learning (DL) techniques. In order to identify the number of detected plate after NPL and CS steps, the Convolutional Neural Network (CNN) algorithm is proposed. A DL model is developed using four convolution layers, two layers of Maxpooling, and six layers of fully connected. The model was trained by number image database on the Jetson TX2 NVIDIA target. The accuracy result has achieved 95.84%.

Keywords: ANPR, CS, CNN, deep learning, NPL

Procedia PDF Downloads 282
1888 New Formula for Revenue Recognition Likely to Change the Prescription for Pharma Industry

Authors: Shruti Hajirnis

Abstract:

In May 2014, FASB issued Accounting Standards Update (ASU) 2014-09, Revenue from Contracts with Customers (Topic 606), and the International Accounting Standards Board (IASB) issued International Financial Reporting Standards (IFRS) 15, Revenue from Contracts with Customers that will supersede virtually all revenue recognition requirements in IFRS and US GAAP. FASB and the IASB have basically achieved convergence with these standards, with only some minor differences such as collectability threshold, interim disclosure requirements, early application and effective date, impairment loss reversal and nonpublic entity requirements. This paper discusses the impact of five-step model prescribed in new revenue standard on the entities operating in Pharma industry. It also outlines the considerations for these entities while implementing the new standard.

Keywords: revenue recognition, pharma industry, standard, requirements

Procedia PDF Downloads 421
1887 Automatic Product Identification Based on Deep-Learning Theory in an Assembly Line

Authors: Fidel Lòpez Saca, Carlos Avilés-Cruz, Miguel Magos-Rivera, José Antonio Lara-Chávez

Abstract:

Automated object recognition and identification systems are widely used throughout the world, particularly in assembly lines, where they perform quality control and automatic part selection tasks. This article presents the design and implementation of an object recognition system in an assembly line. The proposed shapes-color recognition system is based on deep learning theory in a specially designed convolutional network architecture. The used methodology involve stages such as: image capturing, color filtering, location of object mass centers, horizontal and vertical object boundaries, and object clipping. Once the objects are cut out, they are sent to a convolutional neural network, which automatically identifies the type of figure. The identification system works in real-time. The implementation was done on a Raspberry Pi 3 system and on a Jetson-Nano device. The proposal is used in an assembly course of bachelor’s degree in industrial engineering. The results presented include studying the efficiency of the recognition and processing time.

Keywords: deep-learning, image classification, image identification, industrial engineering.

Procedia PDF Downloads 134
1886 Development and Application of the Proctoring System with Face Recognition for User Registration on the Educational Information Portal

Authors: Meruyert Serik, Nassipzhan Duisegaliyeva, Danara Tleumagambetova, Madina Ermaganbetova

Abstract:

This research paper explores the process of creating a proctoring system by evaluating the implementation of practical face recognition algorithms. Students of educational programs reviewed the research work "6B01511-Computer Science", "7M01511-Computer Science", "7M01525- STEM Education," and "8D01511-Computer Science" of Eurasian National University named after L.N. Gumilyov. As an outcome, a proctoring system will be created, enabling the conduction of tests and ensuring academic integrity checks within the system. Due to the correct operation of the system, test works are carried out. The result of the creation of the proctoring system will be the basis for the automation of the informational, educational portal developed by machine learning.

Keywords: artificial intelligence, education portal, face recognition, machine learning, proctoring

Procedia PDF Downloads 83
1885 2.5D Face Recognition Using Gabor Discrete Cosine Transform

Authors: Ali Cheraghian, Farshid Hajati, Soheila Gheisari, Yongsheng Gao

Abstract:

In this paper, we present a novel 2.5D face recognition method based on Gabor Discrete Cosine Transform (GDCT). In the proposed method, the Gabor filter is applied to extract feature vectors from the texture and the depth information. Then, Discrete Cosine Transform (DCT) is used for dimensionality and redundancy reduction to improve computational efficiency. The system is combined texture and depth information in the decision level, which presents higher performance compared to methods, which use texture and depth information, separately. The proposed algorithm is examined on publically available Bosphorus database including models with pose variation. The experimental results show that the proposed method has a higher performance compared to the benchmark.

Keywords: Gabor filter, discrete cosine transform, 2.5d face recognition, pose

Procedia PDF Downloads 300
1884 Segmentation of Arabic Handwritten Numeral Strings Based on Watershed Approach

Authors: Nidal F. Shilbayeh, Remah W. Al-Khatib, Sameer A. Nooh

Abstract:

Arabic offline handwriting recognition systems are considered as one of the most challenging topics. Arabic Handwritten Numeral Strings are used to automate systems that deal with numbers such as postal code, banking account numbers and numbers on car plates. Segmentation of connected numerals is the main bottleneck in the handwritten numeral recognition system.  This is in turn can increase the speed and efficiency of the recognition system. In this paper, we proposed algorithms for automatic segmentation and feature extraction of Arabic handwritten numeral strings based on Watershed approach. The algorithms have been designed and implemented to achieve the main goal of segmenting and extracting the string of numeral digits written by hand especially in a courtesy amount of bank checks. The segmentation algorithm partitions the string into multiple regions that can be associated with the properties of one or more criteria. The numeral extraction algorithm extracts the numeral string digits into separated individual digit. Both algorithms for segmentation and feature extraction have been tested successfully and efficiently for all types of numerals.

Keywords: handwritten numerals, segmentation, courtesy amount, feature extraction, numeral recognition

Procedia PDF Downloads 355
1883 Features of Normative and Pathological Realizations of Sibilant Sounds for Computer-Aided Pronunciation Evaluation in Children

Authors: Zuzanna Miodonska, Michal Krecichwost, Pawel Badura

Abstract:

Sigmatism (lisping) is a speech disorder in which sibilant consonants are mispronounced. The diagnosis of this phenomenon is usually based on the auditory assessment. However, the progress in speech analysis techniques creates a possibility of developing computer-aided sigmatism diagnosis tools. The aim of the study is to statistically verify whether specific acoustic features of sibilant sounds may be related to pronunciation correctness. Such knowledge can be of great importance while implementing classifiers and designing novel tools for automatic sibilants pronunciation evaluation. The study covers analysis of various speech signal measures, including features proposed in the literature for the description of normative sibilants realization. Amplitudes and frequencies of three fricative formants (FF) are extracted based on local spectral maxima of the friction noise. Skewness, kurtosis, four normalized spectral moments (SM) and 13 mel-frequency cepstral coefficients (MFCC) with their 1st and 2nd derivatives (13 Delta and 13 Delta-Delta MFCC) are included in the analysis as well. The resulting feature vector contains 51 measures. The experiments are performed on the speech corpus containing words with selected sibilant sounds (/ʃ, ʒ/) pronounced by 60 preschool children with proper pronunciation or with natural pathologies. In total, 224 /ʃ/ segments and 191 /ʒ/ segments are employed in the study. The Mann-Whitney U test is employed for the analysis of stigmatism and normative pronunciation. Statistically, significant differences are obtained in most of the proposed features in children divided into these two groups at p < 0.05. All spectral moments and fricative formants appear to be distinctive between pathology and proper pronunciation. These metrics describe the friction noise characteristic for sibilants, which makes them particularly promising for the use in sibilants evaluation tools. Correspondences found between phoneme feature values and an expert evaluation of the pronunciation correctness encourage to involve speech analysis tools in diagnosis and therapy of sigmatism. Proposed feature extraction methods could be used in a computer-assisted stigmatism diagnosis or therapy systems.

Keywords: computer-aided pronunciation evaluation, sigmatism diagnosis, speech signal analysis, statistical verification

Procedia PDF Downloads 276
1882 Audio-Visual Recognition Based on Effective Model and Distillation

Authors: Heng Yang, Tao Luo, Yakun Zhang, Kai Wang, Wei Qin, Liang Xie, Ye Yan, Erwei Yin

Abstract:

Recent years have seen that audio-visual recognition has shown great potential in a strong noise environment. The existing method of audio-visual recognition has explored methods with ResNet and feature fusion. However, on the one hand, ResNet always occupies a large amount of memory resources, restricting the application in engineering. On the other hand, the feature merging also brings some interferences in a high noise environment. In order to solve the problems, we proposed an effective framework with bidirectional distillation. At first, in consideration of the good performance in extracting of features, we chose the light model, Efficientnet as our extractor of spatial features. Secondly, self-distillation was applied to learn more information from raw data. Finally, we proposed a bidirectional distillation in decision-level fusion. In more detail, our experimental results are based on a multi-model dataset from 24 volunteers. Eventually, the lipreading accuracy of our framework was increased by 2.3% compared with existing systems, and our framework made progress in audio-visual fusion in a high noise environment compared with the system of audio recognition without visual.

Keywords: lipreading, audio-visual, Efficientnet, distillation

Procedia PDF Downloads 107
1881 Automatic Landmark Selection Based on Feature Clustering for Visual Autonomous Unmanned Aerial Vehicle Navigation

Authors: Paulo Fernando Silva Filho, Elcio Hideiti Shiguemori

Abstract:

The selection of specific landmarks for an Unmanned Aerial Vehicles’ Visual Navigation systems based on Automatic Landmark Recognition has significant influence on the precision of the system’s estimated position. At the same time, manual selection of the landmarks does not guarantee a high recognition rate, which would also result on a poor precision. This work aims to develop an automatic landmark selection that will take the image of the flight area and identify the best landmarks to be recognized by the Visual Navigation Landmark Recognition System. The criterion to select a landmark is based on features detected by ORB or AKAZE and edges information on each possible landmark. Results have shown that disposition of possible landmarks is quite different from the human perception.

Keywords: clustering, edges, feature points, landmark selection, X-means

Procedia PDF Downloads 251
1880 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using Hidden Markov Model and Viterbi algorithm. From the Nepali text, annotated corpus training and testing data set are randomly separated. Both methods are employed on the data sets. Viterbi algorithm is found to be computationally faster and accurate as compared to HMM. The accuracy of 95.43% is achieved using Viterbi algorithm. Error analysis where the mismatches took place is elaborately discussed.

Keywords: hidden markov model, natural language processing, POS tagging, viterbi algorithm

Procedia PDF Downloads 306
1879 The Influence of Neural Synchrony on Auditory Middle Latency and Late Latency Responses and Its Correlation with Audiological Profile in Individuals with Auditory Neuropathy

Authors: P. Renjitha, P. Hari Prakash

Abstract:

Auditory neuropathy spectrum disorder (ANSD) is an auditory disorder with normal cochlear outer hair cell function and disrupted auditory nerve function. It results in unique clinical characteristic with absent auditory brainstem response (ABR), absent acoustic reflex and the presence of otoacoustic emissions (OAE) and cochlear microphonics. The lesion site could be at cochlear inner hair cells, the synapse between the inner hair cells and type I auditory nerve fibers, and/or the auditory nerve itself. But the literatures on synchrony at higher auditory system are sporadic and are less understood. It might be interesting to see if there is a recovery of neural synchrony at higher auditory centers. Also, does the level at which the auditory system recovers with adequate synchrony to the extent of observable evoke response potentials (ERPs) can predict speech perception? In the current study, eight ANSD participants and healthy controls underwent detailed audiological assessment including ABR, auditory middle latency response (AMLR), and auditory late latency response (ALLR). AMLR was recorded for clicks and ALLR was evoked using 500Hz and 2 kHz tone bursts. Analysis revealed that the participant could be categorized into three groups. Group I (2/8) where ALLR was present only for 2kHz tone burst. Group II (4/8), where AMLR was absent and ALLR was seen for both the stimuli. Group III (2/8) consisted individuals with identifiable AMLR and ALLR for all the stimuli. The highest speech identification sore observed in ANSD group was 30% and hence considered having poor speech perception. Overall test result indicates that the site of neural synchrony recovery could be varying across individuals with ANSD. Some individuals show recovery of neural synchrony at the thalamocortical level while others show the same only at the cortical level. Within ALLR itself there could be variation across stimuli again could be related to neural synchrony. Nevertheless, none of these patterns could possible explain the speech perception ability of the individuals. Hence, it could be concluded that neural synchrony as measured by evoked potentials could not be a good clinical predictor speech perception.

Keywords: auditory late latency response, auditory middle latency response, auditory neuropathy spectrum disorder, correlation with speech identification score

Procedia PDF Downloads 118
1878 Sports Fans and Non-Interested Public Recognition of the Problems of Sports in Egypt through Caricature

Authors: Alaaeldin Hamdy Ahmed Mohammed

Abstract:

Introduction: This study examines sports’ fans and non-interested public perception and recognition of the problems that have negative impacts upon the Egyptian sports, particularly football, through caricatures. Eight caricature paintings were designed to express eight problems affecting the Egyptian sports and its development. These paintings were distributed on two groups of the fans and the non-interested public. Methods: The study was limited to eight caricatures representing the eight issues which are: the impact of stopping the sports activity on athletes, the effect of clubs’ disagreement, fanaticism between the members of the ultras of different clubs, the negative impact of the mingling of politics into sports, the negative role of the clubs affects the professionalism of the promising players, the conflict between the national organization responsible for sports, the breaking in of the fans to the playgrounds, the impact of the lack of planning on the national team. The Results: The results showed that both sports fans and those who are not interested in sports recognized the problems that the caricatures refer to and criticizes exaggeration although the rate was higher for the fans. These caricatures contributed also in their recognition of the danger of the negative impact of these problems on the Egyptian sports, particularly football which is the most common at the Egyptian sports fans. Discussion: This finding echoes the conclusion that caricatures are distinctive in the adults’ facial stimuli that are either systematically exaggerated recognition of them.

Keywords: caricature, fans, football, sports

Procedia PDF Downloads 291
1877 A Speeded up Robust Scale-Invariant Feature Transform Currency Recognition Algorithm

Authors: Daliyah S. Aljutaili, Redna A. Almutlaq, Suha A. Alharbi, Dina M. Ibrahim

Abstract:

All currencies around the world look very different from each other. For instance, the size, color, and pattern of the paper are different. With the development of modern banking services, automatic methods for paper currency recognition become important in many applications like vending machines. One of the currency recognition architecture’s phases is Feature detection and description. There are many algorithms that are used for this phase, but they still have some disadvantages. This paper proposes a feature detection algorithm, which merges the advantages given in the current SIFT and SURF algorithms, which we call, Speeded up Robust Scale-Invariant Feature Transform (SR-SIFT) algorithm. Our proposed SR-SIFT algorithm overcomes the problems of both the SIFT and SURF algorithms. The proposed algorithm aims to speed up the SIFT feature detection algorithm and keep it robust. Simulation results demonstrate that the proposed SR-SIFT algorithm decreases the average response time, especially in small and minimum number of best key points, increases the distribution of the number of best key points on the surface of the currency. Furthermore, the proposed algorithm increases the accuracy of the true best point distribution inside the currency edge than the other two algorithms.

Keywords: currency recognition, feature detection and description, SIFT algorithm, SURF algorithm, speeded up and robust features

Procedia PDF Downloads 215
1876 An Ensemble-based Method for Vehicle Color Recognition

Authors: Saeedeh Barzegar Khalilsaraei, Manoocheher Kelarestaghi, Farshad Eshghi

Abstract:

The vehicle color, as a prominent and stable feature, helps to identify a vehicle more accurately. As a result, vehicle color recognition is of great importance in intelligent transportation systems. Unlike conventional methods which use only a single Convolutional Neural Network (CNN) for feature extraction or classification, in this paper, four CNNs, with different architectures well-performing in different classes, are trained to extract various features from the input image. To take advantage of the distinct capability of each network, the multiple outputs are combined using a stack generalization algorithm as an ensemble technique. As a result, the final model performs better than each CNN individually in vehicle color identification. The evaluation results in terms of overall average accuracy and accuracy variance show the proposed method’s outperformance compared to the state-of-the-art rivals.

Keywords: Vehicle Color Recognition, Ensemble Algorithm, Stack Generalization, Convolutional Neural Network

Procedia PDF Downloads 55
1875 Human-Machine Cooperation in Facial Comparison Based on Likelihood Scores

Authors: Lanchi Xie, Zhihui Li, Zhigang Li, Guiqiang Wang, Lei Xu, Yuwen Yan

Abstract:

Image-based facial features can be classified into category recognition features and individual recognition features. Current automated face recognition systems extract a specific feature vector of different dimensions from a facial image according to their pre-trained neural network. However, to improve the efficiency of parameter calculation, an algorithm generally reduces the image details by pooling. The operation will overlook the details concerned much by forensic experts. In our experiment, we adopted a variety of face recognition algorithms based on deep learning, compared a large number of naturally collected face images with the known data of the same person's frontal ID photos. Downscaling and manual handling were performed on the testing images. The results supported that the facial recognition algorithms based on deep learning detected structural and morphological information and rarely focused on specific markers such as stains and moles. Overall performance, distribution of genuine scores and impostor scores, and likelihood ratios were tested to evaluate the accuracy of biometric systems and forensic experts. Experiments showed that the biometric systems were skilled in distinguishing category features, and forensic experts were better at discovering the individual features of human faces. In the proposed approach, a fusion was performed at the score level. At the specified false accept rate, the framework achieved a lower false reject rate. This paper contributes to improving the interpretability of the objective method of facial comparison and provides a novel method for human-machine collaboration in this field.

Keywords: likelihood ratio, automated facial recognition, facial comparison, biometrics

Procedia PDF Downloads 102