Search results for: audio-visual speech recognition
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2283

Search results for: audio-visual speech recognition

1803 Object Detection Based on Plane Segmentation and Features Matching for a Service Robot

Authors: António J. R. Neves, Rui Garcia, Paulo Dias, Alina Trifan

Abstract:

With the aging of the world population and the continuous growth in technology, service robots are more and more explored nowadays as alternatives to healthcare givers or personal assistants for the elderly or disabled people. Any service robot should be capable of interacting with the human companion, receive commands, navigate through the environment, either known or unknown, and recognize objects. This paper proposes an approach for object recognition based on the use of depth information and color images for a service robot. We present a study on two of the most used methods for object detection, where 3D data is used to detect the position of objects to classify that are found on horizontal surfaces. Since most of the objects of interest accessible for service robots are on these surfaces, the proposed 3D segmentation reduces the processing time and simplifies the scene for object recognition. The first approach for object recognition is based on color histograms, while the second is based on the use of the SIFT and SURF feature descriptors. We present comparative experimental results obtained with a real service robot.

Keywords: object detection, feature, descriptors, SIFT, SURF, depth images, service robots

Procedia PDF Downloads 518
1802 Text Emotion Recognition by Multi-Head Attention based Bidirectional LSTM Utilizing Multi-Level Classification

Authors: Vishwanath Pethri Kamath, Jayantha Gowda Sarapanahalli, Vishal Mishra, Siddhesh Balwant Bandgar

Abstract:

Recognition of emotional information is essential in any form of communication. Growing HCI (Human-Computer Interaction) in recent times indicates the importance of understanding of emotions expressed and becomes crucial for improving the system or the interaction itself. In this research work, textual data for emotion recognition is used. The text being the least expressive amongst the multimodal resources poses various challenges such as contextual information and also sequential nature of the language construction. In this research work, the proposal is made for a neural architecture to resolve not less than 8 emotions from textual data sources derived from multiple datasets using google pre-trained word2vec word embeddings and a Multi-head attention-based bidirectional LSTM model with a one-vs-all Multi-Level Classification. The emotions targeted in this research are Anger, Disgust, Fear, Guilt, Joy, Sadness, Shame, and Surprise. Textual data from multiple datasets were used for this research work such as ISEAR, Go Emotions, Affect datasets for creating the emotions’ dataset. Data samples overlap or conflicts were considered with careful preprocessing. Our results show a significant improvement with the modeling architecture and as good as 10 points improvement in recognizing some emotions.

Keywords: text emotion recognition, bidirectional LSTM, multi-head attention, multi-level classification, google word2vec word embeddings

Procedia PDF Downloads 152
1801 An Accurate Computation of 2D Zernike Moments via Fast Fourier Transform

Authors: Mohammed S. Al-Rawi, J. Bastos, J. Rodriguez

Abstract:

Object detection and object recognition are essential components of every computer vision system. Despite the high computational complexity and other problems related to numerical stability and accuracy, Zernike moments of 2D images (ZMs) have shown resilience when used in object recognition and have been used in various image analysis applications. In this work, we propose a novel method for computing ZMs via Fast Fourier Transform (FFT). Notably, this is the first algorithm that can generate ZMs up to extremely high orders accurately, e.g., it can be used to generate ZMs for orders up to 1000 or even higher. Furthermore, the proposed method is also simpler and faster than the other methods due to the availability of FFT software and/or hardware. The accuracies and numerical stability of ZMs computed via FFT have been confirmed using the orthogonality property. We also introduce normalizing ZMs with Neumann factor when the image is embedded in a larger grid, and color image reconstruction based on RGB normalization of the reconstructed images. Astonishingly, higher-order image reconstruction experiments show that the proposed methods are superior, both quantitatively and subjectively, compared to the q-recursive method.

Keywords: Chebyshev polynomial, fourier transform, fast algorithms, image recognition, pseudo Zernike moments, Zernike moments

Procedia PDF Downloads 241
1800 Individualized Emotion Recognition Through Dual-Representations and Ground-Established Ground Truth

Authors: Valentina Zhang

Abstract:

While facial expression is a complex and individualized behavior, all facial emotion recognition (FER) systems known to us rely on a single facial representation and are trained on universal data. We conjecture that: (i) different facial representations can provide different, sometimes complementing views of emotions; (ii) when employed collectively in a discussion group setting, they enable more accurate emotion reading which is highly desirable in autism care and other applications context sensitive to errors. In this paper, we first study FER using pixel-based DL vs semantics-based DL in the context of deepfake videos. Our experiment indicates that while the semantics-trained model performs better with articulated facial feature changes, the pixel-trained model outperforms on subtle or rare facial expressions. Armed with these findings, we have constructed an adaptive FER system learning from both types of models for dyadic or small interacting groups and further leveraging the synthesized group emotions as the ground truth for individualized FER training. Using a collection of group conversation videos, we demonstrate that FER accuracy and personalization can benefit from such an approach.

Keywords: neurodivergence care, facial emotion recognition, deep learning, ground truth for supervised learning

Procedia PDF Downloads 125
1799 A Review on Artificial Neural Networks in Image Processing

Authors: B. Afsharipoor, E. Nazemi

Abstract:

Artificial neural networks (ANNs) are powerful tool for prediction which can be trained based on a set of examples and thus, it would be useful for nonlinear image processing. The present paper reviews several paper regarding applications of ANN in image processing to shed the light on advantage and disadvantage of ANNs in this field. Different steps in the image processing chain including pre-processing, enhancement, segmentation, object recognition, image understanding and optimization by using ANN are summarized. Furthermore, results on using multi artificial neural networks are presented.

Keywords: neural networks, image processing, segmentation, object recognition, image understanding, optimization, MANN

Procedia PDF Downloads 371
1798 Challenges of Teaching and Learning English Speech Sounds in Five Selected Secondary Schools in Bauchi, Bauchi State, Nigeria

Authors: Mairo Musa Galadima, Phoebe Mshelia

Abstract:

In Nigeria, the national policy of education stipulates that the kindergarten primary schools and the legislature are to use the three popular Nigerian Languages namely: Hausa, Igbo and Yoruba. However, the English language seems to be preferred and this calls for this paper. Attempts were made to draw out the challenges faced by learners in understanding English speech sounds and using them to communicate effectively in English; using 5(five) selected secondary school in Bauchi. It was discover that challenges abound in the wrong use of stress and intonation, transfer of phonetic features from their first language. Others are inadequate qualified teachers and relevant materials including text-books. It is recommended that teachers of English should lay more emphasis on the teaching of supra-segmental features and should be encouraged to go for further studies, seminars and refresher courses.

Keywords: kindergarten, stress, phonetic and intonation, Nigeria

Procedia PDF Downloads 278
1797 Chaotic Sequence Noise Reduction and Chaotic Recognition Rate Improvement Based on Improved Local Geometric Projection

Authors: Rubin Dan, Xingcai Wang, Ziyang Chen

Abstract:

A chaotic time series noise reduction method based on the fusion of the local projection method, wavelet transform, and particle swarm algorithm (referred to as the LW-PSO method) is proposed to address the problem of false recognition due to noise in the recognition process of chaotic time series containing noise. The method first uses phase space reconstruction to recover the original dynamical system characteristics and removes the noise subspace by selecting the neighborhood radius; then it uses wavelet transform to remove D1-D3 high-frequency components to maximize the retention of signal information while least-squares optimization is performed by the particle swarm algorithm. The Lorenz system containing 30% Gaussian white noise is simulated and verified, and the phase space, SNR value, RMSE value, and K value of the 0-1 test method before and after noise reduction of the Schreiber method, local projection method, wavelet transform method, and LW-PSO method are compared and analyzed, which proves that the LW-PSO method has a better noise reduction effect compared with the other three common methods. The method is also applied to the classical system to evaluate the noise reduction effect of the four methods and the original system identification effect, which further verifies the superiority of the LW-PSO method. Finally, it is applied to the Chengdu rainfall chaotic sequence for research, and the results prove that the LW-PSO method can effectively reduce the noise and improve the chaos recognition rate.

Keywords: Schreiber noise reduction, wavelet transform, particle swarm optimization, 0-1 test method, chaotic sequence denoising

Procedia PDF Downloads 172
1796 A Voice Signal Encryption Scheme Based on Chaotic Theory

Authors: Hailang Yang

Abstract:

To ensure the confidentiality and integrity of speech signals in communication transmission, this paper proposes a voice signal encryption scheme based on chaotic theory. Firstly, the scheme utilizes chaotic mapping to generate a key stream and then employs the key stream to perform bitwise exclusive OR (XOR) operations for encrypting the speech signal. Additionally, the scheme utilizes a chaotic hash function to generate a Message Authentication Code (MAC), which is appended to the encrypted data to verify the integrity of the data. Subsequently, we analyze the security performance and encryption efficiency of the scheme, comparing and optimizing it against existing solutions. Finally, experimental results demonstrate that the proposed scheme can resist common attacks, achieving high-quality encryption and speed.

Keywords: chaotic theory, XOR encryption, chaotic hash function, Message Authentication Code (MAC)

Procedia PDF Downloads 27
1795 Long Short-Term Memory Based Model for Modeling Nicotine Consumption Using an Electronic Cigarette and Internet of Things Devices

Authors: Hamdi Amroun, Yacine Benziani, Mehdi Ammi

Abstract:

In this paper, we want to determine whether the accurate prediction of nicotine concentration can be obtained by using a network of smart objects and an e-cigarette. The approach consists of, first, the recognition of factors influencing smoking cessation such as physical activity recognition and participant’s behaviors (using both smartphone and smartwatch), then the prediction of the configuration of the e-cigarette (in terms of nicotine concentration, power, and resistance of e-cigarette). The study uses a network of commonly connected objects; a smartwatch, a smartphone, and an e-cigarette transported by the participants during an uncontrolled experiment. The data obtained from sensors carried in the three devices were trained by a Long short-term memory algorithm (LSTM). Results show that our LSTM-based model allows predicting the configuration of the e-cigarette in terms of nicotine concentration, power, and resistance with a root mean square error percentage of 12.9%, 9.15%, and 11.84%, respectively. This study can help to better control consumption of nicotine and offer an intelligent configuration of the e-cigarette to users.

Keywords: Iot, activity recognition, automatic classification, unconstrained environment

Procedia PDF Downloads 202
1794 A New Scheme for Chain Code Normalization in Arabic and Farsi Scripts

Authors: Reza Shakoori

Abstract:

This paper presents a structural correction of Arabic and Persian strokes using manipulation of their chain codes in order to improve the rate and performance of Persian and Arabic handwritten word recognition systems. It collects pure and effective features to represent a character with one consolidated feature vector and reduces variations in order to decrease the number of training samples and increase the chance of successful classification. Our results also show that how the proposed approaches can simplify classification and consequently recognition by reducing variations and possible noises on the chain code by keeping orientation of characters and their backbone structures.

Keywords: Arabic, chain code normalization, OCR systems, image processing

Procedia PDF Downloads 379
1793 Modified Form of Margin Based Angular Softmax Loss for Speaker Verification

Authors: Jamshaid ul Rahman, Akhter Ali, Adnan Manzoor

Abstract:

Learning-based systems have received increasing interest in recent years; recognition structures, including end-to-end speak recognition, are one of the hot topics in this area. A famous work on end-to-end speaker verification by using Angular Softmax Loss gained significant importance and is considered useful to directly trains a discriminative model instead of the traditional adopted i-vector approach. The margin-based strategy in angular softmax is beneficial to learn discriminative speaker embeddings where the random selection of margin values is a big issue in additive angular margin and multiplicative angular margin. As a better solution in this matter, we present an alternative approach by introducing a bit similar form of an additive parameter that was originally introduced for face recognition, and it has a capacity to adjust automatically with the corresponding margin values and is applicable to learn more discriminative features than the Softmax. Experiments are conducted on the part of Fisher dataset, where it observed that the additive parameter with angular softmax to train the front-end and probabilistic linear discriminant analysis (PLDA) in the back-end boosts the performance of the structure.

Keywords: additive parameter, angular softmax, speaker verification, PLDA

Procedia PDF Downloads 76
1792 EEG and ABER Abnormalities in Children with Speech and Language Delay

Authors: Bharati Mehta, Manish Parakh, Bharti Bhandari, Sneha Ambwani

Abstract:

Speech and language delay (SLD) is seen commonly as a co-morbidity in children having severe resistant focal and generalized, syndromic and symptomatic epilepsies. It is however not clear whether epilepsy contributes to or is a mere association in the pathogenesis of SLD. Also, it is acknowledged that Auditory Brainstem Evoked Responses (ABER), besides used for evaluating hearing threshold, also aid in prognostication of neurological disorders and abnormalities in the hearing pathway in the brainstem. There is no circumscribed or surrogate neurophysiologic laboratory marker to adjudge the extent of SLD. The current study was designed to evaluate the abnormalities in Electroencephalography (EEG) and ABER in children with SLD who do not have an overt hearing deficit or autism. 94 children of age group 2-8 years with predominant SLD and without any gross motor developmental delay, head injury, gross hearing disorder, cleft lip/palate and autism were selected. Standard video Electroencephalography using the 10:20 international system and ABER after click stimulus with intensities 110 db until 40 db was performed in all children. EEG was abnormal in 47.9% (n= 45; 36 boys and 9 girls) children. In the children with abnormal EEG, 64.5% (n=29) had an abnormal background, 57.8% (n=27) had presence of generalized interictal epileptiform discharges (IEDs), 20% (n=9) had focal epileptiform discharges exclusively from left side and 33.3% (n=15) had multifocal IEDs occurring both in isolation or associated with generalised abnormalities. In ABER, surprisingly, the peak latencies for waves I, III & V, inter-peak latencies I-III & I-V, III-V and wave amplitude ratio V/I, were found within normal limits in both ears of all the children. Thus in the current study it is certain that presence of generalized IEDs in EEG are seen in higher frequency with SLD and focal IEDs are seen exclusively in left hemisphere in these children. It may be possible that even with generalized EEG abnormalities present in these children, left hemispheric abnormalities as a part of this generalized dysfunction may be responsible for the speech and language dysfunction. The current study also emphasizes that ABER may not be routinely recommended as diagnostic or prognostic tool in children with SLD without frank hearing deficit or autism, thus reducing the burden on electro physiologists, laboratories and saving time and financial resources.

Keywords: ABER, EEG, speech, language delay

Procedia PDF Downloads 501
1791 Feature Extraction of MFCC Based on Fisher-Ratio and Correlated Distance Criterion for Underwater Target Signal

Authors: Han Xue, Zhang Lanyue

Abstract:

In order to seek more effective feature extraction technology, feature extraction method based on MFCC combined with vector hydrophone is exposed in the paper. The sound pressure signal and particle velocity signal of two kinds of ships are extracted by using MFCC and its evolution form, and the extracted features are fused by using fisher-ratio and correlated distance criterion. The features are then identified by BP neural network. The results showed that MFCC, First-Order Differential MFCC and Second-Order Differential MFCC features can be used as effective features for recognition of underwater targets, and the fusion feature can improve the recognition rate. Moreover, the results also showed that the recognition rate of the particle velocity signal is higher than that of the sound pressure signal, and it reflects the superiority of vector signal processing.

Keywords: vector information, MFCC, differential MFCC, fusion feature, BP neural network

Procedia PDF Downloads 501
1790 Attendance Management System Implementation Using Face Recognition

Authors: Zainab S. Abdullahi, Zakariyya H. Abdullahi, Sahnun Dahiru

Abstract:

Student attendance in schools is a very important aspect in school management record. In recent years, security systems have become one of the most demanding systems in school. Every institute have its own method of taking attendance, many schools in Nigeria use the old fashion way of taking attendance. That is writing the students name and registration number in a paper and submitting it to the lecturer at the end of the lecture which is time-consuming and insecure, because some students can write for their friends without the lecturer’s knowledge. In this paper, we propose a system that takes attendance using face recognition. There are many automatic methods available for this purpose i.e. biometric attendance, but they all waste time, because the students have to follow a queue to put their thumbs on a scanner which is time-consuming. This attendance is recorded by using a camera attached in front of the class room and capturing the student images, detect the faces in the image and compare the detected faces with database and mark the attendance. The principle component analysis was used to recognize the faces detected with a high accuracy rate. The paper reviews the related work in the field of attendance system, then describe the system architecture, software algorithm and result.

Keywords: attendance system, face detection, face recognition, PCA

Procedia PDF Downloads 340
1789 Freedom of Expression and Its Restriction in Audiovisual Media

Authors: Sevil Yildiz

Abstract:

Audio visual communication is a type of collective expression. Collective expression activity informs the masses, gives direction to opinions and establishes public opinion. Due to these characteristics, audio visual communication must be subjected to special restrictions. This has been stipulated in both the Constitution and the European Human Rights Agreement. This paper aims to review freedom of expression and its restriction in audio visual media. For this purpose, the authorisation of the Radio and Television Supreme Council to impose sanctions as an independent administrative authority empowered to regulate the field of audio visual communication has been reviewed with regard to freedom of expression and its limits.

Keywords: audio visual media, freedom of expression, its limits, radio and television supreme council

Procedia PDF Downloads 302
1788 Improving Machine Learning Translation of Hausa Using Named Entity Recognition

Authors: Aishatu Ibrahim Birma, Aminu Tukur, Abdulkarim Abbass Gora

Abstract:

Machine translation plays a vital role in the Field of Natural Language Processing (NLP), breaking down language barriers and enabling communication across diverse communities. In the context of Hausa, a widely spoken language in West Africa, mainly in Nigeria, effective translation systems are essential for enabling seamless communication and promoting cultural exchange. However, due to the unique linguistic characteristics of Hausa, accurate translation remains a challenging task. The research proposes an approach to improving the machine learning translation of Hausa by integrating Named Entity Recognition (NER) techniques. Named entities, such as person names, locations, organizations, and dates, are critical components of a language's structure and meaning. Incorporating NER into the translation process can enhance the quality and accuracy of translations by preserving the integrity of named entities and also maintaining consistency in translating entities (e.g., proper names), and addressing the cultural references specific to Hausa. The NER will be incorporated into Neural Machine Translation (NMT) for the Hausa to English Translation.

Keywords: machine translation, natural language processing (NLP), named entity recognition (NER), neural machine translation (NMT)

Procedia PDF Downloads 19
1787 The Role of Named Entity Recognition for Information Extraction

Authors: Girma Yohannis Bade, Olga Kolesnikova, Grigori Sidorov

Abstract:

Named entity recognition (NER) is a building block for information extraction. Though the information extraction process has been automated using a variety of techniques to find and extract a piece of relevant information from unstructured documents, the discovery of targeted knowledge still poses a number of research difficulties because of the variability and lack of structure in Web data. NER, a subtask of information extraction (IE), came to exist to smooth such difficulty. It deals with finding the proper names (named entities), such as the name of the person, country, location, organization, dates, and event in a document, and categorizing them as predetermined labels, which is an initial step in IE tasks. This survey paper presents the roles and importance of NER to IE from the perspective of different algorithms and application area domains. Thus, this paper well summarizes how researchers implemented NER in particular application areas like finance, medicine, defense, business, food science, archeology, and so on. It also outlines the three types of sequence labeling algorithms for NER such as feature-based, neural network-based, and rule-based. Finally, the state-of-the-art and evaluation metrics of NER were presented.

Keywords: the role of NER, named entity recognition, information extraction, sequence labeling algorithms, named entity application area

Procedia PDF Downloads 57
1786 Detailed Observations on Numerically Invariant Signatures

Authors: Reza Aghayan

Abstract:

Numerically invariant signatures were introduced as a new paradigm of the invariant recognition for visual objects modulo a certain group of transformations. This paper shows that the current formulation suffers from noise and indeterminacy in the resulting joint group-signatures and applies the n-difference technique and the m-mean signature method to minimize their effects. In our experimental results of applying the proposed numerical scheme to generate joint group-invariant signatures, the sensitivity of some parameters such as regularity and mesh resolution used in the algorithm will also be examined. Finally, several interesting observations are made.

Keywords: Euclidean and affine geometry, differential invariant G-signature curves, numerically invariant joint G-signatures, object recognition, noise, indeterminacy

Procedia PDF Downloads 370
1785 Electroencephalography-Based Intention Recognition and Consensus Assessment during Emergency Response

Authors: Siyao Zhu, Yifang Xu

Abstract:

After natural and man-made disasters, robots can bypass the danger, expedite the search, and acquire unprecedented situational awareness to design rescue plans. The hands-free requirement from the first responders excludes the use of tedious manual control and operation. In unknown, unstructured, and obstructed environments, natural-language-based supervision is not amenable for first responders to formulate, and is difficult for robots to understand. Brain-computer interface is a promising option to overcome the limitations. This study aims to test the feasibility of using electroencephalography (EEG) signals to decode human intentions and detect the level of consensus on robot-provided information. EEG signals were classified using machine-learning and deep-learning methods to discriminate search intentions and agreement perceptions. The results show that the average classification accuracy for intention recognition and consensus assessment is 67% and 72%, respectively, proving the potential of incorporating recognizable users’ bioelectrical responses into advanced robot-assisted systems for emergency response.

Keywords: consensus assessment, electroencephalogram, emergency response, human-robot collaboration, intention recognition, search and rescue

Procedia PDF Downloads 67
1784 Empowerment at the Grassroots: Impact of Participatory (in) Equalities in Policy Formulation and Recognition and Redistribution of Women at the Grassroots in India

Authors: Samanwita Paul

Abstract:

Borrowing from Kabeer’s framework of empowerment, participation of women at Panchayat level politics (grassroots level of politics in India) has been conceptualized as a resource in the study and the impact of the same in influencing the policies at the grassroots as an agency. The study attempts to examine such intricacies in the dynamics of participation and policy formulation at the Panchayat level and to assess its overall impact in altering the recognition and redistribution of women. A conscious attempt has been made to go beyond formal politics and consider participants of the informal political processes as subjects of the study. Primary surveys were conducted for data collection in 4 Panchayat villages (from Jalpaiguri district in West Bengal) of which 2 wards from each were selected based on the nature of reservation of the panchayat seats. In-depth interviews with the Panchayat members and an approximate of 80 voters from each of the villages were conducted. This has been further analyzed with the aid of appropriate statistical tools and narratives. Preliminary findings show that women from vulnerable sections tend to participate more in the political process since it offers them a means of negotiating with their vulnerabilities however in case of its impact on policy formulation, the effect of women’s participation does to appear to be as profound.

Keywords: recognition, redistribution, political participation, women

Procedia PDF Downloads 117
1783 Unsupervised Assistive and Adaptative Intelligent Agent in Smart Enviroment

Authors: Sebastião Pais, João Casal, Ricardo Ponciano, Sérgio Lorenço

Abstract:

The adaptation paradigm is a basic defining feature for pervasive computing systems. Adaptation systems must work efficiently in a smart environment while providing suitable information relevant to the user system interaction. The key objective is to deduce the information needed information changes. Therefore relying on fixed operational models would be inappropriate. This paper presents a study on developing an Intelligent Personal Assistant to assist the user in interacting with their Smart Environment. We propose an Unsupervised and Language-Independent Adaptation through Intelligent Speech Interface and a set of methods of Acquiring Knowledge, namely Semantic Similarity and Unsupervised Learning.

Keywords: intelligent personal assistants, intelligent speech interface, unsupervised learning, language-independent, knowledge acquisition, association measures, symmetric word similarities, attributional word similarities

Procedia PDF Downloads 533
1782 Unsupervised Assistive and Adaptive Intelligent Agent in Smart Environment

Authors: Sebastião Pais, João Casal, Ricardo Ponciano, Sérgio Lourenço

Abstract:

The adaptation paradigm is a basic defining feature for pervasive computing systems. Adaptation systems must work efficiently in smart environment while providing suitable information relevant to the user system interaction. The key objective is to deduce the information needed information changes. Therefore, relying on fixed operational models would be inappropriate. This paper presents a study on developing a Intelligent Personal Assistant to assist the user in interacting with their Smart Environment. We propose a Unsupervised and Language-Independent Adaptation through Intelligent Speech Interface and a set of methods of Acquiring Knowledge, namely Semantic Similarity and Unsupervised Learning.

Keywords: intelligent personal assistants, intelligent speech interface, unsupervised learning, language-independent, knowledge acquisition, association measures, symmetric word similarities, attributional word similarities

Procedia PDF Downloads 612
1781 Human Action Recognition Using Wavelets of Derived Beta Distributions

Authors: Neziha Jaouedi, Noureddine Boujnah, Mohamed Salim Bouhlel

Abstract:

In the framework of human machine interaction systems enhancement, we focus throw this paper on human behavior analysis and action recognition. Human behavior is characterized by actions and reactions duality (movements, psychological modification, verbal and emotional expression). It’s worth noting that many information is hidden behind gesture, sudden motion points trajectories and speeds, many research works reconstructed an information retrieval issues. In our work we will focus on motion extraction, tracking and action recognition using wavelet network approaches. Our contribution uses an analysis of human subtraction by Gaussian Mixture Model (GMM) and body movement through trajectory models of motion constructed from kalman filter. These models allow to remove the noise using the extraction of the main motion features and constitute a stable base to identify the evolutions of human activity. Each modality is used to recognize a human action using wavelets of derived beta distributions approach. The proposed approach has been validated successfully on a subset of KTH and UCF sports database.

Keywords: feautures extraction, human action classifier, wavelet neural network, beta wavelet

Procedia PDF Downloads 389
1780 Adversarial Attacks and Defenses on Deep Neural Networks

Authors: Jonathan Sohn

Abstract:

Deep neural networks (DNNs) have shown state-of-the-art performance for many applications, including computer vision, natural language processing, and speech recognition. Recently, adversarial attacks have been studied in the context of deep neural networks, which aim to alter the results of deep neural networks by modifying the inputs slightly. For example, an adversarial attack on a DNN used for object detection can cause the DNN to miss certain objects. As a result, the reliability of DNNs is undermined by their lack of robustness against adversarial attacks, raising concerns about their use in safety-critical applications such as autonomous driving. In this paper, we focus on studying the adversarial attacks and defenses on DNNs for image classification. There are two types of adversarial attacks studied which are fast gradient sign method (FGSM) attack and projected gradient descent (PGD) attack. A DNN forms decision boundaries that separate the input images into different categories. The adversarial attack slightly alters the image to move over the decision boundary, causing the DNN to misclassify the image. FGSM attack obtains the gradient with respect to the image and updates the image once based on the gradients to cross the decision boundary. PGD attack, instead of taking one big step, repeatedly modifies the input image with multiple small steps. There is also another type of attack called the target attack. This adversarial attack is designed to make the machine classify an image to a class chosen by the attacker. We can defend against adversarial attacks by incorporating adversarial examples in training. Specifically, instead of training the neural network with clean examples, we can explicitly let the neural network learn from the adversarial examples. In our experiments, the digit recognition accuracy on the MNIST dataset drops from 97.81% to 39.50% and 34.01% when the DNN is attacked by FGSM and PGD attacks, respectively. If we utilize FGSM training as a defense method, the classification accuracy greatly improves from 39.50% to 92.31% for FGSM attacks and from 34.01% to 75.63% for PGD attacks. To further improve the classification accuracy under adversarial attacks, we can also use a stronger PGD training method. PGD training improves the accuracy by 2.7% under FGSM attacks and 18.4% under PGD attacks over FGSM training. It is worth mentioning that both FGSM and PGD training do not affect the accuracy of clean images. In summary, we find that PGD attacks can greatly degrade the performance of DNNs, and PGD training is a very effective way to defend against such attacks. PGD attacks and defence are overall significantly more effective than FGSM methods.

Keywords: deep neural network, adversarial attack, adversarial defense, adversarial machine learning

Procedia PDF Downloads 172
1779 Challenges of Teaching and Learning English Speech Sounds in Five Selected Secondary Schools in Bauchi, Bauchi State, Nigeria

Authors: Mairo Musa Galadima, Phoebe Mshelia

Abstract:

In Nigeria, the national policy of education stipulates that the kindergarten-primary schools and the legislature are to use the three popular Nigerian Languages namely: Hausa, Igbo, and Yoruba. However, the English language seems to be preferred and this calls for this paper. Attempts were made to draw out the challenges faced by learners in understanding English speech sounds and using them to communicate effectively in English; using 5 (five) selected secondary school in Bauchi. It was discovered that challenges abound in the wrong use of stress and intonation, transfer of phonetic features from their first language. Others are inadequately qualified teachers and relevant materials including textbooks. It is recommended that teachers of English should lay more emphasis on the teaching of supra-segmental features and should be encouraged to go for further studies, seminars and refresher courses.

Keywords: stress and intonation, phonetic and challenges, teaching and learning English, secondary schools

Procedia PDF Downloads 330
1778 A Recognition Method for Spatio-Temporal Background in Korean Historical Novels

Authors: Seo-Hee Kim, Kee-Won Kim, Seung-Hoon Kim

Abstract:

The most important elements of a novel are the characters, events and background. The background represents the time, place and situation that character appears, and conveys event and atmosphere more realistically. If readers have the proper knowledge about background of novels, it may be helpful for understanding the atmosphere of a novel and choosing a novel that readers want to read. In this paper, we are targeting Korean historical novels because spatio-temporal background especially performs an important role in historical novels among the genre of Korean novels. To the best of our knowledge, we could not find previous study that was aimed at Korean novels. In this paper, we build a Korean historical national dictionary. Our dictionary has historical places and temple names of kings over many generations as well as currently existing spatial words or temporal words in Korean history. We also present a method for recognizing spatio-temporal background based on patterns of phrasal words in Korean sentences. Our rules utilize postposition for spatial background recognition and temple names for temporal background recognition. The knowledge of the recognized background can help readers to understand the flow of events and atmosphere, and can use to visualize the elements of novels.

Keywords: data mining, Korean historical novels, Korean linguistic feature, spatio-temporal background

Procedia PDF Downloads 250
1777 Grid Pattern Recognition and Suppression in Computed Radiographic Images

Authors: Igor Belykh

Abstract:

Anti-scatter grids used in radiographic imaging for the contrast enhancement leave specific artifacts. Those artifacts may be visible or may cause Moiré effect when a digital image is resized on a diagnostic monitor. In this paper, we propose an automated grid artifacts detection and suppression algorithm which is still an actual problem. Grid artifacts detection is based on statistical approach in spatial domain. Grid artifacts suppression is based on Kaiser bandstop filter transfer function design and application avoiding ringing artifacts. Experimental results are discussed and concluded with description of advantages over existing approaches.

Keywords: grid, computed radiography, pattern recognition, image processing, filtering

Procedia PDF Downloads 257
1776 Alphabet Recognition Using Pixel Probability Distribution

Authors: Vaidehi Murarka, Sneha Mehta, Dishant Upadhyay

Abstract:

Our project topic is “Alphabet Recognition using pixel probability distribution”. The project uses techniques of Image Processing and Machine Learning in Computer Vision. Alphabet recognition is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files etc. Alphabet Recognition based OCR application is sometimes used in signature recognition which is used in bank and other high security buildings. One of the popular mobile applications includes reading a visiting card and directly storing it to the contacts. OCR's are known to be used in radar systems for reading speeders license plates and lots of other things. The implementation of our project has been done using Visual Studio and Open CV (Open Source Computer Vision). Our algorithm is based on Neural Networks (machine learning). The project was implemented in three modules: (1) Training: This module aims “Database Generation”. Database was generated using two methods: (a) Run-time generation included database generation at compilation time using inbuilt fonts of OpenCV library. Human intervention is not necessary for generating this database. (b) Contour–detection: ‘jpeg’ template containing different fonts of an alphabet is converted to the weighted matrix using specialized functions (contour detection and blob detection) of OpenCV. The main advantage of this type of database generation is that the algorithm becomes self-learning and the final database requires little memory to be stored (119kb precisely). (2) Preprocessing: Input image is pre-processed using image processing concepts such as adaptive thresholding, binarizing, dilating etc. and is made ready for segmentation. “Segmentation” includes extraction of lines, words, and letters from the processed text image. (3) Testing and prediction: The extracted letters are classified and predicted using the neural networks algorithm. The algorithm recognizes an alphabet based on certain mathematical parameters calculated using the database and weight matrix of the segmented image.

Keywords: contour-detection, neural networks, pre-processing, recognition coefficient, runtime-template generation, segmentation, weight matrix

Procedia PDF Downloads 366
1775 Grammatically Coded Corpus of Spoken Lithuanian: Methodology and Development

Authors: L. Kamandulytė-Merfeldienė

Abstract:

The paper deals with the main issues of methodology of the Corpus of Spoken Lithuanian which was started to be developed in 2006. At present, the corpus consists of 300,000 grammatically annotated word forms. The creation of the corpus consists of three main stages: collecting the data, the transcription of the recorded data, and the grammatical annotation. Collecting the data was based on the principles of balance and naturality. The recorded speech was transcribed according to the CHAT requirements of CHILDES. The transcripts were double-checked and annotated grammatically using CHILDES. The development of the Corpus of Spoken Lithuanian has led to the constant increase in studies on spontaneous communication, and various papers have dealt with a distribution of parts of speech, use of different grammatical forms, variation of inflectional paradigms, distribution of fillers, syntactic functions of adjectives, the mean length of utterances.

Keywords: CHILDES, corpus of spoken Lithuanian, grammatical annotation, grammatical disambiguation, lexicon, Lithuanian

Procedia PDF Downloads 215
1774 A Character Detection Method for Ancient Yi Books Based on Connected Components and Regressive Character Segmentation

Authors: Xu Han, Shanxiong Chen, Shiyu Zhu, Xiaoyu Lin, Fujia Zhao, Dingwang Wang

Abstract:

Character detection is an important issue for character recognition of ancient Yi books. The accuracy of detection directly affects the recognition effect of ancient Yi books. Considering the complex layout, the lack of standard typesetting and the mixed arrangement between images and texts, we propose a character detection method for ancient Yi books based on connected components and regressive character segmentation. First, the scanned images of ancient Yi books are preprocessed with nonlocal mean filtering, and then a modified local adaptive threshold binarization algorithm is used to obtain the binary images to segment the foreground and background for the images. Second, the non-text areas are removed by the method based on connected components. Finally, the single character in the ancient Yi books is segmented by our method. The experimental results show that the method can effectively separate the text areas and non-text areas for ancient Yi books and achieve higher accuracy and recall rate in the experiment of character detection, and effectively solve the problem of character detection and segmentation in character recognition of ancient books.

Keywords: CCS concepts, computing methodologies, interest point, salient region detections, image segmentation

Procedia PDF Downloads 107