Search results for: speech recognition performance
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 14737

Search results for: speech recognition performance

14647 The Convolution Recurrent Network of Using Residual LSTM to Process the Output of the Downsampling for Monaural Speech Enhancement

Authors: Shibo Wei, Ting Jiang

Abstract:

Convolutional-recurrent neural networks (CRN) have achieved much success recently in the speech enhancement field. The common processing method is to use the convolution layer to compress the feature space by multiple upsampling and then model the compressed features with the LSTM layer. At last, the enhanced speech is obtained by deconvolution operation to integrate the global information of the speech sequence. However, the feature space compression process may cause the loss of information, so we propose to model the upsampling result of each step with the residual LSTM layer, then join it with the output of the deconvolution layer and input them to the next deconvolution layer, by this way, we want to integrate the global information of speech sequence better. The experimental results show the network model (RES-CRN) we introduce can achieve better performance than LSTM without residual and overlaying LSTM simply in the original CRN in terms of scale-invariant signal-to-distortion ratio (SI-SNR), speech quality (PESQ), and intelligibility (STOI).

Keywords: convolutional-recurrent neural networks, speech enhancement, residual LSTM, SI-SNR

Procedia PDF Downloads 200
14646 Freedom of Speech and Involvement in Hatred Speech on Social Media Networks

Authors: Sara Chinnasamy, Michelle Gun, M. Adnan Hashim

Abstract:

Federal Constitution guarantees Malaysians the right to free speech and expression; yet hatred speech can be commonly found on social media platforms such as Facebook, Twitter, and Instagram. In Malaysia social media sphere, most hatred speech involves religion, race and politics. Recent cases of racial attacks on social media have created social tensions among Malaysians. Many Malaysians always argue on their rights to freedom of speech. However, there are laws that limit their expression to the public and protecting social media users from being a victim of hate speech. This paper aims to explore the attitude and involvement of Malaysian netizens towards freedom of speech and hatred speech on social media. It also examines the relationship between involvement in hatred speech among Malaysian netizens and attitude towards freedom of speech. For most Malaysians, practicing total freedom of speech in the open is unthinkable. As a result, the best channel to articulate their feelings and opinions liberally is the internet. With the advent of the internet medium, more and more Malaysians are conveying their viewpoints using the various internet channels although sensitivity of the audience is seldom taken into account. Consequently, this situation has led to pockets of social disharmony among the citizens. Although this unhealthy activity is denounced by the authority, netizens are generally of the view that they have the right to write anything they want. Using the quantitative method, survey was conducted among Malaysians aged between 18 and 50 years who are active social media users. Results from the survey reveal that despite a weak relationship level between hatred speech involvement on social media and attitude towards freedom of speech, the association is still considerably significant. As such, it can be safely presumed that hatred speech on social media occurs due to the freedom of speech that exists by way of social media channels.

Keywords: freedom of speech, hatred speech, social media, Malaysia, netizens

Procedia PDF Downloads 457
14645 Minimum Data of a Speech Signal as Special Indicators of Identification in Phonoscopy

Authors: Nazaket Gazieva

Abstract:

Voice biometric data associated with physiological, psychological and other factors are widely used in forensic phonoscopy. There are various methods for identifying and verifying a person by voice. This article explores the minimum speech signal data as individual parameters of a speech signal. Monozygotic twins are believed to be genetically identical. Using the minimum data of the speech signal, we came to the conclusion that the voice imprint of monozygotic twins is individual. According to the conclusion of the experiment, we can conclude that the minimum indicators of the speech signal are more stable and reliable for phonoscopic examinations.

Keywords: phonogram, speech signal, temporal characteristics, fundamental frequency, biometric fingerprints

Procedia PDF Downloads 144
14644 Makhraj Recognition Using Convolutional Neural Network

Authors: Zan Azma Nasruddin, Irwan Mazlin, Nor Aziah Daud, Fauziah Redzuan, Fariza Hanis Abdul Razak

Abstract:

This paper focuses on a machine learning that learn the correct pronunciation of Makhraj Huroofs. Usually, people need to find an expert to pronounce the Huroof accurately. In this study, the researchers have developed a system that is able to learn the selected Huroofs which are ha, tsa, zho, and dza using the Convolutional Neural Network. The researchers present the chosen type of the CNN architecture to make the system that is able to learn the data (Huroofs) as quick as possible and produces high accuracy during the prediction. The researchers have experimented the system to measure the accuracy and the cross entropy in the training process.

Keywords: convolutional neural network, Makhraj recognition, speech recognition, signal processing, tensorflow

Procedia PDF Downloads 335
14643 Hand Gesture Detection via EmguCV Canny Pruning

Authors: N. N. Mosola, S. J. Molete, L. S. Masoebe, M. Letsae

Abstract:

Hand gesture recognition is a technique used to locate, detect, and recognize a hand gesture. Detection and recognition are concepts of Artificial Intelligence (AI). AI concepts are applicable in Human Computer Interaction (HCI), Expert systems (ES), etc. Hand gesture recognition can be used in sign language interpretation. Sign language is a visual communication tool. This tool is used mostly by deaf societies and those with speech disorder. Communication barriers exist when societies with speech disorder interact with others. This research aims to build a hand recognition system for Lesotho’s Sesotho and English language interpretation. The system will help to bridge the communication problems encountered by the mentioned societies. The system has various processing modules. The modules consist of a hand detection engine, image processing engine, feature extraction, and sign recognition. Detection is a process of identifying an object. The proposed system uses Canny pruning Haar and Haarcascade detection algorithms. Canny pruning implements the Canny edge detection. This is an optimal image processing algorithm. It is used to detect edges of an object. The system employs a skin detection algorithm. The skin detection performs background subtraction, computes the convex hull, and the centroid to assist in the detection process. Recognition is a process of gesture classification. Template matching classifies each hand gesture in real-time. The system was tested using various experiments. The results obtained show that time, distance, and light are factors that affect the rate of detection and ultimately recognition. Detection rate is directly proportional to the distance of the hand from the camera. Different lighting conditions were considered. The more the light intensity, the faster the detection rate. Based on the results obtained from this research, the applied methodologies are efficient and provide a plausible solution towards a light-weight, inexpensive system which can be used for sign language interpretation.

Keywords: canny pruning, hand recognition, machine learning, skin tracking

Procedia PDF Downloads 185
14642 Working Conditions, Motivation and Job Performance of Hotel Workers

Authors: Thushel Jayaweera

Abstract:

In performance evaluation literature, there has been no investigation indicating the impact of job characteristics, working conditions and motivation on the job performance among the hotel workers in Britain. This study tested the relationship between working conditions (physical and psychosocial working conditions) and job performance (task and contextual performance) with motivators (e.g. recognition, achievement, the work itself, the possibility for growth and work significance) as the mediating variable. A total of 254 hotel workers in 25 hotels in Bristol, United Kingdom participated in this study. Working conditions influenced job performance and motivation moderated the relationship between working conditions and job performance. Poor workplace conditions resulted in decreasing employee performance. The results point to the importance of motivators among hotel workers and highlighted that work be designed to provide recognition and sense of autonomy on the job to enhance job performance of the hotel workers. These findings have implications for organizational interventions aimed at increasing employee job performance.

Keywords: hotel workers, working conditions, motivation, job characteristics, job performance

Procedia PDF Downloads 598
14641 A Preliminary Analysis of The Effect After Cochlear Implantation in the Unilateral Hearing Loss

Authors: Haiqiao Du, Qian Wang, Shuwei Wang, Jianan Li

Abstract:

Purpose: The aim is to evaluate the effect of cochlear implantation (CI) in patients with unilateral hearing loss, with a view to providing data support for the selection of therapeutic interventions for patients with single-sided deafness (SSD)/asymmetric hearing loss (AHL) and the broadening of the indications for CI. Methods: The study subjects were patients with unilateral hearing loss who underwent cochlear implantation surgery in our hospital in August 2022 and were willing to cooperate with the test and were divided into 2 groups: SSD group and AHL group. The enrolled patients were followed up for hearing level, tinnitus changes, speech recognition ability, sound source localization ability, and quality of life at five-time points: preoperatively, and 1, 3, 6, and 12 months after postoperative start-up. Results: As of June 30, 2024, a total of nine patients completed follow-up, including four in the SSD group and five in the AHL group. The mean postoperative hearing aid thresholds on the CI side were 31.56 dB HL and 34.75 dB HL in the two groups, respectively. Of the four patients with preoperative tinnitus symptoms (three patients in the SSD group and one patient in the AHL group), all showed a degree of reduction in Tinnitus Handicap Inventory (THI) scores, except for one patient who showed no change. In both the SSD and AHL groups, the sound source localization results (expressed as RMS error values, with smaller values indicating better ability) were 66.87° and 77.41° preoperatively and 29.34° and 54.60° 12 months after postoperative start-up, respectively, which showed that the ability to localize the sound source improved significantly with longer implantation time. The level of speech recognition was assessed by 3 test methods: speech recognition rate of monosyllabic words in a quiet environment and speech recognition rate of different sound source directions at 0° and 90° (implantation side) in a noisy environment. The results of the 3 tests were 99.0%, 72.0%, and 36.0% in the preoperative SSD group and 96.0%, 83.6%, and 73.8% in the AHL group, respectively, whereas they fluctuated in the postoperative period 3 months after start-up, and stabilized at 12 months after start-up to 99.0%, 100.0%, and 100.0% in the SSD group and 99.5%, 96.0%, and 99.0%. Quality of life was subjectively evaluated by three tests: the Speech Spatial Quality of Sound Auditory Scale (SSQ-12), the Quality-of-Life Bilateral Listening Questionnaire (QLBHE), and the Nijmegen Cochlear Implantation Inventory (NCIQ). The results of the SSQ-12 (with a 10-point score out of 10) showed that the scores of preoperative and postoperative 12 months after start-up were 6.35 and 6.46 in the SSD group, while they were 5.61 and 9.83 in the AHL group. The QLBHE scores (100 points out of 100) were 61.0 and 76.0 in the SSD group and 53.4 and 63.7 in the AHL group for the preoperative versus the postoperative 12 months after start-up. Conclusion: Patients with unilateral hearing loss can benefit from cochlear implantation: CI implantation is effective in compensating for the hearing on the affected side and reduces the accompanying tinnitus symptoms; there is a significant improvement in sound source localization and speech recognition in the presence of noise; and the quality of life is improved.

Keywords: single-sided deafness, asymmetric hearing loss, cochlear implant, unilateral hearing loss

Procedia PDF Downloads 14
14640 Intervention of Self-Limiting L1 Inner Speech during L2 Presentations: A Study of Bangla-English Bilinguals

Authors: Abdul Wahid

Abstract:

Inner speech, also known as verbal thinking, self-talk or private speech, is characterized by the subjective language experience in the absence of overt or audible speech. It is a psychological form of verbal activity which is being rehearsed without the articulation of any sound wave. In Psychology, self-limiting speech means the type of speech which contains information that inhibits the development of the self. People, in most cases, experience inner speech in their first language. It is very frequent in Bangladesh where the Bangla (L1) speaking students lose track of speech during their presentations in English (L2). This paper investigates into the long pauses (more than 0.4 seconds long) in English (L2) presentations by Bangla speaking students (18-21 year old) and finds the intervention of Bangla (L1) inner speech as one of its causes. The overt speeches of the presenters are placed on Audacity Audio Editing software where the length of pauses are measured in milliseconds. Varieties of inner speech questionnaire (VISQ) have been conducted randomly amongst the participants out of whom 20 were selected who have similar phenomenology of inner speech. They have been interviewed to describe the type and content of the voices that went on in their head during the long pauses. The qualitative interview data are then codified and converted into quantitative data. It was observed that in more than 80% cases students experience self-limiting inner speech/self-talk during their unwanted pauses in L2 presentations.

Keywords: Bangla-English Bilinguals, inner speech, L1 intervention in bilingualism, motor schema, pauses, phonological loop, phonological store, working memory

Procedia PDF Downloads 152
14639 Hate Speech Detection Using Machine Learning: A Survey

Authors: Edemealem Desalegn Kingawa, Kafte Tasew Timkete, Mekashaw Girmaw Abebe, Terefe Feyisa, Abiyot Bitew Mihretie, Senait Teklemarkos Haile

Abstract:

Currently, hate speech is a growing challenge for society, individuals, policymakers, and researchers, as social media platforms make it easy to anonymously create and grow online friends and followers and provide an online forum for debate about specific issues of community life, culture, politics, and others. Despite this, research on identifying and detecting hate speech is not satisfactory performance, and this is why future research on this issue is constantly called for. This paper provides a systematic review of the literature in this field, with a focus on approaches like word embedding techniques, machine learning, deep learning technologies, hate speech terminology, and other state-of-the-art technologies with challenges. In this paper, we have made a systematic review of the last six years of literature from Research Gate and Google Scholar. Furthermore, limitations, along with algorithm selection and use challenges, data collection, and cleaning challenges, and future research directions, are discussed in detail.

Keywords: Amharic hate speech, deep learning approach, hate speech detection review, Afaan Oromo hate speech detection

Procedia PDF Downloads 177
14638 Enhanced Thai Character Recognition with Histogram Projection Feature Extraction

Authors: Benjawan Rangsikamol, Chutimet Srinilta

Abstract:

This research paper deals with extraction of Thai character features using the proposed histogram projection so as to improve the recognition performance. The process starts with transformation of image files into binary files before thinning. After character thinning, the skeletons are entered into the proposed extraction using histogram projection (horizontal and vertical) to extract unique features which are inputs of the subsequent recognition step. The recognition rate with the proposed extraction technique is as high as 97 percent since the technique works very well with the idiosyncrasies of Thai characters.

Keywords: character recognition, histogram projection, multilayer perceptron, Thai character features extraction

Procedia PDF Downloads 464
14637 Eisenhower’s Farewell Speech: Initial and Continuing Communication Effects

Authors: B. Kuiper

Abstract:

When Dwight D. Eisenhower delivered his final Presidential speech in 1961, he was using the opportunity to bid farewell to America, but he was also trying to warn his fellow countrymen about deeper challenges threatening the country. In this analysis, Eisenhower’s speech is examined in light of the impact it had on American culture, communication concepts, and political ramifications. The paper initially highlights the previous literature on the speech, especially in light of its 50th anniversary, and reveals a man whose main concern was how the speech’s words would affect his beloved country. The painstaking approach to the wording of the speech to reveal the intent is key, particularly in light of analyzing the motivations according to “virtuous communication.” This philosophical construct indicates that Eisenhower’s Farewell Address was crafted carefully according to a departing President’s deepest values and concerns, concepts that he wanted to pass along to his successor, to his country, and even to the world.

Keywords: Eisenhower, mass communication, political speech, rhetoric

Procedia PDF Downloads 274
14636 A Sparse Representation Speech Denoising Method Based on Adapted Stopping Residue Error

Authors: Qianhua He, Weili Zhou, Aiwu Chen

Abstract:

A sparse representation speech denoising method based on adapted stopping residue error was presented in this paper. Firstly, the cross-correlation between the clean speech spectrum and the noise spectrum was analyzed, and an estimation method was proposed. In the denoising method, an over-complete dictionary of the clean speech power spectrum was learned with the K-singular value decomposition (K-SVD) algorithm. In the sparse representation stage, the stopping residue error was adaptively achieved according to the estimated cross-correlation and the adjusted noise spectrum, and the orthogonal matching pursuit (OMP) approach was applied to reconstruct the clean speech spectrum from the noisy speech. Finally, the clean speech was re-synthesised via the inverse Fourier transform with the reconstructed speech spectrum and the noisy speech phase. The experiment results show that the proposed method outperforms the conventional methods in terms of subjective and objective measure.

Keywords: speech denoising, sparse representation, k-singular value decomposition, orthogonal matching pursuit

Procedia PDF Downloads 499
14635 Handwriting Recognition of Gurmukhi Script: A Survey of Online and Offline Techniques

Authors: Ravneet Kaur

Abstract:

Character recognition is a very interesting area of pattern recognition. From past few decades, an intensive research on character recognition for Roman, Chinese, and Japanese and Indian scripts have been reported. In this paper, a review of Handwritten Character Recognition work on Indian Script Gurmukhi is being highlighted. Most of the published papers were summarized, various methodologies were analysed and their results are reported.

Keywords: Gurmukhi character recognition, online, offline, HCR survey

Procedia PDF Downloads 424
14634 Improving Activity Recognition Classification of Repetitious Beginner Swimming Using a 2-Step Peak/Valley Segmentation Method with Smoothing and Resampling for Machine Learning

Authors: Larry Powell, Seth Polsley, Drew Casey, Tracy Hammond

Abstract:

Human activity recognition (HAR) systems have shown positive performance when recognizing repetitive activities like walking, running, and sleeping. Water-based activities are a reasonably new area for activity recognition. However, water-based activity recognition has largely focused on supporting the elite and competitive swimming population, which already has amazing coordination and proper form. Beginner swimmers are not perfect, and activity recognition needs to support the individual motions to help beginners. Activity recognition algorithms are traditionally built around short segments of timed sensor data. Using a time window input can cause performance issues in the machine learning model. The window’s size can be too small or large, requiring careful tuning and precise data segmentation. In this work, we present a method that uses a time window as the initial segmentation, then separates the data based on the change in the sensor value. Our system uses a multi-phase segmentation method that pulls all peaks and valleys for each axis of an accelerometer placed on the swimmer’s lower back. This results in high recognition performance using leave-one-subject-out validation on our study with 20 beginner swimmers, with our model optimized from our final dataset resulting in an F-Score of 0.95.

Keywords: time window, peak/valley segmentation, feature extraction, beginner swimming, activity recognition

Procedia PDF Downloads 123
14633 OCR/ICR Text Recognition Using ABBYY FineReader as an Example Text

Authors: A. R. Bagirzade, A. Sh. Najafova, S. M. Yessirkepova, E. S. Albert

Abstract:

This article describes a text recognition method based on Optical Character Recognition (OCR). The features of the OCR method were examined using the ABBYY FineReader program. It describes automatic text recognition in images. OCR is necessary because optical input devices can only transmit raster graphics as a result. Text recognition describes the task of recognizing letters shown as such, to identify and assign them an assigned numerical value in accordance with the usual text encoding (ASCII, Unicode). The peculiarity of this study conducted by the authors using the example of the ABBYY FineReader, was confirmed and shown in practice, the improvement of digital text recognition platforms developed by Electronic Publication.

Keywords: ABBYY FineReader system, algorithm symbol recognition, OCR/ICR techniques, recognition technologies

Procedia PDF Downloads 168
14632 An Early Attempt of Artificial Intelligence-Assisted Language Oral Practice and Assessment

Authors: Paul Lam, Kevin Wong, Chi Him Chan

Abstract:

Constant practicing and accurate, immediate feedback are the keys to improving students’ speaking skills. However, traditional oral examination often fails to provide such opportunities to students. The traditional, face-to-face oral assessment is often time consuming – attending the oral needs of one student often leads to the negligence of others. Hence, teachers can only provide limited opportunities and feedback to students. Moreover, students’ incentive to practice is also reduced by their anxiety and shyness in speaking the new language. A mobile app was developed to use artificial intelligence (AI) to provide immediate feedback to students’ speaking performance as an attempt to solve the above-mentioned problems. Firstly, it was thought that online exercises would greatly increase the learning opportunities of students as they can now practice more without the needs of teachers’ presence. Secondly, the automatic feedback provided by the AI would enhance students’ motivation to practice as there is an instant evaluation of their performance. Lastly, students should feel less anxious and shy compared to directly practicing oral in front of teachers. Technically, the program made use of speech-to-text functions to generate feedback to students. To be specific, the software analyzes students’ oral input through certain speech-to-text AI engine and then cleans up the results further to the point that can be compared with the targeted text. The mobile app has invited English teachers for the pilot use and asked for their feedback. Preliminary trials indicated that the approach has limitations. Many of the users’ pronunciation were automatically corrected by the speech recognition function as wise guessing is already integrated into many of such systems. Nevertheless, teachers have confidence that the app can be further improved for accuracy. It has the potential to significantly improve oral drilling by giving students more chances to practice. Moreover, they believe that the success of this mobile app confirms the potential to extend the AI-assisted assessment to other language skills, such as writing, reading, and listening.

Keywords: artificial Intelligence, mobile learning, oral assessment, oral practice, speech-to-text function

Procedia PDF Downloads 103
14631 Speech Perception by Monolingual and Bilingual Dravidian Speakers under Adverse Listening Conditions

Authors: S. B. Rathna Kumar, Sale Kranthi, Sandya K. Varudhini

Abstract:

The precise perception of spoken language is influenced by several variables, including the listeners’ native language, distance between speaker and listener, reverberation and background noise. When noise is present in an acoustic environment, it masks the speech signal resulting in reduction in the redundancy of the acoustic and linguistic cues of speech. There is strong evidence that bilinguals face difficulty in speech perception for their second language compared with monolingual speakers under adverse listening conditions such as presence of background noise. This difficulty persists even for speakers who are highly proficient in their second language and is greater in those who have learned the second language later in life. The present study aimed to assess the performance of monolingual (Telugu speaking) and bilingual (Tamil as first language and Telugu as second language) speakers on Telugu speech perception task under quiet and noisy environments. The results indicated that both the groups performed similar in both quiet and noisy environments. The findings of the present study are not in accordance with the findings of previous studies which strongly report poorer speech perception in adverse listening conditions such as noise with bilingual speakers for their second language compared with monolinguals.

Keywords: monolingual, bilingual, second language, speech perception, quiet, noise

Procedia PDF Downloads 389
14630 Dual-Channel Multi-Band Spectral Subtraction Algorithm Dedicated to a Bilateral Cochlear Implant

Authors: Fathi Kallel, Ahmed Ben Hamida, Christian Berger-Vachon

Abstract:

In this paper, a Speech Enhancement Algorithm based on Multi-Band Spectral Subtraction (MBSS) principle is evaluated for Bilateral Cochlear Implant (BCI) users. Specifically, dual-channel noise power spectral estimation algorithm using Power Spectral Densities (PSD) and Cross Power Spectral Densities (CPSD) of the observed signals is studied. The enhanced speech signal is obtained using Dual-Channel Multi-Band Spectral Subtraction ‘DC-MBSS’ algorithm. For performance evaluation, objective speech assessment test relying on Perceptual Evaluation of Speech Quality (PESQ) score is performed to fix the optimal number of frequency bands needed in DC-MBSS algorithm. In order to evaluate the speech intelligibility, subjective listening tests are assessed with 3 deafened BCI patients. Experimental results obtained using French Lafon database corrupted by an additive babble noise at different Signal-to-Noise Ratios (SNR) showed that DC-MBSS algorithm improves speech understanding for single and multiple interfering noise sources.

Keywords: speech enhancement, spectral substracion, noise estimation, cochlear impalnt

Procedia PDF Downloads 549
14629 Printed Thai Character Recognition Using Particle Swarm Optimization Algorithm

Authors: Phawin Sangsuvan, Chutimet Srinilta

Abstract:

This Paper presents the applications of Particle Swarm Optimization (PSO) Method for Thai optical character recognition (OCR). OCR consists of the pre-processing, character recognition and post-processing. Before enter into recognition process. The Character must be “Prepped” by pre-processing process. The PSO is an optimization method that belongs to the swarm intelligence family based on the imitation of social behavior patterns of animals. Route of each particle is determined by an individual data among neighborhood particles. The interaction of the particles with neighbors is the advantage of Particle Swarm to determine the best solution. So PSO is interested by a lot of researchers in many difficult problems including character recognition. As the previous this research used a Projection Histogram to extract printed digits features and defined the simple Fitness Function for PSO. The results reveal that PSO gives 67.73% for testing dataset. So in the future there can be explored enhancement the better performance of PSO with improve the Fitness Function.

Keywords: character recognition, histogram projection, particle swarm optimization, pattern recognition techniques

Procedia PDF Downloads 477
14628 An Improved OCR Algorithm on Appearance Recognition of Electronic Components Based on Self-adaptation of Multifont Template

Authors: Zhu-Qing Jia, Tao Lin, Tong Zhou

Abstract:

The recognition method of Optical Character Recognition has been expensively utilized, while it is rare to be employed specifically in recognition of electronic components. This paper suggests a high-effective algorithm on appearance identification of integrated circuit components based on the existing methods of character recognition, and analyze the pros and cons.

Keywords: optical character recognition, fuzzy page identification, mutual correlation matrix, confidence self-adaptation

Procedia PDF Downloads 540
14627 Speech Acts and Politeness Strategies in an EFL Classroom in Georgia

Authors: Tinatin Kurdghelashvili

Abstract:

The paper deals with the usage of speech acts and politeness strategies in an EFL classroom in Georgia (Rep of). It explores the students’ and the teachers’ practice of the politeness strategies and the speech acts of apology, thanking, request, compliment/encouragement, command, agreeing/disagreeing, addressing and code switching. The research method includes observation as well as a questionnaire. The target group involves the students from Georgian public schools and two certified, experienced local English teachers. The analysis is based on Searle’s Speech Act Theory and Brown and Levinson’s politeness strategies. The findings show that the students have certain knowledge regarding politeness yet they fail to apply them in English communication. In addition, most of the speech acts from the classroom interaction are used by the teachers and not the students. Thereby, it is suggested that teachers should cultivate the students’ communicative competence and attempt to give them opportunities to practice more English speech acts than they do today.

Keywords: english as a foreign language, Georgia, politeness principles, speech acts

Procedia PDF Downloads 636
14626 The Importance of Visual Communication in Artificial Intelligence

Authors: Manjitsingh Rajput

Abstract:

Visual communication plays an important role in artificial intelligence (AI) because it enables machines to understand and interpret visual information, similar to how humans do. This abstract explores the importance of visual communication in AI and emphasizes the importance of various applications such as computer vision, object emphasis recognition, image classification and autonomous systems. In going deeper, with deep learning techniques and neural networks that modify visual understanding, In addition to AI programming, the abstract discusses challenges facing visual interfaces for AI, such as data scarcity, domain optimization, and interpretability. Visual communication and other approaches, such as natural language processing and speech recognition, have also been explored. Overall, this abstract highlights the critical role that visual communication plays in advancing AI capabilities and enabling machines to perceive and understand the world around them. The abstract also explores the integration of visual communication with other modalities like natural language processing and speech recognition, emphasizing the critical role of visual communication in AI capabilities. This methodology explores the importance of visual communication in AI development and implementation, highlighting its potential to enhance the effectiveness and accessibility of AI systems. It provides a comprehensive approach to integrating visual elements into AI systems, making them more user-friendly and efficient. In conclusion, Visual communication is crucial in AI systems for object recognition, facial analysis, and augmented reality, but challenges like data quality, interpretability, and ethics must be addressed. Visual communication enhances user experience, decision-making, accessibility, and collaboration. Developers can integrate visual elements for efficient and accessible AI systems.

Keywords: visual communication AI, computer vision, visual aid in communication, essence of visual communication.

Procedia PDF Downloads 95
14625 The Influence of Advertising Captions on the Internet through the Consumer Purchasing Decision

Authors: Suwimol Apapol, Punrapha Praditpong

Abstract:

The objectives of the study were to find out the frequencies of figures of speech in fragrance advertising captions as well as the types of figures of speech most commonly applied in captions. The relation between figures of speech and fragrance was also examined in order to analyze how figures of speech were used to represent fragrance. Thirty-five fragrance advertisements were randomly selected from the Internet. Content analysis was applied in order to consider the relation between figures of speech and fragrance. The results showed that figures of speech were found in almost every fragrance advertisement except one advertisement of several Goods service. Thirty-four fragrance advertising captions used at least one kind of figure of speech. Metaphor was most frequently found and also most frequently applied in fragrance advertising captions, followed by alliteration, rhyme, simile and personification, and hyperbole respectively which is in harmony with the research hypotheses as well.

Keywords: advertising captions, captions on internet, consumer purchasing decision, e-commerce

Procedia PDF Downloads 270
14624 An Improved Face Recognition Algorithm Using Histogram-Based Features in Spatial and Frequency Domains

Authors: Qiu Chen, Koji Kotani, Feifei Lee, Tadahiro Ohmi

Abstract:

In this paper, we propose an improved face recognition algorithm using histogram-based features in spatial and frequency domains. For adding spatial information of the face to improve recognition performance, a region-division (RD) method is utilized. The facial area is firstly divided into several regions, then feature vectors of each facial part are generated by Binary Vector Quantization (BVQ) histogram using DCT coefficients in low frequency domains, as well as Local Binary Pattern (LBP) histogram in spatial domain. Recognition results with different regions are first obtained separately and then fused by weighted averaging. Publicly available ORL database is used for the evaluation of our proposed algorithm, which is consisted of 40 subjects with 10 images per subject containing variations in lighting, posing, and expressions. It is demonstrated that face recognition using RD method can achieve much higher recognition rate.

Keywords: binary vector quantization (BVQ), DCT coefficients, face recognition, local binary patterns (LBP)

Procedia PDF Downloads 349
14623 Effect Analysis of an Improved Adaptive Speech Noise Reduction Algorithm in Online Communication Scenarios

Authors: Xingxing Peng

Abstract:

With the development of society, there are more and more online communication scenarios such as teleconference and online education. In the process of conference communication, the quality of voice communication is a very important part, and noise may cause the communication effect of participants to be greatly reduced. Therefore, voice noise reduction has an important impact on scenarios such as voice calls. This research focuses on the key technologies of the sound transmission process. The purpose is to maintain the audio quality to the maximum so that the listener can hear clearer and smoother sound. Firstly, to solve the problem that the traditional speech enhancement algorithm is not ideal when dealing with non-stationary noise, an adaptive speech noise reduction algorithm is studied in this paper. Traditional noise estimation methods are mainly used to deal with stationary noise. In this chapter, we study the spectral characteristics of different noise types, especially the characteristics of non-stationary Burst noise, and design a noise estimator module to deal with non-stationary noise. Noise features are extracted from non-speech segments, and the noise estimation module is adjusted in real time according to different noise characteristics. This adaptive algorithm can enhance speech according to different noise characteristics, improve the performance of traditional algorithms to deal with non-stationary noise, so as to achieve better enhancement effect. The experimental results show that the algorithm proposed in this chapter is effective and can better adapt to different types of noise, so as to obtain better speech enhancement effect.

Keywords: speech noise reduction, speech enhancement, self-adaptation, Wiener filter algorithm

Procedia PDF Downloads 57
14622 Prosodic Characteristics of Post Traumatic Stress Disorder Induced Speech Changes

Authors: Jarek Krajewski, Andre Wittenborn, Martin Sauerland

Abstract:

This abstract describes a promising approach for estimating post-traumatic stress disorder (PTSD) based on prosodic speech characteristics. It illustrates the validity of this method by briefly discussing results from an Arabic refugee sample (N= 47, 32 m, 15 f). A well-established standardized self-report scale “Reaction of Adolescents to Traumatic Stress” (RATS) was used to determine the ground truth level of PTSD. The speech material was prompted by telling about autobiographical related sadness inducing experiences (sampling rate 16 kHz, 8 bit resolution). In order to investigate PTSD-induced speech changes, a self-developed set of 136 prosodic speech features was extracted from the .wav files. This set was adapted to capture traumatization related speech phenomena. An artificial neural network (ANN) machine learning model was applied to determine the PTSD level and reached a correlation of r = .37. These results indicate that our classifiers can achieve similar results to those seen in speech-based stress research.

Keywords: speech prosody, PTSD, machine learning, feature extraction

Procedia PDF Downloads 90
14621 A Mixing Matrix Estimation Algorithm for Speech Signals under the Under-Determined Blind Source Separation Model

Authors: Jing Wu, Wei Lv, Yibing Li, Yuanfan You

Abstract:

The separation of speech signals has become a research hotspot in the field of signal processing in recent years. It has many applications and influences in teleconferencing, hearing aids, speech recognition of machines and so on. The sounds received are usually noisy. The issue of identifying the sounds of interest and obtaining clear sounds in such an environment becomes a problem worth exploring, that is, the problem of blind source separation. This paper focuses on the under-determined blind source separation (UBSS). Sparse component analysis is generally used for the problem of under-determined blind source separation. The method is mainly divided into two parts. Firstly, the clustering algorithm is used to estimate the mixing matrix according to the observed signals. Then the signal is separated based on the known mixing matrix. In this paper, the problem of mixing matrix estimation is studied. This paper proposes an improved algorithm to estimate the mixing matrix for speech signals in the UBSS model. The traditional potential algorithm is not accurate for the mixing matrix estimation, especially for low signal-to noise ratio (SNR).In response to this problem, this paper considers the idea of an improved potential function method to estimate the mixing matrix. The algorithm not only avoids the inuence of insufficient prior information in traditional clustering algorithm, but also improves the estimation accuracy of mixing matrix. This paper takes the mixing of four speech signals into two channels as an example. The results of simulations show that the approach in this paper not only improves the accuracy of estimation, but also applies to any mixing matrix.

Keywords: DBSCAN, potential function, speech signal, the UBSS model

Procedia PDF Downloads 135
14620 An Algorithm Based on the Nonlinear Filter Generator for Speech Encryption

Authors: A. Belmeguenai, K. Mansouri, R. Djemili

Abstract:

This work present a new algorithm based on the nonlinear filter generator for speech encryption and decryption. The proposed algorithm consists on the use a linear feedback shift register (LFSR) whose polynomial is primitive and nonlinear Boolean function. The purpose of this system is to construct Keystream with good statistical properties, but also easily computable on a machine with limited capacity calculated. This proposed speech encryption scheme is very simple, highly efficient, and fast to implement the speech encryption and decryption. We conclude the paper by showing that this system can resist certain known attacks.

Keywords: nonlinear filter generator, stream ciphers, speech encryption, security analysis

Procedia PDF Downloads 296
14619 Using Augmented Reality to Enhance Doctor Patient Communication

Authors: Rutusha Bhutada, Gaurav Chavan, Sarvesh Kasat, Varsha Mujumdar

Abstract:

This software system will be an Augmented Reality application designed to maximize the doctor’s productivity by providing tools to assist in automating the patient recognition and updating patient’s records using face and voice recognition features, which would otherwise have to be performed manually. By maximizing the doctor’s work efficiency and production, the application will meet the doctor’s needs while remaining easy to understand and use. More specifically, this application is designed to allow a doctor to manage his productive time in handling the patient without losing eye-contact with him and communicate with a group of other doctors for consultation, for in-place treatments through video streaming, as a video study. The system also contains a relational database containing a list of doctor, patient and display techniques.

Keywords: augmented reality, hand-held devices, head-mounted devices, marker based systems, speech recognition, face detection

Procedia PDF Downloads 436
14618 An Evaluation of Neural Network Efficacies for Image Recognition on Edge-AI Computer Vision Platform

Authors: Jie Zhao, Meng Su

Abstract:

Image recognition, as one of the most critical technologies in computer vision, works to help machine-like robotics understand a scene, that is, if deployed appropriately, will trigger the revolution in remote sensing and industry automation. With the developments of AI technologies, there are many prevailing and sophisticated neural networks as technologies developed for image recognition. However, computer vision platforms as hardware, supporting neural networks for image recognition, as crucial as the neural network technologies, need to be more congruently addressed as the research subjects. In contrast, different computer vision platforms are deterministic to leverage the performance of different neural networks for recognition. In this paper, three different computer vision platforms – Jetson Nano(with 4GB), a standalone laptop(with RTX 3000s, using CUDA), and Google Colab (web-based, using GPU) are explored and four prominent neural network architectures (including AlexNet, VGG(16/19), GoogleNet, and ResNet(18/34/50)), are investigated. In the context of pairwise usage between different computer vision platforms and distinctive neural networks, with the merits of recognition accuracy and time efficiency, the performances are evaluated. In the case study using public imageNets, our findings provide a nuanced perspective on optimizing image recognition tasks across Edge-AI platforms, offering guidance on selecting appropriate neural network structures to maximize performance under hardware constraints.

Keywords: alexNet, VGG, googleNet, resNet, Jetson nano, CUDA, COCO-NET, cifar10, imageNet large scale visual recognition challenge (ILSVRC), google colab

Procedia PDF Downloads 90