Search results for: speech segmentation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1137

Search results for: speech segmentation

987 Childhood Apraxia of Speech and Autism: Interaction Influences and Treatment

Authors: Elad Vashdi

Abstract:

It is common to find speech deficit among children diagnosed with Autism. It can be found in the clinical field and recently in research. One of the DSM-V criteria suggests a speech delay (Delay in, or total lack of, the development of spoken language), but doesn't explain the cause of it. A common perception among professionals and families is that the inability to talk results from the autism. Autism is a name for a syndrome which just describes a phenomenon and is defined behaviorally. Since it is not based yet on a physiological gold standard, one can not conclude the nature of a deficit based on the name of the syndrome. A wide retrospective research (n=270) which included children with motor speech difficulties was conducted in Israel. The study analyzed entry evaluations in a private clinic during the years 2006-2013. The data was extracted from the reports. High percentage of children diagnosed with Autism (60%) was found. This result demonstrates the high relationship between Autism and motor speech problem. It also supports recent findings in research of Childhood apraxia of speech (CAS) occurrence among children with ASD. Only small percentage of the participants in this research (10%) were diagnosed with CAS even though their verbal deficits well fitted the guidelines for CAS diagnosis set by ASHA in 2007. This fact raises questions regarding the diagnostic procedure in Israel. The understanding that CAS might highly exist within Autism and can have a remarkable influence on the course of early development should be a guiding tool within the diagnosis procedure. CAS can explain the nature of the speech problem among some of the autistic children and guide the treatment in a more accurate way. Calculating the prevalence of CAS which includes the comorbidity with ASD reveals new numbers and suggests treating differently the CAS population.

Keywords: childhood apraxia of speech, Autism, treatment, speech

Procedia PDF Downloads 249
986 Automatic Early Breast Cancer Segmentation Enhancement by Image Analysis and Hough Transform

Authors: David Jurado, Carlos Ávila

Abstract:

Detection of early signs of breast cancer development is crucial to quickly diagnose the disease and to define adequate treatment to increase the survival probability of the patient. Computer Aided Detection systems (CADs), along with modern data techniques such as Machine Learning (ML) and Neural Networks (NN), have shown an overall improvement in digital mammography cancer diagnosis, reducing the false positive and false negative rates becoming important tools for the diagnostic evaluations performed by specialized radiologists. However, ML and NN-based algorithms rely on datasets that might bring issues to the segmentation tasks. In the present work, an automatic segmentation and detection algorithm is described. This algorithm uses image processing techniques along with the Hough transform to automatically identify microcalcifications that are highly correlated with breast cancer development in the early stages. Along with image processing, automatic segmentation of high-contrast objects is done using edge extraction and circle Hough transform. This provides the geometrical features needed for an automatic mask design which extracts statistical features of the regions of interest. The results shown in this study prove the potential of this tool for further diagnostics and classification of mammographic images due to the low sensitivity to noisy images and low contrast mammographies.

Keywords: breast cancer, segmentation, X-ray imaging, hough transform, image analysis

Procedia PDF Downloads 44
985 Robust Electrical Segmentation for Zone Coherency Delimitation Base on Multiplex Graph Community Detection

Authors: Noureddine Henka, Sami Tazi, Mohamad Assaad

Abstract:

The electrical grid is a highly intricate system designed to transfer electricity from production areas to consumption areas. The Transmission System Operator (TSO) is responsible for ensuring the efficient distribution of electricity and maintaining the grid's safety and quality. However, due to the increasing integration of intermittent renewable energy sources, there is a growing level of uncertainty, which requires a faster responsive approach. A potential solution involves the use of electrical segmentation, which involves creating coherence zones where electrical disturbances mainly remain within the zone. Indeed, by means of coherent electrical zones, it becomes possible to focus solely on the sub-zone, reducing the range of possibilities and aiding in managing uncertainty. It allows faster execution of operational processes and easier learning for supervised machine learning algorithms. Electrical segmentation can be applied to various applications, such as electrical control, minimizing electrical loss, and ensuring voltage stability. Since the electrical grid can be modeled as a graph, where the vertices represent electrical buses and the edges represent electrical lines, identifying coherent electrical zones can be seen as a clustering task on graphs, generally called community detection. Nevertheless, a critical criterion for the zones is their ability to remain resilient to the electrical evolution of the grid over time. This evolution is due to the constant changes in electricity generation and consumption, which are reflected in graph structure variations as well as line flow changes. One approach to creating a resilient segmentation is to design robust zones under various circumstances. This issue can be represented through a multiplex graph, where each layer represents a specific situation that may arise on the grid. Consequently, resilient segmentation can be achieved by conducting community detection on this multiplex graph. The multiplex graph is composed of multiple graphs, and all the layers share the same set of vertices. Our proposal involves a model that utilizes a unified representation to compute a flattening of all layers. This unified situation can be penalized to obtain (K) connected components representing the robust electrical segmentation clusters. We compare our robust segmentation to the segmentation based on a single reference situation. The robust segmentation proves its relevance by producing clusters with high intra-electrical perturbation and low variance of electrical perturbation. We saw through the experiences when robust electrical segmentation has a benefit and in which context.

Keywords: community detection, electrical segmentation, multiplex graph, power grid

Procedia PDF Downloads 45
984 COVID-19 Detection from Computed Tomography Images Using UNet Segmentation, Region Extraction, and Classification Pipeline

Authors: Kenan Morani, Esra Kaya Ayana

Abstract:

This study aimed to develop a novel pipeline for COVID-19 detection using a large and rigorously annotated database of computed tomography (CT) images. The pipeline consists of UNet-based segmentation, lung extraction, and a classification part, with the addition of optional slice removal techniques following the segmentation part. In this work, a batch normalization was added to the original UNet model to produce lighter and better localization, which is then utilized to build a full pipeline for COVID-19 diagnosis. To evaluate the effectiveness of the proposed pipeline, various segmentation methods were compared in terms of their performance and complexity. The proposed segmentation method with batch normalization outperformed traditional methods and other alternatives, resulting in a higher dice score on a publicly available dataset. Moreover, at the slice level, the proposed pipeline demonstrated high validation accuracy, indicating the efficiency of predicting 2D slices. At the patient level, the full approach exhibited higher validation accuracy and macro F1 score compared to other alternatives, surpassing the baseline. The classification component of the proposed pipeline utilizes a convolutional neural network (CNN) to make final diagnosis decisions. The COV19-CT-DB dataset, which contains a large number of CT scans with various types of slices and rigorously annotated for COVID-19 detection, was utilized for classification. The proposed pipeline outperformed many other alternatives on the dataset.

Keywords: classification, computed tomography, lung extraction, macro F1 score, UNet segmentation

Procedia PDF Downloads 100
983 Data-Driven Market Segmentation in Hospitality Using Unsupervised Machine Learning

Authors: Rik van Leeuwen, Ger Koole

Abstract:

Within hospitality, marketing departments use segmentation to create tailored strategies to ensure personalized marketing. This study provides a data-driven approach by segmenting guest profiles via hierarchical clustering based on an extensive set of features. The industry requires understandable outcomes that contribute to adaptability for marketing departments to make data-driven decisions and ultimately driving profit. A marketing department specified a business question that guides the unsupervised machine learning algorithm. Features of guests change over time; therefore, there is a probability that guests transition from one segment to another. The purpose of the study is to provide steps in the process from raw data to actionable insights, which serve as a guideline for how hospitality companies can adopt an algorithmic approach.

Keywords: hierarchical cluster analysis, hospitality, market segmentation

Procedia PDF Downloads 75
982 Market Segmentation and Conjoint Analysis for Apple Family Design

Authors: Abbas Al-Refaie, Nour Bata

Abstract:

A distributor of Apple products' experiences numerous difficulties in developing marketing strategies for new and existing mobile product entries that maximize customer satisfaction and the firm's profitability. This research, therefore, integrates market segmentation in platform-based product family design and conjoint analysis to identify iSystem combinations that increase customer satisfaction and business profits. First, the enhanced market segmentation grid is created. Then, the estimated demand model is formulated. Finally, the profit models are constructed then used to determine the ideal product family design that maximizes profit. Conjoint analysis is used to explore customer preferences with their satisfaction levels. A total of 200 surveys are collected about customer preferences. Then, simulation is used to determine the importance values for each attribute. Finally, sensitivity analysis is conducted to determine the product family design that maximizes both objectives. In conclusion, the results of this research shall provide great support to Apple distributors in determining the best marketing strategies that enhance their market share.

Keywords: market segmentation, conjoint analysis, market strategies, optimization

Procedia PDF Downloads 328
981 LGG Architecture for Brain Tumor Segmentation Using Convolutional Neural Network

Authors: Sajeeha Ansar, Asad Ali Safi, Sheikh Ziauddin, Ahmad R. Shahid, Faraz Ahsan

Abstract:

The most aggressive form of brain tumor is called glioma. Glioma is kind of tumor that arises from glial tissue of the brain and occurs quite often. A fully automatic 2D-CNN model for brain tumor segmentation is presented in this paper. We performed pre-processing steps to remove noise and intensity variances using N4ITK and standard intensity correction, respectively. We used Keras open-source library with Theano as backend for fast implementation of CNN model. In addition, we used BRATS 2015 MRI dataset to evaluate our proposed model. Furthermore, we have used SimpleITK open-source library in our proposed model to analyze images. Moreover, we have extracted random 2D patches for proposed 2D-CNN model for efficient brain segmentation. Extracting 2D patched instead of 3D due to less dimensional information present in 2D which helps us in reducing computational time. Dice Similarity Coefficient (DSC) is used as performance measure for the evaluation of the proposed method. Our method achieved DSC score of 0.77 for complete, 0.76 for core, 0.77 for enhanced tumor regions. However, these results are comparable with methods already implemented 2D CNN architecture.

Keywords: brain tumor segmentation, convolutional neural networks, deep learning, LGG

Procedia PDF Downloads 156
980 Best-Performing Color Space for Land-Sea Segmentation Using Wavelet Transform Color-Texture Features and Fusion of over Segmentation

Authors: Seynabou Toure, Oumar Diop, Kidiyo Kpalma, Amadou S. Maiga

Abstract:

Color and texture are the two most determinant elements for perception and recognition of the objects in an image. For this reason, color and texture analysis find a large field of application, for example in image classification and segmentation. But, the pioneering work in texture analysis was conducted on grayscale images, thus discarding color information. Many grey-level texture descriptors have been proposed and successfully used in numerous domains for image classification: face recognition, industrial inspections, food science medical imaging among others. Taking into account color in the definition of these descriptors makes it possible to better characterize images. Color texture is thus the subject of recent work, and the analysis of color texture images is increasingly attracting interest in the scientific community. In optical remote sensing systems, sensors measure separately different parts of the electromagnetic spectrum; the visible ones and even those that are invisible to the human eye. The amounts of light reflected by the earth in spectral bands are then transformed into grayscale images. The primary natural colors Red (R) Green (G) and Blue (B) are then used in mixtures of different spectral bands in order to produce RGB images. Thus, good color texture discrimination can be achieved using RGB under controlled illumination conditions. Some previous works investigate the effect of using different color space for color texture classification. However, the selection of the best performing color space in land-sea segmentation is an open question. Its resolution may bring considerable improvements in certain applications like coastline detection, where the detection result is strongly dependent on the performance of the land-sea segmentation. The aim of this paper is to present the results of a study conducted on different color spaces in order to show the best-performing color space for land-sea segmentation. In this sense, an experimental analysis is carried out using five different color spaces (RGB, XYZ, Lab, HSV, YCbCr). For each color space, the Haar wavelet decomposition is used to extract different color texture features. These color texture features are then used for Fusion of Over Segmentation (FOOS) based classification; this allows segmentation of the land part from the sea one. By analyzing the different results of this study, the HSV color space is found as the best classification performance while using color and texture features; which is perfectly coherent with the results presented in the literature.

Keywords: classification, coastline, color, sea-land segmentation

Procedia PDF Downloads 216
979 Speech Motor Processing and Animal Sound Communication

Authors: Ana Cleide Vieira Gomes Guimbal de Aquino

Abstract:

Sound communication is present in most vertebrates, from fish, mainly in species that live in murky waters, to some species of reptiles, anuran amphibians, birds, and mammals, including primates. There are, in fact, relevant similarities between human language and animal sound communication, and among these similarities are the vocalizations called calls. The first specific call in human babies is crying, which has a characteristic prosodic contour and is motivated most of the time by the need for food and by affecting the puppy-caregiver interaction, with a view to communicating the necessities and food requests and guaranteeing the survival of the species. The present work aims to articulate speech processing in the motor context with aspects of the project entitled emotional states and vocalization: a comparative study of the prosodic contours of crying in human and non-human animals. First, concepts of speech motor processing and general aspects of speech evolution will be presented to relate these two approaches to animal sound communication.

Keywords: speech motor processing, animal communication, animal behaviour, language acquisition

Procedia PDF Downloads 57
978 Localization of Frontal and Temporal Speech Areas in Brain Tumor Patients by Their Structural Connections with Probabilistic Tractography

Authors: B.Shukir, H.Woo, P.Barzo, D.Kis

Abstract:

Preoperative brain mapping in tumors involving the speech areas has an important role to reduce surgical risks. Functional magnetic resonance imaging (fMRI) is the gold standard method to localize cortical speech areas preoperatively, but its availability in clinical routine is difficult. Diffusion MRI based probabilistic tractography is available in head MRI. It’s used to segment cortical subregions by their structural connectivity. In our study, we used probabilistic tractography to localize the frontal and temporal cortical speech areas. 15 patients with left frontal tumor were enrolled to our study. Speech fMRI and diffusion MRI acquired preoperatively. The standard automated anatomical labelling atlas 3 (AAL3) cortical atlas used to define 76 left frontal and 118 left temporal potential speech areas. 4 types of tractography were run according to the structural connection of these regions to the left arcuate fascicle (FA) to localize those cortical areas which have speech functions: 1, frontal through FA; 2, frontal with FA; 3, temporal to FA; 4, temporal with FA connections were determined. Thresholds of 1%, 5%, 10% and 15% applied. At each level, the number of affected frontal and temporal regions by fMRI and tractography were defined, the sensitivity and specificity were calculated. At the level of 1% threshold showed the best results. Sensitivity was 61,631,4% and 67,1523,12%, specificity was 87,210,4% and 75,611,37% for frontal and temporal regions, respectively. From our study, we conclude that probabilistic tractography is a reliable preoperative technique to localize cortical speech areas. However, its results are not feasible that the neurosurgeon rely on during the operation.

Keywords: brain mapping, brain tumor, fMRI, probabilistic tractography

Procedia PDF Downloads 125
977 Mood Choices and Modality Patterns in Donald Trump’s Inaugural Presidential Speech

Authors: Mary Titilayo Olowe

Abstract:

The controversies that trailed the political campaign and eventual choice of Donald Trump as the American president is so great that expectations are high as to what the content of his inaugural speech will portray. Given the fact that language is a dynamic vehicle of expressing intentions, the speech needs to be objectively assessed so as to access its content in the manner intended through the three strands of meaning postulated by the Systemic Functional Grammar (SFG): the ideational, the interpersonal and the textual. The focus of this paper, however, is on the interpersonal meaning which deals with how language exhibits social roles and relationship. This paper, therefore, attempts to analyse President Donald Trump’s inaugural speech to elicit interpersonal meaning in it. The analysis is done from the perspective of mood and modality which are housed in SFG. Results of the mood choice which is basically declarative, reveal an information-centered speech while the high option for the modal verb operator ‘will’ shows president Donald Trump’s ability to establish an equal and reliant relationship with his audience, i.e., the Americans. In conclusion, the appeal of the speech to different levels of Interpersonal meaning is largely responsible for its overall effectiveness. One can, therefore, understand the reason for the massive reaction it generates at the center of global discourse.

Keywords: interpersonal, modality, mood, systemic functional grammar

Procedia PDF Downloads 189
976 Using Self Organizing Feature Maps for Automatic Prostate Segmentation in TRUS Images

Authors: Ahad Salimi, Hassan Masoumi

Abstract:

Prostate cancer is one of the most common recognized cancers in men, and, is one of the most important mortality factors of cancer in this group. Determining of prostate’s boundary in TRUS (Transrectal Ultra Sound) images is very necessary for prostate cancer treatments. The weakness edges and speckle noise make the ultrasound images inherently to segment. In this paper a new automatic algorithm for prostate segmentation in TRUS images proposed that include three main stages. At first morphological smoothing and sticks filtering are used for noise removing. In second step, for finding a point in prostate region, SOFM algorithm is enlisted and in the last step, the boundary of prostate extracting accompanying active contour is employed. For validation of proposed method, a number of experiments are conducted. The results obtained by our algorithm show the promise of the proposed algorithm.

Keywords: SOFM, preprocessing, GVF contour, segmentation

Procedia PDF Downloads 300
975 A Character Detection Method for Ancient Yi Books Based on Connected Components and Regressive Character Segmentation

Authors: Xu Han, Shanxiong Chen, Shiyu Zhu, Xiaoyu Lin, Fujia Zhao, Dingwang Wang

Abstract:

Character detection is an important issue for character recognition of ancient Yi books. The accuracy of detection directly affects the recognition effect of ancient Yi books. Considering the complex layout, the lack of standard typesetting and the mixed arrangement between images and texts, we propose a character detection method for ancient Yi books based on connected components and regressive character segmentation. First, the scanned images of ancient Yi books are preprocessed with nonlocal mean filtering, and then a modified local adaptive threshold binarization algorithm is used to obtain the binary images to segment the foreground and background for the images. Second, the non-text areas are removed by the method based on connected components. Finally, the single character in the ancient Yi books is segmented by our method. The experimental results show that the method can effectively separate the text areas and non-text areas for ancient Yi books and achieve higher accuracy and recall rate in the experiment of character detection, and effectively solve the problem of character detection and segmentation in character recognition of ancient books.

Keywords: CCS concepts, computing methodologies, interest point, salient region detections, image segmentation

Procedia PDF Downloads 99
974 Speech Identification Test for Individuals with High-Frequency Sloping Hearing Loss in Telugu

Authors: S. B. Rathna Kumar, Sandya K. Varudhini, Aparna Ravichandran

Abstract:

Telugu is a south central Dravidian language spoken in Andhra Pradesh, a southern state of India. The available speech identification tests in Telugu have been developed to determine the communication problems of individuals having a flat frequency hearing loss. These conventional speech audiometric tests would provide redundant information when used on individuals with high-frequency sloping hearing loss because of better hearing sensitivity in the low- and mid-frequency regions. Hence, conventional speech identification tests do not indicate the true nature of the communication problem of individuals with high-frequency sloping hearing loss. It is highly possible that a person with a high-frequency sloping hearing loss may get maximum scores if conventional speech identification tests are used. Hence, there is a need to develop speech identification test materials that are specifically designed to assess the speech identification performance of individuals with high-frequency sloping hearing loss. The present study aimed to develop speech identification test for individuals with high-frequency sloping hearing loss in Telugu. Individuals with high-frequency sloping hearing loss have difficulty in perception of voiceless consonants whose spectral energy is above 1000 Hz. Hence, the word lists constructed with phonemes having mid- and high-frequency spectral energy will estimate speech identification performance better for such individuals. The phonemes /k/, /g/, /c/, /ṭ/ /t/, /p/, /s/, /ś/, /ṣ/ and /h/are preferred for the construction of words as these phonemes have spectral energy distributed in the frequencies above 1000 KHz predominantly. The present study developed two word lists in Telugu (each word list contained 25 words) for evaluating speech identification performance of individuals with high-frequency sloping hearing loss. The performance of individuals with high-frequency sloping hearing loss was evaluated using both conventional and high-frequency word lists under recorded voice condition. The results revealed that the developed word lists were found to be more sensitive in identifying the true nature of the communication problem of individuals with high-frequency sloping hearing loss.

Keywords: speech identification test, high-frequency sloping hearing loss, recorded voice condition, Telugu

Procedia PDF Downloads 394
973 A Corpus-Based Contrastive Analysis of Directive Speech Act Verbs in English and Chinese Legal Texts

Authors: Wujian Han

Abstract:

In the process of human interaction and communication, speech act verbs are considered to be the most active component and the main means for information transmission, and are also taken as an indication of the structure of linguistic behavior. The theoretical value and practical significance of such everyday built-in metalanguage have long been recognized. This paper, which is part of a bigger study, is aimed to provide useful insights for a more precise and systematic application to speech act verbs translation between English and Chinese, especially with regard to the degree to which generic integrity is maintained in the practice of translation of legal documents. In this study, the corpus, i.e. Chinese legal texts and their English translations, English legal texts, ordinary Chinese texts, and ordinary English texts, serve as a testing ground for examining contrastively the usage of English and Chinese directive speech act verbs in legal genre. The scope of this paper is relatively wide and essentially covers all directive speech act verbs which are used in ordinary English and Chinese, such as order, command, request, prohibit, threat, advice, warn and permit. The researcher, by combining the corpus methodology with a contrastive perspective, explored a range of characteristics of English and Chinese directive speech act verbs including their semantic, syntactic and pragmatic features, and then contrasted them in a structured way. It has been found that there are similarities between English and Chinese directive speech act verbs in legal genre, such as similar semantic components between English speech act verbs and their translation equivalents in Chinese, formal and accurate usage of English and Chinese directive speech act verbs in legal contexts. But notable differences have been identified in areas of difference between their usage in the original Chinese and English legal texts such as valency patterns and frequency of occurrences. For example, the subjects of some directive speech act verbs are very frequently omitted in Chinese legal texts, but this is not the case in English legal texts. One of the practicable methods to achieve adequacy and conciseness in speech act verb translation from Chinese into English in legal genre is to repeat the subjects or the message with discrepancy, and vice versa. In addition, translation effects such as overuse and underuse of certain directive speech act verbs are also found in the translated English texts compared to the original English texts. Legal texts constitute a particularly valuable material for speech act verb study. Building up such a contrastive picture of the Chinese and English speech act verbs in legal language would yield results of value and interest to legal translators and students of language for legal purposes and have practical application to legal translation between English and Chinese.

Keywords: contrastive analysis, corpus-based, directive speech act verbs, legal texts, translation between English and Chinese

Procedia PDF Downloads 447
972 Improving Activity Recognition Classification of Repetitious Beginner Swimming Using a 2-Step Peak/Valley Segmentation Method with Smoothing and Resampling for Machine Learning

Authors: Larry Powell, Seth Polsley, Drew Casey, Tracy Hammond

Abstract:

Human activity recognition (HAR) systems have shown positive performance when recognizing repetitive activities like walking, running, and sleeping. Water-based activities are a reasonably new area for activity recognition. However, water-based activity recognition has largely focused on supporting the elite and competitive swimming population, which already has amazing coordination and proper form. Beginner swimmers are not perfect, and activity recognition needs to support the individual motions to help beginners. Activity recognition algorithms are traditionally built around short segments of timed sensor data. Using a time window input can cause performance issues in the machine learning model. The window’s size can be too small or large, requiring careful tuning and precise data segmentation. In this work, we present a method that uses a time window as the initial segmentation, then separates the data based on the change in the sensor value. Our system uses a multi-phase segmentation method that pulls all peaks and valleys for each axis of an accelerometer placed on the swimmer’s lower back. This results in high recognition performance using leave-one-subject-out validation on our study with 20 beginner swimmers, with our model optimized from our final dataset resulting in an F-Score of 0.95.

Keywords: time window, peak/valley segmentation, feature extraction, beginner swimming, activity recognition

Procedia PDF Downloads 85
971 Subband Coding and Glottal Closure Instant (GCI) Using SEDREAMS Algorithm

Authors: Harisudha Kuresan, Dhanalakshmi Samiappan, T. Rama Rao

Abstract:

In modern telecommunication applications, Glottal Closure Instants location finding is important and is directly evaluated from the speech waveform. Here, we study the GCI using Speech Event Detection using Residual Excitation and the Mean Based Signal (SEDREAMS) algorithm. Speech coding uses parameter estimation using audio signal processing techniques to model the speech signal combined with generic data compression algorithms to represent the resulting modeled in a compact bit stream. This paper proposes a sub-band coder SBC, which is a type of transform coding and its performance for GCI detection using SEDREAMS are evaluated. In SBCs code in the speech signal is divided into two or more frequency bands and each of these sub-band signal is coded individually. The sub-bands after being processed are recombined to form the output signal, whose bandwidth covers the whole frequency spectrum. Then the signal is decomposed into low and high-frequency components and decimation and interpolation in frequency domain are performed. The proposed structure significantly reduces error, and precise locations of Glottal Closure Instants (GCIs) are found using SEDREAMS algorithm.

Keywords: SEDREAMS, GCI, SBC, GOI

Procedia PDF Downloads 326
970 Recognition of Voice Commands of Mentor Robot in Noisy Environment Using Hidden Markov Model

Authors: Khenfer Koummich Fatma, Hendel Fatiha, Mesbahi Larbi

Abstract:

This paper presents an approach based on Hidden Markov Models (HMM: Hidden Markov Model) using HTK tools. The goal is to create a human-machine interface with a voice recognition system that allows the operator to teleoperate a mentor robot to execute specific tasks as rotate, raise, close, etc. This system should take into account different levels of environmental noise. This approach has been applied to isolated words representing the robot commands pronounced in two languages: French and Arabic. The obtained recognition rate is the same in both speeches, Arabic and French in the neutral words. However, there is a slight difference in favor of the Arabic speech when Gaussian white noise is added with a Signal to Noise Ratio (SNR) equals 30 dB, in this case; the Arabic speech recognition rate is 69%, and the French speech recognition rate is 80%. This can be explained by the ability of phonetic context of each speech when the noise is added.

Keywords: Arabic speech recognition, Hidden Markov Model (HMM), HTK, noise, TIMIT, voice command

Procedia PDF Downloads 339
969 Google Translate: AI Application

Authors: Shaima Almalhan, Lubna Shukri, Miriam Talal, Safaa Teskieh

Abstract:

Since artificial intelligence is a rapidly evolving topic that has had a significant impact on technical growth and innovation, this paper examines people's awareness, use, and engagement with the Google Translate application. To see how familiar aware users are with the app and its features, quantitative and qualitative research was conducted. The findings revealed that consumers have a high level of confidence in the application and how far people they benefit from this sort of innovation and how convenient it makes communication.

Keywords: artificial intelligence, google translate, speech recognition, language translation, camera translation, speech to text, text to speech

Procedia PDF Downloads 119
968 Reduction of False Positives in Head-Shoulder Detection Based on Multi-Part Color Segmentation

Authors: Lae-Jeong Park

Abstract:

The paper presents a method that utilizes figure-ground color segmentation to extract effective global feature in terms of false positive reduction in the head-shoulder detection. Conventional detectors that rely on local features such as HOG due to real-time operation suffer from false positives. Color cue in an input image provides salient information on a global characteristic which is necessary to alleviate the false positives of the local feature based detectors. An effective approach that uses figure-ground color segmentation has been presented in an effort to reduce the false positives in object detection. In this paper, an extended version of the approach is presented that adopts separate multipart foregrounds instead of a single prior foreground and performs the figure-ground color segmentation with each of the foregrounds. The multipart foregrounds include the parts of the head-shoulder shape and additional auxiliary foregrounds being optimized by a search algorithm. A classifier is constructed with the feature that consists of a set of the multiple resulting segmentations. Experimental results show that the presented method can discriminate more false positive than the single prior shape-based classifier as well as detectors with the local features. The improvement is possible because the presented approach can reduce the false positives that have the same colors in the head and shoulder foregrounds.

Keywords: pedestrian detection, color segmentation, false positive, feature extraction

Procedia PDF Downloads 255
967 Training a Neural Network to Segment, Detect and Recognize Numbers

Authors: Abhisek Dash

Abstract:

This study had three neural networks, one for number segmentation, one for number detection and one for number recognition all of which are coupled to one another. All networks were trained on the MNIST dataset and were convolutional. It was assumed that the images had lighter background and darker foreground. The segmentation network took 28x28 images as input and had sixteen outputs. Segmentation training starts when a dark pixel is encountered. Taking a window(7x7) over that pixel as focus, the eight neighborhood of the focus was checked for further dark pixels. The segmentation network was then trained to move in those directions which had dark pixels. To this end the segmentation network had 16 outputs. They were arranged as “go east”, ”don’t go east ”, “go south east”, “don’t go south east”, “go south”, “don’t go south” and so on w.r.t focus window. The focus window was resized into a 28x28 image and the network was trained to consider those neighborhoods which had dark pixels. The neighborhoods which had dark pixels were pushed into a queue in a particular order. The neighborhoods were then popped one at a time stitched to the existing partial image of the number one at a time and trained on which neighborhoods to consider when the new partial image was presented. The above process was repeated until the image was fully covered by the 7x7 neighborhoods and there were no more uncovered black pixels. During testing the network scans and looks for the first dark pixel. From here on the network predicts which neighborhoods to consider and segments the image. After this step the group of neighborhoods are passed into the detection network. The detection network took 28x28 images as input and had two outputs denoting whether a number was detected or not. Since the ground truth of the bounds of a number was known during training the detection network outputted in favor of number not found until the bounds were not met and vice versa. The recognition network was a standard CNN that also took 28x28 images and had 10 outputs for recognition of numbers from 0 to 9. This network was activated only when the detection network votes in favor of number detected. The above methodology could segment connected and overlapping numbers. Additionally the recognition unit was only invoked when a number was detected which minimized false positives. It also eliminated the need for rules of thumb as segmentation is learned. The strategy can also be extended to other characters as well.

Keywords: convolutional neural networks, OCR, text detection, text segmentation

Procedia PDF Downloads 130
966 A Technique for Image Segmentation Using K-Means Clustering Classification

Authors: Sadia Basar, Naila Habib, Awais Adnan

Abstract:

The paper presents the Technique for Image Segmentation Using K-Means Clustering Classification. The presented algorithms were specific, however, missed the neighboring information and required high-speed computerized machines to run the segmentation algorithms. Clustering is the process of partitioning a group of data points into a small number of clusters. The proposed method is content-aware and feature extraction method which is able to run on low-end computerized machines, simple algorithm, required low-quality streaming, efficient and used for security purpose. It has the capability to highlight the boundary and the object. At first, the user enters the data in the representation of the input. Then in the next step, the digital image is converted into groups clusters. Clusters are divided into many regions. The same categories with same features of clusters are assembled within a group and different clusters are placed in other groups. Finally, the clusters are combined with respect to similar features and then represented in the form of segments. The clustered image depicts the clear representation of the digital image in order to highlight the regions and boundaries of the image. At last, the final image is presented in the form of segments. All colors of the image are separated in clusters.

Keywords: clustering, image segmentation, K-means function, local and global minimum, region

Procedia PDF Downloads 350
965 Marker-Controlled Level-Set for Segmenting Breast Tumor from Thermal Images

Authors: Swathi Gopakumar, Sruthi Krishna, Shivasubramani Krishnamoorthy

Abstract:

Contactless, painless and radiation-free thermal imaging technology is one of the preferred screening modalities for detection of breast cancer. However, poor signal to noise ratio and the inexorable need to preserve edges defining cancer cells and normal cells, make the segmentation process difficult and hence unsuitable for computer-aided diagnosis of breast cancer. This paper presents key findings from a research conducted on the appraisal of two promising techniques, for the detection of breast cancer: (I) marker-controlled, Level-set segmentation of anisotropic diffusion filtered preprocessed image versus (II) Segmentation using marker-controlled level-set on a Gaussian-filtered image. Gaussian-filtering processes the image uniformly, whereas anisotropic filtering processes only in specific areas of a thermographic image. The pre-processed (Gaussian-filtered and anisotropic-filtered) images of breast samples were then applied for segmentation. The segmentation of breast starts with initial level-set function. In this study, marker refers to the position of the image to which initial level-set function is applied. The markers are generally placed on the left and right side of the breast, which may vary with the breast size. The proposed method was carried out on images from an online database with samples collected from women of varying breast characteristics. It was observed that the breast was able to be segmented out from the background by adjustment of the markers. From the results, it was observed that as a pre-processing technique, anisotropic filtering with level-set segmentation, preserved the edges more effectively than Gaussian filtering. Segmented image, by application of anisotropic filtering was found to be more suitable for feature extraction, enabling automated computer-aided diagnosis of breast cancer.

Keywords: anisotropic diffusion, breast, Gaussian, level-set, thermograms

Procedia PDF Downloads 353
964 Recognition by the Voice and Speech Features of the Emotional State of Children by Adults and Automatically

Authors: Elena E. Lyakso, Olga V. Frolova, Yuri N. Matveev, Aleksey S. Grigorev, Alexander S. Nikolaev, Viktor A. Gorodnyi

Abstract:

The study of the children’s emotional sphere depending on age and psychoneurological state is of great importance for the design of educational programs for children and their social adaptation. Atypical development may be accompanied by violations or specificities of the emotional sphere. To study characteristics of the emotional state reflection in the voice and speech features of children, the perceptual study with the participation of adults and the automatic recognition of speech were conducted. Speech of children with typical development (TD), with Down syndrome (DS), and with autism spectrum disorders (ASD) aged 6-12 years was recorded. To obtain emotional speech in children, model situations were created, including a dialogue between the child and the experimenter containing questions that can cause various emotional states in the child and playing with a standard set of toys. The questions and toys were selected, taking into account the child’s age, developmental characteristics, and speech skills. For the perceptual experiment by adults, test sequences containing speech material of 30 children: TD, DS, and ASD were created. The listeners were 100 adults (age 19.3 ± 2.3 years). The listeners were tasked with determining the children’s emotional state as “comfort – neutral – discomfort” while listening to the test material. Spectrographic analysis of speech signals was conducted. For automatic recognition of the emotional state, 6594 speech files containing speech material of children were prepared. Automatic recognition of three states, “comfort – neutral – discomfort,” was performed using automatically extracted from the set of acoustic features - the Geneva Minimalistic Acoustic Parameter Set (GeMAPS) and the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS). The results showed that the emotional state is worse determined by the speech of TD children (comfort – 58% of correct answers, discomfort – 56%). Listeners better recognized discomfort in children with ASD and DS (78% of answers) than comfort (70% and 67%, respectively, for children with DS and ASD). The neutral state is better recognized by the speech of children with ASD (67%) than by the speech of children with DS (52%) and TD children (54%). According to the automatic recognition data using the acoustic feature set GeMAPSv01b, the accuracy of automatic recognition of emotional states for children with ASD is 0.687; children with DS – 0.725; TD children – 0.641. When using the acoustic feature set eGeMAPSv01b, the accuracy of automatic recognition of emotional states for children with ASD is 0.671; children with DS – 0.717; TD children – 0.631. The use of different models showed similar results, with better recognition of emotional states by the speech of children with DS than by the speech of children with ASD. The state of comfort is automatically determined better by the speech of TD children (precision – 0.546) and children with ASD (0.523), discomfort – children with DS (0.504). The data on the specificities of recognition by adults of the children’s emotional state by their speech may be used in recruitment for working with children with atypical development. Automatic recognition data can be used to create alternative communication systems and automatic human-computer interfaces for social-emotional learning. Acknowledgment: This work was financially supported by the Russian Science Foundation (project 18-18-00063).

Keywords: autism spectrum disorders, automatic recognition of speech, child’s emotional speech, Down syndrome, perceptual experiment

Procedia PDF Downloads 159
963 Liver Lesion Extraction with Fuzzy Thresholding in Contrast Enhanced Ultrasound Images

Authors: Abder-Rahman Ali, Adélaïde Albouy-Kissi, Manuel Grand-Brochier, Viviane Ladan-Marcus, Christine Hoeffl, Claude Marcus, Antoine Vacavant, Jean-Yves Boire

Abstract:

In this paper, we present a new segmentation approach for focal liver lesions in contrast enhanced ultrasound imaging. This approach, based on a two-cluster Fuzzy C-Means methodology, considers type-II fuzzy sets to handle uncertainty due to the image modality (presence of speckle noise, low contrast, etc.), and to calculate the optimum inter-cluster threshold. Fine boundaries are detected by a local recursive merging of ambiguous pixels. The method has been tested on a representative database. Compared to both Otsu and type-I Fuzzy C-Means techniques, the proposed method significantly reduces the segmentation errors.

Keywords: defuzzification, fuzzy clustering, image segmentation, type-II fuzzy sets

Procedia PDF Downloads 454
962 Compensatory Articulation of Pressure Consonants in Telugu Cleft Palate Speech: A Spectrographic Analysis

Authors: Indira Kothalanka

Abstract:

For individuals born with a cleft palate (CP), there is no separation between the nasal cavity and the oral cavity, due to which they cannot build up enough air pressure in the mouth for speech. Therefore, it is common for them to have speech problems. Common cleft type speech errors include abnormal articulation (compensatory or obligatory) and abnormal resonance (hyper, hypo and mixed nasality). These are generally resolved after palate repair. However, in some individuals, articulation problems do persist even after the palate repair. Such individuals develop variant articulations in an attempt to compensate for the inability to produce the target phonemes. A spectrographic analysis is used to investigate the compensatory articulatory behaviours of pressure consonants in the speech of 10 Telugu speaking individuals aged between 7-17 years with a history of cleft palate. Telugu is a Dravidian language which is spoken in Andhra Pradesh and Telangana states in India. It is a language with the third largest number of native speakers in India and the most spoken Dravidian language. The speech of the informants is analysed using single word list, sentences, passage and conversation. Spectrographic analysis is carried out using PRAAT, speech analysis software. The place and manner of articulation of consonant sounds is studied through spectrograms with the help of various acoustic cues. The types of compensatory articulation identified are glottal stops, palatal stops, uvular, velar stops and nasal fricatives which are non-native in Telugu.

Keywords: cleft palate, compensatory articulation, spectrographic analysis, PRAAT

Procedia PDF Downloads 417
961 Towards Long-Range Pixels Connection for Context-Aware Semantic Segmentation

Authors: Muhammad Zubair Khan, Yugyung Lee

Abstract:

Deep learning has recently achieved enormous response in semantic image segmentation. The previously developed U-Net inspired architectures operate with continuous stride and pooling operations, leading to spatial data loss. Also, the methods lack establishing long-term pixels connection to preserve context knowledge and reduce spatial loss in prediction. This article developed encoder-decoder architecture with bi-directional LSTM embedded in long skip-connections and densely connected convolution blocks. The network non-linearly combines the feature maps across encoder-decoder paths for finding dependency and correlation between image pixels. Additionally, the densely connected convolutional blocks are kept in the final encoding layer to reuse features and prevent redundant data sharing. The method applied batch-normalization for reducing internal covariate shift in data distributions. The empirical evidence shows a promising response to our method compared with other semantic segmentation techniques.

Keywords: deep learning, semantic segmentation, image analysis, pixels connection, convolution neural network

Procedia PDF Downloads 76
960 Computer Aided Analysis of Breast Based Diagnostic Problems from Mammograms Using Image Processing and Deep Learning Methods

Authors: Ali Berkan Ural

Abstract:

This paper presents the analysis, evaluation, and pre-diagnosis of early stage breast based diagnostic problems (breast cancer, nodulesorlumps) by Computer Aided Diagnosing (CAD) system from mammogram radiological images. According to the statistics, the time factor is crucial to discover the disease in the patient (especially in women) as possible as early and fast. In the study, a new algorithm is developed using advanced image processing and deep learning method to detect and classify the problem at earlystagewithmoreaccuracy. This system first works with image processing methods (Image acquisition, Noiseremoval, Region Growing Segmentation, Morphological Operations, Breast BorderExtraction, Advanced Segmentation, ObtainingRegion Of Interests (ROIs), etc.) and segments the area of interest of the breast and then analyzes these partly obtained area for cancer detection/lumps in order to diagnosis the disease. After segmentation, with using the Spectrogramimages, 5 different deep learning based methods (specified Convolutional Neural Network (CNN) basedAlexNet, ResNet50, VGG16, DenseNet, Xception) are applied to classify the breast based problems.

Keywords: computer aided diagnosis, breast cancer, region growing, segmentation, deep learning

Procedia PDF Downloads 63
959 Virtual Reality Based 3D Video Games and Speech-Lip Synchronization Superseding Algebraic Code Excited Linear Prediction

Authors: P. S. Jagadeesh Kumar, S. Meenakshi Sundaram, Wenli Hu, Yang Yung

Abstract:

In 3D video games, the dominance of production is unceasingly growing with a protruding level of affordability in terms of budget. Afterward, the automation of speech-lip synchronization technique is customarily onerous and has advanced a critical research subject in virtual reality based 3D video games. This paper presents one of these automatic tools, precisely riveted on the synchronization of the speech and the lip movement of the game characters. A robust and precise speech recognition segment that systematized with Algebraic Code Excited Linear Prediction method is developed which unconventionally delivers lip sync results. The Algebraic Code Excited Linear Prediction algorithm is constructed on that used in code-excited linear prediction, but Algebraic Code Excited Linear Prediction codebooks have an explicit algebraic structure levied upon them. This affords a quicker substitute to the software enactments of lip sync algorithms and thus advances the superiority of service factors abridged production cost.

Keywords: algebraic code excited linear prediction, speech-lip synchronization, video games, virtual reality

Procedia PDF Downloads 440
958 Simulation and Performance Evaluation of Transmission Lines with Shield Wire Segmentation against Atmospheric Discharges Using ATPDraw

Authors: Marcio S. da Silva, Jose Mauricio de B. Bezerra, Antonio E. de A. Nogueira

Abstract:

This paper aims to make a performance analysis of shield wire transmission lines against atmospheric discharges when it is made the option of sectioning the shield wire and verify if the tolerability of the change. As a goal of this work, it was established to make complete modeling of a transmission line in the ATPDraw program with shield wire grounded in all the towers and in some towers. The methodology used to make the proposed evaluation was to choose an actual transmission line that served as a case study. From the choice of transmission line and verification of all its topology and materials, complete modeling of the line using the ATPDraw software was performed. Then several atmospheric discharges were simulated by striking the grounded shield wires in each tower. These simulations served to identify the behavior of the existing line against atmospheric discharges. After this first analysis, the same line was reconsidered with shield wire segmentation. The shielding wire segmentation technique aims to reduce induced losses in shield wires and is adopted in some transmission lines in Brazil. With the same conditions of atmospheric discharge the transmission line, this time with shield wire segmentation was again evaluated. The results obtained showed that it is possible to obtain similar performances against atmospheric discharges between a shield wired line in multiple towers and the same line with shield wire segmentation if some precautions are adopted as verification of the ground resistance of the wire segmented shield, adequacy of the maximum length of the segmented gap, evaluation of the separation length of the electrodes of the insulator spark, among others. As a conclusion, it is verified that since the correct assessment and adopted the correct criteria of adjustment a transmission line with shielded wire segmentation can perform very similar to the traditional use with multiple earths. This solution contributes in a very important way to the reduction of energy losses in transmission lines.

Keywords: atmospheric discharges, ATPDraw, shield wire, transmission lines

Procedia PDF Downloads 143