Search results for: audio segmentation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 833

Search results for: audio segmentation

683 Deep Learning-Based Liver 3D Slicer for Image-Guided Therapy: Segmentation and Needle Aspiration

Authors: Ahmedou Moulaye Idriss, Tfeil Yahya, Tamas Ungi, Gabor Fichtinger

Abstract:

Image-guided therapy (IGT) plays a crucial role in minimally invasive procedures for liver interventions. Accurate segmentation of the liver and precise needle placement is essential for successful interventions such as needle aspiration. In this study, we propose a deep learning-based liver 3D slicer designed to enhance segmentation accuracy and facilitate needle aspiration procedures. The developed 3D slicer leverages state-of-the-art convolutional neural networks (CNNs) for automatic liver segmentation in medical images. The CNN model is trained on a diverse dataset of liver images obtained from various imaging modalities, including computed tomography (CT) and magnetic resonance imaging (MRI). The trained model demonstrates robust performance in accurately delineating liver boundaries, even in cases with anatomical variations and pathological conditions. Furthermore, the 3D slicer integrates advanced image registration techniques to ensure accurate alignment of preoperative images with real-time interventional imaging. This alignment enhances the precision of needle placement during aspiration procedures, minimizing the risk of complications and improving overall intervention outcomes. To validate the efficacy of the proposed deep learning-based 3D slicer, a comprehensive evaluation is conducted using a dataset of clinical cases. Quantitative metrics, including the Dice similarity coefficient and Hausdorff distance, are employed to assess the accuracy of liver segmentation. Additionally, the performance of the 3D slicer in guiding needle aspiration procedures is evaluated through simulated and clinical interventions. Preliminary results demonstrate the effectiveness of the developed 3D slicer in achieving accurate liver segmentation and guiding needle aspiration procedures with high precision. The integration of deep learning techniques into the IGT workflow shows great promise for enhancing the efficiency and safety of liver interventions, ultimately contributing to improved patient outcomes.

Keywords: deep learning, liver segmentation, 3D slicer, image guided therapy, needle aspiration

Procedia PDF Downloads 46
682 Multimodal Convolutional Neural Network for Musical Instrument Recognition

Authors: Yagya Raj Pandeya, Joonwhoan Lee

Abstract:

The dynamic behavior of music and video makes it difficult to evaluate musical instrument playing in a video by computer system. Any television or film video clip with music information are rich sources for analyzing musical instruments using modern machine learning technologies. In this research, we integrate the audio and video information sources using convolutional neural network (CNN) and pass network learned features through recurrent neural network (RNN) to preserve the dynamic behaviors of audio and video. We use different pre-trained CNN for music and video feature extraction and then fine tune each model. The music network use 2D convolutional network and video network use 3D convolution (C3D). Finally, we concatenate each music and video feature by preserving the time varying features. The long short term memory (LSTM) network is used for long-term dynamic feature characterization and then use late fusion with generalized mean. The proposed network performs better performance to recognize the musical instrument using audio-video multimodal neural network.

Keywords: multimodal, 3D convolution, music-video feature extraction, generalized mean

Procedia PDF Downloads 213
681 A Comprehensive Methodology for Voice Segmentation of Large Sets of Speech Files Recorded in Naturalistic Environments

Authors: Ana Londral, Burcu Demiray, Marcus Cheetham

Abstract:

Speech recording is a methodology used in many different studies related to cognitive and behaviour research. Modern advances in digital equipment brought the possibility of continuously recording hours of speech in naturalistic environments and building rich sets of sound files. Speech analysis can then extract from these files multiple features for different scopes of research in Language and Communication. However, tools for analysing a large set of sound files and automatically extract relevant features from these files are often inaccessible to researchers that are not familiar with programming languages. Manual analysis is a common alternative, with a high time and efficiency cost. In the analysis of long sound files, the first step is the voice segmentation, i.e. to detect and label segments containing speech. We present a comprehensive methodology aiming to support researchers on voice segmentation, as the first step for data analysis of a big set of sound files. Praat, an open source software, is suggested as a tool to run a voice detection algorithm, label segments and files and extract other quantitative features on a structure of folders containing a large number of sound files. We present the validation of our methodology with a set of 5000 sound files that were collected in the daily life of a group of voluntary participants with age over 65. A smartphone device was used to collect sound using the Electronically Activated Recorder (EAR): an app programmed to record 30-second sound samples that were randomly distributed throughout the day. Results demonstrated that automatic segmentation and labelling of files containing speech segments was 74% faster when compared to a manual analysis performed with two independent coders. Furthermore, the methodology presented allows manual adjustments of voiced segments with visualisation of the sound signal and the automatic extraction of quantitative information on speech. In conclusion, we propose a comprehensive methodology for voice segmentation, to be used by researchers that have to work with large sets of sound files and are not familiar with programming tools.

Keywords: automatic speech analysis, behavior analysis, naturalistic environments, voice segmentation

Procedia PDF Downloads 280
680 A Comparison of Proxemics and Postural Head Movements during Pop Music versus Matched Music Videos

Authors: Harry J. Witchel, James Ackah, Carlos P. Santos, Nachiappan Chockalingam, Carina E. I. Westling

Abstract:

Introduction: Proxemics is the study of how people perceive and use space. It is commonly proposed that when people like or engage with a person/object, they will move slightly closer to it, often quite subtly and subconsciously. Music videos are known to add entertainment value to a pop song. Our hypothesis was that by adding appropriately matched video to a pop song, it would lead to a net approach of the head to the monitor screen compared to simply listening to an audio-only version of the song. Methods: We presented to 27 participants (ages 21.00 ± 2.89, 15 female) seated in front of 47.5 x 27 cm monitor two musical stimuli in a counterbalanced order; all stimuli were based on music videos by the band OK Go: Here It Goes Again (HIGA, boredom ratings (0-100) = 15.00 ± 4.76, mean ± SEM, standard-error-of-the-mean) and Do What You Want (DWYW, boredom ratings = 23.93 ± 5.98), which did not differ in boredom elicited (P = 0.21, rank-sum test). Each participant experienced each song only once, and one song (counterbalanced) as audio-only versus the other song as a music video. The movement was measured by video-tracking using Kinovea 0.8, based on recording from a lateral aspect; before beginning, each participant had a reflective motion tracking marker placed on the outer canthus of the left eye. Analysis of the Kinovea X-Y coordinate output in comma-separated-variables format was performed in Matlab, as were non-parametric statistical tests. Results: We found that the audio-only stimuli (combined for both HIGA and DWYW, mean ± SEM, 35.71 ± 5.36) were significantly more boring than the music video versions (19.46 ± 3.83, P = 0.0066 Wilcoxon Signed Rank Test (WSRT), Cohen's d = 0.658, N = 28). We also found that participants' heads moved around twice as much during the audio-only versions (speed = 0.590 ± 0.095 mm/sec) compared to the video versions (0.301 ± 0.063 mm/sec, P = 0.00077, WSRT). However, the participants' mean head-to-screen distances were not detectably smaller (i.e. head closer to the screen) during the music videos (74.4 ± 1.8 cm) compared to the audio-only stimuli (73.9 ± 1.8 cm, P = 0.37, WSRT). If anything, during the audio-only condition, they were slightly closer. Interestingly, the ranges of the head-to-screen distances were smaller during the music video (8.6 ± 1.4 cm) compared to the audio-only (12.9 ± 1.7 cm, P = 0.0057, WSRT), the standard deviations were also smaller (P = 0.0027, WSRT), and their heads were held 7 mm higher (video 116.1 ± 0.8 vs. audio-only 116.8 ± 0.8 cm above floor, P = 0.049, WSRT). Discussion: As predicted, sitting and listening to experimenter-selected pop music was more boring than when the music was accompanied by a matched, professionally-made video. However, we did not find that the proxemics of the situation led to approaching the screen. Instead, adding video led to efforts to control the head to a more central and upright viewing position and to suppress head fidgeting.

Keywords: boredom, engagement, music videos, posture, proxemics

Procedia PDF Downloads 165
679 An Improved Parallel Algorithm of Decision Tree

Authors: Jiameng Wang, Yunfei Yin, Xiyu Deng

Abstract:

Parallel optimization is one of the important research topics of data mining at this stage. Taking Classification and Regression Tree (CART) parallelization as an example, this paper proposes a parallel data mining algorithm based on SSP-OGini-PCCP. Aiming at the problem of choosing the best CART segmentation point, this paper designs an S-SP model without data association; and in order to calculate the Gini index efficiently, a parallel OGini calculation method is designed. In addition, in order to improve the efficiency of the pruning algorithm, a synchronous PCCP pruning strategy is proposed in this paper. In this paper, the optimal segmentation calculation, Gini index calculation, and pruning algorithm are studied in depth. These are important components of parallel data mining. By constructing a distributed cluster simulation system based on SPARK, data mining methods based on SSP-OGini-PCCP are tested. Experimental results show that this method can increase the search efficiency of the best segmentation point by an average of 89%, increase the search efficiency of the Gini segmentation index by 3853%, and increase the pruning efficiency by 146% on average; and as the size of the data set increases, the performance of the algorithm remains stable, which meets the requirements of contemporary massive data processing.

Keywords: classification, Gini index, parallel data mining, pruning ahead

Procedia PDF Downloads 121
678 A Posteriori Trading-Inspired Model-Free Time Series Segmentation

Authors: Plessen Mogens Graf

Abstract:

Within the context of multivariate time series segmentation, this paper proposes a method inspired by a posteriori optimal trading. After a normalization step, time series are treated channelwise as surrogate stock prices that can be traded optimally a posteriori in a virtual portfolio holding either stock or cash. Linear transaction costs are interpreted as hyperparameters for noise filtering. Trading signals, as well as trading signals obtained on the reversed time series, are used for unsupervised channelwise labeling before a consensus over all channels is reached that determines the final segmentation time instants. The method is model-free such that no model prescriptions for segments are made. Benefits of proposed approach include simplicity, computational efficiency, and adaptability to a wide range of different shapes of time series. Performance is demonstrated on synthetic and real-world data, including a large-scale dataset comprising a multivariate time series of dimension 1000 and length 2709. Proposed method is compared to a popular model-based bottom-up approach fitting piecewise affine models and to a recent model-based top-down approach fitting Gaussian models and found to be consistently faster while producing more intuitive results in the sense of segmenting time series at peaks and valleys.

Keywords: time series segmentation, model-free, trading-inspired, multivariate data

Procedia PDF Downloads 133
677 Comprehensive Evaluation of COVID-19 Through Chest Images

Authors: Parisa Mansour

Abstract:

The coronavirus disease 2019 (COVID-19) was discovered and rapidly spread to various countries around the world since the end of 2019. Computed tomography (CT) images have been used as an important alternative to the time-consuming RT. PCR test. However, manual segmentation of CT images alone is a major challenge as the number of suspected cases increases. Thus, accurate and automatic segmentation of COVID-19 infections is urgently needed. Because the imaging features of the COVID-19 infection are different and similar to the background, existing medical image segmentation methods cannot achieve satisfactory performance. In this work, we try to build a deep convolutional neural network adapted for the segmentation of chest CT images with COVID-19 infections. First, we maintain a large and novel chest CT image database containing 165,667 annotated chest CT images from 861 patients with confirmed COVID-19. Inspired by the observation that the boundary of an infected lung can be improved by global intensity adjustment, we introduce a feature variable block into the proposed deep CNN, which adjusts the global features of features to segment the COVID-19 infection. The proposed PV array can effectively and adaptively improve the performance of functions in different cases. We combine features of different scales by proposing a progressive atrocious space pyramid fusion scheme to deal with advanced infection regions with various aspects and shapes. We conducted experiments on data collected in China and Germany and showed that the proposed deep CNN can effectively produce impressive performance.

Keywords: chest, COVID-19, chest Image, coronavirus, CT image, chest CT

Procedia PDF Downloads 55
676 A Guide to the Implementation of Ambisonics Super Stereo

Authors: Alessio Mastrorillo, Giuseppe Silvi, Francesco Scagliola

Abstract:

In this work, we introduce an Ambisonics decoder with an implementation of the C-format, also called Super Stereo. This format is an alternative to conventional stereo and binaural decoding. Unlike those, this format conveys audio information from the horizontal plane and works with stereo speakers and headphones. The two C-format channels can also return a reconstructed planar B-format. This work provides an open-source implementation for this format. We implement an all-pass filter for signal quadrature, as required by the decoding equations. This filter works with six Biquads in a cascade configuration, with values for control frequency and quality factor discovered experimentally. The phase response of the filter delivers a small error in the 20-14.000Hz range. The decoder has been tested with audio sources up to 192kHz sample rate, returning pristine sound quality and detailed stereo image. It has been included in the Envelop for Live suite and is available as an open-source repository. This decoder has applications in Virtual Reality and 360° audio productions, music composition, and online streaming.

Keywords: ambisonics, UHJ, quadrature filter, virtual reality, Gerzon, decoder, stereo, binaural, biquad

Procedia PDF Downloads 89
675 Subtitled Based-Approach for Learning Foreign Arabic Language

Authors: Elleuch Imen

Abstract:

In this paper, it propose a new approach for learning Arabic as a foreign language via audio-visual translation, particularly subtitling. The approach consists of developing video sequences appropriate to different levels of learning (from A1 to C2) containing conversations, quizzes, games and others. Each video aims to achieve a specific objective, such as the correct pronunciation of Arabic words, the correct syntactic structuring of Arabic sentences, the recognition of the morphological characteristics of terms and the semantic understanding of statements. The subtitled videos obtained can be incorporated into different Arabic second language learning tools such as Moocs, websites, platforms, etc.

Keywords: arabic foreign language, learning, audio-visuel translation, subtitled videos

Procedia PDF Downloads 58
674 Clustering Based Level Set Evaluation for Low Contrast Images

Authors: Bikshalu Kalagadda, Srikanth Rangu

Abstract:

The important object of images segmentation is to extract objects with respect to some input features. One of the important methods for image segmentation is Level set method. Generally medical images and synthetic images with low contrast of pixel profile, for such images difficult to locate interested features in images. In conventional level set function, develops irregularity during its process of evaluation of contour of objects, this destroy the stability of evolution process. For this problem a remedy is proposed, a new hybrid algorithm is Clustering Level Set Evolution. Kernel fuzzy particles swarm optimization clustering with the Distance Regularized Level Set (DRLS) and Selective Binary, and Gaussian Filtering Regularized Level Set (SBGFRLS) methods are used. The ability of identifying different regions becomes easy with improved speed. Efficiency of the modified method can be evaluated by comparing with the previous method for similar specifications. Comparison can be carried out by considering medical and synthetic images.

Keywords: segmentation, clustering, level set function, re-initialization, Kernel fuzzy, swarm optimization

Procedia PDF Downloads 351
673 Life Stage Customer Segmentation by Fine-Tuning Large Language Models

Authors: Nikita Katyal, Shaurya Uppal

Abstract:

This paper tackles the significant challenge of accurately classifying customers within a retailer’s customer base. Accurate classification is essential for developing targeted marketing strategies that effectively engage this important demographic. To address this issue, we propose a method that utilizes Large Language Models (LLMs). By employing LLMs, we analyze the metadata associated with product purchases derived from historical data to identify key product categories that act as distinguishing factors. These categories, such as baby food, eldercare products, or family-sized packages, offer valuable insights into the likely household composition of customers, including families with babies, families with kids/teenagers, families with pets, households caring for elders, or mixed households. We segment high-confidence customers into distinct categories by integrating historical purchase behavior with LLM-powered product classification. This paper asserts that life stage segmentation can significantly enhance e-commerce businesses’ ability to target the appropriate customers with tailored products and campaigns, thereby augmenting sales and improving customer retention. Additionally, the paper details the data sources, model architecture, and evaluation metrics employed for the segmentation task.

Keywords: LLMs, segmentation, product tags, fine-tuning, target segments, marketing communication

Procedia PDF Downloads 21
672 Tumor Boundary Extraction Using Intensity and Texture-Based on Gradient Vector

Authors: Namita Mittal, Himakshi Shekhawat, Ankit Vidyarthi

Abstract:

In medical research study, doctors and radiologists face lot of complexities in analysing the brain tumors in Magnetic Resonance (MR) images. Brain tumor detection is difficult due to amorphous tumor shape and overlapping of similar tissues in nearby region. So, radiologists require one such clinically viable solution which helps in automatic segmentation of tumor inside brain MR image. Initially, segmentation methods were used to detect tumor, by dividing the image into segments but causes loss of information. In this paper, a hybrid method is proposed which detect Region of Interest (ROI) on the basis of difference in intensity values and texture values of tumor region using nearby tissues with Gradient Vector Flow (GVF) technique in the identification of ROI. Proposed approach uses both intensity and texture values for identification of abnormal section of the brain MR images. Experimental results show that proposed method outperforms GVF method without any loss of information.

Keywords: brain tumor, GVF, intensity, MR images, segmentation, texture

Procedia PDF Downloads 430
671 A Segmentation Method for Grayscale Images Based on the Firefly Algorithm and the Gaussian Mixture Model

Authors: Donatella Giuliani

Abstract:

In this research, we propose an unsupervised grayscale image segmentation method based on a combination of the Firefly Algorithm and the Gaussian Mixture Model. Firstly, the Firefly Algorithm has been applied in a histogram-based research of cluster means. The Firefly Algorithm is a stochastic global optimization technique, centered on the flashing characteristics of fireflies. In this context it has been performed to determine the number of clusters and the related cluster means in a histogram-based segmentation approach. Successively these means are used in the initialization step for the parameter estimation of a Gaussian Mixture Model. The parametric probability density function of a Gaussian Mixture Model is represented as a weighted sum of Gaussian component densities, whose parameters are evaluated applying the iterative Expectation-Maximization technique. The coefficients of the linear super-position of Gaussians can be thought as prior probabilities of each component. Applying the Bayes rule, the posterior probabilities of the grayscale intensities have been evaluated, therefore their maxima are used to assign each pixel to the clusters, according to their gray-level values. The proposed approach appears fairly solid and reliable when applied even to complex grayscale images. The validation has been performed by using different standard measures, more precisely: the Root Mean Square Error (RMSE), the Structural Content (SC), the Normalized Correlation Coefficient (NK) and the Davies-Bouldin (DB) index. The achieved results have strongly confirmed the robustness of this gray scale segmentation method based on a metaheuristic algorithm. Another noteworthy advantage of this methodology is due to the use of maxima of responsibilities for the pixel assignment that implies a consistent reduction of the computational costs.

Keywords: clustering images, firefly algorithm, Gaussian mixture model, meta heuristic algorithm, image segmentation

Procedia PDF Downloads 215
670 Riesz Mixture Model for Brain Tumor Detection

Authors: Mouna Zitouni, Mariem Tounsi

Abstract:

This research introduces an application of the Riesz mixture model for medical image segmentation for accurate diagnosis and treatment of brain tumors. We propose a pixel classification technique based on the Riesz distribution, derived from an extended Bartlett decomposition. To our knowledge, this is the first study addressing this approach. The Expectation-Maximization algorithm is implemented for parameter estimation. A comparative analysis, using both synthetic and real brain images, demonstrates the superiority of the Riesz model over a recent method based on the Wishart distribution.

Keywords: EM algorithm, segmentation, Riesz probability distribution, Wishart probability distribution

Procedia PDF Downloads 16
669 Implementation of CNV-CH Algorithm Using Map-Reduce Approach

Authors: Aishik Deb, Rituparna Sinha

Abstract:

We have developed an algorithm to detect the abnormal segment/"structural variation in the genome across a number of samples. We have worked on simulated as well as real data from the BAM Files and have designed a segmentation algorithm where abnormal segments are detected. This algorithm aims to improve the accuracy and performance of the existing CNV-CH algorithm. The next-generation sequencing (NGS) approach is very fast and can generate large sequences in a reasonable time. So the huge volume of sequence information gives rise to the need for Big Data and parallel approaches of segmentation. Therefore, we have designed a map-reduce approach for the existing CNV-CH algorithm where a large amount of sequence data can be segmented and structural variations in the human genome can be detected. We have compared the efficiency of the traditional and map-reduce algorithms with respect to precision, sensitivity, and F-Score. The advantages of using our algorithm are that it is fast and has better accuracy. This algorithm can be applied to detect structural variations within a genome, which in turn can be used to detect various genetic disorders such as cancer, etc. The defects may be caused by new mutations or changes to the DNA and generally result in abnormally high or low base coverage and quantification values.

Keywords: cancer detection, convex hull segmentation, map reduce, next generation sequencing

Procedia PDF Downloads 135
668 Improvement of Brain Tumors Detection Using Markers and Boundaries Transform

Authors: Yousif Mohamed Y. Abdallah, Mommen A. Alkhir, Amel S. Algaddal

Abstract:

This was experimental study conducted to study segmentation of brain in MRI images using edge detection and morphology filters. For brain MRI images each film scanned using digitizer scanner then treated by using image processing program (MatLab), where the segmentation was studied. The scanned image was saved in a TIFF file format to preserve the quality of the image. Brain tissue can be easily detected in MRI image if the object has sufficient contrast from the background. We use edge detection and basic morphology tools to detect a brain. The segmentation of MRI images steps using detection and morphology filters were image reading, detection entire brain, dilation of the image, filling interior gaps inside the image, removal connected objects on borders and smoothen the object (brain). The results of this study were that it showed an alternate method for displaying the segmented object would be to place an outline around the segmented brain. Those filters approaches can help in removal of unwanted background information and increase diagnostic information of Brain MRI.

Keywords: improvement, brain, matlab, markers, boundaries

Procedia PDF Downloads 514
667 Method Comprising One to One Web Based Real Time Communications

Authors: Lata Kiran Dey, Rajendra Kumar, Biren Karmakar

Abstract:

Web Real Time Communications is a collection of standards, protocols, which provides real-time communications capabilities between web browsers and devices. This paper outlines the design and further implementation of web real-time communications on secure web applications having audio and video call capabilities. This proposed application may put up a system that will be able to work over both desktops as well as the mobile browser. Though, WebRTC also gives a set of JavaScript standard RTC APIs, which primarily works over the real-time communication framework. This helps to build a suitable communication application, which enables the audio, video, and message transfer in between the today’s modern browsers having WebRTC support.

Keywords: WebRTC, SIP, RTC, JavaScript, SRTP, secure web sockets, browser

Procedia PDF Downloads 147
666 A Fuzzy Approach to Liver Tumor Segmentation with Zernike Moments

Authors: Abder-Rahman Ali, Antoine Vacavant, Manuel Grand-Brochier, Adélaïde Albouy-Kissi, Jean-Yves Boire

Abstract:

In this paper, we present a new segmentation approach for liver lesions in regions of interest within MRI (Magnetic Resonance Imaging). This approach, based on a two-cluster Fuzzy C-Means methodology, considers the parameter variable compactness to handle uncertainty. Fine boundaries are detected by a local recursive merging of ambiguous pixels with a sequential forward floating selection with Zernike moments. The method has been tested on both synthetic and real images. When applied on synthetic images, the proposed approach provides good performance, segmentations obtained are accurate, their shape is consistent with the ground truth, and the extracted information is reliable. The results obtained on MR images confirm such observations. Our approach allows, even for difficult cases of MR images, to extract a segmentation with good performance in terms of accuracy and shape, which implies that the geometry of the tumor is preserved for further clinical activities (such as automatic extraction of pharmaco-kinetics properties, lesion characterization, etc).

Keywords: defuzzification, floating search, fuzzy clustering, Zernike moments

Procedia PDF Downloads 451
665 Computer Aided Diagnostic System for Detection and Classification of a Brain Tumor through MRI Using Level Set Based Segmentation Technique and ANN Classifier

Authors: Atanu K Samanta, Asim Ali Khan

Abstract:

Due to the acquisition of huge amounts of brain tumor magnetic resonance images (MRI) in clinics, it is very difficult for radiologists to manually interpret and segment these images within a reasonable span of time. Computer-aided diagnosis (CAD) systems can enhance the diagnostic capabilities of radiologists and reduce the time required for accurate diagnosis. An intelligent computer-aided technique for automatic detection of a brain tumor through MRI is presented in this paper. The technique uses the following computational methods; the Level Set for segmentation of a brain tumor from other brain parts, extraction of features from this segmented tumor portion using gray level co-occurrence Matrix (GLCM), and the Artificial Neural Network (ANN) to classify brain tumor images according to their respective types. The entire work is carried out on 50 images having five types of brain tumor. The overall classification accuracy using this method is found to be 98% which is significantly good.

Keywords: brain tumor, computer-aided diagnostic (CAD) system, gray-level co-occurrence matrix (GLCM), tumor segmentation, level set method

Procedia PDF Downloads 509
664 Hindi Speech Synthesis by Concatenation of Recognized Hand Written Devnagri Script Using Support Vector Machines Classifier

Authors: Saurabh Farkya, Govinda Surampudi

Abstract:

Optical Character Recognition is one of the current major research areas. This paper is focussed on recognition of Devanagari script and its sound generation. This Paper consists of two parts. First, Optical Character Recognition of Devnagari handwritten Script. Second, speech synthesis of the recognized text. This paper shows an implementation of support vector machines for the purpose of Devnagari Script recognition. The Support Vector Machines was trained with Multi Domain features; Transform Domain and Spatial Domain or Structural Domain feature. Transform Domain includes the wavelet feature of the character. Structural Domain consists of Distance Profile feature and Gradient feature. The Segmentation of the text document has been done in 3 levels-Line Segmentation, Word Segmentation, and Character Segmentation. The pre-processing of the characters has been done with the help of various Morphological operations-Otsu's Algorithm, Erosion, Dilation, Filtration and Thinning techniques. The Algorithm was tested on the self-prepared database, a collection of various handwriting. Further, Unicode was used to convert recognized Devnagari text into understandable computer document. The document so obtained is an array of codes which was used to generate digitized text and to synthesize Hindi speech. Phonemes from the self-prepared database were used to generate the speech of the scanned document using concatenation technique.

Keywords: Character Recognition (OCR), Text to Speech (TTS), Support Vector Machines (SVM), Library of Support Vector Machines (LIBSVM)

Procedia PDF Downloads 496
663 Enhancing the Pricing Expertise of an Online Distribution Channel

Authors: Luis N. Pereira, Marco P. Carrasco

Abstract:

Dynamic pricing is a revenue management strategy in which hotel suppliers define, over time, flexible and different prices for their services for different potential customers, considering the profile of e-consumers and the demand and market supply. This means that the fundamentals of dynamic pricing are based on economic theory (price elasticity of demand) and market segmentation. This study aims to define a dynamic pricing strategy and a contextualized offer to the e-consumers profile in order to improve the number of reservations of an online distribution channel. Segmentation methods (hierarchical and non-hierarchical) were used to identify and validate an optimal number of market segments. A profile of the market segments was studied, considering the characteristics of the e-consumers and the probability of reservation a room. In addition, the price elasticity of demand was estimated for each segment using econometric models. Finally, predictive models were used to define rules for classifying new e-consumers into pre-defined segments. The empirical study illustrates how it is possible to improve the intelligence of an online distribution channel system through an optimal dynamic pricing strategy and a contextualized offer to the profile of each new e-consumer. A database of 11 million e-consumers of an online distribution channel was used in this study. The results suggest that an appropriate policy of market segmentation in using of online reservation systems is benefit for the service suppliers because it brings high probability of reservation and generates more profit than fixed pricing.

Keywords: dynamic pricing, e-consumers segmentation, online reservation systems, predictive analytics

Procedia PDF Downloads 234
662 Teaching Speaking Skills to Adult English Language Learners through ALM

Authors: Wichuda Kunnu, Aungkana Sukwises

Abstract:

Audio-lingual method (ALM) is a teaching approach that is claimed that ineffective for teaching second/foreign languages. Because some linguists and second/foreign language teachers believe that ALM is a rote learning style. However, this study is done on a belief that ALM will be able to solve Thais’ English speaking problem. This paper aims to report the findings on teaching English speaking to adult learners with an “adapted ALM”, one distinction of which is to use Thai as the medium language of instruction. The participants are consisted of 9 adult learners. They were allowed to speak English more freely using both the materials presented in the class and their background knowledge of English. At the end of the course, they spoke English more fluently, more confidently, to the extent that they applied what they learnt both in and outside the class.

Keywords: teaching English, audio lingual method, cognitive science, psychology

Procedia PDF Downloads 416
661 VideoAssist: A Labelling Assistant to Increase Efficiency in Annotating Video-Based Fire Dataset Using a Foundation Model

Authors: Keyur Joshi, Philip Dietrich, Tjark Windisch, Markus König

Abstract:

In the field of surveillance-based fire detection, the volume of incoming data is increasing rapidly. However, the labeling of a large industrial dataset is costly due to the high annotation costs associated with current state-of-the-art methods, which often require bounding boxes or segmentation masks for model training. This paper introduces VideoAssist, a video annotation solution that utilizes a video-based foundation model to annotate entire videos with minimal effort, requiring the labeling of bounding boxes for only a few keyframes. To the best of our knowledge, VideoAssist is the first method to significantly reduce the effort required for labeling fire detection videos. The approach offers bounding box and segmentation annotations for the video dataset with minimal manual effort. Results demonstrate that the performance of labels annotated by VideoAssist is comparable to those annotated by humans, indicating the potential applicability of this approach in fire detection scenarios.

Keywords: fire detection, label annotation, foundation models, object detection, segmentation

Procedia PDF Downloads 5
660 A Review on Artificial Neural Networks in Image Processing

Authors: B. Afsharipoor, E. Nazemi

Abstract:

Artificial neural networks (ANNs) are powerful tool for prediction which can be trained based on a set of examples and thus, it would be useful for nonlinear image processing. The present paper reviews several paper regarding applications of ANN in image processing to shed the light on advantage and disadvantage of ANNs in this field. Different steps in the image processing chain including pre-processing, enhancement, segmentation, object recognition, image understanding and optimization by using ANN are summarized. Furthermore, results on using multi artificial neural networks are presented.

Keywords: neural networks, image processing, segmentation, object recognition, image understanding, optimization, MANN

Procedia PDF Downloads 404
659 Counting People Utilizing Space-Time Imagery

Authors: Ahmed Elmarhomy, K. Terada

Abstract:

An automated method for counting passerby has been proposed using virtual-vertical measurement lines. Space-time image is representing the human regions which are treated using the segmentation process. Different color space has been used to perform the template matching. A proper template matching has been achieved to determine direction and speed of passing people. Distinguish one or two passersby has been investigated using a correlation between passerby speed and the human-pixel area. Finally, the effectiveness of the presented method has been experimentally verified.

Keywords: counting people, measurement line, space-time image, segmentation, template matching

Procedia PDF Downloads 450
658 Segmentation Using Multi-Thresholded Sobel Images: Application to the Separation of Stuck Pollen Grains

Authors: Endrick Barnacin, Jean-Luc Henry, Jimmy Nagau, Jack Molinie

Abstract:

Being able to identify biological particles such as spores, viruses, or pollens is important for health care professionals, as it allows for appropriate therapeutic management of patients. Optical microscopy is a technology widely used for the analysis of these types of microorganisms, because, compared to other types of microscopy, it is not expensive. The analysis of an optical microscope slide is a tedious and time-consuming task when done manually. However, using machine learning and computer vision, this process can be automated. The first step of an automated microscope slide image analysis process is segmentation. During this step, the biological particles are localized and extracted. Very often, the use of an automatic thresholding method is sufficient to locate and extract the particles. However, in some cases, the particles are not extracted individually because they are stuck to other biological elements. In this paper, we propose a stuck particles separation method based on the use of the Sobel operator and thresholding. We illustrate it by applying it to the separation of 813 images of adjacent pollen grains. The method correctly separated 95.4% of these images.

Keywords: image segmentation, stuck particles separation, Sobel operator, thresholding

Procedia PDF Downloads 128
657 The Feasibility of Online, Interactive Workshops to Facilitate Anatomy Education during the UK COVID-19 Lockdowns

Authors: Prabhvir Singh Marway, Kai Lok Chan, Maria-Ruxandra Jinga, Rachel Bok Ying Lee, Matthew Bok Kit Lee, Krishan Nandapalan, Sze Yi Beh, Harry Carr, Christopher Kui

Abstract:

We piloted a structured series of online workshops on the 3D segmentation of anatomical structures from CT scans. 33 participants were recruited from four UK universities for two-day workshops between 2020 and 2021. Open-source software (3D-Slicer) was used. We hypothesized that active participation via real-time screen-sharing and voice-communication via Discord would enable improved engagement and learning, despite national lockdowns. Written feedback indicated positive learning experiences, with subjective measures of anatomical understanding and software confidence improving.

Keywords: medical education, workshop, segmentation, anatomy

Procedia PDF Downloads 199
656 The Effect of Head Posture on the Kinematics of the Spine During Lifting and Lowering Tasks

Authors: Mehdi Nematimoez

Abstract:

Head posture is paramount to retaining gaze and balance in many activities; its control is thus important in many activities. However, little information is available about the effects of head movement restriction on other spine segment kinematics and movement patterns during lifting and lowering tasks. The aim of this study was to examine the effects of head movement restriction on relative angles and their derivatives using the stepwise segmentation approach during lifting and lowering tasks. Ten healthy men lifted and lowered a box using two styles (stoop and squat), with two loads (i.e., 10 and 20% of body weight); they performed these tasks with two instructed head postures (1. Flexing the neck to keep contact between chin and chest over the task cycle; 2. No instruction, free head posture). The spine was divided into five segments, tracked by six cluster markers (C7, T3, T6, T9, T12, and L5). Relative angles between spine segments and their derivatives (first and second) were analyzed by a stepwise segmentation approach to consider the effect of each segment on the whole spine. Accordingly, head posture significantly affected the derivatives of the relative angles and manifested latency in spine segments movement, i.e., cephalad-to-caudad or caudad-to-cephalad patterns. The relative angles for C7-T3 and T3-T6 increased over the cycle of all lifting and lowering tasks; nevertheless, in lower segments increased significantly when the spine moved into upright standing. However, these effects were clearer during lifting than lowering. Conclusively, the neck flexion can unevenly increase the flexion angles of spine segments from cervical to lumbar over lifting and lowering tasks; furthermore, stepwise segmentation reveals potential for assessing the segmental contribution in spine ROM and movement patterns.

Keywords: head movement restriction, spine kinematics, lifting, lowering, stepwise segmentation

Procedia PDF Downloads 243
655 Active Contours for Image Segmentation Based on Complex Domain Approach

Authors: Sajid Hussain

Abstract:

The complex domain approach for image segmentation based on active contour has been designed, which deforms step by step to partition an image into numerous expedient regions. A novel region-based trigonometric complex pressure force function is proposed, which propagates around the region of interest using image forces. The signed trigonometric force function controls the propagation of the active contour and the active contour stops on the exact edges of the object accurately. The proposed model makes the level set function binary and uses Gaussian smoothing kernel to adjust and escape the re-initialization procedure. The working principle of the proposed model is as follows: The real image data is transformed into complex data by iota (i) times of image data and the average iota (i) times of horizontal and vertical components of the gradient of image data is inserted in the proposed model to catch complex gradient of the image data. A simple finite difference mathematical technique has been used to implement the proposed model. The efficiency and robustness of the proposed model have been verified and compared with other state-of-the-art models.

Keywords: image segmentation, active contour, level set, Mumford and Shah model

Procedia PDF Downloads 112
654 An Automated System for the Detection of Citrus Greening Disease Based on Visual Descriptors

Authors: Sidra Naeem, Ayesha Naeem, Sahar Rahim, Nadia Nawaz Qadri

Abstract:

Citrus greening is a bacterial disease that causes considerable damage to citrus fruits worldwide. Efficient method for this disease detection must be carried out to minimize the production loss. This paper presents a pattern recognition system that comprises three stages for the detection of citrus greening from Orange leaves: segmentation, feature extraction and classification. Image segmentation is accomplished by adaptive thresholding. The feature extraction stage comprises of three visual descriptors i.e. shape, color and texture. From shape feature we have used asymmetry index, from color feature we have used histogram of Cb component from YCbCr domain and from texture feature we have used local binary pattern. Classification was done using support vector machines and k nearest neighbors. The best performances of the system is Accuracy = 88.02% and AUROC = 90.1% was achieved by automatic segmented images. Our experiments validate that: (1). Segmentation is an imperative preprocessing step for computer assisted diagnosis of citrus greening, and (2). The combination of shape, color and texture features form a complementary set towards the identification of citrus greening disease.

Keywords: citrus greening, pattern recognition, feature extraction, classification

Procedia PDF Downloads 183