Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 899

Search results for: automatic transliteration

719 Automatic Product Identification Based on Deep-Learning Theory in an Assembly Line

Authors: Fidel Lòpez Saca, Carlos Avilés-Cruz, Miguel Magos-Rivera, José Antonio Lara-Chávez

Abstract:

Automated object recognition and identification systems are widely used throughout the world, particularly in assembly lines, where they perform quality control and automatic part selection tasks. This article presents the design and implementation of an object recognition system in an assembly line. The proposed shape-color recognition system is based on deep learning theory, using a specially designed convolutional network architecture. The methodology involves the following stages: image capture, color filtering, location of object mass centers, detection of horizontal and vertical object boundaries, and object clipping. Once the objects are cut out, they are sent to a convolutional neural network, which automatically identifies the type of figure. The identification system works in real time. The implementation was done on a Raspberry Pi 3 system and on a Jetson Nano device. The system is used in an assembly course of the bachelor’s degree in industrial engineering. The results presented cover recognition efficiency and processing time.
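The pre-processing stages listed in this abstract (color filtering, mass center, boundaries, clipping) can be illustrated with a minimal sketch. It is not the authors' implementation: the image is assumed to be a nested list of RGB tuples, and the function names and the color range are hypothetical choices made for the example.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
BBox = Tuple[int, int, int, int]  # (top, bottom, left, right), inclusive


def color_mask(image: Image, lo: Pixel, hi: Pixel) -> List[List[bool]]:
    """Color filtering: mark pixels whose R, G and B all lie inside [lo, hi]."""
    return [
        [all(l <= c <= h for c, l, h in zip(px, lo, hi)) for px in row]
        for row in image
    ]


def locate_object(mask: List[List[bool]]) -> Optional[Tuple[Tuple[float, float], BBox]]:
    """Return the mass center and the bounding box of the marked pixels."""
    coords = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    centroid = (sum(rows) / len(rows), sum(cols) / len(cols))
    bbox = (min(rows), max(rows), min(cols), max(cols))
    return centroid, bbox


def clip(image: Image, bbox: BBox) -> Image:
    """Object clipping: cut the bounding box out of the image."""
    top, bottom, left, right = bbox
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy 4x5 image: black background with a 2x2 red object (illustrative values).
BLACK, RED = (0, 0, 0), (220, 30, 30)
image = [[BLACK] * 5 for _ in range(4)]
for r in (1, 2):
    for c in (2, 3):
        image[r][c] = RED

mask = color_mask(image, lo=(150, 0, 0), hi=(255, 80, 80))
centroid, bbox = locate_object(mask)
patch = clip(image, bbox)  # this patch would be passed on to the classifier
```

In a real system the clipped patch would be resized and fed to the convolutional network; the sketch stops at the clipping stage because the abstract does not describe the network architecture.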

Keywords: deep-learning, image classification, image identification, industrial engineering.

Procedia PDF Downloads 158

718 A Comprehensive Methodology for Voice Segmentation of Large Sets of Speech Files Recorded in Naturalistic Environments

Authors: Ana Londral, Burcu Demiray, Marcus Cheetham

Abstract:

Speech recording is a methodology used in many different studies in cognitive and behavioural research. Advances in digital equipment have made it possible to continuously record hours of speech in naturalistic environments and to build rich sets of sound files. Speech analysis can then extract multiple features from these files for different areas of research in language and communication. However, tools for analysing a large set of sound files and automatically extracting relevant features from them are often inaccessible to researchers who are not familiar with programming languages. Manual analysis is a common alternative, but it is costly in time and efficiency. In the analysis of long sound files, the first step is voice segmentation, i.e. detecting and labelling segments containing speech. We present a comprehensive methodology that supports researchers in voice segmentation as the first step in the analysis of a large set of sound files. Praat, an open-source software package, is suggested as a tool to run a voice detection algorithm, label segments and files, and extract other quantitative features across a folder structure containing a large number of sound files. We present the validation of our methodology with a set of 5000 sound files collected in the daily life of a group of voluntary participants aged over 65. A smartphone was used to collect sound using the Electronically Activated Recorder (EAR), an app programmed to record 30-second sound samples randomly distributed throughout the day. Results demonstrated that automatic segmentation and labelling of files containing speech segments was 74% faster than manual analysis performed by two independent coders. Furthermore, the methodology allows manual adjustment of voiced segments with visualisation of the sound signal, as well as automatic extraction of quantitative information on speech.
In conclusion, we propose a comprehensive methodology for voice segmentation, to be used by researchers that have to work with large sets of sound files and are not familiar with programming tools.

Keywords: automatic speech analysis, behavior analysis, naturalistic environments, voice segmentation

Procedia PDF Downloads 280

717 An Artificial Intelligence Supported QUAL2K Model for the Simulation of Various Physiochemical Parameters of Water

Authors: Mehvish Bilal, Navneet Singh, Jasir Mushtaq

Abstract:

Water pollution puts people's health at risk, and it can also harm the ecology. For practitioners of integrated water resources management (IWRM), water quality modelling can inform decisions about pollution control (such as discharge permitting) or demand management (such as abstraction permitting). Mathematical simulation is regarded as one of the best tools for estimating the current pollutant load, the movement of contaminants, and the relation between pollutant sources and water quality. The current study involves the QUAL2K model, which includes manual simulation of the various physiochemical characteristics of water. To automate this, various sensors could be installed for the automatic simulation of these characteristics. An artificial intelligence model is proposed for the automatic simulation of water quality parameters. Models of water quality have become an effective tool for identifying water contamination worldwide, as well as the ultimate fate and behavior of contaminants in the water environment. Water quality model research is primarily conducted in Europe and other industrialized countries, where theoretical underpinnings and practical research are prioritized.

Keywords: artificial intelligence, QUAL2K, simulation, physiochemical parameters

Procedia PDF Downloads 100

716 The Algorithm of Semi-Automatic Thai Spoonerism Words for Bi-Syllable

Authors: Nutthapat Kaewrattanapat, Wannarat Bunchongkien

Abstract:

This research aims to study and develop an algorithm for Thai spoonerism words implemented as a semi-automatic computer program: the input syllables are already separated, and the developed algorithm performs the spoonerism. The algorithm establishes rules and mechanisms for bi-syllable Thai spoonerism by analysing the elements of each syllable, namely the consonant cluster, vowel, intonation mark and final consonant. The study finds that bi-syllable Thai spoonerism has one mechanism: the vowel, intonation mark and final consonant are transposed between the two syllables, while each syllable keeps its initial consonant and consonant cluster (if any). These rules and mechanisms were used to develop Thai spoonerism software in PHP. In a performance test, the program performed bi-syllable Thai spoonerism correctly for 99% of the words tested. The remaining 1% were faults, because a word obtained from spoonerism may not be spelled in conformity with Thai grammar and a spoonerism may have more than one valid answer.
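The single mechanism reported in this abstract can be sketched as follows. The sketch assumes that the rule means swapping the vowel, intonation mark and final consonant between the two syllables while each syllable keeps its own initial consonant or cluster. The original software was written in PHP; the `Syllable` type, the field names and the romanized placeholder values below are hypothetical and are not real Thai words.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Syllable:
    initial: str  # initial consonant or cluster, stays in place
    vowel: str
    tone: str     # intonation mark
    final: str    # final consonant, may be empty


def spoonerize(a: Syllable, b: Syllable) -> Tuple[Syllable, Syllable]:
    """Swap vowel, tone and final consonant; keep each initial where it is."""
    new_a = replace(a, vowel=b.vowel, tone=b.tone, final=b.final)
    new_b = replace(b, vowel=a.vowel, tone=a.tone, final=a.final)
    return new_a, new_b


# Placeholder syllables in a made-up romanization (illustration only).
first = Syllable(initial="kr", vowel="a", tone="1", final="n")
second = Syllable(initial="m", vowel="ii", tone="2", final="")
swapped_first, swapped_second = spoonerize(first, second)
```

Because the input is already split into syllable elements, the sketch matches the semi-automatic setting described above: segmentation is done by hand and only the transposition is automated. It does not check whether the result is spelled in conformity with Thai grammar, which is the failure case the abstract reports.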

Keywords: algorithm, spoonerism, computational linguistics, Thai spoonerism

Procedia PDF Downloads 234

715 Building Information Modeling-Based Approach for Automatic Quantity Take-off and Cost Estimation

Authors: Lo Kar Yin, Law Ka Mei

Abstract:

Architectural, engineering, construction and operations (AECO) industry practitioners, drawing on the fundamental training of their disciplines, have adapted well to the dynamic construction market. Spurred further by the pandemic since 2019, the industry has taken great steps in the virtual environment and strives for the best collaboration with project teams without boundaries. Adopting a Building Information Modeling-based approach and qualitative analysis, this paper reviews the quantity take-off and cost estimation process through modeling techniques, in liaison with suppliers, fabricators, subcontractors, contractors, designers, consultants and service providers in the construction industry value chain. The review covers automatic project cost budgeting, project cost control and cost evaluation of design options for in-situ reinforced-concrete construction and Modular Integrated Construction (MiC) at the design stage, and variation of works and cash flow/spending analysis at the construction stage as far as practicable. The findings are shared with a view to enhancing mutual trust and co-operation among AECO industry practitioners, and to fostering development through a common prototype of the design and build project delivery method under NEC Engineering and Construction Contract (ECC) Options A and C.

Keywords: building information modeling, cost estimation, quantity take-off, modeling techniques

Procedia PDF Downloads 184

714 Automatic Music Score Recognition System Using Digital Image Processing

Authors: Yuan-Hsiang Chang, Zhong-Xian Peng, Li-Der Jeng

Abstract:

Music has always been an integral part of human daily life. For most people, however, reading a musical score and turning it into melody is not easy. This study aims to develop an automatic music score recognition system using digital image processing, which can read and analyze musical score images automatically. The technical approach included: (1) staff region segmentation; (2) image preprocessing; (3) note recognition; and (4) accidental and rest recognition. Digital image processing techniques (e.g., horizontal/vertical projections, connected component labeling, morphological processing, template matching, etc.) were applied according to the musical notes, accidentals, and rests in staff notation. Preliminary results showed that our system could achieve detection and recognition rates of 96.3% and 91.7%, respectively. In conclusion, we presented an effective automated musical score recognition system that could be integrated with a media player to play music/songs given input images of a musical score. Ultimately, this system could also be incorporated in applications for mobile devices as a learning tool, such that a music player could learn to play music/songs.
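As an illustration of the horizontal projection technique named in this abstract, the following sketch finds candidate staff lines in a binarized score image. It is an assumption-laden toy, not the authors' system: the image is a nested list of 0/1 values with 1 for ink, and the `min_fill` ratio is a hypothetical parameter.

```python
from typing import List, Tuple


def horizontal_projection(binary: List[List[int]]) -> List[int]:
    """Count ink pixels in every row of a binary image."""
    return [sum(row) for row in binary]


def staff_line_rows(binary: List[List[int]], min_fill: float = 0.8) -> List[Tuple[int, int]]:
    """Group consecutive rows that are at least `min_fill` ink into (top, bottom) bands."""
    width = len(binary[0])
    profile = horizontal_projection(binary)
    bands: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(profile):
        is_line = count >= min_fill * width
        if is_line and start is None:
            start = y                      # a band begins
        elif not is_line and start is not None:
            bands.append((start, y - 1))   # the band ended on the previous row
            start = None
    if start is not None:                  # band runs to the bottom edge
        bands.append((start, len(profile) - 1))
    return bands


# Toy page: 7 rows x 10 columns, one thin line, one two-pixel line, one short blob.
WIDTH = 10
page = [[0] * WIDTH for _ in range(7)]
for y in (1, 4, 5):
    page[y] = [1] * WIDTH
page[3][2:5] = [1, 1, 1]  # a note-like blob that must not count as a staff line

bands = staff_line_rows(page)
```

Rows that are almost entirely ink stand out in the projection profile, whereas notes and accidentals contribute only short runs, which is why a fill-ratio threshold separates them in this simple setting.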

Keywords: connected component labeling, image processing, morphological processing, optical musical recognition

Procedia PDF Downloads 417

713 Distant Speech Recognition Using Laser Doppler Vibrometer

Authors: Yunbin Deng

Abstract:

Most existing applications of automatic speech recognition rely on cooperative subjects at a short distance from a microphone. Standoff speech recognition using microphone arrays can extend the subject-to-sensor distance somewhat, but it is still limited to only a few feet. As such, most deployed applications of standoff speech recognition are limited to indoor use at short range. Moreover, these applications require an air pathway between the subject and the sensor to achieve a reasonable signal-to-noise ratio. This study reports long-range (50 feet) automatic speech recognition experiments using a Laser Doppler Vibrometer (LDV) sensor. It shows that the LDV sensor modality can extend the speech acquisition standoff distance far beyond that of microphone arrays, to hundreds of feet. In addition, LDV enables 'listening' through windows to uncooperative subjects. This enables new capabilities in automatic audio and speech intelligence, surveillance, and reconnaissance (ISR) for law enforcement, homeland security and counter-terrorism applications. The Polytec LDV model OFV-505 is used in this study. To investigate the impact of different vibrating materials, five parallel LDV speech corpora, each consisting of 630 speakers, were collected from the vibrations of a glass window, a metal plate, a plastic box, a wood slate, and a concrete wall. These are common materials the application could encounter in daily life. These data were compared with their microphone counterpart to show the impact of the various materials on the spectrum of the LDV speech signal. State-of-the-art deep neural network modeling approaches are used to conduct continuous speaker-independent speech recognition on these LDV speech datasets. Preliminary phoneme recognition results using a time-delay neural network, bi-directional long short-term memory, and model fusion show great promise for using LDV for long-range speech recognition.
To the author’s best knowledge, this is the first time an LDV has been reported for a long-distance speech recognition application.

Keywords: covert speech acquisition, distant speech recognition, DSR, laser Doppler vibrometer, LDV, speech intelligence surveillance and reconnaissance, ISR

Procedia PDF Downloads 177

712 Automatic Adjustment of Thresholds via Closed-Loop Feedback Mechanism for Solder Paste Inspection

Authors: Chia-Chen Wei, Pack Hsieh, Jeffrey Chen

Abstract:

Surface Mount Technology (SMT) is widely used in electronic assembly, in which electronic components are mounted on the surface of the printed circuit board (PCB). Most defects in the SMT process are related to the quality of solder paste printing, and these defects lead to considerable manufacturing costs in the electronics assembly industry. Therefore, the solder paste inspection (SPI) machine, which controls and monitors the amount of solder paste printed, has become an important part of the production process. So far, SPI threshold settings have been determined from statistical analysis and expert experience. Because the production data are not normally distributed and the production processes vary in many ways, defects related to solder paste printing still occur. To solve this problem, this paper proposes an online machine learning algorithm, called the automatic threshold adjustment (ATA) algorithm, and a closed-loop architecture in the SMT process to determine the best threshold settings. Simulation experiments show that the proposed threshold settings improve the accuracy from 99.85% to 100%.

Keywords: big data analytics, Industry 4.0, SPI threshold setting, surface mount technology

Procedia PDF Downloads 115

711 Protection of Website Owners' Rights: Proportionality of Website Blocking in Russia and Beyond

Authors: Ekaterina Semenova

Abstract:

The article explores the issue of website owners’ liability for illicit content. Whilst various issues of the secondary liability of internet access providers for illicit content have been widely discussed in legal doctrine, the liability of website owners has attracted less attention. Meanwhile, website blocking injunctions affect website owners’ rights most, since it is website owners, rather than internet access providers, who have an interest in keeping their websites online. The discussion of internet access providers’ liability overshadows the need to protect website owners’ rights to due process and to the proportionality of blocking injunctions. The analysis of Russian website blocking regulation and case law showed that the protection of website owners’ rights depends on the kind of illicit content: some content triggers automatic blocking injunctions without prior notice to website owners or any opportunity to appeal, while other content does not trigger automatic blocking and gives the website owner an opportunity to avoid or appeal an injunction. Comparative analysis of website blocking regulations in European countries reveals different approaches to the proportionality of website blocking and the protection of website owners’ rights. Based on the findings of the study, we conclude that the global trend to impose website blocking injunctions on a wide range of illicit content without due process of law interferes with the rights of website owners.

Keywords: illicit content, liability, Russia, website blocking

Procedia PDF Downloads 349

710 Aircraft Automatic Collision Avoidance Using Spiral Geometric Approach

Authors: M. Orefice, V. Di Vito

Abstract:

This paper describes a collision avoidance algorithm developed from the mathematical modeling of the flight of insects in terms of spiral and conchospiral geometric paths. The algorithm calculates an avoidance manoeuvre that prevents the infringement of a predefined distance threshold between ownship and the considered intruder, while minimizing the deviation of the ownship trajectory from the original path and complying with the aircraft performance limitations and dynamic constraints. It is designed to be suitable for real-time applications, so that it can be considered for implementation in the most recent airborne automatic collision avoidance systems using the traffic data received through an ADS-B IN device. The approach can take the rules of the air into account: a specifically designed decision-making logic, based on the encounter geometry, selects the direction of the calculated collision avoidance manoeuvre that complies with those rules, for instance the fundamental right-of-way rule. In the paper, the proposed collision avoidance algorithm is presented, and its preliminary design and software implementation are described. The applicability of the method has been demonstrated through preliminary simulation tests performed in a 2D environment considering single-intruder encounter geometries, as reported and discussed in the paper.

Keywords: ADS-B Based Application, Collision Avoidance, RPAS, Spiral Geometry.

Procedia PDF Downloads 239

709 Automatic Identification of Pectoral Muscle

Authors: Ana L. M. Pavan, Guilherme Giacomini, Allan F. F. Alves, Marcela De Oliveira, Fernando A. B. Neto, Maria E. D. Rosa, Andre P. Trindade, Diana R. De Pina

Abstract:

Mammography is an imaging modality used worldwide to diagnose breast cancer, even in asymptomatic women. Because they are widely available, mammograms can be used to measure breast density and to predict cancer development. Women with increased mammographic density have a four- to sixfold increase in their risk of developing breast cancer. Therefore, studies have sought to quantify mammographic breast density accurately. In clinical routine, radiologists evaluate images through BIRADS (Breast Imaging Reporting and Data System) assessment. However, this method has inter- and intra-individual variability. An automatic, objective method to measure breast density could relieve the radiologist’s workload by providing a first opinion. However, the pectoral muscle is a high-density tissue with characteristics similar to those of fibroglandular tissue, which makes it hard to quantify mammographic breast density automatically. Therefore, a pre-processing step is needed to segment the pectoral muscle, which may otherwise be erroneously quantified as fibroglandular tissue. The aim of this work was to develop an automatic algorithm to segment and extract the pectoral muscle in digital mammograms. The database consisted of thirty medio-lateral oblique digital mammograms from São Paulo Medical School. This study was developed with ethical approval from the authors’ institutions and national review panels under protocol number 3720-2010. An algorithm was developed on the Matlab® platform for the pre-processing of images. The algorithm uses image processing tools to automatically segment and extract the pectoral muscle from mammograms. First, a thresholding technique is applied to remove non-biological information from the image. Then, the Hough transform is applied to find the limit of the pectoral muscle, followed by the active contour method. The seed of the active contour is placed at the limit of the pectoral muscle found by the Hough transform.
An experienced radiologist also performed the pectoral muscle segmentation manually. The manual and automatic methods were compared using the Jaccard index and Bland-Altman statistics. The comparison presented a Jaccard similarity coefficient greater than 90% for all analyzed images, showing the efficiency and accuracy of the segmentation of the proposed method. The Bland-Altman statistics compared both methods in relation to the area (mm²) of the segmented pectoral muscle. The data fell within the 95% confidence interval, supporting the accuracy of the segmentation relative to the manual method. Thus, the method proved to be accurate and robust, segmenting rapidly and free from intra- and inter-observer variability. It is concluded that the proposed method may be used reliably to segment the pectoral muscle in digital mammography in clinical routine. The segmentation of the pectoral muscle is very important for further quantification of the fibroglandular tissue volume present in the breast.
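The Jaccard index used in this comparison is the size of the intersection of two segmentations divided by the size of their union. A minimal sketch follows, assuming both segmentations are given as binary masks of equal size. The original work used Matlab, and the toy masks here are invented for illustration.

```python
from typing import List


def jaccard(mask_a: List[List[int]], mask_b: List[List[int]]) -> float:
    """Jaccard index |A and B| / |A or B| of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty masks are treated as identical by convention.
    return 1.0 if union == 0 else intersection / union


# Toy masks: the automatic result marks one pixel more than the manual one.
manual = [[1, 1, 0],
          [1, 0, 0]]
automatic = [[1, 1, 0],
             [1, 1, 0]]
score = jaccard(manual, automatic)  # 3 shared pixels out of 4 marked in total
```

A value of 1.0 means the two masks coincide exactly, so the reported coefficients above 90% indicate that the automatic and manual outlines overlap almost completely.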

Keywords: active contour, fibroglandular tissue, hough transform, pectoral muscle

Procedia PDF Downloads 350

708 Automatic Generating CNC-Code for Milling Machine

Authors: Chalakorn Chitsaart, Suchada Rianmora, Mann Rattana-Areeyagon, Wutichai Namjaiprasert

Abstract:

G-code is the main means by which a computer numerical control (CNC) machine controls the tool paths and generates the profile of an object’s features. To obtain a highly accurate surface finish, the CNC machine must operate without interruption. Recently, a product design strategy has been introduced that favours changes with low business impact and low resource consumption. The cost and time of designing minor changes can be reduced because the geometric details of existing models are reused. To support this strategy as an alternative channel for machining operations, this research proposes automatic code generation for CNC milling. The technique helps the manufacturer change the size and geometric shape of the product during operation, while reducing the time spent setting up or operating the machine. The algorithm, implemented on the MATLAB platform, analyses and evaluates the geometric information of the part and rapidly creates codes to control the operations of the machine. Compared with codes obtained from CAM software, the developed algorithm can quickly generate and simulate the cutting profile of the part.
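To make the idea of regenerating codes from geometric parameters concrete, here is a minimal sketch that emits G-code for a rectangular profile. It is written in Python rather than the MATLAB used in the paper and is not the authors' algorithm. The function name, the parameters and the choice to plunge at the cutting feed rate are assumptions made for the example. Only widely used words are emitted: G21 (millimetres), G90 (absolute coordinates), G0 (rapid move), G1 (linear feed move) and M30 (end of program).

```python
from typing import List


def rectangle_gcode(width: float, height: float, depth: float,
                    feed: float, safe_z: float = 5.0) -> List[str]:
    """Return G-code lines that trace a width x height rectangle at -depth."""
    corners = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height), (0.0, 0.0)]
    lines = [
        "G21",                              # units: millimetres
        "G90",                              # absolute positioning
        f"G0 Z{safe_z:.3f}",                # retract to a safe height
        f"G0 X{corners[0][0]:.3f} Y{corners[0][1]:.3f}",  # rapid to the start corner
        f"G1 Z{-depth:.3f} F{feed:.1f}",    # plunge (assumed: same feed as cutting)
    ]
    for x, y in corners[1:]:
        lines.append(f"G1 X{x:.3f} Y{y:.3f} F{feed:.1f}")  # cut one edge
    lines.append(f"G0 Z{safe_z:.3f}")       # retract
    lines.append("M30")                     # end of program
    return lines


# A minor design change only needs new parameters, not a new CAM session.
program = rectangle_gcode(width=40, height=20, depth=1, feed=300)
resized = rectangle_gcode(width=50, height=20, depth=1, feed=300)
```

Changing `width` from 40 to 50 regenerates the whole program, which is the kind of minor size change the abstract has in mind. A production generator would also need tool radius compensation, multiple depth passes and machine-specific setup codes, none of which are covered here.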

Keywords: geometric shapes, milling operation, minor changes, CNC Machine, G-code, cutting parameters

Procedia PDF Downloads 348

707 Design of Semi-Autonomous Street Cleaning Vehicle

Authors: Khouloud Safa Azoud, Süleyman Baştürk

Abstract:

In the pursuit of cleaner and more sustainable urban environments, advanced technologies play a critical role in evolving sanitation systems. This paper presents two distinct advancements in automated cleaning machines designed to improve urban sanitation. The first advancement is a semi-automatic road surface cleaning machine that integrates human labor with solar energy to enhance environmental sustainability and adaptability, especially in regions with limited access to electricity. By reducing carbon emissions and increasing operational efficiency, this approach offers significant potential for urban sanitation enhancement. The second advancement is a multifunctional semi-automatic street cleaning machine equipped with a camera, Arduino programming, and GPS for an autonomous operation aimed at addressing cost barriers in developing countries. Prioritizing low energy consumption and cost-effectiveness, this machine provides versatile cleaning solutions adaptable to various environmental conditions. By integrating solar energy with autonomous operating systems and careful design, these developments represent substantial progress in sustainable urban sanitation, particularly in developing regions.

Keywords: automated cleaning machines, solar energy integration, operational efficiency, urban sanitation systems

Procedia PDF Downloads 31

706 GPRS Based Automatic Metering System

Authors: Constant Akama, Frank Kulor, Frederick Agyemang

Abstract:

All over the world, due to increasing population, electric power distribution companies are looking for more efficient ways of reading electricity meters. In Ghana, the prepaid metering system was introduced in 2007 to replace the manual system of reading, which was fraught with inefficiencies. However, the prepaid system in Ghana cannot be integrated with online systems such as e-commerce platforms and remote monitoring systems. In this paper, we present a design framework for an automatic metering system that can be integrated with such platforms and systems. The meter was designed around the ADE7755, which reads the energy consumption; the reading is processed by a microcontroller connected to a SIM900 General Packet Radio Service (GPRS) module containing a GSM chip provisioned with an Access Point Name. The system also has a billing server and a management server, located at the premises of the utility company, which communicate with the meter over a Virtual Private Network and GPRS. With this system, customers can buy credit online, and the credit is transferred securely to the meter. Also, when a fault is reported, the utility company can log into the meter remotely through the management server to troubleshoot the problem.

Keywords: access point name, general packet radio service, GSM, virtual private network

Procedia PDF Downloads 298

705 PLC Based Automatic Railway Crossing System for India

Authors: Tapan Upadhyay, Aqib Siddiqui, Sameer Khan

Abstract:

The railway crossing system in India is a manually operated level crossing system, either manned or unmanned. Its main aim is to protect pedestrians and vehicles from colliding with trains, which pass at regular intervals, as India has the largest and busiest railway network. However, because of human error and negligence, thousands of lives are lost every year in accidents at railway crossings. To avoid this, we suggest a Programmable Logic Controller (PLC) based automatic system, which automatically controls the barrier as well as roadblocks to stop people from crossing while the security warning is given. People often ignore the security warning and pass two-wheelers beneath the barrier while the train is still some distance away. This paper aims to reduce the fatality and accident rate by controlling the barrier and roadblocks using sensors, which sense the incoming train and vehicles and send a signal to the PLC. The PLC in turn sends a signal to the barrier and roadblocks. Once the train has passed, the barrier and roadblocks retract, and the passage is clear for vehicles and pedestrians to cross. PLCs are used because they are very flexible, cost-effective and space-efficient, and because they reduce complexity and minimise errors. Supervisory Control And Data Acquisition (SCADA) is used to monitor the functioning.
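The sensor-to-PLC-to-actuator loop described in this abstract can be sketched as a single scan cycle with a set/reset latch. This is a Python illustration of the control logic only, not ladder logic and not the authors' program. The signal names are hypothetical, as is the rule that the barrier waits while a vehicle is still on the track.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Outputs:
    warning: bool       # lights and bell
    roadblock_up: bool  # roadblocks raised to stop traffic entering
    barrier_down: bool  # barrier lowered across the road


def scan(closed: bool, train_approaching: bool, train_cleared: bool,
         vehicle_on_track: bool) -> Tuple[bool, Outputs]:
    """One scan cycle: update the 'crossing closed' latch, then derive outputs."""
    if train_cleared:          # reset has priority: the train has passed
        closed = False
    elif train_approaching:    # set: the approach sensor has fired
        closed = True
    outputs = Outputs(
        warning=closed,
        roadblock_up=closed,
        # Assumption: do not lower the barrier onto a vehicle still crossing.
        barrier_down=closed and not vehicle_on_track,
    )
    return closed, outputs


# The train is detected while a vehicle is still on the track.
state, out_blocked = scan(False, train_approaching=True, train_cleared=False,
                          vehicle_on_track=True)
# Next cycle: the vehicle has left and the approach sensor is no longer active.
state, out_closed = scan(state, train_approaching=False, train_cleared=False,
                         vehicle_on_track=False)
# The train has passed: everything retracts.
state, out_open = scan(state, train_approaching=False, train_cleared=True,
                       vehicle_on_track=False)
```

The latch keeps the crossing closed between the approach and clearance sensor events, so the barrier does not reopen merely because the train has moved past the first sensor.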

Keywords: level crossing, PLC, sensors, SCADA

Procedia PDF Downloads 426

704 Exploring Pre-Trained Automatic Speech Recognition Model HuBERT for Early Alzheimer’s Disease and Mild Cognitive Impairment Detection in Speech

Authors: Monica Gonzalez Machorro

Abstract:

Dementia is hard to diagnose because of the lack of early physical symptoms. Early dementia recognition is key to improving the living conditions of patients. Speech technology is considered a valuable source of biomarkers for this challenge. Recent works have utilized conventional acoustic features and machine learning methods to detect dementia in speech. BERT-like classifiers have reported the most promising performance. One constraint, nonetheless, is that these studies are based either on human transcripts or on transcripts produced by automatic speech recognition (ASR) systems. The contribution of this research is to explore a method that does not require transcriptions to detect early Alzheimer’s disease (AD) and mild cognitive impairment (MCI). This is achieved by fine-tuning a pre-trained ASR model for the downstream early AD and MCI tasks. To do so, a subset of the thoroughly studied Pitt Corpus is customized. The subset is balanced for class, age, and gender. Data processing also involves cropping the samples into 10-second segments. For comparison purposes, a baseline model is defined by training and testing a Random Forest with 20 acoustic features extracted using the librosa library in Python. These are: zero-crossing rate, MFCCs, spectral bandwidth, spectral centroid, root mean square, and short-time Fourier transform. The baseline model achieved 58% accuracy. To fine-tune HuBERT as a classifier, an average pooling strategy is employed to merge the 3D representations from audio into 2D representations, and a linear layer is added. The pre-trained model used is ‘hubert-large-ls960-ft’. Empirically, the number of epochs selected is 5, and the batch size is 1. Experiments show that our proposed method reaches 69% balanced accuracy. This suggests that the linguistic and speech information encoded in the self-supervised ASR-based model is able to learn acoustic cues of AD and MCI.
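The pooling step in this abstract turns a (batch, time, feature) tensor into a (batch, feature) matrix by averaging over time, after which a linear layer produces one score per class. The sketch below shows that computation for a single utterance with plain Python lists. The numbers and weights are toy values, not HuBERT outputs or trained parameters, and the function names are hypothetical.

```python
from typing import List


def mean_pool(frames: List[List[float]]) -> List[float]:
    """Average frame-level features over time: (time, feature) -> (feature,)."""
    n_frames = len(frames)
    return [sum(column) / n_frames for column in zip(*frames)]


def linear(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Linear layer: one output score per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


# One utterance with two frames of two features each (toy values).
frames = [[1.0, 2.0],
          [3.0, 4.0]]
pooled = mean_pool(frames)

# Three output classes with made-up weights and biases.
weights = [[1.0, 0.0],
           [0.0, 1.0],
           [1.0, 1.0]]
bias = [0.0, 0.0, -1.0]
logits = linear(pooled, weights, bias)
predicted_class = max(range(len(logits)), key=logits.__getitem__)
```

Mean pooling makes the classifier independent of the number of frames, which suits the fixed 10-second segments mentioned above as well as shorter trailing segments.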

Keywords: automatic speech recognition, early Alzheimer’s recognition, mild cognitive impairment, speech impairment

Procedia PDF Downloads 126

703 Double Layer Security Authentication Model for Automatic Dependent Surveillance-Broadcast

Authors: Buse T. Aydin, Enver Ozdemir

Abstract:

An automatic dependent surveillance-broadcast (ADS-B) system has serious security problems. In this study, a double-layer authentication scheme between aircraft and ground station, aircraft and aircraft, and ground station and ATC tower is designed to prevent unauthorized aircraft from introducing themselves as friends. This method can be used as a solution to the problem of authentication. It combines classical cryptographic methods with new-generation physical layers. The first layer employs the embedded key of the aircraft, which is assumed to be installed during the construction of the utility. The other layer is a physical attribute (flight path, distance, etc.) between the aircraft and the ATC tower. We create a mathematical model that employs the information of both layers, and an aircraft is authenticated as a friend or unknown according to the accuracy of the results of the model. The results of the aircraft are compared with the results of the ATC tower, and if the values found by the aircraft and the ATC tower match within a certain error margin, we mark the aircraft as a friend. The ADS-B messages coming from this authenticated friendly aircraft are then processed. In this method, even if the embedded key is captured by an unknown aircraft, the unknown aircraft can easily be detected because it lacks the information of the second layer. Overall, in this work, we present a reliable system by adding a physical layer to the authentication process.
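A two-layer check of the kind outlined in this abstract can be sketched as follows. The abstract does not name the cryptographic primitive or give the mathematical model, so this example swaps in HMAC-SHA256 for the key layer and a simple distance tolerance for the physical layer. The key, the message, the distances and the tolerance are all hypothetical.

```python
import hashlib
import hmac


def tag(key: bytes, message: bytes) -> bytes:
    """Key layer (assumed primitive): HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def authenticate(message: bytes, received_tag: bytes, key: bytes,
                 reported_distance_km: float, measured_distance_km: float,
                 tolerance_km: float = 1.0) -> str:
    """Return 'friend' only if both the key layer and the physical layer agree."""
    key_ok = hmac.compare_digest(tag(key, message), received_tag)
    # Physical layer (assumed model): the distance claimed by the aircraft must
    # match the distance measured by the tower within an error margin.
    physical_ok = abs(reported_distance_km - measured_distance_km) <= tolerance_km
    return "friend" if key_ok and physical_ok else "unknown"


key = b"embedded-aircraft-key"      # hypothetical embedded key
message = b"ADS-B position report"  # hypothetical payload
good_tag = tag(key, message)

genuine = authenticate(message, good_tag, key, 42.0, 42.4)
# An attacker who captured the key but is not where the message claims to be.
stolen_key = authenticate(message, good_tag, key, 42.0, 60.0)
# A sender without the key, even at a plausible position.
wrong_key = authenticate(message, tag(b"wrong", message), key, 42.0, 42.4)
```

The second case mirrors the claim in the abstract: possession of the embedded key alone is not sufficient, because the physical attribute measured by the tower does not match.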

Keywords: ADS-B, authentication, communication with physical layer security, cryptography, identification friend or foe

Procedia PDF Downloads 177

702 JaCoText: A Pretrained Model for Java Code-Text Generation

Authors: Jessica Lopez Espejel, Mahaman Sanoussi Yahaya Alassan, Walid Dahhane, El Hassane Ettifouri

Abstract:

Pretrained transformer-based models have shown high performance in natural language generation tasks. However, a new wave of interest has surged: automatic programming language code generation. This task consists of translating natural language instructions into source code. Although well-known pre-trained language generation models have achieved good performance in learning programming languages, effort is still needed in automatic code generation. In this paper, we introduce JaCoText, a model based on the Transformer neural network. It aims to generate Java source code from natural language text. JaCoText leverages the advantages of both natural language and code generation models. More specifically, we study some findings from the state of the art and use them to (1) initialize our model from powerful pre-trained models, (2) explore additional pretraining on our Java dataset, (3) conduct experiments combining unimodal and bimodal data in training, and (4) scale the input and output length during the fine-tuning of the model. Experiments on the CONCODE dataset show that JaCoText achieves new state-of-the-art results.

Keywords: Java code generation, natural language processing, sequence-to-sequence models, transformer neural networks

Procedia PDF Downloads 283

701 Bird-Adapted Filter for Avian Species and Individual Identification Systems Improvement

Authors: Ladislav Ptacek, Jan Vanek, Jan Eisner, Alexandra Pruchova, Pavel Linhart, Ludek Muller, Dana Jirotkova

Abstract:

One of the essential steps of avian song processing is signal filtering. Currently, the standard methods of filtering are the Mel Bank Filter and linear filter distribution. In this article, a new type of filter bank called the Bird-Adapted Filter is introduced, in which the signal filtering is modifiable based upon a new mathematical description of audiograms for a particular bird species or order, named the Avian Audiogram Unified Equation. According to the method, filters may be deliberately distributed by frequency: they are more concentrated in bands of higher sensitivity, where more information is expected to be transmitted, and sparser elsewhere. Further, a comparison of various filters for automatic individual recognition of chiffchaff (Phylloscopus collybita) is presented. The average Equal Error Rate (EER) was 16.23% for the linear bank filter, 18.71% for the Mel Bank Filter, 14.29% for the Bird-Adapted Filter, and 12.95% for the Bird-Adapted Filter with 1/3 modification. This approach would be useful in practical automatic systems for avian species and individual identification. Since the Bird-Adapted Filter is based on the measured audiograms of particular species or orders, selecting the distribution according to the avian vocalization provides the most precise filter distribution to date.
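
To make the two baseline distributions named in this abstract concrete, the sketch below places filter center frequencies linearly and on the mel scale. It uses the common mel formula m = 2595 log10(1 + f/700). The Avian Audiogram Unified Equation is not given in the abstract, so the Bird-Adapted distribution itself is not reproduced; the frequency range and filter count are assumptions.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def linear_centers(f_min: float, f_max: float, n: int) -> list:
    # Centers equally spaced in Hz.
    step = (f_max - f_min) / (n + 1)
    return [f_min + step * (i + 1) for i in range(n)]


def mel_centers(f_min: float, f_max: float, n: int) -> list:
    # Centers equally spaced in mel, then mapped back to Hz, which packs
    # more filters into the low-frequency range.
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n)]


# Assumed analysis band of 0 to 8000 Hz with 5 filters.
lin = linear_centers(0.0, 8000.0, 5)
mel = mel_centers(0.0, 8000.0, 5)
for a, b in zip(lin, mel):
    print(f"linear {a:8.1f} Hz   mel {b:8.1f} Hz")
```

A species-adapted bank would follow the same pattern with a different warping function, concentrating centers where the audiogram indicates higher sensitivity.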

Keywords: avian audiogram, bird individual identification, bird song processing, bird species recognition, filter bank

Procedia PDF Downloads 385

700 Reed: An Approach Towards Quickly Bootstrapping Multilingual Acoustic Models

Authors: Bipasha Sen, Aditya Agarwal

Abstract:

A multilingual automatic speech recognition (ASR) system is a single entity capable of transcribing multiple languages that share a common phone space. The performance of such a system is highly dependent on the compatibility of the languages. State-of-the-art speech recognition systems are built using sequential architectures based on recurrent neural networks (RNNs), which limits computational parallelization in training. This poses a significant challenge in terms of the time taken to bootstrap and validate the compatibility of multiple languages for building a robust multilingual system. Complex architectural choices based on self-attention networks are made to improve parallelization, thereby reducing the training time. In this work, we propose Reed, a simple system based on 1D convolutions that uses a very short context to improve the training time. To improve the performance of our system, we use raw time-domain speech signals directly as input. This enables the convolutional layers to learn feature representations rather than relying on handcrafted features such as MFCC. We report improvements in training and inference times by at least a factor of 4x and 7.4x, respectively, with comparable WERs against standard RNN-based baseline systems on SpeechOcean's multilingual low-resource dataset.

Keywords: convolutional neural networks, language compatibility, low resource languages, multilingual automatic speech recognition

Procedia PDF Downloads 122

699 An Automatic Large Classroom Attendance Conceptual Model Using Face Counting

Authors: Sirajdin Olagoke Adeshina, Haidi Ibrahim, Akeem Salawu

Abstract:

Large lecture theatres cannot be covered by a single camera but rather require a multicamera setup because of their size, shape, and seating arrangements, although classroom capture is achievable with a single camera. Therefore, the design and implementation of a multicamera setup for a large lecture hall were considered. Researchers have emphasized the impact of class attendance on the academic performance of students. However, the traditional method of taking attendance is below standard, especially for large lecture theatres, because of the student population, the time required, sophistication, exhaustiveness, and manipulative influence. An automated large classroom attendance system is, therefore, imperative. The common approach in such systems is face detection and recognition, where known student faces are captured and stored for recognition purposes. This approach requires constant face database updates due to constant changes in facial features. Alternatively, face counting can be performed by cropping the localized faces in the video or image into a folder and then counting them. This research aims to develop a face localization-based approach to detect student faces in classroom images captured using a multicamera setup. A selected Haar-like feature cascade face detector, trained with an asymmetric goal to minimize the False Rejection Rate (FRR) relative to the False Acceptance Rate (FAR), was applied on a Raspberry Pi 4B. A relationship between the two factors (FRR and FAR) was established using a constant (λ) as a trade-off between them for automatic adjustment during training. An evaluation of the proposed approach and the conventional AdaBoost on classroom datasets shows an improvement of 8% in TPR (the output result of low FRR) and a 7% reduction of the FRR. The average learning speed of the proposed approach was improved, with 1.19s execution time per image compared to 2.38s for the improved AdaBoost. Consequently, the proposed approach achieved 97% TPR with an overhead constraint time of 22.9s compared to 46.7s for the improved AdaBoost when evaluated on images obtained from a large lecture hall (DK5), USM.
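
The FRR/FAR trade-off mentioned in this abstract can be illustrated with a small sketch. The abstract does not state how λ enters the training objective, so the weighted sum FRR + λ·FAR used here is an assumption, and the confusion counts per detector threshold are made-up values.

```python
def rates(tp: int, fn: int, fp: int, tn: int):
    # FRR: share of real faces that were rejected; FAR: share of non-faces accepted.
    frr = fn / (tp + fn)
    far = fp / (fp + tn)
    return frr, far


def asymmetric_cost(frr: float, far: float, lam: float) -> float:
    # Assumed form: lam < 1 makes a false rejection costlier than a false acceptance.
    return frr + lam * far


def best_threshold(candidates: dict, lam: float) -> float:
    # Pick the detector threshold with the lowest asymmetric cost.
    return min(candidates, key=lambda t: asymmetric_cost(*rates(*candidates[t]), lam))


# Hypothetical (tp, fn, fp, tn) counts for three detector thresholds.
candidates = {
    0.3: (95, 5, 20, 80),
    0.5: (90, 10, 8, 92),
    0.7: (80, 20, 2, 98),
}

for lam in (0.25, 1.0):
    t = best_threshold(candidates, lam)
    frr, far = rates(*candidates[t])
    print(f"lambda={lam}: threshold={t}, FRR={frr:.2f}, FAR={far:.2f}, TPR={1 - frr:.2f}")
```

With the smaller λ the selection favours the threshold with the lowest FRR (and therefore the highest TPR, since TPR = 1 − FRR), which matches the asymmetric goal described above for face counting.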

Keywords: automatic attendance, face detection, haar-like cascade, manual attendance

Procedia PDF Downloads 69

698 An Automatic Speech Recognition of Conversational Telephone Speech in Malay Language

Authors: M. Draman, S. Z. Muhamad Yassin, M. S. Alias, Z. Lambak, M. I. Zulkifli, S. N. Padhi, K. N. Baharim, F. Maskuriy, A. I. A. Rahim

Abstract:

The performance of a Malay automatic speech recognition (ASR) system for the call centre environment is presented. The system uses the Kaldi toolkit as the platform for the entire library and the algorithms used in performing the ASR task. The acoustic model implemented in this system uses a deep neural network (DNN) to model the acoustic signal and a standard n-gram model for language modelling. With 80 hours of training data from call centre recordings, the ASR system achieves 72% accuracy, which corresponds to a 28% word error rate (WER). The testing was done using 20 hours of audio data. Despite the use of a DNN, the system shows low accuracy owing to the variety of noise, accents, and dialects that typically occur in the Malaysian call centre environment. This significant variation among speakers is reflected in the large standard deviation of the average word error rate (WERav) (i.e., ~10%). The lowest WER (13.8%) was obtained from a recording sample of a native speaker with a standard Malay dialect (central Malaysia), compared to 49% for the sample with the highest WER, which contains conversation by a speaker who uses a non-standard Malay dialect.
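
The relation between accuracy and WER quoted in this abstract (72% accuracy, 28% WER) follows from the usual definition of WER as word-level edit distance divided by the reference length. The sketch below is a generic implementation of that definition, not the Kaldi scoring tool, and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# Made-up reference transcript and recognizer output (one word dropped).
reference = "saya mahu semak baki akaun"
hypothesis = "saya mahu semak akaun"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.0%}, word accuracy = {1 - wer:.0%}")
```

Because insertions count as errors, WER can exceed 100% on very poor output, so "accuracy = 1 − WER" is only a convenient summary in the usual operating range.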

Keywords: conversational speech recognition, deep neural network, Malay language, speech recognition

Procedia PDF Downloads 321

697 Path Planning for Orchard Robot Using Occupancy Grid Map in 2D Environment

Authors: Satyam Raikwar, Thomas Herlitzius, Jens Fehrmann

Abstract:

In recent years, the autonomous navigation of orchard and field robots has become an emerging technology in agricultural mobile robotics. One of the core aspects of autonomous navigation is path planning, which is still a crucial issue. Generally, for simple representation, path planning for a mobile robot is performed in a two-dimensional space, creating a path between the start and goal points. This paper presents an automatic path planning approach for robots used in orchards and vineyards using occupancy grid maps with field consideration. Orchards and vineyards are usually structured environments, and their topology is assumed to be constant over time; therefore, in this approach, an RGB image of a field is used as the working environment. These images undergo different image processing operations and are then discretized into two-dimensional grid matrices. Each individual cell of these grid matrices represents the occupancy of the space, i.e., whether it is free or occupied. The grid matrix represents the robot workspace for motion and path planning. After the grid matrix is described, a probabilistic roadmap (PRM) path algorithm is used to create an obstacle-free path over these occupancy grids. The path created by this method was successfully verified in the test area. Furthermore, this approach is used in the navigation of the orchard robot.
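
A probabilistic roadmap over an occupancy grid, as outlined in this abstract, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' planner: the grid is hand-written rather than derived from an RGB image, roadmap nodes are sampled from free cells, the collision check samples points along each edge, and the connection radius and sample budget are arbitrary.

```python
import heapq
import math
import random

# Hypothetical occupancy grid: 1 = occupied (e.g. a tree row), 0 = free.
GRID = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]


def is_free(cell):
    r, c = cell
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] == 0


def segment_free(a, b, steps=20):
    # Approximate collision check: test the cells under points sampled along a-b.
    for i in range(steps + 1):
        t = i / steps
        cell = (round(a[0] + t * (b[0] - a[0])), round(a[1] + t * (b[1] - a[1])))
        if not is_free(cell):
            return False
    return True


def prm_path(start, goal, n_samples=40, radius=3.0, seed=0):
    rng = random.Random(seed)
    free_cells = [(r, c) for r in range(len(GRID))
                  for c in range(len(GRID[0])) if GRID[r][c] == 0]
    samples = rng.sample(free_cells, min(n_samples, len(free_cells)))
    nodes = list(dict.fromkeys([start, goal, *samples]))  # de-duplicate, keep order
    # Dijkstra over the roadmap; edges join nodes within `radius` whose segment is free.
    dist, prev, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist[u]:
            continue
        for v in nodes:
            w = math.dist(u, v)
            if v != u and w <= radius and segment_free(u, v):
                if d + w < dist.get(v, math.inf):
                    dist[v] = d + w
                    prev[v] = u
                    heapq.heappush(heap, (d + w, v))
    if goal not in dist:
        return None  # roadmap too sparse to connect start and goal
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


path = prm_path((0, 0), (4, 6))
print(path)
```

On this tiny grid the sample budget covers every free cell, so a path is always found; on a real field map only a subset of the free space would be sampled, and the function returns `None` when the roadmap does not connect start and goal.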

Keywords: orchard robots, automatic path planning, occupancy grid, probabilistic roadmap

Procedia PDF Downloads 155

696 DEEPMOTILE: Motility Analysis of Human Spermatozoa Using Deep Learning in Sri Lankan Population

Authors: Chamika Chiran Perera, Dananjaya Perera, Chirath Dasanayake, Banuka Athuraliya

Abstract:

Male infertility is a major problem in the world, and it is a neglected and sensitive health issue in Sri Lanka. It can be determined by analyzing human semen samples. Sperm motility is one of many factors used to evaluate a male’s fertility potential. In Sri Lanka, this analysis is performed manually. Manual methods are reliable, but they are time consuming and their results depend on the expert performing them. Machine learning and deep learning technologies are currently being investigated to automate spermatozoa motility analysis, but these methods are not yet reliable: they tend to produce false positive results and false detections. Current automatic methods support different techniques, and some of them are very expensive. Due to the geographical variance in spermatozoa characteristics, current automatic methods are not reliable for motility analysis in Sri Lanka. The suggested system, DeepMotile, explores a method to analyze the motility of human spermatozoa automatically and present it to andrology laboratories to overcome current issues. DeepMotile is a novel deep learning method for analyzing spermatozoa motility parameters in the Sri Lankan population. To implement the proposed approach, Sri Lankan patient data were collected anonymously as a dataset, and glass slides were used as a low-cost technique to analyze semen samples. The problem was framed as microscopic object detection and tracking. YOLOv5 was customized and used as the object detector, and it achieved 94% mAP (mean average precision), 86% precision, and 90% recall on the gathered dataset. StrongSORT was used as the object tracker, and it was validated with andrology experts due to the unavailability of annotated ground truth data. Furthermore, this research has identified many potential directions for further investigation, and andrology experts can use this system to analyze motility parameters with realistic accuracy.

Keywords: computer vision, deep learning, convolutional neural networks, multi-target tracking, microscopic object detection and tracking, male infertility detection, motility analysis of human spermatozoa

Procedia PDF Downloads 105

695 Fuzzy Time Series Forecasting Based on Fuzzy Logical Relationships, PSO Technique, and Automatic Clustering Algorithm

Authors: A. K. M. Kamrul Islam, Abdelhamid Bouchachia, Suang Cang, Hongnian Yu

Abstract:

Forecasting models have a great impact on prediction and will continue to do so in the future. Although many forecasting models have been studied in recent years, most researchers focus on forecasting methods based on fuzzy time series to solve forecasting problems. The accuracy of a forecasting model depends fully on two factors: the length of the intervals in the universe of discourse and the content of the forecast rules. Moreover, a hybrid forecasting method can be a more effective and efficient way to improve forecasts than an individual forecasting model. Different hybrid forecasting models combine fuzzy time series with evolutionary algorithms, but their performance is not quite satisfactory. In this paper, we propose a hybrid forecasting model that combines first-order as well as high-order fuzzy time series with particle swarm optimization to improve forecasting accuracy. The proposed method uses the historical enrollments of the University of Alabama as the dataset in the forecasting process. First, we use an automatic clustering algorithm to calculate the appropriate intervals for the historical enrollments. Then, particle swarm optimization and fuzzy time series are combined, which yields better forecasting accuracy than other existing forecasting models.
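
The two ingredients this abstract names, interval partitioning and forecast rules, can be seen in a basic first-order fuzzy time series forecaster. The sketch below uses fixed equal-width intervals and midpoint averaging over fuzzy logical relationship groups; it does not include the automatic clustering or particle swarm optimization of the proposed hybrid, and the series is made-up data rather than the Alabama enrollments.

```python
def first_order_fts(series, lower, upper, n_intervals):
    width = (upper - lower) / n_intervals
    mids = [lower + width * (i + 0.5) for i in range(n_intervals)]

    def fuzzify(x):
        # Index of the interval (fuzzy set) that contains x.
        return min(int((x - lower) / width), n_intervals - 1)

    states = [fuzzify(x) for x in series]

    # Fuzzy logical relationship groups: current state -> set of next states.
    groups = {}
    for cur, nxt in zip(states, states[1:]):
        groups.setdefault(cur, set()).add(nxt)

    def forecast_from(state):
        targets = groups.get(state)
        if not targets:  # no rule for this state: fall back to its own midpoint
            return mids[state]
        return sum(mids[j] for j in targets) / len(targets)

    in_sample = [forecast_from(s) for s in states[:-1]]  # forecasts for t = 1..n-1
    next_step = forecast_from(states[-1])                # forecast for t = n
    return in_sample, next_step


# Made-up series; universe of discourse [130, 160] split into 3 intervals.
series = [130, 136, 139, 147, 155, 153, 156]
in_sample, next_step = first_order_fts(series, 130, 160, 3)
print(in_sample, next_step)
```

Changing `n_intervals` or the interval boundaries changes both the fuzzified states and the rule groups, which is why interval selection is a natural target for clustering and optimization.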

Keywords: fuzzy time series (fts), particle swarm optimization, clustering algorithm, hybrid forecasting model

Procedia PDF Downloads 249

694 A Review on 3D Smart City Platforms Using Remotely Sensed Data to Aid Simulation and Urban Analysis

Authors: Slim Namouchi, Bruno Vallet, Imed Riadh Farah

Abstract:

3D urban models provide powerful tools for decision making, urban planning, and smart city services. The accuracy of these 3D-based systems is directly related to the quality of the models. Since manual large-scale modeling, such as of cities or countries, is a highly time-intensive and very expensive process, fully automatic 3D building generation is needed. However, the result of the 3D modeling process depends on the input data, the properties of the captured objects, and the required characteristics of the reconstructed 3D model. Nowadays, producing a 3D real-world model is no longer a problem. Remotely sensed data has seen a remarkable increase in recent years, especially data acquired using unmanned aerial vehicles (UAVs). As scanning techniques develop, the amount of captured data grows and its resolution becomes more precise. This paper presents a literature review that aims to identify different methods of automatic 3D building extraction either from LiDAR or from the combination of LiDAR and satellite or aerial images. Then, we present open source technologies and data models (e.g., CityGML, PostGIS, Cesiumjs) used to integrate these models in geospatial base layers for smart city services.

Keywords: CityGML, LiDAR, remote sensing, SIG, Smart City, 3D urban modeling

Procedia PDF Downloads 134

693 Lip Localization Technique for Myanmar Consonants Recognition Based on Lip Movements

Authors: Thein Thein, Kalyar Myo San

Abstract:

A lip reading system is one of several supportive technologies for hearing-impaired people, elderly people, and non-native speakers. For persons with normal hearing in noisy environments, or in conditions where the audio signal is not available, lip reading techniques can be used to increase their understanding of spoken language. Hearing-impaired persons have used lip reading techniques as important tools to find out what was said by other people without hearing their voice. Thus, visual speech information is important and has become an active research area. Using visual information from lip movements can improve the accuracy and robustness of a speech recognition system, and the need for lip reading systems is ever increasing for every language. However, the recognition of lip movement is a difficult task because the region of interest (ROI) is nonlinear and noisy. Therefore, this paper proposes a method to detect the accurate lip shape and to localize lip movement towards automatic lip tracking by combining the Otsu global thresholding technique and the Moore Neighborhood Tracing Algorithm. The proposed method shows accurate lip localization and tracking, which is useful for speech recognition. In this work, experiments are carried out on automatically localizing the lip shape for Myanmar consonants using only the visual information from lip movements, which is useful for visual speech of the Myanmar language.
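
Of the two techniques combined in this abstract, Otsu global thresholding is easy to show in isolation: it picks the gray level that maximizes the between-class variance of the two resulting pixel groups. The sketch below works on a flat list of 8-bit gray values with made-up data; Moore neighborhood tracing of the resulting binary region is not shown.

```python
def otsu_threshold(pixels, levels=256):
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_bg = 0       # number of pixels with value <= t
    sum_bg = 0     # sum of their gray values
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Made-up gray values: a dark cluster (e.g. lips) and a bright cluster (e.g. skin).
pixels = [10] * 50 + [12] * 50 + [200] * 40 + [210] * 60
t = otsu_threshold(pixels)
mask = [p <= t for p in pixels]  # True marks the darker class
print(t, sum(mask))
```

The binary mask produced this way is the kind of input a boundary-tracing step would then follow to obtain the lip contour.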

Keywords: lip reading, lip localization, lip tracking, Moore neighborhood tracing algorithm

Procedia PDF Downloads 352

692 Automatic Motion Trajectory Analysis for Dual Human Interaction Using Video Sequences

Authors: Yuan-Hsiang Chang, Pin-Chi Lin, Li-Der Jeng

Abstract:

Advances in image and video processing techniques have enabled the development of intelligent video surveillance systems. This study aimed to automatically detect moving human objects and to analyze events of dual human interaction in a surveillance scene. Our system was developed in four major steps: image preprocessing, human object detection, human object tracking, and motion trajectory analysis. Adaptive background subtraction and image processing techniques were used to detect and track moving human objects. To solve the occlusion problem during the interaction, the Kalman filter was used to retain a complete trajectory for each human object. Finally, the motion trajectory analysis was developed to distinguish between interaction and non-interaction events based on derivatives of trajectories related to the speed of the moving objects. Using a database of 60 video sequences, our system achieved classification accuracies of 80% for interaction events and 95% for non-interaction events. In summary, we have explored a system for the automatic classification of interaction and non-interaction events using surveillance cameras. Ultimately, this system could be incorporated in an intelligent surveillance system for the detection and/or classification of abnormal or criminal events (e.g., theft, snatching, fighting, etc.).
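
The role of the Kalman filter in bridging occlusions, as described in this abstract, can be shown with a one-dimensional constant-velocity filter that keeps predicting when no measurement is available. This is a minimal sketch under assumptions: the abstract does not give the state model or noise settings, so the constant-velocity model, unit time step, and the values of `r` and `q` are illustrative.

```python
def track_1d(measurements, r=1.0, q=1e-3):
    # State: position x and velocity v; covariance stored as p00, p01, p11.
    x, v = 0.0, 0.0
    p00, p01, p11 = 1000.0, 0.0, 1000.0  # large initial uncertainty
    estimates = []
    for z in measurements:
        # Predict with F = [[1, 1], [0, 1]] (dt = 1) and Q = q * I.
        x = x + v
        p00 = p00 + 2 * p01 + p11 + q
        p01 = p01 + p11
        p11 = p11 + q
        if z is not None:
            # Update with a position-only measurement, H = [1, 0].
            s = p00 + r
            k0, k1 = p00 / s, p01 / s
            y = z - x
            x += k0 * y
            v += k1 * y
            p00, p01, p11 = (1 - k0) * p00, (1 - k0) * p01, p11 - k1 * p01
        # When z is None (occlusion), the prediction alone is kept.
        estimates.append(x)
    return estimates


# Made-up track: an object moving one unit per frame, hidden for three frames.
measurements = [0.0, 1.0, 2.0, 3.0, 4.0, None, None, None, 8.0]
estimates = track_1d(measurements)
print([round(e, 2) for e in estimates])
```

The per-frame differences of the filtered positions give a speed estimate, which is the kind of trajectory derivative the abstract uses to separate interaction from non-interaction events.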

Keywords: motion detection, motion tracking, trajectory analysis, video surveillance

Procedia PDF Downloads 545

691 Study of Human Upper Arm Girth during Elbow Isokinetic Contractions Based on a Smart Circumferential Measuring System

Authors: Xi Wang, Xiaoming Tao, Raymond C. H. So

Abstract:

As a convenient and noninvasive sensing approach, automatic limb girth measurement has been applied to detect the intention behind human motion from muscle deformation. The sensing validity has been elaborated by preliminary research but still needs more fundamental study, especially on kinetic contraction modes. Based on novel fabric strain sensors, a soft and smart limb girth measurement system was developed by the authors’ group, which can measure the limb girth in motion. Experiments were carried out on elbow isometric flexion and elbow isokinetic flexion (biceps’ isokinetic contractions) at 90°/s, 60°/s, and 120°/s for 10 subjects (2 canoeists and 8 ordinary people). After removal of the natural circumferential increments due to elbow position, the joint torque was found to be not uniformly sensitive to the limb circumferential strains; its sensitivity declines as the elbow joint angle rises, regardless of the angular speed. Moreover, the maximum joint torque was found to be an exponential function of the joint’s angular speed. This research contributes to the application of automatic limb girth measurement during kinetic contractions, and it is useful for predicting the contraction level of voluntary skeletal muscles.

Keywords: fabric strain sensor, muscle deformation, isokinetic contraction, joint torque, limb girth strain

Procedia PDF Downloads 336

690 Visual Template Detection and Compositional Automatic Regular Expression Generation for Business Invoice Extraction

Authors: Anthony Proschka, Deepak Mishra, Merlyn Ramanan, Zurab Baratashvili

Abstract:

Small and medium-sized businesses receive over 160 billion invoices every year. Since these documents exhibit many subtle differences in layout and text, extracting structured fields such as sender name, amount, and VAT rate from them automatically is an open research question. In this paper, existing work in template-based document extraction is extended, and a system is devised that is able to reliably extract all required fields for up to 70% of all documents in the data set, more than any other previously reported method. Approaches are described for 1) detecting through visual features which template a given document belongs to, 2) automatically generating extraction rules for a given new template by composing regular expressions from multiple components, and 3) computing confidence scores that indicate the accuracy of the automatic extractions. The system can generate templates with as few as one training sample and only requires the ground truth field values instead of detailed annotations such as bounding boxes, which are hard to obtain. The system is deployed and used inside commercial accounting software.
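
The idea of composing extraction rules from reusable regular-expression components, mentioned in step 2 of this abstract, can be sketched as follows. The component patterns, the label-plus-value composition, and the sample invoice text are all hypothetical; the paper's actual rule generator is not described in the abstract.

```python
import re

# Hypothetical building blocks; each value component names its capture group.
COMPONENTS = {
    "gap": r"\s*:?\s*",
    "amount": r"(?P<amount>\d{1,3}(?:[.,]\d{3})*[.,]\d{2})",
    "vat_rate": r"(?P<vat_rate>\d{1,2}(?:[.,]\d)?)\s?%",
}


def compose(label: str, component: str) -> re.Pattern:
    # Rule = literal label found on the template + separator + value component.
    return re.compile(
        re.escape(label) + COMPONENTS["gap"] + COMPONENTS[component],
        re.IGNORECASE,
    )


def extract(text: str, rules: dict) -> dict:
    # Apply every rule; a field with no match is reported as None.
    result = {}
    for field, pattern in rules.items():
        match = pattern.search(text)
        result[field] = match.group(field) if match else None
    return result


# Rules for one made-up template, then applied to a made-up invoice.
rules = {
    "amount": compose("Total", "amount"),
    "vat_rate": compose("VAT", "vat_rate"),
}
invoice = "Invoice 0042\nVAT: 19 %\nTotal: 1,234.56 EUR"
fields = extract(invoice, rules)
print(fields)
```

Because only the label differs between templates, a new template can reuse the same value components, which is consistent with generating rules from very few training samples.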

Keywords: data mining, information retrieval, business, feature extraction, layout, business data processing, document handling, end-user trained information extraction, document archiving, scanned business documents, automated document processing, F1-measure, commercial accounting software

Procedia PDF Downloads 129