Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 19026

Search results for: object recognition system

18756 Hybrid Deep Learning and FAST-BRISK 3D Object Detection Technique for Bin-Picking Application

Authors: Thanakrit Taweesoontorn, Sarucha Yanyong, Poom Konghuayrob

Abstract:

Robotic arms have gained popularity in various industries due to their accuracy and efficiency. This research proposes a method for bin-picking tasks using a cobot, combining the YOLOv5 CNN model for object detection and pose estimation with traditional feature detection (FAST), feature description (BRISK), and matching algorithms. By integrating these algorithms and utilizing a small-scale depth sensor camera for capturing depth and color images, the system achieves real-time object detection and accurate pose estimation, enabling the robotic arm to pick objects correctly in both position and orientation. Furthermore, the proposed method is implemented within the ROS framework to provide a seamless platform for robotic control and integration. This integration of robotics, cameras, and AI technology contributes to the development of industrial robotics, opening up new possibilities for automating challenging tasks and improving overall operational efficiency.

Keywords: robotic vision, image processing, applications of robotics, artificial intelligence

Procedia PDF Downloads 47

18755 Recognition of Grocery Products in Images Captured by Cellular Phones

Authors: Farshideh Einsele, Hassan Foroosh

Abstract:

In this paper, we present a robust algorithm to recognize text extracted from grocery product images captured by mobile phone cameras. Recognition of such text is challenging since text in grocery product images varies in size, orientation, style, and illumination, and can suffer from perspective distortion. Pre-processing is performed to make the characters scale and rotation invariant. Since text degradations cannot be appropriately defined using well-known geometric transformations such as translation, rotation, affine transformation, and shearing, we use all of a character's black pixels as our feature vector. Classification is performed with a minimum distance classifier using the maximum likelihood criterion, which delivers a very promising Character Recognition Rate (CRR) of 89%. We achieve a considerably higher Word Recognition Rate (WRR) of 99% when using lower-level linguistic knowledge about product words during the recognition process.
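The classification step described above can be illustrated with a minimal sketch. It assumes that each character is a flattened binary bitmap (1 = black pixel) and that every class shares the same isotropic Gaussian noise, in which case the maximum likelihood decision reduces to choosing the nearest class mean. The function names and the tiny 3x3 glyphs are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence

Bitmap = Sequence[int]  # flattened binary glyph, 1 = black pixel


def class_means(samples: Dict[str, List[Bitmap]]) -> Dict[str, List[float]]:
    """Average the training bitmaps of each character into one prototype."""
    means = {}
    for label, bitmaps in samples.items():
        n = len(bitmaps)
        means[label] = [sum(column) / n for column in zip(*bitmaps)]
    return means


def classify(bitmap: Bitmap, means: Dict[str, List[float]]) -> str:
    """Return the label whose prototype is nearest in squared Euclidean distance."""
    def distance(prototype: List[float]) -> float:
        return sum((p - b) ** 2 for p, b in zip(prototype, bitmap))

    return min(means, key=lambda label: distance(means[label]))


# Hypothetical 3x3 training glyphs for the letters "I" and "L".
training = {
    "I": [[0, 1, 0, 0, 1, 0, 0, 1, 0]],
    "L": [[1, 0, 0, 1, 0, 0, 1, 1, 1]],
}
prototypes = class_means(training)
noisy_i = [0, 1, 0, 0, 1, 0, 0, 1, 1]  # an "I" with one extra black pixel
print(classify(noisy_i, prototypes))
```

In practice the bitmaps would first be normalized for scale and rotation, as the abstract notes, so that pixel positions are comparable across samples.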

Keywords: camera-based OCR, feature extraction, document, image processing, grocery products

Procedia PDF Downloads 375

18754 Object Trajectory Extraction by Using Mean of Motion Vectors Form Compressed Video Bitstream

Authors: Ching-Ting Hsu, Wei-Hua Ho, Yi-Chun Chang

Abstract:

Video object tracking is one of the popular research topics in the computer graphics area. The trajectory can be applied in security, traffic control, and even sports training. In sports training, the trajectory can be used to analyze an athlete’s performance without traditional sensors. Many relevant works use the mean shift algorithm with background subtraction. Schemes of this kind must select a kernel function, which may affect accuracy and performance. In this paper, we consider the motion information in the pre-coded bitstream. The proposed algorithm extracts the trajectory by composing the motion vectors from the pre-coded bitstream. We gather the motion vectors from the area that overlaps the object and calculate the mean of the overlapped motion vectors. We implement and simulate our proposed algorithm in an H.264 video codec. The performance is better than that of relevant works while the accuracy of the object trajectory is maintained. The experimental results show that the proposed method can extract the trajectory from the pre-coded bitstream with high accuracy and achieves higher performance than other relevant works.
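The idea of averaging the motion vectors that overlap the object can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the motion vectors have already been parsed from the bitstream into (block centre x, block centre y, dx, dy) tuples expressed in pixels and pointing in the direction of object motion, and it ignores H.264 details such as quarter-pixel units and reference-frame direction. All names are hypothetical.

```python
from typing import List, Tuple

MotionVector = Tuple[float, float, float, float]  # block centre x, y and displacement dx, dy
Box = Tuple[float, float, float, float]           # left, top, width, height


def mean_motion(vectors: List[MotionVector], box: Box) -> Tuple[float, float]:
    """Average the displacements of all blocks whose centre lies inside the box."""
    left, top, width, height = box
    inside = [(dx, dy) for x, y, dx, dy in vectors
              if left <= x < left + width and top <= y < top + height]
    if not inside:
        return (0.0, 0.0)  # no overlapping block: assume the object did not move
    n = len(inside)
    return (sum(dx for dx, _ in inside) / n, sum(dy for _, dy in inside) / n)


def extract_trajectory(frames: List[List[MotionVector]], box: Box) -> List[Tuple[float, float]]:
    """Shift the object box by the mean motion of each frame and record its centre."""
    left, top, width, height = box
    trajectory = [(left + width / 2, top + height / 2)]
    for vectors in frames:
        dx, dy = mean_motion(vectors, (left, top, width, height))
        left, top = left + dx, top + dy
        trajectory.append((left + width / 2, top + height / 2))
    return trajectory


# Two hypothetical frames; the block at (10, 10) lies outside the object and is ignored.
frames = [
    [(1, 1, 2, 0), (3, 3, 4, 0), (10, 10, 100, 100)],
    [(4, 1, 0, 2)],
]
print(extract_trajectory(frames, (0, 0, 4, 4)))
```

Because the vectors are read directly from the compressed stream, no pixel decoding or kernel selection is needed, which is the source of the speed advantage the abstract claims over mean shift approaches.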

Keywords: H.264, video bitstream, video object tracking, sports training

Procedia PDF Downloads 402

18753 Selecting Answers for Questions with Multiple Answer Choices in Arabic Question Answering Based on Textual Entailment Recognition

Authors: Anes Enakoa, Yawei Liang

Abstract:

Question answering (QA) is one of the most important and demanding tasks in the field of Natural Language Processing (NLP). In QA systems, the answer generation task generates a list of candidate answers to the user's question, of which only one is correct. Answer selection is one of the main components of a QA system and is concerned with selecting the best answer choice from the candidate answers suggested by the system. However, the selection process can be very challenging, especially in Arabic, due to the particularities of the language. To address this challenge, an approach is proposed to answer questions with multiple answer choices for Arabic QA systems based on Textual Entailment (TE) recognition. The developed approach employs a Support Vector Machine that considers lexical, semantic, and syntactic features in order to recognize the entailment between the generated hypotheses (H) and the text (T). A set of experiments was conducted for performance evaluation, and the overall performance of the proposed method reached an accuracy of 67.5% with a C@1 score of 80.46%. The obtained results are promising and demonstrate that the proposed method is effective for the TE recognition task.
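The abstract reports both accuracy and C@1. As a brief illustration of how the two differ, the sketch below assumes the commonly used c@1 definition, in which unanswered questions are credited in proportion to the accuracy achieved on the whole set, so leaving a question unanswered scores better than answering it wrongly. The function name and the example counts are hypothetical and are not the paper's data.

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """c@1 = (n_R + n_U * n_R / n) / n, with n_R correct, n_U unanswered, n questions."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_correct + n_unanswered * n_correct / n_total) / n_total


# Hypothetical run: 10 questions, 6 answered correctly, 2 left unanswered, 2 wrong.
print(c_at_1(n_correct=6, n_unanswered=2, n_total=10))
```

When no question is left unanswered, the second term vanishes and c@1 equals plain accuracy, which is why a C@1 score can exceed the reported accuracy when a system abstains on some questions.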

Keywords: information retrieval, machine learning, natural language processing, question answering, textual entailment

Procedia PDF Downloads 120

18752 An Erudite Technique for Face Detection and Recognition Using Curvature Analysis

Authors: S. Jagadeesh Kumar

Abstract:

Face detection and recognition is an important technology for image database management, video surveillance, and human-computer interaction (HCI). Face recognition is a rapidly emerging method that has been used extensively in forensics, for example in criminal identification, secure access, and custodial security. This paper proposes a technique using curvature analysis (CA) that has a lower incidence of false positives, operates in different lighting environments, and removes the artifacts introduced during image acquisition by means of the ring correction in polar coordinate (RCP) method. The technique applies mean and median filtering to remove the artifacts, but it works in polar coordinates during image acquisition. Experimental results for face detection and recognition confirm good performance even under diagonal orientation and pose variation.

Keywords: curvature analysis, ring correction in polar coordinate method, face detection, face recognition, human computer interaction

Procedia PDF Downloads 254

18751 SiamMask++: More Accurate Object Tracking through Layer Wise Aggregation in Visual Object Tracking

Authors: Hyunbin Choi, Jihyeon Noh, Changwon Lim

Abstract:

In this paper, we propose SiamMask++, an architecture that performs layer-wise aggregation and depth-wise cross-correlation and introduces a multi-RPN module and a multi-MASK module to improve EAO (Expected Average Overlap), a representative performance evaluation metric for the Visual Object Tracking (VOT) challenge. The proposed architecture, SiamMask++, has two versions: bi_SiamMask++, which runs in real time (56 fps) on systems equipped with GPUs (Titan XP), and rf_SiamMask++, which adds mask refinement modules to improve EAO. Tests are performed on VOT2016, VOT2018, and VOT2019, the representative datasets for Visual Object Tracking tasks, which are labeled with rotated bounding boxes. SiamMask++ performs better than SiamMask on all three datasets tested. In particular, on the VOT2018 dataset, SiamMask++ achieves 62.6% accuracy, 26.2% robustness, and 39.8% EAO. Compared to SiamMask, this is an improvement of 4.18%, 37.17%, and 23.99%, respectively. In addition, we carry out an in-depth experimental analysis of how much the features extracted from the backbone and the introduction of the multi modules affect the performance of our model in the VOT task.

Keywords: visual object tracking, video, deep learning, layer wise aggregation, Siamese network

Procedia PDF Downloads 114

18750 Mathematical Reconstruction of an Object Image Using X-Ray Interferometric Fourier Holography Method

Authors: M. K. Balyan

Abstract:

The main principles of the X-ray Fourier interferometric holography method are discussed. The object image is reconstructed mathematically by Fourier transformation. Three methods are presented: the approximation method, the iteration method, and the step-by-step method. As an example, the reconstruction of the complex amplitude transmission coefficient of a beryllium wire is considered. The results reconstructed by the three methods are compared. The best results are obtained with the step-by-step method.

Keywords: dynamical diffraction, hologram, object image, X-ray holography

Procedia PDF Downloads 364

18749 Composing Method of Decision-Making Function for Construction Management Using Active 4D/5D/6D Objects

Authors: Hyeon-Seung Kim, Sang-Mi Park, Sun-Ju Han, Leen-Seok Kang

Abstract:

As BIM (Building Information Modeling) application continually expands, the visual simulation techniques used for facility design and construction process information are becoming increasingly advanced and diverse. For building structures, BIM application is design-oriented to utilize 3D objects for conflict management, whereas for civil engineering structures, the usability of nD object-oriented construction stage simulation is important in construction management. Simulations of 5D and 6D objects, for which cost and resources are linked along with process simulation in 4D objects, are commonly used, but they do not provide a decision-making function for process management problems that occur on site because they mostly focus on the visual representation of current status for process information. In this study, an nD CAD system is constructed that facilitates an optimized schedule simulation that minimizes process conflict, a construction duration reduction simulation according to execution progress status, optimized process plan simulation according to project cost change by year, and optimized resource simulation for field resource mobilization capability. Through this system, the usability of conventional simple simulation objects is expanded to the usability of active simulation objects with which decision-making is possible. Furthermore, to close the gap between field process situations and planned 4D process objects, a technique is developed to facilitate a comparative simulation through the coordinated synchronization of an actual video object acquired by an on-site web camera and VR concept 4D object. This synchronization and simulation technique can also be applied to smartphone video objects captured in the field in order to increase the usability of the 4D object.
Because yearly project costs change frequently for civil engineering construction, an annual process plan should be recomposed appropriately according to project cost decreases/increases compared with the plan. In the 5D CAD system provided in this study, an active 5D object utilization concept is introduced to perform a simulation in an optimized process planning state by finding a process optimized for the changed project cost without changing the construction duration through a technique such as genetic algorithm. Furthermore, in resource management, an active 6D object utilization function is introduced that can analyze and simulate an optimized process plan within a possible scope of moving resources by considering those resources that can be moved under a given field condition, instead of using a simple resource change simulation by schedule. The introduction of an active BIM function is expected to increase the field utilization of conventional nD objects.

Keywords: 4D, 5D, 6D, active BIM

Procedia PDF Downloads 250

18748 Relational Attention Shift on Images Using Bu-Td Architecture and Sequential Structure Revealing

Authors: Alona Faktor

Abstract:

In this work, we present a NN-based computational model that can perform attention shifts according to high-level instruction. The instruction specifies the type of attentional shift using explicit geometrical relation. The instruction also can be of cognitive nature, specifying more complex human-human interaction or human-object interaction, or object-object interaction. Applying this approach sequentially allows obtaining a structural description of an image. A novel data-set of interacting humans and objects is constructed using a computer graphics engine. Using this data, we perform systematic research of relational segmentation shifts.

Keywords: cognitive science, attention, deep learning, generalization

Procedia PDF Downloads 168

18747 An Evaluation of Neural Network Efficacies for Image Recognition on Edge-AI Computer Vision Platform

Authors: Jie Zhao, Meng Su

Abstract:

Image recognition, one of the most critical technologies in computer vision, helps machines such as robots understand a scene and, if deployed appropriately, could trigger a revolution in remote sensing and industrial automation. With the development of AI technologies, many prevailing and sophisticated neural networks have been developed for image recognition. However, the computer vision platforms that serve as hardware for these neural networks are as crucial as the networks themselves and need to be addressed more thoroughly as research subjects. Moreover, the choice of computer vision platform is a determining factor in the performance of different neural networks for recognition. In this paper, three different computer vision platforms, Jetson Nano (with 4 GB), a standalone laptop (with RTX 3000s, using CUDA), and Google Colab (web-based, using GPU), are explored, and four prominent neural network architectures, AlexNet, VGG (16/19), GoogleNet, and ResNet (18/34/50), are investigated. The performance of each pairing of computer vision platform and neural network is evaluated in terms of recognition accuracy and time efficiency. In the case study using public imageNets, our findings provide a nuanced perspective on optimizing image recognition tasks across Edge-AI platforms, offering guidance on selecting appropriate neural network structures to maximize performance under hardware constraints.

Keywords: alexNet, VGG, googleNet, resNet, Jetson nano, CUDA, COCO-NET, cifar10, imageNet large scale visual recognition challenge (ILSVRC), google colab

Procedia PDF Downloads 50

18746 An Efficient Fundamental Matrix Estimation for Moving Object Detection

Authors: Yeongyu Choi, Ju H. Park, S. M. Lee, Ho-Youl Jung

Abstract:

In this paper, an improved method for estimating the fundamental matrix is proposed. The method is applied effectively to moving object detection with a monocular camera. The method consists of corner point detection, motion estimation of moving objects, and fundamental matrix calculation. The corner points are obtained with the Harris corner detector, and the motions of moving objects are calculated with the pyramidal Lucas-Kanade optical flow algorithm. The fundamental matrix is calculated through epipolar geometry analysis using RANSAC. In this method, we improve the performance of moving object detection by using two threshold values that determine whether a point is an inlier or an outlier. Through simulations, we compare the performance obtained with varying values of the two thresholds.
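The abstract does not say how the two thresholds are applied, so the following is only one plausible reading, sketched under stated assumptions: a tracked point pair is scored by its distance to the epipolar line, a low threshold marks confident inliers (static background), a high threshold marks confident outliers (moving objects), and anything in between is left undecided. The function names, threshold values, and example fundamental matrix are hypothetical.

```python
import math
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Point = Tuple[float, float]


def epipolar_distance(F: Matrix3, p1: Point, p2: Point) -> float:
    """Distance from p2 (second image) to the epipolar line F * p1 of p1 (first image)."""
    x1 = (p1[0], p1[1], 1.0)
    a, b, c = (sum(F[row][k] * x1[k] for k in range(3)) for row in range(3))
    norm = math.hypot(a, b)
    if norm == 0.0:
        return float("inf")  # degenerate line: treat as maximally inconsistent
    return abs(a * p2[0] + b * p2[1] + c) / norm


def label_point(F: Matrix3, p1: Point, p2: Point, t_low: float, t_high: float) -> str:
    """Classify a tracked point pair using two distance thresholds (t_low < t_high)."""
    d = epipolar_distance(F, p1, p2)
    if d <= t_low:
        return "static"     # consistent with camera motion: inlier
    if d >= t_high:
        return "moving"     # violates the epipolar constraint: outlier
    return "uncertain"


# Fundamental matrix of a purely horizontal camera translation:
# corresponding points must share the same image row.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
for p2 in [(5.0, 2.5), (5.0, 4.0), (5.0, 7.0)]:
    print(p2, label_point(F, (1.0, 2.0), p2, t_low=1.0, t_high=3.0))
```

In a full pipeline, the point pairs would come from Harris corners tracked with pyramidal Lucas-Kanade optical flow, and F would be estimated with RANSAC, as the abstract describes.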

Keywords: corner detection, optical flow, epipolar geometry, RANSAC

Procedia PDF Downloads 376

18745 Improvement of Microscopic Detection of Acid-Fast Bacilli for Tuberculosis by Artificial Intelligence-Assisted Microscopic Platform and Medical Image Recognition System

Authors: Hsiao-Chuan Huang, King-Lung Kuo, Mei-Hsin Lo, Hsiao-Yun Chou, Yusen Lin

Abstract:

The most robust and economical method for laboratory diagnosis of TB is to identify mycobacterial bacilli (AFB) under acid-fast staining, despite its disadvantages of low sensitivity and labor intensiveness. Although digital pathology has become popular in medicine, an automated microscopic system for microbiology is still not available. A new AI-assisted automated microscopic system, consisting of a microscopic scanner and a recognition program powered by big data and deep learning, may significantly increase the sensitivity of TB smear microscopy. Thus, the objective is to evaluate such an automatic system for the identification of AFB. A total of 5,930 smears were enrolled in this study. An intelligent microscope system (TB-Scan, Wellgen Medical, Taiwan) was used for microscopic image scanning and AFB detection. 272 AFB smears were used for transfer learning to increase the accuracy. Referee medical technicians served as the gold standard for resolving discrepant results. Results showed that, on a total of 1,726 AFB smears, the automated system's accuracy, sensitivity, and specificity were 95.6% (1,650/1,726), 87.7% (57/65), and 95.9% (1,593/1,661), respectively. Compared to culture, the sensitivity for human technicians was only 33.8% (38/142); however, the automated system can achieve 74.6% (106/142), which is significantly higher than human technicians, and this is the first such automated microscope system for TB smear testing in a controlled trial. This automated system could achieve higher TB smear sensitivity and laboratory efficiency and may complement molecular methods (e.g., GeneXpert) to reduce the total cost of TB control. Furthermore, such an automated system can be accessed remotely over the internet and can be deployed in areas with limited medical resources.
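The accuracy, sensitivity, and specificity reported for the 1,726 smears follow directly from the confusion-matrix counts implied by the fractions above. The short sketch below recomputes them; the false-negative and false-positive counts are derived from the stated denominators (65 - 57 = 8 and 1,661 - 1,593 = 68), and the function name is illustrative.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # correct calls over all smears
        "sensitivity": tp / (tp + fn),   # detected positives over all positives
        "specificity": tn / (tn + fp),   # detected negatives over all negatives
    }


# Counts taken from the abstract: 57/65 positives and 1,593/1,661 negatives correct.
metrics = diagnostic_metrics(tp=57, fn=8, tn=1593, fp=68)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Run as written, this reproduces the 95.6%, 87.7%, and 95.9% figures given for the automated system.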

Keywords: TB smears, automated microscope, artificial intelligence, medical imaging

Procedia PDF Downloads 190

18744 Audio-Visual Recognition Based on Effective Model and Distillation

Authors: Heng Yang, Tao Luo, Yakun Zhang, Kai Wang, Wei Qin, Liang Xie, Ye Yan, Erwei Yin

Abstract:

In recent years, audio-visual recognition has shown great potential in strong-noise environments. Existing audio-visual recognition methods have relied on ResNet and feature fusion. However, ResNet occupies a large amount of memory, which restricts its use in engineering applications, and feature merging introduces interference in high-noise environments. To solve these problems, we propose an effective framework with bidirectional distillation. First, in consideration of its good feature extraction performance, we chose a lightweight model, Efficientnet, as our extractor of spatial features. Second, self-distillation was applied to learn more information from raw data. Finally, we proposed bidirectional distillation in decision-level fusion. Our experimental results are based on a multi-model dataset from 24 volunteers. The lipreading accuracy of our framework increased by 2.3% compared with existing systems, and our framework improved audio-visual fusion in a high-noise environment compared with an audio recognition system without visual input.

Keywords: lipreading, audio-visual, Efficientnet, distillation

Procedia PDF Downloads 102

18743 Named Entity Recognition System for Tigrinya Language

Authors: Sham Kidane, Fitsum Gaim, Ibrahim Abdella, Sirak Asmerom, Yoel Ghebrihiwot, Simon Mulugeta, Natnael Ambassager

Abstract:

The lack of annotated datasets is a bottleneck to the progress of NLP in low-resourced languages. The work presented here consists of large-scale annotated datasets and models for the named entity recognition (NER) system for the Tigrinya language. Our manually constructed corpus comprises over 340K words tagged for NER, with over 118K of the tokens also having parts-of-speech (POS) tags, annotated with 12 distinct classes of entities, represented using several types of tagging schemes. We conducted extensive experiments covering convolutional neural networks and transformer models; the highest performance achieved is 88.8% weighted F1-score. These results are especially noteworthy given the unique challenges posed by Tigrinya’s distinct grammatical structure and complex word morphologies. The system can be an essential building block for the advancement of NLP systems in Tigrinya and other related low-resourced languages and serve as a bridge for cross-referencing against higher-resourced languages.

Keywords: Tigrinya NER corpus, TiBERT, TiRoBERTa, BiLSTM-CRF

Procedia PDF Downloads 64

18742 The Contribution of Lower Visual Channels and Evolutionary Origin of the Tunnel Effect

Authors: Shai Gabay

Abstract:

The tunnel effect describes the phenomenon where a moving object seems to persist even when temporarily hidden from view. Numerous studies indicate that humans, infants, and nonhuman primates possess object persistence, relying on spatiotemporal cues to track objects that are dynamically occluded. While this ability is associated with neural activity in the cerebral neocortex of humans and mammals, the role of subcortical mechanisms remains ambiguous. In our current investigation, we explore the functional contribution of monocular aspects of the visual system, predominantly subcortical, to the representation of occluded objects. This is achieved by manipulating whether the reappearance of an object occurs in the same or different eye from its disappearance. Additionally, we employ Archerfish, renowned for their precision in dislodging insect prey with water jets, as a phylogenetic model to probe the evolutionary origins of the tunnel effect. Our findings reveal the active involvement of subcortical structures in the mental representation of occluded objects, a process evident even in species that do not possess cortical tissue.

Keywords: archerfish, tunnel effect, mental representations, monocular channels, subcortical structures

Procedia PDF Downloads 4

18741 Detection of Pharmaceutical Personal Protective Equipment in Video Stream

Authors: Michael Leontiev, Danil Zhilikov, Dmitry Lobanov, Lenar Klimov, Vyacheslav Chertan, Daniel Bobrov, Vladislav Maslov, Vasilii Vologdin, Ksenia Balabaeva

Abstract:

Pharmaceutical manufacturing is a complex process, where each stage requires a high level of safety and sterility. Personal Protective Equipment (PPE) is used for this purpose. Despite all the measures of control, the human factor (improper PPE wearing) causes numerous losses to human health and material property. This research proposes a solid computer vision system for ensuring safety in pharmaceutical laboratories. For this, we have tested a wide range of state-of-the-art object detection methods. Combining previously obtained results in this sphere with our own approach to this problem, we have reached a high accuracy, ranging from 0.77 up to 0.98, in detecting all the elements of a common set of PPE used in pharmaceutical laboratories. Our system is a step towards safe medicine production.

Keywords: sterility and safety in pharmaceutical development, personal protective equipment, computer vision, object detection, monitoring in pharmaceutical development, PPE

Procedia PDF Downloads 38

18740 LuMee: A Centralized Smart Protector for School Children who are Using Online Education

Authors: Lumindu Dilumka, Ranaweera I. D., Sudusinghe S. P., Sanduni Kanchana A. M. K.

Abstract:

This study was motivated by the challenges experienced by parents and guardians in ensuring the safety of children in cyberspace. In the last two or three years, online education has become very popular all over the world due to the COVID-19 pandemic. Therefore, parents, guardians, and teachers must ensure the safety of children in cyberspace. Children are more likely to go astray, plenty of online programs are waiting to lead them down the wrong track, and children engaged in online education can be distracted at any moment. Therefore, parents should keep a close check on their children's online activity. In addition, because children are unaware of the risks, they tend to share sensitive information, which exposes them to phishing attacks from outsiders. These problems can be overcome through the proposed web-based system. We use feature extraction, web tracking and analysis mechanisms, image processing, and named entity recognition to implement this web-based system.

Keywords: online education, cyber bullying, social media, face recognition, web tracker, privacy data

Procedia PDF Downloads 52

18739 Design and Development of 5-DOF Color Sorting Manipulator for Industrial Applications

Authors: Atef A. Ata, Sohair F. Rezeka, Ahmed El-Shenawy, Mohammed Diab

Abstract:

Image processing attracts massive attention today because it opens up broad applications in many fields of high technology. The real challenge is how to improve existing sorting system applications, which consist of two integrated stations for processing and handling, with a new image processing feature. Existing color sorting techniques use a set of inductive, capacitive, and optical sensors to differentiate object color. This research presents a mechatronic color sorting system solution based on image processing. A 5-DOF robot arm with pick-and-place operation is designed and developed as the main part of the color sorting system. The image processing procedure detects circular objects in an image captured in real time by a webcam attached to the end-effector, then extracts their color and position information. This information is passed as a sequence of sorting commands to the manipulator, which has a pick-and-place mechanism. Performance analysis shows that this color-based object sorting system works very accurately under ideal conditions in terms of adequate illumination, circular object shape, and color. The circular objects tested for sorting are red, green, and blue. Under non-ideal conditions, such as an unspecified color, the accuracy drops to 80%.
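The color-extraction step can be illustrated with a minimal sketch. It assumes the mean RGB value of each detected circle is already available and labels it by its dominant channel, returning "unknown" when no channel clearly dominates (the kind of unspecified color the abstract lists as a non-ideal condition). The margin value, function names, and bin mapping are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
CHANNEL_NAMES = ("red", "green", "blue")


def classify_color(rgb: RGB, min_margin: int = 50) -> str:
    """Label a mean RGB value by its dominant channel, or 'unknown' if ambiguous."""
    ranked = sorted(range(3), key=lambda i: rgb[i], reverse=True)
    top, second = ranked[0], ranked[1]
    if rgb[top] - rgb[second] < min_margin:
        return "unknown"
    return CHANNEL_NAMES[top]


def sorting_commands(objects: List[Tuple[Tuple[float, float], RGB]],
                     bins: Dict[str, int]) -> List[Tuple[Tuple[float, float], int]]:
    """Turn (position, mean color) detections into (pick position, target bin) commands."""
    commands = []
    for position, rgb in objects:
        label = classify_color(rgb)
        if label in bins:  # objects with an unknown color are skipped
            commands.append((position, bins[label]))
    return commands


# Hypothetical detections: image position of each circle and its mean color.
detections = [((120.0, 80.0), (200, 30, 40)),
              ((300.0, 95.0), (120, 110, 100)),
              ((210.0, 40.0), (10, 20, 220))]
print(sorting_commands(detections, bins={"red": 0, "green": 1, "blue": 2}))
```

A practical system would typically work in a color space such as HSV to reduce sensitivity to illumination, which the abstract identifies as one of the conditions affecting accuracy.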

Keywords: robotics manipulator, 5-DOF manipulator, image processing, color sorting, pick-and-place

Procedia PDF Downloads 341

18738 Multivariate Output-Associative RVM for Multi-Dimensional Affect Predictions

Authors: Achut Manandhar, Kenneth D. Morton, Peter A. Torrione, Leslie M. Collins

Abstract:

The current trends in affect recognition research are to consider continuous observations from spontaneous natural interactions in people using multiple feature modalities, to represent affect in terms of continuous dimensions, to incorporate spatio-temporal correlation among affect dimensions, and to provide fast affect predictions. These research efforts have been propelled by a growing effort to develop affect recognition systems that can be implemented to enable seamless real-time human-computer interaction in a wide variety of applications. Motivated by these desired attributes of an affect recognition system, in this work a multi-dimensional affect prediction approach is proposed by integrating the multivariate Relevance Vector Machine (MVRVM) with a recently developed Output-associative Relevance Vector Machine (OARVM) approach. The resulting approach can provide fast continuous affect predictions by jointly modeling the multiple affect dimensions and their correlations. Experiments on the RECOLA database show that the proposed approach performs competitively with the OARVM while providing faster predictions during testing.

Keywords: dimensional affect prediction, output-associative RVM, multivariate regression, fast testing

Procedia PDF Downloads 260

18737 Online Handwritten Character Recognition for South Indian Scripts Using Support Vector Machines

Authors: Steffy Maria Joseph, Abdu Rahiman V, Abdul Hameed K. M.

Abstract:

Online handwritten character recognition is a challenging field in Artificial Intelligence. The classification success rate of current techniques decreases when the dataset involves similarity and complexity in stroke styles, number of strokes, and variations in stroke characteristics. Malayalam is a complex South Indian language spoken by about 35 million people, especially in Kerala and the Lakshadweep islands. In this paper, we consider significant feature extraction for the similar stroke styles of Malayalam. The extracted feature set is also suitable for the recognition of other handwritten South Indian languages such as Tamil, Telugu, and Kannada. A classification scheme based on support vector machines (SVM) is proposed to improve the accuracy of classification and recognition of online Malayalam handwritten characters. SVM classifiers are well suited to real-world applications. The contribution of various features towards recognition accuracy is analysed. The performance of different SVM kernels is also studied. A graphical user interface has been developed for reading and displaying the characters. Different writing styles are taken for each of the 44 alphabets. Various features are extracted and used for classification after preprocessing of the input data samples. The highest recognition accuracy, 97%, is obtained experimentally with the best feature combination and a polynomial kernel in the SVM.

Keywords: SVM, matlab, malayalam, South Indian scripts, online handwritten character recognition

Procedia PDF Downloads 547

18736 The Design of Intelligent Classroom Management System with Raspberry PI

Authors: Sathapath Kilaso

Abstract:

The objective of checking attendance in the classroom is to record each student’s attendance in order to support classroom learning activities. Although the teaching trend in the 21st century is student-centered learning, in which the lecturer's duty is to mentor and give advice, classroom learning is still important because it lets students interact with classmates and the lecturer, and because some subjects require in-class learning. The system prototype is developed by applying microcontroller and embedded system technology in line with the “Internet of Things” trend, and the web socket technique allows the lecturer to be alerted immediately whenever the data is updated.

Keywords: arduino, embedded system, classroom, raspberry PI

Procedia PDF Downloads 348

18735 Accuracy of Autonomy Navigation of Unmanned Aircraft Systems through Imagery

Authors: Sidney A. Lima, Hermann J. H. Kux, Elcio H. Shiguemori

Abstract:

Unmanned Aircraft Systems (UAS) usually navigate using the Global Navigation Satellite System (GNSS) combined with an Inertial Navigation System (INS). However, GNSS accuracy can be degraded at any time, or the GNSS signal can even be lost. In addition, there is the possibility of malicious interference, known as jamming. An image navigation system can therefore solve the autonomy problem: if the GNSS is disabled or degraded, the image navigation system continues to provide coordinate information to the INS, preserving the autonomy of the system. This work aims to evaluate the accuracy of positioning through photogrammetry concepts. The methodology uses orthophotos and Digital Surface Models (DSM) as a reference to represent the object space and photographs obtained during the flight to represent the image space. To calculate the coordinates of the perspective center and the camera attitude, it is necessary to know the coordinates of homologous points in the object space (orthophoto coordinates and DSM altitude) and in the image space (column and line of the photograph). So, if the homologous points can be identified automatically in real time, the coordinates and attitudes can be calculated with their respective accuracies. With the methodology applied in this work, maximum errors on the order of 0.5 m in positioning and 0.6º in camera attitude are observed, so image-based navigation can reach an accuracy equal to or better than that of GNSS receivers without differential correction. Therefore, navigating through imagery is a good alternative for enabling autonomous navigation.

Keywords: autonomy, navigation, security, photogrammetry, remote sensing, spatial resection, UAS

Procedia PDF Downloads 160

18734 A Framework for Chinese Domain-Specific Distant Supervised Named Entity Recognition

Authors: Qin Long, Li Xiaoge

Abstract:

Knowledge graphs have now become a new form of knowledge representation. However, there is no consensus on a plausible definition of entities and relationships in domain-specific knowledge graphs. Further, owing to several limitations and deficiencies, the various approaches to recognizing domain-specific entities and relationships are far from perfect. Specifically, named entity recognition in Chinese domains is a critical task for natural language processing applications. However, a bottleneck for Chinese named entity recognition in new domains is the lack of annotated data. To address this challenge, a domain distant supervised named entity recognition framework is proposed. The framework is divided into two stages: first, the distant supervised corpus is generated using an entity linking model based on a graph attention neural network; second, the generated corpus is used as input to train the distant supervised named entity recognition model, which then extracts named entities. The linking model is verified on the CCKS2019 entity linking corpus, and its F1 value is 2% higher than that of the benchmark method. The re-pre-trained BERT language model is added to the benchmark method, and the results show that it is more suitable for distant supervised named entity recognition tasks. Finally, the framework is applied in the computer field, and the results show that it can obtain domain named entities.
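The first stage, generating a distantly supervised corpus, can be illustrated with a much simpler stand-in for the paper's entity linking model. The sketch below swaps in greedy longest-match lookup against a small entity dictionary and emits character-level BIO tags. The function name `distant_label`, the example lexicon, and the `TECH` label are hypothetical and are not taken from the paper.

```python
def distant_label(text, lexicon):
    """Assign character-level BIO tags by greedy longest-match dictionary lookup.

    `lexicon` maps entity surface forms to entity types. This replaces the
    graph-attention entity linking model of the paper with plain string
    matching, purely for illustration.
    """
    tags = ["O"] * len(text)
    max_len = max((len(entity) for entity in lexicon), default=0)
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate first so "操作系统" wins over "系统".
        for span in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + span]
            if candidate in lexicon:
                label = lexicon[candidate]
                tags[i] = "B-" + label
                for j in range(i + 1, i + span):
                    tags[j] = "I-" + label
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical computer-domain lexicon and sentence fragment.
lexicon = {"操作系统": "TECH", "系统": "TECH"}
tags = distant_label("操作系统内核", lexicon)
```

Labels produced this way are noisy (missed entities and false matches), which is why the second stage trains a recognition model on the generated corpus rather than using the matches directly.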

Keywords: distant named entity recognition, entity linking, knowledge graph, graph attention neural network

Procedia PDF Downloads 68

18733 SAMRA: Dataset in Al-Soudani Arabic Maghrebi Script for Recognition of Arabic Ancient Words Handwritten

Authors: Sidi Ahmed Maouloud, Cheikh Ba

Abstract:

Much of West Africa’s cultural heritage is written in the Al-Soudani Arabic script, which was widely used in West Africa before the time of European colonization. This Al-Soudani Arabic script is an African version of the Maghrebi script, in particular, the Al-Mebssout script. However, the local African qualities were incorporated into the Al-Soudani script in a way that gave it a unique African diversity and character. Despite the existence of several Arabic datasets in Oriental script, allowing for the analysis, layout, and recognition of texts written in these calligraphies, many Arabic scripts and written traditions remain understudied. In this paper, we present a dataset of words from Al-Soudani calligraphy scripts. This dataset consists of 100 images selected from three different manuscripts written in Al-Soudani Arabic script by different copyists. The primary source for this database was the libraries of Boston University and Cambridge University. This dataset highlights the unique characteristics of the Al-Soudani Arabic script as well as the new challenges it presents in terms of automatic word recognition of Arabic manuscripts. An HTR system based on a hybrid ANN (CRNN-CTC) is also proposed to test this dataset. SAMRA is a dataset of annotated Arabic manuscript words in the Al-Soudani script that can help researchers automatically recognize and analyze manuscript words written in this script.

Keywords: dataset, CRNN-CTC, handwritten words recognition, Al-Soudani Arabic script, HTR, manuscripts

Procedia PDF Downloads 74

18732 Sign Language Recognition of Static Gestures Using Kinect™ and Convolutional Neural Networks

Authors: Rohit Semwal, Shivam Arora, Saurav, Sangita Roy

Abstract:

This work proposes a supervised framework with deep convolutional neural networks (CNNs) for vision-based sign language recognition of static gestures. Our approach addresses the acquisition and segmentation of correct inputs for the CNN-based classifier. The Microsoft Kinect™ sensor can track hands efficiently despite complex environmental conditions. Skin-colour-based segmentation is applied to cropped images of hands in different poses, which depict different sign language gestures. The segmented hand images are used as input for our classifier. The CNN classifier proposed in the paper is able to classify the input images with a high degree of accuracy. The system was trained and tested on 39 static sign language gestures, including 26 letters of the alphabet and 13 commonly used words. This paper includes a problem definition for building the proposed system, which acts as a sign language translator between deaf or mute people and the rest of society. This is followed by a review of existing knowledge in the area and of work done by other researchers. The paper also briefly describes the working principles behind the different components of CNNs. The architecture and system design specifications of the proposed system are discussed in the subsequent sections of the paper to give the reader a clear picture of the system in terms of the capability required. The design then gives the top-level details of how the proposed system meets the requirements.
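The abstract does not say which skin-colour rule is used, so the following is only a sketch of the general idea under an assumed rule: a widely cited fixed-threshold RGB heuristic for skin under uniform daylight. The function names and the sample pixels are hypothetical, and a real pipeline would more likely operate on image arrays in another colour space.

```python
def is_skin(r, g, b):
    """Fixed-threshold RGB skin rule (an assumed heuristic, not the paper's)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_mask(image):
    """Return a binary mask (nested lists of 0/1) for an image of RGB tuples."""
    return [[1 if is_skin(*pixel) else 0 for pixel in row] for row in image]


# Hypothetical 2x2 cropped hand image: one skin-like pixel, three background pixels.
image = [
    [(200, 120, 100), (30, 30, 30)],
    [(100, 150, 200), (255, 255, 255)],
]
mask = skin_mask(image)
```

The resulting mask would be applied to the cropped hand image before it is passed to the classifier, so that background pixels do not influence the prediction.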

Keywords: sign language, CNN, HCI, segmentation

Procedia PDF Downloads 119

18731 Labyrinthine Venous Vasculature Ablation for the Treatment of Sudden Sensorineural Hearing Loss: Two Case Reports

Authors: Kritin K. Verma, Bailey Duhon, Patrick W. Slater

Abstract:

Objective: To introduce the possible etiological role that the Labyrinthine Venous Vasculature (LVV) has in venous congestion of the cochlear system in Sudden Sensorineural Hearing Loss (SSNHL) patients. Patients: Two patients (a 62-year-old female and a 50-year-old male) presented within twenty-four hours of onset of SSNHL. Intervention: Following failed conservative and salvage techniques, the patients underwent ablation of the labyrinthine venous vasculature ipsilateral to the side of the loss. Main Outcome Measures: Improvement of SSNHL based on an improvement of pure-tone audiometric (PTA) low-tone scoring averages at 250, 500, and 1000 Hz. Word recognition scoring using the NU-6 word list was used to assess quality of life. Results: Case 1 experienced a 51.7 dB increase in low-tone PTA and an increased word recognition score of 90%. Case 2 experienced a 33.4 dB increase in low-tone PTA and a 60% increase in word recognition score. No major complications were noted. Conclusion: Two patients experienced significant improvement in their low-tone PTA and word recognition scoring following labyrinthine venous vasculature ablation.
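The low-tone PTA outcome measure is an average of thresholds at three frequencies, so the arithmetic can be sketched briefly. The threshold values below are hypothetical and are not the patients' data; the sketch also assumes that improvement is computed as the pre-treatment average minus the post-treatment average, since a lower threshold in dB HL means better hearing.

```python
LOW_TONE_FREQUENCIES_HZ = (250, 500, 1000)


def low_tone_pta(thresholds_db):
    """Average the pure-tone thresholds (dB HL) at 250, 500, and 1000 Hz."""
    values = [thresholds_db[f] for f in LOW_TONE_FREQUENCIES_HZ]
    return sum(values) / len(values)


def pta_improvement(pre_db, post_db):
    """Improvement in dB: positive when post-treatment thresholds are lower."""
    return low_tone_pta(pre_db) - low_tone_pta(post_db)


# Hypothetical audiograms, for illustration only.
pre = {250: 80, 500: 75, 1000: 70}
post = {250: 30, 500: 25, 1000: 20}
improvement = pta_improvement(pre, post)
```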

Keywords: case report, sudden sensorineural hearing loss, venous congestion, vascular ablation

Procedia PDF Downloads 107

18730 Developing an AI-Driven Application for Real-Time Emotion Recognition from Human Vocal Patterns

Authors: Sayor Ajfar Aaron, Mushfiqur Rahman, Sajjat Hossain Abir, Ashif Newaz

Abstract:

This study delves into the development of an artificial intelligence application designed for real-time emotion recognition from human vocal patterns. Utilizing advanced machine learning algorithms, including deep learning and neural networks, the paper highlights both the technical challenges and potential opportunities in accurately interpreting emotional cues from speech. Key findings demonstrate the critical role of diverse training datasets and the impact of ambient noise on recognition accuracy, offering insights into future directions for improving robustness and applicability in real-world scenarios.

Keywords: artificial intelligence, convolutional neural network, emotion recognition, vocal pattern

Procedia PDF Downloads 4

18729 Fine Grained Action Recognition of Skateboarding Tricks

Authors: Frederik Calsius, Mirela Popa, Alexia Briassouli

Abstract:

In the field of machine learning, it is common practice to use benchmark datasets to demonstrate that a method works. The domain of action recognition in videos often uses datasets such as Kinetics, Something-Something, UCF-101, and HMDB-51 to report results. Given the properties of these datasets, none of them focuses solely on very short clips (2 to 3 seconds) and on highly similar, fine-grained actions within one specific domain. This paper investigates how current state-of-the-art action recognition methods perform on a dataset that consists of highly similar, fine-grained actions. To do so, a dataset of skateboarding tricks was created. The analysis highlights both the benefits and the limitations of state-of-the-art methods and proposes future research directions in the activity recognition domain. The research shows that the best results are obtained by fusing RGB data with OpenPose data for the Temporal Shift Module.

Keywords: activity recognition, fused deep representations, fine-grained dataset, temporal modeling

Procedia PDF Downloads 199

18728 Relationships among Tourists’ Needs for Uniqueness, Perceived Authenticity and Behavioral Intentions

Authors: Deniz Karagöz Yüncü

Abstract:

This study tested a structural model that investigates the relationships among tourists’ need for uniqueness, perceived authenticity (object-based authenticity and existential authenticity), and behavioral intentions to consume cultural and heritage destinations. The sample comprised 281 participants at a cultural heritage site in Cappadocia, Turkey. The data were collected via face-to-face interviews over two months (September and October), which are considered the high season. Structural equation modeling was employed to test the hypothesized causal relationships. Findings revealed that tourists’ creative choice had an influence on object-based authenticity and existential authenticity. Tourists’ avoidance had an influence on object-based authenticity. The study concluded that two dimensions, namely object-based authenticity and existential authenticity, had a significant impact on behavioral intentions.

Keywords: needs for uniqueness, perceived existential authenticity, emotions, behavioral intentions

Procedia PDF Downloads 216

18727 The Lawfulness of the Determination of a Criminal Suspect as a New Pre-Trial's Object

Authors: Muhammad Tanziel Aziezi

Abstract:

In Indonesia, pre-trial (called ‘praperadilan’) is a mechanism regulated in the Criminal Procedure Code as a form of oversight and checks and balances on the process at the stages of inquiry, investigation, and prosecution, so that actions taken by the State (in this case, the police and the prosecutor) are carried out in accordance with its authority and do not violate human rights. Article 77 of the Criminal Procedure Code provides that the objects that may be filed for pre-trial are limited to the lawfulness of the arrest, the lawfulness of the detention, and the legitimacy of stopping an investigation or prosecution. However, since the beginning of 2015, a further object has been admitted as a pre-trial object, namely the lawfulness of the determination of a criminal suspect. This is because the determination of a suspect is considered one of the forceful measures that could restrict a person’s rights, so its implementation should be subject to oversight and checks and balances by the courts. This paper discusses the development of pre-trial review of the lawfulness of the determination of a criminal suspect as a new judicial mechanism for the protection of human rights in Indonesia.

Keywords: criminal procedure law, pre-trial, lawfulness of determination of a criminal suspect, check and balance by the court

Procedia PDF Downloads 310