Search results for: decoder
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 36

Search results for: decoder

36 Image Captioning with Vision-Language Models

Authors: Promise Ekpo Osaine, Daniel Melesse

Abstract:

Image captioning is an active area of research in the multi-modal artificial intelligence (AI) community as it connects vision and language understanding, especially in settings where it is required that a model understands the content shown in an image and generates semantically and grammatically correct descriptions. In this project, we followed a standard approach to a deep learning-based image captioning model, injecting architecture for the encoder-decoder setup, where the encoder extracts image features, and the decoder generates a sequence of words that represents the image content. As such, we investigated image encoders, which are ResNet101, InceptionResNetV2, EfficientNetB7, EfficientNetV2M, and CLIP. As a caption generation structure, we explored long short-term memory (LSTM). The CLIP-LSTM model demonstrated superior performance compared to the encoder-decoder models, achieving a BLEU-1 score of 0.904 and a BLEU-4 score of 0.640. Additionally, among the CNN-LSTM models, EfficientNetV2M-LSTM exhibited the highest performance with a BLEU-1 score of 0.896 and a BLEU-4 score of 0.586 while using a single-layer LSTM.

Keywords: multi-modal AI systems, image captioning, encoder, decoder, BLUE score

Procedia PDF Downloads 77
35 A Guide to the Implementation of Ambisonics Super Stereo

Authors: Alessio Mastrorillo, Giuseppe Silvi, Francesco Scagliola

Abstract:

In this work, we introduce an Ambisonics decoder with an implementation of the C-format, also called Super Stereo. This format is an alternative to conventional stereo and binaural decoding. Unlike those, this format conveys audio information from the horizontal plane and works with stereo speakers and headphones. The two C-format channels can also return a reconstructed planar B-format. This work provides an open-source implementation for this format. We implement an all-pass filter for signal quadrature, as required by the decoding equations. This filter works with six Biquads in a cascade configuration, with values for control frequency and quality factor discovered experimentally. The phase response of the filter delivers a small error in the 20-14.000Hz range. The decoder has been tested with audio sources up to 192kHz sample rate, returning pristine sound quality and detailed stereo image. It has been included in the Envelop for Live suite and is available as an open-source repository. This decoder has applications in Virtual Reality and 360° audio productions, music composition, and online streaming.

Keywords: ambisonics, UHJ, quadrature filter, virtual reality, Gerzon, decoder, stereo, binaural, biquad

Procedia PDF Downloads 91
34 High Performance Field Programmable Gate Array-Based Stochastic Low-Density Parity-Check Decoder Design for IEEE 802.3an Standard

Authors: Ghania Zerari, Abderrezak Guessoum, Rachid Beguenane

Abstract:

This paper introduces high-performance architecture for fully parallel stochastic Low-Density Parity-Check (LDPC) field programmable gate array (FPGA) based LDPC decoder. The new approach is designed to decrease the decoding latency and to reduce the FPGA logic utilisation. To accomplish the target logic utilisation reduction, the routing of the proposed sub-variable node (VN) internal memory is designed to utilize one slice distributed RAM. Furthermore, a VN initialization, using the channel input probability, is achieved to enhance the decoder convergence, without extra resources and without integrating the output saturated-counters. The Xilinx FPGA implementation, of IEEE 802.3an standard LDPC code, shows that the proposed decoding approach attain high performance along with reduction of FPGA logic utilisation.

Keywords: low-density parity-check (LDPC) decoder, stochastic decoding, field programmable gate array (FPGA), IEEE 802.3an standard

Procedia PDF Downloads 297
33 Extending Image Captioning to Video Captioning Using Encoder-Decoder

Authors: Sikiru Ademola Adewale, Joe Thomas, Bolanle Hafiz Matti, Tosin Ige

Abstract:

This project demonstrates the implementation and use of an encoder-decoder model to perform a many-to-many mapping of video data to text captions. The many-to-many mapping occurs via an input temporal sequence of video frames to an output sequence of words to form a caption sentence. Data preprocessing, model construction, and model training are discussed. Caption correctness is evaluated using 2-gram BLEU scores across the different splits of the dataset. Specific examples of output captions were shown to demonstrate model generality over the video temporal dimension. Predicted captions were shown to generalize over video action, even in instances where the video scene changed dramatically. Model architecture changes are discussed to improve sentence grammar and correctness.

Keywords: decoder, encoder, many-to-many mapping, video captioning, 2-gram BLEU

Procedia PDF Downloads 108
32 Analysis of Joint Source Channel LDPC Coding for Correlated Sources Transmission over Noisy Channels

Authors: Marwa Ben Abdessalem, Amin Zribi, Ammar Bouallègue

Abstract:

In this paper, a Joint Source Channel coding scheme based on LDPC codes is investigated. We consider two concatenated LDPC codes, one allows to compress a correlated source and the second to protect it against channel degradations. The original information can be reconstructed at the receiver by a joint decoder, where the source decoder and the channel decoder run in parallel by transferring extrinsic information. We investigate the performance of the JSC LDPC code in terms of Bit-Error Rate (BER) in the case of transmission over an Additive White Gaussian Noise (AWGN) channel, and for different source and channel rate parameters. We emphasize how JSC LDPC presents a performance tradeoff depending on the channel state and on the source correlation. We show that, the JSC LDPC is an efficient solution for a relatively low Signal-to-Noise Ratio (SNR) channel, especially with highly correlated sources. Finally, a source-channel rate optimization has to be applied to guarantee the best JSC LDPC system performance for a given channel.

Keywords: AWGN channel, belief propagation, joint source channel coding, LDPC codes

Procedia PDF Downloads 357
31 End-to-End Spanish-English Sequence Learning Translation Model

Authors: Vidhu Mitha Goutham, Ruma Mukherjee

Abstract:

The low availability of well-trained, unlimited, dynamic-access models for specific languages makes it hard for corporate users to adopt quick translation techniques and incorporate them into product solutions. As translation tasks increasingly require a dynamic sequence learning curve; stable, cost-free opensource models are scarce. We survey and compare current translation techniques and propose a modified sequence to sequence model repurposed with attention techniques. Sequence learning using an encoder-decoder model is now paving the path for higher precision levels in translation. Using a Convolutional Neural Network (CNN) encoder and a Recurrent Neural Network (RNN) decoder background, we use Fairseq tools to produce an end-to-end bilingually trained Spanish-English machine translation model including source language detection. We acquire competitive results using a duo-lingo-corpus trained model to provide for prospective, ready-made plug-in use for compound sentences and document translations. Our model serves a decent system for large, organizational data translation needs. While acknowledging its shortcomings and future scope, it also identifies itself as a well-optimized deep neural network model and solution.

Keywords: attention, encoder-decoder, Fairseq, Seq2Seq, Spanish, translation

Procedia PDF Downloads 175
30 Deep-Learning to Generation of Weights for Image Captioning Using Part-of-Speech Approach

Authors: Tiago do Carmo Nogueira, Cássio Dener Noronha Vinhal, Gélson da Cruz Júnior, Matheus Rudolfo Diedrich Ullmann

Abstract:

Generating automatic image descriptions through natural language is a challenging task. Image captioning is a task that consistently describes an image by combining computer vision and natural language processing techniques. To accomplish this task, cutting-edge models use encoder-decoder structures. Thus, Convolutional Neural Networks (CNN) are used to extract the characteristics of the images, and Recurrent Neural Networks (RNN) generate the descriptive sentences of the images. However, cutting-edge approaches still suffer from problems of generating incorrect captions and accumulating errors in the decoders. To solve this problem, we propose a model based on the encoder-decoder structure, introducing a module that generates the weights according to the importance of the word to form the sentence, using the part-of-speech (PoS). Thus, the results demonstrate that our model surpasses state-of-the-art models.

Keywords: gated recurrent units, caption generation, convolutional neural network, part-of-speech

Procedia PDF Downloads 102
29 Towards Long-Range Pixels Connection for Context-Aware Semantic Segmentation

Authors: Muhammad Zubair Khan, Yugyung Lee

Abstract:

Deep learning has recently achieved enormous response in semantic image segmentation. The previously developed U-Net inspired architectures operate with continuous stride and pooling operations, leading to spatial data loss. Also, the methods lack establishing long-term pixels connection to preserve context knowledge and reduce spatial loss in prediction. This article developed encoder-decoder architecture with bi-directional LSTM embedded in long skip-connections and densely connected convolution blocks. The network non-linearly combines the feature maps across encoder-decoder paths for finding dependency and correlation between image pixels. Additionally, the densely connected convolutional blocks are kept in the final encoding layer to reuse features and prevent redundant data sharing. The method applied batch-normalization for reducing internal covariate shift in data distributions. The empirical evidence shows a promising response to our method compared with other semantic segmentation techniques.

Keywords: deep learning, semantic segmentation, image analysis, pixels connection, convolution neural network

Procedia PDF Downloads 102
28 Low Light Image Enhancement with Multi-Stage Interconnected Autoencoders Integration in Pix to Pix GAN

Authors: Muhammad Atif, Cang Yan

Abstract:

The enhancement of low-light images is a significant area of study aimed at enhancing the quality of captured images in challenging lighting environments. Recently, methods based on convolutional neural networks (CNN) have gained prominence as they offer state-of-the-art performance. However, many approaches based on CNN rely on increasing the size and complexity of the neural network. In this study, we propose an alternative method for improving low-light images using an autoencoder-based multiscale knowledge transfer model. Our method leverages the power of three autoencoders, where the encoders of the first two autoencoders are directly connected to the decoder of the third autoencoder. Additionally, the decoder of the first two autoencoders is connected to the encoder of the third autoencoder. This architecture enables effective knowledge transfer, allowing the third autoencoder to learn and benefit from the enhanced knowledge extracted by the first two autoencoders. We further integrate the proposed model into the PIX to PIX GAN framework. By integrating our proposed model as the generator in the GAN framework, we aim to produce enhanced images that not only exhibit improved visual quality but also possess a more authentic and realistic appearance. These experimental results, both qualitative and quantitative, show that our method is better than the state-of-the-art methodologies.

Keywords: low light image enhancement, deep learning, convolutional neural network, image processing

Procedia PDF Downloads 80
27 Design of SAE J2716 Single Edge Nibble Transmission Digital Sensor Interface for Automotive Applications

Authors: Jongbae Lee, Seongsoo Lee

Abstract:

Modern sensors often embed small-size digital controller for sensor control, value calibration, and signal processing. These sensors require digital data communication with host microprocessors, but conventional digital communication protocols are too heavy for price reduction. SAE J2716 SENT (single edge nibble transmission) protocol transmits direct digital waveforms instead of complicated analog modulated signals. In this paper, a SENT interface is designed in Verilog HDL (hardware description language) and implemented in FPGA (field-programmable gate array) evaluation board. The designed SENT interface consists of frame encoder/decoder, configuration register, tick period generator, CRC (cyclic redundancy code) generator/checker, and TX/RX (transmission/reception) buffer. Frame encoder/decoder is implemented as a finite state machine, and it controls whole SENT interface. Configuration register contains various parameters such as operation mode, tick length, CRC option, pause pulse option, and number of nibble data. Tick period generator generates tick signals from input clock. CRC generator/checker generates or checks CRC in the SENT data frame. TX/RX buffer stores transmission/received data. The designed SENT interface can send or receives digital data in 25~65 kbps at 3 us tick. Synthesized in 0.18 um fabrication technologies, it is implemented about 2,500 gates.

Keywords: digital sensor interface, SAE J2716, SENT, verilog HDL

Procedia PDF Downloads 300
26 Memory Based Reinforcement Learning with Transformers for Long Horizon Timescales and Continuous Action Spaces

Authors: Shweta Singh, Sudaman Katti

Abstract:

The most well-known sequence models make use of complex recurrent neural networks in an encoder-decoder configuration. The model used in this research makes use of a transformer, which is based purely on a self-attention mechanism, without relying on recurrence at all. More specifically, encoders and decoders which make use of self-attention and operate based on a memory, are used. In this research work, results for various 3D visual and non-visual reinforcement learning tasks designed in Unity software were obtained. Convolutional neural networks, more specifically, nature CNN architecture, are used for input processing in visual tasks, and comparison with standard long short-term memory (LSTM) architecture is performed for both visual tasks based on CNNs and non-visual tasks based on coordinate inputs. This research work combines the transformer architecture with the proximal policy optimization technique used popularly in reinforcement learning for stability and better policy updates while training, especially for continuous action spaces, which are used in this research work. Certain tasks in this paper are long horizon tasks that carry on for a longer duration and require extensive use of memory-based functionalities like storage of experiences and choosing appropriate actions based on recall. The transformer, which makes use of memory and self-attention mechanism in an encoder-decoder configuration proved to have better performance when compared to LSTM in terms of exploration and rewards achieved. Such memory based architectures can be used extensively in the field of cognitive robotics and reinforcement learning.

Keywords: convolutional neural networks, reinforcement learning, self-attention, transformers, unity

Procedia PDF Downloads 136
25 A CMOS Capacitor Array for ESPAR with Fast Switching Time

Authors: Jin-Sup Kim, Se-Hwan Choi, Jae-Young Lee

Abstract:

A 8-bit CMOS capacitor array is designed for using in electrically steerable passive array radiator (ESPAR). The proposed capacitor array shows the fast response time in rising and falling characteristics. Compared to other works in silicon-on-insulator (SOI) or silicon-on-sapphire (SOS) technologies, it shows a comparable tuning range and switching time with low power consumption. Using the 0.18um CMOS, the capacitor array features a tuning range of 1.5 to 12.9 pF at 2.4GHz. Including the 2X4 decoder for control interface, the Chip size is 350um X 145um. Current consumption is about 80 nA at 1.8 V operation.

Keywords: CMOS capacitor array, ESPAR, SOI, SOS, switching time

Procedia PDF Downloads 589
24 Defect Detection for Nanofibrous Images with Deep Learning-Based Approaches

Authors: Gaokai Liu

Abstract:

Automatic defect detection for nanomaterial images is widely required in industrial scenarios. Deep learning approaches are considered as the most effective solutions for the great majority of image-based tasks. In this paper, an edge guidance network for defect segmentation is proposed. First, the encoder path with multiple convolution and downsampling operations is applied to the acquisition of shared features. Then two decoder paths both are connected to the last convolution layer of the encoder and supervised by the edge and segmentation labels, respectively, to guide the whole training process. Meanwhile, the edge and encoder outputs from the same stage are concatenated to the segmentation corresponding part to further tune the segmentation result. Finally, the effectiveness of the proposed method is verified via the experiments on open nanofibrous datasets.

Keywords: deep learning, defect detection, image segmentation, nanomaterials

Procedia PDF Downloads 149
23 Optical Multicast over OBS Networks: An Approach Based on Code-Words and Tunable Decoders

Authors: Maha Sliti, Walid Abdallah, Noureddine Boudriga

Abstract:

In the frame of this work, we present an optical multicasting approach based on optical code-words. Our approach associates, in the edge node, an optical code-word to a group multicast address. In the core node, a set of tunable decoders are used to send a traffic data to multiple destinations based on the received code-word. The use of code-words, which correspond to the combination of an input port and a set of output ports, allows the implementation of an optical switching matrix. At the reception of a burst, it will be delayed in an optical memory. And, the received optical code-word is split to a set of tunable optical decoders. When it matches a configured code-word, the delayed burst is switched to a set of output ports.

Keywords: optical multicast, optical burst switching networks, optical code-words, tunable decoder, virtual optical memory

Procedia PDF Downloads 607
22 Reducing Power Consumption in Network on Chip Using Scramble Techniques

Authors: Vinayaga Jagadessh Raja, R. Ganesan, S. Ramesh Kumar

Abstract:

An ever more significant fraction of the overall power dissipation of a network-on-chip (NoC) based system on- chip (SoC) is due to the interconnection scheme. In information, as equipment shrinks, the power contributes of NoC links starts to compete with that of NoC routers. In this paper, we propose the use of clock gating in the data encoding techniques as a viable way to reduce both power dissipation and time consumption of NoC links. The projected scramble scheme exploits the wormhole switching techniques. That is, flits are scramble by the network interface (NI) before they are injected in the network and are decoded by the target NI. This makes the scheme transparent to the underlying network since the encoder and decoder logic is integrated in the NI and no modification of the routers structural design is required. We review the projected scramble scheme on a set of representative data streams (both synthetic and extracted from real applications) showing that it is possible to reduce the power contribution of both the self-switching activity and the coupling switching activity in inter-routers links.

Keywords: Xilinx 12.1, power consumption, Encoder, NOC

Procedia PDF Downloads 400
21 Drug-Drug Interaction Prediction in Diabetes Mellitus

Authors: Rashini Maduka, C. R. Wijesinghe, A. R. Weerasinghe

Abstract:

Drug-drug interactions (DDIs) can happen when two or more drugs are taken together. Today DDIs have become a serious health issue due to adverse drug effects. In vivo and in vitro methods for identifying DDIs are time-consuming and costly. Therefore, in-silico-based approaches are preferred in DDI identification. Most machine learning models for DDI prediction are used chemical and biological drug properties as features. However, some drug features are not available and costly to extract. Therefore, it is better to make automatic feature engineering. Furthermore, people who have diabetes already suffer from other diseases and take more than one medicine together. Then adverse drug effects may happen to diabetic patients and cause unpleasant reactions in the body. In this study, we present a model with a graph convolutional autoencoder and a graph decoder using a dataset from DrugBank version 5.1.3. The main objective of the model is to identify unknown interactions between antidiabetic drugs and the drugs taken by diabetic patients for other diseases. We considered automatic feature engineering and used Known DDIs only as the input for the model. Our model has achieved 0.86 in AUC and 0.86 in AP.

Keywords: drug-drug interaction prediction, graph embedding, graph convolutional networks, adverse drug effects

Procedia PDF Downloads 100
20 Adversarial Disentanglement Using Latent Classifier for Pose-Independent Representation

Authors: Hamed Alqahtani, Manolya Kavakli-Thorne

Abstract:

The large pose discrepancy is one of the critical challenges in face recognition during video surveillance. Due to the entanglement of pose attributes with identity information, the conventional approaches for pose-independent representation lack in providing quality results in recognizing largely posed faces. In this paper, we propose a practical approach to disentangle the pose attribute from the identity information followed by synthesis of a face using a classifier network in latent space. The proposed approach employs a modified generative adversarial network framework consisting of an encoder-decoder structure embedded with a classifier in manifold space for carrying out factorization on the latent encoding. It can be further generalized to other face and non-face attributes for real-life video frames containing faces with significant attribute variations. Experimental results and comparison with state of the art in the field prove that the learned representation of the proposed approach synthesizes more compelling perceptual images through a combination of adversarial and classification losses.

Keywords: disentanglement, face detection, generative adversarial networks, video surveillance

Procedia PDF Downloads 129
19 Multimodal Direct Neural Network Positron Emission Tomography Reconstruction

Authors: William Whiteley, Jens Gregor

Abstract:

In recent developments of direct neural network based positron emission tomography (PET) reconstruction, two prominent architectures have emerged for converting measurement data into images: 1) networks that contain fully-connected layers; and 2) networks that primarily use a convolutional encoder-decoder architecture. In this paper, we present a multi-modal direct PET reconstruction method called MDPET, which is a hybrid approach that combines the advantages of both types of networks. MDPET processes raw data in the form of sinograms and histo-images in concert with attenuation maps to produce high quality multi-slice PET images (e.g., 8x440x440). MDPET is trained on a large whole-body patient data set and evaluated both quantitatively and qualitatively against target images reconstructed with the standard PET reconstruction benchmark of iterative ordered subsets expectation maximization. The results show that MDPET outperforms the best previously published direct neural network methods in measures of bias, signal-to-noise ratio, mean absolute error, and structural similarity.

Keywords: deep learning, image reconstruction, machine learning, neural network, positron emission tomography

Procedia PDF Downloads 110
18 Randomness in Cybertext: A Study on Computer-Generated Poetry from the Perspective of Semiotics

Authors: Hongliang Zhang

Abstract:

The use of chance procedures and randomizers in poetry-writing can be traced back to surrealist works, which, by appealing to Sigmund Freud's theories, were still logocentrism. In the 1960s, random permutation and combination were extensively used by the Oulipo, John Cage and Jackson Mac Low, which further deconstructed the metaphysical presence of writing. Today, the randomly-generated digital poetry has emerged as a genre of cybertext which should be co-authored by readers. At the same time, the classical theories have now been updated by cybernetics and media theories. N· Katherine Hayles put forward the concept of ‘the floating signifiers’ by Jacques Lacan to be the ‘the flickering signifiers’ , arguing that the technology per se has become a part of the textual production. This paper makes a historical review of the computer-generated poetry in the perspective of semiotics, emphasizing that the randomly-generated digital poetry which hands over the dual tasks of both interpretation and writing to the readers demonstrates the intervention of media technology in literature. With the participation of computerized algorithm and programming languages, poems randomly generated by computers have not only blurred the boundary between encoder and decoder, but also raises the issue of human-machine. It is also a significant feature of the cybertext that the productive process of the text is full of randomness.

Keywords: cybertext, digital poetry, poetry generator, semiotics

Procedia PDF Downloads 175
17 Sea-Land Segmentation Method Based on the Transformer with Enhanced Edge Supervision

Authors: Lianzhong Zhang, Chao Huang

Abstract:

Sea-land segmentation is a basic step in many tasks such as sea surface monitoring and ship detection. The existing sea-land segmentation algorithms have poor segmentation accuracy, and the parameter adjustments are cumbersome and difficult to meet actual needs. Also, the current sea-land segmentation adopts traditional deep learning models that use Convolutional Neural Networks (CNN). At present, the transformer architecture has achieved great success in the field of natural images, but its application in the field of radar images is less studied. Therefore, this paper proposes a sea-land segmentation method based on the transformer architecture to strengthen edge supervision. It uses a self-attention mechanism with a gating strategy to better learn relative position bias. Meanwhile, an additional edge supervision branch is introduced. The decoder stage allows the feature information of the two branches to interact, thereby improving the edge precision of the sea-land segmentation. Based on the Gaofen-3 satellite image dataset, the experimental results show that the method proposed in this paper can effectively improve the accuracy of sea-land segmentation, especially the accuracy of sea-land edges. The mean IoU (Intersection over Union), edge precision, overall precision, and F1 scores respectively reach 96.36%, 84.54%, 99.74%, and 98.05%, which are superior to those of the mainstream segmentation models and have high practical application values.

Keywords: SAR, sea-land segmentation, deep learning, transformer

Procedia PDF Downloads 181
16 Maximum-likelihood Inference of Multi-Finger Movements Using Neural Activities

Authors: Kyung-Jin You, Kiwon Rhee, Marc H. Schieber, Nitish V. Thakor, Hyun-Chool Shin

Abstract:

It remains unknown whether M1 neurons encode multi-finger movements independently or as a certain neural network of single finger movements although multi-finger movements are physically a combination of single finger movements. We present an evidence of correlation between single and multi-finger movements and also attempt a challenging task of semi-blind decoding of neural data with minimum training of the neural decoder. Data were collected from 115 task-related neurons in M1 of a trained rhesus monkey performing flexion and extension of each finger and the wrist (12 single and 6 two-finger-movements). By exploiting correlation of temporal firing pattern between movements, we found that correlation coefficient for physically related movements pairs is greater than others; neurons tuned to single finger movements increased their firing rate when multi-finger commands were instructed. According to this knowledge, neural semi-blind decoding is done by choosing the greatest and the second greatest likelihood for canonical candidates. We achieved a decoding accuracy about 60% for multiple finger movement without corresponding training data set. this results suggest that only with the neural activities on single finger movements can be exploited to control dexterous multi-fingered neuroprosthetics.

Keywords: finger movement, neural activity, blind decoding, M1

Procedia PDF Downloads 320
15 An Approach of Node Model TCnNet: Trellis Coded Nanonetworks on Graphene Composite Substrate

Authors: Diogo Ferreira Lima Filho, José Roberto Amazonas

Abstract:

Nanotechnology opens the door to new paradigms that introduces a variety of novel tools enabling a plethora of potential applications in the biomedical, industrial, environmental, and military fields. This work proposes an integrated node model by applying the same concepts of TCNet to networks of nanodevices where the nodes are cooperatively interconnected with a low-complexity Mealy Machine (MM) topology integrating in the same electronic system the modules necessary for independent operation in wireless sensor networks (WSNs), consisting of Rectennas (RF to DC power converters), Code Generators based on Finite State Machine (FSM) & Trellis Decoder and On-chip Transmit/Receive with autonomy in terms of energy sources applying the Energy Harvesting technique. This approach considers the use of a Graphene Composite Substrate (GCS) for the integrated electronic circuits meeting the following characteristics: mechanical flexibility, miniaturization, and optical transparency, besides being ecological. In addition, graphene consists of a layer of carbon atoms with the configuration of a honeycomb crystal lattice, which has attracted the attention of the scientific community due to its unique Electrical Characteristics.

Keywords: composite substrate, energy harvesting, finite state machine, graphene, nanotechnology, rectennas, wireless sensor networks

Procedia PDF Downloads 105
14 DCDNet: Lightweight Document Corner Detection Network Based on Attention Mechanism

Authors: Kun Xu, Yuan Xu, Jia Qiao

Abstract:

The document detection plays an important role in optical character recognition and text analysis. Because the traditional detection methods have weak generalization ability, and deep neural network has complex structure and large number of parameters, which cannot be well applied in mobile devices, this paper proposes a lightweight Document Corner Detection Network (DCDNet). DCDNet is a two-stage architecture. The first stage with Encoder-Decoder structure adopts depthwise separable convolution to greatly reduce the network parameters. After introducing the Feature Attention Union (FAU) module, the second stage enhances the feature information of spatial and channel dim and adaptively adjusts the size of receptive field to enhance the feature expression ability of the model. Aiming at solving the problem of the large difference in the number of pixel distribution between corner and non-corner, Weighted Binary Cross Entropy Loss (WBCE Loss) is proposed to define corner detection problem as a classification problem to make the training process more efficient. In order to make up for the lack of Dataset of document corner detection, a Dataset containing 6620 images named Document Corner Detection Dataset (DCDD) is made. Experimental results show that the proposed method can obtain fast, stable and accurate detection results on DCDD.

Keywords: document detection, corner detection, attention mechanism, lightweight

Procedia PDF Downloads 354
13 Deep Vision: A Robust Dominant Colour Extraction Framework for T-Shirts Based on Semantic Segmentation

Authors: Kishore Kumar R., Kaustav Sengupta, Shalini Sood Sehgal, Poornima Santhanam

Abstract:

Fashion is a human expression that is constantly changing. One of the prime factors that consistently influences fashion is the change in colour preferences. The role of colour in our everyday lives is very significant. It subconsciously explains a lot about one’s mindset and mood. Analyzing the colours by extracting them from the outfit images is a critical study to examine the individual’s/consumer behaviour. Several research works have been carried out on extracting colours from images, but to the best of our knowledge, there were no studies that extract colours to specific apparel and identify colour patterns geographically. This paper proposes a framework for accurately extracting colours from T-shirt images and predicting dominant colours geographically. The proposed method consists of two stages: first, a U-Net deep learning model is adopted to segment the T-shirts from the images. Second, the colours are extracted only from the T-shirt segments. The proposed method employs the iMaterialist (Fashion) 2019 dataset for the semantic segmentation task. The proposed framework also includes a mechanism for gathering data and analyzing India’s general colour preferences. From this research, it was observed that black and grey are the dominant colour in different regions of India. The proposed method can be adapted to study fashion’s evolving colour preferences.

Keywords: colour analysis in t-shirts, convolutional neural network, encoder-decoder, k-means clustering, semantic segmentation, U-Net model

Procedia PDF Downloads 111
12 Linear Decoding Applied to V5/MT Neuronal Activity on Past Trials Predicts Current Sensory Choices

Authors: Ben Hadj Hassen Sameh, Gaillard Corentin, Andrew Parker, Kristine Krug

Abstract:

Perceptual decisions about sequences of sensory stimuli often show serial dependence. The behavioural choice on one trial is often affected by the choice on previous trials. We investigated whether the neuronal signals in extrastriate visual area V5/MT on preceding trials might influence choice on the current trial and thereby reveal the neuronal mechanisms of sequential choice effects. We analysed data from 30 single neurons recorded from V5/MT in three Rhesus monkeys making sequential choices about the direction of rotation of a three-dimensional cylinder. We focused exclusively on the responses of neurons that showed significant choice-related firing (mean choice probability =0.73) while the monkey viewed perceptually ambiguous stimuli. Application of a wavelet transform to the choice-related firing revealed differences in the frequency band of neuronal activity that depended on whether the previous trial resulted in a correct choice for an unambiguous stimulus that was in the neuron’s preferred direction (low alpha and high beta and gamma) or non-preferred direction (high alpha and low beta and gamma). To probe this in further detail, we applied a regularized linear decoder to predict the choice for an ambiguous trial by referencing the neuronal activity of the preceding unambiguous trial. Neuronal activity on a previous trial provided a significant prediction of the current choice (61% correc, 95%Cl~52%t), even when limiting analysis to preceding trials that were correct and rewarded. These findings provide a potential neuronal signature of sequential choice effects in the primate visual cortex.

Keywords: perception, decision making, attention, decoding, visual system

Procedia PDF Downloads 139
11 Variational Explanation Generator: Generating Explanation for Natural Language Inference Using Variational Auto-Encoder

Authors: Zhen Cheng, Xinyu Dai, Shujian Huang, Jiajun Chen

Abstract:

Recently, explanatory natural language inference has attracted much attention for the interpretability of logic relationship prediction, which is also known as explanation generation for Natural Language Inference (NLI). Existing explanation generators based on discriminative Encoder-Decoder architecture have achieved noticeable results. However, we find that these discriminative generators usually generate explanations with correct evidence but incorrect logic semantic. It is due to that logic information is implicitly encoded in the premise-hypothesis pairs and difficult to model. Actually, logic information identically exists between premise-hypothesis pair and explanation. And it is easy to extract logic information that is explicitly contained in the target explanation. Hence we assume that there exists a latent space of logic information while generating explanations. Specifically, we propose a generative model called Variational Explanation Generator (VariationalEG) with a latent variable to model this space. Training with the guide of explicit logic information in target explanations, latent variable in VariationalEG could capture the implicit logic information in premise-hypothesis pairs effectively. Additionally, to tackle the problem of posterior collapse while training VariaztionalEG, we propose a simple yet effective approach called Logic Supervision on the latent variable to force it to encode logic information. Experiments on explanation generation benchmark—explanation-Stanford Natural Language Inference (e-SNLI) demonstrate that the proposed VariationalEG achieves significant improvement compared to previous studies and yields a state-of-the-art result. Furthermore, we perform the analysis of generated explanations to demonstrate the effect of the latent variable.

Keywords: natural language inference, explanation generation, variational auto-encoder, generative model

Procedia PDF Downloads 151
10 Selection of Optimal Reduced Feature Sets of Brain Signal Analysis Using Heuristically Optimized Deep Autoencoder

Authors: Souvik Phadikar, Nidul Sinha, Rajdeep Ghosh

Abstract:

In brainwaves research using electroencephalogram (EEG) signals, finding the most relevant and effective feature set for identification of activities in the human brain is a big challenge till today because of the random nature of the signals. The feature extraction method is a key issue to solve this problem. Finding those features that prove to give distinctive pictures for different activities and similar for the same activities is very difficult, especially for the number of activities. The performance of a classifier accuracy depends on this quality of feature set. Further, more number of features result in high computational complexity and less number of features compromise with the lower performance. In this paper, a novel idea of the selection of optimal feature set using a heuristically optimized deep autoencoder is presented. Using various feature extraction methods, a vast number of features are extracted from the EEG signals and fed to the autoencoder deep neural network. The autoencoder encodes the input features into a small set of codes. To avoid the gradient vanish problem and normalization of the dataset, a meta-heuristic search algorithm is used to minimize the mean square error (MSE) between encoder input and decoder output. To reduce the feature set into a smaller one, 4 hidden layers are considered in the autoencoder network; hence it is called Heuristically Optimized Deep Autoencoder (HO-DAE). In this method, no features are rejected; all the features are combined into the response of responses of the hidden layer. The results reveal that higher accuracy can be achieved using optimal reduced features. The proposed HO-DAE is also compared with the regular autoencoder to test the performance of both. The performance of the proposed method is validated and compared with the other two methods recently reported in the literature, which reveals that the proposed method is far better than the other two methods in terms of classification accuracy.

Keywords: autoencoder, brainwave signal analysis, electroencephalogram, feature extraction, feature selection, optimization

Procedia PDF Downloads 114
9 Evaluating Generative Neural Attention Weights-Based Chatbot on Customer Support Twitter Dataset

Authors: Sinarwati Mohamad Suhaili, Naomie Salim, Mohamad Nazim Jambli

Abstract:

Sequence-to-sequence (seq2seq) models augmented with attention mechanisms are playing an increasingly important role in automated customer service. These models, which are able to recognize complex relationships between input and output sequences, are crucial for optimizing chatbot responses. Central to these mechanisms are neural attention weights that determine the focus of the model during sequence generation. Despite their widespread use, there remains a gap in the comparative analysis of different attention weighting functions within seq2seq models, particularly in the domain of chatbots using the Customer Support Twitter (CST) dataset. This study addresses this gap by evaluating four distinct attention-scoring functions—dot, multiplicative/general, additive, and an extended multiplicative function with a tanh activation parameter — in neural generative seq2seq models. Utilizing the CST dataset, these models were trained and evaluated over 10 epochs with the AdamW optimizer. Evaluation criteria included validation loss and BLEU scores implemented under both greedy and beam search strategies with a beam size of k=3. Results indicate that the model with the tanh-augmented multiplicative function significantly outperforms its counterparts, achieving the lowest validation loss (1.136484) and the highest BLEU scores (0.438926 under greedy search, 0.443000 under beam search, k=3). These results emphasize the crucial influence of selecting an appropriate attention-scoring function in improving the performance of seq2seq models for chatbots. Particularly, the model that integrates tanh activation proves to be a promising approach to improve the quality of chatbots in the customer support context.

Keywords: attention weight, chatbot, encoder-decoder, neural generative attention, score function, sequence-to-sequence

Procedia PDF Downloads 78
8 Analysis and Identification of Trends in Electric Vehicle Crash Data

Authors: Cody Stolle, Mojdeh Asadollahipajouh, Khaleb Pafford, Jada Iwuoha, Samantha White, Becky Mueller

Abstract:

Battery-electric vehicles (BEVs) are growing in sales and popularity in the United States as an alternative to traditional internal combustion engine vehicles (ICEVs). BEVs are generally heavier than corresponding models of ICEVs, with large battery packs located beneath the vehicle floorpan, a “skateboard” chassis, and have front and rear crush space available in the trunk and “frunk” or front trunk. The geometrical and frame differences between the vehicles may lead to incompatibilities with gasoline vehicles during vehicle-to-vehicle crashes as well as run-off-road crashes with roadside barriers, which were designed to handle lighter ICEVs with higher centers-of-mass and with dedicated structural chasses. Crash data were collected from 10 states spanning a five-year period between 2017 and 2021. Vehicle Identification Number (VIN) codes were processed with the National Highway Traffic Safety Administration (NHTSA) VIN decoder to extract BEV models from ICEV models. Crashes were filtered to isolate only vehicles produced between 2010 and 2021, and the crash circumstances (weather, time of day, maximum injury) were compared between BEVs and ICEVs. In Washington, 436,613 crashes were identified, which satisfied the selection criteria, and 3,371 of these crashes (0.77%) involved a BEV. The number of crashes which noted a fire were comparable between BEVs and ICEVs of similar model years (0.3% and 0.33%, respectively), and no differences were discernable for the time of day, weather conditions, road geometry, or other prevailing factors (e.g., run-off-road). However, crashes involving BEVs rose rapidly; 31% of all BEV crashes occurred in just 2021. Results indicate that BEVs are performing comparably to ICEVs, and events surrounding BEV crashes are statistically indistinguishable from ICEV crashes.

Keywords: battery-electric vehicles, transportation safety, infrastructure crashworthiness, run-off-road crashes, ev crash data analysis

Procedia PDF Downloads 89
7 Performance of High Efficiency Video Codec over Wireless Channels

Authors: Mohd Ayyub Khan, Nadeem Akhtar

Abstract:

Due to recent advances in wireless communication technologies and hand-held devices, there is a huge demand for video-based applications such as video surveillance, video conferencing, remote surgery, Digital Video Broadcast (DVB), IPTV, online learning courses, YouTube, WhatsApp, Instagram, Facebook, Interactive Video Games. However, the raw videos posses very high bandwidth which makes the compression a must before its transmission over the wireless channels. The High Efficiency Video Codec (HEVC) (also called H.265) is latest state-of-the-art video coding standard developed by the Joint effort of ITU-T and ISO/IEC teams. HEVC is targeted for high resolution videos such as 4K or 8K resolutions that can fulfil the recent demands for video services. The compression ratio achieved by the HEVC is twice as compared to its predecessor H.264/AVC for same quality level. The compression efficiency is generally increased by removing more correlation between the frames/pixels using complex techniques such as extensive intra and inter prediction techniques. As more correlation is removed, the chances of interdependency among coded bits increases. Thus, bit errors may have large effect on the reconstructed video. Sometimes even single bit error can lead to catastrophic failure of the reconstructed video. In this paper, we study the performance of HEVC bitstream over additive white Gaussian noise (AWGN) channel. Moreover, HEVC over Quadrature Amplitude Modulation (QAM) combined with forward error correction (FEC) schemes are also explored over the noisy channel. The video will be encoded using HEVC, and the coded bitstream is channel coded to provide some redundancies. The channel coded bitstream is then modulated using QAM and transmitted over AWGN channel. At the receiver, the symbols are demodulated and channel decoded to obtain the video bitstream. The bitstream is then used to reconstruct the video using HEVC decoder. It is observed that as the signal to noise ratio of channel is decreased the quality of the reconstructed video decreases drastically. Using proper FEC codes, the quality of the video can be restored up to certain extent. Thus, the performance analysis of HEVC presented in this paper may assist in designing the optimized code rate of FEC such that the quality of the reconstructed video is maximized over wireless channels.

Keywords: AWGN, forward error correction, HEVC, video coding, QAM

Procedia PDF Downloads 149