Search results for: Encoder
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 74

44 Unsupervised Domain Adaptive Text Retrieval with Query Generation

Authors: Rui Yin, Haojie Wang, Xun Li

Abstract:

Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to only a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and then the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. Experiments show that our approach is more robust than previous methods in target domains that require less unlabeled data.

Keywords: dense retrieval, query generation, unsupervised training, text retrieval

Procedia PDF Downloads 38
43 Performance Comparison of Space-Time Block and Trellis Codes under Rayleigh Channels

Authors: Jing Qingfeng, Wu Jiajia

Abstract:

Due to crowded orbits and the shortage of frequency resources, the use of MIMO technology to improve spectrum efficiency and increase capacity has become a necessary trend in broadband satellite communication. We analyze the main influencing factors and compare the BER performance of the space-time block code (STBC) scheme and the space-time trellis code (STTC) scheme. This paper focuses on the bit error rate (BER) performance of STTC and STBC under the Rayleigh channel. The main emphasis is placed on the effects of factors such as terminal environment and elevation angle on the BER performance of the STBC and STTC schemes. Simulation results indicate that the performance of STTC under the Rayleigh channel improves markedly as the number of transmitting and receiving antennas increases, but the encoder state has little impact on performance. Under the Rayleigh channel, the Alamouti code performs better than STTC.

Keywords: MIMO, space time block code (STBC), space time trellis code (STTC), Rayleigh channel

Procedia PDF Downloads 320
42 Scientific Recommender Systems Based on Neural Topic Model

Authors: Smail Boussaadi, Hassina Aliane

Abstract:

With the rapid growth of scientific literature, it is becoming increasingly challenging for researchers to keep up with the latest findings in their fields. Academic professional networks play an essential role in connecting researchers and disseminating knowledge. To improve the user experience within these networks, we need effective article recommendation systems that provide personalized content. Current recommendation systems often rely on collaborative filtering or content-based techniques. However, these methods have limitations, such as the cold start problem and difficulty in capturing semantic relationships between articles. To overcome these challenges, we propose a new approach that combines BERTopic, a state-of-the-art topic modeling technique built on BERT (Bidirectional Encoder Representations from Transformers) embeddings, with community detection algorithms in an academic professional network. Experiments confirm our performance expectations by showing good relevance and objectivity in the results.

Keywords: scientific articles, community detection, academic social network, recommender systems, neural topic model

Procedia PDF Downloads 60
41 Large-Scale Electroencephalogram Biometrics through Contrastive Learning

Authors: Mostafa ‘Neo’ Mohsenvand, Mohammad Rasool Izadi, Pattie Maes

Abstract:

EEG-based biometrics (user identification) has been explored on small datasets of no more than 157 subjects. Here we show that the accuracy of modern supervised methods falls rapidly as the number of users increases to a few thousand. Moreover, supervised methods require a large amount of labeled data for training which limits their applications in real-world scenarios where acquiring data for training should not take more than a few minutes. We show that using contrastive learning for pre-training, it is possible to maintain high accuracy on a dataset of 2130 subjects while only using a fraction of labels. We compare 5 different self-supervised tasks for pre-training of the encoder where our proposed method achieves the accuracy of 96.4%, improving the baseline supervised models by 22.75% and the competing self-supervised model by 3.93%. We also study the effects of the length of the signal and the number of channels on the accuracy of the user-identification models. Our results reveal that signals from temporal and frontal channels contain more identifying features compared to other channels.

Keywords: brainprint, contrastive learning, electroencephalogram, self-supervised learning, user identification

Procedia PDF Downloads 131
40 Bit Error Rate Performance of MIMO Systems for Wireless Communications

Authors: E. Ghayoula, M. Haj Taieb, A. Bouallegue, J. Y. Chouinard, R. Ghayoula

Abstract:

This paper evaluates the bit error rate (BER) performance of MIMO systems for wireless communication. MIMO uses multiple transmitting antennas, multiple receiving antennas, and space-time block codes to provide diversity. A MIMO system transmits signals encoded by a space-time block code (STBC) encoder through different transmitting antennas. These signals arrive at the receiver at slightly different times. Spatially separated multiple receiving antennas are employed to provide diversity reception and combat the effect of fading in the channel. This paper presents a detailed study of diversity coding for MIMO systems. STBC techniques are implemented, and simulation results for BER performance with varying numbers of MIMO transmitting and receiving antennas are presented. Our results show how increasing the number of both transmit and receive antennas improves system performance and reduces the bit error rate.
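To make the STBC encoding step concrete, the following is a minimal sketch of the Alamouti scheme (two transmit antennas, one receive antenna), which appears in the keywords of this entry. It is not the authors' simulation code: the function names, the noiseless default, and the assumption that the channel gains `h1` and `h2` stay constant over both time slots are all illustrative choices.

```python
def alamouti_encode(s1, s2):
    """Map two symbols onto two time slots.

    Each slot is a pair (symbol on antenna 1, symbol on antenna 2).
    """
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]


def receive(slots, h1, h2, noise=(0j, 0j)):
    """Single receive antenna; h1 and h2 are assumed constant over both slots."""
    return [h1 * a1 + h2 * a2 + n for (a1, a2), n in zip(slots, noise)]


def alamouti_combine(r1, r2, h1, h2):
    """Linear combining that separates s1 and s2 using known channel gains."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat


if __name__ == "__main__":
    # Hypothetical QPSK-like symbols and channel gains, chosen only for the demo.
    s1, s2 = complex(1, 1), complex(1, -1)
    h1, h2 = complex(0.8, -0.3), complex(-0.2, 0.6)
    r1, r2 = receive(alamouti_encode(s1, s2), h1, h2)
    print(alamouti_combine(r1, r2, h1, h2))
```

Without noise the combiner recovers both symbols up to rounding error. A BER study such as the one described would add Rayleigh-distributed gains and noise and count decision errors over many symbol pairs.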

Keywords: MIMO systems, diversity, BER, MRRC, SIMO, MISO, STBC, Alamouti, SNR

Procedia PDF Downloads 467
39 Transformers in Gene Expression-Based Classification

Authors: Babak Forouraghi

Abstract:

A genetic circuit is a collection of interacting genes and proteins that enable individual cells to implement and perform vital biological functions such as cell division, growth, death, and signaling. In cell engineering, synthetic gene circuits are engineered networks of genes specifically designed to implement functionalities that are not evolved by nature. These engineered networks enable scientists to tackle complex problems such as engineering cells to produce therapeutics within the patient's body, altering T cells to target cancer-related antigens for treatment, improving antibody production using engineered cells, tissue engineering, and production of genetically modified plants and livestock. Construction of computational models to realize genetic circuits is an especially challenging task since it requires the discovery of the flow of genetic information in complex biological systems. Building synthetic biological models is also a time-consuming process with relatively low prediction accuracy for highly complex genetic circuits. The primary goal of this study was to investigate the utility of a pre-trained bidirectional encoder transformer that can accurately predict gene expressions in genetic circuit designs. The main reason behind using transformers is their innate ability (attention mechanism) to take account of the semantic context present in long DNA chains that are heavily dependent on the spatial representation of their constituent genes. Previous approaches to gene circuit design, such as CNN and RNN architectures, are unable to capture semantic dependencies in long contexts, as required in most real-world applications of synthetic biology. For instance, RNN models (LSTM, GRU), although able to learn long-term dependencies, suffer greatly from vanishing gradient and low-efficiency problems when they sequentially process past states and compress contextual information into a bottleneck with long input sequences.
In other words, these architectures are not equipped with the necessary attention mechanisms to follow a long chain of genes with thousands of tokens. To address the above-mentioned limitations of previous approaches, a transformer model was built in this work as a variation of the existing DNA Bidirectional Encoder Representations from Transformers (DNABERT) model. It is shown that the proposed transformer is capable of capturing contextual information from long input sequences with an attention mechanism. In a previous work on genetic circuit design, the traditional approaches to classification and regression, such as Random Forest, Support Vector Machine, and Artificial Neural Networks, were able to achieve reasonably high R² accuracy levels of 0.95 to 0.97. However, the transformer model utilized in this work, with its attention-based mechanism, was able to achieve a perfect accuracy level of 100%. Further, it is demonstrated that the efficiency of the transformer-based gene expression classifier is not dependent on the presence of large amounts of training examples, which may be difficult to compile in many real-world gene circuit designs.

Keywords: transformers, generative AI, gene expression design, classification

Procedia PDF Downloads 29
38 On the Utility of Bidirectional Transformers in Gene Expression-Based Classification

Authors: Babak Forouraghi

Abstract:

A genetic circuit is a collection of interacting genes and proteins that enable individual cells to implement and perform vital biological functions such as cell division, growth, death, and signaling. In cell engineering, synthetic gene circuits are engineered networks of genes specifically designed to implement functionalities that are not evolved by nature. These engineered networks enable scientists to tackle complex problems such as engineering cells to produce therapeutics within the patient's body, altering T cells to target cancer-related antigens for treatment, improving antibody production using engineered cells, tissue engineering, and production of genetically modified plants and livestock. Construction of computational models to realize genetic circuits is an especially challenging task since it requires the discovery of the flow of genetic information in complex biological systems. Building synthetic biological models is also a time-consuming process with relatively low prediction accuracy for highly complex genetic circuits. The primary goal of this study was to investigate the utility of a pre-trained bidirectional encoder transformer that can accurately predict gene expressions in genetic circuit designs. The main reason behind using transformers is their innate ability (attention mechanism) to take account of the semantic context present in long DNA chains that are heavily dependent on the spatial representation of their constituent genes. Previous approaches to gene circuit design, such as CNN and RNN architectures, are unable to capture semantic dependencies in long contexts, as required in most real-world applications of synthetic biology. For instance, RNN models (LSTM, GRU), although able to learn long-term dependencies, suffer greatly from vanishing gradient and low-efficiency problems when they sequentially process past states and compress contextual information into a bottleneck with long input sequences.
In other words, these architectures are not equipped with the necessary attention mechanisms to follow a long chain of genes with thousands of tokens. To address the above-mentioned limitations, a transformer model was built in this work as a variation of the existing DNA Bidirectional Encoder Representations from Transformers (DNABERT) model. It is shown that the proposed transformer is capable of capturing contextual information from long input sequences with an attention mechanism. In previous works on genetic circuit design, the traditional approaches to classification and regression, such as Random Forest, Support Vector Machine, and Artificial Neural Networks, were able to achieve reasonably high R² accuracy levels of 0.95 to 0.97. However, the transformer model utilized in this work, with its attention-based mechanism, was able to achieve a perfect accuracy level of 100%. Further, it is demonstrated that the efficiency of the transformer-based gene expression classifier is not dependent on the presence of large amounts of training examples, which may be difficult to compile in many real-world gene circuit designs.

Keywords: machine learning, classification and regression, gene circuit design, bidirectional transformers

Procedia PDF Downloads 33
37 KCBA, A Method for Feature Extraction of Colonoscopy Images

Authors: Vahid Bayrami Rad

Abstract:

In recent years, the use of artificial intelligence techniques, tools, and methods in processing medical images and health-related applications has been highlighted, and a lot of research has been done in this regard. For example, colonoscopy and the diagnosis of colon lesions are cases in which the diagnostic process can be improved by using image processing and artificial intelligence algorithms, which help doctors considerably. Due to the lack of accurate measurements and the variety of lesions in colonoscopy images, diagnosing the type of lesion is difficult even for expert doctors. Therefore, by using different software and image processing, doctors can be helped to increase the accuracy of their observations and ultimately improve their diagnosis. Automatic methods can also improve the process of diagnosing the type of disease. In this paper, a deep learning framework called KCBA, composed of several methods such as K-means clustering, bag of features, and a deep auto-encoder, is proposed to classify colonoscopy lesions. Finally, the experimental results show the proposed method's performance in classifying colonoscopy images in terms of the accuracy criterion.

Keywords: colorectal cancer, colonoscopy, region of interest, narrow band imaging, texture analysis, bag of feature

Procedia PDF Downloads 23
36 Domain Adaptive Dense Retrieval with Query Generation

Authors: Rui Yin, Haojie Wang, Xun Li

Abstract:

Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to only a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and then, the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. We also explore contrastive learning as a method for training domain-adapted dense retrievers and show that it leads to strong performance in various retrieval settings. Experiments show that our approach is more robust than previous methods in target domains that require less unlabeled data.

Keywords: dense retrieval, query generation, contrastive learning, unsupervised training

Procedia PDF Downloads 61
35 Adversarial Disentanglement Using Latent Classifier for Pose-Independent Representation

Authors: Hamed Alqahtani, Manolya Kavakli-Thorne

Abstract:

Large pose discrepancy is one of the critical challenges in face recognition during video surveillance. Due to the entanglement of pose attributes with identity information, conventional approaches to pose-independent representation fail to provide quality results when recognizing faces with large pose variations. In this paper, we propose a practical approach to disentangle the pose attribute from the identity information, followed by synthesis of a face using a classifier network in latent space. The proposed approach employs a modified generative adversarial network framework consisting of an encoder-decoder structure embedded with a classifier in manifold space for carrying out factorization on the latent encoding. It can be further generalized to other face and non-face attributes for real-life video frames containing faces with significant attribute variations. Experimental results and comparison with the state of the art in the field show that the learned representation of the proposed approach synthesizes more compelling perceptual images through a combination of adversarial and classification losses.

Keywords: disentanglement, face detection, generative adversarial networks, video surveillance

Procedia PDF Downloads 93
34 A Simple and Efficient Method for Accurate Measurement and Control of Power Frequency Deviation

Authors: S. J. Arif

Abstract:

This paper presents a simple method for accurate measurement and control of power frequency deviation. The sinusoidal signal for which the frequency deviation measurement is required is transformed to a low voltage level and passed through a zero crossing detector to convert it into a pulse train. Another stable square wave signal of 10 kHz is obtained using a crystal oscillator and decade dividing assemblies (DDA). These signals are combined digitally and then passed through decade counters to give a unique combination of pulses or levels, which are further encoded to make them equally suitable for both control applications and display units. The developed circuit using discrete components has a resolution of 0.5 Hz and completes measurement within 20 ms. The realized circuit is simulated and synthesized using Verilog HDL and subsequently implemented on FPGA. The results of measurement on FPGA are observed on a very high resolution logic analyzer. These results accurately match the simulation results as well as the results of the same circuit implemented with discrete components. The proposed system is suitable for accurate measurement and control of power frequency deviation.
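The counting principle described above can be sketched in software: detect rising zero crossings of the input and count reference-clock ticks between two consecutive crossings. This is only an illustrative analogue of the hardware described, not the authors' circuit or Verilog design. The 10 kHz reference comes from the abstract, while the 50 Hz nominal frequency, the function names, and the simulated test signal are assumptions.

```python
import math

F_REF_HZ = 10_000   # reference clock rate stated in the abstract
NOMINAL_HZ = 50.0   # assumed nominal power frequency (not stated in the abstract)


def rising_zero_crossings(samples):
    """Indices where the signal goes from negative to non-negative."""
    return [i for i in range(1, len(samples)) if samples[i - 1] < 0.0 <= samples[i]]


def measure_frequency(samples, f_ref=F_REF_HZ):
    """Estimate frequency from the tick count between two rising crossings."""
    crossings = rising_zero_crossings(samples)
    if len(crossings) < 2:
        raise ValueError("need at least one full period of the signal")
    ticks = crossings[1] - crossings[0]  # reference ticks in one period
    return f_ref / ticks


def simulate(freq_hz, duration_s=0.06, phase=0.3, f_ref=F_REF_HZ):
    """Hypothetical test input: a sine wave sampled once per reference tick."""
    n = int(duration_s * f_ref)
    return [math.sin(2 * math.pi * freq_hz * k / f_ref + phase) for k in range(n)]


if __name__ == "__main__":
    for f in (50.0, 40.0):
        measured = measure_frequency(simulate(f))
        print(f, measured, measured - NOMINAL_HZ)
```

Because the period is quantized to whole reference ticks, the achievable resolution depends on the reference rate and on how many periods are counted, which is the trade-off a hardware counter design has to settle.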

Keywords: digital encoder for frequency measurement, frequency deviation measurement, measurement and control systems, power systems

Procedia PDF Downloads 347
33 Multimodal Direct Neural Network Positron Emission Tomography Reconstruction

Authors: William Whiteley, Jens Gregor

Abstract:

In recent developments of direct neural network based positron emission tomography (PET) reconstruction, two prominent architectures have emerged for converting measurement data into images: 1) networks that contain fully-connected layers; and 2) networks that primarily use a convolutional encoder-decoder architecture. In this paper, we present a multi-modal direct PET reconstruction method called MDPET, which is a hybrid approach that combines the advantages of both types of networks. MDPET processes raw data in the form of sinograms and histo-images in concert with attenuation maps to produce high quality multi-slice PET images (e.g., 8x440x440). MDPET is trained on a large whole-body patient data set and evaluated both quantitatively and qualitatively against target images reconstructed with the standard PET reconstruction benchmark of iterative ordered subsets expectation maximization. The results show that MDPET outperforms the best previously published direct neural network methods in measures of bias, signal-to-noise ratio, mean absolute error, and structural similarity.

Keywords: deep learning, image reconstruction, machine learning, neural network, positron emission tomography

Procedia PDF Downloads 88
32 The Co-Simulation Interface SystemC/Matlab Applied in JPEG and SDR Application

Authors: Walid Hassairi, Moncef Bousselmi, Mohamed Abid

Abstract:

Functional verification is a major part of today’s system design task. Several approaches are available for verification at a high abstraction level, where designs are often modeled using MATLAB/Simulink. However, these differing approaches are a barrier to a unified verification flow. In this paper, we propose a co-simulation interface between SystemC and MATLAB/Simulink to enable functional verification of designs with multiple abstraction levels. The resulting verification flow is tested on the JPEG compression algorithm. The required synchronization of both simulation environments, as well as data type conversion, is solved using the proposed co-simulation flow. We divide the JPEG encoder into two parts. The first, the DCT, is implemented in SystemC and represents the HW part. The second, consisting of quantization and entropy encoding, is implemented in MATLAB and represents the SW part. For communication and synchronization between these two parts, we use an S-Function and the engine in MATLAB/Simulink. On this premise, this study introduces a new SystemC hardware implementation of the DCT. We compare the results of our simulation with a SW/SW simulation. We observe a reduction in simulation time of 88.15% for JPEG, and a design efficiency of 90% for SDR.

Keywords: hardware/software, co-design, co-simulation, systemc, matlab, s-function, communication, synchronization

Procedia PDF Downloads 361
31 Masked Candlestick Model: A Pre-Trained Model for Trading Prediction

Authors: Ling Qi, Matloob Khushi, Josiah Poon

Abstract:

This paper introduces a pre-trained Masked Candlestick Model (MCM) for trading time-series data. The pre-trained model is based on three core designs. First, we convert the trading price data at each data point into a set of normalized elements and produce embeddings of each element. Second, we generate a masked sequence of such embedded elements as inputs for self-supervised learning. Third, we use the encoder mechanism from the transformer to train on the inputs. The masked model learns the contextual relations among the sequence of embedded elements, which can aid downstream classification tasks. To evaluate the performance of the pre-trained model, we fine-tune MCM for three different downstream classification tasks to predict future price trends. The fine-tuned models achieved better accuracy rates for all three tasks than the baseline models. To better analyze the effectiveness of MCM, we test the same architecture on three currency pairs, namely EUR/GBP, AUD/USD, and EUR/JPY. The experimental results demonstrate MCM’s effectiveness on all three currency pairs and indicate MCM’s capability for signal extraction from trading data.

Keywords: masked language model, transformer, time series prediction, trading prediction, embedding, transfer learning, self-supervised learning

Procedia PDF Downloads 90
30 Randomness in Cybertext: A Study on Computer-Generated Poetry from the Perspective of Semiotics

Authors: Hongliang Zhang

Abstract:

The use of chance procedures and randomizers in poetry-writing can be traced back to surrealist works, which, by appealing to Sigmund Freud's theories, were still logocentric. In the 1960s, random permutation and combination were extensively used by the Oulipo, John Cage and Jackson Mac Low, which further deconstructed the metaphysical presence of writing. Today, randomly-generated digital poetry has emerged as a genre of cybertext which should be co-authored by readers. At the same time, the classical theories have been updated by cybernetics and media theories. N. Katherine Hayles recast Jacques Lacan's concept of ‘the floating signifiers’ as ‘the flickering signifiers’, arguing that the technology per se has become a part of textual production. This paper makes a historical review of computer-generated poetry from the perspective of semiotics, emphasizing that randomly-generated digital poetry, which hands over the dual tasks of both interpretation and writing to the readers, demonstrates the intervention of media technology in literature. With the participation of computerized algorithms and programming languages, poems randomly generated by computers have not only blurred the boundary between encoder and decoder, but also raised the issue of the human-machine relationship. It is also a significant feature of cybertext that the productive process of the text is full of randomness.

Keywords: cybertext, digital poetry, poetry generator, semiotics

Procedia PDF Downloads 148
29 Deep Supervision Based-Unet to Detect Buildings Changes from VHR Aerial Imagery

Authors: Shimaa Holail, Tamer Saleh, Xiongwu Xiao

Abstract:

Building change detection (BCD) from satellite imagery is an essential topic in urbanization monitoring, agricultural land management, and updating geospatial databases. Recently, deep learning-based change detection methods have made significant progress and achieved impressive results. However, these methods are insensitive to changes in buildings with complex spectral differences, and the extracted features are not discriminative enough, resulting in incomplete buildings and irregular boundaries. To overcome these problems, we propose a dual Siamese network based on the Unet model with the addition of a deep supervision (DS) strategy. This network consists of a backbone (encoder) based on ImageNet pre-training, a fusion block, and feature pyramid networks (FPN) to enhance the step-by-step information of the changing regions and obtain a more accurate BCD map. To train the proposed method, we created a new dataset (EGY-BCD) of high-resolution, multi-temporal aerial images captured over New Cairo in Egypt for detecting building changes. The experimental results showed that the proposed method is effective and performs well on the EGY-BCD dataset, with an overall accuracy, F1-score, and mIoU of 91.6%, 80.1%, and 73.5%, respectively.

Keywords: building change detection, deep supervision, semantic segmentation, EGY-BCD dataset

Procedia PDF Downloads 74
28 Low Light Image Enhancement with Multi-Stage Interconnected Autoencoders Integration in Pix to Pix GAN

Authors: Muhammad Atif, Cang Yan

Abstract:

The enhancement of low-light images is a significant area of study aimed at improving the quality of images captured in challenging lighting environments. Recently, methods based on convolutional neural networks (CNN) have gained prominence as they offer state-of-the-art performance. However, many approaches based on CNN rely on increasing the size and complexity of the neural network. In this study, we propose an alternative method for improving low-light images using an autoencoder-based multiscale knowledge transfer model. Our method leverages the power of three autoencoders, where the encoders of the first two autoencoders are directly connected to the decoder of the third autoencoder. Additionally, the decoders of the first two autoencoders are connected to the encoder of the third autoencoder. This architecture enables effective knowledge transfer, allowing the third autoencoder to learn and benefit from the enhanced knowledge extracted by the first two autoencoders. We further integrate the proposed model into the Pix to Pix GAN framework. By integrating our proposed model as the generator in the GAN framework, we aim to produce enhanced images that not only exhibit improved visual quality but also possess a more authentic and realistic appearance. Experimental results, both qualitative and quantitative, show that our method is better than the state-of-the-art methodologies.

Keywords: low light image enhancement, deep learning, convolutional neural network, image processing

Procedia PDF Downloads 34
27 Long Short-Term Memory (LSTM) Matters: A Sequential Brief Text that Assistive Approach of Text Summarization

Authors: Sharun Akter Khushbu

Abstract:

‘SOS’ addresses text summary such as feasibility study and allows more comprehensive methods on text of language resources. Resources language has been exploited by the importance of text documental procedure. Throughout this key idea will come out a machine interpreter called an SOS that has built an argumentative as an employed model is LSTM-CNN (long short-term memory-recurrent neural network). Summarization of Bengali text formulated by the information of latent structure instead of brief input string counting as text. Text summarization is the proper utilization of optimal solutions being time reduction and easy interpretation whenever human-generated summary and machine targeted summary remain similar and without degrading the semantic summarization quality. According to the problem affirmation key idea has advanced an algorithm with the method of encoder and decoder describing a sequential structure that is rigorously connected with actual predicted and meaningful output. Regarding the seq2seq approach aimed in the future with high semantic summarization similarity on behalf of the large data samples that are also enlisted by the method. Thus, the SOS method assigns a discriminator over Bengali text documents where encoded input sequences such as summary and decoded the targeted summary of gist will be an error-free machine.

Keywords: LSTM-CNN, NN, SOS, text summarization

Procedia PDF Downloads 42
26 Embedded Digital Image System

Authors: Dawei Li, Cheng Liu, Yiteng Liu

Abstract:

This paper introduces an embedded digital image system for the Chinese space environment vertical exploration sounding rocket. In order to record the flight status of the sounding rocket as well as the payloads, an onboard embedded image processing system based on ADV212, a JPEG2000 compression chip, is designed in this paper. Since the sounding rocket is not designed to be recovered, all image data should be transmitted to the ground station before re-entry, while the downlink band used for image transmission is only about 600 kbps. At the same compression ratio, the JPEG2000 standard algorithm can achieve better image quality than other algorithms. JPEG2000 image compression is therefore applied under this limited downlink data band. This embedded image system supports lossless to 200:1 real-time compression, with two cameras to monitor nose ejection and motor separation, and two cameras to monitor boom deployment. The encoder, ADV7182, receives the PAL signal from the camera, then outputs the ITU-R BT.656 signal to ADV212. ADV7182 switches between four input video channels according to the program sequence. Two SRAMs are used for ping-pong operation and one 512 Mb SDRAM for buffering high frame-rate images. The whole image system has the characteristics of low power dissipation, low cost, small size and high reliability, which makes it well suited to this sounding rocket application.

Keywords: ADV212, image system, JPEG2000, sounding rocket

Procedia PDF Downloads 396
25 3D Printing Perceptual Models of Preference Using a Fuzzy Extreme Learning Machine Approach

Authors: Xinyi Le

Abstract:

In this paper, 3D printing orientations were determined through our perceptual model. Some FDM (Fused Deposition Modeling) 3D printers, which are widely used in universities and industries, often require support structures during additive manufacturing. After removing the residual material, some surface artifacts remain at the contact points. These artifacts damage the function and visual effect of the model. To prevent the impact of these artifacts, we present a fuzzy extreme learning machine approach to find printing directions that avoid placing supports in perceptually significant regions. The proposed approach is able to solve the evaluation problem by combining both subjective knowledge and objective information. Our method combines the advantages of fuzzy theory, auto-encoders, and extreme learning machines. Fuzzy set theory is applied for dealing with subjective preference information, and an auto-encoder step is used to extract good features without supervised labels before the extreme learning machine. An extreme learning machine method is then developed for training and learning perceptual models. The performance of this perceptual model is demonstrated on both natural and man-made objects. It is a good human-computer interaction practice which draws on supporting knowledge from both the machine side and the human side.

Keywords: 3d printing, perceptual model, fuzzy evaluation, data-driven approach

Procedia PDF Downloads 403
24 Cross Attention Fusion for Dual-Stream Speech Emotion Recognition

Authors: Shaode Yu, Jiajian Meng, Bing Zhu, Hang Yu, Qiurui Sun

Abstract:

Speech emotion recognition (SER) aims to recognize human subjective emotions through in-depth analysis of audio data. How to comprehensively extract emotional information from speech audio and how to effectively fuse the extracted features remain challenging. This paper presents a dual-stream SER framework that embraces both full training and transfer learning of different networks for thorough feature encoding. In addition, a plug-and-play cross-attention fusion (CAF) module is implemented for effective integration of the dual-stream encoder output. The effectiveness of the proposed CAF module is compared with three other fusion modules (feature summation, feature concatenation, and feature-wise linear modulation) on two databases (RAVDESS and IEMOCAP) using different dual-stream encoders (full training network, DPCNN or TextRCNN; transfer learning network, HuBERT or Wav2Vec2). Experimental results suggest that the CAF module can effectively reconcile conflicts between features from different encoders and outperforms the other three feature fusion modules on the SER task. In the future, the plug-and-play CAF module can be extended to multi-branch feature fusion, and the dual-stream SER framework can be widened to multi-stream data representation to improve recognition performance and generalization capacity.
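The abstract does not describe the internals of the CAF module, so the following is only a generic sketch of cross-attention between two encoder streams, assuming NumPy is available. It uses single-head scaled dot-product attention without learned projections, and the names `cross_attention`, `stream_a`, `stream_b`, the sequence lengths, and the feature size are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    """Each query row attends over all rows of the other stream."""
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d))
    return weights @ context, weights

rng = np.random.default_rng(0)
stream_a = rng.standard_normal((5, 8))  # e.g. 5 frames from one encoder (assumed)
stream_b = rng.standard_normal((7, 8))  # e.g. 7 frames from the other encoder (assumed)

a_from_b, w_ab = cross_attention(stream_a, stream_b)
b_from_a, _ = cross_attention(stream_b, stream_a)

# Pool each attended stream over time and concatenate into one fused vector.
fused = np.concatenate([a_from_b.mean(axis=0), b_from_a.mean(axis=0)])
```

Unlike plain summation or concatenation, each stream here is re-weighted by its similarity to the other before fusion, which is the general idea behind attention-based fusion.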

Keywords: speech emotion recognition, cross-attention fusion, dual-stream, pre-trained

Procedia PDF Downloads 41
23 DCDNet: Lightweight Document Corner Detection Network Based on Attention Mechanism

Authors: Kun Xu, Yuan Xu, Jia Qiao

Abstract:

Document detection plays an important role in optical character recognition and text analysis. Traditional detection methods generalize poorly, while deep neural networks have complex structures and large numbers of parameters and therefore cannot be applied well on mobile devices. To address this, this paper proposes a lightweight Document Corner Detection Network (DCDNet). DCDNet is a two-stage architecture. The first stage, with an encoder-decoder structure, adopts depthwise separable convolution to greatly reduce the number of network parameters. The second stage introduces the Feature Attention Union (FAU) module, which enhances feature information along the spatial and channel dimensions and adaptively adjusts the size of the receptive field to strengthen the feature expression ability of the model. To address the large imbalance between the numbers of corner and non-corner pixels, a Weighted Binary Cross Entropy Loss (WBCE Loss) is proposed that treats corner detection as a classification problem and makes training more efficient. To make up for the lack of datasets for document corner detection, a dataset of 6620 images, named the Document Corner Detection Dataset (DCDD), is constructed. Experimental results show that the proposed method obtains fast, stable, and accurate detection results on DCDD.
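The abstract does not give the exact form of the WBCE Loss, so the sketch below assumes one common formulation in which the positive (corner) term is scaled by a weight to offset the scarcity of corner pixels. The function name `weighted_bce`, the `pos_weight` parameter, and the toy values are illustrative assumptions.

```python
import math

def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    """Mean weighted binary cross-entropy over flat label and probability lists."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clamp probabilities so the logarithms stay finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# One corner pixel among four, with an uninformative prediction of 0.5 everywhere.
labels = [1, 0, 0, 0]
probs = [0.5, 0.5, 0.5, 0.5]

plain = weighted_bce(labels, probs, pos_weight=1.0)
weighted = weighted_bce(labels, probs, pos_weight=3.0)
```

With `pos_weight=3.0` the single corner pixel contributes as much to the loss as the three background pixels combined, so errors on the rare class are no longer drowned out.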

Keywords: document detection, corner detection, attention mechanism, lightweight

Procedia PDF Downloads 324
22 Digital Joint Equivalent Channel Hybrid Precoding for Millimeterwave Massive Multiple Input Multiple Output Systems

Authors: Linyu Wang, Mingjun Zhu, Jianhong Xiang, Hanyu Jiang

Abstract:

Aiming at the problem that the spectral efficiency of hybrid precoding (HP) is too low in current millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems, this paper proposes a digital joint equivalent channel hybrid precoding algorithm based on iteration of the digital encoding matrix. First, the objective function is expanded to obtain the relation equation, and a pseudo-inverse iterative function for the analog encoder is derived using the pseudo-inverse method. This avoids the large increase in computation caused by rank deficiency of the digital encoding matrix and reduces the overall complexity of hybrid precoding. Second, the analog coding matrix and the millimeter-wave sparse channel matrix are combined into an equivalent channel, the equivalent channel is subjected to singular value decomposition (SVD) to obtain a digital coding matrix, and the derived pseudo-inverse iterative function is then used to iteratively regenerate the analog encoding matrix. The simulation results show that the proposed algorithm improves the system spectral efficiency by 10~20% compared with other algorithms, and that stability is also improved.
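The equivalent-channel step can be sketched as follows, assuming NumPy is available. This covers only the SVD part: a fixed analog precoder is folded into the channel, and the digital precoder is taken from the leading right singular vectors. The pseudo-inverse iteration derived in the paper is not reproduced. The array sizes, the random stand-in for the sparse mmWave channel, the random phase-shifter analog precoder, and the power normalization are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rx, n_tx, n_rf, n_s = 4, 16, 4, 2  # hypothetical antenna, RF-chain, and stream counts

# Random complex channel used as a stand-in for a sparse mmWave channel.
H = (rng.standard_normal((n_rx, n_tx))
     + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)

# Analog precoder built from constant-modulus phase shifters (random phases here).
F_rf = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (n_tx, n_rf))) / np.sqrt(n_tx)

# Equivalent channel seen by the digital precoder.
H_eq = H @ F_rf

# Digital precoder: the n_s right singular vectors with the largest singular values.
_, _, Vh = np.linalg.svd(H_eq)
F_bb = Vh.conj().T[:, :n_s]

# Scale so the combined precoder satisfies a total power constraint of n_s.
F_bb *= np.sqrt(n_s) / np.linalg.norm(F_rf @ F_bb, "fro")
```

Working on the `n_rx` by `n_rf` equivalent channel rather than the full `n_rx` by `n_tx` channel keeps the decomposition small, since the number of RF chains is typically far below the number of antennas.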

Keywords: mmWave, massive MIMO, hybrid precoding, singular value decomposition, equivalent channel

Procedia PDF Downloads 66
21 Deep Vision: A Robust Dominant Colour Extraction Framework for T-Shirts Based on Semantic Segmentation

Authors: Kishore Kumar R., Kaustav Sengupta, Shalini Sood Sehgal, Poornima Santhanam

Abstract:

Fashion is a human expression that is constantly changing. One of the prime factors that consistently influences fashion is the change in colour preferences. The role of colour in our everyday lives is very significant. It subconsciously explains a lot about one’s mindset and mood. Analyzing colours by extracting them from outfit images is critical for examining individual and consumer behaviour. Several research works have been carried out on extracting colours from images, but to the best of our knowledge, no studies extract colours from specific apparel and identify colour patterns geographically. This paper proposes a framework for accurately extracting colours from T-shirt images and predicting dominant colours geographically. The proposed method consists of two stages: first, a U-Net deep learning model is adopted to segment the T-shirts from the images. Second, the colours are extracted only from the T-shirt segments. The proposed method employs the iMaterialist (Fashion) 2019 dataset for the semantic segmentation task. The proposed framework also includes a mechanism for gathering data and analyzing India’s general colour preferences. From this research, it was observed that black and grey are the dominant colours in different regions of India. The proposed method can be adapted to study fashion’s evolving colour preferences.
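The second stage, extracting a dominant colour from the segmented T-shirt pixels, can be illustrated with a small k-means sketch (k-means clustering appears in the keywords, but the abstract gives no implementation details). The function name `dominant_colour`, the deterministic initialization, and the toy pixel list are assumptions. In practice the pixels would come from the segmentation mask rather than a hand-written list.

```python
def dominant_colour(pixels, k=2, iterations=10):
    """Return the centre of the largest k-means cluster of RGB pixels."""
    # Deterministic initialization: k evenly spaced pixels from the input.
    step = max(1, len(pixels) // k)
    centres = [pixels[i * step] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            # Assign each pixel to the nearest centre (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])),
            )
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        centres = [
            tuple(sum(ch) / len(cl) for ch in zip(*cl)) if cl else centres[i]
            for i, cl in enumerate(clusters)
        ]
    largest = max(range(k), key=lambda i: len(clusters[i]))
    return centres[largest]

# Hypothetical T-shirt pixels: mostly dark grey with a small red logo.
pixels = [(10, 10, 10)] * 3 + [(20, 20, 20)] * 3 + [(200, 0, 0)] * 2
colour = dominant_colour(pixels)
```

Taking the centre of the most populated cluster, rather than the mean of all pixels, keeps small contrasting regions such as logos from shifting the reported colour.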

Keywords: colour analysis in t-shirts, convolutional neural network, encoder-decoder, k-means clustering, semantic segmentation, U-Net model

Procedia PDF Downloads 79
20 A Deep Learning Approach to Online Social Network Account Compromisation

Authors: Edward K. Boahen, Brunel E. Bouya-Moko, Changda Wang

Abstract:

The major threat to online social network (OSN) users is account compromisation. Spammers now spread malicious messages by exploiting the trust relationship established between account owners and their friends. The challenge for service providers in detecting a compromised account is validating the trusted relationship established between the account owners, their friends, and the spammers. Another challenge is the increase in required human interaction with the feature selection. Research available on supervised learning (machine learning) has limitations with the feature selection and accounts that cannot be profiled, like application programming interface (API). Therefore, this paper discusses the various behaviours of OSN users and the current approaches to detecting a compromised OSN account, emphasizing their limitations and challenges. We propose a deep learning approach that addresses and resolves the constraints faced by the previous schemes. We detail our proposed optimized nonsymmetric deep auto-encoder (OPT_NDAE) for unsupervised feature learning, which reduces the level of human interaction required in the selection and extraction of features. We evaluated our proposed classifier using the NSL-KDD and KDDCUP'99 datasets in the graphical-user-interface-enabled Weka application. The results obtained indicate that our proposed approach outperformed most of the traditional schemes in OSN compromised account detection with an accuracy rate of 99.86%.

Keywords: computer security, network security, online social network, account compromisation

Procedia PDF Downloads 90
19 Neural Graph Matching for Modification Similarity Applied to Electronic Document Comparison

Authors: Po-Fang Hsu, Chiching Wei

Abstract:

In this paper, we present a novel neural graph matching approach applied to document comparison. Document comparison is a common task in the legal and financial industries. In some cases, the most important differences may be the addition or omission of words, sentences, clauses, or paragraphs. However, it is a challenging task without recording or tracing the whole editing process. Under many temporal uncertainties, we explore the potential of our approach to approximate an accurate comparison and determine which element blocks are related to one another by editing. First, we apply a document layout analysis that combines traditional and modern techniques to segment layouts into blocks of various types appropriately. Then we transform this issue into a problem of layout graph matching with textual awareness. Graph matching is a long-studied problem with a broad range of applications. However, unlike previous works focusing on visual images or structural layout, we also bring textual features into our model to adapt it to this domain. Specifically, based on the electronic document, we introduce an encoder to handle the visual presentation decoded from PDF. Additionally, because modifications can cause inconsistency in document layout analysis between modified documents, and blocks can be merged and split, Sinkhorn divergence is adopted in our neural graph approach, which tries to overcome both these issues with many-to-many block matching. We demonstrate this on two categories of layouts, legal agreements and scientific articles, collected from our real-case datasets.
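The abstract names Sinkhorn divergence without giving its formulation. As a hedged illustration of the underlying building block only, the sketch below runs plain Sinkhorn iterations on a small cost matrix to obtain a soft many-to-many assignment between blocks. It does not compute the divergence itself or reproduce the paper's model. The function name `sinkhorn`, the uniform block masses, the regularization value, and the toy costs are assumptions.

```python
import math

def sinkhorn(cost, epsilon=0.5, iterations=200):
    """Entropy-regularized transport plan between uniformly weighted blocks."""
    n, m = len(cost), len(cost[0])
    # Gibbs kernel: low cost gives high affinity.
    K = [[math.exp(-c / epsilon) for c in row] for row in cost]
    a = [1.0 / n] * n  # mass of each block in the original document
    b = [1.0 / m] * m  # mass of each block in the modified document
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the target masses.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical dissimilarities: 2 original blocks against 3 modified blocks,
# as when one block has been split into two by an edit.
cost = [[0.0, 0.2, 1.0],
        [1.0, 0.8, 0.0]]
plan = sinkhorn(cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(3)]
```

Because the plan spreads each block's mass over several candidates instead of forcing a one-to-one pairing, a block that was split or merged can still be matched to all of its counterparts.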

Keywords: document comparison, graph matching, graph neural network, modification similarity, multi-modal

Procedia PDF Downloads 151
18 A Grey-Box Text Attack Framework Using Explainable AI

Authors: Esther Chiramal, Kelvin Soh Boon Kai

Abstract:

Explainable AI is a strong strategy implemented to understand complex black-box model predictions in a human-interpretable language. It provides the evidence required to execute the use of trustworthy and reliable AI systems. On the other hand, however, it also opens the door to locating possible vulnerabilities in an AI model. Traditional adversarial text attacks use word substitution, data augmentation techniques, and gradient-based attacks on powerful pre-trained Bidirectional Encoder Representations from Transformers (BERT) variants to generate adversarial sentences. These attacks are generally white-box in nature and not practical, as they can be easily detected by humans, e.g., changing the word “Poor” to “Rich”. We propose a simple yet effective Grey-box cum Black-box approach that does not require knowledge of the model while using a set of surrogate Transformer/BERT models to perform the attack using Explainable AI techniques. As Transformers are the current state-of-the-art models for almost all Natural Language Processing (NLP) tasks, an attack generated from BERT1 is transferable to BERT2. This transferability is made possible by the attention mechanism in the transformer, which allows the model to capture long-range dependencies in a sequence. Using the power of BERT generalisation via attention, we attempt to exploit how transformers learn by attacking a few surrogate transformer variants, each based on a different architecture. We demonstrate that this approach is highly effective at generating semantically good sentences by changing as little as one word, in a way that is not detectable by humans while still fooling other BERT models.

Keywords: BERT, explainable AI, Grey-box text attack, transformer

Procedia PDF Downloads 112
17 Learning Dynamic Representations of Nodes in Temporally Variant Graphs

Authors: Sandra Mitrovic, Gaurav Singh

Abstract:

In many industries, including telecommunications, churn prediction has been a topic of active research. Much attention has been devoted to devising the most informative features, and this area of research has gained even more focus with the spread of (social) network analytics. Call detail records (CDRs) have been used to construct customer networks and extract potentially useful features. However, to the best of our knowledge, no studies including network features have yet proposed a generic way of representing network information. Instead, ad-hoc and dataset-dependent solutions have been suggested. In this work, we build upon a recently presented method (node2vec) to obtain representations for nodes in the observed network. The proposed approach is generic and applicable to any network and domain. Unlike node2vec, which assumes a static network, we consider a dynamic and time-evolving network. To account for this, we propose an approach that constructs the feature representation of each node by generating its node2vec representations at different timestamps, concatenating them, and finally compressing them using an auto-encoder-like method in order to retain reasonably long and informative feature vectors. We test the proposed method on a churn prediction task in the telco domain. To predict churners at timestamp ts+1, we construct training and testing datasets consisting of feature vectors from time intervals [t1, ts-1] and [t2, ts] respectively, and use traditional supervised classification models like SVM and Logistic Regression. Observed results show the effectiveness of the proposed approach compared with ad-hoc feature-selection-based approaches and static node2vec.
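The windowing described above can be sketched as follows. The code only shows how per-timestamp embeddings are concatenated over the intervals [t1, ts-1] and [t2, ts]; the node2vec embeddings themselves and the auto-encoder-like compression are not reproduced. The helper name `concat_embeddings`, the two-dimensional placeholder embeddings, and the choice s = 4 are assumptions for illustration.

```python
def concat_embeddings(embeddings_by_time, node, start, end):
    """Concatenate a node's embeddings for timestamps start..end inclusive."""
    vector = []
    for t in range(start, end + 1):
        vector.extend(embeddings_by_time[t][node])
    return vector

# Placeholder embeddings for one customer "a" at timestamps 1..4.
# Real values would come from running node2vec on each snapshot.
s = 4
embeddings_by_time = {t: {"a": [t, t * 10]} for t in range(1, s + 1)}

# Training features use [t1, ts-1]; testing features use [t2, ts].
train_vector = concat_embeddings(embeddings_by_time, "a", 1, s - 1)
test_vector = concat_embeddings(embeddings_by_time, "a", 2, s)
```

Both windows have the same length, so the training and testing vectors share one dimensionality and the same classifier can be applied to either.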

Keywords: churn prediction, dynamic networks, node2vec, auto-encoders

Procedia PDF Downloads 288
16 Network Conditioning and Transfer Learning for Peripheral Nerve Segmentation in Ultrasound Images

Authors: Harold Mauricio Díaz-Vargas, Cristian Alfonso Jimenez-Castaño, David Augusto Cárdenas-Peña, Guillermo Alberto Ortiz-Gómez, Alvaro Angel Orozco-Gutierrez

Abstract:

Precise identification of the nerves is a crucial task performed by anesthesiologists for effective Peripheral Nerve Blocking (PNB). Currently, anesthesiologists use ultrasound imaging equipment to guide the PNB and detect nervous structures. However, visual identification of the nerves from ultrasound images is difficult, even for trained specialists, due to artifacts and low contrast. Recent advances in deep learning make neural networks a potential tool for accurate nerve segmentation systems, thereby addressing the above issues from raw data. The widely used U-Net network yields pixel-by-pixel segmentation by encoding the input image and decoding the attained feature vector into a semantic image. This work proposes a conditioning approach and encoder pre-training to enhance the nerve segmentation of traditional U-Nets. Conditioning is achieved by one-hot encoding the kind of target nerve at the network input, while the pre-training considers five well-known deep networks for image classification. The proposed approach is tested on a collection of 619 ultrasound (US) images, where the best C-UNet architecture yields an 81% Dice coefficient, outperforming the 74% of the best traditional U-Net. Results show that pre-trained models with the conditional approach outperform their equivalent baselines by supporting the learning of new features and enriching the discriminant capability of the tested networks.
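Two ingredients of the abstract above, the one-hot code for the target nerve type and the Dice coefficient used for evaluation, can be written down in a few lines. This is a minimal sketch on flat binary masks. The function names, the number of nerve types, and the toy masks are assumptions, and the way the one-hot code is fed into the network is not specified in the abstract.

```python
def one_hot(index, num_classes):
    """One-hot vector marking the target nerve type."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def dice_coefficient(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for flat binary masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Hypothetical setup: 4 nerve types, conditioning on the third one.
condition = one_hot(2, 4)

# Toy predicted and ground-truth masks, flattened to one dimension.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred, target)
```

Here two of the three predicted foreground pixels overlap the three true ones, so the score is 4/6, or about 0.67. The reported 81% and 74% are averages of this kind of overlap measure over the test images.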

Keywords: nerve segmentation, U-Net, deep learning, ultrasound imaging, peripheral nerve blocking

Procedia PDF Downloads 76
15 Reconstruction of Visual Stimuli Using Stable Diffusion with Text Conditioning

Authors: ShyamKrishna Kirithivasan, Shreyas Battula, Aditi Soori, Richa Ramesh, Ramamoorthy Srinath

Abstract:

The human brain, among the most complex and mysterious aspects of the body, harbors vast potential for extensive exploration. Unraveling these enigmas, especially within neural perception and cognition, delves into the realm of neural decoding. Harnessing advancements in generative AI, particularly in Visual Computing, seeks to elucidate how the brain comprehends visual stimuli observed by humans. The paper endeavors to reconstruct human-perceived visual stimuli using Functional Magnetic Resonance Imaging (fMRI). This fMRI data is then processed through pre-trained deep-learning models to recreate the stimuli. Introducing a new architecture named LatentNeuroNet, the aim is to achieve the utmost semantic fidelity in stimuli reconstruction. The approach employs a Latent Diffusion Model (LDM) - Stable Diffusion v1.5, emphasizing semantic accuracy and generating superior quality outputs. This addresses the limitations of prior methods, such as GANs, known for poor semantic performance and inherent instability. Text conditioning within the LDM's denoising process is handled by extracting text from the brain's ventral visual cortex region. This extracted text undergoes processing through a Bootstrapping Language-Image Pre-training (BLIP) encoder before it is injected into the denoising process. In conclusion, a successful architecture is developed that reconstructs the visual stimuli perceived and finally, this research provides us with enough evidence to identify the most influential regions of the brain responsible for cognition and perception.

Keywords: BLIP, fMRI, latent diffusion model, neural perception

Procedia PDF Downloads 42