Search results for: pretraining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 6

Search results for: pretraining

6 Self-Supervised Pretraining on Sequences of Functional Magnetic Resonance Imaging Data for Transfer Learning to Brain Decoding Tasks

Authors: Sean Paulsen, Michael Casey

Abstract:

In this work we present a self-supervised pretraining framework for transformers on functional Magnetic Resonance Imaging (fMRI) data. First, we pretrain our architecture on two self-supervised tasks simultaneously to teach the model a general understanding of the temporal and spatial dynamics of human auditory cortex during music listening. Our pretraining results are the first to suggest a synergistic effect of multitask training on fMRI data. Second, we finetune the pretrained models and train additional fresh models on a supervised fMRI classification task. We observe significantly improved accuracy on held-out runs with the finetuned models, which demonstrates the ability of our pretraining tasks to facilitate transfer learning. This work contributes to the growing body of literature on transformer architectures for pretraining and transfer learning with fMRI data, and serves as a proof of concept for our pretraining tasks and multitask pretraining on fMRI data.

Keywords: transfer learning, fMRI, self-supervised, brain decoding, transformer, multitask training

Procedia PDF Downloads 55
5 Generating Product Description with Generative Pre-Trained Transformer 2

Authors: Minh-Thuan Nguyen, Phuong-Thai Nguyen, Van-Vinh Nguyen, Quang-Minh Nguyen

Abstract:

Research on automatically generating descriptions for e-commerce products is gaining increasing attention in recent years. However, the generated descriptions of their systems are often less informative and attractive because of lacking training datasets or the limitation of these approaches, which often use templates or statistical methods. In this paper, we explore a method to generate production descriptions by using the GPT-2 model. In addition, we apply text paraphrasing and task-adaptive pretraining techniques to improve the qualify of descriptions generated from the GPT-2 model. Experiment results show that our models outperform the baseline model through automatic evaluation and human evaluation. Especially, our methods achieve a promising result not only on the seen test set but also in the unseen test set.

Keywords: GPT-2, product description, transformer, task-adaptive, language model, pretraining

Procedia PDF Downloads 160
4 A Deep Learning Model with Greedy Layer-Wise Pretraining Approach for Optimal Syngas Production by Dry Reforming of Methane

Authors: Maryam Zarabian, Hector Guzman, Pedro Pereira-Almao, Abraham Fapojuwo

Abstract:

Dry reforming of methane (DRM) has sparked significant industrial and scientific interest not only as a viable alternative for addressing the environmental concerns of two main contributors of the greenhouse effect, i.e., carbon dioxide (CO₂) and methane (CH₄), but also produces syngas, i.e., a mixture of hydrogen (H₂) and carbon monoxide (CO) utilized by a wide range of downstream processes as a feedstock for other chemical productions. In this study, we develop an AI-enable syngas production model to tackle the problem of achieving an equivalent H₂/CO ratio [1:1] with respect to the most efficient conversion. Firstly, the unsupervised density-based spatial clustering of applications with noise (DBSAN) algorithm removes outlier data points from the original experimental dataset. Then, random forest (RF) and deep neural network (DNN) models employ the error-free dataset to predict the DRM results. DNN models inherently would not be able to obtain accurate predictions without a huge dataset. To cope with this limitation, we employ reusing pre-trained layers’ approaches such as transfer learning and greedy layer-wise pretraining. Compared to the other deep models (i.e., pure deep model and transferred deep model), the greedy layer-wise pre-trained deep model provides the most accurate prediction as well as similar accuracy to the RF model with R² values 1.00, 0.999, 0.999, 0.999, 0.999, and 0.999 for the total outlet flow, H₂/CO ratio, H₂ yield, CO yield, CH₄ conversion, and CO₂ conversion outputs, respectively.

Keywords: artificial intelligence, dry reforming of methane, artificial neural network, deep learning, machine learning, transfer learning, greedy layer-wise pretraining

Procedia PDF Downloads 51
3 Efficient Layout-Aware Pretraining for Multimodal Form Understanding

Authors: Armineh Nourbakhsh, Sameena Shah, Carolyn Rose

Abstract:

Layout-aware language models have been used to create multimodal representations for documents that are in image form, achieving relatively high accuracy in document understanding tasks. However, the large number of parameters in the resulting models makes building and using them prohibitive without access to high-performing processing units with large memory capacity. We propose an alternative approach that can create efficient representations without the need for a neural visual backbone. This leads to an 80% reduction in the number of parameters compared to the smallest SOTA model, widely expanding applicability. In addition, our layout embeddings are pre-trained on spatial and visual cues alone and only fused with text embeddings in downstream tasks, which can facilitate applicability to low-resource of multi-lingual domains. Despite using 2.5% of training data, we show competitive performance on two form understanding tasks: semantic labeling and link prediction.

Keywords: layout understanding, form understanding, multimodal document understanding, bias-augmented attention

Procedia PDF Downloads 107
2 Large Neural Networks Learning From Scratch With Very Few Data and Without Explicit Regularization

Authors: Christoph Linse, Thomas Martinetz

Abstract:

Recent findings have shown that Neural Networks generalize also in over-parametrized regimes with zero training error. This is surprising, since it is completely against traditional machine learning wisdom. In our empirical study we fortify these findings in the domain of fine-grained image classification. We show that very large Convolutional Neural Networks with millions of weights do learn with only a handful of training samples and without image augmentation, explicit regularization or pretraining. We train the architectures ResNet018, ResNet101 and VGG19 on subsets of the difficult benchmark datasets Caltech101, CUB_200_2011, FGVCAircraft, Flowers102 and StanfordCars with 100 classes and more, perform a comprehensive comparative study and draw implications for the practical application of CNNs. Finally, we show that VGG19 with 140 million weights learns to distinguish airplanes and motorbikes with up to 95% accuracy using only 20 training samples per class.

Keywords: convolutional neural networks, fine-grained image classification, generalization, image recognition, over-parameterized, small data sets

Procedia PDF Downloads 48
1 JaCoText: A Pretrained Model for Java Code-Text Generation

Authors: Jessica Lopez Espejel, Mahaman Sanoussi Yahaya Alassan, Walid Dahhane, El Hassane Ettifouri

Abstract:

Pretrained transformer-based models have shown high performance in natural language generation tasks. However, a new wave of interest has surged: automatic programming language code generation. This task consists of translating natural language instructions to a source code. Despite the fact that well-known pre-trained models on language generation have achieved good performance in learning programming languages, effort is still needed in automatic code generation. In this paper, we introduce JaCoText, a model based on Transformer neural network. It aims to generate java source code from natural language text. JaCoText leverages the advantages of both natural language and code generation models. More specifically, we study some findings from state of the art and use them to (1) initialize our model from powerful pre-trained models, (2) explore additional pretraining on our java dataset, (3) lead experiments combining the unimodal and bimodal data in training, and (4) scale the input and output length during the fine-tuning of the model. Conducted experiments on CONCODE dataset show that JaCoText achieves new state-of-the-art results.

Keywords: java code generation, natural language processing, sequence-to-sequence models, transformer neural networks

Procedia PDF Downloads 227