Access all tutorials at https://www.muratkarakaya.net
Code: https://colab.research.google.com/dri...
Seq2Seq playlist: • Seq2Seq Learning Tutorials
Welcome to Part D of the Seq2Seq Learning Tutorial Series. In this tutorial, we will design an Encoder Decoder model to be trained with "Teacher Forcing" to solve the sample Seq2Seq problem introduced in Part A.
Teacher forcing: during training, the decoder receives the correct output from the training set as the previously decoded result when predicting the next output. During inference, however, the decoder receives its own previously decoded result to predict the next output. Teacher forcing improves the training process.
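The data preparation behind teacher forcing can be sketched with a toy example. Here the target sequence, the token ids, and the start-of-sequence marker (0) are all hypothetical, just to show the one-step shift:

```python
import numpy as np

# Hypothetical target sequence from the training set (token ids);
# 0 is assumed to be a reserved start-of-sequence token.
y = np.array([4, 7, 2, 9])

# With teacher forcing, the decoder's input at step t is the *true*
# token from step t-1, not the decoder's own previous prediction.
decoder_input = np.concatenate(([0], y[:-1]))  # target shifted right by one
decoder_target = y                             # what the decoder must predict

print(decoder_input)   # [0 4 7 2]
print(decoder_target)  # [4 7 2 9]
```

During inference no ground truth is available, so `decoder_input` at step t is replaced by the model's own prediction from step t-1.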
Murat Karakaya Akademi
We will use the LSTM layer in Keras as the Recurrent Neural Network.
Our aim is to code an Encoder Decoder model trained with Teacher Forcing.
We are given a parallel data set including X (input) and y (output) such that X[i] and y[i] have some relationship. In real life (like Machine Language Translation, Image Captioning, etc.), we are given (or build) a parallel dataset: X sequences and corresponding y sequences. Before starting, you need to know:
Python
Keras/TF
Deep Neural Networks
Recurrent Neural Network concepts
LSTM parameters and outputs
Keras Functional API
BASIC ENCODER DECODER ARCHITECTURE/DESIGN
How a Basic Encoder Decoder Model solves a Seq2Seq Learning Problem: conceptually, we have two main components working together in the model. The Encoder encodes the input sequence into a new representation, called the Context/Thought Vector. The Decoder decodes the Context/Thought Vector into the output sequence. Note 1: There are other proposed methods for solving seq2seq problems, such as convolutional models or reinforcement learning methods. Note 2: In this tutorial we focus on using Recurrent Neural Networks in the Encoder-Decoder architecture. We will use LSTM as the Recurrent Neural Network. Key Concepts
Training: During training, we train the encoder and decoder so that they work together to create a context (representation) between the input and the output
Inference (Prediction): After learning how to create the context (representation), they can work together to predict the output
Encode all, decode one at a time: Mostly, the encoder reads the entire input sequence and creates a context (representation) vector. The decoder uses this context (representation) vector and the previously decoded result to create the output step by step.
LSTM has 3 important parameters (for the time being!)
units: Positive integer, dimensionality of the output space
return_sequences: Boolean, whether to return the full sequence of outputs or only the last output in the output sequence. Default: False.
return_state: Boolean, whether to return the last state in addition to the output. Default: False.
The first parameter (units) indicates the dimension of the output vector/matrix.
The last 2 parameters (return_sequences and return_state) determine what the LSTM layer outputs. LSTM can return 4 different sets of results/states according to the given parameters:
Default: Last Hidden State (Hidden State of the last time step)
return_sequences=True: Hidden States of all time steps
return_state=True: Last Hidden State, plus the last Hidden State (again) and the last Cell State
return_sequences=True and return_state=True: Hidden States of all time steps, plus the last Hidden State and the last Cell State
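The four output configurations can be checked directly in Keras. The batch size, sequence length, feature count, and units below are arbitrary numbers chosen for illustration:

```python
import numpy as np
from tensorflow.keras.layers import LSTM

# Toy input: batch of 1 sequence, 3 time steps, 2 features per step
x = np.random.rand(1, 3, 2).astype("float32")
units = 4

# 1) Default: only the hidden state of the last time step -> shape (1, 4)
out = LSTM(units)(x)

# 2) return_sequences=True: hidden state at every time step -> shape (1, 3, 4)
seq = LSTM(units, return_sequences=True)(x)

# 3) return_state=True: last output plus last hidden & cell states
#    -> three tensors of shape (1, 4); the first two are identical here
out2, h, c = LSTM(units, return_state=True)(x)

# 4) Both: full sequence plus last hidden & cell states
seq2, h2, c2 = LSTM(units, return_sequences=True, return_state=True)(x)

print(out.shape, seq.shape, h.shape, c.shape)
```

Note that in case 3 the "output" and the last hidden state carry the same values, which is why the tutorial later ignores the first of the two.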
IMPORTANT: USE OF FUNCTIONAL KERAS API:
In order to implement the Encoder-Decoder approach, we will use the Keras Functional API to create the train & inference models. Thus, ensure that you are familiar with the Keras Functional API.
Decide the context (latent) vector dimension
Actually, it is the units parameter of the LSTM layer in Keras.
As the context vector is a condensed representation of the whole input sequence, we mostly prefer a large dimension.
We can increase the context (latent) vector dimension in 2 ways:
increase the number of units in the encoder LSTM
and/or increase the number of encoder LSTM layers
For the sake of simplicity, we use a single LSTM layer in the encoder and the decoder for the time being
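The two options above can be sketched as follows; all the layer sizes here are illustrative, not values from the tutorial:

```python
from tensorflow.keras.layers import Input, LSTM

# Variable-length input sequence with 8 features per time step (assumed)
inp = Input(shape=(None, 8))

# (a) A single encoder LSTM with more units -> larger state vectors
_, h, c = LSTM(300, return_state=True)(inp)          # states of size 300

# (b) Stacked encoder LSTMs: each layer has its own hidden/cell states,
#     so the combined context grows with the number of layers
x = LSTM(150, return_sequences=True)(inp)            # feeds the next layer
_, h2, c2 = LSTM(150, return_state=True)(x)          # states of size 150
```

With a stack, each layer's states can be passed to the corresponding decoder layer; with a single wide layer there is just one pair of states, which keeps the model (and this tutorial) simpler.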
Since we will receive the last hidden state twice, we can ignore the first copy (which is, in fact, generally considered the output of the LSTM).
In other words, we ignore the output of the encoder LSTM but use its last Hidden and Cell states.
By using the context vector, we will set the initial states of the decoder LSTM.
That is, the decoder will start to function with the last states of the encoder.
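Putting the pieces together, a minimal teacher-forced training model can be sketched with the Functional API. The latent dimension and vocabulary size are assumptions for the sketch, not the tutorial's actual values:

```python
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

latent_dim = 16    # context vector dimension (assumed)
n_features = 10    # one-hot token size (assumed)

# Encoder: discard the per-step output, keep only the last states
enc_inputs = Input(shape=(None, n_features))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_inputs)

# Decoder: receives the teacher-forced (shifted) target sequence and is
# initialised with the encoder's last hidden and cell states
dec_inputs = Input(shape=(None, n_features))
dec_seq, _, _ = LSTM(latent_dim, return_sequences=True,
                     return_state=True)(dec_inputs,
                                        initial_state=[state_h, state_c])
dec_outputs = Dense(n_features, activation="softmax")(dec_seq)

model = Model([enc_inputs, dec_inputs], dec_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
model.summary()
```

For inference, the same trained layers are reassembled into separate encoder and inference-decoder models, since the decoder must then consume its own predictions one step at a time.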
The video SEQUENCE-TO-SEQUENCE LEARNING PART D: CODING ENCODER DECODER MODEL WITH TEACHER FORCING was published by Murat Karakaya Akademi on 15 November 2020.