Character Level Text Generation with an LSTM Based Language Model

Published: 01 January 1970
Channel: Murat Karakaya Akademi


Access all tutorials at https://www.muratkarakaya.net
Code: https://colab.research.google.com/dri...
Text Generation Playlist: Text Generation in Deep Learning with...
TensorFlow Input Pipeline Playlist: TensorFlow Data Pipeline: How to Desi...
All About LSTM playlist: All About LSTM

Character Level Text Generation with an LSTM Model
This tutorial is the fifth part of the "Text Generation in Deep Learning with Tensorflow & Keras" series, which covers text generation topics with sample implementations in Python, TensorFlow, and Keras. Here, we focus on building a Language Model with the Keras LSTM layer for character-level text generation. First, we will download a sample corpus (a text file). After opening the file, we will apply the TensorFlow input pipeline developed in Part B to prepare the training dataset: the text is preprocessed and split into input character sequences (X) and output characters (y). Then, we will design an LSTM-based Language Model and train it on the training set. Finally, we will apply the sampling methods implemented in Part D to generate text and observe how each method affects the output. In the end, we will have a trained LSTM-based Language Model for character-level text generation with three sampling methods.

If you would like to learn more about Deep Learning with practical coding examples, please subscribe to the Murat Karakaya Akademi YouTube channel or follow my blog on Medium.

You can access this Colab Notebook using the link given in the video description below.

If you are ready, let's get started!
Text Generation in Deep Learning with Tensorflow & Keras Series:
Part A: Fundamentals

Part B: Tensorflow Data Pipeline for Character Level Text Generation

Part C: Tensorflow Data Pipeline for Word Level Text Generation

Part D: Sampling in Text Generation

Part E: Recurrent Neural Network (LSTM) Model for Character Level Text Generation

Part F: Encoder-Decoder Model for Character Level Text Generation

Part G: Recurrent Neural Network (LSTM) Model for Word Level Text Generation

Part H: Encoder-Decoder Model for Word Level Text Generation

You can watch all these parts on the Murat Karakaya Akademi channel on YouTube in ENGLISH or TURKISH.

I assume that you have already watched the previous parts. Reviewing them first will help you get the most out of this one.

What is Character-Level Text Generation?
A Language Model can be trained to generate text character by character. In this case, each input and output token is a character, and the Language Model outputs a conditional probability distribution over the character set, given the preceding characters.
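
To make "a conditional probability distribution over the character set" concrete, here is a minimal sketch. It is not the LSTM model of this tutorial: as a stand-in, it assumes a simple count-based model that conditions only on the single previous character. The function names and the tiny corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
import random

def fit_bigram_counts(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev_char, next_char in zip(text, text[1:]):
        counts[prev_char][next_char] += 1
    return counts

def next_char_distribution(counts, prev_char, vocab):
    """Return P(next char | prev_char) over the whole character set."""
    total = sum(counts[prev_char].values())
    if total == 0:
        # Unseen context: fall back to a uniform distribution.
        return {ch: 1.0 / len(vocab) for ch in vocab}
    return {ch: counts[prev_char][ch] / total for ch in vocab}

def generate(counts, vocab, seed_char, length, rng):
    """Generate text one character at a time by sampling each next character."""
    out = [seed_char]
    for _ in range(length):
        dist = next_char_distribution(counts, out[-1], vocab)
        chars = list(dist)
        weights = [dist[ch] for ch in chars]
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)

# Hypothetical toy corpus, used only to show the mechanics.
corpus = "hello world, hello keras"
vocab = sorted(set(corpus))
counts = fit_bigram_counts(corpus)
dist = next_char_distribution(counts, "l", vocab)
sample = generate(counts, vocab, "h", 20, random.Random(0))
```

An LSTM-based Language Model plays the same role as `next_char_distribution` here, except that it conditions on a longer input sequence and learns the probabilities by training instead of counting.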
1. BUILD A TENSORFLOW INPUT PIPELINE
For more information, please refer to Part B: Tensorflow Data Pipeline for Character Level Text Generation on YouTube (ENGLISH / TURKISH) or Medium.

What is a Data Pipeline?
A Data Pipeline is an automated process that extracts, transforms, combines, validates, and loads data for further analysis and visualization.

It speeds up the end-to-end flow of data by reducing errors and relieving bottlenecks and latency.

It can process multiple data streams at once.
In short, it is an absolute necessity for today’s data-driven solutions.
If you are not familiar with data pipelines, you can check my tutorials in English or Turkish.
What will we do in this Text Data Pipeline?
We will create a data pipeline to prepare the training data for a character-level text generator. The pipeline will:
convert the text into a sequence of characters
remove unwanted characters such as punctuation, HTML tags, extra whitespace, etc.
generate input (X) and output (y) pairs as character sequences
cache, prefetch, and batch the training data for performance
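
The pipeline steps listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the tf.data pipeline from Part B: the helper names (`clean_text`, `make_pairs`, `batch`), the cleaning rules, and the sequence length are all hypothetical. Caching and prefetching are left out because they affect performance only, not the resulting data.

```python
import re
import string

def clean_text(raw):
    """Lowercase, drop HTML tags and punctuation, collapse whitespace."""
    text = raw.lower()
    # Remove HTML tags first, because "<" and ">" are punctuation too.
    text = re.sub(r"<[^>]+>", " ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def make_pairs(text, seq_len):
    """Slide a window over the text: X is seq_len characters, y is the next one."""
    chars = list(text)  # the text as a sequence of characters
    return [("".join(chars[i:i + seq_len]), chars[i + seq_len])
            for i in range(len(chars) - seq_len)]

def batch(pairs, batch_size):
    """Group pairs into fixed-size batches (the last batch may be smaller)."""
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]

# Hypothetical raw input, used only to show the mechanics.
raw = "<p>Hello,   World!</p>"
cleaned = clean_text(raw)
pairs = make_pairs(cleaned, seq_len=4)
batches = batch(pairs, batch_size=3)
```

In a tf.data pipeline, the last step would typically use the `cache()`, `prefetch()`, and `batch()` methods of `tf.data.Dataset`, and the characters would be mapped to integer indices before being fed to the model.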
