Keras Text Vectorization Layer: Configure, Adapt, Use, Save, and Upload: Part C Configure & Adapt

Published: 01 January 1970
Channel: Murat Karakaya Akademi


Google Colab: https://colab.research.google.com/dri...
GitHub Pages: https://kmkarakaya.github.io/Deep-Lea...
Medium: / deep-learning-with
tf.keras.layers: Understand & Use
Classification with Keras & Tensorflow: Classification with Keras / Tensorflow
Word Embedding in Keras
TensorFlow Data Pipeline: TensorFlow Data Pipeline: How to Desi...
Keras Tutorials
Seq2Seq Learning Tutorials
All the Deep Learning tutorials in English: https://www.youtube.com/c/MuratKaraka...

Author: Murat Karakaya
Date created: 05 Oct 2021
Last modified: 24 Oct 2021
Description: This is a new part of the "tf.keras.layers: Understand & Use" / "tf.keras.layers: Anla ve Kullan" series. In this part, we will build, adapt, use, save, and upload the Keras TextVectorization layer.

We will download a Kaggle dataset that contains more than 400K reviews across 32 topics. In this tutorial, we will use this dataset for a multi-class text classification task.

Our main aim is to learn how to effectively use the Keras TextVectorization layer in practice.

The tutorial has 5 parts:

PART A: BACKGROUND
PART B: KNOW THE DATA
PART C: USE KERAS TEXT VECTORIZATION LAYER
PART D: BUILD AN END-TO-END MODEL
PART E: SUMMARY
By the end of this tutorial, we will have covered:

What a Keras TextVectorization layer is
Why we need a Keras TextVectorization layer in Natural Language Processing (NLP) tasks
How to employ a Keras TextVectorization layer in Text Preprocessing
How to integrate a Keras TextVectorization layer into a trained model
How to save and load a Keras TextVectorization layer and a model with a Keras TextVectorization layer
How to integrate a Keras TextVectorization layer with TensorFlow Data Pipeline API (tf.data)
How to design, train, save and load an End-to-End model using Keras TextVectorization layer

What is Text Vectorization?
Text Vectorization is the process of converting text into a numerical representation.

Many techniques have been proposed to convert text into a numerical form, such as:

One-hot Encoding (OHE)
Count Vectorizer
Bag-of-Words (BOW)
N-grams
Term Frequency
Term Frequency-Inverse Document Frequency (TF-IDF)
Embeddings
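To make a few of these techniques concrete, here is a minimal pure-Python sketch on a hypothetical three-sentence corpus. It assumes plain whitespace tokenization and uses one common TF-IDF variant (raw count times log(N / df)); libraries, including Keras, may use a different weighting formula. Embeddings are learned by a model and are not shown here.

```python
import math
from collections import Counter

# Hypothetical toy corpus
corpus = [
    "the movie was great",
    "the movie was boring",
    "great acting great story",
]

# Whitespace tokenization (an assumption; real pipelines standardize text first)
docs = [doc.split() for doc in corpus]

# Vocabulary: every distinct token gets an integer index (sorted for determinism)
vocab = sorted({tok for doc in docs for tok in doc})
index = {tok: i for i, tok in enumerate(vocab)}

def one_hot(token):
    """One-hot encoding: a vector with a single 1 at the token's index."""
    vec = [0] * len(vocab)
    vec[index[token]] = 1
    return vec

def count_vector(doc):
    """Count vectorizer / Bag-of-Words: how often each vocabulary token occurs."""
    counts = Counter(doc)
    return [counts[tok] for tok in vocab]

def bigrams(doc):
    """Word-level N-grams with N = 2."""
    return [" ".join(pair) for pair in zip(doc, doc[1:])]

# Document frequency: in how many documents each token appears
df = {tok: sum(tok in doc for doc in docs) for tok in vocab}
n_docs = len(docs)

def tf_idf_vector(doc):
    """TF-IDF: term frequency (raw count) weighted by log(N / df)."""
    counts = Counter(doc)
    return [counts[tok] * math.log(n_docs / df[tok]) for tok in vocab]

print(vocab)                 # ['acting', 'boring', 'great', 'movie', 'story', 'the', 'was']
print(one_hot("movie"))      # [0, 0, 0, 1, 0, 0, 0]
print(count_vector(docs[2])) # [1, 0, 2, 0, 1, 0, 0]
print(bigrams(docs[0]))      # ['the movie', 'movie was', 'was great']
print([round(x, 3) for x in tf_idf_vector(docs[0])])
```

Note how TF-IDF gives zero weight to tokens absent from a document and down-weights tokens that appear in many documents.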
What is Text Preprocessing?
Text preprocessing is traditionally an important step for natural language processing (NLP) tasks. It transforms text into a more suitable form so that Machine Learning or Deep Learning algorithms can perform better.

The main phases of text preprocessing are:

Noise Removal (cleaning) – removing unnecessary characters and formatting
Tokenization – breaking multi-word strings into smaller components
Normalization – a catch-all term for further processing of the text; this includes stemming and lemmatization
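The three phases are usually chained in this order. The sketch below is a minimal illustration of that chaining, assuming that "noise" means punctuation only and that normalization means lowercasing only; the function names are hypothetical.

```python
import re

def remove_noise(text):
    # Noise removal: replace punctuation and symbols with spaces.
    # \w is Unicode-aware for str patterns, so non-ASCII letters are kept.
    return re.sub(r"[^\w\s]", " ", text)

def tokenize(text):
    """Tokenization: split the cleaned string on whitespace."""
    return text.split()

def normalize(tokens):
    """Normalization: lowercasing only in this sketch."""
    return [token.lower() for token in tokens]

def preprocess(text):
    """Apply the phases in order: clean, tokenize, normalize."""
    return normalize(tokenize(remove_noise(text)))

print(preprocess("Great phone, FAST delivery!"))  # ['great', 'phone', 'fast', 'delivery']
```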
Some of the common Noise Removal (cleaning) steps are:

Removal of punctuation
Removal of frequent words
Removal of rare words
Removal of emojis
Removal of emoticons
Conversion of emoticons to words
Conversion of emojis to words
Removal of URLs
Removal of HTML tags
Chat words conversion
Spelling correction
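Several of these cleaning steps are plain pattern matching. The sketch below covers URLs, HTML tags, emojis, punctuation, chat words and frequent/rare words. The regular expressions are simplified assumptions (the emoji pattern covers only the main pictograph blocks, not every emoji), and the chat-word table is a hypothetical example.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
# Rough emoji ranges (an assumption: misc. symbols/pictographs and dingbats only)
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Hypothetical chat-word dictionary
CHAT_WORDS = {"brb": "be right back", "imo": "in my opinion"}

def clean(text):
    """Remove URLs, HTML tags, emojis and punctuation, then expand chat words."""
    text = URL_RE.sub(" ", text)
    text = HTML_TAG_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation
    words = [CHAT_WORDS.get(w.lower(), w) for w in text.split()]
    return " ".join(words)

def filter_by_frequency(docs, min_count=2, max_count=None):
    """Drop rare words (corpus count < min_count) and, if max_count is given,
    frequent words (corpus count > max_count)."""
    counts = Counter(word for doc in docs for word in doc.split())

    def keep(word):
        if counts[word] < min_count:
            return False
        return max_count is None or counts[word] <= max_count

    return [" ".join(w for w in doc.split() if keep(w)) for doc in docs]

print(clean("IMO <b>great</b> \U0001F44D see https://example.com!"))
# in my opinion great see
```

Which words count as "rare" or "frequent" depends on the corpus, so the thresholds are tuning choices rather than fixed values.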
Tokenization is about splitting strings of text into smaller pieces, or “tokens”. Paragraphs can be tokenized into sentences and sentences can be tokenized into words.
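Both levels of tokenization can be sketched with regular expressions. This is a naive illustration under stated assumptions: sentences end at ".", "!" or "?" followed by whitespace (so abbreviations such as "Dr." would be split wrongly), and punctuation marks become their own word tokens.

```python
import re

def sentence_tokenize(paragraph):
    """Split a paragraph after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]

def word_tokenize(sentence):
    """Words are runs of word characters; each punctuation mark is a separate token."""
    return re.findall(r"\w+|[^\w\s]", sentence)

paragraph = "The battery is great. Delivery was slow! Would I buy again?"
sentences = sentence_tokenize(paragraph)
print(sentences)                    # ['The battery is great.', 'Delivery was slow!', 'Would I buy again?']
print(word_tokenize(sentences[1]))  # ['Delivery', 'was', 'slow', '!']
```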

Noise removal and tokenization are staples of almost all text preprocessing pipelines. However, some data may require further processing through text normalization. Some of the common normalization steps are:

Upper- or lowercasing
Stopword removal
Stemming – bluntly removing prefixes and suffixes from a word
Lemmatization – replacing a single-word token with its dictionary base form (its lemma)
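The difference between stemming and lemmatization shows up clearly on a few words. The sketch below uses a tiny hypothetical stopword list, a crude suffix-chopping stemmer (a toy, not the Porter or Snowball algorithm) and a hypothetical lemma lookup table; real lemmatizers rely on a full dictionary and part-of-speech information.

```python
# Tiny illustrative stopword list (real lists are much longer)
STOPWORDS = {"the", "a", "an", "is", "was", "and", "of", "to"}
# Crude suffix list for a toy stemmer
SUFFIXES = ("ing", "ed", "es", "s")
# Hypothetical lemma lookup table
LEMMAS = {"mice": "mouse", "better": "good", "were": "be", "studies": "study"}

def lowercase(tokens):
    return [t.lower() for t in tokens]

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def naive_stem(word):
    """Chop the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Replace a word with its dictionary form if the table knows it."""
    return LEMMAS.get(word, word)

tokens = remove_stopwords(lowercase(["The", "Studies", "WERE", "Better"]))
print(tokens)                           # ['studies', 'were', 'better']
print([naive_stem(t) for t in tokens])  # ['studi', 'were', 'better']
print([lemmatize(t) for t in tokens])   # ['study', 'be', 'good']
```

The stemmer can produce non-words such as "studi", while lemmatization maps to real dictionary forms; that trade-off (speed and simplicity versus linguistic accuracy) is the usual reason to pick one over the other.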
What is the Keras TextVectorization layer?
The tf.keras.layers.TextVectorization layer is one of the Keras Preprocessing layers.

There are important advantages to using the Keras Preprocessing layers:

You can build Keras-native input processing pipelines. These input processing pipelines can be used as independent preprocessing code in non-Keras workflows, combined directly with Keras models, and exported as part of a Keras SavedModel.

You can build and export models that are truly end-to-end: models that accept raw data (images or raw structured data) as input; models that handle feature normalization or feature value indexing on their own.
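A minimal sketch of such an end-to-end text model follows, assuming TensorFlow 2.6 or later (where `tf.keras.layers.TextVectorization` is available outside the `experimental` namespace). The toy texts and labels are hypothetical; the point is only that the model's input is a raw string and the vectorization happens inside the model.

```python
import tensorflow as tf

# Hypothetical toy data: 1.0 = positive review, 0.0 = negative review
texts = ["great phone", "terrible battery", "great battery life", "terrible screen"]
labels = tf.constant([1.0, 0.0, 1.0, 0.0])

vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=1000, output_mode="int", output_sequence_length=8
)
vectorizer.adapt(tf.constant(texts))  # learn the vocabulary from raw strings

# The model input is a raw string: preprocessing is part of the model itself
inputs = tf.keras.Input(shape=(1,), dtype=tf.string)
x = vectorizer(inputs)
x = tf.keras.layers.Embedding(input_dim=vectorizer.vocabulary_size(), output_dim=16)(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(tf.constant([[t] for t in texts]), labels, epochs=3, verbose=0)

# Raw text in, probability out
probs = model.predict(tf.constant([["great screen"]]), verbose=0)
print(probs.shape)  # (1, 1)
```

Because the vocabulary lives inside the model, serving code does not need to reproduce the training-time preprocessing separately.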

Today, we will deal with the tf.keras.layers.TextVectorization layer, which:
turns raw strings into an encoded representation, and
produces a representation that can be read by an Embedding layer or a Dense layer.
In this way, a single layer performs both:
Text Preprocessing and
Text Vectorization
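The configure, adapt and use steps can be sketched as follows, assuming TensorFlow 2.6 or later and hypothetical toy reviews. With the default settings the layer lowercases, strips punctuation and splits on whitespace. In `"int"` mode it emits one id per token for an Embedding layer; in `"tf_idf"` mode it emits one fixed-size vector per text for a Dense layer.

```python
import tensorflow as tf

# Hypothetical toy reviews
reviews = tf.constant([
    "The battery is great",
    "the battery is bad",
    "the screen is GREAT!",
])

# Configure
int_vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=20,
    output_mode="int",         # one integer id per token
    output_sequence_length=6,  # pad or truncate every sequence to 6 ids
)
# Adapt: build the vocabulary from the data
int_vectorizer.adapt(reviews)
vocab = int_vectorizer.get_vocabulary()
print(vocab[:2])  # padding token '' and out-of-vocabulary token '[UNK]'

# Use: "terrible" was never seen, so it maps to the [UNK] id (1); the tail is padded with 0
ids = int_vectorizer(tf.constant(["the battery is terrible"]))
print(ids.numpy())

# Integer ids feed an Embedding layer
embedded = tf.keras.layers.Embedding(
    input_dim=int_vectorizer.vocabulary_size(), output_dim=8
)(ids)
print(embedded.shape)  # (1, 6, 8)

# A TF-IDF vectorizer yields one vector per text, suitable for a Dense layer
tfidf_vectorizer = tf.keras.layers.TextVectorization(max_tokens=20, output_mode="tf_idf")
tfidf_vectorizer.adapt(reviews)
features = tfidf_vectorizer(tf.constant(["the battery is terrible"]))
dense_out = tf.keras.layers.Dense(4)(features)
print(features.shape, dense_out.shape)
```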
