OpenAI Whisper - Fine tune to Lithuanian | step-by-step with Python

Published: 11 January 2023
on channel: Data Science Garage

Fine-tuning OpenAI's Whisper to a different language is simple using Python and Google Colab with a GPU. In this tutorial, I selected the small version of the Whisper model to fine-tune to the Lithuanian language. Whisper can transcribe 96 other languages, and it can also translate from those languages into English.

This video also partly explains the Whisper paper (tokenizer, encoder, decoder, padding, and more) as well as the model itself.

Before starting hands-on with Whisper, you should create your Hugging Face token at: https://huggingface.co/settings/tokens
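In a Colab notebook, that token can then be used to authenticate with the Hugging Face Hub. A minimal sketch of the login step:

```python
# pip install huggingface_hub
from huggingface_hub import notebook_login

# Opens a prompt in the notebook; paste the token created above.
notebook_login()
```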

You can check the language dataset from the Mozilla Foundation used in this tutorial at: https://huggingface.co/datasets/mozil...

Using Whisper for transcription in Python is very easy.
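For example, with the openai-whisper package a basic transcription takes a few lines (a sketch; the audio file name is a placeholder):

```python
# pip install -U openai-whisper  (ffmpeg must also be installed on the system)
import whisper

model = whisper.load_model("small")  # downloads the checkpoint on first use
# "audio.mp3" is a placeholder; omit language= to let Whisper auto-detect it.
result = model.transcribe("audio.mp3", language="lt")
print(result["text"])
```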

Whisper is an automatic speech recognition (ASR) system released by OpenAI and trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.
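The log-Mel front end can be inspected directly with the WhisperFeatureExtractor from Hugging Face Transformers. A small sketch using a dummy signal (the checkpoint name is the public Hub ID for the small model):

```python
import numpy as np
from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")

# One second of silence at Whisper's expected 16 kHz sampling rate.
audio = np.zeros(16_000, dtype=np.float32)

# The extractor pads/truncates to a 30-second window and computes log-Mel features.
inputs = feature_extractor(audio, sampling_rate=16_000, return_tensors="np")
print(inputs.input_features.shape)  # (1, 80, 3000): 80 mel bins x 3000 frames
```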

For this video example, we will use the small version of the Whisper model.
You can check all available versions in the official Whisper GitHub model card: https://github.com/openai/whisper/blo...
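For a quick check, the installed whisper package can also list the checkpoint names it knows about (assuming the openai-whisper pip package from the transcription sketch above):

```python
import whisper

# Prints names such as tiny, base, small, medium, large (and .en variants).
print(whisper.available_models())
```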

The sections are:
0:00 – Hands-on steps
3:14 – Install PyTorch for WhisperAI with CUDA
3:34 – Set GPU Runtime in Google Colab
4:14 – Install ffmpeg package on the machine
4:40 – Install dependencies for fine-tuning
5:35 – Step 0. Log in to Hugging Face
6:09 – Step 1. Loading the dataset
7:16 – Step 2. Prepare Feature Extractor and Tokenizer
8:24 – Step 3. Combine elements with WhisperProcessor
9:06 – Step 4. Prepare data
11:03 – Step 5. Training and Evaluation
11:09 – Step 5.1. Initialize the data collator
12:26 – Step 5.2. Define evaluation metrics
12:56 – Step 5.3. Load a pre-trained Checkpoint
14:13 – Step 5.4. Define the training configuration
15:48 – Step 5.5. Train the Whisper AI model (fine-tune)
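
As a quick reference for Steps 1–5 above, here is a condensed sketch of the Hugging Face fine-tuning pipeline. The dataset ID, the "sentence" column name, and all hyperparameters are assumptions based on the standard Common Voice setup; check the repo linked below for the exact values used in the video.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

import evaluate
import torch
from datasets import Audio, DatasetDict, load_dataset
from transformers import (Seq2SeqTrainer, Seq2SeqTrainingArguments,
                          WhisperForConditionalGeneration, WhisperProcessor)

# Step 1: load the dataset (dataset ID and splits are assumptions; the
# dataset is gated, so the Hugging Face login from Step 0 is required).
common_voice = DatasetDict()
common_voice["train"] = load_dataset("mozilla-foundation/common_voice_11_0",
                                     "lt", split="train+validation")
common_voice["test"] = load_dataset("mozilla-foundation/common_voice_11_0",
                                    "lt", split="test")
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))

# Steps 2-3: the processor bundles the feature extractor and the tokenizer.
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="Lithuanian", task="transcribe")

# Step 4: raw audio -> log-Mel input features, transcript -> label ids.
def prepare_dataset(batch):
    audio = batch["audio"]
    batch["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

common_voice = common_voice.map(
    prepare_dataset, remove_columns=common_voice["train"].column_names)

# Step 5.1: pad audio features and label ids separately within each batch.
@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    processor: Any

    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, torch.Tensor]:
        input_features = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")
        label_features = [{"input_ids": f["labels"]} for f in features]
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")
        # Replace padding with -100 so it is ignored by the loss.
        labels = labels_batch["input_ids"].masked_fill(
            labels_batch.attention_mask.ne(1), -100)
        batch["labels"] = labels
        return batch

# Step 5.2: word error rate (WER) as the evaluation metric.
wer_metric = evaluate.load("wer")

def compute_metrics(pred):
    label_ids = pred.label_ids
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id
    pred_str = processor.tokenizer.batch_decode(pred.predictions, skip_special_tokens=True)
    label_str = processor.tokenizer.batch_decode(label_ids, skip_special_tokens=True)
    return {"wer": 100 * wer_metric.compute(predictions=pred_str, references=label_str)}

# Steps 5.3-5.5: load the checkpoint, configure, and fine-tune.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.config.forced_decoder_ids = None  # language/task come from the processor

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-lt",  # all hyperparameters here are illustrative
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    max_steps=4000,
    fp16=True,
    evaluation_strategy="steps",
    eval_steps=1000,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    train_dataset=common_voice["train"],
    eval_dataset=common_voice["test"],
    data_collator=DataCollatorSpeechSeq2SeqWithPadding(processor=processor),
    compute_metrics=compute_metrics,
    tokenizer=processor.feature_extractor,
)
trainer.train()
```

A custom collator is needed because the two modalities have different padding semantics: audio features are padded by the feature extractor, while label ids are padded by the tokenizer and then masked with -100 so the loss ignores them.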

The GitHub repo with the full code is available at: https://github.com/vb100/whisper_ai_f...

Technical definitions mentioned in the video:
WhisperFeatureExtractor: https://huggingface.co/docs/transform...
WhisperTokenizer: https://huggingface.co/docs/transform...
WhisperProcessor: https://huggingface.co/docs/transform...
WhisperForConditionalGeneration: https://huggingface.co/docs/transform...
Log-Mel Spectrogram: /understanding-the-mel-spectrogram

Subscribe to @DataScienceGarage and get more high-quality content soon!

#whisperai #openai #transcription

