How to download files in Python with progress bars

Published: 07 December 2021
on channel: Jonathan Soma

A tutorial on using Python to bulk download a list of PDFs from a text file or CSV. Learn to create progress bars in for loops, compare urllib with requests and os.path with Python 3's pathlib, use .splitlines instead of .readlines to read from text files, and pick up a little bonus pandas content. This video is infinitely long and detailed, but you can skip right to the solutions at 01:41 or check the timeline below.

I use Jupyter notebooks but it all works fine with VSCode and Terminal, PyCharm, etc!

Get the notebook and data here: https://gist.github.com/jsoma/08b0dff...

Install the packages we talk about: pip install requests tqdm pandas

SECRET ADMISSION: If we wanted to stick to pathlib we wouldn't have used open('files.txt').read().splitlines(), we would have used something like Path('files.txt').read_text().splitlines() instead. But every example you'll see on the internet uses open(...), so it seemed like the right thing to show you!
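
For comparison, here is a minimal sketch of the pathlib-only way to read the URL list. The helper name read_urls is made up for illustration, and skipping blank lines is an extra assumption rather than something shown in the video.

```python
from pathlib import Path


def read_urls(list_path):
    """Return the non-empty lines of a text file, one URL per line."""
    # read_text() opens, reads, and closes the file in one call
    lines = Path(list_path).read_text().splitlines()
    # splitlines() already drops the newline characters that readlines() keeps;
    # stripping and filtering also skips blank lines (an added assumption)
    return [line.strip() for line in lines if line.strip()]
```

Calling read_urls('files.txt') should give the same list as open('files.txt').read().splitlines(), minus any empty lines.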

REMEMBER: If you're downloading text (CSV files, JSON, etc.) you can use .write_text with response.text instead of .write_bytes with response.content
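
A rough sketch of that choice, assuming a response object from requests.get. The helper name save_response and its as_text flag are hypothetical, and writing as UTF-8 is an assumption that suits most CSV and JSON files.

```python
from pathlib import Path


def save_response(response, destination, as_text=False):
    """Write a requests-style response to disk as text or as raw bytes."""
    destination = Path(destination)
    if as_text:
        # response.text is a decoded str, which is what write_text expects
        destination.write_text(response.text, encoding="utf-8")
    else:
        # response.content is raw bytes, safe for PDFs, images, zips, etc.
        destination.write_bytes(response.content)
    return destination
```

You might call it as save_response(requests.get(url), 'data.csv', as_text=True) for a CSV, and leave as_text off for PDFs.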

TRAGEDY: I think I need some synonyms for the word 'perfect'

====

CHAPTERS

00:00 Intro
01:41 The answers, right up front
03:01 Downloading files with urllib
06:48 Downloading files with requests
10:19 Using os.path to get filenames
12:38 Using pathlib to get filenames
14:32 Reading in text files
19:04 Using tqdm for progress bars
21:42 Saving files into folders
25:40 Downloading files from CSVs
33:28 Saving CSVs as plain text files
35:17 Using wget instead of Python
38:16 Outro

====

THE GOOD METHOD

from pathlib import Path
from tqdm.auto import tqdm
import requests

# One URL per line; .splitlines() drops the trailing newlines
urls = open('files.txt').read().splitlines()
output_dir = Path('downloaded')
output_dir.mkdir(parents=True, exist_ok=True)

for url in tqdm(urls):
    print(url)

    # The last part of the URL becomes the filename
    filename = Path(url).name

    response = requests.get(url)
    output_dir.joinpath(filename).write_bytes(response.content)
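
One caveat with Path(url).name: it keeps any query string, so a URL ending in report.pdf?download=1 produces a filename with ?download=1 in it. Below is a hedged sketch of one way around that using the standard library's urllib.parse. The function name filename_from_url and the "download" fallback are made up for illustration and are not part of the video.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Return the last path segment of a URL, ignoring ?query and #fragment."""
    # urlsplit separates the path from the query string and fragment
    url_path = urlsplit(url).path
    # URLs always use forward slashes, so PurePosixPath behaves the same on every OS;
    # unquote turns escapes like %20 back into spaces
    name = PurePosixPath(unquote(url_path)).name
    # Fall back to a placeholder when the URL has no path to take a name from
    return name or default
```

In the loop above you could then write filename = filename_from_url(url) in place of Path(url).name.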

===

THE SHORTER, LESS FLEXIBLE METHOD

import urllib.request
from os import path

urls = open('files-alt.txt').read().splitlines()

for url in urls:
    # Saves each file into the current folder, named after the end of the URL
    filename = path.basename(url)
    urllib.request.urlretrieve(url, filename)

===

THE PANDAS TECHNIQUE

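from pathlib import Path
from tqdm.auto import tqdm
import pandas as pd
import requests

# Adds .progress_apply to pandas DataFrames
tqdm.pandas()

# Placeholder filename: use your own CSV with 'url' and 'date' columns
df = pd.read_csv('files.csv')

download_dir = Path('downloads')
download_dir.mkdir(parents=True, exist_ok=True)

def download_file(row):
    url = row['url']
    print("Downloading", url)

    # Either keep the filename from the URL...
    # filename = Path(url).name
    # ...or build one from another column (row['date'] + "-minutes.pdf" also works for text dates)
    filename = f"{row['date']}-minutes.pdf"

    response = requests.get(url)
    download_dir.joinpath(filename).write_bytes(response.content)

df.progress_apply(download_file, axis=1)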
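
The chapter list also mentions saving a CSV's URLs as a plain text file, which lets you feed them to the text-file methods or to wget. The video's exact approach isn't reproduced here; this is a sketch that swaps pandas for the standard library's csv module. The function name csv_column_to_text and the default column name "url" are assumptions.

```python
import csv
from pathlib import Path


def csv_column_to_text(csv_path, text_path, column="url"):
    """Copy one column of a CSV into a text file, one value per line."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # DictReader uses the header row as keys; empty cells are skipped
        values = [row[column] for row in csv.DictReader(f) if row[column]]
    Path(text_path).write_text("\n".join(values) + "\n", encoding="utf-8")
    return values
```

For example, csv_column_to_text('files.csv', 'files.txt') would write a files.txt that can be read back with .splitlines().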
