Scrapy-Playwright: How To Scrape Dynamic JS Websites (2022)

Published: 09 September 2022
on channel: ScrapeOps

In this video, we go through how to scrape data from JavaScript-rendered websites using Scrapy Playwright.

We cover:
How To Install Scrapy Playwright
How To Use Scrapy Playwright In Your Spiders
How To Wait For The Page To Load
How To Scrape Multiple Pages
How To Scroll The Page Elements With Scrapy Playwright
How To Take Screenshots With Scrapy Playwright
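
As a rough sketch of the install and setup step, the settings below are the ones the scrapy-playwright README asks you to add to your project's settings.py. The video may set them up slightly differently, so treat this as a starting point rather than the exact code shown on screen. Installation is normally `pip install scrapy-playwright` followed by `playwright install` to download the browsers.

```python
# settings.py (fragment): route HTTP and HTTPS downloads through Playwright.
# These values follow the scrapy-playwright README; they are an assumption
# about the video's project, not copied from it.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

Only requests that carry `"playwright": True` in their meta are rendered in the browser; every other request still goes through Scrapy's regular downloader.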

The codebase we use in this video can be cloned from here:
https://github.com/python-scrapy-play...

The article you can read while following this video can be found here: https://scrapeops.io/python-scrapy-pl...

***** RUNNING ON WINDOWS! ******
As of writing this guide, Scrapy Playwright doesn't work on Windows. However, it is possible to run it with WSL (Windows Subsystem for Linux).

This is a good video tutorial to check out if you need to install WSL on your Windows machine:
   • Linux Terminal & GUI Inside of Window...  


00:00 - Intro
01:10 - Cloning & Installing the Scrapy Project
02:18 - Adding the settings needed for Scrapy Playwright
02:52 - Adding the scraping code to our Spider
08:50 - Waiting for specific page elements (CSS selectors) to finish loading
11:54 - Scraping multiple pages by clicking through to the next page
15:47 - Scraping a page with infinite scroll using Playwright
18:11 - Taking a screenshot while scraping with Playwright
20:13 - Outro

