When I'm writing web scrapers, I mostly just pivot between selenium (when the website is too "fancy" and definitely needs a browser) and plain requests calls (both in conjunction with bs4).

But when reading about scrapers, scrapy is often the first Python package mentioned. What am I missing out on by not using it?

Wats0ns@programming.dev · 2 points · 1 year ago

The huge feature of scrapy is its pipelining system: you scrape a page, pass it to the filtering part, then to the deduplication part, then to the DB, and so on.

Hugely useful when you're scraping and extracting data; if you're only collecting raw pages, I reckon it's less useful.
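
To make that concrete, here's a minimal sketch of what such a pipeline could look like. The `process_item`/`DropItem` interface and the `ITEM_PIPELINES` setting are scrapy's documented API; the field names and the filtering/dedup rules are just illustrative:

```python
# Two small scrapy item pipelines chained together via ITEM_PIPELINES.
# Field names ("url", "title") are made up for the example.
from scrapy.exceptions import DropItem

class FilterPipeline:
    def process_item(self, item, spider):
        # Drop items missing a title; pass everything else downstream.
        if not item.get("title"):
            raise DropItem("Missing title")
        return item

class DeduplicationPipeline:
    def __init__(self):
        self.seen_urls = set()

    def process_item(self, item, spider):
        # Drop items whose URL we've already processed.
        if item["url"] in self.seen_urls:
            raise DropItem(f"Duplicate item: {item['url']}")
        self.seen_urls.add(item["url"])
        return item

# settings.py -- lower numbers run earlier in the chain
# ITEM_PIPELINES = {
#     "myproject.pipelines.FilterPipeline": 100,
#     "myproject.pipelines.DeduplicationPipeline": 200,
# }
```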

qwertyasdef@programming.dev · 1 point · 1 year ago

Oh shit that sounds useful. I just did a project where I implemented a custom stream class to chain together calls to requests and beautifulsoup.
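
For comparison, one hypothetical way to chain requests and BeautifulSoup by hand (not the commenter's actual code, just a sketch of the idea) is a stack of generators, with each stage streaming pages to the next:

```python
# Hand-rolled fetch -> parse -> extract pipeline using requests and bs4.
# Each stage is a generator, so pages stream through without big lists.
import requests
from bs4 import BeautifulSoup

def fetch(urls):
    for url in urls:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        yield resp.text

def parse(pages):
    for html in pages:
        yield BeautifulSoup(html, "html.parser")

def extract_titles(soups):
    for soup in soups:
        title = soup.find("title")
        if title:
            yield title.get_text(strip=True)

for title in extract_titles(parse(fetch(["https://example.com"]))):
    print(title)
```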

Wats0ns@programming.dev · 2 points · 1 year ago

Yep, try scrapy. It also handles the concurrency of your pipeline items for you, configuration for every part, and so on.
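
For instance, concurrency and politeness are tuned through ordinary project settings rather than hand-rolled code. A small sketch (the setting names are scrapy's documented ones; the values here are arbitrary):

```python
# settings.py -- project-level knobs scrapy reads at startup
CONCURRENT_REQUESTS = 16             # total parallel requests
CONCURRENT_REQUESTS_PER_DOMAIN = 8   # cap per target site
DOWNLOAD_DELAY = 0.5                 # seconds between requests to the same site
AUTOTHROTTLE_ENABLED = True          # back off automatically under load
```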
