
Anyone have recommendations for how I can scrap a website and extract unique names, such as product names?

I was thinking of using some website scrapping tool, then a local LLM to find unique product names.
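If the names sit in consistent markup, the extraction step can be done deterministically before (or instead of) handing text to a local LLM. Below is a minimal sketch using only Python's standard library. It assumes the product names are in elements with a CSS class called `product-name`; that class, the helper names `fetch_html` and `extract_unique_names`, and the sample markup are all hypothetical, so check the real page's HTML first. Names are de-duplicated in first-seen order with `dict.fromkeys`.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Elements that never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class NameCollector(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class: str) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. arrive decoded
        self.target_class = target_class
        self.depth = 0  # > 0 while inside a matching element
        self.buffer: list[str] = []
        self.names: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a nested tag inside a matching element
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace so "Widget  Pro" and "Widget Pro" dedupe together.
            name = " ".join("".join(self.buffer).split())
            if name:
                self.names.append(name)

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_unique_names(html: str, target_class: str = "product-name") -> list[str]:
    """Return the unique names found in matching elements, in first-seen order."""
    parser = NameCollector(target_class)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.names))


def fetch_html(url: str, user_agent: str) -> str:
    """Download a page, sending an explicit User-Agent header."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><span class="product-name">Widget  Pro</span></li>
  <li><span class="product-name featured">Gadget <b>Mini</b></span></li>
  <li><span class="product-name">Widget Pro</span></li>
</ul>
"""

if __name__ == "__main__":
    print(extract_unique_names(SAMPLE_HTML))
```

If the markup turns out to be inconsistent, the text this pulls out is far smaller than the raw HTML and is a more manageable thing to pass to the local LLM. Third-party parsers such as Beautiful Soup or lxml would shorten the parsing part.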

exu@feditown.com 8 points 1 month ago

Usually you'd have to be the owner or operator of a website to scrap it. I guess hacking into the server and deleting all data would also work.

dudesss@lemmy.ca 0 points 1 month ago* (last edited 1 month ago)

I was thinking of doing it once a day, even if I have to trigger it manually to keep it legal. It would only be for personal use: non-public and non-commercial.

It would save me the time of manually copying the HTML over to an LLM or something.

hendrik@palaver.p3x.de 5 points 1 month ago* (last edited 1 month ago)

Just read the robots.txt and obey the rules. Also set your user agent string properly. We've had crawlers on the internet forever, and robots.txt is the long-accepted way for website owners to give or revoke consent. Either you match a Disallow directive and need to stop, or you're completely fine to scrape it.
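A minimal sketch of that check using Python's standard `urllib.robotparser`. The user agent string, the shop URL and the example robots.txt are placeholders, and `robots_url_for`, `load_rules` and `may_scrape` are hypothetical helper names. Note that `RobotFileParser.read()` fetches robots.txt with urllib's default User-Agent, not the one you later pass to `can_fetch`.

```python
from typing import Optional
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical identifier: replace with your own name and a real contact URL or address.
USER_AGENT = "my-product-scraper/0.1 (+https://example.com/contact)"


def robots_url_for(page_url: str) -> str:
    """robots.txt always lives at the root of the scheme + host."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def load_rules(page_url: str, robots_txt: Optional[str] = None) -> RobotFileParser:
    """Load the site's rules, either over the network or from a string (for testing)."""
    rp = RobotFileParser(robots_url_for(page_url))
    if robots_txt is None:
        rp.read()  # network fetch; uses urllib's default User-Agent
    else:
        rp.parse(robots_txt.splitlines())
    return rp


def may_scrape(rp: RobotFileParser, page_url: str, user_agent: str = USER_AGENT) -> bool:
    """True unless a Disallow rule that applies to this user agent matches the URL."""
    return rp.can_fetch(user_agent, page_url)


# Hypothetical robots.txt for illustration.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /cart/
Crawl-delay: 10
"""

if __name__ == "__main__":
    rules = load_rules("https://shop.example.com/products", EXAMPLE_ROBOTS)
    for url in ("https://shop.example.com/products",
                "https://shop.example.com/cart/checkout"):
        print(url, "allowed" if may_scrape(rules, url) else "disallowed")
    print("crawl delay:", rules.crawl_delay(USER_AGENT))
```

`Crawl-delay` is a widely used but non-standard extension (RFC 9309 does not define it). Honouring it costs nothing for a once-a-day job. Python's parser applies the first matching Allow/Disallow line in file order, while RFC 9309 asks for the most specific (longest) match, so results can differ on files that mix the two.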

dudesss@lemmy.ca 2 points 1 month ago

Neat, I've never heard of these.

exu@feditown.com 4 points 1 month ago

I was joking about your use of "scrap" and "scrapping", as in to remove or to cancel :)

Web scraping only has one p.

this post was submitted on 07 May 2026
-15 points (14.3% liked)

Programming

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

  • Follow the programming.dev instance rules
  • Keep content related to programming in some way
  • If you're posting long videos try to add in some form of tldr for those who don't want to watch videos

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev

founded 3 years ago