
You can do anything at Zombocom (sopuli.xyz)

submitted 3 months ago by Gork@sopuli.xyz to c/programmer_humor@programming.dev



[-] luciole@beehaw.org 3 points 3 months ago* (last edited 3 months ago)

You basically have two options: treat the HTML as a string, or parse it and then process it with higher-level DOM features.

The problem with the second approach is that HTML may look like an XML dialect, but it is actually immensely quirky and tolerant. Moreover, the modern web page is crazy bloated, so mass-processing pages can be surprisingly demanding. And in the end you still need to write custom code to grab the data you're after.

On the other hand, string searching is as lightweight as it gets, and as a scraper you typically don't need to care about document structure anyway.
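
To make the two options concrete, here is a rough Python sketch using only the standard library. The page fragment, the `price` class name, and the function names are all made up for illustration. Note that `html.parser` is an event-based parser, not a full DOM, so it only stands in for the "parse it first" route.

```python
import re
from html.parser import HTMLParser

# Hypothetical page fragment (invented for this example). Note the <p> that is
# never closed: tolerated by browsers, but not well-formed XML.
PAGE = (
    '<html><body>'
    '<div class="item"><span class="price">19.99</span></div>'
    '<p>unclosed paragraph'
    '<div class="item"><span class="price">5.00</span></div>'
    '</body></html>'
)


def prices_by_string_search(html):
    """Option 1: treat the page as a string and grab text after a known marker."""
    return re.findall(r'<span class="price">([^<]*)</span>', html)


class PriceParser(HTMLParser):
    """Option 2: tokenize the page and collect text inside <span class="price">."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data)


def prices_by_parsing(html):
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


print(prices_by_string_search(PAGE))
print(prices_by_parsing(PAGE))
```

Both routes pull the same two values out of this fragment, but the string version is a single line, while the parsed version still needs custom code on top of the parser. The string version also breaks as soon as the markup around the value changes (extra attributes, different quoting), so that is the trade-off you accept.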

[-] yetAnotherUser@lemmy.ca 2 points 3 months ago

That makes a ton of sense. I hadn't thought about the page size yet. Thanks again.

this post was submitted on 13 Dec 2025

504 points (98.3% liked)
