66
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
this post was submitted on 04 Nov 2023
66 points (92.3% liked)
Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ
54746 readers
670 users here now
⚓ Dedicated to the discussion of digital piracy, including ethical problems and legal advancements.
Rules • Full Version
1. Posts must be related to the discussion of digital piracy
2. Don't request invites, trade, sell, or self-promote
3. Don't request or link to specific pirated titles, including DMs
4. Don't submit low-quality posts, be entitled, or harass others
Loot, Pillage, & Plunder
📜 c/Piracy Wiki (Community Edition):
💰 Please help cover server costs.
Ko-fi | Liberapay |
founded 1 year ago
MODERATORS
Okay so, PDF documents are actually already “a collection of images” basically. This website is clearly trying to make it an extra step harder by loading the images individually as you browse the document. You could manually save/download all the images and use a tool to turn it back into a pdf. I haven’t heard of a tool that does this automatically, but it should be possible for a web scraper to make the GET requests sequentially then stitch the pdf back together.
I would go this route as well. As a developer this sounds easy enough. It you don't get vertical sequences of images, but instead a grid of images, then I would apply traditional image stitching techniques. There are tons of libraries for that on github.
Just as a tiny nitpick: PDFs can be just a collection of images. If there's text I would hope that they're embedded as text or vectors, which is what would usually happen if you export a document from a program (that isn't completely bs). So texts / vector graphics etc should not be images usually (unless you scan a document as a PDF - which I hate with a passion but can see that people prefer a single PDF over a collection of image files). Even then you could OCR the imges / PDF to make it searchable etc. Anyway in this case it seems to be a collection of images indeed ^__^