Is there a better pirates community? (sh.itjust.works)

submitted 2 years ago* (last edited 2 years ago) by Maddison@sh.itjust.works to c/piracy@lemmy.dbzer0.com


[EDIT]: I like this guy, he solved my problem. Basically, he showed me a script that downloads the document as a PDF file, which is the best option there is. The script gave me high-definition images.

MORE IMPORTANTLY, I would still like to know of any other pirate communities, because more is often better. For now, though, my problem is solved for the foreseeable future. Thank you all! [\EDIT]

POST 1 POST 2 POST 3

Unfortunately, even though we can pirate some stuff off the internet, I doubt there is an effective way to pirate view-only Drive documents.

Taking screenshots via a script and passing the resulting PDFs through OCR is the best advice I have heard, both here and over on reddit. I was wondering if any of you know of other pirating communities sailing the high seas. If so, please help a man out by commenting their links. And if you know how to pirate view-only Drive documents (other than the methods mentioned here), please feel free to comment that too.

edit: I am a student, so it might take me some time to reply, but I deeply appreciate your help.

[-] fkn@lemmy.world 19 points 2 years ago

I think the confusion/difficulty comes from the mistaken assumption that the PDF rendering is happening client side. I don't know this for certain since I haven't spent any time trying to break it, but the solutions I have found online lead me to believe that these view-only PDFs are rendered server side and that what is sent to your browser is only an image.

PDF is a weird file format... Sometimes it is just a bunch of JPEG images of pages (scanners that don't do OCR generate PDFs this way), and the PDF isn't anything more than a collection of JPEG images... Or it can be a fully text-based document written in a page-description language (descended from PostScript) that needs to be rendered to be viewed... Or it's effectively a series of printer commands that tell a printer how to print it...
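To make the difference concrete, here is a rough sketch in Python (standard library only) that builds two tiny one-page PDFs by hand: one whose page is drawn with text commands, and one whose page is nothing but a placed image. Everything in it is an assumption for illustration: the helper names (`build_pdf`, `stream`, `classify`) are made up, the image is a single raw gray pixel rather than a real JPEG (a scanner would normally embed JPEG data with `/Filter /DCTDecode`), and the `classify` check is a naive byte search that only works on uncompressed files like these, not on PDFs in general.

```python
def build_pdf(objects: list[bytes]) -> bytes:
    """Serialize object bodies (numbered from 1) into a minimal PDF file."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # mandatory free entry for object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def stream(dictionary: bytes, data: bytes) -> bytes:
    """Wrap raw bytes as a PDF stream object body."""
    header = b"<< " + dictionary + b" /Length %d >>\nstream\n" % len(data)
    return header + data + b"\nendstream"


CATALOG = b"<< /Type /Catalog /Pages 2 0 R >>"
PAGES = b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>"

# Page drawn from text instructions: select a font, move, show a string.
text_pdf = build_pdf([
    CATALOG,
    PAGES,
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 100] /Contents 4 0 R"
    b" /Resources << /Font << /F1 5 0 R >> >> >>",
    stream(b"", b"BT /F1 12 Tf 20 50 Td (Hello) Tj ET"),
    b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
])

# Page that only scales and paints an image; there is no text to search.
image_pdf = build_pdf([
    CATALOG,
    PAGES,
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 100] /Contents 4 0 R"
    b" /Resources << /XObject << /Im1 5 0 R >> >> >>",
    stream(b"", b"q 200 0 0 100 0 0 cm /Im1 Do Q"),
    stream(b"/Type /XObject /Subtype /Image /Width 1 /Height 1"
           b" /ColorSpace /DeviceGray /BitsPerComponent 8", b"\x80"),
])


def classify(pdf: bytes) -> str:
    """Naive heuristic: look for text-showing and image markers in raw bytes."""
    has_text = b"/Font" in pdf and b"Tj" in pdf
    has_image = b"/Subtype /Image" in pdf
    if has_text and has_image:
        return "mixed"
    if has_text:
        return "text"
    if has_image:
        return "image-only"
    return "unknown"


print(classify(text_pdf), classify(image_pdf))
```

The text page still contains the string `(Hello)`, which is why a viewer can search or copy it. The image page contains only pixels, which is why a "dumb scan" needs OCR before any text comes back.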

In any case, PDF viewers are super complex (basically they need to know how to render all of those different kinds of instructions into a standard document for viewing), and often they are implemented as image generators (because basically that's what they are; it's also why some PDF viewers don't have text search or form filling, and part of why PDF editors are so complex). The result is that the Google view of the PDF may not be a PDF document at all, only the server-side rendering of it. That would mean that when the view-only option is enabled... there is no PDF to download. You aren't looking at the PDF file. You are looking at the output of a PDF viewer running on a Google server.

In that case you can't download the PDF... Your best option is to take screen captures of the pages and run OCR on them.

Basically, Google's servers are printing the PDF to your screen. You dumb-scan it, which generates a PDF that is a collection of JPEG images, then you OCR it, which generates a text version of the PDF.

Those JS snippets are literally a dumb scanner for your screen... They make a PDF from a collection of JPEG images.

Kinda nuts.