submitted 2 days ago* (last edited 1 day ago) by happeningtofry99158@lemmy.world to c/opensource@lemmy.ml

So one of my PDFs has a page number and a link at the bottom of every page. It's around 500 pages, so I don't want to edit it manually. Is there any way I can delete those from all pages of the PDF at once?

Maybe Ghostscript or a Python script can do this?

I also noticed there isn't a PDF community on Lemmy; maybe somebody should create one.

Thanks a lot in advance.

[-] thevoidzero@lemmy.world 3 points 1 day ago

I don't know how comfortable you are writing your own, but a PDF stores its components with coordinates, bounding boxes, etc., so you should be able to automate it with a small script that reads the PDF components directly.
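To make the coordinate idea concrete, here is a minimal sketch of the filtering step only. It assumes some PDF library has already handed you each text span with its bounding box; the `Span` type, the sample data, and the 50-point footer band are all made up for illustration, and the coordinates assume the usual PDF convention of an origin at the bottom-left of the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical record for one piece of text found on a page."""
    page: int    # zero-based page index
    text: str
    x0: float    # bounding box in PDF points, origin assumed bottom-left
    y0: float
    x1: float
    y1: float


def in_footer_band(span: Span, band_height: float = 50.0) -> bool:
    """True when the whole box sits inside the bottom `band_height` points."""
    return span.y1 <= band_height


def split_footers(spans, band_height: float = 50.0):
    """Separate spans into (keep, drop) based on vertical position alone."""
    keep, drop = [], []
    for span in spans:
        (drop if in_footer_band(span, band_height) else keep).append(span)
    return keep, drop


# Made-up spans standing in for what a real extraction step would return.
spans = [
    Span(0, "Chapter 1", 72, 700, 200, 714),
    Span(0, "1", 300, 20, 306, 30),
    Span(0, "https://example.com", 72, 20, 180, 30),
    Span(1, "Body text", 72, 650, 400, 662),
    Span(1, "2", 300, 20, 306, 30),
]
keep, drop = split_footers(spans)
print("would remove:", [s.text for s in drop])
```

Printing the `drop` list first and eyeballing it across a handful of pages is a cheap way to confirm the band height before deleting anything.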

Also try qpdf to convert the PDF into QDF format; then you can open it in a text editor and find the elements you want to remove. Look at a few pages as examples, find the pattern, and do a regex replace. Make sure to keep a copy and check the diff before accepting it.
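A rough sketch of the regex step, under some assumptions: the file was produced with something like `qpdf --qdf --object-streams=disable input.pdf editable.pdf` (file names are placeholders), the footer is drawn as its own `BT ... ET` text object, and its text appears as a readable literal string. Many PDFs encode text in hex or with custom font encodings, in which case the pattern has to match whatever actually shows up in your file. The function name and the sample bytes are invented for illustration.

```python
import re

# One text object: everything from a BT operator up to the next ET,
# plus the trailing newline so no blank line is left behind.
TEXT_OBJECT = re.compile(rb"\bBT\b.*?\bET\b[ \t]*\r?\n?", re.DOTALL)


def strip_text_objects(qdf: bytes, footer: "re.Pattern[bytes]") -> bytes:
    """Drop every BT...ET text object whose body matches `footer`."""
    def replace(match):
        block = match.group(0)
        return b"" if footer.search(block) else block
    return TEXT_OBJECT.sub(replace, qdf)


# Made-up content stream: a heading and a page-number footer.
sample = (
    b"BT\n/F1 12 Tf\n72 700 Td\n(Chapter 1) Tj\nET\n"
    b"BT\n/F1 9 Tf\n72 30 Td\n(Page 1 of 500) Tj\nET\n"
)
footer_pattern = re.compile(rb"\(Page \d+ of \d+\) Tj")
cleaned = strip_text_objects(sample, footer_pattern)
print(cleaned.decode("ascii"))

# For a real file you would read and write bytes, for example:
#   data = open("editable.pdf", "rb").read()
#   open("edited.pdf", "wb").write(strip_text_objects(data, footer_pattern))
```

Hand-edited QDF files end up with stale stream lengths and offsets; qpdf ships a `fix-qdf` tool for that, used as `fix-qdf edited.pdf > fixed.pdf`. Also note that the clickable part of a link is usually a separate annotation object (`/Subtype /Link`) rather than part of the page's content stream, so removing the visible text may leave the clickable area behind.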

this post was submitted on 20 Jun 2025