Best way to search files on remote server? (lemmy.zip)

submitted 2 weeks ago* (last edited 2 weeks ago) by First_Thunder@lemmy.zip to c/selfhost@lemmy.ml


Context: my father is a lawyer and therefore has a bajillion PDF files that were digitised and stored on a server. I’ve got an idea for how to run OCR on all of them.

But after that, how can I make them easily searchable? (Keep in mind that, unfortunately, the directory structure carries information needed to classify the files, i.e. you may have a path like clientABC/caseAV1/d.pdf.)


[-] greyfox@lemmy.world 2 points 1 week ago

If you want the search to be flexible, for example handling stemming (i.e. matching words that are pluralized or otherwise inflected), you might want to index the text in Elasticsearch.
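As a rough sketch of what that could look like, the snippet below only builds the JSON bodies you would send to Elasticsearch; it does not talk to a server. The field names `content` and `path` are made up for this example, and the built-in `english` analyzer is an assumption about the language of the documents (swap in another built-in language analyzer if they are not in English).

```python
import json


def build_index_body():
    """Request body for creating an index (field names are hypothetical)."""
    return {
        "mappings": {
            "properties": {
                # The built-in "english" analyzer applies stemming, so a
                # search for "contracts" can also match "contract".
                "content": {"type": "text", "analyzer": "english"},
                # "keyword" stores the path verbatim, so it can be used
                # for exact and prefix filters on the directory structure.
                "path": {"type": "keyword"},
            }
        }
    }


def build_search_body(words, path_prefix=None):
    """Request body for a full-text search, optionally limited to a folder."""
    query = {"match": {"content": words}}
    if path_prefix:
        query = {
            "bool": {
                "must": [query],
                "filter": [{"prefix": {"path": path_prefix}}],
            }
        }
    return {"query": query}


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
    print(json.dumps(build_search_body("contracts", "clientABC/"), indent=2))
```

The first body would go in a `PUT /<index>` request and the second in a `POST /<index>/_search` request. Filtering on a path prefix is one way to keep the client/case folders usable as a search criterion.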

You might run into problems with the field length if these are long documents. A possible solution to that would be putting each page into its own field inside of the document.
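A minimal sketch of the page-per-field idea, assuming the OCR step already gives you a list of page texts for each file. The function name and the `page_N`, `folders` and `filename` fields are invented for illustration; nothing here is a fixed Elasticsearch convention.

```python
from pathlib import PurePosixPath


def build_document(relative_path, pages):
    """Turn one OCR'd PDF into a dict with one field per page.

    relative_path: path below the archive root, e.g. "clientABC/caseAV1/d.pdf"
    pages: list of page texts, in page order
    """
    parts = PurePosixPath(relative_path).parts
    doc = {
        "path": relative_path,
        # Keep the directory levels as separate values so they stay
        # usable for classification (client, case, ...).
        "folders": list(parts[:-1]),
        "filename": parts[-1],
    }
    # One field per page: page_1, page_2, ...
    for number, text in enumerate(pages, start=1):
        doc[f"page_{number}"] = text
    return doc


if __name__ == "__main__":
    example = build_document(
        "clientABC/caseAV1/d.pdf",
        ["Text of the first page", "Text of the second page"],
    )
    for key, value in example.items():
        print(key, "=", value)
```

One thing to watch with this layout: Elasticsearch caps the number of distinct fields in an index (`index.mapping.total_fields.limit`, 1000 by default), so very long documents may need that setting raised.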

If this is for a non-technical user to search, the Kibana interface should be relatively easy for anyone to use.

this post was submitted on 19 Sep 2025

5 points (100.0% liked)
