How would you design parallel grep for huge JSONL files?
(lemmy.world)
https://jsonltools.com/what-is-jsonl
First time hearing about it, but JSONL is like CSV with one JSON object per line. So not really structured at the file level.
I don’t see why you couldn’t just use grep on it.
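Right — each line is just text as far as grep cares. A minimal Python sketch of what that amounts to (sample lines and the "level" field are made up for illustration):

```python
import re

# Hypothetical JSONL input: one JSON object per line.
lines = [
    '{"level": "info", "msg": "started"}',
    '{"level": "error", "msg": "disk full"}',
]

# grep-style matching: treat each line as plain text and run a regex over it.
# Fragile by design: key order, whitespace, and escaping can all break the pattern.
pattern = re.compile(r'"level"\s*:\s*"error"')
matches = [line for line in lines if pattern.search(line)]
```

Roughly equivalent to `grep '"level" *: *"error"' file.jsonl`.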
Sorry, I missed that L, and I'd never heard of JSONL before (although I've worked with JSON logs that are effectively JSONL). So, well, you can use grep, but it can be inefficient (depending on the regex engine and how good you are with regexes). It's also easy to make a mistake if you're not very proficient with regexes. So I'd still prefer a JSON parser (jq or another, maybe lower level if performance matters) over grep anyway.

Parsing will make it orders of magnitude slower.
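For comparison, the parser route on JSONL is just a line-at-a-time parse and field compare — a rough Python sketch (field names and sample records invented; jq does the same job from the shell). Note the escaped quote in the second record, which a naive regex match would trip over:

```python
import json

lines = [
    '{"level": "info", "msg": "started"}',
    '{"level": "error", "msg": "disk \\"full\\""}',
]

def matches_field(line, field, value):
    """Parse one JSONL line and compare a single field; skip malformed lines."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    return record.get(field) == value

hits = [json.loads(l) for l in lines if matches_field(l, "level", "error")]
```

The parser sees through the `\"` escapes, so the comparison is on the actual field value, not on the raw bytes of the line.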
It will not, as long as your parser isn't overcomplicated and doesn't populate huge data structures. You only need to find tokens and compare them with the field names and values you are looking for. Regexes are slower and can't process escaped characters correctly.
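And for the "parallel" part of the original question: since JSONL is newline-delimited, the file can be split into byte ranges aligned to line boundaries and each range scanned independently. A hedged Python sketch (file layout and field names invented; `json.loads` stands in for a leaner token scanner, and the per-chunk calls are mapped sequentially here where a `multiprocessing.Pool` would go):

```python
import json
import os
import tempfile

def find_chunks(path, n_chunks):
    """Split a file into byte ranges whose boundaries fall on line starts."""
    size = os.path.getsize(path)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(size * i // n_chunks)  # jump to an approximate split point
            f.readline()                  # skip forward to the next line start
            offsets.append(f.tell())
    offsets.append(size)
    return list(zip(offsets, offsets[1:]))

def scan_chunk(path, start, end, field, value):
    """Scan one byte range; a line starting before `end` is read in full."""
    hits = []
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate the odd corrupt line
            if record.get(field) == value:
                hits.append(record)
    return hits

# Demo on a throwaway file; with multiprocessing, each scan_chunk call
# could run in its own worker process, since chunks share no state.
records = [{"level": "error" if i % 3 == 0 else "info", "msg": f"event {i}"}
           for i in range(6)]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as tmp:
    tmp.write("\n".join(json.dumps(r) for r in records) + "\n")

chunks = find_chunks(tmp.name, 3)
hits = [r for s, e in chunks for r in scan_chunk(tmp.name, s, e, "level", "error")]
os.unlink(tmp.name)
```

The one subtlety is the boundary rule: a record that straddles a split point is owned by the chunk where it starts, so nothing is counted twice or skipped.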