You can try using VSCode + Roo to chunk it intelligently and autonomously. Get an API key from your LLM provider of choice, put your data into a text file, and edit the Roo agent personality, which is set to coding by default. Add and select a custom summarizer persona for Roo to use instead, then tell it to summarize the text file.
As the other commenter said, your workflow requires more than what LLMs are currently capable of.
Summarization capability in LLMs is a function of the LLM's capacity for coherence over long conversational scaling, operated on by the LLM's ability to navigate and distill internal structural mappings of conceptual & contextual archetype patterns as discrete objects across a continuous ambiguity sheaf.
That's technical jargon which boils down to the idea that an LLM's summarization capability depends on its parameter count and on having enough VRAM for long context lengths. Higher-parameter, less-quantized models maintain more coherence over long conversations/datasets.
While enterprise LLMs can get up to 128k tokens while maintaining some level of coherence, local models at medium quantization can handle 16-32k reliably. Theoretically a 70B model could maybe handle around 64k tokens, but even that's stretching it.
Then comes the problem of transformer attention. You can't just put a whole book's worth of text into an LLM's input and expect it to inspect any part in real detail. For best results you have to chunk it section by section, chapter by chapter.
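If you'd rather do the chunking yourself, here's a rough sketch of the idea in Python. It splits on blank lines and packs paragraphs into chunks under a word budget. The function name `chunk_text` and the `max_words` default are just my own placeholders, and word count is only a crude stand-in for tokens, so tune it for whatever model you use.

```python
import re


def chunk_text(text, max_words=3000):
    """Pack paragraphs into chunks of roughly max_words words each.

    Splits only on blank lines, so a single paragraph longer than
    max_words ends up as its own oversized chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk if adding this paragraph would blow the budget.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

You'd then feed each chunk to the model separately and, if you want one overall summary, summarize the per-chunk summaries at the end.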
So local LLMs may not be what you're looking for. If you are willing to go enterprise, then Claude Sonnet and DeepSeek R1 might be good, especially if you set up an API interface.
The day adblockers/yt-dlp finally lose to Google forever is the day I kiss YouTube bye-bye. No YouTube Premium, no 2-minute-long unskippable commercial breaks. I am strong enough to break the addiction and go back to the before-fore times when we bashed rocks together and stacked CDs in towers.
PeerTube, Odysee, BitTorrent, IPTV. I'll throw my favorite content creators a buck or two on Patreon to watch their stuff there if needed. We've got options; it's a matter of how hot you need to boil the water before the lowest common denominator consumer finally has enough.
Here's the template if anyone wants it.
The pocket of air that was where you teleported now gets displaced at a very decent fraction of the speed of light, while the pocket of space you once occupied becomes an almost pure vacuum. The air moves so fast it creates a sonic boom that ruptures eardrums. Then, a few atoms of air collide with such incredible force that the atoms split and cause a small-grade nuclear explosion.
My elderly parents in their 60s use Linux Mint daily and have never had an issue with it (admittedly I did have to set it up for them). I just set up desktop shortcuts to their websites and turned on automatic updates. The hardest part isn't using an alternative OS like Mint or Pop, it's getting an average person to figure out how to install it. Getting into your BIOS to boot from the installation drive, re-partitioning your hard drive to free up space for dual booting or nuking Windows off altogether, those are the hardest parts for any first-timer IMO. After you've done it a dozen times it's no problemo, but the first time is nerve-racking, at least it was for me.
This is a copy/pasted message I wrote up on another thread. As long as there are people in the comments shilling Kagi, I will shill my preferred engines. At least my suggestions will bring awareness to free-as-in-freedom projects. I hope to god people paying $10/month just to not get datacucked by search engines will also learn something and save their money.
SearX/SearXNG is a free and open source, highly customizable, and self-hostable meta search engine. SearX instances act as a middleman: they query other search engines for you, stripping out all their spyware ad crap, and your connection never touches those engines' servers. Of course you have to trust the SearX instance host with your query information, but again, if you are that paranoid, just self-host.
I personally trust some FOSS-loving sysadmin who hosts social services for free out of altruism, who also accepts hosting donations, and whose server is located on the other side of the planet, with my query info over Google/Alphabet any day.
It's nice to be able to email and have a human conversation with your search engine provider who's just a knowledgeable everyday Joe, genuinely believes in the project, and freely dedicates their resources to it. Consider sending some cash their way to help with upkeep if you like the services they provide; they will probably appreciate and make use of that $10 better than Kagi.
Here's a list of all public SearX instances. I personally prefer to use paulgo.io. All SearX instances are configured differently and query different engines. If one doesn't seem to give good results, try a few others.
Did I mention it has bangs like DuckDuckGo? If you really need Google, like for maps and business info, just use !!g in the query.
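If you self-host, you can also script your searches. Here's a minimal sketch that just builds a search URL for a SearXNG instance, assuming its `/search` endpoint accepts `q` and `format` parameters. As far as I know, JSON output has to be enabled in the instance's settings and many public instances turn it off, so treat this as something to try on your own instance. `build_searx_url` and `searx.example.org` are made-up names for illustration.

```python
from urllib.parse import urlencode


def build_searx_url(instance, query, fmt="json"):
    """Build a search URL for a SearXNG instance (hypothetical helper)."""
    # urlencode handles spaces and special characters in the query.
    params = urlencode({"q": query, "format": fmt})
    return f"{instance.rstrip('/')}/search?{params}"


# Placeholder instance URL; swap in your own self-hosted instance.
print(build_searx_url("https://searx.example.org", "linux mint install guide"))
```

From there you can fetch the URL with whatever HTTP client you like and parse the response.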
search.marginalia.nu is a completely novel search engine written and hosted by one dude. It aims to prioritize indexing lighter websites with little to no JavaScript, as these tend to be personal websites and homepages that have poor SEO and that the big search engines won't index well. If you remember the internet of the early 2000s and want a nostalgia trip, this one's for you. It's also open source and self-hostable.
Finally, YaCy is another completely novel search engine that uses peer-to-peer technology to power a big web crawler, which prioritizes what it indexes based on user queries and feedback. Anyone can download YaCy and devote a bit of their computing power to both run their own local instance and help out a collective search engine. Companies can also download YaCy and use it to index their private intranets.
They have a public instance available through a web portal. To be upfront, YaCy is not a great search engine for what most people usually want, which is quick and relevant information within the first few clicks. But it is an interesting use of technology, and it shows what a true honest-to-god community-operated search engine looks like, untainted by SEO scores or corporate money-making shenanigans.
I hope this has been informative to those who believe there are only a few options to pick from. I know these options are unknown to most people.
Sometimes I think I made the right decision to just get a huge hard drive and download all my favorite entertainment in DRM-free formats. Movies, music, games, books. I saw this coming a mile away a decade ago. The only thing that will really hurt me is if/when Steam inevitably goes full corporate cuck and starts going hard on the DRM, locking down my library.
Now please unremove the shroom community as the next priority. Empowering open-minded people with the option and knowledge to heal themselves through psychedelics (and other kinds of mushrooms that can potentially help fight diseases such as cancer) that they can grow themselves without big pharma, and giving them a community to share their advice and experiences, is the right thing to do.
Wow, that was actually a cute story. Not sure how legit, but nice feels nonetheless.