submitted 4 months ago* (last edited 4 months ago) by retrospectology@lemmy.world to c/til@lemmy.world

In light of the recent CrowdStrike crash revealing how weak points in IT infrastructure can have wide-ranging effects, I figured this might be an interesting one.

The entirety of Wikipedia is periodically uploaded here, along with many other useful wikis and how-to websites (e.g. iFixit tutorials and wikiHow): https://download.kiwix.org/zim

You select the archive you want, then the language and archive version (for example, you can get an archive with no pictures to save space). For the entire English Wikipedia you'd select "wikipedia_en_all_maxi_2024-01.zim".

The archives are packed as .zim files, which can be read with the Kiwix app completely offline.
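
A rough sketch of how the file name from the post breaks down, in case you want to script the download. The five-part pattern (project, language, selection, flavour, date) and the per-project subdirectory are assumptions inferred from that one example name; some archives on the server may not follow them. `parse_zim_name` and `zim_url` are made-up helper names.

```python
from typing import NamedTuple

# Base URL from the post; the per-project subdirectory is an assumption.
BASE_URL = "https://download.kiwix.org/zim"


class ZimName(NamedTuple):
    project: str    # e.g. "wikipedia"
    language: str   # e.g. "en"
    selection: str  # e.g. "all"
    flavour: str    # e.g. "maxi" (full archive)
    period: str     # e.g. "2024-01"


def parse_zim_name(filename: str) -> ZimName:
    """Split a name like wikipedia_en_all_maxi_2024-01.zim into its parts."""
    if not filename.endswith(".zim"):
        raise ValueError(f"not a .zim file: {filename}")
    parts = filename[: -len(".zim")].split("_")
    if len(parts) != 5:
        # Assumption: only the five-part pattern is handled here.
        raise ValueError(f"unexpected name pattern: {filename}")
    return ZimName(*parts)


def zim_url(filename: str) -> str:
    """Build a download URL, assuming archives live under /zim/<project>/."""
    return f"{BASE_URL}/{parse_zim_name(filename).project}/{filename}"


print(parse_zim_name("wikipedia_en_all_maxi_2024-01.zim"))
print(zim_url("wikipedia_en_all_maxi_2024-01.zim"))
```

Check the directory listing in a browser before relying on a generated URL, since the available dates change as new archives are uploaded.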

I keep several USB drives that have some of these archives along with the app installer. In the event of some major catastrophe I'd at least be able to access some potentially useful information. I have no stake in Kiwix, and don't know whether there are alternative apps and schemes; I just thought it was neat.

[-] lolola@lemmy.blahaj.zone 63 points 4 months ago

So something akin to this joke image I saw the other day is actually feasible for Wikipedia?

[-] maxwellfire@lemmy.world 18 points 4 months ago

ChatGPT is also probably around 50-100GB at most

[-] souperk@reddthat.com 5 points 4 months ago

Probably a lot less. Keep in mind that whenever it answers a question, the whole model is traversed multiple times; going through multiple GBs is not possible in the few seconds the model takes to answer.

[-] maxwellfire@lemmy.world 7 points 4 months ago

I'd be surprised if it was significantly less. A comparable 70 billion parameter Llama model requires about 120GB to store. Supposedly the largest current ChatGPT model goes up to 170 billion parameters, which would take a couple hundred GB to store. There are ways to trade off some accuracy in order to save a bunch of space, but you're not going to get it under tens of GB.
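
For anyone who wants to check the storage arithmetic, here is a back-of-the-envelope sketch. It assumes size is simply parameter count times bits per parameter, with no file-format overhead; the 16/8/4-bit precisions are common choices I picked for illustration, not figures from this thread.

```python
def model_size_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate storage in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return n_params * bits_per_param / 8 / 1e9


# 70 billion parameters at a few illustrative precisions.
for bits in (16, 8, 4):
    print(f"70B params at {bits}-bit: {model_size_gb(70e9, bits):.0f} GB")
# prints 140 GB, 70 GB and 35 GB respectively
```

At 16 bits that lands in the same ballpark as the figure above, and the 4-bit row shows why trading accuracy for space still leaves you at tens of GB.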

These models really do go through that many GB of parameters once for every word in the output. GPUs and tensor processors are crazy fast. For comparison, think about how much data a GPU generates for a 4K60 video display: it's about 1GB per second. And the recommended memory speed required to generate that image is around 400GB per second. Crazy fast.
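
The same kind of rough check works for the speed claim. This sketch assumes 3 bytes per pixel for the display figure, and that generating one token means reading every parameter from memory exactly once at the quoted bandwidth; real hardware and software will differ, and the 140GB model size is just an illustrative input.

```python
def video_bandwidth_gb_s(width: int, height: int,
                         bytes_per_pixel: int, fps: int) -> float:
    """Uncompressed display output in GB/s (1 GB = 1e9 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e9


def seconds_per_token(model_gb: float, mem_bandwidth_gb_s: float) -> float:
    """Time to stream the whole model through memory once."""
    return model_gb / mem_bandwidth_gb_s


# 4K at 60 frames per second, 3 bytes per pixel (assumption).
print(f"{video_bandwidth_gb_s(3840, 2160, 3, 60):.2f} GB/s")  # 1.49 GB/s

# Illustrative 140 GB model at the 400 GB/s mentioned above.
print(f"{seconds_per_token(140, 400):.2f} s per token")       # 0.35 s per token
```

So even under these crude assumptions, reading a model of that size once per output token takes a fraction of a second, not minutes.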

[-] lolola@lemmy.blahaj.zone 4 points 4 months ago* (last edited 4 months ago)
[-] jose1324@lemmy.world 16 points 4 months ago

No, but it's the model after the input that you need.

[-] anivia@lemmy.ml 3 points 4 months ago

So it would fit on a Blu-ray disc

[-] mctoasterson@reddthat.com 15 points 4 months ago

I mean, you can self-host your own local LLMs using something like Ollama. What you can run is bounded by the disk space you have (which limits the size of the model you're able to store) and by the performance of the CPU or GPU you are using to run it, but it does work just fine. The results are probably as good as ChatGPT's for most use cases.
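
A minimal sketch of what talking to a local Ollama server can look like from Python, using only the standard library. It assumes Ollama is running on its default port (11434) and that a model has already been pulled; the model name "llama3" and the helper names are placeholders for illustration.

```python
import json
import urllib.request

# Default local endpoint for Ollama's generate API (assumes default port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str,
                           url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example call, assuming a model named "llama3" has been pulled locally:
# print(ask("llama3", "Summarise what a .zim file is in one sentence."))
```

Nothing leaves your machine here, since the request only goes to localhost.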

[-] Nooodel@lemmy.world 3 points 4 months ago

We do this at work (lots of sensitive data that we don't want OpenAI to capitalize on) and it works pretty well. It's hosted locally and was set up by a data-security- and privacy-conscious admin, who specifically configured it not to save any queries, even on the server. It's a bit slower than ChatGPT, but not by much.

this post was submitted on 20 Jul 2024
523 points (99.1% liked)