658
The Rule (lemmy.ml)
submitted 9 months ago by roon@lemmy.ml to c/196@lemmy.blahaj.zone
you are viewing a single comment's thread
[-] AdrianTheFrog@lemmy.world 5 points 9 months ago

I don't have access to llama 3.1 405b, but I can see that llama 3 70b takes up ~145 GB, so 405b would probably take ~840 GB just to download the uncompressed fp16 (16 bits/weight) model. With 8-bit quantization it would probably be closer to 420 GB, and with 4-bit closer to 210 GB. At 4 bits, quantization really starts to harm the model's outputs, and it's still probably not going to fit in your RAM, let alone VRAM.
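
Just to make the arithmetic explicit: checkpoint size is roughly parameter count × bits per weight ÷ 8 bytes. A minimal sketch of that math (pure parameter count only; the ~145 GB / ~840 GB figures above come out a bit higher because real checkpoints ship extra files and overhead):

```python
# Back-of-envelope model memory estimate: parameters * bytes per weight.
# Parameter counts are from the comment above; real downloads add some
# overhead (embeddings, tokenizer, shard metadata), so treat these as
# lower bounds.

def model_size_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight-only memory in GB for a dense model."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for bits in (16, 8, 4):
    print(f"405B @ {bits}-bit: ~{model_size_gb(405, bits):.0f} GB")
# 405B @ 16-bit: ~810 GB
# 405B @ 8-bit:  ~405 GB
# 405B @ 4-bit:  ~202 GB
```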

So yes, it is a crazy model. You'd probably need at least 3 or 4 A100s to have a good experience with it.
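
A rough sanity check on that GPU count, assuming the 80 GB A100 variant and ~20% headroom for the KV cache and activations (both the VRAM figure and the headroom factor are assumptions, not from the thread):

```python
import math

A100_VRAM_GB = 80  # assuming the 80 GB A100 variant
HEADROOM = 1.2     # assumed extra room for KV cache / activations

for bits, size_gb in ((16, 810), (8, 405), (4, 202)):
    gpus = math.ceil(size_gb * HEADROOM / A100_VRAM_GB)
    print(f"{bits}-bit: at least {gpus} x A100 80GB")
# 16-bit: at least 13 x A100 80GB
# 8-bit:  at least 7 x A100 80GB
# 4-bit:  at least 4 x A100 80GB
```

Which lines up with the "3 or 4 A100s" estimate for the 4-bit quant.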
