This looks really interesting!
Some recent studies have shown that, for the performance they demonstrate, most models are nowhere near as compact as they could be. This means we should expect an explosion in the capability of small models like this as new techniques find ways to improve them.
Unfortunately, I couldn't find a recommendation for how much VRAM you need to run this model, though it does call out being able to run it locally, which is awesome!
I'll try it out after work and see if it can run on an old 8GB 2070. 😄
It will depend on the representation of the parameters. Most models support bfloat16, where each parameter is 16 bits (2 bytes). For these models, every billion parameters needs roughly 2 GB of VRAM.
It is possible to reduce the memory footprint by quantizing to 8 bits per parameter, and some models support this, but they start to get very stupid.
That would mean 16 GB is required to run this one.
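For a rough sense of the arithmetic behind that estimate, here's a minimal sketch (the 7-billion-parameter count and dtype sizes are illustrative assumptions, not something stated in the post, and it only counts the weights themselves):

```python
# Rough VRAM estimate: parameter count x bytes per parameter.
# This ignores activation memory, the KV cache, and framework overhead,
# so real usage will be somewhat higher.

BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,
    "int8": 1,
}

def estimate_vram_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Return the approximate VRAM (in GB) needed just to hold the weights."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Example: a hypothetical 7B-parameter model
for dtype in ("float32", "bfloat16", "int8"):
    print(f"{dtype:>8}: ~{estimate_vram_gb(7e9, dtype):.1f} GB")
# float32: ~28.0 GB, bfloat16: ~14.0 GB, int8: ~7.0 GB
```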
It's not clear to me either exactly what hardware is required for the reference implementation, but there's a bunch of discussion about getting it to work with llama.cpp in the HN thread, so it might be possible soon (or maybe already is?) to run it on the CPU if you're willing to wait longer for it to process.
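If the llama.cpp route pans out, CPU inference usually goes through a GGUF conversion of the weights. A minimal sketch using the llama-cpp-python bindings (the model path, quantization level, and generation settings below are assumptions for illustration, to be replaced with whatever the actual conversion produces):

```python
# Requires: pip install llama-cpp-python
# Runs on the CPU by default; slower than GPU, but no VRAM needed.
from llama_cpp import Llama

# Hypothetical path to a GGUF-converted, 4-bit-quantized copy of the model.
llm = Llama(model_path="./models/model.q4_0.gguf", n_ctx=2048)

output = llm("Write a haiku about small language models.", max_tokens=64)
print(output["choices"][0]["text"])
```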
Let us know how it goes!