cross-posted from: https://lemmy.world/post/800062
Eric Hartford (a.k.a. faldore) has announced OpenOrca, an open-source dataset and series of instruct-tuned language models built to reproduce the results of Microsoft Research's recently announced Orca.
You can support Eric and all of the hard work he has done for the open-source community by subscribing to the newsletter on his site.
Eric, if you're reading this and would like to share a donation link - I would be more than happy to include it in this post and any future posts regarding your work. Shoot me a message anytime.
Eric Hartford's Announcement
Today I'm announcing OpenOrca.
The dataset is complete: ~1 million FLANv2 instructions augmented with GPT-4 completions and ~3.5 million FLANv2 instructions augmented with GPT-3.5 completions.
We are currently training on LLaMA-13b. We expect completion in about 2 weeks.
When training is complete, we will release the dataset and the model at the same time.
We are seeking GPU compute sponsors for various targets; please consult the blog post and reach out if interested.
Thank you to our sponsors!
A few more highlights from the full article, which is worth reading in its entirety when you have a chance.
We expect to release OpenOrca-LLaMA-13b in mid-July 2023. At that time we will publish our evaluation findings and the dataset.
We are currently seeking GPU compute sponsors for training OpenOrca on the following platforms:
Falcon 7b, 40b
LLaMA 7b, 13b, 33b, 65b
MPT 7b, 30b
Any other targets that get a sponsor (e.g., RWKV, OpenLLaMA)
The dataset consists of:
~1 million FLANv2 instructions augmented with GPT-4 completions
~3.5 million FLANv2 instructions augmented with GPT-3.5 completions
If you found this post interesting, please consider subscribing to the /c/FOSAI community at !fosai@lemmy.world where I do my best to keep you in the know with the most important updates in free open-source artificial intelligence.