how things become science (lemmy.blahaj.zone)
percent@infosec.pub 5 points 1 day ago

There are huge public datasets that are often used for pretraining. Common Crawl and C4 are probably the most prominent, but there are others.

There are also big public datasets available for fine-tuning and instruction tuning.

The open-weight models are getting pretty powerful, thanks to some Chinese labs.

this post was submitted on 09 Apr 2026
994 points (99.1% liked)

Science Memes
