submitted 4 days ago by yogthos@lemmy.ml to c/memes@lemmy.ml
yogthos@lemmy.ml -3 points 3 days ago

Do show me a published data set of the kind you're demanding.

TheOctonaut@mander.xyz 11 points 3 days ago (last edited 3 days ago)

Since you're definitely asking this in good faith and not just downvoting and making nonsense sealion requests in an attempt to make me shut up, sure! Here are three.

https://commoncrawl.org/

https://github.com/togethercomputer/RedPajama-Data

https://huggingface.co/datasets/legacy-datasets/wikipedia/tree/main/

Oh, and it's not me demanding. It's the OSI (Open Source Initiative) defining what an open-source AI model is. I'm sure once you've asked all your questions you'll circle back around to whether you disagree with their definition or not.

this post was submitted on 26 Jan 2025
243 points (95.8% liked)

Memes

46369 readers
1584 users here now

Rules:

  1. Be civil and nice.
  2. Try not to repost excessively. As a rule of thumb, if you have to repost, wait at least 2 months.

founded 5 years ago