
About as open source as a binary blob without the training data (slrpnk.net)

submitted 1 year ago by Prunebutt@slrpnk.net to c/memes@lemmy.world


Office Space meme:

"If y'all could stop calling an LLM 'open source' just because they published the weights... that would be great."


[-] pennomi@lemmy.world 9 points 1 year ago

It’s just AI haters trying to find any way to disparage AI. They’re trying to be “holier than thou”.

The model weights are data, not code. It’s perfectly fine to call it open source even though you don’t have the means to reproduce the data from scratch. You are allowed to modify and distribute said modifications so it’s functionally free (as in freedom) anyway.

[-] WraithGear@lemmy.world 7 points 1 year ago

Right. You could train it yourself too. Though its scope would be limited based on capability. But that’s not necessarily a bad thing. Taking a class? Feed it your textbook, or other available sources, and it can help you on that subject. Just because it’s hard doesn’t mean it’s not open.

[-] Prunebutt@slrpnk.net 0 points 1 year ago

You could train it yourself too.

How, without information on the dataset and the training code?

[-] WraithGear@lemmy.world 2 points 1 year ago* (last edited 1 year ago)

So I am learning as much as I can here, so bear with me. But it accepts tokenized data and structures it via a transformer as a JSON file or some such. The weights are a binary file that’s separate and is used to, well, modify the tokenized data to generate outcomes. As long as you used a compatible tokenization structure and weights structure, you could create a new training set. But that can be done with any LLM. You can’t pull the data from this, just as you can’t make wheat by dissecting bread. But they provide the tools to set your own data, and the way the LLM handles that data is novel, due to being hamstrung by US sanctions. “Necessity is the mother of invention” and all that. Running comparable AIs on inferior hardware and a much smaller budget is what makes this one stand out, not the training data.

[-] Prunebutt@slrpnk.net 1 points 1 year ago* (last edited 1 year ago)

It's still not open source. No matter how extendable the weights are.

[-] WraithGear@lemmy.world 1 points 1 year ago

I mean, this does not help me understand.

[-] Prunebutt@slrpnk.net 2 points 1 year ago* (last edited 1 year ago)

https://slrpnk.net/comment/13455788

Edit: this one is a more thorough explanation: https://lemmy.ml/comment/16365208

[-] pennomi@lemmy.world 0 points 1 year ago

Training code created by the community always pops up shortly after release. It has happened for every major model so far. Additionally, you have never needed the original training dataset to continue training a model.
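A toy sketch of that last point, assuming a two-parameter linear model as a stand-in for published weights. The names `published`, `fine_tune`, and `new_data` are made up for illustration and do not come from any real model release. It only shows that training can resume from existing weights on fresh data; it says nothing about whether that makes a model open source.

```python
# Hypothetical stand-in for "published weights": a linear model y = w*x + b.
# The data these values were originally fitted on is not available to us.
published = {"w": 2.0, "b": 0.0}


def predict(weights, x):
    return weights["w"] * x + weights["b"]


def mse(weights, data):
    """Mean squared error of the model on a list of (x, y) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def fine_tune(weights, data, lr=0.05, epochs=1000):
    """Continue training from existing weights with plain gradient descent.

    Returns a new dict and leaves the input weights untouched.
    """
    w, b = weights["w"], weights["b"]
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b}


# New data collected independently of the original training set (y = 2x + 1).
new_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

tuned = fine_tune(published, new_data)
print("error before:", mse(published, new_data))
print("error after: ", mse(tuned, new_data))
```

The starting weights are reused as-is and only the new data drives the update, which is the sense in which the original dataset is not needed for continued training.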

[-] Prunebutt@slrpnk.net 3 points 1 year ago

So, Ocarina of Time is considered open source now, since it's been decompiled by the community, or what?

Community effort and the ability to build on top of stuff doesn't make anything open source.

Also: initial training data is important.

[-] Prunebutt@slrpnk.net 6 points 1 year ago* (last edited 1 year ago)

Let's transfer your bullshirt take to the kernel, shall we?

The kernel is instructions, not code. It’s perfectly fine to call it open source even though you don’t have the code to reproduce the kernel from scratch. You are allowed to modify and distribute said modifications so it’s functionally free (as in freedom) anyway.

🤡

Edit: It's more that so-called "AI" stakeholders want to launder its reputation with the "open source" label.

this post was submitted on 28 Jan 2025

341 points (93.8% liked)
