OpenAI transcribed over a million hours of YouTube videos to train GPT-4
(www.theverge.com)
How clueless are you? Everything "taken" was available for free, provided for free for any web crawler to consume, and now you're acting like consuming it is a crime?
I get that you're really jealous because you didn't think of LLMs, but you don't get to claim something is a crime in one specific instance just because you don't like what they're doing after their program consumes content.
Google has done the same thing for years and no one said a peep. What does everyone think search results even are?
You completely miss my point. Are you saying data such as copyrighted published works and medical records are free? Because I did not in any way consent to sharing medical records with OpenAI: https://www.businessinsider.com/openai-chatgpt-generative-ai-stole-personal-data-lawsuit-children-medical-2023-6?op=1
Now, I realize this is an alleged offense, but it's still fucked up. As for wanting to be the first to make an LLM, I have no desire to take on that amount of responsibility and liability. Sam Altman is chasing money and nothing more.
There's a distinct difference between quotation and plagiarism. A search engine does the former, LLMs do the latter.
No. If you write a truly unique combination of words then an LLM will be very unlikely to reproduce them.
An LLM is only likely to plagiarise you if your writing is similar to others.
[citation needed]
https://blog.gdeltproject.org/do-llms-truly-create-or-merely-arrange-just-how-much-of-an-llms-writing-is-original/
So plagiarism...
It only plagiarises you if you write something similar to lots of other people.
Write something original and, even if it is in their training dataset, LLMs are highly unlikely to reproduce it.
Fuck Google too