
Google says AI systems should be able to mine publishers’ work unless companies opt out, turning copyright law on its head (www.theguardian.com)

submitted 2 years ago by 0x815@feddit.de to c/technology@beehaw.org

197 comments

In its submission to the Australian government’s review of the regulatory framework around AI, Google said that copyright law should be altered to allow for generative AI systems to scrape the internet.


[-] lostmypasswordanew@feddit.de 9 points 2 years ago

An AI model is a derivative work of its training data and thus a copyright violation if the training data is copyrighted.

[-] BlameThePeacock@lemmy.ca 17 points 2 years ago

A human is a derivative work of its training data, thus a copyright violation if the training data is copyrighted.

The difference between a human and AI is getting much smaller all the time. The training process is essentially the same at this point: show them a bunch of examples, then have them practice and provide feedback.

If that human is trained to draw on Disney art and then goes on to create art in a similar style for sale, that isn't copyright infringement. Nor should it be.

[-] Phanatik@kbin.social 14 points 2 years ago* (last edited 2 years ago)

This is stupid and I'll tell you why.
As humans, we have a perception filter. This filter is unique to every individual because it's fed by our experiences and emotions. Artists make great use of this by producing art which leverages their view of the world. It's why Van Gogh or Picasso is interesting: they had a unique view of the world that is shown through their work.
These bots do not have perception filters. They're designed to break down whatever they're trained on into numbers and decipher how the style is constructed so they can replicate it. They have no intention or purpose behind any of their decisions beyond straight replication.
You would be correct if a human's only goal was to replicate Van Gogh's style but that's not every artist. With these art bots, that's the only goal that they will ever have.

I have to repeat this every time there's a discussion on LLM or art bots:
The imitation of intelligence does not equate to actual intelligence.

[-] frog@beehaw.org 13 points 2 years ago

Absolutely agreed! I think if the proponents of AI artwork actually had any knowledge of art history, they'd understand that humans don't just iterate the same ideas over and over again. Van Gogh, Picasso, and many others, did work that was genuinely unique and not just a derivative of what had come before, because they brought more to the process than just looking at other artworks.

[-] nickwitha_k@lemmy.sdf.org 7 points 2 years ago

Yup. There seems to be a strong motive in many to not understand this concept as it makes their practices clearly ethically questionable.

[-] frog@beehaw.org 7 points 2 years ago

My feeling is that the vast majority of pro-AI techbros come from a computer science, finance, or business background; undoubtedly intelligent people, but completely and utterly lacking in any appreciation or understanding of what actually goes into creative work. I'm sure they genuinely believe that there's no difference between what a human does and what an AI does, because they think art (or writing, music, etc.) is just the product of an algorithm.

[-] Phanatik@kbin.social 2 points 2 years ago

Ironically, my background is in mathematics but I also happen to be a writer so I see both sides of the argument. I just see the utter lack of compassion people have for those who produce creative work and the same people believe that if it can be automated, it should be automated.

[-] nickwitha_k@lemmy.sdf.org 1 points 2 years ago

Likely. Which is weird because algorithms are only a subset of software engineering, which requires abstract and creative thought to perform well.

[-] davehtaylor@beehaw.org 9 points 2 years ago

I really, really, really wish people would understand this.

AI can only create a synthesis of exactly what it's fed. It has no life experience, no emotional experience, no nurture-related experiences, no cultural experiences that color its thinking, because it isn't thinking.

The "AI are only doing what humans do" is such a brain-dead line of thinking, to the point that it almost feels like it's 100% in bad faith whenever it's brought up.

[-] BlameThePeacock@lemmy.ca 3 points 2 years ago

You're completely wrong, and I'll tell you why.

None of what you said matters, perception filters, intent, intelligence... it's all irrelevant to the discussion.

Copyright only grants certain rights, and at least here in Canada, controlling the use of a work to generate a model isn't one of them. The rights cover things like distribution, reproduction, public performance, communication, and exhibition. US law says you can't "Prepare derivative works based upon the work." but the model isn't a derivative work because it's not really a work at all; you can't even visually look at the model. You can't copyright an algorithm in the US or Canada.

Only the created art should be scrutinized for copyright infringement, and these systems can generate both infringing and non-infringing works (just like a human can).

Any enforcement should then be handled when that protected work is then used to infringe on the actual rights of the copyright holder.

[-] Phanatik@kbin.social 1 points 2 years ago* (last edited 2 years ago)

I wasn't talking about copyright law in regards to the model itself.

I was talking about what is/isn't grounds for plagiarism. I strongly disagree with the idea that artists and art bots go through the same process. They don't and it's reductive to claim otherwise. It negatively impacts the perception of artists' work to assert that these models can automate a creative process which might not even involve looking at other artists' work because humans are able to create on their own.

A person who has never looked upon a single painting in their life can still produce a piece but the same cannot be said for an art bot. A model must be trained on work that you want the model to be able to imitate.

This is why ChatGPT required the internet to do what it does (the privacy violation is another big concern there). The model needed vast quantities of information to be sufficiently trained because language is difficult to decipher. Languages evolved by getting in contact with other languages and organically making new words. ChatGPT will never invent a new word because it's not intelligent, it is merely imitating intelligence.

[-] BlameThePeacock@lemmy.ca 2 points 2 years ago

"A person who has never looked upon a single painting in their life can still produce a piece but the same cannot be said for an art bot. A model must be trained on work that you want the model to be able to imitate."

No, they really can't. Go look at a 1-year-old's first attempt at "art": it's nothing more than random smashing of colour on paper. A computer could easily generate such "work" as well with no training data at all. The child has seen art by that point and still can't replicate it, because they need much more training first.

Humans require books (or teachers who read books) to learn how to read and write. That is "vast quantities of information" being consumed to learn how to do it. If you had never seen or heard of a book, you wouldn't be able to write a novel. It's also completely ignoring the fact that you had to previously learn the spoken language as well (which is a vast quantity of information that takes a human decades to acquire proficiency in even with daily practice)

[-] Phanatik@kbin.social 1 points 2 years ago

Once again, being reductive about artists' work. Jackson Pollock's entire career was smashing colours on a canvas. If you want to argue that Pollock had to look at thousands of paintings before making his, I honestly can't take you seriously at that point.

"A computer could easily generate such “work” as well with no training data at all."

Yes and in the eyes of its creators, that was deemed a failure which is why Midjourney and Dall-E are the way they are. These bots don't want to create art, they want to imitate it.

Children have barely any experiences and can still create something. You might not deem it worthy of calling it art but they created something despite their limited knowledge and life experience.

Of course you'd need books to read and write. The words have to be written, and you need to see them in written form if you also want to write them. But one thing you don't take into account is handwriting, another thing that is unique to every individual. Some have worse handwriting than others, and with practice (like any muscle) it can be improved, but you don't have to have seen handwritten text before writing it yourself. You only need to be taught how to hold a pen and you can write.

Novels are complex structures of language just like poetry. In order to write novels, you have to consume novels because it's well understood that to find your own narrative voice you must see how others express theirs. Stories are told in unique ways and it's crucial as a writer to understand and break these concepts down. Intention and purpose form a core part of storytelling and an LLM cannot and will not be able to express those things.

They're written in certain ways because the author intended them to be that way, such as Cormac McCarthy deciding to be very minimalist with his punctuation.
I would love to see you make a point that an LLM without being specifically prompted to do so would make that stylistic decision. An LLM can't make that decision because unless you specify a style it is aware of, it won't organically do it.

I am also a writer. I've written a short story. One of my stylistic choices is that I don't use dialogue tags like "said". An LLM won't make that choice because it isn't designed to do so, it won't decide to minimise its use of dialogue tags to improve the flow of the narrative unless you told it to.

"It’s also completely ignoring the fact that you had to previously learn the spoken language as well (which is a vast quantity of information that takes a human decades to acquire proficiency in even with daily practice)."

Yes, in order to learn a spoken language you have to have heard it. However, languages evolve over time. You develop regional accents and dialects. All of the UK speaks English but no two towns speak the same way.

[-] BlameThePeacock@lemmy.ca 2 points 2 years ago

Jackson Pollock didn't create paintings, Jackson Pollock's art was story telling and showmanship.

"Yes, in order to learn a spoken language you have to have heard it. However, languages evolve over time. You develop regional accents and dialects. All of the UK speaks English but no two towns speak the same way."

Just like different models have their own patterns of writing...

You're thinking about LLMs like they're equivalent to multiple people (or groups of people), but each LLM is equivalent to a single person. The training and resulting function of each one is as distinct as an individual human.

I could raise one of my children to perform the exact same functions as an LLM or art creation tool. Give them exactly the same image/text sets that these models are trained on, and have them practice for a decade or two. Then I could tell them "Hey I need a picture of an orange rabbit riding a bike" and they could draw me one, or write a story about the same topic. There's clearly no copyright infringement in that process, so why would it be different for creating a machine to do the same thing?

[-] Phanatik@kbin.social 1 points 2 years ago

An LLM or art creation tool is barely equatable to one person. The difference between a child and an art creation tool is that you could show a child a single picture of a bunny, a bike and a carrot, then ask them to draw an orange bunny riding a bike, and they could draw something resembling that. An art bot would require hundreds to thousands of images of each object to understand what it is before it can even make a reasonable attempt. The level of training required isn't even comparable.

At least the child's drawing will have some personality in it; every output from an art bot ends up looking soulless. The reason is simple: an art bot only imitates what it's been trained on, while an artist draws on inspiration before applying the two things an art bot will never have: intent and purpose.


[-] acastcandream@beehaw.org 2 points 2 years ago

"this is stupid I’ll tell you why"

Not sure why you think anyone would read anything if that’s how you start it.

[-] 50gp@kbin.social 9 points 2 years ago

a human does not copy previous work exactly like these algorithms, whats this shit take?

[-] BlameThePeacock@lemmy.ca 11 points 2 years ago

A human can absolutely copy previous works, and they do it all the time. Disney themselves license books teaching you how to do just that. https://www.barnesandnoble.com/w/learn-to-draw-disney-celebrated-characters-collection-disney-storybook-artists/1124097227

Not to mention the amount of porn online based on characters from copyrighted works. Porn that is often done as a paid commission, expressly violating copyright laws.

[-] Ret2libsanity@infosec.pub 7 points 2 years ago

Neither does AI?

[-] Niello@kbin.social 6 points 2 years ago

But considering that humans do get copyright strikes when they do something too similar, that should also apply to AI. It doesn't matter if it's not exact.

[-] Phanatik@kbin.social 5 points 2 years ago

That should tell you something about how companies act. They're fine with these LLMs plagiarising content but when someone gets marginally close to their own trademarks, they get slammed.

[-] lostmypasswordanew@feddit.de 8 points 2 years ago

Humans and AI are not the same and an equivalence should never be drawn.

[-] BlameThePeacock@lemmy.ca 2 points 2 years ago

Your feelings don't really matter, the fact of the matter is that the goal of ai is literally to replicate the function of a human brain. The way we're building them is often mimicking the same processes.

[-] nickwitha_k@lemmy.sdf.org 7 points 2 years ago

And LLMs and related technologies, by themselves, are artificial but not intelligent. So, the facts are not in favor of your argument to allow commercial parasitism on creative works.

[-] BlameThePeacock@lemmy.ca 2 points 2 years ago

I think you're missing a point here. If someone uses these models to produce and distribute copyright-infringing works, the original rights holder could go after the infringer.

The model itself isn't infringing though, and the process of creating the model isn't either.

It's a similar kind of argument to the laws that protect gun manufacturers from culpability from someone using their weapon to commit a crime. The user is the one doing the bad thing, they just produce a tool.

Otherwise, could Disney go after a pencil company because someone used one of their pencils to infringe on Disney's copyright, even if that pencil company had designed the pencil to be extremely good at producing Disney imagery by looking at a whole bunch of Disney images and movies to make sure it matches the size, colour, etc.? No, because a pencil isn't a copyright infringement of art, regardless of the process used to design it.

[-] nickwitha_k@lemmy.sdf.org 2 points 2 years ago* (last edited 2 years ago)

Nah. You're missing the forest for the trees. Let's get abstract:

Person A makes a living by making product X and selling it.

Person B makes a living by making product Y and selling it.

Both A and B are in the same industry.

Person C uses a machine to extract the essence of product X and Y and blend them. Person C then claims authorship and sells it as product Z, which they sell in competition to X and Y.

Person C has not created anything. Their machine does not have value in the absence of products X and Y, yet received no permission, offers no credit nor compensation. In addition, they are competing for the same customers and harming the livelihoods of A and B. Person C is acting in a purely parasitic manner that cannot be seen as ethical in any widely accepted definition of the word.

[-] BlameThePeacock@lemmy.ca 1 points 2 years ago

You're missing something even more basic.

The machine Person C has created is not infringing on anything by itself. Its creation was not an infringement. "Extracting essence" isn't a protected right provided by the copyright frameworks. Only the actual art it is used to create could infringe (which most of the generated images do not).

If the final art created is an infringement, the existing copyright system handles that situation just like an infringing piece of art created by a human. The person at fault is the person who used the machine to create an infringing work, not the creator of the machine.

In your scenario, if a human C came along and looked at the art from Person A and B and blended them together into their own style, there wouldn't be any problem either, even though they received no permission and offered no credit or compensation to the original creators. They would only get in trouble if they created an actual piece of art that was too similar to either of the specific artists' works and was therefore found to be infringing upon the copyright.

[-] nickwitha_k@lemmy.sdf.org 2 points 2 years ago* (last edited 2 years ago)

The scope here is not limited to "can someone legally get in trouble under current law" (which seems likely but is still working its way through the courts). The discussion is specifically about ethics. Person C has created nothing. They should have no product to sell, if not for persons A and B. Their competition with those that their product is derived from is a parasitic relationship, plain and simple. They are performing an act of exploitation with measurable harm not only to persons A and B but also to the further development of their craft, by destroying any incentive to continue it.

Now, in some sort of alternate economic system, where one's livelihood is not tied to their vocation, sure, it's possibly not problematic because the economic harm is removed. However, in current capitalist systems that are in place where LLMs are heavily hyped, it's an ethically bankrupt action to take.

ETA: No amount of mental gymnastics can change the fact that use of others' works without their consent to train a model, then claiming authorship and competing IS plainly theft of the labor that went into creating the original works.

That's not to say that LLMs and the like don't have value, or that they don't often require effort to produce something worthwhile. Just that they need to be used in an ethical manner that improves the human condition, not as another tool to rob others of the fruit of their labors.

[-] BlameThePeacock@lemmy.ca 1 points 2 years ago

I'll remind you the original article title literally contains the words "copyright law"

This discussion is entirely about legality, not ethics.

By your stupid logic, I have created nothing in my job designing automation systems, since I just look at what people currently do, program a computer to do those tasks instead, and I profit off those people no longer needing to do that job.

You want to keep everyone fully employed in needless tasks? Go join the Mennonites.

[-] nickwitha_k@lemmy.sdf.org 2 points 2 years ago

First, feeding something into a machine is not the same as looking at it. Person C literally creates nothing. They are a parasite. There's far more to creating than using statistical modeling algorithms. One cannot claim that that's what people studying a style and then creating something are doing, because it is empirically false.

Second, the scope of the discussion is not just "can someone legally get in trouble".

[-] BlameThePeacock@lemmy.ca 1 points 2 years ago

"Feeding something into a machine is not the same as looking at it" Most scientists would vehemently disagree. Human brains are just a complex and squishy computer. The fact that they're biological makes no difference to how we function. Input goes in, processing occurs, output comes out. Even the term "Computer" started as a job title for a human prior to the invention of mechanical and electric devices.

The scope of the discussion is absolutely what would get you in trouble. That's literally the entire post we're commenting on. We're not arguing if this SHOULD be allowed or not, we're arguing about whether current laws prohibit it.

You keep harping on about parasites, is every person who creates a machine to do a task that competes with humans parasitical in your fucked up world logic? If we want to make a machine to build widgets, an engineer will study how widgets get built, design a machine to do it instead, produce the machine, then a company will use it to outcompete the original manual widget makers. Same process for essentially every machine we've ever invented.

[-] nickwitha_k@lemmy.sdf.org 1 points 2 years ago

"'Feeding something into a machine is not the same as looking at it' Most scientists would vehemently disagree. Human brains are just a complex and squishy computer."

In that aspect, we are absolutely in agreement. We are meat computers in meat cages containing necessary support systems. That statement was, perhaps, an oversimplification.

Things like LLMs are attempts to model how the human brain works but are not identical, nor are LLMs, by themselves, capable of intelligence. If one argues contrarily that feeding data into an LLM and using it to produce something is the same, then the one using the LLM is clearly not the author and claiming so is plagiarism of the work of either the creator of the LLM or the LLM itself.

The argument that, legally, IP owners cannot specify that their works may not be used as feedstock for competing commercial products is rather absurd itself and would invalidate all but the most permissive open-source licenses as well as proprietary licenses. As pointed out elsewhere, this line of thought would allow one to steal leaked source code and use it to effectively clone existing software. Use of the source in this manner would be infringing on the owner's IP rights.

Perhaps a good way to think about LLMs is as automated reverse engineering. They take data and statistically model it in order to characterize it. There is substantial case law there and the EFF has a great FAQ on the topic: https://www.eff.org/issues/coders/reverse-engineering-faq

[-] Zapp@beehaw.org 3 points 2 years ago

The goal of AI is fictional, and there's no solid evidence today that it will ever stop being fiction.

What we have today are stupid learning algorithms that are surprisingly good at mimicking intelligent people.

The most apt comparison today is a particularly clever parrot.

I'm all for having the discussion about how to handle AI when we have it, but it's bad faith to apply it to what we have today.

Critically, what we have today will never ever go on strike, or really make any kind of correct moral decision on its own. We must treat it like dumb automation, because it is dumb automation.

[-] acastcandream@beehaw.org 1 points 2 years ago

"the fact of the matter is that the goal of AI is literally to replicate the function of a human brain"

…says who? That’s absolutely your feeling and not facts.

[-] conciselyverbose@kbin.social 8 points 2 years ago

Derivative works are only copyright violations when they replicate substantial portions of the original without changes.

The entirety of human civilization is derivative works. Derivative works aren't infringement.

[-] lostmypasswordanew@feddit.de 8 points 2 years ago

That's just not true

[-] conciselyverbose@kbin.social 2 points 2 years ago

It absolutely is. There's nothing out there in the past thousand years that isn't based on prior art. Copyright law only applies to direct copies, and there are explicit carve-outs beyond that which allow you to directly copy some things if your work is transformative.

[-] FaceDeer@kbin.social 5 points 2 years ago

It is not a derivative work, the model does not contain any recognizable part of the original material that it was trained on.

[-] frog@beehaw.org 14 points 2 years ago

Except when it produces exact copies of existing works, or when it includes a recognisable signature or watermark?

[+] NumbersCanBeFun@kbin.social 3 points 2 years ago

[deleted]

[-] frog@beehaw.org 7 points 2 years ago

The point is that if the model doesn't contain any recognisable parts of the original material it was trained on, how can it reproduce recognisable parts of the original material it was trained on?

[-] ricecake@beehaw.org 2 points 2 years ago

That's sorta the point of it.
I can recreate the phrase "apple pie" in any number of styles and fonts using my hands and a writing tool. Would you say that I "contain" the phrase "apple pie"? Where is the letter 'p' in my brain?

Specifically, the AI contains the relationship between sets of words, and sets of relationships between lines, contrasts and colors.
From there, it knows how to take a set of words and make an image that proportionally replicates those line-pattern and color relationships.

You can probably replicate the Getty images watermark close enough for it to be recognizable, but you don't contain a copy of it in the sense that people typically mean.
Likewise, because you can recognize the artist who produced a piece, you contain an awareness of that same relationship between color, contrast and line that the AI does. I could show you a Picasso you were unfamiliar with, and you'd likely know it was him based on the style.
You've been "trained" on his works, so you have internalized many of the key markers of his style. That doesn't mean you "contain" his works.

[-] frog@beehaw.org 2 points 2 years ago

Just because you can't point to a specific part of your brain that contains the letter 'p' doesn't mean it isn't in there somewhere. If you didn't contain the letter 'p', or Getty watermark, or Picasso's work, you wouldn't be able to recognise them when you saw them or tried to replicate them. The act of recognising something that is familiar is basically the brain comparing what the eye sees with what is stored in the memory. The brain stores it differently to an exact copy on a hard drive, but it does, nevertheless, contain everything that it remembers.

[-] ricecake@beehaw.org 1 points 2 years ago

I disagree that recognition implies you contain it. It's much closer to a description than the actual thing, and a description isn't the same as the thing. This is evidenced by you being able to look at a letter P in a font you've never seen before and recognize it without issue. If it was just comparison, you couldn't do that.

[-] FaceDeer@kbin.social 2 points 2 years ago

Ah, this old paper again. When it first came out it got raked over the coals pretty thoroughly. The authors used an older, poorly-trained version of Stable Diffusion that had been trained on only 160 million images and identified 350,000 images from the training set that had many duplicates and therefore could potentially be overfitted. They then generated 175 million images using tags commonly associated with those duplicate images.

After all that, they found 109 images in the output that looked like fuzzy versions of the input images. This is hardly a triumph of plagiarism.

As for the watermark, look closely at it. The AI clearly just replicated the idea of a Getty-like watermark, it's barely legible. What else would you expect when you train an AI on millions of images that contain a common feature, though? It's like any other common object - it thinks photographs often just naturally have a grey rectangle with those white squiggles in it, and so it tries putting them in there when it generates photographs.

These are extreme stretches and they get dredged up every time by AI opponents. Training techniques have been refined over time to reduce overfitting (since what's the point in spending enormous amounts of GPU power to produce a badly-artefacted copy of an image you already have?) so it's little wonder there aren't any newer, better papers showing problems like these.

[-] frog@beehaw.org 6 points 2 years ago

Nevertheless, the Getty watermark is a recognisable element from the images the model was trained on, therefore you cannot state that the models don't spit out images with recognisable elements from the training data.

[-] FaceDeer@kbin.social 1 points 2 years ago* (last edited 2 years ago)

Take a close look at the "watermark" on the AI-generated image. It's so badly mangled that you wouldn't have a clue what it says if you didn't already know what it was "supposed" to say. If that's really something you'd consider "copyrightable" then the whole world's in violation.

The only reason this is coming up in a copyright lawsuit is because Getty is using it as evidence that Stability AI used Getty images in the training set, not that they're alleging the AI is producing copyrighted images.

[-] frog@beehaw.org 6 points 2 years ago

I said "recognisable", and it is clearly recognisable as Getty's watermark, by virtue of the fact that many people, not only I, recognise it as such. You said that the models don't use any "recognizable part of the original material that it was trained on", and that is clearly false because people do recognise parts of the original material. You can't argue away other people's ability to recognise the parts of the original works that they recognise.

[-] FaceDeer@kbin.social 1 points 2 years ago

I said that models don't contain any recognizable part of the original material. They might be able to produce recognizable versions of parts of the original material, as we're seeing here. That's an important distinction. The model itself does not "contain" the images from the training set. It only contains concepts about those images, and concepts are not something that can be copyrighted.

If you want to claim copyright violations over specific output images, sure, that's valid. If I were to hit on exactly the right set of prompts and pseudorandom seed values to get a model to spit out an image that was a dead ringer for a copyrighted work, and I were to distribute copies of that resulting image, that's copyright violation. But the model itself is not a copyright violation, no more than an artist is inherently violating copyright because he could potentially pick up his paint brush and produce a copy of an existing work that he's previously seen.

In any event, as I said, Getty isn't suing over the copyright to their watermark.

this post was submitted on 09 Aug 2023

377 points (100.0% liked)
