this post was submitted on 11 Jan 2024
Technology
It seems that most of the people who think what humans and AIs do is the same thing are not actually creatives themselves. Their level of understanding of what it takes to draw goes no further than "well anyone can draw, children do it all the time". They have the same respect for writing, of course, equating the ability to string words together to write an email, with the process it takes to write a brilliant novel or script. They don't get it, and to an extent, that's fine - not everybody needs to understand everything. But they should at least have the decency to listen to the people that do get it.
Well, that’s not me. I’m a creative, and I see deep parallels between how LLMs work and how my own mind works.
Either you're vastly overestimating the degree of understanding and insight AIs possess, or you're vastly underestimating your own capabilities. :)
This whole AI craze has just shown me that people are losing faith in their own abilities and their ability to learn things. I've heard so many who use AI to generate "artwork" argue that they tried to do art "for years" without improving, and hence have come to the conclusion that creativity is a talent that only some have, instead of a skill you can learn and hone. Just because they didn't see results as fast as they'd have liked.
Very well said! Creativity is definitely a skill that requires work, and for which there are no short cuts. It seems to me that the vast majority of people using AI for artwork are just looking for a short cut, so they can get the results without having to work hard and practice. The one valid exception is when it's used by disabled people who have physical limitations on what they can do, which is a point that's brought up occasionally - and if that was the one and only use-case for these models, I think a lot of artists would actually be fine with that.
I started drawing seriously when I was 14. Looking at my old artwork, I didn't start improving fast until I was around 19 or 20. Not to say I didn't improve at all during those five to six years, but the pace did get faster once I had "learned to learn", so to speak. That is to say, it can take a lot of patience to get to a point where you actually start seeing improvement fast enough to stay motivated. But it is 100% worth it, because at the end you have a lot of things you have created with your own two hands.
And regarding the point on physical limitations, I can't blame anyone in a situation like that for using AI if they have no other chance for realising their imaginations. For others, it is completely possible and not reserved for people who have some mythical innate talent. Just grab a pen or a brush and enjoy the process of honing a fine skill regardless of the end result. ❤️
Alternatively, you might be vastly overestimating human "understanding and insight", or how much of it is really needed to create stuff.
Average humans, sure, don't have a lot of understanding and insight, and little is needed to be able to draw a doodle on some paper. But trained artists have a lot of it, because part of the process is learning to interpret artworks and work out why the artist used a particular composition or colour or object. To create really great art, you do actually need a lot of understanding and insight, because everything in your work will have been put there deliberately, not just to fill up space.
An AI doesn't know why it's put an apple on the table rather than an orange, it just does it because human artists have done it - it doesn't know what apples mean on a semiotic level to the human artist or the humans that look at the painting. But humans do understand what apples represent - they may not pick up on it consciously, but somewhere in the backs of their minds, they'll see an apple in a painting and it'll make the painting mean something different than if the fruit had been an orange.
Interestingly, LLMs seem to show emergent semiotic organization. Analyses of the neural network's activation space suggest that related concepts get trained into similar activation patterns, which is what allows LLMs to infer relationships zero-shot when run at a "temperature" (randomness level) in the right range.
Pairing an LLM with a Stable Diffusion model allows the resulting AI to... well, judge for yourself: https://llm-grounded-diffusion.github.io/
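For anyone unsure what "temperature" means in the comment above, here is a minimal sketch of the usual idea: the model's raw scores are divided by the temperature before being turned into probabilities, so a low temperature sharpens the distribution and a high one flattens it. The scores and the function name are made up for illustration; nothing here is taken from a specific LLM or library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(logits, 0.5)  # sharper: favours the top score
hot = softmax_with_temperature(logits, 2.0)   # flatter: more randomness
```

At the lower temperature the top candidate takes most of the probability mass, while at the higher one the less likely candidates get picked more often.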
I'm unconvinced that the fact they're getting better at following instructions, like putting objects where the prompter specifies, or changing the colour, or putting the right number of them, etc., means the model actually understands what the objects mean beyond their appearance. It doesn't understand the cultural meanings attached to each object, and thus is unable to truly make a decision about why it should place an apple rather than an orange, or how the message within the picture changes when it's a red sports car rather than a beige people-carrier.
Well, that's part of the LLM step, so let's put it to the test:
Image prompt:
Create an ad for a car. The target audience are young adults, with high income, and thrill seeking. Come up with a brand name fitting such a car, and include it in the image. For the car, come up with a shape, and color, best fitting of the target audience. Come up with an image layout, art style, and camera angle, best fitting of the target audience. Include between zero and two additional items that will make the overall picture more attractive to the target audience.
ChatGPT:
[('an advertisement for a car', [45, 58, 422, 396]), ('a high-performance car', [123, 193, 266, 128]), ('an edgy and dynamic brand name logo', [188, 10, 136, 30]), ('a sleek, red sports car', [168, 254, 176, 102])]
Background prompt: An adrenaline-pumping car advertisement targeting young adults with high income and a thrill-seeking spirit. The layout includes a bold brand name logo, a sleek red sports car, and a dynamic composition to captivate the audience.
Negative prompt: additional items
How did it know to pick a "sleek red sports car"? Or the rest of the elements?
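For reference, the layout list in the ChatGPT response above can be read mechanically. This is only a sketch under assumptions that the thread does not state: that each box is `[x, y, width, height]` measured from the top-left corner, and that the canvas is 512x512 pixels. The helper names are hypothetical.

```python
import ast

# The layout list exactly as it appears in the ChatGPT response.
raw = (
    "[('an advertisement for a car', [45, 58, 422, 396]), "
    "('a high-performance car', [123, 193, 266, 128]), "
    "('an edgy and dynamic brand name logo', [188, 10, 136, 30]), "
    "('a sleek, red sports car', [168, 254, 176, 102])]"
)

CANVAS = 512  # assumption: square canvas size in pixels

def parse_layout(text):
    """Parse the list of (caption, box) pairs without executing any code."""
    items = ast.literal_eval(text)
    return [(caption, tuple(box)) for caption, box in items]

def fits_canvas(box, size=CANVAS):
    """Check a box under the assumed [x, y, width, height] convention."""
    x, y, w, h = box
    return x >= 0 and y >= 0 and x + w <= size and y + h <= size

layout = parse_layout(raw)
for caption, box in layout:
    print(f"{caption}: box={box}, fits={fits_canvas(box)}")
```

Under those assumptions the response is just four captioned rectangles, with the diffusion model left to decide everything inside them.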
Because some of its training data included some of the many, many websites out there that describe marketing techniques. However, your example has actually proved my point - the red sports car is a car for insecure middle-aged men needing a mid-life crisis penis extension. The LLM has entirely missed that cultural association, and has basically suggested a red sports car for a young audience, when an alternate colour would actually be more appropriate - because it doesn't actually understand what a red sports car means.
It also hasn't actually picked any distinctive elements that couldn't be found on a website offering generic marketing advice. "A dynamic composition" is obvious, but it hasn't specified any details about what the composition should look like. It hasn't detailed any of the surrounding scenery. It says you should include a brand name logo, which was obvious because you prompted it to come up with a brand name, but it's failed to detail what those should actually look like. The entirety of the elements it's created here is "sleek red sports car", which has a cultural connotation inappropriate to the target audience, and the rest you could literally get from any search for "how do I create an advert for a car?"