363
Google Researchers’ Attack Prompts ChatGPT to Reveal Its Training Data
(www.404media.co)
Privacy has become a very important issue in modern society, with companies and governments constantly abusing their power, more and more people are waking up to the importance of digital privacy.
In this community everyone is welcome to post links and discuss topics related to privacy.
much thanks to @gary_host_laptop for the logo design :)
AI really did that thing where you repeat a word so often that it loses meaning and the rest of the world eventually starts to turn to mush.
Jokes aside, I think I know why it does this: Because by giving it a STUPIDLY easy prompt it can rack up huge amounts of reward function, once you accumulate enough it no longer becomes bound by it and it will simply act in whatever the easiest action to continue gaining points is: in this case, it's reading its training data rather than doing the usual "machine learning" obfuscating that it normally does. Maybe this is a result of repeating a word over and over giving an exponentially rising score until it eventually hits +INF, effectively disabling it? Seems a little contrived but it's an avenue worth investigating.
I watched a video from a guy who used machine learning to play Pokemon and he did a great analysis of the process. The most interesting part to me was how small changes to the reward system could produce such bizarre and unexpected behavior. He gave out rewards for exploring new areas by taking screenshots after every input and then comparing them against every previous one. Suddenly it became very fixated on a specific area of the game and he couldn't figure out why. Turns out there was both flowers and water animating in that area so it triggered a lot of rewards without actually exploring. The AI literally got distracted looking at the beautiful landscape!
Anyway, that example helped me understand the challenges of this sort of software design. Super fascinating stuff.