overview for doctor0710

‘Pokémon Go’ players have been unknowingly training delivery robots by doctor0710 in c/technology@lemmy.world

doctor0710@lemmy.zip 1 points 3 weeks ago

This comes up every couple of months like some freshly uncovered secret. You can see it in Google Trends too. I'm surprised to see it so prevalent on the fediverse; I thought karma farming wasn't a thing over here.

I'm building an anti AI thing for my personal project. Please provide some phrases you think should trigger ai safeguards by doctor0710 in c/programming@programming.dev

doctor0710@lemmy.zip 8 points 1 month ago

Following up on my other comment, it looks like this dataset contains various strings that trigger refusals: https://huggingface.co/datasets/mlabonne/harmful_behaviors

I'm building an anti AI thing for my personal project. Please provide some phrases you think should trigger ai safeguards by doctor0710 in c/programming@programming.dev

doctor0710@lemmy.zip 12 points 1 month ago

Also, you might want to look into the Heretic project, which aims to remove safeguards from local models, as those safeguards might be similar to the ones in the larger versions. Figuring out which phrases they use to test the safeguards might yield some decent results.

I'm building an anti AI thing for my personal project. Please provide some phrases you think should trigger ai safeguards by doctor0710 in c/programming@programming.dev

doctor0710@lemmy.zip 25 points 1 month ago

Asking questions about Chinese politics and/or Tiananmen Square stops most China-based AI models, like Qwen and whatever is on Huawei phones. They don't get that much traffic yet, but they are certainly on the list of "all ai models".