submitted 5 months ago* (last edited 5 months ago) by elias_griffin@lemmy.world to c/technology@lemmy.world

I used to be the Security Team Lead for Web Applications at one of the largest government data centers in the world, but now I mostly do "source available" security work, mainly focused on BSD. I'm on GitHub, but I also run a self-hosted Gogs (the project Gitea was forked from) git repo at Quadhelion Engineering Dev.

Well, on that server I tried to deny AI with Suricata, robots.txt, "NO AI" licenses, Human Intelligence (HI) License links in the software, and "NO AI" comments in posts everywhere on the Internet where my software was posted. Here is what I found today after correlating all my logs of git clones and scrapes and tracing each one back to an IP, company, and server.
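For anyone who wants to try a similar correlation on their own server, here is a minimal sketch. It is not the method used above, just one way to start. It assumes the git host sits behind a web server writing Combined Log Format access logs, and it treats the smart-HTTP discovery request (`/info/refs?service=git-upload-pack`) as the sign of a clone or fetch. The sample lines, IPs, and the function name `clones_by_client` are made up for illustration.

```python
import re
from collections import Counter

# Combined Log Format: ip, identity, user, [time], "request", status, size,
# "referer", "user-agent". Assumed format; adjust to your own server's logs.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# A smart-HTTP clone or fetch begins with this discovery request.
CLONE_MARKER = "/info/refs?service=git-upload-pack"


def clones_by_client(lines):
    """Count git-upload-pack discovery requests per (IP, user agent)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and CLONE_MARKER in match.group("path"):
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts


# Hypothetical sample lines using documentation-reserved IP addresses.
sample = [
    '203.0.113.7 - - [15/Jun/2024:10:00:00 +0000] "GET /me/repo.git/info/refs?service=git-upload-pack HTTP/1.1" 200 512 "-" "git/2.45.0"',
    '203.0.113.7 - - [15/Jun/2024:11:00:00 +0000] "GET /me/repo.git/info/refs?service=git-upload-pack HTTP/1.1" 200 512 "-" "git/2.45.0"',
    '198.51.100.9 - - [15/Jun/2024:12:00:00 +0000] "GET /me/repo HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

for (ip, agent), n in clones_by_client(sample).most_common():
    print(ip, agent, n)
```

Mapping each IP to a company or hosting provider would be a separate step (reverse DNS or WHOIS lookups), left out here to keep the sketch offline.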

Having formerly been loath to give even my thinking pattern to a potential enemy, I asked Perplexity AI questions specifically about BSD security, a very niche topic. Although there is a huge pool of data on BSD in general, built up over many decades, my type of software is pretty unique. It is buried, since it does not come up in the first two pages of a GitHub search for "BSD Security", which is as far as most users will click. It is very recent compared to the "dead pool" of old knowledge. And it is fairly well received yet not generally popular, so GitHub Traffic Analysis is very useful.

The traceback and AI result analysis shows the following:

  1. GitHub cloning vs. visitor activity in the Traffic tab DOES NOT MATCH any useful pattern for me, the engineer. My rough estimate of the likelihood of AI training on my own repositories: 60% of clones are AI/automata.
  2. GitHub README.md is not licensable material and is a public document able to be trained on no matter what the software license, copyright, statements, or any technical measures used to dissuade/defeat it.
     a. I'm trying to see whether tracking down if any README.md, no matter the context, is trainable is a solvable engineering project given my life constraints.
  3. Plagiarism of technical writing: Probable
  4. Theft of programming "snippets" or perhaps "single lines of code" and the overall logic design pattern for that solution: Probable
  5. Supremely interesting choice of datasets used vs. available, in summary use, but also checking for validation against other software, weighted by reputation factors, with "Coq"-like proofing, GitHub "Stars", employer history?
  6. Even though I can see my own writing and formatting taken right out of my README.md, the citation was to "Phoronix Forum", and that isn't true. That's like citing your post as something "Tick Tock" said. I wrote that; a real flesh-and-blood human being took comparatively massive amounts of time to do it. My birth name is there in the post 2 times [EDIT: post signature with my name no longer? Name not in "about" either hmm], in the repo, in the comments, all over the Internet.

[EDIT continued] Did it choose the Phoronix vector to that information because it was less attributable? It found my other repos in other ways. My Phoronix handle is the same as my GitHub username, and my handle is my name, easily inferable from either, as well as from a biography link with my full name in the about. [EDIT cont end]
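On the clone vs. visitor mismatch in the first finding, a rough way to look at it yourself is sketched below. It assumes you have already fetched the two JSON payloads from the GitHub REST API traffic endpoints (`/repos/{owner}/{repo}/traffic/clones` and `/repos/{owner}/{repo}/traffic/views`), which report per-day `count` and `uniques`. The numbers and the function name `clone_view_mismatch` are invented for illustration, and a flagged day is only a hint of automated cloning, not proof of AI training.

```python
def clone_view_mismatch(clones_payload, views_payload):
    """Return (day, unique_cloners, unique_visitors) for each day on which
    more unique clients cloned the repo than visited its pages."""
    visitors_by_day = {
        day["timestamp"]: day["uniques"] for day in views_payload["views"]
    }
    flagged = []
    for day in clones_payload["clones"]:
        visitors = visitors_by_day.get(day["timestamp"], 0)
        if day["uniques"] > visitors:
            flagged.append((day["timestamp"], day["uniques"], visitors))
    return flagged


# Hypothetical payloads shaped like the traffic API responses.
clones = {
    "count": 14,
    "uniques": 11,
    "clones": [
        {"timestamp": "2024-06-01T00:00:00Z", "count": 3, "uniques": 2},
        {"timestamp": "2024-06-02T00:00:00Z", "count": 11, "uniques": 9},
    ],
}
views = {
    "count": 21,
    "uniques": 6,
    "views": [
        {"timestamp": "2024-06-01T00:00:00Z", "count": 18, "uniques": 5},
        {"timestamp": "2024-06-02T00:00:00Z", "count": 3, "uniques": 1},
    ],
}

for day, cloners, visitors in clone_view_mismatch(clones, views):
    print(day, "unique cloners:", cloners, "unique visitors:", visitors)
```

Comparing unique cloners to unique visitors is a design choice made here for simplicity; CI jobs and mirrors also clone without visiting, so those would need to be ruled out separately.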

You should test this out for yourself as I'm not going to take days or a week making a great presentation of a technical case. Check your own niche code, a specific code question of application, or make a mock repo with super niche stuff with lots of code in the README.md and then check it against AI every day until you see it.
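If you run that experiment, one simple way to check an AI answer against your README is to look for long runs of identical words. The sketch below compares word 5-grams between the two texts; the window size of 5, the function names, and the sample sentences are all assumptions for illustration, not anything taken from a real README or a real AI response.

```python
import re


def word_ngrams(text, n=5):
    """Return the set of n-word sequences in text, lowercased, punctuation dropped."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(readme_text, answer_text, n=5):
    """Return the n-word sequences that appear in both texts."""
    return word_ngrams(readme_text, n) & word_ngrams(answer_text, n)


# Hypothetical README sentence and AI answer.
readme = ("harden the jail by mounting devfs with a restricted ruleset "
          "before starting services")
answer = ("You can harden the jail by mounting devfs with a restricted "
          "ruleset, then start services")

overlap = shared_ngrams(readme, answer)
print(len(overlap), "shared 5-word sequences")
for gram in sorted(overlap):
    print(" ".join(gram))
```

A handful of shared 5-word sequences on a niche topic is suggestive rather than conclusive, since common phrasing can overlap by chance; embedding a unique made-up term in the mock README makes a match much harder to explain away.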

P.S. I pulled up TabNine and tried to write Ruby so complicated and magically mashed that the AI could offer me nothing, just as an AI obfuscation/smartness test. You should try something similar to see what results you get.

[-] catloaf@lemm.ee 1 points 5 months ago

Authors shouldn't be paid for their labor?

[-] VictoriaAScharleau@lemmy.world -2 points 5 months ago

I didn't say that. you're making a leap of logic

[-] catloaf@lemm.ee 1 points 5 months ago

Yes, I am. Logically, if an author creates something and cannot control its distribution, it is available to everyone at no cost, therefore the author will never see a dime for their labor.

This discounts the donation model, because in practice, it rarely pays the bills. It also ignores patronage, because I doubt that you want the creation of art to be dependent on the generosity of the rich.

Thus, it makes sense for the author to maintain certain rights over the product of their labor. They provide the work under their terms, e.g. requiring payment for a copy, and that relatively low cost to the average Joe provides the money they need to buy food, pay rent, etc.

[-] VictoriaAScharleau@lemmy.world -2 points 5 months ago

you recognize two well known cases where copyright is not necessary to get paid. I don't think there is even an argument at this point. have a nice day.

[-] catloaf@lemm.ee 1 points 5 months ago

Yes, and I said they're not feasible, because they've been tried in the past and present and found to not work very well. If you disagree, I'm happy to hear your thoughts.

[-] VictoriaAScharleau@lemmy.world -2 points 5 months ago

you claim they are not feasible, but we know people do get paid through them, so you're just lying.

[-] catloaf@lemm.ee 1 points 5 months ago

Yes, they do get paid, but not a living wage.

For the donation model, most people doing that work that I've talked to have day jobs, and do the other work on the side. There's a reason the donation platform buttons say things like "buy me a coffee" and not "pay my rent for the month": it's because the donations don't cover rent.

For the patronage model, like I said, I don't think anyone wants work like this to be controlled by a handful of rich people.

I'm still interested in hearing your thoughts if you have more than "nuh uh" and "you're lying".

[-] VictoriaAScharleau@lemmy.world -2 points 5 months ago

only one person would need to be able to live on either model to disprove your claim. since that has definitely happened, you're definitely lying.

[-] catloaf@lemm.ee 1 points 5 months ago

Just because it works once doesn't mean it'll work all the time for everyone.

[-] VictoriaAScharleau@lemmy.world -1 points 5 months ago

any time it has worked proves you are wrong. the top 50 patreons clear over $100k a year

[-] catloaf@lemm.ee 1 points 5 months ago

Exactly, and since it certainly follows a long tail distribution, the rest of the 250,000 creators on patreon make a tiny fraction of that. For the vast majority of people, it doesn't provide a primary income.

I'm not sure you want to rely on Patreon in any case, since it also relies on the retention of rights for profit. In your scenario, when they upload to Patreon, anyone involved could tell them to get fucked and pay the author nothing.

[-] VictoriaAScharleau@lemmy.world 0 points 5 months ago

> Exactly, and since it certainly follows a long tail distribution, the rest of the 250,000 creators on patreon make a tiny fraction of that. For the vast majority of people, it doesn't provide a primary income.

this is true for the vast majority of storytellers and artists and musicians through all of history.

[-] VictoriaAScharleau@lemmy.world -1 points 5 months ago

> I'm not sure you want to rely on Patreon in any case, since it also relies on the retention of rights for profit. In your scenario, when they upload to Patreon, anyone involved could tell them to get fucked and pay the author nothing.

anyone could do that now. people still get paid.

this post was submitted on 15 Jun 2024
78 points (77.5% liked)
