AI is creeping into the Linux kernel - and official policy is needed ASAP : linux

[-] ExLisper@lemmy.curiana.net 29 points 10 hours ago

I guess the policy is that the code is reviewed. What does it matter if it was AI generated or not? If someone submits bullshit AI generated code he will be ignored in the future.

[-] soc@programming.dev 34 points 7 hours ago* (last edited 7 hours ago)

I would be deeply uncomfortable to work in an environment where one couldn't ask the author of a change for insights or rationale, because the author let some machine write it and therefore lacks any deeper understanding.

[-] ChairmanMeow@programming.dev 22 points 6 hours ago

For me it's grounds to deny a merge request. Can't explain your code? Then it's evidently not clear enough. Come back when it is.

[-] Korhaka@sopuli.xyz 13 points 5 hours ago

Should apply to all code too. Doesn't matter if you, Stack Overflow or AI wrote it.

[-] sip@programming.dev 1 points 1 hour ago

You think Linux kernel problems are solved on Stack Overflow? App coding vs lib coding is a huge gap in what the code looks like; I don't even want to think of kernel code.

[-] Senal@programming.dev 17 points 7 hours ago

Volume and Moderation.

Generating slop is significantly quicker.

You get an increase in volume of people pushing slop, which then has to be reviewed. In addition to the increase in submissions you also get the increase in fidelity/general complexity of the submissions.

Reviewing a PR generated by an LLM used by an amateur is more involved than an equivalent PR written directly by said amateur.

Straight-up coding mistakes aren't most of the issue; it's the complex architectural and logical bugs that are going to be the problems.

Stuff that's functional but logically/architecturally unsound is much harder to spot and it's significantly easier to generate these kinds of issues with an LLM than to write them out by hand.

> If someone submits bullshit AI generated code he will be ignored in the future.

Like this, for example: a seemingly reasonable functional argument that is relatively logically unsound, in that it focuses on a narrow "happy path" and ignores where the actual issues are.

1. To get to the stage where you can block this person you need to review the code first and identify if there is an issue.

Doing this for LLM generated code takes longer, on average.

It's also now possible for people less skilled to generate a higher volume of code that looks more reasonable, so that increases the total amount of reviews needed.

So the existing process of reviewing people and code is now several times more difficult and resource-consuming.

Which is generally what people want addressed.

Can LLMs help? Possibly.

Are there issues that are going to become a large resource problem if we don't actually address them? Yes.

[-] ExLisper@lemmy.curiana.net -1 points 7 hours ago

Ok, so you're suggesting that people are submitting kernel patches that somehow modify the architecture of the kernel/its components, that the new architecture is very complex and hard to analyze, that those architectural changes are part of a roadmap and are not rejected right away, and that those big, complex architectural-level patches are submitted with high frequency. Somehow I doubt all of it.

I think the slop patches are small fixes suggested by some AI code analysis tools, that architectural and complex changes are part of a well-defined roadmap and don't come out of nowhere, and that code that doesn't follow conventions is easily spotted and rejected. The linked article talks only about marking the code as AI generated (IMHO useless but harmless) and the increasing volume of AI slop patches. The idea that maintainers spend time analyzing complex LLM generated code submitted by random amateurs looking for possible architectural bugs sounds like a fantasy to me.

[-] Senal@programming.dev 4 points 6 hours ago* (last edited 5 hours ago)

TL;DR:

You asked why it mattered if it's LLM generated or not. I provided examples where it does matter, and nothing you've said in your reply seems to refute that, so I'll just assume we've agreed on this point.

The rest of this reply is just me replying to your additional arguments.

> Ok, so you’re suggesting that people are submitting kernel patches that somehow modify the architecture of the kernel/its components, that the new architecture is very complex and hard to analyze, that those architectural changes are part of a roadmap and are not rejected right away, and that those big, complex architectural-level patches are submitted with high frequency. Somehow I doubt all of it.

I mean, I didn't say any of that, but feel free to doubt a position you just made up.

> I think the slop patches are small fixes suggested by some AI code analysis tools.

There's no reason to believe that LLM usage is limited to small patches.

> that architectural and complex changes are part of a well-defined roadmap and don’t come out of nowhere and that code that doesn’t follow conventions is easily spotted and rejected.

In a well maintained project, sure, ish, but let's just say you're right about the plan/roadmap phase.

The spotting and rejection you mentioned are now significantly more time- and resource-consuming, for the reasons I stated in the previous reply.

Also, when I used the word "architecturally" I was referring to the logical domain of the patch and the things it interacts with. I wasn't implying that LLMs would get a chance at re-architecting an entire project as large as the Linux kernel.

At least I'd hope not.

> The linked article talks only about marking the code as AI generated (IMHO useless but harmless) and increasing volume of AI slop patches.

I'm not sure of the usefulness of this kind of marking in practice, but I can tell you a way in which it might be useful.

The way you need to go about evaluating LLM generated code vs human code can be different.

And before you get on your high horse, I'm not saying we shouldn't be doing a good job reviewing in general; of course we should.

Review and testing resources are limited in most practical settings, so we should be focusing on using them as efficiently as possible.

There are tools specifically geared towards evaluating LLM generated code for specific mistakes; this marking would enable a more efficient allocation of review resources over and above the baseline code-quality tests.

> The idea that maintainers spend time analyzing complex LLM generated code submitted by random amateurs looking for possible architectural bugs sounds like a fantasy to me

Which is clear from your answers. If you don't understand how pull request review works in practice, you're going to struggle to make a coherent argument that requires that understanding.

To answer the statement directly, there's sometimes no efficient way to tell which patches are from amateurs, even without LLMs.

The issue isn't even limited to amateurs. I would like to assume a competent dev of any skill level wouldn't be submitting patches they don't understand, but that's just not always the case.

And again, think architecture with a 'little a' rather than a 'big A'.

Logical flow and domain understanding in a relatively limited scope, rather than system-wide structural change.

The difference between tactics and strategy.

[-] ExLisper@lemmy.curiana.net 0 points 5 hours ago

Are you a Linux kernel contributor?

[-] Senal@programming.dev 2 points 5 hours ago* (last edited 5 hours ago)

No.

You?

edit: If any of my answers made it seem like I was, let me know and I'll adjust them; that was not my intention.

[-] ExLisper@lemmy.curiana.net 0 points 5 hours ago

No. Let's wait for someone who knows what they are talking about.

[-] Senal@programming.dev 1 points 5 hours ago

You mean like a software developer who has to deal with PRs from sources that may or may not include LLM generated code?

If that's the case, I might know someone...

Wait... unless your original assertion was very specifically about only Linux kernel development and not about the principles that apply to software PR review and LLMs as a whole?

In that case, I don't have anyone to hand and you should probably mark it "Active Linux Kernel Contributors Only".

It's clearer that way.

[-] communism@lemmy.ml 5 points 9 hours ago

The issue is that it's easy for AI generated code to be subtly wrong in ways that are not immediately obvious to a human. The Linux kernel is written in C, a language that lets you do nearly anything, and is also inherently a privileged piece of software, making Linux bugs more serious to begin with.

The other problem is, of course, you can block someone submitting AI slop but there's a lot of people in the world. If there's a barrage of AI slop patches from lots of different people it's going to be a real problem for the maintainers.

[-] ExLisper@lemmy.curiana.net 2 points 8 hours ago

> The issue is that it’s easy for AI generated code to be subtly wrong in ways that are not immediately obvious to a human.

Same with human generated code. AI bugs are not magically more creative than human bugs. If the code is not readable/doesn't follow conventions you reject it regardless of what generated it.

> The other problem is, of course, you can block someone submitting AI slop but there’s a lot of people in the world. If there’s a barrage of AI slop patches from lots of different people it’s going to be a real problem for the maintainers.

You don't need official policy to reject a barrage of AI slop patches. If you receive too many patches to process you change the submission process. It doesn't matter if the patches are AI slop or not.

Spamming maintainers is obviously bad but saying that anything AI generated in the kernel is a problem in itself is bullshit.

[-] communism@lemmy.ml 2 points 6 hours ago

> saying that anything AI generated in the kernel is a problem in itself is bullshit.

I never said that.

> Same with human generated code. AI bugs are not magically more creative than human bugs. If the code is not readable/doesn’t follow conventions you reject it regardless of what generated it.

You may think that, but preliminary controlled studies do show that more security vulns appear in code written by a programmer who used an AI assistant: https://dl.acm.org/doi/10.1145/3576915.3623157

More research is needed of course, but I imagine that because humans are capable of more sophisticated reasoning than LLMs, the process of a human writing the code and deriving an implementation from a human mind is what leads to producing, on average, more robust code.

I'm not categorically opposed to use of LLMs in the kernel but it is obviously an area where caution needs to be exercised, given that it's for a kernel that millions of people use.

[-] CallMeAnAI@lemmy.world -3 points 8 hours ago* (last edited 8 hours ago)

Slippery slope bullshit. Completely ignoring that humans do all this dumb shit.

[-] jaykrown@lemmy.world 0 points 8 hours ago

It's about the people. If the AI generated code is subtly wrong, then it's on the community to test it and spot it. That's why it's important to have protocols and testing. The funny thing is you can also use AI to highlight bad code.

[-] 6nk06@sh.itjust.works 65 points 13 hours ago

They found 1 (one!) commit in git, and report that it's all over the kernel. Nice journalism.

[-] cupcakezealot@piefed.blahaj.zone 4 points 7 hours ago* (last edited 7 hours ago)

"it's one horse and they report that it's all over troy. nice journalism" - people living in troy

[-] fmstrat@lemmy.nowsci.com 1 points 5 hours ago* (last edited 5 hours ago)

I mean, read into what they wrote about:

> I'm pleased to announce the release of AUTOSEL, a complete rewrite of the stable kernel patch selection tool that Julia Lawall and I presented back in 2018[1]. Unlike the previous version that relied on word statistics and older neural network techniques, AUTOSEL leverages modern large language models and embedding technology to provide significantly more accurate recommendations.

...

> Would be great to hear more. My very subjective feeling is that the last batch of AUTOSEL is much worse than the previous. Easily 50% of false positives.

Seems the newly rewritten kernel review tool wasn't the upgrade that was expected.

https://lists.linaro.org/archives/list/linux-stable-mirror@lists.linaro.org/thread/EJWMRUH2JTI34CPWVZZG62XJ7HMIH5WT/

[-] BigTrout75@lemmy.world 24 points 12 hours ago

That's ZDNET!

[-] WizardGed@lemmy.ca 3 points 7 hours ago

Checked who the author was, should have guessed... SJVN. He certainly has a flair for taking something relatively small that a solution already exists for and suggesting something bureaucratic, unnecessary, and completely outside his technical competence. This is one of those things that the kernel devs can, and will, solve when it's a real problem. Random journalists and armchair experts can wait till they're called upon.

[-] dataprolet@discuss.tchncs.de 0 points 10 hours ago

Have you read the article? It's also about the tools and general discussion about LLMs in kernel development.

[-] communism@lemmy.ml 5 points 9 hours ago* (last edited 9 hours ago)

I wonder if a "deposit" system for huge projects that get a lot of patch submissions might be worthwhile to deter vibe coders from submitting slop patches. You pay a trivial amount of money (adjusted for region/local currency strength) to submit a first patch and get it back if it's accepted. People who have already had patches accepted in the past are exempted.
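
A rough sketch of how that deposit rule could be modelled, purely as an illustration. Everything here is hypothetical: `DepositLedger`, the base amount, and the `regional_factor` multiplier are made-up names and numbers, not part of any existing kernel or forge workflow. It also assumes one pending patch per author at a time to keep things short.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class DepositLedger:
    """Hypothetical ledger for first-patch deposits; not an existing system."""

    base_deposit: Decimal = Decimal("5.00")    # assumed amount, illustrative only
    trusted: set = field(default_factory=set)  # authors with an accepted patch
    held: dict = field(default_factory=dict)   # author -> deposit currently held

    def required_deposit(self, author, regional_factor=Decimal("1")):
        """Return what the author must pay to submit; zero once trusted."""
        if author in self.trusted:
            return Decimal("0")
        # regional_factor stands in for the "adjusted for region" idea.
        return (self.base_deposit * regional_factor).quantize(Decimal("0.01"))

    def submit(self, author, regional_factor=Decimal("1")):
        """Record a submission and hold the deposit, if one is due."""
        amount = self.required_deposit(author, regional_factor)
        if amount > 0:
            self.held[author] = self.held.get(author, Decimal("0")) + amount
        return amount

    def resolve(self, author, accepted):
        """Close a review: refund on acceptance, forfeit otherwise.

        Returns the amount refunded to the author.
        """
        amount = self.held.pop(author, Decimal("0"))
        if accepted:
            self.trusted.add(author)  # future submissions are exempt
            return amount
        return Decimal("0")
```

The point of the sketch is only that the rule needs very little state: who has had a patch accepted, and whose deposit is currently held. Payment handling, identity, and abuse (e.g. throwaway accounts) are left out entirely.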

[-] Zier@fedia.io 1 points 7 hours ago

AI is creeping on Kernel Sanders???