I guess the policy is that the code is reviewed. What does it matter if it was AI generated or not? If someone submits bullshit AI generated code he will be ignored in the future.
I would be deeply uncomfortable to work in an environment where one couldn't ask the author of a change for insights or rationale, because the author let some machine write it and therefore lacks any deeper understanding.
For me it's grounds to deny a merge request. Can't explain your code? Then it's evidently not clear enough. Come back when it is.
Should apply to all code too. Doesn't matter if you, stackoverflow or AI wrote it.
You think Linux kernel problems are solved on Stack Overflow? App coding vs lib coding is a huge gap in what the code looks like; I don't even want to think of kernel code.
Volume and Moderation.
Generating slop is significantly quicker.
You get an increase in volume of people pushing slop, which then has to be reviewed. In addition to the increase in submissions you also get the increase in fidelity/general complexity of the submissions.
Reviewing a PR generated by LLMs used by amateurs is more involved than an equivalent PR written directly by said amateur.
Straight up coding mistakes aren't most of the issue, it's the complex architectural and logical bugs that are going to be the problems.
Stuff that's functional but logically/architecturally unsound is much harder to spot and it's significantly easier to generate these kinds of issues with an LLM than to write them out by hand.
If someone submits bullshit AI generated code he will be ignored in the future.
Like this for example, a seemingly reasonable functional argument that is relatively logically unsound, in that it focuses on a narrow "happy path" and ignores where the actual issues are.
1. To get to the stage where you can block this person you need to review the code first and identify if there is an issue. Doing this for LLM generated code takes longer, on average.
2. It's also now possible for people less skilled to generate a higher volume of code that looks more reasonable, so that increases the total amount of reviews needed.
So the existing process of reviewing people and code is now several times more difficult and resource-consuming.
Which is generally what people want addressed.
Can LLMs help? Possibly.
Are there issues that are going to become a large resource problem if we don't actually address them? Yes.
Ok, so you're suggesting that people are submitting kernel patches that somehow modify the architecture of the kernel/its components, that the new architecture is very complex and hard to analyze, that those architectural changes are part of a roadmap and are not rejected right away and that those big, complex architectural level patches are submitted with high frequency. Somehow I doubt all of it.
I think the slop patches are small fixes suggested by some AI code analysis tools, that architectural and complex changes are part of a well-defined roadmap and don't come out of nowhere and that code that doesn't follow conventions is easily spotted and rejected. The linked article talks only about marking the code as AI generated (IMHO useless but harmless) and increasing volume of AI slop patches. The idea that maintainers spend time analyzing complex LLM generated code submitted by random amateurs looking for possible architectural bugs sounds like a fantasy to me.
TL;DR:
You asked why it mattered if it's LLM generated or not, i provided examples where it does matter, nothing you've said in your reply seems to refute that so I'll just assume we've agreed on this point.
The rest of this reply is just me replying to your additional arguments.
Ok, so you’re suggesting that people are submitting kernel patches that somehow modify the architecture of the kernel/its components, that the new architecture is very complex and hard to analyze, that those architectural changes are part of a roadmap and are not rejected right away and that those big, complex architectural level patches are submitted with high frequency. Somehow I doubt all of it.
I mean, i didn't say any of that but feel free to doubt a position you just made up.
I think the slop patches are small fixes suggested by some AI code analysis tools.
There's no reason to believe that LLM usage is limited to small patches.
that architectural and complex changes are part of a well-defined roadmap and don’t come out of nowhere and that code that doesn’t follow conventions is easily spotted and rejected.
In a well maintained project, sure, ish, but let's just say you're right about the plan/roadmap phase.
The spotting and rejection you mentioned are now significantly more time and resource consuming for the reasons i stated in the previous reply.
Also when i used the word architecturally i was referring to the logical domain of the patch and the things it interacts with, i wasn't implying that LLMs would get a chance at re-architecting an entire project as large as the Linux kernel.
At least i'd hope not.
The linked article talks only about marking the code as AI generated (IMHO useless but harmless) and increasing volume of AI slop patches.
I'm not sure of the usefulness of this kind of marking in practice, but i can tell you a way in which it might be useful.
The way you need to go about evaluating LLM generated code vs human code can be different.
And before you get on your high horse I'm not saying we shouldn't be doing a good job reviewing in general, of course we should.
Review and testing resources are limited in most practical settings, we should be focusing on best utilising that resource in the most efficient manner possible.
There are tools specifically geared towards evaluating LLM generated code for specific mistakes, this marking would enable a more efficient usage/allocation of review resources over and above the baseline code-quality tests.
The idea that maintainers spend time analyzing complex LLM generated code submitted by random amateurs looking for possible architectural bugs sounds like a fantasy to me
Which is clear from your answers, if you don't understand how pull request review works in practice you're going to struggle to make a coherent argument that requires that understanding.
To answer the statement directly, there's sometimes no efficient way to tell which patches are from amateurs, even without LLMs.
The issue isn't even limited to amateurs, i would like to assume a competent dev of any skill level wouldn't be submitting patches they don't understand but that's just not always the case.
and again, think architecture with a 'little a' rather than a 'big A'.
Logical flow and domain understanding in a relatively limited scope, rather than system-wide structural change.
The difference between tactics and strategy.
Are you a Linux kernel contributor?
No.
You?
edit: If any of my answers made it seem like i was, let me know and i'll adjust them, that was not my intention.
No. Let's wait for someone who knows what they are talking about.
You mean like a software developer who has to deal with PRs from sources that may or may not include LLM generated code?
If that's the case, i might know someone...
Wait... unless your original assertion was very specifically about only Linux kernel development and not about the principles that apply to software PR review and LLMs as a whole?
In that case, i don't have anyone to hand and you should probably mark it "Active Linux Kernel Contributors Only".
It's clearer that way.
The issue is that it's easy for AI generated code to be subtly wrong in ways that are not immediately obvious to a human. The Linux kernel is written in C, a language that lets you do nearly anything, and is also inherently a privileged piece of software, making Linux bugs more serious to begin with.
The other problem is, of course, you can block someone submitting AI slop but there's a lot of people in the world. If there's a barrage of AI slop patches from lots of different people it's going to be a real problem for the maintainers.
The issue is that it’s easy for AI generated code to be subtly wrong in ways that are not immediately obvious to a human.
Same with human generated code. AI bugs are not magically more creative than human bugs. If the code is not readable/doesn't follow conventions you reject it regardless of what generated it.
The other problem is, of course, you can block someone submitting AI slop but there’s a lot of people in the world. If there’s a barrage of AI slop patches from lots of different people it’s going to be a real problem for the maintainers.
You don't need an official policy to reject a barrage of AI slop patches. If you receive too many patches to process you change the submission process. It doesn't matter if the patches are AI slop or not.
Spamming maintainers is obviously bad but saying that anything AI generated in the kernel is a problem in itself is bullshit.
saying that anything AI generated in the kernel is a problem in itself is bullshit.
I never said that.
Same with human generated code. AI bugs are not magically more creative than human bugs. If the code is not readable/doesn’t follow conventions you reject it regardless of what generated it.
You may think that, but preliminary controlled studies do show that more security vulns appear in code written by a programmer who used an AI assistant: https://dl.acm.org/doi/10.1145/3576915.3623157
More research is needed of course, but I imagine that because humans are capable of more sophisticated reasoning than LLMs, the process of a human writing the code and deriving an implementation from a human mind is what leads to producing, on average, more robust code.
I'm not categorically opposed to use of LLMs in the kernel but it is obviously an area where caution needs to be exercised, given that it's for a kernel that millions of people use.
Slippery slope bullshit. Completely ignoring that humans do all this dumb shit.
It's about the people. If the AI generated code is subtly wrong, then it's on the community to test it and spot it. That's why it's important to have protocols and testing. The funny thing is you can also use AI to highlight bad code.
They found 1 (one!) commit in git, and report that it's all over the kernel. Nice journalism.
"it's one horse and they report that it's all over troy. nice journalism" - people living in troy
I mean, read into what they wrote about:
I'm pleased to announce the release of AUTOSEL, a complete rewrite of the stable kernel patch selection tool that Julia Lawall and I presented back in 2018[1]. Unlike the previous version that relied on word statistics and older neural network techniques, AUTOSEL leverages modern large language models and embedding technology to provide significantly more accurate recommendations.
...
Would be great to hear more. My very subjective feeling is that the last batch of AUTOSEL is much worse than the previous. Easily 50% of false positives.
Seems the newly rewritten tool wasn't the upgrade that was expected.
That's ZDNET!
Checked who the author was, should have guessed... SJVN. He certainly has a flair for taking something relatively small, that a solution already exists for, and suggesting something bureaucratic, unnecessary, and completely outside his technical competence. This is one of those things that the kernel devs can, and will, solve when it's a real problem. Random journalists and armchair experts can wait till they're called upon.
Have you read the article? It's also about the tools and general discussion about LLMs in kernel development.
I wonder if a "deposit" system for huge projects that get a lot of patch submissions might be worthwhile to deter vibe coders from submitting slop patches. You pay a trivial amount of money (adjusted for region/local currency strength) to submit a first patch and get it back if it's accepted. People who have already had patches accepted in the past are exempted.
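To make the idea concrete, here is a rough sketch of how the bookkeeping for such a deposit could work. Everything in it is hypothetical: `DepositLedger`, its method names, and the flat `deposit` amount are made up for illustration, the regional adjustment is left out, and no real project's submission process is being described.

```python
class DepositLedger:
    """Hypothetical tracker for a first-patch deposit scheme (illustration only)."""

    def __init__(self, deposit=5):
        # Assumed flat amount; a real scheme would adjust per region/currency.
        self.deposit = deposit
        self.trusted = set()   # authors with at least one accepted patch
        self.held = {}         # patch_id -> (author, amount currently held)

    def submit(self, patch_id, author):
        """Record a submission and return the deposit the author must pay."""
        amount = 0 if author in self.trusted else self.deposit
        self.held[patch_id] = (author, amount)
        return amount

    def resolve(self, patch_id, accepted):
        """Close out a patch and return the amount refunded to the author."""
        author, amount = self.held.pop(patch_id)
        if accepted:
            # Accepted authors are exempt from deposits on future patches.
            self.trusted.add(author)
            return amount
        # Rejected patch: the deposit is forfeited in this sketch.
        return 0


ledger = DepositLedger(deposit=5)
first = ledger.submit("patch-1", "alice")      # newcomer pays the deposit
refund = ledger.resolve("patch-1", accepted=True)
second = ledger.submit("patch-2", "alice")     # now exempt
```

Whether a rejected patch should forfeit the whole deposit, or only patches judged to be slop, is a policy question the sketch does not try to answer.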
AI is creeping on Kernel Sanders???
Linux
A community for everything relating to the GNU/Linux operating system (except the memes!)