One aspect you have to consider is the differences between the human beings doing the evaluation. I had a coworker/report who would hand me obvious garbage-tier code with glaring issues even in its output, and it would take multiple iterations to address very specific review comments (once, in frustration, I showed a snippet of their output to my nontechnical mom and even my mom wtf’ed and pointed out the problem unprompted); I’m sure all the AI-generated code I painstakingly spec, review and fix would look totally amazing to them and need very little human input. Not saying it must be the case here, that was extreme, but it’s a very likely factor.
This is plausible. Assuming it’s true, we would see the adoption of vibe coding at a faster rate amongst inexperienced developers. I think that’s true.
A counterpoint is Google saying the vast majority of their code is written by AI. The developers at Google are not inexperienced. They build complex critical systems.
But it still feels odd to me, this contradiction. Yes, there’s some skill to using AI, but that doesn’t feel like enough to explain the gap in perception. Your point would really explain it wonderfully well, but it’s contradicted by pronouncements by major companies.
One thing I would add is that code quality is absolutely tanking. PG mentioned YC companies adopted AI-generated code at Google levels years ago. Yesterday I was using the software of one such company and it has “Claude Code” levels of bugginess. I see it in a bunch of startups. One of the tells is they seem to experience regressions, which is bizarre. I guess that indicates bugs in their AI-generated tests.
Fairly certain they do something like Anthropic does: they count the acceptance rate or something else that is fairly "optimistic" (my org has a code acceptance rate of 98.5% per the platform dashboard).
So, to clarify, me accepting the suggestion and then correcting it by hand still counts as N LoC accepted.
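To make the gap concrete, here's a rough sketch of how an "acceptance rate" can sit at 100% while a good chunk of the accepted code gets rewritten by hand. The `Suggestion` record, both function names, and the numbers are made up for illustration; I have no idea how any real dashboard actually computes its figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Suggestion:
    # Hypothetical log record, not any vendor's real schema.
    lines_suggested: int   # LoC the tool proposed
    accepted: bool         # the user hit "accept"
    lines_rewritten: int   # LoC later corrected by hand


def acceptance_rate(log: List[Suggestion]) -> float:
    """Share of suggested LoC that were accepted, ignoring later fixes."""
    suggested = sum(s.lines_suggested for s in log)
    accepted = sum(s.lines_suggested for s in log if s.accepted)
    return accepted / suggested if suggested else 0.0


def retained_rate(log: List[Suggestion]) -> float:
    """Share of suggested LoC that were accepted AND survived hand edits."""
    suggested = sum(s.lines_suggested for s in log)
    retained = sum(
        max(s.lines_suggested - s.lines_rewritten, 0)
        for s in log
        if s.accepted
    )
    return retained / suggested if suggested else 0.0


# Made-up numbers: everything accepted, over half of it rewritten afterwards.
log = [
    Suggestion(lines_suggested=40, accepted=True, lines_rewritten=30),
    Suggestion(lines_suggested=10, accepted=True, lines_rewritten=0),
    Suggestion(lines_suggested=50, accepted=True, lines_rewritten=25),
]

print(f"accepted: {acceptance_rate(log):.0%}")
print(f"retained: {retained_rate(log):.0%}")
```

Same log, two very different headline numbers, depending on whether the hand corrections are counted.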
This is magical because you are both on the exact right path and not right. My theory is there’s a sort of skill to teasing code from AI (or maybe not and it’s alchemy all over again), and this is all so new, with no common vocabulary for it, that it’s hard for one person who is having a good experience and one person who is not to meaningfully sort out what they are doing differently.
Alternatively, it could be there’s a large swath of people out there so stupid they are proud of code your mom can somehow review and suggest improvements to despite being nontechnical.
> This is magical because you are both on the exact right path and not right. My theory is there’s a sort of skill to teasing code from AI (or maybe not and it’s alchemy all over again), and this is all so new, with no common vocabulary for it, that it’s hard for one person who is having a good experience and one person who is not to meaningfully sort out what they are doing differently.
I don't think this is a hypothesis.
Outside of asking for one-shot tasks that have been done a million times before, LLMs do not "default" to good work.
If you ask them over and over again to find holes in their solution, to fix them, to evaluate for tech debt, to test all cases, to reassess after the cases whether it's architecturally coherent, to compare to the closest available known good implementations, etc., they can eventually get what you want done unbelievably cheaply to an acceptable level of quality.
As I mentioned initially, their work is unbelievably cheap, so you should be EAGER to reject it. Most people wouldn't even bend down to pick a penny up off the sidewalk. They can literally pump out CLs for a penny. You shouldn't even waste time looking at "I'm done" until they've gone through 10+ rounds of reviews, refactors, bug fixes, additional test cases, comparisons to known implementations, etc.
Why are you going to spend ~$50-$100+ of your time reviewing $0.01 of LLM time?! It makes no sense!
If you just listen to them say "I'm done" and move on to their next task, it won't take too many days before you're swimming in a sea of incoherent garbage.
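For what it's worth, the "don't look at it until it has survived 10+ rounds" idea can be sketched as a loop like the one below. Everything here is a stand-in: `review_loop`, the `checks`, and `revise` are hypothetical names, and the toy checks just look for marker strings where a real setup would prompt the model to hunt for bugs, tech debt, missing tests, and so on.

```python
from typing import Callable, List, Tuple

Check = Callable[[str], List[str]]        # returns the issues it found
Revise = Callable[[str, List[str]], str]  # returns a revised draft


def review_loop(
    draft: str,
    checks: List[Check],
    revise: Revise,
    min_rounds: int = 10,
    max_rounds: int = 30,
) -> Tuple[str, int]:
    """Rotate through the checks, revising whenever one reports issues.

    The draft is handed back only after at least min_rounds rounds AND
    one consecutive clean pass over every check.
    """
    clean_streak = 0
    for round_no in range(1, max_rounds + 1):
        check = checks[(round_no - 1) % len(checks)]
        issues = check(draft)
        if issues:
            draft = revise(draft, issues)
            clean_streak = 0
        else:
            clean_streak += 1
        if round_no >= min_rounds and clean_streak >= len(checks):
            return draft, round_no
    # The work is cheap: if it never settles, throw it away.
    raise RuntimeError("still finding issues after max_rounds; reject it")


# Toy stand-ins for model-driven review passes.
def find_bugs(draft: str) -> List[str]:
    return ["BUG"] if "BUG" in draft else []


def find_todos(draft: str) -> List[str]:
    return ["TODO"] if "TODO" in draft else []


def toy_revise(draft: str, issues: List[str]) -> str:
    for issue in issues:
        draft = draft.replace(issue, "", 1)  # "fix" one occurrence per issue
    return draft


final, rounds = review_loop(
    "BUG BUG TODO code", [find_bugs, find_todos], toy_revise
)
print(rounds, repr(final))
```

The point of `min_rounds` is that a single clean pass isn't trusted; the point of `max_rounds` is that a draft that keeps failing gets rejected rather than reviewed by a human.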