Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> is a job in itself right now.

Yes, exactly. I consider this the core of my job now: herding agents.

I reminds me of the time that I "herded" juniors, interns and new hires very much.

And my experience is that OpenCode et.al. don't do a "Good Enough" job. It's better, than e.g. Devstral2, but without guidance, still not sufficient. I think that mostly has to do with a combination of my experience and standards and of my languages and niches.

All of them are good enough for throwing out a react spagetti, one you'd expect from fiverr or from an intern: don't look under the hood, just drive it (launch it and leave it). Claude is far better in such a "benchmark" than e.g. Devstral2.

But when I need a hexagonal-architectured, TDD and BDD covered microservice in python with zero type warnings, all models fail spectacularly out of the box. I presume their training body isn't "used" to such patterns: it's statistically unlikely to ignore type warnings in Python (wink). Just like it's statistically unlikely to write a few files of typescript for a feature, instead of pulling in an node package. Turns out esp. with claude code, it's statistically likely to comment out failing tests if the rule is "ensure all test pass" and this one hard to fix¹.

So to get this level of what we require, I need tons of rules, guidelines, skills and whatnot. On every model. So I'll just as well - indeed - pipe my money into an EU company that's cheaper and has the option of self-hosting when s* starts hitting fans.

--- ¹ I think I finally found the "context" to fix this, though. What I used to tell my interns/juniors is to take a step back and re-think the shape of things: a difficult or complex test usually means the code it is testing needs re-architecturing. Something most agents will refuse: and good, because it's side-tracking them. My solution is to tell agents to stop, document the problem, and if obvious, document the solution as well in a dedicated "technical debt" markdown file. Then in future I'll direct another agent at this file and tell it to start fixing them one at a time.



I agree with all you’ve said.

Gemini loves deleting tests as well, and all of them will relentlessly stub things to make unit tests ‘easy’.

What experience brought me is knowing where to steer them, e.g. scraping all their shitty glue code and hand-holding Sonnet into implementing classes, DI, and unit tests that aren’t brittle at all. In that way, the agents have been nice to work with: they remind us of why cleaner code and good practices make for maintainable code. I hate their React spaghetti, but most places I’ve worked had tons of React spaghetti anyway…

All of this said: I actually miss steering juniors instead. Humans are frustrating to work with, but they are also adaptable, grow with time, and are… you know, human.

Mentoring Claude isn’t exactly fun or rewarding, in the way mentoring a colleague would be. And thankfully we have memory MCP servers, otherwise it would be like mentoring a brand new intern every time you fire up Claude.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: