Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> 1hr -> 5min on March 6th

This is not accurate. The main agent typically uses a 1h cache (except for API customers, which can enable 1h but it is not on by default because it costs more). Sub-agents typically use a 5m cache.



https://github.com/anthropics/claude-code/issues/46829#issue... - Have you checked with your colleague? (and his AI, of course)


Doesn't what's said at the link approximately agree? The 5m bug was said to be isolated to use of overage (API billing).


Then my original question stands: why did this become an issue seemingly overnight if nothing changed?


What does it mean that sub-agents use a 5 min cache? Is this just for growing contexts submitted by subagent itself? What about fixed sub-agent context prefixes such as tool definitions? Are those 1hr TTL?

It seems that a more flexible cache control mechanism, useful for variable duration sub-agents, would be more like an arena allocator. Let the client tag their API key activity with different "cache group" (arena) identifiers, then provide an API method to let them free each cache group when they are finished with it. Each sub-agent would then use it's own cache group and clear it when the sub-agent exits, rather than just having a fixed 5min or 1hr TTL. The client could provide a default TTL for each cache group to use in case they forget to free.

Context prefixes like tool definitions that will be the same for multiple invocations of the same sub-agent type could then be created (maybe by main agent) with a different cache group, and a longer default TTL.


So if I run a test suite or compile my rust program in a sub agent I’m going to get cache misses? Boo.


Sub agents don't have much context and don't stay around for long, so misses in that case are trivial.


As of yesterday subagents were often getting the entire session copied to them. Happened to me when 2 turns with Claude spawned a subagent, caused 2 compactions, and burned 15% of my 5-hour limit (Max 5x).


how long they stay around after the cache miss is irrelevant if I am burning all the prior tokens again. also, how much context they have depends entirely on the task and your workflow. I you have a subagent implement a feature and use the compile + test loop to ensure it is implemented correctly before a supervisor agent reviews what was implemented vs asked then yes, subagents do have a lot of context.


I'd say it's next to impossible to have a subagent doing a compile+test loop where at least 1 call doesn't get made to the API over multiple 5-minute stretches to keep the cache warm. In such a case it may just be the same as doing the compile+test manually and then having the agent troubleshoot any issues before iterating.


... so how do API users enable 1hr caching? I haven't found a setting anywhere.


would like to know this too ;D

there is env.ENABLE_PROMPT_CACHING_1H_BEDROCK - but that is - as the name says "when using Bedrock"

for the raw API the docs are also clear -> "ttl": "1h" https://platform.claude.com/docs/en/build-with-claude/prompt...

but how to make claude-code send that when paying by API-key? or when using a custom ANTHROPIC_BASE_URL? (requests will contain cache_control, but no ttl!)


seems we have been heard ;)

  2.1.108
  Added ENABLE_PROMPT_CACHING_1H env var to opt into 1-hour prompt cache TTL on API key, Bedrock, Vertex, and Foundry (ENABLE_PROMPT_CACHING_1H_BEDROCK is deprecated but still honored),
  and FORCE_PROMPT_CACHING_5M to force 5-minute TTL
docs are not updated yet, directly from the changelog^

And this is "a bit better" - but seemingly still nowhere close to what subscribers get where main thread, agent, initial and follow-up messages may all get there own ?intelligent? 5min or 1h decision :/




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: