This is not accurate. The main agent typically uses a 1h cache (except for API customers, which can enable 1h but it is not on by default because it costs more). Sub-agents typically use a 5m cache.
What does it mean that sub-agents use a 5 min cache? Is this just for growing contexts submitted by subagent itself? What about fixed sub-agent context prefixes such as tool definitions? Are those 1hr TTL?
It seems that a more flexible cache control mechanism, useful for variable duration sub-agents, would be more like an arena allocator. Let the client tag their API key activity with different "cache group" (arena) identifiers, then provide an API method to let them free each cache group when they are finished with it. Each sub-agent would then use it's own cache group and clear it when the sub-agent exits, rather than just having a fixed 5min or 1hr TTL. The client could provide a default TTL for each cache group to use in case they forget to free.
Context prefixes like tool definitions that will be the same for multiple invocations of the same sub-agent type could then be created (maybe by main agent) with a different cache group, and a longer default TTL.
As of yesterday subagents were often getting the entire session copied to them. Happened to me when 2 turns with Claude spawned a subagent, caused 2 compactions, and burned 15% of my 5-hour limit (Max 5x).
how long they stay around after the cache miss is irrelevant if I am burning all the prior tokens again. also, how much context they have depends entirely on the task and your workflow. I you have a subagent implement a feature and use the compile + test loop to ensure it is implemented correctly before a supervisor agent reviews what was implemented vs asked then yes, subagents do have a lot of context.
I'd say it's next to impossible to have a subagent doing a compile+test loop where at least 1 call doesn't get made to the API over multiple 5-minute stretches to keep the cache warm. In such a case it may just be the same as doing the compile+test manually and then having the agent troubleshoot any issues before iterating.
but how to make claude-code send that when paying by API-key?
or when using a custom ANTHROPIC_BASE_URL? (requests will contain cache_control, but no ttl!)
2.1.108
Added ENABLE_PROMPT_CACHING_1H env var to opt into 1-hour prompt cache TTL on API key, Bedrock, Vertex, and Foundry (ENABLE_PROMPT_CACHING_1H_BEDROCK is deprecated but still honored),
and FORCE_PROMPT_CACHING_5M to force 5-minute TTL
docs are not updated yet, directly from the changelog^
And this is "a bit better" - but seemingly still nowhere close to what subscribers get where main thread, agent, initial and follow-up messages may all get there own ?intelligent? 5min or 1h decision :/
This is not accurate. The main agent typically uses a 1h cache (except for API customers, which can enable 1h but it is not on by default because it costs more). Sub-agents typically use a 5m cache.