I am using the Gemini 3.1 Flash Lite model via OpenRouter in an n8n AI Agent. The agent has a long System Prompt (approx. 5,000 tokens).
According to Gemini's technical specifications, any prefix over 1,024 tokens that remains identical between calls should trigger Implicit Caching, resulting in a 90% discount on "Cache Read" tokens. However, in n8n, every iteration of the agent (each tool call) is being billed as a full new input, ignoring the existing cache prefix.
In the attached logs, you can see multiple calls made within the same execution. Despite the System Prompt being static, no "Cache Read" tokens are reported, and I am charged the full input price for the ~10k-token prefix at every single step of the agent's reasoning loop.
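To put rough numbers on the impact, here is a back-of-the-envelope comparison under the figures above (~10k input tokens per step, a 5-step loop, and the 90% cache-read discount; the per-step price is normalized, not Gemini's actual rate):

```python
# Relative cost of a 5-step agent loop with and without implicit caching.
# Assumptions (from this report, not measured): ~10k input tokens per step,
# the full prefix is cacheable after step 1, cache reads cost 10% of input.
PRICE_PER_STEP = 1.0  # normalized price of ~10k uncached input tokens
STEPS = 5
CACHE_READ_DISCOUNT = 0.90

no_cache = STEPS * PRICE_PER_STEP
with_cache = PRICE_PER_STEP + (STEPS - 1) * PRICE_PER_STEP * (1 - CACHE_READ_DISCOUNT)

print(f"no cache:   {no_cache:.1f}")   # 5.0
print(f"with cache: {with_cache:.1f}") # 1.4
```

So the same execution should cost roughly 3.5x less once the prefix is served from cache.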
For Gemini's Implicit Caching to work, the prompt prefix must be character-for-character identical to the previous one.
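To illustrate that requirement (a hypothetical sketch, not n8n's actual serialization code), appending messages while serializing deterministically keeps the earlier turns as a literal byte prefix of the next request, whereas any cosmetic re-serialization breaks it:

```python
import json

SYSTEM_PROMPT = "You are a support agent."  # stands in for the ~5k-token prompt

def serialize(messages):
    # Deterministic serialization: fixed key order, no incidental whitespace.
    return json.dumps(messages, sort_keys=True, separators=(",", ":"))

turn_1 = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Where is order #123?"},
]
# Turn 2 only appends the tool call and its result; earlier messages are untouched.
turn_2 = turn_1 + [
    {"role": "assistant", "content": "Calling lookup_order(123)"},
    {"role": "tool", "content": '{"status": "shipped"}'},
]

s1, s2 = serialize(turn_1), serialize(turn_2)
prefix_intact = s2.startswith(s1[:-1])  # drop turn 1's closing "]"

# A cosmetic difference -- here, insertion-order keys instead of sorted keys --
# changes the bytes even though the content is identical, so no cache hit.
s1_reordered = json.dumps(turn_1, separators=(",", ":"))
cache_broken = s1_reordered != s1

print(prefix_intact, cache_broken)  # True True
```

If n8n rebuilds or re-orders any part of the payload between iterations, the provider sees a new prefix every time, which would match the billing behavior in the logs.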
In a multi-turn tool-calling scenario, the expected behavior is that each call only appends new messages (the tool call and its result) to the end of the conversation, leaving the System Prompt and all prior messages as an unchanged prefix that is billed at the cache-read rate.
Currently, n8n appears to send the payload in a way that prevents the LLM provider from recognizing the prefix as identical, even when the System Prompt has not changed.
The AI Agent should maintain a deterministic prefix structure so that Gemini-based models can leverage Implicit Caching, cutting the cost of the cached prefix tokens by 90% on subsequent tool calls and chat turns.
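One common way such a prefix gets invalidated (a hypothetical illustration, not confirmed as n8n's actual cause) is dynamic content, e.g. a timestamp, injected at the top of the prompt. Moving dynamic values after the static bulk preserves the cacheable prefix:

```python
from datetime import datetime, timezone

STATIC_PROMPT = "You are a support agent. <imagine ~5,000 tokens of instructions here>"

def cache_hostile_prompt():
    # Anti-pattern: the first bytes change on every call, so no prefix
    # can ever match the previous request and implicit caching never fires.
    now = datetime.now(timezone.utc).isoformat()
    return f"Current time: {now}\n\n{STATIC_PROMPT}"

def cache_friendly_prompt():
    # Dynamic values go *after* the static bulk: the long static prefix
    # stays byte-identical between calls and remains cacheable.
    now = datetime.now(timezone.utc).isoformat()
    return f"{STATIC_PROMPT}\n\nCurrent time: {now}"

print(cache_friendly_prompt().startswith(STATIC_PROMPT))  # True
print(cache_hostile_prompt().startswith(STATIC_PROMPT))   # False
```

The same logic applies to tool definitions and memory: anything that varies between iterations must come after the stable portion of the prompt.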
Gemini Cache Pricing:
Running n8n via: Docker
n8n version: 2.10.3
Node.js version: 24
Database: PostgreSQL
Execution mode: main (default)
Hosting: self-hosted
Affected nodes: AI Agent / OpenRouter Chat Model