Context Caching

Cache long context windows for repeated queries over the same document. Dramatically reduces token costs.

How It Works

DeepSeek automatically caches prompt prefixes. When subsequent requests share the same prefix, cached tokens are charged at 1/10 of the normal input price. No additional configuration needed — caching is automatic. The cache applies to the longest matching prefix of your prompt tokens.

Best Practices

• Place static content (system prompts, documents) at the beginning of the messages array. • Keep the variable part (user query) at the end. • Use consistent system prompts across requests to maximize cache hits. • Monitor cache hit rates via the prompt_cache_hit_tokens field in the response usage.