In our previous blog post about Anthropic prompt caching, we explored how prompt caching dramatically reduces API costs and latency by reusing previously processed prompt content. We introduced Spring AI's five strategic caching patterns for Anthropic Claude models and showed how they automatically handle cache breakpoint placement while respecting the 4-breakpoint limit.
AWS Bedrock brings prompt caching to a broader ecosystem—supporting both Claude models (accessed via Bedrock) and Amazon's own Nova family. If you're considering Bedrock or already using it, you'll find the same Spring AI caching strategies apply with a few key differences.
In this blog post, we explain what's different about prompt caching in AWS Bedrock compared to Anthropic's direct API and how Spring AI maintains consistent patterns across both providers.
AWS Bedrock extends prompt caching beyond Claude to include Amazon Nova models:

- Claude models (via Bedrock): the Sonnet, Opus, and Haiku variants listed in the token minimums table below
- Amazon Nova models: all variants (Nova Micro, Lite, Pro, and Premier)

For complete model details, see AWS Bedrock supported models.
While the core caching concepts remain the same (as covered in our previous blog), AWS Bedrock has several differences worth understanding.
AWS Bedrock uses a fixed 5-minute TTL with no configuration options, while Anthropic's direct API offers optional 1-hour caching.
// Anthropic direct API: optional TTL configuration
AnthropicCacheOptions.builder()
    .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
    .messageTypeTtl(MessageType.SYSTEM, AnthropicCacheTtl.ONE_HOUR)
    .build()

// AWS Bedrock: always 5 minutes
BedrockCacheOptions.builder()
    .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
    .build()
For high-frequency workloads (requests every few seconds or minutes), the 5-minute TTL keeps the cache warm. For applications with requests spaced 5-30 minutes apart, cache entries may expire between requests.
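For that second case, one pragmatic option is to keep the cache warm with a lightweight scheduled request. Below is a minimal sketch, assuming Spring scheduling is enabled and the system prompt is available to the component; the 4-minute interval, the "ping" message, and the component itself are illustrative assumptions, not a Spring AI feature:

// Hypothetical cache warmer: re-sends the cached system prompt every 4 minutes,
// just inside Bedrock's fixed 5-minute TTL, so real requests always hit the cache.
@Component
public class CacheWarmer {

    private final ChatModel chatModel;
    private final String systemPrompt;

    public CacheWarmer(ChatModel chatModel, @Value("${app.system-prompt}") String systemPrompt) {
        this.chatModel = chatModel;
        this.systemPrompt = systemPrompt;
    }

    @Scheduled(fixedRate = 240_000) // every 4 minutes
    public void keepCacheWarm() {
        chatModel.call(new Prompt(
            List.of(new SystemMessage(systemPrompt), new UserMessage("ping")),
            BedrockChatOptions.builder()
                .model("anthropic.claude-sonnet-4-5-20250929-v1:0")
                .cacheOptions(BedrockCacheOptions.builder()
                    .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
                    .build())
                .maxTokens(1)
                .build()));
    }
}

Each warm-up call still pays for the small user message and the generated output, so this only pays off when the cached prefix is large and real traffic reliably follows.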
Amazon Nova models do not support tool caching. Attempting to use TOOLS_ONLY or SYSTEM_AND_TOOLS strategies with Nova throws an exception.
// Works with Claude models
BedrockChatOptions.builder()
    .model("anthropic.claude-sonnet-4-5-20250929-v1:0")
    .cacheOptions(BedrockCacheOptions.builder()
        .strategy(BedrockCacheStrategy.SYSTEM_AND_TOOLS)
        .build())
    .toolCallbacks(tools)
    .build()

// Throws exception with Nova models
BedrockChatOptions.builder()
    .model("us.amazon.nova-pro-v1:0")
    .cacheOptions(BedrockCacheOptions.builder()
        .strategy(BedrockCacheStrategy.TOOLS_ONLY)
        .build())
    .toolCallbacks(tools)
    .build()

// Use SYSTEM_ONLY for Nova
BedrockChatOptions.builder()
    .model("us.amazon.nova-pro-v1:0")
    .cacheOptions(BedrockCacheOptions.builder()
        .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
        .build())
    .build()
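If one service routes requests to both Claude and Nova at runtime, a small guard avoids the exception by downgrading the strategy for Nova. A minimal sketch; the substring check on the model ID is an illustrative heuristic, not a Spring AI API:

// Hypothetical helper: Nova models do not support tool caching,
// so fall back to SYSTEM_ONLY for them.
static BedrockCacheStrategy safeStrategy(String modelId, boolean wantsToolCaching) {
    if (modelId.contains("amazon.nova")) {
        return BedrockCacheStrategy.SYSTEM_ONLY;
    }
    return wantsToolCaching
        ? BedrockCacheStrategy.SYSTEM_AND_TOOLS
        : BedrockCacheStrategy.SYSTEM_ONLY;
}

Minimum cacheable prompt sizes are another per-model difference: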
| Model | Minimum Tokens per Checkpoint |
|---|---|
| Claude 3.7 Sonnet, 3.5 Sonnet v2, Opus 4, Sonnet 4, Sonnet 4.5 | 1,024 |
| Claude 3.5 Haiku | 2,048 |
| Claude Haiku 4.5 | 4,096 |
| Amazon Nova (all variants) | 1,000 |
See Bedrock token limits documentation for details.
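Since content below these minimums is simply not cached (the request still succeeds, but without savings), a cheap size estimate can help decide whether caching is worth enabling. A rough sketch using the common ~4-characters-per-token heuristic; the helper and its threshold handling are assumptions, not part of Spring AI:

// Hypothetical guard: enable caching only when the system prompt is
// plausibly above the model's minimum checkpoint size.
static BedrockCacheOptions cacheOptionsFor(String systemPrompt, int minTokensForModel) {
    long estimatedTokens = systemPrompt.length() / 4; // rough heuristic, not a tokenizer
    BedrockCacheStrategy strategy = estimatedTokens >= minTokensForModel
        ? BedrockCacheStrategy.SYSTEM_ONLY
        : BedrockCacheStrategy.NONE;
    return BedrockCacheOptions.builder().strategy(strategy).build();
}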
| Metric | Anthropic Direct | AWS Bedrock |
|---|---|---|
| Creating a cache entry | cacheCreationInputTokens | cacheWriteInputTokens |
| Reading from cache | cacheReadInputTokens | cacheReadInputTokens |
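In Spring AI, these counters surface in the chat response metadata under provider-specific keys. A quick comparison sketch; the key strings mirror the table above, but treat the exact metadata keys as assumptions to verify against your Spring AI version:

// AWS Bedrock: cache write metric
Integer bedrockWrite = (Integer) bedrockResponse.getMetadata()
    .getMetadata().get("cacheWriteInputTokens");

// Anthropic direct API: same read key, different write key
Integer anthropicWrite = (Integer) anthropicResponse.getMetadata()
    .getMetadata().get("cacheCreationInputTokens");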
Spring AI uses identical caching strategies across both providers:
BedrockCacheStrategy.SYSTEM_ONLY ←→ AnthropicCacheStrategy.SYSTEM_ONLY
BedrockCacheStrategy.TOOLS_ONLY ←→ AnthropicCacheStrategy.TOOLS_ONLY
BedrockCacheStrategy.SYSTEM_AND_TOOLS ←→ AnthropicCacheStrategy.SYSTEM_AND_TOOLS
BedrockCacheStrategy.CONVERSATION_HISTORY ←→ AnthropicCacheStrategy.CONVERSATION_HISTORY
Let's take a look at how similar the code is:
// Anthropic direct API
ChatResponse response = anthropicChatModel.call(
    new Prompt(
        List.of(new SystemMessage(systemPrompt), new UserMessage(userQuery)),
        AnthropicChatOptions.builder()
            .model(AnthropicApi.ChatModel.CLAUDE_4_5_SONNET)
            .cacheOptions(AnthropicCacheOptions.builder()
                .strategy(AnthropicCacheStrategy.SYSTEM_ONLY)
                .build())
            .maxTokens(500)
            .build()
    )
);

// AWS Bedrock (nearly identical)
ChatResponse response = bedrockChatModel.call(
    new Prompt(
        List.of(new SystemMessage(systemPrompt), new UserMessage(userQuery)),
        BedrockChatOptions.builder()
            .model("anthropic.claude-sonnet-4-5-20250929-v1:0")
            .cacheOptions(BedrockCacheOptions.builder()
                .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
                .build())
            .maxTokens(500)
            .build()
    )
);
The only differences: the chat model instance, options class, and model identifier format.
| Feature | AWS Bedrock | Anthropic Direct |
|---|---|---|
| Cache TTL | 5 minutes (fixed) | 5 minutes (default), 1 hour (optional) |
| Models | Claude + Nova | Claude only |
| Tool Caching | Claude only | All Claude models |
| Token Metrics | cacheWriteInputTokens, cacheReadInputTokens | cacheCreationInputTokens, cacheReadInputTokens |
| Pricing | Varies by region/model | Published per-model |
| Cost Pattern | ~25% write premium, ~90% read savings | 25% write premium, 90% read savings |
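Those cost figures imply a fast break-even. Treating the base input price as 1.0x: two requests with caching cost roughly 1.25 + 0.10 = 1.35x the base price of the cached content, versus 2.0x without caching, so the cache pays for itself on the second request, and every later hit saves about 90%.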
Here's a practical example showing cache effectiveness:
@Service
public class ContractAnalyzer {

    private static final Logger logger = LoggerFactory.getLogger(ContractAnalyzer.class);

    private final BedrockProxyChatModel chatModel;
    private final DocumentExtractor documentExtractor;

    public ContractAnalyzer(BedrockProxyChatModel chatModel, DocumentExtractor documentExtractor) {
        this.chatModel = chatModel;
        this.documentExtractor = documentExtractor;
    }

    public AnalysisReport analyzeContract(String contractId) {
        String contractText = documentExtractor.extract(contractId);

        String systemPrompt = """
                You are an expert legal analyst specializing in commercial contracts.
                Analyze the following contract and provide precise insights about
                terms, obligations, risks, and opportunities:

                CONTRACT:
                %s
                """.formatted(contractText);

        String[] questions = {
            "What are the key legal clauses and penalties?",
            "Summarize the payment terms and financial obligations.",
            "What intellectual property rights are defined?",
            "Identify potential compliance risks.",
            "What are the performance milestones?"
        };

        AnalysisReport report = new AnalysisReport();

        // Reuse the same options so every request targets the same cache entry
        BedrockChatOptions options = BedrockChatOptions.builder()
            .model("anthropic.claude-sonnet-4-5-20250929-v1:0")
            .cacheOptions(BedrockCacheOptions.builder()
                .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
                .build())
            .maxTokens(1000)
            .build();

        for (int i = 0; i < questions.length; i++) {
            ChatResponse response = chatModel.call(
                new Prompt(
                    List.of(new SystemMessage(systemPrompt), new UserMessage(questions[i])),
                    options));

            report.addSection(questions[i], response.getResult().getOutput().getText());
            logCacheMetrics(response, i);
        }
        return report;
    }

    private void logCacheMetrics(ChatResponse response, int questionNum) {
        Integer cacheWrite = (Integer) response.getMetadata()
            .getMetadata().get("cacheWriteInputTokens");
        Integer cacheRead = (Integer) response.getMetadata()
            .getMetadata().get("cacheReadInputTokens");

        if (questionNum == 0 && cacheWrite != null) {
            logger.info("Cache created: {} tokens", cacheWrite);
        } else if (cacheRead != null && cacheRead > 0) {
            logger.info("Cache hit: {} tokens", cacheRead);
        }
    }
}
AWS Bedrock provides cache metrics through the response metadata:
- First request: cacheWriteInputTokens > 0, cacheReadInputTokens = 0
- Subsequent requests (within TTL): cacheWriteInputTokens = 0, cacheReadInputTokens > 0
With a 3,500-token system prompt, this yields approximately 65% cost reduction on cached content (first question pays ~1.25x, subsequent questions pay ~0.10x).
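To sanity-check that estimate: without caching, the five questions pay for the system prompt five times (5 × 3,500 = 17,500 input tokens at base price). With caching, they pay 1.25 × 3,500 + 4 × 0.10 × 3,500 = 4,375 + 1,400 = 5,775 token-equivalents, about a 67% reduction on the system-prompt portion, consistent with the ~65% figure.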
The same options-driven approach makes it easy to mix models in a single service, picking a Nova-safe strategy where needed:

@Service
public class MultiModelService {

    private final ChatClient chatClient;

    public MultiModelService(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    // Nova: system prompt caching
    public String analyzeWithNova(String document, String query) {
        return chatClient.prompt()
            .system("You are an expert analyst. Context: " + document)
            .user(query)
            .options(BedrockChatOptions.builder()
                .model("us.amazon.nova-pro-v1:0")
                .cacheOptions(BedrockCacheOptions.builder()
                    .strategy(BedrockCacheStrategy.SYSTEM_ONLY)
                    .build())
                .maxTokens(500)
                .build())
            .call()
            .content();
    }

    // Claude: system + tool caching
    public String analyzeWithTools(String document, String query,
            List<ToolCallback> tools) {
        return chatClient.prompt()
            .system("You are an expert analyst. Context: " + document)
            .user(query)
            .options(BedrockChatOptions.builder()
                .model("anthropic.claude-sonnet-4-5-20250929-v1:0")
                .cacheOptions(BedrockCacheOptions.builder()
                    .strategy(BedrockCacheStrategy.SYSTEM_AND_TOOLS)
                    .build())
                .toolCallbacks(tools)
                .maxTokens(500)
                .build())
            .call()
            .content();
    }
}
To get started, add the Spring AI Bedrock Converse starter. Note: AWS Bedrock prompt caching support is available in Spring AI 1.1.0 and later; try it with the latest 1.1.0-SNAPSHOT version.
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-bedrock-converse</artifactId>
</dependency>
Configure AWS credentials:
spring.ai.bedrock.aws.region=us-east-1
spring.ai.bedrock.aws.access-key=${AWS_ACCESS_KEY_ID}
spring.ai.bedrock.aws.secret-key=${AWS_SECRET_ACCESS_KEY}
You can then start using prompt caching via AWS Bedrock as shown in the examples above.
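If you prefer the fluent ChatClient API used in some of the examples above, here is a minimal wiring sketch; the bean and parameter names are illustrative, and the starter auto-configures the underlying ChatModel:

@Configuration
public class ChatClientConfig {

    // Build a ChatClient on top of the auto-configured Bedrock ChatModel
    @Bean
    ChatClient chatClient(ChatModel bedrockChatModel) {
        return ChatClient.create(bedrockChatModel);
    }
}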
| Strategy | Use When | Claude | Nova |
|---|---|---|---|
| SYSTEM_ONLY | Large stable system prompt | Yes | Yes |
| TOOLS_ONLY | Large stable tools, dynamic system | Yes | No |
| SYSTEM_AND_TOOLS | Both large and stable | Yes | No |
| CONVERSATION_HISTORY | Multi-turn conversations | Yes | Yes |
| NONE | Disable caching explicitly | Yes | Yes |
For detailed strategy explanations, cache hierarchy, and cascade invalidation patterns, see our Anthropic prompt caching blog post; those concepts carry over unchanged to AWS Bedrock.
AWS Bedrock extends prompt caching to Amazon Nova models while maintaining full support for Claude models. The key differences from Anthropic's direct API are a fixed 5-minute TTL, Nova's lack of tool caching support, and region-specific pricing.
Spring AI provides the same strategic caching patterns across both providers. Whether you choose Claude through Anthropic, Claude through Bedrock, or Amazon Nova models, the five caching strategies work consistently with minimal code changes.
The decision between providers depends on model availability (Nova is only on Bedrock), cache TTL requirements, and tool caching needs (Nova doesn't support it).
For more on prompt caching support in Spring AI for AWS Bedrock, see the Spring AI Bedrock documentation and the AWS Bedrock Prompt Caching documentation.