Amazon Bedrock pricing
Pricing overview
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
Amazon Bedrock offers flexible pricing options to support customers at every stage of their generative AI journey. Customers can choose from on-demand pricing for pay-as-you-go usage with no upfront commitments, or batch mode for cost-efficient processing of large volumes of input. For high-volume and predictable workloads, provisioned throughput provides dedicated model capacity with discounted pricing. These options help optimize cost while balancing speed, scale, and model access needs.
Service tiers and model import
Standard Tier
The Standard tier provides consistent performance at regular rates for everyday AI tasks. With the Standard tier, you pay only for what you use, with no time-based term commitments. For text-generation models, you are charged for every input token processed and every output token generated. For embeddings models, you are charged for every input token processed. A token consists of a few characters and is the basic unit of text a model uses to interpret the user input and prompt. For image-generation models, you are charged per image generated.
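As a sketch of how token-based billing adds up, the helper below estimates the on-demand cost of a single text-generation request. The per-1,000-token rates are illustrative placeholders, not actual Bedrock prices; consult the model pricing tables for real rates.

```python
def estimate_on_demand_cost(input_tokens, output_tokens,
                            input_rate_per_1k, output_rate_per_1k):
    """Estimate on-demand cost of one text-generation request.

    Rates are expressed per 1,000 tokens, matching how Bedrock
    pricing tables quote them.
    """
    return (input_tokens / 1000) * input_rate_per_1k \
         + (output_tokens / 1000) * output_rate_per_1k

# Illustrative only: 2,000 input tokens and 500 output tokens
# at placeholder rates of 0.003 / 0.015 per 1,000 tokens.
cost = estimate_on_demand_cost(2000, 500, 0.003, 0.015)  # -> 0.0135
```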
Batch
With Batch mode, you can provide a set of prompts as a single input file and receive responses as a single output file, enabling large-scale predictions in one job. The responses are processed and stored in your Amazon S3 bucket so you can access them later. Amazon Bedrock offers select foundation models (FMs) from leading AI providers such as Anthropic, Meta, Mistral AI, and Amazon for batch inference at a 50% lower price than on-demand inference. Refer to the model list for the models that support batch inference.
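Batch jobs are submitted through Bedrock's CreateModelInvocationJob API, pointing at a JSONL input file and an output location in S3. The sketch below assembles such a request; the bucket names, role ARN, and job name are placeholders, and the model ID is just an example.

```python
def build_batch_job_request(job_name, model_id, role_arn,
                            input_s3_uri, output_s3_uri):
    """Assemble keyword arguments for Bedrock's CreateModelInvocationJob API.

    The input S3 object is a JSONL file of prompts; the responses are
    written as a single output file under the output S3 prefix.
    """
    return {
        "jobName": job_name,
        "modelId": model_id,
        "roleArn": role_arn,  # IAM role Bedrock assumes to read/write S3
        "inputDataConfig": {"s3InputDataConfig": {"s3Uri": input_s3_uri}},
        "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": output_s3_uri}},
    }

# All identifiers below are placeholders for illustration.
request = build_batch_job_request(
    job_name="nightly-summaries",
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    role_arn="arn:aws:iam::123456789012:role/BedrockBatchRole",
    input_s3_uri="s3://my-bucket/batch/input.jsonl",
    output_s3_uri="s3://my-bucket/batch/output/",
)

# To submit (requires the boto3 package and AWS credentials):
#   import boto3
#   job = boto3.client("bedrock").create_model_invocation_job(**request)
```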
Tools
Guardrails
Amazon Bedrock Guardrails is the only responsible AI capability from a major cloud provider that helps you build and customize safety, privacy, and truthfulness safeguards for your generative AI applications. It evaluates user inputs and model responses based on use case-specific policies, providing an additional layer of safeguards beyond what's natively available. Safeguards from Guardrails can be applied to models hosted on Amazon Bedrock or with any third-party models (such as OpenAI and Google Gemini) via the ApplyGuardrail API. You can also use Guardrails with an agent framework such as Strands Agents, including agents deployed using Amazon Bedrock AgentCore. Guardrails helps filter hallucinations and improve factual accuracy through contextual grounding checks against RAG content and Automated Reasoning checks to deliver provably truthful responses.
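Because the ApplyGuardrail API evaluates text independently of any model invocation, it can screen prompts or responses for models hosted anywhere. The sketch below assembles such a request; the guardrail ID and sample text are placeholders.

```python
def build_apply_guardrail_request(guardrail_id, guardrail_version,
                                  text, source="INPUT"):
    """Assemble keyword arguments for bedrock-runtime's ApplyGuardrail API.

    Use source="INPUT" to evaluate a user prompt and source="OUTPUT"
    to evaluate a model response against the guardrail's policies.
    """
    return {
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": guardrail_version,
        "source": source,
        "content": [{"text": {"text": text}}],
    }

# Placeholder guardrail ID and sample input for illustration.
request = build_apply_guardrail_request(
    guardrail_id="gr-1234567890ab",
    guardrail_version="1",
    text="What is the patient's account number?",
)

# To evaluate (requires the boto3 package and AWS credentials):
#   import boto3
#   resp = boto3.client("bedrock-runtime").apply_guardrail(**request)
#   resp["action"] is "GUARDRAIL_INTERVENED" if a policy blocked or
#   masked content, "NONE" otherwise.
```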
Amazon Bedrock Guardrails
On-Demand pricing

| Guardrails filter* | Price |
| --- | --- |
| Content filters for classic tier (text content) | €0.2587283 per 1,000 text units |
| Denied topics for classic tier | €0.2587283 per 1,000 text units |
| Sensitive information filters | €0.1724855 per 1,000 text units |
| Sensitive information filters (regular expression) | Free |
| Word filters | Free |
| Contextual grounding checks | €0.1724855 per 1,000 text units |
* Each guardrail policy is optional and can be enabled based on your application requirements. Charges will be incurred based on the policy type used in the guardrail. For example, if a guardrail is configured with content filters and denied topics, charges will be incurred for these two policies, while there will be no charges associated with sensitive information filters.
Note: A text unit can contain up to 1,000 characters. If a text input exceeds 1,000 characters, it is processed as multiple text units, each containing 1,000 characters or less. For example, a text input of 5,600 characters is charged as 6 text units.

The contextual grounding check uses a reference source and a query to determine whether the model response is grounded in the source and relevant to the query. The total number of text units charged is calculated from the combined characters of the source, query, and model response.
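The text-unit rule above can be expressed directly. The helper names below are ours, and the per-1,000-text-unit rate should be taken from the pricing table for the policy in question.

```python
import math

TEXT_UNIT_CHARS = 1000  # one text unit covers up to 1,000 characters

def text_units(characters):
    """Number of billable text units for a given character count."""
    return math.ceil(characters / TEXT_UNIT_CHARS)

def guardrail_policy_cost(characters, price_per_1k_units):
    """Cost of one enabled guardrail policy, given its per-1,000-text-unit rate."""
    return text_units(characters) * price_per_1k_units / 1000

# The 5,600-character input from the note above bills as 6 text units.
units = text_units(5600)  # -> 6
```

For a contextual grounding check, pass the combined character count of the source, query, and model response as `characters`, since all three count toward the charge.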