Configure guardrail
The following steps describe how to create and configure guardrails to enforce effective content control:
Navigate to Guardrails.
Click Add new.
The configuration page is displayed. Complete the following fields:

Name: Enter a unique name for the guardrail.
Description: Provide a detailed description of the guardrail's purpose and intended functionality.
Version: Specify the version number of the guardrail configuration. By default, the version is set to Version 1. The version number increments automatically each time you save changes to the guardrail.
Guardrails Provider: Choose the guardrail provider from the dropdown menu. The available options include:
Amazon Bedrock
Karini guardrail
Configure Content Filters
Use this section to configure content filtering controls to:
Detect and block harmful input prompts that violate usage policies.
Detect and block unsafe model responses before they are returned.
Apply granular filtering across risk categories such as hate speech, insults/harassment, sexual content, violence, and misconduct.
Price: $0.15 per 1,000 text units.
Filter strengths for prompts: Use a higher filter strength to increase the likelihood of filtering harmful content in a given category.
Hate
Insults
Sexual
Violence
Misconduct
Prompt attack
Adjusting Filter Strength: Each category can be set to one of the following strengths: None, Low, Medium, or High. The higher the strength, the more likely harmful content in that category will be filtered.

Filter strengths for responses: Use a higher filter strength to increase the likelihood of filtering harmful content in a given category. These filters evaluate and override model responses, but don't modify the model behavior.
Hate
Insults
Sexual
Violence
Misconduct
Reset: Each filter group includes a Reset option to revert all filter levels to their default states.
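If Amazon Bedrock is the selected provider, these strengths correspond to the content policy of the Bedrock CreateGuardrail API. The Python (boto3) fragment below is an illustrative sketch only; the strength chosen for each category is an example assumption, not a recommendation.

```python
# Illustrative content-filter fragment for the Amazon Bedrock CreateGuardrail
# API (boto3). Strength values are example choices, not recommendations.
content_policy_config = {
    "filtersConfig": [
        {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "VIOLENCE", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        {"type": "MISCONDUCT", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        # Prompt-attack filtering applies to prompts only, so output strength is NONE.
        {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
    ]
}
```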

Add Denied Topics
Define specific topics to block in both user inputs and model responses.
Provide fine-grained content control by preventing sensitive or unwanted topics from being processed.
Configure up to 30 denied topics to filter prompts or outputs associated with those topics.
Improve content safety and compliance by enforcing topic-level restrictions.
Price: $0.15 per 1,000 text units.
Configuration:
Name: Enter the topic name that you wish to block. Each topic name can be up to 100 characters in length.
Definition: Provide a definition for the selected topic. This helps clarify the nature of the topic to be blocked, ensuring accurate filtering.
Examples: You can add up to 5 examples related to the topic to help the model better identify and block relevant content. Each example can be up to 100 characters long.
To add an example, type a word or phrase and press Enter or Tab; to remove one, click its chip.
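When Amazon Bedrock is the provider, a denied topic maps to a topic policy entry in the Bedrock CreateGuardrail API. The sketch below is illustrative; the topic name, definition, and examples are hypothetical.

```python
# Illustrative denied-topic fragment for the Amazon Bedrock CreateGuardrail
# API (boto3). The topic, definition, and examples are hypothetical.
topic_policy_config = {
    "topicsConfig": [
        {
            "name": "Investment advice",  # up to 100 characters
            "definition": "Recommendations or guidance about buying, selling, or holding financial instruments.",
            "examples": [  # up to 5 examples, each up to 100 characters
                "Which stocks should I buy this quarter?",
                "Is now a good time to invest in crypto?",
            ],
            "type": "DENY",
        }
    ]
}
```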

Add Word Filters
Block specific words or phrases in user inputs and/or model responses.
Add an extra layer of content control to exclude unwanted or inappropriate language.
Optional feature that can be enabled based on organizational requirements.
Configurable to align with internal policies and compliance standards.
Price: Free.
Configuration:
Filter Profanity: Enable this feature to block profane words in both user inputs and model responses. The list of profane words is based on a global definition of profanity and is subject to updates or changes over time.
Add Custom Words and Phrases:
Manually: Users can manually add specific words or phrases they wish to block. Type the desired word or phrase into the input field and press Enter or Tab to add it; click a chip to remove an entry.
Upload from a Local File: Alternatively, users can upload a file containing a list of words or phrases to be filtered. After uploading, the interface will display the list of words to be added, allowing users to review or modify entries before applying them.
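For Amazon Bedrock guardrails, custom words and the profanity toggle map to the word policy of the CreateGuardrail API. The fragment below is a sketch; the custom phrases are placeholders.

```python
# Illustrative word-filter fragment for the Amazon Bedrock CreateGuardrail
# API (boto3). The custom phrases below are placeholders.
word_policy_config = {
    # Custom words and phrases to block in prompts and responses.
    "wordsConfig": [
        {"text": "internal codename"},
        {"text": "confidential project x"},
    ],
    # Equivalent of the "Filter Profanity" toggle: enable the managed profanity list.
    "managedWordListsConfig": [
        {"type": "PROFANITY"},
    ],
}
```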

Add Sensitive Information Filters
PII
Detect and block or mask sensitive information in user inputs and/or model responses.
Prevent exposure of Personally Identifiable Information (PII) and other confidential data.
Support privacy, security, and data-protection compliance by reducing the risk of inadvertent disclosure.
Ensure sensitive content is not processed or returned in generated outputs.
Price: $0.10 per 1,000 text units for the PII filter.
Configuration:
PII Type:
Select from a broad set of predefined PII types to filter (for example, Address, Age, AWS Access Key, Credit/Debit Card, and others).
Use predefined categories to ensure consistent detection and reduce configuration effort.
Apply multiple PII types simultaneously to achieve comprehensive coverage based on policy requirements.
Guardrail Behavior: For each selected PII type, users can define the desired guardrail behavior:
Block: The system will block the occurrence of the specified PII in both user inputs and model responses.
Mask: The system will mask the sensitive information by replacing it with asterisks (e.g., ********) while retaining the general format of the data.
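When Amazon Bedrock is the provider, each selected PII type and its behavior correspond to an entry in the Bedrock sensitive-information policy, where Mask is expressed as the ANONYMIZE action. The selections below are illustrative assumptions, not required choices.

```python
# Illustrative PII-filter fragment for the Amazon Bedrock CreateGuardrail
# API (boto3). Bedrock expresses the "Mask" behavior as the ANONYMIZE action.
pii_entities_config = [
    {"type": "EMAIL", "action": "ANONYMIZE"},                 # mask email addresses
    {"type": "ADDRESS", "action": "ANONYMIZE"},               # mask postal addresses
    {"type": "CREDIT_DEBIT_CARD_NUMBER", "action": "BLOCK"},  # block card numbers outright
    {"type": "AWS_ACCESS_KEY", "action": "BLOCK"},            # block AWS access keys
]
```
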
Regex patterns
Define custom sensitive-data patterns using regular expressions (regex).
Block or mask content that matches the configured patterns in inputs and/or outputs.
Extend coverage for sensitive data types not included in predefined filters.
Provide flexible, organization-specific filtering aligned with internal compliance requirements.
Price: Free.
Configuration:
Name: Enter a name for the regex pattern you want to define. This name will help identify the specific pattern in your configuration.
Regex Pattern: Define the regular expression that matches the sensitive information to be filtered.
For example, the pattern ^ID\d{3}[A-Z] matches identifiers that begin with "ID" followed by three digits and an uppercase letter.
Guardrail Behavior:
Block: The sensitive information matching the regex pattern will be blocked from being processed, ensuring it does not appear in user inputs or model responses.
Mask: The sensitive information matching the regex pattern will be masked, replacing the sensitive content with asterisks (e.g., ********) while retaining the format.
Description: Provide a description for the regex pattern to clarify what it filters and how it should be applied.
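Assuming Amazon Bedrock as the provider, a custom pattern such as the one above maps to a regex entry in the Bedrock sensitive-information policy. The name, description, and action below are example values.

```python
# Illustrative regex-filter fragment for the Amazon Bedrock CreateGuardrail
# API (boto3), using the example pattern from this section.
regexes_config = [
    {
        "name": "Internal ID",
        "description": "Matches identifiers that begin with 'ID' followed by three digits and an uppercase letter.",
        "pattern": r"^ID\d{3}[A-Z]",
        "action": "ANONYMIZE",  # use "BLOCK" to reject matching content instead of masking it
    }
]
```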

Contextual Grounding and Relevance Check
Validate model responses to ensure they are grounded in the configured reference source.
Verify that responses remain relevant to the user’s query.
Filter out hallucinated or unsupported content before it is returned.
Improve the accuracy, reliability, and trustworthiness of generated outputs.
Price: $0.10 per 1,000 text units.
Configuration:
Grounding:
Enable Grounding Check:
When enabled, this feature ensures that the model’s responses are factually accurate and aligned with the information from the reference source.
Responses that do not meet the defined Grounding score threshold will be blocked, thereby preventing the inclusion of hallucinated or inaccurate content.
Grounding Score Threshold:
The grounding score represents the model's confidence that the response is factually correct and aligned with the reference source.
If the response falls below the threshold, it will be blocked, and the configured blocked message will be returned to the user.
Adjusting the threshold to a higher value will enforce stricter filtering, blocking more responses.
Relevance:
Enable Relevance Check:
This feature ensures that the model’s responses are relevant to the user's query and overall context.
Responses that do not meet the Relevance score threshold will be blocked, thus minimizing irrelevant or off-topic content.
Relevance Score Threshold:
The relevance score indicates the model's confidence that the response appropriately addresses the user’s query.
If the score is below the set threshold, the response will be blocked, and the configured blocked message will be returned to the user.
Higher threshold levels will result in a more stringent filtering of responses.
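For Amazon Bedrock guardrails, these checks correspond to grounding and relevance filters with numeric thresholds between 0 and 1. The sketch below uses example thresholds of 0.75; appropriate values depend on how aggressively you want responses blocked.

```python
# Illustrative contextual-grounding fragment for the Amazon Bedrock
# CreateGuardrail API (boto3). Thresholds are example values; raising them
# blocks more responses.
contextual_grounding_policy_config = {
    "filtersConfig": [
        {"type": "GROUNDING", "threshold": 0.75},
        {"type": "RELEVANCE", "threshold": 0.75},
    ]
}
```
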
Define Blocked Messaging
Configure custom block messages displayed when a guardrail blocks a user prompt or a model response.
Provide clear, policy-aligned feedback to users when content is rejected.
Support different messaging to match organizational tone and compliance requirements.
Improve user experience by explaining blocks without exposing sensitive policy details.
Configuration:
Messaging for Blocked Prompt:
Enter a custom message that will be displayed when a user’s prompt is blocked by the guardrail.
This message provides users with an explanation for why their input was rejected or prevented from being processed.
Messaging for Blocked Responses:
Specify a custom message to be shown when the model’s response is blocked by the guardrail.
This allows the organization to offer context or suggest alternatives when the model’s output does not meet the defined criteria.
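The two messages below are hypothetical examples; in the Amazon Bedrock CreateGuardrail API they correspond to the blockedInputMessaging and blockedOutputsMessaging fields.

```python
# Hypothetical blocked messages; these map to blockedInputMessaging and
# blockedOutputsMessaging in the Amazon Bedrock CreateGuardrail API.
blocked_input_messaging = "Sorry, your request could not be processed because it conflicts with our usage policy."
blocked_outputs_messaging = "Sorry, the generated response was withheld because it did not meet our content guidelines."
```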

After configuring the guardrail, click Save. Upon saving, the version number will appear in the Guardrail table, indicating that the guardrail has been successfully configured.
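For reference, when Amazon Bedrock is the selected provider, a guardrail with an equivalent configuration can also be created programmatically outside this interface. The boto3 sketch below is a minimal, hypothetical assembly of the settings described above; the guardrail name, AWS region, and all policy values are assumptions, not required settings.

```python
import boto3

# Minimal, hypothetical sketch of creating an equivalent guardrail directly in
# Amazon Bedrock with boto3. All values below are examples only.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_guardrail(
    name="support-assistant-guardrail",
    description="Blocks unsafe content, denied topics, and sensitive data for the support assistant.",
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
        ]
    },
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "Investment advice",
                "definition": "Recommendations about buying, selling, or holding financial instruments.",
                "examples": ["Which stocks should I buy?"],
                "type": "DENY",
            }
        ]
    },
    wordPolicyConfig={"managedWordListsConfig": [{"type": "PROFANITY"}]},
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [{"type": "EMAIL", "action": "ANONYMIZE"}],
        "regexesConfig": [
            {
                "name": "Internal ID",
                "description": "IDs such as ID123A",
                "pattern": r"^ID\d{3}[A-Z]",
                "action": "BLOCK",
            }
        ],
    },
    contextualGroundingPolicyConfig={
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.75},
            {"type": "RELEVANCE", "threshold": 0.75},
        ]
    },
    blockedInputMessaging="Sorry, your request could not be processed because it conflicts with our usage policy.",
    blockedOutputsMessaging="Sorry, the generated response was withheld because it did not meet our content guidelines.",
)

# The call returns the new guardrail's identifier and its working-draft version.
print(response["guardrailId"], response["version"])
```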

Guardrails support versioning:
Per-update version creation: Any modification generates a new, discrete version rather than updating the existing one in place.
Automatic version incrementing: Version identifiers are assigned and incremented automatically on each update to ensure consistent sequencing without manual version management.