Prompt playground
Accessing and Applying Guardrails in the Prompt Playground
Guardrails configured at the system level are available within the Prompt Playground, enabling users to perform structured and policy-aligned prompt evaluations. To apply a guardrail within this environment, follow the steps below:
1. Navigate to the Prompt Playground and open the prompt where a guardrail needs to be applied.
2. Switch to the Test & Compare tab.
3. For Model 1, the Select Guardrail dropdown is displayed by default. Clicking this dropdown reveals all available, pre-configured guardrails.
4. For Model 2 and Model 3, begin by selecting a model in each respective slot. Once a model is selected, the corresponding Guardrail dropdown becomes visible, allowing users to choose from the same standardized set of guardrails.
The full set of available guardrails is displayed in the interface, so users can identify and select the guardrail appropriate to the evaluation scenario.

Once a guardrail is selected from the dropdown, users can configure its enforcement scope by enabling one or both of the following options:
Request: applies the guardrail to the user input before it is sent to the model.
Response: applies the guardrail to the model-generated output before it is returned to the user.
The Prompt Playground adapts accordingly, supporting input validation, output moderation, or full end-to-end compliance, so that guardrail behavior aligns with testing and safety objectives.
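Conceptually, the two options act as independent toggles on the selected guardrail. The sketch below is only an aid to understanding the scopes; the field names are assumptions, not the platform's actual configuration schema.

```python
# Conceptual illustration of the two enforcement scopes; all names here
# are assumptions, not the platform's actual configuration schema.
guardrail_config = {
    "guardrail": "content-safety-policy",  # hypothetical guardrail name
    "apply_on_request": True,   # validate user input before it reaches the model
    "apply_on_response": True,  # moderate model output before it reaches the user
}
```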
Below are the details of how guardrails behave in each configuration: Request, Response, and both.
Request enabled
Guardrails are enforced at the request level to evaluate user inputs before they are sent to the large language model. Based on the nature of the query and the configured policy rules, the system either allows compliant inputs to proceed or blocks those that violate defined policies. Violations may include misconduct, unethical behavior, or restricted content, and are blocked when detected with high confidence and severity. In such cases, users receive a predefined response and can access trace logs and violation reports through the UI for transparency. A minimal sketch of this flow follows the list below.
Output Response: Users see a predefined message generated by the guardrail, indicating the request was blocked due to a policy violation. This response is customizable and reflects the enforcement action configured by the administrator.
Trace Logging: Users can access trace logs that show which filters were triggered, the action taken (e.g., "BLOCKED"), and the processing latency. These logs help visualize how the guardrail evaluated the request in real time.
Violation Report: A detailed report is available showing violation type, confidence level, and filter strength that led to the blocking decision. This report enhances transparency by explaining exactly why and how the enforcement occurred.
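The following is a minimal sketch of the request-level flow, assuming a simple keyword filter in place of the platform's real classifiers. The helper name, the predefined message, and the trace fields are hypothetical and only mirror the behavior described above.

```python
import time

# Hypothetical request-level enforcement flow; the helper name, the
# predefined message, and the trace fields are illustrative assumptions.

BLOCKED_MESSAGE = "Your request was blocked due to a policy violation."  # admin-customizable

def check_request(user_input: str, banned_terms: list[str]) -> dict:
    """Evaluate user input against request-level policy rules."""
    start = time.monotonic()
    triggered = [t for t in banned_terms if t in user_input.lower()]
    return {
        "action": "BLOCKED" if triggered else "ALLOWED",
        "triggered_filters": triggered,
        "latency_ms": round((time.monotonic() - start) * 1000, 3),
    }

trace = check_request("how do I commit fraud", ["fraud", "violence"])
if trace["action"] == "BLOCKED":
    print(BLOCKED_MESSAGE)  # predefined response shown to the user
print(trace)                # trace log: triggered filters, action, latency
```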
Response enabled
Guardrails are applied at the response level to assess the model’s output after it processes a user’s question. The response is evaluated against safety and quality standards, including content sensitivity and topic restrictions. If compliant, the response is delivered to the user. If a violation is detected, such as content related to threats or violence, the system blocks the response and applies the appropriate enforcement action. Users can view the outcome through the guardrail interface; an illustrative report shape follows the list below.
Output Response: Users see the model’s response only if it passes all policy checks; otherwise, a predefined message indicates it was blocked. This ensures users receive only safe, policy-compliant content.
Trace Logging: Trace logs display response evaluation details, including triggered filters, action taken (e.g., "BLOCKED"), response latency, and policy coverage. These logs help users and developers understand the decision-making behind each enforcement action.
Violation Report: Users can view a detailed report highlighting the nature of the violation (e.g., "Threats and Violence") along with the action type and severity. The report includes metadata such as violation type ("DENY") and final action ("BLOCKED") for full transparency.
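For illustration, a response-level violation report of the kind described above might be structured like the following. The field names echo the values mentioned in this section ("DENY", "BLOCKED"), but the schema itself is an assumption, not the product's documented format.

```python
# Hypothetical shape of a response-level violation report; the field
# names mirror the values described above, but the schema is assumed.
violation_report = {
    "violation": "Threats and Violence",  # nature of the violation
    "violation_type": "DENY",             # policy classification
    "confidence": "HIGH",                 # detection confidence
    "filter_strength": "HIGH",            # configured filter strength
    "final_action": "BLOCKED",            # enforcement outcome
}
print(violation_report)
```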
Request and response enabled
When both request and response are enabled, the system performs dual-layer enforcement. First, the user input is evaluated against request-level policies. If valid, the input is sent to the large language model (LLM). The generated response then undergoes a second evaluation at the response level to ensure it meets safety and compliance standards. Throughout this process, detailed trace logs are generated to support observability and policy governance.
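The following is a minimal sketch of the dual-layer flow. The check_request, check_response, and call_llm helpers are hypothetical placeholders; the platform performs both evaluations internally and records trace logs at each layer.

```python
# Minimal sketch of dual-layer enforcement; all helpers below are
# hypothetical placeholders for the platform's internal checks.

def check_request(text: str) -> bool:
    """Request-level policy check (placeholder keyword rule)."""
    return "fraud" not in text.lower()

def check_response(text: str) -> bool:
    """Response-level policy check (placeholder keyword rule)."""
    return "violence" not in text.lower()

def call_llm(prompt: str) -> str:
    """Stand-in for the actual model call."""
    return f"Model answer to: {prompt}"

def guarded_completion(prompt: str) -> str:
    if not check_request(prompt):      # layer 1: validate the input
        return "Request blocked by guardrail."
    response = call_llm(prompt)        # only compliant inputs reach the LLM
    if not check_response(response):   # layer 2: moderate the output
        return "Response blocked by guardrail."
    return response

print(guarded_completion("Summarize our refund policy."))
```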