
Guardrail Implementation in Copilot

Guardrails in Copilot enforce organizational standards for safety, content quality, and regulatory compliance. While authoring a recipe, users can assign guardrails to prompts through the prompt or agent prompt. Once a prompt is configured with guardrails, the recipe editor displays the relevant configurations so that users can review, validate, and refine them before deployment.

When the recipe is saved and published, the defined guardrails become active in the live Copilot environment. Each user interaction is then evaluated against the configured guardrail logic, which helps keep responses within compliance requirements and reduces risks such as inappropriate content generation or off-topic replies.
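As an illustration only, the per-interaction evaluation described above can be sketched in Python for two of the simpler guardrail types, word filters and regex patterns. This is not Copilot's implementation: the names `GuardrailHit` and `evaluate_message`, the category labels, and the sample pattern are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class GuardrailHit:
    guardrail: str  # hypothetical category label, e.g. "word_filter"
    detail: str     # the word or pattern that matched


def evaluate_message(message, denied_words, regex_patterns):
    """Check one message against word filters and regex patterns.

    Returns a list of GuardrailHit objects; an empty list means no
    guardrail in this sketch was triggered.
    """
    hits = []
    for word in denied_words:
        # Whole-word, case-insensitive match for each denied word.
        if re.search(r"\b" + re.escape(word) + r"\b", message, re.IGNORECASE):
            hits.append(GuardrailHit("word_filter", word))
    for pattern in regex_patterns:
        if re.search(pattern, message):
            hits.append(GuardrailHit("regex_pattern", pattern))
    return hits


# Example: one denied word plus a pattern for an SSN-like number.
hits = evaluate_message(
    "My password is 123-45-6789",
    denied_words=["password"],
    regex_patterns=[r"\b\d{3}-\d{2}-\d{4}\b"],
)
for hit in hits:
    print(hit.guardrail, "->", hit.detail)
```

In this sketch a non-empty result would correspond to an interaction that a guardrail acts on; what action is taken (for example, blocking the response) depends on the configured guardrail and is not modeled here.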

Monitoring and Managing Guardrail Activity in Copilot

Monitoring guardrail activity is critical to keeping Copilot aligned with defined safety, compliance, and content quality requirements. By analyzing how guardrails perform during real user interactions, teams can identify frequently triggered guardrails, uncover behavioral trends, and detect misconfigurations or gaps. These insights support proactive refinement of both recipe logic and guardrail policies, improving the reliability, accuracy, and compliance posture of Copilot.

The Copilot Guardrail Metrics dashboard brings this data together in one place, so that stakeholders can evaluate guardrail effectiveness and tune deployed workflows.

To access the guardrail monitoring dashboard:

  • From the Copilot menu, select the appropriate Copilot.

  • Click the Action button located on the right side.

  • Click View History.

This opens the Copilot Guardrail Metrics dashboard, which provides visual insights into how guardrails are functioning across all Copilot interactions. Key guardrail categories tracked include:

  • Content Filters

  • Word Filters

  • PII Detection

  • Prompt Attacks

  • Grounding

  • Relevance

  • Denied Topics

  • Regex Patterns

Each metric is presented in bar chart format, enabling users to quickly identify high-frequency violations and evaluate which filters are most commonly engaged. This data supports informed decision-making around tuning prompts, updating filters, or refining user guidance.
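If violation records are available outside the dashboard, the same per-category tally can be approximated in a few lines of Python. This is a sketch under stated assumptions: the record layout (a dictionary with a `guardrail` field) and the helper names `count_by_category` and `render_bars` are hypothetical and do not describe a Copilot export format.

```python
from collections import Counter


def count_by_category(violations):
    """Count violations per guardrail category, most frequent first."""
    counts = Counter(v["guardrail"] for v in violations)
    # Sort by descending count, then by name so ties have a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def render_bars(counts):
    """Render (category, count) pairs as a plain-text bar chart."""
    width = max((len(name) for name, _ in counts), default=0)
    return "\n".join(f"{name.ljust(width)}  {'#' * n} {n}" for name, n in counts)


# Hypothetical records; the "guardrail" field name is an assumption.
sample = [
    {"guardrail": "PII Detection"},
    {"guardrail": "Word Filters"},
    {"guardrail": "PII Detection"},
    {"guardrail": "Grounding"},
    {"guardrail": "PII Detection"},
]
counts = count_by_category(sample)
print(render_bars(counts))
```

Sorting by frequency puts the most commonly engaged guardrail first, which mirrors how the bar charts are typically read.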

In addition, the Copilot History section contains detailed violation reports for individual user queries. Each report shows which guardrails were triggered and what action was taken (e.g., response blocked or skipped). It also offers a View Traces option, which opens a detailed breakdown of the guardrail decision flow for that interaction. This level of transparency is essential for auditing, debugging, and continuous improvement.

For focused analysis, the Filter by panel includes advanced Guardrail Filters that allow users to:

  • Select specific guardrail types (e.g., regex, PII, grounding).

  • Filter by actions taken (e.g., flagged, skipped).

  • Apply custom word filters.

  • Define a score range to isolate low- or high-severity violations.

These filtering capabilities help teams efficiently identify patterns, investigate incidents, and drive improvements in both guardrail logic and overall Copilot behavior.
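The filter criteria listed above (guardrail type, action taken, and score range) combine as simple conjunctions, which the following Python sketch illustrates. The `Violation` record, its field names, the `filter_violations` helper, and the 0 to 1 score values are assumptions made for illustration; the document does not specify the score scale or any programmatic interface.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    query: str
    guardrail: str  # e.g. "regex", "pii", "grounding"
    action: str     # e.g. "flagged", "skipped"
    score: float    # assumed numeric severity score


def filter_violations(violations, guardrails=None, actions=None,
                      min_score=None, max_score=None):
    """Keep violations that satisfy every criterion that was supplied.

    A criterion left as None is not applied, mirroring a filter panel
    in which unused filters stay empty.
    """
    selected = []
    for v in violations:
        if guardrails is not None and v.guardrail not in guardrails:
            continue
        if actions is not None and v.action not in actions:
            continue
        if min_score is not None and v.score < min_score:
            continue
        if max_score is not None and v.score > max_score:
            continue
        selected.append(v)
    return selected


# Hypothetical records for demonstration.
records = [
    Violation("What is my colleague's home address?", "pii", "flagged", 0.92),
    Violation("Tell me a joke about the weather", "grounding", "skipped", 0.35),
    Violation("Order id AB-1234 status", "regex", "flagged", 0.60),
]

# High-severity PII or regex violations that were flagged.
high = filter_violations(
    records, guardrails={"pii", "regex"}, actions={"flagged"}, min_score=0.8
)
print([v.guardrail for v in high])
```

Treating each filter as optional keeps the helper usable for both broad sweeps (no arguments) and narrow incident investigations (all arguments set).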
