# Observability

Every request in Karini AI produces a trace that records the steps orchestrated by the prompt, agent, recipe, or copilot. Use the trace to follow, step by step, how the response at that point in the conversation was produced. The table below lists each tracing step, its input and output, and the attributes recorded for it.

<table><thead><tr><th width="170">Tracing Step</th><th width="212">Input and Output</th><th>Attributes</th></tr></thead><tbody><tr><td>Detect Greeting Questions</td><td><ol><li><strong>Input:</strong> Greeting detection prompt with the input question</li><li><strong>Output:</strong> Classification output</li></ol></td><td><ol><li><strong>ServiceName</strong> - Information about the application in the resource</li><li><strong>SpanName</strong> - Internal function name</li><li><strong>gen_ai.prompt.0.role</strong></li><li><strong>gen_ai.completion.0.finish_reason</strong></li><li><strong>gen_ai.completion.0.role</strong></li><li><strong>gen_ai.openai.api_base</strong></li><li><strong>gen_ai.openai.system_fingerprint</strong></li><li><strong>gen_ai.request.max_tokens</strong> - The maximum number of response tokens requested</li><li><strong>gen_ai.request.model</strong> - The model requested (e.g. <code>gpt-4</code>, <code>claude</code>, etc.)</li><li><strong>gen_ai.request.temperature</strong></li><li><strong>gen_ai.system</strong> - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)</li><li><strong>gen_ai.usage.completion_tokens</strong> - The number of tokens used for the completion response</li><li><strong>gen_ai.usage.prompt_tokens</strong> - The number of tokens used for the prompt in the request</li><li><strong>llm.headers</strong> - The headers used for the request</li><li><strong>llm.is_streaming</strong></li><li><strong>llm.request.type</strong> - The type of request (e.g. 
<code>completion</code>, <code>chat</code>, etc.)</li><li><strong>llm.usage.total_tokens</strong> - The total number of tokens used</li></ol></td></tr><tr><td>Check Content Safety</td><td><ol><li><strong>Input:</strong> User query</li><li><strong>Output:</strong> Content safety check output</li></ol></td><td><ol><li><strong>ServiceName</strong> - Information about the application in the resource</li><li><strong>SpanName</strong> - Internal function name</li></ol></td></tr><tr><td>Query Embeddings</td><td><ol><li><strong>Input:</strong> User query</li><li><strong>Output:</strong> Vector embeddings of the user query</li></ol></td><td><ol><li><strong>ServiceName</strong> - Information about the application in the resource</li><li><strong>SpanName</strong> - Internal function name</li><li><strong>gen_ai.openai.api_base</strong></li><li><strong>gen_ai.request.model</strong> - The model requested (e.g. <code>gpt-4</code>, <code>claude</code>, etc.)</li><li><strong>gen_ai.response.model</strong> - The model actually used (e.g. <code>gpt-4-0613</code>, etc.)</li><li><strong>gen_ai.system</strong> - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)</li><li><strong>gen_ai.usage.prompt_tokens</strong> - The number of tokens used for the prompt in the request</li><li><strong>llm.headers</strong> - The headers used for the request</li><li><strong>llm.is_streaming</strong></li><li><strong>llm.request.type</strong> - The type of request (e.g. 
<code>completion</code>, <code>chat</code>, etc.)</li><li><strong>llm.usage.total_tokens</strong> - The total number of tokens used</li></ol></td></tr><tr><td>Get similar embeddings</td><td><ol><li><strong>Input:</strong> User query</li><li><strong>Output:</strong> Similar embeddings from the vector store</li></ol></td><td><ol><li><strong>ServiceName</strong> - Information about the application in the resource</li><li><strong>SpanName</strong> - Internal function name</li></ol></td></tr><tr><td>Perform reranking</td><td><ol><li><strong>Input:</strong> User query</li><li><strong>Output:</strong> Similar embeddings reranked with the Cohere reranker</li></ol></td><td><ol><li><strong>ServiceName</strong> - Information about the application in the resource</li><li><strong>SpanName</strong> - Internal function name</li></ol></td></tr><tr><td>Get Qna chain streaming</td><td><ol><li><strong>Input:</strong> Prompt, user query, and the reranked context</li><li><strong>Output:</strong> Response from the LLM</li></ol></td><td><ol><li><strong>ServiceName</strong> - Information about the application in the resource</li><li><strong>SpanName</strong> - Internal function name</li><li><strong>gen_ai.completion.0.finish_reason</strong></li><li><strong>gen_ai.completion.0.role</strong></li><li><strong>gen_ai.openai.api_base</strong></li><li><strong>gen_ai.openai.api_version</strong></li><li><strong>gen_ai.prompt.0.role</strong></li><li><strong>gen_ai.request.max_tokens</strong> - The maximum number of response tokens requested</li><li><strong>gen_ai.request.model</strong> - The model requested (e.g. <code>gpt-4</code>, <code>claude</code>, etc.)</li><li><strong>gen_ai.request.temperature</strong></li><li><strong>gen_ai.response.model</strong> - The model actually used (e.g. <code>gpt-4-0613</code>, etc.)</li><li><strong>gen_ai.system</strong> - The vendor of the LLM (e.g. 
OpenAI, Anthropic, etc.)</li><li><strong>gen_ai.usage.completion_tokens</strong> - The number of tokens used for the completion response</li><li><strong>gen_ai.usage.prompt_tokens</strong> - The number of tokens used for the prompt in the request</li><li><strong>llm.headers</strong> - The headers used for the request</li><li><strong>llm.is_streaming</strong></li><li><strong>llm.request.type</strong> - The type of request (e.g. <code>completion</code>, <code>chat</code>, etc.)</li><li><strong>llm.usage.total_tokens</strong> - The total number of tokens used</li></ol></td></tr><tr><td>Get Followup Questions</td><td><ol><li><strong>Input:</strong> Follow-up question generation prompt, user query, and the LLM-generated answer to the user query</li><li><strong>Output:</strong> Follow-up questions</li></ol></td><td><ol><li><strong>ServiceName</strong> - Information about the application in the resource</li><li><strong>SpanName</strong> - Internal function name</li><li><strong>gen_ai.completion.0.finish_reason</strong></li><li><strong>gen_ai.completion.0.role</strong></li><li><strong>gen_ai.openai.api_base</strong></li><li><strong>gen_ai.openai.system_fingerprint</strong></li><li><strong>gen_ai.openai.api_version</strong></li><li><strong>gen_ai.prompt.0.role</strong></li><li><strong>gen_ai.request.max_tokens</strong> - The maximum number of response tokens requested</li><li><strong>gen_ai.request.model</strong> - The model requested (e.g. <code>gpt-4</code>, <code>claude</code>, etc.)</li><li><strong>gen_ai.request.temperature</strong></li><li><strong>gen_ai.response.model</strong> - The model actually used (e.g. <code>gpt-4-0613</code>, etc.)</li><li><strong>gen_ai.system</strong> - The vendor of the LLM (e.g. 
OpenAI, Anthropic, etc.)</li><li><strong>gen_ai.usage.completion_tokens</strong> - The number of tokens used for the completion response</li><li><strong>gen_ai.usage.prompt_tokens</strong> - The number of tokens used for the prompt in the request</li><li><strong>llm.headers</strong> - The headers used for the request</li><li><strong>llm.is_streaming</strong></li><li><strong>llm.request.type</strong> - The type of request (e.g. <code>completion</code>, <code>chat</code>, etc.)</li><li><strong>llm.usage.total_tokens</strong> - The total number of tokens used</li></ol></td></tr><tr><td>Agent Executor</td><td><ol><li><strong>Input:</strong> Prompt, user question, and agent thoughts and actions</li><li><strong>Output:</strong> Response to the agent action</li></ol></td><td><ol><li><strong>ServiceName</strong> - Information about the application in the resource</li><li><strong>SpanName</strong> - Internal function name</li><li><strong>gen_ai.prompt.0.role</strong></li><li><strong>gen_ai.request.max_tokens</strong> - The maximum number of response tokens requested</li><li><strong>gen_ai.request.model</strong> - The model requested (e.g. <code>gpt-4</code>, <code>claude</code>, etc.)</li><li><strong>gen_ai.request.temperature</strong></li><li><strong>gen_ai.system</strong> - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)</li><li><strong>gen_ai.usage.completion_tokens</strong> - The number of tokens used for the completion response</li><li><strong>gen_ai.usage.prompt_tokens</strong> - The number of tokens used for the prompt in the request</li><li><strong>llm.request.type</strong> - The type of request (e.g. 
<code>completion</code>, <code>chat</code>, etc.)</li><li><strong>llm.usage.total_tokens</strong> - The total number of tokens used</li></ol></td></tr><tr><td>Get LLM Chain Streaming</td><td><ol><li><strong>Input:</strong> Prompt with user query</li><li><strong>Output:</strong> Response from the LLM</li></ol></td><td><ol><li><strong>ServiceName</strong> - Information about the application in the resource</li><li><strong>SpanName</strong> - Internal function name</li><li><strong>gen_ai.prompt.0.role</strong></li><li><strong>gen_ai.request.max_tokens</strong> - The maximum number of response tokens requested</li><li><strong>gen_ai.request.model</strong> - The model requested (e.g. <code>gpt-4</code>, <code>claude</code>, etc.)</li><li><strong>gen_ai.request.temperature</strong></li><li><strong>gen_ai.system</strong> - The vendor of the LLM (e.g. OpenAI, Anthropic, etc.)</li><li><strong>gen_ai.usage.completion_tokens</strong> - The number of tokens used for the completion response</li><li><strong>gen_ai.usage.prompt_tokens</strong> - The number of tokens used for the prompt in the request</li><li><strong>llm.request.type</strong> - The type of request (e.g. <code>completion</code>, <code>chat</code>, etc.)</li><li><strong>llm.usage.total_tokens</strong> - The total number of tokens used</li></ol></td></tr></tbody></table>
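
As an illustration of how the token attributes in the table above could be used, the following sketch sums token usage per requested model across the steps of one trace. It assumes the spans have already been exported as plain Python dictionaries; that layout, the `summarize_token_usage` helper, the span names, and all values are hypothetical, and only the attribute keys are taken from the table.

```python
from collections import defaultdict

# Hypothetical exported spans: the dict layout, span names, and numbers are
# made up for illustration; only the attribute keys come from the table above.
spans = [
    {
        "SpanName": "detect_greeting",
        "attributes": {
            "gen_ai.request.model": "gpt-4",
            "gen_ai.usage.prompt_tokens": 120,
            "gen_ai.usage.completion_tokens": 5,
            "llm.usage.total_tokens": 125,
        },
    },
    {
        # Steps without an LLM call carry no token attributes.
        "SpanName": "check_content_safety",
        "attributes": {},
    },
    {
        "SpanName": "get_qna_chain_streaming",
        "attributes": {
            "gen_ai.request.model": "gpt-4",
            "gen_ai.usage.prompt_tokens": 900,
            "gen_ai.usage.completion_tokens": 150,
            "llm.usage.total_tokens": 1050,
        },
    },
]


def summarize_token_usage(spans):
    """Sum prompt, completion, and total tokens per requested model."""
    usage = defaultdict(lambda: {"prompt": 0, "completion": 0, "total": 0})
    for span in spans:
        attrs = span.get("attributes", {})
        model = attrs.get("gen_ai.request.model")
        if model is None:
            continue  # skip spans that did not call a model
        prompt = attrs.get("gen_ai.usage.prompt_tokens", 0)
        completion = attrs.get("gen_ai.usage.completion_tokens", 0)
        # Fall back to prompt + completion when the total is not recorded.
        total = attrs.get("llm.usage.total_tokens", prompt + completion)
        usage[model]["prompt"] += prompt
        usage[model]["completion"] += completion
        usage[model]["total"] += total
    return dict(usage)


summary = summarize_token_usage(spans)
print(summary)
```

Grouping by `gen_ai.request.model` is one possible choice; grouping by `SpanName` instead would give a per-step breakdown of the same attributes.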

