# Large Language Models (LLMs)

Karini AI integrates with the following Large Language Model (LLM) providers and with custom models. You can use these models to create model endpoints in the Karini AI model hub.

1. Amazon Bedrock

   In Amazon Bedrock, the "Model Serving" configuration provides two options for how your models are deployed and managed: **On Demand** and **Provisioned Throughput**.

   1. #### **On Demand**

      With the **On Demand** option, Amazon Bedrock automatically adjusts the computational resources to match the volume of requests your model receives. This means the system scales up or down in real-time based on demand, offering flexibility without the need for manual resource management. This option is suitable for workloads with varying traffic, ensuring you only pay for the resources used during active requests.

      The table under **Available LLM Configurations** lists all available models.
   2. #### **Provisioned Throughput**

      The **Provisioned Throughput** option allows you to specify a fixed amount of computational capacity for your model, ensuring consistent performance and response times. This is ideal for use cases where you require a stable level of throughput, regardless of fluctuating demand. Resources are pre-allocated, which guarantees predictable performance but comes with a fixed cost, regardless of actual usage.

      When **Provisioned Throughput** is selected, the following model providers are available:

      * [Amazon](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-nova.html)
      * [AI21 Labs](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-ai21.html)
      * [Anthropic](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-claude.html)
      * [Cohere](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere.html)

      For details on each provider's model parameters, refer to the linked AWS documentation.

   **Model ARN:** Enter the ARN (Amazon Resource Name) of the selected model in this field. The ARN serves as a unique identifier for the model within the AWS ecosystem, ensuring proper configuration and secure linkage to your account.
2. OpenAI
3. Azure OpenAI
4. Databricks
5. Anyscale
6. Amazon SageMaker
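
As an illustration of the **Model ARN** field described above, the following minimal Python sketch splits an ARN into its standard colon-delimited components (`arn:partition:service:region:account-id:resource`) so that a value can be sanity-checked before it is entered. The `parse_arn` helper and the example ARN (account ID and resource ID) are hypothetical placeholders; they are not part of Karini AI or the AWS SDK.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its colon-delimited components."""
    # General ARN layout: arn:partition:service:region:account-id:resource
    # maxsplit=5 keeps any colons inside the resource part intact.
    parts = arn.strip().split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return Arn(*parts[1:])


# Placeholder account ID and resource ID, for illustration only.
example = "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/abcd1234efgh"
parsed = parse_arn(example)
print(parsed.service, parsed.region, parsed.resource)
```

A check like this only confirms the shape of the value. Whether the ARN points at a model your account can actually use is determined by AWS.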

### Add New Model Endpoint

To add a new model endpoint to the model hub, do the following:

1. On the **Model Endpoints** menu, select the **Large language model endpoints(LLM)** tab and click **Add New**.
2. Select a model provider and the associated model ID from the list.
3. Optionally, override default configurations such as temperature, max tokens, and pricing.
4. By default, organization-level credentials are used to access the model. Optionally, **overwrite credentials** with a new set of model credentials.
5. Click the **Test endpoint** button to test the model endpoint request and response.
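
The relationship between provider defaults and user overrides in the steps above can be pictured with a small Python sketch. The `EndpointConfig` class, its field names, and all values are hypothetical; they do not reflect Karini AI's internal data model or any real model's defaults or pricing.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EndpointConfig:
    """Hypothetical shape of a model endpoint's configurable settings."""
    temperature: float
    max_tokens: int
    input_price_per_1k: float   # price per 1000 input tokens
    output_price_per_1k: float  # price per 1000 output tokens


# Illustrative defaults, standing in for values supplied by the model provider.
defaults = EndpointConfig(
    temperature=0.7,
    max_tokens=1024,
    input_price_per_1k=0.003,
    output_price_per_1k=0.015,
)

# Override only the fields that should differ; the rest keep their defaults.
custom = replace(defaults, temperature=0.2, max_tokens=2048)
print(custom)
```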

### Review Model Endpoints

You can review the created model endpoints under the **Large language model endpoints(LLM)** tab. Each endpoint shows the following information:

1. Model provider and model ID.
2. Max tokens, min tokens, and temperature: The default values come from the model provider's specifications. You can override them.
3. Model price: The default price shows the public pricing of model inference per 1000 input tokens and per 1000 output tokens. This price is used in Karini AI [Dashboards](/karini-ai-documentation/dashboard-overview.md) to calculate cost. You can override this price if needed, for example when you have a special pricing agreement with the model provider.
4. Link to view the recipes and prompts in which the model endpoint is used.
5. Link to view the model information, including the cost and usage dashboard for the model endpoint.
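
As a rough illustration of how a per-1000-token price translates into a cost figure, the sketch below assumes a simple linear formula over input and output tokens. The `inference_cost` function and the prices are hypothetical, and the calculation used by the Karini AI dashboards may differ.

```python
def inference_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Estimate cost from token counts and per-1000-token prices."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


# Illustrative prices only: 0.003 per 1000 input tokens, 0.015 per 1000 output tokens.
cost = inference_cost(12_000, 3_000, 0.003, 0.015)
print(f"{cost:.4f}")  # prints 0.0810
```

Overriding the model price on an endpoint would, under this assumption, simply change the two price arguments.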

### Available LLM Configurations

The following table describes the LLMs that are available for integration with the Karini AI model hub. It also links to each model provider's reference documentation, which covers model specifications, usage instructions, and API endpoints.

<table><thead><tr><th width="136">Provider</th><th width="236">Models</th><th width="196">Config Parameters</th><th>Reference</th></tr></thead><tbody>
<tr><td>Amazon Bedrock</td><td><ol><li>Anthropic Claude 3.7 Sonnet v1</li><li>Anthropic Claude 3.5 Sonnet v2</li><li>Anthropic Claude 3.5 Haiku 20241022 v1</li><li>DeepSeek R1 v1</li><li>Amazon Nova Pro v1</li><li>Amazon Nova Lite v1</li><li>Amazon Nova Micro v1</li><li>Anthropic Claude 3.5 Sonnet 20240620 v1</li><li>Anthropic Claude 3 Opus 20240229 v1</li><li>Anthropic Claude 3 Sonnet 20240229 v1</li><li>Anthropic Claude 3 Haiku 20240307 v1</li><li>Anthropic Claude v2.1</li><li>Anthropic Claude v2</li><li>Anthropic Claude Instant v1</li><li>Llama 3.3 70B Instruct</li><li>Llama 3.2 1B Instruct</li><li>Llama 3.2 3B Instruct</li><li>Llama 3.2 11B Vision Instruct</li><li>Llama 3.2 90B Vision Instruct</li><li>Meta Llama 3.1 8B Instruct</li><li>Meta Llama 3.1 70B Instruct</li><li>Meta Llama 3 8B Instruct</li><li>Meta Llama 3 70B Instruct</li><li>Mistral 7B Instruct</li><li>Mistral Mixtral 8x7B Instruct</li><li>Mistral Large (24.02)</li><li>Mistral Small (24.02)</li><li>Cohere Command R Plus</li><li>Cohere Command R</li><li>Amazon Titan Text Premier</li><li>Amazon Titan Text Express</li><li>Amazon Titan Text Lite</li><li>AI21 Jamba 1.5 Mini</li><li>AI21 Jamba 1.5 Large</li><li>AI21 Jamba Instruct</li></ol></td><td><ol><li>Temperature</li><li>Max Tokens</li></ol></td><td><a href="https://docs.aws.amazon.com/bedrock/latest/userguide/titan-text-models.html">https://docs.aws.amazon.com/bedrock/latest/userguide/titan-text-models.html</a></td></tr>
<tr><td>Azure OpenAI</td><td><ol><li>GPT 4O 2024-11-20</li><li>GPT 4O Mini</li><li>O3 Mini</li><li>O1</li><li>GPT 4O 2024-08-06</li><li>GPT 4O</li><li>GPT 3.5 Turbo (Legacy)</li><li>GPT-4 (Legacy)</li></ol></td><td><ol><li>Temperature</li><li>Max Tokens</li><li>Azure OpenAI API Base</li><li>Azure OpenAI Deployment Name</li></ol></td><td><a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal">https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal</a></td></tr>
<tr><td>OpenAI</td><td><ol><li>GPT 4O 2024-11-20</li><li>GPT 4O Mini</li><li>O3 Mini</li><li>O1</li><li>GPT 4O 2024-08-06</li><li>GPT 4O</li><li>Whisper-1</li><li>TTS-1</li><li>TTS-1-hd</li><li>GPT-4-Turbo</li><li>GPT-3.5-Turbo (Legacy)</li></ol></td><td><ol><li>Temperature</li><li>Max Tokens</li></ol></td><td><a href="https://platform.openai.com/docs/models">https://platform.openai.com/docs/models</a></td></tr>
<tr><td>Google Gemini</td><td><ol><li>Gemini 2.0 Flash</li><li>Gemini 2.0 Flash-Lite Preview</li><li>Gemini 1.5 Pro</li><li>Gemini 1.5 Flash</li></ol></td><td><ol><li>Temperature</li><li>Max Tokens</li></ol></td><td><a href="https://ai.google.dev/gemini-api/docs/models">https://ai.google.dev/gemini-api/docs/models</a></td></tr>
<tr><td>Vertex Gemini</td><td><ol><li>Gemini 2.0 Flash</li><li>Gemini 2.0 Flash-Lite Preview</li><li>Gemini 2.0 Flash Thinking</li><li>Gemini 1.5 Pro</li><li>Gemini 1.5 Flash</li></ol></td><td><ol><li>Temperature</li><li>Max Tokens</li></ol></td><td><a href="https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models">https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models</a></td></tr>
<tr><td>Fireworks</td><td><ol><li>Llama 4 Maverick Instruct (Basic)</li><li>Llama 4 Scout Instruct (Basic)</li><li>DeepSeek R1</li><li>DeepSeek V3-0324</li><li>Llama V3P1 405B Instruct</li><li>Llama V3P3 70B Instruct</li></ol></td><td><ol><li>Temperature</li><li>Max Tokens</li></ol></td><td><a href="https://fireworks.ai/models">https://fireworks.ai/models</a></td></tr>
<tr><td>Anyscale</td><td><ol><li>Google Gemma 7B</li><li>Meta Llama 3 8B</li><li>Meta Llama 3 70B</li><li>Mistral 7B Instruct</li><li>Mixtral 8x7B Instruct</li><li>Mixtral 8x22B Instruct</li></ol></td><td><ol><li>Temperature</li><li>Max Tokens</li></ol></td><td><a href="https://docs.anyscale.com/endpoints/model-serving/get-started">https://docs.anyscale.com/endpoints/model-serving/get-started</a></td></tr>
<tr><td>Databricks</td><td><ol><li><p>Foundation Models</p><ol><li>Databricks DBRX Instruct</li><li>Meta Llama 3 70B Instruct</li><li>Mixtral 8x7B Instruct</li><li>Llama 2 70B Chat (Legacy)</li></ol></li><li>Databricks External Models</li><li>Databricks Custom Models</li></ol></td><td><ol><li>Temperature</li><li>Max Tokens</li><li><strong>Endpoint URL:</strong> Databricks model endpoint URL</li></ol></td><td><a href="https://docs.databricks.com/en/generative-ai/external-models/index.html">https://docs.databricks.com/en/generative-ai/external-models/index.html</a></td></tr>
<tr><td>Amazon SageMaker</td><td></td><td><ol><li>Temperature</li><li>Max Tokens</li><li><strong>Model Endpoint Name:</strong> SageMaker model endpoint name</li></ol></td><td><a href="https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html">https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html</a></td></tr>
<tr><td>Cohere</td><td><ol><li>Rerank-english-v3.0</li><li>Rerank-multilingual-v3.0</li><li>Rerank-english-v2.0</li><li>Rerank-multilingual-v2.0</li></ol></td><td></td><td><a href="https://docs.cohere.com/v2/docs/reranking-with-cohere">https://docs.cohere.com/v2/docs/reranking-with-cohere</a></td></tr>
</tbody></table>

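For Azure OpenAI, the **Azure OpenAI API Base** and **Azure OpenAI Deployment Name** parameters together identify the deployment that receives requests. The Python sketch below shows how these two values typically combine into a chat completions URL, following the Azure OpenAI REST URL pattern. The helper name, resource name, deployment name, and `api-version` value are placeholder assumptions; check the Azure reference for the API versions your resource supports.

```python
from urllib.parse import quote, urlencode


def azure_chat_completions_url(
    api_base: str, deployment_name: str, api_version: str
) -> str:
    """Build a chat completions URL from an API base and deployment name."""
    base = api_base.rstrip("/")  # tolerate a trailing slash in the API base
    deployment = quote(deployment_name, safe="")
    query = urlencode({"api-version": api_version})
    return f"{base}/openai/deployments/{deployment}/chat/completions?{query}"


# Placeholder values, for illustration only.
url = azure_chat_completions_url(
    "https://my-resource.openai.azure.com/",
    "my-gpt4o-deployment",
    "2024-02-01",
)
print(url)
```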
