
# Karini AI On Premise Deployment

This guide covers hardware requirements, infrastructure pre-requisites, and step-by-step deployment instructions for the Karini AI platform on Amazon Elastic Kubernetes Service (Amazon EKS).

<figure><img src="/files/jsh39KyuzLITnq9AaPVS" alt=""><figcaption></figcaption></figure>

### Hardware Requirements

Karini AI is optimized for the following node configuration on Amazon EKS.

| Resource           | Minimum          | Recommended |
| ------------------ | ---------------- | ----------- |
| vCPUs per node     | 4                | **8**       |
| Memory per node    | 16 GB            | **32 GB**   |
| Node count         | 2                | 3 – 5       |
| OS Disk            | 50 GB            | 100 GB      |
| Persistent Storage | 100 GB EBS / EFS | 200 GB+     |

**Recommended Instance Type:** `m5.2xlarge` (8 vCPU / 32 GB RAM). Smaller instances may cause scheduling failures for memory-intensive services such as the Observability Datastore and Ray workers.

#### Instance Type Reference

| Instance Type     | vCPU | Memory | Use Case                          |
| ----------------- | ---- | ------ | --------------------------------- |
| `m5.xlarge`       | 4    | 16 GB  | Development / Evaluation          |
| `m5.2xlarge`      | 8    | 32 GB  | **Production (recommended)**      |
| `m5.4xlarge`      | 16   | 64 GB  | High-concurrency / Large datasets |
| `m7i-flex.xlarge` | 4    | 16 GB  | Cost-optimized (latest gen)       |

### Pre-requisites

Ensure the following tooling and infrastructure are in place before deploying.

#### Local Tools

| Tool      | Min. Version | Install Guide                                                                         |
| --------- | ------------ | ------------------------------------------------------------------------------------- |
| Terraform | `>= 1.7.5`   | [hashicorp.com](https://developer.hashicorp.com/terraform/downloads)                  |
| AWS CLI   | `>= 2.x`     | [aws.amazon.com](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) |
| kubectl   | `>= 1.28`    | [kubernetes.io](https://kubernetes.io/docs/tasks/tools/)                              |
| Helm      | `>= 3.12`    | [helm.sh](https://helm.sh/docs/intro/install/)                                        |

Configure the AWS CLI with a profile that has sufficient IAM permissions:

```bash
aws configure --profile karini
export AWS_PROFILE=karini
```
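The minimum versions in the table above can be checked with a small comparison helper. The sketch below is illustrative only: `version_ge` and `check_tool` are hypothetical names, not part of the Karini AI tooling, and the comparison assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```bash
# Hypothetical helpers for comparing installed tool versions with the
# minimums listed in this guide. Assumes `sort -V` is available.

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints a one-line verdict given a tool name, its installed version,
# and the minimum version required.
check_tool() {
  local name="$1" installed="$2" minimum="$3"
  if version_ge "$installed" "$minimum"; then
    echo "OK: $name $installed (minimum $minimum)"
  else
    echo "TOO OLD: $name $installed (minimum $minimum)"
  fi
}
```

For example, `check_tool helm 3.9.0 3.12` reports the installation as too old, because version sort orders `3.9.0` before `3.12`. The installed version string has to be supplied by you, for instance from the output of `terraform version` or `helm version --short`.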

#### AWS Infrastructure

The following AWS resources must exist before running Terraform:

* **VPC:** at least two private subnets and two public subnets, with the two subnets of each type in different Availability Zones
* **Subnet Requirements:** each private subnet must be sized at **/24 or larger** (e.g., `10.x.x.0/24`) so that every Availability Zone has enough usable IPs. This accounts for AWS VPC CNI pod IP pre-allocation per EKS node, EFS mount targets, internal Network Load Balancers, OpenSearch nodes, and Lambda Hyperplane ENIs
* **EKS Cluster:** Kubernetes version `1.31` recommended, with an OIDC provider configured
* **ACM Certificate:** a valid SSL/TLS certificate for your domain (if none is supplied, the module creates one)
* **ECR Registry:** containing the Karini AI container images (CoreML, Sentinel, Tesseract, Ray, etc.) provided by Karini AI
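To make the subnet sizing concrete, the sketch below computes the usable address count for a given prefix length. It relies on the fact that AWS reserves five addresses in every subnet (the first four and the last); `usable_ips` is a hypothetical helper name used only for illustration.

```bash
# Hypothetical helper: usable IPv4 addresses in an AWS subnet.
# AWS reserves five addresses per subnet, so a /24 yields 256 - 5.
usable_ips() {
  local prefix="$1"
  echo $(( (1 << (32 - prefix)) - 5 ))
}
```

Running `usable_ips 24` prints `251`, while `usable_ips 26` prints `59`, which shows how quickly a smaller subnet runs out once pod IPs, mount targets, and ENIs share the same range.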

#### DNS & Domain

* A registered domain name (e.g., `karini.yourdomain.com`)
* DNS managed via Route 53 or an external DNS provider
* Ability to create/update A or CNAME records pointing to the provisioned AWS load balancer

#### Secrets & Credentials

Prepare the following before deployment. These will be stored in AWS Secrets Manager:

| Secret                              | Description                                            |
| ----------------------------------- | ------------------------------------------------------ |
| MongoDB password                    | Required if using self-managed MongoDB                 |
| OpenSearch credentials              | Master username and password                           |
| Observability Datastore credentials | Username and password                                  |
| Email / SMTP credentials            | For platform notifications                             |
| Third-party API keys                | As required by your use case (e.g., LLM provider keys) |

#### Karini AI Container Images

Contact [Karini AI](https://www.karini.ai/) to obtain:

* ECR image URIs for CoreML, Sentinel, Tesseract, Ray, and Semantic Chunking
* Application version tags for each service
* Helm chart repository credentials

### Deployment Steps

#### Step 1 : Unzip the Terraform Module

Contact Karini AI to receive the Terraform module, then extract it:

```bash
unzip eks-terraform-module.zip
cd eks-terraform-module
```

#### Step 2 : Configure Your Variables

Copy the example variables file and populate your environment-specific values:

```bash
cp examples/existing-eks/terraform.tfvars.example terraform.tfvars
```

At minimum, set the following required variables in `terraform.tfvars`:

```hcl
# AWS & Networking
aws_region         = "us-east-1"
vpc_id             = "vpc-xxxxxxxxxx"
private_subnet_ids = ["subnet-aaa", "subnet-bbb"]
public_subnet_ids  = ["subnet-ccc", "subnet-ddd"]

# EKS Cluster
eks_cluster_name                 = "my-karini-cluster"
cluster_endpoint                 = "https://<your-eks-endpoint>"
cluster_identity_oidc_issuer_arn = "arn:aws:iam::<account-id>:oidc-provider/..."

# Domain & TLS Certificate
domain_name         = "karini.yourdomain.com"
acm_certificate_arn = "arn:aws:acm:us-east-1:<account-id>:certificate/..."

# Application images (provided by Karini AI)
image_repository = "<account-id>.dkr.ecr.us-east-1.amazonaws.com"
coreml_version   = "<version>"
sentinel_version = "<version>"

# Lambda image URIs (provided by Karini AI)
semantic_chunking_image_repo_uri = "<account-id>.dkr.ecr.us-east-1.amazonaws.com/semantic-chunking"
semantic_chunking_image_tag      = "<version>"
tesseract_image_repo_uri         = "<account-id>.dkr.ecr.us-east-1.amazonaws.com/tesseract"
tesseract_image_tag              = "<version>"
event_bridge_image_repo_uri      = "<account-id>.dkr.ecr.us-east-1.amazonaws.com/event-bridge"
event_bridge_image_tag           = "<version>"
ray_image_uri                    = "<account-id>.dkr.ecr.us-east-1.amazonaws.com/ray"
ray_image_tag                    = "<version>"
ray_version                      = "<version>"

# Secrets
secrets = {
  mongodb = {
    username = "admin"
    password = "your-secure-password"
  }
  opensearch = {
    username = "admin"
    password = "your-secure-password"
  }
  observability_datastore = {
    username = "default"
    password = "your-secure-password"
  }
}
```
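Before moving on, it can help to confirm that no template placeholders remain in `terraform.tfvars`. The following is a minimal sketch, assuming the placeholders look like the ones in the example above (`<account-id>`, `<version>`, `vpc-xxxxxxxxxx`, `your-secure-password`, `yourdomain`); `find_placeholders` is a hypothetical helper, and it will not catch sample values such as `subnet-aaa`.

```bash
# Hypothetical helper: prints lines (with line numbers) that still contain
# template placeholders. Returns non-zero when any placeholder remains.
find_placeholders() {
  local file="${1:-terraform.tfvars}"
  if grep -nE '<[^>]+>|x{4,}|your-secure-password|yourdomain' "$file"; then
    return 1
  fi
  return 0
}
```

Calling `find_placeholders terraform.tfvars` prints each offending line and returns a failure status, so it can gate a later `terraform plan` in a script.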

#### Step 3 : Initialize Terraform

```bash
terraform init
```

This downloads all required providers and modules. Expected providers:

* `hashicorp/aws` `>= 5.41.0`
* `hashicorp/helm` `>= 2.12.1`
* `gavinbunney/kubectl` `>= 2.0.3`

#### Step 4 : Preview the Deployment Plan

```bash
terraform plan -var-file=terraform.tfvars
```

Review the output carefully to confirm the resources that will be created. A typical deployment creates **80–120 AWS resources**.
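One way to sanity-check the plan against the typical range is to read the resource count from Terraform's `Plan:` summary line. The sketch below uses hypothetical helper names (`plan_add_count`, `plan_size_is_typical`) and expects plan output produced with the `-no-color` flag so that the summary line contains no terminal escape codes.

```bash
# Hypothetical helpers, not part of the eks-terraform-module.

# Reads `terraform plan -no-color` output on stdin and prints the number of
# resources to be created, taken from the "Plan: N to add, ..." summary line.
plan_add_count() {
  grep '^Plan:' | grep -oE '[0-9]+ to add' | cut -d ' ' -f 1
}

# Succeeds when the count falls inside the 80-120 range quoted in this guide.
plan_size_is_typical() {
  local count="$1"
  [ "$count" -ge 80 ] && [ "$count" -le 120 ]
}
```

A possible invocation is `terraform plan -no-color -var-file=terraform.tfvars | plan_add_count`. A count far outside the range is not necessarily an error, but it is a prompt to re-read the plan before applying.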

#### Step 5 : Apply the Deployment

```bash
terraform apply -var-file=terraform.tfvars
```

Type `yes` when prompted. The full deployment typically takes **15–25 minutes**.

#### Step 6 : Configure kubectl Access

```bash
aws eks update-kubeconfig \
  --region us-east-1 \
  --name my-karini-cluster
```

Verify connectivity:

```bash
kubectl cluster-info
kubectl get nodes
```

#### Step 7 : Verify All Services Are Running

```bash
kubectl get pods --all-namespaces
```

All pods should reach `Running` or `Completed` status within 5–10 minutes of `terraform apply` completing.

| Namespace       | Service                      | Expected Status           |
| --------------- | ---------------------------- | ------------------------- |
| `default`       | `sentinel-*`                 | Running                   |
| `default`       | `coreml-*`                   | Running                   |
| `default`       | `redis-*`                    | Running                   |
| `default`       | `observability-datastore-*`  | Running                   |
| `default`       | `mongodb-*`                  | Running (if self-managed) |
| `default`       | `kuberay-operator-*`         | Running                   |
| `default`       | `ray-cluster-head-*`         | Running                   |
| `ingress-nginx` | `ingress-nginx-controller-*` | Running                   |
| `kube-system`   | `cluster-autoscaler-*`       | Running                   |
| `kube-system`   | `efs-csi-*`                  | Running                   |
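Scanning the full pod list by eye is error-prone, so a small filter can surface only the pods that need attention. This is a sketch under the assumption that the input comes from `kubectl get pods --all-namespaces --no-headers`, where the fourth column is the pod status; `unhealthy_pods` is a hypothetical helper name.

```bash
# Hypothetical helper: reads `kubectl get pods --all-namespaces --no-headers`
# output on stdin and prints "namespace/pod status" for every pod whose
# STATUS column (field 4) is neither Running nor Completed.
# Exits non-zero if at least one such pod is found.
unhealthy_pods() {
  awk 'NF >= 4 && $4 != "Running" && $4 != "Completed" {
         print $1 "/" $2, $4
         bad = 1
       }
       END { exit (bad ? 1 : 0) }'
}
```

Usage: `kubectl get pods --all-namespaces --no-headers | unhealthy_pods`. Note that a `Running` status does not guarantee that every container is ready, so the `READY` column is still worth checking for the services listed above.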

#### Step 8 : Run Initial Platform Setup

Once all pods are healthy, run the setup script to create your initial organization and admin user:

```bash
bash setup.sh
```

You will be prompted for:

* **Admin email address**
* **Admin password**

The script connects to the running Sentinel pod via `kubectl exec` and initializes your Karini AI workspace.

#### Step 9 : Access the Platform

Navigate to your configured domain in a browser:

```
https://karini.yourdomain.com
```

**Deployment Complete.** Log in with the admin credentials created in Step 8. You should see the Karini AI dashboard.

***

### Component Overview

| Component                   | Purpose                                                    | Storage    | Default Resources       |
| --------------------------- | ---------------------------------------------------------- | ---------- | ----------------------- |
| **Sentinel**                | Core API, orchestration, and WebSocket service             | —          | 2 vCPU / 4 GB           |
| **CoreML**                  | LLM Gateway and Agent Runtime                              | —          | 2 vCPU / 4 GB           |
| **Observability Datastore** | Analytics and vector store (hot on EFS, cold on Amazon S3) | 30 GB+     | 4 vCPU / 16 GB          |
| **Redis**                   | Caching and session management                             | 10 GB EBS  | 0.5 vCPU / 3 GB         |
| **MongoDB**                 | Document store for platform metadata                       | 20 GB EFS  | 1 vCPU / 2 GB           |
| **OpenSearch**              | Vector search and Document Cache                           | 100 GB EBS | `r8g.medium.search` × 2 |
| **Ray (KubeRay)**           | Distributed knowledge base processing and async workloads  | —          | Configurable            |
| **AWS Batch**               | Serverless async job processing (Fargate)                  | —          | 1 vCPU / 2 GB per job   |
| **AWS Lambda**              | Semantic chunking, OCR (Tesseract), event processing       | —          | Serverless              |
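As a rough capacity check, the default resources of the in-cluster services can be summed and compared with the node sizes from the hardware tables. The sketch below assumes one replica of each of Sentinel, CoreML, Observability Datastore, Redis, and MongoDB, and it leaves out Ray (configurable) and system pods; both function names are hypothetical.

```bash
# Hypothetical helpers for a back-of-the-envelope capacity estimate.

# Sums "vcpu memory_gb" pairs read from stdin, one workload per line.
sum_requests() {
  awk '{ cpu += $1; mem += $2 } END { print cpu, mem }'
}

# Default resources from the Component Overview table, one replica each
# (an assumption): Sentinel, CoreML, Observability Datastore, Redis, MongoDB.
default_footprint() {
  printf '%s\n' "2 4" "2 4" "4 16" "0.5 3" "1 2" | sum_requests
}
```

`default_footprint` prints `9.5 29`, meaning 9.5 vCPU and 29 GB of memory. Two `m5.2xlarge` nodes offer 16 vCPU and 64 GB in total before system overhead, which is consistent with the guidance that smaller instances may cause scheduling failures.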

***

### Network Architecture

#### Traffic Flow

1. End users connect to the platform via HTTPS on your domain
2. DNS resolves to the AWS Network Load Balancer (NLB) in the public subnets
3. NLB forwards traffic to the NGINX Ingress Controller running in the EKS cluster
4. NGINX routes requests to Sentinel (API) or CoreML (inference) based on path rules
5. All inter-service communication stays within the VPC private subnets
6. AWS managed services (S3, Secrets Manager) are accessed via VPC endpoints or NAT Gateway

### IAM Permissions

The AWS credentials used during deployment require the following IAM permissions. It is recommended to create a dedicated deployment IAM role.

#### Required IAM Actions

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EKS",
      "Effect": "Allow",
      "Action": [
        "eks:Describe*",
        "eks:List*",
        "eks:CreateAddon",
        "eks:DeleteAddon",
        "eks:UpdateAddon"
      ],
      "Resource": "*"
    },
    {
      "Sid": "IAM",
      "Effect": "Allow",
      "Action": [
        "iam:CreateRole",
        "iam:DeleteRole",
        "iam:AttachRolePolicy",
        "iam:DetachRolePolicy",
        "iam:CreatePolicy",
        "iam:DeletePolicy",
        "iam:GetRole",
        "iam:GetPolicy",
        "iam:ListRolePolicies",
        "iam:PassRole",
        "iam:CreateOpenIDConnectProvider",
        "iam:DeleteOpenIDConnectProvider"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Storage",
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:*",
        "s3:*",
        "kms:*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Compute",
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "ec2:CreateSecurityGroup",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:CreateTags",
        "batch:*",
        "lambda:*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Observability",
      "Effect": "Allow",
      "Action": [
        "logs:*",
        "cloudwatch:*",
        "es:*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Messaging",
      "Effect": "Allow",
      "Action": [
        "sqs:*",
        "events:*",
        "secretsmanager:*"
      ],
      "Resource": "*"
    }
  ]
}
```

#### IRSA Roles (Created by Terraform)

The module automatically creates the following IAM Roles for Service Accounts (IRSA):

| Role                      | Service Account         | Permissions                           |
| ------------------------- | ----------------------- | ------------------------------------- |
| `coreml-irsa-role`        | `coreml`                | S3, Bedrock, Secrets Manager          |
| `sentinel-irsa-role`      | `sentinel`              | S3, Secrets Manager, Lambda           |
| `efs-csi-role`            | `efs-csi-controller-sa` | EFS full access                       |
| `cluster-autoscaler-role` | `cluster-autoscaler`    | EC2 AutoScaling describe/set capacity |
| `kuberay-role`            | `kuberay-operator`      | Batch, S3                             |

### Scaling Recommendations

#### Horizontal Node Scaling

Cluster Autoscaler is pre-configured and will scale EKS node groups automatically:

```hcl
node_count_min     = 2   # Always-on nodes for high availability
node_count_desired = 3   # Starting node count
node_count_max     = 10  # Maximum nodes under load
```

The autoscaler uses a **30-minute scale-down window** to prevent thrashing during intermittent workloads.

#### Service-Level Tuning

| Service                        | Variable                                | Default   | Recommendation                         |
| ------------------------------ | --------------------------------------- | --------- | -------------------------------------- |
| Observability Datastore memory | `observability_datastore_pod_memory_gb` | `16`      | Increase for datasets > 50 GB          |
| Obs. Datastore CPU             | `observability_datastore_pod_cpu_limit` | `"4"`     | Increase for high query concurrency    |
| Batch job memory               | `batch_job_memory`                      | `2048` MB | Increase for large document processing |
| Log retention                  | `log_retention_days`                    | `30`      | Reduce to lower CloudWatch costs       |
| OpenSearch nodes               | `opensearch_instance_count`             | `2`       | Use `3` for production HA              |
| OpenSearch storage             | `opensearch_ebs_volume_size`            | `100` GB  | Scale with log/index volume            |

#### Workload Profiles

| Profile             | Instance Type | Nodes  | Use Case                      |
| ------------------- | ------------- | ------ | ----------------------------- |
| Development         | `m5.xlarge`   | 1 – 2  | Testing and evaluation        |
| Production (Small)  | `m5.2xlarge`  | 2 – 3  | Up to 50 concurrent users     |
| Production (Medium) | `m5.2xlarge`  | 3 – 5  | Up to 200 concurrent users    |
| Production (Large)  | `m5.4xlarge`  | 5 – 10 | Enterprise / high-concurrency |

### Multi-Region High Availability

This section applies to enterprise deployments that require cross-region redundancy.

#### Active-Passive Setup

1. **Primary region** runs the full Karini AI stack
2. **Secondary region** maintains replicated data stores only
3. Route 53 health checks with DNS failover route traffic to the secondary region on failure

#### Data Replication

Enable EFS cross-region replication in your `terraform.tfvars`:

```hcl
efs_replication_destination_region = "<target-aws-region>"  # e.g., "us-west-2"
```

For OpenSearch, enable multi-AZ with dedicated master nodes:

```hcl
opensearch_instance_count = 3
opensearch_instance_type  = "r8g.large.search"
```

#### Recovery Objectives

| Tier       | RPO        | RTO         | Configuration              |
| ---------- | ---------- | ----------- | -------------------------- |
| Standard   | 1 hour     | 2 hours     | Single region, multi-AZ    |
| Enhanced   | 15 minutes | 30 minutes  | EFS replication enabled    |
| Enterprise | Near-zero  | < 5 minutes | Active-active multi-region |

***

### Troubleshooting

#### Pods Stuck in Pending State

```bash
kubectl describe pod <pod-name> -n default
```

Common causes:

* **Insufficient CPU/Memory** — Scale up the node group or use a larger instance type
* **PVC not bound** — Check EFS/EBS CSI driver pods are running in `kube-system`
* **Image pull error** — Verify ECR image URIs and IAM permissions for the node role

#### Terraform Apply Fails on Helm Release

```bash
helm list --all-namespaces
helm status <release-name> -n <namespace>
```

If a release is stuck in `pending-install`, delete it and re-apply:

```bash
helm delete <release-name> -n <namespace>
terraform apply -var-file=terraform.tfvars
```

#### Cannot Access the Platform URL

1. Confirm the NLB is provisioned: `kubectl get svc -n ingress-nginx`
2. Check the external IP/hostname is assigned to the NLB
3. Verify your DNS record points to the NLB hostname
4. Confirm the ACM certificate status is `ISSUED` in the AWS Console
5. Check ingress rules: `kubectl get ingress --all-namespaces`

#### Observability Datastore Out of Memory

Increase the memory allocation in `terraform.tfvars`:

```hcl
observability_datastore_pod_memory_gb         = 32
observability_datastore_pod_memory_request_gb = 16
```

Then re-apply:

```bash
terraform apply -var-file=terraform.tfvars
```

#### View Service Logs

```bash
# Sentinel logs
kubectl logs -l app=sentinel -n default --tail=100 -f

# CoreML logs
kubectl logs -l app=coreml -n default --tail=100 -f

# Observability Datastore logs
kubectl logs -l app=observability-datastore -n default --tail=100 -f
```

***

### Support

For deployment assistance or to obtain Karini AI container images and Helm chart access, contact:

* **Website:** [https://www.karini.ai](https://www.karini.ai/)
* **Email:** <support@karini.ai>

This guide applies to Karini AI deployments using the `eks-terraform-module`. For SaaS or managed deployment options, contact Karini AI directly.

