Create Recipe
Karini AI's Recipe allows you to create no-code generative AI application pipelines.
Creating a Recipe
To create a new recipe, go to the Recipe Page, click Add New, select Karini as the runtime option, provide a user-friendly name and a detailed description, and choose Knowledgebase as the recipe type.
Configure the following elements by dragging them onto the recipe canvas.
Source
Define your data source by configuring the associated data storage connector. Select the appropriate data connector from the list of available connectors.
Configure the storage paths for the connector and apply necessary filters to restrict the data being included in the source. Enable recursive search if needed to include data from nested directories or structures.
You have the option to test your data connector setup using the "Test" button.
Dataset
The Dataset serves as an internal collection of dataset items, which are pointers to the data source. For a recipe, you can use an existing dataset (which may reference other data sources) or create a new one, depending on your needs.
Karini AI provides various options for data preprocessing.
Dataset types
When you create a dataset in the recipe, you can choose one of two dataset types:
1. Text
Used for text-based workflows, such as documents, extracted text, and OCR/PII processing.
Karini AI provides various options for data preprocessing for text datasets.
OCR:
For source data that contains PDF or image files, you can perform Optical Character Recognition (OCR) by selecting one of the following options:
Unstructured IO with Extract Images: This method is used for extracting images from unstructured data sources. It processes unstructured documents, identifying and extracting images that can be further analyzed or used in different applications.
PyMuPDF with Fallback to Amazon Textract: This approach uses PyMuPDF to extract text and images from PDF documents. If PyMuPDF fails or is insufficient, the process falls back to Amazon Textract, leveraging Amazon's advanced OCR capabilities to ensure a comprehensive extraction (a minimal sketch of this fallback appears after the options below).
Amazon Textract with Extract Table: Amazon Textract is used to extract structured data, such as tables, from documents. This method specifically focuses on identifying and extracting tabular data, making it easier to analyze and use structured information from scanned documents or PDFs.
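For readers who want to see how such a fallback can be wired up outside the platform, here is a minimal sketch (not Karini AI's internal implementation), assuming the PyMuPDF (fitz) and boto3 libraries and valid AWS credentials:

```python
# Minimal sketch of "PyMuPDF with fallback to Amazon Textract".
import fitz  # PyMuPDF
import boto3

def extract_text(pdf_path: str) -> str:
    """Try PyMuPDF first; fall back to Amazon Textract OCR if no text layer is found."""
    doc = fitz.open(pdf_path)
    text = "\n".join(page.get_text() for page in doc)
    if text.strip():
        return text  # PyMuPDF succeeded (the PDF has a native text layer)

    # Fallback: send the document bytes to Amazon Textract.
    # Note: the synchronous API handles single-page documents; multi-page PDFs
    # require Textract's asynchronous API.
    textract = boto3.client("textract")
    with open(pdf_path, "rb") as f:
        response = textract.detect_document_text(Document={"Bytes": f.read()})
    lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
    return "\n".join(lines)
```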
PII:
If you need to mask Personally Identifiable Information (PII) within your dataset, you can enable the PII Masking option. Select from the list of entities that you want masked during data preprocessing. To learn more about the entities, refer to this documentation.
Link your Source element to the Dataset element in the recipe canvas to start building your data ingestion pipeline.
Preprocess Data and Extract Custom Metadata:
Custom preprocessor: This option allows the user to apply custom logic or transformations to the OCR-generated text. OCR text often contains noise, inaccuracies, or irrelevant information, and preprocessing helps to clean and structure the text to better serve specific requirements.
Document Types: The feature supports OCR extraction from various document formats, including PDFs, PNG images, and TXT files. It is versatile in handling different types of documents and outputs generated through OCR.
AWS Lambda Integration: This feature allows users to configure a custom AWS Lambda function for preprocessing tasks. By providing the Lambda function’s ARN (Amazon Resource Name), users can trigger the function, which executes the custom logic for processing the text. Lambda provides highly scalable, serverless execution with flexible integration, which is ideal for complex or computationally intensive tasks (a minimal handler sketch follows this list).
Lambda ARN: Enter the Amazon Resource Name (ARN) for the AWS Lambda function that will process and split the data.
Input Test Payload: Enter test data to validate the Lambda function’s behavior before deployment.
Test Button: Allows you to execute a test run of the configured Lambda function for validation.
Overwrite Credentials (Optional): Allows you to override existing authentication settings with new credentials.
Page Range Specification: Users can specify which pages of the document should be processed. This is useful when only certain sections of a document need to be preprocessed or when processing large documents, allowing for efficient handling of specific pages or page ranges. For example, you can choose to process pages 1–5 or select all pages.
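As a reference for what such a function might look like, here is a hypothetical handler sketch; the payload keys ("text", "metadata") are illustrative assumptions and should match the Input Test Payload you configure on the dataset tile:

```python
# Hypothetical Lambda handler for custom preprocessing of OCR text.
import json
import re

def lambda_handler(event, context):
    text = event.get("text", "")

    # Example cleanup: collapse repeated whitespace and strip stray artifacts.
    cleaned = re.sub(r"\s+", " ", text).strip()

    # Return the transformed text (plus any custom metadata) to the pipeline.
    return {
        "statusCode": 200,
        "body": json.dumps({"text": cleaned, "metadata": {"char_count": len(cleaned)}}),
    }
```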
Metadata extractor: This feature uses a prompt-driven approach to extract entities and map them to appropriate data types, ensuring compliance with search engine requirements. The output is a clean, valid JSON format, optimized for indexing, querying, and downstream analysis.
In order to use this option, you must have the Custom metadata extraction model configured in the Organization setting.
You can define entities along with their corresponding data types to enable targeted extraction. If no entities are specified, the system will automatically identify and extract relevant entities based on the input text.
As with the custom preprocessor, you can specify which pages of the document should be processed, which is useful when only certain sections need preprocessing or when handling large documents. For example, you can choose to process pages 1–5 or select all pages.
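For illustration, entity definitions and the resulting extractor output might look like the sketch below; the entity names, data types, and values are hypothetical examples, not a required schema:

```python
# Hypothetical entity definitions for the metadata extractor.
entities = {
    "invoice_number": "string",
    "invoice_date": "date",
    "total_amount": "number",
    "vendor_name": "string",
}

# The extractor returns clean, valid JSON mapped to those types, for example:
expected_output = {
    "invoice_number": "INV-2024-0042",
    "invoice_date": "2024-03-18",
    "total_amount": 1249.50,
    "vendor_name": "Acme Corp",
}
```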
2. Multimodal
The Multimodal feature enables processing and understanding of multiple data types including images, videos, audio, and text. This feature leverages Vision Language Models (VLM) and various processing techniques to extract meaningful information from diverse media formats.
Supports ingestion of common multimedia and document formats, including:
Images: JPEG (.jpg, .jpeg), PNG (.png), GIF (.gif), WebP (.webp)
Videos: MP4, AVI, MOV, WebM
Audio: MP3, WAV, M4A, FLAC
Documents: PDF, DOCX, TXT
Karini AI provides various options for data preprocessing for multimodal datasets.
A. Image Understanding
Image Understanding uses Vision Language Models (VLMs) to interpret images and generate structured, prompt-driven outputs.
VLM-based analysis: Produces semantic descriptions of image content (objects, scenes, relationships, and context) based on the configured prompt.
The recipe uses the following default prompt template.
Customizable prompts: Supports configurable VLM prompt templates for domain-specific extraction and standardized output formats.
Text extraction: Extracts text from images and can combine it with contextual descriptions to improve search and interpretation.
Layout-aware processing: Preserves document structure via intelligent chunking for multi-column pages, forms, and tables to improve extraction fidelity.
B. Video Understanding
Video Understanding processes video assets by combining time-based segmentation, VLM-driven visual analysis, and optional speech-to-text transcription to generate a comprehensive representation of the content.
Segmentation: Splits videos into configurable time windows (5–30 seconds) to control analysis granularity and throughput.
VLM analysis: Performs segment/frame-level interpretation using Vision Language Models to generate visual descriptions, detected events, and contextual summaries based on the configured prompt.
The recipe uses the following default prompt template.
Transcription: Converts the video’s audio track into text to capture spoken content for indexing and downstream NLP/LLM workflows.
Dual processing: Merges visual outputs with transcripts to improve completeness and retrieval accuracy (a simplified merge sketch follows below).
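The sketch below illustrates the dual-processing idea under simplified assumptions; `describe` and `transcribe` are placeholders for the configured VLM and transcription models, and the segment length corresponds to the time window you configure:

```python
# Simplified sketch of dual processing for a single video: each time window
# gets a VLM description and a transcript, stored side by side.
from dataclasses import dataclass

@dataclass
class SegmentRecord:
    start_s: float
    end_s: float
    visual_description: str
    transcript: str

def merge_segments(duration_s: float, segment_len_s: float,
                   describe, transcribe) -> list[SegmentRecord]:
    records, start = [], 0.0
    while start < duration_s:
        end = min(start + segment_len_s, duration_s)
        records.append(SegmentRecord(
            start_s=start,
            end_s=end,
            visual_description=describe(start, end),  # VLM output for the window
            transcript=transcribe(start, end),        # speech-to-text for the window
        ))
        start = end
    return records
```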
C. Audio Processing
Audio Processing converts speech in audio files into text to support search, indexing, summarization, and downstream pipeline execution.
Speech-to-text: Produces transcripts using supported transcription methods based on deployment and accuracy requirements.
Amazon Transcribe integration: Enables AWS-managed transcription via Amazon Transcribe for scalable, production-grade speech recognition.
Custom model support: Use custom speech-to-text model endpoints.
In order to use this option, you must have the Global VLM model configured in the Organization setting.
Knowledge base
OpenSearch is the VectorDB provider for your knowledge base, responsible for managing and storing your vector data.

The View Custom Metadata button is located in the Knowledgebase section. When accessed before recipe processing, it will not display any metadata. Upon completion of recipe processing, the button will display either the default metadata or the custom metadata extracted through the metadata extractor prompt available on the dataset tile. For additional details, please refer to the provided links.
Vector DB Prompt
These options provide several techniques to improve the relevance and quality of your vector search.
Use embedding chunks:
Choosing this option conducts a semantic search to retrieve the top_k most similar vector-embedded document chunks and uses these chunks to create a contextual prompt for the Large Language Model (LLM). A minimal sketch of this assembly appears after the Top_k option below.
Summarize chunks:
Choosing this option conducts a semantic search to retrieve the top_k most similar vector-embedded document chunks and then summarizes these chunks to create a contextual prompt for the Large Language Model (LLM).
Use the document text for matching embeddings:
Choosing this feature conducts a semantic search to retrieve the top_k most similar vector-embedded document chunks and uses the corresponding text from the original document to create a contextual prompt for the Large Language Model (LLM). You can further restrict the context by selecting one of the following options:
Use entire document: Use the text from the entire document to create context for the prompt.
Use matching page: Use the text from the matching page of the document to create context for the prompt. You can optionally include the previous and next pages to ensure continuity and context preservation.
Top_k:
Maximum number of top matching vectors to retrieve.
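Conceptually, the "Use embedding chunks" option works along the lines of the following sketch; the result key `page_content` and the prompt wording are illustrative assumptions, not the platform's internal implementation:

```python
# Conceptual sketch: assemble the top_k most similar chunks into a contextual prompt.
def build_context_prompt(question: str, search_results: list[dict], top_k: int = 5) -> str:
    chunks = [hit["page_content"] for hit in search_results[:top_k]]
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```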
Enable Reranker:
Re-ranking improves search relevance by reordering the result set based on the relevancy score. To enable the reranker, you must have the reranker model set in the Organization setting. You can configure the following options for the reranker.
Top-N: Maximum number of top-ranking vectors to retrieve. This number must be less than the top_k parameter.
Reranker Threshold: A threshold for the relevancy score. The reranker model will select up to Top-N vectors that score above the set threshold.
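The interaction between Top-N and the threshold can be summarized with the sketch below; the score/chunk pairing is an assumed representation of the reranker output, used only for illustration:

```python
# Keep only the chunks whose relevancy score clears the threshold, then cap at
# top_n (which must be less than the top_k used for retrieval).
def apply_reranker(scored_chunks: list[tuple[float, str]],
                   top_n: int, threshold: float) -> list[str]:
    kept = [(score, chunk) for score, chunk in scored_chunks if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in kept[:top_n]]
```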
Advanced query reconstruction:
Query rewriting can involve modifying or augmenting user queries to enhance retrieval accuracy. In order to use this option, you must have the Natural language assistant model configured in the Organization setting.
Multi query rewrite: Use this option when you want to break down complex or ambiguous queries into multiple distinct, simpler queries. You are provided with a sample prompt for this task; however, you can update the prompt as required.
Query expansion: Use this option to expand user queries by augmenting the query with LLM-generated information that can help answer the question. This technique improves retrieval accuracy when user queries are short, abrupt, or not specific. You are provided with a sample prompt for this task; however, you can update the prompt as required.
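To make the two options concrete, the sketch below shows the general shape such prompts can take; these are illustrative stand-ins, not the sample prompts shipped with the recipe:

```python
# Illustrative prompt shapes for the two rewriting options.
MULTI_QUERY_PROMPT = (
    "Break the user question into up to 3 simpler, self-contained search "
    "queries, one per line.\n\nQuestion: {question}"
)

QUERY_EXPANSION_PROMPT = (
    "Write a short passage of background information that would help answer "
    "the question, then restate the question.\n\nQuestion: {question}"
)
```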
Enable ACL restriction:
Enable ACL restriction refers to filtering the knowledge base based on the user's Access Control List (ACL) permissions. Here's how it works:
ACL-Based Filtering: When this feature is enabled, the content that is retrieved from the knowledge base will first be filtered based on the permissions assigned to the user. This means only content that the user is allowed to access will be considered.
Semantic Retrieval: After the ACL filter, the system performs semantic similarity-based retrieval to ensure that the content retrieved is relevant to the user's query.
Security Enhancement: This feature enhances security by ensuring that users can only access content that is permissible according to their ACL. It prevents unauthorized access to sensitive or restricted information by filtering out content the user shouldn't be able to access.
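A simplified example of how an ACL pre-filter can be combined with k-NN retrieval in OpenSearch is sketched below; the index field names ("allowed_groups", "embedding") and the filter placement are assumptions for illustration, not Karini AI's internal schema:

```python
# Simplified OpenSearch query combining an ACL pre-filter with k-NN retrieval.
def build_acl_query(query_vector: list[float], user_groups: list[str], top_k: int) -> dict:
    return {
        "size": top_k,
        "query": {
            "bool": {
                # Only documents the user is permitted to see are considered.
                "filter": [{"terms": {"allowed_groups": user_groups}}],
                # Semantic similarity search over the permitted subset.
                "must": [{"knn": {"embedding": {"vector": query_vector, "k": top_k}}}],
            }
        },
    }
```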
Enable dynamic metadata filtering:
This feature utilizes a Large Language Model (LLM) to generate custom metadata filters. Here's how it works:
Automatic Metadata Filtering: When enabled, the system analyzes metadata keys along with the user's input query. Based on this analysis, it generates dynamic metadata filters that narrow down the knowledge base.
Context-Aware: The metadata filters are designed to be dynamic and context-aware. This means that the filters adjust based on the query and the context of the user's request, ensuring a more accurate retrieval process.
Semantic Retrieval: Once the metadata filters are applied, the system performs retrieval based on semantic similarity. This allows for more focused and precise results.
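Conceptually, the flow resembles the sketch below; `call_llm` is a placeholder for the configured model, and the filter format is an illustrative assumption:

```python
# Conceptual sketch of dynamic metadata filtering: an LLM inspects the
# available metadata keys and the user query, proposes key/value filters,
# and the filters narrow the search before semantic retrieval.
import json

def dynamic_metadata_filter(query: str, metadata_keys: list[str], call_llm) -> dict:
    prompt = (
        f"Given these metadata keys: {metadata_keys}\n"
        f"and this user query: {query}\n"
        "Return a JSON object of key/value filters that narrows the search, "
        "or an empty object if no filter applies."
    )
    filters = json.loads(call_llm(prompt))  # e.g. {"department": "finance"}
    return {"bool": {"filter": [{"term": {k: v}} for k, v in filters.items()]}}
```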
Enable hybrid search
Hybrid Search enables the combination of vector-based semantic retrieval and keyword-based lexical matching within the Knowledge node. This approach improves retrieval accuracy by leveraging both deep semantic understanding and precise keyword alignment.
Configuration
Hybrid Search is activated by selecting the Enable Hybrid Search option. Once enabled, the system expects a Query Template in JSON format to control how results are retrieved and scored.
Query Template Schema
vector_weight: Specifies the contribution of vector similarity to the final relevance score. Value must be between 0 and 1.
keyword_weight: Specifies the contribution of keyword relevance. Must complement the vector_weight to ensure balanced scoring.
fields: Field to which the defined scoring weight is applied during retrieval.
This configuration allows fine-grained control over hybrid ranking strategies, ensuring optimal relevance in document retrieval across varied datasets.
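An illustrative Query Template is shown below, serialized to JSON; the 0.7/0.3 split and the "text" field are examples only, and the two weights should sum to 1:

```python
# Illustrative hybrid search Query Template.
import json

query_template = {
    "vector_weight": 0.7,   # contribution of vector (semantic) similarity
    "keyword_weight": 0.3,  # contribution of keyword (lexical) relevance
    "fields": ["text"],     # field(s) the scoring weights apply to
}

print(json.dumps(query_template, indent=2))
```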
Search data
Search data executes a live retrieval query against the connected dataset index (OpenSearch) and returns the highest-relevance matched content for verification before it is used downstream.
Run retrieval
Submits the entered query to OpenSearch when the send button is clicked.
Retrieves the top-ranked matches based on the configured retrieval method.
Review returned content
Returns a response with a unique request_id for traceability. Displays matched items in results, including page_content (the exact text selected as context). A sample payload is sketched after the tracing steps below.
Validate and troubleshoot with Tracing Details
Provides step-level visibility into the retrieval pipeline:
Retrieve Context
Generate Embeddings
Search Vector Index
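A hypothetical response payload, based on the fields described above, might look like the following; the exact structure returned by your deployment may differ:

```python
# Hypothetical Search data response for illustration only.
sample_response = {
    "request_id": "a1b2c3d4-5678-90ab-cdef-1234567890ab",
    "results": [
        {
            "page_content": "Exact chunk text selected as context...",
            "score": 0.87,
            "metadata": {"source": "policies/leave-policy.pdf", "page": 3},
        }
    ],
    "tracing": ["Retrieve Context", "Generate Embeddings", "Search Vector Index"],
}
```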

Connect the Dataset element to the Knowledge base element in the recipe canvas to link your data with the vector store.
Data Processing Preferences
When you link the Dataset element to the Knowledge base element, you can set the Data Processing Preferences that would define your vector embedding creation process. These preferences include:
Embedding Model:
You can choose from a selection of available embeddings models in the Karini AI Model hub.
Chunking Type:
You can define a chunking strategy for your unstructured data:
Recursive: This method divides data hierarchically or based on nested structures.
Semantic: This method segments data based on semantic boundaries or logical breaks in the content.
Layout aware chunking: This method divides data based on the layout and visual structure, preserving the spatial relationships within the content.
Layout-aware Chunking is available only when the OCR option "Amazon Textract - Extract with Tables" is selected.
Chunk Size:
Specify the size of each data segment processed by the embedding model.
Chunk Overlap:
Define the overlap between consecutive chunks of data segments. Overlapping helps ensure continuity and context preservation across chunks, especially in sequential or interconnected data.
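The sketch below illustrates how chunk size and overlap relate; real recursive, semantic, or layout-aware chunking is more sophisticated than this character-based example:

```python
# Each chunk shares `chunk_overlap` characters with the previous one so
# context carries across chunk boundaries.
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - chunk_overlap
    return chunks
```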
Saving a Recipe
You can save the recipe at any point during the creation. Saving the recipe preserves all configurations and connections made in the workflow for future reference or deployment.