
# Sitecore

To configure Sitecore integration, follow these steps based on your data source type:

#### 1. **Folder Configuration**:

* **Source Type**: Select **"Folders"**.
* **Folder ID**: Enter the unique **Folder ID** where your data is stored in Sitecore.
* **Recursive Search**: Enable **"Recursive search"** if you want to include subfolders and their contents.

#### 2. **Manifest File Configuration**:

* **Source Type**: Choose **"SiteCore Manifest"**.
* **S3 Bucket Path**: Provide the path to the **S3 bucket** where the Sitecore manifest file is stored.
* **Credentials**: Ensure the correct credentials are set up for both Sitecore and the S3 bucket to enable access.

**Sample Manifest Structure**

A manifest defines how content is extracted, filtered, and processed. It provides reusable, declarative configurations for large-scale ingestion workflows.

```json
{
  "knowledge_articles": {
    "type": "Knowledge Articles",
    "extraction_rules": {
      "content_type": "html",
      "content_fields": ["Body"],
      "metadata_fields": {
        "Date": "date"
      }
    },
    "filters": ["PLACEHOLDER_FILTER"],
    "sources": [
      { "path": "/placeholder/path/one", "recursive": true },
      { "path": "/placeholder/path/two", "recursive": true }
    ]
  },

  "questions_answers": {
    "type": "questions_answers",
    "extraction_rules": {
      "content_type": "html",
      "content_fields": [
        "Question",
        "Answer"
      ]
    },
    "filters": ["Release"],
    "sources": [
      { "path": "/placeholder/path/faq", "recursive": true }
    ]
  },

  "video_image_files": {
    "type": "Video and Image Files",
    "extraction_rules": {
      "content_type": "file",
      "content_fields": [],
      "folder_type": "Media Library",
      "metadata_fields": {
        "ItemID": "keyword",
        "Display Date": "date"
      }
    },
    "filters": [
      "*png",
      "*jpg",
      "*vidyard player",
      "*mp4"
    ],
    "sources": [
      { "path": "/placeholder/media/path/one", "recursive": true },
      { "path": "/placeholder/media/path/two", "recursive": true },
      { "path": "/placeholder/media/path/three", "recursive": true }
    ]
  },

  "pdf_files": {
    "type": "PDF Files",
    "extraction_rules": {
      "content_type": "file",
      "content_fields": [],
      "metadata_fields": {
        "ItemID": "keyword",
        "Display Date": "date",
        "Date": "date"
      }
    },
    "filters": [
      "*pdf",
      "*docx"
    ],
    "sources": [
      { "path": "/placeholder/path/documents", "recursive": true }
    ]
  },

  "archived_files": {
    "type": "Archived Files",
    "extraction_rules": {
      "content_type": "file",
      "content_fields": [],
      "metadata_fields": {
        "ItemID": "keyword",
        "Date": "date",
        "Display Date": "date"
      }
    },
    "filters": [
      "*zip",
      "*rar"
    ],
    "unzipped_files_filter": [
      ["*.pdf","*.ppt","*.html"]
    ],
    "sources": [
      { "path": "/placeholder/path/archive", "recursive": true }
    ]
  }
}
```
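Assuming the manifest is stored as a JSON document, as the sample suggests, it can be useful to confirm that the file parses and to review which paths each section covers before uploading it to S3. The following is a minimal sketch, not part of the product: `MANIFEST_TEXT` is a trimmed-down, hypothetical manifest and `summarize_manifest` is an illustrative helper name.

```python
import json

# Hypothetical, trimmed-down manifest used only for illustration.
MANIFEST_TEXT = """
{
  "pdf_files": {
    "type": "PDF Files",
    "extraction_rules": {
      "content_type": "file",
      "content_fields": [],
      "metadata_fields": {"ItemID": "keyword", "Date": "date"}
    },
    "filters": ["*pdf", "*docx"],
    "sources": [
      {"path": "/placeholder/path/documents", "recursive": true}
    ]
  }
}
"""


def summarize_manifest(text):
    """Parse manifest JSON and return one summary line per source path."""
    manifest = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    lines = []
    for section_name, section in manifest.items():
        content_type = section["extraction_rules"]["content_type"]
        for source in section["sources"]:
            mode = "recursive" if source.get("recursive") else "top-level only"
            lines.append(
                f"{section_name} ({content_type}): {source['path']} [{mode}]"
            )
    return lines


if __name__ == "__main__":
    for line in summarize_manifest(MANIFEST_TEXT):
        print(line)
```

A syntax error such as a metadata field listed without a type causes `json.loads` to raise, so the problem surfaces before the manifest is uploaded.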

* **`knowledge_articles`**: Defines the configuration for extracting and processing knowledge article data.
* **`type`**: Specifies the type of data or document being processed (e.g., `Knowledge Articles` or `PDF Files`).
* **`extraction_rules`**: Defines the rules for extracting content and metadata from the documents.
  * **`content_type`**: Specifies the format of the document content (e.g., `html` or `file`).
  * **`content_fields`**: Lists the fields within the content to extract (e.g., `Body`, `Question`, `Answer`).
  * **`metadata_fields`**: Maps each metadata field to its type (e.g., `"ItemID": "keyword"`, `"Display Date": "date"`).
* **`filters`**: Lists the filters to apply to the content; in the sample these are values such as `Release` or file patterns such as `*pdf`.
* **`sources`**: Specifies the paths to the data sources, with an option to recurse through subdirectories.
  * **`path`**: The location of the data source.
  * **`recursive`**: A boolean indicating whether to process subdirectories.
* **`questions_answers`**: Defines the configuration for extracting question and answer pairs.
* **`video_image_files`**: Configuration for processing video and image files.
  * **`folder_type`**: Specifies the type of folder containing the files.
  * **`filters`**: A list of filters to apply specifically for video and image files.
* **`pdf_files`**: Defines the configuration for processing PDF files and similar documents.
  * **`filters`**: Specifies file-extension patterns to process (in the sample, `*pdf` and `*docx`).
* **`archived_files`**: Defines the configuration for processing archived files (e.g., ZIP or RAR files).
  * **`unzipped_files_filter`**: Filters for processing files after they have been unzipped.
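The field descriptions above can be turned into a quick structural check. The sketch below is illustrative only: the set of required keys is an assumption inferred from the sample manifest, not a documented schema, and `validate_manifest` is a hypothetical helper name.

```python
# Keys that every section in the sample manifest contains (assumed required).
REQUIRED_SECTION_KEYS = ("type", "extraction_rules", "filters", "sources")
REQUIRED_RULE_KEYS = ("content_type", "content_fields")


def validate_manifest(manifest):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name, section in manifest.items():
        for key in REQUIRED_SECTION_KEYS:
            if key not in section:
                problems.append(f"{name}: missing '{key}'")

        rules = section.get("extraction_rules", {})
        for key in REQUIRED_RULE_KEYS:
            if key not in rules:
                problems.append(f"{name}.extraction_rules: missing '{key}'")

        # metadata_fields is optional in the sample, but maps name -> type.
        if not isinstance(rules.get("metadata_fields", {}), dict):
            problems.append(
                f"{name}.extraction_rules.metadata_fields: expected an object"
            )

        for index, source in enumerate(section.get("sources", [])):
            where = f"{name}.sources[{index}]"
            if not isinstance(source, dict) or not isinstance(source.get("path"), str):
                problems.append(f"{where}: 'path' must be a string")
            elif not isinstance(source.get("recursive", False), bool):
                problems.append(f"{where}: 'recursive' must be true or false")
    return problems
```

Passing the parsed manifest (for example, the result of `json.load`) to `validate_manifest` returns an empty list when every section has the expected shape.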

