Pegasystems Inc.
CA
Last activity: 9 Feb 2026 11:04 EST
Knowledge Buddy Collections & Data Sources Setup Guide
This guide explains how data collections and data sources work, which content types you can ingest, and exactly how to set them up in the Knowledge Buddy Portal. It also covers security roles and advanced options like chunking, attribution, and extraction methods.
What is a data collection?
- A data collection is a logical container that groups related data sources based on a criterion (for example, business division, product line, or industry).
- Purpose:
  - Scope queries to a relevant domain.
  - Improve performance with separate index tables.
  - Centralize policies like chunking and access controls.
- Examples:
  - HR_Collection for Human Resources content.
  - Financial_Srvs for banking and investment content.
- Every data source must belong to a collection. If none is specified, the default collection “knowledge” is used.
- Performance note: Each collection stores its data in a distinct index table in the vector store, so keeping collections separate enables faster, more targeted searches and lets you test a collection in isolation.
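To make the scoping idea concrete, here is a minimal Python sketch that models each collection as its own in-memory index. The class and method names are hypothetical and this is not how Knowledge Buddy is implemented; it only illustrates that a query scoped to one collection never scans another collection's entries, and that content added without a collection lands in “knowledge”.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

DEFAULT_COLLECTION = "knowledge"  # default collection named in this guide


class CollectionIndexes:
    """Hypothetical toy model: one separate index per collection."""

    def __init__(self) -> None:
        # collection name -> list of (data source name, text) entries
        self._indexes: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, source: str, text: str, collection: Optional[str] = None) -> str:
        """Store text under its collection, falling back to the default collection."""
        name = collection or DEFAULT_COLLECTION
        self._indexes[name].append((source, text))
        return name

    def search(self, collection: str, term: str) -> List[str]:
        """Scan only the named collection's index (simple substring match)."""
        entries = self._indexes.get(collection, [])
        return [source for source, text in entries if term.lower() in text.lower()]


indexes = CollectionIndexes()
indexes.add("hr_policies", "Parental leave policy", "HR_Collection")
indexes.add("bank_faq", "Early leave from a fixed-term deposit", "Financial_Srvs")
indexes.add("general_faq", "Office leave calendar")  # no collection given -> "knowledge"

print(indexes.search("HR_Collection", "leave"))  # ['hr_policies']
```

A real vector store ranks results by embedding similarity rather than substring matching; the point of the sketch is only the isolation between collections.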
What is a data source?
- A data source connects to where your content lives (for example, files you upload, Pega Knowledge, or the ingestion APIs).
- Each data source defines its connection, access, and optional overrides (such as chunking and extraction), and it must be associated with a collection.
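The relationship between collection-level defaults and data source overrides can be sketched as an “override if set, otherwise inherit” rule. This Python sketch assumes hypothetical field names (method, size, overlap) and illustrative values; the actual settings and their defaults are whatever the portal exposes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkingSettings:
    method: str   # illustrative label for the chunking method, e.g. "fixed"
    size: int     # chunk size
    overlap: int  # chunk overlap


@dataclass
class Collection:
    name: str
    chunking: ChunkingSettings  # default for every data source in the collection


@dataclass
class DataSource:
    name: str
    collection: Collection  # every data source must belong to a collection
    chunking_override: Optional[ChunkingSettings] = None


def effective_chunking(source: DataSource) -> ChunkingSettings:
    """Prefer the data source override; otherwise inherit the collection setting."""
    if source.chunking_override is not None:
        return source.chunking_override
    return source.collection.chunking


# Illustrative values only, not recommended settings.
hr = Collection("HR_Collection", ChunkingSettings("fixed", size=500, overlap=50))
handbook = DataSource("hr_handbook", hr)
contracts = DataSource("hr_contracts", hr, ChunkingSettings("fixed", size=1000, overlap=100))

print(effective_chunking(handbook).size)   # 500, inherited from the collection
print(effective_chunking(contracts).size)  # 1000, overridden at the data source
```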
Supported content sources and formats
- Knowledge Buddy Ingestion API: Text only
- Knowledge Buddy File Ingestion API: .docx, .pptx, .ppt, .pdf, .txt, .html, .md
- Pega Knowledge: .docx, .pptx, .ppt, .pdf, .txt
- Pega Knowledge Loader: .docx, .pptx, .ppt, .pdf, .txt, .page, .aspx
- Notes:
  - Text content should avoid tables and images because they may not display correctly.
  - Files can include images, but Knowledge Buddy does not answer image-specific questions.
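The format list above can be encoded as a lookup table so that files are checked before they are uploaded. This is a minimal Python sketch: the dictionary keys are descriptive labels invented here, not identifiers used by Knowledge Buddy, while the extension sets are copied from the list above.

```python
from pathlib import Path
from typing import Dict, FrozenSet

# Extensions per ingestion path, as listed in this guide.
# The Knowledge Buddy Ingestion API accepts text only, so it has no entry here.
SUPPORTED_EXTENSIONS: Dict[str, FrozenSet[str]] = {
    "file_ingestion_api": frozenset({".docx", ".pptx", ".ppt", ".pdf", ".txt", ".html", ".md"}),
    "pega_knowledge": frozenset({".docx", ".pptx", ".ppt", ".pdf", ".txt"}),
    "pega_knowledge_loader": frozenset({".docx", ".pptx", ".ppt", ".pdf", ".txt", ".page", ".aspx"}),
}


def is_supported(path: str, source_type: str) -> bool:
    """Return True if the file's extension is supported by the given ingestion path."""
    allowed = SUPPORTED_EXTENSIONS.get(source_type, frozenset())
    return Path(path).suffix.lower() in allowed


print(is_supported("policy.md", "file_ingestion_api"))  # True
print(is_supported("policy.md", "pega_knowledge"))      # False
```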
Security and access
- Collections:
  - Configure who can view and manage each collection (including its chunking strategy) in Access configurations.
  - Default roles: Data Source Manager and Knowledge Buddy Manager (you can change these).
- Data sources and content:
  - Define who can manage and view items in Access configurations.
  - Assign roles to users to grant the correct permissions.
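Access configurations boil down to a role check: a user can act on a collection or data source if they hold at least one of the roles configured for that action. The Python sketch below models this rule with hypothetical structures. The default role names come from this guide (they are the collection defaults), but how Knowledge Buddy evaluates access internally is not described here, and “HR Analyst” is an invented custom role.

```python
from dataclasses import dataclass, field
from typing import Set

DEFAULT_ROLES = {"Data Source Manager", "Knowledge Buddy Manager"}  # collection defaults


@dataclass
class AccessConfiguration:
    manage_roles: Set[str] = field(default_factory=lambda: set(DEFAULT_ROLES))
    view_roles: Set[str] = field(default_factory=lambda: set(DEFAULT_ROLES))


def can(user_roles: Set[str], action: str, access: AccessConfiguration) -> bool:
    """True if at least one of the user's roles is configured for the action."""
    if action == "manage":
        allowed = access.manage_roles
    elif action == "view":
        allowed = access.view_roles
    else:
        raise ValueError(f"unknown action: {action}")
    return bool(user_roles & allowed)


# Hypothetical custom role that may view, but not manage, the collection.
hr_access = AccessConfiguration(view_roles=DEFAULT_ROLES | {"HR Analyst"})
print(can({"HR Analyst"}, "view", hr_access))    # True
print(can({"HR Analyst"}, "manage", hr_access))  # False
```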
Step-by-step: Create a data collection
Prerequisite: You must have the Data Source Manager role.
- Open the Knowledge Buddy Portal.
- In the left navigation, click Create > Data Collection.
- Enter a Name for the collection.
  - Important: Collection names cannot be changed later. To “rename,” create a new collection and archive the original.
- Enter a Description that explains the collection’s scope and purpose.
- In Access configurations, select the roles that can manage and view the collection.
  - Defaults: Data Source Manager and Knowledge Buddy Manager (adjust as needed).
- Optional: Click Advanced settings and configure:
  - Chunking strategy: Choose the chunking method, chunk size, and chunk overlap. This applies to all data sources in the collection unless a data source overrides it.
  - Attribution level: Choose content-level or chunk-level attribution.
  - Auto filtering: Select a text analyzer that automatically applies attribute filters during semantic search.
  - Embedding attributes: Choose the attributes to include in embeddings to improve relevance.
- Click Save.
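Chunk size and chunk overlap interact as a sliding window: each new chunk starts `size - overlap` units after the previous one, so consecutive chunks share `overlap` units. The Python sketch below uses a plain fixed-size character window purely to illustrate the two parameters. It is a generic technique chosen for illustration, not the chunking method Knowledge Buddy applies; real strategies may count tokens or respect sentence boundaries.

```python
from typing import List


def chunk_text(text: str, size: int, overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Larger overlap preserves more context across chunk boundaries at the cost of more chunks (and more stored embeddings) for the same content.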
Step-by-step: Create a data source
Prerequisite: You must have the KnowledgeBuddy:Admin or KnowledgeBuddy:DataSourceManager role.
- In the Knowledge Buddy Portal, click Data Source in the navigation.
- In the left navigation, click Create > Data Source.
- In the Create data source dialog:
  - Collection: Select the collection this data source belongs to. If none is selected, the default “knowledge” collection is used.
  - Name: Enter a name without spaces or special characters.
  - Description: Describe the data source’s purpose and scope.
  - Access configurations:
    - By default, users with the Data Source Manager role can manage the data source.
    - Change the default roles or assign custom roles as needed.
- Optional: Click Advanced settings and configure:
  - Preferred extraction method:
    - Standard: The default for plain text or unsupported formats. Extracts basic text (including text in tables), but not images or table structures.
    - GenAI: For supported file types, retains tables and tabular formats during ingestion. If GenAI extraction fails, the system falls back to Standard. Applies only to file content.
  - Chunking settings: Override the collection’s chunking method, size, and overlap if needed.
- Click Save.
- Note: You can also create data sources via the Data Source API if preferred.
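Two rules in this dialog lend themselves to a quick sketch: data source names may not contain spaces or special characters, and GenAI extraction falls back to Standard when it fails. The Python below assumes that “no special characters” means ASCII letters, digits, and underscores only; the guide does not list the allowed characters, so adjust the pattern to what the portal accepts. The extractor functions are hypothetical stand-ins, not Knowledge Buddy APIs.

```python
import re
from typing import Callable

# Assumption: allowed characters are ASCII letters, digits, and underscores.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_source_name(name: str) -> bool:
    """Return True if the whole name matches the assumed character set."""
    return NAME_PATTERN.fullmatch(name) is not None


def extract_text(content: bytes, is_file: bool, preferred: str,
                 genai: Callable[[bytes], str],
                 standard: Callable[[bytes], str]) -> str:
    """Use GenAI extraction for file content when preferred, else Standard.

    Any exception raised by the GenAI extractor triggers the fallback to Standard.
    """
    if preferred == "GenAI" and is_file:
        try:
            return genai(content)
        except Exception:
            pass  # GenAI failed: fall through to Standard extraction
    return standard(content)


# Hypothetical stand-ins for the real extractors.
def unavailable_genai(content: bytes) -> str:
    raise RuntimeError("GenAI extraction failed")


def basic_standard(content: bytes) -> str:
    return content.decode("utf-8")


print(is_valid_source_name("hr_policies"))  # True
print(is_valid_source_name("hr policies"))  # False
print(extract_text(b"Leave policy", True, "GenAI", unavailable_genai, basic_standard))  # Leave policy
```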
Step-by-step: Add content
Before you begin: Ensure you have permission to manage the target data source and confirm the correct collection and advanced settings (chunking, attribution, extraction).
- Open the Knowledge Buddy Portal and go to Data Sources.
- Select the data source where you want to add content.
- Click Add content (or the equivalent action for your source type).
- Choose how you will add content:
  - Upload files using the File Ingestion API or the portal upload (supported formats are listed above).
  - Send text via the Knowledge Buddy Ingestion API.
  - Connect to Pega Knowledge or use the Pega Knowledge Loader for supported formats.
- If applicable, select the extraction method:
  - Use GenAI for supported file types to preserve tables and tabular formatting.
  - Otherwise, use Standard extraction.
- Review or set metadata and access (if your process requires it).
- Start ingestion and monitor status in the data source’s Content or Activity view.
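The “Before you begin” checks can be automated as a pre-flight step that runs before ingestion starts. This Python sketch is hypothetical throughout: the function, its parameters, and its messages are invented for illustration, and it only reuses the role name and File Ingestion API formats from this guide. Submitting the content is deliberately left out, because the ingestion API endpoints are not covered here.

```python
from pathlib import Path
from typing import Iterable, List, Set

# Formats accepted by the File Ingestion API, per the list in this guide.
FILE_API_EXTENSIONS = {".docx", ".pptx", ".ppt", ".pdf", ".txt", ".html", ".md"}


def preflight(user_roles: Set[str], manage_roles: Set[str],
              collection: str, expected_collection: str,
              files: Iterable[str]) -> List[str]:
    """Return a list of problems to fix before starting ingestion (empty list = ready)."""
    problems = []
    if not user_roles & manage_roles:
        problems.append("user cannot manage the target data source")
    if collection != expected_collection:
        problems.append(f"data source is in '{collection}', expected '{expected_collection}'")
    for name in files:
        if Path(name).suffix.lower() not in FILE_API_EXTENSIONS:
            problems.append(f"unsupported format: {name}")
    return problems


issues = preflight({"Data Source Manager"}, {"Data Source Manager"},
                   "HR_Collection", "HR_Collection", ["handbook.pdf", "org_chart.png"])
print(issues)  # ['unsupported format: org_chart.png']
```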
Best practices
- Create separate collections per domain or audience to improve performance and result relevance.
- Plan chunking at the collection level; only override at the data source when necessary (for example, for specialized documents).
- Use clear names and descriptions; remember collection names cannot be changed.
- Keep access roles up to date to maintain proper security.
- For testing, create a dedicated non-production collection to trial new chunking strategies without impacting users.
Pega GenAI Cookbook - Recipes series
Enjoyed this article? See more similar articles in Pega Cookbook - Recipes 🔥🔥🔥 series.