Knowledge Setup

Upload documents, organize collections, and configure your knowledge base for AI agent use.

The knowledge base is where your agents find answers. Uploaded documents are split into searchable chunks, converted to vector embeddings, and made available through hybrid search (semantic plus keyword).

Uploading Documents

Navigate to Knowledge in the sidebar. You can:

Drag and drop files directly onto the page
Click Upload and select files from your computer
Sync from Google Drive or Notion (requires integration setup)

Supported Formats

Format	Extensions	Notes
PDF	`.pdf`	Text extracted automatically; scanned PDFs use OCR
Word	`.docx`	Full text and formatting preserved
Plain text	`.txt`, `.md`	Direct ingestion
Spreadsheets	`.csv`, `.xlsx`	Row-by-row or table-aware chunking
Images	`.png`, `.jpg`	OCR text extraction

Size Limits

Plan	Max File Size	Max Documents
Trial	50 MB	100
Starter	100 MB	500
Professional	250 MB	5,000
Enterprise	500 MB	Unlimited
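
The format and size tables above can be turned into a check that runs before you upload. This is a minimal sketch, not part of Sempleo: the names `SUPPORTED_EXTENSIONS`, `PLAN_MAX_MB`, and `can_upload` are hypothetical, and it assumes 1 MB means 1,000,000 bytes, which the tables do not specify.

```python
from pathlib import Path

# Extensions listed in the Supported Formats table.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv", ".xlsx", ".png", ".jpg"}

# Maximum file size per plan in MB, from the Size Limits table.
PLAN_MAX_MB = {"Trial": 50, "Starter": 100, "Professional": 250, "Enterprise": 500}

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def can_upload(filename: str, size_bytes: int, plan: str) -> tuple[bool, str]:
    """Return (ok, reason) for a file checked against the documented limits."""
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return False, f"unsupported extension: {ext or '(none)'}"
    limit_mb = PLAN_MAX_MB[plan]
    if size_bytes > limit_mb * BYTES_PER_MB:
        return False, f"file exceeds the {limit_mb} MB limit for the {plan} plan"
    return True, "ok"
```

For example, `can_upload("handbook.pdf", 10_000_000, "Trial")` passes, while a 60 MB PDF is rejected on Trial and accepted on Starter. The check covers file size only; the per-plan document count would need to be tracked separately.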

Collections

Collections are folders for organizing your documents. They support nesting and can have different authority levels.

Authority Levels

Each collection has an authority level that affects how agents treat its content:

Authoritative — Agent treats this as ground truth. Used for policies, contracts, and official documentation.
Reference — Standard reference material. Used for general knowledge.
Supplementary — Lower-weight context. Used for drafts, notes, and informal content.

Creating Collections

Go to Knowledge
Click New Collection
Name it and set the authority level
Upload documents or move existing ones into it

Document Processing Pipeline

When you upload a document, Sempleo:

Extracts text — Parses the file format and extracts raw text
Chunks — Splits text into overlapping segments (typically 500–1,500 tokens) using structure-aware splitting
Embeds — Generates vector embeddings for each chunk using the configured embedding model
Indexes — Stores chunks in PostgreSQL with pgvector HNSW indexes for fast retrieval

Processing typically takes 10–30 seconds per document, depending on size.
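
To illustrate what "overlapping segments" means in the chunking step, here is a simplified sketch. It uses a plain sliding window over a token list, not the structure-aware splitting Sempleo performs, and the function name `chunk_tokens` and the default overlap of 100 tokens are assumptions made for the example. The default chunk size of 1,000 sits inside the documented range of 500 to 1,500 tokens.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 1000, overlap: int = 100) -> list[list[str]]:
    """Split tokens into fixed-size windows where neighbours share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


# Whitespace-separated words stand in for model tokens here (an approximation).
words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk repeats the last token of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.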

Syncing External Sources

Google Drive

After connecting Google Drive in Settings → Integrations, configure folder sync:

Open the Google Drive integration detail
Enter the folder ID you want to sync
Documents are imported and kept in sync automatically
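
If you only have the folder's browser URL, the folder ID is the path segment that follows `folders`. The helper below is a small sketch for pulling it out. The function name `drive_folder_id` is hypothetical, the ID in the example is a placeholder, and it assumes the usual `https://drive.google.com/drive/folders/<ID>` URL shape.

```python
from typing import Optional
from urllib.parse import urlparse


def drive_folder_id(url: str) -> Optional[str]:
    """Return the path segment after 'folders' in a Drive folder URL, or None."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "folders" in parts:
        idx = parts.index("folders")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return None


# Placeholder ID, not a real folder.
print(drive_folder_id("https://drive.google.com/drive/folders/1AbCdEfGhIjK?usp=sharing"))
```

Query parameters such as `?usp=sharing` are ignored because only the URL path is inspected.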

Notion

After connecting Notion:

Open the Notion integration detail
Select pages to sync
Page content is converted to markdown and ingested

Search & Retrieval

Agents use hybrid search, which runs two retrieval methods and merges their results:

Semantic search — Vector similarity using embeddings
Keyword search — Full-text search for exact matches
Reciprocal Rank Fusion — Merges results from both methods

Results are scored and filtered by authority level, recency, and relevance.
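
Reciprocal Rank Fusion can be sketched in a few lines. Each document scores 1 / (k + rank) in every ranked list where it appears, and the scores are summed across lists. The code below is a generic illustration, not Sempleo's implementation: the function name and document IDs are made up, and k = 60 is a commonly used default rather than a documented Sempleo setting. It also leaves out the authority and recency filtering mentioned above.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked lists of document IDs into one list sorted by fused score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic_hits = ["a", "b", "c"]  # hypothetical vector-search ranking
keyword_hits = ["b", "d", "a"]   # hypothetical full-text ranking
for doc_id, score in reciprocal_rank_fusion([semantic_hits, keyword_hits]):
    print(doc_id, round(score, 5))
```

Documents that rank well in both lists (`b` and `a` here) rise above documents found by only one method, without the two search methods needing comparable raw scores.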
