Knowledge Setup
Upload documents, organize collections, and configure your knowledge base for AI agent use.
The knowledge base is where your agents find answers. Uploaded documents are split into searchable chunks, embedded as vectors, and made available through hybrid search (semantic + keyword).
Uploading Documents
Navigate to Knowledge in the sidebar. You can:
- Drag and drop files directly onto the page
- Click Upload and select files from your computer
- Sync from Google Drive or Notion (requires integration setup)
Supported Formats
| Format | Extensions | Notes |
|---|---|---|
| PDF | .pdf | Text extracted automatically; scanned PDFs use OCR |
| Word | .docx | Full text and formatting preserved |
| Plain text | .txt, .md | Direct ingestion |
| Spreadsheets | .csv, .xlsx | Row-by-row or table-aware chunking |
| Images | .png, .jpg | OCR text extraction |
Size Limits
| Plan | Max File Size | Max Documents |
|---|---|---|
| Trial | 50 MB | 100 |
| Starter | 100 MB | 500 |
| Professional | 250 MB | 5,000 |
| Enterprise | 500 MB | Unlimited |
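A client-side pre-upload check can catch rejected files early. The sketch below encodes the formats and limits from the tables above. The function name `check_upload`, the plan keys, and the assumption that 1 MB means 1,048,576 bytes are illustrative and not part of the product's API.

```python
from pathlib import Path

# Limits copied from the Size Limits table: (max file size in MB, max documents).
# None means unlimited.
PLAN_LIMITS = {
    "trial": (50, 100),
    "starter": (100, 500),
    "professional": (250, 5000),
    "enterprise": (500, None),
}

# Extensions copied from the Supported Formats table.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv", ".xlsx", ".png", ".jpg"}

MB = 1024 * 1024  # assumption: the limits are expressed in binary megabytes


def check_upload(filename: str, size_bytes: int, plan: str, current_doc_count: int) -> list:
    """Return a list of problems; an empty list means the upload looks acceptable."""
    max_mb, max_docs = PLAN_LIMITS[plan.lower()]
    problems = []

    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")

    if size_bytes > max_mb * MB:
        problems.append(f"file exceeds {max_mb} MB limit")

    if max_docs is not None and current_doc_count >= max_docs:
        problems.append(f"document limit of {max_docs} reached")

    return problems
```

For example, `check_upload("handbook.pdf", 60 * MB, "trial", 12)` reports that the file exceeds the 50 MB limit, while the same file passes on the Starter plan.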
Collections
Collections are folders for organizing your documents. They support nesting and can have different authority levels.
Authority Levels
Each collection has an authority level that affects how agents treat its content:
- Authoritative — Agent treats this as ground truth. Used for policies, contracts, and official documentation.
- Reference — Standard reference material. Used for general knowledge.
- Supplementary — Lower-weight context. Used for drafts, notes, and informal content.
Creating Collections
- Go to Knowledge
- Click New Collection
- Name it and set the authority level
- Upload documents or move existing ones into it
Document Processing Pipeline
When you upload a document, Sempleo:
- Extracts text — Parses the file format and extracts raw text
- Chunks — Splits text into overlapping segments (typically 500–1,500 tokens) using structure-aware splitting
- Embeds — Generates vector embeddings for each chunk using the configured embedding model
- Indexes — Stores chunks in PostgreSQL with pgvector HNSW indexes for fast retrieval
Processing typically takes 10–30 seconds per document, depending on size.
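To illustrate what "overlapping segments" means, here is a minimal sketch of a fixed-size sliding window over a token list. It is a simplification: it is not the structure-aware splitter described above, and the function name, the default `chunk_size` and `overlap` values, and the use of whitespace splitting as a stand-in for real tokenization are all assumptions.

```python
def chunk_tokens(tokens: list, chunk_size: int = 1000, overlap: int = 100) -> list:
    """Split tokens into windows of chunk_size that share `overlap` tokens with the previous window."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


# Whitespace splitting is only a rough stand-in for a model tokenizer.
words = "the quick brown fox jumps over the lazy dog again".split()
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(" ".join(chunk))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some tokens twice.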
Syncing External Sources
Google Drive
After connecting Google Drive in Settings → Integrations, configure folder sync:
- Open the Google Drive integration detail
- Enter the folder ID you want to sync
- Documents are imported and kept in sync automatically
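The folder ID is the path segment that follows `folders/` in a Google Drive folder URL. If you only have the URL at hand, a small helper like the one below can pull the ID out. The function name is hypothetical and the ID in the usage note is a placeholder.

```python
from typing import Optional
from urllib.parse import urlparse


def folder_id_from_url(url: str) -> Optional[str]:
    """Return the segment after 'folders' in a Drive folder URL, or None if absent."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "folders" in parts:
        idx = parts.index("folders")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return None
```

For instance, `folder_id_from_url("https://drive.google.com/drive/folders/EXAMPLE_ID?usp=sharing")` returns `"EXAMPLE_ID"`; query parameters such as `usp=sharing` are ignored.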
Notion
After connecting Notion:
- Open the Notion integration detail
- Select pages to sync
- Page content is converted to markdown and ingested
Search & Retrieval
Agents use hybrid search combining:
- Semantic search — Vector similarity using embeddings
- Keyword search — Full-text search for exact matches
- Reciprocal Rank Fusion — Merges results from both methods
Results are scored and filtered by authority level, recency, and relevance.
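Reciprocal Rank Fusion gives each document a score of 1 / (k + rank) in every result list it appears in and sums those scores across lists. The sketch below shows the general technique only. The constant `k = 60` is a commonly used default rather than a documented Sempleo setting, the document IDs are made up, and the authority and recency adjustments mentioned above are not modeled.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of document IDs into one list of (doc_id, score) pairs."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit in each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic_hits = ["a", "b", "c"]  # hypothetical vector-search results, best first
keyword_hits = ["b", "d", "a"]   # hypothetical full-text results, best first
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print([doc_id for doc_id, _ in fused])
```

Because only ranks are used, the two search methods do not need comparable raw scores; a document that ranks well in both lists rises above one that ranks first in only one.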