Knowledge Setup

Upload documents, organize collections, and configure your knowledge base for AI agent use.

The knowledge base is where your agents find answers. Documents are uploaded, chunked into searchable segments, embedded as vectors, and made available through hybrid search (semantic + keyword).

Uploading Documents

Navigate to Knowledge in the sidebar. You can:

  1. Drag and drop files directly onto the page
  2. Click Upload and select files from your computer
  3. Sync from Google Drive or Notion (requires integration setup)

Supported Formats

Format        Extensions    Notes
PDF           .pdf          Text extracted automatically; scanned PDFs use OCR
Word          .docx         Full text and formatting preserved
Plain text    .txt, .md     Direct ingestion
Spreadsheets  .csv, .xlsx   Row-by-row or table-aware chunking
Images        .png, .jpg    OCR text extraction

Size Limits

Plan          Max File Size  Max Documents
Trial         50 MB          100
Starter       100 MB         500
Professional  250 MB         5,000
Enterprise    500 MB         Unlimited
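
To catch rejected uploads early, the format and size limits above can be checked locally before a file is sent. The following is a minimal sketch, not part of Sempleo itself: the `check_upload` function, the lowercase plan keys, and the assumption that 1 MB means 1024 × 1024 bytes are all illustrative.

```python
from pathlib import Path

# Extensions taken from the Supported Formats table.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv", ".xlsx", ".png", ".jpg"}

# Maximum file size per plan in MB, taken from the Size Limits table.
MAX_FILE_SIZE_MB = {"trial": 50, "starter": 100, "professional": 250, "enterprise": 500}


def check_upload(filename: str, size_bytes: int, plan: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable.

    Hypothetical helper: the real service may apply additional checks.
    """
    problems = []

    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")

    limit_mb = MAX_FILE_SIZE_MB[plan.lower()]
    # Assumption: 1 MB = 1024 * 1024 bytes. The docs do not say which unit is used.
    if size_bytes > limit_mb * 1024 * 1024:
        problems.append(f"file exceeds the {limit_mb} MB limit for the {plan} plan")

    return problems
```

For example, `check_upload("handbook.pdf", 10 * 1024 * 1024, "Trial")` returns an empty list, while a 60 MB `.mp4` on the same plan is reported for both its extension and its size.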

Collections

Collections are folders for organizing your documents. They support nesting and can have different authority levels.

Authority Levels

Each collection has an authority level that affects how agents treat its content:

  • Authoritative: Agents treat this content as ground truth. Use it for policies, contracts, and official documentation.
  • Reference: Standard reference material. Use it for general knowledge.
  • Supplementary: Lower-weight context. Use it for drafts, notes, and informal content.

Creating Collections

  1. Go to Knowledge
  2. Click New Collection
  3. Name it and set the authority level
  4. Upload documents or move existing ones into it

Document Processing Pipeline

When you upload a document, Sempleo:

  1. Extracts text: parses the file format and pulls out the raw text, applying OCR to images and scanned PDFs
  2. Chunks: splits the text into overlapping segments (typically 500–1,500 tokens) using structure-aware splitting
  3. Embeds: generates a vector embedding for each chunk using the configured embedding model
  4. Indexes: stores the chunks in PostgreSQL with pgvector HNSW indexes for fast retrieval

Processing typically takes 10–30 seconds per document, depending on size.
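
The chunking step in the pipeline above can be illustrated with a simplified sliding window. This sketch is an assumption-laden stand-in, not Sempleo's splitter: it uses a fixed window over a pre-tokenized list instead of structure-aware splitting, and the `chunk_tokens` name and the default `chunk_size` and `overlap` values are hypothetical.

```python
def chunk_tokens(tokens: list, chunk_size: int = 1000, overlap: int = 100) -> list[list]:
    """Split a token list into fixed-size chunks that share `overlap` tokens.

    Simplified illustration only: a real structure-aware splitter would also
    respect document boundaries, which this sketch ignores.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the window has reached the end, to avoid a trailing
        # chunk that only repeats the overlap.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With ten tokens, a `chunk_size` of 4, and an `overlap` of 1, each chunk starts with the last token of the previous one. The overlap keeps a sentence that straddles a boundary retrievable from either side.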

Syncing External Sources

Google Drive

After connecting Google Drive in Settings → Integrations, configure folder sync:

  1. Open the Google Drive integration detail
  2. Enter the ID of the folder you want to sync
  3. Documents are imported and kept in sync automatically

Notion

After connecting Notion:

  1. Open the Notion integration detail
  2. Select pages to sync
  3. Page content is converted to markdown and ingested

Search & Retrieval

Agents use hybrid search, which runs two retrieval methods and merges their results:

  • Semantic search: vector similarity over chunk embeddings
  • Keyword search: full-text search for exact matches
  • Reciprocal Rank Fusion: merges the ranked results from both methods into a single list

Results are scored and filtered by authority level, recency, and relevance.
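
Reciprocal Rank Fusion, mentioned above, scores each result by summing 1 / (k + rank) over every ranked list it appears in. The sketch below shows the general technique under stated assumptions: the `reciprocal_rank_fusion` name and the chunk IDs are illustrative, and `k = 60` is the constant commonly used in the literature, not a documented Sempleo setting. Authority-level and recency filtering are not modeled here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of IDs into one list, best first.

    Each ID receives 1 / (k + rank) from every list that contains it,
    where rank starts at 1 for the top result.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


# Hypothetical result lists from the two search methods.
semantic_hits = ["c1", "c2", "c3"]
keyword_hits = ["c3", "c1", "c4"]
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Here `fused` is `["c1", "c3", "c2", "c4"]`: chunks found by both methods rise above chunks found by only one. Because the method uses ranks only, it needs no normalization between vector-similarity scores and full-text scores.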
