KMS Compilation Pipeline — Design Doc
Overview
One-click Compile processes raw .txt uploads into structured wiki pages using markitdown for conversion and LiteLLM for content analysis.
Data Flow
Upload (.txt)
│
▼
raw/notes/{file}.txt ◄── inbox (user uploads here)
│
▼ (Compile button clicked)
│
┌── FileScanner ──────────────────────────────┐
│ Scans raw/notes/*.txt (not .md, not in │
│ subdirs — just the direct inbox) │
└──────────────┬──────────────────────────────┘
│
▼
┌── TextConverter (markitdown) ───────────────┐
│ .txt → .md (strip excessive whitespace, │
│ normalise line endings) │
└──────────────┬──────────────────────────────┘
│
▼
┌── LLMProcessor (LiteLLM) ───────────────────┐
│ Batch: send all files in ONE call with │
│ numbered sections. Returns JSON array. │
│ Fallback: individual calls if content │
│ exceeds context window. │
│ │
│ Per-file extraction: │
│ - title (from content, not filename) │
│ - tags (auto-generated list) │
│ - confidence (high/medium/low) │
│ - summary (concise but nuance-preserving) │
│ - source (URL, book ref, article, etc.) │
│ - content (cleaned .md, no extra fluff) │
└──────────────┬──────────────────────────────┘
│
▼
┌── WikiWriter ───────────────────────────────┐
│ Write to wiki/topics/{slug}.md with YAML │
│ frontmatter from extracted data + │
│ source_path: notes/{filename}.txt │
└──────────────┬──────────────────────────────┘
│
▼
┌── FileMover ────────────────────────────────┐
│ Move processed .txt to raw/processed/ │
│ (keeps source artifact, clean inbox) │
└──────────────┬──────────────────────────────┘
│
▼
┌── RebuildIndex ─────────────────────────────┐
│ Rebuild FTS5 index via existing rebuild │
└──────────────┬──────────────────────────────┘
│
▼
Results shown in compile.html
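The first and second-to-last stages of the flow above (FileScanner and FileMover) are plain filesystem operations and can be sketched with the standard library. This is a minimal sketch, not the actual implementation: the function names `scan_inbox` and `move_to_processed` are hypothetical, and `RawFile` repeats the shape defined under Data Shapes so that the block runs on its own.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RawFile:
    path: Path
    name: str        # filename with extension
    stem: str        # filename without extension
    size_bytes: int


def scan_inbox(inbox: Path) -> list[RawFile]:
    """Return the .txt files directly inside the inbox, sorted by name."""
    # Path.glob("*.txt") is non-recursive, so subdirectories are ignored,
    # and .md files never match the pattern.
    return [
        RawFile(path=p, name=p.name, stem=p.stem, size_bytes=p.stat().st_size)
        for p in sorted(inbox.glob("*.txt"))
        if p.is_file()
    ]


def move_to_processed(raw: RawFile, processed_dir: Path) -> Path:
    """Move a compiled source file out of the inbox and return its new path."""
    processed_dir.mkdir(parents=True, exist_ok=True)
    target = processed_dir / raw.name
    # Name collisions inside processed_dir are not handled in this sketch.
    shutil.move(str(raw.path), str(target))
    return target
```

Because the scanner only looks at the direct inbox, moving a file to raw/processed/ is enough to keep it from being picked up by the next Compile run.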
Data Shapes
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RawFile:
    path: Path
    name: str        # filename with extension
    stem: str        # filename without extension
    size_bytes: int


@dataclass
class ProcessedFile:
    raw_file: RawFile
    success: bool
    slug: str        # wiki page slug
    error: str | None = None


@dataclass
class CompiledNote:
    """Output from LLM per file."""
    title: str
    tags: list[str]
    confidence: str  # "high" | "medium" | "low"
    summary: str
    source: str      # user-provided context (URL, book, etc.)
    content: str     # cleaned markdown body
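To show how a CompiledNote could become a page under wiki/topics/{slug}.md, here is a rough sketch of the WikiWriter rendering step. The helpers `slugify` and `render_page` are hypothetical names, and the slug rule (lowercase, hyphen-separated) is an assumption; the design does not specify one. The block repeats the CompiledNote shape so that it runs on its own, and it uses `json.dumps` for quoting on the assumption that JSON strings and arrays are acceptable YAML values, which avoids a YAML dependency.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class CompiledNote:
    title: str
    tags: list[str]
    confidence: str
    summary: str
    source: str
    content: str


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def render_page(note: CompiledNote, source_filename: str) -> str:
    """Render a wiki page: YAML frontmatter followed by the markdown body."""
    fields = {
        "title": note.title,
        "tags": note.tags,
        "confidence": note.confidence,
        "summary": note.summary,
        "source": note.source,
        "source_path": f"notes/{source_filename}",
    }
    lines = ["---"]
    # json.dumps quotes strings and renders the tag list as a flow sequence.
    lines += [f"{key}: {json.dumps(value)}" for key, value in fields.items()]
    lines += ["---", "", note.content.strip(), ""]
    return "\n".join(lines)
```

The page would then be written to `wiki/topics/{slugify(note.title)}.md`, after any slug collision has been resolved.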
Batching Strategy
Default: send all new .txt files in a single LLM call. Number files [1], [2], [3]... in the prompt, return JSON array.
Fallback (content exceeds the context window, or more than 1M tokens):
- Process in batches of 5 files
- If a single file is enormous (>100K chars), process it alone
This means one LLM call (or a handful) per Compile run rather than one call per file.
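The batching rules above can be sketched as a small planning function. This is an illustration under stated assumptions: `plan_batches` is a hypothetical name, and `budget_chars` is a character-count stand-in for the token limit, since exact token counting depends on the model's tokenizer. The batch size of 5 and the 100K-character threshold come from the text.

```python
def plan_batches(
    sizes: dict[str, int],
    budget_chars: int,
    max_files: int = 5,
    solo_threshold_chars: int = 100_000,
) -> list[list[str]]:
    """Group file names into LLM calls, given each file's size in characters."""
    if not sizes:
        return []
    # Default path: everything fits, so send all files in a single call.
    if sum(sizes.values()) <= budget_chars:
        return [list(sizes)]

    # Fallback path: fixed-size batches, with enormous files on their own.
    batches: list[list[str]] = []
    current: list[str] = []
    for name, n_chars in sizes.items():
        if n_chars > solo_threshold_chars:
            batches.append([name])
            continue
        current.append(name)
        if len(current) == max_files:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches
```

With twelve small files that together exceed the budget, this yields batches of 5, 5 and 2, so the run costs three calls instead of twelve.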
LLM Prompt
You are a knowledge management compiler. Given raw text notes,
extract structured information for a personal wiki.
For each note, return:
- title: descriptive title (from content, not filename)
- tags: 3-8 relevant tags (lowercase, no spaces)
- confidence: "high" if well-structured, "medium" if reasonable,
"low" if fragmentary/unclear
- summary: 2-3 sentences preserving key nuance
- source: the source context if apparent, else "Personal note"
- content: the note cleaned up as concise markdown (remove
excessive blank lines, normalise formatting, keep all meaning)
Return a JSON array. Each element corresponds to one note.
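Since the prompt asks for one JSON array element per note, the code around the LLM call needs to number the notes on the way in and check the array on the way out. The sketch below shows one possible shape for both steps; `build_user_message` and `parse_reply` are hypothetical names, and the strictness of the validation is an assumption rather than part of the design.

```python
import json

REQUIRED_KEYS = {"title", "tags", "confidence", "summary", "source", "content"}
CONFIDENCE_LEVELS = {"high", "medium", "low"}


def build_user_message(notes: list[str]) -> str:
    """Number each note [1], [2], ... so the reply can be matched by position."""
    sections = [f"[{i}]\n{text.strip()}" for i, text in enumerate(notes, start=1)]
    return "\n\n".join(sections)


def parse_reply(reply: str, expected: int) -> list[dict]:
    """Parse the JSON array and reject replies that do not fit the batch."""
    # json.JSONDecodeError subclasses ValueError, so callers can catch
    # ValueError for both malformed JSON and structural problems.
    data = json.loads(reply)
    if not isinstance(data, list) or len(data) != expected:
        raise ValueError(f"expected a JSON array of {expected} notes")
    for index, item in enumerate(data, start=1):
        if not isinstance(item, dict):
            raise ValueError(f"note [{index}] is not a JSON object")
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"note [{index}] is missing {sorted(missing)}")
        if item["confidence"] not in CONFIDENCE_LEVELS:
            raise ValueError(f"note [{index}] has an unknown confidence value")
    return data
```

Here `reply` stands for the message text returned by the LiteLLM call. A ValueError from `parse_reply` corresponds to the "malformed JSON" row of the error handling table.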
Configuration (env vars, KMS_ prefix)
Add to kms/web/config.py:
# LLM
llm_api_key: str = "" # KMS_LLM_API_KEY
llm_model: str = "gpt-4o-mini" # KMS_LLM_MODEL
llm_max_files_per_batch: int = 10 # KMS_LLM_MAX_FILES_PER_BATCH
# Paths
processed_dir: Path | None = None # defaults to raw/processed/
Error Handling
| Failure | Action |
|---|---|
| markitdown fails on a file | Skip file, log error, continue |
| LLM call fails (rate limit, timeout) | Retry once, then skip batch |
| LLM returns malformed JSON | Skip batch, log raw response |
| Slug collision (existing wiki page) | Append _v2, _v3 etc. |
| File already processed (in inbox) | Handled by moving to processed/ |
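Two rows of the table translate directly into small helpers: the slug-collision suffix and the retry-once rule. The following is a minimal sketch with hypothetical names (`unique_slug`, `call_with_one_retry`); catching every Exception is a simplification, and a real pipeline would likely catch only rate-limit and timeout errors.

```python
import logging
from pathlib import Path
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def unique_slug(slug: str, topics_dir: Path) -> str:
    """Return slug, or slug_v2, slug_v3, ... if a page with that name exists."""
    candidate, version = slug, 1
    while (topics_dir / f"{candidate}.md").exists():
        version += 1
        candidate = f"{slug}_v{version}"
    return candidate


def call_with_one_retry(call: Callable[[], T]) -> Optional[T]:
    """Run call; on failure retry once, then give up and return None."""
    for attempt in (1, 2):
        try:
            return call()
        except Exception as exc:  # simplification: narrow this in real code
            logger.warning("LLM call failed (attempt %d): %s", attempt, exc)
    return None
```

A None result tells the caller to skip the batch and leave its files in the inbox for the next run.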
Files to Create / Modify
New files:
- kms/scripts/compile_pipeline.py: PipelineOrchestrator and all components
- kms/web/compile_pipeline.py: could instead be inlined in main.py, but a separate module is preferable
Modified files:
- kms/web/main.py: update /compile POST to run the pipeline
- kms/web/config.py: add LLM settings
- kms/web/templates/compile.html: show per-file results and error counts
- requirements.txt: add litellm, markitdown
Installation (needs pip install)
cd kms
.venv/bin/pip install litellm markitdown