KMS Compilation Pipeline — Requirements

Problem

Uploaded raw .txt files sit in raw/notes/ but are never turned into wiki pages. The Compile button currently only re-indexes wiki/topics/; there is no pipeline that creates those wiki pages from the raw uploads.

Goals

  1. User uploads .txt files via /upload
  2. One-click Compile button transforms raw .txt → wiki pages
  3. Wiki pages are properly formatted (frontmatter, tags, title)
  4. LLM is used to extract structure from unstructured notes
  5. Already-compiled files are skipped (no reprocessing)
  6. Results shown clearly in the UI
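Goal 3's "properly formatted" page could look like the sketch below; the exact frontmatter fields (title, tags, source) and their values are illustrative assumptions, not a fixed schema:

```markdown
---
title: Meeting Notes            # extracted by the LLM (hypothetical example)
tags: [meetings, planning]      # LLM-suggested tags
source: raw/notes/meeting-notes.txt   # link back to the untouched source file
---

# Meeting Notes

Structured body produced by the LLM processor…
```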

Constraints

  • LLM API keys come from env vars (Rule 35: pydantic-settings)
  • Original .txt files remain untouched (source artifacts)
  • Pipeline runs server-side (no manual SSH steps)
  • Handles errors: bad files, API failures, duplicate filenames
  • Pages link back to source file for reference
  • .txt → .md conversion uses pandoc when installed (availability unconfirmed); falls back to Python text wrapping when pandoc is not available

Non-Goals

  • Not a full text-to-knowledge AI pipeline (no summary extraction or cross-referencing between pages)
  • Not editing existing wiki pages — only creating new ones from unprocessed files

Files Affected

New:

  • kms/scripts/compile_pipeline.py — core pipeline logic
  • kms/web/pipeline_config.py — LLM settings (API keys, model, prompt)

Modified:

  • kms/web/main.py — /compile route updated to call pipeline
  • kms/web/templates/compile.html — richer result display
  • kms/web/config.py — possibly add LLM settings
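Per Rule 35, the LLM settings in kms/web/pipeline_config.py could follow this pydantic-settings sketch; the field names, default model, and env-var prefix are assumptions:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class LLMSettings(BaseSettings):
    """LLM pipeline configuration, read from environment variables."""
    model_config = SettingsConfigDict(env_prefix="KMS_LLM_")

    api_key: str                  # KMS_LLM_API_KEY — never hard-coded
    model: str = "gpt-4o-mini"    # KMS_LLM_MODEL (hypothetical default)
    prompt_template: str = (      # KMS_LLM_PROMPT_TEMPLATE
        "Extract a title, tags, and structured Markdown body "
        "from the following notes:\n\n{notes}"
    )
```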

Components (for DP/HLI)

  1. File tracker — knows which raw files have been compiled (SQLite or hash DB)
  2. Converter — .txt → .md (pandoc or Python)
  3. LLM processor — sends .md to LLM, gets frontmatter + structured content
  4. Wiki writer — writes wiki page to wiki/topics/ with proper format
  5. Pipeline orchestrator — ties the components together, handles per-file errors
  6. Rebuild — FTS reindex after new pages are written