Knowledge Management System — Design Document

Date: 2026-04-23
Status: Approved (Discovery Phase complete)
Author: James Vgent (via Discovery Phase with Pankaj)


1. Purpose

A master knowledge base for Pankaj's personal documents, usable by both a human (Pankaj) and an LLM (James). Raw notes, articles, and course materials are compiled into a searchable, structured wiki with a web UI hosted on the VPS.

2. Architecture Overview

┌─────────────────────┐     ┌───────────────────────┐     ┌───────────────────────┐
│                     │     │                       │     │                       │
│  raw/notes/         │────▶│  Compilation Pipeline │────▶│  wiki/                │
│  (.md, .txt → .md)  │     │  (LLM: James reads    │     │  (Structured .md      │
│                     │     │   raw → writes wiki)  │     │   with frontmatter)   │
└─────────────────────┘     └───────────────────────┘     └───────────────────────┘
                                                                      │
                                                                      ▼
                                                          ┌───────────────────────┐
                                                          │  SQLite FTS5 Index    │
                                                          │  (search index built  │
                                                          │   from wiki/ pages)   │
                                                          └───────────────────────┘
                                                                      │
                                                                      ▼
                                                          ┌───────────────────────┐
                                                          │  FastAPI Web App      │
                                                          │  (HTMX + Alpine.js)   │
                                                          │  VPS: localhost:port  │
                                                          └───────────────────────┘

3. Data Flow

3.1 Ingestion (Pankaj's side)

  1. Pankaj drops documents into raw/notes/
  2. Supported formats: .md (primary), .txt (→ pandoc → .md)
  3. Organized by topic: raw/notes/course-notes/, raw/notes/ideas/, raw/notes/articles/
  4. When new content is ready, Pankaj notifies James to compile
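
The .txt → pandoc → .md step above could be sketched as follows. This is a minimal sketch, not the project's actual converter: the function names are hypothetical, and it assumes plain-text notes are read with pandoc's markdown reader and written as GitHub-flavored Markdown next to the source file.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(src: Path) -> list[str]:
    """Build the pandoc invocation that writes a sibling .md file for a .txt note."""
    dest = src.with_suffix(".md")
    # Assumption: plain text is parsed as markdown and emitted as GFM.
    return ["pandoc", str(src), "--from", "markdown", "--to", "gfm", "--output", str(dest)]


def normalize_note(src: Path) -> Path:
    """Return the Markdown path for a raw note, converting .txt files via pandoc."""
    suffix = src.suffix.lower()
    if suffix == ".md":
        return src  # already in the primary format, nothing to do
    if suffix != ".txt":
        raise ValueError(f"unsupported format: {src.suffix}")
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_command(src), check=True)
    return src.with_suffix(".md")
```

Keeping the command builder separate from the subprocess call makes the conversion easy to inspect without pandoc installed.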

3.2 Compilation (James's side)

  1. James reads all files in raw/notes/ (or a specified subset)
  2. For each document, extracts:
     • Core knowledge, concepts, and relationships
     • Key patterns, decisions, and insights
     • Connections to existing wiki pages
  3. Writes structured Markdown pages to wiki/:
     • YAML frontmatter (title, type, date, tags, source file path, confidence)
     • TLDR summary (one sentence)
     • Body content with backlinks to related pages
  4. Updates wiki/index.md (catalog), wiki/log.md (changelog), wiki/gaps.md (unresolved questions)
  5. Rebuilds SQLite FTS5 search index from wiki/ pages
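
To make the page layout described above concrete, here is a minimal sketch of how one compiled page could be assembled. The function name, the frontmatter key names (`source`, for example), and the `**TLDR:**` line format are assumptions for illustration; the document only specifies which fields exist, not their exact spelling.

```python
from datetime import date


def render_wiki_page(title, page_type, tags, source_path, confidence, tldr, body, on=None):
    """Assemble one wiki page: YAML frontmatter, a one-sentence TLDR, then the body."""
    on = on or date.today()
    frontmatter = [
        "---",
        f'title: "{title}"',  # assumes the title contains no double quotes
        f"type: {page_type}",
        f"date: {on.isoformat()}",
        f"tags: [{', '.join(tags)}]",
        f"source: {source_path}",  # hypothetical key for the raw/ file path
        f"confidence: {confidence}",
        "---",
    ]
    return "\n".join(frontmatter) + f"\n\n**TLDR:** {tldr}\n\n{body.strip()}\n"
```

A real pipeline would probably serialize the frontmatter with a YAML library so that quoting and special characters are handled correctly.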

3.3 Search & Browse (Web UI)

  1. User visits web UI on VPS
  2. Search via FastAPI endpoint → SQLite FTS5 query → ranked results
  3. Click result → rendered Markdown page with:
     • Formatted body (headings, lists, code blocks, backlinks)
     • Tags
     • Link back to source artifact in raw/notes/
  4. Browse by tag, recent, or full index

4. Technology Stack

| Layer | Choice | Rationale |
|---|---|---|
| Backend | FastAPI (Python 3.12+) | Async, Python-native, auto-docs |
| Database | SQLite3 with FTS5 | Zero config, built-in full-text search, single file |
| Server-side rendering | Jinja2 (via FastAPI's Jinja2Templates) | Server-rendered HTML |
| Frontend interactivity | HTMX + Alpine.js | Minimal JS, declarative, no build step |
| Markdown rendering | Python markdown + pymdown-extensions + pygments | Renders wiki pages to HTML with syntax highlighting |
| File conversion | pandoc (external CLI) | Convert .txt to .md if needed |
| CSS | Custom minimal CSS (or Pico CSS) | Lightweight, no framework bloat |

5. Directory Structure

/home/pc/.openclaw/workspace/kms/   ← Knowledge Management System root
├── raw/
│   ├── notes/
│   │   ├── ai-courses/           ← Pankaj's AI course notes
│   │   ├── ideas/                ← Random thoughts and ideas
│   │   └── articles/             ← Web articles / saved content
│   ├── slides/                   ← Future: slide decks
│   ├── audio/                    ← Future: recordings
│   └── video/                    ← Future: video content
├── wiki/                         ← Compiled structured knowledge
│   ├── index.md                  ← Catalog of all pages
│   ├── log.md                    ← Changelog of all operations
│   ├── gaps.md                   ← Unresolved questions / gaps
│   ├── active-areas.md           ← Topics in focus
│   └── topics/                   ← Knowledge pages by domain
│       ├── ai-agents/
│       ├── data-science/
│       ├── software-architecture/
│       └── ...
├── web/
│   ├── main.py                   ← FastAPI app entry point
│   ├── config.py                 ← pydantic-settings config
│   ├── database.py               ← SQLite setup + FTS5 schema
│   ├── search.py                 ← Search logic
│   ├── wiki_renderer.py          ← Markdown → HTML rendering
│   ├── templates/
│   │   ├── base.html             ← Base layout (nav, search bar)
│   │   ├── search.html           ← Search results page
│   │   └── page.html             ← Individual wiki page view
│   ├── static/
│   │   ├── style.css             ← Custom CSS
│   │   └── app.js                ← Alpine.js behavior
│   └── kb.db                     ← SQLite database (auto-created)
├── scripts/
│   ├── rebuild_index.py          ← Rebuild SQLite FTS5 from wiki/
│   └── compile_wiki.py           ← Invoke James to compile raw → wiki
├── .gitignore
├── requirements.txt
└── README.md
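
As an illustration of what scripts/rebuild_index.py has to do with this layout, the sketch below walks wiki/topics/ and turns each page into a row. It is a sketch under stated assumptions: the function names are hypothetical, the frontmatter is parsed as flat `key: value` lines rather than full YAML, and the slug is assumed to be the page's path under wiki/topics/ without the extension.

```python
from pathlib import Path


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a wiki page into (frontmatter fields, markdown body)."""
    if not text.startswith("---\n"):
        return {}, text
    header, sep, body = text[4:].partition("\n---\n")
    if not sep:
        return {}, text  # no closing delimiter: treat the whole file as body
    fields = {}
    for line in header.splitlines():
        key, colon, value = line.partition(":")
        if colon:
            fields[key.strip()] = value.strip().strip('"')
    return fields, body.lstrip("\n")


def collect_pages(wiki_root: Path) -> list[dict[str, str]]:
    """Return one row per page under wiki/topics/, ready for the pages table."""
    topics = wiki_root / "topics"
    rows = []
    for path in sorted(topics.rglob("*.md")):
        fields, body = split_frontmatter(path.read_text(encoding="utf-8"))
        rows.append({
            # Assumed slug scheme: relative path without the .md suffix.
            "slug": path.relative_to(topics).with_suffix("").as_posix(),
            "title": fields.get("title", path.stem),
            "content": body,
        })
    return rows
```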

6. Search Architecture (MVP — SQLite FTS5)

Schema

-- Pages table
CREATE TABLE pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,
    title TEXT NOT NULL,
    type TEXT,           -- topic, pattern, decision, concept
    tags TEXT,           -- comma-separated
    source_path TEXT,    -- path to original raw/ file
    confidence TEXT,     -- high/medium/low
    content TEXT NOT NULL, -- full markdown body
    frontmatter TEXT,    -- raw YAML frontmatter
    created_at TEXT,
    updated_at TEXT
);

-- FTS5 virtual table
CREATE VIRTUAL TABLE pages_fts USING fts5(
    title, tags, content, frontmatter,
    content=pages, content_rowid=id
);

-- Rebuild index
INSERT INTO pages_fts(pages_fts) VALUES('rebuild');

-- Simple search (rank is the FTS5 relevance column; best matches sort first)
SELECT p.*, rank
FROM pages_fts
JOIN pages p ON pages_fts.rowid = p.id
WHERE pages_fts MATCH ?
ORDER BY rank;

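-- Tag filtering (delimiter-aware, so "ai" does not match "email";
-- assumes tags are stored comma-separated without surrounding spaces)
SELECT p.* FROM pages p
WHERE ',' || p.tags || ',' LIKE '%,' || ? || ',%'
ORDER BY p.updated_at DESC;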
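
The schema and queries above can be exercised from Python's built-in sqlite3 module. The following is a minimal sketch with a trimmed-down pages table; `build_index` and `search` are hypothetical helper names, and it assumes the local SQLite build includes FTS5.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,
    title TEXT NOT NULL,
    tags TEXT,
    content TEXT NOT NULL,
    frontmatter TEXT
);
CREATE VIRTUAL TABLE pages_fts USING fts5(
    title, tags, content, frontmatter,
    content='pages', content_rowid='id'
);
"""


def build_index(conn: sqlite3.Connection, pages: list[tuple]) -> None:
    """Create the tables, load the pages, and rebuild the external-content index."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO pages (slug, title, tags, content, frontmatter) VALUES (?, ?, ?, ?, ?)",
        pages,
    )
    # External-content tables are not updated automatically; rebuild from pages.
    conn.execute("INSERT INTO pages_fts(pages_fts) VALUES('rebuild')")
    conn.commit()


def search(conn: sqlite3.Connection, query: str) -> list[str]:
    """Return matching page slugs, best match first."""
    rows = conn.execute(
        "SELECT p.slug FROM pages_fts JOIN pages p ON pages_fts.rowid = p.id "
        "WHERE pages_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [slug for (slug,) in rows]
```

Note that the MATCH argument is FTS5 query syntax, so raw user input containing quotes or operators may need escaping before it reaches `search`.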

7. File Management

Upload

  • POST endpoint: upload one or more files to raw/notes/
  • Supported formats: .md, .txt (+ future: .pdf, .pptx, .mp3, .mp4)
  • Duplicate handling: If a file with the same name exists, the new upload gets suffixed with a timestamp: filename_20260423_162500.ext. No sequential numbering needed.
  • Optional topic subdirectory selection

Download

  • GET endpoint: retrieve any file from raw/notes/
  • Directory listing: browse raw/notes/ structure
  • Single-file download
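
Because the download endpoint takes a path from the request, it is worth confining lookups to raw/notes/. The sketch below shows one way to do that; `resolve_download` is a hypothetical helper, and the traversal check is a suggested safeguard, not a requirement stated in this design.

```python
from pathlib import Path


def resolve_download(raw_root: Path, requested: str) -> Path:
    """Map a requested relative path to a real file under raw_root, or raise."""
    root = raw_root.resolve()
    target = (root / requested).resolve()
    # Reject "../" tricks and absolute paths that land outside raw/notes/.
    if not target.is_relative_to(root):
        raise ValueError("path escapes raw/notes/")
    if not target.is_file():
        raise FileNotFoundError(requested)
    return target
```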

8. Web UI Pages (MVP)

| Route | Description | HTMX/Alpine |
|---|---|---|
| / | Search bar + recent pages | Alpine: search input state, debounce |
| /search?q=... | Ranked search results | HTMX: live search results |
| /page/{slug} | Rendered wiki page + backlinks | Alpine: tag filter toggles, source link |
| /upload | Upload files + list uploaded files | Alpine: file list, upload progress |
| /download/{path} | Download a raw file | |
| /tags | Browse by tag cloud | |
| /browse | Full wiki index (paginated) | HTMX: load-more pagination |

Route summary

  • / — Home: search bar + recent wiki pages
  • /search?q=... — FTS5-ranked search across wiki pages
  • /page/{slug} — Rendered markdown page + backlinks + link to source artifact
  • /upload — Upload form + list of files in raw/notes/, grouped by subdirectory
  • /download/{path} — Direct file download from raw/notes/
  • /tags — Tag cloud for browsing
  • /browse — Paginated wiki index

9. Future Stages (Not In Scope)

| Feature | Stage | Dependencies |
|---|---|---|
| Slide text extraction | 2 | pdfplumber, python-pptx |
| Audio transcription | 3 | Whisper / Deepgram API |
| Video transcription | 4 | Whisper + ffmpeg |
| Semantic search (QMD) | Optional | QMD CLI + embeddings model |
| Relevance scoring (ML) | Optional | Embedding model + similarity |
| Cross-artifact browsing | 3+ | Unified artifact store |

10. Implementation Approach

Per Rule 31 (Software Development SOP):

  1. HLI phase: pseudocode components (compilation pipeline, search index, web UI)
  2. CD phase: build per component with Pre-Implementation Workflow (Rule 24)
  3. Testing: E2E (drop a note, compile, search, view)


Approved by: Pankaj
Date: 2026-04-23