Knowledge Management System — Design Document

Date: 2026-04-23
Status: Approved (Discovery Phase complete)
Author: James Vgent (via Discovery Phase with Pankaj)

1. Purpose

A master knowledge base for Pankaj's personal documents, usable by both a human reader (Pankaj) and an LLM (James). Raw notes, articles, and course materials are compiled into a searchable, structured wiki, served through a web UI on the VPS.

2. Architecture Overview

┌─────────────────────┐     ┌────────────────────────┐     ┌────────────────────────┐
│                     │     │                        │     │                        │
│  raw/notes/         │────▶│  Compilation Pipeline  │────▶│  wiki/                 │
│  (.md, .txt → .md)  │     │  (LLM: James reads     │     │  (Structured .md       │
│                     │     │   raw → writes wiki)   │     │   with frontmatter)    │
└─────────────────────┘     └────────────────────────┘     └────────────────────────┘
                                                                       │
                                                                       ▼
                                                           ┌────────────────────────┐
                                                           │  SQLite FTS5 Index     │
                                                           │  (search index built   │
                                                           │   from wiki/ pages)    │
                                                           └────────────────────────┘
                                                                       │
                                                                       ▼
                                                           ┌────────────────────────┐
                                                           │  FastAPI Web App       │
                                                           │  (HTMX + Alpine.js)    │
                                                           │  VPS: localhost:port   │
                                                           └────────────────────────┘

3. Data Flow

3.1 Ingestion (Pankaj's side)

- Pankaj drops documents into raw/notes/.
- Supported formats: .md (primary) and .txt (converted to .md with pandoc).
- Documents are organized by topic subdirectory: raw/notes/ai-courses/, raw/notes/ideas/, raw/notes/articles/.
- When new content is ready, Pankaj notifies James to compile it.
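A minimal sketch of the .txt → .md step, assuming pandoc is installed on the VPS. The .txt file is read explicitly as Markdown (plain text is, for the most part, already valid Markdown) and written as GitHub-Flavored Markdown, with `--wrap=preserve` keeping the original line breaks. The helper names `pandoc_command` and `convert_txt_notes` are hypothetical, and skipping files that already have a `.md` sibling is an assumption about how re-runs should behave.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(src: Path) -> list[str]:
    """Build the pandoc argv that converts one .txt note into a sibling .md file."""
    return [
        "pandoc", str(src),
        "-f", "markdown",      # read the plain text as Markdown
        "-t", "gfm",           # write GitHub-Flavored Markdown
        "--wrap=preserve",     # keep the author's original line breaks
        "-o", str(src.with_suffix(".md")),
    ]


def convert_txt_notes(notes_dir: Path) -> list[Path]:
    """Convert each .txt under notes_dir that has no .md sibling yet (hypothetical helper)."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    converted = []
    for src in sorted(notes_dir.rglob("*.txt")):
        dst = src.with_suffix(".md")
        if dst.exists():
            continue  # already converted on an earlier run
        subprocess.run(pandoc_command(src), check=True)
        converted.append(dst)
    return converted
```

Building the argv in a separate function keeps the exact pandoc invocation testable without pandoc being present.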

3.2 Compilation (James's side)

- James reads all files in raw/notes/ (or a specified subset).
- For each document, James extracts:
  - core knowledge, concepts, and relationships
  - key patterns, decisions, and insights
  - connections to existing wiki pages
- James writes structured Markdown pages to wiki/, each containing:
  - YAML frontmatter (title, type, date, tags, source file path, confidence)
  - a one-sentence TL;DR summary
  - body content with backlinks to related pages
- James updates wiki/index.md (catalog), wiki/log.md (changelog), and wiki/gaps.md (unresolved questions).
- The SQLite FTS5 search index is rebuilt from the wiki/ pages.

3.3 Search & Browse (Web UI)

- The user visits the web UI on the VPS.
- Search goes through a FastAPI endpoint → SQLite FTS5 query → ranked results.
- Clicking a result opens the rendered Markdown page, showing:
  - formatted body (headings, lists, code blocks, backlinks)
  - tags
  - a link back to the source artifact in raw/notes/
- Pages can also be browsed by tag, by recency, or through the full index.
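One way to compute the backlinks shown on each page is to scan every page body for links and invert the mapping. This sketch assumes two things the design does not yet fix: pages link to each other with `[[slug]]` wiki-link syntax, and a page's slug is its filename stem. The regex and `build_backlinks` are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical link syntax: [[slug]], [[slug|label]] or [[slug#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")


def build_backlinks(wiki_dir: Path) -> dict[str, set[str]]:
    """Map each target slug to the set of page slugs that link to it."""
    backlinks: dict[str, set[str]] = defaultdict(set)
    for page in wiki_dir.rglob("*.md"):
        source_slug = page.stem
        for match in WIKILINK.finditer(page.read_text(encoding="utf-8")):
            target = match.group(1).strip()
            if target != source_slug:  # ignore self-links
                backlinks[target].add(source_slug)
    return dict(backlinks)
```

Computing this once per compile (rather than per request) keeps the page view a simple lookup.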

4. Technology Stack

Layer	Choice	Rationale
Backend	FastAPI (Python 3.12+)	Async, Python-native, auto-generated OpenAPI docs
Database	SQLite3 with FTS5	Zero config, built-in full-text search, single file
Server-side rendering	Jinja2 (via FastAPI's Jinja2Templates)	Server-rendered HTML
Frontend interactivity	HTMX + Alpine.js	Minimal JS, declarative, no build step
Markdown rendering	Python `markdown` + `pymdown-extensions` + `pygments`	Renders wiki pages to HTML with syntax highlighting
File conversion	pandoc (external CLI)	Convert .txt to .md if needed
CSS	Custom minimal CSS (or Pico CSS)	Lightweight, no framework bloat

5. Directory Structure

/home/pc/.openclaw/workspace/kms/   ← Knowledge Management System root
├── raw/
│   ├── notes/
│   │   ├── ai-courses/           ← Pankaj's AI course notes
│   │   ├── ideas/                ← Random thoughts and ideas
│   │   └── articles/             ← Web articles / saved content
│   ├── slides/                   ← Future: slide decks
│   ├── audio/                    ← Future: recordings
│   └── video/                    ← Future: video content
├── wiki/                         ← Compiled structured knowledge
│   ├── index.md                  ← Catalog of all pages
│   ├── log.md                    ← Changelog of all operations
│   ├── gaps.md                   ← Unresolved questions / gaps
│   ├── active-areas.md           ← Topics in focus
│   └── topics/                   ← Knowledge pages by domain
│       ├── ai-agents/
│       ├── data-science/
│       ├── software-architecture/
│       └── ...
├── web/
│   ├── main.py                   ← FastAPI app entry point
│   ├── config.py                 ← pydantic-settings config
│   ├── database.py               ← SQLite setup + FTS5 schema
│   ├── search.py                 ← Search logic
│   ├── wiki_renderer.py          ← Markdown → HTML rendering
│   ├── templates/
│   │   ├── base.html             ← Base layout (nav, search bar)
│   │   ├── search.html           ← Search results page
│   │   └── page.html             ← Individual wiki page view
│   ├── static/
│   │   ├── style.css             ← Custom CSS
│   │   └── app.js                ← Alpine.js behavior
│   └── kb.db                     ← SQLite database (auto-created)
├── scripts/
│   ├── rebuild_index.py          ← Rebuild SQLite FTS5 from wiki/
│   └── compile_wiki.py           ← Invoke James to compile raw → wiki
├── .gitignore
├── requirements.txt
└── README.md

6. Search Architecture (MVP — SQLite FTS5)

Schema

-- Pages table
CREATE TABLE pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,
    title TEXT NOT NULL,
    type TEXT,           -- topic, pattern, decision, concept
    tags TEXT,           -- comma-separated
    source_path TEXT,    -- path to original raw/ file
    confidence TEXT,     -- high/medium/low
    content TEXT NOT NULL, -- full markdown body
    frontmatter TEXT,    -- raw YAML frontmatter
    created_at TEXT,
    updated_at TEXT
);

-- FTS5 virtual table
CREATE VIRTUAL TABLE pages_fts USING fts5(
    title, tags, content, frontmatter,
    content=pages, content_rowid=id
);

-- Rebuild index
INSERT INTO pages_fts(pages_fts) VALUES('rebuild');
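A minimal sketch of scripts/rebuild_index.py using only the standard library. Assumptions: frontmatter uses simple one-line `key: value` entries (quoted JSON-style values or bare scalars), the slug is the page path relative to wiki/ without `.md`, `updated_at` comes from the file's modification time, and the Python sqlite3 build includes FTS5 (true of most builds, but worth checking on the VPS). A production version would likely parse frontmatter with a real YAML library.

```python
import json
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

# Same schema as above; FTS5 option values quoted, as in the SQLite docs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT, slug TEXT UNIQUE NOT NULL,
    title TEXT NOT NULL, type TEXT, tags TEXT, source_path TEXT,
    confidence TEXT, content TEXT NOT NULL, frontmatter TEXT,
    created_at TEXT, updated_at TEXT
);
CREATE VIRTUAL TABLE IF NOT EXISTS pages_fts USING fts5(
    title, tags, content, frontmatter, content='pages', content_rowid='id'
);
"""


def split_frontmatter(text: str) -> tuple[dict, str, str]:
    """Split '---' delimited frontmatter from the body (naive, line-based parser)."""
    if not text.startswith("---\n"):
        return {}, "", text
    raw, closing, body = text[4:].partition("\n---\n")
    if not closing:
        return {}, "", text  # unterminated frontmatter: treat everything as body
    meta = {}
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            meta[key.strip()] = json.loads(value)  # quoted strings, [lists], numbers
        except json.JSONDecodeError:
            meta[key.strip()] = value.strip()      # bare scalars, e.g. type: concept
    return meta, raw, body.lstrip("\n")


def rebuild_index(wiki_dir: Path, db_path: Path) -> int:
    """Reload every wiki page into `pages`, then rebuild the FTS5 index."""
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(SCHEMA)
        with conn:  # one transaction: a failure leaves the previous index intact
            conn.execute("DELETE FROM pages")
            for path in sorted(wiki_dir.rglob("*.md")):
                meta, raw, body = split_frontmatter(path.read_text(encoding="utf-8"))
                tags = meta.get("tags") or []
                mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
                conn.execute(
                    "INSERT INTO pages (slug, title, type, tags, source_path, confidence,"
                    " content, frontmatter, created_at, updated_at)"
                    " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
                    (
                        path.relative_to(wiki_dir).with_suffix("").as_posix(),
                        str(meta.get("title", path.stem)),
                        meta.get("type"),
                        ",".join(map(str, tags)) if isinstance(tags, list) else str(tags),
                        meta.get("source_path"),
                        meta.get("confidence"),
                        body,
                        raw,
                        meta.get("date"),
                        mtime.isoformat(),
                    ),
                )
            # External-content table: re-read every row of `pages` into the index.
            conn.execute("INSERT INTO pages_fts(pages_fts) VALUES('rebuild')")
        return conn.execute("SELECT count(*) FROM pages").fetchone()[0]
    finally:
        conn.close()
```

Because `pages_fts` is an external-content table, deleting rows from `pages` without triggers would leave the index stale; the trailing `'rebuild'` in the same transaction is what keeps the two consistent.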

Search

-- Simple search (rank is FTS5's built-in bm25 score; lower means a better match)
SELECT p.*, rank
FROM pages_fts
JOIN pages p ON p.id = pages_fts.rowid
WHERE pages_fts MATCH ?
ORDER BY rank;

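-- Tag filtering (exact match: wrapping the list in commas keeps "ai" from matching "email")
SELECT p.* FROM pages p
WHERE ',' || REPLACE(p.tags, ', ', ',') || ',' LIKE '%,' || ? || ',%'
ORDER BY p.updated_at DESC;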
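Passing raw search-box text straight to `MATCH` is fragile: FTS5 has its own query syntax, so an unbalanced double quote raises a syntax error and uppercase words such as `NOT` or `OR` act as operators. A minimal sketch, assuming the schema above: quote each whitespace-separated term (doubling embedded quotes) so terms are matched as plain tokens and implicitly ANDed, then order by `bm25()` and return a `snippet()` for the results page. The column weights, the `<mark>` markers, and the function names are illustrative assumptions.

```python
import sqlite3


def to_fts_query(user_text: str) -> str:
    """Turn free text into an FTS5 query: every term quoted, implicitly ANDed."""
    return " ".join('"' + term.replace('"', '""') + '"' for term in user_text.split())


def search(conn: sqlite3.Connection, user_text: str, limit: int = 20) -> list[tuple]:
    """Return (slug, title, snippet) rows, best bm25 match first."""
    query = to_fts_query(user_text)
    if not query:
        return []
    # bm25 weights follow column order (title, tags, content, frontmatter);
    # the values are illustrative. bm25 is negative, so ascending = best first.
    return conn.execute(
        """
        SELECT p.slug, p.title,
               snippet(pages_fts, 2, '<mark>', '</mark>', '...', 12)
        FROM pages_fts
        JOIN pages p ON p.id = pages_fts.rowid
        WHERE pages_fts MATCH ?
        ORDER BY bm25(pages_fts, 10.0, 5.0, 1.0, 0.5)
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()
```

Weighting the title above the body means a page titled with the search term outranks one that merely mentions it, which suits a wiki where titles are curated.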

7. File Management

Upload

POST endpoint: upload one or more files to raw/notes/
Supported formats: .md, .txt (+ future: .pdf, .pptx, .mp3, .mp4)
Duplicate handling: If a file with the same name exists, the new upload gets suffixed with a timestamp: filename_20260423_162500.ext. No sequential numbering needed.
Optional topic subdirectory selection

Download

GET endpoint: retrieve any file from raw/notes/
Directory listing: browse raw/notes/ structure
Single-file download

8. Web UI Pages (MVP)

Route	Description	HTMX/Alpine
`/`	Search bar + recent pages	Alpine: search input state, debounce
`/search?q=...`	Ranked search results	HTMX: live search results
`/page/{slug}`	Rendered wiki page + backlinks	Alpine: tag filter toggles, source link
`/upload`	Upload files + list uploaded files	Alpine: file list, upload progress
`/download/{path}`	Download a raw file	—
`/tags`	Browse by tag cloud	—
`/browse`	Full wiki index (paginated)	HTMX: load-more pagination

Route summary

/ — Home: search bar + recent wiki pages
/search?q=... — FTS5-ranked search across wiki pages
/page/{slug} — Rendered markdown page + backlinks + link to source artifact
/upload — Upload form + list of files in raw/notes/, grouped by subdirectory
/download/{path} — Direct file download from raw/notes/
/tags — Tag cloud for browsing
/browse — Paginated wiki index
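The `/tags` and `/browse` routes can both be served from the `pages` table. A minimal sketch, assuming tags are stored comma-separated as in the schema and case-folded for the cloud (an assumption); the page size of 50 and the function names are illustrative. OFFSET pagination re-scans skipped rows, which is fine at personal-wiki scale.

```python
import sqlite3
from collections import Counter


def tag_counts(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count pages per tag for the /tags cloud, most common first, then alphabetical."""
    counts: Counter[str] = Counter()
    for (tags,) in conn.execute("SELECT tags FROM pages WHERE tags IS NOT NULL"):
        counts.update({t.strip().lower() for t in tags.split(",") if t.strip()})
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def browse_page(conn: sqlite3.Connection, page: int, per_page: int = 50) -> list[tuple]:
    """One page of the /browse index; an HTMX 'load more' request asks for page + 1."""
    return conn.execute(
        "SELECT slug, title FROM pages ORDER BY title COLLATE NOCASE LIMIT ? OFFSET ?",
        (per_page, (page - 1) * per_page),
    ).fetchall()
```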

9. Future Stages (Not In Scope)

Feature	Stage	Dependencies
Slide text extraction	2	pdfplumber, python-pptx
Audio transcription	3	Whisper / Deepgram API
Video transcription	4	Whisper + ffmpeg
Semantic search (QMD)	Optional	QMD CLI + embeddings model
Relevance scoring (ML)	Optional	Embedding model + similarity
Cross-artifact browsing	3+	Unified artifact store

10. Implementation Approach

Per Rule 31 (Software Development SOP):
1. HLI phase: pseudocode the components (compilation pipeline, search index, web UI).
2. CD phase: build each component following the Pre-Implementation Workflow (Rule 24).
3. Testing: end-to-end (E2E) run that drops a note, compiles, searches, and views the result.

Approved by: Pankaj
Date: 2026-04-23