Knowledge Management System — Design Document
Date: 2026-04-23
Status: Approved (Discovery Phase complete)
Author: James Vgent (via Discovery Phase with Pankaj)
1. Purpose
A master knowledge base for Pankaj's personal documents, usable by both a human (Pankaj) and an LLM (James). Raw notes, articles, and course materials are compiled into a searchable, structured wiki with a web UI on the VPS.
2. Architecture Overview
┌─────────────────────┐     ┌──────────────────────┐     ┌──────────────────────┐
│                     │     │                      │     │                      │
│  raw/notes/         │────▶│ Compilation Pipeline │────▶│  wiki/               │
│  (.md, .txt → .md)  │     │ (LLM: James reads    │     │  (Structured .md     │
│                     │     │  raw → writes wiki)  │     │   with frontmatter)  │
└─────────────────────┘     └──────────────────────┘     └──────────────────────┘
                                                                    │
                                                                    ▼
                                                         ┌──────────────────────┐
                                                         │  SQLite FTS5 Index   │
                                                         │  (search index built │
                                                         │   from wiki/ pages)  │
                                                         └──────────────────────┘
                                                                    │
                                                                    ▼
                                                         ┌──────────────────────┐
                                                         │  FastAPI Web App     │
                                                         │  (HTMX + Alpine.js)  │
                                                         │  VPS: localhost:port │
                                                         └──────────────────────┘
3. Data Flow
3.1 Ingestion (Pankaj's side)
- Pankaj drops documents into raw/notes/
- Supported formats: .md (primary), .txt (→ pandoc → .md)
- Organized by topic: raw/notes/course-notes/, raw/notes/ideas/, raw/notes/articles/
- When new content is ready, Pankaj notifies James to compile
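The .txt → pandoc → .md step could be scripted along these lines. This is a minimal sketch, not part of the approved design: the function names are hypothetical, and it assumes plain-text notes can be read with pandoc's Markdown reader and that pandoc is on the PATH.

```python
import subprocess
from pathlib import Path


def pandoc_command(src: Path, dst: Path) -> list[str]:
    """Build the pandoc invocation that turns one text file into Markdown."""
    # Assumption: plain-text notes are parsed with pandoc's Markdown reader.
    return ["pandoc", "--from", "markdown", "--to", "markdown",
            "--output", str(dst), str(src)]


def convert_txt_notes(notes_dir: Path) -> list[Path]:
    """Convert every .txt under notes_dir to a sibling .md; return new files."""
    converted = []
    for src in sorted(notes_dir.rglob("*.txt")):
        dst = src.with_suffix(".md")
        if dst.exists():
            continue  # never overwrite a note that already has a .md version
        subprocess.run(pandoc_command(src, dst), check=True)
        converted.append(dst)
    return converted
```

Skipping files that already have a .md sibling keeps the conversion safe to re-run each time new notes are dropped in.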
3.2 Compilation (James's side)
- James reads all files in raw/notes/ (or a specified subset)
- For each document, extracts:
  - Core knowledge, concepts, and relationships
  - Key patterns, decisions, and insights
  - Connections to existing wiki pages
- Writes structured Markdown pages to wiki/:
  - YAML frontmatter (title, type, date, tags, source file path, confidence)
  - TLDR summary (one sentence)
  - Body content with backlinks to related pages
- Updates wiki/index.md (catalog), wiki/log.md (changelog), wiki/gaps.md (unresolved questions)
- Rebuilds SQLite FTS5 search index from wiki/ pages
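To make the page layout concrete, here is a rough sketch of how one compiled page might be assembled. The frontmatter key names (for example source), the TLDR formatting, and the render_wiki_page helper are illustrative assumptions. The design only fixes which fields exist, not how they are spelled.

```python
from datetime import date


def render_wiki_page(title, page_type, tags, source_path, confidence,
                     tldr, body, page_date=None):
    """Return the Markdown text of one wiki page with YAML frontmatter."""
    page_date = page_date or date.today().isoformat()
    safe_title = title.replace('"', '\\"')  # keep the quoted YAML string valid
    frontmatter = [
        "---",
        f'title: "{safe_title}"',
        f"type: {page_type}",          # topic, pattern, decision, concept
        f"date: {page_date}",
        f"tags: [{', '.join(tags)}]",
        f"source: {source_path}",      # hypothetical key for the raw/ file path
        f"confidence: {confidence}",   # high/medium/low
        "---",
    ]
    return "\n".join(frontmatter) + f"\n\n**TLDR:** {tldr}\n\n{body}\n"
```

A call such as render_wiki_page("FTS5 Basics", "concept", ["sqlite", "search"], "raw/notes/articles/fts5.md", "high", "One-sentence summary.", "Body text.") yields text that can be written straight to a file under wiki/.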
3.3 Search & Browse (Web UI)
- User visits web UI on VPS
- Search via FastAPI endpoint → SQLite FTS5 query → ranked results
- Click result → rendered Markdown page with:
  - Formatted body (headings, lists, code blocks, backlinks)
  - Tags
  - Link back to source artifact in raw/notes/
- Browse by tag, recent, or full index
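The backlinks shown on a rendered page have to be computed from links in other pages. The sketch below shows one way to do that, assuming internal links are written as [[target-slug]] in page bodies. That syntax and the build_backlinks helper are assumptions; this document does not specify the link format.

```python
import re
from collections import defaultdict

# Assumption: internal links are written as [[target-slug]] in page bodies.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)\]\]")


def build_backlinks(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each linked slug to the sorted slugs of the pages that link to it."""
    backlinks = defaultdict(set)
    for slug, body in pages.items():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target != slug:  # ignore self-links
                backlinks[target].add(slug)
    return {target: sorted(sources) for target, sources in backlinks.items()}
```

The input maps slug to Markdown body, so the same pass that feeds the search index could also produce the backlink lists for /page/{slug}.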
4. Technology Stack
| Layer | Choice | Rationale |
|---|---|---|
| Backend | FastAPI (Python 3.12+) | Async, Python-native, auto-docs |
| Database | SQLite3 with FTS5 | Zero config, built-in full-text search, single file |
| Server-side rendering | Jinja2 (via FastAPI's Jinja2Templates) | Server-rendered HTML |
| Frontend interactivity | HTMX + Alpine.js | Minimal JS, declarative, no build step |
| Markdown rendering | Python markdown + pymdown-extensions + pygments | Renders wiki pages to HTML with syntax highlighting |
| File conversion | pandoc (external CLI) | Convert .txt to .md if needed |
| CSS | Custom minimal CSS (or Pico CSS) | Lightweight, no framework bloat |
5. Directory Structure
/home/pc/.openclaw/workspace/kms/ ← Knowledge Management System root
├── raw/
│ ├── notes/
│ │ ├── ai-courses/ ← Pankaj's AI course notes
│ │ ├── ideas/ ← Random thoughts and ideas
│ │ └── articles/ ← Web articles / saved content
│ ├── slides/ ← Future: slide decks
│ ├── audio/ ← Future: recordings
│ └── video/ ← Future: video content
├── wiki/ ← Compiled structured knowledge
│ ├── index.md ← Catalog of all pages
│ ├── log.md ← Changelog of all operations
│ ├── gaps.md ← Unresolved questions / gaps
│ ├── active-areas.md ← Topics in focus
│ └── topics/ ← Knowledge pages by domain
│ ├── ai-agents/
│ ├── data-science/
│ ├── software-architecture/
│ └── ...
├── web/
│ ├── main.py ← FastAPI app entry point
│ ├── config.py ← pydantic-settings config
│ ├── database.py ← SQLite setup + FTS5 schema
│ ├── search.py ← Search logic
│ ├── wiki_renderer.py ← Markdown → HTML rendering
│ ├── templates/
│ │ ├── base.html ← Base layout (nav, search bar)
│ │ ├── search.html ← Search results page
│ │ └── page.html ← Individual wiki page view
│ ├── static/
│ │ ├── style.css ← Custom CSS
│ │ └── app.js ← Alpine.js behavior
│ └── kb.db ← SQLite database (auto-created)
├── scripts/
│ ├── rebuild_index.py ← Rebuild SQLite FTS5 from wiki/
│ └── compile_wiki.py ← Invoke James to compile raw → wiki
├── .gitignore
├── requirements.txt
└── README.md
6. Search Architecture (MVP — SQLite FTS5)
Schema
-- Pages table
CREATE TABLE pages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
slug TEXT UNIQUE NOT NULL,
title TEXT NOT NULL,
type TEXT, -- topic, pattern, decision, concept
tags TEXT, -- comma-separated
source_path TEXT, -- path to original raw/ file
confidence TEXT, -- high/medium/low
content TEXT NOT NULL, -- full markdown body
frontmatter TEXT, -- raw YAML frontmatter
created_at TEXT,
updated_at TEXT
);
-- FTS5 virtual table
CREATE VIRTUAL TABLE pages_fts USING fts5(
title, tags, content, frontmatter,
content=pages, content_rowid=id
);
-- Rebuild index
INSERT INTO pages_fts(pages_fts) VALUES('rebuild');
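The schema and rebuild command above can be exercised from Python roughly as follows. This is a minimal sketch, not the project's scripts/rebuild_index.py: the rebuild_index function, the trimmed column set, and the dict-shaped page records are assumptions for illustration, and it requires an SQLite build with FTS5 enabled.

```python
import sqlite3

# Trimmed version of the schema above (only the columns the index needs).
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,
    title TEXT NOT NULL,
    tags TEXT,
    content TEXT NOT NULL,
    frontmatter TEXT
);
CREATE VIRTUAL TABLE IF NOT EXISTS pages_fts USING fts5(
    title, tags, content, frontmatter,
    content=pages, content_rowid=id
);
"""


def rebuild_index(conn, pages):
    """Replace all rows in pages, then rebuild the external-content FTS index."""
    conn.executescript(SCHEMA)
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM pages")
        conn.executemany(
            "INSERT INTO pages (slug, title, tags, content, frontmatter) "
            "VALUES (:slug, :title, :tags, :content, :frontmatter)",
            pages,
        )
        # External-content tables are not updated automatically: re-read pages.
        conn.execute("INSERT INTO pages_fts(pages_fts) VALUES('rebuild')")
    return conn.execute("SELECT count(*) FROM pages").fetchone()[0]
```

Because pages_fts is an external-content table, it only reflects the pages table after the 'rebuild' command (or after triggers, which this design does not use). That is why a full rebuild after every compilation fits the workflow.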
Search
-- Simple search
SELECT p.*, rank
FROM pages_fts
JOIN pages p ON pages_fts.rowid = p.id
WHERE pages_fts MATCH ?
ORDER BY rank;
-- Tag filtering (exact tag match; assumes tags are stored comma-separated without spaces)
SELECT p.* FROM pages p
WHERE ',' || p.tags || ',' LIKE '%,' || ? || ',%'
ORDER BY p.updated_at DESC;
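Raw user input cannot be passed to MATCH as-is, because characters such as quotes or parentheses are FTS5 query syntax and can raise errors. The sketch below wraps the ranked search query in a small Python helper that quotes each term first. The helper names and the limit parameter are hypothetical, and quoting every term is one possible policy, not something this design mandates.

```python
import sqlite3


def to_match_query(user_input: str) -> str:
    """Quote each whitespace-separated term so FTS5 treats it as a literal."""
    terms = [t.replace('"', '""') for t in user_input.split()]
    return " ".join(f'"{t}"' for t in terms)


def search_pages(conn: sqlite3.Connection, user_input: str, limit: int = 20):
    """Return (slug, title) rows ordered by FTS5 rank, best match first."""
    query = to_match_query(user_input)
    if not query:
        return []  # an empty MATCH expression would be an error
    return conn.execute(
        "SELECT p.slug, p.title FROM pages_fts "
        "JOIN pages p ON pages_fts.rowid = p.id "
        "WHERE pages_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

Quoting trades away FTS5 operators (AND, OR, NEAR, prefix *) for robustness. If power-user queries are wanted later, the helper is the single place to relax that.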
7. File Management
Upload
- POST endpoint: upload one or more files to raw/notes/
- Supported formats: .md, .txt (+ future: .pdf, .pptx, .mp3, .mp4)
- Duplicate handling: If a file with the same name exists, the new upload gets suffixed with a timestamp: filename_20260423_162500.ext. No sequential numbering needed.
- Optional topic subdirectory selection
Download
- GET endpoint: retrieve any file from raw/notes/
- Directory listing: browse raw/notes/ structure
- Single-file download
8. Web UI Pages (MVP)
| Route | Description | HTMX/Alpine |
|---|---|---|
| / | Search bar + recent pages | Alpine: search input state, debounce |
| /search?q=... | Ranked search results | HTMX: live search results |
| /page/{slug} | Rendered wiki page + backlinks | Alpine: tag filter toggles, source link |
| /upload | Upload files + list uploaded files | Alpine: file list, upload progress |
| /download/{path} | Download a raw file | None |
| /tags | Browse by tag cloud | None |
| /browse | Full wiki index (paginated) | HTMX: load-more pagination |
Route summary
- /: Home (search bar + recent wiki pages)
- /search?q=...: FTS5-ranked search across wiki pages
- /page/{slug}: Rendered Markdown page + backlinks + link to source artifact
- /upload: Upload form + list of files in raw/notes/, grouped by subdirectory
- /download/{path}: Direct file download from raw/notes/
- /tags: Tag cloud for browsing
- /browse: Paginated wiki index
9. Future Stages (Not In Scope)
| Feature | Stage | Dependencies |
|---|---|---|
| Slide text extraction | 2 | pdfplumber, python-pptx |
| Audio transcription | 3 | Whisper / Deepgram API |
| Video transcription | 4 | Whisper + ffmpeg |
| Semantic search (QMD) | Optional | QMD CLI + embeddings model |
| Relevance scoring (ML) | Optional | Embedding model + similarity |
| Cross-artifact browsing | 3+ | Unified artifact store |
10. Implementation Approach
Per Rule 31 (Software Development SOP):
1. HLI phase: pseudocode the components (compilation pipeline, search index, web UI)
2. CD phase: build per component with the Pre-Implementation Workflow (Rule 24)
3. Testing: E2E (drop a note, compile, search, view)
Approved by: Pankaj
Date: 2026-04-23