Domain-specialised, local-first LLM scaffolding with strict scope.
A narrow model that knows where it ends.
Professional domains only
Infrastructure, DevOps, cloud, finance, tax, software, governance, AI systems. Off-topic requests (sports, entertainment, celebrity, trivia) are refused and redirected at the prompt and retrieval layers.
Personal knowledge, explicit
Google Drive, NFS shares, local files, email archives, Linkwarden, spreadsheets, PDFs, Markdown, code. Raw source and curated opinion are kept separate so personal defaults become explicit training inputs.
Runs on a laptop
FastAPI + MLX + Chroma is the easy starting path on an M-series Mac. Nothing leaves the machine unless an optional remote adapter is enabled deliberately.
Scales when it has to
The same code swaps in Ollama, Qdrant, or llama.cpp behind the FastAPI layer. LoRA training moves from MLX locally to Axolotl on RunPod when the adapter needs a bigger base model.
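One way to keep backends swappable is to have the serving layer depend on a small interface rather than on a concrete client. The sketch below is illustrative only: `VectorStore`, `InMemoryStore`, `BACKENDS`, and `make_store` are hypothetical names, and the in-memory class stands in for a real backend such as Chroma or Qdrant without calling either library.

```python
from dataclasses import dataclass, field
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical interface the serving layer would depend on."""

    def add(self, doc_id: str, vector: list[float]) -> None: ...

    def nearest(self, query: list[float], k: int) -> list[str]: ...


@dataclass
class InMemoryStore:
    """Stand-in for a local backend; no real vector database is used."""

    vectors: dict[str, list[float]] = field(default_factory=dict)

    def add(self, doc_id: str, vector: list[float]) -> None:
        self.vectors[doc_id] = vector

    def nearest(self, query: list[float], k: int) -> list[str]:
        # Rank stored ids by squared Euclidean distance to the query.
        def dist(doc_id: str) -> float:
            return sum((a - b) ** 2 for a, b in zip(query, self.vectors[doc_id]))

        return sorted(self.vectors, key=dist)[:k]


# Assumed registry: a remote or larger backend would be added here
# under its own key, leaving the calling code unchanged.
BACKENDS = {"memory": InMemoryStore}


def make_store(name: str) -> VectorStore:
    """Build the configured backend, failing loudly on an unknown name."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name}") from None


store = make_store("memory")
store.add("a", [0.0, 0.0])
store.add("b", [1.0, 1.0])
closest = store.nearest([0.9, 0.9], k=1)
```

Here `closest` holds the id of the nearest stored vector. Switching backends would then be a configuration change to the name passed to `make_store`, not a change to the callers.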
Six layers, every one replaceable.
01 · ingest: Extract from Drive, NFS, email, Linkwarden, code, docs. Normalised JSONL + Parquet with provenance preserved.
02 · classify: Deduplicate, tag by domain, enforce the topic allow-list. Rejected material never reaches the training set.
03 · index: Chroma locally for the first run, Qdrant when the corpus stops fitting in memory. Embeddings recomputed on source change.
04 · train: LoRA on Apple Silicon via MLX for the first adapter. Axolotl on RunPod when the adapter needs a bigger base model.
05 · serve: FastAPI orchestrator with an OpenAI-compatible endpoint, retrieval, citations, topic filtering, and adapter switching.
06 · evaluate: Guardrail matrix, boundary-case suites, improvement/degradation reports. Every change is measured before it ships.
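As a rough illustration of the classify layer, the sketch below deduplicates records by a hash of their normalised text and then applies a topic allow-list. It assumes each record already carries a `domain` tag, since how tagging is done is not described here. The record shape, the `ALLOWED_DOMAINS` values, and the function names are all hypothetical.

```python
import hashlib

# Assumed allow-list, mirroring the professional domains named above.
ALLOWED_DOMAINS = {
    "infrastructure", "devops", "cloud", "finance",
    "tax", "software", "governance", "ai-systems",
}


def content_hash(text: str) -> str:
    """Hash case- and whitespace-normalised text so trivial variants collide."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def classify(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Drop duplicates, then split records into accepted and rejected."""
    seen: set[str] = set()
    accepted: list[dict] = []
    rejected: list[dict] = []
    for record in records:
        digest = content_hash(record["text"])
        if digest in seen:
            continue  # duplicate of an earlier record
        seen.add(digest)
        if record["domain"] in ALLOWED_DOMAINS:
            accepted.append(record)
        else:
            rejected.append(record)  # kept aside, never passed to training
    return accepted, rejected


records = [
    {"text": "Terraform state locking", "domain": "devops"},
    {"text": "terraform  state LOCKING", "domain": "devops"},
    {"text": "Cup final match report", "domain": "sports"},
]
accepted, rejected = classify(records)
```

Returning the rejected records separately, rather than discarding them silently, makes it possible to audit what the allow-list filtered out.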
The written record, page by page.
stack primer: The shape of the repository, what lives where and why.
installation: From a clean Mac to a running local model on MLX + Chroma.
data ingestion: How each source is extracted, normalised, and stored with provenance.
training: LoRA fine-tuning with MLX locally; the Axolotl path for remote training.
rag setup: Chroma first, Qdrant on upgrade. Embeddings, chunking, re-rankers.
model serving: FastAPI orchestrator, OpenAI-compatible endpoint, adapter switching.
evaluation: Guardrail matrix, boundary cases, improvement and degradation reports.
guardrails: Scope rules, refusal logic, redirect prompts, topic router.
architecture: The full narrative and rendered diagrams for each layer.
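The scope rules and refusal logic listed under guardrails could take many forms. The sketch below shows the simplest one, a keyword-based topic router that refuses and redirects off-topic prompts. It is an assumption for illustration, not the project's router: the keyword sets, the `Decision` type, the redirect wording, and `route` are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword sets for topics the model should refuse.
OFF_TOPIC_KEYWORDS = {
    "sports": {"football", "league", "tournament"},
    "entertainment": {"movie", "album", "celebrity"},
}

# Hypothetical redirect text pointing the user back to in-scope domains.
REDIRECT = (
    "That is outside this model's scope. Try a question about "
    "infrastructure, cloud, finance, tax, software, governance, or AI systems."
)


@dataclass
class Decision:
    allowed: bool
    reason: str
    message: str = ""


def route(prompt: str) -> Decision:
    """Refuse prompts that contain any off-topic keyword, otherwise allow."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    for topic, keywords in OFF_TOPIC_KEYWORDS.items():
        if words & keywords:
            return Decision(allowed=False, reason=topic, message=REDIRECT)
    return Decision(allowed=True, reason="in-scope")


refused = route("Who won the football league?")
passed = route("How do I lock Terraform state?")
```

A keyword match is easy to evade and prone to false positives, which is the kind of weakness boundary-case suites are meant to expose.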