llm.riera.co.uk

Domain-specialised, local-first LLM scaffolding with strict scope.

Lab

A small, reproducible path from personal knowledge to a private, topic-restricted model. Ingest, tag, index, fine-tune with LoRA, serve through an OpenAI-compatible API. Optimised for Apple Silicon; optional upgrade to Ollama, Qdrant, or RunPod when the local path is no longer enough.

RuntimeFastAPI · MLX · Chroma

UpgradeOllama · Qdrant · llama.cpp

TrainingLoRA local · Axolotl remote

Sourcegithub.com/joanmarcriera/personal-llm →

§ 01 · Shape

A narrow model that knows where it ends.

Scope

Professional domains only

Infrastructure, DevOps, cloud, finance, tax, software, governance, AI systems. Off-topic requests — sports, entertainment, celebrity, trivia — are refused and redirected at the prompt and retrieval layer.

Inputs

Personal knowledge, explicit

Google Drive, NFS shares, local files, email archives, Linkwarden, spreadsheets, PDFs, Markdown, code. Raw source and curated opinion are kept separate so personal defaults become explicit training inputs.

Local-first

Runs on a laptop

FastAPI + MLX + Chroma is the easy starting path on an M-series Mac. Nothing leaves the machine unless an optional remote adapter is enabled deliberately.

Upgrade path

Scales when it has to

Same code swaps in Ollama, Qdrant, or llama.cpp behind the FastAPI layer. LoRA training moves from MLX locally to Axolotl on RunPod when the adapter gets big.

§ 02 · Architecture

Six layers, every one replaceable.

01 · ingest

Extract from Drive, NFS, email, Linkwarden, code, docs. Normalised JSONL + Parquet with provenance preserved.

02 · classify

Deduplicate, tag by domain, enforce the topic allow-list. Rejected material never reaches the training set.

03 · index

Chroma locally for the first run, Qdrant when the corpus stops fitting in memory. Embeddings recomputed on source change.

04 · train

LoRA on Apple Silicon via MLX for the first adapter. Axolotl on RunPod when the adapter needs a bigger base model.

05 · serve

FastAPI orchestrator with an OpenAI-compatible endpoint, retrieval, citations, topic filtering, and adapter switching.

06 · evaluate

Guardrail matrix, boundary-case suites, improvement/degradation reports. Every change is measured before it ships.

§ 03 · Docs

The written record, page by page.

stack primer

The shape of the repository — what lives where and why.

installation

From a clean Mac to a running local model on MLX + Chroma.

data ingestion

How each source is extracted, normalised, and stored with provenance.

training

LoRA fine-tuning with MLX locally; the Axolotl path for remote training.

rag setup

Chroma first, Qdrant on upgrade. Embeddings, chunking, re-rankers.

model serving

FastAPI orchestrator, OpenAI-compatible endpoint, adapter switching.

evaluation

Guardrail matrix, boundary cases, improvement and degradation reports.

guardrails

Scope rules, refusal logic, redirect prompts, topic router.

architecture

The full narrative and rendered diagrams for each layer.

§ 04 · Contact

Model quirks, guardrail feedback, or a domain you think should be added.

GitHub joanmarcriera/personal-llm → Email joanmarcriera@gmail.com → Portfolio riera.co.uk →