NEXUS
A distributed AI agent platform for personal intelligence
Nexus is a personal AI family office — eight autonomous agents that monitor, analyze, and act on your digital life. Running entirely on local hardware with local LLM inference, they manage data pipelines, curate knowledge, track relationships, review code, and keep you informed. Around the clock, with zero cloud dependency.
8 Agents
255 Tables
298K Knowledge Facts
255K Music Plays
328 MCP Tools
0 Cloud Dependency
Why local

Curiosity, mostly. And a refusal to keep paying per token.

This started because the cloud AI bill kept climbing. OpenAI for chat, Anthropic for reasoning, Google for embeddings, Replicate for image gen, Runway for video, ElevenLabs for voice — every interesting idea meant another metered API and another monthly invoice that scaled with use. Building something ambitious meant signing up to pay forever.

So Nexus runs on two mini-PCs in a closet. The models are local. The database is local. Every token, embedding, photo description, transcription, face match, and reranked result happens on hardware I own outright. The hardware paid for itself in a few months of saved API spend, and now the marginal cost of an idea is zero — which turns out to be the only constraint that matters for actually building things.

Free per token
Zero marginal cost on every chat, embedding, photo description, transcription. Run a million inferences a day, the bill is the same.
Fast iteration
Try wild ideas without watching a meter. The interesting experiments — bulk reprocessing, daily life chapter generation, full-corpus reembedding — only happen when there's no per-call cost.
Real engineering
Tensor splitting a 70B model across two GPUs over Thunderbolt 5, building an autoscaler for ephemeral workers, wiring eight agents into a coordination protocol — the project is the point.
Yours
No vendor changes a price, deprecates a model, or rate-limits a key. The whole stack runs even if every cloud AI company disappeared tomorrow.

The hardware

Two GMKTec EVO-X2 machines connected by Thunderbolt 5, running AMD Strix Halo. Furnace at 96 GB unified VRAM, Crucible at 64 GB. 160 GB of GPU memory across one platform, split across two brains.

Furnace and Crucible

Furnace

Primary Compute
CPU · AMD Strix Halo
Memory · 128 GB
VRAM · 96 GB Unified
Role · Gateway + Primary LLM
Services · API, Worker, Forge, DB
Thunderbolt 5
40 Gbps
0.12 ms latency

Crucible

Satellite Compute
CPU · AMD Strix Halo
Memory · 64 GB
VRAM · 64 GB Unified
Role · Tensor Split + Creative
Services · RPC Worker, ComfyUI
160 GB
Total unified VRAM across two machines
RPC Tensor Split: The Llama 3.3 70B model is too large for a single machine. Using llama.cpp's RPC protocol, model layers are distributed across both machines — layers 0–40 on Furnace's GPU, layers 41–80 on Crucible via Thunderbolt 5. Tensors transfer at 40 Gbps with sub-millisecond latency, making the split nearly invisible to inference speed.
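In llama.cpp terms, the split comes down to two invocations — a sketch, assuming a build with the RPC backend enabled; the model filename is an assumption and exact flags depend on the llama.cpp version:

```shell
# Crucible: expose the local GPU as a remote ggml backend
./rpc-server -H 10.10.12.2 -p 50052

# Furnace: serve the 70B model; layers that exceed local VRAM
# are placed on the RPC device over the Thunderbolt link
./llama-server -m Llama-3.3-70B-Instruct-Q4_K_M.gguf \
  --rpc 10.10.12.2:50052 \
  -ngl 99 -c 32768 --parallel 2 --port 8080
```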

Three machines, four fabrics, one brain

Furnace, Crucible, and Anvil are stitched together with four distinct network paths — a dedicated Thunderbolt 5 cable for the hot loop, a Thunderbolt bridge for the Mac, Tailscale overlaying everything for secure remote access, and 1 GbE for background LAN traffic. Each link carries different flows and has wildly different latency characteristics.

[Network diagram. Furnace (eno1 LAN + tailnet): LLM inference via llama.cpp/ROCm — llama-server :8080 (Llama-3.3-70B + RPC split), llama-priority :8088 (Qwen3.5-35B-A3B), llama-vlm :8081 (Qwen3-VL-32B), llama-embed :8082 (nomic-embed-768), whisper :8083, faces :8084, rerank :8085, chatterbox-tts :8087, ocr :8090, musicgen :8091; Forge LLM gateway :8642 (FastAPI, OpenAI-compatible, routes localhost models + Anvil VLM + Crucible); Nexus platform — nexus-api :7700 (Express + agent REST), nexus-mcp :7701 (Streamable HTTP), nexus-worker tool exec server :7702, nexus-worker agent cycles + jobs; data layer — PostgreSQL 16 :5432 (listening on localhost, 10.10.0.2, 10.10.12.1, tailnet) and pgbouncer :6432 (txn pool, 30/500); web edge — caddy :443/:80 fronting chancery :3100 (Next.js), aria :3000, nexus-site :8901; stdio MCP servers (gmail, google-drive, cloudflare, github, notion, brave-search, next-devtools); observability — prometheus :9090, node-exporter :9100, postgres-exporter :9187, blackbox :9115. Crucible (eno1 LAN + tailnet): llama.cpp rpc-server :50052 exposing its GPU to Furnace's llama-server (ComfyUI and MuseTalk currently disabled — VRAM allocated to the 70B RPC split); crucible-bulk.service (WORKER_MODE=bulk, long-running LLM jobs) and nexus-scaler.service (autoscaler for ephemeral bulk workers), both reaching Furnace's PostgreSQL :5432, Forge :8642, and tool server :7702 over the direct Thunderbolt cable (10.10.12.1). Anvil (Mac mini M4, WiFi + tailnet): MCP servers over SSH stdio — apple-photos, imessage, apple-ecosystem; forge-vlm-anvil :8081 serving Qwen3-VL-8B via Apple Photos/Vision frameworks; bridge0 10.10.0.1/24 (en2 + en3 bridged).]

Legend: Thunderbolt direct · 40 Gbps · ~0.12 ms | TB Anvil bridge (10.10.0.0/24) · ~0.18 ms | Tailnet overlay (encrypted, rides eno1/en1) | 1 GbE LAN | WiFi | Internet/WAN (Anthropic · Google · GitHub · Resend · Pushover · AWS · Cloudflare)

Furnace ↔ Crucible

Flow · Endpoint · Path
llama-server → rpc tensor split · 10.10.12.2:50052 · TB direct
crucible-bulk → PostgreSQL · 10.10.12.1:5432 · TB direct
nexus-scaler → job queue · 10.10.12.1:5432 · TB direct
bulk worker → Forge API · 10.10.12.1:8642 · TB direct
bulk worker → tool exec server · 10.10.12.1:7702 · TB direct

Anvil ↔ Cluster

Flow · Endpoint · Path
Forge → Anvil VLM · 10.10.0.1:8081 · TB bridge
Worker → Apple Photos MCP · ssh stdio · Tailnet
Worker → iMessage MCP · ssh stdio · Tailnet
Worker → Apple Ecosystem MCP · ssh stdio · Tailnet
Remote admin / laptop · ssh · Tailnet

The software stack

TypeScript PostgreSQL Node.js Express Next.js llama.cpp Tailscale MCP Protocol systemd

Nexus is an npm workspaces monorepo with four packages. The API server handles external requests, the Worker runs agent cycles and processes jobs, the MCP server exposes tools via the Model Context Protocol, and Core provides shared services used by all three.

@nexus/core
Shared services, types, DB, auth, migrations
@nexus/api
Express REST API on port 7700
@nexus/mcp
MCP Streamable HTTP, port 7701
@nexus/worker
Agent runtime, job executor, tool server
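A workspace layout consistent with the four packages above might be declared like this in the root package.json — a sketch; the directory names are assumptions:

```json
{
  "name": "nexus",
  "private": true,
  "workspaces": [
    "packages/core",
    "packages/api",
    "packages/mcp",
    "packages/worker"
  ]
}
```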

Worker modes: The worker runs in two configurations. Primary handles agent cycles, event listeners, and urgent jobs. Bulk handles long-running LLM work — embeddings, sentiment analysis, photo descriptions. The autoscaler spawns ephemeral bulk workers when the queue grows, and kills them when it drains.

Forge
The Nervous System

Unified LLM API gateway running on Furnace. Every agent, every tool, every background job — all inference routes through Forge. OpenAI-compatible endpoints with request logging, model routing, Prometheus metrics, and circuit breakers.

/v1/chat/completions /v1/embeddings /v1/describe /v1/transcribe /v1/tts /v1/faces /v1/rerank /embed
Zero-cost local inference
Forge
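Because Forge speaks the OpenAI wire format, any agent or job can call it with plain HTTP. A minimal client sketch — the base URL, model alias, and response shape follow the OpenAI-compatible convention; none of this is Forge's actual internal code:

```typescript
const FORGE_URL = "http://10.10.12.1:8642/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the OpenAI-style request body for /v1/chat/completions.
function buildChatRequest(model: string, messages: ChatMessage[], maxTokens = 512) {
  return { model, messages, max_tokens: maxTokens, stream: false };
}

// POST to Forge and return the first completion's text.
async function chat(model: string, messages: ChatMessage[]): Promise<string> {
  const res = await fetch(`${FORGE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(model, messages)),
  });
  if (!res.ok) throw new Error(`Forge returned ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```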

LLM models

Four models running simultaneously on local hardware. The 70B flagship is tensor-split across both machines via RPC. Each model is purpose-matched to its workload.

Llama 3.3 70B (Primary)
Q4_K_M · port 8080
2 slots · 32K context per slot · RPC tensor split across Furnace + Crucible
Agent reasoning, complex generation

Qwen 3.5 35B-A3B (Fast MoE)
Q4_K_M · port 8088
Mixture of Experts · 2 slots · 32K context
Fast classification, triage, routing · agent priority slot for cycle work

Qwen3-VL 32B (Vision)
Q4_K_M · port 8081
Vision-language model · photo description, OCR, visual QA
163K photos · 255K music plays · 340 biography facts

nomic-embed-text v1.5 (Embeddings)
f16 · port 8082
768-dimensional embeddings · semantic search across all data
Knowledge, photos, contacts, messages
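Under the hood, "semantic search" over these 768-dim vectors is nearest-neighbor ranking by cosine similarity. In production the comparison happens inside PostgreSQL via pgvector rather than application-side, but the math itself is simple enough to sketch:

```typescript
// Cosine similarity between two embedding vectors: 1 = identical
// direction, 0 = orthogonal (unrelated), -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```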

The service layer

Ten specialized services running alongside the core LLM models. Each handles a specific modality — vision, audio, identity, search — all accessible through Forge's unified API.

VLM (Vision Language Model) · Photo description, visual QA, OCR · :8081
Embed (Embeddings) · Semantic search, 768 dimensions · :8082
Whisper · Speech-to-text transcription · :8083
InsightFace · Face detection and recognition · :8084
Reranker · Search result reranking · :8085
OCR (Florence-2) · Document text extraction · :8086
TTS (Chatterbox) · Text-to-speech synthesis · :8087
Diarization · Speaker identification · :8089
Sentiment · LLM-powered sentiment scoring · via LLM
Image Gen · ComfyUI / FLUX.1 on Crucible · :8088 · planned

Fourteen sources, one graph

Everything a digital life touches flows through the same ingestion pipeline: normalize → deduplicate → store → enrich (sentiment, embeddings, knowledge graph) → proactive rules. Sources run on different cadences — some real-time, some scheduled, some manually imported archives — but they all land in the same PostgreSQL database and are queryable as one dataset.

iMessage · MCP · 60s · 98K messages
Gmail · MCP · 90s · 59K emails
Apple Photos · MCP · 15 min · 163K photos
Google Calendar · MCP · 5 min · live sync
Contacts · iCloud DB · 6h · 142 contacts
Looki wearable · API · 60s · 1,657 moments
Apple Health · Relay sync · 4M records
Strava · API · 363 activities
Spotify history · JSON dump · 187K plays
Last.fm · JSON dump · 68K scrobbles
Apple Music · Biome SEGB · 6h · 8.6K tracks
Netflix history · CSV · 3.8K views
Travel archives · Flights · Trips · 347 records
Historical archives · Blog · BlackBerry · Guestbook · 3.5K records

Each source lands in its own table, gets normalized into ingestion_log, then fans out through the enrichment pipeline: sentiment analysis, 768-dim embeddings, knowledge graph extraction, and GPS reverse-geocoding where applicable. The proactive engine watches all sources and surfaces significant events to agents via the inbox. Nothing is truly siloed — a photo cluster can cross-reference a trip, which cross-references messages with a person, which cross-references a song played during that week.
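The deduplicate step above hinges on giving every normalized record a stable fingerprint before it lands in ingestion_log. A sketch of that idea — the field names and hashing scheme are illustrative, not the actual schema:

```typescript
import { createHash } from "node:crypto";

// A normalized record, post-source-specific parsing (illustrative shape).
interface NormalizedRecord {
  source: string;     // e.g. "imessage", "gmail"
  externalId: string; // the source system's own id
  body: string;
  ts: string;
}

// Stable content fingerprint: same source + id + body always hashes the same.
function fingerprint(rec: NormalizedRecord): string {
  return createHash("sha256")
    .update(`${rec.source}:${rec.externalId}:${rec.body}`)
    .digest("hex");
}

// Drop exact duplicates within a batch before insertion.
function dedupe(records: NormalizedRecord[]): NormalizedRecord[] {
  const seen = new Set<string>();
  return records.filter((r) => {
    const fp = fingerprint(r);
    if (seen.has(fp)) return false;
    seen.add(fp);
    return true;
  });
}
```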

One message, end to end

Watch a single iMessage flow through every stage of the system — from arrival to action — in about thirty seconds. This is the same path every piece of incoming data takes.

A sample message ("Hey, are you free tomorrow?") moving through all eight stages:

01 Message arrives via the iMessage MCP poll (60s)
02 Pipeline normalizes it and writes to ingestion_log
03 Embedding generated (nomic-768, :8082) and stored in pgvector
04 Sentiment classified via the Forge priority LLM (:8088)
05 Entities and facts extracted into the knowledge graph
06 Proactive engine (PIE) scores significance
07 If significant, NOTIFY fires on ARIA's inbox channel (LISTEN/NOTIFY)
08 ARIA wakes mid-cycle, decides what to do, calls tools
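Step 06 is the judgment call in that chain. The actual PIE scoring model isn't shown here, so this sketch invents both the signals and the weights purely to illustrate the shape of a score-then-threshold decision:

```typescript
// Hypothetical significance signals — not the real PIE feature set.
interface Signals {
  sentimentMagnitude: number; // 0..1, how emotionally charged
  senderImportance: number;   // 0..1, relationship weight
  containsQuestion: boolean;  // direct asks tend to need action
}

// Weighted sum of signals; weights are invented for illustration.
function significance(s: Signals): number {
  return 0.4 * s.sentimentMagnitude + 0.4 * s.senderImportance + (s.containsQuestion ? 0.2 : 0);
}

// Deliver to the agent inbox only above a threshold.
function shouldDeliver(s: Signals, threshold = 0.5): boolean {
  return significance(s) >= threshold;
}
```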

Meet the agents

Seven specialist agents, each with a distinct role and personality, work alongside ARIA, the coordinator — eight in all. They coordinate through an inbox system, run on configurable schedules, and operate under graduated autonomy levels.

Pipeline
Monitors 14 data sources and keeps enrichment pipelines flowing
Infra
Guards infrastructure, job queues, databases, and security
Inference
Manages five local LLM instances across two machines
Coder
Reviews code, audits dependencies, and submits pull requests
Insight
Analyzes patterns across 131K knowledge facts and personal data
Circle
Tracks relationships, detects drift, and prepares meeting briefings
Chronicler
Biographical intelligence — cross-references 22 years of data to build the life story

What the agents are actually doing

Every agent cycle is a complete loop: load state → reason via LLM → pick tool calls → execute → log the outcome. Below is one real decision pulled live from the database, sanitized by Claude Opus, and walked through step by step. Underneath that, a feed of the last 24 hours of agent activity — same source, same sanitizer.


Lately

Last 24 hours · sampled · sanitized


The MCP connection web

Ten MCP servers provide 328 tools to the agent team. Each server connects external services — Gmail, GitHub, Notion, iMessage — into a unified tool catalog that agents discover on demand.

NEXUS
328 tools · 10 servers · one protocol

Agent architecture

Each agent runs as a short-lived cycle: load soul package, gather state, call Forge, parse actions, execute tools, log decision. Up to 5 rounds per cycle, with graduated autonomy and approval gates.

Soul Spec — Agent Persona Packages

Agent personas follow ClawSouls Soul Spec v0.5. Each agent has a soul package directory containing standardized files that define identity, behavior, and operational context.

The framework uses progressive disclosure — Level 1 loads only the manifest for quick discovery, Level 2 loads core behavior files every cycle, and Level 3 adds coordination context only when the agent has unread inbox messages. Most cycles are quiet, so Level 2 keeps token costs minimal.
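The three levels can be sketched as a file-selection function. Only CHECKLIST.md is a filename the text actually names; the manifest, identity, and coordination filenames below are placeholders:

```typescript
// Which soul-package files a cycle loads, by disclosure level.
function soulFilesFor(level: 1 | 2 | 3): string[] {
  const level1 = ["manifest.yaml"];                          // quick discovery
  const level2 = [...level1, "IDENTITY.md", "CHECKLIST.md"]; // every cycle
  const level3 = [...level2, "COORDINATION.md"];             // inbox work only
  return level === 1 ? level1 : level === 2 ? level2 : level3;
}

// Most cycles are quiet, so Level 2 is the default.
function disclosureLevel(unreadInboxMessages: number): 2 | 3 {
  return unreadInboxMessages > 0 ? 3 : 2;
}
```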

Below is the actual content of every agent's soul package, loaded straight from disk. Pick an agent, pick a file, see exactly what their LLM cycles get:


The soul loader caches manifests per process and re-reads files on worker restart. Prompt changes need no migration: edit the file, restart the service, and the new content is picked up.

Checklists — Mandatory Tasks

Agents have mandatory tasks enforced at two levels. CHECKLIST.md is hardcoded in the soul package and loaded into every LLM prompt — numbered steps that must be completed each cycle, in order. The agent_checklists table adds configurable daily, weekly, and monthly items tracked in the database.

souls/pipeline/CHECKLIST.md (excerpt)
## Every Cycle (MANDATORY)

1. **Sync handler health.** Verify all sync handlers
   ran recently.
2. **Ingestion flow.** Verify data is flowing across
   all sources.
3. **Enrichment backlogs -- FILL THE QUEUE.** Check
   actual data gaps, not just pending job counts.
   If gap > 500, enqueue batch jobs.
4. **Daily checklist.** Call get_checklist, complete
   each item.
5. **Report status.** Summarize what you found AND
   what you did about it.

Agents use get_checklist and complete_checklist_item tools to query and mark items done. 50+ checklist items across 8 agents — morning briefings, security scans, behavioral summaries, VRAM reviews, code audits, biographical interviews, and more.

Working Memory — Ephemeral State

Agents emit <working_memory> blocks in their LLM responses to persist short-term state across cycles. Unlike permanent memories, working memory is ephemeral — it tracks in-progress investigations, multi-cycle task context, and running observations.

<working_memory>
Photo backlog was 3200 last cycle.
Triggered 6 describe batches.
Check reduction next cycle.
Gmail auth token expires in 48h -- monitoring.
</working_memory>

The framework extracts the content, stores it in agent_registry.working_memory, and re-injects it into the state message on the next cycle. Working memory auto-clears on clean cycles (status=ok, no actions) — ensuring agents don't carry stale context.
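The extract-and-reinject step described above is straightforward string surgery. A sketch — the regex approach is an assumption, not the framework's actual parser:

```typescript
// Pull the <working_memory> block out of an LLM response, returning both
// the memory payload (for agent_registry.working_memory) and the response
// text with the block removed.
function extractWorkingMemory(response: string): { memory: string | null; rest: string } {
  const match = response.match(/<working_memory>([\s\S]*?)<\/working_memory>/);
  if (!match) return { memory: null, rest: response };
  return {
    memory: match[1].trim(),
    rest: response.replace(match[0], "").trim(),
  };
}
```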

Long-term Memory — Knowledge Persistence

The agent_memory table stores permanent knowledge with confidence scoring, supersedence chains, and reinforcement tracking. Agents save observations, baselines, and findings using the remember tool.

When an agent's state message exceeds 4000 tokens, auto-summarization compresses the memories section via a lightweight LLM call before the main cycle runs. The consolidate_memories tool uses an LLM to identify clusters of redundant memories and merge them into consolidated entries, deactivating the originals.

The system preserves key facts while dropping redundant observations — preventing context window overflow as agents accumulate knowledge over weeks and months.

Delegation — Cross-Agent Consultation

The delegate_task tool enables synchronous cross-agent consultation within a single cycle. Instead of sending an inbox message and waiting 15 minutes for the next cycle, an agent can query another agent's expertise in real time.

// ARIA checking Forge health via Inference
delegate_task({
  agent_id: "inference",
  prompt: "Is Forge healthy right now? Any latency concerns?"
});
// Response in ~5-10 seconds

The target agent's soul package and memories are loaded, a single LLM call is made, and the response returns immediately. Delegation is for reasoning queries — no tool execution. Async fallback via inbox when LLM slots are busy.

Communication — Inbox & Event System

All agent-to-agent communication flows through agent_inbox with priority levels. Messages include context, trace IDs for debugging, and support for urgent routing.

LISTEN/NOTIFY: PostgreSQL pub/sub triggers immediate agent wake-ups when inbox messages arrive. A 30-second debounce prevents storms, but critical priority messages bypass the debounce entirely. This means urgent escalations reach the right agent in under a minute, not at the next scheduled cycle.
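The debounce-plus-bypass rule is the interesting part of that wake-up path. A sketch with an injected clock so the logic is testable — the 30-second window and critical bypass come from the description above; the class shape is invented:

```typescript
type Priority = "low" | "normal" | "critical";

// Coalesces wake-ups into a debounce window; critical messages bypass it.
class WakeGate {
  private lastWake = -Infinity;
  constructor(
    private windowMs = 30_000,
    private now: () => number = Date.now,
  ) {}

  shouldWake(priority: Priority): boolean {
    const t = this.now();
    if (priority === "critical" || t - this.lastWake >= this.windowMs) {
      this.lastWake = t;
      return true;
    }
    return false; // still inside the debounce window
  }
}
```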

The inbox is capped at 10 messages with a 7-day expiry. Agents process messages in priority order and can mark them as read or acted upon. The system includes per-tool approval gating: high-risk tools like restart_service always require human approval regardless of agent autonomy level.

The autoscaler

When enrichment backlogs grow, ephemeral workers spawn on both machines. When the queue drains, they disappear. No orchestrator, no containers — just systemd transient units.


Pipeline detects a backlog of 3,200 unembedded photos. It floods the job queue. The autoscaler on Furnace and Crucible detects >50 pending jobs and spawns ephemeral bulk workers — up to 4 per machine. Workers pull jobs with FOR UPDATE SKIP LOCKED, process them in parallel, and exit when the queue is empty. Total throughput: ~185 jobs in 90 seconds.
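The spawn decision itself is a small pure function. The >50-pending trigger and the 4-per-machine cap come from the description above; the jobs-per-worker ratio is invented for illustration:

```typescript
// How many ephemeral bulk workers to spawn on one machine.
function workersToSpawn(
  pendingJobs: number,
  runningWorkers: number,
  maxPerMachine = 4,
): number {
  if (pendingJobs <= 50) return 0;              // backlog too small to scale
  const desired = Math.min(maxPerMachine, Math.ceil(pendingJobs / 100));
  return Math.max(0, desired - runningWorkers); // only the delta gets spawned
}
```

With Pipeline's 3,200-photo backlog, both machines hit the cap immediately; once workers are running and the queue drains below the trigger, the function returns 0 and the transient units exit on their own.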

Every commit, every repo

Nexus didn't appear out of thin air — it was built on top of months of related work across a constellation of personal repos. Each bar shows commits per repo per month across the whole portfolio. The whole thing is a recent push: roughly a thousand commits between the initial idea and what's running today.

How it happened

It started as a personal AI assistant. March 12, 2026 — I scaffolded the first version of ARIA, a single-agent system that could triage my email and send me a morning briefing. It worked. And then it needed more.

ARIA couldn't do everything alone. She needed someone watching the infrastructure while she handled messages. Someone monitoring the LLM services. Someone tracking data pipelines. Agents started appearing — first Keeper, then Monitor, then a parade of specialists. I built Chancery as the management dashboard to keep track of them all. Within a week I had six agents, a growing job queue, and a tight-coupling problem that was about to break.

The agents got stuck. Monitoring loops — an agent would detect a minor anomaly, escalate it, get a response, detect it again, escalate it again. Escalation storms — three agents all noticing the same issue and flooding ARIA's inbox simultaneously. I learned the hard way that prompt constraints weren't enough. You can tell an agent "don't escalate minor issues" in its system prompt, and it will do it anyway the moment something looks slightly off. The fix was framework-level enforcement: inbox caps, debounce timers, priority gates, approval workflows baked into the tool executor, not the prompt.

That's when I built Nexus. March 30 — a clean monorepo, a shared database (migrated off AWS RDS onto local PostgreSQL), a unified tool catalog, and a proper agent runtime with working memory, checklists, and graduated autonomy. The agents got renamed, restructured, given soul packages instead of monolithic prompts. The reactive monitoring model gave way to something more deliberate: a family office, where each agent has a defined role, a mandatory checklist, and the discipline to report what they found rather than chase every anomaly.

March 12, 2026
ARIA Phase 1 scaffold
Single-agent personal assistant. Email triage, morning briefings, basic task management.
March 25
Forge initial service
Local LLM inference gateway. Needed to stop paying cloud API costs and control the stack end-to-end.
March 28
Chancery bootstrap
ARIA needed help. Agents started appearing — a management dashboard to see what they were all doing.
March 30
Nexus monorepo initialized
Loose coupling, distributed architecture. Shared services, unified tool catalog, proper agent runtime.
March 31
Database migration
Moved from AWS Aurora RDS to local PostgreSQL on Furnace. 254 tables, zero cloud dependency.
April 1
Data pipeline validation
PIE operational. 14 data sources active. Proactive Intelligence Engine detecting patterns and delivering insights.
April 2
Llama 70B, Soul Spec v0.5, autoscaler
Deployed Llama 3.3 70B across both machines via RPC tensor split. Agent rename v3 — from roles to personalities. Soul Spec v0.5 replaced monolithic prompts. The family office model crystallized.
April 3
Reporting, avatars, showcase
Chancery reporting pages, animated agent avatars via Runway, intro videos, and this showcase site.
April 4
Mass biographical ingest
Pulled in Last.fm, Netflix, Apple Music, Apple Notes, Apple Contacts, BlackBerry SMS archives from 2009, the niclydon.com family guestbook from 2004, and Chronicler interview sessions. The platform suddenly had 22 years of life data to work with.
April 5
Semantic search and life chapters
Unified semantic search across 28 embedded tables with Forge reranking. Claude Opus generated 16 life chapters spanning childhood through 2026, plus a unified life arc and movie poster. Launched mystory.nicholaslydon.com.
April 6
Health watchdog and showcase rebuild
Built the platform-health-watchdog timer to monitor 12 services every 60 seconds with Pushover alerts. Tracked down a process-group signal leak in the autoscaler that had been silently killing siblings. Rebuilt this showcase site from scratch.
April 7
Sentiment backfill at scale
Sentiment-backfill went from 148 rows to 174,000 across 16 communication channels in a single afternoon. Switched generation tier to Claude Haiku, classification tier to Gemini Flash Lite. Knowledge entities reached 100% LLM-summarized.
April 8
Topic modeling and platform hardening
K-means topic modeling across 590,000 embeddings produced 510 labeled clusters via Python sklearn. Found and fixed an Anthropic SDK timeout bug causing 6-minute hangs. Hardened the deploy script after a missing dist file left the worker crash-looping. Newsletter contamination filter, Pushover content-fingerprint dedup, gmail OAuth re-auth.

Chat with ARIA from your pocket. Native iOS app with SSE streaming, HealthKit sync, and location tracking.

Swift · SwiftUI · Background processing · Push notifications via Pushover

Architecture

Two machines, one database, unified tool execution

Furnace

Primary · AMD Strix Halo · 96GB VRAM

nexus-api:7700
nexus-worker (primary)
nexus-mcp:7701
Forge API:8642
llama-server (70B):8080
llama-priority (35B MoE):8088
llama-vlm (VL-32B):8081
llama-embed:8082
Whisper, TTS, InsightFace, OCR, Reranker
PostgreSQL:5432
Caddy, Prometheus, Grafana
TB5 40Gbps

Crucible

Satellite · AMD Strix Halo · 64GB VRAM

llama-rpc-worker
crucible-bulk (autoscaled)
ComfyUI / FLUX.1 · planned
RPC Tensor Split
Llama 70B layers 41–80 served from Crucible VRAM via llama.cpp RPC over Thunderbolt 5

Anvil (M4 Mac Mini)

ARIA Relay
iMessage, Photos, Apple Ecosystem MCP
SSH stdio bridge from Furnace

PostgreSQL · nexus

255 tables · 298K knowledge facts · 255K music plays · 163K photos · 98K messages · 59K emails · 8.8K daily profiles · localhost:5432