Local Smartz
Local-first multi-agent research system on Ollama. Single DeepAgent + 8 custom tools. Web UI, CLI, and a SwiftUI macOS app — no cloud LLM dependency.
The Problem
Cloud LLM APIs are great until they’re not — privacy-sensitive research, air-gapped environments, or just a desire to control the bill. Most agent frameworks assume an OpenAI-shaped API and fail interestingly when you point them at a local model. Smaller local models also break in different ways than frontier ones: silent tool-call drops, stringified JSON arguments, and runaway loops on ambiguous prompts.
What I Built

A local-first port of the multi-agent research patterns from Stratagem. Single DeepAgent (LangChain / LangGraph) handles orchestration with built-in write_todos planning, subagent spawning via the task tool, and filesystem-based context offloading. Eight custom tools cover web search, page scraping, PDF/spreadsheet/text parsing, sandboxed Python execution, and report/spreadsheet generation.
Hardware Profiles
Two profiles, auto-detected by RAM:
| Profile | Planning model | Execution model | When |
|---|---|---|---|
| full | llama3.1:70b-instruct-q5 | qwen2.5-coder:32b-instruct-q5 | 128 GB Mac |
| lite | qwen3:8b-q4 | qwen3:8b-q4 | 20 GB M4, single model |
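RAM-based selection between the two profiles can be sketched as below. This is a minimal illustration, not the project's actual detection code: the 64 GB cutoff, the `PROFILES` dictionary, and both function names are assumptions, since the table only names the two target machines.

```python
import os

# Hypothetical cutoff: the table only says "full" targets a 128 GB Mac and
# "lite" a 20 GB M4, so the 64 GB boundary is an assumption.
FULL_PROFILE_MIN_GB = 64

# Model tags copied from the table above.
PROFILES = {
    "full": {
        "planning": "llama3.1:70b-instruct-q5",
        "execution": "qwen2.5-coder:32b-instruct-q5",
    },
    "lite": {"planning": "qwen3:8b-q4", "execution": "qwen3:8b-q4"},
}


def total_ram_gb() -> float:
    """Best-effort physical RAM in GiB; 0.0 if the platform does not report it."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 2**30
    except (AttributeError, ValueError, OSError):
        return 0.0


def pick_profile(ram_gb: float) -> str:
    """Return "full" at or above the cutoff, otherwise "lite"."""
    return "full" if ram_gb >= FULL_PROFILE_MIN_GB else "lite"
```

Falling back to `lite` when RAM cannot be read keeps the failure mode cheap: the smaller model loads anywhere the larger one would.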
The lite profile gets its own system prompt (one-tool-per-turn, numbered steps, few-shot examples, no subagent refs), a reduced 5-tool whitelist, runtime turn-cap, and a loop detector. Local-model failure modes get explicit mitigation rather than wishful prompting.
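A runtime turn cap and loop detector could look roughly like this. The class name, default limits, and the "same tool with the same arguments" definition of a loop are all assumptions made for illustration; the source does not say how its detector decides.

```python
from collections import deque
from typing import Optional


class LoopGuard:
    """Stops a run when the turn budget is spent or a tool call keeps repeating."""

    def __init__(self, max_turns: int = 12, window: int = 4, max_repeats: int = 3):
        self.max_turns = max_turns
        self.max_repeats = max_repeats
        self.turns = 0
        self.recent = deque(maxlen=window)  # last few (tool, args) signatures

    def check(self, tool_name: str, args: dict) -> Optional[str]:
        """Record one tool call; return a stop reason, or None to continue."""
        self.turns += 1
        if self.turns > self.max_turns:
            return "turn_cap"
        # Sort the items so argument order does not hide a repeated call.
        signature = (tool_name, repr(sorted(args.items())))
        self.recent.append(signature)
        if self.recent.count(signature) >= self.max_repeats:
            return "loop"
        return None
```

Calling `check` before each tool dispatch lets the runner abort with a clear reason instead of relying on the model to notice it is stuck.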
Surfaces
- CLI: `localsmartz "question"`, interactive REPL, `--thread <name>` for resumable research, `--list-threads`
- HTTP server: `localsmartz --serve` runs an SSE-streaming server on `127.0.0.1:11435` (stdlib `http.server`, no extra deps)
- macOS app: SwiftUI wrapper around the Python backend. NavigationSplitView for thread history, streaming output, MenuBarExtra. Subprocess-launched backend, SSE streamed via `URLSession.bytes`. Builds via XcodeGen + Xcode 14+, ships as a DMG.
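For readers unfamiliar with SSE on the standard library, here is a minimal sketch of how a stdlib-only streaming endpoint can be framed. The handler class, the event name, and the placeholder chunks are hypothetical; only the address and the use of `http.server` come from the description above.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def sse_frame(event: str, data: dict) -> bytes:
    """Encode one Server-Sent Events frame: event name, JSON payload, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n".encode("utf-8")


class ResearchHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that streams a fixed pair of events."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        # Placeholder chunks; a real backend would stream agent output here.
        for chunk in ("thinking", "done"):
            self.wfile.write(sse_frame("message", {"text": chunk}))
            self.wfile.flush()  # push each frame out immediately


def make_server(port: int = 11435) -> ThreadingHTTPServer:
    """Bind to loopback only, matching the 127.0.0.1 address in the text."""
    return ThreadingHTTPServer(("127.0.0.1", port), ResearchHandler)
```

Running `make_server().serve_forever()` would start it. A client that reads the response line by line, as the macOS app does with `URLSession.bytes`, sees each frame as soon as it is flushed.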

Key Design Decisions
Calculation policy is non-negotiable — every numeric answer flows through python_exec, never LLM-generated arithmetic. Local models hallucinate numbers reliably.
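The idea of routing arithmetic through an interpreter rather than the model can be illustrated with a small stand-in. This is not the project's `python_exec` tool: the function name, the timeout, and the return shape are assumptions, and a subprocess with a timeout is process isolation, not a real sandbox.

```python
import subprocess
import sys


def run_python_snippet(code: str, timeout_s: float = 10.0) -> dict:
    """Run a snippet in a separate interpreter and capture its output.

    Hypothetical stand-in for a sandboxed execution tool. "-I" starts Python
    in isolated mode (no user site-packages, no PYTHON* environment variables).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s}s"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
```

With this in place, a question like "what is 1234 times 5678" becomes `run_python_snippet("print(1234 * 5678)")`, and the agent quotes the captured stdout instead of producing digits itself.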
Sync tools, not async — local models behave better with synchronous tool execution. The async-first patterns common in cloud frameworks introduce timing variability the local stack handles poorly.
ChatOllama, not OpenAI compatibility shim — the OpenAI-compat layer silently drops tool calls in streaming mode. langchain-ollama keeps the tool-call channel intact.
Thread context + artifact manifest — ported wholesale from Stratagem. messages.jsonl + context.md per thread, plus an artifact manifest tracking every generated output. Research is resumable across sessions.
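A per-thread layout of this kind can be sketched with a few helpers. The file name `messages.jsonl` comes from the text. The manifest file name `artifacts.json`, the record fields, and the function names are assumptions for illustration.

```python
import json
from pathlib import Path


def append_message(thread_dir: Path, role: str, content: str) -> None:
    """Append one message as a JSON line so history survives across sessions."""
    thread_dir.mkdir(parents=True, exist_ok=True)
    with (thread_dir / "messages.jsonl").open("a", encoding="utf-8") as f:
        f.write(json.dumps({"role": role, "content": content}) + "\n")


def load_messages(thread_dir: Path) -> list:
    """Read the thread history back, oldest first; empty if the thread is new."""
    path = thread_dir / "messages.jsonl"
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def record_artifact(thread_dir: Path, filename: str, kind: str) -> None:
    """Add a generated output to the manifest (hypothetical artifacts.json)."""
    thread_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = thread_dir / "artifacts.json"
    manifest = []
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    manifest.append({"file": filename, "kind": kind})
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

Append-only JSON lines keep writes cheap and make a partially written session recoverable, since every complete line is a valid record on its own.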
Resilient parameter parsing — create_report and create_spreadsheet accept both list-of-dict and stringified JSON arguments. Local models stringify roughly 20% of structured tool calls; rather than fight that, parse both shapes.
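Accepting both shapes comes down to one normalization step at the top of each tool. The helper below is a sketch under assumptions: the name `coerce_rows` and the choice to also tolerate a single object are illustrative, not taken from `create_report` or `create_spreadsheet`.

```python
import json


def coerce_rows(value):
    """Return a list of dicts from a list of dicts or a JSON string encoding one."""
    if isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError as exc:
            raise ValueError(f"argument is not valid JSON: {exc}") from exc
    if isinstance(value, dict):
        value = [value]  # assumption: tolerate a single object as a one-row list
    if not isinstance(value, list) or not all(isinstance(r, dict) for r in value):
        raise ValueError("expected a list of objects")
    return value
```

For example, `coerce_rows('[{"name": "a", "value": 1}]')` and `coerce_rows([{"name": "a", "value": 1}])` return the same list, so the rest of the tool never needs to know which shape the model sent.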