Active · Started Mar 2026

Local Smartz

Local-first multi-agent research system on Ollama. Single DeepAgent + 8 custom tools. Web UI, CLI, and a SwiftUI macOS app — no cloud LLM dependency.

Python Ollama DeepAgents LangChain LangGraph SwiftUI SSE

The Problem

Cloud LLM APIs are great until they’re not — privacy-sensitive research, air-gapped environments, or just a desire to control the bill. Most agent frameworks assume an OpenAI-shaped API and fail interestingly when you point them at a local model. Smaller local models also break in different ways than frontier ones: silent tool-call drops, stringified JSON arguments, and runaway loops on ambiguous prompts.

What I Built

[Figure: Local Smartz macOS app. Research tab with the agent roster (Planner, Researcher, Analyzer, Writer, Fact-checker) in the sidebar and a local gpt-oss:120b model loading into memory.]

A local-first port of the multi-agent research patterns from Stratagem. Single DeepAgent (LangChain / LangGraph) handles orchestration with built-in write_todos planning, subagent spawning via the task tool, and filesystem-based context offloading. Eight custom tools cover web search, page scraping, PDF/spreadsheet/text parsing, sandboxed Python execution, and report/spreadsheet generation.

Hardware Profiles

Two profiles, auto-detected by RAM:

| Profile | Planning model | Execution model | When |
| --- | --- | --- | --- |
| full | llama3.1:70b-instruct-q5 | qwen2.5-coder:32b-instruct-q5 | 128 GB Mac |
| lite | qwen3:8b-q4 | qwen3:8b-q4 | 20 GB M4, single model |
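As a rough illustration of RAM-based auto-detection, the sketch below reads total physical memory and maps it to one of the two profile names. The function names and the 64 GB cutoff are assumptions made for this example; the project's actual threshold is not stated here.

```python
import os

# Hypothetical cutoff: the source only says profiles are auto-detected by RAM.
FULL_PROFILE_MIN_GB = 64


def total_ram_gb() -> float:
    """Total physical memory in GiB, via POSIX sysconf (macOS and Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024**3


def pick_profile(ram_gb: float) -> str:
    """Return 'full' on large-memory machines and 'lite' otherwise."""
    return "full" if ram_gb >= FULL_PROFILE_MIN_GB else "lite"
```

Keeping `pick_profile` separate from the detection call makes the decision easy to test and easy to override from a config flag.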

The lite profile gets its own system prompt (one-tool-per-turn, numbered steps, few-shot examples, no subagent refs), a reduced 5-tool whitelist, runtime turn-cap, and a loop detector. Local-model failure modes get explicit mitigation rather than wishful prompting.
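One way the runtime turn-cap and loop detector could work is sketched below: a small guard that counts turns and flags a tool call repeated with identical arguments. The class name, method name, and default limits are hypothetical, not the project's actual values.

```python
import json
from collections import Counter
from typing import Optional


class LoopGuard:
    """Stops a run that exceeds a turn cap or keeps repeating one tool call."""

    def __init__(self, max_turns: int = 12, max_repeats: int = 3):
        self.max_turns = max_turns      # hypothetical default
        self.max_repeats = max_repeats  # hypothetical default
        self.turns = 0
        self.seen = Counter()

    def check(self, tool_name: str, args: dict) -> Optional[str]:
        """Record one tool call; return a stop reason, or None to continue."""
        self.turns += 1
        if self.turns > self.max_turns:
            return "turn cap reached"
        # Canonical JSON so argument key order cannot hide a repeated call.
        key = (tool_name, json.dumps(args, sort_keys=True, default=str))
        self.seen[key] += 1
        if self.seen[key] >= self.max_repeats:
            return f"loop detected: {tool_name} repeated {self.seen[key]} times"
        return None
```

The agent loop would call `check` before executing each tool call and end the run with a partial answer when it returns a reason.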

Surfaces

  • CLI: localsmartz "question", interactive REPL, --thread <name> for resumable research, --list-threads
  • HTTP server: localsmartz --serve runs an SSE-streaming server on 127.0.0.1:11435 (stdlib http.server, no extra deps)
  • macOS app: SwiftUI wrapper around the Python backend. NavigationSplitView for thread history, streaming output, MenuBarExtra. Subprocess-launched backend, SSE streamed via URLSession.bytes. Builds via XcodeGen + Xcode 14+, ships as a DMG.
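For readers unfamiliar with SSE on the standard library, here is a minimal sketch of a streaming endpoint built on http.server. The /research route, the event names, and the placeholder `run_agent` generator are assumptions for illustration; only the loopback address and port come from the description above.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def format_sse(event: str, data: dict) -> bytes:
    """Encode one Server-Sent Events frame: event line, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n".encode("utf-8")


def run_agent(question: str):
    """Placeholder generator standing in for the real agent loop."""
    yield "status", {"text": f"researching: {question}"}
    yield "done", {"text": "finished"}


class ResearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/research":  # hypothetical route
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        for event, payload in run_agent("example question"):
            self.wfile.write(format_sse(event, payload))
            self.wfile.flush()  # push each frame to the client immediately


def serve(port: int = 11435) -> None:
    """Bind to loopback only and serve until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), ResearchHandler).serve_forever()
```

Calling `serve()` starts the server. A client such as the SwiftUI app can then read the response line by line and dispatch on the `event:` field.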

[Figure: Local Smartz macOS app showing a research run in progress with the live trace queue.]

Key Design Decisions

Calculation policy is non-negotiable — every numeric answer flows through python_exec, never LLM-generated arithmetic. Local models hallucinate numbers reliably.
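A minimal sketch of what a python_exec-style tool might look like, assuming a subprocess with a timeout as the isolation boundary. That is process isolation rather than a hardened sandbox, and the project's actual sandboxing mechanism may differ; the signature and error format here are hypothetical.

```python
import subprocess
import sys


def python_exec(code: str, timeout_s: float = 10.0) -> str:
    """Run a snippet in a separate interpreter and return its stdout.

    Assumption: a child process plus timeout stands in for the real sandbox.
    """
    try:
        result = subprocess.run(
            # -I runs Python in isolated mode (ignores env vars and user site).
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return f"error: timed out after {timeout_s}s"
    if result.returncode != 0:
        return f"error: {result.stderr.strip()}"
    return result.stdout.strip()
```

With a tool like this, the model writes `print(1234 * 5678)` and quotes the tool output instead of producing the product itself.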

Sync tools, not async — local models behave better with synchronous tool execution. The async-first patterns common in cloud frameworks introduce timing variability the local stack handles poorly.

ChatOllama, not OpenAI compatibility shim — the OpenAI-compat layer silently drops tool calls in streaming mode. langchain-ollama keeps the tool-call channel intact.

Thread context + artifact manifest — ported wholesale from Stratagem. messages.jsonl + context.md per thread, plus an artifact manifest tracking every generated output. Research is resumable across sessions.
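The per-thread layout could be sketched roughly as follows. The file names messages.jsonl and context.md come from the description above; the manifest file name, the class, and its methods are hypothetical.

```python
import json
from pathlib import Path


class ThreadStore:
    """Per-thread directory holding messages.jsonl, context.md, and a manifest."""

    def __init__(self, root, thread: str):
        self.dir = Path(root) / thread
        self.dir.mkdir(parents=True, exist_ok=True)
        self.messages = self.dir / "messages.jsonl"
        self.context = self.dir / "context.md"
        self.manifest = self.dir / "artifacts.json"  # hypothetical file name

    def append_message(self, role: str, content: str) -> None:
        # One JSON object per line, so resuming is a simple line-by-line read.
        with self.messages.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"role": role, "content": content}) + "\n")

    def load_messages(self) -> list:
        if not self.messages.exists():
            return []
        with self.messages.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def record_artifact(self, path: str, kind: str) -> None:
        # Read-modify-write is fine for a single local process.
        entries = json.loads(self.manifest.read_text()) if self.manifest.exists() else []
        entries.append({"path": path, "kind": kind})
        self.manifest.write_text(json.dumps(entries, indent=2))
```

Re-opening a `ThreadStore` with the same thread name in a later session picks up the earlier messages, which is the behavior `--thread <name>` relies on.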

Resilient parameter parsing: create_report and create_spreadsheet accept both list-of-dict and stringified JSON arguments. Local models stringify roughly 20% of structured tool calls; rather than fight that, parse both shapes.
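Accepting both shapes can be as small as one normalizing helper called at the top of each tool. The sketch below uses a hypothetical `coerce_rows` function; the actual tools may validate differently.

```python
import json
from typing import Any


def coerce_rows(value: Any) -> list:
    """Accept a list of dicts or a JSON string encoding one; return the list."""
    if isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError as exc:
            raise ValueError(f"rows is not valid JSON: {exc}") from exc
    if not isinstance(value, list) or not all(isinstance(r, dict) for r in value):
        raise ValueError("rows must be a list of objects")
    return value
```

Raising a clear `ValueError` gives the model a readable error to retry against, instead of a failure deep inside the report writer.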