Active · Started Dec 2025
Omniparse
Universal document parser. Converts Excel, PowerPoint, Python, PDF, and whole directories into LLM-ready Markdown and structured data.
TypeScript · Next.js 16 · React 19 · Prisma · SQLite · Tailwind v4
The Problem
Most LLM workflows need to ingest documents — Excel sheets, slide decks, Python source, scanned PDFs, entire directory trees. The default move is to glue together five different parsers, each with its own dependency tree, error model, and output shape. The output usually still needs cleanup before a model can read it.
What I Built
@tyroneross/omniparse is a single SDK + CLI that takes any of those inputs and emits clean Markdown plus structured JSON. It is published to npm as v1.0.0. The CLI is omniparse <path>; the SDK is a typed function call.
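The package's actual exports are not shown on this page, so the following is only a sketch of what "a typed function call" that returns Markdown plus structured JSON could look like. The names `InputKind`, `ParseResult`, `kindFromPath`, and `parseStub` are hypothetical, and the extension mapping is an assumption. None of this is the published SDK's API.

```typescript
// Hypothetical shapes for illustration only; not the @tyroneross/omniparse API.
type InputKind = "excel" | "powerpoint" | "pdf" | "python" | "unknown";

interface ParseResult {
  kind: InputKind;
  markdown: string;              // LLM-ready text
  data: Record<string, unknown>; // structured JSON that travels with the Markdown
}

// Assumed rule: choose a parser kind from the file extension.
function kindFromPath(filePath: string): InputKind {
  const dot = filePath.lastIndexOf(".");
  const ext = dot === -1 ? "" : filePath.slice(dot + 1).toLowerCase();
  switch (ext) {
    case "xlsx": return "excel";
    case "pptx": return "powerpoint";
    case "pdf":  return "pdf";
    case "py":   return "python";
    default:     return "unknown";
  }
}

// Stub standing in for a real parser: it only fills in the result shape.
function parseStub(filePath: string): ParseResult {
  const kind = kindFromPath(filePath);
  return {
    kind,
    markdown: `# ${filePath}\n\n(${kind} content would be rendered here)`,
    data: { source: filePath },
  };
}
```

A caller would branch on `kind` and pass `markdown` to the model, keeping `data` for anything that needs exact values rather than prose.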
Monorepo
| Package | Purpose | Stack |
|---|---|---|
| packages/sdk | Core parsing + CLI binary omniparse | TypeScript 5, tsup, xlsx, sax, p-limit |
| packages/web | Web app for upload + browse | Next.js 16, React 19, Prisma 7, better-sqlite3, Radix UI, Tailwind 4 |
| packages/mac | Planned native Mac wrapper | SwiftUI |
The SDK is the contract; the web app and the future Mac app are surfaces over the same parser core.
What it parses
- Excel: xlsx, multi-sheet, formulas resolved to values
- PowerPoint: slide-by-slide Markdown with speaker notes preserved
- PDF: text + structure (heading detection, table extraction)
- Python: module → Markdown with docstrings hoisted
- Directories: recursive walk, file-type-aware, single combined output
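The directory mode in the list above (recursive walk, dispatch by file type, one combined output) can be sketched with Node's standard library alone. This is a minimal illustration under stated assumptions, not Omniparse's implementation: `renderers`, `walk`, and `combineDirectory` are hypothetical names, and the two renderers are stubs standing in for real parsers.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical per-file renderer: turns one file into a Markdown fragment.
type Renderer = (filePath: string) => string;

// Assumed extension table; the stubs stand in for real parsers.
const renderers: Record<string, Renderer | undefined> = {
  ".md": (p) => fs.readFileSync(p, "utf8").trim(),
  ".py": (p) => {
    const lines = fs.readFileSync(p, "utf8").trimEnd().split("\n").length;
    return `Python source (lines: ${lines})`;
  },
};

// Depth-first walk with entries sorted by name so the output order is stable.
function walk(dir: string): string[] {
  const entries = fs
    .readdirSync(dir, { withFileTypes: true })
    .sort((a, b) => a.name.localeCompare(b.name));
  const files: string[] = [];
  for (const entry of entries) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) files.push(...walk(full));
    else if (entry.isFile()) files.push(full);
  }
  return files;
}

// One section per supported file, joined into a single Markdown document.
function combineDirectory(root: string): string {
  const sections: string[] = [];
  for (const file of walk(root)) {
    const render = renderers[path.extname(file).toLowerCase()];
    if (!render) continue; // unsupported file type: skipped in this sketch
    sections.push(`## ${path.relative(root, file)}\n\n${render(file)}`);
  }
  return sections.join("\n\n");
}
```

Skipping unsupported files silently is a choice made for this sketch; a real tool might prefer to list them so the reader knows what was left out.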