Active, started Dec 2025

Omniparse

Universal document parser that converts Excel, PowerPoint, Python, PDF, and whole directories into LLM-ready Markdown and structured data.

TypeScript, Next.js 16, React 19, Prisma, SQLite, Tailwind v4

The Problem

Most LLM workflows need to ingest documents: Excel sheets, slide decks, Python source, scanned PDFs, entire directory trees. The default move is to glue together five different parsers, each with its own dependency tree, error model, and output shape. Even then, the combined output usually needs cleanup before a model can read it.

What I Built

@tyroneross/omniparse — a single SDK + CLI that takes any of those inputs and emits clean Markdown plus structured JSON. Published to npm as v1.0.0. The CLI is omniparse <path>; the SDK is a typed function call.

Monorepo

| Package | Purpose | Stack |
| --- | --- | --- |
| packages/sdk | Core parsing + CLI binary omniparse | TypeScript 5, tsup, xlsx, sax, p-limit |
| packages/web | Web app for upload + browse | Next.js 16, React 19, Prisma 7, better-sqlite3, Radix UI, Tailwind 4 |
| packages/mac | Planned native Mac wrapper | SwiftUI |

The SDK is the contract; the web app and the future Mac app are surfaces over the same parser core.
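To make the "SDK as contract" idea concrete, here is a minimal sketch of how two surfaces could sit on one parser core. Every name in it (ParseResult, ParseFn, fakeCore, cliSurface, webSurface) is hypothetical and chosen for illustration; none of it is taken from the published @tyroneross/omniparse API, and the core is a synchronous stub only so the sketch stays short.

```typescript
// Hypothetical shapes for illustration only; not the published SDK API.
type SourceKind = "excel" | "powerpoint" | "pdf" | "python" | "directory";

interface ParseResult {
  kind: SourceKind;
  markdown: string;              // LLM-ready Markdown
  data: Record<string, unknown>; // structured JSON emitted alongside it
}

// The contract: each surface depends on this function type and nothing else.
// (Synchronous for brevity; a real parser would more likely be async.)
type ParseFn = (path: string) => ParseResult;

// Stand-in core so the sketch runs without any real parsing.
const fakeCore: ParseFn = (path) => ({
  kind: "python",
  markdown: `# ${path}\n`,
  data: { path },
});

// CLI-style surface: hands back Markdown text to print.
function cliSurface(parse: ParseFn, path: string): string {
  return parse(path).markdown;
}

// Web-style surface: hands back a JSON payload for a browser client.
function webSurface(parse: ParseFn, path: string): string {
  const result = parse(path);
  return JSON.stringify({ kind: result.kind, data: result.data });
}
```

Because both surfaces take the parser as an argument, swapping the stub for a real core (or adding a third surface) would not change either function.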

What it parses

  • Excel — xlsx, multi-sheet, formulas resolved to values
  • PowerPoint — slide-by-slide markdown with notes preserved
  • PDF — text + structure (heading detection, table extraction)
  • Python — module → markdown with docstrings hoisted
  • Directories — recursive walk, file-type-aware, single combined output
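As one illustration of the Excel item above, the sketch below turns sheets into Markdown tables, one section per sheet. It assumes the cells have already been read and any formulas resolved to plain values; the Sheet type and the function names are hypothetical, and this is not the SDK's actual implementation.

```typescript
// Hypothetical types; cell values are assumed to be already-resolved results.
type Cell = string | number | boolean | null;

interface Sheet {
  name: string;
  rows: Cell[][]; // first row is treated as the header
}

function escapeCell(value: Cell): string {
  if (value === null) return "";
  // A raw pipe would split the cell; a newline would split the row.
  return String(value).replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
}

function sheetToMarkdown(sheet: Sheet): string {
  const width = sheet.rows.reduce((max, row) => Math.max(max, row.length), 0);
  if (width === 0) return `## ${sheet.name}\n\n(empty sheet)\n`;

  // Pad ragged rows to the widest row so every line has the same columns.
  const toLine = (row: Cell[]): string => {
    const cells = Array.from({ length: width }, (_, i) => escapeCell(row[i] ?? null));
    return `| ${cells.join(" | ")} |`;
  };

  const header = sheet.rows[0] ?? [];
  const body = sheet.rows.slice(1);
  const separator = `| ${Array.from({ length: width }, () => "---").join(" | ")} |`;

  return [`## ${sheet.name}`, "", toLine(header), separator, ...body.map(toLine), ""].join("\n");
}

// Multi-sheet workbook: one Markdown section per sheet, in order.
function workbookToMarkdown(sheets: Sheet[]): string {
  return sheets.map(sheetToMarkdown).join("\n");
}
```

Escaping pipes and padding short rows are the two cleanup steps that matter most here, since either problem silently breaks a Markdown table for the reader or the model.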