Reads documents other tools can't.

Converts documents into clean, AI-ready Markdown.

On your machine. No upload.

Works with PDF · DOCX · PPTX · XLSX · EPUB · HTML · CSV · JSON · XML · images · audio

Open Source MIT Local-first Windows
MDFlux: drop a document, get clean Markdown in seconds
See it in action · three steps

Select. Convert. Done.

01Select
Select a folder of mixed files

Grab a folder of mixed files. PDFs, Office docs, scans, web pages. No conversion settings to pick.

02Convert
Convert documents to clean Markdown

One drop turns the whole batch into AI-ready Markdown, locally. OCR recovers text from scans others read as empty.

03Stays healthy
Dependency health and diagnostics

Dependency health and diagnostics run on-device. See exactly what's installed and working.

The proof · fewer tokens, lower cost

About 2 to 6× fewer tokens than a vision model.

Every time a document gets read by an LLM, you pay for it in tokens. Sending pages as images is an expensive way to spend them. MDFlux hands the model clean Markdown instead, and the saving lands on every single call that reads the document.

Why clean Markdown costs fewer tokens

You pay for pixels, not words. A page sent to a vision model as an image costs a fixed chunk of tokens, often well over a thousand, no matter how little text is on it. The same page as Markdown is usually a few hundred.

Plain text tokenizes efficiently. Markdown is just text, with no image data, no markup bloat, no base64 blobs.

Clean beats raw. MDFlux strips the broken layout, repeated whitespace, and junk that pad out messy extractions, so you don't spend tokens on noise.

Structure stays compact. Headings, tables, and lists carry the document's meaning in very few tokens.

It compounds. The saving lands on every single call that reads the document, across a whole pipeline or batch.

5.7×lighter than vision on scanned pages

Point a plain extractor at a scanned, image-only PDF and you get zero usable text. MDFlux's OCR recovers it, and even then stays lighter than the vision route.

Scanned, image-only PDFUsable tokens
Plain text extractor0
Vision model (page as image)10,731
MDFlux (OCR → Markdown)1,893

Full text recovered in about 5.7× fewer tokens than the vision model, which still has to OCR the image on its end anyway.

Two to six times fewer tokens than a vision model
What it does

Everything around the engine that makes it usable.

Built on Microsoft's MarkItDown. What MDFlux adds is the OCR, the desktop app, the batching, the reliability, and the privacy-by-default packaging.

MDFlux capability overview
💸

Fewer tokens, lower cost

Clean Markdown costs about 2 to 6× fewer tokens than sending pages to a vision model. Every LLM call that reads the document is cheaper.

🔒

Local and private

Your documents never leave your machine. No cloud, no API key, no account.

🔍

Reads scanned PDFs

Built-in OCR recovers text that plain extractors return as zero characters.

🧱

Real structure

Proper Markdown with headings, tables, and lists intact. Readable, greppable, diff-able.

🖥️

No terminal needed

Portable app. Unzip, run, click through a one-time setup. Done.

📦

Many formats

PDF, DOCX, PPTX, XLSX, EPUB, HTML, CSV, JSON, XML, images, and audio.

🔁

Batch a whole folder

Convert everything at once, with progress, cancellation, and per-file diagnostics.

🧹

Optional cleanup

Off, rule-based, or a local AI pass to tidy up messy extractions.

How it works · three steps

Select. Convert. Done, all on your machine.

The first launch sets up a private, self-contained Python environment (one time, needs internet). Every conversion after that runs fully offline.

01Set up

A one-time, self-contained setup.

MDFlux provisions its own private Python 3.12 environment on first launch. After that it runs fully offline, every time. No terminal, no admin rights, no manual installs.

MDFlux first-run setup screen: provisions a private, self-contained Python environment
02Convert

Drop → clean Markdown.

One drop turns a file or a whole batch into AI-ready Markdown, locally. OCR recovers text from scans that other tools read as empty. Cancel any time, with progress per file.

Converting a document to clean Markdown
03Stays healthy

Everything local & checked.

Dependency health and diagnostics run on-device. You see exactly what's installed and working, and a clear next step instead of a silent empty file when something's off.

MDFlux diagnostics panel: dependency health and per-file status
Cleanup modes · a per-run switch

Off, rule-based, or a local AI pass.

Tidy up messy extraction, or don't. The default is Off: no LLM, fully offline. You decide per run, per batch. Nothing is forced on you.

MDFlux cleanup mode selector: Off, Rule-based, Local AI, and API options

Off default

No LLM. Raw MarkItDown output, exactly as the engine produced it. Fully offline, zero compute overhead.

Rule-based offline

Deterministic cleanup passes, fix broken tables, collapse whitespace, normalise headings. No LLM, no surprise rewrites, always the same output.

Local AI Ollama · offline

An optional pass through a model running on your machine via Ollama. Still offline, still private. Bring your own model, the app doesn't manage weights in v1.

API bring-your-own-key

For when you want a frontier model's cleanup. OpenAI-compatible client, your key, your cost. Clearly marked, never the default, never required.

How it compares

MDFlux vs Microsoft MarkItDown.

MDFlux is built on MarkItDown, which is a genuinely great conversion library. The point isn't to beat the engine, it's to make that engine genuinely usable day to day.

MarkItDownMDFlux
Core conversion engineyesyes (uses MarkItDown)
Scanned / image-only PDFs≈ 0 charactersbuilt-in OCR recovers text
Install and runpip + terminalportable app, no terminal
Dependency setupmanualsets itself up on first launch
Batch a whole folderwrite your own scriptbuilt in, concurrent, with progress
Timeouts and cancelcan hang, no feedbackstreams progress, cancellable
Cleanup modesraw outputOff, rule-based, or local AI
Preview & diagnosticsnonerendered preview + health panel
Audio transcriptionplugin or Azurelocal, built in
Privacylocal if you wire it uplocal by default
MDFlux next to MarkItDown
Who it's for

Built for anyone who feeds documents to a model.

🤖

AI & RAG builders

Feed clean, structured source documents to any model instead of raw text or pricey vision tokens.

🔬

Researchers

Batch-convert papers, reports, and scanned archives into searchable Markdown.

🧑‍💻

Developers

Get diff-able, version-controllable text out of binary document formats.

📝

Writers & analysts

Pull clean copy out of PDFs and Office files without the formatting mess.

🔒

Privacy-conscious

Convert sensitive contracts, records, and decks with nothing ever uploaded.

Supported formats

One app, most of your files.

Documents
  • PDF incl. scanned · OCR
  • EPUB
  • TXT
  • Markdown
Office
  • DOCX
  • PPTX
  • XLSX
Web & data
  • HTML
  • CSV
  • JSON
  • XML
Other
  • Audio → transcript local
  • OCR on images local

Download for Windows. Free. MIT-licensed.

Portable zip. No installer, no admin rights, no account, no cloud.

⬇ Get MDFlux