Reads documents other tools can't.

Converts documents into clean, AI-ready Markdown.

On your machine. No upload.

Works with PDF · DOCX · PPTX · XLSX · EPUB · HTML · CSV · JSON · XML · images · audio

Open Source · MIT · Local-first · Windows

⬇ Download for Windows · See the proof ↓

MDFlux: drop a document, get clean Markdown in seconds

See it in action · three steps

Select. Convert. Done.

01 Select

Grab a folder of mixed files. PDFs, Office docs, scans, web pages. No conversion settings to pick.

02 Convert

One drop turns the whole batch into AI-ready Markdown, locally. OCR recovers text from scans others read as empty.

03 Stays healthy

Dependency health and diagnostics run on-device. See exactly what's installed and working.

The proof · fewer tokens, lower cost

About 2 to 6× fewer tokens than a vision model.

Every time a document gets read by an LLM, you pay for it in tokens. Sending pages as images is an expensive way to spend them. MDFlux hands the model clean Markdown instead, and the saving lands on every single call that reads the document.

Why clean Markdown costs fewer tokens

You pay for pixels, not words. A page sent to a vision model as an image costs a fixed chunk of tokens, often well over a thousand, no matter how little text is on it. The same page as Markdown is usually a few hundred.

Plain text tokenizes efficiently. Markdown is just text, with no image data, no markup bloat, no base64 blobs.

Clean beats raw. MDFlux strips the broken layout, repeated whitespace, and junk that pad out messy extractions, so you don't spend tokens on noise.

Structure stays compact. Headings, tables, and lists carry the document's meaning in very few tokens.

It compounds. The saving lands on every single call that reads the document, across a whole pipeline or batch.
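To make the compounding concrete, here is a minimal arithmetic sketch. The per-page figures are illustrative assumptions chosen to match "well over a thousand" and "a few hundred" above; they are not MDFlux measurements, and `tokens_saved` is a hypothetical helper.

```python
# Illustrative figures only: assumptions, not MDFlux measurements.
VISION_TOKENS_PER_PAGE = 1200    # "often well over a thousand"
MARKDOWN_TOKENS_PER_PAGE = 300   # "usually a few hundred"


def tokens_saved(pages: int, calls: int) -> int:
    """Input tokens saved when every call reads Markdown instead of page images."""
    per_page = VISION_TOKENS_PER_PAGE - MARKDOWN_TOKENS_PER_PAGE
    return per_page * pages * calls


if __name__ == "__main__":
    # A 40-page document read by 25 separate LLM calls.
    print(tokens_saved(pages=40, calls=25))  # 900000
```

The saving scales linearly with both page count and the number of calls, which is why it matters most in pipelines that re-read the same documents.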

5.7× lighter than vision on scanned pages

Point a plain extractor at a scanned, image-only PDF and you get zero usable text. MDFlux's OCR recovers it, and even then stays lighter than the vision route.

Scanned, image-only PDF	Usable tokens
Plain text extractor	0
Vision model (page as image)	10,731
MDFlux (OCR → Markdown)	1,893

Full text recovered in about 5.7× fewer tokens than the vision model, which still has to OCR the image on its end.
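The 5.7× figure follows directly from the table. A short sketch of the arithmetic, using only the token counts quoted above (`reduction_factor` is a hypothetical helper name):

```python
# Token counts copied from the table above.
TOKENS = {
    "Plain text extractor": 0,
    "Vision model (page as image)": 10_731,
    "MDFlux (OCR -> Markdown)": 1_893,
}


def reduction_factor(baseline: int, candidate: int) -> float:
    """How many times fewer tokens the candidate uses than the baseline."""
    if candidate <= 0:
        raise ValueError("candidate must produce at least one token")
    return baseline / candidate


ratio = reduction_factor(
    TOKENS["Vision model (page as image)"],
    TOKENS["MDFlux (OCR -> Markdown)"],
)
print(f"{ratio:.1f}x")  # 5.7x
```

The plain extractor is left out of the ratio on purpose: with zero usable tokens it is not cheaper, it simply has nothing to offer the model.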


What it does

Everything around the engine that makes it usable.

Built on Microsoft's MarkItDown. What MDFlux adds is the OCR, the desktop app, the batching, the reliability, and the privacy-by-default packaging.

💸

Fewer tokens, lower cost

Clean Markdown costs about 2 to 6× fewer tokens than sending pages to a vision model. Every LLM call that reads the document is cheaper.

🔒

Local and private

Your documents never leave your machine. No cloud, no API key, no account.

🔍

Reads scanned PDFs

Built-in OCR recovers text that plain extractors return as zero characters.

🧱

Real structure

Proper Markdown with headings, tables, and lists intact. Readable, greppable, diff-able.

🖥️

No terminal needed

Portable app. Unzip, run, click through a one-time setup. Done.

📦

Many formats

PDF, DOCX, PPTX, XLSX, EPUB, HTML, CSV, JSON, XML, images, and audio.

🔁

Batch a whole folder

Convert everything at once, with progress, cancellation, and per-file diagnostics.

🧹

Optional cleanup

Off, rule-based, or a local AI pass to tidy up messy extractions.

How it works · three steps

Select. Convert. Done, all on your machine.

The first launch sets up a private, self-contained Python environment (one time, needs internet). Every conversion after that runs fully offline.

01 Set up

A one-time, self-contained setup.

MDFlux provisions its own private Python 3.12 environment on first launch. After that it runs fully offline, every time. No terminal, no admin rights, no manual installs.

MDFlux first-run setup screen: provisions a private, self-contained Python environment

02 Convert

Drop → clean Markdown.

One drop turns a file or a whole batch into AI-ready Markdown, locally. OCR recovers text from scans that other tools read as empty. Cancel any time, with progress per file.
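For readers who want to picture what a batch run with per-file progress and cancellation involves, here is a rough sketch. It is not MDFlux's implementation: `convert_batch` is a hypothetical helper, and the `convert` callable stands in for whatever engine does the actual conversion.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from threading import Event
from typing import Callable


def convert_batch(
    paths: list[Path],
    convert: Callable[[Path], str],  # stand-in for the real conversion engine
    cancel: Event,
    workers: int = 4,
) -> dict[Path, str]:
    """Convert files concurrently and return a status string per file."""

    def run_one(path: Path) -> str:
        # Cancellation is checked before a file starts; files already
        # in progress are allowed to finish.
        if cancel.is_set():
            return "cancelled"
        try:
            text = convert(path)
        except Exception as exc:
            return f"failed: {exc}"
        # Flag empty results instead of silently writing an empty file.
        return "ok" if text.strip() else "empty output (scan may need OCR)"

    status: dict[Path, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_one, p): p for p in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            status[path] = future.result()
            print(f"[{done}/{len(paths)}] {path.name}: {status[path]}")
    return status
```

Setting the `cancel` event from another thread (for example, a Cancel button handler) stops any file that has not started yet, while the status dictionary keeps a clear record of what happened to each one.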

03 Stays healthy

Everything local & checked.

Dependency health and diagnostics run on-device. You see exactly what's installed and working, and a clear next step instead of a silent empty file when something's off.

MDFlux diagnostics panel: dependency health and per-file status

Cleanup modes · a per-run switch

Off, rule-based, or a local AI pass.

Tidy up messy extraction, or don't. The default is Off: no LLM, fully offline. You decide per run, per batch. Nothing is forced on you.

MDFlux cleanup mode selector: Off, Rule-based, Local AI, and API options

Off · default

No LLM. Raw MarkItDown output, exactly as the engine produced it. Fully offline, zero compute overhead.

Rule-based · offline

Deterministic cleanup passes: fix broken tables, collapse whitespace, normalise headings. No LLM, no surprise rewrites, always the same output.
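As an illustration of what deterministic passes of this kind can look like, here is a minimal sketch covering whitespace and heading normalisation only. The rules and the `clean_markdown` name are assumptions for the example, not MDFlux's actual rule set, and table repair is left out.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker
HEADING = re.compile(r"^(#{1,6})[ \t]*(?=[^#\s])")   # "##Title" -> "## Title"
INNER_SPACES = re.compile(r"(?<=\S)[ \t]{2,}(?=\S)")  # runs of spaces inside a line
BLANK_RUNS = re.compile(r"\n{3,}")                    # three or more newlines in a row


def clean_markdown(text: str) -> str:
    """Apply fixed, order-independent cleanup rules; same input, same output."""
    out = []
    in_fence = False
    for line in text.splitlines():
        line = line.rstrip()  # trailing whitespace is dropped everywhere
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        elif not in_fence:
            # Heading and spacing rules skip fenced code.
            line = HEADING.sub(r"\1 ", line)
            line = INNER_SPACES.sub(" ", line)
        out.append(line)
    # Blank-line runs are collapsed across the whole document.
    return BLANK_RUNS.sub("\n\n", "\n".join(out)).strip() + "\n"
```

Because every rule is a fixed pattern, running the function twice gives the same result as running it once, which is the property that makes rule-based cleanup safe to apply to a whole batch.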

Local AI · Ollama · offline

An optional pass through a model running on your machine via Ollama. Still offline, still private. Bring your own model; the app doesn't manage weights in v1.
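A local cleanup pass of this kind could be wired up roughly as follows. This is a sketch, not MDFlux's code: it assumes Ollama is listening on its default local port, the prompt wording is invented for the example, and the model name is whatever you have pulled yourself.

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_cleanup_request(markdown: str, model: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally running model."""
    payload = {
        "model": model,  # a model you have already pulled with Ollama
        # Hypothetical prompt wording, for illustration only.
        "prompt": "Tidy this Markdown without changing its meaning:\n\n" + markdown,
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_cleanup(markdown: str, model: str, timeout: float = 120.0) -> str:
    """Send the request and return the model's text (requires Ollama running)."""
    request = build_cleanup_request(markdown, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Splitting request construction from sending keeps the network call in one place, so the timeout and any error handling can be adjusted without touching the prompt logic.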

API · bring-your-own-key

For when you want a frontier model's cleanup. OpenAI-compatible client, your key, your cost. Clearly marked, never the default, never required.

How it compares

MDFlux vs Microsoft MarkItDown.

MDFlux is built on MarkItDown, a genuinely great conversion library. The point isn't to beat the engine; it's to make that engine usable day to day.

	MarkItDown	MDFlux
Core conversion engine	yes	yes (uses MarkItDown)
Scanned / image-only PDFs	≈ 0 characters	built-in OCR recovers text
Install and run	pip + terminal	portable app, no terminal
Dependency setup	manual	sets itself up on first launch
Batch a whole folder	write your own script	built in, concurrent, with progress
Timeouts and cancel	can hang, no feedback	streams progress, cancellable
Cleanup modes	raw output	Off, rule-based, or local AI
Preview & diagnostics	none	rendered preview + health panel
Audio transcription	plugin or Azure	local, built in
Privacy	local if you wire it up	local by default

Who it's for

Built for anyone who feeds documents to a model.

🤖

AI & RAG builders

Feed clean, structured source documents to any model instead of raw text or pricey vision tokens.

🔬

Researchers

Batch-convert papers, reports, and scanned archives into searchable Markdown.

🧑‍💻

Developers

Get diff-able, version-controllable text out of binary document formats.

📝

Writers & analysts

Pull clean copy out of PDFs and Office files without the formatting mess.

🔒

Privacy-conscious

Convert sensitive contracts, records, and decks with nothing ever uploaded.

Supported formats

One app, most of your files.

Documents

PDF incl. scanned · OCR
EPUB
TXT
Markdown

Office

DOCX
PPTX
XLSX

Web & data

HTML
CSV
JSON
XML

Other

Audio → transcript · local
OCR on images · local

Download for Windows. Free. MIT-licensed.

Portable zip. No installer, no admin rights, no account, no cloud.

⬇ Get MDFlux