Vox: local speech-to-text for macOS

Vox turns speech into text using AI models that run on your machine. No audio leaves your device. No internet needed after the initial model download. No subscription, no account.

Hold Alt+Space (or any hotkey you specify) in any app, speak, release. Text appears at your cursor.

$5, one-time purchase. All future updates included.

macOS
$ hold alt+space, say something, release
listening...
> "remind sarah about the deploy tomorrow morning"
pasted to cursor

Features

System-wide hotkey

Alt+Space (or any hotkey you specify) works in any app. Your editor, browser, Slack, terminal, whatever. Hold it down, talk, let go. Text shows up at your cursor.

Voice commands

Say "new line", "undo", "select all", or "delete that" while dictating. Hands-free editing without reaching for the keyboard.
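To make the idea concrete, here is a minimal sketch of how spoken commands could be separated from dictated text. It is not Vox's actual implementation: the `COMMANDS` table, the action names, and `split_commands` are all hypothetical, and a real app would also need a way to tell a command from someone who simply wants to dictate the word "undo".

```python
import re

# Hypothetical command table: spoken phrase -> editor action name.
# The phrases come from the feature description; the action names are made up.
COMMANDS = {
    "new line": "insert_newline",
    "undo": "undo",
    "select all": "select_all",
    "delete that": "delete_last",
}

# Match whole phrases only, case-insensitively; longest phrases are tried first.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(COMMANDS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def split_commands(transcript):
    """Split a transcript into ("text", str) and ("command", action) events."""
    events = []
    pos = 0
    for match in _PATTERN.finditer(transcript):
        # Anything between the previous command and this one is plain dictation.
        text = transcript[pos:match.start()].strip()
        if text:
            events.append(("text", text))
        events.append(("command", COMMANDS[match.group(1).lower()]))
        pos = match.end()
    tail = transcript[pos:].strip()
    if tail:
        events.append(("text", tail))
    return events
```

For example, `split_commands("hello team new line see you tomorrow")` yields a text event, an `insert_newline` command, and a second text event, in that order.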

Multiple models

Supports OpenAI's Whisper (99 languages, various sizes) and NVIDIA's Parakeet (faster for English). Download them from the app, swap anytime. All run locally on your CPU. No GPU needed.

File transcription

Drag a .wav, .mp3, .m4a, or .ogg onto the window. Get a full transcript. Good for meeting recordings, voice memos, interviews.
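As a rough illustration, a drop handler could filter incoming files by extension before transcribing them. This is a sketch under assumptions, not how Vox does it: `SUPPORTED_EXTENSIONS` simply mirrors the four formats listed above, and `is_supported_audio` is a hypothetical helper that checks only the file name, not the file contents.

```python
from pathlib import Path

# The four formats named in the feature description.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg"}


def is_supported_audio(path):
    """Return True if the file name ends in one of the supported extensions."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```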

Output templates

Auto-format what you say as an email, bullet list, code comment, or meeting notes, or make your own template. Every transcription also saves as a .md file automatically.
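Here is a small sketch of what output templates could look like in code. Everything in it is an assumption for illustration: the function names, the `$text` placeholder, and the formatting rules are invented and do not describe Vox's real template syntax.

```python
import re
import textwrap
from string import Template


def as_bullet_list(text):
    """Turn each sentence of the transcript into one bullet."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return "\n".join(f"- {s}" for s in sentences)


def as_code_comment(text, prefix="# "):
    """Wrap the transcript at 72 columns and prefix every line as a comment."""
    return "\n".join(prefix + line for line in textwrap.wrap(text, width=72))


def from_template(text, template):
    """Fill a custom template; $text marks where the transcript goes."""
    return Template(template).safe_substitute(text=text)
```

A custom template is then just a string, for example `from_template(transcript, "Hi team,\n\n$text\n")`.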

Works offline

After you download a model once, Vox never needs the internet again. Take it on a plane.

Privacy

I know every app says "we care about your privacy." So here's the actual architecture instead of a trust-me statement:

  • Audio goes from your mic to a local process on your CPU to text. Never written to disk, never sent over the network.
  • There is no server endpoint to receive audio. Not "we don't collect it." There's literally nowhere to send it.
  • Apart from model downloads, the only network request the app makes is checking your license key on activation. No analytics, no telemetry, no crash reports.
  • Models download straight from Hugging Face. No proxy, no middleman.

If that sounds paranoid, it's because I built this for myself first.

System requirements

macOS 12+ on Apple Silicon (native) or Intel. Universal .dmg.

Needs a modern multi-core CPU (no GPU required), 2 GB of free RAM, and enough disk for the app (~100 MB) plus whichever model you pick (75 MB to 3 GB). Any mic works: built-in, USB, or Bluetooth.
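The disk math is simple addition: about 175 MB with the smallest model, about 3.1 GB with the largest. A quick sketch for checking your own machine follows; it is not part of Vox. `APP_SIZE_MB` is the approximate figure quoted above, the function names are hypothetical, and 1 GB is treated as 1000 MB.

```python
import shutil

APP_SIZE_MB = 100  # approximate app size quoted in the requirements


def required_disk_mb(model_size_mb):
    """Total disk needed: the app plus one model, in MB."""
    return APP_SIZE_MB + model_size_mb


def has_enough_disk(model_size_mb, path="/"):
    """Compare free space on the volume holding `path` with the requirement."""
    free_mb = shutil.disk_usage(path).free / 1_000_000
    return free_mb >= required_disk_mb(model_size_mb)
```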

Setup

No account to create. No API keys to find. No config files to edit.

  1. Buy and paste your key. Pay $5, get a license key in your browser and email. Paste it into the app.
  2. Grab a model. Pick one from the model manager. Smaller is faster, larger is more accurate. You can always switch later.
  3. Hold Alt+Space (or your custom hotkey) and talk. Text shows up wherever your cursor is.

$5, works offline, updates forever.