Vox: local speech-to-text for macOS
Vox turns speech into text using AI models that run on your machine. No audio leaves your device. No internet needed after the initial model download. No subscription, no account.
Hold Alt+Space (or any hotkey you specify) in any app, speak, release. Text appears at your cursor.
$5, one-time purchase. All future updates included.
$ hold alt+space, say something, release
listening...
> "remind sarah about the deploy tomorrow morning"
pasted to cursor
Features
System-wide hotkey
Alt+Space (or any hotkey you specify) works in any app. Your editor, browser, Slack, terminal, whatever. Hold it down, talk, let go. Text shows up at your cursor.
Voice commands
Say "new line", "undo", "select all", or "delete that" while dictating. Hands-free editing without reaching for the keyboard.
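One way to picture how commands sit inside dictation: the transcript is scanned for known command phrases, and everything between them is typed as ordinary text. The sketch below is hypothetical and is not Vox's code. The phrase-to-action table, the action names (`newline`, `select_all`, `delete_last`), and the `parse_dictation` helper are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase -> action table; the action names are illustrative only.
COMMANDS = {
    "new line": "newline",
    "undo": "undo",
    "select all": "select_all",
    "delete that": "delete_last",
}

# Match any command phrase as whole words, case-insensitively.
# Longer phrases come first so they win over shorter overlapping ones.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in sorted(COMMANDS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


@dataclass
class Step:
    kind: str   # "text" or "command"
    value: str  # text to type, or the action name to run


def parse_dictation(transcript: str) -> list[Step]:
    """Split a transcript into text to insert and commands to run, in spoken order."""
    steps: list[Step] = []
    pos = 0
    for match in _PATTERN.finditer(transcript):
        # Text spoken before this command, minus stray spaces and punctuation.
        text = transcript[pos:match.start()].strip(" ,.")
        if text:
            steps.append(Step("text", text))
        steps.append(Step("command", COMMANDS[match.group(1).lower()]))
        pos = match.end()
    tail = transcript[pos:].strip(" ,.")
    if tail:
        steps.append(Step("text", tail))
    return steps


print(parse_dictation("Hi Sarah, new line see you at ten"))
```

A plain phrase matcher like this would also fire on "I can undo it", so a real implementation needs some way to tell a command apart from dictated words; the sketch deliberately ignores that.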
Multiple models
Supports OpenAI's Whisper (99 languages, several sizes) and NVIDIA's Parakeet (faster, for English). Download models from the app and swap anytime. Everything runs locally on your CPU. No GPU needed.
File transcription
Drag a .wav, .mp3, .m4a, or .ogg file onto the window and get a full transcript. Good for meeting recordings, voice memos, and interviews.
Output templates
Auto-format what you say as an email, a bullet list, a code comment, or meeting notes, or define your own template. Every transcription is also saved automatically as a .md file.
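To make "template" concrete: a template can be thought of as a function from raw transcript to formatted text, with the result then written to a timestamped Markdown file. This is a hypothetical sketch, not how Vox implements it. The template names, the one-sentence-per-bullet rule, the `// ` comment prefix, and the `YYYY-MM-DD_HH-MM-SS.md` file naming are all assumptions.

```python
import re
from datetime import datetime
from pathlib import Path


def as_bullets(text: str) -> str:
    # Assumption: each sentence becomes one bullet.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return "\n".join(f"- {s}" for s in sentences)


def as_code_comment(text: str, prefix: str = "// ") -> str:
    # Prefix every line with a line-comment marker (C-style by default).
    return "\n".join(prefix + line for line in text.splitlines())


# Hypothetical registry; a custom template would be one more entry here.
TEMPLATES = {"bullets": as_bullets, "code_comment": as_code_comment}


def save_markdown(text: str, folder: Path, when: datetime) -> Path:
    """Write the formatted transcription to <folder>/<timestamp>.md and return the path."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{when:%Y-%m-%d_%H-%M-%S}.md"
    path.write_text(text + "\n", encoding="utf-8")
    return path


print(TEMPLATES["bullets"]("Ship the fix. Tell Sarah! Done?"))
```

Modeling templates as plain functions keeps "make your own" cheap: adding a format means adding one entry, and the save step stays the same for every template.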
Works offline
After you download a model once, Vox never needs the internet again. Take it on a plane.
Privacy
I know every app says "we care about your privacy." So here's the actual architecture instead of a trust-me statement:
- Audio goes from your mic to a local process on your CPU, which turns it into text. It is never written to disk and never sent over the network.
- There is no server endpoint to receive audio. Not "we don't collect it." There's literally nowhere to send it.
- Apart from model downloads, the only network request the app makes is checking your license key on activation. No analytics, no telemetry, no crash reports.
- Models download straight from Hugging Face. No proxy, no middleman.
If that sounds paranoid, it's because I built this for myself first.
System requirements
macOS 12 or later, on Apple Silicon (native) or Intel. Universal .dmg.
Needs a modern multi-core CPU (no GPU required), 2 GB of free RAM, and disk space for the app (~100 MB) plus whichever model you pick (75 MB to 3 GB). Any mic works: built-in, USB, or Bluetooth.
Setup
No account to create. No API keys to find. No config files to edit.
- Buy and paste your key. Pay $5 and get a license key in your browser and by email. Paste it into the app.
- Grab a model. Pick one from the model manager. Smaller is faster, larger is more accurate. You can always switch later.
- Hold Alt+Space (or your custom hotkey) and talk. Text shows up wherever your cursor is.
$5, works offline, updates forever.