// zevyn voice

Hold a hotkey, speak the brief.

Zevyn Voice is system-wide push-to-talk dictation built into the desktop app. Hold Right Alt + Space, speak naturally, and release. The transcript pastes into whichever window has focus roughly 170 milliseconds later. No API key, no rate limit, and no audio leaves your machine.

// acoustic chamber

hotkey · capture · decode · paste

Right Alt + Space, hold to dictate · EN · Parakeet TDT v3 · ~170 ms · local · in-process

§ 01

Hold the hotkey, talk like a human

Press and hold Right Alt + Space anywhere on your desktop. A floating indicator confirms that audio is being captured. Speak naturally, then release. The transcript is pasted into whatever window has focus: VS Code, a terminal, the Supervisor chat, a browser address bar.

zevyn voice

hotkey: Right Alt + Space (hold or toggle)

mode: push-to-talk

▶ "validate the body schema then return a 200"

pasted into app/api/orders/route.ts · 170 ms

The hotkey is configurable, and a test button in settings flashes the indicator so you know your binding is live, not shadowed by another app.
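
The hold and toggle behaviours described above can be pictured as a small state machine. The following is a minimal sketch, not Zevyn's actual code: the class name `PushToTalk`, the `Mode` enum, and the event strings are all hypothetical, and real hotkey capture would come from an OS-level hook that is not shown here.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Mode(Enum):
    HOLD = auto()    # capture while the hotkey is held down
    TOGGLE = auto()  # first press starts capture, second press stops it


@dataclass
class PushToTalk:
    """Hypothetical hotkey state machine (illustration only)."""
    mode: Mode = Mode.HOLD
    capturing: bool = False
    key_down: bool = False
    events: list = field(default_factory=list)

    def _start(self):
        self.capturing = True
        self.events.append("start_capture")

    def _stop(self):
        self.capturing = False
        self.events.append("stop_and_paste")

    def on_press(self):
        if self.key_down:          # OS key-repeat while held: ignore
            return
        self.key_down = True
        if self.mode is Mode.HOLD:
            self._start()
        elif self.capturing:       # toggle mode, second press
            self._stop()
        else:                      # toggle mode, first press
            self._start()

    def on_release(self):
        self.key_down = False
        if self.mode is Mode.HOLD and self.capturing:
            self._stop()


ptt = PushToTalk()
ptt.on_press()
ptt.on_release()
print(ptt.events)
```

In hold mode the release event ends the utterance; in toggle mode the release is ignored and a second press ends it. Ignoring repeated press events matters because most operating systems resend key-down while a key is held.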

§ 02

Local model, ~170 ms latency

Inference runs in-process on your CPU with sherpa-onnx and an INT8 build of NVIDIA Parakeet TDT 0.6b v3. A three-second utterance decodes in roughly 170 milliseconds on a modern dev machine, about six times faster than Whisper Large-v3-Turbo on the same hardware and with a lower word error rate on technical English.
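
To put those figures in context, the arithmetic below derives the real-time factor (decode time divided by audio length) from the quoted numbers. It assumes that "six times faster" refers to decode time for the same utterance, which the text does not state explicitly, so the implied Whisper figure is an inference rather than a measurement.

```python
utterance_s = 3.0        # audio length quoted above
decode_s = 0.170         # decode time quoted above
speedup_vs_whisper = 6   # "six times faster", read as a decode-time ratio (assumption)

# Real-time factor: values below 1.0 mean decoding is faster than speech.
rtf = decode_s / utterance_s
implied_whisper_s = decode_s * speedup_vs_whisper

print(f"real-time factor: {rtf:.3f}")
print(f"implied Whisper decode time: {implied_whisper_s:.2f} s")
```

A real-time factor near 0.06 means the decoder needs about 6% of the utterance's duration to transcribe it.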

6.34% WER on technical English — beats Whisper Large-v3-Turbo (7.75%).
25 languages out of the box. Default English; switch in settings.
Model lazily loaded on first press; arena freed after 5 minutes idle.
Stream stalls (USB headset unplug) trigger an automatic rebuild.
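
The lazy-load and idle-release behaviour in the list above could look roughly like this. It is a sketch under stated assumptions: `LazyModel`, `loader`, and `release_if_idle` are hypothetical names, the 300-second timeout mirrors the "5 minutes idle" figure, and the real app presumably frees a native sherpa-onnx arena rather than dropping a Python reference.

```python
import time


class LazyModel:
    """Loads on first use and frees itself after an idle timeout (sketch)."""

    def __init__(self, loader, idle_timeout_s=300.0, clock=time.monotonic):
        self._loader = loader                  # hypothetical callable that builds the recognizer
        self._idle_timeout_s = idle_timeout_s  # 5 minutes, per the text above
        self._clock = clock                    # injectable so the timeout can be tested
        self._model = None
        self._last_used = None

    @property
    def loaded(self):
        return self._model is not None

    def get(self):
        """Return the model, loading it on the first call (the 'first press')."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model

    def release_if_idle(self):
        """Call periodically; returns True when the model was freed."""
        if self._model is None:
            return False
        if self._clock() - self._last_used >= self._idle_timeout_s:
            self._model = None
            return True
        return False
```

A 'keep model loaded' option would simply skip the periodic `release_if_idle` call, trading memory for zero cold-start.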

Note

Single binary. No Python, no sidecar process, no notarization breakage: sherpa-onnx statically links ONNX Runtime at build time.

§ 03

Zero bytes leave your machine

Audio is captured, resampled, decoded, and discarded — all on your CPU, in your process. There is no API call, no rate limit, no key to manage, no "service down" failure. Voice keeps working on a plane, behind a firewall, on a customer site with no internet.

The model itself is downloaded once on first enable, SHA-256 verified end-to-end, and cached. From then on the feature is local-first by construction.
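
A download check of the kind described above can be sketched with the standard library alone. The function names and the idea of a pinned expected digest are assumptions for illustration; the text does not say where the reference hash comes from or how the app stores it.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a large model never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path: Path, expected_hex: str) -> bool:
    """Return True only if the cached file matches the expected digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

A caller would typically delete the cached file and download it again when `verify_model` returns False, rather than loading a model it cannot vouch for.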

§ 04

Pastes cleanly into anything

Zevyn Voice delivers the transcript through the clipboard plus a single Cmd/Ctrl + V — not per-character key injection. xterm, tmux, Vim in a remote SSH session, and any TUI relying on bracketed-paste all receive the text as one coherent block, with no interleaving artifacts.
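
For readers unfamiliar with bracketed paste: a terminal that has the mode enabled wraps pasted text in the escape sequences `ESC [ 200 ~` and `ESC [ 201 ~`, so the program inside can tell a paste from typed keys. The sketch below illustrates that framing only. The helper names are hypothetical, and in practice the terminal emulator adds the markers, not the dictation app.

```python
PASTE_START = "\x1b[200~"  # sent by the terminal before pasted text
PASTE_END = "\x1b[201~"    # sent by the terminal after pasted text


def as_bracketed_paste(text: str) -> str:
    """What a program inside the terminal sees when `text` is pasted."""
    return f"{PASTE_START}{text}{PASTE_END}"


def extract_paste(stream: str) -> str:
    """Recover the pasted block from the input stream (TUI side)."""
    start = stream.index(PASTE_START) + len(PASTE_START)
    end = stream.index(PASTE_END, start)
    return stream[start:end]


received = as_bracketed_paste("validate the body schema\nthen return a 200")
print(extract_paste(received))
```

Per-character key injection never produces these markers, so an editor like Vim would treat each injected character as typed input and could apply auto-indent or mappings to it.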

Auto-paste mode — clipboard set, paste injected, prior clipboard restored.
Manual paste mode — clipboard set only (Citrix and VDI friendly).
Hold or toggle hotkey modes; pick your microphone explicitly.
Optional 'keep model loaded' for zero cold-start on next press.
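
The two paste modes in the list above can be sketched as one function. Everything named here is hypothetical: `Clipboard` is a stand-in interface, `send_paste` represents whatever injects the single Cmd/Ctrl + V, and `FakeClipboard` exists only so the example runs without touching the system clipboard.

```python
from typing import Callable, Protocol


class Clipboard(Protocol):
    def get(self) -> str: ...
    def set(self, text: str) -> None: ...


def deliver(text: str, clipboard: Clipboard, send_paste: Callable[[], None],
            auto_paste: bool = True) -> None:
    """Auto mode: set, paste, restore. Manual mode: set only."""
    if not auto_paste:
        clipboard.set(text)       # the user presses the paste shortcut themselves
        return
    previous = clipboard.get()    # remember what the user had copied
    clipboard.set(text)
    try:
        send_paste()              # one paste keystroke, not per-character typing
    finally:
        clipboard.set(previous)   # restore even if injection fails


class FakeClipboard:
    """In-memory clipboard used only for this illustration."""

    def __init__(self, text: str = "") -> None:
        self.text = text

    def get(self) -> str:
        return self.text

    def set(self, text: str) -> None:
        self.text = text


clip = FakeClipboard("previously copied text")
pasted = []
deliver("hello from voice", clip, lambda: pasted.append(clip.get()))
print(pasted, clip.get())
```

On a real desktop the target application reads the clipboard asynchronously, so an implementation would likely wait briefly before restoring the previous contents. The synchronous restore here is a simplification.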

Note

Brief the Supervisor the way you would speak to a tech lead: the agent receives the dictated text as a single, clean prompt.

// related

The Supervisor

One LLM plans the work, spawns agents, reviews their diffs, and merges. You talk to it like a tech lead.

Memory graph

Local markdown notes with wikilinks and full-text search. Decisions persist long after the conversation ends.

Built-in browser

Agents navigate and screenshot live pages. Click any element to brief an agent about exactly what to change.

Stop supervising one agent.

Zevyn Studio is launching soon. Join the waitlist and start directing a team.
