// zevyn voice

Hold a hotkey, speak the brief.

Zevyn Voice is system-wide push-to-talk dictation built into the desktop app. Hold Right Alt + Space, speak naturally, and release: the transcript is pasted into whichever window has focus. A three-second utterance decodes in roughly 170 milliseconds. No API key, no rate limit, and no audio leaves your machine.

§ 01

Hold the hotkey, talk like a human

Press and hold Right Alt + Space anywhere on your desktop. A floating indicator confirms that audio is being captured. Speak naturally, then release. The transcript is pasted into whatever window has focus: VS Code, a terminal, the Supervisor chat, a browser address bar.

zevyn voice
hotkey Right Alt + Space (hold or toggle)
mode push-to-talk
▶ "validate the body schema then return a 200"
pasted into app/api/orders/route.ts · 170 ms

The hotkey is configurable, and a test button in settings flashes the indicator so you know your binding is live, not shadowed by another app.
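
The difference between hold and toggle modes can be pictured as a small state machine. The sketch below is illustrative only: the class name `HotkeyState`, the mode strings, and the auto-repeat guard are assumptions for this example, not the app's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class HotkeyState:
    """Hypothetical tracker for whether the microphone should be recording."""

    mode: str = "hold"        # "hold" or "toggle" (names assumed for this sketch)
    recording: bool = False
    pressed: bool = False     # guards against OS key auto-repeat events

    def on_key_down(self) -> bool:
        if self.pressed:
            # A repeated key-down while the chord is held changes nothing.
            return self.recording
        self.pressed = True
        if self.mode == "hold":
            self.recording = True
        else:
            # Toggle mode flips the state once per physical press.
            self.recording = not self.recording
        return self.recording

    def on_key_up(self) -> bool:
        self.pressed = False
        if self.mode == "hold":
            # In hold mode, releasing the chord ends capture.
            self.recording = False
        # In toggle mode, releasing the chord is ignored.
        return self.recording
```

In hold mode the release event ends capture; in toggle mode only the next press does, which is why the guard against repeated key-down events matters there.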

§ 02

Local model, ~170 ms latency

Inference runs in-process on your CPU with sherpa-onnx and an INT8 build of NVIDIA Parakeet TDT 0.6b v3. A three-second utterance decodes in roughly 170 milliseconds on a modern dev machine — six times faster than Whisper Large-v3-Turbo on the same hardware, with lower word error rate on technical English.
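
As a back-of-envelope check, the figures quoted in this section can be turned into a real-time factor and a relative WER reduction. The function names are made up for this sketch, and the Whisper latency it derives is only what the "six times faster" claim implies, not a separately measured number.

```python
def real_time_factor(decode_s: float, audio_s: float) -> float:
    """Decode time divided by audio duration; below 1.0 is faster than real time."""
    return decode_s / audio_s


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional improvement of `improved` over `baseline`."""
    return (baseline - improved) / baseline


# Figures quoted in this section.
rtf = real_time_factor(decode_s=0.170, audio_s=3.0)
speedup_vs_realtime = 1.0 / rtf
implied_whisper_ms = 6 * 170                      # implied by the 6x claim only
wer_gain = relative_reduction(baseline=7.75, improved=6.34)

print(f"real-time factor:        {rtf:.3f}")
print(f"times faster than audio: {speedup_vs_realtime:.1f}")
print(f"implied Whisper latency: {implied_whisper_ms} ms")
print(f"relative WER reduction:  {wer_gain:.1%}")
```

Under these numbers the decoder runs at a real-time factor of about 0.057, and the WER gap corresponds to a relative reduction of roughly 18%.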

  • 6.34% WER on technical English — beats Whisper Large-v3-Turbo (7.75%).
  • 25 languages out of the box. Default English; switch in settings.
  • Model is lazily loaded on first press; its memory arena is freed after 5 minutes of idle time.
  • Stream stalls (USB headset unplug) trigger an automatic rebuild.
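
The lazy-load and idle-release behaviour in the list above could be sketched as follows. This is a minimal illustration, not the shipped code: `LazyModel`, its `loader` callback, and the injectable `clock` are hypothetical names, and the real app manages a native memory arena rather than a Python object.

```python
import time
from typing import Callable, Optional


class LazyModel:
    """Loads a model on first use and drops it after an idle timeout."""

    def __init__(
        self,
        loader: Callable[[], object],
        idle_timeout_s: float = 300.0,               # 5 minutes, as in the list above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._model: Optional[object] = None
        self._last_used = 0.0

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> object:
        """Return the model, loading it on the first call or after a release."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model

    def release_if_idle(self) -> bool:
        """Drop the model if it has been unused for the timeout; True if dropped."""
        if self._model is None:
            return False
        if self._clock() - self._last_used >= self._idle_timeout_s:
            self._model = None
            return True
        return False
```

A background timer would call `release_if_idle()` periodically. A "keep model loaded" option would amount to never calling it.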
Note

Single binary: no Python, no sidecar process, and nothing extra to break notarization, because sherpa-onnx statically links the ONNX runtime at build time.

§ 03

Zero bytes leave your machine

Audio is captured, resampled, decoded, and discarded — all on your CPU, in your process. There is no API call, no rate limit, no key to manage, no "service down" failure. Voice keeps working on a plane, behind a firewall, on a customer site with no internet.
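
To make the "resampled" step concrete, here is a deliberately simple linear-interpolation resampler. It assumes the decoder wants 16 kHz mono input, which is typical for this model family but is not stated on this page. The function name is hypothetical, and a production resampler would low-pass filter before downsampling to avoid aliasing.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample mono audio by linear interpolation (no anti-alias filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate            # source samples per output sample
    last = len(samples) - 1
    out: list[float] = []
    for i in range(n_out):
        pos = i * step                    # fractional position in the source
        j = min(int(pos), last)
        frac = pos - j
        nxt = samples[min(j + 1, last)]   # clamp at the final sample
        out.append(samples[j] * (1.0 - frac) + nxt * frac)
    return out


# Example: a 48 kHz microphone buffer reduced to an assumed 16 kHz target.
mic_buffer = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(resample_linear(mic_buffer, 48_000, 16_000))   # [0.0, 3.0]
```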

The model itself is downloaded once on first enable, SHA-256 verified end-to-end, and cached. From then on the feature is local-first by construction.
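
Verifying a downloaded file against a pinned SHA-256 digest looks roughly like the sketch below. The helper names and the idea of a digest shipped with the app are assumptions for illustration; only the standard-library calls are real.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large model never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path: Path, expected_hex: str) -> bool:
    """Return True only if the file's digest matches the pinned value."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

A caller would refuse to cache or load the file when `verify_model` returns False, then retry the download.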

§ 04

Pastes cleanly into anything

Zevyn Voice delivers the transcript through the clipboard plus a single Cmd/Ctrl + V, not per-character key injection. xterm, tmux, Vim in a remote SSH session, and any TUI that relies on bracketed paste all receive the text as one coherent block, with no interleaving artifacts.
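
For readers unfamiliar with bracketed paste: when an application enables it, the terminal emulator wraps pasted text in two escape sequences so the application can tell a paste from typed keys. The sketch below only mimics what the terminal does and is not part of Zevyn Voice; the function name is made up.

```python
# Standard bracketed-paste markers sent by the terminal around pasted text.
PASTE_START = "\x1b[200~"
PASTE_END = "\x1b[201~"


def wrap_bracketed_paste(text: str) -> str:
    """Frame `text` the way a terminal does when bracketed paste is enabled."""
    # Drop any embedded end marker so the payload cannot end the paste early.
    safe = text.replace(PASTE_END, "")
    return f"{PASTE_START}{safe}{PASTE_END}"


framed = wrap_bracketed_paste("validate the body schema then return a 200")
print(repr(framed))
```

Because the whole transcript arrives between one start marker and one end marker, editors such as Vim can insert it literally instead of interpreting each character as a keystroke.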

  • Auto-paste mode — clipboard set, paste injected, prior clipboard restored.
  • Manual paste mode — clipboard set only (Citrix and VDI friendly).
  • Hold or toggle hotkey modes; pick your microphone explicitly.
  • Optional 'keep model loaded' for zero cold-start on next press.
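
The two paste modes in the list above differ only in what happens after the clipboard is set. A minimal sketch, assuming a hypothetical `Clipboard` interface and a `send_paste` callback that injects Cmd/Ctrl + V (neither is the app's real API):

```python
from typing import Callable, Optional, Protocol


class Clipboard(Protocol):
    """Hypothetical clipboard interface used only for this sketch."""

    def get(self) -> Optional[str]: ...
    def set(self, text: str) -> None: ...


def deliver(
    transcript: str,
    clipboard: Clipboard,
    send_paste: Callable[[], None],
    auto_paste: bool = True,
) -> None:
    """Deliver a transcript in auto-paste or manual paste mode."""
    if not auto_paste:
        # Manual mode: leave the transcript on the clipboard for the user.
        clipboard.set(transcript)
        return
    previous = clipboard.get()
    clipboard.set(transcript)
    try:
        send_paste()
    finally:
        # Restore the prior contents even if paste injection fails.
        if previous is not None:
            clipboard.set(previous)
```

A real implementation would likely wait briefly before restoring, since the target application reads the clipboard asynchronously after the paste keystroke.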
Note

Brief the Supervisor the way you would brief a tech lead: the agent receives the dictated text as a single, clean prompt.
