I Built an Offline Voice Typing App for Linux – Speak to AI
https://github.com/AshBuk/speak-to-ai
Despite the many voice-to-text applications out there, I couldn’t find one that fit my daily workflow on Linux. So I built my own and am sharing it with the community as an open-source project. Speak to AI is:
- 100% offline — uses Whisper locally, no cloud
- Works everywhere — editors, browsers, terminals, AI chats
- Global hotkeys — press, speak, release
- Native Linux — supports X11 and Wayland
- AppImage — download and run, no installation
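The "press, speak, release" flow is essentially a push-to-talk state machine. Here is a minimal Go sketch. It assumes a hotkey backend that delivers key-down and key-up events. The `Event` and `Recorder` types and the action names are hypothetical and are not the app's actual code.

```go
package main

import "fmt"

// Event is a hotkey transition from some global-hotkey listener
// (hypothetical; the app's real hotkey backend is not shown in the post).
type Event int

const (
	KeyDown Event = iota
	KeyUp
)

// Recorder is a hypothetical push-to-talk state machine: capture starts on
// the first KeyDown and stops on KeyUp, after which the audio is transcribed
// and typed. The actions slice just logs side effects for illustration.
type Recorder struct {
	recording bool
	actions   []string
}

func (r *Recorder) Handle(e Event) {
	switch {
	case e == KeyDown && !r.recording:
		r.recording = true
		r.actions = append(r.actions, "start-capture")
	case e == KeyUp && r.recording:
		r.recording = false
		r.actions = append(r.actions, "stop-capture", "transcribe", "type-text")
	}
	// Any other combination (auto-repeat KeyDown, stray KeyUp) is a no-op.
}

func main() {
	var r Recorder
	// A held key usually auto-repeats, so several KeyDown events arrive.
	for _, e := range []Event{KeyDown, KeyDown, KeyDown, KeyUp, KeyUp} {
		r.Handle(e)
	}
	fmt.Println(r.actions)
}
```

The `!r.recording` guard matters because a held key typically auto-repeats. Without it, each repeat would restart capture.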
Tech Stack
- Go for the core app (fast, small binary)
- whisper.cpp for speech recognition
- xdotool/ydotool for keyboard simulation
- PulseAudio/PipeWire for audio capture
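whisper.cpp's `whisper_full()` consumes mono float32 samples at 16 kHz (`WHISPER_SAMPLE_RATE`), while audio servers commonly deliver signed 16-bit PCM. The Go sketch below shows that conversion step. It assumes the app captures raw mono s16le audio from PulseAudio/PipeWire; the post does not show its actual capture path, and `s16leToFloat32` is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// whisperSampleRate mirrors whisper.cpp's WHISPER_SAMPLE_RATE (16 kHz).
const whisperSampleRate = 16000

// s16leToFloat32 converts mono signed 16-bit little-endian PCM into float32
// samples in [-1, 1), the sample format whisper_full() takes.
// A trailing odd byte, if any, is ignored.
func s16leToFloat32(pcm []byte) []float32 {
	out := make([]float32, len(pcm)/2)
	for i := range out {
		s := int16(binary.LittleEndian.Uint16(pcm[2*i:]))
		out[i] = float32(s) / 32768.0
	}
	return out
}

func main() {
	pcm := []byte{0x00, 0x00, 0xff, 0x7f, 0x00, 0x80} // samples 0, 32767, -32768
	fmt.Println(s16leToFloat32(pcm))

	// Buffer size of a 3-second phrase once converted to 16 kHz mono float32.
	fmt.Println(3*whisperSampleRate*4, "bytes")
}
```

Short phrases stay small in memory. Three seconds of audio is 48,000 samples, or under 200 KB as float32.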
The Hard Parts
X11 vs Wayland: X11 and Wayland sessions need different typing mechanisms (xdotool on X11, ydotool on Wayland). Solution: detect the session type at runtime and use the matching tool.
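One way to do that detection is sketched below in Go. It assumes the session type can be read from `XDG_SESSION_TYPE`, with `WAYLAND_DISPLAY` and `DISPLAY` as fallbacks. `typingCommand` is a hypothetical helper, and the app's real logic may differ. The environment lookup is injected so the choice can be tested without a desktop session.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// typingCommand returns the argv for a keyboard-simulation tool suited to
// the current session (hypothetical helper, not the app's actual API).
func typingCommand(getenv func(string) string, text string) ([]string, error) {
	ydotool := []string{"ydotool", "type", text} // works via uinput; needs ydotoold running
	xdotool := []string{"xdotool", "type", text} // talks to the X server directly

	switch getenv("XDG_SESSION_TYPE") {
	case "wayland":
		return ydotool, nil
	case "x11":
		return xdotool, nil
	}
	// Fallbacks when XDG_SESSION_TYPE is unset or uninformative (e.g. "tty").
	// WAYLAND_DISPLAY goes first: Wayland sessions with XWayland also set DISPLAY.
	if getenv("WAYLAND_DISPLAY") != "" {
		return ydotool, nil
	}
	if getenv("DISPLAY") != "" {
		return xdotool, nil
	}
	return nil, errors.New("no X11 or Wayland session detected")
}

func main() {
	cmd, err := typingCommand(os.Getenv, "hello world")
	if err != nil {
		fmt.Println(err)
		return
	}
	// To actually type, one could run exec.Command(cmd[0], cmd[1:]...).Run().
	fmt.Println(cmd)
}
```

Passing the text as its own argv element, rather than building a shell string, avoids quoting problems with dictated punctuation.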
Input permissions: Global hotkeys need membership in the input group. Clear docs help users set this up.
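A startup check can catch a missing group before users wonder why hotkeys do nothing. This Go sketch assumes the required group is literally named `input`. It reads the process's effective groups via `os.Getgroups`, so it reflects what the running session can actually use; a newly added membership only takes effect after logging in again. `hasGroup` and `currentGroupNames` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
	"strconv"
)

// hasGroup reports whether name appears in groups.
func hasGroup(groups []string, name string) bool {
	for _, g := range groups {
		if g == name {
			return true
		}
	}
	return false
}

// currentGroupNames resolves this process's supplementary group IDs to names.
func currentGroupNames() ([]string, error) {
	gids, err := os.Getgroups()
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(gids))
	for _, gid := range gids {
		g, err := user.LookupGroupId(strconv.Itoa(gid))
		if err != nil {
			continue // GID without a name entry; skip rather than fail
		}
		names = append(names, g.Name)
	}
	return names, nil
}

func main() {
	names, err := currentGroupNames()
	if err != nil {
		fmt.Println("could not read groups:", err)
		return
	}
	if !hasGroup(names, "input") {
		fmt.Println(`not in the "input" group; e.g. sudo usermod -aG input $USER, then log in again`)
		return
	}
	fmt.Println(`"input" group membership OK`)
}
```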
Model size: Whisper models are big. Using a quantized (q5) small model balances speed and accuracy.
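As a rough check on why quantization helps, here is a back-of-the-envelope size estimate in Go. It assumes Whisper small has about 244M parameters and uses ggml's 5-bit block layouts: 32 weights per block, plus either one fp16 scale (q5_0) or an fp16 scale and an fp16 minimum (q5_1). Real files differ somewhat because not every tensor is quantized.

```go
package main

import "fmt"

// bitsPerWeight gives the effective storage cost per weight of a ggml-style
// block-quantized format: blockSize weights of `bits` bits each, plus
// overheadBits of per-block metadata.
func bitsPerWeight(bits, blockSize, overheadBits int) float64 {
	return float64(bits*blockSize+overheadBits) / float64(blockSize)
}

// approxSizeMB converts a parameter count and bits/weight to megabytes (1e6 bytes).
func approxSizeMB(params, bpw float64) float64 {
	return params * bpw / 8 / 1e6
}

type format struct {
	name string
	bpw  float64
}

func main() {
	const params = 244e6 // Whisper "small": roughly 244M parameters (assumption)
	formats := []format{
		{"f16", 16},
		{"q5_0", bitsPerWeight(5, 32, 16)}, // 5-bit weights + one fp16 scale per block
		{"q5_1", bitsPerWeight(5, 32, 32)}, // 5-bit weights + fp16 scale and fp16 min
	}
	for _, f := range formats {
		fmt.Printf("%-5s %.2f bits/weight  ~%.0f MB\n", f.name, f.bpw, approxSizeMB(params, f.bpw))
	}
}
```

That works out to 5.5 to 6 bits per weight instead of 16. The weights shrink from about 488 MB to roughly 168 to 183 MB.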
Results
- Storage: 277.2 MB (Whisper small q5 model, dependencies, Go binary)
- Memory: ~300 MB RAM during operation
- <1s latency for short phrases
- 90%+ accuracy for clear speech
- Works on Fedora and Ubuntu
I would be grateful if you tested it in your own Linux environment! Check the desktop environment (DE) support documentation:
https://github.com/AshBuk/speak-to-ai/blob/master/docs/Desktop_Environment_Support.md
Try It