Einstein voice lab: local STT and live models

Whisper on lab GPUs for low-latency transcription, paired with Gemini Live sessions for conversational speech on hardware we operate.

Voice is one of the hardest surfaces to get right: latency, barge-in, and model quality all have to land together. Our Einstein voice lab runs Whisper speech-to-text (STT) on dedicated GPUs so transcription stays fast even when cloud API latency spikes.

For full-duplex conversation we pair local STT with Gemini Live sessions, benchmarking end-to-end latency and word-error rate before routes ship to Simon Voice. The same stack powers internal demos and customer previews on private nodes.
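As a rough illustration of the word-error-rate side of that benchmark, here is a minimal sketch of the standard WER calculation (word-level edit distance divided by the number of reference words). It is not the lab's actual harness: the function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("turn on the lights", "turn on the light"))
```

One substituted word out of four reference words gives a WER of 0.25. A production benchmark would likely normalize punctuation and numerals before scoring, since those choices can shift the reported rate noticeably.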

Research from the voice lab feeds both the public Voice product and Fleet deployments where teams want speech workloads entirely on hardware they control.
