Voixet · Vol I · How it works

A phone call across any language, explained.

Voixet combines a browser dial pad, the OpenAI Realtime Translations model, and Twilio's PSTN connectivity. Together they place a real phone call to any number on earth and run a live interpreter in both directions on the line.

What happens during a call

  1. I

    You open voixet.com and pick a language pair

    The dial pad runs entirely in your browser. No app, no plugin, no webcam: just a microphone and a network connection. Each new account's first call is free for the first 3 minutes.

  2. II

    Voixet dials a real phone number on your behalf

    Through Twilio Programmable Voice, Voixet places a real PSTN call to the number you entered. The recipient gets a normal phone call. They do not need an account, an app, or a setup step. They pick up.

  3. III

    Your speech streams to the OpenAI Realtime model

    As you start speaking, fragments of audio stream to OpenAI's Realtime Translations model. The model recognises your language and begins synthesising the translation while you are still speaking — the way a professional simultaneous interpreter works at a conference, not the way a translator works on a document.

  4. IV

    The translated audio plays in the recipient's ear

    The translated audio stream is piped back through Twilio into the call leg connected to the recipient. They hear it as if a real interpreter were on the line: your cadence, their language. Latency is typically under one second.

  5. V

    Same thing in reverse for their reply

    A separate Realtime session handles the recipient's side of the call. Their speech in their language comes back to you as audio in your language. The two sessions are independent so neither ever echoes the other.

  6. VI

    A bilingual transcript renders live

    Each utterance appears on screen with the original above and the translation below, rendered in real time so you can verify exactly what was said while the conversation is still happening. The transcript saves to your account so you can revisit it later.
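The steps above amount to two one-way pipelines that share nothing except the transcript. The following is a minimal sketch of that shape in Python, not Voixet's actual implementation: `TranslationSession`, `Call`, and `fake_translate` are hypothetical names, and the stub only tags text with the language pair so the routing is visible. A real system would stream audio to the translation model rather than pass strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


# Hypothetical stand-in for the translation model: it only tags the text
# with the language pair so the routing is easy to follow.
def fake_translate(text: str, source: str, target: str) -> str:
    return f"[{source}->{target}] {text}"


@dataclass
class TranslationSession:
    """One direction of the call: one speaker, one source, one target."""
    source: str
    target: str
    translate: Callable[[str, str, str], str] = fake_translate
    heard: List[str] = field(default_factory=list)  # input this session received

    def feed(self, utterance: str) -> str:
        self.heard.append(utterance)
        return self.translate(utterance, self.source, self.target)


@dataclass
class Call:
    """Two independent sessions plus a shared bilingual transcript."""
    caller_lang: str
    recipient_lang: str
    transcript: List[Tuple[str, str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Each direction gets its own session object; they share no state.
        self.outbound = TranslationSession(self.caller_lang, self.recipient_lang)
        self.inbound = TranslationSession(self.recipient_lang, self.caller_lang)

    def caller_says(self, utterance: str) -> str:
        translated = self.outbound.feed(utterance)
        self.transcript.append(("caller", utterance, translated))
        return translated

    def recipient_says(self, utterance: str) -> str:
        translated = self.inbound.feed(utterance)
        self.transcript.append(("recipient", utterance, translated))
        return translated


call = Call(caller_lang="en", recipient_lang="ja")
call.caller_says("Do you have a table for two?")
call.recipient_says("はい、ございます。")
```

Because each session only ever receives its own speaker's input, the translated output of one direction is never fed into the other, which is the property the fifth step describes.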

What Voixet is not

  • Not a text translator. Voixet handles spoken phone calls. For pasting and translating text, Google Translate or DeepL are better tools.
  • Not a video-conferencing translator. Voixet places traditional PSTN phone calls to phone numbers. Zoom, Google Meet, and other web-conferencing platforms have their own translation features.
  • Not certified interpretation. For regulated medical or legal communication where accuracy is auditable, a certified human interpreter is still the appropriate choice. Voixet is for everyday cross-language communication where speed and cost matter more than formal certification.
  • Not free above 3 minutes. The free quota is a one-time signup gift designed to let you verify the experience. Beyond that, Voixet charges about $0.60 per minute.
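To make the pricing concrete, here is a rough cost estimate in Python. It is a sketch under stated assumptions: the rate is treated as exactly $0.60 per minute although the text says "about", and billing is assumed to be prorated per second, which the text does not specify. The function name `estimate_cost` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

FREE_SECONDS = 3 * 60               # one-time signup allowance from the text
RATE_PER_MINUTE = Decimal("0.60")   # "about $0.60"; treated as exact here


def estimate_cost(call_seconds: int, free_seconds_left: int = FREE_SECONDS) -> Decimal:
    """Estimated charge in USD, assuming per-second proration (an assumption)."""
    if call_seconds < 0 or free_seconds_left < 0:
        raise ValueError("durations must be non-negative")
    billable = max(0, call_seconds - free_seconds_left)
    cost = RATE_PER_MINUTE * Decimal(billable) / Decimal(60)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


first_call = estimate_cost(10 * 60)                       # 7 billable minutes
later_call = estimate_cost(10 * 60, free_seconds_left=0)  # allowance already used
```

Under these assumptions a 10-minute first call comes to $4.20, and the same call on an account that has already used its free minutes comes to $6.00.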

What language pairs are tested most

The underlying OpenAI Realtime model supports 100+ languages. The pairs we test most actively, and where translation quality is highest, are:

  • EN ↔ ZH (Traditional)
  • EN ↔ ZH (Simplified)
  • EN ↔ JA
  • EN ↔ KO
  • EN ↔ ES
  • EN ↔ FR
  • EN ↔ DE
  • EN ↔ PT
  • ZH ↔ JA
  • ZH ↔ KO
  • JA ↔ KO

Languages outside this list will work but may have lower fluency or higher latency. The transcript lets you spot-check translation quality mid-call.
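If you want to flag calls that fall outside the most-tested pairs, the list above can be held as a small lookup table. This is an illustrative sketch, not part of Voixet: the language codes are BCP 47-style tags chosen for the example, plain `zh` stands for the entries that do not name a script, and `is_tested_pair` is a hypothetical helper.

```python
# Codes are illustrative BCP 47-style tags; "zh" stands for the entries
# in the list that do not name a script.
TESTED_PAIRS = {
    frozenset(pair)
    for pair in [
        ("en", "zh-Hant"), ("en", "zh-Hans"), ("en", "ja"), ("en", "ko"),
        ("en", "es"), ("en", "fr"), ("en", "de"), ("en", "pt"),
        ("zh", "ja"), ("zh", "ko"), ("ja", "ko"),
    ]
}


def is_tested_pair(a: str, b: str) -> bool:
    """True if the pair is on the most-tested list, in either direction."""
    return frozenset((a, b)) in TESTED_PAIRS
```

Storing each pair as a `frozenset` makes the check direction-agnostic, matching the ↔ notation: `is_tested_pair("ja", "en")` and `is_tested_pair("en", "ja")` give the same answer.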

Try it

The fastest way to understand Voixet is to make a real call. New accounts get 3 minutes free, no credit card required.

Last updated · 2026-05-14