Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages Across Meet, Translate, and the Live API

Google has announced Gemini 3.5 Live Translate, its latest audio model for direct speech-to-speech translation. Speech-to-speech means that spoken audio goes in and translated spoken audio comes out. The model automatically detects more than 70 languages and generates translated speech that preserves the speaker's tone, rhythm, and pitch. Turn-based systems wait for the speaker to finish before responding; Gemini 3.5 Live Translate instead generates speech continuously. It balances waiting for more context against translating right away: more context improves quality, while faster output keeps the translation in sync with the speaker. The result stays a few seconds behind the speaker throughout the session.

What Gemini 3.5 Live Translate is

Gemini 3.5 Live Translate is a dedicated audio model (gemini-3.5-live-translate-preview), not a chat assistant. It processes speech as the audio streams in, mid-sentence, rather than waiting for a complete sentence. It handles multilingual input without manual language configuration, and its robustness to noise lets applications run in loud, unpredictable environments.

The model is rolling out across three surfaces. Developers get it in public preview through the Gemini Live API and Google AI Studio. Organizations get a private preview in Google Meet starting this month. Everyone else can use it through the Google Translate app on Android and iOS.

How does continuous streaming work?

The distinction matters for teams building real-time features. A live conversational agent uses turn-based interaction: it relies on pause detection, intent detection, and interruption handling. Live translation uses continuous stream processing instead, translating while the speaker talks without waiting for a turn to end.
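To make the contrast concrete, here is a toy timing model, not Google's implementation: it only computes when output could start under each scheduling style and translates nothing. The 100 ms chunk length, the 500 ms end-of-turn pause, and the fixed 20-chunk (2-second) lag are illustrative assumptions. A larger lag gives the model more context per decision at the cost of more delay, which is the trade-off described above.

```python
# Toy timing model only: compares WHEN translated output can start.
# It does not translate anything and is not how Gemini 3.5 Live Translate works internally.

CHUNK_MS = 100  # assumed chunk length in milliseconds


def turn_based_emit_times(num_chunks: int, end_of_turn_silence_ms: int = 500) -> list[int]:
    """Turn-based: nothing is emitted until the turn ends (all speech plus a pause)."""
    turn_end = num_chunks * CHUNK_MS + end_of_turn_silence_ms
    # Every chunk's translation becomes available only once the turn has ended.
    return [turn_end] * num_chunks


def streaming_emit_times(num_chunks: int, lag_chunks: int = 20) -> list[int]:
    """Continuous: chunk i is emitted once lag_chunks further chunks of context have
    arrived (or the input ends), so output trails the speaker by a roughly fixed delay."""
    times = []
    for i in range(num_chunks):
        chunks_needed = min(i + 1 + lag_chunks, num_chunks)  # context required for chunk i
        times.append(chunks_needed * CHUNK_MS)
    return times


if __name__ == "__main__":
    n = 300  # 30 seconds of continuous speech
    print("turn-based first output (ms):", turn_based_emit_times(n)[0])
    print("streaming first output (ms):", streaming_emit_times(n)[0])
```

For 30 seconds of speech, the turn-based schedule produces nothing until 30.5 s, while the streaming schedule starts after about 2.1 s and then tracks the speaker.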

To meet strict real-time latency limits, the translation path accepts only audio input; text input is not supported in translation mode. The model also drops tool use and system instructions in this mode, which keeps it a focused interpretation pipeline rather than a general-purpose agent.
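Because translation mode rejects features that a general agent session would accept, a client may want to catch misconfigurations before opening a session. The sketch below is a hypothetical client-side check: the helper names are invented for illustration, `tools` and `systemInstruction` mirror Live API setup field names, and the placement of `translationConfig` follows the description in this article rather than a verified schema.

```python
from typing import Any

# Setup fields that, per the announcement, translation mode does not support.
# The camelCase names mirror Live API setup fields; exact placement is an assumption.
UNSUPPORTED_IN_TRANSLATION = ("tools", "systemInstruction")


def translation_setup_problems(setup: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list problems with a translation-mode setup dict."""
    problems: list[str] = []
    gen_cfg = setup.get("generationConfig", {})
    if "translationConfig" not in gen_cfg:
        problems.append("generationConfig.translationConfig is missing")
    for key in UNSUPPORTED_IN_TRANSLATION:
        if key in setup:
            problems.append(f"{key} is not supported in translation mode")
    return problems


def is_allowed_input(kind: str) -> bool:
    """Hypothetical helper: translation mode accepts audio only (no text, video, or images)."""
    return kind == "audio"


if __name__ == "__main__":
    bad = {"generationConfig": {}, "tools": [], "systemInstruction": "Be concise."}
    for problem in translation_setup_problems(bad):
        print(problem)
```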

Building with the Live API

Developers configure translation in the Live API session setup by placing a translationConfig block inside generationConfig. The targetLanguageCode field takes a BCP-47 code, such as "pl" or "es"; BCP-47 is the standard format for language tags such as en or pt-BR. targetLanguageCode defaults to "en". The echoTargetLanguage boolean controls what happens to input that is already in the target language: when true, the model echoes that speech; when false, it stays silent. You can also enable inputAudioTranscription and outputAudioTranscription to get text transcripts.
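A minimal sketch of the setup payload, assuming the raw Live API message shape: a top-level `setup` object carrying `model`, `generationConfig`, and the transcription toggles. The nesting of `translationConfig` follows this article's description and has not been checked against the published schema. `build_translation_setup` is a hypothetical helper, its BCP-47 check is a loose shape test rather than a full validator, and its `echo_target_language=False` default is this sketch's choice, not a documented default.

```python
import json
import re
from typing import Any

MODEL = "gemini-3.5-live-translate-preview"  # model id as given in the announcement

# Loose BCP-47 shape check: a 2-3 letter primary language subtag plus optional
# subtags such as a region ("pt-BR") or script ("zh-Hant"). Not a full validator.
_BCP47_LIKE = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


def build_translation_setup(
    target_language_code: str = "en",  # article: targetLanguageCode defaults to "en"
    echo_target_language: bool = False,  # sketch's own default, not a documented one
    transcribe: bool = True,
) -> dict[str, Any]:
    """Hypothetical helper: build a translation-mode setup message (nesting assumed)."""
    if not _BCP47_LIKE.match(target_language_code):
        raise ValueError(f"not a BCP-47-style tag: {target_language_code!r}")
    setup: dict[str, Any] = {
        "model": f"models/{MODEL}",
        "generationConfig": {
            "translationConfig": {
                "targetLanguageCode": target_language_code,
                "echoTargetLanguage": echo_target_language,
            }
        },
    }
    if transcribe:
        # Empty objects request transcripts of the input and the translated output audio.
        setup["inputAudioTranscription"] = {}
        setup["outputAudioTranscription"] = {}
    return {"setup": setup}


if __name__ == "__main__":
    print(json.dumps(build_translation_setup("pl", echo_target_language=True), indent=2))
```

Validating the tag on the client gives a clear local error for something like "english" instead of a rejected session.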

Audio formats are fixed. Input is 16-bit raw PCM at 16 kHz, mono, little-endian. Output is 16-bit raw PCM at 24 kHz, mono, little-endian. PCM is raw, uncompressed audio. You can send audio in 100-millisecond chunks. For client-side applications, ephemeral tokens on the v1alpha endpoint avoid exposing your API key.
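The fixed formats make chunk sizes easy to compute. A 100 ms input chunk holds 16,000 samples/s × 0.1 s = 1,600 samples, or 3,200 bytes at 2 bytes per sample; the same 100 ms of 24 kHz output is 4,800 bytes. The sketch below slices captured input into 100 ms chunks and wraps received output PCM in a WAV header for playback using Python's standard `wave` module. The helper names are illustrative, and the code assumes the raw bytes are already 16-bit mono little-endian, as the model expects and returns, on a little-endian host.

```python
import wave

IN_RATE = 16_000   # Hz, input: 16-bit mono little-endian PCM
OUT_RATE = 24_000  # Hz, output: 16-bit mono little-endian PCM
SAMPLE_WIDTH = 2   # bytes per 16-bit sample
CHUNK_MS = 100     # chunk length in milliseconds


def bytes_per_chunk(rate: int, chunk_ms: int = CHUNK_MS) -> int:
    """Bytes in one mono 16-bit chunk: samples per chunk times 2 bytes."""
    return rate * chunk_ms // 1000 * SAMPLE_WIDTH


def chunk_input(pcm: bytes, chunk_ms: int = CHUNK_MS) -> list[bytes]:
    """Split raw 16 kHz input audio into fixed-length chunks (the last may be shorter)."""
    size = bytes_per_chunk(IN_RATE, chunk_ms)
    return [pcm[i : i + size] for i in range(0, len(pcm), size)]


def save_output_wav(pcm: bytes, path: str) -> float:
    """Wrap raw 24 kHz output PCM in a WAV container; return its duration in seconds."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(OUT_RATE)
        wav.writeframes(pcm)
    return len(pcm) / (OUT_RATE * SAMPLE_WIDTH)


if __name__ == "__main__":
    one_second_of_silence = bytes(IN_RATE * SAMPLE_WIDTH)  # 32,000 bytes
    chunks = chunk_input(one_second_of_silence)
    print(len(chunks), "chunks of", len(chunks[0]), "bytes")
```

Running it prints `10 chunks of 3200 bytes`: one second of input becomes ten 100 ms chunks.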

| Aspect | Live agent | Live translation |
| --- | --- | --- |
| Typical role | Assistant that listens, reasons, and acts | Interpreter / real-time translation pipeline |
| Interaction | Turn-based, with interruption handling | Continuous stream processing, no turns |
| Tools | Function calling, Google Search, system instructions | Translation only, no tools or system instructions |
| Input | Text, audio, video, and images | Audio only, for strict latency |
| Configuration | Generation, speech, tools, instructions | targetLanguageCode and echoTargetLanguage |

Use cases

The model targets live interpretation in several settings: Google lists multilingual calls, meetings, classes, and broadcasts. Developer platforms reduce the integration work for real-time media. Agora, Fishjam, LiveKit, Pipecat, and Vision Agents already use the Live API. These platforms handle the complex real-time media streaming infrastructure, so developers can focus on the user experience.

A Google example app demonstrates multilingual dubbing and interpretation. Grab is testing the model for driver-passenger communication in minivans; Grab users make more than 10 million voice calls per month. CJ ENM, LiveKit, and others report positive feedback on quality, accuracy, and low latency.

What changes in Google Meet and Translate

According to Google's official announcement, Google Meet will use Gemini 3.5 Live Translate for speech translation. The table below compares Meet before and after the update.

| Capability | Previous Meet | With Gemini 3.5 Live Translate |
| --- | --- | --- |
| Languages | 5 | 70+ |
| Language pairs per meeting | To and from English only | 2,000+ pairs |
| Access | Existing interface | Updated interface for instant access |

The Meet update is available in private preview to Workspace business customers this month and will roll out more widely later this year. In the Translate app, live translation works with any connected headphones and preserves the speaker's tone across more than 70 languages. Android also gains a listening mode: you hold the phone to your ear like a normal call, and the translated audio plays through the earpiece so people nearby do not hear it.

Key takeaways

  • Gemini 3.5 Live Translate is Google’s latest audio model for live speech-to-speech translation across more than 70 languages.
  • It streams continuously rather than turn by turn, staying a few seconds behind the speaker.
  • Developers configure it through the Live API using targetLanguageCode and echoTargetLanguage; it is audio-only, with 16 kHz input and 24 kHz output.
  • It is rolling out in the Gemini Live API, Google Meet (5 → 70+ languages), and the Translate app.
  • All generated audio carries an imperceptible SynthID watermark so it can be detected.

Check out the Model card and Technical details.
