
Gemini Embedding 2: Our first natively multimodal embedding model

May 2026

Today we're releasing Gemini Embedding 2, our first natively multimodal embedding model built on the Gemini architecture, in public preview via the Gemini API and Vertex AI.

Expanding on our previous text-only foundation, Gemini Embedding 2 maps text, images, videos, audio, and documents into a single, unified embedding space and captures semantic intent in over 100 languages. One shared space simplifies complex pipelines and improves a wide variety of downstream multimodal tasks, from retrieval augmented generation (RAG) and semantic search to sentiment analysis and data clustering.
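Because every modality lands in the same vector space, one similarity function can compare any pair of inputs, such as a text query against an image or an audio clip. The sketch below assumes cosine similarity as the metric (the post does not name one) and uses small, made-up vectors in place of real model output.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings; real ones have up to 3072 dimensions.
text_query = [0.9, 0.1, 0.0, 0.2]     # stands in for "a dog catching a frisbee"
image_of_dog = [0.8, 0.2, 0.1, 0.3]   # stands in for a photo of a dog
audio_of_rain = [0.0, 0.1, 0.9, 0.1]  # stands in for a recording of rainfall

# Score every candidate against the text query, regardless of its modality.
candidates = {"image_of_dog": image_of_dog, "audio_of_rain": audio_of_rain}
scores = {name: cosine_similarity(text_query, vec) for name, vec in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

The same function ranks images, audio, and text side by side; nothing in it depends on which modality a vector came from.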

New modalities and flexible output dimensions

The model is based on Gemini and leverages its enhanced multimodal understanding to create high-quality embeddings for:

- Text: supports an expansive context of up to 8,192 input tokens

- Images: processes up to 6 images per request, in PNG and JPEG formats

- Videos: supports up to 120 seconds of video input in MP4 and MOV formats

- Audio: natively ingests and embeds audio data without the need for intermediate text transcription

- Documents: directly embeds PDF files of up to 6 pages
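The per-request limits above can be checked on the client before a request is sent. The sketch below is a hypothetical pre-flight helper: `EmbedRequest` and `check_limits` are illustrative names, not Gemini API types, token counts are assumed to be obtained separately, and the MIME-type strings are our assumption of how the listed formats are identified.

```python
from dataclasses import dataclass, field
from typing import Optional

# Limits as stated in this post for the preview; confirm against current docs.
MAX_TEXT_TOKENS = 8192
MAX_IMAGES = 6
MAX_VIDEO_SECONDS = 120
MAX_PDF_PAGES = 6
IMAGE_MIME_TYPES = {"image/png", "image/jpeg"}
VIDEO_MIME_TYPES = {"video/mp4", "video/mov"}  # assumed strings for MP4 and MOV

@dataclass
class EmbedRequest:
    """Hypothetical summary of one request's inputs (not an SDK type)."""
    text_tokens: int = 0
    image_mime_types: list[str] = field(default_factory=list)
    video_seconds: float = 0.0
    video_mime_type: Optional[str] = None
    pdf_pages: int = 0

def check_limits(req: EmbedRequest) -> list[str]:
    """Return human-readable problems; an empty list means no limit is exceeded."""
    problems = []
    if req.text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text has {req.text_tokens} tokens (max {MAX_TEXT_TOKENS})")
    if len(req.image_mime_types) > MAX_IMAGES:
        problems.append(f"{len(req.image_mime_types)} images (max {MAX_IMAGES})")
    for mime in req.image_mime_types:
        if mime not in IMAGE_MIME_TYPES:
            problems.append(f"unsupported image type {mime}")
    if req.video_seconds > MAX_VIDEO_SECONDS:
        problems.append(f"video is {req.video_seconds}s (max {MAX_VIDEO_SECONDS}s)")
    if req.video_mime_type is not None and req.video_mime_type not in VIDEO_MIME_TYPES:
        problems.append(f"unsupported video type {req.video_mime_type}")
    if req.pdf_pages > MAX_PDF_PAGES:
        problems.append(f"PDF has {req.pdf_pages} pages (max {MAX_PDF_PAGES})")
    return problems

# A request with too many tokens and too many images reports both problems.
print(check_limits(EmbedRequest(text_tokens=9000, image_mime_types=["image/png"] * 7)))
```

Collecting every problem instead of stopping at the first one lets a caller decide whether to split the input across several requests or reject it outright.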

Beyond processing one modality at a time, the model natively understands interleaved input, so you can pass multiple input modalities (e.g. image + text) in a single request. This lets the model capture the complex, nuanced relationships between different types of media and understand complex real-world data more accurately.

Like our previous embedding models, Gemini Embedding 2 incorporates Matryoshka Representation Learning (MRL), a technique that "nests" information so that the most important features are concentrated in the leading dimensions of each vector. Developers can therefore reduce the output dimension from the default 3072 to balance performance against storage costs. We recommend 3072, 1536, or 768 dimensions for the highest quality.
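Because MRL front-loads information, a smaller embedding is simply a prefix of the full vector. The sketch below truncates a stand-in 3072-dimensional vector to 768 dimensions and re-normalizes it, which we assume is wanted when similarity is computed with cosine or dot product. The google-genai SDK also lets you request a smaller size up front via `types.EmbedContentConfig(output_dimensionality=...)`; the helper functions here are hypothetical and run on plain lists.

```python
import math

def truncate_embedding(values: list[float], dim: int) -> list[float]:
    """Keep the first `dim` (MRL-ordered) dimensions and rescale to unit L2 norm."""
    if not 0 < dim <= len(values):
        raise ValueError(f"dim must be in 1..{len(values)}")
    prefix = values[:dim]
    norm = math.sqrt(sum(v * v for v in prefix))
    if norm == 0:
        raise ValueError("prefix has zero norm")
    return [v / norm for v in prefix]

def float32_storage_bytes(num_vectors: int, dim: int) -> int:
    """Raw storage for float32 vectors (4 bytes per value), ignoring index overhead."""
    return num_vectors * dim * 4

# Hypothetical stand-in for a full 3072-dimensional embedding.
full = [((i % 7) - 3) / 10 for i in range(3072)]
small = truncate_embedding(full, 768)
print(len(small), round(math.sqrt(sum(v * v for v in small)), 6))

# Storage trade-off for one million vectors at the recommended sizes.
for dim in (3072, 1536, 768):
    gib = float32_storage_bytes(1_000_000, dim) / 2**30
    print(dim, f"{gib:.2f} GiB")
```

At float32 precision a 3072-dimensional vector takes 12,288 bytes and a 768-dimensional one takes 3,072 bytes, so dropping to 768 dimensions cuts raw vector storage by a factor of four.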

To see these additions in action, try our lightweight multimodal semantic search demo.

Next-generation performance

Gemini Embedding 2 doesn't just improve on legacy models. It sets a new performance standard for multimodal depth, offers robust voice capabilities, and outperforms leading models on text, image, and video tasks. This measurable improvement and unique multimodal coverage give developers a single model for their diverse integration needs.

Discovering deeper meaning for data

Embeddings are the technology that powers experiences in many Google products. From RAG, where embeddings can play a crucial role in context engineering, to large-scale data management and classic search and analysis, some of our early access partners are already using Gemini Embedding 2 to unlock high-value multimodal applications.
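In a RAG pipeline, the embedding model's job is retrieval: embed the query, find the nearest stored items, and place them into the generation prompt. The sketch below shows that step with a hypothetical in-memory corpus and made-up vectors; a production system would typically store embeddings in a vector database, and the prompt format in `build_context` is our own illustrative choice.

```python
import heapq
import math

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k corpus items with the highest cosine similarity to the query."""
    q = normalize(query)
    scored = (
        (doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for doc_id, vec in corpus.items()
    )
    return heapq.nlargest(k, scored, key=lambda item: item[1])

def build_context(hits: list[tuple[str, float]], texts: dict[str, str]) -> str:
    """Join retrieved passages into a context block for a generation prompt."""
    return "\n\n".join(f"[{doc_id}] {texts[doc_id]}" for doc_id, _ in hits)

# Hypothetical pre-computed embeddings and their source passages.
corpus = {
    "faq-returns": [0.9, 0.1, 0.0],
    "faq-shipping": [0.2, 0.9, 0.1],
    "blog-launch": [0.0, 0.2, 0.9],
}
texts = {
    "faq-returns": "Items can be returned within 30 days.",
    "faq-shipping": "Orders ship within two business days.",
    "blog-launch": "Our new product line launches this spring.",
}

query_vec = [0.8, 0.3, 0.0]  # stands in for the embedding of "How do I return an item?"
hits = top_k(query_vec, corpus, k=2)
print(build_context(hits, texts))
```

Because all modalities share one space, the corpus could just as well hold image, audio, or PDF embeddings, with their captions or extracted text passed along as context.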

Start building today

Get started with the Gemini Embedding 2 model via Gemini API or Vertex AI.

```python
from google import genai
from google.genai import types

# For Vertex AI:
# PROJECT_ID = '<add_here>'
# client = genai.Client(vertexai=True, project=PROJECT_ID, location='us-central1')

# For the Gemini API (reads the API key from the environment):
client = genai.Client()

# Read the image and audio files as raw bytes
with open("example.png", "rb") as f:
    image_bytes = f.read()

with open("sample.mp3", "rb") as f:
    audio_bytes = f.read()

# Embed text, image, and audio in a single request
result = client.models.embed_content(
    model="gemini-embed-2-preview",
    contents=[
        "What is the meaning of life?",
        types.Part.from_bytes(
            data=image_bytes,
            mime_type="image/png",
        ),
        types.Part.from_bytes(
            data=audio_bytes,
            mime_type="audio/mpeg",
        ),
    ],
)

print(result.embeddings)
```

Learn how to use the model in our Gemini API and Vertex AI Colab interactive notebooks. You can also use it through LangChain, LlamaIndex, Haystack, Weaviate, Qdrant, ChromaDB, and Vector Search.

By bringing semantic meaning to the diverse data around us, Gemini Embedding 2 provides the essential multimodal foundation for the next era of a
