Models

Google Releases Gemma 3: Multimodal Open Models Up to 27B Parameters

Google has launched Gemma 3, a new family of open-weight models featuring multimodal capabilities, 128k token context windows, and support for 140+ languages.

AZAli Zayed · Founder & EditorJune 25, 20262 min read✓ Independently fact-checked

The quick version

Gemma 3 is available in four parameter sizes: 1B, 4B, 12B, and 27B, with both base and instruction-tuned versions.
The 4B, 12B, and 27B models support multimodal inputs, allowing them to process both text and images.
Context window lengths have been expanded to 128k tokens for the 4B, 12B, and 27B models, while the 1B variant supports 32k tokens.
Google reports that the 27B instruction-tuned model outperforms Gemini 1.5-Pro across undisclosed benchmarks.

Google has officially released Gemma 3, its latest iteration of open-weight large language models. The update introduces four distinct parameter sizes—1 billion, 4 billion, 12 billion, and 27 billion—designed to bridge the gap between compact, on-device utility and high-performance reasoning. According to technical documentation from the release, the models now support over 140 languages and offer significantly expanded context windows compared to the previous generation.

The most notable shift in this release is the move toward native multimodality. While the 1B model remains text-only, the 4B, 12B, and 27B variants can ingest and process both image and text inputs. This allows users to perform tasks such as image content analysis, document summarization, and complex question answering. For those building workflows with these models, you can compare their utility against proprietary alternatives in our ChatGPT vs Gemini (2026): Which Is Better? (Tested) analysis.

Why it matters

Gemma 3 represents a significant scaling effort by Google to provide powerful, open-weight tools that do not require building from scratch. To achieve the 128k token context window efficiently, the development team utilized existing Gemma 2 knowledge rather than retraining from the ground up. By pre-training with 32k sequences and scaling to 128k at the final stage, the researchers managed to save significant compute resources. Furthermore, the team adjusted positional embeddings—upgrading the base frequency from 10k to 1M—and optimized KV cache management using a sliding window interleaved attention mechanism.

What it means for you

For developers, the integration with the Hugging Face ecosystem is a primary feature, enabling immediate deployment via Transformers, MLX, Llama.cpp, and Hugging Face Endpoints. The models are designed to be flexible; even for the multimodal versions, users can opt to run them as text-only models by choosing not to load the vision encoder into memory, thereby saving system resources. With the 27B instruction-tuned model reportedly outperforming Gemini 1.5-Pro on internal benchmarks, these models provide a high-performance, cost-effective option for local or private server deployments where data privacy and control are paramount.

128kMaximum context window tokens

Frequently asked questions

What are the parameter sizes for Gemma 3?

Gemma 3 is available in 1B, 4B, 12B, and 27B parameter versions.

Does Gemma 3 support images?

Yes, the 4B, 12B, and 27B models support both image and text inputs.

What is the maximum context window for Gemma 3?

The 4B, 12B, and 27B models support a context window of 128k tokens, while the 1B model supports 32k.

Our tested pick

If you are deciding between Google’s models and OpenAI’s, check our head-to-head comparison to see which performs better for your specific tasks.

ChatGPT vs Gemini (2026): Which Is Better? (Tested) →

Source: Hugging Face. Published June 25, 2026.

Ali Zayed

Founder & Editor · AI Tools Worth

Ali has hands-on tested 50+ AI tools and tracks model releases daily. Every verdict here comes from real, paid usage — never vendor demos or sponsored placements.

AI Tools Worth is independent and unsponsored. Some linked guides contain affiliate links — they never change our verdicts.