Google I/O 2025: The AI Event of the Year (so far)

The energy was palpable at Google I/O, with Google DeepMind and Google for Developers leading the charge. The core message? Getting Google’s “best models into your hands and our products, ASAP”. The focus was heavily on the rapid progress of their models and how they are enabling a new era of AI-powered experiences across the board. It was clear that Google is pushing the frontiers of AI, integrating it deeply into their products and making it accessible to developers.

The Brains Behind the Magic: Gemini and Gemma Updates

At the heart of many announcements were the updates to Google’s foundational AI models: Gemini and Gemma.

Gemini 2.5 Pro: Already a powerhouse, the updated 2.5 Pro is Google’s “most performant and powerful model”. It’s designed for highly complex tasks that require deep reasoning, and coding has proven to be a standout use case. It’s leading across major benchmarks, sweeping the LMArena leaderboard and taking the number 1 spot in the WebDev Arena. For those pushing the boundaries, a new advanced mode called Deep Think is being rolled out to trusted testers, allowing the model to “think through various possible answers to a problem before giving you the best answer,” pushing performance to its limits with cutting-edge reasoning techniques.

Gemini 2.5 Flash: Described as Google’s “most efficient workhorse model,” the updated 2.5 Flash offers one of the best price-performance ratios in the market today. It’s faster and cheaper than Pro, making it suitable for high-volume tasks like summarization. The new Flash is improved across key benchmarks for reasoning, code, and long context, coming in second only to 2.5 Pro on the LMArena leaderboard. Developers will soon be able to set thinking budgets for Flash to control cost and latency.
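
For developers curious what a thinking budget might look like in practice, here’s a minimal sketch using the google-genai Python SDK. The thinking_budget field and the gemini-2.5-flash model name are assumptions based on the announcement, so double-check the API docs before relying on them.

```python
# A hedged sketch: capping "thinking" tokens on Gemini 2.5 Flash.
# Assumes the google-genai Python SDK exposes ThinkingConfig with a
# thinking_budget field and that "gemini-2.5-flash" is a valid model id.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize the trade-offs between batch and stream processing.",
    config=types.GenerateContentConfig(
        # Limit how many tokens the model may spend on internal reasoning,
        # trading some answer quality for lower cost and latency.
        thinking_config=types.ThinkingConfig(thinking_budget=512),
    ),
)
print(response.text)
```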

Other Gemini Models: The lineup also includes 2.0 Flash-Lite (small, fast, cheap), as well as models for on-device processing (Gemini Nano) and embeddings. Older versions like Gemini 2.0 are still available, but the push is towards experimenting with the newer, better-performing 2.5 models.

Gemma 3: Google’s open family of models, Gemma, saw a significant update with Gemma 3. Based on Gemini technologies, Gemma 3 is multimodal, capable of receiving and having conversations about images and video. It’s available in multiple sizes (4B, 12B, 27B), offering a range of performance and hardware requirements. A key aspect is its accessibility – Gemma is designed to be run locally on your own laptop or phone, integrated with Open Source tools, and fine-tuned for specific use cases or domains. New variants like MedGemma (for the health industry, including medical images and text), SignGemma (expanding to sign language understanding), and ShieldGemma (for image safety classification) were also highlighted.
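
If you want to kick the tires locally, a minimal sketch with Hugging Face transformers might look like the following. The checkpoint name (here a small text-only google/gemma-3-1b-it) is an assumption; check the official Gemma model cards for the exact ids and license terms.

```python
# A hedged sketch: running a small Gemma 3 checkpoint locally with
# Hugging Face transformers. "google/gemma-3-1b-it" is an assumed
# text-only checkpoint id; larger sizes (4B/12B/27B) need more memory,
# and you must accept the Gemma license on Hugging Face first.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3-1b-it",
    device_map="auto",  # place weights on a GPU if one is available
)

messages = [{"role": "user", "content": "Explain LoRA fine-tuning in two sentences."}]
result = generator(messages, max_new_tokens=128)

# With chat-style input, the pipeline returns the full conversation;
# the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```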

More Than Just Text: Rich Multimodal and Generative Capabilities

The updates didn’t stop at core models. Google announced significant advancements in multimodal understanding and content generation.

Audio Magic: Gemini TTS (text-to-speech) was launched, capable of generating high-quality audio with customizable emotions, multiple voices, different languages, and even multi-speaker interactions for podcast-like performances. For real-time interactions, the Gemini Live API now offers native audio output models, providing more natural and compelling voices with seamless language switching.
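
As a rough idea of how TTS could look through the Gemini API, here’s a hedged sketch. The preview model id gemini-2.5-flash-preview-tts, the voice name Kore, and the 24 kHz 16-bit mono PCM output format are assumptions drawn from preview documentation and may change.

```python
# A hedged sketch: single-speaker TTS through the Gemini API. The model id
# "gemini-2.5-flash-preview-tts", the voice name "Kore", and the 24 kHz
# 16-bit mono PCM output format are assumptions from preview docs.
import wave

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="Say warmly: welcome back to the show, let's dive into today's topic.",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
    ),
)

# Wrap the raw PCM bytes in a WAV container so any player can open it.
pcm = response.candidates[0].content.parts[0].inline_data.data
with wave.open("narration.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(24000)
    f.writeframes(pcm)
```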

Image Generation & Editing: The latest image generation model, Imagen 4, is coming to the Gemini app, offering “richer” images with “more nuanced colors and fine-grained details”. Crucially, it’s significantly better at text and typography, even making creative choices in font and layout. You can also edit images generated with Imagen 4 directly in the app. A super-fast variant is also available, generating images 10 times faster than the previous model for rapid iteration. Gemini’s native image-output variant also supports image generation and editing through the API.
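
For developers, image generation is also reachable through the Gemini API. The sketch below assumes an Imagen 4 model id (imagen-4.0-generate-preview), which was not yet confirmed at the time of writing; the call shape mirrors the existing Imagen endpoint.

```python
# A hedged sketch: generating an image via the Gemini API's Imagen endpoint.
# "imagen-4.0-generate-preview" is an assumed model id; at the time of
# writing only earlier Imagen versions were publicly documented.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

result = client.models.generate_images(
    model="imagen-4.0-generate-preview",
    prompt="A poster that says 'Hello I/O' in playful hand-lettered typography",
    config=types.GenerateImagesConfig(number_of_images=1),
)

# Each generated image carries its raw bytes; write the first one to disk.
with open("poster.png", "wb") as f:
    f.write(result.generated_images[0].image.image_bytes)
```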

Video Generation & Sound: A major leap was the announcement of Veo 3, Google’s new state-of-the-art video model. Veo 3 not only offers improved visual quality and stronger understanding of physics but also comes with native audio generation, meaning it can generate “sound effects, background sounds, and dialogue” directly in the video. To help creators leverage these capabilities, a new AI filmmaking tool called Flow was launched today, combining Veo, Imagen, and Gemini to assist with creating cinematic content.

Multimodal Understanding: The Gemini API now allows uploading various file types (spreadsheets, docs, video, audio) and analyzing their contents. New video capabilities include analyzing YouTube links, processing up to 6 hours of video at lower resolution settings, dynamic frame rates, video clipping, and image segmentation. The models can understand complex information and perform reasoning or calculations based on the input.
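
Here’s a hedged sketch of what that multimodal flow could look like with the Python SDK: upload a local document through the Files API, then ask about a YouTube link by passing its URL as a file part. The file name and video URL are placeholders, and the URL-part shape is based on preview docs.

```python
# A hedged sketch: multimodal understanding via the Gemini API -- upload a
# local document with the Files API, then ask about a YouTube link by
# passing its URL as a file part. File names and the video URL are
# placeholders, and the URL-part shape is based on preview docs.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# 1) Upload a document and ask a question about it.
report = client.files.upload(file="quarterly_report.pdf")
answer = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[report, "What were the three biggest cost drivers this quarter?"],
)
print(answer.text)

# 2) Ask about a public YouTube video by URL.
video_answer = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part(file_data=types.FileData(file_uri="https://www.youtube.com/watch?v=VIDEO_ID")),
        "Summarize the main announcements made in this video.",
    ],
)
print(video_answer.text)
```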

The Age of Agents: AI Taking Action

A significant theme was the rise of AI agents – systems that combine AI models with tools to take actions on your behalf.

Computer Use: Project Mariner, now evolving into Computer Use capabilities, allows AI agents to interact with browsers and other software. It’s gaining multitasking ability (up to 10 simultaneous tasks) and a teach-and-repeat capability, where you show it a task once and it learns a plan for similar tasks. This is being rolled out to developers via the Gemini API.

Agent Mode in Gemini App: An “experimental version” of Agent Mode is coming soon to subscribers. This mode in the Gemini app uses agents (like Project Mariner) to perform complex tasks like finding apartment listings based on specific criteria, adjusting filters, accessing listing details, and even scheduling tours.

Jules, the Coding Agent: A specific agent tailored for developers, Jules is an asynchronous, agentic coding agent that can help code your own AI agents and handle complex tasks like fixing bugs or updating codebases. It integrates with GitHub and is now in public beta.

Agent Ecosystem: Google is supporting agent frameworks like its own Agent Development Kit (ADK) and collaborating with Open Source tools like LangChain. They also support the Agent2Agent (A2A) protocol and the Model Context Protocol (MCP), making it easier for agents to talk to each other and access other services.

Putting AI to Work: New Tools and Experiences

Beyond the core models and agentic capabilities, Google introduced or updated several tools accessible via the Gemini API:

Google AI Studio: This popular web-based tool lets developers “test out the capabilities of the API before committing to building applications at scale”. It now includes code generation. It’s described as a UI and no-code experience for quickly accessing features.

Search Grounding & URL Context: Search grounding brings “fresh information” into responses using Google Search. A new tool called URL context extracts more in-depth content from a set of URLs (up to 20 links in a prompt) for deep research and analysis, powering features like Google’s research agent.
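
A minimal sketch of Search grounding through the Python SDK is below; the URL context tool follows a similar tool-based pattern, though its exact field names weren’t confirmed at the time of writing, so only the Search case is shown.

```python
# A hedged sketch: grounding a response with Google Search. The Tool shape
# follows the documented google_search field; responses carry grounding
# metadata (sources and the queries the model issued).
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What changed in the latest Gemini model release?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

print(response.text)
print(response.candidates[0].grounding_metadata)  # sources used for grounding
```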

Code Execution & Function Calling: These tools are available via the API for tasks like creating charts or running analysis. Function calling, the bread and butter of agentic apps, now supports asynchronous functions, allowing them to run in the background.
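
As an illustration of function calling, here’s a sketch using the SDK’s automatic mode, where a plain Python function (a hypothetical order-status lookup) is passed as a tool and the SDK handles the call-and-respond loop.

```python
# A hedged sketch: automatic function calling. A plain Python function
# (a hypothetical order-status lookup) is passed as a tool; the SDK builds
# the declaration from the signature and runs the call-and-respond loop.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def get_order_status(order_id: str) -> dict:
    """Look up the shipping status for an order id."""
    # In a real app this would query your database or an internal service.
    return {"order_id": order_id, "status": "shipped", "eta_days": 2}

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Where is order A-1042 right now?",
    config=types.GenerateContentConfig(tools=[get_order_status]),
)
print(response.text)  # the SDK has already executed the function and fed back the result
```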

Structured Outputs: The API now offers more robust and comprehensive structured outputs functionality, letting you get responses that conform to a JSON schema.
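
A quick sketch of structured outputs: pass a Pydantic model as the response schema and parse the JSON the model returns. The Recipe fields here are purely illustrative.

```python
# A hedged sketch: structured outputs with a JSON schema. A Pydantic model
# is used as the response_schema; the Recipe fields are purely illustrative.
from pydantic import BaseModel

from google import genai
from google.genai import types

class Recipe(BaseModel):
    name: str
    prep_minutes: int
    ingredients: list[str]

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Give me a simple weeknight pasta recipe.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Recipe,
    ),
)

recipe = Recipe.model_validate_json(response.text)  # parse into a typed object
print(recipe.name, recipe.prep_minutes, recipe.ingredients)
```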

Safety & Copyright Filters: Developers have access to a set of configurable filters to make applications safer, with control over threshold settings.
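
Here’s what adjusting those thresholds might look like in a request; the category and threshold strings follow the published enum names, but treat the exact values you need as something to look up for your own use case.

```python
# A hedged sketch: tightening the configurable safety filters on a request.
# Category and threshold strings follow the published enum names.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Write a friendly reminder about our community guidelines.",
    config=types.GenerateContentConfig(
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_HARASSMENT",
                threshold="BLOCK_LOW_AND_ABOVE",
            ),
            types.SafetySetting(
                category="HARM_CATEGORY_HATE_SPEECH",
                threshold="BLOCK_LOW_AND_ABOVE",
            ),
        ],
    ),
)
print(response.text)
```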

SDK Integration: The Gemini API has SDKs for Python, JavaScript, and Go. Notably, the SDK now supports MCP, allowing you to combine MCP clients/servers and Gemini API interactions in the same codebase.
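
The sketch below shows one way MCP and the Gemini API could share a codebase: open an MCP client session and hand it to the SDK as a tool. The wiring (passing the live session in tools) and the weather MCP server command are assumptions, so treat this as directional rather than definitive.

```python
# A hedged sketch: using an MCP client session alongside the Gemini API in
# one codebase. Passing the live session in `tools` and the weather MCP
# server command are assumptions -- check the current SDK docs before use.
import asyncio

from google import genai
from google.genai import types
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

client = genai.Client(api_key="YOUR_API_KEY")
server = StdioServerParameters(command="npx", args=["-y", "@example/weather-mcp-server"])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            response = await client.aio.models.generate_content(
                model="gemini-2.5-flash",
                contents="Will I need an umbrella in London tomorrow?",
                # Assumption: the SDK treats the MCP session as a tool and
                # routes any tool calls through it automatically.
                config=types.GenerateContentConfig(tools=[session]),
            )
            print(response.text)

asyncio.run(main())
```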

What This Means For You

Casual Users: Get ready for AI everywhere! Your Google Search experience is transforming with AI Overviews already scaled to over 1.5 billion users. A new AI Mode in search is a “total reimagining” allowing “longer and more complex queries,” follow-up questions, dynamic UI adaptation, and eventually personalized suggestions based on your activity and connected Google apps (like Gmail). Search Live (using your camera with Search) lets you ask questions about what you see in real-time. Gemini Live in the Gemini app offers highly interactive, natural voice conversations, now with camera and screen sharing, rolling out free. Imagine getting personalized email replies that sound like you. Think real-time speech translation in Google Meet to break language barriers. And yes, AI is even coming to online shopping with AI Try-on. Futuristic tech like Google Beam (3D video calls) and Android XR glasses were also teased.

Professional Users & Developers: This is your playground! You have access to some of the most powerful models available via the Gemini API, from the cutting-edge Pro for complex reasoning and coding to the efficient Flash for scaling applications. A generous free tier lets you experiment without worrying about costs. You can use AI Studio for rapid prototyping. The rich set of tools available through the API – for multimodal processing, search grounding, code execution, function calling, and deep research – provides building blocks for diverse applications. The push towards agentic capabilities means you can start building applications that take actions autonomously. The SDKs and support for open frameworks simplify integration. For those who need more control or want to build specialized models, the open Gemma family, including new multimodal and domain-specific variants, offers flexibility for running models locally or fine-tuning them for specific needs.

Free vs. Paid: The Big Question

This is where things get a little nuanced based on the source information.

The Gemini API offers a “very generous” free-of-charge tier to start experimenting.

Google AI Studio is free to use for testing API capabilities.

Gemini Live in the Gemini app is rolling out free on Android and iOS.

AI Overviews have scaled to billions of users.

The new AI Mode in search is coming to everyone in the US.

Gemma models are open and can be downloaded and run locally without needing an API subscription. Domain-specific variants like MedGemma are available on platforms like Vertex AI.

However, one source mentions a new Google AI Ultra subscription at a significant monthly price to get access to the full breadth of Gemini’s power. This contrasts with mentions of features rolling out free.

Real-time speech translation in Google Meet is currently available for subscribers.

Agent Mode in the Gemini app is coming soon to subscribers.

Personal context in AI Mode, which connects to your Google apps like Gmail, is an opt-in feature.

Based on this, it seems Google is making many core AI experiences broadly accessible, including a free tier for developers using the API and free app features like Gemini Live and AI Mode in Search for users. However, some advanced or premium features, particularly agentic capabilities in the app and potentially the very top-tier models or subscription benefits, may be behind a paywall or subscription.

What’s Coming Next?

The pace of innovation isn’t slowing down. Expect to see more agentic capabilities and “higher abstractions” rolling out in the coming months:

The Computer Use tool becoming publicly available soon.

More tool combinations available in the Chat API.

Wider rollout of asynchronous function calling.

Ephemeral tokens coming soon via the API.

The Deep Think mode for Gemini 2.5 Pro will be rolling out soon after trusted testing.

Gemini Diffusion (the experimental text diffusion model) continues development.

Gemini 2.5 Flash will be generally available in early June, with Pro soon after.

The first Google Beam devices are expected later this year.

More languages for real-time speech translation in Google Meet are rolling out.

Features prototyped in Project Astra will continue to graduate into Gemini Live.

Connecting Gemini Live to more Google apps (Calendar, Maps, Keep, Tasks) is coming soon.

Expansion of Gemma variants like SignGemma to support more sign languages.

Complex analysis and data visualization in AI Mode for search coming this summer.

Deep research capabilities integrated into AI Mode.

Personal context coming to AI Mode this summer.

Cutting-edge features from AI Mode gradually integrating into the core search experience.

Continued work on Android XR glasses and their integration with AI features like Search Live.

My Takeaway

As a content creator, this I/O felt like stepping into the future. The advancements in multimodal generation (hello, Veo 3 with audio and Imagen 4’s text capabilities!) are game-changers for creative workflows. The push towards AI agents and tools like Computer Use and Jules signals a shift towards AI assisting not just with ideation but with execution, potentially freeing up time for more creative endeavors.

For casual users, AI is becoming more integrated, natural, and helpful across everyday tasks, from searching the web to communicating. For developers, Google is offering powerful, flexible models and tools with increasing support for the Open Source ecosystem and agentic development. While questions remain about the paid tiers and how they’ll segment access, the sheer volume of innovation showcased yesterday is undeniable.

The best advice? Dive into AI Studio, check out the Gemini API docs, explore the Gemini cookbook on GitHub, and start experimenting. This is just the beginning, and the tools we saw yesterday are shaping the digital landscape we’ll be navigating (and creating within!) tomorrow.
