
Google Ships Gemini 3.5 Flash, an Agentic Assistant Called Spark, and a Do-Everything Model Nobody Fully Understands Yet

19 May 2026

Google has announced Gemini 3.5 Flash, a faster and more efficient AI model designed to make complex agentic tasks viable at scale, boasting nearly 300 tokens per second while matching the benchmark performance of larger, slower frontier models. Alongside it, Gemini Spark is Google's first dedicated AI agent, running 24/7 in the cloud to autonomously handle tasks across Google's ecosystem (such as monitoring emails, generating summaries, and building slide decks), and will initially be available to AI Ultra subscribers. Google also unveiled Gemini Omni, a new multimodal model intended to eventually handle any type of input and output (text, image, video, audio) from a single unified model, though for now it is launching with video generation only, replacing Veo in Google's products.

A year ago at Google I/O, the company was still flogging the Gemini 2.5 family. Since then we've burned through 3.0 and 3.1, and now here we are at 3.5. The pace is either impressive or exhausting depending on your tolerance for version numbers.

Gemini 3.5 Flash is rolling out today across Google's product stack, and yes, Google is once again claiming a Flash model beats its previous-generation Pro. This has become something of a tradition at this point, but the numbers do back it up. The new model can output close to 300 tokens per second while posting benchmark scores comparable to frontier models running at roughly a quarter of that speed. For agentic workloads, which tend to be long-running and computationally expensive, that efficiency gap matters enormously.
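To put that gap in concrete terms, here is a back-of-the-envelope sketch in Python. The two throughput figures come from the claims above; the step count and tokens per step are made-up illustrative numbers, not anything Google has published, and the calculation ignores input processing, tool calls, and network latency.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a number of output tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


FLASH_TPS = 300.0            # "close to 300 tokens per second"
SLOWER_TPS = FLASH_TPS / 4   # "roughly a quarter of that speed"

# Hypothetical agent run (illustrative numbers only)
STEPS = 40
TOKENS_PER_STEP = 1500

total_tokens = STEPS * TOKENS_PER_STEP
flash_total = generation_seconds(total_tokens, FLASH_TPS)
slower_total = generation_seconds(total_tokens, SLOWER_TPS)

print(f"Flash-speed run:  {flash_total:.0f} s")
print(f"Quarter-speed run: {slower_total:.0f} s")
```

On those assumed numbers, the same run takes a little over three minutes at Flash speed and more than thirteen at a quarter of it, before any difference in price is counted.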

Tulsee Doshi, who leads product management for Gemini, credits improved post-training techniques and feedback gathered from real developer usage, particularly through Antigravity, Google's in-house IDE. The results show up most clearly in coding tasks. On Terminal Bench and SWE-Bench Pro, 3.5 Flash comfortably beats older Flash models and nudges ahead of Gemini 3.1 Pro. It also sits in roughly the same territory as OpenAI's GPT 5.5, a model that costs considerably more to run.

UI control is another area where the new model makes a difference. Getting AI to navigate interfaces built for humans, clicking the right things in the right sequence across multiple steps, is genuinely hard and computationally heavy. Doshi argues 3.5 Flash handles this better because the quality-to-cost ratio has finally reached a point where it's worth attempting at scale. On OSWorld-Verified, which tests models against real computing environments, 3.5 Flash again beats older versions and roughly matches GPT 5.5.

Internally at Google, the jump from 3.1 Pro to 3.5 Flash has apparently been dramatic enough to measure in their own codebases. 'Massive, massive' was Doshi's choice of words, which is either genuine enthusiasm or the kind of thing you say at a press event. Probably both.

Antigravity 2.0 ships alongside the model, adding support for parallel sub-agent workflows. The idea is that 3.5 Flash can spawn multiple agents working simultaneously on different parts of a problem. This is only practical because the model is cheap enough to run several instances without the cost becoming absurd.
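The general shape of a parallel sub-agent workflow can be sketched with Python's standard asyncio library. This is not Antigravity's API, which the announcement did not detail; `run_sub_agent` and `orchestrate` are hypothetical names, and a short sleep stands in for a real model call.

```python
import asyncio


async def run_sub_agent(name: str, task: str) -> str:
    """Hypothetical sub-agent: a real one would call a model here."""
    await asyncio.sleep(0.01)  # stand-in for model and network latency
    return f"{name} finished: {task}"


async def orchestrate(tasks: list[str]) -> list[str]:
    """Fan tasks out to one sub-agent each and wait for all of them."""
    coros = [run_sub_agent(f"agent-{i}", task) for i, task in enumerate(tasks)]
    # gather runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*coros)


results = asyncio.run(
    orchestrate(["write tests", "refactor parser", "update docs"])
)
for line in results:
    print(line)
```

Because the sub-agents wait concurrently, wall-clock time is set by the slowest one rather than the sum, but the bill still scales with the number of agents, which is why per-instance cost decides whether the pattern is practical.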

Beyond Antigravity, 3.5 Flash is heading to the Gemini app, the API, AI Studio, Android Studio, and Google's enterprise products. The Pro variant is already in internal testing and should arrive next month.

Gemini Spark: an agent that lives in Google's cloud

If 3.5 Flash is the engine, Gemini Spark is the first vehicle built around it aimed at regular users. Spark runs continuously in Google's infrastructure, untethered from any specific device or browser session. It has access to your Google Drive, Gmail, Calendar, and the rest of the ecosystem, and it acts on your instructions over time rather than just responding to one-off prompts.

In practice this looks like: set Spark to watch for certain types of emails and fold them into a daily digest, have it monitor your meetings and produce summaries with action items, or point it at a sprawling collection of documents and ask it to build a presentation. Google is careful to say it will ask for confirmation before taking any 'high-stakes' action, though the definition of 'high-stakes' is doing a lot of work there.

Doshi has apparently been using Spark daily during internal testing. Her examples included building a slide deck on model performance stats ahead of I/O ('it turned out beautifully') and tracking developmental milestones for her infant ('I'm treating my child like an AI model,' she acknowledged). Both are genuinely interesting use cases, though the second one will give some people pause.

The privacy calculus here is real. Handing an AI model persistent access to your email, documents, and calendar is a significant ask. Google is betting that if the product is useful enough, people will get over it, which is probably right. People share an enormous amount with Google already that would have seemed dystopian in 2010.

Spark launches for AI Ultra subscribers next week. Google has restructured its Ultra pricing: there's now a tier at £100 per month for access to the latest features, and the previous top tier has dropped from £200 to £150 for those who want higher token limits. The plan is to eventually roll Spark out to free users as well, though no timeline was given.

Gemini Omni: the everything model, minus most of the everything

The more interesting and more uncertain announcement is Gemini Omni Flash. This is a genuinely multimodal model, meaning it is designed to accept any kind of input and produce any kind of output: text, images, video, audio. Right now it does video, replacing Veo 3 in products like the Gemini app, YouTube, and the Flow tool.

Omni sits outside the 3.5 branch and appears to represent a separate architectural direction. The current state of Google's AI output is a patchwork: images go through Nano Banana, music through Lyria, video through Veo, and so on. Developers have to connect to the right API for each modality, and not everything is available everywhere. Omni is Google's attempt to eventually collapse this into a single model that handles everything.
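A toy dispatch table shows what that patchwork looks like from a developer's side, and what changes as Omni absorbs modalities. The strings below are the product names from this article, not real API identifiers, and `route` is a hypothetical helper rather than anything in Google's SDKs.

```python
# Product names as reported in the article; not actual API model identifiers.
SPECIALIST_BY_MODALITY = {
    "image": "Nano Banana",
    "music": "Lyria",
    "video": "Veo",
}

# At launch, Omni handles video output only.
OMNI_MODALITIES = frozenset({"video"})


def route(modality: str, unified: frozenset[str] = OMNI_MODALITIES) -> str:
    """Pick the model for an output modality (hypothetical helper)."""
    if modality in unified:
        return "Gemini Omni"
    try:
        return SPECIALIST_BY_MODALITY[modality]
    except KeyError:
        raise ValueError(f"no model known for modality: {modality!r}") from None


for m in ("video", "image", "music"):
    print(m, "->", route(m))
```

If the unified approach works, the set of Omni modalities grows until the specialist table is never consulted. If it does not, the table stays.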

'The vision for Gemini has always been multimodal in, multimodal out,' Doshi said. Whether Omni actually delivers on that is unclear even to the Gemini team. They're starting with video output and plan to add other modalities over the coming months, watching to see whether a unified model genuinely outperforms the specialised ones or whether certain use cases still need their own dedicated systems.

The first release is a Flash-tier model. A Pro version is planned but has no release date. If the unified approach proves out, Omni could eventually become the foundation for future Gemini releases. If it doesn't, Google will presumably keep routing prompts to whichever specialist model seems most appropriate and hope nobody notices the seams.
