Gemini 3.5 Flash Is Faster and Smarter Than Its Predecessor — And Considerably More Expensive
Google DeepMind has released Gemini 3.5 Flash, and the headline numbers look good: faster inference, better benchmarks, improved multimodal support. The catch is that running it costs 5.5 times more than the model it replaces. That's not a rounding error.
Artificial Analysis, which received early access, found that per-token prices have tripled. Input now costs $1.50 per million tokens and output $9.00, up from $0.50 and $3.00 for Gemini 3 Flash. Per token, it still undercuts Gemini 3.1 Pro, which sits at $2.00 and $12.00. But per-token pricing has become a dangerously misleading way to compare costs.
The reason is token consumption. On agentic benchmark tasks, Gemini 3.5 Flash burns through so many more input tokens that its total cost ends up 75 percent higher than that of Gemini 3.1 Pro, the model it supposedly sits below. The context window remains at one million tokens, so the ceiling hasn't changed, just the rate at which you hit it.
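To make the gap between per-token price and per-task cost concrete, here is a minimal Python sketch using the list prices quoted above. The function names, the model keys, and the default input/output mix are illustrative assumptions, not figures from Artificial Analysis; the sketch only shows how much extra token usage it takes to wipe out a per-token discount.

```python
# Prices in USD per million tokens (input, output), as quoted in the article.
PRICES = {
    "gemini-3-flash": (0.50, 3.00),
    "gemini-3.5-flash": (1.50, 9.00),
    "gemini-3.1-pro": (2.00, 12.00),
}


def task_cost(model, input_tokens, output_tokens):
    """Total cost in USD for one task, given its token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def blended_price(model, input_share=0.9):
    """Average price per million tokens for a given input/output mix.

    input_share is a hypothetical mix (90% input tokens), chosen only
    for illustration.
    """
    in_price, out_price = PRICES[model]
    return in_price * input_share + out_price * (1 - input_share)


def token_multiple_for_cost_ratio(model_a, model_b, cost_ratio, input_share=0.9):
    """How many times more tokens model_a must use than model_b for its
    total cost to be cost_ratio times model_b's, assuming the same mix."""
    return cost_ratio * blended_price(model_b, input_share) / blended_price(model_a, input_share)


# Flash is priced at 75% of Pro per token, so costing 75% more in total
# implies using a bit more than twice as many tokens on the same task.
multiple = token_multiple_for_cost_ratio("gemini-3.5-flash", "gemini-3.1-pro", 1.75)
print(f"Token multiple needed: {multiple:.2f}x")
```

Because both Flash prices are exactly three quarters of the corresponding Pro prices, the result does not depend on the assumed input/output mix: under these list prices, a 75 percent higher bill corresponds to roughly 2.3 times as many tokens.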
This pattern isn't unique to Google. Anthropic's Opus 4.7 quietly got 30 to 40 percent more expensive through higher token usage. OpenAI's GPT-5.5 came in 50 to 90 percent more expensive than GPT-5.4. Google went further and raised both token prices and token consumption at the same time.
Better, but not without caveats
On the Artificial Analysis Intelligence Index, Gemini 3.5 Flash scores 55 — nine points above its predecessor, putting it ahead of Grok 4.3 and Claude Sonnet 4.6. Hallucination rates dropped sharply, down 31 percentage points to 61 percent. That sounds dramatic until you see that MiMo-V2.5-Pro and Grok 4.3 are both sitting at 25 percent. Meaningful progress, but nowhere near the front of the pack.
The biggest gains are in agentic tasks, which is also where things get expensive. On GDPval-AA, a benchmark testing real agent tasks with web and shell access, Gemini 3.5 Flash scored an Elo of 1,656 — a massive jump from Gemini 3 Flash's 1,204, and close to GPT-5.4's 1,674. Strong result.
The price of that performance is interaction steps. Gemini 3.5 Flash averages 49 turns per task. Claude Opus 4.7 takes 45, GPT-5.4 takes 40, and Gemini 3.1 Pro manages the same work in 23. Every extra turn inflates the input token count, since an agent typically sends the accumulated conversation again with each step, which is why the total cost ends up higher than that of a nominally pricier Pro model.
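A rough Python sketch shows why turn count matters so much for input tokens. It assumes a simple agent loop in which the whole context is resent on every turn and grows by a fixed amount each time; the function name and the token sizes (2,000 tokens of starting context, 1,000 added per turn) are hypothetical, prompt caching is ignored, and the output is not meant to reproduce the benchmark's cost figures.

```python
def cumulative_input_tokens(turns, base_context, added_per_turn):
    """Total input tokens billed across a task when the full context
    is resent on every turn (a simplifying assumption)."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context            # the whole context is sent as input this turn
        context += added_per_turn   # model output and tool results join the context
    return total


# Turn counts from the article; context sizes are made up for illustration.
flash_tokens = cumulative_input_tokens(49, base_context=2_000, added_per_turn=1_000)
pro_tokens = cumulative_input_tokens(23, base_context=2_000, added_per_turn=1_000)

print(f"49 turns: {flash_tokens:,} input tokens")
print(f"23 turns: {pro_tokens:,} input tokens")
print(f"Ratio: {flash_tokens / pro_tokens:.1f}x")
```

Under these assumptions, input tokens grow roughly with the square of the turn count, so a little more than twice as many turns produces more than four times as many input tokens. Real agent runs will differ, but the direction of the effect is the same.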
Coding is a real problem
For a model with these ambitions, the coding benchmark is awkward. On the Artificial Analysis Coding Index, Gemini 3.5 Flash scores 45. Gemini 3.1 Pro Preview scores 55. GPT-5.5 hits 59. Even Claude Sonnet 4.5 at 51 beats it. This matters because coding is one of the highest-demand use cases for exactly this type of model, and it's also deeply intertwined with agentic workflows. Being strong at agent tasks but weak at coding undercuts a significant chunk of the practical appeal.
Where it genuinely excels
Speed is the real story. Gemini 3.5 Flash outputs over 280 tokens per second, roughly 70 percent faster than its predecessor and faster than any comparable model at this intelligence level. For latency-sensitive applications, that's genuinely useful.
Multimodal support is also a differentiator. Unlike Claude Opus 4.7, Grok 4.3, and GPT-5.5, which are all limited to image input, Gemini 3.5 Flash handles video and audio as well. On the multimodal benchmark MMMU-Pro, it scored 84 percent, the highest result recorded so far. Google holds the top two spots, with Gemini 3.1 Pro second at 82 percent.
The broader cost problem
The price trajectory across the industry reflects what these models are actually being built for. Complex, multi-step agentic tasks require more compute than answering a single question. Unless hardware inference costs fall faster than per-task compute demands rise, the cost of running these models will keep climbing.
For simpler workloads, older or lighter models remain available, with Gemini 3.1 Flash-Lite being the obvious option in Google's lineup. But for companies already committed to heavier AI workflows, the return-on-investment calculation is getting difficult. Discrete tasks like translation or code generation are at least measurable. The harder question is what you're paying for when AI assists with knowledge work, and whether the productivity gains, which tend to be diffuse and slow to show up, ever justify increasingly expensive model bills.