Beyond the boast: Did Google’s Gemini really outshine ChatGPT?
Article · August 23, 2025


By Zakariae BEN ALLAL · Created on Sat Aug 23, 2025


TL;DR: In 2023, DeepMind CEO Demis Hassabis said Google’s next AI, Gemini, would outshine ChatGPT. Since then, Google shipped Gemini 1.0 and 1.5, with native multimodality and very long context windows. On Google’s own benchmarks, Gemini Ultra reached state-of-the-art results (including the first 90% score on MMLU), and 1.5 brought a 1M-token context window. But independent, crowd-ranked evaluations show no runaway winner: Gemini, OpenAI’s GPT‑4o, and Anthropic’s Claude 3.5 trade places at the top depending on the task. Verdict: Gemini delivered major strengths (long context, tight product integration), but whether it “outshines” depends on what you measure.

The claim and the race it sparked

Shortly after Google merged its Brain team with DeepMind to form Google DeepMind in 2023, CEO Demis Hassabis teased a new model family—Gemini—aimed at leapfrogging ChatGPT. Coverage at the time framed it as a direct challenge to OpenAI’s lead and a test of Google’s broader AI strategy built into Search, Android, and Workspace.[1][4]

What Google actually shipped

Gemini 1.0 (Ultra, Pro, Nano)

Google formally introduced Gemini in December 2023 with three sizes—Ultra (flagship), Pro (versatile), and Nano (on-device)—and emphasized native multimodality (text, images, audio, and more) rather than bolting modalities on after the fact. Google reported state-of-the-art results on a wide set of benchmarks, highlighting a milestone in general knowledge and reasoning:

“Gemini Ultra is the first model to outperform human experts on MMLU with a score of 90.0%.”

— Google, Introducing Gemini [2]

Gemini 1.5: long-context and efficiency

In early 2024, Gemini 1.5 expanded context windows to 1 million tokens for the Pro variant—enough to cram full codebases, long videos, or book-length documents into a single prompt. Google also stressed more efficient inference and better multimodal grounding, enabling richer retrieval over large inputs.[3]

How does Gemini stack up against ChatGPT and Claude?

Benchmarks vs. lived performance

  • Academic benchmarks: Google’s launch materials showed Gemini Ultra leading on many popular tests (e.g., MMLU for knowledge and reasoning).[2] These are valuable but increasingly saturated and sensitive to test contamination and prompt tuning.
  • Crowd rankings: Community “blind taste tests” like the LMSYS Chatbot Arena have repeatedly placed multiple frontier models in the top tier—Gemini 1.5 Pro, GPT‑4o, and Claude 3.5 Sonnet—with small Elo swings that vary by user task and prompt style. No single model dominates across the board.[7]
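Those Arena scores come from pairwise “which answer is better?” votes. As a rough intuition for why closely matched models show only small swings, here is a minimal sketch of a classic Elo update (the k-factor of 32 is illustrative, and the Arena’s actual score aggregation differs from this simple formula):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One pairwise Elo update.

    score_a: 1.0 if model A's answer won the vote, 0.0 if model B's won,
    0.5 for a tie. k caps how far a single vote can move a rating.
    """
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Two closely rated models: one win moves each rating by only ~15 points,
# so leaders stay bunched together until many votes accumulate.
a, b = elo_update(1250.0, 1240.0, 1.0)
```

Because the expected score is near 0.5 for near-equal ratings, each vote nudges the leaderboard only slightly—hence the “narrow pack” of frontier models.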

Multimodality and real-time interaction

  • Google Gemini: Designed for multimodality from the start, with emphasis on parsing long videos, large PDFs, and mixed media in one session. The 1M-token window is a practical differentiator for enterprise document and code understanding.[3]
  • OpenAI GPT‑4o: Introduced in 2024 with real-time text, vision, and audio in a single model, pushing latency down for “talk to your computer” experiences (voice assistants, live screen understanding).[5]
  • Anthropic Claude 3.5: Marketed strength in reasoning and coding, with strong performance in agentic tasks and tool use reported by Anthropic.[6]

Where Gemini clearly shines

  • Long-context retrieval: 1M-token context windows enable end-to-end analysis of lengthy codebases and documents without elaborate chunking heuristics.[3]
  • Ecosystem integration: Deep hooks into Google’s products (Search, Android, Workspace) make Gemini accessible where work already happens—useful for adoption, governance, and data residency.
  • On-device options: The Nano tier targets private, low-latency tasks on phones—important for privacy-sensitive or offline use cases.[2]
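To make “chunking heuristics” concrete: when a document exceeds a model’s context window, pipelines typically split it into overlapping windows and stitch the answers back together. A minimal sliding-window splitter looks like this (window and overlap sizes are illustrative, not any model’s real limits):

```python
def chunk_tokens(tokens, window=8_000, overlap=200):
    """Split a token list into overlapping windows.

    This is the kind of workaround a 1M-token context makes
    unnecessary for many documents: the whole input fits in one chunk.
    """
    if window <= overlap:
        raise ValueError("window must exceed overlap")
    chunks, start = [], 0
    step = window - overlap
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        start += step
    return chunks

doc = list(range(20_000))            # stand-in for a tokenized document
chunks = chunk_tokens(doc)           # several overlapping pieces
one = chunk_tokens(doc, window=1_000_000)  # a long context: one piece
```

Each chunk must be prompted separately and the partial answers merged, which is where retrieval errors creep in—the practical appeal of simply fitting everything in one window.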

Where the verdict is mixed

  • General helpfulness and creativity: Head-to-head user preferences vary across tasks and prompts; GPT‑4o and Claude 3.5 often match or beat Gemini in crowd ratings.[7]
  • Audio-first experiences: GPT‑4o’s low-latency voice and vision demos set a high bar for real-time assistants, although Google showcases similar directions with Gemini; performance depends on deployment and device.

So… did Gemini outshine ChatGPT?

It depends on the yardstick. If you value long-context multimodality and tight integration with Google’s stack, Gemini delivered—often spectacularly—on the original promise. If you judge by day-to-day “helpfulness” across diverse prompts, the independent consensus through 2024 showed several leaders in a narrow pack. In other words: Gemini didn’t run away with the race, but it decisively moved the field forward and reset expectations for context length and media breadth.

What to watch next

  1. Reasoning reliability: Better tool-use, planning, and error recovery—especially on multi-step, real-world tasks.
  2. Latency and cost: Real-time multimodal assistants require tight budgets for both; efficiency wins will shift adoption.
  3. Enterprise controls: Auditability, data governance, and content provenance (e.g., watermarking and metadata) at scale.
  4. Agentic workflows: Safe, supervised autonomy for workflows that span apps, data sources, and devices.

Sources

  1. DeepMind CEO says Google’s next AI system Gemini will outshine ChatGPT — TechSpot (via Google News)
  2. Google — Introducing Gemini (Dec 2023)
  3. Google — Gemini 1.5: long-context and multimodal updates (Feb 2024)
  4. Google — Announcing Google DeepMind (Apr 2023)
  5. OpenAI — Introducing GPT‑4o (May 2024)
  6. Anthropic — Claude 3.5 Sonnet (June 2024)
  7. LMSYS — Chatbot Arena (crowd-sourced model comparisons)

Thank You for Reading this Blog and See You Soon! 🙏 👋
