

Introduction
Every technological revolution is an opportunity to advance scientific discovery, accelerate human progress, and improve people’s lives. I believe that the AI transformation we are witnessing right now will be the most profound change of our lifetime, far surpassing the changes brought by mobile technology or the internet. AI has the potential to create opportunities for people around the world, whether in everyday life or in achieving extraordinary accomplishments. It will bring a new wave of innovation and economic progress, driving the development of knowledge, learning, creativity, and productivity on an unprecedented scale.
What excites me is the opportunity to empower everyone around the world with AI.
As an “AI-first” company, we have come a long way over nearly eight years and have been accelerating our progress: now, millions of users are utilizing generative AI through our products to accomplish tasks that were impossible just a year ago, such as getting answers to more complex questions or collaborating and creating with new tools. Meanwhile, developers are using our models and infrastructure to build new generative AI applications, and startups and enterprises around the world are growing with the help of our AI tools.
This is incredible momentum, and yet we have only begun to scratch the surface of what is possible.
We are undertaking this work boldly and responsibly. This means we remain ambitious in our research, pursuing capabilities that bring great benefits to humanity and society while establishing safeguards and collaborating with governments and experts to address the risks that come with the increasing capabilities of AI. We are also continually investing in the best tools, foundational models, and infrastructure, introducing them into our products and other areas based on our AI principles.
Now, we are embarking on the next chapter with Gemini. Gemini is our most capable and versatile model to date, with state-of-the-art performance across many leading benchmarks. Our first version, Gemini 1.0, is optimized for three different sizes: Ultra, Pro, and Nano. These are the first models of the Gemini era and the first realization of the vision we set out earlier this year when we formed Google DeepMind. This new era of models represents one of the biggest science and engineering efforts we have undertaken as a company. I am genuinely excited about the future and the opportunities Gemini will bring to people around the world.
— Sundar
CEO of Google and Alphabet

Introducing Gemini
Author: Demis Hassabis
CEO and Co-founder of Google DeepMind,
On behalf of the Gemini team
Like many researchers, I have regarded AI as the focus of my life's work. From writing AI programs for computer games as a teenager to my years as a neuroscience researcher trying to understand how the brain works, I have always believed that if we could build smarter machines, we could harness them to benefit humanity in incredible ways.
Empowering the world with AI, responsibly, is the commitment that drives our work at Google DeepMind. For a long time, we have wanted to build a new generation of AI models inspired by the way people understand and interact with the world: AI that feels less like a smart piece of software and more like something useful and intuitive, such as an expert helper or assistant.
Today, as we launch Gemini, we are one step closer to this vision, as it is the most powerful and versatile model we have built to date.
Gemini is the result of extensive collaboration across various teams at Google, including Google Research. It was designed from the outset to be a multimodal model, meaning it can understand, manipulate, and combine different types of information, including text, code, audio, images, and video, in a fluent manner.
Gemini is also our most flexible model to date, capable of running efficiently across all devices, from data centers to mobile devices. Its advanced capabilities will significantly improve how developers and enterprise customers build and scale with AI.
We optimized Gemini 1.0, our first version, for three different sizes:
Gemini Ultra — our largest and most powerful model, suitable for highly complex tasks.
Gemini Pro — our best model for a variety of tasks.
Gemini Nano — our most efficient model on edge devices.
Advanced Performance
We have been rigorously testing the Gemini model and evaluating its performance across various tasks. From understanding natural images, audio, and video to mathematical reasoning, Gemini Ultra outperformed the current state-of-the-art in 30 out of 32 academic benchmarks widely used in large language model (LLM) research and development.
Gemini Ultra scored 90.0%, making it the first model to surpass human experts in the MMLU (Massive Multitask Language Understanding) test, which comprehensively tests world knowledge and problem-solving abilities across 57 subjects, including mathematics, physics, history, law, medicine, and ethics.
For MMLU, our new benchmarking method allows Gemini to utilize its reasoning capabilities to think more carefully before answering difficult questions, resulting in significant improvements over simply responding based on first impressions.
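To make this idea concrete, here is a minimal sketch of what such a "think before answering" strategy might look like: sample several reasoned answers, trust the majority vote only when there is enough consensus, and otherwise fall back to the model's first impression. The sample_answer and greedy_answer callables, the sample count, and the threshold are illustrative assumptions, not Gemini's actual implementation.

from collections import Counter
from typing import Callable

def answer_with_deliberation(question: str,
                             sample_answer: Callable[[str], str],
                             greedy_answer: Callable[[str], str],
                             num_samples: int = 32,
                             consensus_threshold: float = 0.6) -> str:
    """Illustrative sketch: sample several reasoned answers and fall back
    to the model's single greedy answer when they disagree."""
    # Draw multiple reasoned answers (e.g. at a non-zero temperature).
    samples = [sample_answer(question) for _ in range(num_samples)]
    top_answer, count = Counter(samples).most_common(1)[0]

    # If the samples largely agree, trust the majority vote; otherwise
    # return the "first impression" produced by greedy decoding.
    if count / num_samples >= consensus_threshold:
        return top_answer
    return greedy_answer(question)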

Gemini surpasses current state-of-the-art performance on a range of benchmarks, including text and coding.
Gemini Ultra also achieved a state-of-the-art score of 59.4% on the new MMMU benchmark, which consists of multimodal tasks spanning different domains that require deliberate reasoning.
In our image benchmarks, Gemini Ultra outperformed previous state-of-the-art models without assistance from optical character recognition (OCR) systems that extract text from images for further processing. These benchmarks highlight Gemini's native multimodality and point to its potential for more complex reasoning.

Gemini surpasses current state-of-the-art performance on a range of multimodal benchmarks.
Next-Generation Capabilities
So far, the standard approach to creating multimodal models has been to separately train components for different modalities and then stitch them together to roughly simulate certain functions. These models can sometimes perform well on specific tasks like describing images, but they struggle with more conceptual and complex reasoning.
We designed Gemini to be natively multimodal, pre-training across different modalities from the start. We then fine-tuned it with additional multimodal data to further enhance its effectiveness. This helps Gemini understand and reason about various types of inputs smoothly from the initial stage, far surpassing existing multimodal models, with capabilities that are state-of-the-art across nearly all domains.
Complex Reasoning
Gemini 1.0 has sophisticated multimodal reasoning capabilities that help it make sense of complex written and visual information. This makes it uniquely skilled at uncovering knowledge that is difficult to discern amid vast amounts of data.
Its remarkable ability to extract insights from hundreds of thousands of documents by reading, filtering, and understanding information will help deliver new breakthroughs at digital speeds in many fields, from science to finance.
Understanding Text, Images, Audio, and More
Gemini 1.0 is trained to recognize and understand text, images, audio, and more simultaneously, allowing it to better grasp nuanced information and answer questions related to complex subjects. This makes it particularly adept at explaining reasoning in complex subjects like mathematics and physics.
Advanced Coding Capabilities
Our first generation of Gemini can understand, explain, and generate high-quality code in the world’s most popular programming languages, such as Python, Java, C++, and Go. Its ability to work across languages and reason about complex information makes it one of the world’s leading coding foundation models.
Gemini Ultra excels on several coding benchmarks, including HumanEval, an important industry standard for evaluating performance on coding tasks, and Natural2Code, our internal held-out dataset that uses author-generated sources rather than information from the web.
Gemini can also serve as the engine for more advanced coding systems. Two years ago, we showcased AlphaCode, the first AI code generation system to perform at a competitive level in programming competitions.
We created a more advanced code generation system, AlphaCode 2, using a specialized version of Gemini, which excels at solving competitive programming problems that require not only coding skills but also complex mathematics and theoretical computer science knowledge.
When evaluated on the same platform as the original AlphaCode, AlphaCode 2 shows massive improvements. It solves nearly twice as many problems, and we estimate it performs better than 85% of competition participants, up from nearly 50% for AlphaCode. It performs even better when programmers collaborate with it to define certain properties for the code samples to follow.
We are excited that programmers can increasingly leverage powerful AI models as collaborative tools to help them reason through problems, propose code designs, and assist in implementation, allowing them to release applications faster and design better services.
More Reliable, Scalable, and Efficient
We trained Gemini 1.0 at scale on AI-optimized infrastructure using Google-designed TPUs v4 and v5e. We designed it to be the most reliable, scalable, and efficient model for inference.
On TPUs, Gemini runs significantly faster than earlier, smaller-scale, and less powerful models. These custom-designed AI accelerators have been at the core of Google’s AI-powered products serving billions of users, such as Search, YouTube, Gmail, Google Maps, Google Play, and Android. They also enable companies around the world to train large-scale AI models economically and efficiently.
Today, we also released the most powerful, efficient, and scalable TPU system to date, Cloud TPU v5p, designed to support the training of cutting-edge AI models. The new generation of TPUs will accelerate the development of Gemini, helping developers and enterprise customers train large-scale generative AI models faster, leading to quicker launches of new products and features.

A row of Cloud TPU v5p AI accelerator supercomputers inside Google data centers.
Responsibility and Safety at the Core
At Google, we are committed to advancing AI boldly and responsibly in all our work. Based on Google AI principles and our strong safety policies across all products, we are adding new safeguards to meet the multimodal capabilities of Gemini. At every stage of development, we consider potential risks and strive to test and mitigate these risks.
Gemini has the most comprehensive safety evaluations of all AI models at Google to date, including bias and toxicity assessments. We have conducted innovative research on potential risk areas such as cyber attacks, persuasion, and autonomy, and applied Google Research’s top-tier adversarial testing techniques to help us detect critical safety issues before deploying Gemini.
To identify blind spots in our internal assessment methods, we are collaborating with multiple external experts and partners to test our models through stress tests covering various issues.
To diagnose content safety issues during Gemini's training phase and ensure its output follows our policies, we use a number of benchmarks, including Real Toxicity Prompts, a set of benchmarks developed by experts at the Allen Institute for AI that contains 100,000 prompts with varying degrees of toxicity pulled from the web. We will share more details about this work soon.
To reduce harm, we built dedicated safety classifiers to identify, flag, and filter out content involving violence or negative stereotypes. Combined with robust filters, this layered approach is designed to make Gemini safer and more inclusive for everyone. Additionally, we continue to address known challenges for models such as factuality, grounding, attribution, and corroboration.
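As a rough illustration of what a layered approach like this can look like, the sketch below runs a candidate output through several independent harm classifiers and blocks it if any score crosses a threshold. The category names, the keyword stand-ins, and the threshold are hypothetical, not Gemini's actual safety stack.

from typing import Callable, Dict

# Hypothetical threshold above which a candidate output is blocked.
SAFETY_THRESHOLD = 0.8

def passes_safety_filters(candidate: str,
                          classifiers: Dict[str, Callable[[str], float]]) -> bool:
    """Illustrative layered filter: each classifier scores one harm
    category (e.g. violence, negative stereotypes); any high score
    blocks the output."""
    scores = {name: clf(candidate) for name, clf in classifiers.items()}
    return all(score < SAFETY_THRESHOLD for score in scores.values())

# Example wiring with stand-in classifiers; real systems would use
# trained models rather than keyword checks.
classifiers = {
    "violence": lambda text: 1.0 if "attack" in text.lower() else 0.0,
    "stereotypes": lambda text: 0.0,  # placeholder score
}
print(passes_safety_filters("A friendly greeting.", classifiers))  # True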
Responsibility and safety are always at the core of how we develop and deploy our models. This is a long-term commitment that requires building collaboratively, so we are partnering with organizations such as MLCommons, the Frontier Model Forum and its AI Safety Fund, and our Secure AI Framework (SAIF), which was designed to help mitigate security risks specific to AI systems across the public and private sectors, to set best practices and safety standards. Throughout Gemini's development, we will continue to work with researchers, governments, and civil society groups around the world.
Bringing Gemini to the World
Gemini 1.0 is now available across various products and platforms:
Gemini Pro in Google Products
We are bringing Gemini to billions of people through Google’s products.
Starting today, Bard will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, and understanding. This is Bard’s biggest upgrade since its launch.
It will be available in English across more than 170 countries and regions, and we plan to expand to different modalities and support new languages and regions in the coming months.
We are also using Gemini on Pixel. The Pixel 8 Pro is the first smartphone equipped with Gemini Nano, enabling new features like “Summarize” in the Recorder app and launching “Smart Reply” in Gboard, starting with WhatsApp and more messaging apps to come next year.
In the coming months, Gemini will be applied to more of our products and services, including Search, Ads, Chrome, and Duet AI.
We have already begun experimenting with Gemini in Search, where it is making our Search Generative Experience (SGE) faster for users, with a 40% reduction in latency for English searches in the U.S., alongside improvements in quality.
Build Your Products with Gemini
Starting December 13, developers and enterprise customers can access Gemini Pro through the Gemini API in Google AI Studio or Google Cloud Vertex AI.
Google AI Studio is a free, web-based developer tool for quickly prototyping and launching apps with an API key. When a fully managed AI platform is needed, Vertex AI allows customization of Gemini with full data control, and benefits from additional Google Cloud capabilities for enterprise security, safety, privacy, and data governance and compliance.
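As an example, a first call to Gemini Pro from Python can be as short as the minimal sketch below, assuming the google-generativeai SDK is installed and the placeholder key is replaced with one issued in Google AI Studio; the prompt itself is illustrative.

# Minimal sketch of calling Gemini Pro via the Google AI Python SDK.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key created in Google AI Studio

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "Explain in two sentences what makes a model natively multimodal."
)
print(response.text)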
Android developers can also use our most efficient model on edge devices, Gemini Nano, through AICore. AICore is a new system feature in Android 14, supported starting from Pixel 8 Pro devices. Sign up for the AICore preview.
Gemini Ultra Coming Soon
As for Gemini Ultra, we are currently completing extensive trust and safety checks, including red-teaming by trusted external parties, and further refining the model with fine-tuning and reinforcement learning from human feedback (RLHF) before making it broadly available.
During the refinement of the model, we will provide Gemini Ultra to select customers, developers, partners, and safety and responsibility experts for early testing and feedback. Then, we will offer the model to developers and enterprise customers early next year.
We will also launch Bard Advanced early next year: a new, cutting-edge AI experience that gives you access to our best models and capabilities, starting with Gemini Ultra.
The Gemini Era: Driving an Innovative Future
This is a significant milestone in the evolution of AI and marks the beginning of a new era for Google, as we continue to innovate rapidly and responsibly enhance the capabilities of our models.
So far, we have made great progress with Gemini, and we are working to extend its capabilities in future versions, including advances in planning and memory, as well as increasing the context window so it can process even more information and give better responses.
We are excited about the limitless possibilities that responsible AI brings to the world, and our innovative future will enhance creativity, expand knowledge, drive scientific advancement, and change the way billions of people live and work globally.