Google has unveiled a significant technological stride for its Gemini AI models, kicking off the Gemini 2.0 era with Gemini 2.0 Flash. The model brings notable efficiency improvements and new input capabilities: its MMLU-Pro score has risen by 0.6 percent, and it can now process multimodal information efficiently in a single prompt.
The model also shows marked improvement in code generation, particularly on the Natural2Code benchmark. It lags on long-context tasks, however, where its performance ranking shows a decline.
We’re kicking off the start of our Gemini 2.0 era with Gemini 2.0 Flash, which outperforms 1.5 Pro on key benchmarks at 2X speed (see chart below). I’m especially excited to see the fast progress on coding, with more to come.
— Sundar Pichai (@sundarpichai) December 11, 2024
Developers can try an experimental version in AI…
Model’s response to efficiency
Asked to rank itself on efficiency, the model describes itself as "highly efficient." It offers no substantial data to support this claim, but points to several newly added features intended to structurally ensure higher efficiency than its previous models.
Multimodal efficiency
The model comes with a multimodal advantage that can be looked at in two ways:
- Native multimodality: This means it has been built to be multimodal. Hence, it can process, understand, and generate responses for different types of information (text, images, code) more efficiently and seamlessly.
- Cross-Modal Reasoning: The model can process different types of information together in a single prompt. For instance, it can process an image and its accompanying text simultaneously, giving it an edge over previous Gemini models. Accompanying text here means captions, surrounding text, and metadata, not text embedded in the image itself.
Cross-modal reasoning is chiefly about establishing relationships between the different kinds of information available in a given scenario, then combining and interpreting them to produce the best possible analysis.
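As a rough illustration of what "one prompt, multiple modalities" means in practice, the sketch below packs an image and its accompanying caption into a single request body. The payload shape follows Google's public generativelanguage REST API format for `generateContent`; the exact prompt wording and helper name are illustrative assumptions, not code from the announcement.

```python
# Minimal sketch: bundle an image and its accompanying text into one
# multimodal request payload (shape per the generativelanguage REST API).
import base64
import json


def build_multimodal_payload(image_bytes: bytes, caption: str) -> dict:
    """Return a single generateContent-style request body combining an
    image part and a text part, so the model can reason across both."""
    return {
        "contents": [{
            "parts": [
                # The image travels inline as base64-encoded data.
                {"inline_data": {
                    "mime_type": "image/png",
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
                # The accompanying text (caption) rides in the same prompt.
                {"text": f"Caption: {caption}. "
                         "Explain how the image and caption relate."},
            ]
        }]
    }


if __name__ == "__main__":
    payload = build_multimodal_payload(b"\x89PNG...", "A benchmark chart")
    print(json.dumps(payload, indent=2))
```

The key point is structural: both modalities live in one `parts` list of one request, rather than being sent as separate prompts whose results must be stitched together afterwards.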
Comparison with previous models
“We’re kicking off the start of our Gemini 2.0 era with Gemini 2.0 Flash, which outperforms 1.5 Pro on key benchmarks at 2X speed,” a tweet by Sundar Pichai stated. As per the information provided by Google’s CEO, the newly unveiled model is approximately twice as fast as the previous Gemini 1.5 Pro model.
Curious about recent developments in AI? Read Google Likely to Unveil Its AI-Driven Browsing Agent in December
Various other innovations
Google has also unveiled various other innovations in the field of AI agents. The releases are:
- Project Mariner: Developed by Google DeepMind, Project Mariner can follow complex instructions, reason across websites, and show its work as it goes. Its USP is interpreting complex instructions and breaking them down into actionable steps. It scored 84.09 percent in screenshot-based evaluation and 83.5 percent on the WebVoyager benchmark.
- Project Astra: A research prototype that explores the future capabilities of a universal AI assistant. It refines its answers by remembering key details of past conversations - as well as up to 10 minutes of its current session. Users can ask a question and it will use Google Search, Maps, and Lens to inform its answers.
- Project Jules: Project Jules is a GitHub workflow tool that automates coding tasks, including bug fixes. This allows developers to delegate repetitive work and concentrate on more creative problem-solving.
For more on other innovations such as OpenAI's Sora, read Can Sora create realistic human faces? Let's understand OpenAI’s latest video generation model