Google I/O Event: Unveiling the Latest Updates Around Gemini
The Google I/O event, held yesterday, May 14th, was memorable: we got to witness impressive product updates and improved features that promise to make everyday tasks much easier.
Google itself has confirmed that it is fully in its Gemini era, showing us what its generative AI model is capable of and how it will be integrated across many of its products for greater efficiency and utility.
To keep things simple, I have compiled a list of Google products that feature Gemini's breakthrough capabilities. You can check them out below:
Google Search with AI Overviews
Google has decided to go a step further by updating its Search Generative Experience (SGE), which launched in beta in May 2023.
This new feature called "AI Overviews" is capable of handling complex questions, planning, and research and can provide quick answers with impressive accuracy thanks to Gemini's multi-step reasoning abilities.
The update also gives Google's search results page a new look, featuring unique headlines and a wider range of content types and perspectives.
For now, AI Overviews will be rolling out to those in the U.S. and will be made available to other countries soon, as confirmed by Google.
Gemini 1.5 Pro and Gemini 1.5 Flash
Gemini 1.5 Pro, which was released in February, has received a significant upgrade, as announced by Google at its I/O event yesterday. Based on what we know, we believe this is Google's way of answering its competitors, like OpenAI, which released an improved version of its language model, GPT-4o, on Monday.
According to Google, the improved Gemini 1.5 Pro can now analyze multiple large documents totaling 1,500 pages or summarize 100 emails. This means you will be able to pull insights and relevant information from large documents in the blink of an eye.
Its context window has also been expanded to 2 million tokens, doubling the amount of information the model can process at once.
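To get a feel for what a 2 million-token window means in practice, here is a rough back-of-the-envelope sketch in Python. The words-per-page and tokens-per-word figures are common heuristics, not Gemini-specific numbers, so treat the result as an order-of-magnitude estimate only.

```python
# Rough estimate: does a 1,500-page document set fit in a 2M-token window?
# Assumptions (heuristics, not Gemini-specific): ~500 words per page,
# and roughly 4/3 tokens per word for English text.

CONTEXT_WINDOW = 2_000_000          # tokens (Gemini 1.5 Pro, per I/O 2024)
WORDS_PER_PAGE = 500                # assumed average for dense text
TOKENS_PER_WORD = 4 / 3             # common English-text heuristic

def estimated_tokens(pages: int) -> int:
    """Back-of-the-envelope token count for a given page count."""
    return round(pages * WORDS_PER_PAGE * TOKENS_PER_WORD)

docs = estimated_tokens(1500)       # the 1,500-page example from the keynote
print(docs)                         # ~1,000,000 tokens
print(docs <= CONTEXT_WINDOW)       # fits, with room to spare
```

Under these assumptions, 1,500 pages comes out to roughly a million tokens, which is why a 2 million-token window comfortably covers the keynote's example.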
Gemini 1.5 Flash, the smaller Gemini model, has been optimized for speed and efficiency on high-frequency tasks. It excels at summarization, chat applications, image and video captioning, and extracting data from long documents and tables.
Gemini 1.5 Flash is also aimed at developers who need a model that is lighter and less expensive than the Pro version while still retaining much of its capability.
Project Astra
Google announced Project Astra, a research prototype that is referred to as an AI agent capable of reasoning, planning, and retaining memory.
According to Google DeepMind CEO Demis Hassabis, Astra will be an intelligent system that can think multiple steps ahead and work across software and systems, all to get something done on your behalf, and most importantly, under your supervision.
Astra will also use the camera and microphone on your device to assist you in everyday life. You could liken it to J.A.R.V.I.S. or F.R.I.D.A.Y., the fictional AI assistants that Iron Man uses in the Marvel Cinematic Universe.
Right now, Project Astra remains an early-stage feature with no specific launch plans. However, Google has hinted that it might be integrated into products like the Gemini app later this year, which would be a major leap in the development of helpful AI assistants.
Ask Photos
Google Photos was released almost nine years ago, and from that time until now, billions of users have used this tool to capture their most important memories in the form of photos and videos.
Now, Google has decided to integrate its Gemini generative AI model into Photos, giving people richer experiences and helping them search their memories more deeply.
For example, you can ask Photos about your swimming progress, and Gemini will go beyond a simple search of your saved memories, recognizing different contexts, from you doing laps to the date on your swimming certificate, and curating a summary so you can relive those moments all over again.
Surprisingly, Google did not introduce the one-tap video enhancement feature or the "Show less" memories feature that were speculated to be announced at its I/O event. It may be that they are still in their testing phases and that Google will make them available in Google Photos soon, or perhaps at its next event.
Google Veo, Imagen 3, and Audio Overviews
Google announced Veo, its latest model for generating high-definition video, and Imagen 3, its highest-quality text-to-image model yet, which promises lifelike images with fewer distracting visual artifacts than its prior models.
It also introduced Audio Overviews, which let users generate audio discussions from text input. For instance, if you ask for a real-life example of a psychological problem, it can deliver the explanation as interactive audio.
AI Hardware: Trillium, the Sixth-Generation TPU
Google unveiled Trillium, which it says is its sixth-generation tensor processing unit (TPU). This TPU is a piece of hardware that is integral to running complex AI operations and will be made available to cloud customers in late 2024.
According to Google, it is the most performant and most efficient TPU to date, delivering a 4.7x improvement in peak compute performance per chip over the previous generation, TPU v5e.
Google believes that Trillium will power the next generation of AI models and will help transform businesses, enable discoveries, and train and serve the future generations of Gemini models faster, more efficiently, and with lower latency than ever before.
The tech giant also confirmed that these TPUs are not meant to compete with other chips, like Nvidia’s graphics processing units.
Rather, it will continue to partner with Nvidia, as it plans to use the Blackwell platform for "various internal deployments" and to offer large-scale tools for enterprise developers building large language models.
PaliGemma and Gemma 2
We also got to see the introduction of PaliGemma, a powerful vision-language model (VLM) inspired by PaLI-3.
It is built on open components, including the SigLIP vision model and the Gemma language model, and is designed to deliver top-notch performance on a wide range of vision-language tasks, including image and short-video captioning, visual question answering, reading text in images, object detection, and object segmentation.
Gemma 2, on the other hand, is the next generation of Google's open-weights Gemma models, which will launch with a 27-billion-parameter model in June.
Until now, the standard Gemma models, which launched earlier this year, were only available in 2-billion- and 7-billion-parameter versions, making the new 27-billion-parameter model quite a step up.
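For a rough sense of what a 27-billion-parameter open-weights model demands, here is a small back-of-the-envelope sketch. The bytes-per-parameter figures are standard for the listed precisions, but real memory use also includes activations, KV cache, and runtime overhead, so treat these numbers as lower bounds for the weights alone.

```python
# Approximate memory footprint of model weights at various precisions.
# Assumption: weight memory ~= parameter count x bytes per parameter;
# actual usage is higher once activations and runtime overhead are added.

BYTES_PER_PARAM = {
    "fp32": 4,    # full precision
    "bf16": 2,    # the usual serving precision
    "int4": 0.5,  # aggressive quantization
}

def weight_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * BYTES_PER_PARAM[dtype] / 1e9

for size in (2, 7, 27):  # the Gemma parameter counts mentioned above
    print(f"{size}B weights in bf16: ~{weight_gb(size, 'bf16'):.0f} GB")
```

By this estimate, 27B parameters in bf16 is about 54 GB of weights, well beyond a single 24 GB consumer GPU, while int4 quantization (~13.5 GB) would bring it within reach. This is part of why the jump from 7B to 27B is a meaningful step up for anyone running these models locally.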
Conclusion
Google released an impressive list of updates that we hope will live up to expectations. While some updates are available only in limited regions across the globe, others are scheduled for later release.
Nonetheless, these updates have shown us what the future holds for AI and how we can harness its power to achieve greater results far more efficiently.