Google’s Gemini family of AI models may have had a rough start, but the hard work behind it is finally starting to pay off.
There were some embarrassing moments early on, with its image generator going badly wrong, but the company has come a long way since then. With the launch of Gemini 2.0, the tech giant is hoping to deliver what could be its biggest and best offering yet for businesses and everyday users.
Today, the firm shared more on that front, announcing the general rollout of Gemini 2.0 Flash, introducing Gemini 2.0 Flash-Lite, and launching an experimental version of Gemini 2.0 Pro.
The models are aimed at companies and developers and can be accessed through Google’s AI Studio and Vertex AI. Flash-Lite is available in public preview, while the Pro model is offered as an experimental release for early testing.
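For developers wondering what access through those tools might look like in practice, here is a minimal, hypothetical sketch using the google-genai Python SDK with an API key from AI Studio; the model identifier and SDK details are assumptions based on Google’s published SDK, not confirmed in this announcement.

```python
# Hypothetical sketch: calling Gemini 2.0 Flash through the google-genai
# Python SDK. The API key comes from Google AI Studio (assumed setup).
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # key from AI Studio (assumed)

response = client.models.generate_content(
    model="gemini-2.0-flash",  # assumed model identifier
    contents="Summarize this quarter's sales report in three bullet points.",
)
print(response.text)  # the model's text reply
```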
Google says all of the models accept multimodal input and produce text output at launch, with additional modalities set to become generally available in the coming months. That gives Google a clear advantage as rivals such as DeepSeek and OpenAI continue to launch competing AI products.
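As an illustration of that multimodal-in, text-out pattern, the sketch below passes an image alongside a text prompt. The file handling and the Part helper are assumptions carried over from the same hypothetical google-genai setup as above.

```python
# Hypothetical sketch of multimodal input (image + text in, text out),
# again assuming the google-genai Python SDK; details are illustrative.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("invoice.png", "rb") as f:  # any local image file (assumed)
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash",  # assumed model identifier
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Extract the invoice total and due date as plain text.",
    ],
)
print(response.text)  # text-only output, per the launch notes
```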
At the moment, neither DeepSeek’s nor OpenAI’s reasoning models accept multimodal input, meaning images, attachments, or other file uploads, through their APIs. DeepSeek’s R1 does take uploads through its mobile app chat and its website, but it only runs OCR, a technology more than 60 years old, to pull text out of the material; it does not actually understand anything else in the file.
Both belong to a new class of reasoning models that take longer to think through an answer, reflecting on it via chain-of-thought processing. That is quite different from how regular LLMs such as Gemini 2.0 Pro work, so comparing Google’s chatbots to those rivals isn’t entirely apples to apples.
Today, Google’s CEO shared on X that the Gemini apps for iOS and Android are getting Gemini 2.0 Flash Thinking, which can connect to apps like Google Maps, Search, and YouTube.
The general release of Gemini 2.0 Flash is the other big piece of news, with Google saying the model is production-ready starting today. It is designed for high-volume AI applications, delivering low-latency responses and supporting large-scale multimodal reasoning.
One of the biggest advantages is the size of the context window, the number of tokens a user can pass in prompts and back-and-forth exchanges with the chatbot or API. Many leading models top out at around 200,000 tokens or fewer, so support for more than one million tokens says a lot, and it is especially useful for large-scale and high-frequency tasks.
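To get a feel for how that window might be used, the hedged sketch below counts a large document’s tokens before sending it, assuming the google-genai SDK exposes a count_tokens helper as its documentation describes; the file name and model identifier are placeholders.

```python
# Hypothetical sketch: checking how much of the ~1M-token window a large
# document would use before sending it, assuming the google-genai SDK.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

with open("annual_report.txt", "r", encoding="utf-8") as f:  # assumed file
    document = f.read()

count = client.models.count_tokens(
    model="gemini-2.0-flash",  # assumed model identifier
    contents=document,
)
print(count.total_tokens)  # should stay under the one-million-token window
```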
Gemini 2.0 Flash-Lite, on the other hand, is a new LLM designed to deliver cost-effective AI without lowering quality. It outperforms its predecessor, supports multimodal input, and offers the same context window of more than one million tokens as the full Flash model.
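In the sketches above, swapping the assumed model identifier for a Flash-Lite equivalent (for example, something like "gemini-2.0-flash-lite") would presumably target the cheaper tier with the same one-million-token window, though that exact name is an assumption rather than a confirmed identifier.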
Beyond that, the company’s DeepMind division is rolling out new safety and privacy measures across the Gemini 2.0 models. It is using reinforcement learning techniques, including having the model critique its own responses, to improve accuracy and refine outputs. It also plans automated security testing to flag vulnerabilities such as indirect prompt injection attacks.
Looking ahead, Google DeepMind wants to expand the Gemini model family with more modalities beyond text, with a launch on that front expected as early as the next few months.
Image: Google