Google Goes Public With Its AI Image Generator ‘Imagen’ That Brilliantly Pairs Text With Photorealism

If you keep up with tech news or like to think of yourself as tech-savvy, we bet you’ve seen the famous DALL-E 2 AI-powered image generator making waves on Twitter.

And now, it appears that image generators are trending worldwide, as Google has just gone public with Imagen, its own AI-powered image generator. And with Google involved, you can expect something a little more refined that makes full use of today’s technology.

The search engine giant described how Imagen pairs photorealism with a deep understanding of language to give you the best of both worlds.

As explained by the company’s AI lead, AI-powered systems such as these can open up a new world of digital creativity in which computers and humans work together. And Google’s Imagen manages to do that seamlessly.

Similarly, Jeff Dean mentioned how the project is exactly what Google imagined it to be, crediting the company’s Research division, which after much trial and error found a way to combine text understanding with diffusion-based image generation to reach a new level of realism.

As a whole, this is a serious research effort, though the output shown so far leans heavily on artistic license.

To give users a better idea of what to expect, the company explained that Imagen draws its strength from large transformer language models that understand the text and then guide image generation.

The main finding is how well a language model can encode text so that relevant images are produced from it. Increasing the size of Imagen’s language model improves both image quality and image-text alignment more than simply making the image diffusion model bigger.

But as the old saying goes, seeing is believing. To show how well it all works, the company introduced DrawBench, a benchmark designed to evaluate text-to-image models. Through it, Google hoped to prove to the world what its new advancement was capable of.

And that’s when the company revealed that human raters preferred Imagen over other comparable models in side-by-side comparisons. This test took into consideration both image-text alignment and the quality of the generated samples. The models used for comparison included DALL-E 2, VQ-GAN+CLIP, and Latent Diffusion.
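To make the idea of a side-by-side comparison concrete, here is a minimal sketch of how rater preferences could be tallied. It is not Google's evaluation code: the function name, the model labels, and the votes are all hypothetical and used only for illustration.

```python
from collections import Counter


def preference_rates(votes):
    """Return the share of votes each model received.

    `votes` is a list of model names, one per rater judgment, where each
    entry names the model whose image the rater preferred.
    """
    if not votes:
        return {}
    counts = Counter(votes)
    total = len(votes)
    return {model: count / total for model, count in counts.items()}


# Hypothetical votes for illustration only; these are not Google's results.
alignment_votes = ["model_a", "model_b", "model_a", "model_a"]
rates = preference_rates(alignment_votes)
print(rates)
```

In a setup like this, the same tally would be run once for image-text alignment and once for sample quality, since the article describes both being judged.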

Similarly, Google said that automated metrics also back up Imagen’s capabilities and how well it comprehends a user’s request. This includes understanding rarely used terms, long-form text, and even unusual spatial relations.

Meanwhile, another major advancement that the company speaks of is a new Efficient U-Net architecture that is more compute-efficient and more memory-efficient, and that also converges faster.

At the moment, Google has not released any code or a public demo of Imagen, as it plans to do so when the time is right. Without appropriate safeguards in place, there is a real risk of misuse, and the company says it is doing whatever it can to prevent that.

However, those interested can still find interactive demos on the project’s website, along with the complete research paper that describes the model in more detail.
