Meta Unveils New Open-Source AI Model That Combines Multiple Streams Of Data

Facebook’s parent company Meta has unveiled ImageBind, a new open-source AI model that links several streams of data together.

These streams include visual data, movement readings, text, thermal (temperature) data, depth, and audio, the company recently confirmed.

For now, the model is only a research project, with no consumer applications outlined. Still, it points toward future generative AI systems capable of producing immersive, multisensory experiences, and it shows Meta continuing to share its AI research openly.

It also comes at a time when rivals such as OpenAI and Google are becoming more and more secretive about their own work.

The core concept behind the research is to combine several kinds of data into a single index, or shared embedding space. That idea may sound abstract, but it is the same concept driving the massive boom in generative AI technology we are seeing right now.

For instance, AI image generators such as DALL-E, Midjourney, and Stable Diffusion all rely on systems that pair images with text during training. The systems look for patterns in the visual data while learning to connect that data to a description of the image in question.

This is what enables such systems to produce images that follow the text prompt entered by the user, and the same holds for the many AI tools that generate video or audio in a similar way.
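
In simplified form, that pairing step is a contrastive training objective: embeddings of an image and its caption are pulled together, while mismatched pairs are pushed apart. The sketch below illustrates the general idea only; it is not the actual training code of any of the models named above, and the batch size, embedding dimension, and temperature value are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Pull matching image/caption pairs together, push mismatched pairs apart."""
    # Normalise so that a dot product equals cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix: entry (i, j) compares image i with caption j.
    logits = image_emb @ text_emb.t() / temperature

    # The true pair for each image sits on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_images = F.cross_entropy(logits, targets)      # image -> caption
    loss_texts = F.cross_entropy(logits.t(), targets)   # caption -> image
    return (loss_images + loss_texts) / 2

# Toy usage: a batch of 4 paired embeddings, 512 dimensions each.
print(contrastive_loss(torch.randn(4, 512), torch.randn(4, 512)))
```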

Meta claims ImageBind is the first model of its kind to bring six kinds of data together into a single embedding space. Those data types are visual data, text, audio, thermal readings, depth, and, most interestingly, movement data captured by inertial measurement units (IMUs).
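
In practice, a single embedding space means each modality can have its own encoder, so long as every encoder outputs vectors of the same size. The sketch below is a purely illustrative stand-in, not Meta's actual ImageBind architecture; the layer sizes and input dimensions are invented for the example.

```python
import torch
import torch.nn as nn

EMBED_DIM = 512  # assumed size of the shared embedding space

class ModalityEncoder(nn.Module):
    """Stand-in encoder: maps raw features of one modality into the shared space."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, EMBED_DIM),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# One encoder per modality; the input sizes are arbitrary placeholders.
encoders = {
    "image":   ModalityEncoder(input_dim=2048),
    "text":    ModalityEncoder(input_dim=768),
    "audio":   ModalityEncoder(input_dim=1024),
    "thermal": ModalityEncoder(input_dim=2048),
    "depth":   ModalityEncoder(input_dim=2048),
    "imu":     ModalityEncoder(input_dim=128),
}

# Every modality ends up as a 512-dimensional vector in the same space.
for name, encoder in encoders.items():
    sample = torch.randn(1, encoder.net[0].in_features)
    print(name, encoder(sample).shape)  # -> torch.Size([1, 512])
```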

IMUs are found in devices such as phones and smartwatches, where they handle a range of tasks, including detecting when a device should switch from landscape to portrait mode.

Meta says the goal is to enable future AI systems to cross-reference these kinds of data in the same way that today's AI systems cross-reference text inputs.
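
Once everything lives in one embedding space, that cross-referencing can be as simple as a nearest-neighbour lookup: a query in one modality retrieves the closest items from another. The following hypothetical sketch assumes the embeddings already exist; the variable names and the idea of an audio clip retrieving photos are illustrative, not a real ImageBind API.

```python
import torch
import torch.nn.functional as F

def nearest(query_emb: torch.Tensor, candidate_embs: torch.Tensor) -> int:
    """Return the index of the candidate closest to the query (cosine similarity)."""
    query_emb = F.normalize(query_emb, dim=-1)
    candidate_embs = F.normalize(candidate_embs, dim=-1)
    return int((candidate_embs @ query_emb).argmax())

# Pretend these came from modality-specific encoders that all map into the
# same 512-dimensional joint space.
audio_query = torch.randn(512)          # e.g. a recording of rain falling
image_library = torch.randn(1000, 512)  # embeddings of 1,000 photos

print("Closest photo to the audio clip:", nearest(audio_query, image_library))
```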

The tech giant also says its future model designs might take in other types of sensory data, such as touch, smell, speech, and even the brain's fMRI signals.

Meta believes this line of research could bring machines a step closer to the human ability to combine many kinds of information at once. For now, that remains highly speculative: the work is at an early stage, the research done so far is limited, and there is a long way to go before any of it becomes reality.

Current AI models of this sort can already turn text descriptions into short, blurry videos. The work done so far suggests that future versions could combine data streams, for example generating audio to match a video output.

It is an interesting development, and one that industry watchers are not ignoring. Remember, the tech giant is once again open-sourcing the underlying model, a practice that draws close scrutiny in today's AI world.

But with the good comes some bad: critics of open-sourcing, OpenAI among them, argue that the practice carries serious risks, including making it easy for others to copy the work.

Read next: 59% of Customers Are Uncomfortable with AI Personalization