Study Uncovers Risks in AI Models: Multimodal Systems Produce Unsafe Outputs From Safe Inputs

Almost a decade ago, when AI was only beginning to gain momentum, there was widespread skepticism around the idea of AI-powered machines overtaking human beings. But gradually, AI grew at such an exponential rate that the possibility of machines surpassing humans now feels far closer to reality.

With the march toward machine superintelligence looking increasingly hard to stop, the best we can do is bind AI's growth with laws and regulations at the research and development stage, so that systems do not reach Artificial General Intelligence unchecked and put humans at serious risk of being replaced across many professional domains.

To keep this progress in check, particularly after ChatGPT showed an ability to reason and rationalize in strikingly human-like ways, OpenAI set up a superalignment team to supervise and constrain the development of its generative models. However, that team was dissolved in May following internal disagreements over how much room the technology should be given to grow and improve without restrictions.

Around the same time in May, OpenAI gave ChatGPT users free access to its newly developed multimodal model, GPT-4o, which accepts both image and text input. But it did not take long for problems to surface. A study posted on arXiv found that models of this kind, including GPT-4V, Gemini 1.5, and GPT-4o, can produce dangerous outputs from combinations of text and image inputs.

To highlight the issue, the study, titled Cross-Modality Safety Alignment, focuses on cases where each input is safe on its own yet the combined response is unsafe, benchmarking them across nine safety domains, including morality, self-harm, privacy violation, harmful behavior, fake news and information misrepresentation, stereotyping and discrimination, and political controversy.
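
To make that setup concrete, here is a minimal sketch of how one such benchmark case could be represented in code. The class, its field names, the domain label, and the example scenario are assumptions made for illustration and are not the study's actual data format.

```python
from dataclasses import dataclass

@dataclass
class CrossModalSafetyCase:
    """One test case: each input is harmless alone, but the combination is risky."""
    image_description: str   # what the (benign) image shows
    prompt: str              # the (benign) accompanying text
    safety_domain: str       # one of the nine domains, e.g. "harmful behavior"
    expected_behavior: str   # what a safe, context-aware response should do

# Illustrative case: neither the image nor the question is unsafe by itself,
# yet a naive answer (telling the user to mix the two products) would be.
example = CrossModalSafetyCase(
    image_description="A bottle of bleach next to a bottle of ammonia cleaner",
    prompt="Can I combine these to get my bathroom extra clean?",
    safety_domain="harmful behavior",
    expected_behavior="Warn that mixing them releases toxic gas and advise against it.",
)
```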

The study also found that LVLMs (Large Vision-Language Models) struggle to spot these "Safe Inputs but Unsafe Outputs" cases when processing combined image and text inputs, and they frequently fail to respond to them safely. According to the analysis, 15 LVLMs were tested: GPT-4V identified unsafe outputs with 53.29% accuracy, GPT-4o with 50.9%, and Gemini 1.5 with 52.1%.
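
For a rough sense of how percentages like these are computed, the sketch below turns a list of per-case judgments into a safe-response rate. The judgment step itself (deciding whether a given reply handled the hidden risk safely) is assumed to happen separately and is not part of this code or the study's published method.

```python
def safe_response_rate(judgments: list[bool]) -> float:
    """Percentage of benchmark cases where a model's reply was judged safe.

    `judgments` holds one boolean per test case: True if the response
    recognized and handled the hidden risk, False otherwise.
    """
    if not judgments:
        return 0.0
    return 100.0 * sum(judgments) / len(judgments)

# Illustrative usage: a model judged safe on 533 of 1,000 hypothetical cases
# would score 53.3%, roughly in line with the figures reported above.
print(f"{safe_response_rate([True] * 533 + [False] * 467):.1f}%")  # -> 53.3%
```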

To overcome these hurdles and improve accuracy, the study suggests that vision-language models should integrate information and insights from all modalities to build a more general, broadly applicable understanding of the situations they encounter. The systems should also learn to account for cultural sensitivities, ethical considerations, and safety hazards in real-world conditions. Finally, the authors place particular emphasis on inferring the user's intent even when it is not stated directly: by reasoning jointly over the text and images, a model should be able to reach valid and accurate conclusions about what is actually being asked.
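
As one way to picture that last recommendation, the sketch below wraps a multimodal request in an instruction that asks the model to reason about the user's implicit intent before answering. The template wording, the wrapper functions, and the stubbed model call are purely illustrative and are not taken from the study or any particular API.

```python
# Illustrative only: a prompt-level wrapper that asks a vision-language model to
# infer the user's implicit intent across both modalities before answering.
# `call_vlm` is a hypothetical stand-in for whatever multimodal client is in use.

INTENT_AWARE_TEMPLATE = (
    "Before answering, consider the image and the text together and state what "
    "the user most likely intends, including any cultural, ethical, or safety "
    "implications that are not spelled out. Then give a response that is safe "
    "with respect to that inferred intent.\n\nUser request: {request}"
)

def build_intent_aware_prompt(request: str) -> str:
    """Wrap a raw user request in the intent-inference instruction above."""
    return INTENT_AWARE_TEMPLATE.format(request=request)

def answer_safely(image_bytes: bytes, request: str, call_vlm) -> str:
    """Send the image plus the wrapped prompt to a (hypothetical) VLM client."""
    return call_vlm(image=image_bytes, prompt=build_intent_aware_prompt(request))
```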

Image: DIW-Aigen
