The rise of multimodal AI agents

Technology companies are investing large amounts of money in creating new multimodal artificial intelligence models and algorithms that can learn, reason and make decisions autonomously after collecting and analysing data.


Data processing and machine learning are accelerating the development of artificial intelligence at an unrelenting pace. Early AI assistants such as Siri or Alexa were limited to simple interactions. With the arrival of ChatGPT, however, everyone started talking about a next generation of AI assistants that could perform more complex tasks.

The goal was to create a system capable of performing a wide range of tasks, like a human assistant. However, these assistants could only process text, which limited their practical use. This is far removed from how humans understand the world, drawing on multiple senses at the same time.

AI development is therefore turning to new algorithms that can process and integrate information from several modalities, including images, audio and video, to improve interaction. Many experts, including Sam Altman, CEO of OpenAI, say that multimodal AI agents are the next big revolution and will make AI tools even more integrated into our daily lives than smartphones.

The future of multimodal agents

In practical terms, a multimodal AI agent can, for example, analyse a text while also processing an image, spoken language or an audio clip, and then give a more complete and accurate response by voice or text. This opens up new possibilities in fields ranging from education and healthcare to e-commerce and customer service.

According to David Barber, director of the Centre for Artificial Intelligence at University College London, these agents could also streamline processes in businesses and public bodies. For example, an AI agent could act as a more sophisticated customer service bot.

The current generation of assistants is built on language models that essentially predict the next likely word in a sentence. An AI agent, by contrast, would be able to act autonomously on natural-language commands and handle customer service tasks without supervision. For example, it could analyse customer complaint emails, look up the relevant records in the company's management database and resolve each complaint according to company policy.

Multimodal AI agents can also analyse consumers' shopping behaviour, including how they interact with different media, to provide more personalised product recommendations. The same kind of personalisation would also be useful in education, where it could transform the learning experience by providing tailored, interactive content.

Perhaps one of the most obvious uses of this latest evolution of AI is in autonomous vehicles that can drive with limited human intervention. We are still a long way from fully autonomous vehicles. Even so, AI agents are already an integral part of how these vehicles work, sensing the car's surroundings and making informed decisions.

In the medical field, multimodal AI could improve patient care by integrating different types of data. It could also help healthcare professionals diagnose diseases, identify patterns and suggest possible treatments by analysing medical images, vital signs and the patient's medical history. However, as with other applications that handle large amounts of personal data, privacy, security and ethical issues will need to be addressed to ensure public acceptance.


11Onze is the community fintech of Catalonia. Open an account by downloading the app El Canut for Android or iOS and join the revolution!

Equip Editorial