Google Launches Global Multimodal AI Agents Competition with $25,000 Grand Prize
Google has announced a new global competition focused on developing multimodal artificial intelligence agents, open to developers from Egypt and around the world. The competition offers a $25,000 cash grand prize and aims to encourage innovation among developers, students, and AI enthusiasts as the field of artificial intelligence evolves rapidly.
The challenge centers on building an AI agent that can handle multiple input types, including text, audio, images, and video. This reflects the industry's accelerating shift away from traditional text-only solutions and toward interactive multimodal applications.
Participants can choose from three main tracks. The first, Live Agents, focuses on creating real-time interactive agents that can respond to interruptions, such as live translators or AI tutors that review homework through a camera. The second track, Storyteller, emphasizes building agents that combine text, images, audio, and video into unified narrative experiences, including interactive books or AI-powered marketing content tools. The third track, UI Navigator, targets agents capable of visually interpreting screens, navigating applications, and performing tasks on behalf of users, such as browser automation or app testing.
Entries must use Google's Gemini model and official development tools, such as SDKs or the Agent Development Kit (ADK), and the solution must be hosted on Google Cloud.
Submissions must include a written project description, a source code repository hosted on GitHub with setup instructions, proof of deployment on Google Cloud, a system architecture diagram, and a demo video of no more than four minutes explaining how the solution works.
Participants may earn bonus evaluation points by sharing project-related content on social media with the challenge hashtag, automating their cloud deployment, or engaging with Google developer communities.