Google DeepMind has unveiled Genie 2, a foundation world model that can generate a wide variety of action-controllable, playable 3D environments for training and evaluating embodied agents. It is called a foundation world model because it can generate an entire 3D world, rich with diversity.
Text input
A simple text prompt is enough to generate a playable 3D environment, so a user can steer the output through the wording of the prompt and their choice of rendering. The prompt is passed to Imagen 3, a state-of-the-art text-to-image model, which generates an image; that image is then turned into a video.
The model can generate up to a minute of video in which the surroundings remain playable and respond to the user's actions.
Controls offered
The model responds quickly and sensibly to the user's key presses, moving the character accordingly and handling the cinematic aspects of the resulting video on its own.
Alternate outputs
The model can generate counterfactual, or alternative, outcomes from the very same starting frame, depending on how the character moves and which actions are taken. In other words, the model is trained not only on movement but also on how the environment responds to each action.
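The idea of branching from one starting frame can be illustrated with a toy sketch. This is not Genie 2 or any part of it: the grid world, the `MOVES` key mapping, and the `step` and `rollout` helpers are all hypothetical names assumed here purely for illustration. It only shows how the same start state leads to different trajectories under different key presses.

```python
# Toy stand-in for a world model: NOT Genie 2, just a deterministic grid.
from typing import List, Tuple

State = Tuple[int, int]  # (x, y) position of the character

# Hypothetical key-to-movement mapping, chosen for illustration only.
MOVES = {"W": (0, 1), "S": (0, -1), "A": (-1, 0), "D": (1, 0)}


def step(state: State, key: str) -> State:
    """Return the next state after one key press."""
    dx, dy = MOVES[key]
    return (state[0] + dx, state[1] + dy)


def rollout(start: State, keys: str) -> List[State]:
    """Unroll a trajectory from `start`, one state per key press."""
    trajectory = [start]
    for key in keys:
        trajectory.append(step(trajectory[-1], key))
    return trajectory


# Same starting "frame", two different action sequences.
start_frame: State = (0, 0)
left_branch = rollout(start_frame, "AAW")
right_branch = rollout(start_frame, "DDW")
print(left_branch)
print(right_branch)
```

Both branches share the first state and diverge from there, which is the sense in which the outputs are counterfactual: they answer "what would have happened if the user had pressed different keys?"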
Innumerable environments
The model can generate different perspectives, such as first-person views, isometric views, or third-person driving videos. This lets it cover a wide range of environments and viewpoints on the user, their actions, and the surrounding world.
Real-time & practical
The world model has been trained to behave as realistically as it can with respect to physics, gravity, reflections, and lighting. These details make its 3D output more advanced, since they make scenes look more realistic and appealing.
A step forward in AGI
Reaching Artificial General Intelligence (AGI) would require extensive training across a wide range of inputs. This development is a step toward solving that training problem: it makes it possible to train embodied agents safely while achieving the breadth and generality required to progress toward AGI.