Below is a short summary and detailed review, written by FutureFactual, of this video:
Gemini-powered robotics demonstrates long-horizon planning and dexterity at Google DeepMind
Overview
In this lab visit with Google DeepMind, Hannah Fry and Kanishka Rao showcase how Gemini-based vision-language-action models are powering robots that can understand general human instructions, plan longer-horizon tasks, and perform precise manipulation. The demonstrations move beyond preprogrammed routines to open-ended, adaptive behavior.
Key takeaways
The robots integrate two systems: a reasoning-capable ER (embodied reasoning) model and a vision-language-action (VLA) model for physical actions, enabling end-to-end task execution from high-level instructions. Demos include packing a lunch with millimetre precision and sorting objects in open-ended scenarios, highlighting generalization and data-driven learning in robotics.
Introduction and context
The episode explores Google DeepMind's robotics work, focusing on how Gemini's multimodal reasoning is embedded in robotic systems to achieve general-purpose manipulation. The lab tour features director of robotics Kanishka Rao and host Hannah Fry, with demonstrations that emphasize open-ended task execution rather than fixed, pre-scripted moves.
Foundational architecture
Two core components form the backbone: an ER model for reasoning and a VLA (vision-language-action) model that handles perception, language understanding, and physical actions. The ER component orchestrates the VLA to produce long-horizon plans, while the robot executes sequences of actions in a coordinated fashion, enabling tasks such as weather-aware packing and luggage organization.
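The described split, where a reasoning model decomposes a high-level instruction into short steps that an action model then executes one by one, can be sketched as a simple orchestration loop. This is a minimal illustration under stated assumptions: the class names (`ERPlanner`, `VLAController`), the `Step` structure, and the weather-based packing logic are all hypothetical stand-ins, not DeepMind's actual API.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One short-horizon action, e.g. 'pick' the 'umbrella'."""
    action: str
    target: str

class ERPlanner:
    """Hypothetical stand-in for the reasoning (ER) model: turns a
    high-level instruction plus context into an ordered step list."""
    def plan(self, instruction: str, context: dict) -> list[Step]:
        # Toy weather-aware packing logic, purely illustrative.
        if "rain" in context.get("weather", ""):
            return [Step("pick", "umbrella"), Step("place", "bag")]
        return [Step("pick", "sunglasses"), Step("place", "bag")]

class VLAController:
    """Hypothetical stand-in for the vision-language-action model:
    executes one short-horizon step and reports success."""
    def execute(self, step: Step) -> bool:
        print(f"{step.action} -> {step.target}")
        return True  # a real controller would report actual outcome

def run_task(instruction: str, context: dict) -> list[Step]:
    """The orchestration loop: ER plans, VLA executes each step."""
    planner, controller = ERPlanner(), VLAController()
    completed = []
    for step in planner.plan(instruction, context):
        if controller.execute(step):
            completed.append(step)
    return completed

completed = run_task("pack for today", {"weather": "rain"})
```

The point of the sketch is the division of labor: long-horizon coherence lives in the planner, while the controller only ever sees one short, concrete step at a time.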
From short-horizon to long-horizon tasks
Earlier demonstrations focused on short actions like grabbing or placing objects. The current setup chains multiple small moves into longer, meaningful workflows, such as looking up the weather, selecting what to pack, and executing a packing plan end to end. The lab shows dexterity demonstrations on the ALOHA robot platform as well as generalization tests with unseen objects, illustrating the system's ability to adapt to new scenes and items.
Open-ended generalization and data challenges
The researchers highlight the necessity of large-scale, diverse manipulation data and call for continued breakthroughs to improve data efficiency and safety. They discuss teleoperation as a data-collection method and emphasize the open-ended, unstructured nature of real-world manipulation as a key remaining hurdle.