Gemini Robotics-ER 1.6: Real-World Robotics Intelligence

Gemini Robotics-ER 1.6 brings embodied reasoning to real-world robotics, enabling precise spatial understanding, multi-camera reasoning, and accurate instrument reading for safer and more autonomous industrial operations.

Gemini Robotics-ER 1.6

A robot that can walk into an industrial plant, spot a pressure gauge across the room, zoom in, read the needle down to sub-tick accuracy, and decide whether to act, without a human in the loop.

That is not science fiction anymore. That is Gemini Robotics-ER 1.6, released by Google DeepMind on April 14, 2026. And the timing is not accidental. Industrial facilities lose billions annually to missed instrument readings, delayed inspections, and human error on routine monitoring tasks.

ER 1.6 is not a research demo. It is a direct answer to that problem, already deployed inside Boston Dynamics' Spot robot and running live in real facilities today.

What "Embodied Reasoning" Actually Means

Most AI models process language or images. Embodied reasoning is different. It means understanding the physical world well enough to act inside it - judging distances, reading objects, detecting task completion, and knowing what not to touch.

ER 1.6 is not a robot controller. It is the high-level brain that sits above the robot's hardware. It takes in camera feeds and instructions, reasons about the scene, then calls tools (Google Search, a vision-language-action model (VLA), or a custom function) to carry out the task.
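To make the "decision layer" idea concrete, here is a minimal sketch of how a model-emitted tool call could be routed to a robot's own functions. The tool names, payload shape, and backends below are illustrative assumptions, not the actual Gemini Robotics API.

```python
# Minimal tool-dispatch sketch: the model (not shown) emits a tool call as
# a name plus arguments, and this thin layer routes it to a concrete
# function. Tool names and payload shape are illustrative only.

def search_web(query: str) -> str:
    # Placeholder for a search-tool backend.
    return f"results for: {query}"

def move_gripper(x: float, y: float) -> str:
    # Placeholder for a VLA / motion-control backend.
    return f"moved gripper to ({x}, {y})"

TOOLS = {
    "search_web": search_web,
    "move_gripper": move_gripper,
}

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to its implementation."""
    fn = TOOLS.get(tool_call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {tool_call['name']}")
    return fn(**tool_call["args"])

print(dispatch({"name": "move_gripper", "args": {"x": 0.4, "y": 0.2}}))
```

The point of the separation is that the model only decides *which* tool to call and with what arguments; the motor layer stays deterministic code the robot vendor controls.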

Think of it as the decision layer, not the motor layer.

Three Things ER 1.6 Does Better

1. Pointing - The Foundation of Spatial Reasoning

Pointing sounds trivial. It is not. When a robot needs to pick the smallest object on a shelf, map a motion path, or identify every item that fits inside a container, it uses spatial pointing as an intermediate reasoning step.

ER 1.6 uses points to count, compare, and constrain. In benchmark tests, it correctly identified two hammers, six pliers, one pair of scissors, and one paintbrush in a cluttered workshop scene. It also correctly did nothing when asked to point to a wheelbarrow and a Ryobi drill, neither of which were in the image.

Its predecessor, ER 1.5, failed on several of those counts and hallucinated a wheelbarrow that wasn't there. Hallucination in a physical environment is not an abstract problem. A robot acting on a false object detection can cause real damage.

Benchmark comparison of Gemini Robotics-ER 1.6 against Gemini Robotics-ER 1.5 and Gemini 3.0 Flash. (Source)

2. Success Detection - Knowing When to Stop

A robot that does not know when a task is finished will either loop forever or move on too early. Success detection is the mechanism that prevents both.

ER 1.6 merges live streams from different cameras into a coherent scene understanding. It cross-checks perspectives before declaring success and moving forward. Most robotics setups include an overhead camera and a wrist-mounted camera. The model must fuse both feeds, accounting for angle, occlusion, and motion, to make a reliable call.

This is harder than it sounds. Lighting shifts, objects partially block the view, and instructions can be ambiguous. ER 1.6 advances multi-view reasoning across all of these conditions.
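One simple way to make that cross-checking concrete is conservative fusion: only declare success when every camera view agrees with sufficient confidence. The verdicts, confidence scores, and threshold below are invented for illustration, not values from the model.

```python
# Conservative multi-view success check: declare the task done only when
# every camera's verdict is "success" with confidence above a threshold.
# Camera names, verdicts, and the threshold are illustrative values.

def task_succeeded(view_verdicts, min_confidence=0.8):
    """view_verdicts: list of (camera_name, success_bool, confidence)."""
    return all(ok and conf >= min_confidence
               for _, ok, conf in view_verdicts)

verdicts = [
    ("overhead", True, 0.95),
    ("wrist", True, 0.72),  # partially occluded view, low confidence
]
print(task_succeeded(verdicts))  # False: the uncertain view blocks success
```

Requiring agreement trades a few false "not done yet" calls for far fewer false "done" calls, which is the safer direction for a robot deciding whether to move on.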

3. Instrument Reading - The Breakthrough Capability


This is the most technically interesting upgrade in ER 1.6. Industrial facilities run on analog instruments: pressure gauges, thermometers, chemical sight glasses, flow meters. Many of these are decades old and were never designed to be machine-readable.

Reading one requires more than seeing it. The model must detect the needle position, identify the tick marks, read the unit label, account for perspective distortion, and then combine all of that into a single numeric output. Some gauges have multiple needles for different decimal places, which must be combined correctly.
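The final combination step described above can be written down as plain interpolation: map the needle's angle onto the gauge's scale, then merge coarse and fine needles into one value. The angles, sweep, and scale ranges below are made-up numbers for illustration, not output from the model.

```python
# Gauge reading as interpolation: a needle angle between the scale's start
# and end angles maps linearly to a value between the scale's min and max.
# All angles and ranges here are illustrative.

def read_needle(angle, angle_min, angle_max, value_min, value_max):
    frac = (angle - angle_min) / (angle_max - angle_min)
    return value_min + frac * (value_max - value_min)

# Single-needle pressure gauge: 0-10 bar over a 270-degree sweep.
coarse = read_needle(135.0, 0.0, 270.0, 0.0, 10.0)
print(coarse)  # 5.0 bar: the needle is halfway through the sweep

# Two-needle gauge: one needle gives whole units, another gives tenths.
units = int(read_needle(135.0, 0.0, 270.0, 0.0, 10.0))   # 5
tenths = round(read_needle(243.0, 0.0, 270.0, 0.0, 1.0), 1)  # 0.9
print(units + tenths)  # combined reading: 5.9
```

The hard part in practice is everything upstream of this arithmetic: finding the needle, the tick marks, and the scale endpoints under perspective distortion, which is exactly what the pointing and zooming steps feed into.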

ER 1.6 tackles this through agentic vision, a combination of visual reasoning and code execution. The model zooms into an image, uses pointing and code execution to estimate proportions and intervals, and applies world knowledge to arrive at a final reading.

The numbers tell the story clearly. ER 1.5 hit a 23% success rate on instrument reading. Gemini 3.0 Flash reached 67%. ER 1.6 alone reached 86%. ER 1.6 with agentic vision enabled reached 93%.

Boston Dynamics Built This Into Spot


Boston Dynamics integrated Gemini Robotics-ER 1.6 into its Orbit AIVI-Learning product, enabling the Spot robot to autonomously patrol industrial facilities and read data from instruments such as pressure gauges. This integration went live for all AIVI-Learning customers on April 8.

Boston Dynamics stated that, with Gemini's reasoning capabilities, AIVI-Learning improved its baseline performance and accuracy on tasks such as visual inspection, pallet counting, and liquid detection.

Marco da Silva, VP and GM of Spot at Boston Dynamics, put it plainly: instrument reading and more reliable task reasoning will enable Spot to see, understand, and react to real-world challenges completely autonomously.

Safety Is Baked In, Not Bolted On

Safety Instruction benchmark results.

Autonomy without safety controls is a liability, not a product. Google built safety reasoning directly into the model's spatial outputs.

ER 1.6 makes better decisions around physical constraints, correctly identifying which objects should not be picked up based on gripper or material limitations. On tasks modeled after real-life injury reports, the Gemini Robotics-ER models improved over baseline Gemini 3.0 Flash performance by 6% in text-based scenarios and 10% in video-based ones.

These are not soft guardrails. The model was tested on adversarial prompts (scenarios designed to trick it into unsafe spatial outputs) and outperformed all previous generations.

The safety instruction following benchmark (ASIMOV) tests three dimensions: text accuracy, pointing accuracy, and bounding box accuracy. ER 1.6 shows substantial improvement over ER 1.5 in instruction following, and beats Gemini 3.0 Flash on pointing accuracy.

Trying Gemini Robotics-ER 1.6

Gemini Robotics-ER 1.6 is an advanced multimodal model designed for real-world robotic understanding and interaction, enabling precise perception, reasoning, and action planning across complex environments through vision-language integration. Below are two hands-on tests.

Test 1 - Object Counting and Pointing

Image used: test image 1

Prompt
"Point to every pen in this image. Count them. If there are any objects I haven't asked about that would be unsafe to pick up, tell me."

Output:

JSON
[
  { "point": [258, 97], "label": "pen" },
  { "point": [228, 497], "label": "pen" },
  { "point": [168, 467], "label": "pen" },
  { "point": [281, 402], "label": "pen" },
  { "point": [266, 448], "label": "pen" },
  { "point": [285, 466], "label": "pen" },
  { "point": [802, 255], "label": "pen" },
  { "point": [38, 365], "label": "pen" }
]
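The point list is plain JSON, so downstream code can consume it directly. For example, counting detections per label reproduces the pen count, and labels that were never detected simply come back as zero:

```python
from collections import Counter

# Point list returned in Test 1, reproduced verbatim.
points = [
    {"point": [258, 97], "label": "pen"},
    {"point": [228, 497], "label": "pen"},
    {"point": [168, 467], "label": "pen"},
    {"point": [281, 402], "label": "pen"},
    {"point": [266, 448], "label": "pen"},
    {"point": [285, 466], "label": "pen"},
    {"point": [802, 255], "label": "pen"},
    {"point": [38, 365], "label": "pen"},
]

counts = Counter(p["label"] for p in points)
print(counts["pen"])          # 8 pens detected
print(counts["wheelbarrow"])  # 0 - absent objects never appear in the list
```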
    

Test 2 - Constraint-Based Spatial Reasoning

Image used: test image 2

Prompt
"Point to items you could safely pick up with a gripper. Do not point to anything heavier than 2kg or containing liquid."

Output:

JSON
[
  { "point": [792, 319], "label": "large cork stopper" },
  { "point": [725, 321], "label": "glass bead" },
  { "point": [804, 715], "label": "small cork stopper" },
  { "point": [561, 558], "label": "amber glass tumbler" },
  { "point": [640, 638], "label": "empty amber glass goblet" },
  { "point": [604, 469], "label": "tall clear glass cylinder" }
]
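To act on these points, a robot needs them in pixel (and ultimately world) coordinates. Gemini's spatial outputs conventionally use [y, x] points normalized to a 0-1000 grid; assuming that convention holds here, conversion is one multiply per axis. The 1280x960 image size below is an assumed example value.

```python
# Convert a normalized [y, x] point on a 0-1000 grid (Gemini's documented
# convention for spatial outputs) to (x, y) pixel coordinates.
# The 1280x960 image size is an assumed example.

def to_pixels(point, width, height):
    y_norm, x_norm = point
    return (round(x_norm / 1000 * width), round(y_norm / 1000 * height))

# The "large cork stopper" point from the Test 2 output.
print(to_pixels([792, 319], width=1280, height=960))  # (x, y) in pixels
```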
    

Conclusion

Industrial robotics is the beachhead, but the implications reach further. Every autonomous system (warehouse robots, surgical assistants, agricultural bots) faces the same core problem: it needs to understand its physical environment reliably enough to act without constant human supervision.

ER 1.6 chips away at that problem from three directions at once: sharper spatial awareness, better multi-camera judgment, and the ability to read legacy analog instruments that make up a huge portion of the world's existing infrastructure.

The gap between robot intelligence and physical reality is closing. Slowly and methodically, but closing.

FAQs

Q1. What makes Gemini Robotics-ER 1.6 different from previous versions?

Gemini Robotics-ER 1.6 introduces improved spatial reasoning, multi-view success detection, and highly accurate instrument reading, significantly reducing hallucinations and increasing real-world reliability.

Q2. How does Gemini Robotics-ER 1.6 improve safety in robotics?

It integrates safety directly into its reasoning system, allowing robots to avoid unsafe actions by understanding object constraints, environment risks, and task limitations in real time.

Q3. What industries can benefit most from Gemini Robotics-ER 1.6?

Industries like manufacturing, industrial inspection, logistics, agriculture, and healthcare can benefit, especially where robots must interpret complex physical environments and legacy instruments.
