DeepMind Debuts Gemini-Powered Pointer That Lets Users Point and Speak to Control Apps

News

5/13/2026, 9:27:29 AM

DeepMind on May 13, 2026 published an experimental system that embeds Gemini-powered assistance at the mouse pointer. Live demos in Google AI Studio show image-editing and map workflows operable by pointing and speaking.

DeepMind has introduced an experimental, Gemini-powered mouse pointer that captures visual and semantic context around the cursor so users can point and speak instead of switching to a separate chat window. The team published demos and a technical write-up on May 13, 2026, characterizing the prototype as an early-stage effort to make AI assistance available across applications. This approach matters because it aims to remove the need to manually serialize screen content into prompts, letting models act on shorthand commands directly where users are working.

Two interactive demos are available in Google AI Studio today: an image-editing demo and a map-based demo, both controlled by pointing and natural language. DeepMind also describes a deeper integration, called Magic Pointer, that is rolling out inside Chrome, and it plans an integration for Googlebook, the company’s Gemini-powered laptop line. The team frames those integrations as moves to bring pointer-level assistance into everyday browsing and productivity workflows.

Technically, the system treats cursor hover state and nearby UI content as structured inputs to multimodal models. The pointer dynamically crops and contextualizes the visual region around a moving cursor so models can process image and text together in real time. DeepMind presents this real-time visual-semantic grounding as a way to replace long, precise prompts with simpler interaction-driven inputs.
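
To make the cropping step concrete, here is a minimal sketch of how a fixed-size region around the cursor could be computed. The write-up does not describe DeepMind's actual cropping logic, so the function name `crop_box`, the 256-pixel default, and the clamp-to-screen behavior are all assumptions chosen for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def crop_box(cursor_x: int, cursor_y: int,
             screen_w: int, screen_h: int, size: int = 256) -> Box:
    """Return a box of up to size x size pixels around the cursor.

    Hypothetical helper: the box is centered on the cursor where possible
    and shifted inward near the edges so it never leaves the screen.
    """
    # The crop cannot be larger than the screen itself.
    w = min(size, screen_w)
    h = min(size, screen_h)
    # Center on the cursor, then clamp so the box stays on screen.
    left = min(max(cursor_x - w // 2, 0), screen_w - w)
    top = min(max(cursor_y - h // 2, 0), screen_h - h)
    return Box(left, top, left + w, top + h)


if __name__ == "__main__":
    print(crop_box(960, 540, 1920, 1080))   # cursor in the middle of the screen
    print(crop_box(10, 10, 1920, 1080))     # cursor near the top-left corner
```

Shifting the box inward at the edges, rather than shrinking it, keeps the crop a constant size, which may be convenient if the downstream model expects fixed-size image inputs.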

The researchers also outlined interaction principles to guide the prototype, naming Maintain the flow, Show and tell, and Embrace the power of “This” and “That” as core ideas. For builders, the prototype implies concrete engineering needs: pipelines that merge hover state with multimodal model inputs, UI hooks to expose pointer-level assistance across applications, and low-latency systems for on-the-fly visual-semantic grounding.
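
As a rough illustration of the first of those needs, the sketch below merges a hover snapshot, a spoken command, and a cropped screenshot into a single structured request. Everything here is hypothetical: the `HoverState` fields, the `build_request` function, and the payload layout are invented for this example and do not reflect the Gemini API or DeepMind's internal format.

```python
import base64
from dataclasses import dataclass
from typing import Any


@dataclass
class HoverState:
    """Hypothetical snapshot of what the pointer is over."""
    x: int
    y: int
    element_role: str        # e.g. "table", "image", "paragraph"
    nearby_text: str = ""    # text extracted from the hovered UI element


def build_request(hover: HoverState, utterance: str,
                  crop_png: bytes) -> dict[str, Any]:
    """Bundle speech, pixels, and hover metadata into one payload.

    The schema is an assumption for illustration, not a real API format.
    """
    return {
        "parts": [
            # What the user said, e.g. "turn this into a pie chart".
            {"type": "text", "text": utterance.strip()},
            # The cropped region around the cursor, base64-encoded.
            {"type": "image", "mime_type": "image/png",
             "data": base64.b64encode(crop_png).decode("ascii")},
            # Structured hover context so the model knows what "this" is.
            {"type": "context",
             "pointer": {"x": hover.x, "y": hover.y},
             "element_role": hover.element_role,
             "nearby_text": hover.nearby_text},
        ]
    }


if __name__ == "__main__":
    hover = HoverState(x=412, y=300, element_role="table",
                       nearby_text="Q1 revenue by region")
    request = build_request(hover, " turn this into a pie chart ", b"\x89PNG")
    print([part["type"] for part in request["parts"]])
```

Keeping the hover metadata as structured fields, instead of pasting it into the prompt text, is one way to spare the user from serializing screen content by hand.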

Practically, the pointer is designed to enable deictic shorthand such as “fix this” or “move that here,” reducing interruptions caused by copying content into a chat window. DeepMind gives examples to illustrate the experience: point at a PDF to get a paste-ready bullet summary, hover over a table to request a pie chart, or highlight a recipe and ask to double the ingredients. Those scenarios show how assistance can live at the pointer rather than in a separate assistant panel.
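
One simple way to think about deictic shorthand is as a pairing problem: each “this,” “that,” or “here” in the command is matched with whatever the pointer was over when the word was spoken. The sketch below is a toy version of that idea, not DeepMind's method; the function `bind_deictics`, the word list, and the in-order pairing rule are assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed list of deictic words; a real system would need a richer model.
DEICTIC = re.compile(r"\b(this|that|these|those|here|there)\b", re.IGNORECASE)


def bind_deictics(utterance: str,
                  targets: list[str]) -> list[tuple[str, Optional[str]]]:
    """Pair each deictic word, in spoken order, with a pointed-at target.

    `targets` is a hypothetical list of element identifiers, ordered by
    when the pointer rested on them. Words with no target bind to None.
    """
    words = [m.group(0).lower() for m in DEICTIC.finditer(utterance)]
    bindings = []
    for i, word in enumerate(words):
        target = targets[i] if i < len(targets) else None
        bindings.append((word, target))
    return bindings


if __name__ == "__main__":
    print(bind_deictics("move that here", ["image-3", "slot-B"]))
    print(bind_deictics("Fix this", []))
```

A pattern match like this would misfire on sentences where “that” is a conjunction rather than a pointer reference, which is part of why the prototype relies on a multimodal model instead of fixed rules.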

The team positions the prototype as a research milestone rather than a finished product and emphasizes the early-stage nature of the work. The posted technical write-up and demos show that the approach is feasible and surface both interaction design questions and engineering trade-offs that developers will need to address to bring pointer-level AI assistance into production.

Sources

MarkTechPost AI · 5/13/2026

Sable Whitaker
