Claude Computer Use: Controlling a Computer with AI
By Dorian Laurenceau
Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.
Pillar article: Claude API: Complete Guide
What is Computer Use?
Computer Use is a groundbreaking feature that allows Claude to interact directly with a computer via screenshots. Unlike classic Tool Use (which calls functions), Computer Use allows Claude to:
- See the computer screen via screenshots
- Understand graphical interfaces (buttons, menus, forms)
- Act on the computer (click, type, scroll)
- Verify the result of its actions via new screenshots
The Interaction Loop
1. Your code → Screenshot → Claude
2. Claude analyzes the screen → Decides on an action
3. Claude → Action (click, type...) → Your code
4. Your code executes the action → New screenshot → Back to step 1
This loop continues until the task is completed or Claude determines it cannot continue.
Computer Use is the feature that most clearly illustrates the gap between what models can do in a sandbox and what they should do in production. Anthropic's Computer Use announcement was honest about its own beta status, and practitioners on r/ClaudeAI and r/LocalLLaMA have been even more honest: the demo of Claude booking a hotel works; the production workflow of Claude navigating your actual SaaS stack does not, and the reasons are not purely technical.
What the community correctly pushes back on: the marketing framing of "digital worker" implies reliability that the underlying loop does not yet provide. Screen-based agents are constrained by brittle visual grounding (a minor layout change and the agent clicks the wrong coordinates), by long latencies (each screenshot-plus-decision cycle takes seconds), and by the absence of structured output (the agent has to re-parse the page from pixels on every turn). Research like WebArena quantifies this: state-of-the-art agents solve only a fraction of real web tasks reliably, and that number has been moving slowly.
The pragmatic framing: Computer Use shines for exploratory or one-off automations where brittleness is acceptable (QA walkthroughs, UI regression screenshots, scraping a site that refuses to expose an API). It fails for critical paths that already have an API, an MCP server, or a scripting surface; using Computer Use there is a form of technical cosplay. Start with the API, fall back to the browser, and treat "agent sees screen" as the last resort, not the first.
Quickstart: First Example
import anthropic

client = anthropic.Anthropic()

# Computer Use tool configuration
tools = [{
    "type": "computer_20250124",
    "name": "computer",
    "display_width_px": 1920,
    "display_height_px": 1080,
    "display_number": 0  # Main screen
}]

# First request with a screenshot.
# Computer Use is a beta feature: use the beta client and pass the matching beta flag.
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    betas=["computer-use-2025-01-24"],
    tools=tools,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": take_screenshot()  # Your screenshot function
                }
            },
            {
                "type": "text",
                "text": "Open Chrome browser and go to google.com"
            }
        ]
    }]
)

# Process the actions returned by Claude
for block in response.content:
    if block.type == "tool_use" and block.name == "computer":
        action = block.input
        execute_action(action)  # Execute the action on the computer
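The quickstart executes Claude's first actions but never reports back, so the conversation stops there. To continue, answer each tool_use block with a tool_result containing a fresh screenshot, then call the API again with the extended history. A minimal sketch of that second turn, assuming the quickstart's message list is kept in a variable named messages (the complete loop further down does this on every step):

# Hand the outcome of each action back to Claude as a tool_result with a new screenshot
messages.append({"role": "assistant", "content": response.content})

tool_results = []
for block in response.content:
    if block.type == "tool_use" and block.name == "computer":
        execute_action(block.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": [{
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": take_screenshot()
                }
            }]
        })

messages.append({"role": "user", "content": tool_results})
# Then call client.beta.messages.create(...) again with the extended messages list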
Available Actions
Claude can perform the following actions on the computer:
| Action | Description | Parameters |
|---|---|---|
| click | Mouse click | coordinate [x, y], button (left/right/middle) |
| double_click | Double click | coordinate [x, y] |
| type | Type text | text |
| key | Press a key | key (e.g., "Enter", "Tab", "ctrl+c") |
| scroll | Scroll | coordinate [x, y], direction (up/down), amount |
| move | Move mouse | coordinate [x, y] |
| screenshot | Request a new screenshot | (none) |
| drag | Drag and drop | start_x, start_y, end_x, end_y |
Action Examples
# Click on a button
{"action": "click", "coordinate": [960, 540]}
# Double click to open a file
{"action": "double_click", "coordinate": [200, 300]}
# Type text
{"action": "type", "text": "Hello World"}
# Keyboard shortcut
{"action": "key", "key": "ctrl+s"}
# Scroll down
{"action": "scroll", "coordinate": [960, 540], "direction": "down", "amount": 3}
# Request a new screenshot
{"action": "screenshot"}
Complete Implementation
Computer Use Agent Loop
import anthropic
import base64
import subprocess

client = anthropic.Anthropic()

def take_screenshot():
    """Capture the screen and return base64."""
    # Linux with scrot
    subprocess.run(["scrot", "/tmp/screenshot.png"])
    with open("/tmp/screenshot.png", "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")

def execute_action(action):
    """Execute a Computer Use action on the computer."""
    action_type = action.get("action")
    if action_type == "click":
        x, y = action["coordinate"]
        subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", "1"])
    elif action_type == "type":
        subprocess.run(["xdotool", "type", "--clearmodifiers", action["text"]])
    elif action_type == "key":
        subprocess.run(["xdotool", "key", action["key"]])
    elif action_type == "scroll":
        x, y = action["coordinate"]
        direction = action["direction"]
        button = "4" if direction == "up" else "5"
        subprocess.run(["xdotool", "mousemove", str(x), str(y)])
        for _ in range(action.get("amount", 3)):
            subprocess.run(["xdotool", "click", button])
    elif action_type == "screenshot":
        pass  # Will be captured on the next loop iteration

def computer_use_loop(task, max_steps=20):
    """Main Computer Use loop."""
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "source": {
                "type": "base64", "media_type": "image/png",
                "data": take_screenshot()
            }},
            {"type": "text", "text": task}
        ]
    }]
    tools = [{
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1920,
        "display_height_px": 1080,
        "display_number": 0
    }]
    for step in range(max_steps):
        # Beta feature: use the beta client and the matching beta flag
        response = client.beta.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            betas=["computer-use-2025-01-24"],
            tools=tools,
            messages=messages
        )
        if response.stop_reason == "end_turn":
            # Task completed
            for block in response.content:
                if block.type == "text":
                    print(f"✅ Done: {block.text}")
            return
        # Execute actions
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                execute_action(block.input)
                # New screenshot after the action
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": [{
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": take_screenshot()
                        }
                    }]
                })
        messages.append({"role": "user", "content": tool_results})
    print("⚠️ Maximum number of steps reached")

# Usage
computer_use_loop("Open Firefox, go to wikipedia.org and search for 'artificial intelligence'")
Use Cases
1. UI Testing
Automate UI tests without writing fragile selectors.
computer_use_loop("""
Test the registration form:
1. Fill the "Name" field with "John Smith"
2. Fill the "Email" field with "john@test.com"
3. Fill the "Password" field with "SecurePass123!"
4. Check the "I accept the terms" checkbox
5. Click "Sign Up"
6. Verify that a success message appears
""")
2. Data Entry in Legacy Systems
| Scenario | Traditional approach | With Computer Use |
|---|---|---|
| ERP without API | Custom connector development (weeks) | Script in a few hours |
| Legacy Windows app | RPA with fragile rules | Adaptive vision |
| Complex web forms | Selenium with selectors that break | Claude adapts to UI changes |
3. Workflow Automation
computer_use_loop("""
Monthly reporting workflow:
1. Open the accounting application
2. Export last month's sales report as CSV
3. Open Excel and import the CSV
4. Create a pivot table by region
5. Save the file to the Desktop
""")
Security: Critical Points
Identified Risks
| Risk | Level | Mitigation |
|---|---|---|
| Unintended actions | ⚠️ High | Sandbox environment, supervision |
| Injection via screen content | ⚠️ Medium | Don't navigate untrusted websites |
| Access to sensitive data | ⚠️ High | Limit the user account's permissions |
| Infinite action loop | ⚠️ Medium | Limit max_steps, global timeout |
Security Checklist
- Isolated environment: dedicated VM or Docker container
- Non-privileged account: the user must not be admin
- Limited network: restrict network access to the strict minimum
- Human supervision: always have a way to interrupt the session
- Limited duration: set a maximum timeout for each task
- Complete logs: record all screenshots and actions (see the sketch after this list)
- No credentials: never ask Claude to enter real passwords
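For the logging and timeout items, here is a minimal sketch of an audit wrapper around the take_screenshot() and execute_action() functions defined earlier; the log directory and the 10-minute budget are arbitrary choices:

import base64
import json
import time
from pathlib import Path

LOG_DIR = Path("/tmp/computer_use_logs")   # arbitrary location
LOG_DIR.mkdir(exist_ok=True)
DEADLINE = time.monotonic() + 600          # hard 10-minute budget for the whole session

def audited_execute_action(action, step):
    """Enforce the time budget, then log the action and the screen it acted on."""
    if time.monotonic() > DEADLINE:
        raise TimeoutError("Session exceeded its time budget, aborting")
    with open(LOG_DIR / "actions.jsonl", "a") as f:
        f.write(json.dumps({"step": step, "time": time.time(), "action": action}) + "\n")
    (LOG_DIR / f"step_{step:03d}.png").write_bytes(base64.b64decode(take_screenshot()))
    execute_action(action)

Inside the loop, call audited_execute_action(block.input, step) in place of execute_action(block.input).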
Docker Quickstart (Recommended)
FROM ubuntu:22.04

# Minimal graphical environment.
# Note: on Ubuntu 22.04 the "firefox" apt package is a snap transition stub; in a
# container you may need the Mozilla Team PPA or another browser instead.
RUN apt-get update && apt-get install -y \
    xvfb x11vnc fluxbox \
    firefox scrot xdotool \
    python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Install the Anthropic SDK
RUN pip3 install anthropic

# Agent code (referenced by start.sh) and startup script
COPY computer_use_agent.py /app/computer_use_agent.py
COPY start.sh /start.sh
RUN chmod +x /start.sh

EXPOSE 5900
CMD ["/start.sh"]
#!/bin/bash
# start.sh
Xvfb :99 -screen 0 1920x1080x24 &
export DISPLAY=:99
fluxbox &
x11vnc -display :99 -forever -nopw &
python3 /app/computer_use_agent.py
Current Limitations
| Limitation | Impact | Workaround |
|---|---|---|
| Latency (~2-5s per action) | Long workflows are slow | Group instructions |
| Fixed resolution | Must match actual screen | Configure display_width_px correctly |
| No complex drag-and-drop | Some interactions impossible | Combine with keyboard shortcuts |
| Coordinate errors | Clicks sometimes imprecise | Request a verification screenshot |
| High cost | Each screenshot = ~2500 tokens | Limit the number of steps; downscale screenshots (see the sketch below) |
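On the cost point, the simplest lever is to shrink the screenshot before sending it. A minimal sketch using Pillow; the max_width of 1280 is an arbitrary starting point. Note that if you downscale, you must either set display_width_px/display_height_px to the downscaled size or scale Claude's coordinates back up before passing them to xdotool:

import base64
import io
import subprocess

from PIL import Image  # pip install pillow

def take_screenshot_downscaled(path="/tmp/screenshot.png", max_width=1280):
    """Capture with scrot as above, then downscale to cut image tokens."""
    subprocess.run(["scrot", path])
    img = Image.open(path)
    if img.width > max_width:
        ratio = max_width / img.width
        img = img.resize((max_width, int(img.height * ratio)))
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return base64.standard_b64encode(buf.getvalue()).decode("utf-8")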
Comparison with Alternatives
| Aspect | Computer Use (Claude) | Selenium/Playwright | RPA (UiPath, etc.) |
|---|---|---|---|
| Setup | Minimal (just the API) | Drivers, selectors | Heavy installation |
| Fragility | Low (adaptive vision) | High (selectors break) | Medium |
| Speed | Slow (~2-5s/action) | Fast | Fast |
| Flexibility | Very high | Medium (web only) | High |
| Cost per execution | High (API tokens) | Near zero | Software license |
| Supported apps | All (via screen) | Web only | Desktop + Web |
Dorian Laurenceau
Full-Stack Developer & Learning Designer. I spent 4 years as a freelance full-stack developer and 4 years teaching React, JavaScript, HTML/CSS and WordPress to adult learners. Today I design learning paths in web development and AI, grounded in learning science. I founded learn-prompting.fr to make AI practical and accessible, and built the Bluff app to gamify political transparency.
FAQ
What is Claude Computer Use?
Computer Use is a feature that allows Claude to interact with a computer via screenshots. Claude sees the screen, decides what actions to perform (click, type, scroll), and receives new screenshots to continue.
How does Computer Use work technically?
It's a loop: your code takes a screenshot, sends it to Claude, Claude returns an action (click at x,y or type text), your code executes the action, takes a new screenshot, and the cycle repeats.
Is Computer Use safe to use?
Computer Use should be used with caution. Anthropic recommends running it in an isolated environment (VM, container), never using it with privileged admin accounts, and supervising sessions.
What are the main use cases for Computer Use?
Automated UI testing, data entry in legacy systems, product demos, workflow automation on applications without APIs, and complex data scraping.
How do I get Claude to control my PC?
Claude Computer Use works via the API. You need to set up a Docker container or VM with Anthropic's tools installed. Claude then takes screenshots, analyzes them, and sends commands (click, keyboard, scroll). A quickstart guide is available in Anthropic's official documentation.
Is Claude easy to use for controlling a computer?
Using Computer Use requires technical knowledge (API, Docker, Python). It's not a simple click in an interface; you need to configure a dedicated environment. That said, with Anthropic's Python SDK the basic code fits in about 20 lines, and many tutorials are available.