Claude Computer Use: Controlling a Computer with AI

By Dorian Laurenceau

📅 Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.

🔗 Pillar article: Claude API: Complete Guide


What is Computer Use?

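Computer Use is a beta capability that lets Claude operate a computer through screenshots and simulated mouse and keyboard input. Unlike classic Tool Use, where Claude calls functions that you define, Computer Use relies on a tool schema defined by Anthropic, and your code still executes every action. It allows Claude to: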

  • See the computer screen via screenshots
  • Understand graphical interfaces (buttons, menus, forms)
  • Act on the computer (click, type, scroll)
  • Verify the result of its actions via new screenshots

The Interaction Loop

1. Your code → Screenshot → Claude
2. Claude analyzes the screen → Decides on an action
3. Claude → Action (click, type...) → Your code
4. Your code executes the action → New screenshot → Back to step 1

This loop continues until the task is completed or Claude determines it cannot continue.
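
The four steps above can be sketched as a small driver that does not depend on any API. This is a minimal sketch, not Anthropic's implementation: `capture`, `decide` and `execute` are hypothetical callables standing in for the screenshot function, the model call and the action executor, and returning `None` from `decide` is an assumed convention for "task finished".

```python
from typing import Callable, Optional

Action = dict  # illustrative shape, e.g. {"action": "screenshot"}


def run_loop(
    capture: Callable[[], bytes],
    decide: Callable[[bytes], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> int:
    """Run screenshot -> decide -> execute until decide() returns None.

    Returns the number of actions executed. Raises RuntimeError when the
    step budget runs out before the task finishes.
    """
    for step in range(max_steps):
        screenshot = capture()        # step 1: observe the screen
        action = decide(screenshot)   # step 2: the model picks an action
        if action is None:            # assumed "done" signal
            return step
        execute(action)               # steps 3-4: act, then loop again
    raise RuntimeError(f"no completion after {max_steps} steps")


# Dry run with stand-ins: the fake model returns two actions, then stops.
script = iter([{"action": "type", "text": "hi"}, {"action": "screenshot"}, None])
executed = []
steps = run_loop(lambda: b"png-bytes", lambda _img: next(script), executed.append)
print(steps)  # 2
```

Injecting the three callables keeps the control flow testable without a display or an API key.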

Computer Use is the feature that most clearly illustrates the gap between what models can do in a sandbox and what they should do in production. Anthropic's Computer Use announcement was honest about its own beta status, and practitioners on r/ClaudeAI and r/LocalLLaMA have been even more honest: the demo of Claude booking a hotel works; the production workflow of Claude navigating your actual SaaS stack does not, and the reasons are not purely technical.

What the community correctly pushes back on: the marketing framing of "digital worker" implies reliability that the underlying loop does not yet provide. Screen-based agents are constrained by brittle visual grounding (a layout shifts, the agent clicks the wrong button), by long latencies (each screenshot-plus-decision cycle takes seconds), and by the absence of structured output (the agent has to re-parse a page from pixels on every turn). Research like WebArena quantifies this: state-of-the-art agents solve only a fraction of real web tasks reliably, and that number has been moving slowly.

The pragmatic framing: Computer Use shines for exploratory or one-off automations where brittleness is acceptable (QA walkthroughs, UI regression screenshots, scraping a site that refuses to expose an API). It fails for critical paths that already have an API, an MCP server, or a scripting surface, and using Computer Use there is a form of technical cosplay. Start with the API, fall back to the browser, and treat "agent sees screen" as the last resort, not the first.

Quickstart: First Example

import anthropic

client = anthropic.Anthropic()

# Computer Use tool configuration
tools = [{
    "type": "computer_20250124",
    "name": "computer",
    "display_width_px": 1920,
    "display_height_px": 1080,
    "display_number": 0  # Main screen
}]

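# First request with a screenshot
# Computer Use is a beta tool: call the beta endpoint and pass the matching beta flag
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=tools,
    betas=["computer-use-2025-01-24"],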
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": take_screenshot()  # Your screenshot function
                }
            },
            {
                "type": "text",
                "text": "Open Chrome browser and go to google.com"
            }
        ]
    }]
)

# Process the actions returned by Claude
for block in response.content:
    if block.type == "tool_use" and block.name == "computer":
        action = block.input
        execute_action(action)  # Execute the action on the computer
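
The tool configuration above declares a 1920x1080 display. If you downscale screenshots before sending them (for example to save tokens) and declare the downscaled size in the tool configuration, the coordinates Claude returns refer to the smaller image and have to be mapped back to the real screen. The sketch below shows one way to do that. `scale_to_screen` is a hypothetical helper, not part of the SDK.

```python
def scale_to_screen(coordinate, image_size, screen_size):
    """Map an (x, y) point in the image sent to Claude onto the real screen."""
    x, y = coordinate
    image_w, image_h = image_size
    screen_w, screen_h = screen_size
    if image_w <= 0 or image_h <= 0:
        raise ValueError("image_size must be positive")
    scaled_x = round(x * screen_w / image_w)
    scaled_y = round(y * screen_h / image_h)
    # Clamp so rounding never lands one pixel off-screen.
    return (
        min(max(scaled_x, 0), screen_w - 1),
        min(max(scaled_y, 0), screen_h - 1),
    )


# Centre of a 1024x768 screenshot maps to the centre of a 1920x1080 screen.
print(scale_to_screen((512, 384), (1024, 768), (1920, 1080)))  # (960, 540)
```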

Available Actions

Claude can perform the following actions on the computer:

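| Action | Description | Parameters |
| --- | --- | --- |
| left_click | Left mouse click | coordinate [x, y] |
| right_click, middle_click | Right or middle mouse click | coordinate [x, y] |
| double_click | Double click | coordinate [x, y] |
| type | Type text | text |
| key | Press a key or key combination | text (e.g., "Return", "Tab", "ctrl+c") |
| scroll | Scroll | coordinate [x, y], scroll_direction (up/down/left/right), scroll_amount |
| mouse_move | Move mouse | coordinate [x, y] |
| screenshot | Request a new screenshot | (none) |
| left_click_drag | Drag and drop | start_coordinate [x, y], coordinate [x, y] (end point) |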

Action Examples

# Click on a button
{"action": "left_click", "coordinate": [960, 540]}

# Double click to open a file
{"action": "double_click", "coordinate": [200, 300]}

# Type text
{"action": "type", "text": "Hello World"}

# Keyboard shortcut (the key combination is passed in the "text" field)
{"action": "key", "text": "ctrl+s"}

# Scroll down
{"action": "scroll", "coordinate": [960, 540], "scroll_direction": "down", "scroll_amount": 3}

# Request a new screenshot
{"action": "screenshot"}
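
Before handing an action to the operating system, it can help to reject malformed ones. The sketch below is an illustration under stated assumptions rather than part of any SDK: `validate_action` is a hypothetical helper that checks only what the examples above show, namely that an `action` name is present and that any `coordinate` lies inside the configured display.

```python
def validate_action(action: dict, width: int = 1920, height: int = 1080) -> None:
    """Raise ValueError if the action is malformed or points off-screen."""
    name = action.get("action")
    if not isinstance(name, str) or not name:
        raise ValueError("missing 'action' name")

    coordinate = action.get("coordinate")
    if coordinate is None:
        return  # actions such as "type" or "screenshot" carry no coordinate

    if (
        not isinstance(coordinate, (list, tuple))
        or len(coordinate) != 2
        or not all(isinstance(v, int) and not isinstance(v, bool) for v in coordinate)
    ):
        raise ValueError(f"coordinate must be [x, y] integers, got {coordinate!r}")

    x, y = coordinate
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError(f"coordinate {coordinate} is outside the {width}x{height} display")


validate_action({"action": "screenshot"})                         # passes silently
validate_action({"action": "scroll", "coordinate": [960, 540]})   # passes silently
```

Calling it at the top of the executor turns an off-screen click into an explicit error you can report back to Claude instead of a silent misclick.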

Complete Implementation

Computer Use Agent Loop

import anthropic
import base64
import subprocess

client = anthropic.Anthropic()

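def take_screenshot():
    """Capture the screen and return it as a base64-encoded PNG string."""
    # Linux with scrot. --overwrite replaces the previous capture; without it,
    # recent scrot versions write to a new numbered file and this function
    # would keep returning the first screenshot.
    subprocess.run(["scrot", "--overwrite", "/tmp/screenshot.png"], check=True)
    with open("/tmp/screenshot.png", "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")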

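def execute_action(action):
    """Execute a Computer Use action on the computer via xdotool."""
    action_type = action.get("action")

    if action_type in ("left_click", "right_click", "middle_click", "double_click"):
        x, y = action["coordinate"]
        # xdotool buttons: 1 = left, 2 = middle, 3 = right
        button = {"middle_click": "2", "right_click": "3"}.get(action_type, "1")
        repeat = "2" if action_type == "double_click" else "1"
        subprocess.run(["xdotool", "mousemove", str(x), str(y),
                        "click", "--repeat", repeat, button], check=True)

    elif action_type == "mouse_move":
        x, y = action["coordinate"]
        subprocess.run(["xdotool", "mousemove", str(x), str(y)], check=True)

    elif action_type == "type":
        # "--" ends option parsing so text starting with "-" is typed literally
        subprocess.run(["xdotool", "type", "--clearmodifiers", "--", action["text"]], check=True)

    elif action_type == "key":
        # The key combination arrives in the "text" field, e.g. "ctrl+s"
        subprocess.run(["xdotool", "key", "--", action["text"]], check=True)

    elif action_type == "scroll":
        x, y = action["coordinate"]
        # X11 maps scrolling to mouse buttons: 4 = up, 5 = down, 6 = left, 7 = right
        button = {"up": "4", "down": "5", "left": "6", "right": "7"}[action["scroll_direction"]]
        amount = str(action.get("scroll_amount", 3))
        subprocess.run(["xdotool", "mousemove", str(x), str(y),
                        "click", "--repeat", amount, button], check=True)

    elif action_type == "screenshot":
        pass  # Will be captured on the next loop iteration

    else:
        print(f"Unsupported action skipped: {action_type}")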

def computer_use_loop(task, max_steps=20):
    """Main Computer Use loop."""
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "source": {
                "type": "base64", "media_type": "image/png",
                "data": take_screenshot()
            }},
            {"type": "text", "text": task}
        ]
    }]
    
    tools = [{
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1920,
        "display_height_px": 1080,
        "display_number": 0
    }]
    
    for step in range(max_steps):
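        # Computer Use is in beta: use the beta endpoint with the matching flag
        response = client.beta.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=tools,
            messages=messages,
            betas=["computer-use-2025-01-24"]
        )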
        
        if response.stop_reason == "end_turn":
            # Task completed
            for block in response.content:
                if block.type == "text":
                    print(f"✅ Done: {block.text}")
            return
        
        # Execute actions
        messages.append({"role": "assistant", "content": response.content})
        
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                execute_action(block.input)
                
                # New screenshot after the action
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": [{
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": take_screenshot()
                        }
                    }]
                })
        
        messages.append({"role": "user", "content": tool_results})
    
    print("⚠️ Maximum number of steps reached")

# Usage (guarded so that importing this module does not start a session)
if __name__ == "__main__":
    computer_use_loop("Open Firefox, go to wikipedia.org and search for 'artificial intelligence'")
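
One practical consequence of the loop above: every step appends another screenshot to `messages`, and the whole history is resent on each request. A common mitigation is to keep only the most recent screenshots and replace older ones with a short text placeholder. The sketch below is one possible approach, not an official recipe. `prune_screenshots` and the placeholder text are hypothetical, and the function only touches plain-dict blocks like the ones this article builds.

```python
import copy

PLACEHOLDER = {"type": "text", "text": "[older screenshot omitted]"}


def _prune_blocks(blocks: list, remaining: int) -> int:
    """Replace images in blocks (newest first) once the keep budget is spent."""
    for i in range(len(blocks) - 1, -1, -1):
        block = blocks[i]
        if not isinstance(block, dict):
            continue  # SDK objects from assistant turns are left untouched
        if block.get("type") == "image":
            if remaining > 0:
                remaining -= 1
            else:
                blocks[i] = dict(PLACEHOLDER)
        elif block.get("type") == "tool_result" and isinstance(block.get("content"), list):
            remaining = _prune_blocks(block["content"], remaining)
    return remaining


def prune_screenshots(messages: list, keep_last: int = 3) -> list:
    """Return a copy of messages where only the newest keep_last images remain."""
    pruned = copy.deepcopy(messages)
    remaining = keep_last
    # Walk newest-to-oldest so the most recent screenshots survive.
    for message in reversed(pruned):
        content = message.get("content")
        if isinstance(content, list):
            remaining = _prune_blocks(content, remaining)
    return pruned
```

In the loop, this would mean passing `messages=prune_screenshots(messages)` to the API call while keeping the full `messages` list for logging.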

Use Cases

1. UI Testing

Automate UI tests without writing fragile selectors.

computer_use_loop("""
Test the registration form:
1. Fill the "Name" field with "John Smith"
2. Fill the "Email" field with "john@test.com"  
3. Fill the "Password" field with "SecurePass123!"
4. Check the "I accept the terms" checkbox
5. Click "Sign Up"
6. Verify that a success message appears
""")

2. Data Entry in Legacy Systems

| Scenario | Traditional approach | With Computer Use |
| --- | --- | --- |
| ERP without API | Custom connector development (weeks) | Script in a few hours |
| Legacy Windows app | RPA with fragile rules | Adaptive vision |
| Complex web forms | Selenium with selectors that break | Claude adapts to UI changes |

3. Workflow Automation

computer_use_loop("""
Monthly reporting workflow:
1. Open the accounting application
2. Export last month's sales report as CSV
3. Open Excel and import the CSV
4. Create a pivot table by region
5. Save the file to the Desktop
""")

Security: Critical Points

Identified Risks

| Risk | Level | Mitigation |
| --- | --- | --- |
| Unintended actions | ⚠️ High | Sandbox environment, supervision |
| Injection via screen content | ⚠️ Medium | Don't navigate untrusted websites |
| Access to sensitive data | ⚠️ High | Limit the user account's permissions |
| Infinite action loop | ⚠️ Medium | Limit max_steps, global timeout |
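
The last mitigation in the table (a step limit plus a global timeout) can be enforced with a small guard object. This is a sketch under assumptions: `RunBudget` and `BudgetExceeded` are hypothetical names, and the clock is injectable only to make the behaviour easy to test.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when the agent runs out of steps or wall-clock time."""


class RunBudget:
    def __init__(self, max_steps: int = 20, timeout_s: float = 300.0, clock=time.monotonic):
        self.max_steps = max_steps
        self.timeout_s = timeout_s
        self._clock = clock
        self._started = clock()
        self.steps = 0

    def spend_step(self) -> None:
        """Call once per loop iteration, before sending the next request."""
        elapsed = self._clock() - self._started
        if elapsed > self.timeout_s:
            raise BudgetExceeded(f"timeout after {elapsed:.0f}s")
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step limit of {self.max_steps} reached")
        self.steps += 1


# Usage: create one budget per task and charge it at the top of each iteration.
budget = RunBudget(max_steps=2, timeout_s=60)
budget.spend_step()
budget.spend_step()
```

A third `spend_step()` call here would raise `BudgetExceeded`, which the caller can catch to stop the session cleanly.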

Security Checklist

  1. Isolated environment: dedicated VM or Docker container
  2. Non-privileged account: the user must not be admin
  3. Limited network: restrict network access to what is strictly necessary
  4. Human supervision: always have a way to interrupt the session
  5. Limited duration: set a maximum timeout for each task
  6. Complete logs: record all screenshots and actions
  7. No credentials: never ask Claude to enter real passwords
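
Example Dockerfile for such an isolated environment:

FROM ubuntu:22.04

# Avoid interactive prompts (e.g. tzdata) during the build
ENV DEBIAN_FRONTEND=noninteractive

# Minimal graphical environment
# Note: on Ubuntu 22.04 the "firefox" apt package is a snap wrapper and may not
# install cleanly in a container; swap in another browser package if it fails
RUN apt-get update && apt-get install -y \
    xvfb x11vnc fluxbox \
    firefox scrot xdotool \
    python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Install the Anthropic SDK
RUN pip3 install anthropic

# Agent code (the file start.sh launches) and startup script
COPY computer_use_agent.py /app/computer_use_agent.py
COPY start.sh /start.sh
RUN chmod +x /start.sh

# VNC port, used for human supervision
EXPOSE 5900
CMD ["/start.sh"]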
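
start.sh:

#!/bin/bash
# Start a virtual display, a window manager and a VNC server, then the agent
Xvfb :99 -screen 0 1920x1080x24 &
export DISPLAY=:99
sleep 1  # give Xvfb a moment to start before clients connect
fluxbox &
# -nopw disables the VNC password: only expose port 5900 on a trusted network
x11vnc -display :99 -forever -nopw &
python3 /app/computer_use_agent.py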
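
The "Complete logs" item of the checklist can be covered with a few lines inside the agent. The sketch below is one possible layout, not a prescribed format: `log_step`, the `actions.jsonl` file name and the record fields are all assumptions chosen for illustration.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


def log_step(log_dir, step: int, action: dict, screenshot_png: bytes) -> dict:
    """Save the screenshot and append one audit record as a JSON line."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    image_name = f"step_{step:04d}.png"
    (log_dir / image_name).write_bytes(screenshot_png)

    record = {
        "ts": time.time(),
        "step": step,
        "action": action,
        "screenshot": image_name,
        # The digest lets you detect a screenshot that was altered after the run.
        "sha256": hashlib.sha256(screenshot_png).hexdigest(),
    }
    with (log_dir / "actions.jsonl").open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


# Demo in a throwaway directory; a real run would use a mounted volume.
with tempfile.TemporaryDirectory() as tmp:
    rec = log_step(tmp, 1, {"action": "screenshot"}, b"fake-png-bytes")
    lines = (Path(tmp) / "actions.jsonl").read_text(encoding="utf-8").splitlines()
    saved = (Path(tmp) / rec["screenshot"]).read_bytes()
```

Writing the log to a volume outside the container means the audit trail survives even if the sandbox is destroyed.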

Current Limitations

| Limitation | Impact | Workaround |
| --- | --- | --- |
| Latency (~2-5s per action) | Long workflows are slow | Group instructions |
| Fixed resolution | Must match actual screen | Configure display_width_px and display_height_px correctly |
| No complex drag-and-drop | Some interactions impossible | Combine with keyboard shortcuts |
| Coordinate errors | Clicks sometimes imprecise | Request a verification screenshot |
| High cost | Each screenshot = ~2500 tokens | Limit the number of steps |
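
The cost row deserves a quick calculation. If every request resends the full conversation, request k carries k screenshots, so n requests send n(n+1)/2 screenshots in total. The sketch below applies the table's rough figure of 2,500 tokens per screenshot and ignores text tokens and prompt caching. The function names are hypothetical, and the rate is a placeholder parameter, not a quoted price.

```python
def cumulative_image_tokens(requests: int, tokens_per_screenshot: int = 2500) -> int:
    """Total screenshot input tokens when request k resends k screenshots."""
    if requests < 0:
        raise ValueError("requests must be non-negative")
    return tokens_per_screenshot * requests * (requests + 1) // 2


def estimate_input_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count to dollars at a caller-supplied rate."""
    return tokens / 1_000_000 * usd_per_million_tokens


PLACEHOLDER_RATE = 1.0  # assumption for illustration; look up your model's real rate

tokens = cumulative_image_tokens(20)  # a full 20-step run, one screenshot added per step
print(tokens)  # 525000
cost = estimate_input_cost_usd(tokens, PLACEHOLDER_RATE)
```

The quadratic growth is the reason limiting steps matters: a 20-step run sends 210 screenshots' worth of input tokens, not 20.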

Comparison with Alternatives

| Aspect | Computer Use (Claude) | Selenium/Playwright | RPA (UiPath, etc.) |
| --- | --- | --- | --- |
| Setup | Minimal (just the API) | Drivers, selectors | Heavy installation |
| Fragility | Low (adaptive vision) | High (selectors break) | Medium |
| Speed | Slow (~2-5s/action) | Fast | Fast |
| Flexibility | Very high | Medium (web only) | High |
| Cost per execution | High (API tokens) | Near zero | Software license |
| Supported apps | All (via screen) | Web only | Desktop + Web |

Published: March 10, 2026. Updated: April 24, 2026.

FAQ

What is Claude Computer Use?

Computer Use is a feature that allows Claude to interact with a computer via screenshots. Claude sees the screen, decides what actions to perform (click, type, scroll), and receives new screenshots to continue.

How does Computer Use work technically?

It's a loop: your code takes a screenshot, sends it to Claude, Claude returns an action (click at x,y or type text), your code executes the action, takes a new screenshot, and the cycle repeats.

Is Computer Use safe to use?

Computer Use should be used with caution. Anthropic recommends running it in an isolated environment (VM, container), never using it with privileged admin accounts, and supervising sessions.

What are the main use cases for Computer Use?

Automated UI testing, data entry in legacy systems, product demos, workflow automation on applications without APIs, and complex data scraping.

How do I get Claude to control my PC?

Claude Computer Use works via the API. You need to set up a Docker container or VM with a screenshot tool, an input-automation tool such as xdotool, and the Anthropic SDK installed. Your code then takes screenshots and sends them to Claude, which analyzes them and returns commands (click, keyboard, scroll) for your code to execute. A quickstart guide is available in Anthropic's official documentation.

Is Claude easy to use for controlling a computer?

Using Computer Use requires technical knowledge (API, Docker, Python). It is not a one-click feature in the interface: you need to configure a dedicated environment. However, with Anthropic's Python SDK, a basic request fits in a few dozen lines, and many tutorials are available.