Claude Computer Use: Controlling a Computer with AI

By Learnia Team

📅 Last updated: March 10, 2026 — Covers Computer Use API, available actions, and security best practices.

🔗 Pillar article: Claude API: Complete Guide


What is Computer Use?

Computer Use is a beta capability of the Claude API that lets Claude operate a computer through screenshots. Unlike classic Tool Use, where Claude calls functions you define, Computer Use lets Claude:

  • See the computer screen via screenshots
  • Understand graphical interfaces (buttons, menus, forms)
  • Act on the computer (click, type, scroll)
  • Verify the result of its actions via new screenshots

The Interaction Loop

1. Your code → Screenshot → Claude
2. Claude analyzes the screen → Decides on an action
3. Claude → Action (click, type...) → Your code
4. Your code executes the action → New screenshot → Back to step 1

This loop continues until the task is completed or Claude determines it cannot continue.

Quickstart: First Example

import anthropic

client = anthropic.Anthropic()

# Computer Use tool configuration
tools = [{
    "type": "computer_20250124",
    "name": "computer",
    "display_width_px": 1920,
    "display_height_px": 1080,
    "display_number": 0  # Main screen
}]

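# First request with a screenshot
# Computer Use is a beta tool, so the request goes through client.beta with the matching beta flag
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    tools=tools,
    betas=["computer-use-2025-01-24"],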
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": take_screenshot()  # Your screenshot function
                }
            },
            {
                "type": "text",
                "text": "Open Chrome browser and go to google.com"
            }
        ]
    }]
)

# Process the actions returned by Claude
for block in response.content:
    if block.type == "tool_use" and block.name == "computer":
        action = block.input
        execute_action(action)  # Execute the action on the computer

Available Actions

Claude can perform the following actions on the computer:

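| Action | Description | Parameters |
| --- | --- | --- |
| left_click | Left mouse click | coordinate [x, y] |
| right_click, middle_click | Right or middle mouse click | coordinate [x, y] |
| double_click | Double click | coordinate [x, y] |
| type | Type text | text |
| key | Press a key or key combination | text (e.g., "Return", "Tab", "ctrl+c") |
| scroll | Scroll | coordinate [x, y], scroll_direction (up/down/left/right), scroll_amount |
| mouse_move | Move the mouse | coordinate [x, y] |
| screenshot | Request a new screenshot | none |
| left_click_drag | Drag and drop | start_coordinate [x, y], coordinate [x, y] |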

Action Examples

# Click on a button
{"action": "left_click", "coordinate": [960, 540]}

# Double click to open a file
{"action": "double_click", "coordinate": [200, 300]}

# Type text
{"action": "type", "text": "Hello World"}

# Keyboard shortcut (the key combination is passed in the "text" field)
{"action": "key", "text": "ctrl+s"}

# Scroll down
{"action": "scroll", "coordinate": [960, 540], "scroll_direction": "down", "scroll_amount": 3}

# Request a new screenshot
{"action": "screenshot"}
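Because these payloads are executed directly against a real desktop, it can help to check each one before acting on it. The following is a minimal sketch, not part of the Anthropic SDK: `validate_action`, its parameters, and the `ALLOWED` set are hypothetical names chosen for illustration, and the allow-list contents are an assumption you would adapt to your task.

```python
from typing import Any


def validate_action(
    action: dict[str, Any],
    width: int,
    height: int,
    allowed: frozenset[str],
) -> list[str]:
    """Return a list of problems; an empty list means the payload passed."""
    problems: list[str] = []

    # Reject any action name that is not explicitly allowed for this task
    name = action.get("action")
    if name not in allowed:
        problems.append(f"action not allowed: {name!r}")

    # If a coordinate is present, it must be two ints inside the display
    coord = action.get("coordinate")
    if coord is not None:
        well_formed = (
            isinstance(coord, (list, tuple))
            and len(coord) == 2
            and all(type(v) is int for v in coord)
        )
        if not well_formed:
            problems.append(f"malformed coordinate: {coord!r}")
        else:
            x, y = coord
            if not (0 <= x < width and 0 <= y < height):
                problems.append(
                    f"coordinate outside {width}x{height} display: {coord!r}"
                )

    return problems


# Hypothetical allow-list for a read-mostly task: no typing, no key presses
ALLOWED = frozenset({"screenshot", "scroll", "double_click"})

print(validate_action({"action": "scroll", "coordinate": [960, 540]}, 1920, 1080, ALLOWED))
print(validate_action({"action": "type", "text": "Hello World"}, 1920, 1080, ALLOWED))
print(validate_action({"action": "double_click", "coordinate": [2000, 300]}, 1920, 1080, ALLOWED))
```

Returning a list of problems rather than raising lets the caller decide whether to skip the action, report the error back to Claude in the tool result, or stop the session.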

Complete Implementation

Computer Use Agent Loop

import anthropic
import base64
import subprocess

client = anthropic.Anthropic()

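def take_screenshot():
    """Capture the screen and return it as a base64-encoded PNG."""
    # Linux with scrot. Recent scrot versions do not overwrite an existing
    # file unless --overwrite is given, which would leave a stale screenshot.
    # check=True raises if the capture fails.
    subprocess.run(["scrot", "--overwrite", "/tmp/screenshot.png"], check=True)
    with open("/tmp/screenshot.png", "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")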

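def execute_action(action):
    """Execute a Computer Use action on the computer with xdotool."""
    action_type = action.get("action")
    # xdotool button numbers for the single-click actions
    buttons = {"left_click": "1", "middle_click": "2", "right_click": "3"}

    if action_type in buttons:
        x, y = action["coordinate"]
        subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", buttons[action_type]])

    elif action_type == "double_click":
        x, y = action["coordinate"]
        subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", "--repeat", "2", "1"])

    elif action_type == "mouse_move":
        x, y = action["coordinate"]
        subprocess.run(["xdotool", "mousemove", str(x), str(y)])

    elif action_type == "type":
        subprocess.run(["xdotool", "type", "--clearmodifiers", action["text"]])

    elif action_type == "key":
        # The key name or combination (e.g. "ctrl+s") arrives in the "text" field
        subprocess.run(["xdotool", "key", action["text"]])

    elif action_type == "scroll":
        x, y = action["coordinate"]
        # X11 maps the scroll wheel to buttons 4 (up) and 5 (down)
        button = "4" if action["scroll_direction"] == "up" else "5"
        subprocess.run(["xdotool", "mousemove", str(x), str(y)])
        for _ in range(action.get("scroll_amount", 3)):
            subprocess.run(["xdotool", "click", button])

    elif action_type == "screenshot":
        pass  # The loop captures a fresh screenshot after every action

    else:
        print(f"Unsupported action: {action_type}")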

def computer_use_loop(task, max_steps=20):
    """Main Computer Use loop."""
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "source": {
                "type": "base64", "media_type": "image/png",
                "data": take_screenshot()
            }},
            {"type": "text", "text": task}
        ]
    }]
    
    tools = [{
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": 1920,
        "display_height_px": 1080,
        "display_number": 0
    }]
    
    for step in range(max_steps):
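        # Computer Use is a beta tool: call the beta endpoint with the matching flag
        response = client.beta.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=tools,
            betas=["computer-use-2025-01-24"],
            messages=messages
        )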
        
        if response.stop_reason == "end_turn":
            # Task completed
            for block in response.content:
                if block.type == "text":
                    print(f"✅ Done: {block.text}")
            return
        
        # Execute actions
        messages.append({"role": "assistant", "content": response.content})
        
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                execute_action(block.input)
                
                # New screenshot after the action
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": [{
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": take_screenshot()
                        }
                    }]
                })
        
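        if not tool_results:
            # Stopped without requesting an action (e.g. max_tokens): nothing to send back
            print(f"⚠️ Stopped without an action: {response.stop_reason}")
            return

        messages.append({"role": "user", "content": tool_results})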
    
    print("⚠️ Maximum number of steps reached")

# Usage
computer_use_loop("Open Firefox, go to wikipedia.org and search for 'artificial intelligence'")
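Because the loop resends the full `messages` list on every step, each earlier screenshot is billed again as input. One common mitigation, sketched here as an assumption rather than something the loop above does, is to replace all but the most recent screenshots with a short text placeholder before each request. `trim_old_screenshots` and `PLACEHOLDER` are hypothetical names; the function only touches user messages made of plain dicts and leaves assistant content untouched.

```python
from typing import Any

PLACEHOLDER = {"type": "text", "text": "[older screenshot omitted]"}


def trim_old_screenshots(
    messages: list[dict[str, Any]], keep_last: int = 3
) -> list[dict[str, Any]]:
    """Return a copy of messages keeping only the newest keep_last images."""
    remaining = keep_last

    def keep_or_replace(block: Any) -> Any:
        nonlocal remaining
        if isinstance(block, dict) and block.get("type") == "image":
            if remaining > 0:
                remaining -= 1
                return block
            return dict(PLACEHOLDER)
        # Screenshots sent back after an action live inside tool_result blocks
        if (
            isinstance(block, dict)
            and block.get("type") == "tool_result"
            and isinstance(block.get("content"), list)
        ):
            inner = [keep_or_replace(item) for item in reversed(block["content"])]
            return {**block, "content": inner[::-1]}
        return block

    trimmed = []
    # Walk from newest to oldest so the most recent images are the ones kept
    for message in reversed(messages):
        content = message.get("content")
        if message.get("role") == "user" and isinstance(content, list):
            blocks = [keep_or_replace(b) for b in reversed(content)]
            message = {**message, "content": blocks[::-1]}
        trimmed.append(message)
    return trimmed[::-1]
```

In the loop, this would mean passing `messages=trim_old_screenshots(messages)` to the request while still appending to the untrimmed `messages` list. The trade-off is that Claude loses visual access to older states of the screen, so `keep_last` should stay large enough for the task at hand.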

Use Cases

1. UI Testing

Automate UI tests without writing fragile selectors.

computer_use_loop("""
Test the registration form:
1. Fill the "Name" field with "John Smith"
2. Fill the "Email" field with "john@test.com"  
3. Fill the "Password" field with "SecurePass123!"
4. Check the "I accept the terms" checkbox
5. Click "Sign Up"
6. Verify that a success message appears
""")

2. Data Entry in Legacy Systems

| Scenario | Traditional approach | With Computer Use |
| --- | --- | --- |
| ERP without API | Custom connector development (weeks) | Script in a few hours |
| Legacy Windows app | RPA with fragile rules | Adaptive vision |
| Complex web forms | Selenium with selectors that break | Claude adapts to UI changes |

3. Workflow Automation

computer_use_loop("""
Monthly reporting workflow:
1. Open the accounting application
2. Export last month's sales report as CSV
3. Open Excel and import the CSV
4. Create a pivot table by region
5. Save the file to the Desktop
""")

Security: Critical Points

Identified Risks

| Risk | Level | Mitigation |
| --- | --- | --- |
| Unintended actions | ⚠️ High | Sandbox environment, supervision |
| Injection via screen content | ⚠️ Medium | Don't navigate untrusted websites |
| Access to sensitive data | ⚠️ High | Limit the user account's permissions |
| Infinite action loop | ⚠️ Medium | Limit max_steps, global timeout |
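The mitigation for the infinite-loop risk (a step limit plus a global timeout) can be sketched as a small wrapper. This is an illustration under stated assumptions: `run_with_budget` and `step_fn` are hypothetical names, where `step_fn` stands for one iteration of your agent loop and returns True once the task is finished. The `clock` parameter exists only so the behavior can be tested without waiting.

```python
import time
from typing import Callable


def run_with_budget(
    step_fn: Callable[[], bool],
    max_steps: int,
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call step_fn until it reports completion or a budget runs out.

    Returns "done", "timeout", or "max_steps".
    """
    deadline = clock() + timeout_s
    for _ in range(max_steps):
        # Check the wall-clock budget before starting another (slow) step
        if clock() >= deadline:
            return "timeout"
        if step_fn():
            return "done"
    return "max_steps"


# Stand-in step that finishes on its third call
calls = {"n": 0}


def fake_step() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(run_with_budget(fake_step, max_steps=20, timeout_s=300.0))
```

Note that the deadline is only checked between steps, so a single step that hangs is not interrupted. An external kill switch, such as stopping the container, is still needed for that case.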

Security Checklist

  1. Isolated environment — Dedicated VM or Docker container
  2. Non-privileged account — The user must not be admin
  3. Limited network — Restrict network access to the strictly necessary
  4. Human supervision — Always have a way to interrupt the session
  5. Limited duration — Set a maximum timeout for each task
  6. Complete logs — Record all screenshots and actions
  7. No credentials — Never ask Claude to enter real passwords
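
Example Dockerfile for the isolated environment:

FROM ubuntu:22.04

# Avoid interactive prompts (e.g. tzdata) during the image build
ENV DEBIAN_FRONTEND=noninteractive

# Minimal graphical environment
# Note: on Ubuntu 22.04 the firefox package is a snap wrapper that may not run
# inside a container; swap in another browser package if needed
RUN apt-get update && apt-get install -y \
    xvfb x11vnc fluxbox \
    firefox scrot xdotool \
    python3 python3-pip

# Install the Anthropic SDK
RUN pip3 install anthropic

# Agent code and startup script
COPY computer_use_agent.py /app/computer_use_agent.py
COPY start.sh /start.sh
RUN chmod +x /start.sh

EXPOSE 5900
CMD ["/start.sh"]

Startup script (start.sh):

#!/bin/bash
# Virtual display at the same resolution the agent declares
Xvfb :99 -screen 0 1920x1080x24 &
export DISPLAY=:99
sleep 1  # Give Xvfb a moment to start before clients connect
fluxbox &
# VNC server for human supervision; -nopw disables the password,
# so expose port 5900 only on a trusted network
x11vnc -display :99 -forever -nopw &
python3 /app/computer_use_agent.py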
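Item 6 of the checklist (complete logs) can be sketched as a helper that writes every screenshot to disk and appends each action to a JSON Lines file. This is a minimal sketch: `log_step`, the file names, and the record fields are assumptions made for illustration, not a format defined by Anthropic.

```python
import base64
import json
import time
from pathlib import Path
from typing import Any


def log_step(
    log_dir: str, step: int, action: dict[str, Any], screenshot_b64: str
) -> dict[str, Any]:
    """Save one screenshot as a PNG and append the action to actions.jsonl."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)

    # Zero-padded names keep the screenshots sorted in step order
    shot_path = directory / f"step_{step:04d}.png"
    shot_path.write_bytes(base64.b64decode(screenshot_b64))

    record = {
        "step": step,
        "timestamp": time.time(),
        "action": action,
        "screenshot": shot_path.name,
    }
    # One JSON object per line, appended so earlier steps are never rewritten
    with (directory / "actions.jsonl").open("a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
    return record
```

In an agent loop, a call such as `log_step("agent_logs", step, block.input, screenshot)` after each executed action would leave an audit trail that can be replayed step by step. The directory name here is only an example, and it should live on a volume that survives the container.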

Current Limitations

| Limitation | Impact | Workaround |
| --- | --- | --- |
| Latency (~2-5 s per action) | Long workflows are slow | Group instructions |
| Fixed resolution | Must match the actual screen | Set display_width_px and display_height_px to the real screen size |
| No complex drag-and-drop | Some interactions are impossible | Combine with keyboard shortcuts |
| Coordinate errors | Clicks are sometimes imprecise | Request a verification screenshot |
| High cost | Each screenshot costs ~2,500 tokens | Limit the number of steps |
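To see why the number of steps matters so much for cost, consider a rough estimate. It assumes the figure of about 2,500 tokens per screenshot quoted above, one screenshot per step, and a conversation history that is resent in full on every request with no trimming or prompt caching. Under those assumptions request k carries k screenshots, so the total grows quadratically. The function name is hypothetical, and text tokens are ignored.

```python
def estimate_screenshot_input_tokens(
    steps: int, tokens_per_screenshot: int = 2500
) -> int:
    """Estimate input tokens spent on screenshots over a whole run.

    Request k (1-based) resends k screenshots, so the total is
    tokens_per_screenshot * (1 + 2 + ... + steps).
    """
    return tokens_per_screenshot * steps * (steps + 1) // 2


for steps in (5, 10, 20):
    print(steps, estimate_screenshot_input_tokens(steps))
```

With these assumptions, a 20-step run spends about 525,000 input tokens on screenshots alone, compared with 37,500 for a 5-step run, which is why capping steps has a disproportionate effect on cost.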

Comparison with Alternatives

| Aspect | Computer Use (Claude) | Selenium/Playwright | RPA (UiPath, etc.) |
| --- | --- | --- | --- |
| Setup | Minimal (just the API) | Drivers, selectors | Heavy installation |
| Fragility | Low (adaptive vision) | High (selectors break) | Medium |
| Speed | Slow (~2-5 s/action) | Fast | Fast |
| Flexibility | Very high | Medium (web only) | High |
| Cost per execution | High (API tokens) | Near zero | Software license |
| Supported apps | All (via screen) | Web only | Desktop + Web |

FAQ

What is Claude Computer Use?

Computer Use is a feature that allows Claude to interact with a computer via screenshots. Claude sees the screen, decides what actions to perform (click, type, scroll), and receives new screenshots to continue.

How does Computer Use work technically?

It's a loop: your code takes a screenshot, sends it to Claude, Claude returns an action (click at x,y or type text), your code executes the action, takes a new screenshot, and the cycle repeats.

Is Computer Use safe to use?

Computer Use should be used with caution. Anthropic recommends running it in an isolated environment (VM, container), never using it with privileged admin accounts, and supervising sessions.

What are the main use cases for Computer Use?

Automated UI testing, data entry in legacy systems, product demos, workflow automation on applications without APIs, and complex data scraping.

How do I get Claude to control my PC?

Claude Computer Use works via the API. You need to set up a Docker container or VM with Anthropic's tools installed. Claude then takes screenshots, analyzes them, and sends commands (click, keyboard, scroll). A quickstart guide is available in Anthropic's official documentation.

Is Claude easy to use for controlling a computer?

Using Computer Use requires technical knowledge (API, Docker, Python). It is not a one-click feature in the interface: you need to configure a dedicated environment. However, with Anthropic's Python SDK, a basic request fits in about 20 lines of code, and many tutorials are available.