March 10, 2026 · 9 min read

Claude Vision: Analyzing Images, Charts & Visual Documents

By Dorian Laurenceau

📅 Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.

🔗 Pillar article: Claude API: Complete Guide

What is Claude Vision?

Claude Vision is Claude's multimodal capability that allows it to understand and analyze images. Beyond simple object recognition, Claude can:

→Read text (OCR) in documents, screenshots, and photos
→Interpret charts and extract data
→Analyze technical diagrams (UML, architecture, workflows)
→Describe images in detail with context
→Compare multiple images in a single request

The honest read on Claude Vision in 2026, tracked across r/ClaudeAI, r/LocalLLaMA, and r/computervision: vision models are very good at the tasks they are benchmarked on (describing, captioning, chart-reading on common formats) and much less reliable at the tasks people actually want them for in production (extracting data from a specific invoice template, reading handwritten clinical notes, parsing a screenshot of a legacy app's UI). The Anthropic vision docs and the MMMU benchmark leaderboard both reflect this gap: top-line scores are high, and the failures hide in real-world edge cases.

Where the community correctly pushes back on vision-model hype: OCR is still a dedicated-tooling problem for anything that matters. Tesseract, Textract, Google Document AI, and the Mistral OCR API are widely reported to beat general-purpose vision models on structured document extraction, and they typically give you confidence scores, which Claude Vision does not. For charts, Nougat and purpose-built parsers tend to beat general vision models on anything with dense numeric content.

Pragmatic rule from teams who've deployed vision pipelines without hallucination incidents: use Claude Vision for the semantic layer ("what is this document about", "what chart type is this", "is this a legitimate ID") and route the actual numeric extraction to a dedicated OCR tool or parser. The combination is more reliable than either tool alone, and the cost math usually works out.
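The routing rule above can be sketched as a small dispatcher. This is only an illustration: `route_document`, the `NUMERIC_HEAVY` label set, and the `classify` and `extractors` callables are hypothetical names, and in a real pipeline `classify` would wrap a Claude Vision call while the `dedicated_ocr` extractor would wrap whichever OCR tool you use.

```python
from typing import Callable, Dict

# Hypothetical labels for documents where exact numbers matter.
NUMERIC_HEAVY = {"invoice", "receipt", "bank_statement"}

def route_document(
    image_path: str,
    classify: Callable[[str], str],
    extractors: Dict[str, Callable[[str], dict]],
) -> dict:
    """Classify the document semantically, then pick the extraction tool."""
    # Semantic layer: e.g. a vision-model call that returns a short label.
    doc_type = classify(image_path)
    # Numeric-heavy documents go to the dedicated OCR/parser.
    tool = "dedicated_ocr" if doc_type in NUMERIC_HEAVY else "vision_model"
    fields = extractors[tool](image_path)
    return {"doc_type": doc_type, "tool": tool, "fields": fields}
```

Keeping the classifier and extractors as plain callables makes it easy to swap tools or stub them out in tests.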

Sending an Image via the API

Method 1: Base64

The base64 method encodes the image directly in the request. Ideal for local images.

import anthropic
import base64

client = anthropic.Anthropic()

# Encode the image in base64
with open("sales-chart.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_data
                }
            },
            {
                "type": "text",
                "text": "Analyze this sales chart. What trends do you observe?"
            }
        ]
    }]
)

print(response.content[0].text)
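The example above hardcodes `image/png`. When file types vary, the media type has to match the actual file, so a small helper can derive it from the extension. The sketch below is one way to do that; `MEDIA_TYPES`, `media_type_for`, and `image_block` are illustrative names, not part of the SDK, and the extension check is a shortcut that does not inspect file contents.

```python
import base64
from pathlib import Path

# Extension-to-media-type map for the formats this article lists as supported.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def media_type_for(path: str) -> str:
    """Return the media type implied by the file extension, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image extension: {suffix or '(none)'}")
    return MEDIA_TYPES[suffix]

def image_block(path: str) -> dict:
    """Build a base64 image content block from a local file."""
    data = base64.standard_b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type_for(path),
            "data": data,
        },
    }
```

The returned dictionary can be placed directly in the `content` list in place of the hand-written image block.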

Method 2: URL

The URL method points to a publicly accessible image.

# Reuses the `client` created in Method 1
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "url",
                    "url": "https://example.com/dashboard-screenshot.png"
                }
            },
            {
                "type": "text",
                "text": "Describe this dashboard and identify the visible KPIs."
            }
        ]
    }]
)

Method Comparison

Aspect	Base64	URL
Local images	✅	❌
Online images	⚠️ (download first)	✅
Request size	Larger	Smaller
Reliability	No external fetch needed	Depends on URL access
Recommended for	Applications, scripts	Quick prototyping
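If an application accepts both local paths and links, the choice between the two methods can be made automatically. A minimal sketch, assuming a hypothetical helper named `image_source` and a caller-supplied `media_type` for local files:

```python
import base64
from urllib.parse import urlparse

def image_source(ref: str, media_type: str = "image/png") -> dict:
    """Return a `source` dict: URL-based for http(s) links, base64 otherwise."""
    if urlparse(ref).scheme in ("http", "https"):
        # Remote image: let the API fetch it from the public URL.
        return {"type": "url", "url": ref}
    # Local image: read the file and embed it in the request.
    with open(ref, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("utf-8")
    return {"type": "base64", "media_type": media_type, "data": data}
```

The result goes under the `"source"` key of an image block, for example `{"type": "image", "source": image_source("sales-chart.png")}`.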

Formats and Limits

Format	Supported	Notes
JPEG	✅	Most common format, good compression
PNG	✅	Ideal for screenshots and diagrams
GIF	✅	First frame only (no animation)
WebP	✅	Good size/quality compromise
SVG	❌	Convert to PNG first
PDF	⚠️	Via document feature, not image
TIFF	❌	Convert to JPEG or PNG

Technical limits:

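→Max size: 5 MB per image via the API
→Images per request: Up to 100
→Resolution: Images whose long edge exceeds 1568 px are downscaled automatically, preserving aspect ratio (for the model used in these examples; check the vision docs for newer models)
→Tokens: Roughly (width × height) / 750, computed on the resized image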
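Checking these limits before sending avoids a failed request. The sketch below is a hypothetical pre-flight check: `preflight` and `SUPPORTED_EXTENSIONS` are illustrative names, and the byte limit is passed in by the caller so it can follow whatever limit applies to your endpoint.

```python
import os

# Extensions for the formats this article lists as supported.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}

def preflight(paths, max_bytes, max_images=100):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(paths) > max_images:
        problems.append(f"Too many images: {len(paths)} > {max_images}")
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext not in SUPPORTED_EXTENSIONS:
            # Unsupported formats (SVG, TIFF, ...) need converting first.
            problems.append(f"{path}: unsupported format {ext or '(none)'}")
        elif os.path.getsize(path) > max_bytes:
            problems.append(f"{path}: larger than {max_bytes} bytes")
    return problems
```

Returning a list instead of raising on the first failure lets a batch job report every problem file in one pass.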

Token Calculation per Image

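Images consume tokens in proportion to their pixel area. Anthropic's vision docs give the estimate tokens ≈ (width × height) / 750, and images above a 1568 px long edge (or roughly 1,600 tokens) are downscaled before counting, at least for the model used in these examples:

Resolution	Approximate tokens
200×200	~54 tokens
500×500	~330 tokens
1000×1000	~1,330 tokens
1920×1080	~1,600 tokens (after downscaling)
4000×3000	~1,600 tokens (after downscaling)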
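A rough estimator makes it easier to budget before sending. The sketch below assumes the rule of thumb from Anthropic's vision docs, tokens ≈ (width × height) / 750, plus automatic downscaling above a 1568 px long edge and a cap of about 1,600 tokens per image. Treat all three constants as assumptions to verify against the current docs for your model; `estimate_image_tokens` is an illustrative name.

```python
import math

def estimate_image_tokens(width, height, max_long_edge=1568, max_tokens=1600):
    """Estimate token cost of one image under the assumed sizing rules."""
    # Step 1: shrink so the long edge fits, preserving aspect ratio.
    scale = min(1.0, max_long_edge / max(width, height))
    scaled_w, scaled_h = width * scale, height * scale
    # Step 2: apply the area-based estimate.
    tokens = math.ceil(scaled_w * scaled_h / 750)
    # Step 3: anything still above the cap is downscaled further by the API,
    # so the cost levels off at roughly max_tokens.
    return min(tokens, max_tokens)

print(estimate_image_tokens(1000, 1000))
```

Under these assumptions, sending images much larger than about 1.15 megapixels adds upload time without adding detail the model can use.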

Multi-Image: Comparison and Analysis

Claude can analyze multiple images in a single request, which makes side-by-side comparisons and batch reviews possible.

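import base64

def load_image(path):
    """Read a local image file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")

# Reuses the `client` created in Method 1
response = client.messages.create(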
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": load_image("design-v1.png")}
            },
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": load_image("design-v2.png")}
            },
            {
                "type": "text",
                "text": """Compare these two design versions:
                1. What are the major differences?
                2. Which version better follows UX principles?
                3. Improvement suggestions for the chosen version."""
            }
        ]
    }]
)

Multi-Image Use Cases

Use case	Number of images	Description
Before/After	2	Compare a design before and after modification
A/B Testing	2-4	Evaluate mockup variants
UI Audit	5-10	Check visual consistency across a site
Classification	10-50	Categorize a batch of product photos
Documentation	3-10	Extract content from multiple scanned pages
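With more than two or three images, it helps to label each one so the prompt and the answer can refer to "Image 1", "Image 2", and so on. One common pattern is to place a short text block before every image. A minimal sketch, where `build_labeled_content` is a hypothetical helper and the image blocks are whatever dictionaries you already build for the API:

```python
def build_labeled_content(image_blocks, question):
    """Interleave 'Image N:' labels with image blocks, then append the question."""
    content = []
    for index, block in enumerate(image_blocks, start=1):
        # The label gives the model (and the prompt) a stable name per image.
        content.append({"type": "text", "text": f"Image {index}:"})
        content.append(block)
    content.append({"type": "text", "text": question})
    return content
```

The returned list is passed as the `content` of a single user message, exactly like the hand-written list in the comparison example.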

OCR and Text Extraction

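Claude handles text recognition (OCR) well on clean input and goes beyond simple character reading: it also understands layout and context. For high-stakes extraction, the caveats about dedicated OCR tools above still apply.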

# OCR of a scanned document (reuses `client` and `load_image` from the earlier examples)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/jpeg",
                           "data": load_image("invoice-scan.jpg")}
            },
            {
                "type": "text",
                "text": """Extract the following information from this invoice:
                - Invoice number
                - Date
                - Vendor name
                - Subtotal
                - Tax
                - Total amount
                - List of items with unit price
                
                Return the result as structured JSON."""
            }
        ]
    }]
)
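Asking for JSON does not guarantee the reply is only JSON: the model may wrap it in a sentence or a code fence. Before the data goes anywhere downstream, it is worth parsing and checking it. The sketch below is one simple approach; `parse_json_reply` and the field names in the usage line are hypothetical.

```python
import json

def parse_json_reply(text, required=()):
    """Pull the outermost {...} object out of a model reply and parse it."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in reply")
    # json.loads raises a ValueError subclass if the span is not valid JSON.
    data = json.loads(text[start:end + 1])
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"Reply is missing fields: {missing}")
    return data
```

Typical use would be `parse_json_reply(response.content[0].text, required=("invoice_number", "total"))`, with a retry or a manual-review queue when it raises.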

OCR Capabilities

Document type	Quality	Notes
Printed text	⭐⭐⭐⭐⭐	Excellent accuracy
Screenshots	⭐⭐⭐⭐⭐	Reads text and understands the interface
Legible handwriting	⭐⭐⭐⭐	Good quality, depends on handwriting
Illegible handwriting	⭐⭐	Variable results
Scanned docs (good quality)	⭐⭐⭐⭐⭐	Understands layout
Scanned docs (poor quality)	⭐⭐⭐	May miss details

Chart and Diagram Analysis

# Chart analysis (reuses `client` and `load_image` from the earlier examples)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": load_image("financial-chart.png")}
            },
            {
                "type": "text",
                "text": """Analyze this financial chart:
                1. What are the axes and their units?
                2. What is the overall trend?
                3. Identify inflection points or anomalies.
                4. Extract approximate values of key data points.
                5. Summarize in 3 insights for a CFO."""
            }
        ]
    }]
)

Types of Analyzable Visuals

Type	Capability	Extraction example
Bar chart	✅ Values and labels	"Q1: 120K, Q2: 145K, Q3: 98K"
Line chart	✅ Trends and points	"15% growth between March and June"
Pie chart	✅ Proportions	"Marketing: 35%, R&D: 28%, Sales: 22%"
Table in image	✅ Structured extraction	Reconstructs the table in Markdown
UML diagram	✅ Relations and entities	"3 classes with inheritance and 2 interfaces"
Org chart	✅ Hierarchy	Organization structure
System architecture	✅ Components and flows	"Microservices with API Gateway and Redis"

What Works Best

Optimizing Analysis Quality

→Use high-quality images: sufficient resolution so text is readable
→Frame the subject well: avoid overly wide images with important content in small print
→Prefer PNG for screenshots: no lossy compression
→Orient the image correctly: Claude handles rotation, but correct orientation is better

Optimizing Costs

→Resize before sending: a 1000×1000 image is sufficient for most analyses
→Compress JPEGs: 80% quality is usually sufficient for OCR
→Limit the number of images: only send the images the task needs
→Use appropriate resolutions per use case:

Use case	Recommended resolution	Approx. tokens
Text OCR	1000-1500px wide	~1,500
Simple chart	800-1200px	~1,200
Full UI	1920×1080	~1,600 (after automatic downscaling)
Document photo	1500-2000px	~1,600 (after automatic downscaling)
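Resizing to one of these target widths only needs the new dimensions, which can be computed without any imaging library. A minimal sketch, where `fit_width` is an illustrative name; the commented lines show how it might be combined with Pillow, which is an assumed third-party dependency.

```python
def fit_width(width, height, target_width):
    """Return (w, h) scaled down to target_width, preserving aspect ratio."""
    if width <= target_width:
        # Never upscale: extra pixels cost tokens without adding detail.
        return width, height
    scale = target_width / width
    return target_width, max(1, round(height * scale))

# Possible use with Pillow (assumed installed, not required by this function):
#   from PIL import Image
#   img = Image.open("invoice-scan.jpg")
#   img.resize(fit_width(*img.size, 1500)).save("invoice-scan-small.jpg")
```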

Writing Effective Visual Prompts

❌ Vague prompt	✅ Precise prompt
"What do you see?"	"List all visible UI elements with their text and position."
"Analyze this image"	"Extract the 5 KPIs displayed in this dashboard and their values."
"Read this document"	"Extract the name, date, and total amount from this invoice as JSON."
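The precise prompts in the table share a shape: name the document, list the exact fields, and state the output format. That shape can be generated from a field list so every extraction call stays consistent. A sketch with a hypothetical `extraction_prompt` helper; the instruction to use null for unreadable fields is an added suggestion, not something the API requires.

```python
def extraction_prompt(document_kind, fields, output_format="JSON"):
    """Build a precise extraction prompt from a document type and field list."""
    field_list = "\n".join(f"- {name}" for name in fields)
    return (
        f"Extract the following fields from this {document_kind}:\n"
        f"{field_list}\n\n"
        f"Return the result as {output_format}. "
        "Use null for any field you cannot read."
    )

print(extraction_prompt("invoice", ["Name", "Date", "Total amount"]))
```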

Dorian Laurenceau

Full-Stack Developer & Learning Designer

Full-stack web developer and learning designer. I spent 4 years as a freelance full-stack developer and 4 years teaching React, JavaScript, HTML/CSS and WordPress to adult learners. Today I design learning paths in web development and AI, grounded in learning science. I founded learn-prompting.fr to make AI practical and accessible, and built the Bluff app to gamify political transparency.

Published: March 10, 2026 · Updated: April 24, 2026

FAQ

What image formats does Claude support?

Claude supports JPEG, PNG, GIF (first frame only), and WebP. Maximum size via the API is 5 MB per image, and resolution is automatically adjusted if it exceeds model limits.

How do I send an image to Claude via the API?

Two methods: base64 (encode the image and include it directly in the request) or URL (provide a public link to the image). The base64 method is more reliable for local images.

Can Claude read text in images (OCR)?

Yes, Claude excels at OCR. It can read printed text, scanned documents, screenshots, and even handwritten text with good accuracy. It also understands layout and structure.

How many images can be sent in a single request?

You can send up to 100 images in a single request. Each image consumes tokens proportional to its resolution. Be mindful of costs on multi-image requests.

Can Claude analyze charts and diagrams?

Yes, Claude can read charts (bar, line, pie), UML diagrams, flowcharts, and schematics. It identifies trends, extracts approximate values, and describes visual relationships.