Claude Vision: Analyzing Images, Charts & Visual Documents

By Learnia Team

📅 Last updated: March 10, 2026 — Covers the Vision API, OCR, and document analysis.

🔗 Pillar article: Claude API: Complete Guide


What is Claude Vision?

Claude Vision is Claude's multimodal capability that allows it to understand and analyze images. Beyond simple object recognition, Claude can:

  • Read text (OCR) in documents, screenshots, and photos
  • Interpret charts and extract data
  • Analyze technical diagrams (UML, architecture, workflows)
  • Describe images in detail with context
  • Compare multiple images in a single request

Sending an Image via the API

Method 1: Base64

The base64 method embeds the encoded image directly in the request body, which makes it the natural choice for local files.

import anthropic
import base64

client = anthropic.Anthropic()

# Encode the image in base64
with open("sales-chart.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_data
                }
            },
            {
                "type": "text",
                "text": "Analyze this sales chart. What trends do you observe?"
            }
        ]
    }]
)

print(response.content[0].text)
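
The example above hard-codes `image/png`. As a minimal sketch, the helper below infers the media type from the file extension and returns a ready-to-use content block. The names `build_image_block` and `MEDIA_TYPES` are hypothetical (they are not part of the SDK), and the extension map assumes only the four formats this article lists as supported.

```python
import base64
from pathlib import Path

# Assumption: extension-to-media-type map for the formats listed in this article.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def build_image_block(path):
    """Return a base64 image content block for the file at `path`."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        # Fail before reading the file, e.g. for SVG or TIFF inputs.
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    with open(path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": data,
        },
    }
```

The returned dict can be placed in the `content` list in place of the hand-written image block.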

Method 2: URL

The URL method passes a link to a publicly accessible image instead of the image bytes, so the request stays small.

# Reuses the `client` object created in Method 1
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "url",
                    "url": "https://example.com/dashboard-screenshot.png"
                }
            },
            {
                "type": "text",
                "text": "Describe this dashboard and identify the visible KPIs."
            }
        ]
    }]
)

Method Comparison

| Aspect | Base64 | URL |
|---|---|---|
| Local images | ✅ | ❌ |
| Online images | ⚠️ (download first) | ✅ |
| Request size | Larger | Smaller |
| Reliability | Always works | Depends on URL access |
| Recommended for | Applications, scripts | Quick prototyping |
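
If a script has to accept both kinds of input, the choice between the two methods can be made automatically. The sketch below is one possible approach: `make_image_source` is a hypothetical helper, and it assumes that anything with an `http` or `https` scheme should be sent as a URL while everything else is a local path.

```python
import base64
from urllib.parse import urlparse

def make_image_source(ref, media_type="image/png"):
    """Build the `source` dict for an image given a URL or a local file path."""
    if urlparse(ref).scheme in ("http", "https"):
        # Remote image: pass the link through unchanged.
        return {"type": "url", "url": ref}
    # Local image: read the bytes and embed them as base64.
    with open(ref, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("utf-8")
    return {"type": "base64", "media_type": media_type, "data": data}
```

The result goes in the `"source"` field of an image block, for example `{"type": "image", "source": make_image_source("sales-chart.png")}`.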

Formats and Limits

| Format | Supported | Notes |
|---|---|---|
| JPEG | ✅ | Most common format, good compression |
| PNG | ✅ | Ideal for screenshots and diagrams |
| GIF | ✅ | First frame only (no animation) |
| WebP | ✅ | Good size/quality compromise |
| SVG | ❌ | Convert to PNG first |
| PDF | ⚠️ | Supported through the document feature, not as an image |
| TIFF | ❌ | Convert to JPEG or PNG |

Technical limits:

  • Max size: 5 MB per image through the API (Anthropic's documented limit at the time of writing)
  • Images per request: Up to 100
  • Resolution: Automatically resized if too large
  • Tokens: Calculated based on image resolution
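
A pre-flight check can catch oversized files or overlong batches before a request is sent. In the sketch below, `check_images` is a hypothetical helper, and both limits are passed in as arguments rather than hard-coded, on the assumption that you will take the current values from Anthropic's documentation.

```python
import os

def check_images(paths, max_bytes, max_count):
    """Return a list of problems found; an empty list means the batch passes."""
    problems = []
    if len(paths) > max_count:
        problems.append(f"{len(paths)} images exceeds the limit of {max_count}")
    for path in paths:
        size = os.path.getsize(path)
        if size > max_bytes:
            problems.append(f"{path}: {size} bytes exceeds the limit of {max_bytes}")
    return problems
```

A caller would typically refuse to send the request, or resize the offending files, whenever the returned list is not empty.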

Token Calculation per Image

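Images consume tokens roughly in proportion to their pixel area. Anthropic's documentation gives the approximation tokens ≈ (width × height) / 750 and notes that images whose long edge exceeds about 1568 px (roughly 1,600 tokens) are downscaled before processing, so the cost per image levels off:

| Resolution | Approximate tokens |
|---|---|
| 200×200 | ~54 tokens |
| 500×500 | ~330 tokens |
| 1000×1000 | ~1,330 tokens |
| 1920×1080 | ~1,600 tokens (downscaled first) |
| 4000×3000 | ~1,600 tokens (downscaled first) |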
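
For budgeting, a rough estimator can be written in a few lines. This sketch assumes the approximation published in Anthropic's vision documentation, tokens ≈ (width × height) / 750, together with downscaling to a 1568 px long edge and a ceiling of about 1,600 tokens per image. `estimate_image_tokens` and its default parameters are illustrative, and actual billing may differ by model.

```python
def estimate_image_tokens(width, height, max_long_edge=1568, token_cap=1600):
    """Roughly estimate the tokens one image consumes (assumed formula, see lead-in)."""
    long_edge = max(width, height)
    if long_edge > max_long_edge:
        # Mimic the automatic downscaling while preserving the aspect ratio.
        scale = max_long_edge / long_edge
        width, height = round(width * scale), round(height * scale)
    # Assumed approximation: one token per 750 pixels, capped per image.
    return min(round(width * height / 750), token_cap)

print(estimate_image_tokens(1000, 1000))  # 1333
```

Summing the estimate over every image in a request gives a quick upper bound on the input tokens spent on vision.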

Multi-Image: Comparison and Analysis

Claude can analyze several images in a single request, which makes side-by-side comparisons and batch reviews possible.

import base64

def load_image(path):
    """Read an image file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": load_image("design-v1.png")}
            },
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": load_image("design-v2.png")}
            },
            {
                "type": "text",
                "text": """Compare these two design versions:
                1. What are the major differences?
                2. Which version better follows UX principles?
                3. Improvement suggestions for the chosen version."""
            }
        ]
    }]
)
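
Once a request carries more than a couple of images, it helps to label each one so that both the prompt and the answer can refer to "Image 1", "Image 2", and so on. The sketch below shows one way to build such a `content` list; `build_labeled_content` is a hypothetical helper, and it assumes the images have already been base64-encoded.

```python
def build_labeled_content(images, prompt):
    """Interleave "Image N:" labels with image blocks, then append the prompt.

    `images` is a list of (base64_data, media_type) tuples.
    """
    content = []
    for index, (data, media_type) in enumerate(images, start=1):
        content.append({"type": "text", "text": f"Image {index}:"})
        content.append({
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data},
        })
    # The question goes last, after every image has been introduced.
    content.append({"type": "text", "text": prompt})
    return content
```

The returned list can be passed as the `"content"` value of the user message, and the prompt can then say things like "Does Image 2 fix the spacing issue in Image 1?".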

Multi-Image Use Cases

| Use case | Number of images | Description |
|---|---|---|
| Before/After | 2 | Compare a design before and after modification |
| A/B Testing | 2-4 | Evaluate mockup variants |
| UI Audit | 5-10 | Check visual consistency across a site |
| Classification | 10-50 | Categorize a batch of product photos |
| Documentation | 3-10 | Extract content from multiple scanned pages |

OCR and Text Extraction

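Claude is strong at text recognition (OCR): beyond reading characters, it understands a document's layout and structure, so it can return specific fields rather than a raw text dump.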

# OCR of a scanned document
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/jpeg",
                           "data": load_image("invoice-scan.jpg")}
            },
            {
                "type": "text",
                "text": """Extract the following information from this invoice:
                - Invoice number
                - Date
                - Vendor name
                - Subtotal
                - Tax
                - Total amount
                - List of items with unit price
                
                Return the result as structured JSON."""
            }
        ]
    }]
)
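
The prompt above asks for JSON, but the reply still arrives as plain text and may be wrapped in a Markdown code fence or a sentence of explanation. The sketch below is one tolerant way to parse it; `parse_json_reply` is a hypothetical helper, and it assumes the reply contains a single JSON object.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence marker, built here to avoid nesting fences

def parse_json_reply(text):
    """Parse a JSON object from a model reply, tolerating fences and extra prose."""
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        candidate = fenced.group(1)
    else:
        # Fall back to the outermost pair of braces in the reply.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end < start:
            raise ValueError("No JSON object found in reply")
        candidate = text[start:end + 1]
    return json.loads(candidate)
```

With the request above, this would be called as `invoice = parse_json_reply(response.content[0].text)`. Validating the parsed fields (for example, that the total is numeric) is still worth doing before the data is used downstream.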

OCR Capabilities

| Document type | Quality | Notes |
|---|---|---|
| Printed text | ⭐⭐⭐⭐⭐ | Excellent accuracy |
| Screenshots | ⭐⭐⭐⭐⭐ | Reads text and understands the interface |
| Legible handwriting | ⭐⭐⭐⭐ | Good quality, depends on handwriting |
| Illegible handwriting | ⭐⭐ | Variable results |
| Scanned docs (good quality) | ⭐⭐⭐⭐⭐ | Understands layout |
| Scanned docs (poor quality) | ⭐⭐⭐ | May miss details |

Chart and Diagram Analysis

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": load_image("financial-chart.png")}
            },
            {
                "type": "text",
                "text": """Analyze this financial chart:
                1. What are the axes and their units?
                2. What is the overall trend?
                3. Identify inflection points or anomalies.
                4. Extract approximate values of key data points.
                5. Summarize in 3 insights for a CFO."""
            }
        ]
    }]
)
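
The requests in this article all share the same shape: one local image, one text prompt, one text reply. A small wrapper can remove the repetition. In the sketch below, `analyze_image` is a hypothetical helper; its defaults simply mirror the values used in the examples above, and `client` is expected to be an `anthropic.Anthropic()` instance.

```python
import base64

def analyze_image(client, path, prompt, media_type="image/png",
                  model="claude-sonnet-4-20250514", max_tokens=1024):
    """Send one local image plus a text prompt and return the reply text."""
    with open(path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("utf-8")
    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": media_type, "data": data}},
                {"type": "text", "text": prompt},
            ],
        }],
    )
    return response.content[0].text
```

A chart analysis then becomes a single call, such as `analyze_image(client, "financial-chart.png", "What is the overall trend?", max_tokens=2048)`.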

Types of Analyzable Visuals

| Type | Capability | Extraction example |
|---|---|---|
| Bar chart | ✅ Values and labels | "Q1: 120K, Q2: 145K, Q3: 98K" |
| Line chart | ✅ Trends and points | "15% growth between March and June" |
| Pie chart | ✅ Proportions | "Marketing: 35%, R&D: 28%, Sales: 22%" |
| Table in image | ✅ Structured extraction | Reconstructs the table in Markdown |
| UML diagram | ✅ Relations and entities | "3 classes with inheritance and 2 interfaces" |
| Org chart | ✅ Hierarchy | Organization structure |
| System architecture | ✅ Components and flows | "Microservices with API Gateway and Redis" |

Best Practices

Optimizing Analysis Quality

  1. Use high-quality images — Sufficient resolution so text is readable
  2. Frame the subject well — Avoid overly wide images with important content in small print
  3. Prefer PNG for screenshots — No lossy compression
  4. Orient the image correctly — Claude handles rotation, but correct orientation is better

Optimizing Costs

  1. Resize before sending — A 1000×1000 image is sufficient for most analyses
  2. Compress JPEGs — 80% quality is sufficient for OCR
  3. Limit the number of images — Only send necessary images
  4. Use appropriate resolutions per use case:
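| Use case | Recommended resolution | Approx. tokens |
|---|---|---|
| Text OCR | 1000-1500px wide | ~1,500 |
| Simple chart | 800-1200px | ~1,200 |
| Full UI | 1920×1080 | ~1,600 (downscaled automatically) |
| Document photo | 1500-2000px | ~1,600 (downscaled automatically) |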
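
The first two tips, resizing and compressing, can be combined into one preprocessing step. The sketch below is a hedged example rather than a recommended pipeline: `fit_within` and `resize_for_upload` are hypothetical helpers, the defaults of 1000 px and 80% quality come from the tips above, and the second function assumes the third-party Pillow library is installed.

```python
def fit_within(width, height, max_edge=1000):
    """Return (width, height) scaled down so the longer edge is at most `max_edge`."""
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height  # already small enough, never upscale
    scale = max_edge / long_edge
    return max(1, round(width * scale)), max(1, round(height * scale))

def resize_for_upload(src_path, dst_path, max_edge=1000, jpeg_quality=80):
    """Downscale an image and save it as JPEG (requires `pip install pillow`)."""
    from PIL import Image  # imported lazily so fit_within works without Pillow

    with Image.open(src_path) as img:
        # JPEG has no alpha channel, so convert to RGB before saving.
        resized = img.convert("RGB").resize(fit_within(*img.size, max_edge))
        resized.save(dst_path, format="JPEG", quality=jpeg_quality)

print(fit_within(4000, 3000))  # (1000, 750)
```

For screenshots with small text, saving as PNG instead of JPEG keeps the lossless quality recommended earlier, at the price of a larger file.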

Writing Effective Visual Prompts

| ❌ Vague prompt | ✅ Precise prompt |
|---|---|
| "What do you see?" | "List all visible UI elements with their text and position." |
| "Analyze this image" | "Extract the 5 KPIs displayed in this dashboard and their values." |
| "Read this document" | "Extract the name, date, and total amount from this invoice as JSON." |

FAQ

What image formats does Claude support?

Claude supports JPEG, PNG, GIF (first frame only), and WebP. Maximum size is 5 MB per image through the API, and resolution is automatically adjusted if it exceeds model limits.

How do I send an image to Claude via the API?

Two methods: base64 (encode the image and include it directly in the request) or URL (provide a public link to the image). The base64 method is more reliable for local images.

Can Claude read text in images (OCR)?

Yes, Claude excels at OCR. It can read printed text, scanned documents, screenshots, and even handwritten text with good accuracy. It also understands layout and structure.

How many images can be sent in a single request?

You can send up to 100 images in a single request. Each image consumes tokens proportional to its resolution. Be mindful of costs on multi-image requests.

Can Claude analyze charts and diagrams?

Yes, Claude can read charts (bar, line, pie), UML diagrams, flowcharts, and schematics. It identifies trends, extracts approximate values, and describes visual relationships.