Claude Vision: Analyzing Images, Charts & Visual Documents
By Dorian Laurenceau
Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.
Pillar article: Claude API: Complete Guide
What is Claude Vision?
Claude Vision is Claude's multimodal capability that allows it to understand and analyze images. Beyond simple object recognition, Claude can:
- Read text (OCR) in documents, screenshots, and photos
- Interpret charts and extract data
- Analyze technical diagrams (UML, architecture, workflows)
- Describe images in detail with context
- Compare multiple images in a single request
The honest read on Claude Vision in 2026, tracked across r/ClaudeAI, r/LocalLLaMA, and r/computervision: vision models are very good at the tasks they are benchmarked on (describing, captioning, reading charts in common formats) and inconsistent at the tasks people actually want them for in production (extracting data from a specific invoice template, reading handwritten clinical notes, parsing a screenshot of a legacy app's UI). The Anthropic vision docs and the MMMU benchmark leaderboard both reflect this gap: top-line scores are high, and the failures hide in real-world edge cases.
Where the community correctly pushes back on vision-model hype: OCR is still a job for dedicated tooling whenever accuracy matters. Tesseract, Amazon Textract, Google Document AI, and the Mistral OCR API typically beat general-purpose vision models on structured document extraction, and they return confidence scores, which Claude Vision does not. For dense numeric content such as charts and tables, purpose-built parsers (and document-conversion models such as Nougat for scientific PDFs) tend to outperform general vision models.
Pragmatic rule from teams who have deployed vision pipelines without hallucination incidents: use Claude Vision for the semantic layer ("what is this document about?", "what chart type is this?", "is this a legitimate ID?") and route the actual numeric extraction to a dedicated OCR tool or parser. The combination is more reliable than either tool alone, and the cost math usually works out.
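One way to combine the two layers is to extract the same numeric fields twice, once from Claude's JSON answer and once from your OCR tool, and auto-accept only the fields where both agree. A minimal sketch, assuming both sources have already been reduced to dictionaries of strings keyed by field name; `parse_amount` and `reconcile` are hypothetical helpers, not part of any SDK:

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Parse strings like '$1,234.50' into a Decimal; return None if unparseable."""
    # Assumes '.' is the decimal separator and ',' a thousands separator
    cleaned = str(text).replace(",", "").replace("$", "").strip()
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def reconcile(claude_fields, ocr_fields, tolerance=Decimal("0.01")):
    """Return the field names where Claude and the OCR tool disagree or a value is missing.

    Both arguments are hypothetical dicts of {field_name: raw_string}.
    Flagged fields should go to human review instead of being auto-accepted.
    """
    flagged = []
    for name, ocr_value in ocr_fields.items():
        a = parse_amount(claude_fields.get(name, ""))
        b = parse_amount(ocr_value)
        if a is None or b is None or abs(a - b) > tolerance:
            flagged.append(name)
    return flagged
```

For example, `reconcile({"total": "$1,234.50", "tax": "20.00"}, {"total": "1234.50", "tax": "21.00"})` returns `["tax"]`: the totals match once formatting is stripped, the tax amounts do not.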
Sending an Image via the API
Method 1: Base64
The base64 method encodes the image directly in the request. Ideal for local images.
```python
import anthropic
import base64

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Encode the image in base64
with open("sales-chart.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",  # must match the actual file format
                    "data": image_data,
                },
            },
            {
                "type": "text",
                "text": "Analyze this sales chart. What trends do you observe?",
            },
        ],
    }],
)
print(response.content[0].text)
```
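The example above hardcodes `media_type`, and a mismatch between the declared type and the actual file can get the request rejected. A small sketch of a helper that infers the media type from the file extension, assuming extensions are trustworthy (a stricter version would inspect the file's magic bytes); `image_block` and `MEDIA_TYPES` are illustrative names, not SDK features:

```python
import base64
from pathlib import Path

# Extension -> media type for the formats accepted as image blocks
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path):
    """Build a base64 image content block for the Messages API from a local file."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        # Fail before reading the file: SVG, TIFF, etc. must be converted first
        raise ValueError(f"Unsupported image format: {suffix or '(none)'}; convert to PNG or JPEG")
    data = base64.standard_b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": MEDIA_TYPES[suffix], "data": data},
    }
```

`image_block("sales-chart.png")` can then replace the hand-written image dictionary in the `content` list.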
Method 2: URL
The URL method points to a publicly accessible image.
```python
# Reuses the `client` created in the previous example
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "url",
                    "url": "https://example.com/dashboard-screenshot.png",
                },
            },
            {
                "type": "text",
                "text": "Describe this dashboard and identify the visible KPIs.",
            },
        ],
    }],
)
print(response.content[0].text)
```
Method Comparison
| Aspect | Base64 | URL |
|---|---|---|
| Local images | ✅ | ❌ |
| Online images | ⚠️ (download first) | ✅ |
| Request size | Larger | Smaller |
| Reliability | No dependency on external hosting | Depends on the URL being publicly reachable |
| Recommended for | Applications, scripts | Quick prototyping |
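When a URL may not be reachable from Anthropic's side (authentication, private networks, flaky hosting), the "download first" route means fetching the bytes yourself and sending them as base64. A minimal standard-library sketch; it trusts the server's `Content-Type` header, which is an assumption worth tightening in production, and `download_as_image_block` is a hypothetical name:

```python
import base64
import urllib.request

SUPPORTED = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def download_as_image_block(url, timeout=10):
    """Fetch an image locally and embed it as a base64 block instead of a URL block."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # get_content_type() returns a lowercase 'type/subtype' without parameters
        media_type = resp.headers.get_content_type()
        raw = resp.read()
    if media_type not in SUPPORTED:
        raise ValueError(f"Unsupported media type: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.standard_b64encode(raw).decode("utf-8"),
        },
    }
```

The trade-off is the one in the table: a larger request in exchange for not depending on the URL being reachable from Anthropic's servers.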
Formats and Limits
| Format | Supported | Notes |
|---|---|---|
| JPEG | ✅ | Most common format, good compression |
| PNG | ✅ | Ideal for screenshots and diagrams |
| GIF | ✅ | First frame only (no animation) |
| WebP | ✅ | Good size/quality compromise |
| SVG | ❌ | Convert to PNG first |
| PDF | ⚠️ | Via the document feature, not as an image |
| TIFF | ❌ | Convert to JPEG or PNG |
Technical limits:
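- Max file size: 5 MB per image via the API
- Images per request: up to 100 via the API
- Resolution: images whose long edge exceeds 1568 px (or that would exceed roughly 1,600 tokens) are downscaled, preserving aspect ratio
- Tokens: roughly (width × height) / 750, computed after any downscaling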
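A rejected request wastes a round trip, so a cheap pre-flight check before upload can help. A sketch for PNG files only, reading width and height from the IHDR chunk with `struct` instead of decoding the image; the 5 MB and 8000 px thresholds reflect Anthropic's vision documentation at the time of writing and are assumptions to re-check, and `png_dimensions` and `preflight` are hypothetical helpers:

```python
import os
import struct

MAX_BYTES = 5 * 1024 * 1024  # assumed per-image API limit (5 MB); re-check current docs
MAX_EDGE = 8000              # assumed maximum width or height in pixels
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(path):
    """Read (width, height) from a PNG's IHDR chunk without decoding the image."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    # Bytes 16-23 hold width and height as big-endian unsigned 32-bit integers
    return struct.unpack(">II", header[16:24])


def preflight(path):
    """Return a list of reasons the API would likely reject this PNG (empty if none)."""
    problems = []
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    width, height = png_dimensions(path)
    if max(width, height) > MAX_EDGE:
        problems.append(f"{width}x{height} px exceeds {MAX_EDGE} px on one side")
    return problems
```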
Token Calculation per Image
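Images consume tokens in proportion to their pixel area, roughly (width × height) / 750. Larger images are downscaled first, so the per-image cost levels off at about 1,600 tokens:
| Resolution | Approximate tokens |
|---|---|
| 200×200 | ~54 tokens |
| 500×500 | ~333 tokens |
| 1000×1000 | ~1,334 tokens |
| 1920×1080 | ~1,600 tokens (downscaled first) |
| 4000×3000 | ~1,600 tokens (downscaled first) |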
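Anthropic's vision documentation gives a rule of thumb for image cost: tokens ≈ (width × height) / 750, applied after the image has been shrunk to fit a 1568 px long edge and roughly 1,600 tokens. A sketch of that arithmetic for budgeting batch jobs; the constants mirror the documented approximation, not the exact server-side resize, and the function names are illustrative:

```python
import math

# Approximations from Anthropic's vision docs; the server-side resize may differ slightly
MAX_LONG_EDGE = 1568      # images with a longer edge are downscaled
MAX_TOKENS = 1600         # approximate per-image ceiling after downscaling
PIXELS_PER_TOKEN = 750    # tokens ~= width * height / 750


def effective_size(width, height):
    """Estimate the (width, height) the model sees after aspect-preserving downscaling."""
    scale = min(1.0, MAX_LONG_EDGE / max(width, height))
    # Shrink further if the pixel count would still exceed the token ceiling
    max_pixels = MAX_TOKENS * PIXELS_PER_TOKEN
    scale = min(scale, math.sqrt(max_pixels / (width * height)))
    return int(width * scale), int(height * scale)


def estimate_image_tokens(width, height):
    """Approximate input tokens consumed by one image of the given pixel size."""
    w, h = effective_size(width, height)
    return math.ceil(w * h / PIXELS_PER_TOKEN)
```

For example, `estimate_image_tokens(1000, 1000)` returns 1334, while a 4000×3000 photo lands just under 1,600 because it is downscaled before being counted.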
Multi-Image: Comparison and Analysis
Claude can analyze several images in a single request, which enables comparison, auditing, and batch classification use cases.
```python
import base64

def load_image(path):
    """Read a local file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")

# Reuses the `client` created in the first example
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": load_image("design-v1.png")},
            },
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": load_image("design-v2.png")},
            },
            {
                "type": "text",
                "text": """Compare these two design versions:
1. What are the major differences?
2. Which version better follows UX principles?
3. Suggest improvements for the version you chose.""",
            },
        ],
    }],
)
print(response.content[0].text)
```
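With several images, labeling each one makes references such as "the second design" unambiguous; Anthropic's vision documentation uses a similar pattern with short `Image 1:` / `Image 2:` text blocks. A sketch of a helper that builds such a `content` list, assuming all files share one media type; `labeled_images_content` is a hypothetical name:

```python
import base64
from pathlib import Path


def labeled_images_content(paths, question, media_type="image/png"):
    """Build a Messages API content list: 'Image N:' label, image block, ..., then the question."""
    content = []
    for i, path in enumerate(paths, start=1):
        # A short text label before each image lets the prompt refer to it by number
        content.append({"type": "text", "text": f"Image {i}:"})
        content.append({
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": media_type,  # assumes every file has this format
                "data": base64.standard_b64encode(Path(path).read_bytes()).decode("utf-8"),
            },
        })
    content.append({"type": "text", "text": question})
    return content
```

The result goes directly into `messages=[{"role": "user", "content": ...}]`, and the question can then say "Image 1" and "Image 2".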
Multi-Image Use Cases
| Use case | Number of images | Description |
|---|---|---|
| Before/After | 2 | Compare a design before and after modification |
| A/B Testing | 2-4 | Evaluate mockup variants |
| UI Audit | 5-10 | Check visual consistency across a site |
| Classification | 10-50 | Categorize a batch of product photos |
| Documentation | 3-10 | Extract content from multiple scanned pages |
OCR and Text Extraction
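Claude handles text recognition (OCR) well and goes beyond reading characters: it also interprets layout and context. For extraction where exact values matter, pair it with a dedicated OCR tool, as recommended at the top of this article.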
```python
# OCR of a scanned document (reuses `client` and `load_image` from above)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/jpeg",
                           "data": load_image("invoice-scan.jpg")},
            },
            {
                "type": "text",
                "text": """Extract the following information from this invoice:
- Invoice number
- Date
- Vendor name
- Subtotal
- Tax
- Total amount
- List of items with unit price
Return the result as structured JSON.""",
            },
        ],
    }],
)
print(response.content[0].text)
```
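Asking for JSON does not guarantee clean JSON: the reply may wrap it in prose or a Markdown fence, or contain a misread digit. A minimal post-processing sketch, assuming the prompt has been adjusted to request the key names in `REQUIRED_KEYS` with numeric amounts; those key names and `parse_invoice_json` are hypothetical, and the arithmetic check is one cheap way to catch extraction errors, not a substitute for the OCR cross-check discussed earlier:

```python
import json
import re
from decimal import Decimal

# Hypothetical key names: adjust the prompt so the model returns exactly these
REQUIRED_KEYS = {"invoice_number", "date", "vendor_name", "subtotal", "tax", "total"}


def parse_invoice_json(reply_text):
    """Parse the invoice JSON from a model reply and add a 'totals_consistent' flag.

    Tolerates surrounding prose or a Markdown code fence by taking the span from
    the first '{' to the last '}'. Raises ValueError (json.JSONDecodeError is a
    subclass) on missing or malformed JSON.
    """
    match = re.search(r"\{.*\}", reply_text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in the reply")
    data = json.loads(match.group(0))
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    # Arithmetic check: catches a misread digit that a schema check alone would miss
    subtotal, tax, total = (Decimal(str(data[k])) for k in ("subtotal", "tax", "total"))
    data["totals_consistent"] = abs(subtotal + tax - total) <= Decimal("0.01")
    return data
```

Typical use is `parse_invoice_json(response.content[0].text)`, sending any invoice with `totals_consistent` set to `False` to manual review.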
OCR Capabilities
| Document type | Quality | Notes |
|---|---|---|
| Printed text | ⭐⭐⭐⭐⭐ | Excellent accuracy |
| Screenshots | ⭐⭐⭐⭐⭐ | Reads text and understands the interface |
| Legible handwriting | ⭐⭐⭐⭐ | Good quality, depends on handwriting |
| Illegible handwriting | ⭐⭐ | Variable results |
| Scanned docs (good quality) | ⭐⭐⭐⭐⭐ | Understands layout |
| Scanned docs (poor quality) | ⭐⭐⭐ | May miss details |
Chart and Diagram Analysis
```python
# Reuses `client` and `load_image` from the previous examples
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": load_image("financial-chart.png")},
            },
            {
                "type": "text",
                "text": """Analyze this financial chart:
1. What are the axes and their units?
2. What is the overall trend?
3. Identify inflection points or anomalies.
4. Extract approximate values of key data points.
5. Summarize in 3 insights for a CFO.""",
            },
        ],
    }],
)
print(response.content[0].text)
```
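Values read off a chart are estimates, so a plausibility check on the answer is cheap insurance. For pie charts, the shares reported cannot sum to much more than 100%. A sketch that parses `Label: NN%` pairs from a reply (that output format is an assumption; ask for it explicitly in the prompt) and flags totals above 100% plus a tolerance; `check_pie_shares` is a hypothetical helper:

```python
import re

# Matches "Label: 35%" or "Label: 12.5%" (labels must not contain ':' or ',')
PERCENT_PAIR = re.compile(r"([^:,\n]+):\s*(\d+(?:\.\d+)?)%")


def check_pie_shares(text, tolerance=2.0):
    """Return ({label: share}, plausible), where plausible means the sum is <= 100% + tolerance."""
    shares = {label.strip(): float(value) for label, value in PERCENT_PAIR.findall(text)}
    total = sum(shares.values())
    return shares, total <= 100.0 + tolerance
```

A sum well below 100% is not flagged, since the model may list only the largest slices; a sum well above it means at least one value was misread.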
Types of Analyzable Visuals
| Type | Capability | Extraction example |
|---|---|---|
| Bar chart | ✅ Values and labels | "Q1: 120K, Q2: 145K, Q3: 98K" |
| Line chart | ✅ Trends and points | "15% growth between March and June" |
| Pie chart | ✅ Proportions | "Marketing: 35%, R&D: 28%, Sales: 22%" |
| Table in image | ✅ Structured extraction | Reconstructs the table in Markdown |
| UML diagram | ✅ Relations and entities | "3 classes with inheritance and 2 interfaces" |
| Org chart | ✅ Hierarchy | Organization structure |
| System architecture | ✅ Components and flows | "Microservices with API Gateway and Redis" |
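When Claude reconstructs a table from an image as Markdown, the next step is usually turning it into rows you can process. A sketch of a small parser for the first pipe-delimited table in a reply; it assumes a header row followed by a `|---|` separator row and does not handle escaped pipes inside cells, and `parse_markdown_table` is a hypothetical helper:

```python
def parse_markdown_table(text):
    """Return the first pipe-delimited Markdown table in `text` as a list of row dicts."""
    lines = text.splitlines()
    start = next((i for i, ln in enumerate(lines) if ln.strip().startswith("|")), None)
    if start is None:
        return []

    # Collect the contiguous block of table lines
    block = []
    for ln in lines[start:]:
        if not ln.strip().startswith("|"):
            break
        block.append(ln.strip())

    def cells(line):
        return [c.strip() for c in line.strip("|").split("|")]

    header = cells(block[0])
    # block[1] is the |---|---| separator row
    return [dict(zip(header, cells(row))) for row in block[2:]]
```

Cell values stay strings (for example `"120K"`), so numeric conversion and unit handling remain a separate, explicit step.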
What Works Best
Optimizing Analysis Quality
- Use high-quality images: the resolution must be high enough for text to be readable
- Frame the subject well: avoid wide shots where the important content ends up in small print
- Prefer PNG for screenshots: it avoids lossy compression artifacts
- Orient the image correctly: Claude can cope with rotated images, but upright images give more reliable results
Optimizing Costs
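- Resize before sending: 1000×1000 is enough for most analyses, and anything with a long edge over 1568 px is downscaled anyway
- Compress JPEGs: quality around 80% is usually enough for OCR; this shrinks the upload, while token cost depends only on pixel dimensions
- Limit the number of images: send only the images the task needs
- Use appropriate resolutions per use case: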
| Use case | Recommended resolution | Approx. tokens |
|---|---|---|
| Text OCR | 1000-1500px wide | ~1,000 to ~1,600 |
| Simple chart | 800-1200px wide | ~650 to ~1,450 (at 4:3) |
| Full UI | 1920×1080 | ~1,600 (downscaled first) |
| Document photo | 1500-2000px | ~1,600 (downscaled first) |
Writing Effective Visual Prompts
| ❌ Vague prompt | ✅ Precise prompt |
|---|---|
| "What do you see?" | "List all visible UI elements with their text and position." |
| "Analyze this image" | "Extract the 5 KPIs displayed in this dashboard and their values." |
| "Read this document" | "Extract the name, date, and total amount from this invoice as JSON." |
FAQ
What image formats does Claude support?
Claude supports JPEG, PNG, GIF (first frame only), and WebP. The maximum size via the API is 5 MB per image, and images above the resolution limits are automatically downscaled.
How do I send an image to Claude via the API?
Two methods: base64 (encode the image and include it directly in the request) or URL (provide a publicly accessible link to the image). Base64 is required for local files and avoids depending on external hosting.
Can Claude read text in images (OCR)?
Yes. Claude reads printed text, screenshots, and good-quality scans reliably, handles legible handwriting reasonably well, and understands layout and structure. Because it returns no confidence scores, pair it with a dedicated OCR tool when exact values matter.
How many images can be sent in a single request?
You can send up to 100 images in a single request. Each image consumes tokens proportional to its resolution. Be mindful of costs on multi-image requests.
Can Claude analyze charts and diagrams?
Yes, Claude can read charts (bar, line, pie), UML diagrams, flowcharts, and schematics. It identifies trends, extracts approximate values, and describes visual relationships.