Claude Vision: Analyzing Images, Charts & Visual Documents
By Learnia Team
📅 Last updated: March 10, 2026 — Covers the Vision API, OCR, and document analysis.
🔗 Pillar article: Claude API: Complete Guide
What is Claude Vision?
Claude Vision is Claude's multimodal capability that allows it to understand and analyze images. Beyond simple object recognition, Claude can:
- Read text (OCR) in documents, screenshots, and photos
- Interpret charts and extract data
- Analyze technical diagrams (UML, architecture, workflows)
- Describe images in detail with context
- Compare multiple images in a single request
Sending an Image via the API
Method 1: Base64
The base64 method encodes the image directly in the request. Ideal for local images.
import anthropic
import base64

client = anthropic.Anthropic()

# Encode the image in base64
with open("sales-chart.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_data
                }
            },
            {
                "type": "text",
                "text": "Analyze this sales chart. What trends do you observe?"
            }
        ]
    }]
)
print(response.content[0].text)
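The example above hardcodes `image/png`. The following is a minimal sketch of a helper that infers the media type from the file extension instead. The name `image_block` is hypothetical (it is not part of the SDK), and the set of accepted types is taken from the formats this article lists as supported.

```python
import base64
import mimetypes
from pathlib import Path

# Some older Python versions do not map .webp by default, so register it explicitly.
mimetypes.add_type("image/webp", ".webp")

# Formats listed as supported in this article (JPEG, PNG, GIF, WebP).
SUPPORTED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(path: str) -> dict:
    """Build a base64 image content block for a local file (hypothetical helper)."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported or unknown image type for {path!r}: {media_type}")
    data = base64.standard_b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

The returned dictionary can be placed directly in the `content` list of a message, in place of the hand-written image block.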
Method 2: URL
The URL method points to a publicly accessible image.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "url",
                    "url": "https://example.com/dashboard-screenshot.png"
                }
            },
            {
                "type": "text",
                "text": "Describe this dashboard and identify the visible KPIs."
            }
        ]
    }]
)
Method Comparison
| Aspect | Base64 | URL |
|---|---|---|
| Local images | ✅ | ❌ |
| Online images | ⚠️ (download first) | ✅ |
| Request size | Larger | Smaller |
| Reliability | No external fetch required | Depends on URL access |
| Recommended for | Applications, scripts | Quick prototyping |
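One way to apply this comparison in code is to pick the source type from the reference itself. The sketch below assumes that anything starting with `http://` or `https://` is a public URL and that everything else is a local path; `image_source` is a hypothetical helper name, and the default media type is only a placeholder.

```python
import base64
from pathlib import Path
from urllib.parse import urlparse


def image_source(ref: str, media_type: str = "image/png") -> dict:
    """Return a 'url' source for http(s) references, otherwise a 'base64' source for a local file."""
    if urlparse(ref).scheme in ("http", "https"):
        return {"type": "url", "url": ref}
    # Anything else is treated as a local path and embedded in the request.
    data = base64.standard_b64encode(Path(ref).read_bytes()).decode("utf-8")
    return {"type": "base64", "media_type": media_type, "data": data}
```

The result goes under the `"source"` key of an image block, so the rest of the request stays the same for both methods.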
Formats and Limits
| Format | Supported | Notes |
|---|---|---|
| JPEG | ✅ | Most common format, good compression |
| PNG | ✅ | Ideal for screenshots and diagrams |
| GIF | ✅ | First frame only (no animation) |
| WebP | ✅ | Good size/quality compromise |
| SVG | ❌ | Convert to PNG first |
| PDF | ⚠️ | Via document feature, not image |
| TIFF | ❌ | Convert to JPEG or PNG |
Technical limits:
- Max size: 20 MB per image
- Images per request: Up to 100
- Resolution: Automatically resized if too large
- Tokens: Calculated based on image resolution
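The limits above can be checked locally before a request is sent. This is a rough sketch: the constants simply restate the figures given in this article (and assume 1 MB = 1024 × 1024 bytes), so they should be verified against the current API documentation. `check_images` is a hypothetical helper name.

```python
import os

# Limits as stated in this article; verify them against the current API documentation.
MAX_IMAGE_BYTES = 20 * 1024 * 1024
MAX_IMAGES_PER_REQUEST = 100
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def check_images(paths: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the batch passes."""
    problems = []
    if len(paths) > MAX_IMAGES_PER_REQUEST:
        problems.append(f"{len(paths)} images exceeds the limit of {MAX_IMAGES_PER_REQUEST}")
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{path}: unsupported format {ext or '(none)'}")
        elif os.path.getsize(path) > MAX_IMAGE_BYTES:
            problems.append(f"{path}: larger than {MAX_IMAGE_BYTES} bytes")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a batch in a single pass.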
Token Calculation per Image
Images consume tokens proportional to their size:
| Resolution | Approximate tokens |
|---|---|
| 200×200 | ~250 tokens |
| 500×500 | ~800 tokens |
| 1000×1000 | ~1,600 tokens |
| 1920×1080 | ~2,500 tokens |
| 4000×3000 | ~5,000 tokens |
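The figures in this table are approximations. To measure a specific image, one option is to ask the API to count tokens for a request that contains only that image. The sketch below assumes the installed `anthropic` SDK exposes `client.messages.count_tokens` and that the result has an `input_tokens` attribute; `count_request` and `count_image_tokens` are hypothetical helper names.

```python
def count_request(image_b64: str, media_type: str,
                  model: str = "claude-sonnet-4-20250514") -> dict:
    """Build keyword arguments for a token-count call that contains a single image."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{
                "type": "image",
                "source": {"type": "base64", "media_type": media_type, "data": image_b64},
            }],
        }],
    }


def count_image_tokens(image_b64: str, media_type: str) -> int:
    """Ask the API how many input tokens the image-only request would consume."""
    import anthropic  # imported lazily so count_request works without the SDK installed

    client = anthropic.Anthropic()
    result = client.messages.count_tokens(**count_request(image_b64, media_type))
    return result.input_tokens
```

Because the count includes a small amount of message overhead, it is best read as an upper estimate for the image itself.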
Multi-Image: Comparison and Analysis
Claude can analyze multiple images in a single request, which enables side-by-side comparison, batch classification, and extraction from multi-page documents.
import base64

def load_image(path):
    with open(path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": load_image("design-v1.png")}
            },
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": load_image("design-v2.png")}
            },
            {
                "type": "text",
                "text": """Compare these two design versions:
1. What are the major differences?
2. Which version better follows UX principles?
3. Improvement suggestions for the chosen version."""
            }
        ]
    }]
)
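When a prompt refers to images by position ("the first version", "image 3"), one convention that can help is to precede each image with a short text label. The sketch below builds such a content list; `labeled_image_content` is a hypothetical helper, and it assumes all files share one media type.

```python
import base64
from pathlib import Path


def labeled_image_content(paths: list[str], prompt: str,
                          media_type: str = "image/png") -> list[dict]:
    """Interleave 'Image N:' text labels with base64 image blocks, then append the prompt."""
    content = []
    for index, path in enumerate(paths, start=1):
        data = base64.standard_b64encode(Path(path).read_bytes()).decode("utf-8")
        content.append({"type": "text", "text": f"Image {index}:"})
        content.append({
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data},
        })
    content.append({"type": "text", "text": prompt})
    return content
```

The returned list can be passed as the `content` of a single user message, and the prompt can then refer to "Image 1" and "Image 2" unambiguously.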
Multi-Image Use Cases
| Use case | Number of images | Description |
|---|---|---|
| Before/After | 2 | Compare a design before and after modification |
| A/B Testing | 2-4 | Evaluate mockup variants |
| UI Audit | 5-10 | Check visual consistency across a site |
| Classification | 10-50 | Categorize a batch of product photos |
| Documentation | 3-10 | Extract content from multiple scanned pages |
OCR and Text Extraction
Claude reads text in images (OCR) and also interprets layout and structure, so it can return named fields rather than a raw stream of characters.
# OCR of a scanned document
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/jpeg",
                           "data": load_image("invoice-scan.jpg")}
            },
            {
                "type": "text",
                "text": """Extract the following information from this invoice:
- Invoice number
- Date
- Vendor name
- Subtotal
- Tax
- Total amount
- List of items with unit price
Return the result as structured JSON."""
            }
        ]
    }]
)
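The reply to a prompt like this is still plain text, and it may wrap the JSON in a Markdown code fence or add a sentence around it. A minimal sketch of a tolerant parser follows; `parse_json_reply` is a hypothetical helper, and it assumes the reply contains exactly one top-level JSON object.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable
FENCED_BLOCK = re.compile(re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL)


def parse_json_reply(text: str) -> dict:
    """Parse a JSON object from a model reply, tolerating a code fence or surrounding prose."""
    fenced = FENCED_BLOCK.search(text)
    candidate = fenced.group(1) if fenced else text
    # Keep only the span from the first '{' to the last '}'.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in reply")
    return json.loads(candidate[start:end + 1])
```

With the request above, this would typically be called as `parse_json_reply(response.content[0].text)`. Extracted amounts should still be validated before they are used downstream.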
OCR Capabilities
| Document type | Quality | Notes |
|---|---|---|
| Printed text | ⭐⭐⭐⭐⭐ | Excellent accuracy |
| Screenshots | ⭐⭐⭐⭐⭐ | Reads text and understands the interface |
| Legible handwriting | ⭐⭐⭐⭐ | Good quality, depends on handwriting |
| Illegible handwriting | ⭐⭐ | Variable results |
| Scanned docs (good quality) | ⭐⭐⭐⭐⭐ | Understands layout |
| Scanned docs (poor quality) | ⭐⭐⭐ | May miss details |
Chart and Diagram Analysis
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": load_image("financial-chart.png")}
            },
            {
                "type": "text",
                "text": """Analyze this financial chart:
1. What are the axes and their units?
2. What is the overall trend?
3. Identify inflection points or anomalies.
4. Extract approximate values of key data points.
5. Summarize in 3 insights for a CFO."""
            }
        ]
    }]
)
Types of Analyzable Visuals
| Type | Capability | Extraction example |
|---|---|---|
| Bar chart | ✅ Values and labels | "Q1: 120K, Q2: 145K, Q3: 98K" |
| Line chart | ✅ Trends and points | "15% growth between March and June" |
| Pie chart | ✅ Proportions | "Marketing: 35%, R&D: 28%, Sales: 22%" |
| Table in image | ✅ Structured extraction | Reconstructs the table in Markdown |
| UML diagram | ✅ Relations and entities | "3 classes with inheritance and 2 interfaces" |
| Org chart | ✅ Hierarchy | Organization structure |
| System architecture | ✅ Components and flows | "Microservices with API Gateway and Redis" |
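When Claude reconstructs a table from an image as Markdown, the reply can be converted into rows for further processing. The following is a small sketch under the assumption that the reply contains a simple pipe-delimited table with a header row; `parse_markdown_table` is a hypothetical helper and does not handle escaped pipes inside cells.

```python
def parse_markdown_table(text: str) -> list[dict]:
    """Convert the first Markdown table found in `text` into a list of row dictionaries."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("|") and line.endswith("|"):
            rows.append([cell.strip() for cell in line[1:-1].split("|")])
        elif rows:
            break  # the first table has ended
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[1:]
    # Drop the separator row, whose cells contain only dashes, colons, and spaces.
    body = [r for r in body if not all(c and set(c) <= set("-: ") for c in r)]
    return [dict(zip(header, r)) for r in body]
```

Values come back as strings (for example `"120K"`), so numeric conversion is left to the caller.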
Best Practices
Optimizing Analysis Quality
- Use high-quality images: Sufficient resolution so text is readable
- Frame the subject well: Avoid overly wide images with important content in small print
- Prefer PNG for screenshots: No lossy compression
- Orient the image correctly: Claude handles rotation, but correct orientation is better
Optimizing Costs
- Resize before sending: A 1000×1000 image is sufficient for most analyses
- Compress JPEGs: 80% quality is sufficient for OCR
- Limit the number of images: Only send necessary images
- Use appropriate resolutions per use case:
| Use case | Recommended resolution | Approx. tokens |
|---|---|---|
| Text OCR | 1000-1500px wide | ~1,500 |
| Simple chart | 800-1200px | ~1,200 |
| Full UI | 1920×1080 | ~2,500 |
| Document photo | 1500-2000px | ~2,000 |
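The resizing advice above comes down to capping the longer side while keeping the aspect ratio. The sketch below computes the target dimensions only; `fit_within` is a hypothetical helper, and the default of 1000 pixels is taken from the "Resize before sending" tip.

```python
def fit_within(width: int, height: int, max_side: int = 1000) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never upscale a small image
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000))        # (1000, 750)
print(fit_within(1920, 1080, 1500))  # (1500, 844)
print(fit_within(800, 600))          # (800, 600)
```

The resulting tuple can be handed to an image library's resize function, for example Pillow's `Image.resize`, assuming Pillow is installed in your environment.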
Writing Effective Visual Prompts
| ❌ Vague prompt | ✅ Precise prompt |
|---|---|
| "What do you see?" | "List all visible UI elements with their text and position." |
| "Analyze this image" | "Extract the 5 KPIs displayed in this dashboard and their values." |
| "Read this document" | "Extract the name, date, and total amount from this invoice as JSON." |
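Precise prompts like those in the right-hand column can be generated from a field list so that every request follows the same pattern. This is a minimal sketch; `extraction_prompt` is a hypothetical helper, and the instruction to use `null` for missing fields is an added assumption rather than something the article prescribes.

```python
def extraction_prompt(document_type: str, fields: list[str]) -> str:
    """Build a precise extraction prompt that names each field and the output format."""
    lines = [f"Extract the following information from this {document_type}:"]
    lines += [f"- {field}" for field in fields]
    # Assumption: asking for null makes missing values explicit instead of guessed.
    lines.append("Return the result as structured JSON. Use null for any field that is not visible.")
    return "\n".join(lines)


print(extraction_prompt("invoice", ["Name", "Date", "Total amount"]))
```

The generated string is used as the `text` block that follows the image in the message content.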
FAQ
What image formats does Claude support?
Claude supports JPEG, PNG, GIF (first frame only), and WebP. Maximum size is 20 MB per image, and resolution is automatically adjusted if it exceeds model limits.
How do I send an image to Claude via the API?
Two methods: base64 (encode the image and include it directly in the request) or URL (provide a public link to the image). The base64 method is more reliable for local images.
Can Claude read text in images (OCR)?
Yes, Claude excels at OCR. It can read printed text, scanned documents, screenshots, and even handwritten text with good accuracy. It also understands layout and structure.
How many images can be sent in a single request?
You can send up to 100 images in a single request. Each image consumes tokens proportional to its resolution. Be mindful of costs on multi-image requests.
Can Claude analyze charts and diagrams?
Yes, Claude can read charts (bar, line, pie), UML diagrams, flowcharts, and schematics. It identifies trends, extracts approximate values, and describes visual relationships.