Claude API: Practical Guide with Python & TypeScript (2026)
By Dorian Laurenceau
📅 Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.
📚 Specialized guides: Tool Use | Extended Thinking | Vision | Computer Use | Evals | Bedrock | Vertex AI | Prompt Caching | Hackathon
Why Use the Claude API?
The Claude API is the most direct way to integrate Anthropic's artificial intelligence into your applications. Unlike the claude.ai web interface, the API gives you full control over:
- The model used (Opus, Sonnet, Haiku)
- Generation parameters (temperature, max tokens, stop sequences)
- Advanced features (tool use, vision, extended thinking, streaming)
- Integration architecture (real-time, batch, webhooks)
Claude API in production: what actually matters
The Claude API is solid, well-documented, and production-ready; it also has operational gotchas that the marketing pages don't surface. Threads on r/ClaudeAI, r/LocalLLaMA, r/ChatGPTCoding, and r/ExperiencedDevs cover what teams actually hit.
What the Anthropic API docs get right:
- Clean request/response shape. The Messages API is simpler than OpenAI's Chat Completions in a few useful ways (system parameter is top-level, content blocks are explicit).
- Prompt caching is production-ready and genuinely reduces cost for long-context workflows. Measure before and after; the savings compound.
- Message batching for non-urgent workloads is a 50% discount most teams don't use and should.
- Tool use is first-class and well-specified.
- Extended thinking on Claude 3.7+ gives visible reasoning for evaluation and debugging.
What catches teams in production:
- Rate limits are per organization, not per key. Heavy workloads need enterprise tiers or Bedrock / Vertex AI for higher quotas.
- Token counting differs from OpenAI's. The tokenizer is documented, but cost estimates copy-pasted from OpenAI-land will be off.
- Streaming backpressure. Long streaming responses need proper SSE handling; buffering at proxies (Cloudflare, NGINX) breaks streaming in subtle ways.
- Retries and idempotency. Implement exponential backoff; the official SDK handles most cases, but batched workflows need extra care.
- Content filter ambiguity. Some safety refusals are hard to distinguish from legitimate "I don't know" responses without inspection. Log raw responses for diagnosis.
- No built-in embeddings endpoint. Pair with Voyage AI, OpenAI embeddings, or Cohere for RAG.
What production teams actually do:
- Use the official SDKs (Python, TypeScript, Go). Hand-rolled HTTP calls miss retry/streaming/caching logic.
- Abstract the provider with LiteLLM or similar so switching to Bedrock, Vertex, or another vendor doesn't require code changes.
- Instrument everything. Langfuse, Helicone, LangSmith, or PostHog LLM analytics make debugging and cost attribution tractable.
- Cache aggressively. Pair prompt caching with request-level caching (Redis, Cloudflare KV) for idempotent prompts.
- Evaluate continuously. promptfoo, Braintrust, or home-grown eval harnesses run on PRs.
- Set hard budgets. Per-request token caps, per-user spend limits, per-feature monthly budgets. Without these, a loop bug can burn thousands overnight.
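That last bullet is cheap to implement. A minimal per-user budget guard, sketched here with illustrative names (this is not an SDK feature) and assumed Sonnet-class prices of $3/$15 per million tokens, runs a check before each call and records spend after it:

```python
class BudgetGuard:
    """Illustrative guard: per-request token cap + per-user monthly spend ceiling."""

    def __init__(self, max_tokens_per_request=4096, monthly_user_budget_usd=10.0):
        self.max_tokens_per_request = max_tokens_per_request
        self.monthly_user_budget_usd = monthly_user_budget_usd
        self.spend_by_user = {}  # user_id -> USD spent this month

    def check(self, user_id, requested_max_tokens):
        """Call before the API request; raises instead of letting the call through."""
        if requested_max_tokens > self.max_tokens_per_request:
            raise ValueError(f"max_tokens {requested_max_tokens} exceeds the per-request cap")
        if self.spend_by_user.get(user_id, 0.0) >= self.monthly_user_budget_usd:
            raise RuntimeError(f"user {user_id} is over the monthly budget")

    def record(self, user_id, input_tokens, output_tokens,
               in_price_per_m=3.0, out_price_per_m=15.0):
        """Call after the response; returns the cost of this request in USD."""
        cost = (input_tokens / 1e6 * in_price_per_m
                + output_tokens / 1e6 * out_price_per_m)
        self.spend_by_user[user_id] = self.spend_by_user.get(user_id, 0.0) + cost
        return cost
```

In real code, `record()` would read `response.usage.input_tokens` and `response.usage.output_tokens` after each call; the price defaults are assumptions to adjust per model.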
The honest framing: the Claude API is one of the best-engineered LLM APIs available, and it behaves like a real production service, not a research preview. The operational discipline around it (caching, instrumentation, evals, budgets, retries) is where most teams underinvest. Build the scaffolding once; the API itself is the easy part.
API Architecture
The Claude API is built on a simple REST architecture with a single main endpoint:
```
POST https://api.anthropic.com/v1/messages
```
Each request includes:
- A model (claude-sonnet-4-20250514, claude-opus-4-20250918, etc.)
- Messages (conversation as an array)
- Optional parameters (temperature, max_tokens, tools, etc.)
```python
import anthropic

client = anthropic.Anthropic()  # Uses ANTHROPIC_API_KEY

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain microservices in 3 sentences."}
    ]
)
print(message.content[0].text)
```
Authentication and Configuration
Getting an API Key
1. Create an account on console.anthropic.com
2. Navigate to Settings > API Keys
3. Click Create Key and give it a descriptive name
4. Copy the key (it won't be displayed again)
Configuring the API Key
```bash
# Environment variable (recommended)
export ANTHROPIC_API_KEY="sk-ant-api03-..."

# Or in a .env file
echo 'ANTHROPIC_API_KEY=sk-ant-api03-...' >> .env
```

```python
# Python - Automatic via environment variable
client = anthropic.Anthropic()

# Python - Explicit
client = anthropic.Anthropic(api_key="sk-ant-api03-...")
```

```typescript
// TypeScript - Automatic via environment variable
const client = new Anthropic();

// TypeScript - Explicit
const client = new Anthropic({ apiKey: "sk-ant-api03-..." });
```
The Messages API in Detail
The Messages API is the core of interaction with Claude. Here is the complete structure of a request:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    temperature=0.7,
    system="You are an expert in software architecture.",
    messages=[
        {
            "role": "user",
            "content": "What are the benefits of microservices?"
        },
        {
            "role": "assistant",
            "content": "Microservices offer several key advantages..."
        },
        {
            "role": "user",
            "content": "And the drawbacks?"
        }
    ],
    stop_sequences=["\n\nHuman:"]
)

print(response.content[0].text)
print(f"Tokens: {response.usage.input_tokens} in / {response.usage.output_tokens} out")
```
Key Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| model | string | Model ID to use | Required |
| max_tokens | int | Maximum number of output tokens | Required |
| messages | array | Conversation history | Required |
| system | string | System prompt | None |
| temperature | float | Creativity (0.0 - 1.0) | 1.0 |
| top_p | float | Nucleus sampling | 1.0 |
| top_k | int | Top-K sampling | None |
| stop_sequences | array | Stop sequences | None |
| stream | bool | Enable streaming | false |
| tools | array | Tools available for Claude | None |
| metadata | object | Metadata (e.g., user_id) | None |
Response Structure
```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Microservices offer..."
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 42,
    "output_tokens": 156
  }
}
```
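Before trusting `content[0].text`, check `stop_reason`: `"max_tokens"` means the reply was truncated mid-generation. A small helper, sketched over a plain dict mirroring the response shape above (the SDK returns typed objects with the same fields):

```python
def extract_text(response):
    """Concatenate the text blocks of a response, refusing truncated output."""
    # "max_tokens" = generation was cut off; "end_turn" = normal finish.
    if response["stop_reason"] == "max_tokens":
        raise ValueError("Response truncated: raise max_tokens or shorten the prompt")
    return "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
```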
Streaming
Streaming allows you to display Claude's response in real time, token by token. Essential for interactive user interfaces.
```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a poem about code."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const stream = client.messages.stream({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Write a poem about code." }],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}
```
Streaming Events
| Event | Description |
|---|---|
| message_start | Message start, contains metadata |
| content_block_start | Start of a content block |
| content_block_delta | Text fragment (the actual content) |
| content_block_stop | End of a content block |
| message_delta | Message update (stop_reason, usage) |
| message_stop | End of message |
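The event flow above can be exercised without a network call. This sketch consumes events as plain dicts that mirror the SSE shapes (the real SDK yields typed event objects with the same fields), accumulating text deltas and picking up final usage from message_delta:

```python
def consume_stream(events):
    """Fold a stream of event dicts into (full_text, final_usage)."""
    chunks = []
    usage = None
    for event in events:
        if (event["type"] == "content_block_delta"
                and event["delta"]["type"] == "text_delta"):
            chunks.append(event["delta"]["text"])  # the actual content
        elif event["type"] == "message_delta":
            usage = event.get("usage")  # cumulative usage arrives here
    return "".join(chunks), usage
```

The same dispatch logic applies when iterating the SDK's raw event stream.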
Batch API: Bulk Processing
The Batch API lets you send up to 100,000 requests in a single batch, with a 50% cost reduction and a processing time of up to 24 hours.
```python
import anthropic

client = anthropic.Anthropic()

# Create a batch (batches live under client.messages.batches in the Python SDK)
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-1",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Summarize this article: ..."}
                ]
            }
        },
        {
            "custom_id": "req-2",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Translate this text: ..."}
                ]
            }
        }
    ]
)

# Check status
status = client.messages.batches.retrieve(batch.id)
print(f"Status: {status.processing_status}")

# Retrieve results when ready
if status.processing_status == "ended":
    for result in client.messages.batches.results(batch.id):
        print(f"{result.custom_id}: {result.result.message.content[0].text}")
```
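Batches complete asynchronously, so production code polls rather than checking once. A polling sketch with the retrieve call injected as a plain function, which keeps the loop testable; in real use, pass the SDK's batch-retrieve method:

```python
import time

def wait_for_batch(fetch, batch_id, poll_seconds=60, timeout_seconds=24 * 3600):
    """Poll fetch(batch_id) until processing_status is 'ended' or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = fetch(batch_id)
        if status.processing_status == "ended":
            return status
        time.sleep(poll_seconds)  # be generous; batches can take hours
    raise TimeoutError(f"batch {batch_id} still processing after {timeout_seconds}s")
```

A generous poll interval (minutes, not seconds) is fine here; the processing window is up to 24 hours anyway.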
When to Use the Batch API?
| Use case | Messages API | Batch API |
|---|---|---|
| Real-time chatbot | ✅ | ❌ |
| Analyzing 10,000 documents | ❌ | ✅ |
| Bulk content translation | ❌ | ✅ |
| Support ticket classification | ⚠️ (costly) | ✅ |
| Interactive assistant | ✅ | ❌ |
| Periodic report generation | ⚠️ | ✅ |
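The 50% discount is easy to quantify. A back-of-envelope estimate for the 10,000-document row, assuming roughly 2,000 input and 500 output tokens per document at Sonnet-class prices ($3/$15 per million tokens; adjust to the current pricing table):

```python
def job_cost_usd(n_requests, in_tokens, out_tokens,
                 in_price_per_m=3.0, out_price_per_m=15.0, batch_discount=0.0):
    """Estimated cost of a bulk job; batch_discount=0.5 models the Batch API."""
    per_request = (in_tokens / 1e6 * in_price_per_m
                   + out_tokens / 1e6 * out_price_per_m)
    return n_requests * per_request * (1 - batch_discount)

realtime = job_cost_usd(10_000, 2_000, 500)                     # $135.00
batched = job_cost_usd(10_000, 2_000, 500, batch_discount=0.5)  # $67.50
print(f"Messages API: ${realtime:.2f}  Batch API: ${batched:.2f}")
```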
Official SDKs
Anthropic provides SDKs for the major programming languages:
| SDK | Language | Installation | Maintained by |
|---|---|---|---|
| anthropic | Python | pip install anthropic | Anthropic |
| @anthropic-ai/sdk | TypeScript/JS | npm install @anthropic-ai/sdk | Anthropic |
| anthropic-java | Java | Maven/Gradle | Anthropic |
| anthropic-go | Go | go get github.com/anthropics/anthropic-sdk-go | Anthropic |
| anthropic-ruby | Ruby | gem install anthropic | Anthropic |
Java Example
```java
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.*;

// Reads ANTHROPIC_API_KEY from the environment
AnthropicClient client = AnthropicOkHttpClient.fromEnv();

MessageCreateParams params = MessageCreateParams.builder()
    .model("claude-sonnet-4-20250514")
    .maxTokens(1024)
    .addUserMessage("Hello Claude!")
    .build();

Message message = client.messages().create(params);
System.out.println(message.content().get(0).text());
```
Go Example
```go
package main

import (
	"context"
	"fmt"

	"github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient() // Uses ANTHROPIC_API_KEY
	message, err := client.Messages.New(context.Background(),
		anthropic.MessageNewParams{
			Model:     anthropic.ModelClaudeSonnet4_20250514,
			MaxTokens: 1024,
			Messages: []anthropic.MessageParam{
				anthropic.NewUserMessage(
					anthropic.NewTextBlock("Hello Claude!"),
				),
			},
		},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(message.Content[0].Text)
}
```
Error Handling
The Claude API uses standard HTTP codes and descriptive error messages.
| HTTP Code | Meaning | Recommended Action |
|---|---|---|
| 400 | Invalid request | Check parameters |
| 401 | Invalid API key | Verify your API key |
| 403 | Permission denied | Check model permissions |
| 429 | Rate limit reached | Wait and retry with backoff |
| 500 | Server error | Retry after a few seconds |
| 529 | API overloaded | Retry with exponential backoff |
Robust Error Handling
```python
import anthropic
import time

client = anthropic.Anthropic()

def call_claude_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=messages
            )
        except anthropic.RateLimitError:
            wait = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s...
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
        except anthropic.APIStatusError as e:
            if e.status_code >= 500:
                time.sleep(1)
                continue
            raise
    raise Exception("Maximum number of retries reached")
```
Rate Limits
Rate limits protect API stability and vary based on your usage tier.
| Tier | Requests/min | Input tokens/min | Output tokens/min |
|---|---|---|---|
| Tier 1 (default) | 4,000 | 400,000 | 80,000 |
| Tier 2 | 8,000 | 800,000 | 160,000 |
| Tier 3 | 16,000 | 1,600,000 | 320,000 |
| Tier 4 | 32,000 | 3,200,000 | 640,000 |
Response headers include your current limits:
```
anthropic-ratelimit-requests-limit: 4000
anthropic-ratelimit-requests-remaining: 3999
anthropic-ratelimit-requests-reset: 2026-03-10T12:00:30Z
anthropic-ratelimit-tokens-limit: 400000
anthropic-ratelimit-tokens-remaining: 399800
```
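These headers are worth surfacing in your own metrics. With the Python SDK, raw headers are reachable through the raw-response wrapper (`client.messages.with_raw_response.create(...)` exposes `.headers`, and `.parse()` returns the usual message object). A small parser for the header family, shown here over a plain dict:

```python
def ratelimit_status(headers):
    """Collect anthropic-ratelimit-* headers, converting counts to ints."""
    prefix = "anthropic-ratelimit-"
    status = {}
    for key, value in headers.items():
        if key.lower().startswith(prefix):
            short = key.lower()[len(prefix):]
            # Counts become ints; timestamps (the -reset headers) stay strings.
            status[short] = int(value) if value.isdigit() else value
    return status
```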
Pricing
| Model | Input ($/M tokens) | Output ($/M tokens) | Cache Write | Cache Read |
|---|---|---|---|---|
| Claude Opus 4.6 | 15.00 | 75.00 | 18.75 | 1.50 |
| Claude Sonnet 4 | 3.00 | 15.00 | 3.75 | 0.30 |
| Claude Haiku 3.5 | 0.80 | 4.00 | 1.00 | 0.08 |
Quick calculation: A typical conversation (500 tokens in + 500 tokens out) with Sonnet costs approximately $0.009, less than one cent.
Common Patterns
Multi-Turn Conversation
```python
conversation = []

def chat(user_message):
    conversation.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="You are an expert Python development assistant.",
        messages=conversation
    )
    assistant_message = response.content[0].text
    conversation.append({"role": "assistant", "content": assistant_message})
    return assistant_message

# Usage
print(chat("How do I create a REST API with FastAPI?"))
print(chat("Add JWT authentication."))
print(chat("Now add tests."))
```
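One caveat: the conversation list grows without bound, and input tokens are billed on every turn. A character-based trimming heuristic (rough by design; the API also exposes a token-counting endpoint when exact numbers matter) that always drops whole user/assistant pairs from the front:

```python
def trim_history(conversation, max_chars=20_000):
    """Keep the most recent turns under a rough size budget."""
    trimmed = list(conversation)
    # Drop the oldest user/assistant pair while over budget,
    # always keeping at least the latest exchange.
    while len(trimmed) > 2 and sum(len(m["content"]) for m in trimmed) > max_chars:
        del trimmed[:2]
    return trimmed
```

Call it on the history before each request; dropping whole pairs keeps the roles alternating, which the Messages API expects.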
Structured Output (JSON)
```python
import json

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": """Analyze this text and return a structured JSON:
"The product is excellent, fast delivery but damaged packaging."
Expected format:
{"sentiment": "positive|negative|mixed", "aspects": [...], "score": 0-10}"""
    }]
)

result = json.loads(response.content[0].text)
print(result)
# {"sentiment": "mixed", "aspects": [...], "score": 7}
```
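`json.loads` on the raw text fails whenever Claude wraps the JSON in prose or a markdown fence. A tolerant extractor, as a heuristic sketch; for guaranteed structure, tool use with a JSON schema is the more robust pattern:

```python
import json
import re

def extract_json(text):
    """Pull the first top-level JSON object out of a model response."""
    # Strip a leading/trailing ```json fence if present.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    # Fall back to the outermost braces to skip surrounding prose.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in response")
    return json.loads(text[start:end + 1])
```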
System Prompt with Context
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    system="""You are an assistant for the "TechShop" e-commerce platform.

Rules:
- Always respond in English
- Only recommend products from the catalog
- If you don't know the answer, redirect to support

Current catalog:
- MacBook Pro M4: $2,499
- iPhone 16 Pro: $1,299
- iPad Air M3: $799""",
    messages=[
        {"role": "user", "content": "Which laptop would you recommend?"}
    ]
)
```
Cloud Access
The Claude API is also available through major cloud providers:
| Platform | Main Advantage | Dedicated Guide |
|---|---|---|
| Amazon Bedrock | Native AWS integration, unified billing | Bedrock Guide |
| Google Vertex AI | Native GCP integration, model garden | Vertex AI Guide |
| Direct API | Immediate access, latest features | This guide |
Guidelines
- Use environment variables for API keys, never hard-code them
- Implement retry with exponential backoff to handle transient errors
- Monitor your usage via the Anthropic console to avoid surprises
- Use streaming for interactive user interfaces
- Prefer the Batch API for bulk processing (50% savings)
- Enable prompt caching for repetitive system prompts
- Choose the right model: Haiku for simple tasks, Sonnet for general use, Opus for complex reasoning
Resources
| Resource | Link |
|---|---|
| Official documentation | docs.anthropic.com |
| Anthropic Console | console.anthropic.com |
| Python SDK | github.com/anthropics/anthropic-sdk-python |
| TypeScript SDK | github.com/anthropics/anthropic-sdk-typescript |
| Cookbook | github.com/anthropics/anthropic-cookbook |
Dorian Laurenceau
Full-Stack Developer & Learning Designer. I spent 4 years as a freelance full-stack developer and 4 years teaching React, JavaScript, HTML/CSS and WordPress to adult learners. Today I design learning paths in web development and AI, grounded in learning science. I founded learn-prompting.fr to make AI practical and accessible, and built the Bluff app to gamify political transparency.
FAQ
How much does the Claude API cost?
Prices vary by model: Claude Opus 4.6 costs $15/M input tokens and $75/M output tokens. Claude Sonnet is $3/M input and $15/M output. Prompt caching reduces costs by up to 90%.
How do I get a Claude API key?
Create an account at console.anthropic.com, go to Settings > API Keys, then generate a new key. Add credits to your account to start using the API.
What is the difference between the Messages API and the Batch API?
The Messages API processes requests in real time (response in seconds). The Batch API processes batches of requests asynchronously, with a 50% cost reduction and up to 24-hour processing time.
What SDKs are available for the Claude API?
Anthropic provides official SDKs for Python, TypeScript/JavaScript, Java, Go, and Ruby. Community SDKs exist for other languages like Rust, PHP, and C#.
What are the Claude API rate limits?
Default rate limits are 4,000 requests/minute and 400,000 tokens/minute for tier 1. You can request an increase through the Anthropic console based on your usage.
How much does a Claude API token cost?
Prices vary by model: Haiku costs $0.80/M input tokens and $4/M output tokens. Sonnet costs $3/M input and $15/M output. Opus 4.6 costs $15/M input and $75/M output. Prompt caching reduces costs by 90% on cached tokens.
Do I need Claude Pro to use the API?
No. The Claude API and Claude Pro subscription are separate products. The API requires an API key (created at console.anthropic.com) and uses pay-per-use pricing. Claude Pro is a monthly subscription for the claude.ai web interface.
How to use the Anthropic API in Python?
Install the SDK with 'pip install anthropic', then create a client with your API key. A basic call: client.messages.create(model='claude-sonnet-4-6-20260610', max_tokens=1024, messages=[{'role': 'user', 'content': 'Your question'}]). See the first API call example above for a full walkthrough.