Claude API: Practical Guide with Python & TypeScript (2026)
By Dorian Laurenceau
📅 Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.
📚 Specialized guides: Tool Use | Extended Thinking | Vision | Computer Use | Evals | Bedrock | Vertex AI | Prompt Caching | Hackathon
Why Use the Claude API?
The Claude API is the most direct way to integrate Anthropic's artificial intelligence into your applications. Unlike the claude.ai web interface, the API gives you full control over:
- The model used (Opus, Sonnet, Haiku)
- Generation parameters (temperature, max tokens, stop sequences)
- Advanced features (tool use, vision, extended thinking, streaming)
- Integration architecture (real-time, batch, webhooks)
Claude API in production: what actually matters
The Claude API is solid, well-documented, and production-ready; it also has operational gotchas that the marketing pages don't surface. Threads on r/ClaudeAI, r/LocalLLaMA, r/ChatGPTCoding, and r/ExperiencedDevs cover what teams actually hit.
What the Anthropic API docs get right:
- Clean request/response shape. The Messages API is simpler than OpenAI's Chat Completions in a few useful ways (system parameter is top-level, content blocks are explicit).
- Prompt caching is production-ready and genuinely reduces cost for long-context workflows. Measure before and after; the savings compound.
- Message batching for non-urgent workloads is a 50% discount most teams don't use and should.
- Tool use is first-class and well-specified.
- Extended thinking on Claude 3.7+ gives visible reasoning for evaluation and debugging.
What catches teams in production:
- Rate limits are per organization, not per key. Heavy workloads need enterprise tiers or Bedrock / Vertex AI for higher quotas.
- Token counting differs from OpenAI's. The tokenizer is documented, but cost estimates copy-pasted from OpenAI-land will be off.
- Streaming backpressure. Long streaming responses need proper SSE handling; buffering at proxies (Cloudflare, NGINX) breaks streaming in subtle ways.
- Retries and idempotency. Implement exponential backoff; the official SDK handles most cases, but batched workflows need extra care.
- Content filter ambiguity. Some safety refusals are hard to distinguish from legitimate "I don't know" responses without inspection. Log raw responses for diagnosis.
- No built-in embeddings endpoint. Pair with Voyage AI, OpenAI embeddings, or Cohere for RAG.
What production teams actually do:
- Use the official SDKs (Python, TypeScript, Go). Hand-rolled HTTP calls miss retry/streaming/caching logic.
- Abstract the provider with LiteLLM or similar so switching to Bedrock, Vertex, or another vendor doesn't require code changes.
- Instrument everything. Langfuse, Helicone, LangSmith, or PostHog LLM analytics make debugging and cost attribution tractable.
- Cache aggressively. Pair prompt caching with request-level caching (Redis, Cloudflare KV) for idempotent prompts.
- Evaluate continuously. promptfoo, Braintrust, or home-grown eval harnesses run on PRs.
- Set hard budgets. Per-request token caps, per-user spend limits, per-feature monthly budgets. Without these, a loop bug can burn thousands overnight.
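That last bullet is cheap to implement. A minimal per-user budget guard, sketched here with illustrative names (this is not an SDK feature) and assumed Sonnet-class prices of $3/$15 per million tokens, runs a check before each call and records spend after it:

```python
class BudgetGuard:
    """Illustrative guard: per-request token cap + per-user monthly spend ceiling."""

    def __init__(self, max_tokens_per_request=4096, monthly_user_budget_usd=10.0):
        self.max_tokens_per_request = max_tokens_per_request
        self.monthly_user_budget_usd = monthly_user_budget_usd
        self.spend_by_user = {}  # user_id -> USD spent this month

    def check(self, user_id, requested_max_tokens):
        """Call before the API request; raises instead of letting the call through."""
        if requested_max_tokens > self.max_tokens_per_request:
            raise ValueError(f"max_tokens {requested_max_tokens} exceeds the per-request cap")
        if self.spend_by_user.get(user_id, 0.0) >= self.monthly_user_budget_usd:
            raise RuntimeError(f"user {user_id} is over the monthly budget")

    def record(self, user_id, input_tokens, output_tokens,
               in_price_per_m=3.0, out_price_per_m=15.0):
        """Call after the response; returns the cost of this request in USD."""
        cost = (input_tokens / 1e6 * in_price_per_m
                + output_tokens / 1e6 * out_price_per_m)
        self.spend_by_user[user_id] = self.spend_by_user.get(user_id, 0.0) + cost
        return cost
```

In real code, `record()` would read `response.usage.input_tokens` and `response.usage.output_tokens` after each call; the price defaults are assumptions to adjust per model.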
The honest framing: the Claude API is one of the best-engineered LLM APIs available, and it behaves like a real production service, not a research preview. The operational discipline around it (caching, instrumentation, evals, budgets, retries) is where most teams underinvest. Build the scaffolding once; the API itself is the easy part.
API Architecture
The Claude API is built on a simple REST architecture with a single main endpoint:
```
POST https://api.anthropic.com/v1/messages
```
Each request includes:
- A model (claude-sonnet-4-20250514, claude-opus-4-20250918, etc.)
- Messages (conversation as an array)
- Optional parameters (temperature, max_tokens, tools, etc.)
```python
import anthropic

client = anthropic.Anthropic()  # Uses ANTHROPIC_API_KEY

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain microservices in 3 sentences."}
    ]
)
print(message.content[0].text)
```
Authentication and Configuration
Getting an API Key
1. Create an account on console.anthropic.com
2. Navigate to Settings > API Keys
3. Click Create Key and give it a descriptive name
4. Copy the key (it won't be displayed again)
Configuring the API Key
```bash
# Environment variable (recommended)
export ANTHROPIC_API_KEY="sk-ant-api03-..."

# Or in a .env file
echo 'ANTHROPIC_API_KEY=sk-ant-api03-...' >> .env
```

```python
# Python - Automatic via environment variable
client = anthropic.Anthropic()

# Python - Explicit
client = anthropic.Anthropic(api_key="sk-ant-api03-...")
```

```typescript
// TypeScript - Automatic via environment variable
const client = new Anthropic();

// TypeScript - Explicit
const client = new Anthropic({ apiKey: "sk-ant-api03-..." });
```
The Messages API in Detail
The Messages API is the core of interaction with Claude. Here is the complete structure of a request:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    temperature=0.7,
    system="You are an expert in software architecture.",
    messages=[
        {
            "role": "user",
            "content": "What are the benefits of microservices?"
        },
        {
            "role": "assistant",
            "content": "Microservices offer several key advantages..."
        },
        {
            "role": "user",
            "content": "And the drawbacks?"
        }
    ],
    stop_sequences=["\n\nHuman:"]
)

print(response.content[0].text)
print(f"Tokens: {response.usage.input_tokens} in / {response.usage.output_tokens} out")
```
Key Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| model | string | Model ID to use | Required |
| max_tokens | int | Maximum number of output tokens | Required |
| messages | array | Conversation history | Required |
| system | string | System prompt | None |
| temperature | float | Creativity (0.0 - 1.0) | 1.0 |
| top_p | float | Nucleus sampling | 1.0 |
| top_k | int | Top-K sampling | None |
| stop_sequences | array | Stop sequences | None |
| stream | bool | Enable streaming | false |
| tools | array | Tools available for Claude | None |
| metadata | object | Metadata (e.g., user_id) | None |
Response Structure
```json
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Microservices offer..."
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 42,
    "output_tokens": 156
  }
}
```
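Before trusting `content[0].text`, check `stop_reason`: `"max_tokens"` means the reply was truncated mid-generation. A small helper, sketched over a plain dict mirroring the response shape above (the SDK returns typed objects with the same fields):

```python
def extract_text(response):
    """Concatenate the text blocks of a response, refusing truncated output."""
    # "max_tokens" = generation was cut off; "end_turn" = normal finish.
    if response["stop_reason"] == "max_tokens":
        raise ValueError("Response truncated: raise max_tokens or shorten the prompt")
    return "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )
```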
Streaming
Streaming allows you to display Claude's response in real time, token by token. Essential for interactive user interfaces.
```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a poem about code."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const stream = client.messages.stream({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Write a poem about code." }],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}
```
Streaming Events
| Event | Description |
|---|---|
| message_start | Message start, contains metadata |
| content_block_start | Start of a content block |
| content_block_delta | Text fragment (the actual content) |
| content_block_stop | End of a content block |
| message_delta | Message update (stop_reason, usage) |
| message_stop | End of message |
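The event flow above can be exercised without a network call. This sketch consumes events as plain dicts that mirror the SSE shapes (the real SDK yields typed event objects with the same fields), accumulating text deltas and picking up final usage from message_delta:

```python
def consume_stream(events):
    """Fold a stream of event dicts into (full_text, final_usage)."""
    chunks = []
    usage = None
    for event in events:
        if (event["type"] == "content_block_delta"
                and event["delta"]["type"] == "text_delta"):
            chunks.append(event["delta"]["text"])  # the actual content
        elif event["type"] == "message_delta":
            usage = event.get("usage")  # cumulative usage arrives here
    return "".join(chunks), usage
```

The same dispatch logic applies when iterating the SDK's raw event stream.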
Batch API: Bulk Processing
The Batch API lets you send up to 100,000 requests in a single batch, with a 50% cost reduction and a processing time of up to 24 hours.
```python
import anthropic

client = anthropic.Anthropic()

# Create a batch (batches live under client.messages.batches in the Python SDK)
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-1",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Summarize this article: ..."}
                ]
            }
        },
        {
            "custom_id": "req-2",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Translate this text: ..."}
                ]
            }
        }
    ]
)

# Check status
status = client.messages.batches.retrieve(batch.id)
print(f"Status: {status.processing_status}")

# Retrieve results when ready
if status.processing_status == "ended":
    for result in client.messages.batches.results(batch.id):
        print(f"{result.custom_id}: {result.result.message.content[0].text}")
```
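Batches complete asynchronously, so production code polls rather than checking once. A polling sketch with the retrieve call injected as a plain function, which keeps the loop testable; in real use, pass the SDK's batch-retrieve method:

```python
import time

def wait_for_batch(fetch, batch_id, poll_seconds=60, timeout_seconds=24 * 3600):
    """Poll fetch(batch_id) until processing_status is 'ended' or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = fetch(batch_id)
        if status.processing_status == "ended":
            return status
        time.sleep(poll_seconds)  # be generous; batches can take hours
    raise TimeoutError(f"batch {batch_id} still processing after {timeout_seconds}s")
```

A generous poll interval (minutes, not seconds) is fine here; the processing window is up to 24 hours anyway.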
When to Use the Batch API?
| Use case | Messages API | Batch API |
|---|---|---|
| Real-time chatbot | ✅ | ❌ |
| Analyzing 10,000 documents | ❌ | ✅ |
| Bulk content translation | ❌ | ✅ |
| Support ticket classification | ⚠️ (costly) | ✅ |
| Interactive assistant | ✅ | ❌ |
| Periodic report generation | ⚠️ | ✅ |
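The 50% discount is easy to quantify. A back-of-envelope estimate for the 10,000-document row, assuming roughly 2,000 input and 500 output tokens per document at Sonnet-class prices ($3/$15 per million tokens; adjust to the current pricing table):

```python
def job_cost_usd(n_requests, in_tokens, out_tokens,
                 in_price_per_m=3.0, out_price_per_m=15.0, batch_discount=0.0):
    """Estimated cost of a bulk job; batch_discount=0.5 models the Batch API."""
    per_request = (in_tokens / 1e6 * in_price_per_m
                   + out_tokens / 1e6 * out_price_per_m)
    return n_requests * per_request * (1 - batch_discount)

realtime = job_cost_usd(10_000, 2_000, 500)                     # $135.00
batched = job_cost_usd(10_000, 2_000, 500, batch_discount=0.5)  # $67.50
print(f"Messages API: ${realtime:.2f}  Batch API: ${batched:.2f}")
```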
Official SDKs
Anthropic provides SDKs for the major programming languages:
| SDK | Language | Installation | Maintained by |
|---|---|---|---|
| anthropic | Python | pip install anthropic | Anthropic |
| @anthropic-ai/sdk | TypeScript/JS | npm install @anthropic-ai/sdk | Anthropic |
| anthropic-java | Java | Maven/Gradle | Anthropic |
| anthropic-go | Go | go get github.com/anthropics/anthropic-sdk-go | Anthropic |
| anthropic-ruby | Ruby | gem install anthropic | Anthropic |
Java Example
```java
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.*;

// Reads ANTHROPIC_API_KEY from the environment
AnthropicClient client = AnthropicOkHttpClient.fromEnv();

MessageCreateParams params = MessageCreateParams.builder()
    .model("claude-sonnet-4-20250514")
    .maxTokens(1024)
    .addUserMessage("Hello Claude!")
    .build();

Message message = client.messages().create(params);
System.out.println(message.content().get(0).text());
```
Go Example
```go
package main

import (
	"context"
	"fmt"

	"github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient() // Uses ANTHROPIC_API_KEY
	message, err := client.Messages.New(context.Background(),
		anthropic.MessageNewParams{
			Model:     anthropic.ModelClaudeSonnet4_20250514,
			MaxTokens: 1024,
			Messages: []anthropic.MessageParam{
				anthropic.NewUserMessage(
					anthropic.NewTextBlock("Hello Claude!"),
				),
			},
		},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(message.Content[0].Text)
}
```
Error Handling
The Claude API uses standard HTTP codes and descriptive error messages.
| HTTP Code | Meaning | Recommended Action |
|---|---|---|
| 400 | Invalid request | Check parameters |
| 401 | Invalid API key | Verify your API key |
| 403 | Permission denied | Check model permissions |
| 429 | Rate limit reached | Wait and retry with backoff |
| 500 | Server error | Retry after a few seconds |
| 529 | API overloaded | Retry with exponential backoff |
Robust Error Handling
```python
import anthropic
import time

client = anthropic.Anthropic()

def call_claude_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=messages
            )
        except anthropic.RateLimitError:
            wait = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s...
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
        except anthropic.APIStatusError as e:
            if e.status_code >= 500:
                time.sleep(1)
                continue
            raise
    raise Exception("Maximum number of retries reached")
```
Rate Limits
Rate limits protect API stability and vary based on your usage tier.
| Tier | Requests/min | Input tokens/min | Output tokens/min |
|---|---|---|---|
| Tier 1 (default) | 4,000 | 400,000 | 80,000 |
| Tier 2 | 8,000 | 800,000 | 160,000 |
| Tier 3 | 16,000 | 1,600,000 | 320,000 |
| Tier 4 | 32,000 | 3,200,000 | 640,000 |
Response headers include your current limits:
```
anthropic-ratelimit-requests-limit: 4000
anthropic-ratelimit-requests-remaining: 3999
anthropic-ratelimit-requests-reset: 2026-03-10T12:00:30Z
anthropic-ratelimit-tokens-limit: 400000
anthropic-ratelimit-tokens-remaining: 399800
```
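These headers are worth surfacing in your own metrics. With the Python SDK, raw headers are reachable through the raw-response wrapper (`client.messages.with_raw_response.create(...)` exposes `.headers`, and `.parse()` returns the usual message object). A small parser for the header family, shown here over a plain dict:

```python
def ratelimit_status(headers):
    """Collect anthropic-ratelimit-* headers, converting counts to ints."""
    prefix = "anthropic-ratelimit-"
    status = {}
    for key, value in headers.items():
        if key.lower().startswith(prefix):
            short = key.lower()[len(prefix):]
            # Counts become ints; timestamps (the -reset headers) stay strings.
            status[short] = int(value) if value.isdigit() else value
    return status
```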
Pricing
| Model | Input ($/M tokens) | Output ($/M tokens) | Cache Write | Cache Read |
|---|---|---|---|---|
| Claude Opus 4.6 | 15.00 | 75.00 | 18.75 | 1.50 |
| Claude Sonnet 4 | 3.00 | 15.00 | 3.75 | 0.30 |
| Claude Haiku 3.5 | 0.80 | 4.00 | 1.00 | 0.08 |
Quick calculation: A typical conversation (500 tokens in + 500 tokens out) with Sonnet costs approximately $0.009, less than one cent.
Common Patterns
Multi-Turn Conversation
```python
conversation = []

def chat(user_message):
    conversation.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="You are an expert Python development assistant.",
        messages=conversation
    )
    assistant_message = response.content[0].text
    conversation.append({"role": "assistant", "content": assistant_message})
    return assistant_message

# Usage
print(chat("How do I create a REST API with FastAPI?"))
print(chat("Add JWT authentication."))
print(chat("Now add tests."))
```
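One caveat: the conversation list grows without bound, and input tokens are billed on every turn. A character-based trimming heuristic (rough by design; the API also exposes a token-counting endpoint when exact numbers matter) that always drops whole user/assistant pairs from the front:

```python
def trim_history(conversation, max_chars=20_000):
    """Keep the most recent turns under a rough size budget."""
    trimmed = list(conversation)
    # Drop the oldest user/assistant pair while over budget,
    # always keeping at least the latest exchange.
    while len(trimmed) > 2 and sum(len(m["content"]) for m in trimmed) > max_chars:
        del trimmed[:2]
    return trimmed
```

Call it on the history before each request; dropping whole pairs keeps the roles alternating, which the Messages API expects.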
Structured Output (JSON)
```python
import json

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": """Analyze this text and return a structured JSON:
"The product is excellent, fast delivery but damaged packaging."
Expected format:
{"sentiment": "positive|negative|mixed", "aspects": [...], "score": 0-10}"""
    }]
)

result = json.loads(response.content[0].text)
print(result)
# {"sentiment": "mixed", "aspects": [...], "score": 7}
```
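`json.loads` on the raw text fails whenever Claude wraps the JSON in prose or a markdown fence. A tolerant extractor, as a heuristic sketch; for guaranteed structure, tool use with a JSON schema is the more robust pattern:

```python
import json
import re

def extract_json(text):
    """Pull the first top-level JSON object out of a model response."""
    # Strip a leading/trailing ```json fence if present.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    # Fall back to the outermost braces to skip surrounding prose.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in response")
    return json.loads(text[start:end + 1])
```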
System Prompt with Context
```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    system="""You are an assistant for the "TechShop" e-commerce platform.

Rules:
- Always respond in English
- Only recommend products from the catalog
- If you don't know the answer, redirect to support

Current catalog:
- MacBook Pro M4: $2,499
- iPhone 16 Pro: $1,299
- iPad Air M3: $799""",
    messages=[
        {"role": "user", "content": "Which laptop would you recommend?"}
    ]
)
```
Cloud Access
The Claude API is also available through major cloud providers:
| Platform | Main Advantage | Dedicated Guide |
|---|---|---|
| Amazon Bedrock | Native AWS integration, unified billing | Bedrock Guide |
| Google Vertex AI | Native GCP integration, model garden | Vertex AI Guide |
| Direct API | Immediate access, latest features | This guide |
Guidelines
- Use environment variables for API keys, never hard-code them
- Implement retry with exponential backoff to handle transient errors
- Monitor your usage via the Anthropic console to avoid surprises
- Use streaming for interactive user interfaces
- Prefer the Batch API for bulk processing (50% savings)
- Enable prompt caching for repetitive system prompts
- Choose the right model: Haiku for simple tasks, Sonnet for general use, Opus for complex reasoning
Resources
| Resource | Link |
|---|---|
| Official documentation | docs.anthropic.com |
| Anthropic Console | console.anthropic.com |
| Python SDK | github.com/anthropics/anthropic-sdk-python |
| TypeScript SDK | github.com/anthropics/anthropic-sdk-typescript |
| Cookbook | github.com/anthropics/anthropic-cookbook |
Dorian Laurenceau
Full-Stack Developer & Learning Designer. I spent 4 years as a freelance full-stack developer and 4 years teaching React, JavaScript, HTML/CSS and WordPress to adult learners. Today I design learning paths in web development and AI, grounded in learning science. I founded learn-prompting.fr to make AI practical and accessible, and built the Bluff app to gamify political transparency.
FAQ
How much does the Claude API cost?
Prices vary by model: Claude Opus 4.6 costs $15/M input tokens and $75/M output tokens. Claude Sonnet is $3/M input and $15/M output. Prompt caching reduces costs by up to 90%.
How do I get a Claude API key?
Create an account at console.anthropic.com, go to Settings > API Keys, then generate a new key. Add credits to your account to start using the API.
What is the difference between the Messages API and the Batch API?
The Messages API processes requests in real time (response in seconds). The Batch API processes batches of requests asynchronously, with a 50% cost reduction and up to 24-hour processing time.
What SDKs are available for the Claude API?
Anthropic provides official SDKs for Python, TypeScript/JavaScript, Java, Go, and Ruby. Community SDKs exist for other languages like Rust, PHP, and C#.
What are the Claude API rate limits?
Default rate limits are 4,000 requests/minute and 400,000 tokens/minute for tier 1. You can request an increase through the Anthropic console based on your usage.
How much does a Claude API token cost?
Prices vary by model: Haiku costs $0.80/M input tokens and $4/M output tokens. Sonnet costs $3/M input and $15/M output. Opus 4.6 costs $15/M input and $75/M output. Prompt caching reduces costs by 90% on cached tokens.
Do I need Claude Pro to use the API?
No. The Claude API and Claude Pro subscription are separate products. The API requires an API key (created at console.anthropic.com) and uses pay-per-use pricing. Claude Pro is a monthly subscription for the claude.ai web interface.
How to use the Anthropic API in Python?
Install the SDK with 'pip install anthropic', then create a client with your API key. A basic call: client.messages.create(model='claude-sonnet-4-6-20260610', max_tokens=1024, messages=[{'role': 'user', 'content': 'Your question'}]). See the first API call example above for a full walkthrough.