
Claude API: Practical Guide with Python & TypeScript (2026)

By Dorian Laurenceau

Claude API: Complete Guide, Messages, Batch, SDKs & Integrations

📅 Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.

📚 Specialized guides: Tool Use | Extended Thinking | Vision | Computer Use | Evals | Bedrock | Vertex AI | Prompt Caching | Hackathon


Why Use the Claude API?

The Claude API is the most direct way to integrate Anthropic's artificial intelligence into your applications. Unlike the claude.ai web interface, the API gives you full control over:

  • The model used (Opus, Sonnet, Haiku)
  • Generation parameters (temperature, max tokens, stop sequences)
  • Advanced features (tool use, vision, extended thinking, streaming)
  • Integration architecture (real-time, batch, webhooks)

Claude API in production: what actually matters

The Claude API is solid, well-documented, and shipping-ready; it also has operational gotchas that the marketing pages don't surface. The threads on r/ClaudeAI, r/LocalLLaMA, r/ChatGPTCoding, and r/ExperiencedDevs cover what teams actually hit.

What the Anthropic API docs get right:

  • Clean request/response shape. The Messages API is simpler than OpenAI's Chat Completions in a few useful ways (system parameter is top-level, content blocks are explicit).
  • Prompt caching is production-ready and genuinely reduces cost for long-context workflows. Measure before and after; the savings compound.
  • Message batching for non-urgent workloads is a 50% discount most teams don't use and should.
  • Tool use is first-class and well-specified.
  • Extended thinking on Claude 3.7+ gives visible reasoning for evaluation and debugging.

What catches teams in production:

  • Rate limits per organisation, not per key. Heavy workloads need enterprise tiers or Bedrock / Vertex AI for higher quotas.
  • Token counting differs from OpenAI's. The tokenizer is documented, but cost estimates copy-pasted from OpenAI-land will be off.
  • Streaming backpressure. Long streaming responses need proper SSE handling; buffering at proxies (Cloudflare, NGINX) breaks streaming in subtle ways.
  • Retries and idempotency. Implement exponential backoff; the official SDK handles most cases, but batched workflows need extra care.
  • Content filter ambiguity. Some safety refusals are hard to distinguish from legitimate "I don't know" responses without inspection. Log raw responses for diagnosis.
  • No built-in embeddings endpoint. Pair with Voyage AI, OpenAI embeddings, or Cohere for RAG.

What production teams actually do:

  • Use the official SDKs (Python, TypeScript, Go). Hand-rolled HTTP calls miss retry/streaming/caching logic.
  • Abstract the provider with LiteLLM or similar so switching to Bedrock, Vertex, or another vendor doesn't require code changes.
  • Instrument everything. Langfuse, Helicone, LangSmith, or PostHog LLM analytics make debugging and cost attribution tractable.
  • Cache aggressively. Pair prompt caching with request-level caching (Redis, Cloudflare KV) for idempotent prompts.
  • Evaluate continuously. promptfoo, Braintrust, or home-grown eval harnesses run on PRs.
  • Set hard budgets. Per-request token caps, per-user spend limits, per-feature monthly budgets. Without these, a loop bug can burn thousands overnight.

The honest framing: the Claude API is one of the best-engineered LLM APIs available, and it behaves like a real production service, not a research preview. The operational discipline around it (caching, instrumentation, evals, budgets, retries) is where most teams underinvest. Build the scaffolding once; the API itself is the easy part.
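The budget point deserves a sketch. Here is a minimal, hypothetical in-process guard (the class name, limits, and reset policy are illustrative, not part of any SDK) that caps per-request tokens and accumulates spend:

```python
# Hypothetical budget guard - illustrative names, not part of the Anthropic SDK.
class BudgetGuard:
    def __init__(self, max_tokens_per_request: int = 4096, max_daily_usd: float = 50.0):
        self.max_tokens_per_request = max_tokens_per_request
        self.max_daily_usd = max_daily_usd
        self.spent_usd = 0.0

    def check_request(self, max_tokens: int) -> None:
        # Reject over-sized requests before they ever reach the API.
        if max_tokens > self.max_tokens_per_request:
            raise ValueError(
                f"max_tokens={max_tokens} exceeds cap {self.max_tokens_per_request}"
            )

    def record(self, cost_usd: float) -> None:
        # Accumulate spend (computed from response.usage) after each call.
        self.spent_usd += cost_usd
        if self.spent_usd > self.max_daily_usd:
            raise RuntimeError(f"daily budget of ${self.max_daily_usd} exceeded")
```

In practice the guard sits behind your request wrapper and resets on a schedule; per-user and per-feature variants follow the same shape.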

API Architecture

The Claude API is built on a simple REST architecture with a single main endpoint:

POST https://api.anthropic.com/v1/messages

Each request includes:

  1. A model (claude-sonnet-4-20250514, claude-opus-4-20250918, etc.)
  2. Messages (conversation as an array)
  3. Optional parameters (temperature, max_tokens, tools, etc.)

A minimal first call in Python:

import anthropic

client = anthropic.Anthropic()  # Uses ANTHROPIC_API_KEY

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain microservices in 3 sentences."}
    ]
)
print(message.content[0].text)

Authentication and Configuration

Getting an API Key

  1. Create an account on console.anthropic.com
  2. Navigate to Settings > API Keys
  3. Click Create Key and give it a descriptive name
  4. Copy the key (it won't be displayed again)

Configuring the API Key

# Environment variable (recommended)
export ANTHROPIC_API_KEY="sk-ant-api03-..."

# Or in a .env file
echo 'ANTHROPIC_API_KEY=sk-ant-api03-...' >> .env

# Python - Automatic via environment variable
client = anthropic.Anthropic()

# Python - Explicit
client = anthropic.Anthropic(api_key="sk-ant-api03-...")

// TypeScript - Automatic via environment variable
const client = new Anthropic();

// TypeScript - Explicit
const client = new Anthropic({ apiKey: "sk-ant-api03-..." });

The Messages API in Detail

The Messages API is the core of interaction with Claude. Here is the complete structure of a request:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    temperature=0.7,
    system="You are an expert in software architecture.",
    messages=[
        {
            "role": "user",
            "content": "What are the benefits of microservices?"
        },
        {
            "role": "assistant",
            "content": "Microservices offer several key advantages..."
        },
        {
            "role": "user",
            "content": "And the drawbacks?"
        }
    ],
    stop_sequences=["\n\nHuman:"]
)

print(response.content[0].text)
print(f"Tokens: {response.usage.input_tokens} in / {response.usage.output_tokens} out")

Key Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| model | string | Model ID to use | Required |
| max_tokens | int | Maximum number of output tokens | Required |
| messages | array | Conversation history | Required |
| system | string | System prompt | None |
| temperature | float | Creativity (0.0 - 1.0) | 1.0 |
| top_p | float | Nucleus sampling | 1.0 |
| top_k | int | Top-K sampling | None |
| stop_sequences | array | Stop sequences | None |
| stream | bool | Enable streaming | false |
| tools | array | Tools available for Claude | None |
| metadata | object | Metadata (e.g., user_id) | None |

Response Structure

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Microservices offer..."
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 42,
    "output_tokens": 156
  }
}
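Two small helpers over a response in this shape, shown on plain dicts to stay self-contained (the SDK returns typed objects exposing the same fields):

```python
def response_text(response: dict) -> str:
    # A response can carry several content blocks; join only the text ones.
    return "".join(b["text"] for b in response["content"] if b["type"] == "text")

def is_truncated(response: dict) -> bool:
    # stop_reason "max_tokens" means the reply was cut off mid-thought;
    # "end_turn" means Claude finished on its own.
    return response["stop_reason"] == "max_tokens"
```

Checking `stop_reason` before trusting an answer is cheap insurance: a truncated JSON reply, for example, will fail to parse for no obvious reason otherwise.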

Streaming

Streaming allows you to display Claude's response in real time, token by token. Essential for interactive user interfaces.

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a poem about code."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const stream = client.messages.stream({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Write a poem about code." }],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}

Streaming Events

| Event | Description |
| --- | --- |
| message_start | Message start, contains metadata |
| content_block_start | Start of a content block |
| content_block_delta | Text fragment (the actual content) |
| content_block_stop | End of a content block |
| message_delta | Message update (stop_reason, usage) |
| message_stop | End of message |
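The same event flow, sketched offline on plain dicts (real SDK events are typed objects carrying the same `type` fields):

```python
def accumulate_text(events) -> str:
    # Only content_block_delta events with a text_delta carry visible output;
    # the other event types are lifecycle markers and metadata.
    parts = []
    for event in events:
        if event["type"] == "content_block_delta" and event["delta"]["type"] == "text_delta":
            parts.append(event["delta"]["text"])
    return "".join(parts)
```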

Batch API: Bulk Processing

The Batch API lets you send up to 100,000 requests in a single batch, with a 50% cost reduction and a processing time of up to 24 hours.

import anthropic

client = anthropic.Anthropic()

# Create a batch
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-1",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Summarize this article: ..."}
                ]
            }
        },
        {
            "custom_id": "req-2",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": "Translate this text: ..."}
                ]
            }
        }
    ]
)

# Check status
status = client.messages.batches.retrieve(batch.id)
print(f"Status: {status.processing_status}")

# Retrieve results when ready
if status.processing_status == "ended":
    for result in client.messages.batches.results(batch.id):
        if result.result.type == "succeeded":
            print(f"{result.custom_id}: {result.result.message.content[0].text}")
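Status polling in a loop, as a sketch (the interval and timeout values are arbitrary; for real 24-hour jobs, persist the batch ID and poll from a scheduled task rather than a blocking loop):

```python
import time

def wait_for_batch(client, batch_id: str, poll_seconds: float = 60,
                   timeout_seconds: float = 86400):
    # Block until the batch ends or the timeout expires.
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        time.sleep(poll_seconds)
    raise TimeoutError(f"batch {batch_id} still running after {timeout_seconds}s")
```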

When to Use the Batch API?

| Use case | Messages API | Batch API |
| --- | --- | --- |
| Real-time chatbot | ✅ | ❌ |
| Analyzing 10,000 documents | ❌ | ✅ |
| Bulk content translation | ❌ | ✅ |
| Support ticket classification | ⚠️ (costly) | ✅ |
| Interactive assistant | ✅ | ❌ |
| Periodic report generation | ⚠️ | ✅ |

Official SDKs

Anthropic provides SDKs for the major programming languages:

| SDK | Language | Installation | Maintained by |
| --- | --- | --- | --- |
| anthropic | Python | pip install anthropic | Anthropic |
| @anthropic-ai/sdk | TypeScript/JS | npm install @anthropic-ai/sdk | Anthropic |
| anthropic-java | Java | Maven/Gradle | Anthropic |
| anthropic-go | Go | go get github.com/anthropics/anthropic-sdk-go | Anthropic |
| anthropic-ruby | Ruby | gem install anthropic | Anthropic |

Java Example

import com.anthropic.client.AnthropicClient;
import com.anthropic.models.*;

AnthropicClient client = AnthropicClient.builder()
    .apiKey(System.getenv("ANTHROPIC_API_KEY"))
    .build();

MessageCreateParams params = MessageCreateParams.builder()
    .model("claude-sonnet-4-20250514")
    .maxTokens(1024)
    .addUserMessage("Hello Claude!")
    .build();

Message message = client.messages().create(params);
System.out.println(message.content().get(0).text());

Go Example

package main

import (
    "context"
    "fmt"
    "github.com/anthropics/anthropic-sdk-go"
)

func main() {
    client := anthropic.NewClient()

    message, err := client.Messages.New(context.Background(),
        anthropic.MessageNewParams{
            Model:     anthropic.ModelClaudeSonnet4_20250514,
            MaxTokens: 1024,
            Messages: []anthropic.MessageParam{
                anthropic.NewUserMessage(
                    anthropic.NewTextBlock("Hello Claude!"),
                ),
            },
        },
    )
    if err != nil {
        panic(err)
    }
    fmt.Println(message.Content[0].Text)
}

Error Handling

The Claude API uses standard HTTP codes and descriptive error messages.

| HTTP Code | Meaning | Recommended Action |
| --- | --- | --- |
| 400 | Invalid request | Check parameters |
| 401 | Invalid API key | Verify your API key |
| 403 | Permission denied | Check model permissions |
| 429 | Rate limit reached | Wait and retry with backoff |
| 500 | Server error | Retry after a few seconds |
| 529 | API overloaded | Retry with exponential backoff |

Robust Error Handling

import anthropic
import time

client = anthropic.Anthropic()

def call_claude_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=messages
            )
        except anthropic.RateLimitError:
            wait = 2 ** attempt  # Exponential backoff
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
        except anthropic.APIStatusError as e:
            if e.status_code >= 500:
                time.sleep(1)
                continue
            raise
    raise Exception("Maximum number of retries reached")

Rate Limits

Rate limits protect API stability and vary based on your usage tier.

| Tier | Requests/min | Input tokens/min | Output tokens/min |
| --- | --- | --- | --- |
| Tier 1 (default) | 4,000 | 400,000 | 80,000 |
| Tier 2 | 8,000 | 800,000 | 160,000 |
| Tier 3 | 16,000 | 1,600,000 | 320,000 |
| Tier 4 | 32,000 | 3,200,000 | 640,000 |

Response headers include your current limits:

anthropic-ratelimit-requests-limit: 4000
anthropic-ratelimit-requests-remaining: 3999
anthropic-ratelimit-requests-reset: 2026-03-10T12:00:30Z
anthropic-ratelimit-tokens-limit: 400000
anthropic-ratelimit-tokens-remaining: 399800
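A sketch of turning those headers into a pause decision; the `Z`-to-`+00:00` substitution is a defensive workaround for older `datetime.fromisoformat` versions:

```python
from datetime import datetime, timezone

def seconds_until_reset(headers: dict) -> float:
    # If requests remain in the current window, there is nothing to wait for.
    if int(headers["anthropic-ratelimit-requests-remaining"]) > 0:
        return 0.0
    # Otherwise wait until the reset timestamp (ISO 8601, UTC).
    reset = datetime.fromisoformat(
        headers["anthropic-ratelimit-requests-reset"].replace("Z", "+00:00")
    )
    return max(0.0, (reset - datetime.now(timezone.utc)).total_seconds())
```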

Pricing

| Model | Input ($/M tokens) | Output ($/M tokens) | Cache Write | Cache Read |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | 15.00 | 75.00 | 18.75 | 1.50 |
| Claude Sonnet 4 | 3.00 | 15.00 | 3.75 | 0.30 |
| Claude Haiku 3.5 | 0.80 | 4.00 | 1.00 | 0.08 |

Quick calculation: A typical conversation (500 tokens in + 500 tokens out) with Sonnet costs approximately $0.009, less than one cent.
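The same arithmetic as a reusable helper, using the rates from the table above (the dictionary keys are shorthand labels for readability, not API model IDs):

```python
# Prices in $ per million tokens, taken from the pricing table above.
PRICES = {
    "opus-4.6":  {"input": 15.00, "output": 75.00},
    "sonnet-4":  {"input": 3.00,  "output": 15.00},
    "haiku-3.5": {"input": 0.80,  "output": 4.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Linear in both directions: tokens * rate / 1M.
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```

`estimate_cost("sonnet-4", 500, 500)` reproduces the $0.009 figure quoted above.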

Common Patterns

Multi-Turn Conversation

conversation = []

def chat(user_message):
    conversation.append({"role": "user", "content": user_message})
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="You are an expert Python development assistant.",
        messages=conversation
    )
    
    assistant_message = response.content[0].text
    conversation.append({"role": "assistant", "content": assistant_message})
    return assistant_message

# Usage
print(chat("How do I create a REST API with FastAPI?"))
print(chat("Add JWT authentication."))
print(chat("Now add tests."))
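Unbounded history eventually overflows the context window. A naive trim helps; the 4-characters-per-token ratio is a rough assumption, so use a real token counter when exact budgets matter:

```python
def trim_history(conversation: list, max_tokens: int = 150_000) -> list:
    # Rough estimate: ~4 characters per token.
    def approx_tokens(msg):
        return len(msg["content"]) // 4 + 1

    trimmed = list(conversation)
    total = sum(approx_tokens(m) for m in trimmed)
    # Drop the oldest messages first; always keep the most recent one.
    while total > max_tokens and len(trimmed) > 1:
        total -= approx_tokens(trimmed.pop(0))
    return trimmed
```

A production variant would drop whole user/assistant pairs (so the history never starts with an assistant turn) or summarize the evicted turns instead of discarding them.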

Structured Output (JSON)

import json

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": """Analyze this text and return a structured JSON:
        
        "The product is excellent, fast delivery but damaged packaging."
        
        Expected format:
        {"sentiment": "positive|negative|mixed", "aspects": [...], "score": 0-10}"""
    }]
)

result = json.loads(response.content[0].text)
print(result)
# {"sentiment": "mixed", "aspects": [...], "score": 7}
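The bare `json.loads` call above assumes the reply is raw JSON; models sometimes wrap it in a markdown fence, so a slightly defensive parser is safer:

```python
import json
import re

def parse_json_reply(text: str):
    # Strip an optional ```json ... ``` fence before parsing.
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)
```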

System Prompt with Context

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    system="""You are an assistant for the "TechShop" e-commerce platform.

Rules:
- Always respond in English
- Only recommend products from the catalog
- If you don't know the answer, redirect to support

Current catalog:
- MacBook Pro M4: $2,499
- iPhone 16 Pro: $1,299
- iPad Air M3: $799""",
    messages=[
        {"role": "user", "content": "Which laptop would you recommend?"}
    ]
)

Cloud Access

The Claude API is also available through major cloud providers:

| Platform | Main Advantage | Dedicated Guide |
| --- | --- | --- |
| Amazon Bedrock | Native AWS integration, unified billing | Bedrock Guide |
| Google Vertex AI | Native GCP integration, model garden | Vertex AI Guide |
| Direct API | Immediate access, latest features | This guide |

Guidelines

  1. Use environment variables for API keys, never hard-code them
  2. Implement retry with exponential backoff to handle transient errors
  3. Monitor your usage via the Anthropic console to avoid surprises
  4. Use streaming for interactive user interfaces
  5. Prefer the Batch API for bulk processing (50% savings)
  6. Enable prompt caching for repetitive system prompts
  7. Choose the right model: Haiku for simple tasks, Sonnet for general use, Opus for complex reasoning
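Guideline 6 in practice: the documented way to mark a stable system prompt as cacheable is a `cache_control` block of type `"ephemeral"`. Sketched here as a request-parameter dict you would pass to `client.messages.create(**params)`:

```python
params = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    # The system prompt becomes a list of content blocks so that one block
    # can carry cache_control; on subsequent calls the unchanged prefix is
    # billed at the cache-read rate shown in the pricing table.
    "system": [
        {
            "type": "text",
            "text": "Long, stable system prompt reused across requests...",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Per-request question"}],
}
```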



Dorian Laurenceau

Full-Stack Developer & Learning Designer

Full-stack web developer and learning designer. I spent 4 years as a freelance full-stack developer and 4 years teaching React, JavaScript, HTML/CSS and WordPress to adult learners. Today I design learning paths in web development and AI, grounded in learning science. I founded learn-prompting.fr to make AI practical and accessible, and built the Bluff app to gamify political transparency.

Prompt Engineering · LLMs · Full-Stack Development · Learning Design · React
Published: March 10, 2026 · Updated: April 24, 2026

FAQ

How much does the Claude API cost?

Prices vary by model: Claude Opus 4.6 costs $15/M input tokens and $75/M output tokens. Claude Sonnet is $3/M input and $15/M output. Prompt caching reduces costs by up to 90%.

How do I get a Claude API key?

Create an account at console.anthropic.com, go to Settings > API Keys, then generate a new key. Add credits to your account to start using the API.

What is the difference between the Messages API and the Batch API?

The Messages API processes requests in real time (response in seconds). The Batch API processes batches of requests asynchronously, with a 50% cost reduction and up to 24-hour processing time.

What SDKs are available for the Claude API?

Anthropic provides official SDKs for Python, TypeScript/JavaScript, Java, Go, and Ruby. Community SDKs exist for other languages like Rust, PHP, and C#.

What are the Claude API rate limits?

Default rate limits are 4,000 requests/minute and 400,000 tokens/minute for tier 1. You can request an increase through the Anthropic console based on your usage.

How much does a Claude API token cost?

Prices vary by model: Haiku 4.5 costs $0.80/M input tokens and $4/M output tokens. Sonnet 4.6 costs $3/M input and $15/M output. Opus 4.6 costs $15/M input and $75/M output. Prompt caching reduces costs by 90% on cached tokens.

Do I need Claude Pro to use the API?

No. The Claude API and Claude Pro subscription are separate products. The API requires an API key (created at console.anthropic.com) and uses pay-per-use pricing. Claude Pro is a monthly subscription for the claude.ai web interface.

How do I use the Anthropic API in Python?

Install the SDK with 'pip install anthropic', then create a client with your API key. A basic call: client.messages.create(model='claude-sonnet-4-6-20260610', max_tokens=1024, messages=[{'role': 'user', 'content': 'Your question'}]). See the API Architecture section above for a full example.