January 30, 2026 · 8 MIN READ

Temperature & Top-P: Controlling AI Creativity

By Dorian Laurenceau

📅 Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.

Ever noticed how ChatGPT sometimes gives creative, varied responses and other times stays strictly factual? That's not random: it's controlled by two parameters, Temperature and Top-P. Understanding them gives you precise control over AI behavior.

Temperature and top-p in 2026: what the defaults get wrong

Sampling parameters are where a surprising number of production LLM problems originate. The threads on r/LocalLLaMA, r/MachineLearning, and r/ChatGPTPro repeatedly return to the same point: default sampling settings are a compromise that's wrong for many specific tasks. Defaults also vary by provider: the OpenAI API defaults to temperature 1.0 and top-p 1.0, while many tools and tutorials start from temperature 0.7.

What the parameters actually do:

→Temperature scales the token logits before softmax. Higher temperature flattens the distribution (more randomness); lower temperature sharpens it (more deterministic). The OpenAI API reference and Anthropic docs both document this accurately.
→Top-p (nucleus sampling) truncates the distribution. It keeps only the smallest set of most-probable tokens whose cumulative probability reaches p, renormalizes, and samples from those. Introduced by Holtzman et al. 2019.
→They compose. In most APIs, top-p applies after temperature. Using both is allowed but can produce unexpected behaviour if misconfigured.
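
In symbols, the three points above can be sketched as follows. This uses the usual textbook definitions and assumes top-p is applied after temperature; the notation (z_i for logits, T for temperature, p for the top-p threshold, V^{(p)} for the kept set) is ours, not taken from any particular API reference.

```latex
% Temperature: divide each logit z_i by T > 0 before the softmax.
p_i(T) = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}

% Top-p: keep the smallest set V^{(p)} of highest-probability tokens such that
\sum_{i \in V^{(p)}} p_i(T) \ge p

% then renormalize over the kept set and sample from
\tilde{p}_i =
\begin{cases}
  \dfrac{p_i(T)}{\sum_{j \in V^{(p)}} p_j(T)} & \text{if } i \in V^{(p)} \\
  0 & \text{otherwise}
\end{cases}
```

As T shrinks toward 0 the first formula concentrates all probability on the largest logit, and with p = 1 the second step keeps every token.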

What practitioners have settled on:

→Factual recall, classification, extraction: temperature 0.0-0.2, top-p 1.0. You want the model's single best answer, no exploration.
→Code generation: temperature 0.2-0.5. Low enough to stay correct, high enough to recover from dead ends.
→Creative writing: temperature 0.7-1.0, top-p 0.9-0.95. The default "chat" settings happen to be right for this one case.
→Summarisation: temperature 0.2-0.4. Low randomness preserves faithfulness to source.
→Brainstorming and diversity sampling: temperature 1.0+, top-p 0.95. Explicitly high to encourage exploration.

Common mistakes:

→Using default 0.7 for factual tasks. This is the single most common cause of "the model is inconsistent" and "hallucination" complaints in production.
→Setting temperature to 0 and expecting determinism. Most hosted APIs are not fully deterministic even at T=0 due to non-determinism in GPU kernels. OpenAI's seed parameter improves this but doesn't guarantee it.
→Using top-p as a substitute for temperature. They control different things. Top-p filters the tail; temperature reshapes the whole distribution.
→Changing both simultaneously. Change one at a time when tuning; otherwise you can't tell what helped.
→Ignoring model-specific quirks. Some open-source models are very sensitive to temperature in particular ranges; some are much less so. Benchmark on your actual task.

What's often overlooked:

→Reasoning models (o-series, GPT-5 thinking, Claude extended thinking) are less sensitive to these parameters, and some of their APIs restrict them altogether. Their internal reasoning dominates the final-answer distribution. For these models, default settings are usually fine.
→Structured-output modes override some of this. JSON-mode and function-calling constrain the output space, making temperature less impactful for the format itself.
→Streaming latency is independent of temperature. Lowering temperature doesn't speed up the model.

The honest framing: temperature and top-p are not arcane ML parameters — they're the most consequential knobs in the API for output quality. Setting them by task, measuring results, and documenting the rationale is cheap and pays off immediately. Using defaults without thinking is the most common unforced error in LLM production systems.

What Is Temperature?

Temperature controls the randomness of AI responses. It determines how likely the model is to choose unexpected words.

The Scale

Value	Behavior
0.0	Deterministic, Predictable, Focused
0.5	Balanced
1.0	Default, Moderate creativity
2.0	Chaotic, Creative, Random
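
To make the scale concrete, here is a minimal sketch of temperature scaling applied to a toy set of logits. The logits are made up for illustration, and treating temperature 0 as "always pick the top token" is an assumption about how samplers typically special-case it, since dividing by zero is undefined.

```python
import math


def apply_temperature(logits, temperature):
    """Return next-token probabilities after temperature scaling.

    temperature == 0 is treated as greedy decoding (all probability on
    the top logit), because dividing by zero is undefined.
    """
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.0]
for t in (0.0, 0.5, 1.0, 2.0):
    probs = apply_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

With these made-up logits, the top token's share falls from 1.0 at temperature 0 to roughly 0.51 at temperature 2.0, which is the flattening the table describes.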

Low Temperature (0.0 - 0.3)

The AI picks the most probable next word almost every time:

Temperature = 0
"The capital of France is ___"
→ "Paris" (99.9% of the time)

Best for: Factual answers, data extraction, code generation

Medium Temperature (0.4 - 0.7)

Balanced between predictability and variety:

Temperature = 0.5
"Write a greeting"
→ "Hello! How can I help you today?"
→ "Hi there! What brings you here?"
→ "Good day! How may I assist?"

Best for: General writing, emails, documentation

High Temperature (0.8 - 1.5)

More creative, unexpected choices:

Temperature = 1.2
"Write a creative opening"
→ "The moon whispered secrets to the tide..."
→ "Three crows sat on a digital wire..."
→ "Everything changed when the coffee machine became sentient..."

Best for: Creative writing, brainstorming, storytelling

What Is Top-P (Nucleus Sampling)?

Top-P is a different approach: instead of controlling randomness directly, it limits which words the AI can even consider.

How Top-P Works

The AI ranks all possible next words by probability:

Possible words: "Paris" (70%), "Lyon" (15%), "France" (8%), "Marseille" (5%), ...

Top-P = 0.85 → Only considers words until cumulative probability reaches 85%
→ Can choose from: "Paris", "Lyon"
→ Ignores: "France", "Marseille", and everything else
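
The same filtering step can be sketched in a few lines of Python. This is an illustration, not any provider's implementation: the function name `nucleus` is ours, the probabilities are the four words from the example above (the unlisted tail is left out), and the small tolerance is only there to absorb floating-point rounding.

```python
def nucleus(probabilities, top_p):
    """Keep the smallest set of most-likely words whose cumulative
    probability reaches top_p, then renormalize the kept words."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= top_p - 1e-9:  # tolerance for float rounding
            break
    mass = sum(prob for _, prob in kept)
    return {word: prob / mass for word, prob in kept}


# Probabilities from the example above (remaining tail omitted).
words = {"Paris": 0.70, "Lyon": 0.15, "France": 0.08, "Marseille": 0.05}
print(nucleus(words, 0.85))
```

For Top-P = 0.85 this keeps "Paris" and "Lyon", matching the example, and rescales them to about 82% and 18% so the kept options sum to 100%.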

Top-P Values

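0.1 → Very narrow: often only the single most likely word
0.5 → The most likely words covering about 50% of the probability mass
0.9 → The most likely words covering 90% of the probability mass; the unlikely tail is cut
1.0 → All words possible (no filtering; the OpenAI API default)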

Temperature vs Top-P: What's the Difference?

Aspect	Temperature	Top-P
Controls	Selection randomness	Candidate pool size
Mechanism	Scales probabilities	Filters options
Low value	Always pick top choice	Fewer options
High value	More random picks	More options

A Simple Analogy

Imagine picking a restaurant:

Temperature = How adventurous your choice is

→Low: Always pick your favorite
→High: Might try something completely new

Top-P = Which restaurants are even on the list

→Low: Only consider top-rated places
→High: Consider any restaurant in town

Common Use Cases

Factual Q&A / Data Extraction

Temperature: 0.0 - 0.2
Top-P: 0.9 (or even lower)

You want consistency and accuracy:

"Extract the date from: Meeting scheduled for March 15, 2025"
→ Should always return "March 15, 2025"

Professional Writing

Temperature: 0.4 - 0.6
Top-P: 0.85 - 0.95

Balance quality with some variety:

"Draft a professional email declining a meeting request"
→ Natural variation while staying appropriate

Creative Writing

Temperature: 0.8 - 1.2
Top-P: 0.95 - 1.0

Encourage novelty and surprise:

"Write a creative story opening about time travel"
→ Unique, unexpected approaches

Code Generation

Temperature: 0.0 - 0.2
Top-P: 0.9

Code needs to be correct, not creative:

"Write a Python function to calculate factorial"
→ Standard, working implementation

Brainstorming

Temperature: 1.0 - 1.5
Top-P: 0.95

Maximize variety and unexpected ideas:

"Give me 10 creative product name ideas"
→ Wild, diverse suggestions

The Temperature/Top-P Matrix

	Low Top-P (<0.5)	High Top-P (>0.9)
Low Temp (0-0.3)	Very focused, repetitive	Focused with slight variation
High Temp (0.8+)	Somewhat creative	Highly creative, unpredictable
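
One way to get a feel for the four quadrants is a toy simulation: sample many times from a fixed set of logits and count how many distinct tokens ever appear. The sketch below assumes top-p is applied after temperature; the 10-token logits, the function names, and the specific settings are all made up for illustration.

```python
import math
import random


def sample_token(logits, temperature, top_p, rng):
    """Sample one token index: temperature scaling, then top-p filtering."""
    scaled = [x / temperature for x in logits]  # temperature must be > 0 here
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    ranked = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)
    kept, cumulative = [], 0.0
    for prob, index in ranked:
        kept.append((prob, index))
        cumulative += prob
        if cumulative >= top_p:
            break
    # random.choices renormalizes the kept weights for us.
    return rng.choices([i for _, i in kept], weights=[p for p, _ in kept], k=1)[0]


def distinct_tokens(logits, temperature, top_p, draws=1000, seed=0):
    """Count how many different tokens show up across repeated draws."""
    rng = random.Random(seed)
    return len({sample_token(logits, temperature, top_p, rng) for _ in range(draws)})


# Hypothetical logits for a 10-token vocabulary, most likely first.
logits = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0]
for temperature in (0.3, 1.5):
    for top_p in (0.4, 0.99):
        print(temperature, top_p, distinct_tokens(logits, temperature, top_p))
```

In this toy setup the low/low corner only ever produces one token, while the high/high corner reaches most of the vocabulary. Real models have far larger vocabularies, so treat the counts as a qualitative picture only.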

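Defaults vary by provider: the OpenAI API defaults to Temperature: 1.0, Top-P: 1.0, while many tools and tutorials start from Temperature: 0.7, Top-P: 0.9. Check your provider's documentation.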

Practical Tips

1. Adjust One at a Time

Don't change both simultaneously. If you do, you can't tell which change caused the effect:

Step 1: Hold Top-P fixed (for example 0.9, or 1.0 for no filtering)
Step 2: Adjust Temperature to find sweet spot

2. Match to Task Criticality

High stakes (legal, medical) → Low temperature
Low stakes (brainstorming) → Higher temperature

3. Test with the Same Prompt

Run the same prompt 5 times to see consistency:

Temperature 0.0 → Same or nearly the same output 5/5 times (hosted APIs are not always fully deterministic)
Temperature 0.7 → Similar outputs with variation
Temperature 1.2 → Very different each time
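
This check is easy to script. The sketch below assumes a hypothetical `generate(prompt)` callable that wraps your model call with the settings under test; the report fields are our own naming.

```python
from collections import Counter


def consistency_report(generate, prompt, runs=5):
    """Call generate(prompt) several times and summarize how often outputs repeat."""
    outputs = [generate(prompt) for _ in range(runs)]
    counts = Counter(outputs)
    most_common_output, frequency = counts.most_common(1)[0]
    return {
        "runs": runs,
        "distinct_outputs": len(counts),
        "agreement": frequency / runs,  # share of runs matching the top output
        "most_common": most_common_output,
    }


# Stand-in for a real model call: always returns the same answer.
report = consistency_report(lambda prompt: "Paris", "What is the capital of France?")
print(report)
```

An agreement of 1.0 means every run matched. For long free-form text, exact string matching is stricter than you may want, so a similarity measure could replace the equality check.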

4. Document Your Settings

When you find settings that work, save them:

{
  "use_case": "Customer support responses",
  "temperature": 0.3,
  "top_p": 0.9,
  "notes": "Professional, consistent tone"
}
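
If the saved settings are loaded by code, a little validation catches typos early. The sketch below is one way to do it: the `SamplingPreset` class is our own, and the accepted ranges (temperature 0 to 2, Top-P above 0 up to 1) follow the scale used in this article. Some providers accept a narrower temperature range, so adjust the bounds to your API.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPreset:
    """A documented sampling configuration for one use case."""
    use_case: str
    temperature: float
    top_p: float
    notes: str = ""

    def __post_init__(self):
        # Bounds are assumptions based on the scale in this article.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError(f"temperature out of range: {self.temperature}")
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError(f"top_p out of range: {self.top_p}")


saved = """
{
  "use_case": "Customer support responses",
  "temperature": 0.3,
  "top_p": 0.9,
  "notes": "Professional, consistent tone"
}
"""
preset = SamplingPreset(**json.loads(saved))
print(preset)
```

Keeping the notes field next to the numbers preserves the rationale when someone revisits the settings later.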

Common Mistakes

1. Temperature Too High for Facts

Temperature: 1.5
"What year was the Eiffel Tower built?"
→ "1889" or "1887" or "around 1890" 😕

2. Temperature Too Low for Creativity

Temperature: 0.0
"Write a creative story"
→ Same generic story every time

3. Ignoring These Settings Entirely

Default values work often, but not always. Tune them for your use case.

Key Takeaways

→Temperature controls response randomness (0.0 = focused, 1.0+ = creative)
→Top-P filters which words are even considered
→Low settings for facts, code, extraction
→High settings for creativity, brainstorming
→Test and tune for your specific use case

Ready to Master LLM Parameters?

This article covered the what and why of Temperature and Top-P. But effective AI applications require understanding the full range of parameters and techniques.

In our Module 1, Fundamentals of Prompt Engineering, you'll learn:

→Complete parameter reference (Temperature, Top-P, Max Tokens)
→How token prediction actually works
→Context window management
→Practical configuration for different use cases

→ Explore Module 1: Fundamentals

Dorian Laurenceau

Full-Stack Developer & Learning Designer

Full-stack web developer and learning designer. I spent 4 years as a freelance full-stack developer and 4 years teaching React, JavaScript, HTML/CSS and WordPress to adult learners. Today I design learning paths in web development and AI, grounded in learning science. I founded learn-prompting.fr to make AI practical and accessible, and built the Bluff app to gamify political transparency.

Prompt Engineering · LLMs · Full-Stack Development · Learning Design · React

Published: January 30, 2026 · Updated: April 24, 2026

FAQ

What is temperature in AI models?

Temperature controls randomness in AI outputs. Low temperature (0-0.3) makes responses focused and deterministic. High temperature (0.7-1.0) makes outputs more creative and varied.

What is Top-P (nucleus sampling)?

Top-P limits which tokens the model considers. Top-P of 0.9 means the model picks from the most likely tokens covering 90% of the probability mass, excluding unlikely options. It complements temperature rather than replacing it.

Should I use temperature or Top-P?

You can set both, but tune one at a time. Temperature is more intuitive for most users; Top-P trims the unlikely tail. For factual tasks, use low temperature (0.1-0.3). For creative tasks, use higher values (0.7-0.9).

What settings should I use for different tasks?

Code/math: temperature 0-0.2. Factual Q&A: 0.1-0.3. Business writing: 0.3-0.5. Creative writing: 0.7-0.9. Brainstorming: 0.9-1.0. Always test for your specific use case.