Beyond basic prompting lies a rich set of techniques that significantly improve LLM reasoning, accuracy, and reliability. These aren't tricks; they're systematic approaches backed by research that exploit how Transformers actually process information.
1. Chain-of-Thought (CoT) Prompting
Best for: math problems, logical reasoning, multi-step deductions
CoT prompting instructs the model to show its reasoning step-by-step before giving the final answer. This dramatically improves accuracy on tasks requiring multi-step reasoning.
Zero-Shot CoT: just add "think step by step"
Q: A store has 120 apples. They sell 35% on Monday and 25% of the remainder on Tuesday. How many are left?
Think through this step by step before giving the final answer.
Few-Shot CoT: show reasoning examples
Q: Roger has 5 tennis balls. He buys 2 cans of 3. How many does he have?
A: Roger starts with 5. 2 cans × 3 balls = 6 new balls. 5 + 6 = 11. Answer: 11.
Q: A juggler has 16 balls. Half are golf balls. Of the golf balls, half are blue. How many blue golf balls?
A: 16 ÷ 2 = 8 golf balls. 8 ÷ 2 = 4 blue golf balls. Answer: 4.
Q: [Your problem here]
A:
Why it works: By generating intermediate reasoning tokens, the model gets more "thinking space": each step can inform the next, reducing errors that accumulate in single-step generation.
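The two prompt styles above can be assembled programmatically. The following is a minimal sketch, not a definitive implementation: the helper names (`zero_shot_cot`, `few_shot_cot`, `extract_answer`) are hypothetical, and the parser assumes the model ends its reasoning with an `Answer:` marker as in the few-shot examples.

```python
import re

# Worked examples taken from the few-shot prompt in this section.
FEW_SHOT_EXAMPLES = [
    ("Roger has 5 tennis balls. He buys 2 cans of 3. How many does he have?",
     "Roger starts with 5. 2 cans × 3 balls = 6 new balls. 5 + 6 = 11. Answer: 11."),
    ("A juggler has 16 balls. Half are golf balls. Of the golf balls, half are blue. "
     "How many blue golf balls?",
     "16 ÷ 2 = 8 golf balls. 8 ÷ 2 = 4 blue golf balls. Answer: 4."),
]


def zero_shot_cot(question):
    """Append the step-by-step instruction to a bare question."""
    return f"Q: {question}\nThink through this step by step before giving the final answer."


def few_shot_cot(question, examples=FEW_SHOT_EXAMPLES):
    """Prefix the question with worked Q/A pairs and leave the last A: open."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {question}\nA:")
    return "\n".join(blocks)


def extract_answer(completion):
    """Return the text after the last 'Answer:' marker, or None if absent."""
    matches = re.findall(r"Answer:\s*([^\n]+)", completion)
    return matches[-1].strip().rstrip(".") if matches else None


prompt = few_shot_cot("A store has 120 apples. They sell 35% on Monday. How many are left?")
parsed = extract_answer("Roger starts with 5. 5 + 6 = 11. Answer: 11.")
```

Keeping the answer marker consistent across examples is what makes the final answer easy to pull out of a long reasoning trace.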
2. Self-Consistency
Best for: math, factual questions, decisions with a correct answer
Instead of one generation, generate multiple reasoning chains with higher temperature, then take the majority answer. This is like asking several smart people the same question and going with the consensus.
Implementation (pseudocode)
responses = []
for i in range(5):  # generate 5 independent reasoning chains
    response = llm(prompt, temperature=0.8)
    responses.append(extract_final_answer(response))
final_answer = majority_vote(responses)  # most common answer wins
Self-consistency can improve accuracy on reasoning benchmarks by 10–20%, at the cost of 5× the API calls when sampling five chains. It is worth it for high-stakes decisions.
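A runnable version of the idea might look like the sketch below. It assumes each completion ends with an `Answer:` marker, and it uses a hypothetical `scripted_llm` stand-in that replays canned completions so the example runs without an API; in practice `llm` would be your sampled model call.

```python
import re
from collections import Counter


def extract_final_answer(response):
    """Return the text after the last 'Answer:' marker, or None if absent."""
    matches = re.findall(r"Answer:\s*([^\n]+)", response)
    return matches[-1].strip().rstrip(".") if matches else None


def majority_vote(answers):
    """Return (most common answer, share of all samples that agree with it)."""
    counted = Counter(a for a in answers if a is not None)
    if not counted:
        return None, 0.0
    answer, votes = counted.most_common(1)[0]  # ties go to the answer seen first
    return answer, votes / len(answers)


def self_consistent_answer(llm, prompt, n=5, temperature=0.8):
    """Sample n reasoning chains and vote on their final answers."""
    answers = [extract_final_answer(llm(prompt, temperature=temperature)) for _ in range(n)]
    return majority_vote(answers)


def scripted_llm(completions):
    """Hypothetical stand-in for a model: replays canned completions in order."""
    remaining = iter(completions)
    return lambda prompt, temperature=0.0: next(remaining)


llm = scripted_llm([
    "5 + 6 = 11. Answer: 11.",
    "5 + 2 + 3 = 10. Answer: 10.",
    "2 cans of 3 is 6. 5 + 6 = 11. Answer: 11.",
    "I am not sure.",
    "Answer: 11",
])
answer, agreement = self_consistent_answer(llm, "Q: ...", n=5)
```

The agreement share is a useful by-product: a low value signals that the chains disagree and the answer deserves a closer look.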
3. Tree of Thoughts (ToT)
Best for: creative planning, complex problem solving, strategic decisions
ToT generalizes CoT by exploring multiple reasoning branches simultaneously and using the model to evaluate which branches are most promising, like a search algorithm over thought space.
ToT Prompt Structure
Problem: [describe the problem]
Generate 3 different high-level approaches to solve this. For each approach, rate its feasibility (1-10) and identify the biggest risk.
Then, take the highest-rated approach and generate 3 sub-approaches for its first step. Evaluate each and select the best one to continue exploring.
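The same propose, rate, and expand loop can be driven from code instead of a single prompt. The sketch below is one simple way to do it (a breadth-limited search that keeps only the top-rated path at each level), not the canonical ToT algorithm. `propose` and `score` are hypothetical callables that would each wrap a model call; here they read from hard-coded tables so the example is deterministic.

```python
def tree_of_thoughts(problem, propose, score, breadth=3, keep=1, depth=2):
    """Expand each kept path into `breadth` thoughts, rate them, keep the best `keep`."""
    frontier = [()]  # each path is a tuple of thoughts; start from the empty path
    for _ in range(depth):
        candidates = []
        for path in frontier:
            for thought in propose(problem, path, breadth):
                new_path = path + (thought,)
                candidates.append((score(problem, new_path), new_path))
        if not candidates:
            break  # nothing left to expand
        candidates.sort(key=lambda item: item[0], reverse=True)
        frontier = [path for _, path in candidates[:keep]]
    return frontier[0]


# Hypothetical stand-ins for the model's proposals and 1-10 feasibility ratings.
TREE = {
    (): ["approach A", "approach B", "approach C"],
    ("approach B",): ["B: sub-step 1", "B: sub-step 2", "B: sub-step 3"],
}
RATINGS = {
    "approach A": 4, "approach B": 9, "approach C": 6,
    "B: sub-step 1": 5, "B: sub-step 2": 7, "B: sub-step 3": 3,
}


def propose(problem, path, k):
    return TREE.get(path, [])[:k]


def score(problem, path):
    return RATINGS[path[-1]]


best_path = tree_of_thoughts("[describe the problem]", propose, score)
```

Raising `keep` above 1 lets the search hold several branches open at once, at the cost of more model calls per level.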
4. ReAct (Reason + Act)
Best for: agentic tasks, tool use, multi-step workflows with external systems
ReAct interleaves reasoning (Thought) with actions (Act) and observations. The model thinks → takes an action (search, calculate, look up) → observes the result → continues thinking. This is the foundation of most AI agent frameworks.
ReAct Pattern
You have access to these tools:
- search(query): Returns top 3 web results
- calculate(expression): Returns numeric result
- get_weather(city): Returns current weather
Respond in this format:
Thought: [your reasoning]
Action: [tool_name(args)]
Observation: [result of action]
... (repeat as needed)
Final Answer: [your conclusion]
Question: What is the population of the city with the tallest building in the world?
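The prompt format only works if some surrounding code runs the tools and feeds the results back. The following is a minimal sketch of such a loop under stated assumptions: `react_loop` and `scripted_llm` are hypothetical names, the tool returns canned placeholder text rather than real search results, and a real deployment would typically stop generation before the model writes its own `Observation:` line.

```python
import re

ACTION_RE = re.compile(r"Action:\s*(\w+)\((.*)\)")
FINAL_RE = re.compile(r"Final Answer:\s*(.+)")


def react_loop(llm, tools, question, max_steps=5):
    """Alternate model steps with tool calls until a Final Answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # expected: Thought + Action, or Final Answer
        transcript += step + "\n"
        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip(), transcript
        action = ACTION_RE.search(step)
        if action is None:
            observation = "could not parse an action"
        else:
            name, arg = action.group(1), action.group(2).strip()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool: {name}"
        # The harness, not the model, writes the observation.
        transcript += f"Observation: {observation}\n"
    return None, transcript  # step budget exhausted without a final answer


def scripted_llm(steps):
    """Hypothetical stand-in that replays canned model turns in order."""
    remaining = iter(steps)
    return lambda transcript: next(remaining)


tools = {"search": lambda query: f"<canned result for: {query}>"}
llm = scripted_llm([
    "Thought: I need the city first.\nAction: search(tallest building in the world)",
    "Thought: Now I need its population.\nAction: search(population of that city)",
    "Thought: I have both facts.\nFinal Answer: <answer built from the observations>",
])
answer, transcript = react_loop(
    llm, tools,
    "What is the population of the city with the tallest building in the world?",
)
```

The `max_steps` cap matters in practice: without it, a model that never emits a final answer would loop indefinitely.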
5. Prompt Chaining
Best for: complex multi-stage tasks, quality pipelines, structured workflows
Break a complex task into a sequence of simpler prompts where each output feeds the next. This improves quality (each stage can be specialized) and allows validation between steps.
3-Stage Writing Pipeline
# llm(prompt) is a placeholder for your model call; topic is the article subject
# Stage 1: Research & outline
prompt_1 = f"Create a detailed outline for an article about {topic}. Include 5 sections with 3 sub-points each."
outline = llm(prompt_1)
# Stage 2: Draft
prompt_2 = f"Write a detailed first draft based on this outline:\n{outline}"
draft = llm(prompt_2)
# Stage 3: Edit & polish
prompt_3 = f"""Edit this draft for clarity, flow, and engagement.
Fix any factual inconsistencies. Tighten the opening hook.
Draft:\n{draft}"""
final = llm(prompt_3)
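The validation between steps mentioned above can be made explicit. This is a sketch under assumptions, not a prescribed design: `run_chain`, the validator functions, and `fake_llm` are hypothetical, and the checks are deliberately crude (counting non-empty lines, rejecting empty output).

```python
def run_chain(llm, stages, initial_input):
    """Run stages in order; each stage is (name, prompt_template, validator)."""
    value = initial_input
    for name, template, validate in stages:
        value = llm(template.format(input=value))
        problem = validate(value)  # None means the output passed
        if problem:
            raise ValueError(f"stage {name!r} failed validation: {problem}")
    return value


def outline_has_sections(text, minimum=5):
    sections = [line for line in text.splitlines() if line.strip()]
    if len(sections) >= minimum:
        return None
    return f"expected at least {minimum} lines, got {len(sections)}"


def not_empty(text):
    return None if text.strip() else "empty output"


STAGES = [
    ("outline",
     "Create a detailed outline for an article about {input}. "
     "Include 5 sections with 3 sub-points each.",
     outline_has_sections),
    ("draft", "Write a detailed first draft based on this outline:\n{input}", not_empty),
    ("edit", "Edit this draft for clarity, flow, and engagement.\nDraft:\n{input}", not_empty),
]


def fake_llm(prompt):
    """Hypothetical stand-in for a model call, keyed on the prompt's first word."""
    if prompt.startswith("Create"):
        return "\n".join(f"{i}. Section {i}" for i in range(1, 6))
    if prompt.startswith("Write"):
        return "draft text"
    return "polished text"


final = run_chain(fake_llm, STAGES, "prompt engineering")
```

Failing fast at the stage that went wrong is cheaper than discovering a bad outline only after the draft and edit calls have been paid for.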
6. Structured Output with JSON Schema
Best for: any production system that needs to parse LLM output programmatically
Constrain the model to produce valid, schema-conforming JSON. Most major LLM APIs support this natively, which largely eliminates parsing errors.
Using OpenAI Structured Outputs
from typing import List

from openai import OpenAI
from pydantic import BaseModel


class ProductReview(BaseModel):
    sentiment: str  # "positive" | "neutral" | "negative"
    score: int  # 1-10
    key_themes: List[str]
    suggested_response: str


review_text = "The battery died after two days."  # placeholder example input

client = OpenAI()  # reads OPENAI_API_KEY from the environment
completion = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract review insights."},
        {"role": "user", "content": f"Review: {review_text}"},
    ],
    response_format=ProductReview,  # the SDK derives a JSON schema from this model
)
review_data = completion.choices[0].message.parsed  # ProductReview, or None on refusal
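When an API offers no native schema enforcement, or as a defensive check on top of it, the same contract can be verified by hand. The sketch below uses only the standard library; `parse_review` is a hypothetical helper, and its field names and ranges mirror the `ProductReview` comments above rather than any official schema.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def parse_review(raw):
    """Parse a model's JSON reply and check it against the expected fields."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError(f"bad sentiment: {data.get('sentiment')!r}")
    score = data.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError(f"bad score: {score!r}")
    themes = data.get("key_themes")
    if not isinstance(themes, list) or not all(isinstance(t, str) for t in themes):
        raise ValueError("key_themes must be a list of strings")
    if not isinstance(data.get("suggested_response"), str):
        raise ValueError("suggested_response must be a string")
    return data


good = parse_review(
    '{"sentiment": "negative", "score": 3, '
    '"key_themes": ["battery life"], "suggested_response": "Sorry to hear that."}'
)
```

A failed check is a natural point to retry the request with the error message appended to the prompt.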
7. Metacognitive Prompting
Best for: complex analysis, fact-checking, reducing hallucination
Ask the model to evaluate its own confidence and reasoning. This can help catch overconfident wrong answers and surfaces the assumptions behind a response.
Self-Evaluation Prompt
Answer the following question. After answering:
1. Rate your confidence: high / medium / low
2. List any assumptions you made
3. Identify what information you're uncertain about
4. If confidence is low or medium, suggest how to verify
Question: What was the GDP of Vietnam in 2024?
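A self-rating is only useful if something acts on it. As a minimal sketch, the hypothetical `needs_verification` helper below reads the rating back out of the reply and flags anything short of high confidence for an external check. It assumes the model writes the rating as `Confidence: <level>`, which the prompt above requests but does not guarantee.

```python
import re

CONFIDENCE_RE = re.compile(r"confidence:\s*(high|medium|low)", re.IGNORECASE)


def read_confidence(response):
    """Return 'high', 'medium', or 'low', or None if no rating was found."""
    match = CONFIDENCE_RE.search(response)
    return match.group(1).lower() if match else None


def needs_verification(response):
    """Flag replies rated medium or low, and replies with no rating at all."""
    return read_confidence(response) != "high"


reply = (
    "Answer: [the model's figure]\n"
    "1. Confidence: medium\n"
    "2. Assumptions: nominal GDP in USD\n"
    "3. Uncertain about: whether final 2024 data is in my training set\n"
    "4. Verify with: an official statistics source"
)
flagged = needs_verification(reply)
```

Treating a missing rating as unverified is the conservative choice: a reply that ignored the format should not be trusted more than one that admitted doubt.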
Choosing the Right Technique
- Math / logic problems → Chain-of-Thought + Self-Consistency
- Creative problem solving → Tree of Thoughts
- Agent workflows with tools → ReAct
- Multi-step production tasks → Prompt Chaining
- Programmatic output parsing → Structured JSON output
- Uncertain/complex factual claims → Metacognitive prompting
Research finding: Wei et al. (2022) showed that few-shot Chain-of-Thought prompting improved PaLM 540B's accuracy on the GSM8K math benchmark from 17.9% (standard prompting) to 56.9%; adding self-consistency (Wang et al., 2022) raised it further to 74.4%. Reasoning unlocked by prompting is not a trick: it reflects genuine latent capability in the model.
Key Takeaways
- Chain-of-Thought forces step-by-step reasoning: a huge boost for logic tasks
- Self-Consistency samples multiple chains and votes: reduces variance at compute cost
- Tree of Thoughts explores branching reasoning: like heuristic search over ideas
- ReAct is the pattern behind most AI agents: think, act, observe, repeat
- Prompt chaining breaks complex tasks into quality-controlled stages
- Structured output eliminates parsing errors in production systems