🟡 Intermediate

Advanced Prompting Techniques:
Chain-of-Thought, ToT, Self-Consistency & More

โœ๏ธ Prompt Engineeringโฑ 15 min read๐Ÿ—“ May 2026

Beyond basic prompting lies a rich set of techniques that significantly improve LLM reasoning, accuracy, and reliability. These aren't tricks; they're systematic approaches backed by research that exploit how Transformers actually process information.

1. Chain-of-Thought (CoT) Prompting

✅ Best for: math problems, logical reasoning, multi-step deductions

CoT prompting instructs the model to show its reasoning step-by-step before giving the final answer. This dramatically improves accuracy on tasks requiring multi-step reasoning.

Zero-Shot CoT: just add "think step by step"

    Q: A store has 120 apples. They sell 35% on Monday and 25% of the remainder on Tuesday. How many are left?
    Think through this step by step before giving the final answer.

Few-Shot CoT: show reasoning examples

    Q: Roger has 5 tennis balls. He buys 2 cans of 3. How many does he have?
    A: Roger starts with 5. 2 cans × 3 balls = 6 new balls. 5 + 6 = 11. Answer: 11.

    Q: A juggler has 16 balls. Half are golf balls. Of the golf balls, half are blue. How many blue golf balls?
    A: 16 ÷ 2 = 8 golf balls. 8 ÷ 2 = 4 blue golf balls. Answer: 4.

    Q: [Your problem here]
    A:

Why it works: By generating intermediate reasoning tokens, the model gets more "thinking space". Each step can inform the next, reducing errors that accumulate in single-step generation.
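As a minimal sketch, the two prompt styles above could be assembled in code like this, with a small parser for the final answer. The helper names (`zero_shot_cot`, `few_shot_cot`, `extract_final_answer`) are hypothetical, and the parser assumes completions end with the `Answer: ...` convention used in the few-shot examples.

```python
import re
from typing import List, Optional, Tuple

COT_SUFFIX = "Think through this step by step before giving the final answer."

def zero_shot_cot(question: str) -> str:
    """Append the step-by-step instruction to a bare question."""
    return f"Q: {question}\n{COT_SUFFIX}"

def few_shot_cot(examples: List[Tuple[str, str]], question: str) -> str:
    """Prefix worked (question, reasoning) pairs, then leave 'A:' open."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {question}\nA:"

def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    matches = re.findall(r"Answer:\s*(.+?)\.?\s*$", completion, flags=re.MULTILINE)
    return matches[-1] if matches else None

examples = [
    ("Roger has 5 tennis balls. He buys 2 cans of 3. How many does he have?",
     "Roger starts with 5. 2 cans × 3 balls = 6 new balls. 5 + 6 = 11. Answer: 11."),
]
prompt = few_shot_cot(examples, "[Your problem here]")
```

Keeping the answer marker fixed across all examples is what makes the completion easy to parse afterwards.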

2. Self-Consistency

✅ Best for: math, factual questions, decisions with a correct answer

Instead of a single generation, sample multiple reasoning chains at a higher temperature, then take the majority answer. This is like asking several smart people the same question and going with the consensus.

Implementation (pseudocode)

    from collections import Counter

    # llm() and extract_final_answer() are placeholders for your own client and parser
    answers = []
    for _ in range(5):  # generate 5 independent reasoning chains
        response = llm(prompt, temperature=0.8)
        answers.append(extract_final_answer(response))

    final_answer = Counter(answers).most_common(1)[0][0]  # most common answer wins

Self-consistency can improve accuracy on some reasoning benchmarks by 10–20%, but cost scales with the number of samples: five chains mean 5× the API calls. That is worth it for high-stakes decisions.
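The voting step can be made concrete with a small runnable sketch. Here `fake_sample` is a hypothetical stand-in that replays canned completions instead of calling a real model at temperature 0.8, and `extract_final_answer` assumes numeric answers introduced by `Answer:`.

```python
import re
from collections import Counter
from typing import Callable, List, Optional

def extract_final_answer(completion: str) -> Optional[str]:
    """Pull the number after the last 'Answer:' marker, if there is one."""
    matches = re.findall(r"Answer:\s*(-?\d+(?:\.\d+)?)", completion)
    return matches[-1] if matches else None

def self_consistency(sample: Callable[[str], str], prompt: str, n: int = 5) -> Optional[str]:
    """Sample n reasoning chains and return the most common final answer."""
    answers: List[str] = []
    for _ in range(n):
        answer = extract_final_answer(sample(prompt))
        if answer is not None:  # skip chains with no parseable answer
            answers.append(answer)
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]

# Hypothetical stand-in for a sampled model call: replays canned chains in order.
_canned = iter([
    "5 + 2 * 3 = 11. Answer: 11.",
    "5 + 2 + 3 = 10. Answer: 10.",
    "2 cans of 3 is 6, plus 5 is 11. Answer: 11.",
    "I am not sure.",
    "6 new balls and 5 old ones. Answer: 11.",
])

def fake_sample(prompt: str) -> str:
    return next(_canned)

result = self_consistency(fake_sample, "Q: Roger has 5 tennis balls...", n=5)
print(result)  # 11
```

Chains that produce no parseable answer are dropped rather than counted, so one malformed completion does not sway the vote.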

3. Tree of Thoughts (ToT)

✅ Best for: creative planning, complex problem solving, strategic decisions

ToT generalizes CoT by exploring multiple reasoning branches simultaneously and using the model to evaluate which branches are most promising, like a search algorithm over thought space.

ToT Prompt Structure

    Problem: [describe the problem]

    Generate 3 different high-level approaches to solve this.
    For each approach, rate its feasibility (1-10) and identify the biggest risk.

    Then, take the highest-rated approach and generate 3 sub-approaches for its first step.
    Evaluate each and select the best one to continue exploring.
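The "search over thought space" idea can also be run as an explicit loop outside the prompt. The sketch below is a simple beam search, one of several possible search strategies. `propose` and `score` are hypothetical stand-ins for the two model calls (generate candidate next thoughts, rate a partial solution), replaced here by a toy arithmetic problem so the code runs without an API.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # a partial solution: the sequence of steps chosen so far

def tree_of_thoughts(
    root: State,
    propose: Callable[[State], List[State]],
    score: Callable[[State], float],
    beam_width: int = 2,
    depth: int = 3,
) -> State:
    """Expand every kept branch, score the children, keep the best few, repeat."""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=score)

# Toy problem: choose three steps from STEPS whose sum is as close to TARGET as possible.
TARGET = 10
STEPS = (1, 3, 4)

def propose(state: State) -> List[State]:
    """Stand-in for 'generate 3 approaches': extend the state by each possible step."""
    return [state + (step,) for step in STEPS]

def score(state: State) -> float:
    """Stand-in for 'rate feasibility': closer to TARGET is better."""
    return -abs(TARGET - sum(state))

best = tree_of_thoughts((), propose, score, beam_width=2, depth=3)
greedy = tree_of_thoughts((), propose, score, beam_width=1, depth=3)
print(best, sum(best))      # (4, 3, 3) 10
print(greedy, sum(greedy))  # (4, 4, 1) 9
```

With a beam width of 1 the search commits to the locally best branch and misses the target; keeping two branches alive is enough to recover it, which is the practical argument for exploring more than one line of reasoning.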

4. ReAct (Reason + Act)

✅ Best for: agentic tasks, tool use, multi-step workflows with external systems

ReAct interleaves reasoning (Thought) with actions (Act) and observations. The model thinks → takes an action (search, calculate, look up) → observes the result → continues thinking. This is the foundation of most AI agent frameworks.

ReAct Pattern

    You have access to these tools:
    - search(query): Returns top 3 web results
    - calculate(expression): Returns numeric result
    - get_weather(city): Returns current weather

    Respond in this format:
    Thought: [your reasoning]
    Action: [tool_name(args)]
    Observation: [result of action]
    ... (repeat as needed)
    Final Answer: [your conclusion]

    Question: What is the population of the city with the tallest building in the world?
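A prompt in this format needs a driver loop around it: the calling code, not the model, runs each tool and writes the Observation line back into the transcript. The sketch below shows one way that loop might look. `fake_llm` and the stub `search` tool are hypothetical placeholders that return canned text, and the regular expressions assume the exact `Action:` / `Final Answer:` format shown above.

```python
import re
from typing import Callable, Dict, Optional, Tuple

ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE)

def react_loop(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> Tuple[Optional[str], str]:
    """Alternate model steps and tool calls until a final answer or the step limit."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript).strip()
        transcript += step + "\n"
        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip(), transcript
        action = ACTION_RE.search(step)
        if action is None:
            observation = "could not parse an action"
        else:
            name, arg = action.group(1), action.group(2).strip()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool: {name}"
        transcript += f"Observation: {observation}\n"
    return None, transcript

# Hypothetical scripted model: replays two canned steps instead of calling an API.
_script = iter([
    "Thought: I should look up the building first.\n"
    "Action: search(tallest building in the world)",
    "Thought: I have enough to answer.\n"
    "Final Answer: (stub) see the observation above",
])

def fake_llm(transcript: str) -> str:
    return next(_script)

tools = {"search": lambda query: f"[stub results for: {query}]"}
answer, transcript = react_loop(fake_llm, tools, "Which city has the tallest building?")
print(transcript)
```

The `max_steps` cap matters in practice: without it, a model that never emits `Final Answer:` would loop indefinitely.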

5. Prompt Chaining

✅ Best for: complex multi-stage tasks, quality pipelines, structured workflows

Break a complex task into a sequence of simpler prompts where each output feeds the next. This improves quality (each stage can be specialized) and allows validation between steps.

3-Stage Writing Pipeline

    # llm() is a placeholder for your model call; topic is the article subject

    # Stage 1: Research & outline
    prompt_1 = f"Create a detailed outline for an article about {topic}. Include 5 sections with 3 sub-points each."
    outline = llm(prompt_1)

    # Stage 2: Draft
    prompt_2 = f"Write a detailed first draft based on this outline:\n{outline}"
    draft = llm(prompt_2)

    # Stage 3: Edit & polish
    prompt_3 = f"""Edit this draft for clarity, flow, and engagement.
    Fix any factual inconsistencies. Tighten the opening hook.
    Draft:\n{draft}"""
    final = llm(prompt_3)
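Validation between steps can be added by pairing each stage with a check on its output. The sketch below is one possible shape for this, not a prescribed design: `run_chain`, the `{input}` template slot, and the validators are all hypothetical names, and `fake_llm` returns canned text so the example runs offline.

```python
from typing import Callable, List, Tuple

# Each stage is (prompt template with an {input} slot, validator for the stage output).
Stage = Tuple[str, Callable[[str], bool]]

def run_chain(
    llm: Callable[[str], str],
    stages: List[Stage],
    initial_input: str,
    max_retries: int = 1,
) -> str:
    """Run stages in order, retrying a stage whose output fails validation."""
    current = initial_input
    for template, is_valid in stages:
        prompt = template.format(input=current)
        for _ in range(max_retries + 1):
            output = llm(prompt)
            if is_valid(output):
                break
        else:  # no attempt passed validation
            raise ValueError(f"stage failed validation: {template[:40]!r}")
        current = output  # this stage's output feeds the next stage
    return current

# Hypothetical stand-in for a model call.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Create an outline"):
        return "1. Intro\n2. Body\n3. Conclusion"
    outline = prompt.split("\n", 1)[1]
    return "DRAFT based on:\n" + outline

stages: List[Stage] = [
    ("Create an outline for an article about {input}.",
     lambda out: len(out.splitlines()) >= 3),   # expect at least 3 outline lines
    ("Write a first draft from this outline:\n{input}",
     lambda out: out.startswith("DRAFT")),      # expect the stub's draft marker
]

final = run_chain(fake_llm, stages, "prompt chaining")
print(final)
```

Failing fast at the stage that went wrong is usually cheaper than discovering a bad outline only after the final edit.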

6. Structured Output with JSON Schema

✅ Best for: any production system that needs to parse LLM output programmatically

Constrain the model to produce valid, schema-conforming JSON. Most major LLM APIs support this natively, which removes most parsing errors; refusals and outputs truncated by the token limit still need handling.

Using OpenAI Structured Outputs

    from typing import List

    from openai import OpenAI
    from pydantic import BaseModel

    class ProductReview(BaseModel):
        sentiment: str  # "positive" | "neutral" | "negative"
        score: int  # 1-10
        key_themes: List[str]
        suggested_response: str

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # review_text is the raw review string supplied by the caller
    completion = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Extract review insights."},
            {"role": "user", "content": f"Review: {review_text}"},
        ],
        response_format=ProductReview,
    )
    review_data = completion.choices[0].message.parsed  # ProductReview, or None on refusal
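Where an API offers no native schema enforcement, the same contract can be checked by hand after the call. The following is a standard-library sketch under that assumption. It mirrors the `ProductReview` fields above as a dataclass, and `parse_review` is a hypothetical helper name.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class ProductReview:
    sentiment: str
    score: int
    key_themes: List[str]
    suggested_response: str

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}

def parse_review(raw: str) -> ProductReview:
    """Parse and validate raw model output; raise ValueError or TypeError if invalid."""
    data = json.loads(raw)          # json.JSONDecodeError is a subclass of ValueError
    review = ProductReview(**data)  # TypeError on missing or unexpected keys
    if review.sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {review.sentiment!r}")
    if isinstance(review.score, bool) or not isinstance(review.score, int):
        raise ValueError("score must be an integer")
    if not 1 <= review.score <= 10:
        raise ValueError(f"score out of range: {review.score}")
    if not isinstance(review.key_themes, list) or not all(
        isinstance(theme, str) for theme in review.key_themes
    ):
        raise ValueError("key_themes must be a list of strings")
    if not isinstance(review.suggested_response, str):
        raise ValueError("suggested_response must be a string")
    return review

raw_output = (
    '{"sentiment": "positive", "score": 8, '
    '"key_themes": ["battery life", "price"], '
    '"suggested_response": "Thanks for the feedback!"}'
)
review = parse_review(raw_output)
print(review.score, review.key_themes)
```

A failed validation is a natural trigger for re-prompting the model with the error message attached.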

7. Metacognitive Prompting

✅ Best for: complex analysis, fact-checking, reducing hallucination

Ask the model to evaluate its own confidence and reasoning. This can surface overconfident wrong answers and flag claims that need checking, although self-reported confidence is itself not guaranteed to be well calibrated.

Self-Evaluation Prompt

    Answer the following question. After answering:
    1. Rate your confidence: high / medium / low
    2. List any assumptions you made
    3. Identify what information you're uncertain about
    4. If confidence is low or medium, suggest how to verify

    Question: What was the GDP of Vietnam in 2024?
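The self-reported rating only helps if something downstream acts on it. A minimal sketch of that routing step follows. It assumes the response contains a line such as `Confidence: medium`, and the helper names `parse_confidence` and `needs_verification` are hypothetical.

```python
import re
from typing import Optional

CONFIDENCE_RE = re.compile(r"confidence[^\n]*?\b(high|medium|low)\b", re.IGNORECASE)

def parse_confidence(response: str) -> Optional[str]:
    """Return 'high', 'medium', or 'low' from the first confidence line, else None."""
    match = CONFIDENCE_RE.search(response)
    return match.group(1).lower() if match else None

def needs_verification(response: str) -> bool:
    """Route anything not explicitly rated 'high' to a verification step."""
    return parse_confidence(response) != "high"

sample_response = (
    "Answer: [model's answer]\n"
    "Confidence: Medium\n"
    "Assumptions: figures are in nominal US dollars\n"
)
print(parse_confidence(sample_response))    # medium
print(needs_verification(sample_response))  # True
```

Treating a missing rating the same as a low one is a deliberately conservative choice: a response that ignored the format is not one to trust by default.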

Choosing the Right Technique

Research finding: Wei et al. (2022) showed that few-shot Chain-of-Thought prompting improved PaLM 540B's accuracy on the GSM8K math benchmark from 17.9% (standard prompting) to 56.9%; adding self-consistency (Wang et al., 2022) raised it to 74.4%. Reasoning unlocked by prompting is not a trick; it reflects genuine latent capability in the model.

Key Takeaways