Language models that think before they act — and act to think better. ReAct interleaves internal reasoning traces with external tool use, creating a feedback loop that beats both pure reasoning and pure tool use alone.
Ask an LLM: "Was the 2004 Summer Olympics held in a city that also hosted a previous Olympics?" The model needs to know where the 2004 Olympics were held (Athens), whether Athens hosted before (1896), and combine those facts correctly. If its training data is slightly off, it will confidently give a wrong answer with no way to self-correct.
Now try the other extreme: a pure tool-using agent. It gets a Wikipedia search tool. It searches "2004 Summer Olympics," reads the page, searches "Athens Olympics history," reads another page, maybe searches a third thing. No reasoning between steps. It either finds the answer or gets lost in irrelevant documents. There's no internal deliberation to guide which tool to call next.
The specific tasks where this matters most:
These tasks are hard because they require maintaining a coherent goal across many steps, knowing when to search for more information, and revising a plan when new evidence contradicts earlier assumptions. Pure reasoning never gathers new evidence, so it fails at the second and third; pure acting gathers evidence but keeps no explicit plan, so it fails at the first and has nothing written down to revise.
ReAct produces a trajectory of interleaved thoughts and actions. A thought is a free-text reasoning trace: the model thinking out loud. An action is a call to an external tool (search, lookup, finish). An observation is the tool's response, fed back into the context. Then another thought, then another action, until the task is complete.
The action space in the original ReAct paper (Wikipedia-based tasks) has three actions: Search[entity] retrieves the Wikipedia intro for entity; Lookup[term] finds the next occurrence of a term in the current page; and Finish[answer] terminates with an answer. Thoughts are unconstrained free text. The model decides everything in natural language: when to think, when to act, and what to say in each.
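To make the Search and Lookup semantics concrete, here is a minimal sketch of a toy environment. It is not the paper's implementation: `ToyWikiEnv`, the in-memory `pages` dictionary, and the naive sentence splitter are all assumptions made for illustration, standing in for a real Wikipedia API. Finish is not modeled because it only ends the episode.

```python
import re


class ToyWikiEnv:
    """Toy stand-in for the Wikipedia environment (hypothetical, in-memory)."""

    def __init__(self, pages):
        self.pages = pages      # title -> full article text
        self.sentences = []     # sentences of the most recently opened page
        self.keyword = None     # term of the lookup currently in progress
        self.cursor = 0         # sentence index where the next lookup resumes

    def search(self, entity):
        """Open a page and return its first sentence as a stand-in for the intro."""
        text = self.pages.get(entity)
        if text is None:
            return f"Could not find {entity}."
        # Naive sentence split on terminal punctuation; enough for a sketch.
        self.sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        self.keyword, self.cursor = None, 0
        return self.sentences[0]

    def lookup(self, term):
        """Return the next sentence of the open page that contains `term`."""
        if term != self.keyword:    # a new term restarts the scan from the top
            self.keyword, self.cursor = term, 0
        for i in range(self.cursor, len(self.sentences)):
            if term.lower() in self.sentences[i].lower():
                self.cursor = i + 1
                return self.sentences[i]
        return "No more results."


env = ToyWikiEnv({
    "Athens": "Athens is the capital of Greece. It hosted the 1896 Olympics. "
              "It hosted the 2004 Olympics."
})
first = env.search("Athens")       # "Athens is the capital of Greece."
hit_1 = env.lookup("Olympics")     # "It hosted the 1896 Olympics."
hit_2 = env.lookup("Olympics")     # "It hosted the 2004 Olympics."
```

Keeping the lookup cursor inside the environment is what makes repeated Lookup calls behave like pressing Ctrl+F again: the same action string yields the next match rather than the first one.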
Let's watch ReAct solve a real multi-hop question step by step. The agent must answer: "Were Scott Derrickson and Ed Wood involved in the same genre of films?"
Each step of the trace is a thought (reasoning), an action (tool call), or an observation (tool response). The thoughts guide the actions, and each observation updates the plan.
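```python
import re

# Matches the three actions used in the Wikipedia tasks, e.g. Search[Paris].
ACTION_RE = re.compile(r"(Search|Lookup|Finish)\[(.*?)\]", re.DOTALL)

def react_agent(question, llm, tools, few_shot_examples="", max_steps=10):
    """ReAct loop. `llm.generate(prompt, stop=...)` returns text; `tools` maps action names to callables."""
    context = few_shot_examples + f"Question: {question}\n"
    for _ in range(max_steps):
        # The model writes the next thought and action; stop before it invents an observation.
        output = llm.generate(context, stop=["\nObservation"])
        match = ACTION_RE.search(output)
        if match is None:
            # No parseable action: report it and let the model try again.
            context += output + "\nObservation: Invalid action.\n"
            continue
        action_type, arg = match.groups()
        if action_type == "Finish":
            return arg
        # Execute the tool (e.g. a Wikipedia API call) and feed the result back.
        obs = tools[action_type](arg)
        context += output + f"\nObservation: {obs}\n"
    return None  # step budget exhausted without a Finish action
```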
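The loop can be exercised without a real model by replaying scripted replies. The sketch below is self-contained and makes several assumptions for illustration: `run_react`, `make_scripted_llm`, the `SCRIPT` replies, and the two-entry `PAGES` corpus are all hypothetical stand-ins, and the only tool wired up is Search. It reuses the Athens question from the introduction and returns both the answer and the accumulated transcript.

```python
import re

ACTION_RE = re.compile(r"(Search|Lookup|Finish)\[(.*?)\]")


def run_react(question, llm, tools, max_steps=5):
    """Minimal thought-action-observation loop; returns (answer, transcript)."""
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(context)
        context += output + "\n"
        match = ACTION_RE.search(output)
        if match is None:
            context += "Observation: Invalid action.\n"
            continue
        name, arg = match.groups()
        if name == "Finish":
            return arg, context
        context += f"Observation: {tools[name](arg)}\n"
    return None, context


def make_scripted_llm(replies):
    """Return a fake model that replays `replies` in order, ignoring the prompt."""
    it = iter(replies)
    return lambda context: next(it)


# Hypothetical scripted replies and corpus, for illustration only.
SCRIPT = [
    "Thought: I need to find where the 2004 Summer Olympics were held.\n"
    "Action: Search[2004 Summer Olympics]",
    "Thought: They were held in Athens. Did Athens host earlier?\n"
    "Action: Search[Athens]",
    "Thought: Athens also hosted in 1896, so the answer is yes.\n"
    "Action: Finish[yes]",
]
PAGES = {
    "2004 Summer Olympics": "The 2004 Summer Olympics were held in Athens.",
    "Athens": "Athens hosted the first modern Olympics in 1896.",
}
tools = {"Search": lambda entity: PAGES.get(entity, f"Could not find {entity}.")}

answer, transcript = run_react(
    "Was the 2004 Summer Olympics held in a city that also hosted a previous Olympics?",
    make_scripted_llm(SCRIPT),
    tools,
)
print(transcript)
```

Printing `transcript` shows the property the loop depends on: every observation is appended to the same context that the next thought is generated from.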
Chain-of-Thought (CoT) prompting is the most direct baseline: ask the model to "think step by step" before answering. It works well for math problems and logical deductions where the model's training contains the relevant knowledge. For factual multi-hop questions, CoT's Achilles heel is hallucination.
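ReAct addresses this with two mechanisms:
- Grounding: instead of recalling a fact from memory, the model calls Search[X] and reads the actual Wikipedia answer.
- Self-correction: each observation is fed back into the context, so a later thought can revise a claim that the evidence contradicts.

| Property | Chain-of-Thought | ReAct |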
|---|---|---|
| Factual grounding | Training data only | Live tool calls |
| Hallucination risk | High on obscure facts | Lower (claims can be checked against observations) |
| Self-correction | None (one-shot) | Yes (observation feedback) |
| Transparency | Yes (reasoning visible) | Yes (full trace) |
| Reasoning capability | Full LLM reasoning | Full LLM reasoning |
| Works offline | Yes | No (needs tools) |
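ReAct can also be combined with CoT. The paper pairs ReAct with CoT self-consistency (CoT-SC) in both directions. CoT-SC → ReAct answers from CoT-SC when its sampled answers agree, and falls back to ReAct when the majority answer appears in fewer than half of the samples, a sign that the model's internal knowledge is shaky. ReAct → CoT-SC starts with ReAct and falls back to CoT-SC when ReAct fails to return an answer within its step budget. These combinations give the best prompting results on HotpotQA and FEVER.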
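The confidence-based switch from CoT to ReAct can be sketched as a majority vote with a fallback. This is an illustrative sketch rather than the paper's code: `sample_cot` and `run_react` are hypothetical callables supplied by the caller, and the default `n=5` is an arbitrary choice for the example.

```python
from collections import Counter


def cot_sc_then_react(question, sample_cot, run_react, n=5):
    """Answer by CoT self-consistency; fall back to ReAct when the vote is weak.

    sample_cot(question) -> one sampled CoT answer (hypothetical callable)
    run_react(question)  -> answer from a ReAct agent (hypothetical callable)
    Returns (answer, method), where method is "cot-sc" or "react".
    """
    answers = [sample_cot(question) for _ in range(n)]
    top_answer, votes = Counter(answers).most_common(1)[0]
    if votes < n / 2:
        # No answer wins a majority: internal knowledge looks unreliable,
        # so ground the answer with tool calls instead.
        return run_react(question), "react"
    return top_answer, "cot-sc"


# Usage with canned samples standing in for a real model.
agreeing = iter(["Athens", "Athens", "Athens", "Sydney", "Rome"])
result_confident = cot_sc_then_react(
    "Where were the 2004 Olympics held?", lambda q: next(agreeing), lambda q: "unused"
)

scattered = iter(["Athens", "Sydney", "Rome", "Paris", "Athens"])
result_uncertain = cot_sc_then_react(
    "Where were the 2004 Olympics held?", lambda q: next(scattered), lambda q: "Athens"
)
```

The design choice is that ReAct's extra tokens and tool calls are only spent on questions where the cheap method disagrees with itself.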
The other baseline: a pure tool-using agent with no reasoning traces. For each step, the model just outputs an action directly. Call it Act-only. It can call the same tools as ReAct but produces no thoughts between observations.
Act-only's failure modes are the mirror image of CoT's:
| Property | Act-Only | ReAct |
|---|---|---|
| External knowledge | Yes (tools) | Yes (tools) |
| Planning between steps | Implicit only | Explicit thought trace |
| Loop detection | No | Yes (thought records what was tried) |
| Sub-goal tracking | Fragile | Explicit in thought steps |
| Interpretability | Low (opaque action sequence) | High (full reasoning visible) |
| Token cost | Lower | Higher (thoughts add tokens) |
ReAct is tool-agnostic. The paper demonstrates it on Wikipedia search, but the pattern applies to any tool with a text input and text output: calculators, code interpreters, database queries, web browsers, calendar APIs, sensor readings. The model generates an action in natural language; a parser routes it to the right tool; the tool returns an observation in natural language; the observation goes back into context.
- Search[entity] → Returns the first 5 sentences of the entity's Wikipedia article, or suggests similar entities if no page exists
- Lookup[keyword] → Returns the next sentence containing keyword in the current article (like browser Ctrl+F)
- Finish[answer] → Terminates the episode with the final answer string

Good tools for ReAct agents have three properties:
Before a tool can run, the action string emitted by the LLM has to be parsed into a (tool, argument) pair.
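A minimal sketch of that parsing and routing step follows. The function names `parse_action` and `dispatch`, the error strings, and the `Calc` tool are assumptions for illustration; the only convention taken from the text is the `Tool[argument]` action format.

```python
import re

# One tool name, then everything between the first "[" and the last "]".
ACTION_RE = re.compile(r"^\s*(\w+)\[(.*)\]\s*$", re.DOTALL)


def parse_action(text):
    """Split 'Tool[argument]' into (tool, argument); return None if malformed."""
    match = ACTION_RE.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def dispatch(text, tools):
    """Route a parsed action to its tool and return the observation string."""
    parsed = parse_action(text)
    if parsed is None:
        return "Invalid action format."
    name, arg = parsed
    if name not in tools:
        return f"Unknown tool: {name}"
    return tools[name](arg)


# Usage with a hypothetical one-tool registry.
tools = {"Calc": lambda expr: str(int(expr) * 2)}
print(parse_action("Search[Scott Derrickson]"))
print(dispatch("Calc[21]", tools))
print(dispatch("Browse[example]", tools))
```

Returning an error string instead of raising keeps the agent loop alive: the model sees the failure as an observation and can correct its next action.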
ReAct was evaluated on four benchmarks using PaLM-540B and GPT-3 with few-shot prompting. The results below involve no fine-tuning, only in-context examples of the thought-action-observation pattern.
| Task | Dataset | Metric | CoT | Act | ReAct | Best |
|---|---|---|---|---|---|---|
| Multi-hop QA | HotpotQA | EM | 29.4% | 25.7% | 27.4% | ReAct → CoT-SC: 35.1% |
| Fact verif. | FEVER | Acc. | 56.3% | 58.9% | 60.9% | CoT-SC → ReAct: 64.6% |
| Decision (text) | ALFWorld | Succ. | — | 45% | 71% | ReAct: 71% |
| Decision (web) | WebShop | Succ. | — | 30.1% | 40.0% | ReAct: 40.0% |
The pattern is clear: ReAct outperforms Act-only on all four benchmarks, with the largest gaps on the decision-making tasks (ALFWorld and WebShop). On the knowledge tasks, combining ReAct with CoT self-consistency beats both pure methods. The one exception to ReAct's lead is HotpotQA, where CoT alone edges out ReAct alone (29.4% vs. 27.4%). The paper attributes this to ReAct's fixed thought-action-observation structure leaving less room for flexible reasoning, and to searches that return nothing useful and derail the trajectory.
ReAct is one of the foundational papers in what is now called agentic AI — LLMs that act in the world, not just generate text. Understanding its lineage and descendants maps the field.
| Method | Key Idea | vs ReAct |
|---|---|---|
| CoT (Wei et al. 2022) | Reason step by step | ReAct adds actions to CoT's reasoning |
| ReAct (this paper) | Interleave thought + action | — |
| Reflexion (Shinn et al. 2023) | Reflect on failure, store in memory | ReAct per episode; Reflexion across episodes |
| Toolformer (Schick et al. 2023) | Fine-tune to call APIs | ReAct: prompting only; Toolformer: weights updated |
| Tree of Thoughts (Yao et al. 2023) | Branch and prune reasoning paths | ReAct: linear trace; ToT: tree search over thoughts |
| SWE-agent (Yang et al. 2024) | ReAct loop for code editing | Direct descendant with specialized tools |
Yao et al. "ReAct: Synergizing Reasoning and Acting in Language Models." 2022. arXiv:2210.03629
"Act not just to execute, but to explore. Think not just to plan, but to verify." — the ReAct philosophy