Reflexion Prompting: Reinforcing LLMs through Self-Reflection
Introduction to Reflexion
Reflexion is a framework for reinforcing language agents through linguistic feedback. Unlike traditional reinforcement learning, which relies on numerical reward signals, Reflexion uses verbal feedback and self-reflection to help agents learn from their mistakes and improve performance across a variety of tasks.
Core Components of Reflexion
The Reflexion framework consists of three main components, each playing a distinct role in the learning process:
- Actor: Generates text and actions based on state observations. It produces trajectories by interacting with an environment, guided by methods like Chain-of-Thought (CoT) or ReAct. A memory module provides context to inform future actions.
- Evaluator: Scores the output generated by the Actor. The evaluation may be scalar (numeric) or free-form language and helps quantify the quality of the trajectory. LLMs and rule-based heuristics are commonly used for evaluation.
- Self-Reflection: Converts the feedback from the Evaluator into actionable verbal reinforcement. This allows the Actor to improve in future episodes. Reflections are stored in memory, forming a long-term knowledge base that accelerates learning.
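The three components above can be sketched as simple function interfaces. This is a minimal illustration, not the paper's implementation: the function names and return shapes are assumptions, and each function is a toy stand-in for what would be an LLM call in a real system.

```python
# Hypothetical stand-ins for the three Reflexion components.
# In a real agent, each function would query an LLM.

def llm_actor(task: str, memory: list[str]) -> str:
    """Actor: generate a trajectory for the task, informed by stored reflections."""
    context = " | ".join(memory) if memory else "no prior reflections"
    return f"trajectory for '{task}' (context: {context})"

def llm_evaluator(trajectory: str) -> tuple[float, str]:
    """Evaluator: return a scalar score plus a free-form language critique."""
    return 0.5, f"critique of {trajectory!r}"

def llm_self_reflect(critique: str) -> str:
    """Self-Reflection: convert the critique into actionable verbal reinforcement."""
    return f"Next time, address the following: {critique}"
```

Keeping the Evaluator's output as both a scalar and a critique mirrors the framework's design: the scalar decides whether to stop, while the critique feeds the Self-Reflection step.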
How Reflexion Works
The Reflexion workflow can be summarized in the following steps:
- Define the task or objective.
- Generate a trajectory using the Actor, which includes actions and observations.
- Evaluate the trajectory to provide a reward or feedback.
- Perform self-reflection, generating verbal feedback to improve the agent’s policy.
- Use the updated feedback and memory to generate the next trajectory, iteratively improving performance.
This process allows the agent to learn from prior mistakes in a structured, interpretable manner, extending frameworks like ReAct with episodic memory and self-evaluation capabilities.
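The five steps above can be expressed as a single loop. The sketch below is self-contained and runnable, but the `actor`, `evaluator`, and `reflect` functions are toy stand-ins (later attempts are simply scored higher) so the control flow can be shown without an LLM; their names and the stopping threshold are assumptions.

```python
# Toy stand-ins for LLM calls, chosen so the loop terminates deterministically.

def actor(task: str, memory: list[str]) -> str:
    # Step 2: generate a trajectory, conditioned on stored reflections.
    return f"attempt {len(memory) + 1} at {task}"

def evaluator(trajectory: str) -> tuple[float, str]:
    # Step 3: score the trajectory (toy heuristic: later attempts improve).
    attempt = int(trajectory.split()[1])
    score = min(1.0, attempt / 3)
    return score, f"attempt scored {score:.2f}"

def reflect(critique: str) -> str:
    # Step 4: convert the critique into verbal reinforcement.
    return f"Reflection: {critique}; adjust the strategy next episode."

def run_reflexion(task: str, episodes: int = 5, threshold: float = 0.9):
    """Steps 1-5: iterate act -> evaluate -> reflect, carrying memory forward."""
    memory: list[str] = []  # episodic memory of verbal reflections
    trajectory = ""
    for _ in range(episodes):
        trajectory = actor(task, memory)
        score, critique = evaluator(trajectory)
        if score >= threshold:
            break  # trajectory is good enough; stop iterating
        memory.append(reflect(critique))  # step 5: feed back into next episode
    return trajectory, memory
```

With these toy stubs, the third attempt crosses the threshold, so two reflections accumulate in memory before the loop stops.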
Illustrative Example
Consider a Reflexion agent working on a decision-making task in AlfWorld:
- Trajectory Generation: The Actor attempts a multi-step task, such as navigating a house to locate objects.
- Evaluation: The Evaluator scores each step and identifies mistakes, such as taking a wrong path.
- Self-Reflection: The agent generates verbal feedback: "I took the wrong path. Next time, I will check the map before moving."
- Next Iteration: Using the feedback stored in memory, the Actor improves its strategy in the next episode.
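One concrete way the stored feedback reaches the Actor in the next episode is through prompt construction: prior reflections are prepended to the task description. The helper below is a hypothetical sketch of that step, not an API from the Reflexion paper.

```python
def build_actor_prompt(task: str, reflections: list[str]) -> str:
    """Compose the next-episode prompt, prepending stored verbal reflections."""
    memory_block = "\n".join(f"- {r}" for r in reflections)
    return (
        f"Task: {task}\n"
        f"Lessons from previous episodes:\n{memory_block}\n"
        "Plan your next attempt, avoiding the mistakes above."
    )

prompt = build_actor_prompt(
    "navigate the house to locate the keys",
    ["I took the wrong path. Next time, I will check the map before moving."],
)
```

The resulting prompt puts the agent's own words about its past mistake directly in front of the model, which is what makes the feedback human-readable and auditable.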
Key Advantages of Reflexion
- Learning from Trial and Error: Reflexion allows agents to improve performance iteratively by reflecting on past mistakes.
- Efficient Reinforcement: Unlike traditional RL, no fine-tuning is needed for the underlying LLM, saving compute and data resources.
- Nuanced Feedback: Verbal feedback is richer and more informative than scalar rewards, helping agents understand why actions were suboptimal.
- Interpretability: Self-reflections provide a human-readable memory of past decisions, aiding debugging and analysis.
- Memory Integration: Episodic memory allows the agent to leverage prior experiences for improved decision-making.
Applications of Reflexion
Reflexion has shown effectiveness across a range of tasks:
- Sequential Decision-Making: AlfWorld tasks, where agents navigate environments and complete multi-step objectives.
- Reasoning: Performance improvements on HotPotQA, which requires reasoning over multiple documents.
- Programming: Writing Python or Rust code for benchmarks like HumanEval and MBPP, where Reflexion reported state-of-the-art pass rates at the time of publication.
Performance Results
Experimental results show significant improvements when Reflexion is applied:
| Task | Base Approach | Result with Reflexion |
|---|---|---|
| AlfWorld Decision-Making | ReAct | Completed 130/134 tasks with self-reflection feedback |
| HotPotQA Reasoning | CoT | Improved accuracy with episodic memory and self-reflection |
| HumanEval & MBPP Programming | Standard LLM Coding | Higher pass rates and fewer logic errors with self-reflective feedback |
When to Use Reflexion
Reflexion is ideal in the following scenarios:
- Tasks requiring trial-and-error learning, such as sequential decision-making or reasoning challenges.
- When traditional reinforcement learning is impractical due to high data or compute costs.
- Situations where nuanced feedback is essential for improvement.
- Applications needing interpretability and explicit memory for agent decisions.
Limitations of Reflexion
- Reliance on Self-Evaluation: Accuracy of feedback depends on the agent’s evaluation capabilities.
- Memory Constraints: Sliding window memory may limit performance in complex tasks; advanced memory structures may be required.
- Code Generation Limitations: Non-deterministic outputs and hardware dependencies can affect programming tasks.
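The sliding-window memory mentioned above can be illustrated with a bounded buffer: once the window is full, the oldest reflection is silently dropped, which is exactly why long-horizon tasks may need richer memory structures. The class below is a minimal sketch of that behavior (the name and window size are assumptions).

```python
from collections import deque

class ReflectionMemory:
    """Sliding-window episodic memory: keeps only the most recent reflections."""

    def __init__(self, max_reflections: int = 3):
        # deque with maxlen drops the oldest entry automatically when full
        self._window = deque(maxlen=max_reflections)

    def add(self, reflection: str) -> None:
        self._window.append(reflection)

    def as_context(self) -> str:
        """Render the retained reflections for inclusion in the next prompt."""
        return "\n".join(self._window)
```

After four insertions into a window of three, the first reflection is gone; any lesson it carried is lost to the agent, which is the limitation in miniature.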
Conclusion
Reflexion introduces a paradigm shift in reinforcing LLM agents through verbal feedback and self-reflection. By integrating episodic memory, self-evaluation, and iterative feedback, Reflexion agents outperform traditional approaches on complex decision-making, reasoning, and programming tasks. While there are limitations, its interpretability and efficiency make it a promising alternative to conventional reinforcement learning methods.
Further Reading & References
- ReAct Framework: https://python.langchain.com/docs/modules/agents/agent_types/react
- AlfWorld Benchmark: https://github.com/alfworld/alfworld
- HotPotQA Dataset: https://hotpotqa.github.io/