LoPE Demo - Prompt Perturbation for Reasoning Exploration 🧠 (Agents): Compare baseline and perturbed reasoning on tasks.
Lost-in-Thought Benchmark 🧠 (Agents): Run a benchmark to see how reasoning steps affect retrieval accuracy.
Master Key Capability Demo 🔑 (Agents): Show the expected accuracy boost on a math problem from steering.