BusyBeaver-50M

BusyBeaver-50M is a compact agent-policy model for strict JSON tool-call prediction. It is not a general chatbot. It receives a compact agent state, goal, recent observations, and available tool schemas, then predicts exactly one next tool call for a local agent harness.

Intended Adapter Use

BusyBeaver-50M is intended to work with the BusyBeaver Hermes Adapter / harness. In production, pair it with a deterministic harness argument resolver: the model selects the tool, and the harness resolves the concrete arguments.

This repository currently packages the RunPod-trained V12 path-grounding checkpoint 250. The full checkpoint archive is GestaltLabs/BusyBeaver-50M-v12-path-grounding-runpod.

Hermes Adapter

A standalone BusyBeaver Hermes adapter package is available on GitHub:

https://github.com/DJLougen/BusyBeaver-Hermes-Adapter

The adapter runs BusyBeaver as a compact OpenAI-compatible policy endpoint, detects BusyBeaver model selections inside Hermes-style harnesses, and maps strict JSON BusyBeaver actions into harness-native tool events and deterministic artifacts.

BusyBeaver should not replace the full Hermes controller. It is a tiny local tool-policy helper for deterministic operations: inspect, test, patch, diff, safe shell, recovery, memory, cron/message routing, and escalation gates.

Production Contract

BusyBeaver-50M is strongest when the harness supplies compact state and then validates and resolves the emitted action:

  1. Model emits one strict JSON object.
  2. Harness validates tool name and schema.
  3. Harness resolves concrete arguments from structured state when needed, especially file paths, commands, cron fields, and message targets.
  4. Harness enforces safety gates before execution.

This keeps the model tiny while avoiding the main weakness of sub-100M models: they cannot reliably copy arbitrarily long paths or commands from context verbatim.
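The validate, resolve, and gate steps above can be sketched in a few lines. This is a minimal illustration, not the adapter's actual resolver: the `TOOL_SCHEMAS` registry, the basename-matching rule in `resolve_path`, and the path gate are all hypothetical choices made for the example.

```python
import posixpath

# Hypothetical registry: tool name -> argument keys the harness requires.
TOOL_SCHEMAS = {
    "read_file": {"path"},
    "list_files": {"path"},
    "run_tests": set(),
    "git_diff": set(),
}


def validate(action):
    """Step 2: check the tool name and the required argument keys."""
    tool = action.get("tool")
    if not isinstance(tool, str) or tool not in TOOL_SCHEMAS:
        raise ValueError(f"unknown tool: {tool!r}")
    args = action.get("args")
    if not isinstance(args, dict):
        raise ValueError("args must be an object")
    missing = TOOL_SCHEMAS[tool] - set(args)
    if missing:
        raise ValueError(f"{tool} is missing args: {sorted(missing)}")


def resolve_path(emitted, known_paths):
    """Step 3: map an emitted path onto a path taken from structured state."""
    if emitted in known_paths:
        return emitted
    # Assumed fallback: accept a unique basename match, refuse anything ambiguous.
    base = posixpath.basename(emitted)
    matches = [p for p in known_paths if posixpath.basename(p) == base]
    if len(matches) == 1:
        return matches[0]
    raise LookupError(f"cannot resolve path: {emitted!r}")


def check_path_gate(path):
    """Step 4: refuse absolute paths and parent-directory traversal."""
    if posixpath.isabs(path) or ".." in path.split("/"):
        raise PermissionError(f"path escapes the workspace: {path!r}")


def handle_action(action, known_paths):
    """Run steps 2-4 on an already parsed action and return the resolved copy."""
    validate(action)
    args = dict(action["args"])  # copy so the emitted action is left untouched
    if "path" in args:
        args["path"] = resolve_path(args["path"], known_paths)
        check_path_gate(args["path"])
    return {**action, "args": args}
```

With `known_paths` taken from the harness's own file listing, an emitted `src/parser.py` can be mapped to the one known file named `parser.py`, so the model only has to get the tool and the rough target right.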

Input Format

<|system|>
You are BusyBeaver, a small tool-policy model. Emit exactly one JSON object matching the schema. Do not explain.
<|goal|>
...
<|state|>
...
<|tools|>
...
<|output_schema|>
{"tool":"string","args":"object","confidence":"number","state_update":"string"}
<|assistant|>

Expected output is strict JSON only:

{"tool":"read_file","args":{"path":"src/parser.py"},"confidence":0.97,"state_update":"Read the referenced file before editing."}
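As a rough sketch of how a harness might assemble this prompt and check the reply, the helpers below join the sections in the order shown and parse the output strictly. The function names `build_prompt` and `parse_action` are hypothetical, and two details are assumptions rather than documented behavior: that `goal`, `state`, and `tools` arrive as already serialized strings, and that a newline follows `<|assistant|>`.

```python
import json

SYSTEM = (
    "You are BusyBeaver, a small tool-policy model. "
    "Emit exactly one JSON object matching the schema. Do not explain."
)
OUTPUT_SCHEMA = (
    '{"tool":"string","args":"object","confidence":"number","state_update":"string"}'
)


def build_prompt(goal, state, tools):
    """Join the sections in the documented order, one tag per line."""
    sections = [
        ("<|system|>", SYSTEM),
        ("<|goal|>", goal),
        ("<|state|>", state),
        ("<|tools|>", tools),
        ("<|output_schema|>", OUTPUT_SCHEMA),
    ]
    lines = []
    for tag, body in sections:
        lines.extend([tag, body])
    lines.append("<|assistant|>")
    return "\n".join(lines) + "\n"  # trailing newline is an assumption


def parse_action(text):
    """Accept only one JSON object with exactly the four expected keys."""
    # json.loads raises a ValueError subclass on non-JSON or trailing text.
    action = json.loads(text)
    if not isinstance(action, dict):
        raise ValueError("expected a JSON object")
    expected = {
        "tool": str,
        "args": dict,
        "confidence": (int, float),
        "state_update": str,
    }
    if set(action) != set(expected):
        raise ValueError(f"unexpected keys: {sorted(action)}")
    for key, typ in expected.items():
        value = action[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, typ):
            raise ValueError(f"wrong type for {key!r}")
    return action
```

Parsing strictly at this point means any explanation text or second object in the output is rejected before tool validation runs.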

Canonical Tools

  • read_file
  • list_files
  • run_shell / Hermes shell
  • run_tests
  • apply_patch
  • git_diff
  • remember / Hermes memory_write
  • retrieve_memory
  • cron_create, cron_update
  • message_send
  • clarify, escalate

Evaluation

V12 checkpoint 250 raw checkpoint validation:

Metric           Score
JSON validity    1.0000
Schema validity  0.9792
Correct tool     0.9818
Arg semantic     0.6510

V12 with harness argument resolver on frozen evals:

Eval                      JSON    Schema  Correct Tool  Arg Semantic  Unsafe Cmd  Placeholder
frozen_path_grounding_v2  1.0000  1.0000  1.0000        0.9792        0.0000      0.0000
frozen_harness_v1         1.0000  1.0000  1.0000        0.9000        0.0000      0.0000

For comparison, the unresolved V11 baseline scored correct_tool=0.4167 and arg_sem=0.0000 on a 24-row adversarial path-copy sample. V12 combined with the harness argument resolver addresses that product-level failure mode.
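The evaluation code is not part of this card, so the following is only a sketch of how two of these metrics could be computed. It assumes that JSON validity means the raw output parses as a JSON object and that correct tool means the emitted tool name equals a reference label, both as fractions of all rows. The `score` function and its row format are hypothetical.

```python
import json


def score(rows):
    """rows: list of (raw_model_output, expected_tool) pairs."""
    if not rows:
        raise ValueError("need at least one row")
    valid = correct = 0
    for raw, expected_tool in rows:
        try:
            action = json.loads(raw)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            continue
        if not isinstance(action, dict):
            continue
        valid += 1
        if action.get("tool") == expected_tool:
            correct += 1
    return {
        "json_validity": valid / len(rows),
        "correct_tool": correct / len(rows),
    }
```

Under that reading, a correct_tool of 0.4167 on 24 rows corresponds to 10 correct rows, since 10/24 rounds to 0.4167.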

Model Size

  • Parameters: 49,382,784
  • Tokenizer: 16k BusyBeaver policy tokenizer
  • Context length used in training/eval: 2048 tokens
  • Architecture: BusyBeaver QDelta causal LM
  • Reloadable weights: busybeaver_state.pt

The included model.safetensors is kept for compatibility with training output, but the current local loader should prefer busybeaver_state.pt.

Loading

Use the BusyBeaver local implementation from the adapter or training repo. The loader instantiates BusyBeaverQDeltaForCausalLM from config.json, then loads busybeaver_state.pt.

import torch
from busybeaver.modeling import BusyBeaverQDeltaConfig, BusyBeaverQDeltaForCausalLM

model_dir = "path/to/BusyBeaver-50M"

# Build the architecture from config.json, then load the reloadable weights.
cfg = BusyBeaverQDeltaConfig.from_pretrained(model_dir)
model = BusyBeaverQDeltaForCausalLM(cfg)
state = torch.load(f"{model_dir}/busybeaver_state.pt", map_location="cpu")
model.load_state_dict(state, strict=True)  # strict=True fails on missing or unexpected keys
model.eval()  # switch off training-only behavior such as dropout

Harness Integration

Expose BusyBeaver to normal agent harnesses through the OpenAI-compatible adapter server:

python scripts/busybeaver_openai_server.py \
  --model GestaltLabs/BusyBeaver-50M \
  --host 127.0.0.1 \
  --port 8765

Use http://127.0.0.1:8765/v1 as the OpenAI-compatible base URL and BusyBeaver-50M as the model id. Native support in engines such as llama.cpp, vLLM, or Ollama requires either a BusyBeaver architecture adapter or a future export through a compatible runtime wrapper.
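A client call against that base URL might look like the sketch below, which uses only the Python standard library. It assumes the adapter serves the standard OpenAI `/chat/completions` route, accepts the prompt as a single user message, and returns the JSON action as the message content. None of this is confirmed here, so check the adapter repository for the routes and payload it actually supports. `build_request` and `next_action` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8765/v1"
MODEL_ID = "BusyBeaver-50M"


def build_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build a chat-completions POST request without sending it."""
    payload = {
        "model": model,
        # Assumption: the adapter takes the compact prompt as one user message.
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def next_action(prompt, timeout=30.0):
    """Send the request and parse the reply as one JSON action."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: the action is the content of the first choice.
    return json.loads(body["choices"][0]["message"]["content"])
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything reaches the server.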

Safety

BusyBeaver predicts tool calls; it does not execute them. Production harnesses should validate schema, reject unsafe shell commands, sandbox execution, cap repeated identical actions, and log state/action pairs for trajectory analysis.
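Two of these checks, rejecting unsafe shell commands and capping repeated identical actions, can be sketched as follows. The denylist, the metacharacter set, and the default limits are illustrative assumptions, not the adapter's actual policy. A production harness would more likely use an allowlist together with sandboxing.

```python
import json
import shlex
from collections import deque

# Illustrative denylist only; not taken from the adapter.
BLOCKED_PROGRAMS = {"rm", "sudo", "dd", "mkfs", "shutdown", "reboot"}
# Characters that enable chaining, substitution, or redirection.
SHELL_METACHARS = set(";|&`$><")


def is_unsafe_shell(command):
    """Return True when a command should be refused before execution."""
    if any(ch in SHELL_METACHARS for ch in command):
        return True
    try:
        tokens = shlex.split(command)
    except ValueError:  # for example, an unterminated quote
        return True
    return not tokens or tokens[0] in BLOCKED_PROGRAMS


class RepeatGuard:
    """Refuse an action once it has recurred too often in a recent window."""

    def __init__(self, max_repeats=3, window=8):
        self.max_repeats = max_repeats
        self.recent = deque(maxlen=window)

    def allow(self, action):
        # Identity is the tool plus its arguments, with keys in a stable order.
        key = json.dumps(
            {"tool": action.get("tool"), "args": action.get("args")},
            sort_keys=True,
        )
        if sum(1 for seen in self.recent if seen == key) >= self.max_repeats:
            return False
        self.recent.append(key)
        return True
```

A refused action is a natural point for the harness to route to `clarify` or `escalate` instead of retrying the same call.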

Limitations

  • Specialized policy model, not a general assistant.
  • Depends on BusyBeaver/Hermes compact state formatting.
  • Concrete argument reliability depends on the harness argument resolver.
  • Browser-agent data has not yet been a main training target.
  • Custom architecture requires the BusyBeaver loader/adapter unless exported through a compatible runtime wrapper.

Provenance

  • Internal run label: V12 path-grounding
  • Training hardware: RunPod GPU pod
  • Promoted checkpoint: 250
  • Full checkpoint archive: GestaltLabs/BusyBeaver-50M-v12-path-grounding-runpod
  • Training payload: DJLougen/busybeaver-training-payload-v12-path-grounding