BusyBeaver-50M

BusyBeaver-50M is a compact agent-policy model for strict JSON tool-call prediction. It is not a general chatbot. It receives a compact agent state, goal, recent observations, and available tool schemas, then predicts exactly one next tool call for a local agent harness.

Intended Adapter Use

BusyBeaver-50M is intended to work with the BusyBeaver Hermes Adapter / harness. In production, pair it with a deterministic harness argument resolver: the model selects the tool, and the harness resolves the concrete arguments.

This repository currently packages the RunPod-trained V12 path-grounding checkpoint 250. The full checkpoint archive is GestaltLabs/BusyBeaver-50M-v12-path-grounding-runpod.

Hermes Adapter

A standalone BusyBeaver Hermes adapter package is available on GitHub:

https://github.com/DJLougen/BusyBeaver-Hermes-Adapter

The adapter runs BusyBeaver as a compact OpenAI-compatible policy endpoint, detects BusyBeaver model selections inside Hermes-style harnesses, and maps strict JSON BusyBeaver actions into harness-native tool events and deterministic artifacts.

BusyBeaver should not replace the full Hermes controller. It is a tiny local tool-policy helper for deterministic operations: inspect, test, patch, diff, safe shell, recovery, memory, cron/message routing, and escalation gates.

Production Contract

BusyBeaver-50M is strongest when the harness supplies compact state and then validates/resolves the emitted action:

Model emits one strict JSON object.
Harness validates tool name and schema.
Harness resolves concrete arguments from structured state when needed, especially file paths, commands, cron fields, and message targets.
Harness enforces safety gates before execution.

This keeps the model tiny while working around the main weakness of sub-100M models: they cannot reliably copy arbitrarily long paths or commands verbatim from context.
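The contract above can be sketched as a small validate-then-resolve step. This is an illustrative sketch, not the adapter's actual resolver: the helper names (`parse_action`, `resolve_path`, `resolve_action`) are hypothetical, the allowed tool set is copied from the canonical tool names in this card, and matching an emitted path to a known path by file name is only one assumed resolution strategy.

```python
import json

# Tool names copied from this card's canonical tool list; trim to what your harness exposes.
ALLOWED_TOOLS = {
    "read_file", "list_files", "run_shell", "run_tests", "apply_patch", "git_diff",
    "remember", "retrieve_memory", "cron_create", "cron_update", "message_send",
    "clarify", "escalate",
}
# Top-level fields from the documented output schema.
REQUIRED = {"tool": str, "args": dict, "confidence": (int, float), "state_update": str}


def parse_action(raw: str) -> dict:
    """Parse strict JSON, then check the top-level schema and the tool name."""
    action = json.loads(raw)  # raises ValueError on anything that is not pure JSON
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    for key, typ in REQUIRED.items():
        if not isinstance(action.get(key), typ) or isinstance(action.get(key), bool):
            raise ValueError(f"missing or mistyped field: {key}")
    if action["tool"] not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {action['tool']}")
    return action


def resolve_path(emitted: str, known_paths: list[str]) -> str:
    """Hypothetical strategy: exact match first, then a unique file-name match."""
    if emitted in known_paths:
        return emitted
    name = emitted.rsplit("/", 1)[-1]
    candidates = [p for p in known_paths if p.rsplit("/", 1)[-1] == name]
    if len(candidates) == 1:
        return candidates[0]
    raise LookupError(f"cannot resolve path: {emitted}")


def resolve_action(raw: str, known_paths: list[str]) -> dict:
    """Validate the emitted action, then replace its path with one from structured state."""
    action = parse_action(raw)
    path = action["args"].get("path")
    if isinstance(path, str):
        action["args"]["path"] = resolve_path(path, known_paths)
    return action
```

Safety gating would run after this step and before execution. Raising on an ambiguous path, rather than guessing, leaves the harness free to route the case to clarify or escalate.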

Input Format

<|system|>
You are BusyBeaver, a small tool-policy model. Emit exactly one JSON object matching the schema. Do not explain.
<|goal|>
...
<|state|>
...
<|tools|>
...
<|output_schema|>
{"tool":"string","args":"object","confidence":"number","state_update":"string"}
<|assistant|>

Expected output is strict JSON only:

{"tool":"read_file","args":{"path":"src/parser.py"},"confidence":0.97,"state_update":"Read the referenced file before editing."}
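A prompt in this layout could be assembled along the following lines. This is a minimal sketch: `build_prompt` is a hypothetical helper, and both the compact-JSON serialization of the tool schemas and the trailing newline after `<|assistant|>` are assumptions, since this card does not specify how the elided sections are encoded.

```python
import json

SYSTEM = (
    "You are BusyBeaver, a small tool-policy model. "
    "Emit exactly one JSON object matching the schema. Do not explain."
)
OUTPUT_SCHEMA = '{"tool":"string","args":"object","confidence":"number","state_update":"string"}'


def build_prompt(goal: str, state: str, tools: list[dict]) -> str:
    """Join the documented section markers with their contents, one per line."""
    # Assumption: tool schemas are passed as compact JSON on a single line.
    tools_text = json.dumps(tools, separators=(",", ":"))
    parts = [
        "<|system|>", SYSTEM,
        "<|goal|>", goal,
        "<|state|>", state,
        "<|tools|>", tools_text,
        "<|output_schema|>", OUTPUT_SCHEMA,
        "<|assistant|>", "",  # the model continues from here
    ]
    return "\n".join(parts)


prompt = build_prompt(
    goal="Fix the failing parser test.",
    state="last_observation: tests failed in src/parser.py",
    tools=[{"name": "read_file", "args": {"path": "string"}}],
)
```

Keeping the goal and state strings short matters here, because the context length used in training and evaluation was 2048 tokens.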

Canonical Tools

read_file
list_files
run_shell / Hermes shell
run_tests
apply_patch
git_diff
remember / Hermes memory_write
retrieve_memory
cron_create, cron_update
message_send
clarify, escalate

Evaluation

V12 checkpoint 250 raw checkpoint validation:

Metric	Score
JSON validity	1.0000
Schema validity	0.9792
Correct tool	0.9818
Arg semantic	0.6510

V12 with harness argument resolver on frozen evals:

Eval	JSON	Schema	Correct Tool	Arg Semantic	Unsafe Cmd	Placeholder
`frozen_path_grounding_v2`	1.0000	1.0000	1.0000	0.9792	0.0000	0.0000
`frozen_harness_v1`	1.0000	1.0000	1.0000	0.9000	0.0000	0.0000

Without the resolver, the V11 baseline scored correct_tool=0.4167 and arg_sem=0.0000 on a 24-row adversarial path-copy sample. V12 combined with the harness resolver addresses that product-level failure mode.
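For readers who want comparable numbers on their own data, the first three columns could be computed roughly as below. The exact metric definitions used for the tables above are not stated in this card, so this is a guess at reasonable definitions: `score` is a hypothetical helper, and the argument-semantic metric is left out because its matching rule is unknown. Tool correctness is scored independently of schema validity, which is consistent with the raw table reporting a higher correct-tool score than schema score.

```python
import json

# Top-level fields from the documented output schema.
FIELDS = {"tool": str, "args": dict, "confidence": (int, float), "state_update": str}


def score(rows: list[tuple[str, str]]) -> dict:
    """rows holds (raw_model_output, expected_tool) pairs; returns per-metric rates."""
    if not rows:
        raise ValueError("need at least one row")
    json_ok = schema_ok = tool_ok = 0
    for raw, expected_tool in rows:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # invalid JSON fails every metric
        json_ok += 1
        if not isinstance(obj, dict):
            continue
        if all(isinstance(obj.get(k), t) for k, t in FIELDS.items()):
            schema_ok += 1
        if obj.get("tool") == expected_tool:
            tool_ok += 1
    n = len(rows)
    return {"json": json_ok / n, "schema": schema_ok / n, "correct_tool": tool_ok / n}
```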

Model Size

Parameters: 49,382,784
Tokenizer: 16k BusyBeaver policy tokenizer
Context length used in training/eval: 2048 tokens
Architecture: BusyBeaver QDelta causal LM
Reloadable weights: busybeaver_state.pt

The included model.safetensors is kept for compatibility with training output, but the current local loader should prefer busybeaver_state.pt.

Loading

Use the BusyBeaver local implementation from the adapter or training repo. The loader instantiates BusyBeaverQDeltaForCausalLM from config.json, then loads busybeaver_state.pt.

import torch
from busybeaver.modeling import BusyBeaverQDeltaConfig, BusyBeaverQDeltaForCausalLM

model_dir = "path/to/BusyBeaver-50M"
# Build the architecture from config.json.
cfg = BusyBeaverQDeltaConfig.from_pretrained(model_dir)
model = BusyBeaverQDeltaForCausalLM(cfg)
# Load the reloadable weights on CPU; strict=True fails fast on any key mismatch.
state = torch.load(f"{model_dir}/busybeaver_state.pt", map_location="cpu")
model.load_state_dict(state, strict=True)
# Switch to inference mode.
model.eval()

Harness Integration

Expose BusyBeaver to normal agent harnesses through the OpenAI-compatible adapter server:

python scripts/busybeaver_openai_server.py \
  --model GestaltLabs/BusyBeaver-50M \
  --host 127.0.0.1 \
  --port 8765

Use http://127.0.0.1:8765/v1 as the OpenAI-compatible base URL and BusyBeaver-50M as the model id. Native support in engines such as llama.cpp, vLLM, or Ollama requires either a BusyBeaver architecture adapter or a future export through a compatible runtime wrapper.
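A client call against that base URL might look like the sketch below, using only the Python standard library. The base URL and model id come from this card. Everything else is an assumption based on the usual OpenAI chat-completions convention rather than on the adapter's documented behavior: the `/chat/completions` route, the single user message carrying the compact prompt, the `temperature` and `max_tokens` values, and the `choices[0].message.content` response shape. `build_request` and `next_action` are hypothetical helpers.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8765/v1"  # from this card
MODEL_ID = "BusyBeaver-50M"            # from this card


def build_request(prompt: str) -> urllib.request.Request:
    """Build a POST request in the conventional OpenAI chat-completions shape."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,   # assumed: deterministic decoding suits a tool policy
        "max_tokens": 256,  # assumed cap; one JSON action is short
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def next_action(prompt: str, timeout: float = 30.0) -> dict:
    """Send the prompt to a running adapter server and parse the emitted action."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # Assumed response shape; the content should be the strict JSON action.
    return json.loads(body["choices"][0]["message"]["content"])
```

`next_action` requires the adapter server to be running. The returned object should still go through harness validation before anything is executed.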

Safety

BusyBeaver predicts tool calls; it does not execute them. Production harnesses should validate schema, reject unsafe shell commands, sandbox execution, cap repeated identical actions, and log state/action pairs for trajectory analysis.
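Three of these checks (rejecting unsafe shell commands, capping repeated identical actions, and logging state/action pairs) can be sketched as follows. The allowlist, the metacharacter set, the `command` argument name, and the `ActionGate` class are all hypothetical choices for illustration; a production harness would define its own policy and would still sandbox execution.

```python
import json
import shlex
from collections import deque

# Hypothetical allowlist and metacharacter set; replace with your own policy.
SAFE_BINARIES = {"ls", "cat", "grep", "git", "pytest"}
SHELL_METACHARS = set(";|&`$><\n")


def is_safe_shell(command: str) -> bool:
    """Allow only simple commands whose first word is on the allowlist."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False  # no chaining, piping, substitution, or redirection
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes and similar parse failures
    return bool(argv) and argv[0] in SAFE_BINARIES


class ActionGate:
    """Blocks unsafe shell calls and loops of identical actions; logs every decision."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.recent = deque(maxlen=max_repeats)
        self.log = []  # state/action pairs for trajectory analysis

    def allow(self, state: str, action: dict) -> bool:
        key = json.dumps({"tool": action["tool"], "args": action["args"]}, sort_keys=True)
        ok = True
        # Assumption: run_shell carries its command under args["command"].
        if action["tool"] == "run_shell":
            ok = is_safe_shell(str(action["args"].get("command", "")))
        # Block once the last max_repeats actions were all identical to this one.
        if len(self.recent) == self.max_repeats and all(k == key for k in self.recent):
            ok = False
        self.recent.append(key)
        self.log.append({"state": state, "action": action, "allowed": ok})
        return ok
```

An allowlist is used instead of a blocklist because the set of dangerous commands is open-ended, while the set a deterministic harness actually needs is small.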

Limitations

Specialized policy model, not a general assistant.
Depends on BusyBeaver/Hermes compact state formatting.
Concrete argument reliability depends on the harness argument resolver.
Browser-agent data has not yet been a main training target.
Custom architecture requires the BusyBeaver loader/adapter unless exported through a compatible runtime wrapper.

Provenance

Internal run label: V12 path-grounding
Training hardware: RunPod GPU pod
Promoted checkpoint: 250
Full checkpoint archive: GestaltLabs/BusyBeaver-50M-v12-path-grounding-runpod
Training payload: DJLougen/busybeaver-training-payload-v12-path-grounding
