SHRAM: Sparse Hybrid Token Routed Attention Mixture

A research baseline implementing the SHRAM architecture from "An Examination of Sparse Attention for Long Context Purposes." The repository ships no pretrained weights: pull the architecture from the Hub and instantiate a freshly initialised model from the config. Every parameter is overridable at instantiation time via kwargs.

Important: trust_remote_code=True is required. It downloads the architecture source files from the Hub and imports them into your Python process. Review the source at smithblack-0/SHRAM before use. Those interested can also clone the git repository at https://github.com/smithblack-0/advanced-transformers-lib.
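
If you want to guarantee that the code you reviewed is the code that runs, you can pin the download to a specific commit with the standard transformers revision kwarg. A minimal sketch; the SHA below is a placeholder, not a real commit:

from transformers import AutoConfig

# Pin the remote code to the commit you audited.
# "<reviewed-commit-sha>" is a placeholder; substitute the real SHA.
config = AutoConfig.from_pretrained(
    "smithblack-0/SHRAM",
    trust_remote_code=True,
    revision="<reviewed-commit-sha>",
)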

Architecture

SHRAM replaces every standard attention layer with a hybrid layer H(x) = h_l(x) + h_s(x):

  • h_l β€” local sliding-window causal attention path.
  • h_s β€” MoSRAH sparse routed path. Each token selects K of L available expert heads via token-choice routing. Bottlenecked Ensemble Attention (BEA) is applied per head.

All other components follow the Llama 3 baseline (RMSNorm, SwiGLU FFN, RoPE).
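
To make the shape of the computation concrete, here is a minimal PyTorch sketch of the hybrid layer. It is illustrative only: names like HybridAttention are not the repository's module API, RoPE and the BEA bottleneck are omitted, and the routed path computes every expert head densely before masking, where a real implementation would skip unselected heads. The repository source is authoritative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridAttention(nn.Module):
    """Sketch of H(x) = h_l(x) + h_s(x): sliding-window plus routed path."""
    def __init__(self, hidden_size, head_dim, num_sliding_window_heads,
                 num_mosrah_heads, num_selected_heads, window_size):
        super().__init__()
        self.hd, self.k, self.window = head_dim, num_selected_heads, window_size
        self.nl, self.ne = num_sliding_window_heads, num_mosrah_heads
        self.local_qkv = nn.Linear(hidden_size, 3 * self.nl * head_dim, bias=False)
        self.local_o = nn.Linear(self.nl * head_dim, hidden_size, bias=False)
        self.expert_qkv = nn.Linear(hidden_size, 3 * self.ne * head_dim, bias=False)
        self.expert_o = nn.Linear(self.ne * head_dim, hidden_size, bias=False)
        self.router = nn.Linear(hidden_size, self.ne, bias=False)

    def _split(self, qkv, n):
        b, t, _ = qkv.shape
        q, k, v = qkv.chunk(3, dim=-1)
        return [u.view(b, t, n, self.hd).transpose(1, 2) for u in (q, k, v)]

    def forward(self, x):                        # x: (batch, seq, hidden)
        b, t, _ = x.shape
        # h_l: causal attention restricted to a trailing window of keys.
        q, k, v = self._split(self.local_qkv(x), self.nl)
        i = torch.arange(t, device=x.device)
        mask = (i[:, None] >= i[None, :]) & (i[:, None] - i[None, :] < self.window)
        h_l = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
        h_l = self.local_o(h_l.transpose(1, 2).reshape(b, t, -1))
        # h_s: token-choice routing. Every expert head is computed densely
        # here for clarity; each token then keeps only its top-K heads,
        # weighted by the router gate.
        gates = self.router(x).softmax(dim=-1)          # (b, t, num_experts)
        topv, topi = gates.topk(self.k, dim=-1)         # K of L heads per token
        keep = torch.zeros_like(gates).scatter_(-1, topi, topv)
        q, k, v = self._split(self.expert_qkv(x), self.ne)
        dense = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        dense = dense.transpose(1, 2)                   # (b, t, experts, head_dim)
        h_s = self.expert_o((dense * keep.unsqueeze(-1)).reshape(b, t, -1))
        return h_l + h_s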

Usage

This repository contains no pretrained weights. The intended workflow is: pull the architecture config from the Hub, instantiate a model with fresh random weights, then train it yourself.

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Step 1: pull the architecture config from the Hub.
# AutoConfig.from_pretrained downloads config.json only; no weights are loaded.
# Override any parameter via kwargs.
config = AutoConfig.from_pretrained(
    "smithblack-0/SHRAM",
    trust_remote_code=True,
    num_hidden_layers=16,       # example override
    num_mosrah_heads=32,        # example override
)

# Step 2: instantiate with fresh random weights.
# from_config never loads a checkpoint; it always produces a randomly initialised model.
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

# Step 3: load the tokenizer.
tokenizer = AutoTokenizer.from_pretrained("smithblack-0/SHRAM")
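
From there, training is up to you. A minimal training step, assuming only the standard Hugging Face causal-LM interface (forward accepts labels and returns a loss); a real run would use an actual data pipeline, scheduler, and checkpointing:

import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Stand-in batch of random token ids; substitute your own data loader.
ids = torch.randint(0, config.vocab_size, (2, 128))
loss = model(input_ids=ids, labels=ids).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()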

After training your own checkpoint, save and reload it in the standard way:

model.save_pretrained("./my-checkpoint")
model = AutoModelForCausalLM.from_pretrained("./my-checkpoint", trust_remote_code=True)

Constructor Defaults

The values below are the defaults produced by AutoConfig.from_pretrained when no overrides are passed. They are not the parameters of a pretrained model; this repository contains no weights. All values are overridable via kwargs, as the snippet after the table demonstrates.

Parameter                    Default
alpha                        1.0
attention_dropout            0.0
beta                         32.0
dtype                        None
head_dim                     16
hidden_size                  512
inference_sequence_length    1024
intermediate_size            1366
local_rope_theta             10000.0
mosrah_rope_theta            10000.0
num_hidden_layers            12
num_mosrah_heads             16
num_selected_heads           16
num_sliding_window_heads     16
output_hidden_states         False
rms_norm_eps                 1e-05
rope_mode                    main_sequence
tie_word_embeddings          False
training_sequence_length     1024
use_cache                    True
vocab_size                   50277
window_size                  128
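
A quick sanity check that kwargs overrides actually land in the config (assumes Hub access; window_size values taken from the table above):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("smithblack-0/SHRAM", trust_remote_code=True)
print(config.window_size)   # 128, the table default

config = AutoConfig.from_pretrained(
    "smithblack-0/SHRAM", trust_remote_code=True, window_size=256
)
print(config.window_size)   # 256, the kwargs override wins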

License

MIT. Clean-room synthesis informed by the reference paper. Tokenizer is GPT-NeoX (EleutherAI/gpt-neox-20b, Apache 2.0).
