Brainwaves (Instruct)
| Quant   | arc   | arc/e | boolq | hswag | obkqa | piqa  | wino  |
|---------|-------|-------|-------|-------|-------|-------|-------|
| bf16    | 0.683 | 0.858 | 0.910 | 0.797 | 0.494 | 0.820 | 0.755 |
| mxfp8   | 0.695 | 0.869 | 0.910 | 0.791 | 0.504 | 0.824 | 0.760 |
| qx64-hi | 0.688 | 0.859 | 0.903 |       |       |       |       |
| Quant   | Perplexity    | Peak Memory | Tokens/sec |
|---------|---------------|-------------|------------|
| mxfp8   | 4.006 ± 0.026 | 34.74 GB    | 187        |
| qx64-hi | 4.098 ± 0.027 | 25.64 GB    | 208        |
DavidAU/Qwen3.5-27B-Claude-4.6-OS-INSTRUCT

| Quant   | arc   | arc/e | boolq | hswag | obkqa | piqa  | wino  |
|---------|-------|-------|-------|-------|-------|-------|-------|
| mxfp8   | 0.675 | 0.827 | 0.900 | 0.750 | 0.496 | 0.800 | 0.721 |
| qx86-hi | 0.667 | 0.824 | 0.902 | 0.752 | 0.502 | 0.791 | 0.725 |
Qwen3.6-27B-Instruct

| Quant   | arc   | arc/e | boolq | hswag | obkqa | piqa  | wino  |
|---------|-------|-------|-------|-------|-------|-------|-------|
| qx86-hi | 0.637 | 0.798 | 0.911 | 0.775 | 0.442 | 0.807 | 0.737 |
Qwen3.6-27B-Qwopus-GLM-Instruct

| Quant   | arc   | arc/e | boolq | hswag | obkqa | piqa  | wino  |
|---------|-------|-------|-------|-------|-------|-------|-------|
| qx86-hi | 0.656 | 0.826 | 0.910 | 0.776 | 0.474 | 0.812 | 0.739 |
| qx64-hi | 0.662 | 0.827 | 0.904 |       |       |       |       |
| Quant   | Perplexity    | Peak Memory | Tokens/sec |
|---------|---------------|-------------|------------|
| qx86-hi | 4.184 ± 0.027 | 32.36 GB    | 208        |
| qx64-hi | 4.184 ± 0.028 | 25.64 GB    | 216        |
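For context on the perplexity columns: perplexity is the exponential of the mean per-token negative log-likelihood over an evaluation set, so lower is better, and the small gaps between quants above suggest quantization costs little quality here. A minimal sketch of the arithmetic (the NLL values are made up purely for illustration):

```python
import math

# Perplexity is exp(mean negative log-likelihood per token); lower is better.
# `token_nlls` is a hypothetical list of per-token NLLs (in nats) from an eval run.
def perplexity(token_nlls: list[float]) -> float:
    return math.exp(sum(token_nlls) / len(token_nlls))

# A mean NLL of ~1.43 nats corresponds to a perplexity of ~4.18,
# the same ballpark as the qx86-hi and qx64-hi rows above.
print(perplexity([1.43, 1.40, 1.46]))  # ~4.179
```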
The merge configuration (mergekit format):

```yaml
models:
  - model: Qwen3.6-27B
    parameters:
      weight: 1.4
  - model: DavidAU/Qwen3.5-27B-Claude-4.6-OS-INSTRUCT
    parameters:
      weight: 0.6
merge_method: nuslerp
dtype: bfloat16
name: Qwen3.6-27B-Claude-4.6-OS
```
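The weights are relative: nuslerp normalizes them, so 1.4/0.6 behaves as roughly a 70/30 blend favoring Qwen3.6-27B. For intuition, here is a minimal sketch of spherical linear interpolation between two weight tensors; it illustrates the idea only and is not mergekit's actual nuslerp implementation.

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two same-shape tensors."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    # Angle between the two (normalized) weight vectors
    omega = np.arccos(np.clip(a_unit @ b_unit, -1.0, 1.0))
    if omega < eps:  # nearly parallel: plain linear interpolation is fine
        return (1 - t) * a + t * b
    so = np.sin(omega)
    out = (np.sin((1 - t) * omega) / so) * a_flat + (np.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape)

# Weights 1.4 and 0.6 normalize to t = 0.6 / (1.4 + 0.6) = 0.3,
# i.e. the merge leans 70/30 toward Qwen3.6-27B.
merged = slerp(np.random.randn(4, 4), np.random.randn(4, 4), t=0.3)
```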
Use with mlx-lm:

```shell
pip install mlx-lm
```
```python
from mlx_lm import load, generate

# Load the model and tokenizer from a local path or Hugging Face repo
model, tokenizer = load("Qwen3.6-27B-Claude-4.6-OS-bf16")

prompt = "hello"

# Apply the chat template when the tokenizer defines one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
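To run one of the quantized variants benchmarked above, load it the same way. The repo name below is an assumption based on the quant labels in the tables; substitute the actual repo id or local path:

```python
from mlx_lm import load, generate

# Hypothetical quantized variant name; replace with the real repo or local path.
model, tokenizer = load("Qwen3.6-27B-Claude-4.6-OS-qx64-hi")
print(generate(model, tokenizer, prompt="hello", max_tokens=64))
```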
Base model: Qwen/Qwen3.6-27B