Qwen3.6-27B — Claude Opus Reasoning Distilled

Qwen3.6-27B fine-tuned with ~14k Claude 4.6 Opus reasoning traces — structured, efficient thinking for coding, math, and analytical tasks.

🙏 This model was trained following the methodology and pipeline guide by Jackrong, adapted for Qwen3.6-27B and extended with additional datasets and quantization options.

📦 Looking for GGUF quantized versions (llama.cpp, Ollama)? See rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled-GGUF.


🎯 Why This Model Exists

Qwen3.6-27B is one of the most capable open-weight 27B models ever released — it outperforms models 10× its size on coding benchmarks and rivals closed frontier models. But raw capability alone isn't enough.

The base model has a known weakness: verbose, repetitive reasoning loops on straightforward queries. It over-thinks simple tasks and produces unnecessarily long chains of thought that hurt inference speed and readability.

This fine-tune addresses that directly by distilling the structured, efficient reasoning style of Claude 4.6 Opus into Qwen3.6-27B. The goal is not to change what the model knows, but how it thinks:

  • ✅ Structured <think>...</think> before every response
  • ✅ Concise reasoning on simple tasks, deep analysis on hard ones
  • ✅ Claude-style step-by-step decomposition
  • ✅ Reduced redundant cognitive loops
  • ✅ Preserved base model capabilities

🧠 Learned Reasoning Pattern

The model adopts a Claude-style structured reasoning scaffold:

```
<think>
Let me analyze this request carefully:

1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
</think>

[Final Answer]
```
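Downstream code usually wants the reasoning and the final answer separated. A minimal sketch of that parse (the helper name is ours, not part of the model's tooling):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) around a <think>...</think> block."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()          # no reasoning block emitted
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()  # everything after </think> is the answer
    return reasoning, answer

reasoning, answer = split_reasoning("<think>1. Identify the goal.</think>\nHere is the solution.")
```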

📊 Base Model Benchmarks (Qwen3.6-27B)

Qwen3.6-27B is the base model. These are its official benchmark results — the fine-tune inherits this capability while improving reasoning structure.

Benchmark Results

Language & Coding

| Benchmark | Qwen3.5-27B | Qwen3.5-397B | Gemma4-31B | Claude 4.5 Opus | Qwen3.6-35B-A3B | Qwen3.6-27B |
|---|---|---|---|---|---|---|
| SWE-bench Verified | 75.0 | 76.2 | 52.0 | 80.9 | 73.4 | 77.2 |
| SWE-bench Pro | 51.2 | 50.9 | 35.7 | 57.1 | 49.5 | 53.5 |
| SWE-bench Multilingual | 69.3 | 69.3 | 51.7 | 77.5 | 67.2 | 71.3 |
| Terminal-Bench 2.0 | 41.6 | 52.5 | 42.9 | 59.3 | 51.5 | 59.3 |
| SkillsBench Avg | 27.2 | 30.0 | 23.6 | 45.3 | 28.7 | 48.2 |
| LiveCodeBench v6 | 80.7 | 83.6 | 80.0 | 84.8 | 80.4 | 83.9 |

Knowledge & Reasoning

| Benchmark | Qwen3.5-27B | Qwen3.5-397B | Gemma4-31B | Claude 4.5 Opus | Qwen3.6-35B-A3B | Qwen3.6-27B |
|---|---|---|---|---|---|---|
| MMLU-Pro | 86.1 | 87.8 | 85.2 | 89.5 | 85.2 | 86.2 |
| MMLU-Redux | 93.2 | 94.9 | 93.7 | 95.6 | 93.3 | 93.5 |
| GPQA Diamond | 85.5 | 88.4 | 84.3 | 87.0 | 86.0 | 87.8 |
| AIME 2026 | 92.6 | 93.3 | 89.2 | 95.1 | 92.7 | 94.1 |
| HMMT Feb 2026 | 84.3 | 87.9 | 77.2 | 85.3 | 83.6 | 84.3 |
| HLE | 24.3 | 28.7 | 19.5 | 30.8 | 21.4 | 24.0 |

Source: Qwen3.6-27B official release

Fine-tuned Model

| Metric | Value |
|---|---|
| Train loss (final) | 0.305 |
| Training duration | 4 h 28 min |
| Training hardware | NVIDIA RTX PRO 6000 Blackwell (96 GB) |
| MMLU-Pro (smoke test) | coming soon |

🗺️ Training Pipeline

```
Base Model: Qwen/Qwen3.6-27B (27B dense, multimodal)
        │
        ▼
4-bit quantized loading via Unsloth
        │
        ▼
LoRA Rank-64 Adapter attached
(q_proj, k_proj, v_proj, o_proj,
 gate_proj, up_proj, down_proj, out_proj)
        │
        ▼
SFT — Response-Only Training
Masked on: "<|im_start|>assistant\n<think>"
Chat template: qwen3-thinking
        │
        ▼
rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled
```
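Response-only supervision means the loss is computed only on assistant tokens: every label up to and including the assistant marker is set to the ignore index. A toy sketch of that masking step with made-up token IDs (real pipelines, e.g. Unsloth's response-only training utilities, operate on the tokenized chat template):

```python
IGNORE_INDEX = -100  # positions with this label contribute no loss

def mask_prompt_tokens(input_ids: list[int], marker: list[int]) -> list[int]:
    """Copy input_ids into labels, masking everything up to and including
    the last occurrence of the assistant-marker subsequence."""
    labels = list(input_ids)
    for start in range(len(input_ids) - len(marker), -1, -1):
        if input_ids[start:start + len(marker)] == marker:
            for i in range(start + len(marker)):
                labels[i] = IGNORE_INDEX
            break  # marker not found -> labels stay unmasked
    return labels

# toy example: marker [7, 8] stands in for "<|im_start|>assistant\n<think>"
labels = mask_prompt_tokens([1, 2, 7, 8, 5, 6], marker=[7, 8])
# → [-100, -100, -100, -100, 5, 6]
```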

📚 Datasets

| Dataset | Samples | Purpose |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 3,900 | Filtered high-quality Claude 4.6 Opus reasoning traces |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Large-scale Claude 4.6 Opus distillation data |
| Jackrong/Qwen3.5-reasoning-700x | 700 | Curated step-by-step reasoning, Qwen-specific |

Total: ~14,233 examples after normalization, deduplication and length filtering (max 8192 tokens).
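The deduplication and length-filtering step can be sketched as follows. This is a minimal illustration, not the actual pipeline: it uses hash-based exact dedup and a whitespace word count as a crude stand-in for the tokenizer-based 8192-token cap.

```python
def filter_examples(examples: list[dict], max_tokens: int = 8192) -> list[dict]:
    """Drop exact duplicates and over-length samples.

    Each example is {"prompt": ..., "response": ...}. Word count is a rough
    proxy for the tokenizer length the real pipeline presumably used."""
    seen: set[int] = set()
    kept = []
    for ex in examples:
        key = hash((ex["prompt"], ex["response"]))
        length = len(ex["prompt"].split()) + len(ex["response"].split())
        if key in seen or length > max_tokens:
            continue  # duplicate or too long
        seen.add(key)
        kept.append(ex)
    return kept
```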


⚙️ Training Configuration

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3.6-27B |
| Framework | Unsloth + TRL SFTTrainer |
| LoRA rank | 64 |
| LoRA alpha | 64 |
| Target modules | All attention + MLP projections |
| Load precision | 4-bit (training) |
| Export precision | 16-bit merged |
| Effective batch size | 36 (2 × 18 grad accum) |
| Learning rate | 2e-4 |
| LR scheduler | Linear |
| Epochs | 1 |
| Max sequence length | 8192 tokens |
| Optimizer | AdamW 8-bit |
| Supervision | Response-only (assistant turns only) |
| Chat template | qwen3-thinking |

💻 Usage

Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Implement a binary search tree in Python with insert and search methods."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

vLLM

```bash
pip install vllm

vllm serve rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled \
  --port 8000 \
  --max-model-len 8192 \
  --reasoning-parser qwen3
```
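Once the server is up it exposes an OpenAI-compatible API on port 8000. A minimal stdlib-only client sketch (the payload is built by a separate helper so it can be inspected; `build_chat_request` is our name, and the sampling values follow the coding preset recommended below):

```python
import json
from urllib import request  # used by the commented-out call below

def build_chat_request(prompt: str, model: str) -> dict:
    """Assemble an OpenAI-style /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95,
    }

payload = build_chat_request(
    "Write a Python function that reverses a linked list.",
    model="rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled",
)

# Uncomment with the server running:
# req = request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.load(request.urlopen(req))["choices"][0]["message"]["content"])
```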

⚡ Speculative Decoding with MTP (vLLM)

Qwen3.6 supports Multi-Token Prediction (MTP) for significantly faster inference. Community tests show ~90% acceptance rate on this fine-tuned model — higher than typical, thanks to the structured reasoning training. Generation throughput reaches 60+ tok/s with MTP enabled, compared to ~25 tok/s standard.

```bash
vllm serve rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled \
  --port 8000 \
  --max-model-len 8192 \
  --reasoning-parser qwen3 \
  --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
```
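The quoted speedup is consistent with standard speculative-decoding arithmetic: if each of the k drafted tokens is accepted independently with probability a, the expected number of tokens emitted per target-model forward pass is 1 + a + a² + … + a^k. A quick sanity check (the independence assumption is a simplification, and the ~90% figure is the community measurement cited above, not ours):

```python
def expected_tokens_per_step(acceptance: float, num_speculative: int) -> float:
    """Expected tokens emitted per target forward pass: sum of
    acceptance**i for i in 0..num_speculative (i = 0 is the bonus token)."""
    return sum(acceptance**i for i in range(num_speculative + 1))

# With a = 0.9 and 2 speculative tokens: 1 + 0.9 + 0.81 = 2.71 tokens/step,
# roughly in line with the observed ~25 -> 60+ tok/s improvement.
print(round(expected_tokens_per_step(0.9, 2), 2))  # → 2.71
```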

SGLang

```bash
python -m sglang.launch_server \
  --model-path rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled \
  --port 8000 \
  --reasoning-parser qwen3
```

Recommended Sampling Parameters

| Mode | temperature | top_p | top_k | presence_penalty |
|---|---|---|---|---|
| Thinking (general) | 1.0 | 0.95 | 20 | 0.0 |
| Thinking (coding) | 0.6 | 0.95 | 20 | 0.0 |
| Non-thinking | 0.7 | 0.80 | 20 | 1.5 |

⚠️ Limitations

  • Text-only SFT: vision capabilities of the base model are not fine-tuned
  • Single epoch: trained for only one epoch on ~14k samples
  • Hallucination risk: autoregressive LLM — may produce incorrect facts
  • Intended use: coding, math, offline analytical tasks, logic-heavy prompting

📖 Citation

```bibtex
@misc{rico03-qwen36-opus-reasoning,
  title  = {Qwen3.6-27B Claude Opus Reasoning Distilled},
  author = {rico03},
  year   = {2026},
  url    = {https://huggingface.co/rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled}
}

@misc{qwen3.6-27b,
  title  = {{Qwen3.6-27B}: Flagship-Level Coding in a {27B} Dense Model},
  author = {{Qwen Team}},
  month  = {April},
  year   = {2026},
  url    = {https://qwen.ai/blog?id=qwen3.6-27b}
}
```

🙏 Acknowledgements

  • Jackrong — fine-tuning guide and pipeline this work is based on
  • Unsloth — 2x faster fine-tuning with 70% less VRAM
  • Qwen Team — for releasing Qwen3.6-27B under Apache 2.0
  • All dataset contributors

Released for research and personal use. Not intended for production deployment without additional safety evaluation.
