Qwen3.6-27B — Claude Opus Reasoning Distilled
Qwen3.6-27B fine-tuned with ~14k Claude 4.6 Opus reasoning traces — structured, efficient thinking for coding, math, and analytical tasks.
🙏 This model was trained following the methodology and pipeline guide by Jackrong, adapted for Qwen3.6-27B and extended with additional datasets and quantization options.
📦 Looking for GGUF quantized versions (llama.cpp, Ollama)? → rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled-GGUF
🎯 Why This Model Exists
Qwen3.6-27B is one of the most capable open-weight 27B models ever released — it outperforms models 10× its size on coding benchmarks and rivals closed frontier models. But raw capability alone isn't enough.
The base model has a known weakness: verbose, repetitive reasoning loops on straightforward queries. It over-thinks simple tasks and produces unnecessarily long chains of thought that hurt inference speed and readability.
This fine-tune addresses that directly by distilling the structured, efficient reasoning style of Claude 4.6 Opus into Qwen3.6-27B. The goal is not to change what the model knows, but how it thinks:
- ✅ Structured `<think>...</think>` block before every response
- ✅ Concise reasoning on simple tasks, deep analysis on hard ones
- ✅ Claude-style step-by-step decomposition
- ✅ Reduced redundant cognitive loops
- ✅ Preserved base model capabilities
🧠 Learned Reasoning Pattern
The model adopts a Claude-style structured reasoning scaffold:
```
<think>
Let me analyze this request carefully:
1. Identify the core objective of the problem.
2. Break the task into clearly defined subcomponents.
3. Evaluate constraints and edge cases.
4. Formulate a step-by-step solution plan.
5. Execute the reasoning sequentially and verify consistency.
</think>
[Final Answer]
```
📊 Base Model Benchmarks (Qwen3.6-27B)
Qwen3.6-27B is the base model. These are its official benchmark results — the fine-tune inherits this capability while improving reasoning structure.
Language & Coding
| Benchmark | Qwen3.5-27B | Qwen3.5-397B | Gemma4-31B | Claude 4.5 Opus | Qwen3.6-35B-A3B | Qwen3.6-27B |
|---|---|---|---|---|---|---|
| SWE-bench Verified | 75.0 | 76.2 | 52.0 | 80.9 | 73.4 | 77.2 |
| SWE-bench Pro | 51.2 | 50.9 | 35.7 | 57.1 | 49.5 | 53.5 |
| SWE-bench Multilingual | 69.3 | 69.3 | 51.7 | 77.5 | 67.2 | 71.3 |
| Terminal-Bench 2.0 | 41.6 | 52.5 | 42.9 | 59.3 | 51.5 | 59.3 |
| SkillsBench Avg | 27.2 | 30.0 | 23.6 | 45.3 | 28.7 | 48.2 |
| LiveCodeBench v6 | 80.7 | 83.6 | 80.0 | 84.8 | 80.4 | 83.9 |
Knowledge & Reasoning
| Benchmark | Qwen3.5-27B | Qwen3.5-397B | Gemma4-31B | Claude 4.5 Opus | Qwen3.6-35B-A3B | Qwen3.6-27B |
|---|---|---|---|---|---|---|
| MMLU-Pro | 86.1 | 87.8 | 85.2 | 89.5 | 85.2 | 86.2 |
| MMLU-Redux | 93.2 | 94.9 | 93.7 | 95.6 | 93.3 | 93.5 |
| GPQA Diamond | 85.5 | 88.4 | 84.3 | 87.0 | 86.0 | 87.8 |
| AIME 2026 | 92.6 | 93.3 | 89.2 | 95.1 | 92.7 | 94.1 |
| HMMT Feb 2026 | 84.3 | 87.9 | 77.2 | 85.3 | 83.6 | 84.3 |
| HLE | 24.3 | 28.7 | 19.5 | 30.8 | 21.4 | 24.0 |
Source: Qwen3.6-27B official release
Fine-tuned Model
| Metric | Value |
|---|---|
| Train Loss (final) | 0.305 |
| Training Duration | 4h 28min |
| Training Hardware | NVIDIA RTX PRO 6000 Blackwell 96GB |
| MMLU-Pro (smoke test) | coming soon |
🗺️ Training Pipeline
```
Base Model: Qwen/Qwen3.6-27B (27B dense, multimodal)
        │
        ▼
4-bit quantized loading via Unsloth
        │
        ▼
LoRA Rank-64 Adapter attached
  (q_proj, k_proj, v_proj, o_proj,
   gate_proj, up_proj, down_proj, out_proj)
        │
        ▼
SFT — Response-Only Training
  Masked on: "<|im_start|>assistant\n<think>"
  Chat template: qwen3-thinking
        │
        ▼
rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled
```
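As a back-of-the-envelope check on adapter size, a rank-r LoRA pair adds r·(d_in + d_out) weights per targeted projection. The helper below is illustrative only; the 4096-dimension example is an assumed hidden size, not Qwen3.6-27B's actual layer shapes.

```python
def lora_param_count(rank: int, d_in: int, d_out: int) -> int:
    # LoRA factorizes the weight update as B @ A, with A: (rank, d_in)
    # and B: (d_out, rank), so the adapter adds rank * (d_in + d_out) params.
    return rank * (d_in + d_out)

# Example: one square 4096x4096 projection at rank 64 (dimensions assumed)
print(lora_param_count(64, 4096, 4096))  # → 524288
```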
📚 Datasets
| Dataset | Samples | Purpose |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 3,900 | Filtered high-quality Claude 4.6 Opus reasoning traces |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Large-scale Claude 4.6 Opus distillation data |
| Jackrong/Qwen3.5-reasoning-700x | 700 | Curated step-by-step reasoning, Qwen-specific |
Total: ~14,233 examples after normalization, deduplication, and length filtering (max 8192 tokens).
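A minimal sketch of such a preprocessing pass. The field names, normalization key, and whitespace "tokenizer" below are illustrative assumptions, not the actual pipeline code:

```python
def preprocess(samples, count_tokens, max_tokens=8192):
    """Deduplicate on a normalized prompt key, then drop over-length samples."""
    seen, kept = set(), []
    for s in samples:
        # Normalize whitespace and case before exact-match dedup
        key = " ".join(s["prompt"].split()).lower()
        if key in seen:
            continue
        seen.add(key)
        if count_tokens(s["prompt"]) + count_tokens(s["response"]) <= max_tokens:
            kept.append(s)
    return kept

# Toy run with a whitespace tokenizer (a real run would use the model tokenizer)
data = [
    {"prompt": "What is 2+2?", "response": "4"},
    {"prompt": "what is  2+2?", "response": "four"},      # duplicate after normalization
    {"prompt": "Explain LoRA", "response": "word " * 9000},  # over the length cap
]
print(len(preprocess(data, lambda t: len(t.split()))))  # → 1
```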
⚙️ Training Configuration
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3.6-27B |
| Framework | Unsloth + TRL SFTTrainer |
| LoRA rank | 64 |
| LoRA alpha | 64 |
| Target modules | All attention + MLP projections |
| Load precision | 4-bit (training) |
| Export precision | 16-bit merged |
| Effective batch size | 36 (2 × 18 grad accum) |
| Learning rate | 2e-4 |
| LR scheduler | Linear |
| Epochs | 1 |
| Max sequence length | 8192 tokens |
| Optimizer | AdamW 8-bit |
| Supervision | Response-only (assistant turns only) |
| Chat template | qwen3-thinking |
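The table maps onto TRL-style hyperparameter names roughly as follows. This is a sketch for orientation only: the field names mirror common `SFTConfig` arguments but are not a verified, runnable training config.

```python
# Hyperparameters from the table above, as a plain dict (names are assumptions)
train_config = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 18,  # 2 x 18 = effective batch of 36
    "learning_rate": 2e-4,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
    "max_seq_length": 8192,
    "optim": "adamw_8bit",
}

def effective_batch_size(cfg: dict) -> int:
    # Effective batch = per-device batch x gradient accumulation steps
    return cfg["per_device_train_batch_size"] * cfg["gradient_accumulation_steps"]

print(effective_batch_size(train_config))  # → 36
```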
💻 Usage
Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Implement a binary search tree in Python with insert and search methods."}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant turn so generation starts in <think>
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=0.6,  # recommended coding-mode sampling, see table below
    top_p=0.95,
    top_k=20,
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
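Since the model emits its reasoning inside `<think>...</think>` before the answer, post-processing often wants the two separated. A small helper, assuming a single think block per response as trained (the function name is illustrative):

```python
import re

def split_reasoning(text: str):
    """Split a generation into (reasoning, answer).

    Assumes at most one <think>...</think> block before the final answer;
    returns empty reasoning if no block is found.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>plan the BST</think>\nHere is the code.")
print(answer)  # → Here is the code.
```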
vLLM
```shell
pip install vllm

vllm serve rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled \
  --port 8000 \
  --max-model-len 8192 \
  --reasoning-parser qwen3
```
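The server exposes an OpenAI-compatible chat API, so any OpenAI client pointed at `http://localhost:8000/v1` works. The helper below simply constructs the JSON request body; the function name is illustrative, and the sampling values are this card's coding-mode recommendations:

```python
def chat_request(prompt: str) -> dict:
    # Body for POST http://localhost:8000/v1/chat/completions
    # (endpoint and port match the serve command above)
    return {
        "model": "rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 4096,
    }

body = chat_request("Implement a binary search tree in Python.")
print(body["temperature"])  # → 0.6
```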
⚡ Speculative Decoding with MTP (vLLM)
Qwen3.6 supports Multi-Token Prediction (MTP) for significantly faster inference. Community tests show ~90% acceptance rate on this fine-tuned model — higher than typical, thanks to the structured reasoning training. Generation throughput reaches 60+ tok/s with MTP enabled, compared to ~25 tok/s standard.
```shell
vllm serve rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled \
  --port 8000 \
  --max-model-len 8192 \
  --reasoning-parser qwen3 \
  --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
```
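Under a simple geometric model of speculative decoding, if each draft token is accepted independently with probability α and `num_speculative_tokens` is k, the target model emits on average 1 + α + … + α^k tokens per forward pass. This first-order estimate (not vLLM's exact accounting) is consistent with the numbers above: α ≈ 0.9 with k = 2 gives roughly 2.7 tokens per verification step.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    # One token is always produced by the verification pass; the i-th draft
    # token survives only if all earlier drafts were accepted (prob alpha**i).
    return sum(alpha**i for i in range(k + 1))

print(round(expected_tokens_per_step(0.9, 2), 2))  # → 2.71
```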
SGLang
```shell
python -m sglang.launch_server \
  --model-path rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled \
  --port 8000 \
  --reasoning-parser qwen3
```
Recommended Sampling Parameters
| Mode | temperature | top_p | top_k | presence_penalty |
|---|---|---|---|---|
| Thinking (general) | 1.0 | 0.95 | 20 | 0.0 |
| Thinking (coding) | 0.6 | 0.95 | 20 | 0.0 |
| Non-thinking | 0.7 | 0.80 | 20 | 1.5 |
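For convenience, the same presets as plain generation kwargs (values copied verbatim from the table; the preset names are arbitrary):

```python
# Sampling presets from the table above, keyed by mode
SAMPLING_PRESETS = {
    "thinking_general": {"temperature": 1.0, "top_p": 0.95, "top_k": 20, "presence_penalty": 0.0},
    "thinking_coding":  {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "presence_penalty": 0.0},
    "non_thinking":     {"temperature": 0.7, "top_p": 0.80, "top_k": 20, "presence_penalty": 1.5},
}

print(SAMPLING_PRESETS["thinking_coding"]["temperature"])  # → 0.6
```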
⚠️ Limitations
- Text-only SFT: vision capabilities of the base model are not fine-tuned
- 1 epoch: trained for 1 epoch on ~14k samples
- Hallucination risk: autoregressive LLM — may produce incorrect facts
- Intended use: coding, math, offline analytical tasks, logic-heavy prompting
📖 Citation
```bibtex
@misc{rico03-qwen36-opus-reasoning,
  title  = {Qwen3.6-27B Claude Opus Reasoning Distilled},
  author = {rico03},
  year   = {2026},
  url    = {https://huggingface.co/rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled}
}

@misc{qwen3.6-27b,
  title  = {{Qwen3.6-27B}: Flagship-Level Coding in a {27B} Dense Model},
  author = {{Qwen Team}},
  month  = {April},
  year   = {2026},
  url    = {https://qwen.ai/blog?id=qwen3.6-27b}
}
```
🙏 Acknowledgements
- Jackrong — fine-tuning guide and pipeline this work is based on
- Unsloth — 2x faster fine-tuning with 70% less VRAM
- Qwen Team — for releasing Qwen3.6-27B under Apache 2.0
- All dataset contributors
Released for research and personal use. Not intended for production deployment without additional safety evaluation.