Wind-Edge-1.6-Instruct
Wind-Edge-1.6-Instruct is a compact custom Qwen3-compatible assistant model for local and edge inference. It was built from a depth-pruned Wind-Edge base and tuned with a Claude-heavy public distillation SFT mix, code/math instruction data, and a final behavior polish pass.
This is a small model. It is intended for short answers, simple coding help, summaries, and lightweight local assistant use. It is not a replacement for large reasoning models.
Recommended Usage
Use trust_remote_code=True; the custom loader re-applies tied weights from model.safetensors.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "arthu1/Wind-Edge-1.6-Instruct"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Who are you?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    repetition_penalty=1.06,
    eos_token_id=[
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|im_end|>"),
    ],
)
print(tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
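The last line of the example decodes only the new tokens, because for decoder-only models generate returns the prompt tokens followed by the completion. The toy sketch below shows the same slicing on plain Python lists, plus a cut at the first end-of-sequence id. The helper name new_tokens and the token ids are made up for illustration; in the real example, generate stops at either EOS id and skip_special_tokens=True hides it during decoding.

```python
def new_tokens(output_ids, prompt_len, eos_ids):
    """Drop the echoed prompt, then cut at the first end-of-sequence id.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    completion = output_ids[prompt_len:]
    for i, tok in enumerate(completion):
        if tok in eos_ids:
            return completion[:i]
    return completion


# Made-up ids: three prompt tokens, two generated tokens, then an EOS id (99).
print(new_tokens([1, 2, 3, 10, 11, 99], prompt_len=3, eos_ids={98, 99}))
```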
Suggested Settings
For chat:
- enable_thinking=False
- temperature=0.55-0.7
- top_p=0.85-0.92
- repetition_penalty=1.05-1.08
- max_new_tokens=128-512
For deterministic tests:
- do_sample=False
- repetition_penalty=1.06
- Keep prompts short and direct.
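One way to keep these settings in one place is a pair of small helpers that return keyword arguments for model.generate. This is a minimal sketch: the names CHAT_RANGES, chat_kwargs, and deterministic_kwargs are hypothetical, the chat defaults are copied from the usage example above, and the default max_new_tokens for deterministic runs is an assumption.

```python
# Suggested chat ranges from this card, as (low, high) bounds.
CHAT_RANGES = {
    "temperature": (0.55, 0.7),
    "top_p": (0.85, 0.92),
    "repetition_penalty": (1.05, 1.08),
    "max_new_tokens": (128, 512),
}


def chat_kwargs(temperature=0.6, top_p=0.9, repetition_penalty=1.06, max_new_tokens=256):
    """Return sampling kwargs, rejecting values outside the suggested ranges."""
    kwargs = {
        "temperature": temperature,
        "top_p": top_p,
        "repetition_penalty": repetition_penalty,
        "max_new_tokens": max_new_tokens,
    }
    for name, value in kwargs.items():
        low, high = CHAT_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the suggested range {low}-{high}")
    kwargs["do_sample"] = True
    return kwargs


def deterministic_kwargs(max_new_tokens=256):
    """Return greedy-decoding kwargs for repeatable tests (max_new_tokens default is assumed)."""
    return {"do_sample": False, "repetition_penalty": 1.06, "max_new_tokens": max_new_tokens}
```

With these helpers, a call would look like model.generate(**inputs, **chat_kwargs()). Note that enable_thinking=False is passed to tokenizer.apply_chat_template, not to generate, so it is left out of both dictionaries.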
The bundled chat template injects a minimal default identity system message if no system message is supplied:
You are Wind-Edge-1.6, a compact AI assistant model. You are not a human.
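The effect of that default can be sketched in plain Python. This is an illustration of the described behavior, not the bundled template itself (which lives in the tokenizer's chat template); the function name with_default_system is hypothetical, and the sketch assumes that any message with the system role counts as "supplied".

```python
DEFAULT_SYSTEM = "You are Wind-Edge-1.6, a compact AI assistant model. You are not a human."


def with_default_system(messages):
    """Prepend the default identity message only when no system message is present."""
    if any(m.get("role") == "system" for m in messages):
        return list(messages)
    return [{"role": "system", "content": DEFAULT_SYSTEM}] + list(messages)


# No system message supplied, so the default identity is added first.
for message in with_default_system([{"role": "user", "content": "Who are you?"}]):
    print(message["role"], "|", message["content"])
```

To override the identity, pass your own system message as the first entry in messages.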
Training Summary
- Source family: Qwen3-compatible Wind-Edge architecture
- Base: depth-pruned and healed Wind-Edge base from Qwen3-0.6B-compatible weights
- Final SFT:
- 12M tokens of no-thinking distillation SFT
- Claude-style public distillation data plus OpenOrca, OpenHermes, Open-Platypus, OpenCoder, and OpenMathInstruct
- Bad self-identity teacher rows filtered
- 6M-token system-template adaptation pass
- 2M-token local quality polish for identity, simple arithmetic, list sorting, and concise coding behavior
Quick Sanity Outputs
Expected behavior after the final polish:
- hi -> short greeting as Wind-Edge-1.6
- Who are you? -> identifies as Wind-Edge-1.6, not human
- sort this list: [3, 1, 2] -> [1, 2, 3]
- 60 miles in 1.5 hours -> 40 mph
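These checks can be scripted as a quick smoke test. The sketch below assumes a generate callable that maps a prompt string to a reply string (for example, a thin wrapper around the usage code above); the function name run_sanity_checks, the substring matching, and the 200-character limit for a "short" greeting are assumptions made for illustration.

```python
def run_sanity_checks(generate):
    """Run the four prompts above through `generate` and return {prompt: passed}.

    `generate` is any callable taking a prompt string and returning a reply string.
    """
    checks = {
        # "Short" is interpreted here as under 200 characters (assumed threshold).
        "hi": lambda reply: 0 < len(reply.strip()) < 200,
        "Who are you?": lambda reply: "wind-edge-1.6" in reply.lower(),
        "sort this list: [3, 1, 2]": lambda reply: "[1, 2, 3]" in reply,
        "60 miles in 1.5 hours": lambda reply: "40" in reply,
    }
    return {prompt: bool(check(generate(prompt))) for prompt, check in checks.items()}


# Reference values for the two computable prompts.
assert sorted([3, 1, 2]) == [1, 2, 3]
assert 60 / 1.5 == 40.0
```

Substring checks are deliberately loose; with sampling enabled the wording will vary, so running them with the deterministic settings is likely to give more stable results.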
Limitations
Wind-Edge-1.6-Instruct is small and can still make arithmetic, factual, and reasoning mistakes. It may overgeneralize from prompts, and it is best used with concise instructions and verification for important work.
Citation
See wind_edge_1_6_paper.html in this repository for a short technical write-up of the build and tuning process.