Instructions to use microsoft/phi-2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/phi-2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="microsoft/phi-2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use microsoft/phi-2 with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "microsoft/phi-2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "microsoft/phi-2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker:
```shell
docker model run hf.co/microsoft/phi-2
```
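The curl request above can also be issued from Python using only the standard library. This is a minimal sketch, not part of the official vLLM docs: the helper names are illustrative, and actually sending the request assumes a vLLM server is already running on localhost:8000.

```python
import json
import urllib.request

def build_completion_request(model, prompt, max_tokens=512, temperature=0.5):
    # Mirrors the JSON body used in the curl example above
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def post_completion(base_url, payload):
    # POST the payload to an OpenAI-compatible /v1/completions endpoint
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_completion_request("microsoft/phi-2", "Once upon a time,")

# Requires a running server; uncomment to actually call it:
# print(post_completion("http://localhost:8000", payload))
```

The same payload works against the SGLang server below; only the port changes.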
- SGLang
How to use microsoft/phi-2 with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "microsoft/phi-2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "microsoft/phi-2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images:
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "microsoft/phi-2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "microsoft/phi-2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use microsoft/phi-2 with Docker Model Runner:
```shell
docker model run hf.co/microsoft/phi-2
```
When fine-tuning the model, training begins with zero loss.
```python
import torch
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "dense",
        "fc1",
        "fc2",
    ],
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

HAS_BFLOAT16 = torch.cuda.is_bf16_supported()

training_args = TrainingArguments(
    output_dir="phib",
    max_steps=100,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    warmup_steps=10,
    logging_steps=1,
    logging_strategy="steps",
    learning_rate=2e-4,
    fp16=not HAS_BFLOAT16,
    bf16=HAS_BFLOAT16,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    group_by_length=True,
    report_to="none",
    seed=3407,
)
```
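Nothing in these arguments is obviously wrong on its own; as a quick sanity check on the schedule, the effective batch size and the total number of examples consumed by the run can be computed directly from the values above:

```python
# Values taken from the TrainingArguments above
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 100

# One optimizer step accumulates gradients over this many examples (per device)
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 4

# Total examples processed per device over the whole run
total_examples = effective_batch_size * max_steps
print(total_examples)  # 400
```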
Check the loss:
| Step | Training Loss |
|------|---------------|
| 1 | 0.000000 |
| 2 | 0.000000 |
| 3 | 0.000000 |
| 4 | 0.000000 |
| 5 | 0.000000 |
| 6 | 0.000000 |
| 7 | 0.000000 |
Got the same issue with similar settings.
Could you please try with microsoft/phi-1_5 and report whether you are seeing the same issue?
Can't try that right now, but it looks like rev "refs/pr/23" is working. The total number of trainable LoRA parameters is somehow two times higher than before, even though the settings are unchanged. I am wondering whether this is expected (refs/pr/23 vs. latest, Jan 16).
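As a rough cross-check on the trainable-parameter count: LoRA adds r * (d_in + d_out) parameters per adapted weight matrix (two low-rank factors A and B). The sketch below assumes phi-2's commonly reported dimensions (hidden size 2560, MLP width 10240, 32 layers); those numbers are an assumption here, not something stated in this thread.

```python
def lora_params(d_in, d_out, r):
    # LoRA replaces dW with B @ A, where A is (r x d_in) and B is (d_out x r)
    return r * (d_in + d_out)

hidden, mlp, layers, r = 2560, 10240, 32, 32  # assumed phi-2 shape, r from the config above

per_layer = (
    3 * lora_params(hidden, hidden, r)  # q_proj, k_proj, v_proj
    + lora_params(hidden, hidden, r)    # dense
    + lora_params(hidden, mlp, r)       # fc1
    + lora_params(mlp, hidden, r)       # fc2
)
total = layers * per_layer
print(total)  # 47185920, i.e. about 47M trainable parameters
```

Comparing a count like this against `model.print_trainable_parameters()` makes it easy to tell whether a revision silently changed which modules LoRA attaches to.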
Could you please re-run with the latest update?
We updated the modeling_phi.py file and disabled auto-casting on the attention layer. This is the same fix the previous code had.
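For intuition on why keeping the attention layer out of fp16 auto-casting matters: float16 overflows above roughly 65504, so large intermediate values (such as attention scores) can become inf/nan under fp16 and corrupt the loss. This is an illustrative sketch of the numeric issue only, not the actual phi-2 modeling code.

```python
import torch

# float16's maximum finite value is ~65504, so this overflows to inf
x = torch.tensor([70000.0])
print(x.to(torch.float16))  # tensor([inf], dtype=torch.float16)

# The same value is perfectly representable in float32
print(x.to(torch.float32))  # tensor([70000.])
```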
No problem! Please let me know if you see anything else.