Problem while loading a fine-tuned StarCoder2 adapter with the base model
I have fine-tuned bigcode/starcoder2-15b and generated adapter weights. Now, when I try to load the adapter on top of the base model, I get the following error:
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.modules_to_save.adapter_model.weight: copying a param with shape torch.Size([49156, 6144]) from checkpoint, the shape in current model is torch.Size([49152, 6144]).
size mismatch for base_model.model.lm_head.modules_to_save.adapter_model.weight: copying a param with shape torch.Size([49156, 6144]) from checkpoint, the shape in current model is torch.Size([49152, 6144]).
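To read the error at a glance, here is a small sketch that parses the two "size mismatch" lines and reports how many embedding rows the checkpoint has beyond the current model. The `extra_rows` helper and the regular expression are illustrative names of my own, not part of PEFT or Transformers.

```python
import re

# Matches one "size mismatch" line from the RuntimeError text quoted above.
MISMATCH = re.compile(
    r"size mismatch for (?P<name>\S+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]+)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[\d, ]+)\]\)"
)


def extra_rows(error_text):
    """Map each mismatched parameter to (checkpoint rows - current model rows)."""
    result = {}
    for match in MISMATCH.finditer(error_text):
        ckpt_rows = int(match.group("ckpt").split(",")[0])
        model_rows = int(match.group("model").split(",")[0])
        result[match.group("name")] = ckpt_rows - model_rows
    return result


error_text = """
size mismatch for base_model.model.model.embed_tokens.modules_to_save.adapter_model.weight: copying a param with shape torch.Size([49156, 6144]) from checkpoint, the shape in current model is torch.Size([49152, 6144]).
size mismatch for base_model.model.lm_head.modules_to_save.adapter_model.weight: copying a param with shape torch.Size([49156, 6144]) from checkpoint, the shape in current model is torch.Size([49152, 6144]).
"""

for name, delta in extra_rows(error_text).items():
    print(f"{name}: checkpoint has {delta} more rows than the current model")
```

Both parameters differ by 4 rows (49156 vs. 49152), which suggests the vocabulary was extended by four tokens during training, although the error text alone does not confirm that.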
I used https://github.com/OpenAccess-AI-Collective/axolotl/tree/main for training, and I put the following params in the .yml:
lora_modules_to_save:
- embed_tokens
- lm_head
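As a quick check that these settings ended up in the saved adapter, the sketch below reads `adapter_config.json` from the adapter directory and returns its `modules_to_save` entry. It assumes the adapter was saved in the usual PEFT layout and that axolotl's `lora_modules_to_save` is passed through to that field; `saved_modules` is a hypothetical helper name.

```python
import json
from pathlib import Path


def saved_modules(adapter_path):
    """Return the modules_to_save list recorded in adapter_config.json, or []."""
    config_file = Path(adapter_path) / "adapter_config.json"
    with config_file.open() as fh:
        config = json.load(fh)
    # The key may be missing or null when no full modules were saved.
    return config.get("modules_to_save") or []


# Usage (the path is a placeholder for your own adapter directory):
# print(saved_modules("path/to/adapter"))
```

If `embed_tokens` and `lm_head` are listed, the adapter checkpoint carries full copies of those layers with the shapes they had at training time.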
I am using the following code to load the model:
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_model_and_tokenizer(adapter_path, bnb_config=None):
    # Load the unmodified base model in bfloat16 (quantization is disabled here).
    base_model = AutoModelForCausalLM.from_pretrained(
        "bigcode/starcoder2-15b",
        # load_in_4bit=True,
        torch_dtype=torch.bfloat16,
        # quantization_config=bnb_config,
        device_map='auto'
    )
    tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-15b")
    # Attach the fine-tuned adapter; this is the call that raises the size mismatch.
    model = PeftModel.from_pretrained(
        model=base_model,
        model_id=adapter_path,
        adapter_name="adapter_model",
        torch_dtype=torch.bfloat16,
        is_trainable=False)
    tokenizer.pad_token = tokenizer.eos_token
    model.eval()
    return model, tokenizer
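One thing that may be worth trying, sketched below under assumptions: the saved `embed_tokens` and `lm_head` weights have 49156 rows while the base model has 49152, so resizing the base model's embeddings before attaching the adapter should make the shapes line up. This assumes the extra four rows come from tokens added during training and that the extended tokenizer was saved in the adapter directory; neither is confirmed by the error itself. `align_vocab` and `load_resized_model_and_tokenizer` are hypothetical names, while `get_input_embeddings`, `resize_token_embeddings`, and `PeftModel.from_pretrained` are existing Transformers/PEFT calls.

```python
def align_vocab(base_model, tokenizer):
    """Resize the model's embeddings when the tokenizer length differs.

    Returns (rows_before, rows_expected_by_tokenizer).
    """
    old_rows = base_model.get_input_embeddings().weight.shape[0]
    new_rows = len(tokenizer)
    if new_rows != old_rows:
        base_model.resize_token_embeddings(new_rows)
    return old_rows, new_rows


def load_resized_model_and_tokenizer(adapter_path, base_id="bigcode/starcoder2-15b"):
    # Imported lazily so align_vocab can be used without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the training run saved its extended tokenizer with the adapter.
    tokenizer = AutoTokenizer.from_pretrained(adapter_path)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Grow the embeddings first so the adapter's saved layers fit.
    align_vocab(base_model, tokenizer)
    model = PeftModel.from_pretrained(
        base_model, adapter_path, adapter_name="adapter_model", is_trainable=False
    )
    model.eval()
    return model, tokenizer


# Usage (the path is a placeholder for your own adapter directory):
# model, tokenizer = load_resized_model_and_tokenizer("path/to/adapter")
```

If the tokenizer in the adapter directory does not have 49156 entries, the row count would need to be set another way, for example by passing the checkpoint's row count to `resize_token_embeddings` directly.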