How to use thejaminator/grpo-feature-vector-step-1 with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model weights the adapter is applied to
base_model = AutoModelForCausalLM.from_pretrained("thejaminator/gemma-introspection-20250821-merged")

# Attach the GRPO-trained LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(base_model, "thejaminator/grpo-feature-vector-step-1")
```
thejaminator/grpo-feature-vector-step-1
This is a LoRA adapter trained using verl with GRPO (Group Relative Policy Optimization) on math reasoning tasks.
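GRPO avoids a learned value model by scoring each sampled completion against the other completions drawn for the same prompt. The card lists 16 generations per prompt, which is the group size in this scheme. The sketch below illustrates that group-relative advantage idea from the GRPO formulation; it is not taken from verl's source, and the function names are hypothetical. It uses the population standard deviation plus a small epsilon, whereas some implementations use the sample standard deviation.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: score each completion relative to its own group (same prompt)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    # eps keeps the division finite when every completion in the group gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


def batch_advantages(flat_rewards: List[float], group_size: int) -> List[float]:
    """Hypothetical helper: normalize consecutive groups of `group_size` rewards independently."""
    if group_size <= 0 or len(flat_rewards) % group_size != 0:
        raise ValueError("reward count must be a positive multiple of group_size")
    advantages: List[float] = []
    for start in range(0, len(flat_rewards), group_size):
        advantages.extend(group_relative_advantages(flat_rewards[start:start + group_size]))
    return advantages


if __name__ == "__main__":
    # Hypothetical 0/1 correctness rewards for 4 sampled answers to one math problem
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct answers receive a positive advantage and incorrect ones a negative advantage of equal size here, while a group in which every answer scores the same contributes zero advantage and therefore no policy-gradient signal.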
Training Details
- Base model: google/gemma-2-9b-it
- Framework: verl GRPO
- Training steps: 1
- Dataset: Math reasoning problems
- Batch size: 8
- Learning rate: 5e-05
- LoRA rank: 64
- LoRA alpha: 128.0
- Number of generations: 16
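The LoRA rank and alpha above control how strongly the adapter's low-rank update is weighted. In the standard LoRA formulation, which PEFT applies by default, the update B·A is multiplied by lora_alpha / r, so rank 64 with alpha 128.0 gives a scaling factor of 2.0. This assumes the adapter was saved in PEFT format without rsLoRA (which scales by alpha / sqrt(r) instead); the `use_rslora` field in the adapter's adapter_config.json would confirm that. The pure-Python sketch below, using hypothetical helper names and toy matrices, shows where that factor enters a single LoRA-adapted linear layer.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def lora_scaling(lora_alpha: float, rank: int) -> float:
    """Hypothetical helper: standard LoRA scaling factor alpha / r (assumes no rsLoRA)."""
    return lora_alpha / rank


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix, stored as a list of rows, by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, lora_alpha: float) -> Vector:
    """Hypothetical helper: W x + (alpha / r) * B (A x), with A shaped r x d_in and B shaped d_out x r."""
    rank = len(A)
    base = matvec(W, x)  # frozen pretrained projection
    delta = matvec(B, matvec(A, x))  # low-rank update through the rank-r bottleneck
    s = lora_scaling(lora_alpha, rank)
    return [b + s * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    # Rank and alpha as listed in Training Details
    print(lora_scaling(128.0, 64))
```

Because the factor multiplies the whole update, raising alpha relative to the rank has an effect similar to raising the learning rate on the adapter weights, which is why alpha and rank are usually reported together.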
Generated from verl LoRA checkpoint: /workspace/verl_outputs_feature_vector/global_step_1/actor/lora_adapter