thejaminator
/

grpo-feature-vector-step-1

Model card Files Files and versions

thejaminator/grpo-feature-vector-step-1

This is a LoRA adapter trained using verl with GRPO (Group Relative Policy Optimization) on math reasoning tasks.

Training Details

Base model: google/gemma-2-9b-it
Framework: verl GRPO
Training steps: 1
Dataset: Math reasoning problems
Batch size: 8
Learning rate: 5e-05
LoRA rank: 64
LoRA alpha: 128.0
Number of generations: 16

Generated from verl LoRA checkpoint: /workspace/verl_outputs_feature_vector/global_step_1/actor/lora_adapter

Downloads last month: -

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for thejaminator/grpo-feature-vector-step-1

Base model

google/gemma-2-9b

Finetuned

google/gemma-2-9b-it

Adapter

(457)

this model