hanzlajavaid PRO

hanzla

AI & ML interests

Direct Preference Optimization, Supervised Finetuning, Stable Diffusion

Recent Activity

posted an update 3 days ago

Reinforcement learning can sometimes produce emergent behavior with much simpler training setups than large-scale pre-training. I explored this idea by running a small GRPO experiment on Qwen3.5 4B, and the results were pretty exciting. Hypothesis: improving visual mathematical reasoning may also improve the model’s ability to transcribe LaTeX from images. I wrote a short breakdown of the experiment here: https://hanzlajavaid.github.io/blog/grpo-experiment-exploring-emergent-properties/

updated a model 10 days ago

hanzla/Qwen3.5-4B-mathvista-GRPO

published a model 10 days ago

hanzla/Qwen3.5-4B-mathvista-GRPO

hanzla's models (32)

hanzla/Qwen3.5-4B-mathvista-GRPO

5B • Updated 10 days ago • 17

hanzla/Qwen3.5-4B-mathvista-GRPO-adapter

Updated 10 days ago • 12

hanzla/Llama-3.1-1b-finetuned

Updated Dec 5, 2025

hanzla/Qwen2-0.5B-SFT-summary

Updated Oct 7, 2025

hanzla/Qwen2-0.5B-GRPO-summary_test_v5

Updated Sep 30, 2025

hanzla/qwen25-0p5b-grpo-exp1

Text Generation • Updated Sep 28, 2025 • 3

hanzla/qwen25_0p5b_grpo_ds

Text Generation • Updated Sep 24, 2025 • 2

hanzla/Llama-3.1-1b-gsm8k-finetuned-labtest

Updated Apr 22, 2025

hanzla/Llama-3.1-1b-gsm8k-finetuned-new

Updated Apr 22, 2025

hanzla/Gemma3-1B-GRPO-summary_test_v4

Updated Apr 21, 2025

hanzla/Qwen2-0.5B-GRPO-summary_test_v4

Updated Apr 20, 2025

hanzla/Qwen2-0.5B-GRPO-summary_test_v3

Updated Apr 20, 2025

hanzla/Qwen2-0.5B-GRPO-summary_test_v2

Updated Apr 18, 2025

hanzla/Qwen2-0.5B-GRPO-summary_test

Updated Apr 17, 2025

hanzla/Qwen2-0.5B-GRPO-test

Updated Apr 17, 2025

hanzla/Falcon3-Mamba-R1-v0-4bit

Text Generation • 7B • Updated Mar 23, 2025 • 3

hanzla/Falcon3-Mamba-R1-v0

Text Generation • 7B • Updated Mar 22, 2025 • 8 • 11

hanzla/mamba-finetuned-deepspeed-s1-deepseek

Updated Mar 17, 2025

hanzla/mamba-finetuned-deepspeed-openthoughts-r1-scorebased

Updated Mar 17, 2025

hanzla/falcon3-mamba-finetuned-multigpu-openthoughts-r1-scorebased

Updated Mar 6, 2025

hanzla/mamba-finetuned-s1

Updated Mar 4, 2025

hanzla/mamba-finetuned-thinktoken

Updated Mar 4, 2025

hanzla/mamba_essay_classifier

Updated Dec 12, 2024

hanzla/bert-essay-classifier

Text Classification • 0.1B • Updated Dec 12, 2024 • 2

hanzla/Moondream-ocr-enhanced

Text Generation • 2B • Updated May 8, 2024 • 7 • 2

hanzla/gemma-2b-datascience-instruct-v5

Text Generation • Updated Mar 31, 2024 • 7

hanzla/gemma-2b-datascience-instruct-v4.5

Text Generation • Updated Mar 30, 2024 • 12 • 1

hanzla/gemma-2b-datascience-instruct-v4

Text Generation • Updated Mar 30, 2024 • 7

hanzla/gemma-2b-datascience-instruct-v3.5

Text Generation • Updated Mar 30, 2024 • 10

hanzla/gemma-2b-datascience-instruct-v3

Text Generation • 3B • Updated Mar 26, 2024 • 5