All HF Hub posts

danielhanchen posted an update about 21 hours ago
Unsloth is now one of the top 10 most followed organizations on Hugging Face. 🤗🦥

Thanks so much for all the support!
Our HF page: unsloth
Enderchef posted an update 3 days ago
Hi, everyone!
Please follow, like, and support the work of CompactAI-O! Spread the word!
Crownelius posted an update about 8 hours ago
My Hugging Face journey has been a trip!
I wanted to take the time to thank each and every one of you for using my dataset and getting it to go as far as it did. Believe it or not, some neanderthal was, and maybe still is, trending on Hugging Face.

Not only did my dataset reach number one, my fine-tuned Qwen3.5 model made the top 10 as well. Honestly, there ain't much left to do here.

Y'all have given me the desire, no... the craving for more. I am absolutely obsessed with AI now. I want to tweak it... I want to take it apart, just to see what makes everything tick. I want to put it together like Frankenstein and his monster.

The only thing that's stopping this guy is compute. I don't mind spending every penny I have on this. I desperately want to drive AI forward, even just a little bit.

I never thought the clanker hater from a year ago would be saying this.

Thank you all from the bottom of my heart.

Looking forward to showing you what I'm cooking up next. @CompactAI is your only hint!
qgallouedec posted an update 3 days ago

TRL v1.3 ships day-one training support for Qwen 3.6 🚀

The new Qwen 3.6 family (Qwen/Qwen3.6-27B, Qwen/Qwen3.6-35B-A3B) reuses the Qwen3.5-MoE architecture but ships a slightly different chat template, so we updated the stack end-to-end: new training template with {% generation %} markers, tool-call response schema routing, tiny test models for the VLM matrix.

SFT with assistant-only loss works out of the box:

from trl import SFTConfig, SFTTrainer

# `dataset` is any conversational dataset in the standard chat format
trainer = SFTTrainer(
    model="Qwen/Qwen3.6-27B",
    args=SFTConfig(assistant_only_loss=True),
    train_dataset=dataset,
)
trainer.train()


So does GRPO tool-calling — just hand tools=[...] to GRPOTrainer.
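The post doesn't show the tool-calling setup; a tool here is just a typed Python function, and this minimal sketch assumes (per the post) that GRPOTrainer takes such callables via tools=[...]. The get_temperature function below is invented for illustration, and the trainer call is left commented out since it needs GPU hardware and model weights:

```python
# Hypothetical tool for GRPO tool-calling: a plain Python function with
# type hints and a docstring, which the trainer can turn into a tool schema.
def get_temperature(city: str) -> str:
    """Return the current temperature for a city.

    Args:
        city: Name of the city.
    """
    # Stub for illustration; a real tool would call an external API.
    fake_db = {"Paris": "18C", "Tokyo": "22C"}
    return fake_db.get(city, "unknown")

tools = [get_temperature]

# Handing the tools to the trainer (sketch only, not run here):
# from trl import GRPOTrainer
# trainer = GRPOTrainer(model="Qwen/Qwen3.6-27B", reward_funcs=..., tools=tools)

print(get_temperature("Paris"))
```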

v1.3 also brings a new experimental TPO trainer (Triple Preference Optimization), speculative decoding in trl vllm-serve (Qwen3 MTP / Eagle3 drafts), 12 more KTO ↔ DPO alignment PRs (KTO promotion to stable is now in reach), three more {% generation %} chat templates (Gemma/Gemma 2, Phi-3, GLM-4-MoE), and a chunky SFT entropy bug fix.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.3.0
Tonic posted an update 1 day ago
🙋🏻‍♂️ Hey there folks,

Since everyone liked my previous announcement post (https://huggingface.co/posts/Tonic/338509028435394) so much, I'm back with more high-quality procedural datasets in the geospatial domain for SFT training!

Check this one out:
NuTonic/sat-bbox-metadata-sft-v1

The goal is to train vision models on multiple images for one-shot remote sensing analysis.

Hope you like it! 🚀
yuriyvnv posted an update 2 days ago
🔊 Four Qwen3-ASR (0.6B and 1.7B) fine-tunes for Portuguese and Dutch.

Both the 1.7B and 0.6B variants of Alibaba's Qwen3-ASR, fine-tuned for European Portuguese and Dutch and bundled in a single collection.

🔗 Collection: https://huggingface.co/collections/yuriyvnv/qwen-asr-for-portuguese-and-dutch-17b-and-06b

Headline numbers on the Common Voice 22 test set (zero-shot baseline → fine-tuned WER):
🇵🇹 Qwen3-ASR-1.7B-PT — 12.91% → 8.50% (-34% relative)
🇵🇹 Qwen3-ASR-0.6B-PT — 18.26% → 11.85% (-35% relative)
🇳🇱 Qwen3-ASR-1.7B-NL — 6.68% → 5.28% (-21% relative)
🇳🇱 Qwen3-ASR-0.6B-NL — 12.46% → 8.31% (-33% relative)
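As a sanity check, the relative reductions follow directly from the baseline and fine-tuned WERs above:

```python
# Relative WER reduction, (baseline - tuned) / baseline, for the headline numbers.
results = {
    "Qwen3-ASR-1.7B-PT": (12.91, 8.50),
    "Qwen3-ASR-0.6B-PT": (18.26, 11.85),
    "Qwen3-ASR-1.7B-NL": (6.68, 5.28),
    "Qwen3-ASR-0.6B-NL": (12.46, 8.31),
}
for name, (baseline, tuned) in results.items():
    rel = (baseline - tuned) / baseline * 100
    print(f"{name}: {baseline:.2f} -> {tuned:.2f} WER (-{rel:.0f}%)")
# prints -34%, -35%, -21%, -33%, matching the post
```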

The 0.6B variants are the more interesting half of the release. They give up only a few WER points relative to the 1.7B at a third of the parameters — relevant for edge hardware, CPU inference, or anywhere inference cost matters. The Dutch 0.6B in particular lands at 8.3% WER on CV22, competitive with much larger systems.

The Dutch 1.7B started from a strong 6.7% zero-shot baseline, so the absolute gain is smaller — Qwen already handles Dutch well, and the fine-tune mostly sharpens it on Common Voice's casing and punctuation conventions.

Training stuck close to Qwen's official SFT recipe (lr 2e-5, linear schedule, 2% warmup, bf16, gradient checkpointing on a single H100). The data is the differentiator: Common Voice 22 train + validation augmented with synthetic OpenAI-TTS speech, filtered by the WAVe multimodal embedding model that scores clips at the word level and drops the ones that don't align well with their transcripts.

📦 Full pipeline — synthetic data generation, WAVe filtering, training scripts, evaluation protocol — is open-source:
github.com/yuriyvnv/TTS-Augmented-ASR
@hf-audio
#asr #speech #parakeet #nvidia #nemo #multilingual #fine-tuning #commonvoice
evalstate posted an update 2 days ago
Hugging Face MCP Server v0.3.9
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Users with a bucket named mcp will get an additional list_files tool that returns public URLs for the files it contains. This is primarily intended for Gradio Spaces that need URLs as inputs.
mlabonne posted an update 2 days ago
Big update to llm-datasets, my curated list of datasets and tools for post-training LLMs.

> Added many new datasets
> New "thinking" column
> Refreshed recommended tools.

Thanks to everyone who told me they used it for their research at ICLR, you motivated this update!
ManniX-ITA posted an update 3 days ago
Two custom releases — both unusual takes on common problems — built on a single RTX 3090 plus a vast.ai pod.

🔹 ManniX-ITA/Qwen3.5-27B-Omnimerge-v2

A 3-source weight-space merge over Qwen3.5-27B combining OBIM-lite magnitude masking, DAREx rescaling, and EMR election (sign from consensus, amplitude from max-abs across sources). GPU-accelerated, ~35× faster than CPU.

Sources: Claude-4.6-Opus-distill (0.40), Esper3.1 code (0.35), Gemini-3.1-Pro-distill (0.25). Density 0.53, DAREx q 0.75.

Q6_K vs best source:
• GPQA Diamond: 53.03 → 69.19 (+16.16 pp)
• MBPP pass@1: 71.20 → 74.60 (+3.40)
• HumanEval pass@1: 76.22 → 79.27 (+3.05)

vs Omnimerge v1 (vanilla DARE-TIES): +8.08 pp GPQA, +2.80 MBPP. Amplitude-from-max + sign-from-consensus is what unlocked the GPQA jump.
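The election step is easy to see on toy numbers. Here is a sketch of sign-from-consensus plus amplitude-from-max over three hypothetical per-parameter delta vectors; the 0.40/0.35/0.25 weights are from the post, but the delta values are invented, and the real pipeline also applies OBIM-lite masking and DAREx rescaling first:

```python
import numpy as np

# Toy per-parameter deltas from three sources (rows) over three params (cols).
deltas = np.array([
    [ 0.8, -0.2,  0.1],   # source 1, weight 0.40
    [ 0.5,  0.3, -0.4],   # source 2, weight 0.35
    [-0.1,  0.4, -0.3],   # source 3, weight 0.25
])
weights = np.array([0.40, 0.35, 0.25])

# Sign from consensus: sign of the weighted sum of deltas per parameter.
consensus_sign = np.sign((weights[:, None] * deltas).sum(axis=0))

# Amplitude from max-abs across sources.
amplitude = np.abs(deltas).max(axis=0)

merged = consensus_sign * amplitude
print(merged)  # [ 0.8  0.4 -0.4]
```

Note how the third parameter keeps the largest magnitude (0.4) but takes the negative sign that the weighted majority voted for, rather than simply averaging toward zero.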

🔹 ManniX-ITA/gemma-4-A4B-98e-v3-it

Gemma 4 26B-A4B pruned from 128 → 98 experts/layer (-23.4% MoE capacity, -5.2B params), zero GPQA degradation.

GPQA Diamond:
• 128e reference: 75.25%
• 98e v3 (this): 75.25% — +0.00 pp despite -23.4% capacity, -5.2B params
• 109e v3 (older): 71.72% — -3.53 pp

The win over 109e v3 came from changing the importance map: aggregating per-expert contribution across math/logic/code/science/creative prompts via 128-token teacher forcing, instead of GPQA-specific per-question top-16 selection (which overfitted). Result: more experts dropped, quality preserved.
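A toy version of that aggregation, with invented contribution scores at miniature scale (the real map comes from teacher-forcing the actual 128-expert model):

```python
import numpy as np

# Hypothetical per-expert contribution scores for one layer: one row per
# domain (math/logic/code/science/creative), one column per expert.
# Toy sizes: 5 domains x 8 experts instead of 128 experts/layer.
rng = np.random.default_rng(42)
contrib = rng.random((5, 8))

# Aggregate across domains first, then prune the globally weakest experts,
# rather than picking a per-benchmark top-k (which can overfit one task).
keep = 6
importance = contrib.mean(axis=0)
kept = np.sort(np.argsort(importance)[-keep:])
dropped = np.setdiff1d(np.arange(contrib.shape[1]), kept)
print("kept experts:", kept.tolist())
print("dropped experts:", dropped.tolist())
```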

Findings worth flagging:
• Experts are NOT topic-specialized — 28/32 overlap between the math and creative top-32.
• Max expert weight cosine ≈ 0.05 → merging destroys the model. Dropping is the only viable structural compression here.
• Contribution Gini ≈ 0.38 → ~75 experts/layer carry 80% of the signal.

Eval: lm-eval gpqa_diamond_cot_zeroshot, llama-server --reasoning-format deepseek --reasoning-budget 8192, Gemma 4 official sampling. Feedback welcome.
akhiilll posted an update 4 days ago
Just shipped ClaimSense Adjudication Gym at OpenEnv Hackathon 2026 (Scaler India).

An OpenEnv RL environment for enterprise insurance claims adjudication, the monthly "tool-heavy" workflow real adjusters do: pull policy + claim history, run fraud checks, verify purchases/transactions, then approve / deny / escalate under partial observability with long-horizon credit assignment.

Trained Qwen/Qwen2.5-1.5B-Instruct with:
• Rollout evaluation on HF Jobs (A10G) and a random baseline for comparison
• Real GRPO weight updates (TRL GRPOTrainer) with LoRA adapters and two independent reward functions (format + env replay)
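The exact reward functions aren't shown here, but a format reward in TRL's reward-function convention is just a callable that takes completions and returns one score per completion. A minimal sketch, with an invented <decision> tag schema for illustration:

```python
import re

# Hypothetical format reward: 1.0 if the completion contains a well-formed
# decision tag, 0.0 otherwise. The tag schema is made up for this sketch;
# env-replay scoring would be a second, independent reward function.
DECISION_RE = re.compile(r"<decision>(approve|deny|escalate)</decision>")

def format_reward(completions, **kwargs):
    """Score each completion on output format alone."""
    return [1.0 if DECISION_RE.search(c) else 0.0 for c in completions]

print(format_reward([
    "Checked policy and history. <decision>approve</decision>",
    "I think we should probably pay this claim.",
]))  # -> [1.0, 0.0]
```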
Headline training evidence:
• GRPO run: 80 steps, 640 rollouts, KL rises ~0 → ~0.06 (real weight updates), completion length shrinks (~25 → ~10).
• Plots + logs are committed in the Space under runs/.
• Live demo + repo + writeup linked below.

🔗 Env (Space URL): akhiilll/claims-env
🧪 Notebook: akhiilll/claims-env
📝 Blog: docs/HF_MINI_BLOG.md in the Space