Inference Optimization (community)
AI & ML interests: None defined yet.
Mixed Precision Models
- meta-llama/Llama-3.1-8B-Instruct: Text Generation • 8B • Updated • 9.39M • 5.77k
- RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic: Text Generation • 8B • Updated • 52.8k • 9
- RedHatAI/Llama-3.1-8B-Instruct-NVFP4: Text Generation • 5B • Updated • 19.1k • 1
- inference-optimization/Llama-3.1-8B-Instruct_5_bits_mode_hybrid: 6B • Updated • 11
- meta-llama/Llama-3.2-1B-Instruct: Text Generation • 1B • Updated • 5.5M • 1.39k
- inference-optimization/Llama-3.2-1B-Instruct-FP8-Dynamic: 1B • Updated • 34
- inference-optimization/Llama-3.2-1B-Instruct-NVFP4: 0.8B • Updated • 50
- inference-optimization/Llama-3.2-1B-Instruct-5-bits-mode-heuristic-per-tensor: 1B • Updated • 21
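The collection above pairs each BF16 baseline with FP8-dynamic, NVFP4, and 5-bit variants. As a back-of-the-envelope check on why the lower-precision repos report smaller effective sizes, here is a rough weight-storage estimate per format; this is a sketch only, ignoring layers kept in higher precision, quantization scale metadata, KV cache, and activations, and the bit widths are the nominal ones implied by the repo names:

```python
# Nominal bits per weight for the formats named in this collection.
# Approximation: real checkpoints also carry scales/zero-points and
# may keep embeddings or the LM head in higher precision.
BITS_PER_WEIGHT = {
    "bf16": 16,
    "fp8": 8,    # e.g. the FP8-Dynamic variants
    "int5": 5,   # the "5-bits-mode" variants
    "nvfp4": 4,  # the NVFP4 variants
}

def weight_gib(n_params: float, fmt: str) -> float:
    """Approximate weight storage in GiB for n_params parameters."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 2**30

for fmt in BITS_PER_WEIGHT:
    print(f"8B model weights @ {fmt}: ~{weight_gib(8.0e9, fmt):.1f} GiB")
```

For an 8B model this gives roughly 15 GiB at BF16 versus about 7.5 GiB at FP8 and under 4 GiB at NVFP4, which is consistent with the shrinking size column in the listing.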
Models (307)
- inference-optimization/MiniMax-M2.5.w8a8: 229B • Updated
- inference-optimization/MiniMax-M2.5.w4a16: 34B • Updated • 111
- inference-optimization/MiniMax-M2.5-BF16: Text Generation • 229B • Updated • 137
- inference-optimization/DeepSeek-V4-Flash-FP8-NVFP4: 163B • Updated
- inference-optimization/DeepSeek-V4-Flash-bf16-NVFP4-FP8-BLOCK: Updated
- inference-optimization/DeepSeek-V4-Flash-bf16: Updated
- inference-optimization/ctest-Qwen3-8B-speculator.dflash: 2B • Updated
- inference-optimization/MiniMax-M2.5-NVFP4: 130B • Updated • 285
- inference-optimization/DeepSeek-V4-Flash-5layers-nvfp4moe: 20B • Updated
- inference-optimization/DeepSeek-V4-Flash-bf16-dequantized-5layers: Updated
Datasets (13)
- inference-optimization/laguna-xs-ultrachat-responses: Preview • Updated • 20
- inference-optimization/laguna-xs-ultrachat-conversations: Viewer • Updated • 205k • 21
- inference-optimization/laguna-xs-magpie-300k-responses: Viewer • Updated • 300k • 27
- inference-optimization/laguna-xs-magpie-300k-conversations: Viewer • Updated • 298k • 26
- inference-optimization/Qwen3-8b-sharegpt-5k: Preview • Updated • 86
- inference-optimization/speculators_benchmarks_tool_call: Viewer • Updated • 4.9k • 67
- inference-optimization/speculators-qwen3-30b-a3b-instruct-2507: Preview • Updated • 33
- inference-optimization/speculators-qwen3-30b-a3b-instruct: Preview • Updated • 28
- inference-optimization/speculators-qwen3-32b-instruct: Preview • Updated • 40
- inference-optimization/gpt-oss-20b-nan-hidden-states-repro: Updated • 59
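The "-conversations" datasets presumably store chat-style message lists in the common UltraChat/ShareGPT layout, while the "-responses" variants hold single completions. As an illustration only, here is a minimal sketch of splitting such a record into (prompt, response) pairs; the field names ("messages", "role", "content") are assumptions and have not been verified against these repos:

```python
# Sketch: pair each user turn with the assistant turn that follows it.
# "messages"/"role"/"content" are assumed field names (UltraChat-style),
# not confirmed against the inference-optimization datasets.
def to_pairs(record: dict) -> list[tuple[str, str]]:
    pairs = []
    msgs = record["messages"]
    for prev, cur in zip(msgs, msgs[1:]):
        if prev["role"] == "user" and cur["role"] == "assistant":
            pairs.append((prev["content"], cur["content"]))
    return pairs

example = {
    "messages": [
        {"role": "user", "content": "What is FP8?"},
        {"role": "assistant", "content": "An 8-bit floating-point format."},
    ]
}
print(to_pairs(example))
```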