Food Analyzer
A model designed to analyze food images and generate structured nutritional information in JSON format.
It helps users instantly understand what they are eating by predicting calories, macronutrients, and meal composition directly from images.
The model is a LoRA adapter fine-tuned on a 4-bit quantized Qwen3-VL-2B-Instruct base, which keeps memory requirements low at inference time.
Main Capabilities
Food Recognition
- Identifies dish name and food type (homemade, restaurant, etc.)
Calorie Prediction
- Estimates total calories per serving
Macronutrient Breakdown
- Protein (g)
- Carbohydrates (g)
- Fat (g)
Cooking Method Detection
- Boiled, fried, grilled, baked, mixed, etc.
Portion Estimation
- Approximates ingredient quantities
Features
- Accepts food images as input
- Outputs clean structured JSON only
- Detects dish name and cooking method
- Estimates nutritional values (calories, macros)
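Because the model estimates calories and macronutrients independently, a quick sanity check can flag implausible predictions. The sketch below is not part of the model; it assumes the field names used in this card's example output and applies the general Atwater factors (4 kcal/g for protein and carbohydrate, 9 kcal/g for fat). The 15% tolerance is an arbitrary illustrative threshold.

```python
# Atwater general factors (kcal per gram); a common approximation.
KCAL_PER_G = {"protein_g": 4.0, "carbohydrate_g": 4.0, "fat_g": 9.0}


def macro_calories(summary: dict) -> float:
    """Estimate calories from macronutrient grams using Atwater factors."""
    return sum(summary[key] * factor for key, factor in KCAL_PER_G.items())


def is_consistent(summary: dict, tolerance: float = 0.15) -> bool:
    """Return True if reported calories are within `tolerance` of the macro estimate.

    `tolerance` is a hypothetical threshold chosen for illustration.
    """
    reported = summary["calories_kcal"]
    if reported <= 0:
        return False
    estimate = macro_calories(summary)
    return abs(reported - estimate) / reported <= tolerance


summary = {"calories_kcal": 500, "protein_g": 20.0, "carbohydrate_g": 70.0, "fat_g": 15.0}
print(macro_calories(summary))  # 20*4 + 70*4 + 15*9 = 495.0
print(is_consistent(summary))   # True
```

A prediction that fails this check is not necessarily wrong (fiber, alcohol, and rounding all shift the total), but it is a reasonable candidate for manual review.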
Model Overview
| Property | Value |
|---|---|
| Base Model | Qwen3-VL-2B-Instruct |
| Fine-tuning Method | LoRA |
| Modality | Image + Text |
| Output | JSON |
| License | openrail |
Example Output
{
  "dish_name": "Vegetable Bowl",
  "food_type": "Homemade food",
  "cooking_method": "boiled and mixed",
  "nutritional_summary": {
    "calories_kcal": 500,
    "protein_g": 20.0,
    "carbohydrate_g": 70.0,
    "fat_g": 15.0
  },
  "portion_size": {
    "quinoa": 200,
    "vegetables": 300,
    "sauce": 50
  }
}
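Downstream code usually wants typed access to these fields rather than a raw dictionary. The following is a minimal sketch that assumes the output always follows the structure shown above; the `FoodAnalysis` class and `parse_analysis` helper are hypothetical names introduced here, not part of the model or any library.

```python
import json
from dataclasses import dataclass

REQUIRED_NUTRIENTS = ("calories_kcal", "protein_g", "carbohydrate_g", "fat_g")


@dataclass
class FoodAnalysis:
    """Hypothetical container mirroring the example output's top-level keys."""
    dish_name: str
    food_type: str
    cooking_method: str
    nutritional_summary: dict
    portion_size: dict


def parse_analysis(raw_json: str) -> FoodAnalysis:
    """Parse a JSON string into FoodAnalysis; raise ValueError if fields are missing."""
    data = json.loads(raw_json)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    try:
        analysis = FoodAnalysis(
            dish_name=data["dish_name"],
            food_type=data["food_type"],
            cooking_method=data["cooking_method"],
            nutritional_summary=data["nutritional_summary"],
            portion_size=data["portion_size"],
        )
    except KeyError as exc:
        raise ValueError(f"missing field: {exc.args[0]}") from exc
    missing = [k for k in REQUIRED_NUTRIENTS if k not in analysis.nutritional_summary]
    if missing:
        raise ValueError(f"missing nutrients: {missing}")
    return analysis


example = json.dumps({
    "dish_name": "Vegetable Bowl",
    "food_type": "Homemade food",
    "cooking_method": "boiled and mixed",
    "nutritional_summary": {
        "calories_kcal": 500, "protein_g": 20.0, "carbohydrate_g": 70.0, "fat_g": 15.0,
    },
    "portion_size": {"quinoa": 200, "vegetables": 300, "sauce": 50},
})
analysis = parse_analysis(example)
print(analysis.dish_name, analysis.nutritional_summary["calories_kcal"])
```

Validating at the boundary like this means a malformed generation fails loudly with a `ValueError` instead of surfacing later as a `KeyError` deep in application code.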
Usage
Colab notebook: https://colab.research.google.com/drive/1iPQTY_5sM4OZCj1fCXi_bHb-Lt4DeCxv?usp=sharing
Install dependencies
! pip install -U bitsandbytes accelerate
! pip install -U transformers==4.57.0
! pip install peft pillow requests
! pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Python Inference Example
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image
import requests
from io import BytesIO
# import gc
# torch.cuda.empty_cache()
# gc.collect()
base_model_name = "unsloth/Qwen3-VL-2B-Instruct-bnb-4bit"
print("Loading quantized base model...")
model = Qwen3VLForConditionalGeneration.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    attn_implementation="sdpa",  # Use "flash_attention_2" if available
)
processor = AutoProcessor.from_pretrained(
    base_model_name,
    trust_remote_code=True
)
print("Loading the saved LoRA adapter...")
model = PeftModel.from_pretrained(
    model,
    "Ateeqq/food-analysis",
)
print("LoRA adapter loaded successfully!")
model.eval()
user_prompt = """As a food-analyzer AI, analyze the image and return a single JSON object containing nutritional information.
Respond with JSON only. No extra text.
"""
image_url = "https://images.pexels.com/photos/1640777/pexels-photo-1640777.jpeg"
response = requests.get(image_url, timeout=30)
response.raise_for_status()  # Fail early if the download did not succeed
image = Image.open(BytesIO(response.content)).convert("RGB")
# Resize image to reduce memory usage (optional but helpful)
max_size = 1024
image.thumbnail((max_size, max_size), Image.Resampling.LANCZOS)
print("\nRunning inference on a new image...")
# Format messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": user_prompt}
        ]
    }
]
# Process inputs
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt",
    padding=True
).to(model.device)
# # Clear cache before generation
# torch.cuda.empty_cache()
# Generate output with memory optimizations
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,  # Low temperature keeps sampling close to deterministic
        do_sample=True,
        use_cache=True,  # Enable KV cache
        num_beams=1,  # Single beam (no beam search) to save memory
        # Set the padding and end-of-sequence tokens explicitly
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
    )
# Decode only the newly generated tokens, skipping the prompt
generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
decoded_output = processor.decode(generated_tokens, skip_special_tokens=True)
print("\n\nFinetuned model's response:")
print(decoded_output)
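Even with a JSON-only prompt, the decoded text can occasionally carry extra characters around the object (chat role markers, a stray sentence, or whitespace). The sketch below shows one defensive way to pull the first JSON object out of such text using only the standard library; `extract_json` is a hypothetical helper, and the sample string is made up for illustration.

```python
import json


def extract_json(text: str) -> dict:
    """Return the first JSON object found in `text`, or raise ValueError."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pass  # Not valid JSON from this brace; try the next one
        else:
            if isinstance(obj, dict):
                return obj
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


# Made-up sample imitating a response with surrounding text
sample = 'Here is the result:\n{"dish_name": "Vegetable Bowl", "nutritional_summary": {"calories_kcal": 500}}\nDone.'
result = extract_json(sample)
print(result["dish_name"])
```

In the inference example above, this would be called as `extract_json(decoded_output)`.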
Citation
If you use this model, please cite:
@misc{food_analyzer_qwen3_vl,
  author = {Muhammad Ateeq},
  title = {Food Analyzer Vision-Language Model},
  year = {2026},
  base_model = {Qwen3-VL-2B-Instruct}
}
Acknowledgements
- Qwen team for Qwen3-VL
- Hugging Face Transformers
- PEFT (LoRA) framework