# Qari-OCR-0.4.0-VL-4B-Instruct
A vision-language model fine-tuned for OCR on Islamic books and Arabic manuscripts. Based on Qwen/Qwen3-VL-4B-Instruct, trained on 45,000 image-text pairs from the seemorg/books-ocr dataset.
## Results

| Model | CER ↓ | WER ↓ | BLEU ↑ |
|-------|-------|-------|--------|
| Qari-OCR-0.4.0 | 0.1222 | 0.2562 | 68.41 |
| Qwen/Qwen3-VL-4B-Instruct | 0.4922 | 0.6966 | 34.61 |
| Qwen/Qwen3-VL-8B-Instruct | 0.6876 | 0.8954 | 23.89 |
| NAMAA/Qari-0.2.2.1 | 0.6448 | 0.5126 | 21.97 |
| MBZUAI/AIN | 1.2843 | 1.2697 | 3.50 |
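For reference, CER and WER are normalized edit distances at the character and word level. The sketch below shows one common way to compute them (a plain Levenshtein distance divided by reference length); it is an illustration only, not the exact evaluation script behind the table above.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    """Word error rate: word edits over whitespace-tokenized reference."""
    ref, hyp = reference.split(), hypothesis.split()
    return levenshtein(ref, hyp) / max(len(ref), 1)
```

Lower is better for both metrics; a CER of 0.1222 means roughly one character error per eight reference characters.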
## Usage

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
import torch

model_name = "NAMAA-Space/Qari-OCR-0.4.0-VL-4B-Instruct"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

image_path = "./image.jpg"  # path to the page image to transcribe
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": "Free OCR."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=2048)
# Strip the prompt tokens so only the generated transcription remains.
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
result = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(result)
```
## Training
- Base model: Qwen/Qwen3-VL-4B-Instruct
- Dataset: seemorg/books-ocr
- Training samples: 45,000 image-text pairs
- Domain: Islamic books and Arabic religious texts
## Limitations
- Optimized for printed Islamic texts; performance may vary on modern Arabic fonts or handwritten text.
- Requires reasonable image quality (300+ DPI recommended).
- Arabic script only.
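The 300 DPI guideline can be sanity-checked from an image's pixel width when the physical page width is known. The helper below is a minimal sketch (the function names are mine, not part of the model's API), assuming you know the source page size:

```python
def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Approximate scan resolution from image width and physical page width."""
    return pixel_width / page_width_inches

def meets_ocr_quality(pixel_width: int, page_width_inches: float,
                      min_dpi: int = 300) -> bool:
    """True if the scan meets the recommended 300 DPI threshold."""
    return effective_dpi(pixel_width, page_width_inches) >= min_dpi

# Example: a 2480-pixel-wide scan of an A4 page (8.27 in) is roughly 300 DPI.
```

For scans below the threshold, re-scanning at a higher resolution generally works better than software upscaling.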
## Citation

```bibtex
@misc{qari-ocr-0.4.0,
  author       = {NAMAA-Space},
  title        = {Qari-OCR-0.4.0-VL-4B-Instruct},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/NAMAA-Space/Qari-OCR-0.4.0-VL-4B-Instruct}}
}
```