A Thai Handwritten OCR model fine-tuned from Microsoft TrOCR for recognizing Thai handwritten text.
This model converts images of Thai handwriting into text using the TrOCR architecture, which pairs a Vision Transformer (ViT) encoder for image processing with a Transformer decoder for text generation.
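A ViT encoder does not read pixels one at a time. It first cuts the image into fixed-size, non-overlapping patches and treats each flattened patch as one token. The sketch below shows only that splitting step on a grayscale image stored as a list of rows. The real encoder works on 3-channel pixels and then applies a learned linear projection and position embeddings, which are omitted here. The 384×384 input and 16×16 patch size are assumptions based on the base TrOCR checkpoint; check `model.config.encoder.image_size` and `model.config.encoder.patch_size` to confirm.

```python
def patchify(image, patch_size):
    """Split a 2-D grayscale image (list of rows) into flattened,
    non-overlapping patch_size x patch_size patches in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Read the patch row by row and flatten it into one vector
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches

# Toy 4x4 "image" split into 2x2 patches: 4 patches of 4 values each
toy = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = patchify(toy, 2)
print(len(patches), patches[0])  # 4 [0, 1, 4, 5]

# Patch count for an assumed 384x384 input with 16x16 patches
print((384 // 16) ** 2)  # 576
```

Under those assumptions, each handwriting image therefore reaches the decoder as a sequence of a few hundred patch embeddings rather than as a raw pixel grid.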
The model can be used directly to convert Thai handwritten images into text. Suitable for:
Trained on iapp/thai_handwriting_dataset, which contains Thai handwritten images paired with their corresponding text labels.
Uses SentencePiece with the Unigram algorithm instead of dictionary-based word segmentation because:
Tokenizer Configuration:
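As background for the Unigram choice: Thai is written without spaces between words, so a tokenizer has to decide where to split. A Unigram language model scores every possible split by the product of its piece probabilities and keeps the highest-scoring one, which can be found with a Viterbi search. The sketch below implements that search over a hypothetical toy vocabulary with made-up probabilities; it is not the trained `thai_sp_30000.model`, and the real SentencePiece implementation adds text normalization and other details not shown here.

```python
import math

# Hypothetical toy vocabulary (piece -> probability). A real Unigram model
# learns these values from the training corpus; these numbers are invented.
VOCAB = {
    "ภาษา": 0.05, "ไทย": 0.05, "ภา": 0.02, "ษา": 0.01, "ไท": 0.01,
    # Single characters as a fallback so any string over these letters segments
    "ภ": 0.001, "า": 0.001, "ษ": 0.001, "ไ": 0.001, "ท": 0.001, "ย": 0.001,
}

def unigram_segment(text, vocab, max_piece_len=8):
    """Return the split of `text` that maximizes the sum of piece
    log-probabilities (Viterbi search over all possible splits)."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last piece ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_piece_len), end):
            piece = text[start:end]
            if piece in vocab and best[start] > -math.inf:
                score = best[start] + math.log(vocab[piece])
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end to recover the pieces
    pieces, end = [], n
    while end > 0:
        start = back[end]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]

print(unigram_segment("ภาษาไทย", VOCAB))  # ['ภาษา', 'ไทย']
```

Here "ภาษา" + "ไทย" wins because 0.05 × 0.05 is larger than any product involving the rarer pieces, with no word dictionary involved.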
Training Hyperparameters:

| Parameter | Value |
|---|---|
| Epochs | 250 |
| Batch Size | 16 |
| Learning Rate | 1e-5 |
| Optimizer | AdamW |
| Training Regime | fp16 mixed precision |
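The table lists AdamW with a learning rate of 1e-5. The scalar sketch below shows what one AdamW update does. It assumes PyTorch's default betas (0.9, 0.999), eps 1e-8 and weight decay 0.01; the card does not state which values were actually used in training.

```python
import math

def adamw_step(param, grad, m, v, t, lr=1e-5, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter (t is the 1-based step)."""
    beta1, beta2 = betas
    # Decoupled weight decay: shrink the weight directly, not through the gradient
    param = param - lr * weight_decay * param
    # Exponential moving averages of the gradient and the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized moment estimates
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

p, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(p)  # about 1 - 1e-7 (decay) - 1e-5 (Adam step)
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is about 1 regardless of the gradient's scale, so the update size is governed mainly by the learning rate.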

Evaluation Results:

| Metric | Value |
|---|---|
| CER (Character Error Rate) | 0.488% |
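```python
import editdistance

def calculate_cer(pred, label):
    """Character Error Rate (lower is better)."""
    if len(label) == 0:
        # No reference characters: any prediction counts as a full error
        return 1.0 if len(pred) > 0 else 0.0
    # Levenshtein distance (insertions + deletions + substitutions)
    distance = editdistance.eval(pred, label)
    # Normalize by the reference length
    return distance / len(label)
```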
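`editdistance.eval` computes the Levenshtein distance between the two strings. For readers without that package, the sketch below is a dependency-free version of the same distance, plus a corpus-level CER that divides total edits by total reference characters. `levenshtein` and `corpus_cer` are illustrative names, not functions from this repository. Note that Python strings are compared by Unicode code point, so Thai vowel and tone marks each count as one character.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions that turn string a into string b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]

def corpus_cer(preds, labels):
    """Total edit distance divided by total reference length."""
    total_chars = sum(len(label) for label in labels)
    total_edits = sum(levenshtein(p, l) for p, l in zip(preds, labels))
    return total_edits / total_chars if total_chars else 0.0

preds = ["ภาษาไท", "สวัสดี"]    # first prediction drops one character
labels = ["ภาษาไทย", "สวัสดี"]
print(corpus_cer(preds, labels))  # = 1/13, about 0.0769
per_sample = [levenshtein(p, l) / len(l) for p, l in zip(preds, labels)]
print(sum(per_sample) / len(per_sample))  # = 1/14, about 0.0714
```

The two aggregations give different numbers on the same predictions, so a reported CER is only comparable to another one computed the same way.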
```bash
pip install transformers torch sentencepiece pillow
```
```python
import torch
from PIL import Image
import sentencepiece as spm
from transformers import VisionEncoderDecoderModel, ViTImageProcessor

# Load the base TrOCR architecture and its image preprocessing
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-base-handwritten')
image_processor = ViTImageProcessor.from_pretrained('microsoft/trocr-base-handwritten')

# Load the Thai SentencePiece tokenizer
sp = spm.SentencePieceProcessor()
sp.Load('thai_sp_30000.model')

# Load the fine-tuned weights. The decoder vocabulary must be resized to the
# Thai tokenizer first so the embedding shapes match the checkpoint.
checkpoint = torch.load('best_model.pt', map_location='cpu')
model.decoder.resize_token_embeddings(sp.GetPieceSize())
# strict=False skips missing/unexpected keys silently; the returned object
# lists them in missing_keys and unexpected_keys
model.load_state_dict(checkpoint['model_state_dict'], strict=False)
model.eval()

# Preprocess the input image into pixel values for the ViT encoder
image = Image.open('handwriting.jpg').convert('RGB')
pixel_values = image_processor(image, return_tensors='pt').pixel_values

# Generate token IDs with beam search
with torch.no_grad():
    generated_ids = model.generate(
        pixel_values,
        max_length=128,
        num_beams=4,
    )

# Decode the token IDs with the Thai SentencePiece model
ids = generated_ids[0].tolist()
text = sp.DecodeIds(ids)
print(text)
```
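The ID list returned by `model.generate` starts with the decoder start token and may end with an end-of-sequence token and padding. Depending on how the Thai vocabulary was built, those IDs may not decode cleanly. A hypothetical helper that cleans the list before decoding is sketched below. The special-token IDs are placeholders: they must match the values used in training, typically `model.config.decoder_start_token_id`, `model.config.eos_token_id` and `model.config.pad_token_id`.

```python
def strip_special_ids(ids, decoder_start_id, eos_id, pad_id=None):
    """Drop a leading decoder-start token, cut at the first EOS, remove padding."""
    # Strip the start token first: some configs reuse the same ID for start and EOS
    if ids and ids[0] == decoder_start_id:
        ids = ids[1:]
    if eos_id in ids:
        ids = ids[:ids.index(eos_id)]
    return [i for i in ids if i != pad_id]

raw = [2, 531, 88, 1790, 3, 1, 1]  # made-up IDs for illustration
print(strip_special_ids(raw, decoder_start_id=2, eos_id=3, pad_id=1))  # [531, 88, 1790]
```

In the inference script, the cleaned list would replace `ids` in the call to `sp.DecodeIds`.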
```
Input Image
    |
    v
Vision Transformer (ViT) Encoder
    |
    v
Cross-Attention
    |
    v
Transformer Decoder
    |
    v
SentencePiece Tokenizer (Unigram)
    |
    v
Thai Text Output
```
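In the pipeline above, the decoder reads the encoder's output through cross-attention: at each generation step, decoder query vectors attend over the encoder's patch representations. The sketch below is a single-head, pure-Python version of scaled dot-product cross-attention. The learned query/key/value projections, multiple heads and stacked layers of the real TrOCR decoder are omitted, and the vectors are toy values.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each decoder query attends over all
    encoder positions. Returns (outputs, attention weights per query)."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every encoder position, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the encoder value vectors
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

# One decoder query, three encoder "patches"
keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out, w = cross_attention([[10.0, 0.0]], keys, values)
print(w[0])    # most of the weight falls on the first patch
print(out[0])
```

Because the query is most similar to the first key, the output is close to the first value vector; in the trained model this is how each generated character can focus on the relevant region of the handwriting image.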
```bibtex
@misc{thai-handwritten-trocr,
  author = {Warit Sirikosityanggoon},
  title = {Thai Handwritten OCR using TrOCR},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://github.com/waritkan/Thai-Hand-Written-TrOCR-Webapp}}
}
```
Base model: microsoft/trocr-base-handwritten