Instructions for using hfl/vle-base-for-vqa with libraries and notebooks. Follow these links to get started.
- Libraries
- Transformers
How to use hfl/vle-base-for-vqa with Transformers (a fuller hedged sketch with the matching processor follows after this list):

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("hfl/vle-base-for-vqa", dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
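Building on the basic snippet above, the sketch below pairs the model with a processor to run one image-text forward pass. This is a minimal, hedged sketch: it assumes the checkpoint exposes a compatible AutoProcessor, and the image URL and question are placeholders. If the generic Auto classes fail, use the model classes from the VLE GitHub repository linked below.

```python
# Hedged sketch: load model + processor and run one image-text forward pass.
# Assumptions: the checkpoint works with AutoProcessor; the image URL and
# question are placeholders. If the Auto classes fail, fall back to the
# classes from the VLE GitHub repository (linked below).
import requests
from PIL import Image
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained("hfl/vle-base-for-vqa", dtype="auto")
processor = AutoProcessor.from_pretrained("hfl/vle-base-for-vqa")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text="How many cats are there?", images=image, return_tensors="pt")
outputs = model(**inputs)
```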
VLE (Vision-Language Encoder) is an image-text multimodal understanding model built on pre-trained text and image encoders. It can be used for multimodal discriminative tasks such as visual question answering and image-text retrieval. It achieves particularly strong improvements on visual commonsense reasoning (VCR), a task that requires high-level language understanding and reasoning skills.
For more details see https://github.com/iflytek/VLE.
Online VLE demo for Visual Question Answering: https://huggingface.co/spaces/hfl/VQA_VLE_LLM
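For a task-specific example, here is a hedged VQA sketch following the usage pattern described in the VLE GitHub repository. The class names (VLEForVQA, VLEProcessor, VLEForVQAPipeline) and the models.VLE import path are taken from that repository's code, and the image path and question are placeholders; consult the repository for the authoritative usage.

```python
# Hedged VQA sketch following the VLE GitHub repository's usage pattern.
# Assumptions: the models.VLE module from that repo is on the Python path;
# the image path and question are placeholders.
from PIL import Image
from models.VLE import VLEForVQA, VLEProcessor, VLEForVQAPipeline

model_name = "hfl/vle-base-for-vqa"
model = VLEForVQA.from_pretrained(model_name)
vle_processor = VLEProcessor.from_pretrained(model_name)
vqa_pipeline = VLEForVQAPipeline(model=model, device="cpu", vle_processor=vle_processor)

image = Image.open("pics/dogs.png")  # placeholder image
question = "How many dogs are there in the picture?"
print(vqa_pipeline(image=image, question=question))
```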