Model Weights Coming Soon!

Using HDT

To load the model pre-trained on the UL2 task, use the following snippet:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# See the `HDT` collection page on the Hub for the list of available models.
model_name = 'howey/HDT-ED'

# Load the tokenizer and the encoder-decoder model from the same checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

For more details, please see our GitHub repository: HDT

Model Details

The model has a context length of 8192 tokens and approximately 110M parameters, making it similar in size to BERT. It uses a Transformer-based architecture with our proposed hierarchical attention and was trained on the standard UL2 task. Training took 72 hours on the ArXiv+Wikipedia+HUPD corpus and processed a total of 2.6 billion tokens.
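To put the training figures in perspective, the short sketch below derives two rough quantities from the numbers quoted above: the average token throughput over the 72 hours, and how many full 8192-token inputs the 2.6 billion tokens would correspond to. The helper names are illustrative only, and the calculation assumes the 72 hours are wall-clock time with throughput aggregated over whatever hardware was used, which the card does not specify.

```python
# Figures quoted in the model card.
TOTAL_TOKENS = 2.6e9     # tokens processed during training
TRAINING_HOURS = 72      # training time (assumed to be wall-clock hours)
CONTEXT_LENGTH = 8192    # maximum number of tokens per input


def tokens_per_second(total_tokens: float, hours: float) -> float:
    """Average number of tokens processed per second of training."""
    return total_tokens / (hours * 3600)


def full_context_equivalents(total_tokens: float, context_length: int) -> float:
    """Number of maximum-length inputs that would contain the same token count."""
    return total_tokens / context_length


if __name__ == "__main__":
    rate = tokens_per_second(TOTAL_TOKENS, TRAINING_HOURS)
    docs = full_context_equivalents(TOTAL_TOKENS, CONTEXT_LENGTH)
    print(f"Average throughput: {rate:,.0f} tokens/s")
    print(f"Equivalent full-length inputs: {docs:,.0f}")
```

These are back-of-the-envelope averages; real inputs are usually shorter than the full context length, so the actual number of training documents is likely higher.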

For more details, please see our paper: HDT: Hierarchical Document Transformer.

Citation

Please cite our work using the BibTeX entry below:

@inproceedings{He2024COLM,
      title={HDT: Hierarchical Document Transformer},
      author={Haoyu He and Markus Flicke and Jan Buchmann and Iryna Gurevych and Andreas Geiger},
      year={2024},
      booktitle={Conference on Language Modeling}
}

Model Card Contact

Haoyu (haoyu.he@uni-tuebingen.de)


Collection including howey/HDT-ED

HDT: Data and model weights for our COLM'24 paper, HDT: Hierarchical Document Transformer. Project page: https://cli212.github.io/HDT/

Papers for howey/HDT-ED

HDT: Hierarchical Document Transformer

Paper • 2407.08330 • Published Jul 11, 2024

UL2: Unifying Language Learning Paradigms

Paper • 2205.05131 • Published May 10, 2022