Papers
arxiv:2605.09751

Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes

Published on May 10
Abstract

Fixed minimal binary token codes can replace trainable input embedding tables in language models without sacrificing performance, demonstrating that learnable input tables are not essential for effective language modeling.

AI-generated summary

Trainable input embedding tables are a standard component of modern language models. We ask whether they are actually necessary at the input interface. For a vocabulary of size V, exact token identity requires only K = ⌈log₂ V⌉ bits. We replace the usual trainable V × d_model input embedding matrix with fixed minimal binary token codes and a zero-parameter lift to model width. In our main setting, V = 65,536, so K = 16, and tokens are represented by fixed 16-dimensional binary codes tiled to d_model = 1024. We also evaluate a fully table-free variant in which codes are generated from token IDs on the fly and randomly recoded by an invertible affine transform over F₂^K. Across matched 32-layer decoder-only models trained on approximately 17B tokens and evaluated over three independent training seeds, fixed minimal codes achieve comparable held-out validation perplexity to a standard learned-input baseline while removing 67.1M trainable input parameters. The fixed-code runs have a lower mean validation perplexity in our experiments, 2.36 versus 2.44, but the observed gap is within the measured seed-to-seed variation of 4.8%; we therefore interpret the result as evidence that the trainable input table is not necessary, rather than as a statistically resolved superiority claim. The table-free affine-recoded variant remains close at 2.39 despite a slightly shorter training run. These results show that, in this regime, a trainable input embedding table is not necessary for useful language modeling. The output projection remains standard and trainable.
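The input interface described above can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the paper's code: it maps token IDs to fixed K-bit binary codes, lifts them to d_model by tiling (zero trainable parameters), and applies an invertible affine recode over F₂^K for the table-free variant. The ±1 centering of the bits is an assumption for illustration; the paper's exact scaling is not specified in the abstract.

```python
import numpy as np

def binary_code_embed(token_ids, K=16, d_model=1024):
    """Fixed minimal binary code embedding: no trainable input table.

    Each token ID is expanded into its K binary digits, centered to
    {-1, +1} (an illustrative choice), and tiled to model width.
    """
    assert d_model % K == 0, "tiling lift assumes K divides d_model"
    # Bit k of each ID, shape (..., K), values in {0, 1}.
    bits = ((token_ids[..., None] >> np.arange(K)) & 1).astype(np.float32)
    codes = 2.0 * bits - 1.0          # center to {-1, +1}
    return np.tile(codes, d_model // K)  # zero-parameter lift to d_model

def affine_recode(token_ids, A, b, K=16):
    """Table-free variant: recode ID bits by an invertible affine map
    over F_2^K, i.e. bits -> (A @ bits + b) mod 2."""
    bits = ((token_ids[..., None] >> np.arange(K)) & 1).astype(np.int64)
    return (bits @ A.T + b) % 2

# Example: embed a small batch of token IDs from a V = 65,536 vocabulary.
ids = np.array([0, 1, 42, 65535])
emb = binary_code_embed(ids)   # shape (4, 1024), fixed, not learned
```

With the identity matrix for A and a zero offset b the recode is a no-op; in the table-free variant A would instead be a random invertible matrix over F₂, which permutes the code space while preserving exact token identity.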

