# Atto: Extreme Intelligence Density Research
Atto is an exploration into the fundamental limits of Intelligence Density — how much knowledge and generative capability can be packed into a neural network with a strictly limited parameter budget.
This project focuses on the "sub-kiloparameter" and "low-kiloparameter" regimes, training models to generate Shakespearean text with as few as 64 parameters.
## The Atto Series
| Model | Parameters | Context (chars) | Weight File Size (JSON) | Val Loss |
|---|---|---|---|---|
| atto-64 | 64 | 3 | 1.8 KB | 2.59 |
| atto-128 | 128 | 7 | 3.5 KB | 2.83 |
| atto-256 | 256 | 8 | 6.0 KB | 2.33 |
| atto-512 | 512 | 16 | 11.8 KB | 2.44 |
| atto-1024 | 1,024 | 8 | 22.3 KB | 2.11 |
| atto-2048 | 2,048 | 24 | 44.3 KB | 2.15 |
| atto-4096 | 4,096 | 56 | 86.4 KB | 2.40 |
| atto-8192 | 8,192 | 28 | 172.7 KB | 1.91 |
| atto-16384 | 16,384 | 60 | ~640 KB | 2.11 |
## Research Findings: Intelligence Density
- Architecture Matters: At the sub-1,000-parameter scale, standard Transformers are highly inefficient due to the overhead of attention and LayerNorm. Our custom Neural N-Gram (AttoLM) architecture ensures that every single parameter directly participates in character prediction (see the sketch after this list).
- The Embedding Threshold: We found that moving from 8-dimensional to 16-dimensional embeddings (at 8,192 parameters) creates a significant jump in coherence, allowing the model to represent complex character relationships.
- Context vs. Width: In extremely small models, there is a sharp trade-off between the context window (memory) and embedding dimensionality (representation). Our 8,192 and 16,384 models prioritize a balance that favors realistic word formation.
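To make the neural n-gram idea concrete, here is a minimal sketch of how such a model can be wired so that every weight feeds directly into next-character prediction. The configuration values and the per-position mixing scheme are illustrative assumptions, not the exact AttoLM implementation in train_atto.py.

```python
import numpy as np

# Hypothetical atto-scale configuration (not a row from the table above).
vocab, ctx, embd = 64, 8, 16

rng = np.random.default_rng(0)
emb = rng.normal(0.0, 0.1, (vocab, embd))  # character embedding table
mix = rng.normal(0.0, 0.1, (ctx, embd))    # learned per-position weights
out = rng.normal(0.0, 0.1, (embd, vocab))  # projection to next-char logits

# Total parameter count: every entry below is used in every prediction.
n_params = emb.size + mix.size + out.size  # 64*16 + 8*16 + 16*64 = 2,176

def forward(context_ids):
    """Next-character logits from the last `ctx` character ids (no attention, no LayerNorm)."""
    h = (emb[context_ids] * mix).sum(axis=0)  # position-weighted sum -> (embd,)
    return h @ out                            # (vocab,) logits

logits = forward(np.zeros(ctx, dtype=np.int64))
print(n_params, logits.shape)  # 2176 (64,)
```

Compared with a Transformer block of similar size, none of the budget is spent on attention projections or normalization, which is the point of the first finding above.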
## Next Steps
This is only a first step toward denser intelligence. By optimizing weight initialization, exploring custom activation functions, and pushing parameter tying even further, we believe "readable Shakespeare" is achievable with fewer than 1,000 parameters.
## Usage
### Training
To train the base series, run:

    python3 train_atto.py
### Sampling
To evaluate all trained models, run:

    python3 sample.py
The models are exported as dependency-free JSON files in the models/ directory, ready for client-side inference in a web browser.
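As an illustration of how the exported weights could be consumed, the following sketch loads one of the JSON files and samples characters greedily. The file name, the key names ("emb", "mix", "out", "chars"), and the weight shapes are assumptions that reuse the layout from the sketch above; the real files in models/ may use a different schema, and the same loop ports directly to JavaScript for in-browser use.

```python
import json
import numpy as np

# Assumed file name and JSON schema; adjust to match the actual export format.
with open("models/atto-8192.json") as f:
    w = json.load(f)

emb = np.array(w["emb"])    # (vocab, embd) embedding table
mix = np.array(w["mix"])    # (ctx, embd) per-position weights
out = np.array(w["out"])    # (embd, vocab) output projection
chars = w["chars"]          # list mapping index -> character
ctx = mix.shape[0]

def generate(prompt, n_chars=120):
    """Greedy character-by-character generation from a text prompt."""
    ids = [chars.index(c) for c in prompt]
    for _ in range(n_chars):
        window = np.array(([0] * ctx + ids)[-ctx:])  # left-pad to the context size
        h = (emb[window] * mix).sum(axis=0)
        ids.append(int(np.argmax(h @ out)))          # pick the most likely next char
    return "".join(chars[i] for i in ids)

print(generate("to be"))
```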
Sample generations:

    ============================================================
    atto-8192 | 8192 params | embd=16 ctx=28 vocab=64
    ============================================================
    prompt="the":
    Math Laer axfourith tipht's gord me hour hace (remaat ond,
    I'll wore ser ar now pre's for word to styous the mall, stpoul folthis yow apt and be a
    prompt="to be":
    CPon. How gue. O- whut feathent. Thou the in ap bast. gos A thing of be rith nosset?
    [Tiths that hintend kyele in younk hore;
    Gat sgees wis
    prompt="Ham":
    . HaCleata,
    Wlotsef yow preerant fore thipe matte of iche in you?
    And spour, the tang offe herees welr then[foritr her veut arve id for houn w