Atto: Extreme Intelligence Density Research

Atto explores the fundamental limits of intelligence density: how much knowledge and generative capability can be packed into a neural network with a strictly limited parameter budget.

This project focuses on the "sub-kiloparameter" and "low-kiloparameter" regime, training models to generate Shakespearean text with as few as 64 parameters.

The Atto Series

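| Model      | Parameters | Context | Weights Size (JSON) | Val Loss |
|------------|------------|---------|---------------------|----------|
| atto-64    | 64         | 3       | 1.8 KB              | 2.59     |
| atto-128   | 128        | 7       | 3.5 KB              | 2.83     |
| atto-256   | 256        | 8       | 6.0 KB              | 2.33     |
| atto-512   | 512        | 16      | 11.8 KB             | 2.44     |
| atto-1024  | 1,024      | 8       | 22.3 KB             | 2.11     |
| atto-2048  | 2,048      | 24      | 44.3 KB             | 2.15     |
| atto-4096  | 4,096      | 56      | 86.4 KB             | 2.40     |
| atto-8192  | 8,192      | 28      | 172.7 KB            | 1.91     |
| atto-16384 | 16,384     | 60      | ~640 KB             | 2.11     |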
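
To put the Val Loss column on a more intuitive scale, the snippet below converts a loss value into perplexity and bits per character. It assumes the reported value is a mean cross-entropy in nats per character, which the table does not state, and it takes the vocabulary size of 64 from the atto-8192 sample header. The function names are illustrative only.

```python
import math

# From the atto-8192 sample header; assumed to hold for the whole series.
VOCAB_SIZE = 64


def perplexity(loss_nats: float) -> float:
    """Effective number of equally likely next characters."""
    return math.exp(loss_nats)


def bits_per_char(loss_nats: float) -> float:
    """Convert a cross-entropy in nats to bits."""
    return loss_nats / math.log(2)


# Baseline: a model that guesses uniformly over the vocabulary.
uniform_loss = math.log(VOCAB_SIZE)

# Val Loss values copied from the table above.
val_losses = {"atto-64": 2.59, "atto-1024": 2.11, "atto-8192": 1.91}

for name, loss in val_losses.items():
    print(f"{name}: perplexity={perplexity(loss):.2f}, "
          f"bits/char={bits_per_char(loss):.2f}, "
          f"uniform baseline={uniform_loss:.2f} nats")
```

Under that assumption, a uniform guess over 64 characters scores ln 64 ≈ 4.16 nats, which gives a reference point for reading the losses in the table.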

Research Findings: Intelligence Density

  1. Architecture Matters: Below roughly 1,000 parameters, standard Transformers are highly inefficient because attention and LayerNorm consume much of the budget. Our custom Neural N-Gram (AttoLM) architecture is designed so that every parameter participates directly in character prediction.
  2. The Embedding Threshold: We found that moving from 8-dimensional to 16-dimensional embeddings (at 8,192 parameters) creates a significant jump in coherence, allowing the model to represent complex character relationships.
  3. Context vs. Width: In extremely small models, there is a sharp trade-off between the context window (memory) and embedding dimensionality (representation). Our 8,192- and 16,384-parameter models strike a balance that favors realistic word formation.
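
The context-versus-width trade-off can be made concrete with a little arithmetic. The sketch below assumes a layout that the text does not confirm: one tied vocab × embd embedding table plus one embd × embd mixing matrix per context position. This assumed layout happens to reproduce the 8,192 total for the configuration printed in the atto-8192 sample header (embd=16, ctx=28, vocab=64), but the actual AttoLM layout may differ. All function names are hypothetical.

```python
def param_count(vocab: int, embd: int, ctx: int) -> int:
    """Parameter count under the ASSUMED layout (not confirmed by the source):
    a tied vocab x embd embedding table plus one embd x embd matrix per
    context position."""
    return vocab * embd + ctx * embd * embd


def max_context(budget: int, vocab: int, embd: int) -> int:
    """Largest context window that fits the budget at a given width."""
    remaining = budget - vocab * embd
    return max(remaining // (embd * embd), 0)


# Configuration from the atto-8192 sample header.
print(param_count(vocab=64, embd=16, ctx=28))  # 8192

# Same budget, different widths: wider embeddings leave room for less context.
for embd in (8, 16, 32):
    print(embd, max_context(budget=8192, vocab=64, embd=embd))
```

With these assumptions, a fixed 8,192-parameter budget allows a context of 120 at width 8, 28 at width 16, and 6 at width 32, since each doubling of the embedding width quadruples the cost of every context position.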

Next Steps

This is a first step toward denser models. By optimizing weight initialization, designing custom activation functions, and applying even more aggressive parameter tying, we believe it is possible to achieve "readable Shakespeare" with fewer than 1,000 parameters.

Usage

Training

To train the base series, run:

python3 train_atto.py

Sampling

To generate samples from all trained models:

python3 sample.py

The models are exported as dependency-free JSON files in the models/ directory, ready for client-side inference in a web browser.
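
Because the weights are plain JSON, a model's size can be checked with the standard library alone. The sketch below counts every numeric leaf in a parsed JSON document without assuming any particular schema. The file layout is not documented here, so any numeric metadata stored alongside the weights would be counted too. The path models/atto-8192.json is a hypothetical example name.

```python
import json
from pathlib import Path


def count_numbers(node) -> int:
    """Recursively count numeric leaves in parsed JSON data."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(node, bool):
        return 0
    if isinstance(node, (int, float)):
        return 1
    if isinstance(node, list):
        return sum(count_numbers(item) for item in node)
    if isinstance(node, dict):
        return sum(count_numbers(value) for value in node.values())
    return 0  # strings and null carry no weights


def count_file(path: str) -> int:
    """Load a JSON file and count its numeric leaves."""
    return count_numbers(json.loads(Path(path).read_text()))


# Self-contained demo on an in-memory document with a made-up structure.
demo = json.loads('{"name": "demo", "w": [[0.1, -0.2], [0.3, 0.4]], "b": [0.0, 1.0]}')
print(count_numbers(demo))  # 6

# Hypothetical usage against an exported model file:
# print(count_file("models/atto-8192.json"))
```

Comparing the result with the Parameters column is a quick sanity check that an export is complete.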

Sample generations:


============================================================
  atto-8192  |  8192 params  |  embd=16  ctx=28  vocab=64
============================================================
  prompt="the":
    Math Laer axfourith tipht's gord me hour hace (remaat ond,
    I'll wore ser ar now pre's for word to styous the mall, stpoul folthis yow apt and be a

  prompt="to be":
     CPon. How gue. O- whut feathent. Thou the in ap bast.  gos A thing of be rith nosset?
    [Tiths that hintend kyele in younk hore;
    Gat sgees wis 

  prompt="Ham":
    . HaCleata,
    Wlotsef yow preerant fore thipe matte of iche in you?
    And spour, the tang offe herees welr then[foritr her veut arve id for houn w
