FlashLM v4-Large (Trained Longer)

Ternary (1.58-bit) language model with weights constrained to {-1, 0, +1}. Trained on 2x NVIDIA H200 GPUs.
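For readers unfamiliar with ternary weights, here is a minimal sketch of one common way to map full-precision weights to {-1, 0, +1}: absmean scaling followed by round-and-clip. This scheme is an assumption chosen for illustration; this page does not state which quantizer FlashLM uses, and the `ternarize` helper and sample weights are hypothetical. The "1.58-bit" figure is log2(3), the information content of a three-valued weight.

```python
import math

def ternarize(weights, eps=1e-8):
    """Map floats to {-1, 0, +1} plus one float scale.

    Assumed scheme (absmean): divide by the mean absolute value,
    round to the nearest integer, then clip to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

# Hypothetical weights, for illustration only.
weights = [0.42, -0.07, 0.9, -0.55, 0.01, -1.3]
ternary, scale = ternarize(weights)

# A three-valued weight carries log2(3) bits of information.
bits_per_weight = math.log2(3)

print(ternary, round(bits_per_weight, 3))
```

Multiplying each ternary value by `scale` gives a coarse approximation of the original weights. Small weights collapse to 0, which is where the sparsity of ternary models comes from.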

Training Details

| Metric | Value |
| --- | --- |
| Architecture | FlashLM v4 "Bolt" |
| Parameters | 16.8M (ternary) |
| Hidden dim | 384 |
| Blocks | 8 |
| GLU hidden | 1024 |
| Seq length | 512 |
| Vocab size | 10,000 |
| Dataset | TinyStories (~474M tokens) |
| Tokens seen | 2.16B (~4.5 epochs) |
| Best val loss | 1.675 |
| Training time | ~1.5 hours total (2x H200) |
| Speed | ~600K tok/s |
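As a quick sanity check, the figures above can be related with simple arithmetic. The sketch below uses only the rounded values from the table, so its results are approximate. The variable names are illustrative, and it assumes the ~600K tok/s speed is the combined throughput of both GPUs, which this page does not specify.

```python
# Rounded values taken from the table above.
dataset_tokens = 474e6   # TinyStories, ~474M tokens
tokens_seen = 2.16e9     # total tokens processed during training
throughput = 600e3       # ~600K tokens per second (assumed combined)

# Number of passes over the dataset.
epochs = tokens_seen / dataset_tokens

# Time spent purely on token processing at the stated speed.
compute_hours = tokens_seen / throughput / 3600

print(f"epochs ~ {epochs:.2f}")
print(f"compute time ~ {compute_hours:.2f} h")
```

The epoch count comes out at roughly 4.5, matching the table. The throughput-only estimate is about one hour, lower than the reported ~1.5 hours total. The page does not explain the gap; it may reflect time not spent at peak speed, such as evaluation or checkpointing, but that is a guess.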

Note: The 10K-token vocabulary cannot represent every word, so some words may be missing from the output. The model was trained on children's stories and works best with story-like prompts.

Model Card | v3 Demo | GitHub

Examples