FlashLM v4-Large (Trained Longer)
Ternary (1.58-bit) language model with weights constrained to {-1, 0, +1}. Trained on 2x NVIDIA H200 GPUs.
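The "1.58-bit" figure is log2(3), the information content of one three-valued weight. As an illustration of what constraining weights to {-1, 0, +1} can look like, here is a minimal sketch of absmean rounding, the scheme described for BitNet b1.58. Whether FlashLM v4 uses this exact quantizer is an assumption, and `ternarize` is a hypothetical helper name.

```python
import math

def ternarize(weights, eps=1e-8):
    """Map float weights to {-1, 0, +1} plus one shared scale (absmean rounding).

    Assumption: this mirrors the BitNet b1.58 recipe, not necessarily FlashLM's.
    """
    # Scale is the mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

# Information content of one ternary weight, about 1.585 bits.
bits_per_weight = math.log2(3)

weights = [0.9, -0.05, 0.4, -1.2, 0.02]
ternary, scale = ternarize(weights)
print(ternary)  # [1, 0, 1, -1, 0]
print(round(bits_per_weight, 3))
```

Small weights collapse to 0 and large ones saturate at ±1, so a matrix multiply reduces to additions and subtractions followed by a single multiplication by `scale`.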
Training Details
| Metric | Value |
|---|---|
| Architecture | FlashLM v4 "Bolt" |
| Parameters | 16.8M (ternary) |
| Hidden dim | 384 |
| Blocks | 8 |
| GLU hidden | 1024 |
| Seq length | 512 |
| Vocab size | 10,000 |
| Dataset | TinyStories (~474M tokens) |
| Tokens seen | 2.16B (~4.5 epochs) |
| Best val loss | 1.675 |
| Training time | ~1.5 hours total (2x H200) |
| Speed | ~600K tok/s |
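The token, epoch, and throughput rows can be cross-checked with a little arithmetic. The sketch below uses only the rounded values from the table; the variable names are illustrative.

```python
# Rounded values copied from the table above.
dataset_tokens = 474e6      # TinyStories, ~474M tokens
tokens_seen = 2.16e9        # 2.16B tokens
throughput = 600e3          # ~600K tokens per second
seq_length = 512

# Passes over the dataset.
epochs = tokens_seen / dataset_tokens

# Wall-clock hours if the quoted throughput were sustained throughout.
hours_at_quoted_speed = tokens_seen / throughput / 3600

# Number of 512-token sequences processed.
sequences_seen = tokens_seen / seq_length

print(f"epochs: {epochs:.2f}")
print(f"hours at quoted speed: {hours_at_quoted_speed:.2f}")
print(f"sequences seen: {sequences_seen:,.0f}")
```

With these inputs the epoch count comes out slightly above 4.5, and the quoted throughput alone accounts for about one hour; the source does not break down the rest of the ~1.5 hour total.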
Note: Some words may be missing due to the 10K vocabulary limitation. The model was trained on children's stories and works best with story-like prompts.
Examples