FlashLM v4-Large (Trained Longer)

A ternary (1.58-bit) language model: every weight is constrained to {-1, 0, +1}, and three possible values carry log2(3) ≈ 1.58 bits each. Trained on 2x NVIDIA H200 GPUs.
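To make the ternary constraint concrete, here is a minimal sketch of one common way to map float weights to {-1, 0, +1}: absmean scaling followed by rounding and clipping. This is an assumption for illustration only; this page does not say which quantization scheme FlashLM uses, and the function name `ternarize` and the sample weights are hypothetical.

```python
import math

def ternarize(weights, eps=1e-8):
    """Map float weights to {-1, 0, +1} plus a single float scale.

    Assumption: absmean scaling (divide by the mean absolute value,
    round, clip). FlashLM's actual scheme is not stated in the source.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

# Hypothetical example weights, not taken from the model.
weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]
ternary, scale = ternarize(weights)

# Three possible values per weight carry log2(3) bits of information.
bits_per_weight = math.log2(3)

print(ternary)
print(round(bits_per_weight, 2))
```

Small weights collapse to 0 and large ones saturate at -1 or +1, so a weight row can be approximated at inference time as `scale * ternary`.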

Training Details

Metric	Value
Architecture	FlashLM v4 "Bolt"
Parameters	16.8M (ternary)
Hidden dim	384
Blocks	8
GLU hidden	1024
Seq length	512
Vocab size	10,000
Dataset	TinyStories (~474M tokens)
Tokens seen	2.16B (~4.5 epochs)
Best val loss	1.675
Training time	~1.5 hours total (2x H200)
Speed	~600K tok/s
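The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below uses only the table values; the variable names are illustrative, and the packed-size estimate assumes that all 16.8M parameters are ternary and stored at the information-theoretic minimum, which the source does not confirm.

```python
import math

# Values copied from the table above.
dataset_tokens = 474e6   # TinyStories, approximate
tokens_seen = 2.16e9
throughput = 600e3       # tokens per second, approximate
params = 16.8e6

# Passes over the dataset: about 4.56, matching the "~4.5 epochs" entry.
epochs = tokens_seen / dataset_tokens

# Time spent purely processing tokens at the quoted speed, in hours.
compute_hours = tokens_seen / throughput / 3600

# Assumption: every parameter is ternary and packed at log2(3) bits.
packed_megabytes = params * math.log2(3) / 8 / 1e6

print(f"epochs: {epochs:.2f}")
print(f"compute hours at quoted speed: {compute_hours:.2f}")
print(f"lower-bound packed size: {packed_megabytes:.2f} MB")
```

At the quoted speed, token processing alone accounts for about 1 hour. The table's ~1.5 hours total is larger; the source does not break down where the rest of the time goes.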

Note: The vocabulary is limited to 10,000 tokens, so some words may be missing from the output. The model was trained on children's stories and works best with story-like prompts.


Examples