Langtrain
BENCHMARK

Up to 2.5x Faster Training

New RoPE + MLP kernels (no accuracy loss)

Training Throughput (tokens/s), Langtrain vs. Standard

Batch Size    Langtrain (tokens/s)    Speedup vs. Standard
 1            1141                    2.3x
 2            1547                    2.1x
 4            1911                    1.8x
 8            1835                    2.0x
16            1724                    2.2x
32            1575                    2.5x
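The chart reports Langtrain throughput and per-batch speedups but not the Standard baseline itself; the baseline follows by division. A quick sketch (values taken from the table above, rounding to the nearest token/s is an assumption):

```python
# Benchmark values from the table: batch size -> (Langtrain tokens/s, speedup).
benchmarks = {
    1: (1141, 2.3),
    2: (1547, 2.1),
    4: (1911, 1.8),
    8: (1835, 2.0),
    16: (1724, 2.2),
    32: (1575, 2.5),
}

# Implied Standard throughput = Langtrain throughput / speedup.
standard = {bs: round(tps / x) for bs, (tps, x) in benchmarks.items()}
for bs, tps in standard.items():
    print(f"batch {bs:>2}: ~{tps} tokens/s (Standard)")
```

Note that Langtrain's absolute throughput peaks at batch size 4 while the relative speedup peaks at batch size 32, where the baseline degrades fastest.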

Train LLMs like Qwen3-4B on as little as 3.9GB VRAM

2.3x faster fused Triton kernel for QK rotary embeddings (RoPE)
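The fused Triton kernel itself isn't shown on this page; as an unfused NumPy reference of the math a QK rotary-embedding kernel computes, assuming the common rotate-half layout (function name and shapes are illustrative, not Langtrain's API):

```python
import numpy as np

def rope_reference(x, base=10000.0):
    """Unfused reference for rotary position embedding (rotate-half form).

    x: (seq_len, head_dim) with head_dim even. A fused kernel would apply
    this rotation to Q and K in a single pass; this shows the math only.
    """
    seq_len, head_dim = x.shape
    half = head_dim // 2
    # Per-pair frequencies: base^(-i/half) for i = 0 .. half-1.
    inv_freq = base ** (-np.arange(half) / half)            # (half,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied pairwise to (x1_i, x2_i).
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 64)
q_rot = rope_reference(q)
```

Because each feature pair is rotated rather than scaled, the transform preserves per-position norms, which is why a faster kernel can match the reference bit-for-bit in accuracy terms.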

Updated SwiGLU and GeGLU kernels with int64 indexing
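SwiGLU and GeGLU are the gated-MLP activations those kernels fuse. An unfused NumPy reference of the math they compute (names and shapes are illustrative, not Langtrain's API; int64 indexing only matters once activation tensors exceed the int32 element range):

```python
import numpy as np

def swiglu(x, w_gate, w_up):
    """Reference SwiGLU: silu(x @ w_gate) * (x @ w_up)."""
    g = x @ w_gate
    return (g / (1.0 + np.exp(-g))) * (x @ w_up)  # silu(z) = z * sigmoid(z)

def geglu(x, w_gate, w_up):
    """Reference GeGLU: gelu(x @ w_gate) * (x @ w_up), tanh-approximate gelu."""
    g = x @ w_gate
    gelu = 0.5 * g * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                    * (g + 0.044715 * g ** 3)))
    return gelu * (x @ w_up)

x = np.random.randn(4, 16)
w_gate = np.random.randn(16, 32)
w_up = np.random.randn(16, 32)
out = swiglu(x, w_gate, w_up)
```

A fused kernel computes the gate projection, the activation, and the elementwise product in one pass instead of materializing the intermediates, which is where the memory savings come from.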

50% less VRAM usage with no accuracy degradation

Efficiency: 50% less memory usage