Langtrain
BENCHMARK
3x Faster Training
New RoPE + MLP kernels (no accuracy loss)
Figure: Training throughput (tokens/s), Langtrain vs. Standard
Train LLMs like Qwen3-4B on as little as 3.9 GB of VRAM
Fused Triton kernel for QK rotary embedding (RoPE), 2.3x faster
Updated SwiGLU and GeGLU kernels with int64 indexing
50% less VRAM usage with no accuracy degradation
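For readers unfamiliar with the operation the rotary embedding kernel speeds up, here is a minimal pure-Python sketch of RoPE applied to one query/key pair. It is not the Langtrain Triton kernel. The function names, the "rotate half" pairing (element i with element i + half), and the frequency base of 10000 are assumptions chosen for illustration. The only "fused" idea it shows is that the cos/sin values are computed once and reused for both q and k.

```python
import math

def rope_angles(position, head_dim, base=10000.0):
    """Return (cos, sin) lists of length head_dim // 2 for one token position.

    base=10000.0 is an assumed default, not a value taken from Langtrain.
    """
    half = head_dim // 2
    freqs = [base ** (-2.0 * i / head_dim) for i in range(half)]
    cos = [math.cos(position * f) for f in freqs]
    sin = [math.sin(position * f) for f in freqs]
    return cos, sin

def rotate(vec, cos, sin):
    """Rotate one head vector, pairing element i with element i + half."""
    half = len(vec) // 2
    out = [0.0] * len(vec)
    for i in range(half):
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * cos[i] - x2 * sin[i]
        out[i + half] = x2 * cos[i] + x1 * sin[i]
    return out

def rope_qk(q, k, position, base=10000.0):
    """Apply RoPE to q and k of one token, sharing a single cos/sin table."""
    cos, sin = rope_angles(position, len(q), base)
    return rotate(q, cos, sin), rotate(k, cos, sin)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

if __name__ == "__main__":
    q = [0.5, -1.0, 2.0, 0.25]
    k = [1.0, 0.3, -0.7, 1.5]
    q5, _ = rope_qk(q, k, position=5)
    _, k3 = rope_qk(q, k, position=3)
    q12, _ = rope_qk(q, k, position=12)
    _, k10 = rope_qk(q, k, position=10)
    # Both pairs are 2 positions apart, so the attention scores should match.
    print(math.isclose(dot(q5, k3), dot(q12, k10), abs_tol=1e-9))
```

The check at the end illustrates why the rotation is useful: the q·k score depends only on the distance between the two positions, and each vector keeps its original length.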
Efficiency
50% less memory usage