Clicked on the track in error, but after 30 secs had to hear more. It's a great sound that just seems to gel. There is nothing better than music in its rawest form and ...
"blurb": "11 layers, int6 quant, zstd-22. Novel contribution: Efficient Partial Exclusive Self Attention (XSA, arXiv:2603.09078) applied to deepest 3 layers only. GQA-aware reshape avoids tensor ...
The `train_gpt.py` and `train_gpt_mlx.py` scripts are intended as good launching-off points for new participants, not SOTA configs. We'll accept PRs that tune, improve, or simplify these scripts ...