The MAMBA model transformer with a language modeling head on top (linear layer with weights tied to the input embeddings).
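To make the weight tying concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the class `TiedLMHead`, its methods, and the toy initialization are all hypothetical names chosen for illustration, and the backbone is skipped entirely. It only shows that the output projection reuses the same matrix as the input embedding.

```python
# Toy illustration of a language modeling head tied to the input embeddings.
# All names here are hypothetical; this is not the actual MAMBA code.

class TiedLMHead:
    def __init__(self, vocab_size, hidden_size):
        # One matrix of shape (vocab_size, hidden_size), filled with
        # deterministic toy values instead of a random initialization.
        self.embedding = [
            [0.01 * (v + 1) * (h + 1) for h in range(hidden_size)]
            for v in range(vocab_size)
        ]
        # Weight tying: the head refers to the same object, not a copy.
        self.lm_head_weight = self.embedding

    def embed(self, token_id):
        # Input side: look up the row for this token.
        return self.embedding[token_id]

    def logits(self, hidden):
        # Output side: linear layer without bias, one dot product per
        # vocabulary row of the shared matrix.
        return [
            sum(w * x for w, x in zip(row, hidden))
            for row in self.lm_head_weight
        ]


model = TiedLMHead(vocab_size=4, hidden_size=3)
hidden = model.embed(2)        # stand-in for the backbone's output state
scores = model.logits(hidden)  # one score per vocabulary entry
```

Because both sides share one matrix, any update to the embedding is immediately visible to the head, and the model stores a single `vocab_size x hidden_size` parameter instead of two. In PyTorch the same effect is usually obtained by having the output `nn.Linear` and the input `nn.Embedding` share one weight parameter.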