MambaByte
A token-free selective state space model
MambaByte is a token-free language model that learns directly from raw bytes, removing the bias introduced by subword tokenization. Operating on bytes, however, results in significantly longer sequences, which poses a challenge for the scalability of standard autoregressive Transformers. MambaByte, a token-free adaptation of the Mamba state space model, is trained autoregressively on byte sequences. Experiments show that MambaByte is more computationally efficient than other byte-level models, and that it is competitive with, and can even outperform, state-of-the-art subword Transformers. Furthermore, because its cost scales linearly with sequence length, MambaByte offers faster inference than Transformers. These findings establish the viability of MambaByte for token-free language modeling.
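To make the byte-level idea concrete, here is a minimal Python sketch of token-free input encoding under the assumptions described above: the vocabulary is just the 256 possible byte values, so no tokenizer training or merge rules are needed. The helper names `encode_bytes` and `decode_bytes` are illustrative, not part of MambaByte's actual API.

```python
# Illustrative sketch of byte-level encoding for a token-free model.
# Each byte of the UTF-8 encoding becomes one input ID in [0, 255].

def encode_bytes(text: str) -> list[int]:
    """Map a string to a sequence of byte IDs (vocabulary size 256)."""
    return list(text.encode("utf-8"))

def decode_bytes(ids: list[int]) -> str:
    """Inverse mapping; malformed byte sequences are replaced, not raised."""
    return bytes(ids).decode("utf-8", errors="replace")

text = "MambaByte"
ids = encode_bytes(text)
print(ids)       # [77, 97, 109, 98, 97, 66, 121, 116, 101]
print(len(ids))  # 9 -- one ID per byte; a subword tokenizer would
                 # typically emit fewer tokens, hence the longer
                 # sequences that byte-level models must handle
assert decode_bytes(ids) == text
```

This is exactly the trade-off the description highlights: the model sees the text with no tokenization bias, but sequence lengths grow, which is where Mamba's linear-time state space architecture helps relative to quadratic-attention Transformers.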
MambaByte Visits Over Time
Monthly Visits: 17,788,201
Bounce Rate: 44.87%
Pages per Visit: 5.4
Visit Duration: 00:05:32