MambaByte

A token-free selective state space model

MambaByte is a token-free language model that learns directly from raw bytes, eliminating the biases introduced by subword tokenization. Operating on bytes, however, produces significantly longer sequences, which challenges the scalability of standard autoregressive Transformers. MambaByte is a token-free adaptation of the Mamba selective state space model, trained autoregressively on byte sequences. Experiments show that MambaByte is more compute-efficient than other byte-level models and matches or even surpasses state-of-the-art subword Transformers. Moreover, because its computation scales linearly with sequence length, MambaByte achieves faster inference than Transformers. These findings demonstrate the viability of token-free language modeling.
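Since the description above is abstract, here is a minimal, hypothetical sketch of what byte-level autoregressive language modeling looks like in practice. It assumes PyTorch; the `ByteLM` class, its hyperparameters, and the GRU stand-in mixer are illustrative placeholders, not the MambaByte implementation. The actual model stacks Mamba selective state space blocks, which likewise run in time linear in sequence length.

```python
# Hypothetical sketch of byte-level autoregressive language modeling.
# ByteLM and all hyperparameters are illustrative, not from MambaByte itself.
import torch
import torch.nn as nn

class ByteLM(nn.Module):
    """Byte-level LM scaffold: raw bytes in (vocab size 256), next-byte logits out."""
    def __init__(self, d_model: int = 512, n_layers: int = 4):
        super().__init__()
        self.embed = nn.Embedding(256, d_model)  # one embedding per byte value
        # Stand-in causal sequence mixer. MambaByte would use Mamba selective
        # state space blocks here, which also run in time linear in sequence
        # length, unlike a Transformer's quadratic attention.
        self.mixers = nn.ModuleList(
            [nn.GRU(d_model, d_model, batch_first=True) for _ in range(n_layers)]
        )
        self.norm = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, 256)  # distribution over the next byte

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(byte_ids)               # (batch, length, d_model)
        for mixer in self.mixers:
            h, _ = mixer(x)                    # causal, linear-time recurrence
            x = x + h                          # residual connection
        return self.head(self.norm(x))         # (batch, length, 256)

# Raw UTF-8 bytes are the "tokens": no subword vocabulary or tokenizer training.
text = "MambaByte reads raw bytes."
ids = torch.tensor(list(text.encode("utf-8"))).unsqueeze(0)  # (1, length)
model = ByteLM()
logits = model(ids[:, :-1])                    # predict each next byte
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 256), ids[:, 1:].reshape(-1)
)
print(logits.shape, float(loss))
```

The vocabulary is fixed at 256 byte values, so tokenization bias disappears entirely; the trade-off is that byte sequences are several times longer than their subword equivalents, which is exactly why a linear-time sequence mixer matters here.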

MambaByte Visits Over Time

Monthly Visits: 17,104,189
Bounce Rate: 44.67%
Pages per Visit: 5.5
Visit Duration: 00:05:49

MambaByte Visit Trend

MambaByte Visit Geography

MambaByte Traffic Sources

MambaByte Alternatives