RWKV-6 Mixture of Experts
The largest model in the RWKV family, utilizing MoE technology to enhance efficiency.
Flock of Finches 37B-A11B v0.1 is the newest member of the RWKV family: an experimental model with 37 billion total parameters, of which roughly 11 billion are active per token. Despite being trained on only 109 billion tokens, it performs comparably to the recently released Finch 14B model on common benchmarks. The model uses a sparse mixture of experts (MoE) design, activating only a subset of its parameters for any given token, which reduces compute during both training and inference. Although this architectural choice increases VRAM usage, we see it as a worthwhile trade-off: it gives us the effective capacity of a much larger model at a lower training and inference cost.
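To make the "only a subset of parameters is active per token" idea concrete, below is a minimal sketch of sparse top-k MoE routing in PyTorch. This is an illustration of the general technique, not the actual RWKV-6 / Flock of Finches implementation; the class name `SparseMoEFFN`, the expert count, the `top_k` value, and the layer sizes are all placeholder assumptions.

```python
# Illustrative sparse MoE feed-forward layer with top-k routing.
# NOT the RWKV-6 implementation; all sizes below are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFFN(nn.Module):
    def __init__(self, d_model=512, d_ffn=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # A pool of small feed-forward "experts"; only top_k run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ffn), nn.ReLU(), nn.Linear(d_ffn, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x)                          # (B, T, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)             # normalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                  # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: every token passes through only top_k of the n_experts networks,
# so most expert parameters stay idle (but still resident in VRAM).
x = torch.randn(2, 16, 512)
y = SparseMoEFFN()(x)
print(y.shape)  # torch.Size([2, 16, 512])
```

This is also why the VRAM trade-off mentioned above arises: all experts must be held in memory even though only a few are computed for each token.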