Vary is the official code implementation of a method for large vision-language models that improves model performance by expanding the visual vocabulary. The resulting model offers strong image understanding and language generation capabilities and can be applied across multiple domains.