Honeybee is a locality-enhanced projector for multimodal language models. It improves the performance of these models on downstream tasks such as natural language inference and visual question answering. Its advantage comes from a local perception mechanism that better captures local dependencies in the input, which strengthens the model's inference and question-answering abilities.
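
To make the idea of a local perception mechanism more concrete, here is a minimal sketch of one way locality can be kept when compressing inputs before they reach the language model. It is an illustration under stated assumptions, not Honeybee's actual implementation: it assumes the inputs are visual tokens laid out on a square grid in row-major order, it uses plain average pooling over non-overlapping windows as a stand-in for the local mechanism, and the function name `local_pool_projector` and its parameters are hypothetical.

```python
def local_pool_projector(tokens, grid_size, window, weight):
    """Average each window x window neighborhood, then project linearly.

    Hypothetical sketch, not Honeybee's real architecture.
    tokens: list of grid_size * grid_size feature vectors, row-major order
    weight: d_in x d_out projection matrix given as a list of rows
    returns: list of (grid_size // window) ** 2 projected vectors
    """
    if len(tokens) != grid_size * grid_size:
        raise ValueError("tokens must form a square grid")
    if grid_size % window != 0:
        raise ValueError("window must divide grid_size")

    d_in, d_out = len(weight), len(weight[0])
    cells = grid_size // window
    projected = []
    for cell_row in range(cells):
        for cell_col in range(cells):
            # Sum the features of all tokens in this spatial neighborhood.
            pooled = [0.0] * d_in
            for r in range(window):
                for c in range(window):
                    row = cell_row * window + r
                    col = cell_col * window + c
                    token = tokens[row * grid_size + col]
                    for k in range(d_in):
                        pooled[k] += token[k]
            pooled = [v / (window * window) for v in pooled]
            # Linear projection into the (assumed) language-model width.
            projected.append(
                [sum(pooled[k] * weight[k][j] for k in range(d_in))
                 for j in range(d_out)]
            )
    return projected


# A 4x4 grid of 1-dimensional tokens reduced to a 2x2 grid.
tokens = [[float(i)] for i in range(16)]
result = local_pool_projector(tokens, grid_size=4, window=2, weight=[[1.0]])
```

Each output vector here summarizes only tokens that were spatial neighbors, so the reduced sequence keeps the layout of the original grid. A projector that mixes all tokens globally would not give that guarantee.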