At NeurIPS 2023, MQ-Det, a multimodal query-based object detection model, made its debut. MQ-Det's distinguishing feature is that it accepts both textual descriptions and visual example images as queries, addressing the loss of fine-grained information and the category ambiguity that text-only queries suffer from. Its design centers on a gated perception module and a vision-conditioned masked language prediction training strategy, which together support multimodal queries. Experiments show that MQ-Det performs strongly on the LVIS benchmark, improving GLIP's detection accuracy by 7.8% AP. This model infuses new vitality into the field of multimodal object detection and holds broad application prospects.
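To make the gated-fusion idea concrete, here is a minimal sketch of how visual example features might condition text query features through cross-attention behind a learnable gate. This is an illustrative assumption, not MQ-Det's actual code: the function names (`gated_fusion`, `softmax`) and the `tanh` gate are hypothetical, chosen so that a gate near zero leaves the original text features untouched, letting the pretrained text pathway remain intact at the start of training.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(text_feats, vision_feats, gate):
    """Condition text query features on visual exemplar features.

    text_feats:   (num_text_tokens, dim) text query embeddings
    vision_feats: (num_exemplars, dim) visual example embeddings
    gate:         scalar; tanh(0) = 0, so gate=0 reduces to text-only queries
    """
    d = text_feats.shape[-1]
    # Cross-attention: each text token attends to the visual exemplars.
    attn = softmax(text_feats @ vision_feats.T / np.sqrt(d))
    attended = attn @ vision_feats
    # Gated residual: the gate controls how much visual detail is injected.
    return text_feats + np.tanh(gate) * attended

# Usage: with gate=0 the text features pass through unchanged;
# a nonzero gate blends in exemplar information.
rng = np.random.default_rng(0)
text = rng.standard_normal((4, 8))
exemplars = rng.standard_normal((3, 8))
fused = gated_fusion(text, exemplars, gate=1.0)
```

The zero-initialized gate is a common trick for adding new modules to a pretrained model without disturbing its initial behavior; whether MQ-Det uses exactly this form is an assumption here.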