The NExT++ Lab at the National University of Singapore, in collaboration with Zhiyuan Liu's team at Tsinghua University, has built a multimodal large model that integrates detection and segmentation modules, making image matting much simpler. Given a request described in natural language, the model quickly localizes and labels the target objects and provides a textual explanation. Experiments across multiple task datasets show strong performance, particularly on referring segmentation and referring expression comprehension (REC). The model also introduces an embedding-based location modeling method, which improves its ability to represent object positions. Finally, through an optimized training process, it performs well on segmentation tasks even when annotations are scarce.
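The embedding-based location modeling mentioned above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the team's actual implementation: the token name, head architecture, and dimensions are all assumptions. The idea is that instead of spelling out box coordinates as text tokens, the language model emits a special location token whose hidden embedding is decoded by a small regression head into a normalized bounding box.

```python
import numpy as np

# Hypothetical sketch of embedding-based location modeling.
# Assumption: the LLM emits a special <loc> token; its hidden-state
# embedding is passed to a small MLP that regresses a normalized
# box (x1, y1, x2, y2) in [0, 1]. All names/sizes are illustrative.

rng = np.random.default_rng(0)

HIDDEN = 64  # assumed LLM hidden size

# Tiny two-layer MLP "box head": embedding -> 4 normalized coordinates.
W1 = rng.standard_normal((HIDDEN, 32)) * 0.1
b1 = np.zeros(32)
W2 = rng.standard_normal((32, 4)) * 0.1
b2 = np.zeros(4)

def decode_box(loc_embedding: np.ndarray) -> np.ndarray:
    """Map a <loc> token embedding to a box in [0, 1]^4."""
    h = np.maximum(loc_embedding @ W1 + b1, 0.0)     # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))      # sigmoid keeps coords in [0, 1]

# Stand-in for the hidden state of an emitted <loc> token:
loc_emb = rng.standard_normal(HIDDEN)
box = decode_box(loc_emb)
print(box.shape)  # (4,)
```

Compared with printing coordinates as text, regressing from an embedding avoids tokenizing numbers and lets the same embedding feed downstream modules such as a segmentation head.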