In video analysis, object permanence is an important cue that lets humans understand that objects continue to exist even when they are completely occluded. However, current object segmentation methods focus mainly on the visible (modal) part of objects and cannot handle amodal (visible + invisible) objects. To address this, researchers proposed Diffusion-VAS, a two-stage method based on diffusion priors. It aims to improve amodal segmentation and content completion in videos: it tracks a specified target through the video and uses diffusion models to complete its occluded parts.
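To make the modal/amodal distinction concrete, the following is a minimal sketch on a toy one-row "frame". It is not the researchers' implementation: the helper names (`occluded_region`, `area`) and the hand-written masks are assumptions made purely for illustration. It only shows how the hidden part of an object relates to its visible and full-extent masks, which is the region a completion step would need to fill in.

```python
from typing import List

# Hypothetical toy representation: a mask is a grid of 0/1 values.
Mask = List[List[int]]


def occluded_region(amodal: Mask, modal: Mask) -> Mask:
    """Return the pixels that belong to the full object but are not visible."""
    return [
        [int(a == 1 and m == 0) for a, m in zip(amodal_row, modal_row)]
        for amodal_row, modal_row in zip(amodal, modal)
    ]


def area(mask: Mask) -> int:
    """Count the pixels set to 1 in a mask."""
    return sum(sum(row) for row in mask)


# Assumed example: the object spans columns 1 to 4, and an occluder
# hides columns 3 and 4, so only columns 1 and 2 are visible.
modal = [[0, 1, 1, 0, 0, 0]]   # visible part only
amodal = [[0, 1, 1, 1, 1, 0]]  # visible + invisible extent

hidden = occluded_region(amodal, modal)
occlusion_rate = area(hidden) / area(amodal)

print(hidden)          # [[0, 0, 0, 1, 1, 0]]
print(occlusion_rate)  # 0.5
```

In this toy case half of the object is hidden. A modal segmenter would report only the two visible pixels, whereas an amodal method is expected to recover all four and then synthesize appearance for the two hidden ones.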