MASA (Matching Anything by Segmenting Anything) is a model for matching objects across video frames, designed for multi-object tracking (MOT) in complex scenes. Unlike methods that depend on labeled video datasets from a specific domain, MASA learns instance-level correspondences from the rich object segmentations produced by the Segment Anything Model (SAM). It provides a general-purpose adapter that can be attached to a base segmentation or detection model, enabling zero-shot tracking with strong performance even in complex domains.
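
To make the idea of instance-level correspondence more concrete, the sketch below shows one simple way per-object embeddings could be used to link detections across frames. It is an illustration only and not MASA's actual API or association strategy. The function names (`cosine_similarity`, `match_instances`), the `min_similarity` threshold, and the toy three-dimensional embeddings are all assumptions made for this example. In practice the embeddings would come from an adapter running on top of a detection or segmentation model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_instances(track_embeddings, detection_embeddings, min_similarity=0.5):
    """Greedily pair existing tracks with new detections by embedding similarity.

    track_embeddings: dict mapping a track id to its embedding (hypothetical format).
    detection_embeddings: list of embeddings for detections in the current frame.
    Returns a dict mapping track id -> index of the matched detection.
    """
    # Collect every (similarity, track, detection) pair above the threshold.
    candidates = []
    for track_id, track_emb in track_embeddings.items():
        for det_idx, det_emb in enumerate(detection_embeddings):
            sim = cosine_similarity(track_emb, det_emb)
            if sim >= min_similarity:
                candidates.append((sim, track_id, det_idx))

    # Accept the most similar pairs first; each track and detection is used once.
    candidates.sort(key=lambda c: c[0], reverse=True)
    matches = {}
    used_detections = set()
    for _, track_id, det_idx in candidates:
        if track_id in matches or det_idx in used_detections:
            continue
        matches[track_id] = det_idx
        used_detections.add(det_idx)
    return matches


# Toy data: two existing tracks and three detections in the next frame.
tracks = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0]}
detections = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches = match_instances(tracks, detections)
unmatched = [i for i in range(len(detections)) if i not in matches.values()]
```

In this toy setup, a detection left in `unmatched` would typically start a new track. Greedy matching is used here only to keep the sketch dependency-free; an optimal assignment method such as the Hungarian algorithm could be substituted.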