OK-Robot is an open, modular framework for zero-shot item transportation from language instructions in previously unseen home environments. It combines three components: a 3D VoxelMap for open-vocabulary navigation, AnyGrasp with LangSam for open-vocabulary grasping, and a placement primitive for putting the item down at the target location. The framework requires no additional training or fine-tuning; it relies on off-the-shelf pretrained models, which is what allows it to generalize zero-shot to new language instructions.
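To make the modular design concrete, the following is a minimal sketch of how three interchangeable modules could be chained into one transport task. It is not OK-Robot's actual code: the names `Navigator`, `Grasper`, `Placer`, `TransportPipeline`, and `StubRobot` are hypothetical, and the stub stands in for the real VoxelMap, AnyGrasp/LangSam, and placement components.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


# Hypothetical interfaces: each module only needs to accept a language query.
class Navigator(Protocol):
    def go_to(self, query: str) -> bool: ...


class Grasper(Protocol):
    def pick(self, query: str) -> bool: ...


class Placer(Protocol):
    def place(self, query: str) -> bool: ...


@dataclass
class TransportPipeline:
    """Chains navigation, grasping, and placement for one instruction."""

    navigator: Navigator
    grasper: Grasper
    placer: Placer

    def run(self, item: str, receptacle: str) -> str:
        # Each stage must succeed before the next one starts.
        if not self.navigator.go_to(item):
            return "failed: navigate to item"
        if not self.grasper.pick(item):
            return "failed: grasp"
        if not self.navigator.go_to(receptacle):
            return "failed: navigate to receptacle"
        if not self.placer.place(receptacle):
            return "failed: place"
        return "success"


@dataclass
class StubRobot:
    """Stand-in for all three modules; records calls instead of moving."""

    fail_on: str = ""  # name of the stage that should fail, if any
    log: List[Tuple[str, str]] = field(default_factory=list)

    def _step(self, stage: str, query: str) -> bool:
        self.log.append((stage, query))
        return stage != self.fail_on

    def go_to(self, query: str) -> bool:
        return self._step("go_to", query)

    def pick(self, query: str) -> bool:
        return self._step("pick", query)

    def place(self, query: str) -> bool:
        return self._step("place", query)


if __name__ == "__main__":
    robot = StubRobot()
    pipeline = TransportPipeline(robot, robot, robot)
    print(pipeline.run("red mug", "kitchen sink"))
    print(robot.log)
```

Because the pipeline depends only on the three small interfaces, any one module could be swapped for a different implementation without touching the other two, which is the practical benefit of a modular design.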