InseRF is a method for generating new 3D objects inside NeRF reconstructions of 3D scenes. Given a user-provided text description and a 2D bounding box drawn in a reference viewpoint, it generates a 3D object and inserts it into the scene. This makes object insertion controllable and 3D-consistent without requiring explicit 3D information. Experiments on multiple 3D scenes demonstrate the effectiveness of InseRF compared to existing approaches.
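
To give a rough sense of how a 2D bounding box in a single reference view can constrain where an object sits in 3D, the sketch below back-projects a box through a pinhole camera at an assumed depth. This is not InseRF's procedure: the text above does not say how the box is lifted to 3D, so the `Pinhole` class, the `backproject` and `box_to_3d` helpers, the intrinsics, and the fixed depth value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pinhole:
    """Hypothetical pinhole intrinsics in pixels (illustrative, not from InseRF)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(cam, u, v, depth):
    """Map pixel (u, v) at the given depth to a 3D point in camera coordinates."""
    x = (u - cam.cx) / cam.fx * depth
    y = (v - cam.cy) / cam.fy * depth
    return (x, y, depth)


def box_to_3d(cam, box, depth):
    """Lift a 2D box (u_min, v_min, u_max, v_max) to a 3D center and a
    width/height extent, assuming the whole box lies at one depth."""
    u0, v0, u1, v1 = box
    p0 = backproject(cam, u0, v0, depth)
    p1 = backproject(cam, u1, v1, depth)
    center = tuple((a + b) / 2 for a, b in zip(p0, p1))
    size = (p1[0] - p0[0], p1[1] - p0[1])
    return center, size


# Assumed example values: a 640x480 image and a box centered on the principal point.
cam = Pinhole(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
center, size = box_to_3d(cam, (270.0, 190.0, 370.0, 290.0), depth=2.0)
```

The depth is the missing piece here: a 2D box alone fixes only a frustum of possible placements, so any pipeline of this kind needs some additional source of depth or 3D reasoning to pick a single location.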