VSP-LLM is a framework that combines Visual Speech Processing (VSP) with Large Language Models (LLMs), designed to maximize context-modeling capability by leveraging the strong language abilities of LLMs. It is built for multi-tasking, performing both visual speech recognition and visual speech translation. The input video is mapped into the LLM's input latent space by a self-supervised visual speech model. Training is made computationally efficient through a novel deduplication method, which reduces redundant information in the embedded visual features, together with Low-Rank Adaptation (LoRA).
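
The snippet below is a minimal, self-contained PyTorch sketch of the pipeline described above: frame-level visual features are deduplicated by merging consecutive frames that share the same discrete visual speech unit, projected into the LLM's input latent space, and passed through a LoRA-adapted layer standing in for the LLM. All module names, dimensions, and the toy LoRA layer are illustrative assumptions, not the actual VSP-LLM implementation (which uses a pretrained visual speech encoder and a full LLM adapted with LoRA).

```python
import torch
import torch.nn as nn


def deduplicate(features: torch.Tensor, units: torch.Tensor) -> torch.Tensor:
    """Average consecutive frames that share the same discrete visual speech unit.

    features: (T, D) frame-level visual features
    units:    (T,)   discrete unit id per frame (assumed given by a quantizer)
    returns:  (T', D) pooled features with T' <= T
    """
    change = torch.ones_like(units, dtype=torch.bool)
    change[1:] = units[1:] != units[:-1]                  # True where the unit id changes
    group_ids = torch.cumsum(change.long(), dim=0) - 1    # group index per frame
    num_groups = int(group_ids[-1].item()) + 1
    pooled = torch.zeros(num_groups, features.size(1))
    counts = torch.zeros(num_groups, 1)
    pooled.index_add_(0, group_ids, features)             # sum features within each group
    counts.index_add_(0, group_ids, torch.ones(features.size(0), 1))
    return pooled / counts                                 # mean-pool each group


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (toy LoRA)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                        # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)


class VSPLLMSketch(nn.Module):
    """Visual features -> deduplication -> projection into the LLM embedding space."""

    def __init__(self, vis_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.projector = nn.Linear(vis_dim, llm_dim)       # maps to the LLM input latent space
        self.llm_block = LoRALinear(nn.Linear(llm_dim, llm_dim))  # stand-in for the LLM

    def forward(self, vis_feats: torch.Tensor, units: torch.Tensor) -> torch.Tensor:
        pooled = deduplicate(vis_feats, units)             # drop redundant frames
        tokens = self.projector(pooled)                    # LLM-space "visual tokens"
        return self.llm_block(tokens)


if __name__ == "__main__":
    T, D = 50, 1024
    feats = torch.randn(T, D)                              # e.g. features from a visual speech encoder
    units = torch.randint(0, 200, (T,)).sort().values      # toy unit sequence with repeats
    out = VSPLLMSketch()(feats, units)
    print(out.shape)                                       # (T', 4096) with T' <= T
```

In practice the projected visual tokens would be concatenated with instruction-text embeddings and fed to the LLM, so that the instruction controls whether the model performs recognition or translation; the sketch only illustrates the deduplication and the LoRA-based efficient adaptation.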