WARM (Weight Averaged Reward Models) is an approach for aligning large language models (LLMs) with human preferences. It first fine-tunes multiple reward models and then averages them in weight space. Compared with traditional prediction ensembles, weight averaging is more efficient, since only a single merged model is kept at inference time, while being more reliable under distribution shift and preference inconsistency. Experiments on summarization tasks show that WARM outperforms traditional methods: under both best-of-N sampling and RL fine-tuning, WARM improves the overall quality and alignment of LLM predictions.
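To make the core operation concrete, here is a minimal PyTorch sketch of averaging several fine-tuned reward models in weight space. The `average_weights` helper, the uniform weighting, and the commented usage are illustrative assumptions, not the paper's exact implementation; the sketch assumes all models share one architecture (e.g., fine-tuned from the same pretrained checkpoint) so their parameters align key by key.

```python
import copy
import torch
import torch.nn as nn

def average_weights(models: list[nn.Module]) -> nn.Module:
    """Average the parameters of several fine-tuned reward models.

    Assumes all models share the same architecture, so their state
    dicts align key by key. Returns a single merged model that can be
    used as the reward model at inference time.
    """
    merged = copy.deepcopy(models[0])
    avg_state = merged.state_dict()
    with torch.no_grad():
        for key in avg_state:
            # Uniform average across models; non-uniform weights are
            # a straightforward variation.
            stacked = torch.stack(
                [m.state_dict()[key].float() for m in models]
            )
            avg_state[key] = stacked.mean(dim=0).to(avg_state[key].dtype)
    merged.load_state_dict(avg_state)
    return merged

# Hypothetical usage: reward models fine-tuned from one checkpoint
# with different seeds, then merged into a single WARM reward model.
# reward_models = [load_reward_model(f"rm_seed{i}.pt") for i in range(3)]
# warm_rm = average_weights(reward_models)
# score = warm_rm(candidate_summary_tokens)  # one forward pass
```

Because only the merged weights are retained, scoring a candidate costs a single forward pass regardless of how many reward models were averaged, which is the source of the efficiency gain over prediction ensembles.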