DreamLLM is a learning framework that, for the first time, achieves a synergy between multimodal comprehension and creation in multimodal large language models (MLLMs). It generates both language and images by directly sampling in the raw multimodal space, modeling the posteriors without external feature extractors such as CLIP and thereby avoiding their inherent limitations and information loss, which enables more thorough multimodal understanding. DreamLLM is also trained on raw interleaved documents, modeling text and image content together with their unstructured layouts, so that it effectively captures all conditional, marginal, and joint multimodal distributions. As a result, DreamLLM is the first MLLM capable of generating free-form interleaved content. Comprehensive experiments demonstrate DreamLLM's strong performance as a zero-shot multimodal generalist, fully benefiting from the enhanced learning synergy.
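To make the interleaved-document training idea concrete, here is a minimal, hedged sketch (not the paper's actual implementation) of how a document mixing text and images might be flattened into a single token stream for next-token prediction; the `<dream>` marker name follows the paper's description of a special token that signals where an image should be synthesized, while the placeholder format and helper names are illustrative assumptions:

```python
# Hedged toy sketch: an interleaved document is flattened into one token
# stream; a special <dream> marker (name assumed from the paper's
# description) indicates where an image should be synthesized, so that
# plain autoregressive modeling covers conditional, marginal, and joint
# text-image distributions.

DREAM = "<dream>"  # hypothetical trigger token for image generation

def flatten_interleaved(doc):
    """Turn a list of ("text", str) / ("image", id) segments into one
    token sequence suitable for next-token prediction."""
    tokens = []
    for kind, content in doc:
        if kind == "text":
            tokens.extend(content.split())
        else:  # image segment: trigger token plus a placeholder span
            tokens.append(DREAM)
            tokens.append(f"<img:{content}>")
    return tokens

doc = [("text", "a cat sits"), ("image", 0), ("text", "on a mat")]
print(flatten_interleaved(doc))
# → ['a', 'cat', 'sits', '<dream>', '<img:0>', 'on', 'a', 'mat']
```

Because the layout itself (where images fall among the words) is part of the sequence, a single next-token objective over such streams is what lets the model learn free-form interleaved generation rather than only text-to-image or image-to-text mappings.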