Bailing-TTS is a series of large-scale text-to-speech (TTS) models developed by Giant Network's AI Lab, focused on generating high-quality Chinese dialect voices. The model employs continuous semi-supervised learning and a specific Transformer architecture, effectively aligning text and speech markers through a multi-stage training process to achieve high-quality dialect speech synthesis. Bailing-TTS has demonstrated speech synthesis results that closely resemble natural human expression, holding significant relevance in the field of dialect speech synthesis.