JD Retail's technology team proudly announces the successful launch of TimeHF, its first self-developed billion-scale time-series large model for sales forecasting. The model is trained with Reinforcement Learning from Human Feedback (RLHF), a first in the sales forecasting field, yielding an accuracy improvement of over 10% and a marked reduction in demand-side prediction uncertainty. TimeHF not only performs strongly in JD's internal automated replenishment scenarios covering 20,000 products, but also surpasses existing industry results on multiple public datasets, setting a new benchmark in time-series forecasting.

JD's supply chain algorithm team discovered that traditional time-series forecasting methods, such as ARIMA, Prophet, and earlier deep learning models like LSTM and TCN, have significant shortcomings in capturing complex patterns and achieving zero-shot generalization. Existing time-series large models also face numerous challenges in dataset quality and RLHF implementation. To address these, the JD team innovated in three key areas: dataset construction, model design, and training methodology.

For dataset construction, the JD team integrated JD's internal sales time-series data, public datasets, and synthetic data. Through quality filtering, deduplication, diversity sorting, and data balancing, they built a large-scale, high-quality, complex dataset containing 1.5 billion samples. The scale and quality of this dataset are unprecedented in the time-series field, providing a solid foundation for model training.
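The announcement does not disclose implementation details of the curation pipeline. As an illustration only, the filtering, deduplication, and balancing steps described above could be sketched roughly as follows; all function names and thresholds here are assumptions, not JD's actual code.

```python
import numpy as np

def quality_filter(series, max_missing_ratio=0.2, min_length=64):
    """Drop series that are too short or too sparse (thresholds are illustrative)."""
    s = np.asarray(series, dtype=float)
    if len(s) < min_length:
        return False
    return np.isnan(s).mean() <= max_missing_ratio

def deduplicate(dataset, decimals=4):
    """Remove near-exact duplicate series by hashing a rounded byte representation."""
    seen, unique = set(), []
    for s in dataset:
        key = np.round(np.asarray(s, dtype=float), decimals).tobytes()
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique

def balance_by_source(dataset, labels, max_per_source=1000, seed=0):
    """Cap how many series any one source contributes, to avoid domain skew."""
    rng = np.random.default_rng(seed)
    by_src = {}
    for s, lab in zip(dataset, labels):
        by_src.setdefault(lab, []).append(s)
    balanced = []
    for group in by_src.values():
        idx = rng.permutation(len(group))[:max_per_source]
        balanced.extend(group[i] for i in idx)
    return balanced
```

At 1.5 billion samples, a production pipeline would run these steps in a distributed framework rather than in-memory, but the logic per series would be similar.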


In model design, JD introduced PCTLM (Patch Convolutional Timeseries Large Model). The model splits each time series into patches and models them with a masked-encoder architecture. It also incorporates a grouped attention mechanism with temporal positional encoding, effectively capturing inter-patch information and strengthening its ability to model complex spatiotemporal relationships.
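PCTLM's internals are not published in this announcement, but the patch-plus-masking input preparation it describes is a standard recipe that can be sketched as below. This is a minimal illustration of patchifying a series and masking patches for reconstruction training; the function names, patch length, and mask ratio are assumptions, not details of PCTLM itself.

```python
import numpy as np

def patchify(series, patch_len=16, stride=16):
    """Split a 1-D series into fixed-length patches
    (non-overlapping when stride == patch_len)."""
    s = np.asarray(series, dtype=float)
    n_patches = (len(s) - patch_len) // stride + 1
    return np.stack([s[i * stride : i * stride + patch_len] for i in range(n_patches)])

def random_mask(patches, mask_ratio=0.4, seed=0):
    """Zero out a random subset of patches; a masked encoder is trained to
    reconstruct the hidden patches from the visible ones."""
    rng = np.random.default_rng(seed)
    n = len(patches)
    masked_idx = rng.choice(n, size=int(round(mask_ratio * n)), replace=False)
    out = patches.copy()
    out[masked_idx] = 0.0
    mask = np.zeros(n, dtype=bool)
    mask[masked_idx] = True
    return out, mask
```

In a full model, each visible patch would be embedded (e.g. via a convolutional projection) and fed through attention layers with temporal positional encodings, with the reconstruction loss computed only on the masked positions.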

Regarding the training methodology, JD pioneered TPO (Timeseries Policy Optimization), a reinforcement learning framework specifically designed for pure time-series large models. This framework addresses the limitations of traditional RLHF frameworks in time-series scenarios by incorporating a probabilistic prediction component, designing an advantage function, and introducing a time-series loss, significantly improving the model's predictive performance.
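TPO's exact formulation is not given in the announcement. As a hedged illustration of the two ingredients it names, the snippet below sketches a quantile (pinball) loss as one common probabilistic time-series objective, and a group-relative advantage that scores each sampled forecast against the batch mean; both function names and the choice of MAE as the reward signal are assumptions, not JD's actual design.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss: a standard objective for probabilistic forecasts."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

def forecast_advantages(sampled_forecasts, target):
    """Reward each sampled forecast by its negative MAE against the target,
    then subtract the group mean so better-than-average samples receive a
    positive advantage (the policy-gradient update would upweight those)."""
    rewards = np.array([-np.mean(np.abs(f - target)) for f in sampled_forecasts])
    return rewards - rewards.mean()
```

In an RLHF-style loop, the advantage would weight the log-probability of each sampled forecast under the policy, while the time-series loss keeps the model anchored to numerical accuracy.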

Through these innovations, TimeHF achieved state-of-the-art (SOTA) results on multiple public datasets, demonstrating superior zero-shot performance and prediction accuracy compared to leading time-series deep learning methods and fine-tuned large models. Currently, this model is deployed in JD's supply chain system, providing automated replenishment predictions for 20,000 SKUs with significantly improved accuracy.

JD Retail Group's supply chain team will host an online sharing session on April 19th to provide a detailed explanation of TimeHF's technical details, including the construction of high-quality, diverse, large-scale time-series datasets and RLHF solutions for time-series large models. This achievement not only revolutionizes JD's supply chain management but also offers valuable technical references and practical examples for the entire industry.