SpacTor-T5
Pre-trained T5 model using a combination of span corruption (SC) and replacement tag detection (RTD).
Common, Product, Programming, NLP, Pre-trained model
SpacTor is a training procedure with two components: (1) a mixed objective that combines span corruption (SC) and replacement tag detection (RTD), and (2) a two-stage curriculum that optimizes the mixed objective for the first τ iterations and then switches to the standard SC loss. In experiments on a range of NLP tasks with the encoder-decoder T5 architecture, SpacTor-T5 achieves downstream performance comparable to standard SC pre-training while reducing pre-training iterations by 50% and total FLOPs by 40%. Under the same computational budget, SpacTor significantly improves downstream benchmark performance.
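The two-stage curriculum can be sketched as a loss schedule. This is a minimal illustration, not the authors' implementation: the function name `spactor_loss`, the weighted-sum form of the mixed objective, and the `rtd_weight` parameter are assumptions made here for clarity. The description above only states that SC and RTD are combined for the first τ iterations and that training then continues with SC alone.

```python
def spactor_loss(step: int, sc_loss: float, rtd_loss: float,
                 tau: int, rtd_weight: float = 1.0) -> float:
    """Return the training loss for one step under a two-stage curriculum.

    step       -- current training iteration, counted from 0
    sc_loss    -- span corruption loss for this batch
    rtd_loss   -- replacement tag detection loss for this batch
    tau        -- number of initial iterations that use the mixed objective
    rtd_weight -- hypothetical weight on the RTD term (an assumption,
                  not taken from the description above)
    """
    if step < tau:
        # Stage 1: mixed objective, assumed here to be a weighted sum.
        return sc_loss + rtd_weight * rtd_loss
    # Stage 2: standard span corruption loss only.
    return sc_loss
```

For example, with `tau=10` the call `spactor_loss(3, 2.0, 0.5, tau=10)` includes the RTD term, while `spactor_loss(10, 2.0, 0.5, tau=10)` returns the SC loss unchanged.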