SpacTor-T5
Pre-trained T5 model using a combination of span corruption (SC) and replacement tag detection (RTD).
CommonProductProgrammingNLPPre-trained model
SpacTor is a new training procedure that includes (1) a mixed objective combining span corruption (SC) and replacement tag detection (RTD), and (2) a two-stage curriculum that optimizes the mixed objective in the initial \tau iterations and then transitions to standard SC loss. Experiments on various NLP tasks, using the encoder-decoder architecture (T5), show that SpacTor-T5 achieves comparable downstream performance to standard SC pre-training while reducing the pre-training iterations by 50% and the total FLOPs by 40%. Additionally, under the same computational budget, we find that SpacTor can significantly improve downstream benchmark performance.
SpacTor-T5 Visit Over Time
Monthly Visits
19075321
Bounce Rate
45.07%
Page per Visit
5.5
Visit Duration
00:05:32