Abstract

Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019), which are pretrained on variants of language modeling. We conduct the first large-scale systematic study of candidate pretraining tasks, comparing 19 different tasks both as alternatives and complements to language modeling. Our primary results support the use of language modeling, especially when combined with pretraining on additional labeled-data tasks. However, our results are mixed across pretraining tasks and show some concerning trends: in ELMo's pretrain-then-freeze paradigm, random baselines are worryingly strong and results vary strikingly across target tasks. In addition, fine-tuning BERT on an intermediate task often negatively impacts downstream transfer.