All the BERT Model Compression Methods You Need to Know, in One Place (Part 2)


2. It would be great if we could report a single number that captures what we actually care about, such as an F1 score.
3. Some of these percentages are measured against BERT-Large rather than BERT-Base, so they are for reference only.
4. How the different compression methods interact with one another is still an open research question; a rough sketch of stacking two of them appears right after this list.
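As a purely illustrative example (not from the original article), the sketch below stacks two of the method families covered in this series, magnitude pruning and dynamic int8 quantization, using PyTorch and HuggingFace `transformers`. The 30% sparsity level and the choice of pruning only the Linear layers are arbitrary placeholders, not recommendations.

```python
# Minimal sketch: prune BERT's Linear layers, then quantize them to int8.
# Assumes PyTorch and HuggingFace `transformers` are installed.
import torch
import torch.nn.utils.prune as prune
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# Step 1: remove 30% of the weights (lowest L1 magnitude) in every Linear layer.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Step 2: dynamically quantize the (now sparse) Linear layers to int8.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Sanity check: fraction of exactly-zero weights in the pruned float model.
zeros = sum((m.weight == 0).sum().item()
            for m in model.modules() if isinstance(m, torch.nn.Linear))
total = sum(m.weight.numel()
            for m in model.modules() if isinstance(m, torch.nn.Linear))
print(f"Sparsity of pruned float model: {zeros / total:.2%}")
```

Whether accuracy losses from the two steps simply add up, cancel out, or compound is exactly the kind of interaction question that remains open.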
Related papers:
[1] Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning
[2] Are Sixteen Heads Really Better than One?
[3] Pruning a BERT-based Question Answering Model
[4] Reducing Transformer Depth on Demand with Structured Dropout
[5] Reweighted Proximal Pruning for Large-Scale Language Representation
[6] Structured Pruning of Large Language Models
[7] ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
[8] Extreme Language Model Compression with Optimal Subwords and Shared Projections
[9] DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
[10] Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
[11] Distilling Transformers into Simple Neural Networks with Unlabeled Transfer Data
[12] Attentive Student Meets Multi-Task Teacher: Improved Knowledge Distillation for Pretrained Models
[13] Patient Knowledge Distillation for BERT Model Compression
[14] TinyBERT: Distilling BERT for Natural Language Understanding
[15] MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer
[16] Q8BERT: Quantized 8Bit BERT
[17] Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Via: http://mitchgordon.me/machine/learning/2019/11/18/all-the-ways-to-compress-BERT.html