Synced (机器之心) | Starting from word2vec: the vast GPT family tree (10)
[4] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin (2017). Attention Is All You Need. CoRR, abs/1706.03762.
[5] Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc V. Le, and Ruslan Salakhutdinov (2019). Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. CoRR, abs/1901.02860.
[6] P. J. Liu, M. Saleh, E. Pot, B. Goodrich, R. Sepassi, L. Kaiser, and N. Shazeer (2018). Generating Wikipedia by Summarizing Long Sequences. ICLR 2018.
[7] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever (2018). Improving Language Understanding by Generative Pre-Training.
[8] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever (2019). Language Models Are Unsupervised Multitask Learners. OpenAI Blog, 1(8), 9.
[9] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei (2020). Language Models Are Few-Shot Learners.
[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR, abs/1810.04805.
[11] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le (2019). XLNet: Generalized Autoregressive Pretraining for Language Understanding. CoRR, abs/1906.08237.
[12] The attention mechanism and self-attention (Transformer) (in Chinese). Accessed at: https://blog.csdn.net/Enjoy_endless/article/details/88679989
[13] The attention mechanism explained (Part 1): attention in Seq2Seq (in Chinese). Accessed at: https://zhuanlan.zhihu.com/p/47063917
[14] Understanding attention in one article (core principles + 3 major advantages + 5 major types) (in Chinese). Accessed at: https://medium.com/@pkqiang49/%E4%B8%80%E6%96%87%E7%9C%8B%E6%87%82-attention-%E6%9C%AC%E8%B4%A8%E5%8E%9F%E7%90%86-3%E5%A4%A7%E4%BC%98%E7%82%B9-5%E5%A4%A7%E7%B1%BB%E5%9E%8B-e4fbe4b6d030
[15] The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning). Accessed at: http://jalammar.github.io/illustrated-bert/
[16] Zhilin Yang et al. (2019). XLNet: Generalized Autoregressive Pretraining for Language Understanding. Advances in Neural Information Processing Systems.
[17] Zihang Dai et al. (2019). Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. arXiv preprint arXiv:1901.02860.
[18] NLP: GPT vs. GPT-2 (in Chinese). Accessed at: https://zhuanlan.zhihu.com/p/96791725
[19] Deep learning frontier techniques: GPT 1 & 2 (in Chinese). Accessed at: http://www.bdpt.net/cn/2019/10/08/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%EF%BC%9A%E5%89%8D%E6%B2%BF%E6%8A%80%E6%9C%AF-gpt-1-2/
