A 10,000-Word Deep Dive into Embedding Techniques (Part 10)
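5. Summary

This article has surveyed today's widely used embedding techniques and the methods available for different kinds of data: traditional matrix-factorization methods (e.g., SVD), embedding methods for text (e.g., Word2vec and FastText), and embedding methods for graph data (e.g., DeepWalk and GraphSAGE). In a recommender system, these methods can be chosen according to the data type to produce vector representations of that data. For example, one can build item sequences from user behavior and embed the items with a text-based method, treating each sequence as a sentence. Alternatively, one can build a user-item relationship graph and embed both users and items with a graph-based method. In practice, different embedding methods can produce markedly different results, so the algorithm should be chosen to fit the specific business requirements. To mine homophily among users, for instance, Node2vec is worth trying. If item side information needs to be incorporated, GraphSAGE can be used to embed the nodes of the graph. As with the "alchemy" of deep learning, mastering the various embedding techniques takes continual trial and error in concrete application scenarios. Finally, with our company anniversary coming up, may all of us "alchemists" enjoy the grind!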
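The two recipes in the summary can be sketched as data-preparation steps. The standard-library sketch below makes several assumptions. Behavior logs are taken to be `(user, item, timestamp)` tuples. Each user's items, sorted by time, become a "sentence" for a Word2vec-style skip-gram model. The same logs also become a user-item bipartite graph, on which node2vec-style biased random walks are generated. With `p = q = 1`, those walks reduce to DeepWalk's uniform walks. All function names, the `u:`/`i:` node prefixes, and the toy data are hypothetical. The embedding training itself is left out; the sequences and walks would be fed to an off-the-shelf Word2vec implementation such as gensim's `Word2Vec`.

```python
import random
from collections import defaultdict

def build_item_sequences(events):
    """Group (user, item, timestamp) events into per-user item lists ordered by time."""
    by_user = defaultdict(list)
    for user, item, ts in events:
        by_user[user].append((ts, item))
    return {u: [item for _, item in sorted(evts)] for u, evts in by_user.items()}

def skipgram_pairs(sequence, window=2):
    """Word2vec-style (center, context) pairs within a fixed window around each item."""
    pairs = []
    for i, center in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs

def build_bipartite_graph(events):
    """Undirected user-item adjacency lists; hypothetical prefixes keep node types apart."""
    adj = defaultdict(set)
    for user, item, _ in events:
        u, v = f"u:{user}", f"i:{item}"
        adj[u].add(v)
        adj[v].add(u)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}

def node2vec_walks(adj, walks_per_node=2, walk_length=5, p=1.0, q=1.0, seed=0):
    """Second-order biased walks in the style of node2vec; p = q = 1 gives uniform walks."""
    rng = random.Random(seed)
    neighbor_sets = {n: set(nbrs) for n, nbrs in adj.items()}
    walks = []
    nodes = sorted(adj)
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break
                if len(walk) == 1:
                    # First step has no previous node, so pick a neighbor uniformly.
                    walk.append(rng.choice(nbrs))
                    continue
                prev = walk[-2]
                weights = []
                for x in nbrs:
                    if x == prev:
                        weights.append(1.0 / p)  # return to the previous node
                    elif x in neighbor_sets[prev]:
                        weights.append(1.0)      # stay within distance 1 of prev (BFS-like)
                    else:
                        weights.append(1.0 / q)  # move outward, away from prev (DFS-like)
                walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
            walks.append(walk)
    return walks

# Toy behavior log (hypothetical data).
events = [
    ("alice", "A", 1), ("alice", "B", 2), ("alice", "C", 3),
    ("bob", "B", 1), ("bob", "D", 2),
]
seqs = build_item_sequences(events)
print(seqs)  # {'alice': ['A', 'B', 'C'], 'bob': ['B', 'D']}
print(skipgram_pairs(seqs["alice"], window=1))  # [('A', 'B'), ('B', 'A'), ('B', 'C'), ('C', 'B')]
walks = node2vec_walks(build_bipartite_graph(events))
print(len(walks))  # 12: 6 nodes x 2 walks per node
```

In the bipartite graph, no two users and no two items are directly connected. At each step of a walk, therefore, only the return weight `1/p` or the outward weight `1/q` applies. Tuning `q` is the knob node2vec offers for shifting the walk between more local and more exploratory behavior.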
References
Funk, Simon. "Netflix Update: Try This at Home." 2006. http://www.sifter.org/~simon/journal/20061211.html
Koren, Yehuda. "Factorization meets the neighborhood: a multifaceted collaborative filtering model." Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 2008.
Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
Pennington, Jeffrey, et al. "GloVe: Global vectors for word representation." Conference on empirical methods in natural language processing (EMNLP). 2014.
Bojanowski, Piotr, et al. "Enriching word vectors with subword information." Transactions of the Association for Computational Linguistics 5 (2017): 135-146.
Peters, Matthew E., et al. "Deep contextualized word representations." arXiv preprint arXiv:1802.05365 (2018).
Radford, Alec, et al. "Improving language understanding by generative pre-training." (2018).
Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
Perozzi, Bryan, et al. "DeepWalk: Online learning of social representations." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014.
Grover, Aditya, and Jure Leskovec. "node2vec: Scalable feature learning for networks." Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 2016.
Dong, Yuxiao, et al. "metapath2vec: Scalable representation learning for heterogeneous networks." Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 2017.
Taylor, Wilson L. "'Cloze procedure': A new tool for measuring readability." Journalism Bulletin 30.4 (1953): 415-433.
Hammond, David K., Pierre Vandergheynst, and Rémi Gribonval. "Wavelets on graphs via spectral graph theory." Applied and Computational Harmonic Analysis 30.2 (2011): 129-150.
Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).