Video Click-Through Rate Prediction Method Based on Multi-feature Fusion
Li Yiye (李一野); Deng Haojiang (邓浩江)
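Abstract:
Ensemble learning that combines several models can outperform a single-model prediction algorithm. This paper proposes a video click-through rate (CTR) prediction method based on multi-feature fusion. Features reduced in dimension by hashing are concatenated with combination features derived from GBDT to form the input features, and stochastic gradient descent is used to iteratively adjust the weights of a linear combination of the outputs of logistic regression (LR), factorization machines (FM), and field-aware factorization machines (FFM). Experimental results show that the method predicts better than the single-model algorithms, and also better than the bagging-based random forest algorithm and other ensemble algorithms based on averaging, so it can improve the accuracy of video CTR prediction.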
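The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the squared-error loss, the step size, the function names (`blend`, `fit_blend_weights`, `mean_squared_error`), and the synthetic scores standing in for LR, FM, and FFM outputs are all assumptions made for illustration.

```python
import random


def blend(weights, row):
    """Linear weighted combination of one sample's base-model outputs."""
    return sum(w * p for w, p in zip(weights, row))


def fit_blend_weights(preds, labels, step_size=0.05, epochs=200, seed=0):
    """Adjust blending weights iteratively with stochastic gradient descent.

    preds  : list of rows, one predicted click probability per base model
    labels : list of 0/1 click labels
    Assumption: squared-error loss; the abstract does not name the loss.
    """
    rng = random.Random(seed)
    n_models = len(preds[0])
    weights = [1.0 / n_models] * n_models  # start from plain averaging
    order = list(range(len(preds)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            err = blend(weights, preds[i]) - labels[i]
            # Gradient of 0.5 * err**2 with respect to weight k is err * preds[i][k].
            weights = [w - step_size * err * p for w, p in zip(weights, preds[i])]
    return weights


def mean_squared_error(weights, preds, labels):
    total = sum((blend(weights, row) - y) ** 2 for row, y in zip(preds, labels))
    return total / len(labels)


# Synthetic stand-ins for the three base-model outputs (hypothetical data):
# each column is the label plus bounded noise, clipped to [0, 1].
data_rng = random.Random(1)


def noisy_score(label, noise):
    return min(1.0, max(0.0, label + data_rng.uniform(-noise, noise)))


labels = [data_rng.randint(0, 1) for _ in range(200)]
preds = [[noisy_score(y, 0.8), noisy_score(y, 0.5), noisy_score(y, 0.1)]
         for y in labels]

uniform_weights = [1.0 / 3] * 3  # the averaging baseline
learned_weights = fit_blend_weights(preds, labels)
print("averaging MSE:", mean_squared_error(uniform_weights, preds, labels))
print("learned   MSE:", mean_squared_error(learned_weights, preds, labels))
```

Starting from equal weights makes the comparison with an averaging ensemble direct: the learned weights can move away from 1/3 each only where doing so lowers the training loss. In practice the weights would be fitted on held-out predictions rather than on the data the base models were trained on.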
Keywords: click-through rate prediction; multi-feature fusion; ensemble learning
Foundation: Priority research program project (先导专项课题): SEANET Technology Standardization Research and System Development (No. XDC02010701)
Authors: Li Yiye (李一野); Deng Haojiang (邓浩江)