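[Question type]:
Opinion poll (I have a great idea for R and would like to hear everyone's opinion)
[Software familiarity]:
Beginner (I have written programs in other languages, but I am unfamiliar with the syntax)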
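[Problem description]:
I have recently been trying to scrape structured data from YouTube, such as view counts and video lengths, for analysis.
Unfortunately, rvest only retrieves the first 30 entries, and my goal is to get the data for all of the videos.
I also tried RCurl, but encoding problems have been giving me a lot of trouble. Any pointers would be much appreciated.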
[Code example]:
library(rvest)

# Read PewDiePie's videos page
pew.ytb <- read_html('https://www.youtube.com/user/PewDiePie/videos')
# Select the video tiles that contain the view count and publish time
ytb.nodes <- html_nodes(pew.ytb, "div.yt-lockup.clearfix.yt-lockup-video.yt-lockup-grid")
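
To make the goal concrete, here is a minimal sketch of the extraction step that would follow the node selection above. It runs on a small inline HTML string rather than the live page. The inner class names (a.video-title, span.view-count, span.publish-time) are assumptions invented for this sketch, not YouTube's actual markup, and would need to be replaced after inspecting the real page. It does not address the 30-entry limit or the RCurl encoding problem.

```r
library(rvest)

# Hypothetical, simplified stand-in for two video tiles.
# Only the outer div classes come from the code above; the inner
# class names are made up for illustration.
sample_html <- '<html><body>
  <div class="yt-lockup clearfix yt-lockup-video yt-lockup-grid">
    <a class="video-title">First video</a>
    <span class="view-count">1,234 views</span>
    <span class="publish-time">2 days ago</span>
  </div>
  <div class="yt-lockup clearfix yt-lockup-video yt-lockup-grid">
    <a class="video-title">Second video</a>
    <span class="view-count">56,789 views</span>
    <span class="publish-time">1 week ago</span>
  </div>
</body></html>'

page  <- read_html(sample_html)
tiles <- html_nodes(page, "div.yt-lockup.clearfix.yt-lockup-video.yt-lockup-grid")

# html_node() (singular) returns exactly one result per tile, giving NA
# when a field is missing, so the columns stay aligned.
views_raw <- html_text(html_node(tiles, "span.view-count"))

videos <- data.frame(
  title  = html_text(html_node(tiles, "a.video-title")),
  views  = as.numeric(gsub("[^0-9]", "", views_raw)),  # keep digits only
  posted = html_text(html_node(tiles, "span.publish-time")),
  stringsAsFactors = FALSE
)

print(videos)
```

Using html_node() per tile instead of html_nodes() over the whole page is a deliberate choice here: it keeps one row per video even if some tile lacks a field.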
[Environment]:
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=Chinese (Traditional)_Taiwan.950
[2] LC_CTYPE=Chinese (Traditional)_Taiwan.950
[3] LC_MONETARY=Chinese (Traditional)_Taiwan.950
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Taiwan.950
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_3.98-1.5 rvest_0.3.2 xml2_1.0.0 RCurl_1.95-4.8
[5] bitops_1.0-6
loaded via a namespace (and not attached):
[1] httr_1.2.1 selectr_0.3-0 magrittr_1.5 R6_2.2.0 tools_3.3.2
[6] curl_2.2 Rcpp_0.12.7 stringi_1.1.2 stringr_1.1.0
[Keywords]: web scraping, YouTube