[Question] xpathSApply problem | dan40418 | PTT 批踢踢实业坊

Original poster: dan40418 (成登) 2014-11-09 20:35:47

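[Software familiarity]:
I have been using R for about a month, and I have written some programs in other languages before.
[Problem description]:
I want to use R to scrape web pages and do some chart analysis on them.
I am trying to use xpathSApply to extract the page content, but it does not return anything.
[Code example]: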
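library(RCurl)  # getURL
library(XML)    # htmlParse, xpathSApply, xmlValue

# Extract the first "www...html" link from a line, download the page,
# and write the text of <div id="main-content"> to a .txt file.
getdoc <- function(line){
  start <- regexpr('www', line)[1]
  end <- regexpr('html', line)[1]
  # && is the scalar logical AND that if() expects
  if(start != -1 && end != -1){
    url <- substr(line, start, end+3)
    # asText = TRUE: the argument is the downloaded HTML, not a file name
    html <- htmlParse(getURL(url), asText = TRUE, encoding = 'UTF-8', useInternalNodes = TRUE)
    doc <- xpathSApply(html, "//div[@id='main-content']", xmlValue)
    # The 4th "/"-separated piece of the scheme-less URL is the file name
    name <- strsplit(url, '/')[[1]][4]
    # Replace only the trailing "html" extension
    write(doc, sub('html$', 'txt', name))
  }
}
sapply(data, getdoc)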
I have searched online and tried many approaches without success, so I would like to ask where the problem lies.
doc always comes back empty.
[Keywords]:
xpathSApply
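
One way to narrow this down is to check what getdoc actually hands to getURL before any download happens. The following is only a sketch in base R, assuming each element of `data` is a line of text containing one link. The helper name `extract_url` and the sample line are made up for illustration and are not from the original post.

```r
# Hypothetical helper: pull the first "www...html" substring out of a line,
# mirroring the regexpr/substr logic in getdoc.
extract_url <- function(line) {
  start <- regexpr("www", line, fixed = TRUE)[1]
  end <- regexpr("html", line, fixed = TRUE)[1]
  # Give up if either marker is missing or they appear in the wrong order
  if (start == -1 || end == -1 || end < start) return(NA_character_)
  substr(line, start, end + 3)
}

# Made-up sample line, not taken from the original data
sample_line <- "see http://www.example.com/bbs/board/M.123.html for details"
url <- extract_url(sample_line)
print(url)

# The 4th "/"-separated piece is the file name only because the
# extracted URL carries no "http://" prefix
name <- strsplit(url, "/", fixed = TRUE)[[1]][4]
out_file <- sub("\\.html$", ".txt", name)
print(out_file)
```

Note that the extracted URL starts at "www", so the scheme is dropped. If the site answers with a redirect (for example to HTTPS), getURL does not follow it unless `followlocation = TRUE` is passed, which may be one reason the downloaded page lacks the expected div. This is a guess, since the contents of `data` are not shown.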

Author: Wush978 (拒看低质媒体) 2014-11-09 20:42:00

Do you have a reproducible example? It would make testing easier for board members who want to help.
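
A reproducible example for this kind of question can be quite small. The sketch below assumes the XML package is installed and uses a made-up HTML string in place of the real download, so it only checks that the XPath expression and xmlValue behave as expected. It does not reproduce the original poster's data or the actual page.

```r
library(XML)

# Made-up page content standing in for the real download
page <- "<html><body><div id='main-content'>hello from main-content</div></body></html>"

# asText = TRUE tells htmlParse the string is HTML, not a file name or URL
html <- htmlParse(page, asText = TRUE, encoding = "UTF-8")

# Same XPath as in the question: the div whose id is "main-content"
doc <- xpathSApply(html, "//div[@id='main-content']", xmlValue)
print(doc)
```

If this works but the real page still yields nothing, the problem is more likely in the download step (the URL or the getURL options) than in the XPath itself. Printing the result of getURL(url) would show what was actually received.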

Author: psinqoo (零度空間) 2014-11-09 22:31:00

RCurl, XML

Author: john5601 (HTC粉) 2014-11-20 02:13:00

Give this a try. I just tested it and was able to fetch the content.
