[Question] Building a crawler in Shiny

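Original poster: x9060000456 (Hello)   2017-09-05 23:24:41
- Question: Hi everyone, I'm trying to run a web crawler inside Shiny,
but I keep getting 'Warning: Error in curl::curl_fetch_memory:
Bad URL, colon is first character'. Any help is much appreciated!
[Question type]:
Connecting Shiny to a crawler
[Problem description]:
I want to crawl a page from within Shiny and then run a word cloud analysis on the text.
[Code example]: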
# For Submit, use https://forum.gamer.com.tw/C.php?bsn=23805&snA=564246&tnum=13 as the example URL
suppressPackageStartupMessages({
  # library(tcltk)
  library(httr)
  library(data.table)
  library(stringr)
  library(rvest)
  library(jiebaR)
  # library(tidyverse)
  library(text2vec)
  # library(iterators)
  library(pbapply)
  # library(doParallel)
  library(class)
  library(plyr)
  library(DT)
  library(wordcloud)
  library(RColorBrewer)
  library(reshape2)
  library(tmcn)
  library(parallel)
  library(shiny)
  library(curl)
})
ui <- shinyUI(
  fluidPage(
    # Application title
    titlePanel("Word Cloud"),
    tags$style(type = "text/css",
      ".shiny-output-error { visibility: hidden; }",
      ".shiny-output-error:before { visibility: hidden; }"
    ),
    sidebarLayout(
      # Sidebar with a slider and selection inputs
      sidebarPanel(
        #######
        textInput("scholarid", 'google scholar profile link', value = ""),
        actionButton("submit", "Submit"),
        hr(),
        sliderInput("freq",
                    "Minimum Frequency:",
                    min = 1, max = 50, value = 15),
        sliderInput("max",
                    "Maximum Number of Words:",
                    min = 1, max = 300, value = 100)
      ),
      # Show Word Cloud
      mainPanel(
        plotOutput("plot")
      )
    )
  )
)
server <- shinyServer(function(input, output, session) {
  # Define a reactive expression for the document term matrix
  terms <- reactive({
    # Change when the "update" button is pressed...
    input$update
    # ...but not for anything else
    isolate({
      withProgress({
        setProgress(message = "Processing corpus...")
        getTermMatrix(input$submit)
      })
    })
  })
  # Make the wordcloud drawing predictable during a session
  wordcloud_rep <- repeatable(wordcloud)
  output$plot <- renderPlot({
    v <- terms()
    wordcloud_rep(names(v), v, scale = c(8, 1),
                  min.freq = input$freq, max.words = input$max,
                  colors = brewer.pal(8, "Dark2"))
  })
})
# Fetch the page at URL f, segment its text with jiebaR, and return the term
# counts as a named vector (names are the terms), sorted by descending count
getTermMatrix <- function(f) {
  cutter <- worker()
  core <- detectCores() - 1
  cl <- makeCluster(core)
  clusterEvalQ(cl, library(magrittr))
  clusterEvalQ(cl, library(httr))
  clusterEvalQ(cl, library(rvest))
  f <- as.character(f)
  # Download the page and keep the text of each post body
  Contents <- f %>% GET(encoding = 'UTF-8') %>% content %>%
    html_nodes(css = '.c-article__content div') %>% html_text()
  DF_Data_list <- list()
  for (i in seq_along(Contents)) {
    DF_Data_list[i] <- as.character(Contents[i])
  }
  # Split each post into words
  text <- sapply(DF_Data_list, function(x) {
    segment(x, cutter)
  })
  # Build a unigram vocabulary and sort it by term count
  Data_list_split.token <- itoken(text)
  Data_list_split.vocab <- create_vocabulary(Data_list_split.token, ngram = c(1L, 1L))
  Data_list_split.vocab <- Data_list_split.vocab %>% data.frame %>%
    .[order(.$term_count, decreasing = TRUE), -3]
  v <- Data_list_split.vocab[, 2] %>% as.vector()
  names(v) <- Data_list_split.vocab$term
  v
}
shinyApp(ui, server)
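Author: andrew43 (Hates people who delete their post after getting helpful replies)   2017-09-06 00:52:00
The error says the URL is invalid. Have you checked your code against the error message?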
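One quick way to act on this is to check the value right before it is handed to GET(). Below is a minimal base R sketch; is_http_url is a hypothetical helper name made up for illustration, not part of httr or curl, and the pattern only checks for an http(s) scheme rather than fully validating the URL.

```r
# Hypothetical helper (not from httr/curl): TRUE only for a single,
# non-missing string that starts with http:// or https:// and has no spaces.
is_http_url <- function(x) {
  is.character(x) && length(x) == 1L && !is.na(x) &&
    grepl("^https?://[^[:space:]]+$", x)
}

# The example URL from the post passes the check
is_http_url("https://forum.gamer.com.tw/C.php?bsn=23805&snA=564246&tnum=13")  # TRUE

# A bare number converted to text, or an empty box, does not
is_http_url(as.character(0L))  # FALSE
is_http_url("")                # FALSE
```

Calling something like this on the argument at the top of getTermMatrix(), and printing the argument when the check fails, would show exactly what Shiny is passing in.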
Original poster: x9060000456 (Hello)   2017-09-06 11:00:00
Reply to A: Yes, I have! The crawl works when I run it in plain R, but it breaks once I put it in Shiny.
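Author: Wush978 (Refuses to watch low-quality media)   2017-09-06 16:39:00
The error message doesn't lie: the URL is invalid. Are you running this on Shiny Server or as a regular user?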
Original poster: x9060000456 (Hello)   2017-09-06 16:57:00
Hi W, I'm running Shiny as a regular user. The error already appears before I have entered a URL and clicked Submit.
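That symptom fits the posted server code. terms() runs as soon as the plot renders, and it passes input$submit (the button's click counter, 0 at startup) to getTermMatrix() instead of the text typed into the "scholarid" box, so GET() would receive "0" rather than a URL. In addition, input$update does not match any input defined in the UI. Below is a minimal sketch of one possible rewiring, assuming the URL is meant to come from input$scholarid and that nothing should be fetched until Submit is clicked. The output id url_seen and the reactive name target_url are made up for illustration, and the sketch only echoes the URL instead of crawling it.

```r
library(shiny)

ui <- fluidPage(
  textInput("scholarid", "Page URL", value = ""),
  actionButton("submit", "Submit"),
  verbatimTextOutput("url_seen")
)

server <- function(input, output, session) {
  # eventReactive() stays idle until the button is clicked, so nothing
  # runs at startup while the click counter is still 0.
  target_url <- eventReactive(input$submit, {
    # Read the URL from the text box, not from the button
    url <- trimws(input$scholarid)
    # validate()/need() stop here with a message when the check fails
    validate(need(grepl("^https?://", url), "Please enter a full http(s) URL."))
    url
  })

  # In the original app this is where getTermMatrix(target_url()) would go
  output$url_seen <- renderText(target_url())
}

# shinyApp(ui, server)  # uncomment to run interactively
```

With this wiring the fetch function would only ever see a string that starts with http:// or https://, which is the precondition the 'Bad URL' message is complaining about.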
