Re: [Question] Web scraper POST problem

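Posted by: celestialgod (天)   2021-04-07 13:44:04
※ Quoting ppp1987 (ppp):
: [Question type]:
: Programming help (I want to do something in R but don't know how to write it in R)
: [Software familiarity]:
: Beginner (I have written other programs but am not familiar with the syntax)
: [Problem description]:
: I want to scrape data from a website. It works in Python, but the R version fails.
: I could not find a solution on Google.
: Hoping the experts on this board can help.
: Thanks.
: [Code example]:
: <python> runs without problems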
: import requests
: import pandas as pd
: import json
: url = "https://securev.jihsun.com.tw/JssFHCTradeNet/JSStockCR/StockRatingCR_P.aspx/GetData"
: headers = {'Content-Type': 'application/json; charset=UTF-8'}
: data = {'stockNo': '2330'}
: response = requests.post(url = url, data=json.dumps(data), headers=headers)
: <R>
: url = "https://securev.jihsun.com.tw/JssFHCTradeNet/JSStockCR/StockRatingCR_P.aspx/GetData"
: headers = c('Content-Type' = 'application/json; charset=UTF-8')
: data = '{"stockNo": "2330"}'
: get_data <- httr::POST(url = url,
: httr::add_headers(.headers=headers),
: body = data)
: # This throws the error below
: # Error in curl::curl_fetch_memory(url, handle = handle) :
: # Maximum (10) redirects followed
[Environment]
R version 4.0.4
curl 4.3
httr 1.4.2
MacBook M1
[Keywords]:
Just turn on verbose first:
library(httr)

get_data <- POST(
  url = url,
  content_type("application/json"),
  body = data,
  verbose()  # print the request and response headers
)
You will see the following output:
-> POST /JssFHCTradeNet/JSStockCR/StockRatingCR_P.aspx/GetData HTTP/1.1
-> Host: securev.jihsun.com.tw
-> User-Agent: libcurl/7.59.0 r-curl/3.3 httr/1.4.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Content-Length: 16
->
>> {"stockNo":2330}
<- HTTP/1.1 302 Found
<- Connection: close
<- Content-Length: 0
<- Content-Type: text/html; charset=utf-8
<- Location: http://jsmarket.jihsun.com.tw/Marketnet/Error/Error.aspx?sys=08&support_id=41102190011827406
<-
-> GET /Marketnet/Error/Error.aspx?sys=08&support_id=41102190011827406 HTTP/1.1
-> Host: jsmarket.jihsun.com.tw
-> User-Agent: libcurl/7.59.0 r-curl/3.3 httr/1.4.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
->
<- HTTP/1.1 302 Found
<- Connection: close
<- Content-Length: 0
<- Content-Type: text/html; charset=utf-8
<- Location: http://jsmarket.jihsun.com.tw/Marketnet/Error/Error.aspx?sys=09&support_id=41102190016500008
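If you open the Error URL above, it is just an error page: the server keeps redirecting to it, which is why the redirect limit is reached.
So we can rule out redirection itself as the real problem.
As a side note, if the redirect limit really were the issue, the fix would be:
get_data <- POST(
  url = url,
  content_type("application/json"),
  body = data,
  config(maxredirs = -1)  # -1 removes the limit on followed redirects
)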
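To see where the server is redirecting without following the whole chain, one option is to turn off redirect following and look at the first response only. This is a minimal sketch, assuming `url` and `data` are defined as in the question; `followlocation` is the libcurl option passed through `config()`, and `inspect_first_response` is a hypothetical helper name, not part of httr.

```r
library(httr)

# Assumed to match the question; the URL is written on a single line here.
url <- "https://securev.jihsun.com.tw/JssFHCTradeNet/JSStockCR/StockRatingCR_P.aspx/GetData"
data <- '{"stockNo": "2330"}'

# Tell curl not to follow redirects, so the first response comes back as-is.
no_follow <- config(followlocation = 0L)

# Hypothetical helper: returns the status code and Location header
# of the first response instead of chasing the redirect chain.
inspect_first_response <- function(url, body) {
  resp <- POST(url, content_type("application/json"), body = body, no_follow)
  list(status = status_code(resp), location = headers(resp)$location)
}

# Usage (needs network access):
# inspect_first_response(url, data)
```

If the status is a 3xx and the location points at an error page, as in the verbose log above, raising the redirect limit will not help.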
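So the simple guess is that the server does not accept the default user agent:
get_data <- POST(
  url = url,
  content_type("application/json"),
  body = data,
  user_agent("Chrome/89.0.4389.114"),  # pretend to be a browser
  verbose()
)
With that, the request goes through: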
-> POST /JssFHCTradeNet/JSStockCR/StockRatingCR_P.aspx/GetData HTTP/1.1
-> Host: securev.jihsun.com.tw
-> User-Agent: Chrome/89.0.4389.114
-> Accept-Encoding: gzip, deflate
-> Cookie: ASP.NET_SessionId=1wl2d0fpigsiiwxvoudb0jlw; TS014ea3cc=01b12d6ecc001a4641027d81bf890dc86511b24c71a79e3cb594413c98562558ff8e91057a93a67298bc1020dfa0f573cd9c0bd7cd
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/json
-> Content-Length: 16
->
>> {"stockNo":2330}
<- HTTP/1.1 200 OK
<- Cache-Control: private, max-age=0
<- Content-Type: application/json; charset=utf-8
<- X-AspNet-Version: 4.0.30319
<- X-Powered-By: ASP.NET
<- Date: Wed, 07 Apr 2021 05:43:44 GMT
<- Content-Length: 13571
<-
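Once the request returns 200, the JSON body still has to be read out of the response object. The following is a minimal sketch, assuming the `jsonlite` package is installed. `fetch_rating` is a hypothetical helper name, and since the structure of the returned JSON is not shown in this thread, the parsed result is returned as-is.

```r
library(httr)

url <- "https://securev.jihsun.com.tw/JssFHCTradeNet/JSStockCR/StockRatingCR_P.aspx/GetData"

# Build the JSON body from a list rather than typing it by hand.
# auto_unbox = TRUE makes length-1 vectors scalars instead of arrays.
body <- as.character(jsonlite::toJSON(list(stockNo = "2330"), auto_unbox = TRUE))

# Hypothetical helper: POST with a browser-like user agent, fail loudly on
# HTTP errors, then parse the JSON text of the response.
fetch_rating <- function(url, body, agent = "Chrome/89.0.4389.114") {
  resp <- POST(
    url,
    content_type("application/json"),
    body = body,
    user_agent(agent)
  )
  stop_for_status(resp)
  txt <- content(resp, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(txt)
}

# Usage (needs network access):
# result <- fetch_rating(url, body)
# str(result)
```

Calling `str()` on the result is a quick way to find out how the payload is nested before extracting fields from it.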
That's all.
Author: ppp1987 (ppp)   2021-04-07 16:47:00
It worked. Thank you very much!