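Original poster:
ntasop (kuli)
2017-11-17 21:52:09
s860134, the URL and code are below. I've noticed that if I haven't accessed the page
for a while, the scrape comes back fine, but after a few more fetches characters start getting dropped.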
https://shopee.tw/viviancloe
import requests
import json

if __name__ == "__main__":
    # Request headers: cookie, referer, user agent and CSRF token.
    # Adjacent string literals are concatenated, so each value is one string.
    headers = {
        'Cookie': (
            'SPC_IA=-1; SPC_EC=-; '
            'SPC_F=b9xBLc7WroUphDkfgTLhUFTbZDQoNbTu; '
            'REC_T_ID=6cf0de6c-a762-11e7-bf41-246e960f6a68; '
            'SPC_T_ID="WAlev0L2X1AMTz1j56adnJD9mpCd0b4dT3kdd1BrRTZD27vuhGveETTogw0AQ1jvsKFZF2chyh4Ut7whluhOn/0MxeAPZwthaoAleA3JmC4="; '
            'SPC_U=-; SPC_T_IV="KnibcL4buEmqFNtMuczz+w=="; '
            '__utma=88845529.924012273.1506942680.1508042090.1508042090.1; '
            '__utmz=88845529.1508042090.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); '
            '_atrk_siteuid=SYQBwFEiz55k_Fw8; csrftoken=pX243QosnAg5tqNlVwAhllB30qL4418F; '
            '__BWfp=c1509458278808x1715baa2b; SPC_SC_TK=; UYOMAPJWEMDGJ=; SPC_SC_UD=; '
            '_ga=GA1.2.924012273.1506942680; _gid=GA1.2.1418713679.1510903500; _gat=1; '
            'SPC_SI=lxbotjnjan1rp46ocb0pkcy8z1qwhc4g'
        ),
        'Referer': 'https://shopee.tw/viviancloe',
        'User-Agent': (
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
            'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'
        ),
        'X-CSRFToken': 'pX243QosnAg5tqNlVwAhllB30qL4418F'
    }
    # JSON request body listing the shop IDs to query.
    jd = json.loads('{"shop_ids":[730057]}')
    response = requests.post('https://shopee.tw/api/v1/shops/', json=jd,
                             headers=headers)
    print(response.text)
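※ Quoting ntasop (kuli):
: I'm using requests.post to scrape the Shopee site. The length of the "place" field
: shown in Chrome differs from what Python fetches: Python cuts it down to a single
: character. The "description" field length differs too. How can I fix this?
: Many thanks. (Everything that gets cut off is Chinese text.)
: Python scrape result (too long, so part is omitted):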
: [......"description": "\ud83d\udc4b
: \ud83c\udf86\u6b61\u8fce\u5149\u81e8\u5154\u5bf6\ud83c\udf86
: \ud83d\udc4b\n\n\ud83d\ude4c\ud83d\ude4c\u6709\u8208\u8da3\u7684\u5546\u54c1\u6b61\u8fce\u5229\u7528\u804a\u804a\u8a62\u554f
: ~~\u8b1d", "place": "\u65b0",..
: Result captured in Chrome:
: [...."description": "\ud83d\udc4b
: \ud83c\udf86\u6b61\u8fce\u5149\u81e8\u5154\u5bf6\ud83c\udf86
: \ud83d\udc4b\n\n\ud83d\ude4c\ud83d\ude4c\u6709\u8208\u8da3\u7684\u5546\u54c1\u6b61\u8fce\u5229\u7528\u804a\u804a\u8a62\u554f
: ~~\u8b1d\u8b1d\u5149\u81e8 \ud83d\udc07", "place": "\u65b0\u5317\u5e02\u4e09\u91cd\u5340",....]