[Question] requests.get(...).text returns incomplete data

Original poster: Federer5566 (费德勒5566)   2022-08-02 13:00:45
I mainly want to scrape the daily data from this site:
https://branch.taipower.com.tw/Content/NoticeBlackout/bulletin.aspx?&SiteID=564732650546663520&MmmID=616371300020211533
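I sent every header the browser sends,
but the response's .text
still stops at the <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE"...... tag.
I don't know what is missing that keeps the rest of the HTML from coming through.
Any advice from the experts on this board would be appreciated.
The code is as follows: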
import time

import requests


def GET_TAIPOWER_INFO(url):
    # Headers copied from the browser, one per line so every key and value
    # is visibly a complete, quoted string.
    hs = {
        'Cookie': ('_ga=GA1.3.1703053287.1657282388; '
                   '_ga_M55J0R8SEB=GS1.1.1658290723.7.1.1658290760.0; '
                   'ASP.NET_SessionId=covedx45ofvik1vsgjnpryi5; WebLang='),
        'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/103.0.0.0 Safari/537.36'),
        'Accept': ('text/html,application/xhtml+xml,application/xml;q=0.9,'
                   'image/avif,image/webp,image/apng,*/*;q=0.8,'
                   'application/signed-exchange;v=b3;q=0.9'),
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Host': 'branch.taipower.com.tw',
        'sec-ch-ua': ('".Not/A)Brand";v="99", "Google Chrome";v="103", '
                      '"Chromium";v="103"'),
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': 'Windows',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'none',
        'Sec-Fetch-User': '?1',
        'Upgrade-Insecure-Requests': '1',
    }
    Params = {'SiteID': '564732650546663520', 'MmmID': '616371300020211533'}
    # Try up to 10 times, sending a fresh request on each attempt.
    for i in range(10):
        resp = requests.get(url, headers=hs, params=Params)
        if resp.status_code == 200:
            print('TaiPower page loaded.')
            break
        time.sleep(1)
        print(i)
    print(resp.text)
    return 'done'


a = GET_TAIPOWER_INFO('https://branch.taipower.com.tw/Content/NoticeBlackout/bulletin.aspx')
print(a)
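Before changing any headers, it may help to confirm whether the body that came back is really cut off. The helper below is only a sketch: `describe_html` is a hypothetical name (not part of requests), and it assumes that a fully delivered page ends with a closing `</html>` tag. It takes the string from the response's `.text`.

```python
def describe_html(text, tail_chars=80):
    """Summarize an HTML string: its length, whether it looks complete, and its tail."""
    stripped = text.rstrip()
    return {
        'length': len(text),
        # Assumption: a complete page ends with a closing </html> tag.
        'complete': stripped.lower().endswith('</html>'),
        # True if the ASP.NET hidden field mentioned in the post is present.
        'has_viewstate': '__VIEWSTATE' in text,
        # The last few characters show where the body actually stops.
        'tail': stripped[-tail_chars:],
    }
```

Printing `describe_html(resp.text)` for a response object named `resp` would show whether the text ends at the `__VIEWSTATE` input or continues to the end of the page.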
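Author: mikemike1021 (mike)   2022-08-03 00:13:00
I tidied the code up a little and found that 'Accept':mage was missing a ' in front of the value. After changing it to 'Accept':'mage ... the data came through.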
Author: hanfadacai (没有绰啦)   2022-08-03 19:41:00
mike's fix is correct.
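A quoting slip or a misspelled key is easy to miss in a long, hand-copied headers dict. The sketch below is one possible sanity check to run before sending the request. `check_headers` and `KNOWN_HEADERS` are hypothetical names introduced here, and the list of header names is an assumption for illustration, not an authoritative set. It uses only the standard library (`difflib`).

```python
import difflib

# Header names a desktop Chrome page load commonly sends.
# Assumption: this list is illustrative, not exhaustive.
KNOWN_HEADERS = [
    'Accept', 'Accept-Encoding', 'Accept-Language', 'Cache-Control',
    'Connection', 'Cookie', 'Host', 'User-Agent', 'Upgrade-Insecure-Requests',
    'Sec-Fetch-Dest', 'Sec-Fetch-Mode', 'Sec-Fetch-Site', 'Sec-Fetch-User',
    'sec-ch-ua', 'sec-ch-ua-mobile', 'sec-ch-ua-platform',
]


def check_headers(headers):
    """Return a list of human-readable problems found in a headers dict."""
    problems = []
    known_lower = {name.lower(): name for name in KNOWN_HEADERS}
    for key, value in headers.items():
        # A value that is not a plain one-line string usually means a quoting mistake.
        if not isinstance(value, str):
            problems.append(f'{key!r}: value is not a string')
        elif '\n' in value or '\r' in value:
            problems.append(f'{key!r}: value contains a line break')
        # Unknown names that closely resemble a known one are likely typos.
        if key.lower() not in known_lower:
            close = difflib.get_close_matches(
                key.lower(), list(known_lower), n=1, cutoff=0.8)
            if close:
                problems.append(
                    f'{key!r}: did you mean {known_lower[close[0]]!r}?')
    return problems
```

Calling `check_headers(hs)` on the dict from the post and printing the result before `requests.get` would list any suspicious keys or values. An empty list only means nothing matched these particular checks.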
