Answering my own question.
Once I added cookie handling, the download worked.
import http.cookiejar
import urllib.request
cj = http.cookiejar.MozillaCookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
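For anyone hitting the same thing, here is a minimal sketch of how the cookie-aware opener might be used end to end. It is not the exact script I ran. The helper names (`build_cookie_opener`, `encode_form`, `looks_like_html`, `fetch_csv`) are made up for illustration, and the idea that one GET of the query page is enough for the server to set its cookie is an assumption. The only thing I confirmed is that adding cookie handling made the download work.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_cookie_opener():
    """Return (jar, opener); the opener stores and resends cookies via the jar."""
    jar = http.cookiejar.MozillaCookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return jar, opener


def encode_form(fields):
    """Encode a dict as an application/x-www-form-urlencoded POST body."""
    return urllib.parse.urlencode(fields).encode("ascii")


def looks_like_html(body):
    """True if the response bytes start with an <html> tag instead of CSV data."""
    return body.lstrip()[:5].lower() == b"<html"


def fetch_csv(opener, page_url, download_url, fields):
    """GET the query page first (assumed to set the cookie), then POST the form."""
    opener.open(page_url).close()
    # Passing data= makes urllib send a POST request.
    with opener.open(download_url, data=encode_form(fields)) as resp:
        body = resp.read()
    if looks_like_html(body):
        raise ValueError("server returned an HTML page instead of CSV")
    return body


# Hypothetical usage with the URLs from the quoted post (needs network access):
# jar, opener = build_cookie_opener()
# data = fetch_csv(opener,
#                  "http://www.taifex.com.tw/chinese/3/3_1_2.asp",
#                  "http://www.taifex.com.tw/chinese/3/3_1_2dl.asp",
#                  {"DATA_DATE": "2016/11/28", "COMMODITY_ID": "ALL"})
```

The `looks_like_html` check is there so the script fails loudly rather than saving the error page under a .csv name, which is what happened in the quoted post.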
※ Quoting orafrank (法兰克):
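: Last time I used Python to scrape the daily buy/sell figures of the three major institutional investors.
: LINK http://www.twse.com.tw/ch/trading/fund/T86/T86.php
: This time I tried the same approach to scrape
: LINK http://www.taifex.com.tw/chinese/3/3_1_2.asp
: but it failed.
: Could anyone give me a pointer?
: The downloaded CSV contains the following: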
: <HTML>
: <head><link rel="image_src" type="image/jpeg" href="http://www.taifex.com.tw/chinese/images/fb_logo.jpg" /><meta property="og:image" content="http://www.taifex.com.tw/chinese/images/
: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
: <META NAME="GENERATOR" Content="Microsoft Visual Studio 6.0">
: </HEAD>
: <BODY>
:
: <script language="javascript">
: alert('?亦访鞈欧?');
: window.history.go(-1);
: </script>
:
: </BODY>
: </HTML>
: My code:
: import requests
: import time
: import os
: import sys
: #mydate = time.strftime("%Y/%m/%d")
: mydate = "2016/11/28"
: #mydate = "2016/10/03"
: #mydate2 = "105/10/03"
: if len(sys.argv) < 2:  # only the script path was given
:     print('no argument')
: elif sys.argv[1].strip() != '':
:     mydate = sys.argv[1]
: year = str(int(mydate[0:4]) - 1911)  # Gregorian year -> ROC (Minguo) year
: mydate2 = mydate.replace(mydate[0:4], year, 1)  # replace the leading year only
: print(sys.argv)
: print(mydate2)
: setting = os.path.join(os.getcwd(), "path.ini")
: if os.path.exists(setting):
:     with open(setting, 'r') as r:
:         path = r.read().strip()  # drop any trailing newline
:     if not os.path.exists(path):
:         os.makedirs(path)
: else:
:     path = "/home/telepaq/frankh/fonepy/"  # must be an absolute path
: headers = {"User-Agent": "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36",
:            "Referer": "http://www.taifex.com.tw/chinese/3/3_1_2.asp"}
: url1 = "http://www.taifex.com.tw/chinese/3/3_1_2dl.asp"
: payload1 = {"DATA_DATE": mydate,
:             "DATA_DATE1": mydate,
:             "COMMODITY_ID": "ALL",
:             #"sorting": "by_issue"}
:             "his_year": "2015",
:             "datestart": mydate,
:             "dateend": mydate
:             }
: res1 = requests.post(url1, headers=headers, data=payload1, stream=True)
: print(payload1)
: fName1 = path + mydate.replace("/", "") + ".csv"
: print(fName1)
: with open(fName1, 'wb') as f1:
:     for chunk in res1.iter_content(1024):
:         f1.write(chunk)
: with open("/home/telepaq/frankh/fonepy/log_future312.txt", 'w') as f:
:     #f.write(payload1 + "\r\n")
:     f.write(mydate + " mydate\r\n")
:     f.write(mydate2 + " mydate2 \r\n")
:     f.write(fName1 + " fName1 \r\n")
: print('this is a test')
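A side note on the date handling in the quoted script: subtracting 1911 from the Gregorian year gives the ROC (Minguo) year. A small sketch of that step as a standalone function could look like the following. The name `to_roc_date` is hypothetical, and it assumes the input is always in `YYYY/MM/DD` form as in the script.

```python
def to_roc_date(gregorian_date):
    """Convert 'YYYY/MM/DD' to ROC-calendar 'YYY/MM/DD' by subtracting 1911 from the year."""
    # Split off the year only, so the month and day are never touched.
    year, rest = gregorian_date.split("/", 1)
    return str(int(year) - 1911) + "/" + rest
```

Splitting on the first slash avoids relying on `str.replace`, which would also rewrite any later occurrence of the year digits.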