Re: [Question] Flight ticket website scraping problem (caty1010, PTT Bulletin Board System)

Original poster: caty1010 (Lucas) 2018-05-27 21:50:56

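(Original post snipped.)
I just ran an experiment with Python requests
and found that it can fetch the data you want.
The idea:
the page that displays the data
and the AJAX call probably share some cookie-based check,
so I first GET the search page (HTML)
and then POST to the AJAX URL (JSON).
Rough code below; corrections from the experts are welcome.
Example: date 2018-06-28, TPE to OKA (damn, I want to go...)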
import requests

# A Session carries the cookies from the first request over to the second one
session = requests.Session()
index_url = 'https://www.ctrip.com.hk/flights/taipei-to-okinawa/tickets-tpe-oka/?flighttype=s&dcity=tpe&acity=oka&startdate=2018-06-28&class=y&quantity=1&searchboxarg=t'
# Step 1: GET the search page (HTML) so the session picks up its cookies
# (verify=False skips TLS certificate verification)
index_content = session.get(index_url, verify=False)
header = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'origin': 'https://www.ctrip.com.hk',
    'referer': index_url
}
flight_api_url = 'https://www.ctrip.com.hk/flights/Ajax/SearchFlight'
# The form field "context" holds the search conditions as a JSON string
param = {'context':'{"SearchNo":"1","FlightWay":"OW","SubChannel":"0","SearchToken":"1","Class":"Y","SegmentParameterList":[{"DCityCode":"TPE","ACityCode":"OKA","DDateString":"2018-06-28"}],"TravelerParameteList":[{"TravelerType":"ADT","TravelerCount":"1"},{"TravelerType":"CHD","TravelerCount":"0"},{"TravelerType":"INF","TravelerCount":"0"}]}'}
# Step 2: POST to the AJAX URL with the same session; the response is JSON
SearchResult = session.post(flight_api_url, verify=False, headers=header, data=param).json()
# If nothing goes wrong, printing SearchResult gives you the JSON you want
print(SearchResult)
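My environment is Python 2.7.
Please adjust the search conditions above yourself;
you should then be able to start scraping data for other dates or other airports.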
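
A rough sketch of how the search conditions could be parameterized, in case it helps. The helper names build_search_param and date_strings are made up for illustration and are not part of the site or of requests. The payload fields are copied from the context string above; whether the site also expects index_url and the referer to match the new date and airports is untested, so treat that as an open assumption.

```python
import json
from datetime import date, timedelta


def build_search_param(dcity, acity, ddate, adults=1):
    """Build the form data for the SearchFlight POST, mirroring the payload above.

    Hypothetical helper: the field names come from the original context string.
    """
    context = {
        "SearchNo": "1",
        "FlightWay": "OW",  # one-way, as in the original example
        "SubChannel": "0",
        "SearchToken": "1",
        "Class": "Y",
        "SegmentParameterList": [
            {"DCityCode": dcity, "ACityCode": acity, "DDateString": ddate}
        ],
        # Key spelling ("ParameteList") is copied verbatim from the original payload
        "TravelerParameteList": [
            {"TravelerType": "ADT", "TravelerCount": str(adults)},
            {"TravelerType": "CHD", "TravelerCount": "0"},
            {"TravelerType": "INF", "TravelerCount": "0"},
        ],
    }
    return {"context": json.dumps(context)}


def date_strings(start, days):
    """Yield `days` consecutive dates starting at `start`, formatted as YYYY-MM-DD."""
    for offset in range(days):
        yield (start + timedelta(days=offset)).isoformat()


for d in date_strings(date(2018, 6, 28), 3):
    param = build_search_param("TPE", "OKA", d)
    # With the session, header and flight_api_url from the code above:
    # SearchResult = session.post(flight_api_url, verify=False, headers=header, data=param).json()
    print(param["context"])
```

Building the dict and serializing it with json.dumps avoids hand-editing the long JSON string each time a date or airport code changes.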

Author: rs6000 (正义的胖虎) 2018-05-28 07:31:00

Upvote.

Author: TakiDog (多奇狗) 2018-05-28 10:30:00

So the conclusion is that using selenium was just making things hard for yourself!?

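Original poster: caty1010 (Lucas) 2018-05-28 12:26:00

All I can say is that selenium is less suited to this particular case. To be fair to selenium, though, it can also fetch the data once a sleep is added. Code: https://www.codepile.net/pile/p1LvV8lQ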

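Author: haru97724 (haruru) 2018-05-28 19:49:00

Yes, I got it working (人′∀`)♪ Thanks a lot! I used selenium at first because I wanted to grab its cookies. Turns out it doesn't need to be that much trouble XDDD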
