[Question] Pchome stock website scraper s8607142004 PTT (批踢踢实业坊)

Original poster: s8607142004 (挖哩勒) 2021-12-08 22:13:33

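Hello everyone on the board,
I've just started learning web scraping
and want to scrape the concept-stock list from Pchome Stock.
The URL is:
https://pchome.megatime.com.tw/group/sto3
Here is my code first: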
import time
import requests
from bs4 import BeautifulSoup
header = {'Referer': 'http://pchome.megatime.com.tw/stock/sto3/',
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                        'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'}
url = "https://pchome.megatime.com.tw/group/sto3"
r = requests.post(url,header)
r.encoding = 'UTF-8'
sp = BeautifulSoup(r.text, 'html5lib')
sp
In the sto3 Document entry I can see the data I need, but what I scrape back is only these few lines:
<html><head>
</head>
<body>
<form action="https://pchome.megatime.com.tw/group/sto3" id="submit_form"
method="post" name="submit_form">
<input name="is_check" type="hidden" value="1"/>
</form>
<script type="text/javascript">
document.getElementById('submit_form').submit();
</script>
</body></html>
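I found an earlier post saying the problem was an incorrect header setting:
https://pttdigit.com/python/M.1485354796.A.810.html
but even after setting my header by analogy with what that post describes, it still doesn't work.
I also tried pyppeteer, but that couldn't get the data either.
I'd appreciate it if someone on the board could point me in the right direction.
Thanks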
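
For readers following along, the response pasted above is not the stock list but a small page whose only job is to resubmit a hidden form. The sketch below pulls the form's target and fields out of that exact snippet. It uses the standard-library html.parser instead of BeautifulSoup only so that it runs without extra packages; the class name FormFinder is made up for illustration, and reading this page as a check that bounces the request back to the same URL is an interpretation of the snippet, not something the thread confirms.

```python
from html.parser import HTMLParser

# The response body exactly as quoted in the post (the form tag is rejoined onto one line).
RESPONSE_HTML = """<html><head>
</head>
<body>
<form action="https://pchome.megatime.com.tw/group/sto3" id="submit_form" method="post" name="submit_form">
<input name="is_check" type="hidden" value="1"/>
</form>
<script type="text/javascript">
document.getElementById('submit_form').submit();
</script>
</body></html>"""


class FormFinder(HTMLParser):
    """Hypothetical helper: records the form's action/method and its input fields."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
            self.method = attrs.get("method")
        elif tag == "input" and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("value", "")


finder = FormFinder()
finder.feed(RESPONSE_HTML)
print(finder.action, finder.method, finder.fields)
```

Seeing a form like this instead of the expected table is a hint that the server did not accept the request as sent, which is consistent with the header problem discussed in the replies.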

Author: Woqeker (窝颗ker) 2021-12-10 02:42:00

Didn't the first comment there say that you can't use requests?

Author: blc (Anemos) 2021-12-10 20:30:00

Referer means the URL you came from, not the URL you want to connect to.
Sorry, I got that wrong.
Try removing the trailing / from the Referer.

Original poster: s8607142004 (挖哩勒) 2021-12-13 18:07:00

In the end, passing headers = header made it work.
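
Why that fixes it: the second positional parameter of requests.post is data, so requests.post(url, header) sends the dict as a form body and never sets it as headers. The sketch below shows the difference by preparing both requests without sending them, so it runs offline. The helper name build is made up for illustration, and whether the site still behaves as it did in 2021 is not checked here.

```python
import requests

URL = "https://pchome.megatime.com.tw/group/sto3"
HEADERS = {
    "Referer": "http://pchome.megatime.com.tw/stock/sto3/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/96.0.4664.45 Safari/537.36",
}


def build(url, **kwargs):
    """Hypothetical helper: prepare a POST without sending it, for inspection."""
    return requests.Request("POST", url, **kwargs).prepare()


# Equivalent to requests.post(url, header): the dict lands in `data`,
# so it is form-encoded into the body and no Referer header is set.
wrong = build(URL, data=HEADERS)

# Equivalent to requests.post(url, headers=header): the dict becomes the headers.
right = build(URL, headers=HEADERS)

print("Referer header set (positional):", "Referer" in wrong.headers)
print("Referer header set (keyword):   ", "Referer" in right.headers)
print("Body sent (positional):", wrong.body[:40], "...")
```

To actually fetch the page, the corresponding call would be requests.post(URL, headers=HEADERS), followed by the same BeautifulSoup parsing as in the original post.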
