Re: [Question] Fetching an image at a relative path with a crawler Hsins PTT (批踢踢实业坊)

Poster: Hsins (翔) 2021-12-14 16:57:11

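※ Quoting sky094315 (monkeyo):
: I'd like to ask everyone a question.
: I'm writing a crawler for a website that uses an image Captcha. The Captcha changes every time the page is refreshed
: (the server generates each image from random numbers), and each image can only be fetched once.
: Is there a way to download the image without using selenium to open a browser?
: Or are there any keywords I should search for?
: Thanks
: Reference: https://weirenxue.github.io/2021/07/04/python_selenium_captcha/
: Attached here
: Reference site: https://aaav2.hinet.net/A1/AuthScreen.jsp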
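Your reference site can't be accessed without cookies,
so I'll demonstrate with other pages instead:
https://aaaservice.hinet.net/User/unipresidentConsole.jsp
https://aaacp.hinet.net/CP/index.html
Both pages have a Captcha, which you can inspect with the Chrome/Edge developer tools:
https://i.imgur.com/V5x4d9u.png
The Captcha images are fetched by sending GET requests to these two URIs:
https://aaaservice.hinet.net/User/Captcha?rdn=1639470286847
https://aaacp.hinet.net/CP/Captcha?rdn=1639469984177
The rdn parameter at the end looks very much like a timestamp.
Feeding it to https://www.epochconverter.com/ confirms that it is a Unix timestamp that includes milliseconds.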
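The same check can be done locally instead of using the converter site. This is just a minimal sketch: the helper name `rdn_to_datetime` is made up for illustration, and it assumes rdn really is a Unix timestamp in milliseconds.

```python
from datetime import datetime, timedelta, timezone


def rdn_to_datetime(rdn: int) -> datetime:
    """Interpret rdn as milliseconds since the Unix epoch (assumption)."""
    # Split into whole seconds and leftover milliseconds to avoid float rounding.
    seconds, millis = divmod(rdn, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


# The two rdn values seen in the developer tools above.
for rdn in (1639470286847, 1639469984177):
    print(rdn, "->", rdn_to_datetime(rdn).isoformat())
# 1639470286847 -> 2021-12-14T08:24:46.847000+00:00
# 1639469984177 -> 2021-12-14T08:19:44.177000+00:00
```

Both values land on the day of this post (08:24 UTC is 16:24 in UTC+8), which fits the timestamp guess.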
So the task becomes simple:
1. Send the request
2. Save the image
```python
import requests
from datetime import datetime

# Download 10 Captcha images, naming each file after its timestamp.
for _ in range(10):
    # rdn is the current Unix timestamp in milliseconds.
    current_timestamp = round(datetime.now().timestamp() * 1000)
    image_url = f"https://aaacp.hinet.net/CP/Captcha?rdn={current_timestamp}"
    image_data = requests.get(image_url).content
    # Write the raw image bytes to disk.
    with open(f'./{current_timestamp}.jpg', 'wb') as handler:
        handler.write(image_data)
```

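Author: sky094315 2021-12-14 18:29:00

Sorry, I didn't realize cookies were required. Thanks for your reply, now I have a direction to work in.

Poster: Hsins (翔) 2021-12-14 19:36:00

If you're downloading the images for training, it doesn't matter. If you're downloading them to recognize and then log in, you'll need to handle cookies.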
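A rough sketch of what handling cookies could look like: send every request through one session object so that cookies set by the server carry over to the Captcha request. The function name `fetch_captcha_with_session` is made up, and it is only an assumption that loading the index page first is what sets the relevant cookie. The login request itself is left out because its endpoint and form fields are not shown in this thread.

```python
import time

PAGE_URL = "https://aaacp.hinet.net/CP/index.html"  # page from the post above
CAPTCHA_URL = "https://aaacp.hinet.net/CP/Captcha"  # Captcha endpoint from the post above


def fetch_captcha_with_session(session, page_url=PAGE_URL, captcha_url=CAPTCHA_URL):
    """Fetch one Captcha image through `session` so that cookies are shared.

    `session` is expected to behave like requests.Session.
    """
    # Assumption: visiting the page first lets the server set its session cookie.
    session.get(page_url, timeout=10).raise_for_status()

    # rdn is the current Unix timestamp in milliseconds.
    rdn = time.time_ns() // 1_000_000
    response = session.get(captcha_url, params={"rdn": rdn}, timeout=10)
    response.raise_for_status()
    return response.content  # raw image bytes


# Usage (performs real network requests):
#   import requests
#   with requests.Session() as session:
#       image_data = fetch_captcha_with_session(session)
#       # Recognize image_data, then submit the login with the same session.
```

The point of the session is that the later login request reuses the cookies tied to the Captcha that was just downloaded. A bare `requests.get` call starts with no cookies each time.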

Author: sky094315 2021-12-14 20:04:00

OK, thank you for your reply.
