[Question] A question about extracting text with BeautifulSoup (PTT Bulletin Board System)

Original poster: onlyAPU (Nothing) 2022-07-01 17:03:51

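Hi everyone,
I'm a programming beginner and recently bought an introductory course.
I tried writing a PTT crawler
that only prints the articles, with their links, whose titles contain a keyword.
It runs, but it has the problem shown in the image below:
I want to extract only the URL part (the red box in the image), but I can't find a way to do it.
https://imgur.com/a/rYe0880
Here is the code: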
import requests
from bs4 import BeautifulSoup
import time
# everything above is basic setup
# today = time.strftime('%m/%d').lstrip('0')
url = 'https://www.ptt.cc/bbs/Steam/index.html'
keyword = '特'
articles = []
for x in range(10):
    resp = requests.get(url)
    soup = BeautifulSoup(resp.text, 'html5lib')
    paging = soup.find('div', 'btn-group btn-group-paging').find_all('a')[1]['href']
    rents = soup.find_all('div', 'r-ent')
    for rent in rents:
        title = rent.find('div', 'title').text.strip()
        count = rent.find('div', 'nrec').text.strip()
        date = rent.find('div', 'date').text.strip()
        link = rent.find('a')
        article = '%s %s %s %s' % (date, title, count, link)
        try:
            if keyword in title:
                articles.append(article)
        except:
            if count == '爆':
                articles.append(article)
    url = 'https://www.ptt.cc' + paging
if len(articles) != 0:
    for article in articles:
        print(article)

Author: blc (Anemos) 2022-07-01 18:23:00

link['href']
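
To illustrate this suggestion: a Tag returned by `find` supports dictionary-style attribute access, so `link['href']` yields only the URL path instead of the whole element. The following is a minimal sketch, assuming a hand-written HTML fragment that mimics one `r-ent` entry (it is not real PTT output, and the article path is made up). It uses the built-in `html.parser` so it runs offline.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical fragment shaped like one PTT list entry; the path is made up.
SAMPLE_HTML = """
<div class="r-ent">
  <div class="nrec">12</div>
  <div class="title"><a href="/bbs/Steam/M.0000000000.A.000.html">[特价] sample title</a></div>
  <div class="date"> 7/01</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rent = soup.find("div", "r-ent")
link = rent.find("a")

# Converting the Tag to text gives the whole <a ...>...</a> element.
whole_tag = str(link)

# Indexing by attribute name gives just the href value.
path = link["href"]

# The href is site-relative, so join it with the base URL for a full link.
full_url = urljoin("https://www.ptt.cc", path)
print(full_url)
```

In the original loop, this would mean formatting `link['href']` into `article` rather than `link` itself.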

Author: tzouandy2818 (Naked Bear) 2022-07-01 18:25:00

Ordinary string handling should be enough for this, shouldn't it?
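
A rough sketch of what the plain string approach could look like, assuming the printed tag always has the form `<a href="...">title</a>` (the sample string below is made up). It is more brittle than reading the attribute, since any change in attribute order or quoting breaks it.

```python
# Hypothetical text of a printed <a> tag; the path is made up.
tag_text = '<a href="/bbs/Steam/M.0000000000.A.000.html">[特价] sample title</a>'

# Take whatever sits between the first pair of double quotes.
start = tag_text.index('"') + 1
end = tag_text.index('"', start)
path = tag_text[start:end]
print(path)
```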

Author: lycantrope (阿宽) 2022-07-01 18:25:00

link.get("href", "err:no_href")
If rent.find returns None, you will get an error.
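
A minimal sketch of that guard, assuming some list entries contain no `<a>` at all (for example a removed post) and some `<a>` tags lack an `href`. The HTML fragment and the helper name `extract_href` are hypothetical, written only for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the second entry has no <a>, the third has no href.
SAMPLE_HTML = """
<div class="r-ent"><div class="title"><a href="/bbs/Steam/M.0000000000.A.000.html">first</a></div></div>
<div class="r-ent"><div class="title">(removed)</div></div>
<div class="r-ent"><div class="title"><a>no href here</a></div></div>
"""


def extract_href(rent, default="err:no_href"):
    """Return the entry's href, or `default` when the link or attribute is missing."""
    link = rent.find("a")
    if link is None:  # find() returns None when nothing matches
        return default
    # Tag.get() returns the default instead of raising on a missing attribute.
    return link.get("href", default)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
hrefs = [extract_href(rent) for rent in soup.find_all("div", "r-ent")]
print(hrefs)
```

The `None` check is what avoids the error: `link.get(...)` alone only covers a missing attribute, not a missing tag.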

Original poster: onlyAPU (Nothing) 2022-07-02 15:46:00

Thanks, I'll look into how f-strings (f'') work; sometimes printing a variable directly gives me an error.
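
For reference, a small sketch of f-string formatting applied to the same kind of output line. All values below are made up and stand in for one scraped entry.

```python
# Made-up values standing in for one scraped entry.
date = "7/01"
title = "[特价] sample title"
count = "12"
href = "/bbs/Steam/M.0000000000.A.000.html"

# Same layout as '%s %s %s %s' % (...), written as an f-string.
article = f"{date} {title} {count} https://www.ptt.cc{href}"

# An f-string formats each value itself, so an int needs no manual str() call
# (unlike 'text' + 12, which raises a TypeError).
pushes = 12
summary = f"{title} has {pushes} pushes"

print(article)
print(summary)
```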
