[Question] BeautifulSoup find_all cannot find a matching tag - stanley2k - PTT


Original poster: stanley2k (使单力) 2016-05-05 18:23:24

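Hi everyone,
I am currently learning to write Python with BeautifulSoup and lxml.
My current exercise is to read entries from a list and, for each entry, check whether a matching tag exists in an XML file.
For example, the XML contains only <centos>: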
<centos>
<name>centos</name>
<version>7</version>
<download-url>http://ftp.ksu.edu.tw/pub/CentOS/7/isos/x86_64/CentOS-7-x86_64-DVD-1511.iso</download-url>
</centos>
I read the XML file with the code below and try to determine whether the corresponding tag was found or not:
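# Python 2 code as posted, with indentation and obvious typos restored
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("os.xml"))
os = "fedora"
for item in soup.findAll(os):
    print item.name, item.attrs
    # this is the check the question is about
    if item == "":
        print "OS %s not exist in DB" % os
    else:
        print "OS %s exist in DB" % os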
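The check does not seem to run at all. With os = "centos" I get the matching data, but with os = "fedora" nothing is printed.
My questions:
1. What is the correct way to do this check? The bs website says findall returns an empty string when no tag is found, but I do not understand how to test for that empty string. Comparing with == "" does not seem to work.
2. Running the script also produces the following error. How do I get rid of it?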
/usr/local/lib/python2.7/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
To get rid of this warning, change this:
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "html.parser")
markup_type=markup_type))
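I searched existing posts first, but none of the suggested fixes worked, for example BeautifulSoup(markup, "xml").
Sorry if this is a very basic question, and thanks for the guidance :D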

Author: yeh6 2016-05-05 19:22:00

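Isn't that simply because there is no fedora tag? When nothing matches, it returns an empty list, not an empty string. Also, the check should be soup.findAll(os) == [], done on the result rather than on item, because the loop body never runs when the list is empty.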
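A minimal sketch of that check, assuming the XML from the question is held in a string instead of being read from os.xml. The helper name os_in_db and the constant XML_DOC are made up for illustration; the parser is passed explicitly as "html.parser" only so the example runs without extra packages. Each print() takes a single argument so it reads the same on Python 2.7 and 3.

```python
from bs4 import BeautifulSoup

# Sample document modelled on the <centos> entry in the question (shortened).
XML_DOC = """<centos>
<name>centos</name>
<version>7</version>
</centos>"""


def os_in_db(soup, os_name):
    """Return True if at least one <os_name> tag exists in the parsed document."""
    # find_all returns a list-like result that is empty when nothing matches,
    # so the test belongs on the result, not inside a loop over it.
    matches = soup.find_all(os_name)
    return len(matches) > 0


soup = BeautifulSoup(XML_DOC, "html.parser")

for os_name in ("centos", "fedora"):
    if os_in_db(soup, os_name):
        print("OS %s exists in DB" % os_name)
    else:
        print("OS %s does not exist in DB" % os_name)
```

Testing the result once, outside any loop, is what makes the "not found" branch reachable.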

Author: octantis (@.@) 2016-05-05 23:57:00

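You can use len() to check whether the list is empty.
The warning appears because you did not say which parser to use, so it defaults to the built-in html.parser and prints the warning. However, html.parser does not support XML, so you need to install the lxml package before you can use BeautifulSoup(markup, "lxml") or BeautifulSoup(markup, "xml").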
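The parser advice can be sketched as follows, assuming the same sample XML as in the question. The helper parse_xml is hypothetical, and the fallback to html.parser when lxml is missing is an illustrative choice rather than something suggested in the thread. Passing any parser name explicitly is what silences the "No parser was explicitly specified" warning.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Sample document modelled on the <centos> entry in the question (shortened).
XML_DOC = "<centos><name>centos</name><version>7</version></centos>"


def parse_xml(markup):
    """Parse markup with the "xml" parser, falling back to html.parser."""
    try:
        # The "xml" feature is provided by the lxml package.
        return BeautifulSoup(markup, "xml")
    except FeatureNotFound:
        # lxml is not installed: use the HTML parser bundled with Python.
        # It is not a real XML parser, so this is only a stopgap.
        return BeautifulSoup(markup, "html.parser")


soup = parse_xml(XML_DOC)
version_tag = soup.find("version")
print("version: %s" % version_tag.get_text())
```

If BeautifulSoup(markup, "xml") raises FeatureNotFound, that usually means lxml is not installed in the environment running the script, which may be why that fix appeared not to work.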
