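Hi everyone, I've recently been trying to write a web scraper.
I want to extract this part of the information from a web page.
Here is what I tried: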
# -*- coding: utf-8 -*-
from urllib2 import urlopen
import xml.etree.ElementTree as ET
from lxml import etree
import mechanize
import sys
url = "http://www.tham.com.tw/recipe6.php"
path = "//*[@id=\"left-inner\"]/div[2]/div[3]"
html = urlopen(url).read()
tree = etree.HTML(html)
startindex = 4
data = tree.xpath(path)
print data[0].text
Output:
>>> ================================ RESTART ================================
>>>
材料 2人份
>>>
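Looking at the page source, I suspect the <br /> tag is what stops the rest of the text from being picked up.
Is there a way to fix this?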
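The <br /> suspicion fits how the ElementTree API works, and lxml follows the same model. An element's `.text` holds only the text that comes before its first child element. The text after each `<br />` is stored in that `<br />` element's `.tail`, so `data[0].text` stops at the first line break. The sketch below uses the standard-library `xml.etree.ElementTree` and a hypothetical markup fragment, not the real page's HTML, to show the behavior and to collect every text piece with `itertext()`. The helper name `all_text` is illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for the scraped <div>;
# the real page's markup is not reproduced here.
SAMPLE = "<div>Ingredients (2 servings)<br />item A<br />item B</div>"


def all_text(element):
    """Return the non-empty text pieces inside element, including text after <br /> tags."""
    # .text is only the text before the first child; each child's .tail holds
    # the text that follows it. itertext() yields both, in document order.
    return [piece.strip() for piece in element.itertext() if piece.strip()]


div = ET.fromstring(SAMPLE)
print(repr(div.text))       # only the part before the first <br />
print(repr(div[0].tail))    # the text stored after the first <br />
print(all_text(div))        # every piece, one per line of the original block
```

For lxml elements such as `data[0]` in the script above, the same `itertext()` method is available. `data[0].xpath("string()")` is another option that returns all the text concatenated into one string. Use `itertext()` if you want each line as a separate item, and `string()` if a single joined string is enough.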