[Question] Scrapy spider cannot crawl more than 3 pages, by allen511081 (PTT 批踢踢实业坊)


Original poster: allen511081 (蓝) 2014-12-04 13:04:49

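I'm new to Python. I've recently been trying to scrape a bird society's database,
but its pages don't link to one another, so I generate the links in my own code
and hand them to the spider to crawl.
The program runs without errors, but as soon as it crawls more than three pages,
the scraped data comes out in the wrong order. How can I fix this?
My code is below.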
import scrapy
from scrapy.http import Request
from Birdtest.items import BirdItem


class BirdSpider(scrapy.Spider):
    name = "bird"
    allowed_domains = ["webdata.bird.org.tw"]
    start_urls = ["http://webdata.bird.org.tw/contents.php?key=000001"]

    def parse(self, response):
        # One item per table cell that carries a colspan attribute.
        for sel in response.xpath('//tr/td[@colspan]'):
            item = BirdItem()
            item['desc'] = sel.xpath('text()').extract()
            yield item
        # Queue the generated links for pages 2 to 4. Scrapy's duplicate
        # filter drops the repeats when later pages yield them again.
        for i in range(2, 5):
            url = "http://webdata.bird.org.tw/contents.php?key=" + str(i)
            yield Request(url, self.parse)
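
The post says the links are generated in code. A minimal sketch of that step follows, assuming the server expects six-digit, zero-padded keys like the `000001` in `start_urls`; the thread does not confirm this, and `BASE_URL` and `build_urls` are hypothetical names used only for illustration.

```python
# Assumption: keys are zero-padded to six digits, as in start_urls.
BASE_URL = "http://webdata.bird.org.tw/contents.php?key={:06d}"


def build_urls(first, last):
    """Return page URLs for keys first..last (inclusive)."""
    return [BASE_URL.format(key) for key in range(first, last + 1)]


urls = build_urls(1, 4)
print(urls[0])
```

If the site also accepts unpadded keys such as `key=2`, the padding is harmless but unnecessary.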

Author: goldflower (金色小黄花) 2014-12-05 15:42:00

What exactly goes wrong with the order? @@

Original poster: allen511081 (蓝) 2014-12-05 19:24:00

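For example: page 1 has 25 records, page 2 has 30, page 3 has 35, and page 4 has 40. The program first grabs 10 records from page 1, then moves on to grab a few records from page 2, and so on. Yet when I check the data at the end, not a single record is missing.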
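
The symptom described here (every record present, but pages interleaved) is what you would expect from Scrapy fetching pages concurrently, since it does not guarantee that output follows request order. One possible workaround, sketched below in plain Python, is to tag each record with its page key and row position and sort afterwards. The `page` and `row` fields and the helper names are hypothetical; in the spider they would have to be declared on `BirdItem` and filled in inside `parse`, for instance from `response.url`.

```python
from urllib.parse import urlparse, parse_qs


def page_key(url):
    """Return the integer value of the `key` query parameter."""
    return int(parse_qs(urlparse(url).query)["key"][0])


def tag_rows(url, rows):
    """Attach the page key and row position to each record of one page."""
    key = page_key(url)
    return [{"page": key, "row": i, "desc": d} for i, d in enumerate(rows)]


def restore_order(records):
    """Sort interleaved records by page, then by row within the page."""
    return sorted(records, key=lambda r: (r["page"], r["row"]))


# Simulated arrival order: page 2 is processed before page 1.
arrived = (
    tag_rows("http://webdata.bird.org.tw/contents.php?key=2", ["c", "d"])
    + tag_rows("http://webdata.bird.org.tw/contents.php?key=000001", ["a", "b"])
)
ordered = restore_order(arrived)
print([r["desc"] for r in ordered])
```

Sorting after the crawl keeps the spider concurrent. Lowering concurrency to a single request at a time is another option, at the cost of speed.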
