
[Scrapy] The spider crawls only the endpoint and won't follow links from it

Posted at 2016-12-09, last updated at 2018-08-14

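I wanted Scrapy to crawl outward from the endpoint (the start URL), following links page by page, instead of crawling only the endpoint itself. I started implementing it but got stuck, so I'm leaving a note here.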

I implemented the scraping with reference to the following articles.
http://qiita.com/meltest/items/b445510f09d81276a420
http://qiita.com/checkpoint/items/0c8ad814c25e85bbcfa2#_reference-2f452f48c4e974829586
http://qiita.com/tamonoki/items/ce58ff209f8eae808162
http://web-tsukuru.com/570

Situation

I implemented the scraping rules by imitating the articles above, but for some reason the spider crawled only the endpoint.

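    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import Rule

    # Scraping rules (class attribute of a CrawlSpider subclass;
    # deny_list is defined elsewhere)
    rules = (
        # Rule for the URLs to scrape
        Rule(LinkExtractor(deny=deny_list, unique=True), callback='parse'),
        # Rule for the URLs the spider should follow
        Rule(LinkExtractor(), follow=True),
    )

    def parse(self, response):
        ...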

Cause

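The problem seems to have been the name of the function passed as the callback (parse).
Maybe this is what the page below is explaining? My English isn't good enough to tell.
https://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.parse
For reference, the CrawlSpider section of the Scrapy documentation (at least for the 1.x releases current when this was written) warns against using parse as a rule callback: CrawlSpider uses the parse method itself to implement its crawling logic, so if you override parse, the crawl spider no longer works.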
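In Scrapy 1.x, the response for the start URL is delivered to the spider's parse method, and CrawlSpider's own parse is where the rules get applied; a subclass that defines parse therefore replaces that logic. The deliberately simplified toy model below shows the effect without needing Scrapy. ToyCrawlSpider, crawl, BrokenSpider, and FixedSpider are hypothetical names, and this is not Scrapy's real code.

```python
# Toy model (NOT Scrapy's real code): the "framework" always hands the start
# page to spider.parse(), and the base class's parse() is what applies the rules.


class ToyCrawlSpider:
    rules = ()  # hypothetical: one callback name per "rule"

    def parse(self, page):
        # Built-in crawling logic: send every link on the page to each rule's callback.
        results = []
        for link in page["links"]:
            for callback_name in self.rules:
                results.append(getattr(self, callback_name)(link))
        return results


def crawl(spider, start_page):
    # Framework side: the response for the start URL goes to spider.parse().
    return spider.parse(start_page)


class BrokenSpider(ToyCrawlSpider):
    rules = ("parse",)

    def parse(self, page):
        # Overrides the base-class parse(), so the rule logic never runs
        # and only the start page itself is scraped.
        return [f"scraped {page['url']}"]


class FixedSpider(ToyCrawlSpider):
    rules = ("download_pic",)

    def download_pic(self, link):
        # Any callback name other than "parse" leaves the base logic intact.
        return f"scraped {link}"


start = {"url": "/", "links": ["/a", "/b"]}
print(crawl(BrokenSpider(), start))  # ['scraped /']
print(crawl(FixedSpider(), start))   # ['scraped /a', 'scraped /b']
```

Only the start page is scraped in the broken case, which matches the symptom described above.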

Fix

Simply renaming the function made the spider start at the endpoint and scrape the linked pages one after another.

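    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import Rule

    # Scraping rules (class attribute of a CrawlSpider subclass;
    # deny_list is defined elsewhere)
    rules = (
        # Rule for the URLs to scrape
        Rule(LinkExtractor(deny=deny_list, unique=True), callback='downloadPic'),
        # Rule for the URLs the spider should follow
        Rule(LinkExtractor(), follow=True),
    )

    def downloadPic(self, response):
        ...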
