Writing two kinds of items to separate files with Scrapy

Last updated at 2019-06-03. Posted at 2019-06-03.

Environment

Python 3.7.0
Scrapy 1.6.0

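Motivation

When scraping with Scrapy, I wanted to collect different kinds of information from the same page and save each kind to its own CSV file.

Key point

Create one Exporter per item type, and export each item through the matching Exporter.

Example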

my_process.py

import scrapy
from scrapy.exporters import CsvItemExporter
from my_project.items import MyItemA
from my_project.items import MyItemB

class MyProcessSpider(scrapy.Spider):
    name = 'my_process'
    allowed_domains = ['qiita.com']
    start_urls = ['http://qiita.com']

    def __init__(self, *args, **kwargs):
        super(MyProcessSpider, self).__init__(*args, **kwargs)
        self.file_a = open('file_a.csv', 'w+b')
        self.file_b = open('file_b.csv', 'w+b')
        self.exporter_a = CsvItemExporter(self.file_a)
        self.exporter_b = CsvItemExporter(self.file_b)
        self.exporter_a.start_exporting()
        self.exporter_b.start_exporting()
        ## Add any other initialization here

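    def parse(self, response):
        ## Assume `urls` is the list of URLs to scrape. The selector below is only
        ## a placeholder; build the list however the target page requires.
        urls = response.css('a::attr(href)').getall()
        for url in urls:
            yield scrapy.Request(response.urljoin(url), callback=self.parse_item)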

    def parse_item(self, response):
        item_a = MyItemA()
        item_b = MyItemB()
        ##
        ## Populate item_a and item_b with data here
        ##
        self.exporter_a.export_item(item_a)
        self.exporter_b.export_item(item_b)

    def closed(self, reason):
        self.exporter_a.finish_exporting()
        self.exporter_b.finish_exporting()
        self.file_a.close()
        self.file_b.close()

To write a format other than CSV, use the corresponding Exporter from scrapy.exporters instead of CsvItemExporter.

Addendum

Setting fields_to_export before calling start_exporting() lets you choose which fields are written and the order in which they appear in the output.

        self.exporter_a = CsvItemExporter(self.file_a)
        self.exporter_b = CsvItemExporter(self.file_b)
        self.exporter_a.fields_to_export = ['field1', 'field2', 'field3']
        self.exporter_b.fields_to_export = ['field_x', 'field_y']
        self.exporter_a.start_exporting()
        self.exporter_b.start_exporting()

References

Scrapy 1.2 documentation: Item Exporters
