Exporting two kinds of items to separate files with Scrapy

Posted at 2019-06-03

Environment

  • Python 3.7.0
  • Scrapy 1.6.0

Motivation

  • I want to scrape with Scrapy, collect different kinds of information from the same pages, and save each kind to its own CSV file.

Key point

  • Prepare one Exporter per item type and export each item through its own Exporter.

my_process.py
import scrapy
from scrapy.exporters import CsvItemExporter
from my_project.items import MyItemA
from my_project.items import MyItemB

class MyProcessSpider(scrapy.Spider):
    name = 'my_process'
    allowed_domains = ['qiita.com']
    start_urls = ['http://qiita.com']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # CsvItemExporter writes bytes, so the files are opened in binary mode
        self.file_a = open('file_a.csv', 'wb')
        self.file_b = open('file_b.csv', 'wb')
        # One exporter per item type, each bound to its own file
        self.exporter_a = CsvItemExporter(self.file_a)
        self.exporter_b = CsvItemExporter(self.file_b)
        self.exporter_a.start_exporting()
        self.exporter_b.start_exporting()
        # Put any other initialization here

    def parse(self, response):
        # Assume `urls` holds the list of URLs to collect information from
        urls = []  # hypothetical placeholder: fill in with the URLs to scrape
        for url in urls:
            yield scrapy.Request(url, callback=self.parse_item)

    def parse_item(self, response):
        item_a = MyItemA()
        item_b = MyItemB()
        #
        # Store the scraped data in item_a and item_b
        #
        # Each item goes to the exporter (and file) for its own type
        self.exporter_a.export_item(item_a)
        self.exporter_b.export_item(item_b)

    def closed(self, reason):
        # Scrapy calls this when the spider finishes; flush and close both files
        self.exporter_a.finish_exporting()
        self.exporter_b.finish_exporting()
        self.file_a.close()
        self.file_b.close()
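  • To export a format other than CSV, use the corresponding Exporter (for example, JsonItemExporter or JsonLinesItemExporter for JSON, XmlItemExporter for XML).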

Addendum

  • Setting fields_to_export before calling start_exporting() lets you choose which fields are exported and the order in which they appear in the output.
        self.exporter_a = CsvItemExporter(self.file_a)
        self.exporter_b = CsvItemExporter(self.file_b)
        self.exporter_a.fields_to_export = ['field1', 'field2', 'field3']
        self.exporter_b.fields_to_export = ['field_x', 'field_y']
        self.exporter_a.start_exporting()
        self.exporter_b.start_exporting()
