

Reading a CSV one line at a time while converting each field's type (and showing a progress bar)

Posted at 2016-03-02

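An example of reading a CSV one line at a time, converting the type of each field, and doing some processing on it, all while displaying a progress bar.
The progress bar is the one from click (`click.progressbar`).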

Here's the code for now.
(It also doubles as my own copy-and-paste template for writing an iterator that can be used in a `with ...` statement.)
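Stripped down to the template part, an object usable as `with X(...) as it: for item in it:` needs two protocols: the iterator protocol (`__iter__` plus `__next__`; Python 2 spelled the latter `next`) and the context-manager protocol (`__enter__`/`__exit__`). A minimal sketch, where the `LineReader` class is a hypothetical illustration and not part of the article's code:

```python
import os
import tempfile


class LineReader:
    """Iterator over a text file's lines that also works as a context manager."""

    def __init__(self, path):
        self.f = open(path, 'r')

    # Iterator protocol: __iter__ returns the iterator itself,
    # __next__ returns the next item or raises StopIteration when done.
    def __iter__(self):
        return self

    def __next__(self):
        line = self.f.readline()
        if not line:  # readline() returns '' only at end of file
            raise StopIteration
        return line.rstrip('\n')

    # Context-manager protocol: __enter__'s return value is bound by "as",
    # __exit__ runs when the block is left, even after an exception.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.f.close()
        return False  # False: let any exception propagate


# Usage: write a small temporary file and read it back.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('alpha\nbeta\n')
    path = tmp.name

with LineReader(path) as reader:
    lines = list(reader)

os.remove(path)
print(lines)  # ['alpha', 'beta']
```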

field_converter.py is here: https://gist.github.com/naoyat/3db8cd96c8dcecb5caea
It's the one from my previous article, "I want to bulk-convert the result of "string".split() in Python".
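The gist's code isn't reproduced here. To give a rough idea of the interface the class relies on (a `convert(fields)` method taking one spec per column), here is a hypothetical stand-in, `SimpleFieldConverter`. Treating a string spec such as `'iso-8859-1'` as an encoding name is an assumption inferred from the usage example, not the gist's confirmed behavior.

```python
class SimpleFieldConverter:
    """Hypothetical stand-in for the gist's FieldConverter (not its real code)."""

    def __init__(self, *specs):
        # One spec per column: a callable (e.g. int, float) or an encoding name.
        self.specs = specs

    def convert(self, fields):
        if len(fields) != len(self.specs):
            raise ValueError('expected %d fields, got %d' % (len(self.specs), len(fields)))
        return [self._apply(spec, value) for spec, value in zip(self.specs, fields)]

    @staticmethod
    def _apply(spec, value):
        if isinstance(spec, str):
            # Assumption: a string spec names an encoding. Bytes are decoded
            # with it; values that are already str are returned unchanged.
            return value.decode(spec) if isinstance(value, bytes) else value
        return spec(value)


converter = SimpleFieldConverter(int, int, 'iso-8859-1', 'iso-8859-1', float)
row = converter.convert(['1', '42', b'caf\xe9', 'query', '0.5'])
print(row)  # [1, 42, 'café', 'query', 0.5]
```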

csv_iterator.py
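```python
import csv
import sys

import click


class CSV_Iterator:
    def __init__(self, path, skip_header=False, with_progress_bar=False,
                 field_converter=None, encoding=None):
        self.path = path
        self.with_progress_bar = with_progress_bar
        self.field_converter = field_converter

        # newline='' is what the csv module expects; encoding=None means
        # the platform default, as with a plain open().
        self.f = open(path, 'r', newline='', encoding=encoding)
        # Physical line count, used as the progress bar's length
        # (quoted fields containing newlines make this an overestimate).
        self.line_count = sum(1 for _ in self.f)

        self.f.seek(0)  # rewind
        self.r = csv.reader(self.f, dialect='excel')
        if skip_header:
            next(self.r, None)  # no error on an empty file
            self.line_count -= 1

        print('(%d lines)' % self.line_count)

        self.bar = None
        if self.with_progress_bar:
            self.bar = click.progressbar(self.r, length=self.line_count)

    def __iter__(self):
        # A generator: iteration ends when the reader is exhausted, and
        # errors raised by the csv module propagate to the caller.
        rows = self.bar if self.bar is not None else self.r
        for fields in rows:
            if self.field_converter:
                try:
                    fields = self.field_converter.convert(fields)
                except Exception as e:
                    # Report the row that failed and pass it through unconverted.
                    print('conversion failed: %r (%s)' % (fields, e), file=sys.stderr)
            yield fields

    def __enter__(self):
        # click's progress bar can only be iterated after it has been entered.
        if self.bar is not None:
            self.bar.__enter__()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if self.bar is not None:
            # Finishes the bar and moves the cursor to the next line.
            self.bar.__exit__(exc_type, exc_value, traceback)
        self.f.close()
        return False  # do not suppress exceptions
```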

I've also put it on a gist:
https://gist.github.com/naoyat/b1290d917638c412e140
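The class sizes the progress bar by counting physical lines. When a quoted field contains a newline, that count is larger than the number of CSV records. A sketch comparing the two counts on an in-memory file (the helper names `count_lines` and `count_records` are hypothetical); counting with `csv.reader` costs a full parse but gives an exact total:

```python
import csv
import io


def count_lines(f):
    """Number of physical lines; rewinds the file afterwards."""
    n = sum(1 for _ in f)
    f.seek(0)
    return n


def count_records(f):
    """Number of CSV records as parsed by the csv module; rewinds afterwards."""
    n = sum(1 for _ in csv.reader(f))
    f.seek(0)
    return n


# The second record's quoted field spans two physical lines.
data = 'id,comment\n1,"first\nsecond"\n2,plain\n'
f = io.StringIO(data, newline='')
print(count_lines(f), count_records(f))  # 4 3
```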

Usage example.

example.py
```python
from csv_iterator import CSV_Iterator
from field_converter import FieldConverter


def foobar(csv_path):
    with CSV_Iterator(csv_path,
                      skip_header=True,
                      with_progress_bar=True,
                      field_converter=FieldConverter(int, int, 'iso-8859-1', 'iso-8859-1', float)) as rows:
        for id, uid, title, query, target in rows:
            ...
```
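To try the same read-convert-unpack flow without click or the gist, a stdlib-only sketch follows. The generator function `read_typed_rows` and the sample data are hypothetical; a generator gives the same `for` loop as the class without hand-writing `__next__`, at the cost of the `with` handling and the progress bar.

```python
import csv
import io


def read_typed_rows(f, converters, skip_header=True):
    """Yield CSV rows with each field passed through its column's converter."""
    reader = csv.reader(f, dialect='excel')
    if skip_header:
        next(reader, None)
    for fields in reader:
        # zip stops at the shorter of converters and fields.
        yield [conv(value) for conv, value in zip(converters, fields)]


sample = io.StringIO('id,uid,title,query,target\n'
                     '1,100,foo,bar,0.25\n'
                     '2,200,baz,qux,1.5\n', newline='')
rows = []
for row_id, uid, title, query, target in read_typed_rows(sample, (int, int, str, str, float)):
    rows.append((row_id, uid, title, query, target))
print(rows)  # [(1, 100, 'foo', 'bar', 0.25), (2, 200, 'baz', 'qux', 1.5)]
```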
