
Reading a CSV one row at a time with field type conversion (and a progress bar)

More than 3 years have passed since last update.

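An example that reads a CSV one row at a time, converts the type of each field, and processes each row while showing a progress bar.
The progress bar is the one from click.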

For now, here is the code.
(It also serves as my own copy-paste reference for writing an iterator that can be used in a "with ..." statement.)
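
As a minimal Python 3 sketch of that pattern: an object can be used in both "with" and "for" when it defines `__enter__`/`__exit__` together with `__iter__`/`__next__`. The class name `Countdown` is made up for illustration and does not appear in the original code.

```python
class Countdown:
    """Hypothetical example: counts down from n, usable with `with` and `for`."""

    def __init__(self, n):
        self.n = n
        self.closed = False

    def __enter__(self):
        # The object returned here is what `as` binds to
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Clean up on every exit path; returning False lets exceptions propagate
        self.closed = True
        return False

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1


with Countdown(3) as counter:
    values = list(counter)
```

Returning False from `__exit__` is the usual choice here, since returning a true value would silently swallow any exception raised inside the block.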

field_converter.py is here: https://gist.github.com/naoyat/3db8cd96c8dcecb5caea
It is the one from my previous article, "I want to bulk-convert the result of "string".split() in Python".
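
The gist is not reproduced here. For experimenting without it, a rough Python 3 stand-in can be sketched from the only interface this article relies on: a constructor taking one converter per column and a `convert(fields)` method. The name `SimpleFieldConverter` and its behavior are assumptions for illustration, not the gist's implementation. In particular, non-callable entries such as encoding names are simply passed through unconverted.

```python
class SimpleFieldConverter:
    """Hypothetical stand-in, not the FieldConverter from the gist."""

    def __init__(self, *converters):
        self.converters = converters

    def convert(self, fields):
        # Apply each column's converter; pass the value through unchanged
        # when the entry is not callable (e.g. an encoding name string).
        return [conv(value) if callable(conv) else value
                for conv, value in zip(self.converters, fields)]


row = SimpleFieldConverter(int, int, 'iso-8859-1', float).convert(
    ['1', '42', 'title', '0.5'])
```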

csv_iterator.py
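import sys
import csv
import click
from field_converter import FieldConverter


class CSV_Iterator:
    def __init__(self, path, skip_header=False, with_progress_bar=False,
                 field_converter=None):
        self.path = path
        self.with_progress_bar = with_progress_bar
        self.field_converter = field_converter

        # newline='' is what the csv module expects for files it reads
        self.f = open(path, 'r', newline='')
        # First pass: count lines so the progress bar knows its length
        self.line_count = sum(1 for line in self.f)

        self.f.seek(0)  # rewind
        self.r = csv.reader(self.f, dialect='excel')
        if skip_header:
            next(self.r)
            self.line_count -= 1

        print('(%d lines)' % (self.line_count,))

        # Rows come straight from the reader unless a progress bar wraps it
        self.rows = self.r
        if self.with_progress_bar:
            self.bar = click.progressbar(self.r, length=self.line_count)

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the underlying iterator ends the loop;
        # any other error is allowed to propagate instead of being hidden
        fields = next(self.rows)
        if self.field_converter:
            try:
                fields = self.field_converter.convert(fields)
            except Exception:
                # Report the conversion error and return the raw fields
                print(sys.exc_info())
        return fields

    def __enter__(self):
        if self.with_progress_bar:
            # click progress bars must be entered before they are iterated
            self.bar.__enter__()
            self.rows = iter(self.bar)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if self.with_progress_bar:
            # Finishes rendering the bar and moves to a new line
            self.bar.__exit__(exc_type, exc_value, traceback)
        # Always close the file, even when the block raised
        self.f.close()
        return False  # never suppress exceptions from the with block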

I have also posted it as a gist.
https://gist.github.com/naoyat/b1290d917638c412e140
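
One detail of the class above: it sizes the progress bar by counting physical lines, which overcounts when a quoted field contains a line break. Below is a Python 3 sketch of counting CSV records instead, using only the standard library. The helper name `count_records` is made up for this illustration.

```python
import csv
import io


def count_records(f, skip_header=False):
    """Count CSV records in an open text file, then rewind it."""
    count = sum(1 for _ in csv.reader(f, dialect='excel'))
    f.seek(0)  # rewind so the caller can read the data again
    if skip_header and count > 0:
        count -= 1
    return count


# A quoted field containing a line break spans two physical lines
sample = io.StringIO('id,title\r\n1,"two\r\nlines"\r\n2,plain\r\n', newline='')
n = count_records(sample, skip_header=True)
```

The trade-off is that the first pass now parses the CSV rather than just counting line breaks, so it is somewhat slower on large files.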

Usage example.

example.py
from csv_iterator import CSV_Iterator
from field_converter import FieldConverter

def foobar(csv_path):
    with CSV_Iterator(csv_path,
                      skip_header=True,
                      with_progress_bar=True,
                      field_converter=FieldConverter(int, int, 'iso-8859-1', 'iso-8859-1', float)) as rows:
        # Each row arrives with its fields already converted
        for id, uid, title, query, target in rows:
            ...
naoya_t
Natural language processing, machine learning, competitive programming, and so on
https://naoyat.hatenablog.jp/