Running Jupyter Notebooks from the Command Line with Papermill

There was not a single article on Qiita introducing Papermill, so here is one.

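What is Papermill?

Papermill is a tool for batch-executing Jupyter Notebooks.
With Papermill, a notebook written for analysis or modeling can be run from crontab or a similar scheduler, so it can be built into an application as-is.
A well-known user is Netflix, which uses it as its "Notebook Infrastructure" [1].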

Installation

Install it with pip.

$ pip install papermill

Usage

Running from the CLI

Run it from the CLI as follows. The first argument is the notebook to execute, and the second is the notebook where each cell's output is saved.

$ papermill papermill_example.ipynb papermill_example_output.ipynb
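
When the command is launched from a script rather than typed by hand, it can help to build the argument list programmatically. The following is a minimal sketch, not part of Papermill itself: the helper `build_papermill_command` is a hypothetical name, and the `_output` suffix simply mirrors the file names used in this article rather than anything Papermill requires.

```python
from pathlib import Path


def build_papermill_command(input_notebook):
    """Return the argument list for `papermill <input> <output>`.

    The `_output` suffix is an assumed naming convention taken from
    the example file names in this article.
    """
    source = Path(input_notebook)
    # papermill_example.ipynb -> papermill_example_output.ipynb
    output = source.with_name(f"{source.stem}_output{source.suffix}")
    return ["papermill", str(source), str(output)]


print(build_papermill_command("papermill_example.ipynb"))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with notebook names that contain spaces.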

Behavior on error

If an error occurs in the notebook, a stack trace is printed.

$ papermill papermill_example.ipynb papermill_example_output.ipynb
Input Notebook:  papermill_example.ipynb
Output Notebook: papermill_example_output.ipynb
 50%|█████████████████████████████████████████████████████████████▌                                                             | 1/2 [00:01<00:01,  1.56s/it]
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/bin/papermill", line 11, in <module>
    sys.exit(papermill())
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/papermill/cli.py", line 155, in papermill
    report_mode=report_mode,
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/papermill/execute.py", line 70, in execute_notebook
    raise_for_execution_errors(nb, output_path)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/papermill/execute.py", line 185, in raise_for_execution_errors
    raise error
papermill.exceptions.PapermillExecutionError:
---------------------------------------------------------------------------
Exception encountered at "In [1]":
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-1dec16e5c581> in <module>()
      1 import pandas as pd
----> 2 pd.read_csv('test.csv')

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose,skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677
--> 678         return _read(filepath_or_buffer, kwds)
    679
    680     parser_f.__name__ = name

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    438
    439     # Create the parser.
--> 440     parser = TextFileReader(filepath_or_buffer, **kwds)
    441
    442     if chunksize or iterator:

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    785             self.options['has_index_names'] = kwds['has_index_names']
    786
--> 787         self._make_engine(self.engine)
    788
    789     def close(self):

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1012     def _make_engine(self, engine='c'):
   1013         if engine == 'c':
-> 1014             self._engine = CParserWrapper(self.f, **self.options)
   1015         else:
   1016             if engine == 'python':

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1706         kwds['usecols'] = self.usecols
   1707
-> 1708         self._reader = parsers.TextReader(src, **kwds)
   1709
   1710         passed_names = self.names is None

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: File b'test.csv' does not exist

You can also detect the failure from the exit code.

$ echo $?
1
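
The same exit-code check can be made from a wrapper script, for example one started by crontab. This is a rough sketch under the assumption that the caller only needs a success or failure flag; `notebook_succeeded` is a hypothetical helper name, and it works for any command, not just `papermill`.

```python
import subprocess
import sys


def notebook_succeeded(command):
    """Run `command` and return True only if its exit code is 0."""
    completed = subprocess.run(command)
    return completed.returncode == 0


# Example call (commented out because it needs papermill and the notebook):
# ok = notebook_succeeded(
#     ["papermill", "papermill_example.ipynb", "papermill_example_output.ipynb"]
# )
# if not ok:
#     sys.exit(1)  # propagate the failure to the scheduler
```

Because a failing notebook makes `papermill` exit with a non-zero code, as shown above, the wrapper can pass that failure on to whatever scheduled it.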

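Running from the Python API

Notebooks can also be executed through the Python API, but that is not covered in this article.
For more detailed usage, see the official GitHub repository.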
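
For orientation only, a minimal sketch of the Python API might look like the following. It relies on `papermill.execute_notebook`, which takes the input and output notebook paths much like the CLI does. The wrapper `run_notebook` and its `execute` argument are hypothetical, added here so the function can be tried without Papermill installed.

```python
def run_notebook(input_path, output_path, execute=None):
    """Execute a notebook and save the executed copy to `output_path`.

    `execute` is a hypothetical hook for substituting a stand-in function;
    when omitted, papermill.execute_notebook is used.
    """
    if execute is None:
        import papermill as pm  # imported lazily so the sketch loads without it

        execute = pm.execute_notebook
    return execute(input_path, output_path)


# Example call (requires papermill and the notebook file):
# run_notebook("papermill_example.ipynb", "papermill_example_output.ipynb")
```

If a cell fails, Papermill raises `papermill.exceptions.PapermillExecutionError`, the same exception that appears in the stack trace above, so the caller can catch it instead of checking an exit code.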

References
