
[mlflow] AttributeError: 'NoneType' object has no attribute 'copy'

MLflow

Posted at 2023-10-04

mlflow stopped working!!

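I opened mlflow as usual to look at my experiment results and got an InternalServerError. I couldn't see any results! In the terminal where mlflow was running, the following error had been logged.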

2023/10/04 03:24:42 ERROR mlflow.server: Exception on /ajax-api/2.0/mlflow/runs/search [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 2190, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1486, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/usr/local/lib/python3.10/dist-packages/mlflow/server/handlers.py", line 486, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mlflow/server/handlers.py", line 527, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mlflow/server/handlers.py", line 926, in _search_runs
    run_entities = _get_tracking_store().search_runs(
  File "/usr/local/lib/python3.10/dist-packages/mlflow/store/tracking/abstract_store.py", line 298, in search_runs
    runs, token = self._search_runs(
  File "/usr/local/lib/python3.10/dist-packages/mlflow/store/tracking/file_store.py", line 915, in _search_runs
    run_infos = self._list_run_infos(experiment_id, run_view_type)
  File "/usr/local/lib/python3.10/dist-packages/mlflow/store/tracking/file_store.py", line 883, in _list_run_infos
    run_info = self._get_run_info_from_dir(r_dir)
  File "/usr/local/lib/python3.10/dist-packages/mlflow/store/tracking/file_store.py", line 696, in _get_run_info_from_dir
    return _read_persisted_run_info_dict(meta)
  File "/usr/local/lib/python3.10/dist-packages/mlflow/store/tracking/file_store.py", line 132, in _read_persisted_run_info_dict
    dict_copy = run_info_dict.copy()
AttributeError: 'NoneType' object has no attribute 'copy'

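From the error message, my guess was that mlflow had failed to read some of its data and ended up with None, so I checked the program I had been running. It had stopped with an error along the lines of "storage is full, cannot write". Huh? When I looked into it, it turned out that someone else had eaten up all the storage. Everyone, be careful on shared servers.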
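As a quick way to confirm the "storage is full" suspicion from Python, here is a minimal sketch using the standard library's shutil.disk_usage. The function name free_space_report and the path "." are only illustrative; point it at whatever directory holds your mlflow logs.

```python
import shutil


def free_space_report(path="."):
    """Return (free_gib, used_fraction) for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (in bytes)
    free_gib = usage.free / 2**30
    used_fraction = usage.used / usage.total
    return free_gib, used_fraction


if __name__ == "__main__":
    free_gib, used_fraction = free_space_report(".")
    print("free: %.2f GiB, used: %.1f%%" % (free_gib, used_fraction * 100))
```

If the free space is close to zero, a training script writing into the same filesystem is likely to fail partway through a write.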
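That gave me a rough idea of the cause: the program was stopped by the storage error, and that probably corrupted mlflow's data. I checked whether the Traceback showed the path of the broken file, but it did not. So I googled it and found a program for tracking down the offending run at https://github.com/mlflow/mlflow/issues/873. (Thank you.)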

import os
import yaml
filestore_root_dir = "./mlruns" # insert your filestore dir (string) here
experiment_id = 0 # insert your experiment ID (int) here
experiment_dir = os.path.join(filestore_root_dir, str(experiment_id))
for run_dir in [elem for elem in os.listdir(experiment_dir) if elem != "meta.yaml"]:
  meta_file_path = os.path.join(experiment_dir, run_dir, 'meta.yaml')
  with open(meta_file_path) as meta_file:
    if yaml.safe_load(meta_file.read()) is None:
      print("Run data in file %s was malformed" % meta_file_path)

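With the program above, you can find the offending run_id and delete it, which should fix the problem.
To use it, set filestore_root_dir to the root folder where the mlflow logs are stored, and set experiment_id to the ID of the experiment that raises the error.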
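If you are not sure which experiment_id to use, the directory names under the file store root are the candidates, since the script builds experiment_dir from filestore_root_dir and the ID. The sketch below lists them. It assumes that an experiment folder is any subdirectory that directly contains a meta.yaml, and the helper name list_experiment_ids is made up for this example.

```python
import os


def list_experiment_ids(filestore_root_dir):
    """Return names of subdirectories that look like experiments.

    Assumption: an experiment directory is one that directly contains meta.yaml.
    """
    ids = []
    for name in sorted(os.listdir(filestore_root_dir)):
        candidate = os.path.join(filestore_root_dir, name)
        if os.path.isdir(candidate) and os.path.isfile(os.path.join(candidate, "meta.yaml")):
            ids.append(name)
    return ids


if __name__ == "__main__":
    root = "./mlruns"  # same meaning as filestore_root_dir in the script above
    if os.path.isdir(root):
        print(list_experiment_ids(root))
```

The returned names are strings, which is fine because the script converts experiment_id with str() anyway.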
In my case, I attach descriptions to my experiments, so the program from the issue did not work as-is.
I therefore changed it as follows.

import os
import yaml
filestore_root_dir = "mlruns" # insert your filestore dir (string) here
experiment_id = 803071133184922812 # insert your experiment ID (int) here
experiment_dir = os.path.join(filestore_root_dir, str(experiment_id))
for run_dir in [elem for elem in os.listdir(experiment_dir) if elem != "meta.yaml" and elem != "tags"]:
  meta_file_path = os.path.join(experiment_dir, run_dir, 'meta.yaml')
  with open(meta_file_path) as meta_file:
    if yaml.safe_load(meta_file.read()) is None:
      print("Run data in file %s was malformed" % meta_file_path)

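run_dir also picks up a folder called "tags", which is not a run (it apparently stores the experiment description), so I added a condition to exclude it. When you run this and there is an offending run, it prints "Run data in file [path to the offending data] was malformed". Based on that output, I deleted the offending run folder, and mlflow came back to life.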
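As a more cautious variation on deleting the folder, here is a rough sketch that moves the offending run folders aside instead, so they can be restored if needed. It rests on two assumptions that are not from the issue: the corruption shows up as an empty (or whitespace-only) meta.yaml, which is one of the cases where yaml.safe_load returns None, and any entry without its own meta.yaml (such as "tags" or the experiment's meta.yaml file) is not a run and is skipped. The names find_empty_meta_runs, quarantine, and the quarantine folder are hypothetical.

```python
import os
import shutil


def find_empty_meta_runs(experiment_dir):
    """Return run directories whose meta.yaml is empty or whitespace-only."""
    bad = []
    for name in sorted(os.listdir(experiment_dir)):
        run_dir = os.path.join(experiment_dir, name)
        meta_file_path = os.path.join(run_dir, "meta.yaml")
        if not os.path.isfile(meta_file_path):
            continue  # skips the experiment's own meta.yaml, "tags", and similar entries
        with open(meta_file_path, "rb") as meta_file:  # binary: avoids decode errors
            if not meta_file.read().strip():
                bad.append(run_dir)
    return bad


def quarantine(run_dirs, quarantine_dir):
    """Move each run directory into quarantine_dir and return the new paths."""
    os.makedirs(quarantine_dir, exist_ok=True)
    moved = []
    for run_dir in run_dirs:
        dest = os.path.join(quarantine_dir, os.path.basename(run_dir))
        shutil.move(run_dir, dest)
        moved.append(dest)
    return moved
```

A typical call would be quarantine(find_empty_meta_runs(experiment_dir), "mlruns_quarantine"), with the quarantine folder placed outside the file store root so that mlflow does not scan it. This check is narrower than the yaml.safe_load check above, so a meta.yaml that is non-empty but still malformed will not be caught here.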

Summary

mlflow is back!!
