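Introduction
These are my notes on working around the error in the title when it occurred in OpenChem.
Environment
- PyTorch 1.7 (GPU)
Error details
Running logP_gcnn_config.py, the OpenChem GCNN tutorial config, produces the following error.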
$ python launch.py --nproc_per_node=1 run.py --config_file="./example_configs/logP_gcnn_config.py" --mode="train_eval" --batch_size=256 --num_epochs=100
11339 unsanitized smiles (10.8%)
  warnings.warn('{:d}/{:d} unsanitized smiles ({:.1f}%)'.format(num_bad, len(smiles), 100 * invalid_rate))
C:\kimisyo\work\SoftwareDevelop\OpenChem\openchem\data\utils.py:187: UserWarning: 300/2835 unsanitized smiles (10.6%)
  warnings.warn('{:d}/{:d} unsanitized smiles ({:.1f}%)'.format(num_bad, len(smiles), 100 * invalid_rate))
2020-11-28 14:29:53,054 openchem INFO: Running on 1 GPUs
2020-11-28 14:29:53,055 openchem INFO: Logging directory is set to logs/logp_gcnn_logs
2020-11-28 14:29:53,055 openchem INFO: Running with config:
batch_size: 256
encoder_params/encoder_dim: 128
encoder_params/input_size: 33
encoder_params/n_layers: 3
logdir: logs/logp_gcnn_logs
lr_scheduler_params/gamma: 0.8
lr_scheduler_params/step_size: 15
mlp_params/input_size: 128
mlp_params/n_layers: 2
num_epochs: 100
optimizer_params/lr: 0.0005
print_every: 10
random_seed: 42
save_every: 5
task: regression
use_clip_grad: False
use_cuda: True
2020-11-28 14:29:54,342 openchem INFO: Starting training from scratch
2020-11-28 14:29:54,342 openchem INFO: Training is set up from epoch 0
0%| | 0/100 [00:00<?, ?it/s]2020-11-28 14:30:14,451 openchem.fit INFO: TRAINING: [Time: 0m 20s, Epoch: 0, Progress: 0%, Loss: 3.7441]
0%| | 0/100 [00:20<?, ?it/s]
Traceback (most recent call last):
  File "run.py", line 327, in <module>
    main()
  File "run.py", line 257, in main
    model_config, eval=True, val_loader=val_loader, cur_epoch=cur_epoch)
  File "C:\kimisyo\work\SoftwareDevelop\OpenChem\openchem\models\openchem_model.py", line 174, in fit
    val_loss, metrics = evaluate(model, val_loader, criterion, epoch=epoch)
  File "C:\kimisyo\work\SoftwareDevelop\OpenChem\openchem\models\openchem_model.py", line 238, in evaluate
    for i_batch, sample_batched in enumerate(data_loader):
  File "C:\Users\kimisyo\.conda\envs\openchem\lib\site-packages\torch\utils\data\dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "C:\Users\kimisyo\.conda\envs\openchem\lib\site-packages\torch\utils\data\dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\kimisyo\.conda\envs\openchem\lib\site-packages\torch\utils\data\dataloader.py", line 801, in __init__
    w.start()
  File "C:\Users\kimisyo\.conda\envs\openchem\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\kimisyo\.conda\envs\openchem\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\kimisyo\.conda\envs\openchem\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\kimisyo\.conda\envs\openchem\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\kimisyo\.conda\envs\openchem\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function get_atomic_attributes at 0x000002169C40D558>: import of module '<run_path>' failed
Workaround
I have not identified the root cause.
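As a possible lead rather than a confirmed diagnosis: the module name '<run_path>' in the error message is the default run name that the standard library's runpy.run_path gives to code it executes, and pickle serializes a function by reference to its module, which it must be able to import. The sketch below reproduces that kind of failure outside OpenChem. The config source, the file name toy_config.py, and the helper names are all hypothetical, and I am only assuming that OpenChem loads its config file in a comparable way.

```python
import os
import pickle
import runpy
import tempfile

# Hypothetical stand-in for a config file that defines a helper function.
CONFIG_SOURCE = "def get_atomic_attributes(atom):\n    return [atom]\n"


def load_config_function():
    """Run a temporary config file with runpy.run_path and return its function."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_config.py")
        with open(path, "w") as f:
            f.write(CONFIG_SOURCE)
        # run_path executes the file under the default run name '<run_path>'.
        namespace = runpy.run_path(path)
    return namespace["get_atomic_attributes"]


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the PicklingError raised."""
    try:
        pickle.dumps(obj)
    except pickle.PicklingError as exc:
        return exc
    return None


fn = load_config_function()
error = try_pickle(fn)
print(fn.__module__)  # the module name pickle will try to import
print(error)          # the pickling failure, if any
```

Functions are pickled by module and name, so a function whose module cannot be imported again cannot be sent to a worker process that is started with the spawn method, which is what popen_spawn_win32 in the traceback indicates.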
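The error appears to occur while the PyTorch DataLoader is starting its worker processes, so as a stopgap I changed the num_workers argument of create_loader in OpenChem's run.py from 1 to 0, which makes data loading run in a single process.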
run.py
val_loader = create_loader(val_dataset,
                           batch_size=model_config['batch_size'],
                           shuffle=False,
                           # num_workers=1,  # original value
                           num_workers=0,  # load data in the main process
                           pin_memory=True)
With this change, training runs without the error, at least for now.
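If hard-coding 0 feels too blunt, one option is to fall back to 0 only when worker processes are not forked. This is an untested sketch: pick_num_workers is a hypothetical helper, not part of OpenChem or PyTorch, and it rests on the assumption that the error only shows up when the worker has to receive the dataset through pickle, which is not the case under the fork start method.

```python
import multiprocessing as mp


def pick_num_workers(requested, start_method=None):
    """Hypothetical helper: keep `requested` workers only under 'fork'.

    With 'spawn' or 'forkserver' the process object is pickled and sent to
    the child, so this falls back to 0 (single-process loading) there.
    """
    if start_method is None:
        # Note: this call also fixes the default start method for the process.
        start_method = mp.get_start_method()
    return requested if start_method == "fork" else 0


# Explicit start methods, to show both branches.
print(pick_num_workers(1, "spawn"))
print(pick_num_workers(1, "fork"))
```

The result could then be passed as num_workers to create_loader in place of the literal 0.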