【Quantization】Converting FP32 to FP16 in PyTorch makes it slower

Background

Hoping to speed up inference, I tried converting the model from FP32 to FP16.

Experiment

I experimented with PSMNet, a stereo matching network, comparing FP32 (args.half=False) against FP16 (args.half=True).

    import time
    import torch

    # Switch the model to inference mode.
    model.eval()

    if args.cuda:
        imgL = imgL.cuda()
        imgR = imgR.cuda()

    # Convert both the model weights and the inputs to FP16.
    if args.half:
        model.half()
        imgL = imgL.half()
        imgR = imgR.half()

    with torch.no_grad():
        # Synchronize before and after so the timing covers the full
        # GPU execution, not just the asynchronous kernel launch.
        torch.cuda.synchronize()
        start_time = time.time()
        pred_dispL = model(imgL, imgR)
        torch.cuda.synchronize()
        processing_time = time.time() - start_time
        print("time = %.4f" % processing_time)

Results

FP32
time = 0.5656[s]
time = 0.5608[s]
time = 0.5621[s]
time = 0.5624[s]
time = 0.5623[s]
time = 0.5626[s]

FP16
time = 0.7387[s]
time = 0.7365[s]
time = 0.7357[s]
time = 0.7370[s]
time = 0.7365[s]

Conclusion

For some reason, FP16 is slower.
It stays slower no matter how many times I run it, and with other models as well.
I want to investigate the reason going forward.
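
One possible starting point for that investigation: whether FP16 is fast at all depends on the GPU. Tensor Cores, which provide fast FP16 matrix math, require compute capability 7.0 (Volta) or newer, and on many older GPUs FP16 arithmetic runs through slower paths than FP32. A minimal sketch of a hardware check (the threshold test is my own heuristic; the API calls are standard PyTorch):

    import torch

    # Report the GPU model and its CUDA compute capability.
    major, minor = torch.cuda.get_device_capability()
    print(torch.cuda.get_device_name())
    print("compute capability: %d.%d" % (major, minor))

    # Tensor Cores (fast FP16) first appeared with compute capability 7.0.
    if major < 7:
        print("No Tensor Cores: FP16 may not beat FP32 on this GPU.")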
