
[Quantization] Converting FP32 to FP16 in PyTorch makes inference slower

Posted at 2021-05-14

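Background

I wanted faster inference, so I tried converting the model from FP32 to FP16.

Experiment

I ran the experiment with PSMNet, a stereo matching network,
comparing FP32 (args.half=False) against FP16 (args.half=True).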

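    import time

    import torch

    # model, imgL, imgR and args are assumed to be defined earlier in the script.
    model.eval()

    if args.cuda:
        imgL = imgL.cuda()
        imgR = imgR.cuda()

    # Cast both the weights and the inputs to FP16.
    if args.half:
        model.half()
        imgL = imgL.half()
        imgR = imgR.half()

    with torch.no_grad():
        # CUDA kernels run asynchronously, so synchronize before reading the clock.
        torch.cuda.synchronize()
        start_time = time.time()
        pred_dispL = model(imgL, imgR)
        torch.cuda.synchronize()
        processing_time = time.time() - start_time
        print("time = %.4f[s]" % (processing_time))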
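The measurement above times a single forward pass per run. As a minimal sketch of a slightly more robust harness, the helper below discards a few warm-up calls and then times several iterations. The names `benchmark`, `fn`, `sync`, `warmup` and `iters` are hypothetical and not part of the original script; `sync` stands in for `torch.cuda.synchronize` so that the helper itself runs without a GPU.

```python
import time
from statistics import mean


def benchmark(fn, sync=lambda: None, warmup=3, iters=10):
    """Return a list of per-call wall-clock times (seconds) for fn()."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn()

    times = []
    for _ in range(iters):
        sync()  # make sure earlier asynchronous work has finished
        start = time.perf_counter()
        fn()
        sync()  # wait for the work launched by fn() to finish
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # CPU-only stand-in workload so the sketch runs anywhere.
    times = benchmark(lambda: sum(range(100_000)), warmup=2, iters=5)
    print("mean = %.6f s" % mean(times))
```

With the model from this post, the call would presumably look like `benchmark(lambda: model(imgL, imgR), sync=torch.cuda.synchronize)` inside the `torch.no_grad()` block.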

Results

FP32
time = 0.5656[s]
time = 0.5608[s]
time = 0.5621[s]
time = 0.5624[s]
time = 0.5623[s]
time = 0.5626[s]

FP16
time = 0.7387[s]
time = 0.7365[s]
time = 0.7357[s]
time = 0.7370[s]
time = 0.7365[s]
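To put a number on the gap, the timings listed above can be averaged and compared. This is only a small sketch; the helper name `slowdown` is made up for illustration.

```python
from statistics import mean

# Timings copied from the results above, in seconds.
fp32 = [0.5656, 0.5608, 0.5621, 0.5624, 0.5623, 0.5626]
fp16 = [0.7387, 0.7365, 0.7357, 0.7370, 0.7365]


def slowdown(baseline, candidate):
    """Ratio of mean candidate time to mean baseline time (>1 means slower)."""
    return mean(candidate) / mean(baseline)


ratio = slowdown(fp32, fp16)
print("FP32 mean = %.4f s" % mean(fp32))
print("FP16 mean = %.4f s" % mean(fp16))
print("FP16 / FP32 = %.2f" % ratio)
```

On these numbers the ratio comes out at roughly 1.31, meaning the FP16 run takes about 31% longer than the FP32 run.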

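Conclusion

For some reason, FP16 is slower.
It stays slower no matter how many times I run it, and it is slower with other models as well.
I plan to investigate the cause.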
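The post does not say which GPU was used, so the following is only one possible starting point for that investigation, not a diagnosis. Tensor Cores, which accelerate FP16 matrix math, first appeared with CUDA compute capability 7.0 (Volta), and on older GPUs FP16 may bring no speedup at all. The sketch below prints the device's compute capability; the helper name `has_tensor_cores` is hypothetical, and the torch import is guarded so the script also runs on a machine without PyTorch.

```python
def has_tensor_cores(capability):
    """True if a (major, minor) compute capability is 7.0 (Volta) or newer."""
    major, _minor = capability
    return major >= 7


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability()
        print(torch.cuda.get_device_name(), cap,
              "Tensor Cores:", has_tensor_cores(cap))
        # cuDNN autotuning is off by default; worth noting when comparing runs.
        print("cudnn.benchmark =", torch.backends.cudnn.benchmark)
    else:
        print("CUDA is not available; nothing to inspect.")
```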
