torchsummary can print a model summary for PyTorch, but torchinfo is newer, so here I use it to display the following pre-trained 3D CNNs.
- I3D
- C2D
- X3D-S/M/L
- SlowFast各種
- R(2+1)D
- 3D ResNet
Incidentally, the usual torchinfo option is input_size, but SlowFast takes multiple inputs, so input_data is used for it instead.
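As a rough illustration of why a single shape tuple is not enough for SlowFast, the sketch below builds the two clip shapes the pathways expect. The helper name `slowfast_input_shapes` and the `alpha=4` default are assumptions made for illustration; the ratio is read off the 16/64 and 8/32 frame counts used further down, not taken from the pytorchvideo API.

```python
def slowfast_input_shapes(batch_size, slow_frames, size=224, alpha=4):
    """Return (slow_shape, fast_shape) for a two-pathway clip.

    Hypothetical helper: alpha is the assumed frame-count ratio between
    the fast and the slow pathway.
    """
    slow = (batch_size, 3, slow_frames, size, size)
    fast = (batch_size, 3, slow_frames * alpha, size, size)
    return slow, fast


shapes_16x8 = slowfast_input_shapes(1, 16)
shapes_8x8 = slowfast_input_shapes(1, 8)
print(shapes_16x8)
print(shapes_8x8)
```

Each tuple would be turned into one dummy tensor (for example with torch.zeros), and the pair is then handed over as a nested list through input_data, since input_size alone describes only one tensor per entry.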
I3D
import torch
import torchinfo

# https://github.com/facebookresearch/SlowFast/blob/master/configs/Kinetics/pytorchvideo/I3D_8x8_R50.yaml
model = torch.hub.load('facebookresearch/pytorchvideo', 'i3d_r50', pretrained=True)
batch_size = 1
frames = 8
size = 224
# Input layout is (batch, channels, frames, height, width)
torchinfo.summary(
    model=model,
    input_size=(batch_size, 3, frames, size, size),
    depth=4,
    col_names=["input_size", "output_size"],
    row_settings=("var_names",),
)
Using cache found in /root/.cache/torch/hub/facebookresearch_pytorchvideo_master
Downloading: "https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/I3D_8x8_R50.pyth" to /root/.cache/torch/hub/checkpoints/I3D_8x8_R50.pyth
100%
214M/214M [00:18<00:00, 12.9MB/s]
=========================================================================================================
Layer (type (var_name)) Input Shape Output Shape
=========================================================================================================
Net -- --
├─ModuleList (blocks) -- --
│ └─ResStage (1) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (3) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (4) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (5) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResNetBasicStem (0) [1, 3, 8, 224, 224] [1, 64, 8, 56, 56]
│ │ └─Conv3d (conv) [1, 3, 8, 224, 224] [1, 64, 8, 112, 112]
│ │ └─BatchNorm3d (norm) [1, 64, 8, 112, 112] [1, 64, 8, 112, 112]
│ │ └─ReLU (activation) [1, 64, 8, 112, 112] [1, 64, 8, 112, 112]
│ │ └─MaxPool3d (pool) [1, 64, 8, 112, 112] [1, 64, 8, 56, 56]
│ └─ResStage (1) [1, 64, 8, 56, 56] [1, 256, 8, 56, 56]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 64, 8, 56, 56] [1, 256, 8, 56, 56]
│ │ │ └─ResBlock (1) [1, 256, 8, 56, 56] [1, 256, 8, 56, 56]
│ │ │ └─ResBlock (2) [1, 256, 8, 56, 56] [1, 256, 8, 56, 56]
│ └─MaxPool3d (2) [1, 256, 8, 56, 56] [1, 256, 4, 56, 56]
│ └─ResStage (3) [1, 256, 4, 56, 56] [1, 512, 4, 28, 28]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 256, 4, 56, 56] [1, 512, 4, 28, 28]
│ │ │ └─ResBlock (1) [1, 512, 4, 28, 28] [1, 512, 4, 28, 28]
│ │ │ └─ResBlock (2) [1, 512, 4, 28, 28] [1, 512, 4, 28, 28]
│ │ │ └─ResBlock (3) [1, 512, 4, 28, 28] [1, 512, 4, 28, 28]
│ └─ResStage (4) [1, 512, 4, 28, 28] [1, 1024, 4, 14, 14]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 512, 4, 28, 28] [1, 1024, 4, 14, 14]
│ │ │ └─ResBlock (1) [1, 1024, 4, 14, 14] [1, 1024, 4, 14, 14]
│ │ │ └─ResBlock (2) [1, 1024, 4, 14, 14] [1, 1024, 4, 14, 14]
│ │ │ └─ResBlock (3) [1, 1024, 4, 14, 14] [1, 1024, 4, 14, 14]
│ │ │ └─ResBlock (4) [1, 1024, 4, 14, 14] [1, 1024, 4, 14, 14]
│ │ │ └─ResBlock (5) [1, 1024, 4, 14, 14] [1, 1024, 4, 14, 14]
│ └─ResStage (5) [1, 1024, 4, 14, 14] [1, 2048, 4, 7, 7]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 1024, 4, 14, 14] [1, 2048, 4, 7, 7]
│ │ │ └─ResBlock (1) [1, 2048, 4, 7, 7] [1, 2048, 4, 7, 7]
│ │ │ └─ResBlock (2) [1, 2048, 4, 7, 7] [1, 2048, 4, 7, 7]
│ └─ResNetBasicHead (6) [1, 2048, 4, 7, 7] [1, 400]
│ │ └─AvgPool3d (pool) [1, 2048, 4, 7, 7] [1, 2048, 1, 1, 1]
│ │ └─Dropout (dropout) [1, 2048, 1, 1, 1] [1, 2048, 1, 1, 1]
│ │ └─Linear (proj) [1, 1, 1, 1, 2048] [1, 1, 1, 1, 400]
│ │ └─AdaptiveAvgPool3d (output_pool) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
=========================================================================================================
Total params: 28,043,472
Trainable params: 28,043,472
Non-trainable params: 0
Total mult-adds (G): 28.41
=========================================================================================================
Input size (MB): 4.82
Forward/backward pass size (MB): 1045.27
Params size (MB): 112.17
Estimated Total Size (MB): 1162.26
=========================================================================================================
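The size figures in the footer can be checked by hand. The sketch below assumes float32 values (4 bytes each) and MB meaning 10**6 bytes; both are assumptions that happen to reproduce the reported numbers, not something stated by torchinfo in this output. The forward/backward figure is copied from the summary rather than recomputed.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights and inputs are float32


def megabytes(num_elements, bytes_per_element=BYTES_PER_FLOAT32):
    """Size in MB (10**6 bytes) of a tensor holding num_elements values."""
    return num_elements * bytes_per_element / 1e6


input_elements = 1 * 3 * 8 * 224 * 224  # the input_size used for I3D
total_params = 28_043_472               # "Total params" from the summary

input_mb = round(megabytes(input_elements), 2)
params_mb = round(megabytes(total_params), 2)
# 1045.27 is the forward/backward pass size reported in the summary
total_mb = round(input_mb + 1045.27 + params_mb, 2)
print(input_mb, params_mb, total_mb)
```

Under these assumptions the three values line up with the "Input size", "Params size" and "Estimated Total Size" rows.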
C2D
# https://github.com/facebookresearch/SlowFast/blob/master/configs/Kinetics/pytorchvideo/C2D_8x8_R50.yaml
model = torch.hub.load('facebookresearch/pytorchvideo', 'c2d_r50', pretrained=True)
batch_size = 1
frames = 8
size = 224
torchinfo.summary(
    model=model,
    input_size=(batch_size, 3, frames, size, size),
    depth=4,
    col_names=["input_size", "output_size"],
    row_settings=("var_names",),
)
Using cache found in /root/.cache/torch/hub/facebookresearch_pytorchvideo_master
Downloading: "https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/C2D_8x8_R50.pyth" to /root/.cache/torch/hub/checkpoints/C2D_8x8_R50.pyth
100%
186M/186M [00:16<00:00, 12.6MB/s]
=========================================================================================================
Layer (type (var_name)) Input Shape Output Shape
=========================================================================================================
Net -- --
├─ModuleList (blocks) -- --
│ └─ResStage (1) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (3) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (4) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (5) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResNetBasicStem (0) [1, 3, 8, 224, 224] [1, 64, 8, 56, 56]
│ │ └─Conv3d (conv) [1, 3, 8, 224, 224] [1, 64, 8, 112, 112]
│ │ └─BatchNorm3d (norm) [1, 64, 8, 112, 112] [1, 64, 8, 112, 112]
│ │ └─ReLU (activation) [1, 64, 8, 112, 112] [1, 64, 8, 112, 112]
│ │ └─MaxPool3d (pool) [1, 64, 8, 112, 112] [1, 64, 8, 56, 56]
│ └─ResStage (1) [1, 64, 8, 56, 56] [1, 256, 8, 56, 56]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 64, 8, 56, 56] [1, 256, 8, 56, 56]
│ │ │ └─ResBlock (1) [1, 256, 8, 56, 56] [1, 256, 8, 56, 56]
│ │ │ └─ResBlock (2) [1, 256, 8, 56, 56] [1, 256, 8, 56, 56]
│ └─MaxPool3d (2) [1, 256, 8, 56, 56] [1, 256, 4, 56, 56]
│ └─ResStage (3) [1, 256, 4, 56, 56] [1, 512, 4, 28, 28]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 256, 4, 56, 56] [1, 512, 4, 28, 28]
│ │ │ └─ResBlock (1) [1, 512, 4, 28, 28] [1, 512, 4, 28, 28]
│ │ │ └─ResBlock (2) [1, 512, 4, 28, 28] [1, 512, 4, 28, 28]
│ │ │ └─ResBlock (3) [1, 512, 4, 28, 28] [1, 512, 4, 28, 28]
│ └─ResStage (4) [1, 512, 4, 28, 28] [1, 1024, 4, 14, 14]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 512, 4, 28, 28] [1, 1024, 4, 14, 14]
│ │ │ └─ResBlock (1) [1, 1024, 4, 14, 14] [1, 1024, 4, 14, 14]
│ │ │ └─ResBlock (2) [1, 1024, 4, 14, 14] [1, 1024, 4, 14, 14]
│ │ │ └─ResBlock (3) [1, 1024, 4, 14, 14] [1, 1024, 4, 14, 14]
│ │ │ └─ResBlock (4) [1, 1024, 4, 14, 14] [1, 1024, 4, 14, 14]
│ │ │ └─ResBlock (5) [1, 1024, 4, 14, 14] [1, 1024, 4, 14, 14]
│ └─ResStage (5) [1, 1024, 4, 14, 14] [1, 2048, 4, 7, 7]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 1024, 4, 14, 14] [1, 2048, 4, 7, 7]
│ │ │ └─ResBlock (1) [1, 2048, 4, 7, 7] [1, 2048, 4, 7, 7]
│ │ │ └─ResBlock (2) [1, 2048, 4, 7, 7] [1, 2048, 4, 7, 7]
│ └─ResNetBasicHead (6) [1, 2048, 4, 7, 7] [1, 400]
│ │ └─AvgPool3d (pool) [1, 2048, 4, 7, 7] [1, 2048, 1, 1, 1]
│ │ └─Dropout (dropout) [1, 2048, 1, 1, 1] [1, 2048, 1, 1, 1]
│ │ └─Linear (proj) [1, 1, 1, 1, 2048] [1, 1, 1, 1, 400]
│ │ └─AdaptiveAvgPool3d (output_pool) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
=========================================================================================================
Total params: 24,327,632
Trainable params: 24,327,632
Non-trainable params: 0
Total mult-adds (G): 19.49
=========================================================================================================
Input size (MB): 4.82
Forward/backward pass size (MB): 1045.27
Params size (MB): 97.31
Estimated Total Size (MB): 1147.40
=========================================================================================================
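The shape changes in the stem (224 to 112 to 56) and the temporal pooling (8 frames to 4) follow the usual convolution output-size formula. A minimal sketch is given below; the kernel, stride and padding values are assumptions based on a typical ResNet stem and were not read from the model, they are simply values that reproduce the shapes in the table.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


# Assumed hyperparameters (typical ResNet stem, not read from the model)
after_conv = conv_out(224, kernel=7, stride=2, padding=3)         # Conv3d (conv), spatial
after_pool = conv_out(after_conv, kernel=3, stride=2, padding=1)  # MaxPool3d (pool), spatial
frames_after_pool = conv_out(8, kernel=2, stride=2)               # MaxPool3d (2), temporal
print(after_conv, after_pool, frames_after_pool)
```

The same helper applies to I3D, whose input and output shapes are identical to C2D's in the summaries above.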
SlowFast
SLOWFAST_16x8_R101_50_50
# https://github.com/facebookresearch/SlowFast/blob/master/configs/Kinetics/pytorchvideo/SLOWFAST_16x8_R101_50_50.yaml
model = torch.hub.load('facebookresearch/pytorchvideo', 'slowfast_16x8_r101_50_50', pretrained=True)
batch_size = 1
slow_frames = 16  # slow pathway: fewer frames (first input)
fast_frames = 64  # fast pathway: 4x as many frames (second input)
input_data = [[
    torch.zeros(batch_size, 3, slow_frames, 224, 224),
    torch.zeros(batch_size, 3, fast_frames, 224, 224),
]]
torchinfo.summary(
    model=model,
    input_data=input_data,
    depth=4,
    col_names=["input_size", "output_size"],
    row_settings=("var_names",),
)
Using cache found in /root/.cache/torch/hub/facebookresearch_pytorchvideo_master
Downloading: "https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/SLOWFAST_16x8_R101_50_50.pyth" to /root/.cache/torch/hub/checkpoints/SLOWFAST_16x8_R101_50_50.pyth
100%
411M/411M [00:38<00:00, 11.6MB/s]
===================================================================================================================
Layer (type (var_name)) Input Shape Output Shape
===================================================================================================================
Net -- --
├─ModuleList (blocks) -- --
│ └─MultiPathWayWithFuse (0) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─MultiPathWayWithFuse (1) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─MultiPathWayWithFuse (2) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─MultiPathWayWithFuse (3) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─MultiPathWayWithFuse (4) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─PoolConcatPathway (5) -- --
│ │ └─ModuleList (pool) -- --
│ └─MultiPathWayWithFuse (0) [1, 64, 16, 56, 56] [1, 80, 16, 56, 56]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResNetBasicStem (0) [1, 3, 16, 224, 224] [1, 64, 16, 56, 56]
│ │ │ └─ResNetBasicStem (1) [1, 3, 64, 224, 224] [1, 8, 64, 56, 56]
│ │ └─FuseFastToSlow (multipathway_fusion) [1, 64, 16, 56, 56] [1, 80, 16, 56, 56]
│ │ │ └─Conv3d (conv_fast_to_slow) [1, 8, 64, 56, 56] [1, 16, 16, 56, 56]
│ │ │ └─BatchNorm3d (norm) [1, 16, 16, 56, 56] [1, 16, 16, 56, 56]
│ │ │ └─ReLU (activation) [1, 16, 16, 56, 56] [1, 16, 16, 56, 56]
│ └─MultiPathWayWithFuse (1) [1, 256, 16, 56, 56] [1, 320, 16, 56, 56]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResStage (0) [1, 80, 16, 56, 56] [1, 256, 16, 56, 56]
│ │ │ └─ResStage (1) [1, 8, 64, 56, 56] [1, 32, 64, 56, 56]
│ │ └─FuseFastToSlow (multipathway_fusion) [1, 256, 16, 56, 56] [1, 320, 16, 56, 56]
│ │ │ └─Conv3d (conv_fast_to_slow) [1, 32, 64, 56, 56] [1, 64, 16, 56, 56]
│ │ │ └─BatchNorm3d (norm) [1, 64, 16, 56, 56] [1, 64, 16, 56, 56]
│ │ │ └─ReLU (activation) [1, 64, 16, 56, 56] [1, 64, 16, 56, 56]
│ └─MultiPathWayWithFuse (2) [1, 512, 16, 28, 28] [1, 640, 16, 28, 28]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResStage (0) [1, 320, 16, 56, 56] [1, 512, 16, 28, 28]
│ │ │ └─ResStage (1) [1, 32, 64, 56, 56] [1, 64, 64, 28, 28]
│ │ └─FuseFastToSlow (multipathway_fusion) [1, 512, 16, 28, 28] [1, 640, 16, 28, 28]
│ │ │ └─Conv3d (conv_fast_to_slow) [1, 64, 64, 28, 28] [1, 128, 16, 28, 28]
│ │ │ └─BatchNorm3d (norm) [1, 128, 16, 28, 28] [1, 128, 16, 28, 28]
│ │ │ └─ReLU (activation) [1, 128, 16, 28, 28] [1, 128, 16, 28, 28]
│ └─MultiPathWayWithFuse (3) [1, 1024, 16, 14, 14] [1, 1280, 16, 14, 14]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResStage (0) [1, 640, 16, 28, 28] [1, 1024, 16, 14, 14]
│ │ │ └─ResStage (1) [1, 64, 64, 28, 28] [1, 128, 64, 14, 14]
│ │ └─FuseFastToSlow (multipathway_fusion) [1, 1024, 16, 14, 14] [1, 1280, 16, 14, 14]
│ │ │ └─Conv3d (conv_fast_to_slow) [1, 128, 64, 14, 14] [1, 256, 16, 14, 14]
│ │ │ └─BatchNorm3d (norm) [1, 256, 16, 14, 14] [1, 256, 16, 14, 14]
│ │ │ └─ReLU (activation) [1, 256, 16, 14, 14] [1, 256, 16, 14, 14]
│ └─MultiPathWayWithFuse (4) [1, 2048, 16, 7, 7] [1, 2048, 16, 7, 7]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResStage (0) [1, 1280, 16, 14, 14] [1, 2048, 16, 7, 7]
│ │ │ └─ResStage (1) [1, 128, 64, 14, 14] [1, 256, 64, 7, 7]
│ │ └─Identity (multipathway_fusion) [1, 2048, 16, 7, 7] [1, 2048, 16, 7, 7]
│ └─PoolConcatPathway (5) [1, 2048, 1, 1, 1] [1, 2304, 1, 1, 1]
│ │ └─ModuleList (pool) -- --
│ │ │ └─AvgPool3d (0) [1, 2048, 16, 7, 7] [1, 2048, 1, 1, 1]
│ │ │ └─AvgPool3d (1) [1, 256, 64, 7, 7] [1, 256, 1, 1, 1]
│ └─ResNetBasicHead (6) [1, 2304, 1, 1, 1] [1, 400]
│ │ └─Dropout (dropout) [1, 2304, 1, 1, 1] [1, 2304, 1, 1, 1]
│ │ └─Linear (proj) [1, 1, 1, 1, 2304] [1, 1, 1, 1, 400]
│ │ └─AdaptiveAvgPool3d (output_pool) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
===================================================================================================================
Total params: 53,774,808
Trainable params: 53,774,808
Non-trainable params: 0
Total mult-adds (G): 163.09
===================================================================================================================
Input size (MB): 19.27
Forward/backward pass size (MB): 6335.83
Params size (MB): 215.10
Estimated Total Size (MB): 6570.19
===================================================================================================================
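The channel counts after each FuseFastToSlow block can be reproduced with simple arithmetic: the fast pathway is passed through conv_fast_to_slow and then concatenated onto the slow pathway. The sketch below is only a reading of the table; the factor of 2 is inferred from the rows above (8 to 16, 32 to 64, and so on) and the function name is hypothetical.

```python
def fused_channels(slow_channels, fast_channels, conv_ratio=2):
    """Channels on the slow pathway after fast-to-slow fusion.

    conv_ratio is inferred from the summary table, not taken from
    the pytorchvideo source.
    """
    return slow_channels + conv_ratio * fast_channels


# (slow, fast) output channels of the stem and the first three ResStages
stages = [(64, 8), (256, 32), (512, 64), (1024, 128)]
fused = [fused_channels(s, f) for s, f in stages]

# PoolConcatPathway: the pooled pathways are concatenated without a conv
head_in = 2048 + 256
print(fused, head_in)
```

The last stage uses Identity instead of a fusion block, which is why its slow pathway stays at 2048 channels until the final concatenation.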
SLOWFAST_8x8_R101
# https://github.com/facebookresearch/SlowFast/blob/master/configs/Kinetics/pytorchvideo/SLOWFAST_8x8_R101.yaml
model = torch.hub.load('facebookresearch/pytorchvideo', 'slowfast_r101', pretrained=True)
batch_size = 1
slow_frames = 8   # slow pathway: fewer frames (first input)
fast_frames = 32  # fast pathway: 4x as many frames (second input)
input_data = [[
    torch.zeros(batch_size, 3, slow_frames, 224, 224),
    torch.zeros(batch_size, 3, fast_frames, 224, 224),
]]
torchinfo.summary(
    model=model,
    input_data=input_data,
    depth=4,
    col_names=["input_size", "output_size"],
    row_settings=("var_names",),
)
Using cache found in /root/.cache/torch/hub/facebookresearch_pytorchvideo_master
Downloading: "https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/SLOWFAST_8x8_R101.pyth" to /root/.cache/torch/hub/checkpoints/SLOWFAST_8x8_R101.pyth
100%
480M/480M [00:40<00:00, 12.7MB/s]
===================================================================================================================
Layer (type (var_name)) Input Shape Output Shape
===================================================================================================================
Net -- --
├─ModuleList (blocks) -- --
│ └─MultiPathWayWithFuse (0) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─MultiPathWayWithFuse (1) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─MultiPathWayWithFuse (2) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─MultiPathWayWithFuse (3) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─MultiPathWayWithFuse (4) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─PoolConcatPathway (5) -- --
│ │ └─ModuleList (pool) -- --
│ └─MultiPathWayWithFuse (0) [1, 64, 8, 56, 56] [1, 80, 8, 56, 56]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResNetBasicStem (0) [1, 3, 8, 224, 224] [1, 64, 8, 56, 56]
│ │ │ └─ResNetBasicStem (1) [1, 3, 32, 224, 224] [1, 8, 32, 56, 56]
│ │ └─FuseFastToSlow (multipathway_fusion) [1, 64, 8, 56, 56] [1, 80, 8, 56, 56]
│ │ │ └─Conv3d (conv_fast_to_slow) [1, 8, 32, 56, 56] [1, 16, 8, 56, 56]
│ │ │ └─BatchNorm3d (norm) [1, 16, 8, 56, 56] [1, 16, 8, 56, 56]
│ │ │ └─ReLU (activation) [1, 16, 8, 56, 56] [1, 16, 8, 56, 56]
│ └─MultiPathWayWithFuse (1) [1, 256, 8, 56, 56] [1, 320, 8, 56, 56]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResStage (0) [1, 80, 8, 56, 56] [1, 256, 8, 56, 56]
│ │ │ └─ResStage (1) [1, 8, 32, 56, 56] [1, 32, 32, 56, 56]
│ │ └─FuseFastToSlow (multipathway_fusion) [1, 256, 8, 56, 56] [1, 320, 8, 56, 56]
│ │ │ └─Conv3d (conv_fast_to_slow) [1, 32, 32, 56, 56] [1, 64, 8, 56, 56]
│ │ │ └─BatchNorm3d (norm) [1, 64, 8, 56, 56] [1, 64, 8, 56, 56]
│ │ │ └─ReLU (activation) [1, 64, 8, 56, 56] [1, 64, 8, 56, 56]
│ └─MultiPathWayWithFuse (2) [1, 512, 8, 28, 28] [1, 640, 8, 28, 28]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResStage (0) [1, 320, 8, 56, 56] [1, 512, 8, 28, 28]
│ │ │ └─ResStage (1) [1, 32, 32, 56, 56] [1, 64, 32, 28, 28]
│ │ └─FuseFastToSlow (multipathway_fusion) [1, 512, 8, 28, 28] [1, 640, 8, 28, 28]
│ │ │ └─Conv3d (conv_fast_to_slow) [1, 64, 32, 28, 28] [1, 128, 8, 28, 28]
│ │ │ └─BatchNorm3d (norm) [1, 128, 8, 28, 28] [1, 128, 8, 28, 28]
│ │ │ └─ReLU (activation) [1, 128, 8, 28, 28] [1, 128, 8, 28, 28]
│ └─MultiPathWayWithFuse (3) [1, 1024, 8, 14, 14] [1, 1280, 8, 14, 14]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResStage (0) [1, 640, 8, 28, 28] [1, 1024, 8, 14, 14]
│ │ │ └─ResStage (1) [1, 64, 32, 28, 28] [1, 128, 32, 14, 14]
│ │ └─FuseFastToSlow (multipathway_fusion) [1, 1024, 8, 14, 14] [1, 1280, 8, 14, 14]
│ │ │ └─Conv3d (conv_fast_to_slow) [1, 128, 32, 14, 14] [1, 256, 8, 14, 14]
│ │ │ └─BatchNorm3d (norm) [1, 256, 8, 14, 14] [1, 256, 8, 14, 14]
│ │ │ └─ReLU (activation) [1, 256, 8, 14, 14] [1, 256, 8, 14, 14]
│ └─MultiPathWayWithFuse (4) [1, 2048, 8, 7, 7] [1, 2048, 8, 7, 7]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResStage (0) [1, 1280, 8, 14, 14] [1, 2048, 8, 7, 7]
│ │ │ └─ResStage (1) [1, 128, 32, 14, 14] [1, 256, 32, 7, 7]
│ │ └─Identity (multipathway_fusion) [1, 2048, 8, 7, 7] [1, 2048, 8, 7, 7]
│ └─PoolConcatPathway (5) [1, 2048, 1, 1, 1] [1, 2304, 1, 1, 1]
│ │ └─ModuleList (pool) -- --
│ │ │ └─AvgPool3d (0) [1, 2048, 8, 7, 7] [1, 2048, 1, 1, 1]
│ │ │ └─AvgPool3d (1) [1, 256, 32, 7, 7] [1, 256, 1, 1, 1]
│ └─ResNetBasicHead (6) [1, 2304, 1, 1, 1] [1, 400]
│ │ └─Dropout (dropout) [1, 2304, 1, 1, 1] [1, 2304, 1, 1, 1]
│ │ └─Linear (proj) [1, 1, 1, 1, 2304] [1, 1, 1, 1, 400]
│ │ └─AdaptiveAvgPool3d (output_pool) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
===================================================================================================================
Total params: 62,826,968
Trainable params: 62,826,968
Non-trainable params: 0
Total mult-adds (G): 96.40
===================================================================================================================
Input size (MB): 9.63
Forward/backward pass size (MB): 3167.92
Params size (MB): 251.31
Estimated Total Size (MB): 3428.86
===================================================================================================================
SLOWFAST_8x8_R50
# https://github.com/facebookresearch/SlowFast/blob/master/configs/Kinetics/pytorchvideo/SLOWFAST_8x8_R50.yaml
model = torch.hub.load('facebookresearch/pytorchvideo', 'slowfast_r50', pretrained=True)
batch_size = 1
slow_frames = 8   # slow pathway: fewer frames (first input)
fast_frames = 32  # fast pathway: 4x as many frames (second input)
input_data = [[
    torch.zeros(batch_size, 3, slow_frames, 224, 224),
    torch.zeros(batch_size, 3, fast_frames, 224, 224),
]]
torchinfo.summary(
    model=model,
    input_data=input_data,
    depth=4,
    col_names=["input_size", "output_size"],
    row_settings=("var_names",),
)
Using cache found in /root/.cache/torch/hub/facebookresearch_pytorchvideo_master
Downloading: "https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/SLOWFAST_8x8_R50.pyth" to /root/.cache/torch/hub/checkpoints/SLOWFAST_8x8_R50.pyth
100%
264M/264M [00:36<00:00, 12.9MB/s]
===================================================================================================================
Layer (type (var_name)) Input Shape Output Shape
===================================================================================================================
Net -- --
├─ModuleList (blocks) -- --
│ └─MultiPathWayWithFuse (0) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─MultiPathWayWithFuse (1) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─MultiPathWayWithFuse (2) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─MultiPathWayWithFuse (3) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─MultiPathWayWithFuse (4) -- --
│ │ └─ModuleList (multipathway_blocks) -- --
│ └─PoolConcatPathway (5) -- --
│ │ └─ModuleList (pool) -- --
│ └─MultiPathWayWithFuse (0) [1, 64, 8, 56, 56] [1, 80, 8, 56, 56]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResNetBasicStem (0) [1, 3, 8, 224, 224] [1, 64, 8, 56, 56]
│ │ │ └─ResNetBasicStem (1) [1, 3, 32, 224, 224] [1, 8, 32, 56, 56]
│ │ └─FuseFastToSlow (multipathway_fusion) [1, 64, 8, 56, 56] [1, 80, 8, 56, 56]
│ │ │ └─Conv3d (conv_fast_to_slow) [1, 8, 32, 56, 56] [1, 16, 8, 56, 56]
│ │ │ └─BatchNorm3d (norm) [1, 16, 8, 56, 56] [1, 16, 8, 56, 56]
│ │ │ └─ReLU (activation) [1, 16, 8, 56, 56] [1, 16, 8, 56, 56]
│ └─MultiPathWayWithFuse (1) [1, 256, 8, 56, 56] [1, 320, 8, 56, 56]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResStage (0) [1, 80, 8, 56, 56] [1, 256, 8, 56, 56]
│ │ │ └─ResStage (1) [1, 8, 32, 56, 56] [1, 32, 32, 56, 56]
│ │ └─FuseFastToSlow (multipathway_fusion) [1, 256, 8, 56, 56] [1, 320, 8, 56, 56]
│ │ │ └─Conv3d (conv_fast_to_slow) [1, 32, 32, 56, 56] [1, 64, 8, 56, 56]
│ │ │ └─BatchNorm3d (norm) [1, 64, 8, 56, 56] [1, 64, 8, 56, 56]
│ │ │ └─ReLU (activation) [1, 64, 8, 56, 56] [1, 64, 8, 56, 56]
│ └─MultiPathWayWithFuse (2) [1, 512, 8, 28, 28] [1, 640, 8, 28, 28]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResStage (0) [1, 320, 8, 56, 56] [1, 512, 8, 28, 28]
│ │ │ └─ResStage (1) [1, 32, 32, 56, 56] [1, 64, 32, 28, 28]
│ │ └─FuseFastToSlow (multipathway_fusion) [1, 512, 8, 28, 28] [1, 640, 8, 28, 28]
│ │ │ └─Conv3d (conv_fast_to_slow) [1, 64, 32, 28, 28] [1, 128, 8, 28, 28]
│ │ │ └─BatchNorm3d (norm) [1, 128, 8, 28, 28] [1, 128, 8, 28, 28]
│ │ │ └─ReLU (activation) [1, 128, 8, 28, 28] [1, 128, 8, 28, 28]
│ └─MultiPathWayWithFuse (3) [1, 1024, 8, 14, 14] [1, 1280, 8, 14, 14]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResStage (0) [1, 640, 8, 28, 28] [1, 1024, 8, 14, 14]
│ │ │ └─ResStage (1) [1, 64, 32, 28, 28] [1, 128, 32, 14, 14]
│ │ └─FuseFastToSlow (multipathway_fusion) [1, 1024, 8, 14, 14] [1, 1280, 8, 14, 14]
│ │ │ └─Conv3d (conv_fast_to_slow) [1, 128, 32, 14, 14] [1, 256, 8, 14, 14]
│ │ │ └─BatchNorm3d (norm) [1, 256, 8, 14, 14] [1, 256, 8, 14, 14]
│ │ │ └─ReLU (activation) [1, 256, 8, 14, 14] [1, 256, 8, 14, 14]
│ └─MultiPathWayWithFuse (4) [1, 2048, 8, 7, 7] [1, 2048, 8, 7, 7]
│ │ └─ModuleList (multipathway_blocks) -- --
│ │ │ └─ResStage (0) [1, 1280, 8, 14, 14] [1, 2048, 8, 7, 7]
│ │ │ └─ResStage (1) [1, 128, 32, 14, 14] [1, 256, 32, 7, 7]
│ │ └─Identity (multipathway_fusion) [1, 2048, 8, 7, 7] [1, 2048, 8, 7, 7]
│ └─PoolConcatPathway (5) [1, 2048, 1, 1, 1] [1, 2304, 1, 1, 1]
│ │ └─ModuleList (pool) -- --
│ │ │ └─AvgPool3d (0) [1, 2048, 8, 7, 7] [1, 2048, 1, 1, 1]
│ │ │ └─AvgPool3d (1) [1, 256, 32, 7, 7] [1, 256, 1, 1, 1]
│ └─ResNetBasicHead (6) [1, 2304, 1, 1, 1] [1, 400]
│ │ └─Dropout (dropout) [1, 2304, 1, 1, 1] [1, 2304, 1, 1, 1]
│ │ └─Linear (proj) [1, 1, 1, 1, 2304] [1, 1, 1, 1, 400]
│ │ └─AdaptiveAvgPool3d (output_pool) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
===================================================================================================================
Total params: 34,566,488
Trainable params: 34,566,488
Non-trainable params: 0
Total mult-adds (G): 50.31
===================================================================================================================
Input size (MB): 9.63
Forward/backward pass size (MB): 2185.27
Params size (MB): 138.27
Estimated Total Size (MB): 2333.17
===================================================================================================================
X3D
X3D-S
# https://github.com/facebookresearch/SlowFast/blob/master/configs/Kinetics/pytorchvideo/X3D_S.yaml
model = torch.hub.load('facebookresearch/pytorchvideo', 'x3d_s', pretrained=True)
batch_size = 1
frames = 13
torchinfo.summary(
    model=model,
    input_size=(batch_size, 3, frames, 160, 160),
    depth=4,
    col_names=["input_size", "output_size"],
    row_settings=("var_names",),
)
Using cache found in /root/.cache/torch/hub/facebookresearch_pytorchvideo_master
Downloading: "https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/X3D_S.pyth" to /root/.cache/torch/hub/checkpoints/X3D_S.pyth
100%
29.4M/29.4M [00:03<00:00, 11.6MB/s]
==============================================================================================================
Layer (type (var_name)) Input Shape Output Shape
==============================================================================================================
Net -- --
├─ModuleList (blocks) -- --
│ └─ResStage (1) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (2) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (3) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (4) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResNetBasicStem (0) [1, 3, 13, 160, 160] [1, 24, 13, 80, 80]
│ │ └─Conv2plus1d (conv) [1, 3, 13, 160, 160] [1, 24, 13, 80, 80]
│ │ │ └─Conv3d (conv_t) [1, 3, 13, 160, 160] [1, 24, 13, 80, 80]
│ │ │ └─Conv3d (conv_xy) [1, 24, 13, 80, 80] [1, 24, 13, 80, 80]
│ │ └─BatchNorm3d (norm) [1, 24, 13, 80, 80] [1, 24, 13, 80, 80]
│ │ └─ReLU (activation) [1, 24, 13, 80, 80] [1, 24, 13, 80, 80]
│ └─ResStage (1) [1, 24, 13, 80, 80] [1, 24, 13, 40, 40]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 24, 13, 80, 80] [1, 24, 13, 40, 40]
│ │ │ └─ResBlock (1) [1, 24, 13, 40, 40] [1, 24, 13, 40, 40]
│ │ │ └─ResBlock (2) [1, 24, 13, 40, 40] [1, 24, 13, 40, 40]
│ └─ResStage (2) [1, 24, 13, 40, 40] [1, 48, 13, 20, 20]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 24, 13, 40, 40] [1, 48, 13, 20, 20]
│ │ │ └─ResBlock (1) [1, 48, 13, 20, 20] [1, 48, 13, 20, 20]
│ │ │ └─ResBlock (2) [1, 48, 13, 20, 20] [1, 48, 13, 20, 20]
│ │ │ └─ResBlock (3) [1, 48, 13, 20, 20] [1, 48, 13, 20, 20]
│ │ │ └─ResBlock (4) [1, 48, 13, 20, 20] [1, 48, 13, 20, 20]
│ └─ResStage (3) [1, 48, 13, 20, 20] [1, 96, 13, 10, 10]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 48, 13, 20, 20] [1, 96, 13, 10, 10]
│ │ │ └─ResBlock (1) [1, 96, 13, 10, 10] [1, 96, 13, 10, 10]
│ │ │ └─ResBlock (2) [1, 96, 13, 10, 10] [1, 96, 13, 10, 10]
│ │ │ └─ResBlock (3) [1, 96, 13, 10, 10] [1, 96, 13, 10, 10]
│ │ │ └─ResBlock (4) [1, 96, 13, 10, 10] [1, 96, 13, 10, 10]
│ │ │ └─ResBlock (5) [1, 96, 13, 10, 10] [1, 96, 13, 10, 10]
│ │ │ └─ResBlock (6) [1, 96, 13, 10, 10] [1, 96, 13, 10, 10]
│ │ │ └─ResBlock (7) [1, 96, 13, 10, 10] [1, 96, 13, 10, 10]
│ │ │ └─ResBlock (8) [1, 96, 13, 10, 10] [1, 96, 13, 10, 10]
│ │ │ └─ResBlock (9) [1, 96, 13, 10, 10] [1, 96, 13, 10, 10]
│ │ │ └─ResBlock (10) [1, 96, 13, 10, 10] [1, 96, 13, 10, 10]
│ └─ResStage (4) [1, 96, 13, 10, 10] [1, 192, 13, 5, 5]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 96, 13, 10, 10] [1, 192, 13, 5, 5]
│ │ │ └─ResBlock (1) [1, 192, 13, 5, 5] [1, 192, 13, 5, 5]
│ │ │ └─ResBlock (2) [1, 192, 13, 5, 5] [1, 192, 13, 5, 5]
│ │ │ └─ResBlock (3) [1, 192, 13, 5, 5] [1, 192, 13, 5, 5]
│ │ │ └─ResBlock (4) [1, 192, 13, 5, 5] [1, 192, 13, 5, 5]
│ │ │ └─ResBlock (5) [1, 192, 13, 5, 5] [1, 192, 13, 5, 5]
│ │ │ └─ResBlock (6) [1, 192, 13, 5, 5] [1, 192, 13, 5, 5]
│ └─ResNetBasicHead (5) [1, 192, 13, 5, 5] [1, 400]
│ │ └─ProjectedPool (pool) [1, 192, 13, 5, 5] [1, 2048, 1, 1, 1]
│ │ │ └─Conv3d (pre_conv) [1, 192, 13, 5, 5] [1, 432, 13, 5, 5]
│ │ │ └─BatchNorm3d (pre_norm) [1, 432, 13, 5, 5] [1, 432, 13, 5, 5]
│ │ │ └─ReLU (pre_act) [1, 432, 13, 5, 5] [1, 432, 13, 5, 5]
│ │ │ └─AvgPool3d (pool) [1, 432, 13, 5, 5] [1, 432, 1, 1, 1]
│ │ │ └─Conv3d (post_conv) [1, 432, 1, 1, 1] [1, 2048, 1, 1, 1]
│ │ │ └─ReLU (post_act) [1, 2048, 1, 1, 1] [1, 2048, 1, 1, 1]
│ │ └─Dropout (dropout) [1, 2048, 1, 1, 1] [1, 2048, 1, 1, 1]
│ │ └─Linear (proj) [1, 1, 1, 1, 2048] [1, 1, 1, 1, 400]
│ │ └─Softmax (activation) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
│ │ └─AdaptiveAvgPool3d (output_pool) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
==============================================================================================================
Total params: 3,794,274
Trainable params: 3,794,274
Non-trainable params: 0
Total mult-adds (G): 1.96
==============================================================================================================
Input size (MB): 3.99
Forward/backward pass size (MB): 563.15
Params size (MB): 15.18
Estimated Total Size (MB): 582.32
==============================================================================================================
X3D-M
# https://github.com/facebookresearch/SlowFast/blob/master/configs/Kinetics/pytorchvideo/X3D_M.yaml
model = torch.hub.load('facebookresearch/pytorchvideo', 'x3d_m', pretrained=True)
batch_size = 1
frames = 16
torchinfo.summary(
    model=model,
    input_size=(batch_size, 3, frames, 224, 224),
    depth=4,
    col_names=["input_size", "output_size"],
    row_settings=("var_names",),
)
Using cache found in /root/.cache/torch/hub/facebookresearch_pytorchvideo_master
Downloading: "https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/X3D_M.pyth" to /root/.cache/torch/hub/checkpoints/X3D_M.pyth
100%
29.4M/29.4M [00:03<00:00, 12.2MB/s]
==============================================================================================================
Layer (type (var_name)) Input Shape Output Shape
==============================================================================================================
Net -- --
├─ModuleList (blocks) -- --
│ └─ResStage (1) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (2) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (3) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (4) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResNetBasicStem (0) [1, 3, 16, 224, 224] [1, 24, 16, 112, 112]
│ │ └─Conv2plus1d (conv) [1, 3, 16, 224, 224] [1, 24, 16, 112, 112]
│ │ │ └─Conv3d (conv_t) [1, 3, 16, 224, 224] [1, 24, 16, 112, 112]
│ │ │ └─Conv3d (conv_xy) [1, 24, 16, 112, 112] [1, 24, 16, 112, 112]
│ │ └─BatchNorm3d (norm) [1, 24, 16, 112, 112] [1, 24, 16, 112, 112]
│ │ └─ReLU (activation) [1, 24, 16, 112, 112] [1, 24, 16, 112, 112]
│ └─ResStage (1) [1, 24, 16, 112, 112] [1, 24, 16, 56, 56]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 24, 16, 112, 112] [1, 24, 16, 56, 56]
│ │ │ └─ResBlock (1) [1, 24, 16, 56, 56] [1, 24, 16, 56, 56]
│ │ │ └─ResBlock (2) [1, 24, 16, 56, 56] [1, 24, 16, 56, 56]
│ └─ResStage (2) [1, 24, 16, 56, 56] [1, 48, 16, 28, 28]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 24, 16, 56, 56] [1, 48, 16, 28, 28]
│ │ │ └─ResBlock (1) [1, 48, 16, 28, 28] [1, 48, 16, 28, 28]
│ │ │ └─ResBlock (2) [1, 48, 16, 28, 28] [1, 48, 16, 28, 28]
│ │ │ └─ResBlock (3) [1, 48, 16, 28, 28] [1, 48, 16, 28, 28]
│ │ │ └─ResBlock (4) [1, 48, 16, 28, 28] [1, 48, 16, 28, 28]
│ └─ResStage (3) [1, 48, 16, 28, 28] [1, 96, 16, 14, 14]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 48, 16, 28, 28] [1, 96, 16, 14, 14]
│ │ │ └─ResBlock (1) [1, 96, 16, 14, 14] [1, 96, 16, 14, 14]
│ │ │ └─ResBlock (2) [1, 96, 16, 14, 14] [1, 96, 16, 14, 14]
│ │ │ └─ResBlock (3) [1, 96, 16, 14, 14] [1, 96, 16, 14, 14]
│ │ │ └─ResBlock (4) [1, 96, 16, 14, 14] [1, 96, 16, 14, 14]
│ │ │ └─ResBlock (5) [1, 96, 16, 14, 14] [1, 96, 16, 14, 14]
│ │ │ └─ResBlock (6) [1, 96, 16, 14, 14] [1, 96, 16, 14, 14]
│ │ │ └─ResBlock (7) [1, 96, 16, 14, 14] [1, 96, 16, 14, 14]
│ │ │ └─ResBlock (8) [1, 96, 16, 14, 14] [1, 96, 16, 14, 14]
│ │ │ └─ResBlock (9) [1, 96, 16, 14, 14] [1, 96, 16, 14, 14]
│ │ │ └─ResBlock (10) [1, 96, 16, 14, 14] [1, 96, 16, 14, 14]
│ └─ResStage (4) [1, 96, 16, 14, 14] [1, 192, 16, 7, 7]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 96, 16, 14, 14] [1, 192, 16, 7, 7]
│ │ │ └─ResBlock (1) [1, 192, 16, 7, 7] [1, 192, 16, 7, 7]
│ │ │ └─ResBlock (2) [1, 192, 16, 7, 7] [1, 192, 16, 7, 7]
│ │ │ └─ResBlock (3) [1, 192, 16, 7, 7] [1, 192, 16, 7, 7]
│ │ │ └─ResBlock (4) [1, 192, 16, 7, 7] [1, 192, 16, 7, 7]
│ │ │ └─ResBlock (5) [1, 192, 16, 7, 7] [1, 192, 16, 7, 7]
│ │ │ └─ResBlock (6) [1, 192, 16, 7, 7] [1, 192, 16, 7, 7]
│ └─ResNetBasicHead (5) [1, 192, 16, 7, 7] [1, 400]
│ │ └─ProjectedPool (pool) [1, 192, 16, 7, 7] [1, 2048, 1, 1, 1]
│ │ │ └─Conv3d (pre_conv) [1, 192, 16, 7, 7] [1, 432, 16, 7, 7]
│ │ │ └─BatchNorm3d (pre_norm) [1, 432, 16, 7, 7] [1, 432, 16, 7, 7]
│ │ │ └─ReLU (pre_act) [1, 432, 16, 7, 7] [1, 432, 16, 7, 7]
│ │ │ └─AvgPool3d (pool) [1, 432, 16, 7, 7] [1, 432, 1, 1, 1]
│ │ │ └─Conv3d (post_conv) [1, 432, 1, 1, 1] [1, 2048, 1, 1, 1]
│ │ │ └─ReLU (post_act) [1, 2048, 1, 1, 1] [1, 2048, 1, 1, 1]
│ │ └─Dropout (dropout) [1, 2048, 1, 1, 1] [1, 2048, 1, 1, 1]
│ │ └─Linear (proj) [1, 1, 1, 1, 2048] [1, 1, 1, 1, 400]
│ │ └─Softmax (activation) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
│ │ └─AdaptiveAvgPool3d (output_pool) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
==============================================================================================================
Total params: 3,794,274
Trainable params: 3,794,274
Non-trainable params: 0
Total mult-adds (G): 4.73
==============================================================================================================
Input size (MB): 9.63
Forward/backward pass size (MB): 1358.41
Params size (MB): 15.18
Estimated Total Size (MB): 1383.22
==============================================================================================================
X3D-L
# https://github.com/facebookresearch/SlowFast/blob/master/configs/Kinetics/pytorchvideo/X3D_L.yaml
model = torch.hub.load('facebookresearch/pytorchvideo', 'x3d_l', pretrained=True)
batch_size = 1
frames = 16
size = 312
torchinfo.summary(
    model=model,
    input_size=(batch_size, 3, frames, size, size),
    depth=4,
    col_names=["input_size", "output_size"],
    row_settings=("var_names",),
)
Downloading: "https://github.com/facebookresearch/pytorchvideo/archive/master.zip" to /root/.cache/torch/hub/master.zip
Downloading: "https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/X3D_L.pyth" to /root/.cache/torch/hub/checkpoints/X3D_L.pyth
100%
47.7M/47.7M [00:04<00:00, 12.9MB/s]
==============================================================================================================
Layer (type (var_name)) Input Shape Output Shape
==============================================================================================================
Net -- --
├─ModuleList (blocks) -- --
│ └─ResStage (1) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (2) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (3) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (4) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResNetBasicStem (0) [1, 3, 16, 312, 312] [1, 24, 16, 156, 156]
│ │ └─Conv2plus1d (conv) [1, 3, 16, 312, 312] [1, 24, 16, 156, 156]
│ │ │ └─Conv3d (conv_t) [1, 3, 16, 312, 312] [1, 24, 16, 156, 156]
│ │ │ └─Conv3d (conv_xy) [1, 24, 16, 156, 156] [1, 24, 16, 156, 156]
│ │ └─BatchNorm3d (norm) [1, 24, 16, 156, 156] [1, 24, 16, 156, 156]
│ │ └─ReLU (activation) [1, 24, 16, 156, 156] [1, 24, 16, 156, 156]
│ └─ResStage (1) [1, 24, 16, 156, 156] [1, 24, 16, 78, 78]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 24, 16, 156, 156] [1, 24, 16, 78, 78]
│ │ │ └─ResBlock (1) [1, 24, 16, 78, 78] [1, 24, 16, 78, 78]
│ │ │ └─ResBlock (2) [1, 24, 16, 78, 78] [1, 24, 16, 78, 78]
│ │ │ └─ResBlock (3) [1, 24, 16, 78, 78] [1, 24, 16, 78, 78]
│ │ │ └─ResBlock (4) [1, 24, 16, 78, 78] [1, 24, 16, 78, 78]
│ └─ResStage (2) [1, 24, 16, 78, 78] [1, 48, 16, 39, 39]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 24, 16, 78, 78] [1, 48, 16, 39, 39]
│ │ │ └─ResBlock (1) [1, 48, 16, 39, 39] [1, 48, 16, 39, 39]
│ │ │ └─ResBlock (2) [1, 48, 16, 39, 39] [1, 48, 16, 39, 39]
│ │ │ └─ResBlock (3) [1, 48, 16, 39, 39] [1, 48, 16, 39, 39]
│ │ │ └─ResBlock (4) [1, 48, 16, 39, 39] [1, 48, 16, 39, 39]
│ │ │ └─ResBlock (5) [1, 48, 16, 39, 39] [1, 48, 16, 39, 39]
│ │ │ └─ResBlock (6) [1, 48, 16, 39, 39] [1, 48, 16, 39, 39]
│ │ │ └─ResBlock (7) [1, 48, 16, 39, 39] [1, 48, 16, 39, 39]
│ │ │ └─ResBlock (8) [1, 48, 16, 39, 39] [1, 48, 16, 39, 39]
│ │ │ └─ResBlock (9) [1, 48, 16, 39, 39] [1, 48, 16, 39, 39]
│ └─ResStage (3) [1, 48, 16, 39, 39] [1, 96, 16, 20, 20]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 48, 16, 39, 39] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (1) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (2) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (3) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (4) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (5) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (6) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (7) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (8) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (9) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (10) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (11) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (12) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (13) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (14) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (15) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (16) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (17) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (18) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (19) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (20) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (21) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (22) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (23) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ │ │ └─ResBlock (24) [1, 96, 16, 20, 20] [1, 96, 16, 20, 20]
│ └─ResStage (4) [1, 96, 16, 20, 20] [1, 192, 16, 10, 10]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 96, 16, 20, 20] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (1) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (2) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (3) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (4) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (5) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (6) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (7) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (8) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (9) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (10) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (11) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (12) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (13) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ │ │ └─ResBlock (14) [1, 192, 16, 10, 10] [1, 192, 16, 10, 10]
│ └─ResNetBasicHead (5) [1, 192, 16, 10, 10] [1, 400]
│ │ └─ProjectedPool (pool) [1, 192, 16, 10, 10] [1, 2048, 1, 1, 1]
│ │ │ └─Conv3d (pre_conv) [1, 192, 16, 10, 10] [1, 432, 16, 10, 10]
│ │ │ └─BatchNorm3d (pre_norm) [1, 432, 16, 10, 10] [1, 432, 16, 10, 10]
│ │ │ └─ReLU (pre_act) [1, 432, 16, 10, 10] [1, 432, 16, 10, 10]
│ │ │ └─AvgPool3d (pool) [1, 432, 16, 10, 10] [1, 432, 1, 1, 1]
│ │ │ └─Conv3d (post_conv) [1, 432, 1, 1, 1] [1, 2048, 1, 1, 1]
│ │ │ └─ReLU (post_act) [1, 2048, 1, 1, 1] [1, 2048, 1, 1, 1]
│ │ └─Dropout (dropout) [1, 2048, 1, 1, 1] [1, 2048, 1, 1, 1]
│ │ └─Linear (proj) [1, 1, 1, 1, 2048] [1, 1, 1, 1, 400]
│ │ └─Softmax (activation) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
│ │ └─AdaptiveAvgPool3d (output_pool) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
==============================================================================================================
Total params: 6,153,384
Trainable params: 6,153,384
Non-trainable params: 0
Total mult-adds (G): 18.37
==============================================================================================================
Input size (MB): 18.69
Forward/backward pass size (MB): 4574.27
Params size (MB): 24.61
Estimated Total Size (MB): 4617.58
==============================================================================================================
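The spatial sizes in the X3D-L summary (312, 156, 78, 39, 20, 10) can be reproduced with the standard convolution output-size formula. The sketch below assumes each reduction uses kernel 3, stride 2 and padding 1 along the spatial axes; those values are an assumption chosen to match the printed shapes, not read from the model, and `conv_out` and `spatial_sizes` are hypothetical helper names.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of a convolution along one axis (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def spatial_sizes(size: int, steps: int) -> list[int]:
    """Apply `steps` stride-2 reductions and record the size after each one."""
    sizes = [size]
    for _ in range(steps):
        size = conv_out(size)
        sizes.append(size)
    return sizes


# Stem plus four residual stages: five stride-2 reductions in total (assumed).
x3d_l = spatial_sizes(312, steps=5)
x3d_m = spatial_sizes(224, steps=5)
print(x3d_l)
print(x3d_m)
```

Note the odd size 39: with padding 1 the formula gives 20 rather than 19, which is why ResStage (3) reports 20x20 feature maps.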
R(2+1)D
# https://github.com/facebookresearch/SlowFast/blob/master/configs/Kinetics/pytorchvideo/R2PLUS1D_16x4_R50.yaml
model = torch.hub.load('facebookresearch/pytorchvideo', 'r2plus1d_r50', pretrained=True)
batch_size = 1
frames = 16
size = 224
torchinfo.summary(
    model=model,
    input_size=(batch_size, 3, frames, size, size),
    depth=4,
    col_names=["input_size", "output_size"],
    row_settings=("var_names",),
)
Using cache found in /root/.cache/torch/hub/facebookresearch_pytorchvideo_master
Downloading: "https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/R2PLUS1D_16x4_R50.pyth" to /root/.cache/torch/hub/checkpoints/R2PLUS1D_16x4_R50.pyth
100%
215M/215M [00:18<00:00, 13.2MB/s]
=========================================================================================================
Layer (type (var_name)) Input Shape Output Shape
=========================================================================================================
Net -- --
├─ModuleList (blocks) -- --
│ └─ResStage (1) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (2) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (3) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (4) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResNetBasicStem (0) [1, 3, 16, 224, 224] [1, 64, 16, 112, 112]
│ │ └─Conv3d (conv) [1, 3, 16, 224, 224] [1, 64, 16, 112, 112]
│ │ └─BatchNorm3d (norm) [1, 64, 16, 112, 112] [1, 64, 16, 112, 112]
│ │ └─ReLU (activation) [1, 64, 16, 112, 112] [1, 64, 16, 112, 112]
│ └─ResStage (1) [1, 64, 16, 112, 112] [1, 256, 16, 56, 56]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 64, 16, 112, 112] [1, 256, 16, 56, 56]
│ │ │ └─ResBlock (1) [1, 256, 16, 56, 56] [1, 256, 16, 56, 56]
│ │ │ └─ResBlock (2) [1, 256, 16, 56, 56] [1, 256, 16, 56, 56]
│ └─ResStage (2) [1, 256, 16, 56, 56] [1, 512, 16, 28, 28]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 256, 16, 56, 56] [1, 512, 16, 28, 28]
│ │ │ └─ResBlock (1) [1, 512, 16, 28, 28] [1, 512, 16, 28, 28]
│ │ │ └─ResBlock (2) [1, 512, 16, 28, 28] [1, 512, 16, 28, 28]
│ │ │ └─ResBlock (3) [1, 512, 16, 28, 28] [1, 512, 16, 28, 28]
│ └─ResStage (3) [1, 512, 16, 28, 28] [1, 1024, 8, 14, 14]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 512, 16, 28, 28] [1, 1024, 8, 14, 14]
│ │ │ └─ResBlock (1) [1, 1024, 8, 14, 14] [1, 1024, 8, 14, 14]
│ │ │ └─ResBlock (2) [1, 1024, 8, 14, 14] [1, 1024, 8, 14, 14]
│ │ │ └─ResBlock (3) [1, 1024, 8, 14, 14] [1, 1024, 8, 14, 14]
│ │ │ └─ResBlock (4) [1, 1024, 8, 14, 14] [1, 1024, 8, 14, 14]
│ │ │ └─ResBlock (5) [1, 1024, 8, 14, 14] [1, 1024, 8, 14, 14]
│ └─ResStage (4) [1, 1024, 8, 14, 14] [1, 2048, 4, 7, 7]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 1024, 8, 14, 14] [1, 2048, 4, 7, 7]
│ │ │ └─ResBlock (1) [1, 2048, 4, 7, 7] [1, 2048, 4, 7, 7]
│ │ │ └─ResBlock (2) [1, 2048, 4, 7, 7] [1, 2048, 4, 7, 7]
│ └─ResNetBasicHead (5) [1, 2048, 4, 7, 7] [1, 400]
│ │ └─AvgPool3d (pool) [1, 2048, 4, 7, 7] [1, 2048, 1, 1, 1]
│ │ └─Dropout (dropout) [1, 2048, 1, 1, 1] [1, 2048, 1, 1, 1]
│ │ └─Linear (proj) [1, 1, 1, 1, 2048] [1, 1, 1, 1, 400]
│ │ └─Softmax (activation) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
│ │ └─AdaptiveAvgPool3d (output_pool) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
=========================================================================================================
Total params: 28,107,600
Trainable params: 28,107,600
Non-trainable params: 0
Total mult-adds (G): 57.53
=========================================================================================================
Input size (MB): 9.63
Forward/backward pass size (MB): 3190.39
Params size (MB): 112.43
Estimated Total Size (MB): 3312.46
=========================================================================================================
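In the head, the Linear (proj) row reports a channels-last shape [1, 1, 1, 1, 2048], while the rows around it are channels-first. This is consistent with the head moving the channel axis to the end before the linear layer and moving it back afterwards. The sketch below only tracks shapes with plain tuples; the permutation orders are an assumption inferred from the printed shapes, and `permute` is a hypothetical helper.

```python
def permute(shape: tuple, order: tuple) -> tuple:
    """Reorder the axes of a shape tuple, like Tensor.permute does for tensors."""
    return tuple(shape[i] for i in order)


pooled = (1, 2048, 1, 1, 1)                        # (N, C, T, H, W) after pool and dropout
channels_last = permute(pooled, (0, 2, 3, 4, 1))   # (N, T, H, W, C): input of Linear
projected = channels_last[:-1] + (400,)            # Linear maps the last axis 2048 -> 400
channels_first = permute(projected, (0, 4, 1, 2, 3))  # back to (N, C, T, H, W)

print(channels_last, projected, channels_first)
```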
3D ResNet
slow_r50 is a ResNet-50.
# https://github.com/facebookresearch/SlowFast/blob/master/configs/Kinetics/pytorchvideo/SLOW_8x8_R50.yaml
model = torch.hub.load('facebookresearch/pytorchvideo', 'slow_r50', pretrained=True)
batch_size = 1
frames = 8
size = 224
torchinfo.summary(
    model=model,
    input_size=(batch_size, 3, frames, size, size),
    depth=4,
    col_names=["input_size", "output_size"],
    row_settings=("var_names",),
)
Using cache found in /root/.cache/torch/hub/facebookresearch_pytorchvideo_master
Downloading: "https://dl.fbaipublicfiles.com/pytorchvideo/model_zoo/kinetics/SLOW_8x8_R50.pyth" to /root/.cache/torch/hub/checkpoints/SLOW_8x8_R50.pyth
=========================================================================================================
Layer (type (var_name)) Input Shape Output Shape
=========================================================================================================
Net -- --
├─ModuleList (blocks) -- --
│ └─ResStage (1) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (2) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (3) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResStage (4) -- --
│ │ └─ModuleList (res_blocks) -- --
│ └─ResNetBasicStem (0) [1, 3, 8, 224, 224] [1, 64, 8, 56, 56]
│ │ └─Conv3d (conv) [1, 3, 8, 224, 224] [1, 64, 8, 112, 112]
│ │ └─BatchNorm3d (norm) [1, 64, 8, 112, 112] [1, 64, 8, 112, 112]
│ │ └─ReLU (activation) [1, 64, 8, 112, 112] [1, 64, 8, 112, 112]
│ │ └─MaxPool3d (pool) [1, 64, 8, 112, 112] [1, 64, 8, 56, 56]
│ └─ResStage (1) [1, 64, 8, 56, 56] [1, 256, 8, 56, 56]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 64, 8, 56, 56] [1, 256, 8, 56, 56]
│ │ │ └─ResBlock (1) [1, 256, 8, 56, 56] [1, 256, 8, 56, 56]
│ │ │ └─ResBlock (2) [1, 256, 8, 56, 56] [1, 256, 8, 56, 56]
│ └─ResStage (2) [1, 256, 8, 56, 56] [1, 512, 8, 28, 28]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 256, 8, 56, 56] [1, 512, 8, 28, 28]
│ │ │ └─ResBlock (1) [1, 512, 8, 28, 28] [1, 512, 8, 28, 28]
│ │ │ └─ResBlock (2) [1, 512, 8, 28, 28] [1, 512, 8, 28, 28]
│ │ │ └─ResBlock (3) [1, 512, 8, 28, 28] [1, 512, 8, 28, 28]
│ └─ResStage (3) [1, 512, 8, 28, 28] [1, 1024, 8, 14, 14]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 512, 8, 28, 28] [1, 1024, 8, 14, 14]
│ │ │ └─ResBlock (1) [1, 1024, 8, 14, 14] [1, 1024, 8, 14, 14]
│ │ │ └─ResBlock (2) [1, 1024, 8, 14, 14] [1, 1024, 8, 14, 14]
│ │ │ └─ResBlock (3) [1, 1024, 8, 14, 14] [1, 1024, 8, 14, 14]
│ │ │ └─ResBlock (4) [1, 1024, 8, 14, 14] [1, 1024, 8, 14, 14]
│ │ │ └─ResBlock (5) [1, 1024, 8, 14, 14] [1, 1024, 8, 14, 14]
│ └─ResStage (4) [1, 1024, 8, 14, 14] [1, 2048, 8, 7, 7]
│ │ └─ModuleList (res_blocks) -- --
│ │ │ └─ResBlock (0) [1, 1024, 8, 14, 14] [1, 2048, 8, 7, 7]
│ │ │ └─ResBlock (1) [1, 2048, 8, 7, 7] [1, 2048, 8, 7, 7]
│ │ │ └─ResBlock (2) [1, 2048, 8, 7, 7] [1, 2048, 8, 7, 7]
│ └─ResNetBasicHead (5) [1, 2048, 8, 7, 7] [1, 400]
│ │ └─AvgPool3d (pool) [1, 2048, 8, 7, 7] [1, 2048, 1, 1, 1]
│ │ └─Dropout (dropout) [1, 2048, 1, 1, 1] [1, 2048, 1, 1, 1]
│ │ └─Linear (proj) [1, 1, 1, 1, 2048] [1, 1, 1, 1, 400]
│ │ └─AdaptiveAvgPool3d (output_pool) [1, 400, 1, 1, 1] [1, 400, 1, 1, 1]
=========================================================================================================
Total params: 32,454,096
Trainable params: 32,454,096
Non-trainable params: 0
Total mult-adds (G): 41.74
=========================================================================================================
Input size (MB): 4.82
Forward/backward pass size (MB): 1422.59
Params size (MB): 129.82
Estimated Total Size (MB): 1557.23
=========================================================================================================
Comparison
| model | input | params | param size (MB) | mult-adds (G) |
|---|---|---|---|---|
| I3D | 224x224x8 | 28,043,472 | 112.17 | 28.41 |
| C2D | 224x224x8 | 24,327,632 | 97.31 | 19.49 |
| SLOWFAST_16x8_R101_50_50 | 224x224x(64+16) | 53,774,808 | 215.10 | 163.09 |
| SLOWFAST_8x8_R101 | 224x224x(32+8) | 62,826,968 | 251.31 | 96.40 |
| SLOWFAST_8x8_R50 | 224x224x(32+8) | 34,566,488 | 138.27 | 50.31 |
| X3D-S | 160x160x13 | 3,794,274 | 15.18 | 1.96 |
| X3D-M | 224x224x16 | 3,794,274 | 15.18 | 4.73 |
| X3D-L | 312x312x16 | 6,153,384 | 24.61 | 18.37 |
| R(2+1)D | 224x224x16 | 28,107,600 | 112.43 | 57.53 |
| 3D ResNet | 224x224x8 | 32,454,096 | 129.82 | 41.74 |
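The param size (MB) column follows directly from the params column. As a rough check, the sketch below assumes 4 bytes per parameter (float32) and 1 MB = 10^6 bytes; both are assumptions that happen to reproduce the reported figures, and `size_mb` is a hypothetical helper. The same arithmetic reproduces the "Input size (MB)" line of a summary.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: all parameters and inputs are float32


def size_mb(num_elements: int) -> float:
    """Size in MB (10^6 bytes) of `num_elements` float32 values, rounded to 2 places."""
    return round(num_elements * BYTES_PER_FLOAT32 / 1e6, 2)


# Parameter counts copied from the comparison table.
params = {
    "I3D": 28_043_472,
    "C2D": 24_327_632,
    "SLOWFAST_8x8_R50": 34_566_488,
    "X3D-M": 3_794_274,
    "X3D-L": 6_153_384,
    "R(2+1)D": 28_107_600,
    "3D ResNet": 32_454_096,
}
param_mb = {name: size_mb(n) for name, n in params.items()}

# Input size for X3D-L: one clip of shape (batch, channels, frames, height, width).
x3d_l_input_mb = size_mb(prod((1, 3, 16, 312, 312)))

for name, mb in param_mb.items():
    print(f"{name}: {mb} MB")
print(f"X3D-L input: {x3d_l_input_mb} MB")
```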