What is im2col
im2col is an important function used in CNNs (Convolutional Neural Networks). It transforms 4-dimensional image data into a 2-dimensional NumPy array. An image has height, width and channel dimensions (usually 3 channels for RGB colors). The fourth dimension is the number of images fed into the CNN model at once. So the input data usually has a shape of (N, C, H, W), where N is the number of images in the batch, C is the number of channels (such as R, G, B), H is the height of the image in pixels, and W is the width of the image.
im2col transforms this data into a 2-dimensional format. Specifically, it converts it to shape (N*OH*OW, C*FH*FW), where OH and OW are the Output Height and Output Width that result from applying the filter to the image, and FH and FW are the filter Height and Width respectively. With a given stride and padding, OH = (H + 2*pad - FH)//stride + 1 and OW = (W + 2*pad - FW)//stride + 1.
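To make the shape arithmetic concrete, here is a minimal sketch. The helper name im2col_output_shape is made up for illustration and is not part of any library; it only applies the OH and OW formulas used in the im2col code.

```python
def im2col_output_shape(n, c, h, w, fh, fw, stride=1, pad=0):
    """Return the 2D shape that im2col would produce for an (n, c, h, w) input."""
    # output height and width after sliding an (fh, fw) filter over the padded image
    oh = (h + 2 * pad - fh) // stride + 1
    ow = (w + 2 * pad - fw) // stride + 1
    # one row per window position, one column per value inside a window
    return (n * oh * ow, c * fh * fw)


# 10 RGB images of 7x7 pixels, 5x5 filter, stride 1, no padding:
# OH = OW = 3, so the result has 10*3*3 rows and 3*5*5 columns
shape = im2col_output_shape(10, 3, 7, 7, 5, 5)
print(shape)  # (90, 75)
```

Each of the 90 rows corresponds to one position of the filter on one image, and each of the 75 columns to one value under the filter.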
im2col code
The im2col code is given below. Essentially, this function takes every (C, FH, FW) block that the filter is applied to and flattens it into a row vector. The figure below helps visualize this process.
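import numpy as np

def im2col(input_data, fh, fw, stride=1, pad=0):
    # read the shape before padding, so that padding is counted only once in OH and OW
    N, C, H, W = input_data.shape
    OH = (H + 2*pad - fh)//stride + 1
    OW = (W + 2*pad - fw)//stride + 1
    # apply zero padding to the height and width axes only
    img = np.pad(input_data, [(0,0),(0,0),(pad,pad),(pad,pad)], 'constant')
    col = np.zeros((N, C, fh, fw, OH, OW))
    for y in range(fh):
        y_max = y + OH*stride
        for x in range(fw):
            x_max = x + OW*stride
            # for filter offset (y, x), pick the pixel at that offset in every window
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
    # bring (N, OH, OW) to the front, then flatten each (C, fh, fw) block into one row
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*OH*OW, -1)
    return col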
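For comparison, here is a slower but more literal sketch of the same idea: loop over every window position and flatten the block found there. The name im2col_naive and the test input are made up for illustration; the row order (image, then output row, then output column) is an assumption chosen to match the transpose(0, 4, 5, 1, 2, 3) layout above.

```python
import numpy as np


def im2col_naive(input_data, fh, fw, stride=1, pad=0):
    """Loop-based sketch: one output row per (image, window row, window column)."""
    N, C, H, W = input_data.shape
    OH = (H + 2 * pad - fh) // stride + 1
    OW = (W + 2 * pad - fw) // stride + 1
    # zero-pad height and width only
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    rows = np.zeros((N * OH * OW, C * fh * fw))
    for n in range(N):
        for i in range(OH):
            for j in range(OW):
                top, left = i * stride, j * stride
                # the (C, fh, fw) block that the filter covers at this position
                patch = img[n, :, top:top + fh, left:left + fw]
                rows[(n * OH + i) * OW + j] = patch.reshape(-1)
    return rows


# made-up input: 2 images, 3 channels, 5x5 pixels
x = np.arange(2 * 3 * 5 * 5, dtype=float).reshape(2, 3, 5, 5)
out = im2col_naive(x, 3, 3, stride=2, pad=1)
print(out.shape)  # (18, 27)
```

Comparing its output against the vectorized function for the same arguments (for example with np.allclose) is one way to sanity-check the latter.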
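The only part that took me a long time to understand was why we jump by stride in the slices. After running an experiment like the one shown in the figure, I finally understood that the trick lies in the transpose. For one filter offset (y, x), each strided slice collects the pixel at that offset from every window at once, and the transpose then regroups those values so that each row holds one complete window.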
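An experiment of this kind could look like the sketch below. The toy input (one image, one channel, 4x4 pixels holding 0..15) and the 2x2 filter with stride 2 are made-up values, picked only so that the result is easy to read by eye.

```python
import numpy as np

# toy input: 1 image, 1 channel, 4x4 pixels with values 0..15
img = np.arange(16).reshape(1, 1, 4, 4)
fh = fw = 2
stride = 2
OH = OW = (4 - fh) // stride + 1  # 2 window positions in each direction

col = np.zeros((1, 1, fh, fw, OH, OW), dtype=img.dtype)
for y in range(fh):
    for x in range(fw):
        # jumping by stride picks the pixel at offset (y, x) in every window
        col[:, :, y, x, :, :] = img[:, :, y:y + OH * stride:stride, x:x + OW * stride:stride]

# offset (0, 0): the top-left pixel of each of the four windows
top_left = col[0, 0, 0, 0]
print(top_left)
# [[ 0  2]
#  [ 8 10]]

# the transpose puts the window position first, so each row becomes one full window
rows = col.transpose(0, 4, 5, 1, 2, 3).reshape(OH * OW, -1)
print(rows)
# [[ 0  1  4  5]
#  [ 2  3  6  7]
#  [ 8  9 12 13]
#  [10 11 14 15]]
```

Before the transpose, the values are grouped by filter offset. After it, they are grouped by window, which is the layout im2col returns.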