
Understand im2col

Posted at 2021-02-02

What is im2col

im2col is an important function used in Convolutional Neural Networks (CNNs). It transforms 4-dimensional image data into a 2-dimensional NumPy array. An image has a height, a width, and a channel dimension (usually 3 channels for RGB colors). The remaining dimension, which comes first, is the number of images fed into the CNN model at once. So the input data usually has shape (N,C,H,W), where N is the number of images in the batch, C is the number of channels (such as R,G,B), H is the height of the image in pixels, and W is the width of the image in pixels.
im2col transforms this data into a 2-dimensional format. Specifically, it converts it to shape (N*OH*OW, C*FH*FW), where OH and OW are the Output Height and Output Width that result from applying the filter to the image, and FH and FW are the filter height and width respectively.
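
As a rough illustration of the shape arithmetic, the sketch below computes OH, OW, and the resulting 2-dimensional shape for one example input. The helper name `conv_output_size` and the example sizes are assumptions made for this illustration; the formula is the standard output-size formula for a convolution with stride and zero padding.

```python
def conv_output_size(size, filter_size, stride=1, pad=0):
    """Output length along one axis for a convolution (hypothetical helper)."""
    return (size + 2 * pad - filter_size) // stride + 1


# Example sizes, chosen only for illustration
N, C, H, W = 10, 3, 7, 7   # batch of 10 RGB images, 7x7 pixels
FH, FW = 5, 5              # 5x5 filter

OH = conv_output_size(H, FH)   # (7 - 5) // 1 + 1 = 3
OW = conv_output_size(W, FW)   # 3

# One row per filter position, one column per value inside the filter window
col_shape = (N * OH * OW, C * FH * FW)
print(col_shape)  # (90, 75)
```

Each of the 90 rows corresponds to one position of the filter on one image, and each row holds the 75 values the filter sees at that position.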

im2col code

The im2col code is given below. Essentially, this function takes every (C,FH,FW) cube to which the filter is applied and flattens it into a row vector, one row per filter position. The figure below helps visualize this process.

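import numpy as np

def im2col(input_data,fh,fw,stride=1,pad=0):
    # input_data has shape (N,C,H,W); read the sizes before padding,
    # because the output-size formula already adds 2*pad
    N,C,H,W=input_data.shape
    OH = (H+2*pad-fh)//stride+1
    OW = (W+2*pad-fw)//stride+1
    # apply zero padding to the height and width axes only
    img=np.pad(input_data,[(0,0),(0,0),(pad,pad),(pad,pad)],'constant')
    col = np.zeros((N,C,fh,fw,OH,OW))
    for y in range(fh):
        y_max = y + OH * stride
        for x in range(fw):
            x_max = x + OW * stride
            # for filter offset (y,x), take that pixel from every window
            col[:,:,y,x,:,:] = img[:,:,y:y_max:stride,x:x_max:stride]
    # move (OH,OW) next to N, then flatten: one row per window
    col = col.transpose(0,4,5,1,2,3).reshape(N*OH*OW,-1)
    return col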
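
To see why this layout is useful, here is a minimal sketch that uses im2col to compute a convolution as a single matrix product and checks it against a naive loop. So that it runs on its own, it repeats the function in compact form, computing OH and OW from the unpadded height and width. The filter bank `w`, the random inputs, and the chosen stride and padding are assumptions made only for this check.

```python
import numpy as np


def im2col(input_data, fh, fw, stride=1, pad=0):
    N, C, H, W = input_data.shape  # sizes before padding
    OH = (H + 2 * pad - fh) // stride + 1
    OW = (W + 2 * pad - fw) // stride + 1
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], "constant")
    col = np.zeros((N, C, fh, fw, OH, OW))
    for y in range(fh):
        y_max = y + OH * stride
        for x in range(fw):
            x_max = x + OW * stride
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
    return col.transpose(0, 4, 5, 1, 2, 3).reshape(N * OH * OW, -1)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 6, 6))   # (N, C, H, W)
w = rng.standard_normal((4, 3, 3, 3))   # hypothetical filters: (FN, C, FH, FW)
stride, pad = 2, 1
OH = OW = (6 + 2 * pad - 3) // stride + 1  # 3

# Convolution as one matrix product: (N*OH*OW, C*FH*FW) @ (C*FH*FW, FN)
col = im2col(x, 3, 3, stride=stride, pad=pad)
out = (col @ w.reshape(4, -1).T).reshape(2, OH, OW, 4).transpose(0, 3, 1, 2)

# Naive reference: slide the window explicitly over the padded input
xp = np.pad(x, [(0, 0), (0, 0), (pad, pad), (pad, pad)], "constant")
ref = np.zeros((2, 4, OH, OW))
for i in range(OH):
    for j in range(OW):
        patch = xp[:, :, i * stride:i * stride + 3, j * stride:j * stride + 3]
        ref[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))

print(col.shape)              # (18, 27)
print(np.allclose(out, ref))  # True
```

Because every row of `col` is one flattened window, flattening each filter the same way turns the whole convolution into one matrix multiplication.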

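The only part that took me a long time to understand was why the slicing jumps by stride. After running an experiment like the one shown in the figure below, I finally understood that the trick is in the transpose. Each strided slice collects, for one filter offset (y,x), the pixel at that offset from every window; the transpose then regroups these values so that each row holds one complete window.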
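
An experiment of this kind can be sketched as follows. This is not necessarily the experiment from the original figure; the 4x4 single-channel image, the 2x2 filter, and stride 2 are assumptions chosen so the values are easy to follow by eye.

```python
import numpy as np

# One image, one channel, pixels numbered 0..15:
#  0  1  2  3
#  4  5  6  7
#  8  9 10 11
# 12 13 14 15
img = np.arange(16).reshape(1, 1, 4, 4)
fh = fw = 2
stride = 2
OH = OW = (4 - fh) // stride + 1  # 2 windows along each axis

# Filter offset (y, x) = (0, 0): the strided slice picks the top-left
# pixel of every window, not the contents of a single window.
top_left = img[0, 0, 0:0 + OH * stride:stride, 0:0 + OW * stride:stride]
print(top_left)
# [[ 0  2]
#  [ 8 10]]

# Fill col exactly as im2col does, one filter offset at a time
col = np.zeros((1, 1, fh, fw, OH, OW), dtype=img.dtype)
for y in range(fh):
    for x in range(fw):
        col[:, :, y, x, :, :] = img[:, :, y:y + OH * stride:stride,
                                    x:x + OW * stride:stride]

# The transpose regroups the values so that each row is one full window
rows = col.transpose(0, 4, 5, 1, 2, 3).reshape(OH * OW, -1)
print(rows)
# [[ 0  1  4  5]
#  [ 2  3  6  7]
#  [ 8  9 12 13]
#  [10 11 14 15]]
```

Before the transpose the data is grouped by filter offset; after it, the data is grouped by window position, which is the layout the matrix product needs.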
