Understand im2col

What is im2col

im2col is an important function used in CNNs (Convolutional Neural Networks). It transforms 4-dimensional image data into a 2-dimensional NumPy array. An image has height, width, and channel dimensions (usually 3 channels for the RGB colors). The fourth dimension is the number of images fed into the CNN model at once. So the input data usually has a shape of (N,C,H,W), where N is the number of images in the batch, C is the number of channels (such as R, G, B), H is the height of the image in pixels, and W is the width of the image in pixels.
im2col transforms this data into a 2-dimensional format. Specifically, it converts it to shape (N*OH*OW, C*FH*FW), where OH and OW are the output height and output width that result from applying the filter to the image, and FH and FW are the filter height and width, respectively.
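
To make the shape concrete, here is a small sketch that computes the size of the im2col result without touching any image data. It uses the usual output-size formula, the same one that appears in the im2col code in this article. The helper names conv_output_size and im2col_shape are made up for this illustration and are not part of NumPy or any library.

```python
def conv_output_size(size, filter_size, stride=1, pad=0):
    # Number of positions the filter can take along one axis
    # (hypothetical helper, named only for this example)
    return (size + 2 * pad - filter_size) // stride + 1


def im2col_shape(N, C, H, W, FH, FW, stride=1, pad=0):
    # Shape of the 2-dimensional array that im2col is expected to return
    OH = conv_output_size(H, FH, stride, pad)
    OW = conv_output_size(W, FW, stride, pad)
    return (N * OH * OW, C * FH * FW)


# 10 RGB images of 7x7 pixels with a 5x5 filter: OH = OW = 3
print(im2col_shape(10, 3, 7, 7, 5, 5))                   # (90, 75)
# One 28x28 RGB image, 5x5 filter, pad=2 keeps the 28x28 size
print(im2col_shape(1, 3, 28, 28, 5, 5, stride=1, pad=2))  # (784, 75)
```

Each row corresponds to one filter position in one image, and each column to one element of the (C,FH,FW) cube at that position.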

im2col code

The im2col code is given below. Essentially, this function takes every (C,FH,FW) cube to which the filter is applied and flattens it into a row vector. The figure below helps visualize this process.
im2col_figure.png

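import numpy as np

def im2col(input_data,fh,fw,stride=1,pad=0):
    # read the shape before padding, so that pad is counted only once in OH and OW
    N,C,H,W=input_data.shape
    OH = (H+2*pad-fh)//stride+1
    OW = (W+2*pad-fw)//stride+1
    # zero-pad only the height and width axes
    img=np.pad(input_data,[(0,0),(0,0),(pad,pad),(pad,pad)],'constant')
    col = np.zeros((N,C,fh,fw,OH,OW))
    for y in range(fh):
        y_max = y + OH * stride
        for x in range(fw):
            x_max = x + OW * stride
            # for filter offset (y,x), take the pixel at that offset from every window
            col[:,:,y,x,:,:] = img[:,:,y:y_max:stride,x:x_max:stride]
    # bring (N,OH,OW) to the front so that each row is one flattened (C,fh,fw) window
    col = col.transpose(0,4,5,1,2,3).reshape(N*OH*OW,-1)
    return col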
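
As a cross-check on what each row contains, here is a deliberately slow sketch that cuts out the (C,FH,FW) cubes one at a time with explicit loops and flattens each into a row. The name im2col_naive is hypothetical and used only here. It is meant as a readable reference for the row layout described above, not as a replacement for the vectorized function.

```python
import numpy as np


def im2col_naive(input_data, fh, fw, stride=1, pad=0):
    # Loop-based reference (hypothetical name): one row per filter position
    N, C, H, W = input_data.shape
    OH = (H + 2 * pad - fh) // stride + 1
    OW = (W + 2 * pad - fw) // stride + 1
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], "constant")
    rows = []
    for n in range(N):
        for oy in range(OH):
            for ox in range(OW):
                top, left = oy * stride, ox * stride
                # (C, fh, fw) cube covered by the filter at this position
                window = img[n, :, top:top + fh, left:left + fw]
                rows.append(window.reshape(-1))
    return np.array(rows)


# 2 images, 3 channels, 4x4 pixels, filled with 0, 1, 2, ...
x = np.arange(2 * 3 * 4 * 4, dtype=float).reshape(2, 3, 4, 4)
col = im2col_naive(x, 2, 2, stride=2, pad=0)
print(col.shape)        # (8, 12)
print(col[0].tolist())  # [0.0, 1.0, 4.0, 5.0, 16.0, 17.0, 20.0, 21.0, 32.0, 33.0, 36.0, 37.0]
```

The first row is the top-left 2x2 patch of the first image, channel by channel. The rows are ordered by image, then output row, then output column, which is the same order the transpose(0,4,5,1,2,3) call produces.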

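The only part that took me a long time to understand was why the slices step by the stride. Each strided slice collects, for one filter offset (y,x), the pixel at that offset from every window at once. After running an experiment like the one shown in the figure below, I finally understood that the trick lies in the transpose, which regroups these values so that each row holds one complete window.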
convolution_transpose_explanation.jpg
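
The same kind of experiment can be reproduced on a tiny array. This is a sketch under simple assumptions chosen for readability (one image, one channel, 4x4 pixels, a 2x2 filter, stride 2, no padding); the variable names are illustrative only.

```python
import numpy as np

# A single 4x4 one-channel image whose pixel values are 0..15
img = np.arange(16).reshape(1, 1, 4, 4)
fh = fw = 2
stride = 2
OH = OW = 2  # (4 - 2) // 2 + 1

col = np.zeros((1, 1, fh, fw, OH, OW), dtype=int)
for y in range(fh):
    for x in range(fw):
        # Jumping by the stride picks offset (y, x) out of every window
        col[:, :, y, x, :, :] = img[:, :, y:y + OH * stride:stride,
                                    x:x + OW * stride:stride]

# Before the transpose: the top-left pixel of each of the 4 windows
print(col[0, 0, 0, 0].tolist())      # [[0, 2], [8, 10]]

# After the transpose the axes are (N, OH, OW, C, fh, fw),
# so indexing by window position returns a whole window
windows = col.transpose(0, 4, 5, 1, 2, 3)
print(windows[0, 0, 0, 0].tolist())  # [[0, 1], [4, 5]]
print(windows[0, 1, 1, 0].tolist())  # [[10, 11], [14, 15]]
```

Before the transpose, the array is organized by position inside the filter. After it, the array is organized by window, which is why the final reshape gives one window per row.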
