Preface
In this article, I would like to discuss what the convolution operation in a CNN actually means. I assume that you already have a basic understanding of CNNs.
References
What is convolution? (in Japanese): http://www.ice.tohtech.ac.jp/~nakagawa/laplacetrans/convolution1.htm
What is image processing by convolution? (in Japanese): https://www.clg.niigata-u.ac.jp/~medimg/practice_medical_imaging/imgproc_scion/4filter/index.htm
Cross-Correlation: http://mathworld.wolfram.com/Cross-Correlation.html
Cross-Correlation vs Convolution: https://www.youtube.com/watch?v=C3EEy8adxvc
Signal and Image
Generally speaking, an image can be viewed as a signal. For example, the luminance of an image can be plotted as a surface in 3D, with the pixel coordinates on two axes and the brightness on the third.
In the same way, a signal can be drawn in a 2D or 3D space.

An image is also simply a grid of integers, one value per pixel.
Hence, we can borrow some arithmetic operations from signal processing and apply them to image processing.
The most obvious one is convolution, which is the subject of this article.
Before getting to the actual math, however, I would like to confirm that convolution means the same thing in signal processing and in image processing.
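To make the "image as a signal" view from this section concrete, here is a minimal sketch in Python with NumPy. The 4x4 pixel values are made up for illustration; the point is only that a row of an image is a 1D discrete signal and that the whole image gives (x, y, luminance) points, which is what a 3D surface plot of an image draws.

```python
import numpy as np

# A tiny 4x4 grayscale "image" (made-up values): each pixel is an
# integer luminance in the range 0..255.
image = np.array(
    [[10, 20, 30, 40],
     [20, 80, 90, 50],
     [30, 90, 200, 60],
     [40, 50, 60, 70]],
    dtype=np.uint8,
)

# One row of the image is a 1D discrete signal: luminance as a function of x.
row_signal = image[2, :]

# The whole image is a 2D signal. Pairing every (x, y) coordinate with its
# luminance gives the points of a 3D surface.
ys, xs = np.indices(image.shape)
surface_points = np.stack([xs.ravel(), ys.ravel(), image.ravel()], axis=1)

print(row_signal)            # the signal along row 2
print(surface_points.shape)  # one (x, y, luminance) triple per pixel
```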
Convolution in signal processing (Conceptual Explanation)
In signal processing, we use the continuous version of convolution:

$$(f * g)(x) = \int_{-\infty}^{\infty} f(t)\, g(x - t)\, dt$$
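With this operation we blend two functions: for each shift $x$, $f(t)$ is multiplied by the flipped and shifted function $g(x-t)$, and the product is integrated over $t$.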
A good reference in Japanese: http://www.yukisako.xyz/entry/tatamikomi
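The "blend" of $f(t)$ and $g(x-t)$ is easier to see on sampled signals, where the integral becomes a sum. The following is a minimal sketch, not an optimized implementation: `convolve1d` is a hypothetical helper name and the sample values are made up. It computes the sum directly and compares the result with NumPy's built-in `np.convolve`.

```python
import numpy as np

def convolve1d(f, g):
    """Full discrete convolution: out[x] = sum over t of f[t] * g[x - t]."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    out = np.zeros(len(f) + len(g) - 1)
    for x in range(len(out)):
        for t in range(len(f)):
            # g is indexed by x - t, i.e. it is traversed in reverse order.
            if 0 <= x - t < len(g):
                out[x] += f[t] * g[x - t]
    return out

f = [1.0, 2.0, 3.0]  # made-up sampled signal
g = [0.0, 1.0, 0.5]  # made-up sampled kernel
result = convolve1d(f, g)

print(result)
print(np.allclose(result, np.convolve(f, g)))  # agrees with NumPy's convolution
```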
Convolution in image processing
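In image processing, the image is discrete: it is defined only on a finite grid of pixels, so it also has borders.
Hence in image processing we normally use the discrete version of convolution, in which the integral becomes a sum over pixel positions:

$$(f * g)(i, j) = \sum_{m} \sum_{n} f(m, n)\, g(i - m, j - n)$$

Here $f$ is the image (feature map) and $g$ is the weight matrix (filter).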

With this approach, we blend the image (feature map) and the weight matrix (filter). Because the filter cannot be placed over the border pixels without padding, the output is a slightly smaller, trimmed feature map. For a colour image, the operation is applied to each colour channel: red, green and blue. Note that weight sharing in a CNN means that the same filter weights are reused at every spatial position of the image; it does not make the channels look alike. In a standard convolutional layer, each filter has its own weights for every input channel, and the per-channel results are summed into one output feature map.
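The discrete 2D operation can be sketched in a few lines of NumPy. This is an illustrative single-channel version without padding or stride, not how a deep learning library implements it; `convolve2d_valid` is a hypothetical helper name and the 5x5 input is made up. It shows both the trimming of the output and the effect of an averaging filter.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """2D convolution without padding, so the output is smaller than the input."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    # True convolution flips the kernel along both axes before sliding it.
    flipped = kernel[::-1, ::-1]
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the covered patch by the weights and sum the result.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # made-up 5x5 image
box = np.full((3, 3), 1.0 / 9.0)                  # 3x3 averaging filter
smoothed = convolve2d_valid(image, box)

print(smoothed.shape)  # (5 - 3 + 1, 5 - 3 + 1)
print(smoothed)
```

For a colour image, the same function would be called once per channel in this sketch.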
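So far we have seen the relationship between image processing and signal processing. If we look further into convolution, we encounter cross-correlation, which is a very similar operation. In fact, the "convolution" layers of most deep learning libraries compute cross-correlation; since the weights are learned, the difference does not matter in practice.
The clearest video I have found so far: https://www.youtube.com/watch?v=MQm6ZP1F6ms
What the two operations do is fundamentally analogous. The difference is the order in which the filter is traversed: convolution flips the filter before sliding it over the input, while cross-correlation uses the filter as it is.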
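The relationship between the two operations can be checked numerically. In this minimal sketch (the function names and the 4x4 input are assumptions made for illustration), convolution is written as cross-correlation with a filter flipped along both axes. The results differ for an asymmetric filter and coincide for a symmetric one.

```python
import numpy as np

def cross_correlate2d_valid(image, kernel):
    """Slide the kernel over the image as is, without flipping or padding."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d_valid(image, kernel):
    """Convolution is cross-correlation with the kernel flipped on both axes."""
    return cross_correlate2d_valid(image, kernel[::-1, ::-1])

image = np.arange(16, dtype=float).reshape(4, 4)  # made-up 4x4 image
asymmetric = np.array([[1.0, 0.0],
                       [0.0, -1.0]])
symmetric = np.ones((2, 2))

corr_asym = cross_correlate2d_valid(image, asymmetric)
conv_asym = convolve2d_valid(image, asymmetric)
corr_sym = cross_correlate2d_valid(image, symmetric)
conv_sym = convolve2d_valid(image, symmetric)

print(np.allclose(corr_asym, conv_asym))  # False: flipping changes the result
print(np.allclose(corr_sym, conv_sym))    # True: a symmetric filter is unchanged by the flip
```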
Conclusion
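In the convolution operation, a weight matrix is slid over the image; at each position, the pixels in the covered area are multiplied element-wise by the weights and summed. The result is a filtered version of the image. With an averaging filter, for example, we get a smoothed image.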
Thank you.