
(Paper) PaDiM: a Patch Distribution Modeling Framework for Anomaly Detection and Localization (2020)


Introduction

Anomaly detection is a binary classification between the normal and the anomalous classes. While this problem could be tackled with the conventional approach of learning from both normal and abnormal data, that is impractical for real-world use cases, as anomalous data is often scarce. Therefore, the general approach is to train in an unsupervised manner on normal data only.

One popular way to perform anomaly detection and localization is to use generative methods such as AutoEncoders (AE), Variational AutoEncoders (VAE) or Generative Adversarial Networks (GAN), where areas of the generated image that differ from the original image are considered anomalous. However, all of these methods require cumbersome deep neural network training, and they can sometimes yield good reconstructions even for anomalous images.

While there have been other approaches such as SPADE that avoid training a deep neural network, SPADE runs K-nearest-neighbour (KNN) search over the entire training dataset at test time, which leads to long inference times.

The method proposed in this paper uses a pre-trained CNN for embedding extraction while taking into account the correlations between different semantic levels of the pre-trained CNN, yielding state-of-the-art results for anomaly detection.

Patch Distribution Modelling (PaDiM)

Embedding extraction

Activation vectors from different layers of the CNN are concatenated to obtain embedding vectors that contain both fine-grained and global context about the image.

[Image: embedding extraction — activation vectors from corresponding positions in different CNN layers are concatenated into patch embedding vectors]

An input image is divided into a grid of $W \times H$ patch positions, where $W \times H$ is the resolution of the largest activation map (the output of the first layer of the CNN). Each patch position $(i, j)$ is associated with an embedding vector $x_{ij}$ computed as described above.

As these patch embedding vectors may contain redundant information, an ablation study was performed, and it was found that randomly selecting a few dimensions not only decreases the complexity of the model significantly, but is also more effective than dimensionality reduction through Principal Component Analysis (PCA).
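
To make this concrete, below is a minimal PyTorch sketch of the embedding extraction and the random dimension selection, assuming a torchvision ResNet18 backbone. The layer choice, the nearest-neighbour upsampling, and the number of kept dimensions (100 out of the 448 concatenated channels) are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1").eval()
features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# Capture the outputs of the first three residual stages (64 + 128 + 256 = 448 channels)
model.layer1.register_forward_hook(save_output("layer1"))
model.layer2.register_forward_hook(save_output("layer2"))
model.layer3.register_forward_hook(save_output("layer3"))

@torch.no_grad()
def extract_embeddings(images, keep_idx=None):
    """images: (B, 3, 224, 224) normalized batch -> (B, C, W, H) patch embedding vectors."""
    model(images)
    f1 = features["layer1"]                                      # (B, 64, 56, 56): largest activation map
    f2 = F.interpolate(features["layer2"], size=f1.shape[-2:], mode="nearest")
    f3 = F.interpolate(features["layer3"], size=f1.shape[-2:], mode="nearest")
    emb = torch.cat([f1, f2, f3], dim=1)                         # concatenate along the channel axis
    if keep_idx is not None:                                     # random dimensionality reduction (Rd)
        emb = emb[:, keep_idx]
    return emb

# Randomly pick 100 of the 448 dimensions once, and reuse the same indices for every image
keep_idx = torch.randperm(448)[:100]
```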

Learning the normality

Let $X_{ij} = \{x_{ij}^k,\ k \in [1, N]\}$ be the set of patch embedding vectors at position $(i, j)$ gathered from $N$ normal training images. It is assumed that $X_{ij}$ is generated by a multivariate Gaussian distribution $\mathcal{N}(\mu_{ij}, \Sigma_{ij})$, where $\mu_{ij}$ is the sample mean of $X_{ij}$ and $\Sigma_{ij}$ is the sample covariance, obtained through the following equation:

$$
\Sigma_{ij} = \frac{1}{N-1} \sum_{k=1}^{N} \left(x_{ij}^{k} - \mu_{ij}\right)\left(x_{ij}^{k} - \mu_{ij}\right)^{\mathrm{T}} + \epsilon I
$$

where $\epsilon I$ is a constant regularization term that keeps the sample covariance matrix full rank and invertible.
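
As a rough sketch of this fitting step (assuming patch embeddings have already been extracted for all $N$ training images; the tensor layout and $\epsilon = 0.01$ are assumptions for illustration):

```python
import torch

def fit_gaussians(embeddings, eps=0.01):
    """embeddings: (N, C, W, H) patch embeddings from N normal training images."""
    N, C, W, H = embeddings.shape
    x = embeddings.permute(2, 3, 0, 1).reshape(W * H, N, C)   # one (N, C) sample set per patch position (i, j)
    mean = x.mean(dim=1)                                       # sample means mu_ij, shape (W*H, C)
    centered = x - mean.unsqueeze(1)
    cov = torch.einsum("pnc,pnd->pcd", centered, centered) / (N - 1)
    cov = cov + eps * torch.eye(C)                             # regularization term eps * I
    return mean, cov
```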

Inference

During inference, the Mahalanobis distance $M(x_{ij})$ is used to measure how far each patch embedding vector of a test image is from the learnt distribution $\mathcal{N}(\mu_{ij}, \Sigma_{ij})$. It is calculated as follows:

$$
M(x_{ij}) = \sqrt{\left(x_{ij} - \mu_{ij}\right)^{\mathrm{T}} \Sigma_{ij}^{-1} \left(x_{ij} - \mu_{ij}\right)}
$$

High scores in the anomaly map $M$ indicate the location of anomalies, and the anomaly score of an image is interpreted as the maximum value of the anomaly map.
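
A corresponding sketch of the scoring step, reusing the hypothetical `fit_gaussians` output above. In practice the $W \times H$ map would also be upsampled to the input resolution (and typically smoothed) before visualization, which is omitted here.

```python
import torch

def anomaly_map(embedding, mean, cov):
    """embedding: (C, W, H) embeddings of one test image; mean: (W*H, C); cov: (W*H, C, C)."""
    C, W, H = embedding.shape
    x = embedding.reshape(C, W * H).t()                             # (W*H, C): one vector x_ij per patch position
    delta = (x - mean).unsqueeze(-1)                                # (W*H, C, 1)
    cov_inv = torch.linalg.inv(cov)                                 # batched inverse of Sigma_ij
    m2 = (delta.transpose(1, 2) @ cov_inv @ delta).reshape(W * H)   # squared Mahalanobis distances
    return torch.sqrt(m2).reshape(W, H)                             # anomaly map M

# Image-level anomaly score = maximum value of the anomaly map
# score = anomaly_map(emb, mean, cov).max()
```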

Result

Setup

Two metrics are used to evaluate model effectiveness. One is the Area Under the Receiver Operating Characteristic curve (AUROC), where the true positive rate is the percentage of pixels correctly classified as anomalous. The other is the per-region-overlap score (PRO-score), which avoids the bias in favor of large anomalies that is present in AUROC. A high PRO-score indicates that both large and small anomalies are well localized.
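
For reference, the pixel-level AUROC can be computed as sketched below (scikit-learn assumed; the PRO-score additionally requires per-connected-component overlap computation and is omitted here):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pixel_auroc(gt_masks, score_maps):
    """gt_masks: (num_images, H, W) binary masks (1 = anomalous pixel);
    score_maps: (num_images, H, W) per-pixel anomaly scores."""
    return roc_auc_score(np.asarray(gt_masks).ravel(), np.asarray(score_maps).ravel())
```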

Two different datasets were used: MVTec AD and ShanghaiTech Campus (STC). The images from MVTec AD are resized to 256x256 and then center-cropped to 224x224, while STC images are only resized to 256x256. Additionally, a modified version of MVTec AD named Rd-MVTec AD is also used, where random rotation and random cropping are applied to both the train and test sets in order to test a model's robustness against non-aligned images.
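
A sketch of this preprocessing with torchvision transforms; the ImageNet normalization statistics are an assumption here, used because the backbones are ImageNet pre-trained:

```python
from torchvision import transforms

# ImageNet statistics (assumed, since the feature extractor is ImageNet pre-trained)
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

mvtec_transform = transforms.Compose([
    transforms.Resize((256, 256)),   # resize to 256x256 ...
    transforms.CenterCrop(224),      # ... then center-crop to 224x224
    transforms.ToTensor(),
    normalize,
])

stc_transform = transforms.Compose([
    transforms.Resize((256, 256)),   # STC images are only resized, not cropped
    transforms.ToTensor(),
    normalize,
])
```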

Three different pre-trained CNNs were tested for extracting the embedding vectors: ResNet18, Wide ResNet-50-2 and EfficientNet-B5.

Effects of layers

In general, performance increases with the semantic level of the information used. Taking the correlations between different semantic levels into account also yields better results than simply summing the outputs of the individual layers (PaDiM-R18 vs. Layer 1+2+3).
[Image: anomaly localization performance for individual layers, their summed outputs (Layer 1+2+3), and PaDiM-R18]

Dimension Reduction

As mentioned earlier, random dimensionality reduction (Rd) outperforms PCA in all areas. Reducing the embedding vector size has very little impact on the anomaly localization performance while greatly improving the time and space complexity of the algorithm.
[Image: comparison of random dimensionality reduction (Rd) and PCA]

Anomaly Detection

PaDiM is superior at detecting defects in the textured classes of MVTec AD, and it is also the best overall performing algorithm.
[Image: anomaly detection results on MVTec AD]

Similarly, it has the highest AUROC on the STC dataset.
[Image: AUROC results on the STC dataset]

In addition, PaDiM is more robust to non-aligned images, as shown below.
[Image: results on Rd-MVTec AD (non-aligned images)]

Result Visualization

[Images: visualization of anomaly localization results]

Time and Space Complexity

The inference time of PaDiM is almost on par with the VAE-based approach, and significantly shorter than that of SPADE, as PaDiM does not need to perform KNN between the test image and the training data.
[Image: inference time comparison]

The memory required by PaDiM is much lower than that of SPADE, as it only keeps the pre-trained CNN and the Gaussian parameters associated with each patch position in memory.
[Image: memory consumption comparison]

Conclusion

PaDiM achieves a new state-of-the-art result in anomaly detection and localization. However, it remains to be seen whether it can yield equivalent performance when there are multiple objects and/or defects in an image.

Another concern is the memory requirement when the training dataset is large, both in terms of image count and image resolution, as the patch embeddings need to be stored in order to derive the Gaussian parameters.
