
Implementing the Projected Gradient Descent Attack

Posted at 2023-09-09

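Introduction

I implemented the adversarial example attack known as the Projected Gradient Descent Attack (PGDA) in PyTorch.
CIFAR10 is used as the image dataset.
The implementation borrows part of the source code published in the book 「AIセキュリティから学ぶディープラーニング[技術]入門」. The book relies on the ART (Adversarial Robustness Toolbox) library, but here the attack is implemented naively, without it. For the ART-based implementation, see the page below.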

Implementation

Importing libraries

import torch
import torch.nn as nn
import torchvision
import torchvision.models as models
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

Preparing the dataset

mu=0.5
sigma=0.5

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((mu, mu, mu), (sigma, sigma, sigma)), 
])
trainset = torchvision.datasets.CIFAR10(
    root='./data', 
    train=True,
    download=True,
    transform=transform
)
trainloader = torch.utils.data.DataLoader(
    trainset,
    batch_size=100,
    shuffle=True,
    num_workers=2
)
testset = torchvision.datasets.CIFAR10(
    root='./data', 
    train=False, 
    download=True, 
    transform=transform
)
testloader = torch.utils.data.DataLoader(
    testset, 
    batch_size=1, 
    shuffle=False, 
    num_workers=2
)

Preparing the trained model

def Net():
    model_ft=models.resnet50(pretrained=True)
    model_ft.fc=nn.Linear(model_ft.fc.in_features, 10)
    return  model_ft

# Define what device we are using
use_cuda=True
device = torch.device("cuda" if (use_cuda and torch.cuda.is_available()) else "cpu")

# Initialize the network
model = Net().to(device)

# Load the pretrained model
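path="./exp/bestloss.pth" # prepare this checkpoint yourself
# map_location lets a checkpoint saved on GPU load on a CPU-only machine
model.load_state_dict(torch.load(path, map_location=device))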

# Set the model to evaluation mode so the BatchNorm layers use their running statistics
model.eval()

Preliminaries

class2name={
    0: "airplane", 
    1: "automobile", 
    2: "bird", 
    3: "cat", 
    4: "deer", 
    5: "dog", 
    6: "frog", 
    7: "horse", 
    8: "ship", 
    9: "truck", 
}

# Report whether CUDA is available (device and model were already set up above)
print("CUDA Available: ",torch.cuda.is_available())

eps=0.1
eps_step=0.005
max_step=40

criterion = nn.CrossEntropyLoss()
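
Because the images are normalized with mu = sigma = 0.5, eps and eps_step above are expressed in the normalized [-1, 1] space rather than in raw pixel units. The sketch below converts them back to the [0, 1] pixel scale. `to_pixel_scale` is a hypothetical helper introduced only for illustration and is not part of the original code.

```python
# Hypothetical helper (not in the original article).
# Normalize computes x_norm = (x - mu) / sigma, so a perturbation of size d
# in normalized space corresponds to d * sigma in [0, 1] pixel space.
def to_pixel_scale(eps_normalized, sigma=0.5):
    return eps_normalized * sigma

eps, eps_step, max_step = 0.1, 0.005, 40   # values used in this article

eps_pixel = to_pixel_scale(eps)            # total budget on the [0, 1] scale
step_pixel = to_pixel_scale(eps_step)      # per-step size on the [0, 1] scale

print("budget in 8-bit levels:", eps_pixel * 255)
print("step in 8-bit levels:  ", step_pixel * 255)

# The steps can cover the whole budget only if max_step * eps_step >= eps.
print("budget reachable:", max_step * eps_step >= eps)
```

With these settings, eps = 0.1 corresponds to about 0.05 on the [0, 1] scale, or roughly 13 of 255 intensity levels, and 40 steps of 0.005 are enough to reach the edge of that budget.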

Attack

def fgsm_attack(image, epsilon, data_grad):
    """
    INPUT
    image: (B, C, H, W), [-1,1]
    epsilon: (1, )
    data_grad: (B, C, H, W)
    OUTPUT
    perturbed_image: (B, C, H, W), [-1,1]
    """
    # Collect the element-wise sign of the data gradient
    sign_data_grad = data_grad.sign()
    # Create the perturbed image by adjusting each pixel of the input image
    perturbed_image = image + epsilon * sign_data_grad
    # Adding clipping to maintain [-1,1] range
    perturbed_image = torch.clamp(perturbed_image, -1, 1)
    # Return the perturbed image
    return perturbed_image

def test(
        model, 
        data, 
        target, 
        device, 
        max_step, 
        eps_step, 
        eps
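    ):
    """
    Run PGD against a single sample (batch size 1, because .item() is used).
    INPUT
    data: (1, C, H, W), [-1,1]
    target: (1, )
    max_step: maximum number of FGSM steps
    eps_step: step size of each FGSM step
    eps: L-infinity bound on the total perturbation
    OUTPUT
    data: (1, C, H, W), final (possibly adversarial) image
    pred: (1, 1), final predicted class
    success: False if the final prediction still equals target
    """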
    success = True
    data, target = data.to(device), target.to(device)
    orig_img = data.clone()
    data.requires_grad = True
    output=model(data)
    pred=output.max(1, keepdim=True)[1]

    # print(pred.item(), target.item())
    if pred.item() == target.item():
        for t in range(max_step):
            # calculate the loss and backpropagate
            loss=criterion(output, target)
            model.zero_grad()
            loss.backward()
            data_grad=data.grad.data

            # Attack and clip
            data=fgsm_attack(data, eps_step, data_grad)
            data=torch.clamp(
                data, 
                orig_img-eps, 
                orig_img+eps
            )
            data=data.detach().to(device)
            data.requires_grad = True
            output=model(data)

            # pred
            pred=output.max(1, keepdim=True)[1]

            # Check for success
            if pred.item() != target.item():
                break
    
    if pred.item() == target.item():
        success = False

    return data, pred, success

Do not forget the clipping and the detach() call. (Leaving them out leads to unexpected results or errors.)

The function is named fgsm_attack because it implements the FGSM step that PGDA uses internally.
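
Written as a formula, one iteration of the loop in test appears to correspond to the update below. This is a reading of the code above rather than a formula quoted from the book; the symbols are assumptions chosen for illustration: $\alpha$ stands for eps_step, $\epsilon$ for eps, $x^{(0)}$ for orig_img, $f$ for the model, and $L$ for the cross-entropy loss.

```latex
x^{(t+1)} \;=\;
\operatorname{clip}_{[\,x^{(0)}-\epsilon,\; x^{(0)}+\epsilon\,]}
\Big(
  \operatorname{clip}_{[-1,\,1]}
  \big(
    x^{(t)} + \alpha \,\operatorname{sign}\!\big(\nabla_{x} L(f(x^{(t)}),\, y)\big)
  \big)
\Big)
```

The inner clip is the one performed inside fgsm_attack, and the outer clip is the torch.clamp against orig_img-eps and orig_img+eps. Since $x^{(0)}$ itself lies in $[-1, 1]$, the result of the outer clip also stays in $[-1, 1]$.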

Visualization

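data, target = next(iter(testloader))
# Generate the adversarial example with the test() function defined above
adv_data, _, success = test(model, data, target, device, max_step, eps_step, eps)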

orig_img=data[0]
orig_img=orig_img.detach().numpy()
orig_img=orig_img.transpose(1,2,0)
orig_img=mu+sigma*orig_img

pred=torch.softmax(model(data.to(device)), dim=1)
original_class=torch.argmax(pred).item()
original_name=class2name[original_class]
original_score=torch.max(pred).item()*100

adv_img=adv_data[0]
adv_img=adv_img.detach().cpu().numpy()
adv_img=adv_img.transpose(1,2,0)
adv_img=mu+sigma*adv_img

adv_pred=torch.softmax(model(adv_data), dim=1)
adv_class=torch.argmax(adv_pred).item()
adv_name=class2name[adv_class]
adv_score=torch.max(adv_pred).item()*100

plt.figure(figsize=(11,5.5))

plt.subplot(1, 2, 1)
plt.axis("off")
plt.title(f"original (class={original_name}, score={original_score:.2f}%)")
plt.imshow(orig_img)

plt.subplot(1, 2, 2)
plt.axis("off")
plt.title(f"adversarial (class={adv_name}, score={adv_score:.2f}%)")
plt.imshow(adv_img)

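The image on the right still looks like a cat, but the model classified it as a dog.

Conclusion

PGDA can be thought of as "FGSM repeated in small steps."
The idea is simple, yet it produces adversarial examples that look more natural than those from FGSM (at the cost of longer computation time).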
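
The "FGSM repeated in small steps" idea can also be sketched as a batched loop. The version below is a minimal sketch, not the code used in this article: `pgd_linf` is a hypothetical function name, it omits the early stopping used in test, and the tiny linear model is only a stand-in so that the example runs without a checkpoint.

```python
import torch
import torch.nn as nn

def pgd_linf(model, x, y, eps=0.1, eps_step=0.005, max_step=40, lo=-1.0, hi=1.0):
    """Batched L-infinity PGD without early stopping (hypothetical helper)."""
    loss_fn = nn.CrossEntropyLoss()
    x_orig = x.detach()
    x_adv = x_orig.clone()
    for _ in range(max_step):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        # Gradient of the loss with respect to the input only
        grad, = torch.autograd.grad(loss, x_adv)
        # One FGSM step; detach so the graph does not grow across iterations
        x_adv = x_adv.detach() + eps_step * grad.sign()
        # Project back into the eps-ball around the original input
        x_adv = torch.max(torch.min(x_adv, x_orig + eps), x_orig - eps)
        # Keep the result inside the valid normalized range
        x_adv = x_adv.clamp(lo, hi)
    return x_adv.detach()

# Toy stand-in model and random inputs (assumptions, for illustration only)
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
x = torch.rand(4, 3, 8, 8) * 2 - 1      # values in [-1, 1]
y = torch.randint(0, 10, (4,))

x_adv = pgd_linf(toy_model, x, y)
print("max perturbation:", (x_adv - x).abs().max().item())
```

Projecting onto the eps-ball first and clamping to the valid range second keeps both constraints satisfied, because the original input already lies inside the valid range.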
