Deep Learning Study Log 11: Backpropagation, Implementing XOR

Last updated at 2026-01-18 / Posted at 2026-01-18

This article is built from a dialogue with Gemini.

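Reference book:
ゼロから作るDeep Learning: Pythonで学ぶディープラーニングの理論と実装 (Deep Learning from Scratch: the theory and implementation of deep learning in Python), by Koki Saito (斎藤康毅)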

Development environment:
VS Code + Python extension (Microsoft) + Anaconda (statistical processing; the library distribution recommended by the reference book)

This article records my study of Chapter 5 of ゼロから作るDeep Learning, along with supplementary notes.
I will add to it over time.

Backpropagation: Implementing XOR

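Softmax cannot be used with a single output, so the network has two output nodes:
output1: the node that gives the probability of '0'
output2: the node that gives the probability of '1'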

x	y	xor
0	0	0
0	1	1
1	0	1
1	1	0
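As a minimal sketch of why two output nodes are used (the helper name `to_one_hot` is hypothetical and not from the book): softmax over a single node always returns 1.0 regardless of the score, so it carries no information, whereas two nodes paired with one-hot targets can express the probability of each class.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def to_one_hot(labels, num_classes):
    # Hypothetical helper: row i gets a 1 in column labels[i].
    return np.eye(num_classes, dtype=int)[labels]

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
labels = x[:, 0] ^ x[:, 1]   # XOR of the two input columns
t = to_one_hot(labels, 2)    # column 0: '0' class, column 1: '1' class

# One output node: softmax normalizes a single value to 1.0 every time.
single = softmax(np.array([[-3.0], [0.0], [5.0]]))
print(single.ravel())
print(t)
```

The resulting `t` has the same layout as the target array used in the training script further down.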

Training stalls

With two hidden layers and a small learning rate, training can stall (local minimum / saddle point).
Output layer values:
[[0.49939392 0.50060608]
[0.49999954 0.50000046]
[0.49999967 0.50000033]
[0.50060626 0.49939374]]
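Outputs stuck near 0.5 (undecided) mean the network has effectively given up: it cannot tell how to classify the inputs, so it answers with the middle value for every one of them.
The loss also gets stuck at a constant value partway through training.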
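To see roughly where the loss gets stuck, the sketch below plugs the stalled outputs quoted above into the mean cross-entropy. It assumes the targets were the one-hot XOR labels used in the training script. When every output is close to 0.5, the loss sits close to ln 2 (about 0.693), which is the value a two-class classifier gets by always answering "50/50".

```python
import numpy as np

# Stalled outputs quoted above (softmax probabilities, one row per input).
y = np.array([
    [0.49939392, 0.50060608],
    [0.49999954, 0.50000046],
    [0.49999967, 0.50000033],
    [0.50060626, 0.49939374],
])
# Assumption: the one-hot XOR targets from the training script.
t = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

# Mean cross-entropy: -log of the probability given to the correct class.
correct_prob = y[np.arange(len(y)), t.argmax(axis=1)]
loss = -np.mean(np.log(correct_prob))

print(loss, np.log(2))
```

A printed loss that hovers around 0.693 for many iterations is therefore a quick sign of this kind of plateau.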

When I first tried it with two layers, it did not converge at all. I only understood what was happening after asking Gemini.

Backpropagation: XOR

import numpy as np
import matplotlib.pyplot as plt
from collections import OrderedDict


def sigmoid(x):
    return 1/(1 + np.exp(-x))

def softmax(x):
    #print('-----------softmax:\n',x)
    if x.ndim == 2:
        x = x.T
        x = x - np.max(x, axis = 0)
        y = np.exp(x)/np.sum(np.exp(x), axis = 0)
        #print('ndim=2:',y)
        return y.T
    x = x - np.max(x)
    y = np.exp(x)/ np.sum(np.exp(x))
    #print('ndim=1:',y)
    return y

def cross_entropy_error(y,t):
    #print('cee:y,t.shape',y.shape, t.shape)
    if y.ndim == 1:
        t = t.reshape(1,t.size)
        y = y.reshape(1,y.size)
    if t.size == y.size:
        t = t.argmax(axis = 1)
    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size),t]+1e-7)) / batch_size

class softmaxWithloss:
    def __init__(self):
        self.loss = None
        self.t = None
        self.y = None
    def forward(self,x,t):
        #print('-----softmaxWithloss forward')
        #print('softmax xshape',x.shape)
        self.t = t
        if x.shape[1]==1:
            x=x.squeeze()
        #print('sq:\n',x)
        self.y = softmax(x)
        #print('y',self.y)
        #print("shpape y, t",self.y.shape, self.t.shape)
        self.loss = cross_entropy_error(self.y,self.t)
        #print("loss:",self.loss)
        return self.loss
    def backward(self, dout = 1):
        batch_size = self.t.shape[0]
        #print("-------softmaxWithloss backward:", self.t.size, self.y.size)
        #print(self.y)
        #print(self.t)
        if self.t.size == self.y.size:  # targets are one-hot vectors
            dx = (self.y - self.t)/ batch_size
        else:
            dx = self.y.copy()
            dx[np.arange(batch_size),self.t] -=1
            dx = dx / batch_size #
        #print('dx:', dx.shape)
        #print(dx)
        if dx.ndim == 1:
            #print(dx.shape[0])
            dx = dx.reshape(dx.shape[0],1)
        return dx

class OneLayerNet:
    def __init__(self, inputSize, hiddenSize1, outputSize):
        scale = 1.0
        # Initialize the weights
        self.params = {}
        # Layer 1: W1 (inputSize, hiddenSize1), b1 (hiddenSize1,)
        self.params['W1'] = scale * np.random.rand(inputSize,hiddenSize1)
        self.params['b1'] = np.random.randn(hiddenSize1)
        # Layer 2: W2 (hiddenSize1, outputSize), b2 (outputSize,)
        self.params['W2'] = scale * np.random.rand(hiddenSize1,outputSize)
        self.params['b2'] = np.random.randn(outputSize)
        

        # Build the layers
        self.layers = OrderedDict()
        self.layers['Affine1'] = Affine(self.params['W1'], self.params['b1'])
        self.layers['Sigmoid1'] = Sigmoid()
        self.layers['Affine2'] = Affine(self.params['W2'], self.params['b2'])

        self.lastLayer = softmaxWithloss()

    def predict(self, x):
        #print("predict")
        #for layer in self.layers.values():
        for keyName, layer in self.layers.items():
            #print(f"{keyName}:",layer)
            x = layer.forward(x)
            #print(x)
        return x
    
    def loss(self, x, t):
        #print('------loss')
        y = self.predict(x)
        #print('------loss/predict y\n',y)
        return self.lastLayer.forward(y,t)
    
    def gradient(self, x, t):
        #print('---------------gradient')
        #forward
        loss= self.loss(x,t)
        #backward
        dout = 1
        dout = self.lastLayer.backward(dout)
        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        grads = {}
        grads['W1'],grads['b1'] = self.layers['Affine1'].dW, self.layers['Affine1'].db
        grads['W2'],grads['b2'] = self.layers['Affine2'].dW, self.layers['Affine2'].db
        return loss,grads
    
class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None
    
    def forward(self,x):
        self.x = x
        out = np.dot(self.x,self.W) + self.b
        return out
    
    def backward(self,dout):
        dx = np.dot(dout,self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout,axis=0)
        return dx

class Relu:
    def __init__(self):
        self.mask = None
    def forward(self, x):
        self.mask = (x<=0) #[[false,true],[false,true]]
        out = x.copy()
        out[self.mask]=0
        return out
    def backward(self,dout):
        dout[self.mask] = 0
        dx = dout
        return dx



class Sigmoid:
    def __init__(self):
        self.out = None
    def forward(self, x):
        out = sigmoid(x)
        self.out = out
        return out
    def backward(self, dout):
        dx = dout *(1.0 - self.out) * self.out
        return dx

trainSize = 4
inputSize = 2
hiddenSize1 = 4
outputSize = 2

np.random.seed(42)
"""x1 = np.random.choice([0,1],size=(trainSize,inputSize))
t = np.zeros((trainSize,outputSize))
for i in range(trainSize):
    if(x1[i][0]==1 & x1[i][1]==1):
        t[i][0] = 1
"""
x1 = np.array([
    [0,0],[0,1],[1,0],[1,1]
])
#t = np.array([[1,0],[1,0],[1,0],[0,1]])

# XOR targets (1 only when the two inputs differ), one-hot encoded
# [0,0]->0, [0,1]->1, [1,0]->1, [1,1]->0
t = np.array([
    [1,0], # False
    [0,1], # True
    [0,1], # True
    [1,0]  # False
])

print("x",x1)
print("t",t)

train_loss_list = []

net = OneLayerNet(inputSize, hiddenSize1, outputSize)
for i in range(60000):
    loss, grads = net.gradient(x1,t)


    for key in ('W1', 'b1', 'W2', 'b2'):
        net.params[key] -= 0.01 * grads[key]

    train_loss_list.append(loss)
    if i% 100 == 0:
        print("loss",loss)

print('*'*50)
np.set_printoptions(suppress=True)

for key in ('W1', 'b1', 'W2', 'b2'):
    print(key)
    print(net.params[key])

print()
print("x:",x1)
print("t:",t)
y = net.predict(x1)
y = y.squeeze()
y = softmax(y)
print("y:",y)

# Inspect the output of the hidden layer (Sigmoid1)
print("\n--- Hidden Layer Output (Transformation) ---")
# Sigmoid output after Affine1, cached by the predict() call above
hidden_output = net.layers['Sigmoid1'].out 
print(hidden_output)

# Plot the training loss curve
x = np.arange(len(train_loss_list))
plt.plot(x,train_loss_list)
plt.xlabel("iterations")
plt.ylabel("loss")
plt.title("Training Loss Curve")
plt.show()

Output: backpropagation XOR

W1
[[ 0.69638447  5.57807722  3.73301229  1.27710358]
 [-0.02576369  5.5070093   3.82151436  1.32686045]]
b1
[ 1.46459913 -2.22010371 -5.78779532  0.7035585 ]
W2
[[ 1.43463791 -1.06940843]
 [-5.00845106  5.83744973]
 [ 5.90110456 -5.1779304 ]
 [ 0.41666394  0.33468281]]
b2
[ 0.30761498 -2.62794276]

x: [[0 0]
 [0 1]
 [1 0]
 [1 1]]
t: [[1 0]
 [0 1]
 [0 1]
 [1 0]]
y: [[0.98193707 0.01806293]
 [0.01690937 0.98309063]
 [0.01852566 0.98147434]
 [0.97960081 0.02039919]]

--- Hidden Layer Output (Transformation) ---
[[0.81223509 0.09795964 0.00305537 0.66897627]
 [0.80827425 0.96397685 0.12278891 0.88395406]
 [0.8966907  0.96636497 0.11356997 0.87875172]
 [0.89427953 0.99985877 0.85405071 0.96468598]]

About --- Hidden Layer Output (Transformation) ---

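How the data maps
This matrix (4 rows by 4 columns) has the following meaning.

Row 1: the input [0, 0] after being transformed by the four hidden-layer neurons

Row 2: the input [0, 1] after being transformed by the four hidden-layer neurons

Row 3: the input [1, 0] after being transformed by the four hidden-layer neurons

Row 4: the input [1, 1] after being transformed by the four hidden-layer neurons

Note: there are four columns because hiddenSize1 = 4 gives the hidden layer four neurons.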

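Why is this "amazing"?
This is what "feature extraction" in deep learning really is. A closer look shows something very interesting. Focus on the third column (second from the right) of the hidden-layer output.

[0, 0] → 0.003 (almost 0)

[0, 1] → 0.122 (small)

[1, 0] → 0.113 (small)

[1, 1] → 0.854 (large!)

This "third-column neuron" has become a specialist that responds strongly (fires) only when the input is [1, 1].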
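The behaviour of that third neuron can be reproduced by hand. The sketch below copies the third column of W1 and the third element of b1 from the printed results and applies the sigmoid. Neither weight alone is enough to overcome the negative bias, but both together are, so the neuron acts roughly like an AND gate.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Third column of W1 and third element of b1, copied from the printed output.
w = np.array([3.73301229, 3.82151436])
b = -5.78779532

# Pre-activation is about -5.79, -1.97, -2.05 and +1.77 for the four inputs.
activation = sigmoid(x @ w + b)
print(np.round(activation, 3))

# Only the [1, 1] row crosses 0.5.
fires = activation > 0.5
print(fires)
```

The values match the third column of the hidden-layer output up to rounding.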

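What is the last layer (the output layer) doing?
The last layer (Affine2 + Softmax) no longer looks at the original inputs such as [0, 0] or [1, 1]. Instead, it makes its decision from the "transformed values" produced by the hidden layer (the four numbers in each row).

Input layer: "[0, 1] and [1, 0] are separate pieces of data..."

Hidden layer: "These two are alike, so let's map them to nearby values!"

Row 2: [0.80, 0.96, 0.12, 0.88]

Row 3: [0.89, 0.96, 0.11, 0.87]

(As you can see, the values are very similar.)

Output layer: "The data coming from the hidden layer looks almost the same. Let's put both in the same 'True' class!"

In short, training succeeded because the hidden layer transformed the data into a convenient form, which made it easy for the last layer to produce the right answer.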
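As a rough check of this reading, the sketch below feeds the printed hidden-layer output straight into the printed W2 and b2, without touching the original inputs. It also compares Euclidean distances between hidden rows, which is my own choice of similarity measure and not something from the book or the original script.

```python
import numpy as np

def softmax(x):
    x = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=1, keepdims=True)

# Hidden-layer output, W2 and b2, copied from the printed results above.
hidden = np.array([
    [0.81223509, 0.09795964, 0.00305537, 0.66897627],  # input [0, 0]
    [0.80827425, 0.96397685, 0.12278891, 0.88395406],  # input [0, 1]
    [0.8966907,  0.96636497, 0.11356997, 0.87875172],  # input [1, 0]
    [0.89427953, 0.99985877, 0.85405071, 0.96468598],  # input [1, 1]
])
W2 = np.array([
    [ 1.43463791, -1.06940843],
    [-5.00845106,  5.83744973],
    [ 5.90110456, -5.1779304 ],
    [ 0.41666394,  0.33468281],
])
b2 = np.array([0.30761498, -2.62794276])

# Affine2 + softmax sees only the hidden representation.
y = softmax(hidden @ W2 + b2)
pred = y.argmax(axis=1)
print(pred)

# [0, 1] and [1, 0] land much closer to each other than to the other inputs.
d_01_10 = np.linalg.norm(hidden[1] - hidden[2])
d_00_01 = np.linalg.norm(hidden[0] - hidden[1])
d_10_11 = np.linalg.norm(hidden[2] - hidden[3])
print(d_01_10, d_00_01, d_10_11)
```

The predicted classes follow the XOR column of the truth table, and the probabilities agree with the printed `y` up to rounding.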
