
Impermanence of all things. "The world is ever changing in search of the optimal solution. Neural networks are no different." by Biwa-hoshi.


"The sound of impermanence of all things" is the opening phrase of the war tale "The Tale of the Heike" from the end of the Heian period, meaning that "the sound of the bell at Gion-shoja has the sound that all phenomena in this world are constantly changing." 
"Impermanence of all things" is one of the three fundamental Buddhist ideas, the Dharma Seals, and means that everything in the world changes, repeating the fate of being born and disappearing, and nothing remains forever.

Short story "The world is ever changing in search of the optimal solution. Neural networks are no different."
Kazuya Sanada, a programmer living in Tokyo, works on optimizing AI models every day. His latest challenge was to uncover the patterns hidden in his data by training neural networks and deriving the best possible model parameters. Training a neural network, however, is no easy task, and the search for an optimal solution can feel like wandering a maze.

Kazuya reflected. "In the end, a neural network is nothing more than an optimization problem: finding the weight matrices that map inputs to the desired outputs." Gradient descent is used to adjust these weights, reducing the error step by step and aiming for convergence, but it can get stuck in local minima and often fails to reach the true optimum. It frustrated him a little that the model he had built with his own skill and time was, in the end, only an "approximate solution."
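
A quick aside on the trap Kazuya describes: the whole mechanism fits in a few lines. Here is a minimal sketch in plain JavaScript, using an illustrative one-dimensional function with two minima (the function and constants are chosen only for this example and are not part of the simulation below). Plain gradient descent settles into whichever basin it starts in:

// Gradient descent on f(x) = x^4 - 3x^2 + x, which has a local minimum
// near x = 1.13 and a deeper global minimum near x = -1.30.
const f = x => x ** 4 - 3 * x ** 2 + x;
const grad = x => 4 * x ** 3 - 6 * x + 1; // df/dx

let x = 1.5;      // start in the basin of the shallower minimum
const lr = 0.01;  // learning rate
for (let i = 0; i < 1000; i++) {
    x -= lr * grad(x); // standard update: step against the gradient
}
console.log(x.toFixed(3), f(x).toFixed(3)); // ~1.131, stuck in the local minimum

No matter how long the loop runs, the descent never crosses the ridge near x = 0.17 to reach the better minimum; this is exactly the "approximate solution" that frustrates him.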

A thought suddenly struck Kazuya. "What if there were an innovative way to optimize these weights..." He began to feel that a more flexible, more creative optimization process was needed to replace plain gradient descent.


What if the direction of the gradient were reversed whenever the error stopped decreasing? By flipping the sign in the gradient descent update and moving in a direction that temporarily spreads the error out, the search might discover paths that were invisible before. Then the sign is restored and the weights are driven toward convergence once more. Each diffusion phase pulls the weights back up the slope before the next convergence phase brings them down again. This alternation was a new attempt to escape local minima and search for the optimal solution across a wider space.

When the error stops decreasing, the gradient direction is reversed to diffuse the weights temporarily, and then the process of converging again is repeated.
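
In code, the only change to the standard update is a sign that flips on a fixed schedule. A minimal sketch on the same toy function as in the aside above (the start point, interval, and learning rate here are illustrative placeholders; the full TensorFlow.js version appears later in this article):

// The story's schedule in one loop: every switchInterval steps the sign
// of the update flips, so descent ("convergence") phases alternate with
// ascent ("diffusion") phases that climb back up the slope.
const loss = x => x ** 4 - 3 * x ** 2 + x;
const dloss = x => 4 * x ** 3 - 6 * x + 1;

let w = 0.5;                // start inside the basin of the local minimum
let sign = 1;               // +1: converge (descend), -1: diffuse (ascend)
const lr = 0.01;
const switchInterval = 300;

for (let i = 0; i < 3000; i++) {
    w -= sign * lr * dloss(w);
    if ((i + 1) % switchInterval === 0) {
        sign *= -1; // switch phase
        console.log(`step ${i + 1}: w = ${w.toFixed(3)}`);
    }
}

Whether a diffusion phase carries the weights over a ridge or merely back up the same slope depends on the interval, the learning rate, and the shape of the loss; the matrix simulation below plots both schedules side by side so the difference can be seen on a real loss curve.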

Kazuya felt motivated to explore further.

Thus he began exploring new methods to evolve the optimization of the weights. It was a dynamic, exploratory journey that went beyond mere numerical adjustment, as if the network were searching for its own "answer." Kazuya resolved to embark on an endless challenge to find the unknown optimal solution that lay ahead.

Staring at the computer screen, he murmured quietly: "Optimizing a neural network may be like life itself. Constantly searching for answers, constantly evolving. Perhaps that is the true optimal solution."


Paste the code below into a text editor such as Notepad and save it as "index.html". Then open the saved file in a browser and the code will run.

Description
Matrix generation:
Generate A, B_normal, B_reversal, and C_target using tf.randomUniform in TensorFlow.js.

Training loop:
At each iteration, compute the current output using tf.matMul.
Compute the error matrix and the MSE loss, update the weights using the gradient A^T * error (the constant factor from the MSE derivative is absorbed into the learning rate), and flip the sign every switchInterval iterations.

Prevent memory leaks:
Release per-iteration tensors with dispose (a tf.tidy-based alternative is sketched after this list).

Graph drawing:
Display lossHistoryNormal and lossHistoryReversal using Chart.js.
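
As an alternative to the manual dispose calls mentioned above, TensorFlow.js also provides tf.tidy, which automatically frees every intermediate tensor created inside its callback except the one it returns. A sketch of one update step rewritten that way (the variable names follow the listing below):

// One update step wrapped in tf.tidy: the matMul result, the error,
// the gradient, and the scaled gradient are all freed automatically;
// only the returned tensor survives the callback.
const B_next = tf.tidy(() => {
    const C_current = tf.matMul(A, B_normal);
    const error = C_current.sub(C_target);
    const grad_B = tf.matMul(A.transpose(), error);
    return B_normal.sub(grad_B.mul(learningRate));
});
B_normal.dispose(); // the old weights were created outside the tidy, so free them by hand
B_normal = B_next;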

When you run this code, the loss curves appear as a graph in your browser, letting you compare normal gradient descent with the sign-reversal version.

The "Tale of the Heike" code:

<!DOCTYPE html>
<html lang="ja">
<head>
    <meta charset="UTF-8">
    <title>Gradient Descent: Normal vs Reversal</title>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
</head>
<body>
    <h1>Gradient Descent Simulation: Normal vs. Sign Reversal</h1>
    <canvas id="chart" width="600" height="400"></canvas>
    <script>
        // Matrix dimension and training hyperparameters
        const n = 10;
        const learningRate = 0.001;
        const iterations = 3000;
        const switchInterval = 300; // flip the gradient sign every 300 iterations
        let sign = 1;               // +1: normal descent, -1: reversed

        // Initialize the matrices: A is fixed, and B is trained so that A*B approaches C_target
        const A = tf.randomUniform([n, n]);
        let B_normal = tf.randomUniform([n, n]);
        let B_reversal = tf.randomUniform([n, n]);
        const C_target = tf.randomUniform([n, n]);

        // Loss histories for plotting
        let lossHistoryNormal = [];
        let lossHistoryReversal = [];

        async function train() {
            for (let i = 0; i < iterations; i++) {
                // Compute the current outputs for the normal and sign-reversal runs
                const C_current_normal = tf.matMul(A, B_normal);
                const C_current_reversal = tf.matMul(A, B_reversal);

                // Compute the error matrices
                const error_normal = C_current_normal.sub(C_target);
                const error_reversal = C_current_reversal.sub(C_target);

                // Compute and record the loss (MSE)
                const lossNormal = error_normal.square().mean().dataSync()[0];
                const lossReversal = error_reversal.square().mean().dataSync()[0];

                lossHistoryNormal.push(lossNormal);
                lossHistoryReversal.push(lossReversal);

                // Compute the gradients and update B.
                // For L = mean((A*B - C)^2), the gradient w.r.t. B is proportional
                // to A^T * (A*B - C); the constant factor is absorbed into the learning rate.
                const grad_B_normal = tf.matMul(A.transpose(), error_normal);
                const grad_B_reversal = tf.matMul(A.transpose(), error_reversal);

                const step_normal = grad_B_normal.mul(learningRate);
                const step_reversal = grad_B_reversal.mul(sign * learningRate);
                const B_normal_next = B_normal.sub(step_normal);
                const B_reversal_next = B_reversal.sub(step_reversal);

                // Free the old weight tensors before replacing them;
                // otherwise they leak one tensor per iteration
                B_normal.dispose();
                B_reversal.dispose();
                B_normal = B_normal_next;
                B_reversal = B_reversal_next;

                // Flip the sign every switchInterval iterations
                if ((i + 1) % switchInterval === 0) {
                    sign *= -1;
                    console.log(`Iteration ${i + 1}: Loss (reversal) = ${lossReversal} (sign reversed)`);
                } else if (i % 100 === 0) {
                    console.log(`Iteration ${i}: Loss (normal) = ${lossNormal}, Loss (reversal) = ${lossReversal}`);
                }

                // Dispose per-iteration tensors to prevent memory leaks
                C_current_normal.dispose();
                C_current_reversal.dispose();
                error_normal.dispose();
                error_reversal.dispose();
                grad_B_normal.dispose();
                grad_B_reversal.dispose();
                step_normal.dispose();
                step_reversal.dispose();

                await tf.nextFrame(); // yield so the browser can repaint
            }
            plotLoss(); // draw the loss curves once training completes
        }

        function plotLoss() {
            const ctx = document.getElementById('chart').getContext('2d');
            const chart = new Chart(ctx, {
                type: 'line',
                data: {
                    labels: Array.from({ length: iterations }, (_, i) => i + 1),
                    datasets: [
                        {
                            label: 'Normal Gradient Descent',
                            data: lossHistoryNormal,
                            borderColor: 'blue',
                            fill: false
                        },
                        {
                            label: 'Reversal Gradient Descent',
                            data: lossHistoryReversal,
                            borderColor: 'red',
                            fill: false
                        }
                    ]
                },
                options: {
                    responsive: true,
                    scales: {
                        x: { type: 'linear', title: { display: true, text: 'Iteration' } },
                        y: { title: { display: true, text: 'Loss' } }
                    }
                }
            });
        }

        // Run the training loop
        train();
    </script>
    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</body>
</html>

