C
アルゴリズム
画像処理
数値計算

2D IDCT を計算する ~ Feig & Winograd

はじめに

次のPaperに記載されている Feig & Winograd のアルゴリズムを説明します。

E. Feig and S. Winograd. Fast Algorithms for the Discrete Cosine Transform, IEEE Trans. Signal Processing, vol. 40, no. 9, pp. 2174-2193, Sep. 1992.

1D DCT

1D DCTの定義式は次の通りです。式1とします。

\begin{eqnarray}
y_i &=& \sum_{k}{c_i \cos\frac{2\pi i(2k+1)}{4N}}x_k \\
c_0 &=& \frac{1}{\sqrt{N}} \\
\end{eqnarray}

i!=0

c_i = \sqrt{\frac{2}{N}} \\

N=8とします。 $\gamma_i = \cos\frac{2\pi i}{32} $ とします。
次のように整理します。

y_i = \sum_k{c_i \gamma_{i(2k+1)}} x_k

行列で表現します。
$C_8$の記号を導入します。i行j列の成分は次のように表現できます。

\begin{eqnarray}
C_8 &=& ( c_i \cos \frac{2\pi i(2j+1)}{32} )_{ij} = ( c_i \gamma_{i(2j+1)} )_{ij} \\
\vec{y} &=& C_8 \vec{x} \\
\vec{y} &=&
\begin{bmatrix}
y_0 & y_1 & y_2 & y_3 & y_4 & y_5 & y_6 & y_7
\end{bmatrix}^t \\
\vec{x} &=&
\begin{bmatrix}
x_0 & x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7
\end{bmatrix}^t \\
\begin{bmatrix}
y_0 \\
y_1 \\
y_2 \\
y_3 \\
y_4 \\
y_5 \\
y_6 \\
y_7 \\
\end{bmatrix}
&=&
\begin{bmatrix}
c_0 \gamma_{0} & c_0 \gamma_{0}  & c_0 \gamma_{0}  & c_0 \gamma_{0}  & c_0 \gamma_{0}  & c_0 \gamma_{0}  & c_0 \gamma_{0}  & c_0 \gamma_{0}  \\
c_1 \gamma_{1} & c_1 \gamma_{3}  & c_1 \gamma_{5}  & c_1 \gamma_{7}  & c_1 \gamma_{9}  & c_1 \gamma_{11} & c_1 \gamma_{13} & c_1 \gamma_{15} \\
c_2 \gamma_{2} & c_2 \gamma_{6}  & c_2 \gamma_{10} & c_2 \gamma_{14} & c_2 \gamma_{18} & c_2 \gamma_{22} & c_2 \gamma_{26} & c_2 \gamma_{30} \\
c_3 \gamma_{3} & c_3 \gamma_{9}  & c_3 \gamma_{15} & c_3 \gamma_{21} & c_3 \gamma_{27} & c_3 \gamma_{33} & c_3 \gamma_{39} & c_3 \gamma_{45} \\
c_4 \gamma_{4} & c_4 \gamma_{12} & c_4 \gamma_{20} & c_4 \gamma_{28} & c_4 \gamma_{36} & c_4 \gamma_{44} & c_4 \gamma_{52} & c_4 \gamma_{60} \\
c_5 \gamma_{5} & c_5 \gamma_{15} & c_5 \gamma_{25} & c_5 \gamma_{35} & c_5 \gamma_{45} & c_5 \gamma_{55} & c_5 \gamma_{65} & c_5 \gamma_{75} \\
c_6 \gamma_{6} & c_6 \gamma_{18} & c_6 \gamma_{30} & c_6 \gamma_{42} & c_6 \gamma_{54} & c_6 \gamma_{66} & c_6 \gamma_{78} & c_6 \gamma_{90} \\
c_7 \gamma_{7} & c_7 \gamma_{21} & c_7 \gamma_{35} & c_7 \gamma_{49} & c_7 \gamma_{63} & c_7 \gamma_{77} & c_7 \gamma_{91} & c_7 \gamma_{105} \\
\end{bmatrix}
\begin{bmatrix}
x_0 \\
x_1 \\
x_2 \\
x_3 \\
x_4 \\
x_5 \\
x_6 \\
x_7 \\
\end{bmatrix} \\
\end{eqnarray}

$c_0 \gamma_0$は$\frac{1}{2} \gamma_4$になります。詳細は次の通りです。

c_0 \gamma_0 = \frac{1}{\sqrt{8}} \cos 0 = \frac{1}{2\sqrt{2}} =  \frac{1}{2} \cos \frac{2 \pi}{8} = \frac{1}{2} \cos \frac{2 \pi * 4}{32} = \frac{1}{2} \gamma_{4}

0行目以降は次の通りcosの周期性と対称性から$\gamma_1$ ~ $\gamma_7$のみで表現できます。

image.png

$C_8$は次のように整理できます。

C_8 = \frac{1}{2}
\begin{bmatrix}
\gamma_4 & \gamma_4 & \gamma_4 & \gamma_4 & \gamma_4 & \gamma_4 & \gamma_4 & \gamma_4 \\
\gamma_1 & \gamma_3 & \gamma_5 & \gamma_7 & -\gamma_7 & -\gamma_5 & -\gamma_3 & -\gamma_1 \\
\gamma_2 & \gamma_6 & -\gamma_6 & -\gamma_2 & -\gamma_2 & -\gamma_6 & \gamma_6 & \gamma_2 \\
\gamma_3 & -\gamma_7 & -\gamma_1 & -\gamma_5 & \gamma_5 & \gamma_1 & \gamma_7 & -\gamma_3 \\
\gamma_4 & -\gamma_4 & -\gamma_4 & \gamma_4 & \gamma_4 & -\gamma_4 & -\gamma_4 & \gamma_4 \\
\gamma_5 & -\gamma_1 & \gamma_7 & \gamma_3 & -\gamma_3 & -\gamma_7 & \gamma_1 & -\gamma_5 \\
\gamma_6 & -\gamma_2 & \gamma_2 & -\gamma_6 & -\gamma_6 & \gamma_2 & -\gamma_2 & \gamma_6 \\
\gamma_7 & -\gamma_5 & \gamma_3 & -\gamma_1 & \gamma_1 & -\gamma_3 & \gamma_5 & -\gamma_7 \\
\end{bmatrix}

$C_8$は次のように分解できます。

C_8 = P_8 K_8 B \\

$B$は次の通りです。

\begin{eqnarray}
B &=& B_1 B_2 B_3 \\
B_3 &=&
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 \\
\end{bmatrix} \\
B_2 &=&
\begin{bmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
\end{bmatrix} \\
B_1 &=&
\begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
\end{bmatrix} \\
\end{eqnarray}

$K_8$は次のように分解できます。

\begin{eqnarray}
K_8 &=& \frac{1}{2}
\begin{bmatrix}
G_1 &     &     &     \\
    & G_1 &     &     \\
    &     & G_2 &     \\
    &     &     & G_4 \\
\end{bmatrix} \\
G_1 &=& \gamma_4 \\
G_2 &=& 
\begin{bmatrix}
\gamma_6 &  \gamma_2 \\
-\gamma_2 &  \gamma_6 \\
\end{bmatrix} \\
G_4 &=&
\begin{bmatrix}
\gamma_5 & -\gamma_7 & \gamma_3 & \gamma_1 \\
-\gamma_1 & \gamma_5 & -\gamma_7 & \gamma_3 \\
-\gamma_3 & -\gamma_1 & \gamma_5 & -\gamma_7 \\
\gamma_7 & -\gamma_3 & -\gamma_1 & \gamma_5 \\
\end{bmatrix}
\end{eqnarray}

$P_8$は次の通りです。

P_8 =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
\end{bmatrix} \\

$G_2$は次のように変形できます。加算3回、乗算3回で計算できます。

G_2 = \frac{1}{2}
\begin{bmatrix}
\gamma_6 & \\
& \gamma_2 \\
\end{bmatrix}^{-1}
\begin{bmatrix}
1 & 1 \\
-1 & 1 \\
\end{bmatrix}
\begin{bmatrix}
1 & \\
 & G_1 \\
\end{bmatrix}
\begin{bmatrix}
1 & 0 \\
-1 & 1 \\
\end{bmatrix}

$G_4$は次のように変形できます。加算12回、乗算8回で計算できます。

\begin{eqnarray}
G_4 &=& \frac{1}{2} D_4^{-1} H_{4,1}
\begin{bmatrix}
1 & & \\
& G_1 & \\
& & G_2 \\
\end{bmatrix}
H_{4,2} \\
D_4 &=&
\begin{bmatrix}
\gamma_5 & & & \\
 & \gamma_1 & & \\
 & & \gamma_3 & \\
 & & & \gamma_7 \\
\end{bmatrix} \\
H_{4,1} &=&
\begin{bmatrix}
1 & 1 & -1 & 0 \\
-1 & 1 & 0 & 1 \\
-1 & -1 & -1 & 0 \\
1 & -1 & 0 & 1 \\
\end{bmatrix} \\
H_{4,2} &=&
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & 0 & -1 \\
0 & 1 & -1 & 0 \\
\end{bmatrix} \\
\end{eqnarray}

2D DCT

2D DCTの定義式は次の通りです。式2とします。

y_{ij} = \sum_{l}{c_j \gamma_{j(2l+1)}} x_{ij} \sum_{k} {c_i \gamma_{i(2k+1)}} \\

Xを行列とします。次のように定義します。

X = 
\begin{bmatrix}
x_{00} & x_{01} & x_{02} & x_{03} & x_{04} & x_{05} & x_{06} & x_{07} \\
x_{10} & x_{11} & x_{12} & x_{13} & x_{14} & x_{15} & x_{16} & x_{17} \\
x_{20} & x_{21} & x_{22} & x_{23} & x_{24} & x_{25} & x_{26} & x_{27} \\
x_{30} & x_{31} & x_{32} & x_{33} & x_{34} & x_{35} & x_{36} & x_{37} \\
x_{40} & x_{41} & x_{42} & x_{43} & x_{44} & x_{45} & x_{46} & x_{47} \\
x_{50} & x_{51} & x_{52} & x_{53} & x_{54} & x_{55} & x_{56} & x_{57} \\
x_{60} & x_{61} & x_{62} & x_{63} & x_{64} & x_{65} & x_{66} & x_{67} \\
x_{70} & x_{71} & x_{72} & x_{73} & x_{74} & x_{75} & x_{76} & x_{77} \\
\end{bmatrix} =
\begin{bmatrix}
\vec{x_0} & \vec{x_1} & \vec{x_2} & \vec{x_3} & \vec{x_4} & \vec{x_5} & \vec{x_6} & \vec{x_7}
\end{bmatrix} \\

$\vec{x_0}$は列ベクトルです。$\vec{x_1}$以降も同様です。

\vec{x_0} = 
\begin{bmatrix}
x_{00} \\
x_{10} \\
x_{20} \\
x_{30} \\
x_{40} \\
x_{50} \\
x_{60} \\
x_{70} \\
\end{bmatrix}

$\vec{x}$を次のように定義します。行列Xを1本の列ベクトルにしたものです。

\vec{x} = 
\begin{bmatrix}
\vec{x_0} \\
\vec{x_1} \\
\vec{x_2} \\
\vec{x_3} \\
\vec{x_4} \\
\vec{x_5} \\
\vec{x_6} \\
\vec{x_7} \\
\end{bmatrix}

Yを行列とします。次のように定義します。

Y = 
\begin{bmatrix}
y_{00} & y_{01} & y_{02} & y_{03} & y_{04} & y_{05} & y_{06} & y_{07} \\
y_{10} & y_{11} & y_{12} & y_{13} & y_{14} & y_{15} & y_{16} & y_{17} \\
y_{20} & y_{21} & y_{22} & y_{23} & y_{24} & y_{25} & y_{26} & y_{27} \\
y_{30} & y_{31} & y_{32} & y_{33} & y_{34} & y_{35} & y_{36} & y_{37} \\
y_{40} & y_{41} & y_{42} & y_{43} & y_{44} & y_{45} & y_{46} & y_{47} \\
y_{50} & y_{51} & y_{52} & y_{53} & y_{54} & y_{55} & y_{56} & y_{57} \\
y_{60} & y_{61} & y_{62} & y_{63} & y_{64} & y_{65} & y_{66} & y_{67} \\
y_{70} & y_{71} & y_{72} & y_{73} & y_{74} & y_{75} & y_{76} & y_{77} \\
\end{bmatrix} =
\begin{bmatrix}
\vec{y_0} & \vec{y_1} & \vec{y_2} & \vec{y_3} & \vec{y_4} & \vec{y_5} & \vec{y_6} & \vec{y_7}
\end{bmatrix} \\

$\vec{y_0}$は列ベクトルです。$\vec{y_1}$以降も同様です。

\vec{y_0} = 
\begin{bmatrix}
y_{00} \\
y_{10} \\
y_{20} \\
y_{30} \\
y_{40} \\
y_{50} \\
y_{60} \\
y_{70} \\
\end{bmatrix}

$\vec{y}$を次のように定義します。行列Yを1本の列ベクトルにしたものです。

\vec{y} = 
\begin{bmatrix}
\vec{y_0} \\
\vec{y_1} \\
\vec{y_2} \\
\vec{y_3} \\
\vec{y_4} \\
\vec{y_5} \\
\vec{y_6} \\
\vec{y_7} \\
\end{bmatrix}

式2は次のように表現できます。

Y = C_8 X C_8^{T}

上記式は $\vec{x}, \vec{y}$ と $C_8$のKronecker productを用いて次のように変形できます。

\vec{y} = (C_8 \otimes C_8) \vec{x}

$C_8 \otimes C_8$は64x64の巨大な行列になります。
Feig & Winogradはこの行列をうまく変形して計算量を削減します。

$C_8 \otimes C_8$を$P_8,K_8,B$で表します。

(C_8 \otimes C_8) = (P_8 \otimes P_8)(K_8 \otimes K_8)(B \otimes B)

$K_8 \otimes K_8$を計算します。

(K_8 \otimes K_8) = \frac{1}{4}
\begin{bmatrix}
G_1 & & & \\
 & G_1 & & \\
 & & G_2 & \\
 & & & G_4 \\
\end{bmatrix}
\begin{bmatrix}
G_1 & & & \\
 & G_1 & & \\
 & & G_2 & \\
 & & & G_4 \\
\end{bmatrix}

次に$G_2 \otimes G_2$ $G_2 \otimes G_4$ $G_4 \otimes G_4$ の計算を説明します。
この計算を行うにあたって記号を導入します。

$\tilde{\rho_n}(A)$は行列Aの要素に$\rho_n$を適用したものです。nは2 or 4です。
具体的にAを2x2行列とすると次のようになります。

\tilde{\rho_n}(A) = 
\begin{bmatrix}
\rho_n(A_{00}) & \rho_n(A_{01}) \\
\rho_n(A_{10}) & \rho_n(A_{11}) \\
\end{bmatrix}

次のように分解できます。

\begin{eqnarray}
G_2 \otimes G_2 &=& (2\tilde{\rho}_2(V_2)^{-1})(\frac{1}{2}D_{2,2})(\tilde{\rho}_2(V_2)) \\
G_2 \otimes G_4 &=& (2\tilde{\rho}_4(V_2)^{-1})(\frac{1}{2}D_{2,4})(\tilde{\rho}_4(V_2)) \\
G_4 \otimes G_4 &=& (4\tilde{\rho}_4(V_4)^{-1})(\frac{1}{4}D_{4,4})(\tilde{\rho}_4(V_4)) \\
\end{eqnarray}

$V_2$は次の通りです。

\begin{eqnarray}
V_2 &=& 
\begin{bmatrix}
 1 & \mathrm{i} \\
 1 & -\mathrm{i} \\
\end{bmatrix} =
\begin{bmatrix}
 1 & 1 \\
 1 & -1 \\
\end{bmatrix}
\begin{bmatrix}
 1 & 0 \\
 0 & \mathrm{i} \\
\end{bmatrix} \\
V_2^{-1} &=& \frac{1}{2}
\begin{bmatrix}
 1 & 0 \\
 0 & -\mathrm{i} \\
\end{bmatrix}
\begin{bmatrix}
 1 & 1 \\
 1 & -1 \\
\end{bmatrix}
\end{eqnarray}

$V_4$は次の通りです。

\begin{eqnarray}
V_4 &=& 
\begin{bmatrix}
 1 & 0 & 0 & 0 \\
 0 & 0 & 1 & 0 \\
 0 & 1 & 0 & 0 \\
 0 & 0 & 0 & 1 \\
\end{bmatrix}
\begin{bmatrix}
 1 & 1 & 0 & 0 \\
 1 & -1 & 0 & 0 \\
 0 & 0 & 1 & 1 \\
 0 & 0 & 1 & -1 \\
\end{bmatrix}
\begin{bmatrix}
 1 & 0 & 0 & 0 \\
 0 & 1 & 0 & 0 \\
 0 & 0 & 1 & 0 \\
 0 & 0 & 0 & \mathrm{i} \\
\end{bmatrix}
\begin{bmatrix}
 1 & 0 & 1 & 0 \\
 0 & 1 & 0 & 1 \\
 1 & 0 & -1 & 0 \\
 0 & 1 & 0 & -1 \\
\end{bmatrix}
\begin{bmatrix}
 1 & 0 & 0 & 0 \\
 0 & \mathrm{w} & 0 & 0 \\
 0 & 0 & \mathrm{w}^2 & 0 \\
 0 & 0 & 0 & \mathrm{w}^3 \\
\end{bmatrix} \\

V_4^{-1} &=& \frac{1}{4}
\begin{bmatrix}
 1 & 0 & 0 & 0 \\
 0 & -\mathrm{w}^3 & 0 & 0 \\
 0 & 0 & -\mathrm{w}^2 & 0 \\
 0 & 0 & 0 & -\mathrm{w} \\
\end{bmatrix}
\begin{bmatrix}
 1 & 0 & 1 & 0 \\
 0 & 1 & 0 & 1 \\
 1 & 0 & -1 & 0 \\
 0 & 1 & 0 & -1 \\
\end{bmatrix}
\begin{bmatrix}
 1 & 0 & 0 & 0 \\
 0 & 1 & 0 & 0 \\
 0 & 0 & 1 & 0 \\
 0 & 0 & 0 & -\mathrm{i} \\
\end{bmatrix}
\begin{bmatrix}
 1 & 1 & 0 & 0 \\
 1 & -1 & 0 & 0 \\
 0 & 0 & 1 & 1 \\
 0 & 0 & 1 & -1 \\
\end{bmatrix}
\begin{bmatrix}
 1 & 0 & 0 & 0 \\
 0 & 0 & 1 & 0 \\
 0 & 1 & 0 & 0 \\
 0 & 0 & 0 & 1 \\
\end{bmatrix}
\end{eqnarray}

$D_{2,2}$は次の通りです。

D_{2,2} = 
\begin{bmatrix}
 -\gamma_4 & \gamma_4 & 0 & 0 \\
 -\gamma_4 & -\gamma_4 & 0 & 0 \\
 0 & 0 & 1 & 0 \\
 0 & 0 & 0 & 1 \\
\end{bmatrix}

$D_{2,4}$は次の通りです。

\begin{eqnarray}
D_{2,4} &=& 
\begin{bmatrix}
 D_{2,4,1} \\
 & D_{2,4,2} \\ 
\end{bmatrix} \\
D_{2,4,1} &=& 
\begin{bmatrix}
 -\gamma_5 & -\gamma_1 & \gamma_3 & \gamma_7 \\
 -\gamma_7 & -\gamma_5 & -\gamma_1 & \gamma_3 \\
 -\gamma_3 & -\gamma_7 & -\gamma_5 & -\gamma_1 \\
  \gamma_1 & -\gamma_3 & -\gamma_7 & -\gamma_5 \\
\end{bmatrix} \\
D_{2,4,2} &=& 
\begin{bmatrix}
 \gamma_1 & \gamma_3 & -\gamma_7 & \gamma_5 \\
 -\gamma_5 & \gamma_1 & \gamma_3 & -\gamma_7 \\
 \gamma_7 & -\gamma_5 &  \gamma_1 & \gamma_3 \\
 -\gamma_3 & \gamma_7 &  -\gamma_5 & \gamma_1 \\
\end{bmatrix} \\
\end{eqnarray}

$D_{2,4,1}, D_{2,4,2}$ は $G_4$ で表せます。

\begin{eqnarray}
P_{2,4,1} &=& 
\begin{bmatrix}
 1 & 0 & 0 & 0 \\
 0 & 0 & 0 & 1 \\
 0 & 0 & -1 & 0 \\
 0 & 1 & 0 & 0 \\
\end{bmatrix} \\
D_{2,4,1} &=& - P_{2,4,1} G_4 P_{2,4,1} \\

P_{2,4,2,1} &=& 
\begin{bmatrix}
 0 & 1 & 0 & 0 \\
 1 & 0 & 0 & 0 \\
 0 & 0 & 0 & -1 \\
 0 & 0 & -1 & 0 \\
\end{bmatrix} \\

P_{2,4,2,2} &=& 
\begin{bmatrix}
 -1 & 0 & 0 & 0 \\
 0 & 0 & 0 & 1 \\
 0 & 0 & 1 & 0 \\
 0 & 1 & 0 & 0 \\
\end{bmatrix} \\

D_{2,4,2} &=& P_{2,4,2,1} G_4 P_{2,4,2,2} \\
\end{eqnarray}

$D_{4,4}$は次の通りです。

\begin{eqnarray}
D_{4,4} &=& 2
\begin{bmatrix}
 H_1 \\
 & H_2 \\
 & & H_3 \\
 & & & H_4 \\
\end{bmatrix} \\
H_1 &=& 
\begin{bmatrix}
 0 & -\gamma_2 & 0 & \gamma_6 \\
 -\gamma_6 & 0 & -\gamma_2 & 0 \\
 0 & -\gamma_6 & 0 & -\gamma_2 \\
 \gamma_2 & 0 & -\gamma_6 & 0 \\
\end{bmatrix} \\
H_2 &=&
\begin{bmatrix}
 0 & \gamma_4 & 0 & \gamma_4 \\
 -\gamma_4 & 0 & \gamma_4 & 0 \\
 0 & -\gamma_4 & 0 & \gamma_4 \\
 -\gamma_4 & 0 & -\gamma_4 & 0 \\
\end{bmatrix} \\
H_3 &=&
\begin{bmatrix}
 -\gamma_6 & 0 & \gamma_2 & 0 \\
 0 & -\gamma_6 & 0 & \gamma_2 \\
 -\gamma_2 & 0 & -\gamma_6 & 0 \\
 0 & -\gamma_2 & 0 & -\gamma_6 \\
\end{bmatrix} \\
H_4 &=& I_4
\end{eqnarray}

$H_1とH_3$は$G_2$で表せます。

\begin{eqnarray}
H_1 &=& P_{H1,1}
\begin{bmatrix}
 G_2 \\
 & G_2 \\
\end{bmatrix}
P_{H1,2} \\

P_{H1,1} &=&
\begin{bmatrix}
 0 & 0 & 0 & 1 \\
 -1 & 0 & 0 & 0 \\
 0 & 0 & -1 & 0 \\
 0 & -1 & 0 & 0 \\
\end{bmatrix} \\

P_{H1,2} &=&
\begin{bmatrix}
 1 & 0 & 0 & 0 \\
 0 & 0 & 1 & 0 \\
 0 & 1 & 0 & 0 \\
 0 & 0 & 0 & 1 \\
\end{bmatrix} \\

H_3 &=& - P_{H3}
\begin{bmatrix}
 G_2^t \\
 & G_2^t \\
\end{bmatrix}
P_{H3} \\

P_{H3} &=&
\begin{bmatrix}
 1 & 0 & 0 & 0 \\
 0 & 0 & 1 & 0 \\
 0 & 1 & 0 & 0 \\
 0 & 0 & 0 & 1 \\
\end{bmatrix} \\
\end{eqnarray}

$G_4 \otimes G_2$はperfect-shuffle permutation($P_s$)を用いることで $G_2 \otimes G_4$ で表せます。

G_4 \otimes G_2 = P_s (G_2 \otimes G_4) P_s^{-1}

$P_s$は次の通りです。

P_s = 
\begin{bmatrix}
 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
\end{bmatrix}

$\rho_2, \rho_4$は次の通りです。

\begin{eqnarray}
\rho_2(\mathrm{i}) &=& - \rho_2(\mathrm{i})^t =
\begin{bmatrix}
 0 & -1 \\
 1 & 0 \\
\end{bmatrix} \\

\mathrm{w} &=& \mathrm{e}^{\frac{2\pi \mathrm{i}}{8}} \\

\rho_4(\mathrm{w}) &=& -\rho_4(\mathrm{w}^3)^t =
\begin{bmatrix}
 0 & 0 & 0 & -1 \\
 1 & 0 & 0 & 0 \\
 0 & 1 & 0 & 0 \\
 0 & 0 & 1 & 0 \\
\end{bmatrix} \\

\rho_4(\mathrm{i}) &=& \rho_4(\mathrm{w}^2) = -\rho_4(\mathrm{w}^2)^t =
\begin{bmatrix}
 0 & 0 & -1 & 0 \\
 0 & 0 & 0 & -1 \\
 1 & 0 & 0 & 0 \\
 0 & 1 & 0 & 0 \\
\end{bmatrix} \\

\rho_4(\mathrm{w}^3) &=& -\rho_4(\mathrm{w})^t =
\begin{bmatrix}
 0 & -1 & 0 & 0 \\
 0 & 0 & -1 & 0 \\
 0 & 0 & 0 & -1 \\
 1 & 0 & 0 & 0 \\
\end{bmatrix} \\
\end{eqnarray}

計算回数をまとめます。

Times Operator Mults Adds Shifts Total Mults Total Adds Total Shifts
4 $G_1 G_1$ 0 0 1 0 0 4
4 $G_1 G_2$ 3 3 0 12 12 0
4 $G_1 G_4$ 8 12 0 32 48 0
1 $G_2 \otimes G_2$ 2 10 2 2 10 2
2 $G_2 \otimes G_4$ 16 40 0 32 80 0
1 $G_4 \otimes G_4$ 16 80 4 16 80 4
$B \otimes B$ 224
Total 94 454 10

8x8 の 2D DCT は 加算454回、乗算94回、シフト10回で計算できます。

2D IDCT

2D IDCTの計算は次のように2D DCTのアルゴリズムをそのまま利用できます。
行列を転置するだけです。行列の掛け算の順番が左右逆転することに注意してください。

\begin{eqnarray}
(C_8 \otimes C_8)^t &=& (B^t \otimes B^t)(K_8^t \otimes K_8^t)(P_8^t \otimes P_8^t) \\
(C_2^t \otimes C_2^t) &=& \tilde{\rho_2}(V_2)^t (\frac{1}{2}D_{2,2}^t) (2\tilde{\rho_2}(V_2)^{\mathrm{*}}) \\ 
(C_2^t \otimes C_4^t) &=& \tilde{\rho_4}(V_2)^t (\frac{1}{2}D_{2,4}^t) (2\tilde{\rho_4}(V_2)^{\mathrm{*}}) \\ 
(C_4^t \otimes C_4^t) &=& \tilde{\rho_4}(V_4)^t (\frac{1}{4}D_{4,4}^t) (4\tilde{\rho_4}(V_4)^{\mathrm{*}}) \\ 
B^t \otimes B^t &=& (B_3^t \otimes B_3^t)(B_2^t \otimes B_2^t)(B_1^t \otimes B_1^t) \\
\end{eqnarray}

$\mathrm{*}$は逆行列を適用後に転置する操作を表します。

sample code

サンプルコードを示します。

makefile
CFLAGS=-I. -Wall -Werror -O2 -march=native
INCS=
OBJS=test.c
LIBS=
TARGET=test

all: $(TARGET)

%.o: %.c $(INCS)
    $(CC) $(CFLAGS) -c -o $@ $<

$(TARGET): $(OBJS)
    $(CC) $(CFLAGS) -o $@ $^ $(LIBS)

clean:
    rm -rf $(TARGET) *.o
test.c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
  k g(k) = cos(2π*k/32)
  1 0.980785280403230
  2 0.923879532511287
  3 0.831469612302545
  4 0.707106781186548
  5 0.555570233019602
  6 0.382683432365090
  7 0.195090322016128
*/
static const double g4 = 0.707106781186548;
static const double G1G1 = 0.125; // g4 * g4 / 4.0
static const double G1G2[] = {
  0.230969883127822, // g4 / g6 / 8.0
  0.095670858091273, // g4 / g2 / 8.0
};
static const double G2[] = {
  1.30656296487638,  // 1.0 / g6 / 2.0
  0.54119610014620,  // 1.0 / g2 / 2.0
};
static const double G2_2[] = {
  0.1633203706095470,  // 1.0 / g6 / 16.0
  0.0676495125182746,  // 1.0 / g2 / 16.0

};
static const double G1G4[] = {
  0.159094822571604, // g4 / g5 / 8.0
  0.090119977750869, // g4 / g1 / 8.0
  0.106303761845907, // g4 / g3 / 8.0
  0.453063723176445, // g4 / g7 / 8.0
};
static const double G2G2[] = {
  0.0883883476483185, // g4 / 8.0
};
static const double G2G4[] = {
  0.1124970278920520, // 1.0 / g5 / 16.0
  0.0637244473880199, // 1.0 / g1 / 16.0
  0.0751681108668807, // 1.0 / g3 / 16.0
  0.3203644309676890, // 1.0 / g7 / 16.0
};
static const double G4G4[] = {
  0.0478354290456363, // g6 / 8.0
  0.1154849415639110, // g2 / 8.0
  0.0883883476483185, // g4 / 8.0
};

static void printElements(char* header, double *data, int len)
{
  int i;

  printf("%s", header);
  for (i=0; i<len; i++) {
    printf("[%02d] %+.7lf\n", i, data[i]);
  }
}

static void IDCT(int16_t F[][8], int16_t f[][8])
{
  int i,j;
  double tF[64] = {0}; /* tmp F */
  double P[64] = {0};  /* Pt x Pt */
  double G[64] = {0};  /* Gt x Gt */
  double B[64] = {0};  /* Bt x Bt */

  /* (C x C)t = (Bt x Bt)(Kt x Kt)(Pt x Pt) */
  for (i=0; i<8; i++) {
    for (j=0; j<8; j++) {
      tF[i*8+j] = F[j][i];
    }
  }
  if (0) printElements("tF\n", tF, 64);
  /* '-' = -1 */
  /* Pt =            */
  /* 1 0 0 0 0 0 0 0 */
  /* 0 0 0 0 1 0 0 0 */
  /* 0 0 1 0 0 0 0 0 */
  /* 0 0 0 0 0 0 1 0 */
  /* 0 - 0 0 0 0 0 0 */
  /* 0 0 0 - 0 0 0 0 */
  /* 0 0 0 0 0 0 0 1 */
  /* 0 0 0 0 0 - 0 0 */
  /* Pt x Pt = */
  /* --------------------------------------------------------------------------------------------------------------------------------------------- */
  /* 1 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 1 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 1 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 1 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 - 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 - 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 1 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 - 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* --------------------------------------------------------------------------------------------------------------------------------------------- */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 1 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 1 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 1 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 1 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 - 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 - 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 1 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 - 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* --------------------------------------------------------------------------------------------------------------------------------------------- */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 1 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 1 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 1 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 1 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 - 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 - 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 1 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 - 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* --------------------------------------------------------------------------------------------------------------------------------------------- */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 1 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 1 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 1 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 1 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 - 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 - 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 1 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 - 0 0 | 0 0 0 0 0 0 0 0 */
  /* --------------------------------------------------------------------------------------------------------------------------------------------- */
  /* 0 0 0 0 0 0 0 0 | - 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 - 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 - 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 - 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 1 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 1 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 - | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 1 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* --------------------------------------------------------------------------------------------------------------------------------------------- */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | - 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 - 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 - 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 - 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 1 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 1 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 - | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 1 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* --------------------------------------------------------------------------------------------------------------------------------------------- */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 1 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 1 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 1 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 1 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 - 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 - 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 1 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 - 0 0 */
  /* --------------------------------------------------------------------------------------------------------------------------------------------- */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | - 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 - 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 - 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 - 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 1 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 1 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 - | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 1 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
  /* --------------------------------------------------------------------------------------------------------------------------------------------- */
  P[ 0] =  tF[ 0 + 0];
  P[ 1] =  tF[ 0 + 4];
  P[ 2] =  tF[ 0 + 2];
  P[ 3] =  tF[ 0 + 6];
  P[ 4] = -tF[ 0 + 1];
  P[ 5] = -tF[ 0 + 3];
  P[ 6] =  tF[ 0 + 7];
  P[ 7] = -tF[ 0 + 5];

  P[ 8] =  tF[32 + 0];
  P[ 9] =  tF[32 + 4];
  P[10] =  tF[32 + 2];
  P[11] =  tF[32 + 6];
  P[12] = -tF[32 + 1];
  P[13] = -tF[32 + 3];
  P[14] =  tF[32 + 7];
  P[15] = -tF[32 + 5];

  P[16] =  tF[16 + 0];
  P[17] =  tF[16 + 4];
  P[18] =  tF[16 + 2];
  P[19] =  tF[16 + 6];
  P[20] = -tF[16 + 1];
  P[21] = -tF[16 + 3];
  P[22] =  tF[16 + 7];
  P[23] = -tF[16 + 5];

  P[24] =  tF[48 + 0];
  P[25] =  tF[48 + 4];
  P[26] =  tF[48 + 2];
  P[27] =  tF[48 + 6];
  P[28] = -tF[48 + 1];
  P[29] = -tF[48 + 3];
  P[30] =  tF[48 + 7];
  P[31] = -tF[48 + 5];

  P[32] = -tF[ 8 + 0];
  P[33] = -tF[ 8 + 4];
  P[34] = -tF[ 8 + 2];
  P[35] = -tF[ 8 + 6];
  P[36] =  tF[ 8 + 1];
  P[37] =  tF[ 8 + 3];
  P[38] = -tF[ 8 + 7];
  P[39] =  tF[ 8 + 5];

  P[40] = -tF[24 + 0];
  P[41] = -tF[24 + 4];
  P[42] = -tF[24 + 2];
  P[43] = -tF[24 + 6];
  P[44] =  tF[24 + 1];
  P[45] =  tF[24 + 3];
  P[46] = -tF[24 + 7];
  P[47] =  tF[24 + 5];

  P[48] =  tF[56 + 0];
  P[49] =  tF[56 + 4];
  P[50] =  tF[56 + 2];
  P[51] =  tF[56 + 6];
  P[52] = -tF[56 + 1];
  P[53] = -tF[56 + 3];
  P[54] =  tF[56 + 7];
  P[55] = -tF[56 + 5];

  P[56] = -tF[40 + 0];
  P[57] = -tF[40 + 4];
  P[58] = -tF[40 + 2];
  P[59] = -tF[40 + 6];
  P[60] =  tF[40 + 1];
  P[61] =  tF[40 + 3];
  P[62] = -tF[40 + 7];
  P[63] =  tF[40 + 5];

  if (0) printElements("P\n", P, 64);

  /* Kt x Kt = Gt x Gt */
  /* Gt  = */
  /* G1  0   0   0 */
  /*  0 G1   0   0 */
  /*  0  0 G2t   0 */
  /*  0  0   0 G4t */
  /* (*)G1t = G1 */
  /* Gt x Gt = 1/4 */
  /* ------------------------------------------------------------------------------------------------- */
  /* G1G1    0    0    0|   0    0     0     0|    0     0       0       0|    0     0       0     0   */
  /*    0 G1G1    0    0|   0    0     0     0|    0     0       0       0|    0     0       0     0   */
  /*    0    0 G1G2t   0|   0    0     0     0|    0     0       0       0|    0     0       0     0   */
  /*    0    0    0 G1G4|   0    0     0     0|    0     0       0       0|    0     0       0     0   */
  /* ------------------------------------------------------------------------------------------------- */
  /*    0    0    0    0|G1G1    0     0     0|    0     0       0       0|    0     0       0     0   */
  /*    0    0    0    0|   0 G1G1     0     0|    0     0       0       0|    0     0       0     0   */
  /*    0    0    0    0|   0    0  G1G2t    0|    0     0       0       0|    0     0       0     0   */
  /*    0    0    0    0|   0    0     0 G1G4t|    0     0       0       0|    0     0       0     0   */
  /* ------------------------------------------------------------------------------------------------- */
  /*    0    0    0    0|   0    0     0     0|G2tG1     0       0       0|    0     0       0     0   */
  /*    0    0    0    0|   0    0     0     0|    0 G2tG1       0       0|    0     0       0     0   */
  /*    0    0    0    0|   0    0     0     0|    0     0 G2txG2t       0|    0     0       0     0   */
  /*    0    0    0    0|   0    0     0     0|    0     0       0 G2txG4t|    0     0       0     0   */
  /* ------------------------------------------------------------------------------------------------- */
  /*    0    0    0    0|   0    0     0     0|    0     0       0       0|G4tG1     0       0     0   */
  /*    0    0    0    0|   0    0     0     0|    0     0       0       0|    0 G4tG1       0     0   */
  /*    0    0    0    0|   0    0     0     0|    0     0       0       0|    0     0 G4txG2t     0   */
  /*    0    0    0    0|   0    0     0     0|    0     0       0       0|    0     0       0 G4txG4t */
  /* ------------------------------------------------------------------------------------------------- */
  /* G1G1 */
  /* mul  1 add   0 shift 0 or mul  0 add   0 shift 1 */
  G[0] = G1G1 * P[0];
  /* G1G1 */
  /* mul  1 add   0 shift 0 or mul  0 add   0 shift 1 */
  G[1] = G1G1 * P[1];
  /* G2t = 1/2 */
  /*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
  /*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
  /* G1G2t */
  /* mul  3 add   3 shift 0 */
  {
    double T1[2], T2[2], T3[2];
    T1[0] = G1G2[0] * P[2]; // G1G2[0] = g4 * 1.0/g6/2.0 / 4.0
    T1[1] = G1G2[1] * P[3]; // G1G2[1] = g4 * 1.0/g2/2.0 / 4.0
    T2[0] = T1[0] - T1[1];
    T2[1] = T1[0] + T1[1];
    T3[0] = T2[0];
    T3[1] = g4 * T2[1];
    G[2] = T3[0] - T3[1];
    G[3] = T3[1];
  }
  /* G4t = 1/2 */
  /*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
  /*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
  /*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
  /*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
  /* G1G4t */
  /* mul  8 add  12 shift 0 */
  {
    double T1[4], T2[4], T3[4], T4[2], T5[2], T6[2], tmp1, tmp2;
    T1[0] = G1G4[0] * P[4]; // G1G4[0] = g4 / g5 / 8.0
    T1[1] = G1G4[1] * P[5]; // G1G4[1] = g4 / g1 / 8.0
    T1[2] = G1G4[2] * P[6]; // G1G4[2] = g4 / g3 / 8.0
    T1[3] = G1G4[3] * P[7]; // G1G4[3] = g4 / g7 / 8.0
    tmp1 = T1[0] - T1[2];
    tmp2 = T1[1] - T1[3];
    T2[0] =  tmp1 - tmp2; // T1[0] - T1[1] - T1[2] + T1[3];
    T2[1] =  tmp1 + tmp2; // T1[0] + T1[1] - T1[2] - T1[3];
    T2[2] = -T1[0] - T1[2];
    T2[3] =  T1[1] + T1[3];
    T3[0] =  T2[0];
    T3[1] =  g4 * T2[1];
    /* G2t = 1/2 */
    /*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
    /*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
    T4[0] = G2[0] * T2[2]; // G2[0] = 1.0 / g6 / 2.0
    T4[1] = G2[1] * T2[3]; // G2[1] = 1.0 / g2 / 2.0
    T5[0] = T4[0] - T4[1];
    T5[1] = T4[0] + T4[1];
    T6[0] = T5[0];
    T6[1] = g4 * T5[1];
    T3[2] = T6[0] - T6[1];
    T3[3] = T6[1];
    G[4] =  T3[0] + T3[2];
    G[5] =  T3[1] + T3[3];
    G[6] = -T3[3];
    G[7] =  T3[1] - T3[2];
  }

  /* G1G1 */
  /* mul  1 add   0 shift 0 or mul  0 add   0 shift 1 */
  G[8] = G1G1 * P[8];
  /* G1G1 */
  /* mul  1 add   0 shift 0 or mul  0 add   0 shift 1 */
  G[9] = G1G1 * P[9];
  /* G2t = 1/2 */
  /*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
  /*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
  /* G1G2t */
  /* mul  3 add   3 shift 0 */
  {
    double T1[2], T2[2], T3[2];
    T1[0] = G1G2[0] * P[10]; // G1G2[0] = g4 / g6 / 8.0
    T1[1] = G1G2[1] * P[11]; // G1G2[1] = g4 / g2 / 8.0
    T2[0] = T1[0] - T1[1];
    T2[1] = T1[0] + T1[1];
    T3[0] = T2[0];
    T3[1] = g4 * T2[1];
    G[10] = T3[0] - T3[1];
    G[11] = T3[1];
  }
  /* G4t = 1/2 */
  /*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
  /*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
  /*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
  /*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
  /* G1G4t */
  /* mul  8 add  12 shift 0 */
  {
    double T1[4], T2[4], T3[4], T4[2], T5[2], T6[2], tmp1, tmp2;
    T1[0] = G1G4[0] * P[12]; // G1G4[0] = g4 / g5 / 8.0
    T1[1] = G1G4[1] * P[13]; // G1G4[1] = g4 / g1 / 8.0
    T1[2] = G1G4[2] * P[14]; // G1G4[2] = g4 / g3 / 8.0
    T1[3] = G1G4[3] * P[15]; // G1G4[3] = g4 / g7 / 8.0
    tmp1 = T1[0] - T1[2];
    tmp2 = T1[1] - T1[3];
    T2[0] =  tmp1 - tmp2; // T1[0] - T1[1] - T1[2] + T1[3];
    T2[1] =  tmp1 + tmp2; // T1[0] + T1[1] - T1[2] - T1[3];
    T2[2] = -T1[0] - T1[2];
    T2[3] =  T1[1] + T1[3];
    T3[0] =  T2[0];
    T3[1] =  g4 * T2[1];
    /* G2 = 1/2 */
    /*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
    /*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
    T4[0] = G2[0] * T2[2]; // G2[0] = 1.0 / g6 / 2.0
    T4[1] = G2[1] * T2[3]; // G2[1] = 1.0 / g2 / 2.0
    T5[0] = T4[0] - T4[1];
    T5[1] = T4[0] + T4[1];
    T6[0] = T5[0];
    T6[1] = g4 * T5[1];
    T3[2] = T6[0] - T6[1];
    T3[3] = T6[1];
    G[12] =  T3[0] + T3[2];
    G[13] =  T3[1] + T3[3];
    G[14] = -T3[3];
    G[15] =  T3[1] - T3[2];
  }

  /* G2t = 1/2 */
  /*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
  /*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
  /* G2tG1 */
  /* mul  3 add   3 shift 0 */
  {
    double T1[2], T2[2], T3[2];
    T1[0] = G1G2[0] * P[16]; // G1G2[0] = g4 / g6 / 8.0
    T1[1] = G1G2[1] * P[24]; // G1G2[1] = g4 / g2 / 8.0
    T2[0] = T1[0] - T1[1];
    T2[1] = T1[0] + T1[1];
    T3[0] = T2[0];
    T3[1] = g4 * T2[1];
    G[16] = T3[0] - T3[1];
    G[24] = T3[1];
  }

  /* G2t = 1/2 */
  /*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
  /*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
  /* G2tG1 */
  /* mul  3 add   3 shift 0 */
  {
    double T1[2], T2[2], T3[2];
    T1[0] = G1G2[0] * P[17]; // G1G2[0] = g4 / g6 / 8.0
    T1[1] = G1G2[1] * P[25]; // G1G2[1] = g4 / g2 / 8.0
    T2[0] = T1[0] - T1[1];
    T2[1] = T1[0] + T1[1];
    T3[0] = T2[0];
    T3[1] = g4 * T2[1];
    G[17] = T3[0] - T3[1];
    G[25] = T3[1];
  }

  /* G2txG2t = r~2(V2)t 1/2 (D2,2)t 2r~2(V2)* */
  /* mul  4 add  10 shift 0 of mul  2 add  10 shift 2 */
  {
    double T1[4], T2[4], tmp1, tmp2;
    /* 2r~2(V2)* = */
    /* | r(1)  r(1) | | r(1)  r(0) | */
    /* | r(1) -r(1) | | r(0)  r(i) | */
    /* = */
    /* | 1 0  1  0 | | 1 0 0  0 | */
    /* | 0 1  0  1 | | 0 1 0  0 | */
    /* | 1 0 -1  0 | | 0 0 0 -1 | */
    /* | 0 1  0 -1 | | 0 0 1  0 | */
    /* = */
    /* | 1 0 0 - | */
    /* | 0 1 1 0 | */
    /* | 1 0 0 1 | */
    /* | 0 1 - 0 | */
    T1[0] =  P[18] - P[27];
    T1[1] =  P[19] + P[26];
    T1[2] =  P[18] + P[27];
    T1[3] =  P[19] - P[26];
    /* 1/2 (D2,2)t = */
    /* -g4 -g4  0  0 */
    /*  g4 -g4  0  0 */
    /*   0   0  1  0 */
    /*   0   0  0  1 */
    tmp1 = G2G2[0] * T1[0]; // G2G2[0] = g4 / 8.0
    tmp2 = G2G2[0] * T1[1];
    T2[0] = - tmp1 - tmp2;  // - g4 * T1[0] - g4 * T1[1];
    T2[1] =   tmp1 - tmp2;  //   g4 * T1[0] - g4 * T1[1];
    T2[2] = T1[2]/8.0;
    T2[3] = T1[3]/8.0;
    /* r~2(V2)t = */
    /* | r(1)  r(0) | | r(1)  r(1) | */
    /* | r(0) -r(i) | | r(1) -r(1) | */
    /* = */
    /* | 1 0  0  0 | | 1 0  1  0 | */
    /* | 0 1  0  0 | | 0 1  0  1 | */
    /* | 0 0  0  1 | | 1 0 -1  0 | */
    /* | 0 0 -1  0 | | 0 1  0 -1 | */
    /* = */
    /* | 1 0 1 0 | */
    /* | 0 1 0 1 | */
    /* | 0 1 0 - | */
    /* | - 0 1 0 | */
    G[18] =  T2[0] + T2[2];
    G[19] =  T2[1] + T2[3];
    G[26] =  T2[1] - T2[3];
    G[27] = -T2[0] + T2[2];
  }

  /* G2txG4t = r~4(V2)t 1/2 (D2,4)t 2r~4(V2)* */
  /* r = r4 */
  /* mul 16 add  40 shift 0 */
  {
    double T1[8], T2[4], T3[4], T4[4], T5[4], T6[2], T7[2], T8[2], T9[4], T10[8], tmp1, tmp2;
    /* r4(i) = */
    /*  0 0  -1  0 */
    /*  0 0   0 -1 */
    /*  1 0   0  0 */
    /*  0 1   0  0 */
    /* 2r~4(V2)* = */
    /* | r(1)  r(1) | | r(1)  r(0) | */
    /* | r(1) -r(1) | | r(0)  r(i) | */
    /* = */
    /* | 1 0 0 0 1 0 0 0 | | 1 0 0 0 0 0 0 0 | */
    /* | 0 1 0 0 0 1 0 0 | | 0 1 0 0 0 0 0 0 | */
    /* | 0 0 1 0 0 0 1 0 | | 0 0 1 0 0 0 0 0 | */
    /* | 0 0 0 1 0 0 0 1 | | 0 0 0 1 0 0 0 0 | */
    /* | 1 0 0 0 - 0 0 0 | | 0 0 0 0 0 0 - 0 | */
    /* | 0 1 0 0 0 - 0 0 | | 0 0 0 0 0 0 0 - | */
    /* | 0 0 1 0 0 0 - 0 | | 0 0 0 0 1 0 0 0 | */
    /* | 0 0 0 1 0 0 0 - | | 0 0 0 0 0 1 0 0 | */
    /* = */
    /* | 1 0 0 0 0 0 - 0 | */
    /* | 0 1 0 0 0 0 0 - | */
    /* | 0 0 1 0 1 0 0 0 | */
    /* | 0 0 0 1 0 1 0 0 | */
    /* | 1 0 0 0 0 0 1 0 | */
    /* | 0 1 0 0 0 0 0 1 | */
    /* | 0 0 1 0 - 0 0 0 | */
    /* | 0 0 0 1 0 - 0 0 | */
    T1[0] =  P[20] - P[30];
    T1[1] =  P[21] - P[31];
    T1[2] =  P[22] + P[28];
    T1[3] =  P[23] + P[29];
    T1[4] =  P[20] + P[30];
    T1[5] =  P[21] + P[31];
    T1[6] =  P[22] - P[28];
    T1[7] =  P[23] - P[29];
    /* 1/2 (D2,4)t = */
    /* -g5 -g7 -g3  g1   0   0   0   0 */
    /* -g1 -g5 -g7 -g3   0   0   0   0 */
    /*  g3 -g1 -g5 -g7   0   0   0   0 */
    /*  g7  g3 -g1 -g5   0   0   0   0 */
    /*   0   0   0   0  g1 -g5  g7 -g3 */
    /*   0   0   0   0  g3  g1 -g5  g7 */
    /*   0   0   0   0 -g7  g3  g1 -g5 */
    /*   0   0   0   0  g5 -g7  g3  g1 */
    /* D241 = */
    /* -g5 -g7 -g3  g1 */
    /* -g1 -g5 -g7 -g3 */
    /*  g3 -g1 -g5 -g7 */
    /*  g7  g3 -g1 -g5 */
    /* D241 = - P241 G4 P241 */
    /* P241 = */
    /*  1 0 0 0 */
    /*  0 0 0 1 */
    /*  0 0 - 0 */
    /*  0 1 0 0 */
    T2[0] =  T1[0];
    T2[1] =  T1[3];
    T2[2] = -T1[2];
    T2[3] =  T1[1];
    /* G4t = 1/2 */
    /*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
    /*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
    /*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
    /*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
    T3[0] = G2G4[0] * T2[0]; // G2G4[0] = 1.0 / g5 / 16.0
    T3[1] = G2G4[1] * T2[1]; // G2G4[1] = 1.0 / g1 / 16.0
    T3[2] = G2G4[2] * T2[2]; // G2G4[2] = 1.0 / g3 / 16.0
    T3[3] = G2G4[3] * T2[3]; // G2G4[3] = 1.0 / g7 / 16.0
    tmp1 = T3[0] - T3[2];
    tmp2 = T3[1] - T3[3];
    T4[0] =  tmp1 - tmp2; // T3[0] - T3[1] - T3[2] + T3[3];
    T4[1] =  tmp1 + tmp2; // T3[0] + T3[1] - T3[2] - T3[3];
    T4[2] = -T3[0] - T3[2];
    T4[3] =  T3[1] + T3[3];
    T5[0] =  T4[0];
    T5[1] =  g4 * T4[1];
    /* G2t = 1/2 */
    /*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
    /*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
    T6[0] = G2[0] * T4[2]; // G2[0] = 1.0 / g6 / 2.0
    T6[1] = G2[1] * T4[3]; // G2[1] = 1.0 / g2 / 2.0
    T7[0] = T6[0] - T6[1];
    T7[1] = T6[0] + T6[1];
    T8[0] = T7[0];
    T8[1] = g4 * T7[1];
    T5[2] = T8[0] - T8[1];
    T5[3] = T8[1];
    T9[0] =  T5[0] + T5[2];
    T9[1] =  T5[1] + T5[3];
    T9[2] = -T5[3];
    T9[3] =  T5[1] - T5[2];
    /* - P241 = */
    /*  - 0 0 0 */
    /*  0 0 0 - */
    /*  0 0 1 0 */
    /*  0 - 0 0 */
    T10[0] = -T9[0];
    T10[1] = -T9[3];
    T10[2] =  T9[2];
    T10[3] = -T9[1];

    /* D242 = */
    /*  g1 -g5  g7 -g3 */
    /*  g3  g1 -g5  g7 */
    /* -g7  g3  g1 -g5 */
    /*  g5 -g7  g3  g1 */
    /* D242 = P2422 G4 P2421 */
    /* P2421 = */
    /*  0 1 0 0 */
    /*  1 0 0 0 */
    /*  0 0 0 - */
    /*  0 0 - 0 */
    T2[0] =  T1[5];
    T2[1] =  T1[4];
    T2[2] = -T1[7];
    T2[3] = -T1[6];
    /* G4t = 1/2 */
    /*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
    /*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
    /*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
    /*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
    T3[0] = G2G4[0] * T2[0]; // G2G4[0] = 1.0 / g5 / 16.0
    T3[1] = G2G4[1] * T2[1]; // G2G4[1] = 1.0 / g1 / 16.0
    T3[2] = G2G4[2] * T2[2]; // G2G4[2] = 1.0 / g3 / 16.0
    T3[3] = G2G4[3] * T2[3]; // G2G4[3] = 1.0 / g7 / 16.0
    tmp1 = T3[0] - T3[2];
    tmp2 = T3[1] - T3[3];
    T4[0] =  tmp1 - tmp2; // T3[0] - T3[1] - T3[2] + T3[3];
    T4[1] =  tmp1 + tmp2; // T3[0] + T3[1] - T3[2] - T3[3];
    T4[2] = -T3[0] - T3[2];
    T4[3] =  T3[1] + T3[3];
    T5[0] =  T4[0];
    T5[1] =  g4 * T4[1];
    /* G2t = 1/2 */
    /*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
    /*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
    T6[0] = G2[0] * T4[2]; // G2[0] = 1.0 / g6 / 2.0
    T6[1] = G2[1] * T4[3]; // G2[1] = 1.0 / g2 / 2.0
    T7[0] = T6[0] - T6[1];
    T7[1] = T6[0] + T6[1];
    T8[0] = T7[0];
    T8[1] = g4 * T7[1];
    T5[2] = T8[0] - T8[1];
    T5[3] = T8[1];
    T9[0] =  T5[0] + T5[2];
    T9[1] =  T5[1] + T5[3];
    T9[2] = -T5[3];
    T9[3] =  T5[1] - T5[2];
    /* P2422 = */
    /*  - 0 0 0 */
    /*  0 0 0 1 */
    /*  0 0 1 0 */
    /*  0 1 0 0 */
    T10[4] = -T9[0];
    T10[5] =  T9[3];
    T10[6] =  T9[2];
    T10[7] =  T9[1];

    /* r~2(V2)t = */
    /* | r(1)  r(0) | | r(1)  r(1) | */
    /* | r(0) -r(i) | | r(1) -r(1) | */
    /* = */
    /* | 1 0 0 0 0 0 0 0 | | 1 0 0 0 1 0 0 0 | */
    /* | 0 1 0 0 0 0 0 0 | | 0 1 0 0 0 1 0 0 | */
    /* | 0 0 1 0 0 0 0 0 | | 0 0 1 0 0 0 1 0 | */
    /* | 0 0 0 1 0 0 0 0 | | 0 0 0 1 0 0 0 1 | */
    /* | 0 0 0 0 0 0 1 0 | | 1 0 0 0 - 0 0 0 | */
    /* | 0 0 0 0 0 0 0 1 | | 0 1 0 0 0 - 0 0 | */
    /* | 0 0 0 0 - 0 0 0 | | 0 0 1 0 0 0 - 0 | */
    /* | 0 0 0 0 0 - 0 0 | | 0 0 0 1 0 0 0 - | */
    /* = */
    /* | 1 0 0 0 1 0 0 0 | */
    /* | 0 1 0 0 0 1 0 0 | */
    /* | 0 0 1 0 0 0 1 0 | */
    /* | 0 0 0 1 0 0 0 1 | */
    /* | 0 0 1 0 0 0 - 0 | */
    /* | 0 0 0 1 0 0 0 - | */
    /* | - 0 0 0 1 0 0 0 | */
    /* | 0 - 0 0 0 1 0 0 | */
    G[20] =  T10[0] + T10[4];
    G[21] =  T10[1] + T10[5];
    G[22] =  T10[2] + T10[6];
    G[23] =  T10[3] + T10[7];
    G[28] =  T10[2] - T10[6];
    G[29] =  T10[3] - T10[7];
    G[30] = -T10[0] + T10[4];
    G[31] = -T10[1] + T10[5];
  }

  /* G4t = 1/2 */
  /*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
  /*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
  /*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
  /*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
  /* G4tG1 */
  /* mul  8 add  12 shift 0 */
  {
    double T1[4], T2[4], T3[4], T4[2], T5[2], T6[2], tmp1, tmp2;
    T1[0] = G1G4[0] * P[32]; // G1G4[0] = g4 / g5 / 8.0
    T1[1] = G1G4[1] * P[40]; // G1G4[1] = g4 / g1 / 8.0
    T1[2] = G1G4[2] * P[48]; // G1G4[2] = g4 / g3 / 8.0
    T1[3] = G1G4[3] * P[56]; // G1G4[3] = g4 / g7 / 8.0
    tmp1 = T1[0] - T1[2];
    tmp2 = T1[1] - T1[3];
    T2[0] =  tmp1 - tmp2; // T1[0] - T1[1] - T1[2] + T1[3];
    T2[1] =  tmp1 + tmp2; // T1[0] + T1[1] - T1[2] - T1[3];
    T2[2] = -T1[0] - T1[2];
    T2[3] =  T1[1] + T1[3];
    T3[0] =  T2[0];
    T3[1] =  g4 * T2[1];
    /* G2t = 1/2 */
    /*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
    /*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
    T4[0] = G2[0] * T2[2]; // G2[0] = 1.0 / g6 / 2.0
    T4[1] = G2[1] * T2[3]; // G2[1] = 1.0 / g2 / 2.0
    T5[0] = T4[0] - T4[1];
    T5[1] = T4[0] + T4[1];
    T6[0] = T5[0];
    T6[1] = g4 * T5[1];
    T3[2] = T6[0] - T6[1];
    T3[3] = T6[1];
    G[32] =  T3[0] + T3[2];
    G[40] =  T3[1] + T3[3];
    G[48] = -T3[3];
    G[56] =  T3[1] - T3[2];
  }

  /* G4t = 1/2 */
  /*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
  /*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
  /*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
  /*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
  /* G4tG1 */
  /* mul  8 add  12 shift 0 */
  {
    double T1[4], T2[4], T3[4], T4[2], T5[2], T6[2], tmp1, tmp2;
    T1[0] = G1G4[0] * P[33]; // G1G4[0] = g4 / g5 / 8.0
    T1[1] = G1G4[1] * P[41]; // G1G4[1] = g4 / g1 / 8.0
    T1[2] = G1G4[2] * P[49]; // G1G4[2] = g4 / g3 / 8.0
    T1[3] = G1G4[3] * P[57]; // G1G4[3] = g4 / g7 / 8.0
    tmp1 = T1[0] - T1[2];
    tmp2 = T1[1] - T1[3];
    T2[0] =  tmp1 - tmp2; // T1[0] - T1[1] - T1[2] + T1[3];
    T2[1] =  tmp1 + tmp2; // T1[0] + T1[1] - T1[2] - T1[3];
    T2[2] = -T1[0] - T1[2];
    T2[3] =  T1[1] + T1[3];
    T3[0] =  T2[0];
    T3[1] =  g4 * T2[1];
    /* G2t = 1/2 */
    /*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
    /*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
    T4[0] = G2[0] * T2[2]; // G2[0] = 1.0 / g6 / 2.0
    T4[1] = G2[1] * T2[3]; // G2[1] = 1.0 / g2 / 2.0
    T5[0] = T4[0] - T4[1];
    T5[1] = T4[0] + T4[1];
    T6[0] = T5[0];
    T6[1] = g4 * T5[1];
    T3[2] = T6[0] - T6[1];
    T3[3] = T6[1];
    G[33] =  T3[0] + T3[2];
    G[41] =  T3[1] + T3[3];
    G[49] = -T3[3];
    G[57] =  T3[1] - T3[2];
  }

  /* G4txG2t = P G2txG4t P-1 */
  /* G2txG4t = r~4(V2)t 1/2 (D2,4)t 2r~4(V2)* */
  /* r = r4 */
  /* mul 16 add  40 shift 0 */
  {
    double _P[8], T1[8], T2[4], T3[4], T4[4], T5[4], T6[2], T7[2], T8[2], T9[4], T10[8], T11[8], tmp1, tmp2;

    /* P-1 = */
    /*  1 0 0 0 0 0 0 0 */
    /*  0 0 1 0 0 0 0 0 */
    /*  0 0 0 0 1 0 0 0 */
    /*  0 0 0 0 0 0 1 0 */
    /*  0 1 0 0 0 0 0 0 */
    /*  0 0 0 1 0 0 0 0 */
    /*  0 0 0 0 0 1 0 0 */
    /*  0 0 0 0 0 0 0 1 */
    _P[0] = P[34];
    _P[1] = P[42];
    _P[2] = P[50];
    _P[3] = P[58];
    _P[4] = P[35];
    _P[5] = P[43];
    _P[6] = P[51];
    _P[7] = P[59];
    /* r4(i) = */
    /*  0 0  -1  0 */
    /*  0 0   0 -1 */
    /*  1 0   0  0 */
    /*  0 1   0  0 */
    /* 2r~4(V2)* = */
    /* | r(1)  r(1) | | r(1)  r(0) | */
    /* | r(1) -r(1) | | r(0)  r(i) | */
    /* = */
    /* | 1 0 0 0 1 0 0 0 | | 1 0 0 0 0 0 0 0 | */
    /* | 0 1 0 0 0 1 0 0 | | 0 1 0 0 0 0 0 0 | */
    /* | 0 0 1 0 0 0 1 0 | | 0 0 1 0 0 0 0 0 | */
    /* | 0 0 0 1 0 0 0 1 | | 0 0 0 1 0 0 0 0 | */
    /* | 1 0 0 0 - 0 0 0 | | 0 0 0 0 0 0 - 0 | */
    /* | 0 1 0 0 0 - 0 0 | | 0 0 0 0 0 0 0 - | */
    /* | 0 0 1 0 0 0 - 0 | | 0 0 0 0 1 0 0 0 | */
    /* | 0 0 0 1 0 0 0 - | | 0 0 0 0 0 1 0 0 | */
    /* = */
    /* | 1 0 0 0 0 0 - 0 | */
    /* | 0 1 0 0 0 0 0 - | */
    /* | 0 0 1 0 1 0 0 0 | */
    /* | 0 0 0 1 0 1 0 0 | */
    /* | 1 0 0 0 0 0 1 0 | */
    /* | 0 1 0 0 0 0 0 1 | */
    /* | 0 0 1 0 - 0 0 0 | */
    /* | 0 0 0 1 0 - 0 0 | */
    T1[0] = _P[0] - _P[6];
    T1[1] = _P[1] - _P[7];
    T1[2] = _P[2] + _P[4];
    T1[3] = _P[3] + _P[5];
    T1[4] = _P[0] + _P[6];
    T1[5] = _P[1] + _P[7];
    T1[6] = _P[2] - _P[4];
    T1[7] = _P[3] - _P[5];
    /* 1/2 (D2,4)t = */
    /* -g5 -g7 -g3  g1   0   0   0   0 */
    /* -g1 -g5 -g7 -g3   0   0   0   0 */
    /*  g3 -g1 -g5 -g7   0   0   0   0 */
    /*  g7  g3 -g1 -g5   0   0   0   0 */
    /*   0   0   0   0  g1 -g5  g7 -g3 */
    /*   0   0   0   0  g3  g1 -g5  g7 */
    /*   0   0   0   0 -g7  g3  g1 -g5 */
    /*   0   0   0   0  g5 -g7  g3  g1 */
    /* D241 = */
    /* -g5 -g7 -g3  g1 */
    /* -g1 -g5 -g7 -g3 */
    /*  g3 -g1 -g5 -g7 */
    /*  g7  g3 -g1 -g5 */
    /* D241 = - P241 G4 P241 */
    /* P241 = */
    /*  1 0 0 0 */
    /*  0 0 0 1 */
    /*  0 0 - 0 */
    /*  0 1 0 0 */
    T2[0] =  T1[0];
    T2[1] =  T1[3];
    T2[2] = -T1[2];
    T2[3] =  T1[1];
    /* G4t = 1/2 */
    /*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
    /*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
    /*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
    /*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
    T3[0] = G2G4[0] * T2[0]; // G2G4[0] = 1.0 / g5 / 16.0
    T3[1] = G2G4[1] * T2[1]; // G2G4[1] = 1.0 / g1 / 16.0
    T3[2] = G2G4[2] * T2[2]; // G2G4[2] = 1.0 / g3 / 16.0
    T3[3] = G2G4[3] * T2[3]; // G2G4[3] = 1.0 / g7 / 16.0
    tmp1 = T3[0] - T3[2];
    tmp2 = T3[1] - T3[3];
    T4[0] =  tmp1 - tmp2; // T3[0] - T3[1] - T3[2] + T3[3];
    T4[1] =  tmp1 + tmp2; // T3[0] + T3[1] - T3[2] - T3[3];
    T4[2] = -T3[0] - T3[2];
    T4[3] =  T3[1] + T3[3];
    T5[0] =  T4[0];
    T5[1] =  g4 * T4[1];
    /* G2t = 1/2 */
    /*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
    /*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
    T6[0] = G2[0] * T4[2]; // G2[0] = 1.0 / g6 / 2.0
    T6[1] = G2[1] * T4[3]; // G2[1] = 1.0 / g2 / 2.0
    T7[0] = T6[0] - T6[1];
    T7[1] = T6[0] + T6[1];
    T8[0] = T7[0];
    T8[1] = g4 * T7[1];
    T5[2] = T8[0] - T8[1];
    T5[3] = T8[1];
    T9[0] =  T5[0] + T5[2];
    T9[1] =  T5[1] + T5[3];
    T9[2] = -T5[3];
    T9[3] =  T5[1] - T5[2];
    /* - P241 = */
    /*  - 0 0 0 */
    /*  0 0 0 - */
    /*  0 0 1 0 */
    /*  0 - 0 0 */
    T10[0] = -T9[0];
    T10[1] = -T9[3];
    T10[2] =  T9[2];
    T10[3] = -T