# はじめに

E. Feig and S. Winograd. Fast Algorithms for the Discrete Cosine Transform, IEEE Trans. Signal Processing, vol. 40, no. 9, pp. 2174-2193, Sep. 1992.

# 1D DCT

1D DCTの定義式は次の通りです。式1とします。

\begin{eqnarray}
y_i &=& \sum_{k}{c_i \cos\frac{2\pi i(2k+1)}{4N}}x_k \\
c_0 &=& \frac{1}{\sqrt{N}} \\
\end{eqnarray}


i!=0

c_i = \sqrt{\frac{2}{N}} \\


N=8とします。 $\gamma_i = \cos\frac{2\pi i}{32}$ とします。

y_i = \sum_k{c_i \gamma_{i(2k+1)}} x_k


$C_8$の記号を導入します。i行j列の成分は次のように表現できます。

\begin{eqnarray}
C_8 &=& ( c_i \cos \frac{2\pi i(2j+1)}{32} )_{ij} = ( c_i \gamma_{i(2j+1)} )_{ij} \\
\vec{y} &=& C_8 \vec{x} \\
\vec{y} &=&
\begin{bmatrix}
y_0 & y_1 & y_2 & y_3 & y_4 & y_5 & y_6 & y_7
\end{bmatrix}^t \\
\vec{x} &=&
\begin{bmatrix}
x_0 & x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7
\end{bmatrix}^t \\
\begin{bmatrix}
y_0 \\
y_1 \\
y_2 \\
y_3 \\
y_4 \\
y_5 \\
y_6 \\
y_7 \\
\end{bmatrix}
&=&
\begin{bmatrix}
c_0 \gamma_{0} & c_0 \gamma_{0}  & c_0 \gamma_{0}  & c_0 \gamma_{0}  & c_0 \gamma_{0}  & c_0 \gamma_{0}  & c_0 \gamma_{0}  & c_0 \gamma_{0}  \\
c_1 \gamma_{1} & c_1 \gamma_{3}  & c_1 \gamma_{5}  & c_1 \gamma_{7}  & c_1 \gamma_{9}  & c_1 \gamma_{11} & c_1 \gamma_{13} & c_1 \gamma_{15} \\
c_2 \gamma_{2} & c_2 \gamma_{6}  & c_2 \gamma_{10} & c_2 \gamma_{14} & c_2 \gamma_{18} & c_2 \gamma_{22} & c_2 \gamma_{26} & c_2 \gamma_{30} \\
c_3 \gamma_{3} & c_3 \gamma_{9}  & c_3 \gamma_{15} & c_3 \gamma_{21} & c_3 \gamma_{27} & c_3 \gamma_{33} & c_3 \gamma_{39} & c_3 \gamma_{45} \\
c_4 \gamma_{4} & c_4 \gamma_{12} & c_4 \gamma_{20} & c_4 \gamma_{28} & c_4 \gamma_{36} & c_4 \gamma_{44} & c_4 \gamma_{52} & c_4 \gamma_{60} \\
c_5 \gamma_{5} & c_5 \gamma_{15} & c_5 \gamma_{25} & c_5 \gamma_{35} & c_5 \gamma_{45} & c_5 \gamma_{55} & c_5 \gamma_{65} & c_5 \gamma_{75} \\
c_6 \gamma_{6} & c_6 \gamma_{18} & c_6 \gamma_{30} & c_6 \gamma_{42} & c_6 \gamma_{54} & c_6 \gamma_{66} & c_6 \gamma_{78} & c_6 \gamma_{90} \\
c_7 \gamma_{7} & c_7 \gamma_{21} & c_7 \gamma_{35} & c_7 \gamma_{49} & c_7 \gamma_{63} & c_7 \gamma_{77} & c_7 \gamma_{91} & c_7 \gamma_{105} \\
\end{bmatrix}
\begin{bmatrix}
x_0 \\
x_1 \\
x_2 \\
x_3 \\
x_4 \\
x_5 \\
x_6 \\
x_7 \\
\end{bmatrix} \\
\end{eqnarray}


$c_0 \gamma_0$は$\frac{1}{2} \gamma_4$になります。詳細は次の通りです。

c_0 \gamma_0 = \frac{1}{\sqrt{8}} \cos 0 = \frac{1}{2\sqrt{2}} =  \frac{1}{2} \cos \frac{2 \pi}{8} = \frac{1}{2} \cos \frac{2 \pi * 4}{32} = \frac{1}{2} \gamma_{4}


0行目以降は次の通りcosの周期性と対称性から$\gamma_1$ ～ $\gamma_7$のみで表現できます。

$C_8$は次のように整理できます。

C_8 = \frac{1}{2}
\begin{bmatrix}
\gamma_4 & \gamma_4 & \gamma_4 & \gamma_4 & \gamma_4 & \gamma_4 & \gamma_4 & \gamma_4 \\
\gamma_1 & \gamma_3 & \gamma_5 & \gamma_7 & -\gamma_7 & -\gamma_5 & -\gamma_3 & -\gamma_1 \\
\gamma_2 & \gamma_6 & -\gamma_6 & -\gamma_2 & -\gamma_2 & -\gamma_6 & \gamma_6 & \gamma_2 \\
\gamma_3 & -\gamma_7 & -\gamma_1 & -\gamma_5 & \gamma_5 & \gamma_1 & \gamma_7 & -\gamma_3 \\
\gamma_4 & -\gamma_4 & -\gamma_4 & \gamma_4 & \gamma_4 & -\gamma_4 & -\gamma_4 & \gamma_4 \\
\gamma_5 & -\gamma_1 & \gamma_7 & \gamma_3 & -\gamma_3 & -\gamma_7 & \gamma_1 & -\gamma_5 \\
\gamma_6 & -\gamma_2 & \gamma_2 & -\gamma_6 & -\gamma_6 & \gamma_2 & -\gamma_2 & \gamma_6 \\
\gamma_7 & -\gamma_5 & \gamma_3 & -\gamma_1 & \gamma_1 & -\gamma_3 & \gamma_5 & -\gamma_7 \\
\end{bmatrix}


$C_8$は次のように分解できます。

C_8 = P_8 K_8 B \\


$B$は次の通りです。

\begin{eqnarray}
B &=& B_1 B_2 B_3 \\
B_3 &=&
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 \\
\end{bmatrix} \\
B_2 &=&
\begin{bmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
\end{bmatrix} \\
B_1 &=&
\begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
\end{bmatrix} \\
\end{eqnarray}


$K_8$は次のように分解できます。

\begin{eqnarray}
K_8 &=& \frac{1}{2}
\begin{bmatrix}
G_1 &     &     &     \\
& G_1 &     &     \\
&     & G_2 &     \\
&     &     & G_4 \\
\end{bmatrix} \\
G_1 &=& \gamma_4 \\
G_2 &=&
\begin{bmatrix}
\gamma_6 &  \gamma_2 \\
-\gamma_2 &  \gamma_6 \\
\end{bmatrix} \\
G_4 &=&
\begin{bmatrix}
\gamma_5 & -\gamma_7 & \gamma_3 & \gamma_1 \\
-\gamma_1 & \gamma_5 & -\gamma_7 & \gamma_3 \\
-\gamma_3 & -\gamma_1 & \gamma_5 & -\gamma_7 \\
\gamma_7 & -\gamma_3 & -\gamma_1 & \gamma_5 \\
\end{bmatrix}
\end{eqnarray}


$P_8$は次の通りです。

P_8 =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
\end{bmatrix} \\


$G_2$は次のように変形できます。加算3回、乗算3回で計算できます。

G_2 = \frac{1}{2}
\begin{bmatrix}
\gamma_6 & \\
& \gamma_2 \\
\end{bmatrix}^{-1}
\begin{bmatrix}
1 & 1 \\
-1 & 1 \\
\end{bmatrix}
\begin{bmatrix}
1 & \\
& G_1 \\
\end{bmatrix}
\begin{bmatrix}
1 & 0 \\
-1 & 1 \\
\end{bmatrix}


$G_4$は次のように変形できます。加算12回、乗算8回で計算できます。

\begin{eqnarray}
G_4 &=& \frac{1}{2} D_4^{-1} H_{4,1}
\begin{bmatrix}
1 & & \\
& G_1 & \\
& & G_2 \\
\end{bmatrix}
H_{4,2} \\
D_4 &=&
\begin{bmatrix}
\gamma_5 & & & \\
& \gamma_1 & & \\
& & \gamma_3 & \\
& & & \gamma_7 \\
\end{bmatrix} \\
H_{4,1} &=&
\begin{bmatrix}
1 & 1 & -1 & 0 \\
-1 & 1 & 0 & 1 \\
-1 & -1 & -1 & 0 \\
1 & -1 & 0 & 1 \\
\end{bmatrix} \\
H_{4,2} &=&
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & 0 & -1 \\
0 & 1 & -1 & 0 \\
\end{bmatrix} \\
\end{eqnarray}


# 2D DCT

2D DCTの定義式は次の通りです。式2とします。

y_{ij} = \sum_{l}{c_j \gamma_{j(2l+1)}} x_{ij} \sum_{k} {c_i \gamma_{i(2k+1)}} \\


Xを行列とします。次のように定義します。

X =
\begin{bmatrix}
x_{00} & x_{01} & x_{02} & x_{03} & x_{04} & x_{05} & x_{06} & x_{07} \\
x_{10} & x_{11} & x_{12} & x_{13} & x_{14} & x_{15} & x_{16} & x_{17} \\
x_{20} & x_{21} & x_{22} & x_{23} & x_{24} & x_{25} & x_{26} & x_{27} \\
x_{30} & x_{31} & x_{32} & x_{33} & x_{34} & x_{35} & x_{36} & x_{37} \\
x_{40} & x_{41} & x_{42} & x_{43} & x_{44} & x_{45} & x_{46} & x_{47} \\
x_{50} & x_{51} & x_{52} & x_{53} & x_{54} & x_{55} & x_{56} & x_{57} \\
x_{60} & x_{61} & x_{62} & x_{63} & x_{64} & x_{65} & x_{66} & x_{67} \\
x_{70} & x_{71} & x_{72} & x_{73} & x_{74} & x_{75} & x_{76} & x_{77} \\
\end{bmatrix} =
\begin{bmatrix}
\vec{x_0} & \vec{x_1} & \vec{x_2} & \vec{x_3} & \vec{x_4} & \vec{x_5} & \vec{x_6} & \vec{x_7}
\end{bmatrix} \\


$\vec{x_0}$は列ベクトルです。$\vec{x_1}$以降も同様です。

\vec{x_0} =
\begin{bmatrix}
x_{00} \\
x_{10} \\
x_{20} \\
x_{30} \\
x_{40} \\
x_{50} \\
x_{60} \\
x_{70} \\
\end{bmatrix}


$\vec{x}$を次のように定義します。行列Xを1本の列ベクトルにしたものです。

\vec{x} =
\begin{bmatrix}
\vec{x_0} \\
\vec{x_1} \\
\vec{x_2} \\
\vec{x_3} \\
\vec{x_4} \\
\vec{x_5} \\
\vec{x_6} \\
\vec{x_7} \\
\end{bmatrix}


Yを行列とします。次のように定義します。

Y =
\begin{bmatrix}
y_{00} & y_{01} & y_{02} & y_{03} & y_{04} & y_{05} & y_{06} & y_{07} \\
y_{10} & y_{11} & y_{12} & y_{13} & y_{14} & y_{15} & y_{16} & y_{17} \\
y_{20} & y_{21} & y_{22} & y_{23} & y_{24} & y_{25} & y_{26} & y_{27} \\
y_{30} & y_{31} & y_{32} & y_{33} & y_{34} & y_{35} & y_{36} & y_{37} \\
y_{40} & y_{41} & y_{42} & y_{43} & y_{44} & y_{45} & y_{46} & y_{47} \\
y_{50} & y_{51} & y_{52} & y_{53} & y_{54} & y_{55} & y_{56} & y_{57} \\
y_{60} & y_{61} & y_{62} & y_{63} & y_{64} & y_{65} & y_{66} & y_{67} \\
y_{70} & y_{71} & y_{72} & y_{73} & y_{74} & y_{75} & y_{76} & y_{77} \\
\end{bmatrix} =
\begin{bmatrix}
\vec{y_0} & \vec{y_1} & \vec{y_2} & \vec{y_3} & \vec{y_4} & \vec{y_5} & \vec{y_6} & \vec{y_7}
\end{bmatrix} \\


$\vec{y_0}$は列ベクトルです。$\vec{y_1}$以降も同様です。

\vec{y_0} =
\begin{bmatrix}
y_{00} \\
y_{10} \\
y_{20} \\
y_{30} \\
y_{40} \\
y_{50} \\
y_{60} \\
y_{70} \\
\end{bmatrix}


$\vec{y}$を次のように定義します。行列Yを1本の列ベクトルにしたものです。

\vec{y} =
\begin{bmatrix}
\vec{y_0} \\
\vec{y_1} \\
\vec{y_2} \\
\vec{y_3} \\
\vec{y_4} \\
\vec{y_5} \\
\vec{y_6} \\
\vec{y_7} \\
\end{bmatrix}


Y = C_8 X C_8^{T}


\vec{y} = (C_8 \otimes C_8) \vec{x}


$C_8 \otimes C_8$は64x64の巨大な行列になります。

$C_8 \otimes C_8$を$P_8,K_8,B$で表します。

(C_8 \otimes C_8) = (P_8 \otimes P_8)(K_8 \otimes K_8)(B \otimes B)


$K_8 \otimes K_8$を計算します。

(K_8 \otimes K_8) = \frac{1}{4}
\begin{bmatrix}
G_1 & & & \\
& G_1 & & \\
& & G_2 & \\
& & & G_4 \\
\end{bmatrix}
\begin{bmatrix}
G_1 & & & \\
& G_1 & & \\
& & G_2 & \\
& & & G_4 \\
\end{bmatrix}


この計算を行うにあたって記号を導入します。

$\tilde{\rho_n}(A)$は行列Aの要素に$\rho_n$を適用したものです。nは2 or 4です。

\tilde{\rho_n}(A) =
\begin{bmatrix}
\rho_n(A_{00}) & \rho_n(A_{01}) \\
\rho_n(A_{10}) & \rho_n(A_{11}) \\
\end{bmatrix}


\begin{eqnarray}
G_2 \otimes G_2 &=& (2\tilde{\rho}_2(V_2)^{-1})(\frac{1}{2}D_{2,2})(\tilde{\rho}_2(V_2)) \\
G_2 \otimes G_4 &=& (2\tilde{\rho}_4(V_2)^{-1})(\frac{1}{2}D_{2,4})(\tilde{\rho}_4(V_2)) \\
G_4 \otimes G_4 &=& (4\tilde{\rho}_4(V_4)^{-1})(\frac{1}{4}D_{4,4})(\tilde{\rho}_4(V_4)) \\
\end{eqnarray}


$V_2$は次の通りです。

\begin{eqnarray}
V_2 &=&
\begin{bmatrix}
1 & \mathrm{i} \\
1 & -\mathrm{i} \\
\end{bmatrix} =
\begin{bmatrix}
1 & 1 \\
1 & -1 \\
\end{bmatrix}
\begin{bmatrix}
1 & 0 \\
0 & \mathrm{i} \\
\end{bmatrix} \\
V_2^{-1} &=& \frac{1}{2}
\begin{bmatrix}
1 & 0 \\
0 & -\mathrm{i} \\
\end{bmatrix}
\begin{bmatrix}
1 & 1 \\
1 & -1 \\
\end{bmatrix}
\end{eqnarray}


$V_4$は次の通りです。

\begin{eqnarray}
V_4 &=&
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix}
\begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & -1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 1 & -1 \\
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & \mathrm{i} \\
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & -1 & 0 \\
0 & 1 & 0 & -1 \\
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \mathrm{w} & 0 & 0 \\
0 & 0 & \mathrm{w}^2 & 0 \\
0 & 0 & 0 & \mathrm{w}^3 \\
\end{bmatrix} \\

V_4^{-1} &=& \frac{1}{4}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & -\mathrm{w}^3 & 0 & 0 \\
0 & 0 & -\mathrm{w}^2 & 0 \\
0 & 0 & 0 & -\mathrm{w} \\
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & -1 & 0 \\
0 & 1 & 0 & -1 \\
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & -\mathrm{i} \\
\end{bmatrix}
\begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & -1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 1 & -1 \\
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix}
\end{eqnarray}


$D_{2,2}$は次の通りです。

D_{2,2} =
\begin{bmatrix}
-\gamma_4 & \gamma_4 & 0 & 0 \\
-\gamma_4 & -\gamma_4 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix}


$D_{2,4}$は次の通りです。

\begin{eqnarray}
D_{2,4} &=&
\begin{bmatrix}
D_{2,4,1} \\
& D_{2,4,2} \\
\end{bmatrix} \\
D_{2,4,1} &=&
\begin{bmatrix}
-\gamma_5 & -\gamma_1 & \gamma_3 & \gamma_7 \\
-\gamma_7 & -\gamma_5 & -\gamma_1 & \gamma_3 \\
-\gamma_3 & -\gamma_7 & -\gamma_5 & -\gamma_1 \\
\gamma_1 & -\gamma_3 & -\gamma_7 & -\gamma_5 \\
\end{bmatrix} \\
D_{2,4,2} &=&
\begin{bmatrix}
\gamma_1 & \gamma_3 & -\gamma_7 & \gamma_5 \\
-\gamma_5 & \gamma_1 & \gamma_3 & -\gamma_7 \\
\gamma_7 & -\gamma_5 &  \gamma_1 & \gamma_3 \\
-\gamma_3 & \gamma_7 &  -\gamma_5 & \gamma_1 \\
\end{bmatrix} \\
\end{eqnarray}


$D_{2,4,1}, D_{2,4,2}$ は $G_4$ で表せます。

\begin{eqnarray}
P_{2,4,1} &=&
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & -1 & 0 \\
0 & 1 & 0 & 0 \\
\end{bmatrix} \\
D_{2,4,1} &=& - P_{2,4,1} G_4 P_{2,4,1} \\

P_{2,4,2,1} &=&
\begin{bmatrix}
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 \\
0 & 0 & -1 & 0 \\
\end{bmatrix} \\

P_{2,4,2,2} &=&
\begin{bmatrix}
-1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
\end{bmatrix} \\

D_{2,4,2} &=& P_{2,4,2,1} G_4 P_{2,4,2,2} \\
\end{eqnarray}


$D_{4,4}$は次の通りです。

\begin{eqnarray}
D_{4,4} &=& 2
\begin{bmatrix}
H_1 \\
& H_2 \\
& & H_3 \\
& & & H_4 \\
\end{bmatrix} \\
H_1 &=&
\begin{bmatrix}
0 & -\gamma_2 & 0 & \gamma_6 \\
-\gamma_6 & 0 & -\gamma_2 & 0 \\
0 & -\gamma_6 & 0 & -\gamma_2 \\
\gamma_2 & 0 & -\gamma_6 & 0 \\
\end{bmatrix} \\
H_2 &=&
\begin{bmatrix}
0 & \gamma_4 & 0 & \gamma_4 \\
-\gamma_4 & 0 & \gamma_4 & 0 \\
0 & -\gamma_4 & 0 & \gamma_4 \\
-\gamma_4 & 0 & -\gamma_4 & 0 \\
\end{bmatrix} \\
H_3 &=&
\begin{bmatrix}
-\gamma_6 & 0 & \gamma_2 & 0 \\
0 & -\gamma_6 & 0 & \gamma_2 \\
-\gamma_2 & 0 & -\gamma_6 & 0 \\
0 & -\gamma_2 & 0 & -\gamma_6 \\
\end{bmatrix} \\
H_4 &=& I_4
\end{eqnarray}


$H_1とH_3$は$G_2$で表せます。

\begin{eqnarray}
H_1 &=& P_{H1,1}
\begin{bmatrix}
G_2 \\
& G_2 \\
\end{bmatrix}
P_{H1,2} \\

P_{H1,1} &=&
\begin{bmatrix}
0 & 0 & 0 & 1 \\
-1 & 0 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & -1 & 0 & 0 \\
\end{bmatrix} \\

P_{H1,2} &=&
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix} \\

H_3 &=& - P_{H3}
\begin{bmatrix}
G_2^t \\
& G_2^t \\
\end{bmatrix}
P_{H3} \\

P_{H3} &=&
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix} \\
\end{eqnarray}


$G_4 \otimes G_2$はperfect-shuffle permutation($P_s$)を用いることで $G_2 \otimes G_4$ で表せます。

G_4 \otimes G_2 = P_s (G_2 \otimes G_4) P_s^{-1}


$P_s$は次の通りです。

P_s =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
\end{bmatrix}


$\rho_2, \rho_4$は次の通りです。

\begin{eqnarray}
\rho_2(\mathrm{i}) &=& - \rho_2(\mathrm{i})^t =
\begin{bmatrix}
0 & -1 \\
1 & 0 \\
\end{bmatrix} \\

\mathrm{w} &=& \mathrm{e}^{\frac{2\pi \mathrm{i}}{8}} \\

\rho_4(\mathrm{w}) &=& -\rho_4(\mathrm{w}^3)^t =
\begin{bmatrix}
0 & 0 & 0 & -1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{bmatrix} \\

\rho_4(\mathrm{i}) &=& \rho_4(\mathrm{w}^2) = -\rho_4(\mathrm{w}^2)^t =
\begin{bmatrix}
0 & 0 & -1 & 0 \\
0 & 0 & 0 & -1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
\end{bmatrix} \\

\rho_4(\mathrm{w}^3) &=& -\rho_4(\mathrm{w})^t =
\begin{bmatrix}
0 & -1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & -1 \\
1 & 0 & 0 & 0 \\
\end{bmatrix} \\
\end{eqnarray}


4 $G_1 G_1$ 0 0 1 0 0 4
4 $G_1 G_2$ 3 3 0 12 12 0
4 $G_1 G_4$ 8 12 0 32 48 0
1 $G_2 \otimes G_2$ 2 10 2 2 10 2
2 $G_2 \otimes G_4$ 16 40 0 32 80 0
1 $G_4 \otimes G_4$ 16 80 4 16 80 4
$B \otimes B$ 224
Total 94 454 10

8x8 の 2D DCT は 加算454回、乗算94回、シフト10回で計算できます。

# 2D IDCT

2D IDCTの計算は次のように2D DCTのアルゴリズムをそのまま利用できます。

\begin{eqnarray}
(C_8 \otimes C_8)^t &=& (B^t \otimes B^t)(K_8^t \otimes K_8^t)(P_8^t \otimes P_8^t) \\
(C_2^t \otimes C_2^t) &=& \tilde{\rho_2}(V_2)^t (\frac{1}{2}D_{2,2}^t) (2\tilde{\rho_2}(V_2)^{\mathrm{*}}) \\
(C_2^t \otimes C_4^t) &=& \tilde{\rho_4}(V_2)^t (\frac{1}{2}D_{2,4}^t) (2\tilde{\rho_4}(V_2)^{\mathrm{*}}) \\
(C_4^t \otimes C_4^t) &=& \tilde{\rho_4}(V_4)^t (\frac{1}{4}D_{4,4}^t) (4\tilde{\rho_4}(V_4)^{\mathrm{*}}) \\
B^t \otimes B^t &=& (B_3^t \otimes B_3^t)(B_2^t \otimes B_2^t)(B_1^t \otimes B_1^t) \\
\end{eqnarray}


$\mathrm{*}$は逆行列を適用後に転置する操作を表します。

# sample code

サンプルコードを示します。

makefile
CFLAGS=-I. -Wall -Werror -O2 -march=native
INCS=
OBJS=test.c
LIBS=
TARGET=test

all: $(TARGET) %.o: %.c$(INCS)
$(CC)$(CFLAGS) -c -o $@$<

$(TARGET):$(OBJS)
$(CC)$(CFLAGS) -o $@$^ $(LIBS) clean: rm -rf$(TARGET) *.o

test.c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
k g(k) = cos(2π*k/32)
1 0.980785280403230
2 0.923879532511287
3 0.831469612302545
4 0.707106781186548
5 0.555570233019602
6 0.382683432365090
7 0.195090322016128
*/
static const double g4 = 0.707106781186548;
static const double G1G1 = 0.125; // g4 * g4 / 4.0
static const double G1G2[] = {
0.230969883127822, // g4 / g6 / 8.0
0.095670858091273, // g4 / g2 / 8.0
};
static const double G2[] = {
1.30656296487638,  // 1.0 / g6 / 2.0
0.54119610014620,  // 1.0 / g2 / 2.0
};
static const double G2_2[] = {
0.1633203706095470,  // 1.0 / g6 / 16.0
0.0676495125182746,  // 1.0 / g2 / 16.0

};
static const double G1G4[] = {
0.159094822571604, // g4 / g5 / 8.0
0.090119977750869, // g4 / g1 / 8.0
0.106303761845907, // g4 / g3 / 8.0
0.453063723176445, // g4 / g7 / 8.0
};
static const double G2G2[] = {
0.0883883476483185, // g4 / 8.0
};
static const double G2G4[] = {
0.1124970278920520, // 1.0 / g5 / 16.0
0.0637244473880199, // 1.0 / g1 / 16.0
0.0751681108668807, // 1.0 / g3 / 16.0
0.3203644309676890, // 1.0 / g7 / 16.0
};
static const double G4G4[] = {
0.0478354290456363, // g6 / 8.0
0.1154849415639110, // g2 / 8.0
0.0883883476483185, // g4 / 8.0
};

static void printElements(char* header, double *data, int len)
{
int i;

for (i=0; i<len; i++) {
printf("[%02d] %+.7lf\n", i, data[i]);
}
}

static void IDCT(int16_t F[][8], int16_t f[][8])
{
int i,j;
double tF[64] = {0}; /* tmp F */
double P[64] = {0};  /* Pt x Pt */
double G[64] = {0};  /* Gt x Gt */
double B[64] = {0};  /* Bt x Bt */

/* (C x C)t = (Bt x Bt)(Kt x Kt)(Pt x Pt) */
for (i=0; i<8; i++) {
for (j=0; j<8; j++) {
tF[i*8+j] = F[j][i];
}
}
if (0) printElements("tF\n", tF, 64);
/* '-' = -1 */
/* Pt =            */
/* 1 0 0 0 0 0 0 0 */
/* 0 0 0 0 1 0 0 0 */
/* 0 0 1 0 0 0 0 0 */
/* 0 0 0 0 0 0 1 0 */
/* 0 - 0 0 0 0 0 0 */
/* 0 0 0 - 0 0 0 0 */
/* 0 0 0 0 0 0 0 1 */
/* 0 0 0 0 0 - 0 0 */
/* Pt x Pt = */
/* --------------------------------------------------------------------------------------------------------------------------------------------- */
/* 1 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 1 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 1 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 1 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 - 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 - 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 1 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 - 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* --------------------------------------------------------------------------------------------------------------------------------------------- */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 1 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 1 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 1 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 1 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 - 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 - 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 1 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 - 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* --------------------------------------------------------------------------------------------------------------------------------------------- */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 1 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 1 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 1 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 1 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 - 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 - 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 1 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 - 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* --------------------------------------------------------------------------------------------------------------------------------------------- */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 1 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 1 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 1 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 1 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 - 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 - 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 1 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 - 0 0 | 0 0 0 0 0 0 0 0 */
/* --------------------------------------------------------------------------------------------------------------------------------------------- */
/* 0 0 0 0 0 0 0 0 | - 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 - 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 - 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 - 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 1 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 1 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 - | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 1 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* --------------------------------------------------------------------------------------------------------------------------------------------- */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | - 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 - 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 - 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 - 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 1 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 1 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 - | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 1 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* --------------------------------------------------------------------------------------------------------------------------------------------- */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 1 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 1 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 1 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 1 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 - 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 - 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 1 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 - 0 0 */
/* --------------------------------------------------------------------------------------------------------------------------------------------- */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | - 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 - 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 - 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 - 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 1 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 1 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 - | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 1 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0 */
/* --------------------------------------------------------------------------------------------------------------------------------------------- */
P[ 0] =  tF[ 0 + 0];
P[ 1] =  tF[ 0 + 4];
P[ 2] =  tF[ 0 + 2];
P[ 3] =  tF[ 0 + 6];
P[ 4] = -tF[ 0 + 1];
P[ 5] = -tF[ 0 + 3];
P[ 6] =  tF[ 0 + 7];
P[ 7] = -tF[ 0 + 5];

P[ 8] =  tF[32 + 0];
P[ 9] =  tF[32 + 4];
P[10] =  tF[32 + 2];
P[11] =  tF[32 + 6];
P[12] = -tF[32 + 1];
P[13] = -tF[32 + 3];
P[14] =  tF[32 + 7];
P[15] = -tF[32 + 5];

P[16] =  tF[16 + 0];
P[17] =  tF[16 + 4];
P[18] =  tF[16 + 2];
P[19] =  tF[16 + 6];
P[20] = -tF[16 + 1];
P[21] = -tF[16 + 3];
P[22] =  tF[16 + 7];
P[23] = -tF[16 + 5];

P[24] =  tF[48 + 0];
P[25] =  tF[48 + 4];
P[26] =  tF[48 + 2];
P[27] =  tF[48 + 6];
P[28] = -tF[48 + 1];
P[29] = -tF[48 + 3];
P[30] =  tF[48 + 7];
P[31] = -tF[48 + 5];

P[32] = -tF[ 8 + 0];
P[33] = -tF[ 8 + 4];
P[34] = -tF[ 8 + 2];
P[35] = -tF[ 8 + 6];
P[36] =  tF[ 8 + 1];
P[37] =  tF[ 8 + 3];
P[38] = -tF[ 8 + 7];
P[39] =  tF[ 8 + 5];

P[40] = -tF[24 + 0];
P[41] = -tF[24 + 4];
P[42] = -tF[24 + 2];
P[43] = -tF[24 + 6];
P[44] =  tF[24 + 1];
P[45] =  tF[24 + 3];
P[46] = -tF[24 + 7];
P[47] =  tF[24 + 5];

P[48] =  tF[56 + 0];
P[49] =  tF[56 + 4];
P[50] =  tF[56 + 2];
P[51] =  tF[56 + 6];
P[52] = -tF[56 + 1];
P[53] = -tF[56 + 3];
P[54] =  tF[56 + 7];
P[55] = -tF[56 + 5];

P[56] = -tF[40 + 0];
P[57] = -tF[40 + 4];
P[58] = -tF[40 + 2];
P[59] = -tF[40 + 6];
P[60] =  tF[40 + 1];
P[61] =  tF[40 + 3];
P[62] = -tF[40 + 7];
P[63] =  tF[40 + 5];

if (0) printElements("P\n", P, 64);

/* Kt x Kt = Gt x Gt */
/* Gt  = */
/* G1  0   0   0 */
/*  0 G1   0   0 */
/*  0  0 G2t   0 */
/*  0  0   0 G4t */
/* (*)G1t = G1 */
/* Gt x Gt = 1/4 */
/* ------------------------------------------------------------------------------------------------- */
/* G1G1    0    0    0|   0    0     0     0|    0     0       0       0|    0     0       0     0   */
/*    0 G1G1    0    0|   0    0     0     0|    0     0       0       0|    0     0       0     0   */
/*    0    0 G1G2t   0|   0    0     0     0|    0     0       0       0|    0     0       0     0   */
/*    0    0    0 G1G4|   0    0     0     0|    0     0       0       0|    0     0       0     0   */
/* ------------------------------------------------------------------------------------------------- */
/*    0    0    0    0|G1G1    0     0     0|    0     0       0       0|    0     0       0     0   */
/*    0    0    0    0|   0 G1G1     0     0|    0     0       0       0|    0     0       0     0   */
/*    0    0    0    0|   0    0  G1G2t    0|    0     0       0       0|    0     0       0     0   */
/*    0    0    0    0|   0    0     0 G1G4t|    0     0       0       0|    0     0       0     0   */
/* ------------------------------------------------------------------------------------------------- */
/*    0    0    0    0|   0    0     0     0|G2tG1     0       0       0|    0     0       0     0   */
/*    0    0    0    0|   0    0     0     0|    0 G2tG1       0       0|    0     0       0     0   */
/*    0    0    0    0|   0    0     0     0|    0     0 G2txG2t       0|    0     0       0     0   */
/*    0    0    0    0|   0    0     0     0|    0     0       0 G2txG4t|    0     0       0     0   */
/* ------------------------------------------------------------------------------------------------- */
/*    0    0    0    0|   0    0     0     0|    0     0       0       0|G4tG1     0       0     0   */
/*    0    0    0    0|   0    0     0     0|    0     0       0       0|    0 G4tG1       0     0   */
/*    0    0    0    0|   0    0     0     0|    0     0       0       0|    0     0 G4txG2t     0   */
/*    0    0    0    0|   0    0     0     0|    0     0       0       0|    0     0       0 G4txG4t */
/* ------------------------------------------------------------------------------------------------- */
/* G1G1 */
/* mul  1 add   0 shift 0 or mul  0 add   0 shift 1 */
G[0] = G1G1 * P[0];
/* G1G1 */
/* mul  1 add   0 shift 0 or mul  0 add   0 shift 1 */
G[1] = G1G1 * P[1];
/* G2t = 1/2 */
/*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
/*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
/* G1G2t */
/* mul  3 add   3 shift 0 */
{
double T1[2], T2[2], T3[2];
T1[0] = G1G2[0] * P[2]; // G1G2[0] = g4 * 1.0/g6/2.0 / 4.0
T1[1] = G1G2[1] * P[3]; // G1G2[1] = g4 * 1.0/g2/2.0 / 4.0
T2[0] = T1[0] - T1[1];
T2[1] = T1[0] + T1[1];
T3[0] = T2[0];
T3[1] = g4 * T2[1];
G[2] = T3[0] - T3[1];
G[3] = T3[1];
}
/* G4t = 1/2 */
/*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
/*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
/*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
/*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
/* G1G4t */
/* mul  8 add  12 shift 0 */
{
double T1[4], T2[4], T3[4], T4[2], T5[2], T6[2], tmp1, tmp2;
T1[0] = G1G4[0] * P[4]; // G1G4[0] = g4 / g5 / 8.0
T1[1] = G1G4[1] * P[5]; // G1G4[1] = g4 / g1 / 8.0
T1[2] = G1G4[2] * P[6]; // G1G4[2] = g4 / g3 / 8.0
T1[3] = G1G4[3] * P[7]; // G1G4[3] = g4 / g7 / 8.0
tmp1 = T1[0] - T1[2];
tmp2 = T1[1] - T1[3];
T2[0] =  tmp1 - tmp2; // T1[0] - T1[1] - T1[2] + T1[3];
T2[1] =  tmp1 + tmp2; // T1[0] + T1[1] - T1[2] - T1[3];
T2[2] = -T1[0] - T1[2];
T2[3] =  T1[1] + T1[3];
T3[0] =  T2[0];
T3[1] =  g4 * T2[1];
/* G2t = 1/2 */
/*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
/*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
T4[0] = G2[0] * T2[2]; // G2[0] = 1.0 / g6 / 2.0
T4[1] = G2[1] * T2[3]; // G2[1] = 1.0 / g2 / 2.0
T5[0] = T4[0] - T4[1];
T5[1] = T4[0] + T4[1];
T6[0] = T5[0];
T6[1] = g4 * T5[1];
T3[2] = T6[0] - T6[1];
T3[3] = T6[1];
G[4] =  T3[0] + T3[2];
G[5] =  T3[1] + T3[3];
G[6] = -T3[3];
G[7] =  T3[1] - T3[2];
}

/* G1G1 */
/* mul  1 add   0 shift 0 or mul  0 add   0 shift 1 */
G[8] = G1G1 * P[8];
/* G1G1 */
/* mul  1 add   0 shift 0 or mul  0 add   0 shift 1 */
G[9] = G1G1 * P[9];
/* G2t = 1/2 */
/*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
/*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
/* G1G2t */
/* mul  3 add   3 shift 0 */
{
double T1[2], T2[2], T3[2];
T1[0] = G1G2[0] * P[10]; // G1G2[0] = g4 / g6 / 8.0
T1[1] = G1G2[1] * P[11]; // G1G2[1] = g4 / g2 / 8.0
T2[0] = T1[0] - T1[1];
T2[1] = T1[0] + T1[1];
T3[0] = T2[0];
T3[1] = g4 * T2[1];
G[10] = T3[0] - T3[1];
G[11] = T3[1];
}
/* G4t = 1/2 */
/*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
/*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
/*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
/*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
/* G1G4t */
/* mul  8 add  12 shift 0 */
{
double T1[4], T2[4], T3[4], T4[2], T5[2], T6[2], tmp1, tmp2;
T1[0] = G1G4[0] * P[12]; // G1G4[0] = g4 / g5 / 8.0
T1[1] = G1G4[1] * P[13]; // G1G4[1] = g4 / g1 / 8.0
T1[2] = G1G4[2] * P[14]; // G1G4[2] = g4 / g3 / 8.0
T1[3] = G1G4[3] * P[15]; // G1G4[3] = g4 / g7 / 8.0
tmp1 = T1[0] - T1[2];
tmp2 = T1[1] - T1[3];
T2[0] =  tmp1 - tmp2; // T1[0] - T1[1] - T1[2] + T1[3];
T2[1] =  tmp1 + tmp2; // T1[0] + T1[1] - T1[2] - T1[3];
T2[2] = -T1[0] - T1[2];
T2[3] =  T1[1] + T1[3];
T3[0] =  T2[0];
T3[1] =  g4 * T2[1];
/* G2 = 1/2 */
/*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
/*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
T4[0] = G2[0] * T2[2]; // G2[0] = 1.0 / g6 / 2.0
T4[1] = G2[1] * T2[3]; // G2[1] = 1.0 / g2 / 2.0
T5[0] = T4[0] - T4[1];
T5[1] = T4[0] + T4[1];
T6[0] = T5[0];
T6[1] = g4 * T5[1];
T3[2] = T6[0] - T6[1];
T3[3] = T6[1];
G[12] =  T3[0] + T3[2];
G[13] =  T3[1] + T3[3];
G[14] = -T3[3];
G[15] =  T3[1] - T3[2];
}

/* G2t = 1/2 */
/*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
/*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
/* G2tG1 */
/* mul  3 add   3 shift 0 */
{
double T1[2], T2[2], T3[2];
T1[0] = G1G2[0] * P[16]; // G1G2[0] = g4 / g6 / 8.0
T1[1] = G1G2[1] * P[24]; // G1G2[1] = g4 / g2 / 8.0
T2[0] = T1[0] - T1[1];
T2[1] = T1[0] + T1[1];
T3[0] = T2[0];
T3[1] = g4 * T2[1];
G[16] = T3[0] - T3[1];
G[24] = T3[1];
}

/* G2t = 1/2 */
/*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
/*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
/* G2tG1 */
/* mul  3 add   3 shift 0 */
{
double T1[2], T2[2], T3[2];
T1[0] = G1G2[0] * P[17]; // G1G2[0] = g4 / g6 / 8.0
T1[1] = G1G2[1] * P[25]; // G1G2[1] = g4 / g2 / 8.0
T2[0] = T1[0] - T1[1];
T2[1] = T1[0] + T1[1];
T3[0] = T2[0];
T3[1] = g4 * T2[1];
G[17] = T3[0] - T3[1];
G[25] = T3[1];
}

/* G2txG2t = r~2(V2)t 1/2 (D2,2)t 2r~2(V2)* */
/* mul  4 add  10 shift 0 of mul  2 add  10 shift 2 */
{
double T1[4], T2[4], tmp1, tmp2;
/* 2r~2(V2)* = */
/* | r(1)  r(1) | | r(1)  r(0) | */
/* | r(1) -r(1) | | r(0)  r(i) | */
/* = */
/* | 1 0  1  0 | | 1 0 0  0 | */
/* | 0 1  0  1 | | 0 1 0  0 | */
/* | 1 0 -1  0 | | 0 0 0 -1 | */
/* | 0 1  0 -1 | | 0 0 1  0 | */
/* = */
/* | 1 0 0 - | */
/* | 0 1 1 0 | */
/* | 1 0 0 1 | */
/* | 0 1 - 0 | */
T1[0] =  P[18] - P[27];
T1[1] =  P[19] + P[26];
T1[2] =  P[18] + P[27];
T1[3] =  P[19] - P[26];
/* 1/2 (D2,2)t = */
/* -g4 -g4  0  0 */
/*  g4 -g4  0  0 */
/*   0   0  1  0 */
/*   0   0  0  1 */
tmp1 = G2G2[0] * T1[0]; // G2G2[0] = g4 / 8.0
tmp2 = G2G2[0] * T1[1];
T2[0] = - tmp1 - tmp2;  // - g4 * T1[0] - g4 * T1[1];
T2[1] =   tmp1 - tmp2;  //   g4 * T1[0] - g4 * T1[1];
T2[2] = T1[2]/8.0;
T2[3] = T1[3]/8.0;
/* r~2(V2)t = */
/* | r(1)  r(0) | | r(1)  r(1) | */
/* | r(0) -r(i) | | r(1) -r(1) | */
/* = */
/* | 1 0  0  0 | | 1 0  1  0 | */
/* | 0 1  0  0 | | 0 1  0  1 | */
/* | 0 0  0  1 | | 1 0 -1  0 | */
/* | 0 0 -1  0 | | 0 1  0 -1 | */
/* = */
/* | 1 0 1 0 | */
/* | 0 1 0 1 | */
/* | 0 1 0 - | */
/* | - 0 1 0 | */
G[18] =  T2[0] + T2[2];
G[19] =  T2[1] + T2[3];
G[26] =  T2[1] - T2[3];
G[27] = -T2[0] + T2[2];
}

/* G2txG4t = r~4(V2)t 1/2 (D2,4)t 2r~4(V2)* */
/* r = r4 */
/* mul 16 add  40 shift 0 */
{
double T1[8], T2[4], T3[4], T4[4], T5[4], T6[2], T7[2], T8[2], T9[4], T10[8], tmp1, tmp2;
/* r4(i) = */
/*  0 0  -1  0 */
/*  0 0   0 -1 */
/*  1 0   0  0 */
/*  0 1   0  0 */
/* 2r~4(V2)* = */
/* | r(1)  r(1) | | r(1)  r(0) | */
/* | r(1) -r(1) | | r(0)  r(i) | */
/* = */
/* | 1 0 0 0 1 0 0 0 | | 1 0 0 0 0 0 0 0 | */
/* | 0 1 0 0 0 1 0 0 | | 0 1 0 0 0 0 0 0 | */
/* | 0 0 1 0 0 0 1 0 | | 0 0 1 0 0 0 0 0 | */
/* | 0 0 0 1 0 0 0 1 | | 0 0 0 1 0 0 0 0 | */
/* | 1 0 0 0 - 0 0 0 | | 0 0 0 0 0 0 - 0 | */
/* | 0 1 0 0 0 - 0 0 | | 0 0 0 0 0 0 0 - | */
/* | 0 0 1 0 0 0 - 0 | | 0 0 0 0 1 0 0 0 | */
/* | 0 0 0 1 0 0 0 - | | 0 0 0 0 0 1 0 0 | */
/* = */
/* | 1 0 0 0 0 0 - 0 | */
/* | 0 1 0 0 0 0 0 - | */
/* | 0 0 1 0 1 0 0 0 | */
/* | 0 0 0 1 0 1 0 0 | */
/* | 1 0 0 0 0 0 1 0 | */
/* | 0 1 0 0 0 0 0 1 | */
/* | 0 0 1 0 - 0 0 0 | */
/* | 0 0 0 1 0 - 0 0 | */
T1[0] =  P[20] - P[30];
T1[1] =  P[21] - P[31];
T1[2] =  P[22] + P[28];
T1[3] =  P[23] + P[29];
T1[4] =  P[20] + P[30];
T1[5] =  P[21] + P[31];
T1[6] =  P[22] - P[28];
T1[7] =  P[23] - P[29];
/* 1/2 (D2,4)t = */
/* -g5 -g7 -g3  g1   0   0   0   0 */
/* -g1 -g5 -g7 -g3   0   0   0   0 */
/*  g3 -g1 -g5 -g7   0   0   0   0 */
/*  g7  g3 -g1 -g5   0   0   0   0 */
/*   0   0   0   0  g1 -g5  g7 -g3 */
/*   0   0   0   0  g3  g1 -g5  g7 */
/*   0   0   0   0 -g7  g3  g1 -g5 */
/*   0   0   0   0  g5 -g7  g3  g1 */
/* D241 = */
/* -g5 -g7 -g3  g1 */
/* -g1 -g5 -g7 -g3 */
/*  g3 -g1 -g5 -g7 */
/*  g7  g3 -g1 -g5 */
/* D241 = - P241 G4 P241 */
/* P241 = */
/*  1 0 0 0 */
/*  0 0 0 1 */
/*  0 0 - 0 */
/*  0 1 0 0 */
T2[0] =  T1[0];
T2[1] =  T1[3];
T2[2] = -T1[2];
T2[3] =  T1[1];
/* G4t = 1/2 */
/*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
/*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
/*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
/*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
T3[0] = G2G4[0] * T2[0]; // G2G4[0] = 1.0 / g5 / 16.0
T3[1] = G2G4[1] * T2[1]; // G2G4[1] = 1.0 / g1 / 16.0
T3[2] = G2G4[2] * T2[2]; // G2G4[2] = 1.0 / g3 / 16.0
T3[3] = G2G4[3] * T2[3]; // G2G4[3] = 1.0 / g7 / 16.0
tmp1 = T3[0] - T3[2];
tmp2 = T3[1] - T3[3];
T4[0] =  tmp1 - tmp2; // T3[0] - T3[1] - T3[2] + T3[3];
T4[1] =  tmp1 + tmp2; // T3[0] + T3[1] - T3[2] - T3[3];
T4[2] = -T3[0] - T3[2];
T4[3] =  T3[1] + T3[3];
T5[0] =  T4[0];
T5[1] =  g4 * T4[1];
/* G2t = 1/2 */
/*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
/*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
T6[0] = G2[0] * T4[2]; // G2[0] = 1.0 / g6 / 2.0
T6[1] = G2[1] * T4[3]; // G2[1] = 1.0 / g2 / 2.0
T7[0] = T6[0] - T6[1];
T7[1] = T6[0] + T6[1];
T8[0] = T7[0];
T8[1] = g4 * T7[1];
T5[2] = T8[0] - T8[1];
T5[3] = T8[1];
T9[0] =  T5[0] + T5[2];
T9[1] =  T5[1] + T5[3];
T9[2] = -T5[3];
T9[3] =  T5[1] - T5[2];
/* - P241 = */
/*  - 0 0 0 */
/*  0 0 0 - */
/*  0 0 1 0 */
/*  0 - 0 0 */
T10[0] = -T9[0];
T10[1] = -T9[3];
T10[2] =  T9[2];
T10[3] = -T9[1];

/* D242 = */
/*  g1 -g5  g7 -g3 */
/*  g3  g1 -g5  g7 */
/* -g7  g3  g1 -g5 */
/*  g5 -g7  g3  g1 */
/* D242 = P2422 G4 P2421 */
/* P2421 = */
/*  0 1 0 0 */
/*  1 0 0 0 */
/*  0 0 0 - */
/*  0 0 - 0 */
T2[0] =  T1[5];
T2[1] =  T1[4];
T2[2] = -T1[7];
T2[3] = -T1[6];
/* G4t = 1/2 */
/*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
/*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
/*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
/*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
T3[0] = G2G4[0] * T2[0]; // G2G4[0] = 1.0 / g5 / 16.0
T3[1] = G2G4[1] * T2[1]; // G2G4[1] = 1.0 / g1 / 16.0
T3[2] = G2G4[2] * T2[2]; // G2G4[2] = 1.0 / g3 / 16.0
T3[3] = G2G4[3] * T2[3]; // G2G4[3] = 1.0 / g7 / 16.0
tmp1 = T3[0] - T3[2];
tmp2 = T3[1] - T3[3];
T4[0] =  tmp1 - tmp2; // T3[0] - T3[1] - T3[2] + T3[3];
T4[1] =  tmp1 + tmp2; // T3[0] + T3[1] - T3[2] - T3[3];
T4[2] = -T3[0] - T3[2];
T4[3] =  T3[1] + T3[3];
T5[0] =  T4[0];
T5[1] =  g4 * T4[1];
/* G2t = 1/2 */
/*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
/*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
T6[0] = G2[0] * T4[2]; // G2[0] = 1.0 / g6 / 2.0
T6[1] = G2[1] * T4[3]; // G2[1] = 1.0 / g2 / 2.0
T7[0] = T6[0] - T6[1];
T7[1] = T6[0] + T6[1];
T8[0] = T7[0];
T8[1] = g4 * T7[1];
T5[2] = T8[0] - T8[1];
T5[3] = T8[1];
T9[0] =  T5[0] + T5[2];
T9[1] =  T5[1] + T5[3];
T9[2] = -T5[3];
T9[3] =  T5[1] - T5[2];
/* P2422 = */
/*  - 0 0 0 */
/*  0 0 0 1 */
/*  0 0 1 0 */
/*  0 1 0 0 */
T10[4] = -T9[0];
T10[5] =  T9[3];
T10[6] =  T9[2];
T10[7] =  T9[1];

/* r~2(V2)t = */
/* | r(1)  r(0) | | r(1)  r(1) | */
/* | r(0) -r(i) | | r(1) -r(1) | */
/* = */
/* | 1 0 0 0 0 0 0 0 | | 1 0 0 0 1 0 0 0 | */
/* | 0 1 0 0 0 0 0 0 | | 0 1 0 0 0 1 0 0 | */
/* | 0 0 1 0 0 0 0 0 | | 0 0 1 0 0 0 1 0 | */
/* | 0 0 0 1 0 0 0 0 | | 0 0 0 1 0 0 0 1 | */
/* | 0 0 0 0 0 0 1 0 | | 1 0 0 0 - 0 0 0 | */
/* | 0 0 0 0 0 0 0 1 | | 0 1 0 0 0 - 0 0 | */
/* | 0 0 0 0 - 0 0 0 | | 0 0 1 0 0 0 - 0 | */
/* | 0 0 0 0 0 - 0 0 | | 0 0 0 1 0 0 0 - | */
/* = */
/* | 1 0 0 0 1 0 0 0 | */
/* | 0 1 0 0 0 1 0 0 | */
/* | 0 0 1 0 0 0 1 0 | */
/* | 0 0 0 1 0 0 0 1 | */
/* | 0 0 1 0 0 0 - 0 | */
/* | 0 0 0 1 0 0 0 - | */
/* | - 0 0 0 1 0 0 0 | */
/* | 0 - 0 0 0 1 0 0 | */
G[20] =  T10[0] + T10[4];
G[21] =  T10[1] + T10[5];
G[22] =  T10[2] + T10[6];
G[23] =  T10[3] + T10[7];
G[28] =  T10[2] - T10[6];
G[29] =  T10[3] - T10[7];
G[30] = -T10[0] + T10[4];
G[31] = -T10[1] + T10[5];
}

/* G4t = 1/2 */
/*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
/*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
/*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
/*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
/* G4tG1 */
/* mul  8 add  12 shift 0 */
{
double T1[4], T2[4], T3[4], T4[2], T5[2], T6[2], tmp1, tmp2;
T1[0] = G1G4[0] * P[32]; // G1G4[0] = g4 / g5 / 8.0
T1[1] = G1G4[1] * P[40]; // G1G4[1] = g4 / g1 / 8.0
T1[2] = G1G4[2] * P[48]; // G1G4[2] = g4 / g3 / 8.0
T1[3] = G1G4[3] * P[56]; // G1G4[3] = g4 / g7 / 8.0
tmp1 = T1[0] - T1[2];
tmp2 = T1[1] - T1[3];
T2[0] =  tmp1 - tmp2; // T1[0] - T1[1] - T1[2] + T1[3];
T2[1] =  tmp1 + tmp2; // T1[0] + T1[1] - T1[2] - T1[3];
T2[2] = -T1[0] - T1[2];
T2[3] =  T1[1] + T1[3];
T3[0] =  T2[0];
T3[1] =  g4 * T2[1];
/* G2t = 1/2 */
/*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
/*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
T4[0] = G2[0] * T2[2]; // G2[0] = 1.0 / g6 / 2.0
T4[1] = G2[1] * T2[3]; // G2[1] = 1.0 / g2 / 2.0
T5[0] = T4[0] - T4[1];
T5[1] = T4[0] + T4[1];
T6[0] = T5[0];
T6[1] = g4 * T5[1];
T3[2] = T6[0] - T6[1];
T3[3] = T6[1];
G[32] =  T3[0] + T3[2];
G[40] =  T3[1] + T3[3];
G[48] = -T3[3];
G[56] =  T3[1] - T3[2];
}

/* G4t = 1/2 */
/*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
/*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
/*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
/*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
/* G4tG1 */
/* mul  8 add  12 shift 0 */
{
double T1[4], T2[4], T3[4], T4[2], T5[2], T6[2], tmp1, tmp2;
T1[0] = G1G4[0] * P[33]; // G1G4[0] = g4 / g5 / 8.0
T1[1] = G1G4[1] * P[41]; // G1G4[1] = g4 / g1 / 8.0
T1[2] = G1G4[2] * P[49]; // G1G4[2] = g4 / g3 / 8.0
T1[3] = G1G4[3] * P[57]; // G1G4[3] = g4 / g7 / 8.0
tmp1 = T1[0] - T1[2];
tmp2 = T1[1] - T1[3];
T2[0] =  tmp1 - tmp2; // T1[0] - T1[1] - T1[2] + T1[3];
T2[1] =  tmp1 + tmp2; // T1[0] + T1[1] - T1[2] - T1[3];
T2[2] = -T1[0] - T1[2];
T2[3] =  T1[1] + T1[3];
T3[0] =  T2[0];
T3[1] =  g4 * T2[1];
/* G2t = 1/2 */
/*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
/*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
T4[0] = G2[0] * T2[2]; // G2[0] = 1.0 / g6 / 2.0
T4[1] = G2[1] * T2[3]; // G2[1] = 1.0 / g2 / 2.0
T5[0] = T4[0] - T4[1];
T5[1] = T4[0] + T4[1];
T6[0] = T5[0];
T6[1] = g4 * T5[1];
T3[2] = T6[0] - T6[1];
T3[3] = T6[1];
G[33] =  T3[0] + T3[2];
G[41] =  T3[1] + T3[3];
G[49] = -T3[3];
G[57] =  T3[1] - T3[2];
}

/* G4txG2t = P G2txG4t P-1 */
/* G2txG4t = r~4(V2)t 1/2 (D2,4)t 2r~4(V2)* */
/* r = r4 */
/* mul 16 add  40 shift 0 */
{
double _P[8], T1[8], T2[4], T3[4], T4[4], T5[4], T6[2], T7[2], T8[2], T9[4], T10[8], T11[8], tmp1, tmp2;

/* P-1 = */
/*  1 0 0 0 0 0 0 0 */
/*  0 0 1 0 0 0 0 0 */
/*  0 0 0 0 1 0 0 0 */
/*  0 0 0 0 0 0 1 0 */
/*  0 1 0 0 0 0 0 0 */
/*  0 0 0 1 0 0 0 0 */
/*  0 0 0 0 0 1 0 0 */
/*  0 0 0 0 0 0 0 1 */
_P[0] = P[34];
_P[1] = P[42];
_P[2] = P[50];
_P[3] = P[58];
_P[4] = P[35];
_P[5] = P[43];
_P[6] = P[51];
_P[7] = P[59];
/* r4(i) = */
/*  0 0  -1  0 */
/*  0 0   0 -1 */
/*  1 0   0  0 */
/*  0 1   0  0 */
/* 2r~4(V2)* = */
/* | r(1)  r(1) | | r(1)  r(0) | */
/* | r(1) -r(1) | | r(0)  r(i) | */
/* = */
/* | 1 0 0 0 1 0 0 0 | | 1 0 0 0 0 0 0 0 | */
/* | 0 1 0 0 0 1 0 0 | | 0 1 0 0 0 0 0 0 | */
/* | 0 0 1 0 0 0 1 0 | | 0 0 1 0 0 0 0 0 | */
/* | 0 0 0 1 0 0 0 1 | | 0 0 0 1 0 0 0 0 | */
/* | 1 0 0 0 - 0 0 0 | | 0 0 0 0 0 0 - 0 | */
/* | 0 1 0 0 0 - 0 0 | | 0 0 0 0 0 0 0 - | */
/* | 0 0 1 0 0 0 - 0 | | 0 0 0 0 1 0 0 0 | */
/* | 0 0 0 1 0 0 0 - | | 0 0 0 0 0 1 0 0 | */
/* = */
/* | 1 0 0 0 0 0 - 0 | */
/* | 0 1 0 0 0 0 0 - | */
/* | 0 0 1 0 1 0 0 0 | */
/* | 0 0 0 1 0 1 0 0 | */
/* | 1 0 0 0 0 0 1 0 | */
/* | 0 1 0 0 0 0 0 1 | */
/* | 0 0 1 0 - 0 0 0 | */
/* | 0 0 0 1 0 - 0 0 | */
T1[0] = _P[0] - _P[6];
T1[1] = _P[1] - _P[7];
T1[2] = _P[2] + _P[4];
T1[3] = _P[3] + _P[5];
T1[4] = _P[0] + _P[6];
T1[5] = _P[1] + _P[7];
T1[6] = _P[2] - _P[4];
T1[7] = _P[3] - _P[5];
/* 1/2 (D2,4)t = */
/* -g5 -g7 -g3  g1   0   0   0   0 */
/* -g1 -g5 -g7 -g3   0   0   0   0 */
/*  g3 -g1 -g5 -g7   0   0   0   0 */
/*  g7  g3 -g1 -g5   0   0   0   0 */
/*   0   0   0   0  g1 -g5  g7 -g3 */
/*   0   0   0   0  g3  g1 -g5  g7 */
/*   0   0   0   0 -g7  g3  g1 -g5 */
/*   0   0   0   0  g5 -g7  g3  g1 */
/* D241 = */
/* -g5 -g7 -g3  g1 */
/* -g1 -g5 -g7 -g3 */
/*  g3 -g1 -g5 -g7 */
/*  g7  g3 -g1 -g5 */
/* D241 = - P241 G4 P241 */
/* P241 = */
/*  1 0 0 0 */
/*  0 0 0 1 */
/*  0 0 - 0 */
/*  0 1 0 0 */
T2[0] =  T1[0];
T2[1] =  T1[3];
T2[2] = -T1[2];
T2[3] =  T1[1];
/* G4t = 1/2 */
/*  |  1  0  1  0 | |  1  0   0   0 | |  1 -1 -1  1 | | 1/g5    0    0    0 | */
/*  |  0  1  0  1 | |  0 G1   0   0 | |  1  1 -1 -1 | |    0 1/g1    0    0 | */
/*  |  0  0  0 -1 | |  0  0 G2t G2t | | -1  0 -1  0 | |    0    0 1/g3    0 | */
/*  |  0  1 -1  0 | |  0  0 G2t G2t | |  0  1  0  1 | |    0    0    0 1/g7 | */
T3[0] = G2G4[0] * T2[0]; // G2G4[0] = 1.0 / g5 / 16.0
T3[1] = G2G4[1] * T2[1]; // G2G4[1] = 1.0 / g1 / 16.0
T3[2] = G2G4[2] * T2[2]; // G2G4[2] = 1.0 / g3 / 16.0
T3[3] = G2G4[3] * T2[3]; // G2G4[3] = 1.0 / g7 / 16.0
tmp1 = T3[0] - T3[2];
tmp2 = T3[1] - T3[3];
T4[0] =  tmp1 - tmp2; // T3[0] - T3[1] - T3[2] + T3[3];
T4[1] =  tmp1 + tmp2; // T3[0] + T3[1] - T3[2] - T3[3];
T4[2] = -T3[0] - T3[2];
T4[3] =  T3[1] + T3[3];
T5[0] =  T4[0];
T5[1] =  g4 * T4[1];
/* G2t = 1/2 */
/*  | 1 -1 | | 1  0 | | 1 -1 | | 1/g6    0 | */
/*  | 0  1 | | 0 G1 | | 1  1 | |    0 1/g2 | */
T6[0] = G2[0] * T4[2]; // G2[0] = 1.0 / g6 / 2.0
T6[1] = G2[1] * T4[3]; // G2[1] = 1.0 / g2 / 2.0
T7[0] = T6[0] - T6[1];
T7[1] = T6[0] + T6[1];
T8[0] = T7[0];
T8[1] = g4 * T7[1];
T5[2] = T8[0] - T8[1];