0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

圧縮しやすく変形 Sort Transformation

Posted at

その界隈ではSchindler Transformationなどとも呼ばれています。文字列を並び替えて圧縮しやすくする手法です。BWTなどと同様に同じ文字が連続するような変換を施します。
BWTは巡回文字列の完全整列を行いますが、Sort Transformationは限定的な整列なので非常に高速。ただしBWTより概ね圧縮性は劣ります。
例えば以下のような文字列を変形すると…

That that is is that that is not is not is that it it is
1文字分だけの変換
"Ttiittininitiiihtttttaaaaasssssttsoott     T h h h   h   "
2文字分だけの変換
"Thitnnttitiiiiiihtttttaaaaasssssttsoott     Thhhh         "
BWT
"ttttstttssttssshhhhhtttTt          nniiiiiiaaooiiaaa    "

という具合です。

実装編

全関数共通で引数は数値配列、戻り値も数値配列です。

JavaScript版
/****** Sorting Transform order1 ******
	先頭1文字で整列
*/
//encode
function ST1e(A){
	for(var c,l=-1,i=0,s=0,C=new Uint32Array(256),O=[];++C[A[++l]];);
	for(l--;i<256;s+=c)c=C[i],C[i++]=s;
	for(i=0;i<l;)O[++C[A[i++]]]=A[i];
	O[0]=O[++C[A[l]]]=A[0];
	return O
}
//decode
function ST1d(A){
	for(var O=[A[0]],L=A.length-1,i=L,C=new Uint32Array(256),s=0;i;)C[A[i--]]++;
	while(i<255)C[++i]=(s+=C[i])-C[i];
	for(i=0;++i<L;)O[i]=A[++C[O[i-1]]];
	return O
}
/****** Sorting Transform order2 ******
	先頭2文字で整列
*/
//encode
function ST2e(A){
	for(var c,l=A.length,i=l,s=0,C=new Uint32Array(65536),O=[];--i;)C[A[i]<<8|A[i-1]]++;
	for(C[A[0]<<8|A[l-1]]++;s<l;s+=c)c=C[i],C[i++]=s;
	for(i=2;i<l;)O[++C[A[i-1]<<8|A[i-2]]+1]=A[i++];
	O[0]=O[++C[A[l-1]<<8|A[l-2]]+1]=A[0],O[1]=O[++C[A[0]<<8|A[l-1]]+1]=A[1];
	return O
}
//decode
function ST2d(A){
	for(var C=new Uint32Array(65536),O=[A[0],A[1]],l=(A=A.slice(2)).length,i=l,j,s=0;i--;)C[A[i]]++;
	while(i<255)for(j=s,s+=C[++i],C[i]=0;j<s;)C[A[j++]<<8|i]++;
	for(i=s=0;s<l;s+=j)j=C[i],C[i++]=s;
	for(i=2;i<l;)O[i]=A[C[O[i-1]<<8|O[i++-2]]++];
	return O
}
使用例
let txt=Uint8Array.from("That that is is that that is not is not is that it it is",a=>a.charCodeAt());
let e=ST1e(txt), d=ST1d(e);
document.write("<pre>encoded ",String.fromCharCode(...e),"\ndecoded ",String.fromCharCode(...d))

先頭3文字以上で整列する実装も可能ですが、どんどん複雑になります。szipというprogramはうまい事実装しています。

参考文献

compressconsult

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?