LZP圧縮

Posted at 2024-06-24

LZ77系の変異種、LZP様を紹介します。これは過去出現した文字列と一致する文字列を少ない情報量で出力できます。なんと一致判定flagと一致長のみ。LZ77系は一致位置というかなり幅のある情報も出力。それに比べると大幅な改善です。………が、良い事ずくめではありません。
一致率が非常に控え目なのです。展開速度は高速ですがLZ77系には劣ります。そういう訳で、これ単体ではほぼ役に立ちません。他の圧縮法の前処理として利用するのが一般的です。

原理

直前の数文字からhash値を計算し、その値を添字にして文字列の位置を配列に格納しまくっていきす。
その配列から得た位置を基準に次の文字列と一致する長さが一定以上あれば、一致成功と判定します。

実装

今回の実装ではhash値というより、3文字分そのまんまの値を添字にして配列に位置を格納。故に配列長は16MB。
一致flagは引数flagで指定(0～255)。256以上にすると自動で決定。
一致成功とする最低一致長は引数minで指定(0～127)。実際には+2される。
引数levelで一致探索法を指定(0～2)。圧縮率にささやかな変化がある模様。

/*compress
@In: input(Array / Uint8Array)
@flag: match flag symbol(0-256). 256 means auto detect.
@min: minimum match length(0-127)
@level: lzp parser. 0=greedy, 1=lazy, 2=flexible
@done: call back of last process
	done(A,a,z)
	@A: compressed array of input
	@a: input size
	@z: compressed size
@return: compressed array of @In
*/
function LZPencode(In,flag,min,level,done=a=>a){
	var a=0,o=2,size=In.length,H=new Uint32Array(1<<24),Out=[min&=127];
	for(min+=2;a<3&&a<size;)Out[o++]=In[a++];
	if(flag>>>0>255){// detect match flag
		let i=In.length,c,h=1/0;
		for(;i--;)++H[In[i]];
		for(flag=0;i<255;)if((c=H[++i])<h)h=c,flag=i;
		for(;H[i]=0,i--;);
	}
	for(Out[1]=flag&=255;a<size;){
		let h=In[a-1]|In[a-2]<<8|In[a-3]<<16, i=H[h], c=In[H[h]=a],l;
		if(i){
			for(l=0;In[i++]===In[a+l];)l++;
			if(l>=min)lz:{
				if(level>1){// flexible parsing
					for(let d,e=0,n,p=min>(c=l+1)-25?min:l-24;c>p;){
						d=--c+a;
						i=H[In[d-1]|In[d-2]<<8|In[d-3]<<16];
						for(n=0;In[i++]===In[d++];n++);
						if(!e||n+c>e&&n>min)e=n+c,l=c
					}
				}else if(level&&a+min<size){// lazy parsing
					let d=a,n=0;
					i=H[In[a]|In[a-1]<<8|In[a-2]<<16];
					for(;In[i++]===In[++d];n++);
					if(l<n)break lz
				}
				// lzp match
				Out[o++]=flag;a+=l;l-=min-1;
				if(l<253)Out[o++]=l;
				else for(Out[o++]=(l-=253)<256?253:l<65536?254:255;Out[o++]=l&255,l>>=8;);
				continue
			}a++,Out[o++]=c;
			if(c===flag)Out[o++]=0
		}else a++,Out[o++]=c
	}
	done(Out,a,o);
	return Out
}
/*decompress
@In: input (Array/Uint8Array)
@done: call back of last process.
	done(A,a,z)
	@A: decompressed array of @In.
	@a: input size
	@z: decompressed size
@return: decompressed array of @In
*/
function LZPdecode(In,done=a=>a){
	var a=2,size=In.length,l,o=-1,flag=In[1],min=In[0]+1,
		H=new Uint32Array(1<<24),Out=[];
	for(;size>a&&o<2;)Out[++o]=In[a++];
	done=done||function(O){return O};
	for(;a<size;){
		let c=In[a++],h=Out[o]|Out[o-1]<<8|Out[o-2]<<16,i=H[h];
		H[h]=o;
		if(!i||c^flag||!(h=In[a++]))Out[++o]=c;
		else{
			if(h>252)for(c=h,l=0;h+=In[a++]<<l,--c>252;h--)l+=8;
			for(h+=min;h--;)Out[++o]=Out[++i]
		}
	}
	done(Out,a,o+1);
	return Out
}

codepenで実験

See the Pen LZP compression by xezz (@xezz) on CodePen.

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up