組み込み関数のCompressionStream
の圧縮力があまりにお粗末なのでzopfli級の最適化を施そうと目論んだ次第です。
C言語版のzopfliもどきを手作業でJavaScriptに移植するだけの超簡単ではない程度お仕事です。
本家は圧縮力0~9を指定できますが、それだけでは能力を十分に発揮できていません。JavaScript版ではもっと細かい設定ができるようにしていきます。それでは動作検証
See the Pen deflate/zopfli compression + inflate by xezz (@xezz) on CodePen.
設定項目
まずC言語版の構造体では18個の設定項目があります。圧縮力0~9はこれらをある決まった値に設定するだけの仕様です。
typedef struct ZopfliOptions{
/*
Maximum amount of times to rerun forward and backward pass to optimize LZ77
compression cost. Good values: 10, 15 for small files, 5 for files over
several MB in size or it will be too slow.
*/
int numiterations;
unsigned filter_style;
//Don't try to use dynamic block under this size
unsigned skipdynamic;
//Try static block if dynamic block is smaller than this. Needs to be much higher than skipdynamic
unsigned trystatic;
//Don't blocksplit under this size
unsigned noblocksplit;
// Don't blocksplit under this size (LZ77'd data)
unsigned noblocksplitlz;
//Number of parallel block split point searches
unsigned num;
/* Search longer for best huffman header compression:
Value outstream important less important
0: 1 1 1
1: 10 10 1
2: 32 32 32
*/
unsigned searchext;
unsigned reuse_costmodel;
//When using more than one iteration, this will save the found matches on the first run so they don't need to be found again. Uses large amounts of memory
unsigned useCache;
//Use per block multithreading
unsigned multithreading;
//Use tuning for PNG files
unsigned isPNG;
//Replace short lengths with literals if that improves compression. Higher numbers mean more aggressive behaviour
unsigned replaceCodes;
//Block split twice
unsigned twice;
//Sorry, can't figure out a better name. Leads to better final LZ costmodel
unsigned ultra;
//Use greedy search instead of lazy search above this value
unsigned greed;
//Use shannon entropy instead of real code lengths in blocksplitting
unsigned entropysplit;
//Use advanced huffman and header optimizations
unsigned advanced;
}
JavaScript(CodePen)での仕様
/* deflate圧縮関数
@set: object. 設定を格納
@last: 最後のbitを最後のdeflate blockに設定するかどうかの真偽
@A: 圧縮元配列
@z: Aの大きさ
@O: 圧縮先配列
*/
function zopfliDeflate(set,last,A,z,O)
/* deflate-raw的な関数. zopfliDeflate包括関数
@A: 圧縮元配列
@set: object. 設定を格納
@done: 後処理用関数(省略可)
@rate: 保留(省略可)
@return: Aを圧縮した配列
*/
function rawComp(A,set,done,rate)
{
x:0-255 // Maximum amount of times to rerun forward and backward pass to optimize LZ77 compression cost
skipdynamic: 0-511 // Don't try to use dynamic block under this size
trystatic: 0-4095 // Try static block if dynamic block is smaller than this. Needs to be much higher than skipdynamic
noblocksplit: 0-2047 // Don't blocksplit under this size
noblocksplitlz: 0-2047 // Don't blocksplit under this size (LZ77's data)
num: 0-15 // [2-9?] Number of parallel block split point searches
searchext: 0-2 // [0-2] Search longer for best huffman header compression
reuseModel: 0-1 // numiterations must be >1
useCache: 0-1 // When using more than one iteration, this will save the found matches on the first run so they don't need to be found again. Uses large amounts of memory
multithreading: 0 // Use per block multithreading
isPNG: 0-1 // Use tuning for PNG files
replaceCodes: 0-2047 // Replace short lengths with literals if that improves compression. Higher numbers mean more aggressive behaviour
twice: 0-7 // [0-7] Block split twice
ultra: 0-3 // [0-3] Leads to better final LZ costmodel
greed: 0-258 // [0-258] Use greedy search instead of lazy search above this value
entropysplit: 0-1 // Use shannon entropy instead of real code lengths in blocksplitting
advanced: 0-1 // Use advanced huffman and header optimizations
bpm:0-1 // Use Boundary Package Merge for huffman code
}
ちなみにbpm
という項目は本家にはありません。その実体ですがCodePenではBoundary PM
という項目に該当します。有効にすると本家同様にBounded package merge algorithmと呼ばれる手法でhuffman符号を構築。無効の場合は7zipで採用されている高速な手法で構築。圧縮率が同じになるとは限りません。
isPNG
を有効にすると、内部でPNG画像用の閾値を設定します。別にPNG画像を作るわけではありません。
CodePenではlevel
という項目があり、これが本家の圧縮力0-9に該当します(多分)