今回は無秩序に単調増加する数列の圧縮法を紹介します。この手の数列は差分圧縮するのがド定番ですが、それとは異なる手法とやらを書き散らかしてみます。
256要素の単調増加数列
801,1341,1731,2395,2986,3983,4759,5128,5949,6186,6472,6835,7416,7663,8033,8657,9187,9422,9527,10352,11187,11352,11450,11903,11926,12223,12488,13338,13992,14840,15075,15872,16207,16545,16825,17526,18203,19132,19919,20040,21026,21472,22004,22231,22352,22601,23394,24004,24837,25448,26011,27009,27073,27846,28290,28808,29334,30216,30972,31026,31324,32165,33057,33716,33903,34490,35104,35982,36176,36480,36662,37463,37846,38719,39189,39603,40287,41082,41993,42791,43400,43440,43911,44801,45054,45761,46729,47419,48299,48948,49657,49670,49905,50695,50888,51712,52013,52688,52926,53779,53911,54281,54952,55466,56379,56483,57114,57849,57888,58363,58480,58875,59861,60745,61124,62041,62198,62528,62901,63432,63989,64091,64530,64942,65003,65864,66452,67265,67948,68755,69213,70141,70945,71595,72126,72173,73014,73454,73882,73918,74028,74744,75137,76020,76265,77033,78029,78465,78839,78958,79278,79554,80223,80645,81643,82601,82624,82766,83505,84269,85079,85937,86189,87066,87352,87933,88540,88782,89332,90061,90595,91068,91565,91817,91841,92625,93168,94003,94218,94255,95146,96009,96123,97063,97138,97416,98342,98503,98514,99051,99200,99466,99917,100312,101133,102042,102975,103027,103956,104066,104905,105253,106232,106706,107364,107543,107869,108234,108721,109605,109961,110149,111093,111854,112485,112591,113412,114161,114910,115232,115688,116181,117095,117486,118443,118893,119512,120270,120762,121177,121916,122435,122572,122978,123393,124137,124706,124820,125314,126050,126064,126297,126566,126850,127748,128174,128611,129525,130284,130317,130621,130994,131833,132033,132162,132486
差分はまばらで規則性は皆無
差分数列
801,540,390,664,591,997,776,369,821,237,286,363,581,247,370,624,530,235,105,825,835,165,98,453,23,297,265,850,654,848,235,797,335,338,280,701,677,929,787,121,986,446,532,227,121,249,793,610,833,611,563,998,64,773,444,518,526,882,756,54,298,841,892,659,187,587,614,878,194,304,182,801,383,873,470,414,684,795,911,798,609,40,471,890,253,707,968,690,880,649,709,13,235,790,193,824,301,675,238,853,132,370,671,514,913,104,631,735,39,475,117,395,986,884,379,917,157,330,373,531,557,102,439,412,61,861,588,813,683,807,458,928,804,650,531,47,841,440,428,36,110,716,393,883,245,768,996,436,374,119,320,276,669,422,998,958,23,142,739,764,810,858,252,877,286,581,607,242,550,729,534,473,497,252,24,784,543,835,215,37,891,863,114,940,75,278,926,161,11,537,149,266,451,395,821,909,933,52,929,110,839,348,979,474,658,179,326,365,487,884,356,188,944,761,631,106,821,749,749,322,456,493,914,391,957,450,619,758,492,415,739,519,137,406,415,744,569,114,494,736,14,233,269,284,898,426,437,914,759,33,304,373,839,200,129,324
以下の圧縮法により349 bytes程度に圧縮できるかも
実装編
単純にRangeCoderを無闇に使いまくるだけの簡単な実装です。
JavaScript版
//乱数列圧縮
function raEnc(A,max=256,size=A.length){
function encode(c,f,t){
R=R/t>>>0;L+=c*R;
for(R*=f;R<16777216;L=L<<8>>>0,R*=256)
if((c=L>0xffffffff)||255>L>>>24)
for(O[o++]=255&c--+B,B=L>>>24;C;C--)O[o++]=255&c;
else++C
}
if(max<size)throw"error";
let a=0,b=0,o=-1,O=[];
let L=0,R=-1>>>0,B=0,C=0;// range coder
//main loop
for(;a<size&&size-a-max+b;)
if(A[a]===b++)encode(0,size,max),a++;
else encode(size,max-size,max);
//range coder最終出力
for(b=5;b--;L=L<<8>>>0)
if((a=L>0xffffffff)||255>L>>>24)
for(O[o++]=255&a--+B,B=L>>>24;C;C--)O[o++]=255&a;
else++C;
for(;o&&!O[--o];)O.length--;//末尾の0削除. しなくても良い
return O
}
//乱数列展開
function raDec(A,max=256,size=256){
function decode(C,n){
R=R/C[n]>>>0;
for(var b=0,c=L/R>>>0;C[++b]<=c;);
c=C[b-1];L-=R*c;
for(R*=C[b]-c;R<16777216;R*=256)L=(L<<8|A[p++])>>>0;
return--b
}
let a=0,b=5,p=0,C=[0,size,max],O=[];
let L,R=-1>>>0;//range coder
for(;--b;)L=(L<<8|A[p++])>>>0;//初期化
for(;a<size&&size-a-max+b;b++)
if(!decode(C,2))O[a++]=b;
for(;a<size;)O[a++]=b++;//不足分補完
return O
}
-
raEnc(A,max,size)
: 圧縮関数- A: 乱数列
- max: Aに含まれる最大値を超える値
- size: Aの要素数
-
raDec(A,max,size)
: 展開関数- A: 乱数圧縮された配列
- max: Aに含まれる最大値を超える値
- size: Aの要素数
使用例
let ra=[4,10,16,18,24,28,33,39,40,47,52,59,61,66,69,75,78,79,83,84,88,94,101,103,104,107,113,114,120,121,126,128];
let max=ra.at(-1)+1, size=ra.length;
let e=raEnc(ra,max), d=raDec(e,max,size);
document.write(size," to ",e.length,"<br>decoded:",d)
第2引数max
は小さい程圧縮率が良くなります。乱数列に含まれる最大値が255なら、max
は最低でも256を指定する必要があります。ややこしい仕様です。また、max
はsize
以上の値にする必要があります。
loop継続条件のsize-a-max+b
の意味ですが、size-a
とmax-b
が等しい場合、符号化の必要が無いので打ち切るという事です(残りの数列の値が確定している)