This time I'll introduce a method for improving the compression ratio of static Range coding (I touched on it briefly in a past article). Despite calling it "static", it actually updates the frequency table.
First, count the frequencies of every symbol that can appear; then, inversely to an adaptive Range coder, keep decreasing each symbol's frequency as it gets encoded.
Updating on every symbol would be far too slow, though, so the table is only updated when a condition is met.
First make a copy of the frequency table; normally it's the copy whose counts get decremented. Only when some symbol's count in the copy reaches 0 is the original table updated to match the copy.
The reason this is faster is that it cuts down the work of computing cumulative frequencies, which a Range coder needs at every single step.
It seems this approach often wins not just on speed but on compression ratio as well.
The benefit of updating grows more pronounced the higher the order of the statistical context.
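The decrement-and-resync scheme above can be sketched like this (my own simplified illustration, not the article's implementation; `makeDecrementingModel` and all names are mine):

```javascript
// Minimal sketch: the coder reads from the master table `freq`/`cum`, while
// every encoded symbol decrements the working copy `work`. Only when some
// symbol's working count hits 0 is the master resynced and `cum` rebuilt.
function makeDecrementingModel(data){
  var freq = new Uint32Array(256),   // master table (what the coder sees)
      work = new Uint32Array(256),   // working copy, decremented per symbol
      cum  = new Uint32Array(257),   // cumulative frequencies of `freq`
      i, total = 0;
  for(i = 0; i < data.length; i++) freq[data[i]]++;
  work.set(freq);
  for(i = 0; i < 256; i++) cum[i + 1] = total += freq[i];
  return {
    freq: freq, cum: cum,
    // Call once per encoded symbol; returns true when the master table
    // changed, i.e. the coder's intervals must be recomputed from freq/cum.
    step: function(symbol){
      if(--work[symbol]) return false;  // fast path: no rebuild needed
      freq.set(work);                   // resync master with the copy
      for(var s = 0, t = 0; s < 256; s++) cum[s + 1] = t += freq[s];
      return true;
    }
  };
}
```

Exhausted symbols end up with frequency 0 in the master table, so they drop out of the coder's alphabet entirely; that shrinking alphabet is where the gain over a fixed static table comes from.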
Problems
This method breaks down unless the frequencies stay exactly accurate. When processing a huge file, per-symbol counts grow large and eventually exceed the design limit of the Range coder.
One way to avoid that is, the moment even a single symbol hits the frequency limit, to cut the compression off at that point, reinitialize the frequency table, and move on to the next block.
Split into blocks too many times, though, and the number of frequency tables stamped into the compressed file piles up. Unless the tables themselves are compressed cleverly, you can end up in the red.
Old-school static Range coders used to round the frequencies and sacrifice precision instead.
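The splitting workaround can be sketched as follows (illustrative only; `splitIntoBlocks` and `maxFreq` are my own names, and the real compress() below interleaves this scan with the encoding itself):

```javascript
// Grow a frequency table over the input and cut a new block whenever any
// single count reaches the coder's limit `maxFreq`. The symbol that hit
// the cap still belongs to the block being closed.
function splitIntoBlocks(data, maxFreq){
  var blocks = [], freq = new Uint32Array(256), start = 0, pos = 0;
  for(; pos < data.length; pos++){
    if(++freq[data[pos]] >= maxFreq){
      blocks.push({start: start, end: pos + 1, freq: freq});
      freq = new Uint32Array(256);   // fresh table for the next block
      start = pos + 1;
    }
  }
  if(start < data.length) blocks.push({start: start, end: data.length, freq: freq});
  return blocks;
}
```

Each entry's `freq` is exact for its block, which is what the decrementing model needs; the cost is one more table header per block in the output.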
Implementation
My usual squalid design might rot readers' eyes, so I had an AI tidy it up. It's rather long-winded…
The context length for statistics (order) supports 1-4. Orders 1-2 perform well on anything, order 3 shows its worth on huge files, and order 4 is almost useless.
The frequency table, the working copy, and the cumulative frequency table are managed in a single TypedArray (258*2 + 256*2 + 257*4 bytes). A fairly extravagant design.
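Concretely, each context's table is allocated as below in compress()/decompress(); the index map in the comments is my reading of how the code uses the buffer:

```javascript
// One context's model in a single 2056-byte buffer:
// Uint16 indices   0-255 : frequencies the coder currently uses
// Uint16 index     256   : number of symbols still active
// Uint16 index     257   : last symbol + 1            (258*2 bytes so far)
// Uint16 indices 258-513 : working copy, decremented per symbol;
//                          the top bit flags an exhausted symbol (256*2 bytes)
// bytes 1028-2055        : cumulative frequencies, as 257 Uint32 `counters`
var frequencyTable = new Uint16Array(1028);
frequencyTable.counters = new Uint32Array(frequencyTable.buffer).subarray(257);
```

The Uint32 cumulative view starts at byte offset 257*4 = 1028, so it occupies exactly the otherwise-unused second half of the Uint16 buffer; nothing overlaps.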
function calculateHash(input, shift){
return ((input >>> 16) * 52503 + (input &= 65535) * 1883 << 16) + input * 52503 >>> shift;
}
// Helper functions
function createProbabilityModel(size, updateRate){
var model = new Uint16Array(size).fill(2048);
model.up = updateRate;
return model;
}
function calculateLog2(value, bits){
for(bits = 0; value >> ++bits; );
return bits;
}
function calculateGammaRLESize(array, length, compareFunction, encoder){
compareFunction = compareFunction || function(a){ return !a; };
var position = 0, runEnd, runLength, distinctSymbols, entropy, runStart, bitCost = 1;
for(encoder && encoder(1, !compareFunction(array[0])); runEnd = position < length; bitCost += runLength){
for(runLength = compareFunction(array[position]), distinctSymbols = calculateLog2(length - position);
runLength == compareFunction(array[++position]) && position < length; ){
runEnd++;
}
entropy = runLength = calculateLog2(runStart = runEnd);
if(runLength-- < distinctSymbols){
/* length limited */
} else {
runEnd ^= 1 << runLength--;
}
runLength += runStart < 256 ? entropy : 1;
encoder && encoder(runLength, runEnd);
}
return bitCost; // bit cost
}
function encodeGammaRLE(array, length, output, probabilityModel){
var position = 0, runEnd, runLength, distinctSymbols, entropy;
for(output.encodeBinary(7, 1 - !array[0], probabilityModel[0]); runEnd = position < length; ){
for(runLength = !array[position], distinctSymbols = calculateLog2(length - position) - 1;
runLength == !array[++position] && position < length; ){
runEnd++;
}
for(runLength = 0, entropy = runEnd; entropy >>= 1; ){
output.encodeBinary(runLength++, 0, probabilityModel[0]);
}
if(runLength < distinctSymbols) output.encodeBinary(runLength, 1, probabilityModel[0]);
if(runLength && runLength < 8){
for(; runLength; ){
output.encodeBinary(--runLength, runEnd >> runLength & 1, probabilityModel[1]);
}
}
}
}
function decodeGammaRLD(array, output, probabilityModel){
for(var position = 0, runEnd = 0, runLength, distinctSymbols, entropy = output.decodeBinary(7, probabilityModel[0]); runEnd < 256; entropy ^= 1){
for(runLength = 0, distinctSymbols = calculateLog2(256 - runEnd) - 1; runLength < distinctSymbols && !output.decodeBinary(runLength, probabilityModel[0]); ){
runLength++;
}
if(distinctSymbols = 1 << runLength, runLength < 8){
for(; runLength; ){
distinctSymbols += output.decodeBinary(--runLength, probabilityModel[1]) << runLength;
}
}
for(; distinctSymbols--; runEnd++){
if(entropy) array[position++] = runEnd;
}
}
return position;
}
function calculateDeltaSize(array, length, encoder){
var position = 0, nextByte, currentByte = array[0], delta, entropy, bitLength, bitCost = 8;
for(encoder && encoder(8, currentByte); ++position < length && length - position < (entropy = 255 - currentByte); currentByte = nextByte){
nextByte = array[position];
delta = ~currentByte + nextByte;
bitLength = calculateLog2(entropy);
entropy = (1 << bitLength) - entropy;
if(delta < entropy) bitLength--;
else delta += entropy;
encoder && encoder(bitLength, delta);
bitCost += bitLength;
}
return bitCost; // bit cost
}
function encodeDelta(array, length, output, probabilityModel, secondaryModel){
var position = 0, nextByte, currentByte = array[0], delta, entropy, bitLength;
for(output.encodeSequence(8, currentByte, probabilityModel); ++position < length && length - position < (entropy = 255 - currentByte); currentByte = nextByte){
nextByte = array[position];
delta = ~currentByte + nextByte;
bitLength = calculateLog2(entropy);
entropy = (1 << bitLength--) - entropy;
if(delta < entropy){
output.encodeSequence(bitLength, delta, secondaryModel);
} else {
delta += entropy;
output.encodeSequence(bitLength, delta >> 1, secondaryModel);
output.writeBits(1, delta & 1);
}
}
}
function decodeDelta(array, length, output, probabilityModel, secondaryModel){
for(var position = 1, nextByte, currentByte = array[0] = output.decodeSequence(8, probabilityModel), entropy, bitLength;
position < length && length - position < (entropy = 255 - currentByte);
array[position++] = currentByte -= ~nextByte){
bitLength = calculateLog2(entropy);
entropy = (1 << bitLength) - entropy;
nextByte = output.decodeSequence(bitLength - 1, secondaryModel);
if(nextByte < entropy){
/* nextByte is already correct */
} else {
nextByte += nextByte + output.readBits(1) - entropy;
}
}
for(; position < length; ){
array[position++] = ++currentByte;
}
return array;
}
/* Compression function
@inputArray: numeric array with elements 0-255
@contextBits: bit width of a symbol frequency (+7). The compression block is cut when the maximum is reached
@blockSize: maximum compression block width. bs==0 means 1<<24, bs<16 means 1<<bs+9, bs<1024 means bs<<10, otherwise the value itself
@literalBits: per-symbol statistics bits (0-7; e.g. when a frequency table holds a single symbol)
@orderNum: context = (orderNum: 0-3) + 1
@hashNum: bit count of the context. Actually hN(0-15)+12. If orderNum=0, hN is auto-corrected down to 8
@onComplete: callback for the last process.
onComplete(A,a,z)
@A: compressed array of input.
@a: input size
@z: compressed size
@onProgress: callback for progress
onProgress(a,z)
@a: current position
@z: last position
*/
function compress(inputArray, contextBits, blockSize, literalBits, orderNum, hashNum, onComplete, onProgress){
// Write plain bits to output
function writePlainBits(bitCount, value, carry){
for(; bitCount; ){
for(low += (range >>>= 1) * (value >>> --bitCount & 1); range < 16777216; low = low << 8 >>> 0, range *= 256){
if((carry = low > 0xffffffff) || 255 > low >>> 24){
for(output[outputPos++] = 255 & carry-- + carryByte, carryByte = low >>> 24; carryCount; carryCount--){
output[outputPos++] = 255 & carry;
}
} else {
++carryCount;
}
}
}
}
// Encode binary with model
function encodeBinary(symbolIndex, bit, probabilityModel){
var probability = probabilityModel[symbolIndex],
scaledRange = (range >>> 12) * probability;
if(bit){
range = scaledRange;
} else {
range -= scaledRange;
low += scaledRange;
}
// Update probability
for(probabilityModel[symbolIndex] -= (probability - (bit << 12) >> probabilityModel.up) + bit; range < 16777216; low = low << 8 >>> 0, range *= 256){
if((carry = low > 0xffffffff) || 255 > low >>> 24){
for(output[outputPos++] = 255 & carry-- + carryByte, carryByte = low >>> 24; carryCount; carryCount--){
output[outputPos++] = 255 & carry;
}
} else {
++carryCount;
}
}
}
// Encode binary sequence (reverse order)
function encodeBinarySequence(bitCount, value, probabilityModel){
for(var bit, context = 1; bitCount; context += context + bit){
encodeBinary(context, bit = value >> --bitCount & 1, probabilityModel);
}
}
// Encode binary sequence (forward order)
function encodeBinaryForward(bitCount, value, probabilityModel){
for(; bitCount; ){
encodeBinary(--bitCount, value >> bitCount & 1, probabilityModel);
}
}
// Encode count with binary tree
function encodeCountWithTree(count, maxCount, bitLength, probabilityModel){
maxCount = (1 << bitLength--) - maxCount;
if(count < maxCount){
encodeBinarySequence(bitLength, count, probabilityModel);
} else {
count += maxCount;
encodeBinarySequence(bitLength, count >> 1, probabilityModel);
writePlainBits(1, count & 1);
}
}
function createNewModel(frequencyTable, symbolArray, previousByte){
// Check previous model, collect symbols and counters
var byte = 0, count, distinctSymbols = 0, entropy = 0, bitLength, lastSymbol, alphabetSize = 0,
counters = frequencyTable.counters, frequencies = frequencyTable.subarray(258);
for(; byte < 256; bitLength >> 15 ^ !count || entropy++, frequencies[byte++] = count){
if(bitLength = frequencies[byte], count = frequencyTable[byte]){
if(distinctSymbols < count) distinctSymbols = count;
counters[alphabetSize] = count - 1;
symbolArray[alphabetSize++] = lastSymbol = byte;
}
}
frequencyTable[byte] = alphabetSize;
frequencyTable[257] = ++lastSymbol;
previousByte = literalBits < 2 ? literalBits && previousByte < 97 & 1 : previousByte >> 8 - literalBits;
// Write symbol table
if(alphabetSize < 2){
encodeBinary(0, 0, headerModel0);
encodeBinary(1, 0, headerModel0);
encodeBinarySequence(8, byte = symbolArray[0], symbolModels[previousByte]);
counters[lastSymbol] = counters[0] = frequencyTable[byte] = 0;
modelStack[stackPointer++] = frequencyTable;
} else {
bitLength = calculateLog2(--distinctSymbols);
count = 0;
if(!entropy){
encodeBinary(0, 1, headerModel0);
encodeBinarySequence(4, 0, headerModel5); // current is the same as previous
} else if(alphabetSize < 21 && calculateDeltaSize(symbolArray, alphabetSize) + 4 < calculateGammaRLESize(frequencyTable, byte)){
encodeBinary(0, 1, headerModel0);
encodeCountWithTree(alphabetSize - 1, 20, 5, headerModel5);
encodeDelta(symbolArray, alphabetSize, output, symbolModels[previousByte], symbolModel1); // save 2-20 symbols
} else {
encodeBinary(0, 0, headerModel0);
encodeBinary(1, 1, headerModel0);
encodeGammaRLE(frequencyTable, byte, output, gammaModel); // encode bit flags by RLE
}
if(bitLength < 5){
encodeBinary(2, 1, headerModel0);
encodeBinarySequence(2, bitLength - 1, headerModel1);
} else {
encodeBinary(2, 0, headerModel0);
encodeCountWithTree(bitLength - 5, contextBits - 4, 4, headerModel7);
}
// Write probabilities using fixed bit code or gamma code
if(bitLength < 3){
for(; count < alphabetSize; ){
encodeBinarySequence(bitLength, counters[count++], lengthModels[bitLength - 1]);
}
} else {
for(; count < alphabetSize; ){
if(byte = distinctSymbols = counters[count++]){
for(encodeBinary(entropy = 0, 0, lengthModel2[bitLength]); byte >>= 1; ){
encodeBinary(entropy++, 0, lengthModel3[bitLength]);
}
if(entropy + 1 < bitLength) encodeBinary(entropy, 1, lengthModel3[bitLength]);
if(entropy) encodeBinaryForward(entropy, distinctSymbols, lengthModel4[bitLength]);
} else {
encodeBinary(0, 1, lengthModel2[bitLength]);
}
}
}
// Compute cumulative frequencies
for(counters[0] = count = distinctSymbols = 0; count < lastSymbol; ){
counters[count + 1] = distinctSymbols += frequencyTable[count++];
}
}
}
onComplete = onComplete || function(output){ return output; };
onProgress = onProgress || function(){};
orderNum &= 3;
hashNum &= 15;
if(!orderNum || hashNum < 9) hashNum = 8;
var inputLength = inputArray.length,
inputPos = 0,
byte,
prefixLength = inputLength < orderNum + 1 ? inputLength : orderNum + 1,
hashMask = -1 >>> 8 * (3 - orderNum),
outputPos = 1,
previousByte,
carryByte = (contextBits &= 7) | (literalBits &= 7) << 3 | orderNum << 6,
carryCount = 0,
low = 0,
range = -1 >>> 0,
frequencyMax = 1 << (contextBits += 8),
output = [hashNum | prefixLength << 4],
frequencyTables = [],
modelStack = [],
stackPointer,
// Header models
headerModel0 = createProbabilityModel(3, 4),
headerModel1 = createProbabilityModel(16, 5),
headerModel5 = createProbabilityModel(16, 5),
headerModel7 = createProbabilityModel(8, 4),
lengthModels = [createProbabilityModel(2, 6), createProbabilityModel(4, 6), createProbabilityModel(8, 5)],
gammaModel = [createProbabilityModel(8, 5), createProbabilityModel(8, 5), createProbabilityModel(32, 4)],
lengthModel1 = [],
lengthModel2 = [],
lengthModel3 = [],
lengthModel4 = [],
symbolModels = [],
symbolModel1 = createProbabilityModel(256, 4);
output.writeBits = writePlainBits;
output.encodeBinary = encodeBinary;
output.encodeSequence = encodeBinarySequence;
blockSize &= -1 >>> 8;
blockSize = !blockSize ? 1 << 24 : blockSize < 16 ? 1 << blockSize + 9 : blockSize < 1024 ? blockSize << 10 : blockSize;
// Write first 0-4 bytes
for(hashNum = 20 - hashNum; inputPos < inputLength && inputPos < prefixLength; previousByte = previousByte << 8 | byte){
writePlainBits(8, byte = inputArray[inputPos++]);
}
// Initialize models
for(prefixLength = 15; prefixLength; lengthModel4[prefixLength] = createProbabilityModel(prefixLength--, 5)){
lengthModel1[prefixLength] = createProbabilityModel(prefixLength, 5);
lengthModel2[prefixLength] = createProbabilityModel(1, 5);
lengthModel3[prefixLength] = createProbabilityModel(prefixLength, 5);
}
for(; prefixLength < 23; ){
gammaModel[prefixLength + 3] = createProbabilityModel(++prefixLength, 4);
}
for(prefixLength = 1 << literalBits; prefixLength--; ){
symbolModels[prefixLength] = createProbabilityModel(256, literalBits < 2 ? 4 : 3);
}
function processBlock(){
if(inputPos >= inputLength){
encodeBinarySequence(5, 0, gammaModel[2]); // end marker
for(inputPos = 5; inputPos--; low = low << 8 >>> 0){
if((prefixLength = low > 0xffffffff) || 255 > low >>> 24){
for(output[outputPos++] = 255 & prefixLength-- + carryByte, carryByte = low >>> 24; carryCount; carryCount--){
output[outputPos++] = 255 & prefixLength;
}
} else {
++carryCount;
}
}
return onComplete(output, inputLength, outputPos);
}
// Decide block size
var blockEnd = inputPos + blockSize,
context = previousByte,
distinctSymbols, entropy, blockStart = inputPos,
counters, frequencyTable,
symbolArray = new Uint8Array(256);
if(blockEnd > inputLength) blockEnd = inputLength;
for(; blockStart < blockEnd; context = context << 8 | distinctSymbols){
frequencyTable = frequencyTables[distinctSymbols = orderNum > 1 ? calculateHash(context & hashMask, hashNum) : context & hashMask];
if(!frequencyTable){
frequencyTable = frequencyTables[distinctSymbols] = new Uint16Array(1028);
frequencyTable.counters = new Uint32Array(frequencyTable.buffer).subarray(257);
}
if(++frequencyTable[distinctSymbols = inputArray[blockStart++]] >= frequencyMax) break;
}
for(prefixLength = stackPointer = 0, blockEnd = blockStart - inputPos; blockEnd >> ++prefixLength; );
encodeBinarySequence(5, prefixLength, gammaModel[2]);
encodeBinaryForward(--prefixLength, blockEnd, gammaModel[3 + prefixLength]);
// Main encoding loop
for(onProgress(inputPos, inputLength);;){
frequencyTable = frequencyTables[orderNum > 1 ? calculateHash(previousByte & hashMask, hashNum) : previousByte & hashMask];
if(!frequencyTable[256]) createNewModel(frequencyTable, symbolArray, 255 & previousByte);
counters = frequencyTable.counters;
byte = inputArray[inputPos++];
previousByte = previousByte << 8 | byte;
distinctSymbols = counters[entropy = frequencyTable[257]];
if(frequencyTable[256] > 1){ // model has multiple symbols
// Encode using arithmetic coding
range = range / distinctSymbols >>> 0;
low += range * counters[byte];
for(range *= frequencyTable[byte]; range < 16777216; low = low << 8 >>> 0, range *= 256){
if((prefixLength = low > 0xffffffff) || 255 > low >>> 24){
for(output[outputPos++] = 255 & prefixLength-- + carryByte, carryByte = low >>> 24; carryCount; carryCount--){
output[outputPos++] = 255 & prefixLength;
}
} else {
++carryCount;
}
}
if(!--frequencyTable[byte + 258]){
frequencyTable[byte + 258] = 32768;
--frequencyTable[256]; // exclude symbol
// Update model by counters
for(prefixLength = distinctSymbols = 0; prefixLength < entropy; ){
counters[prefixLength + 1] = distinctSymbols += frequencyTable[prefixLength] = frequencyTable[prefixLength + 258] & 65535 >> (frequencyTable[prefixLength++] < 32768);
}
}
} else { // only 1 symbol, no need to encode
if(distinctSymbols && !--frequencyTable[byte]){
frequencyTable[byte + 258] = 32768;
--frequencyTable[256];
}
if(inputPos >= blockStart) break;
}
}
if(inputPos < inputLength){
for(; stackPointer; ){
modelStack[--stackPointer][256] = 0; // reset deterministic model
}
}
setTimeout(processBlock, 0);
}
return setTimeout(processBlock, 0);
}
/* Decompression function
@inputArray: numeric array with elements 0-255
@onComplete: callback for the last process.
onComplete(A,a,z)
@A: decompressed array of input
@a: input size
@z: decompressed size
@onProgress: callback for progress
onProgress(a,z)
@a: current position
@z: last position
*/
function decompress(inputArray, onComplete, onProgress){
// Read plain bits from input
function readPlainBits(bitCount, result, bit){
for(result = 0; bitCount--; result += result - bit){
for(range >>>= 1, bit = low - range >>> 31, low -= range & --bit; range < 16777216; range *= 256){
low = (low << 8 | inputArray[inputPos++]) >>> 0;
}
}
return result;
}
// Decode binary with model
function decodeBinary(symbolIndex, probabilityModel){
var probability = probabilityModel[symbolIndex],
scaledRange = (range >>> 12) * probability,
bit = 1;
if(low < scaledRange){
range = scaledRange;
} else {
range -= scaledRange;
low -= scaledRange;
bit = 0;
}
// Update probability
for(probabilityModel[symbolIndex] -= (probability - (bit << 12) >> probabilityModel.up) + bit; range < 16777216; range *= 256){
low = (low << 8 | inputArray[inputPos++]) >>> 0;
}
return bit;
}
// Decode binary sequence (reverse order)
function decodeBinarySequence(bitCount, probabilityModel, context, totalBits){
for(context = 1, totalBits = bitCount; bitCount--; ){
context += context + decodeBinary(context, probabilityModel);
}
return context ^ 1 << totalBits;
}
// Decode binary sequence (forward order)
function decodeBinaryForward(bitCount, probabilityModel, result){
for(result = 0; bitCount--; ){
result += decodeBinary(bitCount, probabilityModel) << bitCount;
}
return result;
}
// Decode count with binary tree
function decodeCountWithTree(probabilityModel, bitLength, maxCount, count){
maxCount = (1 << bitLength) - maxCount;
count = decodeBinarySequence(bitLength - 1, probabilityModel);
if(count >= maxCount){
count += count + readPlainBits(1) - maxCount;
}
return count;
}
function decodeModel(frequencyTable, symbolArray, previousByte){
var byte = decodeBinary(0, headerModel0) || 2 + decodeBinary(1, headerModel0),
count, distinctSymbols, entropy, bitLength, alphabetSize = 0,
counters = frequencyTable.counters, frequencies = frequencyTable.subarray(258);
previousByte = literalBits < 2 ? literalBits && previousByte < 97 & 1 : previousByte >> 8 - literalBits;
if(byte == 2){
frequencyTable[entropy = decodeBinarySequence(8, symbolModels[previousByte])] = 0;
alphabetSize++;
entropy++;
modelStack[stackPointer++] = frequencyTable;
} else {
if(byte < 2){
if(count = decodeCountWithTree(headerModel5, 5, 20)){
decodeDelta(symbolArray, alphabetSize = ++count, output, symbolModels[previousByte], symbolModel1);
} else {
for(; count < 256; count++){
if(frequencies[count] >> 15) symbolArray[alphabetSize++] = count;
}
}
} else if(!(alphabetSize = decodeGammaRLD(symbolArray, output, gammaModel))){
return 1;
}
bitLength = decodeBinary(2, headerModel0) ? decodeBinarySequence(2, headerModel1) + 1 : decodeCountWithTree(headerModel7, 4, contextBits - 4) + 5;
count = 0;
// Read probabilities
if(bitLength < 3){
for(; count < alphabetSize; ){
frequencyTable[symbolArray[count++]] = 1 + decodeBinarySequence(bitLength, lengthModels[bitLength - 1]);
}
} else {
for(; count < alphabetSize; frequencyTable[symbolArray[count++]] = distinctSymbols){
if(distinctSymbols = decodeBinary(0, lengthModel2[bitLength]), !distinctSymbols){
for(; distinctSymbols + 1 < bitLength && !decodeBinary(distinctSymbols, lengthModel3[bitLength]); ){
distinctSymbols++;
}
distinctSymbols = 1 << distinctSymbols | (distinctSymbols && decodeBinaryForward(distinctSymbols, lengthModel4[bitLength]));
distinctSymbols++;
}
}
}
}
for(counters[0] = count = distinctSymbols = 0; count < 256; byte && (entropy = count)){
counters[count + 1] = distinctSymbols += byte = frequencies[count] = frequencyTable[count++];
}
frequencyTable[count] = alphabetSize;
frequencyTable[257] = entropy; // alphabet size and last symbol+1
}
var inputPos = 2,
byte = inputArray[1],
count = inputArray[0],
outputPos = 5,
previousByte = 0,
inputLength = inputArray.length,
contextBits = 8 + (byte & 7),
literalBits = byte >> 3 & 7,
orderNum = byte >> 6 & 3,
hashNum = 20 - (count & 15),
low, range = -1 >>> 0,
output = [],
// Static models
frequencyTables = [],
modelStack = [],
stackPointer,
// Header models
headerModel0 = createProbabilityModel(3, 4),
headerModel1 = createProbabilityModel(16, 5),
headerModel5 = createProbabilityModel(16, 5),
headerModel7 = createProbabilityModel(8, 4),
lengthModels = [createProbabilityModel(2, 6), createProbabilityModel(4, 6), createProbabilityModel(8, 5)],
gammaModel = [createProbabilityModel(8, 5), createProbabilityModel(8, 5), createProbabilityModel(32, 4)],
lengthModel1 = [],
lengthModel2 = [],
lengthModel3 = [],
lengthModel4 = [],
symbolModels = [],
symbolModel1 = createProbabilityModel(256, 4);
onComplete = onComplete || function(output){ return output; };
onProgress = onProgress || function(){};
output.readBits = readPlainBits;
output.decodeBinary = decodeBinary;
output.decodeSequence = decodeBinarySequence;
// Initialize decoder state
for(; --outputPos; ){
low = (low << 8 | inputArray[inputPos++]) >>> 0;
}
for(count >>= 4; outputPos < count; previousByte = previousByte << 8 | byte){
output[outputPos++] = byte = readPlainBits(8);
}
// Initialize models
for(count = 15; count; lengthModel4[count] = createProbabilityModel(count--, 5)){
lengthModel1[count] = createProbabilityModel(count, 5);
lengthModel2[count] = createProbabilityModel(1, 5);
lengthModel3[count] = createProbabilityModel(count, 5);
}
for(; count < 23; ){
gammaModel[count + 3] = createProbabilityModel(++count, 4);
}
for(count = 1 << literalBits; count--; ){
symbolModels[count] = createProbabilityModel(256, literalBits < 2 ? 4 : 3);
}
function processBlock(){
var entropy, blockSize = decodeBinarySequence(5, gammaModel[2]),
hashMask = -1 >>> 8 * (3 - orderNum),
counters, frequencyTable,
symbolArray = new Uint8Array(256);
if(!blockSize) return onComplete(output, inputLength, outputPos);
blockSize = 1 << --blockSize | decodeBinaryForward(blockSize, gammaModel[3 + blockSize]);
stackPointer = 0;
for(onProgress(inputPos, inputLength); blockSize--; output[outputPos++] = byte){
frequencyTable = frequencyTables[entropy = orderNum > 1 ? calculateHash(previousByte & hashMask, hashNum) : previousByte & hashMask];
if(!frequencyTable){
frequencyTable = frequencyTables[entropy] = new Uint16Array(1028);
frequencyTable.counters = new Uint32Array(frequencyTable.buffer).subarray(257);
}
if(!frequencyTable[256]) decodeModel(frequencyTable, symbolArray, 255 & previousByte);
counters = frequencyTable.counters;
var lastSymbol = frequencyTable[257],
byte = 0,
count = lastSymbol - 1,
distinctSymbols = counters[lastSymbol];
if(frequencyTable[256] > 1){
range = range / distinctSymbols >>> 0; // decode
for(entropy = low / range >>> 0; byte < count; counters[distinctSymbols + 1] > entropy ? count = distinctSymbols : byte = ++distinctSymbols){
distinctSymbols = byte + count >> 1; // binary search
}
low -= range * counters[byte];
for(range *= frequencyTable[byte]; range < 16777216; range *= 256){
low = (low << 8 | inputArray[inputPos++]) >>> 0;
}
if(!--frequencyTable[byte + 258]){ // reduce counter
frequencyTable[byte + 258] = 32768;
--frequencyTable[256];
// Update model by counters
for(count = distinctSymbols = 0; count < lastSymbol; ){
counters[count + 1] = distinctSymbols += frequencyTable[count] = frequencyTable[count + 258] & 65535 >> (frequencyTable[count++] < 32768);
}
}
} else if(distinctSymbols){ // symbol count reached 1
for(entropy = low / range >>> 0; byte < count; counters[distinctSymbols + 1] > entropy ? count = distinctSymbols : byte = ++distinctSymbols){
distinctSymbols = byte + count >> 1;
}
if(!--frequencyTable[byte]){
frequencyTable[byte + 258] = 32768;
--frequencyTable[256];
}
} else {
byte = count; // model started with 1 symbol
}
previousByte = previousByte << 8 | byte;
}
for(; stackPointer; ){
modelStack[--stackPointer][256] = 0; // reset deterministic model
}
setTimeout(processBlock, 0);
}
return setTimeout(processBlock, 0);
}
let src=Uint8Array.from("Peter piper picked a peck of pickled peppers. A peck of pickled peppers Peter Piper picked. If Peter Piper picked a peck of pickled peppers, Where's the peck of pickled peppers Peter Piper picked",a=>a.charCodeAt()),
rate=(pos,last)=>console.log((pos/last*1e4|0)/100,"%");
compress(src, 7, 0, 1, 0, 8, (encData,inL,outL)=>{
decompress(encData,(decData,inL,outL)=>{
document.write("comp size:",inL," dec size:",outL,"<hr>",encData,"<hr>",decData)
},rate)
},rate)
setTimeout is used so the progress can be displayed; if you don't care about that, replace it with a direct processBlock() call. That also makes the return values of compress() and decompress() easier to work with.
Experiment on CodePen