Loading Large JSON Files in the Browser

Posted at 2022-09-03

I want to read a large JSON file stored locally from within the browser.

Background

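Browser string variables apparently have a practical upper size limit.
(http://var.blog.jp/archives/52543754.html)
Trying to parse a 1 GB JSON file with JSON.parse throws an error.
(In fact, the error occurs one step earlier, at the point where the contents are put into a string variable.)
JSON exported from Google services (location history and so on) easily runs to several hundred megabytes, so let's find a way around this.
The samples below assume the JSON layout commonly seen in Google service exports.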

Note: the amount of installed memory still imposes a limit, of course. The goal here is only to get past the limit imposed by the browser itself.
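For contrast, here is a minimal sketch of the straightforward approach that the rest of this post works around. It turns the whole file into one string and then parses it, which is fine for small files; per the notes above, it is the single huge string that becomes the problem for very large ones. The function name readJsonNaively is made up for illustration.

```javascript
// Naive approach: load the whole file into ONE string, then parse it.
// Works for small files. For very large files, building that single
// string is the step that is expected to fail.
async function readJsonNaively(file) {
  const text = await file.text(); // Blob.prototype.text(): whole file as one string
  return JSON.parse(text);
}

// Hypothetical usage in a page with <input type="file" id="f">:
// const data = await readJsonNaively(document.getElementById('f').files[0]);
```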

Read a bit at a time and push into an array

In Google's JSON, each line contains at most one '{' and at most one '}'.
So, as a first step, read the JSON file one line at a time.
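To make that assumption concrete, here is a small sketch of the assumed layout together with a quick check that a set of lines actually fits it. The sample values are invented, and the helper names countChar and fitsOneBracePerLine are hypothetical, not part of the original code.

```javascript
// Illustrative sample of the assumed layout (values are made up).
const sampleLines = [
  '{',
  '  "locations": [{',
  '    "latitudeE7": 356800000,',
  '    "longitudeE7": 1397700000',
  '  }, {',
  '    "latitudeE7": 356900000,',
  '    "longitudeE7": 1397800000',
  '  }]',
  '}',
];

// Count how many times the single character ch appears in s.
function countChar(s, ch) {
  return s.split(ch).length - 1;
}

// True when every line has at most one '{' and at most one '}'.
function fitsOneBracePerLine(lines) {
  return lines.every(
    (line) => countChar(line, '{') <= 1 && countChar(line, '}') <= 1
  );
}
```

Running a check like this on the first few thousand lines of an export is a cheap way to find out whether the line-based approach applies before processing the whole file.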

Extend the FileReader class to support byte-wise reads

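// FileReader whose read methods return a Promise that resolves with the result.
class LargeFileReader extends FileReader{
  constructor(){
    super();
  }
  // ctx: name of the FileReader method to run, e.g. 'readAsArrayBuffer'
  readAs(blob, ctx){
    return new Promise((res, rej) => { 
      super.addEventListener('load', ({target}) => res(target.result));
      super.addEventListener('error', ({target}) => rej(target.error));
      // Call the original FileReader method via super;
      // this[ctx] would call the overrides below and recurse forever.
      super[ctx](blob);
    });
  }
  readAsArrayBuffer(blob){
    return this.readAs(blob, 'readAsArrayBuffer');
  }
  readAsText(blob){
    return this.readAs(blob, 'readAsText');
  }
  readAsDataURL(blob){
    return this.readAs(blob, 'readAsDataURL');
  }
}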
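As a side note, Blob.prototype.arrayBuffer() already returns a Promise, so a byte-range read can also be sketched without subclassing FileReader. This is an alternative, not the class above; the helper name readBytes is made up for illustration.

```javascript
// Read the bytes in [start, end) of a Blob/File as a Uint8Array.
// Blob.slice() only creates a view on the file; the actual read
// happens in arrayBuffer().
async function readBytes(blob, start, end) {
  const buffer = await blob.slice(start, end).arrayBuffer();
  return new Uint8Array(buffer);
}

// Hypothetical usage: read the first 1 MiB of a user-selected file.
// const head = await readBytes(file, 0, 1024 * 1024);
```

Either way, only the requested slice is loaded into memory, which is what makes chunked reading of a huge file possible.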

Read the huge file, splitting it on newlines

Both CR+LF and LF line endings are supported.

  async function asyncReadLargeText(f, encode){
    const result = [],
      rd = new LargeFileReader(),
      td = new TextDecoder(encode),
      re = /\r?\n/;
    //Join two byte arrays into a new Uint8Array.
    function concatBytes(a, b){
      const joined = new Uint8Array(a.length + b.length);
      joined.set(a);
      joined.set(b, a.length);
      return joined;
    }
    //Return String array.
    //Decodes the carried-over bytes plus the new bytes and splits the text into lines.
    function blob2arrayText(remain, b = new Uint8Array()){
      return td.decode(concatBytes(remain, b)).split(re);
    }
    //Append the non-empty lines to the result.
    function pushLines(lines){
      for(let i=0, len=lines.length; i < len; i++){
        if(lines[i] !== '') result.push(lines[i]);
      }
    }
    //Reading text file per specified byte size (500 MiB here).
    const chunkSize = 1024*1024*500;
    //Stores the data after the last newline for each read unit.
    let remainBlob = new Uint8Array(),
      offset = 0;
    while(offset < f.size){
      //Create a blob of the specified byte size.
      const slice = f.slice(offset, offset + chunkSize);
      offset += chunkSize;
      //A read error rejects and propagates to the caller instead of silently dropping the chunk.
      const view = new Uint8Array(await rd.readAsArrayBuffer(slice));
      //Search for the last newline from the back of the byte sequence.
      //Only \n (10) is searched for, because CR+LF also ends in \n.
      const last = view.lastIndexOf(10);
      if(last === -1){
        //No newline in this chunk: carry the whole chunk over to the next loop.
        remainBlob = concatBytes(remainBlob, view);
        continue;
      }
      //Cut out the part up to and including the last newline.
      pushLines(blob2arrayText(remainBlob, view.subarray(0, last + 1)));
      //Save the part after the last newline and join it on the next loop.
      remainBlob = view.slice(last + 1);
    }
    //Handle the data left after the final newline (a file with no trailing newline).
    if(remainBlob.length > 0){
      pushLines(blob2arrayText(remainBlob));
    }
    return result;
  }//end of AsyncReadLargeText
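As an alternative sketch (not the method used above), the `stream` option of TextDecoder.decode() can hold back a multi-byte character that straddles a chunk boundary, so the line splitting can be done on decoded text instead of raw bytes. It reads with Blob.prototype.arrayBuffer() rather than the FileReader subclass. The function name readLines and its default chunk size are assumptions made for this example.

```javascript
// Read a Blob/File chunk by chunk and return its non-empty lines.
async function readLines(blob, chunkSize = 1024 * 1024, encoding = 'utf-8') {
  const decoder = new TextDecoder(encoding);
  const lines = [];
  let carry = ''; // text after the last newline seen so far

  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    const buf = await blob.slice(offset, offset + chunkSize).arrayBuffer();
    // stream: true keeps an incomplete multi-byte sequence buffered
    // inside the decoder until the next call completes it.
    const text = carry + decoder.decode(buf, { stream: true });
    const parts = text.split(/\r?\n/);
    carry = parts.pop(); // last part may be an unfinished line
    for (const part of parts) {
      if (part !== '') lines.push(part);
    }
  }
  carry += decoder.decode(); // flush anything still buffered
  if (carry !== '') lines.push(carry);
  return lines;
}
```

Because the unfinished line is carried over as text, a CR+LF pair that is split across two chunks is still recognized once the next chunk arrives.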

Next, reassemble the array of lines into individual JSON objects.
The code counts occurrences of { and }, and parses each chunk of JSON as soon as it is complete.

readjsonfile.js

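function readjsonfile(txtArr){
    //Specify the root key of the JSON file.
    //This example uses Google's location history.
    const key = 'locations';
    const arr = {[key]:[]};
    //recode: text of the record being assembled, k: current {} nesting depth
    let recode = '{',
      k = 1;
    //let curl=0;//for the case where one line has several {}; commented out this time
    //Start at i=2: the first two lines (the root '{' and the root key line) are skipped.
    for(let i=2, len=txtArr.length; i<len; i++){
      const s = txtArr[i];
      //const part = s.split('}');
      //Assumes at most one '{' and one '}' per line.
      if(s.indexOf('}') !== -1) k--;
      if(k === 0){
        recode = recode + '}';
        try{
          //Assemble and parse the JSON for each array element.
          const js = JSON.parse(recode);
          if(Object.keys(js).length > 0) arr[key].push(js);
        }catch(e){
          //Skip a record that fails to parse, then fall through so the state below is still reset.
          console.log(e);
        }
        recode = '{';
        k++;
        continue;
      }
      if(s.indexOf('{') !== -1) k++;
      //curl = curl - (part.length - 1);
      //if(curl > 0) recode = recode + s;
      recode = recode + s;
      //curl = curl + (s.split('{').length - 1);
    }
    return arr;
  }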

This produces an array that holds the elements under the root key, one per entry.
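If a file does not satisfy the one-brace-per-line assumption, one possible variation is to track the nesting depth character by character instead of line by line. The following is only a sketch under that idea, not the code above: it also ignores braces that appear inside string values. The function name parseRecords is hypothetical, and it assumes the same overall shape (a root object whose single key holds an array of objects).

```javascript
// Split the elements of the root array into separately parsed objects.
function parseRecords(lines) {
  const records = [];
  let depth = 0;        // current nesting depth of { }
  let inString = false; // true while inside a JSON string literal
  let escaped = false;  // true right after a backslash inside a string
  let buf = '';         // text of the record being assembled

  for (const line of lines) {
    for (const ch of line) {
      if (inString) {
        if (depth >= 2) buf += ch;
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') inString = true;
      else if (ch === '{') depth++;
      else if (ch === '}') depth--;

      // Depth 1 is the root object; depth 2 and deeper is a record.
      const closesRecord = depth === 1 && ch === '}';
      if (depth >= 2 || closesRecord) buf += ch;
      if (closesRecord) {
        records.push(JSON.parse(buf));
        buf = '';
      }
    }
  }
  return records;
}
```

Appending one character at a time is slow for files of several hundred megabytes, so this is meant to illustrate the depth-tracking idea rather than to serve as a tuned implementation.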
