Configuring the Fragment result cache in watsonx.data 2.0.0 (Caching series, part 3)

Last updated at 2024-11-22 / Posted at 2024-07-03

Introduction

In watsonx.data, configuring caches can improve the performance of queries that run on the Presto (Java) engine.
Throughout this post, "Presto" always means Presto (Java).

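Reference
RaptorX: Building a 10X Faster Presto

According to this article, Presto caching is hierarchical: configuring five different kinds of cache can greatly reduce the time needed to fetch metadata and data from remote storage.
The Presto included in watsonx.data 2.0.0 also supports these five caches, and all of them can be configured. Some cache names and available properties differ between the reference and watsonx.data, but the purpose and characteristics are the same. The five caches are as follows.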

Cache type	Cache location	Storage medium	Characteristics
Metastore versioned cache	Coordinator	Memory	Cache of metadata
File list cache	Coordinator	Memory	Cache of file listings
File and stripe footer cache	Worker	Memory	Cache of file descriptors, stripes, and footers
Data cache	Worker	Disk	Cache of data that has been read
Fragment result cache	Worker	Disk	Cache of partially processed (fragment) results

The figure below shows the cache architecture.
[Figure: Presto cache architecture]
See the reference for details.
Of the five caches, the Metastore versioned cache and the File and stripe footer cache are enabled by default in watsonx.data 2.0.0.

Of the five caches, this post covers how to configure the Fragment result cache.
The Fragment result cache is not enabled by default, so it has to be configured by following the steps below.

For the File list cache, see the following article.
Configuring the File list cache in watsonx.data 2.0.0 (Caching series, part 1)

For the Data cache, see the following article.
Configuring the Data cache in watsonx.data 2.0.0 (Caching series, part 2)

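About the Fragment result cache

This section describes the Fragment result cache in a little more detail.
To avoid repeating the same work across queries, Presto workers can cache the intermediate results of partially processed data on local storage. When partitioned data is processed on separate workers, each worker caches its own results as intermediate results. When a later query can use results already stored in the cache, only the data whose results are not in the cache has to be processed, which improves query performance.

The steps for configuring the Fragment result cache follow.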

Configuring the Fragment result cache

Reference (watsonx.data 2.0.0 documentation)
Enhancing the query performance through caching

The Fragment result cache is configured by adding its properties to the wxdengine custom resource. The following properties configure the Fragment result cache.

Property	Meaning
fragment_result_cache_enabled	Whether to enable the Fragment result cache
fragment_result_cache_max_cached_entries (optional)	Maximum number of cache entries
fragment_result_cache_ttl (optional)	Time until an entry is removed from the cache. The default is 36h.
fragment_result_cache_partition_statistics_based_optimization_enabled (optional)	Whether to use the statistics-based optimizer. The default is true.
fragmentCacheStorageClass	Storage class in which the Fragment result cache is created
fragmentCacheStorageSize	Size of the storage used for the cache

This post uses newly created Persistent Volumes (PVs) as the storage for the Fragment result cache. The watsonx.data 2.0.x documentation treats this as optional, but creating new PVs lets the Presto pods use a mounted local volume as the cache, so better cache read and write performance can be expected.
Using the storage class of an existing container storage such as ODF makes the procedure shorter, but cache reads and writes then go over the network, which can add latency.

1. Log in to the OCP cluster

Log in to the OCP cluster with the "oc login" command.

2. Switch to the watsonx.data project

Switch the working project to the one where watsonx.data is installed. In this environment, watsonx.data is installed in the namespace zen.

$ export PROJECT_CPD_INST_OPERANDS=zen
$ oc project ${PROJECT_CPD_INST_OPERANDS}

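3. Create the required directories on the worker nodes

Using debug sessions, create the required directories on every OpenShift worker node.

① Identify the worker nodes

Run "oc get node" to list the OCP nodes, and identify the worker nodes from the NAME and ROLES columns. Storage nodes are excluded.
In this environment, worker-1 through worker-5 are the worker nodes.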

$ oc get node
NAME        STATUS   ROLES                  AGE   VERSION
master-1    Ready    control-plane,master   38d   v1.26.13+8f85140
master-2    Ready    control-plane,master   38d   v1.26.13+8f85140
master-3    Ready    control-plane,master   38d   v1.26.13+8f85140
storage-1   Ready    worker                 38d   v1.26.13+8f85140
storage-2   Ready    worker                 38d   v1.26.13+8f85140
storage-3   Ready    worker                 38d   v1.26.13+8f85140
worker-1    Ready    worker                 38d   v1.26.13+8f85140
worker-2    Ready    worker                 38d   v1.26.13+8f85140
worker-3    Ready    worker                 38d   v1.26.13+8f85140
worker-4    Ready    worker                 38d   v1.26.13+8f85140
worker-5    Ready    worker                 38d   v1.26.13+8f85140

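② Create the required directories on the worker nodes

On every worker node, create the directories that the Fragment result cache will use.
The file system that holds these directories needs enough free space for the Fragment result cache. In this environment /var has enough space, so the subdirectories are created under /var.
One PV is needed for each Presto pod in which the Fragment result cache is configured. The Fragment result cache is created in both the coordinator and the worker pods. This environment runs one coordinator pod and five worker pods, so six PVs are needed.
This part is a little involved: each PV is created by specifying a particular directory (path) on the worker nodes. There is no way to control which PV a pod picks for the Fragment result cache when it starts, so the six PV directories are created in advance on every worker node, which makes any PV usable from any node.
The procedure is as follows.
First, run "oc debug node/<worker node name>" to enter the worker node. A warning may be displayed; ignore it. After a short wait a prompt appears; run "chroot /host".
Then create the six directories for the PVs. After confirming that the directories exist, run "exit" twice to leave the node debug session.
Repeat this on every worker node.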

$ oc debug node/worker-1
Warning: would violate PodSecurity "baseline:v1.24": host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/worker-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.252.14
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# mkdir -p /var/fragmentCache/pv1
sh-5.1# mkdir -p /var/fragmentCache/pv2
sh-5.1# mkdir -p /var/fragmentCache/pv3
sh-5.1# mkdir -p /var/fragmentCache/pv4
sh-5.1# mkdir -p /var/fragmentCache/pv5
sh-5.1# mkdir -p /var/fragmentCache/pv6
sh-5.1# ls /var/fragmentCache
pv1  pv2  pv3  pv4  pv5  pv6
sh-5.1# exit
exit
sh-4.4# exit
exit
Removing debug pod ...
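
Repeating the debug session by hand on five nodes is tedious, so the commands could be generated with a small loop. The following is only a rough sketch: it assumes that "oc debug node/<name> -- <command>" can run a command non-interactively in this environment, and the helper names pv_dirs and mkdir_command are made up for this illustration. The script only prints the commands so that they can be reviewed before anything is run.

```shell
#!/bin/sh
# Print the PV directory paths pv1..pvN under a base directory (hypothetical helper).
pv_dirs() {
  base=$1
  count=$2
  i=1
  while [ "$i" -le "$count" ]; do
    printf '%s/pv%d\n' "$base" "$i"
    i=$((i + 1))
  done
}

# Print (do not run) the command that would create the six directories on one node.
# Assumption: "oc debug node/<name> -- chroot /host ..." works non-interactively here.
mkdir_command() {
  node=$1
  dirs=$(pv_dirs /var/fragmentCache 6 | tr '\n' ' ')
  printf 'oc debug node/%s -- chroot /host mkdir -p %s\n' "$node" "${dirs% }"
}

# Node names are the ones from this article's environment.
for node in worker-1 worker-2 worker-3 worker-4 worker-5; do
  mkdir_command "$node"
done
```

After checking the printed lines, they can be pasted into the terminal one at a time.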

4. Create the YAML files that define the PVs

Create the YAML files used to create the PVs.
・For storageClassName, specify any name.
・For path, specify a directory created on the worker nodes in step 3.
・For values under nodeAffinity, list the worker node names.
Create six YAML files that differ only in name and path.
For example, fragment-cache-pv2.yaml specifies "name: fragment-cache-storage-pv2" and "path: /var/fragmentCache/pv2".

Example:

$ cat fragment-cache-pv1.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: fragment-cache-storage-pv1
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: fragment-cache-storage
  local:
    path: /var/fragmentCache/pv1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-1
          - worker-2
          - worker-3
          - worker-4
          - worker-5
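
Because the six files differ only in name and path, they could also be generated instead of edited by hand. The sketch below is one possible way to do that, not the method used in this article: write_pv_yaml is a hypothetical helper, the files are written to a temporary directory, and the node names, size, and storage class are copied from this article's environment.

```shell
#!/bin/sh
# Write one PV manifest (hypothetical helper). Arguments: PV index, output directory.
write_pv_yaml() {
  idx=$1
  outdir=$2
  cat > "$outdir/fragment-cache-pv$idx.yaml" <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: fragment-cache-storage-pv$idx
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: fragment-cache-storage
  local:
    path: /var/fragmentCache/pv$idx
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-1
          - worker-2
          - worker-3
          - worker-4
          - worker-5
EOF
}

# Generate fragment-cache-pv1.yaml .. fragment-cache-pv6.yaml in a temporary directory.
outdir=$(mktemp -d)
i=1
while [ "$i" -le 6 ]; do
  write_pv_yaml "$i" "$outdir"
  i=$((i + 1))
done
ls "$outdir"
```

The generated files can be reviewed and then applied in the next step in the same way as hand-written ones.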

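5. Create the PVs

When there are multiple Presto pods, create as many YAML files, each with a different name value, as there are Presto pods (coordinator + workers), and create the PVs from them. This environment has one coordinator pod and five worker pods, so six PVs are created.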

$ oc apply -f fragment-cache-pv1.yaml
persistentvolume/fragment-cache-storage-pv1 created
..........
$ oc apply -f fragment-cache-pv6.yaml
persistentvolume/fragment-cache-storage-pv6 created

6. Check the engine ID of the Presto engine on which the Fragment result cache will be configured

$ oc get wxdengine -o custom-columns='DISPLAY NAME:spec.engineDisplayName,ENGINE ID:metadata.labels.engineName'
DISPLAY NAME   ENGINE ID
presto-01      presto-01

7. Delete all StatefulSets of the Presto engine on which the Fragment result cache will be configured

$ oc delete statefulset -l engineName=presto-01
statefulset.apps "ibm-lh-lakehouse-presto-01-coordinator-blue" deleted
statefulset.apps "ibm-lh-lakehouse-presto-01-presto-worker" deleted
statefulset.apps "ibm-lh-lakehouse-presto-01-single-blue" deleted

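8. Patch the wxdengine custom resource to configure the Fragment result cache

The "oc patch" command has to be run immediately after the StatefulSets are deleted, so it is a good idea to prepare it in advance so that it can be copied and pasted.

In this post, the Fragment result cache is created with the following property settings.

Property	Value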
fragment_result_cache_enabled	true
fragment_result_cache_max_cached_entries	100000
fragment_result_cache_ttl	36h
fragment_result_cache_partition_statistics_based_optimization_enabled	true
fragmentCacheStorageClass	fragment-cache-storage
fragmentCacheStorageSize	10Gi

Configure the Fragment result cache with the "oc patch" command

$ oc patch wxdengine/lakehouse-presto-01 --type=merge -p '{ "spec":  { "fragment_result_cache_enabled": "true", "fragment_result_cache_max_cached_entries": "100000", "fragment_result_cache_ttl": "36h","fragment_result_cache_partition_statistics_based_optimization_enabled": "true", "fragmentCacheStorageClass": "fragment-cache-storage", "fragmentCacheStorageSize": "10Gi" } }'
wxdengine.watsonxdata.ibm.com/lakehouse-presto-01 patched
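
Since the patch has to be ready to paste right after the StatefulSets are deleted, it can help to assemble it ahead of time. The following sketch builds the same merge-patch body as the command above from a few arguments and prints the full command without running it. build_patch is a hypothetical helper introduced only for this illustration, and the values are the ones used in the command above.

```shell
#!/bin/sh
# Build the merge-patch body (hypothetical helper).
# Arguments: max cached entries, storage class, storage size.
build_patch() {
  max_entries=$1
  storage_class=$2
  storage_size=$3
  cat <<EOF
{ "spec": { "fragment_result_cache_enabled": "true", "fragment_result_cache_max_cached_entries": "$max_entries", "fragment_result_cache_ttl": "36h", "fragment_result_cache_partition_statistics_based_optimization_enabled": "true", "fragmentCacheStorageClass": "$storage_class", "fragmentCacheStorageSize": "$storage_size" } }
EOF
}

patch=$(build_patch 100000 fragment-cache-storage 10Gi)

# Print the command for review; it is not executed here.
printf "oc patch wxdengine/lakehouse-presto-01 --type=merge -p '%s'\n" "$patch"
```

Keeping the printed line in an editor makes it easy to paste immediately after step 7.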

After a while, the Presto StatefulSets and pods restart. Check them.

$ oc get statefulset | grep presto
ibm-lh-lakehouse-presto-01-coordinator-blue   1/1     2m47s
ibm-lh-lakehouse-presto-01-presto-worker      5/5     2m46s
ibm-lh-lakehouse-presto-01-single-blue        0/0     2m53s

$ oc get pod | grep presto
ibm-lh-lakehouse-presto-01-coordinator-blue-0    1/1     Running     0    2m55s
ibm-lh-lakehouse-presto-01-presto-worker-0       1/1     Running     0    2m55s
ibm-lh-lakehouse-presto-01-presto-worker-1       1/1     Running     0    2m55s
ibm-lh-lakehouse-presto-01-presto-worker-2       1/1     Running     0    2m55s
ibm-lh-lakehouse-presto-01-presto-worker-3       1/1     Running     0    2m55s
ibm-lh-lakehouse-presto-01-presto-worker-4       1/1     Running     0    2m55s

9. Confirm that the /mnt/flash/fragment directory has been created in the pods

$ oc exec -it ibm-lh-lakehouse-presto-01-presto-worker-0 -- bash
bash-4.4$ ls /mnt/flash
fragment

The /mnt/flash/fragment directory is also created in the coordinator pod. To be safe, confirm that the directory exists in every pod.

10. Check the Persistent Volume Claims (PVCs)

Although the documentation does not mention it, creating the Fragment result cache also creates PVCs, and you can confirm that they are bound (Bound) to the PVs created in steps 4 and 5.

$ oc get pvc | grep fragment-cache
ibm-lh-fragment-cache-mount-ibm-lh-lakehouse-presto-01-coordinator-blue-0   Bound    fragment-cache-storage-pv4                 10Gi       RWO            fragment-cache-storage        8m31s
ibm-lh-fragment-cache-mount-ibm-lh-lakehouse-presto-01-presto-worker-0      Bound    fragment-cache-storage-pv5                 10Gi       RWO            fragment-cache-storage        8m31s
ibm-lh-fragment-cache-mount-ibm-lh-lakehouse-presto-01-presto-worker-1      Bound    fragment-cache-storage-pv6                 10Gi       RWO            fragment-cache-storage        8m31s
ibm-lh-fragment-cache-mount-ibm-lh-lakehouse-presto-01-presto-worker-2      Bound    fragment-cache-storage-pv2                 10Gi       RWO            fragment-cache-storage        8m31s
ibm-lh-fragment-cache-mount-ibm-lh-lakehouse-presto-01-presto-worker-3      Bound    fragment-cache-storage-pv1                 10Gi       RWO            fragment-cache-storage        8m31s
ibm-lh-fragment-cache-mount-ibm-lh-lakehouse-presto-01-presto-worker-4      Bound    fragment-cache-storage-pv3                 10Gi       RWO            fragment-cache-storage        8m31s

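The leftmost column is the name of the created PVC, and the third column is the name of the PV. If the second column is Bound for every row, the PVCs and PVs are bound, and the created PVs can be used normally for the Fragment result cache. If the status is Available rather than Bound, the PV is not being used for the Fragment result cache, so check the configuration steps for problems and recreate it.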
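
With more pods, scanning the status column by eye becomes error-prone. As a small sketch, the output of "oc get pvc" could be piped through a filter that prints only the fragment-cache PVCs whose status is not Bound. unbound_pvcs is a hypothetical helper, and the second sample row (status Pending) is invented to show what a problem would look like.

```shell
#!/bin/sh
# Read "oc get pvc" style rows on stdin and print the names of
# fragment-cache PVCs whose STATUS column is not Bound (hypothetical helper).
unbound_pvcs() {
  awk '$1 ~ /fragment-cache/ && $2 != "Bound" { print $1 }'
}

# Sample input: the first row is taken from the output above,
# the second row is a made-up failure case.
sample='ibm-lh-fragment-cache-mount-ibm-lh-lakehouse-presto-01-coordinator-blue-0 Bound fragment-cache-storage-pv4 10Gi RWO fragment-cache-storage 8m31s
ibm-lh-fragment-cache-mount-ibm-lh-lakehouse-presto-01-presto-worker-0 Pending fragment-cache-storage 8m31s'

printf '%s\n' "$sample" | unbound_pvcs
```

On a real cluster this would be used as "oc get pvc | unbound_pvcs"; no output means every fragment-cache PVC is Bound.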
The procedure for deleting the cache is covered in a separate post.

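Conclusion

This post showed how to configure the Fragment result cache, one of the caches that improve Presto query performance in watsonx.data 2.0.0.
The same procedure also works in watsonx.data 1.1.x.

For the procedure for deleting the cache, see the following article.
Deleting the caches configured in watsonx.data 2.0.0 (Caching series, part 4)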
