About PynamoDB

Last updated at 2021-03-17, posted at 2021-01-02

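#What is PynamoDB
PynamoDB is a library that wraps the low-level AWS client (it is built on botocore, the same library that underlies boto3). Code that becomes complicated when written directly against boto3 can be written simply with it.
The documentation describes it as follows.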

But why stop there? PynamoDB also supports:

・Sets for Binary, Number, and Unicode attributes
・Automatic pagination for bulk operations
・Global secondary indexes
・Local secondary indexes
・Complex queries

Reading through the implementation, I will give my own interpretation of each item.

#Sets for Binary, Number, and Unicode attributes
You can define many kinds of attributes, including Binary, Number, and Unicode.

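from pynamodb.attributes import NumberAttribute, UnicodeAttribute
from pynamodb.models import Model
from pynamodb_attributes import UnicodeEnumAttribute

# User, UserType, and SecondaryIndex are defined elsewhere in the application.
class UserModel(Model, User):
    class Meta:
        table_name = "User"
        region = "ap-northeast-1"
        billing_mode = "PAY_PER_REQUEST"

    user_id = UnicodeAttribute(hash_key=True)
    execution_date_time = UnicodeAttribute(range_key=True)
    target_month = UnicodeAttribute()
    charge = NumberAttribute(null=True)
    user_type = UnicodeEnumAttribute(UserType)
    secondary_index = SecondaryIndex()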

Looking at attributes.py, PynamoDB serializes each attribute internally, converting its value into a type that DynamoDB accepts. Conversely, it deserializes data stored in DynamoDB back into the type used by the model.
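To make the round trip concrete, here is a minimal standard-library sketch of the idea. It is not PynamoDB's code: the function names are hypothetical, and the only thing taken as given is DynamoDB's attribute value format, in which a number travels as a string under the "N" key and a string under the "S" key.

```python
import json
from typing import Any, Dict

AttributeValue = Dict[str, str]


def serialize_number(value: Any) -> AttributeValue:
    # DynamoDB carries numbers as strings under the "N" type key.
    return {"N": json.dumps(value)}


def deserialize_number(attribute_value: AttributeValue) -> Any:
    # Turn the stored string back into a Python int or float.
    return json.loads(attribute_value["N"])


def serialize_unicode(value: str) -> AttributeValue:
    # Strings are stored as-is under the "S" type key.
    return {"S": value}


def deserialize_unicode(attribute_value: AttributeValue) -> str:
    return attribute_value["S"]


# Hypothetical item using two attribute names from the model above.
item = {
    "user_id": serialize_unicode("abc"),
    "charge": serialize_number(1200),
}
```

An attribute class in this style pairs one serialize function with one deserialize function, which is why the model can work with ordinary Python values on both sides.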

To use a type that PynamoDB does not provide, you need to write your own custom attribute.

If that sounds like too much work, the pynamodb-attributes library is a convenient alternative.
The UnicodeEnumAttribute in the model above comes from that library.

#Automatic pagination for bulk operations
PynamoDB fetches paginated results for you automatically.

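DynamoDB returns at most 1 MB of data per request, so when you use scan or query with boto3 you would typically write something like the following (quoted from the official documentation).
The loop checks LastEvaluatedKey in the response and, if one is present, adds it to the next request and fetches again.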

done = False
start_key = None
while not done:
    if start_key:
        scan_kwargs['ExclusiveStartKey'] = start_key
    response = table.scan(**scan_kwargs)
    display_movies(response.get('Items', []))
    start_key = response.get('LastEvaluatedKey', None)
    done = start_key is None

That is a fair amount of code, so some of you have probably written your own wrapper class for it.
With PynamoDB this code becomes unnecessary, because PynamoDB does it internally.

In PynamoDB, scan and query return an iterator called ResultIterator.
In the implementation, __next__ assigns LastEvaluatedKey to ExclusiveStartKey before issuing the next request.
Pages are fetched repeatedly until LastEvaluatedKey comes back empty.

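In addition, pages are fetched lazily: __next__ goes to DynamoDB only when the items already fetched have been consumed, so memory is used efficiently.
For example, with 10,000 records and a LastEvaluatedKey returned after the first 1,000, only those 1,000 items have been fetched and held in memory; the remaining 9,000 have not been fetched yet.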
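The lazy behavior can be sketched with the standard library alone. The following is a simplified stand-in, not PynamoDB's actual ResultIterator: LazyResultIterator and make_fake_scan are hypothetical names, the table is an in-memory range of integers, and the start key is a plain integer offset rather than a real DynamoDB key.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Page = Dict[str, Any]


class LazyResultIterator:
    """Hypothetical paginating iterator; a sketch, not PynamoDB's code."""

    def __init__(self, fetch_page: Callable[[Optional[int]], Page]) -> None:
        self._fetch_page = fetch_page
        self._items: List[Any] = []
        self._index = 0
        self._last_evaluated_key: Optional[int] = None
        self._first_request = True

    def __iter__(self) -> "LazyResultIterator":
        return self

    def __next__(self) -> Any:
        # Fetch another page only when the buffered page is used up.
        while self._index == len(self._items):
            if not self._first_request and self._last_evaluated_key is None:
                raise StopIteration
            # Resume from the key on which the previous page ended.
            page = self._fetch_page(self._last_evaluated_key)
            self._first_request = False
            self._items = page["Items"]
            self._index = 0
            self._last_evaluated_key = page.get("LastEvaluatedKey")
        item = self._items[self._index]
        self._index += 1
        return item


def make_fake_scan(
    total: int, page_size: int
) -> Tuple[Callable[[Optional[int]], Page], List[Optional[int]]]:
    """Build an in-memory scan function and a log of the start keys it saw."""
    calls: List[Optional[int]] = []

    def fake_scan(exclusive_start_key: Optional[int]) -> Page:
        calls.append(exclusive_start_key)
        start = exclusive_start_key or 0
        end = min(start + page_size, total)
        page: Page = {"Items": list(range(start, end))}
        if end < total:
            # More data remains, so tell the caller where to resume.
            page["LastEvaluatedKey"] = end
        return page

    return fake_scan, calls
```

With make_fake_scan(total=10000, page_size=1000), consuming the first 1,000 items records a single call in the log; the second call happens only when item 1,001 is requested, which mirrors the 10,000-record scenario described above.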

#Secondary indexes
・Global secondary indexes
・Local secondary indexes
Both kinds of index are defined in a style similar to the table's model definition.
The difference is whether the class inherits from LocalSecondaryIndex or GlobalSecondaryIndex.


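from pynamodb.attributes import UnicodeAttribute
from pynamodb.indexes import AllProjection, LocalSecondaryIndex

# A local secondary index keeps the table's hash key and adds another range key.
class SecondaryIndex(LocalSecondaryIndex):
    class Meta:
        index_name = "secondaryIndex"
        projection = AllProjection()

    id = UnicodeAttribute(hash_key=True)
    target_month = UnicodeAttribute(range_key=True)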

This defines a LocalSecondaryIndex on target_month for the hash key id.

Queries are simple to write as well.
The form is [model class].[index attribute defined on the model class].query().

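# The first argument is the hash key; the index's range key goes in a condition.
UserModel.secondary_index.query(id, UserModel.target_month == target_month)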

#Complex queries
Using expressions, complex queries can be written simply.
The arguments a query accepts are listed in the PynamoDB documentation; for example, they can be used as follows.


filter_condition = (
    UserModel.user_type == user_type
    if user_type
    else None
)

users = list(
    UserModel.query(
        hash_key=id,
        range_key_condition=range_key_condition,
        filter_condition=filter_condition,
        consistent_read=False,
        scan_index_forward=False,
    )
)

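The most commonly used arguments are as follows.
・hash_key: the hash key value
・range_key_condition: a condition on the range key
・filter_condition: a condition on attributes other than the hash key and range key
・consistent_read: whether to perform a strongly consistent read
・scan_index_forward: sort in ascending order (True) or descending order (False)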

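Note: this is not specific to PynamoDB, but filter_condition is applied after DynamoDB has read the items that match the key condition and before the results are returned. Even if the filter leaves 0 items, the amount of data read does not shrink: consumed read capacity and the 1 MB page limit are both determined by the items read before filtering.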
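A toy model makes the cost of a filter easier to see. The sketch below is an assumption-laden simulation, not DynamoDB or PynamoDB code: simulate_query is a hypothetical helper and the items are invented. Count and ScannedCount, however, are real fields of a DynamoDB query response, and comparing them is a practical way to notice a filter that discards most of what was read.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


def simulate_query(
    partition: List[Item],
    filter_condition: Callable[[Item], bool],
) -> Dict[str, Any]:
    """Toy model of one query page: read every key match, then filter."""
    # Every item matching the key condition is read; this drives the cost.
    scanned = list(partition)
    # The filter only decides which of the items already read are kept.
    returned = [item for item in scanned if filter_condition(item)]
    return {
        "Items": returned,
        "Count": len(returned),        # items remaining after the filter
        "ScannedCount": len(scanned),  # items evaluated before the filter
    }


# Hypothetical partition: five items sharing one hash key.
partition = [
    {"id": "abc", "target_month": f"2021-0{month}", "user_type": "free"}
    for month in range(1, 6)
]

response = simulate_query(partition, lambda item: item["user_type"] == "paid")
```

Here the filter matches nothing, so Count is 0 while ScannedCount stays at 5. When that gap is large in practice, moving the condition into the key design or an index usually helps more than tuning the filter.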

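The syntax for the conditions commonly used in filter_condition is described under Condition Expressions in the documentation. The same expressions can also be used for conditional operations.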

# Save only if an item with this id already exists; otherwise the write fails.
info = UserModel(id="abc", name="Taro")
info.save(UserModel.id.exists())
