
A supplement to a certain article's explanation of NoSQL: compared with Cassandra, which inherits the Dynamo style, the newer DynamoDB, CosmosDB, and FireStore have substantially different architectures

Last updated at 2023-08-18, posted at 2023-08-17

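Introduction

This post supplements the explanation of NoSQL given in the linked article.
There are countless NoSQL products and covering all of them is difficult, so I mainly discuss the NoSQL DBs of the three major public clouds: DynamoDB, CosmosDB, and Cloud FireStore.
Note: Cassandra, which that article mentions, is indeed one of the representative NoSQL DBs, but it works very differently from DynamoDB, CosmosDB, and Cloud FireStore, so it is unreasonable to let this one product stand for NoSQL as a whole.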

Supplement 1: The NoSQL databases of the three major public clouds all currently provide strong consistency for single-item operations

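The article cites only Cassandra as an example of NoSQL and describes its pros and cons as follows.

[Pros]
Excellent scalability; it is even said to scale out linearly.
High availability: it keeps running in some form even under a network partition.

[Cons]
No SQL interface.
Consistency is so-called eventual consistency, which amounts to guaranteeing almost nothing.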

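However, the following NoSQL DBs offered by the three major public clouds all provide strong consistency for single items today (and already did as of 2020).
The official documentation of each product says the following.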

DynamoDB

ConsistentRead
Determines the read consistency model: If set to true, then the operation uses strongly consistent reads; otherwise, the operation uses eventually consistent reads.
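As a minimal sketch of how this parameter appears in a request, the helper below only builds the keyword arguments for a GetItem call, so it runs without AWS access. The helper name `build_get_item_request`, the table name `Orders`, and the key attribute `OrderId` are assumptions made for illustration; `TableName`, `Key`, and `ConsistentRead` are the GetItem parameter names.

```python
from typing import Any, Dict


def build_get_item_request(
    table_name: str,
    key: Dict[str, Dict[str, Any]],
    strongly_consistent: bool = False,
) -> Dict[str, Any]:
    """Build keyword arguments for a DynamoDB GetItem call.

    ConsistentRead=False (the default) requests an eventually consistent
    read; ConsistentRead=True requests a strongly consistent read.
    """
    return {
        "TableName": table_name,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }


# Hypothetical table and key, used only for illustration.
request = build_get_item_request(
    "Orders", {"OrderId": {"S": "order-001"}}, strongly_consistent=True
)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").get_item(**request)
```

Keeping the flag as an explicit argument makes the consistency choice visible at every call site rather than leaving it to the default.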

CosmosDB

Strong consistency
Strong consistency offers a linearizability guarantee. Linearizability refers to serving requests concurrently. The reads are guaranteed to return the most recent committed version of an item. A client never sees an uncommitted or partial write. Users are always guaranteed to read the latest committed write.

Cloud FireStore

Strong reads
By default, Cloud Firestore reads are strongly consistent. This strong consistency means that a Cloud Firestore read returns the latest version of the data that reflects all writes that have been committed up until the start of the read.

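Of these products, DynamoDB and CosmosDB also let you choose eventual consistency, trading data consistency for better latency, availability, and per-partition read throughput.
Moreover, eventual consistency in these products does not mean values with "no guarantees at all" in the original sense of the term; updates are propagated serially from a single leader node within the partition.
So although a read may not return the latest value (a stale read), it returns a value that the partition agreed on at some point in the past (dirty reads, or update orders that differ from node to node, should not occur).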
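The following toy model illustrates the behavior described above. It is a sketch under simplifying assumptions (one in-process partition, no failures, no real consensus), and the class names `Replica` and `Partition` are hypothetical, not taken from any product. A follower applies the leader's log strictly in order, so an eventually consistent read can be stale but always reflects some prefix of the committed writes.

```python
from typing import Dict, List, Optional, Tuple


class Replica:
    """Holds a key-value state built by applying a shared log in order."""

    def __init__(self) -> None:
        self.applied = 0  # number of log entries applied so far
        self.data: Dict[str, str] = {}

    def apply_up_to(self, log: List[Tuple[str, str]], index: int) -> None:
        # Entries are applied strictly in log order, never skipped or reordered.
        for key, value in log[self.applied:index]:
            self.data[key] = value
        self.applied = max(self.applied, index)


class Partition:
    """One leader plus followers; every write goes through the leader's log."""

    def __init__(self, follower_count: int = 2) -> None:
        self.log: List[Tuple[str, str]] = []
        self.leader = Replica()
        self.followers = [Replica() for _ in range(follower_count)]

    def write(self, key: str, value: str) -> None:
        self.log.append((key, value))
        self.leader.apply_up_to(self.log, len(self.log))

    def replicate(self, follower_index: int, up_to: Optional[int] = None) -> None:
        target = len(self.log) if up_to is None else up_to
        self.followers[follower_index].apply_up_to(self.log, target)

    def read(self, key: str, strongly_consistent: bool = False,
             follower_index: int = 0) -> Optional[str]:
        replica = self.leader if strongly_consistent else self.followers[follower_index]
        return replica.data.get(key)


partition = Partition()
partition.write("x", "v1")
partition.replicate(0)          # follower 0 catches up to "v1"
partition.write("x", "v2")      # follower 0 has not seen this write yet
strong_value = partition.read("x", strongly_consistent=True)
stale_value = partition.read("x")  # served by the lagging follower
```

Here the strongly consistent read goes to the leader, while the other read returns an older committed value from a follower.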

The official documentation of each product explains this as follows.

DynamoDB

Eventually consistent is the default read consistent model for all read operations. When issuing eventually consistent reads to a DynamoDB table or an index, the responses may not reflect the results of a recently completed write operation. If you repeat your read request after a short time, the response should eventually return the more recent item.

CosmosDB

In Eventual consistency, the client issues read requests against any one of the four replicas in the specified region. This replica may be lagging and could return stale or no data.

In addition, these products halve RCU/RU consumption when eventual consistency is selected (in other words, read throughput per partition doubles).

DynamoDB

One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size.
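The arithmetic in this quotation can be sketched as follows. The function name `read_capacity_units` is hypothetical, and the sketch assumes that 4 KB means 4096 bytes and that item sizes are rounded up to the next 4 KB boundary.

```python
import math


def read_capacity_units(item_size_bytes: int, strongly_consistent: bool) -> float:
    """Estimate the RCUs consumed by reading one item.

    Assumes 4 KB = 4096 bytes and rounding up to the next 4 KB boundary.
    An eventually consistent read costs half of a strongly consistent one.
    """
    units = max(1, math.ceil(item_size_bytes / 4096))
    return float(units) if strongly_consistent else units / 2


strong_cost = read_capacity_units(4096, strongly_consistent=True)
eventual_cost = read_capacity_units(4096, strongly_consistent=False)
```

Under these assumptions, the same provisioned read capacity serves twice as many eventually consistent reads as strongly consistent ones.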

CosmosDB

Your choice of consistency model also affects the throughput. You can get approximately 2x read throughput for the more relaxed consistency levels (session, consistent prefix and eventual consistency) compared to stronger consistency levels (bounded staleness or strong consistency).

Cloud FireStore

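Cloud FireStore's stale reads go by a different name, but they presumably involve the same trade-off between data consistency on one side and availability, latency, and so on, on the other.
Note: As far as I could find, no throughput-boosting feature appears to be offered.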

Stale reads
Strong reads are the default mode in Cloud Firestore. However, it comes at a cost of potential higher latency due to the communication that may be required with the leader. Often your Cloud Firestore application doesn’t need to read the latest version of the data and the functionality works well with data that may be a few seconds stale.

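From the above, the statement that "consistency is so-called eventual consistency, which amounts to guaranteeing almost nothing" is wrong for these public-cloud NoSQL DBs: they provide operations with strong consistency, and even their eventual consistency is not eventual consistency in the original sense.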

Supplement 2: The NoSQL databases of the three major public clouds all reach consensus within a partition to elect a leader node and replicate the log

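The article contains the following statements:

NoSQL (Cassandra) is AP: it compromises on consistency to increase availability and partition tolerance.
...
Maintains consistency even when a problem occurs in P: this is an unprecedented characteristic, achieved with Paxos/Raft.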

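In reality, the NoSQL databases of the three major public clouds all use the Paxos protocol to reach consensus on leader election and log replication within a partition, so the claim that "this is an unprecedented characteristic" is clearly wrong.
Below are the official descriptions of how each NoSQL DB works, along with the release date of each service.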

DynamoDB

Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service

The replicas for a partition form a replication group. The replication group uses Multi-Paxos [14] for leader election and consensus. Any replica can trigger a round of the election. Once elected leader, a replica can maintain leadership as long as it periodically renews its leadership lease.
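The lease-renewal idea in this passage can be illustrated with a small single-process sketch. This is not Multi-Paxos: in a real system, granting the lease would itself go through consensus among the replicas. The class name `LeaderLease`, the replica names, and the use of plain numbers as logical timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaderLease:
    """A time-bounded leadership lease, tracked with logical timestamps."""

    duration: float
    holder: Optional[str] = None
    expires_at: float = 0.0

    def try_acquire(self, replica: str, now: float) -> bool:
        """Grant or renew the lease if it is free, expired, or already held by `replica`."""
        if self.holder is None or now >= self.expires_at or self.holder == replica:
            self.holder = replica
            self.expires_at = now + self.duration
            return True
        return False

    def leader_at(self, now: float) -> Optional[str]:
        """Return the current leader, or None if the lease has expired."""
        if self.holder is not None and now < self.expires_at:
            return self.holder
        return None


lease = LeaderLease(duration=3.0)
first_grant = lease.try_acquire("replica-a", now=0.0)  # replica-a becomes leader
rejected = lease.try_acquire("replica-b", now=1.0)     # lease still held by replica-a
renewed = lease.try_acquire("replica-a", now=2.0)      # renewal extends expiry to 5.0
takeover = lease.try_acquire("replica-b", now=5.0)     # lease expired, replica-b takes over
```

As long as the leader keeps renewing before expiry, no other replica can take over; once renewals stop, leadership becomes available again after the lease runs out.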

DynamoDB was released in 2012.

CosmosDB

Azure Cosmos DB: Low Latency and High Availability at Planet Scale

If a primary goes down, a new primary is automatically elected from the remaining replicas using Paxos.

CosmosDB was released in 2017.

Cloud FireStore

Cloud FireStore uses Cloud Spanner internally to store its data in the first place.

MongoDB, one of the representative open-source NoSQL DBs, has also implemented replication based on the Raft protocol since version 3.6.0, released in 2017.

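Given the above, I think it is fair to say that by 2020 this was already a standard technique in the NoSQL world as well.
Note: As of 2023, Cassandra does not appear to have adopted this mechanism.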

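Also, concerns have been raised in various places that explaining the CAP theorem as "pick two of the three properties" is misleading (*1), so I think it is not a good basis for classifying types of databases.

*1: CAP Twelve Years Later: How the "Rules" Have Changed; A Critique of the CAP Theorem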

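Supplement 3: Several NoSQL products support ACID transactions spanning multiple partitions/items

DynamoDB has supported ACID transactions spanning multiple partitions/items since 2018.

FireStore has supported transactions since it became GA in 2019.

As far as I could find, as of 2023 Azure CosmosDB appears to support transactions only within a single partition.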

Azure Cosmos DB supports full ACID compliant transactions with snapshot isolation for operations within the same logical partition key.

Multi-document transactions are supported within an unsharded collection. Multi-document transactions aren't supported across collections or in sharded collections. The timeout for transactions is a fixed 5 seconds.

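In addition, running distributed transactions that span multiple nodes raises availability and performance concerns, so my understanding is that they should not be used indiscriminately.
Google Spanner, for example, appears to have acknowledged the performance and availability concerns and still judged it preferable to offer the feature, as the following passage shows.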

Some authors have claimed that general two-phase commit is too expensive to support, because of the performance or availability problems that it brings [9, 10, 19]. We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions. Running two-phase commit over Paxos mitigates the availability problems.
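For readers unfamiliar with two-phase commit, here is a minimal in-process sketch of the protocol's two rounds. It deliberately omits what the quotation is about (in Spanner, each participant is itself a Paxos group, which is what mitigates the availability problem), as well as timeouts, logging, and recovery. The names `Participant` and `two_phase_commit` are hypothetical.

```python
from typing import List


class Participant:
    """One shard taking part in a transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote yes (and hold the prepared state) or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit on every participant only if all of them vote yes."""
    votes = [p.prepare() for p in participants]  # phase 1: prepare
    if all(votes):
        for p in participants:                   # phase 2: commit
            p.commit()
        return True
    for p in participants:                       # phase 2: abort
        p.abort()
    return False


healthy = [Participant("shard-1"), Participant("shard-2")]
committed = two_phase_commit(healthy)

mixed = [Participant("shard-1"), Participant("shard-2", can_commit=False)]
aborted = not two_phase_commit(mixed)
```

The extra round trip and the time spent in the prepared state are where the performance cost of distributed transactions comes from.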

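DynamoDB likewise applies techniques such as "executing the transaction in a single request", "not acquiring locks", and "rejecting operations on individual items as rarely as possible", yet transactional write latency is still about 4x that of ordinary writes.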

Transactions are submitted as single request.
...
Transactions do not acquire locks.
...
Reads to individual items can always be performed successfully even if there is a prepared transaction that is attempting to write that item.
...
Writes to individual items can be performed immediately and serialized before any prepared transactions in many cases.
...
Writes to individual items can be performed immediately or delayed and serialized after any prepared transactions in other cases.
...
Figure 8 shows the performance of single operation transactional vs non-transactional writes at the 50th and 99th percentiles. Latency for write transactions is about 4x the latency of non-transactional writes.

In addition, WCU consumption doubles, which affects both cost and per-partition write throughput.

Each item requires two write capacity units (WCUs): one to prepare the transaction and one to commit the transaction.
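A rough cost comparison based on this rule might look like the following. The function name `write_capacity_units` is hypothetical, and the sketch assumes one WCU covers a write of up to 1 KB (1024 bytes), with larger items rounded up to the next 1 KB.

```python
import math
from typing import Iterable


def write_capacity_units(item_sizes_bytes: Iterable[int], transactional: bool) -> int:
    """Estimate the WCUs consumed by writing the given items.

    Assumes 1 WCU per started 1 KB (1024 bytes) of each item, doubled for
    transactional writes (one unit to prepare, one to commit).
    """
    multiplier = 2 if transactional else 1
    return sum(multiplier * max(1, math.ceil(size / 1024)) for size in item_sizes_bytes)


sizes = [500, 1500]  # two items: 0.5 KB and about 1.5 KB
plain_cost = write_capacity_units(sizes, transactional=False)
transactional_cost = write_capacity_units(sizes, transactional=True)
```

With these inputs the transactional write consumes twice the capacity of the plain write, which is the cost and throughput impact described above.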

Supplement 4: Several NoSQL products provide a subset of SQL

The details vary by product, but a subset of SQL is provided.

DynamoDB

Amazon DynamoDB supports PartiQL, a SQL-compatible query language, to select, insert, update, and delete data in Amazon DynamoDB.
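As a small illustration, the helper below builds the arguments for an ExecuteStatement call with a parameterized PartiQL SELECT. It only constructs a dictionary, so it runs without AWS access. The helper name, the table name `Orders`, and the attribute `OrderId` are assumptions for illustration; `Statement` and `Parameters` are the ExecuteStatement parameter names.

```python
from typing import Any, Dict, List


def build_execute_statement_request(
    statement: str, parameters: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Build keyword arguments for a DynamoDB ExecuteStatement (PartiQL) call."""
    return {"Statement": statement, "Parameters": parameters}


# Hypothetical table and attribute names; "?" is a positional parameter.
partiql_request = build_execute_statement_request(
    'SELECT * FROM "Orders" WHERE "OrderId" = ?',
    [{"S": "order-001"}],
)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").execute_statement(**partiql_request)
```

Note that the statement selects by key from a single table; there is no JOIN clause to write.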

This feature appears to have been released in 2021.

CosmosDB

Azure Cosmos DB for NoSQL supports querying documents using the built-in query syntax.

The SQL API appears to have been available as of 2017.

Cloud FireStore

Cloud FireStore does not currently appear to offer a SQL-compatible API.

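As a further note, these products either provide no JOIN at all or provide only a self-JOIN within the same partition, and there is a clear reason for this.
Join processing, particularly a distributed join that requires redistributing data over the network, hurts scalability, so the join operation itself has been eliminated.
Note: The same presumably applies to CosmosDB and Cloud FireStore, which prioritize scalability just as DynamoDB does.
The following is quoted from the official DynamoDB documentation.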

SQL queries of this kind can provide a flexible API for accessing data, but they require a significant amount of processing. Each join in the query increases the runtime complexity of the query as the data for each table must stage and then be assembled to return the result set. Additional factors that can impact how long it takes the queries to execute are the size of the tables and whether the columns being joined have indexes. The preceding query initiates complex queries across several tables and then sorts the result set.

Eliminating the need for JOINs is at the heart of NoSQL data modeling. This is why we built DynamoDB to support Amazon.com, and why DynamoDB can deliver consistent performance at any scale. Given the runtime complexity of SQL queries and JOINs, RBDMS performance is not constant at scale, which causes performance issues as customer applications grow.
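To make "eliminating the need for JOINs" concrete, here is a toy in-memory sketch of the common denormalized layout in which a customer's profile and orders share one partition key and are fetched with a single key-range lookup. The class `ItemStore`, the `PK`/`SK` attribute names, and the `CUSTOMER#`/`ORDER#` key formats are illustrative assumptions, not a DynamoDB API.

```python
from collections import defaultdict
from typing import Any, Dict, List


class ItemStore:
    """Toy store: items grouped by partition key, ordered by sort key."""

    def __init__(self) -> None:
        self._partitions: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)

    def put(self, pk: str, sk: str, attributes: Dict[str, Any]) -> None:
        self._partitions[pk][sk] = {"PK": pk, "SK": sk, **attributes}

    def query(self, pk: str, sk_prefix: str = "") -> List[Dict[str, Any]]:
        """Return items of one partition whose sort key starts with sk_prefix."""
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]


store = ItemStore()
store.put("CUSTOMER#1", "PROFILE", {"name": "Alice"})
store.put("CUSTOMER#1", "ORDER#2023-08-01", {"total": 120})
store.put("CUSTOMER#1", "ORDER#2023-08-17", {"total": 80})
store.put("CUSTOMER#2", "PROFILE", {"name": "Bob"})

# One lookup touches a single partition; no join across tables is needed.
customer_with_orders = store.query("CUSTOMER#1")
orders_only = store.query("CUSTOMER#1", sk_prefix="ORDER#")
```

Because every query is bounded to one partition, the amount of data touched per request stays predictable as the number of customers grows.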

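By eliminating joins and aggregation and limiting the amount of data accessed at one time, these products achieve predictable performance and a highly abstract resource-allocation mechanism that does not require you to think about free memory, CPU utilization, storage I/O, and the like.
Note: CosmosDB and FireStore likewise abstract their resource-allocation mechanisms.
Note: Cloud FireStore appears to offer a count operation.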

DynamoDB wants to provide predictable performance. The first step is to abstract the workload, which introduces the concept of read capacity unit (RCU) and write capacity unit (WCU). In fact, RCU and WCU are very close to queries per second (QPS) in the traditional sense: only the size of the target item is added, so that you can do relatively accurate workload planning. For example, 1 WCU = 1 KB item’s 1 QPS. When the user can describe the workload in terms of RCU and WCU, the first step to predictability is complete. DynamoDB’s scheduler can do a lot of things like pre-partitioning and pre-allocating resources, because the hardware capabilities for different models are simply abstracted into a combination of WCUs and RCUs.
...
The more you understand the abstraction of workloads, the better it is to build predictable systems. The more granular the measurement of workloads, the more room you have to make money (or save costs).

Cloud Spanner's best practices also note that joins can have a large impact on performance and that tuning may be needed in some cases.

Join operations can be expensive because they can significantly increase the number of rows that your query needs to scan, which results in slower queries. In addition to the techniques that you're accustomed to using in other relational databases to optimize join queries, here are some best practices for a more efficient JOIN when using Spanner SQL:

- If possible, join data in interleaved tables by primary key.
- Use the join directive if you want to force the order of the join.
- If you're using a HASH JOIN or APPLY JOIN and if you have a WHERE clause that is highly selective on one side of your JOIN, put the table that produces the smallest number of rows as the first table in the FROM clause of the join.
- For queries that are critical for your workload, specify the most performant join method and join order in your SQL statements for more consistent performance.

Of the optimization techniques above, interleaved tables appear to co-locate data physically, so they presumably have an effect similar to denormalization in NoSQL.

An interleaved table is a table that you declare to be an interleaved child of another table because you want the rows of the child table to be physically stored with the associated parent row.

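When you need to run complex SQL involving JOINs or aggregation against the NoSQL products above, you would presumably use features or other services such as the following.
Note: In practice there are countless ETL/ELT approaches and needs vary widely, so these are only examples. Building and maintaining an ETL pipeline can also be difficult.
Note: With mechanisms that reference an external data source, such as Federated Query, joins and aggregation are possible, but data consistency across items is presumably not maintained. The performance penalty is also generally considerable.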

DynamoDB
- Hands-on published: visualizing Amazon DynamoDB data in Amazon QuickSight via Amazon Athena Federated Query
- COPY from Amazon DynamoDB
CosmosDB
- What is Azure Cosmos DB analytical store?
Cloud FireStore
- Integrate with BigQuery

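Summary

The article's explanation of NoSQL is insufficient in places, and many of the features and properties it presents as characteristic of, or new in, NewSQL are also supported by NoSQL.

The NoSQL databases of the three major public clouds all currently provide strong consistency for single-item operations
- Their eventual consistency is also not the "no guarantees at all" kind implied by the general definition
The NoSQL databases of the three major public clouds all reach consensus within a partition to elect a leader node and replicate the log
- In this respect, I think they do not differ much from NewSQL
Several NoSQL products support ACID transactions spanning multiple partitions/items
- Which trade-offs are made, and which features and mechanisms are offered, differ greatly from product to product
Several NoSQL products provide a subset of SQL
- There is also a clear reason why only a subset is provided
- By restricting access patterns, they achieve predictable latency and a highly abstract resource-allocation mechanism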
