Understanding the Limitations of DynamoDB: A Developer's Perspective

Preamble

After using DynamoDB for a while, I have concluded that it is not well suited for complex API use cases.
Usually we combine API Gateway + Lambda Function + DynamoDB to build a serverless backend for applications; a minimal sketch of this setup follows the component descriptions below.


API Gateway

  • Acts as the entry point to handle incoming HTTP/HTTPS requests from clients (e.g., web or mobile apps)
  • Routes requests to the appropriate backend services or Lambda functions
  • Can also handle tasks like authentication, throttling, and CORS

Lambda Function

  • Executes the business logic triggered by API Gateway
  • Processes incoming requests, performs operations such as data validation, computation, or transformation, and interacts with other AWS services
  • Stateless and serverless, meaning it scales automatically and only incurs costs when the function is invoked

DynamoDB

  • A NoSQL database used to store and retrieve data
  • Designed for high availability, scalability, and low latency
  • Ideal for applications requiring fast, predictable performance with flexible schemas
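
As a concrete illustration, here is a minimal sketch of a Lambda handler sitting behind API Gateway (proxy integration) and reading from DynamoDB. The table name, key schema, and route parameter are assumptions for this example only.

import json
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')  # hypothetical table with user_id as partition key

def handler(event, context):
    # With the API Gateway proxy integration, path parameters arrive in the event
    user_id = event['pathParameters']['user_id']

    # Fetch a single item by its primary key
    response = table.get_item(Key={'user_id': user_id})
    item = response.get('Item')

    if item is None:
        return {'statusCode': 404, 'body': json.dumps({'message': 'User not found'})}

    return {'statusCode': 200, 'body': json.dumps(item, default=str)}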

However, there are some limitations in DynamoDB that may cause frustration and disappointment.
Let me explain.

1 - Item size limit

A single DynamoDB item (similar to a "row" in a relational database) cannot exceed 400KB.

In comparison,
PostgreSQL allows up to 1.6TB per row (1600 columns × 1GB each).
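
To illustrate, writing an item above the limit is rejected by the service; a minimal sketch, with a hypothetical table and attribute:

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')  # hypothetical table

# An attribute value of roughly 500KB, above the 400KB item limit
oversized_payload = 'x' * 500_000

try:
    table.put_item(Item={'user_id': '42', 'blob': oversized_payload})
except ClientError as e:
    # DynamoDB rejects the write with a ValidationException
    print(e.response['Error']['Code'], e.response['Error']['Message'])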

2 - Page size limit

When querying DynamoDB, we cannot retrieve more than 1MB per response, for example when trying to retrieve a group of items.

When a request does not return the entire list of items matching it, the response includes a LastEvaluatedKey property that can be passed to the next request to read the next page.

For example, with a simple table:

subscriptions
user_id (Partition Key)
subscription_type (Sort Key)

To retrieve all items for subscription_type = 'daily-newsletter', multiple queries are required due to pagination:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('subscriptions')

# Perform the query
response = table.query(
  KeyConditionExpression=Key('subscription_type').eq('daily-newsletter')
)
user_ids = [item['user_id'] for item in response['Items']]

# Handling pagination if the result set is large
while 'LastEvaluatedKey' in response:
  response = table.query(
    KeyConditionExpression=Key('subscription_type').eq('daily-newsletter'),
    ExclusiveStartKey=response['LastEvaluatedKey']
  )
  user_ids.extend([item['user_id'] for item in response['Items']])

In comparison,
PostgreSQL can handle it in a single request, with no row limit (depending on settings):

SELECT user_id from subscriptions where subscription_type='daily-newsletter'

The same limitation also applies to a table scan (selecting all the elements in a table):

table = dynamodb.Table('subscriptions')

# Perform the scan
response = table.scan()
user_ids = [item['user_id'] for item in response['Items']]

# Handling pagination if the result set is large
while 'LastEvaluatedKey' in response:
  response = table.scan(
    ExclusiveStartKey=response['LastEvaluatedKey']
  )
  user_ids.extend([item['user_id'] for item in response['Items']])

In comparison,
PostgreSQL can handle it simply by doing:

SELECT user_id from subscriptions

We could reduce the payload by selecting only the column(s) we need with the ProjectionExpression parameter. However, the 1MB limit is evaluated on the items as they are read, before the projection is applied, so this does not change the number of results returned per page; it only reduces the amount of data transferred and improves the response time.

table = dynamodb.Table('subscriptions')
scan_params = {"ProjectionExpression": "user_id"}

# Perform the query
response = table.scan(**scan_params)
user_ids = [item['user_id'] for item in response['Items']]

# Handling pagination if the result set is large
while 'LastEvaluatedKey' in response:
  response = table.scan(
    ExclusiveStartKey=response['LastEvaluatedKey'],
    **scan_params
  )
  user_ids.extend([item['user_id'] for item in response['Items']])

3 - Pagination Limit

We just explained that results are paginated because of the 1MB response limit, but what about the "real" pagination system used by most applications?
Pagination can be handled by adding the Limit parameter, but there is no way to specify a page number or an offset.

To handle it, we need to loop over queries, reading page after page until we reach the requested one:

table = dynamodb.Table('subscriptions')
scan_params = {"Limit": 20}
page = 5 # Receive from client

items = []
last_evaluated_key = None
# We must loop over all the results until we reach the good page
for current_page in range(1, page + 1):
    if last_evaluated_key:
        scan_params["ExclusiveStartKey"] = last_evaluated_key

    response = table.scan(**scan_params)
    last_evaluated_key = response.get("LastEvaluatedKey")

    if current_page == page:
        items = response.get("Items", [])
        break

    if last_evaluated_key is None:
        break

This method is not great and can introduce errors because of the problem mentioned in #2:
since the response size is limited to 1MB, there is no guarantee that the number of elements per page (Limit) is respected.

The first page can contain 20 elements, the second page 17, the third page 19, and so on.
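
A common workaround, when the client can accept cursor-style pagination instead of page numbers, is to hand LastEvaluatedKey back to the caller as an opaque token and pass it in on the next call. A minimal sketch, where the token serialization format is an assumption:

import json
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('subscriptions')

def get_page(page_token=None, page_size=20):
    scan_params = {"Limit": page_size}
    if page_token:
        # The token is the previous LastEvaluatedKey, serialized by us
        # (works here because the key attributes are plain strings)
        scan_params["ExclusiveStartKey"] = json.loads(page_token)

    response = table.scan(**scan_params)

    next_token = None
    if "LastEvaluatedKey" in response:
        next_token = json.dumps(response["LastEvaluatedKey"])

    return response.get("Items", []), next_token

# First page, then the next one using the returned token
items, token = get_page()
if token:
    more_items, token = get_page(page_token=token)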

In comparison,
PostgreSQL handles it simply by doing:

SELECT * 
FROM subscriptions
ORDER BY user_id
LIMIT 20 OFFSET 20;

4 - Batch limits

DynamoDB uses an API-based interaction model where each request incurs a cost, so it is better to group operations into a single API call when possible.
Fewer requests also reduce network overhead and improve response times.

DynamoDB provides batch operations:

  • They allow retrieving multiple items in a single call
  • They allow performing multiple write operations (puts or deletes) in a single call

Operations can target one or more tables in a single request.

For example, since DynamoDB cannot perform the SQL IN operation, we need to fetch each item by its primary key, grouping the lookups with batch_get_item:

# Retrieve user from the database
table = dynamodb.Table('users')

batch_size = 100
user_profiles = []

keys = [{'user_id': user_id} for user_id in unique_user_ids]

for i in range(0, len(keys), batch_size):
    batch_keys = keys[i:i+batch_size]
    response = dynamodb.batch_get_item(
        RequestItems={
            table.name: {
                "Keys": batch_keys,
                'ConsistentRead': True
            }
        }
    )
    user_profiles.extend(response['Responses'].get(table.name, []))

    # Handle unprocessed keys
    while 'UnprocessedKeys' in response and response['UnprocessedKeys']:
        response = dynamodb.batch_get_item(
            RequestItems=response['UnprocessedKeys']
        )
        user_profiles.extend(response['Responses'].get(table.name, []))

Batch operations have many constraints:

  • Limited to 100 primary keys per BatchGetItem request
  • Limited to 25 items per BatchWriteItem (put/delete) request
  • No duplicate primary keys within a single BatchGetItem request
  • Maximum of 16MB of data in input
  • Maximum of 16MB of data in output
  • Results may be paginated
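
For writes, boto3's batch_writer context manager is a convenient way to stay within these limits: it buffers items, sends them 25 at a time, and automatically retries unprocessed items. A minimal sketch with assumed item attributes:

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('subscriptions')

subscribers = [
    {'user_id': str(i), 'subscription_type': 'daily-newsletter'}
    for i in range(1000)
]

# batch_writer chunks the puts into batches of 25 and resends unprocessed items
with table.batch_writer() as batch:
    for item in subscribers:
        batch.put_item(Item=item)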

In comparison,
PostgreSQL has no equivalent since it is not API-based, but the same lookup is a single query:

SELECT * FROM users WHERE user_id IN (1, 2, 3);

5 - No Table Join

With DynamoDB, it is not possible to cross-reference tables with a join. This is intrinsic to NoSQL databases.

For example, with these two simple tables:

subscriptions
user_id (Partition Key)
subscription_type (Sort Key)
users
user_id (Partition Key)
email

To retrieve the email of every subscribed user, we need to chain a query and a batch get:

table = dynamodb.Table('subscriptions')

# Retrieve the list of newsletter subscribers
response = table.query(
  KeyConditionExpression=Key('subscription_type').eq('daily-newsletter')
)
user_ids = [item['user_id'] for item in response['Items']]

# Handling pagination if the result set is large
while 'LastEvaluatedKey' in response:
  response = table.query(
    KeyConditionExpression=Key('subscription_type').eq('daily-newsletter'),
    ExclusiveStartKey=response['LastEvaluatedKey']
  )
  user_ids.extend([item['user_id'] for item in response['Items']])

# Remove duplicates
unique_user_ids = list(set(user_ids))  

batch_size = 100
user_profiles = []

keys = [{'user_id': user_id} for user_id in unique_user_ids]

table = dynamodb.Table('users')
        
# Create a batch of a maximum of 100 user_id and loop until all user_id are retrieved
for i in range(0, len(keys), batch_size):
    batch_keys = keys[i:i+batch_size]
    response = dynamodb.batch_get_item(
        RequestItems={
            table.name: {
                "Keys": batch_keys,
                'ConsistentRead': True
            }
        }
    )
    user_profiles.extend(response['Responses'].get(table.name, []))

    # Handle unprocessed keys
    while 'UnprocessedKeys' in response and response['UnprocessedKeys']:
        response = dynamodb.batch_get_item(
            RequestItems=response['UnprocessedKeys']
        )
        user_profiles.extend(response['Responses'].get(table.name, []))

In comparison,
PostgreSQL can easily join tables and perform joins as complex as needed:

SELECT *
FROM users as u, subscriptions as s
WHERE s.user_id = u.user_id
AND s.subscription_type = 'daily-newsletter'

6 - No Direct Filtering on Non-Key Attributes

In DynamoDB, the primary mechanism for retrieving data is through queries and scans based on the primary key attributes: the Partition Key and the Sort Key.

However, querying directly on attributes that are not part of the primary key is not supported in the same way. To query on non-key attributes efficiently, you need to use Global Secondary Indexes (GSIs) or Local Secondary Indexes (LSIs).

Without a GSI or LSI, a filter can be applied with the FilterExpression parameter, but it is applied after the data has been read. It is a post-processing filter, similar to application-level filtering; the filtered-out data is simply not transferred to the application, which spares unnecessary traffic over the network.
However, a post-processing filter is totally inefficient when combined with pagination.

from boto3.dynamodb.conditions import Attr

table = dynamodb.Table('subscriptions')

# Perform the query
response = table.query(
  KeyConditionExpression=Key('subscription_type').eq('daily-newsletter'),
  FilterExpression=Attr('lang').eq('english')
)
user_ids = [item['user_id'] for item in response['Items']]

# Handling pagination if the result set is large
while 'LastEvaluatedKey' in response:
  response = table.query(
    KeyConditionExpression=Key('subscription_type').eq('daily-newsletter'),
    FilterExpression=Attr('lang').eq('english'),
    ExclusiveStartKey=response['LastEvaluatedKey']
  )
  user_ids.extend([item['user_id'] for item in response['Items']])
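
For comparison, if the filtered attribute were part of an index, the condition could be pushed into the key expression itself. A minimal sketch assuming a hypothetical GSI named subscription_type-lang-index with subscription_type as partition key and lang as sort key:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('subscriptions')

# The index name and its key schema are assumptions for this example
response = table.query(
    IndexName='subscription_type-lang-index',
    KeyConditionExpression=(
        Key('subscription_type').eq('daily-newsletter') & Key('lang').eq('english')
    )
)
user_ids = [item['user_id'] for item in response['Items']]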

In comparison,
PostgreSQL can easily filter on any column, with joins as complex as needed:

SELECT *
FROM subscriptions
WHERE subscription_type = 'daily-newsletter'
AND lang = 'english'

7 - Comparison with PostgreSQL

| Feature | DynamoDB | PostgreSQL |
| --- | --- | --- |
| Item Size Limit | 400KB per item | Up to 1.6TB per row |
| Pagination | Requires manual handling with LastEvaluatedKey, no OFFSET | Supports LIMIT and OFFSET for direct pagination |
| Page Size | Maximum 1MB per request (response payload limit) | No strict page size limit, depends on memory and settings |
| Batch Operations | Limited to 100 keys for BatchGetItem or 25 items for BatchWriteItem, max 16MB per request/response | No batch API needed; SQL queries can handle complex conditions |
| Filtering | FilterExpression is post-processing and inefficient for large datasets and pagination | Can filter on any attribute in SQL queries |
| Joins | No table joins; must perform multiple queries or denormalize data | Supports complex table joins with JOIN |
| Access Model | API-based interaction (HTTPS) | Direct connection via SQL |
| Query Language | NoSQL API with basic query and scan operations | SQL for complex and ad-hoc queries |
| Indexing | Requires Global/Local Secondary Indexes (GSI/LSI) for non-key queries | Automatic and custom indexes |
| Atomicity | Transactions available but limited to 25 operations per transaction | Transactions supported, ACID-compliant |
| Scalability | Scales horizontally, highly distributed and serverless | Scales vertically (add hardware) and horizontally (partitioning) |
| Data Model | NoSQL (key-value, document); encourages denormalized data | Relational, supports normalized and complex relationships |
| Performance | Optimized for high throughput and predictable access patterns | Optimized for complex queries with proper indexing |
| Use Case | Ideal for high-performance key-value or document-based use cases | Ideal for complex applications with relational data |

Of course, DynamoDB and PostgreSQL don't serve the same purpose.

Conclusion

I don't encourage using DynamoDB for:

  • User-related tables where queries are complex
  • A direct id ←→ value mapping used as a dictionary. DynamoDB can feel overkill here.
    A simpler solution, like ElastiCache (Redis), would be better suited for this scenario.
  • Large values (in the MB range)
    For large binary data or files, DynamoDB is not designed to handle "big data" directly. Instead, it's common to store such data in S3 and keep references (e.g., URLs) in DynamoDB.
    Small records with high-frequency access patterns are better suited to DynamoDB.
  • Pagination
    Pagination is inefficient and requires reading every page up to the requested one.

On the other hand,
DynamoDB is designed to handle applications where there are large numbers of read or write operations per second, often in the range of thousands or millions.

It is built for speed and scalability.

Horizontal Scalability

  • DynamoDB automatically partitions data across multiple nodes in a distributed system, spreading the load evenly
  • The capacity of your database scales automatically as your read/write workload increases

Performance Optimization

  • DynamoDB provides single-digit millisecond latency for both reads and writes, regardless of the size of your dataset
  • By leveraging Provisioned Throughput or On-Demand Mode, you can configure DynamoDB to handle spikes in traffic seamlessly without affecting performance
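
For reference, here is a minimal sketch of creating a table in on-demand mode so that capacity follows traffic; the table name and key schema are assumptions reused from the earlier examples:

import boto3

dynamodb = boto3.client('dynamodb')

# On-demand (PAY_PER_REQUEST) billing: no capacity planning,
# DynamoDB absorbs traffic spikes and you pay per request
dynamodb.create_table(
    TableName='subscriptions',
    AttributeDefinitions=[
        {'AttributeName': 'user_id', 'AttributeType': 'S'},
        {'AttributeName': 'subscription_type', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'user_id', 'KeyType': 'HASH'},
        {'AttributeName': 'subscription_type', 'KeyType': 'RANGE'},
    ],
    BillingMode='PAY_PER_REQUEST',
)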