Preamble
After using DynamoDB for a while, I have concluded that it is not well suited to APIs with complex data-access needs.
A common pattern is to combine API Gateway + Lambda Function + DynamoDB to build a serverless backend for applications (a minimal sketch follows the component descriptions below).
API Gateway
- Acts as the entry point to handle incoming HTTP/HTTPS requests from clients (e.g., web or mobile apps)
- Routes requests to the appropriate backend services or Lambda functions
- Can also handle tasks like authentication, throttling, and CORS
Lambda Function
- Executes the business logic triggered by API Gateway
- Processes incoming requests, performs operations such as data validation, computation, or transformation, and interacts with other AWS services
- Stateless and serverless, meaning it scales automatically and only incurs costs when the function is invoked
DynamoDB
- A NoSQL database used to store and retrieve data
- Designed for high availability, scalability, and low latency
- Ideal for applications requiring fast, predictable performance with flexible schemas
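To make the combo concrete, here is a minimal sketch of a Lambda handler behind API Gateway reading from DynamoDB. The users table, the user_id key and the pathParameters shape are assumptions for the example, not taken from a specific project.
import json
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')  # assumed table name

def lambda_handler(event, context):
    # With an API Gateway proxy integration, path parameters arrive in the event
    user_id = event['pathParameters']['user_id']
    response = table.get_item(Key={'user_id': user_id})
    item = response.get('Item')
    if item is None:
        return {'statusCode': 404, 'body': json.dumps({'message': 'Not found'})}
    return {'statusCode': 200, 'body': json.dumps(item, default=str)}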
However, there are some limitations in DynamoDB that may cause frustration and disappointment.
Let me explain.
1 - Item size limit
A single DynamoDB item (similar to a "row" in a relational database) cannot exceed 400KB.
In comparison,
PostgreSQL allows up to 1.6TB per row (1600 columns × 1GB each).
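Back on the DynamoDB side, a write that exceeds the 400KB limit is rejected by the API. A minimal sketch; the documents table and its attributes are hypothetical:
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('documents')  # hypothetical table

big_payload = 'x' * (450 * 1024)  # ~450KB of text, above the 400KB item limit

try:
    table.put_item(Item={'doc_id': 'report-2024', 'body': big_payload})
except ClientError as err:
    # DynamoDB rejects the oversized write with a ValidationException
    print(err.response['Error']['Code'], err.response['Error']['Message'])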
2 - Page size limit
A single DynamoDB request cannot return more than 1MB of data, for example when retrieving a group of items.
When a request does not return the entire list of matching items, the response includes a LastEvaluatedKey
property that can be passed to the next request to read the next page.
For example, in a simple table:

| subscriptions |
|---|
| user_id (Partition Key) |
| subscription_type (Sort Key) |

To retrieve all items for subscription_type = 'daily-newsletter',
multiple queries are required due to pagination:
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('subscriptions')

# Note: with user_id as the partition key, querying on subscription_type alone
# assumes an index (e.g. a GSI) keyed on subscription_type.
# Perform the query
response = table.query(
    KeyConditionExpression=Key('subscription_type').eq('daily-newsletter')
)
user_ids = [item['user_id'] for item in response['Items']]

# Handle pagination if the result set is larger than 1MB
while 'LastEvaluatedKey' in response:
    response = table.query(
        KeyConditionExpression=Key('subscription_type').eq('daily-newsletter'),
        ExclusiveStartKey=response['LastEvaluatedKey']
    )
    user_ids.extend([item['user_id'] for item in response['Items']])
In comparison,
PostgreSQL can handle it with a single query and no row limit (depending on settings):
SELECT user_id FROM subscriptions WHERE subscription_type = 'daily-newsletter';
The same limitation also applies to a table scan (selecting all the elements of a table):
table = dynamodb.Table('subscriptions')

# Perform the scan
response = table.scan()
user_ids = [item['user_id'] for item in response['Items']]

# Handle pagination if the result set is larger than 1MB
while 'LastEvaluatedKey' in response:
    response = table.scan(
        ExclusiveStartKey=response['LastEvaluatedKey']
    )
    user_ids.extend([item['user_id'] for item in response['Items']])
In comparison,
PostgreSQL can handle it simply by doing:
SELECT user_id FROM subscriptions;
We could try to reduce the payload by selecting only the column(s) we actually need with the ProjectionExpression parameter, but that does not change the number of results returned per page under the 1MB limit; it only reduces the amount of data transferred and improves the response time.
table = dynamodb.Table('subscriptions')
scan_params = {"ProjectionExpression": "user_id"}

# Perform the scan, returning only the user_id attribute
response = table.scan(**scan_params)
user_ids = [item['user_id'] for item in response['Items']]

# Handle pagination if the result set is larger than 1MB
while 'LastEvaluatedKey' in response:
    response = table.scan(
        ExclusiveStartKey=response['LastEvaluatedKey'],
        **scan_params
    )
    user_ids.extend([item['user_id'] for item in response['Items']])
3 - Pagination Limit
We just explained that results are paginated because of the 1MB response limit, but what about the "real" pagination system used by most applications?
Pagination can be handled by adding the Limit
parameter, but there is no way to set a page number or an offset.
To handle it, we need to loop over queries, reading page after page until we reach the right one:
table = dynamodb.Table('subscriptions')
scan_params = {"Limit": 20}
page = 5  # Received from the client
items = []
last_evaluated_key = None

# We must loop over the results until we reach the requested page
for current_page in range(1, page + 1):
    if last_evaluated_key:
        scan_params["ExclusiveStartKey"] = last_evaluated_key
    response = table.scan(**scan_params)
    last_evaluated_key = response.get("LastEvaluatedKey")
    if current_page == page:
        items = response.get("Items", [])
        break
    if last_evaluated_key is None:
        break
This method is not great and can introduce errors due to the problem mentioned in #2.
Since the response size is limited to 1MB, there is no guarantee that the number of elements per page (Limit)
is respected.
The first page can contain 20 elements, the second 17, the third 19, and so on.
In comparison,
PostgreSQL handles it simply with:
SELECT *
FROM subscriptions
ORDER BY id
LIMIT 20 OFFSET 20;
4 - Batch limits
DynamoDB uses an API-based interaction model where each request incurs a cost, so it is better to group operations into a single API call when possible.
Fewer requests also reduce network overhead and improve response times.
DynamoDB provides batch operations:
- retrieving multiple items in a single call
- performing multiple write operations (puts or deletes) in a single call (see the write-side sketch at the end of this section)
Batch operations can target one or more tables within a single request.
For example, since DynamoDB cannot perform the SQL IN
operation, we need to retrieve each item individually by its primary key:
# Retrieve users from the database
table = dynamodb.Table('users')
batch_size = 100
user_profiles = []

# unique_user_ids is the list of user ids we want to fetch (built elsewhere)
keys = [{'user_id': user_id} for user_id in unique_user_ids]

for i in range(0, len(keys), batch_size):
    batch_keys = keys[i:i + batch_size]
    response = dynamodb.batch_get_item(
        RequestItems={
            table.name: {
                'Keys': batch_keys,
                'ConsistentRead': True
            }
        }
    )
    user_profiles.extend(response['Responses'].get(table.name, []))

    # Handle unprocessed keys
    while 'UnprocessedKeys' in response and response['UnprocessedKeys']:
        response = dynamodb.batch_get_item(
            RequestItems=response['UnprocessedKeys']
        )
        user_profiles.extend(response['Responses'].get(table.name, []))
Batch operations come with many constraints:
- Limited to 100 primary keys per batch get request
- Limited to 25 items per batch put/delete request
- No duplicate primary keys within a single batch get-item request
- Maximum of 16MB of input
- Maximum of 16MB of output
- Results may be paginated
In comparison,
PostgreSQL has no direct equivalent since it is not API-based, but we can retrieve the users
like this:
SELECT * FROM users WHERE user_id IN (1, 2, 3);
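On the DynamoDB write side, boto3's batch_writer() helper can take care of the 25-item grouping and of retrying unprocessed items. A minimal sketch, with illustrative ids:
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('subscriptions')

# batch_writer() groups puts/deletes into BatchWriteItem calls of at most
# 25 items and automatically resends unprocessed items.
with table.batch_writer() as batch:
    for user_id in ['u1', 'u2', 'u3']:  # illustrative ids
        batch.put_item(Item={
            'user_id': user_id,
            'subscription_type': 'daily-newsletter',
        })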
5 - No Table Join
With DynamoDB it is not possible to cross-reference tables; this is intrinsic to NoSQL databases.
For example, with two simple tables:

| subscriptions |
|---|
| user_id (Partition Key) |
| subscription_type (Sort Key) |

| users |
|---|
| user_id (Partition Key) |
To retrieve the email
of every subscribed user, we need to chain a query and a batch get:
table = dynamodb.Table('subscriptions')

# Retrieve the list of newsletter subscribers
# (as noted earlier, this assumes an index keyed on subscription_type)
response = table.query(
    KeyConditionExpression=Key('subscription_type').eq('daily-newsletter')
)
user_ids = [item['user_id'] for item in response['Items']]

# Handle pagination if the result set is larger than 1MB
while 'LastEvaluatedKey' in response:
    response = table.query(
        KeyConditionExpression=Key('subscription_type').eq('daily-newsletter'),
        ExclusiveStartKey=response['LastEvaluatedKey']
    )
    user_ids.extend([item['user_id'] for item in response['Items']])

# Remove duplicates
unique_user_ids = list(set(user_ids))

batch_size = 100
user_profiles = []
keys = [{'user_id': user_id} for user_id in unique_user_ids]
table = dynamodb.Table('users')

# Create batches of at most 100 user_id and loop until all users are retrieved
for i in range(0, len(keys), batch_size):
    batch_keys = keys[i:i + batch_size]
    response = dynamodb.batch_get_item(
        RequestItems={
            table.name: {
                'Keys': batch_keys,
                'ConsistentRead': True
            }
        }
    )
    user_profiles.extend(response['Responses'].get(table.name, []))

    # Handle unprocessed keys
    while 'UnprocessedKeys' in response and response['UnprocessedKeys']:
        response = dynamodb.batch_get_item(
            RequestItems=response['UnprocessedKeys']
        )
        user_profiles.extend(response['Responses'].get(table.name, []))
In comparison,
PostgreSQL can easily join tables, with joins as complex as needed:
SELECT *
FROM users AS u, subscriptions AS s
WHERE s.user_id = u.user_id
AND s.subscription_type = 'daily-newsletter';
6 - No Direct Filtering on Non-Key Attributes
In DynamoDB, the primary mechanism for retrieving data is through queries and scans based on the primary key attributes: the Partition Key and the Sort Key.
However, querying directly on attributes that are not part of the primary key is not supported in the same way. To query on non-key attributes efficiently, you need to use Global Secondary Indexes (GSIs) or Local Secondary Indexes (LSIs).
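For reference, a query through a GSI could look like the sketch below. The index name type-lang-index, with subscription_type as its partition key and lang as its sort key, is an assumption for the example and not part of the table defined earlier.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('subscriptions')

# Assumes a GSI named 'type-lang-index' (partition key: subscription_type,
# sort key: lang) was added to the table; pagination still applies.
response = table.query(
    IndexName='type-lang-index',
    KeyConditionExpression=(
        Key('subscription_type').eq('daily-newsletter') & Key('lang').eq('english')
    ),
)
english_subscribers = response['Items']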
Without a GSI or LSI, a filter can be applied with the FilterExpression
parameter, but this filter is applied after the data has been read. It is a post-processing filter, similar to application-level filtering; the filtered-out items are simply not transferred to the application, which spares unnecessary data over the network.
But a post-processing filter is very inefficient when combined with pagination.
from boto3.dynamodb.conditions import Attr

table = dynamodb.Table('subscriptions')

# Perform the query; the lang filter is applied after the items are read
response = table.query(
    KeyConditionExpression=Key('subscription_type').eq('daily-newsletter'),
    FilterExpression=Attr('lang').eq('english')
)
user_ids = [item['user_id'] for item in response['Items']]

# Handle pagination if the result set is larger than 1MB
while 'LastEvaluatedKey' in response:
    response = table.query(
        KeyConditionExpression=Key('subscription_type').eq('daily-newsletter'),
        FilterExpression=Attr('lang').eq('english'),
        ExclusiveStartKey=response['LastEvaluatedKey']
    )
    user_ids.extend([item['user_id'] for item in response['Items']])
In comparison,
PostgreSQL can easily filter on any column, combining conditions as needed:
SELECT *
FROM subscriptions
WHERE subscription_type = 'daily-newsletter'
AND lang = 'english';
7 - Comparison with PostgreSQL
| Feature | DynamoDB | PostgreSQL |
|---|---|---|
| Item Size Limit | 400KB per item | Up to 1.6TB per row |
| Pagination | Requires manual handling with LastEvaluatedKey, no OFFSET | Supports LIMIT and OFFSET for direct pagination |
| Page Size | Maximum 1MB per request (response payload limit) | No strict page size limits, depends on memory and settings |
| Batch Operations | Limited to 100 keys for BatchGetItem or 25 items for BatchWriteItem, max 16MB request/response | No batch; SQL queries can handle complex conditions |
| Filtering | Filtering with FilterExpression is post-processing and inefficient for large datasets and pagination | Can filter on any attribute in SQL queries |
| Joins | No table joins; must perform multiple queries or denormalize data | Supports complex table joins with JOIN |
| Access Model | API-based interaction (HTTPS) | Direct connection via SQL |
| Query Language | NoSQL API with basic query and scan operations | SQL for complex and ad-hoc queries |
| Indexing | Requires Global/Local Secondary Indexes (GSI/LSI) for non-key queries | Automatic and custom indexes |
| Atomicity | Transactions available but limited to 25 operations per transaction | Transactions supported, ACID-compliant |
| Scalability | Scales horizontally, highly distributed and serverless | Scales vertically (add hardware) and horizontally (partitioning) |
| Data Model | NoSQL (Key-Value, Document); encourages denormalized data | Relational, supports normalized and complex relationships |
| Performance | Optimized for high throughput and predictable access patterns | Optimized for complex queries with proper indexing |
| Use Case | Ideal for high-performance key-value or document-based use cases | Ideal for complex applications with relational data |
Of course, DynamoDB and PostgreSQL don't serve the same purpose.
Conclusion
I don't encourage using DynamoDB for:
- User-related tables where the queries are complex
- A direct id ←→ value mapping used as a dictionary: DynamoDB can feel overkill here, and a simpler solution such as ElastiCache (Redis) would be better suited for this scenario (see the short sketch after this list)
- Values whose size is measured in MB: for large binary data or files, DynamoDB is not designed to handle "big data" directly; instead, it is common to store such data in S3 and keep references (e.g., URLs) in DynamoDB, since small records with high-frequency access patterns are a better fit for DynamoDB
- Pagination: pagination is inefficient and requires reading the table page after page until the requested page is reached
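For the id ←→ value case, here is a minimal sketch of the Redis alternative; the endpoint and key names are placeholders, and an ElastiCache cluster would expose a similar hostname:
import redis

# Placeholder endpoint for an ElastiCache (Redis) cluster
r = redis.Redis(host='my-cache.example.com', port=6379, decode_responses=True)

# A plain key-value mapping covers the id <-> value dictionary use case
r.set('user:42:plan', 'daily-newsletter')
print(r.get('user:42:plan'))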
On the other hand,
DynamoDB is designed to handle applications where there are large numbers of read or write operations per second, often in the range of thousands or millions.
It is built for speed and scalability.
Horizontal Scalability
- DynamoDB automatically partitions data across multiple nodes in a distributed system, spreading the load evenly
- The capacity of your database scales automatically as your read/write workload increases
Performance Optimization
- DynamoDB provides single-digit millisecond latency for both reads and writes, regardless of the size of your dataset
- By leveraging Provisioned Throughput or On-Demand Mode, you can configure DynamoDB to handle spikes in traffic seamlessly without affecting performance (a minimal sketch follows)
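For example, a table can be created in on-demand mode so that capacity scales with traffic; the events table and its key are assumptions for this sketch:
import boto3

dynamodb = boto3.resource('dynamodb')

# On-demand capacity: nothing to provision, billed per request.
# For provisioned mode, use a ProvisionedThroughput setting instead of BillingMode.
dynamodb.create_table(
    TableName='events',  # assumed table name
    KeySchema=[{'AttributeName': 'event_id', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'event_id', 'AttributeType': 'S'}],
    BillingMode='PAY_PER_REQUEST',
)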