Understanding DynamoDB Items Select: COUNT
For being simple but heavily scalable, DynamoDB is a perfect key-value store based database. Some would define database as a typical RDBMS or a feature-rich MongoDB, and may argue that DynamoDB is useless database. If that’s your definition of database, I would say “DynamoDB is not even a Database”, It’s a key value store. It’s about what you define and expect from the database.
DynamoDB as a Key Value store doesn’t have aggregation operation like “count”. But after looking at this answer, learnt something new.
The most efficient way to get an item count from DynamoDB Query operation is to use Select COUNT request parameter — Accepted answer in AWS re:Post
I’m surprised at the same time not convinced that DynamoDB has a COUNT and started experimenting it.
Created a simple table and inserted a bunch of records and performed a Count query as mentioned
new ScanCommand({
TableName,
Select: 'COUNT', // <- Wow! section for me
ReturnConsumedCapacity: 'INDEXES'
});
{
"ConsumedCapacity": {
"CapacityUnits": 1,
"Table": {
"CapacityUnits": 1
},
"TableName": "DDB_Analysis"
},
"Count": 10, // <- Seriously? section for me
"ScannedCount": 10
}
This “Count” and “ScannedCount” are seen in every response and then why do I need to explicitly specify “Select: COUNT” in the request?
Besides show “Count”, looks like this excludes the matched “Items” array. Maybe this is the behavior.
Query/Scan operations have limit of 1MB, and this query doesn’t have Items only the Count. So the limit still applies or will I get the full count?
COUNT
- Returns the number of matching items, rather than the matching items themselves. Note that this uses the same quantity of read capacity units as getting the items, and is subject to the same item size calculations. — AWS Official Docs
Okay, Let’s try.
I inserted some 15k records and tried the same.
{
"ConsumedCapacity": {
"CapacityUnits": 129,
"Table": {
"CapacityUnits": 129
},
"TableName": "DDB_Analysis"
},
"Count": 11997,
"LastEvaluatedKey": {
"userId": "ta-user-7295",
"tenantId": "tenant-a"
},
"ScannedCount": 11997
}
This clarifies that 1MB limit is also applicable for “Select: COUNT”. Let’s try with an Index which has less projected attributes.
new ScanCommand({
TableName,
IndexName: "email-index",
Select: 'COUNT',
ReturnConsumedCapacity: 'INDEXES'
});
{
"ConsumedCapacity": {
"CapacityUnits": 92.5,
"LocalSecondaryIndexes": {
"email-index": {
"CapacityUnits": 92.5
}
},
"Table": {
"CapacityUnits": 0
},
"TableName": "DDB_Analysis"
},
"Count": 14062,
"LastEvaluatedKey": {
"email": "9156@gmail.com",
"userId": "ta-user-9156",
"tenantId": "tenant-a"
},
"ScannedCount": 14062
}
This index is projected with KEYS_ONLY and so should be having less item size, resulting in higher count but not the full count. Hence limit restriction applies as same as normal query.
Takeaway
Finally it clarifies that “Select: COUNT” is doing nothing special, but just ignores the “Items” array in response, as you’re looking for the only the already existing attribute Count.
- DynamoDB is only a Key Value store doesn’t have aggregation operation even like “count”
- If you’re looking for Count for your query, you need to continue the query/scan with the LastEvaluatedKey and loop it till the end. As usual.
- This doesn’t saves cost as it consumes the same RRU. Maybe only the network latency, as you not getting the larger payload “Items” in response.
- If you’re looking for getting count in cost optimal way, I would recommend you to have a look at https://aws.amazon.com/blogs/database/implement-resource-counters-with-amazon-dynamodb/
- Most important thing to note. For any answers in AWS, always refer to official docs and better to experiment if docs is not clear enough.