Local DynamoDB Grew Up: A Hands-On Look at ExtendDB

DynamoDB Local has been the laptop stand-in for AWS DynamoDB since 2013. It’s a Java JAR, runs in memory or against a SQLite file, accepts almost any request shape, has no real auth, and treats Streams pretty loosely. It’s fine for unit tests. It starts to crack the moment your code does anything beyond PutItem and GetItem.

ExtendDB v0.1.0 just shipped. It’s a clean-room implementation of the DynamoDB wire protocol, written in Rust by AWS engineers, backed by PostgreSQL, Apache 2.0. The pitch is “DynamoDB Local, but you can take it seriously.” This post is a hands-on look at whether that holds up, run as a side-by-side lab on a single Mac.

What ExtendDB actually is

ExtendDB isn’t a fork of DynamoDB. There’s no DynamoDB source code in it. Instead, it speaks the DynamoDB wire protocol (the JSON 1.0 protocol with X-Amz-Target dispatch that the AWS SDKs use), so an existing boto3 or AWS CLI client can be pointed at an ExtendDB endpoint with one flag change (--endpoint-url for the CLI) and no code modifications.

Under the hood it’s a small, focused Rust workspace:

flowchart LR
    bin[extenddb<br/>CLI + daemon] --> server[extenddb-server<br/>HTTP + console]
    server --> engine[extenddb-engine<br/>DynamoDB op handlers]
    server --> auth[extenddb-auth<br/>SigV4 + IAM policy]
    engine --> core[extenddb-core<br/>types, expressions]
    engine --> storage[extenddb-storage<br/>trait definitions]
    storage --> pg[extenddb-storage-postgres<br/>PostgreSQL backend]

The extenddb-storage crate is just trait definitions. extenddb-storage-postgres is the only implementation today. The seam is clean enough that a different backend (SQLite, FoundationDB, whatever) would mostly come down to implementing the traits. Here’s the lifecycle trait, verbatim from crates/storage/src/lib.rs:

/// All methods receive `account_id` to scope operations to a single account.
/// This enables multi-account isolation: different accounts can have tables
/// with the same name without conflict.
pub trait TableEngine: Send + Sync {
    fn create_table(
        &self,
        account_id: &str,
        input: CreateTableInput,
    ) -> BoxFuture<'_, Result<TableDescription, StorageError>>;
    fn delete_table(
        &self,
        account_id: &str,
        input: DeleteTableInput,
    ) -> BoxFuture<'_, Result<TableDescription, StorageError>>;
    fn describe_table(
        &self,
        account_id: &str,
        input: DescribeTableInput,
    ) -> BoxFuture<'_, Result<TableDescription, StorageError>>;
    // ...list_tables, update_table, table_key_info
}

Two things to notice. First, every method takes an account_id. Account scoping isn’t a layer above storage, it’s threaded all the way through. Second, the methods return BoxFuture. The module-level docs mention “RPITIT” (return-position impl Trait in traits), but the actual methods landed as boxed futures, probably to keep the trait object-safe for the engine layer.

Lab setup

Both backends run on the same Mac. Postgres 18.4 from Homebrew, started with brew services start postgresql@18. ExtendDB built with cargo build --release, then:

$ ./target/release/extenddb init --config extenddb.toml
…creating catalog database extenddb_catalog…
…generating self-signed TLS certificate at ~/.extenddb/tls/cert.pem…
  Admin credentials (shown once, save them now)
  │  Username: admin
  │  Password: <redacted>

TLS is mandatory. The init command generates a 587-byte self-signed EC P-256 cert (openssl x509 -in cert.pem -noout -subject shows CN=extenddb self-signed, O=extenddb). Clients trust it via AWS_CA_BUNDLE=~/.extenddb/tls/cert.pem.

ExtendDB then runs with extenddb serve on https://127.0.0.1:8000. Test credentials get provisioned through the management API via the bundled devtools/provision-test-credentials script, which creates a test account, IAM user, access key, and an allow-all-dynamodb policy.

DynamoDB Local goes up next to it in Docker:

$ docker run -d --name ddb-local -p 8001:8000 \
    amazon/dynamodb-local:latest \
    -jar DynamoDBLocal.jar -inMemory -port 8000

A boto3 conftest fixture flips between the two endpoints based on the LAB_BACKEND env var, so every probe runs against either backend without code changes. Transcripts of every probe were captured as JSON lines. Everything in this post is quoted from those.
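A minimal sketch of that switch, assuming the ports from the setup above. The names BACKENDS, client_kwargs, and make_client, and the us-east-1 region, are illustrative choices and are not taken from the lab harness.

```python
import os

# Endpoints follow the lab setup above; everything else here is an assumption.
BACKENDS = {
    "extenddb": {
        "endpoint_url": "https://127.0.0.1:8000",
        # Trust the self-signed cert generated by `extenddb init`.
        "verify": os.path.expanduser("~/.extenddb/tls/cert.pem"),
    },
    "dynamodb-local": {
        "endpoint_url": "http://127.0.0.1:8001",
        "verify": False,  # plain HTTP, nothing to verify
    },
}


def client_kwargs(backend=None):
    """Return boto3 client keyword arguments for the selected backend."""
    name = backend or os.environ.get("LAB_BACKEND", "extenddb")
    if name not in BACKENDS:
        raise ValueError(f"unknown LAB_BACKEND: {name!r}")
    return {"region_name": "us-east-1", **BACKENDS[name]}


def make_client(backend=None):
    """Build a DynamoDB client; credentials come from the usual AWS env vars."""
    import boto3  # imported lazily so the config logic is testable without it

    return boto3.client("dynamodb", **client_kwargs(backend))
```

In a conftest.py, a fixture would simply return make_client(), so a probe never needs to know which backend it is talking to.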

Baseline parity: CRUD walks fine

First thing to know is that the easy stuff just works. A PutItem with the full type zoo:

ddb.put_item(TableName="lab_alpha", Item={
    "pk":         {"S": "user#1"},
    "sk":         {"S": "profile"},
    "name":       {"S": "Alice"},
    "age":        {"N": "30"},
    "active":     {"BOOL": True},
    "tags":       {"L": [{"S": "rust"}, {"S": "postgres"}]},
    "meta":       {"M": {"city": {"S": "Istanbul"}}},
    "deleted_at": {"NULL": True},
    "blob":       {"B": b"\x00\x01\x02"},
})

Same call, both backends, same response shape. Ten probes covering CreateTable (with a GSI), strong-consistent GetItem, Query on partition + sort range, Query on the GSI, Scan with a filter, UpdateItem with combined SET/REMOVE/ADD plus a ConditionExpression, DeleteItem with a condition, BatchWriteItem of 25 items, and a TransactWriteItems with an intentional condition-check conflict. All pass against both backends.

One small exception is worth noting. ExtendDB’s BatchWriteItem response omits the UnprocessedItems key entirely when the batch fully succeeded. Real DynamoDB and DynamoDB Local both return "UnprocessedItems": {}. It’s the kind of drift that calling code can paper over with resp.get("UnprocessedItems", {}), but a strict client could trip on it.
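A retry loop that tolerates both response shapes might look like the sketch below. The helper name batch_write_all and the max_attempts default are hypothetical, and production code would normally add backoff between attempts.

```python
def batch_write_all(client, request_items, max_attempts=5):
    """Call BatchWriteItem until nothing is left unprocessed.

    A missing "UnprocessedItems" key is treated the same as an empty dict,
    so the loop behaves identically whether or not the backend emits it.
    """
    pending = request_items
    for _ in range(max_attempts):
        resp = client.batch_write_item(RequestItems=pending)
        pending = resp.get("UnprocessedItems") or {}
        if not pending:
            return
    raise RuntimeError(f"items still unprocessed after {max_attempts} attempts")
```

The only backend-sensitive part is resp.get(...) or {}; indexing resp["UnprocessedItems"] directly is what would raise KeyError against the response shape described above.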

So at the level of “boto3 code that ran against DynamoDB yesterday runs against ExtendDB today”, the answer is yes. The interesting differences start in the areas where DynamoDB Local has historically been hand-waved.

Where they diverge

Auth is real, sort of

The popular shorthand “DynamoDB Local has no auth” turns out to be slightly wrong. Both backends reject a completely unsigned request:

# B1: no Authorization header on either
extenddb         -> 400 com.amazon.coral.service#MissingAuthenticationToken
dynamodb-local   -> 400 com.amazonaws.dynamodb.v20120810#MissingAuthenticationToken

The real gap shows up in B2. Send any Authorization header that parses as SigV4 (a made-up access key id, a Signature=deadbeef, anything), and:

# B2: bogus but parseable Authorization header
extenddb         -> 400 UnrecognizedClientException
                       "The security token included in the request is invalid."
dynamodb-local   -> 200 {"TableNames":[]}

ExtendDB looks up the access key against its IAM store and rejects when it doesn’t exist. DynamoDB Local doesn’t look at the access key at all. It accepts the request the moment the header parses. That’s the gap.
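Probes of this kind have to bypass boto3, since the SDK always signs properly. Below is a rough sketch of how B1 and B2 could be reproduced with the standard library. The function names, the fake access key id, and the fixed date are all made up for illustration; whether TLS hostname verification passes for 127.0.0.1 depends on the cert’s SANs, which this sketch does not check. The results quoted above come from the lab transcripts, not from this code.

```python
import json
import ssl
import urllib.error
import urllib.request


def build_probe(endpoint, authorization=None):
    """Return (url, headers, body) for a raw ListTables call."""
    headers = {
        "Content-Type": "application/x-amz-json-1.0",
        "X-Amz-Target": "DynamoDB_20120810.ListTables",
        "X-Amz-Date": "20250101T000000Z",
    }
    if authorization is not None:
        headers["Authorization"] = authorization
    return endpoint, headers, json.dumps({}).encode()


# Parses as SigV4, but the key id and signature are made up.
BOGUS_AUTH = (
    "AWS4-HMAC-SHA256 "
    "Credential=AKIAFAKEFAKEFAKE/20250101/us-east-1/dynamodb/aws4_request, "
    "SignedHeaders=host;x-amz-date;x-amz-target, "
    "Signature=deadbeef"
)


def send_probe(endpoint, authorization=None, cafile=None):
    """Send the probe and return (status, response body as text)."""
    url, headers, body = build_probe(endpoint, authorization)
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    ctx = ssl.create_default_context(cafile=cafile) if url.startswith("https") else None
    try:
        with urllib.request.urlopen(req, context=ctx) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # 4xx responses carry the error type in the JSON body.
        return err.code, err.read().decode()
```

B1 corresponds to send_probe(endpoint) with no Authorization header, and B2 to send_probe(endpoint, BOGUS_AUTH).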

Small but useful detail: ExtendDB access keys are prefixed AKIAEXTENDDB (or ASIAEXTENDDB for sessions), so they can’t be confused with real AWS keys in logs or grep output.

Streams shaped like the real thing

Both backends implement Streams. The shape is different. Enable a stream with NEW_AND_OLD_IMAGES, write an item, overwrite it, delete it, then drain the stream:

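# Naive single-shard read against ExtendDB returns zero records.
# ExtendDB partitions stream records across 4 shards by partition-key hash.
# `streams` is a boto3 "dynamodbstreams" client pointed at the backend under test.
records = []
shards = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"]
for shard in shards:
    it = streams.get_shard_iterator(StreamArn=stream_arn,
                                    ShardId=shard["ShardId"],
                                    ShardIteratorType="TRIM_HORIZON")["ShardIterator"]
    while it:  # drain pages until empty
        page = streams.get_records(ShardIterator=it)
        if not page["Records"]:
            break
        records.extend(page["Records"])
        it = page.get("NextShardIterator")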

That matches how real DynamoDB shards streams. DynamoDB Local takes a simpler path and puts everything on a single shard, which is why naive consumers happen to work against it and then surprise their authors in production.

One v0.1.0 gotcha worth surfacing. Deleting a stream-enabled table and recreating it with the same name during the same server lifetime fails with duplicate key value violates unique constraint "stream_shards_pkey". The shard rows from the old table don’t get fully cleared. Workaround in the lab harness is to suffix table names with a timestamp.

TTL with real REMOVE records

ExtendDB runs a TTL sweeper. Drop in an already-expired item, wait, and the item is gone:

ddb.put_item(TableName="lab_ttl_...",
             Item={"pk": {"S": "doomed"},
                   "expires_at": {"N": str(int(time.time()) - 60)},
                   "payload": {"S": "x"}})
# 60 seconds later
ddb.get_item(...)  # no Item
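Rather than sleeping for a fixed interval, a test can poll until the item disappears. This is a sketch only: wait_until_gone is a hypothetical helper, and the 180-second default timeout is a guess that leaves room for a few sweeper passes. The clock and sleep parameters exist so the loop can be tested without real waiting.

```python
import time


def wait_until_gone(ddb, table_name, key, timeout=180.0, interval=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll GetItem until the item disappears; return True if it did."""
    deadline = clock() + timeout
    while True:
        resp = ddb.get_item(TableName=table_name, Key=key, ConsistentRead=True)
        if "Item" not in resp:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Against a backend with no sweeper, the same call returns False once the timeout elapses, which makes the difference between the two backends an explicit assertion.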

More importantly, the deletion emits a stream record with the spec’d service identity:

INSERT  userIdentity = None
REMOVE  userIdentity = {'PrincipalId': 'dynamodb.amazonaws.com', 'Type': 'Service'}

That userIdentity matters. It’s what real consumer code uses to tell a user-initiated DeleteItem apart from background TTL expiry. Two small drifts here. First, ExtendDB returns the keys as Type / PrincipalId (PascalCase), whereas the DynamoDB TTL documentation and Lambda stream events show type / principalId (camelCase). Code that does record["userIdentity"]["type"] will need a fallback. Second, the docs reference a ttl_deletion_target_seconds runtime setting; in v0.1.0 it isn’t an exposed key, and the sweep interval is hardcoded at 60 s in crates/storage-postgres/src/ttl_worker.rs. DynamoDB Local skips this whole story: it has no sweeper at all, so expired items stay in the table forever.
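One way to write that fallback is a small predicate that accepts either key casing. The function name is_ttl_removal is hypothetical; the identity values are the ones shown in the transcript above.

```python
def is_ttl_removal(record):
    """Return True if a stream record looks like a background TTL deletion."""
    if record.get("eventName") != "REMOVE":
        return False
    # userIdentity may be absent or None on user-initiated writes.
    identity = record.get("userIdentity") or {}
    # Accept both PascalCase and camelCase keys.
    id_type = identity.get("Type", identity.get("type"))
    principal = identity.get("PrincipalId", identity.get("principalId"))
    return id_type == "Service" and principal == "dynamodb.amazonaws.com"
```

A consumer can then branch on is_ttl_removal(record) without caring which backend, or which event format, produced the record.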

Multi-account isolation that works

For this probe, a second account, lab_b, was provisioned with its own user, access key, and allow-all-dynamodb policy. Two boto3 clients (one signed as each account) then each created a table named lab_shared_name and put a marker row in it:

Account A reads back  -> {"owner": "from-a", "pk": "x"}
Account B reads back  -> {"owner": "from-b", "pk": "x"}

Same table name, different account, separate rows, no cross-talk. DynamoDB Local has exactly one namespace: same name, same table.

You can look inside

ExtendDB stores items in plain PostgreSQL. There’s one shared extenddb data database (account isolation is enforced at the row level via account_id, not by separate databases). Each DynamoDB table maps to a Postgres table named _ddb_<uuid>. A row is pk | item_data (JSONB):

$ psql -d extenddb -c 'SELECT * FROM "_ddb_f6453e99-3173-..." LIMIT 2;'
    pk    |                                  item_data
----------+------------------------------------------------------------------------------
 item-001 | {"pk": {"S": "item-001"}, "owner": {"S": "account-a"}, "value": {"N": "42"}}
 item-002 | {"pk": {"S": "item-002"}, "owner": {"S": "account-a"}, "value": {"N": "99"}}

For backup, restore, replication, debugging, or just satisfying curiosity, this is a different world from DynamoDB Local’s opaque in-memory or SQLite file.

Crash safety and HTTPS

kill -9 the ExtendDB process mid-write, restart, read the canary back:

== Read canary after restart ==
"Item": { "payload": {"S": "survives-crash"}, "pk": {"S": "canary"} }

That’s Postgres durability rather than process memory. DynamoDB Local in -inMemory mode loses everything on restart by design.

HTTPS is mandatory but it isn’t implemented as outright refusal. A plain-HTTP request to ExtendDB’s port returns 301 (redirect to HTTPS). No application data gets served over plain HTTP, but the connection is accepted long enough to send the redirect. DynamoDB Local serves HTTP normally.

Two design choices worth calling out

The first one is how streams stay consistent. From crates/storage/src/lib.rs:

/// Parameters for capturing a stream record within a data write transaction.
///
/// When present, the storage backend inserts the stream record in the same
/// transaction as the data write, guaranteeing atomicity.
pub struct StreamCapture {
    pub view_type: StreamViewType,
    pub user_identity: Option<UserIdentity>,
    pub region: Arc<str>,
}

The stream record write lives inside the same Postgres transaction as the item write. No “data committed but stream missed” or “stream wrote but data rolled back” possible. Both happen, or neither does. It’s the kind of thing real DynamoDB hides behind its proprietary storage, and it’s satisfying to see the seam explicitly in the trait.
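The pattern itself is easy to demonstrate in miniature. The sketch below uses Python’s built-in sqlite3 as a stand-in for Postgres, with made-up table names and a simulated failure flag; it is not ExtendDB’s schema or code, only an illustration of writing the item and its stream record inside one transaction.

```python
import json
import sqlite3


def open_db():
    """Create an in-memory database with an item table and a stream table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (pk TEXT PRIMARY KEY, item_data TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE stream_records ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, event TEXT NOT NULL, "
        "pk TEXT NOT NULL, new_image TEXT)"
    )
    return conn


def put_item(conn, pk, item, fail_after_item=False):
    """Write the item and its stream record in a single transaction."""
    with conn:  # commits on success, rolls back if the block raises
        existed = conn.execute(
            "SELECT 1 FROM items WHERE pk = ?", (pk,)
        ).fetchone() is not None
        conn.execute(
            "INSERT OR REPLACE INTO items (pk, item_data) VALUES (?, ?)",
            (pk, json.dumps(item)),
        )
        if fail_after_item:
            raise RuntimeError("simulated failure between the two writes")
        conn.execute(
            "INSERT INTO stream_records (event, pk, new_image) VALUES (?, ?, ?)",
            ("MODIFY" if existed else "INSERT", pk, json.dumps(item)),
        )
```

When fail_after_item is set, the rollback discards the item write too, so the item table and the stream table never disagree.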

The second is a small, surprising defensive choice. From docs/differences-from-dynamodb.md:

| TTL attribute name | Any UTF-8 string (1 to 255 bytes) | Restricted to [a-zA-Z0-9._-]+ (1 to 255 bytes). Names with spaces, quotes, or other special characters are rejected. This eliminates SQL injection risk in the TTL expression index. |

DynamoDB allows any UTF-8 in the TTL attribute name. ExtendDB rejects anything outside [a-zA-Z0-9._-]+. The reason is that the TTL attribute name is interpolated into a Postgres index expression, and a permissive name set would force quoting that could be exploited. It’s a clean-room implementation making a security trade-off the original probably never had to think about.
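Test code that should run unchanged against both real DynamoDB and ExtendDB can check TTL attribute names up front. This is a client-side sketch of the documented rule, with a hypothetical function name; it is not ExtendDB’s own validation code.

```python
import re

# Character set quoted from docs/differences-from-dynamodb.md.
_TTL_NAME = re.compile(r"[a-zA-Z0-9._-]+")


def is_valid_ttl_attribute_name(name):
    """Check a TTL attribute name against the documented ExtendDB restriction."""
    return (
        1 <= len(name.encode("utf-8")) <= 255
        and _TTL_NAME.fullmatch(name) is not None
    )
```

Names such as expires_at pass, while anything containing a space or a quote is rejected before it reaches the server.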

When NOT to use it

A short list of concrete gaps, mixing genuinely missing features with v0.1.0 drift:

  • Not implemented at all: Global Tables, DAX, PartiQL (ExecuteStatement / BatchExecuteStatement), federated SAML/OIDC auth, Kinesis streaming destinations.
  • Different shape: ImportTable / ExportTableToPointInTime go to a local filesystem path instead of an S3 bucket. On-demand capacity has a fixed initial burst and no autoscaling.
  • Wire-protocol drift to watch out for: BatchWriteItem omits UnprocessedItems: {} from successful responses. userIdentity fields use PascalCase keys (Type, PrincipalId) rather than the camelCase keys (type, principalId) shown in the TTL documentation and Lambda stream events. Stream consumers must drain all shards (real DynamoDB behavior, but DynamoDB Local users may have only ever read shard 0).
  • v0.1.0 bug: a stream-enabled table can’t be deleted and recreated with the same name within the same server lifetime (stream_shards_pkey collision). Rotate table names in tests.
  • Docs drift: the ttl_deletion_target_seconds runtime setting referenced in the differences doc isn’t currently exposed; sweep interval is hardcoded.

None of these are reasons to avoid ExtendDB if you’re hitting DynamoDB Local’s limitations. They’re reasons to test your code against ExtendDB before you assume the wire is identical.

Verdict

If you’ve ever written test code that papered over DynamoDB Local (mocked auth out, skipped Streams assertions, stubbed TTL) and quietly worried the real service would behave differently, ExtendDB is interesting. It’s the first local DynamoDB stand-in that takes auth, Streams, TTL, multi-account isolation, and durability seriously, and the storage layer is plain Postgres you can psql into.

For CI pipelines that need realistic auth and stream semantics, on-prem or air-gapped deployments, and dev teams that have been losing time to “well, it works against DynamoDB Local…”, v0.1.0 is already a useful tool. For anything that depends on byte-exact wire compatibility with the real service, do your own pass first.
