How to add storage-level caching between DynamoDB and Titan?
I am using the Titan/DynamoDB library to use AWS DynamoDB as the backend for my Titan DB graphs. My app is read-heavy, and I noticed that Titan is executing query requests against DynamoDB. I am already using transaction- and instance-local caches and indexes to reduce DynamoDB read units and overall latency. I would like to introduce a cache layer that is consistent across all EC2 instances: a read/write-through cache between DynamoDB and my application that stores query results, vertices, and edges.
I see two solutions to this:
- Implicit caching done directly by the Titan/DynamoDB library. Classes like ParallelScanner could be changed to read from AWS ElastiCache first. The change would have to be applied to both read and write operations to ensure consistency.
- Explicit caching done by the application before invoking the Titan/Gremlin API.
The first option seems more fine-grained, cross-cutting, and generic.
- Does something like this already exist? Maybe for other storage backends?
- Is there a reason why it does not exist already? Graph DB applications seem to be read-intensive, so cross-instance caching seems like a pretty significant feature for speeding up queries.
First, ParallelScanner is not the only thing you would need to change. More importantly, the changes you need to make are in DynamoDBDelegate (that is the class that makes the low-level DynamoDB API calls).
Regarding implicit caching, you could add a caching layer on top of DynamoDB. For example, you could implement a cache using API Gateway on top of DynamoDB, or you could use ElastiCache. Either way, you need to figure out a way to invalidate Query/Scan pages. Inserting or deleting items causes page boundaries to change, so this requires some thought.
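To make the page-invalidation problem concrete, here is a minimal sketch in Python (not the Titan/DynamoDB library's code). `FakeStore` and `PageCache` are hypothetical names, an in-memory dict stands in for both the table and the shared cache, and the sketch sidesteps shifting page boundaries by dropping every cached page under a hash key whenever that key is written.

```python
from collections import defaultdict


class FakeStore:
    """Hypothetical stand-in for the backing table: hash key -> {range key: value}."""

    def __init__(self):
        self.rows = defaultdict(dict)
        self.query_calls = 0  # counts reads that actually reach the store

    def put(self, hash_key, range_key, value):
        self.rows[hash_key][range_key] = value

    def delete(self, hash_key, range_key):
        self.rows[hash_key].pop(range_key, None)

    def query(self, hash_key, start, limit):
        """Return one page: up to `limit` (range_key, value) pairs at or after `start`."""
        self.query_calls += 1
        keys = sorted(k for k in self.rows[hash_key] if k >= start)
        return [(k, self.rows[hash_key][k]) for k in keys[:limit]]


class PageCache:
    """Read-through / write-through cache of query pages in front of a store."""

    def __init__(self, store):
        self.store = store
        self.pages = defaultdict(dict)  # hash key -> {(start, limit): page}

    def query(self, hash_key, start="", limit=2):
        cached = self.pages[hash_key]
        key = (start, limit)
        if key not in cached:
            # Cache miss: read through to the store and remember the page.
            cached[key] = self.store.query(hash_key, start, limit)
        return cached[key]

    def put(self, hash_key, range_key, value):
        # Write through, then drop all pages under this hash key,
        # because an insert can shift every later page boundary.
        self.store.put(hash_key, range_key, value)
        self.pages.pop(hash_key, None)

    def delete(self, hash_key, range_key):
        self.store.delete(hash_key, range_key)
        self.pages.pop(hash_key, None)
```

Invalidating per hash key is coarse but easy to reason about. Anything finer-grained would have to work out which cached pages a given insert or delete actually overlaps.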
Explicit caching may be easier than implicit caching. The level of abstraction is higher, so based on your incoming writes it may be easier to decide at the application level whether a cached traversal needs to be invalidated. If you treat your graph application as a service, you could cache the results at the service level.
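As a rough illustration of service-level caching, the Python sketch below caches traversal results and records which vertices each result depended on, so a write to a vertex invalidates only the traversals that read it. `TraversalCache`, its method names, and the convention that the caller reports the touched vertex ids are all assumptions made for this example, not part of Titan or Gremlin.

```python
class TraversalCache:
    """Caches traversal results and invalidates them by the vertices they read."""

    def __init__(self):
        self.results = {}     # traversal key -> cached result
        self.dependents = {}  # vertex id -> set of traversal keys that read it

    def get_or_compute(self, key, compute):
        """Return the cached result for `key`, running `compute` on a miss.

        `compute` must return (result, iterable of vertex ids it touched).
        """
        if key not in self.results:
            result, touched = compute()
            self.results[key] = result
            for vertex_id in touched:
                self.dependents.setdefault(vertex_id, set()).add(key)
        return self.results[key]

    def on_write(self, vertex_id):
        """Call after any write to a vertex or to one of its edges."""
        for key in self.dependents.pop(vertex_id, set()):
            self.results.pop(key, None)
```

For an edge write, the application would call `on_write` for both endpoints. This keeps the invalidation decision at the application level, where the meaning of each write is known.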
Something in between may also be possible (but requires some work). You could continue to use the vertex/database caches provided by Titan, and use a low TTL value that is consistent with how frequently you write columns. Or, you could take your caching approach a step further and do the following.
- Enable a DynamoDB Stream on edgestore.
- Use a Lambda function to stream the edgestore updates to a Kinesis stream.
- Consume the Kinesis stream of edgestore updates in the same JVM as the Gremlin Server, on each of your Gremlin Server instances. You would need to instrument the database-level cache in Titan to consume the Kinesis stream and invalidate cached columns as appropriate, in each Titan instance.
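The two in-between ideas above (a short TTL, plus invalidation driven by a stream of edgestore updates) can be sketched together in Python. This only illustrates the cache-side logic: `ColumnCache` is a hypothetical name, the event shape `{"row_key": ...}` is an assumption and not the actual DynamoDB Streams or Kinesis record format, and the consumer that would feed `apply_change_events` is left out.

```python
import time


class ColumnCache:
    """Instance-local cache whose entries expire after a TTL or on a change event."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self.entries = {}   # row key -> (expires_at, columns)

    def get(self, row_key, load):
        """Return cached columns for `row_key`, calling `load(row_key)` on a miss or expiry."""
        now = self.clock()
        entry = self.entries.get(row_key)
        if entry is not None and entry[0] > now:
            return entry[1]
        columns = load(row_key)
        self.entries[row_key] = (now + self.ttl, columns)
        return columns

    def apply_change_events(self, events):
        """Drop every cached row named in a batch of events; return how many were dropped."""
        dropped = 0
        for event in events:
            if self.entries.pop(event["row_key"], None) is not None:
                dropped += 1
        return dropped
```

With the stream in place, the TTL becomes a safety net that bounds staleness if an event is delayed or lost, so it would not need to be as aggressive as in the TTL-only variant.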
