sorting - Is there an efficient way to sort the results of a join in RethinkDB? -
in rethinkdb, need perform join between 2 tables (representing has-and-belongs-to-many relationship) , sort results of join. there hundreds of thousands or millions of results, need sort them efficiently.
ideally, use orderby() index. orderby() can use index when called on table, , .eqjoin() returns stream or array.
here’s example of query i’m working with. want conversations have given topic:
r.table('conversations_topics') .getall('c64a00d3-1b02-4045-88e7-ac3b4fee478f', {index: 'topics_id'}) .eqjoin('conversations_id', r.table('conversations')) .map(row => row('right')) .orderby('createdat') the unindexed orderby() used here starts getting unacceptably slow when topic contains few thousand conversations, , break @ 100,000 due rethinkdb's array size limit. topics in db contain many hundreds of thousands, or millions, of conversations, unacceptable.
i need query return small number of results @ time (say 25), need results in order, can’t limit until after sort. ideas?
i think way remove conversations_topics , embed topic data conversations table. can create compound index, use between filter , order @ same time.
r.table('conversations').indexcreate('topicanddate', function(doc) { return doc('topics') .map(function(topic) { return [topic, doc('createdat')] }) .coerceto('array') }, {multi: true}) then can use sth query:
r.table('conversations').between([('c64a00d3-1b02-4045-88e7-ac3b4fee478f', r.minval], [('c64a00d3-1b02-4045-88e7-ac3b4fee478f', r.maxval], {index: 'topicanddate'}) .orderby({index: r.desc('topicanddate')}) .limit(25) the key here use same index both of orderby , between. if know time range, can make faster setting time value in between command instead of using minval , maxval.
hopefully faster.
Comments
Post a Comment