I recently spent few days optimizing performance of slate data reads for an important application before its production roll-out and found few useful hacks to get much lower latency:
- For postgres queries doing COUNT DISTINCT on text columns, we found that replacing
COUNT(DISTINCT text_column)
withCOUNT(DISTINCT text_column::text collate "C")
achieves a ~5x speed-up for queries that have to scan 5-10M rows. This is because C-string comparison functions are much faster than UTF-8 strings used by default, of course in some cases it’s going to impact accuracy, but in our case the queries were dealing with ASCII strings anyway - Many developers I’ve been working with are not aware how to set up indexes when configuring postgres syncs - my rule of thumb is that whenever I
GROUP BY column_a, column_b
in Slate, I set up index oncolumn_a, column_b
in the Postgres Sync for the same table. But when you have 2 queries for the same table, one grouping oncolumn_a, column_b
and the other oncolumn_b, column_c
, you need 2 separate indexes (index1:column_a, column_b
, index2:column_b, column_c
instead of one index oncolumn_a, column_b, column_c
to achieve peak performance - For functions on objects, using
async
functions and wrapping multiple queries inPromise.all(...)
to run them in parallel is also pretty awesome, it requires some code refactoring but I’ve seen this achieve 5-10x speed-ups in some functions when parallelism is near-perfect
What are some other cool performance hacks people should know about?