I recently spent few days optimizing performance of slate data reads for an important application before its production roll-out and found few useful hacks to get much lower latency:
- For postgres queries doing COUNT DISTINCT on text columns, we found that replacing
COUNT(DISTINCT text_column)withCOUNT(DISTINCT text_column::text collate "C")achieves a ~5x speed-up for queries that have to scan 5-10M rows. This is because C-string comparison functions are much faster than UTF-8 strings used by default, of course in some cases it’s going to impact accuracy, but in our case the queries were dealing with ASCII strings anyway - Many developers I’ve been working with are not aware how to set up indexes when configuring postgres syncs - my rule of thumb is that whenever I
GROUP BY column_a, column_bin Slate, I set up index oncolumn_a, column_bin the Postgres Sync for the same table. But when you have 2 queries for the same table, one grouping oncolumn_a, column_band the other oncolumn_b, column_c, you need 2 separate indexes (index1:column_a, column_b, index2:column_b, column_cinstead of one index oncolumn_a, column_b, column_cto achieve peak performance - For functions on objects, using
asyncfunctions and wrapping multiple queries inPromise.all(...)to run them in parallel is also pretty awesome, it requires some code refactoring but I’ve seen this achieve 5-10x speed-ups in some functions when parallelism is near-perfect
What are some other cool performance hacks people should know about?