Quick hacks to improve query/function performance in Slate

I recently spent few days optimizing performance of slate data reads for an important application before its production roll-out and found few useful hacks to get much lower latency:

  1. For postgres queries doing COUNT DISTINCT on text columns, we found that replacing COUNT(DISTINCT text_column) with COUNT(DISTINCT text_column::text collate "C") achieves a ~5x speed-up for queries that have to scan 5-10M rows. This is because C-string comparison functions are much faster than UTF-8 strings used by default, of course in some cases it’s going to impact accuracy, but in our case the queries were dealing with ASCII strings anyway
  2. Many developers I’ve been working with are not aware how to set up indexes when configuring postgres syncs - my rule of thumb is that whenever I GROUP BY column_a, column_b in Slate, I set up index on column_a, column_b in the Postgres Sync for the same table. But when you have 2 queries for the same table, one grouping on column_a, column_b and the other on column_b, column_c, you need 2 separate indexes (index1: column_a, column_b, index2: column_b, column_c instead of one index on column_a, column_b, column_c to achieve peak performance
  3. For functions on objects, using async functions and wrapping multiple queries in Promise.all(...) to run them in parallel is also pretty awesome, it requires some code refactoring but I’ve seen this achieve 5-10x speed-ups in some functions when parallelism is near-perfect

What are some other cool performance hacks people should know about?

6 Likes

Thanks for sharing these tips!

You can find some more suggestions on the Slate documentation: