The Problem
We have a relational table where we store user activity. A query like the following takes 77 seconds!
FROM "site_activity"
WHERE
    (
        NOT "site_activity"."is_deleted"
        AND "site_activity"."user_id" = 68812389
        AND NOT (
            "site_activity"."kind" IN (
                'updated',
                'duplicated',
                'reapplied'
            )
        )
        AND NOT (
            "site_activity"."content_type_id" = 14
            AND "site_activity"."kind" = 'created'
        )
    )
ORDER BY
    "site_activity"."created_at" DESC,
    "site_activity"."id" DESC
LIMIT  9;
The query plan looks like this
                                     QUERY PLAN
--------------------------------------------------------------------------------------------
Limit
    (cost=17750.72..27225.75 rows=9 width=16)
    (actual time=199501.336..199501.338 rows=9 loops=1)
  Output: id, created_at
  Buffers: shared hit=4502362 read=693523 written=37273
  I/O Timings: read=190288.205 write=446.870
  ->  Incremental Sort
      (cost=17750.72..2003433582.97 rows=1902974 width=16)
      (actual time=199501.335..199501.336 rows=9 loops=1)
        Output: id, created_at
        Sort Key: site_activity.created_at DESC, site_activity.id DESC
        Presorted Key: site_activity.created_at
        Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
        Buffers: shared hit=4502362 read=693523 written=37273
        I/O Timings: read=190288.205 write=446.870
        ->  Index Scan Backward using site_activity_created_at_company_id_idx on public.site_activity
            (cost=0.58..2003345645.30 rows=1902974 width=16)
            (actual time=198971.283..199501.285 rows=10 loops=1)
              Output: id, created_at
              Filter: (
                (NOT site_activity.is_deleted) AND (site_activity.user_id = 68812389)
                AND ((site_activity.kind)::text <> ALL ('{updated,duplicated,reapplied}'::text[]))
                AND ((site_activity.content_type_id <> 14) OR ((site_activity.kind)::text <> 'created'::text))
              )
              Rows Removed by Filter: 14735308
              Buffers: shared hit=4502353 read=693523 written=37273
              I/O Timings: read=190288.205 write=446.870
Settings: effective_cache_size = '261200880kB',
          effective_io_concurrency = '400',
          jit = 'off',
          max_parallel_workers = '24',
          random_page_cost = '1.5',
          work_mem = '64MB'
Planning:
  Buffers: shared hit=344
Planning Time: 6.429 ms
Execution Time: 199501.365 ms
(22 rows)
Time: 199691.997 ms (03:19.692)
Table Facts
It contains a little more than 4 billion rows.
The table structure is
Table "public.site_activity" Column | Type | Collation | Nullable | Default ----------------+--------------------------+-----------+----------+---------------------------------------------- id | bigint | | not null | nextval('site_activity_id_seq'::regclass) created_at | timestamp with time zone | | not null | modified_at | timestamp with time zone | | not null | is_deleted | boolean | | not null | object_id | bigint | | not null | kind | character varying(32) | | not null | context | text | | not null | company_id | integer | | not null | content_type_id | integer | | not null | user_id | integer | | | Indexes: "site_activity_pkey" PRIMARY KEY, btree (id) "site_activity_modified_at_idx" btree (modified_at) "site_activity_company_id_idx" btree (company_id) "site_activity_created_at_company_id_idx" btree (created_at, company_id) "site_activity_object_id_idx" btree (object_id) "site_activity_content_type_id_idx" btree (content_type_id) "site_activity_kind_idx" btree (kind) "site_activity_kind_idx1" btree (kind varchar_pattern_ops) "site_activity_user_id_idx" btree (user_id) Foreign-key constraints: "site_activity_company_id_fk_site_company_id" FOREIGN KEY (company_id) REFERENCES site_company(id) DEFERRABLE INITIALLY DEFERRED "site_activity_content_type_id_fk_django_co" FOREIGN KEY (content_type_id) REFERENCES django_content_type(id) DEFERRABLE INITIALLY DEFERRED "site_activity_user_id_fk_site_user_id" FOREIGN KEY (user_id) REFERENCES site_user(id) DEFERRABLE INITIALLY DEFERREDa.
kindis treated as anenum. In db we store it as varchar. But in the application (python) we treat it as enum. So the values are fixed. There are around 100 values in it.b.
content_type_idhas around 80 values.This is the distribution of values,
a.
contextis actually JSON with a max 8Mb size.a. 3
content_type_idvalues holds 92% of the rows. They are 14 and 19.a. 3
kindconsumes 75% rows. These arecreated,updatedandsent.a. The combination of
kindandcontent_type_idcreates 460 values. Among them, 1 combination contains 35% of rows and we exclude them in the query all time.The replica instance has type
db.r5.12xlarge. 24 cores, 48 vCPUs, 384GB Mem, storage type io1.
Question
- How do we handle if the table grows to 100 billion? In the current projection, this can happen in the next 3-5 years.
 - Is NoSQL a good solution? Note we are not accessing the documents with only id or kind.
 
Notes
- The facts that I presented might bias the solution to replication in the same host and then later sharding over multiple hosts. But if there is some other solution that can keep up to the 100 billion mark, we should be good.
 - We don't have to use AWS. But preferred.