I have three related tables:
- Modules
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| module_id   | int(11)     | NO   | PRI | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
- Events
+------------------+--------------------------+------+-----+---------+----------------+
| Field            | Type                     | Null | Key | Default | Extra          |
+------------------+--------------------------+------+-----+---------+----------------+
| event_id         | int(11)                  | NO   | PRI | NULL    | auto_increment |
| event_time       | datetime(4)              | NO   |     | NULL    |                |
| module_id        | int(11)                  | NO   | MUL | NULL    |                |
| file_id          | int(11)                  | YES  | MUL | NULL    |                |
+------------------+--------------------------+------+-----+---------+----------------+
- Files
+--------------+-----------------------------+------+-----+---------+----------------+
| Field        | Type                        | Null | Key | Default | Extra          |
+--------------+-----------------------------+------+-----+---------+----------------+
| file_id      | int(11)                     | NO   | PRI | NULL    | auto_increment |
| path         | varchar(512)                | NO   | UNI | NULL    |                |
+--------------+-----------------------------+------+-----+---------+----------------+
So, there are Modules, Events and Files. (Unused fields are trimmed from the tables to simplify).
Target: I want to get the latest event that happened on each module and it's file path.
What I tried: So, for this, at first I created a naive implementation on Django using subquery:
last_event_subquery = Event.objects.filter(
        module_id__module_id=OuterRef('module__id')
    ).order_by('-event_time', '-event_id')
modules = Module.objects.all().annotate(
        last_event_path=Subquery(last_event_subquery.values('file_id__path')[:1])
    ).annotate(
        last_event_id=Subquery(last_event_subquery.values('event_id')[:1])
    ).annotate(
        last_event_datetime=Subquery(last_event_subquery.values('event_time')[:1])
    )
But, I discovered that running this over 1 million records in the events table is blatantly slow. Of course, there are several indexes over there to optimize all the thing, but even with that I couldn't find a combination of indexes that takes less than 5 seconds to run, which is too much imo. Then, I saw the reason, the equivalent SQL query is too silly:
SELECT `module`.`module_id`,
       (SELECT U2.`path` FROM `events` U0 LEFT OUTER JOIN `files` U2 ON (U0.`file_id` = U2.`file_id`)
       WHERE U0.`module_id` = (`modules`.`module_id`) ORDER BY U0.`event_time` DESC, U0.`event_id` DESC  LIMIT 1)
       AS `last_event_path`,
       (SELECT U0.`event_id` FROM `events` U0
       WHERE U0.`module_id` = (`modules`.`module_id`) ORDER BY U0.`event_time` DESC, U0.`event_id` DESC  LIMIT 1)
       AS `last_event_id`,
       (SELECT U0.`event_time` FROM `events` U0
       WHERE U0.`module_id` = (`modules`.`module_id`) ORDER BY U0.`event_time` DESC, U0.`event_id` DESC  LIMIT 1)
       AS `last_event_time` FROM `events`
As you can see, it's repeating the subquery three times.
So, I decided to give it a try at the best I could do in SQL, and I struggled to get the following working:
SELECT module.module_id,
       events.event_id,
       events.event_time,
       files.path
       FROM modules INNER JOIN events ON events.event_id =
           (SELECT events.event_id FROM events
            WHERE modules.module_id = events.module_id
                ORDER BY events.event_time DESC, events.event_id DESC LIMIT 1)
       INNER JOIN files ON files.file_id = events.file_id;
This runs in 0.001s. So, the problem now is that I can't get this done in the Django ORM language. Of course I could just place the raw SQL query and I would be done, but how would I live with such a disgrace?
I have investigated throughout the Django docs, struggled all over the stackoverflow questions, but I couldn't find the answer. The nearest I got was this, but the problem is that I can't get that limited to one result per module.
I have also tried the FilteredRelation, but couldn't get an appropriate filter. I can't use select_related() neither, because it's a reverse relation with a ForeignKey. I can't either use distinct() with a column field, because I'm using MySQL (more specifically, MariaDB version 10.3).
Do you have any advice on how to approach this?
Thank you!
