I need to select several rows per each distinct record. Pretty much something like asked here Select first n records for each distinct ID in SQL Server 2008, although I use MySQL.
The purpose can be achieved by running 21 queries in this case: 1 general and 20 to get the subrecords, i.e. something like this:
SELECT DISTINCT `user_id`
FROM `posts`
WHERE `deleted` = '0'
ORDER BY `user_id` ASC
LIMIT 20
... to select all rows needed, and then
SELECT *
FROM `posts`
WHERE `deleted` = '0'
AND `user_id` = ?
ORDER BY `id` DESC
LIMIT 5
... in a loop per each row selected by the first query.
Basically, I sort of need to get 5 posts of each user. I need this done in a single query. The posts setup is just for example, I made this up so it's, hopefully, easier to understand what I need.
I started with the following query:
SELECT *
FROM `posts`
WHERE `user_id`
IN (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4
)
LIMIT 5
But I get the #1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' error.
So I've tried the JOIN idea like suggested here:
SELECT posts.id,
posts.user_id,
NULL
FROM (
SELECT posts.user_id
FROM posts
WHERE posts.deleted = 0
LIMIT 20
) q
JOIN posts
ON posts.user_id = q.user_id
I've also tried several nested queries as suggested here:
SELECT *
FROM posts
WHERE user_id IN (
SELECT * FROM (
SELECT user_id
FROM posts
LIMIT 20
)
as t);
And the other solutions found on the Internet. But they either do not work or just simply select the first N rows from the database (regardless of the conditions and joins for some reason). Tried LEFT JOIN, RIGHT JOIN, even INNER JOIN, but still no success.
Please help.
UPDATE Forgot to mention that the table is around 5GB in size.
UPDATE Tried the sub-sub query:
SELECT *
FROM `posts`
WHERE
`user_id` IN ( SELECT `user_id` FROM (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4 ) limit_users
)
LIMIT 5
Same as above, it returns the following:
+----+---------+------+
| id | user_id | post |
+----+---------+------+
| 1 | 1 | a |
+----+---------+------+
| 2 | 1 | b |
+----+---------+------+
| 3 | 1 | c |
+----+---------+------+
| .. | .. | .. |
I.e. the 5 (which is what the outer LIMIT is set to) rows for the same user. The weird thing is that if I run the sub and sub-sub query alone:
SELECT `user_id` FROM (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4 ) limit_users
I get the 4 distinct values:
+---------+
| user_id |
+---------+
| 1 |
+---------+
| 2 |
+---------+
| 3 |
+---------+
| 4 |
+---------+