EDIT 3 - MySQL version is 8.0.33.
EDIT 2 - See final working code at the bottom. Thanks @Akina !
I have a table of scores for a sporting event. The table has three relevant fields that I want to select -
scoreIDas a primary key valueclassifierIDthat maps to the primary key of another table which has details about particular course layoutscalculatedPercentthat is the outcome of a particular event
The table also has three other fields that I use in a WHERE clause but those are incidental.
I need to generate a query that selects the four best values for calculatedPercent with the stipulation that no classifierID can be duplicated. I need to be able to capture the scoreID for use in a later stage of the process.
Here was my first query:
SELECT `masterScores`.`scoreID`, `masterScores`.`classifierID`, `masterScores`.`calculatedPercent`
FROM `masterScores`
WHERE `masterScores`.`memberID` = 3516 AND `masterScores`.`eventDivision` = "O" AND `masterScores`.`scoreUnusable` != "TRUE"
ORDER BY `masterScores`.`calculatedPercent` DESC LIMIT 4
Initially I thought that this was great as it did select the scoreID value for the rows that had the highest calculatedPercent values for a given member. Then I noticed that several members had their first and second highest scores on the same course, which violates the requirement that classifierID values not be duplicated.
I played around with SELECT DISTINCT for a bit but eventually realized that what I really needed was GROUP BY so I did some research and found Error related to only_full_group_by when executing a query in MySql but that didn't completely solve my issue.
I next tried:
SELECT `masterScores`.`scoreID`, `masterScores`.`classifierID`, MAX(`masterScores`.`calculatedPercent`) AS bestPercent
FROM `masterScores`
WHERE `masterScores`.`memberID` = 3516 AND `masterScores`.`eventDivision` = "O" AND `masterScores`.`scoreUnusable` != "TRUE"
GROUP BY `masterScores`.`classifierID`
ORDER BY bestPercent DESC LIMIT 4
This me the following error message:
#1055 - Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column '.masterScores.calculatedPercent' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
I considered using MIN and MAX for the masterScores.scoreID column but it wasn't matching expectations; the scoreID primary key value wasn't always the one for the selected calculatedPercent. I read somewhere that, because scoreID is a primary key, I could fix this by using the ANY_VALUE aggregate. I tried this:
SELECT ANY_VALUE(`masterScores`.`scoreID`), `masterScores`.`classifierID`, MAX(`masterScores`.`calculatedPercent`) AS bestPercent
FROM `masterScores`
WHERE `masterScores`.`memberID` = 3516 AND `masterScores`.`eventDivision` = "O" AND `masterScores`.`scoreUnusable` != "TRUE"
GROUP BY `masterScores`.`classifierID`
ORDER BY bestPercent DESC LIMIT 4
At first glance, this did seem to be working but it wasn't consistently returning scoreID values that matched the bestPercent values.
Again, the objective is to:
Only select 1
calculatedPercentand 1scoreIDvalue for eachclassifierID, subject to the indicated WHERE clauses. If not grouped byclassifierIDthere could be anywhere from 0 to 400 rows for eachclassifierIDthat satisfy the WHERE clause so I do think GROUP BY is appropriate here.Ensure that the selected
calculatedPercentfor each of the groupedclassifierIDis the highest numerical value among all optionsEnsure that only 4 rows are selected and that they are the rows with for which the selected
calculatedPercentvalue is highest.Ensure that the 4 selected rows are ranked in descending order according to the
calculatedPercentvalues.Ensure that the
scoreIDvalue for each of the selected rows is actually representing the same row as the selectedcalculatedPercent(currently, this is the point at which my query is failing).
Here's a subset of data, as a example:
| scoreID | classifierID | bestPercent |
|---|---|---|
| 58007 | 42 | 66.60 |
| 63882 | 42 | 64.69 |
| 64685 | 54 | 64.31 |
| 58533 | 32 | 63.20 |
| 55867 | 42 | 62.28 |
| 66649 | 7 | 56.79 |
| 55392 | 12 | 50.28 |
| 58226 | 1 | 49.52 |
| 55349 | 7 | 41.10 |
Here's the desired output when I run the query:
| scoreID | classifierID | bestPercent |
|---|---|---|
| 58007 | 42 | 66.60 |
| 64685 | 54 | 64.31 |
| 58533 | 32 | 63.20 |
| 66649 | 7 | 56.79 |
Here's the actual output when I run the query:
| scoreID | classifierID | bestPercent |
|---|---|---|
| 55867 | 42 | 66.60 |
| 64685 | 54 | 64.31 |
| 58533 | 32 | 63.20 |
| 55349 | 7 | 56.79 |
As shown, the scoreID values for the first and fourth rows of actual output are not correct.
At this point, I welcome any advice.
EDIT 2 - Final working solution
WITH cte AS (
SELECT scoreID, classifierID, calculatedPercent AS bestPercent,
ROW_NUMBER() OVER (PARTITION BY classifierID ORDER BY calculatedPercent DESC, scoreID DESC) AS rn
FROM masterScores WHERE memberID = 3516 AND eventDivision = "O" AND scoreUnusable != "TRUE"
)
SELECT scoreID, classifierID, bestPercent
FROM cte
WHERE rn = 1
ORDER BY bestPercent DESC
LIMIT 4
I was able to test this against a half-dozen cases that were giving problems and this solution solved every one of them. Again, thanks @Akina !
Going to mark this solved.