I have tables called Movie, Genre and Keyword, from which I have created a view called 'genkeyword'. The view 'genkeyword' has a lot of tuples, so it can be accessed at DB Fiddle.
I have the following query:
SELECT title, 
       year, 
       Count(DISTINCT genre)   AS genre_freq, 
       Count(DISTINCT keyword) AS keyword_freq 
FROM   genkeyword 
WHERE  ( genre IN (SELECT genre 
                   FROM   genkeyword 
                   WHERE  title = 'Harry Potter and the  Deathly Hallows') 
          OR keyword IN (SELECT keyword 
                         FROM   genkeyword 
                         WHERE  title = 'Harry Potter and the  Deathly Hallows') ) 
       AND title <> 'Harry Potter and the Deathly Hallows' 
GROUP  BY title, 
          year 
ORDER  BY genre_freq DESC, 
          keyword_freq DESC; 
What I am intending to do with this query is to get the genre and keyword frequency for each movie that has genres and keywords that are in common with Harry Potter: The output should be:
title                      |      genre_freq    |    keyword_freq
Cinderella                        2                        2
The Shape of Water                2                        1
How to Train Your Dragon          2                        0
Enchanted                         1                        3
I know that the query is not correct, since the I get the following output instead:
    title                      |      genre_freq    |    keyword_freq
    The Shape of Water                4                  3       
    Enchanted                         3                  4
    Cinderella                        2                  5
    How to Train Your Dragon          2                  3              
However, I would like to clarify my understanding about how the query works.
In the 'where' clause of my query:
where (genre in (select genre from genkeyword where title='Harry Potter') or 
keyword in (select keyword from genkeyword where title='Harry Potter')) 
Am I right in saying that there are two result sets generated, one containing all the tuples which have a genre that is in Harry Potter (let this be R1) and the other containing all the tuples that have a keyword that is in Harry Potter (let this be R2)?
If the tuple under consideration contains a genre that is in the genre result set R1 or a keyword that is in the keyword result set R2, then the genre/keyword is counted. I am not sure how count(distinct genre) and count(distinct keyword) works in this case. If the tuple contains a genre that is in R1, is only the genre counted or is the keyword counted as well? This is the same for the case when the tuple contains a keyword in R2, is the genre counted as well as the keyword?
I don't understand why I am getting the genre_freq and keyword_freq values wrong from my query. This is because I don't fully understand how the genre and keyword frequencies are getting counted behind-the-scenes. Any insights are appreciated.
 
     
     
     
    