I have the following table in Postgresql 8.4.12:
           Table "public.ratings"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 userid | character varying(128) |
 item   | character varying(128) |
 score  | integer                |
Indexes:
    "ratings_item" btree (item)
    "ratings_ui" btree (userid, item)
    "ratings_userid" btree (userid)
I would like to perform a self join to find items rated by all users that rated a specific item. For simplicity sake, I'll use a query to just get the number of ratings for each suspect similar item like so;
select r2.item,sum(1)
from ratings r1
left join ratings r2 using (userid)
where r1.item='an3.php'
group by r2.item
The query works but for my table with 36 million records, it takes forever. When I explain the statement, I get the following:
 GroupAggregate  (cost=8102958.42..8247621.18 rows=16978 width=17)    ->  Sort  (cost=8102958.42..8151108.60 rows=19260072 width=17)
         Sort Key: r2.item
         ->  Hash Left Join  (cost=1458652.29..4192647.43 rows=19260072 width=17)
               Hash Cond: ((r1.userid)::text = (r2.userid)::text)
               ->  Bitmap Heap Scan on ratings r1  (cost=868.20..77197.24 rows=24509 width
=22)
                     Recheck Cond: ((item)::text = 'an3.php'::text)
                     ->  Bitmap Index Scan on ratings_item  (cost=0.00..862.07 rows=24509 width=0)
                           Index Cond: ((item)::text = 'an3.php'::text)
               ->  Hash  (cost=711028.93..711028.93 rows=36763293 width=39)
                     ->  Seq Scan on ratings r2  (cost=0.00..711028.93 rows=36763293 width
=39)
From experience, I assume the "Seq Scan on ratings r2" is the culprit.
On another note, if I search for an item that doesn't exist:
select r2.item,sum(1) from ratings r1 left join ratings r2 using (userid)
where r1.item='targetitem' group by r2.item;
It appears to work fine (ie no results are returned and it is immediate)
GroupAggregate  (cost=2235887.19..2248234.70 rows=16978 width=17)    ->  Sort  (cost=2235887.19..2239932.29 rows=1618038 width=17)
         Sort Key: r2.item
         ->  Nested Loop Left Join  (cost=0.00..1969469.94 rows=1618038 width=17)
               ->  Index Scan using ratings_item on ratings r1  (cost=0.00..8317.74 rows=2 059 width=22)
                     Index Cond: ((item)::text = 'targetitem'::text)
               ->  Index Scan using ratings_userid on ratings r2  (cost=0.00..947.24 rows= 419 width=39)
                     Index Cond: ((r1.userid)::text = (r2.userid)::text) 
The same table and query works fine in MySQL but I'm not in a position to migrate my recommender system to another database.
Did I do something wrong or is this something with Postgres? Is there a work around?
 
     
    