I'm really bad at SQL and I would like to know what SQL I can run to solve the problem below which I suspect to be a NP-Complete problem but I'm ok with the query taking a long time to run over large datasets as this will be done as a background task. A standard sql statement is preferred but if a stored procedure is required then so be it. The SQL is required to run on Postgres 9.3.
Problem: Given a set of articles that contain a set of keywords, find the top n articles for each article that contains the most number of matching keywords.
A trimmed down version of the article table looks like this:
CREATE TABLE article (
  id character varying(36) NOT NULL,  -- primary key of article
  keywords character varying,         -- comma separated set of keywords
  CONSTRAINT pk_article PRIMARY KEY (id)
);
-- Test Data
INSERT INTO article(id, keywords) VALUES(0, 'red,green,blue');
INSERT INTO article(id, keywords) VALUES(1, 'red,green,yellow');
INSERT INTO article(id, keywords) VALUES(2, 'purple,orange,blue');
INSERT INTO article(id, keywords) VALUES(3, 'lime,violet,ruby,teal');
INSERT INTO article(id, keywords) VALUES(4, 'red,green,blue,yellow');
INSERT INTO article(id, keywords) VALUES(5, 'yellow,brown,black');
INSERT INTO article(id, keywords) VALUES(6, 'black,white,blue');
Which would result in this for a SELECT * FROM article; query:
Table: article
------------------------
id  keywords            
------------------------
0   red,green,blue      
1   red,green,yellow    
2   purple,orange,blue  
3   lime,violet,ruby,teal
4   red,green,blue,yellow
5   yellow,brown,black
6   black,white,blue
Assuming I want to find the top 3 articles for each article that contains the most number of matching keywords then the output should be this:
------------------------
id  related
------------------------
0   4,1,6
1   4,0,5
2   0,4,6
3   null
4   0,1,6
5   1,6
6   5,0,4
 
     
    