After having endlessly tried to handle large (3-35gb) csv files in R, I have moved over to SQL for handling these datasets. So I am using this code within an R environment (using the using the SQlite based RSQLite package) but it should not detract from my SQL question!
My Question: How do I make a selection of one table based on matching values given in another table?
I would Like to explain by example. I have the following table format:
"Data" Table
Symbol| Value| EX
A  | 1  | N       
A  | 1  | N     
A  | 2  | T  
A  | 3  | N  
A  | 4  | N  
A  | 5  | N  
B  | 1  | P       
B  | 2  | P  
B  | 2  | N  
B  | 2  | N  
B  | 3  | P  
B  | 5  | P  
B  | 6  | T  
... 
I want to select all entries for which the symbol and exchange value matches according to a certain condition given in the example table below.
"Symbolexchange" Table:
Ticker| Exchange
A  | N       
B  | P  
... 
(Note that symbol and ticker refer to the same attribute, also EX and Exchange refer to the same attribute)
So the output I am aiming for is so that it only keeps A entries given exchange is N etc:
Symbol| Value| EX
A  | 1  | N       
A  | 1  | N     
A  | 3  | N  
A  | 4  | N  
A  | 5  | N  
B  | 1  | P       
B  | 2  | P  
B  | 3  | P  
B  | 5  | P  
... 
I was able to do this via two methods, although I am not quite satisfied with them.
This method adds the reference table in columns next to the original table, which is redundant.
SELECT *
FROM Data
INNER JOIN Symbolexchange 
ON Data.EX=Symbolexchange.EXCHANGE
AND Data.SYMBOL=Symbolexchange.TICKER
This Method also gets the job done directly but is slower than the above.
SELECT *
FROM Data
WHERE EX=(SELECT exchange FROM Symbolexchange WHERE ticker = SYMBOL)
Is there a better and faster way way to program this? Speed is quite important because of the size of my data sets. Any other comments on my code welcome!
Thanks