I have two tables, store_customers and additional_customers.
- Each - store_idin the- store_customerstable must be unique
- Each - store_id, customer_idin the- additional_customerstable must be unique.
- Each - store_idin the- additional_customerstable must exist in the- store_customerstable- CREATE TABLE store_customers ( autoinc int(10) unsigned NOT NULL AUTO_INCREMENT, store_id varchar(50) NOT NULL, customer_id varchar(50) NOT NULL, datetime datetime NOT NULL, field_1 varchar(50) DEFAULT NULL, field_2 varchar(50) DEFAULT NULL, PRIMARY KEY ('autoinc'), UNIQUE KEY 'store_id' ('store_id'), KEY 'customer_number' ('customer_number') ); INSERT INTO store_customers (store_id, customer_id, datetime, field_1, field_2) VALUES ('100', '10', '2011-01-01', 'aaa', 'bbb'), ('200', '20', '2012-01-01', 'ccc', 'ddd'), ('300', '20', '2013-01-01', 'eee', 'fff'), ('400', '40', '2014-01-01', 'ggg', 'hhh'), ('500', '50', '2015-01-01', 'iii', 'jjj'), ('600', '50', '2016-01-01', 'kkk', 'lll'), ('700', '70', '2017-01-01', 'mmm', 'nnn'), ('800', '70', '2018-01-01', 'ooo', 'ppp'), ('900', '90', '2019-01-01', 'qqq', 'rrr'); CREATE TABLE additional_customers ( store_id varchar(50) NOT NULL customer_id varchar(50) NOT NULL ); INSERT INTO store_customers (store_id, customer_id) VALUES ('400', '41'), ('400', '42'), ('500', '51'), ('500', '52'), ('700', '71'), ('700', '72'), ('800', '81'), ('800', '82'), ('900', '70');
To make things easier to read:
   SELECT * FROM store_customers;
   store_id  customer_id  datetime     field_1  field_2
   100       10           2011-01-01   aaa      bbb
   200       20           2012-01-01   ccc      ddd
   300       20           2013-01-01   eee      fff
   400       40           2014-01-01   ggg      hhh
   500       50           2015-01-01   iii      jjj
   600       50           2016-01-01   kkk      lll
   700       70           2017-01-01   mmm      nnn
   800       70           2018-01-01   ooo      ppp
   900       90           2019-01-01   qqq      rrr
   -------------------------------------------------------
   SELECT * FROM additional_customers;
   store_id  customer_id
   400       41
   400       42
   500       51
   500       52
   700       71
   700       72
   800       81
   800       82
   900       70
   
For each distinct customer_id, I need to find the matching record in store_customers with the earliest datetime.
For example, customer_id = 70 appears in store_customers twice, tied to two different store_ids (700, 800) - and in additional_customers once, with a different store_id (900) than the records in the prior table. For that customer_id, the earliest datetime is store_id = 700, so I get one row for that customer_id with matching data from store_customers WHERE store_id = 700.
My end result would be:
store_id  customer_id  datetime     field_1  field_2
100       10           2011-01-01   aaa      bbb           
200       20           2012-01-01   ccc      ddd
400       40           2014-01-01   ggg      hhh
400       41           2014-01-01   ggg      hhh
400       42           2014-01-01   ggg      hhh
500       50           2015-01-01   iii      jjj
500       51           2015-01-01   iii      jjj
500       52           2015-01-01   iii      jjj
700       70           2017-01-01   mmm      nnn
700       71           2017-01-01   mmm      nnn
700       72           2017-01-01   mmm      nnn
800       81           2018-01-01   ooo      ppp
800       82           2018-01-01   ooo      ppp
900       90           2019-01-01   qqq      rrr
My current query "works", but I think it may be picking rows arbitrarily(?) - which I don't want. Also, EXPLAIN tells me this is terribly inefficient:
SELECT   sc.store_id,
         customers.customer_id,
         MIN(sc.datetime),
         sc.field_1,
         sc.field_2
FROM     (
           SELECT store_id, customer_id
           FROM   store_customers
           UNION
           SELECT store_id, customer_id
           FROM   additional_customers
         ) AS customers
JOIN     store_customers sc
         ON sc.store_id    = customers.store_id
         OR sc.customer_id = customers.customer_id
GROUP BY customer_id;
How can I make a more accurate, efficient query to return my expected result?
Edit: I should add - I can add additional indexes if necessary. Also, I'm using MySQL version 5.7 - so I can't use window functions.
 
    