I want to do some basic geocoding of addresses using Postgres. I have an address table that has around 1 million raw address strings:
=> \d addresses
  Table "public.addresses"
 Column  | Type | Modifiers
---------+------+-----------
 address | text |
I also have a table of location data:
=> \d locations
   Table "public.locations"
   Column   | Type | Modifiers
------------+------+-----------
 id         | text |
 country    | text |
 postalcode | text |
 latitude   | text |
 longitude  | text |
Most of the address strings contain postalcodes, so my first attempt was to do a like and a lateral join:
EXPLAIN SELECT * FROM addresses a
JOIN LATERAL (
    SELECT * FROM locations
    WHERE address ilike '%' || postalcode || '%'
    ORDER BY LENGTH(postalcode) DESC
    LIMIT 1
) AS l ON true;
That gave the expected result, but it was slow. Here's the query plan:
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop  (cost=18383.07..18540688323.77 rows=1008572 width=91)
   ->  Seq Scan on addresses a  (cost=0.00..20997.72 rows=1008572 width=56)
   ->  Limit  (cost=18383.07..18383.07 rows=1 width=35)
         ->  Sort  (cost=18383.07..18391.93 rows=3547 width=35)
               Sort Key: (length(locations.postalcode))
               ->  Seq Scan on locations  (cost=0.00..18365.33 rows=3547 width=35)
                     Filter: (a.address ~~* (('%'::text || postalcode) || '%'::text))
I tried adding a gist trigram index to the address column, like mentioned at https://stackoverflow.com/a/13452528/36191, but the query plan for the above query doesn't make use of it, and the query plan in unchanged.
CREATE INDEX idx_address ON addresses USING gin (address gin_trgm_ops);
I have to remove the order by and limit in the lateral join query for the index to get used, which doesn't give me the results I want. Here's the query plan for the query without ORDER or LIMIT:
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Nested Loop  (cost=39.35..129156073.06 rows=3577682241 width=86)
   ->  Seq Scan on locations  (cost=0.00..12498.55 rows=709455 width=28)
   ->  Bitmap Heap Scan on addresses a  (cost=39.35..131.60 rows=5043 width=58)
         Recheck Cond: (address ~~* (('%'::text || locations.postalcode) || '%'::text))
         ->  Bitmap Index Scan on idx_address  (cost=0.00..38.09 rows=5043 width=0)
               Index Cond: (address ~~* (('%'::text || locations.postalcode) || '%'::text))
Is there something I can do to get the query to use the index, or is there a better way to rewrite this query?
 
     
     
     
    