I have the following table:
CREATE TABLE test_regex (
    drug TEXT
);
INSERT INTO test_regex
VALUES
    ('DRUGA 200 MG'),
    ('DRUGB 150 MCG'),
    ('DRUGC 1.5 GM BOX'),
    ('DRUGD 27.2 MG/5 ML VIAL')
;
I want to extract the following:
200 MG
150 MCG
1.5 GM
27.2 MG
What I have tried so far:
SELECT
    substring(drug, '[0-9]*\.?[0-9]* MG|GM|MCG')
FROM
    test_regex
;
Which results in:
 200 MG
 MCG
 GM
 27.2 MG
I've also tried:
SELECT
    substring(drug, '[0-9]*\.?[0-9]* (MG|GM|MCG)')
FROM
    test_regex
;
With this result:
 MG
 MCG
 GM
 MG
I think the problem lies in how the trailing (MG|GM|MCG) group is handled but I couldn't find what I was looking for in the PostgreSQL Docs.  I was expecting to get the first number section, then a space, then either MG, GM or MCG.  I think its getting grouped as the number expression with MG, then either GM or MCG.
 
    
