I'm trying to extract the volume of a string field to sort by it numerically.
Given the following data:
- Something at 300 ml
- 300Ml somthing
- Something special (with 300 Ml)
- 8-v something that should not match
First attempt:
Just cast the string to unsigned (found in this thread: reference)
Problem: It'll obviously also take the 4th data example into account.
Second attempt: Use REGEXP_REPLACE to extract the value (found in this thread: reference)
CAST(
    REGEXP_REPLACE(
        LOWER(column),
        "[:digit:]+[:space:]*ml",
        "///"
    ) as UNSIGNED
) AS volume
...using the character_classes defined in the manual.
Problem: The character_classes seem not to work as intended (possibly an issue with the character class name for spaces?).
Third attempt: Using the common Regexp tokens
CAST(
    REGEXP_REPLACE(
        LOWER(column),
        "\d+\s*ml",
        "///"
    ) as UNSIGNED
) AS volume
...which seems to work better.
Problem: It also matches the 4th data example again.
Question: How do I perform some kind of positive lookahead in MySQL REGEXP_REPLACE that will only match data examples 1-3 (ignoring the case sensitiveness due to casting to lower)?
The one i've tried seem to work fine in PCRE (with global flag) but not in MySQL:
^.*\d+\s*ml.*$ (ignoring the missing capturing group)
Which is translated like: Match everything until 1 or more digits, followed by 0 or more whitespaces and the string ml is found.
 
     
    