0

I need to extract numbers from a string in Excel and the numbers need to be separated by a comma as shown in row 3 (or any other separator).

enter image description here

I can extract the numbers in row 1 by:

=SUMPRODUCT(MID(0&A1, LARGE(INDEX(ISNUMBER(--MID(A1, ROW(INDIRECT("1:"&LEN(A1))), 1)) * ROW(INDIRECT("1:"&LEN(A1))), 0), ROW(INDIRECT("1:"&LEN(A1))))+1, 1) * 10^ROW(INDIRECT("1:"&LEN(A1)))/10)´´´

Or in row 2 by:

=SUM(MID(0&A2,LARGE(INDEX(ISNUMBER(--MID(A2,ROW($1:$99),1))*ROW($1:$99),),ROW($1:$99))+1,1)*10^ROW($1:$99)/10)

However, I would need them to be separated somehow.

NOTE: I have up to 200 different combination of numbers and car brands, which cause simple substitute functions to reach their limits. I prefer a formula based solution without hidden interim columns if possible (certainly no manual mouse clicking). My VBA skill are low, but I'm open for new challenges!

2 Answers2

5

With Office 365 or 2019 Excel we can use TEXTJOIN and FILTERXML.

=TEXTJOIN(",",,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1,","," ")," ","</s><s>")&"</s></t>","//s[number()=.]"))

The FILTERXML parses a made up string that looks like xml. We substitute the spaces and the commas for </s><s> then only return the nodes that are numeric.

The TEXTJOIN takes the array returned and adds a comma between each element.

enter image description here

Without text join we can concatenate many:

IFERROR(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][1]"),"")

The added [1] to the FILTERXML is which node to return. So we can string a bunch together:

=IFERROR(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][1]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][2]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][3]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][4]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][5]"),"")&
IFERROR(", "&FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,","," "),"  "," ")," ","</s><s>")&"</s></t>","//s[number()=.][6]"),"")

This will do a max of 6, if there possibly can be more add more lines changing the number in the [#] to the next.

enter image description here

Scott Craner
  • 23,868
2

Edit:
It seems the real data is more complicated than the sample initially submitted. The simplest way to program this is to use Regular Expressions to extract the numbers

  • Extract any digit string that follows either the start of the string; or a comma optionally followed by one or more spaces

Edit2 Converted to late-binding after I read template will be distributed

Option Explicit
Function extrNums(cellRef) As String
    Dim RE As Object, MC As Object, M As Object
    Dim sTemp As Variant
    Const sPat As String = "(?:^|,\s*)(\d+)\b"

Set RE = CreateObject("vbscript.regexp") With RE .Global = True .MultiLine = True .Pattern = sPat If .Test(cellRef) Then Set MC = .Execute(cellRef) For Each M In MC sTemp = sTemp & ", " & M.SubMatches(0) Next M End If End With

extrNums = Mid(sTemp, 3)

End Function

enter image description here

Regex Explanation
extract numbers
(?:^|,\s*)(\d+)\b

Options: ^$ match at line breaks

Created with RegexBuddy