This is a RegEx question.
Thanks for any help, and please be patient, as RegEx is definitely not my strength!
Entirely as background: my reason for asking is that I want to use RegEx to parse strings similar to SVG path data segments. I've looked for previous answers that parse both the segments and their segment attributes, but found nothing that handles the attributes properly.
Here are some example strings like the ones I need to parse:
M-11.11,-22
L.33-44  
ac55         66 
h77  
M88 .99  
Z 
I need to have the strings parsed into arrays like this:
["M", -11.11, -22]
["L", .33, -44]
["ac", 55, 66]
["h", 77]
["M", 88, .99]
["Z"]
So far I found this code in an answer to the question Parsing SVG "path" elements with C# - are there libraries out there to do this? The post is about C#, but the regex also works in JavaScript:
var argsRX = /[\s,]|(?=-)/; 
var args = segment.split(argsRX);
Here's what I get:
 [ "M", -11.11, -22, <empty element>  ]
 [ "L.33", -44, <empty>, <empty> ]
 [ "ac55", <empty>, <empty>, <empty>, 66, <empty>  ]
 [ "h77", <empty>, <empty> ]
 [ "M88", .99, <empty>, <empty> ]
 [ "Z", <empty> ]
Problems when using this regex:
- An unwanted empty array element is being put at the end of each string's array.
- If multiple spaces are delimiters, an unwanted empty array element is being created for each extra space.
- If a number immediately follows the opening letters, that number is being attached to the letters, but should become a separate array element.
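A minimal sketch that keeps the `split()` approach and addresses the three problems above one at a time: trimming removes the trailing delimiter, a `+` quantifier makes a run of spaces/commas count as one delimiter, and a lookbehind/lookahead pair splits at the zero-width boundary between the leading letters and a number. It assumes an ES2018+ engine (lookbehind `(?<=...)` is not available in older browsers), and `splitSegment` is a hypothetical helper name.

```javascript
// Hypothetical helper: split one path segment into [letters, number, number, ...].
// Assumes an ES2018+ engine, because the regex uses lookbehind (?<=...).
function splitSegment(segment) {
  // [\s,]+                  one or more spaces/commas count as a single delimiter
  // (?=-)                   zero-width split before a minus sign, so the minus stays with its number
  // (?<=[a-z])(?=[\d.])     zero-width split between the leading letters and a following number
  var argsRX = /[\s,]+|(?=-)|(?<=[a-z])(?=[\d.])/i;
  // trim() removes trailing spaces, so no empty element is produced at the end
  var parts = segment.trim().split(argsRX);
  // The first part is the command letters; the remaining parts are converted to numbers.
  return [parts[0]].concat(parts.slice(1).map(Number));
}

console.log(JSON.stringify(splitSegment("ac55         66 "))); // ["ac",55,66]
```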
Here are more complete definitions of incoming strings:
- Each string starts with 1 or more letters (mixed case).
- Next are zero or more numbers.
- A number might have a leading minus sign (it always precedes the digits).
- The numbers might have a decimal point anywhere in the number (except the end).
- Possible delimiters are: a comma, a single space, multiple spaces, or a minus sign.
- A comma with space(s) before or after it is also a possible delimiter.
- Even though minus signs are delimiters, they must also remain with their number.
- A number might immediately follow the opening letters (no space) and that number should be separate.
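Because these definitions describe the tokens more precisely than the delimiters, one alternative (a swapped-in technique, not the approach from the linked answer) is to describe the tokens themselves and collect them with `String.prototype.match` and a global regex. This is a sketch under the rules listed above; `tokenRX` and `parseSegment` are hypothetical names.

```javascript
// Token pattern built from the definitions above (an assumption, not a full SVG grammar):
//   [a-z]+                          the leading command letters (mixed case via the "i" flag)
//   -?(?:\d+(?:\.\d+)?|\.\d+)       optional leading minus, then "55", "11.11" or ".99";
//                                   a trailing decimal point such as "88." is not accepted
var tokenRX = /[a-z]+|-?(?:\d+(?:\.\d+)?|\.\d+)/gi;

// Hypothetical helper: return [letters, number, number, ...] for one segment.
function parseSegment(segment) {
  // match() with a global regex returns every token, or null if there are none
  var tokens = segment.match(tokenRX) || [];
  return tokens.map(function (token, index) {
    // The first token is the command; every later token is converted to a number.
    return index === 0 ? token : parseFloat(token);
  });
}

console.log(JSON.stringify(parseSegment("L.33-44  "))); // ["L",0.33,-44]
```

Since only the tokens are returned, runs of spaces, a comma with surrounding spaces, and trailing whitespace never produce empty elements, and a minus sign is consumed as part of the number it precedes.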
Here is test code I've been using:
<!doctype html>
<html>
<head>
<link rel="stylesheet" type="text/css" media="all" href="css/reset.css" /> <!-- reset css -->
<script type="text/javascript" src="http://code.jquery.com/jquery.min.js"></script>
<style>
    body{ background-color: ivory; }
</style>
<script>
    $(function(){
        var pathData = "M-11.11,-22 L.33-44  ac55    66 h77  M88 .99  Z";
        // separate pathData into segments: a run of letters plus everything up to the next letter
        var segmentRX = /[a-z]+[^a-z]*/ig;
        var segments = pathData.match(segmentRX);
        for(var i=0;i<segments.length;i++){
            var segment=segments[i];
            //console.log(segment);
            // split each segment into its arguments (this is the regex that misbehaves)
            var argsRX = /[\s,]|(?=-)/;
            var args = segment.split(argsRX);
            for(var j=0;j<args.length;j++){
                var arg=args[j];
                console.log(arg.length+": "+arg);
            }
        }
    }); // end $(function(){});
</script>
</head>
<body>
</body>
</html>
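For comparison, the test page's two steps (split into segments, then split each segment into arguments) can also be collapsed into a single pass over the whole path string. This sketch assumes the same token rules as the definitions in the question, runs without jQuery (for example in Node or a browser console), and uses a hypothetical function name, `parsePathData`.

```javascript
// Hypothetical helper: tokenize the whole path string in one pass and group
// each number under the most recent run of command letters.
function parsePathData(pathData) {
  // Group 1 captures command letters; group 2 captures a number.
  // Neither alternative can match an empty string, so the exec() loop always advances.
  var tokenRX = /([a-z]+)|(-?(?:\d+(?:\.\d+)?|\.\d+))/gi;
  var result = [];
  var current = null;
  var m;
  while ((m = tokenRX.exec(pathData)) !== null) {
    if (m[1] !== undefined) {
      // New command letters start a new segment array.
      current = [m[1]];
      result.push(current);
    } else if (current !== null) {
      // Numbers are appended to the current segment; numbers before any command are ignored.
      current.push(parseFloat(m[2]));
    }
  }
  return result;
}

var pathData = "M-11.11,-22 L.33-44  ac55    66 h77  M88 .99  Z";
console.log(JSON.stringify(parsePathData(pathData)));
// [["M",-11.11,-22],["L",0.33,-44],["ac",55,66],["h",77],["M",88,0.99],["Z"]]
```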