I need to validate user input for an href on the server side and need to make sure only http:// and https:// are allowed as a protocol (if specified at all.) The objective is to eliminate possible malicious code like javascript:... or anything alike.
What makes it difficult is the number of ways the colon could be encoded in such string e.g. :, :, :, : , :. I'd like to transform the value and see it as the browsers do before they render the page.
One option could be building a DOM document using AngleSharp as it does the perfect job when parsing attributes. Then I could retrieve the value and validate it but it seems somewhat of an overkill to build the whole DOM tree just to parse one value. Is there a way to use AngleSharp to parse just an attribute value? Or is there a lib which I could use just for this task?
I also found this question, but the method used in there does not really parse the URIs the way browsers do.