I'm trying to clean up some auto-generated code where input URL fragments:
- may include spaces, which need to be %-escaped (as %20, not +)
- may include other URL-invalid characters, which also need to be %-escaped
- may include path separators, which need to be left alone (/)
- may include already-escaped components, which must not be doubly-escaped
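Concretely, the behaviour I'm after looks something like this (purely illustrative; wanted_escape is a hypothetical helper, not existing code):
wanted_escape('a b/c')       # => "a%20b/c"       (space escaped, slash untouched)
wanted_escape('a "b"/c')     # => "a%20%22b%22/c" (other invalid characters escaped)
wanted_escape('a%20b/c d')   # => "a%20b/c%20d"   (existing escapes preserved, not turned into %2520)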
 
The existing code uses libcurl (via Typhoeus and Ethon), which, like command-line curl, seems to happily accept spaces in URLs.
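For example, something like the following goes through without libcurl objecting to the raw space (hypothetical URL, purely to illustrate the point):
require 'typhoeus'
# Typhoeus/libcurl builds and sends the request even though the URL contains a literal space
response = Typhoeus.get('http://example.org/path with spaces')
response.code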
The existing code is all string-based and involves a number of shenanigans around removing extra slashes, adding missing slashes, etc. I'm trying to replace this with URI.join(), but that fails with URI::InvalidURIError (bad URI(is not URI?)) on the fragments with spaces.
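A minimal illustration of the failure (my own example, not the real code):
require 'uri'
URI.join('http://example.org/', 'path with spaces')
# => raises URI::InvalidURIError: bad URI(is not URI?)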
The obvious solution is to use the (deprecated) URI.escape, which escapes spaces but leaves slashes alone:
s = 'http://example.org/ spaces /<"punc^tu`ation">/non-ascïï 𝖈𝖍𝖆𝖗𝖘/＆ｃ．'
URI.escape(s)
# => "http://example.org/%20spaces%20/%3C%22punc%5Etu%60ation%22%3E/non-asc%C3%AF%C3%AF%20%F0%9D%96%88%F0%9D%96%8D%F0%9D%96%86%F0%9D%96%97%F0%9D%96%98/%EF%BC%86%EF%BD%83%EF%BC%8E"
This mostly works, except for the last requirement above: previously escaped components get double-escaped:
s1 = URI.escape(s)
# => "http://example.org/%20spaces%20/%3C%22punc%5Etu%60ation%22%3E/non-asc%C3%AF%C3%AF%20%F0%9D%96%88%F0%9D%96%8D%F0%9D%96%86%F0%9D%96%97%F0%9D%96%98/%EF%BC%86%EF%BD%83%EF%BC%8E"
URI.escape(s1)
# => "http://example.org/%2520spaces%2520/%253C%2522punc%255Etu%2560ation%2522%253E/non-asc%25C3%25AF%25C3%25AF%2520%25F0%259D%2596%2588%25F0%259D%2596%258D%25F0%259D%2596%2586%25F0%259D%2596%2597%25F0%259D%2596%2598/%25EF%25BC%2586%25EF%25BD%2583%25EF%25BC%258E" 
The recommended alternatives to URI.escape, e.g. CGI.escape and ERB::Util.url_encode, are not suitable as they mangle the slashes (among other problems):
CGI.escape(s)
# => "http%3A%2F%2Fexample.org%2F+spaces+%2F%3C%22punc%5Etu%60ation%22%3E%2Fnon-asc%C3%AF%C3%AF+%F0%9D%96%88%F0%9D%96%8D%F0%9D%96%86%F0%9D%96%97%F0%9D%96%98%2F%EF%BC%86%EF%BD%83%EF%BC%8E"
ERB::Util.url_encode(s)
# => "http%3A%2F%2Fexample.org%2F%20spaces%20%2F%3C%22punc%5Etu%60ation%22%3E%2Fnon-asc%C3%AF%C3%AF%20%F0%9D%96%88%F0%9D%96%8D%F0%9D%96%86%F0%9D%96%97%F0%9D%96%98%2F%EF%BC%86%EF%BD%83%EF%BC%8E"
Is there a clean, out-of-the-box way to preserve existing slashes, escapes, etc. and escape only invalid characters in a URI string?
So far the best I've been able to come up with is something like:
require 'uri'
include URI::RFC2396_Parser::PATTERN  # brings RESERVED and UNRESERVED into scope
# Escape anything that is not a reserved/unreserved URI character or '%'.
# UNRESERVED goes first so its leading '-' sits right after '^' and stays a literal.
INVALID = Regexp.new("[^#{UNRESERVED}#{RESERVED}%]")
def escape_invalid(str)
  parser = URI::RFC2396_Parser.new
  parser.escape(str, INVALID)
end
This seems to work:
s2 = escape_invalid(s)
# => "http://example.org/%20spaces%20/%3C%22punc%5Etu%60ation%22%3E/non-asc%C3%AF%C3%AF%20%F0%9D%96%88%F0%9D%96%8D%F0%9D%96%86%F0%9D%96%97%F0%9D%96%98/%EF%BC%86%EF%BD%83%EF%BC%8E"
s2 == escape_invalid(s2)
# => true 
but I'm not confident in the regex concatenation (even if it mirrors the way URI::RFC2396_Parser builds its patterns internally), and I know it doesn't handle all cases: for example, a % that isn't part of a valid hex escape should probably itself be escaped. I'd much rather find a standard library solution.
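For example (my own illustration of that remaining gap):
escape_invalid('http://example.org/100% sure')
# => "http://example.org/100%%20sure"
# the stray % is passed through untouched, leaving an invalid escape sequence behind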