I'm using Hive as a database and I need a SQL method.
I want to replace an URL with the domain name. So for instance when I have http://info.myurl.com/some/path/into/the/server, I want to have myurl.com. But when I have https://myurl.com/ I also want to get myurl.com.
I've found the regexp_extract function in Hive, so I am this far:
select regexp_extract('http://info.myurl.com/some/path/into/the/server', 'http[s]?://.*?\.([^/]*)', 1) from dual;
Now the difficult part: how to know if there are 1 or 2 dots in the domain, so I know if I have to strip the first subdomain?