Currently I can extract the 'domain' from any URL with the following regex:
/^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n\?\=]+)/im
However, I'm also getting subdomains, which I want to avoid. For example, given these sites:
- www.google.com
- yahoo.com/something
- freds.meatmarket.co.uk?someparameter
- josh.meatmarket.co.uk/asldf/asdf
I currently get:
- google.com
- yahoo.com
- freds.meatmarket.co.uk
- josh.meatmarket.co.uk
For the last two, I would like to drop the freds and josh subdomain portion and extract only the true domain, which would be just meatmarket.co.uk.
I did find another Stack Overflow question that tries to solve this in PHP, but unfortunately I don't know PHP. Is this translatable to JS (I'm actually using Google Apps Script, FYI)?
  function topDomainFromURL($url) {
    $url_parts = parse_url($url);
    $domain_parts = explode('.', $url_parts['host']);
    if (strlen(end($domain_parts)) == 2) {
      // Two-letter TLD: treat it as a ccTLD (e.g. co.uk) and keep the last three parts
      $top_domain_parts = array_slice($domain_parts, -3);
    } else {
      $top_domain_parts = array_slice($domain_parts, -2);
    }
    $top_domain = implode('.', $top_domain_parts);
    return $top_domain;
  }
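For reference, a rough JavaScript sketch of the same heuristic might look like the following. It reuses the regex from above to pull out the host rather than relying on a URL parser, since I'm not sure one is available in Google Apps Script. The function name simply mirrors the PHP one. It also inherits the PHP version's assumption that a two-letter final label always means a three-part domain, so a host like maps.google.de would come back unchanged; a lookup against the Public Suffix List would be needed to handle such cases reliably.

```javascript
// Sketch only: a direct port of the PHP heuristic, not a robust solution.
// Assumption: a two-letter final label (e.g. "uk") implies a second-level
// suffix such as "co.uk", so three labels are kept; otherwise two are kept.
function topDomainFromURL(url) {
  // Same regex as in the question: captures the host, minus any leading "www."
  var hostRegex = /^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n\?\=]+)/im;
  var match = hostRegex.exec(url);
  if (!match) {
    return null; // nothing that looks like a host was found
  }
  var parts = match[1].split('.');
  var lastLabel = parts[parts.length - 1];
  var keep = lastLabel.length === 2 ? 3 : 2;
  // slice(-keep) returns the whole array when it has fewer than `keep` items
  return parts.slice(-keep).join('.');
}
```

With the sample sites above, this gives google.com, yahoo.com, and meatmarket.co.uk for both of the last two.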