I'm working on a page-tracking web app and I'd like to get the canonical domain for a list of sites. As far as I know, there is no good way to tell where one owner's control over a domain and its subdomains starts and ends. I'm not sure of the best way to describe that, so here is an example:
If I own a personal domain, mysite.com, I am able to set up subdomains such as www.mysite.com, cdn.mysite.com, and so forth.
If my "group" has a website at a university, such as computerscience.myuni.edu, I might also have control over www.computerscience.myuni.edu, but not myuni.edu.
If I am a huge business and need to spread web traffic out, I might even have www.acme.com, ww2.acme.com, ww3.acme.com, etc.
So nothing is certain, but if I'm given a URL I can probably strip off www., ww2., cdn., and maybe secure. from the front. Are there any other common "subdomains" I'm not thinking of that are generally not used to serve up a different website?
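To make the idea concrete, here is a rough Python sketch of the stripping I have in mind. The function name `guess_canonical_host` and the prefix list are placeholders I made up for illustration, and the rule that at least two labels must remain is my own assumption, not an established convention.

```python
import re
from urllib.parse import urlsplit

# Hypothetical set of leading labels that usually serve the same site.
STRIP_PREFIXES = {"www", "cdn", "secure"}
# Matches load-spreading labels such as ww2, ww3, ...
WW_N = re.compile(r"^ww\d+$")


def guess_canonical_host(url):
    """Best-effort guess at a site's canonical host; a heuristic only."""
    # urlsplit only populates hostname when the input has a "//" authority part.
    if "//" not in url:
        url = "//" + url
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Assumption: always keep at least two labels so "www.com" is not cut to "com".
    while len(labels) > 2 and (labels[0] in STRIP_PREFIXES or WW_N.match(labels[0])):
        labels = labels[1:]
    return ".".join(labels)
```

With this, `guess_canonical_host("ww2.acme.com")` gives `acme.com`, while `www.computerscience.myuni.edu` only loses its `www.` and stays `computerscience.myuni.edu`, since nothing here knows who controls `myuni.edu`.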
I guess I'm just trying to figure out the best way to get the real "canonical" domain name for a site.
