1

This is a tricky one that requires a level of Excel function mastery I don't currently have. I have been studying Excel's site for way too long and I'm still not progressing.

I am trying to take a complicated URL like:

http://dp18776.live.domiain.net/active/MO_Mi_WFM

and parse it so that I get this:

live.domain.net

The closest I have gotten is with this:

http://dp18776.live.domiain.net

My logic:

=LEFT(A17, SEARCH(".net",A17)+3)

The reason I can't stop there is because the local part can be variable across the thousands of records, whereas only the domain is (e.g. live.domain.net) constant.

I also know I need to use the MID function, I was testing what I could get away with on more basic functions, but the bigger wrench in the works is that I need to be able to key off ".net" to retrieve the domain, so I need to set the starting point in MID() as ".net" - len(domain)

Would anyone be able to help me complete this formula?

GPP
  • 113

3 Answers3

4

Everybody here seems to focus on the .com or .net, but domains don't always end with .com or .net... however, they always end before the /.

So first find where the host name begins:

=FIND("://", A1) + LEN("://")
=FIND("://", A1) + 3

Then find the first slash afterwards – that's where the host name ends:

=FIND("/", A1, FIND("://", A1) + 3)

Or, just in case there's no path at all:

=IFERROR(FIND("/", A1, FIND("://", A1) + 3), LEN(A1) + 1)

And extract everything in between:

=MID(A1;FIND("://";A1)+3;IFERROR(FIND("/";A1;FIND("://";A1)+3);LEN(A1)+1)-(FIND("://";A1)+3))

A1: https://www.example.com/foo/bar
A2: =FIND("://";A1)+3
A3: =IFERROR(FIND("/";A1;A2);LEN(A1)+1)
A4: =MID(A1;A2;A3-A2)

If you want to throw away the leftmost domain component, you can use FIND(".",A1):

=MID(A1;FIND(".";A1)+1;IFERROR(FIND("/";A1;FIND(".";A1)+1);LEN(A1)+1)-(FIND(".";A1)+1))

A1: https://www.example.com/foo/bar
A2: =FIND(".";A1)+1
A3: =IFERROR(FIND("/";A1;A2);LEN(A1)+1)
A4: =MID(A1;A2;A3-A2)

For Salesforce, with no error checks:

=MID(A1,FIND("://",A1)+3,FIND("/",A1,FIND("://",A1)+3)-(FIND("://",A1)+3))

=MID(A1,FIND(".",A1)+1,FIND("/",A1,FIND(".",A1)+1)-(FIND(".",A1)+1))

Note: This still has a few assumptions – e.g. it won't work right if the URLs have a username@domain or username:password@domain. The FIND(".") version also assumes there are at least two domain components and won't work right with https://example/foo – it needs some extra IFERROR() checks for that.

grawity
  • 501,077
3

You can use regexp via VBA

and get your hostname with something like that:

https?:\/\/[a-zA-Z0-9-]+\.([a-zA-Z0-9.-]+)\/.*

group 1:

live.domain.net
1

Without knowing what the conditions are of your local part and where your hangups are, I'd search for the extension and then the first . as you don't have a www.. If you do - search for the second .. I'll provide the parts and then the actual formula.

Searching for whatever extension you need to find.

=IFERROR((SEARCH(".com",A1)),0)+IFERROR((SEARCH(".net",A1)),0)+IFERROR(SEARCH(".org",A1),0)

Searching for your first .

=SEARCH(".",A1)

Then combine them in =MID()

=MID(A1,SEARCH(".",A1)+1,IFERROR((SEARCH(".com",A1)),0)+IFERROR((SEARCH(".net",A1)),0)+IFERROR(SEARCH(".org",A1),0)-SEARCH(".",A1)+3)

If there isn't always a local part, then it may get trickier. There are also some other ways to do it if your local string is always the same length or always has a specific string, or certain values, etc.

You can get rid of the iferror portions if you always know the extension, or you can add additional extensions.

Raystafarian
  • 21,963
  • 12
  • 64
  • 91