31

We all know that HTTPS encrypts the connection between the computer and the server so that it cannot be viewed by a third party. However, can the ISP or a third party see the exact link of the page the user accessed?

For example, I visit

https://www.website.com/data/abc.html

Will the ISP know that I accessed */data/abc.html or just know that I visited the IP of www.website.com?

If they know, then why do Wikipedia and Google have HTTPS when someone can just read the internet logs and find out the exact content the user viewed?

Anonymous
  • 319

3 Answers

47

From left to right:

The scheme https: is, obviously, interpreted by the browser.

The domain name www.website.com is resolved to an IP address using DNS. Your ISP will see the DNS request for this domain, and the response.

The path /data/abc.html is sent in the HTTP request. If you use HTTPS, it will be encrypted along with the rest of the HTTP request and response.

The query string ?this=that, if present in the URL, is sent in the HTTP request – together with the path. So it's also encrypted.

The fragment #there, if present, is not sent anywhere – it's interpreted by the browser (sometimes by JavaScript on the returned page).
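As a rough illustration of the breakdown above, the components can be pulled apart with Python's standard urllib.parse. The URL is the question's example with the ?this=that query string and #there fragment appended for illustration, and the notes in the table simply restate the points made above; this is a sketch, not a capture of real traffic.

```python
from urllib.parse import urlsplit

# The question's example URL, extended with an illustrative
# query string and fragment (both are assumptions for this sketch).
url = "https://www.website.com/data/abc.html?this=that#there"
parts = urlsplit(url)

# Each component, paired with who gets to see it over HTTPS.
exposure = {
    "scheme": (parts.scheme, "interpreted by the browser"),
    "host": (parts.hostname, "visible through the DNS lookup"),
    "path": (parts.path, "encrypted inside the HTTP request"),
    "query": (parts.query, "encrypted inside the HTTP request"),
    "fragment": (parts.fragment, "never sent; stays in the browser"),
}

for name, (value, note) in exposure.items():
    print(f"{name:9}{value:20} {note}")
```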

grawity
  • 501,077
12

The ISP will only know that you visited the IP address associated with www.website.com (and possibly the domain name, if you are using their DNS servers and they are specifically looking for that traffic; if the DNS query does not go through them, they won't see it).

(Bear with me a bit here – I do get to the answer.)

HTTP works by connecting to a port (usually port 80), after which the web browser tells the server which page it wants. A simple request for http://www.sitename.com/url/of/site.html would contain the following lines:

GET /url/of/site.html HTTP/1.1
Host: www.sitename.com

HTTPS does exactly the same thing, except on port 443, and it wraps the entire HTTP exchange (i.e., everything you see in the quoted bit above, plus the response) in an SSL/TLS-encrypted session. The ISP therefore does not see any of that traffic, but they may be able to infer something from the size of the transfer and from the DNS lookup that resolves www.sitename.com to an IP address in the first place.
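To make the wrapping concrete, here is a minimal sketch using Python's standard socket and ssl modules. The helper names build_request and fetch_over_tls are hypothetical, chosen only for this example; the host and path are the placeholder values from the request quoted above. The point is that the request bytes are identical to the plain HTTP case and are simply written into the TLS session rather than onto the bare TCP socket.

```python
import socket
import ssl


def build_request(host, path):
    """Return the plaintext HTTP/1.1 request shown in the answer."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_over_tls(host, path, port=443):
    """Send the same request inside a TLS session; return the raw response."""
    context = ssl.create_default_context()  # verifies the server's certificate chain
    with socket.create_connection((host, port)) as tcp:
        # server_hostname is used for certificate checking; it is also
        # typically sent to the server as part of the TLS handshake.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            tls.sendall(build_request(host, path))  # encrypted on the wire
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)


# The request an observer would see in the clear over plain HTTP on port 80:
print(build_request("www.sitename.com", "/url/of/site.html").decode("ascii"))
```

Calling fetch_over_tls against a real HTTPS host while a packet sniffer is running should show only encrypted records where the GET line would otherwise appear.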

Of course, if there are "web bugs" embedded in the page, these can give "partners" of the information distributors hints about what you are viewing and who you are. Likewise, if your chain of trust is broken, an ISP can perform a man-in-the-middle attack. The reason you can have private end-to-end encryption, in theory, is the set of CA certificates distributed with your browser. If an ISP or government can either add a CA certificate or compromise a CA (and both have happened in the past), you lose your security. I believe that the Great Firewall of China effectively performs man-in-the-middle attacks to read HTTPS data, but it's been a while since I was there.

You can test this easily enough yourself by getting a piece of software which will sniff the traffic entering and leaving your computer. I believe a free piece of software called Wireshark will do this for you.

Cloudy
  • 444
davidgo
  • 73,366
0

I'm not sure if this is comment or answer worthy, but I'd like to share one addendum.

The answers here show what should happen. The question is whether the URL can be read. The answer to that is yes, though it is relatively unlikely.

An attacker (a third party) can absolutely intercept your HTTPS traffic and read all of your requests in specific cases. To learn more, I'd invite you to read about MITM attacks as well as SSLStrip. I can go into this more if necessary for understanding.

You should not expect your ISP to be doing this, both because it is a waste of their bandwidth and because they have more to lose if you were to find out and sue. However, the more precise answer to your question "Can this be done?" is yes, though it is unlikely anyone will care enough to see what you're googling or wiki-ing.