Many websites use DNS servers from CDNs like Cloudflare, which hide the origin IP behind a reverse proxy. How do DNS caching servers work in these situations? Many websites can appear to use the same IP address (Cloudflare's), so I assume that would cause many errors for clients of DNS caching services, such as the one built into Windows.
2 Answers
A company that uses a CDN will set up duplicate servers around the world. These servers are known to all the CDN servers.
The single IP address you mentioned points to a DNS server that is part of the CDN.
This DNS server will only forward the DNS request to a further DNS server that will be the authoritative server for the domain. It may not give the final answer itself (it's not required to), so this will behave as the usual DNS hierarchy of servers. The choice of this authoritative server is mostly influenced by the geographical location of the client's IP.
The final answer for the DNS query will be a company web server that is serving the queried domain (belongs to the queried domain), which is judged to be the "nearest".
To avoid problems and congestion, the DNS response will have a small time-to-live (TTL), so the process may repeat itself from time to time, and perhaps return another server the next time.
When it comes to Cloudflare, which is both a CDN and an authoritative DNS server, how do DNS caching services, like the one built into Windows or other caching server software, know which IP address belongs to which domain?
They don't need to know that. All DNS caches, just as DNS itself, only map domain names to values ('A' records) – never the other way around.
If for example the DNS caching software remembers that 1.2.3.4 points to facebook.com and also that IP is for google.com, bing.com etc., the cache is useless, so what am I missing?
No, that's not what DNS caching software remembers at all. It does the opposite – both the DNS cache, and the DNS system as a whole, only care that google.com points to 1.2.3.4, not that the address "points" back.
Entries in the DNS cache look exactly like entries in authoritative DNS servers, with domain name as the lookup key. (In fact, beyond Windows, most DNS caches are just ordinary DNS servers that take standard queries and answer them.)
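To illustrate, here is a minimal sketch of a cache keyed by domain name. The class name `DnsCache`, the `.example` domains, and the 192.0.2.10 address are all made up for illustration, and real resolvers store far more per record. Two names sharing one address cause no conflict, because the address is never used as a key.

```python
import time


class DnsCache:
    """Toy DNS cache: the lookup key is the domain name, never the IP address."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (list of addresses, expiry timestamp)

    def put(self, name, addresses, ttl):
        # Domain names are case-insensitive, so normalise the key.
        self._entries[name.lower()] = (list(addresses), self._clock() + ttl)

    def get(self, name):
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None  # cache miss: a real resolver would query upstream
        addresses, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # TTL ran out, so the entry is discarded
            return None
        return addresses


cache = DnsCache()
# Two unrelated (hypothetical) sites behind the same CDN address.
cache.put("site-a.example", ["192.0.2.10"], ttl=300)
cache.put("site-b.example", ["192.0.2.10"], ttl=300)
print(cache.get("site-a.example"))
print(cache.get("site-b.example"))
```

Each name has its own entry and its own TTL, so a short TTL on one name only affects how soon that name is looked up again.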
It is also nothing new to have multiple DNS entries resolving to the same address. (For example, even without a CDN, HTTP "virtual hosting" has long been used to host tens or hundreds of different websites on the same servers, with hundreds of DNS names resolving to the same server addresses.)
So DNS caching services cache the end result of a DNS query, which is the origin server IP address.
Yes, in general. However, when you're using a CDN such as Cloudflare, the end result of that DNS query is no longer the origin server; it is the CDN server.
With Cloudflare in particular, the table of DNS records that you see in Cloudflare's "control panel" is not actually the set of DNS records that Cloudflare serves to users. As soon as you enable proxying for a domain name, Cloudflare's authoritative DNS servers start replying with the CDN's own IP addresses instead of the ones you've entered.
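As a rough sketch of that behaviour (not Cloudflare's actual implementation), an authoritative server could decide per record whether to hand out the configured address or the CDN's own. The table layout, the `proxied` flag name, the function `answer_a_query`, and every address below are assumptions for illustration only.

```python
# Hypothetical records, as the site owner entered them in a control panel.
CONFIGURED_RECORDS = {
    "www.site-a.example": {"address": "203.0.113.7", "proxied": True},
    "mail.site-a.example": {"address": "203.0.113.8", "proxied": False},
}

# Hypothetical CDN edge addresses (documentation range, not real CDN IPs).
CDN_EDGE_ADDRESSES = ["192.0.2.10", "192.0.2.11"]


def answer_a_query(name):
    """Return the addresses an authoritative server would answer with."""
    record = CONFIGURED_RECORDS.get(name.lower())
    if record is None:
        return []  # no such name in this zone
    if record["proxied"]:
        # Proxied: clients are pointed at the CDN, and the origin stays hidden.
        return list(CDN_EDGE_ADDRESSES)
    # Not proxied: the configured address is served as entered.
    return [record["address"]]


print(answer_a_query("www.site-a.example"))
print(answer_a_query("mail.site-a.example"))
```

A real CDN would also pick the edge addresses dynamically, based on load and the client's location, rather than returning a fixed list.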
(Other CDNs work the same way, as the entire point of using a CDN is that your clients talk to it. For example, the superuser.com domain uses the Fastly CDN so it always resolves to IP addresses of nearby Fastly nodes.)
The purpose of short TTLs with CDNs is therefore not protection, but load-balancing (the CDN's authoritative server dynamically responds with "best" nodes at the moment, both in terms of load and geolocation).
When multiple DNS entries resolve to the same IP address, like google.com and facebook.com both resolve to 1.2.3.4, which element/component in the network is responsible to take the user to the correct website?
It's actually the client and the server, not the network.
Every HTTP client specifies the requested domain as part of the HTTP request – that's the "Host" header in HTTP/1.1 (or the ":authority" pseudo-header in HTTP/2). Web servers or reverse-proxies will map the received Host value to the corresponding <VirtualHost> or similar configuration.
(You can see all request headers using the F12 "Developer tools" in a browser – open the "Network" tab to start collecting data, then go back to the browser and hit F5 to reload the webpage, and go back to the "Network" section.)
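The Host-based selection can be sketched as follows. This is a simplified illustration rather than how any particular web server is implemented: the `VIRTUAL_HOSTS` table, its paths, and the function names are hypothetical, and the parser handles only plain HTTP/1.1 requests without IPv6 literals in the Host value.

```python
# Hypothetical mapping from Host value to a site's document root.
VIRTUAL_HOSTS = {
    "site-a.example": "/var/www/site-a",
    "site-b.example": "/var/www/site-b",
}
DEFAULT_SITE = "/var/www/default"


def parse_host_header(raw_request: bytes):
    """Extract the Host header value (without port) from a raw HTTP/1.1 request."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional ":port" suffix (IPv6 literals are ignored here).
            return value.strip().lower().split(":", 1)[0]
    return None


def select_site(raw_request: bytes) -> str:
    """Pick the site configuration that matches the request's Host header."""
    host = parse_host_header(raw_request)
    return VIRTUAL_HOSTS.get(host, DEFAULT_SITE)


# Both requests arrive at the same IP address; only the Host header differs.
request_a = b"GET / HTTP/1.1\r\nHost: site-a.example\r\n\r\n"
request_b = b"GET / HTTP/1.1\r\nHost: site-b.example:8080\r\n\r\n"
print(select_site(request_a))
print(select_site(request_b))
```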
Similarly, TLS clients also specify the requested domain in the "Server Name Indication" extension of the TLS ClientHello, so that the server can send the correct certificate.
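One way to see the server name inside the ClientHello without touching the network is to let Python's standard `ssl` module write the handshake into a memory buffer. This is only a sketch: the helper name `client_hello_bytes` and the domain `site-a.example` are made up, and it assumes a TLS stack that sends the name unencrypted (that is, without Encrypted Client Hello).

```python
import ssl


def client_hello_bytes(server_name: str) -> bytes:
    """Return the raw bytes a TLS client would send first for server_name."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming = ssl.MemoryBIO()  # data "from the server" (stays empty here)
    outgoing = ssl.MemoryBIO()  # data the client wants to send
    tls = context.wrap_bio(incoming, outgoing, server_hostname=server_name)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the ClientHello has been written and the client is now
        # waiting for a server reply that never arrives in this sketch.
        pass
    return outgoing.read()


hello = client_hello_bytes("site-a.example")
# The requested name travels inside the ClientHello as the SNI extension.
print(b"site-a.example" in hello)
```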
In general each protocol has its own way of doing this, and not all protocols actually distinguish different domain names at all. (For example, SSH doesn't provide such information to the server as the goal isn't to connect to a "domain" but to the machine itself.)
But the network (at IP layer) doesn't care about domains (or websites) at all – its only job is to deliver packets to 1.2.3.4, and if the IP address is the same, then the destination server is also the same. ("Destination server" being the CDN reverse-proxy frontend, in this case.)