
I was wondering how GeoIP services collect IP geolocation data besides checking an IP address's WHOIS information. For example, I stumbled upon this website, which says that IP 74.207.244.221 is located in Fremont, California: https://ipinfo.io/74.207.244.221

But I cannot find this information in the IP's WHOIS record. ipinfo.io states that:

Originally our API used MaxMind data, but we've been very busy working on creating our own geolocation data. We've made a lot of progress, and we now use our own data to service around half of all requests. We do still fall back to MaxMind data, though.

This got me interested: what are the ways in which services like ipinfo.io and MaxMind collect GeoIP data?

Learner

2 Answers

Such services usually use three methods to geolocate an IP address:

  1. Going through WHOIS databases to search for a registered address;
  2. Performing reverse DNS lookups to find clues in domain-name records (ISPs often embed city or airport codes in hostnames), or tracing the path packets take to the destination with a tool such as traceroute, which can also give clues (a toy sketch of these lookups follows this list);
  3. And lastly, RTT triangulation.
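
To make the first two methods concrete, here's a rough, self-contained Python sketch of a raw WHOIS query (the protocol from RFC 3912) and a reverse-DNS hint extractor. The CODE_TO_CITY table and the hostname patterns are illustrative assumptions; real services maintain per-ISP rule sets that are far larger:

    import re
    import socket

    # Hypothetical airport/city codes that ISPs often embed in router and
    # subscriber hostnames (e.g. "ae1.fmt2.example.net" hints at Fremont).
    CODE_TO_CITY = {"fmt": "Fremont, CA", "sjc": "San Jose, CA", "lhr": "London, UK"}

    def whois_query(ip, server="whois.arin.net"):
        """Raw WHOIS lookup (RFC 3912): send the query, read until EOF."""
        with socket.create_connection((server, 43), timeout=10) as sock:
            sock.sendall((ip + "\r\n").encode())
            chunks = []
            while data := sock.recv(4096):
                chunks.append(data)
        return b"".join(chunks).decode(errors="replace")

    def rdns_location_hint(ip):
        """Look for a known city code among the labels of the PTR record."""
        try:
            hostname, _, _ = socket.gethostbyaddr(ip)  # reverse DNS lookup
        except socket.herror:
            return None
        for token in re.split(r"[.\-]", hostname.lower()):
            if token in CODE_TO_CITY:
                return hostname, CODE_TO_CITY[token]
        return hostname, None

    print(rdns_location_hint("74.207.244.221"))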

Round-Trip Time (RTT) Triangulation is a method used to obtain the approximate geolocation of an IP address by measuring the ping latency from three different locations.

For example, if you have three servers spread across the world in the shape of a triangle, and you ping an IP address from all three and get the same latency from each, then the IP address is located right in the centre of that triangle. That's how triangulation normally works; in this case it is applied to ICMP ping times.
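
Here's a rough Python sketch of that idea. The probe coordinates and RTT values in PROBES are made up, and the "two-thirds of the speed of light in fibre" factor is only a common rule of thumb; real systems use many more probes and careful calibration, so treat this as a toy model, not a production algorithm:

    # Toy RTT-based geolocation: distance bounds plus a crude centroid.
    C_FIBER_KM_PER_MS = 299792.458 * (2 / 3) / 1000  # ~200 km per millisecond

    # Made-up probe locations (lat, lon) and measured RTTs in milliseconds.
    PROBES = [
        ((37.77, -122.42), 4.0),   # San Francisco
        ((34.05, -118.24), 12.0),  # Los Angeles
        ((47.61, -122.33), 22.0),  # Seattle
    ]

    def distance_upper_bound_km(rtt_ms):
        """One-way distance can't exceed half the RTT at fibre light speed."""
        return (rtt_ms / 2) * C_FIBER_KM_PER_MS

    def crude_estimate(probes):
        """Weighted centroid: probes with lower RTT pull the estimate closer."""
        weights = [1 / rtt for _, rtt in probes]
        total = sum(weights)
        lat = sum(w * p[0] for (p, _), w in zip(probes, weights)) / total
        lon = sum(w * p[1] for (p, _), w in zip(probes, weights)) / total
        return lat, lon

    for (lat, lon), rtt in PROBES:
        print(f"probe ({lat}, {lon}): target within {distance_upper_bound_km(rtt):.0f} km")
    print("crude position estimate:", crude_estimate(PROBES))

In practice the distance bounds are the more reliable signal: each probe constrains the target to a circle, and the intersection of all the circles gives the feasible region.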

Resources you can read:
What is ping? @ Wikipedia
SIGCOMM paper about RTT triangulation

Fanatique

I'm the founder of IPinfo, so I can definitely offer some details around this! There's not one single method we use, or a single data source, to produce our own geolocation database (or any of our other data sets, like IP to company, or IP to carrier). It's a mix of a bunch of different data sets, data processing techniques, and lessons learned doing this for several years now!

Some data sources and techniques not often mentioned include:

  • Direct feeds from ISPs. Our service handles around 500 million API requests a day, and it is used on many popular, high-profile websites. ISPs are therefore incentivized to provide us with accurate, up-to-date geolocation data so that their customers get a great experience on the web. We're working directly with more and more ISPs all the time.

  • GPS location data. It's possible to collect precise location information with GPS on mobile devices. You can pair that with the IP address and some network topology inference to work out the location for IP ranges given just a few measurements (a toy sketch of this aggregation follows this list).

  • User submitted corrections. When we do get the location wrong (or it hasn't been updated after a change), we'll often quickly get feedback from users and can manually fix the location, or tweak our algorithm so that it's correctly located on the next run of our data processing pipeline.
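
To illustrate the GPS-pairing idea in the abstract, here's a toy Python sketch that groups (IP, latitude, longitude) observations by /24 prefix and takes the median fix per prefix. The OBSERVATIONS sample and the /24 granularity are assumptions for the example, not a description of our actual pipeline:

    import ipaddress
    import statistics
    from collections import defaultdict

    # Hypothetical (ip, lat, lon) fixes, e.g. from opted-in mobile apps.
    OBSERVATIONS = [
        ("74.207.244.221", 37.5485, -121.9886),
        ("74.207.244.17", 37.5502, -121.9901),
        ("74.207.244.93", 37.5470, -121.9850),
    ]

    def locate_prefixes(observations, prefix_len=24):
        """Group GPS fixes by network prefix; take the median per prefix."""
        fixes = defaultdict(list)
        for ip, lat, lon in observations:
            net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
            fixes[net].append((lat, lon))
        return {
            net: (statistics.median(lat for lat, _ in pts),
                  statistics.median(lon for _, lon in pts))
            for net, pts in fixes.items()
        }

    for net, (lat, lon) in locate_prefixes(OBSERVATIONS).items():
        print(f"{net}: ~({lat:.4f}, {lon:.4f})")

The median makes the estimate robust to a few bad fixes; in reality you also have to decide how to split a range when its fixes cluster in two different cities.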

For our IP to company data set we actually scrape every single domain name every month, and cross-reference the data we extract there with IP ownership information, rwhois records and more. We then use the domain scraping data to show which domains are hosted on which IP addresses, and also in our IP type classifier, along with many other data sources, to determine the probability of an IP address being primarily used by a residential ISP, a business, or a hosting provider. We also analyze the link structure of those pages, and show some of this data on host.io.
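
As a toy version of that cross-referencing step (illustrative only: the domain list, the resolution-based approach, and the thresholds in naive_ip_type are simplifications, not our production system), you could resolve a domain list, invert it into an IP-to-domains map, and use the hosted-domain count as one weak signal of IP type:

    import socket
    from collections import defaultdict

    # Hypothetical domain list; a real pipeline scrapes every registered domain.
    DOMAINS = ["example.com", "example.org", "example.net"]

    def build_ip_to_domains(domains):
        """Resolve each domain and invert the mapping to IP -> domains."""
        hosted = defaultdict(set)
        for domain in domains:
            try:
                for info in socket.getaddrinfo(domain, None, socket.AF_INET):
                    hosted[info[4][0]].add(domain)
            except socket.gaierror:
                continue  # NXDOMAIN or resolver failure; skip this domain
        return hosted

    def naive_ip_type(domain_count):
        """One crude feature: many hosted domains suggests a hosting provider."""
        if domain_count >= 10:
            return "hosting"
        if domain_count >= 1:
            return "business"
        return "residential"

    for ip, domains in build_ip_to_domains(DOMAINS).items():
        print(ip, sorted(domains), naive_ip_type(len(domains)))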