3

We are announcing our prefixes to ISP A and ISP B via BGP: 1.0.0.0/23 to ISP A, and 1.0.0.0/24 and 10.0.1.0/24 to ISP B.

What we want is that when we withdraw 1.0.0.0/24 from ISP B, traffic for 1.0.0.0/24 switches to ISP A seamlessly (because 1.0.0.0/23 includes 1.0.0.0/24). However, when we do this, we get a packet drop for about 2-3 seconds (I tried pinging it and it shows TTL Expire).

Of course, I assume that if I announce 1.0.0.0/23 to ISP B as well, the problem will be solved. But why are we getting these packet drops?

3 Answers

6

The withdrawal of the route from your router is only the beginning of the process that updates routing tables across the whole Internet, not the end of it. Why do you think it is going to happen instantly?

First of all, you are highly likely to lose packets that were heading towards or through ISP B's network when the withdrawal happens.

Second, while it is probably not a big deal for distant sites, the neighbours of both your ISPs need to recalculate routes and update their FIBs, and this can cause packet loss at specific moments in time. Imagine site X, which has all routes towards you. It sends a packet to the next router (Y) using the more specific route (the one being withdrawn). Router Y has already removed the more specific route towards you, but its less specific route points back towards site X. See the problem?
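To make the X/Y situation concrete, here is a minimal sketch of longest-prefix matching in Python. The two tables and the next-hop labels are hypothetical; they only model the moment when Y has processed the withdrawal and X has not.

```python
import ipaddress

def lookup(table, dst):
    """Return the next hop of the longest prefix in `table` that contains `dst`."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    # The route with the longest prefix length wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Hypothetical tables, not taken from any real router.
table_x = {
    ipaddress.ip_network("1.0.0.0/24"): "Y",      # stale more specific route
    ipaddress.ip_network("1.0.0.0/23"): "ISP A",
}
table_y = {
    ipaddress.ip_network("1.0.0.0/23"): "X",      # only the covering route is left
}

print(lookup(table_x, "1.0.0.10"))  # X still picks the /24 and forwards to Y
print(lookup(table_y, "1.0.0.10"))  # Y falls back to the /23 and forwards to X
```

Each router is making a locally correct decision; the loop exists only because the two tables disagree for a short time.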

Tomek
  • 1,288
4

when we do this, we get a packet drop for about 2-3 seconds (I tried pinging it and it shows TTL Expire).

Typically it means one of the routers involved has already processed the update, but another has not yet done so (either because the update physically hasn't reached it yet, or because its control plane doesn't have as much processing power as the first one).

During this time, gateway B knows the new route via A, but A still has the old route via B, resulting in a ping-pong between the two. The packet bounces back and forth until its TTL runs out, which is the TTL-expired reply that ping reports. (If you watch mtr during the process, you might see this happening at more than just one place.)
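A rough way to picture the ping-pong is to follow next-hop pointers while decrementing the TTL at every router. This is only a toy model: the `trace` helper, the node names and the two next-hop maps are made up for illustration, and real routers obviously do much more.

```python
def trace(next_hop, start, ttl=8):
    """Follow next-hop pointers from `start` until delivery or TTL expiry."""
    node, path = start, []
    while node != "destination":
        path.append(node)
        ttl -= 1                  # each router decrements the TTL before forwarding
        if ttl <= 0:
            return path, "TTL expired"
        node = next_hop[node]
    return path, "delivered"

# Transient state: B already points at A, but A still points at B.
transient = {"A": "B", "B": "A"}
# Converged state: A has processed the update and forwards onward.
converged = {"A": "destination", "B": "A"}

print(trace(transient, "B"))
print(trace(converged, "B"))
```

In the transient map the packet alternates between B and A until the TTL is used up; once A has converged, the same packet is delivered after two hops.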

grawity
  • 501,077
2

Simply announcing the same prefix over the new path is not enough, because most routers will not consider the new path better than the old one.

Still, there is a way to avoid the problem, assuming you can keep both connections open until you are done:

  1. Announce the minimum number of more specific routes to reroute all traffic from the route you want to shut down.

  2. Withdraw the supplanted route, and announce the new route (or ensure there is an appropriate route, as in your case).

  3. Remove the now redundant more specific routes to clean up after you.

Don't forget to wait for thorough propagation between the steps.
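As a sketch of step 1, the smallest set of more specific routes that covers a prefix is its two halves, which Python's `ipaddress` module can compute. The helper names and the wording of the plan below are hypothetical, and the sketch says nothing about whether your upstreams accept prefixes of that length (many operators filter IPv4 announcements longer than /24).

```python
import ipaddress

def more_specifics(prefix):
    """Split a prefix into its two halves, which together cover it exactly."""
    return list(ipaddress.ip_network(prefix).subnets(prefixlen_diff=1))

def migration_plan(prefix):
    """Return the three steps as plain strings (illustrative wording only)."""
    halves = ", ".join(str(n) for n in more_specifics(prefix))
    return [
        f"announce {halves} on the link that stays up",
        f"withdraw {prefix} on the link being shut down",
        f"withdraw {halves} once the remaining route carries the traffic",
    ]

for step in migration_plan("1.0.0.0/24"):
    print(step)  # wait for propagation before acting on the next step
```

The halves win by longest-prefix match everywhere they are accepted, so traffic has already moved by the time the old route is withdrawn.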