I'm trying to manage a "failover" setup for a NAT-ed private network with two not very stable WAN (LTE/3G) connections.
The topology is quite typical: hosts on the internal network connect to the internal interface [int_if] of the OpenBSD box, then traffic is NAT-ed with PF to one of the two external interfaces [ext_if1 and ext_if2].
LAN---->[int_if]--NAT--[ext_if1 or ext_if2]---->WAN1 or WAN2 (depending on kernel decision).
I use static IPs on both egress interfaces, each on a different subnet, with multipath default routes pointing to the ISPs' LTE/3G router boxes, as described in the OpenBSD FAQ for equal-cost load balancing.
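A minimal sketch of the setup described above, for readers who want to reproduce it. The gateway addresses are placeholders from the documentation ranges (not the real ones), and `run` is a hypothetical helper that only prints each command unless DRY_RUN=0 is set, so nothing touches the routing table by accident.

```sh
#!/bin/sh
# Sketch: enable IPv4 multipath and add two equal-cost default routes.
# GW1/GW2 are placeholder gateway addresses, not taken from the post.
GW1=${GW1:-192.0.2.1}
GW2=${GW2:-198.51.100.1}

# Hypothetical helper: print the command in dry-run mode (the default),
# execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

set_multipath_defaults() {
    run sysctl net.inet.ip.multipath=1
    run route add -mpath default "$GW1"
    run route add -mpath default "$GW2"
}

set_multipath_defaults
```

Run as root with DRY_RUN=0 to apply the commands. The hostname.interface variant would presumably carry the same two `route add -mpath` lines, each prefixed with "!".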
Unfortunately I suffer from this annoying behaviour:
Multipath routes do not work if set from within the hostname.interface file. In that case both default routes are present in the routing table, but without the "P" flag. Until I flush the routes and set them again manually, all traffic is forwarded through only one of the two gateways, picked (I guess) by the alphabetical order of the interface names. It looks as if it were just a "standard" default gateway with no multipath engaged (net.inet.ip.multipath is of course set to 1).
Multipath default routes set "by hand" seem to work well: the "P" flag appears in the routing table, and netstat -r shows growing traffic on both routes, as does a sequence of traceroute commands. But then comes another mess.
ping -I one_of_ext_if's_ip some.internet.host works, I'd say, randomly, regardless of the actual state of the ISP connection. Sometimes using the IP address instead of the cname helps on one interface, while the other "prefers" the cname. Pings randomly go into the void, while HTTP traffic from the LAN is equal-cost load-balanced with no lag. I do not block any outgoing traffic with PF, so that is not the cause; disabling PF altogether changes nothing (except cutting the LAN off from the world). Going back to a standard setup with a single default gateway fixes the problem: I can switch between the two default routes by hand, and pings work whenever the ISP is alive.
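To put a number on "randomly", a small probe like the sketch below could count how many single pings succeed from a given source address. The function name `probe_from` and the overridable `PING` variable are illustrative, and the flags (-q, -c, -w, -I) are used as documented for OpenBSD ping(8).

```sh
#!/bin/sh
# Sketch: count successful single pings from one source address.
# PING can be overridden (for example with a stub) for testing.
PING=${PING:-ping}

# probe_from SRC TARGET COUNT -> prints "successes/COUNT"
probe_from() {
    src=$1
    target=$2
    count=$3
    ok=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # -c 1: one echo request, -w 1: wait at most one second,
        # -I: use SRC as the source address
        if "$PING" -q -c 1 -w 1 -I "$src" "$target" >/dev/null 2>&1; then
            ok=$((ok + 1))
        fi
        i=$((i + 1))
    done
    echo "$ok/$count"
}
```

For example, `probe_from 192.0.2.2 some.internet.host 20` (with a placeholder source address) run once per external address, and once by name and once by IP address, would show whether the failures follow the source address or the way the target is given.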
I spent two nights googling and testing to find out what I am doing wrong, and I still have no idea why this config acts so weird. And of course I have no idea how to monitor the connections for failover with ifstated when pinging the world does not work even while my ISPs are alive. When they really did die for a while, load balancing sent LAN requests into the void by picking the undetected dead connection.
I'd like to detect such an event and switch to a single default gateway on the working connection until the other ISP starts responding again. Without working pings I'm stuck with a useless ifstated in the background.
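The switching logic itself is simple once a reliable up/down signal exists. The sketch below covers only that decision step: it takes the state of each link as 1 or 0 and prints the default routes that should be installed. The function name and the addresses are hypothetical, and the probe that produces the two state values is deliberately left out, since that is the unsolved part.

```sh
#!/bin/sh
# Sketch: choose the default routes to install from the state of each link.
# All names and addresses are illustrative.

# desired_routes UP1 UP2 GW1 GW2 -> prints one route command per line
desired_routes() {
    up1=$1
    up2=$2
    gw1=$3
    gw2=$4
    if [ "$up1" = 1 ] && [ "$up2" = 1 ]; then
        # both links alive: equal-cost multipath
        echo "route add -mpath default $gw1"
        echo "route add -mpath default $gw2"
    elif [ "$up1" = 1 ]; then
        # only the first link alive: single default gateway
        echo "route add default $gw1"
    elif [ "$up2" = 1 ]; then
        # only the second link alive: single default gateway
        echo "route add default $gw2"
    fi
    # both links down: print nothing and leave the table alone
}
```

For example, `desired_routes 1 0 192.0.2.1 198.51.100.1` prints a single `route add default 192.0.2.1` line. A wrapper that flushes the old default routes and executes the printed commands could then be started by ifstated on a state change, assuming its tests can be made reliable.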
I appreciate any help...