OVS balance-slb bond Forwards Unknown Unicast

Published at 2019-10-13 | Last Update 2019-10-13

TL;DR

In some cases, an OVS 2.5.6 bond in balance-slb mode forwards unknown unicast [4] traffic that it received on one physical NIC back into the physical network through another NIC. This creates an L2 loop with the physical network. The physical switches then report MAC flapping (the same MAC address appearing on multiple ports within a short time) and stop MAC learning for some time (e.g. 1 minute), which is a severe network problem.

Note that at the time of writing, 2.5.x is the latest LTS series and 2.3.x is the previous one. The documentation on balance-slb bonds is the same for both series; in our experience, however, the behavior has changed.

1 Problem Description

Recently, our datacenter network team noticed occasional MAC flapping. It occurred only several times a day, but the consequences were severe: affected switches would stop learning new MAC addresses for about 1 minute, so that, for example, new instances (containers, VMs) booted during that period would report that the gateway was unreachable.

The physical switch errors looked like this:

FWM-6-MAC_MOVE_NOTIFICATION: Host fa16.2ead.e6cf in vlan 2001 is flapping between port Po19 and port Po22
FWM-6-MAC_MOVE_NOTIFICATION: Host fa16.2ead.e6cf in vlan 2001 is flapping between port Po22 and port Po19
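
To get a rough idea of how often each host flaps, log lines of this shape can be tallied per MAC and VLAN. The following is only a minimal sketch, assuming the log format shown above; the function name `count_mac_moves` and the file name in the usage comment are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read switch log lines on stdin and print
# "<mac> <vlan> <count>" for every host named in a MAC move notification.
count_mac_moves() {
    awk '/MAC_MOVE_NOTIFICATION/ {
        mac = ""; vlan = ""
        # The MAC follows the word "Host", the VLAN ID follows the word "vlan".
        for (i = 1; i < NF; i++) {
            if ($i == "Host") mac = $(i + 1)
            if ($i == "vlan") vlan = $(i + 1)
        }
        if (mac != "") count[mac " " vlan]++
    }
    END { for (k in count) print k, count[k] }' | sort
}

# Usage (switch.log is a placeholder for wherever the switch logs are stored):
#   count_mac_moves < switch.log
```

A MAC with a high count within a short time window is a good starting point when deciding which host to capture on.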

2 Infra & Environment Info

This section provides some basic infrastructure information to help understand the problem. For more detailed information, please refer to my previous post Ctrip Network Architecture Evolution in the Cloud Computing Era.

The data center network uses a 3-tier architecture. The TOR switches are NOT stacked (stacking here means technologies such as Cisco vPC).

Fig. 2.1 Physical topology

Others:

OVS version: 2.5.6 (contrast group: 2.3.1)
OVS bond mode: balance-slb, with rebalance turned off (rebalance-interval=0)
Linux Kernel: 4.14

3 Troubleshooting

3.1 Confirm: `balance-slb` mode bond forwards unicast flooding packets

As the problem had appeared only recently, we went through all the configuration and software changes made in that period. In the end, we suspected that upgrading OVS from 2.3.1 to 2.5.6 was the most likely cause.

We also found several similar reports [1,2]; unfortunately, they ended without conclusions. At least we were now sure that we were not the only users to encounter this problem.

Digging into the documentation:

When the remote switch has not learned the MAC for the destination of a unicast packet and hence floods the packet to all of the links on the SLB bond, Open vSwitch will forward duplicate packets, one per link, to each other switch port.

Open vSwitch does not solve this problem.

The documentation clearly states that OVS does behave this way.

3.2 Verify/Capture

Next, we wanted to capture an occurrence of the problem. We used the following script to capture packets on an OVS 2.5.6 node:

$ cat capture.sh
#!/bin/bash

# Direction 1: frames received on eth0 and frames sent out of eth1
timeout 36000 tcpdump -i eth0 --direction=in  -s 16 -w eth0-in.pcap &
timeout 36000 tcpdump -i eth1 --direction=out -s 16 -w eth1-out.pcap &

# Direction 2: frames received on eth1 and frames sent out of eth0
timeout 36000 tcpdump -i eth1 --direction=in  -s 16 -w eth1-in.pcap &
timeout 36000 tcpdump -i eth0 --direction=out -s 16 -w eth0-out.pcap &

As the NICs carried a large amount of traffic, we captured only the first 16 bytes of each packet (-s 16), which cover the 14-byte Ethernet header and therefore the source and destination MAC addresses. We also separated the traffic into two directions:

TOR0 -> eth0 -> OVS -> eth1 -> TOR1
TOR1 -> eth1 -> OVS -> eth0 -> TOR0

So there are 4 capture files in total.

According to our observations, the problem was occasional: not all unicast flooding packets were forwarded (this is only an observation; I haven't dug further into it). We ran the capture for 4 hours until the problem happened again on this host (let's call it HostA), which produced about 30GB of pcap files in total.

When the problem happened, MAC fa:16:3e:b2:2e:27 was learned by the TOR from HostA's eth1 interface, although this MAC actually belongs to a container on another host (HostB). So this was unicast flooding traffic (we further confirmed that the dst_mac belongs to a container that is not on HostA).

Check this in the pcap files:

$ tcpdump -e -r eth0-in.pcap | grep fa:16:3e:b2:2e:27
10:40.110642 fa:16:3e:b2:2e:27 (oui Unknown) > fa:16:3e:d1:15:17, length 64: [|vlan]
10:40.110891 fa:16:3e:d1:15:17 (oui Unknown) > fa:16:3e:b2:2e:27, length 78: [|vlan]
10:40.111090 fa:16:3e:b2:2e:27 (oui Unknown) > fa:16:3e:d1:15:17, length 78: [|vlan]
10:40.111118 fa:16:3e:b2:2e:27 (oui Unknown) > fa:16:3e:d1:15:17, length 78: [|vlan]
10:40.111214 fa:16:3e:b2:2e:27 (oui Unknown) > fa:16:3e:d1:15:17, length 78: [|vlan]
10:40.111275 fa:16:3e:b2:2e:27 (oui Unknown) > fa:16:3e:d1:15:17, length 78: [|vlan]
10:40.111299 fa:16:3e:b2:2e:27 (oui Unknown) > fa:16:3e:d1:15:17, length 78: [|vlan]
10:40.292307 fa:16:3e:d1:15:17 (oui Unknown) > fa:16:3e:b2:2e:27, length 78: [|vlan]

$ tcpdump -e -r eth1-out.pcap | grep fa:16:3e:b2:2e:27
10:40.111248 fa:16:3e:b2:2e:27 (oui Unknown) > fa:16:3e:d1:15:17, length 78: [|vlan]

Indeed, the packet was received on eth0, forwarded to eth1, and then sent out. The process looked like Fig. 2.2 (it may not be entirely accurate, as I'm not very familiar with some data center network details):

Fig. 2.2 OVS bond forwarded unicast flooding traffic, resulting in an L2 loop which in turn caused MAC flapping
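
The manual grep above can be generalized: any (source MAC, destination MAC) pair that shows up both in the inbound capture of one NIC and in the outbound capture of the other is a candidate for traffic that OVS forwarded between the bond members. The following is a minimal sketch, assuming its inputs are text files holding `tcpdump -e -r` output in the format shown above. `forwarded_pairs` is a hypothetical helper, and matching on MAC pairs only flags candidates; it does not prove that a specific packet was forwarded.

```shell
#!/bin/bash
# Hypothetical helper: print every "src_mac dst_mac" pair that appears in both
# text files, each containing `tcpdump -e` output (one frame per line).
forwarded_pairs() {
    awk '
        # Return "src dst" for the current line, or "" if it has no MAC pair.
        function pair(    i, dst) {
            for (i = 3; i < NF; i++) {
                if ($i == ">") {
                    dst = $(i + 1)
                    sub(/,$/, "", dst)   # drop the comma after the destination
                    return $2 " " dst
                }
            }
            return ""
        }
        { p = pair() }
        p == "" { next }
        # First file: remember every pair seen on the inbound side.
        FILENAME == ARGV[1] { seen[p] = 1; next }
        # Second file: report each pair that was also seen inbound, once.
        (p in seen) && !(p in printed) { printed[p] = 1; print p }
    ' "$1" "$2"
}

# Usage:
#   tcpdump -e -r eth0-in.pcap  > eth0-in.txt
#   tcpdump -e -r eth1-out.pcap > eth1-out.txt
#   forwarded_pairs eth0-in.txt eth1-out.txt
```

Legitimate traffic can also produce matching pairs, so each reported pair still has to be checked against where the two MACs really live.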

3.3 Why Is `2.3.1` OK?

The balance-slb mode description in the 2.3.1 documentation is the same: it clearly says that unicast flooding will be forwarded. So why did we never encounter this problem with 2.3.1?

Unfortunately, I haven't found an answer to this question. But a code diff might provide some hints:

$ git diff v2.3.1 v2.5.6 -- ofproto/ofproto-dpif-xlate.c | grep -C 2 xlate_normal_flood
+
+static void
+xlate_normal_flood(struct xlate_ctx *ctx, struct xbundle *in_xbundle,
+                   uint16_t vlan)
+{
--
+            } else {
+                xlate_report(ctx, "multicast traffic, flooding");
+                xlate_normal_flood(ctx, in_xbundle, vlan);
+            }
+            return;
--
+            } else {
+                xlate_report(ctx, "MLD query, flooding");
+                xlate_normal_flood(ctx, in_xbundle, vlan);
+            }
+        } else {
--
+                 * be forwarded on all ports */
+                xlate_report(ctx, "RFC4541: section 2.1.2, item 2, flooding");
+                xlate_normal_flood(ctx, in_xbundle, vlan);
+                return;
+            }
--
+            if (mcast_snooping_flood_unreg(ms)) {
+                xlate_report(ctx, "unregistered multicast, flooding");
+                xlate_normal_flood(ctx, in_xbundle, vlan);
+            } else {
+                xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, vlan);
--
+        } else {
+            xlate_report(ctx, "no learned MAC for destination, flooding");
+            xlate_normal_flood(ctx, in_xbundle, vlan);
         }
-        ctx->xout->nf_output_iface = NF_OUT_FLOOD;

We can see that 2.5 did add some flooding-related logic, especially this:

+            xlate_report(ctx, "no learned MAC for destination, flooding");
+            xlate_normal_flood(ctx, in_xbundle, vlan);

3.4 Fixup/Workaround

To stay on 2.5.6, we have to switch the bond to active-backup mode until we have a better solution:

$ ovs-vsctl set Port <bond> lacp=off bond_mode=active-backup

Change back to balance-slb if needed:

$ ovs-vsctl set Port <bond> bond_mode=balance-slb
$ ovs-vsctl set port <bond> other_config:bond-rebalance-interval=0
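
After switching modes, it is worth confirming what the bond actually reports. The sketch below assumes that the output of `ovs-appctl bond/show <bond>` contains a line of the form `bond_mode: <mode>`; `check_bond_mode` is a hypothetical helper name, not an OVS command.

```shell
#!/bin/bash
# Hypothetical helper: read `ovs-appctl bond/show <bond>` output on stdin and
# succeed only if the reported bond mode equals the expected one.
# Assumption: the output contains a line such as "bond_mode: active-backup".
check_bond_mode() {
    awk -v want="$1" '
        $1 == "bond_mode:" { ok = ($2 == want) }
        END { exit !ok }
    '
}

# Usage (replace <bond> with the name of the bond port):
#   ovs-appctl bond/show <bond> | check_bond_mode active-backup && echo "mode OK"
```

The helper fails when no `bond_mode:` line is present at all, so a wrong bond name does not pass silently.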

4 Summary

An OVS bond assumes that all its slave devices (physical NICs) are connected to a single logical switch, which requires vendor-specific stacking technologies such as Cisco vPC. Unfortunately, not all physical networks provide that, ours included.

On the other hand, 2.5.x seems to have broken the 2.3.x behavior, although according to its documentation 2.3.x should already have behaved like 2.5.x.

References


ArthurChiao's Blog