ARP – vswitchzero

One of the best features of the DFW is the flexibility it provides in using objects in rules instead of IP addresses or groups of IP addresses. For example, for a source/destination you could use a VM in the inventory, a cluster or a security group containing all sorts of dynamic criteria. Underneath all of this, however, NSX needs to be able to inspect segment and packet headers to enforce the rules. These headers are only going to contain identifying information like IP addresses and TCP ports so it must keep track of which object is associated with which IP address or addresses. And because of the ‘distributed’ nature of the DFW, each of these translations must ultimately reach the ESXi hosts for enforcement.

There are three ways in which NSX can associate IPs with VMs – VMware Tools reporting, ARP snooping and DHCP snooping. The latter two are disabled by default.

ipdiscovery-1

In recent builds of NSX, you can see the detection types enabled in the host preparation section. As can be seen above, DHCP and ARP snooping are disabled by default leaving only VMware Tools address reporting.

VMware Tools Reporting

As you have probably noticed, VMs with VMware Tools installed conveniently report their configured IP addresses in the vSphere Client.

tshoot12a-4

Virtual machine linux-a2 is reporting 172.16.15.10 as well as an IPv6 address on the summary tab in the vSphere Client. This information comes from VMware Tools and will be recorded in the NSX Manager database. Whenever we use a rule that references the VM linux-a2, NSX will look up this IP address for rule enforcement. These rules could contain a parent object, like the cluster compute-a, or a security group, a logical switch – anything that linux-a2 belongs to.

Continue reading “Understanding NSX IP Discovery”

ARP suppression is one of the key fundamental features in NSX that helps to make the product scalable. By intercepting ARP requests from VMs before they are broadcast out on a logical switch, the hypervisor can do a simple ARP lookup in its own cache or on the NSX control cluster. If an ARP entry exists on the host or control cluster, the hypervisor can respond directly, avoiding a costly broadcast that would likely need to be replicated to many hosts.

ARP Suppression has existed in NSX since the beginning, but it was only available for VMs connected to logical switches. Up until NSX 6.2.4, the DLR kernel module did not benefit from ARP suppression and every non-cached entry needed to be broadcast out. Unfortunately, the DLR – like most routers – needs to ARP frequently. This can be especially true due to the easy L3 separation that NSX allows using logical switches and efficient east-west DLR routing.

Despite having code in the 6.2.4 and later version DLRs to take advantage of ARP suppression, a large number of deployments are likely not actually taking advantage of this feature due to a recently identified problem.

VMware KB 51709 briefly describes this issue, and makes note of the following conditions:

“DLR ARP Suppression may not be effective under some conditions which can result in a larger volume of ARP traffic than expected. ARP traffic sent by a DLR will not be suppressed if an ESXi host has more than one active port connected to the destination VNI, for example the DLR port and one or more VM vNICs.”

What isn’t clear in the KB article, but can be inferred based on the solution is that the problem is related to VLAN tagging on logical switch dvPortgroups. Any dvPortgroup associated with a logical switch with a VLAN ID specified is impacted by this problem.

Continue reading “The NSX DLR and ARP Suppression”