Virtual Extensible LAN – or VXLAN – is the key overlay technology that makes much of what NSX does possible. It abstracts the underlying L2/L3 network and allows logical switches to span vast networks and datacenters. To achieve this, each ESXi hypervisor has one or more VTEP vmkernel ports bound to the host’s VXLAN network stack instance.
Your VTEPs are created during VXLAN preparation – normally after preparing your hosts with the NSX VIBs. Doing this in the UI is a straightforward process, but there are some important prerequisites that must be fulfilled before VXLAN networking will work. The most important of these are:
- Your physical networking must be configured for an end-to-end MTU of at least 1600 bytes. In theory 1550 is enough – VXLAN encapsulation adds about 50 bytes of overhead to each frame – but VMware recommends a minimum of 1600 to leave some headroom.
- You must ensure L2 and L3 connectivity between all VTEPs.
- You need to prepare for IP address assignment by either configuring DHCP scopes or IP pools.
- If your replication mode is hybrid, you’ll need to ensure IGMP snooping is configured on each VLAN used by VTEPs.
- Using full Multicast mode? You’ll need IGMP snooping in addition to PIM multicast routing.
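To see where those MTU numbers come from, here is a quick sketch of the encapsulation overhead VXLAN adds to each frame. Note that a tagged outer 802.1Q header would add another 4 bytes on top of this:

```shell
# VXLAN wraps each original frame in new outer headers:
outer_eth=14   # outer Ethernet header
outer_ip=20    # outer IPv4 header
outer_udp=8    # outer UDP header
vxlan_hdr=8    # VXLAN header
overhead=$((outer_eth + outer_ip + outer_udp + vxlan_hdr))
echo "encapsulation overhead: ${overhead} bytes"                  # 50 bytes
echo "minimum MTU for 1500-byte frames: $((1500 + overhead))"     # 1550 bytes
```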
This can sometimes be easier said than done – especially if you have hosts in multiple locations with numerous hops to traverse.
Testing VXLAN VTEP communication is a key troubleshooting skill that every NSX engineer should have in their toolbox. Without healthy VTEP communication and a properly configured underlay network, all bets are off.
I know this is a pretty well covered topic, but I wanted to dive into this a little bit deeper and provide more background around why we test the way we do, and how to draw conclusions from the results.
The VXLAN Network Stack
Multiple network stacks were first introduced in vSphere 6.0 for use with vMotion and other services. There are several benefits to isolating services based on network stacks, but the most practical is a completely independent routing table. This means you can have a different default gateway for vMotion – or in this case VXLAN traffic – than you would for all other management services.
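You can see this separation for yourself by listing the routing table for each stack independently. A quick sketch using the standard esxcli namespace:

```shell
# Routing table for the default stack (management, etc.):
esxcli network ip route ipv4 list
# Routing table for the vxlan stack -- it can have its own default gateway:
esxcli network ip route ipv4 list -N vxlan
```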
Each vmkernel port created on an ESXi host must belong to one and only one network stack. When your cluster is VXLAN prepared, the VTEP kernel ports that get created are automatically assigned to the ‘vxlan’ network stack.
Running the esxcfg-vmknic -l command lists all kernel ports, including their assigned network stack:
[root@esx-a1:~] esxcfg-vmknic -l
Interface  Port Group/DVPort/Opaque Network  IP Family  IP Address    Netmask        Broadcast      MAC Address        MTU   TSO MSS  NetStack
vmk0       7                                 IPv4       172.16.1.21   255.255.255.0  172.16.1.255   00:25:90:0b:1e:12  1500  65535    defaultTcpipStack
vmk1       13                                IPv4       172.16.98.21  255.255.255.0  172.16.98.255  00:50:56:65:59:a8  9000  65535    defaultTcpipStack
vmk2       22                                IPv4       172.16.11.21  255.255.255.0  172.16.11.255  00:50:56:63:d9:72  1500  65535    defaultTcpipStack
vmk4       vmservice-vmknic-pg               IPv4       169.254.1.1   255.255.255.0  169.254.1.255  00:50:56:61:7a:23  1500  65535    defaultTcpipStack
vmk3       52                                IPv4       172.16.76.22  255.255.255.0  172.16.76.255  00:50:56:6b:e4:94  1600  65535    vxlan
Notice that all kernel ports belong to the ‘defaultTcpipStack’ except for vmk3, which lists vxlan. You can view the netstacks currently enabled on your host using the esxcli network ip netstack list command:
[root@esx-a1:~] esxcli network ip netstack list
defaultTcpipStack
   Key: defaultTcpipStack
   Name: defaultTcpipStack
   State: 4660
vxlan
   Key: vxlan
   Name: vxlan
   State: 4660
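With the stack and the VTEP kernel port identified, VTEP-to-VTEP communication can be tested from within the vxlan netstack. A minimal sketch, assuming 172.16.76.21 is a remote host’s VTEP IP (a hypothetical address; substitute one from esxcfg-vmknic -l on another prepared host):

```shell
# Basic VTEP reachability, sourced from the vxlan netstack:
vmkping ++netstack=vxlan 172.16.76.21
# MTU validation: -d forbids fragmentation, and -s 1572 sizes the ICMP
# payload so the full packet is 1600 bytes (1572 + 8 ICMP + 20 IP headers).
vmkping ++netstack=vxlan -d -s 1572 172.16.76.21
```

If the first ping succeeds but the larger non-fragmentable ping fails, you almost certainly have an MTU problem somewhere in the underlay path.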