If you are running Auto Deploy and noticed your VMs didn’t have connectivity after a host reboot or upgrade, you may have run into the problem described in VMware KB 52903. I’ve seen this a few times now with different customers and thought a PSA may be in order. You can find all the key details in the KB, but I thought I’d add some extra context here to help anyone who may want more information.
I recently helped to author VMware KB 52903, which has just been made public. Essentially, it describes a race condition causing a host to come up without its vdrPort connected to the distributed switch. The vdrPort is an important component on an ESXi host that funnels traffic to/from the NSX DLR module. If this port isn’t connected, traffic can’t make it to the DLR for east/west routing on that host. Technically, VMs in the same logical switches will be able to communicate across hosts, but none of the VMs on this impacted host will be able to route.
The race condition occurs when the DVS registration of the host occurs too late in the boot process. Normally, the distributed switch should be initialized and registered long before the vdrPort gets connected. In some situations, however, DVS registration can be late. Obviously, if the host isn’t yet initialized/registered with the distributed switch, any attempt to connect something to it will fail. And this is exactly what happens.
Using the log lines from KB 52903 as an example, we can see that the host attempts to add the vdrPort to the distributed switch at 23:44:19:
2018-02-08T23:44:19.431Z error netcpa[3FFEDA29700] [Originator@6876 sub=Default] Failed to add vdr port on dvs 96 ff 2c 50 0e 5d ed 4a-e0 15 b3 36 90 19 41 5d, Not found
The reason the operation fails is because the DVS switch with the UUID specified is not found from the perspective of this host. It simply hasn’t been initialized yet. A few moments later, the DVS is finally ready for use on the host. Notice the time stamps – you can see the registration of the DVS about 9 seconds later:
2018-02-08T23:44:28.389Z info hostd[4F540B70] [Originator@6876 sub=Hostsvc.DvsTracker] Registered Dvs 96 ff 2c 50 0e 5d ed 4a-e0 15 b3 36 90 19 41 5d
The above message can be found in /var/log/hostd.log.