NSX-T Troubleshooting Scenario 1 – Solution

Welcome to the first installment of a new series of NSX-T troubleshooting scenarios. Thanks to everyone who took the time to comment on the first half of the scenario. Today I’ll be performing some troubleshooting and will show how I came to the solution.

Please see the first half for more detail on the problem symptoms and some scoping.

Getting Started

As we saw in the first half, the installation of the NSX-T VIBs was failing with the following error:

nsxt-tshoot1a-5

At first glance, it looked as if the NSX-T VIBs, or an older version of them, were already installed. A closer look at the actual VIB names, however, was very telling: the ‘esx-nsxv’ in the name denotes that these belong to NSX for vSphere (NSX-V).

Logging in to host esx-a3 via SSH and checking for installed VIBs with ‘nsx’ in the name came back with the following:

[root@esx-a3:~] esxcli software vib list |grep nsx
esx-nsxv                       6.5.0-0.0.8590012                     VMware      VMwareCertified   2018-08-31

Indeed, the NSX-V VIBs are still installed. Looking at the rest of the environment, all other traces of NSX-V were gone: the manager, controllers, vmkernel ports, portgroups and Web Client plugin were all missing. For some reason, only the VIBs had been left behind on these three hosts. It’s important to properly remove NSX-V to prevent issues like this from occurring.
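If you have more than a handful of hosts to check, the grep above can be wrapped in a small helper. This is only a minimal sketch: the function name find_nsxv_vibs is my own, and it assumes the only leftover VIB name you care about is the ‘esx-nsxv’ one seen in this scenario. It reads the output of esxcli software vib list on stdin and matches on the exact VIB name in the first column rather than any line containing ‘nsx’.

```shell
#!/bin/sh
# Print the name and version of any NSX-V VIB found in the output of
# `esxcli software vib list` supplied on stdin.
# Exit status: 0 if at least one matching VIB was found, 1 otherwise.
# Assumption: only the VIB name from this scenario (esx-nsxv) is checked.
find_nsxv_vibs() {
    awk '$1 == "esx-nsxv" { print $1, $2; found = 1 } END { exit !found }'
}

# Example invocation on a host:
#   esxcli software vib list | find_nsxv_vibs && echo "NSX-V VIBs still present"
```

Matching the first column exactly avoids false positives from unrelated VIBs that merely contain ‘nsx’ somewhere in their name or description.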

Removing the NSX-V VIBs

The first order of business was to put the hosts in maintenance mode. There were no running VMs yet, so I just went ahead and put all three hosts in maintenance mode at once:

nsxt-tshoot1b-2
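I used the UI for this, but the same thing can be done from the SSH session with esxcli system maintenanceMode. The helper below is a hedged sketch rather than something I ran in this scenario: in_maintenance_mode is a name I made up, and it assumes the get command prints a single line reading either Enabled or Disabled.

```shell
#!/bin/sh
# Return 0 if the text on stdin (expected: the output of
# `esxcli system maintenanceMode get`) reports maintenance mode as enabled.
# Assumption: the command prints exactly "Enabled" or "Disabled".
in_maintenance_mode() {
    grep -qx 'Enabled'
}

# Example invocation on a host:
#   esxcli system maintenanceMode set --enable true
#   esxcli system maintenanceMode get | in_maintenance_mode \
#       || echo "host is NOT in maintenance mode, do not remove VIBs yet"
```

A check like this is handy as a guard in front of any VIB removal if you end up scripting the cleanup across several hosts.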

Once that was done, I could remove the VIBs using the following esxcli software vib command:

[root@esx-a3:~] esxcli software vib remove -n esx-nsxv
Removal Result
   Message: Operation finished successfully.
   Reboot Required: false
   VIBs Installed:
   VIBs Removed: VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012
   VIBs Skipped:
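The field worth paying attention to in that output is ‘Reboot Required’. If you are scripting the removal, it can be pulled out of the result text as sketched below. The function name reboot_required is hypothetical, and the sketch assumes the output keeps the ‘Reboot Required: true|false’ layout shown above.

```shell
#!/bin/sh
# Read the output of `esxcli software vib remove` on stdin and print the
# value of the "Reboot Required" field ("true" or "false").
# Prints nothing if the field is not present in the input.
reboot_required() {
    awk -F': *' '/^ *Reboot Required:/ { print $2 }'
}

# Example invocation on a host:
#   result=$(esxcli software vib remove -n esx-nsxv)
#   [ "$(printf '%s\n' "$result" | reboot_required)" = "true" ] \
#       && echo "reboot this host before retrying the NSX-T install"
```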

Older versions of NSX-V (6.2.x) would require a reboot after VIB removal, but this 6.4.2 deployment didn’t require it. With the VIBs removed, I clicked the ‘Resolve’ button to retry the installation.

nsxt-tshoot1b-3

This time it succeeded.

nsxt-tshoot1b

One other observation was that ESX Agent Manager (EAM) still had an agency for compute-a with a ‘goal state’ of Uninstalled. Had NSX-V been removed properly, this agency would have been removed along with it, so either the VIBs couldn’t be removed at the time or there was something wrong with EAM. Since NSX-V was completely gone, I went ahead and removed the agency. NSX-T does not use ESX Agent Manager for host VIB installation.

Reader Feedback

Quite a few readers picked up on the problem after I posted the first half.

Conclusion

Sometimes there are subtle hints in the error messages that can get you back on track. In this situation, the incomplete removal of NSX-V was to blame. NSX-V and NSX-T cannot coexist on the same ESXi hosts/clusters. Please refer to my guide on properly removing NSX-V here.

I hope this scenario was helpful. If you have any questions or suggestions for future scenarios, please feel free to leave a comment below or reach out to me on Twitter (@vswitchzero).
