NSX-T Troubleshooting Scenario 1 – Solution

Welcome to the first installment of a new series of NSX-T troubleshooting scenarios. Thanks to everyone who took the time to comment on the first half of the scenario. Today I’ll be performing some troubleshooting and will show how I came to the solution.

Please see the first half for more detail on the problem symptoms and some scoping.

Getting Started

As we saw in the first half, the installation of the NSX-T VIBs were failing with the following error:

nsxt-tshoot1a-5

At first glance, it looked as if the NSX-T VIBs, or an older version of them were already installed. Taking a closer look at the actual VIB names, however, was very telling. The ‘esx-nsxv’ in the name denotes that these belong to NSX for vSphere.

Logging in to host esx-a3 via SSH and checking for installed VIBs with ‘nsx’ in the name came back with the following:

[root@esx-a3:~] esxcli software vib list |grep nsx
esx-nsxv                       6.5.0-0.0.8590012                     VMware      VMwareCertified   2018-08-31

Indeed, the NSX-V VIBs are still installed. Having a look at the environment, we saw that all other traces of NSX-V were gone – the manager, controllers, vmkernel ports, portgroups and Web Client plugin were missing. Only these lingering VIBs were not removed from these three hosts for some reason. It’s important to properly remove NSX to prevent issues like this from occurring.

Removing the NSX-V VIBs

The first order of business was to put the host in maintenance mode. I didn’t have any running VMs created yet, so I just went ahead and put all three in maintenance mode:

nsxt-tshoot1b-2

Once that was done, I could remove the VIBs using the following esxcli software vib command:

Continue reading “NSX-T Troubleshooting Scenario 1 – Solution”

NSX-T Troubleshooting Scenario 1

Welcome to the first NSX-T troubleshooting scenario! My NSX-V troubleshooting scenarios have been well received, so I thought it was time to start a new series for NSX-T. If you’ve got an idea for a scenario, please let me know!

What I hope to do in these posts is share some of the common issues I run across from day to day. Each scenario will be a two-part post. The first will be an outline of the symptoms and problem statement along with bits of information from the environment. The second will be the solution, including the troubleshooting and investigation I did to get there.

The Scenario

As always, we’ll start with a brief problem statement:

“I removed NSX for vSphere from my lab environment and am trying to install NSX-T for a proof of concept. Unfortunately, I get an error message every time I try to install the NSX-T VIBs on my ESXi hosts! I’m running NSX-T 2.3.1, and ESXi 6.5 U2”

In the NSX-T UI, we’re greeted with a simple “NSX Install Failed” message for the host esx-a3:

nsxt-tshoot1a

Clicking on this error gives us a much more verbose error message:

nsxt-tshoot1a-5

The full text of the error message is as follows:

NSX components not installed successfully on compute-manager discovered node. Failed to install software on host. Failed to install software on host. esx-a3.vswitchzero.net : java.rmi.RemoteException: [DependencyError] File path of '/bin/net-vdl2' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/bin/vsip_vm_list.sh' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/etc/vmware/firewall/netCPRuleset.xml' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/bin/vsipioctl' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/usr/lib/vmware/vm-support/bin/dump-vdr-info.sh' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/bin/net-vdr' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/etc/vmsyslog.conf.d/dfwpktlogs.conf' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/etc/init.d/netcpad' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/usr/lib/vmware/netcpa/bin/netcpa' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/bin/dfwpktlogs.sh' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/etc/vmware/firewall/bfdRuleset.xml' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/etc/vmware/vm-support/dfw.mfx' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} Please refer to the log file for more details.

Clicking on the RESOLVE button simply tries the install again, which fails.

Continue reading “NSX-T Troubleshooting Scenario 1”

Manual Installation of NSX-T Kernel Modules in ESXi

Last week, I discussed the manual deployment of NSX-T controller nodes. Today, I’ll take a look at adding standalone ESXi hosts.

Although people usually associate manual deployment with KVM hypervisors, there is no reason you can’t do the same with ESXi hosts. Obviously, automating this process with vCenter Server as a compute manager has its advantages, but one of the empowering features of NSX-T is that is has no dependency on vCenter Server whatsoever.

Obtaining the ESXi VIBs

First, we’ll need to download the ESXi host VIBs. In my case, the hosts are running ESXi 6.5 U2, so I downloaded the correct 6.5 VIBs from the NSX-T download site.

nsxt-manualvib-1

Once I had obtained the ZIP file, I used WinSCP to copy it to the /tmp location on my ESXi host. The file is only a few megabytes in size so it can go just about anywhere. If you’ve got a lot of hosts to do, putting it in a shared datastore makes sense.

Installing the ESXi VIBs

Because the NSX-T kernel module is comprised of a number of VIBs, we need to install it as an ‘offline depot’ as opposed to individual VIB files. That said, there is no need to extract the ZIP file. To install it, I used the esxcli software vib install command as shown below:

[root@esx-a3:/tmp] esxcli software vib install --depot=/tmp/nsx-lcp-2.3.1.0.0.11294289-esx65.zip
Installation Result
   Message: Operation finished successfully.
   Reboot Required: false
   VIBs Installed: VMware_bootbank_epsec-mux_6.5.0esx65-9272189, VMware_bootbank_nsx-aggservice_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-cli-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-common-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-da_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337, VMware_bootbank_nsx-exporter_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-host_2.3.1.0.0-6.5.11294289, VMware_bootbank_nsx-metrics-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-mpa_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-nestdb-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-nestdb_2.3.1.0.0-6.5.11294421, VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485, VMware_bootbank_nsx-opsagent_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-platform-client_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-profiling-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-proxy_2.3.1.0.0-6.5.11294520, VMware_bootbank_nsx-python-gevent_1.1.0-9273114, VMware_bootbank_nsx-python-greenlet_0.4.9-9272996, VMware_bootbank_nsx-python-logging_2.3.1.0.0-6.5.11294409, VMware_bootbank_nsx-python-protobuf_2.6.1-9273048, VMware_bootbank_nsx-rpc-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-sfhc_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-shared-libs_2.3.0.0.0-6.5.10474844, VMware_bootbank_nsxcli_2.3.1.0.0-6.5.11294343
   VIBs Removed:
   VIBs Skipped:

Remember, your host will need to be in maintenance mode for the installation to succeed. Once finished, a total of 24 new VIBs were installed as shown:

[root@esx-a3:/tmp] esxcli software vib list |grep -i nsx
nsx-aggservice                 2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-cli-libs                   2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-common-libs                2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-da                         2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-esx-datapath               2.3.1.0.0-6.5.11294337                VMware      VMwareCertified   2019-02-15
nsx-exporter                   2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-host                       2.3.1.0.0-6.5.11294289                VMware      VMwareCertified   2019-02-15
nsx-metrics-libs               2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-mpa                        2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-nestdb-libs                2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-nestdb                     2.3.1.0.0-6.5.11294421                VMware      VMwareCertified   2019-02-15
nsx-netcpa                     2.3.1.0.0-6.5.11294485                VMware      VMwareCertified   2019-02-15
nsx-opsagent                   2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-platform-client            2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-profiling-libs             2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-proxy                      2.3.1.0.0-6.5.11294520                VMware      VMwareCertified   2019-02-15
nsx-python-gevent              1.1.0-9273114                         VMware      VMwareCertified   2019-02-15
nsx-python-greenlet            0.4.9-9272996                         VMware      VMwareCertified   2019-02-15
nsx-python-logging             2.3.1.0.0-6.5.11294409                VMware      VMwareCertified   2019-02-15
nsx-python-protobuf            2.6.1-9273048                         VMware      VMwareCertified   2019-02-15
nsx-rpc-libs                   2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-sfhc                       2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-shared-libs                2.3.0.0.0-6.5.10474844                VMware      VMwareCertified   2019-02-15
nsxcli                         2.3.1.0.0-6.5.11294343                VMware      VMwareCertified   2019-02-15

You can find information on the purpose of some of these VIBs in the NSX-T documentation.

Connecting the ESXi Host to the Management Plane

Now that we have the required software installed, we need to connect the ESXi host to NSX Manager. To begin, we’ll need to get the certificate thumbprint from the NSX Manager:

nsxmanager> get certificate api thumbprint
ccdbda93573cd1dbec386b620db52d5275c4a76a5120087a174d00d4508c1493

Next, we need to drop into the nsxcli shell from the ESXi CLI prompt, and then run the join management-plane command as shown below:

[root@esx-a3] # nsxcli
esx-a3> join management-plane 172.16.1.40 username admin thumbprint ccdbda93573cd1dbec386b620db52d5275c4a76a5120087a174d00d4508c1493
Password for API user: ********
Node successfully registered as Fabric Node: 0b08c694-3155-11e9-8a6c-0f1235732823

If all went well, we should now see our NSX Manager listed as connected:

esx-a3> get managers
- 172.16.1.40      Connected

From the root prompt of the ESXi host, we can see that there are now established TCP connections to the NSX Manager appliance on the RabbitMQ port 5671.

[root@esx-a3:/tmp] esxcli network ip connection list |grep 5671
tcp         0       0  172.16.1.23:55477   172.16.1.40:5671    ESTABLISHED     84232  newreno  mpa
tcp         0       0  172.16.1.23:36956   172.16.1.40:5671    ESTABLISHED     84232  newreno  mpa

From the NSX UI, we can now see the host appear as connected under ‘Standalone Hosts’:

nsxt-manualvib-3

As a next step, you’ll want to add this new host as a transport node and you should be good to go.

It’s great to have the flexibility to do this completely without the assistance of vCenter Server. Anyone who has had to deal with the quirks of VC integration and ESX Agent Manager (EAM) in NSX-V will certainly appreciate this.

 

NSX-T PCPU Requirements for Edges

New CPU requirements for NSX-T may leave older lab hardware out in the cold.

If you are running old hardware in your lab, you may have come across an unexpected failure while deploying your first NSX-T edge VM.

nsxt-aes-edge-1

The exact error message will be something similar to:

“[Fabric] Edge <uuid> is not ready for configuration error occurred, error detail is NSX Edge configuration has failed. The host does not support required cpu features: [‘aes’].”

The edge will be successfully deployed, but will remain ‘unconfigured’ and will not allow you to add it as a transport node.

The ‘aes’ feature being referred to is Intel’s AES-NI acceleration for cryptography. You can find out more about AES-NI here. In NSX-V, AES-NI was optionally supported for offloading cryptography for VPN related features. It seems that this has now become a hard requirement for NSX-T.

Unfortunately, like vSphere 6.7, NSX-T has minimum CPU requirements that can’t be worked around. If you have a browse through the NSX-T system requirements, you’ll find a note about CPU compatibility in the “NSX Edge VM and Bare-Metal NSX Edge CPU Requirements” section. Listed there is reference to:

  • Xeon 56xx (Westmere-EP)
  • Xeon E7-xxxx (Westmere-EX and later CPU generation)
  • Xeon E5-xxxx (Sandy Bridge and later CPU generation)

This means that anything released prior to 2011 is unlikely to work, with the exception of a few Westermere EP based Xeons, which seem to have spotty success. On the AMD front, it appears that even CPUs with AES instructions will fail similarly due to a CPU compatibility check that is done during edge deployment.

My management host uses Xeon E5-2670s, which work fine, but my compute cluster uses very old Xeon X3440s that came out before AES-NI was introduced. Now that I can’t run vSphere 6.7 or an NSX-T edge on these hosts, I think it may finally be time to upgrade.

Unfortunately, it doesn’t appear that there is a workaround for this problem. If anyone does come across a way to avoid this, please let me know!

Deploying NSX-T Controllers Manually

Deploying an NSX-T control cluster manually for maximum control and flexibility.

One of the great things about NSX-T is its complete independence from vCenter Server. You can still link to vCenter Server if you’d like to automate certain tasks, but unlike NSX-V, you can accomplish many deployment tasks manually. One of the firsts things you’ll be doing in a new NSX-T setup is to deploy your control cluster.

Although automated deployment through vCenter and the UI is convenient, there are some additional benefits to manual controller deployment. Firstly, you can select a non-production ‘small’ sized form factor that isn’t selectable in the UI saving you a couple of vCPUs and about 8GB of RAM per appliance. Secondly, deploying manually also allows you to thin-provision your controller VMDKs off the bat. In a home lab, these are some desirable benefits. And of course, there is always the satisfaction you get from running through the process manually and better understanding what happens behind the scenes.

NSXT-controllerdeploy-2

As seen above, the automated controller deployment wizard does not allow the selection of a ‘Small’ form factor.

Deploying Controllers

To begin, you’ll need to download the NSX-T controller OVA. You’ll find it listed along with the other NSX-T deliverables on the download page.

NSXT-controllerdeploy-1

There are a few different ways that you can deploy the OVA including with ovftool. I’m just going to use the vSphere Client for this example. As you can see below, we can now select an unsupported ‘Small’ form-factor deployment:

NSXT-controllerdeploy-3

In addition to this, you’ll get the usual template customization options along with a few new ones you may not have seen listed under ‘Internal Properties’:

NSXT-controllerdeploy-4

As you probably have guessed these internal properties can be used to save some of the work needed to get it connected to the management plane and to the control cluster. I’m going to skip this entire section and run through the process manually from the CLI post-deployment.

Continue reading “Deploying NSX-T Controllers Manually”