New Upgrade Issue in NSX 6.4.4

Be sure to check out VMware KB 67416 before upgrading to 6.4.4.

If you are planning to upgrade to NSX 6.4.4, be sure to have a look at VMware KB 67416 before you do. I’ve seen several customers hit this issue now, and a bit of pre-work before the upgrade can save you a lot of grief.

It appears that if you are using grouping objects, like security groups or IP sets, in your ESG firewall rules, there is a chance that your ESG will become unmanageable after NSX Manager is upgraded to 6.4.4. Most customers notice this issue when they go to upgrade their ESGs as part of the upgrade process and the tasks fail. In addition to the edge upgrade failing, any configuration changes you attempt to make to the edge will also fail.

The issue lies in the message bus communication channel between NSX Manager and the ESG. These security groups and IP sets trigger a large number of messages, and eventually the channel becomes blocked as a result. Unfortunately, there is no workaround aside from removing these groups and IP sets from the edge firewall before upgrading, which won't be a feasible option for the majority of customers out there.
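
If you want to confirm whether an edge is affected before you schedule the upgrade, one quick check is to pull the edge firewall configuration from the NSX API and search it for grouping object references. A rough sketch only; the manager hostname and edge ID below are examples, so substitute your own:

# Look for security group / IP set IDs referenced by the ESG firewall rules.
curl -k -u admin https://nsxmanager.lab.local/api/4.0/edges/edge-1/firewall/config \
  | grep -oE 'securitygroup-[0-9]+|ipset-[0-9]+' | sort -u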

Although not a common configuration, this issue can also be triggered if DFW rules are applied to ESGs and these rules contain grouping objects.

If you know your environment is configured with security groups and IP sets in the edge firewall, I’d recommend reaching out to VMware technical support prior to beginning your upgrade. Support can proactively install a “hot patch” so that you won’t hit this problem. If you have already hit it, the same hot patch can be applied to get you back up and running. In order for the patch to take effect, the ESG has to be re-deployed, leading to a brief outage. Obviously, getting in front of this issue is a better plan than being reactive.

VMware will be updating the 6.4.4 release notes to reflect this.

NSX-T Troubleshooting Scenario 1 – Solution

Welcome to the first installment of a new series of NSX-T troubleshooting scenarios. Thanks to everyone who took the time to comment on the first half of the scenario. Today I’ll be performing some troubleshooting and will show how I came to the solution.

Please see the first half for more detail on the problem symptoms and some scoping.

Getting Started

As we saw in the first half, the installation of the NSX-T VIBs was failing with the following error:

nsxt-tshoot1a-5

At first glance, it looked as if the NSX-T VIBs, or an older version of them, were already installed. Taking a closer look at the actual VIB names, however, was very telling. The ‘esx-nsxv’ in the name denotes that these belong to NSX for vSphere.

Logging in to host esx-a3 via SSH and checking for installed VIBs with ‘nsx’ in the name came back with the following:

[root@esx-a3:~] esxcli software vib list |grep nsx
esx-nsxv                       6.5.0-0.0.8590012                     VMware      VMwareCertified   2018-08-31

Indeed, the NSX-V VIBs are still installed. Having a look at the environment, we saw that all other traces of NSX-V were gone – the manager, controllers, vmkernel ports, portgroups and Web Client plugin had all been removed. For some reason, only these lingering VIBs remained on these three hosts. It’s a good reminder of why it’s important to properly remove NSX for vSphere to prevent issues like this from occurring.

Removing the NSX-V VIBs

The first order of business was to put the hosts in maintenance mode. I didn’t have any VMs running on them yet, so I just went ahead and put all three in maintenance mode:

nsxt-tshoot1b-2
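
If you prefer to do this from an SSH session rather than the UI, the same thing can be accomplished with esxcli. This is only safe to run directly like this because the hosts had no running VMs to evacuate:

# Enter maintenance mode and confirm the state afterwards.
esxcli system maintenanceMode set --enable true
esxcli system maintenanceMode get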

Once that was done, I could remove the VIBs using the following esxcli software vib command:
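
(A sketch, based on the esx-nsxv VIB from the listing above. If your hosts also show other NSX-V VIBs, such as esx-vsip, those would need to be removed as well.)

# Remove the lingering NSX-V VIB; check the 'Reboot Required' line in the output.
esxcli software vib remove -n esx-nsxv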

Continue reading “NSX-T Troubleshooting Scenario 1 – Solution”

Manual Installation of NSX-T Kernel Modules in ESXi

Last week, I discussed the manual deployment of NSX-T controller nodes. Today, I’ll take a look at adding standalone ESXi hosts.

Although people usually associate manual deployment with KVM hypervisors, there is no reason you can’t do the same with ESXi hosts. Obviously, automating this process with vCenter Server as a compute manager has its advantages, but one of the empowering features of NSX-T is that it has no dependency on vCenter Server whatsoever.

Obtaining the ESXi VIBs

First, we’ll need to download the ESXi host VIBs. In my case, the hosts are running ESXi 6.5 U2, so I downloaded the correct 6.5 VIBs from the NSX-T download site.

nsxt-manualvib-1

Once I had obtained the ZIP file, I used WinSCP to copy it to the /tmp location on my ESXi host. The file is only a few megabytes in size so it can go just about anywhere. If you’ve got a lot of hosts to do, putting it in a shared datastore makes sense.
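
If you'd rather skip WinSCP, scp from any machine with an SSH client works just as well, provided SSH is enabled on the host. The local path below is just an example:

# Copy the offline depot ZIP to the host's /tmp directory.
scp ./nsx-lcp-2.3.1.0.0.11294289-esx65.zip root@esx-a3:/tmp/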

Installing the ESXi VIBs

Because the NSX-T kernel module is composed of a number of VIBs, we need to install it as an ‘offline depot’ as opposed to individual VIB files. This means there is no need to extract the ZIP file. To install it, I used the esxcli software vib install command as shown below:

[root@esx-a3:/tmp] esxcli software vib install --depot=/tmp/nsx-lcp-2.3.1.0.0.11294289-esx65.zip
Installation Result
   Message: Operation finished successfully.
   Reboot Required: false
   VIBs Installed: VMware_bootbank_epsec-mux_6.5.0esx65-9272189, VMware_bootbank_nsx-aggservice_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-cli-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-common-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-da_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337, VMware_bootbank_nsx-exporter_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-host_2.3.1.0.0-6.5.11294289, VMware_bootbank_nsx-metrics-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-mpa_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-nestdb-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-nestdb_2.3.1.0.0-6.5.11294421, VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485, VMware_bootbank_nsx-opsagent_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-platform-client_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-profiling-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-proxy_2.3.1.0.0-6.5.11294520, VMware_bootbank_nsx-python-gevent_1.1.0-9273114, VMware_bootbank_nsx-python-greenlet_0.4.9-9272996, VMware_bootbank_nsx-python-logging_2.3.1.0.0-6.5.11294409, VMware_bootbank_nsx-python-protobuf_2.6.1-9273048, VMware_bootbank_nsx-rpc-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-sfhc_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-shared-libs_2.3.0.0.0-6.5.10474844, VMware_bootbank_nsxcli_2.3.1.0.0-6.5.11294343
   VIBs Removed:
   VIBs Skipped:

Remember, your host will need to be in maintenance mode for the installation to succeed. Once finished, a total of 24 new VIBs were installed as shown:

[root@esx-a3:/tmp] esxcli software vib list |grep -i nsx
nsx-aggservice                 2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-cli-libs                   2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-common-libs                2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-da                         2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-esx-datapath               2.3.1.0.0-6.5.11294337                VMware      VMwareCertified   2019-02-15
nsx-exporter                   2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-host                       2.3.1.0.0-6.5.11294289                VMware      VMwareCertified   2019-02-15
nsx-metrics-libs               2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-mpa                        2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-nestdb-libs                2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-nestdb                     2.3.1.0.0-6.5.11294421                VMware      VMwareCertified   2019-02-15
nsx-netcpa                     2.3.1.0.0-6.5.11294485                VMware      VMwareCertified   2019-02-15
nsx-opsagent                   2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-platform-client            2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-profiling-libs             2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-proxy                      2.3.1.0.0-6.5.11294520                VMware      VMwareCertified   2019-02-15
nsx-python-gevent              1.1.0-9273114                         VMware      VMwareCertified   2019-02-15
nsx-python-greenlet            0.4.9-9272996                         VMware      VMwareCertified   2019-02-15
nsx-python-logging             2.3.1.0.0-6.5.11294409                VMware      VMwareCertified   2019-02-15
nsx-python-protobuf            2.6.1-9273048                         VMware      VMwareCertified   2019-02-15
nsx-rpc-libs                   2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-sfhc                       2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-shared-libs                2.3.0.0.0-6.5.10474844                VMware      VMwareCertified   2019-02-15
nsxcli                         2.3.1.0.0-6.5.11294343                VMware      VMwareCertified   2019-02-15

You can find information on the purpose of some of these VIBs in the NSX-T documentation.
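
You can also query an individual VIB right on the host to see its summary, description, and dependencies. For example, for the datapath module:

# Show details for a single installed VIB.
esxcli software vib get -n nsx-esx-datapath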

Connecting the ESXi Host to the Management Plane

Now that we have the required software installed, we need to connect the ESXi host to NSX Manager. To begin, we’ll need to get the certificate thumbprint from the NSX Manager:

nsxmanager> get certificate api thumbprint
ccdbda93573cd1dbec386b620db52d5275c4a76a5120087a174d00d4508c1493

Next, we need to drop into the nsxcli shell from the ESXi CLI prompt, and then run the join management-plane command as shown below:

[root@esx-a3] # nsxcli
esx-a3> join management-plane 172.16.1.40 username admin thumbprint ccdbda93573cd1dbec386b620db52d5275c4a76a5120087a174d00d4508c1493
Password for API user: ********
Node successfully registered as Fabric Node: 0b08c694-3155-11e9-8a6c-0f1235732823

If all went well, we should now see our NSX Manager listed as connected:

esx-a3> get managers
- 172.16.1.40      Connected

From the root prompt of the ESXi host, we can see that there are now established TCP connections to the NSX Manager appliance on the RabbitMQ port 5671.

[root@esx-a3:/tmp] esxcli network ip connection list |grep 5671
tcp         0       0  172.16.1.23:55477   172.16.1.40:5671    ESTABLISHED     84232  newreno  mpa
tcp         0       0  172.16.1.23:36956   172.16.1.40:5671    ESTABLISHED     84232  newreno  mpa
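
The registration can also be verified from the NSX Manager side with a quick API call. This lists all registered fabric nodes, and the new host should appear with the UUID returned by the join command (curl will prompt for the admin password; -k is used here because of the self-signed certificate):

curl -k -u admin https://172.16.1.40/api/v1/fabric/nodes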

From the NSX UI, we can now see the host appear as connected under ‘Standalone Hosts’:

nsxt-manualvib-3

As a next step, you’ll want to add this new host as a transport node and you should be good to go.

It’s great to have the flexibility to do this completely without the assistance of vCenter Server. Anyone who has had to deal with the quirks of VC integration and ESX Agent Manager (EAM) in NSX-V will certainly appreciate this.

 

NSX-T PCPU Requirements for Edges

New CPU requirements for NSX-T may leave older lab hardware out in the cold.

If you are running old hardware in your lab, you may have come across an unexpected failure while deploying your first NSX-T edge VM.

nsxt-aes-edge-1

The exact error message will be something similar to:

“[Fabric] Edge <uuid> is not ready for configuration error occurred, error detail is NSX Edge configuration has failed. The host does not support required cpu features: [‘aes’].”

The edge will be successfully deployed, but will remain ‘unconfigured’ and will not allow you to add it as a transport node.

The ‘aes’ feature being referred to is Intel’s AES-NI acceleration for cryptography. You can find out more about AES-NI here. In NSX-V, AES-NI was optionally supported for offloading cryptography for VPN-related features. It seems that this has now become a hard requirement in NSX-T.
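
If you're not sure whether a given box exposes AES-NI, the flag is easy to check from any Linux OS running on that hardware, for example a KVM transport host or a Linux VM without CPU feature masking. For ESXi hosts, looking the CPU model up on Intel's ARK site is the simplest route. A quick sketch for the Linux case:

# Prints 'aes' if the CPU advertises AES-NI; otherwise prints the fallback message.
grep -m1 -wo aes /proc/cpuinfo || echo "AES-NI not reported by this CPU"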

Unfortunately, like vSphere 6.7, NSX-T has minimum CPU requirements that can’t be worked around. If you have a browse through the NSX-T system requirements, you’ll find a note about CPU compatibility in the “NSX Edge VM and Bare-Metal NSX Edge CPU Requirements” section. Listed there is reference to:

  • Xeon 56xx (Westmere-EP)
  • Xeon E7-xxxx (Westmere-EX and later CPU generation)
  • Xeon E5-xxxx (Sandy Bridge and later CPU generation)

This means that anything released prior to 2011 is unlikely to work, with the exception of a few Westmere-EP based Xeons, which seem to have spotty success. On the AMD front, it appears that even CPUs with AES instructions will fail similarly due to a CPU compatibility check that is done during edge deployment.

Update: Commenter Ben Kenobi figured out a workaround to get edges to deploy on modern AMD platforms! You can find his workaround discussed below in the comments as well as on his blog here.

My management host uses Xeon E5-2670s, which work fine, but my compute cluster uses very old Xeon X3440s that came out before AES-NI was introduced. Now that I can’t run vSphere 6.7 or an NSX-T edge on these hosts, I think it may finally be time to upgrade.

Unfortunately, for older Intel CPUs without AES-NI there doesn’t appear to be a workaround for this problem. If anyone does come across a way to avoid this, please let me know!

Deploying NSX-T Controllers Manually

Deploying an NSX-T control cluster manually for maximum control and flexibility.

One of the great things about NSX-T is its complete independence from vCenter Server. You can still link to vCenter Server if you’d like to automate certain tasks, but unlike NSX-V, you can accomplish many deployment tasks manually. One of the first things you’ll be doing in a new NSX-T setup is deploying your control cluster.

Although automated deployment through vCenter and the UI is convenient, there are some additional benefits to manual controller deployment. Firstly, you can select a non-production ‘small’ form factor that isn’t selectable in the UI, saving you a couple of vCPUs and about 8GB of RAM per appliance. Secondly, deploying manually allows you to thin-provision your controller VMDKs right off the bat. In a home lab, these are desirable benefits. And of course, there is always the satisfaction of running through the process manually and better understanding what happens behind the scenes.

NSXT-controllerdeploy-2

As seen above, the automated controller deployment wizard does not allow the selection of a ‘Small’ form factor.

Deploying Controllers

To begin, you’ll need to download the NSX-T controller OVA. You’ll find it listed along with the other NSX-T deliverables on the download page.

NSXT-controllerdeploy-1

There are a few different ways that you can deploy the OVA including with ovftool. I’m just going to use the vSphere Client for this example. As you can see below, we can now select an unsupported ‘Small’ form-factor deployment:

NSXT-controllerdeploy-3

In addition to this, you’ll get the usual template customization options along with a few new ones you may not have seen listed under ‘Internal Properties’:

NSXT-controllerdeploy-4

As you’ve probably guessed, these internal properties can be used to save some of the work needed to get the controller connected to the management plane and the control cluster. I’m going to skip this entire section and run through the process manually from the CLI post-deployment.
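
If you'd rather script the OVA deployment with ovftool instead of the vSphere Client, something along these lines should work. Treat it as a sketch: the deployment option ID, source network name, datastore, OVA filename, and inventory path are all assumptions for a lab, and running ovftool against just the OVA file will print the valid deployment options and network names for your build.

# Deploy the controller OVA with the 'Small' form factor and thin-provisioned disks.
ovftool --acceptAllEulas --powerOn \
  --name=nsxt-controller-1 \
  --deploymentOption=small \
  --diskMode=thin \
  --datastore=vsanDatastore \
  --net:"Network 1"="DPG-Mgmt" \
  nsx-controller-<version>.ova \
  vi://administrator%40vsphere.local@vcenter.lab.local/LabDC/host/Compute-Cluster/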

Continue reading “Deploying NSX-T Controllers Manually”

Cisco nenic Driver Issue During NSX Upgrades

The nenic driver versions prior to 1.0.11.0 may cause an outage during NSX upgrades.

If you are planning an NSX upgrade in a Cisco UCS environment, pay close attention to your ‘nenic’ driver version before you begin. The nenic driver is the new native driver replacement for the older vmklinux enic driver. It’s used exclusively for the Cisco VIC adapters found in UCS systems and is now the default in vSphere 6.5 and 6.7.

We’ve seen several instances now where Cisco VIC adapters can go link-down in an error state during NSX VIB upgrades. It doesn’t appear to matter what version of NSX is being upgraded from/to, but the common denominator is an older nenic driver version. This seems to be reproducible with nenic driver version 1.0.0.2 and possibly others. Version 1.0.11.0 and later appear to correct this problem. At the time of writing, 1.0.26.0 is the latest version available.

You can obtain your current nenic driver and firmware version using the following command:

# esxcli network nic get -n vmnicX
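
The installed driver package can also be confirmed from the VIB list; the native driver VIB is simply named 'nenic':

# Show the installed nenic driver VIB and its version.
esxcli software vib list | grep -i nenic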

Before you upgrade your drivers, be sure to reach out to Cisco to ensure your firmware is also at the recommended release version. Quite often vendors have a recommended driver/firmware combination for maximum stability and performance.

I expect a KB article and an update to the NSX release notes to be made public soon but wanted to ensure this information got out there as soon as possible.

NSX 6.4.3 Now Available!

Express maintenance release fixes two discovered issues.

If it feels like 6.4.2 was just released, you’d be correct – it came out only three weeks ago. The new 6.4.3 release (build 9927516) is what’s referred to as an express maintenance release. These releases aim to correct specific customer-identified problems as quickly as possible rather than having customers wait many months for the next full patch release.

In this release, only two identified bugs have been fixed. The first is an SSO issue that can occur in environments with multiple PSCs:

“Fixed Issue 2186945: NSX Data Center for vSphere 6.4.2 will result in loss of SSO functionality under specific conditions. NSX Data Center for vSphere cannot connect to SSO in an environment with multiple PSCs or STS certificates after installing or upgrading to NSX Data Center for vSphere 6.4.2.”

The second is an issue with IPsets that can impact third party security products – like Palo Alto Networks and Checkpoint Net-X services for example:

“Issue 2186968: Static IPset not reported to containerset API call. If you have service appliances, NSX might omit IP sets in communicating with Partner Service Managers. This can lead to partner firewalls allowing or denying connections incorrectly. Fixed in 6.4.3.”

You can find more information on these problems in VMware KB 57770 and KB 57834.

So knowing that these are the only two fixes included, the question obviously becomes – do I really need to upgrade?

If you are running 6.4.2 today, you might not need to. If you have more than one PSC associated with the vCenter Server that NSX Manager connects to, or if you use third-party firewall products that work in conjunction with NSX, the answer would be yes. If you don’t, there is really no benefit to upgrading to 6.4.3, and it would be best to save your efforts for the next major release.

That said, if you were already planning an upgrade to 6.4.2, it only makes sense to go to 6.4.3 instead. You’d get all the benefits of 6.4.2 plus these two additional fixes.

Kudos goes out to the VMware NSBU engineering team for their quick work in getting these issues fixed and getting 6.4.3 out so quickly.


Manual Upgrade of NSX Host VIBs

Complete manual control of the NSX host VIB upgrade process without the use of vSphere DRS.

NSX host upgrades are well automated these days. By taking advantage of ‘fully automated’ DRS, hosts in a cluster can be evacuated, put in maintenance mode, upgraded, and even rebooted without any user intervention. By relying on DRS for resource scheduling, NSX doesn’t have to worry about doing too many hosts simultaneously and the process can generally be done without end-users even noticing.

But what if you don’t want this level of automation? Maybe you’ve got very sensitive VMs that can’t be migrated, or VMs pinned to hosts for some reason. Or maybe you just want maximum control of the upgrade process and which hosts are upgraded – and when.

There is no reason why you can’t have full control of the host upgrade process and leave DRS in manual mode. This is indeed supported.

Most of the documentation and guides out there assume that people will want to take advantage of DRS-driven upgrades, but this doesn’t mean it’s the only supported method. Today I’ll be walking through a fully manual host upgrade in my lab as I move to NSX 6.4.1.

Step 1 – Clicking the Upgrade Link

Once you’ve upgraded your NSX manager and control cluster, you should be ready to begin tackling your ESXi host clusters. Before you proceed, you’ll need to ensure your host clusters have DRS set to ‘Manual’ mode. Don’t disable DRS – that will get rid of your resource pools. Manual mode is sufficient.

Next, you’ll need to browse to the usual ‘Installation’ section in the UI and click on the ‘Host Preparation’ tab. From here, it’s safe to click the ‘Upgrade Available’ link on the cluster to begin the upgrade process. Because DRS is in manual mode, nothing will actually happen. Hosts can’t be evacuated, and as a result, VIBs can’t be upgraded. In essence, the upgrade has started, but it immediately stalls and awaits manual intervention.

 

upgnodrs-3
This upgrade is essentially hung up waiting for hosts to enter maintenance mode.

 

In 6.4.1, as shown above, a clear banner message is displayed reminding you that DRS is in manual mode and that hosts must be manually put in maintenance mode.
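
Since you're controlling exactly which hosts get upgraded and when, it's also handy to check the installed NSX VIB versions on a given host to confirm whether it has been upgraded yet. On NSX 6.3.x and later the host VIBs are named esx-vsip and esx-nsxv:

# Check the NSX for vSphere VIB versions currently installed on the host.
esxcli software vib list | grep -E 'esx-vsip|esx-nsxv'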

Continue reading “Manual Upgrade of NSX Host VIBs”

Using the Upgrade Coordinator in NSX 6.4

If you’ve ever gone through an NSX upgrade, you know how many components there are to upgrade. You’ve got your NSX manager appliances, control cluster, ESXi host VIBs, edges, DLR and even guest introspection appliances. In the past, every one of these needed to be upgraded independently and in the correct order.

VMware hopes to make this process a lot more straightforward with the new ‘Upgrade Coordinator’ feature, which is included as of 6.4.0 in the HTML5 client.

The aim of the upgrade coordinator is to create an upgrade plan, or checklist, and then execute it in the correct order. There are many aspects of the upgrade plan that can be customized, but for those looking for maximum automation, a single-click upgrade option exists as well.

It is important to note that although the upgrade coordinator helps to take some of the guess work out of upgrading, there are still tasks and planning you’ll want to do ahead of time. If you haven’t already, please read my Ten Tips for a Successful NSX Upgrade post.

Today I’ll be using the upgrade coordinator to go from 6.3.3 to 6.4.0 and walk you through the process.

Upgrading NSX Manager

Although the upgrade coordinator plan covers numerous NSX components, NSX manager is not one of them. You’ll still need to use the good old manager UI upgrade process as described on page 36 of the NSX 6.4 upgrade guide. Thankfully, this is the easiest part of the upgrade.

You’ll also notice that I can use the upgrade coordinator for my lab upgrade even though I’m currently at a 6.3.x release. This is because NSX Manager is upgraded first, which adds the management plane functionality used for the rest of the upgrade.

Note: If you are using a Cross-vCenter deployment of NSX, be sure to upgrade your primary, followed by all secondary managers before proceeding with the rest of the upgrade.

upgco-1

Upgrading NSX Manager to 6.4.x should look very familiar as the process really hasn’t changed. Be sure to heed the warning banner about taking a backup before proceeding. For more info on this, please see my Ten Tips for a Successful NSX Upgrade post.

Continue reading “Using the Upgrade Coordinator in NSX 6.4”

NSX 6.4.0 Upgrade Compatibility

Thinking about upgrading to NSX 6.4.0? As I discussed in my recent Ten Tips for a Successful NSX Upgrade post, it’s always a good idea to do your research before upgrading. Along with reading the release notes, checking the VMware compatibility matrices is essential.

VMware just updated some of the compatibility matrices to include information about 6.4.0.

From an NSX upgrade path perspective, you’ll be happy to learn that any current build of NSX 6.2.x or 6.3.x should be fine. At the time of writing, this would be 6.2.9 and earlier as well as 6.3.5 and earlier.

640upg-0
NSX upgrade compatibility – screenshot from 1/17/2018.

On a positive note, some older 6.2.x builds required a workaround to get to 6.3.5, but this is no longer necessary for 6.4.0. The underlying issue that required it has been resolved.

From a vCenter and ESXi 6.0 and 6.5 perspective, the requirements for NSX 6.4.0 remain largely unchanged from late 6.3.x releases. What you’ll immediately notice is that NSX 6.4.0 is not supported with vSphere 5.5. If you are running vSphere 5.5, you’ll need to get to at least 6.0 U2 before considering NSX 6.4.0.

From the NSX 6.4.0 release notes:

vSphere 6.0
Supported: 6.0 Update 2, 6.0 Update 3
Recommended: 6.0 Update 3. vSphere 6.0 Update 3 resolves the issue of duplicate VTEPs in ESXi hosts after rebooting vCenter server. See VMware Knowledge Base article 2144605 for more information.

vSphere 6.5
Supported: 6.5a, 6.5 Update 1
Recommended: 6.5 Update 1. vSphere 6.5 Update 1 resolves the issue of EAM failing with OutOfMemory. See VMware Knowledge Base Article 2135378 for more information.

Note: vSphere 5.5 is not supported with NSX 6.4.0.

It doesn’t appear that the matrix has been updated yet for other VMware products that interact with NSX, such as vCloud Director.

Before rushing out to upgrade to NSX 6.4.0, be sure to check for compatibility – especially if you are using any third party products. It may be some time before other vendors certify their products for 6.4.0.

Stay tuned for a closer look at some of the new NSX 6.4.0 features!