NSX host upgrades are well automated these days. By taking advantage of ‘fully automated’ DRS, hosts in a cluster can be evacuated, put in maintenance mode, upgraded, and even rebooted without any user intervention. By relying on DRS for resource scheduling, NSX doesn’t have to worry about doing too many hosts simultaneously and the process can generally be done without end-users even noticing.
But what if you don’t want this level of automation? Maybe you’ve got very sensitive VMs that can’t be migrated, or VMs pinned to hosts for some reason. Or maybe you just want maximum control of the upgrade process and which hosts are upgraded – and when.
There is no reason why you can’t have full control of the host upgrade process and leave DRS in manual mode. This is indeed supported.
Most of the documentation and guides out there assume that people will want to take advantage of DRS-driven upgrades, but that doesn’t make it the only supported method. Today I’ll be walking through the manual approach in my lab as I upgrade to NSX 6.4.1.
Step 1 – Clicking the Upgrade Link
Once you’ve upgraded your NSX manager and control cluster, you should be ready to begin tackling your ESXi host clusters. Before you proceed, you’ll need to ensure your host clusters have DRS set to ‘Manual’ mode. Don’t disable DRS – that will get rid of your resource pools. Manual mode is sufficient.
Next, you’ll need to browse to the usual ‘Installation’ section in the UI, and click on the ‘Host Preparation’ tab. From here, it’s now safe to click the ‘Upgrade Available’ link on the cluster to begin the upgrade process. Because DRS is in manual mode, nothing will be able to happen. Hosts can’t be evacuated, and as a result, VIBs can’t be upgraded. In essence, the upgrade has started, but immediately stalls and awaits manual intervention.
In 6.4.1, a clear banner message is displayed reminding you that DRS is in manual mode and that hosts must be manually put in maintenance mode.
So what exactly happened behind the scenes when we clicked the upgrade link for the cluster? EAM, or ESX Agent Manager, is the vCenter service responsible for scanning hosts and maintaining a record of the NSX VIB versions installed on them. Each NSX prepared cluster will have what’s called an ‘EAM Agency’ associated with it. Within each Agency exists a number of ‘Agents’. As you probably guessed, there is one ‘Agent’ for each ESXi host in the cluster.
Before the upgrade link was clicked, the EAM agency associated with my compute-a cluster had an associated version of 6.4.0. The last time EAM scanned my hosts, it recorded that each host – or agent in EAM speak – had the VIBs installed that matched the agency version. When the agents have the VIBs installed and the agent version matches the agency version, the hosts show up green and the cluster should be in the ‘Ready’ state.
When the upgrade link was clicked, the compute-a cluster’s EAM agency had its version changed from 6.4.0 to 6.4.1. This also triggered scan tasks of each host in the cluster, which resulted in EAM seeing that the hosts still had VIBs at 6.4.0. The EAM agency version no longer matches the agents, so the cluster goes into a ‘Not Ready’ state.
Clicking the ‘Not Ready’ link will provide more information on why it’s in this state. In my case, there is a pending upgrade of these hosts so it reminds me that they’ll need to be in maintenance mode.
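To make the agency and agent relationship a bit more concrete, here is a minimal Python sketch of the version-matching logic described above. It is only a toy model for illustration and not how EAM is actually implemented. The class names, the `pending_hosts` and `status` helpers, and the host names esx-a2 and esx-a3 are all my own assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """One EAM agent, i.e. one ESXi host in the cluster (toy model)."""
    host: str
    vib_version: str  # VIB version recorded by the last scan


@dataclass
class Agency:
    """One EAM agency, i.e. one NSX prepared cluster (toy model)."""
    cluster: str
    version: str  # version the agency expects on every host
    agents: list = field(default_factory=list)

    def pending_hosts(self):
        """Hosts whose recorded VIB version differs from the agency version."""
        return [a.host for a in self.agents if a.vib_version != self.version]

    def status(self):
        """'Ready' only when every agent matches the agency version."""
        return "Not Ready" if self.pending_hosts() else "Ready"


# esx-a2 and esx-a3 are hypothetical names for the other two hosts.
agency = Agency("compute-a", "6.4.0",
                [Agent(h, "6.4.0") for h in ("esx-a1", "esx-a2", "esx-a3")])
print(agency.status())  # every agent matches the agency version

agency.version = "6.4.1"  # roughly what clicking the upgrade link changes
print(agency.status(), agency.pending_hosts())

agency.agents[0].vib_version = "6.4.1"  # esx-a1 upgraded and scanned again
print(agency.pending_hosts())
```

The point of the model is that the cluster status is derived purely from comparing each agent against the agency, which is why bumping the agency version alone is enough to flip the cluster to ‘Not Ready’.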
Step 2 – Manual Host Evacuation and Maintenance Mode
Next comes the very manual part. In my situation, I’ve got a three-host cluster with running VMs on each. The upgrade process won’t start on a host until all running VMs are evacuated and it is put into maintenance mode.
I started with host esx-a1. After starting the maintenance mode task – which hangs until all VMs are evacuated – I started manually vMotioning the VMs to the other two hosts in the cluster.
I had one edge VM on this host that couldn’t be migrated due to resource limitations, so I just shut it down. Once all the VMs were migrated or in a powered off state, the upgrade tasks immediately started.
This process will involve several tasks. Some are triggered by EAM, others by vSphere APIs. After the VIB is upgraded, EAM will scan the host again and the agent version will be updated.
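The condition that holds the upgrade back on a host can be summed up in a few lines. The sketch below is a simplified model of what I observed in my lab, not an actual vSphere or EAM API. The `VM` class, both function names, and the VM names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    powered_on: bool


def blocking_vms(vms):
    """Names of VMs still on the host that keep maintenance mode from completing."""
    return [vm.name for vm in vms if vm.powered_on]


def can_start_upgrade(maintenance_requested, vms):
    """True once maintenance mode was requested and no powered-on VM remains."""
    return maintenance_requested and not blocking_vms(vms)


# Hypothetical leftovers on esx-a1: one running VM and the edge I shut down.
leftovers = [VM("web-01", powered_on=True), VM("edge-01", powered_on=False)]
print(blocking_vms(leftovers))             # only the running VM blocks
print(can_start_upgrade(True, leftovers))  # still waiting on web-01

leftovers[0].powered_on = False            # or vMotion it off the host
print(can_start_upgrade(True, leftovers))
```

Powered-off VMs don’t hold anything up in this model, which matches why shutting down the edge VM was enough to let the upgrade proceed.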
Once the host is upgraded successfully, you’ll see it change from the ‘Not Ready’ state to the correct VIB version.
Step 3 – Do the Rest
Now that one host is done, I can just repeat the process for the remaining two hosts.
Once the last host was finished, EAM’s scan determined that all agents now match the agency version and the cluster is green for NSX installation status.
This process is quite manual – even VMs need to be manually migrated in batches or one at a time, which is the longest part of the process. Special care must also be taken to ensure that ESXi hosts don’t get overloaded as DRS isn’t stopping you from creating imbalances.
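Since DRS isn’t doing the balancing for you, it can help to plan the migrations before you start. Below is a rough sketch of one way to do that: a simple greedy placement that considers memory only. It is my own illustration rather than anything NSX or vSphere provides. The function name, VM names and sizes are made up, and a real plan would also need to account for CPU, affinity rules and so on.

```python
def plan_evacuation(vms, free_gb):
    """Assign each VM to a target host without exceeding its free memory.

    vms:     {vm_name: memory_gb} for VMs on the host being evacuated
    free_gb: {host_name: free_memory_gb} for the remaining hosts
    Returns (plan, unplaced): plan maps vm -> host, and unplaced lists VMs
    that fit nowhere and need another approach (e.g. powering them off).
    """
    free = dict(free_gb)  # work on a copy so the caller's dict is untouched
    if not free:
        return {}, list(vms)
    plan, unplaced = {}, []
    # Largest VMs first, so they are placed while there is still room.
    for vm, mem in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
        target = max(free, key=free.get)  # host with the most headroom
        if free[target] >= mem:
            plan[vm] = target
            free[target] -= mem
        else:
            unplaced.append(vm)
    return plan, unplaced


# Hypothetical numbers for evacuating esx-a1 onto the other two hosts.
plan, unplaced = plan_evacuation(
    {"web-01": 8, "db-01": 16, "edge-01": 32},
    {"esx-a2": 20, "esx-a3": 12},
)
print(plan)
print(unplaced)
```

With these made-up numbers the edge VM ends up in the unplaced list, which mirrors the situation I ran into earlier where the edge couldn’t be migrated and had to be shut down instead.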
Personally, I think it’s best to leave DRS in fully automated mode and allow it to make the appropriate decisions when evacuating hosts. If for whatever reason you have a few VMs that can’t migrate, you can always manually address those specific VMs when the upgrade stalls.
At any rate, it’s always good to have options. You can take advantage of the new Upgrade Coordinator for the automation of the entire process, allow DRS to do its thing, or have full manual control. The choice is ultimately yours.
Have any upgrade questions? Please leave a comment below or reach out to me on Twitter (@vswitchzero)