On Friday October 13th, VMware released NSX for vSphere 6.3.4. You may be surprised to see another 6.3.x version only two months after the release of 6.3.3. Unlike the usual build updates, 6.3.4 is a maintenance release containing only a small number of fixes for problems identified in 6.3.3. This is very similar to the 6.2.6 maintenance release that came out shortly after 6.2.5.
In the Resolved Issues section of the release notes, VMware outlines only three separate fixes that 6.3.4 addresses.
I’ll provide a bit of additional commentary around each of the resolved issues in 6.3.4:
Fixed Issue 1970527: ARP fails to resolve for VMs when Logical Distributed Router ARP table crosses 5K limit
This first problem was actually a regression in 6.3.3. In a previous release, the ARP table limit was increased to 20K, but in 6.3.3 the limit regressed back to previous limit of 5K. To be honest, not many customers have deployments to the scale where this would be a problem. A small number of very large deployments may see issues in 6.3.3.
Fixed Issue 1961105: Hardware VTEP connection goes down upon controller reboot. A BufferOverFlow exception is seen when certain hardware VTEP configurations are pushed from the NSX Manager to the NSX Controller. This overflow issue prevents the NSX Controller from getting a complete hardware gateway configuration. Fixed in 6.3.4.
This buffer overflow issue could potentially cause datapath issues. Thankfully, not very many NSX designs include the use of Hardware VTEPs, but if yours does and you are running 6.3.3, it would be a good idea to consider upgrading to 6.3.4.
And the final, but most likely to impact customer’s is listed third in the release notes:
Fixed Issue 1955855: Controller API could fail due to cleanup of API server reference files. Upon cleanup of required files, workflows such as traceflow and central CLI will fail. If external events disrupt the persistent TCP connections between NSX Manager and controller, NSX Manager will lose the ability to make API connections to controllers, and the UI will display the controllers as disconnected. There is no datapath impact. Fixed in 6.3.4.
I discussed this issue in more detail in a recent blog post. You can also find more information on this issue in VMware KB 2151719. In a nutshell, the communication channel between NSX Manager and the NSX Control cluster can become disrupted due to files being periodically purged by a cleanup maintenance script. Usually, you wouldn’t notice until the connection needed to be re-established after a network outage or an NSX manager reboot. Thankfully, as VMware mentions, there is no datapath impact and a simple workaround exists. Despite being more of an annoyance than a serious problem, the vast majority of NSX users running 6.3.3 are likely to hit this at one time or another.
My Opinion and Upgrade Recommendations
The third issue in the release notes described in VMware KB 2151719 is likely the most disruptive to the majority of NSX users. That said, I really don’t think it’s critical enough to have to drop everything and upgrade immediately. The workaround of restarting the controller API service is relatively simple and there should be no resulting datapath impact.
The other two issues described are not likely to be encountered in the vast majority of NSX deployments, but are potentially more serious. Unless you are really pushing the scale limits or are using Hardware VTEPs, there is likely little reason to be concerned.
I certainly think that VMware did the right thing to patch these identified problems as quickly as possible. For new greenfield deployments, I think there is no question that 6.3.4 is the way to go. For those already running 6.3.3, it’s certainly not a bad idea to upgrade, but you may want to consider holding out for 6.3.5, which should include a much larger number of fixes.
On a positive note, if you do decide to upgrade, there are likely some components that will not need to be upgraded. Because there are only a small number or fixes relating to the control plane and logical switching, ESGs, DLRs and Guest Introspection will likely not have any code changes. You’ll also benefit from not having to reboot ESXi hosts for VIB patches thanks to changes in the 6.3.x upgrade process. Once I have a chance to go through the upgrade in my lab, I’ll report back on this.
Running 6.3.3 today? Let me know what your plans are!