New NSX Controller Issue Identified in 6.3.3 and 6.3.4.

Having difficulty deploying NSX controllers in 6.3.3? You are not alone. VMware has just made public a newly discovered bug impacting NSX controllers based on the Photon OS platform. This includes NSX 6.3.3 and 6.3.4.  VMware KB 000051144 provides a detailed summary of the symptoms, but essentially:

  • New NSX 6.3.3 Controllers will fail to deploy after November 2nd, 2017.
  • New NSX 6.3.4 Controllers will fail to deploy after January 1st, 2018.
  • Controllers deployed before this date will be prompting for a new password on login attempt.

That said, if you attempted a fresh deployment of NSX 6.3.3 today, you would not be able to deploy a control cluster.

The issue appears to stem from root and admin account credentials expiring 90 days after the creation of the NSX build. This is not 90 days after it’s deployed, but rather 90 days after the build was created by VMware. This is why NSX 6.3.3 will begin having issues after November 2nd and 6.3.4 will be fine until January 1st 2018.

Some important points:

  • If you have already deployed NSX 6.3.3 or 6.3.4, don’t worry – your controllers will continue to function just fine. Having expired admin/root passwords will not break communication between NSX components.
  • This issue does not pose any kind of datapath impact. It will only pose issues if you attempt a fresh deployment, attempt to upgrade or delete and re-deploy controllers.
  • Until you’ve had a chance to implement the workaround in KB 000051144, you should obviously avoid any of the mentioned workflows.

It appears that VMware will be re-releasing new builds of the existing 6.3.3 and 6.3.4 downloads with the fix in place, along with a fix in 6.3.5 and future releases. They have already added the following text to the 6.3.3 and 6.3.4 release notes:

Important information about NSX 6.3.3: NSX for vSphere 6.3.3 has been repackaged to address the problems mentioned in VMware Knowledge Base articles 2151719 and 000051144. The originally released build 6276725 is replaced with build 7087283. Please refer to the Knowledge Base articles for more detail. See Upgrade Notes for upgrade information.

Old 6.3.3 Build Number: 6276725
New 6.3.3 Build Number: 7087283

Old 6.3.4 Build Number: 6845891
New 6.3.4 Build Number: 7087695

As an added bonus, VMware took advantage of this situation to include the fix for the NSX controller disconnect issue in 6.3.3 as well. This other issue is described in VMware KB 2151719. Despite what it says in the 6.3.4 release notes, only 6.3.3 was susceptable to the issue outlined in KB 2151719.

If you’ve already found yourself in this predicament, VMware has provided an API call that can be used as a workaround. The API call appears to correct the issue by setting the appropriate accounts to never expire. If the password has already expired, it’ll reset it. It’s then up to you to change the password. Detailed steps can be found in KB 000051144.

It’s unfortunate that another controller issue has surfaced after the controller disconnect issue discovered in 6.3.3. Whenever there is a major change like the introduction of a new underlying OS platform, these things can clearly be missed. Thankfully the impact to existing deployments is more of an inconvenience than a serious problem. Kudos to the VMware engineering team for working so quickly to get these fixes and workarounds released!

 

2 thoughts on “New NSX Controller Issue Identified in 6.3.3 and 6.3.4.

  1. Giuliano

    Hi Mike,

    great article, I was gonna blog the same but you anticipated me.

    Re: KB 000051144 the following statement is wrong:

    “If any or all of the Controllers are redeployed repeat the preceding steps again”

    This seems to imply that running the API calls will fix the issues on controllers that are redeployed unfortunately it’s’ not the case. Any newly deployed controller will hang until root and admin account are manually fixed.

    Like

    Reply
    1. Mike Post author

      Thanks for the feedback, Giuliano! I haven’t had a chance to run through this procedure myself yet but hoping to reproduce it in my lab soon. I will reach out to the KB team to see if they can correct/clarify that statement as well.

      Like

      Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s