Category Archives: Uncategorized

NSX Troubleshooting Scenario 10

Welcome to the tenth installment of my NSX troubleshooting series – a milestone number for the one-year anniversary of I wasn’t sure how many of these I’d write, but I’ve gotten lots of positive feedback so if I can keep thinking of scenarios, I’ll keep going!

What I hope to do in these posts is share some of the common issues I run across from day to day. Each scenario will be a two-part post. The first will be an outline of the symptoms and problem statement along with bits of information from the environment. The second will be the solution, including the troubleshooting and investigation I did to get there.

I’ll try to include some questions as well for educational purposes in each post.

The Scenario

As always, we’ll start with a brief problem statement:

“I’m using an ESG load balancer to send syslog traffic to a pool of two Linux servers. I can only seem to get UDP syslog traffic to arrive at the pool members. TCP based syslog traffic doesn’t work. I’m using a one-armed load balancer. If I do a packet capture, all I see is the UDP traffic but it’s not coming from the load balancer”

Using the NSX load balancer services for syslog purposes is not at all uncommon. We see this frequently with products like Splunk as well as others. Since syslog traffic can be very heavy, this is a good use case.

When it comes to troubleshooting NSX load balancer issues, triple checking the configuration is key. In speaking with the customer, this is his desired outcome:

  • One-armed load balancer in VLAN 15.
  • No routing done by the edge. Default gateway configuration only and a single interface for simplicity.
  • Transparency is not required – the source IP can be the load balancer as the required source information is in the syslog data transmitted.
  • A mix of both TCP and UDP port 514 traffic is to be load balanced.

Here is a basic, high-level topology provided by the customer:


The one armed load balancer called esg-lb1 is sitting in VLAN 15. It’s default gateway is the SVI interface of the physical switch ( There is only one hop between the ESXi hosts – the syslog clients – and the ESG in VLAN 15. Because this is a one-armed topology, the syslog-a1 and syslog-a2 servers are using the same switch SVI as their default gateway.

Continue reading

NSX 6.3.5 Now Available!

Late yesterday, VMware made available NSX 6.3.5 (Build Number 7119875) for download. This is a full maintenance release including over 32 documented bug fixes. As you may recall, 6.3.4 was a minor patch release with only a few fixes, which is why 6.3.5 has been released so soon afterward.

There are numerous fixes in this release that will be of interest to a lot of customers. Of special note are the fixes related to Guest Introspection – a feature leveraged by 3rd party AV and security products, as well as the IDFW. I was very happy to see that GI got front and center attention in 6.3.5. Along with some needed CPU utilization fixes, there is also a fix for issue number 1897878, outlined in VMware KB 2151235 that I’ve seen quite a bit of in the wild.

Another interesting behavior change is also quoted in the ‘What’s New in 6.3.5’ section:

“Guest Introspection service VM will now ignore network events sent by guest VMs unless Identify Firewall or Endpoint Monitoring is enabled”

This is a feature that we’ve sometimes manually disabled in older releases in very large deployments to improve 3rd party A/V scalability. The vast majority of customers don’t use ‘Network Introspection’ services, so it’s good to see that it’s now off by default unless needed.

Those using Guest Introspection should definitely consider upgrading to 6.3.5.

Another interesting statement that I’m sure many people are interested in is:

“Host prep now has troubleshooting enhancements, including additional information for “not ready” errors”

EAM generally hasn’t provided very descriptive statements in the ‘Not Ready’ dialogue, so this is also a very welcome change.

And of course, the controller disconnect issue and password expiry issues are also fixed in 6.3.5. This obviously brings up a good question – should you still patch with the re-released versions of 6.3.3/6.3.4, or go straight to 6.3.5? I see no reason why you would not go straight to 6.3.5.

Here is some of the relevant information:

NSX 6.3.5 Download Page
NSX 6.3.5 Release Notes

Definitely have a read through the release notes – there are some real gems in there. I hope to get this deployed in my home lab sometime in the next few days!

New NSX Controller Issue Identified in 6.3.3 and 6.3.4.

Having difficulty deploying NSX controllers in 6.3.3? You are not alone. VMware has just made public a newly discovered bug impacting NSX controllers based on the Photon OS platform. This includes NSX 6.3.3 and 6.3.4.  VMware KB 000051144 provides a detailed summary of the symptoms, but essentially:

  • New NSX 6.3.3 Controllers will fail to deploy after November 2nd, 2017.
  • New NSX 6.3.4 Controllers will fail to deploy after January 1st, 2018.
  • Controllers deployed before this date will be prompting for a new password on login attempt.

That said, if you attempted a fresh deployment of NSX 6.3.3 today, you would not be able to deploy a control cluster.

The issue appears to stem from root and admin account credentials expiring 90 days after the creation of the NSX build. This is not 90 days after it’s deployed, but rather 90 days after the build was created by VMware. This is why NSX 6.3.3 will begin having issues after November 2nd and 6.3.4 will be fine until January 1st 2018.

Some important points:

  • If you have already deployed NSX 6.3.3 or 6.3.4, don’t worry – your controllers will continue to function just fine. Having expired admin/root passwords will not break communication between NSX components.
  • This issue does not pose any kind of datapath impact. It will only pose issues if you attempt a fresh deployment, attempt to upgrade or delete and re-deploy controllers.
  • Until you’ve had a chance to implement the workaround in KB 000051144, you should obviously avoid any of the mentioned workflows.

It appears that VMware will be re-releasing new builds of the existing 6.3.3 and 6.3.4 downloads with the fix in place, along with a fix in 6.3.5 and future releases. They have already added the following text to the 6.3.3 and 6.3.4 release notes:

Important information about NSX 6.3.3: NSX for vSphere 6.3.3 has been repackaged to address the problems mentioned in VMware Knowledge Base articles 2151719 and 000051144. The originally released build 6276725 is replaced with build 7087283. Please refer to the Knowledge Base articles for more detail. See Upgrade Notes for upgrade information.

Old 6.3.3 Build Number: 6276725
New 6.3.3 Build Number: 7087283

Old 6.3.4 Build Number: 6845891
New 6.3.4 Build Number: 7087695

As an added bonus, VMware took advantage of this situation to include the fix for the NSX controller disconnect issue in 6.3.3 as well. This other issue is described in VMware KB 2151719. Despite what it says in the 6.3.4 release notes, only 6.3.3 was susceptable to the issue outlined in KB 2151719.

If you’ve already found yourself in this predicament, VMware has provided an API call that can be used as a workaround. The API call appears to correct the issue by setting the appropriate accounts to never expire. If the password has already expired, it’ll reset it. It’s then up to you to change the password. Detailed steps can be found in KB 000051144.

It’s unfortunate that another controller issue has surfaced after the controller disconnect issue discovered in 6.3.3. Whenever there is a major change like the introduction of a new underlying OS platform, these things can clearly be missed. Thankfully the impact to existing deployments is more of an inconvenience than a serious problem. Kudos to the VMware engineering team for working so quickly to get these fixes and workarounds released!


NSX 6.2.9 Now Available for Download!

Although NSX 6.3.x is getting more time in the spotlight, VMware continues to patch and maintain the 6.2.x release branch. On October 26th, VMware made NSX for vSphere 6.2.9 (Build Number 6926419) available for download.

Below are the relevant links:

This is a full patch release, not a minor maintenance release like 6.2.6 and 6.3.4 were. VMware documents a total of 26 fixed issues in the release notes. Some of these are pretty significant relating to everything from DFW to EAM and even some host PSOD fixes. Definitely have a look through the resolved issues section of the release notes for more detail.

On a personal note, I’m really happy to see NSX continue to mature and become more and more stable over time. Working in the support organization, I can confidently say that many of the problems we used to see often are just not around any more – especially with host preparation and the control plane. The pace in which patch releases for NSX come out is pretty quick and some may argue that it is difficult to keep up with. I think this is just something that must be expected when you are working with state of the art technology like NSX. That said, kudos to VMware Engineering for the quick turnaround on many of these identified issues!