I recently ran into some problems while deploying a Windows Server 2012 R2 VM in my vSphere 6.5 U2 lab. I’ve come to expect that the console mouse response is going to be terrible until VMware Tools is installed, but for some odd reason I had no mouse control whatsoever. Thinking it may be a quirk of the Web Console, I tried both the Remote Console and the HTML5 client to no avail.
The VM appeared to be healthy and would register keyboard input, but the motion of the mouse cursor was erratic or the cursor would not move at all. Thinking that I just needed to battle on and get Tools installed, I attempted to use the keyboard for this purpose – what a chore. You think it would have been easy, but the installer kept losing focus and falling behind other open windows. Many of the windows keyboard shortcuts I’d normally use were not functioning because they register on my laptop – not in the console. I couldn’t RDP to the VM either because the NIC needed to be configured with a valid IP address.
After doing a bit of research, it appeared that display scaling could cause all sorts of mouse issues – but this didn’t appear to be applicable in my case. That’s when I stumbled upon a communities thread that mentioned adding a USB controller to the VM. Even though my VM was ‘Hardware Version 13’, the USB 2.0 controller isn’t added by default.
I managed to get to the device manager using the keyboard, and you can see that the virtual hardware will use a PS/2 a mouse in the absence of a USB controller:
I then went ahead and added the basic USB 2.0 controller to the VM and booted it up.
Welcome to the tenth installment of a new series of NSX troubleshooting scenarios. Thanks to everyone who took the time to comment on the first half of the scenario. Today I’ll be performing some troubleshooting and will show how I came to the solution.
Please see the first half for more detail on the problem symptoms and some scoping.
As we saw in the first half, our fictional administrator was attempting to configure an ESG load balancer for both TCP and UDP port 514 traffic. Below is the high-level topology:
One of the first things to keep in mind when troubleshooting the NSX load balancer is the mode in which it’s operating. In this case, we know the customer is using a one-armed load balancer. The tell-tale sign is that the ESG sits in the same VLAN as the pool members with a single interface. Also, the pool members do not have the ESG configured as their default gateway.
We also know based on the screenshots in the first half that the load balancer is not operating in ‘Transparent’ mode – so traffic to the pool members should appear as though it’s coming from the load balancer virtual IP, not from the actual syslog clients. The packet capture the customer did proves that this is actually not the case.
That said, how exactly does an NSX one-armed load balancer work?
As traffic comes in on one of the interfaces and ports configured as a ‘virtual server’, the load balancer will simply forward the traffic to one of the pool members based on the load balancing algorithm configured. In our case, it’s a simple ‘round robin’ rotation of the pool members per session/socket. But forwarding would imply that the syslog servers would see traffic coming from the originating source IP of the syslog client. This would cause a fundamental problem with asymmetry when the pool member needs to reply. When it does, the traffic would bypass the ESG and be sent directly back to the client. This would be fine with UDP, which is connection-less, but what about TCP?
It’s hard to believe, but a full year has passed since I wrote my first blog post on vswitchzero.com. My first post was something very simple just to get used to the authoring process – suppressing shell warnings – written on June 3rd, 2017.
When I started, my goal was to share my knowledge with the community and to share some of the other things I enjoy as well. I really wasn’t sure if I’d keep up with it or enjoy the process, but it has turned out to be a great personal and professional experience for me. I find myself digging deeper into problems and technologies, and looking for new ways to share, challenge and educate. It’s also been a great outlet for me to share some of my hobbies – like building and restoring retro PCs.
To date, I’ve written 76 posts for an average of about a post and a half per week. When I was getting started, it took some time to get in the swing of releasing regular content. Now, it seems I never have fewer than two or three things on my mind to write about, which is great.
Over the last year, I’ve seen a steady rise in my visitor and view counts and have seen many of my posts work their way up in the google search rankings. Some of them have been surprisingly popular – like my post on VMXNET3 buffer exhaustion and the beacon probing deep dive. I’ve also gotten some really positive feedback on my ongoing NSX Troubleshooting Scenario posts, which I hope to continue with. Being recognized as a 2018 vExpert was also a big milestone for me and I look forward to applying for the 2018 NSX vExpert program as well.
I’d like to take a moment to thank William Lam (@lamw), and Matt Mancini (@vmexplorer) who were a big help in getting me started. They provided me with many great tips. Some of which I have embraced, and others that I still struggle with – like trying not to write 15,000-word posts. They also encouraged me to get on Twitter, which has proven to be an excellent tool to share my posts with the greater community.
Thank you all for your support and encouragement! I look forward to the many posts ahead.
Welcome to the tenth installment of my NSX troubleshooting series – a milestone number for the one-year anniversary of vswitchzero.com. I wasn’t sure how many of these I’d write, but I’ve gotten lots of positive feedback so if I can keep thinking of scenarios, I’ll keep going!
What I hope to do in these posts is share some of the common issues I run across from day to day. Each scenario will be a two-part post. The first will be an outline of the symptoms and problem statement along with bits of information from the environment. The second will be the solution, including the troubleshooting and investigation I did to get there.
I’ll try to include some questions as well for educational purposes in each post.
As always, we’ll start with a brief problem statement:
“I’m using an ESG load balancer to send syslog traffic to a pool of two Linux servers. I can only seem to get UDP syslog traffic to arrive at the pool members. TCP based syslog traffic doesn’t work. I’m using a one-armed load balancer. If I do a packet capture, all I see is the UDP traffic but it’s not coming from the load balancer”
Using the NSX load balancer services for syslog purposes is not at all uncommon. We see this frequently with products like Splunk as well as others. Since syslog traffic can be very heavy, this is a good use case.
When it comes to troubleshooting NSX load balancer issues, triple checking the configuration is key. In speaking with the customer, this is his desired outcome:
- One-armed load balancer in VLAN 15.
- No routing done by the edge. Default gateway configuration only and a single interface for simplicity.
- Transparency is not required – the source IP can be the load balancer as the required source information is in the syslog data transmitted.
- A mix of both TCP and UDP port 514 traffic is to be load balanced.
Here is a basic, high-level topology provided by the customer:
The one armed load balancer called esg-lb1 is sitting in VLAN 15. It’s default gateway is the SVI interface of the physical switch (172.16.15.1). There is only one hop between the ESXi hosts – the syslog clients – and the ESG in VLAN 15. Because this is a one-armed topology, the syslog-a1 and syslog-a2 servers are using the same switch SVI as their default gateway.
I recently deployed NSX 6.3.2 in my home lab to do some testing. After deploying a DLR, I went back in to add some additional interfaces and was greeted by a ‘blank’ or null error message. Having run into this problem before, I thought it may be a good idea to give some additional context to VMware KB 2151309.
As you can see above, there is no text associated with the error. There are no problems with the IP or mask I used, and it doesn’t seem clear why this would be failing.
You would expect to find more detail in the NSX Manager vsm.log file, but interestingly there is nothing there at all for this exception. That’s because this isn’t an NSX fault, but rather something in the vSphere Web Client.
On May 24th, VMware released NSX 6.4.1 – the first version of NSX to support vSphere 6.7. This is undoubtedly exciting news for those who have been waiting to upgrade their vSphere deployment. Although 6.4.1 sounds like a minor release, there are a slew of UI and usability enhancements as well as context-aware firewall improvements. There has also been some additional functionality introduced into the HTML5 client, which is very welcome news.
You’ll also notice in 6.4.1 that the service composer canvas view has been removed. This was a bit of an iconic overview page for service composer in the UI, but was not terribly useful and didn’t scale at all to large deployments with many security policies. I honestly don’t think anyone will be missing it.
On top of these enhancements, VMware engineering has been busy with bug fixes. NSX 6.4.1 includes 23 documented fixes across all areas of the product. A couple of notable ones include:
- Fixed Issue 2035026: Network outage of around 40-50 seconds seen on Edge Upgrade
- Fixed Issue 1971683: NSX Manager logs false duplicate IP message
- Fixed Issue 2092730: NSX Edge stops responding with /var/log partition at 100% disk usage
- Fixed Issue 1809387: Support for Weak Secure transport protocol – TLS v1.0 removed
You can find the complete list in the resolved issues section of the NSX 6.4.1 release notes.
Planning to upgrade? Remember to check the NSX upgrade matrix. Those running 6.2.0, 6.2.1 or 6.2.2 will need to refer to KB 51624 before upgrading. Have a look at my ‘Ten Tips for a Successful NSX Upgrade‘ post for ways to ensure your upgrade is successful.
Relevant Links for NSX 6.4.1 (Build 8599035):
I have recently rebuilt my home lab – an all too common occurrence due to the number of times I intentionally try to break things. In the process of rebuilding, I had some ISO files I wanted to copy over to a datastore. The process failed and the Web Client greeted me with an uncharacteristically long error message.
The exact text reads:
“The operation failed for an undetermined reason. Typically, this problem occurs due to certificates that the browser does no trust. If you are using self-signed or custom certificates, open the URL below in a new browser tab and accept the certificate, then retry the operation.”
In my case, the URL that it listed was to one of my ESXi hosts in the compute-a cluster called esx-a2. The error then goes on to reference VMware KB 2147256.
It may seem odd that the vSphere Client would be telling you to visit a random ESXi host’s UI address when you are trying to upload a file via vCenter. But if you stop to think about it for a second, vCenter has no access whatsoever to your datastores. Whether you are trying to create a new VMFS datastore, upload a file or even just browse, vCenter must rely on an ESXi host with the necessary access to do the actual legwork. That ESXi host then relays the information back through the Web Client.