VUM Challenges During vCenter 6.5 Upgrade

After procrastinating for a while, I finally started the upgrade process in my home lab to go from vSphere 6.0 to 6.5. The PSC upgrade was smooth, but I hit a roadblock when I moved on to the vCenter Server appliance.

After going through some of the first steps in the process, I ran into the following error when trying to connect to the source appliance.


The exact text of the error reads:

“Unable to retrieve the migration assistant extension on source vCenter Server. Make sure migration assistant is running on the VUM server.”

I had forgotten that I even had Update Manager deployed. Because my lab is small, I generally applied updates manually to my hosts via the CLI. What I do remember, however, is being frustrated that I had to deploy a full-scale Windows VM to run the Update Manager service.
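As an aside, for anyone curious what patching a host manually from the CLI looks like, the general pattern is something like the sketch below. The depot path and profile name are just placeholders and will differ in your environment.

# Put the host into maintenance mode before patching
esxcli system maintenanceMode set --enable true

# List the image profiles contained in an offline bundle (path is an example)
esxcli software sources profile list -d /vmfs/volumes/datastore1/ESXi-update-depot.zip

# Apply the chosen profile, then reboot the host
esxcli software profile update -d /vmfs/volumes/datastore1/ESXi-update-depot.zip -p ESXi-6.0.0-example-standard
reboot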

Continue reading “VUM Challenges During vCenter 6.5 Upgrade”

Building a Retro Gaming Rig – Part 4

Welcome to part 4 of my Building a Retro Gaming Rig series. Today I’ll be looking at some sound cards for the build.

Back in the early nineties when I first started taking an interest in PC gaming, most entry-level systems didn’t come with a proper sound card. I still remember playing the original Wolfenstein 3D using the integrated PC speaker on my friend’s 386 system. All of the beeps, boops and tones that speaker could produce still feel somewhat nostalgic to me. We had a lot of fun with games of that era, so we didn’t really think much about it. It wasn’t until 1994, when I got my first 486 system and a proper Sound Blaster 16, that I realized what I’d been missing out on. Even through really crappy non-amplified speakers, the FM-synthesized MIDI music and sound effects were just so awesome. And who can forget messing around with ‘Sound Recorder’ or playing CD audio in Windows 3.11!

With all that in mind, it was clear that I needed a proper sound card for my retro build. But this isn’t just a checkbox to tick – on machines of this era there was quite a difference between cards, and to get the proper vintage experience I’d have to choose correctly.

Continue reading “Building a Retro Gaming Rig – Part 4”

Debunking the VM Link Speed Myth!

10Gbps from a 10Mbps NIC? Why not? Debunking the VM link speed myth once and for all!

** Edit on 11/6/2017: I hadn’t noticed before I wrote this post, but Raphael Schitz (@hypervisor_fr) beat me to the debunking! Please check out his great post on the subject as well here. **

I have been working with vSphere and VI for a long time now, and have spent the last six and a half years at VMware in the support organization. As you can imagine, I’ve encountered a great number of misconceptions from our customers, but one that continually comes up is around VM virtual NIC link speed.

Every so often, I’ll hear statements like “I need 10Gbps networking from this VM, so I have no choice but to use the VMXNET3 adapter”, “I reduced the NIC link speed to throttle network traffic” and even “No wonder my VM is acting up, it’s got a 10Mbps vNIC!”

I think that VMware did a pretty good job documenting the role that vNIC type and link speed played back in the VI 3.x and vSphere 4.0 era – back when virtualization was still a new concept to many. Today, I don’t think it’s discussed very much. People generally use the VMXNET3 adapter, see that it connects at 10Gbps and never look back. Not that this simplicity is a bad thing, but I think it’s valuable to understand how virtual networking functions in the background.

Today, I hope to debunk the VM link speed myth once and for all. Not with quoted statements from documentation, but through actual performance testing.
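To give a sense of what I mean, the reported link speed inside the guest and the throughput you can actually achieve are two very different things. Here’s a quick way to look at both on a Linux VM – the interface name and receiver IP are just examples from my lab:

# Reported link speed of the vNIC inside the guest (a VMXNET3 adapter shows 10000Mb/s)
ethtool eth0 | grep Speed

# Actual achievable throughput, measured between two VMs with iperf
iperf -s                        # on the receiving VM
iperf -c 172.16.10.20 -t 60     # on the sending VM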

Continue reading “Debunking the VM Link Speed Myth!”

NSX 6.2.9 Now Available for Download!

Although NSX 6.3.x is getting more time in the spotlight, VMware continues to patch and maintain the 6.2.x release branch. On October 26th, VMware made NSX for vSphere 6.2.9 (Build Number 6926419) available for download.

The 6.2.9 upgrade bundle and release notes are available from the VMware NSX for vSphere download page and documentation site.

This is a full patch release, not a minor maintenance release like 6.2.6 and 6.3.4 were. VMware documents a total of 26 fixed issues in the release notes. Some of these are pretty significant, relating to everything from the DFW to EAM, and there are even some host PSOD fixes. Definitely have a look through the resolved issues section of the release notes for more detail.

On a personal note, I’m really happy to see NSX continue to mature and become more and more stable over time. Working in the support organization, I can confidently say that many of the problems we used to see often are just not around any more – especially with host preparation and the control plane. The pace at which patch releases for NSX come out is pretty quick, and some may argue that it is difficult to keep up with. I think this is just something that must be expected when you are working with state-of-the-art technology like NSX. That said, kudos to VMware Engineering for the quick turnaround on many of these identified issues!

Boosting vSphere Web Client Performance in ‘Tiny’ Deployments

Getting service health alarms and poor Web Client performance in ‘Tiny’ size deployments? A little extra memory can go a long way if allocated correctly!

In my home lab, I’ve been pretty happy with the vCenter Server ‘Tiny’ appliance deployment size. For the most part, vSphere Web Client performance has been decent and the appliance doesn’t need a lot of RAM or vCPUs.

When I most recently upgraded my lab, I considered using a ‘Small’ deployment but really didn’t want to tie up 16GB of memory – especially with only a small handful of hosts and many services offloaded to an external PSC.

Although things worked well for the most part, I had recently started getting vCenter alarms, along with occasional periods of slow refreshes and other oddities.

One of two alarms triggering frequently in my lab environment.

The two alarms were service health status alarms with the following associated text strings:

The vmware-dataservice-sca status changed from green to yellow

I’d also see this accompanied by a similar message referring to the vSphere Web Client:

The vsphere-client status changed from green to yellow

After doing some searching online, I quickly found VMware KB 2144950 on the subject. Although the cause of this seems pretty clear – insufficient memory allocation to the vsphere-client service – the workaround steps outlined in the KB are lacking context and could use some elaboration.
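As a rough preview of what the fix involves – and I’ll stress this is just a sketch, the KB has the authoritative steps and values – the idea is to give the vsphere-client service a larger memory allocation on the appliance and then restart it:

# Check the current per-service memory allocations on the VCSA
cloudvm-ram-size -l

# Increase the vsphere-client allocation (the value in MB is only an example)
cloudvm-ram-size -C 1024 vsphere-client

# Restart the service so the new allocation takes effect
service-control --stop vsphere-client
service-control --start vsphere-client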

Continue reading “Boosting vSphere Web Client Performance in ‘Tiny’ Deployments”

NSX 6.3.4 Now Available!

After only two months since the release of NSX 6.3.3, VMware has released the 6.3.4 maintenance release. See what’s fixed and if you really need to upgrade.

On Friday October 13th, VMware released NSX for vSphere 6.3.4. You may be surprised to see another 6.3.x version only two months after the release of 6.3.3. Unlike the usual build updates, 6.3.4 is a maintenance release containing only a small number of fixes for problems identified in 6.3.3. This is very similar to the 6.2.6 maintenance release that came out shortly after 6.2.5.

As always, the relevant detail can be found in the 6.3.4 Release Notes. You can also find the 6.3.4 upgrade bundle at the VMware NSX Download Page.

In the Resolved Issues section of the release notes, VMware outlines only three separate fixes that 6.3.4 addresses.

Resolved Issues

I’ll provide a bit of additional commentary around each of the resolved issues in 6.3.4:

Fixed Issue 1970527: ARP fails to resolve for VMs when Logical Distributed Router ARP table crosses 5K limit

This first problem was actually a regression in 6.3.3. In a previous release, the ARP table limit was increased to 20K, but in 6.3.3 the limit regressed back to the previous 5K limit. To be honest, not many customers have deployments at the scale where this would be a problem, but a small number of very large deployments may see issues in 6.3.3.

Fixed Issue 1961105: Hardware VTEP connection goes down upon controller reboot. A BufferOverFlow exception is seen when certain hardware VTEP configurations are pushed from the NSX Manager to the NSX Controller. This overflow issue prevents the NSX Controller from getting a complete hardware gateway configuration. Fixed in 6.3.4.

This buffer overflow issue could potentially cause datapath issues. Thankfully, not very many NSX designs include the use of Hardware VTEPs, but if yours does and you are running 6.3.3, it would be a good idea to consider upgrading to 6.3.4.

And the final issue, the one most likely to impact customers, is listed third in the release notes:

Fixed Issue 1955855: Controller API could fail due to cleanup of API server reference files. Upon cleanup of required files, workflows such as traceflow and central CLI will fail. If external events disrupt the persistent TCP connections between NSX Manager and controller, NSX Manager will lose the ability to make API connections to controllers, and the UI will display the controllers as disconnected. There is no datapath impact. Fixed in 6.3.4.

I discussed this issue in more detail in a recent blog post. You can also find more information on this issue in VMware KB 2151719. In a nutshell, the communication channel between NSX Manager and the NSX Control cluster can become disrupted due to files being periodically purged by a cleanup maintenance script. Usually, you wouldn’t notice until the connection needed to be re-established after a network outage or an NSX manager reboot. Thankfully, as VMware mentions, there is no datapath impact and a simple workaround exists. Despite being more of an annoyance than a serious problem, the vast majority of NSX users running 6.3.3 are likely to hit this at one time or another.

My Opinion and Upgrade Recommendations

The third issue in the release notes described in VMware KB 2151719 is likely the most disruptive to the majority of NSX users. That said, I really don’t think it’s critical enough to have to drop everything and upgrade immediately. The workaround of restarting the controller API service is relatively simple and there should be no resulting datapath impact.

The other two issues described are not likely to be encountered in the vast majority of NSX deployments, but are potentially more serious. Unless you are really pushing the scale limits or are using Hardware VTEPs, there is likely little reason to be concerned.

I certainly think that VMware did the right thing to patch these identified problems as quickly as possible. For new greenfield deployments, I think there is no question that 6.3.4 is the way to go. For those already running 6.3.3, it’s certainly not a bad idea to upgrade, but you may want to consider holding out for 6.3.5, which should include a much larger number of fixes.

On a positive note, if you do decide to upgrade, there are likely some components that will not need to be upgraded. Because the small number of fixes relate to the control plane and logical switching, the ESGs, DLRs and Guest Introspection will likely not have any code changes. You’ll also benefit from not having to reboot ESXi hosts for VIB patches thanks to changes in the 6.3.x upgrade process. Once I have a chance to go through the upgrade in my lab, I’ll report back on this.

Running 6.3.3 today? Let me know what your plans are!

Building a Retro Gaming Rig – Part 3

Welcome to the third installment of my Building a Retro Gaming Rig series. Today, I’ll be taking a look at another motherboard and CPU combo that I picked up from eBay on a bit of a whim.

In Part 1 of this series, I took an in-depth look at some Slot-1 gear, including the popular Asus P2B and some CPU options. As I was thinking ahead in the build, I got frustrated with the lack of simple and classic-looking ATX tower cases available these days. Everything looks far too modern, has too much bling or is just plain gigantic. Used tower cases from twenty years ago are all yellowed pretty badly and generally look rough. On the other hand, there are lots of small, simple and affordable micro ATX cases available.

Micro ATX – or mATX – motherboards were actually pretty uncommon twenty-odd years ago. PC tower cases were pretty large and in those days people really did use lots of expansion cards and needed the extra space. Only very compact systems and OEMs seemed to use the mATX form factor at that time. Many of these boards were heavily integrated, lacked expansion slots and stuck you with some pretty weak onboard video solutions.

MSI MS-6160 Motherboard

In an interesting twist, I came across an MSI MS-6160 mATX board based on the Intel 440LX chipset that seemed to tick many of the right boxes. The combo included a Celeron 400MHz processor and 512MB of SDRAM for only $35 CDN.

Continue reading “Building a Retro Gaming Rig – Part 3”

VM Network Performance and CPU Scheduling

Over the years, I’ve been on quite a few network performance cases and have seen many reasons for performance trouble. One that is often overlooked is the impact of CPU contention and a VM’s inability to schedule CPU time effectively.

Today, I’ll be taking a quick look at the actual impact CPU scheduling can have on network throughput.

Testing Setup

To demonstrate, I’ll be using my dual-socket management host. As I did in my recent VMXNET3 ring buffer exhaustion post, I’ll be testing with VMs on the same host and port group to eliminate bottlenecks created by physical networking components. The VMs should be able to communicate as quickly as their compute resources will allow them.

Physical Host:

  • 2x Intel Xeon E5 2670 Processors (16 cores at 2.6GHz, 3.3GHz Turbo)
  • 96GB PC3-12800R Memory
  • ESXi 6.0 U3 Build 5224934

VM Configuration:

  • 1x vCPU
  • 1024MB RAM
  • VMXNET3 Adapter (1.1.29 driver with default ring sizes)
  • Debian Linux 7.4 x86 PAE
  • iperf 2.0.5

The VMs I used for this test are quite small with only a single vCPU and 1GB of RAM. This was done intentionally so that CPU contention could be more easily simulated. Much higher throughput would be possible with multiple vCPUs and additional RX queues.

The CPUs in my physical host are Xeon E5 2670 processors clocked at 2.6GHz per core. Because this processor supports Intel Turbo Boost, the maximum frequency of each core will vary depending on several factors and can be as high as 3.3GHz at times. To take this into consideration, I will test with a CPU limit of 2600MHz, as well as with no limit at all to show the benefit that Turbo Boost provides.

To measure throughput, I’ll be using a pair of Debian Linux VMs running iperf 2.0.5. One will be the sending side and the other the receiving side. I’ll be running four simultaneous threads to maximize throughput and load.
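For reference, the iperf invocations are nothing fancy – something along the lines of the sketch below, with the receiver’s IP address being specific to my lab:

# On the receiving VM
iperf -s

# On the sending VM: four parallel client threads for 60 seconds
iperf -c 172.16.10.21 -P 4 -t 60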

I should note that my testing is far from precise and is not being done with the usual controls and safeguards to ensure accurate results. This said, my aim isn’t to be accurate, but rather to illustrate some higher-level patterns and trends.

Continue reading “VM Network Performance and CPU Scheduling”

NSX Engineering Mode ‘root shell’ Access Now Available to Customers

In an interesting move, VMware has released public KB 2149630 on September 29th, providing information on how to access the root shell of the NSX Manager appliance.

If you’ve been on an NSX support call with VMware dealing with a complex issue, you may have seen your support engineer drop into a special shell called ‘Engineering Mode’. This is sometimes also referred to as ‘Tech Support Mode’. Regardless of the name used, this is basically a root bash shell on the underlying Linux based appliance. From here, system configuration files and scripts as well as most normal Linux functions can be accessed.

Normally, when you open a console or SSH session to NSX Manager, you are dropped into a restricted ‘admin’ shell with a hierarchical system of commands like Cisco’s IOS. For the majority of what an administrator needs to do, this is sufficient. It’s only in more complex cases – especially those involving the Postgres DB or the underlying OS – that the root shell may be required.

There are several important statements and disclaimers that VMware makes in this KB article that I want to outline below:

“Important: Do not make any changes to the underlying system without the help of VMware Technical Support. All such changes are not supported and as a result, your system may no longer be supportable by GSS.”

In NSX 6.3.2 and later, you’ll also be greeted by the following disclaimer:

“Engineering Mode: The authorized NSX Manager system administrator is requesting a shell which is able to perform lower level unix commands/diagnostics and make changes to the appliance. VMware asks that you do so only in conjunction with a support call to prevent breaking your virtual infrastructure. Please enter the shell diagnostics string before proceeding. Type Exit to return to the NSX shell. Type y to continue:”

And finally, you’ll want to ensure you have a full backup of NSX Manager should anything need to be modified:

“VMware recommends to take full backup of the system before performing any changes after logging into the Tech Support Mode.”

Although it is very useful to take a ‘read-only’ look at some things in the root shell, making any changes is not supported without getting direct assistance from VMware support.

A few people have asked whether or not making the root shell password public is a security issue, but the important point to remember is that you cannot even get to a position where you can enter the shell unless you are already logged in with an NSX enterprise administrator level account, such as the built-in ‘admin’ account. For anyone concerned about this, VMware does allow the root password to be changed. It’s just critical that this password not be lost in case VMware support requires access to the root shell for troubleshooting purposes. More information on this can be found in KB 2149630.

To be honest, I’m a bit torn on this development. As someone who does backline support, I know what kind of damage can be done from the root shell – even with the best intentions. But at the same time, I see this as empowering. It gives customers additional tools to troubleshoot, and it also provides some transparency into how NSX Manager works rather than shielding it behind a restricted shell. I think that overall, the benefits outweigh the risks and this was a positive move for VMware.

When I think back to VI 3.5 and vSphere 4.0 when ESXi was shiny and new, VMware initially took a similar stance. You had to go so far as to type ‘UNSUPPORTED’ into the console to access a shell. Today, everyone has unrestricted root access to the hypervisor. The same holds true for the vCenter appliance – the potential for destruction is no different.

I’d welcome any comments or thoughts. Please share them below!

Controller Disconnect and API Bug in NSX 6.3.3

VMware just announced a new bug discovered in NSX 6.3.3. Those running 6.3.3 or planning to upgrade in the near-term may want to familiarize themselves with VMware KB 2151719.

As you may know, VMware moved from a Debian-based distribution for the underlying OS of the NSX controllers to its Photon OS platform. This is why the upgrade process includes the complete redeployment of all three controller nodes.

It appears that a scheduled clean-up script on the controllers used to prevent the partitions from filling is also removing some files required for NSX Manager to communicate and authenticate with the controller via REST API.

Most folks running 6.3.3 in a stable deployment will likely not have noticed, but an event disrupting communication between Manager and the Controllers can prevent them from reconnecting. Some examples would include a reboot of the NSX manager, or a network disruption.
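If you want a quick way to confirm whether NSX Manager still sees its controllers as connected, the controller status is visible in the UI and can also be pulled from the NSX REST API. The call below is just an illustrative sketch – the hostname and credentials are placeholders for your own environment:

# Query controller status as reported by NSX Manager
curl -k -u 'admin:VMware1!' https://nsxmgr.lab.local/api/2.0/vdn/controller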

Thankfully, the NSX Controller core functions – managing the VXLAN and distributed logical routing control plane – will continue to work in this state and dataplane disruptions should not be experienced.

KB 2151719 discusses a pretty simple workaround of restarting the api-server service on any impacted controllers. This is a non-disruptive action and should be safe to do at any time. The command is the following:

nsx-controller # restart api-server
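If you’d like to double-check that the controller is healthy after restarting the service, the cluster status can be viewed from the same controller CLI. The exact output varies a bit between versions:

nsx-controller # show control-cluster status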

VMware will likely be addressing this in the next NSX release. If you are planning to upgrade, you may want to consider 6.3.2 or hold out for the next 6.3.x release.