New NSX Controller Issue Identified in 6.3.3 and 6.3.4

Having difficulty deploying NSX controllers in 6.3.3? You are not alone. VMware has just made public a newly discovered bug impacting NSX controllers based on the Photon OS platform. This includes NSX 6.3.3 and 6.3.4. VMware KB 000051144 provides a detailed summary of the symptoms, but essentially:

  • New NSX 6.3.3 Controllers will fail to deploy after November 2nd, 2017.
  • New NSX 6.3.4 Controllers will fail to deploy after January 1st, 2018.
  • Controllers deployed before these dates will prompt for a new password on the next login attempt.

In other words, if you attempted a fresh deployment of NSX 6.3.3 today, you would not be able to deploy a control cluster.

The issue appears to stem from root and admin account credentials expiring 90 days after the NSX build was created. That's not 90 days after it's deployed, but 90 days after the build was produced by VMware, which is why NSX 6.3.3 will begin having issues after November 2nd while 6.3.4 will be fine until January 1st, 2018.
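To illustrate the timeline, working backwards 90 days from the failure dates gives a rough idea of when each build must have been produced. The build-creation dates below are my own inference, not something VMware has published:

```python
from datetime import date, timedelta

# Rough inference only: if credentials expire 90 days after the build is produced,
# the failure dates imply approximately these build-creation dates.
print(date(2017, 11, 2) - timedelta(days=90))  # 2017-08-04 -> around when 6.3.3 was built
print(date(2018, 1, 1) - timedelta(days=90))   # 2017-10-03 -> around when 6.3.4 was built
```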

Some important points:

  • If you have already deployed NSX 6.3.3 or 6.3.4, don’t worry – your controllers will continue to function just fine. Having expired admin/root passwords will not break communication between NSX components.
  • This issue does not pose any kind of datapath impact. It only poses problems if you attempt a fresh deployment, an upgrade, or a delete and re-deploy of controllers.
  • Until you've had a chance to implement the workaround in KB 000051144, you should obviously avoid these workflows.

It appears that VMware will be re-releasing new builds of the existing 6.3.3 and 6.3.4 downloads with the fix in place, along with a fix in 6.3.5 and future releases. They have already added the following text to the 6.3.3 and 6.3.4 release notes:

Important information about NSX 6.3.3: NSX for vSphere 6.3.3 has been repackaged to address the problems mentioned in VMware Knowledge Base articles 2151719 and 000051144. The originally released build 6276725 is replaced with build 7087283. Please refer to the Knowledge Base articles for more detail. See Upgrade Notes for upgrade information.

Old 6.3.3 Build Number: 6276725
New 6.3.3 Build Number: 7087283

Old 6.3.4 Build Number: 6845891
New 6.3.4 Build Number: 7087695

As an added bonus, VMware took advantage of this situation to include the fix for the NSX controller disconnect issue in 6.3.3 as well. This other issue is described in VMware KB 2151719. Despite what it says in the 6.3.4 release notes, only 6.3.3 was susceptible to the issue outlined in KB 2151719.

If you’ve already found yourself in this predicament, VMware has provided an API call that can be used as a workaround. The API call appears to correct the issue by setting the appropriate accounts to never expire. If the password has already expired, it’ll reset it. It’s then up to you to change the password. Detailed steps can be found in KB 000051144.
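For those comfortable with scripting, the workaround can be driven from anything that can issue an authenticated REST request against NSX Manager. Here's a rough Python sketch – the manager hostname, credentials and, most importantly, the request URI are placeholders; the exact API call and body are documented in KB 000051144:

```python
import requests

NSX_MANAGER = "nsxmanager.lab.local"   # placeholder hostname
API_PATH = "/api/2.0/..."              # replace with the exact workaround URI from KB 000051144

# Basic-auth PUT against NSX Manager; certificate verification disabled here
# because lab appliances typically use self-signed certificates.
resp = requests.put(f"https://{NSX_MANAGER}{API_PATH}",
                    auth=("admin", "VMware1!"),
                    verify=False)
resp.raise_for_status()
print(resp.status_code, resp.text)
```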

It's unfortunate that another controller issue has surfaced so soon after the controller disconnect issue discovered in 6.3.3. Whenever there is a major change like the introduction of a new underlying OS platform, things like this can clearly slip through. Thankfully the impact to existing deployments is more of an inconvenience than a serious problem. Kudos to the VMware engineering team for working so quickly to get these fixes and workarounds released!

 

Building a Retro Gaming Rig – Part 5

In Part 4 of this series, I took a look at some sound card options. I’m now getting a lot closer to having this build finished, but there is still one key piece missing – storage.

When dealing with old hardware, hard drives simply don't age well. Anything with moving parts is prone to failure and degradation over time. Not only that, but as the bearings wear down, the drives develop an annoying whine and droning noise that can be heard rooms away.

I'm all for the genuine nostalgic experience, but slow and noisy drives with 20 years of wear behind them are not something I'm particularly interested in. With that in mind, I knew I wanted to retrofit a modern storage solution to work with this machine.

Challenges and Limitations

Having worked with older hardware before, I was prepared for some challenges along the way. There are numerous drive size limitations and other BIOS quirks that I’d need to navigate around. Below are just a few:

  • Most 486 and older systems are limited to a 504MB hard drive because the BIOS only supports 1024 cylinders.
  • Many systems in the late nineties simply didn’t support drives larger than 32GB due to other BIOS limitations.
  • With a newer BIOS, some IDE systems can support drives as large as 128GB, which is the 28-bit LBA limit of the ATA interface.

Clearly there are newer IDE drives with capacities beyond 128GB, but these drives require newer Ultra ATA 100/133 controllers. After doing some testing, I discovered that the Asus P2B that I outlined in Part 1 of this series had a 32GB drive limitation with the latest production BIOS and a 128GB limitation with the newest beta BIOS. The MSI MS-6160 that I covered in Part 3 was limited to 32GB. Since this was the board I wanted to use, I could only consider IDE solutions of 32GB or less if I wanted to stick with the onboard controller.
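For anyone curious where those numbers come from, both hard limits fall straight out of the addressing math. A quick illustrative calculation:

```python
# CHS addressing as exposed by old BIOSes: 1024 cylinders x 16 heads x 63 sectors x 512 bytes.
cylinders, heads, sectors, sector_bytes = 1024, 16, 63, 512
chs_limit = cylinders * heads * sectors * sector_bytes
print(chs_limit / 2**20, "MiB")    # 504.0 MiB -- the classic 486-era cap

# 28-bit LBA on the ATA interface: 2^28 addressable sectors of 512 bytes.
lba28_limit = 2**28 * sector_bytes
print(lba28_limit / 2**30, "GiB")  # 128.0 GiB -- the ceiling a newer BIOS can reach
```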

Continue reading “Building a Retro Gaming Rig – Part 5”

VUM Challenges During vCenter 6.5 Upgrade

After procrastinating for a while, I finally started the upgrade process in my home lab to go from vSphere 6.0 to 6.5. The PSC upgrade was smooth, but I hit a roadblock when I started the upgrade process on the vCenter Server appliance.

After going through some of the first steps in the process, I ran into the following error when trying to connect to the source appliance.


The exact text of the error reads:

“Unable to retrieve the migration assistant extension on source vCenter Server. Make sure migration assistant is running on the VUM server.”

I had forgotten that I even had Update Manager deployed. Because my lab is small, I generally applied updates manually to my hosts via the CLI. What I do remember, however, is being frustrated that I had to deploy a full-scale Windows VM to run the Update Manager service.

Continue reading “VUM Challenges During vCenter 6.5 Upgrade”

Building a Retro Gaming Rig – Part 4

Welcome to part 4 of my Building a Retro Gaming Rig series. Today I’ll be looking at some sound cards for the build.

Back in the early nineties when I first started taking an interest in PC gaming, most entry-level systems didn't come with a proper sound card. I still remember playing the original Wolfenstein 3D using the integrated PC speaker on my friend's 386 system. All of the beeps, boops and tones that speaker could produce still feel somewhat nostalgic to me. We had a lot of fun with games of that era, so we didn't really think much about it. It wasn't until 1994 that I got my first 486 system and a proper Sound Blaster 16. It was then that I realized what I had been missing out on. Despite having really crappy non-amplified speakers, the FM-synthesized MIDI music and sound effects were just so awesome. And who can forget messing around with 'Sound Recorder' or playing CD audio in Windows 3.11!

With all that in mind, it was clear that I needed a proper sound card for my retro build. But this really isn't just a checkbox to tick – on machines of this era there was quite a difference between cards, and to get the proper vintage experience I'd have to choose correctly.

Continue reading “Building a Retro Gaming Rig – Part 4”

Debunking the VM Link Speed Myth!

10Gbps from a 10Mbps NIC? Why not? Debunking the VM link speed myth once and for all!

** Edit on 11/6/2017: I hadn’t noticed before I wrote this post, but Raphael Schitz (@hypervisor_fr) beat me to the debunking! Please check out his great post on the subject as well here. **

I have been working with vSphere and VI for a long time now, and have spent the last six and a half years at VMware in the support organization. As you can imagine, I've encountered a great number of misconceptions from our customers, but one that continually comes up is around VM virtual NIC link speed.

Every so often, I’ll hear statements like “I need 10Gbps networking from this VM, so I have no choice but to use the VMXNET3 adapter”, “I reduced the NIC link speed to throttle network traffic” and even “No wonder my VM is acting up, it’s got a 10Mbps vNIC!”

I think that VMware did a pretty good job documenting the role that different vNIC types and link speeds played back in the VI 3.x and vSphere 4.0 era – back when virtualization was still a new concept to many. Today, I don't think it's discussed very much. People generally use the VMXNET3 adapter, see that it connects at 10Gbps and never look back. Not that the simplicity is a bad thing, but I think it's valuable to understand how virtual networking functions in the background.

Today, I hope to debunk the VM link speed myth once and for all. Not with quoted statements from documentation, but through actual performance testing.
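The basic comparison is easy to reproduce yourself. Here is a rough sketch of the idea – the interface name and server address are made up, and it assumes a Linux guest with ethtool and iperf installed plus an iperf server already listening on the other VM. The point is simply to contrast what the driver reports with what it can actually push:

```python
import re
import subprocess

# What link speed does the guest think it has?
ethtool_out = subprocess.run(["ethtool", "eth0"], capture_output=True, text=True).stdout
match = re.search(r"Speed:\s*(\S+)", ethtool_out)
print("Driver-reported link speed:", match.group(1) if match else "unknown")

# What can it actually push? (iperf 2.x client against a made-up server address)
iperf_out = subprocess.run(["iperf", "-c", "192.168.1.20", "-t", "30"],
                           capture_output=True, text=True).stdout
print(iperf_out)
```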

Continue reading “Debunking the VM Link Speed Myth!”

NSX 6.2.9 Now Available for Download!

Although NSX 6.3.x is getting more time in the spotlight, VMware continues to patch and maintain the 6.2.x release branch. On October 26th, VMware made NSX for vSphere 6.2.9 (Build Number 6926419) available for download.


This is a full patch release, not a minor maintenance release like 6.2.6 and 6.3.4 were. VMware documents a total of 26 fixed issues in the release notes. Some of these are pretty significant, relating to everything from DFW to EAM, and there are even some host PSOD fixes. Definitely have a look through the resolved issues section of the release notes for more detail.

On a personal note, I'm really happy to see NSX continue to mature and become more and more stable over time. Working in the support organization, I can confidently say that many of the problems we used to see often are just not around any more – especially with host preparation and the control plane. The pace at which patch releases for NSX come out is pretty quick, and some may argue that it is difficult to keep up with. I think this is just something that must be expected when you are working with state-of-the-art technology like NSX. That said, kudos to VMware Engineering for the quick turnaround on many of these identified issues!

Boosting vSphere Web Client Performance in ‘Tiny’ Deployments

Getting service health alarms and poor Web Client performance in ‘Tiny’ size deployments? A little extra memory can go a long way if allocated correctly!

In my home lab, I’ve been pretty happy with the vCenter Server ‘Tiny’ appliance deployment size. For the most part, vSphere Web Client performance has been decent and the appliance doesn’t need a lot of RAM or vCPUs.

When I most recently upgraded my lab, I considered using a 'Small' deployment but really didn't want to tie up 16GB of memory – especially with only a small handful of hosts and many services offloaded to an external PSC.

Although things worked well for the most part, I had recently been getting vCenter alarms along with occasional periods of slow refreshes and other oddities.

[Screenshot: one of the two alarms triggering frequently in my lab environment.]

The two specific alarms were service health status alarms containing the following text strings:

The vmware-dataservice-sca status changed from green to yellow

I’d also see this accompanied by a similar message referring to the vSphere Web Client:

The vsphere-client status changed from green to yellow

After doing some searching online, I quickly found VMware KB 2144950 on the subject. Although the cause of this seems pretty clear – insufficient memory allocation to the vsphere-client service – the workaround steps outlined in the KB are lacking context and could use some elaboration.
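To give a sense of the general approach, it boils down to raising the memory assigned to the vsphere-client service on the appliance and then restarting that service. The utility name, flags and the 1024MB value below are my assumptions from memory – treat this as a sketch and confirm the exact steps against KB 2144950 before running anything:

```python
import subprocess

def run(cmd):
    # Print and execute a command on the vCenter Server Appliance shell.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["cloudvm-ram-size", "-l", "vsphere-client"])          # show the current allocation
run(["cloudvm-ram-size", "-C", "1024", "vsphere-client"])  # raise it (1024 MB is just an example)
run(["service-control", "--stop", "vsphere-client"])       # restart the service to pick up the change
run(["service-control", "--start", "vsphere-client"])
```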

Continue reading “Boosting vSphere Web Client Performance in ‘Tiny’ Deployments”

NSX 6.3.4 Now Available!

After only two months since the release of NSX 6.3.3, VMware has released the 6.3.4 maintenance release. See what’s fixed and if you really need to upgrade.

On Friday October 13th, VMware released NSX for vSphere 6.3.4. You may be surprised to see another 6.3.x version only two months after the release of 6.3.3. Unlike the usual build updates, 6.3.4 is a maintenance release containing only a small number of fixes for problems identified in 6.3.3. This is very similar to the 6.2.6 maintenance release that came out shortly after 6.2.5.

As always, the relevant detail can be found in the 6.3.4 Release Notes. You can also find the 6.3.4 upgrade bundle at the VMware NSX Download Page.

In the Resolved Issues section of the release notes, VMware outlines only three separate fixes that 6.3.4 addresses.

Resolved Issues

I’ll provide a bit of additional commentary around each of the resolved issues in 6.3.4:

Fixed Issue 1970527: ARP fails to resolve for VMs when Logical Distributed Router ARP table crosses 5K limit

This first problem was actually a regression in 6.3.3. In a previous release, the ARP table limit was increased to 20K, but in 6.3.3 it regressed back to the previous limit of 5K. To be honest, not many customers have deployments at a scale where this would be a problem, but a small number of very large deployments may see issues in 6.3.3.

Fixed Issue 1961105: Hardware VTEP connection goes down upon controller reboot. A BufferOverFlow exception is seen when certain hardware VTEP configurations are pushed from the NSX Manager to the NSX Controller. This overflow issue prevents the NSX Controller from getting a complete hardware gateway configuration. Fixed in 6.3.4.

This buffer overflow issue could potentially cause datapath issues. Thankfully, not very many NSX designs include the use of Hardware VTEPs, but if yours does and you are running 6.3.3, it would be a good idea to consider upgrading to 6.3.4.

And the final issue – the one most likely to impact customers – is listed third in the release notes:

Fixed Issue 1955855: Controller API could fail due to cleanup of API server reference files. Upon cleanup of required files, workflows such as traceflow and central CLI will fail. If external events disrupt the persistent TCP connections between NSX Manager and controller, NSX Manager will lose the ability to make API connections to controllers, and the UI will display the controllers as disconnected. There is no datapath impact. Fixed in 6.3.4.

I discussed this issue in more detail in a recent blog post. You can also find more information on this issue in VMware KB 2151719. In a nutshell, the communication channel between NSX Manager and the NSX Control cluster can become disrupted due to files being periodically purged by a cleanup maintenance script. Usually, you wouldn’t notice until the connection needed to be re-established after a network outage or an NSX manager reboot. Thankfully, as VMware mentions, there is no datapath impact and a simple workaround exists. Despite being more of an annoyance than a serious problem, the vast majority of NSX users running 6.3.3 are likely to hit this at one time or another.
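If you want to keep an eye out for this symptom, controller status can be polled from the NSX Manager API. Below is a rough sketch – the hostname and credentials are placeholders, and the /api/2.0/vdn/controller URI and XML element names are my recollection of the NSX-v API, so verify them against the API guide for your release:

```python
import requests
import xml.etree.ElementTree as ET

resp = requests.get("https://nsxmanager.lab.local/api/2.0/vdn/controller",
                    auth=("admin", "VMware1!"), verify=False)
resp.raise_for_status()

# Walk the returned XML and print each controller's ID and reported status.
root = ET.fromstring(resp.text)
for ctrl in root.iter("controller"):
    print(ctrl.findtext("id"), ctrl.findtext("status"))
```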

My Opinion and Upgrade Recommendations

The third issue in the release notes described in VMware KB 2151719 is likely the most disruptive to the majority of NSX users. That said, I really don’t think it’s critical enough to have to drop everything and upgrade immediately. The workaround of restarting the controller API service is relatively simple and there should be no resulting datapath impact.

The other two issues described are not likely to be encountered in the vast majority of NSX deployments, but are potentially more serious. Unless you are really pushing the scale limits or are using Hardware VTEPs, there is likely little reason to be concerned.

I certainly think that VMware did the right thing to patch these identified problems as quickly as possible. For new greenfield deployments, I think there is no question that 6.3.4 is the way to go. For those already running 6.3.3, it’s certainly not a bad idea to upgrade, but you may want to consider holding out for 6.3.5, which should include a much larger number of fixes.

On a positive note, if you do decide to upgrade, there are likely some components that will not need to be upgraded. Because there are only a small number of fixes, relating mainly to the control plane and logical switching, ESGs, DLRs and Guest Introspection will likely not have any code changes. You'll also benefit from not having to reboot ESXi hosts for VIB patches thanks to changes in the 6.3.x upgrade process. Once I have a chance to go through the upgrade in my lab, I'll report back on this.

Running 6.3.3 today? Let me know what your plans are!

Building a Retro Gaming Rig – Part 3

Welcome to the third installment of my Building a Retro Gaming Rig series. Today, I’ll be taking a look at another motherboard and CPU combo that I picked up from eBay on a bit of a whim.

In Part 1 of this series, I took an in-depth look at some Slot-1 gear, including the popular Asus P2B and some CPU options. As I was thinking ahead in the build, I got frustrated with the lack of simple and classic-looking ATX tower cases available these days. Everything looks far too modern, has too much bling or is just plain gigantic. Used tower cases from twenty years ago are all badly yellowed and just don't look good. On the other hand, there are lots of small, simple and affordable micro ATX cases available.

Micro ATX – or mATX – motherboards were actually pretty uncommon twenty-odd years ago. PC tower cases were pretty large and in those days people really did use lots of expansion cards and needed the extra space. Only very compact systems and OEMs seemed to use the mATX form factor at that time. Many of these boards were heavily integrated, lacked expansion slots and stuck you with some pretty weak onboard video solutions.

MSI MS-6160 Motherboard

In an interesting twist, I came across an MSI MS-6160 mATX board based on the Intel 440LX chipset that seemed to tick many of the right boxes. The combo included a Celeron 400MHz processor and 512MB of SDRAM for only $35 CDN.

Continue reading “Building a Retro Gaming Rig – Part 3”

VM Network Performance and CPU Scheduling

Over the years, I’ve been on quite a few network performance cases and have seen many reasons for performance trouble. One that is often overlooked is the impact of CPU contention and a VM’s inability to schedule CPU time effectively.

Today, I’ll be taking a quick look at the actual impact CPU scheduling can have on network throughput.

Testing Setup

To demonstrate, I’ll be using my dual-socket management host. As I did in my recent VMXNET3 ring buffer exhaustion post, I’ll be testing with VMs on the same host and port group to eliminate bottlenecks created by physical networking components. The VMs should be able to communicate as quickly as their compute resources will allow them.

Physical Host:

  • 2x Intel Xeon E5 2670 Processors (16 cores at 2.6GHz, 3.3GHz Turbo)
  • 96GB PC3-12800R Memory
  • ESXi 6.0 U3 Build 5224934

VM Configuration:

  • 1x vCPU
  • 1024MB RAM
  • VMXNET3 Adapter (1.1.29 driver with default ring sizes)
  • Debian Linux 7.4 x86 PAE
  • iperf 2.0.5

The VMs I used for this test are quite small with only a single vCPU and 1GB of RAM. This was done intentionally so that CPU contention could be more easily simulated. Much higher throughput would be possible with multiple vCPUs and additional RX queues.

The CPUs in my physical host are Xeon E5 2670 processors clocked at 2.6GHz per core. Because this processor supports Intel Turbo Boost, the maximum frequency of each core will vary depending on several factors and can be as high as 3.3GHz at times. To take this into consideration, I will test with a CPU limit of 2600MHz, as well as with no limit at all to show the benefit Turbo Boost provides.
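For reference, the 2600MHz cap was applied as a CPU limit in the VM's resource settings. If you would rather script it than click through the UI, something along these lines with pyVmomi should work – the vCenter address, credentials and VM name are placeholders:

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "iperf-sender")  # placeholder VM name

    # Cap the VM at 2600MHz; use limit=-1 to remove the cap for the 'no limit' runs.
    spec = vim.vm.ConfigSpec(cpuAllocation=vim.ResourceAllocationInfo(limit=2600))
    vm.ReconfigVM_Task(spec=spec)
finally:
    Disconnect(si)
```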

To measure throughput, I’ll be using a pair of Debian Linux VMs running iperf 2.0.5. One will be the sending side and the other the receiving side. I’ll be running four simultaneous threads to maximize throughput and load.
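The iperf invocation itself is straightforward: one VM runs the server and the other drives four parallel client streams. A minimal sketch of the client side (the server address is a placeholder):

```python
import subprocess

# On the receiving VM, start the server first:  iperf -s
# On the sending VM, run four parallel streams for 60 seconds, reporting every 5 seconds.
result = subprocess.run(["iperf", "-c", "192.168.1.20", "-P", "4", "-t", "60", "-i", "5"],
                        capture_output=True, text=True)
print(result.stdout)
```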

I should note that my testing is far from precise and is not being done with the usual controls and safeguards to ensure accurate results. This said, my aim isn’t to be accurate, but rather to illustrate some higher-level patterns and trends.

Continue reading “VM Network Performance and CPU Scheduling”