NSX Troubleshooting Scenario 5

Welcome to the fifth installment of my new NSX troubleshooting series. What I hope to do in these posts is share some of the common issues I run across from day to day. Each scenario will be a two-part post. The first will be an outline of the symptoms and problem statement along with bits of information from the environment. The second will be the solution, including the troubleshooting and investigation I did to get there.

As always, we’ll start with a brief customer problem statement:

“We’ve just deployed NSX and are doing some testing with the distributed firewall. We created a security tag that we can apply to VMs to prevent them from browsing the web. We applied this tag to three virtual machines. It seems to work on two of them, but the third can always browse the web! Something is not working here.”

After speaking to the customer, we were able to collect a bit more information about the VMs and traffic flows in question. Below are the VMs that should not be able to browse:

  • win-a1.lab.local – (static)
  • lubuntu-1.lab.local – (DHCP)
  • lubuntu-2.lab.local – (DHCP)

Only the VM called lubuntu-1 is still able to browse; the others are fine. The customer has been using an internal web server called web-a1.lab.local for testing. That machine is in the same cluster and serves up a web page on port 80. All of the VMs in question sit on the same logical switch, and the customer reports that all east-west and north-south routing is functioning normally.
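To confirm the symptom, a quick HTTP request from each of the Linux VMs is usually enough. This is just a sketch of the sort of test I’d run against the customer’s web-a1.lab.local server, with illustrative prompts and output:

user@lubuntu-1:~$ curl -I --connect-timeout 5 http://web-a1.lab.local
HTTP/1.1 200 OK

user@lubuntu-2:~$ curl -I --connect-timeout 5 http://web-a1.lab.local
curl: (28) Connection timed out after 5001 milliseconds

A 200 response from lubuntu-1 despite the security tag, along with timeouts from the other tagged VMs, would line up exactly with what the customer described.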

To begin, let’s have a look at the DFW rules defined.


As you can see, they really did just start testing, as there is only one new section and a single non-default rule. The rule is quite simple: any HTTP/HTTPS traffic coming from VMs in the ‘No Browser’ security group should be blocked. We can also see that both this rule and the default rule were set to log as part of the troubleshooting activities.
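We’ll dig into things properly in the second half, but for reference, the rules actually enforced on a VM’s vNIC can be dumped directly on the ESXi host where it runs. First locate the VM’s firewall dvfilter with summarize-dvfilter, then feed that name to vsipioctl. The filter name below is just a placeholder; yours will differ:

[root@esx0:~] summarize-dvfilter | grep -A3 lubuntu-1
[root@esx0:~] vsipioctl getrules -f nic-48916-eth0-vmware-sfw.2

If the security tag and security group membership are being applied correctly, the new block rule should show up in this output for all three tagged VMs.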


The 486 Restoration – Part 2

Welcome to part two of my 486 restoration project! In my last post, I took a look at some of the rescued parts from a badly neglected tower. Today, I’ll be going through my adventures in getting a working CMOS battery back into this system.

As mentioned briefly in part one, most 386 and early 486 systems included what are referred to as ‘barrel batteries’. These are rechargeable nickel-cadmium (NiCad) batteries, usually rated at 3.6V fully charged. Unlike the coin cell batteries in newer systems, the battery charges whenever the system is powered on. In theory, this was great because the CMOS battery could last a long time in the system. Using a multi-cell rechargeable battery increases the cost of the board, so the CR2032 coin cell solutions that followed were most likely chosen first and foremost for cost savings. This is all well and good, but nobody really envisioned these systems still being in use 25 years later, as is the case with this one.


A 25-year-old Varta 3.6V NiCad – it’s gotta go before the inevitable happens.

Do a quick Google search on these barrel batteries and you’ll see just how problematic they can be as they age. Not only can they leak and cease to function, but when they do, the electrolyte is very corrosive to copper traces and other types of metal on the board. If caught early enough, the board can be cleaned and may still be functional. Unfortunately, the damage can sometimes be permanent.


Using SDelete and vmkfstools to Reclaim Thin VMDK Space

Using thin provisioned virtual disks can provide many benefits. Not only do they allow over-provisioning, but with the prevalence of flash storage, performance degradation really isn’t a concern like it used to be.

I recently ran into a situation in my home lab where my Windows jump box ran out of disk space. I had downloaded a bunch of OVA and ISO files and had forgotten to move them over to a shared drive that I use for archiving. I expanded the disk by 10GB to take it from 40GB to 50GB, and moved off all the large files. After this, I had about 26GB used and 23GB free – much better.


Because that jump box is sitting on flash storage – which is limited in my lab – I had thin provisioned this VM to conserve as much disk space as possible. Despite freeing up lots of space, the VM’s VMDK was still consuming a lot more than 26GB.

Notice below that doing a normal directory listing displays the maximum possible size of a thin disk. In this case, the disk has been expanded to 50GB:

[root@esx0:/vmfs/volumes/58f77a6f-30961726-ac7e-002655e1b06c/jump] ls -lha
total 49741856
drwxr-xr-x 1 root root 3.0K Feb 12 21:50 .
drwxr-xr-t 1 root root 4.1K Feb 16 16:13 ..
-rw-r--r-- 1 root root 41 Jun 16 2017 jump-7a99c824.hlog
-rw------- 1 root root 13 May 29 2017 jump-aux.xml
-rw------- 1 root root 4.0G Nov 25 18:47 jump-c49da2be.vswp
-rw------- 1 root root 3.1M Feb 12 21:50 jump-ctk.vmdk
-rw------- 1 root root 50.0G Feb 16 17:55 jump-flat.vmdk
-rw------- 1 root root 8.5K Feb 16 15:26 jump.nvram
-rw------- 1 root root 626 Feb 12 21:50 jump.vmdk

Using the ‘du’ command – for disk usage – we can see the flat file containing the data is still consuming over 43GB of space:

[root@esx0:/vmfs/volumes/58f77a6f-30961726-ac7e-002655e1b06c/jump] du -h *flat*.vmdk
43.6G jump-flat.vmdk

With only about 26GB actually in use, that’s roughly 17GB – about 40% – of wasted space.
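The rest of the post walks through the details, but the reclaim itself is a two-step process: zero out the free space inside the guest using SDelete, then punch out the zeroed blocks from the host using vmkfstools. Roughly, and assuming the paths shown above, with the VM powered off for the second step:

C:\> sdelete.exe -z c:

[root@esx0:/vmfs/volumes/58f77a6f-30961726-ac7e-002655e1b06c/jump] vmkfstools -K jump.vmdk

The -z option tells SDelete to zero free space, and the -K (punchzero) option tells vmkfstools to deallocate those zeroed blocks, shrinking the thin flat file back down toward the true usage.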


The 486 Restoration – Part 1

As you may have noticed in my recent Building a Retro Gaming Rig series, I’m quite passionate about 1990s era PC hardware. Machines from this time are very nostalgic to me, as this is when I really started getting interested in PCs and technology in general. Granted, PC gaming is what really drove my interest in hardware initially, but down the line, I really started enjoying the hardware just for the sake of it.

I’ve only started acquiring and collecting vintage hardware in the last year or so, but I’ve always been drawn to 486 systems. Although we had an old monochrome 8-bit machine growing up – possibly an 80286 – the first PC I was really interested in was a 486 system bought in 1994.


This won’t be the first 486 system in my collection. I got a well-maintained DEC low-profile system from my brother-in-law last summer. It’s a nice system that I hope to take a look at in another post, but it’s very integrated. Everything is on-board and proprietary, so it leaves very little room for tweaking. That said, I really wanted something I could customize.

Today’s project all started with an ad I stumbled upon on Kijiji a few weeks back – a 486 tower system in “working condition”. Inspecting the posted images carefully, I could see that the system was far from complete – it was missing a video card and didn’t have a hard drive.


NSX Troubleshooting Scenario 4 – Solution

Welcome to the fourth installment of a new series of NSX troubleshooting scenarios. Thanks to everyone who took the time to comment on the first half of scenario four. Today I’ll be performing some troubleshooting and will resolve the issue.

Please see the first half for more detail on the problem symptoms and some scoping.

Getting Started

During the scoping in the first half of the scenario, we saw that the problem was squarely in the customer’s new secondary NSX deployment. Two test virtual machines – linux-r1 and linux-r2 – could not be added to any of the universal logical switches.

From the ‘Logical Switches’ view in the vSphere Web Client, we could see that these universal logical switches were synchronized across both NSX Managers. They existed from the perspective of both the Primary and Secondary NSX Manager views:


Perhaps the most telling observation, however, was the absence of distributed port groups associated with the universal logical switches on the dvs-rem switch:


As we can see above, the port groups do exist for logical switches in the VNI 900x range. These are non-universal logical switches available to the secondary NSX deployment only.

In the host preparation section, we can see that dvs-rem is indeed the configured distributed switch for the compute-r cluster and that both hosts look good from a VTEP/VXLAN perspective:


So why are these port groups missing? Without them, VMs simply can’t be added to the associated logical switches.
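As an aside, the host preparation state can also be confirmed from the CLI of the hosts themselves. Something along these lines, assuming a hypothetical host named esx-r1 in the compute-r cluster:

[root@esx-r1:~] esxcli network vswitch dvs vmware vxlan list
[root@esx-r1:~] esxcli network vswitch dvs vmware vxlan network list --vds-name dvs-rem

The first command shows the VTEP/VXLAN configuration on the dvs-rem switch, and the second lists the VXLAN networks the host actually knows about. In this scenario, only the 900x VNIs would have appeared; the universal ones would be absent.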

The Solution

Although you’ve probably noticed that I like to dig deep in some of these scenarios, this one is actually pretty straightforward. A straightforward, but all too common, problem: the cluster has not been added to the universal transport zone.


You’d be surprised how often I see this, but to be fair, it’s very easily overlooked. I sometimes need to remind myself to check all the basics first, especially when dealing with new deployments. The key symptom that raised red flags for me was the lack of auto-generated port groups on the distributed switch. Adding the cluster to the transport zone triggers the creation of these port groups, so if they don’t exist, transport zone membership should be the first thing checked.

As soon as I added the compute-r cluster to the Universal TZ transport zone, we saw an immediate slew of port group creation tasks:


I’ve now essentially told NSX that I want all the logical switches in that transport zone to span to the compute-r cluster. In NSX-V, we can think of a transport zone as a boundary spanning one or more clusters. Only clusters in that transport zone will have the logical switches available to them for use.
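Transport zone membership can also be checked outside of the UI via the NSX Manager REST API. A quick sketch using curl, with a hypothetical manager hostname:

curl -k -u admin 'https://nsxmgr-r.lab.local/api/2.0/vdn/scopes'

Each vdnScope element in the response represents a transport zone and includes the clusters attached to it, so a cluster missing from the universal scope stands out quickly.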

The concept of a ‘Universal Transport Zone’ just takes this a step further and allows clusters in different vCenter instances to connect to the same universal logical switches. The fact that we saw port groups for the VNI 900x range tells us that the compute-r cluster existed in the non-universal transport zone called ‘Remote TZ’ but was missing from ‘Universal TZ’.
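The auto-generated port group names make the distinction easy to spot: universal wires include ‘universalwire’ in the name, while local ones use ‘virtualwire’. The names below are hypothetical examples of the pattern rather than output from this environment:

vxw-dvs-34-virtualwire-2-sid-9002-remote-app
vxw-dvs-34-universalwire-1-sid-10000-universal-web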


Thanks again to everyone for posting their testing suggestions and theories! I hope you enjoyed this scenario. If you have other suggestions for troubleshooting scenarios you’d like to see, please leave a comment, or reach out to me on Twitter (@vswitchzero).



NSX Troubleshooting Scenario 4

Time for another NSX troubleshooting scenario! Welcome to the fourth installment of my new NSX troubleshooting series. What I hope to do in these posts is share some of the common issues I run across from day to day. Each scenario will be a two-part post. The first will be an outline of the symptoms and problem statement along with bits of information from the environment. The second will be the solution, including the troubleshooting and investigation I did to get there.

As always, we’ll start with a customer problem statement:

“We recently deployed Cross-vCenter NSX for a remote datacenter location. When we try to add VMs to the universal logical switches, there are no VM vNICs in the list to add. This works fine at the primary datacenter.”

This customer’s environment is the same as what we outlined in scenario 3, but keep in mind that this scenario should be treated separately. Forget everything from the previous scenario.


The main location is depicted on the left. A three-host cluster called compute-a exists there. All of the VLAN-backed networks route through a router called vyos. The Universal Control Cluster exists at this location, as does the Primary NSX Manager.

The ‘remote datacenter’ is to the right of the dashed line. The single ‘compute-r’ cluster there is associated with the secondary NSX manager at that location. According to the customer, this was only recently added.


NSX 6.4.0 Upgrade Compatibility

Thinking about upgrading to NSX 6.4.0? As I discussed in my recent Ten Tips for a Successful NSX Upgrade post, it’s always a good idea to do your research before upgrading. Along with reading the release notes, checking the VMware Compatibility Matrix is essential.

VMware just updated some of the compatibility matrices to include information about 6.4.0. Here are the relevant links:

From an NSX upgrade path perspective, you’ll be happy to learn that any current build of NSX 6.2.x or 6.3.x should be fine. At the time of writing, this would be 6.2.9 and earlier as well as 6.3.5 and earlier.


NSX upgrade compatibility – screenshot from 1/17/2018.
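If you’re not sure exactly which build you’re running today, the NSX Manager can tell you. A quick REST call to the appliance management API does the trick, with a hypothetical manager hostname below; running ‘show version’ on the NSX Manager CLI works as well:

curl -k -u admin 'https://nsxmgr.lab.local/api/1.0/appliance-management/global/info'

The versionInfo section of the response includes the major, minor, and patch versions along with the build number.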

On a positive note, the workaround VMware required for some older 6.2.x builds upgrading to 6.3.5 is no longer needed for 6.4.0. The underlying issue that required it has been resolved.

From a vCenter and ESXi 6.0 and 6.5 perspective, the requirements for NSX 6.4.0 remain largely unchanged from late 6.3.x releases. What you’ll immediately notice is that NSX 6.4.0 is not supported with vSphere 5.5. If you are running vSphere 5.5, you’ll need to get to at least 6.0 U2 before considering NSX 6.4.0.

From the NSX 6.4.0 release notes:

vSphere 6.0:
Supported: 6.0 Update 2, 6.0 Update 3
Recommended: 6.0 Update 3. vSphere 6.0 Update 3 resolves the issue of duplicate VTEPs in ESXi hosts after rebooting vCenter Server. See VMware Knowledge Base article 2144605 for more information.

vSphere 6.5:
Supported: 6.5a, 6.5 Update 1
Recommended: 6.5 Update 1. vSphere 6.5 Update 1 resolves the issue of EAM failing with OutOfMemory. See VMware Knowledge Base article 2135378 for more information.

Note: vSphere 5.5 is not supported with NSX 6.4.0.

It doesn’t appear that the matrix has been updated yet for other VMware products that interact with NSX, such as vCloud Director.

Before rushing out to upgrade to NSX 6.4.0, be sure to check for compatibility – especially if you are using any third party products. It may be some time before other vendors certify their products for 6.4.0.

Stay tuned for a closer look at some of the new NSX 6.4.0 features!