New Upgrade Issue in NSX 6.4.4

Be sure to check out VMware KB 67416 before upgrading to 6.4.4.

If you’re planning an upgrade to NSX 6.4.4, have a look at VMware KB 67416 before you begin. I’ve seen several customers hit this issue now, and a bit of pre-work before the upgrade can save you a lot of grief.

It appears that if you use grouping objects, like security groups or IP sets, in your ESG firewall rules, there is a chance that your ESG will become unmanageable after NSX Manager is upgraded to 6.4.4. Most customers first notice the issue when the ESG upgrade tasks fail partway through the overall upgrade process. In addition to being unable to upgrade the edge, any configuration change you attempt to make will also fail.

The issue lies in the message bus communication channel between NSX Manager and the ESG. The security groups and IP sets trigger a large number of messages, and the channel eventually becomes blocked as a result. Unfortunately, there is no workaround aside from removing these groups and IP sets from the firewall before upgrading, which won’t be feasible for the majority of customers out there.

Although not a common configuration, this issue can also be triggered if DFW rules are applied to ESGs and these rules contain grouping objects.

If you know your environment is configured with security groups and IP sets in the edge firewall, I’d recommend reaching out to VMware technical support before beginning your upgrade. Support can proactively install a “hot patch” so that you won’t hit this problem. If you have already hit it, the same hot patch can be applied to get you back up and running. For the patch to take effect, the ESG has to be re-deployed, leading to a brief outage. Obviously, getting in front of this issue is a better plan than being reactive.
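
If you’d like to check for yourself before engaging support, one option is to pull each edge’s firewall configuration via the REST API and search it for grouping object references. Here’s a rough sketch using curl; the NSX Manager address, credentials and edge ID are placeholders for your environment, and the grep pattern is just my assumption of how the object IDs appear in the returned XML:

# Retrieve the ESG firewall config and count grouping object references
curl -k -u admin:VMware1! \
  "https://nsxmanager.lab.local/api/4.0/edges/edge-4/firewall/config" \
  | grep -oE "ipset-[0-9]+|securitygroup-[0-9]+" | sort | uniq -c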

VMware will be updating the 6.4.4 release notes to reflect this.

NSX-T PCPU Requirements for Edges

New CPU requirements for NSX-T may leave older lab hardware out in the cold.

If you are running old hardware in your lab, you may have come across an unexpected failure while deploying your first NSX-T edge VM.

[Image: nsxt-aes-edge-1]

The exact error message will be something similar to:

“[Fabric] Edge <uuid> is not ready for configuration error occurred, error detail is NSX Edge configuration has failed. The host does not support required cpu features: [‘aes’].”

The edge will be successfully deployed, but will remain ‘unconfigured’ and will not allow you to add it as a transport node.

The ‘aes’ feature being referred to is Intel’s AES-NI acceleration for cryptography. You can find out more about AES-NI here. In NSX-V, AES-NI was optionally supported for offloading cryptography for VPN related features. It seems that this has now become a hard requirement for NSX-T.
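
If you aren’t sure whether a given CPU has AES-NI, it’s easy to check from any Linux system before committing to hardware. The feature shows up as the ‘aes’ CPU flag; a minimal sketch:

# AES-NI appears as the 'aes' flag in /proc/cpuinfo on Linux
grep -m1 -wo aes /proc/cpuinfo && echo "AES-NI present" || echo "AES-NI missing"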

Unfortunately, like vSphere 6.7, NSX-T has minimum CPU requirements that can’t be worked around. If you have a browse through the NSX-T system requirements, you’ll find a note about CPU compatibility in the “NSX Edge VM and Bare-Metal NSX Edge CPU Requirements” section. Listed there are:

  • Xeon 56xx (Westmere-EP)
  • Xeon E7-xxxx (Westmere-EX and later CPU generation)
  • Xeon E5-xxxx (Sandy Bridge and later CPU generation)

This means that anything released prior to 2011 is unlikely to work, with the exception of a few Westmere-EP based Xeons, which seem to have spotty success. On the AMD front, it appears that even CPUs with AES instructions fail similarly due to a CPU compatibility check done during edge deployment.

My management host uses Xeon E5-2670s, which work fine, but my compute cluster uses very old Xeon X3440s that came out before AES-NI was introduced. Now that I can’t run vSphere 6.7 or an NSX-T edge on these hosts, I think it may finally be time to upgrade.

Unfortunately, it doesn’t appear that there is a workaround for this problem. If anyone does come across a way to avoid this, please let me know!

Moving a FreeNAS ZFS Pool from a Physical Server to a VM using RDMs

I’ve been using FreeNAS for several years now for both block and NFS storage in my home lab with great success. For more information on my most recent FreeNAS build, you can check out the series here.

Although I’ve been quite pleased with this setup, I had to repurpose the SSDs in the box and suffered yet another USB boot device failure. That meant reinstalling FreeNAS, and left me with just a single ZFS pool on a pair of 2TB mechanical drives. It didn’t feel right to keep a full system up and running for just two drives when I could run them just fine in my management ESXi host. Not to mention the fact that I’ve got 224GB of RAM available there to provide a much larger ZFS ARC read cache.

In part 2 of my FreeNAS build series, I looked at using VT-d to pass a proper LSI SAS HBA through to a VM. This is really the best possible virtual FreeNAS configuration, as it bypasses the hypervisor’s storage stack entirely and grants the VM direct access to the HBA and drives. I considered using that setup, but didn’t think it was worth the extra power consumption and cooling needed for the toasty PERC H200 card I’ve been using. Since I wanted to preserve all data on the drives, RDMs seemed to be the next logical solution. This isn’t as ‘pure’ as the VT-d approach, but it still gives the VM full block access to the drives in the system. At any rate, it was worth a try!

Disclaimer: If you are using ZFS and FreeNAS for production purposes or for any critical data that you care about, using a proper physical setup is important. I wouldn’t recommend virtualizing FreeNAS or any other ZFS based storage system for anything but testing or lab purposes.

What I hoped to do was the following:

  1. Take the 2x2TB Western Digital hard drives out of the Dell T110.
  2. Re-install the 2x2TB drives in my Intel S2600 management host on the integrated SATA controller.
  3. Create a new FreeNAS virtual machine.
  4. Add the two drives to the VM as virtual mode RDMs (see the sketch after this list).
  5. Import the existing ZFS volume that is striped across these two drives in FreeNAS.
  6. Re-create the iSCSI target and NFS shares and regain access to all existing data in the pool (assuming all goes well)!
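
For reference, here’s roughly what step 4 looks like from the ESXi shell. vmkfstools creates a small mapping file on a VMFS datastore that can then be added to the VM as an existing disk. The device identifiers and paths below are placeholders from my environment; note that -r creates a virtual mode RDM, while -z would create a physical mode one:

# Identify the physical disks to map
ls -lh /vmfs/devices/disks/

# Create a virtual mode RDM mapping file for each 2TB drive (placeholder IDs)
vmkfstools -r /vmfs/devices/disks/t10.ATA_____WDC_WD20_DRIVE1 /vmfs/volumes/datastore1/freenas/freenas-rdm1.vmdk
vmkfstools -r /vmfs/devices/disks/t10.ATA_____WDC_WD20_DRIVE2 /vmfs/volumes/datastore1/freenas/freenas-rdm2.vmdk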

Creating a new FreeNAS VM

Once I got the two drives installed in my Intel S2600 management host, I created a new VM and got the FreeNAS OS installed. Below is the virtual hardware configuration I used:

Guest OS type: Other, FreeBSD 64-bit
CPUs: 2x vCPUs
Memory: 16GB (a minimum of 8GB is required)
Hard Disk: 16GB (for the FreeNAS OS boot device, a minimum of 8GB is required)
New SCSI controller: LSI Logic SAS
Network adapter type: VMXNET3
CD/DVD Drive: Mount the FreeNAS 11.2 ISO from a datastore

You’ll notice that some of the options I selected are not defaults for FreeBSD based VMs. This includes the LSI Logic SAS adapter and the VMware VMXNET3 NIC. LSI Logic Parallel is the default for FreeBSD VMs, but the SAS adapter works well with all recent BSD builds. The same holds true for the VMXNET3 adapter, which has many benefits over the emulated E1000 adapter type.
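
Jumping ahead a little: once the VM is up and both RDMs are attached, the existing pool can be imported from the FreeNAS UI (under Storage) or from the shell using the standard ZFS commands. A quick sketch, where ‘vol1’ is a placeholder for your actual pool name:

# List any pools ZFS can see on the attached disks
zpool import

# Import the pool by name; -f may be needed since it was last used on another system
zpool import -f vol1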


Deploying NSX-T Controllers Manually

Deploying an NSX-T control cluster manually for maximum control and flexibility.

One of the great things about NSX-T is its complete independence from vCenter Server. You can still link to vCenter Server if you’d like to automate certain tasks, but unlike NSX-V, you can accomplish many deployment tasks manually. One of the first things you’ll do in a new NSX-T setup is deploy your control cluster.

Although automated deployment through vCenter and the UI is convenient, there are some additional benefits to manual controller deployment. Firstly, you can select a non-production ‘small’ form factor that isn’t selectable in the UI, saving you a couple of vCPUs and about 8GB of RAM per appliance. Secondly, deploying manually allows you to thin-provision your controller VMDKs right off the bat. In a home lab, these are desirable benefits. And of course, there is always the satisfaction of running through the process manually and better understanding what happens behind the scenes.

[Image: NSXT-controllerdeploy-2]

As seen above, the automated controller deployment wizard does not allow the selection of a ‘Small’ form factor.

Deploying Controllers

To begin, you’ll need to download the NSX-T controller OVA. You’ll find it listed along with the other NSX-T deliverables on the download page.

[Image: NSXT-controllerdeploy-1]

There are a few different ways that you can deploy the OVA including with ovftool. I’m just going to use the vSphere Client for this example. As you can see below, we can now select an unsupported ‘Small’ form-factor deployment:
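
As an aside, if you’d rather script the deployment, ovftool can do the same thing. The sketch below is just an illustration – the OVA filename, inventory paths and network names are placeholders, and I’d suggest running ovftool against the OVA with no target first to dump the valid deployment options and property names:

# List the OVA's deployment options and OVF properties
ovftool nsx-controller-2.3.0.0.0.ova

# Deploy using the 'small' deployment option with thin-provisioned disks
ovftool --name=nsxt-controller-1 --deploymentOption=small \
  --datastore=datastore1 --network="VM Network" \
  --acceptAllEulas --diskMode=thin --powerOn \
  nsx-controller-2.3.0.0.0.ova \
  "vi://administrator@vsphere.local@vcsa.lab.local/Datacenter/host/Management"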

[Image: NSXT-controllerdeploy-3]

In addition to this, you’ll get the usual template customization options, along with a few new ones you may not have seen before, listed under ‘Internal Properties’:

[Image: NSXT-controllerdeploy-4]

As you have probably guessed, these internal properties can be used to save some of the work needed to get the controller connected to the management plane and joined to the control cluster. I’m going to skip this entire section and run through the process manually from the CLI post-deployment.
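
For the curious, the post-deployment CLI work looks roughly like the following. This is a from-memory sketch, so double-check each command against the NSX-T installation guide for your version; IPs, thumbprints and secrets are placeholders:

# On NSX Manager: retrieve the API certificate thumbprint
get certificate api thumbprint

# On each new controller: join it to the management plane
join management-plane <manager-ip> username admin thumbprint <manager-thumbprint>

# On the first controller: set a shared secret and initialize the cluster
set control-cluster security-model shared-secret
initialize control-cluster

# On each additional controller: set the same shared secret, then grab its thumbprint
set control-cluster security-model shared-secret
get control-cluster certificate thumbprint

# Back on the first controller: join the new node into the cluster
join control-cluster <new-controller-ip> thumbprint <new-controller-thumbprint>

# Finally, on the new controller: activate it
activate control-cluster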


Changing ESG/DLR Tenant After Deployment

Using NSX REST API calls to modify ESG/DLR configuration that isn’t exposed in the UI.

If you are reading this post, you’ve probably already come to the realization that the ‘Tenant’ field for ESGs and DLRs can’t be changed in the UI. Once the appliance is deployed, this string value appears set in stone.

[Image: changetenant-1]
Adding the Tenant and Description are easy during deployment, but can’t be changed in the UI after deployment.

Although it can’t be modified in the UI without creating a new appliance from scratch, it’s pretty easy to modify this field via REST API calls. After having come across a question on the VMware communities forum regarding this, I thought I’d write a quick post on the process.

Step 1: Retrieve the ESG/DLR Configuration

First, you’ll need to do a GET call to retrieve the current ESG/DLR configuration in XML format. I won’t cover the basics of REST API calls in this post as the topic is well covered elsewhere. If you’ve never done REST API calls before, I’d recommend doing some reading on the subject before proceeding.

I’ll be using the popular Postman utility for this. First, we’ll need to find the moref identifier of the ESG/DLR in question.

[Image: changetenant-2]
We’re interested in mercury-esg1, which is edge-4 in my lab environment.

You can easily find this from the ‘Edges’ view in the UI. In my case, I want to modify the edge called mercury-esg1, which is edge-4. Notice that someone put the string ‘test’ in as the tenant, which we want to change to ‘mercury’.

From Postman, we’ll run the following API call to retrieve edge-4’s configuration:

GET https://<nsxmgrip>/api/4.0/edges/edge-4

I got a 200 OK response, with all the config in XML format returned.

[Image: changetenant-3]
All of the ESG’s configuration was returned in XML format. This is everything needed to recreate or modify the appliance.

Step 2: Make the Necessary Changes

Next, I’ll copy and paste all the returned XML data into a text editor. The XML section for the tenant string is right near the top:

<edge>
    <id>edge-4</id>
    <version>32</version>
    <status>deployed</status>
    <datacenterMoid>datacenter-2</datacenterMoid>
    <datacenterName>Toronto</datacenterName>
    <tenant>test</tenant>
...

I will simply change <tenant>test</tenant> to <tenant>mercury</tenant>.

Step 3: Apply the Modified Configuration

The final step is to take your modified XML configuration data and apply it back to the ESG/DLR in question. This is as simple as changing your REST API call from GET to PUT and pasting the modified configuration into the ‘Body’ of the call.
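
If you’d rather skip Postman, the same round trip works fine with curl. A minimal sketch, with the NSX Manager address and credentials as placeholders:

# GET the current config and save it to a file
curl -k -u admin:VMware1! "https://nsxmanager.lab.local/api/4.0/edges/edge-4" -o edge-4.xml

# Edit the <tenant> element in edge-4.xml, then PUT the whole document back
curl -k -u admin:VMware1! -X PUT -H "Content-Type: application/xml" \
  --data-binary @edge-4.xml "https://nsxmanager.lab.local/api/4.0/edges/edge-4"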

[Image: changetenant-4]
Be sure to double check your configuration before sending the PUT call!

If your call was successful, you should get a 204 No Content response.

[Image: changetenant-5]

And there you have it – the tenant field has been updated. Unfortunately, I haven’t had any success updating the description field via API. The <description> tag appears to be ignored in this PUT call for some reason. If anyone has any success with this, please let me know.

PowerNSX Alternative

If you prefer PowerNSX to raw API calls, the Set-NsxEdge cmdlet can also do the job. The cmdlet uses the same API calls behind the scenes but can be quicker to work with:

PS C:\Users\mike.VSWITCHZERO> $edge = get-nsxedge mercury-esg1
PS C:\Users\mike.VSWITCHZERO> $edge.tenant = "hello"
PS C:\Users\mike.VSWITCHZERO> set-nsxedge $edge

Edge Services Gateway update will modify existing Edge configuration.
Proceed with Update of Edge Services Gateway mercury-esg1?
[Y] Yes [N] No [?] Help (default is "N"): y


id : edge-4
version : 37
status : deployed
datacenterMoid : datacenter-2
datacenterName : Toronto
tenant : hello
name : mercury-esg1
fqdn : mercury-esg1.mercury.local
enableAesni : true
enableFips : false
vseLogLevel : info
vnics : vnics
appliances : appliances
cliSettings : cliSettings
features : features
autoConfiguration : autoConfiguration
type : gatewayServices
isUniversal : false
hypervisorAssist : false
tunnels :
edgeSummary : edgeSummary

Understanding NSX IP Discovery

An in-depth look at the NSX DFW’s IP discovery methods including Tools and ARP/DHCP snooping.

One of the best features of the DFW is the flexibility it provides in using objects in rules instead of IP addresses or groups of IP addresses. For example, for a source or destination you could use a VM from the inventory, a cluster, or a security group containing all sorts of dynamic criteria. Underneath all of this, however, NSX needs to inspect segment and packet headers to enforce the rules. Those headers only contain identifying information like IP addresses and TCP ports, so NSX must keep track of which object is associated with which IP address or addresses. And because of the ‘distributed’ nature of the DFW, each of these translations must ultimately reach the ESXi hosts for enforcement.

There are three ways in which NSX can associate IPs with VMs – VMware Tools reporting, ARP snooping and DHCP snooping. The latter two are disabled by default.

[Image: ipdiscovery-1]

In recent builds of NSX, you can see the detection types enabled in the host preparation section. As can be seen above, DHCP and ARP snooping are disabled by default, leaving only VMware Tools address reporting.

VMware Tools Reporting

As you have probably noticed, VMs with VMware Tools installed conveniently report their configured IP addresses in the vSphere Client.

[Image: tshoot12a-4]

Virtual machine linux-a2 is reporting 172.16.15.10 as well as an IPv6 address on the summary tab in the vSphere Client. This information comes from VMware Tools and will be recorded in the NSX Manager database. Whenever we use a rule that references the VM linux-a2, NSX will look up this IP address for rule enforcement. These rules could contain a parent object, like the cluster compute-a, or a security group, a logical switch – anything that linux-a2 belongs to.

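As a related tip: if you ever want to verify which IP addresses actually made it down to an ESXi host for enforcement, the host-level vsipioctl utility can show the rules and resolved address sets for a VM’s dvfilter. A rough sketch from the ESXi shell – the filter name is an example and will differ in your environment:

# Find the dvfilter name attached to the VM's vNIC
summarize-dvfilter | grep -A9 linux-a2

# Dump the rules and the address sets resolved for that filter
vsipioctl getrules -f nic-38549-eth0-vmware-sfw.2
vsipioctl getaddrsets -f nic-38549-eth0-vmware-sfw.2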

Understanding NSX DFW Generation Numbers

A useful tool for troubleshooting DFW publication failures.

If you’ve ever been on a support call for DFW publication or rule troubleshooting, you may have heard reference to a ‘firewall generation number’ at one time or another. Whenever a change is made to the firewall rules, the NSX management plane (NSX Manager) will push these changes to all ESXi hosts, where the rules will be enforced. Because of the distributed nature of this firewalling system, it’s very important that all ESXi hosts have the latest version of the ruleset.

The NSX UI does a good job of reporting host publication failures, but it’s not always clear exactly which version of the rules a problematic host is enforcing.

This is where firewall generation numbers can come in handy. The ‘generation number’ represents the point in time at which a publish operation occurred. Although it looks like a random thirteen-digit number, it’s actually a Unix epoch timestamp (in milliseconds) that can be converted to an actual date and time. For example, an epoch timestamp of 1548677100000 equates to Monday, January 28th, 2019 at 12:05:00 UTC. There are several online tools available to help you convert these values.
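
If you have a Linux box handy, GNU date can do the conversion as well. Remember that the value is in milliseconds, so trim it down to seconds first:

$ date -u -d @$((1548677100000 / 1000))
Mon Jan 28 12:05:00 UTC 2019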

An Example

Let’s have a look at the current generation number reported on a pair of ESXi hosts. One host, esx-a2, has been reporting publication failures.

To determine the generation number, you could in theory take the last reported publication date from the UI and convert it into a Unix epoch number. In my experience, there isn’t enough accuracy there and you may not get an exact match. The better way is to look for “Sending rules to Cluster” log messages in the NSX Manager vsm.log file. This can be done via an SSH session or, more easily, using a filter in vRealize Log Insight.

[root@nsxmanager /home/secureall/secureall/logs]# cat vsm.log |grep "Sending rules to Cluster"
<snip>
2018-11-29 01:47:55.317 GMT+00:00 INFO TaskFrameworkExecutor-9 ConfigurationPublisher:110 - - [nsxv@6876 comp="nsx-manager" subcomp="manager"] Sending rules to Cluster domain-c41, Generation Number: null Object Generation Number 1543456074899.
2018-11-29 01:47:57.422 GMT+00:00 INFO TaskFrameworkExecutor-16 ConfigurationPublisher:110 - - [nsxv@6876 comp="nsx-manager" subcomp="manager"] Sending rules to Cluster domain-c41, Generation Number: 1543337228980 Object Generation Number 1543456074899.


Testing NSX VTEP Communication

An in-depth look at the VXLAN network stack and VTEP to VTEP communication testing.

Virtual Extensible LAN – or VXLAN – is the key overlay technology that makes a lot of what NSX does possible. It abstracts the underlying L2/L3 network and allows logical switches to span vast networks and datacenters. To achieve this, each ESXi hypervisor has one or more VTEP vmkernel ports bound to the host’s VXLAN network stack instance.

Your VTEPs are created during VXLAN preparation – normally right after preparing your hosts with the NSX VIBs. Doing this in the UI is a straightforward process, but there are some important prerequisites that must be fulfilled before VXLAN networking will work. The most important of these are:

  1. Your physical network must be configured for an end-to-end MTU of at least 1600 bytes. In theory 1550 would suffice, but VMware usually recommends a minimum of 1600.
  2. You must ensure L2 and L3 connectivity between all VTEPs.
  3. You need to prepare for IP address assignment by either configuring DHCP scopes or IP pools.
  4. If your replication mode is hybrid, you’ll need to ensure IGMP snooping is configured on each VLAN used by VTEPs.
  5. Using full Multicast mode? You’ll need IGMP snooping in addition to PIM multicast routing.

This can sometimes be easier said than done – especially if you have hosts in multiple locations with numerous hops to traverse.

Testing VXLAN VTEP communication is a key troubleshooting skill that every NSX engineer should have in their toolbox. Without healthy VTEP communication and a properly configured underlay network, all bets are off.

I know this is a pretty well covered topic, but I wanted to dive into this a little bit deeper and provide more background around why we test the way we do, and how to draw conclusions from the results.

The VXLAN Network Stack

Multiple network stacks were first introduced in vSphere 6.0 for use with vMotion and other services. There are several benefits to isolating services based on network stacks, but the most practical is a completely independent routing table. This means you can have a different default gateway for vMotion – or in this case VXLAN traffic – than you would for all other management services.

Each vmkernel port that is created on an ESXi host must belong to one and only one network stack. When your cluster is VXLAN prepared, the created kernel ports are automatically assigned to the correct ‘vxlan’ network stack.

The esxcfg-vmknic -l command lists all kernel ports, including their assigned network stack:

[root@esx-a1:~] esxcfg-vmknic -l
Interface  Port Group/DVPort/Opaque Network        IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS NetStack
vmk0       7                                       IPv4      172.16.1.21                             255.255.255.0   172.16.1.255    00:25:90:0b:1e:12 1500    65535   defaultTcpipStack
vmk1       13                                      IPv4      172.16.98.21                            255.255.255.0   172.16.98.255   00:50:56:65:59:a8 9000    65535   defaultTcpipStack
vmk2       22                                      IPv4      172.16.11.21                            255.255.255.0   172.16.11.255   00:50:56:63:d9:72 1500    65535   defaultTcpipStack
vmk4       vmservice-vmknic-pg                     IPv4      169.254.1.1                             255.255.255.0   169.254.1.255   00:50:56:61:7a:23 1500    65535   defaultTcpipStack
vmk3       52                                      IPv4      172.16.76.22                            255.255.255.0   172.16.76.255   00:50:56:6b:e4:94 1600    65535   vxlan

Notice that all kernel ports belong to the ‘defaultTcpipStack’ except for vmk3, which lists vxlan. You can view the netstacks currently enabled on your host using the esxcli network ip netstack list command:

[root@esx-a1:~] esxcli network ip netstack list
defaultTcpipStack
   Key: defaultTcpipStack
   Name: defaultTcpipStack
   State: 4660

vxlan
   Key: vxlan
   Name: vxlan
   State: 4660
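
With the kernel port and netstack confirmed, the actual test is a vmkping sourced from the vxlan stack. The -d flag sets ‘don’t fragment’ and -s sets the ICMP payload size; 1572 bytes of payload plus 28 bytes of IP/ICMP headers exercises the full 1600 byte frame. The destination below would be another host’s VTEP IP:

[root@esx-a1:~] vmkping ++netstack=vxlan -I vmk3 -d -s 1572 172.16.76.23

If a standard-sized ping succeeds but the 1572 byte ping fails, you almost certainly have an MTU problem somewhere in the underlay.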


NSX Troubleshooting Scenario 14 – Solution

Welcome to the fourteenth installment of my NSX troubleshooting series. Thanks to everyone who took the time to comment on the first half of the scenario. Today I’ll walk through the troubleshooting and show how I came to the solution.

Please see the first half for more detail on the problem symptoms and some scoping.

Getting Started

In the first half, our fictional customer was trying to prevent a specific summary route from being advertised to a DLR appliance using a BGP filter. Every time they added the filter, all connectivity to VMs downstream from that DLR was lost.

[Image: tshoot14a-4]

The filter appears correct. The summary route is a /21 network comprising all eight /24s that were assigned to logical switches. You can also see that GE and LE (the ‘greater than or equal’ and ‘less than or equal’ prefix-length bounds) were not specified, so the summary route should be matched exactly.

[Image: tshoot14a-5]

After publishing the changes, we saw that all BGP routes were removed from the DLR. It’s almost as if the filter stopped ALL route prefixes from making it to the DLR rather than just the one specified. Wait, did it?

Let’s refer to the NSX documentation on BGP filters – specifically the steps under the Configure BGP section.

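Without spoiling the full investigation: the symptoms are consistent with BGP filters behaving like an ACL, where configuring any filter introduces an implicit deny for every prefix that doesn’t match an explicit entry. If that’s the case here, the fix would be an explicit permit-any filter below the deny. A hedged sketch of what the working filter list might look like (the network value is a placeholder for the customer’s /21 summary):

Filter 1: Action = Deny     Network = <the /21 summary>   (no GE/LE – exact match only)
Filter 2: Action = Permit   Network = any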

NSX Troubleshooting Scenario 14

Welcome to the fourteenth installment of my NSX troubleshooting series. What I hope to do in these posts is share some of the common issues I run across from day to day. Each scenario will be a two-part post. The first will be an outline of the symptoms and problem statement along with bits of information from the environment. The second will be the solution, including the troubleshooting and investigation I did to get there.

The Scenario

As always, we’ll start with a brief problem statement:

“I’m trying to prevent some specific BGP routes from being advertised to my DLR, but the route filters aren’t working properly. Every time I try to do this, I get an outage to everything behind the DLR!”

Let’s have a quick look at what this fictional customer is trying to do with BGP.

[Image: tshoot14a-0]

The design is simple – a single ESG peered with a single DLR appliance. The /21 address space assigned to this environment has been split out into eight /24 networks.

[Image: tshoot14a-3]

The mercury-esg1 appliance has two neighbors configured – the physical router (172.18.0.1) and the southbound DLR protocol address (172.18.8.4). Both the ESG and DLR are in the same AS (iBGP).

[Image: tshoot14a-1]

As you can see, a summary static route has been created on mercury-esg1 with the DLR forwarding address as the next hop. This /21 summarizes all eight /24 subnets that will be assigned to logical switches in this environment. Because the customer wants the more specific /24 BGP routes advertised by the DLR to be preferred, this is what’s referred to as a floating static route. Being less specific, it only takes effect as a backup should BGP peering go down. This is a common design consideration and provides a bit of extra insurance should the DLR appliance fail unexpectedly.
