Re-deploying NSX Controllers During Upgrades

I’ve had this question come up a lot lately and there seems to be some confusion around whether or not NSX controllers need to be redeployed after upgrading them. The short answer to this question is really “it depends”. There are actually three different scenarios where you may want or need to delete and re-deploy NSX controllers as part of the upgrade process. Today, I’ll walk through these situations and the proper process to delete and re-deploy your controller nodes.

The Normal Upgrade Process

Upgrading the NSX Control Cluster is a very straight-forward process. After clicking the upgrade link, an automated process begins to upgrade the controller code, and reboot each cluster member sequentially.

controller-redeploy-3

Once the ‘Upgrade Available’ link is clicked, you’ll see each of the three controllers download the upgrade bundle, upgrade and then reboot before NSX moves on to the next one.

controller-redeploy-7

Once NSX goes through its paces, it’s usually a good idea to ensure that the control-cluster join status is ‘Join complete’ and that all three controllers agree on the Cluster UUID.

nsx-controller # show control-cluster status
Type Status Since
--------------------------------------------------------------------------------
Join status: Join complete 07/24 13:38:32
Majority status: Connected to cluster majority 07/24 13:38:19
Restart status: This controller can be safely restarted 07/24 13:38:48
Cluster ID: f2849ee2-7fb6-4aca-abf4-2ca176337956
Node UUID: f2849ee2-7fb6-4aca-abf4-2ca176337956

Role Configured status Active status
--------------------------------------------------------------------------------
api_provider enabled activated
persistence_server enabled activated
switch_manager enabled activated
logical_manager enabled activated
directory_server enabled activated

Because the underling structure of the VM itself doesn’t change, this sort of in-place code upgrade and reboot is sufficient and has minimal impact.

Scenario 1 – E1000 vNIC Replacement

The first scenario where you may want to redeploy the controllers involves a virtual hardware change that was introduced in NSX 6.1.5. NSX controllers deployed in 6.1.5 use the VMXNET3 vNIC adapter, whereas older versions had legacy Intel E1000 emulated vNICs. This change wasn’t well publicized and surprisingly it isn’t even found in the NSX 6.1.5 release notes.

I’ve seen quite a few customers go through upgrade cycles from 6.0 or 6.1 all the way to more recent 6.2.x or 6.3.x releases while retaining E1000 vNICs on their controllers. Although the E1000 vNIC adapter is generally pretty stable, there is at least one documented issue where the adapter driver suffers a hang and the controller is no longer able to transmit or receive. This problem is described in VMware KB 2150747.

That said, I personally would not wait for a problem to occur and would recommend checking to ensure your controllers are using VMXNET3, and if not, go through the redeployment procedure I’ll outline later in this post. Aside from preventing the E1000 hang problem, you’ll also benefit from the other improvements VMXNET3 has to offer like better offloading and lower CPU utilization.

Unfortunately, finding out if your controllers have E1000 or VMXNET3 adapters can be a tad tricky. You’ll find that your controllers are locked down and can’t be edited in the vSphere Web Client or the legacy vSphere Client.

controller-redeploy-1

As seen above, the ‘Edit Settings’ option is greyed out.

controller-redeploy-2

The summary page also doesn’t tell us much, so the easiest way to get the adapter type is to check from the ESXi command line.

First, let’s SSH into a host where one of the controllers live and then find the full path to the VMX file:

[root@esx0:~] cd /vmfs/volumes
[root@esx0:/vmfs/volumes] find ./ -name NSX_Controller*.vmx
./58f77a6f-30961726-ac7e-002655e1b06c/NSX_Controller_078fcf78-9a0c-491d-95a0-02e8b5175935/NSX_Controller_078fcf78-9a0c-491d-95a0-02e8b5175935.vmx

Next, I will look for the relevant vNIC adapter settings in the VMX file using the full path obtained in the previous command output:

[root@esx0:/vmfs/volumes] cat ./58f77a6f-30961726-ac7e-002655e1b06c/NSX_Controller_c97459f1-3845-436f-8e03-60ad3cbed9e4/NSX_Controller_c97459f1-3845-436f-8e03-60ad3cbed9e4.vmx |grep -i ethernet0.virtualDev
ethernet0.virtualDev = "vmxnet3"

The key setting in the VMX that we are interested in is ethernet0.virtualDev. As seen above, the type is vmxnet3 on my controllers as they were created from a freshly deployed 6.2.5 environment. If you see e1000 here, your controllers were deployed from a 6.1.4 or older setup and have never been re-deployed.

Scenario 2 – Updating the Disk Partitioning Layout

The second scenario would be if your controllers were initially deployed in a version of NSX prior to 6.2.3. Since 6.2.3 was pulled shortly after release, 6.2.4 would be more relevant starting point.

A statement you’ll find in the NSX 6.2.4 release notes summarizes this change well:

“…New installations of NSX 6.2.3 or later will deploy NSX Controller appliances with updated disk partitions to provide extra cluster resiliency. In previous releases, log overflow on the controller disk might impact controller stability. In addition to adding log management enhancements to prevent overflows, the NSX Controller appliance has separate disk partitions for data and logs to safeguard against these events. If you upgrade to NSX 6.2.3 or later, the NSX Controller appliances will retain their original disk layout.”

Again, it’s possible you may never run into a problem due to the old partitioning layout, but it’s always wise to take advantage of ‘optional’ resiliency enhancements like this. This is especially true for such a critical component of the NSX control-plane.

Although there isn’t a supported way to enter the root shell on a controller appliance, the ‘show status’ command will provide you with the partitioning layout. Here is the layout on a newer 6.2.5 controller with the newer partitioning:

nsx-controller # show status
Version: 4.0.6 (Build 48886)

Current Time: Fri, 25 Aug 2017 15:01:17 +0000
Uptime: 32 days 1 hour 23 minutes 16 seconds

Load Average: 0.10, 0.10, 0.13
Memory Usage: 3926484 kB (Total), 267752 kB (Free)
Disk Usage:
Filesystem                      1K-blocks    Used Available Use% Mounted on
/dev/sda1                         6596784 1181520   5057120  19% /
udev                              1953420       4   1953416   1% /dev
/dev/mapper/nvp-var               6593712 2349576   3886148  38% /var
/dev/mapper/nvp-var+cloudnet+data 3776568  147908   3417104   5% /var/cloudnet/data

Essentially, there are now three separate partitions for data instead of just one. Files for everything were just lumped together along with the Linux OS in a single partition previously. If some runaway log files filled the partition, key services would be impacted. By separating everything out, the key controller services like the zookeeper clustering service will still be able to write to disk.

I don’t have access to a pre-6.2.3 setup at the moment, but you can tell if your controller still uses the old partitioning layout by the absence of two partitions in the ‘show status output’. Both /dev/mapper/nvp-var and /dev/mapper/nvp-var+cloudnet+data only exist on controllers using the new partitioning layout.

Because disk partitioning is a pretty low-level, there was really no way to incorporate this into the automated upgrade process. To get the new layout, you’ll need to delete and re-deploy the controller appliances.

Scenario 3 – Upgrading to NSX 6.3.3

NSX 6.3.3 introduces a major change to the NSX controllers, replacing the underlying Linux OS with VMware’s new distribution dubbed Photon OS. The virtual hardware also changes slightly in 6.3.3 as the Photon OS based controllers require larger VMDK disks. Because this changes the entire foundation of the VM and is mandatory – unlike the vNIC and partitioning changes mentioned earlier – there is no way to perform in-place code upgrades. Each of the controllers needs to be deleted and re-deployed.

Thankfully, because of the mandatory nature of this change, VMware modified the upgrade process in 6.3.3 to automatically delete and re-deploy controllers for you.

From the NSX 6.3.3 release notes:

“In NSX 6.3.3, the underlying operating system of the NSX Controller changes. This means that when you upgrade to NSX 6.3.3, instead of an in-place software upgrade, the existing controllers are deleted one at a time, and new Photon OS based controllers are deployed using the same IP addresses.”

That said, no manual intervention is required when upgrading to 6.3.3. Controllers will be deleted and re-deployed automatically as part of the upgrade process. For more information, see the NSX public docs on the subject.

Some Warnings and Cautions

Before I go through the process of destroying and re-creating controller nodes, I really want to preface by saying that this process is potentially risky and should only be done during a maintenance window. It’s also very important that the process be done correctly to ensure you don’t run into any major problems. Below are some common pitfalls and other recommendations:

  1. Never just delete or remove the controller appliances from the vCenter inventory. NSX keeps track of the controllers in its database and doesn’t react well to having appliances yanked from under it. They must be deleted properly.
  2. Never deploy more than three controllers thinking you can just do a ‘cut over’. I.e. Don’t deploy six controllers and then delete the three old ones. A one to one replacement must be done and we never want fewer than two functional controllers in the cluster, and never more than three.
  3. If a controller fails to delete using the normal supported method, there is a reason. Don’t force the deletion without speaking to VMware technical support first. A common reason I’ve seen for this is a mismatched moref identifier for the appliance VM. If the NSX database thinks a controller is vm-73, but the actual VM is vm-75, the delete will fail. Removing controllers from the vCenter Inventory and re-adding them will cause this type of mismatch.
  4. It’s very important to validate that the control cluster health is good before proceeding to the next controller for deletion/re-deployment. Do not skip these checks and be patient with this process. Unless you have two fully functional controllers up and running in the cluster, you won’t have full control-plane functionality and a risk data-plane outage.
  5. If something goes wrong, you’ll still be okay if you have two controllers working in the cluster. Don’t just proceed in the interest of ‘moving forward’ because there is a good chance the other two will behave similarly. Contact VMware support if there is every any doubt.

A Quick Note on Force Deletion

While trying to delete a controller, you’ll be greeted by a ‘Forcefully Delete’ option. When selected, this option nukes everything related to the controller from the NSX database and NSX doesn’t care whether the VM appliance is successfully removed or not. This option should never used unless advised by VMware support for repairing specific cluster problems. As mentioned in the previous section, if a regular delete fails, there is always a specific reason. Using ‘Forcefully Delete’ to work around these problems can leave remnants behind and potentially cause problems with the cluster.

The warning presented by the NSX UI when you try to Force Delete a cluster node:

“Forcefully deleting a controller may result into NSX Controller cluster going down and the rest of the controllers may get disconnected, thereby resulting in problems like no majority and data inconsistency. Many operations like adding logical components will not be possible. If you still choose to delete the controller, it is recommended to also delete the rest of the controllers and recreate them.”

It’s also worth mentioning that the only time you’d need to forcefully delete a controller in a normal workflow is when deleting the last of three controllers. NSX will only delete the very last controller using the force option. Because we’re only removing one at a time, this should not apply here.

Controller Re-deployment Process

Again, you won’t need to use this process if you are upgrading to NSX 6.3.3 or later because the deletion and re-creation of appliances is handled in an automated manner. If you’d like to take advantage of a VMXNET3 adapter and/or the new partitioning layout in newer versions of NSX, please read on.

The overall goal here is to replace the NSX control cluster members one at a time, keeping in mind that as long as two controller nodes are online and healthy, the control-plane continues to function. In theory, you shouldn’t suffer any kind of control-plane or data-plane outage using this process.

**Edit 11/15/2017: As you may be aware, there have been a few new bugs discovered, including one that impacts the deployment/re-deployment of NSX controllers in 6.3.3 and 6.3.4. Please be sure to have a look at my post on the subject as well as the VMware KB before proceeding. If you are running 6.3.3, do not delete your controllers until you’ve implemented the workaround or patched. If you still have the old 6.3.3 upgrade bundle, you may not be able to upgrade.

Step 1 – Data collection and preparation

Before proceeding, we’ll need to collect some information about our current controller deployment. In order to deploy a controller, the following information is required:

  1. The vSphere Cluster that your controllers will live in.
  2. The datastore you want to use for your controllers.
  3. The network portgroup (standard or distributed) that your controllers are in.
  4. If you used a specific naming convention for your controllers, be sure to note it down.
  5. And finally, the IP address pool that’s used for the controllers. Note that when deleting controllers using this method, an IP will be freed up from the pool so even with just three IPs in a pool, this process should work.

Be sure to get the above information from the vSphere Web Client before proceeding so that you don’t have to go looking for it during the process.

Step 2 – Validate the control-cluster health

Before you begin the process, it’s very important to ensure you have a functional control cluster with all nodes connected to the cluster majority. As tempting as it may be, do not skip this check.

controller-redeploy-4

Checking in the UI is a good first place to look for obvious signs of trouble, but I would not rely on this method alone. If everything is green in the UI, log into each of the three controllers via SSH and run the show control-cluster status command:

nsx-controller # show control-cluster status
Type Status Since
--------------------------------------------------------------------------------
Join status: Join complete 08/25 15:26:19
Majority status: Connected to cluster majority 08/25 15:30:45
Restart status: This controller can be safely restarted 08/25 15:31:09
Cluster ID: f2849ee2-7fb6-4aca-abf4-2ca176337956
Node UUID: 309611b3-2441-4a1a-a12b-a5537e999c23

Role Configured status Active status
--------------------------------------------------------------------------------
api_provider enabled activated
persistence_server enabled activated
switch_manager enabled activated
logical_manager enabled activated
directory_server enabled activated

There are several key things you’ll want to validate before proceeding.

  1. The Join status must read ‘Join complete’. No other status is acceptable.
  2. The Majority status must read ‘Connected to cluster majority
  3. The Restart status must read ‘This controller can be safely restarted’.
  4. Each controller node must have the same ‘Cluster ID’.

If all three controllers look good, you can proceed.

Step 3 – Delete the first controller

Once we’ve confirmed the control cluster health is good, we can delete the first controller from the NSX UI. It doesn’t matter which one you do first, but in my example, I’ll start with controller-3 and work my way backwards.

To delete, simply select the ‘Management’ tab of the Installation section in the NSX UI and click the little red ‘X’ icon above.

controller-redeploy-9

As mentioned earlier, we want to use the normal ‘Delete’ option. Do NOT use ‘Forcefully Delete’.

controller-redeploy-10

NSX will execute several tasks related to the controller VM. First, it will power off the VM appliance, it will then delete it and remove all references of the controller in the database. It’s not unusual for this process to take 10 minutes or longer.

Once the controller has disappeared from the NSX ‘Management’ tab, it’s very important to check that the appliance itself was actually deleted from the vCenter inventory.

controller-redeploy-11

Check for both the successful power off and deletion tasks in the recent tasks pane and also confirm the VM is no longer present in the inventory.

Finally, we’ll want to check the cluster health from the other two surviving nodes using the same show control-cluster status command we used earlier. Ensure that both controllers look healthy.

I’d also recommend ensuring that the cluster is now only comprised of two nodes from the NSX controller node’s perspective. Just because NSX manager says there are two doesn’t necessarily guarantee the other controllers do. You can check this using the show control-cluster startup-nodes command:

nsx-controller # show control-cluster startup-nodes
172.16.10.43, 172.16.10.44

As seen above, my control cluster confirms only two members.

Step 4 – Replace the Deleted Controller.

Once the first controller has been deleted successfully and we’ve confirmed the health of the control cluster, we can go ahead and deploy a new one.

controller-redeploy-12

The process should be very straight forward and is the same as what was done during the initial deployment of NSX. Keep in mind that the name you specify is simply a label and that the moref identifier of the new controller will change.

controller-redeploy-13

NSX will report the new controller in the ‘Deploying’ status for some time, but you can monitor the tasks and events from the vSphere Web Client:

controller-redeploy-14

You can also watch the console of the new controller to confirm that it’s finished joining the cluster and ready for logins. It will usually be sitting a ‘Fetching initial configuration data’ for some time before it’s ready:

controller-redeploy-15

Once it’s powered up and ready, you can log-in via CLI and ensure that the ‘show control-cluster status’ output looks healthy as described earlier and that there are three startup-nodes again:

nsx-controller # show control-cluster status
Type Status Since
--------------------------------------------------------------------------------
Join status: Join complete 08/25 17:47:16
Majority status: Connected to cluster majority 08/25 17:47:13
Restart status: This controller can be safely restarted 08/25 17:47:14
Cluster ID: f2849ee2-7fb6-4aca-abf4-2ca176337956
Node UUID: f9a2d207-bf57-4f23-b075-1eefc58bfc8d

Role Configured status Active status
--------------------------------------------------------------------------------
api_provider enabled activated
persistence_server enabled activated
switch_manager enabled activated
logical_manager enabled activated
directory_server enabled activated

nsx-controller # show control-cluster startup-nodes
172.16.10.43, 172.16.10.44, 172.16.10.45

As seen above, my new controller is online and healthy. Most importantly it agrees with the other two controllers on the ID of the cluster and number of startup nodes.

You could also do a ‘show status’ on the controller to confirm that it has the new partitioning layout at this time as discussed earlier.

Step 5 – Rinse and Repeat.

It’s extremely important to verify the cluster health before proceeding with the deletion of the next cluster node. Aside from the checks in the previous section, this would also be a good time to do some basic connectivity tests. Make sure your distributed routers are functional and that your guests connected to logical switches are working normally.

If you delete the next controller while the cluster is in a bad state, there is a good chance you’ll be down to a single node and will be operating in a ‘read-only’ state. In this condition, any VTEP, ARP or MAC table changes in the environment – like those triggered by vMotions, etc – would fail to propagate. This is definitely not a situation you’d want to be in.

Once you are sure it’s safe to proceed, simply repeat steps 3 and 4 above for the remaining two controllers.

Conclusion

So there you have it. The process can be a bit of a nail-biting experience in a production environment, but if you take the appropriate precautions everything should work without a hitch. The reward for your patience will be a more resilient control cluster with virtual hardware configured as VMware intended.

Thanks for reading! If you have any questions, please feel free to post below.

Building a Retro Gaming Rig – Part 1

Welcome to a new hardware build series where I’ll be sharing my experiences building a retro Intel gaming system. In part 1, I’ll be going over some of the hardware I picked out for this build and doing a bit of a photo shoot. I apologize in advance for the copious amounts of trivial history and nostalgic rambling in this post, so please feel free to skip the first few sections if you’d like to get straight to the gear. For those who appreciate the trip down memory lane, please read on! 🙂

Introduction

Twenty seventeen has been a real year of nostalgia for me when it comes to PC hardware. It all started earlier this summer when my Brother in-law was cleaning out his basement and gave me a couple of classic PC systems. One was a 1993 era DEC 486 and the other, a 1995 era NEC Pentium 90 system. These systems brought back so many good memories and were a real joy to use and restore. Although the first PC we had in my house growing up was a much older clone from the late 80s – possibly an 8088 or 286, I was really too young to appreciate it and preferred my NES/SNES at the time. I remember teaching myself how to use DOS with it but it wasn’t until 1994 that I could convince my parents that I needed a new computer – you know, for gaming educational purposes.  After doing some research and shopping around, I got a shiny new 486 DX2/66. It may not have been state of the art at the time, but with 8MB of RAM, a CDROM, 430MB Hard drive and a real Soundblaster 16, it ran like a dream. I spent hours upon hours on that machine and it wasn’t long before I had added a 14.4 modem and really got into the BBS scene as well. About a year later, my parents surprised me with a top of the line NEC Pentium 100 system with 16MB of RAM and the 486 replaced the old monochrome 8088 in the basement. Coincidentally it was almost identical to the Pentium 90 given to me by my brother in-law.

The Goal

Although these two retro rigs were a lot of fun to restore and use, I wanted to build a machine that was a bit more flexible, could be customized and used for not only demanding DOS/Windows 9x games but also most of the classics as well. The main issue with these two older machines is that they are both proprietary AT based systems with many BIOS quirks, compatibility issues and other limitations. Even the Pentium 90 with 24MB of RAM just wasn’t fast enough for newer DOS games like Quake. In short, I wanted a custom retro build that was fast, reliable and compatible – something with some nostalgic value and some pizzazz but also well suited for daily use.

Continue reading “Building a Retro Gaming Rig – Part 1”

Finding moref IDs for NSX API Calls

NSX was designed from the ground up to be very configurable via restful API calls. You can create, modify and remove objects and configuration using APIs. This also makes NSX very powerful with automation and other cloud management platforms and tools.

One of the most common questions I get from those getting started with API calls is on the unique identifiers – often referred to as ‘morefs’ or ‘managed object reference’ identifiers – used in these calls. The vCenter Server ‘Managed Object Browser’ is often used as the primary method to obtain these moref identifiers, but it can be a bit daunting to navigate for those unfamiliar with it. It also requires full administrative privileges to vCenter so may not always be an option for all users either. Another way to find them is to use PowerNSX – but again, you may not be comfortable with PowerCLI yet, or maybe it’s not deployed in the environment.

Not to worry – there are some surprisingly simple ways to obtain moref identifiers for all kinds of vCenter and NSX objects. In this post, I’ll show some of these tricks that I’ve picked up over time.

NSX Edges and DLRs

NSX edges and DLRs are the easiest objects to obtain morefs for. VMware very thoughtfully included a column in the vSphere Web Client NSX Edges view called ‘Id’.

nsx-moref1

Above, you can see that edge ‘esg-a1’ has a unique identifier called ‘edge-2’. This ID is indeed the moref.

Another way to get this information would be to use the NSX Manager Central CLI feature. From the manager CLI, the following command will get you a similar output:

nsxmanager.lab.local> show edge all
NOTE: CLI commands for Edge ServiceGateway(ESG) start with 'show edge'
 CLI commands for Distributed Logical Router(DLR) Control VM start with 'show edge'
 CLI commands for Distributed Logical Router(DLR) start with 'show logical-router'
 Edges with version >= 6.2 support Central CLI and are listed here
Legend:
Edge Size: Compact - C, Large - L, X-Large - X, Quad-Large - Q
Edge ID Name Size Version Status
edge-1 dlr-a1 C 6.2.5 GREEN
edge-2 esg-a1 C 6.2.5 GREEN
edge-3 esg-a2 C 6.2.5 GREEN

Virtual Machines and Appliances

In the previous section, we looked at the NSX Edge moref identifiers, but these appliances also exist as unique virtual machines from a vCenter Server perspective. Every virtual machine in the vCenter inventory – including controllers, DLRs, ESGs etc – can also be referred to as virtual machines with a vm-X moref identifier. For example, my DLR called dlr-a1 is edge-1 from an NSX perspective but actually exists as two DLR appliances in high availability mode. Having virtual machine moref identifiers allows us to uniquely identify each appliance.

To find out a virtual machine’s moref identifier, an easy method to use is to look at the URL in the vSphere Web Client. For example, I’ve gone to the ‘Hosts and Clusters’ view in my lab and selected the first of two dlr-a1 appliances:

nsx-moref2

At the end of the URL string in the address bar, we can see that the moref identifier is included. It’s somewhat stuffed in there in and not always noticeable, but knowing that the virtual machine moref always begins with ‘vm-‘ followed by a numerical value, we are able to pick it out. In the above example, the DLR appliance known as dlr-a1-0 has a virtual machine moref value of vm-675.

Looking at the second of the two appliances in the same way – the standby in the HA pair – I get a different virtual machine moref value of vm-677.

NSX Controllers

From an NSX perspective, NSX controllers also have a moref identifier prefixed by ‘controller-‘. Again, VMware thoughtfully included this information underneath the controller IP address in the ‘Controller Node’ column. This can be found in the Installation section of the NSX UI in the vSphere Web Client under the ‘Management’ tab.

nsx-moref3

One thing to be careful about is that the ‘Name’ column does not necessarily provide the moref identifier. VMware recently allowed the ability to name controllers with a friendly name in NSX 6.2. Just like ESGs and DLRs, NSX controllers also have virtual machine moref identifiers that are sometimes needed.

vSphere Clusters

As you have probably noticed, vSphere clusters are often the configuration delimiter for many things in NSX, including host preparation, firewall status etc. As such, many API calls will reference cluster objects. The cluster moref is not prefixed by ‘cluster-‘ as you might expect, but rather ‘domain-c‘. To determine the cluster moref, we can use the same trick that we used for finding the virtual machine IDs. As you’ll discover the URL address in the vSphere Web Client can tell us the moref for numerous objects.

nsx-moref4

In my lab above, cluster compute-a is domain-c121.

You can also obtain a list of NSX prepared clusters and their moref IDs using the NSX Central CLI. From an NSX manager CLI prompt:

nsxmanager.lab.local> show cluster all
No. Cluster Name Cluster Id Datacenter Name Firewall Status
1 compute-r domain-c641 lab Enabled
2 compute-vcp domain-c705 lab Not Ready
3 compute-a domain-c121 lab Enabled
4 management domain-c205 lab Not Ready

ESXi Hosts

There are a couple of different ways that you can obtain the moref for an ESXi host. As you’d expect, the moref is always prefixed by ‘host-‘. Just like for clusters and virtual machines, you can select the object in the vSphere Web Client and get the host moref from the address bar. Alternatively, there is another easy way to get the host moref from any NSX prepared host from the CLI.

When ESXi hosts are prepared, NSX pushes numerous RabbitMQ configuration variables that instruct it how to to communicate with NSX manager. One of these RabbitMQ parameters called /UserVars/RmqHostId actually includes the host moref.

[root@esx-a1:~] esxcfg-advcfg -g /UserVars/RmqHostId
Value of RmqHostId is host-223

As you can see, host esx-a1 has a moref of host-223.

Another option is to use the NSX Central CLI for viewing the contents of a cluster. Once you have the cluster ID – domain-c121 in my example – the following command can be used to view the hosts in the cluster and get the moref identifiers:

nsxmanager.lab.local> show cluster domain-c121
Datacenter: lab
Cluster: compute-a
No. Host Name Host Id Installation Status
1 esx-a1.lab.local host-223 Enabled
2 esx-a2.lab.local host-225 Enabled

Transport Zones

The moref IDs for transport zones aren’t required often, but if you ever need to find one, the NSX Central CLI can get you this information.

The below output will list out all logical switches, but will also correlate the transport zone name with a moref prefixed by ‘vdnscope-‘. Logical switch UUID values can also be identified using this command:

nsxmanager.lab.local> show logical-switch list all
NAME UUID VNI Trans Zone Name Trans Zone ID
Transit VXLAN1 210b616b-691a-470f-ad1e-1cc24b485d0d 5000 Primary TZ vdnscope-1
Blue Network 7a068fb5-9b17-4779-9b3d-0d81a439189f 5001 Primary TZ vdnscope-1
Green Network 22f01e70-18ae-4d3e-a683-be08d244e919 5002 Primary TZ vdnscope-1
Yellow Network 8510eedb-e6a8-41a0-bae0-79f3af8630be 5003 Primary TZ vdnscope-1
Red Network 4996f8f1-68c9-43de-9207-fbe038543133 5004 Primary TZ vdnscope-1
Purple Network bdd16f8e-805f-45df-a5c7-00a0ce319cc9 5005 Primary TZ vdnscope-1
Test Network A 72597655-ad88-4e32-9baf-c724d49c9a7c 5006 Primary TZ vdnscope-1

As you can see above, I have only one transport zone called ‘Primary TZ’ with a moref of vdnscope-1.

Datastores

Some API calls expect input including the moref for a specific datastore. An example would be trying to deploy a new ESG or controller – NSX needs to know where to store the VM. Once again, going to the datastores view in the vSphere Web Client allows us to select the specific datastore and the ‘datastore-‘ prefixed moref is visible in the address bar.

nsx-moref5

Above you can see my datastore called shared-ssd0 equates to moref datastore-621.

IP Pools

Some API calls involving the deployment of objects require the moref identifier of an IP address pool. Unfortunately, this moref can’t be found in the GUI, but with a couple of quick API calls can be obtained pretty easily.

First, we’ll use an API call to query all IP pools on the NSX manager. This will provide an output that will include the moref identifier of the pool in question:

GET https://NSX-Manager-IP-Address/api/2.0/services/ipam/pools/scope/scopeID

As you can see above, the ‘scope ID’ is also required to run this GET call. In every instance I’ve seen, using globalroot-0 as the scopeID works.

ippoolAPI-4

The various IP pools will be separated by <ipamAddressPool> XML tags. You’ll want to identify the correct pool based on the IP range listed or by the text in the <name> field. In my example, I want to find the moref for the Controller Pool, which is found in the section below:

<ipamAddressPool>
<objectId>ipaddresspool-1</objectId>
<objectTypeName>IpAddressPool</objectTypeName>
<vsmUuid>4226CDEE-1DDA-9FF9-9E2A-8FDD64FACD35</vsmUuid>
<nodeId>fa4ecdff-db23-4799-af56-ae26362be8c7</nodeId>
<revision>1</revision>
<type>
<typeName>IpAddressPool</typeName>
</type>
<name>Controller Pool</name>
<scope>
<id>globalroot-0</id>
<objectTypeName>GlobalRoot</objectTypeName>
<name>Global</name>
</scope>
...
<snip>

As you can see above, the IP pool is identified by the moref identifier ipaddresspool-1.

Conclusion

And there you have it. Much easier than navigating through the vCenter Managed Object Browser and doesn’t require full administrative privileges in vCenter. For the address bar trick, you’ll need to have a minimum of read-only privileges to the object you are trying to select, as well as read-only access to the NSX UI in order to view the Edge list and installation section. The NSX Central CLI commands do require that you log in to the NSX manager via CLI using an administrator account, but most individuals managing NSX would have this access.

If there are any other morefs or identifiers you are having difficulty locating, please leave a comment and I’d be happy to post some methods.

 

ECMP Path Determination in NSX

ECMP or ‘equal cost multi-pathing’ is a great routing feature that was introduced in NSX 6.1 several years ago. By utilizing multiple egress paths, ECMP allows for better use of network bandwidth for northbound destinations in an NSX environment. As many as eight separate ESG appliances can be used with ECMP – ideally on dedicated ESX hosts in an ‘edge cluster’.

Lab Setup

In my lab, I’ve configured a very simple topology with two ESXi hosts, each with an ESG appliance used for north/south traffic. The two ESGs are configured for ECMP operation:

ecmp-1

The diagram above is very high-level and doesn’t depict the underlying physical infrastructure or ESXi hosts, but should be enough for our purposes. BGP is used exclusively as the dynamic routing protocol in this environment.

Looking at the diagram, we can see that any VMs on the 172.17.1.0/24 network should have a DLR LIF as their default gateway. Because ECMP is being used, the DLR instance should in theory have an equal cost route to all northbound destinations via 172.17.0.10 and 172.17.0.11.

Let’s have a look at the DLR routing table:

dlr-a1.lab.local-0> sh ip route

Codes: O - OSPF derived, i - IS-IS derived, B - BGP derived,
C - connected, S - static, L1 - IS-IS level-1, L2 - IS-IS level-2,
IA - OSPF inter area, E1 - OSPF external type 1, E2 - OSPF external type 2,
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2

Total number of routes: 17

B 10.40.0.0/25 [200/0] via 172.17.0.10
B 10.40.0.0/25 [200/0] via 172.17.0.11
C 169.254.1.0/30 [0/0] via 169.254.1.1
B 172.16.10.0/24 [200/0] via 172.17.0.10
B 172.16.10.0/24 [200/0] via 172.17.0.11
B 172.16.11.0/24 [200/0] via 172.17.0.10
B 172.16.11.0/24 [200/0] via 172.17.0.11
B 172.16.12.0/24 [200/0] via 172.17.0.10
B 172.16.12.0/24 [200/0] via 172.17.0.11
B 172.16.13.0/24 [200/0] via 172.17.0.10
B 172.16.13.0/24 [200/0] via 172.17.0.11
B 172.16.14.0/24 [200/0] via 172.17.0.10
B 172.16.14.0/24 [200/0] via 172.17.0.11
B 172.16.76.0/24 [200/0] via 172.17.0.10
B 172.16.76.0/24 [200/0] via 172.17.0.11
C 172.17.0.0/26 [0/0] via 172.17.0.2
C 172.17.1.0/24 [0/0] via 172.17.1.1
C 172.17.2.0/24 [0/0] via 172.17.2.1
C 172.17.3.0/24 [0/0] via 172.17.3.1
C 172.17.4.0/24 [0/0] via 172.17.4.1
C 172.17.5.0/24 [0/0] via 172.17.5.1
B 172.19.7.0/24 [200/0] via 172.17.0.10
B 172.19.7.0/24 [200/0] via 172.17.0.11
B 172.19.8.0/24 [200/0] via 172.17.0.10
B 172.19.8.0/24 [200/0] via 172.17.0.11
B 172.31.255.0/26 [200/0] via 172.17.0.10
B 172.31.255.0/26 [200/0] via 172.17.0.11

As seen above, all of the BGP learned routes from northbound locations – 172.19.7.0/24 included – have both ESGs listed with a cost of 200. In theory, the DLR could use either of these two paths for northbound routing. But which path will actually be used?

Path Determination and Hashing

In order for ECMP to load balance across multiple L3 paths effectively, some type of a load balancing algorithm is required. Many physical L3 switches and routers use configurable load balancing algorithms including more complex ones based on a 5-tuple hash taking even source/destination TCP ports into consideration. The more potential criteria for for analysis by the algorithm, the more likely traffic will be well balanced.

NSX’s implementation of ECMP does not include a configurable algorithm, but rather keeps things simple by using a hash based on the source and destination IP address. This is very similar to the hashing used by static etherchannel bonds – IP hash as it’s called in vSphere – and is generally not very resource intensive to calculate. With a large number of source/destination IP address combinations, a good balance of traffic across all paths should be attainable.

A few years ago, I wrote an article for the VMware Support Insider blog on manually calculating the hash value and determining the uplink when using IP hash load balancing. The general concept used by NSX for ECMP calculations is pretty much the same. Rather than calculating an index value associated with an uplink in a NIC team, we calculate an index value associated with an entry in the routing table.

Obviously, the most simple method of determining the path used would be a traceroute from the source machine. Let’s do this on the win-a1.lab.local virtual machine:

C:\Users\Administrator>tracert -d 172.19.7.100

Tracing route to 172.19.7.100 over a maximum of 30 hops

 1 <1 ms <1 ms <1 ms 172.17.1.1
 2 4 ms <1 ms <1 ms 172.17.0.10
 3 1 ms 1 ms 1 ms 172.31.255.3
 4 2 ms 1 ms <1 ms 10.40.0.7
 5 4 ms 1 ms 1 ms 172.19.7.100

Trace complete.

The first hop, 172.17.1.1 is the DLR LIF address – the default gateway of the VM. The second hop, we can see is 172.17.0.10, which is esg-a1. Clearly the hashing algorithm picked the first of two possible paths in this case. If I try a different northbound address, in this case 172.19.7.1, which is a northbound router interface, we see a different result:

C:\Users\Administrator>tracert -d 172.19.7.1

Tracing route to 172.19.7.1 over a maximum of 30 hops

 1 <1 ms <1 ms <1 ms 172.17.1.1
 2 <1 ms <1 ms <1 ms 172.17.0.11
 3 1 ms 1 ms 1 ms 172.31.255.3
 4 2 ms 1 ms 1 ms 172.19.7.1

Trace complete.

This destination uses esg-a2. If you repeat the traceroute, you’ll notice that as long as the source and destination IP address remains the same, the L3 path up to the ESGs also remains the same.

Traceroute is well and good, but what if you don’t have access to SSH/RDP into the guest? Or if you wanted to test several hypothetical IP combinations?

Using net-vdr to Determine Path

Thankfully, NSX includes a special net-vdr option to calculate the expected path. Let’s run it through its paces.

Keep in mind that because the ESXi hosts are actually doing the datapath routing, it’s there that we’ll need to do this – not on the DLR control VM appliance. Since my source VM win-a1 is on host esx-a1, I’ll SSH there. It should also be noted that it really doesn’t matter which ESXi host you use to check the path. Because the DLR instance is the same across all configured ESXi hosts, the path selection is also the same.

First, we determine the name of the DLR instance using the net-vdr -I -l command:

[root@esx-a1:~] net-vdr -I -l

VDR Instance Information :
---------------------------

Vdr Name: default+edge-1
Vdr Id: 0x00001388
Number of Lifs: 6
Number of Routes: 16
State: Enabled
Controller IP: 172.16.10.43
Control Plane IP: 172.16.10.21
Control Plane Active: Yes
Num unique nexthops: 2
Generation Number: 0
Edge Active: No

In my case, I’ve got only one instance called default+edge-1. The ‘Vdr Name’ will include the tenant name, followed by a ‘+’ and then the edge-ID which is visible in the NSX UI.

Next, let’s take a look at the DLR routing table from the ESXi host’s perspective. Earlier we looked at this from the DLR control VM, but ultimately this data needs to make it to the ESXi host for routing to function. These BGP learned routes originated on the control VM, were sent to the NSX control cluster and then synchronized with ESXi via the netcpa agent.

[root@esx-a1:~] net-vdr -R -l default+edge-1

VDR default+edge-1 Route Table
Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
Legend: [H: Host], [F: Soft Flush] [!: Reject] [E: ECMP]

Destination GenMask Gateway Flags Ref Origin UpTime Interface
----------- ------- ------- ----- --- ------ ------ ---------
10.40.0.0 255.255.255.128 172.17.0.10 UGE 1 AUTO 1189684 138800000002
10.40.0.0 255.255.255.128 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.16.10.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.16.10.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.16.11.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.16.11.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.16.12.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.16.12.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.16.13.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.16.13.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.16.14.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.16.14.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.16.76.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.16.76.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.17.0.0 255.255.255.192 0.0.0.0 UCI 1 MANUAL 1189684 138800000002
172.17.1.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 1189684 13880000000a
172.17.2.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 1189684 13880000000b
172.17.3.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 1189684 13880000000c
172.17.4.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 1189684 13880000000d
172.17.5.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 1189684 13880000000e
172.19.7.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.19.7.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.19.8.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.19.8.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.31.255.0 255.255.255.192 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.31.255.0 255.255.255.192 172.17.0.11 UGE 1 AUTO 1189684 138800000002

The routing table looks similar to the control VM, with a few exceptions. From this view, we don’t know where the routes originated – only if they are connected interfaces or a gateway. Just as we saw on the control VM, the ESXi host also knows that each gateway route has two equal cost paths.

Now let’s look at a particular net-vdr option called ‘resolve’:

[root@esx-a1:~] net-vdr –help
<snip>
--route -o resolve -i destIp [-M destMask] [-e srcIp] vdrName Resolve a route in a vdr instance
<snip>

Plugging in the same combination of source/destination IP I used in the first traceroute, I see that the net-vdr module agrees:

[root@esx-a1:~] net-vdr --route -o resolve -i 172.19.7.100 -e 172.17.1.100 default+edge-1

VDR default+edge-1 Route Table
Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
Legend: [H: Host], [F: Soft Flush] [!: Reject] [E: ECMP]

Destination GenMask Gateway Flags Ref Origin UpTime Interface
----------- ------- ------- ----- --- ------ ------ ---------
172.19.7.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1190003 138800000002

As seen above, the output of the command identifies the specific route in the routing table that will be used for that source/destination IP address pair. As we confirmed in the traceroute earlier, esg-a1 (172.17.0.10) is used.

We can repeat the same command for the other destination IP address we used earlier to see if it selects esg-a2 (172.17.0.11):

[root@esx-a1:~] net-vdr --route -o resolve -i 172.19.7.100 -e 172.17.1.1 default+edge-1

VDR default+edge-1 Route Table
Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
Legend: [H: Host], [F: Soft Flush] [!: Reject] [E: ECMP]

Destination GenMask Gateway Flags Ref Origin UpTime Interface
----------- ------- ------- ----- --- ------ ------ ---------
172.19.7.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1190011 138800000002

And indeed it does.

What About Ingress Traffic?

NSX’s implementation of ECMP is applicable to egress traffic only. The path selection done northbound of the ESGs would be at the mercy of the physical router or L3 switch performing the calculation. That said, you’d definitely want ingress traffic to also be balanced for efficient utilization of ESGs in both directions.

Without an appropriate equal cost path configuration on the physical networking gear, you may find that all return traffic or ingress traffic uses only one of the available L3 paths.

In my case, I’m using a VyOS routing appliance just northbound of the edges called router-core.

vyos@router-core1:~$ sh ip route
Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
 I - ISIS, B - BGP, > - selected route, * - FIB route

C>* 10.40.0.0/25 is directly connected, eth0.40
C>* 127.0.0.0/8 is directly connected, lo
B>* 172.16.10.0/24 [20/1] via 172.31.255.1, eth0.20, 02w0d01h
B>* 172.16.11.0/24 [20/1] via 172.31.255.1, eth0.20, 02w0d01h
B>* 172.16.12.0/24 [20/1] via 172.31.255.1, eth0.20, 02w0d01h
B>* 172.16.13.0/24 [20/1] via 172.31.255.1, eth0.20, 02w0d01h
B>* 172.16.14.0/24 [20/1] via 172.31.255.1, eth0.20, 02w0d01h
B>* 172.16.76.0/24 [20/1] via 172.31.255.1, eth0.20, 02w0d01h
B>* 172.17.0.0/26 [20/0] via 172.31.255.10, eth0.20, 02w0d00h
 * via 172.31.255.11, eth0.20, 02w0d00h
B>* 172.17.1.0/24 [20/0] via 172.31.255.10, eth0.20, 02w0d00h
 * via 172.31.255.11, eth0.20, 02w0d00h
B>* 172.17.2.0/24 [20/0] via 172.31.255.10, eth0.20, 02w0d00h
 * via 172.31.255.11, eth0.20, 02w0d00h
B>* 172.17.3.0/24 [20/0] via 172.31.255.10, eth0.20, 02w0d00h
 * via 172.31.255.11, eth0.20, 02w0d00h
B>* 172.17.4.0/24 [20/0] via 172.31.255.10, eth0.20, 02w0d00h
 * via 172.31.255.11, eth0.20, 02w0d00h
B>* 172.17.5.0/24 [20/0] via 172.31.255.10, eth0.20, 02w0d00h
 * via 172.31.255.11, eth0.20, 02w0d00h
B>* 172.19.7.0/24 [20/1] via 10.40.0.7, eth0.40, 02w0d01h
B>* 172.19.8.0/24 [20/1] via 10.40.0.7, eth0.40, 01w6d20h
C>* 172.31.255.0/26 is directly connected, eth0.20

As you can see above, this router is also allowing multiple equal cost paths. We see all the southbound networks learned by BGP twice in the routing table. This was achieved by simply configuring bgp for a ‘maximum paths’ value of greater than ‘1’:

vyos@router-core1:~$ sh configuration
<snip>
protocols {
 bgp 64512 {
 maximum-paths {
 ebgp 4
 ibgp 4
 }

I’m honestly not sure what load balancing algorithm VyOS implements, but from an NSX perspective, it doesn’t really matter. It doesn’t have to match, it simply needs to balance traffic across each of the available L3 paths. As long as an ingress packet arrives at one of the ESGs, it’ll know how to route it southbound.

So there you have it. There is obviously a lot more to ECMP than what I discussed in this post, but hopefully this helps to clarify a bit about path selection.

Finding the NSX VIB Download URL

Although in most cases, it’s not necessary to obtain the NSX VIBs, there are some situations where you’ll need them. Most notably is when incorporating the VIBs into an image profile used for stateless auto-deploy ESXi hosts.

In older versions of NSX, including 6.0.x and 6.1.x, it used to be possible to use the very simple URL format as follows:

https://<nsxmanagerIP>/bin/vdn/vibs/<esxi-version>/vxlan.zip

For example, if you wanted the NSX VIBs for ESXi 5.5 hosts from an NSX manager with IP address 192.168.0.10, you’d use the following URL:

https://192.168.0.10/bin/vdn/vibs/5.5/vxlan.zip

This URL changed at some point, I expect with the introduction of NSX 6.2.x. VMware now uses a less predictable path, including build numbers. To get the exact locations for your specific version of NSX, you can visit the following URL:

https://<nsxmanagerIP>/bin/vdn/nwfabric.properties

When visiting this URL, you’ll be greeted by several lines of text output, which includes the path to the VIBs for various versions of ESXi. Below is some sample output you’ll see with NSX 6.3.2:

# 5.5 VDN EAM Info
VDN_VIB_PATH.1=/bin/vdn/vibs-6.3.2/5.5-5534162/vxlan.zip
VDN_VIB_VERSION.1=5534162
VDN_HOST_PRODUCT_LINE.1=embeddedEsx
VDN_HOST_VERSION.1=5.5.*

# 6.0 VDN EAM Info
VDN_VIB_PATH.2=/bin/vdn/vibs-6.3.2/6.0-5534166/vxlan.zip
VDN_VIB_VERSION.2=5534166
VDN_HOST_PRODUCT_LINE.2=embeddedEsx
VDN_HOST_VERSION.2=6.0.*

# 6.5 VDN EAM Info
VDN_VIB_PATH.3=/bin/vdn/vibs-6.3.2/6.5-5534171/vxlan.zip
VDN_VIB_VERSION.3=5534171
VDN_HOST_PRODUCT_LINE.3=embeddedEsx
VDN_HOST_VERSION.3=6.5.*

# Single Version associated with all the VIBs pointed by above VDN_VIB_PATH(s)
VDN_VIB_VERSION=6.3.2.5672532

Legacy vib location. Used by code to discover avaialble legacy vibs.
LEGACY_VDN_VIB_PATH_FS=/common/em/components/vdn/vibs/legacy/
LEGACY_VDN_VIB_PATH_WEB_ROOT=/bin/vdn/vibs/legacy/

So as you can see above, the VIB path for ESXi 6.5 for example would be:

VDN_VIB_PATH.3=/bin/vdn/vibs-6.3.2/6.5-5534171/vxlan.zip

And to access this as a URL, you’d simply tack this path onto the normal NSX https location as follows:

https://<nsxmanagerIP>/bin/vdn/vibs-6.3.2/6.5-5534171/vxlan.zip

The file itself is about 17MB in size in 6.3.2. Despite being named vxlan.zip, it actually contains two VIBs – both the VSIP module for DFW and message bus purposes as well as the VXLAN module for logical switching and routing.

NSX 6.2.8 Released!

It’s always an exciting time at VMware when a new NSX build goes ‘GA’. Yesterday (July 6th) marks the official release of NSX-V 6.2.8.

NSX 6.2.8 is a maintenance or patch release focused mainly on bug fixes, and there are quite a few in this one. You can head over to the release notes for the full list, but I’ll provide a few highlights below.

Before I do that, here are the relevant links:

In my opinion, some of the most important fixes in 6.2.8 include the following:

Fixed Issue 1760940: NSX Manager High CPU triggered by many simultaneous vMotion tasks

This was a fairly common issue that we would see in larger deployments with large numbers of dynamic security groups. The most common workflow that would trigger this would be putting a host into maintenance mode, triggering a large number of simultaneous vMotions. I’m happy to see that this one was finally fixed. Unfortunately, it doesn’t seem that this has yet been corrected in any of the 6.3.x releases, but I’m sure it will come. You can find more information in VMware KB 2150668.

Fixed Issue 1854519: VMs lose North to south connectivity after migration from a VLAN to a bridged VXLAN

This next one is not quite so common, but I’ve personally seen a couple of customers hit this. If you have a VM in a VLAN network, and then move it to the VXLAN dvPortgroup associated with the bridge, connectivity is lost. This happens because a RARP doesn’t get sent to update the physical switch’s MAC table (VMware often uses RARP instead of GARP for this purpose). Most customers would use the VLAN backed half of the bridged network for physical devices, and not for VMs, but there is no reason why this shouldn’t work.

Fixed Issue 1849037: NSX Manager API threads get exhausted when communication link with NSX Edge is broken

I think this one is pretty self explanatory – if the manager can’t process API calls, a lot of the manager’s functionality is lost. There are a number of reasons an ESXi host could lose its communication channel to the NSX manager, so definitely a good fix.

Fixed Issue 1813363: Multiple IP addresses on same vNIC causes delays in firewall publish operation

Another good fix that should help to reduce NSX manager CPU utilization and improve scale. Multiple IPs on vNICs is fairly common to see.

Fixed Issue 1798537: DFW controller process on ESXi (vsfwd) may run out of memory

I’ve seen this issue a few times in very large micro segmentation environments, and very happy to see it fixed. This should certainly help improve stability and environment scale.

FreeNAS Power Consumption and ACPI

As I alluded to in part 4 and part 5 of my recent ‘FreeNAS 9.10 Lab Build’ series, I could achieve better power consumption figures after enabling deeper C states. I first noticed this a year or two ago when I had re-purposed a Shuttle SH67H3 cube PC for use with FreeNAS. I was familiar with the expected power draw of this system when using it with other operating systems in the past, but with FreeNAS installed, it seemed higher than it should have been.

After doing some digging on the subject, I came across a thread on the FreeNAS forums describing the default ACPI C-state used and information on how to modify it.

A Bit of Background on ACPI C-States

What are often referred to as ‘C-States’ are basically levels of CPU power savings defined by the ACPI (Advanced Configuration and Power Interface) standard. This standard defines states for other devices in the system as well – not just CPUs – but all states prefixed by a ‘C’ refer to CPU power states.

The states supported by a system will often vary depending on the age of the system and the CPU manufacturer, but most modern Intel based systems will support ACPI states C0, C1, C2 and C3. Some older systems only supported C1 and a lower power state called C1E.

C0 is essentially the state where the CPU is completely awake, operating at its full frequency and performance potential and actually executing instructions. The higher the ‘C’ value, the deeper into power savings or sleep modes the CPU can go.

All ACPI compliant systems must support another state called C1. In the C1 state, the CPU is basically at idle and isn’t executing any instructions. The key requirement for the C1 state is that the CPU must be able to return to the C0 state to execute instructions immediately without any latency or delay. Because of this requirement, there are only so many power saving tweaks that the CPU can implement.

This is where the C2 state comes in, also known as ‘Stop-Clock’. In this ACPI idle state, additional power savings features can be used. This is where modern Intel processors shine. They can do all sorts of power saving wizardry, like turning off unused portions of the CPU or even entire idle CPU cores. Because there can be a very slight delay in returning to the C0 state when using these these features, they cannot be implemented when limited to ACPI C1.

Enabling Deeper ACPI C-States

By default, FreeNAS has ACPI configured to use only the C1 power state. Presumably, this is to guarantee maximum performance and prevent any quirks with older CPUs switching between power states.

If maximum performance and performance consistency is desired over power savings – as is often the case in critical production environments – leaving the default at C1 is probably a wise choice. This choice also becomes much less important if your system is heavily utilized and spends very little time at idle anyway. But if you are like me, running a home lab, idle power consumption is an important consideration.

From an SSH or console prompt, you can determine the detected and supported C-States by querying the relevant ACPI sysctls:

[root@freenas] ~# sysctl -a | grep cx_
hw.acpi.cpu.cx_lowest: C1
dev.cpu.3.cx_usage: 100.00% 0.00% last 19us
dev.cpu.3.cx_lowest: C1
dev.cpu.3.cx_supported: C1/1/1 C2/3/96
dev.cpu.2.cx_usage: 100.00% 0.00% last 136us
dev.cpu.2.cx_lowest: C1
dev.cpu.2.cx_supported: C1/1/1 C2/3/96
dev.cpu.1.cx_usage: 100.00% 0.00% last 2006us
dev.cpu.1.cx_lowest: C1
dev.cpu.1.cx_supported: C1/1/1 C2/3/96
dev.cpu.0.cx_usage: 100.00% 0.00% last 2101us
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_supported: C1/1/1 C2/3/96

As you can see above, my Xeon X3430 supports C1, C2 and C3 states. The cx_usage value appears to report how much time the processor spends in that particular state and the cx_lowest reports the deepest allowed state.

In the C1 state, my system is idling at about 95W total power consumption.

I had originally suspected that Intel SpeedStep technology (a feature that provides dynamic frequency and voltage scaling at idle) was not functioning in ACPI C1, but that doesn’t seem to be the case. My CPU’s normal (non turbo-boost) frequency is 2.4GHz. If SpeedStep is functional, I’d expect it to use one of the following lower power states as defined in the following sysctl:

[root@freenas] ~# sysctl -a | grep dev.cpu.0.freq_levels
dev.cpu.0.freq_levels: 2395/95000 2394/95000 2261/78000 2128/63000 1995/57000 1862/46000 1729/36000 1596/32000 1463/25000 1330/19000 1197/17000

As seen above, the processor should be able to scale down to 1197MHz at idle. Even with the powerd daemon stopped, you can still use the powerd command line tool to see what the current CPU frequency is as well as any changes as load increases. Using powerd with the -v verbose option, we can see that the processor frequency does indeed jump up and down in the ACPI C1 state and stabilizes at 1197MHz when idle:

[root@freenas] ~# powerd -v
<snip>
changing clock speed from 1463 MHz to 1330 MHz
load   4%, current freq 1330 MHz ( 9), wanted freq 1263 MHz
load   0%, current freq 1330 MHz ( 9), wanted freq 1223 MHz
load   0%, current freq 1330 MHz ( 9), wanted freq 1197 MHz
changing clock speed from 1330 MHz to 1197 MHz
load  26%, current freq 1197 MHz (10), wanted freq 1197 MHz
load   3%, current freq 1197 MHz (10), wanted freq 1197 MHz
load   0%, current freq 1197 MHz (10), wanted freq 1197 MHz
load   4%, current freq 1197 MHz (10), wanted freq 1197 MHz

You can change the lowest allowed ACPI state by using the following command. In this example, I will allow the system to use more advanced power saving features by setting cx_lowest to C2:

[root@freenas] ~# sysctl hw.acpi.cpu.cx_lowest=C2
hw.acpi.cpu.cx_lowest: C1 -> C2

After making this change, the system power consumption immediately dropped down to about 75W. That’s more than 20% – not bad!

Now if we repeat the previous command, we can see some different reporting:

[root@freenas] ~# sysctl -a | grep cx_
hw.acpi.cpu.cx_lowest: C2
dev.cpu.3.cx_usage: 0.00% 100.00% last 19695us
dev.cpu.3.cx_lowest: C2
dev.cpu.3.cx_supported: C1/1/1 C2/3/96
dev.cpu.2.cx_usage: 5.12% 94.87% last 23us
dev.cpu.2.cx_lowest: C2
dev.cpu.2.cx_supported: C1/1/1 C2/3/96
dev.cpu.1.cx_usage: 1.87% 98.12% last 255us
dev.cpu.1.cx_lowest: C2
dev.cpu.1.cx_supported: C1/1/1 C2/3/96
dev.cpu.0.cx_usage: 1.28% 98.71% last 1200us
dev.cpu.0.cx_lowest: C2
dev.cpu.0.cx_supported: C1/1/1 C2/3/96

Part of the reason the power savings was so significant is because the system is spending over 95% of its time at idle in the C2 state.

Making it Stick

One thing you’ll notice is that after rebooting, this change will revert back to the default again. Since this is a sysctl, you’d think this could just be added as a system tunable parameter in the UI. I tried this to no avail.

After doing some digging, I found a bug reported on this. It appears that this is a known problem due to other rc.d scripts interfering with the cx_lowest sysctl.

I took a look in the /etc/rc.d directory and found a script called power_profile. It does indeed appear to be overwriting the ACPI lowest state at bootup:

[root@freenas] ~# cat /etc/rc.d/power_profile
<snip>
# Set the various sysctls based on the profile's values.
node="hw.acpi.cpu.cx_lowest"
highest_value="C1"
lowest_value="Cmax"
eval value=\$${profile}_cx_lowest
sysctl_set
<snip>

I could probably get by this issue by modifying the startup scripts, but the better solution is to simply add the required command to the list of post-init scripts. This can be done from the UI in the following location:

freenas-acpi-1

They key thing is to ensure is that the command is run ‘post init’. When that is done, the setting sticks and is applied after the power_profile script.

Hopefully this could save you a few dollars on your power bill as well!

FreeNAS 9.10 Lab Build – Part 5

In part 4 of this series, I took a look at my new used Dell PowerEdge T110 and talked about the pros and cons about using this type of machine. Today, I’ll be installing the drives and completing the build.

freenas5-3

To begin, I installed my drives into the server’s normal 3.5″ mounting locations. I had a few challenges here, but I was very thankful that I kept the Dell branded SATA cable that came with the PERC H200 card. It’s totally meant for this machine and keeps the wiring organized. In all of the custom builds I’ve done, it’s really tough to get clean SATA-power wiring because of the close proximity of the drives. Because of this, there is often too much pressure on the connectors and I’ve even had issues with them coming loose. This Dell breakout cable is very flexible and because of the built-in power connector, makes for a very clean and secure wiring job.

Unfortunately, I didn’t need to buy the two SATA breakout cables that I discussed in part 1 of the series, but at least I’ve got extras if I ever want to add more drives to the system.

freenas5-2

To get the two 2.5″ SATA drives installed, I used a couple of Kingston brand 2.5″ to 3.5″ adapters. They were a perfect fit for the drive caddies I bought off of ebay, but they interfered with the Dell SATA brakout cable connector. To get around this, I just loosened the screws securing the SSDs and the cable connector to get a couple of extra millimeters of clearance for the connector. Eventually, I hope to add a hotswap enclosure to one of the 5.25″ drive bays for 2.5″ drives, but for now this will have to do.

Continue reading “FreeNAS 9.10 Lab Build – Part 5”

Refurbishing Old Keyboards

Today I’m going to shift gears a bit and take a look at some retro PC keyboards.

keyboard1-0

I’m currently in the process of refurbishing a couple of old PCs, including a classic DEC brand 486 DX2 and a newer 2001 era AMD Athlon system. A lot of people see these machines as trash, but for me there is a lot of nostalgia around systems like this and I see value in trying to restore and maintain them.

One of the first problems I ran into when trying to test out these systems is that not all keyboards were created equally. There are several connector standards that have been used over the years including AT, PS2 and more recently, USB. I have always kept a couple of USB to PS2 adapters for occasions such as this, but to my surprise, three USB keyboards I tried were simply not functional when using the PS2 adapter. As it turns out, there needs to be some intelligence in the keyboard itself to know when its been plugged into a PS2 port in order to function based on that signalling standard. The only keyboard I could get to work with the adapter was my trusty DAS Keyboard, that I was not willing to move away from my main system.

Continue reading “Refurbishing Old Keyboards”

FreeNAS 9.10 Lab Build – Part 4

In part 4 of this series, I’ll be taking a look at my new (used) Dell PowerEdge T110 server and sizing it up for use with FreeNAS 9.10.

On my daily perusal through my favorite eBay seller’s inventory, I came across a well used and somewhat scratched up Dell T110 tower server. It probably didn’t garner a lot of interest with its meager 1GB of RAM, lack of hard drives and rough appearance. Despite this, the seller’s asking price of $99 wasn’t bad when you consider that is had a Xeon X3430 quad core processor and was tested and functional.

Since I already had a working Perc H200 card – an optional and supported card in the Dell T110 – as well as a pair of 4GB 2Rx8 ECC DIMMs collecting dust, this box was suddenly appealing to me.

The next day, I saw the price drop to $75, and almost pulled trigger on it until I saw the hefty shipping cost. I didn’t think it would last long at that price, but decided to make an offer of $63 to offset the shipping a bit and was pleased to find that the seller accepted.

It needed a lot of TLC when it arrived. I spent a fair amount of time removing caked-on dust from the fan assembly and also spent some quality time with a can of compressed air.

Continue reading “FreeNAS 9.10 Lab Build – Part 4”