Building a Retro Gaming Rig – Part 2

In Part 1 of this series, I took a close look at the Asus P2B slot-1 motherboard and some CPU options. Today, the focus will be on graphics cards.

Evaluating Video Card Options

If this were a pure DOS build, the card I chose wouldn’t matter very much. I’d simply be interested in something with decent 2D image quality and enough VRAM to support the resolutions I’d want to use, since the majority of DOS games don’t offer 3D acceleration. But because I wanted a system that would also run some Windows 9x based titles, a 3D card would be ideal for that genuine experience.

The years leading up to the millennium were very exciting from a graphics hardware perspective. In 1998, 3D accelerated cards were much more accessible to the average consumer and major strides were made in performance and price.

3dfx Interactive

If you were at all into PC gaming or PC hardware back in the late nineties, you will be well acquainted with 3dfx Interactive. Although the company went bankrupt in 2002, 3dfx was the pioneer in mainstream 3D gaming technology up until that point. The card that brought them to fame was their 3D-only ‘Voodoo Graphics’ adapter released in 1996. They were also well known for their proprietary Glide API that many games of the era supported. The turning point for me personally was when I first experienced GLQuake on a Voodoo card back in 1996. I remember being totally blown away after seeing the surreal lighting and 3D effects and was determined to buy one. After saving up while working a summer job in 1997, I bought the 6MB Canopus Pure 3D card based on the famous chipset and never regretted it for a moment. I used that card for several years until buying a Voodoo 3 3000 later on.

The 3dfx Voodoo Banshee

Since my retro rig was supposed to be of 1998 vintage, I originally started looking at the Voodoo 2 released that year, which was superior to the original in every respect. My next choice was the AGP version of the 3dfx Voodoo 3. Unfortunately, both of these models were fetching well over $100 on eBay and were outside my budget for this weekend project. They are simply in high demand and are genuine collector’s items at this point.

Undeterred, I shifted focus to another part that was a bit less popular but still had all the 3dfx flair of 1998 – the 3dfx Voodoo Banshee. The Banshee was a first for 3dfx in two respects – it was their first single-chip 2D/3D solution and their first AGP based card. The 3D portion of the core is almost the same as the popular Voodoo 2, but with only one texture mapping unit instead of two. The resulting performance drop was partially offset by higher core and memory clock speeds, and it still had plenty of power for games of that generation. As an added bonus, the 2D performance and image quality of the Banshee were excellent – almost as good as the top-notch 2D cards of the time – removing the need for separate 2D and 3D cards in the system.

Today, the Banshee cards don’t sell for quite as much as the Voodoo 2 or Voodoo 3, but can still go for between $60 and $120 on eBay.

3dfx1-1

I was fortunate enough to stumble upon an awesome deal on a 16MB Ensoniq brand Banshee card for only $25 USD. It was probably only still available because the eBay listing didn’t include the words ‘3dfx’ or ‘Voodoo’ in the description. After the shipping, duty and exchange I think it came to under $50 CDN. Not too bad for a rare piece of history.

Ensoniq no longer exists, but was best known for their audio products. They were purchased by Creative Labs shortly after this card was produced. After seeing some of the other Banshee models from other companies, I actually like this card best because of the larger heatsink on it. Most of the other cards have rather small chipset style coolers. They all seem to be passive, but I can confirm that these cards do get rather toasty under load.

3dfx1-2

In 1998, dual monitor outputs and DVI were reserved for elite workstation cards and specialty products, so the Ensoniq Banshee offers just a simple VGA output. Some cards of this era differentiated themselves with S-Video and composite TV outputs, but nothing that would interest anyone today.

It was interesting to see two mysterious headers on the card. One is a 40-pin female IDE-style connector, and the other is a 26 pin header of some sort. My first thought was that they may have been for future SLI support, but after doing some digging these turned out to be standard feature connectors. These were found on many cards of this vintage for daughter boards and other add-on modules. Technically, the 36 pin SLI connector found on Voodoo 2 cards is indeed a feature connector of sorts as it allows the cards to communicate over a high-bandwidth channel without having to saturate the PCI bus.

AGP Variations

The earliest 3D cards were all PCI based, but in 1998 many first generation AGP variants were on the market, including the Banshee. A lot of people building retro rigs seem to forget that there were actually several AGP ‘versions’ to hit the market over the years and not all were backward compatible. An AGP 1x card released in 1998 very likely won’t work or even fit in an AGP board released a few years later.

retro-slot1-23

AGP 1x and 2x are very similar and provide power to the card at 3.3V. AGP 4x and 8x came out a few years later and were limited to 1.5V. As you can imagine, putting 3.3V through a card designed for 1.5V could be pretty catastrophic. To prevent these problems, slot and card ‘keying’ were implemented to ensure you couldn’t physically insert a card into an incompatible slot. As you can see above, the 3.3V AGP 1x slot on the ASUS P2B has a plastic ‘key’ about a third of the way in from the left of the slot. A 1.5V 4x/8x slot would have that same key, but in a different position closer to the right of the slot.

Interestingly, some cards that came out later on were actually compatible with both 1.5V and 3.3V AGP slots and had two key indentations on the card. The card in the image above is a 2003 era ATI Radeon 9200 SE. I have used this same card in a newer 8x slot machine as well as the 1x ASUS P2B.

The budget priced Radeon 9200 SE is actually about 9 times more powerful than the Banshee. I had considered using this Windows 98 compatible card in my retro build, but it just didn’t have the same nostalgic value that a genuine 3dfx card does. For now, it’s been serving as a spare.

Conclusion

I was really excited to get a 3dfx card – especially without having to shell out too much. It’ll be great to use a card from that era and to take advantage of the proprietary 3dfx Glide API in many games as well. I think the Banshee is a perfect match for the build and I look forward to getting everything put together!

Next up, I’ll take a look at a couple of sound card options, including both ISA and PCI cards. In an added twist, I also bought a compact micro ATX socket 370 system that I’ll evaluate as well.

Part 3 – The mATX MSI 6160 and BIOS Woes >>

Building a Retro Gaming Rig Series

Part 1 – ASUS P2B and Slot-1 CPUs
Part 2 – 3D Video Card Options
Part 3 – The mATX MSI MS-6160 and BIOS Woes
Part 4 – Choosing a Sound Card
Part 5 – Hard Drive/Storage Options
Part 6 – Completed Build!

Using FreeNAS for NSX FTP Backups

FreeNAS is a very powerful storage solution and is quite popular with those running vSphere and NSX home labs. I recently built a new FreeNAS 9.10 system and wanted to share some of my experiences getting NSX FTP backups going.

To get this configured, I found the FTP section of the FreeNAS 9.10 documentation to be very useful. I’d definitely recommend giving it a read through as well.

Before Getting Started

Before enabling the FTP service in FreeNAS, you’ll want to decide where to put your NSX backups. In theory, you can dump them in any of your volumes or datasets, but you may want to set aside a specific amount of storage space for them. In my lab, I created a dedicated dataset with a 60GB quota for FTP purposes. I like to separate it out so that nothing else competes with the backups and the amount of space available is predictable.
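I created the dataset and quota from the FreeNAS UI, but if you prefer the shell, something along these lines should achieve the same result. This is just a sketch assuming a volume named vol1 and the dataset name I used – FreeNAS generally prefers datasets to be created through the UI so they are tracked in its configuration database:

[root@freenas] ~# zfs create -o quota=60G vol1/dataset-ftp
[root@freenas] ~# zfs get quota vol1/dataset-ftp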

FreeNASNSXbackups-1

If you plan to use FTP for more than just NSX, it would be a good idea to create a subdirectory in the dataset or other location you want them to reside. In my case, I created a directory called ‘NSX’ in the dataset:

[root@freenas] ~# cd /mnt/vol1/dataset-ftp
[root@freenas] /mnt/vol1/dataset-ftp# mkdir NSX
[root@freenas] /mnt/vol1/dataset-ftp# ls -lha
total 2
drwxr-xr-x 3 root wheel 3B Sep 7 09:23 ./
drwxr-xr-x 5 root wheel 5B Sep 5 16:13 ../
drwxr-xr-x 2 root wheel 2B Sep 7 09:23 NSX/
[root@freenas] /mnt/vol1/dataset-ftp#

Setting Permissions

One step that is often missed during FreeNAS FTP configuration is setting the appropriate permissions. The proftpd service in FreeNAS uses the built-in ftp user account. If that user does not have the appropriate permissions on the location you intend to use, backups will not be written successfully.

Since I used a dedicated dataset for FTP called dataset-ftp, I can easily set permissions recursively for this location from the UI:

FreeNASNSXbackups-2

As shown above, we want to set both the owner user and group to ftp. Because I created the NSX directory within the dataset, I’ll be setting permission recursively as well.
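If you’d rather set the ownership from the shell instead of the UI, a recursive chown accomplishes the same thing. A quick sketch using my dataset path:

[root@freenas] ~# chown -R ftp:ftp /mnt/vol1/dataset-ftp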

If I log into FreeNAS via SSH or console again, I can confirm that this worked because the dataset-ftp mount is now owned by ftp as is the NSX subdirectory within.

[root@freenas] /mnt/vol1# ls -lha
total 14
drwxrwxr-x  5 root  wheel     5B Sep  5 16:13 ./
drwxr-xr-x  4 root  wheel   192B Sep  5 16:09 ../
drwxr-xr-x  3 ftp   ftp       3B Sep  7 09:23 dataset-ftp/
drwxrwxr-x  5 root  wheel    13B Jul 29 16:23 dataset-static/
drwxrwxr-x  2 root  wheel     2B Sep  5 16:13 dataset-tftp/
[root@freenas] /mnt/vol1# ls -lha dataset-ftp
total 2
drwxr-xr-x  3 ftp   ftp       3B Sep  7 09:23 ./
drwxrwxr-x  5 root  wheel     5B Sep  5 16:13 ../
drwxr-xr-x  2 ftp   ftp       2B Sep  7 13:46 NSX/

The Easy Option – Anonymous FTP Access

Setting up anonymous FTP access requires the least amount of effort and is usually sufficient for home lab purposes. I would strongly discourage the use of anonymous access in a production or security sensitive environment as anyone on the network can access the FTP directory configured.

First, configure FTP under services in the FreeNAS UI:

FreeNASNSXbackups-4

As you’d expect, the ‘Allow Anonymous Login’ option needs to be checked in order for anonymous FTP to work. The ‘Allow Local User Login’ option should be unchecked if you don’t want to use authentication. It’s also important to set the ‘Path’ to the FTP root directory you wish to use. In my example above, any anonymous FTP logins will land directly in the NSX subdirectory I created earlier.

If you want to use FTP for more than just NSX backups, you can make the path the root of the dataset and configure NSX to use a specific subdirectory within it, as I’ll show later.

Once that’s done, you can enable the FTP service. It’ll be off by default:

FreeNASNSXbackups-5

Now we can do some basic tests to ensure FTP is functional. You can use an FTP client like FileZilla if you like, but I’m just going to use the good old Windows FTP command line utility. First, let’s make sure we can login anonymously:

C:\Users\mike.LAB\Desktop>ftp freenas.lab.local
Connected to freenas.lab.local.
220 ProFTPD 1.3.5a Server (freenas.lab.local FTP Server) [::ffff:172.16.10.17]
User (freenas.lab.local:(none)): anonymous
331 Anonymous login ok, send your complete email address as your password
Password:
230 Anonymous access granted, restrictions apply

A return status of 230 is what we’re looking for here and this seems to work fine. Keep in mind that it technically doesn’t matter what password you enter for the anonymous username. You can just hit enter, but I usually just re-enter the username. It’s not necessary to enter anything that resembles an email address.

Next, let’s make sure we have permission to write to this location. I’ll do an FTP ‘PUT’ of a small text file:

ftp> bin
200 Type set to I
ftp> put C:\Users\mike.LAB\Desktop\test.txt
200 PORT command successful
150 Opening BINARY mode data connection for test.txt
226 Transfer complete
ftp: 14 bytes sent in 0.00Seconds 14000.00Kbytes/sec.
ftp> dir
200 PORT command successful
150 Opening ASCII mode data connection for file list
-rw-r----- 1 ftp ftp 14 Sep 7 13:56 test.txt
226 Transfer complete
ftp: 65 bytes received in 0.01Seconds 10.83Kbytes/sec.
ftp>

As seen above, the file was written successfully with a 226 return code. The last step I’d recommend doing before configuring NSX is to confirm the relative path after login from the FTP server’s perspective. Because I stayed in the FTP root directory, it simply lists a forward slash as shown below:

ftp> pwd
257 "/" is the current directory
ftp>

NSX expects this path as you’ll see shortly. Now that we know anonymous FTP is working, we can configure the FTP server from the NSX appliance UI:

FreeNASNSXbackups-3

As you can see above, I’ve entered ‘anonymous’ as the user name, and entered the same as the password string. The backup directory is the location you want NSX to write backups to. If you had a specific directory you wanted to use within the FTP root directory that was configured, you could enter it here. For example, /backups. As mentioned earlier, my FTP root directory is the NSX directory so it’s not necessary in my case.

Two other pieces of information are mandatory in NSX – the filename prefix and the pass phrase. The filename prefix is just that – a string prepended to the backup filenames. It usually makes sense to identify the environment or NSX manager by name here. This is especially important if you have multiple NSX managers all backing up to one location. The pass phrase is a password used to encrypt the backup binary file generated. Be sure not to lose this or you will not be able to restore your backups.

After hitting OK, we can then do a quick backup to ensure it can connect and write to the location configured.

FreeNASNSXbackups-6

If everything was successful, you should then see your file listed in the backup history pane at the bottom of the view:

FreeNASNSXbackups-7

FTP User Authentication

Anonymous FTP may be sufficient for most home lab purposes, but there are several advantages to configuring users and authentication. FTP by nature transmits in plain text and is not secure, but adding authentication provides a bit more control over who can access the backups and lets you direct users to specific FTP locations. This can be useful if you plan to use your FTP server for more than just NSX.

Before we begin, let’s create a user in FreeNAS that we’ll use for NSX backups:

FreeNASNSXbackups-8

The key things you’ll need to ensure are that the user’s primary group is the built-in ftp group used by proftpd, and that the user’s home directory is where you want them to land after login. In my example above, I’m creating a user called nsxftpuser with a home directory of the FTP root directory I configured earlier.

Keep in mind that by default FreeNAS will create a new home directory for the user – hence the wording “Create Home Directory In:”. I expect the home directory to actually be /mnt/vol1/dataset-ftp/nsxftpuser and not /mnt/vol1/dataset-ftp/.

Next, we need to modify the FTP settings slightly:

FreeNASNSXbackups-9

Since we want to use local user authentication, we need to check ‘Allow Local User Login’. I’ve also unchecked ‘Allow Anonymous Login’ to ensure that only authenticated users can now log in.

In order to test that we’re dumped into the user’s home directory after login, I changed the FTP default path one level back to the root of the dataset.

As a last step, it’s necessary to stop and start the FTP service again for the changes to take effect.

Before we test this new user, let’s double check that the home directory is located where we want it:

[root@freenas] /mnt/vol1/dataset-ftp# ls -lha
total 18
drwxr-xr-x  4 ftp         ftp       4B Sep  7 14:42 ./
drwxrwxr-x  5 root        wheel     5B Sep  5 16:13 ../
drwxr-xr-x  2 ftp         ftp       5B Sep  7 14:43 NSX/
drwxr-xr-x  2 nsxftpuser  ftp      10B Sep  7 14:42 nsxftpuser/

As you can see above, we now have a home directory matching the username in the FTP root location.

Now let’s try to log in using the nsxftpuser account:

C:\Users\mike.LAB\Desktop>ftp freenas.lab.local
Connected to freenas.lab.local.
220 ProFTPD 1.3.5a Server (freenas.lab.local FTP Server) [::ffff:172.16.10.17]
User (freenas.lab.local:(none)): nsxftpuser
331 Password required for nsxftpuser
Password:
230-Welcome to FreeNAS FTP Server
230 User nsxftpuser logged in
ftp>

So far so good, now let’s PUT a file to ensure we have write access to this location:

ftp> bin
200 Type set to I
ftp> put C:\Users\mike.LAB\Desktop\test.txt
200 PORT command successful
150 Opening BINARY mode data connection for test.txt
226 Transfer complete
ftp: 14 bytes sent in 0.00Seconds 14000.00Kbytes/sec.
ftp> dir
200 PORT command successful
150 Opening ASCII mode data connection for file list
-rw-r----- 1 nsxftpuser ftp 14 Sep 7 14:45 test.txt
226 Transfer complete
ftp: 67 bytes received in 0.00Seconds 22.33Kbytes/sec.

Success! The last thing we need to do is modify the NSX configuration slightly to use the new user account:

FreeNASNSXbackups-10

And sure enough, the backup was successful at 18:47 GMT:

FreeNASNSXbackups-11

If I look at the files from the FreeNAS SSH session, I can see both the encrypted backup binary and metadata properties file located in the user’s home directory:

[root@freenas] /mnt/vol1/dataset-ftp/nsxftpuser# ls -lha lab*
-rw-r-----  1 nsxftpuser  ftp   2.4M Sep  7 14:47 lab18_47_33_Thu07Sep2017
-rw-r-----  1 nsxftpuser  ftp   227B Sep  7 14:47 lab18_47_33_Thu07Sep2017.backupproperties

Scheduling Backups

Once we know NSX backups are functional, it’s a good idea to get them going on a schedule.

An important consideration when choosing a schedule is when your vCenter backups are done. Because NSX relies heavily upon the state of the vCenter Server inventory and objects, it’s a good idea to schedule the two backups at around the same time. That way, if you ever need to restore, your vCenter and NSX objects will be as closely in sync as possible.

FreeNASNSXbackups-12

In my lab, I have it backing up every night at midnight, but depending on how dynamic your environment is, you may want to do it more frequently.

Another important point to note is that NSX Manager doesn’t handle large numbers of backups in the backup directory very well. The UI will throw a warning once you get up to 100 backups, and eventually you’ll get a slow or non-responsive UI in the Backup and Restore section. To get around this, you can manually archive older backups to another location outside of the FTP root directory, or create a script to move older files out automatically.
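I haven’t automated this in my own lab yet, but a small script along these lines could be run on a schedule to keep the backup directory trimmed. This is only a rough sketch using my paths and an arbitrary 30-day retention – test it carefully before pointing it at real backups:

#!/bin/sh
# Rough sketch: move NSX backup files older than 30 days out of the
# FTP root so the NSX Backup and Restore UI stays responsive.
BACKUP_DIR="/mnt/vol1/dataset-ftp/NSX"
ARCHIVE_DIR="/mnt/vol1/dataset-ftp/NSX-archive"

mkdir -p "$ARCHIVE_DIR"
# Catches both the backup binaries and their .backupproperties files.
find "$BACKUP_DIR" -maxdepth 1 -type f -mtime +30 -exec mv {} "$ARCHIVE_DIR/" \;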

The only piece that I haven’t gotten working with FreeNAS yet is encrypted backups over SFTP. Once I get that going, I’ll hopefully write up another post on the topic.

Thanks for reading! If you have any questions please leave a comment below.

Removing Stale IP Pool Assignments in NSX

NSX uses the concept of IP pools for IP address assignment for several components including controllers, VTEPs and Guest Introspection. These are normally configured during the initial deployment of NSX and it’s always a good idea to ensure you’ve got some headroom in the pool for future growth.

NSX usually does a good job of keeping track of IP Pool address allocation, but in some situations, stale entries may be wasting IPs. There are a few ways you could get yourself into this situation – most commonly this is due to the improper removal of objects. For example, if an ESXi host is removed from the vCenter inventory while still in an NSX prepared cluster, its VTEP IP address allocation will remain. NSX can’t release the allocation, because the VIBs were never uninstalled and it has no idea what the fate of the host was. If the allocation was released and someone deployed a new host while the old one was still powered on, you’d likely get IP conflicts.

Just this past week, I assisted two separate customers who ran into similar situations – one had a stale IP in their controller pool, and the other had stale IPs in their VTEP pool. Both had removed controllers or ESXi hosts using a non-standard method.

If you have a look in the NSX UI, you’ll notice that there is no way to add, modify or remove allocated IPs. You can only modify or expand the pool. Thankfully, there is a way to remove allocated IPs from a pool using an NSX REST API call.

To simulate a scenario where this can happen, I went ahead and improperly removed one of the NSX controllers and did some manual cleanup afterward. As you can see below, the third controller appears to have been removed successfully.

ippoolAPI-2

When I try to deploy the third controller again, I’m unable to because of a shortage of IPs in the pool:

ippoolAPI-1

If I look at the IP Pool called ‘Controller Pool’ in the grouping objects, I can see that there are only three IPs in the pool and that one of them still belongs to the old controller that no longer exists:

ippoolAPI-3

So in order to get my third controller re-deployed, I’ll need to either remove the stale 172.16.10.45 entry or expand my pool to a total of four or more addresses. If this were a production environment, expanding the pool may be a suitable workaround to get things running again quickly. If you are at all like me, though, simply having this remnant left behind would bother you, and I wanted to get it cleaned up.

Releasing IPs Using REST API Calls

Now that we’ve confirmed the IP address we want to nuke from the pool, we can use some API calls to gather the required information and release the address. The API calls we are interested in can be found in the NSX 6.2 and 6.3 API guides. My lab is currently running 6.2.7, so I’ll be using calls found on pages 110-114 of the NSX 6.2 API guide.

Before we begin, there are two key pieces of information we’ll need to do this successfully:

  1. The IP address that needs to be released.
  2. The moref identifier of the IP pool in question.

First, we’ll use an API call to query all IP pools on the NSX manager. This will provide an output that will include the moref identifier of the pool in question:

GET https://NSX-Manager-IP-Address/api/2.0/services/ipam/pools/scope/scopeID

As you can see above, the ‘scope ID’ is also required to run this GET call. In every instance I’ve seen, using globalroot-0 as the scopeID works just fine here.
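I’m using a REST client in the screenshots that follow, but the same query can be issued with curl if that is more convenient. A quick sketch against my lab manager with the admin account (the -k flag is only there because of the self-signed certificate):

curl -k -u admin -X GET \
  https://nsxmanager.lab.local/api/2.0/services/ipam/pools/scope/globalroot-0

curl will prompt for the admin password and return the XML response described below.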

ippoolAPI-4

The various IP pools will be separated by <ipamAddressPool> XML tags. You’ll want to identify the correct pool based on the IP range listed or by the text in the <name> field. The relevant controller pool was identified by the following section in the output in my example:

<ipamAddressPool>
<objectId>ipaddresspool-1</objectId>
<objectTypeName>IpAddressPool</objectTypeName>
<vsmUuid>4226CDEE-1DDA-9FF9-9E2A-8FDD64FACD35</vsmUuid>
<nodeId>fa4ecdff-db23-4799-af56-ae26362be8c7</nodeId>
<revision>1</revision>
<type>
<typeName>IpAddressPool</typeName>
</type>
<name>Controller Pool</name>
<scope>
<id>globalroot-0</id>
<objectTypeName>GlobalRoot</objectTypeName>
<name>Global</name>
</scope>
<clientHandle/>
<extendedAttributes/>
<isUniversal>false</isUniversal>
<universalRevision>0</universalRevision>
<totalAddressCount>3</totalAddressCount>
<usedAddressCount>3</usedAddressCount>
<usedPercentage>100</usedPercentage>
<prefixLength>24</prefixLength>
<gateway>172.16.10.1</gateway>
<dnsSuffix>lab.local</dnsSuffix>
<dnsServer1>172.16.10.10</dnsServer1>
<dnsServer2>172.16.10.11</dnsServer2>
<ipPoolType>ipv4</ipPoolType>
<ipRanges>
<ipRangeDto>
<id>iprange-1</id>
<startAddress>172.16.10.43</startAddress>
<endAddress>172.16.10.45</endAddress>
</ipRangeDto>
</ipRanges>
<subnetId>subnet-1</subnetId>
</ipamAddressPool>

As you can see above, the IP pool is identified by the moref identifier ipaddresspool-1.

As an optional next step, you may wish to view the IP addresses allocated within this pool. The following API call will obtain this information:

GET https://NSX-Manager-IP-Address/api/2.0/services/ipam/pools/poolId/ipaddresses

In my example, I used the following call:

GET https://nsxmanager.lab.local/api/2.0/services/ipam/pools/ipaddresspool-1/ipaddresses

Below is the output I received:

<allocatedIpAddresses>
<allocatedIpAddress>
<id>13</id>
<ipAddress>172.16.10.44</ipAddress>
<gateway>172.16.10.1</gateway>
<prefixLength>24</prefixLength>
<dnsServer1>172.16.10.10</dnsServer1>
<dnsServer2>172.16.10.11</dnsServer2>
<dnsSuffix>lab.local</dnsSuffix>
<subnetId>subnet-1</subnetId>
</allocatedIpAddress>
<allocatedIpAddress>
<id>14</id>
<ipAddress>172.16.10.43</ipAddress>
<gateway>172.16.10.1</gateway>
<prefixLength>24</prefixLength>
<dnsServer1>172.16.10.10</dnsServer1>
<dnsServer2>172.16.10.11</dnsServer2>
<dnsSuffix>lab.local</dnsSuffix>
<subnetId>subnet-1</subnetId>
</allocatedIpAddress>
<allocatedIpAddress>
<id>15</id>
<ipAddress>172.16.10.45</ipAddress>
<gateway>172.16.10.1</gateway>
<prefixLength>24</prefixLength>
<dnsServer1>172.16.10.10</dnsServer1>
<dnsServer2>172.16.10.11</dnsServer2>
<dnsSuffix>lab.local</dnsSuffix>
<subnetId>subnet-1</subnetId>
</allocatedIpAddress>
</allocatedIpAddresses>

Each allocated address in the pool will have its own <id> tag. I can see that 172.16.10.45 is indeed still there. Now let’s remove it using the following API call:

DELETE https://NSX-Manager-IP-Address/api/2.0/services/ipam/pools/poolId/ipaddresses/allocated-ip-address

In my example, the exact call would be:

DELETE https://nsxmanager.lab.local/api/2.0/services/ipam/pools/ipaddresspool-1/ipaddresses/172.16.10.45
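If you prefer curl over a REST client, the equivalent call would look roughly like this (same lab assumptions as the GET sketch earlier):

curl -k -u admin -X DELETE \
  https://nsxmanager.lab.local/api/2.0/services/ipam/pools/ipaddresspool-1/ipaddresses/172.16.10.45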

ippoolAPI-5

If the call was successful, you should see a Boolean value of ‘true’ returned. Next you can validate again using the previous API call. In my case I used:

GET https://nsxmanager.lab.local/api/2.0/services/ipam/pools/ipaddresspool-1/ipaddresses

And got the following output:

<allocatedIpAddresses>
<allocatedIpAddress>
<id>13</id>
<ipAddress>172.16.10.44</ipAddress>
<gateway>172.16.10.1</gateway>
<prefixLength>24</prefixLength>
<dnsServer1>172.16.10.10</dnsServer1>
<dnsServer2>172.16.10.11</dnsServer2>
<dnsSuffix>lab.local</dnsSuffix>
<subnetId>subnet-1</subnetId>
</allocatedIpAddress>
<allocatedIpAddress>
<id>14</id>
<ipAddress>172.16.10.43</ipAddress>
<gateway>172.16.10.1</gateway>
<prefixLength>24</prefixLength>
<dnsServer1>172.16.10.10</dnsServer1>
<dnsServer2>172.16.10.11</dnsServer2>
<dnsSuffix>lab.local</dnsSuffix>
<subnetId>subnet-1</subnetId>
</allocatedIpAddress>
</allocatedIpAddresses>

As you can see above, the IP with an <id> tag of 15 has been removed. Next, I’ll confirm in the UI that the IP has indeed been released:

ippoolAPI-6

After a refresh of the vSphere Web Client view, the total used decreased to 2 for the Controller Pool and I could deploy my third controller successfully.

Although this process is straightforward if you are familiar with running NSX API calls, I do have to provide a word of caution. NSX will not stop you from releasing an IP that is genuinely being used. Therefore, it’s important to make 100% sure that whatever object was using the stale IP is indeed off the network. Some basic ping tests are a good idea before proceeding.

Thanks for reading! If you have any questions, please feel free to leave a comment below.

Re-deploying NSX Controllers During Upgrades

I’ve had this question come up a lot lately and there seems to be some confusion around whether or not NSX controllers need to be redeployed after upgrading them. The short answer to this question is really “it depends”. There are actually three different scenarios where you may want or need to delete and re-deploy NSX controllers as part of the upgrade process. Today, I’ll walk through these situations and the proper process to delete and re-deploy your controller nodes.

The Normal Upgrade Process

Upgrading the NSX control cluster is a very straightforward process. After clicking the upgrade link, an automated process upgrades the controller code and reboots each cluster member sequentially.

controller-redeploy-3

Once the ‘Upgrade Available’ link is clicked, you’ll see each of the three controllers download the upgrade bundle, upgrade and then reboot before NSX moves on to the next one.

controller-redeploy-7

Once NSX goes through its paces, it’s usually a good idea to ensure that the control-cluster join status is ‘Join complete’ and that all three controllers agree on the Cluster UUID.

nsx-controller # show control-cluster status
Type Status Since
--------------------------------------------------------------------------------
Join status: Join complete 07/24 13:38:32
Majority status: Connected to cluster majority 07/24 13:38:19
Restart status: This controller can be safely restarted 07/24 13:38:48
Cluster ID: f2849ee2-7fb6-4aca-abf4-2ca176337956
Node UUID: f2849ee2-7fb6-4aca-abf4-2ca176337956

Role Configured status Active status
--------------------------------------------------------------------------------
api_provider enabled activated
persistence_server enabled activated
switch_manager enabled activated
logical_manager enabled activated
directory_server enabled activated

Because the underlying structure of the VM itself doesn’t change, this sort of in-place code upgrade and reboot is sufficient and has minimal impact.

Scenario 1 – E1000 vNIC Replacement

The first scenario where you may want to redeploy the controllers involves a virtual hardware change that was introduced in NSX 6.1.5. NSX controllers deployed in 6.1.5 use the VMXNET3 vNIC adapter, whereas older versions had legacy Intel E1000 emulated vNICs. This change wasn’t well publicized and surprisingly it isn’t even found in the NSX 6.1.5 release notes.

I’ve seen quite a few customers go through upgrade cycles from 6.0 or 6.1 all the way to more recent 6.2.x or 6.3.x releases while retaining E1000 vNICs on their controllers. Although the E1000 vNIC adapter is generally pretty stable, there is at least one documented issue where the adapter driver suffers a hang and the controller is no longer able to transmit or receive. This problem is described in VMware KB 2150747.

That said, I personally would not wait for a problem to occur. I’d recommend checking that your controllers are using VMXNET3 and, if not, going through the redeployment procedure I’ll outline later in this post. Aside from preventing the E1000 hang problem, you’ll also benefit from the other improvements VMXNET3 has to offer, like better offloading and lower CPU utilization.

Unfortunately, finding out if your controllers have E1000 or VMXNET3 adapters can be a tad tricky. You’ll find that your controllers are locked down and can’t be edited in the vSphere Web Client or the legacy vSphere Client.

controller-redeploy-1

As seen above, the ‘Edit Settings’ option is greyed out.

controller-redeploy-2

The summary page also doesn’t tell us much, so the easiest way to get the adapter type is to check from the ESXi command line.

First, let’s SSH into a host where one of the controllers live and then find the full path to the VMX file:

[root@esx0:~] cd /vmfs/volumes
[root@esx0:/vmfs/volumes] find ./ -name NSX_Controller*.vmx
./58f77a6f-30961726-ac7e-002655e1b06c/NSX_Controller_078fcf78-9a0c-491d-95a0-02e8b5175935/NSX_Controller_078fcf78-9a0c-491d-95a0-02e8b5175935.vmx

Next, I will look for the relevant vNIC adapter settings in the VMX file using the full path obtained in the previous command output:

[root@esx0:/vmfs/volumes] cat ./58f77a6f-30961726-ac7e-002655e1b06c/NSX_Controller_c97459f1-3845-436f-8e03-60ad3cbed9e4/NSX_Controller_c97459f1-3845-436f-8e03-60ad3cbed9e4.vmx |grep -i ethernet0.virtualDev
ethernet0.virtualDev = "vmxnet3"

The key setting in the VMX that we are interested in is ethernet0.virtualDev. As seen above, the type is vmxnet3 on my controllers as they were created from a freshly deployed 6.2.5 environment. If you see e1000 here, your controllers were deployed from a 6.1.4 or older setup and have never been re-deployed.
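If you have several controllers scattered across different hosts, a quick loop run on each host will check every controller VMX it finds in one pass. A sketch, assuming the default NSX_Controller naming convention:

for vmx in $(find /vmfs/volumes/ -name "NSX_Controller*.vmx"); do
  echo "$vmx"
  grep -i ethernet0.virtualDev "$vmx"
done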

Scenario 2 – Updating the Disk Partitioning Layout

The second scenario applies if your controllers were initially deployed in a version of NSX prior to 6.2.3. Since 6.2.3 was pulled shortly after release, 6.2.4 is the more relevant starting point.

A statement you’ll find in the NSX 6.2.4 release notes summarizes this change well:

“…New installations of NSX 6.2.3 or later will deploy NSX Controller appliances with updated disk partitions to provide extra cluster resiliency. In previous releases, log overflow on the controller disk might impact controller stability. In addition to adding log management enhancements to prevent overflows, the NSX Controller appliance has separate disk partitions for data and logs to safeguard against these events. If you upgrade to NSX 6.2.3 or later, the NSX Controller appliances will retain their original disk layout.”

Again, it’s possible you may never run into a problem due to the old partitioning layout, but it’s always wise to take advantage of ‘optional’ resiliency enhancements like this. This is especially true for such a critical component of the NSX control-plane.

Although there isn’t a supported way to enter the root shell on a controller appliance, the ‘show status’ command will provide you with the partitioning layout. Here is the layout on a 6.2.5 controller with the new partitioning:

nsx-controller # show status
Version: 4.0.6 (Build 48886)

Current Time: Fri, 25 Aug 2017 15:01:17 +0000
Uptime: 32 days 1 hour 23 minutes 16 seconds

Load Average: 0.10, 0.10, 0.13
Memory Usage: 3926484 kB (Total), 267752 kB (Free)
Disk Usage:
Filesystem                      1K-blocks    Used Available Use% Mounted on
/dev/sda1                         6596784 1181520   5057120  19% /
udev                              1953420       4   1953416   1% /dev
/dev/mapper/nvp-var               6593712 2349576   3886148  38% /var
/dev/mapper/nvp-var+cloudnet+data 3776568  147908   3417104   5% /var/cloudnet/data

Essentially, there are now three separate partitions instead of just one. Previously, everything was lumped together with the Linux OS in a single partition, so if some runaway log files filled the partition, key services would be impacted. By separating everything out, key controller services like the zookeeper clustering service can still write to disk even if the log locations fill up.

I don’t have access to a pre-6.2.3 setup at the moment, but you can tell if your controller still uses the old partitioning layout by the absence of two partitions in the ‘show status’ output. Both /dev/mapper/nvp-var and /dev/mapper/nvp-var+cloudnet+data exist only on controllers using the new partitioning layout.

Because disk partitioning is a pretty low-level change, there was really no way to incorporate this into the automated upgrade process. To get the new layout, you’ll need to delete and re-deploy the controller appliances.

Scenario 3 – Upgrading to NSX 6.3.3

NSX 6.3.3 introduces a major change to the NSX controllers, replacing the underlying Linux OS with VMware’s new distribution dubbed Photon OS. The virtual hardware also changes slightly in 6.3.3 as the Photon OS based controllers require larger VMDK disks. Because this changes the entire foundation of the VM and is mandatory – unlike the vNIC and partitioning changes mentioned earlier – there is no way to perform in-place code upgrades. Each of the controllers needs to be deleted and re-deployed.

Thankfully, because of the mandatory nature of this change, VMware modified the upgrade process in 6.3.3 to automatically delete and re-deploy controllers for you.

From the NSX 6.3.3 release notes:

“In NSX 6.3.3, the underlying operating system of the NSX Controller changes. This means that when you upgrade to NSX 6.3.3, instead of an in-place software upgrade, the existing controllers are deleted one at a time, and new Photon OS based controllers are deployed using the same IP addresses.”

In other words, no manual intervention is required when upgrading to 6.3.3. Controllers will be deleted and re-deployed automatically as part of the upgrade process. For more information, see the NSX public docs on the subject.

Some Warnings and Cautions

Before I go through the process of destroying and re-creating controller nodes, I really want to preface by saying that this process is potentially risky and should only be done during a maintenance window. It’s also very important that the process be done correctly to ensure you don’t run into any major problems. Below are some common pitfalls and other recommendations:

  1. Never just delete or remove the controller appliances from the vCenter inventory. NSX keeps track of the controllers in its database and doesn’t react well to having appliances yanked from under it. They must be deleted properly.
  2. Never deploy more than three controllers thinking you can just do a ‘cut over’ – i.e. don’t deploy six controllers and then delete the three old ones. A one-to-one replacement must be done; we never want fewer than two functional controllers in the cluster, and never more than three.
  3. If a controller fails to delete using the normal supported method, there is a reason. Don’t force the deletion without speaking to VMware technical support first. A common reason I’ve seen for this is a mismatched moref identifier for the appliance VM. If the NSX database thinks a controller is vm-73, but the actual VM is vm-75, the delete will fail. Removing controllers from the vCenter Inventory and re-adding them will cause this type of mismatch.
  4. It’s very important to validate that the control cluster health is good before proceeding to the next controller for deletion/re-deployment. Do not skip these checks and be patient with this process. Unless you have two fully functional controllers up and running in the cluster, you won’t have full control-plane functionality and you risk a data-plane outage.
  5. If something goes wrong, you’ll still be okay if you have two controllers working in the cluster. Don’t just proceed in the interest of ‘moving forward’, because there is a good chance the other two will behave similarly. Contact VMware support if there is ever any doubt.

A Quick Note on Force Deletion

While trying to delete a controller, you’ll be greeted by a ‘Forcefully Delete’ option. When selected, this option nukes everything related to the controller from the NSX database, and NSX doesn’t care whether the VM appliance is successfully removed or not. This option should never be used unless advised by VMware support for repairing specific cluster problems. As mentioned in the previous section, if a regular delete fails, there is always a specific reason. Using ‘Forcefully Delete’ to work around these problems can leave remnants behind and potentially cause problems with the cluster.

Here is the warning presented by the NSX UI when you try to forcefully delete a cluster node:

“Forcefully deleting a controller may result into NSX Controller cluster going down and the rest of the controllers may get disconnected, thereby resulting in problems like no majority and data inconsistency. Many operations like adding logical components will not be possible. If you still choose to delete the controller, it is recommended to also delete the rest of the controllers and recreate them.”

It’s also worth mentioning that the only time you’d need to forcefully delete a controller in a normal workflow is when deleting the last of three controllers. NSX will only delete the very last controller using the force option. Because we’re only removing one at a time, this should not apply here.

Controller Re-deployment Process

Again, you won’t need to use this process if you are upgrading to NSX 6.3.3 or later because the deletion and re-creation of appliances is handled in an automated manner. If you’d like to take advantage of a VMXNET3 adapter and/or the new partitioning layout in newer versions of NSX, please read on.

The overall goal here is to replace the NSX control cluster members one at a time, keeping in mind that as long as two controller nodes are online and healthy, the control-plane continues to function. In theory, you shouldn’t suffer any kind of control-plane or data-plane outage using this process.

Edit 11/15/2017: As you may be aware, there have been a few new bugs discovered, including one that impacts the deployment/re-deployment of NSX controllers in 6.3.3 and 6.3.4. Please be sure to have a look at my post on the subject as well as the VMware KB before proceeding. If you are running 6.3.3, do not delete your controllers until you’ve implemented the workaround or patched. If you still have the old 6.3.3 upgrade bundle, you may not be able to upgrade.

Step 1 – Data collection and preparation

Before proceeding, we’ll need to collect some information about our current controller deployment. In order to deploy a controller, the following information is required:

  1. The vSphere Cluster that your controllers will live in.
  2. The datastore you want to use for your controllers.
  3. The network portgroup (standard or distributed) that your controllers are in.
  4. If you used a specific naming convention for your controllers, be sure to note it down.
  5. And finally, the IP address pool that’s used for the controllers. Note that when deleting controllers using this method, an IP will be freed up from the pool so even with just three IPs in a pool, this process should work.

Be sure to get the above information from the vSphere Web Client before proceeding so that you don’t have to go looking for it during the process.

Step 2 – Validate the control-cluster health

Before you begin the process, it’s very important to ensure you have a functional control cluster with all nodes connected to the cluster majority. As tempting as it may be, do not skip this check.

controller-redeploy-4

Checking in the UI is a good first place to look for obvious signs of trouble, but I would not rely on this method alone. If everything is green in the UI, log into each of the three controllers via SSH and run the show control-cluster status command:

nsx-controller # show control-cluster status
Type Status Since
--------------------------------------------------------------------------------
Join status: Join complete 08/25 15:26:19
Majority status: Connected to cluster majority 08/25 15:30:45
Restart status: This controller can be safely restarted 08/25 15:31:09
Cluster ID: f2849ee2-7fb6-4aca-abf4-2ca176337956
Node UUID: 309611b3-2441-4a1a-a12b-a5537e999c23

Role Configured status Active status
--------------------------------------------------------------------------------
api_provider enabled activated
persistence_server enabled activated
switch_manager enabled activated
logical_manager enabled activated
directory_server enabled activated

There are several key things you’ll want to validate before proceeding.

  1. The Join status must read ‘Join complete’. No other status is acceptable.
  2. The Majority status must read ‘Connected to cluster majority’.
  3. The Restart status must read ‘This controller can be safely restarted’.
  4. Each controller node must have the same ‘Cluster ID’.

If all three controllers look good, you can proceed.

Step 3 – Delete the first controller

Once we’ve confirmed the control cluster health is good, we can delete the first controller from the NSX UI. It doesn’t matter which one you do first, but in my example, I’ll start with controller-3 and work my way backwards.

To delete, simply select the controller in the ‘Management’ tab of the Installation section in the NSX UI and click the little red ‘X’ icon above the list.

controller-redeploy-9

As mentioned earlier, we want to use the normal ‘Delete’ option. Do NOT use ‘Forcefully Delete’.

controller-redeploy-10

NSX will execute several tasks related to the controller VM. First, it will power off the VM appliance, then delete it and remove all references to the controller in the database. It’s not unusual for this process to take 10 minutes or longer.

Once the controller has disappeared from the NSX ‘Management’ tab, it’s very important to check that the appliance itself was actually deleted from the vCenter inventory.

controller-redeploy-11

Check for both the successful power off and deletion tasks in the recent tasks pane and also confirm the VM is no longer present in the inventory.

Finally, we’ll want to check the cluster health from the other two surviving nodes using the same show control-cluster status command we used earlier. Ensure that both controllers look healthy.

I’d also recommend ensuring that the cluster is now made up of only two nodes from the controllers’ own perspective. Just because NSX Manager says there are two doesn’t necessarily mean the controllers agree. You can check this using the show control-cluster startup-nodes command:

nsx-controller # show control-cluster startup-nodes
172.16.10.43, 172.16.10.44

As seen above, my control cluster confirms only two members.

Step 4 – Replace the Deleted Controller.

Once the first controller has been deleted successfully and we’ve confirmed the health of the control cluster, we can go ahead and deploy a new one.

controller-redeploy-12

The process should be very straightforward and is the same as what was done during the initial deployment of NSX. Keep in mind that the name you specify is simply a label and that the moref identifier of the new controller will change.

controller-redeploy-13

NSX will report the new controller in the ‘Deploying’ status for some time, but you can monitor the tasks and events from the vSphere Web Client:

controller-redeploy-14

You can also watch the console of the new controller to confirm that it has finished joining the cluster and is ready for logins. It will usually sit at ‘Fetching initial configuration data’ for some time before it’s ready:

controller-redeploy-15

Once it’s powered up and ready, you can log in via the CLI and ensure that the ‘show control-cluster status’ output looks healthy as described earlier and that there are three startup-nodes again:

nsx-controller # show control-cluster status
Type Status Since
--------------------------------------------------------------------------------
Join status: Join complete 08/25 17:47:16
Majority status: Connected to cluster majority 08/25 17:47:13
Restart status: This controller can be safely restarted 08/25 17:47:14
Cluster ID: f2849ee2-7fb6-4aca-abf4-2ca176337956
Node UUID: f9a2d207-bf57-4f23-b075-1eefc58bfc8d

Role Configured status Active status
--------------------------------------------------------------------------------
api_provider enabled activated
persistence_server enabled activated
switch_manager enabled activated
logical_manager enabled activated
directory_server enabled activated

nsx-controller # show control-cluster startup-nodes
172.16.10.43, 172.16.10.44, 172.16.10.45

As seen above, my new controller is online and healthy. Most importantly it agrees with the other two controllers on the ID of the cluster and number of startup nodes.

This would also be a good time to run ‘show status’ on the new controller to confirm that it has the new partitioning layout discussed earlier.

Step 5 – Rinse and Repeat.

It’s extremely important to verify the cluster health before proceeding with the deletion of the next cluster node. Aside from the checks in the previous section, this would also be a good time to do some basic connectivity tests. Make sure your distributed routers are functional and that your guests connected to logical switches are working normally.

If you delete the next controller while the cluster is in a bad state, there is a good chance you’ll be down to a single node and will be operating in a ‘read-only’ state. In this condition, any VTEP, ARP or MAC table changes in the environment – like those triggered by vMotions, etc – would fail to propagate. This is definitely not a situation you’d want to be in.

Once you are sure it’s safe to proceed, simply repeat steps 3 and 4 above for the remaining two controllers.

Conclusion

So there you have it. The process can be a bit of a nail-biting experience in a production environment, but if you take the appropriate precautions everything should work without a hitch. The reward for your patience will be a more resilient control cluster with virtual hardware configured as VMware intended.

Thanks for reading! If you have any questions, please feel free to post below.

Building a Retro Gaming Rig – Part 1

Welcome to a new hardware build series where I’ll be sharing my experiences building a retro Intel gaming system. In part 1, I’ll be going over some of the hardware I picked out for this build and doing a bit of a photo shoot. I apologize in advance for the copious amounts of trivial history and nostalgic rambling in this post, so please feel free to skip the first few sections if you’d like to get straight to the gear. For those who appreciate the trip down memory lane, please read on! 🙂

Introduction

Twenty seventeen has been a real year of nostalgia for me when it comes to PC hardware. It all started earlier this summer when my brother-in-law was cleaning out his basement and gave me a couple of classic PC systems. One was a 1993 era DEC 486 and the other a 1995 era NEC Pentium 90 system. These systems brought back so many good memories and were a real joy to use and restore. The first PC we had in my house growing up was a much older clone from the late 80s – possibly an 8088 or 286 – but I was really too young to appreciate it and preferred my NES/SNES at the time. I remember teaching myself how to use DOS with it, but it wasn’t until 1994 that I could convince my parents that I needed a new computer – you know, for gaming educational purposes. After doing some research and shopping around, I got a shiny new 486 DX2/66. It may not have been state of the art at the time, but with 8MB of RAM, a CD-ROM, a 430MB hard drive and a real Sound Blaster 16, it ran like a dream. I spent hours upon hours on that machine, and it wasn’t long before I had added a 14.4 modem and really got into the BBS scene as well. About a year later, my parents surprised me with a top-of-the-line NEC Pentium 100 system with 16MB of RAM, and the 486 replaced the old monochrome 8088 in the basement. Coincidentally, it was almost identical to the Pentium 90 given to me by my brother-in-law.

The Goal

Although these two retro rigs were a lot of fun to restore and use, I wanted to build a machine that was a bit more flexible – something that could be customized and used not only for demanding DOS/Windows 9x games but also for most of the classics. The main issue with the two older machines is that they are both proprietary AT based systems with many BIOS quirks, compatibility issues and other limitations. Even the Pentium 90 with 24MB of RAM just wasn’t fast enough for newer DOS games like Quake. In short, I wanted a custom retro build that was fast, reliable and compatible – something with some nostalgic value and some pizzazz, but also well suited for daily use.

Continue reading “Building a Retro Gaming Rig – Part 1”

Finding moref IDs for NSX API Calls

NSX was designed from the ground up to be very configurable via restful API calls. You can create, modify and remove objects and configuration using APIs. This also makes NSX very powerful with automation and other cloud management platforms and tools.

One of the most common questions I get from those getting started with API calls is about the unique identifiers – often referred to as ‘morefs’ or ‘managed object reference’ identifiers – used in these calls. The vCenter Server ‘Managed Object Browser’ is often used as the primary method to obtain these moref identifiers, but it can be a bit daunting to navigate for those unfamiliar with it. It also requires full administrative privileges to vCenter, so it may not be an option for all users. Another way to find them is to use PowerNSX – but again, you may not be comfortable with PowerCLI yet, or maybe it’s not deployed in the environment.

Not to worry – there are some surprisingly simple ways to obtain moref identifiers for all kinds of vCenter and NSX objects. In this post, I’ll show some of these tricks that I’ve picked up over time.

NSX Edges and DLRs

NSX edges and DLRs are the easiest objects to obtain morefs for. VMware very thoughtfully included a column in the vSphere Web Client NSX Edges view called ‘Id’.

nsx-moref1

Above, you can see that edge ‘esg-a1’ has a unique identifier called ‘edge-2’. This ID is indeed the moref.

Another way to get this information would be to use the NSX Manager Central CLI feature. From the manager CLI, the following command will get you a similar output:

nsxmanager.lab.local> show edge all
NOTE: CLI commands for Edge ServiceGateway(ESG) start with 'show edge'
 CLI commands for Distributed Logical Router(DLR) Control VM start with 'show edge'
 CLI commands for Distributed Logical Router(DLR) start with 'show logical-router'
 Edges with version >= 6.2 support Central CLI and are listed here
Legend:
Edge Size: Compact - C, Large - L, X-Large - X, Quad-Large - Q
Edge ID Name Size Version Status
edge-1 dlr-a1 C 6.2.5 GREEN
edge-2 esg-a1 C 6.2.5 GREEN
edge-3 esg-a2 C 6.2.5 GREEN

Virtual Machines and Appliances

In the previous section, we looked at the NSX Edge moref identifiers, but these appliances also exist as unique virtual machines from a vCenter Server perspective. Every virtual machine in the vCenter inventory – including controllers, DLRs, ESGs, etc. – also has a vm-X moref identifier. For example, my DLR called dlr-a1 is edge-1 from an NSX perspective, but it actually exists as two DLR appliances in high availability mode. Having virtual machine moref identifiers allows us to uniquely identify each appliance.

To find out a virtual machine’s moref identifier, an easy method to use is to look at the URL in the vSphere Web Client. For example, I’ve gone to the ‘Hosts and Clusters’ view in my lab and selected the first of two dlr-a1 appliances:

nsx-moref2

At the end of the URL string in the address bar, we can see that the moref identifier is included. It’s somewhat stuffed in there and not always noticeable, but knowing that the virtual machine moref always begins with ‘vm-’ followed by a numerical value, we are able to pick it out. In the above example, the DLR appliance known as dlr-a1-0 has a virtual machine moref value of vm-675.

Looking at the second of the two appliances in the same way – the standby in the HA pair – I get a different virtual machine moref value of vm-677.

NSX Controllers

From an NSX perspective, NSX controllers also have a moref identifier prefixed by ‘controller-‘. Again, VMware thoughtfully included this information underneath the controller IP address in the ‘Controller Node’ column. This can be found in the Installation section of the NSX UI in the vSphere Web Client under the ‘Management’ tab.

nsx-moref3

One thing to be careful about is that the ‘Name’ column does not necessarily provide the moref identifier. NSX 6.2 added the ability to give controllers a friendly name. Just like ESGs and DLRs, NSX controllers also have virtual machine moref identifiers that are sometimes needed.

vSphere Clusters

As you have probably noticed, vSphere clusters are often the configuration delimiter for many things in NSX, including host preparation, firewall status and so on. As such, many API calls will reference cluster objects. The cluster moref is not prefixed by ‘cluster-‘ as you might expect, but rather by ‘domain-c‘. To determine the cluster moref, we can use the same trick that we used for finding the virtual machine IDs – as you’ll discover, the URL in the vSphere Web Client address bar can tell us the moref for numerous objects.

nsx-moref4

In my lab above, cluster compute-a is domain-c121.

You can also obtain a list of NSX prepared clusters and their moref IDs using the NSX Central CLI. From an NSX manager CLI prompt:

nsxmanager.lab.local> show cluster all
No. Cluster Name Cluster Id Datacenter Name Firewall Status
1 compute-r domain-c641 lab Enabled
2 compute-vcp domain-c705 lab Not Ready
3 compute-a domain-c121 lab Enabled
4 management domain-c205 lab Not Ready

ESXi Hosts

There are a couple of different ways that you can obtain the moref for an ESXi host. As you’d expect, the moref is always prefixed by ‘host-‘. Just like for clusters and virtual machines, you can select the object in the vSphere Web Client and get the host moref from the address bar. Alternatively, there is another easy way to get the host moref from any NSX prepared host from the CLI.

When ESXi hosts are prepared, NSX pushes down a number of RabbitMQ configuration variables that instruct the host how to communicate with NSX Manager. One of these parameters, /UserVars/RmqHostId, actually includes the host moref.

[root@esx-a1:~] esxcfg-advcfg -g /UserVars/RmqHostId
Value of RmqHostId is host-223

As you can see, host esx-a1 has a moref of host-223.
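
If you just want the moref value on its own – for scripting purposes, for example – trimming the output should do the trick. This assumes awk is available in your ESXi shell, which it normally is via busybox:

[root@esx-a1:~] esxcfg-advcfg -g /UserVars/RmqHostId | awk '{print $NF}'

In the example above, this would print just host-223.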

Another option is to use the NSX Central CLI for viewing the contents of a cluster. Once you have the cluster ID – domain-c121 in my example – the following command can be used to view the hosts in the cluster and get the moref identifiers:

nsxmanager.lab.local> show cluster domain-c121
Datacenter: lab
Cluster: compute-a
No. Host Name Host Id Installation Status
1 esx-a1.lab.local host-223 Enabled
2 esx-a2.lab.local host-225 Enabled

Transport Zones

The moref IDs for transport zones aren’t required often, but if you ever need to find one, the NSX Central CLI can get you this information.

The output below lists all logical switches, but also correlates each transport zone name with a moref prefixed by 'vdnscope-'. Logical switch UUID values can also be identified using this command:

nsxmanager.lab.local> show logical-switch list all
NAME UUID VNI Trans Zone Name Trans Zone ID
Transit VXLAN1 210b616b-691a-470f-ad1e-1cc24b485d0d 5000 Primary TZ vdnscope-1
Blue Network 7a068fb5-9b17-4779-9b3d-0d81a439189f 5001 Primary TZ vdnscope-1
Green Network 22f01e70-18ae-4d3e-a683-be08d244e919 5002 Primary TZ vdnscope-1
Yellow Network 8510eedb-e6a8-41a0-bae0-79f3af8630be 5003 Primary TZ vdnscope-1
Red Network 4996f8f1-68c9-43de-9207-fbe038543133 5004 Primary TZ vdnscope-1
Purple Network bdd16f8e-805f-45df-a5c7-00a0ce319cc9 5005 Primary TZ vdnscope-1
Test Network A 72597655-ad88-4e32-9baf-c724d49c9a7c 5006 Primary TZ vdnscope-1

As you can see above, I have only one transport zone called ‘Primary TZ’ with a moref of vdnscope-1.

Datastores

Some API calls expect input including the moref for a specific datastore. An example would be trying to deploy a new ESG or controller – NSX needs to know where to store the VM. Once again, going to the datastores view in the vSphere Web Client allows us to select the specific datastore and the ‘datastore-‘ prefixed moref is visible in the address bar.

nsx-moref5

Above you can see my datastore called shared-ssd0 equates to moref datastore-621.

IP Pools

Some API calls involving the deployment of objects require the moref identifier of an IP address pool. Unfortunately, this moref can't be found in the GUI, but it can be obtained pretty easily with a couple of quick API calls.

First, we'll use an API call to query all IP pools on the NSX manager. The output will include the moref identifier of the pool we're interested in:

GET https://NSX-Manager-IP-Address/api/2.0/services/ipam/pools/scope/scopeID

As you can see above, the ‘scope ID’ is also required to run this GET call. In every instance I’ve seen, using globalroot-0 as the scopeID works.
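
If you prefer the command line over a REST client, a quick curl call along these lines should return the same XML. The IP address here is just a placeholder for your NSX manager, -k skips certificate validation in a lab, and curl will prompt for the admin password:

curl -k -u admin https://192.168.0.10/api/2.0/services/ipam/pools/scope/globalroot-0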

ippoolAPI-4

The various IP pools will be separated by <ipamAddressPool> XML tags. You’ll want to identify the correct pool based on the IP range listed or by the text in the <name> field. In my example, I want to find the moref for the Controller Pool, which is found in the section below:

<ipamAddressPool>
<objectId>ipaddresspool-1</objectId>
<objectTypeName>IpAddressPool</objectTypeName>
<vsmUuid>4226CDEE-1DDA-9FF9-9E2A-8FDD64FACD35</vsmUuid>
<nodeId>fa4ecdff-db23-4799-af56-ae26362be8c7</nodeId>
<revision>1</revision>
<type>
<typeName>IpAddressPool</typeName>
</type>
<name>Controller Pool</name>
<scope>
<id>globalroot-0</id>
<objectTypeName>GlobalRoot</objectTypeName>
<name>Global</name>
</scope>
...
<snip>

As you can see above, the IP pool is identified by the moref identifier ipaddresspool-1.

Conclusion

And there you have it. These methods are much easier than navigating through the vCenter Managed Object Browser and don't require full administrative privileges in vCenter. For the address bar trick, you'll need a minimum of read-only privileges on the object you are trying to select, as well as read-only access to the NSX UI in order to view the Edge list and installation section. The NSX Central CLI commands do require that you log in to the NSX manager CLI using an administrator account, but most individuals managing NSX would have this access.

If there are any other morefs or identifiers you are having difficulty locating, please leave a comment and I’d be happy to post some methods.

 

ECMP Path Determination in NSX

ECMP or ‘equal cost multi-pathing’ is a great routing feature that was introduced in NSX 6.1 several years ago. By utilizing multiple egress paths, ECMP allows for better use of network bandwidth for northbound destinations in an NSX environment. As many as eight separate ESG appliances can be used with ECMP – ideally on dedicated ESX hosts in an ‘edge cluster’.

Lab Setup

In my lab, I’ve configured a very simple topology with two ESXi hosts, each with an ESG appliance used for north/south traffic. The two ESGs are configured for ECMP operation:

ecmp-1

The diagram above is very high-level and doesn’t depict the underlying physical infrastructure or ESXi hosts, but should be enough for our purposes. BGP is used exclusively as the dynamic routing protocol in this environment.

Looking at the diagram, we can see that any VMs on the 172.17.1.0/24 network should have a DLR LIF as their default gateway. Because ECMP is being used, the DLR instance should in theory have an equal cost route to all northbound destinations via 172.17.0.10 and 172.17.0.11.

Let’s have a look at the DLR routing table:

dlr-a1.lab.local-0> sh ip route

Codes: O - OSPF derived, i - IS-IS derived, B - BGP derived,
C - connected, S - static, L1 - IS-IS level-1, L2 - IS-IS level-2,
IA - OSPF inter area, E1 - OSPF external type 1, E2 - OSPF external type 2,
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2

Total number of routes: 17

B 10.40.0.0/25 [200/0] via 172.17.0.10
B 10.40.0.0/25 [200/0] via 172.17.0.11
C 169.254.1.0/30 [0/0] via 169.254.1.1
B 172.16.10.0/24 [200/0] via 172.17.0.10
B 172.16.10.0/24 [200/0] via 172.17.0.11
B 172.16.11.0/24 [200/0] via 172.17.0.10
B 172.16.11.0/24 [200/0] via 172.17.0.11
B 172.16.12.0/24 [200/0] via 172.17.0.10
B 172.16.12.0/24 [200/0] via 172.17.0.11
B 172.16.13.0/24 [200/0] via 172.17.0.10
B 172.16.13.0/24 [200/0] via 172.17.0.11
B 172.16.14.0/24 [200/0] via 172.17.0.10
B 172.16.14.0/24 [200/0] via 172.17.0.11
B 172.16.76.0/24 [200/0] via 172.17.0.10
B 172.16.76.0/24 [200/0] via 172.17.0.11
C 172.17.0.0/26 [0/0] via 172.17.0.2
C 172.17.1.0/24 [0/0] via 172.17.1.1
C 172.17.2.0/24 [0/0] via 172.17.2.1
C 172.17.3.0/24 [0/0] via 172.17.3.1
C 172.17.4.0/24 [0/0] via 172.17.4.1
C 172.17.5.0/24 [0/0] via 172.17.5.1
B 172.19.7.0/24 [200/0] via 172.17.0.10
B 172.19.7.0/24 [200/0] via 172.17.0.11
B 172.19.8.0/24 [200/0] via 172.17.0.10
B 172.19.8.0/24 [200/0] via 172.17.0.11
B 172.31.255.0/26 [200/0] via 172.17.0.10
B 172.31.255.0/26 [200/0] via 172.17.0.11

As seen above, all of the BGP learned routes from northbound locations – 172.19.7.0/24 included – have both ESGs listed with a cost of 200. In theory, the DLR could use either of these two paths for northbound routing. But which path will actually be used?

Path Determination and Hashing

In order for ECMP to load balance across multiple L3 paths effectively, some type of load balancing algorithm is required. Many physical L3 switches and routers use configurable load balancing algorithms, including more complex ones based on a 5-tuple hash that takes source and destination TCP ports into consideration as well. The more criteria available for analysis by the algorithm, the more likely traffic will be well balanced.

NSX’s implementation of ECMP does not include a configurable algorithm, but rather keeps things simple by using a hash based on the source and destination IP address. This is very similar to the hashing used by static etherchannel bonds – IP hash as it’s called in vSphere – and is generally not very resource intensive to calculate. With a large number of source/destination IP address combinations, a good balance of traffic across all paths should be attainable.

A few years ago, I wrote an article for the VMware Support Insider blog on manually calculating the hash value and determining the uplink when using IP hash load balancing. The general concept used by NSX for ECMP calculations is pretty much the same. Rather than calculating an index value associated with an uplink in a NIC team, we calculate an index value associated with an entry in the routing table.
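
To illustrate the general idea – and this is only a conceptual sketch, not the actual NSX hashing code – a simple source/destination IP hash could be modelled in bash like this, using a couple of addresses from my lab and two available paths:

# Conceptual sketch only - not the real NSX ECMP hash.
ip_to_int() {
  local IFS=.
  read -r a b c d <<< "$1"
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}
src=$(ip_to_int 172.17.1.100)    # source VM
dst=$(ip_to_int 172.19.7.100)    # northbound destination
paths=2                          # number of equal cost next hops
echo "selected path index: $(( (src ^ dst) % paths ))"

The point is simply that the same source/destination pair always produces the same index, so a given flow always takes the same path.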

Obviously, the simplest method of determining the path used would be a traceroute from the source machine. Let's do this on the win-a1.lab.local virtual machine:

C:\Users\Administrator>tracert -d 172.19.7.100

Tracing route to 172.19.7.100 over a maximum of 30 hops

 1 <1 ms <1 ms <1 ms 172.17.1.1
 2 4 ms <1 ms <1 ms 172.17.0.10
 3 1 ms 1 ms 1 ms 172.31.255.3
 4 2 ms 1 ms <1 ms 10.40.0.7
 5 4 ms 1 ms 1 ms 172.19.7.100

Trace complete.

The first hop, 172.17.1.1, is the DLR LIF address – the default gateway of the VM. The second hop is 172.17.0.10, which is esg-a1. Clearly the hashing algorithm picked the first of two possible paths in this case. If I try a different northbound address – in this case 172.19.7.1, a northbound router interface – we see a different result:

C:\Users\Administrator>tracert -d 172.19.7.1

Tracing route to 172.19.7.1 over a maximum of 30 hops

 1 <1 ms <1 ms <1 ms 172.17.1.1
 2 <1 ms <1 ms <1 ms 172.17.0.11
 3 1 ms 1 ms 1 ms 172.31.255.3
 4 2 ms 1 ms 1 ms 172.19.7.1

Trace complete.

This destination uses esg-a2. If you repeat the traceroute, you’ll notice that as long as the source and destination IP address remains the same, the L3 path up to the ESGs also remains the same.

Traceroute is well and good, but what if you don’t have access to SSH/RDP into the guest? Or if you wanted to test several hypothetical IP combinations?

Using net-vdr to Determine Path

Thankfully, NSX includes a special net-vdr option to calculate the expected path. Let’s run it through its paces.

Keep in mind that because the ESXi hosts are actually doing the datapath routing, it’s there that we’ll need to do this – not on the DLR control VM appliance. Since my source VM win-a1 is on host esx-a1, I’ll SSH there. It should also be noted that it really doesn’t matter which ESXi host you use to check the path. Because the DLR instance is the same across all configured ESXi hosts, the path selection is also the same.

First, we determine the name of the DLR instance using the net-vdr -I -l command:

[root@esx-a1:~] net-vdr -I -l

VDR Instance Information :
---------------------------

Vdr Name: default+edge-1
Vdr Id: 0x00001388
Number of Lifs: 6
Number of Routes: 16
State: Enabled
Controller IP: 172.16.10.43
Control Plane IP: 172.16.10.21
Control Plane Active: Yes
Num unique nexthops: 2
Generation Number: 0
Edge Active: No

In my case, I’ve got only one instance called default+edge-1. The ‘Vdr Name’ will include the tenant name, followed by a ‘+’ and then the edge-ID which is visible in the NSX UI.
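
If you only need the instance name itself – to feed into a follow-up command, for example – filtering the same output works fine:

[root@esx-a1:~] net-vdr -I -l | grep "Vdr Name"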

Next, let’s take a look at the DLR routing table from the ESXi host’s perspective. Earlier we looked at this from the DLR control VM, but ultimately this data needs to make it to the ESXi host for routing to function. These BGP learned routes originated on the control VM, were sent to the NSX control cluster and then synchronized with ESXi via the netcpa agent.

[root@esx-a1:~] net-vdr -R -l default+edge-1

VDR default+edge-1 Route Table
Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
Legend: [H: Host], [F: Soft Flush] [!: Reject] [E: ECMP]

Destination GenMask Gateway Flags Ref Origin UpTime Interface
----------- ------- ------- ----- --- ------ ------ ---------
10.40.0.0 255.255.255.128 172.17.0.10 UGE 1 AUTO 1189684 138800000002
10.40.0.0 255.255.255.128 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.16.10.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.16.10.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.16.11.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.16.11.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.16.12.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.16.12.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.16.13.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.16.13.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.16.14.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.16.14.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.16.76.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.16.76.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.17.0.0 255.255.255.192 0.0.0.0 UCI 1 MANUAL 1189684 138800000002
172.17.1.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 1189684 13880000000a
172.17.2.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 1189684 13880000000b
172.17.3.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 1189684 13880000000c
172.17.4.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 1189684 13880000000d
172.17.5.0 255.255.255.0 0.0.0.0 UCI 1 MANUAL 1189684 13880000000e
172.19.7.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.19.7.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.19.8.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.19.8.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1189684 138800000002
172.31.255.0 255.255.255.192 172.17.0.10 UGE 1 AUTO 1189684 138800000002
172.31.255.0 255.255.255.192 172.17.0.11 UGE 1 AUTO 1189684 138800000002

The routing table looks similar to the control VM, with a few exceptions. From this view, we don’t know where the routes originated – only if they are connected interfaces or a gateway. Just as we saw on the control VM, the ESXi host also knows that each gateway route has two equal cost paths.

Now let’s look at a particular net-vdr option called ‘resolve’:

[root@esx-a1:~] net-vdr --help
<snip>
--route -o resolve -i destIp [-M destMask] [-e srcIp] vdrName Resolve a route in a vdr instance
<snip>

Plugging in the same combination of source/destination IP I used in the first traceroute, I see that the net-vdr module agrees:

[root@esx-a1:~] net-vdr --route -o resolve -i 172.19.7.100 -e 172.17.1.100 default+edge-1

VDR default+edge-1 Route Table
Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
Legend: [H: Host], [F: Soft Flush] [!: Reject] [E: ECMP]

Destination GenMask Gateway Flags Ref Origin UpTime Interface
----------- ------- ------- ----- --- ------ ------ ---------
172.19.7.0 255.255.255.0 172.17.0.10 UGE 1 AUTO 1190003 138800000002

As seen above, the output of the command identifies the specific route in the routing table that will be used for that source/destination IP address pair. As we confirmed in the traceroute earlier, esg-a1 (172.17.0.10) is used.

We can run the same check with a different source/destination combination to see whether the hash selects the other path via esg-a2 (172.17.0.11):

[root@esx-a1:~] net-vdr --route -o resolve -i 172.19.7.100 -e 172.17.1.1 default+edge-1

VDR default+edge-1 Route Table
Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
Legend: [H: Host], [F: Soft Flush] [!: Reject] [E: ECMP]

Destination GenMask Gateway Flags Ref Origin UpTime Interface
----------- ------- ------- ----- --- ------ ------ ---------
172.19.7.0 255.255.255.0 172.17.0.11 UGE 1 AUTO 1190011 138800000002

And indeed it does.

What About Ingress Traffic?

NSX’s implementation of ECMP is applicable to egress traffic only. The path selection done northbound of the ESGs would be at the mercy of the physical router or L3 switch performing the calculation. That said, you’d definitely want ingress traffic to also be balanced for efficient utilization of ESGs in both directions.

Without an appropriate equal cost path configuration on the physical networking gear, you may find that all return traffic or ingress traffic uses only one of the available L3 paths.

In my case, I’m using a VyOS routing appliance just northbound of the edges called router-core.

vyos@router-core1:~$ sh ip route
Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
 I - ISIS, B - BGP, > - selected route, * - FIB route

C>* 10.40.0.0/25 is directly connected, eth0.40
C>* 127.0.0.0/8 is directly connected, lo
B>* 172.16.10.0/24 [20/1] via 172.31.255.1, eth0.20, 02w0d01h
B>* 172.16.11.0/24 [20/1] via 172.31.255.1, eth0.20, 02w0d01h
B>* 172.16.12.0/24 [20/1] via 172.31.255.1, eth0.20, 02w0d01h
B>* 172.16.13.0/24 [20/1] via 172.31.255.1, eth0.20, 02w0d01h
B>* 172.16.14.0/24 [20/1] via 172.31.255.1, eth0.20, 02w0d01h
B>* 172.16.76.0/24 [20/1] via 172.31.255.1, eth0.20, 02w0d01h
B>* 172.17.0.0/26 [20/0] via 172.31.255.10, eth0.20, 02w0d00h
 * via 172.31.255.11, eth0.20, 02w0d00h
B>* 172.17.1.0/24 [20/0] via 172.31.255.10, eth0.20, 02w0d00h
 * via 172.31.255.11, eth0.20, 02w0d00h
B>* 172.17.2.0/24 [20/0] via 172.31.255.10, eth0.20, 02w0d00h
 * via 172.31.255.11, eth0.20, 02w0d00h
B>* 172.17.3.0/24 [20/0] via 172.31.255.10, eth0.20, 02w0d00h
 * via 172.31.255.11, eth0.20, 02w0d00h
B>* 172.17.4.0/24 [20/0] via 172.31.255.10, eth0.20, 02w0d00h
 * via 172.31.255.11, eth0.20, 02w0d00h
B>* 172.17.5.0/24 [20/0] via 172.31.255.10, eth0.20, 02w0d00h
 * via 172.31.255.11, eth0.20, 02w0d00h
B>* 172.19.7.0/24 [20/1] via 10.40.0.7, eth0.40, 02w0d01h
B>* 172.19.8.0/24 [20/1] via 10.40.0.7, eth0.40, 01w6d20h
C>* 172.31.255.0/26 is directly connected, eth0.20

As you can see above, this router is also allowing multiple equal cost paths – all of the southbound networks learned by BGP appear twice in the routing table. This was achieved by simply configuring BGP with a 'maximum-paths' value greater than '1':

vyos@router-core1:~$ sh configuration
<snip>
protocols {
 bgp 64512 {
 maximum-paths {
 ebgp 4
 ibgp 4
 }
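
For reference, the equivalent configuration commands should look something like the following sketch based on the hierarchy above – adjust the ASN to match your environment:

configure
set protocols bgp 64512 maximum-paths ebgp 4
set protocols bgp 64512 maximum-paths ibgp 4
commit
save
exit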

I'm honestly not sure what load balancing algorithm VyOS implements, but from an NSX perspective, it doesn't really matter. It doesn't need to match NSX's algorithm; it simply needs to balance traffic across each of the available L3 paths. As long as an ingress packet arrives at one of the ESGs, it'll know how to route it southbound.

So there you have it. There is obviously a lot more to ECMP than what I discussed in this post, but hopefully this helps to clarify a bit about path selection.

Finding the NSX VIB Download URL

Although in most cases it's not necessary to obtain the NSX VIBs, there are some situations where you'll need them. The most notable is when incorporating the VIBs into an image profile used for stateless auto-deploy ESXi hosts.

In older versions of NSX, including 6.0.x and 6.1.x, it used to be possible to use the very simple URL format as follows:

https://<nsxmanagerIP>/bin/vdn/vibs/<esxi-version>/vxlan.zip

For example, if you wanted the NSX VIBs for ESXi 5.5 hosts from an NSX manager with IP address 192.168.0.10, you’d use the following URL:

https://192.168.0.10/bin/vdn/vibs/5.5/vxlan.zip

This URL format changed at some point – likely with the introduction of NSX 6.2.x. VMware now uses a less predictable path that includes build numbers. To get the exact locations for your specific version of NSX, you can visit the following URL:

https://<nsxmanagerIP>/bin/vdn/nwfabric.properties

When visiting this URL, you’ll be greeted by several lines of text output, which includes the path to the VIBs for various versions of ESXi. Below is some sample output you’ll see with NSX 6.3.2:

# 5.5 VDN EAM Info
VDN_VIB_PATH.1=/bin/vdn/vibs-6.3.2/5.5-5534162/vxlan.zip
VDN_VIB_VERSION.1=5534162
VDN_HOST_PRODUCT_LINE.1=embeddedEsx
VDN_HOST_VERSION.1=5.5.*

# 6.0 VDN EAM Info
VDN_VIB_PATH.2=/bin/vdn/vibs-6.3.2/6.0-5534166/vxlan.zip
VDN_VIB_VERSION.2=5534166
VDN_HOST_PRODUCT_LINE.2=embeddedEsx
VDN_HOST_VERSION.2=6.0.*

# 6.5 VDN EAM Info
VDN_VIB_PATH.3=/bin/vdn/vibs-6.3.2/6.5-5534171/vxlan.zip
VDN_VIB_VERSION.3=5534171
VDN_HOST_PRODUCT_LINE.3=embeddedEsx
VDN_HOST_VERSION.3=6.5.*

# Single Version associated with all the VIBs pointed by above VDN_VIB_PATH(s)
VDN_VIB_VERSION=6.3.2.5672532

Legacy vib location. Used by code to discover avaialble legacy vibs.
LEGACY_VDN_VIB_PATH_FS=/common/em/components/vdn/vibs/legacy/
LEGACY_VDN_VIB_PATH_WEB_ROOT=/bin/vdn/vibs/legacy/

So as you can see above, the VIB path for ESXi 6.5 for example would be:

VDN_VIB_PATH.3=/bin/vdn/vibs-6.3.2/6.5-5534171/vxlan.zip

And to access this as a URL, you’d simply tack this path onto the normal NSX https location as follows:

https://<nsxmanagerIP>/bin/vdn/vibs-6.3.2/6.5-5534171/vxlan.zip

The file itself is about 17MB in size in 6.3.2. Despite being named vxlan.zip, it actually contains two VIBs: the VSIP module used for DFW and message bus purposes, and the VXLAN module used for logical switching and routing.
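
If you need to automate this – for an image build pipeline, for example – a couple of curl commands along these lines should do it. I'm reusing the example NSX manager IP of 192.168.0.10 here, and -k simply skips certificate validation:

curl -k https://192.168.0.10/bin/vdn/nwfabric.properties | grep VDN_VIB_PATH
curl -k -o vxlan.zip https://192.168.0.10/bin/vdn/vibs-6.3.2/6.5-5534171/vxlan.zip

The first command lists the VIB paths for each ESXi version, and the second downloads the 6.5 bundle from my 6.3.2 example above.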

NSX 6.2.8 Released!

It’s always an exciting time at VMware when a new NSX build goes ‘GA’. Yesterday (July 6th) marks the official release of NSX-V 6.2.8.

NSX 6.2.8 is a maintenance or patch release focused mainly on bug fixes, and there are quite a few in this one. You can head over to the release notes for the full list, but I’ll provide a few highlights below.

Before I do that, here are the relevant links:

In my opinion, some of the most important fixes in 6.2.8 include the following:

Fixed Issue 1760940: NSX Manager High CPU triggered by many simultaneous vMotion tasks

This was a fairly common issue that we would see in larger deployments with large numbers of dynamic security groups. The most common workflow that would trigger this would be putting a host into maintenance mode, triggering a large number of simultaneous vMotions. I’m happy to see that this one was finally fixed. Unfortunately, it doesn’t seem that this has yet been corrected in any of the 6.3.x releases, but I’m sure it will come. You can find more information in VMware KB 2150668.

Fixed Issue 1854519: VMs lose North to south connectivity after migration from a VLAN to a bridged VXLAN

This next one is not quite so common, but I’ve personally seen a couple of customers hit this. If you have a VM in a VLAN network, and then move it to the VXLAN dvPortgroup associated with the bridge, connectivity is lost. This happens because a RARP doesn’t get sent to update the physical switch’s MAC table (VMware often uses RARP instead of GARP for this purpose). Most customers would use the VLAN backed half of the bridged network for physical devices, and not for VMs, but there is no reason why this shouldn’t work.

Fixed Issue 1849037: NSX Manager API threads get exhausted when communication link with NSX Edge is broken

I think this one is pretty self-explanatory – if the manager can't process API calls, a lot of its functionality is lost. There are a number of reasons an ESXi host could lose its communication channel to the NSX manager, so this is definitely a good fix.

Fixed Issue 1813363: Multiple IP addresses on same vNIC causes delays in firewall publish operation

Another good fix that should help to reduce NSX manager CPU utilization and improve scale. Multiple IPs on vNICs is fairly common to see.

Fixed Issue 1798537: DFW controller process on ESXi (vsfwd) may run out of memory

I've seen this issue a few times in very large micro-segmentation environments, and I'm very happy to see it fixed. This should certainly help improve stability and environment scale.

FreeNAS Power Consumption and ACPI

As I alluded to in part 4 and part 5 of my recent 'FreeNAS 9.10 Lab Build' series, I was able to achieve better power consumption figures after enabling deeper C-states. I first noticed this a year or two ago when I re-purposed a Shuttle SH67H3 cube PC for use with FreeNAS. I was familiar with the expected power draw of this system from using it with other operating systems in the past, but with FreeNAS installed, it seemed higher than it should have been.

After doing some digging on the subject, I came across a thread on the FreeNAS forums describing the default ACPI C-state used and information on how to modify it.

A Bit of Background on ACPI C-States

What are often referred to as ‘C-States’ are basically levels of CPU power savings defined by the ACPI (Advanced Configuration and Power Interface) standard. This standard defines states for other devices in the system as well – not just CPUs – but all states prefixed by a ‘C’ refer to CPU power states.

The states supported by a system will often vary depending on the age of the system and the CPU manufacturer, but most modern Intel based systems will support ACPI states C0, C1, C2 and C3. Some older systems only supported C1 and a lower power state called C1E.

C0 is essentially the state where the CPU is completely awake, operating at its full frequency and performance potential and actually executing instructions. The higher the ‘C’ value, the deeper into power savings or sleep modes the CPU can go.

All ACPI compliant systems must support another state called C1. In the C1 state, the CPU is basically at idle and isn’t executing any instructions. The key requirement for the C1 state is that the CPU must be able to return to the C0 state to execute instructions immediately without any latency or delay. Because of this requirement, there are only so many power saving tweaks that the CPU can implement.

This is where the C2 state comes in, also known as 'Stop-Clock'. In this ACPI idle state, additional power saving features can be used, and this is where modern Intel processors shine. They can do all sorts of power saving wizardry, like turning off unused portions of the CPU or even entire idle CPU cores. Because there can be a very slight delay in returning to the C0 state when using these features, they cannot be implemented when limited to ACPI C1.

Enabling Deeper ACPI C-States

By default, FreeNAS has ACPI configured to use only the C1 power state. Presumably, this is to guarantee maximum performance and prevent any quirks with older CPUs switching between power states.

If maximum performance and performance consistency are desired over power savings – as is often the case in critical production environments – leaving the default at C1 is probably a wise choice. This choice also becomes much less important if your system is heavily utilized and spends very little time at idle anyway. But if you are like me, running a home lab, idle power consumption is an important consideration.

From an SSH or console prompt, you can determine the detected and supported C-States by querying the relevant ACPI sysctls:

[root@freenas] ~# sysctl -a | grep cx_
hw.acpi.cpu.cx_lowest: C1
dev.cpu.3.cx_usage: 100.00% 0.00% last 19us
dev.cpu.3.cx_lowest: C1
dev.cpu.3.cx_supported: C1/1/1 C2/3/96
dev.cpu.2.cx_usage: 100.00% 0.00% last 136us
dev.cpu.2.cx_lowest: C1
dev.cpu.2.cx_supported: C1/1/1 C2/3/96
dev.cpu.1.cx_usage: 100.00% 0.00% last 2006us
dev.cpu.1.cx_lowest: C1
dev.cpu.1.cx_supported: C1/1/1 C2/3/96
dev.cpu.0.cx_usage: 100.00% 0.00% last 2101us
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_supported: C1/1/1 C2/3/96

As you can see above, my Xeon X3430 reports support for the C1 and C2 states. The cx_usage value reports how much time the processor spends in each supported state, and cx_lowest shows the deepest state the CPU is currently allowed to enter.

In the C1 state, my system is idling at about 95W total power consumption.

I had originally suspected that Intel SpeedStep technology (a feature that provides dynamic frequency and voltage scaling at idle) was not functioning in ACPI C1, but that doesn't seem to be the case. My CPU's normal (non turbo-boost) frequency is 2.4GHz. If SpeedStep is functional, I'd expect the CPU to scale down to one of the lower frequency states defined in the following sysctl:

[root@freenas] ~# sysctl -a | grep dev.cpu.0.freq_levels
dev.cpu.0.freq_levels: 2395/95000 2394/95000 2261/78000 2128/63000 1995/57000 1862/46000 1729/36000 1596/32000 1463/25000 1330/19000 1197/17000

As seen above, the processor should be able to scale down to 1197MHz at idle. Even with the powerd daemon stopped as a service, you can run powerd in the foreground to watch the current CPU frequency and see how it changes as load varies. Using the -v verbose option, we can see that the processor frequency does indeed jump up and down in the ACPI C1 state and settles at 1197MHz when idle:

[root@freenas] ~# powerd -v
<snip>
changing clock speed from 1463 MHz to 1330 MHz
load   4%, current freq 1330 MHz ( 9), wanted freq 1263 MHz
load   0%, current freq 1330 MHz ( 9), wanted freq 1223 MHz
load   0%, current freq 1330 MHz ( 9), wanted freq 1197 MHz
changing clock speed from 1330 MHz to 1197 MHz
load  26%, current freq 1197 MHz (10), wanted freq 1197 MHz
load   3%, current freq 1197 MHz (10), wanted freq 1197 MHz
load   0%, current freq 1197 MHz (10), wanted freq 1197 MHz
load   4%, current freq 1197 MHz (10), wanted freq 1197 MHz

You can change the lowest allowed ACPI state by using the following command. In this example, I will allow the system to use more advanced power saving features by setting cx_lowest to C2:

[root@freenas] ~# sysctl hw.acpi.cpu.cx_lowest=C2
hw.acpi.cpu.cx_lowest: C1 -> C2

After making this change, the system power consumption immediately dropped down to about 75W. That’s more than 20% – not bad!

Now if we repeat the earlier query, we can see some different reporting:

[root@freenas] ~# sysctl -a | grep cx_
hw.acpi.cpu.cx_lowest: C2
dev.cpu.3.cx_usage: 0.00% 100.00% last 19695us
dev.cpu.3.cx_lowest: C2
dev.cpu.3.cx_supported: C1/1/1 C2/3/96
dev.cpu.2.cx_usage: 5.12% 94.87% last 23us
dev.cpu.2.cx_lowest: C2
dev.cpu.2.cx_supported: C1/1/1 C2/3/96
dev.cpu.1.cx_usage: 1.87% 98.12% last 255us
dev.cpu.1.cx_lowest: C2
dev.cpu.1.cx_supported: C1/1/1 C2/3/96
dev.cpu.0.cx_usage: 1.28% 98.71% last 1200us
dev.cpu.0.cx_lowest: C2
dev.cpu.0.cx_supported: C1/1/1 C2/3/96

Part of the reason the power savings are so significant is that the system spends over 95% of its idle time in the C2 state.

Making it Stick

One thing you'll notice is that after rebooting, this change reverts to the default. Since this is a sysctl, you'd think it could just be added as a system tunable parameter in the UI. I tried this to no avail.

After doing some digging, I found a bug reported on this. It appears that this is a known problem due to other rc.d scripts interfering with the cx_lowest sysctl.

I took a look in the /etc/rc.d directory and found a script called power_profile. It does indeed appear to be overwriting the ACPI lowest state at bootup:

[root@freenas] ~# cat /etc/rc.d/power_profile
<snip>
# Set the various sysctls based on the profile's values.
node="hw.acpi.cpu.cx_lowest"
highest_value="C1"
lowest_value="Cmax"
eval value=\$${profile}_cx_lowest
sysctl_set
<snip>

I could probably get around this issue by modifying the startup scripts, but the better solution is to simply add the required command to the list of post-init scripts. This can be done from the UI in the following location:

freenas-acpi-1

The key thing is to ensure that the command is run 'post init'. When that is done, the setting sticks and is applied after the power_profile script.
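
For reference, the post-init command I added is simply the same sysctl used earlier:

sysctl hw.acpi.cpu.cx_lowest=C2

If you'd rather allow the deepest state the CPU supports, the power_profile script above suggests that Cmax is also a valid value for this sysctl.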

Hopefully this saves you a few dollars on your power bill as well!