NSX – vswitchzero

NSX-T Troubleshooting Scenario 3 – Solution

Welcome to the third instalment of a new series of NSX-T troubleshooting scenarios. Thanks to everyone who took the time to comment on the first half of the scenario. Today I’ll be performing some troubleshooting and will show how I came to the solution.

Please see the first half for more detail on the problem symptoms and some scoping.

Getting Started

As we saw in the first half, the customer’s management cluster was in a degraded state. This was due to one manager – 172.16.1.41 – being in a wonky half-broken state. Although we could ping it, we could not login and all of the services it was contributing to the NSX management cluster were down.

nsxt-tshoot3a-1

What was most telling, however, was the screenshot of the VM’s console window.

nsxt-tshoot3a-4

The most important keyword there was “Read-only file system”. As many readers had correctly guessed, this is a very common response to an underlying storage problem. Like most flavors of Linux, the Linux-based OS used in the NSX appliances will set their ext4 partitions to read-only in the event of a storage failure. This is a protective mechanism to prevent data corruption and further data loss.

When this happens, the guest may be partially functional, but anything that requires write access to the read-only partitions will obviously be in trouble. This is why we could ping the manager appliance, but all other functionality was broken. The manager cluster uses ZooKeeper for clustering services. ZooKeeper requires consistent and low-latency write access to disk. Because this wasn’t available to 172.16.1.41, it was marked as down in the cluster.

After discussing this with our fictional customer, we were able to confirm that an ESXi host esx-e3 experienced a total storage outage for a few minutes and that it had since been fixed. They had assumed it was not related because the appliance was on esx-e1, not esx-e3.

Continue reading “NSX-T Troubleshooting Scenario 3 – Solution”

NSX-T Troubleshooting Scenario 3

It’s been a while since I’ve posted anything, so what better way to get back into the swing of things than a troubleshooting scenario! These last few months I’ve been busy learning the ropes in my new role as an SRE supporting NSX and VMware Cloud on AWS. Hopefully I’ll be able to start releasing regular content again soon.

Welcome to the third NSX-T troubleshooting scenario! What I hope to do in these posts is share some of the common issues I run across from day to day. Each scenario will be a two-part post. The first will be an outline of the symptoms and problem statement along with bits of information from the environment. The second will be the solution, including the troubleshooting and investigation I did to get there.

The Scenario

As always, we’ll start with a fictional customer problem statement:

“I’m not experiencing any problems, but I noticed that my NSX-T 2.4.1 manager cluster is in a degraded state. One of the unified appliances appears to be down. I can ping it just fine, but I can’t seem to login to the appliance via SSH. I’m sure I’m putting in the right password, but it won’t let me in. I’m not sure what’s going on. Please help!”

From the NSX-T Overview page, we can see that one appliance is red.

nsxt-tshoot3a-2

Let’s have a look at the management cluster in the UI:

nsxt-tshoot3a-1

The problematic manager is 172.16.1.41. It’s reporting its cluster connectivity as ‘Down’ despite being reachable via ping. It appears that all of the services including controller related services are down for this appliance as well.

nsxt-tshoot3a-3

Strangely, it doesn’t appear to be accepting the admin or root passwords via SSH. We always get an ‘Access Denied’ response. We can login successfully to the other two appliances without issue using the same credentials.

Opening a console window to 172.16.1.41 greets us with the following:

nsxt-tshoot3a-4

Error messages appear to continually scroll by from system-journald mentioning “Failed to write entry”. Hitting enter gives us the login prompt, but we immediately get the same error messages and can’t login.

What’s Next

It seems pretty clear that there is something wrong with 172.16.1.41, but what may have caused this problem? How would you fix this and most importantly, how can you root cause this?

I’ll post the solution in the next day or two, but how would you handle this scenario? Let me know! Please feel free to leave a comment below or via Twitter (@vswitchzero).

NSX-T Troubleshooting Scenario 2 – Solution

Welcome to the second installment of a new series of NSX-T troubleshooting scenarios. Thanks to everyone who took the time to comment on the first half of the scenario. Today I’ll be performing some troubleshooting and will show how I came to the solution.

Please see the first half for more detail on the problem symptoms and some scoping.

Getting Started

As we saw in the first half, our fictional customer was having northbound communication problems because the physical core router was not getting any of the NSX advertised routes:

vyos@router-core:~$ sh ip route
Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
B - BGP, > - selected route, * - FIB route

S>* 0.0.0.0/0 [1/0] via 172.16.1.12, eth0.1
C>* 10.99.99.0/27 is directly connected, eth0.2005
C>* 127.0.0.0/8 is directly connected, lo
C>* 172.16.1.0/24 is directly connected, eth0.1
C>* 172.16.11.0/24 is directly connected, eth0.11
C>* 172.16.76.0/24 is directly connected, eth0.76
C>* 172.16.98.0/24 is directly connected, eth0.98

Based on what we observed in the first half, we can make a few assertions:

The T1 routers are advertising their routes just fine to the T0 (a total of 8 routes).
The T0 router is peering with the core router successfully because we received BGP routes from the core router.
The T0 router is configured for route redistribution of NSX connected and Static routes.

Let’s just run through a couple of quick tests to confirm point one above and make sure that the T0 can communicate with the core router. From VRF 2 (the T0 SR), we’ll check the interface IP first:

Continue reading “NSX-T Troubleshooting Scenario 2 – Solution”

NSX-T Troubleshooting Scenario 2

Welcome to the second NSX-T troubleshooting scenario! What I hope to do in these posts is share some of the common issues I run across from day to day. Each scenario will be a two-part post. The first will be an outline of the symptoms and problem statement along with bits of information from the environment. The second will be the solution, including the troubleshooting and investigation I did to get there.

The Scenario

As always, we’ll start with a fictional customer problem statement:

“I’ve just deployed a new NSX-T 2.3.1 environment with two tenants. The T1 routers (one per tenant) appear to be working fine. I have VM to VM connectivity on logical switches, but I can’t get to any northbound networks. The non-NSX core router isn’t getting any of the NSX routes!”

Taking a quick look at the environment, we can see that each tenant T1 router has several logical switches attached. Each is advertising four subnets as can be seen below:

You can also see that the ‘Advertise All NSX Connected Routes’ option is enabled, which should cause these routes to be advertised to the T0.

On the T0, we can see that there are ‘Linked Ports’ to both T1 routers, as well as a VLAN-backed logical switch for northbound communication via edge-e1. Let’s start by ensuring that these routes are actually making it to the T0 SR.

From the edge CLI, I start by listing all logical router instances to determine the VRF for the T0 SR:

New Upgrade Issue in NSX 6.4.4

Be sure to check out VMware KB 67416 before upgrading to 6.4.4.

If you are planning to upgrade to NSX 6.4.4, be sure to have a look at VMware KB 67416 before you do. I’ve seen several customers hit this issue now, and a bit of pre-work before the upgrade can save you a lot of grief.

It appears that if you are using grouping objects, like security groups or IP sets in your ESG firewall rules, there is a chance that your ESG will become unmanageable after NSX Manager gets upgraded to 6.4.4. Most customers will notice this issue when they go to upgrade their ESGs as part of the upgrade process and the tasks fail. In addition to not being able to upgrade the edge, all configuration changes you attempt to make will also fail.

This issue lies in the message bus communication channel between NSX Manager and the ESG. These security groups and IP sets trigger a large number of messages and eventually the channel becomes blocked as a result. Unfortunately, there is no workaround aside from removing these groups and IP sets from the firewall before upgrading. This may not be a feasible workaround for the majority of customers out there.

Although not a common configuration, this issue can also be triggered if DFW rules are applied to ESGs and these rules contain grouping objects.

If you know your environment is configured with security groups and IP sets in the edge firewall, I’d recommend reaching out to VMware technical support prior to beginning your upgrade. Support can proactively install a “hot patch” so that you won’t hit this problem. If you have already hit this, the same hot patch can be applied to get you back up and running. In order for the patch to work, the ESG would have to be re-deployed leading to a brief outage. Obviously getting in front of this issue is a better plan than being reactive.

VMware will be updating the 6.4.4 release notes to reflect this.

NSX-T Troubleshooting Scenario 1 – Solution

Welcome to the first installment of a new series of NSX-T troubleshooting scenarios. Thanks to everyone who took the time to comment on the first half of the scenario. Today I’ll be performing some troubleshooting and will show how I came to the solution.

Please see the first half for more detail on the problem symptoms and some scoping.

Getting Started

As we saw in the first half, the installation of the NSX-T VIBs were failing with the following error:

nsxt-tshoot1a-5

At first glance, it looked as if the NSX-T VIBs, or an older version of them were already installed. Taking a closer look at the actual VIB names, however, was very telling. The ‘esx-nsxv’ in the name denotes that these belong to NSX for vSphere.

Logging in to host esx-a3 via SSH and checking for installed VIBs with ‘nsx’ in the name came back with the following:

[root@esx-a3:~] esxcli software vib list |grep nsx
esx-nsxv                       6.5.0-0.0.8590012                     VMware      VMwareCertified   2018-08-31

Indeed, the NSX-V VIBs are still installed. Having a look at the environment, we saw that all other traces of NSX-V were gone – the manager, controllers, vmkernel ports, portgroups and Web Client plugin were missing. Only these lingering VIBs were not removed from these three hosts for some reason. It’s important to properly remove NSX to prevent issues like this from occurring.

Removing the NSX-V VIBs

The first order of business was to put the host in maintenance mode. I didn’t have any running VMs created yet, so I just went ahead and put all three in maintenance mode:

nsxt-tshoot1b-2

Once that was done, I could remove the VIBs using the following esxcli software vib command:

Continue reading “NSX-T Troubleshooting Scenario 1 – Solution”

NSX-T Troubleshooting Scenario 1

Welcome to the first NSX-T troubleshooting scenario! My NSX-V troubleshooting scenarios have been well received, so I thought it was time to start a new series for NSX-T. If you’ve got an idea for a scenario, please let me know!

What I hope to do in these posts is share some of the common issues I run across from day to day. Each scenario will be a two-part post. The first will be an outline of the symptoms and problem statement along with bits of information from the environment. The second will be the solution, including the troubleshooting and investigation I did to get there.

The Scenario

As always, we’ll start with a brief problem statement:

“I removed NSX for vSphere from my lab environment and am trying to install NSX-T for a proof of concept. Unfortunately, I get an error message every time I try to install the NSX-T VIBs on my ESXi hosts! I’m running NSX-T 2.3.1, and ESXi 6.5 U2”

In the NSX-T UI, we’re greeted with a simple “NSX Install Failed” message for the host esx-a3:

nsxt-tshoot1a

Clicking on this error gives us a much more verbose error message:

nsxt-tshoot1a-5

The full text of the error message is as follows:

NSX components not installed successfully on compute-manager discovered node. Failed to install software on host. Failed to install software on host. esx-a3.vswitchzero.net : java.rmi.RemoteException: [DependencyError] File path of '/bin/net-vdl2' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/bin/vsip_vm_list.sh' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/etc/vmware/firewall/netCPRuleset.xml' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/bin/vsipioctl' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/usr/lib/vmware/vm-support/bin/dump-vdr-info.sh' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/bin/net-vdr' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/etc/vmsyslog.conf.d/dfwpktlogs.conf' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/etc/init.d/netcpad' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/usr/lib/vmware/netcpa/bin/netcpa' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/bin/dfwpktlogs.sh' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/etc/vmware/firewall/bfdRuleset.xml' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/etc/vmware/vm-support/dfw.mfx' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} Please refer to the log file for more details.

Clicking on the RESOLVE button simply tries the install again, which fails.

Continue reading “NSX-T Troubleshooting Scenario 1”

Manual Installation of NSX-T Kernel Modules in ESXi

Last week, I discussed the manual deployment of NSX-T controller nodes. Today, I’ll take a look at adding standalone ESXi hosts.

Although people usually associate manual deployment with KVM hypervisors, there is no reason you can’t do the same with ESXi hosts. Obviously, automating this process with vCenter Server as a compute manager has its advantages, but one of the empowering features of NSX-T is that is has no dependency on vCenter Server whatsoever.

Obtaining the ESXi VIBs

First, we’ll need to download the ESXi host VIBs. In my case, the hosts are running ESXi 6.5 U2, so I downloaded the correct 6.5 VIBs from the NSX-T download site.

nsxt-manualvib-1

Once I had obtained the ZIP file, I used WinSCP to copy it to the /tmp location on my ESXi host. The file is only a few megabytes in size so it can go just about anywhere. If you’ve got a lot of hosts to do, putting it in a shared datastore makes sense.

Installing the ESXi VIBs

Because the NSX-T kernel module is comprised of a number of VIBs, we need to install it as an ‘offline depot’ as opposed to individual VIB files. That said, there is no need to extract the ZIP file. To install it, I used the esxcli software vib install command as shown below:

[root@esx-a3:/tmp] esxcli software vib install --depot=/tmp/nsx-lcp-2.3.1.0.0.11294289-esx65.zip
Installation Result
   Message: Operation finished successfully.
   Reboot Required: false
   VIBs Installed: VMware_bootbank_epsec-mux_6.5.0esx65-9272189, VMware_bootbank_nsx-aggservice_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-cli-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-common-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-da_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337, VMware_bootbank_nsx-exporter_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-host_2.3.1.0.0-6.5.11294289, VMware_bootbank_nsx-metrics-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-mpa_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-nestdb-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-nestdb_2.3.1.0.0-6.5.11294421, VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485, VMware_bootbank_nsx-opsagent_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-platform-client_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-profiling-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-proxy_2.3.1.0.0-6.5.11294520, VMware_bootbank_nsx-python-gevent_1.1.0-9273114, VMware_bootbank_nsx-python-greenlet_0.4.9-9272996, VMware_bootbank_nsx-python-logging_2.3.1.0.0-6.5.11294409, VMware_bootbank_nsx-python-protobuf_2.6.1-9273048, VMware_bootbank_nsx-rpc-libs_2.3.1.0.0-6.5.11294490, VMware_bootbank_nsx-sfhc_2.3.1.0.0-6.5.11294539, VMware_bootbank_nsx-shared-libs_2.3.0.0.0-6.5.10474844, VMware_bootbank_nsxcli_2.3.1.0.0-6.5.11294343
   VIBs Removed:
   VIBs Skipped:

Remember, your host will need to be in maintenance mode for the installation to succeed. Once finished, a total of 24 new VIBs were installed as shown:

[root@esx-a3:/tmp] esxcli software vib list |grep -i nsx
nsx-aggservice                 2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-cli-libs                   2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-common-libs                2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-da                         2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-esx-datapath               2.3.1.0.0-6.5.11294337                VMware      VMwareCertified   2019-02-15
nsx-exporter                   2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-host                       2.3.1.0.0-6.5.11294289                VMware      VMwareCertified   2019-02-15
nsx-metrics-libs               2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-mpa                        2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-nestdb-libs                2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-nestdb                     2.3.1.0.0-6.5.11294421                VMware      VMwareCertified   2019-02-15
nsx-netcpa                     2.3.1.0.0-6.5.11294485                VMware      VMwareCertified   2019-02-15
nsx-opsagent                   2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-platform-client            2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-profiling-libs             2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-proxy                      2.3.1.0.0-6.5.11294520                VMware      VMwareCertified   2019-02-15
nsx-python-gevent              1.1.0-9273114                         VMware      VMwareCertified   2019-02-15
nsx-python-greenlet            0.4.9-9272996                         VMware      VMwareCertified   2019-02-15
nsx-python-logging             2.3.1.0.0-6.5.11294409                VMware      VMwareCertified   2019-02-15
nsx-python-protobuf            2.6.1-9273048                         VMware      VMwareCertified   2019-02-15
nsx-rpc-libs                   2.3.1.0.0-6.5.11294490                VMware      VMwareCertified   2019-02-15
nsx-sfhc                       2.3.1.0.0-6.5.11294539                VMware      VMwareCertified   2019-02-15
nsx-shared-libs                2.3.0.0.0-6.5.10474844                VMware      VMwareCertified   2019-02-15
nsxcli                         2.3.1.0.0-6.5.11294343                VMware      VMwareCertified   2019-02-15

You can find information on the purpose of some of these VIBs in the NSX-T documentation.

Connecting the ESXi Host to the Management Plane

Now that we have the required software installed, we need to connect the ESXi host to NSX Manager. To begin, we’ll need to get the certificate thumbprint from the NSX Manager:

nsxmanager> get certificate api thumbprint
ccdbda93573cd1dbec386b620db52d5275c4a76a5120087a174d00d4508c1493

Next, we need to drop into the nsxcli shell from the ESXi CLI prompt, and then run the join management-plane command as shown below:

[root@esx-a3] # nsxcli
esx-a3> join management-plane 172.16.1.40 username admin thumbprint ccdbda93573cd1dbec386b620db52d5275c4a76a5120087a174d00d4508c1493
Password for API user: ********
Node successfully registered as Fabric Node: 0b08c694-3155-11e9-8a6c-0f1235732823

If all went well, we should now see our NSX Manager listed as connected:

esx-a3> get managers
- 172.16.1.40      Connected

From the root prompt of the ESXi host, we can see that there are now established TCP connections to the NSX Manager appliance on the RabbitMQ port 5671.

[root@esx-a3:/tmp] esxcli network ip connection list |grep 5671
tcp         0       0  172.16.1.23:55477   172.16.1.40:5671    ESTABLISHED     84232  newreno  mpa
tcp         0       0  172.16.1.23:36956   172.16.1.40:5671    ESTABLISHED     84232  newreno  mpa

From the NSX UI, we can now see the host appear as connected under ‘Standalone Hosts’:

nsxt-manualvib-3

As a next step, you’ll want to add this new host as a transport node and you should be good to go.

It’s great to have the flexibility to do this completely without the assistance of vCenter Server. Anyone who has had to deal with the quirks of VC integration and ESX Agent Manager (EAM) in NSX-V will certainly appreciate this.

NSX-T PCPU Requirements for Edges

New CPU requirements for NSX-T may leave older lab hardware out in the cold.

If you are running old hardware in your lab, you may have come across an unexpected failure while deploying your first NSX-T edge VM.

nsxt-aes-edge-1

The exact error message will be something similar to:

“[Fabric] Edge <uuid> is not ready for configuration error occurred, error detail is NSX Edge configuration has failed. The host does not support required cpu features: [‘aes’].”

The edge will be successfully deployed, but will remain ‘unconfigured’ and will not allow you to add it as a transport node.

The ‘aes’ feature being referred to is Intel’s AES-NI acceleration for cryptography. You can find out more about AES-NI here. In NSX-V, AES-NI was optionally supported for offloading cryptography for VPN related features. It seems that this has now become a hard requirement for NSX-T.

Unfortunately, like vSphere 6.7, NSX-T has minimum CPU requirements that can’t be worked around. If you have a browse through the NSX-T system requirements, you’ll find a note about CPU compatibility in the “NSX Edge VM and Bare-Metal NSX Edge CPU Requirements” section. Listed there is reference to:

Xeon 56xx (Westmere-EP)
Xeon E7-xxxx (Westmere-EX and later CPU generation)
Xeon E5-xxxx (Sandy Bridge and later CPU generation)

This means that anything released prior to 2011 is unlikely to work, with the exception of a few Westermere EP based Xeons, which seem to have spotty success. On the AMD front, it appears that even CPUs with AES instructions will fail similarly due to a CPU compatibility check that is done during edge deployment.

Update: Commenter Ben Kenobi figured out a workaround to get edges to deploy on modern AMD platforms! You can find his workaround discussed below in the comments as well as on his blog here.

My management host uses Xeon E5-2670s, which work fine, but my compute cluster uses very old Xeon X3440s that came out before AES-NI was introduced. Now that I can’t run vSphere 6.7 or an NSX-T edge on these hosts, I think it may finally be time to upgrade.

Unfortunately, it doesn’t appear that there is a workaround for this problem. If anyone does come across a way to avoid this, please let me know!

Deploying NSX-T Controllers Manually

Deploying an NSX-T control cluster manually for maximum control and flexibility.

One of the great things about NSX-T is its complete independence from vCenter Server. You can still link to vCenter Server if you’d like to automate certain tasks, but unlike NSX-V, you can accomplish many deployment tasks manually. One of the firsts things you’ll be doing in a new NSX-T setup is to deploy your control cluster.

Although automated deployment through vCenter and the UI is convenient, there are some additional benefits to manual controller deployment. Firstly, you can select a non-production ‘small’ sized form factor that isn’t selectable in the UI saving you a couple of vCPUs and about 8GB of RAM per appliance. Secondly, deploying manually also allows you to thin-provision your controller VMDKs off the bat. In a home lab, these are some desirable benefits. And of course, there is always the satisfaction you get from running through the process manually and better understanding what happens behind the scenes.

NSXT-controllerdeploy-2

As seen above, the automated controller deployment wizard does not allow the selection of a ‘Small’ form factor.

Deploying Controllers

To begin, you’ll need to download the NSX-T controller OVA. You’ll find it listed along with the other NSX-T deliverables on the download page.

NSXT-controllerdeploy-1

There are a few different ways that you can deploy the OVA including with ovftool. I’m just going to use the vSphere Client for this example. As you can see below, we can now select an unsupported ‘Small’ form-factor deployment:

NSXT-controllerdeploy-3

In addition to this, you’ll get the usual template customization options along with a few new ones you may not have seen listed under ‘Internal Properties’:

NSXT-controllerdeploy-4

As you probably have guessed these internal properties can be used to save some of the work needed to get it connected to the management plane and to the control cluster. I’m going to skip this entire section and run through the process manually from the CLI post-deployment.

Continue reading “Deploying NSX-T Controllers Manually”