Limiting User Scope and Permissions in NSX

Using REST API calls to limit NSX user permissions to specific objects only.

There is a constant stream of new features added with each release of NSX, but not all of the original features have survived. NSX Data Security was one casualty, and the ‘Limit Scope’ option for user permissions was another: VMware removed it from the NSX UI with the release of 6.2.0 back in 2015. Every so often I’ll get a customer asking where this feature went.

The ‘Limit Scope’ feature allows you to limit specific NSX users to specific objects within the inventory. For example, you may want to provide an application owner with full access to only one specific edge load balancer, and to ensure they have access to nothing else in NSX.

The feature was scrapped in 6.2 primarily because of UI problems that would occur for users restricted to specific resources. To view the UI properly and as intended, you’d need access to the ‘global root’ object that is the parent of all other NSX managed objects. VMware KB 2136534 is about the only source I could find that discusses this.

REST API Calls Still Exist

Although the ‘Limit Scope’ option was removed from the UI in 6.2 and later, you may be surprised to discover that the API calls for this feature still exist.

To show how this works, I’ll be running through a simple scenario in my lab. For this test, we’ll assume there are two edges, mercury-esg1 and mercury-dlr, that are related to a specific application deployment. A vCenter user called test in the vswitchzero.net domain requires access to these two edges, but we don’t want them to be able to access anything else.

limitscope-1
We want to limit access to only edge-4 and edge-5 for the ‘test’ user.

The two edges in question have morefs edge-4 and edge-5 respectively. For more information on finding moref IDs for NSX objects, see my post on the subject here.
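To give you an idea of what this looks like, here is a rough sketch of the kind of call involved, using PowerShell’s Invoke-RestMethod. Treat it as illustrative only: the role name, user ID format and XML schema shown below are assumptions on my part, so check the NSX API guide for your release before running anything like it.

# Hypothetical sketch only. Verify role names, schema and endpoint against the NSX API guide.
# 'nsxmgr.lab.local' and the user ID are placeholders based on my lab scenario.
$nsxManager = 'nsxmgr.lab.local'
$cred = Get-Credential    # NSX Manager admin credentials

# Scope the user's role to just the two edges instead of globalroot-0 (the parent of everything).
$body = @"
<accessControlEntry>
  <role>security_admin</role>
  <resource>
    <resourceId>edge-4</resourceId>
    <resourceId>edge-5</resourceId>
  </resource>
</accessControlEntry>
"@

Invoke-RestMethod -Method Post -Credential $cred -ContentType 'application/xml' -Body $body `
  -Uri "https://$nsxManager/api/2.0/services/usermgmt/role/test@vswitchzero.net?isGroup=false"
# On PowerShell Core, add -SkipCertificateCheck if the manager uses a self-signed certificate.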

Continue reading “Limiting User Scope and Permissions in NSX”

Properly Removing a LUN/Datastore in vSphere

Taking the time to remove LUNs correctly is worth the effort and prevents all sorts of complications.

This is admittedly a well-covered topic in both the VMware public documentation and in blogs, but I thought I’d provide my perspective on this as well in case it may help others. Unfortunately, improper LUN removal is still something I encounter all too often here in GSS.

Having done a short stint on the VMware storage support team about seven years back, I knew all too well the chaos that would ensue after improper LUN decommissioning. ESX 4.x was particularly bad when it came to handling unexpected storage loss. Often hosts would become unmanageable and reboots were the only way to recover. Today, things are quite different. VMware has made many strides in these areas, including better host resiliency in the face of APD (all paths down) events, as well as introducing PDL (permanent device loss) several years back. Despite these improvements, you still don’t want to yank storage out from under your hypervisors.

Today, I’ll be decommissioning an SSD drive from my FreeNAS server, which will require me to go through these steps.

Step 1 – Evacuate!

Before you even consider nuking a LUN from your SAN, you’ll want to ensure all VMs, templates and files have been migrated off. The easiest way to do this is to navigate to the ‘Storage’ view in the Web Client, and then select the datastore in question. From there, you can click the VMs tab. If you are running 5.5 or 6.0, you may need to go to ‘Related Objects’ first, and then Virtual Machines.

lunremove-1
One VM still resides on shared-ssd0. It’ll need to be migrated off.

In my case, you can see that the datastore shared-ssd still has a VM on it that will need to be migrated. I was able to use Storage vMotion without interrupting the guest.

lunremove-2
It’s easy to forget about templates as they aren’t visible in the default datastore view. Be sure to check for them as well.

Templates do not show up in the normal view, so be sure to check specifically for these as well. Remember, you can’t migrate templates. You’ll need to convert them to VMs first, migrate them, and then convert them back to templates. I didn’t care about this one, so I just deleted it from disk.
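If you’d rather double-check from PowerCLI than click around in the Web Client, a quick query along these lines will confirm nothing was left behind. It’s just a sketch using my lab’s datastore name, so adjust to suit:

# List any VMs or templates still registered on the datastore (name is from my lab).
$ds = Get-Datastore -Name 'shared-ssd0'

# Registered VMs (powered on or off) still using the datastore
Get-VM -Datastore $ds | Select-Object Name, PowerState

# Templates won't appear in Get-VM, so check them separately by comparing the
# datastore moref(s) backing each template against the one being removed.
Get-Template | Where-Object {
    ($_.ExtensionData.Datastore | ForEach-Object { $_.Value }) -contains $ds.ExtensionData.MoRef.Value
} | Select-Object Name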

Continue reading “Properly Removing a LUN/Datastore in vSphere”

3D Printing CPU Trays

Better protection and storage for old Socket 7 and Socket 370 CPUs.

As you may know, I’ve been amassing a bit of a collection of retro hardware from the early to late nineties. This includes a number of CPUs from that era – especially those of the socket 7 variety. Storing these has been a bit of a challenge. I’ve never been satisfied with the protection a static bag alone provides for the delicate pins, and I don’t want to wrap up each CPU in bubble wrap either.

About ten years ago, I used to write PC hardware reviews and would quite often get processors from AMD in these neat little trays. Sometimes they held a single CPU, and sometimes as many as eight. They weren’t anything fancy but were perfectly sized for the chips and made of rigid plastic to protect the pins. You can still find these trays on eBay for more modern socket types, but they are much harder to come by for old processors.

3dtray-0
There are many varying socket 7 and socket 370 CPU designs out there.

Having acquired a 3D printer earlier this year, I thought this would be the perfect project to learn how to create 3D models from scratch. Up until now, I’ve mainly just printed community provided models and haven’t really done anything from scratch aside from some very basic shapes.

Getting the Measurements

I had already printed a couple of single CPU protectors from Thingiverse, but they either weren’t a good fit, used too much filament, or took too long to print. I also wanted something I could put a lid on, as well as trays that could hold more than one CPU. These existing models gave me some ideas, but ultimately, I’d need to take some precise measurements of my CPUs and start from the ground up.

3dtray-4
A digital measurement caliper. A must-have for anyone with a 3D printer.

To begin, I used a ‘digital caliper’ tool that I purchased on Amazon for about $15. I can’t say enough how helpful this tool is to get precise measurements – it makes designing your objects so much easier.

To make sure the tray would work with a wide variety of socket 7 and socket 370 processors, I took a sample of each type I had in my collection:

  • Intel Pentium P54C (133MHz, ceramic top)
  • Intel Celeron Mendocino (400MHz, metal heatspreader). Same design and dimensions as later Pentium MMX CPUs.
  • Intel Pentium 3 (1000MHz Coppermine, no heatspreader)
  • Intel Pentium 3 (1400MHz Tualatin, different heatspreader design)
  • Cyrix 6x86L (133MHz, gold-top, short heatspreader)
  • AMD K6-2 (500MHz, full heatspreader)
  • AMD K5 (100MHz, similar to Cyrix heatspreader).

Measuring all of these processors got me to the following conclusions:

  • The dimensions varied very slightly, but all were about 49.5mm x 49.5mm, +/- 0.1mm.
  • Pin height was 3mm on all CPUs.
  • Most CPUs had a notch out of one corner, but some didn’t – like the Coppermine P3s.
  • CPU thickness (not including pin height) varied from processor to processor due to the heatspreader designs. The thinnest was the Coppermine P3 at only 2mm where the exposed core is located. The thickest was the Tualatin at 3.4mm.

Continue reading “3D Printing CPU Trays”

NSX 6.4.3 Now Available!

Express maintenance release fixes two discovered issues.

If it feels like 6.4.2 was just released, you’d be correct – only three weeks ago. The new 6.4.3 release (build 9927516) is what’s referred to as an express maintenance release. These releases aim to correct specific customer identified problems as quickly as possible rather than having to wait many months for the next full patch release.

In this release, only two identified bugs have been fixed. The first is an SSO issue that can occur in environments with multiple PSCs:

“Fixed Issue 2186945: NSX Data Center for vSphere 6.4.2 will result in loss of SSO functionality under specific conditions. NSX Data Center for vSphere cannot connect to SSO in an environment with multiple PSCs or STS certificates after installing or upgrading to NSX Data Center for vSphere 6.4.2.”

The second is an issue with IPsets that can impact third-party security products, like Palo Alto Networks and Check Point Net-X services for example:

“Issue 2186968: Static IPset not reported to containerset API call. If you have service appliances, NSX might omit IP sets in communicating with Partner Service Managers. This can lead to partner firewalls allowing or denying connections incorrectly. Fixed in 6.4.3.”

You can find more information on these problems in VMware KB 57770 and KB 57834.

So knowing that these are the only two fixes included, the question obviously becomes – do I really need to upgrade?

If you are running 6.4.2 today, you might not need to. If you have more than one PSC associated with the vCenter Server that NSX manager connects to, or if you use third-party firewall products that work in conjunction with NSX, the answer would be yes. If you don’t, there is really no benefit to upgrading to 6.4.3 and it would be best to save your efforts for the next major release.

That said, if you were already planning an upgrade to 6.4.2, it only makes sense to go to 6.4.3 instead. You’d get all the benefits of 6.4.2 plus these two additional fixes.

Kudos goes out to the VMware NSBU engineering team for their quick work in getting these issues fixed and getting 6.4.3 out so quickly.


VMware Tools 10.3.2 Now Available

New bundled VMXNET3 driver corrects PSOD crash issue.

As mentioned in a recent post, a problem with the VMXNET3 driver bundled with Tools 10.3.0 could cause host PSODs and connectivity issues. As of September 12th, VMware Tools 10.3.2 is now available, which corrects this issue.

The problematic driver was version 1.8.3.0 in tools 10.3.0. According to the release notes, it has been replaced with version 1.8.3.1. In addition to this fix, there are four resolved issues listed as well.
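If you want to confirm which VMXNET3 driver a particular Windows guest is actually running, something like the following from inside the guest should do the trick. Consider it a rough sketch, as the interface description match may need tweaking depending on how the adapter reports itself:

# Run inside the Windows guest: list VMXNET3 adapters and their driver version.
# 1.8.3.0 is the problematic driver from Tools 10.3.0; 1.8.3.1 ships with 10.3.2.
Get-NetAdapter |
    Where-Object { $_.InterfaceDescription -like '*vmxnet3*' } |
    Select-Object Name, InterfaceDescription, DriverVersion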

VMware mentions the following in the 10.3.2 release notes:

"Note: VMware Tools 10.3.0 is deprecated due to a VMXNET3 driver related issue. For more information, see KB 57796. Install VMware Tools 10.3.2, or VMware Tools 10.2.5 or an earlier version of VMware Tools."

Kudos to the VMware engineering teams for getting 10.3.2 released so quickly after the discovery of the problem!


PSOD and Connectivity Problems with VMware Tools 10.3.0

Downgrading to Tools 10.2.5 is an effective workaround.

If you have installed the new VMware Tools 10.3.0 release in VMs running recent versions of Windows, you may be susceptible to host PSODs and other general connectivity problems. VMware has just published KB 57796 regarding this problem, and has recalled 10.3.0 so that it’s no longer available for download.

Tools 10.3.0 includes a new version of the VMXNET3 vNIC driver – version 1.8.3.0 – for Windows, which seems to be the primary culprit. Thankfully, not every environment with Tools 10.3.0 will run into this. It appears that the following conditions must be met:

  1. You are running a build of ESXi 6.5.
  2. You have Windows 2012, Windows 8 or later VMs with VMXNET3 adapters.
  3. The VM hardware is version 13 (the version released along with vSphere 6.5).
  4. Tools 10.3.0 with the 1.8.3.0 VMXNET3 driver is installed in the Windows guests.

VMware is planning to have this issue fixed in the next release of Tools 10.3.x.

If you fall into the above category and are at risk, it would be a good idea to address this even if you haven’t run into any problems. Since this issue is specific to VMXNET3 driver version 1.8.3.0 – which is bundled only with Tools 10.3.0 – downgrading to Tools 10.2.5 is an effective workaround. Simply uninstall Tools and re-install version 10.2.5, which is available here.

Another option would be to replace VMXNET3 adapters with E1000E-based adapters in susceptible VMs. I would personally rather downgrade to Tools 10.2.5, since both options cause some VM impact and the VMXNET3 adapter is far superior to the E1000E.

Again, you’d only need to do this for VMs that fall into the specific categories listed above. Other VMs can be left as-is running 10.3.0 without concern.
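To get a quick inventory of VMs that might fall into those categories, a rough PowerCLI query like the one below can help. It’s only a sketch: the ToolsVersion string format and some property names can vary between PowerCLI releases, so verify the results before acting on them.

# Flag VMs that appear to meet all four conditions: Tools 10.3.0, hardware version 13,
# a Windows guest OS, a VMXNET3 adapter, and a host running ESXi 6.5.
# Assumes an active Connect-VIServer session.
Get-VM |
    Where-Object { $_.Guest.ToolsVersion -like '10.3.0*' } |
    Where-Object { $_.ExtensionData.Config.Version -eq 'vmx-13' } |
    Where-Object { $_.Guest.OSFullName -match 'Windows' } |
    Where-Object { $_.VMHost.Version -like '6.5*' } |
    Where-Object { Get-NetworkAdapter -VM $_ | Where-Object { $_.Type -eq 'Vmxnet3' } } |
    Select-Object Name, @{N='Host';E={$_.VMHost.Name}}, @{N='Tools';E={$_.Guest.ToolsVersion}}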

On a positive note, Tools 10.3.0 hasn’t been bundled with any builds of ESXi 6.5, so unless you’ve recently obtained Tools directly from the VMware download page, you shouldn’t have it in your environment.

A New Look

A new theme that’s easier to read and mobile friendly.

You may have noticed that the blog has a bit of a fresh new look as of late. When I originally started the site over a year ago, I used the trusty ‘Twenty Twelve’ WordPress theme for its simplicity and ease of use. Although I liked its simple layout, it was dated and wasn’t particularly mobile-friendly. I’ve since moved over to ‘Twenty Sixteen’, which is much more customizable, easier to read, and works a lot better on mobile devices. I hope this will be a positive change for the site.

Please be patient with me over the next few days as I iron out the quirks and get the CSS styling to behave. I noticed that some of the images are not aligning correctly, among other things. Thanks for your patience!

Manual Upgrade of NSX Host VIBs

Complete manual control of the NSX host VIB upgrade process without the use of vSphere DRS.

NSX host upgrades are well automated these days. By taking advantage of ‘fully automated’ DRS, hosts in a cluster can be evacuated, put in maintenance mode, upgraded, and even rebooted without any user intervention. Because NSX relies on DRS for resource scheduling, it doesn’t have to worry about upgrading too many hosts simultaneously, and the process can generally be completed without end-users even noticing.

But what if you don’t want this level of automation? Maybe you’ve got very sensitive VMs that can’t be migrated, or VMs pinned to hosts for some reason. Or maybe you just want maximum control of the upgrade process and which hosts are upgraded – and when.

There is no reason why you can’t have full control of the host upgrade process and leave DRS in manual mode. This is indeed supported.

Most of the documentation and guides out there assume that people will want to take advantage of DRS-driven upgrades, but that’s not the only supported method. Today I’ll be walking through the manual approach in my lab as I upgrade to NSX 6.4.1.

Step 1 – Clicking the Upgrade Link

Once you’ve upgraded your NSX manager and control cluster, you should be ready to begin tackling your ESXi host clusters. Before you proceed, you’ll need to ensure your host clusters have DRS set to ‘Manual’ mode. Don’t disable DRS – that will get rid of your resource pools. Manual mode is sufficient.

Next, you’ll need to browse to the usual ‘Installation’ section in the UI and click on the ‘Host Preparation’ tab. From here, it’s now safe to click the ‘Upgrade Available’ link on the cluster to begin the upgrade process. Because DRS is in manual mode, nothing will actually happen: hosts can’t be evacuated, and as a result, VIBs can’t be upgraded. In essence, the upgrade has started, but it immediately stalls and awaits manual intervention.

upgnodrs-3
This upgrade is essentially hung up waiting for hosts to enter maintenance mode.

In 6.4.1 as shown above, a clear banner message is displayed reminding you that DRS is in manual mode and that hosts must be manually put in maintenance mode.
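If you’d rather script the per-host evacuation than drag VMs around in the Web Client, a rough PowerCLI sequence like the one below also works. The host names here are placeholders from my lab, so substitute your own:

# With DRS in manual mode nothing moves on its own, so vMotion the running VMs
# off the host yourself before entering maintenance mode.
$source = Get-VMHost 'esx-a1.lab.local'
$target = Get-VMHost 'esx-a2.lab.local'

Get-VM -Location $source | Where-Object { $_.PowerState -eq 'PoweredOn' } |
    Move-VM -Destination $target

# Once the host is empty, enter maintenance mode so the stalled upgrade can push
# the new VIBs to it. Exit maintenance mode (rebooting first if prompted) when it
# completes, then repeat for the next host in the cluster.
Set-VMHost -VMHost $source -State Maintenance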

Continue reading “Manual Upgrade of NSX Host VIBs”

Home Lab Power Automation – Part 3

In part 2, I shared the PowerCLI scripting I used to power on my entire lab environment in the correct order. In this final installment, I’ll take you through the scripting used to power everything down. Although you may think the process is just the reverse of what I covered in part 2, you’ll see there were some other things to consider and different approaches required.

Step 1 – Shutting Down Compute Cluster VMs

To begin the process, I’d need to shut down all VMs in the compute-a cluster. None of the VMs there are essential for running the lab, so they can be safely stopped at any time. I was able to do this by connecting to vCenter with PowerCLI and then using a ‘foreach’ loop to gracefully shut down any VMs in the ‘Powered On’ state.

"Connecting to vCenter Server ..." |timestamp
Connect-VIServer -Server 172.16.1.15 -User administrator@vsphere.local -Password "VMware9("

"Shutting down all VMs in compute-a ..." |timestamp
$vmlista = Get-VM -Location compute-a | where{$_.PowerState -eq 'PoweredOn'}
foreach ($vm in $vmlista)
    {
    Shutdown-VMGuest -VM $vm -Confirm:$false | Format-List -Property VM, State
    }

The above scripting ensures the VMs start shutting down, but it doesn’t tell me that they completed the process. After it runs, it’s likely that one or more VMs will still be online. Before I can proceed, I need to check that they’re all in a ‘Powered Off’ state.

"Waiting for all VMs in compute-a to shut down ..." |timestamp
do
{
    "The following VM(s) are still powered on:"|timestamp
    $pendingvmsa = (Get-VM -Location compute-a | where{$_.PowerState -eq 'PoweredOn'})
    $pendingvmsa | Format-List -Property Name, PowerState
    sleep 1
} until($pendingvmsa -eq $null)
"All VMs in compute-a are powered off ..."|timestamp

A ‘do until’ loop does the trick here. I simply populate the $pendingvmsa variable with the list of all powered-on VMs and print that list. After a one-second delay, the loop repeats until $pendingvmsa is null. When it’s null, I know all of the VMs are powered off and I can safely continue.

Continue reading “Home Lab Power Automation – Part 3”

Home Lab Power Automation – Part 2

In part 1, I shared some of the tools I’d use to execute the power on and shutdown tasks in my lab. Today, let’s have a look at my startup PowerCLI script.

A Test-Connection Cmdlet Replacement

As I started working on the scripts, I needed a way to determine whether hosts and devices were accessible on the network. Unfortunately, the Test-Connection cmdlet was not available in the Linux PowerShell Core release. It uses the Windows network stack to do its thing, so it may be a while before an equivalent gets ported to Linux. As an alternative, I created a simple Python script called pinghost.py that achieves the same overall result. You can find more detail on how it works in a post I did a few months back here.

The script is very straightforward. You specify up to three space-separated IP addresses or host names as command line arguments, and the script sends one ICMP echo request to each host. Depending on the response, it outputs either ‘is responding’ or ‘is not responding’. Below is an example:

pi@raspberrypi:~/scripts $ python pinghost.py vc.lab.local 172.16.10.67 172.16.10.20
vc.lab.local is not responding
172.16.10.67 is responding
172.16.10.20 is not responding

Then using this script, I could create sleep loops in PowerShell to wait for one or more devices to become responsive before proceeding.

Adding Timestamps to Script Output

As I created the scripts, I wanted each event and piece of output to be recorded with a date and time. In a sense, I wanted the output to read like a log that could be written to a file and referred to later if needed. To do this, I found a simple PowerShell filter that can be piped to each command I run:

#PowerShell filter to add date/timestamps
filter timestamp {"$(Get-Date -Format G): $_"}
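Piping any string to the filter prepends the current date and time, so everything reads like a log entry. For example (the exact format follows Get-Date -Format G, so it depends on your locale, and the timestamp below is just illustrative):

"Powering up the 10G switch ..." |timestamp
# Output: 9/12/2018 8:05:42 PM: Powering up the 10G switch ...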

Step 1 – Power On the Switch

Powering up the switch requires the use of the tplink_smartplug.py python script that I discussed in part 1. The general idea here is to instruct the smart plug to set its relay to a state of ‘1’. This brings the switch to life. I then get into a ‘do sleep’ loop in PowerCLI until the Raspberry Pi is able to ping the management interface of the switch. More specifically, it will wait until the pinghost.py script returns a string of “is responding”. If that string isn’t received, it’ll wait two seconds, and then try again.

"Powering up the 10G switch ..." |timestamp
/home/pi/scripts/tplink-smartplug-master/tplink_smartplug.py -t 192.168.1.199 -c on |timestamp

"Waiting for 10G switch to boot ..." |timestamp
do
{
$pingresult = python ~/scripts/pinghost.py 172.16.1.1 |timestamp
$pingresult
sleep 2
} until($pingresult -like '*is responding*')

When run, the output looks similar to the following:

Continue reading “Home Lab Power Automation – Part 2”