MikroTik CRS309 Configuration – Part 1

The first in a multi-part series on configuring the CRS309 for a VMware home lab. Part 1 covers 802.1q VLAN trunking.

This will be the first of several posts on configuring the new MikroTik CRS309-1G-8S+IN for use in a VMware home lab. Today’s first post will introduce MikroTik’s RouterOS and will focus on VLAN trunking. Please note that although this was written for the CRS309, this should be applicable to all CRS3xx series switches, including the popular CRS317.

RouterOS vs SwOS

MikroTik’s RouterOS firmware that comes pre-loaded on the switch is very feature rich with loads of L3 functionality. It has everything from BGP/OSPF, to firewalling, NAT, DHCP services – you name it. It’s a multi-purpose firmware that was originally created for MikroTik’s routers. Today, it works with many of their other products, including switches, and wireless devices. Because of this, it feels more like router firmware adapted for use with switches as opposed to something designed especially for them. Configuring RouterOS on the CRS309/CRS317 can be challenging and there is a learning curve to overcome – even for those with a lot of networking experience. Things get easier when you learn the terminology and understand what does and does not apply to a switch within RouterOS.

The CRS309 also includes ‘SwOS’ – a straight-forward L2 firmware that is designed especially for their switches. I found SwOS’s UI to be very intuitive and a lot more like what I’d expect to see on a switch. If you are not looking to use any of the advanced L3 features that RouterOS offers, I’d highly recommend using SwOS instead. There is no need to flash back and forth from RouterOS to SwOS and vice-versa. Both firmware packages coexist in the flash memory and you can easily switch back and forth as required.

Although SwOS could suffice for my needs in the home lab, I’m not one to shy away from a challenge and really want to be able to take advantage of some of the L3 features. If I could get them to work, I could potentially retire one or two virtual machines that I currently use for routing and firewalling.

802.1q VLAN Trunking

One of the first things I tried to do with the CRS309 was get a bunch of VLANs created and all the interfaces configured as 802.1q VLAN trunk ports. This is probably the most common VMware home lab requirement with 10Gbps networking. Each host will have a limited number of interfaces, so VLAN trunks are critical to separate out the various services.

When you get into the RouterOS WebFig interface, you’ll find several places where VLANs can be configured. My first attempt was under ‘Switch’ and ‘VLANs’ – sounds logical enough:

crs309-vlans1

But clearly this doesn’t work. There is also a section for VLANs under ‘Interfaces’, but this is for VLAN interface creation, which I’ll get into later. One of the first things to keep in mind with RouterOS is that a lot of the configuration relating to the switch – including VLAN configuration – is done in the ‘Bridge’ section – not the Switch section.

From the CRS3xx series manual:

“Since RouterOS v6.41 bridges provide VLAN aware Layer2 forwarding and VLAN tag modifications within the bridge. This set of features makes bridge operation more like a traditional Ethernet switch and allows to overcome Spanning Tree compatibility issues compared to configuration when tunnel-like VLAN interfaces are bridged..”

Another important thing to remember is that any VLAN related configuration you do in the bridge does not come into effect until an option called vlan-filtering is set to ‘yes’ for the bridge itself.

Continue reading “MikroTik CRS309 Configuration – Part 1”

MikroTik CRS309-1G-8S+IN

Initial impressions on a very energy efficient 10Gbps SFP+ switch and an excellent choice for a VMware home lab!

A little over a year ago I decided to get my lab 10Gbps capable. At the time, the Quanta LB6M seemed like the obvious choice with 24x SFP+ interfaces, several 1Gbps copper interfaces and a price tag of only $300 USD. It worked well in my lab, but as you can imagine, older 10Gbps technology is far from energy efficient. The switch idled at close to 130W, which wasn’t far off from all three of my compute hosts put together. It was also very loud with seven high-RPM 40mm fans. Even when at their lowest setting, these chassis and PSU fans were painfully loud. There is no doubt that this switch is more at home in a datacenter than a home lab. It wasn’t long before I started keeping an eye out for newer offerings based on more efficient and cost-effective technologies. That’s when I came across MikroTik’s latest 3xx series “Cloud Router Switches”.

The MikroTik CRS309-1G-8S+IN

MikroTik is a technology firm based out of Latvia and is well known for their unique networking products. They sell a range of switches, routers, wireless products and even their caseless RouterBOARD systems for those interested in doing custom routers.

I was originally looking to purchase the CRS309’s bigger brother – the CRS317. Feature-wise they are very similar, but the CRS317 doubles up on SFP+ and 1Gbps copper interfaces. With two SFP+ ports on each of my four ESXi hosts, the CRS309’s eight SFP+ ports seemed to be sufficient. The feature that sold me on the CRS309 as opposed to the CRS317, however, was its completely passive/fanless design. The CRS317’s fans only spin up if the unit gets hot, but I liked the simplicity of a completely fanless solution.

Feature-wise, the CRS309 is a pretty impressive switch. Its state of the art Marvel Prestera 98DX8208 switching chip (98DX8216 in the CRS317) is what makes this such an efficient unit. The 98DX8208 is highly integrated and is good for line rate forwarding on all the SFP+ ports. It also includes an integrated dual core 32-bit ARM based processor running 800MHz. You can find more information on the Marvel Prestera here. The flash storage is a pretty spartan 16MB, but this seems to be plenty for the RouterOS and SwOS firmware to coexist on the switch simultaneously. I’ll get more into the CRS309’s software in a future post.

From a L2 perspective, it’s a perfectly capable unit. It can do line-rate L1 and L2 forwarding at about 81,000Mbps aggregate, or 162,000 Mbps full-duplex. This is because the 98DX8208 has an ASIC switching component and all L2 forwarding is done in hardware. The L3 features of the switch, however, are all done in software and must be processed by the integrated ARM CPU cores. Based on MikroTik’s test results, about 2.5Gbps can be expected when traffic has to go through the CPU. Your mileage may vary though depending on the features you use. This may not sound great, but in a home lab environment, you don’t really need high throughput for inter-VLAN routing or other L3 features. I’m most interested in having 10Gbps line rate throughput for host-to-host VSAN traffic, iSCSI, vMotion etc. If you really do need faster routing, you could always spin up a VyOS VM and use your hypervisor’s horsepower for that purpose. Given the $269 USD price of the unit, I think it’s awesome that you get a full suite of L3 features even if throughput is somewhat limited.

You can find more information and the specifications of the CRS309 at Mikrotik’s site.

Continue reading “MikroTik CRS309-1G-8S+IN”

Manually Patching an ESXi Host from the CLI

Manually patching standalone ESXi hosts without access to vCenter or Update Manager using offline bundles and the CLI.

There are many different reasons you may want to patch your ESXi host. VMware regularly releases bug fixes and security patches, or perhaps you need a newer build for compatibility with another application or third-party tool. In my situation, the ESXi 6.7 U1 ESXi hosts (build 10302608) are not compatible with NSX-T 2.4.0, so I need to get them patched to at least 6.7 EP06 (build 11675023).

hostupgcli-1

Before you get started, you’ll want to figure out which patch release you want to update to. There is quite often some confusion surrounding the naming of VMware patch releases. In some cases, a build number is referenced, for example, 10302608. In other cases, a friendly name is referenced – something like 6.7 EP06 or 6.5 P03. The EP in the name denotes an ‘Express Patch’ with a limited number of fixes released outside of the regular patch cadence, where as a ‘P’ release is a standard patch. In addition to this, major update releases are referred to as ‘U’, for example, 6.7 U1. And to make things more confusing, a special ‘Release Name’ is quite often referenced in security bulletins and other documents. Release names generally contain the release date in them. For example, ESXi670-201903001 for ESXi 6.7 EP07.

The best place to start is VMware KB 1014508, which provides links to numerous KB articles that can be used for cross referencing build numbers with friendly versions names. The KB we’re interested in for ESXi is KB 2143832.

Continue reading “Manually Patching an ESXi Host from the CLI”

NSX-T Troubleshooting Scenario 2 – Solution

Welcome to the second installment of a new series of NSX-T troubleshooting scenarios. Thanks to everyone who took the time to comment on the first half of the scenario. Today I’ll be performing some troubleshooting and will show how I came to the solution.

Please see the first half for more detail on the problem symptoms and some scoping.

Getting Started

As we saw in the first half, our fictional customer was having northbound communication problems because the physical core router was not getting any of the NSX advertised routes:

vyos@router-core:~$ sh ip route
Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
B - BGP, > - selected route, * - FIB route

S>* 0.0.0.0/0 [1/0] via 172.16.1.12, eth0.1
C>* 10.99.99.0/27 is directly connected, eth0.2005
C>* 127.0.0.0/8 is directly connected, lo
C>* 172.16.1.0/24 is directly connected, eth0.1
C>* 172.16.11.0/24 is directly connected, eth0.11
C>* 172.16.76.0/24 is directly connected, eth0.76
C>* 172.16.98.0/24 is directly connected, eth0.98

Based on what we observed in the first half, we can make a few assertions:

  1. The T1 routers are advertising their routes just fine to the T0 (a total of 8 routes).
  2. The T0 router is peering with the core router successfully because we received BGP routes from the core router.
  3. The T0 router is configured for route redistribution of NSX connected and Static routes.

Let’s just run through a couple of quick tests to confirm point one above and make sure that the T0 can communicate with the core router. From VRF 2 (the T0 SR), we’ll check the interface IP first:

Continue reading “NSX-T Troubleshooting Scenario 2 – Solution”

NSX-T Troubleshooting Scenario 2

Welcome to the second NSX-T troubleshooting scenario! What I hope to do in these posts is share some of the common issues I run across from day to day. Each scenario will be a two-part post. The first will be an outline of the symptoms and problem statement along with bits of information from the environment. The second will be the solution, including the troubleshooting and investigation I did to get there.

The Scenario

As always, we’ll start with a fictional customer problem statement:

“I’ve just deployed a new NSX-T 2.3.1 environment with two tenants. The T1 routers (one per tenant) appear to be working fine. I have VM to VM connectivity on logical switches, but I can’t get to any northbound networks. The non-NSX core router isn’t getting any of the NSX routes!”

Taking a quick look at the environment, we can see that each tenant T1 router has several logical switches attached. Each is advertising four subnets as can be seen below:

nsxt-tshoot2a-2

You can also see that the ‘Advertise All NSX Connected Routes’ option is enabled, which should cause these routes to be advertised to the T0.

nsxt-tshoot2a-3

On the T0, we can  see that there are ‘Linked Ports’ to both T1 routers, as well as a VLAN-backed logical switch for northbound communication via edge-e1. Let’s start by ensuring that these routes are actually making it to the T0 SR.

From the edge CLI, I start by listing all logical router instances to determine the VRF for the T0 SR:

Continue reading “NSX-T Troubleshooting Scenario 2”

Noctua NH-U9DX i4 and NH-D9DX i4 3U Heatsink Review

There are many different approaches you can take when building a VMware home lab, but I always prefer to do custom-build tower systems. This allows me to select the components I want – the fans, heatsinks, PSU – to keep the noise to a minimum. Even though I keep the lab in the basement, I really don’t like to hear it running. I don’t need the density that a rack provides, and I’m quite happy to have my nearly silent towers humming along on a wire shelf instead. Not to mention that lower noise usually equates to lower power consumption and of course, it keeps my wife happy as well – the most critical metric of all.

I finally got around to upgrading my three ancient compute nodes recently. I have been using the included Dynatron R13 1U copper heatsinks that came with the motherboards I bought on eBay. They work, and keep the CPUs relatively cool, but as you can imagine, noise was not a key consideration in their design. They are as compact and as efficient as possible at only an inch tall. At idle, the Supermicro X9SRL-F keeps them at a low RPM, but put any load on the systems and you’re quickly at 7500RPM and the noise is pretty unbearable. No problem for a datacenter, but not for a home lab.

Today I’ll be taking a bit of a departure from my regular posts and will be doing an in-depth review of two high-end Noctua heatsinks. Noctua was kind enough to send me a review sample of not one, but two of their Xeon heatsinks – the NH-U9DX i4 and the NH-D9DX i4 3U.

Noctua

noctua_200x200pxNoctua is an Austrian company well known for their low noise fans and high-end heatsinks. I’ve been using Noctua heatsinks for ages. In fact, I reviewed some of their original heatsinks and fans many years ago when I used to write hardware reviews. This included their original NH-U12P, the NH-C12P and the smaller NH-U9B. Back then, I praised them for their high-quality construction, near silent operation, excellent mounting hardware and most importantly – excellent cooling performance. That was over ten years ago, and it seems that Noctua is still very well respected for all the same reasons today.

Their gear has always been pricey compared to the competition, but when it comes to Noctua, you get what you pay for.

Square vs Narrow ILM LGA 2011

One of the challenges I had with my new/used Supermicro X9SRL-F boards is that they don’t use the common ‘square’ LGA2011 mounting pattern. Instead, they use what’s referred to as ‘Narrow ILM’ that allows the socket to be more rectangular in shape. This allows more real estate for memory slots and other components on the board. Because of this, I was severely limited in my heatsink choices. Most of what’s available out there for Narrow ILM are rack mount heatsinks similar to what I’m hoping to remove. The majority of the consumer-grade stuff for LGA2011 simply won’t fit.

noctuai4-24

Here you can see the Dynatron R13 narrow ILM heatsink mounted:

Continue reading “Noctua NH-U9DX i4 and NH-D9DX i4 3U Heatsink Review”

Low-Voltage DDR3 – Performance, Heat and Power

When I built my new compute nodes, I chose PC3L-12800R DIMMs that could run in both standard 1.5V and 1.35V low-voltage modes. What I didn’t realize, however, is that Intel’s specification for 1.35V registered memory on Socket R based systems limits low-voltage operation to 1333MHz. The Supermicro X9SRL-F boards I’m using enforce this, despite the modules being capable of the full 1600MHz at 1.35V.

This meant I could run the modules at 1333MHz and enjoy the power/heat reduction that goes with 1.35V operation, or I could force the modules to run at 1600MHz at the ‘standard’ 1.5 volts. As I considered different configuration options, a lot of questions came to mind. Today I’ll be taking an in-depth look at DDR3 memory bandwidth, latency, power consumption and heat. I hope to answer the following questions in this post:

  1. What is the bandwidth difference between 1333MHz with tighter 9-9-9 timings versus 1600MHz at looser 11-11-11 timings?
  2. What is the latency difference between 1333MHz with tighter 9-9-9 timings versus 1600MHz at looser 11-11-11 timings?
  3. Just how much power savings can be realized at 1.35V versus 1.5V?
  4. How much cooler will my DIMMs run at 1.35V versus 1.5V?
  5. What about overclocking my DIMMs to 1867MHz at 12-12-12 timings?
  6. How does memory bandwidth and latency fair with fewer than 4 DIMMs populated on the SandyBridge-EP platform?

A Note On Overclocking

Although there are IvyBridge-EP CPUs that support higher 1866MHz DIMM operation, my SandyBridge-EP based E5-2670s and PC3L-12800 DIMMs do not officially support this. The Supermicro X9SRL-F allows the forcing of 1866MHz at 12-12-12 timings even though the DIMMs don’t have this speed in their SPD tables. Despite being server-grade hardware, this is an ‘overclock’ plain and simple.

I would never consider doing this in a production environment, but in a lab for educational purposes – it was worth a shot. Thankfully, to my surprise, the system appears completely stable at 1866MHz and 1.5V. Because the DIMMs can technically run at 1600MHz with a lower 1.35V, it would seem logical that at 1.5V they’d have additional headroom available. This certainly appears to be true in my case – even with a mixture of three different brands of PC3L-12800R.

With my overclocking success, I decided to include results for 1866MHz operation at 1.5V to see if it’s worth pushing the DIMMs beyond their rated specification.

Bandwidth and Latency

Back in my days as an avid overclocker, increasing memory frequency would generally provide better overall performance than tightening the CAS, RAS and other timings of the modules. Obviously doing both was ideal but if you had to choose between higher frequency and tighter timings, you’d usually come out ahead by sacrificing tighter timings for higher frequencies.

I fully expected the 1600MHz memory running at 11-11-11 timings to be quicker overall than 1333MHz at 9-9-9, but the question was just how much quicker. To test, I booted one of my X9SRL-F systems from a USB stick running Lubuntu 18.04 and used Intel’s handy ‘Memory Latency Checker’ (MLC) tool. MLC allows you to get several performance metrics including peak bandwidth, latency as well as various cross-socket NUMA measurements that can be handy for multi-socket processor systems. I’ll only be looking at a few metrics:

  1. Peak Read Bandwidth
  2. 1:1 Reads/Writes Bandwidth
  3. Idle Latency

memspeed-1

As I expected, despite the looser timings, memory bandwidth increases substantially as the frequency increases. About a 12% boost is obtained going to 1600MHz at 11-11-11 timings. An additional 10% was obtained by overclocking the modules to 1866MHz. Mixed reads and writes get a pretty proportional boost in all three tests.

Continue reading “Low-Voltage DDR3 – Performance, Heat and Power”

New Upgrade Issue in NSX 6.4.4

Be sure to check out VMware KB 67416 before upgrading to 6.4.4.

If you are planning to upgrade to NSX 6.4.4, be sure to have a look at VMware KB 67416 before you do. I’ve seen several customers hit this issue now, and a bit of pre-work before the upgrade can save you a lot of grief.

It appears that if you are using grouping objects, like security groups or IP sets in your ESG firewall rules, there is a chance that your ESG will become unmanageable after NSX Manager gets upgraded to 6.4.4. Most customers will notice this issue when they go to upgrade their ESGs as part of the upgrade process and the tasks fail. In addition to not being able to upgrade the edge, all configuration changes you attempt to make will also fail.

This issue lies in the message bus communication channel between NSX Manager and the ESG. These security groups and IP sets trigger a large number of messages and eventually the channel becomes blocked as a result. Unfortunately, there is no workaround aside from removing these groups and IP sets from the firewall before upgrading. This may not be a feasible workaround for the majority of customers out there.

Although not a common configuration, this issue can also be triggered if DFW rules are applied to ESGs and these rules contain grouping objects.

If you know your environment is configured with security groups and IP sets in the edge firewall, I’d recommend reaching out to VMware technical support prior to beginning your upgrade. Support can proactively install a “hot patch” so that you won’t hit this problem. If you have already hit this, the same hot patch can be applied to get you back up and running. In order for the patch to work, the ESG would have to be re-deployed leading to a brief outage. Obviously getting in front of this issue is a better plan than being reactive.

VMware will be updating the 6.4.4 release notes to reflect this.

NSX-T Troubleshooting Scenario 1 – Solution

Welcome to the first installment of a new series of NSX-T troubleshooting scenarios. Thanks to everyone who took the time to comment on the first half of the scenario. Today I’ll be performing some troubleshooting and will show how I came to the solution.

Please see the first half for more detail on the problem symptoms and some scoping.

Getting Started

As we saw in the first half, the installation of the NSX-T VIBs were failing with the following error:

nsxt-tshoot1a-5

At first glance, it looked as if the NSX-T VIBs, or an older version of them were already installed. Taking a closer look at the actual VIB names, however, was very telling. The ‘esx-nsxv’ in the name denotes that these belong to NSX for vSphere.

Logging in to host esx-a3 via SSH and checking for installed VIBs with ‘nsx’ in the name came back with the following:

[root@esx-a3:~] esxcli software vib list |grep nsx
esx-nsxv                       6.5.0-0.0.8590012                     VMware      VMwareCertified   2018-08-31

Indeed, the NSX-V VIBs are still installed. Having a look at the environment, we saw that all other traces of NSX-V were gone – the manager, controllers, vmkernel ports, portgroups and Web Client plugin were missing. Only these lingering VIBs were not removed from these three hosts for some reason. It’s important to properly remove NSX to prevent issues like this from occurring.

Removing the NSX-V VIBs

The first order of business was to put the host in maintenance mode. I didn’t have any running VMs created yet, so I just went ahead and put all three in maintenance mode:

nsxt-tshoot1b-2

Once that was done, I could remove the VIBs using the following esxcli software vib command:

Continue reading “NSX-T Troubleshooting Scenario 1 – Solution”

NSX-T Troubleshooting Scenario 1

Welcome to the first NSX-T troubleshooting scenario! My NSX-V troubleshooting scenarios have been well received, so I thought it was time to start a new series for NSX-T. If you’ve got an idea for a scenario, please let me know!

What I hope to do in these posts is share some of the common issues I run across from day to day. Each scenario will be a two-part post. The first will be an outline of the symptoms and problem statement along with bits of information from the environment. The second will be the solution, including the troubleshooting and investigation I did to get there.

The Scenario

As always, we’ll start with a brief problem statement:

“I removed NSX for vSphere from my lab environment and am trying to install NSX-T for a proof of concept. Unfortunately, I get an error message every time I try to install the NSX-T VIBs on my ESXi hosts! I’m running NSX-T 2.3.1, and ESXi 6.5 U2”

In the NSX-T UI, we’re greeted with a simple “NSX Install Failed” message for the host esx-a3:

nsxt-tshoot1a

Clicking on this error gives us a much more verbose error message:

nsxt-tshoot1a-5

The full text of the error message is as follows:

NSX components not installed successfully on compute-manager discovered node. Failed to install software on host. Failed to install software on host. esx-a3.vswitchzero.net : java.rmi.RemoteException: [DependencyError] File path of '/bin/net-vdl2' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/bin/vsip_vm_list.sh' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/etc/vmware/firewall/netCPRuleset.xml' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/bin/vsipioctl' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/usr/lib/vmware/vm-support/bin/dump-vdr-info.sh' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/bin/net-vdr' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} File path of '/etc/vmsyslog.conf.d/dfwpktlogs.conf' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/etc/init.d/netcpad' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/usr/lib/vmware/netcpa/bin/netcpa' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/bin/dfwpktlogs.sh' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/etc/vmware/firewall/bfdRuleset.xml' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_nsx-netcpa_2.3.1.0.0-6.5.11294485', 'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012'} File path of '/etc/vmware/vm-support/dfw.mfx' is claimed by multiple non-overlay VIBs: {'VMware_bootbank_esx-nsxv_6.5.0-0.0.8590012', 'VMware_bootbank_nsx-esx-datapath_2.3.1.0.0-6.5.11294337'} Please refer to the log file for more details.

Clicking on the RESOLVE button simply tries the install again, which fails.

Continue reading “NSX-T Troubleshooting Scenario 1”