NSX Troubleshooting Scenario 14 – Solution

Welcome to the fourteenth installment of a new series of NSX troubleshooting scenarios. Thanks to everyone who took the time to comment on the first half of the scenario. Today I’ll be performing some troubleshooting and will show how I came to the solution.

Please see the first half for more detail on the problem symptoms and some scoping.

Getting Started

In the first half, our fictional customer was trying to prevent a specific summary route from being advertised to a DLR appliance using a BGP filter. Every time they added the filter, all connectivity to VMs downstream from that DLR was lost.


The filter appears correct. The summary route is a /21 network that covers all eight /24s assigned to logical switches. You can also see that no GE or LE (greater than or equal to / less than or equal to) values were specified, so only the summary route itself should be matched, not the /24s inside it.
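To make the matching rule concrete, here is a minimal Python sketch of prefix matching. It assumes NSX follows conventional prefix-list semantics: with neither GE nor LE set, only the exact network and length match; otherwise any prefix inside the network whose length falls between GE and LE matches. The `prefix_matches` function is hypothetical and not part of any NSX API.

```python
import ipaddress
from typing import Optional


def prefix_matches(entry: str, route: str,
                   ge: Optional[int] = None, le: Optional[int] = None) -> bool:
    """Hypothetical model of a prefix-list entry match (assumed semantics).

    No GE/LE: the route must equal the entry exactly.
    With GE and/or LE: the route must sit inside the entry and its prefix
    length must lie within [ge, le] (ge defaults to the entry's length,
    le defaults to 32 for IPv4).
    """
    net = ipaddress.ip_network(entry)
    rt = ipaddress.ip_network(route)
    if ge is None and le is None:
        return rt == net  # exact match only
    low = ge if ge is not None else net.prefixlen
    high = le if le is not None else rt.max_prefixlen
    return rt.subnet_of(net) and low <= rt.prefixlen <= high


summary = "172.18.8.0/21"
# The /21 spans exactly eight /24s: 172.18.8.0/24 through 172.18.15.0/24.
subnets = list(ipaddress.ip_network(summary).subnets(new_prefix=24))
print(len(subnets), subnets[0], subnets[-1])  # 8 172.18.8.0/24 172.18.15.0/24

print(prefix_matches(summary, "172.18.8.0/21"))         # True: exact match
print(prefix_matches(summary, "172.18.9.0/24"))         # False: no GE/LE set
print(prefix_matches(summary, "172.18.9.0/24", le=24))  # True once LE 24 is added
```

In other words, the filter as entered should only ever catch the /21 itself, which is why losing every BGP route is so surprising.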


After publishing the changes, we saw that all BGP routes were removed from the DLR. It’s almost as if the filter stopped ALL route prefixes from making it to the DLR rather than just the one specified. Wait, did it?

Let’s refer to the NSX documentation on BGP filters. Under the Configure BGP section, the relevant steps are the following:

<snip>

20. To specify route filtering from a neighbor, click the Add icon in the BGP Filters area.
Caution: A “block all” rule is enforced at the end of the filters.

21. Select the direction to indicate whether you are filtering traffic to or from the neighbor.

22. Select the action to indicate whether you are allowing or denying traffic.

23. Type the network in CIDR format that you want to filter to or from the neighbor.

24. Type the IP prefixes that are to be filtered and click OK.

25. Click Publish Changes.

Well that’s very interesting – an invisible ‘block all’ rule is at the end of the filters! That could certainly do it. With NSX, you can think of the BGP route filter list as a firewall with a default deny rule at the bottom. It is also evaluated from the top down, just like a firewall rule table. If we re-interpret the BGP filter as entered, it would be something like this:

  1. Deny 172.18.8.0/21 from being advertised to the DLR.
  2. Deny ‘ANY’ routes from being advertised to the DLR.

And that’s exactly what we’re seeing – no BGP routes whatsoever are allowed to the DLR, so only connected routes exist in the routing table. To make this work, we’d need something like this instead:

  1. Deny 172.18.8.0/21 from being advertised to the DLR.
  2. Permit ‘ANY’ routes from being advertised to the DLR.
  3. Deny ‘ANY’ routes from being advertised to the DLR.
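Both versions of the filter can be modelled with a short Python sketch: rules are checked top-down, the first match decides, and anything that falls through hits the hidden block-all rule. This is an illustrative model, not NSX code. The `FilterRule` and `evaluate` names are hypothetical, ‘ANY’ is assumed to behave like 0.0.0.0/0 with LE 32, and the advertised routes are a sample taken from this scenario.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FilterRule:
    """Hypothetical model of one BGP filter entry."""
    action: str               # "permit" or "deny"
    network: str              # CIDR, e.g. "172.18.8.0/21"
    ge: Optional[int] = None  # optional minimum prefix length
    le: Optional[int] = None  # optional maximum prefix length

    def matches(self, route: str) -> bool:
        net = ipaddress.ip_network(self.network)
        rt = ipaddress.ip_network(route)
        if self.ge is None and self.le is None:
            return rt == net  # no GE/LE: exact match only
        low = self.ge if self.ge is not None else net.prefixlen
        high = self.le if self.le is not None else rt.max_prefixlen
        return rt.subnet_of(net) and low <= rt.prefixlen <= high


def evaluate(rules: List[FilterRule], route: str) -> str:
    """Top-down, first-match evaluation with an implicit block-all at the end."""
    for rule in rules:
        if rule.matches(route):
            return rule.action
    return "deny"  # the invisible, non-removable 'block all' rule


# Sample of routes advertised toward the DLR, including the unwanted /21.
advertised = ["0.0.0.0/0", "10.61.0.0/24", "172.16.1.0/24",
              "172.18.8.0/21", "192.168.1.0/24"]

as_entered = [FilterRule("deny", "172.18.8.0/21")]
corrected = [FilterRule("deny", "172.18.8.0/21"),
             FilterRule("permit", "0.0.0.0/0", le=32)]  # assumed 'ANY' catch-all

for name, rules in (("as entered", as_entered), ("corrected", corrected)):
    allowed = [r for r in advertised if evaluate(rules, r) == "permit"]
    print(f"{name}: {allowed}")
```

Running it prints an empty list for the filter as entered, and every sample route except the /21 for the corrected filter.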

Notice that the deny all filter still exists – it can’t be seen, and it can’t be removed. The key thing we need to do is add the ‘catch-all’ permit above it to ensure that all the other BGP routes are still allowed to the DLR. Let’s change this:


Using the ‘ANY’ prefix rather than a specific range works best for a catch-all filter. This time, connectivity was not lost after publishing. Let’s see what the DLR’s routing table looks like now:

mercury-dlr.mercury.local-0> sh ip route
Total number of routes: 15

Codes: O - OSPF derived, i - IS-IS derived, B - BGP derived,
C - connected, S - static, L1 - IS-IS level-1, L2 - IS-IS level-2,
IA - OSPF inter area, E1 - OSPF external type 1, E2 - OSPF external type 2,
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2

B       0.0.0.0/0            [200/0]       via 172.18.8.1
B       10.61.0.0/24         [200/0]       via 172.18.8.1
B       10.99.99.0/27        [200/0]       via 172.18.8.1
B       172.16.1.0/24        [200/0]       via 172.18.8.1
B       172.16.11.0/24       [200/0]       via 172.18.8.1
B       172.16.15.0/24       [200/0]       via 172.18.8.1
B       172.16.76.0/24       [200/0]       via 172.18.8.1
B       172.17.0.0/27        [200/0]       via 172.18.8.1
B       172.18.0.0/27        [200/0]       via 172.18.8.1
C       172.18.8.0/24        [0/0]         via 172.18.8.4
C       172.18.9.0/24        [0/0]         via 172.18.9.1
C       172.18.10.0/24       [0/0]         via 172.18.10.1
C       172.18.11.0/24       [0/0]         via 172.18.11.1
C       172.18.12.0/24       [0/0]         via 172.18.12.1
B       192.168.1.0/24       [200/0]       via 172.18.8.1

Perfect. All other routes are retained, but the /21 is no longer in the DLR’s routing table.
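If you would rather verify the result programmatically than by eye, a short Python script can parse the route entries from the output above. This is a sketch that assumes the exact `sh ip route` layout shown in this post; the regex and the `parse_routes` helper are hypothetical and not part of any NSX tooling.

```python
import ipaddress
import re

# Route entries copied from the 'sh ip route' output above (legend omitted).
SHOW_IP_ROUTE = """\
Total number of routes: 15
B       0.0.0.0/0            [200/0]       via 172.18.8.1
B       10.61.0.0/24         [200/0]       via 172.18.8.1
B       10.99.99.0/27        [200/0]       via 172.18.8.1
B       172.16.1.0/24        [200/0]       via 172.18.8.1
B       172.16.11.0/24       [200/0]       via 172.18.8.1
B       172.16.15.0/24       [200/0]       via 172.18.8.1
B       172.16.76.0/24       [200/0]       via 172.18.8.1
B       172.17.0.0/27        [200/0]       via 172.18.8.1
B       172.18.0.0/27        [200/0]       via 172.18.8.1
C       172.18.8.0/24        [0/0]         via 172.18.8.4
C       172.18.9.0/24        [0/0]         via 172.18.9.1
C       172.18.10.0/24       [0/0]         via 172.18.10.1
C       172.18.11.0/24       [0/0]         via 172.18.11.1
C       172.18.12.0/24       [0/0]         via 172.18.12.1
B       192.168.1.0/24       [200/0]       via 172.18.8.1
"""

# One route per line: <code> <prefix> [<admin distance>/<metric>] via <next hop>
ROUTE_RE = re.compile(r"^(\S+)\s+(\d+\.\d+\.\d+\.\d+/\d+)\s+\[\d+/\d+\]\s+via\s+(\S+)")


def parse_routes(text):
    """Return (code, network, next_hop) tuples for every route line."""
    routes = []
    for line in text.splitlines():
        m = ROUTE_RE.match(line)
        if m:
            code, prefix, next_hop = m.groups()
            routes.append((code, ipaddress.ip_network(prefix), next_hop))
    return routes


routes = parse_routes(SHOW_IP_ROUTE)
reported = int(re.search(r"Total number of routes: (\d+)", SHOW_IP_ROUTE).group(1))
summary = ipaddress.ip_network("172.18.8.0/21")

bgp = [net for code, net, _ in routes if code == "B"]
connected_in_summary = [net for code, net, _ in routes
                        if code == "C" and net.subnet_of(summary)]

print(len(routes) == reported)  # True: all 15 routes parsed
print(summary in bgp)           # False: the /21 was filtered
print(len(bgp))                 # 10 BGP routes still learned
print([str(n) for n in connected_in_summary])
```

The last line lists the five connected /24s that sit inside the filtered /21, which the DLR still reaches directly.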

Conclusion

Even if the UI looks straightforward, there is sometimes value in referring to the documentation for more context!

I hope this scenario was helpful. If you have any questions or have suggestions for future scenarios, please feel free to leave a comment below or reach out to me on Twitter (@vswitchzero).

2 thoughts on “NSX Troubleshooting Scenario 14 – Solution”

  1. Hi Mike,

    It is nice to see that someone else has thought of using BGP filters to permit or deny certain subnets being propagated/advertised southbound as well as northbound.
    This is exactly what I have done with my BGP-based environment and Juniper SRX L3 switches.
    While preparing for my VCIX-NV exam I covered BGP filters, and that is where I learned about the little catch you were talking about.

    P.S. May I present you with another issue I am facing now? I am investigating ECMP path distribution and need some help/clarification on the subject.

    Thank you

    Arthur
    VCIX-NV
