NSX-T Troubleshooting Scenario 2

Welcome to the second NSX-T troubleshooting scenario! What I hope to do in these posts is share some of the common issues I run across from day to day. Each scenario will be a two-part post. The first will be an outline of the symptoms and problem statement along with bits of information from the environment. The second will be the solution, including the troubleshooting and investigation I did to get there.

The Scenario

As always, we’ll start with a fictional customer problem statement:

“I’ve just deployed a new NSX-T 2.3.1 environment with two tenants. The T1 routers (one per tenant) appear to be working fine. I have VM to VM connectivity on logical switches, but I can’t get to any northbound networks. The non-NSX core router isn’t getting any of the NSX routes!”

Taking a quick look at the environment, we can see that each tenant T1 router has several logical switches attached. Each is advertising four subnets as can be seen below:

nsxt-tshoot2a-2

You can also see that the ‘Advertise All NSX Connected Routes’ option is enabled, which should cause these routes to be advertised to the T0.

nsxt-tshoot2a-3

On the T0, we can see that there are ‘Linked Ports’ to both T1 routers, as well as a VLAN-backed logical switch for northbound communication via edge-e1. Let’s start by ensuring that these routes are actually making it to the T0 SR.

From the edge CLI, I start by listing all logical router instances to determine the VRF for the T0 SR:

edge-e1> get logical-router
Logical Router
UUID                                   VRF    LR-ID  Name                              Type                        Ports
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      3
d800704e-8b8f-4fe1-bd41-fd8536056240   1      1      DR-t0-router                      DISTRIBUTED_ROUTER_TIER0    5
3d5e5d06-8506-476e-b94e-42bee00ff1ce   2      2      SR-t0-router                      SERVICE_ROUTER_TIER0        5
9294fbca-8c44-40fd-9ca4-f2ce75b5e036   3      4      SR-metal-t1                       SERVICE_ROUTER_TIER1        5
924d1dd6-8326-49dd-bead-57c6ceff8d08   4      6      SR-air-t1                         SERVICE_ROUTER_TIER1        5
500aa75f-02d6-4cff-8e2f-22913e625d5c   5      5      DR-air-t1                         DISTRIBUTED_ROUTER_TIER1    7
6ac704b1-2079-4c04-8961-90f689f7ee4e   6      3      DR-metal-t1                       DISTRIBUTED_ROUTER_TIER1    7
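
If you need to repeat this lookup across several edges, the table is easy to parse. The following is only a rough Python sketch that works on the output pasted above; the function name `find_vrf` is my own invention and not part of any NSX-T tooling, and it assumes the column layout shown here (VRF second, Type second to last).

```python
# Output of "get logical-router" copied from the edge-e1 session above.
OUTPUT = """\
UUID                                   VRF    LR-ID  Name                              Type                        Ports
736a80e3-23f6-5a2d-81d6-bbefb2786666   0      0                                        TUNNEL                      3
d800704e-8b8f-4fe1-bd41-fd8536056240   1      1      DR-t0-router                      DISTRIBUTED_ROUTER_TIER0    5
3d5e5d06-8506-476e-b94e-42bee00ff1ce   2      2      SR-t0-router                      SERVICE_ROUTER_TIER0        5
9294fbca-8c44-40fd-9ca4-f2ce75b5e036   3      4      SR-metal-t1                       SERVICE_ROUTER_TIER1        5
924d1dd6-8326-49dd-bead-57c6ceff8d08   4      6      SR-air-t1                         SERVICE_ROUTER_TIER1        5
500aa75f-02d6-4cff-8e2f-22913e625d5c   5      5      DR-air-t1                         DISTRIBUTED_ROUTER_TIER1    7
6ac704b1-2079-4c04-8961-90f689f7ee4e   6      3      DR-metal-t1                       DISTRIBUTED_ROUTER_TIER1    7
"""


def find_vrf(output, router_type):
    """Return the VRF IDs of all rows whose Type column equals router_type.

    Hypothetical helper: assumes data rows start with a UUID, that VRF is
    the second column and that Type is the second-to-last column.
    """
    vrfs = []
    for line in output.splitlines():
        fields = line.split()
        is_data_row = len(fields) >= 5 and fields[0].count("-") == 4
        if is_data_row and fields[-2] == router_type:
            vrfs.append(int(fields[1]))
    return vrfs


print(find_vrf(OUTPUT, "SERVICE_ROUTER_TIER0"))  # [2]
```

Indexing the Type column from the right keeps the parser working for the TUNNEL row, which has an empty Name column.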

Looks like the T0 SR is VRF 2. Next, we’ll check the routing table:

edge-e1> vrf 2
edge-e1(tier0_sr)> get route

Flags: c - connected, s - static, b - BGP, ns - nsx_static
nc - nsx_connected, rl - router_link, t0n: Tier0-NAT, t1n: Tier1-NAT
t1l: Tier1-LB VIP, t1s: Tier1-LB SNAT

Total number of routes: 17

b 0.0.0.0/0 [20/0] via 10.99.99.9
c 10.99.99.0/27 [0/0] via 10.99.99.10
rl 100.64.48.0/31 [0/0] via 169.254.0.1
rl 100.64.48.2/31 [0/0] via 169.254.0.1
c 169.254.0.0/28 [0/0] via 169.254.0.2
b 172.16.1.0/24 [20/0] via 10.99.99.9
b 172.16.11.0/24 [20/0] via 10.99.99.9
b 172.16.76.0/24 [20/0] via 10.99.99.9
b 172.16.98.0/24 [20/0] via 10.99.99.9
ns 172.18.9.0/24 [3/0] via 169.254.0.1
ns 172.18.10.0/24 [3/0] via 169.254.0.1
ns 172.18.11.0/24 [3/0] via 169.254.0.1
ns 172.18.12.0/24 [3/0] via 169.254.0.1
ns 172.18.17.0/24 [3/0] via 169.254.0.1
ns 172.18.18.0/24 [3/0] via 169.254.0.1
ns 172.18.19.0/24 [3/0] via 169.254.0.1
ns 172.18.20.0/24 [3/0] via 169.254.0.1
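
A quick way to sanity check a table like this is to tally the routes by their flag and compare against the reported total. This is just an illustrative Python sketch over the output above; `count_by_flag` is a made-up helper name, and it assumes every route line begins with its flag.

```python
from collections import Counter

# Route lines copied from the "get route" output of the T0 SR (VRF 2) above.
ROUTES = """\
b 0.0.0.0/0 [20/0] via 10.99.99.9
c 10.99.99.0/27 [0/0] via 10.99.99.10
rl 100.64.48.0/31 [0/0] via 169.254.0.1
rl 100.64.48.2/31 [0/0] via 169.254.0.1
c 169.254.0.0/28 [0/0] via 169.254.0.2
b 172.16.1.0/24 [20/0] via 10.99.99.9
b 172.16.11.0/24 [20/0] via 10.99.99.9
b 172.16.76.0/24 [20/0] via 10.99.99.9
b 172.16.98.0/24 [20/0] via 10.99.99.9
ns 172.18.9.0/24 [3/0] via 169.254.0.1
ns 172.18.10.0/24 [3/0] via 169.254.0.1
ns 172.18.11.0/24 [3/0] via 169.254.0.1
ns 172.18.12.0/24 [3/0] via 169.254.0.1
ns 172.18.17.0/24 [3/0] via 169.254.0.1
ns 172.18.18.0/24 [3/0] via 169.254.0.1
ns 172.18.19.0/24 [3/0] via 169.254.0.1
ns 172.18.20.0/24 [3/0] via 169.254.0.1
"""


def count_by_flag(table):
    """Count route lines per leading flag (b, c, rl, ns, ...)."""
    return Counter(line.split()[0] for line in table.splitlines() if line.strip())


counts = count_by_flag(ROUTES)
print(sum(counts.values()))  # 17, matching "Total number of routes"
print(counts["ns"])          # 8, four subnets from each of the two T1 routers
```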

Sure enough, we can see all the ‘NSX Connected Routes’ from the T1 routers. Interestingly, we also see the 172.16.x.x routes advertised by the core router, so clearly these two are peering successfully. How about the physical router?

vyos@router-core:~$ sh ip route
Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
B - BGP, > - selected route, * - FIB route

S>* 0.0.0.0/0 [1/0] via 172.16.1.12, eth0.1
C>* 10.99.99.0/27 is directly connected, eth0.2005
C>* 127.0.0.0/8 is directly connected, lo
C>* 172.16.1.0/24 is directly connected, eth0.1
C>* 172.16.11.0/24 is directly connected, eth0.11
C>* 172.16.76.0/24 is directly connected, eth0.76
C>* 172.16.98.0/24 is directly connected, eth0.98
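
To make the comparison explicit, here is a small Python sketch that checks which of the eight tenant prefixes seen on the T0 SR are present on the core router, and whether the core has learned anything via BGP. The helper name `installed_routes` is hypothetical, and the parsing assumes the `sh ip route` layout shown above, where the route code comes first and the prefix follows.

```python
import re

# "sh ip route" output copied from router-core above.
CORE_ROUTES = """\
Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
B - BGP, > - selected route, * - FIB route

S>* 0.0.0.0/0 [1/0] via 172.16.1.12, eth0.1
C>* 10.99.99.0/27 is directly connected, eth0.2005
C>* 127.0.0.0/8 is directly connected, lo
C>* 172.16.1.0/24 is directly connected, eth0.1
C>* 172.16.11.0/24 is directly connected, eth0.11
C>* 172.16.76.0/24 is directly connected, eth0.76
C>* 172.16.98.0/24 is directly connected, eth0.98
"""

# Tenant prefixes taken from the T0 SR routing table shown earlier.
TENANT_PREFIXES = [
    "172.18.9.0/24", "172.18.10.0/24", "172.18.11.0/24", "172.18.12.0/24",
    "172.18.17.0/24", "172.18.18.0/24", "172.18.19.0/24", "172.18.20.0/24",
]

PREFIX_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}/\d{1,2}\b")


def installed_routes(output):
    """Map each installed prefix to its route code (for example 'C>*')."""
    routes = {}
    for line in output.splitlines():
        match = PREFIX_RE.search(line)
        if match:  # legend lines contain no prefix and are skipped
            routes[match.group(0)] = line.split()[0]
    return routes


routes = installed_routes(CORE_ROUTES)
missing = [p for p in TENANT_PREFIXES if p not in routes]
bgp_learned = [p for p, code in routes.items() if code.startswith("B")]

print(len(missing))   # 8
print(bgp_learned)    # []
```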

It doesn’t look like we’re getting anything at all from the T0 router here. Let’s check the route redistribution configuration on the T0:

nsxt-tshoot2a-1

Looks like it’s configured to advertise both Static and NSX Connected routes, just like the T1 routers. If that’s the case, why are we not getting the routes on the core router?

What’s Next

I’ll post the solution in the next day or two, but how would you handle this scenario? Let me know! Please feel free to leave a comment below or via Twitter (@vswitchzero).

One thought on “NSX-T Troubleshooting Scenario 2”

  1. I would start by looking at what the T0 IS announcing in BGP:
    get bgp neighbor x.x.x.x advertised-routes
    and conclude that the NSX static routes are not being announced. That makes sense, because you selected “Static”, which covers only the static routes configured on the T0 itself. You should select NSX Static. In the route table they show up as “ns” -> nsx_static.
