NSX and nested ESXi environments: caveats and layer-2 troubleshooting

After getting NSX running in a nested environment, I started last week to build an NSX environment spanning my physical and nested ESXi hosts. To be honest, this turned out to be more complicated than I had expected. Still, it was a good trip to improve my NSX troubleshooting skills, and maybe the key findings will help someone else avoid the problems I had.

On a logical level, my goal was pretty straightforward. I have 3 physical (vSAN) ESXi hosts running n nested ESXi hosts. All of them are managed by a single vCenter and should be part of a single transport zone in which n VXLANs (an incredible number of them) will be deployed.

Logical_design

When it came to the physical implementation of the logical design, it looked pretty similar to the following figure. The example shows 2 nested ESXi hosts running on my physical ESX01, while another nested ESXi host runs on ESX02. My transport VLAN 30 (for VXLAN) is configured on the physical switch and as a VLAN trunk on the distributed / NSX vSwitch of the physical hosts. That’s where our VXLAN frames will flow between the nested and physical hosts. Of course, the MTU size was increased end-to-end across the whole environment.

physical_design_overall
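
As a side note on why the MTU has to be raised at all: VXLAN wraps each guest frame in additional headers, so the transport network needs room for them. A minimal sketch of the arithmetic, assuming an IPv4 underlay (the function name is made up for illustration):

```shell
# Transport MTU needed so that a guest packet of the given inner MTU fits
# into a single VXLAN packet. Added on top of the inner packet:
#   14 bytes inner Ethernet header
#    8 bytes VXLAN header
#    8 bytes UDP header
#   20 bytes outer IPv4 header (no options)
vxlan_transport_mtu() {
    echo $(( $1 + 14 + 8 + 8 + 20 ))
}

vxlan_transport_mtu 1500   # prints 1550
```

This is why 1600 is the commonly recommended minimum transport MTU for VXLAN with standard 1500-byte guests; it leaves some headroom, for example for a VLAN tag on the inner frame.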

In theory everything should work fine with this setup… but well… it didn’t, and that’s where the fun part began. L2 connectivity between VMs on my VXLANs was not working as expected. Sometimes my virtual machines on a specific VXLAN could reach each other, sometimes they couldn’t, which is not what you expect from a robust protocol like VXLAN. So it was time to go through all the stuff we learned at the (NSX) academy.

There are a lot of great resources to check / test / troubleshoot problems within the virtual / physical network, e.g. Roi Ben Haim’s great collection of useful L2 troubleshooting tools.

One thing that really confused me was that vmkping (ping via a specific VMkernel port, in this case the VTEP VMkernel port) worked fine with jumbo frames in all combinations:

  • Nested – nested (same ESXi)
  • Nested – nested (different ESXi)
  • Nested – physical

vmkping ++netstack=vxlan <vtepvmk IP> -d -s 8972
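
The payload size of 8972 follows from the 9000-byte MTU: 20 bytes are needed for the IPv4 header and 8 for the ICMP header, and -d sets the don't-fragment bit so an undersized link shows up as a failed ping. A small sketch for deriving the value for other MTUs (the function name is hypothetical):

```shell
# Largest ICMP echo payload that fits into one unfragmented IPv4 packet:
# MTU minus 20 bytes IPv4 header (no options) minus 8 bytes ICMP header.
vmkping_payload() {
    echo $(( $1 - 20 - 8 ))
}

vmkping_payload 9000   # prints 8972, the value used above
vmkping_payload 1600   # prints 1572, for a 1600-byte transport MTU
```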

All relevant NSX tables (MAC, ARP, VTEP) contained valid data, and the increased log level of netcpa showed me that the relevant information had been exchanged between the ESXi hosts and the NSX Controller.
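
For reference, the three tables can be read on an ESXi host through the NSX-v esxcli namespace. The sketch below only prints the commands so they can be reviewed first; the function name, the switch name Compute_VDS and the VNI 5000 are placeholders, not values from my environment:

```shell
# Print the esxcli commands that list the MAC, ARP and VTEP tables of one
# logical switch (VNI) as seen by the local host.
# $1 = name of the distributed switch prepared for VXLAN, $2 = VNI
vxlan_table_cmds() {
    for table in mac arp vtep; do
        echo "esxcli network vswitch dvs vmware vxlan network $table list --vds-name=$1 --vxlan-id=$2"
    done
}

vxlan_table_cmds Compute_VDS 5000
```

To actually run them on a host, the output can be piped into a shell: `vxlan_table_cmds Compute_VDS 5000 | sh`.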

So the problem had to be somewhere else in the network stack. Step by step I figured out in which combinations the VXLAN connectivity worked and in which it did not. I abstracted them into the following 2 scenarios:

      1. Scenario: 2 nested ESXi on a single physical ESXi

In this scenario, VMs on any VXLAN could communicate with each other, regardless of which ESXi host (nested or physical) they were running on.

01_switches_interal_working

       2. Scenario: 3 nested ESXi on two physical ESXi

This combination is where it got complicated (and, to be honest, it is the scenario where it really matters that it works). As soon as a VXLAN frame needed to leave the physical host, connectivity stopped working.

02_switches_interhost_notworking

After creating some test VMs in VXLANs on both types of hosts, I checked for dropped packets in esxtop and found NOTHING. Luckily, there is another network troubleshooting tool included in vSphere that I hadn’t used for quite a while: pktcap-uw. This little tool lets us monitor ESXi traffic at very specific points within the network stack (and store it as a pcap file for analysis in e.g. Wireshark). It can also show dropped packets via

pktcap-uw --capture Drop

So in my environment I looked for dropped packets within my transport VLAN 30 with

pktcap-uw --capture Drop --vmk vmk1 --ng --vlan 30

and received an interesting output:

“… Captured at Drop point, Drop Reason ‘VXLAN Module Drop’. Drop Function ‘OverlayWrapperUplinkOutputCB’ …”

pktcap-uw-screen
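
When many frames are dropped, it helps to tally the drop reasons instead of reading every line. A minimal sketch, assuming each dropped frame produces a line containing `Drop Reason '<text>'` with plain single quotes, as in the fragment quoted above; the function name and the sample lines are made up for illustration:

```shell
# Count how often each "Drop Reason" appears in pktcap-uw text output
# read from stdin, most frequent reason first.
count_drop_reasons() {
    sed -n "s/.*Drop Reason '\([^']*\)'.*/\1/p" | sort | uniq -c | sort -rn
}

# Example with two lines modeled on the fragment quoted above:
printf '%s\n' \
    "Captured at Drop point, Drop Reason 'VXLAN Module Drop'. Drop Function 'OverlayWrapperUplinkOutputCB'." \
    "Captured at Drop point, Drop Reason 'VXLAN Module Drop'. Drop Function 'OverlayWrapperUplinkOutputCB'." \
    | count_drop_reasons
```

On a host this could be fed from a bounded capture, for example `pktcap-uw --capture Drop --vlan 30 -c 100 | count_drop_reasons`, where -c limits the number of captured packets so the pipeline terminates.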

So it seems that frames from the nested/virtual world were dropped on the NSX vSwitch of the physical ESXi host.

It seems that ESXi is blocking frames that carry UDP segments with the port used for VXLAN (VMware’s default in this release: 8472; the IANA-assigned port per RFC 7348: 4789). I am still not sure what the exact reason is. If I get more feedback, I will of course add it here.

The only workaround I figured out is to separate the NSX vSwitch (including the VXLAN port groups and the VTEP VMkernel port) on my physical ESXi hosts from another virtual switch that connects the nested ESXi hosts to the transport VLAN. Others have made similar observations (I should have found and read this article a little earlier): Dmitri Kalintsev came to a similar conclusion.

04_switches_seperated
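
One possible way to build such a separate switch from the ESXi shell is sketched below. It assumes a standard vSwitch and a second physical NIC that the NSX vSwitch does not use; the names vSwitchNested, vmnic1 and Nested-Transport are placeholders, and the relaxed security policy is the usual requirement for nested ESXi rather than something specific to this problem. By default the script only prints the commands.

```shell
VSWITCH="vSwitchNested"        # placeholder: new standard vSwitch for the nested ESXi VMs
UPLINK="vmnic1"                # placeholder: physical NIC not used by the NSX vSwitch
PORTGROUP="Nested-Transport"   # placeholder: port group for the nested hosts' VXLAN vNICs
VLAN=4095                      # 4095 passes VLAN tags through (nested hosts tag VLAN 30
                               # themselves); use 30 if they send untagged frames

# Print each command; execute it only when DRY_RUN=0 is set.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

setup_nested_vswitch() {
    run esxcli network vswitch standard add --vswitch-name="$VSWITCH"
    run esxcli network vswitch standard set --vswitch-name="$VSWITCH" --mtu=9000
    run esxcli network vswitch standard uplink add --vswitch-name="$VSWITCH" --uplink-name="$UPLINK"
    # Nested ESXi sends and receives frames for MAC addresses other than its own vNIC's.
    run esxcli network vswitch standard policy security set --vswitch-name="$VSWITCH" \
        --allow-promiscuous=true --allow-forged-transmits=true --allow-mac-change=true
    run esxcli network vswitch standard portgroup add --vswitch-name="$VSWITCH" --portgroup-name="$PORTGROUP"
    run esxcli network vswitch standard portgroup set --portgroup-name="$PORTGROUP" --vlan-id="$VLAN"
}

setup_nested_vswitch
```

Run as shown, this prints the six commands; with DRY_RUN=0 set it executes them on the physical host. Afterwards the VXLAN-facing vNICs of the nested ESXi VMs are moved to the new port group, while the VTEP VMkernel port of the physical host stays on the NSX vSwitch.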

So if you want to integrate your physical ESXi cluster with your nested ones, keep this specific dropping behavior in mind. Especially in Intel NUC scenarios with only a single network adapter, a different workaround would be required (please comment with any suggestions or workarounds to avoid the frame drops).

So… enjoy your NSX environment at home and bring the knowledge you gain into your organization to benefit from this really nice piece of technology.

If you want to know WHY you should learn and HOW you can learn more about NSX, check my related blog posts.

2 thoughts on “NSX and nested ESXi environments: caveats and layer-2 troubleshooting”

  • 21. July 2016 at 17:40

    Hi,

    Could you maybe break out a section of the post to detail exactly what must be done to get the nested esxi’s to communicate between hosts using NSX? I see you may have gotten it to work and you troubleshooted a bit, but it’s not exactly clear in the end section what you accomplished, and how it was accomplished.

    Thank you!
    -Nick

    • 18. September 2016 at 15:30

      Hi Nick,

      Sorry for the late reply. I have been quite busy and your comment got forgotten in my inbox. I cannot guarantee that I will be able to work out a dedicated post on that topic. There are a lot of great sources on how to set up the switches within vSphere. I tried to explain on a physical level what must be done (I did not focus on how it’s done).

      In the end, just make sure that the virtual switch to which your nested ESXi is connected (for VXLAN communication) is not the same switch on which you configure VXLAN on your physical ESXi.
