#vRealize Operations and #PowerCLI: better together OR distributed switch health check alerts. How to easily identify a misconfigured physical network

This week I worked a lot with vRealize Operations (vRops). vRops is a great tool when we need quick and easy insight into our environment, especially when the environment is a little bit larger. This post should highlight why the ecosystem (products and community) around VMware is so great and how it helps us to choose the right tool for the right purpose.

In a customer’s environment we gathered the following vRops alerts:

“The MTU on the Distributed Switch is not allowed by one or more VLANs on the host”

“One or more VLANs are unsupported by the physical switch”

These alerts are based on a default alert definition in vRops. Its symptoms are only triggered when the virtual distributed switch (vDS) health check is active.

### Personal Comment: The vDS health check “runs regular health checks to examine certain settings on the distributed and physical switches to identify common errors in the networking configuration. The default interval between two health checks is 1 minute.” The checks highlight misconfigurations of MTU, VLAN and NIC teaming. Because this check sends a large number of broadcasts from different MAC addresses in every VLAN, I want to quote the following sample configuration from VMware’s KB: “A DVS having 2 uplinks, with 35 VLANs across 60 hosts, [creates] 4200 MAC table entries on the upstream physical switch”. In the past I have seen network issues occur (or at least the responsible network team complaining to me) after this feature had been left enabled for a long time, with older network hardware. Use the setting as a one-time check during initial deployments or during internal reviews/health checks, and disable it afterwards. #### Comment-End
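The KB figure can be reproduced with a quick calculation. This is only a sketch: it assumes the number of MAC table entries scales as uplinks × VLANs × hosts, which is what the quoted numbers imply, and the variable names are my own.

```powershell
# Sample values taken from the KB quote above
$uplinks = 2
$vlans   = 35
$hosts   = 60

# Assumption: one MAC table entry per uplink, per VLAN, per host
$macTableEntries = $uplinks * $vlans * $hosts
$macTableEntries
```

Plugging in your own uplink, VLAN and host counts gives a rough idea of the load on the upstream switch before the health check is enabled.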

Of course, a similar alert is also triggered in vCenter. But if you have vRealize Operations, I would definitely recommend using it as the primary tool for environment analysis and alert identification, especially if you have multiple vCenters, linked or unlinked.

What I would like to have right now is a list of servers, their network adapters, and the VLAN configurations that are missing on the physical switch for each network adapter.

Within vRealize Operations we are able to create views and use them for reports or dashboards. They are very powerful, but at the same time we may run into limitations when we create those views on our own. For example, we can only use data that is collected by the vSphere adapter within vRealize Operations.

To check the available data within vRops I typically choose the environment view and look for suitable metrics or properties that I want to use for views/reports/dashboards.

Looking at the environment, we see that vRops only holds a true/false value for the unsupported VLANs and the MTU mismatch. That is not enough for our use case.

But you know what? Getting complex data together has always been something I like to achieve with PowerCLI.

What do I love about PowerCLI? The possibilities for operations and reporting are nearly endless. What’s even better? For nearly everything, someone has already posted a solution. If the solution does not quite fit, you can easily adjust the script to your own needs.

Since I was looking for a report with information about the vDS health check results, I found that once again Luc Dekens had already helped out the community here.

The output of the script was not exactly what I was looking for, but it was the best starting point to extend with further details. With some PowerShell trial and error I figured out which property to use to get the untrunked VLANs as well.

Luc’s script uses the ExtensionData of the vDS object.

 $vds.ExtensionData.Runtime.HostMemberRuntime | %{

        $_.HealthCheckResult ....

So I tried to figure out what other information is stored within the ExtensionData.

Besides the TrunkedVlan property that Luc was using, another property, UntrunkedVlan, also existed.
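One way to discover such properties is to list what an object exposes. The sketch below uses a hand-built stand-in object so that it runs without a vCenter connection; the stand-in and all of its values are hypothetical and only roughly mimic a VLAN health check result.

```powershell
# Hypothetical stand-in for one VLAN health check result; the values are made up.
$sampleResult = [pscustomobject]@{
    UplinkPortKey = '128'
    Summary       = 'example'
    TrunkedVlan   = @([pscustomobject]@{ Start = 10; End = 20 })
    UntrunkedVlan = @([pscustomobject]@{ Start = 30; End = 30 })
}

# List every property name the object exposes
$propertyNames = $sampleResult.PSObject.Properties.Name
$propertyNames

# With a connected PowerCLI session the same idea would look like this (not run here):
# $vds.ExtensionData.Runtime.HostMemberRuntime |
#     ForEach-Object { $_.HealthCheckResult } |
#     Get-Member -MemberType Properties
```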

Adding this information to the script, plus a small query to look up the uplink data for a given uplink port key, led to the following script.

# Target file for the CSV report
$File = "C:\TEMP\NetworkHealthCheck.csv"
$vdsSwitches = Get-VDSwitch
$output = @()

foreach ($vds in $vdsSwitches) {
    # Walk the per-host runtime data and keep only the VLAN health check results
    $result = $vds.ExtensionData.Runtime.HostMemberRuntime | ForEach-Object {
        $_.HealthCheckResult | Where-Object { $_ -is [VMware.Vim.VMwareDVSVlanHealthCheckResult] } |
        Select-Object @{N='vdSwitch'; E={ $vds.Name }},
            @{N='ESXi Host'; E={
                # Resolve the uplink port key to the host that owns the port
                $keyNo = $_.UplinkPortKey
                $port = $vds | Get-VDPort | Where-Object { $_.Key -eq $keyNo }
                (Get-VMHost -Id "HostSystem-$($port.ExtensionData.ProxyHost.Value)").Name
            }},
            @{N='vDS Port - VMNIC'; E={
                # The uplink port that belongs to this health check result
                $keyNo = $_.UplinkPortKey
                $vds | Get-VDPort | Where-Object { $_.Key -eq $keyNo }
            }},
            @{N='TrunkedVLAN'; E={
                # Render each range as "10" or "10-20" and join the ranges with commas
                ($_.TrunkedVLAN | ForEach-Object {
                    if ($_.Start -eq $_.End) {
                        "{0}" -f $_.Start
                    }
                    else {
                        "{0}-{1}" -f $_.Start, $_.End
                    }
                }) -join ','
            }},
            @{N='UnTrunkedVLAN'; E={
                # Same formatting for the VLANs missing on the physical switch
                ($_.UntrunkedVLAN | ForEach-Object {
                    if ($_.Start -eq $_.End) {
                        "{0}" -f $_.Start
                    }
                    else {
                        "{0}-{1}" -f $_.Start, $_.End
                    }
                }) -join ','
            }}
    }
    $output += $result
}
$output | Export-Csv -Path $File -NoTypeInformation
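As a side note, the range formatting appears twice in the script, once for the trunked and once for the untrunked VLANs. One possible refactoring is to pull it into a small helper. `Format-VlanRange` is a name I made up for this sketch, not an existing cmdlet; it only relies on the input objects having `Start` and `End` properties, so it can be tried with plain stand-in objects.

```powershell
function Format-VlanRange {
    param(
        # Objects exposing Start and End, e.g. the entries of TrunkedVlan or UntrunkedVlan
        [object[]]$Range
    )
    $parts = foreach ($r in $Range) {
        if ($r.Start -eq $r.End) {
            "{0}" -f $r.Start                 # single VLAN
        }
        else {
            "{0}-{1}" -f $r.Start, $r.End     # VLAN range
        }
    }
    $parts -join ','
}

# Try it with stand-in objects instead of live health check data
$sample = @(
    [pscustomobject]@{ Start = 10; End = 10 },
    [pscustomobject]@{ Start = 20; End = 25 }
)
$formatted = Format-VlanRange -Range $sample
$formatted
```

Inside the calculated properties, the expression would then shrink to something like `E={ Format-VlanRange -Range $_.UntrunkedVlan }`.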

Running the script created the following output.

The whole setup, code change and execution took me around 15 minutes. I really like this scenario because it shows how the VMware ecosystem works.

Great tools (vRealize Operations, PowerCLI, vSphere), an awesome community (thanks Luc) and a little knowledge of how to bring those pieces together are enough to solve a business/customer problem. The tools are out there in our ecosystem. The biggest constraint will always be our knowledge, so keep on learning new things.

I can promise you from past experience: fixing the untrunked VLAN problem upfront, before any impact has been observed, will save you a lot of time, increase service availability and reduce the stress of weird troubleshooting sessions ;-).
