Lenzker’s #VMware #Horizon Guide (Design): Management & Virtualization Layer

This section of my VDI series is not only relevant for VDI environments. Many of the topics I cover in the following also apply to regular vSphere (cloud-like) environments.

The Management & Virtualization Layer plays an important role within a VDI environment. In EUC solutions we typically focus on the user and the user experience. But before we can deliver a performant, reliable and available working environment, we need to make sure the layer underneath is rock solid as well.

What is the purpose of this layer?

  • Hosting Instances of Virtual Desktops
  • Hosting Instances of Remote Desktop Session Hosts (RDSH)
  • Hosting Management components
    • Components for brokering between Endpoints (Desktop User) and the Virtual Desktops (Horizon View)
    • Components for Integrating with Virtualization Management (Horizon View & vCenter Server)
    • Delivering Applications into the Virtual Desktops (App Volumes)
    • Monitoring of the Environment (e.g. Icinga)
    • Operations Management (vRealize Operations Manager and Log Insight)
    • Management of the Virtual Machines (aka Desktops / RDSH Hosts) and the Hypervisor (vCenter Server)
    • Components for managing security (NSX Manager)

In my design guide I will write about general design decisions regarding the management & virtualization layer. In my implementation guide I will give you hints about the setup of the environment.

The goal of this post is to describe and discuss design implications, design decisions and characteristics of the following products within the core layer, in their most current release:

  • vSphere Layer
  • vCenter Server
  • NSX Manager
  • Horizon View Connection Server

Implementation Details can be found in the corresponding Implementation post I have created.

Separation of Management and Workload

One design goal we always try to achieve: a failure of management components should have minimal impact on the production workload.

One design principle that has proven itself in achieving this goal is the separation of management and workload components (and, of course, avoiding single points of failure wherever possible). The more components you can separate, the more reliable your environment is. A dedicated compute cluster (based on vSphere with a dedicated management vCenter), dedicated storage components and a secondary out-of-band network are a must.

It is important to figure out the required availability level for your management cluster. In the old days the requirements for the workload cluster were much higher than those for the management cluster.

“If I lose my vCenter, my VMs are still running…so who cares…we can fix that afterwards”.

True story, but it must be seen in an overall solution context.

  1. In cloud environments, and in Horizon View environments (the focus of this series), you might run into severe issues when your vCenter is no longer available. If you use Instant Clones or rely on App Volumes to deliver applications into your desktops, you might end up in a situation where users cannot connect to or work with their desktops anymore.
  2. Monitoring and (SLA) reporting components are key components as well. You must place your monitoring instances in different locations, maybe even in a third location that monitors key components over a reliable network. You must be informed if something fails.

Management Capacity – First Draft Sizing

Another thing I want to talk about is a dedicated management cluster and its required capacity. Over the years I have noticed that it is tough to convince customers of the benefits of a separate (and costly) management cluster. Therefore the management cluster is mostly undersized. If you don’t plan your project accordingly, you will end up with many more components on the management cluster than you imagined in the beginning. Let me give you a quick overview of the components that would be required to deliver an environment of up to 4000 desktops (which is the supported number of desktops for a single vCenter in Horizon 7.2).

My recommendation: Create an Excel sheet that you can easily extend and edit and that gives you a solid output about the required capacity. If there is demand, I might share my Excel sheet from the screenshots here and create a dedicated post on those topics.

Some of the parameters are certainly open to discussion, but based on this information I would recommend a management cluster with one of the following configurations:

2 nodes: each node with a dual-socket Intel Xeon with 14 or 16 cores, 256 GB memory, 3.5 TB disk space

4 nodes: each node with a dual-socket Intel Xeon with 10 cores, 96 GB memory, 3.5 TB disk space

Uncertainty-Buffer:

Depending on whether you go for a 2- or 4-node management cluster, the required cluster size will change. I included an uncertainty buffer of 25% for compute. This value is only appropriate if you are 99% sure what the environment will look like in the end.

Over the years I came up with the following rule of thumb for the uncertainty buffer:

  • 99.9% certainty about the management environment: 25%
  • ‘I think I know what I will need in the future’: 50-100%
  • ‘I have been to a class and installed it at home’: 200-300%
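To illustrate the Excel-sheet idea, here is a minimal Python sketch of how such a sizing calculation could work. It sums the demand of all management VMs, applies the uncertainty buffer from the rule of thumb above and derives the number of nodes, plus one spare node for host failures. The VM names and resource values, the 2:1 vCPU-to-core ratio, the node specs and the function name are hypothetical placeholders, not recommendations; plug in your own assessment data.

```python
import math
from dataclasses import dataclass


@dataclass
class ManagementVM:
    """Resource demand of one management component (values below are placeholders)."""
    name: str
    vcpus: int
    memory_gb: float
    disk_gb: float


def size_management_cluster(vms, buffer_pct, cores_per_node, memory_gb_per_node,
                            vcpus_per_core=2.0, spare_nodes=1):
    """Apply the uncertainty buffer and estimate the number of nodes.

    vcpus_per_core is an assumed vCPU:pCore overcommit ratio; spare_nodes
    adds capacity for host failures (N+1 by default).
    """
    factor = 1 + buffer_pct / 100
    vcpus = sum(vm.vcpus for vm in vms) * factor
    memory_gb = sum(vm.memory_gb for vm in vms) * factor
    disk_gb = sum(vm.disk_gb for vm in vms) * factor
    # The node count is driven by whichever resource runs out first.
    nodes_for_cpu = math.ceil(vcpus / vcpus_per_core / cores_per_node)
    nodes_for_memory = math.ceil(memory_gb / memory_gb_per_node)
    nodes = max(nodes_for_cpu, nodes_for_memory) + spare_nodes
    return {"vcpus": vcpus, "memory_gb": memory_gb, "disk_gb": disk_gb, "nodes": nodes}


# Hypothetical management VMs; replace them with the components of your own design.
vms = [
    ManagementVM("vCenter Server Appliance", 8, 24, 500),
    ManagementVM("Connection Server 1", 4, 10, 60),
    ManagementVM("Connection Server 2", 4, 10, 60),
    ManagementVM("NSX Manager", 4, 16, 60),
]

for buffer_pct in (25, 100, 300):
    result = size_management_cluster(vms, buffer_pct, cores_per_node=20, memory_gb_per_node=96)
    print(buffer_pct, result)
```

With these made-up numbers, memory rather than CPU drives the node count once the buffer grows, which is exactly the kind of insight a sizing sheet should make visible.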

Second Draft Sizing

Here is a quick example of a typical environment change (10 days before going into production). We figured out that the first design draft would not give us the reliability the customer wanted to achieve. We needed:

  • A highly available vROPS, since our SLA reporting is done with it
  • A highly available Log Insight as well, since the security team needs the information for auditing purposes
  • A highly available vCenter as well (thank god / VMware for the vCenter Server Appliance in 6.5)
  • Stretched workload clusters (therefore we need to host the vSAN Witness Appliances on the management cluster). How many workload clusters? 4

Minor changes for the customer. No functional impact on the solution, but suddenly those new availability requirements changed the capacity requirements for our management cluster a lot.

Suddenly a 2-node cluster with around 20 cores wasn’t suitable anymore (as I said, in real life we could discuss this once we had done a concrete workload assessment, but I hope you get the point).

4 nodes: each node with a dual-socket Intel Xeon with 14 cores, 196 GB memory, 12 TB disk space

That was just one example of the uncertainties that can easily come true and change everything. Make sure you design a management cluster that is easily scalable, so that you can react to such changes. And trust me… VDI projects are full of stakeholders. Someone will get back to you and force you to add more services onto it ;-) (reporting, security, more Access / Connection Servers).

And have I talked about a production identical Test environment to verify changes before going into production?

Let me quote Ron Burgundy, a great architect, who has been so right in many many situations of my life.

vSphere ESXi host

The ESXi host is installed on a physical server and abstracts the physical resources CPU, memory, storage, network (and GPU) into a consumable virtual format: virtual machines. Within Horizon, multiple virtual desktops run as virtual machines (including operating systems, agents, etc.) on the physical servers.

### Relevant Design Decisions

Recommendation:

  • Use vSphere 6.x to make use of Horizon Instant Clones. Make sure your servers are on the VMware HCL
  • Make sure to have a consistent vSphere Configuration (DNS, NTP must work!!!)
  • If using vSAN make sure your driver / firmware match the VMware HCL
  • Size the host accordingly
  • Use the latest version of VMware Tools
  • Know the mechanisms behind compute. Check out Host Resources Deep Dive by Frank Denneman and Niels Hagoort
  • In the past I made good experiences with allowing inter-VM transparent page sharing (TPS) via Mem.ShareForceSalting = 0 and disabling the backing of guest memory with large pages (Mem.AllocGuestLargePage = 0). So far I haven’t noticed any performance impact. The advantage of disabling large pages right from the beginning is that transparent page sharing works in every memory state of the ESXi host. Therefore you have more predictability about the real memory consumption and can achieve higher VDI densities. Make sure to discuss the security risk of inter-VM page sharing with the customer.

vCenter Server

The vCenter is the management component for all datacenter-related parts of the Horizon environment. Within the vSphere Web Client you can manage ESXi hosts, Distributed Switches, virtual machines, VM templates, etc. The vCenter should be highly available and easily recoverable, since it is the central integration point for many Horizon components and its unavailability might lead to a service outage.

### Relevant Design Decisions

  • Physical or Virtual
    • HAHAHAHA…. yeahhhh….we have reached a point where we don’t discuss this anymore… GO VIRTUAL!!!
  • Type: Appliance or Windows
    • Today there aren’t many reasons left not to use the vCenter Server Appliance. You get rid of the external database requirement, have integrated HA and backup mechanisms, and it just works (in the rare cases that something goes wrong, and they are really rare nowadays, VMware Global Support Services (GSS) is very well equipped to save the day). The last use case I know of is a multi-homed vCenter with multiple network adapters in different networks. There is no supported way to achieve this with the vCenter Server Appliance.
  • Deployment Type: Embedded or Distributed
    • That’s a tough one. The big question here is: do you need Enhanced Linked Mode? What does that mean? All your vCenter Servers are joined to the same SSO domain (which is created during the installation). This shared SSO domain (by default called vSphere.local) allows you to access all vCenter environments in a single user interface via the vSphere Web Client. Remember: a vCenter can scale up to 20,000 running VMs with vCenter 6.5; combined with Horizon, a vCenter is supported up to 4000 VMs (full VMs, Instant Clones or Linked Clones) in Horizon 7.2.
      Now comes the crux: if you want to use Enhanced Linked Mode, you need a distributed installation (one VM running the vCenter Server Appliance and one VM hosting the Platform Services Controller, which includes the SSO domain directory service).
  • Version
    • I am a tech geek :) Go with the latest vCenter version, which is 6.5 at the moment! Especially the performance of both web clients (HTML5 & Flash) and the scalability enhancements have been quite a benefit for Horizon environments. Check out some of the key features here. Please make sure that all other ecosystem parts that interact with the vCenter are compatible with the selected version.
  • Database: Embedded or External
    • If you go with Windows, you will need an external database as soon as your environment grows beyond 20 ESXi hosts or 200 VMs/VDIs, and trust me: that happens sooner than you might expect. In the Windows scenario the most common database is Microsoft SQL Server (if you don’t want to use Oracle).
      If you decide to use the vCenter Server Appliance, you will go with the embedded database, which supports up to 20,000 VMs and 2000 ESXi hosts (don’t forget the Horizon limit of 4000 desktops per vCenter).
  • Deployment Size
    • Depending on the number of VMs/ESXi hosts to be managed, choose the best-fitting deployment size in the wizard
  • High-Availability
    • The vCenter Server Appliance HA mode adds complexity to the solution (during my first try I had a lot of problems in my lab environment). But if you are able to manage that complexity, a highly available vCenter Server is crucial for your Horizon VDI environment.
  • Backup Strategy
    • If you are on Windows, protect the database and the Windows VM. Use established backup mechanisms to make sure you can always recover your vCenter. Within a Horizon environment you really do not want to migrate existing configurations and pools to a newly created vCenter. That’s no fun at all!
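The database rule of thumb above can be written down as a small decision helper. This is a minimal Python sketch; the thresholds are the figures quoted in this post (20 hosts / 200 VMs for the embedded database of a Windows vCenter, 2000 hosts / 20,000 VMs for the appliance), and the function and constant names are hypothetical. Verify the numbers against the configuration maximums of your exact vCenter release.

```python
# Thresholds as quoted in this post; check the configuration maximums of your release.
WINDOWS_EMBEDDED_MAX_HOSTS = 20
WINDOWS_EMBEDDED_MAX_VMS = 200
APPLIANCE_MAX_HOSTS = 2000
APPLIANCE_MAX_VMS = 20000


def vcenter_database(deployment: str, esxi_hosts: int, vms: int) -> str:
    """Return 'embedded' or 'external' for the given vCenter deployment type."""
    if deployment == "appliance":
        if esxi_hosts > APPLIANCE_MAX_HOSTS or vms > APPLIANCE_MAX_VMS:
            raise ValueError("environment exceeds a single vCenter; add another vCenter")
        return "embedded"
    if deployment == "windows":
        if esxi_hosts > WINDOWS_EMBEDDED_MAX_HOSTS or vms > WINDOWS_EMBEDDED_MAX_VMS:
            return "external"  # e.g. Microsoft SQL Server
        return "embedded"
    raise ValueError(f"unknown deployment type: {deployment!r}")


print(vcenter_database("windows", esxi_hosts=12, vms=350))    # external
print(vcenter_database("appliance", esxi_hosts=40, vms=3500))  # embedded
```

Even a small Windows-based VDI environment crosses the 200-VM mark quickly, which is one more argument for the appliance.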

### Recommendation:

  • Use the vCenter Server Appliance with the embedded database
  • Back up the vCenter with both the integrated backup (transfer to FTP) and an external backup mechanism that leverages the vSphere API for Data Protection (e.g. Veeam). Perform regular test recoveries. If you recover a production vCenter, change the vCenter unique ID (Runtime Settings of the vCenter) to avoid duplicate MAC addresses
  • Make sure all VMware and non-VMware components are compatible with each other. Check the product interoperability matrix and upgrade sequence within VMware’s Knowledge Base.
  • Size the vCenter accordingly. Use the Appliance and follow the sizing guidelines.
  • vCenter and especially the Web Client work much smoother on low-latency storage (-> go All-Flash)
  • For Horizon 7.2 you should not place more than 4000 VMs on a vCenter. Before Horizon 7.2 it was around 2000. vCenter 6.5 has become much more efficient, so maybe even higher numbers are possible. Either way, you can add multiple vCenter Servers to a Horizon instance
  • The VDI vCenter should only manage ESXi hosts with VDI workload. Do not mix server and EUC workloads
  • Do not create an enhanced linked Mode between VDI vCenter and Server vCenter
  • Only rely on the enhanced linked mode when it is really really necessary
  • If you use an external PSC for Enhanced Linked Mode, make sure to deploy multiple PSC instances and put a load balancer in front of them. You cannot protect the PSC with the vCenter HA mechanism of the appliance
  • If you use vRealize Operations Manager, you can reduce the statistics (performance) collection level of the vCenter. In most cases I am not really interested in the last month or year there, since vROPS has all relevant metrics at a 5-minute interval
  • DO NOT just make the vCenter highly available by following the wizard and leave it at that. Play with the feature. Test multiple outage scenarios and figure out how to operate it. Check out Féidhlim O’Leary’s blog about the vCenter HA mechanism. You will learn a lot about functionality, limitations and operations there.
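To get a first feeling for how many VDI vCenter Servers a Horizon instance needs, the 4000-VMs-per-vCenter figure for Horizon 7.2 (around 2000 before) can be turned into a quick calculation. A minimal Python sketch with hypothetical function names; it also spreads the desktops evenly across the resulting vCenters instead of filling the first one up to the limit.

```python
import math

HORIZON_72_MAX_VMS_PER_VCENTER = 4000  # figure quoted in this post for Horizon 7.2


def vcenters_needed(desktops: int, max_vms_per_vcenter: int = HORIZON_72_MAX_VMS_PER_VCENTER) -> int:
    """Minimum number of VDI vCenter Servers for the given number of desktops."""
    if desktops <= 0:
        raise ValueError("desktops must be positive")
    return math.ceil(desktops / max_vms_per_vcenter)


def split_desktops(desktops: int, max_vms_per_vcenter: int = HORIZON_72_MAX_VMS_PER_VCENTER) -> list[int]:
    """Distribute desktops evenly instead of filling the first vCenter to the limit."""
    count = vcenters_needed(desktops, max_vms_per_vcenter)
    base, remainder = divmod(desktops, count)
    # The first `remainder` vCenters each take one extra desktop.
    return [base + 1 if i < remainder else base for i in range(count)]


print(split_desktops(10000))        # [3334, 3333, 3333]
print(split_desktops(10000, 2000))  # pre-7.2 figure: [2000, 2000, 2000, 2000, 2000]
```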

vSphere Cluster

The vSphere Cluster groups multiple ESXi hosts into a single logical unit. The cluster is used to run virtual machines (which will become desktops at some point in their lifecycle) and offers services like the Distributed Resource Scheduler (VM placement based on load and rules) and High Availability (restart of virtual machines after an ESXi or VM failure).

### Relevant Design Decisions

  • High-Availability (HA) enabled/disabled
    • HA makes sure that VMs that went down because of a host failure are restarted automatically on the remaining ESXi hosts. This minimizes impact/downtime and ensures that the desktops come back within a certain amount of time (3-5 minutes). Of course, in certain desktop scenarios (100% stateless / non-persistent desktops with only Instant Clones) it wouldn’t make that much sense.
    • Another HA feature might be quite useful: VM Monitoring. If your VMs are not operational anymore (no I/O activity and no VMware Tools heartbeats), HA detects this failure and resets the VM. That can automatically fix failed desktops that suffered a blue screen.
  • Distributed Resource Scheduler
    • Keep it enabled
  • DRS-Automation Mode
    • Keep it fully automated
  • Resource Pools within the DRS-Cluster
    • Only use resource pools if you actively manage resource entitlement. Never use resource pools for grouping. If you want to group VMs in folders, use the VMs & Templates view.
  • DRS-Automation Level
    • Keep it conservative. I don’t want desktops to vMotion too often. If a user is working on a VM that is being vMotioned, there will always be this sluggish handover second for the end user. They will accept it, but I try to minimize this situation as much as possible. Desktop workloads are very bursty: they typically demand many CPU cycles for a very short amount of time (depending, of course, on the worker’s use case).
  • Enhanced vMotion Compatibility (EVC) – Mode
    • There will be a time when new servers are added to your cluster and you want to migrate VMs via vMotion from the older hosts to the newer hosts. My recommendation to make this switch as smooth as possible: enable the highest possible EVC mode right from the beginning, before any VM is running on the cluster. If you want to enable this mode later on, all VMs on the cluster must be powered off.
  • Admission-Control
    • Keep spare resources for host failures. Use the percentage-based policy and play around with reservations :)
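For the percentage-based admission control policy, a common starting point is to reserve the share of cluster resources that the hosts you want to be able to lose represent. A minimal Python sketch, assuming identically sized hosts (with mixed host sizes you would base the value on the largest hosts instead); the function name is hypothetical.

```python
def admission_control_percentage(hosts: int, host_failures_to_tolerate: int = 1) -> int:
    """Percentage of CPU/memory to reserve so the cluster survives the given host failures.

    Assumes all hosts have identical capacity; the result is rounded up.
    """
    if host_failures_to_tolerate < 1 or hosts <= host_failures_to_tolerate:
        raise ValueError("need more hosts than tolerated host failures")
    # Integer ceiling of 100 * failures / hosts, avoiding float rounding issues.
    return (100 * host_failures_to_tolerate + hosts - 1) // hosts


for hosts in (4, 8, 16):
    print(hosts, admission_control_percentage(hosts))  # 25, 13, 7
```

Note how the reserved share shrinks as the cluster grows; in small clusters, one spare host is a big chunk of the total capacity.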

Recommendation:

  • Use an automation level of conservative or conservative + 1.
  • Make sure you have redundancy in your HA-Network.
  • If you are using vSAN, the vSAN Network will be used for HA failure detection.
  • If you are using vSAN, configure the isolation response to power off.
  • DO NOT use resource Pools
  • Enable Admission Control

vSAN Cluster

The vSAN Cluster functionality offers scale-out shared storage built from the hosts’ local devices over a vSAN network. vSAN eliminates the need for a dedicated SAN / NAS and is included with Horizon Advanced or higher.

### Relevant Design Decisions

  • Version
    • vSAN is a great product with a huge customer base nowadays. Nevertheless it is a relatively new technology, and a lot of bugs are still being fixed with every release. Always go with the latest!
  • Type: All-flash or Hybrid
    • I have seen both vSAN setups in VDI environments and both worked quite well / delivered the performance we required for our virtual desktops. Anyway, IMO the time of rotating disks is over.
  • Dedup / Compression enabled & disabled
    • If you are going All-Flash (SSD / NVMe only), the missing cheap hard-disk capacity can be balanced by enabling dedup / compression. Within a VDI environment we typically have many VMs with a huge identical data set. You can expect dedup/compression ratios of 4x-7x. Anyway: closely monitor and manage those values
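To translate a dedup/compression ratio into hardware, a rough back-of-the-envelope calculation helps. A minimal Python sketch, assuming a storage policy with FTT=1 mirroring (two copies of every object) and a ratio that applies uniformly to all desktop data; the desktop count, the per-desktop size and the function name are made-up example values.

```python
def physical_capacity_tb(logical_tb: float, dedup_ratio: float, copies: int = 2) -> float:
    """Physical vSAN capacity needed to store `logical_tb` of desktop data.

    copies=2 assumes an FTT=1 mirroring policy; dedup_ratio is the combined
    dedup/compression factor (e.g. 4.0 for 4x).
    """
    if dedup_ratio < 1:
        raise ValueError("dedup_ratio must be >= 1")
    return logical_tb * copies / dedup_ratio


# Made-up example: 1,000 desktops with 40 GB each, i.e. roughly 40 TB of logical data.
logical = 1000 * 40 / 1000
for ratio in (4.0, 7.0):
    print(f"{ratio}x: {physical_capacity_tb(logical, ratio):.1f} TB")
```

The result is raw consumption only; free capacity headroom and slack space for rebuilds still come on top.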

Recommendation:

  • Go with vSAN. vSAN is always included as soon as you choose Horizon Advanced or Enterprise
  • Validate all components (firmware, driver, SCSI-Controller) against the vSAN HCL
  • Verify that all components are made for their purpose (Cache and Capacity Disks)
  • Dedup / compression is only available in an All-Flash vSAN configuration
  • Go All-Flash ;-)
  • Size your disk groups properly. Remember: if you have dedup/compression enabled, the loss of a single capacity disk makes the whole disk group absent, and all data objects of this disk group will be resynced.
  • Dedup / compression is only done within a disk group. The bigger your disk group, the more VM data it holds and the higher the chance that deduplication is successful
  • Design your physical hosts and disk groups in a way that you can easily scale out your disk groups in case you need more space
  • Run a POC with Desktops that are similar to production desktops to get a valid dedup/compression factor
  • Update to vSAN 6.6 to get rid of the multicast network requirement to make your network guys happy.
  • Make sure to have enough free capacity. You don’t want your vSAN storage to fill up. Device usage should be < 70% on every single device. Check this with the Ruby vSphere Console (RVC)
  • Use vROPS 6.6 with the pretty useful vSAN dashboard
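The < 70% rule can be checked per device with a few lines of code. A minimal Python sketch; it assumes you have already collected used and total capacity per capacity device (for example from RVC or vROPS) into a dictionary. The device names and the function name are hypothetical.

```python
def devices_over_threshold(devices: dict[str, tuple[float, float]],
                           threshold_pct: float = 70.0) -> list[tuple[str, float]]:
    """Return (device, usage %) for every device at or above the threshold, fullest first.

    `devices` maps a device name to (used_gb, capacity_gb).
    """
    flagged = []
    for name, (used_gb, capacity_gb) in devices.items():
        usage_pct = 100 * used_gb / capacity_gb
        if usage_pct >= threshold_pct:
            flagged.append((name, usage_pct))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical capacity devices: used GB and total GB per device.
devices = {
    "esx01-capacity1": (560, 800),
    "esx01-capacity2": (400, 800),
    "esx02-capacity1": (600, 800),
}
for name, pct in devices_over_threshold(devices):
    print(f"{name}: {pct:.0f}% used")
```

Checking every single device, rather than only the datastore total, matters because one full device can cause trouble even while the datastore as a whole still looks fine.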

NSX Manager

The NSX Manager is used to integrate security mechanisms into our virtual desktop environment. After preparing the ESXi hosts, you can make use of Guest Introspection services (like existing anti-virus/malware solutions) or micro-segmentation (a dynamic / context-based distributed firewall between virtual desktops). Firewall rules are stored on every single ESXi host, but the rule set is managed and controlled by the NSX Manager.

### Relevant Design Decisions

  • Cross-vCenter NSX
    • If you have multiple vCenter Servers in a large environment, you can configure Cross-vCenter NSX. This way, security policies can be applied across multiple vCenter Servers and their VMs.

Recommendation:

  • Create a dedicated Service User within the SSO-Domain (default: vSphere.local) on the vCenter and assign it to the SSO group Administrators. Use this service account to connect the NSX-Manager with the vCenter.
  • By default, the service user used for registering the NSX Manager with the vCenter is the account that can work in the NSX Manager section of the vSphere Web Client (Networking & Security).
  • If you only want to make use of Guest Introspection services (e.g. for anti-malware/virus) or the distributed firewall, you don’t need the NSX Controllers deployed.
  • Check the product interoperability matrix to make sure you have an NSX version matching the correct vCenter/vSphere and VMware Tools versions.
  • Back up the NSX Manager with its internal mechanisms. Snapshot-based backups caused a lot of problems in the past. I haven’t verified this on the latest release yet (update coming)

2 thoughts on “Lenzker’s #VMware #Horizon Guide (Design): Management & Virtualization Layer”

  • 27. September 2017 at 14:04

    Can you make the excel sheet available please?
    great site

    • 27. September 2017 at 17:53

      Drop me a mail / tweet …

