After upgrading to vCenter 6.5, I replaced the Certificate Authority certificate of my (external) Platform Services Controller (PSC) with a certificate signed by ‘flenzquest-enterprise ;-)’.
The tasks to replace the SSL certificates haven’t changed much since version 6.0 and have been documented very well within the community.
After the successful replacement I realized that I had problems with vSphere Replication and NSX. I know that NSX is not supported with vSphere 6.5 yet, but so far the NSX Manager connectivity with vCenter 6.5 had worked pretty well (until I replaced the certificates).
I had a very bad feeling about this issue, and googling it brought an old case to my attention that I thought had been fixed quite a while ago (obviously it hasn’t). I found an old Twitter conversation between me, Frank Büchsel and Feidhlim O’Leary.
I blogged about the fix on my old blog vXpertise.net before it was hacked; unfortunately, that was one of the posts we lost back then. Even though the problem is described by VMware in KB2121701, many people have struggled with the workaround (especially with the Python script LS_Update_certs.py resulting in Updated 0 service (s)).
Since the KB is detailed and well written, I recommend using this post to get a high-level understanding of the solution and then diving into the low-level details of the KB.
Understand the problem
First of all: understand the problem. I am not sure if the problem still occurs because of my specific situation (upgrade from vCSA 5.5 -> 6.0 U1/U2 -> 6.5, possibly different certificates used in the past, etc.).
My explanation might not be 100% technically correct, but its level of abstraction should be enough to understand what the problem is and how it is fixed.
When we roll out a new vCenter Server Appliance and PSC, the certificates are generated by the Certificate Authority of the Platform Services Controller. Since no one knows this guy, no one really trusts him; therefore its certificates are marked red in the browser (and in my graphic).
At the same time, we need to know that a service called the lookup service is running on the PSC. I think of the lookup service as a good old phone directory. It knows the other solutions connected to it and holds some data about them (how to reach them: https://URL; how to recognize them: the SSL certificate of the solution).
You can gather the current lookupservice data via
The following model shows how it should look in a default setup.
So what happens when we replace the certificates of our solutions like vCenter and PSC? Everything appears to work fine, but the lookup service does not update its entries. A solution that verifies whether the actual solution certificate matches the data within the lookup service will now fail -> NSX, SRM, vSphere Replication.
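To make the mismatch tangible, here is a minimal toy model in Python. It is only a sketch of my abstraction, not how the lookup service is actually implemented: the directory is a plain dictionary, the URL is made up, and the "certificates" are placeholder byte strings.

```python
import hashlib


def thumbprint(cert: bytes) -> str:
    """SHA-1 thumbprint of a certificate, as lowercase hex."""
    return hashlib.sha1(cert).hexdigest()


# Placeholder byte strings standing in for real certificates (assumption).
old_cert = b"certificate issued by the PSC's own CA"
new_cert = b"certificate signed by the enterprise CA"

# Toy "phone directory": endpoint URL -> certificate stored at registration.
# The URL is hypothetical.
url = "https://vcenter.example.local/sdk"
lookup_service = {url: old_cert}


def is_trusted(endpoint: str, presented_cert: bytes) -> bool:
    """What a solution like NSX conceptually does: compare the certificate
    the endpoint presents with the one stored in the directory."""
    registered = lookup_service.get(endpoint)
    return registered is not None and thumbprint(registered) == thumbprint(presented_cert)


before = is_trusted(url, old_cert)   # default setup: entry and certificate match
after = is_trusted(url, new_cert)    # certificate replaced, directory entry is stale
lookup_service[url] = new_cert       # the fix: update the directory entry
fixed = is_trusted(url, new_cert)

print(before, after, fixed)
```

The middle check is the broken state: the solution presents its new certificate, but the directory still holds the old one.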
How to fix it
Open a browser and go to
Log in as SSO admin (like firstname.lastname@example.org), remove the content between <filtercriteria>, click on Invoke Method, and voilà,
Use the inspect element icon and copy the whole Base64 certificate.
Copy the data into an empty text file. Name the file psc_old.cer (make sure file extensions are shown) and extract the SSL thumbprint (we will need it later). Copy the thumbprint into another empty text file.
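If you prefer not to click through a certificate viewer, the thumbprint can also be computed with a few lines of Python. This is a sketch under the assumption that the thumbprint meant here is the SHA-1 hash of the decoded certificate (which matches the 40 hex characters used later in this post); the function name is my own.

```python
import base64
import hashlib


def sha1_thumbprint(cert_b64: str) -> str:
    """Return the SHA-1 thumbprint (lowercase hex) of a Base64-encoded certificate.

    Accepts the bare Base64 data or a PEM block with BEGIN/END lines.
    """
    payload_lines = []
    for line in cert_b64.splitlines():
        stripped = line.strip()
        # Skip empty lines and the -----BEGIN/END CERTIFICATE----- markers.
        if stripped and not stripped.startswith("-----"):
            payload_lines.append(stripped)
    der = base64.b64decode("".join(payload_lines))
    return hashlib.sha1(der).hexdigest()


# Usage, assuming psc_old.cer contains the copied Base64 data:
#   with open("psc_old.cer") as handle:
#       print(sha1_thumbprint(handle.read()))
```

The result is already lowercase and free of whitespace, so it can be compared character by character with what the certificate viewer shows.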
Now the tough part begins.
Transfer the current chain.pem file of the PSC, sub-CA and enterprise CA to the /root folder of your PSC, e.g. with WinSCP.
The format of the chain file must be Base64 encoded and look like this:
… SUB CA CERTIFICATE
… CA CERTIFICATE
Make sure the file contains no stray whitespace and no empty lines (use a good editor like Notepad++ or Sublime).
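A quick sanity check of the chain file can also be scripted. The following Python sketch only looks for the problems mentioned above (empty lines, stray whitespace) plus unbalanced BEGIN/END markers; the function name is mine, and tolerating a single newline at the very end of the file is an assumption on my side.

```python
def check_chain(text: str) -> list:
    """Return a list of human-readable problems found in a PEM chain file."""
    problems = []
    lines = text.split("\n")
    # Assumption: one trailing newline at the end of the file is harmless.
    if lines and lines[-1] == "":
        lines = lines[:-1]
    for number, line in enumerate(lines, start=1):
        if line == "":
            problems.append(f"line {number}: empty line")
        elif line != line.strip():
            # Also catches carriage returns left over from Windows line endings.
            problems.append(f"line {number}: leading or trailing whitespace")
    begins = sum(line.strip() == "-----BEGIN CERTIFICATE-----" for line in lines)
    ends = sum(line.strip() == "-----END CERTIFICATE-----" for line in lines)
    if begins == 0 or begins != ends:
        problems.append(f"unbalanced markers: {begins} BEGIN vs {ends} END")
    return problems


# Usage, assuming the chain file was saved locally as chain.pem:
#   with open("chain.pem", newline="") as handle:
#       for problem in check_chain(handle.read()):
#           print(problem)
```

An empty result does not prove the chain is valid; it only rules out the formatting mistakes that are easy to make in an editor.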
Now we need to connect to the PSC via SSH and navigate to the following folder:
Run the following line (as documented in the KB). Important:
- Remove the whitespace from the SSL thumbprint.
- Type the complete command manually into the SSH shell. Do not copy and paste. I was not able to get the whole thing running when I copied and pasted anything. Type every character of the thumbprint and double-check it. If you fail here, the legendary Updated 0 service (s) will occur. It seems that the script searches for the SSL certificate with the thumbprint we put into the command. If for any reason the thumbprint no longer matches (because a previous attempt succeeded with a wrong certificate), check my graphics, try to understand what you must do, and repeat the steps.
- Take snapshots of the PSC/vCSA before you do anything.
ls_update_certs.py --url https://psc01.lenzker.local/sdk --fingerprint d79531d1dc743ba43cbaebb735b8bf1aa139a168 --certfile /root/chain.cer --user email@example.com --password VMware99!
works fine (I know I pasted a password, but I just want to show that special characters in the password do not necessarily need any special treatment).
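To double-check the thumbprint before typing it, a small helper like the following can be used. It is only a sketch: the function name is my own, and stripping colons and lowercasing are assumptions based on the format used in the command above. It rejects anything that is not exactly 40 hex characters, which would also expose invisible characters that copy and paste may bring along.

```python
import re


def normalize_thumbprint(raw: str) -> str:
    """Strip whitespace and colons, lowercase, and validate a SHA-1 thumbprint."""
    cleaned = re.sub(r"[\s:]", "", raw).lower()
    if not re.fullmatch(r"[0-9a-f]{40}", cleaned):
        raise ValueError(f"not a 40-character hex thumbprint: {cleaned!r}")
    return cleaned


# The thumbprint as a certificate viewer might display it, with spaces.
print(normalize_thumbprint("D7 95 31 D1 DC 74 3B A4 3C BA EB B7 35 B8 BF 1A A1 39 A1 68"))
```

The printed value is what goes behind the fingerprint option of the command.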
Check that multiple services have been updated.
Repeat the steps for the vCenter Server Appliance.
The following graphic should show you the final state.
Enjoy SRM, NSX, vSphere Replication and many more.