[Azure] Deploying a Service Fabric cluster with custom SSL + client cert authentication

In my current project we’re using Service Fabric in Azure to deploy a microservice-like architecture. While redesigning some parts of the solution, I decided to check whether I could improve our Service Fabric configuration. There were two things I wanted to achieve:

  1. Run the cluster on a custom URL, secured with a custom certificate.
  2. Use a client certificate for authentication of Azure DevOps deployments.

I’ll come to the why part of these requirements below.

Let me first set the scene. Our cluster is deployed using ARM templates. You can do the same using Azure CLI or by issuing PowerShell commands, but for this example I’ll stick to ARM templates. We deploy these automatically via a release pipeline in Azure DevOps, throughout our test, acceptance and production environments.

There is a good repository with sample templates on GitHub; check it out here: https://github.com/Azure-Samples/service-fabric-cluster-templates.

Run a secure cluster on your own URL

When you deploy a publicly accessible cluster, you do so by linking it to a public IP address instance in Azure. The first thing to note here is that the actual URL is linked to the public IP instance, and not to the cluster itself as you might expect. For example: if your public IP instance’s DNS name is mypubip.westeurope.cloudapp.azure.com, then by default your cluster will be hosted as https://mypubip.westeurope.cloudapp.azure.com:19080 or something similar. Since the westeurope.cloudapp.azure.com domain is hosted by Microsoft, you don’t have any control over what’s going on there.

Hence our first requirement: we want to run it on our own URL. First, because that’s simply neater: although I love Azure, I’d rather not use the Azure native URLs for things that might be visible in a browser or elsewhere. Second, we have more control over the certificates when we manage them ourselves. Do note that this comes at a small maintenance cost.

Certificates

It’s important to have a good understanding of the different types of certificates in Service Fabric and how they are used. This article does a reasonable job of explaining that there’s node-to-node security and client-to-node security, but it fails to explain that there’s also a management endpoint which is secured using an SSL certificate.

Important: the primary certificate of the cluster is also the certificate that’s being used to secure your management endpoint (!)

This means that when you want to change the management endpoint (Service Fabric Explorer) to a different URL, you need to change your primary certificate.

Thumbprint vs common name

The next thing to know is that nowadays, Service Fabric gives us two ways to identify a certificate. In the past, this was done using the thumbprint of the certificate. That works, but it requires maintenance, as the thumbprint changes every time you generate a new certificate. This caused a lot of issues in DevOps teams across the world, so Microsoft changed it. You can now also use the subject (common name) of a certificate to identify it, nice!

But wait… what about when you have a primary and secondary certificate in place? Well, in that case you need to make sure that both of them have the same common name. Otherwise, when you switch to the secondary cert, SSL on your custom URL will start failing. Be aware of this, as there are quite a few samples out there that use a different CN for primary / secondary certs (for reasons unknown to me).

Note: unfortunately, changing from a two-certificate set-up to one certificate requires a redeploy of the cluster. So if you have a running cluster and you want to limit any downtime, keep the same number of certs.
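
To make the thumbprint vs common name difference concrete: a thumbprint is the SHA-1 hash over the encoded certificate, so every renewal produces a new value even when the subject stays the same. Below is a minimal Python sketch of that idea. It uses placeholder bytes rather than real certificates, and sf.example.com is a made-up name.

```python
import hashlib
import ssl


def thumbprint_from_der(der_bytes: bytes) -> str:
    """Return the SHA-1 hash of a DER-encoded certificate as uppercase hex."""
    return hashlib.sha1(der_bytes).hexdigest().upper()


def thumbprint_from_pem(pem_text: str) -> str:
    """Same as above, for a PEM-encoded certificate."""
    return thumbprint_from_der(ssl.PEM_cert_to_DER_cert(pem_text))


# Placeholder bytes standing in for two real certificates with the same subject.
old_cert = b"CN=sf.example.com, issued last year"
new_cert = b"CN=sf.example.com, issued this year"

# Two different 40-character values, although the common name did not change.
print(thumbprint_from_der(old_cert))
print(thumbprint_from_der(new_cert))
```

With a real certificate you would pass in the bytes of the .cer file (or the PEM text); the result should match the thumbprint that Windows or Key Vault shows for it.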

What to change

To get this all working, there are a few things you need to change within your cluster ARM template. Here goes…

  1. Set up the DNS address you want to use. This should be a CNAME record that points to the DNS name (A record) of the public IP address; let’s stick with mypubip.westeurope.cloudapp.azure.com for this post.
  2. You’ll need an SSL certificate to go with it as well. You can use Let’s Encrypt, but I would recommend using a more official certificate provider for production purposes.
  3. Once you have the certificate, upload it to an Azure Key Vault instance. It needs to be hosted there so the template can retrieve it in step 5.
  4. The management endpoint needs to change. In your Microsoft.ServiceFabric/clusters template, within the properties section, you’ll have a managementEndpoint value.

    By default, this takes a reference to the public IP address and gets the fully qualified domain name from it. That’s the mypubip.westeurope.cloudapp.azure.com address we discussed before. We’ll change that so the host name comes from prm_cluster-commonname instead, which is a parameter that has the full DNS address.
  5. Requests to the management endpoint will land on one of the virtual machines in the cluster. Therefore the certificate needs to be present on those machines so it can be used to secure the URL. To do that, make sure that the certificate is listed under secrets in the osProfile section of the Microsoft.Compute/virtualMachineScaleSets template.

    That entry needs two values. We use a variable for the ID of the Key Vault instance (the full ID, which you can get from the properties pane of the Key Vault) and a parameter for the full URL of the certificate in Key Vault. The URL is easily retrievable by opening up the certificate in Key Vault and copying the value under “Secret Identifier”. The certificate will be placed in the “Personal” store, which is fine for this cert.
  6. Lastly, under the properties/virtualMachineProfile/extensionProfile/extensions[0]/settings section, there should be a certificate section. If you were using the thumbprint approach, this will have a thumbprint key / value. Delete that and replace it with a commonNames key that lists the common name of your certificate.

    The certificate will now be selected using its common name, so you do not have to change anything when the cert changes.

Note: repeat steps 5 and 6 for every virtual machine scale set you might have. Could be one, could be multiple based on your set-up.
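
As a rough illustration of steps 4 to 6, the Python sketch below prints what the three template fragments might look like. The property names follow the public sample templates as far as I know them. Apart from prm_cluster-commonname, every variable and parameter name (var_publicIpName, var_fabricHttpGatewayPort, var_keyVaultId, prm_certificateUrl) is hypothetical, so map them to whatever your own template uses.

```python
import json

# Hypothetical names; only prm_cluster-commonname comes from this post.
PUBLIC_IP_VAR = "var_publicIpName"
PORT_VAR = "var_fabricHttpGatewayPort"
VAULT_ID_VAR = "var_keyVaultId"
CERT_URL_PARAM = "prm_certificateUrl"
COMMON_NAME_PARAM = "prm_cluster-commonname"


def management_endpoint_default() -> str:
    """Step 4, before: endpoint derived from the public IP's FQDN."""
    return (
        f"[concat('https://', reference(variables('{PUBLIC_IP_VAR}')).dnsSettings.fqdn, "
        f"':', variables('{PORT_VAR}'))]"
    )


def management_endpoint_custom() -> str:
    """Step 4, after: endpoint built from the custom DNS name parameter."""
    return (
        f"[concat('https://', parameters('{COMMON_NAME_PARAM}'), "
        f"':', variables('{PORT_VAR}'))]"
    )


def os_profile_secrets() -> list:
    """Step 5: let the scale set install the Key Vault certificate on each VM."""
    return [
        {
            "sourceVault": {"id": f"[variables('{VAULT_ID_VAR}')]"},
            "vaultCertificates": [
                {
                    "certificateUrl": f"[parameters('{CERT_URL_PARAM}')]",
                    "certificateStore": "My",  # "My" is the Personal store
                }
            ],
        }
    ]


def extension_certificate_settings() -> dict:
    """Step 6: select the certificate by common name instead of thumbprint."""
    return {
        "commonNames": [f"[parameters('{COMMON_NAME_PARAM}')]"],
        "x509StoreName": "My",
    }


fragments = {
    "managementEndpoint (before)": management_endpoint_default(),
    "managementEndpoint (after)": management_endpoint_custom(),
    "osProfile.secrets": os_profile_secrets(),
    "extension settings.certificate": extension_certificate_settings(),
}
print(json.dumps(fragments, indent=2))
```

The printed JSON is only meant for comparison with your own template; treat the exact shape as an assumption and check it against the sample repository before deploying.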

Awesome! That should be all you need to (re)deploy your cluster. It will now start to listen on the custom domain you provided, and that communication is secured using the certificate you’ve uploaded to the Key Vault. Nice! Should you want to switch certificates (because they do expire at some point), follow these steps:

  1. Upload a new version of the certificate (remember: same subject / common name!) to the KeyVault.
  2. Replace the URL to the certificate with this new one in your template or parameters file.
  3. Redeploy & let the magic happen.
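
If the certificate URL lives in a parameters file, step 2 of the rotation can be scripted. The following is a small Python sketch under a few assumptions: the parameter is called prm_certificateUrl (a hypothetical name), and the vault name, secret name and version IDs are placeholders.

```python
import copy
import json


def with_new_certificate_url(parameters_file: dict, new_url: str,
                             parameter_name: str = "prm_certificateUrl") -> dict:
    """Return a copy of an ARM parameters file with the certificate URL replaced."""
    updated = copy.deepcopy(parameters_file)
    updated["parameters"][parameter_name] = {"value": new_url}
    return updated


# Placeholder parameters file; only the certificate URL parameter is shown.
current = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "prm_certificateUrl": {
            "value": "https://myvault.vault.azure.net/secrets/sfcert/old-version-id"
        }
    },
}

rotated = with_new_certificate_url(
    current, "https://myvault.vault.azure.net/secrets/sfcert/new-version-id"
)
print(json.dumps(rotated, indent=2))
```

The function leaves the input untouched and returns a new dictionary, so you can diff the old and new parameters before committing the change.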

Use a client certificate for deployments

I promised I’d go into the why… We were deploying our cluster using an Azure Active Directory user, which means you create a service connection in Azure DevOps (ADO) that holds a username / password. This is fine. I just wanted to make that clear. There’s nothing wrong with this approach and it’s not unsafe if you keep the credentials where they should be. That said, there are always preferences, and ours is not to have too many of these computer accounts, for all kinds of reasons. So we just wanted to see whether we could get it working using a client certificate instead.

Let’s go through the steps:

  1. First we need a certificate. Now since there’s no URL involved, the subject is less important. But… I’ve found that it does need to be a certificate that can be validated. I’ve tried using self-signed certificates, but they lead to errors such as:
    • 0x800b0109: a certificate chain processed, but terminated in a root certificate which is not trusted by the trust provider.
      I tried to fix this by uploading a self-signed issuer to the trusted root store of the VMs, but then ran into:
    • 0x80092012: the revocation function was unable to check revocation for the certificate
      There might be ways to cope with this, I’m not sure at this point.

    We ended up using a Let’s Encrypt cert, which just validates on all machines. You do need to set up a domain for this, as you’ll need to verify it. And again, for production use you should get a certificate from an official authority.

  2. From the certificate we’ll need two things: the common name and the issuer thumbprint. Be aware: that’s the issuer thumbprint, so the thumbprint of the certificate that signed your client cert. Do not try to use the thumbprint of the certificate itself; that won’t work.
  3. Within the Microsoft.ServiceFabric/clusters/properties section, include a clientCertificateCommonNames entry.

    Provide the common name and issuer thumbprint from step 2.

    Note how there’s no need to upload the actual certificate in this case. That’s because client certificates work the other way around; they’re meant to authenticate the client and so the private part is on the client side instead of the server.

  4. Redeploy your cluster once more. It will now be set up to accept the client certificate’s subject and issuer thumbprint as if it were an admin user.
  5. Now within Azure DevOps, head over to Service Connections, which you’ll find under the project settings (click the gear icon at the bottom left).
  6. Create a new Service Connection of type Service Fabric and pick the (default) “Certificate Based” option. You’ll need to provide the following:
    • Connection name: identifies your connection, so pick something that makes sense to you.
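    • Cluster endpoint: this is the endpoint we’ve configured in the first half of this post, but on the client connection port rather than the Service Fabric Explorer port, and prefixed with tcp://. So tcp://<yoururl>:19000. The port number might vary based on your configuration.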
    • Server Certificate Thumbprint: this is to identify the server, so put in the thumbprint of the certificate that you’ve uploaded to the KeyVault before. If you click the certificate in KeyVault, there’s a property that allows you to copy/paste that thumbprint for your convenience.
    • Client Certificate: here you need to paste in a Base64 encoded string of your certificate. A small script can help you get that once you have the certificate stored on disk as a PFX: read the file, Base64 encode its contents and write the result to a text file.

      Once the text file is generated, copy/paste its contents in the Client Certificate field.

    • Password: I had trouble using a certificate that was not password protected, so just in case: make sure you use a password protected PFX file in the previous step and provide that password here.
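
Two parts of the steps above lend themselves to a short sketch: the cluster property from step 3 and the Base64 conversion for the Client Certificate field. The Python below is one possible way to do both and not necessarily the script I used at the time. The common name, the thumbprint placeholder and the file paths are all made up.

```python
import base64
import json
from pathlib import Path


def client_certificate_common_names(common_name: str, issuer_thumbprint: str) -> list:
    """Step 3: value for clientCertificateCommonNames on the cluster resource."""
    return [
        {
            "certificateCommonName": common_name,
            "certificateIssuerThumbprint": issuer_thumbprint,
            "isAdmin": True,  # treat this client as an admin user
        }
    ]


def pfx_to_base64(pfx_path: str) -> str:
    """Read a PFX file and return its contents as a Base64 string."""
    return base64.b64encode(Path(pfx_path).read_bytes()).decode("ascii")


def write_base64_file(pfx_path: str, out_path: str) -> None:
    """Write the Base64 string to a text file, ready to copy/paste."""
    Path(out_path).write_text(pfx_to_base64(pfx_path), encoding="ascii")


# Placeholder values; use your own common name and the issuer's thumbprint.
print(json.dumps(
    {"clientCertificateCommonNames": client_certificate_common_names(
        "deploy.example.com", "<issuer thumbprint>")},
    indent=2,
))

# Hypothetical paths, uncomment and adjust to your own files:
# write_base64_file("client.pfx", "client-base64.txt")
```

Keep the generated text file out of source control: it contains the private key of the client certificate, protected only by the PFX password.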

That’s it. You should now be able to use this service connection to connect to your cluster and deploy your applications!

Hope that helps. If you have any comments, suggestions or questions, feel free to leave them below.
