SSH Public Key Infrastructure

Public Key Infrastructure with Secure Shell

Last updated February 9, 2025

Secure your SSH infrastructure from the very first boot. Rotate keys and never trust a previously unknown machine. Never click through a key-not-known prompt, and do not get used to the remote-host-identification-changed warning.

Tip

There is a practical example guide on setting up a simple Certification Authority, Control and Target host, all with SSH certificates. You can take some inspiration from it before delving into the below.

Lots of administration tasks are based on SSH, the ubiquitous protocol used to securely connect to remote hosts. But only a minority of these are set up in a systematic way that would amount to a secure-by-design infrastructure approach. Perhaps it appears to be a hassle, but how easily a system can be managed is a good indication of its soundness in terms of security as well, so security should be made seamlessly easy. As environments grow, the question of trust across systems, real and especially virtual, all the way to the last containerised workload, often goes unaddressed.

Investigate later

Trust everything from the get-go. That sounds terrifying, at least to a security-conscious administrator. Yet for some reason it does not always, and certainly not with SSH:

The authenticity of host '10.10.10.10 (10.10.10.10)' can't be established.
ED25519 key fingerprint is SHA256:k95pBxp+arqCAfTTYDHhD63o6O0Sff7zgyzcglxbGaE.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?

Is everyone about to studiously paste in the fingerprint obtained from another source? Of course not. They would investigate after encountering an issue, but not before.

Even more relatable is the pesky warning on the same subject:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:uQEwXegch2seMpndUTxkH9cv6qDqD+25Q2+uyZHldLA.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending ED25519 key in /root/.ssh/known_hosts:1
  remove with:
  ssh-keygen -f "/root/.ssh/known_hosts" -R "10.10.10.10"
Host key for 10.10.10.10 has changed and you have requested strict checking.
Host key verification failed.

Naturally, we will do as we have just been told - this is how most of us got intimate with ssh-keygen in the first place, unfortunately often only with its -R switch. The warning, while explicit, is also convoluted. After all, it is only “POSSIBLE” that some shenanigans are going on, not likely. We are just re-using an IP address or hostname. And it cannot happen to us, anyway - everything else around is secure, so this does not matter, or does it?

Second thought: if it is a secure environment, why are we using encrypted communications within it in the first place?

Trust is blind

The concept of trust on first use (TOFU) suddenly becomes familiar. It is, after all, the connection model non-professionals use all the time without giving it a second thought. They know that this one time, it is just the new target machine being set up, or that they are accessing it from a new client machine. But it is really bad, especially in terms of forming habits, since who knows which case is which and when. Every time this question is answered yes, our repertoire of known host keys grows by one more dubious entry: a host that became trusted, just like that. A sophisticated attacker knows this as well.

Strict checking

It is so common that one might even make use of an option SSH provides, StrictHostKeyChecking, and simply set it to no (perhaps counterintuitively, this makes the assumed answer always yes, i.e. continue connecting; the default for the option is to ask). But bad habits should not be reinforced; they need tackling the opposite way, and that is what the option invites us to do: set it to yes, which means our answer will always be no to the unknown unknowns - we do NOT want to connect to hosts we have not encountered so far. We can set this for the local user by appending a single line to the user’s configuration file (should the file already contain Host or Match sections, the line has to go before them instead, as options following such a line apply only to that section):

cat >> ~/.ssh/config <<< "StrictHostKeyChecking yes"
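The one-liner above behaves as intended only while the file has no Host or Match sections. A minimal sketch that prepends the line instead, so that it precedes any such section (ssh uses the first value it obtains for an option). The function name prepend_strict_checking is made up for illustration:

```shell
#!/bin/sh
# Sketch: put "StrictHostKeyChecking yes" at the very top of an ssh client
# config file, ahead of any Host/Match sections, so it applies to every host.
prepend_strict_checking() {
    cfg="${1:-$HOME/.ssh/config}"
    mkdir -p -m 700 "$(dirname "$cfg")"
    [ -e "$cfg" ] || { : > "$cfg"; chmod 600 "$cfg"; }
    # Leave the file alone if the keyword already appears anywhere in it
    # (possibly inside a Host block - review such files by hand)
    if grep -qi '^[[:space:]]*StrictHostKeyChecking' "$cfg"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    { printf 'StrictHostKeyChecking yes\n'; cat "$cfg"; } > "$tmp"
    # Write back through a redirection so the file keeps its permissions
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}
```

Called without an argument, prepend_strict_checking targets ~/.ssh/config; calling it twice does not add a duplicate line.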

Now all connections to unknown hosts will fail, as they should, to the resounding relief of the security officer. Well, that is not too helpful. How do we go about connecting to all these new machines?

Cheating consciously

We will create an exception: a shell alias which, when explicitly invoked, overrides the defaults on its command line.

alias blind-ssh='ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'

Note

Aliases defined on the command line like this do NOT survive past the end of the current shell session, but we do not worry about that in this demonstrative case. We are going to backtrack on it all, anyway.

This blindly accepts whatever key is presented. If the key has not been previously known, the record of known hosts is left unaffected, unlike when typing yes. A connection like this will then “just work”:

blind-ssh dubious@den.internal

Note

This works by setting StrictHostKeyChecking to no, which presumes yes for every “sure you want to continue connecting?” prompt. That, however, also appends the newly approved key to the user’s list of known host keys. By redefining the location of that list with the UserKnownHostsFile directive, the key is added there instead, which in this case means nowhere.

But what kind of advice is this? First forbid it, then bypass it. Exactly. Yet it actually IS better than the previous state in that it:

forces one to use such an alias only sporadically;
does not pollute any list of known host keys;
best of all, remains undefined on a new system.

Because one should NEVER need it after reading through to the end. How so?

SSH - simple and secure

At face value, SSH as a standard remains dependable and secure, not least because it is conceptually simple. Clients connect to hosts. Clients authenticate (their users) to hosts. And, what receives less attention but is equally true, hosts authenticate to clients. This is why the above prompts exist in the first place.

So there is a key on each end, a public-key-cryptography kind of key, where knowing the public key of one’s counterpart (client/user or host) lets each end authenticate the other. The distribution of the public keys, so that the endpoints know each other, is left out of scope of SSH itself. And so TOFU became a habit.

Important

The scenario above assumes that keys (not passwords) are used to authenticate users, a major benefit SSH provides. Some users opt to keep using passwords for user authentication instead, which of course has its own implications, but even then the “distribution” problem remains out of scope: somehow the password had to be set on the target system before the SSH connection could be established. That said, keys are always at play to authenticate hosts.

Tip

Another interesting aspect of using client/user keys for SSH authentication instead of passwords is that such a user does not even need to have ANY password set on the target system. What better way to stop worrying about secure passwords than having none in the first place?

Public Key Infrastructure

Current OpenSSH, the implementation we will mostly focus on, does NOT support Public Key Infrastructure (PKI) as established by the X.509 standard and employed in better-known SSL/TLS setups. That said, it DOES support PKI as such; it is just much simpler.

Public keys can be signed by (other, private) keys, resulting in certificates to which additional information may be added. The signing party is known as a certification authority - a familiar term, but these are not the kind of certificates you need to go and obtain from the likes of Let’s Encrypt. You simply issue, distribute and manage them under your full control; they do not go through a third party (and its opaque validation), which preserves the confidentiality of the infrastructure setup as a whole.

Note

There is a standard for X.509v3 Certificates for Secure Shell Authentication, and there are alternative implementations that do go that route. This is entirely out of scope here.

OpenSSH

There are two sides to each authenticated SSH connection. The host being connected to presents its host key(s), to be matched against the records of the connecting user. And then the other way around: the user presents their key to the host, unless password authentication is at play. These are two separate roles played by the server and the client:

sshd daemon serving the connection requests at the target host:
- presenting its host keys to the client - the private keys are /etc/ssh/ssh_host_*_key, with their public counterparts alongside as /etc/ssh/ssh_host_*_key.pub
- looking for the ~/.ssh/authorized_keys file (i.e. in the target user’s home directory) - the list of client keys to match the currently connecting client against
- following global configuration options set in /etc/ssh/sshd_config
  - alternatively defined in partial /etc/ssh/sshd_config.d/*.conf files (included from the main file on most distributions)
ssh client initiating the connection requests:
- looking for the ~/.ssh/known_hosts file for the list of hosts to match the target against
  - may defer to the global /etc/ssh/ssh_known_hosts
- looking for ~/.ssh/config for the client (user) specific configuration
- following global configuration options set in /etc/ssh/ssh_config
  - alternatively defined in partial /etc/ssh/ssh_config.d/*.conf files (included from the main file on most distributions)
- the client keys to be presented to the target are typically held in the same directory as the configuration, either under default names such as ~/.ssh/id_ed25519, or referred to from the configuration with IdentityFile

The word global above refers to the machine-wide configuration of any particular host (and its users) and is the terminology used in the manual pages. The machine administrator can populate it for the benefit of the users, but user configurations override the global defaults and, as we saw when defining our alias, command-line options can override both.
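The layering can be observed without connecting anywhere: ssh -G prints the effective client configuration for a destination and exits. A small sketch, reusing the example name den.internal from earlier (it is never contacted):

```shell
#!/bin/sh
# Sketch: inspect how ssh resolves its configuration layers.
# "ssh -G" evaluates the configuration for a destination, prints it and exits.
host=den.internal

# Result of the global and user configuration files combined
ssh -G "$host" | grep -i '^stricthostkeychecking'

# Same destination, with a command-line override applied on top
ssh -G -o StrictHostKeyChecking=no "$host" | grep -i '^stricthostkeychecking'

# -F replaces the user file and also skips the global one;
# /dev/null therefore shows the built-in defaults only
ssh -G -F /dev/null "$host" | grep -i '^userknownhostsfile'
```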

Note

If the above sounds complicated, it really is NOT - it is just two sides connecting to each other verifying a familiar public key. The configuration options make it appear more elaborate than it is, but they are there just for the flexibility and do NOT have to be used to the full extent, at all.

The user is entirely abstracted from the more complicated intricacies of the actual connections, handshakes and the symmetric encryption that follow the initial pleasantries.

Keys and certificates

If you have ever generated a user SSH key, the ssh-keygen tool, part of the OpenSSH suite, will be familiar (beyond its -R switch). The good news is that setting up PKI with SSH is really about using this one single tool with a few additional switches on the command line. Focusing purely on this aspect, whenever a PKI key is mentioned, unless explicitly designated otherwise, it refers to a key pair (or the respective public or private key of the pair, where applicable) generated by a basic invocation, such as:

ssh-keygen -t ed25519 -f first_key -C "very first key"

Generating public/private ed25519 key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in first_key
Your public key has been saved in first_key.pub

Tip

A passphrase will be asked for, but can be left empty. Whilst this might be a completely reasonable choice for regular user keys, it is entirely different for a key that is to be used for signing hundreds or thousands of other keys. The most convenient approach to this problem is using ssh-agent, but that is out of scope here.

Two files will be generated, first_key and first_key.pub: the private key (also referred to as identification in SSH parlance) and the public key, with sensibly chosen permissions - the private one is accessible only to the user who generated it.

The best practice is to generate private keys where they are to be used, avoid moving them around and only expose the public keys. By the very nature of asymmetric cryptography, the public key can be derived from the private key at any later point, but NOT the other way around.

The -t option defines the type of our key in cryptographic terms; we chose ed25519, the current de facto standard. It will be good enough for all our needs.

Caution

Details regarding different types of keys in the above sense are beyond the scope of this post. However, if you do NOT specify a type explicitly, OpenSSH releases before 9.5 will default to RSA, a different type with further considerations to be aware of, such as choosing an appropriate key size (releases from 9.5 onwards default to Ed25519).

The last option we provided, -C, is an innocuous comment, freely visible in the .pub file.
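To see in practice that the public key can be re-derived from the private one, here is a minimal sketch run in a scratch directory. It uses an empty passphrase (-N '') purely so that it runs without prompts, which, per the tip above, is not what you want for a signing key:

```shell
#!/bin/sh
# Sketch: non-interactive key generation, then re-deriving the public key
# from the private key with "ssh-keygen -y".
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# -N '' sets an empty passphrase (no prompt); -q suppresses the banner
ssh-keygen -q -t ed25519 -N '' -f first_key -C "very first key"

# The private key is readable and writable by its owner only
ls -l first_key first_key.pub

# Compare key type and key material only; the comment is not part of the key
derived=$(ssh-keygen -y -f first_key | cut -d' ' -f1,2)
stored=$(cut -d' ' -f1,2 first_key.pub)
[ "$derived" = "$stored" ] && echo "public key matches"
```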

PKI vs Non-PKI scenario

Without the involvement of signing authorities, distributing keys such as the one generated above is rudimentary:

the private host key is held by the machine, its public counterpart is added to each connecting client’s ~/.ssh/known_hosts list
the private user key is held in the connecting user’s ~/.ssh/ directory, its public counterpart within the target user’s ~/.ssh/authorized_keys list
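The two points above can be sketched with throwaway keys in a scratch directory; the host name den.internal, the address and the comment mark@laptop are illustrative only. In reality, the host’s .pub file would have to reach every client through some trustworthy channel, which is exactly the step TOFU skips:

```shell
#!/bin/sh
# Sketch of the rudimentary, CA-less distribution of public keys.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-ins for a host key (normally /etc/ssh/ssh_host_ed25519_key) and a user key
ssh-keygen -q -t ed25519 -N '' -f ssh_host_ed25519_key -C "host key"
ssh-keygen -q -t ed25519 -N '' -f id_ed25519 -C "mark@laptop"

# Client side: a known_hosts line is "<host patterns> <key type> <base64 key>"
printf '%s %s\n' "den.internal,10.10.10.10" \
    "$(cut -d' ' -f1,2 ssh_host_ed25519_key.pub)" >> known_hosts

# Target side: the user's public key is appended verbatim to authorized_keys
cat id_ed25519.pub >> authorized_keys

# Look the host up the way ssh would; every further host or user
# means yet another line on every counterpart
ssh-keygen -F den.internal -f known_hosts
```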

The primary principle of authorities is that they sign others’ keys. Then, instead of managing individual public keys to verify the counterparty (host or user - it goes both ways), it is only necessary to hold the public key of the Certification Authority (CA). It is the CA that determines which keys get signed, but then any such signed keys are trusted by the participants recognising the CA’s key, which is where the authority gets its designation from.

The signed key records are referred to as certificates. They can have additional information affixed to them, but the important distinction is that they bear the signature of the authority, which can be verified directly.

Authority and certificates

As a CA produces signatures on others’ key records, it uses a key itself. There really is no distinction (in terms of quality) between a key used for a host, a user or a CA within SSH - this is where the simplicity lies.

So suppose we want to use the first_key we got above to sign another key. First, we generate that new to-be-signed key and then we sign it, getting a certificate in the process:

ssh-keygen -t ed25519 -f user_key -C "a user key"

ssh-keygen -s first_key -I "Mark's user key" -n mark user_key.pub

Generating public/private ed25519 key pair.

---8<---

Signed user key user_key-cert.pub: id "Mark's user key" serial 0 for mark valid forever

This simply created another key pair (user_key and user_key.pub), and -s was used to sign it with the first key. Note that signing is done on the public key record and no type is specified for the signing procedure.

We used two more arguments:

-I specifies the key identifier; it is NOT functionally important, but it will show up in server logs, so it is useful to pick a reasonably unique designation to trace later if need be;
-n defines so-called principals, and they ARE important: depending on the kind of certificate we are creating (user or host), they will be matched against the entity presenting it, i.e. a username or a hostname respectively.

The above example automatically writes the certificate into a file named user_key-cert.pub, derived from the original public key’s name.

It is also possible to check what is within a certificate:

ssh-keygen -f user_key-cert.pub -L

user_key-cert.pub:
        Type: ssh-ed25519-cert-v01@openssh.com user certificate
        Public key: ED25519-CERT SHA256:QwUtPojZJ5jYScbSju/s61dF1U0/VJgaY4rfW8odNrc
        Signing CA: ED25519 SHA256:l1OA3VhFs1T4ufAA/xKXSP1dN0d9XGAUds4/IEkZ/Lk (using ssh-ed25519)
        Key ID: "Mark's user key"
        Serial: 0
        Valid: forever
        Principals: 
                mark
        Critical Options: (none)
        Extensions: 
                permit-X11-forwarding
                permit-agent-forwarding
                permit-port-forwarding
                permit-pty
                permit-user-rc

As you can tell, there is quite a bit more within than what we specified. That is because we are only scratching the surface when it comes to options.

Valid forever

We will certainly not cover certificates extensively, so as not to make them appear more complex than they are: signed records of ordinary public keys with some attributes. But one thing that may capture your attention is the ominous Valid: forever. This does not sound like a great idea for a key, yet this is how all ordinary SSH keys are used: trusted on first use and never really rotated. With certificates, validity can be limited (-V option). And before you ask, the Serial (-z option) may be useful when automating key rotations.
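A minimal sketch of signing with a bounded validity and an explicit serial, again with throwaway keys in a scratch directory. The 52-week period and serial 1 are arbitrary choices for illustration:

```shell
#!/bin/sh
# Sketch: a user certificate with limited validity and an explicit serial.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
ssh-keygen -q -t ed25519 -N '' -f first_key -C "very first key"
ssh-keygen -q -t ed25519 -N '' -f user_key -C "a user key"

# -V +52w: valid from now for 52 weeks; -z 1: serial number 1
ssh-keygen -s first_key -I "Mark's user key" -n mark -V +52w -z 1 user_key.pub

# The Serial and Valid lines now reflect the options given
ssh-keygen -L -f user_key-cert.pub | grep -E 'Serial|Valid'
```

Re-signing the same public key with a later -V window and a higher -z serial is the basic building block of a rotation.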

Other perks

We also immediately got some default flags (extensions): anyone presenting this user certificate is permitted to, e.g., do port forwarding. All of these can be altered. And there is more. Critical Options can be specified with the -O option; a great example is source-address which, as you would guess, only permits the certificate to be used when connecting from specific addresses. It is also possible to, e.g., define force-command to set a specific command to be executed instead of the usual interactive shell (or whatever the user specified). With an ordinary key, comparable restrictions would have to be set up in the authorized_keys file of every target; here, they travel with the certificate itself. But none of this needs to be used; we can just sign keys and ignore the other perks.
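A sketch combining critical options with a removed default extension. The address range 10.10.10.0/24, the command /usr/local/bin/backup and the key name backup_key are hypothetical:

```shell
#!/bin/sh
# Sketch: a restricted user certificate, signed in a scratch directory.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
ssh-keygen -q -t ed25519 -N '' -f first_key -C "very first key"
ssh-keygen -q -t ed25519 -N '' -f backup_key -C "backup key"

# Critical options: usable only from 10.10.10.0/24 and only to run one command.
# no-port-forwarding drops the corresponding default extension.
ssh-keygen -s first_key -I "Mark's backup key" -n mark \
    -O source-address=10.10.10.0/24 \
    -O force-command=/usr/local/bin/backup \
    -O no-port-forwarding \
    backup_key.pub

ssh-keygen -L -f backup_key-cert.pub
```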

Tip

There are many other possibilities opening up with, e.g., bespoke PAM modules built with this in mind. A good example is pam-ussh, which allows sudo authentication to be automated based on the user possessing a certificate signed by a specified CA.

Host keys

A diligent observer would have noticed that the certificate produced above was implicitly a user certificate - this is the default when signing. Whilst many users have probably generated their own client keys, host keys get less of the spotlight. That is because one hardly ever generates, or re-generates, them. They get created for the machine itself, typically at installation time, and just sit there. But they can be generated, replaced or added on any machine. And they are just ordinary keys.

That also means they can be signed and a certificate obtained for them as well, with the -h option added on top of the signing -s, the rest being largely the same. The principals in this case would ideally be the hostnames or addresses the certificate should be matched against when connecting to the host in question, i.e. a host cannot make use of just any other signed host certificate, only its own.
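A sketch of signing a host key with -h. It uses a stand-in key in a scratch directory rather than the real /etc/ssh/ssh_host_ed25519_key.pub; den.internal and 10.10.10.10 serve as example principals:

```shell
#!/bin/sh
# Sketch: produce a host certificate for a (stand-in) host key.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
ssh-keygen -q -t ed25519 -N '' -f first_key -C "very first key"
ssh-keygen -q -t ed25519 -N '' -f ssh_host_ed25519_key -C "den host key"

# -h makes it a host certificate; the principals are the names and
# addresses clients will use to reach this host
ssh-keygen -s first_key -h -I "den.internal host key" \
    -n den.internal,10.10.10.10 -V +52w ssh_host_ed25519_key.pub

# The result lands next to the public key as ssh_host_ed25519_key-cert.pub
ssh-keygen -L -f ssh_host_ed25519_key-cert.pub | head -n 2
```

Only the host’s public key has to travel to the CA and only the resulting -cert.pub back, so the private host key never leaves the machine.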

Tip

Beyond security, this is also great for detecting misconfigured DNS: with, e.g., universal user keys and an accept-everything approach to host keys, it is much harder to prevent remote scripts from unintentionally executing on the wrong endpoints.

Other than that, there’s no magic to host keys. They sit in /etc/ssh/ with conspicuous names, such as ssh_host_ed25519_key - undisturbed, alongside sshd_config. Complementing them with a certificate is the least one can do for ease of seamlessly secure deployment of such hosts.
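Since the host keys simply sit in /etc/ssh/, the fingerprint shown by a first-connection prompt can be checked against the host itself, e.g. from its console, before answering. A small sketch; show_host_fingerprints is a made-up helper name:

```shell
#!/bin/sh
# Sketch: print the SHA256 fingerprints of a host's public host keys,
# to compare against what a connecting client is shown.
show_host_fingerprints() {
    dir="${1:-/etc/ssh}"
    for pub in "$dir"/ssh_host_*_key.pub; do
        # The glob stays literal when nothing matches; skip that case
        [ -e "$pub" ] || continue
        ssh-keygen -l -E sha256 -f "$pub"
    done
}

show_host_fingerprints
```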

Configuration

Now that we have seen that producing a certificate is as simple as using the -s or -s -h options of ssh-keygen on the existing public key of a user or a host (respectively), the only question left is how complex it is to configure SSH to make use of them.

Target host side

When using a certificate for the host, sshd_config has to reference the certificate with a HostCertificate entry and, if the host key has a non-standard name, the host key itself with a HostKey entry.

In a similar fashion, for the host to start recognising all user keys signed by particular CA(s), a file listing the CA public key(s) needs to be created and referred to with TrustedUserCAKeys.

Note

This can, however, be done on a per-user level instead, by adding a line with the cert-authority option to ~/.ssh/authorized_keys. It designates a CA whose signature on the connecting user’s certificate will then be trusted.
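A sketch of this per-user variant; add_user_ca is a hypothetical helper name, and first_key.pub stands for the CA public key from the earlier examples:

```shell
#!/bin/sh
# Sketch: trust a CA for a single account via its authorized_keys,
# instead of a machine-wide TrustedUserCAKeys. Run as the target user.
add_user_ca() {
    ca_pub="$1"
    keys="${2:-$HOME/.ssh/authorized_keys}"
    mkdir -p -m 700 "$(dirname "$keys")"
    # "cert-authority" marks the key as a CA: user certificates it signed
    # are accepted if their principals include this account's name
    printf 'cert-authority %s\n' "$(cut -d' ' -f1,2 "$ca_pub")" >> "$keys"
    chmod 600 "$keys"
}

# e.g. on the target host, as user mark:
#   add_user_ca first_key.pub
```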

Important

Whilst the above is configuration related to user keys, it is performed on the host being connected to.

That is all: at most 3 entries in the configuration file or, more simply, in a partial *.conf file under the /etc/ssh/sshd_config.d/ directory. And a reload of the daemon for the new options to take effect.
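A minimal sketch of such a drop-in, assuming the main sshd_config includes /etc/ssh/sshd_config.d/*.conf (as on many current distributions) and that the CA public key has been copied to /etc/ssh/trusted_user_ca_keys, a file name chosen here purely for illustration. The drop-in name 50-pki.conf and the function write_sshd_pki_conf are hypothetical as well:

```shell
#!/bin/sh
# Sketch: write an sshd drop-in enabling the host certificate and
# trusting a user CA.
write_sshd_pki_conf() {
    dir="${1:-/etc/ssh/sshd_config.d}"
    cat > "$dir/50-pki.conf" <<'EOF'
# Present the host certificate alongside the standard host key
HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub
# Only needed for a host key with a non-standard name; note that once any
# HostKey is given, sshd stops loading its default host keys
#HostKey /etc/ssh/ssh_host_ed25519_key
# Public key(s) of the CA(s) whose user certificates are to be trusted
TrustedUserCAKeys /etc/ssh/trusted_user_ca_keys
EOF
}

# As root, on the target host (first_key.pub being the CA public key):
#   cp first_key.pub /etc/ssh/trusted_user_ca_keys
#   write_sshd_pki_conf
#   sshd -t && systemctl reload ssh    # the unit is "sshd" on some distributions
```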

User client side

On the opposite end, from the point of view of the connecting user, recognising signed host keys is just as simple as adding regular keys to known_hosts (user-specific or global): instead of a line referring to a specific host key, a line starting with @cert-authority has to be added, containing the CA’s (instead of a single host’s) public key. That way, any host certificate signed by the CA will be accepted, as long as the host matches the host pattern on that line (which allows trust in the CA to be restricted further) and the constraints of its own host certificate, such as principals and validity, are met.
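A sketch of adding such a line; add_host_ca is a hypothetical helper, and the patterns *.internal and 10.10.10.* are example restrictions, not requirements:

```shell
#!/bin/sh
# Sketch: trust host certificates signed by a CA, for matching hosts only.
add_host_ca() {
    ca_pub="$1"
    known="${2:-$HOME/.ssh/known_hosts}"
    mkdir -p -m 700 "$(dirname "$known")"
    # Format: @cert-authority <host patterns> <key type> <base64 CA key>
    printf '@cert-authority *.internal,10.10.10.* %s\n' \
        "$(cut -d' ' -f1,2 "$ca_pub")" >> "$known"
}

# e.g. as the connecting user, with the CA public key obtained out of band:
#   add_host_ca first_key.pub
# or machine-wide, as root:
#   add_host_ca first_key.pub /etc/ssh/ssh_known_hosts
```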

Finally, connecting with a user certificate is essentially seamless: if placed alongside the key under the same name with a -cert.pub suffix, it is picked up automatically whenever the key is used, be it configured in the usual way in the user’s ~/.ssh/config or passed with the -i switch to the ssh client, just as with plain keys.
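A sketch of such a client configuration, using the user_key from the earlier examples (assumed copied to ~/.ssh/ together with user_key-cert.pub) and a made-up *.internal naming scheme. The configuration goes into a temporary file so that ssh -G can show the result without touching ~/.ssh/config:

```shell
#!/bin/sh
# Sketch: client configuration for a key with a certificate next to it.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
Host *.internal
    User mark
    IdentityFile ~/.ssh/user_key
    # Optional: ssh already tries ~/.ssh/user_key-cert.pub on its own
    CertificateFile ~/.ssh/user_key-cert.pub
EOF

# What ssh would use for a matching host, without connecting
ssh -G -F "$cfg" den.internal | grep -Ei '^(user|identityfile|certificatefile) '

# The command-line equivalent, picking up user_key-cert.pub automatically:
#   ssh -i ~/.ssh/user_key mark@den.internal
```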

Practice

One really should not be encountering any TOFU prompts when deploying infrastructure properly, be it clusters, guests or any other equipment. All it takes is producing certificates for the public keys and implanting the recognised CA’s public key onto such hosts. This should really happen at the time of system creation, as is commonly done with cloud-init for virtual instances or with automated installation processes for physical hosts.

For some reason, SSH PKI is not popular, not talked about and not well understood, all that despite OpenSSH having provided support for it since 2010. But when it is trivially simple to add a user public key into such deployments, a widely adopted best practice that long ago replaced insecure passwords, it should be equally easy to add a CA public key and a host certificate. It is certainly much more streamlined than attempting to gather auto-generated public keys from multiple hosts, or ignoring the host trust problem altogether.

Proxmox VE SSH woes

Little respite is to be expected from solutions such as PVE, now or in the future. Proxmox do not ship any robust SSH PKI with their cluster-tailored solution, which would be a perfect candidate for it. Instead, they originally attempted to bypass the host key distribution issue by synchronising multiple keys amongst multiple nodes via symbolic links, an approach that has brought users a decade of seemingly mysterious bugs and incompatibility with standard SSH tooling. When this was finally fixed, the aspiration of making seamless SSH connections for the user was simply abandoned altogether. A similarly ill-chosen approach still plagues user keys, which can become inaccessible to the authorising system.

Some built-in features still depend on SSH, but they might be abandoned in favour of the REST API approach. That said, even if these features are eventually re-implemented using a non-SSH-based solution, this will not help the multitude of guest systems deployed daily on each and every such piece of infrastructure, which you are possibly in charge of.

Finally, any custom (beyond trivial, or simply scripted) cluster host management is still better off with external tools, such as Ansible, which do depend on SSH. Therefore, it might be worth adopting the better approach, after all.

Manual pages: ssh ssh-keygen sshd

This post is also available as reStructuredText in a GitHub Gist.
Excuse the limited formatting, absent referencing and missing media content.
Your feedback is welcome in the comments therein.
