SSH - passwordless lockout

Passwordless SSH can lock you out

A not-so-intuitive cascade of events can leave you unable to connect over SSH to a node whose networking is otherwise healthy, because the location of the authorized keys is inaccessible.

If you follow standard security practices, you would not allow root logins at all, let alone over SSH (as with a standard Debian install). But this would render your PVE unable to function properly, so the most you can do is restrict root logins in /etc/ssh/sshd_config with the option:

PermitRootLogin prohibit-password

That way, you only allow connections with valid keys (not passwords). Prior to this, you would have copied over your public keys with ssh-copy-id or otherwise added them to /root/.ssh/authorized_keys.

But this has a huge caveat on any standard PVE install. When you examine the file, it is actually a symbolic link:

/root/.ssh/authorized_keys -> /etc/pve/priv/authorized_keys

This is because other nodes’ keys are already there to allow for cross-connecting - and the location is shared. This has several issues, the most important of which is that the actual file lies in /etc/pve, which is a virtual filesystem mounted only when all goes well during boot-up.
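To see this on a node, the link and its target can be inspected from a shell. The following is only a minimal sketch: the function name check_keys_file is made up for illustration, and the path is the one quoted above.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): report whether a keys file is a
# symlink and whether the link target currently exists.
check_keys_file() {
    f="${1:-/root/.ssh/authorized_keys}"
    if [ -L "$f" ]; then
        # -e follows the link, so it fails when the target is missing,
        # e.g. when the filesystem holding it is not mounted
        if [ -e "$f" ]; then
            echo "symlink, target present: $(readlink "$f")"
        else
            echo "symlink, target missing: $(readlink "$f")"
        fi
    elif [ -f "$f" ]; then
        echo "regular file"
    else
        echo "not found"
    fi
}

check_keys_file /root/.ssh/authorized_keys
```

A "target missing" result on a PVE node would suggest that /etc/pve is not mounted at that moment.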

What could go wrong

If your /etc/pve does not get mounted during boot-up, your node will appear offline and will not be accessible over SSH, let alone the GUI.

Warning

If accessing via another node’s GUI, you will get a confusing Permission denied (publickey,password) in the “Shell”.

You are essentially locked out, even though the system has otherwise booted up, except for the PVE services. You cannot troubleshoot over SSH; you would need to resort to OOB management or physical access.

This is because during your SSH connection, there’s no way to verify your key against /etc/pve/priv/authorized_keys.
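Once you do get a console (OOB or physical), a quick check along these lines may confirm the cause. This is a sketch only: report_mount is a hypothetical helper name, and it relies on the mountpoint utility being available, as it is on a standard Debian system.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): say whether a directory is
# currently a mount point. Defaults to /etc/pve.
report_mount() {
    dir="${1:-/etc/pve}"
    if mountpoint -q "$dir" 2>/dev/null; then
        echo "$dir: mounted"
    else
        echo "$dir: NOT mounted"
    fi
}

report_mount /etc/pve

# If it is not mounted, the service behind it can be inspected next:
#   systemctl status pve-cluster
#   journalctl -b -u pve-cluster
```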

Caution

If you also allow root to authenticate by password, you will be locked out of the GUI only. Your SSH will - obviously - not work with a key, but it will fall back to a password prompt.

How to avoid this

You need to use your own authorized_keys file, different from the default one that has been hijacked by PVE. The proper way to do this is to define its location in the config:

cat > /etc/ssh/sshd_config.d/LocalAuthorizedKeys.conf <<< "AuthorizedKeysFile .ssh/local_authorized_keys"

If you now copy your own keys to the /root/.ssh/local_authorized_keys file (on every node), you are immune to this design flaw.
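Copying the keys can be done by hand; the sketch below shows one possible way to do it idempotently and with strict permissions. The function name add_local_key and the example key string are hypothetical; only the file path comes from the text above.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): append a public key to a local
# keys file unless that exact line is already present.
add_local_key() {
    pubkey="$1"
    keys="${2:-/root/.ssh/local_authorized_keys}"
    mkdir -p "$(dirname "$keys")"
    chmod 700 "$(dirname "$keys")"
    touch "$keys"
    chmod 600 "$keys"
    # -x whole-line match, -F fixed string; append only when not found
    grep -qxF -- "$pubkey" "$keys" || printf '%s\n' "$pubkey" >> "$keys"
}

# Example call on a node (placeholder key, substitute your own public key):
#   add_local_key "ssh-ed25519 AAAA... admin@workstation"
#
# Afterwards the sshd configuration can be validated and reloaded:
#   sshd -t && sshd -T | grep -i '^authorizedkeysfile'
#   systemctl reload ssh
```

Keep your existing session open until a fresh login with the key succeeds. Note also that sshd uses the first value it finds for a keyword, so a drop-in like the one above replaces the default list (.ssh/authorized_keys .ssh/authorized_keys2) rather than adding to it. If the PVE-managed file should still be consulted as well, AuthorizedKeysFile accepts several paths on one line; whether that is needed in your setup is an assumption to verify.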

Tip

There are even better ways to approach this, e.g. SSH certificates, in which case your own setup is not prone to this bug. This is out of scope for this post.

Alternatives

If you were planning to use an additional non-privileged user with sudo, that is indeed a good alternative. Do note that PVE does not come with sudo pre-installed and will nevertheless require root to be allowed to log in over SSH to preserve the full features of the PVE stack - and these would remain broken.

Due to the Proxmox stack setup, inaccessible SSH for the root user prevents you from e.g. troubleshooting failing services (even when the SSH service itself is healthy) from the GUI shell of a healthy node. For this same reason, it is impossible to remove SSH access for the root account in Proxmox - which is also the only reason why this post “embraces” it. However, if you have another way in through other steps, it is just as good (the GUI path will still not work though).

Notes

As much as this post might appear to describe an infrequent issue, the failure of the pve-cluster service at boot (which needs to run on standalone nodes too) that causes the “lockout” is quite a common side effect of e.g. networking misconfiguration or pmxcfs backend database corruption. These are out of scope of this post, but they definitely happen more often than SSH alone failing, let alone networking as a whole - which of course would then require an out-of-band (OOB) management approach anyhow. This post was also written with home systems in mind - which do not have OOB/KVM or even rely entirely on the GUI.

Your feedback on the content is welcome in the GitHub Discussions.