Important
This is an UNRELEASED version of no-shred, use for TESTING purposes ONLY.
Synopsis
no-shred [info | log | flush]
Description
no-shred is a command to monitor the activity of its associated services and timer, which provide caching capabilities and increased resiliency for the Proxmox Cluster Filesystem (pmxcfs) backend database, and optimise rrdcached caching of time-series data logged via the librrd API.
Arguments
When no argument is given, defaults to info.
- info
-
Provides an overview of the current timestamps for the live and cached versions of the pmxcfs backend database files, the flushing service timer triggers and rrdcached statistics. Hints are provided in case of failed prerequisites.
- log
-
Full log of associated service runs and error messages since the start of the current boot.
- flush
-
Manually requests a flush of the current state of the pmxcfs backend database onto disk.
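Both forms follow directly from the synopsis; for instance, a state check followed by a manual flush:

    no-shred            # no argument defaults to: no-shred info
    no-shred flush      # request an immediate flush onto disk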
Services
- no-shred.service
-
The service is activated before pve-cluster and rrdcached; it ensures setup of the cache flushing timer, then creates an overlay ramfs for the pmxcfs backend database directory and overrides the rrdcached configuration file. A final flush is also performed on shutdown.
- no-shred-flush.service
-
The service is activated 5 minutes after boot, then re-activated hourly, and atomically flushes the pmxcfs backend database from ramfs onto disk.
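In systemd terms, the boot delay and hourly re-activation described above correspond to a timer along the following lines; this is a minimal illustrative sketch with an assumed unit name, not the shipped unit file:

    # no-shred-flush.timer (name assumed) -- illustrative sketch only
    [Timer]
    OnBootSec=5min           # first activation 5 minutes after boot
    OnUnitActiveSec=1h       # re-activated hourly thereafter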
Operation
pmxcfs caching
Caching of pmxcfs is transparently introduced before the pve-cluster service starts by overlay mounting its default backend database store directory into RAM, where the inactive database has been copied over from disk in advance. At regular intervals, the database is synced onto disk into the underlying original directory using a single transaction; the live database is only accessed read-only in parallel, with no disruption.
pmxcfs provides unified access to cluster-wide configuration and is replicated in real time across all (quorate) nodes, or operates in a local mode on a solitary node. The actual backend is provided by an SQLite database that is constantly writing. Its backend database files are stored in:
-
/var/lib/pve-cluster/
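The transparent overlay described above could in principle be assembled as follows; this is a minimal sketch assuming hypothetical mountpoints under /run/no-shred, not the tool's actual implementation:

    # illustrative sketch only -- the /run/no-shred paths are assumed
    mount -t ramfs ramfs /run/no-shred
    mkdir -p /run/no-shred/upper /run/no-shred/work
    cp -a /var/lib/pve-cluster/. /run/no-shred/upper/    # copy inactive database into RAM in advance
    mount -t overlay overlay \
        -o lowerdir=/var/lib/pve-cluster,upperdir=/run/no-shred/upper,workdir=/run/no-shred/work \
        /var/lib/pve-cluster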
The first flush is timed 5 minutes after boot to account for e.g. re-synchronisation with the rest of the cluster after a prolonged 'node down' period, while avoiding potential repeated auto-reboot periods, whether HA-associated or following a failed cold start. This means that should the database become corrupt, it would NOT be flushed onto disk, and the last good state would still be available on the next reboot.
Regular flushing then continues hourly in an atomic manner, i.e. should a power loss occur exactly during a flush, the copy on disk remains safe. A single extra backup copy of the database version prior to the most recent flush is also kept aside.
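An atomic flush that retains one prior copy can be sketched as below; the file names, the disk-side access path and the use of SQLite's VACUUM INTO are assumptions for illustration, not necessarily the tool's actual mechanism:

    # minimal sketch -- paths and VACUUM INTO usage are assumed
    live=/var/lib/pve-cluster/config.db             # live database, served from RAM via the overlay
    disk=/mnt/pve-cluster-lower                     # hypothetical access path to the on-disk directory
    sqlite3 "$live" "VACUUM INTO '$disk/config.db.new'"   # single read-only transaction on the live DB
    sync                                            # ensure the fresh copy has reached the disk
    mv -f "$disk/config.db" "$disk/config.db.prev"  # keep one prior version aside
    mv "$disk/config.db.new" "$disk/config.db"      # atomic rename completes the flush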
Note
Should a logical corruption occur, i.e. one not detectable at the database level, this tool cannot fully safeguard against it - this is a deficiency of the Proxmox stack, where database constraints are not in place in order to increase performance. If the situation is detected within an hour, this is where the extra backup copy might be useful.
rrdcached caching
The entire optimisation is achieved by providing a drop-in configuration file:
-
/etc/default/rrdcached
The values set ensure that no journal is in use and increase the default 5-minute write interval to 1 hour. The actual caching logic therefore remains completely within the original caching daemon.
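Assuming the Debian-style variable names of /etc/default/rrdcached, the two effects described could look roughly as follows; the exact contents shipped by the tool may differ:

    # illustrative values only -- variable names follow the Debian defaults file convention
    WRITE_TIMEOUT=3600      # flush cached updates hourly instead of the 5-minute default
    #JOURNAL_PATH=          # left unset: no journal is in use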
Known limitations
Installation needs to be followed by a reboot for the setup to take effect; similarly, uninstallation needs to be followed by a reboot, as the overlay mountpoint is NOT dismantled, although flushing stops from the point of uninstallation.
Note
This is less of a concern on a clustered node, where every other node holds a copy of the entire database at any point and will transparently re-synchronise it upon reboot of the affected node.
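Whether the overlay is still in place, e.g. after an uninstall and before the follow-up reboot, can be verified with a standard mount inspection:

    findmnt /var/lib/pve-cluster    # lists an overlay entry while the mount is still active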
Exit status
- 0
-
Success.
- 1
-
Failure.
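The binary exit status makes scripted use straightforward, e.g.:

    no-shred flush || echo "no-shred flush failed" >&2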
Copying
Copyright © 2025 free-pmx.
Free use of this software is granted under the terms of the AGPL v3 License.