intranet: deleted namespace
At 21:44 CEST an operator mistakenly deleted the intranet namespace on the prod-1 cluster. The namespace was quickly re-created, so downtime was limited to 13 minutes. However, the PVCs in that namespace were deleted along with it, resulting in data loss, mainly passwords stored in Kerberos that were changed after 2021-08-26.
Remedial actions:
- volumes in OpenStack should not be deleted when their PVC is deleted in Kubernetes. This does not need to apply to every volume, only to the ones we really care about (LDAP, gitolite, etc.). Others, such as postgresql, matter less, since postgres-operator already implements mechanisms to prevent cluster deletion.
- implement Velero (!13 (merged))
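
One possible way to approach the first remedial action is to switch the reclaim policy of the important PersistentVolumes from `Delete` (the default for dynamically provisioned volumes) to `Retain`, so that deleting a PVC leaves the backing volume in place. The sketch below is only an illustration: it reads the JSON produced by `kubectl get pv -o json`, picks the volumes bound to a list of protected claims, and prints the matching `kubectl patch` commands. The claim names in `PROTECTED_CLAIMS` are hypothetical placeholders, not the real PVC names.

```python
import json
import shlex

# Hypothetical (namespace, PVC name) pairs to protect.
# The real claim names are an assumption and must be filled in.
PROTECTED_CLAIMS = {("intranet", "ldap-data"), ("intranet", "gitolite-data")}

# Merge patch that sets the standard PersistentVolume reclaim policy field.
RETAIN_PATCH = {"spec": {"persistentVolumeReclaimPolicy": "Retain"}}


def volumes_to_protect(pv_list, protected=PROTECTED_CLAIMS):
    """Return names of PVs bound to a protected claim and not yet set to Retain."""
    names = []
    for pv in pv_list.get("items", []):
        spec = pv.get("spec", {})
        claim = spec.get("claimRef") or {}  # unbound PVs have no claimRef
        key = (claim.get("namespace"), claim.get("name"))
        already_retained = spec.get("persistentVolumeReclaimPolicy") == "Retain"
        if key in protected and not already_retained:
            names.append(pv["metadata"]["name"])
    return names


def patch_commands(pv_names):
    """Build one `kubectl patch pv` command per volume name."""
    patch = json.dumps(RETAIN_PATCH)
    return [
        f"kubectl patch pv {shlex.quote(name)} -p {shlex.quote(patch)}"
        for name in pv_names
    ]
```

Typical use would be to pipe `kubectl get pv -o json` into a small script that calls `json.load(sys.stdin)`, passes the result to `volumes_to_protect`, and prints the output of `patch_commands` for review before anything is run. With `Retain`, a PV whose claim is deleted moves to the `Released` state and its data stays on the backing volume until an operator reclaims it manually.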
Misc:
- write and post a postmortem