Post-mortem for Gnosis Chain validator outage, operator CryptoManufaktur
What happened
On 2023-01-16, 10,000 validators on Gnosis Chain managed by CryptoManufaktur stopped attesting.
Timeline
- 12:40 UTC Stakewise alerts CryptoManufaktur to the outage on Telegram
- 13:07 UTC Investigation starts
- 13:22 UTC web3signer / postgres interaction identified as root cause
- 13:33 UTC web3signer update did not resolve issue
- 13:52 UTC Slashing Protection database reset completed, validators are attesting again
Root Cause
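web3signer keeps its slashing protection database in PostgreSQL. The sequence for the attestations table is a 32-bit signed integer, so it is capped at 2,147,483,647 (about 2 billion). The number of entries written to the table exceeded that maximum, so the sequence could not issue new values and inserts into the table failed.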
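To give a feel for how quickly a 32-bit sequence can run out at this scale, here is a rough back-of-the-envelope sketch. It assumes one attestations row per validator per epoch and a Gnosis Chain epoch of 80 seconds (16 slots of 5 seconds); both are assumptions for illustration, not figures from the investigation, and the function names are hypothetical.

```python
# Rough estimate of how long a 32-bit sequence lasts under attestation load.
# Assumptions (not taken from the incident data): one row per validator per
# epoch, and an 80-second epoch on Gnosis Chain.

INT32_MAX = 2**31 - 1  # 2,147,483,647, the largest 32-bit signed integer
SECONDS_PER_DAY = 86_400


def days_until_exhaustion(validators: int, epoch_seconds: float, used: int = 0) -> float:
    """Days until the sequence reaches INT32_MAX, starting from `used`."""
    remaining = INT32_MAX - used
    rows_per_day = validators * SECONDS_PER_DAY / epoch_seconds
    return remaining / rows_per_day


def headroom(last_value: int) -> float:
    """Fraction of the sequence range still unused (1.0 = empty, 0.0 = full)."""
    return 1 - last_value / INT32_MAX


if __name__ == "__main__":
    days = days_until_exhaustion(validators=10_000, epoch_seconds=80)
    print(f"10,000 validators exhaust the sequence in about {days:.0f} days")
```

Under these assumptions, 10,000 validators would use up the full range in a little under 200 days. An operator could feed the sequence's current value into something like `headroom` to alert well before the cap is reached.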
What worked
- Alerting worked; we responded quickly and resolved the issue 72 minutes after the first alert
- Blast radius containment worked - while 10k validators is a large number, the second Gnosis environment with 5k validators was not impacted
Next steps
- Consensys has been made aware of this limitation in web3signer. Once a fixed version is available, we will deploy it
- Other node operators have been made aware of this limitation so they can act proactively