Post-mortem for 2023-01-16 validator outage on Gnosis Chain, CryptoManufaktur

What happened

On 2023-01-16, 10,000 validators on Gnosis Chain managed by CryptoManufaktur stopped attesting.

Timeline

  • 12:40 UTC Stakewise alerts CryptoManufaktur to the outage on Telegram
  • 13:07 UTC Investigation starts
  • 13:22 UTC web3signer / postgres interaction identified as root cause
  • 13:33 UTC web3signer update did not resolve issue
  • 13:52 UTC Slashing Protection database reset completed, validators are attesting again

Root Cause

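web3signer keeps a slashing protection database in PostgreSQL. The sequence for the attestations table is a 32-bit integer, so it is capped at roughly 2 billion (2,147,483,647). The number of entries written to the table exceeded that maximum sequence value, so new attestation records could no longer be inserted and the validators stopped attesting.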
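
To give a feel for how quickly a 32-bit sequence can run out, here is a rough back-of-the-envelope sketch in Python. It is illustrative only, not a reconstruction of the incident: the function names are hypothetical, and it assumes one attestations row per validator per epoch, an epoch length of 80 seconds (16 slots of 5 seconds on Gnosis Chain), and a sequence that is never reset.

```python
# Upper bound of a 32-bit signed PostgreSQL integer sequence.
INT32_MAX = 2**31 - 1


def sequence_headroom(last_value: int, max_value: int = INT32_MAX) -> int:
    """Return how many more values the sequence can hand out."""
    return max(max_value - last_value, 0)


def days_until_exhaustion(
    last_value: int,
    validators: int,
    seconds_per_epoch: float,
    max_value: int = INT32_MAX,
) -> float:
    """Estimate the days left before the sequence is exhausted.

    Assumes (hypothetically) that each validator writes exactly one
    attestation row per epoch and that nothing else consumes the sequence.
    """
    rows_per_day = validators * 86_400 / seconds_per_epoch
    return sequence_headroom(last_value, max_value) / rows_per_day


if __name__ == "__main__":
    # 10,000 validators, assumed 80-second epochs, sequence starting at 0.
    days = days_until_exhaustion(0, validators=10_000, seconds_per_epoch=80)
    print(f"Estimated days until exhaustion: {days:.1f}")
```

Under these assumptions a fresh sequence lasts on the order of 200 days with 10,000 validators. An operator could feed the sequence's current value into a calculation like this to alert well before the limit is reached.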

What worked

  • Alerting worked; we responded quickly and had validators attesting again 72 minutes after the alert
  • Blast radius containment worked - while 10k validators is a large number, the second Gnosis environment with 5k validators was not impacted

Next steps

  • Consensys have been made aware of this limitation in web3signer. Once a changed version is available, we will deploy it
  • Other node operators have been made aware of this limitation so they can act proactively