Yes, we know these infrastructure updates are kind of boring. But we want to document them.
This one covers what happens when we lose the connection to the read replica database(s). The system now fails over to the main database automatically. This happened twice over the past year, and we wanted to handle it in a smoother, automated way.
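For the curious, the idea can be sketched in a few lines of Python. This is only an illustration, not our production code: `run_read_query`, `ReplicaUnavailable`, and the `replica` / `main` callables are hypothetical names made up for this example.

```python
class ReplicaUnavailable(Exception):
    """Hypothetical error raised when the read replica cannot be reached."""


def run_read_query(query, replica, main):
    """Run a read query on the replica, falling back to the main database.

    `replica` and `main` are hypothetical callables that take a query
    string and return its result. Returns a (result, source) tuple so the
    caller can see which database actually served the read.
    """
    try:
        return replica(query), "replica"
    except (ReplicaUnavailable, ConnectionError):
        # Connection to the replica was lost: serve the read from main instead.
        return main(query), "main"
```

Returning the source alongside the result is a design choice for the sketch; it makes failovers easy to log and count.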
We also added a Read Replica Lag Detector. If we detect that the read replica is more than 5 minutes behind the main database, we fail over to the main database and wait for the replica to recover.
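The lag check boils down to a threshold comparison. Here is a minimal Python sketch under some assumptions: the names `MAX_REPLICA_LAG` and `choose_read_target` are hypothetical, the lag is assumed to be measured elsewhere and passed in as a `timedelta`, and treating an unknown lag (`None`) as unsafe is our assumption for the example.

```python
from datetime import timedelta

# Threshold from the post: more than 5 minutes behind triggers failover.
MAX_REPLICA_LAG = timedelta(minutes=5)


def choose_read_target(replica_lag):
    """Pick which database should serve reads for a given replica lag.

    `replica_lag` is a timedelta, or None when the lag could not be
    measured (assumption: an unknown lag is treated as too risky).
    """
    if replica_lag is None or replica_lag > MAX_REPLICA_LAG:
        return "main"
    # Lag is within bounds, so reads go to (or return to) the replica.
    return "replica"
```

Because the function is stateless, recovery needs no special handling in this sketch: once the measured lag drops back to 5 minutes or less, the next check routes reads to the replica again.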