2022/3 Q2H2 Boston Failover Week
Open, Needs Triage, Public

Description

We'll be conducting a resilience test in Boston during Q2H2: we fail over to Boston, go read-write there, and then fail back to London.

We've done this before in an emergency, but this will be a planned test.

Failover Procedure (London -> Boston)

  • remove LB session affinity
  • ensure Boston is pooled in the web
  • ensure everything web-facing is pointed at the global LBs
  • depool bots-london from web
  • disable services in London

(LIST SERVICES HERE)

  • run the sync/backup scripts to make sure AWS is up to date
  • in Boston, pull everything from AWS

(CAN WE RESTORE NON-DB DATA HERE?)

  • make the DB & phab read-only
  • confirm MySQL replication is up to date
  • stop replication London -> Boston
  • start replication Boston -> London
  • switch the backups config
  • switch the DB CNAME
  • start services in Boston

(LIST THEM ALL)
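
The replication check before reversing direction can be sketched as a small helper. This is only a minimal sketch, not our actual tooling: `replica_caught_up` and `max_lag_seconds` are hypothetical names, and it assumes the caller has already fetched one row of `SHOW REPLICA STATUS` (or `SHOW SLAVE STATUS` on older MySQL) as a dict by whatever client is in use.

```python
def replica_caught_up(status, max_lag_seconds=0):
    """Return True if a replica status row looks healthy and caught up.

    `status` is assumed to be one row of SHOW REPLICA STATUS /
    SHOW SLAVE STATUS as a dict. Both the newer and the older MySQL
    field names are accepted.
    """
    def field(*names):
        # Return the first of the given field names present in the row.
        for name in names:
            if name in status:
                return status[name]
        return None

    io_running = field("Replica_IO_Running", "Slave_IO_Running")
    sql_running = field("Replica_SQL_Running", "Slave_SQL_Running")
    lag = field("Seconds_Behind_Source", "Seconds_Behind_Master")

    # Both replication threads must be running.
    if io_running != "Yes" or sql_running != "Yes":
        return False
    # MySQL reports NULL lag when it cannot be determined.
    if lag is None:
        return False
    return int(lag) <= max_lag_seconds
```

A zero lag with both threads running is a reasonable gate once the DB is read-only, but it is probably worth also comparing binary log positions on both sides by hand before stopping replication.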

  • stop Puppet on tools in both DCs
  • depool the London DB and switch Boston to master
  • start Puppet in Boston
  • switch tools-boston to primary
  • switch web to web-boston
  • repool bots-london to global
  • test the waters
  • repool the London DB as a secondary
  • make the Boston DB & phab read-write
  • restore session affinity

Failback Procedure (Boston -> London)

  • same as above, but with Boston and London swapped
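
One way to draft the failback checklist is to swap the site names in the failover steps mechanically. The snippet below is a rough sketch under that assumption; `swap_sites` is a hypothetical helper, and its output would still need a manual read-through, since steps such as the AWS sync may not be symmetric.

```python
import re

# Each site name maps to the other one.
_SWAP = {"london": "boston", "boston": "london"}


def swap_sites(step):
    """Swap 'london' and 'boston' in a step, keeping each word's casing."""
    def repl(match):
        word = match.group(0)
        swapped = _SWAP[word.lower()]
        if word.isupper():
            return swapped.upper()
        if word[0].isupper():
            return swapped.capitalize()
        return swapped

    # A single pass with a callback avoids swapping a name twice.
    return re.sub(r"london|boston", repl, step, flags=re.IGNORECASE)


failover_steps = [
    "depool bots-london from web",
    "switch tools-boston to primary",
]
failback_steps = [swap_sites(s) for s in failover_steps]
```

Here `failback_steps` becomes `["depool bots-boston from web", "switch tools-london to primary"]`.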
