We'll be conducting a resilience test in Boston during Q2H2: we fail over to Boston, go read-write there, and then fail back to London.
We've done this before in an emergency, but this will be a planned test.
Failover Procedure (London -> Boston)
- remove LB Session Affinity
- ensure Boston is pooled in web
- ensure web services are pointed at the global LBs
- depool bots-london from web
- disable services in London
(LIST SERVICES HERE)
- run the sync/backup scripts to make sure AWS is up to date
- pull everything into Boston from AWS
(CAN WE RESTORE NON-DB DATA HERE?)
- make the DB & phab read-only
- confirm MySQL replication is up to date
- stop replication London -> Boston
- start replication Boston -> London
- switch the backups config
- switch the db CNAME
- start services in Boston
(LIST THEM ALL)
- stop puppet on tools in both DCs
- depool the London DB and switch Boston to master
- start puppet in Boston
- switch tools-boston to primary
- switch web to web-boston
- repool bots-london to global
- test the water (smoke-test Boston before going read-write)
- repool the London DB as a secondary
- make the Boston DB & phab RW
- restore session affinity
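
For the replication check in the list above, here is a rough sketch of what "up to date" could mean in a script. It assumes the row from SHOW SLAVE STATUS on the Boston replica has already been fetched into a dict (the fetching is not shown). The names replication_caught_up and max_lag_seconds are made up for illustration. Newer MySQL versions expose the same fields as Replica_IO_Running, Replica_SQL_Running and Seconds_Behind_Source, so adjust the keys to whatever we actually run.

```python
def replication_caught_up(status: dict, max_lag_seconds: int = 0) -> bool:
    """Return True if the replica looks caught up.

    `status` is assumed to be one row of SHOW SLAVE STATUS as a dict.
    This helper is hypothetical and not part of any existing tooling.
    """
    # Both replication threads must be running.
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"

    # Seconds_Behind_Master is NULL (None) when replication is not running,
    # so treat a missing value as "not caught up".
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return False

    return io_ok and sql_ok and int(lag) <= max_lag_seconds
```

The default of zero lag is deliberately strict: the DB is already read-only at this point, so the replica should be able to catch up fully before we stop replication.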
Failback Procedure (Boston -> London)
- as above, but with Boston and London swapped
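
If we want the failback list written out rather than implied, the swap could be done mechanically. This is only a sketch: swap_sites is a made-up helper, and it assumes every site-specific name in the steps contains "london" or "boston" literally (as bots-london and tools-boston do). The output still needs a read-through by a human before anyone follows it.

```python
import re


def swap_sites(step: str, a: str = "london", b: str = "boston") -> str:
    """Swap two site names in one step, keeping each occurrence's capitalisation.

    Hypothetical helper for illustration only.
    """
    pattern = re.compile(f"({re.escape(a)}|{re.escape(b)})", re.IGNORECASE)

    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Pick the opposite site for whichever one matched.
        other = b if word.lower() == a else a
        if word.isupper():
            return other.upper()
        if word[0].isupper():
            return other.capitalize()
        return other

    return pattern.sub(replace, step)


failover_steps = [
    "depool bots-london from web",
    "stop replication london->Boston",
]
failback_steps = [swap_sites(s) for s in failover_steps]
```

Doing the swap in a single regex pass avoids the classic bug of replacing london with boston first and then turning every boston (including the new ones) back into london.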