How to backup/restore an Atlassian Data Center deployment for replication purposes

You want to back up and restore an Atlassian Data Center deployment based on the 'cost-effective' Utoolity AWS Quick Start forks, e.g. to terminate a cluster during off-hours.

Prerequisites

You have provisioned your Atlassian Data Center deployment based on the 'cost-effective' Utoolity AWS Quick Start forks, and the cluster is running (aka 'active') – refer to How to provision a cost-effective Atlassian Data Center deployment on AWS for details.

Concepts

TBD

Document involved AWS concepts – @Steffen Opel [Utoolity]

Implementation

TBD

Document involved AWS automation – @Steffen Opel [Utoolity]
Automate backup/restore via the AWS Systems Manager Run Command – @Steffen Opel [Utoolity]

Step-by-step guide

Shared responsibility

Backup/restore still requires manual steps and has not yet been confirmed to work as expected for Bitbucket, Confluence or Jira. It is your responsibility to judge whether you can risk losing your running cluster or whether a manual backup is advisable at this point!

Backup

Back up an Atlassian Data Center cluster as follows:

  1. Scale out the bastion host stack as outlined in How to scale in/out an Atlassian Data Center deployment for cost saving purposes

  2. Once the bastion host is running, SSH into it and mount the cluster's EFS filesystem as '/media/atl/...' (to mirror what the nodes are doing), for example:

    ## Install the EFS mount helper
    # sudo yum install -y amazon-efs-utils
    ## Mount the new cluster's EFS filesystem
    sudo mkdir -p /media/atl
    sudo mount -t efs fs-12345678:/ /media/atl

  3. Scale in the cluster to 'cold standby' as outlined in How to scale in/out an Atlassian Data Center deployment for cost saving purposes

    This ensures a consistent filesystem backup based on defensive best practices – there may be options to achieve a consistent snapshot with the cluster still running, but these have not been explored yet.

  4. Back up the filesystem manually via one of the following options:

    Sync vs archive

    Both examples use a sync approach for starters. A more robust archive-based scheme is also possible and probably preferable for long-term storage.

    1. Either create an applicable backup snapshot of the cluster's shared data directory mounted under '/media/atl/...' and move it off the instance for safekeeping and long-term storage, for example via the 'aws s3 sync' AWS CLI command:

      ## Select Atlassian product
      # ATL_PRODUCT=(jira|confluence|bitbucket)

      ## Select S3 bucket
      # ATL_S3_SNAPSHOT_BUCKET=<bucketname>
      # ATL_S3_SNAPSHOT_PREFIX=<keyprefix>

      ## Back up shared home directory from source cluster filesystem mounted at '/media/atl'
      sudo aws s3 sync /media/atl/$ATL_PRODUCT/shared s3://$ATL_S3_SNAPSHOT_BUCKET/$ATL_S3_SNAPSHOT_PREFIX$(date --utc +%Y%m%dT%H%MZ)

    2. Or mount an existing EFS filesystem as the backup target directory (see instructions above) and create an ad hoc backup snapshot of the cluster's shared data directory mounted under '/media/atl/...', for example via the rsync command:

      ## Select Atlassian product
      # ATL_PRODUCT=(jira|confluence|bitbucket)

      ## Back up shared home directory from source cluster filesystem mounted at '/media/atl'
      sudo mkdir -p ./atl/$ATL_PRODUCT/shared
      ## The trailing slash on the source copies its contents rather than nesting a second 'shared' directory
      sudo rsync -avz /media/atl/$ATL_PRODUCT/shared/ ./atl/$ATL_PRODUCT/shared/

  5. (Optional) Delete the CloudFormation stack if you want to maximize cost-savings and do not mind the slightly longer MTTR for creating a new stack rather than activating an existing one.

    Only proceed with this once you have verified that you can restore the backup to a new cluster as outlined below!
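
Before deleting anything, it can help to confirm that a backup copy at least matches its source. The following is a minimal sketch for the rsync option, not a tested procedure: it assumes both directory trees are mounted on the bastion host and that find and sha256sum are available there. The function names tree_manifest and trees_match are made up for illustration.

```shell
# Print a sorted "checksum  ./relative/path" manifest for every regular file
# below the given directory.
tree_manifest() {
  (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2)
}

# Succeed only if both directories exist and contain the same files with the
# same content. Returns 2 if either argument is not a directory.
trees_match() {
  [ -d "$1" ] && [ -d "$2" ] || return 2
  [ "$(tree_manifest "$1")" = "$(tree_manifest "$2")" ]
}
```

Call it as 'trees_match <source-dir> <backup-dir>' and check the exit status; reading every file may require root. A matching manifest only shows that the copy is complete, so it does not replace an actual restore test. For the S3 option, re-running 'aws s3 sync' with the '--dryrun' flag lists any transfers that would still be pending, though that comparison is based on size and modification time rather than checksums.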

TODO
Verify backup steps do work as desired
Craft applicable Bash script to automate backup
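
As a possible starting point for such a script, the sketch below wraps the manual 'aws s3 sync' step in two functions. It is unverified against a real cluster. The DRY_RUN and ATL_MOUNT environment variables and the function names snapshot_uri and backup_shared_home are assumptions introduced here, not part of the templates.

```shell
# Build the timestamped S3 destination, mirroring the naming of the manual step.
# Arguments: bucket, key prefix, optional timestamp (defaults to current UTC time).
snapshot_uri() {
  local ts="${3:-$(date --utc +%Y%m%dT%H%MZ)}"
  printf 's3://%s/%s%s\n' "$1" "$2" "$ts"
}

# Sync the shared home directory of one product to S3.
# Arguments: product, bucket, key prefix, optional timestamp.
# DRY_RUN=1 (hypothetical switch) prints the command instead of running it.
# ATL_MOUNT (hypothetical override) defaults to the mount point used above.
backup_shared_home() {
  local product="$1" bucket="$2" prefix="$3" src dest
  case "$product" in
    jira|confluence|bitbucket) ;;
    *) echo "unknown product: $product" >&2; return 2 ;;
  esac
  src="${ATL_MOUNT:-/media/atl}/$product/shared"
  dest="$(snapshot_uri "$bucket" "$prefix" "${4:-}")"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sudo aws s3 sync $src $dest"
  else
    sudo aws s3 sync "$src" "$dest"
  fi
}
```

Running 'DRY_RUN=1 backup_shared_home jira <bucketname> <keyprefix>' first shows the exact command before anything is transferred. Keeping the URI construction in its own function makes the snapshot name easy to log and to reuse for the restore.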

Restore

Catch 22

This seemingly cannot work with the official templates at present, because the CloudFormation stack must start up with 1 instance before there is a chance to restore the filesystem, and that instance will then mess up the restored RDS snapshot. However, it seemingly works fine when using the custom 'cold standby' mode of our modified templates to ensure that the cluster starts up with 0 instances; see the instructions below.

Restore an Atlassian Data Center cluster backup to a new CloudFormation stack via a bastion host as follows:

  1. Provision an Atlassian Data Center stack based on the 'cost-effective' Utoolity AWS Quick Start forks as usual, as outlined in TBD, and ensure that the following parameters are set correctly:

    • 'Database snapshot ID to restore' => provide the desired RDS snapshot name from

    • 'Stack standby mode' => select 'cold'
      this addresses the aforementioned catch 22 by creating the new cluster with 0 nodes initially to provide time for the manual filesystem restore

  2. While the cluster is starting, scale out the bastion host stack as outlined in How to scale in/out an Atlassian Data Center deployment for cost saving purposes

  3. Once the bastion host is running, SSH into it and mount the new cluster's EFS filesystem as '/media/atl/...' (to mirror what the nodes are doing), for example:

    ## Install the EFS mount helper
    # sudo yum install -y amazon-efs-utils
    ## Mount the new cluster's EFS filesystem
    sudo mkdir -p /media/atl
    sudo mount -t efs fs-12345678:/ /media/atl

  4. Restore the filesystem manually via one of the following options:

    Sync vs archive

    Both examples use a sync approach for starters. A more robust archive-based scheme is also possible and probably preferable for long-term storage.

    1. Either move the backup snapshot from long-term storage into the cluster's shared data directory mounted under '/media/atl/...', for example via the 'aws s3 sync' AWS CLI command:

      ## Select Atlassian product
      # ATL_PRODUCT=(jira|confluence|bitbucket)

      ## Select S3 bucket
      ## (the key prefix must include the timestamp that was appended during backup)
      # ATL_S3_SNAPSHOT_BUCKET=<bucketname>
      # ATL_S3_SNAPSHOT_PREFIX=<keyprefix>

      ## Restore shared home directory to target cluster filesystem mounted at '/media/atl'
      sudo aws s3 sync s3://$ATL_S3_SNAPSHOT_BUCKET/$ATL_S3_SNAPSHOT_PREFIX /media/atl/$ATL_PRODUCT/shared

    2. Or mount an existing EFS filesystem as the backup source directory (see instructions above) and restore the backup snapshot into the new cluster's shared data directory mounted under '/media/atl/...', for example via the rsync command:

      ## Select Atlassian product
      # ATL_PRODUCT=(jira|confluence|bitbucket)

      ## Prepare backup source directory
      cd ~
      sudo mkdir -p ./atl

      ## Restore shared home directory into target cluster filesystem mounted at '/media/atl'
      sudo mkdir -p /media/atl/$ATL_PRODUCT/shared
      sudo rsync -avz ./atl/$ATL_PRODUCT/shared/ /media/atl/$ATL_PRODUCT/shared

  5. Scale out the cluster to 'active' as outlined in How to scale in/out an Atlassian Data Center deployment for cost saving purposes

    Given that a pristine installation requires this, it might be appropriate to start a restored cluster with only one node at first too, though it seems to work fine with multiple nodes as well (after all, the restored cluster had already gone through all the configuration steps before being backed up).

  6. (Optional) Scale out the cluster to the desired number of nodes

  7. Perform applicable post restore operations, for example a reindex

    This implies a notable time sink for large data sets, which is why e.g. Jira supports index backup/restore conceptually, and the Bitbucket template already supports Elasticsearch backup/restore out of the box – accordingly, index backup/restore should probably be added to the Jira and Confluence templates as well if possible?

TODO
Verify restore steps do work as desired
Craft applicable Bash script to automate restore
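
A restore script could start from something like the sketch below, which picks the newest snapshot from an 'aws s3 ls' listing and syncs it back. It is an untested outline. The function names latest_snapshot and restore_shared_home and the DRY_RUN and ATL_MOUNT environment variables are hypothetical, and it assumes snapshot names share one key prefix and contain no spaces.

```shell
# Read 'aws s3 ls' output on stdin and print the newest snapshot prefix.
# Relies on the %Y%m%dT%H%MZ timestamps sorting chronologically as plain text,
# which only holds when all snapshots share the same key prefix.
latest_snapshot() {
  awk '$1 == "PRE" { print $2 }' | LC_ALL=C sort | tail -n 1
}

# Sync a snapshot from S3 into the shared home directory of one product.
# Arguments: product, full S3 URI of the snapshot.
# DRY_RUN=1 (hypothetical switch) prints the command instead of running it.
restore_shared_home() {
  local product="$1" snapshot="$2" mount="${ATL_MOUNT:-/media/atl}"
  case "$product" in
    jira|confluence|bitbucket) ;;
    *) echo "unknown product: $product" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sudo aws s3 sync $snapshot $mount/$product/shared"
    return 0
  fi
  # Refuse to write to the bastion host's local disk if EFS is not mounted.
  if ! mountpoint -q "$mount"; then
    echo "$mount is not a mount point" >&2
    return 1
  fi
  sudo aws s3 sync "$snapshot" "$mount/$product/shared"
}
```

For instance, 'aws s3 ls s3://<bucketname>/ | latest_snapshot' prints the most recent snapshot prefix, which can then be appended to the bucket URI and passed to restore_shared_home. The mount point check guards against the easy mistake of restoring before the new cluster's EFS filesystem has been mounted.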