How to scale in/out an Atlassian Data Center deployment for cost saving purposes

You want to horizontally scale in or out an Atlassian Data Center deployment that is based on the 'cost-effective' Utoolity AWS Quick Start forks, e.g. to effectively stop the cluster during off-hours.

Prerequisites

You have provisioned your Atlassian Data Center deployment based on the 'cost-effective' Utoolity AWS Quick Start forks, and the cluster is running (aka 'active') – refer to How to provision a cost-effective Atlassian Data Center deployment on AWS for details.

Concepts

It is helpful to get acquainted with the following underlying concepts:

Implementation

Based on the above concepts, scaling in/out has been implemented via a dedicated 'Stack standby mode' CloudFormation parameter, as follows:

  • Scale in: changing 'Stack standby mode' from 'active' to 'cold' overrides the configured Auto Scaling min/max parameters with 0/0, which triggers the termination of all regular cluster nodes. Eventually, this will in turn automatically stop other applicable resources afterwards, most notably the RDS database (not yet implemented, so this currently requires a manual step)

  • Scale out: changing 'Stack standby mode' from 'cold' to 'active' restores the configured Auto Scaling min/max parameters, which triggers the creation of all regular cluster nodes. Eventually, this will in turn automatically start other applicable resources first, most notably the RDS database (not yet implemented, so this currently requires a manual step)
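The effect of the parameter on the cluster node Auto Scaling group can be modelled as follows. This is a minimal Python sketch for illustration only: the templates implement the override in CloudFormation itself (presumably via template conditions), and the function name effective_asg_size is hypothetical.

```python
"""Illustrative model of how 'Stack standby mode' maps to ASG min/max sizes.

Assumption: the only mode values are 'active' and 'cold', as described above.
The function name is hypothetical and not part of the Quick Start templates.
"""
from typing import Tuple

VALID_MODES = ("active", "cold")


def effective_asg_size(mode: str, configured_min: int, configured_max: int) -> Tuple[int, int]:
    """Return the (min, max) sizes the cluster node ASG ends up with."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown standby mode: {mode!r}")
    if configured_min > configured_max:
        raise ValueError("configured min must not exceed configured max")
    if mode == "cold":
        # Scale in: the configured values are overridden with 0/0,
        # so Auto Scaling terminates all regular cluster nodes.
        return (0, 0)
    # Scale out: the configured values apply again, so Auto Scaling
    # launches nodes until the group satisfies min/max.
    return (configured_min, configured_max)


if __name__ == "__main__":
    print(effective_asg_size("cold", 2, 4))
    print(effective_asg_size("active", 2, 4))
```

The point of the model is that switching back to 'active' does not require re-entering the original sizes: they stay stored in the stack parameters and simply take effect again.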

TODO 

Trigger an automatic RDS database stop/start based on the cluster nodes' EC2 Auto Scaling group scale-in/scale-out events – @Steffen Opel [Utoolity]
Trigger an automatic Bitbucket NFS server stop/start based on the cluster nodes' EC2 Auto Scaling group scale-in/scale-out events – @Steffen Opel [Utoolity]
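The stop half of the first TODO could look roughly like the following Lambda sketch, triggered by an EventBridge rule for the Auto Scaling event 'EC2 Instance Terminate Successful'. Everything here is an assumption for illustration: the rule itself, the handler wiring, and the DB_INSTANCE_ID environment variable are hypothetical. The start half is deliberately left out, because the database has to be available before the nodes launch, which likely needs a launch lifecycle hook rather than a plain event reaction. The Bitbucket NFS server would be handled analogously via EC2 stop_instances.

```python
"""Hypothetical EventBridge-triggered Lambda for the RDS TODO (stop path only).

Assumptions: an EventBridge rule forwards 'EC2 Instance Terminate Successful'
events of the cluster node ASG to this function, and the RDS instance ID is
provided via a DB_INSTANCE_ID environment variable (both hypothetical).
"""
import os

TERMINATE_OK = "EC2 Instance Terminate Successful"


def should_stop_database(detail_type: str, remaining_instances: int) -> bool:
    """Stop the database only once the last cluster node is gone."""
    return detail_type == TERMINATE_OK and remaining_instances == 0


def handler(event, context):
    import boto3  # lazy import: the decision logic above has no AWS dependency

    asg_name = event["detail"]["AutoScalingGroupName"]
    groups = boto3.client("autoscaling").describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"]
    if not groups:
        # Group no longer exists (e.g. stack deletion): leave the database alone.
        return {"stopped": False}
    # Assumes the just-terminated instance is no longer listed in the group.
    remaining = len(groups[0]["Instances"])
    if not should_stop_database(event["detail-type"], remaining):
        return {"stopped": False}
    rds = boto3.client("rds")
    try:
        rds.stop_db_instance(DBInstanceIdentifier=os.environ["DB_INSTANCE_ID"])
    except rds.exceptions.InvalidDBInstanceStateFault:
        # Already stopping or stopped, e.g. triggered by an earlier event.
        return {"stopped": False}
    return {"stopped": True}
```

Checking the remaining instance count rather than the desired capacity matters here: the desired capacity drops to 0 right after the stack update, while the other nodes may still be terminating.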

Step-by-step guide

Shared responsibility

Scaling in/out to and from 'cold standby' does not currently work reliably for Bitbucket: the cluster sometimes does not come up again, apparently because the NFS file system is not properly mounted on the nodes (not investigated yet). While this has never been observed for Confluence or Jira, it is your responsibility to judge whether you can risk losing your running cluster, or whether a manual backup is advisable at this point!

Fix Bitbucket cluster not properly coming up after being in ‘cold standby’ @Steffen Opel [Utoolity]

Scale in

Update the applicable Atlassian Data Center cluster CloudFormation stack to scale in (Stack standby mode → 'cold'):

  1. (Optional) Create a backup of the cluster, as outlined in How to backup/restore an Atlassian Data Center deployment for replication purposes

    •  This is not strictly required, but should likely be added as an automated convenience step in the future.

    •  This is highly advised for Bitbucket, and may be advisable for Confluence/Jira as a safety measure too (see the note above).

  2. Navigate to the CloudFormation console and select the target stack

  3. Select 'Actions → Update Stack', then 'Use current template', and click 'Next'

  4. Scroll down to section 'Availability/Cost (Optional)' and change 'Stack standby mode' to 'cold', keep all other parameters as is, and click 'Next' twice

  5. Preview your changes to confirm that the only affected resources are of type AWS::AutoScaling::AutoScalingGroup, then click 'Update'

  6. Once the stack has reached the UPDATE_COMPLETE status, navigate to the EC2 console to observe that the cluster nodes are shutting down (this may take a while to start and complete, depending on how the underlying Auto Scaling group is configured)

  7. (Bitbucket only) Select the 'Bitbucket NFS node' and click 'Actions → Instance State → Stop'

    •  There is currently an issue with the Bitbucket cluster not reliably coming up again (see the note above).

  8. Once all cluster nodes have been shut down, navigate to the RDS console and select the cluster's RDS instance (in newer deployments it is named after the stack)

  9. Select 'Instance actions → Stop'
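Steps 2 to 9 can be approximated with boto3 as follows. This is a hedged sketch, not a tested automation: the optional backup (step 1) is not covered, the CloudFormation parameter key behind 'Stack standby mode' is not documented here and must be passed in by the caller, and the sketch assumes the cluster node Auto Scaling group lives in the top-level stack. The console preview (step 5) is modelled as a change set that is only executed if every change targets an AWS::AutoScaling::AutoScalingGroup.

```python
"""Hypothetical boto3 sketch of the scale-in steps above.

Assumptions: the caller knows the CloudFormation parameter key behind
'Stack standby mode', the cluster node Auto Scaling group(s) live in the
top-level stack, and credentials/region come from the default boto3 config.
"""
import time
from typing import Dict, Iterable, List, Optional

ASG_TYPE = "AWS::AutoScaling::AutoScalingGroup"


def build_update_parameters(current: Iterable[Dict], key: str, value: str) -> List[Dict]:
    """Set `key` to `value` and keep every other parameter as is (step 4)."""
    params = []
    found = False
    for p in current:
        if p["ParameterKey"] == key:
            params.append({"ParameterKey": key, "ParameterValue": value})
            found = True
        else:
            params.append({"ParameterKey": p["ParameterKey"], "UsePreviousValue": True})
    if not found:
        raise ValueError(f"stack has no parameter {key!r}")
    return params


def only_asg_changes(changes: Iterable[Dict]) -> bool:
    """True if the change set only touches Auto Scaling groups (step 5)."""
    changes = list(changes)
    return bool(changes) and all(
        c.get("ResourceChange", {}).get("ResourceType") == ASG_TYPE for c in changes
    )


def scale_in(stack: str, mode_key: str, db_instance_id: str,
             nfs_instance_id: Optional[str] = None, timeout_s: int = 3600) -> None:
    import boto3  # lazy import: the helpers above do not need AWS access

    cfn = boto3.client("cloudformation")
    desc = cfn.describe_stacks(StackName=stack)["Stacks"][0]
    name = f"{stack}-scale-in-{int(time.time())}"
    cfn.create_change_set(
        StackName=stack, ChangeSetName=name, UsePreviousTemplate=True,  # step 3
        Parameters=build_update_parameters(desc.get("Parameters", []), mode_key, "cold"),
        Capabilities=desc.get("Capabilities", []),  # reuse what the stack already acknowledged
    )
    cfn.get_waiter("change_set_create_complete").wait(StackName=stack, ChangeSetName=name)
    changes = cfn.describe_change_set(StackName=stack, ChangeSetName=name)["Changes"]
    if not only_asg_changes(changes):
        raise RuntimeError(f"change set {name} touches more than ASGs; review it manually")
    cfn.execute_change_set(StackName=stack, ChangeSetName=name)
    cfn.get_waiter("stack_update_complete").wait(StackName=stack)

    # Step 6: wait until the stack's Auto Scaling groups have no instances left.
    resources = cfn.describe_stack_resources(StackName=stack)["StackResources"]
    asg_names = [r["PhysicalResourceId"] for r in resources if r["ResourceType"] == ASG_TYPE]
    if not asg_names:  # an empty name list would describe *all* groups in the region
        raise RuntimeError("no Auto Scaling group found in the top-level stack")
    autoscaling = boto3.client("autoscaling")
    deadline = time.time() + timeout_s
    while any(g["Instances"] for g in autoscaling.describe_auto_scaling_groups(
            AutoScalingGroupNames=asg_names)["AutoScalingGroups"]):
        if time.time() > deadline:
            raise TimeoutError("cluster nodes did not terminate in time")
        time.sleep(30)

    if nfs_instance_id:  # step 7 (Bitbucket only)
        boto3.client("ec2").stop_instances(InstanceIds=[nfs_instance_id])
    boto3.client("rds").stop_db_instance(DBInstanceIdentifier=db_instance_id)  # steps 8-9
```

A call might look like scale_in("confluence-dc", "StandbyMode", "confluence-dc"), where the stack name, the parameter key "StandbyMode" and the DB instance identifier are all placeholders. Keep in mind that AWS automatically starts a stopped RDS DB instance again after seven days.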

Scale out

Update the applicable Atlassian Data Center cluster CloudFormation stack to scale out (Stack standby mode → 'active'):

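  1. TBD: essentially, follow the scale-in steps above in reverse order, i.e. start the RDS instance (and, for Bitbucket, the NFS node) first, then update the stack with 'Stack standby mode' set to 'active'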
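Until the detailed steps are documented, the reverse order can be sketched with boto3 as follows. As with any sketch here, the parameter key behind 'Stack standby mode' is an assumption that the caller must supply, and the function names are hypothetical. Note that an EC2 instance reaching 'running' does not guarantee the NFS service is ready yet, which may be related to the Bitbucket mount issue noted above.

```python
"""Hypothetical boto3 sketch of the scale-out steps (scale-in in reverse order).

Assumptions: `mode_key` is the CloudFormation parameter key behind
'Stack standby mode' and must be supplied by the caller; credentials and
region come from the default boto3 configuration.
"""
from typing import Dict, Iterable, List, Optional


def build_update_parameters(current: Iterable[Dict], key: str, value: str) -> List[Dict]:
    """Set `key` to `value` and keep every other parameter as is."""
    params = [
        {"ParameterKey": p["ParameterKey"], "ParameterValue": value}
        if p["ParameterKey"] == key
        else {"ParameterKey": p["ParameterKey"], "UsePreviousValue": True}
        for p in current
    ]
    if not any(p["ParameterKey"] == key for p in params):
        raise ValueError(f"stack has no parameter {key!r}")
    return params


def scale_out(stack: str, mode_key: str, db_instance_id: str,
              nfs_instance_id: Optional[str] = None) -> None:
    import boto3  # lazy import: build_update_parameters works without AWS access

    # 1. Start the database first and wait until it reports 'available'.
    rds = boto3.client("rds")
    rds.start_db_instance(DBInstanceIdentifier=db_instance_id)
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=db_instance_id)

    # 2. (Bitbucket only) Start the NFS node and wait until it is running.
    if nfs_instance_id:
        ec2 = boto3.client("ec2")
        ec2.start_instances(InstanceIds=[nfs_instance_id])
        ec2.get_waiter("instance_running").wait(InstanceIds=[nfs_instance_id])

    # 3. Switch 'Stack standby mode' back to 'active' to restore the ASG sizes.
    cfn = boto3.client("cloudformation")
    desc = cfn.describe_stacks(StackName=stack)["Stacks"][0]
    cfn.update_stack(
        StackName=stack, UsePreviousTemplate=True,
        Parameters=build_update_parameters(desc.get("Parameters", []), mode_key, "active"),
        Capabilities=desc.get("Capabilities", []),
    )
    cfn.get_waiter("stack_update_complete").wait(StackName=stack)
```

Starting the database before the stack update mirrors the 'start other applicable resources first' requirement from the Implementation section; if the instance is already available, start_db_instance raises an error, so the sketch is meant to be run only from a fully stopped state.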

Document scale out steps in detail @Steffen Opel [Utoolity]