...

The CloudFormation template provisions three conceptually independent components via an embedded stack for convenience (they can also be deployed independently for advanced use cases):

  1. A Lambda function that is subscribed indirectly, via an SNS topic, to RDS event notifications of type 'db-instance'. The function ingests all these events into CloudWatch Events as custom events, so that its convenient, unified rules engine can be used instead of custom code.

    • AWS is in the process of migrating all event processing to CloudWatch and will eventually provide native CloudWatch events for RDS, which will render this component obsolete; it only provides a temporary workaround.

    • (warning) Remove this component in favor of the recently released RDS CloudWatch Events.

  2. A CloudWatch Events rule that matches only the RDS 'db-instance' events with the message 'DB instance is being started due to it exceeding the maximum allowed time being stopped'.

  3. A Step Functions state machine that is triggered by the matched CloudWatch event. The state machine waits a configurable time and then stops the RDS instance via another Lambda function (the default wait is 48 minutes, i.e. a bit shorter than the instance hour that has already been paid for).

    • Ideally this would stop the instance the moment it has fully started, but tracking the instance state across several events would be more complex, so for starters this takes the easy way.
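To make the first two components more concrete, here is a minimal sketch of the ingest function, a matching rule pattern and the state machine definition. It assumes the SNS message body is the JSON document RDS publishes (keys such as "Event Source", "Source ID", "Event Message"); the names `CUSTOM_SOURCE`, `DETAIL_TYPE`, `ingest_handler` and `state_machine_definition` are hypothetical and not taken from the actual template.

```python
import json
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)

# Hypothetical values: the template's actual source and detail-type are not documented here.
CUSTOM_SOURCE = "custom.rds"  # custom sources should avoid the reserved "aws." prefix
DETAIL_TYPE = "RDS DB Instance Event"
MAX_ENTRIES_PER_CALL = 10  # PutEvents accepts at most 10 entries per request

# Rule pattern for component 2; prefix matching tolerates a trailing period in the message.
AUTO_START_PATTERN = {
    "source": [CUSTOM_SOURCE],
    "detail-type": [DETAIL_TYPE],
    "detail": {
        "Event Message": [
            {"prefix": "DB instance is being started due to it exceeding the maximum allowed time being stopped"}
        ]
    },
}


def sns_to_put_events_entries(event: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn an SNS-triggered Lambda event into CloudWatch Events PutEvents entries.

    Assumes each SNS message body is the JSON document published by RDS.
    """
    entries = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        # Defensive check: the subscription should only deliver 'db-instance' events anyway.
        if message.get("Event Source") != "db-instance":
            continue
        entries.append(
            {"Source": CUSTOM_SOURCE, "DetailType": DETAIL_TYPE, "Detail": json.dumps(message)}
        )
    return entries


def ingest_handler(event, context, events_client=None) -> Dict[str, int]:
    """Lambda entry point: re-publish RDS events as custom CloudWatch events."""
    entries = sns_to_put_events_entries(event)
    if not entries:
        return {"submitted": 0}
    if events_client is None:
        import boto3  # lazy import keeps the pure transformation testable offline

        events_client = boto3.client("events")
    for start in range(0, len(entries), MAX_ENTRIES_PER_CALL):
        batch = entries[start:start + MAX_ENTRIES_PER_CALL]
        response = events_client.put_events(Entries=batch)
        if response.get("FailedEntryCount", 0):
            logger.error("PutEvents rejected %d entries: %s",
                         response["FailedEntryCount"], response.get("Entries"))
    return {"submitted": len(entries)}


def state_machine_definition(stop_function_arn: str, wait_seconds: int = 48 * 60) -> str:
    """Amazon States Language: wait (default 48 min = 2880 s), then invoke the stop function."""
    return json.dumps(
        {
            "StartAt": "WaitBeforeStop",
            "States": {
                "WaitBeforeStop": {"Type": "Wait", "Seconds": wait_seconds, "Next": "StopInstance"},
                "StopInstance": {"Type": "Task", "Resource": stop_function_arn, "End": True},
            },
        }
    )
```

Since the rule target passes the whole matched event to the state machine, and a Task state forwards its input to the Lambda function by default, the stop function receives the original RDS message under the event's "detail" key.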

Notes

There are a couple of things worth mentioning:

  • The Lambda functions are not exactly robust yet: they seem to work fine for the 'happy path' and do not explode on error, but proper logging and exception handling look different ...

  • It turns out there can be a surprisingly long delay (up to several minutes) between RDS events showing up in the console and being emitted as SNS messages. This is not a problem for the use case, of course, but worth keeping in mind when debugging the solution.

  • It turns out that only DB instances provisioned in a single Availability Zone (i.e. not 'Multi-AZ') can be stopped.
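Given the notes on missing logging and exception handling and on Multi-AZ instances, a somewhat more defensive stop function could look like the following sketch. It assumes the matched event carries the original RDS message as "detail" (so the instance identifier is under "Source ID"); `stop_handler` and `instance_id_from_event` are hypothetical names. The Multi-AZ skip simply mirrors the observation above and may not apply to every engine or to current RDS behavior.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def instance_id_from_event(event: Dict[str, Any]) -> str:
    """Extract the DB instance identifier from the matched CloudWatch event.

    Assumes the ingest function forwarded the RDS message unchanged as "detail".
    """
    return event["detail"]["Source ID"]


def stop_handler(event, context, rds_client=None) -> Dict[str, str]:
    """Stop the auto-started instance, skipping cases where a stop would fail."""
    instance_id = instance_id_from_event(event)
    if rds_client is None:
        import boto3  # lazy import keeps the decision logic testable offline

        rds_client = boto3.client("rds")
    try:
        response = rds_client.describe_db_instances(DBInstanceIdentifier=instance_id)
        instance = response["DBInstances"][0]
        if instance.get("MultiAZ"):
            # Mirrors the note that Multi-AZ instances could not be stopped.
            logger.warning("%s is Multi-AZ, not stopping it", instance_id)
            return {"instance": instance_id, "action": "skipped", "reason": "multi-az"}
        status = instance["DBInstanceStatus"]
        if status != "available":
            # e.g. still 'starting' if the wait was too short; raising lets Step Functions retry.
            raise RuntimeError(f"{instance_id} is '{status}', expected 'available'")
        rds_client.stop_db_instance(DBInstanceIdentifier=instance_id)
        logger.info("Stop requested for %s", instance_id)
        return {"instance": instance_id, "action": "stopped"}
    except Exception:
        # Log with traceback, then re-raise so the state machine execution is marked failed.
        logger.exception("Failed to stop %s", instance_id)
        raise
```

Re-raising instead of swallowing the error keeps failures visible in the Step Functions execution history, where a Retry policy on the Task state could handle transient states such as 'starting'.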

Step-by-step guide

Provision the rds-automatic-restart-mitigation.yaml CloudFormation template in the desired region(s):

  • (lightbulb) Conceptually this should be a StackSet with stack instances in all regions where you want to use RDS; refer to How to provision a CloudFormation StackSet for details.

  • (info) Grab a coffee after initiating the stack creation: wiring up the RDS events can apparently take up to ~8 minutes.

  1. TBD
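Short of a StackSet, provisioning the template in several regions could be scripted roughly as in this sketch. The stack name, `deploy_to_regions` and the `CAPABILITY_IAM` requirement are assumptions (the template presumably creates IAM roles for its Lambda functions; use `CAPABILITY_NAMED_IAM` if the roles are explicitly named). If the embedded stack references a local template file, package it to S3 first (e.g. with `aws cloudformation package`).

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

STACK_NAME = "rds-automatic-restart-mitigation"  # hypothetical stack name


def deploy_to_regions(
    regions: Iterable[str],
    template_body: str,
    client_factory: Optional[Callable[[str], Any]] = None,
    wait: bool = True,
) -> List[str]:
    """Create the stack in each region, then optionally wait for all of them."""
    if client_factory is None:
        import boto3  # lazy import so the orchestration can be tested with fakes

        def client_factory(region: str) -> Any:
            return boto3.client("cloudformation", region_name=region)

    clients: Dict[str, Any] = {}
    stack_ids: List[str] = []
    # Start all stack creations first so the regions provision in parallel.
    for region in regions:
        clients[region] = client_factory(region)
        response = clients[region].create_stack(
            StackName=STACK_NAME,
            TemplateBody=template_body,
            Capabilities=["CAPABILITY_IAM"],  # assumed: template creates IAM roles
        )
        stack_ids.append(response["StackId"])
    if wait:
        # Creation can take several minutes per region (wiring up the RDS events).
        for cfn in clients.values():
            cfn.get_waiter("stack_create_complete").wait(StackName=STACK_NAME)
    return stack_ids
```

For example, `deploy_to_regions(["eu-west-1", "us-east-1"], Path("rds-automatic-restart-mitigation.yaml").read_text())` (with `from pathlib import Path`) would create and await the stack in both regions.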
