
The dreaded CREATE_FAILED message is an all too common source of frustration when deploying new stacks with CloudFormation. The AWS Console does show you which component in your stack has failed, but if you rely heavily on metadata and userdata components, more often than not you’ll only get a wait condition timeout error, which gives no indication of what has actually gone wrong under the covers.

The good news is that there are some tips and tricks for troubleshooting CloudFormation stack failures. Some revolve around CLI switches, some around knowing a bit more about the CloudFormation internals, and others around knowing where specific scripts live on a typical EC2 instance. This post documents a few approaches to troubleshooting CloudFormation stack errors and aims to help the reader take a (somewhat) structured approach to troubleshooting wait condition timeouts.

CREATE_FAILED

So your stack has failed. There are two typical scenarios:

    Your stack has failed to create a specific object such as a security group, a Route53 record or an IAM user, all of which are commonly created in any given stack. Usually there’s a reasonable error message in the AWS console explaining why, which allows you to fix your json template so that the next create-stack operation succeeds. This type of failure is usually easy to rectify.

    A little trickier: your stack fails with a wait condition timeout error. All you know is that the wait condition has not received a success signal, or has received fewer than the required number of success signals, and its timeout value has been reached. Wait condition timeouts don’t show any errors related to the root cause of the failure. Most of the time they time out due to errors in the instance’s metadata or userdata sections, and this is what we’ll concentrate on for the purpose of this article.

Troubleshooting Steps

I’m assuming that you’re running the create-stack operation using the AWS CLI. The first step is to disable the stack rollback so that we can log into our failed instance and start troubleshooting. If the EC2 instance resource has failed because it was passed an incompatible subnet or availability zone, CloudFormation will not even attempt to initialize the instance. That error needs to be fixed in the json template or by amending your stack parameters. If it’s a wait condition timeout, then it’s likely the instance is up and running and has a working network connection.

1. Stop the stack from rolling back by appending the following switch.

aws cloudformation create-stack --stack-name myStack --template-body file:///myStack.json --parameters file:///myStackParams.json --disable-rollback
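Once the stack has failed and been left in place, the failure events can also be read from the CLI rather than the console. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials for the account; the function name failed_events is hypothetical and the JMESPath filter is just one way to narrow the output.

```shell
# List only the failed events for a stack, with the reason CloudFormation recorded.
# Assumes the AWS CLI is installed and configured; failed_events is a hypothetical name.
failed_events() {
  aws cloudformation describe-stack-events \
    --stack-name "$1" \
    --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[LogicalResourceId, ResourceStatus, ResourceStatusReason]" \
    --output text
}
```

Calling `failed_events myStack` should print one line per failed resource. For a wait condition timeout the reason column will only report the timeout itself, which is why the remaining steps happen on the instance.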

2. Find the IP address of your instance in the EC2 Console and SSH onto it. For easier access you can include the address as an output in your CloudFormation template and view it in the outputs tab of the CloudFormation console.

ssh -i myInstanceKey.pem ec2-user@53.x.x.x

3. The first place to check is the cfn-init log. Check here for any obvious failures.

/var/log/cfn-init.log
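On a long log it can help to pull out only the lines that look like failures. This is a rough sketch: the function name scan_cfn_log is hypothetical, and the keyword pattern is my own guess at useful terms, not an official list of what cfn-init writes.

```shell
# Print lines in a cfn-init log that look like failures, with line numbers.
# The keyword pattern is a guess at useful terms, not an official list.
scan_cfn_log() {
  log="${1:-/var/log/cfn-init.log}"
  grep -inE 'error|fail|traceback|exception' "$log"
}
```

Run it as `scan_cfn_log` to use the default path, or pass another log file as the first argument. Reading the log may require sudo depending on its permissions.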

If no obvious errors are found, let’s move on and check our userdata and metadata.

4. View the contents of the userdata script.

cat /var/lib/cloud/instance/scripts/part-001

Userdata is stored in a script. It includes any custom shell commands you wish to run along with your cfn-init and cfn-signal operations.

Find your cfn-init command (but don’t run it). It should look similar to the following:

/opt/aws/bin/cfn-init -s myStack -r myInstance --region ap-northeast-1

** Note that if using the new cn-north-1 region you need to append the -u option (the CloudFormation endpoint URL) to your cfn-init command, as cfn-init does not automatically find the Beijing region’s CloudFormation endpoint.
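To locate the cfn-init and cfn-signal calls without reading the whole userdata script, a grep is usually enough. A small sketch, assuming the script lives at the path shown in step 4; the function name find_cfn_calls is hypothetical.

```shell
# Show the cfn-init and cfn-signal calls in the userdata script, with line numbers.
# Defaults to the userdata path used in step 4; pass another file to override it.
find_cfn_calls() {
  script="${1:-/var/lib/cloud/instance/scripts/part-001}"
  grep -nE 'cfn-(init|signal)' "$script"
}
```

The stack name, resource name and region printed here are the arguments to carry over to the next step.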

5. Taking the arguments shown above, run the cfn-get-metadata command. It will query the CloudFormation endpoint and allow us to check that our metadata is formatted correctly.

/opt/aws/bin/cfn-get-metadata -s myStack -r myInstance --region ap-northeast-1
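Because a malformed value often surfaces as a garbled URL, one way to review the metadata is to list just the URLs it contains and eyeball them. This is an illustrative sketch only: list_urls is a hypothetical helper, and the regular expression is a simple approximation that assumes URLs appear as quoted http or https strings.

```shell
# Read text on stdin and print each distinct http(s) URL found in it.
# The pattern is a simple approximation, not a full URL parser.
list_urls() {
  grep -oE "https?://[^\"' ]+" | sort -u
}
```

For example: `/opt/aws/bin/cfn-get-metadata -s myStack -r myInstance --region ap-northeast-1 | list_urls`. An unexpanded variable or a doubled slash tends to stand out quickly in the shortened list.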

Often a parameter may be incorrect or a variable badly formatted. This can result in a garbled URL or package name and consequently a command timeout. Here is an example of our metadata. Be aware that if you use IAM Roles (which I strongly recommend over IAM Users) you need to include the Authentication component, which allows the instance access to any buckets or other AWS services you require. As shown below, our cfn-get-metadata output confirms our Authentication is set up correctly and we can see our manifests bucket.

{
