Jenkins, build scripts, and cleaning up after ourselves

Josh Zamor

A quick note on general cleanup in Jenkins. First, I discovered that a leftover docker login can break an unrelated docker pull. This recently showed up as jobs failing because docker pull failed, claiming that the login was invalid. On the surface this makes little sense: the images are in the public repo, so no login is actually required. It failed because the credentials we use to push images to Docker Hub were recently rotated, and the old, now-invalid login was still stored on our Jenkins nodes; the build jobs hadn’t cleaned it up after pushing their images. Once a login for a registry is stored, docker pull sends those credentials even for public images, so Docker Hub rejected the stale ones. Running docker logout, or docker login with the new credentials, fixes the problem. However, many deploy jobs call neither, since they only pull public images from public repos.
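For the push side, a minimal sketch of a step that doesn’t leave its login behind might look like the following. It assumes the credentials arrive in environment variables named DOCKER_USER and DOCKER_PASSWORD (hypothetical names; in Jenkins they could be bound with withCredentials) and that the image name is passed as an argument. The password goes in through --password-stdin so it never appears on the command line, and the EXIT trap logs out whether or not the push succeeds.

```shell
#!/bin/bash
# push_image: log in, push one image, and always log out again.
# DOCKER_USER / DOCKER_PASSWORD are hypothetical variable names.
push_image() {
  local image="$1"
  (
    set -euo pipefail
    # On any exit from this sub-shell, remove the stored registry credentials
    # so a later `docker pull` on the same node can't reuse a stale login.
    trap 'docker logout >/dev/null 2>&1 || true' EXIT
    printf '%s' "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USER" --password-stdin
    docker push "$image"
  )
}
```

Running the login, push, and logout in a sub-shell scopes the trap to this one step, so it doesn’t replace an EXIT trap the surrounding script may already have set.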

Let’s recap what gets left behind that can cause issues:

  • Docker logins (this one’s new to me)
  • Containers, usually stopped ones.
  • Dangling volumes, networks, and images.
  • Test result files, intermediates, etc.
  • Credential files (handled via Jenkins withCredentials)
  • Workspaces (last resort)
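Most items on this list can be swept from one place. The function below is a best-effort sketch, assuming the job runs a docker-compose project in the current directory and writes test results to build/test-results (a hypothetical default, overridable via TEST_RESULTS_DIR). The prune commands are standard Docker CLI commands, but they act on the whole node, not just this job: docker container prune removes every stopped container, including ones another job may still want, so the scoped docker-compose down -v remains the safer first choice.

```shell
#!/bin/bash
# sweep_node: best-effort cleanup of the leftovers listed above.
# TEST_RESULTS_DIR and its default path are hypothetical; adjust per project.
sweep_node() {
  local results_dir="${TEST_RESULTS_DIR:-build/test-results}"

  # Containers, networks and volumes belonging to this job's compose project.
  docker-compose down -v || true
  # Node-wide: stopped containers first, so their networks and volumes are freed.
  docker container prune -f || true
  docker network prune -f || true
  docker volume prune -f || true
  docker image prune -f || true   # dangling images only (no -a)
  # Stale registry login, so later public pulls don't send old credentials.
  docker logout >/dev/null 2>&1 || true
  # Test result files and other intermediates.
  rm -rf -- "$results_dir" || true
}
```

Every step ends in || true so one failure doesn’t skip the rest, which also makes the function safe to call from an exit trap in a script running under set -e.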

The preferred way to solve all of these issues is to put most build steps in a shell script that can run on any computer, including on a CI node from a Jenkinsfile step, like this one in Reference Data. The key to making sure all the Docker components we created get removed is to trap the script’s exit and clean up there, e.g.

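#!/bin/bash
# Tear down the compose project's containers, networks and volumes (-v)
# whenever the script exits, whether the build succeeded or failed.
cleanup() {
  docker-compose down -v
}
trap cleanup EXIT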
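To see that the trap really fires when a step fails, here is a Docker-free sketch: a hypothetical run_build helper creates a scratch directory (standing in for the containers and volumes a real script would create), runs one build step under set -e, and removes the directory from an EXIT trap. The handler re-exits with the original status, so Jenkins still sees the failure.

```shell
#!/bin/bash
# run_build: run one build step in a sub-shell with its own EXIT trap.
# run_build and its scratch directory are hypothetical, for illustration only.
run_build() {
  local work_dir="$1"
  shift
  (
    set -euo pipefail
    cleanup() {
      local rc=$?                 # status that caused the exit
      rm -rf -- "$work_dir"
      echo "cleanup ran, exit status $rc" >&2
      exit "$rc"                  # keep the original status for Jenkins
    }
    trap cleanup EXIT
    mkdir -p "$work_dir"
    "$@"                          # the build step; a failure ends the sub-shell
    echo "step succeeded" >&2
  )
}
# Caveat: Bash ignores `set -e` inside a function or sub-shell that runs as
# part of an if/while condition or an && / || list, so call run_build as a
# plain command when relying on it.
```

For example, run_build "$WORKSPACE/tmp" make test (make test being a placeholder step) would remove $WORKSPACE/tmp whether or not the tests pass.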

The second most preferred cleanup method is to use the cleanup condition inside a post section of a Jenkinsfile. post and its cleanup condition can be placed on a single stage and/or on the entire pipeline; however, I’d encourage us to use cleanup inside a stage, and only use the pipeline’s cleanup when we must. Using it in a stage is good practice because it encourages us to design stages that can run independently of one another. cleanup is preferred over always for cleaning up, because Jenkins runs cleanup after every other post condition has been evaluated, regardless of the build’s status. Take, for example, a build that fails when we have:

post {
  always {
    junit 'testOutputFile'
    sh 'docker-compose down -v'
  }
}

If the build fails before writing the test output file, junit fails too (by default it errors when it finds no test results, unless allowEmptyResults is set), and we never get to docker-compose down -v. Instead, with cleanup:

post {
  always {
    junit 'testOutputFile'
  }
  cleanup {
    sh 'docker-compose down -v'
  }
}

Now, regardless of whether the build fails, or even whether junit fails, we always run docker-compose down -v. Of course, cleaning up after docker-compose is better handled in an exit trap, so a more appropriate use of cleanup is removing test output directories.
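For the test-output case, a stage’s cleanup block could call a small shell function like the one below through an sh step. This is a sketch: the directory name test-results is an assumption (it matches Gradle’s default build/test-results), and the function deletes every directory with that name under the given root.

```shell
#!/bin/bash
# clean_test_outputs: delete every directory with the given name under a root.
# The default name "test-results" is an assumption; adjust per project.
clean_test_outputs() {
  local root="${1:-.}"
  local name="${2:-test-results}"
  # -prune stops find from descending into a directory it is about to delete.
  find "$root" -type d -name "$name" -prune -exec rm -rf -- {} +
}
```

Searching by name rather than listing fixed paths keeps the same call working across multi-module projects where each module writes its own results directory.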

Getting back to the original issue of the leftover docker login: I’ve run docker logout on all CI nodes, which takes care of the immediate problem, and I’ve added a couple of the above examples to a few of the projects (as part of OLMIS-6257). However, there’s more cleanup still needed.
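A one-off sweep like running docker logout across the CI nodes can be scripted as well. This sketch assumes SSH access from an admin machine to each node; the node names are whatever your inventory uses (any in an example call are hypothetical). It reports, rather than stops on, a node where the logout fails.

```shell
#!/bin/bash
# logout_nodes: run `docker logout` on each CI node given as an argument.
# Assumes SSH access to every node; node names are supplied by the caller.
logout_nodes() {
  local node failed=0
  for node in "$@"; do
    echo "== $node"
    # -n keeps ssh from reading the loop's stdin.
    if ! ssh -n "$node" docker logout; then
      echo "WARN: docker logout failed on $node" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, logout_nodes ci-node-1 ci-node-2 (hypothetical hostnames) returns non-zero if any node failed, which makes it easy to spot the nodes that need a manual look.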

If you see a script that could use an exit trap, or a pipeline that would benefit from a cleanup section, I encourage you to fix it up using the methods shown above.

What other cleanup tips do you have?