The build is broken, and some more on testing

Hi all,

If you look at our Jenkins server, build.openlmis.org, you’ll see that the requisition service is broken and hasn’t built since I last pushed changes on Friday. The changes were related to an upgrade in the dev environment, which led me down a path of cleaning up leftover Docker containers and images. A new error, a pair of failing tests, cropped up that I couldn’t reproduce locally until I pulled the latest openlmis/auth service from Docker Hub.

The failing tests, together with needing to pull the latest auth service to reproduce them, have me thinking that we’ve written a few integration tests in the requisition service that actually depend on a running auth service for the build to succeed. As was posted earlier, it seems like we could benefit from refocusing on our testing strategy. Earlier in the week the idea was floated to write down our testing strategy, which I think we’ll address when the Poland and Seattle teams meet next week.

In the meantime, a quick summary:

  • Building requisition was succeeding on the build server because a leftover auth service container still had its port mapped to the host, so the tests were talking to it
  • Once the leftover images and containers were deleted and a new auth service was pulled from Docker Hub, requisition no longer built
  • We didn’t know requisition and the current auth service were incompatible because our CI server was giving us a false positive (requisition and auth don’t actually work together)
  • Now that requisition isn’t building on the CI server, we can’t use it to easily push new versions of the requisition service to Docker Hub

And some points we should address:

  • Jenkins needs to clean up leftover images, containers and volumes (OLMIS-858 addresses disk space and some cleanup, but we should find a more rigorous solution)
  • Deployed services shouldn’t have ports or volumes mapped to the host (OLMIS-841 is in progress); Docker networking should be used to isolate back-end services
  • Building a service shouldn’t rely on successful interactions with other deployed services (with a couple of exceptions). Starting a service, however, should, and we need to mature our CI process to cover this
  • Ideally, we’d pull new versions of every service regularly for our development environments; a quick script to do this would be useful to everyone

I’d hoped to track the bug down and fix it, but I wasn’t able to, so it’s still open for someone to solve.
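On the "quick script" idea, here’s a rough sketch of what it might look like, written in Python. It’s only a starting point under some assumptions: the Docker CLI is on the PATH and is version 1.13 or newer (needed for the `prune` subcommands), the image list is hypothetical apart from openlmis/auth, and the `--execute` flag is made up for this sketch. By default it does a dry run that only prints the commands. It pulls first and prunes afterwards, so the old image versions left untagged by the pull get cleaned up too.

```python
import subprocess
import sys

# Hypothetical image list: only openlmis/auth is named in this thread;
# the other names are assumptions about how our images are published.
SERVICE_IMAGES = [
    "openlmis/auth",
    "openlmis/requisition",
]

# Cleanup steps; the `prune` subcommands assume Docker 1.13 or newer.
# Containers are pruned before images so images they used can be freed.
CLEANUP_COMMANDS = [
    ["docker", "container", "prune", "--force"],
    ["docker", "image", "prune", "--force"],
    ["docker", "volume", "prune", "--force"],
]


def build_commands(images, cleanup=True):
    """Return the docker commands to run: pull every image, then clean up."""
    # Pull the latest tag so local dev matches what was last published.
    commands = [["docker", "pull", f"{image}:latest"] for image in images]
    if cleanup:
        commands.extend(list(cmd) for cmd in CLEANUP_COMMANDS)
    return commands


def run(commands, dry_run=True):
    """Print each command and only execute it when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first docker command that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical flag: pass --execute to really run the commands.
    run(build_commands(SERVICE_IMAGES), dry_run="--execute" not in sys.argv)
```

One caveat: `prune` only removes stopped containers, so a leftover container that’s still running (like the auth service above) would need a `docker stop` first.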

Best,

Josh