Docker certificates on functional test server

Hi all,

as you probably noticed, OpenLMIS-3.x-deploy-to-functional-test job is failing since January 9th because of the following error:

x509: certificate has expired or is not yet valid

The suggested solution was to recreate the whole stack using terraform, and it is done - a new instance is temporarily called functional-test2. I created a zip with generated .pem files from AWS as instruction in openlmis-deployment says and uploaded it to Jenkins. Also updated configuration file, but there are still issues with certificates in the deploy job:

error during connect: Get https://functional-test.openlmis.org:2376/v1.38/containers/json?all=1: x509: certificate signed by unknown authority (possibly because of “crypto/rsa: verification error” while trying to verify candidate authority certificate “functional-test.openlmis.org”)

Any idea? Should the certificate be approved somehow?

Best,
Klaudia

Thanks for following up in the forum @Klaudia_Palkowska, and appologies I’ve been slow in Slack.

A thought occurred to me, this setup follows the docker article and I’m wondering if you’ve tried using the certs that were generated from your local machine with curl?

@joshzamor I’ve just tried and for https:/functional-test.openlmis.org:2376 I’m getting

{“message”:“page not found”}

I’ve also tried public DNS with the following result:

curl: (7) Failed to connect to ec2-52-55-86-138.compute-1.amazonaws.com port 443: Connection refused

Hmm, that’s odd. When I run curl against either of those targets without the proper client certs I get the expected SSL errors:

curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

The return values from curl you received look off:

Huh, page?

Port 443? Docker should be at port 2376, not 443.

Moreover if I follow the instructions for getting the generated certs (point 6, skip zipping) and run

curl https://functional-test.openlmis.org:2376/images/json \
  --cert ./cert.pem --key ./key.pem --cacert ./ca.pem 

I get back that there are no images on that instance (what I’d expect): []

This is suggesting to me that terraform worked well, but that Jenkin’s deployment job doesn’t have the proper docker client certs.

Suggested next steps

My guess is that the new docker client cert haven’t been copied correctly into Jenkins. Follow point 6 (only point 6, and do zip this time), getting the certs from the named bucket in s3, and update the Jenkins job to use those certs.

If the above works, please be mindful that the old functional-test needs to be properly destroyed. I’m seeing 2 functional-test statefiles, 2 elb, instance that might have been stopped manually, etc. The value of terraform is that it can handle state transitions for you, however it’s possible to get it confused if you’re not careful, and then you might end up needing to unwind everything by hand. Once you get it working, lets be sure we cleanup old AWS resources, especially those that show up on the bill.

OK, so I run

curl https://functional-test.openlmis.org:2376/images/json \
  --cert ./cert.pem --key ./key.pem --cacert ./ca.pem 

and got [ ] like you. I think I also ran this previously, but I wasn’t sure what should I got back and the response was weird for me, that’s why I’ve tried other options.

As for the next steps, I followed the mentioned instruction again and the Jenkins job still fails with the same error. I added those certs as a secret file with global scope. Maybe that’s the issue? But I’ve checked the configuration for other credentials and it looks similar…

Best,
Klaudia

It looks like the certs then are working fine, however something in the workspace is messed up. I logged into the build server with SSH and found that the functional test env workspace had a mess of different credential files in there: functional test, functional test 2, a bunch of unzipped directories, etc. Removing the entire directory, something that this script should probably do first, solved the issue. I also removed all the other functional-test credentials from Jenkin’s credentials that looked like past attempts, however there may be a job still somewhere that needs to be updated - I only saw that the deploy-to-job ran.