Nginx occasionally fails when reloading the config file

An Nginx error happens occasionally. Below are the symptoms, the root cause, and a solution from the ThoughtWorks Siglus team.

  1. When a new API is added and deployed to the test environment, the new API cannot be accessed from a web browser.
  2. When developing on localhost, we register or deregister the local service with Consul using consul/registration.js in order to debug. Sometimes the consul-template process in the Nginx container exits for unknown reasons. When this happens, restarting the Nginx container also fails.

The root cause:

  1. consul-template in the Nginx container receives new data from Consul and uses it to generate the Nginx config file. It then calls “nginx -s reload”.
    If Nginx cannot reload the config file, the reload command exits with an error, and consul-template exits too because its subprocess failed. After that, later updates from Consul are never received.
    consul-template -log-level info -consul consul:8500 -template "/etc/consul-template/openlmis.conf:/etc/nginx/conf.d/default.conf:nginx -s reload"

  2. consul/registration.js is used to register a service with Consul, both in a container environment and in a local development environment.
    Registering with consul/registration.js works fine: consul-template generates a correct Nginx config file.

The key part of the generated config:

  upstream requisition {
    least_conn;
    keepalive 128;
    server your_local_ip:8080;
    
  }

  location ~ /requisition/docs/?$  {
    proxy_pass http://requisition;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }

But after deregistering with consul/registration.js, the generated Nginx config file may be wrong: the upstream section is missing while the location section is still there.

  location ~ /requisition/docs/?$  {
    proxy_pass http://requisition;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }

When Nginx reloads the file, it must resolve the host name used in proxy_pass http://requisition. With no matching upstream section the name cannot be resolved, and the “nginx -s reload” process exits with an error.
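To make the failure condition concrete, here is a minimal sketch of a check for a generated config that has a proxy_pass target but no matching upstream section. The helper name findMissingUpstreams is hypothetical and is not part of consul/registration.js or the Nginx image. It also assumes every proxy_pass host is meant to be an upstream name, which would not hold for real DNS host names.

```javascript
// Hypothetical helper (not from the OpenLMIS code base): returns the
// proxy_pass host names that have no matching "upstream <name> {" block.
// Assumption: every proxy_pass host is meant to be an upstream name.
function findMissingUpstreams(configText) {
  const defined = new Set();
  for (const m of configText.matchAll(/^\s*upstream\s+([\w.-]+)\s*\{/gm)) {
    defined.add(m[1]);
  }
  const missing = new Set();
  for (const m of configText.matchAll(/proxy_pass\s+https?:\/\/([\w.-]+)/g)) {
    if (!defined.has(m[1])) {
      missing.add(m[1]);
    }
  }
  return [...missing];
}

// The location section as shown in this post.
const locationSection = [
  "location ~ /requisition/docs/?$  {",
  "  proxy_pass http://requisition;",
  "  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;",
  "}",
].join("\n");

// Config after a successful registration: upstream plus location.
const registeredConfig = [
  "upstream requisition {",
  "  least_conn;",
  "  keepalive 128;",
  "  server your_local_ip:8080;",
  "}",
  locationSection,
].join("\n");

// Config after the problematic deregistration: location only.
const deregisteredConfig = locationSection;

console.log(findMissingUpstreams(registeredConfig));
console.log(findMissingUpstreams(deregisteredConfig));
```

A non-empty result for the second config corresponds to the case where “nginx -s reload” would fail.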

  3. consul/registration.js is also used to deregister a service.
    These are the key steps in the function when the mode is ‘deregister’:

  function registrationBase(args, mode) {
    registerService(args.service, mode);

    if (args.raml) {
      registerRaml(args.service, args.raml, mode);
    }

    if (args.path) {
      registerPath(args.service, args.path, mode);
    }
  }

When registerService(args.service, ‘deregister’) is called, it calls the Consul delete-service API (/v1/agent/service/deregister/service.id) internally. This only deletes the service itself (the upstream section in the Nginx config) from Consul; the API info (the location section) is not deleted. The API info is deleted only later, when registerRaml and registerPath are called. If consul-template renders the config in between, the result has a location section but no upstream section.

  4. The Nginx container fails to restart.
    If the Nginx config file generated by consul-template is wrong, then on restart Nginx loads the broken config file and the container cannot start.

Solution:
1. Keep the consul-template process running even when the “nginx -s reload” subprocess exits with an error.
2. Change the order of steps when deregistering a service with consul/registration.js: call registerRaml and registerPath first, then call registerService.
3. When restarting Nginx, delete the old Nginx config file first and then start the Nginx process.
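The reordering in solution 2 could look roughly like the sketch below. It is not the actual patch: the steps parameter is a hypothetical addition so the example can run on its own, and the three step functions are assumed to be synchronous, which may not match the real consul/registration.js.

```javascript
// Sketch only: the `steps` argument is hypothetical and stands in for the
// real registerService / registerRaml / registerPath functions, which are
// assumed here to be synchronous.
function registrationBase(args, mode, steps) {
  const serviceStep = () => steps.registerService(args.service, mode);
  const routeSteps = () => {
    if (args.raml) {
      steps.registerRaml(args.service, args.raml, mode);
    }
    if (args.path) {
      steps.registerPath(args.service, args.path, mode);
    }
  };

  if (mode === 'deregister') {
    // Remove the API info (location sections) first, then the service
    // (upstream section), so no location is left without an upstream.
    routeSteps();
    serviceStep();
  } else {
    // Registration keeps the original order: service first, then API info.
    serviceStep();
    routeSteps();
  }
}

// Record the call order with stand-in step functions.
const calls = [];
const recorder = {
  registerService: (service, mode) => calls.push(`service:${mode}`),
  registerRaml: (service, raml, mode) => calls.push(`raml:${mode}`),
  registerPath: (service, path, mode) => calls.push(`path:${mode}`),
};

registrationBase(
  { service: { id: 'requisition' }, raml: 'api.raml', path: ['/requisition'] },
  'deregister',
  recorder
);
console.log(calls);
```

With this order, a config rendered between two deregistration steps may still have an upstream without locations, which Nginx accepts, rather than a location without its upstream.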

Thank you for your contribution, @ylcai. I know I’ve run into a couple of these issues myself. I found this PR for Nginx; are there others for Solution #2?

@Klaudia_Palkowska: Any thoughts from the team about merging these?

@joshzamor We wanted to test this before merging, but I wasn’t able to reproduce the mentioned Nginx error without the fix, so I cannot say whether the PR helps with the issue or not.

@ylcai Could you provide the error that was thrown? Also, how did you verify this solution? I’ve tried the following steps:

  1. Start ref-distro locally.
  2. Stop one of the Docker containers.
  3. Start the stopped container.
  4. Check the Nginx logs.

Any tips?