Caddy is better than Nginx for Docker Compose on ECS

I recently managed to use Docker Compose to launch a small app in Amazon’s Elastic Container Service (ECS).
Overall, the result is pretty incredible. I’m able to run all of my containers in AWS, with volumes and networks and all, from nothing more than a docker-compose.yaml file.
However, my biggest issue was getting nginx to work, and I ended up ditching it for Caddy.

Why you need nginx

As can be seen on the ECS integration Compose features page, the way to accept incoming requests into your Compose project is to declare a port (e.g. 80) on a service in the Compose file; AWS then creates a single load balancer that unconditionally forwards all incoming requests on that port to that service.
This means that only one service can listen on HTTP/HTTPS, and that service has to do all of the “gateway” work (TLS verification and / or termination, routing to upstreams, filtering paths, etc.). nginx is great for this job.
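For context, the Compose file looks roughly like this (a trimmed-down sketch, not my actual file; the image names are placeholders, and only the service names backend and debug correspond to the configs below):

services:
  gateway:
    image: my-nginx-image    # placeholder; the nginx (and later Caddy) container
    ports:
      - "443:443"            # the single published port; ECS attaches the load balancer here
  backend:
    image: my-backend-image  # placeholder
  debug:
    image: my-debug-image    # placeholder
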
The interesting part in my nginx config looks like this:

server {
  listen 443 ssl;
  server_name project.site;
  ssl_certificate     /ssl/fullchain.pem;
  ssl_certificate_key /ssl/privkey.pem;
  ssl_client_certificate /ssl/...;
  ssl_verify_client on;

  # Always shortcircuit requests from ELB
  if ($http_user_agent = "ELB-HealthChecker/2.0") {
    return 200;
  }

  location / {
    proxy_pass http://backend/;
  }

  location /debug {
    proxy_pass http://debug;
  }
}

Which means:

  1. Listen on 443, respond to project.site
  2. Specify where my SSL certificate and key are stored, and which certificates to use when validating clients
  3. Demand SSL certificates from incoming connections and verify them
  4. If the “User-Agent” string looks like the ELB healthchecker, return “OK”.
  5. Pass all requests to the “backend” service
  6. If the request’s path starts with “/debug”, pass it to the “debug” service

Why nginx doesn’t cut it

Each service (e.g. “backend”) is backed by multiple containers, each with its own IP address.
Container platforms (k8s, Docker, ECS) provide “service discovery”, usually via DNS (in ECS it’s called CloudMap).
Simply put, this means that a DNS query for “backend” returns the IP addresses of the containers currently running the “backend” service.
This allows nginx, as the gateway, to find a server to forward the HTTP request to (and hopefully get a response back).
The problem is that nginx is so speed-oriented that it doesn’t re-resolve the name “backend” into a fresh IP address every now and then. Instead, it resolves the name once, when the configuration is loaded, and keeps the mapping (e.g. “backend -> 127.0.0.4”) forever.
This means that whenever I create a new container for backend and remove the old one (containers being immutable), nginx remembers the stale IP address and fails to forward requests until it is restarted.
This is obviously not ideal, as I’d like my gateways to adapt to changes in my backend without having to restart them.
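To make that concrete, from inside any container in the project a plain DNS lookup shows the discovery at work (project.local is the CloudMap namespace ECS created for this project; the addresses are made up for illustration):

$ dig +short backend.project.local
172.31.32.11
172.31.33.57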

This article offers two alternatives to the “never refresh IPs” approach:

  1. Use variables (set $upstream backend; proxy_pass http://$upstream/) and a custom resolver (see the sketch right after this list)
  2. Buy NGINX Plus, create an upstream block, and add the resolve parameter to the server entry in the upstream.
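Option 1 boils down to something like this (a sketch, not my final config; 10.0.0.2 stands in for the VPC’s DNS server, whose address is exactly the part you don’t know up front, as described below):

resolver 10.0.0.2 valid=10s;   # the VPC DNS server; see caveat 1 below

location / {
  # Putting the name in a variable makes nginx resolve it per-request
  # (honoring the resolver's "valid=" TTL) instead of once at startup.
  set $upstream backend;
  # Note: no URI part after $upstream; with a variable, nginx would otherwise
  # replace the whole request URI (this is caveat 3 below).
  proxy_pass http://$upstream;
}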

Buying NGINX Plus is out of the question, as it requires talking to a human (I can’t just pay for a license on the site).
Using variables works, with the following caveats:

  1. Unlike in Docker, the address of the DNS server is not known at image build time.
    Instead, I created a script that runs on container initialization, uses perl to extract the DNS server from resolv.conf, and writes an nginx config snippet that points the resolver at it (there’s a rough sketch of this after the list).
  2. nginx using its own DNS resolver means we miss out on the search option in resolv.conf, which is a shame because in ECS the names are actually backend.project.local, so just using backend in the nginx config won’t work.
    I created an additional script that extracts the search option from resolv.conf and rewrites all of the upstream names in all of the nginx config files.
    This is complete tomfoolery, but I wanted things to work already.
  3. Usually, nginx is smart about rewriting the URLs that are forwarded upstream.
    With the config above, a request for /debug/memdump should be forwarded to the debug service with the URL /memdump.
    This doesn’t work when the proxy_pass directive is composed from variables, which messes up the URL structure my backends expect.
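The init-time glue from caveats 1 and 2 amounts to something like this (a rough sketch only; my real version used perl, and the service list and file paths here are assumptions):

#!/bin/sh
# Runs when the container starts, before nginx, to fill in what we can't
# know at image build time.
NAMESERVER=$(awk '/^nameserver/ {print $2; exit}' /etc/resolv.conf)
SEARCH=$(awk '/^search/ {print $2; exit}' /etc/resolv.conf)

# Caveat 1: point nginx at the VPC's DNS server.
echo "resolver $NAMESERVER valid=10s;" > /etc/nginx/conf.d/00-resolver.conf

# Caveat 2: qualify bare service names with the search domain, so that
# e.g. "set $upstream backend;" becomes "set $upstream backend.project.local;".
for svc in backend debug; do
  sed -i "s/ $svc;/ $svc.$SEARCH;/g" /etc/nginx/conf.d/*.conf
done

exec nginx -g 'daemon off;'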

The DNS refresh seemed like such a small thing, but it left nginx completely unsuitable to be my “gateway”.
I seriously considered switching to httpd, even though it’s not as shiny, just so I could get something working.
While searching for options, I randomly stumbled upon Caddy.

Caddy is nice

Simply put, Caddy just works.
I don’t use its shinier features, like auto-acquiring certificates from LetsEncrypt.
My config file is as basic as can be:

project.site {
  tls /ssl/fullchain.pem /ssl/privkey.pem {
    client_auth {
      mode require_and_verify
      trusted_leaf_cert_file /ssl/...
      trusted_ca_cert_file /ssl/...
    }
  }

  @awsHealthCheck {
    header User-Agent "ELB-HealthChecker/2.0"
  }
  respond @awsHealthCheck 200

  handle_path /* {
    reverse_proxy backend
  }

  handle_path /debug/* {
    reverse_proxy debug
  }
}

You can see the directives are pretty similar (I had to compromise on /debug and replace it with /debug/), but it works. No trickery to get it to refresh DNS records, no variables, no upselling to a paid tier that forces you to talk to a human.

I’m very happy with Caddy, and I plan to keep using it in the future.