Caddy is better than Nginx for Docker Compose on ECS
I recently managed to use Docker Compose to launch a small app in Amazon's Elastic Container Service (ECS).
Overall, the result is pretty incredible. I'm able to run all of my containers in AWS, with volumes and networks and all, with nothing more than a docker-compose.yaml file.
However, my biggest issue was with getting nginx to work, and I ended up ditching it for Caddy.
Why you need nginx
As can be seen in the ECS integration Compose features page, the way to accept incoming requests to your Compose project is by defining a port
in the Compose file (e.g. 80), and AWS will create a single load balancer that unconditionally forwards all incoming requests on that port to that service.
This means that you can only have one service listening on HTTP/HTTPS, and this service has to do all of the "gateway" work (TLS verification and/or termination, routing to upstreams, filtering paths, etc.). nginx is great for this job.
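To make that concrete, the shape of the Compose file is roughly the following (a sketch with made-up service and image names, not my actual project):

services:
  gateway:
    image: nginx               # later swapped for caddy
    ports:
      - "443:443"              # the only published port; ECS points the load balancer here
  backend:
    image: example/backend     # internal only, reachable just through the gateway
  debug:
    image: example/debug       # internal only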
The interesting part in my nginx config looks like this:
server {
    listen 443 ssl;
    server_name project.site;

    ssl_certificate /ssl/fullchain.pem;
    ssl_certificate_key /ssl/privkey.pem;
    ssl_client_certificate /ssl/...;
    ssl_verify_client on;

    # Always short-circuit requests from ELB
    if ($http_user_agent = "ELB-HealthChecker/2.0") {
        return 200;
    }

    location / {
        proxy_pass http://backend/;
    }

    location /debug {
        proxy_pass http://debug;
    }
}
Which means:
- Listen on 443, respond to project.site
- Where my SSL certificate is stored, and how to validate client certificates
- Demand SSL certificates from incoming connections and verify them
- If the "User-Agent" string looks like the ELB health checker, return "OK"
- Pass all requests to the "backend" service
- If the request's path starts with "/debug", pass it to the "debug" service
Why nginx doesn’t cut it
Each service (e.g. "backend") has multiple containers providing it, each with its own IP.
Container platforms (k8s, Docker, ECS) provide "service discovery", usually via DNS (in ECS it's called Cloud Map).
Simply put, this means that a DNS query for "backend" returns the IP addresses of the containers currently running the "backend" service.
This allows nginx, as the gateway, to find a server to forward the HTTP request to (and hopefully get a response back).
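For example, from inside one of the containers (hypothetical; the exact name depends on the Cloud Map namespace, which in my case is project.local, as mentioned below):

    # Returns one A record per container currently backing the "backend" service
    dig +short backend.project.local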
The problem is that nginx is so speed-oriented that it doesn't re-resolve the name "backend" every now and then. Instead, it resolves it once, when the configuration is loaded, and keeps the mapping (e.g. "backend -> 127.0.0.4") forever.
This means that whenever I create a new container for backend
and remove the old one (as containers are immutable), nginx remembers the wrong IP address and fails to forward requests until it is restarted.
This is obviously not ideal, as I'd like my gateway to adapt to changes in my backends without having to restart it.
This article offers two alternatives to the "never refresh IPs" approach:
- Use variables (set $upstream backend; proxy_pass http://$upstream/;) and a custom resolver
- Buy nginx pro, create an upstream, and add a resolve extension to the server entry in the upstream
Buying pro is out of the question, as it requires talking to a human (I can’t just pay for a license on the site).
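For reference, the variables approach ends up looking roughly like this (a sketch, not my exact config; the resolver address is a placeholder that gets extracted from resolv.conf at startup, as described below):

    # Use the VPC DNS server and re-check answers instead of caching them forever.
    resolver 10.0.0.2 valid=10s;

    location / {
        # Going through a variable forces nginx to re-resolve the name at request
        # time instead of pinning the IP it saw when the config was loaded.
        set $upstream backend.project.local;
        proxy_pass http://$upstream;
    }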
Using variables works, with the following caveats:
- Unlike in Docker, the address of the DNS server is not known at image build time.
Instead, I created a script that runs when the container initializes, uses perl to extract the DNS server from resolv.conf, and generates an nginx config setting the resolver to that address (a rough sketch of the idea follows this list).
- nginx using its own DNS resolver means we miss out on the search option in resolv.conf, which is a shame because in ECS the names are actually backend.project.local, so just using backend in the nginx config won't work.
I created an additional script that extracts the search option from resolv.conf and rewrites all upstream names in all of the nginx files.
This is complete tomfoolery, but I wanted things to work already.
- Usually, nginx is smart about rewriting the URLs that are forwarded to the upstream.
In the above config file, a request for /debug/memdump should be forwarded to the debug service, with the URL being /memdump.
This doesn't work when the proxy_pass directive is composed from variables, which messes up the URL structure my backends see.
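The init-time scripts are the ugliest part. My real ones use perl, but a rough shell sketch of the idea (the file paths and the __SEARCH__ placeholder are illustrative, not my actual setup) looks like this:

    #!/bin/sh
    # Grab the DNS server and search domain that ECS wrote into resolv.conf.
    NAMESERVER="$(awk '/^nameserver/ { print $2; exit }' /etc/resolv.conf)"
    SEARCH_DOMAIN="$(awk '/^search/ { print $2; exit }' /etc/resolv.conf)"

    # Generated include, picked up by the main nginx config.
    echo "resolver ${NAMESERVER} valid=10s;" > /etc/nginx/conf.d/resolver.conf

    # Qualify bare service names ("backend" -> "backend.project.local"), since
    # nginx's own resolver ignores the search option in resolv.conf.
    sed -i "s/__SEARCH__/${SEARCH_DOMAIN}/g" /etc/nginx/conf.d/*.conf

    exec nginx -g 'daemon off;'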
The DNS refresh seemed like such a small thing, but it left nginx completely unsuitable to be my “gateway”.
I seriously considered switching to httpd, even though it's not as shiny, just so I could get something working.
While searching for options, I randomly stumbled upon Caddy.
Caddy is nice
Simply put, Caddy just works.
I don't even use the shinier features, like auto-acquiring certificates from Let's Encrypt.
My config file is as basic as can be:
project.site {
    tls /ssl/fullchain.pem /ssl/privkey.pem {
        client_auth {
            mode require_and_verify
            trusted_leaf_cert_file /ssl/...
            trusted_ca_cert_file /ssl/...
        }
    }

    @awsHealthCheck {
        header User-Agent 'ELB-HealthChecker/2.0'
    }
    respond @awsHealthCheck 200

    handle_path /* {
        reverse_proxy backend
    }

    handle_path /debug/* {
        reverse_proxy debug
    }
}
You can see the directives are pretty similar (I had to compromise on /debug and replace it with /debug/), but it works. handle_path strips the matched prefix, so /debug/memdump still reaches the debug service as /memdump. No trickery to get it to refresh the DNS records, no variables, no upselling to a Pro version that forces you to talk to a human.
I'm very happy with Caddy, and I plan to keep using it in the future.