This blog post is about me spending three evenings and a night of bad sleep to figure out a misconfiguration I had made on my GitLab instance.

TL;DR

If you set nginx['listen_addresses'] for one of the nginx vhosts in GitLab, make sure to set it for the others too, e.g. registry_nginx['listen_addresses'], which sets the IP addresses the registry reverse proxy listens on. 😅

The idea

A few days ago I decided to build a CI pipeline in my self-hosted GitLab instance. I had postponed that task for far too long and was still building some containers on the CLI on one of my Docker hosts, where they were supposed to run anyway.

“No bueno”, but I guess it’s like with so many other things. What you do at work, you sometimes don’t do at home. 🤷

I decided that my container registry should live under its own subdomain, let's call it registry.example.com, while the GitLab instance lives at git.example.com.

Usually, this would be a five-minute task. It took me three evenings and a night of bad sleep until I got it to work.

Configuring the GitLab container registry

Luckily, the GitLab documentation is quite extensive and usually easy to understand. Configuring the registry usually only requires you to have a valid TLS certificate at hand (ZeroSSL, Let's Encrypt, or self-signed, it does not matter).

Then, in gitlab.rb, you set registry_external_url 'https://registry.example.com' and point registry_nginx['ssl_certificate'] and registry_nginx['ssl_certificate_key'] to your TLS certificate and key.
Now google “how to exit vim” for a few minutes, and then run gitlab-ctl reconfigure to apply the configuration changes.
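
For reference, this is roughly what that looks like in gitlab.rb (a minimal sketch; the certificate paths are placeholders for wherever your cert and key actually live):

registry_external_url 'https://registry.example.com'

# Placeholder paths; point these at your actual certificate and key.
registry_nginx['ssl_certificate']     = "/etc/gitlab/ssl/registry.example.com.crt"
registry_nginx['ssl_certificate_key'] = "/etc/gitlab/ssl/registry.example.com.key"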

This would usually do the trick.

My container registry was unavailable

In my case, the container registry was unavailable. When I navigated to https://registry.example.com in my web browser, or called it with curl, I received a redirect to the GitLab login page, but on the registry subdomain.

😀 oliver@x260  ~  curl -IL -X GET https://registry.example.com 
HTTP/2 302 
server: nginx
date: Wed, 09 Oct 2024 05:51:24 GMT
content-type: text/html; charset=utf-8
location: https://registry.example.com/users/sign_in  <-- why?!
[...]
removed for readability
[...]

HTTP/2 200 
server: nginx
date: Wed, 09 Oct 2024 05:51:24 GMT
content-type: text/html; charset=utf-8
content-length: 9391
[...]
removed for readability
[...]

Odd. A 302 redirect to the location https://registry.example.com/users/sign_in. This was clearly wrong. The registry does not provide a login screen.

Debugging embedded nginx

I’m not necessarily friends with nginx, but I used to work with it for a few years, so at least I know my way around.
The GitLab Omnibus installation runs its services embedded, not as systemd services. The ps utility showed that the nginx binary is started from /opt/gitlab/embedded/sbin/nginx and gets its configuration prefix passed as -p /var/opt/gitlab/nginx, which is where its configuration files reside.

First, I checked the conf directory, which contains nginx.conf, gitlab-http.conf and gitlab-registry.conf, as well as some additional files for the nginx status and GitLab health endpoints. So at least the registry config was there.

A quick look into nginx.conf showed that the file was indeed included:

[...]
removed for readability
[...]

  # Enable vts status module.
  vhost_traffic_status_zone;

  upstream gitlab-workhorse {
    server unix:/var/opt/gitlab/gitlab-workhorse/sockets/socket;
  }

  include /var/opt/gitlab/nginx/conf/gitlab-http.conf;

  include /var/opt/gitlab/nginx/conf/gitlab-registry.conf;

  include /var/opt/gitlab/nginx/conf/nginx-status.conf;

However, the gitlab-registry.conf file did not contain any errors. Certificates were configured, the proxy configuration was fine. Nothing caught my eye.

The next step was finding out which access log showed my browser/curl requests.
GitLab writes the logs of its services to /var/log/gitlab/[service]/[something].log.

As you can already tell from the curl output above (a redirect to the sign-in page), the requests were logged in gitlab_access.log, the access log of the main GitLab vhost, not the registry's.
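
A simple way to see which vhost a request ends up in is to watch both access logs while firing a request from another shell. The file names below assume the default Omnibus log layout; adjust them if yours differ:

# Watch the access logs of the main GitLab vhost and the registry vhost,
# then run the curl command from above in a second shell.
tail -f /var/log/gitlab/nginx/gitlab_access.log /var/log/gitlab/nginx/gitlab_registry_access.log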

Test all the things!

That was odd. Everything seemed to be right, but my requests were caught by the wrong nginx config file. To double-check, I enabled the debug log by setting error_log /var/log/gitlab/nginx/gitlab_error.log debug;, which spits out very detailed information about each request and nginx's decisions on where to route it. This also seemed correct, albeit in the wrong virtual host.
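
That is just the standard nginx error_log directive, dropped by hand into the generated config for debugging; note that gitlab-ctl reconfigure regenerates those files, so an edit like this only survives until the next reconfigure:

# Temporarily added by hand to the generated config under
# /var/opt/gitlab/nginx/conf/; the next reconfigure will overwrite it.
error_log /var/log/gitlab/nginx/gitlab_error.log debug;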

Over the following two days I could only spend a short time each evening trying different things. The short version is:

  • I disabled HTTPS, no difference
  • I enabled HTTP/2, no difference
  • I changed different settings in gitlab.rb, no difference
  • I read and re-read the nginx documentation on host name matching, nothing helped
  • I looked into Forgejo
  • I even thought about just running a standalone registry
  • I tried debugging the host name matching some more by changing the host name of the registry

None of that worked. I went to bed slightly frustrated and didn't sleep very well.

The solution

The next morning I remembered that there was one difference between the two configuration files:
gitlab-http.conf had IP addresses on the listen directives of each server block. gitlab-registry.conf didn't. So this must have been it. Right?

Right!
I updated gitlab.rb to set the same IP addresses for registry_nginx['listen_addresses'] that I had set for nginx['listen_addresses'] a long time ago (to restrict the interfaces nginx should listen on), ran gitlab-ctl reconfigure, and everything worked! Whoop-whoop! 🎉
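
In gitlab.rb this boils down to something like the following sketch (the addresses are placeholders for the actual interface IPs):

# Placeholder addresses; use the IPs of the interfaces nginx should bind to.
nginx['listen_addresses']          = ['203.0.113.10', '[2001:db8::10]']
registry_nginx['listen_addresses'] = ['203.0.113.10', '[2001:db8::10]']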

I had completely forgotten that nginx does not only match host names and URIs. It first selects the server block by the address and port of the listen directives, and only then compares server_name among the blocks that matched. A listen directive with an explicit IP address is a more specific match than one with just a port, so requests arriving on that IP will always end up in the vhost that names it, no matter which host name was requested.
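
Stripped down to the essentials, the two generated server blocks looked roughly like this (the IP is a placeholder again):

# gitlab-http.conf: listen pinned to an explicit IP
server {
  listen 203.0.113.10:443 ssl;
  server_name git.example.com;
  # [...]
}

# gitlab-registry.conf: no IP on the listen directive, i.e. *:443
server {
  listen 443 ssl;
  server_name registry.example.com;
  # [...]
}

Since registry.example.com resolves to the same IP, every request arrives on 203.0.113.10:443, the explicit listener wins, and the registry server block on *:443 is never even considered, regardless of the Host header.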

Well, I’ll hopefully remember now! 😂