Tip: Ignore a Single Query Parameter when Caching with Nginx

Since it's not entirely obvious, this is how you can ignore a subset of the query string when using Nginx as a cache for an upstream server (think legacy cache-busting behavior, which can make a Nginx cache ineffective).

First off is the proxy_cache_key directive, which defaults to the value of $scheme$proxy_host$uri$is_args$args. This is ultimately the value that is hashed by Nginx as the lookup key for the cached result. So naturally, any change to this key will result in a separate canonical cache.

$is_args resolves to the literal ? or empty string, so you can basically consider this key to be equal to $scheme$proxy_host$request_uri. All variables are document on the official docs.

But let's say we want to ignore a value e.g. say ?random=2d027f03-54aa-4713-b67a-1d11a3025f59, but don't want to take the draconic approach of asking Nginx to completely ignore the query string. For this, I think the easiest solution is to use Nginx maps and regex.

TOML, also INI

# Maps need to be under the http context.
http {

    # The input to the map is $args, which might be an empty string.
    map $args $cache_args {
        # Kind of scary, explained later.
        "~*^(.*?)?&?tag=[^&]*?(&.*)?$" $1$2;
        # If the above regex fails to match, assume the $args as unchanged.
        default $args;
    }
}

Let's break down that scary regex.

A regex expression that is color coded with the different sections — I'm so sorry to screen readers, I have no idea how to make this accessible.

(Purple) The first section isn't regex, but it does enable regex within the map pattern selector. The * part declares this regex should be done case-insensitive. RFC does say that URL's should be case-sensitive, but most systems follow Postel's law, as letter-case may change going through multiple proxies.
(Yellow) Just specifying that this regex must eat the whole input, or none of the input. This is typically best practices for both performance and correctness.
(Green) Capture group 1 (aka $1) - basically eat everything, non-greedily, before our "query parameter to remove". The non-greedy *? is mostly a micro-optimization to avoid regex backtracking (read, costly). Most regex DOS exploits are from causing excessive backtracking.
(Red) Capture group 2 (aka $2) - basically eat everything after the "query parameter to remove".
(Gray) The actual "query parameter to remove" - basically eat the literal random=, and then any input that is not &.

I like to personify regex as a hungry monster that eats everything in it's path. I like to think this helps my sanity.

So this regex ultimately will produce 2 groups, ignoring the part of the query string we want to ignore.

A query string with the regex groups applied, color coded — I reeeaaally should learn to use a vector editor, and not Microsoft Word...

So the end result of $cache_args will either be $args$ or the concatenation of $1$2 (group 1 and group 2).

It's important to test the regex against different query strings e.g. prefixed group, postfixed group, etc. I used these test cases to double check edge-conditions.

After we have our new normalized query string, it can be used in the cache key. A complete config might look like:

TOML, also INI

http {

    map $args $cache_args {
        "~*^(.*?)?&?tag=[^&]*?(&.*)?$" $1$2;
        default $args;
    }

    server {
        location / {
            proxy_pass http://upstream:80;
            proxy_cache main;
            proxy_cache_min_uses 0;
            proxy_cache_lock on;

            # Set the cache key to use our possibly modified $cache_args.
            proxy_cache_key $scheme$proxy_host$uri$is_args$cache_args;

            add_header X-Cache-Status $upstream_cache_status;
            add_header X-Cache-Key $uri$is_args$cache_args;
        }
    }
}

Here I'm using the X-Cache-Key response header to help test my regex. For this to be a success, X-Cache-Key should be the exact same URL that Nginx receives, minus that one query parameter random=.

And that's about it. Kind of cool how flexible Nginx is, it gives Varnish a run for its money.