Tip: Ignore a Single Query Parameter when Caching with Nginx
Since it's not entirely obvious, this is how you can ignore a subset of the query string when using Nginx as a cache for an upstream server (think legacy cache-busting behavior, which can make an Nginx cache ineffective).
First off is the proxy_cache_key directive, which defaults to the value of $scheme$proxy_host$uri$is_args$args. This is ultimately the value that is hashed by Nginx as the lookup key for the cached result. So naturally, any change to this key will result in a separate cache entry.
$is_args resolves to a literal ? or an empty string, so you can basically consider this key to be equal to $scheme$proxy_host$request_uri. All variables are documented in the official docs.
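To make the cache-busting problem concrete, here is a minimal Python sketch of how the default key is put together and hashed. The function names and the example values ("http", "upstream", "/page") are mine and purely illustrative; the MD5 step reflects what the Nginx docs say about how cache file names are derived from the key.

```python
import hashlib


def default_cache_key(scheme, proxy_host, uri, args):
    """Concatenate the pieces the way $scheme$proxy_host$uri$is_args$args does."""
    is_args = "?" if args else ""  # $is_args: "?" when there is a query string
    return f"{scheme}{proxy_host}{uri}{is_args}{args}"


def cache_lookup_hash(key):
    """Hash the key; Nginx documents MD5 of the cache key for cache file names."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Two requests that differ only in the cache-busting parameter (made-up values).
k1 = default_cache_key("http", "upstream", "/page", "id=7&random=aaa")
k2 = default_cache_key("http", "upstream", "/page", "id=7&random=bbb")

print(k1)
print(k2)
print(cache_lookup_hash(k1) == cache_lookup_hash(k2))  # False: two separate cache entries
```

Same page, same id, but two different keys, so the upstream gets hit twice.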
But let's say we want to ignore a single value, e.g. ?random=2d027f03-54aa-4713-b67a-1d11a3025f59, but don't want to take the draconian approach of asking Nginx to completely ignore the query string. For this, I think the easiest solution is to use Nginx maps and regex.
# Maps need to be under the http context.
http {
    # The input to the map is $args, which might be an empty string.
    map $args $cache_args {
        # Kind of scary, explained below.
        "~*^(.*?)?&?random=[^&]*?(&.*)?$" $1$2;
        # If the above regex fails to match, leave $args unchanged.
        default $args;
    }
}

Let's break down that scary regex.

- (Purple) ~* : The first section isn't regex, but it does enable regex within the map pattern selector. The * part declares that the match should be case-insensitive. The RFC does say that URLs should be case-sensitive, but most systems follow Postel's law, as letter case may change going through multiple proxies.
- (Yellow) ^ and $ : Just specifying that this regex must eat the whole input, or none of the input. This is typically best practice for both performance and correctness.
- (Green) (.*?)? : Capture group 1 (aka $1) - basically eat everything, non-greedily, before our "query parameter to remove". The non-greedy *? is mostly a micro-optimization to avoid regex backtracking (read: costly). Most regex DoS exploits come from causing excessive backtracking.
- (Red) (&.*)? : Capture group 2 (aka $2) - basically eat everything after the "query parameter to remove".
- (Gray) &?random=[^&]*? : The actual "query parameter to remove" - basically eat the literal random=, and then any input that is not &.
I like to personify regex as a hungry monster that eats everything in its path. I like to think this helps my sanity.
So this regex ultimately will produce 2 groups, ignoring the part of the query string we want to ignore.

So the end result of $cache_args will either be the unchanged $args or the concatenation of $1$2 (group 1 and group 2).
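If you want to poke at this outside of Nginx, the map can be roughly emulated in Python. This is only a sketch: Nginx uses PCRE rather than Python's re module (this particular pattern uses nothing that differs between the two, as far as I can tell), the name cache_args is just my stand-in for the $cache_args variable, and I'm assuming the parameter to drop is called random as in the example above.

```python
import re

# Same pattern as the map; the leading "~*" becomes the IGNORECASE flag.
CACHE_ARGS_RE = re.compile(r"^(.*?)?&?random=[^&]*?(&.*)?$", re.IGNORECASE)


def cache_args(args):
    """Rough stand-in for the map: $1$2 on a match, otherwise the input unchanged."""
    m = CACHE_ARGS_RE.match(args)
    if m is None:
        return args  # the "default $args;" branch
    # A group that did not take part in the match is None in Python,
    # whereas Nginx treats it as an empty string.
    return (m.group(1) or "") + (m.group(2) or "")


print(cache_args("a=1&random=abc&b=2"))  # a=1&b=2  ($1 is "a=1", $2 is "&b=2")
print(cache_args("a=1&b=2"))             # a=1&b=2  (no match, falls through to default)
```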
It's important to test the regex against different query strings e.g. prefixed group, postfixed group, etc. I used these test cases to double check edge-conditions.
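As a rough illustration of the kind of edge conditions I mean, here is a small table-driven check in Python. It is a sketch under assumptions: strip_param is a hypothetical helper that mimics the map using Python's re module instead of PCRE, and the query strings are made up.

```python
import re


def strip_param(args, param="random"):
    """Mimic the map for a given parameter name: $1$2 on a match, else the input."""
    pattern = re.compile(
        r"^(.*?)?&?" + re.escape(param) + r"=[^&]*?(&.*)?$", re.IGNORECASE
    )
    m = pattern.match(args)
    if m is None:
        return args
    return (m.group(1) or "") + (m.group(2) or "")


# (input query string, expected normalized query string)
CASES = [
    ("", ""),                            # no query string at all
    ("a=1&b=2", "a=1&b=2"),              # parameter absent, unchanged
    ("random=abc", ""),                  # only the parameter
    ("a=1&random=abc", "a=1"),           # parameter last
    ("random=abc&b=2", "&b=2"),          # parameter first, leading & is left behind
    ("a=1&random=abc&b=2", "a=1&b=2"),   # parameter in the middle
    ("a=1&RANDOM=abc", "a=1"),           # case-insensitive match
    ("xrandom=5", "x"),                  # name merely ending in "random" also matches
]

for args, expected in CASES:
    got = strip_param(args)
    assert got == expected, (args, got, expected)
    print(f"{args!r:24} -> {got!r}")
```

Two things this surfaced for me: a leading & can be left behind when the parameter comes first (harmless for a cache key), and a parameter whose name merely ends in random is also matched, so it's worth checking that nothing like that exists in your URLs.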
After we have our new normalized query string, it can be used in the cache key. A complete config might look like:
http {
    map $args $cache_args {
        "~*^(.*?)?&?random=[^&]*?(&.*)?$" $1$2;
        default $args;
    }

    server {
        location / {
            proxy_pass http://upstream:80;
            # Assumes a cache zone named "main" is declared elsewhere with proxy_cache_path.
            proxy_cache main;
            proxy_cache_min_uses 0;
            proxy_cache_lock on;
            # Set the cache key to use our possibly modified $cache_args.
            proxy_cache_key $scheme$proxy_host$uri$is_args$cache_args;
            # Debug headers: the cache status and the request-dependent part of the key.
            add_header X-Cache-Status $upstream_cache_status;
            add_header X-Cache-Key $uri$is_args$cache_args;
        }
    }
}

Here I'm using the X-Cache-Key response header to help test my regex. For this to be a success, X-Cache-Key should be the exact same URL that Nginx receives, minus that one query parameter random=.
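To know what to look for in that header, here is a small Python sketch that predicts its value for a given request. The helper name expected_cache_key_header and the sample paths are hypothetical, and it assumes a path that Nginx doesn't need to normalize (so $uri equals the raw path) and a parameter named random.

```python
import re

CACHE_ARGS_RE = re.compile(r"^(.*?)?&?random=[^&]*?(&.*)?$", re.IGNORECASE)


def expected_cache_key_header(request_target):
    """Predict $uri$is_args$cache_args for a request like "/path?query"."""
    uri, _, args = request_target.partition("?")
    m = CACHE_ARGS_RE.match(args)
    cache_args = args if m is None else (m.group(1) or "") + (m.group(2) or "")
    # $is_args depends on the original query string, not on the stripped one.
    is_args = "?" if args else ""
    return uri + is_args + cache_args


print(expected_cache_key_header("/search?q=nginx&random=abc"))  # /search?q=nginx
print(expected_cache_key_header("/search?random=abc&q=nginx"))  # /search?&q=nginx
print(expected_cache_key_header("/search?random=abc"))          # /search?
print(expected_cache_key_header("/search"))                     # /search
```

So don't be alarmed by a stray ? or & in the header when the removed parameter was first or alone. What matters is that two requests differing only in random= end up with the same value.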
And that's about it. It's kind of cool how flexible Nginx is; it gives Varnish a run for its money.