Tip: Ignore a Single Query Parameter when Caching with Nginx
Since it's not entirely obvious, this is how you can ignore a subset of the query string when using Nginx as a cache for an upstream server (think legacy cache-busting parameters, which can make an Nginx cache ineffective).
First off is the proxy_cache_key directive, which by default behaves like $scheme$proxy_host$uri$is_args$args. This is ultimately the value that Nginx hashes to get the lookup key for the cached result. So naturally, any change to this key will result in a separate cache entry.
$is_args resolves to a literal ? when the request has a query string, or to an empty string otherwise, so you can basically consider this key to be equal to $scheme$proxy_host$request_uri. All variables are documented in the official docs.
But let's say we want to ignore a single value, e.g. ?random=2d027f03-54aa-4713-b67a-1d11a3025f59, but don't want to take the draconian approach of asking Nginx to completely ignore the query string. For this, I think the easiest solution is to use Nginx maps and regex.
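A minimal sketch of such a map, reconstructed from the description in this post rather than taken from a tested config. It assumes the block lives in the http context and that a combined value like $1$2 is supported (Nginx 1.11.0 or later); the variable name $cache_args is the one used further down.

```nginx
# Sketch only: build $cache_args from $args with the random= parameter cut out.
map $args $cache_args {
    # No random= parameter present: keep the query string untouched.
    default $args;

    # ~* turns on case-insensitive regex matching for this pattern.
    # $1 is everything before random=..., $2 is everything after it.
    "~*^(.*?)random=[^&]*(.*)$" $1$2;
}
```

Note that a pattern this simple leaves the neighbouring & separators in place and would also match a parameter whose name merely ends in random, which may or may not matter for your upstream.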
Let's break down that scary regex.
- (Purple) The first section isn't regex, but it does enable regex within the map pattern selector. The * part declares this regex should be matched case-insensitively. The RFC does say that URLs should be case-sensitive, but most systems follow Postel's law, as letter case may change going through multiple proxies.
- (Yellow) Just specifying that this regex must eat the whole input, or none of the input. This is typically best practice for both performance and correctness.
- (Green) Capture group 1 (aka $1) - basically eat everything, non-greedily, before our "query parameter to remove". The non-greedy *? is mostly a micro-optimization to avoid regex backtracking (read: costly). Most regex DoS exploits work by causing excessive backtracking.
- (Red) Capture group 2 (aka $2) - basically eat everything after the "query parameter to remove".
- (Gray) The actual "query parameter to remove" - basically eat the literal random=, and then any input that is not &.
I like to personify regex as a hungry monster that eats everything in its path. I like to think this helps my sanity.
So this regex ultimately will produce 2 groups, ignoring the part of the query string we want to ignore.
So the end result of $cache_args will either be $args (when the regex doesn't match) or the concatenation of $1$2 (group 1 and group 2).
It's important to test the regex against different query strings, e.g. prefixed group, postfixed group, etc. I used these test cases to double-check edge conditions.
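One way to exercise the map by hand is a throwaway location that simply echoes the computed value. The sketch below is illustrative only: the port 8081 and the /cache-args path are made up, the map repeats the pattern as described in this post, and the query strings in the comments are hypothetical examples rather than the original test cases.

```nginx
# Sketch only: echo $cache_args so the regex can be checked with curl.
# Both blocks are assumed to sit inside the http context.
map $args $cache_args {
    default $args;
    "~*^(.*?)random=[^&]*(.*)$" $1$2;
}

server {
    listen 8081;                      # hypothetical test port

    location = /cache-args {          # hypothetical path
        default_type text/plain;
        return 200 "$cache_args";     # response body is the normalized query string
    }
}

# Query strings worth trying, with what the pattern above should leave behind:
#   ?random=abc            ->  (empty)
#   ?random=abc&b=2        ->  &b=2
#   ?a=1&random=abc        ->  a=1&
#   ?a=1&random=abc&b=2    ->  a=1&&b=2
#   ?a=1&b=2               ->  a=1&b=2
```

The leftover & characters are harmless for a cache key as long as the same request always normalizes to the same string.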
After we have our new normalized query string, it can be used in the cache key. A complete config might look like:
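A hedged sketch of such a config, assuming it sits inside the http block. The cache path, the zone name example_cache, the listen port, the cache lifetime, and the upstream address are all placeholders, not values from the original setup.

```nginx
# Sketch only: cache upstream responses under a key that ignores random=.
proxy_cache_path /var/cache/nginx keys_zone=example_cache:10m;  # placeholder path and zone

map $args $cache_args {
    default $args;
    "~*^(.*?)random=[^&]*(.*)$" $1$2;
}

server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:8080;   # placeholder upstream

        proxy_cache example_cache;
        proxy_cache_valid 200 10m;          # placeholder lifetime

        # Same shape as the default key, with $args swapped for $cache_args.
        proxy_cache_key $scheme$proxy_host$uri$is_args$cache_args;

        # Expose the computed key so it can be inspected while testing.
        add_header X-Cache-Key $scheme$proxy_host$uri$is_args$cache_args;
    }
}
```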
Here I'm using the X-Cache-Key response header to help test my regex. For this to be a success, X-Cache-Key should be the exact same URL that Nginx receives, minus that one query parameter random=.
And that's about it. Kind of cool how flexible Nginx is; it gives Varnish a run for its money.