WpW: Dropping Query Strings in LSCache

WordPress Wednesday: Query Strings You Can Ignore

Welcome to another installment of WordPress Wednesday!
Today’s topic is: Query Strings You Can Ignore

As you may know, LSCache is a server-level cache, meaning the basic cache functions are actually carried out by LiteSpeed Web Server. Sometimes certain cache settings are configured at the server level. This allows the settings to apply to all of the server’s web apps that use LSCache.

I mention this because even though this discussion is taking place in the context of WordPress, it applies everywhere you use LSCache on your server! So, even if you are not a WordPress user, if you have LSCache installed for any of your other web apps (like PrestaShop and MediaWiki, to name a few), these instructions can still apply to you.


Query Strings and Caching

As far as LiteSpeed Cache is concerned, these two URLs are different, and are cached separately:

http://www.example.com/coffee.html?grind=wholebean&roast=light
http://www.example.com/coffee.html?grind=drip&roast=dark

Technically, both refer to the same coffee page, but the output is different depending on the values of the query strings. One URL displays a whole bean light roast, and the other displays a drip grind dark roast. To coffee shoppers, these are very different items.

In this case (indeed, in most cases), saving separate copies is exactly what you want LSCache to do. Query strings are usually important to the content of the page, and so they are also important to caching that content.
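To make that concrete, here is a minimal Python sketch of a cache that keys each page on its path plus its query string. This is not LiteSpeed's actual implementation; `naive_cache_key` is a hypothetical helper that only illustrates why the two coffee URLs end up as separate cache entries.

```python
from urllib.parse import urlsplit


def naive_cache_key(url: str) -> str:
    """Hypothetical cache key: host, path, and the raw query string."""
    parts = urlsplit(url)
    return f"{parts.netloc}{parts.path}?{parts.query}"


light = naive_cache_key(
    "http://www.example.com/coffee.html?grind=wholebean&roast=light"
)
dark = naive_cache_key(
    "http://www.example.com/coffee.html?grind=drip&roast=dark"
)

# Different query strings produce different keys, so each
# variation of the page is stored as its own cache entry.
print(light)
print(dark)
```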

Ignorable Query Strings

But not all query strings are relevant when it comes to generating dynamic content. Take, for instance, the “utm” queries that are popular for campaign tracking in Google Analytics:

http://www.example.com/coffee.html?utm_source=google&utm_medium=email&utm_campaign=buymorebeans
http://www.example.com/coffee.html?utm_source=google&utm_medium=banner&utm_campaign=coffeeiscool
http://www.example.com/coffee.html?utm_source=twitter&utm_medium=tweet&utm_campaign=tweetabrew

Just like in the previous example, these three URLs refer to the same page. But unlike before, the query strings do not influence the content that is displayed to the user. Their only function is to facilitate tracking for a variety of ad campaigns.

Why does this matter?

Why can’t you just let the cache fill up with these extra entries?

Remember how caching works: the first user to visit a URL has to wait a while for the page to be generated, as it has not yet been cached. Subsequent visitors then get the cached copy served quickly to them.

Now, imagine visitors are coming to your store from each of the different ad campaigns you have set up:

  • Visitor #1 arrives at the coffee page after seeing the “Buy More Beans” campaign in his email.
  • Visitor #2 lands on the coffee page after being tempted by a “Coffee is Cool” banner.
  • And Visitor #3 finds the coffee page through a “Tweet a Brew” link on Twitter.

All three visitors arrive at the exact same coffee page through three different URLs.

If the URLs have all been cached separately, each visitor will be the first to request their particular URL, and as such, each will have to wait for that first page to be dynamically generated. None of them will get a cached copy of the page.

On the other hand, if the ad campaign information has been dropped, and only one copy of the page is saved in the cache (i.e. http://www.example.com/coffee.html), then it’s only Visitor #1 who must wait for the first page load. Visitors #2 and #3, and everyone else who comes after them, will reap the rewards of caching.

And it’s safe to simply ignore the query strings because they have already served their purpose by the time the server gets involved.

Caching the URLs individually, with query strings intact, causes more of your visitors to be exposed to slow page loads, and results in more work for the WordPress backend.

That is why it matters.
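The three-visitor scenario can be sketched in a few lines of Python. This is a toy model, not how LSCache works internally: the cache is a plain dictionary, and `full_url_key`, `utm_free_key`, and `count_slow_loads` are hypothetical names made up for this illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

CAMPAIGN_URLS = [
    "http://www.example.com/coffee.html?utm_source=google&utm_medium=email&utm_campaign=buymorebeans",
    "http://www.example.com/coffee.html?utm_source=google&utm_medium=banner&utm_campaign=coffeeiscool",
    "http://www.example.com/coffee.html?utm_source=twitter&utm_medium=tweet&utm_campaign=tweetabrew",
]


def full_url_key(url: str) -> str:
    """Cache key that keeps the query string intact."""
    return url


def utm_free_key(url: str) -> str:
    """Cache key with every parameter starting with 'utm' removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.startswith("utm")
    ]
    return parts._replace(query=urlencode(kept)).geturl()


def count_slow_loads(urls, make_key) -> int:
    """Count how many requests miss the cache and hit the backend."""
    cache = {}
    slow_loads = 0
    for url in urls:
        key = make_key(url)
        if key not in cache:
            # Cache miss: the page has to be generated dynamically.
            cache[key] = "<html>coffee page</html>"
            slow_loads += 1
    return slow_loads


print(count_slow_loads(CAMPAIGN_URLS, full_url_key))  # 3: every visitor waits
print(count_slow_loads(CAMPAIGN_URLS, utm_free_key))  # 1: only Visitor #1 waits
```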

Other query strings you can ignore

Of course, ignorable query strings are not limited to the “utm” variety. There are other known queries that can be ignored, such as Google AdWords’ Google Click Identifier, aka “gclid.” And there’s nothing stopping you from doing your own kind of tracking with a made-up query like ?mysitenav=sidebar, which could (for example) help you gain insights about your visitors’ clicking habits within your site.

Any of these query strings can be dropped from the URL without changing the content that is displayed. Therefore, they can also be dropped when saving and retrieving the page from the cache.


Configuring Query String Cache Rules

So, how do you instruct LiteSpeed Web Server to ignore certain query strings? You have two options:

  • Configure it for WordPress only, from within the plugin.
  • Configure it server-wide to apply to all of your LSCache-enabled web apps.

WordPress Plugin Level

From the WordPress Dashboard, navigate to LiteSpeed Cache > Settings > Cache and scroll down to the Drop Query String setting.

In this box, you can list the strings you want to ignore, one per line, using wildcards if desired.

So, to ignore all of the “utm” query strings, the Google Click ID “gclid” query, and your own custom “mysitenav” query, you could enter the following in the box:

utm*
gclid
mysitenav

Press the Save Changes button, and you’re done.
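If you’d like to see what a list like that does to a URL, here is a rough Python approximation. It assumes the wildcards behave like shell-style patterns (via Python’s `fnmatchcase`), which may not match the plugin’s exact rules, and `drop_query_strings` is a hypothetical helper, not part of the plugin.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qsl, urlencode, urlsplit

# The same entries as in the Drop Query String box above.
DROP_PATTERNS = ["utm*", "gclid", "mysitenav"]


def drop_query_strings(url: str, patterns=DROP_PATTERNS) -> str:
    """Return the URL without any parameter whose name matches a pattern."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    ]
    return parts._replace(query=urlencode(kept)).geturl()


tracked = (
    "http://www.example.com/coffee.html"
    "?roast=dark&utm_source=twitter&gclid=abc123&mysitenav=sidebar"
)
print(drop_query_strings(tracked))
```

Notice that `roast=dark` survives, because it changes what the page displays, while the tracking parameters are stripped.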

LiteSpeed Web Server Level

If you’d like to configure LSWS to drop certain query strings on a server-wide (or virtual-host-wide, or even site-wide) basis, you can use the Apache-style CacheKeyModify configuration directive.

You’ll need LiteSpeed Enterprise v5.2.3 or higher, and you’ll need access to modify its configuration. If you don’t have the ability to configure LSWS, please ask your hosting provider to add the directive for you. More environment-specific directions are available on our wiki.

As mentioned, you can use this directive in the server-level configuration, at the virtual host level, or in a site’s .htaccess. It looks like this:

<IfModule Litespeed>
CacheKeyModify -qs:utm*
</IfModule>

Upper level configurations are inherited by lower levels. If a lower level adds more rules, they are in addition to those of the upper level. If a lower level doesn’t want to use the upper level’s configuration, the “clear” parameter should be used before adding the new rules.

The CacheKeyModify directive can be used multiple times, but it does not support more than one parameter at a time. Each parameter gets its own line, like so:

<IfModule Litespeed>
CacheKeyModify clear
CacheKeyModify -qs:utm*
CacheKeyModify -qs:gclid
CacheKeyModify -qs:mysitenav
</IfModule>

Place a snippet like that in the appropriate configuration file, and you’re good to go!
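The inheritance behavior described above can be modeled with a short Python sketch. It only mirrors the description in this post (lower levels add to upper levels, and “clear” discards what was inherited); it is not LiteSpeed’s real configuration parser, and `effective_rules` is a hypothetical name.

```python
def effective_rules(levels):
    """Combine CacheKeyModify parameters from the outermost level inward.

    `levels` is a list of lists, e.g. [server, vhost, htaccess],
    where each inner list holds one parameter per directive line.
    """
    rules = []
    for level in levels:
        for param in level:
            if param == "clear":
                # Discard everything inherited or added so far.
                rules = []
            else:
                rules.append(param)
    return rules


server = ["-qs:utm*"]
vhost = ["-qs:gclid"]
htaccess = ["clear", "-qs:mysitenav"]

# Without "clear", the lower level adds to the upper level's rules.
print(effective_rules([server, vhost]))

# With "clear", the .htaccess level starts over with its own rules.
print(effective_rules([server, vhost, htaccess]))
```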


Conclusion

You can save yourself a real cache-induced headache by dropping certain query strings. Think about all of the situations where people are directed to your site with tracking code attached to the URL: mailing lists, ad campaigns, social media… Caching separate copies of such pages is unnecessary. Plus, it’s a waste of precious resources, and a potential speed bump for your visitors.

Take advantage of LiteSpeed’s ability to drop query strings, and streamline your cache!

Have some of your own ideas for future WordPress Wednesday topics? Leave us a comment!

Don’t forget to meet us back here next week for the next installment.