WpW: Limiting the LSCache Crawler

February 14th, 2018 by LSCache

WordPress Wednesday: Limiting the LiteSpeed Cache Crawler

Welcome to another installment of WordPress Wednesday!
Today’s topic is: Limiting the LiteSpeed Cache Crawler

Disclaimer: The information contained in this post is accurate for LSCWP v1.9 [release log]. If you are using a newer version of the plugin, some details may have changed. Please refer to our wiki for the latest!

When we introduced the LiteSpeed Cache Crawler in our WordPress plugin last May, we told you that the crawler has the potential to consume significant system resources if it is not configured with efficiency in mind. It was for this reason that we put the on/off switch in the hands of the hosting providers.

Some hosts wanted to let their clients use the LSCWP crawler, but only if the providers themselves could set some limits.

We thought that was a great idea, and so today we’re going to talk about the new crawler-specific server variables that we created in response to this request.

Why Limit the Crawler?

 


The settings in LSCWP’s Crawler tab have always been designed to minimize server impact. Site admins are encouraged to configure their crawler to be conservative in its usage of resources. That said, users are free to configure it however they like. Site admins on shared hosting could conceivably use crawler settings that bring a server to its knees if they wanted to, negatively impacting every other site on that server.

With no safeguards in place to protect against such a rogue site owner, many shared hosting providers have decided not to take a chance on enabling the crawler.

But now, with these new server variables, we are giving hosting providers the ability to set some boundaries for the site owners. The idea is that if providers can guarantee that no single client can hog all of the resources, then there should be enough to go around.

Exactly What Do These New Variables Limit?

Currently, hosting providers can limit two of the crawler settings found in LiteSpeed Cache > Settings > Crawler.

Delay

This setting tells LSCache how long to wait before sending the next request to the server. The longer the delay, the less load there is on the server, but the longer it takes to crawl. LSCache defaults Delay to 500 microseconds, but site admins may change the number to whatever they like.

The CRAWLER_USLEEP variable puts a minimum allowed value on the Delay field. You can specify a large number, if you want to slow down all of the crawlers on your server and minimize their impact.

For example, if you set CRAWLER_USLEEP to 1000 microseconds, then your site admins may set their own crawler delays to any number greater than or equal to 1000. No site crawler on the server will be allowed to wait less than 1000 microseconds between requests.
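The minimum-delay rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the plugin’s own code: the function name `effective_delay` and its parameters are hypothetical, and it assumes that a site value below the server minimum is raised to that minimum (this post does not say whether the plugin adjusts the value or rejects it).

```python
from typing import Optional


def effective_delay(site_delay_us: int, crawler_usleep: Optional[int] = None) -> int:
    """Return the delay, in microseconds, a site's crawler would end up using.

    site_delay_us  -- the Delay value the site admin entered (hypothetical name)
    crawler_usleep -- the host's CRAWLER_USLEEP value, or None if it is not set
    """
    if crawler_usleep is None:
        # No server-level minimum: the site admin's value stands.
        return site_delay_us
    # Assumption: values below the minimum are raised to the minimum.
    return max(site_delay_us, crawler_usleep)


print(effective_delay(500, 1000))   # site asks for 500, host minimum is 1000
print(effective_delay(2500, 1000))  # site asks for 2500, already above the minimum
```

With a minimum of 1000, a site Delay of 500 comes out as 1000, while a site Delay of 2500 is left unchanged.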

Server Load Limit

Server Load Limit keeps the crawler from monopolizing system resources. Once the server load reaches this limit, the crawler is terminated rather than being allowed to compromise server performance. LSCache defaults Server Load Limit to 1, which is fairly conservative, but site admins have the ability to crank it up as high as they like.
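As a rough Python sketch of that check: it assumes the crawler compares a load average against the limit before continuing. This post does not describe how the plugin actually measures load, and `should_stop` is a hypothetical name used only for illustration.

```python
import os
from typing import Optional


def should_stop(load_limit: float, current_load: Optional[float] = None) -> bool:
    """Return True once the server load has reached the configured limit."""
    if current_load is None:
        # Assumption: use the 1-minute load average (Unix-like systems only).
        current_load = os.getloadavg()[0]
    return current_load >= load_limit
```

Under this sketch, a limit of 1 stops the crawl as soon as the measured load is 1.0 or higher, which is why the default is described as conservative.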

With the following server variables, hosting providers can prevent site owners from setting the limit too high.

  • CRAWLER_LOAD_LIMIT sets a default value for the Server Load Limit field.
  • CRAWLER_LOAD_LIMIT_ENFORCE sets a maximum allowed value on the Server Load Limit field. While the previous variable is more of a suggestion, this is an enforced maximum.

For example, if you set CRAWLER_LOAD_LIMIT to 3, then 3 will override the LSCache default of 1. If a site admin enables crawling without changing any of the settings, that site’s Server Load Limit will be 3.

If you use CRAWLER_LOAD_LIMIT by itself, site admins are still free to change it to whatever they like. To prevent this behavior, and set a maximum Server Load Limit, you would use CRAWLER_LOAD_LIMIT_ENFORCE. Set it to 6, for example, and the site owners will not be allowed to exceed a Server Load Limit value of 6.
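The way the two variables interact can be summarized in a short Python sketch. It is a model of the behavior described above, not the plugin’s implementation: `effective_load_limit` and its parameter names are hypothetical, and it assumes a site value above the enforced maximum is reduced to that maximum.

```python
from typing import Optional

LSCACHE_DEFAULT_LOAD_LIMIT = 1  # plugin default stated in this post


def effective_load_limit(
    site_value: Optional[int] = None,
    crawler_load_limit: Optional[int] = None,
    crawler_load_limit_enforce: Optional[int] = None,
) -> int:
    """Return the Server Load Limit a site's crawler would end up using."""
    # CRAWLER_LOAD_LIMIT only replaces the plugin default.
    if crawler_load_limit is not None:
        default = crawler_load_limit
    else:
        default = LSCACHE_DEFAULT_LOAD_LIMIT
    # A value chosen by the site admin takes precedence over any default.
    value = site_value if site_value is not None else default
    # CRAWLER_LOAD_LIMIT_ENFORCE is a hard ceiling (assumption: values are capped).
    if crawler_load_limit_enforce is not None:
        value = min(value, crawler_load_limit_enforce)
    return value


print(effective_load_limit(crawler_load_limit=3))
print(effective_load_limit(site_value=10, crawler_load_limit=3))
print(effective_load_limit(site_value=10, crawler_load_limit_enforce=6))
```

The three calls correspond to the cases above: an untouched site picks up the host default of 3, a site admin can still raise the value to 10 when only CRAWLER_LOAD_LIMIT is set, and the same request is held to 6 once CRAWLER_LOAD_LIMIT_ENFORCE is in place.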

How to Limit the Crawler


To use the server variables, add them, one per line, to the appropriate configuration file. For example:

CacheEngine on crawler

SetEnv CRAWLER_USLEEP 1000
SetEnv CRAWLER_LOAD_LIMIT_ENFORCE 6

The variables can be used server-wide or on a virtual-host basis, and are completely optional. We’ve incorporated the server variable instructions into our Enabling the Crawler instructions. Check that out to get specific directions for your control panel, for LiteSpeed native WebAdmin, or for no control panel at all.

Hosting providers, we hope that these new variables will allow some of you to open up crawler usage to more of your clients!

Site owners, if you’ve been unable to persuade your hosting provider to enable crawling, maybe this new ability to limit crawler impact will change their minds! Feel free to share this blog post with your host, or share a link to the crawler setup page on our wiki.

Have some of your own ideas for future WordPress Wednesday topics? Leave us a comment!

Don’t forget to meet us back here next week for the next installment.

