Crawling Your PrestaShop or Magento 2 Store

Cache Crawler for PrestaShop and Magento 2

We recently released two new crawler scripts: one for Magento 2 stores using LiteMage, and another for PrestaShop stores using LSCache.

The LSCache/LiteMage crawler works its way through your sitemap file, refreshing pages that have expired in the cache. The purpose is to keep the cache as fresh as possible while minimizing visitor exposure to uncached content.

Why Crawl Your Store?

Well, first let’s look at how cache plugins store pages without a crawler. A user request kicks off the whole process. The cache is empty until users start sending requests. The first time a visitor requests a page, the request hits the backend. PHP code is invoked to generate the page, the page is served to the user, and then the page is stored in cache for next time.

That’s a fairly time-consuming process for the server.

Now, let’s look at what happens when a crawler builds the cache. When the crawler requests a page, the request hits the backend. PHP code is invoked to generate the page, but because a special header tells LiteSpeed Web Server (LSWS) that a crawler initiated the request, the full page doesn’t need to be served. It is simply stored in cache.

Additionally, with the crawler refreshing expired pages at regular intervals, the chances that a user will encounter an uncached page are significantly diminished. This makes for a faster site.

So, now that you’re sold on the idea, let’s look at how it’s done.

Please Note: The concepts behind the PrestaShop cache crawler script and the Magento 2 cache crawler script are the same, and so the instructions are very similar. Where they differ, the PrestaShop version is given first, followed by the Magento 2 version.

Before You Begin

There are a few things you need to do before you can begin to use the new script.

  • You must be using LiteSpeed Cache for PrestaShop or LiteMage 2 for Magento 2. Follow those links to install and enable the relevant extension.
  • The crawler must be enabled at the server level, or you will see the warning message "Server crawler engine not enabled. Please check...". If you are using a shared hosting server, please contact your hosting provider, or see our instructions.
  • Prepare your site’s XML sitemap using the tool of your choice.

Generate a Sitemap

If you haven’t yet generated your sitemap, you can use an online sitemap generator such as XML-Sitemaps.com.

After the crawl is finished, click DOWNLOAD YOUR XML SITEMAP FILE and put it where the crawler script can access it.

If you’d prefer to do it from within your site, both PrestaShop and Magento 2 have modules that can generate XML sitemaps.

PrestaShop

The Google Sitemap module is quite popular for PrestaShop, and it’s fast.

For PrestaShop v1.6, the Google Sitemap module is installed by default.

For v1.7+, it needs to be installed from source:

  • Download the module and rename the file to gsitemap.zip
  • Navigate to Modules and Services
  • Click Upload a Module and drag the file into the box to install

Configure the sitemap as desired and press Generate Sitemap. You can find the URL of your new sitemap displayed in the Your Sitemaps section.

Magento 2

Magento 2 has a built-in module for generating a sitemap, and it’s fast.

  • Navigate to Magento Admin > Stores > Settings > Configuration > Catalog > XML Sitemap
  • Set Generation Settings > Enabled to Yes
  • Navigate to Magento Admin > Marketing > SEO & Search > Sitemap
  • Click the Add Sitemap button
  • Set Filename = sitemap.xml and Path = /
  • Click the Save & Generate button

A sitemap.xml file will be generated in your Magento 2 document root.
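
Whichever tool produced the sitemap, it can be worth checking which URLs it actually lists before handing it to the crawler. The following is a minimal sketch, not part of either crawler script: the function name extract_urls is hypothetical, and it assumes each <loc> element sits on a single line without CDATA wrapping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every <loc> URL found in a sitemap read from
# standard input, one URL per line.
extract_urls() {
    # grep -o emits each <loc>...</loc> match on its own line;
    # sed then strips the opening and closing tags.
    grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's/<\/loc>//'
}

# Example (needs network access, so it is left commented out):
# curl -s https://www.example.com/sitemap.xml | extract_urls | wc -l
```

Counting the lines of output gives the number of pages the crawler will be asked to visit, which is useful later when choosing an interval and a schedule.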

Using the Cache Crawler Script

Download the crawler script from the appropriate link below, and then change the permissions so that the file is executable, like so:

PrestaShop

Download, then chmod +x cachecrawler.sh

Usage: bash cachecrawler.sh https://www.example.com/sitemap.xml

Magento 2

Download, then chmod +x M2-crawler.sh

Usage: bash M2-crawler.sh https://www.example.com/sitemap.xml

The crawler scripts take some parameters:

  • -h to display help
  • -m for when Desktop & Mobile have different themes
  • -i NUM to change the default interval from 0.1s to NUM.
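
To make the role of the interval concrete, here is a minimal sketch of a sitemap-driven crawl loop. It is not the code of cachecrawler.sh or M2-crawler.sh: the names crawl_urls, fetch_with_curl, and FETCH are hypothetical, and the sketch sends ordinary requests rather than the special crawler header, so it only illustrates the pacing. It also assumes a sleep command that accepts fractional seconds, as GNU and BSD sleep do.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a paced crawl loop (not the real crawler script).

# FETCH names the command used to request one URL. It is an assumed hook so
# the fetcher can be swapped out; the default uses curl.
FETCH=${FETCH:-fetch_with_curl}

# Request a URL, discard the body, and print only the HTTP status code.
fetch_with_curl() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Read URLs from standard input and request each one, pausing between requests.
crawl_urls() {
    local interval=${1:-0.1}   # seconds between requests, mirroring the 0.1s default of -i
    local url status
    while IFS= read -r url; do
        [ -n "$url" ] || continue              # skip blank lines
        status=$("$FETCH" "$url" </dev/null)   # keep the fetcher away from the URL list on stdin
        printf '%s %s\n' "$status" "$url"
        sleep "$interval"
    done
}

# Example (needs network access, so it is left commented out):
# printf '%s\n' https://www.example.com/ | crawl_urls 0.2
```

A larger interval spreads the backend load over a longer period, at the cost of a longer total crawl.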

Changing the Crawl Interval

How often do you want to re-initiate the crawling process? This depends on how long it takes to crawl your site and your Public Cache TTL. The default TTL is one day (24 hours). Let’s say you want to run the script by cron job every 12 hours.

Add the following to your crontab to run twice a day, once at 3:30am and once at 3:30pm:

PrestaShop

30 3,15 * * * path_to_script/cachecrawler.sh https://www.example.com/sitemap.xml -m -i 0.2

Magento 2

30 3,15 * * * path_to_script/M2-crawler.sh https://www.example.com/sitemap.xml -m -i 0.2

You can use an online crontab tool to help you to verify time settings.
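
To judge whether a crawl fits comfortably inside the chosen schedule, a rough estimate of its duration can help. The sketch below assumes the crawl time is approximately the number of URLs multiplied by the interval plus an average response time per page. The function name estimate_crawl_seconds and the response-time figure are illustrative assumptions, not measurements or part of the crawler scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate crawl duration in whole seconds.
#   $1 = number of URLs in the sitemap
#   $2 = interval between requests in seconds (the value passed to -i)
#   $3 = assumed average response time per page in seconds (default 0)
estimate_crawl_seconds() {
    awk -v n="$1" -v i="$2" -v r="${3:-0}" \
        'BEGIN { printf "%.0f\n", n * (i + r) }'
}

# 5000 URLs, 0.2s interval, assumed 0.5s per uncached page:
estimate_crawl_seconds 5000 0.2 0.5   # prints 3500
```

In this illustrative case the crawl would take just under an hour, well within a 12-hour schedule. If the estimate approaches the time between runs, a shorter interval or a less frequent schedule may be worth considering.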

Verifying the Crawler is Working

Open your browser’s developer tools and load a page whose TTL has expired. On the first view after the crawler runs, you should see the following response header:

PrestaShop

X-LiteSpeed-Cache: hit

Magento 2

X-LiteSpeed-Cache: hit,litemage

The cache hit indicates that the crawler cached the page for you. If the crawler were not running, you’d have gotten a cache miss.
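
The same check can be done from the command line. The sketch below is one possible approach, assuming curl is available: the function name cache_status is hypothetical, and it simply extracts the X-LiteSpeed-Cache value from a set of response headers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read HTTP response headers on standard input and print
# the value of X-LiteSpeed-Cache, or "none" if the header is absent.
cache_status() {
    # Strip carriage returns, then match the header name case-insensitively.
    tr -d '\r' | awk -F': *' '
        tolower($1) == "x-litespeed-cache" { print $2; found = 1 }
        END { if (!found) print "none" }'
}

# Example (needs network access, so it is left commented out).
# -D - writes the response headers to standard output; -o /dev/null drops the body.
# curl -s -o /dev/null -D - https://www.example.com/some-page.html | cache_status
```

Keep in mind that the request itself may cause the page to be cached, so only the first check after the crawler runs tells you whether the crawler did the work.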

Any questions about our new crawler scripts? Let us know in the comments!

