
Introducing sub-second cache purge

One of the main ways a content delivery network accelerates content is by caching it on an edge server close to the visitor. Once cached (usually after the first request), the content is served to all visitors at high speed without contacting the origin, which may be located in a distant country or even on another continent. This greatly speeds up how quickly visitors can fetch your content, and it also lets you scale seamlessly, since the load is shifted from the origin to the CDN.

However, caching also has a downside. Every now and then you may want to update the contents of a file, or push a new version of your software without changing its file name. Since the old version is still in the CDN cache, the new version will not be served until the cached copy expires, unless you tell the CDN to fetch a fresh copy.

This process is called a cache purge. Until recently, PUSHR used a traditional approach to cache purging: a request carrying a specific header was sent to every edge cache server on the network, forcing the web server to remove the cached file. The next time a visitor requested that file, the edge server would contact the origin and fetch a fresh copy. This worked well and was fairly painless, except that it was slow and relied on yet another third-party module that had to be built into the web servers. On top of that, the module itself seems to be abandoned: its last commit on GitHub was on Dec 23, 2014.

On to sockets, then!

We decided to build a cache purging solution of our own. It had to have the following features:

  • Be faster than the current approach
  • Have as few dependencies as possible, so that it is portable and works out of the box on various Linux systems
  • Be easy to tweak to purge almost any file on the system

To achieve this, we use something most modern Linux distros ship by default: Netcat. As Wikipedia puts it: "Netcat (often abbreviated to nc) is a computer networking utility for reading from and writing to network connections using TCP or UDP. Netcat is designed to be a dependable back-end that can be used directly or easily driven by other programs and scripts."

One thing Wikipedia doesn't say is that Netcat is fast. Where purging a file from the whole network previously took as long as the slowest web server needed to process the request, using nc has allowed us to complete the action almost instantly. At PUSHR we monitor our network constantly and have alerts for pretty much everything, everywhere, but it's still better not to rely on a web server because, let's be honest, in the real world web servers can and eventually will be overloaded at some point, thanks to one attack or another.

With the old approach, we observed purge times anywhere from 1 to 5 seconds when some edges were under high load or handling an extremely high number of connections. Using sockets, we've observed consistently sub-second purge times for the whole network. Every edge receives a command and a target directory to work with, and simply removes the cached file(s) using standard built-in operations, without third-party software.
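The post doesn't describe the wire format or the edge-side code, so the following is only a minimal sketch of what "receive a command and a target, then remove the cached file(s)" could look like using nothing but the Python standard library. The one-line `PURGE <path>` command, the cache root `/var/cache/edge` and port 9999 are all hypothetical, not PUSHR's actual values.

```python
import shutil
import socketserver
from pathlib import Path

# Hypothetical cache location; the post does not say where edges store files.
CACHE_ROOT = Path("/var/cache/edge")


def purge(relative_path: str, cache_root: Path = CACHE_ROOT) -> bool:
    """Delete a cached file or directory below cache_root.

    Returns True if something was removed, False if nothing was cached there.
    """
    root = cache_root.resolve()
    target = (root / relative_path.lstrip("/")).resolve()
    # Refuse anything outside the cache root (e.g. "../../etc") and the root itself.
    if root not in target.parents:
        raise ValueError(f"refusing to purge outside cache root: {relative_path!r}")
    if target.is_dir():
        shutil.rmtree(target)  # a whole directory of cached objects
    elif target.exists():
        target.unlink()  # a single cached file
    else:
        return False
    return True


class PurgeHandler(socketserver.StreamRequestHandler):
    """Handles one hypothetical 'PURGE <path>' line per connection."""

    def handle(self) -> None:
        line = self.rfile.readline(4096).decode("utf-8", "replace").strip()
        command, _, path = line.partition(" ")
        try:
            if command != "PURGE" or not path:
                raise ValueError(f"bad command: {line!r}")
            purge(path)
        except (ValueError, OSError) as exc:
            # Silent on success; only failures are written back to the sender.
            self.wfile.write(f"ERR {exc}\n".encode())


def serve(host: str = "0.0.0.0", port: int = 9999) -> None:
    """Listen for purge commands (port 9999 is an arbitrary example)."""
    with socketserver.ThreadingTCPServer((host, port), PurgeHandler) as server:
        server.serve_forever()
```

Calling `serve()` on an edge starts the listener. The path check matters because the command arrives over the network: without it, a path such as `../../etc` could delete files outside the cache. Purging something that isn't cached is treated as a no-op rather than an error.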

Previously, we waited for a response from each edge location, which we then parsed to determine whether the operation had succeeded or failed. This added even more delay, and as the network grew, purge times would slowly rise and the whole process would become slower and heavier. With PUSHR's sub-second purge, we no longer do this. We expect no answer at all unless something goes wrong, which makes the action one-way communication under normal operation. If something does go wrong, the customer is notified right away. As before, the purge is a parallel process fired at all edge locations at once, but it is now faster, lighter and less "chatty". It also scales well.
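PUSHR drives this with Netcat; the sketch below shows the same send-side idea with Python's `socket` module instead, under stated assumptions. It assumes a hypothetical one-line `PURGE <path>` command over TCP and an edge that closes the connection without writing anything when the purge succeeds. The command is sent to every edge in parallel, and there is no response to parse: an empty reply means success, while any reply text, timeout or connection error is collected as a failure to report.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, int]  # (host, port); the port number is deployment-specific


def send_purge(edge: Edge, path: str, timeout: float = 1.0) -> Optional[str]:
    """Send one purge command to one edge; return an error string or None."""
    host, port = edge
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(f"PURGE {path}\n".encode())
            sock.shutdown(socket.SHUT_WR)  # signal that the command is complete
            reply = sock.recv(4096)  # b"" on success: the edge just closes
    except OSError as exc:  # refused, unreachable or timed out
        return f"{host}:{port}: {exc}"
    return reply.decode("utf-8", "replace").strip() or None


def purge_everywhere(edges: List[Edge], path: str, timeout: float = 1.0) -> Dict[Edge, str]:
    """Fire the command at all edges in parallel and collect only the failures."""
    with ThreadPoolExecutor(max_workers=max(1, len(edges))) as pool:
        results = list(pool.map(lambda edge: send_purge(edge, path, timeout), edges))
    return {edge: err for edge, err in zip(edges, results) if err}
```

For example, `purge_everywhere([("edge1.example.net", 9999), ("edge2.example.net", 9999)], "/img/a.jpg")` (hypothetical hosts) returns an empty dict when every edge succeeds. Because all edges are contacted at once and each attempt is capped by `timeout`, the total time is bounded by the slowest edge rather than the sum of all of them.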

But...numbers?

To date, the highest recorded purge time with the new approach is 894 ms, measured during peak hours.