Boosted Varnish - High Performance Caching made easy

Error message

The spam filter installed on this site is currently unavailable. Per site policy, we are unable to accept new submissions until that problem is resolved. Please try resubmitting the form in a couple of minutes.

Note: I am hosting a BoF at DrupalCon London about this: Join us in Room 333 on Thursday 25th August from 11:00 - 12:00 (second half).

Introduction

Varnish is a fast, really fast reverse-proxy and a dream for every web developer. It transparently caches images, CSS / Javascript files and content pages, and delivers them blazingly fast without much CPU usage. On the other hand Apache - the most widely used webserver for providing web pages - can be a real bottleneck and take much from the CPU even for just serving static HTML pages.

So that sounds like the perfect solution for our old Drupal 6 site here, right? (Or our new Drupal 7 site.)

We just add Varnish and the site is fast (for anonymous users) ...

The perfect solution?

But if you do just that you'll be severely disappointed, because Varnish does not work with Drupal 6 out of the box and, even with Drupal 7, you can run into problems with contrib modules. You also need to install an extra Varnish Module and learn VCL. And if you have a module using $_SESSION for anonymous users, you have to debug this and find and fix it all, because if Varnish is seeing any cookie it will by default not cache the page. The reason for this is that Varnish can't know if the output is not different, which is actually true for the SESSION cookie in Drupal. (Logged in users see different content from logged out ones). That means that those pages are not cached at all and that is true for all pages on a stock (non pressflow) Drupal installation.

So Varnish is just for experts then? Okay, we go with just Boost then and forget about Varnish. Boost just takes a simple installation and some .htaccess changes to get up and running. And we'll just add more Apache machines to take the load. (10 machines should suffice - no?)

Not any longer! Worry no more: Here comes the ultimate drop-in Varnish configuration (based on the recent Lullabot configuration) that you can just add. With minimal changes, it'll work out of the box.

That means that if you have Boost running successfully and can change your Varnish configuration (and isntall varnish on some server), you can run Varnish, too.

How to Boost your site with Varnish

And here are the very simple steps to upgrade your site from Boost to Boosted Varnish.

1. Download Varnish configuration here: http://www.trellon.com/sites/default/files/boosted-varnish.vcl_.txt
2. Install and configure Boost (follow README.txt or see documentation on Boost project page)
3. Set Boost to aggressivly set its Boost cookie
4. Setup Apache to listen on port 8080
5. Setup Varnish to listen to port 80
6. Replace default.vcl with boosted-varnish.vcl

Now we need to tweak the configuration a little:

There is a field in Boost where you can configure pages that should not be boosted. We want to make sure those pages don't cache in Varnish either.

In Boost this will just be a list like:

user
user/*
my-special-page

In Varnish we have to translate this to a regexp. Find the line in the configuration to change it and do:

##### BOOST CONFIG: Change this to your needs
       # Boost rules from boost configuration
       if (!(req.url ~ "^/(user$|user/|my-special-page)")) {
         unset req.http.Cookie;
       }
##### END BOOST CONFIG: Change this to your needs

And thats it. Now Varnish will cache all boosted pages for at least one hour and work exactly like Boost - only much faster and much more scalable.

We had a site we worked on where we had a time of 4s for a page request under high load and brought this down to 0.17s.

The only caveat to be aware of here is that pages are cached for at least one hour, so there is an hour of delay until content appears for anonymous users. But this can be set to 5 min, too, and you'll still profit from the Varnish caching. In general this setting is similar to the Minimum Cache Lifetime setting found in Pressflow.

The code line to change in boosted-varnish.vcl is:

##### MINIMUM CACHE LIFETIME: Change this to your needs
    # Set how long Varnish will keep it
    set beresp.ttl = 1h;
##### END MINIMUM CACHE LIFETIME: Change this to your needs

Even 5 min of minimum caching time give tremendous scalability improvements.

Actually with this technique I can instantly make any site on the internet running Boost much much faster. I just set the backend to the IP, set the hostname in the VCL and my IP address will serve those pages. So you could even share one Varnish server instance for all of your pages and those of your friends, too. I did experiment with EC2 micro instances and it worked, but for any serious sites you should at least get a small one. I spare the details for another blog post though - if there is interest to explore this further.

How and Why it works

The idea of this configuration is quite simple.

Boost is a solution which works well with many many contrib modules out of the box. With Varnish you need to use Pressflow or Drupal 7 and you need to make sure no contrib modules are opening sessions needlessly, which can be quite a hassle to track down. (Checkout varnish_debug to make this task easier here: http://drupal.org/sandbox/Fabianx/1259074)

But Boost's behavior and rules can be emulated in Varnish, because if it is serving a static HTML page, it could serve also a static object out of the Varnish cache.

And the property that is distinguishing between boosted and non-boosted pages is the DRUPAL_UID cookie set by Boost.

The cookies (and such the anonymous SESSION) are removed whenever Boost would have been serving a static HTML page, which would mean that Drupal never got to see that Cookies in the first place, so we can safely remove them.

The second thing to prevent Drupal from needlessly creating session after session is a very simple rule:

If a SESSION was not sent to the webserver, do not send a SESSION to the client. If a SESSION was sent to the webserver, return the SESSION to the client. So SESSION cookies will only be set on pages that are excluded from caching in Varnish like the user/login pages or POST requests (E.g. forms). As Drupal has the pre-existing SESSION cookie, it does not need to create a new SESSION.

To summarize those rules in a logic scheme:

# Logic is:
#
# * Assume: A cookie is set (we add __varnish=1 to client request to make this always true)
# * If boosted URL -> unset cookie
# * If backend not healthy -> unset cookie
# * If graphic or CSS file -> unset cookie
#
# Backend response:
#
# * If no (SESSION) cookie was send in, don't allow a cookie to go out, because
#   this would overwrite the current SESSION.

Why Boost and Varnish?

Now the question that could come up is: If I have Varnish, why would I need Boost anymore?

Boost has some very advanced expiration characteristics, which can be used for creating a current permanent cache on disk of the pages on the site.

This can help pre-warm the varnish cache in case of a varnish restart. But as it turns out, you can use the stock .htaccess and boosted varnish will still work - as long as the DRUPAL_UID cookie is set. It might be possible as further work to just write a contrib module doing exactly that.

But Boost can also be really helpful in this special configuration as you can set your Varnish cache to a minimum lifetime of - for example - 10 min. And instead of Drupal being hit every 10 min, Apache is just happily serving the static HTML page Boost had created until it expires.

The advantage of that is:

If there are 1000 requests to your frontpage, Apache will just be hit once and then Varnish will serve this cached page to the 1000 clients. So instead of Apache having to serve 1000 page requests, it just have to serve one every 10 min. Multiply that with page assets like images and CSS and JS files and you get some big savings in traffic going to Apache.

Conclusion

Varnish is a great technology, but it has been difficult to configure and there are lots of caveats to think of (especially with Drupal 6). This blog post introduced a new technology called Boosted Varnish, which lets Varnish work with every page that is running the Boost Module by temporarily adding it to the active working set of varnish and fetching it frequently back from the permanent Boost cache on disk. The purpose is not for those that are already running high performance drupal sites with Mercury stack, but for those that are using Boost and want to make their site faster by adding Varnish in front of it without having to worry about Varnish specifics.

I created a sandbox project to create any issues related to the configuration on:

http://drupal.org/sandbox/Fabianx/1259052

Have fun with the configuration and I am happy to hear from you or see you tomorrow at my BoF session at DrupalCon London!