Boosted Varnish - High Performance Caching made easy

Aug 24

Note: I am hosting a BoF at DrupalCon London about this: Join us in Room 333 on Thursday 25th August from 11:00 - 12:00 (second half).

Introduction

Varnish is a fast, really fast, reverse proxy and a dream for every web developer. It transparently caches images, CSS and JavaScript files, and content pages, and delivers them blazingly fast with very little CPU usage. Apache, on the other hand, the most widely used web server, can be a real bottleneck and consume a lot of CPU even when it is only serving static HTML pages.

So that sounds like the perfect solution for our old Drupal 6 site here, right? (Or our new Drupal 7 site.)

We just add Varnish and the site is fast (for anonymous users) ...

The perfect solution?

But if you do just that you'll be severely disappointed: Varnish does not work with Drupal 6 out of the box and, even with Drupal 7, you can run into problems with contrib modules. You also need to install an extra Varnish module and learn VCL. And if a module uses $_SESSION for anonymous users, you have to debug, find and fix it, because by default Varnish will not cache a page if it sees any cookie. The reason is that Varnish cannot know whether the output depends on the cookie, and for Drupal's SESSION cookie it actually does (logged-in users see different content from logged-out ones). That means those pages are not cached at all, and on a stock (non-Pressflow) Drupal installation that applies to every page.

So Varnish is just for experts then? Okay, we go with just Boost then and forget about Varnish. Boost just takes a simple installation and some .htaccess changes to get up and running. And we'll just add more Apache machines to take the load. (10 machines should suffice - no?)

Not any longer! Worry no more: Here comes the ultimate drop-in Varnish configuration (based on the recent Lullabot configuration) that you can just add. With minimal changes, it'll work out of the box.

That means that if you have Boost running successfully and can change your Varnish configuration (and install Varnish on some server), you can run Varnish, too.

How to Boost your site with Varnish

And here are the very simple steps to upgrade your site from Boost to Boosted Varnish.

1. Download Varnish configuration here: http://www.trellon.com/sites/default/files/boosted-varnish.vcl_.txt
2. Install and configure Boost (follow README.txt or see documentation on Boost project page)
3. Set Boost to aggressively set its Boost cookie
4. Set up Apache to listen on port 8080
5. Set up Varnish to listen on port 80
6. Replace default.vcl with boosted-varnish.vcl

Now we need to tweak the configuration a little:

There is a field in Boost where you can configure pages that should not be boosted. We want to make sure those pages don't cache in Varnish either.

In Boost this will just be a list like:

user
user/*
my-special-page

In Varnish we have to translate this list into a regular expression. Find the following block in the configuration and adapt the pattern:

##### BOOST CONFIG: Change this to your needs
       # Boost rules from boost configuration
       if (!(req.url ~ "^/(user$|user/|my-special-page)")) {
         unset req.http.Cookie;
       }
##### END BOOST CONFIG: Change this to your needs
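
The translation from the Boost list to a Varnish pattern is mechanical, so it can be scripted. The following is only a minimal sketch in Python and not part of boosted-varnish.vcl: the function name boost_paths_to_regex is made up for illustration, and it assumes that * is the only wildcard you use in the Boost list. It anchors exact paths with $, so my-special-page comes out slightly stricter than in the hand-written pattern above.

```python
import re

# Characters that need a backslash inside the regular expression.
_META = set(".^$+?()[]{}|\\")


def _escape(text):
    """Escape regex metacharacters, leaving '/', '-' and '_' untouched."""
    return "".join("\\" + ch if ch in _META else ch for ch in text)


def boost_paths_to_regex(paths):
    """Build one pattern for `req.url ~ "..."` from Boost-style paths.

    Hypothetical helper: '*' becomes '.*', a trailing '*' turns the entry
    into a prefix match, everything else is matched exactly.
    """
    alternatives = []
    for raw in paths:
        path = raw.strip().lstrip("/")
        if not path:
            continue
        pattern = ".*".join(_escape(part) for part in path.split("*"))
        if pattern.endswith(".*"):
            pattern = pattern[:-2]  # prefix match, no end anchor needed
        else:
            pattern += "$"          # exact path
        alternatives.append(pattern)
    if not alternatives:
        raise ValueError("no paths given")
    return "^/(" + "|".join(alternatives) + ")"


if __name__ == "__main__":
    regex = boost_paths_to_regex(["user", "user/*", "my-special-page"])
    print(regex)  # ^/(user$|user/|my-special-page$)
    for url in ["/user", "/user/login", "/my-special-page", "/node/1"]:
        print(url, bool(re.search(regex, url)))
```

Paste the printed pattern between the quotes of the req.url test and check a few URLs by hand before relying on it.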

And that's it. Now Varnish will cache all boosted pages for at least one hour and work exactly like Boost - only much faster and much more scalable.

On one site we worked on, the time for a page request under high load dropped from 4s to 0.17s.

The only caveat to be aware of here is that pages are cached for at least one hour, so there is an hour of delay until content appears for anonymous users. But this can be set to 5 min, too, and you'll still profit from the Varnish caching. In general this setting is similar to the Minimum Cache Lifetime setting found in Pressflow.

The code line to change in boosted-varnish.vcl is:

##### MINIMUM CACHE LIFETIME: Change this to your needs
    # Set how long Varnish will keep it
    set beresp.ttl = 1h;
##### END MINIMUM CACHE LIFETIME: Change this to your needs

Even 5 min of minimum caching time gives tremendous scalability improvements.

Actually, with this technique I can instantly make any site on the internet that runs Boost much, much faster: I just set the backend to the site's IP, set the hostname in the VCL, and my IP address will serve those pages. So you could even share one Varnish server instance for all of your sites and those of your friends, too. I experimented with EC2 micro instances and it worked, but for any serious site you should get at least a small one. I'll spare the details for another blog post, though - if there is interest in exploring this further.

How and Why it works

The idea of this configuration is quite simple.

Boost is a solution which works well with many, many contrib modules out of the box. With Varnish you need to use Pressflow or Drupal 7, and you need to make sure no contrib modules are opening sessions needlessly, which can be quite a hassle to track down. (Check out varnish_debug to make this task easier: http://drupal.org/sandbox/Fabianx/1259074)

But Boost's behavior and rules can be emulated in Varnish: wherever Boost would serve a static HTML page, Varnish can just as well serve a static object out of its cache.

The property that distinguishes boosted from non-boosted pages is the DRUPAL_UID cookie set by Boost.

The cookies (and with them the anonymous SESSION) are removed whenever Boost would have served a static HTML page. In that case Drupal would never have seen those cookies in the first place, so we can safely remove them.

The second thing to prevent Drupal from needlessly creating session after session is a very simple rule:

If a SESSION was not sent to the webserver, do not send a SESSION to the client. If a SESSION was sent to the webserver, return the SESSION to the client. So SESSION cookies will only be set on pages that are excluded from caching in Varnish, like the user/login pages or POST requests (e.g. forms). As Drupal then has the pre-existing SESSION cookie, it does not need to create a new SESSION.

To summarize those rules in a logic scheme:

# Logic is:
#
# * Assume: A cookie is set (we add __varnish=1 to client request to make this always true)
# * If boosted URL -> unset cookie
# * If backend not healthy -> unset cookie
# * If graphic or CSS file -> unset cookie
#
# Backend response:
#
# * If no (SESSION) cookie was sent in, don't allow a cookie to go out, because
#   this would overwrite the current SESSION.
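
To make the scheme easier to reason about, here is the same decision logic as a small Python sketch. It is an illustration and not a translation of boosted-varnish.vcl: the function names, the list of static file extensions, the treatment of non-GET requests and the "SESS" prefix check for Drupal's session cookie are all assumptions.

```python
import re

# Assumption: same pattern as in the BOOST CONFIG block of the VCL.
EXCLUDED = re.compile(r"^/(user$|user/|my-special-page)")
# Assumption: illustrative list of "graphic or CSS" style static files.
STATIC = re.compile(r"\.(png|gif|jpe?g|ico|css|js)(\?.*)?$", re.IGNORECASE)


def strip_request_cookies(url, cookie_header, method="GET",
                          backend_healthy=True):
    """Return True if the request should go on without any cookies."""
    if method not in ("GET", "HEAD"):
        return False  # assumption: POST requests (forms) are never cached
    if not backend_healthy:
        return True   # backend not healthy -> unset cookie
    if STATIC.search(url):
        return True   # graphic or CSS file -> unset cookie
    # Boost only serves static HTML when no DRUPAL_UID cookie is present.
    boost_would_serve = "DRUPAL_UID=" not in cookie_header
    return boost_would_serve and not EXCLUDED.search(url)


def strip_response_cookies(forwarded_cookie_header):
    """Return True if Set-Cookie must be removed from the backend response.

    If no SESSION cookie reached the webserver, none may go back out.
    """
    return "SESS" not in forwarded_cookie_header


if __name__ == "__main__":
    print(strip_request_cookies("/node/1", "SESSabc=1; __varnish=1"))
    print(strip_request_cookies("/user/login", "SESSabc=1"))
    print(strip_response_cookies(""))
```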

Why Boost and Varnish?

Now the question that could come up is: If I have Varnish, why would I need Boost anymore?

Boost has some very advanced expiration characteristics, which can be used to keep a current, permanent cache of the site's pages on disk.

This can help pre-warm the varnish cache in case of a varnish restart. But as it turns out, you can use the stock .htaccess and boosted varnish will still work - as long as the DRUPAL_UID cookie is set. It might be possible as further work to just write a contrib module doing exactly that.

But Boost can also be really helpful in this special configuration as you can set your Varnish cache to a minimum lifetime of - for example - 10 min. And instead of Drupal being hit every 10 min, Apache is just happily serving the static HTML page Boost had created until it expires.

The advantage of that is:

If there are 1000 requests to your front page, Apache will be hit just once and Varnish will serve the cached page to the 1000 clients. So instead of serving 1000 page requests, Apache just has to serve one every 10 min. Multiply that by page assets like images, CSS and JS files and you get some big savings in traffic going to Apache.
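
The arithmetic behind this can be written down in a few lines. This is a rough sketch under simplifying assumptions: a single URL, a window that starts with a cache miss, no purges and no Varnish restarts. The function name backend_fetches is made up for illustration.

```python
import math


def backend_fetches(requests, window_seconds, ttl_seconds):
    """Approximate number of requests for ONE URL that reach Apache.

    Assumes the window starts with a cache miss and the object is
    re-fetched once each time its TTL has run out.
    """
    if requests <= 0:
        return 0
    if ttl_seconds <= 0:
        return requests  # nothing is cached, every request hits Apache
    refreshes = math.ceil(window_seconds / ttl_seconds)
    return min(requests, refreshes)


if __name__ == "__main__":
    # 1000 front page requests within 10 min, 10 min cache lifetime
    print(backend_fetches(1000, 600, 600))    # 1
    # the same traffic over one hour
    print(backend_fetches(1000, 3600, 600))   # 6
    # one hour with a 5 min cache lifetime
    print(backend_fetches(1000, 3600, 300))   # 12
```

Even the 5 min setting keeps Apache at a dozen fetches per hour for that URL, no matter how many clients ask for it.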

Conclusion

Varnish is a great technology, but it has been difficult to configure and there are lots of caveats to think of (especially with Drupal 6). This blog post introduced a new technique called Boosted Varnish, which lets Varnish work with every page served by the Boost module: pages are temporarily added to Varnish's active working set and refreshed regularly from the permanent Boost cache on disk. It is not aimed at those who already run high-performance Drupal sites on the Mercury stack, but at those who use Boost and want to make their site faster by adding Varnish in front of it without having to worry about Varnish specifics.

I created a sandbox project where you can file any issues related to the configuration:

http://drupal.org/sandbox/Fabianx/1259052

Have fun with the configuration and I am happy to hear from you or see you tomorrow at my BoF session at DrupalCon London!

Attachment: boosted-varnish.vcl_.txt (10.74 KB)

38 Comments

Incredible. This would have

Incredible. This would have made my life so much easier if I knew it a year ago.

Thanks! Yes, mine too ;-).

Thanks! Yes, mine too ;-). That is why I had created it.

Best Regards,

Fabian Franz

Bookmarked :) Excellent

Bookmarked :)

Excellent article. Thanks!

Only one question: given that no release has been tagged yet in the Boost 7.x-dev branch, can this technique be used for Drupal 7 sites as well? The state of the Drupal 7 port of Boost is quite unclear (and the development snapshot hasn't been updated since April 4 — well over 4 months ago).

Hi Wim, It is great to hear

Hi Wim,

It is great to hear from you. CDN is a great module!

Yeah, the state of boost-7.x unfortunately is not as good as I would like it to be.

I believe however that I can create a simple module (in my sandbox) to just set the UID cookie when logged in and delete it when logged out / logging out.

That is all that Boosted Varnish at that point really needs. It would of course rely heavily on the drupal page cache, but should still give significant savings by caching frequently accessed pages in Varnish.

Best Regards,

Fabian Franz

awesome solution for

awesome solution for anonymous traffic-- but is there nothing similar for authenticated traffic?

Yes, there are several

Yes, there are several possible approaches (and we talked about most of them at my BoF session):

In general you will want to make sure to have: APC and Memcache installed and that all your static assets (images, CSS, JS) are either served by a CDN or Varnish.

* Block Cache: This can be cached on a per-user basis and can already bring some savings.
* ESI: Edge-Side Includes - Install the esi module and let Varnish include the content by splitting it into a user-specific and an anonymous part. Needs proper planning.
* AJAX: Same idea as ESI, but uses JS to do it. (See the g.d.o High Performance Group for an announcement of a sandbox using something like that.)
* Authorize + Retry: Have Varnish connect to a Node.js server, get the session uid and a secret authorize hash combined with that, set a flag in the header, do a retry and now get the page from Drupal and cache it in the hash specified by req.http.host + req.http.url + my-secret-cookie.

While the last means that every page is cached for each user, this can still be a viable solution for pages with a rather small working set.
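
To illustrate the last approach, here is a hedged sketch of how such a per-user cache key could be built. It only shows the idea: HMAC-SHA256 as the "secret authorize hash" is an assumption, and the names authorize_hash and cache_key are hypothetical, not taken from any existing module or from the VCL.

```python
import hashlib
import hmac


def authorize_hash(uid, secret):
    """Combine the session uid with a server-side secret (assumed: HMAC-SHA256)."""
    message = str(uid).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def cache_key(host, url, uid, secret):
    """Mirror req.http.host + req.http.url + my-secret-cookie."""
    return host + url + authorize_hash(uid, secret)


if __name__ == "__main__":
    secret = b"change-me"  # placeholder, never hard-code a real secret
    print(cache_key("example.com", "/node/1", 5, secret))
    print(cache_key("example.com", "/node/1", 6, secret))
```

Because the uid is part of the key, two users never share a cached object, which is exactly why the working set grows with the number of active users.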

Combine this with some data just saved as cookies (or got in via AJAX) and you already have powerful caching in place.

I hope this helps.

Best Regards,

Fabian Franz

Excellent post! But I wonder,

Excellent post! But I wonder, isn't the Expire module the answer in the longer term? It is the work of the Boost's maintainer, mikeytown2, "ripping the boost expiration code out so other modules can use it". It already integrates with the Varnish module. What do you think?

Hi Tomas, It absolutely is,

Hi Tomas,

It absolutely is, but I believe there is still a use case for Boosted Varnish now. And even Boosted Varnish can be combined with the expire module.

Still, Boost has set some very simple rules for when to serve a page out of the cache and when to pass it through to Drupal, and those are emulated by Varnish.

Actually I am really looking forward to expire for 7.x and to adding a module for Boost 6.x, which will hook into the expiration function of Boost and purge Varnish via an HTTP PURGE request from Drupal.

So this is again not mutually exclusive.

I hope this answers your question.

Best Regards,

Fabian Franz

This is similar to how we run

This is similar to how we run at my work. Except we use a "notification engine" to re-crawl all expired pages from the boost_cache table. Varnish has a lifetime of 10 days and breaking news hits the front page in under a minute. It's still fairly buggy but it does work (most of the time). My next project after getting ESI going is to work on the expire module and make it plug into boost, varnish, nginx, etc. Aiming for spring of 2012... BUT I need to get ESI working so I don't miss something in expire.

If you're looking for a project timeline, you can see when I stopped working on boost
http://drupalcode.org/project/boost.git/shortlog/refs/heads/6.x-1.x
and when I started working on advagg
http://drupalcode.org/project/advagg.git/shortlog/refs/heads/6.x-1.x?pg=3
ESI is next as that's what we need next here @ my work.

BTW if you haven't checked out AdvAgg I highly recommend it.
http://drupal.org/project/advagg

Howdy Fabian, Thank you for

Howdy Fabian,

Thank you for this! With my limited knowledge, I wouldn't have been able to tackle varnish without it. Now, I am successfully running boosted varnish on d6 and it is boatloads quicker than just boost.

One question on your vcl; How would I modify the following section to allow for a single outside IP to access cron?

# Do not allow outside access to cron.php or install.php.
if (req.url ~ "^/(cron|install)\.php$" && !client.ip ~ internal) {
  # Have Varnish throw the error directly.
  error 404 "Page not found.";
  # Use a custom error page that you've defined in Drupal at the path "404".
  # set req.url = "/404";
}

I attempted a couple different mods as well as commenting it out and I couldn't get it to work. Any insight would be most appreciated!

Thank you and cheers!
Isaac

Hi Isaac, The solution is

Hi Isaac,

The solution is much easier:

At the very top you find:

acl internal {
  "127.0.0.1";
}

You can change this to read:

acl internal {
  "127.0.0.1";
  "81.25.16.73";
}

or whatever your other external IP is that needs to access cron.

As an explanation:

client.ip ~ internal

checks if the accessing ip is in the "acl" group called "internal".
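
For readers less familiar with VCL, the effect of the acl plus the cron/install rule can be mimicked in a few lines of Python. This is only a sketch of the logic: the names in_acl and blocks_request are made up, and 81.25.16.73 is the example address from above.

```python
import ipaddress
import re

# Entries of the "internal" acl; single hosts or CIDR ranges.
INTERNAL = ["127.0.0.1", "81.25.16.73"]
PROTECTED = re.compile(r"^/(cron|install)\.php$")


def in_acl(client_ip, acl=INTERNAL):
    """Equivalent in spirit to `client.ip ~ internal`."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(entry) for entry in acl)


def blocks_request(url, client_ip):
    """True where the VCL rule would answer with the 404 error."""
    return bool(PROTECTED.search(url)) and not in_acl(client_ip)


if __name__ == "__main__":
    print(blocks_request("/cron.php", "81.25.16.73"))   # False
    print(blocks_request("/cron.php", "203.0.113.9"))   # True
    print(blocks_request("/node/1", "203.0.113.9"))     # False
```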

I hope that helps and I am glad "Boosted Varnish" works well for you!

Best Regards,

Fabian Franz

Thank you; this did the trick

Thank you; this did the trick and is most appreciated. Boosted Varnish rocks!

Cheers!
Isaac

Fabian, When trying to run

Fabian,

When trying to run varnishd with the following (with boosted-varnish as above) -

varnishd -f /etc/varnish/boosted-varnish.vcl -s malloc,512G -T 127.0.0.1:2000 -a 0.0.0.0:80

I get the following error -

Message from VCC-compiler:
Expected ';' got 'req.http.Cookie'
(program line 174), at
('input' Line 45 Pos 41)
  set req.http.Cookie = ";__varnish=1;" req.http.Cookie;
----------------------------------------###############-

Running VCC-compiler failed, exit 1

I'm running Varnish 3.0.1 and assume that this might have something to do with a change in concatenation requirements, but I've yet to figure out how to fix it.

Is there a recommended varnish max version to use with the boosted-varnish.vcl?

Thanks for an informative BOF!

William

Hi Elder Brother, Boosted

Hi Elder Brother,

Boosted Varnish was tested with Varnish 2.1. To run it with either Varnish 2.0 or Varnish 3.0.1, some changes need to be made.

For 2.0, beresp was still called obj, so all occurrences would need to be replaced.

For 3.0.1 I would refer you to the documentation on upgrading to 3.0:

https://www.varnish-cache.org/docs/3.0/installation/upgrade.html

In particular, string concatenation needs an added "+" between the parts (the failing line becomes set req.http.Cookie = ";__varnish=1;" + req.http.Cookie;), and beresp.cacheable should be replaced with beresp.ttl > 0s.

Please let me know if that helps you.

Thank you very much.

Best Regards,

Fabian

Hi Fabian, I took my lead

Hi Fabian,

I took my lead from the default vcl for amending the concatenated lines, there are quite a few!

Also needed to change one instance of 'restart' to 'return(restart)'

Now get the following, so I presume Varnish is up and running -

# varnishd -f /etc/varnish/boosted-varnish.vcl -s malloc,512M -T 127.0.0.1:2000 -a 0.0.0.0:80
SMA.s0: max size 512 MB.
SHMFILE owned by running varnishd master (pid=13955)
(Use unique -n arguments if you want multiple instances.)

Now to work out why it isn't delivering my site on port 80... ;)

Thanks!

Could you post your

Could you post your default.vcl for Varnish 3? I have grown tired of this update work.

Thank you very much
Ryan

Sorry if there is an obvious

Sorry if there is an obvious implicit answer to this -

What caching mode should be used in Pressflow when using both Boost and Varnish?

Thanks.

Thx for this awesome

Thanks for this awesome documentation. I am building a video portal and ran into some problems with the Varnish caching. For Drupal 7 it is necessary to add

&& !(req.http.Cookie ~ "session_api_session=") to the VCL to get the correct page. Maybe you can extend your documentation for D7 users.

  # So it is safe to remove all cookies for the boosted paths.
  if (!(req.http.Cookie ~ "DRUPAL_UID=") && !(req.http.Cookie ~ "session_api_session=")) {

Another problem: when I use the FileField upload progress bar, an error always occurs and I can no longer upload files. If I go through the Apache port directly (the one I had to change during the Varnish installation), it works. Could something be missing there as well? Any suggestions are welcome, because Google doesn't give me any results for this problem. This default.vcl is the best standard configuration I have found for Drupal 6 and 7. Thanks for sharing your work with us.

Greetings Bronco

Since this article is almost

Since this article is almost 6 months old things might have progressed.

I would like to know what the best way is to setup a fast Drupal 7 server.

Is it still Varnish & Boost as mentioned (I'm still looking for a good VCL for D7),
or am I better off with the combination of Boost, APC & Memcache, which is slightly easier to install?

Most of my visitors are anonymous.

In addition, will I have any drawbacks with one or the other caching method if I want to use Apache Solr 3?

Looking forward to your professional input.

Varnish will always trump

Varnish will always trump (standalone) Boost in terms of performance and given your users are mostly anonymous, I would highly recommend this path.

Boost, apc and memcache are easier to install... but they won't provide as big a performance improvement as varnish. If you don't *need* that much performance, then it might not be worth the effort, but if you are going to see some load, varnish is the hands down winner.

Solr searching won't be cached by boost, nor a proxy. So it's advisable to use memcache and apc to speed up these page loads.

Thanks for the input. Does

Thanks for the input.

Does that mean I would use all 4 types of caching in combination, or is it better to use the three stated below?

Varnish (front-end users) + APC + Memcache (back-end usage + Apache Solr)?

If so would you have a suggestion where to get started?

The reasoning given in this

The reasoning given in this post for using Boost + Varnish over stand alone Varnish is that it gives you an additional layer of caching. I.e. If Varnish needs to be restarted and you still have pages cached by Boost, then you're not taxing your server nearly as much by serving html pages with Apache once (which will subsequently be cached by Varnish). Otherwise your entire caching layer is gone (other than Drupal page cache, which is slower than Boost - still bootstrapping Drupal/PHP). For many sites this is overkill, but could keep your site running through an intensely busy period if you needed to make a change to Varnish during that time for example. The disadvantage is you add another layer to the stack, so you're adding complexity (albeit not that much).

Getting started? All of the related modules that support these services come with good documentation (README.txt and INSTALL.txt). Using Ubuntu Server makes installing the services pretty straightforward:

sudo apt-get install varnish memcached php-apc

However, they'll need some configuration for your setup.

Excellent article. I am

Excellent article.

I am attempting to get this working with Drupal 6. I have Boost working great: it's caching pages as static HTML and writing the boost to the page source.

Varnish is caching all static content images/css etc. I get a HIT on all direct URLs to static content.

My problem is that varnish is giving a MISS on all boosted pages. (clean URLS is enabled).

www.homepage.co.uk
www.homepages.co.uk/menu1/
www.homepages.co.uk/menu2/

etc. these all show a MISS in the headers with varnish debug enabled.

Is there a way I can force varnish to cache the boosted pages as well as static content? I could check for "X-Header: Boost Citrus 1.9" and force it to cache any page with this header or is there a better way?

Thanks,

Could you please check what

Could you please check what Varnish is saying as X-Varnish-Debug-TTL?

For pages that are included in Varnish-Cache I see:

X-Varnish-Debug-TTL: 3600.000

For those that are not using it (and excluded for example) I see:

X-Varnish-Debug-TTL: 120.027

Please also make sure that you are logged out.

An easy way to test headers is also with Curl:

curl -I http://www.homepage.co.uk/ # once to get cache warm

curl -I http://www.homepage.co.uk/ # twice, now you should get a HIT Varnish (1)
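
The same check can be scripted if you want to test many URLs. The sketch below sends a HEAD request and interprets the X-Varnish-Debug-TTL header using only the values quoted in this thread (3600 for cached pages, about 120 for excluded ones). The function names and the three-way classification are assumptions made for illustration.

```python
import urllib.request


def fetch_headers(url):
    """Send a HEAD request, like `curl -I`, and return the response headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())


def debug_ttl(headers):
    """Return X-Varnish-Debug-TTL as a float, or None if it is missing."""
    for name, value in headers.items():
        if name.lower() == "x-varnish-debug-ttl":
            return float(value)
    return None


def describe(headers, minimum_ttl=3600.0):
    """Rough interpretation of the debug TTL (assumed thresholds)."""
    ttl = debug_ttl(headers)
    if ttl is None:
        return "no X-Varnish-Debug-TTL header"
    if ttl >= minimum_ttl:
        return "cached with the configured minimum lifetime"
    if ttl > 0:
        return "shorter TTL, probably an excluded page"
    return "not cacheable"


if __name__ == "__main__":
    # Needs network access; replace the URL with your own site.
    print(describe(fetch_headers("http://www.homepage.co.uk/")))
```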

Also make sure that you are using the configuration attached to this blog post.

Good Luck!

- Fabian

For the missed pages I

For the missed pages I get:

X-Varnish-Debug-TTL: -1.000

NOT logged in. And repeated curl -I requests still show MISS. I am using the config included.

That is really interesting

That is really interesting and very strange.

I have never seen that before.

Maybe it's a problem with a newer Varnish version?

I suspect that cacheable is wrong. You might want to look into the log module for Varnish and check whether anything shows up in syslog for the line that sets the magicmarker ...

Good luck!

- Fabian

I am also using :

I am also using : https://github.com/perusio/drupal-with-nginx

perhaps there is a conflict with how boost is setup with nginx and varnish.

X-Varnish-Debug-Age 0 Could

X-Varnish-Debug-Age 0
Could this be drupal/boost setting the max age of pages to zero, thus the boosted pages are never getting cached in varnish?

ah now I am getting

ah now I am getting somewhere...my nginx + boost config is setting Expires header to 1977..

location = / {
    add_header Expires "Tue, 13 Jun 1977 03:45:00 GMT";
    # We bypass all delays in the post-check and pre-check
    # parameters of Cache-Control. Both set to 0.
    add_header Cache-Control "must-revalidate, post-check=0, pre-check=0";
    # Funny...perhaps. Egocentric? Damn right!;
    add_header X-Header "Boost Citrus 1.9";

    try_files /cache/normal/$host/_${args}.html /cache/perm/$host/_.css /cache/perm/$host/_.js /cache/$host/0/.html /cache/$host/0/index.html /index.php;
}

What do you suggest changing the headers to, so that the HTML gets cached in Varnish as well?

Thanks!
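
Under plain HTTP caching rules, when a response carries no max-age its freshness lifetime is the Expires value minus the Date value, so an Expires header set to 1977 makes the page stale on arrival. The sketch below just does that subtraction. It is a likely explanation for the negative TTL reported above, not a confirmed diagnosis, and the Date value used is a hypothetical example.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(expires, date):
    """Seconds a cache may keep the response: Expires minus Date."""
    delta = parsedate_to_datetime(expires) - parsedate_to_datetime(date)
    return delta.total_seconds()


if __name__ == "__main__":
    expires = "Tue, 13 Jun 1977 03:45:00 GMT"   # from the nginx config
    date = "Wed, 24 Aug 2011 10:00:00 GMT"      # hypothetical response date
    print(freshness_lifetime(expires, date) < 0)  # True
```

Any negative result means a cache that honors these headers has nothing it is allowed to keep.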

Varnish should work with

Varnish should work with those Boost Headers just fine.

What I would try first is:

* Remove NGINX from the mix and use Apache (with or without the Boost .htaccess rules). That will show you whether it's a problem with Drupal or the Varnish configuration.
* Add NGINX, do not add any headers and push everything to Drupal.
* Re-Add headers, but push everything to Drupal.
* Re-add the boost static caching code.

That is how I would go about solving this problem.

Best regards,

Fabian

and again: # Now for some

and again:

# Now for some header tweaking. We use a date that differs
# from stock Drupal. Everyone seems to be using their
# birthdate. Why go against the grain?
add_header Expires "Tue, 13 Jun 1977 03:45:00 GMT";
# We bypass all delays in the post-check and pre-check
# parameters of Cache-Control. Both set to 0.
add_header Cache-Control "must-revalidate, post-check=0, pre-check=0";
# Funny...perhaps. Egocentric? Damn right!;
add_header X-Header "Boost Citrus 1.9";

# We try each boost URI in succession, if every one of them
# fails then relay to Drupal.
try_files /cache/normal/$host${uri}_${args}.html /cache/perm/$host${uri}_.css /cache/perm/$host${uri}_.js /cache/$host/0$uri.html /cache/$host/0${uri}/index.html /index.php?q=$no_slash_uri&$args;
}

# We try each boost URI in

# We try each boost URI in succession, if every one of them
# fails then relay to Drupal.

Doesn't Apache take care of this automatically if Varnish hits Apache?

Has anyone got this working

Has anyone got this working with Drupal 7, Varnish 3, and Boost?

I've installed boost and when I go to the status report page (/admin/reports/status) I see "Boost will not function properly while Drupal core cache is enabled. Disable Boost or the core cache." So I disabled the core cache.

When I do that I notice the following HTTP header being set:
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0

Re-enable the core cache and the headers read:
Cache-Control: public, max-age=900

So essentially boost insists on core cache being disabled which sets "no-cache" essentially preventing varnish from caching.

Or you leave core cache enabled, the cache control headers are set to public, but then boost doesn't work.

How do I get varnish + boost + the cache control being set to public? Do I have to hack core and/or boost?

Thanks
Brad

How Can I warmup the cache

How can I warm up the cache (automatically preload it before it expires) rather than waiting for the first user access? Please guide me!

You should look into the

You should look into the "Boost crawler" settings, which essentially loads a set of pages during a cron run. There's lots of options there, good luck!

After lot of research I don't

After a lot of research I don't see this method as something worth trying. I wrote a lot about this here http://drupaldump.com/boost-and-varnish-and-general-tips-and-rants-about... but to sum it up, you don't gain anything by using boost.

Varnish is much less reliable

Varnish is much less reliable a cache than static files on disk, but it's also much faster. The advantage of boost+varnish is that with a warm boost cache, server load will be less during a varnish restart. That said, not all use cases will benefit from this configuration. It's really up to you to test and find what is best for your use case.
