Oct 17

Drupal and Varnish HTTP accelerator

For many reasons, it's a good idea to have an HTTP accelerator/reverse proxy on your webserver to take some of the burden of HTTP handling and caching away from Apache and PHP. Squid is most commonly used for this purpose, but after hearing about fellow Dane Poul-Henning Kamp's creation, Varnish, I decided to try that out before messing with that many-armed monster, Squid.

And I was very pleasantly surprised. It took me all of 20 minutes to get it working, and the greater part of that was changing all my VHost files to use a different port number, as Varnish is now taking port 80.

If you're reading this, it was served through Varnish. Drupal appears to have no issues at all with being accessed through Varnish. All my sites work as they ought to.

All I did was:

  1. Grab the latest release .debs for Ubuntu Hardy 64-bit
  2. Install them on my webserver
  3. Uncomment the backend default setting in /etc/varnish/default.vcl
  4. Change the port numbers in /etc/default/varnish
  5. Change the port numbers in /etc/apache2/ports.conf
  6. Restart Apache and Varnish
  7. Profit!
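
As a rough illustration of steps 3 to 5, the three files might end up looking something like the sketch below. Port 8080 for Apache is an assumption (the post does not say which port was chosen), the storage path and size are illustrative, and the defaults shipped in the .debs may differ. The backend syntax shown is the one used from Varnish 2.0 onwards.

```conf
# /etc/varnish/default.vcl: point Varnish at Apache (port 8080 is assumed)
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# /etc/default/varnish: make varnishd listen on port 80
# (-a listen address, -T management interface, -f VCL file, -s storage)
DAEMON_OPTS="-a :80 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -s file,/var/lib/varnish/varnish_storage.bin,1G"

# Apache's ports.conf: move Apache off port 80
Listen 8080
```

Every VirtualHost that was bound to port 80 then needs the new port as well, which is the VHost editing mentioned earlier.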

I did run some tests before step 6 to make sure that everything was working, but I didn't have to change anything. Varnish worked as it should out of the box, and with very little labour, my webserver is now much better prepared if I should ever get Digg'd.

This is of course part of a defense in depth, so I still have APC cache running as well, and some other tricks up my sleeve, should everything go crazy.

But I'd just like to take the opportunity to say thank you, phk and the rest of the Varnish team, for making my life easier.

  • tom
    Oct. 18, 2008

    I hate to break it to you, but Varnish is not doing anything for you. By default, Varnish will not cache any object when the request contains a cookie. Since Drupal sets session cookies which are present for all requests to your server, Varnish isn't caching a thing. Also, because Drupal sets Expires headers in the past and Cache-Control headers, the most expensive thing you're serving as far as total CPU cycles, queries and I/O (pages generated by Drupal to anon visitors) won't be cached.
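
    For reference, the cookie rule described here corresponds to a check in Varnish's built-in vcl_recv. Paraphrased in VCL 4.0 syntax (which is newer than the release discussed in this post, so treat the exact wording as an assumption), it amounts to:

    ```vcl
    sub vcl_recv {
        if (req.http.Authorization || req.http.Cookie) {
            # Not cacheable by default: send the request straight to the backend.
            return (pass);
        }
    }
    ```

    Since Drupal hands every visitor a session cookie, nearly every request takes this branch.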

    I've recently gone through the process of getting Drupal to play nice with Akamai, and the process for Varnish is very similar. To really make Varnish work, you need to hack bootstrap.inc and index.php, add a custom module and change your Varnish configuration.

    In index.php, after $return = menu_execute_active_handler(); add:

        <?php
        global $user;
        if (!$user->uid && count(drupal_get_messages(NULL, FALSE)) != 0) {
          header("Expires: Sun, 19 Nov 1978 05:00:00 GMT");
          header("Cache-Control: no-cache, no-store");
        }
        ?>

    This will make sure that pages with messages served to anonymous users will not be cached, or you'd end up serving article pages to anon users with "Your comment has been queued...". Expires is set to Dries's birthday, as is default for Drupal.

    On my site, I've disabled page cache, leaving anonymous caching to our cache (Akamai in my case), so all I have to worry about is drupal_page_header() in bootstrap.inc, which now looks like this:

        <?php
        function drupal_page_header() {
          //header("Expires: Sun, 19 Nov 1978 05:00:00 GMT");
          header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
          //header("Cache-Control: store, no-cache, must-revalidate");
          //header("Cache-Control: post-check=0, pre-check=0", FALSE);
        }
        ?>

    If you're running Drupal 5, which doesn't have D6's ip_address() function, host-based access rules, poll voting and the like won't work without some tweaks. I've added a function to bootstrap.inc that replaces $_SERVER['REMOTE_ADDR'], and added a call to it in _drupal_bootstrap() in the DRUPAL_BOOTSTRAP_CONFIGURATION case. Replacing $_SERVER['REMOTE_ADDR'] is a bit of a hack, but it is far easier than patching every occurrence of it in all of core with a function call.
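
    The function itself is not shown here. A minimal sketch of what such a helper could look like follows; it is not the commenter's code. The name varnish_client_ip(), the trusted proxy list and the assumption that the proxy passes the visitor's address in an X-Forwarded-For header are all hypothetical.

    ```php
    <?php
    /**
     * Hypothetical helper (not part of Drupal 5 core): work out the visitor's
     * address when requests arrive via a reverse proxy.
     *
     * $server is normally $_SERVER; $trusted_proxies lists the addresses the
     * proxy connects from (127.0.0.1 assumes the proxy runs on the same host).
     */
    function varnish_client_ip(array $server, array $trusted_proxies = array('127.0.0.1')) {
      $remote = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '';
      // Only believe X-Forwarded-For when the request really came from our proxy.
      if (in_array($remote, $trusted_proxies, TRUE) && !empty($server['HTTP_X_FORWARDED_FOR'])) {
        // The header may list several hops; the last one was added by our proxy.
        $hops = array_map('trim', explode(',', $server['HTTP_X_FORWARDED_FOR']));
        return end($hops);
      }
      return $remote;
    }

    // Called early in the bootstrap, before any code reads REMOTE_ADDR.
    $_SERVER['REMOTE_ADDR'] = varnish_client_ip($_SERVER);
    ```

    Taking the last entry of the header rather than the first means a visitor cannot spoof an address simply by sending an X-Forwarded-For header of their own.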

    The last bit of Drupal code is a module to set a cookie so that your cache/proxy can differentiate anonymous from logged in users. Both Akamai (with their advanced configuration package) and Varnish can pass traffic through to the origin and not cache it based on the presence of a cookie.

        <?php
        /**
         * Implementation of hook_user().
         * Set/unset nocache cookie on login/logout.
         */
        function varnish_user($op, &$edit, &$account, $category = NULL) {
          switch ($op) {
            case 'login':
              setcookie('nocache', 1, time() + 31536000, '/');
              break;
            case 'logout':
              setcookie('nocache', '', time() - 31536000, '/');
              break;
          }
        }
        ?>

    Finally, you're going to need to tell Varnish to cache objects with cookies, except when the nocache cookie is present (unless the file extension is js, css, jpg, jpeg, gif, png, etc.). As I don't have any experience with Varnish's VCL configuration language, and the documentation seems a little sparse, I'll leave this up to you, but I have found a couple of places to start:

    http://varnish.projects.linpro.no/wiki/VCLExampleCacheCookies
    http://wiki.developer.mindtouch.com/User:PeteE/Varnish_Installation

    There, now Varnish is serving the most expensive portion of your requests, the parts generated by Drupal, and is also serving your js, css and images. You'll probably also want to set TTLs for the various bits you're serving - I've got images and flash at 7 days, css and js at 6 hours and pages at 10 minutes.
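
    The cookie rule and the TTLs described above could be sketched in VCL roughly as follows. This is an untested illustration written in VCL 4.0 syntax, which is newer than the Varnish release discussed in this post; the backend port, the list of file extensions and the cookie regex are assumptions.

    ```vcl
    vcl 4.0;

    backend default {
        .host = "127.0.0.1";
        .port = "8080";    # assumed Apache port
    }

    sub vcl_recv {
        # Only plain GET and HEAD requests are candidates for caching.
        if ((req.method != "GET" && req.method != "HEAD") || req.http.Authorization) {
            return (pass);
        }
        # Static files: cache regardless of cookies.
        if (req.url ~ "\.(js|css|jpg|jpeg|gif|png|swf)(\?.*)?$") {
            unset req.http.Cookie;
            return (hash);
        }
        # Logged-in users carry the nocache cookie, so they bypass the cache.
        if (req.http.Cookie ~ "(^|;\s*)nocache=") {
            return (pass);
        }
        # Everyone else is anonymous: look the page up despite any session cookie.
        return (hash);
    }

    sub vcl_backend_response {
        if (bereq.uncacheable) {
            return (deliver);
        }
        if (bereq.url ~ "\.(jpg|jpeg|gif|png|swf)(\?.*)?$") {
            set beresp.ttl = 7d;     # images and flash
        } elsif (bereq.url ~ "\.(js|css)(\?.*)?$") {
            set beresp.ttl = 6h;     # css and js
        } else {
            set beresp.ttl = 10m;    # pages
        }
        # The built-in logic runs after this and still refuses to cache
        # responses that carry Set-Cookie or no-cache/no-store headers.
    }
    ```

    Because the built-in checks still apply, pages that Drupal marks as no-cache (such as those showing messages to anonymous users) stay out of the cache.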

  • Oct. 18, 2008

    Drupal needs to change, but it is not easy ...

    Currently Drupal does not send headers that are cache friendly. Each request needs to be revalidated with the server regardless.

    At best, it will cache css, js, and media files only, but not the HTML pages themselves.

    Having the correct headers can be very beneficial to Drupal. See our article "Increasing Drupal's speed using Squid Caching Reverse Proxy".

    So Drupal needs HTTP header changes in order to support any caching proxy, whether it is Squid or Varnish or something else.

    The patch used in the article is #147310, and it is stalled because of the impact on intermediate proxies. It works great for users who are not behind a proxy, but once a user is behind a proxy, that proxy will cache its own copy and things will not work when users try to log in, and so on.

    Tom's solution above will also suffer from the same limitation.

  • mikkel
    Oct. 18, 2008

    Ah, why can such things never be easy ;)

    Well, at least Varnish was easy to configure, now we just need to fix Drupal. I may have an idea, but I'm going to post it at bug #147310 so that others may benefit as well :)

  • tom
    Oct. 18, 2008

    We've got Akamai substituting cachebusting headers for paths not matching "/files* /sites* /misc* /modules* /themes*", which takes care of the issue of forward proxies between the client and reverse proxy. It is a fairly trivial configuration change on most reverse proxies to rewrite Cache-Control and Expires headers, and makes sure the client (through the forward proxy) is hitting the reverse proxy every time.
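
    On Varnish, a comparable rewrite might look like the sketch below. It is written in VCL 4.0 syntax (newer than the release discussed in this post), and the header values are assumptions rather than the actual Akamai settings.

    ```vcl
    sub vcl_deliver {
        # Anything outside the static paths gets cachebusting headers on the
        # way out, so forward proxies and browsers come back every time.
        if (req.url !~ "^/(files|sites|misc|modules|themes)") {
            set resp.http.Cache-Control = "no-cache, no-store, must-revalidate";
            set resp.http.Expires = "Sun, 19 Nov 1978 05:00:00 GMT";
        }
    }
    ```

    The headers are only changed on delivery, so the copy held by the reverse proxy keeps whatever lifetime it was given when it was fetched.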

About

I am Mikkel Høgh. I have been a web developer for about 8 years. I have my own company, Reveal IT, along with a couple of friends. We specialize in helping our customers build awesome web sites with open source tools like Drupal and Django.
