
Jordi Boggiano

Jordi Boggiano: passionate web developer, specialized in web performance and PHP. Partner at Nelmio, information junkie and speaker.


ESI - Full page caching with Symfony2

Launched about a month ago, techup.ch runs on the Symfony2 PHP framework, which is still under heavy development but is already a great framework.

Full page caching basics

Don't get me wrong: the framework is fast. Our fairly modest server renders pages in 40-50ms on average, so it hardly needs optimization. However, I still wanted to squeeze more speed out of it, and also get a chance to play with cool stuff, so I decided to implement full page caching with ESI in the application.

The way this works is that you install a reverse proxy such as Varnish, which sits between the web and your HTTP server. More complex setups might put another HTTP server in front of Varnish to gzip the output, but I won't go into that in this post. The reverse proxy caches the output of your application for as long as your Cache-Control header specifies. Once a page is cached, the proxy returns it directly to clients without ever hitting your HTTP server, PHP, or your application. Needless to say, this is ideal for performance.

Symfony2 is a great match for this type of cache because it supports it natively, as I'll show, and it also implements a reverse proxy layer in PHP that can be used for development or on hosting where you don't have access to Varnish. It behaves the same way and is automatically turned off if an ESI-capable proxy is added in front of PHP.
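To make "for as long as your Cache-Control header specifies" concrete, here is a minimal sketch, in modern plain PHP rather than anything Symfony or Varnish ships, of how a shared cache could derive a lifetime from that header. It assumes only Cache-Control matters and ignores Expires, Age and heuristic freshness; the function name sharedCacheTtl is hypothetical.

```php
<?php
/**
 * Sketch: how many seconds a shared cache (a reverse proxy such as
 * Varnish) may serve a response without asking the application again,
 * judged from its Cache-Control header alone. Expires, Age and
 * heuristic freshness are deliberately ignored.
 */
function sharedCacheTtl(string $cacheControl): int
{
    $directives = [];
    foreach (explode(',', $cacheControl) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue;
        }
        // "name=value" pairs; bare directives such as "public" map to null.
        $pieces = explode('=', $part, 2);
        $directives[strtolower(trim($pieces[0]))] = isset($pieces[1]) ? trim($pieces[1], " \"") : null;
    }

    // A shared cache must not store private/no-store responses, and
    // no-cache forces revalidation, so none of them is served as fresh.
    foreach (['private', 'no-store', 'no-cache'] as $blocker) {
        if (array_key_exists($blocker, $directives)) {
            return 0;
        }
    }

    // For shared caches, s-maxage takes precedence over max-age.
    foreach (['s-maxage', 'max-age'] as $name) {
        if (isset($directives[$name]) && preg_match('/^\d+$/', $directives[$name]) === 1) {
            return (int) $directives[$name];
        }
    }
    return 0;
}
```

For example, a header of `max-age=0, public, s-maxage=36000` lets the proxy keep the page for 36000 seconds while telling browsers not to reuse it.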

ESI awesomeness

Of course, the issue with caching the entire output is that most sites have areas with dynamic content, especially when users are logged in. This is where ESI comes into play. ESI stands for Edge Side Includes, a specification that defines how to tell reverse proxies to assemble pages out of smaller fragments, each of which can be cached for its own duration or not cached at all.

Take an event page on techup, for example. It has two user-dependent areas: the "login with Twitter" button, which turns into "@username" once you're logged in, and the "attend" button, which shows attend or unattend depending on the user viewing the page. Those two areas are ESI includes. For the reverse proxy, this means it first tries to fetch the main page content from its cache and, if found, then processes the <esi:include src="http://..." /> tags it finds in it. Each tag contains the URL of a sub-component of the page. So one URL points to an action in one of my controllers that only outputs an attend button, green or red depending on the user viewing it. The rest of the page still comes out of the cache.
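The assembly step the proxy performs can be sketched in a few lines of plain PHP. This is an illustration, not how Varnish or Symfony implement it: it only understands the `src` attribute (real ESI also has `alt`, `onerror` and more), and the `$fetch` callback is a hypothetical stand-in for the proxy's own cached-or-live sub-request.

```php
<?php
/**
 * Sketch of what an ESI-capable proxy does with a cached page: every
 * <esi:include src="..." /> tag is replaced by the body returned for
 * its src. $fetch is a hypothetical callback standing in for the
 * proxy's own (cached or live) sub-request.
 */
function processEsiIncludes(string $html, callable $fetch): string
{
    return preg_replace_callback(
        '/<esi:include\s+src="([^"]*)"\s*\/>/i',
        function (array $m) use ($fetch): string {
            // Decode entities such as &amp; before requesting the URL.
            $src = html_entity_decode($m[1], ENT_QUOTES);
            return (string) $fetch($src);
        },
        $html
    );
}
```

Because the page shell and each fragment are looked up separately, the shell can come from the cache while the attend button is rendered fresh for every user.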

Each of those sub-components has its own Cache-Control header, which means you can compose a page from components that expire after different durations.
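A minimal sketch of that idea, assuming an in-memory store keyed by fragment URL (the class name FragmentCache is hypothetical, and a real proxy store is far more involved): each fragment carries its own TTL, and a TTL of 0 means "never cache", which is what the user-specific buttons need. The clock is injected so expiry can be exercised without sleeping.

```php
<?php
/**
 * Sketch: a tiny in-memory store where each fragment URL is cached for
 * its own number of seconds, so one page can mix long-lived and
 * never-cached parts. The clock is injectable for testing.
 */
final class FragmentCache
{
    /** @var array<string, array{body: string, expires: int}> */
    private array $entries = [];
    /** @var callable */
    private $clock;

    public function __construct(?callable $clock = null)
    {
        $this->clock = $clock ?? 'time';
    }

    public function get(string $url, int $ttl, callable $render): string
    {
        $now = ($this->clock)();
        $entry = $this->entries[$url] ?? null;
        if ($entry !== null && $entry['expires'] > $now) {
            return $entry['body']; // still fresh: served from the cache
        }
        $body = $render($url); // miss or expired: rebuild the fragment
        if ($ttl > 0) {
            $this->entries[$url] = ['body' => $body, 'expires' => $now + $ttl];
        }
        return $body;
    }
}
```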

The way this is done in Symfony2 is pretty straightforward. Your controller actions must always return a Response object, and all you need to control how long the reverse proxy caches it is to set the shared max age of the response. Beware: the plain max age also applies to browser caches, which only ever see the fully assembled page, so you really want the shared variant (s-maxage), which only shared caches such as reverse proxies honor. It's as simple as calling $response->setSharedMaxAge(3600); where 3600 is the duration in seconds.
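As an illustration of what a shared max age amounts to on the wire, here is a plain-PHP sketch that builds the corresponding Cache-Control value. It is not Symfony's header-building code (the helper name sharedCacheControl is hypothetical); it only shows the two directives involved: `public`, so shared caches may store the response, and `s-maxage`, which browsers ignore.

```php
<?php
/**
 * Sketch: the Cache-Control value behind a "shared max age".
 * "public" allows shared caches to store the response; "s-maxage"
 * is honored only by shared caches (reverse proxies), not browsers.
 * Illustration only, not Symfony's implementation.
 */
function sharedCacheControl(int $seconds): string
{
    if ($seconds < 0) {
        throw new InvalidArgumentException('TTL must be >= 0');
    }
    return sprintf('public, s-maxage=%d', $seconds);
}

// Outside Symfony, a plain PHP script could send it with:
// header('Cache-Control: ' . sharedCacheControl(3600));
```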

In your templates, if you use Twig (and with Symfony2 you really should), defining an <esi:include /> tag is also quite easy. You name the controller action you want to execute, give it some parameters, and mark it as standalone, which makes it an ESI include, for example {% render 'FooBundle:Default:attendButton' with ['event_id': event.id], ['standalone': true] %}. For more info on how to set that up, see the Symfony2 docs on the topic.
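Roughly speaking, a standalone render ends up as an <esi:include /> tag pointing at a URL that runs a single controller action. The sketch below builds such a tag in plain PHP; the `/_fragment/...` path is a hypothetical URL scheme chosen for illustration, since Symfony generates its own internal URLs.

```php
<?php
/**
 * Sketch: build an <esi:include /> tag for a fragment URL. The path
 * format used by callers ("/_fragment/...") is hypothetical; Symfony
 * generates its own internal URLs for standalone renders.
 */
function esiIncludeTag(string $path, array $params = []): string
{
    $url = $path;
    if ($params !== []) {
        // Encode parameters as an RFC 3986 query string.
        $url .= '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
    }
    // Escape for use inside a double-quoted attribute.
    return sprintf('<esi:include src="%s" />', htmlspecialchars($url, ENT_QUOTES));
}
```

For instance, `esiIncludeTag('/_fragment/attend', ['event_id' => 86])` yields `<esi:include src="/_fragment/attend?event_id=86" />`.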

Invalidation woes

The tricky part, which is also a somewhat controversial topic, is invalidation. In theory, if you say that a page or sub-component is cacheable for X seconds, you should just live with it and let it be cached, even if the data changes. This is an acceptable downside on really high-traffic sites, or when only admins publish content and it doesn't really matter if it takes a few seconds or minutes to reach end users. But I like to give our users feedback when they add or change data, and I think they should see it straight away, so I decided to invalidate the cached pages in the proxy whenever the data is modified.

I'll refer you to the docs for how to actually set up purging (invalidating) in your proxy of choice; there's no point repeating it all here. What I want to share is how I manage invalidation, because it can quickly get very tricky to handle. I built centralized methods, one per domain model, that contain all of its invalidation logic. When a model changes, it's passed to the matching method, which purges all the URLs that render it. This at least gives you a good overview of the affected pages, and a single place to adjust those invalidation rules.

// src/Application/FooBundle/Controller/FooController.php

// Purge every URL that renders the given event.
protected function invalidateEvent($event)
{
    $args = array('event' => $event->getId(), 'title' => $event->getSlug());
    $this->invalidate('viewEvent', $args); // the event page itself
    $this->invalidate('home');             // and the homepage
}

// Send a PURGE request to the absolute URL of a route.
protected function invalidate($route, $parameters = array())
{
    $url = $this->router->generate($route, $parameters, true); // true = absolute URL

    $context = stream_context_create(array('http' => array('method' => 'PURGE')));
    $stream = fopen($url, 'r', false, $context);
    if ($stream !== false) { // fopen() returns false on error responses
        fclose($stream);
    }
}

This example implementation sends a PURGE request to the site URL, which only works as-is with a single Varnish instance. With a redundant setup, I assume you would have to send a PURGE request to each instance, in which case it may be cleaner to use an external job queue such as Gearman to run those requests outside of PHP.
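For the redundant case, a minimal sketch of sending the same PURGE to every proxy, assuming you know each proxy's base URL (the URLs passed in are hypothetical). It uses the `ignore_errors` HTTP context option (available since PHP 5.3), so a 404 "not cached" answer comes back as a normal response whose status line can be inspected instead of triggering a warning.

```php
<?php
/**
 * Sketch: send a PURGE for one path to each reverse proxy in a
 * redundant setup. Proxy base URLs are supplied by the caller
 * (hypothetical values). ignore_errors makes the http wrapper return
 * 4xx/5xx responses instead of failing with a warning.
 *
 * @return array<string, string> purge URL => status line, or "failed"
 */
function purgeEverywhere(array $proxyBaseUrls, string $path): array
{
    $context = stream_context_create(['http' => [
        'method'        => 'PURGE',
        'ignore_errors' => true,
        'timeout'       => 2.0,
    ]]);

    $results = [];
    foreach ($proxyBaseUrls as $base) {
        $url = rtrim($base, '/') . '/' . ltrim($path, '/');
        // @ hides connection-level warnings; false means no response at all.
        $body = @file_get_contents($url, false, $context);
        // The http wrapper fills $http_response_header in this scope.
        $results[$url] = ($body !== false && isset($http_response_header[0]))
            ? $http_response_header[0]
            : 'failed';
    }
    return $results;
}
```

Running this synchronously still adds one round trip per proxy to the request, which is why pushing it to a job queue becomes attractive as the number of instances grows.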

There are a few gotchas to consider, especially if you use the Symfony2 reverse proxy rather than Varnish. First, and fairly obviously, you must prevent arbitrary clients from purging, otherwise attackers could DDoS you with PURGE requests and make your load skyrocket. Second, if you return a 404 code for "Not purged" (i.e. the page wasn't cached), fopen() emits a PHP warning, which is not nice at all. For this reason, and since for now I don't care whether the purge happened or not, I chose to always respond with a 200. This could be handled more cleanly with curl if you really need a proper response code for your PURGE requests.

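// app/AppCache.php
// (Request and Response are the Symfony\Component\HttpFoundation classes.)
protected function invalidate(Request $request)
{
    // Only handle PURGE requests coming from this very server;
    // everything else falls back to the default behavior.
    if ($_SERVER['SERVER_ADDR'] !== $request->getClientIp() || 'PURGE' !== $request->getMethod()) {
        return parent::invalidate($request);
    }

    $response = new Response();
    // Always answer 200 so fopen() on the caller side does not warn;
    // only the reason phrase tells whether an entry was removed.
    if (!$this->store->purge($request->getUri())) {
        $response->setStatusCode(200, 'Not purged');
    } else {
        $response->setStatusCode(200, 'Purged');
    }

    return $response;
}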
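The check above only accepts purges sent from the server's own address. If purges may come from several machines (other app servers, a job queue worker), an explicit allowlist is one alternative. A minimal plain-PHP sketch, where the function name mayPurge and the addresses passed in are hypothetical:

```php
<?php
/**
 * Sketch: allow a purge only for the PURGE method coming from an
 * allow-listed client address. The list of addresses is supplied by
 * the caller (hypothetical, e.g. the app servers' IPs).
 */
function mayPurge(string $method, string $clientIp, array $allowedIps): bool
{
    return strtoupper($method) === 'PURGE'
        && in_array($clientIp, $allowedIps, true); // strict string comparison
}
```

Note that behind a load balancer the client IP seen by PHP may be the balancer's, so the list has to match whatever address actually reaches the application.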

The results

It sounds nice and all, but is it actually working?

I used JMeter to benchmark the site with and without the reverse proxy. Note that I used the integrated Symfony cache layer and not Varnish, so the results would be even better with Varnish, since it's written in C and doesn't have to hit Apache and PHP on every request.

Before:

/ => 63req/sec
/86/rails-hock => 100req/sec
/api/events/upcoming.json => 70req/sec
/api/event/10.json => 120req/sec

After:

/ => 200req/sec *
/86/rails-hock => 230req/sec
/api/events/upcoming.json => 100req/sec *
/api/event/10.json => 800req/sec

* My 20 Mbps internet line was the bottleneck for these, because their response bodies are too large.

In short: holy crap. For the first two pages tested, the improvement is "modest" because they include sub-components that are not cacheable, so they always require some full framework cycles. But the last one, from the API, is just amazing, with almost 7 times more requests processed per second (120 to 800).
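The speedup factors behind these figures can be checked with a few lines of PHP, using the before/after numbers from the two lists above. For the two URLs marked with an asterisk the "after" figure was capped by bandwidth, so those ratios are lower bounds.

```php
<?php
// Requests per second from the benchmark above: [before, after].
$benchmark = [
    '/'                         => [63, 200],
    '/86/rails-hock'            => [100, 230],
    '/api/events/upcoming.json' => [70, 100],
    '/api/event/10.json'        => [120, 800],
];

// Ratio of after/before, rounded to one decimal.
function speedups(array $results): array
{
    $out = [];
    foreach ($results as $url => [$before, $after]) {
        $out[$url] = round($after / $before, 1);
    }
    return $out;
}

foreach (speedups($benchmark) as $url => $factor) {
    printf("%-27s %.1fx\n", $url, $factor);
}
```

This prints 3.2x, 2.3x, 1.4x and 6.7x respectively.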

All I can say in conclusion is that this is worth playing with, and that Symfony2 really doesn't disappoint when it comes to speed. If you have any experience with this kind of setup and want to add anything, feel free to do so in the comments; questions are also welcome.

December 09, 2010 // PHP


Comments

2010-12-09 17:07:30

greut

Why do you need the kitchen sink when some great improvements can be done with very simple template-level caching using APC/Memcache?

https://github.com/greut/template/blob/master/src/template.php#L95-120
Cheers

2010-12-09 17:14:23

Seldaek

@greut: great improvements can be done with output cache in the application itself, sure, or even just caching the data before rendering. However leveraging the http protocol is imo more elegant, and since it bypasses the entire application and even the http server/php if you install Varnish, it's even greater improvements. I'm not saying it's the only way to increase performance, but it sure is a good one.

2010-12-09 17:35:58

Jérôme

Hi Jordi,

You do not have to bother with gearman to send a PURGE to Varnish. Using one Varnish (well two for failover) is perfectly fine for the average website.

Sending a PURGE with register_shutdown_function after the HTML is flushed is an acceptable alternative to Gearman for most use cases.

'Hope that helps.

:)

2010-12-09 18:43:25

Joshua Jonah

Why not just use SSI tags? I don't quite get the difference. Something like this: http://joshuajonah.ca/blog/2010/06/18/poor-mans-esi-nginx-ssis-and-django/

2010-12-09 19:09:24

Jérôme

Joshua :
ESI and SSI are more or less the same concept. They just do not operate at the same level, and it makes sense (sometimes) to use the ReverseProxy to do the job so you can leave your webservers alone :)

Cheers :)

2010-12-09 20:45:06

David Zuelke

Since PHP 5.3, you can supply the "ignore_errors" context option (set it to true, obviously) to suppress errors when the response status code isn't 200. Guess who wrote the patch ;)

Anyway, what you're doing with the 200 code isn't kosher. You attach two different meanings to the same status code, not good HTTP behavior. However, semantically, PURGE should, in my opinion, always succeed, so always sending "200 OK" in both cases is fine, as it isn't really an error condition if you PURGE a resource that doesn't exist. You should also consider "204 No Content" as an alternative, or, if you want to use a 4xx code, "409 Conflict" or "410 Gone".

2010-12-09 20:48:49

David Zuelke

greut: because it doesn't scale as well. Doing it purely on the HTTP level does scale very well.

Joshua: AFAIK, nginx doesn't do HTTP level caching like a proxy, and probably not for SSIs either. With Jordi's approach, each ESI goes through Varnish as well, where it could be cached, too (with different TTLs, for instance)

2010-12-10 07:00:27

Fabien

Jordi described an example with expiration, but the same can be done with validation or a combination of the two HTTP caching models. This is indeed a very powerful technique.

2010-12-10 13:19:48

Seldaek

David: Yeah as I said the 200 isn't very pretty it was a quick way to shut off the error, but I also think that purging "nothing" is not a failure, so 204 No Content sounds good. Joshua: the great approach with ESI I think is that if you use the php reverse proxy from the framework itself you can deploy it anywhere and it'll work, it's not restricted to nginx users.

2010-12-14 14:26:03

Kevin

Can you not use an eTag solution to invalidate the cache?

2010-12-14 15:41:29

Seldaek

@Kevin: Yes, you can use ETags of course, but with ETags you still need to query the framework and fetch the content to somehow be able to generate an ETag to compare against, so it's not as fast as a Cache-Control header which just caches a resource for good until it expires.

2011-04-07 10:42:52

Matt

> if you use the Symfony2 reverse proxy

Can you explain a little bit more?

How to set up Symfony reverse proxy?

2011-04-07 14:18:00

Seldaek

@Matt: You can read more about it there http://symfony.com/doc/2.0/book/http_cache.html

2011-05-12 01:12:09

kreischweide

Thanks for informative example! One question on the @username and attend-button approach: How do you distinguish the current user? As the <esi:include> definition seems to be a part of the parent context (which is cached and public), how can you include a private element and pass the user or state?

2011-05-12 02:16:10

Seldaek

@kreischweide: Basically the reverse proxy will do the request to the parts that it does not have in the cache, and forward the user cookies and stuff to that request, so since those are never cached, they are rendered for every user on every request according to their own session. I hope that clears it up.

2011-07-06 00:45:03

Mat

Thanks for this article :)

2011-12-12 20:12:55

Nico

Nice article, thank you!

I tried your code, but I have a "500 Internal Server Error", because fopen doesn't work. Do you have any idea?

Thanks!

2013-10-03 08:16:12

Ardian

My app got correct Cache Header, the value is :

Age:4
Cache-Control:max-age=0, public, s-maxage=36000

But why is the status code 200 and not 302?
Is 302 status is only for Validation method with if modified since ?

Also my web developer toolbar still show that my app run some doctrine queries. So I think although the response Cache Header is correct, the request is still full processed within my app.

Thanks for your help.

2013-11-03 09:42:13

Vladimir

If you want to return a 304, use this example in your controller:

$response = new Response();
$response->setLastModified($article->getPublishedAt());

if ($response->isNotModified($this->getRequest())) {
    return $response; // this will return the 304 if the cache is OK
}

2014-04-03 17:19:25

Rick

Hi Jordi,

Nice article.

I'm struggling with some embedded content using the "render_esi" helper. The page with this embedded content is only accessible to authorized users. The problem is that the user session somehow gets lost and the user must log in again.

Any thoughts?