Browser caching is one of the best ways to take care of your users while also speeding up your website... so why is it so hard?
This article was originally published on 19 October, 2014 at Medium.
Browser caching is a magical thing. Every time you visit a website, the files that make up that site are downloaded to your computer by your browser, which then uses those files to render it for you. This takes time and valuable bandwidth, especially on slower connections. The next time you visit that same site, however, the files needed to render it are already downloaded to your computer. Your browser will use the files it already has to render the site, preventing you from having to waste your bandwidth and wait for the files to download.
Now, this is what happens in a wonderful, magical fairy tale land where all web developers do the right thing and make sure browser caching is set up on their site. Unfortunately, this is rarely the case. In the spirit of making up statistics, I estimate that somewhere between 30–90% of websites don't just fail to leverage browser caching, they're actively working against it.
See, there are a couple of ways to implement browser caching. You can use what's called a cache-control header, which tells the browser how long it can hold on to the file. If you look at my own website, you can see that the Google Analytics script I'm downloading from Google has this header set.
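To make the header concrete, here's a tiny sketch of reading the max-age directive out of a cache-control value like the one described above (the exact directive string is illustrative, not copied from a live response):

```javascript
// A cache-control value like the one on the Google Analytics script.
// `max-age` is measured in seconds.
const cacheControl = 'public, max-age=7200'

// Pull the max-age directive out of the header value.
const match = cacheControl.match(/max-age=(\d+)/)
const maxAgeSeconds = match ? Number(match[1]) : 0

console.log(maxAgeSeconds / 3600) // prints 2 — the browser may reuse the file for 2 hours
```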
Two hours isn't bad. If I revisit my site sometime within the next two hours, my browser will still use the version of that file that I just downloaded. The cool part is that since it's being served by Google instead of by my server, that file will be cached for every website that uses it, including mine!
However, if Google updates that file within the next two hours, I won't get those changes.
This is also a great example of where a lot of websites use a technique called cache busting. Cache busting is where you fool the browser into thinking something like this Google Analytics script is a different file than the one it previously downloaded, guaranteeing the user always gets the latest version of the file. Developers achieve this by adding something like ?timestamp=2017-10-19T13:04:29.423Z to the end of the URL. It's just a query parameter built from the current date and time, so the browser always thinks the file is different and, thus, always downloads a new version. Now, every time I go to that site I'll have to download a new copy of that file.
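A cache-busting helper usually looks something like this (the function name here is hypothetical, but the pattern is the one described above):

```javascript
// Hypothetical helper that "busts" the cache by appending the current
// timestamp as a query parameter, so the URL differs on every request.
function bustCache(url) {
  // Append with `&` if the URL already has a query string, `?` otherwise.
  const separator = url.includes('?') ? '&' : '?'
  return `${url}${separator}timestamp=${new Date().toISOString()}`
}

bustCache('https://example.com/analytics.js')
// → 'https://example.com/analytics.js?timestamp=…' (the timestamp varies every call)
```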
So... is there a better way?
Remember when I said there are a couple of ways to implement browser caching? Let's talk about that second one, which is called ETags. ETags are something that the server generates and sends along when a browser tries to retrieve a file for the first time. Here's an example of a fresh request for one of the files served from my website.
This is my logo, which has an etag header set. This header contains a calculated hash of the file that I'm downloading. The server created that hash for me and sent it along with the file. Now let's take a look at the second time I request that file.
Since the browser received an etag header when it first downloaded this file, it made sure to send that ETag back to the server in the if-none-match header. The server will compare that ETag against what it calculates for its version of the file. If they don't match, the server will send a new copy of the file to the browser.
However, if they do match, it won't send back a file at all. Instead it will send back a 304 Not Modified response, telling the browser nothing has changed. The Trezy.com server has just saved me time and bandwidth. Just how much has it actually saved me, though?
On a fresh visit to Trezy.com, the website downloads 781KB of files and takes around 1.34 seconds to finish. That's just for my very tiny website. For other websites that you visit regularly, you can imagine that those numbers will be much larger.
On a second, cache-friendly visit to my website, though, it only downloads 46.1KB of files and takes 1.05 seconds to finish. That's a 94% savings in bandwidth, and a 22% savings in response time. Again, this is just on my very, very tiny website. Imagine what those savings could look like on a larger website like Facebook or Twitter!
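Those percentages fall straight out of the numbers from the two visits:

```javascript
// Numbers measured on the two visits described above.
const freshKB = 781
const cachedKB = 46.1
const freshSeconds = 1.34
const cachedSeconds = 1.05

// Savings = how much of the fresh-visit cost the cached visit avoided.
const bandwidthSavings = Math.round((1 - cachedKB / freshKB) * 100)
const timeSavings = Math.round((1 - cachedSeconds / freshSeconds) * 100)

console.log(bandwidthSavings, timeSavings) // prints: 94 22
```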
Let's take a look at the implementation
We'll look at the very simple implementation of ETags I used for my website, which runs on a Koa.js server. Below is the core of what koa-etag does. It takes care of making sure that your server's responses have ETags calculated and attached.
function etag(options) {
  return function etag(ctx, next) {
    return next()
      // `getResponseEntity` checks to see if the response has a body, and
      // stringifies it depending on what kind of body it has (JSON, HTML,
      // image, etc.)
      .then(() => getResponseEntity(ctx))
      // `setEtag` calculates the ETag from the response body if it exists,
      // then sets that ETag as a header
      .then(entity => setEtag(ctx, entity, options));
  };
}
This isn't all we need to do, though. The response from the server will still contain the full contents of the file, and it will respond with a 200 OK status, telling the browser that it's basically getting a new file. Here's the missing piece of the puzzle.
function conditional() {
  return function conditional(ctx, next) {
    return next().then(() => {
      if (ctx.fresh) {
        ctx.status = 304;
        ctx.body = null;
      }
    });
  };
}
Alright, what's going on here? ctx.fresh is a Koa.js-specific property, but it's basically checking whether the old and new ETags match. This middleware, koa-conditional-get, uses that property to determine how to proceed. If they do match, the information the browser has is fresh and we don't need to send anything new along, so we'll set the status to that nifty little 304 Not Modified and delete the response body, eliminating the need to send who knows how much gobbledy-gook across the wire. If the ETags don't meet up to Will Smith's standards, we'll need to send along the new content.
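Stripped of the framework, that freshness check boils down to a string comparison between the two headers. A minimal version (ignoring weak ETags and If-Modified-Since, which the real implementation also handles):

```javascript
// Bare-bones freshness check: the request is "fresh" when the ETag the
// browser sent back (if-none-match) matches the one we'd send now.
function isFresh(ifNoneMatch, etag) {
  if (!ifNoneMatch || !etag) return false
  return ifNoneMatch === etag
}

isFresh('"abc123"', '"abc123"') // → true: send a 304, no body
isFresh('"abc123"', '"def456"') // → false: the file changed, send it again
```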
Make it real!
If you work on one of the 30–90% of websites that don't implement browser caching, there's no better time than now to fix it! This is a stupid simple implementation of ETags that may not cut the mustard for a lot of systems, but it's a great start. It's not just for websites, either! APIs can leverage ETags to make developers' lives easier, too. If you have arguments about scalability, I'll just go ahead and point out to you, Mr. or Mrs. Smarty Pants, that GitHub implements ETags in its API, so there.