Background
For the uninitiated, the term "asset" refers to a resource referenced by a web page, such as an image, a script, or a stylesheet. Since assets change relatively infrequently, it's common to serve them with a cache expiration date far in the future, so browsers will store and reuse them rather than request them repeatedly.¹ For this to work, however, you must be able to "bust the cache" and force browsers to pick up the latest version whenever you modify an asset. Browser caches are keyed on URL, so modified assets need new URLs. Rails achieves this by including each asset's last-modified timestamp in its query string. For example, image_tag("rails.png") produces:

    <img src="/images/rails.png?1230601161" ...>
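To make the mechanism concrete, here is a minimal sketch of query-string cache busting. It is not Rails's actual implementation; the helper name timestamped_asset_path and the temporary directory layout are assumptions made for illustration.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not the Rails source): append a file's last-modified
# time, in Unix seconds, to its public path as a query string.
def timestamped_asset_path(public_dir, source)
  file = File.join(public_dir, source)
  return source unless File.exist?(file) # unknown files pass through untouched
  "#{source}?#{File.mtime(file).to_i}"
end

Dir.mktmpdir do |public_dir|
  image = File.join(public_dir, 'images', 'rails.png')
  FileUtils.mkdir_p(File.dirname(image))
  FileUtils.touch(image)
  stamp = Time.at(1230601161)
  File.utime(stamp, stamp, image) # pin the mtime so the result is predictable
  puts timestamped_asset_path(public_dir, '/images/rails.png')
  # prints "/images/rails.png?1230601161"
end
```

Editing the file changes its mtime, which changes the URL, which is what forces browsers to fetch it again.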
What's Wrong with Cache-Busting Query Strings
So where's the problem? It's rooted in the fact that common web servers ignore query strings when serving static files.² Web servers don't know to check that an asset's last-modified timestamp matches the request's query string, and even if they did, they couldn't respond successfully in the event of a mismatch, since they only have one version of each asset at any given time.
Asset mix-ups come in two varieties:
- A new asset is served in response to a request for an old asset. This can happen, for example, if a new version of an asset is deployed to an asset server before it's deployed to all Rails servers. It can even happen on a single Rails server that serves its own assets due to the delay between when it formats an asset URI and when it receives and handles the request for that asset. Getting the wrong version of an asset can seriously break a page. Fortunately, reloading the page a short time later should fix this kind of mix-up, since the new page will no longer include any old asset URIs.
- An old asset is served in response to a request for a new asset. If you have multiple servers, either for redundancy or to offload the responsibility of serving assets, there will be a brief period during deployment when your servers' assets are out of sync. If one Rails server gets a new version of an asset before all asset servers, it will begin formatting URIs for that asset with the new timestamp, and a request for that new asset might be handled by a server that has the old version. This is worse than the first case because a simple page reload doesn't fix it. Unless affected users clear their browser caches, they will continue to see a broken page until the asset expires, is modified again, or becomes obsolete. It gets worse: if your assets are publicly cacheable (a best practice), proxies can further compound the problem, repeating one bad asset response to many users, poisoning all of their browser caches.³
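The second kind of mix-up can be reproduced with a toy model. The classes below are assumptions made for illustration, not real server or browser code: the server looks files up by path only, and the cache is keyed on the full URL.

```ruby
# Toy static file server: looks files up by path and ignores the query string.
class StaticServer
  def initialize(files)
    @files = files # path => content
  end

  def get(uri)
    path = uri.split('?', 2).first
    @files.fetch(path)
  end
end

# Toy browser cache keyed on the full URL, query string included.
class BrowserCache
  def initialize
    @store = {}
  end

  def fetch(uri, server)
    @store[uri] ||= server.get(uri)
  end
end

stale_server = StaticServer.new('/stylesheets/base.css' => 'old css')
fresh_server = StaticServer.new('/stylesheets/base.css' => 'new css')
cache = BrowserCache.new

new_uri = '/stylesheets/base.css?1292435738' # formatted by an up-to-date Rails server
cache.fetch(new_uri, stale_server)           # the request lands on a server that is behind
puts cache.fetch(new_uri, fresh_server)      # prints "old css": the bad response is cached
```

Once the old content is stored under the new URL, later requests never reach a server again, which is why a reload doesn't help.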
Properties of a Correct Asset Caching Strategy
There are many ways to serve assets, from Apache to CDNs, and all of them can work. For correctness, just be sure that your strategy has these properties:
- Either:
  - Asset servers can serve multiple versions of the same asset, or
  - You manually version your assets (i.e., you rename every time you edit).
- New assets are fully deployed to asset servers (alongside old assets) before any server begins serving pages that reference them. This implies a two-step deployment process: assets first, then code.
- Asset servers keep old assets until no servers are still serving pages that reference them and any cacheable pages that reference them have expired.
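The deployment-order and retention properties can be expressed as two simple set checks. This is only a sketch: the function names, server names, and file lists are hypothetical, and a real check would read them from your deployment tooling.

```ruby
# Hypothetical pre-deploy check: which asset servers lack files that the
# release about to go live will reference?
def missing_assets(referenced, servers)
  servers.each_with_object({}) do |(name, held), missing|
    gaps = referenced - held
    missing[name] = gaps unless gaps.empty?
  end
end

# Hypothetical cleanup check: a held asset may be deleted only when nothing
# that can still be served (live releases, unexpired cached pages) references it.
def deletable_assets(held, still_referenced)
  held - still_referenced
end

servers = {
  'assets1' => ['/stylesheets/base.1292435738.css', '/stylesheets/base.1292439999.css'],
  'assets2' => ['/stylesheets/base.1292435738.css']
}
new_release = ['/stylesheets/base.1292439999.css']

gaps = missing_assets(new_release, servers)
puts(gaps.empty? ? 'safe to deploy code' : "deploy assets first: #{gaps.keys.join(', ')}")
# prints "deploy assets first: assets2"
```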
Properties of an Optimal Asset Caching Strategy
When aiming for correctness, it's easy to sacrifice optimality. For example, the Rails AssetTagHelper doc provides two alternatives to query string timestamps (involving RAILS_ASSET_ID and RELEASE_NUMBER), neither of which you'd ever want to use if you deploy more than once a month. The condition for optimality is simple:
- An asset's URI should change only when its content changes.
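One way to tie a URI strictly to content is to derive the version from a digest of the file's bytes rather than from a timestamp. The sketch below is an alternative to the timestamp approach described in this post, not part of it; the helper name and the 10-character digest prefix are arbitrary choices for illustration.

```ruby
require 'digest/md5'

# Hypothetical helper: insert a short content digest before the extension,
# e.g. base.css -> base.<digest>.css.
def fingerprinted_name(filename, content)
  digest = Digest::MD5.hexdigest(content)[0, 10]
  filename.sub(/[a-z0-9]*$/) { |ext| "#{digest}.#{ext}" }
end

css = "body { color: #333; }\n"
same    = fingerprinted_name('base.css', css.dup)
changed = fingerprinted_name('base.css', css + "p { margin: 0; }\n")

puts fingerprinted_name('base.css', css) == same    # true: same bytes, same name
puts fingerprinted_name('base.css', css) == changed # false: new bytes, new name
```

A timestamp meets the same condition only as long as your tooling preserves modification times for files whose content hasn't changed.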
Wealthfront's Asset Caching Strategy
So how do we satisfy all of these constraints at Wealthfront? Glad you asked! It's a four-pronged approach:
- Renaming assets to include their version
- Coercing Rails to format asset URIs our way
- Fixing stylesheets before deployment
- Guaranteeing asset availability
Renaming assets to include their version
We add timestamps to our asset filenames just before packaging them up for deployment to our asset servers. This way we can serve them using any dumb file server, and multiple versions of the same asset can sit side-by-side in the file system. Here's our dead-simple renaming script:

    #!/bin/sh
    # Insert each file's mtime (Unix seconds) before its extension.
    # Relies on GNU ls (--time-style) and GNU sed (\0 for the whole match).
    for f in $(find public -type f)
    do
      ts=$(ls -o --time-style=+%s $f | cut -d' ' -f5)
      mv $f $(echo -n $f | sed "s/[a-z0-9]*$/$ts.\0/")
    done

The script inserts each file's timestamp before its extension (e.g. base.css → base.1292435738.css). Note that this timestamp format is Unix time (seconds since the epoch), precisely what Rails uses. It's agnostic to the system time zone: one less thing to worry about.
Coercing Rails to format asset URIs our way
We'd love to specify our URI format using config.action_controller.asset_path_template in production.rb. Unfortunately, it doesn't have access to the asset timestamp, so we monkey patch. Here's our config/initializers/assets.rb:

    require 'action_view/helpers/asset_tag_helper'

    module ActionView
      module Helpers
        module AssetTagHelper
          def rewrite_asset_path(source, path = nil)
            asset_id = rails_asset_id(source)
            if asset_id.blank?
              source
            else # foo.png -> foo.1252928347.png
              source.sub(/[a-z0-9]*$/, asset_id + '.\0')
            end
          end
        end
      end
    end if Rails.env.production?

Fixing stylesheets before deployment
In a standard Rails setup, background images in stylesheets are requested without versioned URIs, so for correctness, you must 1) never edit images referenced by stylesheets and 2) keep them around for at least one release after they're no longer referenced. Or you can do what we do: version the background image URIs in your stylesheets before you deploy them. Here's our script that inserts timestamps into the filenames of the background image URIs in our stylesheets. It writes the resulting files to the directory given as its command-line argument ($1):

    #!/bin/bash
    mkdir -p $1
    for f in $(ls public/stylesheets)
    do
      awk '
        BEGIN { OFS="" }
        # Lines containing url(...): splice the image mtime into its filename
        /^(.*)url\(([^\)]*)(.*)/ {
          split($0, a, /(url\(|\))/);
          cmd = "ls -o --time-style=+%s public" a[2];
          cmd | getline ls_out;
          close(cmd)
          split(ls_out, file_info)
          cmd = "echo -n " a[2] " | sed -e \"s/[a-z0-9]*$/" file_info[5] ".\\0/\""
          cmd | getline file_name
          close(cmd)
          print a[1], "url(", file_name, ")", a[3];
          next
        }
        { print $0 }' public/stylesheets/$f > $1/$f
      # stylesheet needs to get the max timestamp of itself and all referenced images
      ts1=$(grep -o '\b1[0-9]\{9\}\b' $1/$f | sort -r | head -1)
      ts2=$(ls -o --time-style=+%s public/stylesheets/$f | cut -d' ' -f5)
      ts=$(echo -e "$ts1\n$ts2" | sort -r | head -1)
      touch -d @$ts $1/$f
    done

Guaranteeing asset availability
This last part's simple. To satisfy the two deployment-related correctness conditions, we push new assets to our asset servers before pushing code that references them, and we keep old versions of assets around for a while. Partly to help achieve these goals, we've outsourced the responsibility of serving our assets from our Rails servers to a couple of Nginx servers.⁵ We push new assets to them without downtime and without deleting old assets. Never deleting old assets from asset servers is a perfectly acceptable policy. To free disk space, we installed a cron job on our asset servers that occasionally looks for assets with multiple versions and deletes all but the latest three.
Parting Tips
If you decide to try our approach, be sure that:
- Your Rails server(s) end up with non-versioned assets (no timestamps in their filenames).
- Your asset server(s) end up with the versioned assets (timestamps in their filenames).
Note: If your Rails servers are your asset servers, it's okay for the versioned and non-versioned assets to sit side-by-side in the same directory tree.
- If you use the Rails :cache => '...' option to concatenate stylesheets or scripts, be sure to generate the concatenated files and give them the max timestamp of their constituent source files when preparing a release for deployment.
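As a rough sketch of that last tip, the helper below concatenates source files and stamps the result with the newest modification time among them. The helper name and the file names are hypothetical; your release tooling would supply the real lists.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: write sources into target, then give target the newest
# mtime found among the sources so its version tracks its newest constituent.
def concatenate_with_max_mtime(sources, target)
  File.open(target, 'w') do |out|
    sources.each do |src|
      out.write(File.read(src))
      out.write("\n")
    end
  end
  newest = sources.map { |src| File.mtime(src) }.max
  File.utime(newest, newest, target)
  newest.to_i
end

Dir.mktmpdir do |dir|
  reset = File.join(dir, 'reset.css')
  base  = File.join(dir, 'base.css')
  File.write(reset, "* { margin: 0; }")
  File.write(base, "body { color: #333; }")
  File.utime(Time.at(1292435738), Time.at(1292435738), reset)
  File.utime(Time.at(1292439999), Time.at(1292439999), base)

  all = File.join(dir, 'all.css')
  puts concatenate_with_max_mtime([reset, base], all) # prints 1292439999
end
```

Without this step, the concatenated file would carry the time it was generated, so its URI would change on every release even when none of its sources did.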
Additional Resources
While you've got assets on the brain, check out:
- asset_fingerprint: a gem that renames assets as we've suggested and also provides you the option of using digests instead of timestamps. Just be sure to ignore Eliot's two suggestions for getting your asset server to respond correctly to versioned asset requests. You know better.
- jammit: a gem that does just about everything else you might want to do with assets: pre-packaging, minification, compression, image embedding, font embedding, and more.
Footnotes
1. Google has published a thorough caching guide.
2. Ironic, isn't it? The fact that web servers ignore query strings when serving static files is precisely what motivated Rails's default cache-busting strategy.
3. Steve Souders, among others, has reported that some old proxies incorrectly discard query strings entirely when storing and reading from their caches. This dramatically increases the likelihood of Rails asset mix-ups for the unlucky users behind them.
4. Preserving asset timestamps doesn't require the system clocks of all asset servers to be synchronized, as the Rails AssetTagHelper doc states. cp -p, touch -r, and Subversion's use-commit-times may come in handy.
5. We plan to move our assets to Amazon's Cloudfront soon, now that it supports SSL. We also plan to reintroduce multiple asset hosts and put our assets on a cookieless domain. So get off our back.