So I've been doing quite a bit of web development recently, and several of my HTML files use resources from CDNs, like jQuery or normalize.css. While I'm developing, I refresh pages quite often, and also usually use my browser's development tools to disable caching. This means that I hit the CDNs quite often, and aside from the speed and bandwidth usage, one of the CDNs actually started rate limiting me... oops. In other projects, I'd usually just pull those resources onto my local server, keep them there, and be done with it. But the stuff I'm currently working on is for distribution, so I'd like to keep the CDN URLs in the HTML, and not have to rewrite them by hand.
Enter git filters: Documented in the Git Book Chapter 8.2 and in gitattributes, they provide a way to pipe source code files through an external program on checkout, as well as on diff, checkin, and so on. This allows you to have one version of a file checked into the repository, but to use a filter to make changes to the files that actually end up in the working copy on your machine. These changes are also reversed by the filter and not checked back into the repository on checkin, and don't show up in any commands like git diff, git status, etc.
So in this case, the files I want to have in the repository will have lines that look something like this:
<link rel="stylesheet" href="https://example.com/example.css" />
<script src="https://example.com/example.js"></script>
but when I check these files out into my local working copy, they should get rewritten into something like this:
<link rel="stylesheet" href="_cache/example.css" />
<script src="_cache/example.js"></script>
Of course, Perl to the rescue!
There are two git filters: "smudge", which is applied when files are checked out, and "clean", which is applied when files are staged or checked in, when the working copy is compared against the repository, and so on. Each filter script takes its input on STDIN and writes the filtered output to STDOUT. The filters are set up by setting git config filter.filtername.smudge and git config filter.filtername.clean to the commands to be executed. Since the filter script tends to be local to your system and there might be arguments specific to the repository, it's probably best to use git config --local so that these settings are stored on a per-repository basis in .git/config. Then, you set up a .gitattributes file containing a line of the form "pattern filter=filtername", e.g. "*.html filter=myfilter". (There are also more advanced ways to implement and configure filters, which are described in the docs I linked to above.)
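As a rough sketch, the setup described above might look like this when run inside the repository. The filter name myfilter is the example name from the text; the script path /path/to/filter.pl and its "smudge" and "clean" arguments are hypothetical placeholders, not the actual htmlrescache invocation.

```shell
# Run inside the repository that should use the filter.
# "myfilter" is the example name from the text; /path/to/filter.pl and its
# arguments are hypothetical placeholders for your own filter script.
git config --local filter.myfilter.smudge '/path/to/filter.pl smudge'
git config --local filter.myfilter.clean '/path/to/filter.pl clean'

# Tell git which files are piped through the filter.
echo '*.html filter=myfilter' >> .gitattributes
```

Because --local writes to .git/config, these settings are not shared with other clones; anyone without the filter configured simply sees the files as they are stored in the repository.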
So now, I can implement my filter with a regular Perl while (<>) { print; } loop, applying whatever transformations I like. My "smudge" filter is the one that does the heavy lifting: it looks for HTML tags like the above that are prefixed with <!--cacheable-->, fetches those URLs into the local cache directory if they haven't been fetched before, and rewrites the tags to point to the local resources. It also records the original URL in a comment, so that all the "clean" filter has to do is put that original URL back into the tag. And there we go, problem solved!
You can find my code on Bitbucket as "htmlrescache". I've implemented the "clean" and "smudge" filters in one script, and also implemented an "init" command that sets up the git configuration I described above.
For one real-world example, see this HTML file, which contains the line:
<!--cacheable--><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/normalize/8.0.0/normalize.min.css" integrity="sha256-oSrCnRYXvHG31SBifqP2PM1uje7SJUyX0nTwO2RJV54=" crossorigin="anonymous" />
However, checked out on my local machine, that line shows up as:
<!-- CACHED FROM "https://cdnjs.cloudflare.com/ajax/libs/normalize/8.0.0/normalize.min.css" --><link rel="stylesheet" href="_cache/normalize.min.css" integrity="sha256-oSrCnRYXvHG31SBifqP2PM1uje7SJUyX0nTwO2RJV54=" crossorigin="anonymous" />
And yet:
$ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
(I also have an older git filter lying around somewhere that implements a couple of SVN keywords like $Id$ or $Date$ for git, but that's not really ready for publication at the moment. If someone is interested I could maybe find some time to prepare it.)
In reply to Using a git filter to Cache HTML Resources Locally by haukex