Re^2: Using a git filter to Cache HTML Resources Locally
by Corion (Patriarch) on Oct 11, 2018 at 08:13 UTC
Basically you do the same steps, but without involving git:
- Look at all HTML files
- Identify external resources
- Download the external resources
- Rewrite HTML files to use the local resources
You can just invoke the linked htmlrescache program as
htmlrescache clean my/file.html
Just make sure you only operate on a copy, not on the original.
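Those steps can be sketched with standard tools. This is only an illustration, assuming a single known CDN URL and a cached/ directory; htmlrescache parses the HTML properly, while this sketch just uses sed:

```shell
# Illustrative sketch of the steps above; URL and cache dir are examples.
mkdir -p cached
url='https://cdnjs.cloudflare.com/ajax/libs/normalize/8.0.0/normalize.min.css'
curl -fsSL -o cached/normalize.min.css "$url"      # download the external resource
for f in *.html; do
  cp "$f" "$f.orig"                                # operate on a copy, keep the original
  sed -i "s|$url|cached/normalize.min.css|g" "$f"  # rewrite HTML to the local copy
done
```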
You can just invoke the linked htmlrescache program as
htmlrescache clean my/file.html
Sorry, that's not quite right: git filters are handed the file contents on STDIN and must write their output to STDOUT; the filename on the command line is only informative for the script. Because filters can be run for files that are being added or deleted, git makes no guarantee that the file even exists. In fact, git filters aren't normally given the filename at all; I had to specify %f in the git filter setting. My script just uses the filename to calculate the path of the cache directory relative to the file.
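For reference, a filter definition along those lines might look like the following. The filter name "htmlrescache" and the *.html pattern are assumptions for illustration; see the linked article for the real setup:

```shell
# Hypothetical filter named "htmlrescache"; %f passes the (informative) filename.
git config filter.htmlrescache.smudge 'htmlrescache smudge %f'
git config filter.htmlrescache.clean 'htmlrescache clean %f'
# Tell git which files the filter applies to:
echo '*.html filter=htmlrescache' >> .gitattributes
```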
However, your comment served as the inspiration to update the script, so it now supports two new options: -i for in-place editing (use -IEXT to specify an extension for the backup file; this uses $^I under the hood), and -G to disable the use of git, so that paths are resolved relative to the current working directory instead of the git working directory. So thank you for that! :-)
So now you can do:
$ htmlrescache -GI.bak smudge my/file.html # cache HTML resources
$ mv my/file.html.bak my/file.html # restore original file
or
$ htmlrescache -Gi smudge my/file.html # cache HTML resources
$ htmlrescache -Gi clean my/file.html # restore original URLs
Re^2: Using a git filter to Cache HTML Resources Locally
by haukex (Archbishop) on Oct 11, 2018 at 10:24 UTC
How do you do this without filters?
I'm not quite sure what you're asking specifically... you can edit the files manually, you can use a hacked solution like this, or you can use a different script like this (the now-removed predecessor of htmlrescache). This is how you would use htmlrescache standalone.
What I'm trying to get at is the good practice of local caching without rewriting URLs, e.g. <script src="//cdn..../../../">
Or using something like url_for('resource.js') in JavaScript or Perl, to load resources based on configuration/environment:
something that's one and done, not a physical rewrite on each change.
local caching without rewriting urls, ex <script src="//cdn..../../../"
Maybe I'm missing something obvious here, but how would you do this for URLs like the ones I showed, e.g. https://cdnjs.cloudflare.com/ajax/libs/normalize/8.0.0/normalize.min.css? Remember I said these files are for public distribution, and I can't rely on people having a local server - it's even possible to open the HTML files from the local disk (file://). Using the public CDN URLs is easy, and doesn't require me to distribute a bunch of extra files along with my own.
something like url_for('resource.js') in javascript or perl, to load resources based on configuration/environment
Sure, that's a possibility - but then that code gets run for everybody. This tool is really only meant to be a development aid, it applies only to the local git working copy, and only when it's configured by the user. The files in the repository are those for distribution, and thanks to the filter they always keep their public CDN URLs.
The only downside of the git filter approach that I can see so far is the small performance penalty on some git operations. So I'm not yet convinced that there is something wrong with this approach.
The interesting part is not the URL rewriting but the automatic download of the remote URL to a local file.
It wouldn't be hard to try to load both URLs from JavaScript, or whatever, but why add the additional complexity when you can just rewrite the file?