This is a long read, and it may or may not interest you. But I figured I'd write it up anyway, since someone else may find it interesting or informative.
4 1/2 years ago I bought the domain JimAndKoka.com for my wife as a birthday present. I then proceeded to let it rot for another 6 months before I finally did anything with it.
I had "homepages" set up back when I was in college, because that was the hip thing to do back then. Alas, I was also a putz in college and my programming exposure was Pascal with a little C++. I had minimal graphics skills, nominal JavaScript, and a passable amount of HTML. In short, my pages sucked. There was a fair bit of content (a lot of it cobbled together off of the rest of the internet), but managing it was a bitch.
This time, this time would be different. I was a professional perl programmer, I'd built all sorts of big, interactive websites and apps, I knew databases. I could build an awesome, dynamic website with easy to manage and update content. It'd rock. Chicks would flock to me if the domain didn't make it blatantly obvious I was married.
Naturally, there are some technologies needed for an interactive website. I already had the nascent components of Basset formed, so of course I was going to use that. That was the M part of my MVC website, now I just needed the other two. For the controller layer, I just used simple CGI scripts, with the acknowledgement that I'd eventually move to something better (I finally did last year). For the view layer, I needed a templating system. I originally looked at and used HTML::Template, but it didn't work quite the way I wanted. I tried to go with HTML::Mason, but it looked like a big pain in the ass to install and run in a non-mod_perl environment. Besides, I'd gotten burned a few times by the volatile API back then, so I was wary.
Instead, in a fit of hubris and laziness, I decided to write my own system, which is what turned into Basset::Template. Right now it's fast, robust, and extremely usable. This is that story, as best as I can remember it. I consider it a success story, but I wanted to re-tell it to give the budding template engineers out there an idea of what they're getting into.
The running joke in web circles is that sooner or later everybody writes their own templating system. The progression is normally along these lines:
If you're lucky, you'll end up with something like Template::Toolkit at the end. If you're not, you'll end up with one of the myriad homebrew solutions littering so many websites. The ones where the dev staff grits their teeth when they talk about it and swear that they'll get rid of it someday, if they can, but rarely do because the cost is so high.
Instead of that route to madness, I took a different approach and simply opted to embed perl into my templates. I really liked Mason's syntax, but had already opted not to go with it. I'd tried HTML::Template and Text::Template, but they rubbed me the wrong way. Mason made sense, so I wanted something like that.
I initially waffled about it, since I didn't really want to go to the hassle of building a templating system and building a parser and all that other crap that goes into it. So I stuck with HTML::Template for a few months before I had my great flash of inspiration.
For example, here's a template:

    <table>
    %% foreach my $row (@rows) {
        <tr>
    %%     foreach my $cell (@$row) {
            <td><% $cell %></td>
    %%     }
        </tr>
    %% }
    </table>

And the equivalent CGI code:

    print "<table>\n";
    foreach my $row (@rows) {
        print "\t\t<tr>\n";
        foreach my $cell (@$row) {
            print "\t\t\t<td>$cell</td>\n";
        }
        print "\t\t</tr>\n";
    }
    print "</table>\n";
So all you needed to do was reverse the quotes. Easy!
Well, not that easy. A few caveats came up right off the bat. For one, some snippets of code needed to output a value, while others just needed to execute (a loop directive, for example).
Okay, that was easy. I added two bits of syntax: <% $value %> (the standard ASP- or Mason-style variable embed tags) and % at the start of a line to indicate code that needed to be executed but produced no output. That's similar to Mason's % at the start of a line, except mine allowed leading whitespace and Mason's didn't (at least not at the time; I haven't kept up with it). So all I needed was an additional pre-processing step to turn the <% %> tags into % lines that printed to STDOUT.
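Here is a minimal sketch of the quote-flipping pre-processor, written for illustration rather than taken from Basset::Template. It assumes the %% line marker from the table example (the delimiter the system eventually settled on) and <% expr %> for output, and it emits plain print statements, so the generated code writes to whatever filehandle is currently selected. The name compile_template is made up.

```perl
use strict;
use warnings;

# Hypothetical sketch of "reversing the quotes": lines whose first
# non-blank characters are %% are Perl code; every other line is literal
# output with <% expr %> spliced in.
sub compile_template {
    my ($template) = @_;
    my $perl = '';
    for my $line (split /^/, $template) {
        if ($line =~ /^\s*%%(.*)$/s) {
            $perl .= $1;    # code line: keep the Perl (and its newline) as-is
            next;
        }
        # split with a capture group alternates literal, expr, literal, ...
        my @parts = split /<%(.*?)%>/s, $line;
        for my $i (0 .. $#parts) {
            if ($i % 2) {
                $perl .= "print( $parts[$i] );\n";    # odd index: expression
            }
            elsif (length $parts[$i]) {
                # single-quote the literal, escaping backslashes and quotes
                (my $lit = $parts[$i]) =~ s/([\\'])/\\$1/g;
                $perl .= "print '$lit';\n";
            }
        }
    }
    return $perl;
}

my $perl = compile_template(<<'TPL');
%% for my $n (1 .. 2) {
<li><% $n %></li>
%% }
TPL
```

Each literal chunk and each expression becomes its own print statement, which keeps the generator trivial. An expression with a trailing semicolon such as <% $x; %> would still produce broken Perl; handling that takes more work.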
That was keen, but it immediately introduced another issue: the template always output to STDOUT, so I couldn't capture the output and send it somewhere else. Yes, I could've redirected STDOUT and captured it, but that would've been a nuisance. I also wasn't sure whether it would be error prone, maybe with threads or other output or something. And I sure didn't want to require someone else to do the capture and redirect, so I needed a different approach.
I opted for a different filehandle, which I just called OUT. I'd then capture and redirect its output as desired.
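A sketch of that arrangement, assuming Perl 5.8 or later for in-memory filehandles: the compiled code prints to a bareword OUT handle, and a hypothetical run_compiled helper localizes OUT and opens it on a scalar, so the output lands in a string the caller can do anything with.

```perl
use strict;
use warnings;

# Hypothetical runner: compiled template code prints to OUT instead of
# STDOUT, and OUT is pointed at an in-memory scalar so the output can be
# captured and passed around.
sub run_compiled {
    my ($perl_code) = @_;
    my $buffer = '';
    local *OUT;    # don't clobber any OUT handle the caller already has
    open(OUT, '>', \$buffer) or die "can't open in-memory handle: $!";
    eval $perl_code;
    my $err = $@;
    close(OUT);
    die $err if $err;
    return $buffer;
}

my $html = run_compiled(q{print OUT "<p>", 1 + 1, "</p>\n";});
```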
Next issue I had was how to get variables into it. My templates needed data to display, after all. But...what's "into" the template? The template doesn't exist as executable code, so where should it live?
I didn't want to pull the variables into the calling package's namespace, because I was worried they would stomp on each other. In fact, that quickly expanded into not wanting to pull them into any calling package, for fear of clobbering something.
So I built up a complicated internal method to generate what I assumed to be a reasonably safe namespace for the template. It's multiple namespaces deep in the symbol table and uses the name of the file to construct it, in the hope of getting something unique that other templates couldn't clobber and that the user was unlikely to have used himself.
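One way to derive such a namespace, as a sketch: the root package My::TemplateSpace is a made-up name, and this is not the module's actual scheme. Each directory in the template path becomes one level of nesting, and characters that aren't legal in a package name are replaced.

```perl
use strict;
use warnings;

# Hypothetical root package; the real module uses its own naming scheme.
my $ROOT = 'My::TemplateSpace';

# Map a template path onto a nested package name, one package level per
# directory, so each template gets its own corner of the symbol table.
sub namespace_for {
    my ($path) = @_;
    my @segments = grep { length } split m{/}, $path;
    for (@segments) {
        s/\W/_/g;               # "index.tpl" -> "index_tpl"
        $_ = "_$_" if /^\d/;    # package name parts can't start with a digit
    }
    return join '::', $ROOT, @segments;
}

my $pkg = namespace_for('/site/news/index.tpl');
```

Replacing characters means a-b.tpl and a_b.tpl would land in the same package, so anything relying on uniqueness would want a reversible encoding instead.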
And while I was at it, I used a similar approach to generate an arbitrary scalar value to toss the template data into. Now I didn't need to worry about the filehandle at all.
It was then just a simple matter of writing an importer to import passed variables into the new namespace of the template. All said, I think I reached this point in development after around a week or so. It was pretty powerful and fast enough, and only took a week. That's a quick ROI, and no doubt one of the reasons that homegrown systems are so popular. Why spend a week learning something when you can spend a week writing your own?
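The importer can be a handful of lines, because assigning a reference to a typeglob aliases it into the matching slot. This sketch uses a hypothetical import_vars function and package name, and assumes the caller passes references, as the early versions required.

```perl
use strict;
use warnings;

# Hypothetical importer: alias each passed reference into the template's
# package. Assigning a reference to a glob fills the matching slot, so a
# scalar ref becomes $name, an array ref @name, a hash ref %name.
sub import_vars {
    my ($package, %vars) = @_;
    no strict 'refs';    # building glob names from strings is the point here
    for my $name (keys %vars) {
        *{"${package}::$name"} = $vars{$name};
    }
    return;
}

import_vars('My::TemplateSpace::demo', title => \'Rows', rows => [1, 2, 3]);

# Template code compiled into that package can now see the variables.
my $text = eval q{
    package My::TemplateSpace::demo;
    our ($title, @rows);
    "$title: @rows";
};
die $@ if $@;
```

Under strict, the compiled template code needs our declarations (or no strict 'vars') to see the imported package variables.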
Next, just reverse the quotes. Go through and change all <% %> quoting to %% quoting, then flip-flop everything so the code was unquoted and the output was quoted. It's fairly easy. That gives you a big string with a program in it, living in a particular namespace. Just eval it, look at the magic scalar for the output, and you're done.
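A sketch of the final step, under the assumption that the compiled code appends to a per-package output scalar rather than printing. Both run_in_package and the $__OUTPUT name are made up for illustration: a package statement is prepended so the code lives in the template's namespace, the string is eval'd, and the scalar is returned.

```perl
use strict;
use warnings;

# Hypothetical runner: compile the generated code into the template's own
# package, let it append to that package's $__OUTPUT, then hand it back.
sub run_in_package {
    my ($package, $body) = @_;
    my $code = "package $package; our \$__OUTPUT = '';\n$body\n; \$__OUTPUT;";
    my $result = eval $code;
    die $@ if $@;
    return $result;
}

my $html = run_in_package(
    'My::TemplateSpace::demo',
    q{for my $n (1 .. 3) { $__OUTPUT .= "<li>$n</li>"; }},
);
```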
Except it wasn't. There are all sorts of oddball little edge cases that pop up.
Yeah, yeah, I could've just required template authors to add their own semicolons, or to put only the value being output inside the tags, but I didn't like that. So I added code to insert semicolons and to allow additional statements before the value to be returned.
The initial pass broke horribly because I forgot that there were cases where you didn't want to add a semicolon (such as %% foreach my $val (@array) {), so I wrote up a more advanced parser to deal with those cases.
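The semicolon logic could start out as small as this hypothetical helper, which terminates a code line unless it already ends in a semicolon or a brace (so foreach ... { and } are left alone).

```perl
use strict;
use warnings;

# Hypothetical heuristic: append a semicolon to a %% code line unless it
# is blank or already ends in ";", "{" or "}".
sub terminate_code_line {
    my ($code) = @_;
    (my $trimmed = $code) =~ s/\s+\z//;
    return $trimmed if $trimmed eq '' || $trimmed =~ /[;{}]\z/;
    return "$trimmed;";
}
```

It still guesses wrong on lines like my $h = {}, which is why a real version ends up needing an actual parser.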
I then ran it as is for a few months.
The first stumbling block I ran into was when I wanted to use a sprintf in my code. Whoops. The pre-processor was looking for a % anywhere on the line as the start of a code block, so it was goofing up royally. I hastily changed my default code delimiter to '%%', knowing that it would still break on that string (such as the %% that sprintf uses for a literal %), but it was less likely to occur, so it was an acceptable risk. A few months later I tweaked it further so that %%s weren't an issue at all.
The next problem was that I ran into the (simplified) case <% ";" %>. Well, my simple little preprocessor would assume that you wanted to execute the code " (ending at the semicolon) and then output the code ". Obviously, Perl disagreed, so I needed to beef up my parser. Incidentally, that version of the parser is the one in the current release, but it only handles the cases of ";" and ';' and ignores q{;}, qq{;}, qw{;} and the like. It's fixed internally, but more on that later.
So it was running fine, but it was pretty slow. I needed to speed it up. The big bottleneck was the pre-processing step that flipped the quotes. But, I realized, the pre-processing output should always be the same (unless the template has changed), so I could just cache it to disk. I implemented a caching scheme to store it on disk and then, as the first step of pre-processing, look to see whether a valid copy already exists. If it does, great: use it. If not, pre-process and re-cache. Come to think of it, this may have been when I implemented the deeply nested packages approach, so I could have something easily written to disk that would always come out of the cache the same way, but this was years ago, so I'm foggy on the details.
I was now caching stuff and running along very fast. Awesome.
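A minimal sketch of the cache check, assuming the compiled Perl is stored in a separate file and that modification times are a good enough freshness test. load_compiled and its arguments are hypothetical; File::Temp is only there for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);    # only for the demo below

# Hypothetical cache check: reuse the compiled Perl if the cache file
# exists and is at least as new as the template; otherwise compile and
# rewrite the cache.
sub load_compiled {
    my ($template_path, $cache_path, $compile) = @_;
    if (-e $cache_path && (stat $cache_path)[9] >= (stat $template_path)[9]) {
        open my $in, '<', $cache_path or die "read $cache_path: $!";
        local $/;    # slurp the whole file
        return scalar <$in>;
    }
    open my $src, '<', $template_path or die "read $template_path: $!";
    my $source = do { local $/; <$src> };
    my $compiled = $compile->($source);
    open my $out, '>', $cache_path or die "write $cache_path: $!";
    print {$out} $compiled;
    close $out or die "close $cache_path: $!";
    return $compiled;
}

my $dir = tempdir(CLEANUP => 1);
my $tpl = "$dir/page.tpl";
open my $fh, '>', $tpl or die "write $tpl: $!";
print {$fh} "hello";
close $fh;

my $calls   = 0;
my $compile = sub { $calls++; return "print '$_[0]';" };
my $first   = load_compiled($tpl, "$tpl.cache", $compile);
my $second  = load_compiled($tpl, "$tpl.cache", $compile);    # from cache
```

Perl's stat reports whole seconds, so an edit saved in the same second as the cache write could be missed; hashing the source would be stricter.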
But it really sucked to have large code blocks in my templates. Since I'd chosen to use %% to start code and "\n" to end it, I had to prefix every line of a large code block with %%. This sucked. So I added an additional set of tags, <code> and </code>, to delimit large blocks of code.
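One way to support <code> blocks without touching the rest of the pipeline, sketched here as a hypothetical expand_code_blocks pass, is to rewrite each block into the existing line-based syntax by prefixing every line inside it with %% before the normal pre-processing runs.

```perl
use strict;
use warnings;

# Hypothetical pass: turn <code> ... </code> into %%-prefixed lines so the
# existing line-oriented pre-processor handles them unchanged.
sub expand_code_blocks {
    my ($template) = @_;
    $template =~ s{<code>(.*?)</code>}{
        my $body = $1;
        join '', map { "%%$_" } split /^/, $body;
    }gse;
    return $template;
}

my $out = expand_code_blocks(
    "<code>\nmy \$x = 1;\nmy \$y = 2;\n</code>\n<p><% \$x + \$y %></p>\n"
);
```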
Next I wanted to embed templates. By this point, I had large blocks of common code duplicated all over my website, and I wanted to get rid of that. So I needed to embed them.
I opted for <& &> for my embed syntax and quickly determined that I wanted two different approaches - sometimes it'd be keen to hand in additional variables to the subtemplate, and sometimes it'd be keen to just have it inside the calling template's environment and inherit everything currently in scope. I'm sure I had a really really good reason for it at the time, but I can't quite remember it now.
<& /path/to/other/template.tpl { 'arg1' => \'val1', 'arg2' => \'val2' } &> or <& /path/to/other/template.tpl &>
So that was created with different rules to allow for the two different approaches. It also unearthed a nuisance: passing in variables always had to be done by reference. It had always been there, but hidden a few layers down, so I rarely encountered it. The second issue was that the key was a simple string without a sigil, so I couldn't pass $msg and @msg into the same namespace; I'd have to alias one of them.
This led me to allow simple scalar values to be passed without a reference, and to allow an optional sigil on the variable key.
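A sketch of what that normalization might look like (normalize_args is a hypothetical name): keys may carry a $, @ or % sigil, a bare key is treated as a scalar, and a plain non-reference value for a scalar is wrapped in a reference automatically.

```perl
use strict;
use warnings;

# Hypothetical normalization of embed arguments: optional sigils on keys,
# bare keys default to scalars, plain scalar values get wrapped in a ref.
sub normalize_args {
    my (%args) = @_;
    my %normalized;
    for my $key (keys %args) {
        my ($sigil, $name) = $key =~ /^([\$\@\%]?)(\w+)\z/
            or die "bad variable name '$key'";
        my $value = $args{$key};
        $sigil ||= '$';
        if ($sigil eq '$' && !ref $value) {
            my $copy = $value;    # callers no longer need to pass \'val'
            $value = \$copy;
        }
        $normalized{"$sigil$name"} = $value;
    }
    return \%normalized;
}

my $n = normalize_args(msg => 'hi', '@msg' => ['a', 'b'], '$count' => \3);
```

One ambiguity remains: a reference passed under a scalar key is assumed to be the thing to alias, so handing in a hashref as $config needs a reference to the reference.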
It also introduced the problem of needing to know exactly where these templates live. Previously, with just a single template, only its cgi needed to know where it was, and it was usually in the same directory. But now? Templates could talk to templates in other directories? How does it find them? Relative paths? Relative to what? Where does this template exist, anyway?
This led to the addition of a template root to allow for absolute paths that weren't server dependent (paths from the server's root would've been horrible, since I needed it to work in different directory locations on my machine vs. my host). That way, I could always use absolute paths and have them work, without worrying about portability.
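A sketch of the resolution rule (the resolve_template name and the Unix-style paths are assumptions): a leading / means relative to the configured template root, never the filesystem root, and anything else is relative to the directory of the template doing the embedding.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical resolver: "/x.tpl" is rooted at the template root, so the
# same embed works wherever the site is installed.
sub resolve_template {
    my ($template_root, $path, $current_dir) = @_;
    if ($path =~ m{^/}) {
        (my $rel = $path) =~ s{^/+}{};    # drop the leading slash(es)
        return File::Spec->catfile($template_root, $rel);
    }
    return File::Spec->catfile($current_dir, $path);
}
```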
As my templates were getting more complex, around this time I also added template comments, <# comment #>, to allow for better documentation. They're better than HTML comments because the processor strips them out before anything is displayed to the user.
As a further attempt to optimize, I added in an optional flag to compress whitespace, since it's not necessary in HTML anyway.
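Both passes are simple substitutions. In this sketch (function names hypothetical), comments are stripped from the template source before compilation, and whitespace compression runs on the rendered output.

```perl
use strict;
use warnings;

# Hypothetical pass: remove <# ... #> comments, even multi-line ones.
sub strip_comments {
    my ($template) = @_;
    $template =~ s/<#.*?#>//gs;
    return $template;
}

# Hypothetical pass: squeeze every run of whitespace to a single space.
sub compress_whitespace {
    my ($html) = @_;
    $html =~ s/\s+/ /g;
    return $html;
}
```

Blind compression would also mangle <pre> and <textarea> contents, one reason to keep it optional.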
Since this was adding in a lot of additional stuff to the template, I added in a simple debugging syntax as well, above and beyond doing %% print STDERR...;, especially since it theoretically decoupled the debugger from STDERR w/o resorting to redirection of the handle. Note that I've never implemented anything like that in practice, it always just goes to STDERR.
It then ran and worked for a while.
I introduced another optimization that allowed embedded templates to have their pre-processed code stored inside the pre-processed parent: <&+ /path/to/template.tpl +&>. This skipped the step of firing up another template object, looking for a pre-cached copy, and returning and executing it. It all just ran inline in the parent as if it had always been there. But it lost the ability to look for changes in the subtemplate. So if you changed the subtemplate, the supertemplate wouldn't automatically reflect the change. You needed to blow away the cache or flag the supertemplate as changed. Naturally, this is an optimization to be used sparingly: either when speed really really really counts or when the embedded template rarely (if ever) changes.
But there was still a lot of stuff I kept duplicating, such as HTML escapes and URL escapes. I handled those by exposing the template object inside the template so I could call methods on it: <% $self->html_escape($value) %>. But I didn't like having to type that all the time, so I finally diverged from my "pure perl" approach by adding in pipes.
The pipe syntax is easy - just end off a variable with a pipe and a pre-defined token (extensible via subclasses) to internally translate into a method call. Instead, I'd now just type <% $value | h %>. Much simpler, but I never really liked the break from being true perl. I justified it to myself figuring that the | h was part of the template quoting not the perl, but never really convinced myself.
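A sketch of the pipe rewrite, assuming a hypothetical %PIPES table mapping tokens to method names and a stand-in My::Template class with an html_escape method. A trailing | token inside <% %> becomes a method call on $self; anything else passes through untouched.

```perl
use strict;
use warnings;

package My::Template;    # hypothetical stand-in for the template class
sub new { return bless {}, shift }
sub html_escape {
    my ($self, $s) = @_;
    my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
               '"' => '&quot;', "'" => '&#39;');
    $s =~ s/([&<>"'])/$ent{$1}/g;
    return $s;
}

package main;

# Hypothetical pipe table; subclasses could add entries.
my %PIPES = (h => 'html_escape');

# Rewrite "expr | token" into "$self->method(expr)" for known tokens.
sub expand_pipe {
    my ($expr) = @_;
    if ($expr =~ /^\s*(.*?)\s*\|\s*(\w+)\s*\z/s && exists $PIPES{$2}) {
        return "\$self->$PIPES{$2}($1)";
    }
    return $expr;
}

my $self    = My::Template->new;
my $value   = '<b>&</b>';
my $code    = expand_pipe(' $value | h ');
my $escaped = eval $code;
die $@ if $@;
```

The exists check keeps something like $x || default from being rewritten, and it is also where the break from true Perl shows: a genuine bitwise | followed by a bareword that happens to match a token would be misread.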
Recently (within the last week, not yet released), I started needing a few additional things. First of all, I finally fixed the bug with the <% q{;} %> (ed- fixed typo) syntax. I spent about a day mulling it over and trying to think about how to fix it with a fancy parser or something, but wasn't convinced I could catch all cases. Instead, I opted to wrap everything in those tags inside an anonymous subroutine, execute it, and display the results. This naturally takes advantage of perl's functions not requiring the "return" keyword.
update - I took advantage of Jenda's awesome suggestion down below and updated the internal build to use do {} blocks instead of anonymous subs. It loses the ability to type <% return $val %>, but I can live with it, it's a heck of an improvement otherwise.
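A sketch of the do {} version, with compile_expression as a hypothetical name: the expression text is dropped verbatim into a do block, so Perl's own parser deals with ";", q{;}, and multi-statement snippets, and the block's last value is what gets printed.

```perl
use strict;
use warnings;

# Hypothetical compile step for <% expr %>: no semicolon parsing at all,
# Perl evaluates the whole snippet and prints the block's last value.
sub compile_expression {
    my ($expr) = @_;
    return "print OUT do { $expr };\n";
}

my $buf = '';
open(OUT, '>', \$buf) or die "in-memory open failed: $!";
for my $expr (q{ q{;} }, q{ my $x = 2; $x * 21 }, q{ ";" }) {
    eval compile_expression($expr);
    die $@ if $@;
}
close(OUT);
```

Because the generated code is eval'd as a string, a return inside the do block would return from that eval rather than produce a value, which matches the trade-off described above.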
I also had another problem - and that was that I had no way to generate snippets of text and pass them around in the template. I'd need to do that in the CGI layer with an extra method then hand in the pre-processed data to the template. But this didn't scale well (for example, loops in the template). Besides, I now had my output coming from two different places.
So this called for another divergence from perl syntax as I added in an embed redirect, <& /path/to/template.tpl >> $variable &>, to instead stuff the output into $variable, which could then be handed around as desired.
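A sketch of how the redirect form might compile, with compile_embed and a render method on $self both hypothetical: with >> $variable the embedded template's output is assigned to the variable, otherwise it is printed to OUT as before. Argument hashes and paths containing quotes are ignored here to keep it short.

```perl
use strict;
use warnings;

# Hypothetical compile step for <& path &> and <& path >> $var &>.
sub compile_embed {
    my ($tag) = @_;
    my ($path, $target) = $tag =~ /^<&\s*(\S+)\s*(?:>>\s*(\$\w+)\s*)?&>\z/
        or die "can't parse embed tag: $tag";
    return defined $target
        ? "$target = \$self->render('$path');\n"    # capture into variable
        : "print OUT \$self->render('$path');\n";   # normal inline output
}
```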
This brings us up to current, where my "simple little template package" stands at 1,345 lines (counting documentation) and has been actively worked on for 4 years.