dmorgo has asked for the wisdom of the Perl Monks concerning the following question:

Here's a slightly improved heredoc idiom. Not my invention, but I forget where I got it. Probably from here on perlmonks. Shown here in the context of a subroutine, but it doesn't have to be in that context:
sub get_content {
    my $name = shift;
    my $content;
    ($content = <<"    END") =~ s/^\s{1,4}//gm;
    <p>$name</p>
    <ul>
      <li>foo</li>
      <li>bar</li>
    </ul>
    END
    return $content;
}

print get_content("stuff");
I like this heredoc idiom because it keeps things neat, both with the nice indenting in the Perl code, and also, when it's generating HTML/XML/FooML, it lets me keep the generated text neat as well, since it doesn't remove all leading whitespace, just the perl-code-level indentation.

But what I don't like is that it makes the code brittle in the face of possible changes to the indentation at the beginning. The regular expression part is not the problem. The problem is the match on "    END", which is set in this example to start with four literal spaces.

If someone comes along with an editor that converts four spaces into tabs, the code silently breaks. Not good. Or if someone changes the indentation to two spaces instead of four, it breaks as well.

Is there any better way to specify the "    END" so it will match in a more robust way?

By the way, I do realize that there are other ways to generate HTML (CGI, templates, etc.) but my question is not about how to generate HTML. The HTML shown is just a contrived example. The question is really about better ways to do heredocs.

Update: see the new technique below achieving the same ends (and even better, thanks to Jenda's idea to make the indent size generic) using qq().

Replies are listed 'Best First'.
Re: Better heredocs?
by kyle (Abbot) on Oct 30, 2007 at 20:02 UTC

    I've seen this idiom before and really dislike it for exactly the reasons you describe. My only suggestion, if you want to have a sort of "indented delimiter", is to convert your delimiter spaces to underscores. Using your example, you'd have:

        ($content = <<"____END") =~ s/^\s{1,4}//gm;
        <p>$name</p>
        <ul>
          <li>foo</li>
          <li>bar</li>
        </ul>
    ____END
        return $content;

    This is resistant to the tab/space problem. On the other hand, it's not quite as clean, and you still have to adjust the regex when the indentation changes.

Re: Better heredocs?
by doom (Deacon) on Oct 30, 2007 at 20:52 UTC
    You might've picked this up from the Perl Cookbook. It's recipe 1.16 of the second edition, and there are a few other embellishments in the discussion section that you might be interested in.

    Myself, I recommend just giving up. This is a piece of fugliness in Perl syntax, and we just need to live with the pain (until Perl 6 arrives).

    It looks like you're already handling the problem the way I do: I put all my heredocs in wrapper subs that do nothing but return the string... and all of those subs get segregated at the bottom of the file under a comment that says ### heredoc ghetto.

Re: Better heredocs?
by Jenda (Abbot) on Oct 31, 2007 at 14:16 UTC

    I don't really mind that the end marker doesn't line up with the code that follows; I'd rather see where the string literal ends and normal Perl code starts. What I would definitely not like to have to do is update the s/^\s{...}//gm whenever I indent the code some more. Plus I'd generally like to indent the text one level more than the assignment. Which means that if I did care enough, I'd probably use something like:

    sub foo {
        if (1) {
            unIndent(my $content= <<'*END*');
                foo
                bar
                baz
                <html>
                  <body>
                  </body>
                </html>
    *END*
            print $content;
        }
    }

    foo();

    sub unIndent {
        my ($indent) = ($_[0] =~ /^([ \t]+)/);
        $_[0] =~ s/^$indent//gm;
    }
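
The unIndent idea can also be sketched as a function that returns a new string instead of modifying its argument through the @_ alias. This is only an illustration, not code from the thread: the name dedent is hypothetical, and the guard for an unindented first line is an added assumption about how such input should be handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): like unIndent, but it returns
# a new string and leaves the caller's variable untouched.
sub dedent {
    my ($text) = @_;

    # Capture the leading spaces/tabs of the first line, if there are any.
    my ($indent) = $text =~ /^([ \t]+)/;

    # Assumption: text whose first line is not indented is returned as-is.
    return $text unless defined $indent;

    # Remove exactly that prefix from every line that starts with it.
    # Lines indented deeper keep their extra whitespace.
    $text =~ s/^\Q$indent\E//gm;
    return $text;
}

my $raw = "        foo\n          bar\n        baz\n";
my $out = dedent($raw);
print $out;
```

Because the indent is measured from the first line, re-indenting the surrounding code by any amount needs no change to the regex, which is the property Jenda's version is after.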
Re: Better heredocs?
by dmorgo (Pilgrim) on Nov 02, 2007 at 15:57 UTC
    Since writing this, I've figured out a way to do indented qq() code:
    sub get_content {
        my $n = shift;
        my $content = qq(
        <html>
          <foo>
            <bar>blah</bar>
          </foo>
        </html>);
        $content=~s{(?:^[\n\r]\s{4}|[\n\r]+\s{4})(.*)}{$1\n}gm;
        return $content;
    }
    But then I looked at Jenda's offering, and decided to marry the two approaches:
    sub get_content {
        my $n = shift;
        my $content = qq(
        <html>
          <foo>
            <bar>blah</bar>
          </foo>
        </html>);
        unIndent($content);
        return $content;
    }

    sub unIndent {
        $_[0] =~ s/^[\n\r]*//;
        my ($indent) = ($_[0] =~ /^([ \t]+)/);
        $_[0] =~ s/^$indent//gm;
    }
    Works great! Keep polishing that rice bowl.
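
For readers on Perl 5.26 or later (an assumption about your environment; the thread predates that release), the language itself offers an indented heredoc form, `<<~`. A minimal sketch of the original example using it follows. As far as I understand the feature, the closing delimiter may be indented and its indentation is stripped from every line, so neither a whitespace-sensitive terminator nor a hand-written regex is needed.

```perl
use strict;
use warnings;
use 5.026;    # the <<~ form is a syntax error on older perls

sub get_content {
    my $name = shift;

    # The terminator may be indented; deeper indentation inside the
    # text (the <li> lines here) is preserved relative to it.
    my $content = <<~"END";
        <p>$name</p>
        <ul>
          <li>foo</li>
          <li>bar</li>
        </ul>
        END
    return $content;
}

print get_content("stuff");
```

Tabs and spaces are still matched literally in this form, so an editor that retabs only some of the lines can still cause trouble; it removes the need for the regex, not the need for consistent whitespace.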