comment on

Have you ever found yourself working on code with a lot of here-documents and wished you could read the here-documents outside the context of the source file?

I'm working on a CGI application with a lot of here-docs, and this little chunk of code is proving itself useful. The output includes tags with the subroutine name (if applicable, and line number) in which the here-document was found, and BEGIN:line, END:line tags (planning ahead for other uses of the output).

Running the code without a filename argument dups the DATA filehandle and runs the program against the test input in the __END__ section. The output from the test input:

[here-docs in source file: <DATA>]


@@sub:bar:1@@

@@BEGIN:2@@
print <<'FOO';
blah
FOO
@@END:4@@


@@sub:ralph:7@@

@@BEGIN:7@@
sub ralph { print <<EOF }
This is a test!
Here's some more text.
EOF
@@END:10@@

@@BEGIN:13@@
$foo = <<STRING;
blah blah blah blah.
how's that?
STRING
@@END:16@@
[download]

Follow the Read More link for the code:

#!/usr/bin/perl

# usage: grep.html-here-docs filename
# usage: grep.html-here-docs
# (without filename argument, runs
# against test data in the __END__ section)

# grep out all here-documents:
# prints the source filename at the top of the output.
# prints the last subroutine definition name seen
# for each here-document.

use warnings;
use strict;

use re 'eval';

my $R_SP     = qq{[\x20\t]};
my $R_QUOTE  = qq{[\'\"]};

my $TERM = '';
my $p = qr!
    ^                       # at beginning of line
        [^\x23]*?           # match one or more non-comment chars
        (?:
            print ${R_SP}*          # match print, one or more space
            |                       # OR
            \w+ ${R_SP}* = ${R_SP}* # assignment
        )
        << ${R_SP}*         # begin here-doc, zero or more spaces
        (${R_QUOTE}?)       # an optional quote character
        (\w+)               # match/capture one or more word character
+s
        (?(1) \1 )          # if a quote was matched, look for another
+.
        (?(2) (?{$TERM = $2}) )
!x;

my @sub;
my $new_here_doc = 1;

if (@ARGV) {
    open FH, $ARGV[0] or die "error opening input: $!";
} else {
    open FH, "<&DATA" or die "error duping DATA!: $!";
}

while (<FH>) {
    if (1 == $.) { print "[here-docs in source file: @{[@ARGV ? $ARGV[
+0] : '<DATA>']}]\n\n" }
    if (/^[ \t]*sub[ \t]*(\w+)/) {
        @sub = ($1,$.);
    }
    if (my $s = /$p/../^${TERM}/) {
        if ((1 == $s) && @sub) {
            print "\n\@\@sub:$sub[0]:$sub[1]\@\@\n\n";
            @sub = ();
        }
        if ($new_here_doc) {
            print "\@\@BEGIN:$.\@\@\n";
            $new_here_doc = 0;
        }
        print;
        if ($s =~ /E0$/) {
            print "\@\@END:$.\@\@\n\n";
            $TERM = '';
            $new_here_doc = 1;
        }
    }
}

close FH;

__END__
sub bar {
print <<'FOO';
blah
FOO
}

sub ralph { print <<EOF }
This is a test!
Here's some more text.
EOF

# try matching an assignment:
$foo = <<STRING;
blah blah blah blah.
how's that?
STRING
[download]

Update: Fixed thinko in error msg when dup DATA filehandle fails

In reply to Here-document extraction by converter

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.