Have you ever found yourself working on code with a lot of here-documents and wished you could read the here-documents outside the context of the source file?

I'm working on a CGI application with a lot of here-docs, and this little chunk of code is proving itself useful. The output includes tags with the subroutine name (if applicable, and line number) in which the here-document was found, and BEGIN:line, END:line tags (planning ahead for other uses of the output).

Running the code without a filename argument dups the DATA filehandle and runs the program against the test input in the __END__ section. The output from the test input:

[here-docs in source file: <DATA>] @@sub:bar:1@@ @@BEGIN:2@@ print <<'FOO'; blah FOO @@END:4@@ @@sub:ralph:7@@ @@BEGIN:7@@ sub ralph { print <<EOF } This is a test! Here's some more text. EOF @@END:10@@ @@BEGIN:13@@ $foo = <<STRING; blah blah blah blah. how's that? STRING @@END:16@@

Follow the Read More link for the code:

#!/usr/bin/perl # usage: grep.html-here-docs filename # usage: grep.html-here-docs # (without filename argument, runs # against test data in the __END__ section) # grep out all here-documents: # prints the source filename at the top of the output. # prints the last subroutine definition name seen # for each here-document. use warnings; use strict; use re 'eval'; my $R_SP = qq{[\x20\t]}; my $R_QUOTE = qq{[\'\"]}; my $TERM = ''; my $p = qr! ^ # at beginning of line [^\x23]*? # match one or more non-comment chars (?: print ${R_SP}* # match print, one or more space | # OR \w+ ${R_SP}* = ${R_SP}* # assignment ) << ${R_SP}* # begin here-doc, zero or more spaces (${R_QUOTE}?) # an optional quote character (\w+) # match/capture one or more word character +s (?(1) \1 ) # if a quote was matched, look for another +. (?(2) (?{$TERM = $2}) ) !x; my @sub; my $new_here_doc = 1; if (@ARGV) { open FH, $ARGV[0] or die "error opening input: $!"; } else { open FH, "<&DATA" or die "error duping DATA!: $!"; } while (<FH>) { if (1 == $.) { print "[here-docs in source file: @{[@ARGV ? $ARGV[ +0] : '<DATA>']}]\n\n" } if (/^[ \t]*sub[ \t]*(\w+)/) { @sub = ($1,$.); } if (my $s = /$p/../^${TERM}/) { if ((1 == $s) && @sub) { print "\n\@\@sub:$sub[0]:$sub[1]\@\@\n\n"; @sub = (); } if ($new_here_doc) { print "\@\@BEGIN:$.\@\@\n"; $new_here_doc = 0; } print; if ($s =~ /E0$/) { print "\@\@END:$.\@\@\n\n"; $TERM = ''; $new_here_doc = 1; } } } close FH; __END__ sub bar { print <<'FOO'; blah FOO } sub ralph { print <<EOF } This is a test! Here's some more text. EOF # try matching an assignment: $foo = <<STRING; blah blah blah blah. how's that? STRING

Update: Fixed thinko in error msg when dup DATA filehandle fails


In reply to Here-document extraction by converter

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.