reaper9187 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have a text file which contains a lot of information, most of it unimportant. However, there is a block of text that I need to parse which contains information in a tabular format as shown below.
INFO START
STIME ETIME COLUMN3 COLUMN4 COLUMN5
aaaa1 bbb1 ccc1 ddd1 eee1
aaaa2 bbb2 ccc2 ddd2 eee2
aaaa3 bbb3 ccc3 ddd3 eee3
aaaa4 bbb4 ccc4 ddd4 eee4
END
The sample output should be a table (a hash, maybe?) that maps each element to its corresponding row as follows:
aaaa1:bbb1 ccc1
aaaa1:bbb1 ddd1
aaaa1:bbb1 eee1
aaaa2:bbb2 ccc2
aaaa2:bbb2 ddd2
aaaa2:bbb2 eee2
aaaa3:bbb3 ccc3
aaaa3:bbb3 ddd3
aaaa3:bbb3 eee3

Re: Extract table from a block of text
by choroba (Cardinal) on Sep 21, 2014 at 07:21 UTC
    The flip-flop operator can tell you whether you're between the given lines. No need to hash anything as the output depends on the current line only.
    #!/usr/bin/perl
    use warnings;
    use strict;

    while (<DATA>) {
        if (my $line = /^INFO START$/ .. /^END$/) {
            next if /^$/            # Skip empty lines.
                 or $line =~ /E/    # Skip the END line.
                 or 1 == $line      # Skip the START line.
                 or /^STIME/;       # Skip the header.
            my ($stime, $etime, @cols) = split;
            print "$stime:$etime\t$_\n" for @cols;
            print "\n";
        }
    }

    __DATA__
    ... ignore ...
    INFO START
    STIME ETIME COLUMN3 COLUMN4 COLUMN5
    aaaa1 bbb1 ccc1 ddd1 eee1
    aaaa2 bbb2 ccc2 ddd2 eee2
    aaaa3 bbb3 ccc3 ddd3 eee3
    aaaa4 bbb4 ccc4 ddd4 eee4
    END
    ... ignore again ...
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Hi Choroba,

      As a side note:

      Instead of parsing the sequence number $line you could apply the technique described in Re^4: grep trouble (body of flip-flop range) to skip the boundaries of the flip and the flop. :)

      Cheers Rolf

      (addicted to the Perl Programming Language and ☆☆☆☆ :)

      UPDATE

      in hindsight it's a bad idea to use something like:

      #!/usr/bin/perl
      use warnings;
      use strict;

      while (<DATA>) {
          if (/^INFO START$/ .. /^END$/ and not //) {
              print "$_";
          }
      }

      __DATA__
      ... ignore ...
      INFO START
      STIME ETIME COLUMN3 COLUMN4 COLUMN5
      aaaa1 bbb1 ccc1 ddd1 eee1
      aaaa2 bbb2 ccc2 ddd2 eee2
      aaaa3 bbb3 ccc3 ddd3 eee3
      aaaa4 bbb4 ccc4 ddd4 eee4
      END
      ... ignore again ...

      While it does only print the inner range ...

      STIME ETIME COLUMN3 COLUMN4 COLUMN5
      aaaa1 bbb1 ccc1 ddd1 eee1
      aaaa2 bbb2 ccc2 ddd2 eee2
      aaaa3 bbb3 ccc3 ddd3 eee3
      aaaa4 bbb4 ccc4 ddd4 eee4

      ... the empty match // (i.e. match again with the last successfully matched regular expression) is easily broken by any other regex that succeeds within the if-branch. :-/

      The usual trap of global dependencies!
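
One way to sidestep the dependency on // is to name the two boundary patterns and test them explicitly. What follows is only a minimal sketch, assuming the same INFO START / END markers as in the thread; the helper name inner_lines and the variables $start and $end are made up for illustration. It costs up to two extra matches per line inside the block, but a regex succeeding inside the if-branch no longer changes the result.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the lines strictly between the markers.
sub inner_lines {
    my @lines = @_;
    my $start = qr/^INFO START$/;   # assumed start marker
    my $end   = qr/^END$/;          # assumed end marker
    my @inner;
    for (@lines) {
        # Flip-flop selects the block; the explicit tests drop both boundaries.
        if ((/$start/ .. /$end/) and !(/$start/ || /$end/)) {
            my $has_digit = /\d/;   # an extra successful match, harmless here
            push @inner, $_;
        }
    }
    return @inner;
}

print "$_\n" for inner_lines(
    '... ignore ...', 'INFO START', 'STIME ETIME COLUMN3',
    'aaaa1 bbb1 ccc1', 'END', '... ignore again ...',
);
```

With the sample lines above this should print only the header line and the one data row.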

      This works perfectly with the dummy data provided in the original post, but the regex that skips the END line might be a bit dangerous, because real data might contain an 'E'. In addition, if the file is large, it might be better to do a last, rather than a next, when the line with the END tag is met.
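
The suggestion to use last can be sketched roughly as follows. This is a hedged variant of the code above, not the original author's version: the helper name extract_rows is hypothetical, rows are collected instead of printed, and it assumes the file contains only one INFO START ... END block, since reading stops at the first END line.

```perl
use strict;
use warnings;

# Hypothetical helper: read from a filehandle and return "stime:etime\tcol" rows.
sub extract_rows {
    my ($fh) = @_;
    my @rows;
    local $_;                             # do not clobber the caller's $_
    while (<$fh>) {
        my $seq = /^INFO START$/ .. /^END$/;
        next unless $seq;                 # outside the block
        last if $seq =~ /E0$/;            # END line reached: stop reading
        next if $seq == 1                 # the INFO START line
             or /^\s*$/                   # empty lines
             or /^STIME\b/;               # the header
        my ($stime, $etime, @cols) = split;
        push @rows, map { "$stime:$etime\t$_" } @cols;
    }
    return @rows;
}

# Demo on an in-memory file; the sample text is taken from the question.
my $text = join "\n", '... ignore ...', 'INFO START',
    'STIME ETIME COLUMN3 COLUMN4 COLUMN5',
    'aaaa1 bbb1 ccc1 ddd1 eee1', 'END', '... ignore again ...', '';
open my $demo_fh, '<', \$text or die "Cannot open in-memory file: $!";
print "$_\n" for extract_rows($demo_fh);
```

Because the loop exits at END, anything after the block is never read, which is the point of last on a large file.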
        > but the regex to skip the END line might be a bit dangerous because real data might contain a E

        That's a misunderstanding: $line holds the flip-flop's sequence number, and only the last number of a range gets "E0" appended (like 7E0), i.e. iff the flip-flop terminates on that line.

        Has nothing to do with the END marker! :)
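
To make the sequence numbers visible, here is a small sketch that records what the flip-flop returns for each line. The helper name flip_flop_values and the shortened sample lines are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the flip-flop's value for every input line.
sub flip_flop_values {
    my @lines = @_;
    my @seen;
    for (@lines) {
        # Scalar context: empty string when false, a sequence number when true.
        my $seq = /^INFO START$/ .. /^END$/;
        push @seen, $seq;
    }
    return @seen;
}

my @demo = ('... ignore ...', 'INFO START', 'STIME ETIME COLUMN3',
            'aaaa1 bbb1 ccc1', 'END', '... ignore again ...');
my @vals = flip_flop_values(@demo);
printf "%-22s => '%s'\n", $demo[$_], $vals[$_] for 0 .. $#demo;
```

For these six lines the values should be '', 1, 2, 3, 4E0 and '' again, so an "E" in $line marks the end of the range regardless of what the data lines contain.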

        Cheers Rolf

        (addicted to the Perl Programming Language and ☆☆☆☆ :)