As the file is in several distinct sections, each with its own format, first break the file into those sections. As it's also a fairly small file, slurp it and split it on the section separator:

    ## Slurp the file and break into sections
    my @sections = split '-{103}', do{ local $/; <> };
    close *ARGV;
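To see the slurp-and-split idiom in isolation, here is a minimal sketch. The sample text, the 10-dash separator (the real file uses 103 dashes) and the in-memory filehandle are all assumptions made so that it runs without the box score file:

```perl
use strict;
use warnings;

# Hypothetical miniature input: three sections separated by a rule of
# 10 dashes (the real file uses a rule of 103 dashes).
my $rule = '-' x 10;
my $text = "game header\n$rule\nplayer stats\n$rule\ngoalie stats\n";

# Read from an in-memory filehandle so the sketch needs no file on disk.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

# Localising $/ (leaving it undefined) makes <$fh> return the whole
# file as one string, which split then cuts at each separator.
my @sections = split /-{10}/, do { local $/; <$fh> };
close $fh;

print scalar( @sections ), " sections\n";
print "[$_]\n" for @sections;
```

Each section keeps the newlines that surrounded the separator, which is why blank lines have to be discarded later on.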

Once you have the sections separated, you can treat each one differently.

Rather than counting all the spaces and manually constructing the unpack formats for them--which is a pain, and the layout might also change--notice that each section of stats is preceded by a header line, and that the left-hand edge of each column header forms a left-edge limit for the data in that column.

Also notice that although some of the column titles are multiple words, every title is preceded by at least two spaces, whilst the multi-word titles themselves contain only single spaces.

That information allows you to use the header lines to construct the unpack formats programmatically. The following subroutine takes a header line, uses it to discover the column boundaries, and then uses those to construct a format:

    sub buildFmt {
        my $templ = shift;
        my @cols;
        push @cols, $-[ 0 ] while $templ =~ m[(?<=\s\s)(?=\S)]gc;
        my $p = 0;
        my $fmt = '';
        $fmt .= 'A' . ( $_ - $p ) . ' ' and $p = $_ for @cols;
        return $fmt;
    }
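As a quick check of what buildFmt produces, the sketch below runs it on a made-up header and one made-up data row. The sprintf layout, the column titles and the row values are hypothetical, chosen only to include a multi-word title; they are not taken from the box score file:

```perl
use strict;
use warnings;

# buildFmt as given above, unchanged apart from layout.
sub buildFmt {
    my $templ = shift;
    my @cols;
    push @cols, $-[ 0 ] while $templ =~ m[(?<=\s\s)(?=\S)]gc;
    my $p = 0;
    my $fmt = '';
    $fmt .= 'A' . ( $_ - $p ) . ' ' and $p = $_ for @cols;
    return $fmt;
}

# Hypothetical layout: columns start at offsets 2, 5, 20, 24 and 30.
my $layout = '  %-3s%-15s%-4s%-6s%s';
my $header = sprintf $layout, '#', 'Name', 'G', 'PP G', '+/-';
my $row    = sprintf $layout, 17, 'Wray, Scott', 1, 2, '+1';

# 'PP G' counts as one title because it contains only a single space.
my $fmt    = buildFmt( $header );
my @fields = unpack $fmt, $row;

print "format: $fmt\n";
print join( ' | ', @fields ), "\n";
```

Two things are worth noticing in the result. The first field is whatever lies to the left of the first detected column (empty here). And, as written, the format stops where the last header column starts, so that final column is not returned; appending 'A*' to the format would be one way to capture it if it matters.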

This can be reused for all the columnised (sub)sections in the file.

By way of example, this is how to use it to break down the four subsections of the 'PLAYER GAME STATISTICS':

    ## Section 2
    ## Break the section into lines
    my @section2 = split "\n", $sections[ 1 ];

    ## discard header lines;
    shift @section2 for 1 .. 3;

    ## Away
    ## Construct the format from the header
    my $fmt = buildFmt( shift @section2 );

    ## Use it to parse the Away player game stats
    my @awayStats;
    push @awayStats, [ unpack $fmt, shift @section2 ]
        while $section2[ 0 ] =~ m[\S];
    print "@$_" for @awayStats;

    ## Discard blank lines
    shift @section2 while $section2[ 0 ] !~ m[\S];

    ## Away totals ... same two steps again
    $fmt = buildFmt( shift @section2 );
    my @awayTotals = unpack $fmt, shift @section2;
    print "@awayTotals";

    ## Discard blank lines
    shift @section2 while $section2[ 0 ] !~ m[\S];

    ## home ... and again
    $fmt = buildFmt( shift @section2 ); ## They could vary.
    my @homeStats;
    push @homeStats, [ unpack $fmt, shift @section2 ]
        while $section2[ 0 ] =~ m[\S];
    print "@$_" for @homeStats;

    ## Discard blank lines
    shift @section2 while $section2[ 0 ] !~ m[\S];

    ## Home totals ... and again
    $fmt = buildFmt( shift @section2 );
    my @homeTotals = unpack $fmt, shift @section2;
    print "@homeTotals";

Parsing the other sections (those with columnised data) is just a repeat of the above.
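Since the same steps repeat for every table (skip blank lines, build the format from the header, unpack rows until the next blank line), they could be wrapped in a helper. The following is only a sketch of that idea: parseTable is a hypothetical name that does not appear in the code above, and the sample lines are invented:

```perl
use strict;
use warnings;

sub buildFmt {
    my $templ = shift;
    my @cols;
    push @cols, $-[ 0 ] while $templ =~ m[(?<=\s\s)(?=\S)]gc;
    my $p = 0;
    my $fmt = '';
    $fmt .= 'A' . ( $_ - $p ) . ' ' and $p = $_ for @cols;
    return $fmt;
}

# Hypothetical helper: consumes one table from the front of @$lines
# and returns a reference to an array of rows (each an array ref).
sub parseTable {
    my $lines = shift;

    # Discard leading blank lines
    shift @$lines while @$lines and $lines->[ 0 ] !~ m[\S];
    return unless @$lines;

    # The first non-blank line is the header; build the format from it
    my $fmt = buildFmt( shift @$lines );

    # Unpack data rows until a blank line or the end of the input
    my @rows;
    push @rows, [ unpack $fmt, shift @$lines ]
        while @$lines and $lines->[ 0 ] =~ m[\S];
    return \@rows;
}

# Invented sample: two small tables separated by a blank line.
my $layout = '  %-3s%-15s%-4s%s';
my @lines = (
    sprintf( $layout, '#', 'Name', 'G', 'A' ),
    sprintf( $layout, 7,  'Alpha, Ann', 1, 0 ),
    sprintf( $layout, 12, 'Beta, Bob',  0, 2 ),
    '',
    sprintf( $layout, '#', 'Name', 'G', 'A' ),
    sprintf( $layout, 30, 'Gamma, Gil', 2, 1 ),
);

my $away = parseTable( \@lines );
my $home = parseTable( \@lines );

print join( ' | ', @$_ ), "\n" for @$away, @$home;
```

The `@$lines and ...` guards are an addition: they stop the loops cleanly when the input runs out, which the inline version does not check for.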

The code all together as far as I've taken it:

    #! perl -slw
    use strict;

    $"=' | ';

    sub buildFmt {
        my $templ = shift;
        my @cols;
        push @cols, $-[ 0 ] while $templ =~ m[(?<=\s\s)(?=\S)]gc;
        my $p = 0;
        my $fmt = '';
        $fmt .= 'A' . ( $_ - $p ) . ' ' and $p = $_ for @cols;
        return $fmt;
    }

    ## Slurp the file and break into sections
    my @sections = split '-{103}', do{ local $/; <> };
    close *ARGV;

    ## Section 1
    ## Left as an exercise

    # Section 2
    my @section2 = split "\n", $sections[ 1 ];
    shift @section2 for 1 .. 3; ## discard header lines;

    ## Away
    my $fmt = buildFmt( shift @section2 );
    my @awayStats;
    push @awayStats, [ unpack $fmt, shift @section2 ]
        while $section2[ 0 ] =~ m[\S];
    print "@$_" for @awayStats;

    ## Discard blank lines
    shift @section2 while $section2[ 0 ] !~ m[\S];

    ## Away totals
    $fmt = buildFmt( shift @section2 );
    my @awayTotals = unpack $fmt, shift @section2;
    print "@awayTotals";

    ## Discard blank lines
    shift @section2 while $section2[ 0 ] !~ m[\S];

    # home
    $fmt = buildFmt( shift @section2 ); ## They could vary.
    my @homeStats;
    push @homeStats, [ unpack $fmt, shift @section2 ]
        while $section2[ 0 ] =~ m[\S];
    print "@$_" for @homeStats;

    ## Discard blank lines
    shift @section2 while $section2[ 0 ] !~ m[\S];

    ## Home totals
    $fmt = buildFmt( shift @section2 );
    my @homeTotals = unpack $fmt, shift @section2;
    print "@homeTotals";

    ## Section3 ...

The output from the section 2 debugging lines:

    c:\test>588245 boxscore_340251.txt
    4 | Drevitch, Scott | | 1 | +1 | 4 | | | | | | | | | | | | |
    8 | Mann, Chris | | 1 | +1 | 2 | 4 | 2 | | | | | | | | | | |
    9 | Anderson, Erik | 1 | | +1 | 4 | | | | | | | | | | | | |
    11 | Lefebvre, Marc | | | -1 | | | | | | | | | | | | | |
    14 | Scott, Mark | | 1 | 0 | 5 | | | | | | | | | | | | |
    17 | Wray, Scott | | | 0 | 10 | 4 | 2 | | | | | | | | | | |
    18 | Lazarev, Yevgeny | 1 | | 0 | 1 | | | | | | | | | | | | |
    20 | Miller, Derek | | | -2 | 1 | | | | | | | | | | | | |
    21 | Fitzpatrick, Chans | | 1 | 0 | | 2 | 1 | | | | | | | | | | |
    22 | Cullaton, Brent | | 1 | +1 | 4 | | | | | | | | | | | | |
    23 | Kotsopoulos, Tommy | | 1 | -1 | 4 | 5 | | 1 | | | | | | | | | |
    24 | Morelli, David | 1 | | 0 | 2 | | | | | | | | | | | | |
    27 | Littlejohn, Frank | 1 | 1 | +1 | 3 | 6 | 3 | | | | | | 1 | | | | |
    28 | Lyke, R.C. | | | +1 | | | | | | | | | | | | | |
    29 | Gajda, Tyson | | | 0 | | | | | | | | | | | | | |
    30 | Tebbs, Kris | | | 0 | | | | | | | | | | | | | |
    32 | Tidball, Curtis | | | 0 | | | | | | | | | | | | | |
    44 | Lupul, Dale | | | -1 | 5 | | | | | | | | | | | | |
    TOTALS | 5 | 7 | +1 | 46 | 23 | 9 | 1 | | | | | 1 | | | 1 | +1 |
    2 | Lupandin, Andrei | | 3 | +1 | 1 | | | | | | | | | | | | |
    4 | Currie, Brent | | | -1 | | 2 | 1 | | | | | | | | | | |
    5 | Yoder, Jami | | | -1 | | 2 | 1 | | | | | | | | | | |
    7 | Radoslovich, Matt | 1 | | -1 | 2 | | | | | | | | | | | | |
    8 | Pilkington, Brett | | 1 | +1 | 4 | | | | | | | | | | | | |
    9 | Granbois, Travis | | | 0 | | | | | | | | | | | | | |
    12 | Nadeau, Patrick | | | -2 | 1 | | | | | | | | | | | | |
    13 | Parsons, Don | 2 | 1 | +2 | 4 | | | | | | | | 1 | | | | |
    14 | Starke, Sean | | | 0 | 3 | | | | | | | | | | | | |
    17 | Stewart, Blake | | 1 | -2 | 1 | | | | | | | | | | | | |
    18 | Chwedoruk, Justin | | | 0 | 3 | | | | | | | | | | | | |
    19 | Woollard, Chad | 1 | 1 | +2 | 3 | 2 | 1 | | | | | | | | | | |
    20 | Durdin, Sergei | | | -2 | 2 | | | | | | | | | | | | |
    24 | Harloff, Nick | | | 0 | 3 | 4 | 2 | | | | | | | | | | |
    25 | Stauffacher, Luke | | | 0 | 4 | 7 | 1 | 1 | | | | | | | | | |
    27 | Wathier, Mathieu | | | +1 | 3 | | | | | | | | | | | | |
    41 | Sikich, Zach | | 1 | 0 | | | | | | | | | | | | | |
    83 | Tapp, Jason | | | 0 | | | | | | | | | | | | | |
    TOTALS | 4 | 8 | -2 | 34 | 17 | 6 | 1 | | | | | 1 | | | | |


In reply to Re: parsing question by BrowserUk
in thread parsing question by smist
