Perl Newby has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file that contains the following:
CAUGHT STEALING
5602|Edgar Renteria|24|St. Louis Cardinals|1|3
PITCHING
24|St. Louis Cardinals
4379|Andy Benes|L, 2-2|0|6|8|3|3|0|0|1|24|31|41|72|4.41|.258
4625|Heathcliff Slocumb|0|0|1|0|0|0|0|0|0|4|3|8|11|4.08|.231
4979|Mike Mohler|0|0|1|0|0|0|1|3|0|4|8|9|17|8.79|.293

17|Cincinnati Reds
5346|Ron Villone|W, 3-1|0|6.2|5|2|2|3|4|2|28|42|59|101|4.73|.301
6174|Scott Williamson|SV, 2|0|2.1|2|0|0|0|3|0|9|10|25|35|2.53|.211
DOUBLE PLAYS
24|St. Louis Cardinals|2|Vina to Renteria to McGwire|Vina to Renteria to McGwire
I am trying to parse the PITCHING information and print it out as HTML. I can get the Cardinals info to parse out; however, I am unable to get the Reds as well. Any suggestions? I am using the following code:
open(INPUT,"c:/MLB_boxscore.TXT") or die "Can't open file";
print "<html><head><title>My page</title></head><body>";
print "<table>";
$pitching = 0;
while(<INPUT>) {
    if($pitching) {
        last if /^\s*$/;
        chomp;
        @LS = ();
        push @LS, split('\|',$_);
        print "<tr>";
        print "<td> @LS[1] </td>";
        print "<td> @LS[2] </td>";
        print "<td> @LS[3] </td>";
        print "<td> @LS[4] </td>";
        print "<td> @LS[5] </td>";
        print "<td> @LS[6] </td>";
        print "<td> @LS[7] </td>";
        print "<td> @LS[8] </td>";
        print "<td> @LS[9] </td>";
        print "<td> @LS[10] </td>";
        print "<td> @LS[11] </td>";
        print "<td> @LS[12] </td>";
        print "<td> @LS[13] </td>";
        print "<td> @LS[14] </td>";
        print "<td> @LS[15] </td>";
        print "<td> @LS[16] </td>";
        print "</tr>";
    }
    elsif(/^PITCHING/) { $pitching = 1}
}
print "</table></body></html>";
close INPUT;

Replies are listed 'Best First'.
Re: Parsing a Text File
by suaveant (Parson) on Apr 11, 2001 at 23:27 UTC
    Well, Newby... you've been throwing this one around a lot :)

    It's obvious you want a bunch of this data... you could do something like this:

    my %data;
    my $latest = '';
    while(<INPUT>) {
        chomp;
        if(/^([A-Z]+)\s*/) { #a line with all caps... new section
            $latest = $1; #what section are we in
            next; #go to next line
        }
        if($latest) {
            ## Next line handles $data{$latest} as an anonymous
            ## array, allowing you to store all the lines for
            ## each section that way
            push @{$data{$latest}}, $_; #just puts the line in, you could process it somehow
        }
    }
    Now $data{PITCHING} holds a ref to an array of all the lines in the PITCHING section... etc, etc... that help?
                    - Ant
      if(/^([A-Z]+)\s*/)
      The final \s* is meaningless here; because it can match zero characters, it always succeeds.

      Also, based on the data set, he may want to use whitespace in his hash keys. How about this?

      if (/^([A-Z][A-Z\s]*)/)
      I decided to make sure that the first character is uppercase, but allow whitespace after that.

      buckaduck

        Oops, I meant...
        if(/^([A-Z]+)\s*$/)
        That keeps it from matching a line that starts with a capital letter but has other stuff after it... while still allowing whitespace after the marker, just in case.
                        - Ant
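To see how the three patterns in this subthread differ, here is a small sketch that runs each one against lines taken from the sample data. The `header_name` helper and the labels in `%patterns` are made up for illustration; they are not part of anyone's posted code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the captured section name, or undef if
# the pattern does not match the line.
sub header_name {
    my ($pattern, $line) = @_;
    return $line =~ $pattern ? $1 : undef;
}

my %patterns = (
    original  => qr/^([A-Z]+)\s*/,      # suaveant's first version
    anchored  => qr/^([A-Z]+)\s*$/,     # suaveant's correction
    buckaduck => qr/^([A-Z][A-Z\s]*)/,  # allows whitespace in the name
);

for my $line ('PITCHING', 'DOUBLE PLAYS', '24|St. Louis Cardinals') {
    for my $name (sort keys %patterns) {
        my $got = header_name($patterns{$name}, $line);
        printf "%-10s %-26s => %s\n", $name, "'$line'",
            defined $got ? "'$got'" : 'no match';
    }
}
```

On 'DOUBLE PLAYS' the original pattern captures only 'DOUBLE', the anchored one does not match at all, and buckaduck's captures the whole name. Neither the original nor buckaduck's pattern is anchored at the end, so both would also match a data line that happened to start with a capital letter; in the sample every data line starts with a digit, so that does not come up.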
(jeffa) Re: Parsing a Text File
by jeffa (Bishop) on Apr 11, 2001 at 23:17 UTC
    The reason is that when this line:
    last if /^\s*$/;
    hits the blank line after the Cardinals, the loop exits before it ever reaches the Reds. So change that line to:
    last if /^END PITCHING/;
    and add 'END PITCHING' on a line by itself after the last line for the Reds.

    Also, you can get rid of all 16 TD prints with this one line:

    print "<td>$_</td>" foreach @LS;
    UPDATE: Oops, you don't want to print the first element, so use this:
    print "<td>" . $LS[$_] . "</td>" for (1..$#LS);
    And good advice, suaveant!

    Jeff

    R-R-R--R-R-R--R-R-R--R-R-R--R-R-R--
    L-L--L-L--L-L--L-L--L-L--L-L--L-L--
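For illustration, the row-printing idea above can be wrapped up as a small function. This is only a sketch: `row_html` is a made-up name, and the `-1` limit passed to `split` is an addition here (not in the posted code) so that trailing empty fields are not silently dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one pipe-delimited line into an HTML table row,
# skipping the first field (the numeric ID), as in the UPDATE above.
sub row_html {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\|/, $line, -1;   # -1 keeps trailing empty fields
    return '<tr>'
         . join('', map { "<td>$_</td>" } @fields[1 .. $#fields])
         . '</tr>';
}

print row_html("6174|Scott Williamson|SV, 2|0|2.1"), "\n";
```

A line with no pipes at all (a section header, say) has nothing after its first field, so it comes back as an empty `<tr></tr>`.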
    
      Another way without changing the text might be
      last if /^[A-Z]+$/;
      which will make it stop as soon as it hits a line that only has capital letters in it.
                      - Ant
        Thanks for the help. I changed my code up and it is doing what I had intended; however, it is picking up the first line of DOUBLE PLAYS. Any suggestions would be appreciated.
        open(INPUT,"c:/MLB_boxscore.TXT") or die "Can't open file";
        print "<html><head><title>My page</title></head><body>";
        print "<table>";
        $pitching = 0;
        while(<INPUT>) {
            if($pitching) {
                last if /^[A-Z]+$/;
                chomp;
                @LS = ();
                push @LS, split('\|',$_);
                print "<tr>";
                print "<td>" . $LS[$_] . "</td>" for (1..$#LS);
                print "</tr>";
            }
            elsif(/^PITCHING/) { $pitching = 1}
        }
        print "</table></body></html>";
        close INPUT;
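One likely reason, going by the sample data: DOUBLE PLAYS contains a space, so /^[A-Z]+$/ never matches it and the loop runs on into that section. Below is a sketch of one way around it. It assumes section headers are lines made up only of capital letters and spaces; `pitching_lines` is a hypothetical name, the sample lines are shortened, and skipping blank lines (rather than stopping on them) is a guess at what is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper: return the data lines of the PITCHING section.
sub pitching_lines {
    my @lines = @_;
    my ($in_pitching, @out) = (0);
    for my $line (@lines) {
        chomp $line;
        if ($in_pitching) {
            last if $line =~ /^[A-Z][A-Z ]*$/;   # next header, spaces allowed
            next if $line =~ /^\s*$/;            # skip the blank line between teams
            push @out, $line;
        }
        elsif ($line =~ /^PITCHING\s*$/) {
            $in_pitching = 1;
        }
    }
    return @out;
}

# Shortened sample records, for illustration only.
my @sample = (
    "PITCHING\n",
    "24|St. Louis Cardinals\n",
    "4979|Mike Mohler|0|0|1\n",
    "\n",
    "17|Cincinnati Reds\n",
    "6174|Scott Williamson|SV, 2|0|2.1\n",
    "DOUBLE PLAYS\n",
    "24|St. Louis Cardinals|2|Vina to Renteria to McGwire\n",
);
print "$_\n" for pitching_lines(@sample);
```

The team lines (24|St. Louis Cardinals and 17|Cincinnati Reds) are still returned along with the pitcher lines, just as the posted loop prints them; filtering those out would need an extra check, for example on the number of fields.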