smgfc has asked for the wisdom of the Perl Monks concerning the following question:

OK, so I was doing a geometry take-home test where neatness counts. My handwriting isn't bad, but I decided to do the proofs on the computer and have Perl spit out some nicely formatted text. The problem is that I can't think of an intelligent way of dealing with the header information (Title, Given, Proof statements). Right now the script just prints that part as-is, and even the way I did that seems ugly, so suggestions would be amazing. The format of the file to be parsed is:
Title
G: #givens
Given 1
Given 2
etc.
P:
proof statement
Pr: #start of proof
statement 1 => reason 1
etc.
Here is the code:
#!/usr/bin/perl -w
use strict;
my ($file, $tr, @s, @r, $len, $lent, $pad, $i, $line);
$file="untitled:Desktop Folder:test2";
open (HAN, "<$file") or die ("Can't open $file: $!\n");
$tr=0; #I used $tr so I could skip the if statement in the while loop W
#####
# Start a loop to go through the file to be parsed
#####
W: while (<HAN>) {
    chomp;
    if ($_ ne "Pr:" and $tr == 0) { #Just print out the Givens and the Prove statement, "Pr:" appears before the actual proof
        print $_ . "\n";
        next W;
    } elsif ($tr == 0) {
        print "\n\n";
        $tr=1;
        next W;
    }
    m/(.+) => (.+)/; #format is statement => reason
    push @s, $1; #construct an array of statements
    push @r, $2; #construct an array of reasons
}
close (HAN);
$len=0;
$lent=0;
#####
# Find the longest statement and reason for formatting
#####
foreach (@s) {
    if ( $len < length() ) {
        $len = length();
    }
}
foreach (@r) {
    if ( $lent < length() ) {
        $lent = length();
    }
}
$pad=length($#s+1); #This is so the numbers along the side are lined up correctly
print "_" x ($len+$pad+4) . "|" . "_" x ($lent+2) . "\n"; #Print the top ie _____|______
for ($i=0; $i<$#s+1; $i++) {
    $r[$i] =~ s/#-(\d+?)/($i+1)-$1/ge; #replace #-? with the line # you are on minus ?
    print $i+1 . "." . " " x ($pad - length($i+1)+1) . $s[$i] . " " x ($len - length($s[$i])) . " | " . $r[$i] . "\n"; #print the statements and reasons
}
Any other tips are also very welcome.

Title edit by tye as one-word titles complicate simple searches

Re: Parsing
by grep (Monsignor) on Feb 11, 2002 at 04:31 UTC
    A couple of things that I saw:
    if ($_ ne "Pr:" and $tr == 0)
    With this test, nothing can follow the "Pr:" on its line, but your test data has a comment there.
    if (!/^Pr:/ and $tr == 0)
    looks like it will work better.
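
    As a sketch of that idea, the marker test can be pulled into a small helper. The name is_proof_marker is hypothetical, and the pattern is a slightly stricter variant of /^Pr:/: it assumes the marker may only be followed by whitespace and a "#" comment, and it ignores case.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: true when a line is the "Pr:" marker, optionally
    # followed by whitespace and a "#" comment. The /i flag also accepts "PR:".
    sub is_proof_marker {
        my ($line) = @_;
        return $line =~ /^pr:\s*(?:#.*)?$/i ? 1 : 0;
    }

    for my $line ('Pr: #start of proof', 'PR:', 'P:', 'Problem 2:') {
        my $kind = is_proof_marker($line) ? 'marker' : 'other';
        print "$kind: $line\n";
    }
    ```

    Anchoring the pattern at both ends keeps a title such as "Problem 2:" or the "P:" line from being mistaken for the start of the proof.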

    I would also just sort your data and pull off the highest value instead of using the two foreach loops.

    A style issue that would've helped solve this:
    Your naming could be much more descriptive. When I first started I was like you. I kept thinking "this is a throw-away script" or "I'm the only one who will see this", but it never turns out that way. So now, even on scripts that "I know" will be thrown away and that no one will ever look at, I use good names and proper style, because I KNOW they will not be thrown away and someone will look at them :)

    Frederick Brooks, paraphrased from The Mythical Man-Month:
    Show me your [code] and conceal your [data structures], and I shall continue to be mystified. Show me your [data structures], and I won't usually need your [code]; it'll be obvious.
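
    One way to apply that advice to the header problem is to read the whole file into a single descriptive structure before printing anything. The following is only a sketch: parse_proof and the hash keys are hypothetical names, and it assumes that "G:", "P:" and "Pr:" each sit on their own line (optionally with a "#" comment) and that the first non-empty line is the title.

    ```perl
    use strict;
    use warnings;

    # Hypothetical parser: turns the lines of one proof file into a hash with
    # a title, a list of givens, the prove statement(s) and the proof steps.
    sub parse_proof {
        my @lines = @_;
        my %proof = (title => '', givens => [], prove => [], steps => []);
        my %section_for = (g => 'givens', p => 'prove', pr => 'steps');
        my $section = 'title';

        for my $line (@lines) {
            next unless $line =~ /\S/;                  # skip blank lines
            if ($line =~ /^(g|p|pr):\s*(?:#.*)?$/i) {   # section marker line
                $section = $section_for{ lc $1 };
            }
            elsif ($section eq 'title') {
                $proof{title} = $line;
            }
            elsif ($section eq 'steps') {
                my ($statement, $reason) = $line =~ /^(.+?)\s*=>\s*(.+)$/
                    or die "Bad proof line: $line\n";
                push @{ $proof{steps} },
                    { statement => $statement, reason => $reason };
            }
            else {
                push @{ $proof{$section} }, $line;
            }
        }
        return \%proof;
    }

    my $proof = parse_proof(
        'Problem 2:', 'G:', 'Segment AB congruent Segment BC',
        'P:', 'triangle ABC isosceles', 'PR:',
        'Segment AB congruent Segment BC => G',
        'triangle ABC isosceles => def isosceles',
    );
    print "$proof->{title}\n";
    print "Given: $_\n" for @{ $proof->{givens} };
    print "Prove: $_\n" for @{ $proof->{prove} };
    print scalar @{ $proof->{steps} }, " proof steps\n";
    ```

    With the header kept apart from the steps, the printing code can lay out each part separately instead of echoing lines as they are read.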


    grep
    grep> chown linux:users /world
      OK, you changed your input file again :).
      Change the regexp to !/^PR:/ or, if you are worried about the data changing, to !/^pr:/i to make it case-insensitive.

      Also the sort should look something like this:
      $len = length( (sort {$a <=> $b} @s)[0]);


      UPDATE: Whooops that sort was waaay off
      $len_stmts   = (sort {$b <=> $a} map {length} @stmts  )[0];
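
      The corrected line works because it sorts the lengths, not the strings themselves. As a sketch of an alternative, the core List::Util module has a max function that avoids sorting altogether. The helper name longest_length and the sample data are hypothetical.

      ```perl
      use strict;
      use warnings;
      use List::Util qw(max);

      # Hypothetical helper: length of the longest string, or 0 for an empty list.
      sub longest_length {
          my @strings = @_;
          return max(0, map { length } @strings);
      }

      my @statements = ('D midpoint side AB', 'segment AD / segment DB', 'G');

      # Both expressions compute the same number.
      my $by_sort = (sort { $b <=> $a } map { length } @statements)[0];
      my $by_max  = longest_length(@statements);
      print "sort: $by_sort, max: $by_max\n";
      ```

      Neither version changes @statements, so the order of the proof steps is untouched.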

      grep
      grep> chown linux:users /world
        Well, I haven't dealt with the parsing yet, but I am having trouble with the sort stuff. The foreach loops work great, but
        $len = length( (sort {$a <=> $b } @s)[0]);
        $lent = length( (sort {$a <=> $b } @r)[0]);
        gives weird results:
        ___________________________________|______________
        1. D midpoint side AB | G
        2. F midpoint side AC | G
        3. segment ED median | def of median
        4. segment BF median | def of median
        5. segment AE median | only one line concurrunt with other median
        6. N point of concurrency of medians triangle ABC | def of point of concurrency
        7. segment NE / segment AN = 1/2 | median thm
        8. point O point of concurrency triangle ABE | def of point of concurrency
        9. (segment BP / segment PE) * (segment EN / segment NA) * (segment AD / segment DB) = 1 | cheva
        10. segment AD congruent segment DB | def of midpoint
        11. m segment AD = m segment DB | docs 10
        12. segment AD / segment DB | ?
        13. segment AD / segment AD = 1 | sub 11 , 12
        14. (segment BP / segment PE) * (1/2) * (1) = 1 | sub 7, 13, 9
        15. segment BP / (segment PE * 2) = 1 | multiplication 14
        16. segment BP = segment PE * 2 | cross multipy 15
        17. segment BP / segment PE = 2/1 | division 16
        with errors for every numeric compare, so I changed it to cmp:
        $len = length( (sort {$a cmp $b } @s)[0]);
        $lent = length( (sort {$a cmp $b } @r)[0]);
        but I still get the same results. Any help would be great. The file I am running this on is available in my scratch pad.
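
        For the alignment itself, a possible sketch is to let sprintf do the padding once the widest statement is known. format_rows is a hypothetical helper, not part of the script above, and it right-aligns the line numbers, which is a design choice that differs from the original padding code.

        ```perl
        use strict;
        use warnings;
        use List::Util qw(max);

        # Hypothetical helper: returns formatted "N. statement | reason" lines.
        # $rows is an array reference of [statement, reason] pairs.
        sub format_rows {
            my ($rows) = @_;
            my $stmt_width = max(0, map { length $_->[0] } @$rows);
            my $num_width  = length scalar @$rows;   # digits in the last number
            my $n = 0;
            return map {
                sprintf '%*d. %-*s | %s',
                    $num_width, ++$n, $stmt_width, $_->[0], $_->[1]
            } @$rows;
        }

        print "$_\n" for format_rows([
            ['D midpoint side AB',  'G'],
            ['segment ED median',   'def of median'],
            ['segment AD / segment DB', '?'],
        ]);
        ```

        The "*" in the format takes its width from the argument list, so no manual " " x ... arithmetic is needed.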
      Sorry, the comments in the test data aren't real. Real test data would be:
      Problem 2:
      G:
      Segment AB congruent Segment BC
      P:
      triangle ABC isosceles
      PR:
      Segment AB congruent Segment BC => G
      triangle ABC isosceles => def isosceles
      Sorry. Also, since I need to preserve the order of the statements/reasons, should I sort/find the longest like this:
      $len = length( @{ sort ($a <=> $b) @s }[0]);
      Thanks.
      Amendment to my last post: I meant sort like this:
      $len = length( ${ sort {$a <=> $b} @s }[0]);
      Do I use ${ or @{? It is a slice, but... I don't know. Thanks again.
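
      One possible answer, as a sketch: neither is needed, because sort returns a plain list and not a reference, so a list slice written as (LIST)[0] is enough. sort also returns a new list and leaves the original array alone, so the order of the statements is preserved. The helper name longest_string is hypothetical.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: returns the longest string in the list.
      # sort builds a new list, so the caller's array keeps its order.
      sub longest_string {
          my @strings = @_;
          my $longest = (sort { length($b) <=> length($a) } @strings)[0];
          return $longest;
      }

      my @s   = ('triangle ABC isosceles', 'G', 'def isosceles');
      my $len = length(longest_string(@s));

      # ${ ... }[0] only becomes valid once there is an array reference,
      # for example an anonymous array built around the sorted list:
      my $via_ref = ${ [ sort { length($b) <=> length($a) } @s ] }[0];

      print "$len\n";
      print "$_\n" for @s;    # still in the original order
      ```

      The comparison is done on length($a) and length($b) because comparing the strings themselves with <=> treats them as numbers, which is what produces the numeric-compare warnings.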