smgfc has asked for the wisdom of the Perl Monks concerning the following question:
    s/(^| )$i( |s|$)/"$1$def{$i}$2"/gie;

(The full code is below, and $i is the word, e.g. "triangle".) Also, I have a number of regexes that I think could be combined, all of which deal with whitespace:
    s/^\s+//;                  #trim leading whitespace
    s/\s+$//;                  #trim trailing whitespace
    s/(\S+)\s{2,}/"$1 "/gie;   #replace more than one space with one space
    s/\s{2,}(\S+)/" $1"/gie;   #replace more than one space with one space

I would also like to make a regex that finds the names of symbols in the proof and makes sure they are capitalized, but I can't see any distinct way to identify them; I usually end up capitalizing something like "angle BISECTOR" instead of "angle ABC". The code and some sample data are below; any other tips are more than welcome as well. (Since the code and data have very long lines, they will be easier to understand if you turn off word wrap :) )
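For the capitalization question, one heuristic (a sketch, not a complete answer) is to treat as a symbol name only a run of one to three letters that directly follows one of the object keywords used in the sample data (point, triangle, angle, segment). Longer words such as "bisector" fail the length test and are left alone. The helper name `capitalize_symbols` and the three-letter limit are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): uppercase a
# 1-3 letter label that directly follows an object keyword.
sub capitalize_symbols {
    my ($line) = @_;
    # $1 = keyword plus trailing whitespace, $2 = the short label.
    # \U uppercases everything after it in the replacement, i.e. only $2.
    $line =~ s/\b((?:point|triangle|angle|segment)\s+)([a-z]{1,3})\b/$1\U$2/gi;
    return $line;
}

print capitalize_symbols("angle abc congruent angle def"), "\n";
print capitalize_symbols("triangle abc with angle bisectors segment az"), "\n";
```

The second call leaves "angle bisectors" untouched because no one-to-three letter prefix of "bisectors" ends at a word boundary. A short ordinary word after a keyword (for example "angle is") would still be uppercased, so the keyword list and length limit may need tuning against real data.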
    T:
    Problem #3
    A:
    WIlliam Meyer
    G:
    triangle ABC with angle bisectors segment AZ, segment BY, segment CX
    triangle DEF with angle bisectors segment DJ, segment EI, segment FH
    P:
    angle bisectors in similar triangles same ratio as corresponding sides
    Pr:
    triangle ABC with altitudes segment AZ, segment BY, segment CX => G
    triangle DEF with altitudes segment DJ, segment EI, segment FH => G
    ( m segment AB / m segment DE ) = ( m segment BC / m segment EF ) = ( m segment AC / m segment DF )
    angle BAC congruent angle EDF => def similar triangles
    angle ABC congruent angle DEF => def similar triangles
    m angle ABY * 2 = m angle ABC => def angle bisector
    m angle DEI * 2 = m angle DEF => def angle bisector
    m angle DEI * 2 = m angle ABC => sub #-1, #-3
    m angle DEI * 2 = m angle ABY * 2 => sub #-1, #-3
    angle DEI congruent angle ABY => division & doca
    triangle AYB similar triangle DIE => AA #-1, #-7
    ( m segment AB / m segment DE ) = ( m segment BY / m segment EI ) => similar triangles
    ( m segment BY / m segment EF ) = ( m segment BC / m segment EF ) = ( m segment AC / m segment DF ) => sub #-1, #-5
    ( m segment BC / m segment EF ) = ( m segment CX / m segment FH ) => similar reasoning #-2
    ( m segment AB / m segment DE ) = ( m segment CX / m segment FH ) = ( m segment AC / m segment DF ) => similar reasoning #-2
    ( m segment AC / m segment DF ) = ( m segment AZ / m segment DJ ) => similar reasoning #-4
    ( m segment AB / m segment DE ) = ( m segment BC / m segment EF ) = ( m segment AZ / m segment DJ ) => similar reasoning #-4

actual parser:
    #!/usr/bin/perl -w
    use strict;

    my ($file_parse, $file_output, %def, $seen_pr, $tab, @stmts, @reasons, $len_stmts, $len_reasons, @pad, $i);

    $file_parse="untitled:Desktop Folder:test2";          # default to be read
    $file_output="untitled:Desktop Folder:test2output";   # default to be written
    $tab=4;                                               # default tab length
    $seen_pr=0;                                           # whether "Pr:" has been seen in while loop W

    %def = (                                              # used for parsing the symbols in $file_parse
        "T:" => "Title",
        "A:" => "Author(s)",
        "G:" => "Given(s)",
        "P:" => "Proof Statement(s)",
        "T1" => "Statements",
        "T2" => "Reasons",
        "point" => "·",
        "triangle" => "?",
        "angle" => "‹",
        "w" => "with",
        "m" => "measure",
        "s" => "segment"
    );

    #####
    # Some code establishing what file you are working on and what to output to
    #####
    print "Will parse $file_parse (y/n):";
    $_=<>;
    if (!/y/i) {
        print "Which file: ";
        $file_parse=<>;
        chomp $file_parse;
    }
    print "Will write $file_output (y/n):";
    $_=<>;
    if (!/y/i) {
        print "Which file: ";
        $file_output=<>;
        chomp $file_output;
    }

    #####
    # Start a loop to go through the file to be parsed
    #####
    open (PARSE, "<$file_parse") or die ("Can't open $file_parse: $!\n");
    open (OUTPUT, ">$file_output") or die ("Can't open $file_output: $!\n");
    W: while (<PARSE>) {
        chomp;
        s/^\s+//;                   #trim leading whitespace
        s/\s+$//;                   #trim trailing whitespace
        s/(\S+)\s{2,}/"$1 "/gie;    #replace more than one space with one space
        s/\s{2,}(\S+)/" $1"/gie;    #replace more than one space with one space
        foreach $i (split " ", $_) {
            if (exists $def{$i} and !/:/) {
                s/(^| )$i( |s|$)/"$1$def{$i}$2"/gie;   #replace symbols defined in %def with their meaning
            }
        }
        if (!/pr:/i and $seen_pr==0) {
            if (/:/) {              # if $_ contains a ":" then regard as token and print out
                $_ = uc();
                print OUTPUT "\n" if !/t:/i;
                print OUTPUT $def{$_} . ":\n";
                next W;
            } else {                # print the items under the token
                print OUTPUT " " x $tab . $_ . "\n";
                next W;
            }
        } elsif ($seen_pr==0) {
            $seen_pr=1;
            print OUTPUT "\n";
            next W;
        }
        m/(.+) => (.+)/;      #format is statement => reason
        push @stmts, $1;      #construct an array of statements
        push @reasons, $2;    #construct an array of reasons
    }
    close (PARSE);

    #####
    # Find the longest statement and reason for formatting
    #####
    $len_stmts   = (sort {$b <=> $a} map {length} @stmts  )[0];
    $len_reasons = (sort {$b <=> $a} map {length} @reasons)[0];

    #####
    # Really difficult to understand way of formatting output
    #####
    push @pad, ( ( length($#stmts+1) + $len_stmts + 4) / 2) - int( length( $def{"T1"} ) / 2 ) - length( $def{"T1"} ) % 2;
    push @pad, ( ( length($#stmts+1) + $len_stmts + 4) / 2) - int( length( $def{"T1"} ) / 2 ) +1;
    push @pad, ( ( length($#reasons+1) + $len_reasons + 2) / 2) - int( length( $def{"T2"} ) ) / 2 - length( $def{"T2"} ) % 2;
    push @pad, ( ( length($#reasons+1) + $len_reasons + 2) / 2) - int( length( $def{"T2"} ) ) / 2;
    print OUTPUT "_" x $pad[0] . $def{"T1"} . "_" x $pad[1] . "|" . "_" x $pad[2] . $def{"T2"} . "_" x $pad[3] . "\n";
    for ($i=0; $i<$#stmts+1; $i++) {
        $reasons[$i] =~ s/#-(\d+?)/($i+1)-$1/ge;   #replace #-? with the line # you are on minus ?
        print OUTPUT $i+1 . "." . " " x (length($#stmts+1) - length($i+1)+1) . $stmts[$i] . " " x ($len_stmts - length($stmts[$i])) . " | " . $reasons[$i] . "\n";   #print the statements and reasons
    }
    close (OUTPUT);
Re: Regex help/condensation
by gav^ (Curate) on Feb 15, 2002 at 00:02 UTC
by Kanji (Parson) on Feb 15, 2002 at 04:43 UTC