in reply to REGEX on multiple lines

A "trick" for reading regularly formatted material like this is to change the input record separator ($/) to the header string for each block. Consider:

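use strict;
use warnings;

# Each read now returns everything up to (and including) the next header
$/ = '0AVERAGE COMPOSITION IN PINS.';

while (<DATA>) {
    next unless /EID: Pu-238/;
    chomp;             # strip the trailing header from the record
    print "$/$_";      # put the header back in front
}

__DATA__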
0AVERAGE COMPOSITION IN PINS. NUMBER DENSITIES IN 1.0E+24/CM3, WT% P +ER MASS INITIAL HEAVY ISOTOPES. ---------------------------- FOR BA-ELEMENTS WITH EID>99100, WT% IS +THE PERCENTAGE LEFT (FRACTION). 0 EID: Cm-243 ND : 8.7352E-08 0 0 3822 0 3278 0 0 3260 0 +++ 0 3242 0 +++ +++ 0 3157 0 0 0 +++ 0 3096 0 0 0 +++ +++ 0 3170 0 0 0 0 0 0 0 3772 3170 3096 3157 3242 3260 3278 3822* 0 0 0 0 0 0 0 0 0 0 1GE 12 Bundle VOID=0% + >> PHOENUT /1.2.8 / << CORE MASTER 9 COMPOS CASE= 1 RP= 5 V= 2.9 CO= 0 B= 3307 + 2007-01-30 13.38.50 Page 668 Job0000 0AVERAGE COMPOSITION IN PINS. NUMBER DENSITIES IN 1.0E+24/CM3, WT% P +ER MASS INITIAL HEAVY ISOTOPES. ---------------------------- FOR BA-ELEMENTS WITH EID>99100, WT% IS +THE PERCENTAGE LEFT (FRACTION). 0 EID: Pu-238 ND : 7.0913E-06 1 1 3667 0 3283 0 0 3266 0 +++ 0 3250 0 +++ +++ 0 3192 0 0 0 +++ 0 3151 0 0 0 +++ +++ 0 3204 0 0 0 0 0 0 1 3630 3204 3151 3192 3250 3266 3283 3667* 1 1 0 0 0 0 0 0 1 1 1GE 12 Bundle VOID=0% + >> PHOENUT /1.2.8 / << CORE MASTER 9 COMPOS CASE= 1 RP= 5 V= 2.9 CO= 0 B= 3307 + 2007-01-30 13.38.50 Page 669 Job0000

Prints:

0AVERAGE COMPOSITION IN PINS. NUMBER DENSITIES IN 1.0E+24/CM3, WT% P +ER MASS INITIAL HEAVY ISOTOPES. ---------------------------- FOR BA-ELEMENTS WITH EID>99100, WT% IS +THE PERCENTAGE LEFT (FRACTION). 0 EID: Pu-238 ND : 7.0913E-06 1 1 3667 0 3283 0 0 3266 0 +++ 0 3250 0 +++ +++ 0 3192 0 0 0 +++ 0 3151 0 0 0 +++ +++ 0 3204 0 0 0 0 0 0 1 3630 3204 3151 3192 3250 3266 3283 3667* 1 1 0 0 0 0 0 0 1 1 1GE 12 Bundle VOID=0% + >> PHOENUT /1.2.8 / << CORE MASTER 9 COMPOS CASE= 1 RP= 5 V= 2.9 CO= 0 B= 3307 + 2007-01-30 13.38.50 Page 669 Job0000
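Because $/ is a global, setting it at file scope also changes how every later <STDIN> or <$fh> read behaves. Below is a minimal sketch of the same trick using local, so the old separator comes back when the sub returns. The records_for name and the abbreviated two-block sample are illustrative only, not from the post above. Note that the first read returns just the leading header, which chomp reduces to an empty string, so it never matches.

```perl
use strict;
use warnings;

# Illustrative sample only: two abbreviated blocks from the report above.
my $header = '0AVERAGE COMPOSITION IN PINS.';
my $report = "$header 0 EID: Cm-243 ND : 8.7352E-08\n"
           . "$header 0 EID: Pu-238 ND : 7.0913E-06\n";

# Return every block (header restored) whose text matches $pattern.
sub records_for {
    my ($text, $sep, $pattern) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory string: $!";
    local $/ = $sep;    # records now end just after each header; restored on return
    my @found;
    while (my $rec = <$fh>) {
        chomp $rec;                  # drop the trailing header, if any
        next unless $rec =~ $pattern;
        push @found, "$sep$rec";     # put the header back in front
    }
    close $fh;
    return @found;
}

my @hits = records_for($report, $header, qr/EID: Pu-238/);
print for @hits;
```

Passing the pattern as a qr// object keeps the sub reusable for any isotope without rebuilding the loop.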

DWIM is Perl's answer to Gödel

Re^2: REGEX on multiple lines
by johngg (Canon) on Jan 30, 2007 at 20:22 UTC
    Possibly a minor point, but from what I remember of my Fortran programming days and line printer control characters, I suspect that the actual header string for each block is "1GE 12 Bundle VOID=0%". If memory serves, "1" meant throw a page (start a new page), "0" meant double-line spacing, " " meant single-line spacing and "+" meant overprint. So I think the "Pu-238" data actually appeared on Page 668, just in case the stuff at the top of the page is relevant to the OP's problem.
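If the "1" in column one really is a page-eject control character, the report can be grouped into pages by that character rather than by a header string. A minimal sketch, assuming each report line carries its carriage-control character in column one; the flattened sample above has lost its line breaks, so the lines below are a hypothetical simplification, not the real file layout.

```perl
use strict;
use warnings;

# Hypothetical, simplified report lines. Column 1 is the carriage-control
# character: '1' new page, '0' double space, ' ' single space, '+' overprint.
my @lines = (
    "1GE 12 Bundle VOID=0%",
    "0AVERAGE COMPOSITION IN PINS.",
    "0 EID: Cm-243 ND : 8.7352E-08",
    "1GE 12 Bundle VOID=0%",
    "0AVERAGE COMPOSITION IN PINS.",
    "0 EID: Pu-238 ND : 7.0913E-06",
);

# Start a new page whenever column 1 is '1' (or if no page exists yet).
my @pages;
for my $line (@lines) {
    my $cc = substr $line, 0, 1;
    push @pages, [] if $cc eq '1' || !@pages;
    push @{ $pages[-1] }, substr $line, 1;    # keep the text, drop the control char
}

# Report which page each isotope's record landed on.
for my $i (0 .. $#pages) {
    for my $text (@{ $pages[$i] }) {
        print "Page ", $i + 1, ": $1\n" if $text =~ /EID: (\S+)/;
    }
}
```

Grouping by page keeps the page header together with the records printed under it, which matters if that header carries information the OP needs.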

    Cheers,

    JohnGG

Re^2: REGEX on multiple lines
by igotlongestname (Acolyte) on Jan 31, 2007 at 15:14 UTC
    Thank you sir, that was exactly what I needed. I attempted to incorporate what Graff said below, but quickly found that it is above my level, at least for the moment. I am still quite new and learning things, so thank you so much for your suggestion and code. I'll attach what I ended up with here in case you see something "bad" or whatever. I simply added the prompts I wanted before your loop and used a variable in the regex search. I understand enough of what Graff was saying to see that what I'm doing is a bad idea, but I'll make do with my limited knowledge for now and learn better ways in time. Thanks to you and graff both!
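    #!/usr/local/bin/perl
    use strict;
    use warnings;

    print "Enter the output file you would like to analyze: ";
    chomp(my $filename = <STDIN>);
    print "Enter the isotope you want to extract (ex: Am-241): ";
    chomp(my $iso = <STDIN>);

    open(my $in,  '<', $filename)        or die "Can't read the source: $!";
    open(my $out, '>', "Comp_$filename") or die "Can't write Comp_$filename: $!";
    select($out);    # make $out the default handle for print

    $/ = '0AVERAGE COMPOSITION IN PINS.';    # read one block per record
    while (<$in>) {
        next unless /EID: \Q$iso\E\b/;    # \Q...\E treats the isotope name literally
        chomp;
        print "$/$_";
    }
    close($in);
    close($out) or die "Can't close Comp_$filename: $!";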

      Mostly what Graff was suggesting is that you should take your input from command-line parameters rather than prompting the user for it. That makes it easier to use the script in an automated context, for example when you want to do several runs with different parameters. In essence it means replacing:

      print "Enter the output file you would like to analyze: ";
      chomp(my $filename = <STDIN>);
      print "Enter the isotope you want to extract (ex: Am-241): ";
      chomp(my $iso = <STDIN>);

      with something like:

      # Validate parameters
      @ARGV == 2 or error("Expected exactly two parameters");
      $ARGV[0] =~ /^[A-Z][a-z]?-\d{1,3}$/ or error("Expected an isotope first");
      -f $ARGV[1] or error("<$ARGV[1]> is not a file");

      # Extract parameters
      my ($iso, $filename) = @ARGV;

      ...

      sub error {
          my $msg = shift;

          print <<"USAGE";
      Error: $msg.

      ExtractIso parses a given dibbly report file and extracts the record
      for a given isotope.

      Use as:

          ExtractIso <iso> <filename>

      <iso> is the isotope to be extracted. For example Am-241.

      <filename> is the filename (with path as required) of the record file.

      For example:

          ExtractIso Am-241 dibbly.dat

      would extract and print the record for Am-241 from the dibbly.dat file
      in the current directory.
      USAGE

          exit 1;
      }

      The error sub uses a heredoc (here-document) to print an error diagnostic followed by usage information, then exits.
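Because error() calls exit, the checks can't easily be reused or tested from other code. A minimal variant sketch that reports problems with die instead, so a caller (a test, or a batch driver running several isotopes) can trap them with eval; parse_args is a hypothetical name, and the checks mirror the ones above.

```perl
use strict;
use warnings;

# Hypothetical helper: same checks as the validation code above, but
# reporting problems with die so a caller can trap them with eval.
sub parse_args {
    my @args = @_;
    @args == 2 or die "Expected exactly two parameters\n";
    my ($iso, $filename) = @args;
    $iso =~ /^[A-Z][a-z]?-\d{1,3}$/
        or die "Expected an isotope first (e.g. Am-241)\n";
    -f $filename or die "<$filename> is not a file\n";
    return ($iso, $filename);
}

# Example: check a few argument lists without exiting the program.
for my $args (['Am-241'], ['am241', 'x.dat']) {
    my @got = eval { parse_args(@$args) };
    print @got ? "ok: @got\n" : "rejected: $@";
}
```

At the top level you would call parse_args(@ARGV) inside eval and hand $@ to the usage routine if it fails.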

