I posted here once before with a regex question, and you were all fantastically helpful. Since then I've gotten to the point where I can write (probably very ugly) but effective scripts that get the info I need, so first off, thanks to you all. My problem now is that I want to grab a multi-line block of data that occurs repeatedly in the file. For example:

0AVERAGE COMPOSITION IN PINS.  NUMBER DENSITIES IN 1.0E+24/CM3,  WT% PER MASS INITIAL HEAVY ISOTOPES.
 ----------------------------  FOR BA-ELEMENTS WITH EID>99100, WT% IS THE PERCENTAGE LEFT (FRACTION).
0 EID:     Cm-243
  ND : 8.7352E-08
      0 
      0  3822 
      0  3278     0 
      0  3260     0   +++ 
      0  3242     0   +++   +++ 
      0  3157     0     0     0   +++ 
      0  3096     0     0     0   +++   +++ 
      0  3170     0     0     0     0     0     0 
      0  3772  3170  3096  3157  3242  3260  3278  3822*
      0     0     0     0     0     0     0     0     0     0 
1GE 12 Bundle VOID=0%                                                  >> PHOENUT /1.2.8   / << CORE MASTER 
    9  COMPOS                      CASE= 1 RP=  5 V= 2.9 CO= 0 B= 3307 2007-01-30  13.38.50  Page 668  Job0000
 
 
0AVERAGE COMPOSITION IN PINS.  NUMBER DENSITIES IN 1.0E+24/CM3,  WT% PER MASS INITIAL HEAVY ISOTOPES.
 ----------------------------  FOR BA-ELEMENTS WITH EID>99100, WT% IS THE PERCENTAGE LEFT (FRACTION).
0 EID:     Pu-238
  ND : 7.0913E-06
      1 
      1  3667 
      0  3283     0 
      0  3266     0   +++ 
      0  3250     0   +++   +++ 
      0  3192     0     0     0   +++ 
      0  3151     0     0     0   +++   +++ 
      0  3204     0     0     0     0     0     0 
      1  3630  3204  3151  3192  3250  3266  3283  3667*
      1     1     0     0     0     0     0     0     1     1 
1GE 12 Bundle VOID=0%                                                  >> PHOENUT /1.2.8   / << CORE MASTER 
    9  COMPOS                      CASE= 1 RP=  5 V= 2.9 CO= 0 B= 3307 2007-01-30  13.38.50  Page 669  Job0000

In this example I want to grab all the info about Pu-238; blocks for many other isotopes occur before and after it. In addition, there are multiple statepoints throughout the file, and therefore multiple occurrences of Pu-238. I know that Pu-238 (or whatever isotope I want to search for) is a unique identifier; my problem is grabbing all the numerical data in the format it already has in the file. I started some code, shown below, but it is definitely not complete, since I wasn't sure of the best way to grab multiple lines and then write them to an output file. Any suggestions? Thanks!

#!/usr/local/bin/perl -w
use IO::File;
my $file = IO::File->new;
print "Enter the output file you would like to analyze: ";
chomp ($filename = <STDIN>);
print "Enter the isotope you want to extract (ex: Am-241): ";
chomp ($iso= <STDIN>);
$file->open("< $filename") or die("Can't read the source:$!");

open(OUT, ">Comp_$filename") or die("Can't write Comp_$filename: $!");

select (OUT);

@iso=();
until ($file->eof) {
   my $line = $file->getline();
   if($line =~ /"$iso"/) {
      $line = $file->getline();
      chomp($line);
      @col1 = split(qr/\s+/s, $line);
      push(@iso,"$col1[1] $col1[2] $col1[3]");
      $line = $file->getline();
      chomp($line);
      @col1 = split(qr/\s+/s, $line);
      push(@iso,"$col1[1]");
#I INTENDED TO DO THIS SAME PROCESS OVER AND OVER UNTIL THE FINAL LINE WAS PROCESSED, THEN LET THE REGEX SEARCH FOR THE NEXT INSTANCE OF WHATEVER IS DESIRED
   }
} # end of until
#for($i=1; $i<=28; $i++){
#      print "UNSURE WHAT THE BEST WAY TO PRINT IN ORDER IS";
#}

close(OUT);

Looking at how I'm approaching it, I feel there must be a better way to grab multiple lines and save them in the form they're already in for later access, but I'm unsure how to do this, or whether some other approach would work well. Also, does using a regex this way work (meaning interpolating a variable into it, as in =~ /"$iso"/)? Any help would be appreciated ... thanks!
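
One possible approach, offered as a rough sketch rather than a definitive answer: treat each "EID:" line as the start of a block and keep appending lines verbatim for as long as they are indented. This assumes that in the real file the ND line and every row of numbers begin with a space, while the page header that follows (the "1GE 12 Bundle ..." line) does not. The sub name extract_blocks and the in-memory sample are made up for illustration. On the regex question: a variable does interpolate inside /.../, but the double quotes in /"$iso"/ are matched literally, so that pattern only matches if the line contains quote characters. The sketch drops the quotes and wraps the variable in \Q...\E so that any regex metacharacters in the isotope name are taken literally.

```perl
use strict;
use warnings;

# Collect every block for one isotope from an open filehandle.
# Assumption: a block starts at its "EID:" line and continues while the
# following lines are indented (the ND line and the rows of numbers).
sub extract_blocks {
    my ($fh, $iso) = @_;
    my (@blocks, $in_block);
    while (my $line = <$fh>) {
        if ($line =~ /EID:\s+\Q$iso\E\s*$/) {
            push @blocks, $line;      # start a new block, line kept verbatim
            $in_block = 1;
        }
        elsif ($in_block and $line =~ /^[ \t]+\S/) {
            $blocks[-1] .= $line;     # append to the current block
        }
        else {
            $in_block = 0;            # page header or other text ends the block
        }
    }
    return @blocks;
}

# Usage with a small in-memory sample instead of a real output file
my $sample = <<'END';
0 EID:     Cm-243
  ND : 8.7352E-08
      0
      0  3822
1GE 12 Bundle VOID=0%
0 EID:     Pu-238
  ND : 7.0913E-06
      1
      1  3667
1GE 12 Bundle VOID=0%
END
open(my $fh, '<', \$sample) or die "Can't open sample: $!";
print for extract_blocks($fh, 'Pu-238');
```

Because each block is stored as one string with its original spacing, writing the results out is just a matter of printing the returned list to the output filehandle, one entry per statepoint.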

In reply to REGEX on multiple lines by igotlongestname
