in reply to regex help!

Let me write a hypothetical question for you that may or may not be what you were trying to ask:

Most wise monks, I am very new to Perl but have been given a large data file to read that was generated by an old Fortran program. The data are in pairs of lines with a header line and a data line like this:

000 NP U Pu
001 1.270000 000001 3.141000
002 Lev N Pu
003 0.13 000001 3.277118
004 NP U Pu
005 1.000220 000002 3.098761
006 Yac S Yb
007 10.33000 000001 90000000

I need to extract the NP U Pu lines of data. I have worked out how to read the file, but I can't figure out how to find the data. My code so far looks like this:

open I,"data.dat";
for($I=0;$I<1000;++$I) {
    $L1=<I>; chop $L1;
    $L2=<I>; chop $L2;
    #find the data here
    printf ("%d, %d, %d\n", $N1, $N2, $N3);
}

Can someone help me with the code I need to replace the comment please?


Perl is Huffman encoded by design.

Re^2: regex help!
by igotlongestname (Acolyte) on Sep 15, 2005 at 16:56 UTC
    You are all right. I am new, I tried crap but none of it seemed remotely close. What grandfather asked was my exact question. Thank you for the help. Yeah I'm new at this and just need help, the books haven't helped me too much on this subject.

      It is important to show us the "crap" because that shows that you have at least made an effort. It is also important to show some of the data because a description may not be very clear. As you will have noticed from the earlier replies to your original message, we are inclined to grab an idea and run with it - even if it is hopelessly wrong.

      After all that lecturing, here is a solution for you (I suggest you examine this carefully, then reply explaining how you think it works):

      use warnings;
      use strict;

      while (<DATA>) {
          my $match = /(NP\s+)(U\s+)(Pu\s*)/i;
          last if ! ($_ = <DATA>);
          next if ! $match;
          chomp;
          my $NP = substr $_, $-[1], $+[1] - $-[1] + 1;
          my $N = substr $_, $-[2], $+[2] - $-[2] + 1;
          (my $Pu = substr $_, $-[3]) =~ s/(\s)//g;
          $NP =~ s/(\s)//g;
          $N =~ s/(\s)//g;
          print "NP $NP, N $N, Pu $Pu\n";
      }
      __DATA__
      000 NP U Pu
      001 1.270000 000001 3.141000
      002 Lev N Pu
      003 0.13 000001 3.277118
      004 NP U Pu
      005 1.000220 000002 3.098761
      006 Yac S Yb
      007 10.33000 000001 90000000

      Note that the sample data is given as part of the script so that other monks can simply download the entire thing and run it to see that it works. The sample given prints:

      NP 1.2700000, N 0000013, Pu 3.141000
      NP 1.0002200, N 0000023, Pu 3.098761
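For anyone unsure what @- and @+ do in the solution above: after a successful match, $-[n] holds the offset where capture group n starts and $+[n] the offset just past its end, so the column headings on the header line can be used to slice the matching columns out of the data line. The following is only a minimal sketch of that idea. The helper name extract_fields and the column-aligned sample strings are made up for illustration, and the real file layout may differ. It also uses $+[n] - $-[n] as the width, without the + 1 in the posted code, which appears to be why the posted output carries an extra trailing digit.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): use the positions of the
# NP / U / Pu headings to cut the matching columns out of the data line.
sub extract_fields {
    my ($header, $data) = @_;
    return unless $header =~ /(NP\s+)(U\s+)(Pu\s*)/i;

    # Copy the offsets now: @- and @+ are overwritten by the next
    # successful match, including the s/// used below.
    my @start = map { $-[$_] } 1 .. 3;
    my @end   = map { $+[$_] } 1 .. 3;

    my @fields;
    for my $i (0 .. 2) {
        # The last column runs to the end of the line; the others are
        # exactly as wide as their heading plus its trailing whitespace.
        my $field = $i == 2
            ? substr($data, $start[$i])
            : substr($data, $start[$i], $end[$i] - $start[$i]);
        $field =~ s/\s+//g;    # strip padding
        push @fields, $field;
    }
    return @fields;
}

# Made-up, column-aligned sample lines (an assumption about the layout).
my $header = "000 NP" . (" " x 8) . "U" . (" " x 6) . "Pu";
my $data   = "001 1.270000" . (" " x 2) . "000001 3.141000";

my @fields = extract_fields($header, $data);
print "NP $fields[0], U $fields[1], Pu $fields[2]\n" if @fields;
```

Copying the offsets before doing anything else is the main design point here: any later successful match, even a whitespace-stripping s///, replaces the contents of @- and @+.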

        A couple of quick things. The first reason I didn't post code, or output, is because I couldn't figure out how to use the tags for presenting it and I felt like an idiot. Like I said I'm new, so thanks for bearing with me. In any event, here is what I had done:

        #!/usr/local/bin/perl -w
        print "Enter an output file to analyze: ";
        chomp ($phoenix_out= <STDIN>);
        open (OUTPUT_FILE, "$phoenix_out") or die "can't open $phoenix_out: $!";
        while (<OUTPUT_FILE>) {
            chomp;
            if (/0 EID: 93237 /){
                print "$.\n";
                next;
                print "anything?\n";
                ($a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l) = split;
                $ND243 = $i;
                print "The amount of Am243 is $ND243\n";
            }
            #else { print "$. Problem!" };
        }

        What I couldn't figure out was how to get it to go to the line after the data that I found. (I realized the line above the data was a unique line and that I could search for it directly). An example of the output is as follows:

        0 EID: 93237 93238 93239 94240
        ND : 1.8833E-06 3.1143E-09 9.6338E-07 1.9309E-05
        WT%: 8.2146E-03 1.3642E-05 4.2377E-03 8.5291E-02

        The data extends further across the page, but in the interest of width I didn't include all of it; it's just the same pattern repeated. Lastly, my attempt to understand your code:

        match according to NP, then white space, then U, then white space, then Pu ending ... case insensitive. Last if there is no more data? Also, next loop if there is a match? Is this what moves you to the next line down? Break up the current data into substrings, I can figure out what the nomenclature means although I don't know it offhand, I have books I can read it from. Print off the results.

        Again, thank you for the help, sorry for sucking at my posts early on, and any clarifications you would like to provide would be much appreciated.
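On the two questions above: in the earlier solution it is the $_ = <DATA> inside the loop that moves on to the next line, and next if ! $match skips a pair when the header line did not match (not when it did). The same trick applies to the EID output: once the marker line is found, read one more line from the same filehandle. What follows is a minimal sketch, assuming the "ND :" line always directly follows the "0 EID:" line and that values line up column for column with the IDs. The helper name nd_for_eid is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: find the "0 EID:" line, read the very next line
# (assumed to be the "ND :" line) and return the ND value that sits in
# the same column as the requested EID.
sub nd_for_eid {
    my ($fh, $eid) = @_;
    while (my $line = <$fh>) {
        next unless $line =~ /^\s*0\s+EID:\s*(.*)/;
        my @ids = split ' ', $1;

        my $next = <$fh>;              # this is what moves to the following line
        last unless defined $next;     # marker was the last line of the file
        next unless $next =~ /^\s*ND\s*:\s*(.*)/;
        my @nd = split ' ', $1;

        for my $i (0 .. $#ids) {
            return $nd[$i] if $ids[$i] eq $eid;
        }
    }
    return;    # EID not found
}

# Sample lines taken from the output shown above, read through an
# in-memory filehandle so the sketch runs without a data file.
my $sample = join "\n",
    '0 EID: 93237 93238 93239 94240',
    'ND : 1.8833E-06 3.1143E-09 9.6338E-07 1.9309E-05',
    'WT%: 8.2146E-03 1.3642E-05 4.2377E-03 8.5291E-02',
    '';

open my $fh, '<', \$sample or die "can't open in-memory file: $!";
my $nd = nd_for_eid($fh, '93239');
print "ND for 93239 is $nd\n" if defined $nd;
close $fh;
```

Looking the value up by its EID, rather than by a fixed position in a long split list, keeps the code working if the columns are reordered or the line gets wider.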
      So, with trivial variants on the method above:
      #!C:/Perl/bin
      use strict;
      # no warnings because using uninit values below
      use Data::Dumper::Simple;
      use vars qw ( @nomatch $I1 $I2 $I3 $L1 $L2 @data $i $j );

      while (<DATA>) {
          push @data,$_ ;
      }

      {
          while (@data) {
              $L2 = pop @data;
              chomp $L2;
              #print "\$L2 is: $L2\n";
              $L1 = pop @data;
              chomp $L1;
              #print "\$L1 is: $L1\n";

              #find the data here
              if ( $L1 =~ /
                          \d\d\d        # three digits
                          \s+           # one or more whitespace
                          NP            # exact string, NP
                          \s+           # one or more whitespace
                          U             # exact string, U
                          \s+           # one or more whitespace
                          Pu            # exact string, Pu
                          /x            # end match, extended
                  &&
                  $L2 =~ /
                          (\d\d\d)      # three digits
                          \s+           # one or more whitespace
                          (\d\.\d{6})   # digit, period, six digits
                          \s+           # one or more whitespace
                          (\d{6})       # six digits
                          \s+           # one or more whitespace
                          (\d\.\d{6})   # digit, period, six digits
                          /x )
              {
                  my $n1 = $1;
                  $I1 = $2; $I2=$3; $I3=$4;
                  print "\n\tIn linepair ENDING with $n1, NP: $I1, U: $I2, Pu: $I3\n";
              } else {
                  push @nomatch,"\n\tNo match on lines $L1\n\t\t\t and $L2\n";
              }
          }
          print "\n\n\t No Match pairs follow\n";
          warn Dumper (@nomatch);
      }
      __DATA__
      000 NP U Pu
      001 1.270000 000001 3.141000
      002 Lev N Pu
      003 0.13 000001 3.277118
      004 NP U Pu
      005 1.000220 000002 3.098761
      006 Yac S Yb
      007 10.33000 000001 90000000
      008 NP U Pu
      009 2.130000 000140 5.797712