MKevin has asked for the wisdom of the Perl Monks concerning the following question:

Hi, here is the code I am working on (the comments in the program itself show what the text file looks like):
#!/usr/bin/perl -w
#
## Copyright (c) 2007 University of Miami
#
my $pkgdoc = <<'EOD';
#/**--------------------------------------------------------------------------------
# @ file radiosondeparcer.pl
# This script parses the fetched radiosonde data from
# the Plymouth website.
#
# @usage radiosondeparcer.pl ddd.hh.yyyy.index.txt
# date: Jan 18, 2007
#----------------------------------------------------------------------------------*/
EOD
# $Log$

use strict;
use warnings;
use Getopt::Long;

if (@ARGV <1) {
    print $pkgdoc;
    exit -1;
}
my $txtfile = shift;
open (IN, $txtfile)||die "cannot open $txtfile for reading";
#/**---------------------------------------------------------------------------------
# Read important info on txtfile and add it to the hash
# structure passed in.
#
#
# txtfiles look like this:
#
#
# <TITLE>Plymouth State RAOB Thermodynamic Diagram/Data</TITLE>
# </center><pre>
# Miami Intl Airp FL US KMIA 1 25.82 -80.28 4 72202
#
# Date: 0000Z 24 AUG 05
# Station: KMIA
# WMO ident: 72202
# Latitude: 25.82
# Longitude: -80.28
# Elevation: 4.00
# --------------------------------------------------------------------------------------
# LEV PRES  HGHT  TEMP  DEWP  RH   DD  WETB  DIR  SPD  THETA  THE-V  THE-W  THE-E      W
#       mb     m     C     C   %    C     C  deg  knt      K      K      K      K   g/kg
# --------------------------------------------------------------------------------------
# SFC 1012     4  29.8  23.8  70  6.0  25.3  100   10  301.9  305.3  298.1  357.0  18.65
#   1 1000   113  28.8  23.8  74  5.0  25.1   95   12  302.0  305.4  298.2  357.7  18.88
#   2  961   466  25.4  22.3  83  3.1  23.2   93   15  302.0  305.3  297.7  354.9  17.91
#   3  925   802  23.2  19.4  79  3.8  20.5   95   15  303.0  305.9  296.4  349.0  15.51
#   4  850  1536  19.0  14.0  73  5.0  15.7   95   14  306.0  308.3  294.7  341.8  11.91
#   5  769  2390  14.2   8.2  67  6.0  10.5   96   11  309.8  311.4  293.6  337.0   8.91
#   6  753  2568  12.4  10.5  88  1.9  11.2   97   11  309.7  311.7  294.8  342.2  10.65
#   7  737  2748  11.8   6.8  71  5.0   8.8  100   11  310.9  312.5  293.6  336.9   8.44
#   8  700  3178   9.4   4.4  71  5.0   6.5  110   11  312.9  314.3  293.4  336.3   7.51
# ...
#
#
#-----------------------------------------------------------------------------------*/
LINE: while (my $line = <IN>) {
    next LINE if ($line =~ /<TITLE>/);
    next LINE if ($line =~ /</center>/);
    my my ($LAT, $LON) = (split(' ', $line)[-4,-3]);
    next LINE if ($line =~ /Date\:/);
    next LINE if ($line =~ /Station\:/);
    next LINE if ($line =~ /WMO/);
    next LINE if ($line =~ /Latitude\:/);
    next LINE if ($line =~ /Longitude\:/);
    next LINE if ($line =~ /-------------------------------------------------------------------------------/);
    next LINE if ($line =~ /LEV/);
    next LINE if ($line =~ /mb/);
    my ($LEV, $PRES, $HGHT, $TEMP, $DEWP, $RH, $DD, $WETB, $DIR, $SPD, $THETA, $THE-V, $THE-W, $THE-E, $W) = split (' ", $line);
}
open (OUT, ">$txtfile.reduced);
print OUT "$LAT, $LON, $PRES, $HGHT, $TEMP, $DEWP, $DIR, $SPD\n";
close (IN);
close (OUT);
Now here is the question: in the third line of the text file, how do I pull out the 4th- and 3rd-to-last numbers and store them as $LAT and $LON? Another question: I want to print the OUT line only for the rows where $LEV is SFC or $PRES is 850, 500, or 250. The SFC pressure changes all the time.

Replies are listed 'Best First'.
Re: PARSER help
by GrandFather (Saint) on Jan 25, 2007 at 04:19 UTC

    The following does what it seems you want and cleans up your original code somewhat. Note, however, that modules such as HTML::TreeBuilder are generally much better (more reliable and easier to maintain) tools for extracting raw data from an HTML page, which the source of this data seems to be.

    use strict;
    use warnings;

    my $lat;
    my $long;

    # First seek location line
    while (<DATA>) {
        next unless /(-?\d+(?:\.\d*)?)\s+ (-?\d+(?:\.\d*)?)\s+ \d+\s \d+/x;
        ($lat, $long) = ($1, $2);
        last;
    }

    print "$lat, $long\n";

    # Skip to data lines
    while (<DATA>) {last if /^-+$/};
    while (<DATA>) {last if /^-+$/};

    while (<DATA>) {
        my ($LEV, $PRES, $HGHT, $TEMP, $DEWP, $RH, $DD, $WETB, $DIR, $SPD) = split ' ';
        last unless defined $SPD;
        print "$PRES, $HGHT, $TEMP, $DEWP, $DIR, $SPD\n";
    }

    __DATA__
    <TITLE>Plymouth State RAOB Thermodynamic Diagram/Data</TITLE>
    </center><pre>
    Miami Intl Airp FL US KMIA 1 25.82 -80.28 4 72202

    Date: 0000Z 24 AUG 05
    Station: KMIA
    WMO ident: 72202
    Latitude: 25.82
    Longitude: -80.28
    Elevation: 4.00
    --------------------------------------------------------------------------------------
    LEV PRES  HGHT  TEMP  DEWP  RH   DD  WETB  DIR  SPD  THETA  THE-V  THE-W  THE-E      W
          mb     m     C     C   %    C     C  deg  knt      K      K      K      K   g/kg
    --------------------------------------------------------------------------------------
    SFC 1012     4  29.8  23.8  70  6.0  25.3  100   10  301.9  305.3  298.1  357.0  18.65
      1 1000   113  28.8  23.8  74  5.0  25.1   95   12  302.0  305.4  298.2  357.7  18.88
      2  961   466  25.4  22.3  83  3.1  23.2   93   15  302.0  305.3  297.7  354.9  17.91
      3  925   802  23.2  19.4  79  3.8  20.5   95   15  303.0  305.9  296.4  349.0  15.51
      4  850  1536  19.0  14.0  73  5.0  15.7   95   14  306.0  308.3  294.7  341.8  11.91
      5  769  2390  14.2   8.2  67  6.0  10.5   96   11  309.8  311.4  293.6  337.0   8.91
      6  753  2568  12.4  10.5  88  1.9  11.2   97   11  309.7  311.7  294.8  342.2  10.65
      7  737  2748  11.8   6.8  71  5.0   8.8  100   11  310.9  312.5  293.6  336.9   8.44
      8  700  3178   9.4   4.4  71  5.0   6.5  110   11  312.9  314.3  293.4  336.3   7.51
    </pre>

    Prints:

    25.82, -80.28
    1012, 4, 29.8, 23.8, 100, 10
    1000, 113, 28.8, 23.8, 95, 12
    961, 466, 25.4, 22.3, 93, 15
    925, 802, 23.2, 19.4, 95, 15
    850, 1536, 19.0, 14.0, 95, 14
    769, 2390, 14.2, 8.2, 96, 11
    753, 2568, 12.4, 10.5, 97, 11
    737, 2748, 11.8, 6.8, 100, 11
    700, 3178, 9.4, 4.4, 110, 11

    DWIM is Perl's answer to Gödel
Re: PARSER help
by BrowserUk (Patriarch) on Jan 25, 2007 at 04:25 UTC

    Is this what you're aiming for?

    C:\test>junk4 junk
    25.82, -80.28, 1012, 4, 29.8, 23.8, 100, 10
    25.82, -80.28, 1000, 113, 28.8, 23.8, 95, 12
    25.82, -80.28, 961, 466, 25.4, 22.3, 93, 15
    25.82, -80.28, 925, 802, 23.2, 19.4, 95, 15
    25.82, -80.28, 850, 1536, 19.0, 14.0, 95, 14
    25.82, -80.28, 769, 2390, 14.2, 8.2, 96, 11
    25.82, -80.28, 753, 2568, 12.4, 10.5, 97, 11
    25.82, -80.28, 737, 2748, 11.8, 6.8, 100, 11
    25.82, -80.28, 700, 3178, 9.4, 4.4, 110, 11

    #!/usr/bin/perl -w
    use strict;

    my $header = do{ local $/ = "\n-"; <DATA> };

    my( $lat, $lon ) = $header =~ m[
        Latitude: \s+ ( \S+ ) \s+
        Longitude: \s+ ( \S+ ) \s+
    ]smx;

    my $discard = map{ scalar <DATA> } 1 .. 4;

    while( my $line = <DATA> ) {
        my( $pres, $hght, $temp, $dewp, $dir, $spd )
            = ( split (' ', $line), 10 )[ 1, 2, 3, 4, 8, 9 ];
        print "$lat, $lon, $pres, $hght, $temp, $dewp, $dir, $spd\n";
    }

    __DATA__
    <TITLE>Plymouth State RAOB Thermodynamic Diagram/Data</TITLE>
    </center><pre>
    Miami Intl Airp FL US KMIA 1 25.82 -80.28 4 72202

    Date: 0000Z 24 AUG 05
    Station: KMIA
    WMO ident: 72202
    Latitude: 25.82
    Longitude: -80.28
    Elevation: 4.00
    --------------------------------------------------------------------------------------
    LEV PRES  HGHT  TEMP  DEWP  RH   DD  WETB  DIR  SPD  THETA  THE-V  THE-W  THE-E      W
          mb     m     C     C   %    C     C  deg  knt      K      K      K      K   g/kg
    --------------------------------------------------------------------------------------
    SFC 1012     4  29.8  23.8  70  6.0  25.3  100   10  301.9  305.3  298.1  357.0  18.65
      1 1000   113  28.8  23.8  74  5.0  25.1   95   12  302.0  305.4  298.2  357.7  18.88
      2  961   466  25.4  22.3  83  3.1  23.2   93   15  302.0  305.3  297.7  354.9  17.91
      3  925   802  23.2  19.4  79  3.8  20.5   95   15  303.0  305.9  296.4  349.0  15.51
      4  850  1536  19.0  14.0  73  5.0  15.7   95   14  306.0  308.3  294.7  341.8  11.91
      5  769  2390  14.2   8.2  67  6.0  10.5   96   11  309.8  311.4  293.6  337.0   8.91
      6  753  2568  12.4  10.5  88  1.9  11.2   97   11  309.7  311.7  294.8  342.2  10.65
      7  737  2748  11.8   6.8  71  5.0   8.8  100   11  310.9  312.5  293.6  336.9   8.44
      8  700  3178   9.4   4.4  71  5.0   6.5  110   11  312.9  314.3  293.4  336.3   7.51

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: PARSER help
by graff (Chancellor) on Jan 25, 2007 at 04:31 UTC
    You have a few problems, apart from your main question. It's good that you have use strict but you need to understand how scoping of variables really works.

    Your main problem is that you are declaring variables with "my" inside the while loop, but then you are trying to use those variables outside the loop, where they become undefined/undeclared. The script, as posted, will generate syntax errors on the final "print" statement, because all the variables being printed have gone "out of scope" (at that point, program execution has exited the block in which they were defined -- i.e. the while loop).

    (You also have a syntax error from repeating "my" twice on the line where you assign values to $LAT and $LON, as well as mismatched quotes in the last line of the while loop.)

    You need to declare the variables that will be printed before you go into the loop, then assign values to them as appropriate within the loop, then print them when the loop is done. Something like this:

    #!/usr/bin/perl -w
    #... (documentation could be a little better... consider using pod)

    use strict;
    use warnings;

    die "Usage: $0 textfile\n" unless ( @ARGV == 1 and -f $ARGV[0] );

    my $txtfile = shift;
    open(IN, $txtfile) or die "open failed on $txtfile: $!";

    my ( $LAT, $LON, $PRES, $HGHT, $TEMP, $DEWP, $DIR, $SPD );
    while (<IN>) {
        if ( /<pre>/ ) {  # cue to grab next line for lat/lon
            $_ = <IN>;
            # use a regex to match and capture floating-pt numbers:
            ( $LAT, $LON ) = /\s+ (-?\d+\.\d+) \s+ (-?\d+\.\d+) \s+/x;
        }
        elsif ( /^\s*(?:SFC|\d+)\s+\d+/ ) {  # line of data
            my @flds = split;
            ( $PRES, $HGHT, $TEMP, $DEWP, $DIR, $SPD ) = @flds[1..4,8,9];
        }
    }
    open( OUT, ">", "$txtfile.reduced" ) or die "open failed on $txtfile.reduced: $!";
    print OUT "$LAT, $LON, $PRES, $HGHT, $TEMP, $DEWP, $DIR, $SPD\n";
    close OUT;

    Now, that version will do basically what the OP code would do (if it were runnable), but I'm not sure this is what you really want to do. It will simply print out the 6 fields of interest that come from the last line of data in the file, together with the lat/lon value from the file's header.

    If that's what you want, then you're done. But I would have expected something more. Then again, it doesn't seem as though you are showing us the whole data file: there is no </pre> tag to be found in the sample...
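
    For the second part of the question (printing only the SFC row and the 850, 500 and 250 mb rows), one possible approach is to test each data row before writing it. This is only a sketch: reduce_row and %want_pres are made-up names, the lat/lon values are passed in by hand, and it assumes the pressure column holds plain integers such as 850 rather than 850.0.

```perl
use strict;
use warnings;

# Pressure levels (mb) to keep in addition to the surface row;
# the values come from the question, the hash name is hypothetical.
my %want_pres = map { $_ => 1 } (850, 500, 250);

# Hypothetical helper: returns the reduced, comma-separated record
# for a wanted data row, or undef for any other line.
sub reduce_row {
    my ($lat, $lon, $line) = @_;
    my @f = split ' ', $line;

    # Data rows start with SFC or a level number and have at least 10 fields.
    return undef unless @f >= 10 && $f[0] =~ /^(?:SFC|\d+)$/;

    my ($lev, $pres) = @f[0, 1];
    return undef unless $lev eq 'SFC' || $want_pres{$pres};

    # Fields 1..4 are PRES HGHT TEMP DEWP; 8 and 9 are DIR and SPD.
    return join ', ', $lat, $lon, @f[1 .. 4, 8, 9];
}

# Three rows copied from the sample data in the question.
my @sample = (
    'SFC 1012 4 29.8 23.8 70 6.0 25.3 100 10 301.9 305.3 298.1 357.0 18.65',
    '3 925 802 23.2 19.4 79 3.8 20.5 95 15 303.0 305.9 296.4 349.0 15.51',
    '4 850 1536 19.0 14.0 73 5.0 15.7 95 14 306.0 308.3 294.7 341.8 11.91',
);

for my $line (@sample) {
    my $rec = reduce_row('25.82', '-80.28', $line);
    print "$rec\n" if defined $rec;
}
```

    With those three rows, only the SFC and 850 mb records are printed. Because the surface row is matched on the LEV column rather than on a pressure value, it does not matter that the surface pressure changes from sounding to sounding.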

Re: PARSER help
by Anonymous Monk on Jan 25, 2007 at 04:21 UTC
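    You have several syntax errors in this line: hyphens are not allowed in variable names ($THE-V, $THE-W, $THE-E), and the quotes in the split call do not match.
    my ($LEV, $PRES, $HGHT, $TEMP, $DEWP, $RH, $DD, $WETB, $DIR, $SPD, $THETA, $THE-V, $THE-W, $THE-E, $W) = split (' ", $line);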
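
    One way around the hyphenated names is to store each row in a hash keyed by column name, using underscores where the table header has hyphens. A minimal sketch, where parse_row and the key names THE_V, THE_W and THE_E are made up for illustration:

```perl
use strict;
use warnings;

# Column names from the table header, with hyphens replaced by
# underscores (hypothetical keys chosen for this example).
my @cols = qw(LEV PRES HGHT TEMP DEWP RH DD WETB DIR SPD
              THETA THE_V THE_W THE_E W);

# Hypothetical helper: returns a hash reference for a data row,
# or undef for header, separator and other non-data lines.
sub parse_row {
    my ($line) = @_;
    my @vals = split ' ', $line;

    # Data rows have one value per column and start with SFC or a level number.
    return undef unless @vals == @cols && $vals[0] =~ /^(?:SFC|\d+)$/;

    my %row;
    @row{@cols} = @vals;    # hash slice: pair each column name with its value
    return \%row;
}

my $row = parse_row('SFC 1012 4 29.8 23.8 70 6.0 25.3 100 10 301.9 305.3 298.1 357.0 18.65');
print "$row->{PRES}, $row->{THE_E}\n";    # prints "1012, 357.0"
```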
Re: PARSER help
by jesuashok (Curate) on Jan 25, 2007 at 04:22 UTC
    how do I pull out the 4th and 3rd to last numbers

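    my ($LAT, $LON) = (split ' ', $line)[-4, -3];
    Negative indices count from the end of the list: -1 is the last element, so -4 and -3 are the 4th- and 3rd-to-last fields. Note the single "my" and the parentheses around the whole split call; the list slice needs them.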
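
    A quick check of negative slice indices against the location line from the question. The helper name lat_lon_from_header is made up, and the guard for short lines is only an assumption about what malformed input might look like:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (lat, lon) taken from the 4th- and
# 3rd-to-last whitespace-separated fields, or an empty list if the
# line has fewer than four fields.
sub lat_lon_from_header {
    my ($line) = @_;
    my @fields = split ' ', $line;    # ' ' splits on runs of whitespace
    return unless @fields >= 4;
    return @fields[-4, -3];
}

# Location line copied from the sample data in the question.
my $header = 'Miami Intl Airp FL US KMIA 1 25.82 -80.28 4 72202';
my ($lat, $lon) = lat_lon_from_header($header);

print "$lat, $lon\n";    # prints "25.82, -80.28"
```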