Seriously, he has to output to FORTRAN (Fortran::Format)! Think what that's doing to him! Show him some Perl love!

I've chosen a data-driven design. I've pulled all the field options into one place, $fields, which defines for each field the regexp used to parse the page, any fixups, and the Fortran output format.
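To show the shape of that idea in isolation, here is a minimal sketch. It is not the script itself: the sample page text is made up, extract_fields is a hypothetical helper name, and plain sprintf stands in for Fortran::Format so that it runs with core Perl only.

```perl
use strict;
use warnings;

# Hypothetical field table: one entry per value to extract.
my $fields = {
    rain => {
        regexp => qr/Rain or Liquid Equivalent\s+:\s+(\S*)/,
        format => '%4.2f',                  # sprintf stand-in for a Fortran format
        fix    => { TRACE => '0.00' },
    },
    depth => {
        regexp => qr/Snow Depth\s+:\s+(\S*)/,
        format => '%2d',
        fix    => { TRACE => 99, '(N/A)' => 99 },
    },
};

# Apply every field definition to one page of text.
sub extract_fields {
    my ( $page, $fields ) = @_;
    my %out;
    for my $name ( sort keys %$fields ) {
        my $spec = $fields->{$name};
        my ($raw) = $page =~ $spec->{regexp};
        next unless defined $raw;           # label not on this page
        my $fix = $spec->{fix} || {};
        $raw = $fix->{$raw} if exists $fix->{$raw};
        $out{$name} = sprintf $spec->{format}, $raw;
    }
    return \%out;
}

# Made-up sample text, only for illustration.
my $sample = "Rain or Liquid Equivalent : TRACE\nSnow Depth : (N/A)\n";
my $row    = extract_fields( $sample, $fields );
print join( ' ', map { "$_=$row->{$_}" } sort keys %$row ), "\n";
```

Adding a field then means adding one hash entry; the loop never changes.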

I've added Getopt::Long to let the user pick a specific range of dates. All of the configurable information is at the start of the script. Eventually I'd pull the parsing into a module that takes a callback to print the output lines, so that the script contains only the part the user cares about.
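A rough sketch of what that split might look like, under some loud assumptions: parse_pages and its on_row argument are names I made up, the pages are canned strings rather than real fetches, and GetOptionsFromArray is used only so the example does not depend on the real @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical module-style routine: walks the pages it is given and
# hands each parsed row to the caller's callback.
sub parse_pages {
    my (%args) = @_;
    my $count = 0;
    for my $date ( sort keys %{ $args{pages} } ) {
        # YYYYMMDD strings compare correctly as plain strings
        next if $date lt $args{start} or $date gt $args{end};
        my ($hdd) = $args{pages}{$date} =~ /Degree-Days\s+:\s+(\S*)/;
        $args{on_row}->( $date, { hdd => $hdd } );
        $count++;
    }
    return $count;
}

# The "script" part: options plus the output callback.
my @argv = ( '--start=20090102', '--end=20090103' );    # stand-in for @ARGV
my ( $start, $end ) = ( '19400101', '20090718' );
GetOptionsFromArray( \@argv, 'start=s' => \$start, 'end=s' => \$end )
    or die "bad options\n";

my %pages = (    # made-up page text keyed by YYYYMMDD
    '20090101' => "Degree-Days : 40",
    '20090102' => "Degree-Days : 38",
    '20090103' => "Degree-Days : 35",
);

my @lines;
my $seen = parse_pages(
    pages  => \%pages,
    start  => $start,
    end    => $end,
    on_row => sub {
        my ( $date, $row ) = @_;
        push @lines, "$date $row->{hdd}";
    },
);
print "$_\n" for @lines;
```

The caller decides what a row looks like on output; the parsing routine only knows how to find the values.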

WWW::Mechanize::Cached builds a local cache, which is handy when you realize halfway through a huge scrape run that you need to grab an extra field. I check whether the last request came out of the cache and, if it did, skip the sleep before the next request.
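The pattern itself, stripped of the network, might look like the sketch below. This is not how WWW::Mechanize::Cached works internally: cached_get, the %cache hash and the fake fetcher are all invented for illustration. The only real point is that the pause happens after a genuine request and never after a cache hit.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);    # the built-in sleep truncates 0.1 to 0

my %cache;          # hypothetical in-memory cache: url => content
my $fetches = 0;    # counts "real" requests

# Returns (content, was_cached). $fetcher stands in for the real request.
sub cached_get {
    my ( $url, $fetcher ) = @_;
    return ( $cache{$url}, 1 ) if exists $cache{$url};
    $cache{$url} = $fetcher->($url);
    return ( $cache{$url}, 0 );
}

my $fake_fetch = sub { $fetches++; return "page for $_[0]" };

my $pauses = 0;
for my $url (qw( /a /b /a /b /c )) {
    my ( $content, $was_cached ) = cached_get( $url, $fake_fetch );
    next if $was_cached;    # nothing hit the server, no need to wait
    sleep 0.1;              # be polite only after a real request
    $pauses++;
}
print "fetches=$fetches pauses=$pauses\n";
```

On a rerun over dates that are already cached, the loop finishes without waiting at all.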

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize::Cached;
use Date::Simple::D8 (':all');
use Fortran::Format;
use Getopt::Long;
use Time::HiRes qw(sleep);    # built-in sleep would truncate .1 to 0

#configuration:
my $start_date = '19400101';
my $last_date  = '20090718';
my $filename   = "c:/perl/scripts/496/hla.txt";
my $root_url   = "http://bub2.meteo.psu.edu/wxstn/wxstn.htm";
my $help       = 0;

my $result = GetOptions(
    "start=s" => \$start_date,
    "end=s"   => \$last_date,
    "help"    => \$help,
);

my $usage = <<EOS;
$0 - Parse $root_url from $start_date to $last_date
Options:
  --start=19400101
  --end=20090718
EOS
die $usage if ( $help or !$result );

# One entry per field: how to find it, how to fix it up, how to format it.
my $fields = {
    rain => {
        regexp => qr/Rain or Liquid Equivalent\s+:\s+(\S*)/,
        format => "F4.2,6X",
        fix    => { TRACE => '0.00' },
    },
    snow => {
        regexp => qr/Snow and\/or Ice Pellets\s+:\s+(\S*)/,
        format => "F4.2,6X",
        fix    => { TRACE => '0.00' },
    },
    depth => {
        regexp => qr/Snow Depth\s+:\s+(\S*)/,
        format => "I2.1,6X",
        fix    => { TRACE => 99, '(N/A)' => 99, '0' => 99 },
    },
    hdd => {
        regexp => qr/Degree-Days\s+:\s+(\S*)/,
        format => "I2.1,6X",
    },
};

sub print_output {
    my ( $date, $data, $fortran ) = @_;
    my @data    = @$data;
    my %fortran = %$fortran;
    my $output  = join( " ",
        $date, @data[ 0 .. 2 ], @fortran{qw( depth hdd rain snow )} )
      . "\n";
    print $output;
    print OUTPUT $output;
}
#### end configuration

# Open file for writing
open( OUTPUT, '>', $filename ) or die "open: $!\n";

# Initiate browsing agent
my $mech = WWW::Mechanize::Cached->new( keep_alive => 1 );

# Create date list
my $date     = Date::Simple::D8->new($start_date)->prev;
my $end_date = Date::Simple::D8->new($last_date);

$mech->get($root_url);

# Date::Simple objects are immutable, so assign the result of next
while ( ( $date = $date->next ) <= $end_date ) {

    # Submit the first form
    my $resp = $mech->submit_form(
        form_number => 1,
        fields      => { dtg => $date }
    );

    # Download the resulting page, text only, and scrape for data
    my $page = $mech->content( format => 'text' );
    my @data = ( $page =~ /:\s\s\s\s(\d\d)/g );

    my %fortran;
    foreach my $field ( keys %$fields ) {
        my $regexp = $fields->{$field}->{regexp};
        my $format = $fields->{$field}->{format};
        my $fix    = $fields->{$field}->{fix} || {};

        #parse page for this field
        my ($parsed) = $page =~ /$regexp/;

        #fix field
        $parsed = $fix->{$parsed}
          if defined $parsed and exists $fix->{$parsed};

        # Format the output for Fortran analysis
        chomp( my $f = Fortran::Format->new($format)->write($parsed) );
        $fortran{$field} = $f;
    }

    # Prepare output for screen and file
    print_output( $date, \@data, \%fortran );

    # Only pause when the request actually went to the server
    sleep .1 unless $mech->is_cached();
    $mech->back();
}    # Exit the loop

# Close the written file
close(OUTPUT) or die "close: $!\n";

In reply to Re: Improve My Code... by spazm
in thread Improve My Code... by cheech
