The code below is used to create a list of dates from 18960101 to 20090717 using Date::Simple, access and submit a form using Mechanize, and loop over all dates and scrape the resulting data pages for various climatological variables. The data is formatted using Fortran::Format and the array is printed to the screen and to a file.

I know the code is fairly messy, and I know there are many instances where several of my lines could be replaced with a single line or subroutine, etc. Please offer suggestions as to how I can reduce the total number of lines in the script or improve the readability in general. Also, any advice on improving the speed of the program would be appreciated.

Thanks for your input!

use strict;
use warnings;

# Declare globals
my (@dates, @hlahdd, @rain, @snow);

# Create date list
use Date::Simple::D8 (':all');
my $start = Date::Simple::D8->new('19400101');
my $end   = Date::Simple::D8->new('20090718');

while ( $start < $end ) {
    push @dates, "$start";
    chomp(@dates);
    $start = $start->next;
}

# Open file for writing
open(FH, '>', "c:/perl/scripts/496/hla.txt") or die "open: $!\n";

# Initiate browsing agent
use WWW::Mechanize;
my $url  = "http://bub2.meteo.psu.edu/wxstn/wxstn.htm";
my $mech = WWW::Mechanize->new(keep_alive => 1);
$mech->get($url);

# Start the loop
while (@dates) {

    # Submit the first form
    $mech->submit_form(
        form_number => 1,
        fields      => { dtg => $dates[0], }
    );

    # Download the resulting page, text only, and scrape for data
    my $page = $mech->content(format=>'text');
    my @data = ($page =~ /:\s\s\s\s(\d\d)/g);

    my @rain = ($page =~ /Rain or Liquid Equivalent\s+:\s+(\S*)/);
    # Replace 'TRACE'
    if ($rain[0] eq 'TRACE') { $rain[0] = '0.00'; }

    my @snow = ($page =~ /Snow and\/or Ice Pellets\s+:\s+(\S*)/);
    # Replace 'TRACE'
    if ($snow[0] eq 'TRACE') { $snow[0] = '0.00'; }

    my @depth = ($page =~ /Snow Depth\s+:\s+(\S*)/);
    # Replace '(N/A)/TRACE/0'
    if ($depth[0] eq '(N/A)' or $depth[0] eq 'TRACE' or $depth[0] eq '0') {
        $depth[0] = "99";
    }

    my @hdd = ($page =~ /Degree-Days\s+:\s+(\S*)/);

    # Format the output for Fortran analysis
    use Fortran::Format;
    my $fdepth = Fortran::Format->new("I2.1,6X")->write($depth[0]);
    chomp $fdepth;
    my $frain = Fortran::Format->new("F4.2,6X")->write($rain[0]);
    chomp $frain;
    my $fsnow = Fortran::Format->new("F4.2,6X")->write($snow[0]);
    chomp $fsnow;
    my $f = Fortran::Format->new("I2.1,6X")->write($hdd[0]);
    chomp $f;

    # Assign data to the array
    @hlahdd = ("$dates[0] $data[0] $data[1] $data[2] $fdepth $f $frain $fsnow\n");

    # Print the array to screen and to file
    print "@hlahdd";
    print FH "@hlahdd";

    # Slow down boi... then go back a page
    sleep .1;
    $mech->back();
    shift(@dates);
}
# Exit the loop

# Close the written file
close(FH);
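
As a rough sketch of one way the repeated match-and-replace steps in the loop might be folded into a single subroutine: `extract_field` is a hypothetical name that does not appear in the script, and the sample page text is invented for illustration rather than taken from the real site. It uses only core Perl, so it can be tried without Mechanize.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): pull one labelled
# value out of the page text and map placeholder strings to a replacement.
sub extract_field {
    my ($page, $label, %replace) = @_;
    my ($value) = $page =~ /\Q$label\E\s+:\s+(\S*)/;
    return unless defined $value;    # label not found on the page
    return exists $replace{$value} ? $replace{$value} : $value;
}

# Invented sample text standing in for $mech->content(format => 'text').
my $page = <<'END';
Rain or Liquid Equivalent   : TRACE
Snow and/or Ice Pellets     : 0.25
Snow Depth                  : (N/A)
Degree-Days                 : 31
END

my $rain  = extract_field($page, 'Rain or Liquid Equivalent', TRACE => '0.00');
my $snow  = extract_field($page, 'Snow and/or Ice Pellets',   TRACE => '0.00');
my $depth = extract_field($page, 'Snow Depth',
                          '(N/A)' => '99', TRACE => '99', 0 => '99');
my $hdd   = extract_field($page, 'Degree-Days');

print "$rain $snow $depth $hdd\n";    # prints "0.00 0.25 99 31"
```

Each field then takes one call instead of a match followed by an `if` block, and the helper returns undef when a label is missing, so the caller can check for that before formatting.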


In reply to Improve My Code... by cheech
