I have to agree with Anonymous Monk - "test it and see". Unfortunately, testing this sort of code takes a bit of work. If you are new to testing, what follows might feel a bit overwhelming. If so, read Test::Simple and feel free to ask lots of questions.

So, some tips:

Define some sample dummy input - in this case one or more dummy pdf files.
For each dummy file, define what links should be found.
Move your code from a script to subroutines that allow you to test each stage of your algorithm - this makes it easier to compare inputs and outputs. Also during debugging you will find it easier to pinpoint the source of a problem.
Use Test::More to compare actual to expected outputs from each subroutine.
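
To make the tips above concrete, here is a minimal sketch that needs no network at all. It assumes the link filtering has been pulled out into a small helper. The name selectLinks and the sample URLs are made up for illustration; only the regex comes from your script. Each test case pairs a dummy input with the links that should be found:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper (not part of the original script):
# returns a reference to the list of URLs that match $regex.
sub selectLinks {
    my ($regex, @urls) = @_;
    return [ grep { $_ =~ $regex } @urls ];
}

my $regex = qr/\d+.+\.pdf$/;

# Each case pairs dummy input with the links we expect to find.
# The URLs are invented sample data.
my @cases = (
    { name => 'mixed links',
      in   => [ '2009-report.pdf', 'index.html', 'notes.pdf' ],
      want => [ '2009-report.pdf' ] },
    { name => 'no pdf links at all',
      in   => [ 'index.html', 'about.html' ],
      want => [] },
);

for my $case (@cases) {
    is_deeply( selectLinks($regex, @{ $case->{in} }),
               $case->{want}, $case->{name} );
}

done_testing();
```

Adding another scenario is then just one more entry in @cases, which keeps the inputs and expected outputs next to each other.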

I've included an example of what I mean. First, here's what your script might look like after it's been broken up into subroutines. I've put in two: one to find the links (getAllLinks(...)) and one to retrieve the byte count for each link (getByteCount(...)). I've done it this way because the techniques for testing those two parts of your script are very different. Please forgive typos: this is only a reorganization for demonstration purposes. It hasn't been run through a compiler.

use strict;
use warnings;
use WWW::Mechanize;

my $start = "http://www.domain.com";
my $mech = WWW::Mechanize->new( autocheck => 1 );
my $regex = qr/\d+.+\.pdf$/;
my @aLinks = getAllLinks($mech, $start, $regex);

for my $link ( @aLinks ) {
   my $url = $link->url_abs;
   print "Fetching $url";
   my $bytecount = getByteCount($mech, $url);
   print "   $bytecount bytes\n";
}

# returns the links on page $start whose url matches $regex
sub getAllLinks {
  my ( $mech, $start, $regex ) = @_;

  $mech->get( $start );
  return $mech->find_all_links( url_regex => $regex );
}

# saves $url to a local file and returns that file's size in bytes
sub getByteCount {
  my ($mech,$url) = @_;
  my $filename = $url;
  $filename =~ s[^.+/][];   # strip everything up to the last slash
  $mech->get( $url, ':content_file' => $filename );
  return -s $filename;
}

Now, here's an example of a test script. A test script is just a plain old script whose name ends, by convention, with .t rather than .pl. This test script passes various combinations of inputs to the subroutines getAllLinks(...) and getByteCount(...). To compare the actual outputs of those functions with the expected outputs, we wrap each subroutine call in one of two special testing functions: is(...) or is_deeply(...).

Your test script might look something like this. Again, this code hasn't been run through a compiler - consider it more as a demonstration of how to use Test::More:

use strict;
use warnings;
use Test::More qw(no_plan);  #imports testing tools
use WWW::Mechanize;
use MyModule;                #that's your code

my $mech = WWW::Mechanize->new( autocheck => 1 );

#call repeatedly with various values of $start, $regex
# is_deeply compares data structures element by element
# is_deeply($got, $expected, $description_of_test)

my $start = "http://www.domain.com";
my $regex = qr/\d+.+\.pdf$/;
my $aExpected = [ 'foo.pdf', 'baz.pdf' ];  #placeholders: list the links in your dummy page
#getAllLinks returns link objects, so compare their urls
my $aGot = [ map { $_->url } getAllLinks($mech, $start, $regex) ];
is_deeply($aGot, $aExpected
          , "getAllLinks: start=$start, regex=$regex");

#call repeatedly with various urls
# is compares simple scalars
# is($got, $expected, $description_of_test)
my $url = "$start/foo.pdf";  #placeholder: url of one of your dummy files
my $iExpected = 1024;        #placeholder: that file's known size in bytes
is(getByteCount($mech, $url), $iExpected
   , "getByteCount: url=$url");
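
If you would rather check the byte-count logic without touching the network, one possible approach is sketched below. It assumes the filename handling inside getByteCount(...) is split into its own helper (filenameFromUrl is a made-up name, and the URL and file size are invented sample data) and it uses a temporary dummy file of known size in place of a real download:

```perl
use strict;
use warnings;
use Test::More;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: the same substitution getByteCount uses
# to turn a URL into a local file name.
sub filenameFromUrl {
    my ($url) = @_;
    (my $filename = $url) =~ s[^.+/][];
    return $filename;
}

is( filenameFromUrl('http://www.domain.com/docs/2009-report.pdf'),
    '2009-report.pdf', 'filenameFromUrl strips everything up to the last slash' );

# Write a dummy file of known size, then check that -s reports that size.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'dummy.pdf' );
open( my $fh, '>:raw', $path ) or die "Cannot write $path: $!";
print {$fh} 'x' x 1024;
close($fh) or die "Cannot close $path: $!";

my $size = -s $path;
is( $size, 1024, 'dummy file is 1024 bytes' );

done_testing();
```

This only covers the parts that do not depend on WWW::Mechanize; the download itself still needs a test against a real or local dummy server.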

Best, beth

In reply to Re: www::mechanize file download script by ELISHEVA
in thread www::mechanize file download script by jaytan
