I have to agree with Anonymous Monk - "test it and see". Unfortunately, testing this sort of code takes a bit of work. If you are new to testing, what follows might feel a bit overwhelming. If so, read Test::Simple and feel free to ask lots of questions.
So, some tips:
- Define some sample dummy input - in this case one or more dummy pdf files.
- For each dummy file, define what links should be found.
- Move your code from a script to subroutines that allow you to test each stage of your algorithm - this makes it easier to compare inputs and outputs. Also during debugging you will find it easier to pinpoint the source of a problem.
- Use Test::More to compare actual to expected outputs from each subroutine.
I've included an example of what I mean. First, here's what your script might look like after it's been broken up into subroutines. I've put in two: one to find the links (getAllLinks(...)) and one to retrieve the byte count for each link (getByteCount(...)). I've done it this way because the techniques for testing those two parts of your script are very different. Please forgive typos: this is only a reorganization for demonstration purposes. It hasn't been run through a compiler.
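As a rough sketch only, a module split up this way could look like the following. The module name MyModule and the two subroutine signatures match the test script; the bodies are assumptions on my part (find_all_links with url_regex to collect the links, a HEAD request plus the Content-Length header for the byte count), not your script's actual logic. Save it as MyModule.pm somewhere in @INC.

```perl
package MyModule;
use strict;
use warnings;
use Exporter 'import';

# Export both subroutines so a test script can call them by their short names.
our @EXPORT = qw(getAllLinks getByteCount);

# Fetch $start and return an array ref holding the URL of every link
# whose URL matches $regex.
# Assumption: $mech behaves like a WWW::Mechanize object.
sub getAllLinks {
    my ($mech, $start, $regex) = @_;
    $mech->get($start);
    my @links = $mech->find_all_links( url_regex => $regex );
    return [ map { $_->url } @links ];
}

# Return the size in bytes that the server reports for $url.
# Assumption: the server sends a Content-Length header in reply to a
# HEAD request; if it does not, this returns undef.
sub getByteCount {
    my ($mech, $url) = @_;
    my $response = $mech->head($url);
    return $response->header('Content-Length');
}

1;
```

Because both subroutines take $mech as a parameter rather than creating it themselves, a test can hand them either a real WWW::Mechanize object or a small stand-in object that returns canned answers.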
Now, here's an example of a test script. A test script is just a plain old script whose name ends, by convention, with .t rather than .pl. What this test script does is pass various combinations of inputs to the subroutines getAllLinks(...) and getByteCount(...). To compare the actual outputs of those functions with the expected outputs, we wrap each subroutine call in one of two testing functions from Test::More: is(...) for simple scalars or is_deeply(...) for data structures.
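To make the difference between the two functions concrete, here is a tiny stand-alone script that uses only Test::More. The file names in it are made-up sample values, not anything from your site.

```perl
use strict;
use warnings;
use Test::More tests => 3;   # declare up front that 3 tests will run

# is() compares two simple scalars.
is( length('foo.pdf'), 7, 'is: scalar values match' );

# is_deeply() walks both data structures and compares them element by element.
my @found = ( 'baz.pdf', 'foo.pdf' );
is_deeply( \@found, [ 'baz.pdf', 'foo.pdf' ], 'is_deeply: same elements, same order' );

# Order matters for arrays, so sort first if the order is not part of
# what you are testing.
my @unordered = ( 'foo.pdf', 'baz.pdf' );
is_deeply( [ sort @unordered ], [ 'baz.pdf', 'foo.pdf' ], 'is_deeply: equal after sorting' );
```

Run it with perl, or with prove if you have it, and each test prints an "ok" line followed by its description.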
Your test script might look something like this. Again, this code hasn't been run through a compiler - consider it more as a demonstration of how to use Test::More:
use strict;
use warnings;
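use Test::More qw(no_plan); # imports the testing tools (is, is_deeply, ...)
use WWW::Mechanize; # needed for the $mech object created below
use MyModule; # that's your code; it must export getAllLinks and getByteCount
my $mech = WWW::Mechanize->new( autocheck => 1 );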
#call repeatedly with various values of $start, $regex
# is_deeply compares data structures element by element
# is_deeply($got, $expected, $description_of_test)
my $start = "http://www.domain.com";
my $regex = qr/\d+.+\.pdf$/;
# placeholder names: list the links your dummy page really contains (each must match $regex)
my $aExpected = [ '01-foo.pdf', '02-baz.pdf' ];
is_deeply(getAllLinks($mech, $start, $regex), $aExpected
, "getAllLinks: start=$start, regex=$regex");
#call repeatedly with various urls
# is compares simple scalars
# is($got, $expected, $description_of_test)
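my $url = "$start/sample.pdf"; # placeholder: the URL of one of your dummy files
my $iExpected = 12345; # placeholder: the known size in bytes of that file
is(getByteCount($mech, $url), $iExpected
, "getByteCount: url=$url");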
Best, beth