in reply to what the pattern searches for

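It appears to be looking for, and capturing, lines containing page numbers in one of four forms. Roughly:

    ' nnn <page> '
    ' - nnn - <page> '
    ' A-nnn <page> '
    ' page nnn of mmm '

Here nnn and mmm are one to three digits and A is any single letter. In the first three forms the number is followed, on a later line, by the literal text <page>.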

This becomes fairly clear if you break the regex up and spread it out using /x, which makes unescaped whitespace in the pattern insignificant:

    my $pattern = qr[
        (
            \n\s* [0-9]{1,3} \s*\n\s* <page> \s*\n
          | \n\s* - [0-9]{1,3} - \s*\n\s* <page> \s*\n
          | \n\s* [A-Za-z]-[0-9]{1,3} \s*\n\s* <page> \s*\n
          | \n\s* page \s* [0-9]{1,3} \s* of \s* [0-9]{1,3} \s*\n
        )
    ]x;
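To see the pattern at work, here is a minimal sketch that runs it over a small chunk of text. The sample input and the helper name page_markers are made up for illustration; they are not from the original question, and the real input may look different.

```perl
use strict;
use warnings;

# The /x version of the pattern; the trailing comments name the form
# each alternative is meant to catch.
my $pattern = qr[
    (
        \n\s* [0-9]{1,3} \s*\n\s* <page> \s*\n                  # nnn, then <page>
      | \n\s* - [0-9]{1,3} - \s*\n\s* <page> \s*\n              # -nnn-, then <page>
      | \n\s* [A-Za-z]-[0-9]{1,3} \s*\n\s* <page> \s*\n         # A-nnn, then <page>
      | \n\s* page \s* [0-9]{1,3} \s* of \s* [0-9]{1,3} \s*\n   # page nnn of mmm
    )
]x;

# Hypothetical helper: in list context, m//g with one capture group
# returns the captured text of every match, left to right.
sub page_markers {
    my ($text) = @_;
    return $text =~ /$pattern/g;
}

# Made-up sample input containing one marker of each form.
my $sample = join "\n",
    'Chapter text',
    ' 12 ',
    '<page>',
    'More text',
    '-13-',
    '<page>',
    'Still more',
    'A-14',
    '<page>',
    'End of part',
    'page 15 of 200',
    'Last line',
    '';

my @found = page_markers($sample);
for my $m (@found) {
    (my $shown = $m) =~ s/\n/\\n/g;    # make the newlines visible
    print "$shown\n";
}
```

Running it prints the four captured blocks with their newlines shown as \n. It also makes two details visible: there is no /i, so "Page 15 of 200" would not match; and because /x discards unescaped spaces, the second alternative as written here matches "-13-" but not "- 13 -".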

There are many ways the regex could be improved, but that isn't what you asked :)