in reply to what the pattern searches for
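It appears to look for, and capture, lines containing page numbers in one of four forms. Roughly:
' nnn <page> ' ' -nnn- <page> ' ' A-nnn <page> ' ' page nnn of mmm '
Here nnn and mmm are one to three digits and A is any single ASCII letter. In the first three forms the number sits on its own line and must be followed by a line containing the literal text <page>.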
This becomes fairly clear if you break up and expand the regex a little using /x.
    my $pattern = qr[
        (
            \n\s* [0-9]{1,3}                             \s*\n\s* <page> \s*\n
          | \n\s* - [0-9]{1,3} -                         \s*\n\s* <page> \s*\n
          | \n\s* [A-Za-z]-[0-9]{1,3}                    \s*\n\s* <page> \s*\n
          | \n\s* page \s* [0-9]{1,3} \s* of \s* [0-9]{1,3} \s*\n
        )
    ]x;
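If you want to convince yourself of what each alternative accepts, here is a minimal sketch that runs the expanded pattern against a few made-up strings. The sample strings and the helper name matches_page_line are my own inventions for illustration, not anything from your data, and I've used qr{...}x instead of qr[...]x purely for readability.

```perl
use strict;
use warnings;

# Same four alternatives as the expanded pattern, one per line.
# Under /x the whitespace and the trailing comments are ignored.
my $pattern = qr{
    (
        \n\s* [0-9]{1,3}          \s*\n\s* <page> \s*\n        # nnn
      | \n\s* - [0-9]{1,3} -      \s*\n\s* <page> \s*\n        # -nnn-
      | \n\s* [A-Za-z]-[0-9]{1,3} \s*\n\s* <page> \s*\n        # A-nnn
      | \n\s* page \s* [0-9]{1,3} \s* of \s* [0-9]{1,3} \s*\n  # page nnn of mmm
    )
}x;

# Hypothetical helper: returns 1 if the text contains a page-number line.
sub matches_page_line {
    my ($text) = @_;
    return $text =~ $pattern ? 1 : 0;
}

# Made-up samples; "\n" marks the line breaks the pattern depends on.
my @samples = (
    "text\n 12 \n<page>\nmore",        # bare number
    "text\n-12-\n<page>\nmore",        # dashed number
    "text\nB-7\n<page>\nmore",         # letter-number
    "text\npage 3 of 10\nmore",        # page nnn of mmm
    "text\n1234\n<page>\nmore",        # four digits
    "text\n- 12 -\n<page>\nmore",      # spaces inside the dashes
);

for my $s (@samples) {
    (my $shown = $s) =~ s/\n/\\n/g;    # make the newlines visible
    printf "%-34s %s\n", $shown, matches_page_line($s) ? "match" : "no match";
}
```

The first four samples match and the last two do not. The four-digit case fails because of the {1,3} limit, and the '- 12 -' case fails because, with /x, the spaces around the dashes in the pattern are not part of what it matches. Note also that 'page' and 'of' are case-sensitive, since there is no /i.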
There are many ways the regex could be improved, but that isn't what you asked :)