Polyglot has asked for the wisdom of the Perl Monks concerning the following question:
Pages which have already been identified will be tagged in the format of (<a id="GC_2"></a>) where the "GC" is the book code and the number is the page. There is little point in looking for a skipped page out of its range, so I'm trying to match only between these tags where the page should be.
Code example:
my $replace = sub { push @findmissing, "$bookabbrev\t$1\t$2\t$3\n"; return '' };
foreach (0..$#missingpages) {
    chomp $missingpages[$_];
    my $skipped=$missingpages[$_];
    my $before=qq|<a id="$bookabbrev\_|.($skipped-1).qq|">|;
    my $after=qq|<a id="$bookabbrev\_|.($skipped+1).qq|">|;
    s/$before.*?(.{0,25})(?<!\d)($missingpages[$_])(?>\D)(.{0,25}).*?$after/$replace->()/eg for @source;
}
In the above, the "$skipped-1" and "$skipped+1" should really just be something like "<$skipped" and ">$skipped" because there are times when several consecutive pages are skipped, and the code could never hope to find a tag for the one just before or just after each of them. I've looked here and via Google and found that I can use an "if-then" type of expression within a regex, but have found no examples for how to do this. How would you do this?
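One way to get the effect of "<$skipped" and ">$skipped" without doing arithmetic inside the regex is to scan the existing tags first and pick the nearest tagged page on each side in plain Perl. What follows is only a sketch under assumptions: the helper names `nearest_tagged_pages` and `find_candidates` are hypothetical, the whole book is assumed to sit in one string, and `(?!\d)` is used as the right-hand boundary in place of the original `(?>\D)`.

```perl
use strict;
use warnings;

# Hypothetical helper: return the nearest tagged page below and above
# $skipped, or undef on a side where the text has no such tag.
sub nearest_tagged_pages {
    my ($text, $bookabbrev, $skipped) = @_;
    my ($below, $above);
    while ($text =~ /<a id="\Q$bookabbrev\E_(\d+)">/g) {
        my $page = $1;
        $below = $page if $page < $skipped && (!defined $below || $page > $below);
        $above = $page if $page > $skipped && (!defined $above || $page < $above);
    }
    return ($below, $above);
}

# Hypothetical helper: collect up to 25 characters of context on each side
# of a bare occurrence of $skipped between the two nearest tags.
sub find_candidates {
    my ($text, $bookabbrev, $skipped) = @_;
    my ($below, $above) = nearest_tagged_pages($text, $bookabbrev, $skipped);
    return () unless defined $below && defined $above;

    my $before = quotemeta qq|<a id="${bookabbrev}_$below">|;
    my $after  = quotemeta qq|<a id="${bookabbrev}_$above">|;
    my @found;
    if ($text =~ /$before(.*?)$after/s) {
        my $span = $1;
        # Occurrences that fall inside the trailing context of an earlier
        # match are not reported separately.
        while ($span =~ /(.{0,25})(?<!\d)($skipped)(?!\d)(.{0,25})/gs) {
            push @found, "$bookabbrev\t$1\t$2\t$3\n";
        }
    }
    return @found;
}
```

With tags present for pages 15 and 18 only, calls for skipped pages 16 and 17 would both search the same stretch between those two tags, which is the case the fixed "-1" and "+1" offsets cannot handle.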
Here is a sample of the output that gets pushed into @findmissing to help me scan for legitimate missed numbers (neither of these was a page number, as evidenced by the context).
GC kes."--Wylie, b. 16, ch. 1 Did this haughty potenta
GC rty."--Wylie, b. 16, ch. 1 This document clearly re
UPDATE: It appears this is not possible in perl. At least, I have not found a way to do this. What I have found is of limited value, but may help me to find the majority of cases...ugly as anything, though. I'm now doing this...
s/
    (??{$missingpages[$_]-1|$missingpages[$_]-2|$missingpages[$_]-3|$missingpages[$_]-4|$missingpages[$_]-5|$missingpages[$_]-6})
    .*?
    (.{0,30})
    (?<!\d)
    ($missingpages[$_])
    (?>\D)
    (.{0,30})
    .*?
    (??{$missingpages[$_]+1|$missingpages[$_]+2|$missingpages[$_]+3|$missingpages[$_]+4|$missingpages[$_]+5|$missingpages[$_]+6})
/$replace->()/egx for @source;
Oh, and this is slow as molasses.
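A tentative note on the update's pattern: the body of `(??{ ... })` is ordinary Perl code, so as far as I can tell the `|` between the terms there acts as the bitwise-or operator rather than as regex alternation, and the block is run again during matching. One possible alternative is to build the alternation as a string and compile it once with `qr//`. In the sketch below, `$page`, `$as_code`, `$alt` and `$before_re` are illustrative names, the offsets 1 to 6 are taken from the update, and the "GC" tag layout is assumed from the format described at the top.

```perl
use strict;
use warnings;

my $page = 16;

# Evaluated as Perl code, | is bitwise-or: 15 | 14 | 13 collapses to one number.
my $as_code = $page-1 | $page-2 | $page-3;

# Build the alternation as a string instead, then compile it once.
my $alt       = join '|', map { $page - $_ } 1 .. 6;    # "15|14|13|12|11|10"
my $before_re = qr/<a id="GC_(?:$alt)">/;

print "as code: $as_code\n";
print "as alternation: $alt\n";
```

The compiled `$before_re` can then be interpolated into the larger substitution in place of the first `(??{ ... })` block, and a matching pattern for the pages after the skipped one can be built the same way.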
Blessings,
~Polyglot~
Re: How to use "less than" and "greater than" inside a regex for a $variable number
by LanX (Saint) on Oct 01, 2012 at 20:52 UTC
by Polyglot (Chaplain) on Oct 01, 2012 at 21:28 UTC
by AnomalousMonk (Archbishop) on Oct 02, 2012 at 03:18 UTC
by Polyglot (Chaplain) on Oct 04, 2012 at 19:42 UTC
by AnomalousMonk (Archbishop) on Oct 06, 2012 at 10:41 UTC
by AnomalousMonk (Archbishop) on Oct 02, 2012 at 05:24 UTC