Foxpond Hollow has asked for the wisdom of the Perl Monks concerning the following question:
Here's the text it should be matching in:

    <!-- filename: full-000-body-cdl90 -->
    <tr>
    <td class="contentSmall" valign="top" id=bold width="5%" nowrap><strong> 020</strong></td>
    <td class="contentSmall" valign="top">|a 9780470086223 (hardback)</td>
    </tr>
    <!-- end: full-000-body-cdl90 -->
    <!-- filename: full-000-body-cdl90 -->
    <tr>
    <td class="contentSmall" valign="top" id=bold width="5%" nowrap><strong> 24510</strong></td>
    <td class="contentSmall" valign="top">|a Heads in the sand : |b how the Republicans screw up foreign policy and foreign policy screws up the Democrats / |c Matthew Yglesias</td>
    </tr>
    <!-- end: full-000-body-cdl90 -->
    <!-- filename: full-000-body-cdl90 -->
    <tr>
    <td class="contentSmall" valign="top" id=bold width="5%" nowrap><strong> 24610</strong></td>
    <td class="contentSmall" valign="top">|a How the Republicans screw up foreign policy and foreign policy screws up the Democrats</td>
    </tr>
    <!-- end: full-000-body-cdl90 -->
    <!-- filename: full-000-body-cdl90 -->
    <tr>
    <td class="contentSmall" valign="top" id=bold width="5%" nowrap><strong> 61020</strong></td>
    <td class="contentSmall" valign="top">|a Democratic Party (U.S.)</td>
    </tr>
    <!-- end: full-000-body-cdl90 -->

And here are the two regexes:
    if ($MARC_page =~ m{
            (?:020<)?     # MARC code followed by a bracket to identify
            .*?           # followed by anything
            \|a\s         # followed by a pipe and the subfield
            (\d{13})      # followed by a 13-digit ISBN code
        }xmgs) {
        my $isbn = $1;
    }

    if ($MARC_page =~ m{
            245\d{0,2}    # MARC code 245 followed by 0-2 indicators
            .*?           # followed by anything
            \|a\s         # followed by a pipe and the subfield
            (.*?)         # followed by the title
            \|            # followed by a pipe and the next subfield
        }xmgs) {
        my $title = $1;
    }
It works correctly now that I have rearranged the regexes so the ISBN regex runs first. Before, when the ISBN regex came after the title regex, it would not match anything. I tried changing \d to just . to see where it would even land, and it matched "Democratic Pa", which is where the next match would fall after the point where the title regex had matched. For the record, the correct matches should be "9780470086223" for the ISBN and "Heads in the sand : " for the title.
As far as I'm aware, a regex with the g flag should match globally, meaning it would ignore wherever another regex happened to stop searching. Is this not correct? If I am right, can someone tell me why I'm seeing this behavior, and how I might correct it? Thanks a lot.
P.S. This is just a random example book, and I don't mean to make any political statement by using it.
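For anyone reading along, the behavior described above can be reproduced in a few lines. This is only a minimal sketch, not the original program: the `$MARC_page` string here is a shortened, hypothetical stand-in for the real page, and the variable names `$resume_at`, `$isbn_after_title` and `$isbn_no_g` are made up for illustration. It relies on the documented rule that `m//g` in scalar context (such as an `if` condition) remembers where it stopped in `pos()` of the target string, and that the next `m//g` on the same string resumes from there, even if it is a different regex.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the $MARC_page content in the post.
my $MARC_page = join "\n",
    '<td><strong> 020</strong></td>',
    '<td>|a 9780470086223 (hardback)</td>',
    '<td><strong> 24510</strong></td>',
    '<td>|a Heads in the sand : |b how the Republicans screw up foreign policy</td>',
    '<td><strong> 61020</strong></td>',
    '<td>|a Democratic Party (U.S.)</td>';

my ($title, $resume_at, $isbn_after_title, $isbn_no_g);

# 1. Title regex with /g in scalar context. On success, pos($MARC_page)
#    is left just past the "|" that ends the captured title.
if ($MARC_page =~ m{ 245\d{0,2} .*? \|a\s (.*?) \| }xmsg) {
    $title     = $1;
    $resume_at = pos($MARC_page);
}

# 2. ISBN regex with /g on the same string. The search starts at the saved
#    position, which is already past the ISBN line, so nothing matches.
#    (A failed //g match without /c also resets pos() to undef.)
if ($MARC_page =~ m{ (?:020<)? .*? \|a\s (\d{13}) }xmsg) {
    $isbn_after_title = $1;
}

# 3. Run the title regex with /g again so pos() is set once more, then try
#    the ISBN regex WITHOUT /g. A plain match ignores pos() and always
#    starts searching at the beginning of the string.
$MARC_page =~ m{ 245\d{0,2} .*? \|a\s (.*?) \| }xmsg;
if ($MARC_page =~ m{ (?:020<)? .*? \|a\s (\d{13}) }xms) {
    $isbn_no_g = $1;
}

printf "title:              %s\n", defined $title            ? $title            : '(no match)';
printf "ISBN with /g after: %s\n", defined $isbn_after_title ? $isbn_after_title : '(no match)';
printf "ISBN without /g:    %s\n", defined $isbn_no_g        ? $isbn_no_g        : '(no match)';
```

If only the first match of each pattern is wanted, dropping the `g` flag as in step 3 is probably the simplest change. Keeping `g` and assigning `pos($MARC_page) = 0;` before each match should have the same effect.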
Replies are listed 'Best First'.

Re: regex only matching from last match
by jwkrahn (Abbot) on Sep 20, 2009 at 00:54 UTC
by AnomalousMonk (Archbishop) on Sep 20, 2009 at 02:41 UTC
by Foxpond Hollow (Sexton) on Sep 24, 2009 at 16:44 UTC

Re: regex only matching from last match
by Anonymous Monk on Sep 20, 2009 at 00:35 UTC

Re: regex only matching from last match
by Marshall (Canon) on Sep 20, 2009 at 22:32 UTC
by Foxpond Hollow (Sexton) on Sep 24, 2009 at 16:50 UTC