I was having a bear of a time recently trying to figure out why my regex wasn't matching correctly, until I finally tried changing the order in which my regexes ran. It turns out that the regex was only searching from wherever the previous regex had left off.

Here's the text it should be matching in:

<!-- filename: full-000-body-cdl90 -->
<tr>
<td class="contentSmall" valign="top" id=bold width="5%" nowrap><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;020</strong></td>
<td class="contentSmall" valign="top">|a 9780470086223 (hardback)</td>
</tr>
<!-- end: full-000-body-cdl90 -->
<!-- filename: full-000-body-cdl90 -->
<tr>
<td class="contentSmall" valign="top" id=bold width="5%" nowrap><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;24510</strong></td>
<td class="contentSmall" valign="top">|a Heads in the sand : |b how the Republicans screw up foreign policy and foreign policy screws up the Democrats / |c Matthew Yglesias</td>
</tr>
<!-- end: full-000-body-cdl90 -->
<!-- filename: full-000-body-cdl90 -->
<tr>
<td class="contentSmall" valign="top" id=bold width="5%" nowrap><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;24610</strong></td>
<td class="contentSmall" valign="top">|a How the Republicans screw up foreign policy and foreign policy screws up the Democrats</td>
</tr>
<!-- end: full-000-body-cdl90 -->
<!-- filename: full-000-body-cdl90 -->
<tr>
<td class="contentSmall" valign="top" id=bold width="5%" nowrap><strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;61020</strong></td>
<td class="contentSmall" valign="top">|a Democratic Party (U.S.)</td>
</tr>
<!-- end: full-000-body-cdl90 -->

And here are the two regexes:
if ($MARC_page =~ m{
        (?:020<)?    # MARC code followed by a bracket to identify
        .*?          # followed by anything
        \|a\s        # followed by a pipe and the subfield
        (\d{13})     # followed by a 13-digit ISBN code
    }xmgs) {
    my $isbn = $1;
}

if ($MARC_page =~ m{
        245\d{0,2}   # MARC code 245 followed by 0-2 indicators
        .*?          # followed by anything
        \|a\s        # followed by a pipe and the subfield
        (.*?)        # followed by the title
        \|           # followed by a pipe and the next subfield
    }xmgs) {
    my $title = $1;
}

It works correctly now that I have rearranged the regexes. Before, when the ISBN regex came after the title regex, it would not match anything. I tried changing \d to just . to see where it would even land, and it matched "Democratic Pa", which would have been the next match after the point where the title regex matched. For the record, the correct matches should be "9780470086223" for the ISBN and "Heads in the sand : " for the title.

As far as I'm aware, a regex with the g flag should match globally, meaning it would ignore wherever another regex happened to stop searching. Is this not correct? If I am right, can someone tell me why I'm seeing this behavior, and how I might correct it? Thanks a lot.
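
To make the behavior above easier to reproduce, here is a minimal sketch on a much shorter string. The sample text and variable names are made up for illustration and are not my real data. As far as I can tell from the docs, a /g match in scalar context records where it ended in pos() for that string, and the next /g match on the same string starts from there.

```perl
use strict;
use warnings;

# Hypothetical sample: the ISBN sits before the title, as in the real page.
my $text = 'isbn: 9780470086223 title: Heads in the sand | author: Matthew Yglesias';

# First /g match in scalar context; on success, pos($text) is set
# to the offset just past the end of the match.
my $title;
if ($text =~ m/title:\s(.*?)\s\|/g) {
    $title = $1;
}
my $pos_after_title = pos($text);

# Second /g match on the same string resumes from pos($text), so it
# never looks at the ISBN, which sits earlier in the string.
my $isbn_with_g;
if ($text =~ m/(\d{13})/g) {
    $isbn_with_g = $1;
}

# Without /g the search starts at the beginning of the string.
my $isbn_without_g;
if ($text =~ m/(\d{13})/) {
    $isbn_without_g = $1;
}

print 'title: ', $title, "\n";
print 'ISBN with /g: ', (defined $isbn_with_g ? $isbn_with_g : '(no match)'), "\n";
print 'ISBN without /g: ', (defined $isbn_without_g ? $isbn_without_g : '(no match)'), "\n";
```

With this sample, only the ISBN match without /g finds the number. Dropping /g from one-shot matches like these, or setting pos($text) = 0 before the second match, looks like a way around it, though I may be missing a cleaner idiom.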

P.S. This is just a random example book, and I don't mean to make any political statement by using it.


In reply to regex only matching from last match by Foxpond Hollow
