I don't see why the code in your OP did not work; it seems to me that it should. And you're saying that your latest code matches what you want (although I don't think it does so for the right reason; it is probably a happy coincidence, more on this below). So there isn't much more help to provide, since your problem is solved.

I would suggest, however, that you avoid quantifiers such as * or + on the match-all dot (i.e. the .* and .+ patterns) when possible, and even .*? and .+?, although these latter two are much less dangerous in terms of matching more than you want.
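To illustrate the risk, here is a small sketch. The input string is made up for the example (it is not your data), and it simply puts two links on the same line:

```perl
use strict;
use warnings;

# Hypothetical input: two links on a single line.
my $html = '<a href="one.htm">One</a> <a href="two.htm">Two</a>';

# Greedy .* grabs as much as it can, so the capture runs up to the
# LAST place on the line where .htm"> can still match.
my ($greedy) = $html =~ /<a href="(.*)\.htm">/;

# Non-greedy .*? stops at the FIRST place where .htm"> can match.
my ($lazy) = $html =~ /<a href="(.*?)\.htm">/;

print "greedy: $greedy\n";   # spans both tags
print "lazy:   $lazy\n";     # just the first file name
```

The non-greedy version behaves here, but the dot would still happily cross quotes and angle brackets if the data were shaped differently.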

It is often safer and better to be more specific about the characters you want to match, using the appropriate character class. In the case in point, a regex like /<a href=\"(\w*?)\.htm\">/ig would probably be safer because, with \w+ or \w*, you're guaranteed to match only alphanumeric characters (and possibly underscores), so you know for sure that you're not gonna match any tag-opening or tag-closing characters (angle brackets), backslashes, quote marks, etc.
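A minimal sketch of that idea follows, using \w+ and an invented input string (the file names are assumptions for the example only):

```perl
use strict;
use warnings;

# Hypothetical input: three links, one with a space in the file name
# and one written in upper case.
my $html = '<a href="index.htm">Home</a> '
         . '<a href="my page.htm">Odd</a> '
         . '<A HREF="faq_2.htm">FAQ</A>';

# \w+ accepts only letters, digits and underscores, so the capture
# can never run across a quote, a space or an angle bracket.
# /i ignores case, /g in list context returns every capture.
my @names = $html =~ /<a href="(\w+)\.htm">/ig;

print "$_\n" for @names;   # index and faq_2; "my page" is skipped
```

Whether skipping a name like "my page" is what you want depends on your data; if such names can occur, a slightly wider class such as [\w ] may be a better fit than going back to the dot.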

As for the latest code you posted, [^<br>]*? doesn't do what you probably think it does, even though you report that the result happens to be what you want. [^<br>] is a negated character class that matches any single character except the following individual characters: < > b r. I somewhat doubt that this is what you really meant. The fact that it contains the < character (and will therefore stop matching at the first tag-opening character) is probably enough to save your day in this specific case, but be aware that it won't work if the string you want to retrieve contains any "b" or any "r".
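Here is a short sketch of that failure mode. Both sample strings are invented for the illustration, and the last pattern is only one possible alternative, not necessarily the right one for your real data:

```perl
use strict;
use warnings;

# [^<br>] is just the character class [^<>br]: it excludes four
# individual characters, not the three-character tag <br>.

# No "b" or "r" before the tag: the result looks right by luck.
my ($lucky) = 'some text<br>more'  =~ /^([^<br>]*)/;

# A "b" inside the wanted text: the match stops early.
my ($cut)   = 'number one<br>more' =~ /^([^<br>]*)/;

# One possible alternative: exclude only "<", then require a literal <br>.
my ($fixed) = 'number one<br>more' =~ /^([^<]*)<br>/;

print "lucky: $lucky\n";   # some text
print "cut:   $cut\n";     # num
print "fixed: $fixed\n";   # number one
```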

HTH.


In reply to Re^3: regex not matching how I want it to :( by Laurent_R
in thread regex not matching how I want it to :( by glwa
