
Can you explain what these two lines are doing? For example, what is the second part of the first line doing, and what about the <> in the second line?
href=[\"\'] ([^\"\']+) [\"\']>  #second href (not captured)
\s*([^<>]+?)\s*                 #text inside second <a></a>

Re^3: 3 capture multi line regex
by Ieronim (Friar) on Jun 30, 2006 at 18:44 UTC
    I modified the regex a bit (I found a bug in it), so the two lines you mentioned become
    href\s*=\s*[\"\'] [^\"\']+ [\"\']\s*>  #first href (not captured)
    \s*([^<>]+?)\s*                        #text inside first <a></a> (captured)
    In the first line, I match an href= string followed by a quoted value (single or double quotes: ["']) that contains no quote characters. For that I used a negated character class: [^"'] means any character that is NOT in ["'].
    In the next line I simply match text that has no tags inside it. If you think there may be other tags inside your link, it would be better to use
    \s*(.+?)\s* # non-greedy capturing of everything till the next </a>
    instead.
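    To make the difference between the two variants concrete, here is a minimal sketch. The sample links and the variable names are made up for illustration, and the trailing </a> in each pattern is an assumption about what follows these two lines in the full regex (the rest of the regex is not shown in this post).

```perl
use strict;
use warnings;

# Hypothetical sample links; the page from the thread is not shown.
my $plain  = q{<a href="http://example.com/page"> Example link </a>};
my $nested = q{<a href='x.html'><b>Bold</b> text</a>};

# Tag-free variant from the reply. The trailing </a> is an assumption:
# with nothing after it, the non-greedy +? would stop after one character.
my $no_tags_re = qr{
    href\s*=\s*["'] [^"']+ ["']\s*>   # href attribute, value not captured
    \s*([^<>]+?)\s*                   # link text without tags, captured
    </a>
}x;

# Variant that allows nested tags inside the link text.
my $any_re = qr{
    href\s*=\s*["'] [^"']+ ["']\s*>   # href attribute, value not captured
    \s*(.+?)\s*                       # everything up to the next closing tag
    </a>
}x;

my ($plain_text)  = $plain  =~ $no_tags_re;
my ($nested_miss) = $nested =~ $no_tags_re;   # no match: text starts with a tag
my ($nested_text) = $nested =~ $any_re;

print "plain:  $plain_text\n";
print "nested: $nested_text\n";
```

    Note that . does not match a newline by default, so if the link text can span several lines, the second variant would also need the /s modifier.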
      Hi.

      Your regex matched fine the first time, but I need to put all occurrences into an array. I can't get the array to hold anything now.

      push (@results, "$1::$2::$3"), $result_content =~ m/$regex/;
      I tried adding /g to the end of the match, but then the array doesn't contain anything at all. I tried adding /g to the regex itself, but that errors out.

      What am I doing wrong?

        Your code does a very strange thing: it FIRST puts the string "$1::$2::$3" into the array and only then performs the search!

        Use a loop :)

        push (@results, "$1::$2::$3") while $result_content =~ /$regex/g;
        or, written as a block, which is cleaner but IMO uglier:
        while ($result_content =~ /$regex/g) { push (@results, "$1::$2::$3"); }
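        As a self-contained illustration of the loop, here is a minimal sketch. The page content and the three-capture pattern below are stand-ins invented for the example, since the real $regex and $result_content are not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical page content; the real $result_content is not shown.
my $result_content =
    qq{<a href="/a">First</a> (10)\n<a href="/b">Second</a> (20)\n};

# Hypothetical pattern with three captures, standing in for the real $regex.
my $regex = qr{
    href\s*=\s*["'] ([^"']+) ["']\s*>   # capture 1: href value
    \s*([^<>]+?)\s*</a>                 # capture 2: link text
    \s*\((\d+)\)                        # capture 3: number in parentheses
}x;

my @results;
# /g goes on the match operator; in scalar context each pass of the
# loop resumes where the previous match ended.
while ($result_content =~ /$regex/g) {
    # Same string the thread builds with "$1::$2::$3".
    push @results, join('::', $1, $2, $3);
}

print "$_\n" for @results;
```

        If $regex was built with qr//, that would also explain the error from adding /g to the regex itself: qr// does not accept the /g modifier, so /g has to go on the match operator as above.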