Qiang has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to match two patterns in some text and store the matches in two arrays.

The original version I used looks like this, and it works:

@t1 = $resp->content =~ /($cat_url_pattern\d+)/g;
@t2 = $resp->content =~ /$cat_url_pattern\d+','([^']+)'/g;
That doesn't look efficient, so I tried the following version instead. But it produces two arrays filled with repeated items,
i.e. @t1 is filled with copies of $t1[0] and @t2 with copies of $t2[0]. Apparently $1 and $2 never change after the first iteration.

while ($resp->content =~ /($cat_url_pattern\d+)','([^']+)'/g) {
    push @t1, $1;
    push @t2, $2;
}
What is wrong with the 'while' version?

Edited by Chady -- removed html tags from code

Replies are listed 'Best First'.
Re: backreference question.
by graff (Chancellor) on Jun 01, 2004 at 00:58 UTC
    Given davidj's demonstration, the question becomes: what sort of object is $resp? It's plausible that putting the object's "content" method in a while loop has the effect of resetting the regex engine's pointer back to the beginning of the string on each iteration.

    update: if you put the value of $resp->content into a scalar, you should be able to put that scalar into the while loop and have it work as expected -- e.g.:

    my $string = $resp->content;
    while( $string=~/your_regex/g ) { ... }
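A minimal, self-contained sketch of that suggestion follows. The value of $cat_url_pattern and the sample text in $string are assumptions made up for illustration, since the thread shows neither the real pattern nor the page content. In the original code $string would be filled from $resp->content instead.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: the thread does not show the real pattern or page.
my $cat_url_pattern = 'cat\.php\?id=';
my $string = "a('cat.php?id=1','Books'); a('cat.php?id=22','Music');";

my (@t1, @t2);

# Matching against one named scalar lets /g resume where the last match ended.
while ( $string =~ /($cat_url_pattern\d+)','([^']+)'/g ) {
    push @t1, $1;   # the URL part
    push @t2, $2;   # the quoted label that follows it
}

print "@t1\n";
print "@t2\n";
```

With this sample text the loop runs twice and then stops, because the match position is remembered on $string between iterations.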
      It's plausible that putting the object's "content" method in a while loop has the effect of resetting the regex engine's pointer back to the beginning of the string on each iteration.
      Thank you, graff, that's probably the cause of the problem. I haven't found much on that topic yet, but putting $resp->content into a scalar before the while loop solved the problem.

      By the way, $resp is a response object from the LWP module, like this:

       use LWP::UserAgent;
       $br = LWP::UserAgent->new;
       $resp = $br->get("http://www.example.com");
      
        There are many possible explanations for what happened. None is likely to be well documented.

        If the content method returns a different Perl scalar each time with the same contents, then what happens is that each pattern match is matching a scalar that has never been matched before. So each time Perl will start from the beginning.

        If the content method attempts to do a pattern match internally, then pos is being reset.

        If it directly assigns to pos (I can't imagine why it would but...), then that assignment wins.

        In any case few module authors would think about someone trying to do a pattern match like this, so it is a good idea not to expect it to be documented accurately, and not to rely on them making your life easy. Putting the content into a scalar like graff suggested is a generally good idea unless you really know the code you are calling, and how Perl will deal with that.
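The first possibility above can be sketched without LWP at all. The content() sub below is a hypothetical stand-in that returns a fresh copy of the same text on every call; it is not the real accessor of the response object. The $guard counter is only there to keep the demonstration from looping forever.

```perl
use strict;
use warnings;

my $text = "1 2 3 4";

# Hypothetical accessor: hands back a new scalar with the same contents each call.
sub content { my $copy = $text; return $copy }

# Matching directly against the call: each iteration sees a scalar that has
# never been matched before, so the search starts from the beginning again.
my @seen;
my $guard = 0;
while ( content() =~ /(\d+)/g ) {
    push @seen, $1;
    last if ++$guard >= 3;   # stop by hand; the loop would not end on its own
}
print "fresh copy each time: @seen\n";

# Matching against one stored scalar: the match position is kept between iterations.
my @all;
my $string = content();
while ( $string =~ /(\d+)/g ) {
    push @all, $1;
}
print "single scalar: @all\n";
```

The first loop keeps collecting the first number, while the second walks through all four and terminates, which matches the symptom described in the question and the fix graff suggested.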

Re: backreference question.
by Happy-the-monk (Canon) on Jun 01, 2004 at 00:30 UTC

    It is the list context in   @t1 = ... m//g   which gives you multiple results for just that single match. It does not, however, create a loop.

    That's why it doesn't work in your while loop.   $1   and   $2   get filled only once, because without the list context, that's what the matching operator   m   does in those cases.

    Update: Wrong thought, obviously.
      I'm not sure that is correct. The following code seems to work the way he is expecting his to:
      use strict;
      my $str = "1 2 3 4 5 6 7 8 9 10";
      my (@a, @b);
      while ($str =~ m/(\d+) (\d+)/g) {
          push @a, $1;
          push @b, $2;
      }
      print "\@a = @a\n";
      print "\@b = @b\n";
      output:
      @a = 1 3 5 7 9
      @b = 2 4 6 8 10

      I'm curious
      davidj