in reply to Re^3: Normal regexes stop working
in thread Normal regexes stop working

Sorry, I didn't mean to come off as refusing anything. It's just that the problem is intermittent. The same code block runs fine many times, then at some point it fails and keeps failing from then on. Here's the most recent failure.
    my $q_meta = quotemeta($q);
    my $tmp_test = ($page_title =~ /$q_meta/i) || 0;
    if (!$tmp_test) {
        my $tmp_test2 = lc $page_title eq lc $q;
        $is_exit = 1 if $tmp_test2;
        my $tmp_test3 = is_utf8($page_title);
        my $tmp_test4 = is_utf8($q);
        # For debugging.
        if ($is_dilbert) {
            warn qq(\nDILBERT 1a\t$page_title\t$q_meta\t$tmp_test\t$tmp_test2\t$tmp_test3\t$tmp_test4\n") if $is_dilbert;
        }

produces DILBERT 1a      Polygram compilation albums     Polygram\ compilation\ albums   0       1       1       1

Re^5: Normal regexes stop working
by Anonymous Monk on Oct 24, 2010 at 17:26 UTC
    So what's $q? $page_title? This is the important part: the bytes that go into these values. It's probably invisible or non-breaking whitespace that is tripping you up (it just looks like whitespace, but the bytes are different).
      On another suggestion in this thread, I'll have Data::Dumper print them out with $Data::Dumper::Useqq=1. I'm highly confident there is no extra whitespace, but we'll find out in a moment.
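For reference, a minimal sketch of that kind of byte-level check. The `hex_bytes` helper and the sample strings are made up for illustration and are not from the actual script; the idea is only to show how a non-breaking space differs from an ordinary space once the bytes are printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Escape non-printable and high-bit bytes instead of printing them raw.
$Data::Dumper::Useqq = 1;

# Hypothetical helper: space-separated hex dump of the bytes in a string.
sub hex_bytes {
    my ($str) = @_;
    my $bytes = $str;                                # work on a copy
    utf8::encode($bytes) if utf8::is_utf8($bytes);   # characters -> UTF-8 bytes
    return join ' ', unpack '(H2)*', $bytes;
}

my $plain = "Polygram compilation";
my $nbsp  = "Polygram\xc2\xa0compilation";   # UTF-8 bytes of U+00A0, not a space

# The two strings look alike on screen but differ in both dumps.
print Dumper($plain), Dumper($nbsp);
print hex_bytes($plain), "\n";
print hex_bytes($nbsp), "\n";
```

In the hex dump an ordinary space shows up as `20`, while a UTF-8 encoded non-breaking space shows up as `c2 a0`.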

      The variables come from a static page url, e.g. http://domain/path. $q is path, unescaped. And $page_title comes from a database of paths.

      As you can see from that code, in that latest example, they both had values of 'Polygram compilation albums'. I also did an eq comparison, which came out to 1. I suppose the whitespace wasn't clear in my printout because the tabs didn't copy exactly.

      Edit: here was the printout from the first failure after making that change:

      warn qq(\nDILBERT 1a: "), Dumper($q_meta), " ", Dumper($page_title) if $is_dilbert;

      DILBERT 1a: "$VAR1 = "Haji\\ Ayub\\ Afridi"; $VAR1 = "Haji Ayub Afridi";"
        Thx--I haven't figured out the error yet, but I did make some progress.

        I found that if $q was set to "\n\302\240" (Dumper shows $VAR1 = "\n\302\240";) on the previous pass through this block, the match stops working from then on.
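The bytes `\302\240` are the UTF-8 encoding of U+00A0, the non-breaking space, so that value is a newline followed by a non-breaking space. A small sketch to confirm this, assuming the value really is UTF-8 encoded bytes; `code_points` is a hypothetical helper, not part of the script under discussion.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: list the code points in a UTF-8 byte string.
sub code_points {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes -> characters
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $chars;
}

my $q = "\n\302\240";           # the value reported above, in octal escapes
print code_points($q), "\n";    # prints: U+000A U+00A0
```

This only identifies the characters involved; it does not by itself explain why later matches fail.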

Re^5: Normal regexes stop working
by JavaFan (Canon) on Oct 24, 2010 at 17:44 UTC
    Those are neither the values of $test and $test2, nor a standalone piece of code.

    I'll just give up on you.

      Sorry, I thought that was a standalone code example, and that it was clear that $test was $q_meta and $test2 was $page_title, as that was the only =~ in there.

      I tried to have the warn statement print out exactly what they were defined with, in that last error case 'Polygram compilation albums', and you can see the eq comparison succeeded whereas the regex did not (the weird bug).

      As per another suggestion I will print them out using Data::Dumper with $Data::Dumper::Useqq=1 and report back. If you've moved on, just feel free to ignore and sorry for wasting your time. If you want to see something else in particular, please let me know.

      I didn't paste in the full file because it is 7204 lines and proprietary. I could extract the piece and run it standalone, but every time I do that it works fine. In fact, it works fine in this case as well, at the beginning. Only after some time this particular block (the one I pasted in as is) begins to fail.

      Edit: here was the printout from the first failure after making that change:

      warn qq(\nDILBERT 1a: "), Dumper($q_meta), " ", Dumper($page_title) if $is_dilbert;

      DILBERT 1a: "$VAR1 = "Haji\\ Ayub\\ Afridi"; $VAR1 = "Haji Ayub Afridi";"
      And here is an equivalent standalone example that works fine, but eventually breaks within the script:
      #!/usr/bin/perl
      my $q = 'Haji Ayub Afridi';
      my $page_title = 'Haji Ayub Afridi';
      my $q_meta = quotemeta($q);   # quote $q, as in the full script
      my $tmp_test = ($page_title =~ /$q_meta/i) || 0;
      if (!$tmp_test) {
          print "BAD";
      }
      else {
          print "GOOD";
      }
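Since the failure only shows up after earlier inputs have gone through the same block, one way to narrow it down might be to replay a sequence of values through the same check in a single process. The sketch below does that. `title_matches` and `first_disagreement` are hypothetical names, and the input list is only a guess at an interesting order based on the values seen in this thread. It may well not reproduce the failure outside the full script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same check as in the thread: match $page_title against the quoted $q.
sub title_matches {
    my ($page_title, $q) = @_;
    my $q_meta = quotemeta($q);
    return ($page_title =~ /$q_meta/i) ? 1 : 0;
}

# Run [page_title, q] pairs in order. Return the index of the first pair
# where the lc/eq comparison succeeds but the regex fails, or -1 if none.
sub first_disagreement {
    my @pairs = @_;
    for my $i (0 .. $#pairs) {
        my ($page_title, $q) = @{ $pairs[$i] };
        my $same  = (lc $page_title eq lc $q) ? 1 : 0;
        my $match = title_matches($page_title, $q);
        return $i if $same && !$match;
    }
    return -1;
}

my @pairs = (
    [ 'Polygram compilation albums', 'Polygram compilation albums' ],
    [ "\n\302\240",                  "\n\302\240" ],   # value seen just before the failures
    [ 'Haji Ayub Afridi',            'Haji Ayub Afridi' ],
);

my $bad = first_disagreement(@pairs);
print $bad < 0 ? "no disagreement\n" : "first disagreement at pair $bad\n";
```

If a run ever reports a disagreement, the index points at the first input for which eq and the regex differ, which would make a much smaller test case than the 7204-line file.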