in reply to Pattern match not working sometimes

On the face of it, everything looks fine. Have you tried printing the failing strings to see the nature of the beast that doesn't match? You'll probably want to render unprintable and whitespace octets as hex so that they are visible.
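A minimal sketch of that suggestion in Perl. The helper name `visible` and the sample string are hypothetical. It prints printable ASCII as-is and turns every other octet (spaces, tabs, newlines, other control bytes, bytes of 0x7F and above, and the backslash itself, so the output stays unambiguous) into a `\xHH` escape.

```perl
use strict;
use warnings;

# Hypothetical helper: make every octet of a byte string visible.
# Printable ASCII (0x21..0x7E) is shown as-is, except the backslash;
# spaces, control bytes such as "\n", the backslash and anything
# >= 0x7F are shown as \xHH so nothing is hidden or ambiguous.
sub visible {
    my ($bytes) = @_;
    return join '', map {
        my $o = ord $_;
        ($o > 0x20 && $o < 0x7F && $_ ne '\\') ? $_ : sprintf('\\x%02X', $o);
    } split //, $bytes;
}

# Hypothetical failing input: a line feed, two letters, a space, a letter, a tab.
my $failing = "\nAB C\t";
print 'Did not match: ', visible($failing), "\n";
```

Dropping a call like this into the branch that reports the failed match shows exactly which octets the pattern choked on.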

We could help more if you gave us a sample failing string - it need only be 10 or so characters long.

True laziness is hard work

Re^2: Pattern match not working sometimes
by dantheman1210 (Beadle) on Mar 18, 2012 at 20:33 UTC

    OK, so I added:

    my $stuff = unpack('B64', $datagram);
    print "This is data: $stuff\n";

    inside the final else block. Here is the output:

    Length : 1205
    Length : 1205
    Did not match!
    This is data: 0000101001011100010101100100100101001111001111110010010001110110
    Length : 1205
    Length : 1205

    The only thing I noticed is that all the data that doesn't match seems to start with four 0s, but I would think that it would still match. Anyway, let me know if you see anything.

      The issue is the ^ anchor, and the character causing grief is a newline at the start of the datagram! If you change your match to /(\C\C\C\C)(.*)/s, the problem is fixed. Note that there is no need for anchors in any case: you want the first four octets followed by anything, so the match will always start at the beginning of the string (unless there are fewer than four octets, in which case it won't match at all).
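A sketch of the same fix for current Perl, where `\C` is no longer available (it was deprecated in 5.20 and removed in 5.24). It assumes `$datagram` holds raw bytes, so one character is one octet; the helper names `split_header` and `split_header_unpack` are hypothetical. The regex version relies on `/s` so that `.` also matches `"\n"`, and the `unpack` version avoids a regex entirely. The sample data is the 8 octets from the `B64` dump posted above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a binary datagram into its first four octets
# and the remainder. Assumes a byte string (no characters above 0xFF).
# /s lets . match "\n" as well; \A and \z anchor to the whole string.
sub split_header {
    my ($datagram) = @_;
    return unless $datagram =~ /\A(.{4})(.*)\z/s;
    return ($1, $2);
}

# Hypothetical alternative without a regex: 'a4' takes four octets, 'a*' the rest.
sub split_header_unpack {
    my ($datagram) = @_;
    return if length($datagram) < 4;
    return unpack 'a4 a*', $datagram;
}

# The 8 octets from the B64 dump above; the first one is 0x0A ("\n").
my $datagram = pack 'B64',
    '0000101001011100010101100100100101001111001111110010010001110110';

my ($head, $body) = split_header($datagram);
printf "header: %s, body length: %d\n", unpack('H8', $head), length $body;

# The same prefix match without /s fails, because . never matches "\n".
print 'without /s: ', ($datagram =~ /\A(.{4})/ ? 'match' : 'no match'), "\n";
```

For fixed-width binary headers, `unpack` is arguably the more natural tool, since it never interprets octets as line structure.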

        That worked!!! I never thought about the data itself causing the issue, since I was working with binary data; I just thought of it as 0s and 1s, not as the characters they could represent. Thanks again, you saved me from having to basically start over with the design.

      It looks like your first byte is ASCII 10, which is a line feed. Is that breaking the regex because it's hitting an EOL? What if you change the "s" modifier to an "m" modifier so it will match over EOLs within the string?

      Does it also fail when you get that pattern "00001010" anywhere else in the first four bytes?

      If that's the problem, it should also fail if you get:

        0000 1011 (vertical tab)
        0000 1100 (form feed)
        0000 1101 (carriage return)
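A quick way to run that check, as a sketch. It assumes the failing pattern depends on `.` without `/s` (the thread does not show the original pattern), builds each suspect first byte from the bit patterns above with `pack 'B8'`, and tries a four-character prefix match with and without `/s`. The helper `prefix_matches` and the `ABC` filler are hypothetical. In Perl, `.` without `/s` excludes only `"\n"`, so only the line-feed row should report NO MATCH.

```perl
use strict;
use warnings;

# Hypothetical helper: does the string start with four characters matched by
# .? Without /s, . matches any character except "\n"; with /s it matches any.
sub prefix_matches {
    my ($data, $use_s) = @_;
    return $use_s ? $data =~ /\A.{4}/s : $data =~ /\A.{4}/;
}

# First-byte bit patterns suggested in the thread.
my %suspects = (
    '00001010' => 'line feed',
    '00001011' => 'vertical tab',
    '00001100' => 'form feed',
    '00001101' => 'carriage return',
);

for my $bits (sort keys %suspects) {
    # Hypothetical payload: the suspect byte followed by three plain letters.
    my $data = pack('B8', $bits) . 'ABC';
    printf "%s %-15s without /s: %-8s with /s: %s\n",
        $bits, $suspects{$bits},
        prefix_matches($data, 0) ? 'match' : 'NO MATCH',
        prefix_matches($data, 1) ? 'match' : 'NO MATCH';
}
```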

      If you're interpreting the data as Unicode, you might also see failures from a few other byte combinations that don't start with 0000 but get interpreted as a premature EOL.

      (Edit: looks like GrandFather posted a better solution than changing the match mode. Dropping the anchors is simpler.)