Re: Pattern match not working sometimes

Re^2: Pattern match not working sometimes by dantheman1210 (Beadle) on Mar 18, 2012 at 20:33 UTC
OK, so I added `my $stuff = unpack('B64', $datagram); print "This is data: $stuff\n";` within the final else. Here is the output: `Length : 1205 Length : 1205 Did not match! This is data: 0000101001011100010101100100100101001111001111110010010001110110 Length : 1205 Length : 1205` The only thing I noticed is that all the data that doesn't match seems to start with four 0's, but I would think that it would still match. Anyway, let me know if you see anything.
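For anyone reading along, a rough sketch of how the dumped bits can be turned back into octets to see which characters they are. It assumes the `+` in the posted output is only the forum's line-wrap marker, so the dump is one 64-bit string; `describe_octets` is a made-up helper name, not something from the original script.

```perl
use strict;
use warnings;

# The eight octets from the dump above, written one per group for readability.
# Assumption: the "+" in the posted output is a wrap marker, not data.
my $bits = join '', qw(
    00001010 01011100 01010110 01001001
    01001111 00111111 00100100 01110110
);

# Hypothetical helper: describe each octet as hex plus its printable
# character, or mark it as a control character.
sub describe_octets {
    my ($bit_string) = @_;
    my $bytes = pack 'B*', $bit_string;    # bit string back to raw octets
    return map {
        my $code = ord $_;
        sprintf '0x%02X %s', $code,
            ($code >= 0x20 && $code < 0x7F) ? "'$_'" : '(control)';
    } split //, $bytes;
}

print "$_\n" for describe_octets($bits);
```

The first line of output is the interesting one: the leading octet is 0x0A, a control character rather than anything printable.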
Re^3: Pattern match not working sometimes by GrandFather (Saint) on Mar 18, 2012 at 21:42 UTC
The issue is the ^ anchor, and the character causing grief is a newline at the start of the data! If you change your match to `/(\C\C\C\C)(.*)/s` the problem is fixed. Note that there is no need for anchors in any case: you want the first four octets followed by anything, so that will always match at the start of the string (unless there are fewer than 4 octets). True laziness is hard work
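A side note for newer perls: `\C` was deprecated in 5.20 and removed in 5.24, so the pattern above will not compile there. One possible alternative, sketched here under the assumption that `$datagram` holds raw octets, is to skip the regex and split with `unpack`. The `split_header` name and the sample datagram are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a datagram into its first four octets and
# the remainder. Returns an empty list if fewer than four octets exist.
sub split_header {
    my ($datagram) = @_;
    return if length($datagram) < 4;
    # 'a4' takes four octets as-is, 'a*' takes everything that is left.
    return unpack 'a4 a*', $datagram;
}

# Made-up datagram whose first octet is a line feed (0x0A), like the
# one that failed to match in this thread.
my $datagram = "\x0A\x5C\x56\x49payload\nmore";
my ($header, $rest) = split_header($datagram);

printf "header: %s\n", unpack('H*', $header);
printf "rest is %d octets\n", length $rest;
```

Because `unpack` never treats a newline specially, the content of the octets cannot affect the split. If you prefer to stay with a regex, `/(.{4})(.*)/s` should behave the same way on byte strings.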
Re^4: Pattern match not working sometimes by dantheman1210 (Beadle) on Mar 18, 2012 at 23:31 UTC
That worked!!! I never thought about the data itself causing the issue since I was working with binary data; I just thought of it as 0's and 1's, not the characters that they could be associated with. Thanks again, you saved me from having to basically start over with the design.
Re^3: Pattern match not working sometimes by bitingduck (Deacon) on Mar 18, 2012 at 21:52 UTC
It looks like your first byte is ASCII 10, which is a line feed. Is that breaking the regex because it's hitting an EOL? What if you change the "s" modifier to an "m" modifier so it will match over EOLs within the string? Does it also fail when you get that pattern "00001010" anywhere else in the first four bytes? If that's the problem, it should fail if you get `0000 1011` (vertical tab), `0000 1100` (form feed), or `0000 1101` (carriage return). If you're interpreting as Unicode, you might also see failures if you get a few other combinations that don't start with 0000 that get interpreted as premature EOL and cause it to fail. (Edit: looks like GrandFather posted a better solution than changing the match mode. Dropping the anchors is simpler.)
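The guess about the other control characters is easy to check. Here is a small sketch that tests which of the four octets mentioned above a bare `.` (no `/s` modifier) refuses to match; `octets_dot_rejects` is a hypothetical name. As far as I know, Perl's `.` without `/s` excludes only the newline, so the expectation is that only the line feed shows up.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of the candidate octets that a
# bare "." (without /s) does not match.
sub octets_dot_rejects {
    my %candidates = (
        "\x0A" => 'line feed',
        "\x0B" => 'vertical tab',
        "\x0C" => 'form feed',
        "\x0D" => 'carriage return',
    );
    # \A and \z pin the match to the whole one-octet string; "$" is
    # avoided because it can also match just before a trailing newline.
    return map  { $candidates{$_} }
           grep { $_ !~ /\A.\z/ }
           sort keys %candidates;
}

print "rejected by '.': $_\n" for octets_dot_rejects();
```

Running the same check with `/\A.\z/s` should report nothing, which is why adding `/s` (or not using a regex at all) makes the match independent of the data.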