Re: matching characters and numbers with regex

Using a look-ahead combined with pos avoids any need for sums and finds overlapping matches (if that is part of your spec.).

$ perl -Mstrict -Mwarnings -E '
my $str = qq{\0} x 32;
substr $str,  5,  2, qq{\x0b\x9e};
substr $str, 26,  5, qq{\x3c\x5a\x1e\x6b\x48};
substr $str, 11, 11, qq{\x0f\x2c\x34\x3c\x5a\x1e\x6b\x48\x0b\x9e\x88};
say unpack q{H*}, $str;

for my $quant ( 8, 4, 2 )
{
    say q{};
    say qq{$quant [\\x0a-\\x9f] found at @{ [ pos $str ] }}
       while $str =~ m{(?x) (?= [\x0a-\x9f] {$quant} ) }g;
}'
00000000000b9e000000000f2c343c5a1e6b480b9e88000000003c5a1e6b4800

8 [\x0a-\x9f] found at 11
8 [\x0a-\x9f] found at 12
8 [\x0a-\x9f] found at 13
8 [\x0a-\x9f] found at 14

4 [\x0a-\x9f] found at 11
4 [\x0a-\x9f] found at 12
4 [\x0a-\x9f] found at 13
4 [\x0a-\x9f] found at 14
4 [\x0a-\x9f] found at 15
4 [\x0a-\x9f] found at 16
4 [\x0a-\x9f] found at 17
4 [\x0a-\x9f] found at 18
4 [\x0a-\x9f] found at 26
4 [\x0a-\x9f] found at 27

2 [\x0a-\x9f] found at 5
2 [\x0a-\x9f] found at 11
2 [\x0a-\x9f] found at 12
2 [\x0a-\x9f] found at 13
2 [\x0a-\x9f] found at 14
2 [\x0a-\x9f] found at 15
2 [\x0a-\x9f] found at 16
2 [\x0a-\x9f] found at 17
2 [\x0a-\x9f] found at 18
2 [\x0a-\x9f] found at 19
2 [\x0a-\x9f] found at 20
2 [\x0a-\x9f] found at 26
2 [\x0a-\x9f] found at 27
2 [\x0a-\x9f] found at 28
2 [\x0a-\x9f] found at 29
$
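
To make the overlapping behaviour concrete, here is a minimal sketch that wraps the same look-ahead in a sub and compares it with an ordinary consuming match. The sub name run_offsets and its $overlap flag are made up for illustration; they are not from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the start offsets of every run of $n bytes
# in the range [\x0a-\x9f], either overlapping or non-overlapping.
sub run_offsets {
    my ( $str, $n, $overlap ) = @_;
    my @offsets;
    if ( $overlap ) {
        # Zero-width look-ahead: nothing is consumed, so pos() is the
        # start offset and the next attempt begins one byte further on.
        push @offsets, pos( $str )
            while $str =~ m{(?=[\x0a-\x9f]{$n})}g;
    }
    else {
        # Consuming match: the next attempt starts after the whole match,
        # so overlapping runs are skipped. $-[0] is the match start.
        push @offsets, $-[0]
            while $str =~ m{[\x0a-\x9f]{$n}}g;
    }
    return @offsets;
}

# Three NUL bytes, a run of five in-range bytes, then two NUL bytes.
my $sample = qq{\0\0\0\x10\x20\x30\x40\x50\0\0};
print qq{overlapping:     @{ [ run_offsets( $sample, 4, 1 ) ] }\n};
print qq{non-overlapping: @{ [ run_offsets( $sample, 4, 0 ) ] }\n};
```

With the five-byte run starting at offset 3, the look-ahead version reports offsets 3 and 4, while the consuming version reports only 3.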

I hope this is helpful.

Cheers,

JohnGG

Re^2: matching characters and numbers with regex by james28909 (Deacon) on May 31, 2014 at 22:27 UTC
I think I have found a better approach to this, but it's going to take a lot of code to perform the task because I have to match 00 - FF in sets of 4, then sets of 8, then sets of 16 characters. Tell me if this will work correctly:

while ($string){
    read $string, $chunk, 4;
    if ($chunk =~ FFFF);
    print ("corrupted");
;

I would have to make while loops for 0000 through FFFF, and for 4 characters, then 8 characters, then 16 characters. What I am trying to do is scan the string for any repeating characters, as in "0000", "FFFF", "00000000", "FFFFFFFF", "0000000000000000", "FFFFFFFFFFFFFFFF", and I would have to do that for every hexadecimal character, so it's going to take many, many loops and lines of code. I'm trying to think of a way to simplify this as much as possible. Also, thank you for the examples.
Re^3: matching characters and numbers with regex by johngg (Canon) on Jun 01, 2014 at 00:42 UTC
No need to use while loops, use backreferences in your pattern. In the following code I make arrays of references to substrings of 2, 4 & 8 characters without converting bytes to string representations. I then test by matching each dereferenced element against the pattern and print an error if I find repeats. I test a clean string first then introduce some repeats and test it again.

use strict;
use warnings;
use 5.014;

my $str = q{};
$str .= chr for 0 .. 31;

say qq{\n}, q{| . . . ^ . . . } x 4;
say unpack q{H*}, $str;

for my $len ( 8, 4, 2 )
{
    say qq{\nChecking groups of $len};
    my $quant = $len - 1;
    my @groups =
        map { \ substr $str, $_ * $len, $len }
        0 .. ( length( $str ) / $len ) - 1;
    for my $idx ( 0 .. $#groups )
    {
        say
            qq{Found @{ [ unpack q{H*}, $1 ] } },
            qq{at offset @{ [ $len * $idx ] }}
            if ${ $groups[ $idx ] } =~ m{((.)\2{$quant})};
    }
}

substr $str,  0, 2, qq{\x3e\x3e};
substr $str, 16, 8, qq{\xac} x 8;
substr $str,  4, 4, qq{\x7f} x 4;
substr $str, 26, 4, qq{\x45} x 4;

say qq{\n}, q{| . . . ^ . . . } x 4;
say unpack q{H*}, $str;

for my $len ( 8, 4, 2 )
{
    say qq{\nChecking groups of $len};
    my $quant = $len - 1;
    my @groups =
        map { \ substr $str, $_ * $len, $len }
        0 .. ( length( $str ) / $len ) - 1;
    for my $idx ( 0 .. $#groups )
    {
        say
            qq{Found @{ [ unpack q{H*}, $1 ] } },
            qq{at offset @{ [ $len * $idx ] }}
            if ${ $groups[ $idx ] } =~ m{((.)\2{$quant})};
    }
}

The output.

| . . . ^ . . . | . . . ^ . . . | . . . ^ . . . | . . . ^ . . . 
000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f

Checking groups of 8

Checking groups of 4

Checking groups of 2

| . . . ^ . . . | . . . ^ . . . | . . . ^ . . . | . . . ^ . . . 
3e3e02037f7f7f7f08090a0b0c0d0e0facacacacacacacac1819454545451e1f

Checking groups of 8
Found acacacacacacacac at offset 16

Checking groups of 4
Found 7f7f7f7f at offset 4
Found acacacac at offset 16
Found acacacac at offset 20

Checking groups of 2
Found 3e3e at offset 0
Found 7f7f at offset 4
Found 7f7f at offset 6
Found acac at offset 16
Found acac at offset 18
Found acac at offset 20
Found acac at offset 22
Found 4545 at offset 26
Found 4545 at offset 28

I hope this helps you along.

Cheers,

JohnGG
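
The aligned-group check described above could also be packaged as a single sub so the loop is not written out twice. This is only a sketch: the name find_repeats and its return format are invented here. It also adds the s modifier, which the code above does not use, so that . matches a newline byte (0x0a) as well.

```perl
use strict;
use warnings;

# Hypothetical helper: return [ offset, hex ] pairs for every aligned
# $len-byte group of $str that consists of one byte repeated $len times.
sub find_repeats {
    my ( $str, $len ) = @_;
    my $quant = $len - 1;
    my @found;
    for my $idx ( 0 .. int( length( $str ) / $len ) - 1 ) {
        my $group = substr $str, $idx * $len, $len;
        # (.) captures the first byte, \1{$quant} demands $quant more
        # copies of it; the s modifier lets . match 0x0a too.
        push @found, [ $idx * $len, unpack q{H*}, $group ]
            if $group =~ m{\A(.)\1{$quant}\z}s;
    }
    return @found;
}

# Same test data as the second run above.
my $data = join q{}, map { chr } 0 .. 31;
substr $data,  0, 2, qq{\x3e\x3e};
substr $data, 16, 8, qq{\xac} x 8;
substr $data,  4, 4, qq{\x7f} x 4;
substr $data, 26, 4, qq{\x45} x 4;

for my $len ( 8, 4, 2 ) {
    print qq{Found $_->[1] at offset $_->[0]\n}
        for find_repeats( $data, $len );
}
```

Because only aligned groups are examined, the run of 0x45 at offset 26 shows up for groups of 2 but not for groups of 4, the same as in the output above.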
Re^3: matching characters and numbers with regex by james28909 (Deacon) on May 31, 2014 at 22:55 UTC
Or better yet, I could read each byte without converting it to a string and see if the next 2, 4 or 8 bytes match it. If they do, then it's a corrupt file.
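
A rough sketch of that idea, assuming the data is already in a scalar: one global match with a backreference finds every run of at least $min identical bytes at any offset, so no per-value loops are needed. The sub name find_runs and its return format are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: find every run of at least $min identical raw
# bytes at any offset. Returns [ offset, run length, byte as hex ].
sub find_runs {
    my ( $str, $min ) = @_;
    my $quant = $min - 1;
    my @runs;
    # (.) captures one byte, \1{$quant,} requires at least $quant more
    # copies of it; the s modifier lets . match a 0x0a byte as well.
    while ( $str =~ m{(.)\1{$quant,}}gs ) {
        # @- and @+ hold the start and end offsets of the last match.
        push @runs, [ $-[0], $+[0] - $-[0], unpack q{H2}, $1 ];
    }
    return @runs;
}

# Bytes 0x00 .. 0x1f with a few repeats patched in.
my $data = join q{}, map { chr } 0 .. 31;
substr $data,  4, 4, qq{\x7f} x 4;
substr $data, 16, 8, qq{\xac} x 8;
substr $data, 26, 4, qq{\x45} x 4;

print qq{$_->[1] x $_->[2] at offset $_->[0]\n}
    for find_runs( $data, 4 );
```

Unlike a check on fixed, aligned groups, this also reports the four 0x45 bytes at offset 26, which straddle a 4-byte boundary.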
Re^4: matching characters and numbers with regex by james28909 (Deacon) on May 31, 2014 at 23:42 UTC
Never mind, I just read and understood the comments lol, thanks again y'all :)
Re^5: matching characters and numbers with regex by james28909 (Deacon) on Jun 01, 2014 at 02:25 UTC