in reply to Re: matching characters and numbers with regex
in thread matching characters and numbers with regex

I think I have found a better approach to this, but it's going to take a lot of code to perform the task, because I have to match 00 - FF in sets of 4, then sets of 8, then sets of 16 characters. Tell me if this will work correctly:

    while ( length $string ) {
        my $chunk = substr $string, 0, 4, q{};   # take the next 4 characters off the front
        print "corrupted\n" if $chunk =~ /FFFF/;
    }

I would have to make while loops for 0000 through FFFF, first for 4 characters, then 8 characters, then 16 characters.
What I am trying to do is scan the string for any repeating characters, as in "0000", "FFFF", "00000000", "FFFFFFFF", "0000000000000000", "FFFFFFFFFFFFFFFF", and I would have to do that for every hexadecimal character, so it's going to take many loops and lines of code. I'm trying to think of a way to simplify this as much as possible.
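For what it's worth, the one-loop-per-character idea can probably be collapsed into a single pattern with a backreference. This is only a minimal sketch, assuming the data is already a string of hex digits; the helper name `find_hex_runs` and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return [ offset, run ] pairs
# for every run of at least $min identical hex digits in $hex.
sub find_hex_runs {
    my ( $hex, $min ) = @_;
    $hex = uc $hex;          # treat "ff" and "FF" alike
    my $extra = $min - 1;    # copies needed after the first digit
    my @runs;

    # ([0-9A-F]) captures one digit as group 2; \2{$extra,} demands at
    # least $extra further copies of that same digit.
    while ( $hex =~ m{(([0-9A-F])\2{$extra,})}g ) {
        push @runs, [ $-[1], $1 ];    # $-[1] is where group 1 started
    }
    return @runs;
}

# Made-up sample data.
for my $run ( find_hex_runs( q{12FFFF3400000000AB}, 4 ) ) {
    printf "%s at offset %d\n", $run->[1], $run->[0];
}
```

With that sample string it reports FFFF at offset 2 and 00000000 at offset 8. Because the quantifier is open-ended, a run of 8 or 16 comes back as one longer run, so separate passes for each length should not be needed.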

Also thank you for the examples.

Re^3: matching characters and numbers with regex
by johngg (Canon) on Jun 01, 2014 at 00:42 UTC

    No need to use while loops; use backreferences in your pattern. In the following code I make arrays of references to substrings of 2, 4 and 8 characters without converting the bytes to string representations. I then test by matching each dereferenced element against the pattern and print an error if I find repeats. I test a clean string first, then introduce some repeats and test it again.

    use strict;
    use warnings;
    use 5.014;

    # Clean test data: 32 bytes, 0x00 .. 0x1f, no repeats.
    my $str = q{};
    $str .= chr for 0 .. 31;

    say qq{\n}, q{| . . . ^ . . . } x 4;
    say unpack q{H*}, $str;

    for my $len ( 8, 4, 2 ) {
        say qq{\nChecking groups of $len};
        my $quant = $len - 1;
        # References to consecutive $len-byte substrings of $str.
        my @groups = map { \ substr $str, $_ * $len, $len }
            0 .. ( length( $str ) / $len ) - 1;
        for my $idx ( 0 .. $#groups ) {
            # (.) captures one byte; \2{$quant} requires $quant more copies of it.
            say qq{Found @{ [ unpack q{H*}, $1 ] } },
                qq{at offset @{ [ $len * $idx ] }}
                if ${ $groups[ $idx ] } =~ m{((.)\2{$quant})};
        }
    }

    # Introduce some repeats, then run the same checks again.
    substr $str,  0, 2, qq{\x3e\x3e};
    substr $str, 16, 8, qq{\xac} x 8;
    substr $str,  4, 4, qq{\x7f} x 4;
    substr $str, 26, 4, qq{\x45} x 4;

    say qq{\n}, q{| . . . ^ . . . } x 4;
    say unpack q{H*}, $str;

    for my $len ( 8, 4, 2 ) {
        say qq{\nChecking groups of $len};
        my $quant = $len - 1;
        my @groups = map { \ substr $str, $_ * $len, $len }
            0 .. ( length( $str ) / $len ) - 1;
        for my $idx ( 0 .. $#groups ) {
            say qq{Found @{ [ unpack q{H*}, $1 ] } },
                qq{at offset @{ [ $len * $idx ] }}
                if ${ $groups[ $idx ] } =~ m{((.)\2{$quant})};
        }
    }

    The output.


    | . . . ^ . . . | . . . ^ . . . | . . . ^ . . . | . . . ^ . . . 
    000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f

    Checking groups of 8

    Checking groups of 4

    Checking groups of 2

    | . . . ^ . . . | . . . ^ . . . | . . . ^ . . . | . . . ^ . . . 
    3e3e02037f7f7f7f08090a0b0c0d0e0facacacacacacacac1819454545451e1f

    Checking groups of 8
    Found acacacacacacacac at offset 16

    Checking groups of 4
    Found 7f7f7f7f at offset 4
    Found acacacac at offset 16
    Found acacacac at offset 20

    Checking groups of 2
    Found 3e3e at offset 0
    Found 7f7f at offset 4
    Found 7f7f at offset 6
    Found acac at offset 16
    Found acac at offset 18
    Found acac at offset 20
    Found acac at offset 22
    Found 4545 at offset 26
    Found 4545 at offset 28
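The two identical checking loops in the code above could be folded into one subroutine. The following is only a sketch of one way to do that, not the code that produced the output: the name `repeated_groups` is hypothetical, it splits the string with `unpack` and a `(a$len)*` template instead of substring references, and it adds the `/s` modifier (my assumption about what is wanted for binary data) so that `.` also matches a 0x0a byte.

```perl
use strict;
use warnings;

# Hypothetical helper: split $str into aligned groups of $len bytes and
# return [ offset, hex ] for every group made of one repeated byte.
sub repeated_groups {
    my ( $str, $len ) = @_;
    my $extra  = $len - 1;
    my @chunks = unpack "(a$len)*", $str;    # fixed-size pieces, in order
    my @found;
    for my $idx ( 0 .. $#chunks ) {
        # The whole chunk must be one byte followed by $extra copies of it.
        next unless $chunks[$idx] =~ m{\A(.)\1{$extra}\z}s;
        push @found, [ $idx * $len, unpack( q{H*}, $chunks[$idx] ) ];
    }
    return @found;
}

# Rebuild the modified test string from the post.
my $str = join q{}, map { chr } 0 .. 31;
substr $str,  0, 2, qq{\x3e\x3e};
substr $str, 16, 8, qq{\xac} x 8;
substr $str,  4, 4, qq{\x7f} x 4;
substr $str, 26, 4, qq{\x45} x 4;

for my $len ( 8, 4, 2 ) {
    print "Checking groups of $len\n";
    printf "Found %s at offset %d\n", $_->[1], $_->[0]
        for repeated_groups( $str, $len );
}
```

Returning the hits rather than printing them inside the loop leaves it to the caller to decide whether a repeat means "corrupted".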

    I hope this helps you along.

    Cheers,

    JohnGG

Re^3: matching characters and numbers with regex
by james28909 (Deacon) on May 31, 2014 at 22:55 UTC
    Or better yet, I could read each byte without converting it to a string and see whether the next 2, 4, or 8 bytes match it. If they do, then it's a corrupt file.
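A rough sketch of that byte-level idea, assuming the file contents are already in a scalar as raw bytes (for instance read from a handle opened with the `:raw` layer). The helper name `find_byte_runs` and the sample bytes are hypothetical. Unlike a check on aligned groups, this looks for a run starting at any byte offset.

```perl
use strict;
use warnings;

# Hypothetical helper: report every run of exactly $len identical bytes,
# scanning from any offset rather than on aligned boundaries.
sub find_byte_runs {
    my ( $bytes, $len ) = @_;
    my $extra = $len - 1;
    my @hits;

    # /s lets . match a newline byte (0x0a); /g resumes after each hit.
    while ( $bytes =~ m{((.)\2{$extra})}sg ) {
        push @hits, { offset => $-[1], hex => unpack( q{H*}, $1 ) };
    }
    return @hits;
}

# Made-up sample: two clean bytes, four 0xff, four 0x0a, one clean byte.
my $bytes = qq{\x01\x02} . ( qq{\xff} x 4 ) . ( qq{\x0a} x 4 ) . qq{\x03};

for my $hit ( find_byte_runs( $bytes, 4 ) ) {
    print "corrupted: $hit->{hex} at offset $hit->{offset}\n";
}
```

A run longer than `$len` is reported as several consecutive hits, which should be fine if any hit at all means the file is corrupt.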
      Never mind, I just read and understood the comments lol, thanks again y'all :)
        On a side note, if anyone else ever reads this thread, here is a very good page on pattern matching with Perl: http://work.lauralemay.com/samples/perl.html