Re^3: matching characters and numbers with regex

I want it to match any 4 characters that repeat at the beginning of the string. This is to check a file for corruption. Suppose the file is supposed to be:

"4428FBABCBED062405E56F853AAE238C4428FBABCBED062405E56F853AAE238CCC9AA594B5B35063A28224E2FE347EE349E9FFEDB897E32725F42C0D9FA2400D56C78EC7E711F47AA032CB76E11996D4"

Then I want to make sure it doesn't have any repeating characters that are 4, 8, or 16 characters long. So suppose the string above were instead:

"0A0AFBABCBED062405E56F853AAE238C4428FBABCBED062405E56F853AAE238CCC9AA594B5B35063A28224E2FE347EE349E9FFEDB897E32725F42C0D9FA2400D56C78EC7E711F47AA032CB76E11996D4"

The difference between these two strings is the repeating characters 0A0A at the beginning of the second one. If the program finds repeating characters, it will terminate and not continue, because it is checking for corruption.


Re^4: matching characters and numbers with regex
by Athanasius (Archbishop) on Jun 01, 2014 at 03:26 UTC

james28909,

Well, this is a completely different spec from the one given previously (as I understood it, anyway)! If this is really all you need, it’s as simple as:

#! perl
use strict;
use warnings;

while (<DATA>)
{
    if    (/^([0-9a-fA-F]{2})\1/)
    {
        print "Found  4 repeating characters: $1$1\n";
    }
    elsif (/^([0-9a-fA-F]{4})\1/)
    {
        print "Found  8 repeating characters: $1$1\n";
    }
    elsif (/^([0-9a-fA-F]{8})\1/)
    {
        print "Found 16 repeating characters: $1$1\n";
    }
    else
    {
        print "Found  0 repeating characters\n";
    }
}

__DATA__
1234FBABCBED062405E56F853AAE238C4428FBABCBED0624
0A0AFBABCBED062405E56F853AAE238C4428FBABCBED0624
0A1B0A1BCBED062405E56F853AAE238C4428FBABCBED0624
0A1B2C3D0A1B2C3DCBED062405E56F853AAE238C4428FBAB
01230A0AFBABCBED062405E56F853AAE238C4428FBABCBED

Output:

13:12 >perl 914_SoPW.pl
Found  0 repeating characters
Found  4 repeating characters: 0A0A
Found  8 repeating characters: 0A1B0A1B
Found 16 repeating characters: 0A1B2C3D0A1B2C3D
Found  0 repeating characters

13:12 >

(Note that the final string tested here contains the repeated characters 0A0A, but these are not at the beginning of the string.)
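To make the effect of the ^ anchor concrete, here is a minimal sketch. The helper name find_repeat and its $anchored flag are hypothetical, invented for this illustration only. It returns the repeated run and the offset at which it starts (taken from the built-in @- array), or an empty list when nothing matches:

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: look for a 2-character hex
# unit that is immediately repeated. With a true $anchored flag the match
# must start at the beginning of the string.
sub find_repeat
{
    my ($string, $anchored) = @_;

    my $re = $anchored ? qr/^([0-9a-fA-F]{2})\1/
                       : qr/([0-9a-fA-F]{2})\1/;

    # $-[0] is the offset at which the last successful match started
    return $string =~ $re ? ($1 . $1, $-[0]) : ();
}

my $string = '01230A0AFBABCBED';

for my $anchored (1, 0)
{
    my @found = find_repeat($string, $anchored);
    my $label = $anchored ? 'Anchored:  ' : 'Unanchored:';

    print @found ? "$label found $found[0] at offset $found[1]\n"
                 : "$label no match\n";
}
```

Without the anchor, the regex engine tries every starting offset, including odd ones, so an unanchored match can straddle byte boundaries in a hex dump. Whether that matters depends on what "repeating characters" is meant to cover.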

Two obvious questions:

1. Why shouldn’t a legitimate (i.e., non-corrupt) file begin with repeated characters?
2. If a file is “corrupted,” will this always manifest as repeated characters at the start of the file? If not, how will you test for other forms of file corruption?

I’ve got a sneaking suspicion that this thread is dealing with an XY Problem. If the answers don’t solve your real problem, you will need to explain the nature of the files and the process(es) by which the corruption may occur.

Update: More compact version:

while (my $string = <DATA>)
{
    for my $chars (2, 4, 8)
    {
        printf "Found %2d repeating characters: %s\n", $chars * 2, $1 . $1
            if $string =~ /^([0-9a-fA-F]{$chars})\1/;
    }
}

(In the actual script, the printf would be replaced by a die statement.)
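A minimal sketch of how that might look, assuming the check is wrapped in a helper so the caller decides what to do with the result. The name leading_repeat is hypothetical, not something from the original script. The demo loop prints rather than dies so that both sample strings are shown:

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper: returns the repeated leading run (e.g. "0A0A") for
# the first unit width that matches, or undef if the string starts cleanly.
sub leading_repeat
{
    my ($string) = @_;

    for my $chars (2, 4, 8)
    {
        return $1 . $1
            if $string =~ /^([0-9a-fA-F]{$chars})\1/;
    }

    return;
}

for my $string (qw(1234FBABCBED0624 0A1B0A1BCBED0624))
{
    my $repeat = leading_repeat($string);

    # In the real script this branch would be something like:
    #     die "Corrupt data: starts with repeated '$repeat'\n"
    #         if defined $repeat;
    print defined $repeat ? "Corrupt: starts with repeated '$repeat'\n"
                          : "OK: $string\n";
}
```

Unlike the loop above, this version stops at the first width that matches, which is all that is needed if the program is going to terminate anyway.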

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,
