egal has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

Could you please help with the following issue:

While searching for multiple patterns on the same string, the search continues from the previous successful match point in the string. Is this expected? How can I make each pattern search start from the beginning of the string each time?

Thanks a lot!!

open (MYFILE, "< $dir_dump$file") || die "Could not read dump file\n";
while (<MYFILE>){
    $dumpbuf .= $_;
}
$dumpbuf =~ s/\s//gs;
die "Not a valid file, Check output!" if $dumpbuf !~ m/\d/gs; #match one
die "File contains only zeros, Check output!" if $dumpbuf !~ m/[1-9]/sg; #match two
while ($dumpbuf =~ m/(\w)/gs) {
    print "==$1\n"; # prints characters after initial two matches
}

Re: Regex String
by AppleFritter (Vicar) on Jul 04, 2014 at 09:23 UTC

    While searching for multiple patterns on the same string, the search continues from the previous successful match point in the string. Is this expected? How can I make each pattern search start from the beginning of the string each time?

    Yes, this is a feature; the /g modifier in scalar context means "start matching where I last left off". From perlretut:

    The modifier //g stands for global matching and allows the matching operator to match within a string as many times as possible. In scalar context, successive invocations against a string will have //g jump from match to match, keeping track of position in the string as it goes along.

    So the solution is to simply remove that modifier from the first two regexen, the ones that check if the file's a valid one. Alternatively, you could reset the position by using pos (which returns an lvalue, so you can assign to it).
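
    As a minimal sketch of the pos approach mentioned above (the sample string is hypothetical, not the poster's data), the following shows that a scalar-context /g match records where it stopped, and that assigning to pos rewinds the next /g match to the start of the string:

```perl
use strict;
use warnings;

my $dumpbuf = '1000';   # hypothetical sample data

my $ok = $dumpbuf =~ m/\d/g;            # scalar-context /g match
my $after_first = pos($dumpbuf);        # 1: offset just past the matched '1'

pos($dumpbuf) = 0;                      # rewind; pos() can be assigned to
my $found = ($dumpbuf =~ m/[1-9]/g) ? 1 : 0;   # search starts at offset 0 again

print "pos after first match: $after_first\n";
print "nonzero digit found after reset: $found\n";
```

    Dropping /g from the validation checks is probably the simpler fix; resetting pos is mainly useful when the /g matches are needed for other reasons.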

    That said, I'm taking a stab in the dark here as to what you want to accomplish. If this isn't it, could you be so kind as to give us some sample data, along with the output you expect? In particular, I'm curious whether the file is supposed to contain only numbers; the "all zeros" check makes me think it is, but the first check does not ensure this.

    P.S. please don't use line numbers in your code -- it just makes it more difficult to download/copy and paste.

      Hi,

      I am searching a hexdump. I just want to make sure it is not all zeroes before I split it to bytes.

      Looks like 00000FFEDFF67FFB8FF96FFE200BBFF240020FFBAFF360132FF6500FCFED30079

      Thanks a lot for your reply.

        You're welcome! And in that case, I'd change those checks to the following:

        die "Not a valid file, Check output!" if $dumpbuf =~ m/[^0-9A-F]/; #match one
        die "File contains only zeros, Check output!" if $dumpbuf !~ m/[1-9A-F]/; #match two

        (In particular, note that the second regex should include A-F, since otherwise some valid files will be rejected.)

        ... split it to bytes.

        If this means you want to end up with an array (or string) of byte values for each pair of hex digits, consider pack:

        c:\@Work\Perl>perl -wMstrict -MData::Dump -le
        "my $s = '00000FFEDFF6';
         print qq{'$s'};
         ;;
         my @bytes = split '', pack 'H*', $s;
         dd \@bytes;
        "
        '00000FFEDFF6'
        ["\0", "\0", "\17", "\xFE", "\xDF", "\xF6"]

        Update: Another validation test to make sure you have an even number of hex digits in the source string might also be a good idea.
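
        One way the three checks might be combined is sketched below. The helper name hex_to_bytes and its error messages are hypothetical, and it returns numeric byte values via unpack 'C*' rather than the one-character strings that split '' produces above:

```perl
use strict;
use warnings;

# Hypothetical helper: validate an upper-case hex dump string and
# return the numeric value of each byte.
sub hex_to_bytes {
    my ($hex) = @_;
    die "Not a valid file\n"         if $hex =~ /[^0-9A-F]/;   # non-hex character present
    die "File contains only zeros\n" if $hex !~ /[1-9A-F]/;    # no nonzero digit (or empty)
    die "Odd number of hex digits\n" if length($hex) % 2;      # incomplete final byte
    return unpack 'C*', pack 'H*', $hex;
}

my @bytes = hex_to_bytes('00000FFEDFF6');
print join(' ', map { sprintf '%02X', $_ } @bytes), "\n";   # prints "00 00 0F FE DF F6"
```

        None of the patterns uses /g, so each check starts from the beginning of the string regardless of any earlier match.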

Re: Regex String
by CountZero (Bishop) on Jul 04, 2014 at 10:40 UTC
    This seems to work:
    use Modern::Perl;
    while (<DATA>) {
        chomp;
        do { say "Not a valid file, Check output!"; next } if /[^0-9A-F]/;
        do { say "All zeroes or empty, Check output!"; next } if /^0*$/;
        say '==', join "\n==", /(..)/g;
    }
    __DATA__
    00000FFEDFF67FFB8FF96FFE200BBFF240020FFBAFF360132FF6500FCFED30079
    000000000000000000000000000000000

    THIS IS NOT A HEX DUMP
    Output:
    ==00
    ==00
    ==0F
    ==FE
    ==DF
    ==F6
    ...(snip)...
    ==0F
    ==CF
    ==ED
    ==30
    ==07
    All zeroes or empty, Check output!
    All zeroes or empty, Check output!
    Not a valid file, Check output!

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Regex String
by AnomalousMonk (Archbishop) on Jul 04, 2014 at 14:27 UTC

    Note also that the meaningless use of the /g modifier in the validation statements causes even those to be incorrect:

    c:\@Work\Perl>perl -wMstrict -le
    "my $dumbuf = '10';
     ;;
     die 'Not a valid file' if $dumbuf !~ m/\d/sg;
     die 'File contains only zeros' if $dumbuf !~ m/[1-9]/sg;
    "
    File contains only zeros at -e line 1.
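
    To see why '10' is wrongly rejected, the match position can be traced step by step. This is only an illustrative sketch; the variable names are made up for the example:

```perl
use strict;
use warnings;

my $buf = '10';

my $first = ($buf =~ m/\d/g) ? 1 : 0;       # matches the '1' at offset 0
my $pos_after_first = pos($buf);            # 1

# The next /g match resumes at offset 1, where only '0' remains.
my $second = ($buf =~ m/[1-9]/g) ? 1 : 0;   # fails

# Without /g the same pattern is tried from the start and finds the '1'.
my $second_no_g = ($buf =~ m/[1-9]/) ? 1 : 0;

print "first=$first pos=$pos_after_first second=$second second_no_g=$second_no_g\n";
# prints "first=1 pos=1 second=0 second_no_g=1"
```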