johnbo has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to match (and eventually replace) a string that starts with a #, followed by a single digit giving the number of digits in the byte count, then the byte count itself, then that many binary bytes.
For a simple example: #18abcdefgh
This would represent an 8-byte buffer containing the ASCII values of 'abcdefgh'.
I've tried the following, but it will not match. It seems to fail as soon as I include the first backreference as a quantifier.
#!/usr/bin/perl
use strict;

my $inputText = "word1, word2, #18abcdefgh ,word4";
print "before: $inputText\n";
if ($inputText =~ s/#(\d)(\d{\1})(.{\2})/\<Binary block: $2 bytes\>/) {
    print "(1) = $1\n";
    print "(2) = $2\n";
    print "(3) = $3\n";
}
print "after: $inputText\n";
Your suggestions are welcomed!
johnbo

Replies are listed 'Best First'.
Re: Can I use backreferences as quantifiers in a regex?
by ikegami (Patriarch) on Mar 29, 2009 at 07:00 UTC
    / \# (\d) ( (??{ "\\d{$^N}" }) ) ( (??{ "(?s:.{$^N})" }) ) /x

    You have a second bug: "." can't match chr(0x0A) in the binary data. I fixed this by adding the "s" modifier.
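As a minimal sketch of how that pattern could be dropped into the original substitution (the names $block and $outputText are only illustrative, and the /g flag is an assumption about wanting every block replaced):

```perl
use strict;
use warnings;

my $inputText = "word1, word2, #18abcdefgh ,word4";

# (??{ ... }) builds a sub-pattern while the match is running, and the
# special variable ^N holds the text of the most recently closed group.
my $block = qr/
    \# (\d)                      # group 1: how many digits the length has
    ( (??{ "\\d{$^N}" }) )       # group 2: the length itself
    ( (??{ "(?s:.{$^N})" }) )    # group 3: that many bytes, newlines included
/x;

# Copy the input, then replace each block in the copy.
(my $outputText = $inputText) =~ s/$block/<Binary block: $2 bytes>/g;

print "before: $inputText\n";
print "after:  $outputText\n";
```

Keeping the pattern in a qr// object means it can be interpolated into s/// without needing "use re 'eval'".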

      This works beautifully.

      I tried this without the ^N for a while, but couldn't get the backreferences to work. Thanks for the new knowledge about the 'postponed eval' and the ^N.

      I had seen the postponed eval previously, but had stayed away from trying it due to the comments about 'highly experimental'.

      Thanks for everyone's help!

      johnbo

        $1 could be used for the first $^N, and $2 the second, but $^N is more meaningful.

        They're not that experimental, and they're required to do what you want to do in a match op.
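A small sketch of the two spellings side by side, assuming the second sample string from elsewhere in this thread; $with_n and $with_digits are illustrative names only:

```perl
use strict;
use warnings;

my $text = "#212abcdefghijkl";

# Variant from the reply above: ^N is whichever group closed most recently.
my $with_n = qr/ \# (\d) ( (??{ "\\d{$^N}" }) ) ( (??{ "(?s:.{$^N})" }) ) /x;

# Same pattern, naming the groups explicitly instead.
my $with_digits = qr/ \# (\d) ( (??{ "\\d{$1}" }) ) ( (??{ "(?s:.{$2})" }) ) /x;

# In list context, a match without /g returns the captured groups.
my @a = $text =~ $with_n;
my @b = $text =~ $with_digits;

print "with \$^N:    @a\n";
print "with \$1, \$2: @b\n";
```

Both patterns should capture the same three pieces; the $^N form just avoids renumbering if groups are added in front later.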

Re: Can I use backreferences as quantifiers in a regex?
by eyepopslikeamosquito (Archbishop) on Mar 29, 2009 at 06:32 UTC

    I suppose you might break it down into separate steps. It's not elegant, but it does seem to work:

    my $inputText = "word1, word2, #18abcdefgh ,word4";
    print "before: $inputText\n";
    if ( $inputText =~ /#(\d)/g ) {
        my $n1 = $1;
        if ( $inputText =~ /(\d{$1})/g ) {
            my $n2 = $1;
            if ( $inputText =~ /(.{$1})/g ) {
                print "(1) = $n1\n";
                print "(2) = $n2\n";
                print "(3) = $1\n";
            }
        }
    }
    print "after: $inputText\n";

    Update: on re-reading the question, the code above only extracts the bits you are after; it does not do the string substitution. I suppose there are a number of ways you might do that; one way that springs to mind is to build a new string rather than trying to substitute (you may need m//gc for that, to stop the match operator resetting the position within the string when a match fails). Something like this:

    use strict;

    my $inputText = "word1, word2, #18abcdefgh ,word4 ";
    $inputText .= "word5, word6, #212abcdefghijkl ,word7\n";
    print "before: $inputText\n";
    my $newstr;
    {
        if ( $inputText =~ /\G#(\d)/gc ) {
            my $n1 = $1;
            if ( $inputText =~ /\G(\d{$1})/gc ) {
                my $n2 = $1;
                if ( $inputText =~ /\G(.{$1})/gc ) {
                    print "(1) = $n1\n";
                    print "(2) = $n2\n";
                    print "(3) = $1\n";
                    $newstr .= "\<Binary block: $n2 bytes\>";
                }
            }
        }
        elsif ( $inputText =~ /\G([^#]+)/gc ) {
            $newstr .= $1;
        }
        else {
            last;
        }
        redo;
    }
    print "after: $newstr\n";
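The remark about m//gc can be seen in isolation with a short sketch; the string "abc123" and the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $text = "abc123";

# Without /c, a failed /g match resets pos() to undef.
$text =~ /\G[a-z]+/g;           # matches "abc", pos is now 3
$text =~ /\G[a-z]+/g;           # fails at "123"
my $after_g = pos($text);       # undef: the position has been lost

# With /c, a failed match leaves pos() where it was.
pos($text) = 0;
$text =~ /\G[a-z]+/gc;          # matches "abc", pos is now 3
$text =~ /\G[a-z]+/gc;          # fails, but pos stays at 3
my $after_gc = pos($text);

# So a different pattern can pick up from the same place.
my $digits = $text =~ /\G(\d+)/gc ? $1 : undef;

printf "after /g: %s, after /gc: %s, digits: %s\n",
    defined $after_g ? $after_g : 'undef', $after_gc, $digits;
```

This is what lets the if/elsif chain above try one alternative after another from the same position.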

      Regarding your update, it's easier not to substitute in-place when using a parser.
      sub parse_bin {
          my $save = pos;

          # Each match is done in scalar context so that /gc advances pos
          # by exactly one match; in list context /g would keep matching.
          /\G \# (\d) /xgc        or goto BACKTRACK;
          my $size_sz = $1;
          /\G (\d{$size_sz}) /xgc or goto BACKTRACK;
          my $bin_sz = $1;
          /\G (.{$bin_sz}) /xgcs  or goto BACKTRACK;
          return $1;

      BACKTRACK:
          pos = $save;
          return ();
      }

      my $outputText = '';
      for ($inputText) {
          pos = 0;
          for (;;) {
              if (my ($bin) = parse_bin()) {
                  $outputText .= '<Binary block: '.length($bin).' bytes>';
                  next;
              }
              # Take one character (possibly a "#" that did not start a
              # block), then everything up to the next "#".
              if (/\G (.[^#]* ) /xgcs) {
                  $outputText .= $1;
                  next;
              }
              last;
          }
      }

      Update: The parent's update was updated to generate $newstr since I started. Note that even with the update, the code doesn't behave as a regexp would. It silently drops bits of text instead of backtracking. For example, the "#" in "What you're #?" is dropped, and so is "#1" in "You're #1!".
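For comparison, here is a sketch of how the regexp approach treats those two inputs. It reuses the (??{ }) pattern from earlier in the thread; replace_blocks is a hypothetical helper name, not something from the posts above:

```perl
use strict;
use warnings;

# Hypothetical helper: replace every well-formed block, leave the rest alone.
sub replace_blocks {
    my ($text) = @_;
    $text =~ s/
        \# (\d)
        ( (??{ "\\d{$^N}" }) )
        ( (??{ "(?s:.{$^N})" }) )
    /<Binary block: $2 bytes>/gx;
    return $text;
}

# "#?" and "#1!" are not valid blocks, so the regex fails at those
# positions, moves on, and the text survives untouched.
my $out = replace_blocks("What you're #? You're #1! Data: #13abc.");
print "$out\n";
```

Only the final "#13abc" has a size digit, a length, and enough bytes, so only that part should be rewritten.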

Re: Can I use backreferences as quantifiers in a regex?
by educated_foo (Vicar) on Mar 29, 2009 at 16:15 UTC
    Yes, e.g.
    '4hhhh' =~ /(\d)(??{ "h{$1}"})/
Re: Can I use backreferences as quantifiers in a regex?
by ig (Vicar) on Mar 29, 2009 at 08:16 UTC

    The following isn't efficient because it keeps re-matching the entire string, but this won't matter unless your strings are very long. It is simple.

    #! /usr/local/bin/perl
    use strict;
    use warnings;

    my $str = "word1, word2, #18abcdefgh ,word4, #24qwer, word5";
    print "Before: $str\n";
    while($str =~ m/(.*?)#(\d)(\d)(.*)/) {
        $str = $1 . "<Binary block($2): $3 bytes>" . substr($4, $3);
    }
    print "After: $str\n";

    Which produces:

    Before: word1, word2, #18abcdefgh ,word4, #24qwer, word5
    After: word1, word2, <Binary block(1): 8 bytes> ,word4, <Binary block(2): 4 bytes>, word5

    I wondered about the first digit after the '#' so I captured that and put it into the substitution also.

    update: removed unnecessary capture from the RE.

    update2: A better alternative and comparison:

    #! /usr/local/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my $str = "word1, word2, #18abcdefgh ,word4, #24qwer, word5";
    print "Before: $str\n";
    while($str =~ m/#(\d)(\d)/g) {
        substr($str, pos($str) - 3, $2 + 3, "<Binary block($1): $2 bytes>");
    }
    print "After: $str\n";

    my $start = "word1, word2, #18abcdefgh ,word4, #24qwer, word5";
    print "\n\n";
    cmpthese(
        -10,
        {
            're-match whole string' => sub {
                my $str = $start;
                while($str =~ m/(.*?)#(\d)(\d)(.*)/) {
                    $str = $1 . "<Binary block($2): $3 bytes>" . substr($4, $3);
                }
            },
            'match #\d\d' => sub {
                my $str = $start;
                while($str =~ m/#(\d)(\d)/g) {
                    substr($str, pos($str) - 3, $2 + 3, "<Binary block($1): $2 bytes>");
                }
            },
        },
    );

    Before: word1, word2, #18abcdefgh ,word4, #24qwer, word5
    After: word1, word2, <Binary block(1): 8 bytes> ,word4, <Binary block(2): 4 bytes>, word5

                              Rate re-match whole string match #\d\d
    re-match whole string  75230/s                    --        -28%
    match #\d\d           104315/s                   39%          --
Re: Can I use backreferences as quantifiers in a regex?
by JavaFan (Canon) on Mar 29, 2009 at 21:25 UTC
    You cannot do it the way you are trying to, because the quantifier is dealt with at "compile-time" (compile-time of the regex). For the same reason, you cannot write:
    my $op = "+";
    my $result = 3 $op 4;
    If you want to do such a thing, you have to resort to an eval. Luckily, Perl regexes have their own evals, (??{ }) (see earlier in this thread).
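A rough sketch of the difference, with made-up names ($count, $fixed, $dynamic): a quantifier can come from a variable, but only one whose value is known when the regex is compiled, whereas (??{ }) defers the decision until the match is under way.

```perl
use strict;
use warnings;

my $count = 3;
my $fixed = qr/\A h{$count} \z/x;   # $count is interpolated here, once
$count = 5;                          # too late: $fixed still means h{3}

# The quantifier is built during the match from what group 1 captured.
my $dynamic = qr/\A (\d) (??{ "h{$1}" }) \z/x;

my $fixed_3   = 'hhh'   =~ $fixed   ? 1 : 0;
my $fixed_5   = 'hhhhh' =~ $fixed   ? 1 : 0;
my $dynamic_4 = '4hhhh' =~ $dynamic ? 1 : 0;
my $dynamic_2 = '2hhhh' =~ $dynamic ? 1 : 0;

print "fixed:   hhh=$fixed_3 hhhhh=$fixed_5\n";
print "dynamic: 4hhhh=$dynamic_4 2hhhh=$dynamic_2\n";
```

The step-by-step solutions earlier in the thread rely on the first form: each new regex is compiled after the previous match has already filled in $1.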