Re: Faster way to parse binary stream

Instead of removing parsed chars from the string, I'd use pos to keep track of where you are in it. That way, you can intermix regexps and substr to extract data from the packed string.

for ($packed) { # alias $_ = $packed;
   pos = 0;

   /\G (.) /xsgc or die;                   # /s so . also matches a "\n" byte
   my $count = unpack('C', $1);

   for my $i (1..$count) {
      /\G (.) /xsgc or die;                # Extract data using a re (/s: . matches "\n" too)
      my $length = unpack('C', $1);

      my $str = substr($_, pos, $length);  # Extract data using substr
      length($str) == $length or die;      # Input ended before the string did
      pos($_) += $length;                  # Don't forget to update pos

      push @strings, $str;
   }

   # Make sure there's nothing extra at the end.
   /\G \z /xgc or die;
}

Another advantage to this method is that you can break your parser down into multiple functions.

sub extract_string {
   /\G (.) /xsgc or die;                # /s so . also matches a "\n" byte
   my $length = unpack('C', $1);

   my $str = substr($_, pos, $length);
   length($str) == $length or die;      # Input ended before the string did
   pos($_) += $length;

   return $str;
}

sub parse {
   for ($_[0]) { # alias $_ = $_[0];
      pos = 0;

      /\G (.) /xsgc or die;             # /s so . also matches a "\n" byte
      my $count = unpack('C', $1);

      my @strings;
      for my $i (1..$count) {
         push @strings, extract_string();
      }

      # Make sure there's nothing extra at the end.
      /\G \z /xgc or die;

      return @strings;
   }
}

my @strings = parse($packed);
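
If you want something to feed the parser, a test input can be built with pack. This is only a sketch: it assumes the layout the code above implies (one count byte, then for each string one length byte followed by that many bytes), and the sample strings are made up.

```perl
use strict;
use warnings;

# Made-up sample data; each string must be shorter than 256 bytes
# because its length is stored in a single byte.
my @in = ('foo', 'quux', '');

# One count byte, then a length byte plus the bytes of each string.
# 'C/a*' packs a string preceded by its length as an unsigned char.
my $packed = pack('C', scalar @in)
           . join('', map { pack('C/a*', $_) } @in);

# Show what was built, as a byte count and a hex dump.
printf "%d bytes: %s\n", length($packed), unpack('H*', $packed);
```

The resulting $packed can then be handed to parse().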

The final advantage is backtracking. Since the string isn't being destroyed, parts of it can be re-parsed by resetting pos to an earlier offset.
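
Here's a rough sketch of what backtracking could look like. The format is invented for the example (a string whose length prefix might be either two bytes or one byte), and try_string is a hypothetical helper, not part of the parser above. The point is only that a failed attempt can put pos back where it started so another interpretation can be tried.

```perl
use strict;
use warnings;

# Hypothetical helper: try to read a string with a $width-byte length
# prefix (decoded with $template) at the current pos of $_.
# Returns the string, or undef with pos restored if the data doesn't fit.
sub try_string {
   my ($width, $template) = @_;
   my $start = pos($_) // 0;            # Remember where this attempt began

   if (/\G (.{$width}) /xsgc) {         # Consume the length prefix
      my $length = unpack($template, $1);
      my $str = substr($_, pos($_), $length);
      if (length($str) == $length) {    # Enough data for the whole string?
         pos($_) += $length;
         return $str;
      }
   }

   pos($_) = $start;                    # Backtrack: undo any progress
   return;
}

my $packed = "\x03foo";                 # Made-up input
my $str;
for ($packed) { # alias $_ = $packed;
   pos($_) = 0;

   $str = try_string(2, 'n');           # First guess: 2-byte length
   $str = try_string(1, 'C')            # Re-parse from the same spot
      if !defined $str;
   defined $str or die;

   # Make sure there's nothing extra at the end.
   /\G \z /xgc or die;
}

print "$str\n";
```

The first attempt reads "\x03f" as a length of 870, finds too little data, and rewinds; the second attempt reads a length of 3 and gets "foo". (The // operator needs Perl 5.10 or later.)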

Update: Added advantages and second snippet.
