in reply to Finding large blocks consisting of a single character (but within certain parameters...)

Guessing the answers to my questions above, this works for your somewhat limited sample:

#! perl -slw
use feature qw[ state ];
use strict;

sub findEm {
    state $protected;
    my( $fh, $blkSize, $chr, $chrCount ) = @_;
    my @returns;

    local $/ = \$blkSize;
    while( <$fh> ) {
        my( @matches, @protected );

        ## If we had a protected zone that spans the block boundary
        ## start with the residual
        push @protected, $protected if defined $protected;

        ## Look for preliminary matches
        push @matches, $-[0] while m[(${chr}{$chrCount,})]g;

        ## skip ahead if there are none.
        next unless @matches;

        ## look for protected zones
        push @protected, [ $-[0], ord( $1 ) ] while m[\x74\x75\x76\x77(.)]g;

        ## If there are some, and the last spans off the end of this block
        ## record the residual for the next block
        if( @protected and
            $protected[ -1 ][ 0 ] + $protected[ -1 ][ 1 ] > $blkSize
        ) {
            $protected = [
                0,
                ( $protected[ -1 ][ 0 ] + $protected[ -1 ][ 1 ] ) % $blkSize
            ];
        }
        else {
            $protected = undef;
        }

        ## Destructively iterate the protected zones
        while( @protected ) {
            my( $start, $len ) = @{ pop @protected };

            ## comparing them against each match (backward)
            for my $iMatch ( reverse 0 .. $#matches ) {
                my $match = $matches[ $iMatch ];

                ## if this match precedes the start of
                ## the current protected zone, next zone
                last if $match < $start;

                ## If this match is beyond the end of the current zone,
                ## next match
                next if $match > ( $start + $len );

                ## The two overlap so discard the match
                splice @matches, $iMatch, 1;
            }
        }

        ## Calculate the file offset of the current block
        my $fOffset = ( $. -1 ) * $blkSize;

        ## In a non list context
        unless( wantarray ) {
            ## undef unless we've at least one match
            return unless @matches;

            ## or the file offset of the first if we have one or more
            return $fOffset + $matches[ 0 ];
        }

        ## Map the match block offsets to file offsets and remember them
        push @returns, map $fOffset + $_, @matches;
    }

    ## return them
    return @returns;
}

my $fileData = pack 'H*', join'',split ' ', do{ local $/; <DATA> };

open RAM, '<', \$fileData;
print 'Scalar context: ', scalar findEm( \*RAM, 0x40, chr(0), 8 );
close RAM;

open RAM, '<', \$fileData;
print 'List context ', join ', ', findEm( \*RAM, 0x40, chr(0), 8 );

__DATA__
09 43 4A 00 00 00 00 00 00 00 00 00 00 00 FC B0 DD 12 46 33 73 7A 8B 01
00 00 00 00 00 00 98 40 34 3F 79 6D DC 2A 2B 35 FF 90 FA 60 66 58 5A 21
40 06 88 F2 11 EE 65
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
44 88 CC 02 A0
74 75 76 77 09 00 00 00 00 00 00 00 00 00 AA

Outputs:

C:\test>891765
Scalar context: 3
List context 3, 55, 64

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Finding large blocks consisting of a single character (but within certain parameters...)
by TheMartianGeek (Acolyte) on Mar 07, 2011 at 14:56 UTC

    Are these returned offsets, the offsets within the file of the 32k blocks containing matching blocks of null byte blocks? Or the offsets of the null byte blocks within the 32k blocks? Or the offset of null byte blocks within the file?

    They would be the offsets of null byte blocks within the file.

    What does it return in a scalar context if the first 32k block that contains a matching, qualifying block, contains more than one?

    The offset, within the file, of the first qualifying block.

    And as for that subroutine...well, I can't make head or tail out of most of it. Does it need to be that long? And why is the "state" necessary?
      And as for that subroutine...well, I can't make head or tail out of most of it

      Hm. Then I guess it would be easier if you detailed the bits you do understand?

      Does it need to be that long?

      It's shorter if you remove the comments, but I don't think I'll ever be able to make it as short as all those other answers you got.

      And why is the "state" necessary?

      It's not. Feel free to delete it.


        Hm. Then I guess it would be easier if you detailed the bits you do understand?

        Well, I don't really know where to start...it's all confusing. And if I don't understand it, I can't customize it. (This is especially important; for one thing, in the real situation, the special character string would be followed by four bytes, the first two indicating the length of the protected area.)
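
        For what it's worth, here is a rough sketch of how the protected-zone lookup might be adapted to that four-byte layout. It works on a whole buffer rather than block by block, and several things in it are guesses rather than anything stated in this thread: that the two length bytes are little-endian (hence unpack 'v'; 'n' would be the big-endian equivalent), that the length counts the bytes immediately following the four-byte header, and that a run is discarded whenever it overlaps a zone at all. The names protected_zones and find_runs are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: list each protected zone as [ start, end ), where
# 'end' is one past the last protected byte.
# ASSUMPTIONS: 16-bit little-endian length in the first two header bytes;
# the zone begins right after the 4-byte header.
sub protected_zones {
    my( $data ) = @_;
    my @zones;
    while( $data =~ m[\x74\x75\x76\x77(.{4})]gs ) {
        my $len   = unpack 'v', $1;   # first two of the four bytes
        my $start = $+[0];            # offset just past the header
        push @zones, [ $start, $start + $len ];
    }
    return @zones;
}

# Hypothetical helper: offsets of runs of at least $min copies of the byte
# $chr that do not overlap any protected zone.
sub find_runs {
    my( $data, $chr, $min ) = @_;
    my @zones = protected_zones( $data );

    # Build e.g. \x00{8,} from the byte value and the minimum count
    my $run = sprintf '\\x%02X{%d,}', ord( $chr ), $min;

    my @hits;
    while( $data =~ m[$run]g ) {
        my( $s, $e ) = ( $-[0], $+[0] );

        # ASSUMPTION: any overlap with a zone disqualifies the run
        next if grep { $s < $_->[1] and $e > $_->[0] } @zones;
        push @hits, $s;
    }
    return @hits;
}
```

        Called as find_runs( $buffer, chr(0), 8 ), it returns the buffer offsets of the qualifying runs. To use it on a file read in blocks, the block's file offset would still need adding to each result, and zones or runs that straddle a block boundary would need the same kind of carry-over that findEm does with $protected.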