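Of course, if the pattern is not found (or is found late), your solution still slurps the whole file. The following solution keeps at most {buffer size} + {pattern length} - 1 bytes of the file in memory at a time: one block plus the tail of the previous block, which is kept so that a match spanning two blocks is not missed.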
my $pattern = "\xFF\xD9";
open(FILE, '<', $file)
    or die("Unable to open input file: $!\n");
binmode(FILE); # Disable "\n" translation.
$/ = \4096; # Arbitrary buffer size.
my $block;
my $partial = '';
my $base = 0;
my $matches = 0;
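while ($block = <FILE>) {
    my $lookbehind = $partial . $block;
    my $pos = -1;
    for (;;) {
        # Resume one byte past the previous hit so overlapping matches are found.
        $pos = index($lookbehind, $pattern, $pos+1);
        last if $pos == -1;
        print("Found a match at ", $base+$pos-length($partial), "\n");
        $matches++;
    }
    # Keep the last length($pattern)-1 bytes for the next pass.
    # (Nothing to keep for a one-byte pattern; substr($block, 0) would keep the whole block.)
    $partial = length($pattern) > 1 ? substr($block, 1 - length($pattern)) : '';
    $base += length($block);
}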
if ($matches) {
    print("Found $matches matches.\n");
} else {
    print("Pattern not found.\n");
}
Will report overlapping matches.
Untested.
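
For anyone who wants to check the boundary handling without a real file, here is a minimal sketch of the same idea wrapped in a sub and run against an in-memory filehandle. The name find_offsets, the sample data and the 3-byte buffer are my own assumptions for illustration, not part of the code above; the tiny buffer is there only to force matches to straddle block boundaries.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the byte offset of every occurrence of
# $pattern in the data read from $fh, reading $bufsize bytes at a time.
sub find_offsets {
    my ($fh, $pattern, $bufsize) = @_;
    local $/ = \$bufsize;              # Fixed-size records instead of lines.
    my $keep    = length($pattern) - 1;
    my $partial = '';                  # Tail carried over from earlier blocks.
    my $base    = 0;                   # File offset where the current block starts.
    my @found;
    while (defined(my $block = <$fh>)) {
        my $window = $partial . $block;
        my $pos = -1;
        # Restart one byte past each hit so overlapping matches are reported.
        while (($pos = index($window, $pattern, $pos + 1)) != -1) {
            push @found, $base - length($partial) + $pos;
        }
        $base += length($block);
        # Carry over at most $keep bytes; a full match can never fit in them,
        # so nothing is reported twice.
        my $tail = length($window) < $keep ? length($window) : $keep;
        $partial = $tail ? substr($window, -$tail) : '';
    }
    return @found;
}

my $data = "xxaaaxaa";                 # Made-up sample data.
open(my $fh, '<', \$data) or die("Unable to open in-memory file: $!\n");
binmode($fh);
my @offsets = find_offsets($fh, "aa", 3);
print("@offsets\n");                   # prints "2 3 6"
```

With a 3-byte buffer the data is read as "xxa", "aax", "aa", so the hit at offset 2 spans the first two blocks, and the hits at 2 and 3 overlap.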