in reply to Re^6: Search hex string in vary large binary file
in thread Search hex string in vary large binary file

Here's my test script; the patch to MP4::Info is as described earlier. The "scan" code hopefully does your solution justice. Note this particular script currently only tests for the presence of the HDVD flag, it doesn't look at whether it has a value of 1 (720p) or 2 (1080p) - but that shouldn't make any difference for this benchmark.

#!/usr/bin/env perl
use warnings FATAL => 'all';
use strict;
use 5.010;
use MP4::Info 'get_mp4tag';

my %func = (
    external => sub {
        my $file=shift;
        return `AtomicParsley "$file" -T 1 2>/dev/null`=~/hdvd/i;
    },
    scan => sub {
        my $file=shift;
        my $BUFN = 1024;
        $BUFN *= 4096;
        my $SIG = '68 64 76 64 00 00 00 11 64 61 74 61 00 00 00 15'
                 .'00 00 00 00 02 00 00 00';
        $SIG =~ tr[ ][]d;
        $SIG = pack 'H*', $SIG;
        open my $in, '<:raw', $file or die $!;
        my( $offset, $buffer ) = ( 0, '' );
        while( sysread( $in, $buffer, $BUFN, length $buffer ) ) {
            my $pos = 1+index( $buffer, $SIG );
            if( $pos ) {
                return 1;
            }
            $offset += length( $buffer ) - length( $SIG );
            $buffer = substr $buffer, - length $SIG;
        }
        close $in;
        return;
    },
    mp4info => sub {
        my $file=shift;
        my $tag = get_mp4tag($file) or return;
        return !!$tag->{HDVD};
    },
);

die "Usage: $0 ".join('|',sort keys %func)." PATH\n"
    unless @ARGV==2 && exists $func{$ARGV[0]} && -d $ARGV[1];
my $FUNC = $ARGV[0];
my $PATH = $ARGV[1];

use File::Find 'find';
my ($yes,$no,$size) = (0,0,0);
find({ wanted=>sub {
    return unless -f && /\.(mp4|m4[apvb])$/i;
    $size+=-s;
    if ($func{$FUNC}->($_)) { $yes++; say "YES $_"; }
    else { $no++; say " no $_"; }
}}, $PATH);
say "yes=$yes, no=$no, size=$size";
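For readers unfamiliar with the hex signature used in the "scan" branch, here is a minimal sketch that packs the same string and pulls out the first few fields. The variable names ($name, $next_u32, $marker) are mine and purely illustrative; the sketch only decodes the bytes and makes no claim about how the container format defines them.

```perl
use strict;
use warnings;

# Same hex string as in the "scan" branch, spaces removed before packing.
my $hex = '68 64 76 64 00 00 00 11 64 61 74 61 00 00 00 15'
        . '00 00 00 00 02 00 00 00';
(my $compact = $hex) =~ tr/ //d;
my $sig = pack 'H*', $compact;

# Illustrative decoding only; field names are hypothetical labels.
my $name     = substr( $sig, 0, 4 );                 # four ASCII bytes
my $next_u32 = unpack( 'N', substr( $sig, 4, 4 ) );  # big-endian 32-bit integer
my $marker   = substr( $sig, 8, 4 );                 # four more ASCII bytes

printf "length=%d name=%s next=%d marker=%s\n",
    length($sig), $name, $next_u32, $marker;
```

Because the search is for the full byte string, a file whose flag byte differs from the one in this signature will not match in the "scan" branch.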

Re^8: Search hex string in vary large binary file
by BrowserUk (Patriarch) on Feb 08, 2015 at 05:08 UTC
    Note this particular script currently only tests for the presence of the HDVD flag, it doesn't look at whether it has a value of 1 (720p) or 2 (1080p)

    So, you're benchmarking a script that finds what the OP is looking for; against a script that doesn't; on files that don't contain it; using a "ready made solution", that you had to patch -- to look for 1/6th of the information required -- and concluding it's faster.

    Yeah right!


      So, you're benchmarking a script that finds what the OP is looking for; against a script that doesn't

      The fix is to change return !!$tag->{HDVD}; to return $tag->{HDVD} && $tag->{HDVD}==2;. The check was set up like that initially to match the "AtomicParsley" check, which was just meant to be a quick double-check on the results.
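As a rough sketch of that stricter check in isolation: is_1080p is a hypothetical helper name, and it assumes $tag is a hashref shaped like the one get_mp4tag returns from the patched MP4::Info, with the HDVD key holding 1 or 2 when present.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when the HDVD flag is present and equals 2.
# $tag is assumed to be a hashref like the one returned by get_mp4tag()
# from the patched MP4::Info; a missing tag or missing key yields false.
sub is_1080p {
    my ($tag) = @_;
    return unless $tag && defined $tag->{HDVD};
    return $tag->{HDVD} == 2;
}
```

In the benchmark script, the mp4info branch could then end with return is_1080p($tag); instead of the !! test.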

      on files that don't contain it

      Do you think the OP has the flag set in every one of the files? If there's only one file that doesn't have the flag set, scanning will be slower overall. Even if every file were to have the flag set, scanning is only fast if the flag appears near the beginning of every file. The MP4::Info solution appears to be fast regardless of whether the flag is set or not.
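To make the cost argument concrete, here is a small sketch that counts how many bytes a chunked scan has to read before it can answer. scan_cost is a hypothetical helper, not part of the benchmark script; it uses read on in-memory filehandles so it runs without any MP4 files, and the tiny chunk size is only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: scan $fh for $sig in $chunk-byte reads and return
# (found, bytes_read). A miss always reads the whole input.
sub scan_cost {
    my ( $fh, $sig, $chunk ) = @_;
    my ( $buffer, $bytes_read ) = ( '', 0 );
    my $keep = length($sig) - 1;    # tail kept so a straddling match is seen
    while ( my $got = read( $fh, $buffer, $chunk, length $buffer ) ) {
        $bytes_read += $got;
        return ( 1, $bytes_read ) if index( $buffer, $sig ) >= 0;
        if    ( $keep == 0 )             { $buffer = '' }
        elsif ( length($buffer) > $keep ) { $buffer = substr $buffer, -$keep }
    }
    return ( 0, $bytes_read );
}

my $sig   = 'hdvd';
my $early = $sig . ( "\0" x 1000 );    # signature at the very start
my $none  = "\0" x 1004;               # no signature at all

open my $fh_early, '<', \$early or die $!;
open my $fh_none,  '<', \$none  or die $!;
my ( $found_early, $cost_early ) = scan_cost( $fh_early, $sig, 100 );
my ( $found_none,  $cost_none )  = scan_cost( $fh_none,  $sig, 100 );
printf "early: found=%d read=%d\n", $found_early, $cost_early;
printf "none:  found=%d read=%d\n", $found_none,  $cost_none;
```

With multi-gigabyte files, the "none" case is the one that dominates the total run time.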

      using a "ready made solution", that you had to patch

      The patch takes a few clicks to find on RT and applies cleanly. What is the problem?

      Yeah right!

      Considering I'm not even an expert on the file format and the set of input data I happened to have lying around may not be representative, I'd actually appreciate it if someone were to find an actual issue with the MP4::Info solution. At the moment it seems like you're just trying to shout it down without even looking at it.