in reply to Re^5: Search hex string in vary large binary file
in thread Search hex string in vary large binary file

your code takes 1.6s to complete ... patched MP4::Info less than 0.1s.

Hm. Sample code? The patch?

P.S. I'm sure you've read "You can't parse (X)HTML with regex" :-)

Tim Bray, one of the guys who put together the XML spec, does (and apparently prefers to); but that's by the by ...

"Parse", in the sense of read-tokenise-build a structure that represents the entire document: I probably could, but it'd be more work than I'd take on. Especially when there are free modules that will do that for me.

But if you want to extract a few values from within a jumble of text for which there is no parser, regex is the way to go.

So, if I don't give a fig for the structure of the document, I treat it as a "jumble of text"; and get the job done.

All I need is a unique anchor. And there *always* is one.
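As a minimal sketch of the "unique anchor" approach: the markup fragment, the `id="price"` anchor and the `extract_price` helper below are all made up for illustration, not taken from any real document in this thread.

```perl
use strict;
use warnings;

# Hypothetical input: a fragment of markup we have no intention of parsing.
my $text = <<'END';
<div class="item"><b>Widget</b> <span id="price">42.50</span> <i>in stock</i></div>
END

# 'id="price">' is the (assumed) unique anchor; capture the number after it.
sub extract_price {
    my ($jumble) = @_;
    my ($price) = $jumble =~ m{id="price">\s*([0-9]+(?:\.[0-9]+)?)};
    return $price;    # undef if the anchor is absent
}

print extract_price($text), "\n";
```

The rest of the document's structure never enters into it; only the text immediately around the anchor has to be stable.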


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Re^7: Search hex string in vary large binary file
by Anonymous Monk on Feb 07, 2015 at 21:24 UTC

    Here's my test script; the patch to MP4::Info is as described earlier. The "scan" code hopefully does your solution justice. Note that this particular script currently only tests for the presence of the HDVD flag; it doesn't look at whether it has a value of 1 (720p) or 2 (1080p). That shouldn't make any difference for this benchmark.

      Note that this particular script currently only tests for the presence of the HDVD flag; it doesn't look at whether it has a value of 1 (720p) or 2 (1080p)

      So, you're benchmarking a script that finds what the OP is looking for; against a script that doesn't; on files that don't contain it; using a "ready made solution", that you had to patch -- to look for 1/6th of the information required -- and concluding it's faster.

      Yeah right!


        So, you're benchmarking a script that finds what the OP is looking for; against a script that doesn't

        The fix is to change return !!$tag->{HDVD}; to return $tag->{HDVD} && $tag->{HDVD}==2;. The check was set up like that initially to match the "AtomicParsley" check, which was just meant to be a quick double-check on the results.
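To make the difference between the two checks concrete, here is a small sketch. The `$tag` hashes are hypothetical stand-ins for whatever the tag reader returns, and the sub names `has_hdvd` and `is_1080p` are mine, not part of MP4::Info.

```perl
use strict;
use warnings;

# Original check: true whenever the HDVD flag is present and non-zero.
sub has_hdvd { my ($tag) = @_; return !!$tag->{HDVD} }

# Revised check: true only when the flag is present and equals 2 (1080p).
sub is_1080p { my ($tag) = @_; return $tag->{HDVD} && $tag->{HDVD} == 2 }

# Hypothetical tag hashes: flag unset, 1 (720p) and 2 (1080p).
for my $tag ( {}, { HDVD => 1 }, { HDVD => 2 } ) {
    printf "HDVD=%s present=%d 1080p=%d\n",
        $tag->{HDVD} // 'unset',
        has_hdvd($tag) ? 1 : 0,
        is_1080p($tag) ? 1 : 0;
}
```

The two checks only disagree for a 720p file, where the flag is present but its value is 1.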

        on files that don't contain it

        Do you think the OP has the flag set in every one of the files? If there's only one file that doesn't have the flag set, scanning will be slower overall. Even if every file were to have the flag set, scanning is only fast if the flag appears near the beginning of every file. The MP4::Info solution appears to be fast regardless of whether the flag is set or not.
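To illustrate why a scan is only cheap when the flag turns up early, here is a rough sketch of a chunked search with early exit. It is not the code benchmarked in this thread: `scan_for` is a hypothetical helper, and using the literal bytes "hdvd" as the needle is an assumption made for the demo.

```perl
use strict;
use warnings;

# Return the byte offset of the first occurrence of $needle read from $fh,
# or -1 if it is absent. Reads $chunk bytes at a time and keeps the last
# length($needle)-1 bytes so a match straddling two reads is not missed.
sub scan_for {
    my ($fh, $needle, $chunk) = @_;
    $chunk ||= 1 << 20;                    # default to 1 MiB reads
    my $keep = length($needle) - 1;
    my ($buf, $base) = ('', 0);            # $base = file offset of $buf's first byte
    while (read($fh, my $block, $chunk)) {
        $buf .= $block;
        my $pos = index($buf, $needle);
        return $base + $pos if $pos >= 0;  # early exit on the first hit
        my $drop = length($buf) - $keep;
        if ($drop > 0) {
            $base += $drop;
            substr($buf, 0, $drop, '');    # discard everything but the overlap
        }
    }
    return -1;                             # read to EOF, nothing found
}

# Demo on an in-memory "file"; a real file should be opened with binmode.
my $data = ("\0" x 5000) . "hdvd" . ("\0" x 5000);
open my $fh, '<', \$data or die $!;
print scan_for($fh, 'hdvd', 1024), "\n";   # small chunk to exercise the overlap
```

A hit near the start returns after one read; a file without the flag forces a read of every byte before the -1 comes back, which is the case that dominates the total time.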

        using a "ready made solution", that you had to patch

        The patch takes a few clicks to find on RT and applies cleanly. What is the problem?

        Yeah right!

        Considering I'm not even an expert on the file format and the set of input data I happened to have lying around may not be representative, I'd actually appreciate it if someone were to find an actual issue with the MP4::Info solution. At the moment it seems like you're just trying to shout it down without even looking at it.