Re^3: Search hex string in vary large binary file

Re^4: Search hex string in vary large binary file by BrowserUk (Patriarch) on Feb 07, 2015 at 16:18 UTC
Hm. I see this in the same way I see extracting one or two pieces of information from a web page. I can either laboriously parse the entire structure of the document into a complex data structure and then traverse it to obtain the bits, or I can treat the whole thing as unstructured data and just grab the bits I need. In the OP's case, given that he only wants a yea or nay answer, and the odds of a false positive are so minuscule, parsing the entire file is a waste of CPU cycles, time, and effort.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
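For readers following along, here is a minimal sketch of the "treat it as unstructured data" approach: read the file in fixed-size chunks and look for the raw bytes with `index`. It is not the code being discussed in this thread (that code is not shown here); the helper name `file_contains`, the chunk size, and the example needle are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the raw byte string $needle occurs in $path.
# Reads fixed-size chunks and carries the last length($needle)-1 bytes
# forward, so a match that straddles a chunk boundary is still found.
sub file_contains {
    my ($path, $needle, $chunk_size) = @_;
    die "Empty needle\n" unless length $needle;
    $chunk_size ||= 1024 * 1024;              # assumed default: 1 MiB
    my $keep = length($needle) - 1;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $tail = '';
    while (1) {
        my $got = read($fh, my $chunk, $chunk_size);
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;                    # end of file
        my $window = $tail . $chunk;
        return 1 if index($window, $needle) >= 0;
        # Keep only the bytes that could begin a match completed by the next chunk.
        $tail = length($window) > $keep
              ? substr($window, length($window) - $keep)
              : $window;
    }
    return 0;
}
```

A call might look like `file_contains($ARGV[0], pack('H*', 'deadbeef'))`, where the hex string is a placeholder rather than the OP's actual pattern. Note that this still reads every byte of the file, which is the cost the following replies argue about.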
Re^5: Search hex string in vary large binary file by Anonymous Monk on Feb 07, 2015 at 19:56 UTC
"In the OPs case, given he only wants a yay or nay answer ... parsing the entire file is a waste of cpu cycles, time, and effort."
You probably have much more experience working with big data than I do, but in this case: on my machine, on a single 1.6GB video file where there is no match, your code takes 1.6s to complete (when working from disk cache), whereas a patched MP4::Info comes up with an answer in less than 0.1s. And MP4::Info scans 41GB of video files in under a second.
Example code: `use MP4::Info 'get_mp4tag'; my $tag = get_mp4tag($ARGV[0]); print $tag && $tag->{HDVD} && $tag->{HDVD}==2 ? "Match! $ARGV[0]\n" : "No Match!\n";`
In the general case, I tend to think the right tool for the job is much more likely to be a module (if one exists), except maybe in the case of large amounts of input data, where optimizations may be necessary.
P.S. I'm sure you've read You can't parse (X)HTML with regex :-)
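One plausible reason a format-aware reader can answer so quickly: an MP4 file is a sequence of length-prefixed boxes (atoms), so a reader can seek past bulk payloads such as media data instead of reading them. The sketch below only lists top-level boxes to show that idea. It is not MP4::Info's code, nor the patch mentioned above (neither is shown here); the helper name `top_level_boxes` is hypothetical, and the `Q>` unpack format assumes a Perl built with 64-bit integer support.

```perl
use strict;
use warnings;

# Hypothetical helper: return [type, offset, size] for each top-level box,
# seeking past every payload rather than reading it.
sub top_level_boxes {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $file_size = -s $fh;
    my ($pos, @boxes) = (0);

    while ($pos + 8 <= $file_size) {
        seek $fh, $pos, 0 or die "Seek failed on $path: $!";
        (read($fh, my $header, 8) // 0) == 8 or last;
        # Box header: 32-bit big-endian size, then a 4-byte type code.
        my ($size, $type) = unpack 'N a4', $header;
        my $header_len = 8;
        if ($size == 1) {                     # 64-bit size follows the type
            (read($fh, my $large, 8) // 0) == 8 or last;
            $size = unpack 'Q>', $large;
            $header_len = 16;
        }
        elsif ($size == 0) {                  # box extends to end of file
            $size = $file_size - $pos;
        }
        last if $size < $header_len;          # malformed; avoid looping forever
        push @boxes, [$type, $pos, $size];
        $pos += $size;                        # jump over the payload
    }
    return @boxes;
}
```

On a typical file, only a few header bytes per box are read, so the cost depends on the number of boxes rather than the file size. Whether that fully accounts for the timings quoted above is an assumption, not something measured here.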
Re^6: Search hex string in vary large binary file by BrowserUk (Patriarch) on Feb 07, 2015 at 20:55 UTC
"your code takes 1.6s to complete ... patched MP4::Info less than 0.1s."
Hm. Sample code? The patch?
"P.S. I'm sure you've read You can't parse (X)HTML with regex :-)"
Tim Bray, one of the guys who put together the XML spec, does (and apparently prefers to); but that's by the by.
"Parse", in the sense of read, tokenise, and build a structure that represents the entire document: I probably could, but it'd be more work than I'd take on, especially when there are free modules that will do that for me. But if you want to *extract* a few values from within a jumble of text for which there is no parser, regex is the way to go.
So, if I don't give a fig for the structure of the document, I treat it as a "jumble of text" and get the job done. All I need is a unique anchor. And there always is one.
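To make the "unique anchor" idea concrete, here is a small sketch that pulls one number out of a fragment of markup without parsing it. The sample text, the anchor string `Total:`, and the helper name `value_after_anchor` are all invented for illustration; real input would need its own anchor and its own idea of what a value looks like.

```perl
use strict;
use warnings;

# Hypothetical input: markup we have no interest in parsing as a whole.
my $text = <<'END';
<div class="hdr"><span>junk</span><td id="price">Total: 42.50 USD</td></div>
END

# Hypothetical helper: return the number that follows a literal anchor
# string, or undef if the anchor (followed by a number) is not present.
sub value_after_anchor {
    my ($text, $anchor) = @_;
    # \Q...\E quotes the anchor so its characters are matched literally.
    return $text =~ /\Q$anchor\E\s*([0-9]+(?:\.[0-9]+)?)/ ? $1 : undef;
}

my $total = value_after_anchor($text, 'Total:');
print defined $total ? "Total is $total\n" : "Anchor not found\n";
```

The approach is only as good as the anchor: if `Total:` could appear elsewhere in the input, the first occurrence wins, which is the false-positive risk weighed earlier in the thread.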
Re^7: Search hex string in vary large binary file by Anonymous Monk on Feb 07, 2015 at 21:24 UTC
Re^8: Search hex string in vary large binary file by BrowserUk (Patriarch) on Feb 08, 2015 at 05:08 UTC
Some notes below your chosen depth have not been shown here