your benchmarks always talk about "... seconds using *a hash*"
Did you read the line:
Each run was manually killed after checking that the setting made no difference to the memory usage and ~20 seconds had elapsed. As you can see, it was still only processing ~10 lines/s.
The benchmark code is the same as I posted above. It first tests the hash method and prints its timing; it then runs the BigOR regex method and would print its timing too, except that the regex engine takes 34.5 minutes to complete the test that the hash does in 0.17 seconds. I couldn't be bothered to wait the ~8 hours it would take to complete all 14 runs, so I monitored each test, and when the running line count from the regex test showed that it was still processing at ~1 line per second, I aborted that run (via the task manager) after ~20 seconds.
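For readers without the original post to hand, the shape of such a benchmark might look like the following sketch. The wordlist size and key format here are illustrative stand-ins, not the dataset from the thread:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build an illustrative wordlist and a lookup hash (a small stand-in
# for the much larger key set discussed in the thread).
my @words = map { sprintf 'word%05d', $_ } 1 .. 1000;
my %lookup;
@lookup{@words} = ();

# Method 1: exists() against a hash.
my $t0        = time;
my $hash_hits = grep { exists $lookup{$_} } @words;
printf "hash:  %d hits in %.4fs\n", $hash_hits, time - $t0;

# Method 2: one big-OR alternation over the same keys.
my $alt = join '|', @words;
my $re  = qr/\A(?:$alt)\z/;
$t0 = time;
my $re_hits = grep { /$re/ } @words;
printf "regex: %d hits in %.4fs\n", $re_hits, time - $t0;
```

Both methods should report the same hit count; only the timings differ, and the gap is what the 0.17s-vs-34.5min comparison above is about.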
I was also monitoring the memory usage of the processes, in anticipation that if the trie optimisation was being skipped because it would require more memory than the preset limit, then once that limit had been raised high enough for the trie to be built there would be a very obvious jump in memory usage. No such jump ever occurred. All 14 instances of the program showed an identical 77MB peak memory usage.
If 2^16 equates to 512MiB, then 2^32 must equate to 2^32 * 8192 bytes -> 32TiB (which I obviously do not have); but somewhere between 2^16 and 2^32 there should have been some indication that the trie was being built, and there was not. (I suspect it also has some hard upper limit on the number of alternations it will try to handle.)
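The scaling arithmetic can be checked directly (assuming, as above, that the limit scales linearly with the setting):

```perl
use strict;
use warnings;

# If a setting of 2^16 corresponds to 512MiB, each unit of the setting
# is worth 512MiB / 2^16 bytes.
my $bytes_per_unit = 512 * 2**20 / 2**16;    # 2^29 / 2^16 = 8192 bytes

# At a setting of 2^32, that linear scaling gives 2^32 * 8192 bytes.
my $at_max = 2**32 * $bytes_per_unit;        # 2^45 bytes

printf "%d bytes/unit; 2^32 -> %.0f TiB\n",
    $bytes_per_unit, $at_max / 2**40;
# prints "8192 bytes/unit; 2^32 -> 32 TiB"
```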
I'd love to see a demonstration of the trie optimisation actually doing something.
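One way to look for such a demonstration, as a sketch: compile an alternation under the `re 'debug'` pragma and inspect the compile-time dump it writes to STDERR for TRIE nodes. (`${^RE_TRIE_MAXBUF}` and `use re 'debug'` are documented in perlvar and the re pragma docs respectively; the exact dump format varies between perl versions, so the TRIE-EXACT line shown in the comment is indicative, not guaranteed.)

```perl
use strict;
use warnings;

# ${^RE_TRIE_MAXBUF} caps the memory the engine may spend building a
# trie for an alternation; raising it should, in principle, let larger
# alternations qualify for the optimisation.
${^RE_TRIE_MAXBUF} = 2**20;

# Compiling under re 'debug' dumps the compiled regex program to
# STDERR; a pattern that received the optimisation shows TRIE nodes,
# e.g. a line resembling:
#   TRIE-EXACT[bf] ...
use re 'debug';
my $re = qr/foo|for|fob|bar|baz/;

print "matched\n" if 'fob' =~ $re;
```

If raising `${^RE_TRIE_MAXBUF}` never changes whether TRIE nodes appear in the dump for a given alternation, that would corroborate the observation above that no trie was being built at any setting.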
In reply to Re^5: Efficient matching with accompanying data by BrowserUk
in thread Efficient matching with accompanying data by Endless