your benchmarks always talk about "... seconds using *a hash*"
Did you read the line:
Each run was manually killed after checking that the setting made no difference to the memory usage and ~20 seconds had elapsed. As you can see, it was still only processing ~10 lines/s.
The benchmark code is the same as I posted above. It first tests the hash method and prints its timing; it then runs the BigOR regex method and would print its timing too, except that the regex engine takes 34.5 minutes to complete the test that the hash does in 0.17 seconds. I couldn't be bothered to wait the ~8 hours it would take to complete all 14 runs, so I monitored each test, and when the running line count from the regex test showed that it was still processing at ~1 line per second, I aborted that run (via the task manager) after ~20 seconds.
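For readers without the original post to hand, the shape of such a benchmark might look like the following sketch. The wordlist size and key format here are illustrative stand-ins, not the dataset from the thread:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build an illustrative wordlist and a lookup hash (a small stand-in
# for the much larger key set discussed in the thread).
my @words = map { sprintf 'word%05d', $_ } 1 .. 1000;
my %lookup;
@lookup{@words} = ();

# Method 1: exists() against a hash.
my $t0        = time;
my $hash_hits = grep { exists $lookup{$_} } @words;
printf "hash:  %d hits in %.4fs\n", $hash_hits, time - $t0;

# Method 2: one big-OR alternation over the same keys.
my $alt = join '|', @words;
my $re  = qr/\A(?:$alt)\z/;
$t0 = time;
my $re_hits = grep { /$re/ } @words;
printf "regex: %d hits in %.4fs\n", $re_hits, time - $t0;
```

Both methods should report the same hit count; only the timings differ, and the gap is what the 0.17s-vs-34.5min comparison above is about.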
I was also monitoring the memory usage of the processes, in anticipation that if the trie optimisation was being skipped because it would require more memory than the preset limit, then once that limit had been raised high enough for the trie to be built there would be a very obvious jump in memory usage. No such jump ever occurred. All 14 instances of the program showed an identical 77MB peak memory usage.
If 2^16 equates to 512MiB, then 2^32 must equate to 2^32 * 8192 bytes -> 32TiB (which I obviously do not have); but somewhere between 2^16 and 2^32 there should have been some indication that the trie was being built, and there was not. (I suspect it also has some hard upper limit on the number of alternations it will try to handle.)
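The scaling arithmetic can be checked directly (assuming, as above, that the limit scales linearly with the setting):

```perl
use strict;
use warnings;

# If a setting of 2^16 corresponds to 512MiB, each unit of the setting
# is worth 512MiB / 2^16 bytes.
my $bytes_per_unit = 512 * 2**20 / 2**16;    # 2^29 / 2^16 = 8192 bytes

# At a setting of 2^32, that linear scaling gives 2^32 * 8192 bytes.
my $at_max = 2**32 * $bytes_per_unit;        # 2^45 bytes

printf "%d bytes/unit; 2^32 -> %.0f TiB\n",
    $bytes_per_unit, $at_max / 2**40;
# prints "8192 bytes/unit; 2^32 -> 32 TiB"
```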
I'd love to see a demonstration of the trie optimisation actually doing something.
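One way to look for such a demonstration, as a sketch: compile an alternation under the `re 'debug'` pragma and inspect the compile-time dump it writes to STDERR for TRIE nodes. (`${^RE_TRIE_MAXBUF}` and `use re 'debug'` are documented in perlvar and the re pragma docs respectively; the exact dump format varies between perl versions, so the TRIE-EXACT line shown in the comment is indicative, not guaranteed.)

```perl
use strict;
use warnings;

# ${^RE_TRIE_MAXBUF} caps the memory the engine may spend building a
# trie for an alternation; raising it should, in principle, let larger
# alternations qualify for the optimisation.
${^RE_TRIE_MAXBUF} = 2**20;

# Compiling under re 'debug' dumps the compiled regex program to
# STDERR; a pattern that received the optimisation shows TRIE nodes,
# e.g. a line resembling:
#   TRIE-EXACT[bf] ...
use re 'debug';
my $re = qr/foo|for|fob|bar|baz/;

print "matched\n" if 'fob' =~ $re;
```

If raising `${^RE_TRIE_MAXBUF}` never changes whether TRIE nodes appear in the dump for a given alternation, that would corroborate the observation above that no trie was being built at any setting.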
In reply to Re^5: Efficient matching with accompanying data by BrowserUk
in thread Efficient matching with accompanying data by Endless