in reply to Re^7: Best way to store/sum multiple-field records? ("significant")
in thread Best way to store/sum multiple-field records?
> Well, if you had repeated the same check that I did above (verify that your different approaches are actually doing the same thing) ...

Sadly enough, I actually did that before running the benchmark, as shown (only in part) here:

```
$ perl -e '$_ = "USERID1|2215|Jones|";
> my( $x, $y, $z ) = split /\|/;
> print "( $x, $y, $z )\n";'
( USERID1, 2215, Jones )
$ perl -e '$_ = "USERID1|2215|Jones|";
> ( $x, $y, $z ) = split /\|/, 3;
> print "( $x, $y, $z )\n";
> '
( 3, , )
$ perl -e '$_ = "USERID1|2215|Jones|";
> ( $x, $y, $z ) = split /\|/, $_, 3;
> print "( $x, $y, $z )\n";
> '
( USERID1, 2215, Jones| )
```

but I looked at the results too quickly and failed to notice the difference ("Jones" versus "Jones|"). And this difference is quite significant.
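The difference in the session above follows from a documented rule of `split`: when the result is assigned to a list of scalars and LIMIT is omitted, Perl treats LIMIT as one more than the number of variables, so a trailing separator lands in a discarded extra field. With an explicit LIMIT of 3, the last field keeps everything after the second separator. A minimal sketch of both cases (variable names are illustrative only):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A record with a trailing separator, as in the session above.
my $record = "USERID1|2215|Jones|";

# No explicit LIMIT, list assignment to three scalars: Perl uses an
# implicit LIMIT of 4, so the trailing "|" goes into a dropped 4th field.
my ( $id1, $amount1, $name1 ) = split /\|/, $record;

# Explicit LIMIT of 3: the third field keeps the trailing "|".
my ( $id2, $amount2, $name2 ) = split /\|/, $record, 3;

# Without a trailing separator, both forms yield the same three fields.
my $clean      = "USERID1|2215|Jones";
my @no_limit   = split /\|/, $clean;
my @with_limit = split /\|/, $clean, 3;

print "no limit:   ( $id1, $amount1, $name1 )\n";
print "limit of 3: ( $id2, $amount2, $name2 )\n";
print "clean data: (@no_limit) vs (@with_limit)\n";
```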
So I decided to run the test again, this time without changing the code but changing the data to:

```perl
my @strings = qw(
    USERID1|2215|Jones
    USERID1|1000|Jones
    USERID3|1495|Dole
    USERID2|2500|Francis
    USERID2|1500|Francis
);
```

simply because this is more in line with the kind of data I most often have to deal with (no separator at the end of the line). Here is the result:

```
$ perl bench_inside_outside.pl
             Rate  outside outside2   inside  inside2
outside  110902/s       --      -2%     -39%     -40%
outside2 113390/s       2%       --     -38%     -39%
inside   181595/s      64%      60%       --      -2%
inside2  186121/s      68%      64%       2%       --
```

Now, clearly, a 2% difference is not significant. This shows that my original, untested opinion (that it does not really matter whether you put a limit on the split when the number of available fields equals the limit) was correct, and that my subsequent, opposite opinion, based on a faulty test, was wrong. Thank you for your enlightenment on this. Just in case someone worries: I am not concluding from this that I should trust my untested opinions over my test results, only that I should be more cautious about the significance of my tests.
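The general lesson is to confirm that the variants agree on the data before timing them. The script below is a hypothetical sketch, not the original `bench_inside_outside.pl` (which is not shown here): it compares only `split` with and without a LIMIT, not the "inside"/"outside" variants, and the sub names `fields_no_limit` and `fields_with_limit` are made up for illustration. It uses `cmpthese` from the core Benchmark module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample data in the same shape as @strings above (no trailing separator).
my @strings = qw(
    USERID1|2215|Jones    USERID1|1000|Jones
    USERID3|1495|Dole     USERID2|2500|Francis
    USERID2|1500|Francis
);

# Hypothetical variant 1: no explicit LIMIT (implicit LIMIT of 4).
sub fields_no_limit {
    my ( $id, $amount, $name ) = split /\|/, $_[0];
    return "$id,$amount,$name";
}

# Hypothetical variant 2: explicit LIMIT of 3.
sub fields_with_limit {
    my ( $id, $amount, $name ) = split /\|/, $_[0], 3;
    return "$id,$amount,$name";
}

# Step 1: before timing anything, check that both variants agree.
my @mismatches = grep { fields_no_limit($_) ne fields_with_limit($_) } @strings;
die "Variants disagree on: @mismatches\n" if @mismatches;

# Step 2: only then compare speed (-1 = run each for at least 1 CPU second).
cmpthese( -1, {
    no_limit   => sub { fields_no_limit($_)   for @strings },
    with_limit => sub { fields_with_limit($_) for @strings },
} );
```

Fed the trailing-separator records from the first session instead, step 1 would die rather than print misleading figures.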
Without getting into the details of your very interesting post, I would say that sometimes I really need to know whether one way of doing things is significantly faster than another (for example, s/// versus tr///, or m// versus index()). But in the end, only real tests with real data really make sense. The Benchmark module is quite useful for pruning the tree of possible courses of action early on; what ultimately matters is testing with real data.
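For the m// versus index() case, the same check-then-time pattern applies. A hypothetical sketch searching for a fixed substring (the records, the needle "Francis" and the sub names are illustrative only):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @records = qw( USERID1|2215|Jones USERID3|1495|Dole USERID2|2500|Francis );
my $needle  = "Francis";    # hypothetical search string

# Regex variant: \Q...\E quotes metacharacters so the needle is literal.
sub has_match_regex { return $_[0] =~ /\Q$needle\E/ ? 1 : 0 }

# index() variant: returns -1 when the substring is absent.
sub has_match_index { return index( $_[0], $needle ) >= 0 ? 1 : 0 }

# Check that both variants agree before timing them.
for my $rec (@records) {
    die "Variants disagree on $rec\n"
        unless has_match_regex($rec) == has_match_index($rec);
}

cmpthese( -1, {
    regex => sub { has_match_regex($_) for @records },
    index => sub { has_match_index($_) for @records },
} );
```

The `\Q...\E` matters here: without it, a needle containing `|` would be read as regex alternation, and the two variants would silently answer different questions.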
I am dealing with a customer base of 35 million, with about a billion billing services and tens of billions of usage records (phone calls, SMS, Internet connections, video downloads, etc.) per month. Performance matters to me.
Benchmarks produced with the Benchmark module give quite interesting information about the best way to do things, but the really interesting data comes from actual testing.