in reply to Compare two arrays of simple numbers

This node falls below the community's minimum standard of quality and will not be displayed.

Re^2: Compare two arrays of simple numbers
by punch_card_don (Curate) on Oct 03, 2007 at 00:01 UTC
    Bad day?

    You're right - forgot to specify that the integers are 1 to 9.

      <nitpick> Actually, rather than forgetting to specify that the arrays are expected to contain only integers in the single-decimal-digit range (>0, <10), what you said was:

      I now have two identical arrays, even if the sort is ascii-betical instead of true numerical.

      which sort of implies that you would expect to see some integers >9 (still assuming we are only talking about positive integers).
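To make the distinction concrete, here is a minimal sketch of how an ascii-betical sort and a numeric sort diverge once a value above 9 shows up. The sample values are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Made-up sample data containing a two-digit value.
my @nums = (10, 9, 2, 1);

# The default sort compares elements as strings, so "10" lands before "2".
my @ascii = sort @nums;

# A numeric sort needs an explicit <=> comparison.
my @numeric = sort { $a <=> $b } @nums;

print "ascii:   @ascii\n";      # ascii:   1 10 2 9
print "numeric: @numeric\n";    # numeric: 1 2 9 10
```

Either ordering is fine for an equality test, provided the same sort is applied to both arrays; the two orders only coincide when every value is a single digit.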

      BTW, how big are these arrays, and should we expect repeated values? If there are repeated values, when you say "test if @array_1 contains exactly the same set of integers as @array_2" do you mean "the same quantities of elements for each observed value", or simply "the same values present, any number of times"? </nitpick>
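The two readings of "the same set of integers" lead to different tests. As a rough sketch, with hypothetical helper names `same_counts` and `same_values` (neither comes from the thread):

```perl
use strict;
use warnings;

# Hypothetical helper: true if both arrays hold the same values with the
# same number of occurrences of each.
sub same_counts {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;
    my %count;
    $count{$_}++ for @$x;
    for my $v (@$y) {
        return 0 unless $count{$v};    # value missing or already used up
        $count{$v}--;
    }
    return 1;    # lengths were equal, so every count is back to zero
}

# Hypothetical helper: true if the same distinct values are present,
# no matter how many times each one appears.
sub same_values {
    my ($x, $y) = @_;
    my (%in_x, %in_y);
    @in_x{@$x} = ();
    @in_y{@$y} = ();
    return 0 unless keys(%in_x) == keys(%in_y);
    for my $v (keys %in_x) {
        return 0 unless exists $in_y{$v};
    }
    return 1;
}

# Made-up sample data: same distinct values, different quantities.
my @first  = (1, 2, 2, 3);
my @second = (3, 1, 2, 3);

print "same_counts: ", (same_counts(\@first, \@second) ? "yes" : "no"), "\n";
print "same_values: ", (same_values(\@first, \@second) ? "yes" : "no"), "\n";
```

With this sample data the two helpers disagree: the distinct values match, but 2 and 3 appear a different number of times in each array.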

      You've heard about the Benchmark module, right? Have you tried that with "join" vs. something else to conclude that "join" is "heavy"?

      As always with this sort of problem, hashes come to mind, but you would need Benchmark to see how it compares to sorting and stringifying.

      Here's a test of hash vs. sort-join-string-compare vs. sort-iterate-numeric-compare, checking results for both "same" and "diff" data sets (requiring that repeated values appear with the same quantity in order to be "same"), with options to change array size and max value in the array:
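The test script itself is not reproduced here. What follows is only a sketch of how such a comparison could be set up with the core Benchmark module, not the original test.pl: the sub names, the way the "same" and "diff" arrays are built, and the "," separator in the join are all assumptions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Count values from the first array up and from the second array down;
# any nonzero count left over means the arrays differ.
sub cmp_hash {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;
    my %count;
    $count{$_}++ for @$x;
    $count{$_}-- for @$y;
    for my $n (values %count) {
        return 0 if $n;
    }
    return 1;
}

# Sort both arrays, join each into one string, and compare the strings.
# The "," separator is an assumption; it keeps multi-digit values apart.
sub cmp_sortjoin {
    my ($x, $y) = @_;
    return join(',', sort @$x) eq join(',', sort @$y) ? 1 : 0;
}

# Sort both arrays numerically, then compare element by element.
sub cmp_sortcomp {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;
    my @sx = sort { $a <=> $b } @$x;
    my @sy = sort { $a <=> $b } @$y;
    for my $i (0 .. $#sx) {
        return 0 if $sx[$i] != $sy[$i];
    }
    return 1;
}

# Assumed command-line options: array size, then largest value.
my $size = shift(@ARGV) || 9;
my $max  = shift(@ARGV) || 9;

my @one  = map { 1 + int rand $max } 1 .. $size;
my @same = reverse @one;                 # same elements, different order
my @diff = @one;
$diff[0] = $diff[0] % $max + 1;          # change one element (needs $max > 1)

for my $case ([ diff => \@diff ], [ same => \@same ]) {
    my ($label, $other) = @$case;
    # Run each sub for at least one CPU second and print a comparison chart.
    cmpthese(-1, {
        hash     => sub { cmp_hash(\@one, $other) },
        sortjoin => sub { cmp_sortjoin(\@one, $other) },
        sortcomp => sub { cmp_sortcomp(\@one, $other) },
    });
    print "results ($label):",
        " hash:",     (cmp_hash(\@one, $other)     ? "same" : "diff"),
        " sortjoin:", (cmp_sortjoin(\@one, $other) ? "same" : "diff"),
        " sortcomp:", (cmp_sortcomp(\@one, $other) ? "same" : "diff"), "\n";
}
```

Timings from a sketch like this will vary with the machine and Perl version, so they should not be expected to match the figures quoted in this post.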

      And here are the timing results (and output) for a "default" run (array size: 9, max value: 9, running on a 1GHz G4 PowerPC, macosx 10.4.10):
      $ test.pl
                  Rate sortcomp     hash sortjoin
      sortcomp 20202/s       --      -0%     -58%
      hash     20243/s       0%       --     -58%
      sortjoin 48077/s     138%     137%       --
      results: hash:diff sortjoin:diff sortcomp:diff
                  Rate sortcomp     hash sortjoin
      sortcomp 17857/s       --     -26%     -63%
      hash     24213/s      36%       --     -50%
      sortjoin 48544/s     172%     100%       --
      results: hash:same sortjoin:same sortcomp:same

      As you would expect, if we grow the array size but keep to the same limited number of possible array values, the "sortjoin" method will suffer more than the hash method (due to having to build and compare longer strings):
      test.pl 90    ## array size = 90 (lots of repeat values)
                 Rate sortcomp sortjoin     hash
      sortcomp 2199/s       --     -53%     -61%
      sortjoin 4645/s     111%       --     -18%
      hash     5653/s     157%      22%       --
      results: hash:diff sortjoin:diff sortcomp:diff
                 Rate sortcomp sortjoin     hash
      sortcomp 1907/s       --     -59%     -66%
      sortjoin 4684/s     146%       --     -17%
      hash     5659/s     197%      21%       --
      results: hash:same sortjoin:same sortcomp:same

      But overall, I think this is an area where benchmarking, while fun and interesting, is really a bit pointless. Unless this particular task really occupies a huge proportion of what your application is supposed to do, the difference in "efficiency" among these different approaches is likely to be drowned out by everything else the app actually does (file/db/network i/o, etc).
Re^2: Compare two arrays of simple numbers
by Tom_BIX (Initiate) on Jul 05, 2009 at 21:47 UTC
    Don't be a turd. We are all students, we are all teachers. Share what you've got with those in need, and keep your nastiness to yourself.