in reply to Best method to diff very large array efficiently

Can there be duplicate values in either array?

What information do you need as your result?

  1. The overlap between the arrays?
  2. What is left in the first array, once anything also found in the second is removed?
  3. Or vice versa?
  4. Or both?
  5. Or all three?
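Here is a minimal sketch of how the answer changes the code. It is written in Python for illustration: a Python set plays the role of a Perl hash, and the function name compare is hypothetical. Duplicates in the first array are carried through into the outputs, which is why the question about duplicates matters:

```python
def compare(arr_1, arr_2):
    """Return (overlap, only_in_1, only_in_2), each in its source array's order."""
    seen_1 = set(arr_1)  # hash lookup table for the first array
    seen_2 = set(arr_2)  # hash lookup table for the second array
    overlap = [x for x in arr_1 if x in seen_2]        # option 1
    only_in_1 = [x for x in arr_1 if x not in seen_2]  # option 2
    only_in_2 = [x for x in arr_2 if x not in seen_1]  # option 3 (vice versa)
    return overlap, only_in_1, only_in_2


print(compare([3, 1, 2, 2], [2, 4]))  # ([2, 2], [3, 1], [4])
```

If you only need one of these results, build only the lookup table it uses. For a very large array, that halves the memory.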

BTW: In your example, you name the keys that remain in the hash @dropped, which, in isolation, doesn't make a lot of sense.

Also, if you use a hash to determine this, there is no point in sorting the arrays first: it makes no difference to the result and just costs time.
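The following quick illustration of the point about sorting again uses Python sets in place of Perl hashes, with made-up test arrays. Hash-based membership gives the same answer whether or not the input was sorted first, so the O(n log n) sort is pure overhead:

```python
import random

# Hypothetical test arrays that overlap on 10..19.
arr_1 = list(range(0, 20))
arr_2 = list(range(10, 30))
random.shuffle(arr_1)
random.shuffle(arr_2)

# Hash (set) based difference on the unsorted input...
from_unsorted = set(arr_1) - set(arr_2)
# ...and on pre-sorted copies: the extra sort changes nothing.
from_sorted = set(sorted(arr_1)) - set(sorted(arr_2))

print(from_unsorted == from_sorted)  # True
print(sorted(from_unsorted))         # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```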


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Best method to diff large array
by newbieperlperson (Acolyte) on Nov 25, 2013 at 05:13 UTC
    Thank you for taking the time to respond.

    I agree, the sort is not required and I will remove that.

    The information I need is the elements of @arr_1 that are not in @arr_2.

    I am not at work right now, but I will make the edits to the code tomorrow and check the results.

    I agree that @dropped is a poor name; I will change it to @diff.
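For the one-sided difference wanted here, the usual Perl idiom loads only @arr_2 into a hash and greps @arr_1 against it, roughly `my %in_2; @in_2{@arr_2} = (); my @diff = grep { !exists $in_2{$_} } @arr_1;`. Below is a Python sketch of the same idea that preserves the order of @arr_1. The function name diff is hypothetical:

```python
from typing import Iterable, List


def diff(arr_1: Iterable[int], arr_2: Iterable[int]) -> List[int]:
    """Elements of arr_1 that do not appear in arr_2, in arr_1's order.

    Only arr_2 is loaded into a hash (set). arr_1 is merely iterated,
    so it could just as well be a generator reading lines from a file.
    """
    lookup = set(arr_2)                           # one pass over arr_2
    return [x for x in arr_1 if x not in lookup]  # one pass over arr_1


print(diff([5, 3, 9, 1, 7], [9, 2, 5]))  # [3, 1, 7]
```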

      You didn't say whether the values in each of the two arrays are unique.

      Also, what are the values in the arrays? I.e. strings, numbers, integers, small(ish) integers, etc.?



        Hi there,

        I updated the post based on your questions. To answer your question: the values are unique, and they are of an INT data type.

        Thanks,

        AJ
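Because the values are unique integers, a bit vector can replace the hash, as the earlier question about small(ish) integers suggests. It stores one bit per possible value instead of a full hash entry per element. Here is a minimal Python sketch. It assumes every value is a non-negative int no larger than a known max_value; diff_bitmap and max_value are hypothetical names. In Perl, the built-in vec can manipulate a bit string the same way:

```python
def diff_bitmap(arr_1, arr_2, max_value):
    """Elements of arr_1 not in arr_2, using one bit per possible value.

    Assumes every value is an int in range(0, max_value + 1).
    """
    bits = bytearray(max_value // 8 + 1)  # all bits start cleared
    for v in arr_2:
        bits[v >> 3] |= 1 << (v & 7)      # set bit v
    # Keep v only if its bit was never set by arr_2.
    return [v for v in arr_1 if not bits[v >> 3] & (1 << (v & 7))]


print(diff_bitmap([12, 0, 7, 8, 300], [7, 300, 1], max_value=300))  # [12, 0, 8]
```

Memory is roughly max_value / 8 bytes, so this pays off only when the range of values is not much larger than the arrays themselves.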