Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am storing some information that is unique per UserID. The problem is that the IDs are not sequential (in fact they range from 50,000ish to 120,000ish, with approx 2000 unique values). My obvious first thought was to use these IDs as indexes into an array. However, then there is the problem of dealing with all of the IDs which are not assigned a value within the array. (This could be a large issue, as I need to create an n x n array to store a metric comparing each ID to every other ID.)
$Social[$id_1][$id_2]
My second thought was to use a hash with the ID as the key. With a hash I would only be storing as many entries as I have IDs.
$Social{$id_1}{$id_2}
Are there downsides to using a hash that I am not considering? Or should I be looking at a different data structure? Thanks. -dm
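
To make the difference concrete, here is a minimal sketch of how each structure grows when a single pair is stored. The two IDs, the metric value, and the variable names are made up for illustration; they are not from the original data.

```perl
use strict;
use warnings;

# Hypothetical IDs and metric value, for illustration only.
my ($id_1, $id_2) = (50123, 119876);

# Array of arrays: Perl extends each array up to the highest index used,
# so every slot below that index exists too (holding undef).
my @social_array;
$social_array[$id_1][$id_2] = 0.42;
my $outer_slots = scalar @social_array;               # $id_1 + 1
my $inner_slots = scalar @{ $social_array[$id_1] };   # $id_2 + 1

# Hash of hashes: only the keys actually stored exist.
my %social_hash;
$social_hash{$id_1}{$id_2} = 0.42;
my $outer_keys = scalar keys %social_hash;
my $inner_keys = scalar keys %{ $social_hash{$id_1} };

print "array: $outer_slots outer slots, $inner_slots inner slots\n";
print "hash:  $outer_keys outer key, $inner_keys inner key\n";
```

The array version allocates slots for every index up to the largest ID, while the hash version holds one key per stored ID.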

Code tags added by GrandFather

Re: Array vs. Hash
by BrowserUk (Patriarch) on Feb 24, 2012 at 20:15 UTC
    Are there downsides for using a hash I am not considering?

    A fully populated 2000x2000-entry HoH (hash of hashes) will require around 250MB. That is well within the bounds of any modern system; even my smartphone can handle it. And you can iterate the entire thing in a couple of seconds, so performance isn't an issue.
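
A minimal sketch of populating and iterating such a HoH, assuming made-up IDs and a placeholder `metric` function (both are illustrative, not the poster's data or metric). It uses 200 IDs rather than 2000 so it runs quickly; memory use and run time are not measured here.

```perl
use strict;
use warnings;

# Hypothetical data: 200 non-sequential IDs starting at 50000.
my @ids = map { 50000 + 35 * $_ } 0 .. 199;

# Placeholder metric, an assumption for illustration only.
sub metric {
    my ($x, $y) = @_;
    return abs($x - $y);
}

# Populate the full n x n hash of hashes.
my %social;
for my $id_1 (@ids) {
    for my $id_2 (@ids) {
        $social{$id_1}{$id_2} = metric($id_1, $id_2);
    }
}

# Iterate every stored pair.
my ($pairs, $total) = (0, 0);
for my $id_1 (keys %social) {
    for my $id_2 (keys %{ $social{$id_1} }) {
        $pairs++;
        $total += $social{$id_1}{$id_2};
    }
}
print "visited $pairs pairs\n";
```

If the metric is symmetric, storing only one of each (id_1, id_2) pair would roughly halve the number of entries.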

    Whether it is the best mechanism for your particular application rather depends on whether this is a one-off program or needs to be run many times per day (or hour, or minute), on where the data to populate that HoH comes from, and on what you do with it when the program ends.

    The more information you are able to give about the needs and constraints of the application, the more likely you are to get a meaningful assessment of the options available to you.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Thanks Grandfather for reformatting this post. This humble acolyte will endeavor not to make the same error again.

      The purpose of the app is to perform some basic data analysis on students. This is a one off project. I will be performing the analysis a few times on different data sets and compare the results.

      Based on the suggestions, I have implemented it using a HoH. The application is working, and I am getting usable results.

      Thanks!

        The purpose of the app is to perform some basic data analysis on students. This is a one off project. I will be performing the analysis a few times on different data sets and compare the results.

        Then the hash is the perfect choice for your needs :)



Re: Array vs. Hash
by nemesdani (Friar) on Feb 24, 2012 at 19:48 UTC
    I think you should definitely use a hash in this case. The IDs not being sequential shouldn't be a problem either: you can always sort the keys if the need arises. My vote: hash.
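
A minimal sketch of sorting the keys, using made-up IDs and values for illustration. Because the IDs here mix five- and six-digit numbers, a numeric comparison is used; a plain `sort keys` compares as strings and would put 119876 before 50123.

```perl
use strict;
use warnings;

# Hypothetical IDs and values, for illustration only.
my %social = (
    119876 => { 50123  => 0.1 },
    50123  => { 119876 => 0.1 },
    98765  => { 50123  => 0.7 },
);

# keys returns the IDs in no particular order; <=> sorts them numerically.
my @sorted_ids = sort { $a <=> $b } keys %social;
print "@sorted_ids\n";   # 50123 98765 119876
```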
Re: Array vs. Hash
by CountZero (Bishop) on Feb 24, 2012 at 20:06 UTC
    Definitely a hash, unless you would opt for a proper database structure.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics