in reply to Re: Re: Reading file into a numbered hash
in thread Reading file into a numbered hash

I only offered that because if you're dealing with a largish amount of data, it's significantly more efficient to deal with a sequential data structure (such as an array) than a "random-access" one with a sequential key.

Re: Re: Re: Re: Reading file into a numbered hash
by KM (Priest) on Nov 23, 2000 at 02:04 UTC
    I would disagree, since hash lookups aren't exactly slow. And speed isn't always what you want in a script. If I always wanted the fastest way, I would likely use C much of the time instead of Perl. Sometimes you want what makes the most sense to "you", and what "you" find intuitive. Here is an example, sort of the same concept, but a little different...

    I had to work on a project that someone else had written. All his DBI fetches were returned into arrays, maybe for the same reasons you are talking about. So, much of the code looked like:

    do something with $array[8]
    Now, do something with $array[9]

    Well, that means nothing to me. How do I know that element 8 is the 'FNAME' field and element 9 is the 'LNAME' field? I don't! Also, what happens when you change the table? BAM! Things can get seriously out of whack if I add an 'MNAME' field after the 'FNAME' field.

    So, I started off by changing all his fetchrow_array calls to fetchrow_hashref calls. Now I had a more intuitive (and scalable) way to do things like:

    do something with $hashref->{FNAME}
    do something else with $hashref->{LNAME}
    Don't forget $hashref->{MNAME}

    Did I sacrifice some speed by using a hash(ref)? Maybe. Is the script now easier to read and maintain? Definitely! So, which is more efficient? I say it is more efficient to have readable and maintainable code. This saves work-time, which is much more measurable than the fraction-of-a-second differences that may be gained or lost by using one data structure vs. another.
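
A minimal sketch of the point above, without a real database. The column lists, the sample record, and the `make_row` helper are all hypothetical stand-ins; the arrays and hashes only imitate what `fetchrow_array` and `fetchrow_hashref` would return for the same row.

```perl
use strict;
use warnings;

# Hypothetical column lists; the thread only names FNAME, LNAME and MNAME.
my @old_cols = qw(ID FNAME LNAME);
my @new_cols = qw(ID FNAME MNAME LNAME);   # MNAME added after FNAME

# Build a row both ways: positional (array) and by name (hash).
sub make_row {
    my ($cols, %data) = @_;
    my @array = map { $data{$_} } @$cols;
    my %hash  = map { $_ => $data{$_} } @$cols;
    return (\@array, \%hash);
}

my %record = (ID => 1, FNAME => 'Ada', MNAME => 'King', LNAME => 'Lovelace');

my ($old_array, $old_hash) = make_row(\@old_cols, %record);
my ($new_array, $new_hash) = make_row(\@new_cols, %record);

# Positional access silently changes meaning after the table change...
print "old \$array[2]: $old_array->[2]\n";
print "new \$array[2]: $new_array->[2]\n";

# ...while access by name keeps pointing at the same field.
print "old LNAME: $old_hash->{LNAME}\n";
print "new LNAME: $new_hash->{LNAME}\n";
```

Index 2 holds the last name before the change and the middle name after it, whereas the `LNAME` key is unaffected.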

    So, I disagree that it is more efficient to use an array, based on my experiences. But, regardless, this was an exercise to find AWTDI ;-)

    Cheers,
    KM

      Now I understand KM's point in the more general example, but in the specific case that started this thread, I'd like to suggest that Fastolfe's use of a simple array could get you the same readability as a hash, as the keys you need are always natural numbers. The matter of the zero-based array is handled by tossing in an unshift(@array,'');, following the read, after which $array[5] really refers to line 5 of the original file (and I also note that despite its good looks, using '05' as a hash key isn't going to help future maintainability).

      Another benefit is simpler printing:

      print @array;                              # gives the same result as:
      print qq{$hash{$_}} for sort keys %hash;   # this

      My conclusion (for now ;-): arrays are easier when your keys are always going to be sequential integers.
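
The `unshift` idea can be sketched roughly as follows. An in-memory filehandle with six made-up lines stands in for the original file, which the thread does not show.

```perl
use strict;
use warnings;

# Stand-in for the original file: an in-memory filehandle with six lines.
my $text = join '', map { "line $_\n" } 1 .. 6;
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @array = <$fh>;
close $fh;

# Pad the front so that index N matches line N of the file.
unshift @array, '';

print "line five is: $array[5]";   # element 5 now holds the fifth line
print @array;                      # the empty placeholder prints as nothing
```

Note that the array now has one more element than the file has lines, so anything that counts elements has to allow for the placeholder.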

        Another benefit is simpler printing:

        print @array;                              # gives the same result as:
        print qq{$hash{$_}} for sort keys %hash;   # this

        No it doesn't. You are printing an array, and printing a hash by first sorting its keys. They would not give the same result *except* for the case where the array is in the order you want (I hope if you read in a file, no one moved around elements!). So, they are the same, depending on how each was initialized.. but I may be nitpicking :)
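
One further case where the two statements can differ, not spelled out in the post itself: `sort` with no comparison block compares keys as strings. The sketch below uses twelve made-up lines and assumes unpadded numeric keys in one hash and two-digit zero-padded keys in another.

```perl
use strict;
use warnings;

my @array = map { "line $_\n" } 1 .. 12;

# Numbered hash with unpadded keys 1..12 (hypothetical data).
my %hash;
@hash{ 1 .. 12 } = @array;

# Default sort compares keys as strings: 1, 10, 11, 12, 2, ...
my @string_order  = sort keys %hash;
# A numeric comparison restores file order: 1, 2, 3, ...
my @numeric_order = sort { $a <=> $b } keys %hash;

# Zero-padded keys of equal width sort correctly even as strings.
my %padded;
@padded{ map { sprintf '%02d', $_ } 1 .. 12 } = @array;
my @padded_order = sort keys %padded;

print "string:  @string_order\n";
print "numeric: @numeric_order\n";
print "padded:  @padded_order\n";
```

So with unpadded keys, `print $hash{$_} for sort keys %hash` would only match `print @array` for files of fewer than ten lines, unless the sort is made numeric.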

        The matter of the zero-based array is handled by tossing in an unshift(@array,'');, following the read

        Why would I want to do that? Now if I somewhere shift, pop, or otherwise alter the array (not the data set itself), things can still get out of whack.

        using '05' as a hash key isn't going to help future maintainability).

        No? If my file is a large data set (flat file db, maybe) then I think using '5' (remember, if you read my first post, the 0 padding was for nicer printing only) is very maintainable. Well, I don't see how it wouldn't be.. or how using an array would be any more maintainable.

        I am not disputing that an array is easy to work with (I like hashes better), quick (hash lookups aren't too slow), or easy to read (well, sometimes they aren't intuitive).. but the original snippet was, again, AWTDI. Maybe at some point we can actually benchmark it with a large file. But, personally, I still like hashes in general for readability, maintainability, form, and function.
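
A rough sketch of what such a benchmark might look like, using the core `Benchmark` module. The data size, the made-up line contents, and the two `walk_*` subs are assumptions chosen for illustration; a real test would read an actual large file, and the numbers will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical workload: 50_000 "lines" held once in an array
# and once in a hash keyed by line number.
my $n     = 50_000;
my @array = map { "line $_" } 1 .. $n;
my %hash;
@hash{ 1 .. $n } = @array;

# Walk both structures in line order and total the line lengths.
sub walk_array {
    my $total = 0;
    $total += length($array[$_]) for 0 .. $#array;
    return $total;
}

sub walk_hash {
    my $total = 0;
    $total += length($hash{$_}) for 1 .. $n;
    return $total;
}

# Run each for about one CPU second and print a comparison table.
cmpthese(-1, {
    array => \&walk_array,
    hash  => \&walk_hash,
});
```

This only measures sequential lookups; it says nothing about the readability trade-off discussed above.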

        Cheers,
        K "gobble gobble" M