martzpet has asked for the wisdom of the Perl Monks concerning the following question:

I am building a hash of arrays with information from two files. I would like to do something like:
if ($hash{^$file}) { do something }
First file gives me keys and the first 3 values of the list:
$hash{$thing} = [1, 2, 3];
$thing is generally a serial number and is unique. In a few cases the serial number has (O) or (R) appended to it, so there are three potential forms:
serial serial(O) serial(R)
The second file gives me two more values for the list:
push @{ $hash{$thing} }, 4, 5;
In this file, I am getting data based on the serial number, but here nothing is ever appended to the serial number, so I have:
serial
In order to push, I am looking for the key that starts with serial
^serial
Because of the number of iterations, I want to avoid a foreach over the keys of %hash, and I want to keep the (O) or (R) portion because it is significant. Can you do something like:
if ($hash{^$file}) { do something }

Re: regex of hash key on the fly
by ikegami (Patriarch) on Nov 13, 2007 at 21:32 UTC

    Tie::Hash::Regex provides the interface you want, but internally it loops over all the keys. That's inevitable if you use a regex. If your concern is efficiency, Tie::Hash::Regex is not going to help.
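    A minimal sketch of the point above, using hypothetical sample data: matching keys by regex means testing every key, whereas if only the three forms serial, serial(O) and serial(R) can occur, three exact `exists` lookups find the entry without any scan. The helper names `find_by_scan` and `find_by_probe` are illustrative only, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical sample data: serials with optional (O)/(R) suffixes.
my %hash = (
    'A100'    => [1, 2, 3],
    'B200(O)' => [1, 2, 3],
    'C300(R)' => [1, 2, 3],
);

# What any regex lookup amounts to: test every key (linear in the size of %hash).
sub find_by_scan {
    my ($h, $serial) = @_;
    # \Q...\E quotes any regex metacharacters that appear in the serial itself.
    return grep { /^\Q$serial\E(?:\([OR]\))?$/ } keys %$h;
}

# When only the three variants are possible, probe each exact key instead:
# three constant-time hash lookups, no matter how large %hash is.
sub find_by_probe {
    my ($h, $serial) = @_;
    return grep { exists $h->{$_} } ($serial, "$serial(O)", "$serial(R)");
}

my @scan  = find_by_scan(\%hash, 'B200');
my @probe = find_by_probe(\%hash, 'B200');
print "scan:  @scan\n";     # scan:  B200(O)
print "probe: @probe\n";    # probe: B200(O)
```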

    If I correctly understand what you are trying to do, you could actually store your data in a format that's more useful to you, eliminating the need for regexes entirely.

    $hash{$serial}{''} = ...;
    $hash{$serial}{O}  = ...;
    $hash{$serial}{R}  = ...;
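    A sketch of how that structure might be built, assuming hypothetical in-memory records in place of the two files (the record layout is an assumption, not the OP's format). The regex splits a key such as A100(O) into the serial and the suffix letter, so the second file needs only one exact lookup per serial.

```perl
use strict;
use warnings;

# Hypothetical records from the first file: key followed by three values.
my @first  = (['A100', 1, 2, 3], ['A100(O)', 1, 2, 3], ['B200(R)', 7, 8, 9]);
# Hypothetical records from the second file: bare serial followed by two values.
my @second = (['A100', 4, 5], ['B200', 10, 11]);

my %hash;
for my $rec (@first) {
    my ($key, @vals) = @$rec;
    # Split "A100(O)" into serial "A100" and suffix "O"; a bare key has no suffix.
    my ($serial, $suffix) = $key =~ /^(.*?)(?:\(([OR])\))?$/;
    $suffix = '' unless defined $suffix;
    $hash{$serial}{$suffix} = [@vals];
}

for my $rec (@second) {
    my ($serial, @vals) = @$rec;
    next unless exists $hash{$serial};    # one exact lookup, no scan
    # Append to every variant ('', O, R) recorded for this serial.
    push @{ $hash{$serial}{$_} }, @vals for keys %{ $hash{$serial} };
}
```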
      It hit me like a ton of bricks that this is probably the best thing to do: strip the (O) or (R) and add it into the array so that I can keep track of that information.
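    A minimal sketch of this variant, assuming each serial occurs only once in the first file (bare or with a single suffix) and using hypothetical in-memory records in place of the files. Keeping the suffix as element 0 of the array is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

# Hypothetical records; assumes each serial appears only once in the
# first file, either bare or with a single (O)/(R) suffix.
my @first  = (['A100(O)', 1, 2, 3], ['B200', 1, 2, 3]);
my @second = (['A100', 4, 5], ['B200', 4, 5]);

my %hash;
for my $rec (@first) {
    my ($key, @vals) = @$rec;
    # Remove a trailing "(O)"/"(R)" from the key, capturing the letter.
    my $suffix = $key =~ s/\(([OR])\)$// ? $1 : '';
    # Keep the suffix as element 0 so the information is not lost.
    $hash{$key} = [$suffix, @vals];
}

for my $rec (@second) {
    my ($serial, @vals) = @$rec;
    # Keys are now bare serials, so an exact lookup is enough.
    push @{ $hash{$serial} }, @vals if exists $hash{$serial};
}
```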
Re: regex of hash key on the fly
by locked_user sundialsvc4 (Abbot) on Nov 13, 2007 at 21:46 UTC

    If the files in question are of any significant size, sort the two files by the same key. Then you can dispense with the hash (and all of the virtual memory it consumes) altogether.

    Important: use a disk-based sort.

    As "the COBOL-jockeys of yore" knew well (and as the present ones still know), sorting a large file is an unexpectedly fast operation. If you know that your inputs are sorted, then:

    • Any occurrences of "the same key" will always occur together.
    • When the key-value changes, you must be at the end of all occurrences of the preceding key-value and at the beginning of all occurrences of the new one.
    • If any gaps exist, you know without searching that there are no key-values anywhere within that gap.
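    A minimal sketch of that merge, assuming both inputs are already sorted by serial in Perl's cmp order and that each serial appears at most once per file. Arrays stand in for the sorted files here; with real files you would read one line at a time from each handle, which is what keeps memory use flat. The helper serial_of is hypothetical.

```perl
use strict;
use warnings;

# Stand-ins for two files already sorted by serial (e.g. with a disk-based sort).
my @left  = (['A100(O)', 1, 2, 3], ['B200', 1, 2, 3], ['D400(R)', 1, 2, 3]);
my @right = (['A100', 4, 5], ['C300', 9, 9], ['D400', 4, 5]);

# Comparison key: the serial with any trailing (O)/(R) removed.
sub serial_of { (my $k = $_[0]) =~ s/\([OR]\)$//; return $k }

my @merged;
my ($i, $j) = (0, 0);
while ($i < @left && $j < @right) {
    my $cmp = serial_of($left[$i][0]) cmp $right[$j][0];
    if ($cmp < 0)    { push @merged, [ @{ $left[$i++] } ] }   # no partner on the right
    elsif ($cmp > 0) { $j++ }                                 # right-only serial: skip it
    else {
        # Same serial: combine, keeping the suffixed key from the left file.
        my (undef, @extra) = @{ $right[$j++] };
        push @merged, [ @{ $left[$i++] }, @extra ];
    }
}
push @merged, @left[$i .. $#left];    # remaining left records have no partner
```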

    Although we might not routinely think of "main memory" as "a disk file," that is, in fact, what it is. A page fault is expensive even if it does not result in physical I/O. If you are dealing with more than, say, a few thousand records, the speed gained can be quite startling. (As in, hundreds of times faster.)

    Also... don't let "tie" beguile you. A "tied hash" is not a hash. A convenient metaphor it may be, but a true hash it is not.

Re: regex of hash key on the fly
by dragonchild (Archbishop) on Nov 13, 2007 at 21:32 UTC