regex of hash key on the fly

martzpet has asked for the wisdom of the Perl Monks concerning the following question:

I am building a hash of arrays with information from two files. I would like to do something like this:

 if ($hash{^$file}) { do something }

The first file gives me the keys and the first three values of each list:

  $hash{$thing} = [1, 2, 3];    # store an array reference, not a list

$thing is generally a serial number and is unique. In a few cases the serial number has (O) or (R) appended, so there are three potential forms:

  serial
  serial(O)
  serial(R)

The second file gives me another two values for the list:

  push @{ $hash{$thing} }, 4, 5;    # dereference the stored array ref

In this file I am getting data by serial number, but here nothing is ever appended to the serial number, so I have:

   serial

In order to push, I am looking for the key that starts with the serial number:

 ^serial

Due to the number of iterations, I want to avoid a foreach over the keys of %hash, and I want to keep the (O) or (R) portion because it has significance. Can you do something like:

 if ($hash{^$file}) { do something }
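Since the first file can only produce three forms of a key, one way to get the effect without a regex is to try each form directly. A minimal sketch, assuming made-up sample serials and a hypothetical helper named `find_key` (neither comes from the thread):

```perl
use strict;
use warnings;

# Hypothetical data: keys as built from the first file.
my %hash = (
    'A100'    => [1, 2, 3],
    'A200(O)' => [1, 2, 3],
    'A300(R)' => [1, 2, 3],
);

# Return the existing key for a bare serial by trying the only three
# forms the first file can produce. Each test is a plain hash lookup,
# so there is no loop over keys %hash.
sub find_key {
    my ($h, $serial) = @_;
    for my $candidate ($serial, "$serial(O)", "$serial(R)") {
        return $candidate if exists $h->{$candidate};
    }
    return;    # undef: no form of this serial is present
}

# Second file: push the extra values onto whichever key matched.
for my $serial (qw(A100 A200 A300 A999)) {
    my $key = find_key(\%hash, $serial) or next;
    push @{ $hash{$key} }, 4, 5;
}
```

Each lookup costs at most three hash accesses regardless of the size of %hash. It returns the first form it finds, so it relies on each serial appearing in only one form, which matches the statement that the serial is unique.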

Re: regex of hash key on the fly by ikegami (Patriarch) on Nov 13, 2007 at 21:32 UTC
Tie::Hash::Regex provides the interface you want, but internally it loops over all the keys. That's inevitable if you use a regex. If your concern is efficiency, Tie::Hash::Regex is not going to help.

If I correctly understand what you are trying to do, you could instead store your data in a format that's more useful to you, eliminating the need for regexes entirely:

  $hash{$serial}{''} = ...;
  $hash{$serial}{O} = ...;
  $hash{$serial}{R} = ...;
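A sketch of that layout, assuming the keys from the first file look exactly like `serial`, `serial(O)` or `serial(R)`; the sample data and the splitting regex are illustrative assumptions:

```perl
use strict;
use warnings;

my %hash;

# First file (hypothetical sample keys): split "serial(O)" into the bare
# serial and its suffix, using '' when there is no suffix.
for my $thing ('A100', 'A200(O)', 'A300(R)') {
    my ($serial, $suffix) = $thing =~ /^(.*?)(?:\(([OR])\))?$/;
    $suffix //= '';
    $hash{$serial}{$suffix} = [1, 2, 3];
}

# Second file: the bare serial is now an exact key, so finding it is a
# single hash access. The inner loop sees at most three suffixes.
for my $serial (qw(A100 A200 A300 A999)) {
    my $by_suffix = $hash{$serial} or next;
    push @{ $by_suffix->{$_} }, 4, 5 for keys %$by_suffix;
}
```

Reading `$hash{$serial}` into a scalar before dereferencing avoids autovivifying empty entries for serials that appear only in the second file.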
Re^2: regex of hash key on the fly by martzpet (Initiate) on Nov 13, 2007 at 21:43 UTC
It hit me like a ton of bricks that this is probably the best thing to do: strip the (O) or (R) and add it into the array so that I can keep track of that information.
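martzpet's variant can be sketched as follows, assuming the suffix is stored as the first element of each array (the thread does not say where it goes) and using made-up sample serials:

```perl
use strict;
use warnings;

my %hash;

# First file: key on the bare serial and keep the suffix ('O', 'R' or '')
# as element 0 of the array (an arbitrary, hypothetical choice).
for my $thing ('A100', 'A200(O)', 'A300(R)') {
    my ($serial, $suffix) = $thing =~ /^(.*?)(?:\(([OR])\))?$/;
    $hash{$serial} = [$suffix // '', 1, 2, 3];
}

# Second file: an exact-key lookup replaces the regex.
for my $serial (qw(A100 A200 A300 A999)) {
    push @{ $hash{$serial} }, 4, 5 if exists $hash{$serial};
}

# Rebuild the original key when it is needed for output.
for my $serial (sort keys %hash) {
    my ($suffix, @values) = @{ $hash{$serial} };
    my $orig_key = $suffix eq '' ? $serial : "$serial($suffix)";
    print "$orig_key: @values\n";
}
```

Compared with the nested-hash layout, this keeps one array per serial, which is simpler as long as a serial never occurs with two different suffixes.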
Re: regex of hash key on the fly by locked_user sundialsvc4 (Abbot) on Nov 13, 2007 at 21:46 UTC
If the files in question are of any significant size, sort the two files by the same key. Then you can dispense with the hash (and all of the virtual memory it consumes) altogether. Important: use a disk-based sort. As "the COBOL-jockeys of yore" well knew (and as the present ones still know), sorting a large file is an unexpectedly fast operation.

If you know that your inputs are sorted, then any occurrences of "the same key" will always occur together. When the key value changes, you must be at the end of all occurrences of the preceding key value and at the beginning of all occurrences of the new one. If any gaps exist, you know without searching that there are no key values anywhere within that gap.

Although we might not routinely think of "main memory" as "a disk file," that is, in fact, what it is. A page fault is expensive even if it does not result in physical I/O. If you are dealing with more than, say, a few thousand records, the speed gained can be quite startling. (As in, hundreds of times faster.)

Also... don't let "tie" beguile you. A "tied hash" is not a hash. A convenient metaphor it may be, but a true hash it is not.
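A minimal sketch of the merge step described above, assuming both inputs are already sorted on the bare serial and that each serial appears at most once per file. In-memory arrays stand in for the two sorted files so the example is self-contained; a real version would read one record at a time from each filehandle. This is an inner join: records present in only one file are skipped.

```perl
use strict;
use warnings;

# Stand-ins for the two files after sorting on the bare serial.
# file1 records: [serial, suffix, values...]; file2 records: [serial, values...]
my @file1 = (['A100', '', 1, 2, 3], ['A200', 'O', 1, 2, 3], ['A300', 'R', 1, 2, 3]);
my @file2 = (['A100', 4, 5], ['A150', 4, 5], ['A300', 4, 5]);

my @joined;
my ($i, $j) = (0, 0);
while ($i < @file1 && $j < @file2) {
    my $cmp = $file1[$i][0] cmp $file2[$j][0];
    if    ($cmp < 0) { $i++ }    # file1 key absent from file2: skip it
    elsif ($cmp > 0) { $j++ }    # file2 key absent from file1: skip it
    else {
        # Same key on both sides: combine the records, advance both.
        my ($serial, $suffix, @v1) = @{ $file1[$i] };
        my (undef, @v2) = @{ $file2[$j] };
        push @joined, [$serial, $suffix, @v1, @v2];
        $i++;
        $j++;
    }
}
```

Only the current record from each input is needed at any moment, which is why this approach avoids holding a large hash in memory.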
Re: regex of hash key on the fly by dragonchild (Archbishop) on Nov 13, 2007 at 21:32 UTC
Tie::Hash::Regex
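Whatever module provides the interface, a regex lookup has to test the pattern against every key, as ikegami points out. The plain-Perl sketch below (not Tie::Hash::Regex's actual code; `keys_matching` is a hypothetical helper) makes that full scan explicit. Its pattern is stricter than `^serial`, which would also match a longer serial such as `A2000`:

```perl
use strict;
use warnings;

my %hash = ('A100' => [1, 2, 3], 'A200(O)' => [1, 2, 3], 'A300(R)' => [1, 2, 3]);

# Every key is tested against the pattern, so each lookup costs time
# proportional to the number of keys in the hash.
sub keys_matching {
    my ($h, $serial) = @_;
    # \Q...\E treats any regex metacharacters in the serial literally;
    # the pattern accepts only serial, serial(O) and serial(R).
    return grep { /^\Q$serial\E(?:\([OR]\))?$/ } keys %$h;
}

my @found = keys_matching(\%hash, 'A200');
```

Called once per record of the second file, this scan is what turns the job into a loop over all keys for every serial, which is the cost the original question was trying to avoid.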