Tie::File to create a Hash?

LittleGreyCat has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed Monks,

Let me first try to describe the problem I am trying to solve:

I have a large file with many complex entries, each relating to a file in a Unix filestore tree.

I wish to extract a subset of these entries to match part of the current filestore tree; I can produce a list of the current filestore tree using the 'find' command.

So I have two files:

The true filestore list

/fred/myfile
/fred/myfile2
/bert/myfile

The large complex file

user ALLFILES /fred/myfile=/archive/dingbat/fred/myfile 3 6 9 thegoosedrankwine
user ALLFILES /fred/myfile2=/archive/dingbat/fred/myfile 3 6 9 thegoosedrankwine
user ALLFILES /fred/myfile3=/archive/dingbat/fred/myfile 3 6 9 thegoosedrankwine
user ALLFILES /bert/myfile=/archive/dingbat/fred/myfile 3 6 9 thegoosedrankwine
user ALLFILES /bert/myfile2=/archive/dingbat/fred/myfile 3 6 9 thegoosedrankwine

You will note that the third field (up to the '=') in the complex file is the filename in the real filestore tree.

My tentative plan is to set up the first file as a hash, indexed by the whole contents of each line, and then read serially through the second file, splitting out the file name component and matching it with the Hash.

If I get a hit, I then overwrite the matching entry in the Hash with the current line in my complex input file.

At the end I should have copied all the matching entries out of the complex file, and these should now be in the other file.

Any lines without a match will be unchanged.

So, the question:

Can I use 'Tie::File' to generate the Hash (which would make this scalable to large files and small memory), should I work in memory, or is there some other Perl feature which will make this so easy that I will be embarrassed that I asked the question?

TIA

LGC

Nothing succeeds like a budgie with no teeth.

Re: Tie::File to create a Hash? by Fletch (Bishop) on May 31, 2007 at 13:54 UTC
You don't want Tie::File; you want DB_File or BerkeleyDB to create a hash-on-disk of the first file. You'd then read the second file line by line, extract the path, and use `exists $tied_hash_on_disk{ $path }` to check whether you should print the line or not.
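A minimal sketch of that hash-on-disk approach, assuming DB_File is installed and that the complex file always looks like the sample in the question (two whitespace-separated fields, then the path up to the first '='). The sub names `build_index` and `filter_complex` and the index file name are made up for illustration:

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;    # O_RDWR, O_CREAT, O_RDONLY

# Store every path from the filestore list as a key in an on-disk hash.
sub build_index {
    my ($list_file, $db_file) = @_;
    tie my %seen, 'DB_File', $db_file, O_RDWR | O_CREAT, 0666, $DB_HASH
        or die "Cannot tie $db_file: $!";
    open my $in, '<', $list_file or die "Cannot open $list_file: $!";
    while (my $path = <$in>) {
        chomp $path;
        $seen{$path} = 1 if length $path;
    }
    close $in;
    untie %seen;
    return;
}

# Print each complex-file line whose path is a key in the index.
# Returns the number of lines printed.
sub filter_complex {
    my ($complex_file, $db_file, $out) = @_;
    tie my %seen, 'DB_File', $db_file, O_RDONLY, 0666, $DB_HASH
        or die "Cannot tie $db_file: $!";
    open my $in, '<', $complex_file or die "Cannot open $complex_file: $!";
    my $count = 0;
    while (my $line = <$in>) {
        # Assumed layout: "user ALLFILES /path=/archive/... rest"
        my ($path) = $line =~ /^\S+\s+\S+\s+([^=]+)=/ or next;
        if (exists $seen{$path}) {
            print {$out} $line;
            $count++;
        }
    }
    close $in;
    untie %seen;
    return $count;
}

# Usage: perl script.pl filestore_list complex_file index.db
if (@ARGV == 3) {
    my ($list_file, $complex_file, $db_file) = @ARGV;
    build_index($list_file, $db_file);
    filter_complex($complex_file, $db_file, \*STDOUT);
}
```

Only the current line is held in memory; the lookups go to the index file. A path that itself contains '=' would be cut short by the pattern, so adjust it if your data allows that.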
Re: Tie::File to create a Hash? by citromatik (Curate) on May 31, 2007 at 14:20 UTC
A simple solution to your problem doesn't involve Perl at all. In the shell you can do this in a single line:

`$ sed 's/=/ /' complex_file | join -a 1 -1 1 -2 3 simple_file -`

This works only if both files are sorted on the join field, e.g.:

`$ mv simple_file simple_file.bk; sort simple_file.bk > simple_file`
`$ mv complex_file complex_file.bk; sed 's/=/\t/' complex_file.bk | sort -k 3,3r | sed 's/\t/=/' > complex_file`

Hope this helps!

citromatik
Re^2: Tie::File to create a Hash? by LittleGreyCat (Scribe) on May 31, 2007 at 15:26 UTC
Interesting approach, and I can see where you are going. 'join' seems to do more or less exactly what I want. Having 'sed' problems, though. The '/\t/' substitution seems to work on the character 't', not creating/removing a tab character as I expected. The resulting changes to the 't' in '/opt' give unexpected results. I will experiment further.

Thanks

Nothing succeeds like a budgie with no teeth.
Re: Tie::File to create a Hash? by blazar (Canon) on May 31, 2007 at 13:57 UTC
"Can I use 'Tie::File' to generate the Hash (which makes this scalable to work with large files and small memory), should I work in memory, or is there some other Perl feature which will make this so easy that I will be embarrassed that I asked the question."

Nope. From the Tie::File documentation (perldoc Tie::File): `NAME Tie::File - Access the lines of a disk file via a Perl array`

You should work in memory if you have enough of it. Otherwise, if the memory usage is too large, go the tie way, but with a suitable module rather than the one you mentioned. How 'bout DB_File, for example?
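For the in-memory route, here is a rough sketch of the plan from the original question: key a plain hash on each line of the filestore list, overwrite an entry when the complex file has a line for that path, and leave unmatched entries as they were. The sub name `merge_entries` is invented for this example, and the pattern assumes the layout of the sample data (two fields, then the path up to the first '='):

```perl
use strict;
use warnings;

# Takes two open filehandles; returns the filestore entries in their
# original order, each replaced by its complex-file line where one exists.
sub merge_entries {
    my ($list_fh, $complex_fh) = @_;
    my (@order, %entry);

    # Index the filestore list; each path starts out as its own value.
    while (my $path = <$list_fh>) {
        chomp $path;
        next unless length $path;
        push @order, $path unless exists $entry{$path};
        $entry{$path} = $path;
    }

    # Overwrite an entry when the complex file has a line for that path.
    while (my $line = <$complex_fh>) {
        chomp $line;
        my ($path) = $line =~ /^\S+\s+\S+\s+([^=]+)=/ or next;
        $entry{$path} = $line if exists $entry{$path};
    }

    return map { $entry{$_} } @order;
}

# Usage: perl script.pl filestore_list complex_file
if (@ARGV == 2) {
    open my $list_fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $complex_fh, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    print "$_\n" for merge_entries($list_fh, $complex_fh);
}
```

Only the filestore list is held in memory; the complex file is still read one line at a time. If the same path appears more than once in the complex file, the last line wins.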