in reply to hash function

If I understand your question, you just want to look at the contents of the DB file "words.db". The contents of the file are a set of key/value pairs where the keys are strings composed of a category name and a word taken from one or more input text files.

I don't think Data::Dumper would help in terms of inspecting the contents of the DB file. (thanks for the correction, grep) If you would like a plain-text (flat-table) dump of the DB file contents, something like this would do:

    use strict;
    use warnings;
    use DB_File;

    # Hash with composite key strings: $words{category-word} gives count of
    # 'word' in 'category'. Tied to a DB_File to keep it persistent.
    my %words;
    tie %words, 'DB_File', 'words.db'
        or die "Cannot tie words.db: $!";

    open( my $txt, '>', 'words.txt' )
        or die "Cannot open words.txt: $!";
    while ( my ($key, $value) = each %words ) {
        print $txt "$key\t$value\n";
    }
    close $txt;
    untie %words;

If you want something more "organized" in terms of output (e.g. sorting entries by category or by word), you could run whatever you want on the "words.txt" file, or you can change the while loop above to do things other than just print out all the key/value pairs.
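
For example, here is a rough sketch of sorting the entries by category and then by word. It assumes the keys really are of the form "category-word" with no "-" inside the category name; the helper name sorted_rows() and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: keys look like "category-word" and the category part
# contains no "-" (the word part may).
# sorted_rows() is a made-up helper name, not part of DB_File.
sub sorted_rows {
    my ($words) = @_;
    my @rows;
    while ( my ($key, $count) = each %$words ) {
        my ($category, $word) = split /-/, $key, 2;
        $category = '' unless defined $category;    # empty key
        $word     = '' unless defined $word;        # key had no "-"
        push @rows, [ $category, $word, $count ];
    }
    # Order by category first, then by word within each category.
    my @sorted = sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] } @rows;
    return @sorted;
}

# Small in-memory stand-in for the tied hash.
my %sample = ( 'sports-ball' => 3, 'food-apple' => 5, 'food-ball' => 1 );
print join( "\t", @$_ ), "\n" for sorted_rows( \%sample );
```

Passing a reference to the tied %words instead of %sample should work the same way, since the sub only uses each().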

Loading the DB file contents into an in-memory data structure shouldn't be a problem for this kind of app, in case you want to do things like sorting, or working out how many categories are associated with each word, etc.
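
As one possible sketch of the "how many categories per word" idea, the following builds an in-memory hash of hashes from the composite keys. It again assumes "category-word" keys split on the first "-"; categories_per_word() and the sample data are hypothetical names used only for this example.

```perl
use strict;
use warnings;

# Assumption: keys look like "category-word"; categories_per_word() is
# a hypothetical helper name.
sub categories_per_word {
    my ($words) = @_;
    my %seen;    # $seen{word}{category} = 1
    while ( my ($key, $value) = each %$words ) {
        my ($category, $word) = split /-/, $key, 2;
        next unless defined $word;    # skip keys without a "-"
        $seen{$word}{$category} = 1;
    }
    my %total;
    $total{$_} = scalar keys %{ $seen{$_} } for keys %seen;
    return \%total;
}

# Small in-memory stand-in for the tied hash.
my %demo_words = ( 'sports-ball' => 3, 'food-apple' => 5, 'food-ball' => 1 );
my $total = categories_per_word( \%demo_words );

# Words that appear in the most categories come first.
for my $word ( sort { $total->{$b} <=> $total->{$a} or $a cmp $b } keys %$total ) {
    print "$word\t$total->{$word}\n";
}
```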

If there's something in particular you want to do with the data that you can't figure out, give us a clearer idea of what that might be (and your first try at a solution).

Re^2: hash function
by grep (Monsignor) on Oct 07, 2006 at 18:32 UTC
    Data::Dumper works just fine on tied hashes.

        use strict;
        use warnings;
        use Data::Dumper;
        use DB_File;

        my %words;
        tie %words, 'DB_File', 'words.db';

        load() if $ARGV[0];
        print Dumper \%words;

        sub load {
            my $cnt = 1;
            foreach my $key ( 'John Cleese', 'Graham Chapman', 'Eric Idle',
                              'Terry Jones', 'Michael Palin' ) {
                $words{$key} = "Gumby $cnt";
                $cnt++;
            }
        }


    grep
    One dead unjugged rabbit fish later
      But... what if the DB file is really big? (Sometimes people tie a hash to a DB file because of the amount of data, not just for persistence.) When I tried the following test:

          perl -MDB_File -MData::Dumper -e 'tie %h, "DB_File", "junk.db";
            for ($i=0; $i<1_000_000; $i++) { $h{"key_$i"}="value_$i" };
            warn "the hash is loaded\n"; sleep 15;
            warn "starting dump\n"; print Dumper(\%h);
            warn "the hash has been dumped\n"; sleep 15' > /dev/null

      Memory usage stayed at about 27 MB (macosx/perl 5.8.6) while the hash was being built and throughout the first sleep, then climbed over 450 MB during the Dump phase. Data::Dumper was making its own internal copies of the keys and values.

      (The "junk.db" file itself was 47 MB, and a plain-text printout of keys and values as I suggested above would probably be about half that.)
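
      If Dumper-style output is still wanted for a large file, one option is to dump one entry at a time rather than the whole hash. The following is only a sketch and has not been measured against a million-entry file; dump_entries() is a made-up helper name, and the expectation that memory stays low rests on each() walking the tied hash one pair at a time.

```perl
use strict;
use warnings;
use Data::Dumper;

# dump_entries() is a hypothetical helper: it writes one Dumper record per
# key/value pair instead of handing the whole hash to Dumper at once.
sub dump_entries {
    my ($hash, $fh) = @_;
    local $Data::Dumper::Terse  = 1;    # no "$VAR1 = " prefix
    local $Data::Dumper::Indent = 0;    # keep each record on one line
    my $n = 0;
    while ( my ($key, $value) = each %$hash ) {
        print {$fh} Dumper( { $key => $value } ), "\n";
        $n++;
    }
    return $n;    # number of entries written
}

# Small in-memory stand-in for the tied hash.
my %demo = ( key_0 => 'value_0', key_1 => 'value_1' );
dump_entries( \%demo, \*STDOUT );
```

      With the real data, a reference to the tied hash would be passed in place of \%demo.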