Re: hash function

If I understand your question, you just want to look at the contents of the DB file "words.db". The contents of the file are a set of key/value pairs where the keys are strings composed of a category name and a word taken from one or more input text files.

~~I don't think Data::Dumper would help in terms of inspecting the contents of the DB file.~~ (thanks for the correction, grep) If you would like a plain-text (flat-table) dump of the DB file contents, something like this would do:

use strict;
use warnings;
use DB_File;

# Hash with composite key strings: $words{category-word} gives count of
# 'word' in 'category'.  Tied to a DB_File to keep it persistent.

my %words;
tie %words, 'DB_File', 'words.db'
    or die "Cannot tie words.db: $!";

# Write one tab-separated key/value pair per line.
open( my $txt, '>', 'words.txt' )
    or die "Cannot open words.txt: $!";

while ( my ($key, $value) = each %words ) {
    print $txt "$key\t$value\n";
}
close $txt;
untie %words;

If you want something more "organized" in terms of output (e.g. sorting entries by category or by word), you could run whatever you want on the "words.txt" file, or you can modify the while loop above to do things other than just print out all the key/value pairs.
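As a rough sketch of the sorted variant, the loop can walk the keys in sorted order instead of using each. To keep the example self-contained, a plain hash with invented sample entries stands in for the tied %words, and the hyphen in the keys is only an assumption about how the composite key is built:

```perl
use strict;
use warnings;

# Stand-in for the tied %words hash; these keys and counts are invented.
my %words = (
    'sports-ball' => 3,
    'news-ball'   => 1,
    'news-vote'   => 5,
);

# Collect the same tab-separated lines as before, but in sorted key order.
# If the category comes first in the key, this groups entries by category.
my @lines;
for my $key ( sort keys %words ) {
    push @lines, "$key\t$words{$key}\n";
}
print @lines;
```

With a real tied hash, sort keys %words pulls every key into memory at once, which should be fine for a modest word list.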

Loading the DB file contents into an in-memory data structure shouldn't be a problem for this kind of app, in case you want to do things like sorting, or working out how many categories are associated with each word, etc.
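For instance, here is a minimal sketch of working out how many categories are associated with each word. It assumes the composite key is "category-word" joined by a hyphen (adjust the split if your separator differs), and it uses a plain hash with made-up entries in place of the tied %words so it runs on its own:

```perl
use strict;
use warnings;

# Stand-in for the tied %words hash; these keys and counts are invented.
my %words = (
    'sports-ball' => 3,
    'news-ball'   => 1,
    'news-vote'   => 5,
);

# Build an in-memory structure: $by_word{word}{category} = count.
my %by_word;
while ( my ($key, $count) = each %words ) {
    # Assumed key layout: category, hyphen, word (split on the first hyphen).
    my ($category, $word) = split /-/, $key, 2;
    $by_word{$word}{$category} = $count;
}

# Report each word with the number of categories it appears in.
for my $word ( sort keys %by_word ) {
    my @cats = sort keys %{ $by_word{$word} };
    printf "%s\t%d\t%s\n", $word, scalar @cats, join( ',', @cats );
}
```

The nested hash makes the per-word questions cheap; a hash keyed by category first would be the better fit for per-category questions.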

If there's something in particular you want to do with the data that you can't figure out, give us a clearer idea of what that might be (and your first try at a solution).


Re^2: hash function
by grep (Monsignor) on Oct 07, 2006 at 18:32 UTC

Data::Dumper works fine with tied hashes:

use strict;
use warnings;
use Data::Dumper;

use DB_File;

my %words;
tie %words, 'DB_File', 'words.db';

load() if $ARGV[0];

print Dumper \%words;

sub load {
  my $cnt = 1;
  foreach my $key ( 'John Cleese', 'Graham Chapman', 'Eric Idle',
                    'Terry Jones', 'Michael Palin' ) {
    $words{$key} = "Gumby $cnt";
    $cnt++;
  }
}

grep

One dead unjugged rabbit fish later


Re^3: hash function

by graff (Chancellor) on Oct 07, 2006 at 19:33 UTC

really

perl -MDB_File -MData::Dumper -e 'tie %h, "DB_File", "junk.db";
for ($i=0; $i<1_000_000; $i++) { $h{"key_$i"}="value_$i" };
warn "the hash is loaded\n"; sleep 15; warn "starting dump\n";
print Dumper(\%h); warn "the hash has been dumped\n"; sleep 15' > /dev/null

(The "junk.db" file itself was 47 MB, and a plain-text printout of the keys and values, as I suggested above, would probably be about half that.)


Re^4: hash function

by chromatic (Archbishop) on Oct 07, 2006 at 19:50 UTC

Data::Dump::Streamer handles this situation better. (A review of Data::Dump::Streamer).
