in reply to Re^6: search a large text file
in thread search a large text file

Do you mean "many times instead of one or two times"? In English, "one or two" means "few". I can't see the rest of your script, so I can only guess:

1) Did you call the sub "to_hash" more than once? "to_hash" should be executed once and then never again. By "once" I mean once in your lifetime, not once per execution of the script. Whenever you want to search, just use "my $db = DBM::Deep->new( "$file.db" );" and start searching. Remember that the file $file.db is stored permanently on your disk and keeps its contents between invocations of your script. If you call "to_hash" twice, every value is stored twice.

Additionally you might want to add "$db->clear()" to your "to_hash" subroutine so that even if you have to call it twice (because the source file changed), you get an empty hash before filling it.
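If the source file does change from time to time, the refill could be guarded by a timestamp check so that "to_hash" only runs when it is really needed. This is just a sketch using core Perl: "needs_rebuild" is a made-up helper name, not part of DBM::Deep, and it assumes the database lives in "$file.db" as described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBM::Deep): returns 1 when the
# database file is missing or older than the source text file.
sub needs_rebuild {
    my ($source, $db_file) = @_;

    # No database yet: it has to be built at least once.
    return 1 unless -e $db_file;

    # Element 9 of stat() is the last-modification time in epoch seconds.
    my $source_mtime = (stat($source))[9];
    my $db_mtime     = (stat($db_file))[9];
    die "Cannot stat $source: $!" unless defined $source_mtime;

    return $source_mtime > $db_mtime ? 1 : 0;
}

# Possible use in the fill script (to_hash as discussed in this thread):
#   to_hash($file) if needs_rebuild($file, "$file.db");
```

The check only compares modification times, so it will not notice a source file that was replaced by an older copy.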

2) Maybe your search routine prints more than you want.

Re^8: search a large text file
by perl_lover_always (Acolyte) on Feb 10, 2011 at 10:25 UTC
    very simple:
    my $file_in_en = shift;
    my $hash_en = to_hash($file_in_en);
    print "@{$hash_en->{'despite'}}";
    Results:
    17 18 18 18 18 18 18 18 18 18 18 18
    expected result:
    18
    When I try to use a normal hash in this way I get a correct result:
    my $file_in_en=shift;
    my %hash_en=to_hash($file_in_en);
    print "@{$hash_en{'despite'}}";

    sub to_hash {
        my %hash;
        my $file = shift;
        open(FILE, "<$file");
        foreach $l (<FILE>) {
            my ($ngram,$line) = split /\t/, $l;
            push(@{ $hash{$ngram} }, $line);
        }
        close FILE;
        return %hash;
    }

      Yes, this is what I meant in "1)" above. Your hash is permanent, so you only have to fill it once, ever. You need two scripts:

      This is the script you call once whenever your source file changes:

      #!/usr/bin/perl
      # script 1. Fills the hash
      use warnings;
      use strict;
      use DBM::Deep;

      my $file_in_en = shift;
      my $hash_en = to_hash($file_in_en);   # to_hash returns the DBM::Deep object

      sub to_hash {
          my $file = shift;
          my $db = DBM::Deep->new( "$file.db" );
          $db->clear();                     # start from an empty hash on every refill
          open(my $fh, '<', $file) or die "Cannot open $file: $!";
          while (my $l = <$fh>) {           # reads one line per iteration
              my ($ngram, $line) = split /\t/, $l;
              push(@{ $db->{$ngram} }, $line);
          }
          close $fh;
          return $db;
      }

      And this is your search script:

      #!/usr/bin/perl
      # script 2. Search item in hash
      use warnings;
      use strict;
      use DBM::Deep;

      my $file = shift;
      my $hash_en = DBM::Deep->new( "$file.db" );
      print @{$hash_en->{'despite'}};

      UPDATE: corrected the search script

        Thanks, I made it work with some small modifications! However, the script still runs out of memory, which is strange: although I read the file line by line, I get an out-of-memory message while using 10 GB of RAM!
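
One thing worth checking, as a guess since the modified script is not shown: a loop written as "foreach $l (<FILE>)" evaluates <FILE> in list context, so Perl reads the entire file into a temporary list before the first iteration. A "while" loop reads a single line per iteration instead. A minimal sketch of the fill loop in that style follows; "fill_store" is a made-up name, and the store is a plain hash reference here, on the assumption that a DBM::Deep object can be passed in its place. Whether this alone cures the memory problem with DBM::Deep is not something the thread confirms.

```perl
use strict;
use warnings;

# Hypothetical helper: fills $store (any hash reference) with
# ngram => [values], reading the input one line at a time.
sub fill_store {
    my ($file, $store) = @_;

    open(my $fh, '<', $file) or die "Cannot open $file: $!";

    # <$fh> in the scalar context of a while condition returns one line
    # per iteration; foreach (<$fh>) would load every line up front.
    while (my $l = <$fh>) {
        chomp $l;                          # drop the trailing newline
        my ($ngram, $line) = split /\t/, $l;
        next unless defined $line;         # skip lines without a tab
        push @{ $store->{$ngram} }, $line;
    }

    close $fh;
    return $store;
}
```

Unlike the scripts above, this version chomps each line, so the stored values carry no trailing newline.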