gossamer has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I've been given a set of DB_File hashes that apparently have null, or undefined, entries that I can't figure out how to handle properly. I'd like to just delete those entries from the database after it has been created with the improper entries.

However, it doesn't seem that they are actually being deleted.

These are hashes of amavisd quarantine spam and virus files that include the type, a key, and the date from the filename, such as virus-b2291ce2b21d1cf6f6af68bd3e355505-20120112T115534-10090-04-3. This is then split into 7 pieces and tab-delimited and stored in the hash.

Apparently either not all files have exactly seven fields, or some null entries are being made in some other way, which causes my routines to choke and die with "undefined" errors.

The problem is that the delete($hash{$key}) doesn't seem to actually be deleting the hash entry, because if I run the program again, it finds the same undefined entry.

push @myhosts, 'mymail01';
push @myhosts, 'mymail02';

foreach my $mypc (@myhosts) {
    for (my $i = 0; $i < 256; $i++) {
        my $bucket = sprintf('%02x', $i);
        my $file = sprintf('%s/%s/%02x.db', $qdir, $mypc, $i);

        tie (my %hash, 'DB_File', $file, O_RDWR, 0600) || next;
        foreach my $key ( keys %hash ) {
            if (!defined $hash{$key}) {
                print "$key not defined\n";
                delete($hash{$key});
                untie %hash;
                exit 1;
            }

            # $type\t$d$t\t$size\t$from\t$to\t$subj\t$score
            my @tmp = split /\t/, $hash{$key}, 7;
            my $type = $tmp[0];
            my $dt = $tmp[1];
            my ($year, $month, $day) = $dt =~ m|(\d{4})(\d{2})(\d{2})T.*|;
            printf("bucket: %s\ttype: %s\t%s-%s-%s\t%s\n",$bucket,$type,$year,$month,$day,$dt)
        }
        untie %hash;

Running the program where it produces the "undefined" error looks like this:

bucket: aa type: spam 2012-07-07 20120707T204829
spam-aa0b162b5118763a9f053d1412f9f0b1-20120705T082627-27218-15.gz not defined
Use of uninitialized value within %hash in concatenation (.) or string at ./list_hash.pl line 66.

The "bucket" line above is the values from the previous hash entry which is successfully defined. It only makes it to the printf if $hash{$key} is defined.

Is the issue that the $hash{$key} that I'm using or referring to doesn't actually match the key I'm trying to delete?

Am I properly determining if a value is undefined? Is (!defined $hash{$key}) the proper way? I've searched quite a bit for the proper way to determine if a value is undefined, and haven't really been successful.

Thanks,
Alex

Replies are listed 'Best First'.
Re: Deleting undefined entries from DB_FILE hash
by jwkrahn (Abbot) on Jul 23, 2012 at 06:07 UTC
    tie (my %hash, 'DB_File', $file, O_RDWR, 0600) || next;
    foreach my $key ( keys %hash ) {
        if (!defined $hash{$key}) {

    The undef value is an internal perl meta-value. When you store data on an external device (like a tied hash) the undef value is converted to a string, so trying to use defined on this value makes no sense.
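If the stored value does come back as an empty string rather than undef, as the reply above suggests, then a check that treats both cases alike would catch it either way. The following is a minimal sketch that uses an ordinary in-memory hash so it runs without a database; `is_blank` and the sample keys are hypothetical names, not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a value is undef or the empty string.
sub is_blank {
    my ($value) = @_;
    return !defined($value) || $value eq '';
}

# Sample data standing in for a tied hash (keys and values are made up).
my %hash = (
    good    => "spam\t20120707T204829\t1024",
    empty   => '',
    missing => undef,
);

# Collect the blank keys first, then delete them after the scan.
my @blank = grep { is_blank($hash{$_}) } sort keys %hash;
delete $hash{$_} for @blank;

print "deleted: @blank\n";
print "remaining: ", join(',', sort keys %hash), "\n";
```

Collecting the keys before deleting keeps the scan and the modification separate, which makes it easier to log what was removed.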

      Hi,

      Thanks everyone for your offer to help. I've posted below a complete program that runs on my system. Where it encounters an error and quits, it displays the following, with a few of the preceding lines:

      bucket: aa type: spam 2012-07-02 20120702T132446
      bucket: aa type: spam 2012-07-07 20120707T083754
      bucket: aa type: spam 2012-07-02 20120702T203543
      bucket: aa type: virus 2012-07-08 20120708T113543
      bucket: aa type: spam 2012-07-16 20120716T151849
      bucket: aa type: virus 2012-07-22 20120722T100147
      bucket: aa type: spam 2012-07-03 20120703T042249
      bucket: aa type: spam 2012-07-07 20120707T204829
      Use of uninitialized value in split at ./list_hash.pl line 32.
      Use of uninitialized value $dt in pattern match (m//) at ./list_hash.pl line 35.
      key 2 not defined. Deleting.

      It's reading from 256 hashes for each host, which you aren't going to have, so it probably won't execute for you.

      As I mentioned previously, the delete() isn't actually deleting for some reason. If I run the program again, it dies at the same spot, indicating to me that the record with the undef'd value or key still remained.

      Any ideas greatly appreciated.

      #!/usr/bin/perl -w

      # use perl;
      use DB_File;
      use DBI;
      use File::Basename qw(basename);
      use strict;
      use vars qw($verbose);

      my $me = basename($0); $me =~ s/\.pl$//;
      $verbose = shift || 1;

      sub DBG($);

      my $qdir = '/var/www/noc.mydomain.com-80/';
      my %hashes = ( );

      my $version = '1.9';
      my @mailhosts = qw();

      push @mailhosts, 'mail01';
      push @mailhosts, 'mail02';

      foreach my $mhosts (@mailhosts) {
          for (my $i = 0; $i < 256; $i++) {
              my $bucket = sprintf('%02x', $i);
              my $file = sprintf('%s/%s/%02x.db', $qdir, $mhosts, $i);

              tie (my %hash, 'DB_File', $file, O_RDWR, 0600, $DB_HASH) || next;
              foreach my $key ( keys %hash ) {

                  my @tmp = split /\t/, $hash{$key}, 7;
                  my $type = $tmp[0];
                  my $dt = $tmp[1];
                  my ($year, $month, $day) = $dt =~ m|(\d{4})(\d{2})(\d{2})T.*|;
                  if(!defined($year)) { DBG("key 2 not defined. Deleting.\n"); delete($hash{$key}); untie %hash; exit 1; };
                  if(!defined($month)) { DBG("key 3 not. Deleting.\n"); delete($hash{$key}); untie %hash; exit 1; };
                  if(!defined($day)) { DBG("key 4 not defined. Deleting.\n"); delete($hash{$key}); untie %hash; exit 1; };
                  printf("bucket: %s\ttype: %s\t%s-%s-%s\t%s\n",$bucket,$type,$year,$month,$day,$dt)

              }
              untie %hash;
          }
      } # end foreach mailhost

      sub DBG($) { my $msg = shift; print $msg if ($verbose); }

      Version with line numbers:

       1 #!/usr/bin/perl -w
       2
       3 # use perl;
       4 use DB_File;
       5 use DBI;
       6 use File::Basename qw(basename);
       7 use strict;
       8 use vars qw($verbose);
       9
      10 my $me = basename($0); $me =~ s/\.pl$//;
      11 $verbose = shift || 1;
      12
      13 sub DBG($);
      14
      15 my $qdir = '/var/www/noc.mydomain.com-80/';
      16 my %hashes = ( );
      17
      18 my $version = '1.9';
      19 my @mailhosts = qw();
      20
      21 push @mailhosts, 'mail01';
      22 push @mailhosts, 'mail02';
      23
      24 foreach my $mhosts (@mailhosts) {
      25     for (my $i = 0; $i < 256; $i++) {
      26         my $bucket = sprintf('%02x', $i);
      27         my $file = sprintf('%s/%s/%02x.db', $qdir, $mhosts, $i);
      28
      29         tie (my %hash, 'DB_File', $file, O_RDWR, 0600, $DB_HASH) || next;
      30         foreach my $key ( keys %hash ) {
      31
      32             my @tmp = split /\t/, $hash{$key}, 7;
      33             my $type = $tmp[0];
      34             my $dt = $tmp[1];
      35             my ($year, $month, $day) = $dt =~ m|(\d{4})(\d{2})(\d{2})T.*|;
      36             if(!defined($year)) { DBG("key 2 not defined. Deleting.\n"); delete($hash{$key}); untie %hash; exit 1; };
      37             if(!defined($month)) { DBG("key 3 not. Deleting.\n"); delete($hash{$key}); untie %hash; exit 1; };
      38             if(!defined($day)) { DBG("key 4 not defined. Deleting.\n"); delete($hash{$key}); untie %hash; exit 1; };
      39             printf("bucket: %s\ttype: %s\t%s-%s-%s\t%s\n",$bucket,$type,$year,$month,$day,$dt)
      40
      41         }
      42         untie %hash;
      43     }
      44 } # end foreach mailhost
      45
      46 sub DBG($) { my $msg = shift; print $msg if ($verbose); }
      47

        Instead of this:

        foreach my $key ( keys %hash ) {
            my @tmp = split /\t/, $hash{$key}, 7;
            my $type = $tmp[0];
            my $dt = $tmp[1];
            my ($year, $month, $day) = $dt =~ m|(\d{4})(\d{2})(\d{2})T.*|;
        Try this and see what you get:
        • Remove the LIMIT in the split function, in this case 7, and change "\t" to "\s+"
        • Test whether your match string, i.e. $dt, actually matches before using the captures
        foreach my $key ( keys %hash ) {
            my @tmp = split /\s+/, $hash{$key};
            my $type = $tmp[0];
            my $dt = $tmp[1];
            my ($year, $month, $day) = ('','','');
            if($dt =~ m|(\d{4})(\d{2})(\d{2})T.*$|){
                ($year, $month, $day)=($1,$2,$3);
            }
            ......
            ......
            printf("bucket: %s\ttype: %s\t%s-%s-%s\t%s\n",$bucket,$type,$year,$month,$day,$dt);
            ......
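One way to combine the "test the match first" advice with the cleanup the original poster wants is to validate each record in a small helper, remember the keys that fail, and delete them after the loop instead of exiting at the first failure. This is only a sketch under assumptions: `parse_record` and the sample records are hypothetical, it keeps the tab delimiter from the original program, and it uses a plain hash in place of the tied one.

```perl
use strict;
use warnings;

# Hypothetical helper: split one tab-delimited record and extract the date.
# Returns (type, year, month, day) on success, or an empty list otherwise.
sub parse_record {
    my ($record) = @_;
    return unless defined $record && length $record;

    my ($type, $dt) = split /\t/, $record, 7;
    return unless defined $dt;

    my ($year, $month, $day) = $dt =~ /^(\d{4})(\d{2})(\d{2})T/
        or return;
    return ($type, $year, $month, $day);
}

# Sample records standing in for the tied hash (made-up data).
my %hash = (
    'spam-aa'  => "spam\t20120707T204829\t1024",
    'virus-bb' => "virus\tnot-a-date",
    'spam-cc'  => '',
);

my @bad;
for my $key (sort keys %hash) {
    my @fields = parse_record($hash{$key});
    if (!@fields) {
        push @bad, $key;    # remember the key, keep scanning
        next;
    }
    printf "type: %s\t%s-%s-%s\n", @fields;
}

# Delete the malformed records once the scan is finished.
delete $hash{$_} for @bad;
print "removed: @bad\n";
```

Because the loop no longer exits on the first bad record, a single run reports every malformed entry in the file.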

Re: Deleting undefined entries from DB_FILE hash
by 2teez (Vicar) on Jul 23, 2012 at 02:25 UTC
    hi,

    The problem is that the delete($hash{$key}) doesn't seem to actually be deleting the hash

    How do you delete an undefined hash key?
    If the hash key is not present, there is nothing to delete. Don't you think so?
    Peruse this:

    use Data::Dumper;
    my $family={ father=>'foo', mother=>'bar',};
    for(keys %$family){
        if(!defined($_)){
            delete $family->{$_}
        }
    }
    print Dumper $family; # print out your hash
    but if you write this:
    use Data::Dumper;
    my $family={ father=>'foo', mother=>'bar',};
    for(keys %$family){
        if(defined($_)){
            delete $family->{$_}
        }
    }
    print Dumper $family; # print empty hash
    I think you might have to look at your code construct again.

    UPDATE:
    Also in this:

    my $file = sprintf('%s/%s/%02x.db', $qdir, $mypc, $i);
    I hope $qdir is declared somewhere in your code?

      It just seemed like there was _something_ in that position in the database for it to return the undefined value.

      I thought the issue was that split was being run against a string with null values among the tab delimiter or the entry was somehow otherwise incomplete, leading to the "Use of uninitialized value in split at ./list_hash.pl line 74." that I receive if I remove the prior checks and try and split the string anyway.

      It looks like you've used quite different constructs, and I'm not familiar with the abstraction you're using. It's beyond my level of Perl understanding, unfortunately.

      Thanks,
      Alex

        If you suspect the split function, then check what you get from each split (using print) before assigning the results to other variables, like so:

        print join "\n",split /\t/, $hash{$key}, 7;
        See if you have 7 separate values. If not, then try "\s+" in the split function instead of "\t", like so:
        print join "\n",split /\s+/, $hash{$key}, 7;
        If that works, then :-)! You got it!!!
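A variation on the same diagnostic is to count the fields rather than print them, which makes short records easy to spot. This is a minimal sketch; `field_count` is a hypothetical helper and the sample strings are made up. It passes a negative LIMIT so that trailing empty fields are counted too.

```perl
use strict;
use warnings;

# Hypothetical helper: report how many tab-delimited fields a record has.
# A negative LIMIT keeps trailing empty fields, which split drops when
# no LIMIT is given.
sub field_count {
    my ($record) = @_;
    return 0 unless defined $record;
    my @fields = split /\t/, $record, -1;
    return scalar @fields;
}

# Made-up records: a full one, one with empty trailing fields, an empty one.
for my $record ("a\tb\tc\td\te\tf\tg", "a\tb\t\t", '') {
    printf "%d field(s)\n", field_count($record);
}
```

Any record that reports fewer than seven fields is a candidate for the cleanup pass.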

Re: Deleting undefined entries from DB_FILE hash
by Athanasius (Archbishop) on Jul 23, 2012 at 02:55 UTC

    Just a thought:

    You may need to replace this:

    tie (my %hash, 'DB_File', $file, O_RDWR, 0600) || next;

    with this:

    tie (my %hash, 'DB_File', $file, O_RDWR, 0600, $DB_HASH) || next;

    See DB_HASH of DB_File.
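Whichever form of the tie call is used, it may help to report a failed tie instead of silently skipping the file, and to confirm on a scratch database that a delete really reaches the disk. The sketch below assumes DB_File is installed; the temporary file name and the sample records are made up for illustration and have nothing to do with the amavisd data.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;
use File::Temp qw(tempdir);

# Scratch database in a temporary directory (path is illustrative only).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.db";

# Create the file and store one good and one empty record.
tie my %hash, 'DB_File', $file, O_RDWR|O_CREAT, 0600, $DB_HASH
    or die "Cannot tie $file: $!\n";
$hash{good} = "spam\t20120707T204829";
$hash{bad}  = '';
untie %hash;

# Reopen, delete every record whose value is undef or empty, and close.
tie %hash, 'DB_File', $file, O_RDWR, 0600, $DB_HASH
    or die "Cannot tie $file: $!\n";
my @bad = grep { !defined $hash{$_} || $hash{$_} eq '' } keys %hash;
delete $hash{$_} for @bad;
untie %hash;

# Reopen once more to confirm the deletion reached the file.
tie %hash, 'DB_File', $file, O_RDONLY, 0600, $DB_HASH
    or die "Cannot tie $file: $!\n";
my @remaining = sort keys %hash;
untie %hash;

print "deleted: @bad\n";
print "remaining: @remaining\n";
```

If the same round trip behaves differently on one of the real files, that would point at the file itself rather than at the delete call.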

    Update: The code shown is a fragment which does not compile. Please post a complete example if possible; or, at the least, indicate which line in the fragment shown is the line 66 referenced in the error message.

    Plus minor edits.

    HTH,

    Athanasius <°(((><contra mundum