Stamp_Guy has asked for the wisdom of the Perl Monks concerning the following question:

I'm attempting to learn how to use hashes and I'm stuck. I am trying to parse the following file:
614	MNH,16.00|USED,32.00|PB,50.00
615	MNH,36.00|USED,12.00|PB,50.00
616	MNH,96.00|USED,2.00|PB,10.00

The code I am using to do so is as follows:

#!/usr/bin/perl -w
use strict;

my (@record, @details, @item, $detail, $scottnum, $details, %items);

open(DB, "db.txt") || die "Could not open the database: $!";

while(<DB>) {
    chomp;
    @record = split(/\t/);
    $scottnum = $record[0];
    @details = split(/\|/, "$record[1]");

    foreach $detail (@details) {
        @item = split(/,/, "$detail");
        if (($item[0] ne "") && ($item[1] ne "")){
            $items{"$item[0]"} = "$item[1]";
            foreach my $key (%items) {
                print "$scottnum\t $key \t $items{$key}\n";
            }
        }
    }
}
close(DB) || die "Could not close the database: $!";

I keep getting the error: "use of uninitialized value at parse.pl line 19, <DB> chunk 3". I get that error every other line. Could someone please explain to me what I'm doing wrong?

Stamp_Guy

Replies are listed 'Best First'.
Re: Stuck while learning about the hash
by dws (Chancellor) on Jun 08, 2001 at 06:58 UTC
    Change   foreach my $key (%items) { to   foreach my $key (keys %items) {
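    To see why the original loop warns on every other line: in list context a hash flattens to alternating keys and values, so foreach my $key (%items) also visits the values, and looking a value up as if it were a key yields undef. A minimal sketch, with a one-entry hash made up for illustration:

    ```perl
    use strict;
    use warnings;

    # Illustrative data only: a single entry from the sample file.
    my %items = (MNH => '16.00');

    # A hash in list context flattens to (key, value, key, value, ...).
    my @flattened = %items;

    # keys returns only the keys.
    my @only_keys = keys %items;

    # '16.00' is a value, not a key, so this lookup returns undef;
    # printing it is what triggers "Use of uninitialized value".
    my $missing = $items{'16.00'};

    print scalar(@flattened), " elements when flattened, ",
          scalar(@only_keys), " key(s)\n";
    ```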
      Ok I did that. Sorry, that was a stupid oversight on my part. However, once I did do that, I got the following output:

      614 MNH 16.00
      614 USED 32.00
      614 MNH 16.00
      614 USED 32.00
      614 PB 50.00
      614 MNH 16.00
      615 USED 32.00
      615 PB 50.00
      615 MNH 36.00
      615 USED 12.00
      615 PB 50.00
      615 MNH 36.00
      615 USED 12.00
      615 PB 50.00
      615 MNH 36.00
      616 USED 12.00
      616 PB 50.00
      616 MNH 96.00
      616 USED 2.00
      616 PB 50.00
      616 MNH 96.00
      616 USED 2.00
      616 PB 10.00
      616 MNH 96.00

      Not only is that 6 of each number instead of 3, but the order is all messed up. Any idea what's going on here?

      Stamp_Guy

        You're getting the repeats because the same %items hash is shared by every $scottnum and is never emptied, so each print loop also shows the entries added earlier. You can fix it easily by putting an undef %items after you print out the hash.
        if (($item[0] ne "") && ($item[1] ne "")) {
            $items{"$item[0]"} = "$item[1]";
            foreach my $key (keys %items) {
                print "$scottnum\t $key \t $items{$key}\n";
            }
            undef %items;
        }
        But this probably isn't what you wanted. You probably wanted a hash with a key and a subkey (a hash of hashes). I'm not good at explaining so let me show you instead:
        while(<DB>) {
            chomp;
            @record = split(/\t/);
            $scottnum = $record[0];
            @details = split(/\|/, "$record[1]");
            foreach $detail (@details) {
                @item = split(/,/, "$detail");
                if (($item[0] ne "") && ($item[1] ne "")) {
                    ###########################################
                    $items{$scottnum}{"$item[0]"} = "$item[1]";
                    ###########################################
                }
            }
        }

        foreach my $scottnum (keys %items) {
            for my $key (keys %{$items{$scottnum}}) {
                print "$scottnum\t $key \t $items{$scottnum}{$key}\n";
            }
        }
        This way a hash of hashes is created.

        Stylistically, I would have probably done it something like:

        my %info;
        open(DB, "db.txt") || die "Could not open the database: $!";
        while (<DB>) {
            my @info = split /[\s|,]+/, $_;
            my $scott = shift @info;
            while (my $key = shift @info) {
                $info{$scott}{$key} = shift @info;
            }
        }
        close DB;
        But you are definitely on the right track!
        > Not only is that 6 of each number instead of 3, but the order is all messed up. Any idea what's going on here?

        A quick note about order -- arrays will stay in the order you assign them in, but hashes may not. If you assign
        $array[0] = "foo"
        Then the value foo will always be in the first position in the array, unless you later reassign it in some way.

        However, hashes are a bit different. Perl stores the entries of a hash in whatever order it finds easiest, which need not be the order in which you assigned them. So if you assign:

        $hash{foo} = "abc";
        $hash{bar} = "xyz";
        and you later run:
        @somearray = keys %hash
        You cannot expect the key 'foo' to be returned first and 'bar' to be second. The only way to guarantee your output here is to run the command through something like sort:
        @somearray = sort(keys %hash);
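        As a small sketch of the sort approach applied to the stamp data: the hash below is typed in by hand from the sample file, and sorting the Scott numbers numerically with <=> is an assumption about the order that is wanted.

        ```perl
        use strict;
        use warnings;

        # Hand-copied from the sample file, deliberately out of order.
        my %items = (
            616 => { MNH => '96.00', USED => '2.00',  PB => '10.00' },
            614 => { MNH => '16.00', USED => '32.00', PB => '50.00' },
            615 => { MNH => '36.00', USED => '12.00', PB => '50.00' },
        );

        # Outer keys sorted numerically, inner keys sorted as strings.
        my @lines;
        for my $scottnum (sort { $a <=> $b } keys %items) {
            for my $key (sort keys %{ $items{$scottnum} }) {
                push @lines, "$scottnum\t$key\t$items{$scottnum}{$key}";
            }
        }
        print "$_\n" for @lines;
        ```

        Note that this gives alphabetical order within each stamp (MNH, PB, USED), not the order the fields had in the file.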
        -Eric
Re: Stuck while learning about the hash
by Zaxo (Archbishop) on Jun 08, 2001 at 08:42 UTC

    Ignoring the error, your code is logically very clear. It follows the database layout exactly.

    I'd like to point out a couple of hash tricks which will simplify the code and make it easier to extend.

    Unless your print statement is a proxy for an sql insert or a print to a different flat file, you may want to collect all the data in an uberhash. That might not be desirable if the database is large.

    #!/usr/bin/perl -w
    use strict;

    my $records = {};   # reference to the anonymous uberhash
    my @record;         # could be declared my in the loop where assigned

    open(DB, "< db.txt") || die "Could not open the database: $!";
    while(<DB>) {
        chomp;
        @record = split(/\t/);
        $records->{$record[0]} = {      # we're starting an anon hashref
            split /\||,/,               # split to list on pipe or comma
            $record[1]};                # done
        print map "$record[0]\t$_\t$records->{ $record[0] }{ $_ }\n",
            keys %{$records->{$record[0]}};
    }
    close(DB) || die "Could not close the database: $!";

    # now you can set other values in the hash if you like:
    # $records->{$index}{'description'} = "Randomly indexed item";

    By doing the item hash in one swell foop, we save on temporaries and loop accounting. Separating the printed output from the db parsing makes it easier to cut and paste on the code.

    After Compline,
    Zaxo

    Update: Corrected missing '#'

Re: Stuck while learning about the hash
by Arguile (Hermit) on Jun 08, 2001 at 11:40 UTC

    Hrmm... I might be really off here but wouldn't this construct be simpler? (pretty new to perl as well and not fully conversant with the intricacies of hashes)

    #!/usr/bin/perl -w
    use strict;

    my %data;

    open DB, "db.txt";
    while (<DB>) {
        @_ = split /[\t,|\n]/;
        $data{ $_[0] } = { @_[1..$#_] };
    }
    close DB;

    I tried shift @_ then just @_ in the right side of the equation and learned again that it's evaluated all as one not left to right ;). Anyways, this way you get a data structure something like this:

    # pseudo representation
    615 = (
        MNH  => 36.00
        USED => 12.00
        PB   => 50.00
    )

    Some examples on how to access it.

    keys %data              # 614, 615, 616
    $data{615}              # an anon hashref
    $data{615}{USED}        # 12.00
    keys %{ $data{615} }    # MNH, USED, PB

    Printing out the structure (I've never used Data::Dumper; might it apply here?):

    for my $key (keys %data) {
        print "\n$key";
        my $h_ref = $data{$key};
        printf "\t%s\t%6.2f\n", $_, $h_ref->{$_} for keys %$h_ref;
    }
    tilly suggested the $h_ref for speed (fewer hash lookups per inner loop iteration); I'm declaring it inside the loop because I'm lazy and don't want to undef it afterwards.

    Its output:

    614     MNH      16.00
            USED     32.00
            PB       50.00

    615     MNH      36.00
            USED     12.00
            PB       50.00

    616     MNH      96.00
            USED      2.00
            PB       10.00

    I didn't do any of the null checks, but that's pretty trivial to add. The \n in the split handles getting rid of the newline.. chomp may be better, I'm not sure.

    Update

    Just for fun I one lined the db to hash (not exactly efficient):

    $data{ (split/\t/)[0] } = { (split/[\t,|\n]/)[1..6] } while <DB>;
    Update 2

    Oops, fixed up what Hofmator suggested... I swear I knew that about char classes I just didn't know split well enough and it sort of evolved that way as I worked through ;). Thanks.

      I just want to point out a small error in your code, which in this case proved to be no error at all - but just because you were lucky. You wrote:

      while (<DB>) {
          @_ = split/[\t|,|\||\n]/;   # here things go accidentally right ;-)
          $data{ $_[0] } = { @_[1..$#_] };
      }

      You wanted to split on <tab>, <,>, <|> or <newline> and you used a character class for that. A character class matches any one of the characters in that class, e.g. [aeiou] matches any one vowel, so you must not use <|> as an alternation character inside the class! The code worked nevertheless because, by chance, <|> is one of the characters you wanted in the class anyway. Inside a class it need not be escaped, but you can escape it if you want. If you want to use alternation, then you would have to write

      @_ = split /\t|,|\||\n/;
      but this is not efficient. The character class is the better idea. So you would end up with
      @_ = split /[\t\n,|]/;

      Furthermore, if you are not sure whether there is additional space around the delimiters this would change to

      @_ = split /[\s,|]+/;

      allowing for one or more of the chars in the character class. The explicit \n is no longer needed, since \s already matches a <newline>; either way, the ending <newline> only produces an empty field at the end, which is not returned by split.

      See also split and perlre.
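      A small sketch comparing the three patterns on one record. The first line is copied from the sample file; the second, with stray spaces around the delimiters, is invented for illustration.

      ```perl
      use strict;
      use warnings;

      my $line = "614\tMNH,16.00|USED,32.00|PB,50.00\n";

      # Character class: split on any single tab, newline, comma or pipe.
      my @by_class = split /[\t\n,|]/, $line;

      # The same thing as an alternation; outside a class the pipe
      # must be escaped to be taken literally.
      my @by_alt = split /\t|,|\||\n/, $line;

      # One or more whitespace, comma or pipe characters in a row,
      # which tolerates extra spaces around the delimiters.
      my @loose = split /[\s,|]+/, "614 \t MNH , 16.00 | USED,32.00\n";

      print join('/', @by_class), "\n";
      print join('/', @loose), "\n";
      ```

      In all three cases the trailing newline yields only an empty final field, which split drops.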

      -- Hofmator

        Here's my 10-ct's worth. This is a bit after all the other posts, so it's probably not much help, (insert random excuse here about having to work next to my client) but I decided to post it anyways.

        I got lazy and I also noticed that this is a question about learning to use hashes, so I'm keeping the hash references down to a minimum and am using the key to point to useful data. However, doing this loses the sorting capability that a hash of hashes would allow, so it's sub-optimal.

        #!/usr/bin/perl -w
        use strict;

        my %items;

        open(DB, "db.txt") || die "Could not open the database: $!";
        while(<DB>) {
            chomp;
            my @record = split(/\t/);
            my $scottnum = shift @record;
            my @details = split(/\|/, shift @record);
            foreach my $detail (@details) {
                my @item = split(/,/, $detail);
                if ( ($item[0]) && ($item[1]) ){
                    # note: the right-hand shift runs first, so the key ends up
                    # holding the price and the value holds MNH/USED/PB
                    $items{$scottnum.'-'.(shift @item)} = shift @item;
                }
            }
        }

        foreach my $key (sort (keys %items) ) {
            my ($sn,$mk) = split ('-',$key);
            print "key: $key \tmk: $mk\t sn: $sn item: $items{$key}\n";
        }

        The script returns:

        key: 614-16.00 mk: 16.00 sn: 614 item: MNH
        key: 614-32.00 mk: 32.00 sn: 614 item: USED
        key: 614-50.00 mk: 50.00 sn: 614 item: PB
        key: 615-12.00 mk: 12.00 sn: 615 item: USED
        key: 615-36.00 mk: 36.00 sn: 615 item: MNH
        key: 615-50.00 mk: 50.00 sn: 615 item: PB
        key: 616-10.00 mk: 10.00 sn: 616 item: PB
        key: 616-2.00 mk: 2.00 sn: 616 item: USED
        key: 616-96.00 mk: 96.00 sn: 616 item: MNH

        tested on solaris something or other.

        I noticed in the original code that the printing statement was in the wrong place, which is most of why the output errors happened.

        --hackmare.

      This looks like what I was trying to do, but here are a couple of problems:

      1. I want to be able to keep the items in order.
      2. I'm dealing with a relatively large file (10,000+ lines), thus it takes an eternity to load everything into memory.
      3. Sometimes I won't know what will be in the place where I inserted USED, MNH, and PB.

      I was working on this project to learn more about hashes, and all of your code helped me out tremendously in learning how to use them. I've still got a lot to learn, but you've helped give me a good start.

      Question: is there a way I could make a script like this that would be portable, yet much quicker?

      Stamp_Guy
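
      One possible direction for the three points above, offered as a sketch rather than a measured solution: handle each line as it is read instead of collecting everything in one big hash, and keep each record's details as an ordered list of pairs, so the file's order is preserved and no field names are hard-coded. The helper name parse_line and the in-memory sample text are made up for illustration; with the real file you would open db.txt and loop over its handle in the same way. Whether this is quick enough for 10,000+ lines would need to be timed.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: turn one line into the Scott number plus an
      # ordered list of [name, price] pairs, skipping incomplete pairs.
      sub parse_line {
          my ($line) = @_;
          chomp $line;
          my ($scottnum, $details) = split /\t/, $line, 2;
          return unless defined $details;
          my @pairs = grep { @$_ == 2 && $_->[0] ne '' && $_->[1] ne '' }
                      map  { [ split /,/, $_, 2 ] }
                      split /\|/, $details;
          return ($scottnum, \@pairs);
      }

      # Stand-in for db.txt so the sketch is self-contained.
      my $text = "614\tMNH,16.00|USED,32.00|PB,50.00\n"
               . "615\tMNH,36.00|USED,12.00|PB,50.00\n";
      open my $fh, '<', \$text or die "Could not open sample data: $!";

      my $count = 0;
      while (my $line = <$fh>) {
          my ($scottnum, $pairs) = parse_line($line) or next;
          for my $pair (@$pairs) {
              # Printed immediately, in file order; nothing is kept around.
              print "$scottnum\t$pair->[0]\t$pair->[1]\n";
              $count++;
          }
      }
      close $fh;
      ```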