Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I THINK that this works. I would like to verify and get some suggestions to improve performance.

I have a file that contains hundreds of lines that look like this:
Item1--Item2--Item3
ItemX--Item2--ItemA
Item1--ItemV--Item3--Item4
These lines can have any number of items in them. I need to put these into a hash to work with. The approach I am taking is to build each one into a hash reference and then merge it into the main hash.


This is the code that I have so far:
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $datafile = "data.txt";
my %shash;

open(FILE, $datafile) or die "Cannot open '$datafile': $!";
foreach my $line (<FILE>) {
    # Kill linefeeds/returns. Data generated on variety of OS's
    $line =~ s/\n//g;
    $line =~ s/\r//g;

    my @items = split(/--/, $line);
    my $foo = &bhash(\%shash, @items);
    @shash{keys %$foo} = values %$foo;
}
close(FILE);

print Dumper \%shash;

sub bhash {
    my ($hash, @keys) = @_;
    my $ref = \$hash;
    map($ref = \$$ref->{$_}, @keys);
    return $$ref;
}
This seems kind of clunky and error-prone to me, though.

•Re: Generating Hashes from arrays of arbitrary size
by merlyn (Sage) on Sep 30, 2004 at 17:07 UTC
    I'll ask a meta question. What is your assurance that a particular hash value won't have to do double duty as both a value and a nested hashref? In other words, are you sure you're never gonna have:
    A--B
    A--B--X
    A--B--Y
    because that would require B to be both a terminal and a non-terminal.
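    A minimal sketch (mine, not merlyn's) of why that case bites a plain nested-hash layout:

    use strict;
    use warnings;

    my %h;
    # "A--B": mark B as a terminal by storing a value there.
    $h{A}{B} = 1;
    # "A--B--X": B must now be a hashref; under strict this dies with
    # 'Can't use string ("1") as a HASH ref'.
    $h{A}{B}{X} = 1;
    # Storing undef for leaves avoids the die, but then autovivification
    # silently erases the fact that "A--B" ever ended at B.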

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Yes, this is actually a possibility.
        This is how I deal with the leaf/node problem. Each level is an array ref, where the first element of the array is the hash ref for the next level, while the second element (if any) is the leaf value (if any).
        use Data::Dumper;

        my %shash;

        foreach my $line (<DATA>) {
            # Get rid of end of line.
            chomp($line);

            # Build hash from line.
            my (@items) = split /--/, $line;
            @items or next;

            my $p = \%shash;            # Start at root node
            while (@items > 1) {
                my $i = shift @items;
                $p->{$i}[0] ||= {};     # Create new node if necessary
                $p = $p->{$i}[0];       # Point at new node
            }

            my $i = shift @items;       # Get last item
            $p->{$i}[1] = $i;           # Make it a leaf of the last node
        }

        print(Dumper(\%shash));

        __DATA__
        Item1--Item2--Item3
        ItemX--Item2--ItemA
        Item1--ItemV--Item3--Item4
        Item1--Item2--Item3--Dup!
        ItemX
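        A quick hand-derived check (assuming the code above ran on the same DATA), showing that Item3 plays both roles at once:

        # Slot [1] holds the leaf value, slot [0] the child hash:
        print $shash{Item1}[0]{Item2}[0]{Item3}[1], "\n";             # Item3
        print $shash{Item1}[0]{Item2}[0]{Item3}[0]{'Dup!'}[1], "\n";  # Dup!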

        Caution: Contents may have been coded under pressure.
Re: Generating Hashes from arrays of arbitrary size
by dragonchild (Archbishop) on Sep 30, 2004 at 17:09 UTC
    What is the data structure you want to end up with? Do you want a single hash with all of the items as keys (a unique list of all items in your file)? If so, then it's a lot easier than that.
    use strict;
    use warnings;

    my $datafile = 'data.txt';
    open( my $fh, $datafile ) or die "Cannot open '$datafile' for reading: $!\n";

    my %items;
    while ( defined( my $line = <$fh> ) ) {
        chomp( $line );
        $line =~ s/[\r\n]*$//;

        my @values = split /--/, $line;
        @items{ @values } = undef;
    }
    close $fh;

    # At this point, you have %items, which is a lookup table.
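    For instance (my illustration, with an item name taken from the OP's sample data), membership tests and the unique item list then fall out directly:

    # Membership test against the lookup table:
    print "seen\n" if exists $items{'Item2'};

    # Unique, sorted list of every item in the file:
    my @unique = sort keys %items;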

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Generating Hashes from arrays of arbitrary size
by jdporter (Paladin) on Sep 30, 2004 at 17:08 UTC
    How to do it depends partly on how you want to handle the case
    Item1--Item2
    Item1--Item2--Item3
    Should the first Item2 be something different from the second because it's a leaf?

    In my solution below, I'm assuming that the answer is "no".

    my %h;
    while (<>) {
        chomp;
        my @v = split /--/;    # split on the "--" delimiter
        my $h = \%h;
        for ( @v ) {
            $h->{$_} ||= {};
            $h = $h->{$_};
        }
    }
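    With the two example lines above, both Item2s collapse into one node and a leaf is just a (so far) empty hash; hand-derived result:

    use Data::Dumper;
    print Dumper \%h;
    # $VAR1 = { 'Item1' => { 'Item2' => { 'Item3' => {} } } };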
Re: Generating Hashes from arrays of arbitrary size
by ikegami (Patriarch) on Sep 30, 2004 at 17:03 UTC

    This returns the same thing:

    use strict;
    use warnings;
    use Data::Dumper ();

    my %shash;

    my $line;
    while (defined($line = <DATA>)) {
        # Get rid of end of line.
        chomp($line);

        # Build hash from line.
        my $p = undef;
        $p = { $_ => $p } foreach (reverse(split(/--/, $line)));

        # Merge hashes.
        my $key;
        my $base = \%shash;
        for (;;) {
            ($key, $p) = each(%$p);
            last unless ($base->{$key});
            $base = $base->{$key};
        }
        $base->{$key} = $p;
    }

    print(Data::Dumper::Dumper(\%shash));

    __DATA__
    Item1--Item2--Item3
    ItemX--Item2--ItemA
    Item1--ItemV--Item3--Item4

    After the fix to my code, it ended up being bigger than your code. Ah well. Do note the use of chomp. It's much better than those regexps of yours.

      Do note the use of chomp. It's much better than those regexps of yours.

      Well, no. The OP did mention that the input data comes from a variety of OS's (though he didn't say which OS(s) his script is supposed to run on). chomp will remove anything at the end of a record that matches "$/", whose default value is OS-dependent, which means that when a script runs on any sort of unix, chomp will leave a "\r" untouched when the input happens to come directly from a CRLF source.

      I would just recommend simplifying the OP's regex:

      s/[\r\n]*$//;
      Also, I'd recommend a while loop instead of  foreach my $line ( <DATA> ), because the for loop causes the entire file to be slurped into a list before the first iteration begins. For small files, that's not a problem, but why invite this sort of trouble if the files happen to get really big?
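      A minimal sketch of the difference (filename taken from the OP):

      # foreach evaluates <$fh> in list context, so the whole file is
      # read into memory before the first iteration begins:
      #     foreach my $line ( <$fh> ) { ... }
      # while reads one record per iteration, in constant memory; the
      # defined() guard keeps a final line of "0" from ending the loop:
      open( my $fh, 'data.txt' ) or die "open: $!";
      while ( defined( my $line = <$fh> ) ) {
          $line =~ s/[\r\n]*$//;   # the portable end-of-line strip from above
          # ... process $line ...
      }
      close $fh;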
        oops, you're right. Script running on multiple OS != Data generated by multiple OS.