in reply to Re^3: Hash of Hashes from file
in thread Hash of Hashes from file

Yes, hashes have unique keys. Is it possible to generate a hash like this:
%hoh = ( $user => { 'Website' => ['website1', 'website2', 'website3'], 'type' => ['type1', 'type2', 'type3'] } );
As I said, I am not familiar with hashes; just checking.

Re^5: Hash of Hashes from file
by scorpio17 (Canon) on Apr 03, 2012 at 15:19 UTC
    Yes, that should work. Then to add a new website/type, you could do this:
    push( @{ $hoh{$user}{'Website'} }, 'website4' );
    push( @{ $hoh{$user}{'type'} }, 'type4' );

    and to get all websites for a given user:

    my @websites = @{ $hoh{$user}{'Website'} };

    The syntax looks strange because you're storing an array ref, and have to dereference it.
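
A minimal sketch of the dereferencing described above. The user names and values are made up for illustration. It also shows that push creates the inner structures on demand (autovivification), so a brand-new user needs no setup first:

```perl
use strict;
use warnings;

# Illustrative data; 'alice' and 'bob' are hypothetical users.
my %hoh = (
    alice => { Website => ['website1'], type => ['type1'] },
);

# $hoh{alice}{Website} holds an array reference, so wrap it in @{ ... }
# to use it where Perl expects an array.
push @{ $hoh{alice}{Website} }, 'website2';
push @{ $hoh{alice}{type} },    'type2';

# Autovivification: the inner hash for 'bob' and its array are created here.
push @{ $hoh{bob}{Website} }, 'website9';

my @websites = @{ $hoh{alice}{Website} };          # copy of the whole list
my $first    = $hoh{alice}{Website}[0];            # one element, no copy
my $count    = scalar @{ $hoh{bob}{Website} };     # number of elements

print "alice: @websites\n";
print "first: $first, bob has $count site(s)\n";
```

Reading a single element with `$hoh{alice}{Website}[0]` avoids copying the whole array, which may matter when the lists are long.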

      Thanks a lot,

      Can you also let me know how to print all websites and types for each user?

        If your data looks like this:

        %hoh = (
            user1 => {
                Website => ['website1', 'website2', 'website3'],
                type    => ['type1', 'type2', 'type3'],
            },
            user2 => {
                Website => ['website1', 'website2', 'website3'],
                type    => ['type1', 'type2', 'type3'],
            },
            user3 => {
                Website => ['website1', 'website2', 'website3'],
                type    => ['type1', 'type2', 'type3'],
            },
        );
        Then try something like this (untested):
        for my $user (sort keys %hoh) {
            my @websites = @{ $hoh{$user}{'Website'} };
            my @types    = @{ $hoh{$user}{'type'} };

            # assume we have the same number of each?
            unless (scalar(@websites) == scalar(@types)) {
                die "number of websites is different from number of types!";
            }

            print "$user :\n";
            for ( my $i=0; $i < scalar(@websites); ++$i) {
                print "  $websites[$i]\n";
                print "  $types[$i]\n";
            }
            print "\n";
        }

        However, if you go with data like this:

        %hoh = (   # actually now a hash of arrays of hashes (HoAoH)
            user1 => [
                { Website => 'website1', type => 'type1', },
                { Website => 'website2', type => 'type2', },
                { Website => 'website3', type => 'type3', },
            ],
            user2 => [
                { Website => 'website1', type => 'type1', },
                { Website => 'website2', type => 'type2', },
                { Website => 'website3', type => 'type3', },
            ],
            user3 => [
                { Website => 'website1', type => 'type1', },
                { Website => 'website2', type => 'type2', },
                { Website => 'website3', type => 'type3', },
            ],
        );
        Then your code becomes (untested):
        for my $user (sort keys %hoh) {
            print "$user :\n";

            # each element of this array is a hash ref
            for my $data ( @{ $hoh{$user} } ) {
                print "  $data->{'Website'}\n";
                print "  $data->{'type'}\n";
            }
            print "\n";
        }

        So, depending on what you need to do, pick the data structure that makes your life easier.
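
As a rough illustration of how the second layout (HoAoH) might be filled in from input rather than written out by hand, here is a sketch. The @records list is a hypothetical stand-in for whatever the file parsing produces:

```perl
use strict;
use warnings;

# Hypothetical input records: [user, website, type].
my @records = (
    [ 'user1', 'website1', 'type1' ],
    [ 'user2', 'website1', 'type1' ],
    [ 'user1', 'website2', 'type2' ],
);

my %hoh;
for my $rec (@records) {
    my ($user, $site, $type) = @$rec;
    # Each visit becomes one small hash, appended to that user's array.
    push @{ $hoh{$user} }, { Website => $site, type => $type };
}

for my $user (sort keys %hoh) {
    print "$user :\n";
    print "  $_->{Website} ($_->{type})\n" for @{ $hoh{$user} };
}
```

Because each website stays paired with its type in one hash, there is no need to check that two parallel arrays have the same length.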

Re^5: Hash of Hashes from file
by Cristoforo (Curate) on Apr 03, 2012 at 20:57 UTC
    Yes, it should be possible, but the syntax might get sticky. As scorpio17 says in this thread, what you want to do with the data is one of the deciding factors (along with others) in how you want to store it. If the data is too large to load into memory, you may need to consider some of the suggestions offered by others here.

    If the requirement is just to produce output like you provided, a Hash of Hashes may be a good choice.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %data;
    while (<DATA>) {
        my ($user, $site, $cat) = /"([^"]+)"/g;
        $data{$user}{$site} = $cat;
    }

    for my $user (keys %data) {
        my $href = $data{$user};
        print $user, "\n";
        print "\tWebsite: $_, Category: $href->{$_}\n" for keys %$href;
    }

    __DATA__
    user="john" website="www.yahoo.com" type="Entertainment"
    user="david" website="www.facebook.com" type="Social Networking"
    user="john" website="www.facebook.com" type="Social Networking"
    user="mike" website="www.google.com" type="Search Engines"
    Output was:
    john
            Website: www.yahoo.com, Category: Entertainment
            Website: www.facebook.com, Category: Social Networking
    mike
            Website: www.google.com, Category: Search Engines
    david
            Website: www.facebook.com, Category: Social Networking
    Update: With a file this large, would it be likely for one user to visit the same website more than once?
      Yes, users visit the same websites multiple times. For this reason I used the string "Website" as the key and the actual website as the value.
        I got the following output:
        C:\Old_Data\perlp>perl t33.pl
        david
                Website: www.facebook.com, Category: Social Networking
        john
                Website: www.yahoo.com, Category: Entertainment
                Website: www.yahoo.com, Category: Entertainment
                Website: www.yahoo.com, Category: Entertainment
                Website: www.facebook.com, Category: Social Networking
        mike
                Website: www.google.com, Category: Search Engines


        Name: john
                Website Count
                www.yahoo.com       3
                www.facebook.com    1

                Type Count
                Entertainment       3
                Social Networking   1


        Name: mike
                Website Count
                www.google.com      1

                Type Count
                Search Engines      1


        Name: david
                Website Count
                www.facebook.com    1

                Type Count
                Social Networking   1
        From this data:
        user="john" website="www.yahoo.com" type="Entertainment"
        user="john" website="www.yahoo.com" type="Entertainment"
        user="john" website="www.yahoo.com" type="Entertainment"
        user="david" website="www.facebook.com" type="Social Networking"
        user="john" website="www.facebook.com" type="Social Networking"
        user="mike" website="www.google.com" type="Search Engines"
        Notice that there are quotes surrounding every field. The regular expression that captures these fields from the file would need to be changed if that's not the case.
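
If the quotes turn out to be optional, one possible adjustment is to match each key=value pair and accept either a quoted or a bare value. This is only a sketch: it assumes unquoted values contain no whitespace, parse_line is a hypothetical helper, and the unquoted sample line is invented:

```perl
use strict;
use warnings;

# parse_line is a hypothetical helper, not part of the program in this thread.
sub parse_line {
    my ($line) = @_;
    my %field;
    # Matches key="quoted value" or key=bare_value (bare values: no whitespace).
    while ( $line =~ /(\w+)=(?:"([^"]*)"|(\S+))/g ) {
        $field{$1} = defined $2 ? $2 : $3;
    }
    return \%field;
}

my $quoted = parse_line('user="john" website="www.facebook.com" type="Social Networking"');
my $bare   = parse_line('user=john website=www.yahoo.com type=Entertainment');

print "$quoted->{user} / $quoted->{type}\n";
print "$bare->{user} / $bare->{type}\n";
```

Keying on the field names rather than on position also means the fields could appear in any order on a line.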

        In my program I use 2 hashes - one to count the number of sites visited by each user, %count, and one to count each address and category (by user), %data. It seems to work OK for this small data set.

        #!/usr/bin/perl
        use strict;
        use warnings;

        my (%data, %count);
        while (<DATA>) {
            my ($user, $site, $cat) = /"([^"]+)"/g;
            $data{$user}{ qq{$site$;$cat} }++;
            $count{$user}++;
        }

        for my $user (sort keys %data) {
            my $href = $data{$user};
            print $user, "\n";
            for my $key (keys %$href) {
                my $str = sprintf "\tWebsite: %s, Category: %s\n", split /$;/, $key;
                print $str x $href->{$key};
            }
        }

        my @ordered = sort {$count{$b} <=> $count{$a}} keys %count;

        print "\n\n";
        for my $user (@ordered) {
            my $href = $data{$user};
            print "Name: $user\n\tWebsite Count\n";
            for my $key (sort {$href->{$b} <=> $href->{$a}} keys %$href) {
                printf "\t%-20s%d\n", (split /$;/, $key)[0], $href->{$key};
            }
            print "\n";
            print "\tType Count\n";
            for my $key (sort {$href->{$b} <=> $href->{$a}} keys %$href) {
                printf "\t%-20s%d\n", (split /$;/, $key)[1], $href->{$key};
            }
            print "\n\n";
        }
        The line $data{$user}{ qq{$site$;$cat} }++; uses a 'compound' key ($site and $cat joined by $;).
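
To make the compound key a little more concrete, here is a small sketch using one site/category pair from the sample data. $; is Perl's subscript separator ("\034" by default), and a comma-separated hash subscript is joined with it automatically. The variable names %visits and $sep are illustrative:

```perl
use strict;
use warnings;

my %visits;
my ($site, $cat) = ('www.yahoo.com', 'Entertainment');

# These two statements update the same key: a comma-separated subscript
# is joined with $; behind the scenes.
$visits{ join($;, $site, $cat) }++;
$visits{ $site, $cat }++;

my $sep = "$;";    # copy of the separator, for use in the pattern below
for my $key (keys %visits) {
    # Split on the literal separator to recover both parts of the key.
    my ($s, $c) = split /\Q$sep\E/, $key;
    print "$s | $c | $visits{$key}\n";
}
```

This only works as long as the separator character never appears in the data itself.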

        Here is a dump of %data (the odd characters inside each key are how the $; separator was displayed).

        $VAR1 = {
                  'john' => {
                              'www.yahoo.com‡˜Entertainment' => 3,
                              'www.facebook.com‡˜Social Networking' => 1
                            },
                  'mike' => {
                              'www.google.com‡˜Search Engines' => 1
                            },
                  'david' => {
                               'www.facebook.com‡˜Social Networking' => 1
                             }
                };

        Update: Whoops, that doesn't count the categories correctly :-(
        If a user visited two different sites in the same category, that category would be listed once per site instead of being totaled across both sites.
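
One possible way around this is to keep separate per-user counters for sites and for categories instead of a single compound key. The following is a sketch, not a tested replacement for the program above: %site_count and %type_count are hypothetical names, and the www.youtube.com line is invented to show two sites sharing one category.

```perl
use strict;
use warnings;

my @lines = (
    'user="john" website="www.yahoo.com" type="Entertainment"',
    'user="john" website="www.yahoo.com" type="Entertainment"',
    'user="john" website="www.youtube.com" type="Entertainment"',   # invented line
    'user="john" website="www.facebook.com" type="Social Networking"',
    'user="mike" website="www.google.com" type="Search Engines"',
);

my (%site_count, %type_count);
for my $line (@lines) {
    my ($user, $site, $cat) = $line =~ /"([^"]+)"/g;
    $site_count{$user}{$site}++;    # visits per site, per user
    $type_count{$user}{$cat}++;     # visits per category, per user
}

for my $user (sort keys %site_count) {
    print "Name: $user\n";
    for my $pair ( [ 'Website', $site_count{$user} ], [ 'Type', $type_count{$user} ] ) {
        my ($label, $href) = @$pair;
        print "\t$label Count\n";
        # Highest count first; ties broken alphabetically for stable output.
        for my $key ( sort { $href->{$b} <=> $href->{$a} || $a cmp $b } keys %$href ) {
            printf "\t%-20s%d\n", $key, $href->{$key};
        }
    }
    print "\n";
}
```

With this layout, john's three Entertainment visits are totaled under one category even though they span two sites. The trade-off is that the pairing between a specific site and its category is no longer stored.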