sdslrn123 has asked for the wisdom of the Perl Monks concerning the following question:

Hi. My data is:

    FLINTSTONES=BARNEY, FRED, WILMA
    JETSONS=MAX, TONY, WILMA
    SIMPSONS=LISA, BARNEY, WILMA, HOMER
    ALCATRAZ=ELIJAH, MAX, WILMA

I know how to calculate the total number of terms. But how do I calculate the total number of different terms?

    while(<DATA>) {
        chomp;
        my ($family, $people) = split /--/, $_, 2;    # split into two: family and members
        my @members = split /,\s*/, $people, -1;      # split family line into an array
        foreach my $member (@members) {               # foreach member within array
            push @{$people{$member}}, $family;        # push into an array
        }
        $families{$family}++;
    }
    my $total_families = scalar(keys %families);

Re: Calculating Total Different Array Terms On All Lines of Datafile
by GrandFather (Saint) on Jun 25, 2006 at 23:49 UTC

    I'm not sure what you are after, but this tidy-up of your code may contain the magic you require:

    use warnings;
    use strict;

    my %people;
    my %families;

    while(<DATA>) {
        chomp;
        my ($family, $people) = split /=/, $_, 2;    # split into two: family and members
        my @members = split /,\s*/, $people, -1;     # split family line into an array
        foreach my $member (@members) {              # foreach member within array
            push @{$people{$member}}, $family;       # push into an array
        }
        push @{$families{$family}}, @members;
    }

    my $total_families = keys %families;
    my $total_people = 0;
    $total_people += @{$families{$_}} for keys %families;

    print "Num Families: $total_families\n";
    print "Num People: $total_people\n";
    print "All given Names: ", join (' ', sort keys %people), "\n";
    print "All family names: ", join (' ', sort keys %families), "\n";
    print "All names grouped by family:\n";
    print " ", join ', ', do {my $famName = $_; map {"$_ $famName"} sort @{$families{$_}};}, "\n"
        for sort keys %families;

    __DATA__
    Flintstone=Barney, Fred, Wilma
    Jetson=Max, Tony, Wilma
    Simpson=Lisa, Barney, Wilma, Homer
    Alcatraz=Elijah, Max, Wilma

    Prints:

    Num Families: 4
    Num People: 13
    All given Names: Barney Elijah Fred Homer Lisa Max Tony Wilma
    All family names: Alcatraz Flintstone Jetson Simpson
    All names grouped by family:
     Elijah Alcatraz, Max Alcatraz, Wilma Alcatraz,
     Barney Flintstone, Fred Flintstone, Wilma Flintstone,
     Max Jetson, Tony Jetson, Wilma Jetson,
     Barney Simpson, Homer Simpson, Lisa Simpson, Wilma Simpson,

    DWIM is Perl's answer to Gödel
Re: Calculating Total Different Array Terms On All Lines of Datafile
by shmem (Chancellor) on Jun 25, 2006 at 23:26 UTC
    For your example (as there are only families and members) it should be the sum of the number of keys in the %families hash and the number of keys in the %people hash.
    my $total_terms = $total_families + scalar (keys %people);

    scalar (keys %hash) is the total number of different keys in a hash (unless you have multi-keyed hashes, e.g. with BerkeleyDB), since a hash cannot contain duplicate keys. And every term in your data is used as a key in one hash or the other.

    You could also introduce a hash only for the purpose of storing each word found in the datafile:

    while(<DATA>) {
        $seen{$1}++ while /(\w+)/g;
    }
    print scalar(keys %seen);
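
    To make the difference between "total" and "different" terms concrete, here is a minimal self-contained sketch of the same %seen idea. It assumes the four records from the question are held in an array (@lines is a name chosen only for this illustration, standing in for <DATA>), and it treats every run of word characters as a term, so family names are counted too.

```perl
use strict;
use warnings;

# Sample records copied from the question; @lines is a stand-in for <DATA>.
my @lines = (
    'FLINTSTONES=BARNEY, FRED, WILMA',
    'JETSONS=MAX, TONY, WILMA',
    'SIMPSONS=LISA, BARNEY, WILMA, HOMER',
    'ALCATRAZ=ELIJAH, MAX, WILMA',
);

my $total = 0;   # every term, duplicates included
my %seen;        # term => number of times it occurred

for my $line (@lines) {
    # Each run of word characters is one term (family names included).
    while ($line =~ /(\w+)/g) {
        $total++;
        $seen{$1}++;
    }
}

# A hash cannot hold duplicate keys, so the key count is the distinct count.
my $distinct = keys %seen;

print "Total terms: $total\n";
print "Different terms: $distinct\n";
```

    With the sample data this should report 17 total terms and 12 different ones. To count only member names, split off the family name first (as in the other replies) and feed just the members into %seen.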

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Calculating Total Different Array Terms On All Lines of Datafile
by TedPride (Priest) on Jun 25, 2006 at 23:25 UTC
    Not sure what exactly you're trying to do here, but in answer to your question, you use a hash:
    while (<DATA>) {
        chomp;
        ($family, $members) = split /=/, $_;
        @members = split /, /, $members;
        $total{$_}++ for @members;
    }
    for (sort keys %total) {
        print "$_ : $total{$_}\n";
    }
    print '-'x20, "\nTotal: ", scalar keys %total;

    __DATA__
    FLINTSTONES=BARNEY, FRED, WILMA
    JETSONS=MAX, TONY, WILMA
    SIMPSONS=LISA, BARNEY, WILMA, HOMER
    ALCATRAZ=ELIJAH, MAX, WILMA
Re: Calculating Total Different Array Terms On All Lines of Datafile
by rsriram (Hermit) on Jun 26, 2006 at 07:00 UTC

    Hi, try using the Array::Compare module from CPAN.

    Sriram