Quicksilver has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to build a tag cloud derived from terms in a database as an exercise, but I've found myself out of my depth when it comes to linking the pieces of code together to obtain the top 200 words (there are over 20,000, so I'm looking to cut it down a little). I'd be grateful for some help in finding a solution so that I can learn where I've gone awry.

#!c:\perl\bin\perl.exe
use strict;
use warnings;
use DBI;

my $count;
my $tagCnt;
my $logRange;
my $fsize;
my $k;
my @tagfiledb;
my @sortkeys;
my $word;
my $occurrence;
my $tags;
my %dbtags;

my $dbh = DBI->connect('dbi:mysql:milton:localhost', 'user', 'pword');
my $sth = $dbh->prepare ('SELECT word, occurrences FROM statistic');

#load in tag file - do this from db
my $tagfile = shift;
$sth->execute;
while (($word, $occurrence) = $sth->fetchrow_array ){
    %dbtags = ($word, $count);
}
$sth->finish;
$dbh->disconnect();
$tags= @tagfiledb;

my $useLogCurve = 1;
my $minFontSize = 10;
my $maxFontSize = 36;
my $fontRange = $maxFontSize - $minFontSize;

#filter the script to top 200 tags
my $maxtags = 200;
@sortkeys = sort {$tags->{$b}->{count}<=> $tags->{$a}->{$count}} keys %{$tagfile};
@sortkeys = splice @sortkeys, 0, $maxtags;

#determine counts
my $maxTagCnt = 1;
my $minTagCnt = 10000000;
foreach $k (@sortkeys) {
    $maxTagCnt = $tags->{$k}->{count} if $tags->{$k}->{count} > $maxTagCnt;
    $minTagCnt = $tags->{$k}->{count} if $tags->{$k}->{count} > $minTagCnt;
}
my $minLog = log($minTagCnt);
my $maxLog = log($maxTagCnt);
my $logrange = $maxLog - $minLog;
$logrange = 1 if ($maxLog - $minLog);

sub DetermineFontSize ($) {
    my ($tagCont) = @_;
    my $cntRatio;
    if ($useLogCurve) {
        $cntRatio = log($tagCnt)-$minLog/$logRange;
    } else {
        $cntRatio = ($tagCnt-$minTagCnt)/($maxTagCnt-$minTagCnt);
    }
    $fsize = $minFontSize + $fontRange * $cntRatio;
    return $fsize;
}

#output tag cloud
print "Content-type: text/html\n";
print<<EOT;
<html>
<head>
<link href="/css/tagcloud.css" rel="stylesheet" type="text/css">
</head>
<body>
<div class=\"cdiv\">
<p class=\"cbox\">
EOT

#output the keys
foreach $k(sort @sortkeys){
    $fsize = DetermineFontSize($tags->{$k}->{$occurrence});
    my $tag = $tags->{$k}->{$word};
    printf int($fsize), $tag;
}

#output end of tag file
print <<EOT
</p>
</div>
</body>
</html>
EOT
__END__
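
For reference, the log-scaled font sizing that the script above is reaching for can be sketched on its own, away from the database. This is only a minimal sketch under stated assumptions: the helper name font_size_for, its argument order, and the sample counts are made up for illustration, and counts are assumed to be positive.

```perl
use strict;
use warnings;

# Hypothetical helper: map a tag count onto a font size between
# $min_font and $max_font, scaling by the logarithm of the count.
# Counts are assumed to be positive numbers.
sub font_size_for {
    my ($count, $min_count, $max_count, $min_font, $max_font) = @_;

    my $log_range = log($max_count) - log($min_count);

    # All counts equal: avoid dividing by zero, use the smallest size.
    return $min_font if $log_range == 0;

    # Parentheses matter: subtract first, then divide by the range.
    my $ratio = (log($count) - log($min_count)) / $log_range;
    return int($min_font + ($max_font - $min_font) * $ratio);
}

# Example data (made up): word => occurrences.
my %counts = (paradise => 100, lost => 10, heaven => 1);

# Smallest and largest count, taken from a numerically sorted list.
my ($min_count, $max_count) = (sort { $a <=> $b } values %counts)[0, -1];

for my $word (sort keys %counts) {
    my $size = font_size_for($counts{$word}, $min_count, $max_count, 10, 36);
    printf qq{<span style="font-size: %dpx">%s</span>\n}, $size, $word;
}
```

The least frequent word gets the minimum size and the most frequent word gets the maximum; everything else falls in between on a logarithmic curve.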

Replies are listed 'Best First'.
Re: Building a tag cloud from a database
by dragonchild (Archbishop) on Apr 29, 2008 at 13:09 UTC
    Depending on the database, you could do this:
    SELECT word, occurrences FROM statistic ORDER BY occurrences DESC LIMIT 200
    Also, most databases provide some mathematical functions and, in some cases, doing the maths in the database can be much more efficient. Just make sure you comment the heck out of the SQL.

    And, the last problem is the hash assignment inside the fetch loop, which should be:

    $dbtags{$word} = $occurrence;
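
Putting the two suggestions together (let the database pick the top rows, then store one hash entry per row) might look something like the sketch below. The helper name fetch_top_tags is hypothetical; the table and column names are the ones from the question, and $dbh is assumed to be an already connected DBI handle. LIMIT is not supported by every database, as noted above.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch the $limit most frequent words into a hash.
# $dbh is assumed to be a connected DBI database handle.
sub fetch_top_tags {
    my ($dbh, $limit) = @_;

    # %d forces an integer, so nothing else can end up in the LIMIT clause.
    my $sql = sprintf(
        'SELECT word, occurrences FROM statistic '
      . 'ORDER BY occurrences DESC LIMIT %d',
        $limit
    );

    my %dbtags;
    my $sth = $dbh->prepare($sql);
    $sth->execute;
    while (my ($word, $occurrence) = $sth->fetchrow_array) {
        $dbtags{$word} = $occurrence;    # one key per row
    }
    return \%dbtags;
}

# Usage, assuming the connection details from the question:
#   use DBI;
#   my $dbh  = DBI->connect('dbi:mysql:milton:localhost', 'user', 'pword');
#   my $tags = fetch_top_tags($dbh, 200);
```

Declaring $word and $occurrence inside the while condition keeps them local to the loop, so a stray name such as $count would be reported by strict instead of silently storing undef.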

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      Just make sure you comment the heck out of the SQL.

      Better yet, don't put the heck in your SQL in the first place. It would only confuse those who have to maintain the code later.

      ;-)
        Not if you hire a heckler
Re: Building a tag cloud from a database
by stiller (Friar) on Apr 29, 2008 at 12:57 UTC
    Update: Looking over it again, you have a lot of errors in your program that perl would have told you about if you hadn't predeclared your variables, like the one I point out first below. /update

                        v----- NOT $count?
    while (($word, $occurrence) = $sth->fetchrow_array ){
        %dbtags = ($word, $count);
                            ^--- NOT $occurrence?
    }

    If you'd done

    while (my ($word, $occurrence) = ...

    rather than predeclaring your $count, perl could have told you.

    foreach $k (@sortkeys) {
    make it
    foreach my $k (@sortkeys) {

    sub DetermineFontSize ($)
    make it
    sub DetermineFontSize

    print <<EOT
    make it
    print <<"EOT";
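
To illustrate the point about declaring variables where they are first used, here is a minimal sketch that compiles the same stray $count twice: once with everything predeclared, once with loop-style declarations. The snippets and the sample values are made up, and string eval is used only so that the compile error can be printed instead of aborting the script.

```perl
use strict;
use warnings;

# Snippet 1: every variable predeclared up front, as in the question.
# The stray $count compiles without complaint and silently stores undef.
my $predeclared = q{
    use strict;
    my ($word, $occurrence, $count, %dbtags);
    ($word, $occurrence) = ('paradise', 42);
    $dbtags{$word} = $count;
    1;
};

# Snippet 2: variables declared where they are first used.
# $count was never declared, so strict rejects it at compile time.
my $scoped = q{
    use strict;
    my %dbtags;
    my ($word, $occurrence) = ('paradise', 42);
    $dbtags{$word} = $count;
    1;
};

# String eval is used here only to capture the compile error for display.
my $ok_predeclared = eval $predeclared;
my $ok_scoped      = eval $scoped;
my $err_scoped     = $@;

print "predeclared compiles: ", ($ok_predeclared ? "yes" : "no"), "\n";
print "scoped compiles:      ", ($ok_scoped      ? "yes" : "no"), "\n";
print "strict says: $err_scoped" unless $ok_scoped;
```

The second snippet fails with a "Global symbol ... requires explicit package name" error naming $count, which is exactly the hint the predeclared version hides.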

    Update2: You should also try Data::Dumper, and do some print Dumper( \$var ) at various places in your code. Start with print Dumper( \%dbtags ); just after pulling from the database. Then ask yourself why you never use that hash again....
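
As a rough sketch of what that Dumper check would reveal, the example below replays the fetch loop against made-up rows (no database needed) and dumps both the hash as the question builds it and the hash as it should be built. The row data and the name %overwritten are assumptions for illustration only.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump

# Made-up rows standing in for what fetchrow_array would return.
my @rows = (['paradise', 42], ['lost', 17], ['heaven', 8]);

# As in the question: assigning a list to the whole hash replaces
# its contents on every pass, so only the last row survives.
my %overwritten;
for my $row (@rows) {
    my ($word, $occurrence) = @$row;
    %overwritten = ($word, $occurrence);
}

# Assigning to a single element keeps the earlier rows.
my %dbtags;
for my $row (@rows) {
    my ($word, $occurrence) = @$row;
    $dbtags{$word} = $occurrence;
}

print Dumper(\%overwritten);    # a single key
print Dumper(\%dbtags);         # one key per row
```

Dumping right after the loop makes this kind of mistake visible immediately, long before the sorting and font-size code runs on an almost empty hash.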

Re: Building a tag cloud from a database
by apl (Monsignor) on Apr 29, 2008 at 13:02 UTC