Hi

I thought I'd post a database solution. It's not really necessary, as the Perl code solution works. The advantage of loading into a database shows up if your file is too large to fit in memory. Also, if you wanted to see different views of the data, it would probably be easier to write an SQL query than to write another program.

I can't vouch for the SQL here, since I don't use it often, but it did produce results similar to the Perl program above.

You could run it if you had the DBI and DBD::SQLite modules on your system.

The first program creates the database and the second program runs the queries.

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=users.lite","","",
    {PrintError => 1, AutoCommit => 0}) or die "Can't connect";

# IF EXISTS avoids an error message on the first run, when there is no table yet
$dbh->do('DROP TABLE IF EXISTS users');
$dbh->do(qq{ CREATE TABLE users (user TEXT, site TEXT, type TEXT) });

my $sql_fmt = "INSERT INTO users VALUES(?,?,?)";
while(<DATA>) {
    # the three quoted values on each line become the three bind values
    $dbh->do($sql_fmt, {}, /"([^"]+)"/g);
    $dbh->commit if $. % 1_000_000 == 0; # commit every 1,000,000
}
$dbh->commit;
$dbh->disconnect;

__DATA__
user="john" website="www.yahoo.com" type="Entertainment"
user="john" website="www.yahoo.com" type="Entertainment"
user="john" website="www.yahoo.com" type="Entertainment"
user="david" website="www.facebook.com" type="Social Networking"
user="john" website="www.facebook.com" type="Social Networking"
user="mike" website="www.google.com" type="Search Engines"

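
If the load turns out to be slow on a really big file, one variation that might be worth trying is to prepare the INSERT once and execute it for each line, rather than calling do() every time. This is only a sketch under my own assumptions: it uses an in-memory database (dbname=:memory:), RaiseError instead of PrintError, a made-up @lines array in place of the real file, and a field-count check that the loader above does not have.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Assumption: in-memory database, so the sketch leaves no file behind.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
    {RaiseError => 1, AutoCommit => 0});

$dbh->do('CREATE TABLE users (user TEXT, site TEXT, type TEXT)');

# Hypothetical sample lines standing in for the real input file.
my @lines = (
    'user="john" website="www.yahoo.com" type="Entertainment"',
    'user="mike" website="www.google.com" type="Search Engines"',
    'this line has no quoted fields',
);

# Prepare the statement once, then execute it for every line.
my $sth = $dbh->prepare('INSERT INTO users VALUES(?,?,?)');

my $loaded = 0;
for my $line (@lines) {
    my @fields = $line =~ /"([^"]+)"/g;
    next unless @fields == 3;    # skip lines without exactly three values
    $sth->execute(@fields);
    $loaded++;
}

# Count the rows, then commit the whole batch.
my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM users');
$dbh->commit;

print "loaded $loaded lines, table has $count rows\n";

$dbh->disconnect;
```

For a real file you would keep the periodic commit from the loader above, so that a single transaction does not grow without limit.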
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=users.lite","","",
    {PrintError => 1, AutoCommit => 0}) or die "Can't connect";

# Prepare and print list of all websites to every user
my $sth = $dbh->prepare(<<SQL);
SELECT * FROM users
ORDER BY user, site
SQL
$sth->execute;

while(my @row = $sth->fetchrow_array) {
    printf "%-15s%-20s%s\n", @row;
}
print "\n";

# Create list of users from most visits to least for @users array
$sth = $dbh->prepare(<<SQL);
SELECT user, COUNT(user) Count
FROM users
GROUP BY user
ORDER BY Count DESC, user
SQL
$sth->execute;

my @users;
while(my @row = $sth->fetchrow_array) {
    push @users, $row[0];
}

# Counts for each website and counts of categories visited by user
for my $user (@users) {
    # bind the user name with a placeholder instead of interpolating it into the SQL
    $sth = $dbh->prepare(qq{SELECT site, COUNT(site) Count
                            FROM users
                            WHERE user = ?
                            GROUP BY site
                            ORDER BY Count DESC
                           });
    $sth->execute($user);

    printf "Name: %s\n\t%-20s%s\n", $user, qw/ Website Count /;
    while(my @row = $sth->fetchrow_array) {
        printf "\t%-20s%s\n", @row;
    }
    print "\n";

    printf "\t%-20s%s\n", qw/ Category Count /;
    $sth = $dbh->prepare(qq{SELECT type, COUNT(type) Count
                            FROM users
                            WHERE user = ?
                            GROUP BY type
                            ORDER BY Count DESC
                           });
    $sth->execute($user);

    while(my @row = $sth->fetchrow_array) {
        printf "\t%-20s%s\n", @row;
    }
    print "\n";
}
$dbh->disconnect;
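
To give an idea of what I mean by a different view of the data, here is a rough sketch of one more report: visits and distinct users per category, across all users. The query is my own guess at something useful, not part of the original problem. So that it runs on its own, it loads the same six sample rows into an in-memory database (dbname=:memory:) instead of reading users.lite, and it uses RaiseError and AutoCommit => 1 for brevity.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Assumption: in-memory copy of the sample data, so this runs standalone.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
    {RaiseError => 1, AutoCommit => 1});

$dbh->do('CREATE TABLE users (user TEXT, site TEXT, type TEXT)');

my $ins = $dbh->prepare('INSERT INTO users VALUES(?,?,?)');
$ins->execute(@$_) for (
    ['john',  'www.yahoo.com',    'Entertainment'],
    ['john',  'www.yahoo.com',    'Entertainment'],
    ['john',  'www.yahoo.com',    'Entertainment'],
    ['david', 'www.facebook.com', 'Social Networking'],
    ['john',  'www.facebook.com', 'Social Networking'],
    ['mike',  'www.google.com',   'Search Engines'],
);

# One row per category: total visits and number of distinct users.
my $rows = $dbh->selectall_arrayref(<<'SQL');
SELECT type, COUNT(*) AS visits, COUNT(DISTINCT user) AS visitors
FROM users
GROUP BY type
ORDER BY visits DESC, type
SQL

printf "%-20s%-8s%s\n", qw/ Category Visits Users /;
printf "%-20s%-8s%s\n", @$_ for @$rows;

$dbh->disconnect;
```

With the six sample rows, Entertainment comes first (3 visits, 1 user), then Social Networking (2 visits, 2 users), then Search Engines (1 visit, 1 user). Getting the same report from the hash-based program would mean another pass over the data structure.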

Chris

Update: Rewrote the queries in the loop over @users.


In reply to Re^9: Hash of Hashes from file by Cristoforo
in thread Hash of Hashes from file by cipher
