htmanning has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I'm searching a database of messages. Each message has a unique ID but also has a threadID to track conversations. If I search by username, the user may have 128 rows (messages) in the database each with it's own unique ID, but there may only be 8 different threadIDs as each thread may have several messages in it.

The only thing I could think of to do was to create a $threadIDlist. Each time I loop through I check the threadIDlist to see if the current threadID is on the list. If it is, I skip it otherwise I print that row then add the current threadID to the list. There must be a better way to do this but this is where I am:

$rows=$sth->rows; while (($pointer = $sth->fetchrow_hashref) && ($current_count +_all <= $stopcount_all)){ $threadID = $pointer->{'threadID'}; if ($threadIDlist !~ /$threadID/) { $threadIDlist=$threadID . "," . $threadIDlist; $threadcount++; #print message list here } }

It works, BUT I need to know the total number of threadIDs. When I count the number of rows it gives me the total number of rows (128) not the number of threadIDs.

The only thing I could think to do was to do this same loop BEFORE I start printing the list so I could track the $threadcount. I'm assuming that's a pretty lame way of doing it.

If my explanation makes sense, can someone give me a better way to grab the total number of threads without having to loop through twice?

Replies are listed 'Best First'.
Re: Counting fields from database (updated)
by haukex (Archbishop) on Oct 02, 2020 at 21:49 UTC
    I need to know the total number of threadIDs

    It's a little unclear to me from your description whether you just need to know this information, in which case you can ask the database to provide it with a GROUP BY clause, or whether you're looping through all messages anyway, and you just want to count thread IDs*. In the latter case, since I assume the thread IDs are a unique identifier per thread, you can do this with a hash. Here are examples of both:

    use warnings; use strict; use Data::Dump; use DBI; # set up a dummy database my $dbh = DBI->connect("dbi:SQLite::memory:", undef, undef, { RaiseError=>1 }); $dbh->do(<<'END'); CREATE TABLE messages ( msgID INTEGER, threadID INTEGER ) END my $sth_i = $dbh->prepare( 'INSERT INTO messages (msgID, threadID) VALUES (?,?)'); $sth_i->execute($_,$_>>4) for 0..127; # ask the DB to give us per-thread information dd $dbh->selectall_arrayref( 'SELECT threadID, COUNT(*) FROM messages GROUP BY threadID'); # => [ [0, 16], [1, 16], [2, 16], [3, 16], # [4, 16], [5, 16], [6, 16], [7, 16], ] # loop through rows and count threads ourselves my $sth_s = $dbh->prepare( 'SELECT msgID, threadID from messages'); $sth_s->execute; my %threadIDs; while ( my $row = $sth_s->fetchrow_hashref ) { $threadIDs{ $row->{threadID} }++; } dd \%threadIDs; # => { "0" => 16, "1" => 16, "2" => 16, "3" => 16, # "4" => 16, "5" => 16, "6" => 16, "7" => 16 }

    * Update: Upon rereading, it's probably the latter. By the way, $current_count_all <= $stopcount_all sounds like you want to limit the number of records returned, which you can also do in the database, e.g. in MySQL with a LIMIT clause.

    Update 2: For completeness, there are two issues with the approach you showed in the OP. First, you need to be certain that $threadID doesn't ever contain the separator character ("," in this case), and your regex is too simple in that it will also match partial IDs (e.g. /234/ will match in "1234,5678"), which you'd need to prevent by anchoring the regex appropriately - for example, if the IDs are integers, you could say /\b\Q$threadID\E\b/) (note I've used \Q...\E to escape any special characters in $threadID even though I just said they're integers; it's just to play it extra safe). But the method of searching the string for the $threadID will be way less efficient than a hash, so I would always recommend that instead. Also modified the first example to supply the threadID in addition to the count.

      Thanks for this. I've updated my regex with what you suggested. You're right, mine was too simple.

      I used this example from someone else below and it works, but I still have to query the database twice. I can't seem to figure out how to do it in one loop, but it works. I loop through to count the total threads, then loop through again to print the rows.

      I think my logic is flawed. This is a better example of what I'm doing:

      This is the query of the database:

      Select * from messages where (username='$username' or toname='$usernam +e') and message!='' order by dateadded desc, ID desc
      This follows:
      $titlesperpage = 10; $currentpage = 1; $startcount = ($currentpage - 1) * $titlesperpage + 1; $stopcount = $currentpage * $titlesperpage; $current_count = 0; $threadcount=0; $threadIDlist = ""; while ($pointer3 = $sth3->fetchrow_hashref){ $current_count++; $threadID = $pointer3->{'threadID'}; $threads{$pointer3->{'threadID'}}++; if ($threadIDlist !~ /$threadID/) { $threadIDlist=$threadID . "," . $threadIDlist; $current_count--; $threadcount++; } }
      I try to only print 10 records per page. It works perfectly for the first page, but when I come back to print the second page it duplicates some threadIDs because they are not yet in the $threadIDlist var. The second page is called with this query_string:
      ?currentpage=2

        Sorry, but your code is not runnable, and doesn't seem to be representative of the problem you're having. Please see How do I post a question effectively?, I know what I mean. Why don't you?, and Short, Self-Contained, Correct Example.

        This is a guess of what you want that I've pieced together from your various posts.

        use warnings; use strict; use Data::Dump; use DBI; my $titlesperpage = 10; # set up a dummy database my $dbh = DBI->connect("dbi:SQLite::memory:", undef, undef, { RaiseError=>1 }); $dbh->do(<<'END'); CREATE TABLE messages ( msgID INTEGER, threadID INTEGER ) END my $sth_i = $dbh->prepare( 'INSERT INTO messages (msgID, threadID) VALUES (?,?)'); $sth_i->execute($_,$_>>4) for 0..127; for my $currentpage (1..5) { # simulate requests for different pages my $startcount = ($currentpage - 1) * $titlesperpage + 1; my $stopcount = $currentpage * $titlesperpage; dd $startcount, $stopcount; my $sth_s = $dbh->prepare('SELECT msgID, threadID FROM messages'); $sth_s->execute; my %threadIDs; my $rowcnt = 1; while ( my $row = $sth_s->fetchrow_hashref ) { $threadIDs{ $row->{threadID} }++; if ( $rowcnt >= $startcount && $rowcnt <= $stopcount ) { dd $row->{msgID}; } } continue { $rowcnt++ } dd \%threadIDs; }

        But this loops through all results for every request. You can have the database do the pagination, even though you have to hit the database twice it still doesn't require you to loop through all records, only those for the current page.

        for my $currentpage (1..5) { # simulate requests for different pages my $startcount = ($currentpage - 1) * $titlesperpage + 1; dd $dbh->selectall_arrayref( 'SELECT threadID, COUNT(*) FROM messages GROUP BY threadID'); my $sth_s = $dbh->prepare('SELECT msgID, threadID FROM messages ' .'ORDER BY msgID LIMIT ?,?'); $sth_s->execute($startcount, $titlesperpage); while ( my $row = $sth_s->fetchrow_hashref ) { dd $row->{msgID}; } }
Re: Counting fields from database
by wazat (Monk) on Oct 02, 2020 at 23:53 UTC
    If all you want is a count of unique theadIDs, you can use a hash.
    rows=$sth->rows; %threads; while (($pointer = $sth->fetchrow_hashref) && ($current_count_all <= $ +stopcount_all)){ $threads{$pointer->{'threadID'}}++; } my $threadCount = scalar keys %threads;
      Thanks! Your example with the hash works but I still have to hit the database twice. Once to get the total threadcount and then again to print the content. I wasn't able to get it to work in one loop, but that's okay it works. Thanks.
        Clearly you cannot print any counts until after you have read all the data at least once. As you read the DB, compute the counts and store all data (including counts) in a convenient data structure. Pass that structure to a subroutine which formats and prints it any way you like. Any data structure that saves all the data will 'work', but a well designed structure will greatly reduce the amount of code you must write. You have not told us enough about your data or your print requirements for us to offer much help.
        Bill
Re: Counting fields from database
by Anonymous Monk on Oct 03, 2020 at 14:22 UTC
    SELECT DISTINCT threadId FROM ... ... SELECT COUNT(DISTINCT threadID) ...
      Garbage as usual.