Re: Faster of two options for finding keywords in a database

Hi cormanaz!

Your two options don't seem to solve the same task. Number one counts the appearances of the keywords as substrings in the JSON string, while number two counts exact matches of the keywords in the keys of the JSON.

The "best" solution depends on the number of keywords vs. the average number of keys in the columns. Assuming the number of keywords is larger, it would make sense to reverse the check. Create a hash from the keywords and lookup the keys from the JSON. Removing some indexing gives some more speedup.

#!/usr/bin/perl

use v5.24;
use warnings;
use JSON::XS;
use Benchmark 'cmpthese';

my @kwlist = qw(foo bar baz qux q0 q1 q2 q3 q4 q5 q6 q7 q8 q9);
my %kwhash = map {($_, 0)} @kwlist;
my $kwre = join('|',@kwlist);
#my @items = getfromdb("select id,text from table"); # I have a sub th
+at does this
my @items = ([1, '{"foo":0}'], [2, '{"foo":0,"qux":0}'],
    [3, '{"foo":0}'], [4, '{"baz":0}'], [5, '{"twx":0}']); # Wildly gu
+essed data

sub var1 {
    foreach my $i (0..$#items) {
        my ($id,$text) = @{$items[$i]};
        my $count = 0;
        while($text =~ /$kwre/g) {
            $count++;
        }
    }
}

sub var2 {
    foreach my $i (0..$#items) {
        my ($id,$temp) = @{$items[$i]};
        my $kws = decode_json($temp);
        my $count = 0;
        foreach my $k (@kwlist) {
            if (defined($kws->{$k})) {
                $count++;
            }
        }
    }
}

sub var3 {
    foreach my $item (@items) {
        my (undef,$text) = @$item;
        my $kws = decode_json($text);
        my $count = 0;
        $count += exists $kwhash{$_} for keys %$kws;
    }
}

cmpthese(0, {
        var1 => \&var1,
        var2 => \&var2,
        var3 => \&var3,
    });

__DATA__

         Rate var2 var1 var3
var2 176400/s   -- -31% -38%
var1 254510/s  44%   -- -11%
var3 285034/s  62%  12%   --
[download]

Greetings,
🐻

$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$

Comment on Re: Faster of two options for finding keywords in a database Download Code