Hi cormanaz!

Your two options don't seem to solve the same task. Number one counts the appearances of the keywords as substrings in the JSON string, while number two counts exact matches of the keywords in the keys of the JSON.

The "best" solution depends on the number of keywords vs. the average number of keys in the columns. Assuming the number of keywords is larger, it would make sense to reverse the check. Create a hash from the keywords and lookup the keys from the JSON. Removing some indexing gives some more speedup.

#!/usr/bin/perl use v5.24; use warnings; use JSON::XS; use Benchmark 'cmpthese'; my @kwlist = qw(foo bar baz qux q0 q1 q2 q3 q4 q5 q6 q7 q8 q9); my %kwhash = map {($_, 0)} @kwlist; my $kwre = join('|',@kwlist); #my @items = getfromdb("select id,text from table"); # I have a sub th +at does this my @items = ([1, '{"foo":0}'], [2, '{"foo":0,"qux":0}'], [3, '{"foo":0}'], [4, '{"baz":0}'], [5, '{"twx":0}']); # Wildly gu +essed data sub var1 { foreach my $i (0..$#items) { my ($id,$text) = @{$items[$i]}; my $count = 0; while($text =~ /$kwre/g) { $count++; } } } sub var2 { foreach my $i (0..$#items) { my ($id,$temp) = @{$items[$i]}; my $kws = decode_json($temp); my $count = 0; foreach my $k (@kwlist) { if (defined($kws->{$k})) { $count++; } } } } sub var3 { foreach my $item (@items) { my (undef,$text) = @$item; my $kws = decode_json($text); my $count = 0; $count += exists $kwhash{$_} for keys %$kws; } } cmpthese(0, { var1 => \&var1, var2 => \&var2, var3 => \&var3, }); __DATA__ Rate var2 var1 var3 var2 176400/s -- -31% -38% var1 254510/s 44% -- -11% var3 285034/s 62% 12% --

Greetings,
🐻

$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$

In reply to Re: Faster of two options for finding keywords in a database by jo37
in thread Faster of two options for finding keywords in a database by cormanaz

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.