in reply to Faster of two options for finding keywords in a database
Hi cormanaz!
Your two options don't seem to solve the same task. Number one counts the appearances of the keywords as substrings in the JSON string, while number two counts exact matches of the keywords in the keys of the JSON.
The "best" solution depends on the number of keywords vs. the average number of keys in the columns. Assuming the number of keywords is larger, it would make sense to reverse the check. Create a hash from the keywords and lookup the keys from the JSON. Removing some indexing gives some more speedup.
#!/usr/bin/perl use v5.24; use warnings; use JSON::XS; use Benchmark 'cmpthese'; my @kwlist = qw(foo bar baz qux q0 q1 q2 q3 q4 q5 q6 q7 q8 q9); my %kwhash = map {($_, 0)} @kwlist; my $kwre = join('|',@kwlist); #my @items = getfromdb("select id,text from table"); # I have a sub th +at does this my @items = ([1, '{"foo":0}'], [2, '{"foo":0,"qux":0}'], [3, '{"foo":0}'], [4, '{"baz":0}'], [5, '{"twx":0}']); # Wildly gu +essed data sub var1 { foreach my $i (0..$#items) { my ($id,$text) = @{$items[$i]}; my $count = 0; while($text =~ /$kwre/g) { $count++; } } } sub var2 { foreach my $i (0..$#items) { my ($id,$temp) = @{$items[$i]}; my $kws = decode_json($temp); my $count = 0; foreach my $k (@kwlist) { if (defined($kws->{$k})) { $count++; } } } } sub var3 { foreach my $item (@items) { my (undef,$text) = @$item; my $kws = decode_json($text); my $count = 0; $count += exists $kwhash{$_} for keys %$kws; } } cmpthese(0, { var1 => \&var1, var2 => \&var2, var3 => \&var3, }); __DATA__ Rate var2 var1 var3 var2 176400/s -- -31% -38% var1 254510/s 44% -- -11% var3 285034/s 62% 12% --
Greetings,
🐻
|
|---|