in reply to performance with mysql / file-caching / hash reference on demand

At work, we do hundreds of database calls every second. Some of the resources have performance problems, but they usually involve scanning tables that have millions of rows. Your description gives me a vague picture, but I can't guess what the problem might be. Are you using CGI with a fresh Perl start on every request, or do you preload the libraries (mod_perl, plack, mojo)? Are you using an ORM, or do you write your SQL queries by hand?

BTW, sometimes, the cheapest solution is to upgrade the hardware.

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Re^2: performance with mysql / file-caching / hash reference on demand
by derion (Sexton) on May 02, 2021 at 09:17 UTC

    Thank you very much for your reply.

    The webshop is built on top of a Links management system, which should work with mod_perl.
    Some of my code for customer-related operations might cause hiccups under mod_perl, so I am a little afraid of using it. For displaying detail pages I have been working with SpeedyCGI for a couple of years, which improved performance significantly.
    As far as I can see, one problem is what your question implies:
    I lose performance because connections to MySQL are opened more often than necessary.
    This seems to be more my problem with Perl than a general problem, though PHP (which I try to avoid) seems to be quicker there out of the box.

    My webserver is a couple of years old but still performant, and it runs on SSDs:

    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                24
    On-line CPU(s) list:   0-23
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 45
    Model name:            Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
    Stepping:              7
    CPU MHz:               2499.711
    BogoMIPS:              5000.02
    Virtualization:        VT-x

                 total    used    free  shared  buffers  cached
    Mem:         96455   65478   30976       0     4752   53994
    -/+ buffers/cache:    6730   89724
    Swap:          949       0     949

    At the moment a detail page takes about 1.1 seconds to display, but I want to add significantly more information, and I would like to get faster - at least not slower.

    One code example I am using to get media (images, datasheets) from the DB:

    sub get_udx_media_relations {
        my $SupplierID  = shift;
        my $SupplierPID = shift;
        my $return;
        my $query = qq|SELECT ProviderID, SourceID, Folder, Filename, Extension,
            Purpose, Rank, Priority, Width, Height
            FROM CT_Media_Relations
            WHERE SupplierID = '$SupplierID' AND SupplierPID = '$SupplierPID'
                AND isDeleted != 'Yes'
            GROUP BY Filename, Extension
            ORDER BY Priority DESC, Rank ASC|;
        my $sth = $DB->table('CT_Media_Relations')->prepare($query)
            || die $GX::SQL::error;
        $sth->execute();
        my ($prev_priority);
        while ($_ = $sth->fetchrow_hashref) {
            if ($prev_priority->{ $_->{Purpose} } > $_->{Priority}) {
                next unless $_->{Purpose} eq 'PDB';
            }
            # get rid of mds when manufacturer is present (4 vs 5)
            if (   $_->{Purpose} eq 'AAB'
                or $_->{Purpose} eq 'EBD'
                or $_->{Purpose} eq 'PBS'
                or $_->{Purpose} eq 'PRI') {
                if ($prev_priority->{IMG} > $_->{Priority}) {
                    next;
                }
                else {
                    $prev_priority->{IMG} = $_->{Priority};
                }
            }
            $prev_priority->{ $_->{Purpose} } = $_->{Priority};
            if ($_->{Purpose} eq 'PRI' && !$return->{Primary}->{path}) {
                $return->{Primary} = $_;
                $return->{Primary}->{path} = '/dx/' . $_->{ProviderID} . '/' . $_->{SourceID};
                # NOTE: this hash key was garbled in the original post; 'file' is assumed
                $return->{Primary}->{file} = $_->{Filename} . '.' . $_->{Extension};
                $return->{Primary}->{file} = $_->{Folder} . '/' . $return->{Primary}->{file}
                    if $_->{Folder};
            }
            my $media = $_;
            $media->{path} = '/dx/' . $_->{ProviderID} . '/' . $_->{SourceID};
            $media->{file} = $_->{Filename} . '.' . $_->{Extension};
            $media->{file} = $_->{Folder} . '/' . $media->{file} if $_->{Folder};
            push(@{ $return->{ $_->{Purpose} } }, $media);
        }
        # NOTE: the original used defined(@{...}), which is invalid on arrays
        # in modern Perl; plain truth checks on the references are used instead
        push(@{ $return->{IMG} }, @{ $return->{EBD} }) if $return->{EBD};
        if (not $return->{IMG} and $return->{PBS}) {
            push(@{ $return->{IMG} }, @{ $return->{PBS} });
        }
        push(@{ $return->{IMG} }, @{ $return->{AAB} }) if $return->{AAB};
        if (   $return->{IMG}
            && (   !$return->{Primary}
                || $return->{Primary}->{Priority} < $return->{IMG}[0]->{Priority})) {
            $return->{Primary} = shift @{ $return->{IMG} };
        }
        push(@{ $return->{IMG} }, @{ $return->{PAB} }) if $return->{PAB};
        push(@{ $return->{IMG} }, @{ $return->{DET} }) if $return->{DET};
        return $return;
    }

    My thought was that it could make sense to put these results into some static store (a database table or a file) on a daily basis, to avoid running the same routine over and over again.
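
    That idea can be sketched roughly like this, using the core module Storable (a hedged illustration only: the cache path, the one-day lifetime, and the wrapper name are all made up):

```perl
# Sketch of a daily file cache wrapped around the media lookup.
# Assumes Storable (a core module); path, TTL and sub name are invented.
use Storable qw(store retrieve);

sub get_udx_media_relations_cached {
    my ($SupplierID, $SupplierPID) = @_;
    my $cache_file = "/var/cache/shop/media_${SupplierID}_${SupplierPID}.sto";

    # reuse the cached result if it is younger than one day
    if (-e $cache_file && (time - (stat $cache_file)[9]) < 86_400) {
        return retrieve($cache_file);
    }

    # otherwise run the real query and refresh the cache
    my $result = get_udx_media_relations($SupplierID, $SupplierPID);
    store($result, $cache_file);
    return $result;
}
```

    Whether a file cache actually beats a warm MySQL query cache would have to be measured, of course.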

    Cheers derion
      sub get_udx_media_relations {
          my $SupplierID  = shift;
          my $SupplierPID = shift;
          my $return;
          my $query = qq|SELECT ProviderID, SourceID, Folder, Filename, Extension,
              Purpose, Rank, Priority, Width, Height
              FROM CT_Media_Relations
              WHERE SupplierID = '$SupplierID' AND SupplierPID = '$SupplierPID'
                  AND isDeleted != 'Yes'
              GROUP BY Filename, Extension
              ORDER BY Priority DESC, Rank ASC|;

      No placeholders here. That prevents DBI, DBD and the database itself from caching the query, so it has to be parsed again from zero each and every time. Even if you only change the IDs. Using placeholders and prepare_cached() instead of prepare() could improve speed, at least when used in a persistent environment (i.e. mod_perl, FastCGI, basically everything but plain old CGI). And, even worse, you might become a victim of SQL injection. See https://bobby-tables.com/.
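
      A sketch of the same query with placeholders, using plain DBI (your GX::SQL wrapper may expose this differently; `$dbh` is assumed to be a connected DBI handle):

```perl
# Placeholders + prepare_cached(): the statement text is constant,
# so DBI and the server can reuse the parsed statement handle.
my $sth = $dbh->prepare_cached(q{
    SELECT ProviderID, SourceID, Folder, Filename, Extension,
           Purpose, Rank, Priority, Width, Height
      FROM CT_Media_Relations
     WHERE SupplierID = ? AND SupplierPID = ? AND isDeleted != 'Yes'
     GROUP BY Filename, Extension
     ORDER BY Priority DESC, Rank ASC
});

# values are sent separately, never interpolated into the SQL,
# which also closes the SQL injection hole
$sth->execute($SupplierID, $SupplierPID);
```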

      while ($_ = $sth->fetchrow_hashref) {
          if ($prev_priority->{ $_->{Purpose} } > $_->{Priority}) {
              next unless $_->{Purpose} eq 'PDB';
          }
          # get rid of mds when manufacturer is present (4 vs 5)
          if (   $_->{Purpose} eq 'AAB'
              or $_->{Purpose} eq 'EBD'
              or $_->{Purpose} eq 'PBS'
              or $_->{Purpose} eq 'PRI') {
              if ($prev_priority->{IMG} > $_->{Priority}) {
                  next;
              }
              else {
                  $prev_priority->{IMG} = $_->{Priority};
              }
          }
          $prev_priority->{ $_->{Purpose} } = $_->{Priority};
          if ($_->{Purpose} eq 'PRI' && !$return->{Primary}->{path}) {
              $return->{Primary} = $_;
              $return->{Primary}->{path} = '/dx/' . $_->{ProviderID} . '/' . $_->{SourceID};
              # NOTE: this hash key was garbled in the original post; 'file' is assumed
              $return->{Primary}->{file} = $_->{Filename} . '.' . $_->{Extension};
              $return->{Primary}->{file} = $_->{Folder} . '/' . $return->{Primary}->{file}
                  if $_->{Folder};
          }
          my $media = $_;
          $media->{path} = '/dx/' . $_->{ProviderID} . '/' . $_->{SourceID};
          $media->{file} = $_->{Filename} . '.' . $_->{Extension};
          $media->{file} = $_->{Folder} . '/' . $media->{file} if $_->{Folder};
          push(@{ $return->{ $_->{Purpose} } }, $media);
      }

      I did not even try to fully understand what happens here. But you are fetching data from the database just to discard it, based on values of the "Purpose" and "Priority" columns. That won't speed up things. Try not to fetch data that you don't need. Make the database filter out everything you don't need, i.e. put your criteria into the WHERE condition. That way, the database won't have to find data you don't need, fetch it, serialize it, and push it through the connection to your application where the DBD has to deserialize it, and all of that just to discard that unwanted data.
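
      One way to sketch "filter in the database" here is a groupwise-maximum self-join that keeps only the rows with the best Priority per Purpose (an illustration only: the PDB special case and the IMG fallback logic would still have to be expressed in SQL or kept in Perl, and `$dbh` is again assumed to be a plain DBI handle):

```perl
# Let MySQL drop the low-priority rows per Purpose instead of
# fetching them and discarding them in the Perl loop.
my $sth = $dbh->prepare_cached(q{
    SELECT m.*
      FROM CT_Media_Relations m
      JOIN (SELECT Purpose, MAX(Priority) AS MaxPriority
              FROM CT_Media_Relations
             WHERE SupplierID = ? AND SupplierPID = ? AND isDeleted != 'Yes'
             GROUP BY Purpose) best
        ON m.Purpose = best.Purpose AND m.Priority = best.MaxPriority
     WHERE m.SupplierID = ? AND m.SupplierPID = ? AND m.isDeleted != 'Yes'
     ORDER BY m.Priority DESC, m.Rank ASC
});
$sth->execute($SupplierID, $SupplierPID, $SupplierID, $SupplierPID);
```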

      If the remaining code looks similar, that's where you can really improve performance. Use placeholders everywhere to allow at least the database to cache queries, and filter in the database, not in Perl.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        Thank you very much for your feedback.

        The effect of placeholders and caching was new to me, but it makes sense.
        I will take a closer look at that, hoping that SpeedyCGI already does something there, because I still fear using mod_perl with my code.

        Regarding the security issue, you are right in general, but in this case I think there is no risk, because only INT values from the product data find their way to this subroutine. Nevertheless, I will clean up my code with placeholders, which seem to be better in every way.

        I will also rethink the logic of this operation. The big "but" is that I have categories of images (single, item in use, primary) and several sources for images, where one source can have one or more categories of images. I want to display the image group with the highest priority for every category and discard the rest. If a product is edited by staff, I still want access to all available images, which could result in an editorial pick for every image category. A small part of the idea is the following:

        AAB (product in use) could be two images from a source with Priority 5
        EBD (single product image) could be three images which show the product from different views
        PRI (primary image) - if not defined, the first image from EBD should be taken

        I am starting to doubt that what I am doing is clever, even with prices, where I originally thought the approach was obvious:

        sub get_Price {
            my $SupplierID  = shift;
            my $SupplierPID = shift;
            # tax type 0,1,2 which could be different
            # depending on the country of the shop
            my $Tax = shift;
            # could be e.g. buying price or selling price
            my $Price_Type = shift;
            # if the request comes from a basket with amount in basket
            my $amount = shift || 0;
            # ClientID is the shop/market
            my $ClientID = $STASH{ClientID};
            my $return;

            my $cond = GX::SQL::Condition->new();
            $cond->add("ClientID", '=', $ClientID);
            my $customer_cond = GX::SQL::Condition->new();
            # CustomerID 10000 is a dummy for public prices
            $customer_cond->add("CustomerID", '=', 10000);
            # if there are individual prices they are also considered
            $customer_cond->add("CustomerID", '=', $STASH{account}->{ID})
                if $STASH{account}->{ID};
            $customer_cond->bool('OR');
            $cond->add($customer_cond);
            $cond->add("SupplierID",  '=', $SupplierID);
            $cond->add("SupplierPID", '=', $SupplierPID);
            $cond->add("Price_Type",  '=', $Price_Type);
            $cond->bool('AND');

            my @prices;
            my $previous_price;
            my $udx_prices = $DB->table('UDX_Prices');
            $udx_prices->select_options("ORDER BY Amount ASC, Price ASC");
            my $sth = $udx_prices->select(
                ['Price_Type, Amount, Price, Discount, Rebate, Currency, Price_Quantity'],
                $cond);
            # NOTE: the original had "while (my $_ = ...)"; "my $_" was
            # experimental and has been removed from Perl, so plain $_ is used
            while ($_ = $sth->fetchrow_hashref) {
                # calculate customer discount in percent if available
                $_->{Price_Discount} = sprintf("%0.2f",
                    $_->{Price} * ($STASH{account}->{Discount}) / 100);
                $_->{Price} = sprintf("%0.2f", $_->{Price} - $_->{Price_Discount});
                my ($new_price, $old_price);
                $old_price  = $_->{Price} * 100 / $_->{Price_Quantity};
                $_->{Price} = sprintf("%0.2f", $_->{Price} / $_->{Price_Quantity});
                $new_price  = $_->{Price} * 100;
                if ($old_price == $new_price) {
                    $_->{Price_Quantity} = 1;
                }
                else {
                    $_->{Price} = sprintf("%0.2f", $old_price / 100);
                }
                # if the previous price is smaller skip the price
                if ($previous_price->{Price} && $previous_price->{Price} <= $_->{Price}) {
                    next;
                }
                # if the amount is the same as before skip the price
                next if $previous_price->{Amount} == $_->{Amount};
                $_->{Tax} = $STASH{shop}->{'Local_Tax' . $Tax};
                $_->{Gross_Price} = sprintf("%0.2f",
                    $_->{Price} * (100 + $STASH{shop}->{'Local_Tax' . $Tax}) / 100);
                $_->{Tara} = sprintf("%0.2f", $_->{Gross_Price} - $_->{Price});
                $previous_price = $_;
                push @prices, $_;
            }
            $return->{prices} = \@prices;
            return $return;
        }

        I order the prices by amount and price. If an amount equals the previous amount, the row is skipped, and if a price is higher than the previous price, it is skipped as well. The first case can occur due to rounding and calculation; the second should not happen, but it gives me some kind of safety net.
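
        That skipping rule in isolation looks roughly like this (a toy sketch separate from the database code; `dedup_tiers` is a made-up name):

```perl
# Sort price tiers by Amount then Price; drop a row if its Amount
# repeats the last kept row or its Price is not lower than the
# last kept price.
sub dedup_tiers {
    my @rows = @_;    # each row: { Amount => ..., Price => ... }
    my (@kept, $prev);
    for my $row (sort { $a->{Amount} <=> $b->{Amount}
                     || $a->{Price}  <=> $b->{Price} } @rows) {
        next if $prev && $prev->{Amount} == $row->{Amount};
        next if $prev && $prev->{Price}  <= $row->{Price};
        push @kept, $row;
        $prev = $row;
    }
    return @kept;
}
```

        Depending on the data, the same deduplication might also be expressible in SQL (e.g. MIN(Price) grouped by Amount), which would again move the filtering into the database.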

        As I said, I am glad about your input, because it makes me question myself and brings up new ideas.

        Cheers derion