tanger has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a MySQL database from which I will be fetching a large amount of data.

So when I call my script to collect all the data and print it out in HTML tables so it looks organized, it'll take wayy long to load.

I just want to make sure I have the fastest code possible for my script (it's a simple MySQL script that just runs select col,col2,col3,col4,col5,col6,col7,col8,col9,col10 from db where fee = 'done'; and then outputs the results in HTML tables).

Basically I'll be searching through 1 database, 1 table, 10 columns, and 4000+ rows using DBI and MySQL.

So which method of fetching should I use, bind_columns or fetchall_arrayref? Or any other faster method?

Thank you very much
=tanger=

Replies are listed 'Best First'.
•Re: bind_columns OR fetchall_arrayref
by merlyn (Sage) on Jan 04, 2005 at 03:44 UTC
Re: bind_columns OR fetchall_arrayref
by jbrugger (Parson) on Jan 04, 2005 at 06:29 UTC
    Have a look at Speeding up the DBI; it contains valuable information about DBI and about which methods to use.
    example from this page (since you want to query large amounts of data):
    $sth->execute;
    # dataset has 1,000,000 records
    #
    # DON'T DO THIS
    # my $aref = $sth->fetchall_arrayref;
    #
    # It would create a 1,000,000 rows array

    my $max_rows = 5_000;
    while (my $aref = $sth->fetchall_arrayref(undef, $max_rows)) {
        # do something with $aref
        # $aref now contains (at most) 5,000 rows
    };
Re: bind_columns OR fetchall_arrayref
by dws (Chancellor) on Jan 04, 2005 at 05:06 UTC

    it'll take wayy long to load.

    There are a couple of tricks you can use to start displaying tabular data before the entire table is generated (which can, as you note, take a long time). The big trick is to break the big table up into multiple smaller ones. (You'll have to butt them right up against each other to avoid blank lines between them.) The browser will display each table as soon as it can, which will be soon unless you're wrapping things into an even bigger table. To make this trick work, you'll need to specify widths for each cell. This means streaming more data to the browser, but that's the trade-off you have to accept for getting a quicker initial display.

    Give it a try, and see if you get quicker-seeming results. If it works, consider sticking with whichever approach yields the more maintainable code. To my eye, it's the fetchall_arrayref.
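    A rough sketch of this trick, assuming an already-executed DBI statement handle $sth and hypothetical widths and chunk size; each mini-table can be rendered by the browser as soon as its closing tag arrives:

    ```perl
    use strict;
    use warnings;

    $| = 1;    # unbuffer STDOUT so each chunk reaches the browser immediately

    # $sth is assumed to be an executed DBI statement handle.
    my $chunk_size = 200;    # rows per mini-table (tune to taste)
    my $count      = 0;

    print qq{<table width="600">\n};
    while (my $row = $sth->fetchrow_arrayref) {
        print '<tr>', (map { qq{<td width="60">$_</td>} } @$row), "</tr>\n";
        if (++$count % $chunk_size == 0) {
            # Close this table and immediately open the next one, so the
            # browser can render what it has without waiting for the final
            # </table> of the whole dataset.
            print "</table>\n", qq{<table width="600">\n};
        }
    }
    print "</table>\n";
    ```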

      According to this, you can cause newer browsers to render incrementally by using the newer table features such as COL and COLGROUP. When this information is available, a table should be rendered immediately. Whether this is true in practice is another question.

      ---
      demerphq

Re: bind_columns OR fetchall_arrayref
by perleager (Pilgrim) on Jan 04, 2005 at 03:51 UTC
    Hello,

    I was also told bind_columns is the fastest method when a large number of records is returned.

    You can check this link out for benchmark comparisons of the fetch methods.

    perleager
Re: bind_columns OR fetchall_arrayref
by petdance (Parson) on Jan 04, 2005 at 06:25 UTC
    Make sure you measure everything. Are you getting adequate performance? If so, then you don't need to speed anything up.

    There are a number of potential bottlenecks in your page display:

    • You may be pulling from a slow table. If that fee column isn't indexed, then the DB will have to walk the entire table searching for matches on fee = 'done'. For that matter, if it's a really huge table, it's more efficient to index on an integer than a VARCHAR. If it's REALLY huge, you may want to look at things like bitmap indexes, if you're using something like Oracle.
    • Your HTML may be bigger than necessary. For example, do you have a lot of extra text in your HTML code? If your pages are super-heavy, then the DB may not wind up being the bottleneck.
    • Your code could be slow elsewhere.
    Bottom line is that you don't go optimizing until you KNOW where your bottleneck is. If your query returns in 1 second and your page takes 5 seconds to draw because you're using some slooooow templating system, then the DB is not the problem.
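    The first point can be checked and fixed from DBI; the connection details, table, and column names below are placeholders for whatever your schema actually uses:

    ```perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical connection details -- substitute your own.
    my $dbh = DBI->connect('DBI:mysql:database=mydb', 'user', 'pass',
                           { RaiseError => 1 });

    # See how MySQL plans the query; a full table scan in the output
    # suggests the fee column needs an index.
    my $plan = $dbh->selectall_arrayref(
        q{EXPLAIN SELECT col1 FROM mytable WHERE fee = 'done'});

    # One-time schema change: index the column used in the WHERE clause.
    $dbh->do('CREATE INDEX idx_fee ON mytable (fee)');
    ```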

    xoxo,
    Andy

Re: bind_columns OR fetchall_arrayref
by mkirank (Chaplain) on Jan 04, 2005 at 06:11 UTC

    1. As per the DBI docs on "fetchrow_arrayref":
    "This is the fastest way to fetch data, particularly if used with $sth->bind_columns"
    2. When displaying a table in HTML, the table is not shown until the closing </table> tag is encountered, so break the table up.
    3. Use paging (show only a few rows per page) and give next/previous links. This can be combined with the LIMIT clause in the MySQL query.
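    Point 1 can be sketched like this; the connection details and column names are hypothetical stand-ins for your own:

    ```perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical connection details -- substitute your own.
    my $dbh = DBI->connect('DBI:mysql:database=mydb', 'user', 'pass',
                           { RaiseError => 1 });

    my $sth = $dbh->prepare(
        'SELECT col1, col2, col3 FROM mytable WHERE fee = ?');
    $sth->execute('done');

    # Bind one variable per selected column; each fetch() then fills them
    # in place, which the DBI docs describe as the fastest row-by-row method.
    $sth->bind_columns(\my ($col1, $col2, $col3));
    while ($sth->fetch) {
        print "<tr><td>$col1</td><td>$col2</td><td>$col3</td></tr>\n";
    }
    ```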
      The paging concept is a good idea, but you must be careful: some modules in the Class::DBI family (such as Class::DBI::Pager) do a full SELECT of your data and then show only the requested page, so you do not save on database processing time. Using the LIMIT clause (as MySQL provides) would really help here (as implemented in Class::DBI::Plugin::Pager). Again you must take care: if the amount of data changes between calls to display the "next" and "previous" pages, the results may not be what you expect.

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: bind_columns OR fetchall_arrayref
by foil (Sexton) on Jan 04, 2005 at 06:12 UTC
    bind_columns and fetchall_arrayref are really two different ideas. bind_columns binds Perl variables to the output columns, so each fetch writes directly into your variables instead of building a new structure. fetchall_arrayref, and the fetchall methods more generally, have to do with getting all the data at once, as opposed to row by row (batch vs. atomic). My two cents: if you are only spitting data out to a web page, and not doing any more processing, a row-by-row method may be better; there is no need to fill up memory with an unneeded array. Also, I get better performance results from an efficient db table setup. Are there duplicate items in columns? Smaller tables may be quicker.
Re: bind_columns OR fetchall_arrayref
by geektron (Curate) on Jan 04, 2005 at 05:13 UTC
    Some benchmarks would help you decide. For ten columns it's not much of a big deal with bind_columns, because there are only 10 vars to track.

    fetchall_arrayref will fetch out your entire dataset at once; hence the slowdown, I'd think, given the number of records you're attempting to pull.

    Note that they are 2 different approaches: one entails a while loop (bind_columns), while the other is a complete dataset fetch (fetchall_arrayref) followed by a loop for processing.

    On a side note, I'm more of a fan of fetchrow_hashref than fetchrow_arrayref because of the naming that's used. It's more extensible if you start fetching more columns.

Re: bind_columns OR fetchall_arrayref
by CountZero (Bishop) on Jan 04, 2005 at 07:12 UTC
    fetchall_arrayref is certainly not a good idea if you really have a large amount of data as it will put a big strain on the memory resources of your webserver.

    I have an application which does a select on multiple joined tables for some combination of conditions out of about 80,000 rows and the largest pages take at most two seconds to render (including some dynamically made graphics and summary information on the data shown) through Template::Toolkit. The data is read with fetchrow_* methods.

    The data is provided by the MySQL database in fractions of a second, most of the time is spent in processing the data and transmitting it to the client. Of course the database is fully indexed on all fields to be selected on.

    As others have already suggested: first check where the real bottleneck is before you start optimising things which are already very fast.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: bind_columns OR fetchall_arrayref
by punkish (Priest) on Jan 04, 2005 at 12:41 UTC
    It is hard to answer this question until you define
    it'll take wayy long to load

    and

    Or any other faster method?

    as well as the benchmark that will tell you whether you have achieved the speed you desire.

    Your db is so simple it could well be just a text file. On any reasonably modern computer Perl will pump it out faster than any internet connection can handle. Hence, your bottleneck (for loading 4k records, if I understand your problem correctly) will be the client computer rather than the server. Your other worry would be running out of browser memory on your client computer before everything gets loaded. You are asking to load a 40,000-element array along with the html fluff. Make sure your users have enough memory on their computers (which, again, should not be a problem).

    In other words, first define the yardstick you want to measure against before worrying about which method to use. Because unless you recognize your destination, you will not know you have arrived.

    If you just want to compare bind_columns against the fetch* methods, a simple use Benchmark will settle that question faster than any of our conjecturing will.

    Good luck.
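    A sketch of such a comparison, assuming an open connection and your real query (the names below are placeholders):

    ```perl
    use strict;
    use warnings;
    use DBI;
    use Benchmark qw(timethese);

    # Hypothetical connection details -- substitute your own.
    my $dbh = DBI->connect('DBI:mysql:database=mydb', 'user', 'pass',
                           { RaiseError => 1 });

    my $sql = q{SELECT col1, col2 FROM mytable WHERE fee = 'done'};

    timethese(50, {
        fetchall => sub {
            my $sth = $dbh->prepare($sql);
            $sth->execute;
            my $aref = $sth->fetchall_arrayref;   # whole result set at once
        },
        bind_cols => sub {
            my $sth = $dbh->prepare($sql);
            $sth->execute;
            $sth->bind_columns(\my ($c1, $c2));   # fetch into bound vars
            1 while $sth->fetch;
        },
    });
    ```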

Re: bind_columns OR fetchall_arrayref
by punch_card_don (Curate) on Jan 04, 2005 at 14:51 UTC
    If you've indexed the 'fee' column, returning just 4,000 rows from MySQL will happen a lot faster than any client PC can render them, and if you don't specify every dimension of the HTML table, rendering will take even longer.

    If you aren't absolutely required by the project spec to produce specifically an HTML table, then you might consider using pre-formatted output instead:

    <pre> results here </pre>
    You just pad between columns with spaces. If the values in the columns have varying numbers of digits, it's pretty easy to pad with a variable number of spaces, using the same kind of technique as for returning fixed-length numbers with leading zeros.

    The browser will render the pre-formatted output in real time.
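    The padding can be done with sprintf rather than a regex; a small self-contained illustration (field width chosen arbitrarily):

    ```perl
    use strict;
    use warnings;

    # Pad each field to a fixed width so columns line up inside <pre>.
    # "%-12s" left-justifies each value in a 12-character field.
    sub format_row {
        my @fields = @_;
        return join '', map { sprintf '%-12s', $_ } @fields;
    }

    print "<pre>\n";
    print format_row('name',  'fee',  'total'), "\n";
    print format_row('alice', 'done', 42),      "\n";
    print "</pre>\n";
    ```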

    If you are absolutely bound to use an html table, at the very least you could use css for the td-element so that you don't have to send a new font tag in every table element.

    Forget that fear of gravity,
    Get a little savagery in your life.

Re: bind_columns OR fetchall_arrayref
by tanger (Scribe) on Jan 04, 2005 at 21:37 UTC
    hi, thanks everyone for replying.

    I don't have much fancy HTML, just the plain table tags. No images, no extra text, nada. Just table tags. :)

    I'm fetching about 2200 rows, with 10 columns, from 1 table, and it takes 12-16 seconds (I'm using a cable modem) to load up the HTML and have it displayed in the browser.

    -AT- first I was using fetchall_arrayref (until I learned that this way is the slowest and causes the most memory strain on the webserver).

    -THEN- I decided to use bind_columns with fetch. I also heard this way was the best from two of the tutorials on here (Files and Databases).

    -HOWEVER- whichever way I use, BOTH the fetchall_arrayref and bind_columns methods take about the same amount of time to display the results in the browser, again around 12-16 seconds.

    That is mainly why I decided to come here and post a question on this topic. I figure bind_columns is the fastest/most efficient way to do the job I'm trying to do, and that it takes the same amount of time as fetchall_arrayref because the delay is on the browser end; it still needs to wait for the '/table' tag.

    One question that I now have:
    would placing a max row option or a LIMIT in the SQL improve the efficiency/speed?

    thanks
    =perleager=
      Once again you are asking a classically wrong question. From your own observation:
      -HOWEVER- whichever way I use, BOTH the fetchall_arrayref and bind_columns methods take about the same amount of time to display the results in the browser, again around 12-16 seconds.

      So, why not use Benchmark and find out if one or the other is significantly faster? Either way, you are getting the same results. And either way, you are likely to discover that getting the data from the db takes just a fraction of a second, or a wee bit more. Most of the time is probably spent assembling the HTML, sending it via the server and the cable modem up to you, and then rendering it on your computer. So even if you manage to speed up a 200-microsecond job by 500% (the db work), you will still be stuck with 11-15 seconds of other grunt work.

      My advice: use Benchmark, adjust, and then quit worrying; or else put a throttle on the max rows returned and page the results n at a time.

      Good luck.

      Cable modem? Is this your garden-variety Comcast or Verizon account from your home? I don't think anybody gets great upload speed from a web server in that situation (I don't); you have to pay $ to get upload speed. Does the page load faster locally (not through the modem)? The bottleneck probably is not your code.
      One question that I now have: would placing a max row option or a LIMIT in the SQL improve the efficiency/speed?

      The bottleneck here is not fetching the data from the DB, but sending the resulting HTML to the browser and rendering it there. Limiting the number of rows you fetch will reduce the size of the HTML and make it a lot faster. Accessing the DB will also be faster, but whether that takes 0.2 secs instead of 0.3 secs is not relevant anyway.

      So if you can live with the page displaying only a part of the data at once (you can have many pages), go for it.