yaconsult has asked for the wisdom of the Perl Monks concerning the following question:

For the project I'm working on, I want users to be able to generate charts and graphs. I've finished all the backend code that parses humongous HTTP-style application log files (3 million+ lines/day) and inserts the fields of each log line into an SQLite database.

I did some experimenting with generating reports, and there seemed to be a lot of overhead (read: SLOW) when running very large numbers of queries from a script. I got much better results by letting SQLite do the work: grouping the query results by timestamp (epoch seconds) returns big arrays containing the data for an entire period, which I then process.

That extra processing step turned out not to be a problem, because I needed post-processing anyway: what wasn't happening, and therefore wasn't in the log, was also important. For example, when looking at load balancing across ports, I needed to know which ports had no activity while other ports were hammered, so my post-processing step fills in zeros in these cases.
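A minimal sketch of that zero-filling step, assuming the grouped query returns sparse [timestamp, port, count] rows and that the full list of ports is known up front. The port numbers, the sample rows and the name fill_zeros are made up for illustration, and the `//` operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: turn sparse [timestamp, port, count] rows into a
# dense table with an explicit 0 for every port that logged nothing.
sub fill_zeros {
    my ($rows, $ports) = @_;
    my %seen;
    $seen{ $_->[0] }{ $_->[1] } = $_->[2] for @$rows;

    my @filled;
    for my $ts (sort { $a <=> $b } keys %seen) {
        # One output row per timestamp: the timestamp, then one count per port.
        push @filled, [ $ts, map { $seen{$ts}{$_} // 0 } @$ports ];
    }
    return \@filled;
}

# Assumed sample data: port 8082 is idle in both seconds.
my @ports = (8080, 8081, 8082);
my @rows  = (
    [ 1249500000, 8080, 17 ],
    [ 1249500000, 8081,  3 ],
    [ 1249500001, 8080, 21 ],
);

my $filled = fill_zeros(\@rows, \@ports);
print join(',', @$_), "\n" for @$filled;
```

Note that this only fills idle ports for timestamps that appear in the results at all; seconds with no activity on any port would need the timestamp range walked explicitly.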

I output CSV files which the concerned people load into their favorite spreadsheet for graphing. This has worked well, but now I'm looking at eliminating that last part and automating the report generation. I know I could generate the spreadsheets and have done so before, but spreadsheets aren't really needed as the data will not be changed. And working with huge spreadsheets gets cumbersome. We're just using them as a tool for displaying the data.

I experimented with GD::Graph yesterday and generated GIFs from the CSVs.

An important factor is that I'm going to be done with this project at the end of this month and it will have to be maintained and modified by people with probably limited programming experience.

I'm envisioning something that can be run to open an SQLite database containing the log for one 24-hour period (spanning two dates).

What would be great would be if the user could select the results they want from some kind of pulldown or checkbox. This would determine the query to be run. Maybe they would then be able to select the start and end times within the log that they were interested in. And the result would be either a graph or table depending on the query.

I don't know if the results should be displayed in some kind of interface or produce output files. I suppose the simple way would be to just have a lot of script options that could be specified on the command line.
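For the command-line route, a minimal sketch using the core Getopt::Long module could look like the following. The option names, the defaults and the parse_options helper are all hypothetical; they only mirror the choices described above (which database, which report, start and end times, output format).

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option names; the defaults are placeholders.
sub parse_options {
    my @args = @_;
    my %opt = (
        db     => 'logs.sqlite',
        report => 'port-load',
        format => 'csv',
    );
    # With a hash reference as the first option argument, Getopt::Long
    # stores each parsed value in that hash under the option name.
    GetOptionsFromArray(
        \@args, \%opt,
        'db=s', 'report=s', 'start=i', 'end=i', 'format=s',
    ) or die "usage: $0 [--db FILE] [--report NAME] [--start EPOCH] [--end EPOCH] [--format csv|png]\n";
    return \%opt;
}

my $opt = parse_options(@ARGV);
print "$_=$opt->{$_}\n" for sort keys %$opt;
```

A script built this way can later be called from a CGI or GUI wrapper without changing the reporting code itself.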

So, I've got the Perl Template Toolkit book on my desk but haven't had time to go through it yet. There's an available web server that's used to display Nagios output, so I suppose I could use a web framework like Catalyst, but that might not be the best approach since I won't be around to take care of it.

So, how would you approach this? Web framework? CGI? Web pages from TT? Tk something? Or would you just have a bunch of script arguments to specify input, processing and output?

Thanks!

Replies are listed 'Best First'.
Re: Generating graphs and tables for admins - templates, frameworks, files or what?
by leighsharpe (Monk) on Aug 06, 2009 at 03:26 UTC
    I would go (and have gone) the route of CGI with either HTML::Template or TT, using GD::Graph to produce the graphs. You may even want to try CGI::Application in conjunction with a templating system.
    "because what wasn't happening, and therefore wasn't in the log, was also important"
    This can probably be done in SQL as well, making the db do all the work. MySQL provides the IFNULL(colname, 0) function, and SQLite has an equivalent: ifnull(), or the standard coalesce().
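A sketch of how that might look with DBI and DBD::SQLite, under some loud assumptions: the tables "ports" and "hits" and their columns are invented for the example, and a separate table listing every expected port is needed, because ifnull() can only replace a NULL in a row that exists. The LEFT JOIN is what produces a row for the idle port in the first place.

```perl
use strict;
use warnings;
use DBI;

# In-memory database with made-up tables: "ports" lists every port we
# expect, "hits" holds one row per parsed log line.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });
$dbh->do('CREATE TABLE ports (port INTEGER PRIMARY KEY)');
$dbh->do('CREATE TABLE hits  (ts INTEGER, port INTEGER, bytes INTEGER)');

my $add_port = $dbh->prepare('INSERT INTO ports (port) VALUES (?)');
$add_port->execute($_) for 8080, 8081, 8082;

my $add_hit = $dbh->prepare('INSERT INTO hits (ts, port, bytes) VALUES (?, ?, ?)');
$add_hit->execute(@$_) for [100, 8080, 100], [100, 8080, 200], [101, 8081, 50];

# LEFT JOIN keeps ports with no matching hits; SUM() over no rows is
# NULL, which ifnull() turns into 0.
my $rows = $dbh->selectall_arrayref(<<'SQL', undef, 100, 101);
SELECT p.port, ifnull(SUM(h.bytes), 0) AS total_bytes
  FROM ports p
  LEFT JOIN hits h ON h.port = p.port AND h.ts BETWEEN ? AND ?
 GROUP BY p.port
 ORDER BY p.port
SQL

print "$_->[0]: $_->[1]\n" for @$rows;
```

Whether this beats zero-filling in Perl depends on the data; with millions of rows the join may or may not be faster than post-processing, so it is worth timing both.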
    Your graph can then be produced with something like:
        # Your SQL here.
        my (@timeline, @data1, @data2, @data3);
        while (my ($timestamp, $field1, $field2, $field3) = $sth->fetchrow_array()) {
            push @timeline, $timestamp;
            push @data1, $field1;
            push @data2, $field2;
            push @data3, $field3;
        }
    To assemble the data into arrays for graphing, and:
        use GD::Graph::mixed;

        # $x_size, $height, $title, $x_skip, $y_max and y_format() come from the surrounding script.
        my $graph = GD::Graph::mixed->new($x_size, $height);
        my @data = ([@timeline], [@data1], [@data2], [@data3]);
        $graph->set(
            y_label => 'Your graph title',
            title => $title,
            line_width => 2,
            line_types => [1, 1],
            skip_undef => 1,
            box_axis => 1,
            correct_width => 1,
            fgclr => 'black',
            legendclr => 'black',
            x_label_skip => $x_skip,
        ) or die $graph->error;
        $graph->set_text_clr('black');
        $graph->set(y_number_format => \&y_format);
        $graph->set_title_font(GD::gdSmallFont, 12);
        $graph->set(types => ['area', 'lines', 'lines']);
        $graph->set(dclrs => ['pink', 'blue', 'green']);
        $graph->set_legend('your legend 1', 'your legend 2', 'your legend 3');
        $graph->set(y_max_value => $y_max);
        $graph->set(y_min_value => 0.1);

        my $gd = $graph->plot(\@data) or die $graph->error;
        binmode STDOUT;
        print $gd->jpeg(100);
    to graph it. Wrap it all in a CGI script where you can input the necessary values (start time, end time, what kind of data you want, etc.).
Re: Generating graphs and tables for admins - templates, frameworks, files or what?
by jrsimmon (Hermit) on Aug 06, 2009 at 05:43 UTC
    I have found DBIx::Chart to be an excellent tool for generating graphs from a db. It integrates nicely with a CGI script as well.
Re: Generating graphs and tables for admins - templates, frameworks, files or what?
by pileofrogs (Priest) on Aug 06, 2009 at 16:51 UTC

    Because of your deadline, and because others with limited programming experience will have to maintain it, I'd recommend the old Unix maxim: make lots of small tools that each do one job well.

    Make a thing that generates the CSV files. Then make a thing that takes a CSV and makes a graph. Then make a thing that gives users a nice interface that lets them choose which CSV to make into a graph. There are many ways you could break it up.
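As a sketch of the middle tool's input side, the helper below reads a CSV and returns one array per column, which is the layout GD::Graph's plot() takes (x values first, then one array per data set). The name csv_to_columns and the sample data are hypothetical, and it assumes a simple CSV with a header row and no quoted fields or embedded commas.

```perl
use strict;
use warnings;

# Hypothetical helper for the "CSV in, graph data out" tool. Assumes a
# simple CSV: header row first, no quoted fields or embedded commas.
sub csv_to_columns {
    my ($fh) = @_;
    my $header = <$fh>;
    return ([], []) unless defined $header;
    chomp $header;
    my @names   = split /,/, $header;
    my @columns = map { [] } @names;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
        push @{ $columns[$_] }, $fields[$_] for 0 .. $#names;
    }
    return (\@names, \@columns);
}

# Demo input from an in-memory filehandle; a real tool would read STDIN.
my $csv = "timestamp,port8080,port8081\n1249500000,17,3\n1249500001,21,0\n";
open my $fh, '<', \$csv or die "cannot open in-memory file: $!";
my ($names, $columns) = csv_to_columns($fh);
print "$names->[$_]: @{ $columns->[$_] }\n" for 0 .. $#$names;
```

Keeping the parsing in its own function means the same code can sit behind a command-line script today and a web front end later.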

    With this approach, if the user interface bit fails for some reason, your successors can fall back on the command-line tools. You or your successors can make multiple user interfaces (one GUI, one web, etc.) without duplicating the effort of making the data-to-CSV and CSV-to-graph bits. This last point is especially nice if the person making the new user interface isn't you.

    --Pileofrogs

Re: Generating graphs and tables for admins - templates, frameworks, files or what?
by tmiklas (Hermit) on Aug 07, 2009 at 09:41 UTC
    Hi

    I have tried different approaches, and the one that worked best for me was a simple web page with reports, dropdowns, etc. - exactly as you said. To generate those I use scripts as big as needed, not bigger.

    Examples:
    #1 - For an online radio show we run, I gather data with one script that appends lines to a CSV file, then another one (plain CGI) pulls the data out and plots a graph using Chart::Lines. It's used several times a week, so I can live with plain old CGI :-)
    #2 - For our in-house cluster I wrote a simple web management/reporting tool using Catalyst (also to finally learn Catalyst a bit and move away from using CGI everywhere), adding our own Model and TT for the View. For plotting graphs I used Google's Chart API, simply generating URLs in Perl and inserting them into the template. Simple, yet very effective.
    #3 - The reporting system for our key corporate application (work in progress as we speak) will be done the same way, except that the graphs may be generated internally, without 'leaking' data to Google (some of it may be too sensitive, even if it is just plain stats).

    The only downside of using Google's Chart API (with or without any of the Perl modules on CPAN) is that you need Internet access to see the graph, but IMHO it's much easier to maintain, especially for non-developers (that's subjective, of course). Templates are easy: you can define everything in the template, including the URLs for graphs, so that only the values come from the Controller, and it all works well. One more thing: you have to be comfortable with Google 'seeing' your data :-) I would say have a look at Google's Chart API; at least you'll have something more to choose from.
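To illustrate the "generate URLs in Perl" idea, here is a rough sketch of a URL builder. Everything specific in it is an assumption rather than something taken from the post: the endpoint and the cht/chs/chd/chtt parameter names follow the image-based Google Chart API as it was documented around that time, the helper names chart_url and escape are made up, and the service may no longer be available, so check before relying on it.

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved characters plus ':' and
# ',', which the data parameter uses as separators.
sub escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~:,])/sprintf '%%%02X', ord $1/ge;
    return $s;
}

# Hypothetical helper: build an image-chart URL from a list of values.
# Endpoint and parameter names are assumptions (see the text above).
sub chart_url {
    my (%arg) = @_;
    my %param = (
        cht  => 'lc',                                   # line chart
        chs  => "$arg{width}x$arg{height}",             # size in pixels
        chd  => 't:' . join(',', @{ $arg{values} }),    # text-encoded data
        chtt => $arg{title},                            # chart title
    );
    my $query = join '&', map { "$_=" . escape($param{$_}) } sort keys %param;
    return "http://chart.apis.google.com/chart?$query";
}

my $url = chart_url(
    width  => 400,
    height => 200,
    values => [10, 40, 25],
    title  => 'Hits per hour',
);
print "$url\n";
```

The resulting string would go into the template as the src of an img tag, so the Controller only has to supply the values.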

    BTW - I may be totally wrong, so please look for other advice too, but the approach above works well for me.
    Greetz, Tom.