bigbot has asked for the wisdom of the Perl Monks concerning the following question:

I've been struggling with this issue for a while, and basically I'm trying to output a lot of data from Perl to a less screen so that users can scroll through it. Before that even takes place I'm doing an SQL query and returning 28 fields, which I then load into an array so that I can split and parse these fields into individual variables. Some of them contain newlines which is why I do the split/splice thing. While I'd like to do this in memory for maximum performance, sometimes the output can be up to a million lines or more, and this uses a ridiculous amount of memory the way I'm doing it. Currently I'm using this method to query a DB and later output from Perl:
my @results = split(/dbDelimeter/,`SQL_Query`); while(@results) { (var1,var2,var3,etc) = splice(@results, 0, 28); $var1 =~ s/\n//g; $var2 =~ s/oldDateFormat/newDateFormat/; $var3 =~ s/moreReplacements/DoneHere/; $output = sprintf (%-10s %-10s %10s,$var1,$var2,$var3); $finalOutput .= $output; } open LESS, '|less -cSRM' or die $!; print LESS $finalOutput; close LESS;
I wanted to try using a temp file to be safe and not blow up the memory on the server. Then I ran into even more issues. I tried using an "anonymous file handle" because I thought it would be nice to have Perl handle the cleanup. Couldn't get that to work:
open(my $data, "+>", undef) or die; $data = `SQL_Query`; ## Method 1 print $data `SQL_Query`; ## Method 2 ## processing to parse and format data system("less $data"); ## Output: None / Doesn't Work print $data; ## Output: GLOB(0x1c8ef430)
It would be great if somebody could show me different, efficient ways to output data to a less screen, both using memory and the disk if possible. Essentially I want to store all this data on the disk because of the excessive memory usage (unless I'm doing something wrong to use so much memory..) Thanks!

Replies are listed 'Best First'.
Re: Print to Less Screen
by ikegami (Patriarch) on Jun 04, 2011 at 05:48 UTC

    How about

    open LESS, '|less -cSRM' or die $!; my @results = split(/dbDelimeter/,`SQL_Query`); while(@results) { (var1,var2,var3,etc) = splice(@results, 0, 28); $var1 =~ s/\n//g; $var2 =~ s/oldDateFormat/newDateFormat/; $var3 =~ s/moreReplacements/DoneHere/; print LESS sprintf (%-10s %-10s %10s,$var1,$var2,$var3); } close LESS;

    Update: I just noticed that still reads everything into memory. You want something more like

    open my $query_fh, '-|', 'SQL_Query' or die $!; open my $less_fh, '|-', 'less -cSRM' or die $!; while (<$query_fh>) { ... print $less_fh sprintf (%-10s %-10s %10s,$var1,$var2,$var3); } close $less_fh;
      Wow that's like a whole new way of looking at it. I'll try it now, thanks for the quick reply!
      Wow this Perl piping stuff is mind boggling. I'm looking at the perldoc for "Using open() for IPC".

      My apologies for being all over the place yesterday. I would like to try to better define my problem and what I'm looking to do (the less screen is only a small part of it and really I should have picked a better subject). Here is the program and how it should process:

      1) Take user input and pass it to an external program which does a database query using an SQL statement.

      2) Results from query can be extremely large, so storing them in memory doesn't seem to be a good idea. Either store in a temp file, or pipe the input into the program (if possible).

      3) Do processing on results. Specifically, sorting in different user-defined ways, and regex find/replaces on about half of the fields to clean them up. Also I need to format the results for a decent visual output (which I do using printf).

      4) Output processed data to a less screen for the user. Again, this data can be very large, so I would need to store it in another temp file?

      If anyone can give me a general idea of the best way they think I can do this, I would greatly appreciate it!

      Currently my thoughts are that I need to store the data in a temp file in order to do a sort (couldn't pipe it in?) I would use the Perl module File::Temp and store two temp files- One for the DB results, and one for the results of the sort, regex replacements, and printf formatting.

        A BerkeleyDB tied hash temp file might be a good choice. Your hash key would be your sort criteria, the value would ideally be the regexp replaced fields, unless you really need the original data for something else.

        Once you're done reading/storing the SQL results, it's a simple matter to read the hash back in sorted order, doing the printf formatting and less piping then.

        You might also consider saving aside the BerkeleyDB temp files as an expiring cache if you get a lot of common queries that don't necessarily need absolutely up-to-date data, and/or user can specify when they do need the latest data. You're doing a lot of I/O here, it's not gonna be fast. Some users might be willing to trade data currency for response time.

        Is there any specific reason you're not making the SQL query directly from perl? Assuming its possible, that would likely reduce the complexity a great deal, and increase the reliability. Avoiding data problems introduced by parsing raw text output from another program is always good.

        --Dave

Re: Print to Less Screen
by Jenda (Abbot) on Jun 04, 2011 at 10:48 UTC

    IMnsHO, it's the user's business to pipe or not the output of your program to a pager of his/her own choice. What if the user wants to redirect the results to a file? How do you handle that?

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      I don't disagree but nobody has complained so far. The data is pretty much impossible to look at without less. I could certainly add an option for raw output.

      The bigger issue is that I'm still loading the results entirely into memory and this is a problem when a large amount of data is returned. I still can't figure out how to use a file handle in combination with split (which I need in order to parse the DB output). Really the only thing I can think to do is just create a few temp files myself and manage them manually because it just seems to complicated to get these various methods to work.

        Let the users run the script via a shell script that pipes its output to less.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.