in reply to Using FastCGI

FastCGI can help a lot, but where it shines is when you have many requests over a short period of time (i.e., high traffic). Every time a CGI script (done the old-fashioned way) is invoked, the Perl interpreter fires up, loads all the modules, and runs your script. That startup time can dwarf the actual run time of a trivial script. FastCGI improves this situation dramatically (as do mod_perl and webserver API integration), because the interpreter and your modules stay loaded between requests.
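To make the difference concrete, here's a minimal FastCGI skeleton (a sketch, not your script): everything above the accept loop compiles and runs once per process, and only the loop body runs per request.

use strict;
use warnings;
use FCGI;

# Everything up here happens once per process: module loading, setup, etc.
my $requests_served = 0;    # persists between requests, proving the process lives on

while ( FCGI::accept() >= 0 ) {
    # Only this body runs for each incoming request.
    $requests_served++;
    print "Content-type: text/plain\r\n\r\n";
    print "Request $requests_served served by PID $$\n";
}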

But there's only so much improvement you can get there. You next have to start looking at the algorithms. Anywhere you find yourself writing nested loops, or several sequential loops over the same data set, ask whether there's a better way to do it. Profiling is the first step toward improving code that's already written, but even before that comes planning and composing efficient code in the first place.
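For example (a sketch with made-up lists): finding which words of one list appear in another. The nested-loop version does one comparison per pair of items; indexing one list in a hash first reduces that to two linear passes.

use strict;
use warnings;

my @needles  = qw(move moves moved moving);
my @haystack = qw(he moved the box and moves it again);

# Nested-loop version: O(n*m) comparisons.
my @slow;
for my $needle (@needles) {
    for my $word (@haystack) {
        push @slow, $needle if $needle eq $word;
    }
}

# Hash version: one pass to index, one pass to test -- O(n+m).
my %in_haystack = map { $_ => 1 } @haystack;
my @fast = grep { $in_haystack{$_} } @needles;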

I found a few areas where you could eliminate sequential loops, but I would need to know what goes on in the regexp-comparison loop to see whether there's room for further efficiency gains. And without profiling, it's very difficult to know where to focus attention.

#!/usr/bin/perl -wT
# What is the T for in -wT?
use strict;
use CGI qw(:standard);
use FCGI;
use File::Find;

require '/Users/jon/Desktop/stanford-postagger-full-2011-04-20/verbTenseChanger.pl';

my $search_key = "move";

# --- Different forms of the Searchword --- #
# I made a refinement here that eliminated a step.
my @verbforms = (
    $search_key,
    map { changeVerbForm( $search_key, 0, $_ ) || (); } 1 .. 4
);

my $category_id = 'subj';

# --- Variables for required info from parser --- #
my ( $chapternumber, $sentencenumber, $sentence,
     $grammar_relation, $argument1, $argument2 );

my @all_matches;    ## RESULTS OF SEARCH

my $dir = '/Users/jon/Desktop/stanford-postagger-full-2011-04-20/';
opendir( my $dh, $dir ) or die $!;    # Use a lexical directory handle.

# I made a change here where the first grep handles both tests.
# That eliminates one loop.
my @files = map { "$dir/$_" } grep { -f && /^parsed.*\.txt$/ } readdir($dh);

while ( FCGI::accept >= 0 ) {
    local $/ = 'Parsing';    # Why slurp/split when we can read it by 'Parsing' record?
    print header();
    print start_html();
    foreach my $file (@files) {
        open my $parse_corpus_fh, '<', $file or die $!;
        # Eliminate slurp, join, split (3 implicit loops),
        # replaced by one 'while' loop.
        while ( my $sentblock = <$parse_corpus_fh> ) {
            chomp $sentblock;
            if ( $sentblock =~ /file: \s(\S+)\.txt/ ) {
                $chapternumber = $1;
            }
            foreach my $verbform (@verbforms) {
                # blah blah
                # I don't know what you put here.
                # Here is an opportunity to print per verbform.
            }
            # You may have had stuff here too.
            # Here is an opportunity to print per record.
        }
        # Here is an opportunity to print per file.
    }
    # Here is an opportunity to print per FCGI iteration.
    print "</ol><br>";
    print end_html();
}

This is untested, since I don't know what to fill into the blanks, but it has eliminated a few sequential and nested loops; see the comments for where they've been refined. It's hard to know what impact it will have without knowing the size of the data sets, the number of files handled, what happens inside the regexp-matching loop, and so on. But it could be a step in the right direction.
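The single biggest trick in there is the input record separator: setting $/ to 'Parsing' makes each read from the filehandle return one whole parser block instead of one line. A standalone sketch of the idea (the filename is hypothetical):

use strict;
use warnings;

open my $fh, '<', 'parsed_chapter1.txt' or die $!;    # hypothetical file

{
    local $/ = 'Parsing';            # records now end at the string 'Parsing'
    while ( my $record = <$fh> ) {
        chomp $record;               # chomp strips the trailing 'Parsing', not "\n"
        # ... process one whole parser block here ...
    }
}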

By the way, the perl -T switch is for Taint Mode, which is described briefly in perlrun.
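In short, under -T any value that comes from outside the program (CGI parameters, %ENV, file input) is marked tainted and will not be allowed into shell commands or other unsafe operations until you launder it through a regex capture. A sketch (the environment variable is just an example):

#!/usr/bin/perl -T
use strict;
use warnings;

$ENV{PATH} = '/bin:/usr/bin';          # -T also insists on a trusted PATH

my $name = $ENV{LOGNAME} // '';        # tainted: it came from outside

# system("echo $name");                # this would die under -T: $name is tainted

my ($safe) = $name =~ /^(\w+)$/        # capturing through a regex untaints
    or die "Suspicious LOGNAME\n";
system( 'echo', $safe ) == 0 or die "echo failed\n";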


Dave

Re^2: Using FastCGI
by jonc (Beadle) on Jun 15, 2011 at 04:40 UTC

    Oh, well this was an amazing help. Should I include my full code? It seemed like a lot to post, so I omitted it.

    grep { -f && /^parsed.*\.txt$/ } caused an error, so I used grep { '-f' && /^parsed.*\.txt$/ } with single quotes. Until this is tested, I just won't check whether it's a file.

    Your post definitely gave me hope that I can find a way to optimize the rest of the script. I was really impressed by the use of $/ to eliminate so much. I hope I can begin to think that way. It may help to amalgamate all the files into one beforehand to avoid that one loop.

    Thanks!

      What error did it cause? I just tested the following code:

      my @found = grep { -f && /\.pl/ } @array;

      ... and it did work.

      You are probably not running the script with $dir as your current working directory, so the bare filenames readdir returns fail the -f test. If you have proper permissions, you can 'chdir', or you could build the full path before testing, like this:

      my @files = grep { -f && /^parsed.*\.txt$/ } map { "$dir/$_" } readdir($dh);

      A subtle change. From an efficiency standpoint it's *slightly* worse, as map now acts on every item readdir returns. But on the other hand, it works when $dir is not your current directory. Besides, even if your directory has a thousand files in it, the map and grep aren't costing you much time. Again, this is where profiling comes in. :)
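      If you'd rather take the chdir route I mentioned, here's a sketch: once the current directory is $dir, the bare names readdir returns resolve correctly for -f and for later open calls.

      chdir $dir or die "Cannot chdir to $dir: $!";
      opendir my $dh, '.' or die $!;
      my @files = grep { -f && /^parsed.*\.txt$/ } readdir $dh;
      closedir $dh;
      # @files now holds bare names, which open() will find since we're in $dir.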

      Look at the CPAN module Devel::Profile.
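      Per its documentation, you load it with perl's -d switch and it writes a per-subroutine timing report to prof.out (yourscript.pl stands in for whatever you're profiling):

      perl -d:Profile yourscript.pl
      less prof.out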


      Dave

        Well, the issue is that the screen prints blank or an internal error. I guess one big problem I'm having is that I run the script on a local server to check it. Is there a way to use the Terminal? I tried adding '-debug' at the end of use CGI, but that didn't work. I've also heard of the error log, but I'm not sure how to easily see it.

        The files are in a different directory; the script is in my cgi-bin. I guess -f only works relative to the current directory?

        Thanks for helping me understand this, Dave