
FastCGI can help a lot, but it shines when you have many requests over a short period of time (i.e., high traffic). Every time a CGI script is invoked the old-fashioned way, the Perl interpreter starts up, loads all the modules, and compiles and runs your script. For a trivial script, that startup time can exceed the time spent doing the actual work. FastCGI improves this situation dramatically (as do mod_perl and web-server API integration) by keeping one interpreter process alive across many requests.
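To make the startup-cost argument concrete, here is a minimal sketch of the "initialize once, serve many requests" pattern that a persistent FastCGI process allows. It does not use FCGI.pm itself: the array `@fake_requests` and the sub `expensive_setup` are hypothetical stand-ins, where a real FastCGI script would loop on `while ($request->Accept() >= 0) { ... }` instead of shifting fake requests off an array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Counts how many times the costly setup runs.
my $init_count = 0;

# Hypothetical stand-in for expensive per-process setup (loading modules,
# reading config, precompiling patterns). Under plain CGI this work would
# repeat on every hit; in a persistent process it runs once.
sub expensive_setup {
    $init_count++;
    return { pattern => qr/\bmov(?:e|es|ed|ing)\b/ };
}

my $state = expensive_setup();

# Hypothetical request queue standing in for the FastCGI accept loop.
my @fake_requests = ( 'we move on', 'nothing here', 'it moved' );
my @responses;
my $served = 0;

while ( defined( my $req = shift @fake_requests ) ) {
    $served++;
    # Reuses the state built once above; nothing is reinitialized here.
    my $hit = $req =~ $state->{pattern} ? 'match' : 'no match';
    push @responses, "request $served: $hit";
}

print "$_\n" for @responses;
print "setup ran $init_count time(s) for $served request(s)\n";
```

The point of the design is that the setup cost is paid once per process rather than once per request, so its share of the total shrinks as traffic rises.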

But there's only so much improvement to be had there. Next you have to look at the algorithms. Anywhere you find yourself writing nested loops, or multiple sequential loops over the same data set, ask whether there's a better way to do it. Profiling is the first step toward improving code that is already written, but even before that step comes planning and writing efficient code in the first place.
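One common "better way" for nested loops is replacing an inner search loop with a hash lookup. The sketch below uses made-up data (`@corpus`, `@targets`) and hypothetical sub names; it counts how many corpus words appear in a target list, first with nested loops (roughly N times M comparisons) and then with a hash built once (roughly N plus M operations).

```perl
use strict;
use warnings;

# Hypothetical data: words from a corpus and a set of target words.
my @corpus  = qw(we move the box then moved it and keep moving);
my @targets = qw(move moved moving moves);

# Nested loops: each corpus word is compared against every target
# until a match is found.
sub count_nested {
    my ( $words, $wanted ) = @_;
    my $hits = 0;
    for my $w (@$words) {
        for my $t (@$wanted) {
            if ( $w eq $t ) { $hits++; last }
        }
    }
    return $hits;
}

# Hash lookup: build the set once, then do one average-constant-time
# lookup per corpus word; the inner loop disappears.
sub count_hashed {
    my ( $words, $wanted ) = @_;
    my %is_target = map { $_ => 1 } @$wanted;
    return scalar grep { $is_target{$_} } @$words;
}

print count_nested( \@corpus, \@targets ), "\n";
print count_hashed( \@corpus, \@targets ), "\n";
```

Both subs give the same answer; the difference only shows up as the lists grow, which is why knowing the data sizes matters.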

I found a few places where you could eliminate sequential loops, but I would need to know what goes on in the regexp-comparison loop to see whether there's room for further improvement. And without profiling, it's very difficult to know where to focus attention.
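If the regexp-comparison loop simply tests each record once per verb form, one possible refinement is to fold all the forms into a single precompiled alternation so each record is scanned once. This is a sketch under that assumption only: the contents of the real loop are unknown, the hard-coded `@verbforms` list stands in for what `changeVerbForm()` returns, and `find_forms` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical verb forms; the real script builds these with changeVerbForm().
my @verbforms = qw(move moves moved moving);

# One alternation, built and compiled once, instead of one match per form.
# quotemeta guards against metacharacters; longer forms are listed first.
my $alt     = join '|', map { quotemeta } sort { length $b <=> length $a } @verbforms;
my $verb_re = qr/\b($alt)\b/;

# Returns every verb form found in the record, in order of appearance.
sub find_forms {
    my ($record) = @_;
    my @found = $record =~ /$verb_re/g;
    return @found;
}

my @found = find_forms('He moved the box, then moves it again.');
print "@found\n";
```

If the original loop does different work for each form, this consolidation would not apply as-is; the captured `$1` does, however, tell you which form matched.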

#!/usr/bin/perl -wT
#What is the T for in -wT?
use strict;
use CGI qw(:standard);
use FCGI;
use File::Find;
require '/Users/jon/Desktop/stanford-postagger-full-2011-04-20/verbTenseChanger.pl';

my $search_key = "move";

# --- Different forms of the Searchword --- #
# I made a refinement here that eliminated a step.
my @verbforms = ( 
    $search_key, 
    map { changeVerbForm( $search_key, 0, $_ ) || (); } 1 .. 4
);

my $category_id = 'subj';

# --- Variables for required info from parser --- #
my ( 
        $chapternumber, 
        $sentencenumber, 
        $sentence, 
        $grammar_relation,
        $argument1, 
        $argument2 
);

my @all_matches; ## RESULTS OF SEARCH

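my $dir = '/Users/jon/Desktop/stanford-postagger-full-2011-04-20';

# Use a lexical directory handle.
opendir( my $dh, $dir ) or die "Cannot open directory $dir: $!";

# I made a change here where the first grep handles both tests.
# That eliminates one loop. readdir returns bare names, so -f needs the
# full path; a bare -f would test relative to the current directory.
my @files =
    map  { "$dir/$_" }
    grep { /^parsed.*\.txt$/ && -f "$dir/$_" }
    readdir($dh);
closedir($dh);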

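# FCGI::Request and its Accept method are the documented FCGI.pm
# interface; the older FCGI::accept function is deprecated.
my $request = FCGI::Request();
while ( $request->Accept() >= 0 ) {
    # Why slurp/split when we can read each file by 'Parsing' record?
    local $/ = 'Parsing';
    print header();
    print start_html();
    print "<ol>\n";    # closed by the "</ol>" printed after the file loop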
    foreach my $file ( @files ) {
        open my $parse_corpus_fh, '<', $file
            or die "Cannot open $file: $!";
        # Eliminate slurp, join, split (3 implicit loops)
        # replaced by one 'while' loop.
        while ( my $sentblock = <$parse_corpus_fh> ) {
            chomp $sentblock;
            # "file: \s" demanded two whitespace characters; \s+ accepts one or more.
            if ( $sentblock =~ /file:\s+(\S+)\.txt/ ) {
                $chapternumber = $1;
            }
            foreach my $verbform( @verbforms ) { 
                #  blah blah
                #  I don't know what you put here.
                #  Here is an opportunity to print per verbform.
            }
            # You may have had stuff here too.
            # Here is an opportunity to print per record.
        }
        #Here is an opportunity to print per file.
    }
    # Here is an opportunity to print per FCGI iteration.
    print "</ol><br>";
    print end_html();
}

This is untested, since I don't know what to fill into the blanks, but it eliminates several sequential or nested loops; see the comments for where loops have been refined. It's hard to predict the impact without knowing the size of the data sets, the number of files handled, what happens inside the regexp-matching loop, and so on. But it could be a start in the right direction.
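The biggest single change above is reading each file one 'Parsing'-delimited record at a time instead of slurping, joining and splitting. The sketch below shows that technique in isolation; the sample text is invented (the real parser output's layout is an assumption), `chapters_from` is a hypothetical helper, and an in-memory filehandle replaces the real files so the example is self-contained.

```perl
use strict;
use warnings;

# Invented sample of parser output; the real file layout is unknown.
my $text = "header\nParsing file: ch01.txt\nsentence one\n"
         . "Parsing file: ch02.txt\nsentence two\n";

# Reads the data record by record, with 'Parsing' as the record separator,
# and collects the chapter names found in each record.
sub chapters_from {
    my ($data) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    local $/ = 'Parsing';    # restored automatically when the sub returns
    my @chapters;
    while ( my $block = <$fh> ) {
        chomp $block;        # chomp removes the trailing $/ ('Parsing')
        push @chapters, $1 if $block =~ /file:\s+(\S+)\.txt/;
    }
    close $fh;
    return @chapters;
}

print join( ',', chapters_from($text) ), "\n";
```

Because `$/` is localized inside the sub, other code that reads lines keeps the normal newline separator.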

By the way, Perl's -T switch enables taint mode, which is described briefly in perlrun and in detail in perlsec.
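In taint mode, data from outside the program (CGI parameters, environment variables, `readdir` results) is marked tainted and cannot be used in operations such as running shell commands or writing files until it is "laundered", conventionally by capturing the trusted part through a regex. A minimal sketch follows; the whitelist pattern and `untaint_filename` are hypothetical, and the example runs without -T so it only demonstrates the validation logic.

```perl
use strict;
use warnings;

# Launders a file name by regex capture: under -T, $1 is not tainted.
# The allowed-name pattern below is a hypothetical whitelist policy.
sub untaint_filename {
    my ($name) = @_;
    return unless defined $name;
    if ( $name =~ /\A(parsed[\w.-]*\.txt)\z/ ) {
        return $1;    # untainted copy of the validated name
    }
    return;           # reject anything outside the whitelist
}

for my $candidate ( 'parsed_ch01.txt', '../etc/passwd', 'parsed;rm -rf.txt' ) {
    my $clean = untaint_filename($candidate);
    printf "%-20s => %s\n", $candidate, defined $clean ? $clean : 'rejected';
}
```

Under `perl -T`, `Scalar::Util::tainted` on the returned value would report false; the regex should describe what is allowed rather than what is forbidden, since anything unexpected is then rejected by default.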

Dave

In reply to Re: Using FastCGI by davido
in thread Using FastCGI by jonc
