mkahn has asked for the wisdom of the Perl Monks concerning the following question:

Gracious, benevolent monks:

I've got a script that amalgamates job listings from different categories of a website and dumps them out. It worked fine at my old ISP, and it works at my new ISP when run from the command line. Run from a browser, though, the output is truncated (it just stops printing at some point, which makes it useless), and no errors are reported.

The platform was Linux and is now FreeBSD. I figured it's a CGI thing, but I've been through Ovid's tutorial, and even understood some of it.

#!/usr/bin/perl -w
use strict;
use CGI;

# get joblinks
# Check for recent jobs in 5 categories and return links
# check for a few start and end dates
# category anchors, to avoid icky month transformations

my $query = CGI->new();
my @begindate = split (/ /, scalar localtime);
my ($Q,$dow,$mon,$day,$hh_mm_ss,$tz,$yyyy) = @begindate;
my $endday = ($day - 6);
my $base = 'http://www.craigslist.org';
my @categories = qw/eng sad tch art bus nby\/apa nby\/roo/;

print $query->header( "text/html" ),
      $query->start_html(-title   => "CraigsList Distillate",
                         -bgcolor => "#ffffaa" ),
      $query->h1( "GetJobs.pl" );
print "<a name = top >";
foreach (@categories) {
    print "<a href = \#$_>$_ </a> &nbsp;";
}

for my $i (0 .. $#categories) {
    open HANDLE_IN, "lynx -source http://www.craigslist.org/sfo/$categories[$i] |"
        or die "can't open HANDLE_IN $!";
    my $grab = 0;
    while (<HANDLE_IN>) {
        my $newdate = &newdate($_);
        next unless(($newdate) && (/bgcolor/)) || ($grab ne 0);
        print "<a name = $categories[$i]><br><b> $categories[$i] </b> <BR> <a href = \#top>Top </a>" if $grab == 0;
        s|href=\/|href=$base\/|ig;
        $grab++;
        my $olddate = &olddate($_);
        print unless $olddate;
        last if $olddate;
    }
    close HANDLE_IN;
}
print $query->end_html;

sub olddate {
    #is the date more than a few days old?
    return () unless /\s([\w]{3}\s[\d]+)[\w]{2}/;
    my ($mon, $dd) = split (/ /, $1);
    my $test = $dd - $endday;
    $test <= 0 ? (1) : ();
}

sub newdate {
    #is the date less than a few days old?
    return () unless /([\w]{3}\s[\d]+)/;
    my ($mon, $qd) = split (/ /, $1);
    my $test = $qd - $endday;
    $test >= 0 ? (1) : ();
}

Thanks for any insights or comments.
http://wutare.pair.com/~frk/cgi-bin/getjobs.pl
to see it f->up.

http://vader.inow.com/~mrk/cgi-bin/getjobs.pl
to see it work.
mkahn

Replies are listed 'Best First'.
Re: not in my browser (truncated output)
by fglock (Vicar) on Jul 10, 2002 at 20:56 UTC

    Your script might be timing out. I think the call to lynx might be taking too much time.

    Try replacing that line by a read of some dummy data, and see what happens.
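
One way to try this suggestion is to make the data source swappable, so the same loop can read canned lines instead of the lynx pipe. The sketch below is only an illustration: the sub name read_listing, the $use_dummy flag, and the sample HTML are hypothetical and not part of the original script, and the in-memory filehandle assumes Perl 5.8 or later.

```perl
use strict;
use warnings;

# Hypothetical canned input standing in for the lynx output.
my $dummy_html = <<'END_HTML';
<tr bgcolor=white><td>Jul 10th</td><td><a href=/sfo/eng/1.html>job one</a></td></tr>
<tr bgcolor=white><td>Jul 9th</td><td><a href=/sfo/eng/2.html>job two</a></td></tr>
END_HTML

# Return the lines of a listing, either from the dummy data or from a command.
sub read_listing {
    my ($command, $use_dummy) = @_;
    my $fh;
    if ($use_dummy) {
        # Open an in-memory filehandle on the canned string (Perl 5.8+).
        open $fh, '<', \$dummy_html or die "can't open dummy data: $!";
    } else {
        # Same style of piped open as the original script.
        open $fh, "$command |" or die "can't run '$command': $!";
    }
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# With the flag on, no external process is started at all.
my @lines = read_listing('lynx -source http://www.craigslist.org/sfo/eng', 1);
print scalar(@lines), " lines read from dummy data\n";
```

If the page prints completely with the dummy data but not with the real command, the external call is the likely suspect.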

      Thanks, you're right. The lynx call fails from the browser.
Re: not in my browser (truncated output)
by hiseldl (Priest) on Jul 10, 2002 at 22:05 UTC
    I usually turn off buffering right after my use CGI; statement:

        use CGI;
        $| = 1;
        ...

    Setting $| = 1 turns off output buffering. Buffered output is sometimes what leaves the output of CGI scripts incomplete on FreeBSD boxes, especially when running outside processes; at least, that was my experience.

    If lynx is timing out, as fglock mentioned, you could also take a look at LWP::UserAgent, here's a snippet:

        use LWP::UserAgent;
        my $ua = LWP::UserAgent->new;

        # Create a request
        my $req = HTTP::Request->new(GET => "http://www.craigslist.org/sfo/$categories[$i]");

        # Pass request to the user agent and get a response back
        my $res = $ua->request($req);

        # Check the outcome of the response
        if ($res->is_success) {
            print $res->content;
        } else {
            print "Bad luck this time\n";
        }

    Turning off buffering will also allow you to see your script errors in the browser. I find this handy when I need to write a short script, such as you have here, and I need to see the output in the browser whether it completes or not.

    --
    .dave.

      Thanks, dave. Your code faithfully returns "Bad luck this time". I tried LWP::UserAgent, as above, as well as LWP::Simple's get and getprint, and all the external calls fail. These module-based solutions fail from the prompt as well as the browser. Is pair just not letting me open an external file?
      The answer was: provide the full path to lynx, /usr/local/bin/lynx.
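
For anyone landing here with the same symptom: a CGI script runs in the web server's environment, so a command that resolves from your shell's PATH may not be found there. Below is a minimal sketch of calling a program by absolute path and checking both open and close. The sub name run_and_read is hypothetical, the list form of pipe open assumes Perl 5.8 or later, and the demo runs perl itself (via $^X) only so that it works without lynx or network access.

```perl
use strict;
use warnings;

# Run a program by absolute path and return its output lines.
# Dies with a message if the program is missing or exits non-zero.
sub run_and_read {
    my ($program, @args) = @_;
    die "not an executable file: $program\n" unless -f $program && -x _;

    # List form: the arguments are passed without going through a shell.
    open my $fh, '-|', $program, @args
        or die "can't start $program: $!\n";
    my @lines = <$fh>;

    # For a piped open, close() returns false if the child exited non-zero.
    close $fh
        or die "$program failed, " . ($! ? "close error: $!" : "wait status $?") . "\n";
    return @lines;
}

# Demo with the running perl binary standing in for the external program.
my @out = run_and_read($^X, '-e', 'print "hello from child\n"');
print @out;
```

In the script above, the equivalent call would be something like run_and_read('/usr/local/bin/lynx', '-source', $url), with the path adjusted to wherever lynx lives on your host.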