Hi
Programming in Perl is a hobby of mine, and I'm hoping you guys can help me improve a script I've written. The script works something like a search-engine bot: it visits a set of web addresses (all hosted on one server and presented in a particular format) and collects data concerning a particular word. The structure of my code is as follows:
#!/usr/bin/perl -w
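use strict;
use warnings FATAL => 'all';    # NOTE: this turns every warning (e.g. "Use of
                                # uninitialized value") into a fatal error
use CGI::Carp "fatalsToBrowser";
use CGI ":all";
use LWP::UserAgent;             # assumption: $ua is an LWP::UserAgent

$| = 1;                         # unbuffered STDOUT

my $ua = LWP::UserAgent->new;
my $url1;                       # per-name URL, not shown in the original post

my @names =
(
    "name1",
    "name2",
    "name3",
);

# already_in_database(), expected_data_not_found_on_a_continuous_basis() and
# add_data_found_to_database_for_this_name() are placeholders for the real subs.
NAME:
foreach my $name (@names)
{
    # skip names already stored by a previous run
    if (!already_in_database($name))
    {
        my $res = $ua->get($url1);
        if (expected_data_not_found_on_a_continuous_basis($res))
        {
            # the server kept answering "busy, call back later":
            # start this same name again from scratch
            redo NAME;
        }
        # ...
        add_data_found_to_database_for_this_name($name);
    }
}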
Over time I've improved the reliability of this script. For example, when I fetch a web page, I check that it presents the data in the expected format. This is necessary because, from time to time, the server reports that it is busy, and the page then contains the message "busy, call back later" instead of the expected data. To handle this, I poll the page a number of times, and if I keep getting the busy message, I jump to Start and process this name again from scratch. The message does not seem to reflect genuine server load, because when I restart the name I no longer get it.
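The polling step could be written as a small bounded-retry helper. This is a minimal sketch, not the original code: fetch_until_not_busy is a hypothetical name, the page is obtained through a caller-supplied code ref so the logic can be tried without a network, and the busy text is assumed to be exactly "busy, call back later".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): call $fetch, a code
# ref that returns the page body, up to $max_tries times, sleeping $delay
# seconds between attempts. Returns the first body that does not contain
# the busy message, or undef if every attempt was "busy".
sub fetch_until_not_busy {
    my ($fetch, $max_tries, $delay) = @_;
    for my $attempt (1 .. $max_tries) {
        my $body = $fetch->();
        return $body
            if defined $body && index($body, 'busy, call back later') < 0;
        sleep $delay if $attempt < $max_tries;
    }
    return;    # still busy: the caller decides whether to restart the name
}

# Demo with canned responses instead of real HTTP requests.
my @pages = ('busy, call back later', 'busy, call back later', '<p>data</p>');
my $body  = fetch_until_not_busy(sub { shift @pages }, 5, 0);
print defined $body ? "got: $body\n" : "still busy\n";
```

In the main loop it might be called as fetch_until_not_busy(sub { $ua->get($url1)->decoded_content }, 5, 2), restarting the name when it returns undef.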
I also process each name twice and store the results in separate MySQL tables, so that I can compare the two tables and check that they are identical. At this point the two tables match to 99.9%. Previously I got different results each time I processed the same name, so I'm much happier with the current behaviour.
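Comparing the two runs could also be done in Perl once each table has been read into a hash. The sketch below is illustrative only: diff_runs is a hypothetical helper, it assumes each run can be reduced to key/value pairs (for example with DBI's selectcol_arrayref and the Columns => [1, 2] attribute), and the table and column names in the comment are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two runs, each a hash ref of key => value
# (e.g. name => stored result). Returns, in sorted order, the keys that
# are missing from either run or whose values differ.
sub diff_runs {
    my ($run1, $run2) = @_;
    my %all;
    @all{ keys %$run1, keys %$run2 } = ();    # union of the keys
    return grep {
        !exists $run1->{$_}
            || !exists $run2->{$_}
            || ($run1->{$_} // '') ne ($run2->{$_} // '')
    } sort keys %all;
}

# With DBI the hashes might be loaded like this (hypothetical table/columns):
#   my %run1 = @{ $dbh->selectcol_arrayref(
#       'SELECT name, result FROM results_run1', { Columns => [1, 2] }) };

my %run1 = (name1 => 'a', name2 => 'b', name3 => 'c');
my %run2 = (name1 => 'a', name2 => 'x', name4 => 'd');
print join(', ', diff_runs(\%run1, \%run2)), "\n";    # name2, name3, name4
```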
I run the script with: nohup perl script.cgi. My main problem now is that the script dies after an arbitrary length of time, and I have no idea why. This is very frustrating, because a single name might take anywhere from 10 seconds to 10 hours to process (given the available bandwidth), and each name has to be processed twice. So I might start the script and come back the next day to find that it died 2 hours after I started it, or that it ran for 8 hours and died just before finishing a name with a very long processing time.
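One way to find out why the script stops is to have it record its own warnings, fatal errors and caught signals, with timestamps, in a dedicated log file. When it is started from a terminal with nohup rather than run as a CGI request, its STDERR normally goes to nohup.out, not to the web server's error_log. This is a sketch under assumptions: the log path is made up, it installs its own $SIG{__WARN__} and $SIG{__DIE__} handlers (which may interact with CGI::Carp, loaded by the original script), and it can only log signals that Perl is able to catch. A SIGKILL, for example from the kernel's out-of-memory killer, cannot be trapped, so a log that simply stops with no DIE or signal line is itself a clue.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX qw(strftime);

# Hypothetical log location; change as needed.
my $log_file = 'script_debug.log';
open my $log, '>>', $log_file or die "Cannot open $log_file: $!";
$log->autoflush(1);    # write each line immediately, even if we die later

# Append one timestamped line (with the process id) to the log.
sub log_msg {
    my ($msg) = @_;
    chomp $msg;
    print {$log} strftime('%Y-%m-%d %H:%M:%S', localtime), " [$$] $msg\n";
}

# Record warnings, then still show them on STDERR.
$SIG{__WARN__} = sub { log_msg("WARN: $_[0]"); print STDERR $_[0] };

# Record fatal errors; $^S is true when the die happens inside an eval,
# i.e. when something is going to catch it.
$SIG{__DIE__} = sub { log_msg(($^S ? 'DIE (inside eval): ' : 'DIE: ') . $_[0]) };

# Record catchable signals that would otherwise end the process silently.
for my $sig (qw(TERM INT QUIT PIPE)) {
    $SIG{$sig} = sub { log_msg("caught SIG$_[0], exiting"); exit 1 };
}

# Runs on exit or die, but not when the process is killed by an
# uncaught signal such as SIGKILL.
END { log_msg("exiting with status $?") if $log }

log_msg('started');
```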
I've checked my error_log and can't find any errors relating to this script. One possible fix would be to jump to Start whenever the condition that kills the script occurs, but first I need to find out why the script is dying.
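Rather than jumping to Start, the work for each name could be wrapped in eval, so that a fatal error (for example, a warning promoted to an error by use warnings FATAL => 'all') is caught, reported and retried a limited number of times instead of ending the whole run. A minimal sketch, assuming the per-name processing can be moved into a code ref; process_with_retry is a hypothetical name. Note that eval only catches die, not a process killed by a signal.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run $work (a code ref that processes one name)
# inside eval, so that a die is caught instead of ending the whole run.
# Tries up to $max_attempts times; returns 1 on success, 0 if all failed.
sub process_with_retry {
    my ($name, $work, $max_attempts) = @_;
    for my $attempt (1 .. $max_attempts) {
        # "eval { ...; 1 }" is true only if the block finished without dying
        my $ok = eval { $work->($name); 1 };
        return 1 if $ok;
        my $err = $@ || 'unknown error';
        chomp $err;
        warn "$name: attempt $attempt of $max_attempts failed: $err\n";
    }
    return 0;
}

# Demo: a job that fails twice and then succeeds.
my $calls  = 0;
my $flaky  = sub { die "simulated failure\n" if ++$calls < 3 };
my $result = process_with_retry('name1', $flaky, 5);
print $result ? "done\n" : "gave up\n";
```

In the main loop this might become process_with_retry($name, \&process_one_name, 3) or warn "giving up on $name\n"; where process_one_name is a hypothetical sub holding the existing per-name code.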
Any suggestions?
Thanks
Barry.