This is probably a very simple thing to do, but I can't seem to find an answer. Here's the problem: I am downloading and parsing hundreds of thousands of HTML files (from a list), and for whatever reason, every once in a while, the Perl script is not able to access one of the files (even though it is there and the script usually "sees" it). When this happens, the code stops running.

What I would like to do is record the filename that couldn't load and continue on looping through the rest of the list. That way, I don't have to babysit the thing and can come back and try with those that didn't work later. Here's the basic structure of my code:

#!/usr/bin/perl -w
use strict;
use LWP::Simple;

open ("output", "> /output/results.txt") || die ("Could not open output file $!");
open ("input", "< /input/urllist.txt") || die ("Could not open input file $!");
$/ = undef;
my $urllist = <input>;
while ($urllist =~ m{(http://.+\.html)}g) {
    my $url = $1;
    my $html = '';
    $html = get("$url") or print "Couldn't fetch $url.";
    while ($html =~ m{(find whatever I want)}gi) {
        my $mysearch = $1;
        print output "$url|$mysearch\n";
    }
}
close ("output");
close ("input");

Basically, I have a file stored locally that has a bunch of URLs. I open it and, for every URL, I try to access the page (using the "get" function), then search for various things and save the results. So it calls "get" for hundreds of thousands of URLs. Just because of the nature of the web, some of these will not work when it tries, even though they are there. When it calls "get" and fails to fetch the file, how do I tell it to either keep going (by maybe replacing $html with whitespace or something) or to move on to the next matched $url from the urllist? Thanks in advance.
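
For what it's worth, here is a minimal sketch of one way to do that. LWP::Simple's get returns undef when a fetch fails, so the loop can test for that, write the URL to a separate "failed" file, and call next. The helper name scan_urls, the three command-line arguments, and the failed-URL file are assumptions for illustration, not part of the code above. The fetcher is passed in as a code reference so the loop can be tried without touching the network.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): fetch each URL, print
# "url|match" lines to $out_fh, and log URLs that fail to $fail_fh instead
# of stopping. Returns the number of failed URLs.
sub scan_urls {
    my ($fetch, $pattern, $out_fh, $fail_fh, @urls) = @_;
    my $failures = 0;
    for my $url (@urls) {
        # eval traps a die inside the fetcher; undef means the fetch failed
        my $html = eval { $fetch->($url) };
        unless (defined $html) {
            print {$fail_fh} "$url\n";   # record it for a later retry
            $failures++;
            next;                        # move on to the next URL
        }
        while ($html =~ /$pattern/gi) {
            print {$out_fh} "$url|$1\n";
        }
    }
    return $failures;
}

# Assumed usage: perl fetch.pl urllist.txt results.txt failed.txt
if (@ARGV == 3) {
    require LWP::Simple;
    my ($list_file, $out_file, $fail_file) = @ARGV;

    open my $in, '<', $list_file or die "Could not open $list_file: $!";
    my $urllist = do { local $/; <$in> };   # slurp the whole list
    close $in;
    my @urls = $urllist =~ m{(http://.+\.html)}g;

    open my $out,  '>', $out_file  or die "Could not open $out_file: $!";
    open my $fail, '>', $fail_file or die "Could not open $fail_file: $!";

    # The pattern is the placeholder from the question; it needs one capture group
    my $failed = scan_urls(\&LWP::Simple::get, qr/(find whatever I want)/,
                           $out, $fail, @urls);
    close $out;
    close $fail;
    print "Done; $failed URL(s) could not be fetched.\n";
}
```

Since the failed file has one URL per line, it can presumably be fed back in as the list file on a later run to retry only the URLs that didn't work.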


In reply to How to allow loop to continue to run after a problem opening a file by rizzy
