I ask for the wisdom of those who use Perl more than I do, and ask for their opinion on whether the following will work:

while (<>) { $foundit = 0; chomp; $input=$_; if($input =~ /what is/i or $input =~ /who is/i or $input =~ /tell +me about/i){ $input =~ s/what is//ig; $input =~ s/who is//ig; $input =~ s/tell me about//ig; $input =~ s/\?//g; $input =~ s/a //i; ($part1, $part2, $part3) = split " ", $input; open (FILEHANDLE, "<enwikisource-20090621-pages-articles.xml") while(<FILEHANDLE>) { if($_ =~ /<title>$input/i or $_ =~ /<title>$part1 $part2/i + or $_ =~ /<title>$part2 $part3/i or $_ =~ /<title>$part1/i or $_ =~ +/<title>$part2/i or $_ =~ /<title>$part3/i and $correct == 0){ $correct = 1; }else{ continue; } if($_=~ /<p>/i and $foundit == 0){ $foundit = 1; $test = $_; last; } } close FILEHANDLE; ($crap,@goodstuff) = split ">", $test; foreach $item (@goodstuff) { ($finalgoodstuff,$crap)=split "<",$item; $beststuff .= $finalgoodstuff; } print "\n"; print"$beststuff"; print "\n"; $beststuff = ""; $finalgoodstuff = ""; } }

will this work? I am working on a 2.4 GigaByte file (all of Wikipedia) and my Ram is only 2.0 Gigs, so I need all your help on whether this will work, or crash my machine. best case (working like I think it does) it should stop getting data when the file gets to the right point, worst case I run out of Ram and something bad happens....

if anyone could help me, I would be eternally grateful! Thanks!!


In reply to HUGE file poses risk to testing out code... need professional look-see by AI Cowboy

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.