I ask for the wisdom of those who use Perl more than I do, and ask for their opinion on whether the following will work:
while (<>) {
$foundit = 0;
chomp;
$input=$_;
if($input =~ /what is/i or $input =~ /who is/i or $input =~ /tell
+me about/i){
$input =~ s/what is//ig;
$input =~ s/who is//ig;
$input =~ s/tell me about//ig;
$input =~ s/\?//g;
$input =~ s/a //i;
($part1, $part2, $part3) = split " ", $input;
open (FILEHANDLE, "<enwikisource-20090621-pages-articles.xml")
while(<FILEHANDLE>) {
if($_ =~ /<title>$input/i or $_ =~ /<title>$part1 $part2/i
+ or $_ =~ /<title>$part2 $part3/i or $_ =~ /<title>$part1/i or $_ =~
+/<title>$part2/i or $_ =~ /<title>$part3/i and $correct == 0){
$correct = 1;
}else{
continue;
}
if($_=~ /<p>/i and $foundit == 0){
$foundit = 1;
$test = $_;
last;
}
}
close FILEHANDLE;
($crap,@goodstuff) = split ">", $test;
foreach $item (@goodstuff) {
($finalgoodstuff,$crap)=split "<",$item;
$beststuff .= $finalgoodstuff;
}
print "\n";
print"$beststuff";
print "\n";
$beststuff = "";
$finalgoodstuff = "";
}
}
will this work? I am working on a 2.4 GigaByte file (all of Wikipedia) and my Ram is only 2.0 Gigs, so I need all your help on whether this will work, or crash my machine. best case (working like I think it does) it should stop getting data when the file gets to the right point, worst case I run out of Ram and something bad happens....
if anyone could help me, I would be eternally grateful! Thanks!!