in reply to Using output again without printing

One easy way to do it is to split it into two scripts, and connect them with a pipe. In the first script you produce output to STDOUT, in the second you read from STDIN.
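
A minimal sketch of what the first of those two scripts could look like, assuming it only does the clean-up step. The file name clean.pl, the second script parse.pl, and the helper clean_stream are hypothetical names made up for illustration; the substitutions are the ones from your own code.

```perl
#!/usr/bin/perl
# clean.pl (hypothetical name): first stage of the pipeline.
use strict;
use warnings;

# Copy lines from $in to $out, stripping carriage returns and tabs and
# putting an END-OF-DOCUMENT marker before each award heading.
sub clean_stream {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        $line =~ s/\r//g;
        $line =~ s/\t//g;
        $line =~ s/(<h4>Award\s\#\d+<\/h4>)/\nEND-OF-DOCUMENT\n$1/g;
        print {$out} $line;
    }
    return;
}

# Write to STDOUT rather than to a named file; the second script then
# reads the cleaned text from its own STDIN, for example:
#   perl clean.pl < input.html | perl parse.pl > output2.csv
clean_stream(\*STDIN, \*STDOUT);
```

Since the data flows through the pipe line by line, no intermediate file is needed and neither script has to hold the whole input at once, provided the second script also reads incrementally.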

But in your case that won't help you at all, because you're slurping the whole input into a variable at once, which means chunk-by-chunk processing can't happen.

What you can do is to set local $/ = 'END-OF-DOCUMENT' and thus read the file block by block (assuming you actually have multiple such blocks in your file).
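
Roughly like this, as a sketch rather than a drop-in replacement. The helper each_record and the in-memory sample data are made up for illustration. Two things to keep in mind: $/ is a literal string, not a regex, and a single <FH> in scalar context returns only one record, so the read has to sit in a loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read $fh one record at a time, using $separator as the record terminator,
# and call $callback on each record.
sub each_record {
    my ($fh, $separator, $callback) = @_;
    local $/ = $separator;          # a literal string, not a regex
    while (my $record = <$fh>) {    # one record per iteration
        chomp $record;              # chomp removes the trailing $/
        $callback->($record);
    }
    return;
}

# Demo on an in-memory filehandle standing in for the real file.
my $data = "first award\nEND-OF-DOCUMENT\nsecond award\nEND-OF-DOCUMENT\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @records;
each_record($fh, "END-OF-DOCUMENT\n", sub { push @records, $_[0] });
close $fh;
print scalar(@records), " records read\n";
```

Only one record is held in memory at a time, so the per-record parsing can go into the callback instead of running a regex over the whole file.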

Re^2: Using output again without printing
by micwood (Acolyte) on Aug 01, 2008 at 05:18 UTC
    Moritz (and others): Thanks for your advice. As for reading the new "clean" file in by blocks (as that is what I think you are suggesting), that would be great. The "cleaning" of the first file puts the "END-OF-DOCUMENT" record divider at the end of each record, whereas "<h4>Award\s\#\d+<\/h4>" is the start of the record, so it should work. But I am a bit confused (my Perl skills are not up to par). Do I need to modify other parts of the script? After just adding local $/ = 'END-OF-DOCUMENT', the program no longer parses the data from the records. Should it still be using the new clean document with <IN>, but now only reading a record at a time, as such:
    open(OUT, ">/Users/micwood/Desktop/output.txt");
    while (<>) {
        s/\r//g;
        s/\t//g;
        s/(<h4>Award\s\#\d+<\/h4>)/\nEND-OF-DOCUMENT\n$1/g;
        s/(<!-- \/noindex --><\/font>)/\nEND-OF-DOCUMENT\n$1/g;
        print OUT "$_";
    }
    close OUT;
    my $novalue = '.'; # temp value
    my $temp = '.'; # temp value
    my $awardhashref= ();
    open (IN, "/Users/micwood/Desktop/output.txt");
    open(OUT2, ">/Users/micwood/Desktop/output2.csv");
    my $allDocs = do { local $/ = 'END-OF-DOCUMENT'; <IN>; };
    my $rxExtractDoc = qr {(?xms) (<h4>Award\s\#(\d+)<\/h4>(.*?)END-OF-DOCUMENT) };
    while ($allDocs =~ m{$rxExtractDoc}g ) {
        my %award = (); # award hash
        $award{'entireaward'}= $1;
        $award{'A_awardno'}= $2;
        $award{'entireaward'}=~ s/\n//g;
        if ($award{'entireaward'} =~ m{Dollars Obligated<\/td><td align=right>\$([^<]+?)<\/font>}gi){
            $award{'B_dollob'} = $1};
    etc, etc
    Which is fine, as long as it doesn't read the entire new "clean" file at once, since I don't think memory could handle that. But if all I need to do is add local $/ = 'END-OF-DOCUMENT', any clue why it no longer works? Thanks again, and I hope my questions are too simple (just not very good at this).
      Success!!! I played around with it a bit and found another record separator, so I didn't have to rely on the one I created in the "cleaning" (i.e., the "END-OF-DOCUMENT"), and then moved the other "cleaning" commands from that first part of the script so that they run on the block read into memory. And just in case you are curious, it now looks like this:
      open(OUT, ">/Users/micwood/Desktop/output.csv");
      my $novalue = '.'; # temp value
      my $temp = '.'; # temp value
      my $awardhashref= ();
      my $allDocs = do { local $/ = '<\/table>\n<hr>\n<br>'; <>; };
      my $rxExtractDoc = qr {(?xms) (<h4>Award\s\#(\d+)<\/h4>(.*?)<\/table>\n<hr>\n<br>) };
      while ($allDocs =~ m{$rxExtractDoc}g ) {
          my %award = (); # award hash
          $award{'entireaward'}= $1;
          $award{'A_awardno'}= $2;
          $award{'entireaward'}=~ s/\n//g;
          $award{'entireaward'}=~ s/\t//g;
          $award{'entireaward'}=~ s/\r//g;
          if ($award{'entireaward'} =~ m{Dollars Obligated<\/td><td align=right>\$([^<]+?)<\/font>}gi){
              $award{'B_dollob'} = $1};
      etc, etc,
      And it works! Thanks, again. Best, Michael