in reply to Increase speed script

If the files to be cleaned are quite big, it is better to process them line by line. I have in mind something like:
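# $file must already hold the path of the input file
open LIST, '<', $file or die "Cannot open $file: $!";
open OUTPUT, '>>', 'cleaned.txt' or die "Cannot open cleaned.txt: $!";
print "[-] Cleaning...\n";
while (<LIST>) {                        # one line per iteration, in $_
    ($w1) = split(/:/, $_);             # text before the first colon
    print OUTPUT $w1."\n" unless (/-/); # skip lines containing a hyphen
}
close LIST;
close OUTPUT;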

Re^2: Increase speed script
by marto9 (Beadle) on Jul 11, 2008 at 18:07 UTC
    Thanks for your advice. But could you please tell me why it's better to use the while loop instead of the for loop?
      Because the for loop evaluates the filehandle in list context, so it reads the whole file into memory before it starts walking through the lines. This is usually slower than going through the file line by line (as the while loop does) because the for loop:
      1. fills many pages of memory with the contents of the file, potentially swapping other stuff out -- instead of just allocating a couple of pages for a file buffer;
      2. then walks through all those pages, potentially occupying many cache lines -- instead of pulling the same small file buffer into the same cache lines over and over, which leaves the other cache lines free for other stuff;
      3. goes to the disk all at once, blocking until it has read everything -- instead of going back to the disk only after it has processed the previous chunk, and so blocking for less time on each read.
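      To make the difference concrete, here is a minimal sketch of the two loop styles side by side. The sub names count_with_for and count_with_while are made up for this illustration and are not from the original script; both just count lines, so that the only difference is how the filehandle is read.

```perl
use strict;
use warnings;

# Hypothetical helper: iterate with for. <$fh> is evaluated in list
# context, so every line is read into a temporary list before the
# loop body runs for the first time.
sub count_with_for {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $n = 0;
    for my $line (<$fh>) {
        $n++;
    }
    close $fh;
    return $n;
}

# Hypothetical helper: iterate with while. <$fh> is evaluated in scalar
# context, so only one line is read per iteration.
sub count_with_while {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $n = 0;
    while (my $line = <$fh>) {
        $n++;
    }
    close $fh;
    return $n;
}
```

      Both subs return the same count; the for version simply needs memory proportional to the size of the file, while the while version only needs room for the current line plus the I/O buffer.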
      []s, HTH, Massa