Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I'm trying to write a script that retrieves all the files whose names are listed in an Excel spreadsheet. I read every entry from the spreadsheet into an array, and I splice the array whenever I find a match for a particular entry. But the loop keeps running forever, even after all the files have been fetched. The process is also very slow when I retrieve files from directories that hold a lot of data. Please help me end the loop and speed up the process.
use Spreadsheet::ParseExcel;
use File::Find;
use File::Copy;

my $oBook = Spreadsheet::ParseExcel::Workbook->Parse('regions.xls');
my ($iR, $iC, $oWkS, $oWkC);
my $dir = "D:\schema";    # for mapped drives give \\ and for local drives \
my $currentdir = $ARGV[0];
my @inputarray = ();
my $count = -1;

foreach my $oWkS (@{$oBook->{Worksheet}}) {
    for (my $iR = $oWkS->{MinRow}; defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow}; $iR++) {
        for (my $iC = $oWkS->{MinCol}; defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol}; $iC++) {
            $oWkC = $oWkS->{Cells}[$iR][$iC];
            push(@inputarray, $oWkC->Value);
        }
    }
}

while ($#inputarray > 0) {
    find({ wanted => \&edits }, $dir);
}

sub edits() {
    if (-f and /.xml?/ and $#inputarray >= 0) {
        $file = $_;
        print $#inputarray;
        foreach $input (@inputarray) {
            $count++;
            if ($#inputarray < 0) {
                print "Input array empty";
                last;
            }
            if ($file =~ $input) {
                @newarraypreceding = splice(@inputarray, 0, $count - 1);
                @newarrayfollowing = splice(@inputarray, $count + 1);
                @inputarray = (@newarraypreceding, @newarrayfollowing);
                print $input;
                print " popped out $input from array";
                copy($file, $currentdir//$file) or die "File cannot be copied.";
                print "Array count : $#inputarray";
            }
        }
    }
}

Replies are listed 'Best First'.
Re: recursive loop and performace issue in find::file
by Ratazong (Monsignor) on Apr 26, 2011 at 11:45 UTC

    Ouch! Look at the following:

    foreach $input (@inputarray) {
        ...
        @inputarray = (@newarraypreceding, @newarrayfollowing);
        ...
    Never try to remove elements from the array you are currently looping through ... unless you are looking for trouble ;-) (or unless you know exactly what you are doing!) This is probably the reason for your infinite loop.

    Regarding speed: you do a lot of array-copying (splicing and putting together the arrays). Be aware that copying arrays is potentially slow ...

    With your program you seem to want to build a list of filenames and check whether they exist in the filesystem. A typical approach would be to use a hash for the list of filenames (use the filenames as keys!). For each filename you find in the filesystem you would then check whether a corresponding entry exists in the hash - and either mark it there or remove it.

    If you really need the result in an array (which I doubt), you could create an array out of the hash once at the end.

    HTH, Rata

    btw.: looking up something in a hash is typically much faster than looking it up in an array...
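      A minimal sketch of the hash-based approach described above (the filenames and the `%wanted` name are made up for illustration, and the spreadsheet-reading and `File::Find` parts are replaced by a simple loop so only the lookup logic is shown):

```perl
use strict;
use warnings;

# Filenames as they would come out of the spreadsheet.
my @inputarray = ('a.xml', 'b.xml', 'c.xml');

# Build a hash: O(1) lookup instead of scanning the array for every file.
my %wanted = map { $_ => 1 } @inputarray;

# Simulate filenames encountered during the directory walk.
for my $file ('a.xml', 'x.xml', 'c.xml') {
    next unless exists $wanted{$file};
    delete $wanted{$file};    # mark it as found; no splicing of @inputarray
    # copying etc. would happen here
}

# If an array of the leftovers is really needed, build it once at the end.
my @not_found = sort keys %wanted;
print "@not_found\n";         # files never seen on disk
```

Deleting from the hash while iterating over the *filesystem* is safe; the problem in the original code was deleting from `@inputarray` while iterating over that same array.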
      Ratazong,
      Never try to remove elements from the array you are currently looping through ... unless you are looking for trouble ;-) (or until you know exactly what you do!) This is probably the reason for your infinite loop.

      Indeed. I think it would have been ok to go on and explain a couple of the situations where it is ok and how to address it:

      for my $idx (reverse 0 .. $#array) {
          my $item = $array[$idx];
          if (some_test($item)) {
              splice(@array, $idx, 1);   # remove the element
          }
      }

      In the above case, it is ok to remove items because of the order they are being processed. Here is another example:

      while (@work) {
          my $item = shift @work;   # or: pop @work
          # Possibly push/unshift items onto @work
      }

      In the above case we aren't looping over the array; we are using it as a boolean. When all the items have been removed, the loop exits. This is useful when implementing your own stack/queue-based solution and avoiding recursion.
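      A self-contained sketch of such a stack-based loop (the nested-array data is a made-up stand-in for something like a directory tree):

```perl
use strict;
use warnings;

# Iterative depth-first walk of a nested structure, with @work as a stack.
# No recursion: children are pushed back onto the work list instead.
my $tree = [ 1, [ 2, [ 3, 4 ] ], 5 ];
my @work = ($tree);
my @leaves;

while (@work) {                  # the array acts as a boolean: loop until empty
    my $item = pop @work;        # shift instead of pop would give FIFO order
    if (ref $item eq 'ARRAY') {
        push @work, @$item;      # "expand" a node by queueing its children
    } else {
        push @leaves, $item;     # a leaf: process it
    }
}
print "@leaves\n";               # prints: 5 4 3 2 1
```

Because items are both removed and added inside the loop, this only terminates when the work list genuinely runs dry - which is exactly the property the OP's splice-inside-foreach code lacks.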

      I agree with your advice. I am just frustrated when people tell me "don't do that unless you know what you are doing." I try to give them examples of when it is ok and explain why, or examples of when it isn't ok and explain why not. I hope you don't mind that I took your advice a little further.

      Cheers - L~R

        Limbic~Region,

        I agree that the phrase "don't do this unless you know more" is frustrating for some people.

        In my experience there are three steps when learning (not only programming languages!):

        1. learn (simple) rules
        2. learn the reason for each rule
        3. learn when to break a rule - and what risk is involved
        Based on the code provided by the OP, I assumed that the appropriate answer would be on step 1 ... and that further details would be more distracting than helpful. Creating an additional node with further information and excellent examples (as you did) is a very good improvement - especially for other readers who already possess more knowledge/insight. So I certainly don't mind your comments :-)

        Have a nice day! Rata
Re: recursive loop and performace issue in find::file
by moritz (Cardinal) on Apr 26, 2011 at 11:57 UTC
    There's no need to use a while-loop around the call to find.

    Also, your current approach is very error-prone: it will result in an infinite loop if even one of the files is not found, even if the rest of the code worked. Try this instead:

    my %search_for;
    $search_for{$_} = 1 for @input_array;

    find({ wanted => \&edits }, $dir);
    print "Couldn't find these files: ", join(', ', keys %search_for), "\n";

    sub edits {
        my $file = $_;
        if ($search_for{$file}) {
            print "Found $file, we were looking for it!\n";
            delete $search_for{$file};
            # do the copying etc. here
        }
    }
Re: recursive loop and performace issue in find::file
by raybies (Chaplain) on Apr 26, 2011 at 12:00 UTC
    ...(not related to your problem... but...) also the . in your /.xml?/ regex probably isn't doing what you think it should: unescaped, the . matches any character, not a literal dot...