in reply to Re: Re: Out of memory.
in thread Out of memory.

I hate suggesting this, but it will probably make the out-of-memory problem go away with the least effort:)

Change

my ($Size)=( (qx{dir /s "$TargetPath"})[-2]=~ /([\d,]+) bytes/ );

to

my ($Size)=( (qx{dir /s "$TargetPath" | find "bytes" })[-2]=~ /([\d,]+) bytes/ );

That will filter out the vast majority of the lines produced by dir /s before they ever reach Perl.
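
Alternatively, if you'd rather not hold any list at all, you could read the pipe a line at a time. A minimal sketch (untested on your setup; it assumes the same dir /s output format, where the subtree total is the second-to-last "bytes" line):

# Read the command's output line by line, so only the last couple of
# matching lines are ever held in memory instead of the whole listing.
open my $dir, qq{dir /s "$TargetPath" |} or die "Can't run dir: $!";
my @bytes;
while (<$dir>) {
    push @bytes, $1 if /([\d,]+) bytes/;
    shift @bytes while @bytes > 2;   # keep only the two most recent matches
}
close $dir;
my $Size = $bytes[-2];               # the same line the [-2] subscript selected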

If you have WSH available, then you can use

use Win32::OLE;
my $size = Win32::OLE->CreateObject('Scripting.FileSystemObject')
               ->GetFolder( $TargetPath )
               ->size();

That probably isn't hugely more efficient than dir /s in terms of doing the calculation, but it will prevent the memory problem.
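
If you go that route, a little defensive checking can save some head-scratching. A hedged sketch, using the documented Win32::OLE->new constructor (which should behave the same as the CreateObject call above) and Win32::OLE->LastError to report COM failures:

use Win32::OLE;
my $fso = Win32::OLE->new('Scripting.FileSystemObject')
    or die 'No FileSystemObject: ', Win32::OLE->LastError;
my $folder = $fso->GetFolder( $TargetPath )
    or die "GetFolder failed: ", Win32::OLE->LastError;
my $size = $folder->Size;    # Size covers the entire subtree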

You could also accumulate the directory sizes directly in perl

my @dirs  = ($TargetPath);
my @files;
scalar map{ push @{ (-d) ? \@dirs : \@files }, $_ } glob pop(@dirs) . '/*' while @dirs;
my $size = 0;
$size += -s for @files;

From my quick tests this is much faster than shelling out, and it seems to be quicker than using an FSO, though that's difficult to measure as file system caching gets involved.
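
For comparison, here is the same traversal unrolled into an explicit loop (a sketch only, with the usual caveat that glob splits its pattern on whitespace, so paths containing spaces would need File::Glob's bsd_glob instead):

my @dirs  = ($TargetPath);
my @files;
while (@dirs) {
    # Classify every entry of the next pending directory.
    for my $entry ( glob( pop(@dirs) . '/*' ) ) {
        if (-d $entry) { push @dirs,  $entry }
        else           { push @files, $entry }
    }
}
my $size = 0;
$size += -s $_ for @files;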

You could also use any of the many File::Find modules to achieve the same effect.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

Re^4: Out of memory.
by Aristotle (Chancellor) on Jul 23, 2003 at 11:54 UTC
    Ew. map in void context. Hand rolled directory traversal (though that's not as bad on Win32). How about
    use File::Find;
    my $size;
    find( sub { $size += -s _ if -f }, $TargetPath );
    (Yes, File::Find is slower than hand rolled traversal. A heck of a lot less (and clearer) code, though, and still beats the dir /s approach by miles.)

    Makeshifts last the longest.

      It wasn't in a void context. It was a scalar context :)

      That said, I'm still in two minds as to whether 'map in void context' is a problem any more, as (I believe) the main practical reason for avoiding it was the inefficiency of building a return list only for it to be discarded. Since map now tests its context and doesn't bother building the list when called in void context, the remaining reasons seem esoteric/style-related rather than practical. I agree that in most cases a for loop or modifier is better, but if you need a second statement modifier, map lends itself to the purpose (see the sketch below). I wouldn't use it often, but I'm still undecided whether a blanket ban is called for.
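
      To illustrate that point, a contrived sketch (not production code):

      my @queue = ( 1 .. 3 );
      my @out;
      # Two statement modifiers cannot be stacked; this would be a syntax error:
      #   push @out, $_ * 2 for splice(@queue, 0, 1) while @queue;
      # But map is an expression, so the whole statement can still take one:
      scalar map { push @out, $_ * 2 } splice(@queue, 0, 1) while @queue;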

      As for File::Find and related modules: I have a (probably irrational) phobia about them, and it's not only to do with performance. I dislike their callback style of working and their reliance upon global vars, amongst other things. For example, the anonymous sub in your example is called (many times) in void context, and its (unavoidable) return value is discarded. How is this different from calling map in void context?

      This really is "language lawyer" type stuff, but I seriously would appreciate your reaction/interpretation as I know you have made a point of studying many issues of this type.

      I'd also like to understand your "(though that's not as bad on Win32)" comment. How/why is this different on Win32 as compared to other OSes? I've racked my brains to see what you mean, but the significance escapes me.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

        if you need to use a second modifier, map lends itself to the purpose.
        Hm? It takes a block, doesn't it? Not really comparable to a modifier.
        How is this different from calling map in a void context?

        Good question. Although if you go down that route, you could also say I'm calling find in void context, and the same about pretty much any procedure call and basically every other line of code.

        From my point of view it's more of a conceptual question - the memory and performance concerns are additional, not the sole reason. I guess this is the mathematical part of my brain kicking in here - but to me, map is for mapping one set to another, and the fact that it loops over the input to do so is more of an implementation detail. On a dataflow architecture, as opposed to the von Neumann machines Perl runs on, you could actually do the entire processing in large chunks at once.

        Ivory tower? You decide. (After all, my weakness is false hubris, as opposed to the false laziness most people have to overcome.)

        How/why is this different for Win32 as compared to other OS's?
        Traversal is easier to get right on Win32 systems. Besides some red herring special file types supported by more recent NTFS variants (which, to my knowledge, can be treated as plain files for most intents and purposes, anyway), there are only regular files and directories in a Win32 file system. On Unixoid systems, the situation is decidedly more complex with symlinks in the traversal equation. In fact it's even more tricky if you're collecting statistics on files, considering you have hardlinks, sparse files, fifos, device nodes, and possibly other more exotic beasts to deal with.
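
        For instance, here is a hedged sketch of the extra bookkeeping a Unixoid traversal needs for accurate totals - skipping symlinks and counting each hardlinked inode only once ($dir is a stand-in for whatever root you're summing):

        use File::Find;
        my (%seen, $size);
        find( sub {
            return if -l;                    # don't follow or count symlinks
            return unless -f _;              # reuse the stat buffer -l filled
            my ($dev, $ino) = stat _;
            $size += -s _ unless $seen{"$dev:$ino"}++;   # count each inode once
        }, $dir );                           # $dir: the root to total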

        Makeshifts last the longest.

Re: Re: Re: Re: Out of memory.
by blackadder (Hermit) on Jul 24, 2003 at 10:07 UTC
    Thanking you sir,

    The first method did not work.

    Error
    DrvTop=> D:\ZZ-TESTRESTORE (Shared by remote command.)
    Sizing \\ukz423\D$\ZZ-TESTRESTORE
    The name specified is not recognized as an internal or external command, operable program or batch file.
    The second method ignored sub directories. It would be useful to me if I knew how to get it to include sub dirs; no help in the documentation.

    The last method failed to produce any output. I tried printing $size and @files but got zilch displayed.

    So basically I am back to square root of 1.
Re: Re: Re: Re: Out of memory.
by blackadder (Hermit) on Jul 24, 2003 at 09:41 UTC
    Thanking you kind sir, however;
    This did not work
    my ($Size)=( (qx{dir /s "$TargetPath" | find "bytes" })[-2]=~ /([\d,]+) bytes/ );
    I got the following error;
    DrvTop=> D:\APPS () Sizing \\Server118\D$\APPS
    The name specified is not recognized as an internal or external command, operable program or batch file.
    Use of uninitialized value in pattern match (m//) at C:\Scripts\shr_info1.pl line 44 (#1)
        (W uninitialized) An undefined value was used as if it were already
        defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
        To suppress this warning assign a defined value to your variables.

        To help you figure out what was undefined, perl tells you what operation
        you used the undefined value in. Note, however, that perl optimizes your
        program and the operation displayed in the warning may not necessarily
        appear literally in your program. For example, "that $foo" is usually
        optimized into "that " . $foo, and the warning will refer to the
        concatenation (.) operator, even though there is no . in your program.
    Use of uninitialized value in concatenation (.) or string at C:\Scripts\shr_info1.pl line 45 (#1)
    The OLE method worked but the values were not accurate because it ignored sub directories. It would be useful if I knew how to get it to include sub dirs as well. I had a look in the documentation but there was not even a mention of GetFolder. A quick search on PM… nothing.

    This method seems to be the easiest for me, but only if I knew how to get the sub directories… do I need to write code that will recurse into sub directories? But then I am back to square 1!

    I am about to try out the final method…

      What can I say?

      1. Method 1 works (and was tested) on my setup:
        ($size) = ( ( qx{ dir /s $TargetPath | find "bytes" } )[-2] =~ /([\d,]+) bytes/ );
        print "$TargetPath : $size";

        \\HIAWATHA\C$\test : 22,578,657

        The only explanation I can think of for the error you posted is that the find command is not available, or not in the path for the userid under which your code is running. The command is provided by the file find.exe which is a standard part of every Windows version I am aware of. It is usually located in

        %systemroot%\system32\find.exe

        which in turn is usually in the standard path setting. I can't imagine why this would not be the case on your system but you could try modifying the line to explicitly run it from the appropriate place.

        my ($Size)=( (qx{dir /s "$TargetPath" | %SystemRoot%\\system32\\find "bytes" })[-2]=~ /([\d,]+) bytes/ );

        Note: You'll need to check that the environment variable %SystemRoot% is available and that find.exe is actually available and visible to you.

        Did you consider trying the command from a CLI?

      2. Method 2: Also works, and does produce a total for the entire subtree, i.e. it gives the same results as method 1.
        use Win32::OLE;
        $fso = Win32::OLE->CreateObject('Scripting.FileSystemObject');
        $f = $fso->GetFolder( '\\HIAWATHA\c$\test' ) or warn $^E;
        print $f->Size;

        22578657

        You'll notice that this is the same value (minus the commas) as produced by the dir /s method above.
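
        Incidentally, should you ever want to walk the tree yourself with the FSO - say, to skip certain folders - here's a hedged sketch using its Files and SubFolders collections. Folder->Size already recurses, so this is purely for illustration:

        use Win32::OLE qw(in);

        # Sum the subtree by hand: files at this level, then recurse.
        sub folder_size {
            my ($folder) = @_;
            my $size = 0;
            $size += $_->Size        for in $folder->Files;       # files directly in this folder
            $size += folder_size($_) for in $folder->SubFolders;  # recurse into children
            return $size;
        }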

      3. Method 3: This also works fine on my system:
        (@dirs, @files) = ( '//HIAWATHA/c$/test' );
        scalar map{ push @{ (-d) ? \@dirs : \@files }, $_ } glob pop(@dirs) . '/*' while @dirs;
        $size += -s for @files;
        print $size;

        22578657

        Again, the number produced is identical to the other two.

        The only problem I can find with this is that glob doesn't seem to accept UNC paths that use backslashes. It accepts standard paths with backslashes.

        This works:

        perl -le " print for glob 'c:\test\*' "

        As do full UNC paths using forward slashes

        perl -le " print for glob '//HIAWATHA/c$/test/*' "

        But not UNC paths with backslashes :(. All of the following silently fail!

        P:\test>perl -Mstrict -wle " print for glob '\\\\HIAWATHA\\c$\\test\\*' "
        P:\test>perl -Mstrict -wle " print for glob '\\\\HIAWATHA\\c\$\\test\\*' "
        P:\test>perl -Mstrict -wle " print for glob qq[\\\\HIAWATHA\\c\$\\test\\*] "

        Since your code is set up to send full UNC paths to CMD, which requires the backslashes, that probably explains the failure. I hadn't noticed this as I invariably use forward slashes except when passing things to CMD.
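
        One possible workaround (a sketch, untested against a real UNC share): normalise the separators before globbing, since glob is happy with forward slashes even in UNC paths, and only flip them back when shelling out to CMD:

        # Glob a path that may arrive with backslashes.
        (my $globpath = $TargetPath) =~ tr{\\}{/};   # \\server\share -> //server/share
        my @entries = glob "$globpath/*";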

      4. Method 4: File::Find.

        Just for completeness, I also tested the File::Find route that Aristotle posted. With the caveat that this also requires the use of forward slashes rather than backslashes for UNC paths, it also works and produces identical results to the other three.

        use File::Find;
        $size = 0;

        find(sub { $size += -s _ if -f }, '\\HIAWATHA\c$\test' );
        Can't stat \HIAWATHA\c$\test: No such file or directory at (eval 4) line 1

        find(sub { $size += -s _ if -f }, '\\\\HIAWATHA\\c\$\\test' );
        Can't stat \\HIAWATHA\c\$\test: No such file or directory at (eval 5) line 1

        find(sub { $size += -s _ if -f }, '//HIAWATHA/c$/test/' );
        print $size;
        22578657

      So, all four methods work. Though they may not be "drop in" replacements for your original line of code, the effort required to make them so is minimal.

      Of course, this wouldn't be complete without answering the question that we are all dying to know the answer to :). Which is the fastest?

      Sorry, but despite my best efforts, I have been unable to arrive at a consistent set of figures. The effects of multiple levels of caching (disk caching, network redirector caching, Perl caching stuff) mean that I can get a different "winner" each time I re-run the tests. For reasons that I can't begin to explain, I can even get different results on the first run after a re-boot.

      Suffice it to say, once a network is involved (as opposed to my simulations within the same box), method 1 is likely to lose out badly simply because of the volume of data being generated and discarded. This is true regardless of whether the discarding is done externally using the find filter, or internally to Perl, though saving Perl from having to allocate large amounts of memory to store the huge lists generated is a substantial benefit in itself.

      I also encountered occasional anomalies where the OLE code would bottle out with access-denied errors, but they were not consistent, and immediately after a failure, running identical code from a VBScript produced no such errors, so it looks like there may be a problem somewhere in the Win32::OLE code. If I manage to tie down a reproducible test case, I'll report it.

      I conclude that either of the Perl solutions is a good choice: they both suffer the same limitation regarding UNC path syntax, and from what I can tell the difference in speed is minimal. The File::Find route is undoubtedly cleaner code, so I would recommend Aristotle's version over any of the three I offered.

      HTH.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

        I am grateful for your help and explanation, kind Sir... I feel that I have asked too many questions, but it's frustrating when I search for an answer to a Perl query and can't find anything… btw, I am a bit slow too… and then I resort to PM, which hasn't failed me yet....

        Thanks a lot.