I suspect the slowness is largely because the Perl code can't resist doing a stat on each file it finds, and Perl's emulation of stat(2) on Windows does extra work to ask for the count of "links" that exist to that file. Unfortunately, NTFS supports hard links in such a way that the link count is not cached as efficiently as in a Unix inode, so the code that looks up the link count sometimes does things that take significantly longer than using FindNextFile alone would. See p5git://win32/win32.c:

    if (!w32_sloppystat) {
        /* We must open & close the file once; otherwise file attribute changes */
        /* might not yet have propagated to "other" hard links of the same file. */
        /* This also gives us an opportunity to determine the number of links. */
        HANDLE handle = CreateFileA(path, 0, 0, NULL, OPEN_EXISTING, 0, NULL);
        if (handle != INVALID_HANDLE_VALUE) {
            BY_HANDLE_FILE_INFORMATION bhi;
            if (GetFileInformationByHandle(handle, &bhi))
                nlink = bhi.nNumberOfLinks;
            CloseHandle(handle);
        }

In my experience the time taken by that code is usually fairly short but is sometimes pronounced (it seems to at least nearly lock up much of Windows, so it feels like some kind of interlock that also involves networking calls), though I have yet to find technical details about what is going on.

It is too bad that one can't easily arrange for w32_sloppystat to be true in the many cases where one would like stat to be fast at the expense of details that very often won't matter much to Win32 uses of Perl code:

    #ifdef PERL_IS_MINIPERL
        w32_sloppystat = TRUE;
    #else
        w32_sloppystat = FALSE;
    #endif

It would be quite nice if that unconditional FALSE were instead a lookup of some environment variable, like PERL_WIN32_SLOPPY_STAT. (Update: Or does ${^WIN32_SLOPPY_STAT} = 1; still work for that?)
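A minimal sketch of what such a lookup might look like, in plain C. Everything here is hypothetical: the helper name sloppy_stat_from_env is made up for illustration, PERL_WIN32_SLOPPY_STAT is only the variable name proposed above (Perl does not read it), and treating any non-empty value other than "0" as true is an assumption modeled loosely on Perl's own notion of truth.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Perl's win32.c.
 * Returns 1 if the (proposed, not real) PERL_WIN32_SLOPPY_STAT
 * environment variable is set to a non-empty value other than "0",
 * and 0 otherwise (unset, empty, or "0").
 */
static int sloppy_stat_from_env(void)
{
    const char *value = getenv("PERL_WIN32_SLOPPY_STAT");

    if (value == NULL || value[0] == '\0')
        return 0;                      /* unset or empty: keep strict stat */

    return strcmp(value, "0") != 0;    /* "0" means off, anything else on */
}
```

Under those assumptions, the non-miniperl branch quoted above would assign the helper's result to w32_sloppystat instead of the constant FALSE, so the strict behavior would remain the default whenever the variable is not set.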

It is possible, though, to get Perl to iterate quickly over file names on Win32 by avoiding readdir and instead calling FindFirstFile and FindNextFile more directly. There is even such code hidden deep in the archives of this very website. I'll probably eventually succeed in finding it, at which point I'll post a pointer to it.

Update: "Re: Threads slurping a directory and processing before conclusion" looks useful (or at least interesting). It hints that one can get sloppy stat via some special Perl variable. I have not yet looked into whether that is still true. "Re: Quickest way to get a list of all folders in a directory" says similar things and fills in one more detail. "Re^3: Win32api::File and Directories" offers some code that might be another good route.

- tye        


In reply to Re: print all files is soo slow! Why? (stat, ntfs, links) by tye
in thread print all files is soo slow! Why? by harangzsolt33
