in reply to RE: Descending through directories
in thread Getting a List of Files Via Glob
RE: RE: RE: Descending through directories
by t0mas (Priest) on Jun 02, 2000 at 00:20 UTC

If anyone else has ideas about this, please give it a shot with your own code. I use directory traversal quite often, so I would be really glad to use the most efficient code in my programs. Here we go:

This test was run on a Pentium 233 with 128 MB RAM, Windows 2000, FAT32 filesystem.
C:\\Program holds 13477 files in 1206 folders, of which 137 match *.txt.

t1: 27 wallclock secs ( 8.40 usr + 16.76 sys = 25.17 CPU)
t2: 24 wallclock secs ( 7.69 usr + 15.57 sys = 23.26 CPU)
t3: 47 wallclock secs (20.30 usr + 23.85 sys = 44.15 CPU)
t4: 36 wallclock secs (11.04 usr + 23.33 sys = 34.37 CPU)
t5: 30 wallclock secs (11.12 usr + 18.02 sys = 29.13 CPU)

/brother t0mas
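The functions t1 to t5 themselves are not reproduced in this thread, so the following is only a minimal sketch of how such a comparison could be set up with the Benchmark and File::Find modules. The names find_txt_readdir and find_txt_file_find are hypothetical and are not the thread's t1 to t5; the directory tree is a small temporary one built just for the demonstration, so the timings it prints say nothing about the numbers above.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Find ();
use File::Spec;
use File::Temp qw(tempdir);

# Build a small throwaway tree (5 directories x 20 empty files)
# so the sketch runs anywhere; it is removed again at exit.
my $root = tempdir(CLEANUP => 1);
for my $d (1 .. 5) {
    my $dir = File::Spec->catdir($root, "dir$d");
    mkdir $dir or die "mkdir $dir: $!";
    for my $f (1 .. 20) {
        my $ext  = $f % 4 ? 'html' : 'txt';   # every fourth file is .txt
        my $path = File::Spec->catfile($dir, "file$f.$ext");
        open my $fh, '>', $path or die "open $path: $!";
        close $fh;
    }
}

# Hypothetical variant A: recursive opendir/readdir, one pass per directory.
sub find_txt_readdir {
    my ($dir) = @_;
    opendir my $dh, $dir or return ();
    my @entries = grep { !/^\.\.?\z/ } readdir $dh;   # skip "." and ".."
    closedir $dh;
    my @found;
    for my $entry (@entries) {
        my $path = File::Spec->catfile($dir, $entry);
        if (-d $path) {
            push @found, find_txt_readdir($path);
        }
        elsif ($entry =~ /\.txt\z/) {
            push @found, $path;
        }
    }
    return @found;
}

# Hypothetical variant B: let File::Find do the walking.
sub find_txt_file_find {
    my ($dir) = @_;
    my @found;
    File::Find::find(
        sub { push @found, $File::Find::name if -f $_ && /\.txt\z/ },
        $dir,
    );
    return @found;
}

# Run each variant 200 times and print a timing line per variant.
timethese(200, {
    'readdir'   => sub { my @f = find_txt_readdir($root) },
    'file_find' => sub { my @f = find_txt_file_find($root) },
});
```

Pointing $root at a real directory instead of the temporary tree gives timings that are more comparable to the ones posted here, though disk caching will still affect the first run.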
by Corion (Patriarch) on Jun 02, 2000 at 11:58 UTC

I've just run your program (with slight modifications) under Linux on a dual SMP P2-350 machine, on my home directory, whose subdirectories contain about 20 text files and quite a lot (about 500 MB) of html files in several directories. The results amazed me, so I ran the test four times in a row; the last three results were identical and really amazing:

t1: 7 wallclock secs ( 2.43 usr + 4.27 sys = 6.70 CPU)
t2: 7 wallclock secs ( 2.43 usr + 4.32 sys = 6.75 CPU)
t3: 14 wallclock secs ( 8.25 usr + 5.73 sys = 13.98 CPU)
t4: 7 wallclock secs ( 1.62 usr + 4.77 sys = 6.39 CPU)
t5: 1 wallclock secs ( 0.84 usr 0.21 sys + 0.00 cusr 0.01 csys = 0.00 CPU)

The trend is that everything is faster in general, by a factor of about 3 or 4, but what is really amazing is how little time &t5() takes: only 1 wallclock second. So I swapped &t4() and &t5() to see whether that result was order dependent:

...
t4: 1 wallclock secs ( 0.95 usr 0.18 sys + 0.00 cusr 0.01 csys = 0.00 CPU)
t5: 7 wallclock secs ( 1.75 usr + 4.65 sys = 6.40 CPU)

But it wasn't. This is really strange and sheds some new light on File::Find, which I always considered clumsy and which is one of the slower routines under Win32. Wonders of Perl :).

To see how the results would change, I then reran your test for files that match .html. While going through the source code, I noticed a few things about your regular expressions: the ".txt" RE will match any name of at least four characters that contains "txt" anywhere except at the very start, because the unescaped "." matches any character, and the directory matching leaves out directories whose names start with a "." (so Unix "hidden" directories will not be searched). I ran the test three times on about 500 MB of html files and threw away the first set of results.

t1: 8 wallclock secs ( 2.59 usr + 4.65 sys = 7.24 CPU)
t2: 8 wallclock secs ( 2.47 usr + 4.66 sys = 7.13 CPU)
t3: 17 wallclock secs ( 8.65 usr + 5.90 sys = 14.55 CPU)
t4: 9 wallclock secs ( 1.67 usr + 5.42 sys = 7.09 CPU)
t5: 2 wallclock secs ( 1.04 usr 0.23 sys + 0.00 cusr 0.01 csys = 0.00 CPU)

And amazingly, the trend continues, with &t5() beating the rest by far, even though I had thought the results would all have become console bound anyway, but that wasn't so. I wonder what my tests under NT 4 will bring us :)
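The two regular-expression remarks in this post can be illustrated with a small sketch. The benchmark's own patterns are not shown in the thread, so the patterns below are assumptions about their general shape, and the entry names are made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical directory entries, chosen to show the difference.
my @entries = ('notes.txt', 'atxt', 'mytxtfile.html', 'txt', '.hidden', '.', '..');

# Unescaped dot: "." matches any character, so this matches every name
# that contains "txt" preceded by at least one character.
my @loose = grep { /.txt/ } @entries;

# Escaped dot plus end-of-string anchor: only names ending in ".txt".
my @strict = grep { /\.txt\z/ } @entries;

# Skip only the "." and ".." entries, so that hidden directories such as
# ".hidden" would still be descended into.
my @walkable = grep { !/^\.\.?\z/ } @entries;

print "loose:    @loose\n";
print "strict:   @strict\n";
print "walkable: @walkable\n";
```

With these entries the loose pattern also picks up "atxt" and "mytxtfile.html", while the strict pattern keeps only "notes.txt".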
by t0mas (Priest) on Jun 02, 2000 at 12:22 UTC

Good work. I eagerly await the NT 4 tests.

/brother t0mas
by Corion (Patriarch) on Jun 04, 2000 at 05:13 UTC

I finally got off my lazy back and ran the test on my home machine, a trusty P-100 with 80 MB RAM, and here are the results (with ActivePerl 5.005_03 build 517):

FAT 16 drive (no HD activity during the second run)

t1: 17 wallclock secs ( 6.66 usr + 9.89 sys = 16.55 CPU)
t2: 16 wallclock secs ( 5.89 usr + 8.47 sys = 14.36 CPU)
t3: 41 wallclock secs (16.67 usr + 18.16 sys = 34.83 CPU)
t4: 27 wallclock secs ( 8.37 usr + 16.88 sys = 25.26 CPU)
t5: 15 wallclock secs ( 7.75 usr + 7.07 sys = 14.82 CPU)

NTFS drive (slight HD activity for the later parts of the HD)

t1: 96 wallclock secs (30.07 usr + 59.09 sys = 89.17 CPU)
t2: 87 wallclock secs (27.73 usr + 53.18 sys = 80.91 CPU)
t3: 179 wallclock secs (72.02 usr + 96.92 sys = 168.94 CPU)
t4: 142 wallclock secs (36.63 usr + 96.15 sys = 132.78 CPU)
t5: 81 wallclock secs (35.33 usr + 43.25 sys = 78.58 CPU)

So here File::Find is again on par with the solution that reads every directory twice and the solution that uses rewinddir(), and my favourite method of doing stuff, &t4, doesn't look that good either if you are going for peak performance. The fastest solution takes only half the time, and scanning the whole NTFS HD did take some time, as you can see :). So once again rule number one of optimizing holds: benchmark, benchmark, benchmark.
by t0mas (Priest) on Jun 04, 2000 at 12:35 UTC

As you say: benchmark, benchmark, benchmark. Speed is king in many circumstances, but maybe not all. It seems that t1, t2, and t5 are best for this simple kind of search, but in more complex cases with lots of heavy evaluations and file operations, t3 and t4 (or a more complex t5) may be better.

/brother t0mas
by Corion (Patriarch) on Jun 02, 2000 at 01:45 UTC

It always amazes me where I find users of eConsole. I never would have thought to find one on perlmonks :) !

Thanks for doing these tests. I didn't even know there was a Benchmark module! What amazes me is that the method of reading a directory twice (as done in t1 and t2) is faster than reading it once and checking for file/directory afterwards. You never stop learning, I guess ...

I will run these tests on my machine (a lowly P-100 running NT 4) and maybe on a Linux machine as well, to get a more complete view of the behaviour :)
by t0mas (Priest) on Jun 02, 2000 at 10:51 UTC

The same thing amazes me. I guess that applying a regexp to all rows at once is faster than applying it to every $entry separately. I don't know how Perl handles this internally. Maybe it recompiles the regexp every time it uses it, or something.

Please do run the test. I would like to see whether the results you get are along the same lines as the ones I got.

And about eConsole I would like to say: Transparency Rules...

/brother t0mas
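On the recompilation guess: as far as I know, Perl compiles a pattern that contains no interpolated variables only once, when the program itself is compiled, so a constant regexp is not rebuilt on each use. A pattern that interpolates a variable can be built once with qr//. The sketch below is one way to test the "all rows at once" idea with Benchmark's cmpthese; the entry names are synthetic, with_grep and with_loop are hypothetical names, and no particular outcome is claimed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic entry names: every hundredth one ends in ".txt".
my @entries = map { $_ % 100 ? "file$_.html" : "file$_.txt" } 1 .. 10_000;

# Pattern built from a variable and compiled once with qr//.
my $ext = 'txt';
my $re  = qr/\.\Q$ext\E\z/;

# Filter the whole list in one grep.
sub with_grep {
    my @hits = grep { $_ =~ $re } @entries;
    return scalar @hits;
}

# Test each entry in an explicit loop.
sub with_loop {
    my @hits;
    for my $entry (@entries) {
        push @hits, $entry if $entry =~ $re;
    }
    return scalar @hits;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'grep_list'  => \&with_grep,
    'entry_loop' => \&with_loop,
});
```

Both subs return the same count, so any difference in the table comes from how the list is walked rather than from the pattern itself.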
by softworkz (Monk) on May 29, 2001 at 17:04 UTC | |
| [reply] |
RE: RE: RE: Descending through directories
by t0mas (Priest) on May 31, 2000 at 10:22 UTC

Maybe I'll try to benchmark some of these approaches when I find some time.

/brother t0mas