markkneen has asked for the wisdom of the Perl Monks concerning the following question:

I want to get a list of files from a very large directory (around 3.7 million files), using something like this:

opendir(DIR, $some_path);
my @list = grep(!/^(\.+?)$/, readdir(DIR));

The problem is that this returns a very large array and takes ages to run.

What I would like to do is limit the results to a certain date: for example, return only files for 03/04/2004, and have it run reasonably quickly.

Is there a way to achieve this?
Any assistance is most appreciated.
Regards Mark

Replies are listed 'Best First'.
Re: using grep on a directory to list files for a single date
by zejames (Hermit) on Dec 01, 2004 at 13:43 UTC

    Just for fun, I wanted to measure the speed difference between grep and a plain while loop.

    So I created lots of small files in a test directory:

    $dir = "test";
    mkdir $dir or die "Unable to create dir : $!"
        if not -d $dir;
    chdir $dir;
    foreach ( 'aaa' .. 'zzz' ) {
        open F, "> $_";
        my $data = chr( 97 + int rand 10 );
        print F $data;
        close F;
    }

    Then I tried to list each file of this directory, and compared:

    use Benchmark qw/cmpthese/;
    $dir = "test";
    cmpthese( 1000, {
        'grep' => sub {
            opendir DIR, $dir or die "Unable to open dir : $!\n";
            my @list = grep( !/^(\.+?)$/, readdir(DIR) );
            closedir DIR;
        },
        'while' => sub {
            opendir DIR, $dir or die "Unable to open dir : $!\n";
            my @list;
            while ( defined( my $file = readdir DIR ) ) {
                push @list, $file unless $file =~ /^(\.+?)$/;
            }
            closedir DIR;
        },
    } );

    As expected, the difference is huge:

    D:\Perl\bin>perl test2.pl
            Rate  grep  while
    grep  6.51/s    --  -100%
    while 2667/s 40833%    --

    D:\Perl\bin>

    Using grep, perl calls readdir in list context, so it builds and returns the whole list of files in the directory, which is huge.

    When using while, perl returns the file names one by one, which is much cheaper in memory.

    So, in your case: use while.
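    To make that concrete, here is a minimal sketch of the while-based approach applied to the original question. The sub name, the dd/mm/yyyy format, and the directory handling are my own additions for illustration, not from the post above:

    ```perl
    use strict;
    use warnings;
    use POSIX qw(strftime);

    # Sketch only: list_for_date() and the dd/mm/yyyy format are
    # assumptions, not part of the original post.
    sub list_for_date {
        my ( $dir, $want_date ) = @_;    # $want_date as dd/mm/yyyy
        opendir my $dh, $dir or die "Unable to open dir $dir: $!";
        my @matches;
        while ( my $name = readdir $dh ) {    # scalar context: one name at a time
            next if $name eq '.' or $name eq '..';
            my $mtime = ( stat "$dir/$name" )[9];
            next unless defined $mtime;
            push @matches, $name
                if strftime( '%d/%m/%Y', localtime $mtime ) eq $want_date;
        }
        closedir $dh;
        return sort @matches;
    }
    ```

    Only one file name is ever held in $name at a time, and @matches holds just the hits, so memory stays proportional to the result rather than to the 3.7 million entries.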

    For information, I was using Windows XP SP1 and ActivePerl 5.8.4 on a NTFS file system.

    HTH


    --
    zejames
      OK, I've sort of got something working, but I'm sure there is a more "efficient" way to do it, as it's still returning a large array and loads of the elements are empty?
      sub list {
          my $path = shift;
          my $comp = shift;
          if ( !-e $path ) { die "Error : $path $!\n"; }
          opendir( DIR, $path ) or die "Error : $path $!\n";
          return sort map {
              my ( $d, $m, $y ) = ( localtime( ( stat "$path/$_" )[9] ) )[ 3 .. 5 ];
              $m += 1;
              $y += 1900;
              $m = ( $m < 10 ) ? "0$m" : $m;
              $d = ( $d < 10 ) ? "0$d" : $d;
              my $date = "$d/$m/$y";
              if ( $date eq $comp ) { "$_\n" };
          } grep( !/^(\.+?)$/, readdir(DIR) );
      }
      Any ideas?
      Thanks for your help on this so far.
      (Going to try the while() loop next.)

        What is the if in the map trying to do?
        if $date eq $comp is false, map adds an undef to the returned list.
        if $date eq $comp is true, map returns "$_\n".
        Below, I assume that you were trying to filter out dates that don't match. Filtering is grep's job, not map's. The "empty" elements you're getting are the undef returned by map when $date eq $comp is false.

        $! doesn't have any meaningful value after calling -e.

        The -e is redundant. opendir will fail if the dir doesn't exist, and you already handle that.

        The capture in /^(\.+?)$/ wastes time. The ? is meaningless. I wonder if $_ eq '.' || $_ eq '..' would be faster.
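        That hunch is easy to check with a small Benchmark sketch comparing the two filters (the sample file list below is made up for the comparison):

        ```perl
        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        # Made-up sample listing: the two dot entries plus 100 ordinary names.
        my @names = ( '.', '..', map { "file$_" } 1 .. 100 );

        cmpthese( 50_000, {
            regex => sub { my @l = grep { !/^\.+$/ } @names },
            eq    => sub { my @l = grep { $_ ne '.' && $_ ne '..' } @names },
        } );
        ```

        Both filters keep exactly the non-dot entries; cmpthese prints their relative rates.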

        It's probably faster to split $comp into $year, $month and $day than to convert all the mtimes to strings.

        sub list {
            my ( $path, $comp ) = @_;
            $comp =~ m#^(..)/(..)/(....)$#
                or die("Error: Badly formatted \$comp.\n");
            my $comp_d = $1;
            my $comp_m = $2;
            my $comp_y = $3;
            local *DIR;
            opendir( DIR, $path )
                or die("Error: Unable to open directory $path: $!\n");
            my @filtered_listing;
            while (<DIR>) {
                next if /^\.+$/;
                my ( $mtime_d, $mtime_m, $mtime_y )
                    = ( localtime( ( stat "$path/$_" )[9] ) )[ 3 .. 5 ];
                $mtime_m += 1;       # localtime months are 0-based
                $mtime_y += 1900;    # and years are offsets from 1900
                next unless $mtime_d == $comp_d
                    && $mtime_m == $comp_m
                    && $mtime_y == $comp_y;
                push( @filtered_listing, $_ );
            }
            return sort @filtered_listing;
        }

        ikegami has already posted an excellent reply, showing exactly how to do it with while. That is probably the best way to solve this particular problem, but I thought I would show you how to use map to filter out elements, for your future reference:

        my @array = qw(foo bar baz qux);
        my @newarray = map {
            my $foo = $_;
            $foo =~ s/./\u$&/;    # useless example
            $foo =~ /Ba/ ? $foo : ();
        } grep { /a/ } @array;

        The key here is to return an empty list when the condition fails. It's neat that we can do this with map, but it's usually better to use another grep:

        my @array = qw(foo bar baz qux);
        my @newarray = grep { /Ba/ } map {
            my $foo = $_;
            $foo =~ s/./\u$&/;    # useless example
            $foo;
        } grep { /a/ } @array;

        HTH

Re: using grep on a directory to list files for a single date
by fglock (Vicar) on Dec 01, 2004 at 13:12 UTC

    I think that parsing the output of a shell command is the fastest you can get:

    ls -R  --time=ctime --time-style="+%Y-%m-%d" -g -o

    This gives a text like:

    ./htdocs/gui/img/control/default/cs-iso:
    total 160
    -rw-r-Sr--  1  1426 2004-10-28 cs-iso_abschic.gif
    -rw-r-Sr--  1  1778 2004-10-28 cs-iso_admission_data.gif
    -rw-r-Sr--  1  1479 2004-10-28 cs-iso_admit-blue.gif
    ...
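    A rough sketch of parsing such a listing in Perl; the field layout is inferred from the sample output above, and the target date, variable names, and captured-listing setup are my assumptions:

    ```perl
    use strict;
    use warnings;

    # Assumed captured output of the ls command above (sample lines only).
    my $listing = <<'END';
    ./htdocs/gui/img/control/default/cs-iso:
    total 160
    -rw-r-Sr-- 1 1426 2004-10-28 cs-iso_abschic.gif
    -rw-r-Sr-- 1 1778 2004-10-28 cs-iso_admission_data.gif
    END
    $listing =~ s/^[ ]{4}//mg;    # strip the indentation of this example

    my $want = '2004-10-28';      # assumed target date
    my @files;
    my $current_dir = '.';

    # In real use, read straight from the command instead:
    #   open my $ls, '-|', 'ls -R --time=ctime --time-style="+%Y-%m-%d" -g -o' or die $!;
    open my $ls, '<', \$listing or die $!;
    while ( my $line = <$ls> ) {
        chomp $line;
        if ( $line =~ m{^(.*):$} ) { $current_dir = $1; next; }    # directory header
        next if $line =~ /^total\s/ or $line !~ /\S/;
        # fields: perms, links, size, date, name
        my ( $date, $name ) = ( split ' ', $line, 5 )[ 3, 4 ];
        push @files, "$current_dir/$name" if defined $date and $date eq $want;
    }
    print "$_\n" for @files;
    ```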
Re: using grep on a directory to list files for a single date
by Jaap (Curate) on Dec 01, 2004 at 12:53 UTC
    You could use the -M operator to check the last modification date. Make sure you don't load the whole array into memory first.
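    -M gives a file's age in days relative to script start time ($^T), so a single calendar date corresponds to a range of ages. A sketch with an assumed helper sub and illustrative cutoffs:

    ```perl
    use strict;
    use warnings;

    # Sketch only: files_aged_between() is an assumed helper, and the
    # day window is illustrative.  -M is measured from script start
    # time ($^T), in fractional days.
    sub files_aged_between {
        my ( $dir, $min_days, $max_days ) = @_;
        opendir my $dh, $dir or die "Unable to open dir $dir: $!";
        my @hits;
        while ( my $name = readdir $dh ) {   # one entry at a time, no big array
            next if $name eq '.' or $name eq '..';
            my $age = -M "$dir/$name";       # age in days
            push @hits, $name
                if defined $age && $age >= $min_days && $age < $max_days;
        }
        closedir $dh;
        return sort @hits;
    }

    # e.g. files modified between 3 and 4 days ago:
    # my @list = files_aged_between( $some_path, 3, 4 );
    ```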
      That's the sort of idea I had in mind, but I'm not sure how. I know I could use map but I don't have a clue how.
      Thanks...
        If you don't know how to map/grep, just use a while loop. They tend to be less obfuscated.
Re: using grep on a directory to list files for a single date
by NiJo (Friar) on Dec 01, 2004 at 19:49 UTC
    The key to a solution is reducing disk seeks. I don't know about NTFS, but on Unix there is the 'directory file' hierarchy: name and inode number -> inode (with most of the stat() info) -> file sectors. NTFS should have something similar.

    If you manage to get all of your information from the 'directory file' only, this can be done in one big read. E.g., you can use an existing naming scheme of the files.

    The other thing is to play OS. If you need to read the (still some 1000?) filtered files, I'd stat them first. This requires an inode read. But before processing the files one by one, sort them by inode number. You help the OS reduce disk seeks and make use of read-ahead caches. In one of my toy programs, sorting dramatically changed the disk sound from noisy screeching to a quiet tok-tok-tok. And it was a lot faster.
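    A sketch of the inode-sorting idea; the sub name is my own. On Unix the inode number is field 1 of stat(), while on Win32/NTFS it may be 0 for every file, in which case the sort is a harmless no-op:

    ```perl
    use strict;
    use warnings;

    # Sketch: sort file names by inode number before reading them, so
    # the disk is visited in roughly physical order.
    sub sort_by_inode {
        my ( $dir, @names ) = @_;
        my %inode;
        for my $name (@names) {
            my @st = stat "$dir/$name";
            $inode{$name} = @st ? $st[1] : 0;    # field 1 = inode number
        }
        return sort { $inode{$a} <=> $inode{$b} } @names;
    }
    ```

    The stat results could be cached here as well, so the later per-file processing doesn't have to stat again.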

      Thanks - you have all been fantastic. Really grateful for everybody's assistance.

      I had a bit of a search around for info about the NTFS file table (Master File Table) but found little in the way of accessing it through perl. There is Win32::AdminMisc, which sounds like it might be up to the job, but I'm still reading up on it :(

      Had to make a few changes to the script to get it to run:
      while (my $file = readdir(DIR)) {
      instead of:
      while (<DIR>) { <- this didn't work (well, not for me).

      Once again - Thanks...
      Mark K

        Angle brackets <...> can only be used with file handles, not directory handles.


        --
        zejames
The simpler way
by Luca Benini (Scribe) on Dec 02, 2004 at 12:41 UTC
    To find files modified exactly 3 days ago:
    find2perl -mtime 3 > a.pl
    To find files modified within the last 3 days:
    find2perl -mtime -3 > b.pl
    ...and then adapt the resulting script.

    See: man (find|find2perl)
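    The scripts find2perl generates are built on File::Find, so a hand-written equivalent of the -mtime -3 case might look like this (the wrapper sub is a sketch of mine, not find2perl's actual output):

    ```perl
    use strict;
    use warnings;
    use File::Find;

    # Sketch only: modified_within_days() is my wrapper.
    # -M _ reuses the stat buffer already filled by -f.
    sub modified_within_days {
        my ( $dir, $days ) = @_;
        my @hits;
        find(
            sub {
                push @hits, $File::Find::name
                    if -f $_ && -M _ < $days;    # modified in the last $days days
            },
            $dir
        );
        return sort @hits;
    }

    # e.g. my @recent = modified_within_days( '.', 3 );
    ```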