Keeping only the $n newest files/directories in a directory?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
•Re: Keeping only the $n newest files/directories in a directory? by merlyn (Sage) on Apr 19, 2003 at 10:40 UTC
`sub purge { my $n = shift; my @newest_first = map $_->[0], sort { $a->[1] <=> $b->[1] } map [$_, -M], </var/script/proc/*>; system "/bin/rm", "-rf", @newest_first[$n..$#newest_first] if @newest_first > $n; } # $n is 100, for example` -- Randal L. Schwartz, Perl hacker. Be sure to read my standard disclaimer if this is a reply.
Re^2: Keeping only the $n newest files/directories in a directory? by Aristotle (Chancellor) on Apr 19, 2003 at 14:01 UTC
`use File::Path; sub purge { my $n = shift; my @newest_first = do { my @dir = grep -d, </var/script/proc/*>; my @age = map -M, @dir; @dir[ sort { $age[$a] <=> $age[$b] } 0 .. $#dir ]; }; splice @newest_first, 0, $n; rmtree \@newest_first, 0, 1 if @newest_first; } # $n is 100, for example` Makeshifts last the longest.
Re: •Re: Keeping only the $n newest files/directories in a directory? by Juerd (Abbot) on Apr 19, 2003 at 13:05 UTC
Is the ST (Schwartzian Transform) really worth the effort here? On my system, in a directory with 20000 files, there is no noticeable difference between `my @newest_first = map $_->[0], sort { $a->[1] <=> $b->[1] } map [$_, -M], <*>;` and `my @newest_first = sort { -M $a <=> -M $b } <*>;` while the first one took me much longer to type in. Most directories on my systems don't even have twenty thousand files :) Juerd - http://juerd.nl/ - spamcollector_perlmonks@juerd.nl (do not use).
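One way to see what the transform saves, independent of disk speed, is to count how often the sort key gets computed. A minimal sketch: `age_of` and the `file1` .. `file1000` names are hypothetical stand-ins for `-M` and real directory entries, and the ages are made up so that every one is distinct.

```perl
use strict;
use warnings;

# Hypothetical stand-in for -M: returns a made-up age and counts its calls.
my %fake_age;
$fake_age{"file$_"} = ( $_ * 37 ) % 1009 for 1 .. 1000;    # 1000 distinct ages
my $calls = 0;
sub age_of { $calls++; return $fake_age{ $_[0] } }

my @names = sort keys %fake_age;

# Plain sort: the key is recomputed for both operands of every comparison.
$calls = 0;
my @naive       = sort { age_of($a) <=> age_of($b) } @names;
my $naive_calls = $calls;

# Schwartzian Transform: the key is computed once per name, up front.
$calls = 0;
my @st = map { $_->[0] }
    sort { $a->[1] <=> $b->[1] }
    map { [ $_, age_of($_) ] } @names;
my $st_calls = $calls;

print "plain sort: $naive_calls key lookups, ST: $st_calls key lookups\n";
```

Both sorts produce the same order; only the number of key computations differs. Whether that difference is noticeable depends on how expensive each lookup is, which is the point being argued in this subthread.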
Re: Re: •Re: Keeping only the $n newest files/directories in a directory? by halley (Prior) on Apr 19, 2003 at 13:45 UTC
"Is the ST really worth the effort here?" I can understand the "premature optimization" argument, but I can also understand it if merlyn (Randal Schwartz) types the top two lines of this optimization in his sleep, or at least with a single keystroke. ;) While the /proc filesystem is not actually a filesystem as such, and a clone in /var would probably not have -M issues either, the same pruning script would be useful in many different circumstances. If I were running it over a slower link, such as an SMB-mounted share across the corporate campus, from an hourly cron job, I'd definitely want such an optimization. -- `[ e d @ h a l l e y . c c ]`
Re: Re: Re: •Re: Keeping only the $n newest files/directories in a directory? by Juerd (Abbot) on Apr 19, 2003 at 13:51 UTC
Re: •Re: Keeping only the $n newest files/directories in a directory? by Anonymous Monk on Apr 19, 2003 at 11:47 UTC
I think this code is golfing at its best ;). It works but I don't understand it... Could someone please explain the following part of merlyn's code: `my @newest_first = map $_->[0], sort { $a->[1] <=> $b->[1] } map [$_, -M], </var/script/proc/*>;` Thanks
Re: Re: •Re: Keeping only the $n newest files/directories in a directory? by halley (Prior) on Apr 19, 2003 at 12:33 UTC
The Schwartzian Transform is a technique for speeding up a sort in Perl when the sort key is expensive to compute. The simplest case would be to do a single-line sort: `my @newest_first = sort { -M $a <=> -M $b } </var/script/proc/*>;` However, to sort, one must compare pairs of items, often comparing the same $a to many different $bs, and vice versa. All those -M checks take a lot of time. Read a Schwartzian Transform like this one from the last step back to the first. `<.../*>` is shorthand for a call to glob() that returns a list of files. The map just before it (`map [$_, -M]`) pairs each filename with the result of its own -M check, so you get a list of ("foo.blah", age) pairs. The square brackets keep each pair in its own little anonymous array. The sort then orders these pairs numerically, ascending, by their second (index 1) element, the age in days that -M returns, which puts the newest entries first. The first map (`map $_->[0]`) turns the sorted list of pairs back into a list of filenames, keeping the sorted order. -- `[ e d @ h a l l e y . c c ]`
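The steps described above can be tried safely on a scratch directory. This is only a sketch: the temporary directory stands in for /var/script/proc, and the entry names (`old`, `middle`, `newer`) and their ages are invented. The modification times are set relative to `$^T`, the script start time, because that is the reference point `-M` measures from.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);

# Scratch directory standing in for /var/script/proc; removed on exit.
my $dir = tempdir( CLEANUP => 1 );

# Made-up entries and their ages in days.
my %days_old = ( old => 30, middle => 10, newer => 2 );
for my $name ( keys %days_old ) {
    my $path = "$dir/$name";
    open my $fh, '>', $path or die "cannot create $path: $!";
    close $fh;
    my $mtime = $^T - $days_old{$name} * 86_400;    # $^T is the script start time
    utime $mtime, $mtime, $path or die "cannot set mtime on $path: $!";
}

my @newest_first =
    map  { $_->[0] }                # 3. keep only the filename of each pair
    sort { $a->[1] <=> $b->[1] }    # 2. ascending age, so the newest comes first
    map  { [ $_, -M $_ ] }          # 1. pair each filename with its age (one stat per file)
    glob "$dir/*";                  # 0. list the directory

print join( ' ', map { basename($_) } @newest_first ), "\n";    # newer middle old
```

Written with one step per line, the pipeline reads bottom to top: list, decorate, sort, undecorate.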
Re: Re: Re: •Re: Keeping only the $n newest files/directories in a directory? by Juerd (Abbot) on Apr 19, 2003 at 13:11 UTC
Re: Re: •Re: Keeping only the $n newest files/directories in a directory? by Improv (Pilgrim) on Apr 19, 2003 at 12:34 UTC
OK, reading it from the bottom up, `</var/script/proc/*>` makes a list of the files in that directory. It's then used as the second argument to `map [$_, -M]` which, from that list, generates a second-order list, each element of which is a list containing the filename in field one and the modification age in field two (-M is a file test operator that gives the age in days since the last modification; its default parameter is $_). If you're a C programmer, think of something like the following for each entry in that second-order list: `struct { char filename[FNSIZE]; int mtime; /* The date field -- modification time */ }` That structure is then used as an argument to `sort { $a->[1] <=> $b->[1] }` which sorts the second-order list by the date field (index 1 of each element). The results are then used as the second argument to `map $_->[0]` which returns a flat list containing just the filenames (now in sorted order). The results are then saved in @newest_first. Code like the above can be intimidating unless you know the map operator well. It's an extremely powerful tool once you do, though, and is one of the ways in which Perl's expressiveness borrows from Lisp. Hope this helps.
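The struct analogy can be taken literally: the intermediate pairs do not have to be anonymous arrays. Here is a sketch of the same pipeline with hash refs, so each field has a name. The file names and ages in `%age_in_days` are invented for illustration; with real files the `age` field would come from `-M`.

```perl
use strict;
use warnings;

# Invented sample data; with real files, age would be the result of -M.
my %age_in_days = ( 'a.log' => 12.5, 'b.log' => 0.25, 'c.log' => 3 );

# Step 1: wrap each name in a record with named fields, like the C struct.
# The leading + tells perl that the inner braces are a hash ref, not a block.
my @records = map { +{ name => $_, age => $age_in_days{$_} } } sort keys %age_in_days;

# Step 2: sort the records by the age field, smallest (newest) first.
my @sorted = sort { $a->{age} <=> $b->{age} } @records;

# Step 3: unwrap, keeping only the names.
my @newest_first = map { $_->{name} } @sorted;

printf "%-6s %5.2f days\n", $_->{name}, $_->{age} for @sorted;
print "@newest_first\n";    # b.log c.log a.log
```

Array refs are a little cheaper and are the traditional form of the idiom; hash refs trade that for field names that are easier to read than `[0]` and `[1]`.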
Re: Re: •Re: Keeping only the $n newest files/directories in a directory? by Chady (Priest) on Apr 19, 2003 at 12:39 UTC
That's the Schwartzian Transform. It goes something like this:
- `</var/script/proc/*>` will `glob` and return the list of files/directories in that folder.
- Each element is `map`ped into an array ref `[ ]` containing the original string and the last-modified age from `-M`.
- The list is `sort`ed by that modification age, which is the second element of the array ref, hence `$a->[1]`.
- Then the first element of each array (which is the original folder name) is retrieved as `$_->[0]` with another `map`.
So in short, you get the list of folders, you build an array of arrays containing the folder name and the modification age, you sort by the modification age, and return only the folder names to use. Update: :) He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life. Chady | http://chady.net/
Re: Keeping only the $n newest files/directories in a directory? by Chady (Priest) on Apr 19, 2003 at 10:35 UTC
Well, you can use `-M` on the directories and sort on that to find the last $n items, or you can parse the directory names and decide based on those. He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life. Chady | http://chady.net/
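The second suggestion needs no file tests at all if the directory names themselves sort chronologically. A sketch under that assumption: the `run-YYYY-MM-DD` naming scheme and the sample names are hypothetical (the thread does not say how the directories are named). Nothing is deleted here; the list is only split into entries to keep and entries to remove.

```perl
use strict;
use warnings;

# Hypothetical directory names with an embedded ISO date.
my @dirs = qw(run-2003-04-17 run-2003-04-19 run-2003-03-30 run-2003-04-18);
my $n    = 2;    # how many of the newest to keep

# ISO dates compare correctly as plain strings, so a reverse string sort
# puts the newest name first.
my @doomed = sort { $b cmp $a } @dirs;

# splice removes the first $n entries (the newest) and returns them;
# whatever is left in @doomed is older and would be deleted.
my @keep = splice @doomed, 0, $n;

print "keep:   @keep\n";      # keep:   run-2003-04-19 run-2003-04-18
print "remove: @doomed\n";    # remove: run-2003-04-17 run-2003-03-30
```

If the names use a date format that does not sort as a string (for example day-first), they would have to be parsed into a sortable key first, at which point the decorate-sort-undecorate pattern from the other replies applies again.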