Ask any experienced Perl programmer which core module has the most abysmal interface, and they'd probably say File::Find. (...)

I bet you have all heard or read similar sentences before. The problem is that I simply do not understand them. Why does the concept of passing a function as an argument to another function seem so strange and hard to grok to all these people? Why do they consider an interface as simple as "find all files in THESE directories and do THIS with them" abysmal? How is

    find( sub { print $_, "\n" }, '.' );

harder to understand than

    while ($_ = readdir DIR) { print $_, "\n"; }

? (I know these two do not mean the same thing.)
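
For anyone who wants to actually run the comparison, here is a complete version of both snippets (a sketch; the lexical directory handle is my addition, the original used a bareword DIR):

    use strict;
    use warnings;
    use File::Find;

    # Recursive: visits every file and directory under '.'
    find( sub { print $_, "\n" }, '.' );

    # Non-recursive: lists only the immediate entries of '.'
    opendir my $dh, '.' or die "Can't opendir '.': $!";
    print "$_\n" while defined( $_ = readdir $dh );
    closedir $dh;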

Why do so many people have such huge problems understanding map {code} @array, yet foreach (@array) { code } looks natural to them? :-(

Jenda
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
   -- Rick Osborne

P.S.: Quite some time ago I asked on a VB forum how to make a reference to a function in VB. They could not even understand why anyone would want to do such a thing :-(

Re: File::Find considered hard?
by Corion (Patriarch) on Mar 14, 2004 at 19:01 UTC

    The problem is not that it is hard to understand; the problem is that in 99% of all cases, I just want a list of files, not some code invoked on them - that's why File::Find::Rule is so much nicer. How often have you written code like the following?

        my @files;
        File::Find::find( sub { push @files, $File::Find::name }, '.' );
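
    For comparison, the same list with File::Find::Rule (a sketch based on its synopsis; the ->file->in chain is its declarative interface):

        use File::Find::Rule;

        # Declarative: no callback, no package variables; just the list.
        my @files = File::Find::Rule->file->in('.');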

    The fact that File::Find only gives the local name as a parameter and not the full path to the file and that it sacrifices portability for speed ($USE_NLINK) just adds to that...

      Well ... never. I wrote something like

          my @files;
          find( sub { push @files, $File::Find::name if <some condition> }, '.' );

      a few times though. And usually the resulting list was much smaller than a list that would contain all files and directories. Most of the time, though, I want to actually DO something with the files.

      I do agree the several package variables and $_ are a bit strange; it would be cleaner if the filename and path were passed to &wanted as parameters, but I don't have a problem with it anyway.

      I do not understand your comment about the USE_NLINK though. From perldoc File::Find:

      You can set the variable $File::Find::dont_use_nlink to 1, if you want to force File::Find to always stat directories. This was used for file systems that do not have an "nlink" count matching the number of sub-directories. Examples are ISO-9660 (CD-ROM), AFS, HPFS (OS/2 file system), FAT (DOS file system) and a couple of others.

      You shouldn't need to set this variable, since File::Find should now detect such file systems on-the-fly and switch itself to using stat. This works even for parts of your file system, like a mounted CD-ROM.

      Jenda
      Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
         -- Rick Osborne


        "A bit strange" is exactly the problem for a core module that solves a common problem - hence the comment about the abysmal interface.

        I do remember File::Find from the time when it always used the nlink entry for scanning for subdirectories, and when it failed in far too many cases. It's nice that they changed it now, but I think it's still an issue with Perl 5.6.1 - nowadays I always set dont_use_nlink, unless I forget. Still, for a module that should provide a nice and easy service, this is much too convoluted.

      How often have you written code like the following?
      Never.
      chomp (my @files = `find .`);
      is shorter, immediately clear (at least to me), and works on any platform I would care about.

      Abigail

        Good to know... now I can be rox0rzing all over your systems by creating a file named /home/etcshadow/foo\n/etc/shadow.

        w00t!

        </script-kiddie>

        ------------ :Wq Not an editor command: Wq
        works on any platform I would care about.

        Hrm. Do versions of Windows other than XP spit out the info you'd expect from 'find .'? Such a command on my WinXP box is invalid.

Re: File::Find considered hard?
by perrin (Chancellor) on Mar 14, 2004 at 19:00 UTC
    Maybe they are complaining about the fact that File::Find does most of its communication through global variables. That's pretty awful, and not in tune with modern methods for perl modules.
      Maybe they are complaining about the fact that File::Find does most of its communication through global variables.

      Does it really use global variables (err, variables in the main package; I assume it doesn't use the truly global punctuation variables, other than for their intended uses), or does it use package variables in its own package? The former would be very messy; the latter isn't nearly so bad. It wouldn't make sense for an OO module (like DBI), but for a module with a function interface it seems reasonable enough to me.

      I suspect the OP may be right, and that they may be complaining about passing anonymous functions around. Anybody with a solid familiarity with Perl (or any other language that supports the functional paradigm) will be reasonably comfortable with this, but a newbie coming in from another language (especially a procedural or OO language) may have trouble with it at first. This is not surprising; it's a different paradigm than the ones they're familiar with. They'll also have trouble at first with the list operators, and if you show them closures you'll want to have a camera handy to take a snapshot of the funny looks on their faces. This will pass with time, as they learn the different paradigms that Perl supports and why each is useful. (If they like OO, it may help to tell them that lexical closures are one way to achieve encapsulation. That may spark their interest enough to get them to learn something, instead of turning away in disgust.)
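
      To make the closure point concrete, here is a minimal sketch of lexical closures used for encapsulation (my example, not from the thread):

          sub make_counter {
              my $count = 0;    # private: reachable only through the closures below
              return ( sub { $count++ }, sub { $count } );
          }

          my ($increment, $get) = make_counter();
          $increment->() for 1 .. 2;
          print $get->(), "\n";    # prints 2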


      ;$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$;[-1]->();print
        Does it really use global variables (err, variables in the main package; I assume it doesn't use the truly global punctuation variables, other than for their intended uses), or does it use package variables in its own package?

        Call them package variables if you want to -- they still meet the definition of "global variable" in most languages, i.e. they are read/write accessible by any code from anywhere. I'm not going to go into a whole explanation of why using globals to pass information, regardless of the use of OO or not, is a really bad design, because many other people have written about it at length. It's not as if these globals are being used internally only -- they are part of the public API for File::Find.

Re: File::Find considered hard?
by Aristotle (Chancellor) on Mar 15, 2004 at 08:08 UTC

    The fact that callback interfaces are uncommon is not a factor in calling that of File::Find abysmal, in my opinion. I do think it's a very poor design, but for reasons other than that it's supposedly hard to understand. File::Find's approach to the problem domain is one of absolute minimalism, and it's hard to conceive an even more barren interface.

    As a result I find that accomplishing nontrivial tasks using File::Find results in messy and hard to read code, no matter how hard I try to fix this. Concerns are hard to separate in any concise fashion. But File::Find is even worse than that implies, because not only does it make hard things difficult, it also makes trivial things unnecessarily involved. Simply getting a list of files or directories requires too much setup, for example.

    In short, there's never a situation where the interface is a natural fit. Parts of a program that deal with calling File::Find::find() always feel like a wart.

    File::Find::Rule shows how things can be done better: an interface tailored to common tasks in the problem domain improves expressiveness and readability and makes it easier to separate concerns.

    Makeshifts last the longest.

Re: File::Find considered hard?
by etcshadow (Priest) on Mar 14, 2004 at 21:26 UTC
    For my own part, I will say that:
    • Passing subrefs as a way of delegating authority is cool, and I don't think it's a bad thing, when necessary. The downsides to it are that it is hard for people to understand if they are not used to the concept, and it is not a familiar idiom used for simple things in perl. (Consider that getting all files in a directory tree is the sort of thing that a perl user might want to do significantly sooner in his/her perl career than writing a socket server.)
    • The second issue (with which I've become intimately familiar) is that its interface does not accommodate being hidden beneath any kind of iterative interface... which makes it crappy for code-reuse. Say you're working on something which needs to iterate through all files in a directory, recursively, but it, itself, gets called in a sort of $obj->doNextFile() manner. The fact that File::Find cannot be wrapped in any way to provide the underpinnings for such an interface just sucks eggs. The only way that you can do this is if, on the first call to doNextFile, you call File::Find to generate a list of all files, and store that as state within your object. If you're working over a very large set of files, this is just stupidly wasteful of space, and causes a huge penalty in fire-up time (what if you might want to bail out early? Well, you still had to construct a list of every file before you could even start).

    Anyway, that's my $.02 on why the interface to File::Find is rotten. It's really the sort of module that you would prefer newbs to start using really early on, and currying function calls is the sort of thing that someone new to perl, and maybe just trying to hack together a few simple systems admin scripts, doesn't want to and shouldn't have to deal with. Also, it just sucks for code-reuse... I'm still on the knife edge of reimplementing File::Find's features for a project I'm working on, because I just don't want to have to start off by traversing the entire @#$%ing directory tree and storing it in a huge array. A lazy iterator like the sketch below avoids both problems.
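
    A minimal sketch of such an iterator (my illustration; Hypothetical::Finder is not a real CPAN module). It keeps a work list instead of pre-building the whole file list, so it starts instantly and can bail out early:

        package Hypothetical::Finder;
        use strict;
        use warnings;

        sub new {
            my ($class, @dirs) = @_;
            return bless { queue => [@dirs] }, $class;
        }

        sub next {
            my ($self) = @_;
            while ( defined( my $path = shift @{ $self->{queue} } ) ) {
                if ( -d $path && !-l $path ) {
                    if ( opendir my $dh, $path ) {
                        # Push this directory's entries to the front: depth-first.
                        unshift @{ $self->{queue} },
                            map { "$path/$_" }
                            grep { $_ ne '.' && $_ ne '..' } readdir $dh;
                        closedir $dh;
                    }
                }
                return $path;
            }
            return;    # work list exhausted
        }

        package main;
        my $finder = Hypothetical::Finder->new('.');
        while ( defined( my $file = $finder->next ) ) {
            print "$file\n";
        }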

    ------------ :Wq Not an editor command: Wq

      I guess you are right; it would be best if File::Find supported both types of interfaces, functional and iterative.

      There is no currying going on here, though. The find() is a higher-order function, but it is not curried. If it were, it would allow you to pass it just the wanted() function and get back a function "find_and_do_something()":

          my $delete_tmp = find( sub { unlink $_ if -f and /\.tmp$/i } );
          ...
          $delete_tmp->($one_directory);
          ...
          $delete_tmp->($other_directory);
          ...
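
      Nothing stops you from building that curried form yourself as a thin wrapper (a sketch; curried_find is my name, not part of File::Find):

          use File::Find ();

          # Supply only the callback now; the directories come later.
          sub curried_find {
              my ($wanted) = @_;
              return sub { File::Find::find( $wanted, @_ ) };
          }

          my $delete_tmp = curried_find( sub { unlink $_ if -f and /\.tmp$/i } );
          $delete_tmp->('/tmp/scratch');    # hypothetical directory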

      Jenda
      Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
         -- Rick Osborne


        The find() is a higher order function, but it is not curried.
        Right... I meant that the "wanted" function is (at least frequently) curried, or at least a closure.
        ------------ :Wq Not an editor command: Wq
Re: File::Find considered hard? (callbacks)
by tye (Sage) on Mar 14, 2004 at 23:11 UTC

    See Re: Are you looking at XML processing the right way? (merge) for why callbacks are fundamentally worse than several other interfaces.

    Another problem with File::Find's interface is that the order in which you get things isn't always the order you want or expect. You also don't get told when you go down or up a level.

    I find that it is often easier for me to just write a directory searcher than to figure out how to do what I want with File::Find (since I will make the decisions about exactly what order to do things and when I go up and down a level is clear).
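
    As a sketch of what I mean (illustrative code, not from any module): the ordering decision is explicit, and you get the enter/leave notifications that File::Find never gives you.

        use strict;
        use warnings;

        sub walk {
            my ($dir, $on_file, $on_enter, $on_leave) = @_;
            $on_enter->($dir) if $on_enter;
            if ( opendir my $dh, $dir ) {
                my @entries = sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
                closedir $dh;
                for my $entry (@entries) {
                    my $path = "$dir/$entry";
                    if ( -d $path && !-l $path ) {
                        walk( $path, $on_file, $on_enter, $on_leave );
                    }
                    else {
                        $on_file->($path);
                    }
                }
            }
            $on_leave->($dir) if $on_leave;
        }

        walk( '.',
            sub { print "file: $_[0]\n" },    # each non-directory
            sub { print "down: $_[0]\n" },    # entering a directory
            sub { print "up:   $_[0]\n" },    # leaving it again
        );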

    And remember that you always want to set $dont_use_nlink unless you are only interested in file names, not any file properties.

    - tye        

Re: File::Find considered hard?
by etcshadow (Priest) on Mar 15, 2004 at 01:53 UTC
    Well... here's another way to put it: compare File::Find in perl to find in shell. Now also think of this: "good tools make the easy things easy and the hard things possible."

    What is the "easy thing" to do with find (or File::Find)? It is to produce a list of every file/directory in a directory tree and/or iterate over that list.

    So, look at how find (as a shell command) interfaces to shell scripting to perform the simple end of things (bear in mind that it often isn't even necessary to use find for doing a great deal of simple things in shell commands, because most shell commands that have anything to gain by it implement their own directory recursion, via a "-r" or "-R" switch... but that's a whole other argument altogether):

    • find > contents.txt
    • do_stuff_to_files `find`
    • find | do_other_stuff_to_files
    Also, please, let's put aside the fact that you really should be doing those more like:
    • find -print0 | perl -0pl012e 's/\\/\\\\/g; s/\n/\\n/sg' > contents.txt
    • find -print0 | xargs -0 do_stuff_to_files
    • find -print0 | do_other_stuff_to_files_but_split_input_on_null
    All of that extra garble is just a result of the limitations of shell scripting. It's not part of the inherent concept of what's going on; it's just an artifact.

    Anyway, now think of how File::Find interfaces to other perl code, and compare this to how other perl code interfaces. I'm not gonna write out how it does work, but let's look at how it should work:

    • print File::Find::find();
    • do_stuff_to_file_list(File::Find::find());
    • foreach my $file (File::Find::find) { do_stuff $file }
    • my $finder = File::Find->new(); while (my $file = $finder->next()) { do_stuff $file }
    I think the reason why you see so many people complain about how File::Find works is that they have an expectation that it work, basically, like the above (and, as the sketch below shows, most of the list-returning cases can be faked with a thin wrapper today). It's a rare (and wonderful) thing when building a module to encounter a pre-existing interface to build to... even if that interface exists only in the mind of every would-be user of the module.
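
    For the list-returning calls in the first three bullets, a thin wrapper over the existing module gets you there (a sketch; find_all is my name):

        use File::Find ();

        # Collect full paths into a list instead of invoking a callback per file.
        sub find_all {
            my @dirs = @_;
            my @found;
            File::Find::find( sub { push @found, $File::Find::name }, @dirs );
            return @found;
        }

        print "$_\n" for find_all('.');
        do_stuff_to_file_list( find_all('.') );    # hypothetical consumer from above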

    Where the creators of File::Find went wrong was when they decided to model the interface to File::Find on find's -exec action (rather than on -print or -print0). The thing is: -exec is only necessary in the find command because of these two things:

    • You can't do everything you'd want to do with find -print0 | xargs -0. For example, to rename all files to $file.bak, you'd have to write a shell "for" or "while" loop.
    • A lot of people don't like writing shell for and while loops. These are people who understand a lot of the fundamentals of shell, but don't consider themselves to be shell scripters.
    Of course, in perl, the first is still sort of true, but that's irrelevant because the second is completely wrong. In fact, the opposite is the case. Perl programmers would rather deal with a list or a loop than with a callback.

    In this sense, File::Find's interface problem is in many ways a microcosm of a common issue with technology: new technology comes along to replace old technology, but carries along artifacts of its predecessor that don't apply any more. The thing that is so tragic about how it happened with File::Find is that the artifact which should have been abandoned has actually been taken as the central feature. Instead of taking the case that should have been the focus (-print0), and realizing that the issues which gave rise to the need for the ugly artifact were not an issue in perl, the implementers focused on the artifact and dropped the central case.

    Would people have accepted unix find so well if just plain old find didn't work... if you had to type find -exec echo \{\} \;? Who would have used that? Only people who really, really needed to. And they would have cursed it all the way.

    ------------ :Wq Not an editor command: Wq

      Just an aside:

      You can't do everything you'd want to do with find -print0 | xargs -0. For example, to rename all files to $file.bak, you'd have to write a shell "for" or "while" loop.

      At least with GNU xargs, that's not true. There's an option -i which lets you specify a placeholder in the command line passed to xargs (the placeholder can be chosen freely but defaults to {}), so an xargs solution for the example above would be

      find $FOO -print0 | xargs -0i mv {} {}.bak

      Of course this loses the main advantage of xargs: you are back to spawning one mv process per file, so you might as well just use the portable -exec interface.

      Makeshifts last the longest.

        Perfectly valid, but I don't think it does anything to damage the point I was trying to make.

        I mean... even if xargs didn't have that option, you could do something like:

        find -print0 | xargs -0 -l sh -c 'mv "$0" "$0".bak'
        But that's just getting silly, and even further from the point. =D
        ------------ :Wq Not an editor command: Wq
Re: File::Find considered hard?
by zby (Vicar) on Mar 14, 2004 at 20:17 UTC
    It's just that foreach loops are so much more frequently encountered in perl code than passing subroutines.

    Besides that, the POD for File::Find introduces the wanted subroutine, which creates one more indirection layer in the code, and every indirection layer is always a barrier. Everyone starts with the documentation, and what they see is something much more complicated than they think it should be. I believe it would make a big difference if the maintainer changed this to an anonymous subroutine, just like in your example.
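
    Something like this, side by side (my sketch of the change being suggested):

        use File::Find;

        # Style used by the current POD: a named callback, one extra indirection.
        sub wanted { print "$File::Find::name\n" }
        find( \&wanted, '.' );

        # The same thing with an anonymous subroutine, as suggested above.
        find( sub { print "$File::Find::name\n" }, '.' );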

      It's just that foreach loops are so much more frequently encountered in perl code than passing subroutines.

      Well, sure, but foreach loops are one of the most common things in Perl. They're probably also more common than while loops or filehandles, but that doesn't make while loops or filehandles hard to understand...

      Beside that, the POD for File::Find introduces the wanted subroutine

      Actually, it's worse than that. The third example in the synopsis passes an anonymous hash using the curly brace anonymous hash constructor, and within that hash one of the values is a reference to a subroutine using \&foo syntax. You need to have at least some grasp of Perl's references to be able to follow this. References are one of the topics a lot of Perl newbies don't get around to for quite a while, because there's quite a bit you can do without needing them. Usually their first need for Perl's references is to construct nested data structures.
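
      From memory, the shape of that third synopsis example is roughly this (paraphrased, not quoted verbatim from the POD):

          use File::Find;

          # An options hash whose 'wanted' value is a reference to a named sub.
          find( { wanted => \&process, follow => 1 }, '.' );

          sub process { print "$File::Find::name\n" }    # illustrative body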

      I should disclaim the following statement by noting that File::Find has not been up to this point a module that I've actually used (largely because I have not felt the need for it), but to me, from looking at the docs on CPAN, the interface doesn't look *bad*, though it does look like it requires an understanding of certain Perl concepts that people relatively new to the language might not fully understand yet.


      ;$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$;[-1]->();print
Re: File::Find considered hard?
by TomDLux (Vicar) on Mar 15, 2004 at 01:11 UTC

    The reason people have problems with grep() and map() is that they timidly stick to processing data one element at a time, and do not think in terms of processing sets of data.

    I've seen shell scripts that loop over all the files in a directory, and one by one select those which match some characteristics. It would have been much simpler to pass the list to a pipe and process the set all at once. Instead, the code invoked a sub-shell dozens or hundreds of times.
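
    In Perl terms, the contrast looks like this (a trivial sketch of the two mindsets):

        my @files = glob '*';

        # One-element-at-a-time thinking:
        my @big;
        for my $file (@files) {
            push @big, $file if -f $file && -s $file > 1_000_000;
        }

        # Whole-set thinking: say what you want, not how to loop.
        my @big_too = grep { -f $_ && -s $_ > 1_000_000 } @files;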

    I blame the emphasis on languages such as C, C++, Pascal, Java, all of which are one-element-at-a-time languages.

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: File::Find considered hard?
by crabbdean (Pilgrim) on Mar 14, 2004 at 21:37 UTC
    I think the reason people have problems with map {code} @array is twofold.

    One: usually a programmer will learn "foreach" first, and then stick to what they know.
    Two: it's a matter of linguistics - programming is itself based/modelled on the conventions of natural human thought. Linguistically we are more likely to say and think "for each item in this array, do blah blah" rather than "map this code onto each element of this array". It's just a human tendency. (My honest opinion.)

    UPDATE: Just curious, which way is faster?
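
    One way to answer that yourself is the core Benchmark module (a sketch; the relative results depend on what the block actually does):

        use Benchmark qw(cmpthese);

        my @array = ( 1 .. 10_000 );

        cmpthese( -2, {    # run each variant for at least 2 CPU seconds
            map_     => sub { my @out = map { $_ * 2 } @array },
            foreach_ => sub { my @out; push @out, $_ * 2 foreach @array },
        } );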

    After some experience now with the File::Find module... hmm... my comments: it's fraught with danger and not a "clean" piece of programming. It also took me a while to get used to the idea of how it was to be used. I avoid it now and prefer using my own code. Oh, and it crashed one of our servers because of its memory-leaking "features". Personally I think it needs to be rewritten, considering directory traversal is such a common task.

    Dean
    The Funkster of Mirth
    Programming these days takes more than a lone avenger with a compiler. - sam
    RFC1149: A Standard for the Transmission of IP Datagrams on Avian Carriers
      Dean,

      With respect to your foreach comment, I agree completely. "For each element, do this" just makes sense, and the idiom is present in other languages as well (i.e. Java (shudder), VB (shudder), Basic (shudder), C, C++). Also, if you think in SQL, a foreach statement is a similar concept to a combined SQL statement. (E.g. SELECT example FROM table WHERE item IN (SELECT item FROM other_table WHERE item_status = 'not_processed').)

      Regarding speed comparisons, this has been discussed before.

      Cheers.


      ----
      Zak - the office
Re: File::Find considered hard?
by dragonchild (Archbishop) on Mar 15, 2004 at 14:11 UTC
    I have never used File::Find or any of its cousins, so take my comments with a grain of salt. I don't do much filesystem work, so I rarely need to deal with files. When I have, a simple glob worked nicely.
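
    For the simple cases, that looks like this (a sketch; the paths are hypothetical):

        # Non-recursive, but often all you need:
        my @configs = glob '/etc/myapp/*.conf';
        my @here    = glob '*';    # everything in the current directory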

    I have looked at File::Find before, but quit reading after the first few lines of the POD. It was making easy things difficult. IMHO, a module that deals with something as basic as finding files should

    • have more than one interface
    • be blindingly simple
    File::Find fails both those tests.

    Additionally, I would think that one of those interfaces should be an iterator. Whether or not you pass some set of rules into the finder, iterators are just plain useful.

    As for subrefs, map, and the like - that's a separate question. Callbacks are a non-trivial concept, and should be treated as an upper-level topic in programming (at least as programming is taught in most places). That File::Find requires this interface is just another reason why it's not well designed.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: File::Find considered hard?
by NetWallah (Canon) on Mar 16, 2004 at 22:51 UTC
    This article on the new IO::All module addresses the File::Find interface issue. To quote:

    File::Find Ask any experienced Perl programmer which core module has the most abysmal interface, and they'd probably say File::Find. Rather than explain how File::Find works (which would take me an hour of research to figure out again), here's an easy way to roll your own search.
    use IO::All;

    my @wanted_file_names =
        map  { $_->name }
        grep { $_->name =~ /\.\w{3}/ && $_->slurp =~ /ingy/ }
        io('my/directory')->all_files;
    This search finds all the file names in a directory that have a three-character extension and contain the string 'ingy'. The all_files method is a shortcut that returns only the files. There are also all_dirs, all_links, and simply all methods.

    Offense, like beauty, is in the eye of the beholder, and a fantasy.
    By guaranteeing freedom of expression, the First Amendment also guarantees offense.

Re: File::Find considered hard?
by MADuran (Beadle) on Mar 15, 2004 at 17:07 UTC
    I believe the issue is really one of appearances. map and File::Find accept a code block, a function, and/or a reference to a function. From my ignorant perspective these are anonymous functions, and from the imperative way of looking at things, this is not how you do it. These anonymous functions do not look like objects either, so it cannot be right from an object-oriented view.

    I realize this is actually some of the LISP concepts showing through, and I think of it as a lambda function in LISP, which makes it easier to work with. But most working programmers have little exposure to LISP, and I think some exposure would help. This is a crude explanation, but it is how I see it.
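
    The parallel is direct (my illustration):

        # Perl's anonymous sub is the same idea as LISP's lambda:
        #   (funcall (lambda (x) (* x x)) 5)   ; LISP
        my $square = sub { $_[0] ** 2 };       # Perl
        print $square->(5), "\n";              # prints 25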

    MADuran
    Who needs a spiffy sig
      I think you should really start to learn how to use anonymous functions, function references, closures et al. Once you get familiar with them, you can't live without them.