Ask any experienced Perl programmer which core module has the most abysmal interface, and they'd probably say File::Find. (...)

I bet you have all heard or read similar sentences before. The problem is that I simply do not understand them. Why does the concept of passing a function as an argument to another function seem so strange and hard to grok to all these people? Why do they consider an interface as simple as "find all files in THESE directories and do THIS with them" abysmal? How is

    find( sub { print $_, "\n" }, '.' );

harder to understand than

    while ($_ = readdir DIR) { print $_, "\n"; }

? (I know these two do not mean the same thing.)
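
For anyone who wants to actually run the comparison, here is a complete version of both snippets (a sketch; the lexical directory handle is my addition, the original used a bareword DIR):

    use strict;
    use warnings;
    use File::Find;

    # Recursive: visits every file and directory under '.'
    find( sub { print $_, "\n" }, '.' );

    # Non-recursive: lists only the immediate entries of '.'
    opendir my $dh, '.' or die "Can't opendir '.': $!";
    print "$_\n" while defined( $_ = readdir $dh );
    closedir $dh;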

Why do so many people have such huge problems understanding map {code} @array, yet foreach (@array) { code } looks natural to them? :-(

Jenda
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
   -- Rick Osborne

P.S.: Quite some time ago I asked on a VB forum how to make a reference to a function in VB. They could not even understand why anyone would want to do such a thing :-(

Re: File::Find considered hard?
by Corion (Patriarch) on Mar 14, 2004 at 19:01 UTC

    The problem is not that it is hard to understand; the problem is that in 99% of all cases, I just want a list of files, not some code invoked on them - that's why File::Find::Rule is so much nicer. How often have you written code like the following?

        my @files;
        File::Find::find( sub { push @files, $File::Find::name }, '.' );
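
    For comparison, the same list with File::Find::Rule (a sketch based on its synopsis; the ->file->in chain is its declarative interface):

        use File::Find::Rule;

        # Declarative: no callback, no package variables; just the list.
        my @files = File::Find::Rule->file->in('.');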

    The fact that File::Find only gives the local name as a parameter and not the full path to the file and that it sacrifices portability for speed ($USE_NLINK) just adds to that...

      Well ... never. I wrote something like

          my @files;
          find( sub { push @files, $File::Find::name if <some condition> }, '.' );

      a few times though. And usually the resulting list was much smaller than a list that would contain all files and directories. Most of the time, though, I want to actually DO something with the files.

      I do agree the several package variables and $_ are a bit strange; it would be cleaner if the filename and path were passed to &wanted as parameters, but I don't have a problem with it anyway.

      I do not understand your comment about the USE_NLINK though. From perldoc File::Find:

      You can set the variable $File::Find::dont_use_nlink to 1, if you want to force File::Find to always stat directories. This was used for file systems that do not have an "nlink" count matching the number of sub-directories. Examples are ISO-9660 (CD-ROM), AFS, HPFS (OS/2 file system), FAT (DOS file system) and a couple of others.

      You shouldn't need to set this variable, since File::Find should now detect such file systems on-the-fly and switch itself to using stat. This works even for parts of your file system, like a mounted CD-ROM.

      Jenda
      Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
         -- Rick Osborne


        "A bit strange" is exactly the problem for a core module that solves a common problem - hence the comment about the abysmal interface.

        I do remember File::Find from the time when it always used the nlink entry for scanning for subdirectories, and when it failed in far too many cases. It's nice that they changed it now, but I think it's still an issue with Perl 5.6.1 - nowadays I always set dont_use_nlink, unless I forget. Still, for a module that should provide a nice and easy service, this is much too convoluted.

      How often have you written code like the following?
      Never.
      chomp (my @files = `find .`);
      is shorter, immediately clear (at least to me), and works on any platform I would care about.

      Abigail

        Good to know... now I can be rox0rzing all over your systems by creating a file named /home/etcshadow/foo\n/etc/shadow.

        w00t!

        </script-kiddie>

        ------------ :Wq Not an editor command: Wq
        works on any platform I would care about.

        Hrm. Do versions of Windows other than XP spit out the info you'd expect from 'find .'? Such a command on my WinXP box is invalid.

Re: File::Find considered hard?
by perrin (Chancellor) on Mar 14, 2004 at 19:00 UTC
    Maybe they are complaining about the fact that File::Find does most of its communication through global variables. That's pretty awful, and not in tune with modern methods for perl modules.
      Maybe they are complaining about the fact that File::Find does most of its communication through global variables.

      Does it really use global variables (err, variables in the main package; I assume it doesn't use the truly global punctuation variables, other than for their intended uses), or does it use package variables in its own package? The former would be very messy; the latter isn't nearly so bad. It wouldn't make sense for an OO module (like DBI), but for a module with a function interface it seems reasonable enough to me.

      I suspect the OP may be right, and that they may be complaining about passing anonymous functions around. Anybody with a solid familiarity with Perl (or any other language that supports the functional paradigm) will be reasonably comfortable with this, but a newbie coming in from another language (especially a procedural or OO language) may have trouble with it at first. This is not surprising; it's a different paradigm than the ones they're familiar with. They'll also have trouble at first with the list operators, and if you show them closures you'll want to have a camera handy to take a snapshot of the funny looks on their faces. This will pass with time, as they learn the different paradigms that Perl supports and why each is useful. (If they like OO, it may help to tell them that lexical closures are one way to achieve encapsulation. That may spark their interest enough to get them to learn something, instead of turning away in disgust.)
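
      To make the closure point concrete, here is a minimal sketch of lexical closures used for encapsulation (my example, not from the thread):

          sub make_counter {
              my $count = 0;    # private: reachable only through the closures below
              return ( sub { $count++ }, sub { $count } );
          }

          my ($increment, $get) = make_counter();
          $increment->() for 1 .. 2;
          print $get->(), "\n";    # prints 2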


      ;$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$;[-1]->();print
        Does it really use global variables (err, variables in the main package; I assume it doesn't use the truly global punctuation variables, other than for their intended uses), or does it use package variables in its own package?

        Call them package variables if you want to -- they still meet the definition of "global variable" in most languages, i.e. they are read/write accessible by any code from anywhere. I'm not going to go into a whole explanation of why using globals to pass information, regardless of the use of OO or not, is a really bad design, because many other people have written about it at length. It's not as if these globals are being used internally only -- they are part of the public API for File::Find.

Re: File::Find considered hard?
by Aristotle (Chancellor) on Mar 15, 2004 at 08:08 UTC

    The fact that callback interfaces are uncommon is not a factor in calling that of File::Find abysmal, in my opinion. I do think it's a very poor design, but for reasons other than that it's supposedly hard to understand. File::Find's approach to the problem domain is one of absolute minimalism, and it's hard to conceive an even more barren interface.

    As a result I find that accomplishing nontrivial tasks using File::Find results in messy and hard to read code, no matter how hard I try to fix this. Concerns are hard to separate in any concise fashion. But File::Find is even worse than that implies, because not only does it make hard things difficult, it also makes trivial things unnecessarily involved. Simply getting a list of files or directories requires too much setup, for example.

    In short, there's never a situation where the interface is a natural fit. Parts of a program that deal with calling File::Find::find() always feel like a wart.

    File::Find::Rule shows how things can be done better: an interface tailored to common tasks in the problem domain improves expressiveness and readability and makes it easier to separate concerns.

    Makeshifts last the longest.

Re: File::Find considered hard?
by etcshadow (Priest) on Mar 14, 2004 at 21:26 UTC
    For my own part, I will say that:
    • Passing subrefs as a way of delegating authority is cool, and I don't think it's a bad thing, when necessary. The downsides to it are that it is hard for people to understand if they are not used to the concept, and it is not a familiar idiom used for simple things in perl. (Consider that getting all files in a directory tree is the sort of thing that a perl user might want to do significantly sooner in his/her perl career than writing a socket server.)
    • The second issue (with which I've become intimately familiar) is that its interface does not accommodate being hidden beneath any kind of iterative interface... which makes it crappy for code-reuse. Say you're working on something which needs to iterate through all files in a directory, recursively, but it, itself, gets called in a sort of $obj->doNextFile() manner. The fact that File::Find cannot be wrapped in any way to provide the underpinnings for such an interface just sucks eggs. The only way that you can do this is if, on the first call to doNextFile, you call File::Find to generate a list of all files, and store that as state within your object. If you're working over a very large set of files, this is just stupidly wasteful of space, and causes a huge penalty in fire-up time (what if you might want to bail out early? Well, you still had to construct a list of every file before you could even start).

    Anyway, that's my $.02 on why the interface to File::Find is rotten. It's really the sort of module that you would prefer newbs to start using really early on, and currying function calls is the sort of thing that someone new to perl, and maybe just trying to hack together a few simple systems admin scripts, doesn't want to and shouldn't have to deal with. Also, it just sucks for code-reuse... I'm still on the knife edge of reimplementing File::Find's features for a project I'm working on, because I just don't want to have to start off by traversing the entire @#$%ing directory tree and storing it in a huge array. A lazy iterator like the sketch below avoids both problems.
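
    A minimal sketch of such an iterator (my illustration; Hypothetical::Finder is not a real CPAN module). It keeps a work list instead of pre-building the whole file list, so it starts instantly and can bail out early:

        package Hypothetical::Finder;
        use strict;
        use warnings;

        sub new {
            my ($class, @dirs) = @_;
            return bless { queue => [@dirs] }, $class;
        }

        sub next {
            my ($self) = @_;
            while ( defined( my $path = shift @{ $self->{queue} } ) ) {
                if ( -d $path && !-l $path ) {
                    if ( opendir my $dh, $path ) {
                        # Push this directory's entries to the front: depth-first.
                        unshift @{ $self->{queue} },
                            map { "$path/$_" }
                            grep { $_ ne '.' && $_ ne '..' } readdir $dh;
                        closedir $dh;
                    }
                }
                return $path;
            }
            return;    # work list exhausted
        }

        package main;
        my $finder = Hypothetical::Finder->new('.');
        while ( defined( my $file = $finder->next ) ) {
            print "$file\n";
        }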

    ------------ :Wq Not an editor command: Wq

      I guess you are right; it would be best if File::Find supported both types of interfaces, functional and iterative.

      There is no currying going on here, though. The find() is a higher-order function, but it is not curried. If it were, it would allow you to pass it just the wanted() function and get back a function "find_and_do_something()":

          my $delete_tmp = find( sub { unlink $_ if -f and /\.tmp$/i } );
          ...
          $delete_tmp->($one_directory);
          ...
          $delete_tmp->($other_directory);
          ...
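
      Nothing stops you from building that curried form yourself as a thin wrapper (a sketch; curried_find is my name, not part of File::Find):

          use File::Find ();

          # Supply only the callback now; the directories come later.
          sub curried_find {
              my ($wanted) = @_;
              return sub { File::Find::find( $wanted, @_ ) };
          }

          my $delete_tmp = curried_find( sub { unlink $_ if -f and /\.tmp$/i } );
          $delete_tmp->('/tmp/scratch');    # hypothetical directory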

      Jenda
      Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
         -- Rick Osborne


        The find() is a higher order function, but it is not curried.
        Right... I meant that the "wanted" function is (at least frequently) curried, or at least a closure.
        ------------ :Wq Not an editor command: Wq
Re: File::Find considered hard? (callbacks)
by tye (Sage) on Mar 14, 2004 at 23:11 UTC

    See Re: Are you looking at XML processing the right way? (merge) for why callbacks are fundamentally worse than several other interfaces.

    Another problem with File::Find's interface is that the order in which you get things isn't always the order you want or expect. You also don't get told when you go down or up a level.

    I find that it is often easier for me to just write a directory searcher than to figure out how to do what I want with File::Find (since I will make the decisions about exactly what order to do things and when I go up and down a level is clear).
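
    As a sketch of what I mean (illustrative code, not from any module): the ordering decision is explicit, and you get the enter/leave notifications that File::Find never gives you.

        use strict;
        use warnings;

        sub walk {
            my ($dir, $on_file, $on_enter, $on_leave) = @_;
            $on_enter->($dir) if $on_enter;
            if ( opendir my $dh, $dir ) {
                my @entries = sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
                closedir $dh;
                for my $entry (@entries) {
                    my $path = "$dir/$entry";
                    if ( -d $path && !-l $path ) {
                        walk( $path, $on_file, $on_enter, $on_leave );
                    }
                    else {
                        $on_file->($path);
                    }
                }
            }
            $on_leave->($dir) if $on_leave;
        }

        walk( '.',
            sub { print "file: $_[0]\n" },    # each non-directory
            sub { print "down: $_[0]\n" },    # entering a directory
            sub { print "up:   $_[0]\n" },    # leaving it again
        );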

    And remember that you always want to set $dont_use_nlink unless you are only interested in file names, not any file properties.

    - tye        

Re: File::Find considered hard?
by etcshadow (Priest) on Mar 15, 2004 at 01:53 UTC
    Well... here's another way to put it: compare File::Find in perl to find in shell. Now also think of this: "good tools make the easy things easy and the hard things possible."

    What is the "easy thing" to do with find (or File::Find)? It is to produce a list of every file/directory in a directory tree and/or iterate over that list.

    So, look at how find (as a shell command) interfaces to shell scripting to perform the simple end of things (bear in mind that it often isn't even necessary to use find for doing a great deal of simple things in shell commands, because most shell commands that have anything to gain by it implement their own directory recursion, via a "-r" or "-R" switch... but that's a whole other argument altogether):

    • find > contents.txt
    • do_stuff_to_files `find`
    • find | do_other_stuff_to_files
    Also, please, let's put aside the fact that you really should be doing those more like:
    • find -print0 | perl -0pl012e 's/\\/\\\\/g; s/\n/\\n/sg' > contents.txt
    • find -print0 | xargs -0 do_stuff_to_files
    • find -print0 | do_other_stuff_to_files_but_split_input_on_null
    All of that extra garble is just a result of the limitations of shell scripting. It's not part of the inherent concept of what's going on; it's just an artifact.

    Anyway, now think of how File::Find interfaces to other perl code, and compare this to how other perl code interfaces. I'm not gonna write out how it does work, but let's look at how it should work:

    • print File::Find::find();
    • do_stuff_to_file_list(File::Find::find());
    • foreach my $file (File::Find::find) { do_stuff $file }
    • my $finder = File::Find->new(); while (my $file = $finder->next()) { do_stuff $file }
    I think the reason why you see so many people complain about how File::Find works is that they have an expectation that it work, basically, like the above (and, as the sketch below shows, most of the list-returning cases can be faked with a thin wrapper today). It's a rare (and wonderful) thing when building a module to encounter a pre-existing interface to build to... even if that interface exists only in the mind of every would-be user of the module.
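
    For the list-returning calls in the first three bullets, a thin wrapper over the existing module gets you there (a sketch; find_all is my name):

        use File::Find ();

        # Collect full paths into a list instead of invoking a callback per file.
        sub find_all {
            my @dirs = @_;
            my @found;
            File::Find::find( sub { push @found, $File::Find::name }, @dirs );
            return @found;
        }

        print "$_\n" for find_all('.');
        do_stuff_to_file_list( find_all('.') );    # hypothetical consumer from above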

    Where the creators of File::Find went wrong was when they decided to model the interface to File::Find on find's -exec action (rather than on -print or -print0). The thing is: -exec is only necessary in the find command because of these two things:

    • You can't do everything you'd want to do with find -print0 | xargs -0. For example, to rename all files to $file.bak, you'd have to write a shell "for" or "while" loop.
    • A lot of people don't like writing shell for and while loops. These are people who understand a lot of the fundamentals of shell, but don't consider themselves to be shell scripters.
    Of course, in perl, the first is still sort of true, but that's irrelevant because the second is completely wrong. In fact, the opposite is the case. Perl programmers would rather deal with a list or a loop than with a callback.

    In this sense, File::Find's interface problem is in many ways a microcosm of a common issue with technology: new technology comes along to replace old technology, but carries along artifacts of its predecessor that don't apply any more. The thing that is so tragic about how it happened with File::Find is that the artifact which should have been abandoned has actually been taken as the central feature. Instead of taking the case that should have been the focus (-print0), and realizing that the issues which gave rise to the need for the ugly artifact were not an issue in perl, the implementers focused on the artifact and dropped the central case.

    Would people have accepted unix find so well if just plain old find didn't work... if you had to type find -exec echo \{\} \;? Who would have used that? Only people who really, really needed to. And they would have cursed it all the way.

    ------------ :Wq Not an editor command: Wq

      Just an aside:

      You can't do everything you'd want to do with find -print0 | xargs -0. For example, to rename all files to $file.bak, you'd have to write a shell "for" or "while" loop.

      At least with GNU xargs, that's not true. There's an option -i which lets you specify a placeholder in the command line passed to xargs (the placeholder can be chosen freely but defaults to {}), so an xargs solution for the example above would be

      find $FOO -print0 | xargs -0i mv {} {}.bak

      Of course this loses the main advantage of xargs: you are back to spawning one mv process per file, so you might as well just use the portable -exec interface.

      Makeshifts last the longest.

        Perfectly valid, but I don't think it does anything to damage the point I was trying to make.

        I mean... even if xargs didn't have that option, you could do something like:

        find -print0 | xargs -0 -l sh -c 'mv "$0" "$0".bak'
        But that's just getting silly, and even further from the point. =D
        ------------ :Wq Not an editor command: Wq
Re: File::Find considered hard?
by zby (Vicar) on Mar 14, 2004 at 20:17 UTC
    It's just that foreach loops are so much more frequently encountered in perl code than passing subroutines.

    Besides that, the POD for File::Find introduces the wanted subroutine, which creates one more indirection layer in the code, and every indirection layer is always a barrier. Everyone starts with the documentation, and what they see is something much more complicated than they think it should be. I believe it would make a big difference if the maintainer changed this to an anonymous subroutine, just like in your example.
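
    Something like this, side by side (my sketch of the change being suggested):

        use File::Find;

        # Style used by the current POD: a named callback, one extra indirection.
        sub wanted { print "$File::Find::name\n" }
        find( \&wanted, '.' );

        # The same thing with an anonymous subroutine, as suggested above.
        find( sub { print "$File::Find::name\n" }, '.' );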

      It's just that foreach loops are so much more frequently encountered in perl code than passing subroutines.

      Well, sure, but foreach loops are one of the most common things in Perl. They're probably also more common than while loops or filehandles, but that doesn't make while loops or filehandles hard to understand...

      Beside that, the POD for File::Find introduces the wanted subroutine

      Actually, it's worse than that. The third example in the synopsis passes an anonymous hash using the curly brace anonymous hash constructor, and within that hash one of the values is a reference to a subroutine using \&foo syntax. You need to have at least some grasp of Perl's references to be able to follow this. References are one of the topics a lot of Perl newbies don't get around to for quite a while, because there's quite a bit you can do without needing them. Usually their first need for Perl's references is to construct nested data structures.
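
      From memory, the shape of that third synopsis example is roughly this (paraphrased, not quoted verbatim from the POD):

          use File::Find;

          # An options hash whose 'wanted' value is a reference to a named sub.
          find( { wanted => \&process, follow => 1 }, '.' );

          sub process { print "$File::Find::name\n" }    # illustrative body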

      I should disclaim the following statement by noting that File::Find has not been up to this point a module that I've actually used (largely because I have not felt the need for it), but to me, from looking at the docs on CPAN, the interface doesn't look *bad*, though it does look like it requires an understanding of certain Perl concepts that people relatively new to the language might not fully understand yet.


      ;$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$;[-1]->();print
Re: File::Find considered hard?
by TomDLux (Vicar) on Mar 15, 2004 at 01:11 UTC

    The reason people have problems with grep() and map() is that they timidly stick to processing data one element at a time, and do not think in terms of processing sets of data.

    I've seen shell scripts that loop over all the files in a directory, and one by one select those which match some characteristics. It would have been much simpler to pass the list to a pipe and process the set all at once. Instead, the code invoked a sub-shell dozens or hundreds of times.
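
    In Perl terms, the contrast looks like this (a trivial sketch of the two mindsets):

        my @files = glob '*';

        # One-element-at-a-time thinking:
        my @big;
        for my $file (@files) {
            push @big, $file if -f $file && -s $file > 1_000_000;
        }

        # Whole-set thinking: say what you want, not how to loop.
        my @big_too = grep { -f $_ && -s $_ > 1_000_000 } @files;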

    I blame the emphasis on languages such as C, C++, Pascal, Java, all of which are one-element-at-a-time languages.

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: File::Find considered hard?
by crabbdean (Pilgrim) on Mar 14, 2004 at 21:37 UTC
    I think the reason people have problems with map {code} @array is twofold.

    One: usually a programmer will learn "foreach" first, and then stick to what they know.
    Two: it's a matter of linguistics - programming is itself based/modelled on the conventions of natural human thought. Linguistically we are more likely to say and think "for each item in this array, do blah blah" rather than "map this code onto each element of this array". It's just a human tendency. (My honest opinion.)

    UPDATE: Just curious, which way is faster?
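
    One way to answer that yourself is the core Benchmark module (a sketch; the relative results depend on what the block actually does):

        use Benchmark qw(cmpthese);

        my @array = ( 1 .. 10_000 );

        cmpthese( -2, {    # run each variant for at least 2 CPU seconds
            map_     => sub { my @out = map { $_ * 2 } @array },
            foreach_ => sub { my @out; push @out, $_ * 2 foreach @array },
        } );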

    After some experience now with the File::Find module... hmm... my comments: it's fraught with danger and not a "clean" piece of programming. It also took me a while to get used to the idea of how it was to be used. I avoid it now and prefer using my own code. Oh, and it crashed one of our servers because of its memory-leaking "features". Personally I think it needs to be rewritten, considering directory traversal is such a common task.

    Dean
    The Funkster of Mirth
    Programming these days takes more than a lone avenger with a compiler. - sam
    RFC1149: A Standard for the Transmission of IP Datagrams on Avian Carriers
      Dean,

      With respect to your foreach comment, I agree completely. "For each element, do this" just makes sense, and the idiom is present in other languages as well (i.e. Java (shudder), VB (shudder), Basic (shudder), C, C++). Also, if you think in SQL, a foreach statement is a similar concept to a combined SQL statement. (E.g. SELECT example FROM table WHERE item IN (SELECT item FROM other_table WHERE item_status = 'not_processed').)

      Regarding speed comparisons, this has been discussed before.

      Cheers.


      ----
      Zak - the office
Re: File::Find considered hard?
by dragonchild (Archbishop) on Mar 15, 2004 at 14:11 UTC
    I have never used File::Find or any of its cousins, so take my comments with a grain of salt. I don't do much filesystem work, so I rarely need to deal with files. When I have, a simple glob worked nicely.
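
    For the simple cases, that looks like this (a sketch; the paths are hypothetical):

        # Non-recursive, but often all you need:
        my @configs = glob '/etc/myapp/*.conf';
        my @here    = glob '*';    # everything in the current directory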

    I have looked at File::Find before, but quit reading after the first few lines of the POD. It was making easy things difficult. IMHO, a module that deals with something as basic as finding files should

    • have more than one interface
    • be blindingly simple
    File::Find fails both those tests.

    Additionally, I would think that one of those interfaces should be an iterator. Whether or not you pass some set of rules into the finder, iterators are just plain useful.

    As for subrefs, map, and the like - that's a separate question. Callbacks are a non-trivial concept, and should be treated as an upper-level topic in programming (at least as programming is taught in most places). That File::Find requires this interface is just another reason why it's not well designed.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: File::Find considered hard?
by NetWallah (Canon) on Mar 16, 2004 at 22:51 UTC
    This article on the new IO::All module addresses the File::Find interface issue. To quote:

    File::Find Ask any experienced Perl programmer which core module has the most abysmal interface, and they'd probably say File::Find. Rather than explain how File::Find works (which would take me an hour of research to figure out again), here's an easy way to roll your own search.
    use IO::All;

    my @wanted_file_names =
        map  { $_->name }
        grep { $_->name =~ /\.\w{3}/ && $_->slurp =~ /ingy/ }
        io('my/directory')->all_files;
    This search finds all the file names in a directory that have a three-character extension and contain the string 'ingy'. The all_files method is a shortcut that returns only the files. There are also all_dirs, all_links, and simply all methods.

    Offense, like beauty, is in the eye of the beholder, and a fantasy.
    By guaranteeing freedom of expression, the First Amendment also guarantees offense.

Re: File::Find considered hard?
by MADuran (Beadle) on Mar 15, 2004 at 17:07 UTC
    I believe the issue is really one of appearances. map and File::Find accept a code block, a function, and/or a reference to a function. From my ignorant perspective these are anonymous functions, and from the imperative way of looking at things, this is not how you do it. These anonymous functions do not look like objects either, so it cannot be right from an object-oriented view.

    I realize this is actually some of the LISP concepts showing through, and I think of it as a lambda function in LISP, which makes it easier to work with. But most working programmers have little exposure to LISP, and I think some exposure would help. This is a crude explanation, but it is how I see it.
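
    The parallel is direct (my illustration):

        # Perl's anonymous sub is the same idea as LISP's lambda:
        #   (funcall (lambda (x) (* x x)) 5)   ; LISP
        my $square = sub { $_[0] ** 2 };       # Perl
        print $square->(5), "\n";              # prints 25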

    MADuran
    Who needs a spiffy sig
      I think you should really start to learn how to use anonymous functions, function references, closures et al. Once you get familiar with them, you can't live without them.