Re: File::Find considered hard?
by Corion (Patriarch) on Mar 14, 2004 at 19:01 UTC
The problem is not that it is hard to understand; the problem is that in 99% of all cases, I just want a list of files, not some code invoked on them - that's why File::Find::Rule is so much nicer. How often have you written code like the following?
my @files;
File::Find::find( sub { push @files, $File::Find::name}, '.' );
The fact that File::Find only gives the local name as a parameter and not the full path to the file, and that it sacrifices portability for speed ($USE_NLINK), just adds to that...
my @files;
find( sub {
push @files, $File::Find::name
if <some condition>
}, '.')
a few times though. And usually the resulting list was much smaller than a list that would contain all files and directories. Most of the time, though, I want to actually DO something with the files.
I do agree the several package variables and $_ are a bit strange; it would be cleaner if the filename and path were passed to &wanted as parameters, but I don't have a problem with it anyway.
I do not understand your comment about the USE_NLINK though. From perldoc File::Find:
You can set the variable $File::Find::dont_use_nlink to 1, if you want
to force File::Find to always stat directories. This was used for file
systems that do not have an "nlink" count matching the number of
sub-directories. Examples are ISO-9660 (CD-ROM), AFS, HPFS (OS/2 file
system), FAT (DOS file system) and a couple of others.
You shouldn't need to set this variable, since File::Find should now
detect such file systems on-the-fly and switch itself to using stat.
This works even for parts of your file system, like a mounted CD-ROM.
Jenda
Always code as if the guy who ends up maintaining your code
will be a violent psychopath who knows where you live.
-- Rick Osborne
Edit by castaway: Closed small tag in signature
"A bit strange" is exactly the problem for a core module that solves a common problem - hence the comment about the abysmal interface.
I do remember File::Find from the time when it always used the nlink entry for scanning for subdirectories, and when it failed in far too many cases. It's nice that they changed it now, but I think it's still an issue with Perl 5.6.1 - nowadays I always set dont_use_nlink, unless I forget. Still, for a module that should provide a nice and easy service, this is much too convoluted.
How often have you written code like the following?
Never.
chomp (my @files = `find .`);
is shorter, immediately clear (at least to me), and works
on any platform I would care about.
Abigail
Good to know... now I can be rox0rzing all over your systems by creating a file named /home/etcshadow/foo\n/etc/shadow.
w00t!
</script-kiddie>
------------
:Wq
Not an editor command: Wq
Re: File::Find considered hard?
by perrin (Chancellor) on Mar 14, 2004 at 19:00 UTC
Maybe they are complaining about the fact that File::Find does most of its communication through global variables. That's pretty awful, and not in tune with modern methods for perl modules.
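For readers who haven't hit it yet, this is what that communication looks like; a minimal sketch using the documented $File::Find::dir / $File::Find::name / $_ variables (the temp-tree file names are made up, just for illustration):

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a tiny throwaway tree to walk.
my $dir = tempdir( CLEANUP => 1 );
mkdir "$dir/sub";
open my $fh, '>', "$dir/sub/example.txt" or die $!;
close $fh;

# Inside the callback, File::Find talks to you through package
# variables rather than arguments:
#   $_                 -- basename of the current entry
#   $File::Find::dir   -- the directory containing it
#   $File::Find::name  -- the full path
find( sub {
    print "in $File::Find::dir: $_ (full: $File::Find::name)\n";
}, $dir );
```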
Maybe they are complaining about the fact that File::Find does most of its communication through global variables.
Does it really use global variables (err, variables in the main package; I assume it doesn't use the truly global punctuation variables, other than for their intended uses), or does it use package variables in its own package? The former would be very messy; the latter isn't nearly so bad. It wouldn't make sense for an OO module (like DBI), but for a module with a function interface it seems reasonable enough to me.
I suspect the OP may be right, and that they may be complaining about passing anonymous functions around. Anybody with a solid familiarity with Perl (or any other language that supports the functional paradigm) will be reasonably comfortable with this, but a newbie coming in from another language (especially a procedural or OO language) may have trouble with it at first. This is not surprising; it's a different paradigm than the ones they're familiar with.
They'll also have trouble at first with the list operators, and if you show them closures you'll want to have a camera handy to take a snapshot of the funny looks on their faces. This will pass with time, as they learn the different paradigms that Perl supports and why each is useful. (If they like OO, it may help to tell them that lexical closures are one way to achieve encapsulation. That may spark their interest enough to get them to learn something, instead of turning away in disgust.)
;$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,".rekcah lreP rehtona tsuJ";$\=$;[-1]->();print
Does it really use global variables (err, variables in the main package; I assume it doesn't use the truly global punctuation variables, other than for their intended uses), or does it use package variables in its own package?
Call them package variables if you want to -- they still meet the definition of "global variable" in most languages, i.e. they are read/write accessible by any code from anywhere. I'm not going to go into a whole explanation of why using globals to pass information, regardless of the use of OO or not, is a really bad design, because many other people have written about it at length. It's not as if these globals are being used internally only -- they are part of the public API for File::Find.
Re: File::Find considered hard?
by Aristotle (Chancellor) on Mar 15, 2004 at 08:08 UTC
The fact that callback interfaces are uncommon is not a factor in calling that of File::Find abysmal, in my opinion. I do think it's a very poor design, but for reasons other than that it's supposedly hard to understand. File::Find's approach to the problem domain is one of absolute minimalism, and it's hard to conceive an even more barren interface.
As a result I find that accomplishing nontrivial tasks using File::Find results in messy, hard-to-read code, no matter how hard I try to fix this. Concerns are hard to separate in any concise fashion. But File::Find is even worse than the above implies, because not only does it make hard things difficult, it also makes trivial things unnecessarily involved. Simply getting a list of files or directories requires too much setup, for example.
In short, there's never a situation where the interface is a natural fit. Parts of a program that deal with calling File::Find::find() always feel like a wart.
File::Find::Rule shows how things can be done better: an interface tailored to common tasks in the problem domain improves expressiveness and readability and makes it easier to separate concerns.
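A taste of that tailored interface, assuming File::Find::Rule is installed from CPAN (the tree and file names below are made up for the example):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Find::Rule;   # CPAN module, not in core

# A small tree to search:
my $dir = tempdir( CLEANUP => 1 );
open my $fh, '>', "$dir/Example.pm" or die $!;
close $fh;

# The common case reads as one declarative expression:
my @modules = File::Find::Rule->file
                              ->name('*.pm')
                              ->in($dir);
print "$_\n" for @modules;
```

No callback, no package variables: you describe what you want and get a plain list back.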
Makeshifts last the longest.
Re: File::Find considered hard?
by etcshadow (Priest) on Mar 14, 2004 at 21:26 UTC
For my own part, I will say that:
- Passing subrefs as a way of delegating authority is cool, and I don't think it's a bad thing, when necessary. The downsides are that it is hard for people to understand if they are not used to the concept, and it is not a familiar idiom for simple things in perl. (Consider that getting all files in a directory tree is the sort of thing that a perl user might want to do significantly sooner in his/her perl career than writing a socket server.)
- The second issue (with which I've become intimately familiar) is that its interface does not accommodate being hidden beneath any kind of iterative interface... which makes it crappy for code reuse. Say you're working on something which needs to iterate through all files in a directory, recursively, but it, itself, gets called in a sort of $obj->doNextFile() manner. The fact that File::Find cannot be wrapped in any way to provide the underpinnings for such an interface just sucks eggs. The only way you can do this is if, on the first call to doNextFile, you call File::Find to generate a list of all files, and store that as state within your object. If you're working over a very large set of files, this is stupidly wasteful of space, and causes a huge penalty in fire-up time (what if you want to bail out early? Well, you still had to construct a list of every file before you could even start).
Anyway, that's my $.02 on why the interface to File::Find is rotten. It's really the sort of module that you would prefer newbs to start using really early on, and currying function calls is the sort of thing that someone new to perl, maybe just trying to hack together a few simple systems-administration scripts, doesn't want to and shouldn't have to deal with. Also, it just sucks for code reuse... I'm still on the knife edge of reimplementing File::Find's features for a project I'm working on, because I just don't want to have to start off by traversing the entire @#$%ing directory tree and storing it in a huge array.
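A hand-rolled iterator of the kind described is possible if you give up on File::Find and keep the traversal state yourself. A sketch (DirIterator and next_file are made-up names, and there is no sorting or symlink-loop protection):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

package DirIterator;

sub new {
    my ($class, $root) = @_;
    # All traversal state lives in the object: a queue of paths to visit.
    return bless { queue => [$root] }, $class;
}

sub next_file {
    my ($self) = @_;
    while ( my $path = shift @{ $self->{queue} } ) {
        if ( -d $path ) {
            opendir my $dh, $path or next;
            push @{ $self->{queue} },
                map { "$path/$_" } grep { $_ ne '.' and $_ ne '..' } readdir $dh;
            closedir $dh;
        }
        else {
            return $path;    # one file per call; resume here next time
        }
    }
    return undef;            # tree exhausted
}

package main;

# Exercise it on a throwaway tree:
my $root = tempdir( CLEANUP => 1 );
mkdir "$root/sub";
for my $f ( "$root/a.txt", "$root/sub/b.txt" ) {
    open my $fh, '>', $f or die $!;
    close $fh;
}
my $it = DirIterator->new($root);
while ( defined( my $file = $it->next_file ) ) {
    print "$file\n";
}
```

Because nothing is accumulated up front, you can bail out after the first file without having paid for the whole tree.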
------------
:Wq
Not an editor command: Wq
I guess you are right; it would be best if File::Find supported both types of interfaces, functional and iterative.
There is no currying going on here, though. The find() is a higher-order function, but it is not curried. If it were, it would allow you to pass it just the wanted() function and get back a function "find_and_do_something()":
my $delete_tmp = find(sub {unlink($_) if -f and /\.tmp$/i});
...
$delete_tmp->($one_directory);
...
$delete_tmp->($other_directory);
...
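A thin wrapper gets you that behaviour today; curried_find() below is a made-up name, just a sketch of the idea:

```perl
use strict;
use warnings;
use File::Find ();

# Curry by hand: take the wanted() callback once, return a closure
# that only needs directories. curried_find() is not part of File::Find.
sub curried_find {
    my ($wanted) = @_;
    return sub { File::Find::find( $wanted, @_ ) };
}

my @seen;
my $collect = curried_find( sub { push @seen, $File::Find::name } );
$collect->('.');    # same callback, reusable on any directory
print scalar(@seen), " entries found\n";
```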
Jenda
Always code as if the guy who ends up maintaining your code
will be a violent psychopath who knows where you live.
-- Rick Osborne
Edit by castaway: Closed small tag in signature
The find() is a higher order function, but it is not curried.
Right... I meant that the "wanted" function is (at least frequently) curried, or at least a closure.
------------
:Wq
Not an editor command: Wq
Re: File::Find considered hard? (callbacks)
by tye (Sage) on Mar 14, 2004 at 23:11 UTC
See Re: Are you looking at XML processing the right way? (merge) for why callbacks are fundamentally worse than several other interfaces.
Another problem with File::Find's interface is that the order in which you get things isn't always the order you want or expect. You also don't get told when you go down or up a level.
I find that it is often easier for me to just write a directory searcher than to figure out how to do what I want with File::Find (since I will make the decisions about exactly what order to do things and when I go up and down a level is clear).
And remember that you always want to set $dont_use_nlink unless you are only interested in file names, not any file properties.
Re: File::Find considered hard?
by etcshadow (Priest) on Mar 15, 2004 at 01:53 UTC
Well... here's another way to put it: compare File::Find in perl to find in the shell. Now also think of this: "good tools make the easy things easy and the hard things possible."
What is the "easy thing" to do with find (or File::Find)? It is to produce a list of every file/directory in a directory tree and/or iterate over that list.
So, look at how find (as a shell command) interfaces with shell scripting to perform the simple end of things (bear in mind that it often isn't even necessary to use find for a great deal of simple things in shell, because most shell commands that have anything to gain by it implement their own directory recursion via a "-r" or "-R" switch... but that's a whole other argument altogether):
- find > contents.txt
- do_stuff_to_files `find`
- find | do_other_stuff_to_files
Also, please, let's put aside the fact that you really should be doing those more like:
- find -print0 | perl -0pl012e 's/\\/\\\\/g; s/\n/\\n/sg' > contents.txt
- find -print0 | xargs -0 do_stuff_to_files
- find -print0 | do_other_stuff_to_files_but_split_input_on_null
All of that extra garble is just a result of limitations of shell scripting. It's not part of the inherent concept of what's going on; it's just an artifact.
Anyway, now think of how File::Find interfaces with other perl code, and compare this to how other perl code interfaces. I'm not gonna write out how it does work, but let's look at how it should work:
- print File::Find::find();
- do_stuff_to_file_list(File::Find::find());
- foreach my $file (File::Find::find) { do_stuff $file }
- my $finder = File::Find->new(); while (my $file = $finder->next()) { do_stuff $file }
I think the reason you see so many people complain about how File::Find works is that they have an expectation that it work, basically, like the above. It's a rare (and wonderful) thing when building a module to encounter a pre-existing interface to build to... even if the interface exists only in the mind of every would-be user of the module.
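For what it's worth, the first three wished-for forms are only a few lines away even with the current module; find_list() is a hypothetical wrapper name, not part of File::Find:

```perl
use strict;
use warnings;
use File::Find ();

# The wished-for list interface as a three-line wrapper.
sub find_list {
    my @found;
    File::Find::find( sub { push @found, $File::Find::name }, @_ );
    return @found;
}

# Now the easy things read the way one would expect:
print "$_\n" for find_list('.');
```

(The fourth form, the iterator, is the one this wrapper cannot give you: the whole list is still built up front, which is exactly the complaint made elsewhere in this thread.)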
Where the creators of File::Find went wrong was when they decided to model the interface to File::Find on the -exec option to find (rather than the -print or -print0 option). The thing is: -exec is only necessary in the find command because of these two things:
- You can't do everything you'd want to do with find -print0 | xargs -0. For example, to rename all files to $file.bak, you'd have to write a shell "for" or "while" loop.
- A lot of people don't like writing shell for and while loops. These are people who understand a lot of the fundamentals of shell, but don't consider themselves to be shell scripters.
Of course, in perl, the first is still sort of true, but that's irrelevant because the second is completely wrong. In fact, the opposite is the case. Perl programmers would rather deal with a list or a loop than with a callback.
In this sense, File::Find's interface problem is in many ways a microcosm of a common issue with technology: new technology comes along to replace old technology, but carries along artifacts of its predecessor that don't apply any more. The thing that is so tragic about how it happened with File::Find is that the artifact which should have been abandoned has actually been taken as the central feature. Instead of taking the case that should have been the focus (-print0), and realizing that the issues which gave rise to the need for the ugly artifact were not an issue in perl, the implementers focused on the artifact and dropped the central case.
Would people have accepted unix find so well if just plain old find didn't work... if you had to find -exec echo \{\} \;? Who would have used that? Only people who really, really needed to. And they would have cursed it all the way.
------------
:Wq
Not an editor command: Wq
Just an aside:
You can't do everything you'd want to do with find -print0 | xargs -0. For example, to rename all files to $file.bak, you'd have to write a shell "for" or "while" loop.
At least with GNU xargs, that's not true. There's an option -i which lets you specify a placeholder in the command line passed to xargs (it can be given explicitly but defaults to {}), so an xargs solution for the example above would be
find $FOO -print0 | xargs -0i mv {} {}.bak
Of course this loses the main advantage of xargs: you are back to spawning one mv process per file, so you might as well just use the portable -exec interface.
Makeshifts last the longest.
find -print0 | xargs -0 -l sh -c 'mv "$0" "$0".bak'
But that's just getting silly, and even further from the point. =D
------------
:Wq
Not an editor command: Wq
Re: File::Find considered hard?
by zby (Vicar) on Mar 14, 2004 at 20:17 UTC
It's just that foreach loops are so much more frequently encountered in perl code than passing subroutines.
Besides that, the POD for File::Find introduces the wanted subroutine, which creates one more indirection layer in the code, and every indirection layer is always a barrier. Everyone starts with the documentation, and what they see is something much more complicated than they think it should be. I believe it would make a big difference if the maintainer changed this to an anonymous subroutine, just like in your example.
It's just that foreach loops are so much more frequently encountered in perl code than passing subroutines.
Well, sure, but foreach loops are one of the most common things in Perl. They're probably more common also than while loops or filehandles, but that doesn't make while loops or filehandles hard to understand...
Beside that, the POD for File::Find introduces the wanted subroutine
Actually, it's worse than that. The third example in the synopsis passes an anonymous hash using the curly brace anonymous hash constructor, and within that hash one of the values is a reference to a subroutine using \&foo syntax. You need to have at least some grasp of Perl's references to be able to follow this. References are one of the topics a lot of Perl newbies don't get around to for quite a while, because there's quite a bit you can do without needing them. Usually their first need for Perl's references is to construct nested data structures.
I should disclaim the following statement by noting that File::Find has not been up to this point a module that I've actually used (largely because I have not felt the need for it), but to me, from looking at the docs on CPAN, the interface doesn't look *bad*, though it does look like it requires an understanding of certain Perl concepts that people relatively new to the language might not fully understand yet.
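For reference, the calling convention being described looks roughly like this (the options-hash form from the File::Find documentation):

```perl
use strict;
use warnings;
use File::Find;

# The callback the docs call wanted():
sub wanted {
    print "$File::Find::name\n" if -f;   # plain files only
}

# The synopsis form under discussion: an anonymous hash whose
# 'wanted' value is a subroutine reference -- two uses of
# references in a single call.
find( { wanted => \&wanted }, '.' );
```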
;$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,".rekcah lreP rehtona tsuJ";$\=$;[-1]->();print
Re: File::Find considered hard?
by TomDLux (Vicar) on Mar 15, 2004 at 01:11 UTC
The reason people have problems with grep() and map() is that they timidly stick to processing data one element at a time, and do not think in terms of processing sets of data.
I've seen shell scripts that loop over all the files in a directory, and one by one select those which match some characteristics. It would have been much simpler to pass the list to a pipe and process the set all at once. Instead, the code invoked a sub-shell dozens or hundreds of times.
I blame the emphasis on languages such as C, C++, Pascal, Java, all of which are one-element-at-a-time languages.
--
TTTATCGGTCGTTATATAGATGTTTGCA
Re: File::Find considered hard?
by crabbdean (Pilgrim) on Mar 14, 2004 at 21:37 UTC
I think the reason people have problems with map {code} array is twofold.
One: usually a programmer will learn "foreach" first, and then stick to what they know.
Two: it's a matter of linguistics - programming is itself based/modelled on the conventions of natural human thought. Linguistically we are more likely to say and think "for each item in this array, do blah blah" rather than "map this code to each element of this array". It's just a human tendency. (My honest opinion.)
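The two spellings side by side; both build the same list, and the difference is purely one of idiom:

```perl
use strict;
use warnings;

my @words = qw(alpha beta gamma);

# "for each item in this array, do ..." -- the imperative spelling
my @lengths_foreach;
foreach my $word (@words) {
    push @lengths_foreach, length $word;
}

# "map this code over the array" -- the functional spelling
my @lengths_map = map { length $_ } @words;

print "@lengths_foreach\n";   # 5 4 5
print "@lengths_map\n";       # 5 4 5
```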
UPDATE: Just curious, which way is faster?
After some experience now with the File::Find module... hmm... my comments: it's fraught with danger and not a "clean" piece of programming. It also took me a while to get used to the idea of how it was to be used. I avoid it now and prefer using my own code. Oh, and it crashed one of our servers because of its memory-leaking "features". Personally I think it needs to be rewritten, considering directory traversal is such a common task.
Dean,
With respect to your foreach comment, I agree completely. "For each element, do this" just makes sense, and the idiom is present in other languages as well (i.e. Java (shudder), VB (shudder), Basic (shudder), C, C++). Also, if you think in SQL, a foreach statement is a similar concept to a combined SQL statement. (E.g. select example from table where item in (select item from other_table where item_status = 'not_processed').)
Regarding speed comparisons, this has been discussed before.
Cheers.
Re: File::Find considered hard?
by dragonchild (Archbishop) on Mar 15, 2004 at 14:11 UTC
I have never used File::Find or any of its cousins, so take my comments with a grain of salt. I don't do much filesystem work, so I rarely need to deal with files. When I have, a simple glob worked nicely.
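For the single-directory case a glob really is all it takes (no recursion, of course; the *.pl pattern is just an example):

```perl
use strict;
use warnings;

# glob expands shell-style patterns over one directory level.
my @scripts = glob('*.pl');
print "$_\n" for @scripts;
```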
I have looked at File::Find before, but quit reading after the first few lines of the POD. It was making easy things difficult. IMHO, a module that deals with something as basic as finding files should
- have more than one interface
- be blindingly simple
File::Find fails both those tests.
Additionally, I would think that one of those interfaces should be an iterator. Whether or not you pass some set of rules in to the finder, iterators are just plain useful.
As for subrefs, map, and the like - that's a separate question. Callbacks are a non-trivial concept, and should be treated as an upper-level topic (at least as programming is taught in most places). That File::Find requires this interface is just another reason why it's not well designed.
------
We are the carpenters and bricklayers of the Information Age.
Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
my @files = File::Finder->in("/tmp");
Re: File::Find considered hard?
by NetWallah (Canon) on Mar 16, 2004 at 22:51 UTC
Re: File::Find considered hard?
by MADuran (Beadle) on Mar 15, 2004 at 17:07 UTC
|
I believe the issue is really one of appearances. map and File::Find accept a code block, a function, and/or a reference to a function. From my ignorant perspective they are anonymous functions, and from the imperative way of looking at things this is not how you do it. These anonymous functions do not look like objects, so it can not be right from an object-oriented view.
I realize this is actually some of the LISP concepts showing through; I think of it as a lambda function in LISP, which makes it easier to work with. But most working programmers have little exposure to LISP, and I think that some exposure would help. This is a crude explanation, but this is how I see it.
MADuran Who needs a spiffy sig
I think you should really start to learn how to use anonymous functions, function references, closures et al. Once you get familiar with them, you can't live without them.