jffry has asked for the wisdom of the Perl Monks concerning the following question:

Task: to create a list of every symbolic link pointing to a certain file (in this case, "tomcat").

At first, I used the Unix shell ls command:

#!/usr/bin/perl -w
use strict;
use warnings;

my @lines = grep {/->\s+tomcat\s+/} qx{ls -l /etc/rc.d/init.d};
my @inits = map {(split)[-3]} @lines;
print join("\n", @inits);

But I wanted to keep it all Perl, so I used readdir and readlink:

#!/usr/bin/perl -w
use strict;
use warnings;

my $dir = '/etc/rc.d/init.d';
opendir(my $dirh, $dir) or die "Cannot open $dir: $!";
my @inits = grep {-l "$dir/$_" && (readlink("$dir/$_") =~ /^tomcat$/)}
    readdir $dirh;
closedir $dirh;
print join("\n", @inits);

I like how I did not have to use 2 arrays in my "readdir" option, but I think, if I tried, I could eventually crunch down the "ls" option to only using 1 array. I'm not certain I could preserve readability if I did that, though.

I'm somewhat torn between the ease of using the "ls" option compared to the "readdir" option. Maybe the "ls" option seemed easier because I'm not as comfortable using readdir as I am shell commands, and this will go away with more experience?

Aside from being all Perl, is there any other reason to use the "readdir" option over the "ls" option?

EDIT: Just realized that there is a slim chance in the "ls" option of getting a bad element on the list. Because I'm only parsing ls output, a funny file name could mess up that parsing. Whereas with the "readdir" option, I'm certain of what I'm getting. That's actually a very good reason to stick with the all Perl "readdir" option.

Replies are listed 'Best First'.
Re: Unix shell ls vs readdir
by graff (Chancellor) on May 10, 2010 at 23:02 UTC
    Aside from being all Perl, is there any other reason to use the "readdir" option over the "ls" option?

    EDIT: Just realized that there is a slim chance in the "ls" option of getting a bad element on the list. Because I'm only parsing ls output, a funny file name could mess up that parsing. Whereas with the "readdir" option, I'm certain of what I'm getting. That's actually a very good reason to stick with the all Perl "readdir" option.

    Right. Apart from the fact that "ls" might behave with minor but annoying differences on different systems, it's generally trickier and less reliable to parse its text output than to pull file names via direct access to directory entries using readdir (and link target names via direct access to symlinks using readlink).

    I have seen file names on unix systems with non-ascii characters and ascii control characters (including line-feed, carriage-return, etc), all of which can be very disorienting when viewed via "ls".
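A minimal sketch of that failure mode, not taken from the thread: it assumes a Unix-like system with symlink support and an `ls` on the PATH, and it works in a scratch directory made with File::Temp rather than in /etc/rc.d/init.d. The link names are made up for illustration. How `ls` renders the odd name differs between implementations, so the second count may vary; the readdir result should not.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a scratch directory stands in for /etc/rc.d/init.d.
my $dir = tempdir(CLEANUP => 1);

# One ordinary link, and one whose name contains a space and a line feed.
for my $name ('S80tomcat', "odd name\nS81tomcat") {
    symlink('tomcat', "$dir/$name") or die "symlink failed: $!";
}

# readdir/readlink: every directory entry comes back as one intact string.
opendir(my $dirh, $dir) or die "Cannot open $dir: $!";
my @inits = grep { -l "$dir/$_" && readlink("$dir/$_") eq 'tomcat' } readdir $dirh;
closedir $dirh;
@inits = sort @inits;

# The "ls -l" parse from the original post: one record per output line.
my @parsed = map { (split)[-3] } grep { /->\s+tomcat\s+/ } qx{ls -l $dir};

printf "readdir found %d links, ls parsing produced %d names\n",
    scalar @inits, scalar @parsed;

# Report any parsed name that does not correspond to a real link.
for my $name (@parsed) {
    print "not a real entry: $name\n" unless -l "$dir/$name";
}
```

Where `ls` prints the line feed literally, the parsed list is likely to contain a fragment of the odd name that does not exist in the directory at all.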

      So what do you think? In terms of reading the 2Lac of files, which one performs better: ls or readdir?

        I don't know what "the 2Lac of files" is supposed to mean, and "performance" is either too dependent on unknown factors, or else simply irrelevant. Enough practical reasons have been cited to favor the readdir/readlink approach (ease and reliability of file name handling, vs. somewhat more difficult and trouble-prone string parsing), and in some circumstances, running a subshell to run "ls" could be slower than using readdir/readlink.

        If a timing difference between the two methods really matters (which is rarely true), then doing a benchmark "in context" (i.e. under the same conditions as production use) would be prudent.
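For anyone who does want numbers, here is a rough sketch using the core Benchmark module. It is not "in context" in the sense meant above: it assumes a scratch directory holding 50 made-up links, and the helper names `links_via_ls` and `links_via_readdir` are hypothetical. Treat the figures it prints as a starting point only, and point `$dir` at realistic data before drawing conclusions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempdir);

# Assumption: 50 links in a scratch directory; production data will differ.
my $dir = tempdir(CLEANUP => 1);
for my $n (1 .. 50) {
    symlink('tomcat', "$dir/S${n}tomcat") or die "symlink failed: $!";
}

# Hypothetical helper: parse "ls -l" output, as in the original post.
sub links_via_ls {
    my @inits = map { (split)[-3] } grep { /->\s+tomcat\s+/ } qx{ls -l $dir};
    return scalar @inits;
}

# Hypothetical helper: read the directory entries directly.
sub links_via_readdir {
    opendir(my $dirh, $dir) or die "Cannot open $dir: $!";
    my @inits = grep { -l "$dir/$_" && readlink("$dir/$_") eq 'tomcat' } readdir $dirh;
    closedir $dirh;
    return scalar @inits;
}

# Run each approach 200 times and print a comparison table.
cmpthese(200, {
    ls      => \&links_via_ls,
    readdir => \&links_via_readdir,
});
```

With so few iterations Benchmark may warn that the count is too low to be reliable; raising the count or using a larger directory gives steadier results.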

Re: Unix shell ls vs readdir
by toolic (Bishop) on May 10, 2010 at 19:01 UTC
    Your ls solution will produce different results from your readdir solution if you have "hidden" links, such as:
    .foo -> tomcat
    If that is a concern for you, use ls -la
Re: Unix shell ls vs readdir
by happy.barney (Friar) on May 10, 2010 at 18:35 UTC
    Take a look at the File::Find module; it takes quite a different approach.

      Do you mean use it like this?

      #!/usr/bin/perl -w
      use strict;
      use warnings;
      use File::Find;

      my @inits;
      sub wanted {
          if (-l $_ && (readlink("$_") =~ /tomcat/)) {
              push @inits, $_ ;
          }
      }
      find(\&wanted, '/etc/rc.d/init.d');
      print join("\n", @inits);

      I'm not really seeing what I'm gaining (aside from exposure to a very useful module). It seems like overkill, and I can't determine how to prevent it from recursively going into any subdirectories. The $options{'bydepth'} doesn't seem to do that from what I can understand of the docs.

        If you're looking for files within a single known directory, File::Find (or the recurse method of Path::Class::Dir) will be of little value to you. Their purpose is to call a subroutine for every file under a certain point. Any filtering must be done inside your subroutine.

        Options controlling depth-first or breadth-first processing of the directory tree will only affect order. No filtering would be implied.
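On the question of keeping File::Find out of subdirectories: one way to do that filtering inside the subroutine is the documented `$File::Find::prune` variable. The following is only a sketch, run against a scratch tree with made-up names rather than the real init.d directory, and it assumes the default (non-bydepth) traversal, since prune has no effect with bydepth.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Assumption: a scratch tree stands in for /etc/rc.d/init.d.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "mkdir failed: $!";
symlink('tomcat', "$dir/S80tomcat")     or die "symlink failed: $!";
symlink('tomcat', "$dir/sub/S90tomcat") or die "symlink failed: $!";

my @inits;

sub wanted {
    # Inside wanted(), $_ is the bare entry name and
    # $File::Find::name is the full path.
    if (-l $_) {
        my $target = readlink($_);
        push @inits, $_ if defined $target && $target eq 'tomcat';
        return;
    }
    # Refuse to descend into any directory other than the starting one.
    $File::Find::prune = 1 if -d $_ && $File::Find::name ne $dir;
}

find(\&wanted, $dir);
print join("\n", @inits), "\n";
```

Only the top-level link should be reported; the one under `sub` is never visited. For a single known directory this is still more machinery than a plain readdir.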

      ... or the  recurse method of Class::Path::Dir...

        Err, make that Path::Class::Dir.

        Dyslexic moment. Sorry.

Re: Unix shell ls vs readdir
by jwkrahn (Abbot) on May 10, 2010 at 18:54 UTC
    my @lines = grep {/->\s+tomcat\s+/} qx{ls -l /etc/rc.d/init.d};
    my @inits = map {(split)[-3]} @lines;

    If /etc/rc.d/init.d contains any subdirectories then ls will also display the files from them. Is that what you want?

    If any of the file names contains spaces or tabs or newlines (or other whitespace characters) then (split)[-3] will not return the correct file name.

      Actually, ls -l /a_dir will not list the contents of a_dir's subdirectories on any Unix flavor that I've used. Maybe you are thinking of ls -l /a_dir/* which will do exactly what you described because the shell will expand the glob "/a_dir/*" and then hand that list of arguments to ls, and, of course, when ls is handed a directory name as an argument it lists the contents of that dir.

      But yes, a total forehead slap on the situation with spaces in file names messing up my array ordering. Yet another solid reason to keep it all Perl.

Re: Unix shell ls vs readdir
by stefbv (Priest) on May 10, 2010 at 20:37 UTC

    On some systems the output from "ls -l" may contain ANSI escape sequences, so it might be safer to use "\ls -l" instead.

      it might be safer to use "\ls -l" instead

      Under almost all circumstances, the backslash would not be needed.  On the interactive command line, the backslash prevents alias expansion (such as "ls" —> "ls --color=auto", which then produces the ANSI escape sequences), because alias lookup happens before backslash escapes are processed, and there is no alias for "\ls".

      However,

      1. alias expansion is only done for interactive shells, and not for sh -c ... (i.e. qx{...} ) — unless explicitly requested otherwise,
      2. alias expansion would be done by the shell, but unless there are any shell metacharacters in the command, no shell is involved anyway, as Perl will run ls directly.
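If you would rather not depend on whether the command string happens to contain metacharacters, the list form of a pipe open never goes through a shell. A small sketch, assuming a Unix-like system with `ls` on the PATH and a scratch directory whose name deliberately contains a space (the names are invented for the example):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a scratch directory whose name contains a space.
my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/init d";
mkdir $dir or die "mkdir failed: $!";
symlink('tomcat', "$dir/S80tomcat") or die "symlink failed: $!";

# List-form pipe open: the arguments are handed to ls as-is,
# with no shell in between, so no quoting or alias handling applies.
open(my $ls, '-|', 'ls', '-l', $dir) or die "Cannot run ls: $!";
my @lines = <$ls>;
close $ls or warn "ls exited abnormally: $?";

# Same filter as the original post.
my @hits = grep { /->\s+tomcat\s+/ } @lines;
print @hits;
```

The space in the directory name needs no escaping here, whereas the same path interpolated into qx{...} would be split into two arguments.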

        True. I tend to forget that the shell is not involved.

Re: Unix shell ls vs readdir
by JavaFan (Canon) on May 10, 2010 at 18:13 UTC
    Uhm, if you're just interested in the file names, why use "ls -l", then throw away everything the "-l" adds? Why not just a plain "ls"?
      Why not just a plain "ls"?

      Plain ls doesn't show the link target the OP is grepping for...

      It looks like he is using the '->' from the ls -l output to figure out which files are symbolic links pointing to tomcat.
