njcodewarrior has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks:

I have a sub-routine that parses a list of files and returns a data structure. I'd like to write a unix-style filter that takes either a list of files from STDIN or one or more filenames on the command line and feeds the list into the sub. The angle operator (<>) looks like the way to go, but I've got a question regarding its use.

I'm using the following to do this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    while ( <> ) {
        ...
    }

Doing the following:

ls *.txt | my_script.pl

sticks each file name from the ls output into $_, allowing me to build the list of files for the sub, but the following:

my_script.pl text.txt

opens the file(s) and reads them line by line, placing each line into $_, which is not what I want. I'd like to be able to feed @ARGV right into my sub.

So I wrote the following to create the list of files if the user specifies them on the command line:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @files;
    if ( @ARGV ) {
        @files = map { -f $_ ? $_ : () } @ARGV;
    }
    else {
        while ( <> ) {
            chomp;
            push @files, $_;
        }
    }

Is this a clean way to do this (I know, I know: TMTOWTDI) or is there a better way?

njcodewarrior

Re: Writing unix-style filters
by jdporter (Paladin) on Apr 11, 2007 at 03:01 UTC

    I think your solution is correct, but your actual implementation could be more succinct.

    This:

    @files = map { -f $_ ? $_ : () } @ARGV;
    could (and arguably should) be written as
    @files = grep { -f $_ } @ARGV;

    This:

    while ( <> ) { chomp; push @files, $_; }
    could be written as
    chomp( @files = <> );
    though you might have a concern if the file list is huge, relative to your available memory. :-)
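    If memory for a huge file list really were a concern, one could process each name as it arrives instead of collecting them all first. A minimal sketch, where process_file() is a hypothetical stand-in for whatever the per-file work is:

```perl
#!/usr/bin/perl
# Streaming sketch: handle each file name as it is read rather than
# accumulating the whole list in @files. process_file() is a
# hypothetical placeholder for the OP's sub.
use strict;
use warnings;

sub process_file {
    my ($name) = @_;
    print "processing $name\n";
}

while ( my $name = <STDIN> ) {
    chomp $name;
    process_file($name) if -f $name;   # skip anything that isn't a plain file
}
```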

    Ultimately, the whole chunk of code could be written as

    chomp( @files = @ARGV ? grep { -f $_ } @ARGV : <> );

    A word spoken in Mind will reach its own level, in the objective world, by its own weight
       chomp( @files = @ARGV ? grep { -f $_ } @ARGV : <> );
      This chomps after the -f file test runs, so each name still has its trailing newline when it is tested. So, it will just produce an empty array when reading the file list from the input, because -f $_ will always return false.

        No; actually, it will not, because the -f test operator here is not being applied to the strings which are returned from the <>. Perhaps it should be; in which case you'd have a very good point.

        chomp( @files = @ARGV ? @ARGV : <> );
        @files = grep { -f $_ } @files;
Re: Writing unix-style filters
by graff (Chancellor) on Apr 11, 2007 at 04:26 UTC
    I think jdporter covered things pretty well. One more trick you might want to know: the "-t" function (one of the things described in perlfunc, particularly in the section that appears when you run "perldoc -f -X").

    The "-t", used without any parameter, will return true if your current STDIN is coming from a terminal (as opposed to coming from a pipeline or input redirection). Here's how you might use it:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @files;
    if ( @ARGV ) {
        @files = grep { -f } @ARGV;
    }
    elsif ( -t ) {  # true if STDIN is a tty (not a pipe)
        die "Usage: $0 file [file2 ...]\n   or: ls | $0\n";
    }
    else {          # get here when STDIN is connected to a pipe
        @files = <>;
        chomp @files;
    }
    # ... do stuff with @files
    Try running that without args and without any input pipe or redirection, and you'll see the "Usage:" message.

    Update: <pedantic> I felt compelled to point out that the particular design you want, IMO, is not really a "unix-style filter". For the great majority of unix-style filters, the following types of command lines are (more or less) equivalent:

    cat file1 | tool
    tool file1
    tool < file1
    cat file1 file2 ... | tool
    tool file1 file2 ...
    # (no equivalent using "<" in this case)
    (The first example, of course, demonstrates "useless use of cat", but I included it for pedagogical completeness). </pedantic>
      ++ for fitting pedantry; -- for UUOC (if only I could).
Re: Writing unix-style filters
by ikegami (Patriarch) on Apr 11, 2007 at 05:28 UTC

    If you don't wanna read from the files on the command line, why are you using <>?

    #!/usr/bin/perl
    # usage:
    #   myscript *.txt
    #   myscript `ls *.txt`
    #   ls *.txt | myscript
    use strict;
    use warnings;

    my @files = @ARGV;
    if (!-t STDIN) {
        while (<STDIN>) {
            chomp;
            push @files, $_;
        }
    }
    ...

    But do you really need this whole STDIN thing?

    #!/usr/bin/perl
    # usage:
    #   myscript *.txt
    #   myscript `ls *.txt`
    use strict;
    use warnings;

    my @files = @ARGV;
    ...

    I must agree on the previously stated point that this is not how unix filters work.

      If you don't wanna read from the files on the command line, why are you using <>?

      Because I'd like to be able to pipe a list of files (from ls, echo, etc) into the script as well as list them on the command line:

      ls *.txt | my_script.pl

      or

      my_script.pl file1.txt file2.txt

      or

      my_script.pl *.txt

      The only way I can figure to do all of the above is using <>.

      I'm learning the ins and outs of the *nix OS at the same time I'm learning perl...am I missing something here?

      njcodewarrior

        Because I'd like to be able to pipe a list of files (from ls, echo, etc) into the script as well as list them on the command line:

        You could easily substitute <STDIN> for <> in your original post, since you're using <> exclusively to read from STDIN.

Re: Writing unix-style filters
by cdarke (Prior) on Apr 11, 2007 at 08:23 UTC
    Joining the ranks of the pedantic, why use the ls program to list the filenames? The shell does the filename expansion (globbing), not ls:
    echo *.txt|my_script.pl
    is a better shell way. But in Perl, why not call glob yourself?
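    For instance, cdarke's glob suggestion might look like the following. A minimal sketch; the "*.txt" pattern is only an illustration:

```perl
#!/usr/bin/perl
# Sketch of letting Perl do the globbing itself, instead of relying
# on the shell or ls. The "*.txt" pattern is illustrative only.
use strict;
use warnings;

my @files = grep { -f } glob '*.txt';   # keep only plain files
print "$_\n" for @files;
```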
      Shell expansion means having all the filenames on one big line, which defeats most of the methods suggested above and can be a real PITA if filenames contain spaces (which is likely, unlike newlines). Moreover, there can be a limit on the size of the command line for a command (echo in this case), and ls is able to list the contents of directories as well as expanded wildcards.

      I think that your suggestion is more effective when the shell expansion is used directly on the command line of my_script.pl, like this:

      my_script.pl *.txt

      Flavio
      perl -ple'$_=reverse' <<<ti.xittelop@oivalf

      Don't fool yourself.

        O' darn it! Use the "find ... -print0 | xargs -0 ..." construct already!

        In particular (slightly formatted) ...

        find -- walk a file hierarchy
        .
        .
        .
        -print0
          This primary always evaluates to true.  It prints the
          pathname of the current file to standard output, followed
          by an ASCII NUL character (character code 0).
        
        
        xargs -- construct argument list(s) and execute utility
        .
        .
        .
        -0
          Change xargs to expect NUL (``\0'') characters as separators,
          instead of spaces and newlines.  This is expected to be used
          in concert with the -print0 function in find(1).
        

        The above construct may still fail on the xargs side (see the bottom portion, just after the options, of the man page linked above), say, if there are enough files. A search on Google Groups produced "maximum command line length <tcsh>".
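        On the Perl side of such a pipeline, NUL-separated names from find -print0 can be consumed by setting the input record separator $/ to "\0". A sketch, assuming the names arrive on STDIN (e.g. find . -name '*.txt' -print0 | my_script.pl):

```perl
#!/usr/bin/perl
# Sketch of reading NUL-separated file names, as produced by
# "find ... -print0". With $/ set to "\0", the readline operator
# splits records on NUL, and chomp removes the trailing NUL, so
# names containing spaces or even newlines survive intact.
use strict;
use warnings;

local $/ = "\0";
my @files;
while (<STDIN>) {
    chomp;                    # removes the trailing "\0", since $/ is "\0"
    push @files, $_ if -f;
}
```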

Re: Writing unix-style filters
by Moron (Curate) on Apr 11, 2007 at 13:08 UTC
    ^B The problem is that it is unusual for a program to live long without command line options being added to the functionality.

    ^C For example, the unix program cat, which is functionally pretty simple, will do exactly what you are doing, i.e. check for arguments on the command line; but it will also parse the command line for options, eliminate those as not being files, and only if non-option syntax is found will it process an argument as a file rather than reading from STDIN.

    ^I Thus cat -b will be interpreted as an option with no file arguments, and cat will read from STDIN rather than trying to open a file called -b.

    ^P Fortunately, the modules Getopt::Std and Getopt::Long will do the legwork of that for you (^I they handle two different styles of command line options, from which you should choose in advance).
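    A minimal sketch of what that looks like with Getopt::Long; the --verbose option here is purely illustrative. GetOptions strips recognized options out of @ARGV, leaving only the non-option (file) arguments behind:

```perl
#!/usr/bin/perl
# Sketch of option parsing with Getopt::Long. The --verbose flag is
# a made-up example; after GetOptions succeeds, @ARGV holds only the
# non-option arguments, which can then be treated as file names.
use strict;
use warnings;
use Getopt::Long;

my $verbose = 0;
GetOptions( 'verbose' => \$verbose )
    or die "Usage: $0 [--verbose] file ...\n";

my @files = grep { -f } @ARGV;   # whatever remains in @ARGV is file arguments
print "verbose mode on\n" if $verbose;
```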

    ___________________________________________________________

    Key to hats: ^I=white ^B=black ^P=yellow ^E=red ^C=green ^M=blue - see Moron's scratchpad for fuller explanation.

    ^M Free your mind