njcodewarrior has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks:

I have a sub-routine that parses a list of files and returns a data structure. I'd like to write a unix-style filter that takes either a list of files from STDIN or one or more filenames on the command line and feeds the list into the sub. The angle operator (<>) looks like the way to go, but I've got a question regarding its use.

I'm using the following to do this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    while ( <> ) {
        ...
    }

Doing the following:

ls *.txt | my_script.pl

sticks each file name from the ls output into $_, allowing me to build the list of files for the sub, but the following:

my_script.pl text.txt

opens the file(s) and reads them line by line, placing each line into $_, which is not what I want. I'd like to be able to feed @ARGV right into my sub.

So I wrote the following to create the list of files if the user specifies them on the command line:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @files;
    if ( @ARGV ) {
        @files = map { -f $_ ? $_ : () } @ARGV;
    }
    else {
        while ( <> ) {
            chomp;
            push @files, $_;
        }
    }

Is this a clean way to do this (I know, I know: TMTOWTDI) or is there a better way?

njcodewarrior

Re: Writing unix-style filters
by jdporter (Paladin) on Apr 11, 2007 at 03:01 UTC

    I think your solution is correct, but your actual implementation could be more succinct.

    This:

    @files = map { -f $_ ? $_ : () } @ARGV;
    could (and arguably should) be written as
    @files = grep { -f $_ } @ARGV;

    This:

    while ( <> ) { chomp; push @files, $_; }
    could be written as
    chomp( @files = <> );
    though you might have a concern if the file list is huge, relative to your available memory. :-)
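    If memory for a huge file list really were a concern, one could process each name as it arrives instead of collecting them all first. A minimal sketch, where process_file() is a hypothetical stand-in for whatever the per-file work is:

```perl
#!/usr/bin/perl
# Streaming sketch: handle each file name as it is read rather than
# accumulating the whole list in @files. process_file() is a
# hypothetical placeholder for the OP's sub.
use strict;
use warnings;

sub process_file {
    my ($name) = @_;
    print "processing $name\n";
}

while ( my $name = <STDIN> ) {
    chomp $name;
    process_file($name) if -f $name;   # skip anything that isn't a plain file
}
```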

    Ultimately, the whole chunk of code could be written as

    chomp( @files = @ARGV ? grep { -f $_ } @ARGV : <> );

    A word spoken in Mind will reach its own level, in the objective world, by its own weight
       chomp( @files = @ARGV ? grep { -f $_ } @ARGV : <> );
      This chomps after the -f file test runs, so each name still has its trailing newline when it is tested. So, it will just produce an empty array when reading the file list from the input, because -f $_ will always return false.

        No; actually, it will not, because the -f test operator here is not being applied to the strings which are returned from the <>. Perhaps it should be; in which case you'd have a very good point.

        chomp( @files = @ARGV ? @ARGV : <> );
        @files = grep { -f $_ } @files;
Re: Writing unix-style filters
by graff (Chancellor) on Apr 11, 2007 at 04:26 UTC
    I think jdporter covered things pretty well. One more trick you might want to know: the "-t" function (one of the things described in perlfunc, particularly in the section that appears when you run "perldoc -f -X").

    The "-t", used without any parameter, will return true if your current STDIN is coming from a terminal (as opposed to coming from a pipeline or input redirection). Here's how you might use it:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @files;
    if ( @ARGV ) {
        @files = grep { -f } @ARGV;
    }
    elsif ( -t ) {  # true if STDIN is a tty (not a pipe)
        die "Usage: $0 file [file2 ...]\n   or: ls | $0\n";
    }
    else {          # get here when STDIN is connected to a pipe
        @files = <>;
        chomp @files;
    }
    # ... do stuff with @files
    Try running that without args and without any input pipe or redirection, and you'll see the "Usage:" message.

    Update: <pedantic> I felt compelled to point out that the particular design you want, IMO, is not really a "unix-style filter". For the great majority of unix-style filters, the following types of command lines are (more or less) equivalent:

    cat file1 | tool
    tool file1
    tool < file1
    cat file1 file2 ... | tool
    tool file1 file2 ...
    # (no equivalent using "<" in this case)
    (The first example, of course, demonstrates "useless use of cat", but I included it for pedagogical completeness). </pedantic>
      ++ for fitting pedantry; -- for UUOC (if only I could).
Re: Writing unix-style filters
by ikegami (Patriarch) on Apr 11, 2007 at 05:28 UTC

    If you don't wanna read from the files on the command line, why are you using <>?

    #!/usr/bin/perl
    # usage:
    #   myscript *.txt
    #   myscript `ls *.txt`
    #   ls *.txt | myscript
    use strict;
    use warnings;

    my @files = @ARGV;
    if (!-t STDIN) {
        while (<STDIN>) {
            chomp;
            push @files, $_;
        }
    }
    ...

    But do you really need this whole STDIN thing?

    #!/usr/bin/perl
    # usage:
    #   myscript *.txt
    #   myscript `ls *.txt`
    use strict;
    use warnings;

    my @files = @ARGV;
    ...

    I must agree on the previously stated point that this is not how unix filters work.

      If you don't wanna read from the files on the command line, why are you using <>?

      Because I'd like to be able to pipe a list of files (from ls, echo, etc) into the script as well as list them on the command line:

      ls *.txt | my_script.pl

      or

      my_script.pl file1.txt file2.txt

      or

      my_script.pl *.txt

      The only way I can figure to do all of the above is using <>.

      I'm learning the ins and outs of the *nix OS at the same time I'm learning perl...am I missing something here?

      njcodewarrior

        Because I'd like to be able to pipe a list of files (from ls, echo, etc) into the script as well as list them on the command line:

        You could easily substitute <STDIN> for <> in your original post, since you're using <> exclusively to read from STDIN.

Re: Writing unix-style filters
by cdarke (Prior) on Apr 11, 2007 at 08:23 UTC
    Joining the ranks of the pedantic, why use the ls program to list the filenames? The shell does the filename expansion (globbing), not ls:
    echo *.txt|my_script.pl
    is a better shell way. But in Perl, why not call glob yourself?
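    For instance, cdarke's glob suggestion might look like the following. A minimal sketch; the "*.txt" pattern is only an illustration:

```perl
#!/usr/bin/perl
# Sketch of letting Perl do the globbing itself, instead of relying
# on the shell or ls. The "*.txt" pattern is illustrative only.
use strict;
use warnings;

my @files = grep { -f } glob '*.txt';   # keep only plain files
print "$_\n" for @files;
```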
      Shell expansion means having all the filenames on one big line, which defeats most of the methods suggested above and can be a real PITA if filenames contain spaces (which is likely, unlike newlines). Moreover, there can be a limit on the size of the command line for a command (echo in this case), and ls is able to list the contents of directories as well as expanded wildcards.

      I think that your suggestion is more effective when the shell expansion is used directly on the command line of my_script.pl, like this:

      my_script.pl *.txt

      Flavio
      perl -ple'$_=reverse' <<<ti.xittelop@oivalf

      Don't fool yourself.

        O' darn it! Use the "find ... -print0 | xargs -0 ..." construct already!

        In particular (slightly formatted) ...

        find -- walk a file hierarchy
        .
        .
        .
        -print0
          This primary always evaluates to true.  It prints the
          pathname of the current file to standard output, followed
          by an ASCII NUL character (character code 0).
        
        
        xargs -- construct argument list(s) and execute utility
        .
        .
        .
        -0
          Change xargs to expect NUL (``\0'') characters as separators,
          instead of spaces and newlines.  This is expected to be used
          in concert with the -print0 function in find(1).
        

        The above construct may still fail on the xargs side (see the bottom portion, just after the options, of the man page linked above), say, if there are enough files. A search on Google Groups produced "maximum command line length <tcsh>".
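        On the Perl side of such a pipeline, NUL-separated names from find -print0 can be consumed by setting the input record separator $/ to "\0". A sketch, assuming the names arrive on STDIN (e.g. find . -name '*.txt' -print0 | my_script.pl):

```perl
#!/usr/bin/perl
# Sketch of reading NUL-separated file names, as produced by
# "find ... -print0". With $/ set to "\0", the readline operator
# splits records on NUL, and chomp removes the trailing NUL, so
# names containing spaces or even newlines survive intact.
use strict;
use warnings;

local $/ = "\0";
my @files;
while (<STDIN>) {
    chomp;                    # removes the trailing "\0", since $/ is "\0"
    push @files, $_ if -f;
}
```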

Re: Writing unix-style filters
by Moron (Curate) on Apr 11, 2007 at 13:08 UTC
    ^B The problem is that it is unusual for a program to live long without command line options being added to the functionality.

    ^C For example, the unix program cat, which is functionally pretty simple, will do exactly what you are doing, i.e. check for arguments on the command line; but it will also parse the command line for options, eliminate those as not being files, and only if non-option syntax is found will it process an argument as a file rather than reading from STDIN.

    ^I Thus cat -b will be interpreted as an option with no file arguments, and cat will read from STDIN rather than trying to open a file called -b.

    ^P Fortunately, the modules Getopt::Std and Getopt::Long will do the legwork of that for you (^I they handle two different styles of command line options, from which you should choose in advance).
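    A minimal sketch of what that looks like with Getopt::Long; the --verbose option here is purely illustrative. GetOptions strips recognized options out of @ARGV, leaving only the non-option (file) arguments behind:

```perl
#!/usr/bin/perl
# Sketch of option parsing with Getopt::Long. The --verbose flag is
# a made-up example; after GetOptions succeeds, @ARGV holds only the
# non-option arguments, which can then be treated as file names.
use strict;
use warnings;
use Getopt::Long;

my $verbose = 0;
GetOptions( 'verbose' => \$verbose )
    or die "Usage: $0 [--verbose] file ...\n";

my @files = grep { -f } @ARGV;   # whatever remains in @ARGV is file arguments
print "verbose mode on\n" if $verbose;
```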

    ___________________________________________________________

    Key to hats: ^I=white ^B=black ^P=yellow ^E=red ^C=green ^M=blue - see Moron's scratchpad for fuller explanation.

    ^M Free your mind