jaypal has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks,

This might seem like a trivial question: I want to read one line at a time from files passed as arguments on the command line to my script.

The following approach works for me:

#!/usr/bin/perl
use strict;
use warnings;

for my $file (@ARGV) {
    open my $fh, "<", $file or die "Cannot open file: $!";
    while ( my $line = <$fh> ) {
        print "$file: $line";
    }
}

I was wondering if this is the correct approach. Can you please suggest any other approaches? For some reason a while loop inside a for loop didn't seem idiomatic to me, but I may be wrong, as I am still learning to script with Perl. For learning purposes I would appreciate it if someone could suggest a more concise approach (even if it comes at the cost of readability).

Looking forward to your wisdom

Regards
Jaypal

UPDATE: Thanks everyone for all the different approaches.

Replies are listed 'Best First'.
Re: Reading multiple files one line at a time from arguments
by BrowserUk (Patriarch) on Jun 19, 2014 at 03:42 UTC
    For learning purposes I would appreciate it if someone could suggest a more concise approach (even if it comes at the cost of readability).

    Taking you at your word: you can do this extremely concisely with a one-liner:

    perl -ple1 file1 file2 ...

    Or, if you prefer it as a 'full program', type this into a file called ptype.pl:

    #! perl -pl

    And then on your command line:

    ptype file1 file2 file3 file4 ...

    If you're using *nix, then it would have to be (something like):

    #!/usr/bin/perl -pl

    With the advantage of being able to use wildcards:

    ptype *.pl *.txt

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      The advantage of being able to use *nix wildcards is a shell property (bash/tcsh/whatever) and irrelevant. The following works just as well:

      perl -ple1 *.pl *.txt

Re: Reading multiple files one line at a time from arguments
by davido (Cardinal) on Jun 19, 2014 at 04:04 UTC

    If you didn't care about what file name you're processing, the entire script could look like this:

    die "Usage: mytest filename [filename [...]]\n" unless @ARGV;
    print while <>;

    ...because (see perlop) the <> operator shifts filenames off of @ARGV and opens them internally. If @ARGV starts out empty, it reads from STDIN instead.
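The magic-open behavior can be made explicit. The sketch below is a rough approximation of what the diamond operator does under the hood (the real <> also maintains the ARGV filehandle, $ARGV, and the STDIN fallback; see perlop); read_all_lines is a hypothetical helper name, not from the thread:

```perl
use strict;
use warnings;

# Rough approximation of <>: take names off the argument list,
# open each file in turn, and collect its lines.
sub read_all_lines {
    my @files = @_;
    my @lines;
    for my $file (@files) {
        open my $fh, '<', $file or do {
            warn "Can't open $file: $!";
            next;
        };
        push @lines, <$fh>;    # slurp all lines from this file
        close $fh;
    }
    return @lines;
}
```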

    Here's another approach that is quite similar to yours, but borrows the idea of shifting filenames out of @ARGV:

    die "Usage: mytest filename [filename [...]]\n" unless @ARGV;

    while ( my $file = shift ) {
        open my $ifh, '<', $file or die $!;
        print "$file($.): $_" while <$ifh>;
    }

    Dave

      If you didn't care about what file name you're processing
      Even if you did, you could retrieve it from $ARGV, see perlvar.
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        ++! I knew I was forgetting something useful.

        In that case:

        die "Usage: mytest filename [filename [...]]\n" unless @ARGV;

        while ( <> ) {
            print "$ARGV($.): $_";
        }
        continue {
            close ARGV if eof;
        }

        That will print the current file, the current line number, and then the line itself. The continue clause resets the line counter before moving on to the next file; otherwise $. would just keep counting upward, giving a running total across all files.
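The $. reset that the `close ARGV if eof` trick relies on can be seen in a small standalone demonstration (this sketch uses File::Temp and is not part of the post above):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# $. reports the line number of the last-read filehandle and is
# reset when that handle is closed — the property exploited by
# `close ARGV if eof`.
my ( $fh1, $f1 ) = tempfile();
print $fh1 "one\ntwo\n";
close $fh1;

open my $in, '<', $f1 or die $!;
<$in>; <$in>;                  # read both lines
my $before_close = $.;         # 2
close $in;                     # resets the counter

open $in, '<', $f1 or die $!;
<$in>;                         # read one line
my $after_reopen = $.;         # back to 1
close $in;

print "before=$before_close after=$after_reopen\n";
```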

        I have no idea why I indented it that way. Just looking for symmetry or something, I guess.


        Dave

Re: Reading multiple files one line at a time from arguments
by Laurent_R (Canon) on Jun 19, 2014 at 06:10 UTC
    Although you have been shown various shortcuts (especially the while (<>) construct) that apply if you don't need to distinguish between files, your general approach is very sound and robust. I would just slightly improve the error reporting:
    open my $fh, "<", $file or die "Cannot open file $file: $!";
    which tells you which file failed to open (this makes your life easier when you need to open a dozen files and one is missing, or when your program constructs the file names by assembling various parts). Aside from that, there is also the question of whether you really want your program to fail if just one file is missing; that is usually what I want, but there are rare cases where I would want just a warning rather than a failure when one file is missing from a long list.
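The warn-instead-of-die variant might look like the sketch below; cat_lenient is a hypothetical name (it also returns the list of files it actually managed to read, purely so the behavior is observable):

```perl
use strict;
use warnings;

# Warn-and-skip variant: report unreadable files but keep going.
sub cat_lenient {
    my @files = @_;
    my @read;
    for my $file (@files) {
        open my $fh, '<', $file or do {
            warn "Cannot open file $file: $!";
            next;    # skip this file, continue with the rest
        };
        print "$file: $_" while <$fh>;
        close $fh;
        push @read, $file;
    }
    return @read;
}
```

Whether to warn or die is the policy question raised above; the loop structure is the same either way.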
Re: Reading multiple files one line at a time from arguments
by neilwatson (Priest) on Jun 19, 2014 at 02:58 UTC

    Welcome to Perl,

    You might look at Perl::Critic, a module that helps you improve your style and follow good practices. There's even an online version.

    http://search.cpan.org/~thaljef/Perl-Critic-1.121/lib/Perl/Critic.pm

    http://perl-critic.stacka.to/

    Neil Watson
    watson-wilson.ca

Re: Reading multiple files one line at a time from arguments
by perlfan (Parson) on Jun 19, 2014 at 02:57 UTC
    This seems fine; I would check to make sure each successive $file exists and either bail or just skip non-existent files.

    I'd consider using Getopt::Long to define a flag that can contain multiple values. This way, you'd have something like:

    ./readfiles.pl -f file1 -f file2 -f file3
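A minimal sketch of that Getopt::Long setup, using a repeatable -f string option collected into an array (parse_file_args is a hypothetical helper name; GetOptionsFromArray is used instead of GetOptions so the parsing is easy to test in isolation):

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Collect every -f value into @files, so the script can be
# invoked as:  ./readfiles.pl -f file1 -f file2 -f file3
sub parse_file_args {
    my @argv = @_;
    my @files;
    GetOptionsFromArray( \@argv, 'f=s' => \@files )
        or die "Usage: $0 -f file [-f file ...]\n";
    return @files;
}
```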
Re: Reading multiple files one line at a time from arguments
by NetWallah (Canon) on Jun 19, 2014 at 04:28 UTC
    The only thing I'd add (although it is not "required") is:
    close $fh or warn "Could not close $file: $!";

            What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?
                  -Larry Wall, 1992

Re: Reading multiple files one line at a time from arguments
by sundialsvc4 (Abbot) on Jun 19, 2014 at 12:37 UTC

    Your approach seems sensible to me. A production version of such an application would probably handle the “cannot open” case by printing a message to STDERR, then incrementing a count of files that could not be opened. If, at the end of the loop, this count is non-zero, the program would exit() with a non-zero return code that could easily be checked by a calling script.

    The STDERR output-stream is specifically provided for error-message outputs.
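That error-counting scheme might look like the following sketch; cat_counting is a hypothetical name, and the caller turns the returned count into an exit status:

```perl
use strict;
use warnings;

# Report each open failure on STDERR, keep going, and return the
# number of failures so the caller can set the exit status.
sub cat_counting {
    my @files = @_;
    my $failures = 0;
    for my $file (@files) {
        open my $fh, '<', $file or do {
            print STDERR "Cannot open $file: $!\n";
            $failures++;
            next;
        };
        print "$file: $_" while <$fh>;
        close $fh;
    }
    return $failures;
}

# In the script itself:  exit(1) if cat_counting(@ARGV);
```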