in reply to Re: reading multiple files one at a time
in thread reading multiple files one at a time

Hi,

I know this is an old thread but this is what I am trying to do.

I have three files and I want to merge them into one file a single line at a time.

At the moment I have from the top post:

@filenames = qw/English Kana Kanji/;
foreach my $file (@filenames) {
    open FILE, "<$file";
    while(chomp($line = <FILE>)) {
        print "$line ";
        last;
    }
    close FILE;
}

Output:

Mr Tanaka たなかさん 田中さん

However, when I take out the 'last', I get everything from my first file, then the second file, then the third file.

How do I do it so that I can merge the lines of each file, and then print the combined lines as in the output above?

Many thanks,

Peter.

Re^3: reading multiple files one at a time
by BrowserUk (Patriarch) on Jun 11, 2005 at 02:47 UTC

    Updated: Corrected. Replaced the untested attempt to mitigate the warning produced when one or more of the input files is shorter than the others.

    #! perl -slw
    use strict;

    my @files = @ARGV;
    my @fhs;
    my $i = 0;
    open $fhs[ $i++ ], '<', $_ or die "$_ : $!" for @files;

    #while( my @lines = map{ scalar <$_> || () } @fhs ) { !!Bad code!!
    while(
        ## Build an array of one line (or the null string) from each file.
        my @lines = map{ defined( my $line = <$_> ) ? $line : '' } @fhs
    ) {
        chomp @lines;            ## Remove the newlines
        print join ' ', @lines;  ## and concatenate them.
    }

    close $_ for @fhs;

    __END__
    P:\test>465737 test1.dat test2.dat test3.dat
    file1 line 1 file2 line 1 file3 line 1
    file1 line 2 file2 line 2 file3 line 2
    file1 line 3 file2 line 3 file3 line 3
    file1 line 4 file2 line 4 file3 line 4
    ...

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      Hi,

      I tried replying to this when you first posted it but couldn't, so I have just recently joined, because this seems like a great Perl community.

      I just wanted to say thanks very much. It worked brilliantly.

      Is there any chance that you could explain what the following line does:
      while( my @lines = map{ scalar <$_> || () } @fhs ) {

      Specifically what are you doing in:
      scalar <$_> || ()

      Is this like forcing the $_ var to be of type scalar? Why do we want to do that? And what is the || ()?

      If you or someone could explain this in easy newbie friendly terms that would be great.

      Thanks again.

      Peter.

        Sure. Doing so has also highlighted an error.

        The diamond operator <$var> reads lines from the file that the handle $var points at. As with many operators and function calls in Perl, the diamond operator is sensitive to where it is being used (termed the context), which controls some aspects of its behaviour. For example, if the variable (or filehandle) $fh has been opened to a file, then

        my $line = <$fh>;

        reads one line from that file and assigns it to the (scalar) variable $line. This is termed a scalar context; a single (scalar) value is assigned.
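        A minimal, self-contained sketch of that one-line-per-read behaviour (it reads from an in-memory filehandle rather than a disk file, which is an assumption added here so the example runs on its own):

```perl
#! perl -lw
use strict;

# An in-memory filehandle keeps the example self-contained (no disk file needed).
my $data = "first line\nsecond line\n";
open my $fh, '<', \$data or die $!;

my $line = <$fh>;   # scalar context: exactly one line is read
chomp $line;
print $line;        # prints "first line"

$line = <$fh>;      # the next read picks up where the last one stopped
chomp $line;
print $line;        # prints "second line"

close $fh;
```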

        However, if the same filehandle is used this way:

        my @lines = <$fh>;

        then instead of one line being read from the file, all the lines are read from the file and the array @lines is extended to hold as many lines as there are available in the file. This is termed a list context; multiple (a list of) variables are assigned. It's tempting to think of this as an

        array context, but this is a frowned-upon term, as the variables needn't be an array.

        my( $v1, $v2, $v3, $v4, $v5 ) = <$fh>;

        Here, a list of individually named scalar variables (rather than an array of collectively named scalar variables) are assigned. Hence the statement is a list assignment. In this case, the first five lines from the file will be assigned to the five named variables, but all the lines from the file will have been read into memory. All those after the 5th line will be discarded.
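        The list-context behaviour can be seen in a short sketch (again using an in-memory filehandle, an assumption made here for self-containment):

```perl
#! perl -lw
use strict;

my $data = "one\ntwo\nthree\n";
open my $fh, '<', \$data or die $!;

# List context: every remaining line is read in one go.
my @lines = <$fh>;
chomp @lines;
print scalar @lines;     # prints 3
print join '|', @lines;  # prints "one|two|three"

close $fh;
```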

        The scalar operator is used to force an operator or function into a scalar context, when it would otherwise be called in a list context. So this

        my @lines = scalar <$fh>;

        will cause the diamond operator to be called in a scalar context even though its results are being assigned to an array. After this statement, @lines will contain just a single scalar variable ($lines[ 0 ]), which will contain the first line from the file. Only that line will have been read from the file, and the filehandle ($fh) will be pointing at the start of the second line, ready for another call to the diamond operator.
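        A runnable sketch of exactly that effect (the in-memory filehandle is an assumption added so the example stands alone):

```perl
#! perl -lw
use strict;

my $data = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$data or die $!;

# scalar forces a one-line read, even though the target is an array.
my @lines = scalar <$fh>;
chomp @lines;
print scalar @lines;   # prints 1 -- only the first line was read
print $lines[0];       # prints "alpha"

# The handle is now positioned at the start of the second line.
chomp( my $next = <$fh> );
print $next;           # prints "beta"

close $fh;
```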

        In the snippet of code, @fhs is an array of filehandles each opened to a different file, the names read from the command line via @ARGV. Having opened all the files, the task is to read one line from each file in turn and append those lines together to form a single line of output and repeat that process for all the lines in all the file(handle)s.

        To do this, we need to call the diamond operator on each of the filehandles (@fhs) in turn, in a scalar context to ensure that only one line is read from each; append those lines together (after removing the newline ("\n") from each); and then print the composite line out.

        So, I range over @fhs using map, assigning each to $_ in turn. Applying scalar to the diamond operator, scalar <$_>, ensures that it is called in a scalar context and only returns one line at a time from each filehandle. These lines are assigned to @lines. I can then use chomp to remove the newlines from all of the lines, and join to concatenate them together, before printing them out for redirection to the composite file.
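        Two passes of that map/chomp/join pipeline can be sketched as follows (the two in-memory handles stand in for handles opened on real files, an assumption made here for self-containment):

```perl
#! perl -lw
use strict;

# Two in-memory "files" stand in for handles opened on real files.
my ($data1, $data2) = ("A1\nA2\n", "B1\nB2\n");
open my $fh1, '<', \$data1 or die $!;
open my $fh2, '<', \$data2 or die $!;
my @fhs = ($fh1, $fh2);

# One pass of the loop body: one line from each handle, chomped and joined.
my @lines = map { scalar <$_> } @fhs;
chomp @lines;
print join ' ', @lines;   # prints "A1 B1"

# A second pass picks up the next line of each file.
@lines = map { scalar <$_> } @fhs;
chomp @lines;
print join ' ', @lines;   # prints "A2 B2"

close $_ for @fhs;
```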

        However, if one or more of the files is shorter than the others, then the diamond operator will return undef when attempting to read another line from a file that has already been exhausted. Pragmatically, Perl will allow the loop to continue, but it will issue a warning when you attempt to join that undefined value with the other values read in:

        print join '', 'fred', 'bill', undef, 'john';;
        Use of uninitialized value in join or string at (eval 3)
        fredbilljohn

        As I posted the example code, I thought about that situation and (too quickly) added a "quick fix" to deal with it. I made a mistake! It doesn't work at all.

        I've corrected that in the snippet above.
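        On Perl 5.10 or later, the same guard can be written with the defined-or operator //. This is a sketch, not the posted code: the in-memory handles and the grep length loop test (which stops the loop once every handle has been exhausted) are assumptions added here for self-containment.

```perl
#! perl -w
use strict;
use feature 'say';   # say and // require Perl 5.10+

# In-memory "files" of unequal length, so one handle runs out early.
my ($data1, $data2) = ("a1\na2\na3\n", "b1\nb2\n");
open my $fh1, '<', \$data1 or die $!;
open my $fh2, '<', \$data2 or die $!;
my @fhs = ($fh1, $fh2);

my @out;
# scalar( <$_> ) // '' substitutes the null string only when the read
# returns undef (handle exhausted), so a legitimate "0\n" line survives.
# grep length ends the loop once every read came back empty.
while( grep length, my @lines = map { scalar( <$_> ) // '' } @fhs ) {
    chomp @lines;
    push @out, join ' ', @lines;
}
close $_ for @fhs;

say for @out;   # "a1 b1", "a2 b2", "a3 " (trailing space from the short file)
```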


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
Re^3: reading multiple files one at a time
by graff (Chancellor) on Jun 12, 2005 at 01:27 UTC
    Unix systems already have a standard command line utility to do this. It's called "paste":
    paste file1 file2 file3 > merged_file
    And like all good unix tools, of course, it has been ported to windows (by Cygwin, AT&T Research Labs, GNU, and others).

    But using perl for this (as demonstrated by BrowserUK) is still fun and rewarding in its own right -- e.g. in case you need to do character encoding conversions while merging files, or manage column widths or modify column contents in intelligent ways. (You could use "paste" in combination with other standard unix tools to do these things, but at that point, it's not so different from just writing a little perl script.)