creamygoodness has asked for the wisdom of the Perl Monks concerning the following question:

Greets,

I need to open many filehandles against the same file. They should all be independent, meaning that each should be able to seek, read, etc. without affecting the others. If I read perlopentut correctly, this should do the trick:

open(my $orig, "<", "foofile");
open(my $dupe, "<&", $orig);
That works as expected on Perl 5.8.6 on my OS X laptop. However, it doesn't work with 5.8.6, 5.8.0, or bleadperl on a RedHat9 box: seeking/reading against the original messes up the dupe. The dupe says that it's in the right spot, but it doesn't read the data that's there. Here's a test script...
#!/usr/bin/perl
use strict;
use warnings;

open(my $junkfile, '>', "junkfile");
my $junk = '';
for my $first_byte (0 .. 255) {
    for my $second_byte (0 .. 255) {
        $junk .= pack('CC', $first_byte, $second_byte);
        $junk .= "\0" x 2;
    }
}
print $junkfile $junk;
close $junkfile;

open(my $orig, '<', "junkfile") or die $!;
seek($orig, 256, 0);
print_locations("Locations after seeking ORIG to 256: ");
my $correct;
read($orig, $correct, 8);
print_locations("Locations after reading 8 bytes from ORIG: ");
my @bytes = unpack('C*', $correct);
print "ORIG read these bytes: @bytes\n";

open(my $dupe, "<&", $orig) or die $!;
print_locations("Locations immediately after duping: ");
seek($dupe, 256, 0);
print_locations("Locations after seeking DUPE to 256: ");
seek($orig, 120, 0);
print_locations("Locations after seeking ORIG to 120: ");
my $useless;
read($orig, $useless, 20);
print_locations("Locations after reading 20 bytes from ORIG: ");
my $test;
read($dupe, $test, 8);
print_locations("Locations after reading 8 bytes from DUPE: ");
@bytes = unpack('C*', $test);
print "DUPE read these bytes: @bytes\n";

sub print_locations {
    my $label = shift;
    print "$label\n    ";
    print "ORIG " . tell($orig) . "    ";
    print "DUPE ";
    print defined $dupe ? tell($dupe) : "not valid";
    print "\n";
}

... and its output:

On RedHat9:

$ /usr/local/blead/bin/perl5.9.3 many.plx
Locations after seeking ORIG to 256:
    ORIG 256    DUPE not valid
Locations after reading 8 bytes from ORIG:
    ORIG 264    DUPE not valid
ORIG read these bytes: 0 64 0 0 0 65 0 0
Locations immediately after duping:
    ORIG 264    DUPE 264
Locations after seeking DUPE to 256:
    ORIG 264    DUPE 256
Locations after seeking ORIG to 120:
    ORIG 120    DUPE 256
Locations after reading 20 bytes from ORIG:
    ORIG 140    DUPE 256
Locations after reading 8 bytes from DUPE:
    ORIG 140    DUPE 264
DUPE read these bytes: 4 30 0 0 4 31 0 0
$

On OS X:

$ perl many.plx
Locations after seeking ORIG to 256:
    ORIG 256    DUPE not valid
Locations after reading 8 bytes from ORIG:
    ORIG 264    DUPE not valid
ORIG read these bytes: 0 64 0 0 0 65 0 0
Locations immediately after duping:
    ORIG 264    DUPE 4096
Locations after seeking DUPE to 256:
    ORIG 264    DUPE 256
Locations after seeking ORIG to 120:
    ORIG 120    DUPE 256
Locations after reading 20 bytes from ORIG:
    ORIG 140    DUPE 256
Locations after reading 8 bytes from DUPE:
    ORIG 140    DUPE 264
DUPE read these bytes: 0 64 0 0 0 65 0 0
$

What's the right way to dupe a filehandle?

--
Marvin Humphrey
Rectangular Research ― http://www.rectangular.com

Replies are listed 'Best First'.
Re: Duping filehandles
by ambrus (Abbot) on Jan 20, 2006 at 09:25 UTC

    It is impossible to dupe a filehandle that way in Unix. The only thing you can do is to open the same filename multiple times.

    Let me elaborate. There are two easy ways to copy a filehandle in Unix (there are others, in fact): you can copy it explicitly with the dup, dup2, or fcntl F_DUPFD calls, or it can be copied implicitly when you fork.

    However, when a handle is copied this way, the copies share most of their properties. In particular, there is only one file position, so if you read/write/seek on one of them, that changes both handles' file position the same way. They also share the so-called "file status flags" and the fcntl locks.
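    As a quick check of that shared-offset behaviour, here is a minimal sketch using fork (the file name is made up for the demo; sysread is used so PerlIO buffering doesn't hide what the kernel sees):

```perl
use strict;
use warnings;

open my $mk, '>', 'forkdemo.txt' or die $!;   # hypothetical scratch file
print $mk '0123456789';
close $mk;

open my $fh, '<', 'forkdemo.txt' or die $!;

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {               # child: inherits a copy of the descriptor
    sysread $fh, my $c, 3;     # advances the SHARED kernel offset to 3
    exit 0;
}
waitpid $pid, 0;               # let the child finish first

sysread $fh, my $buf, 3;       # parent continues at offset 3: '345'
print "$buf\n";
unlink 'forkdemo.txt';
```

    The parent never read the first three bytes itself, yet it picks up where the child left off, because the two processes share one file position.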

    The duplicated filehandles can be closed separately, and they also have a single flag that can be set separately: the file descriptor flag FD_CLOEXEC, which determines whether the filehandle is closed on exec.
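    In Perl you can get at that per-descriptor flag through the Fcntl module; a small sketch (the file opened is arbitrary, here the running script itself):

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);

open my $fh, '<', $0 or die $!;   # any readable file will do

# Read the descriptor flags, then set FD_CLOEXEC on this handle only.
my $flags = fcntl($fh, F_GETFD, 0) or die "F_GETFD: $!";
fcntl($fh, F_SETFD, $flags | FD_CLOEXEC) or die "F_SETFD: $!";

# After an exec() this descriptor would now be closed automatically;
# a dup()ed copy keeps its own independent FD_CLOEXEC setting.
```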

    You can read more about these properties in the glibc info documentation, or in the man pages fcntl(2), dup(2), and dup2(2).

    Dominus explains quite clearly why file positions have to be shared by duped filehandles in slide 21 of Internals of Familiar Unix Commands. While he only mentions fork, this applies to dup2 too: when a shell starts a program redirected to a file, it has to dup2 the filehandle to the proper filedescriptor number (0 for stdin, 1 for stdout).

    Now let me say a few words about Perl. In Perl, you don't manipulate file descriptors directly: you have filehandles (IO handles), so that output can be buffered.

    Basically we have this. Let's say you open a file with

    open $F, ">", "filename" or die;
    Then we have a scalar $F pointing to a perl filehandle object, which points to a filedescriptor (just a number in userspace, but referring to a structure in kernel space), and that filedescriptor structure in turn points to a file status structure.

    If you say $G = $F, you just copy the reference to the filehandle object, and the two references are completely identical. If you close $G, then $F gets closed too; but if you don't, you have to do both $F = 0; and $G = 0; to get the filedescriptor closed implicitly by the refcounter. If you binmode $F, then $G gets binmoded too.
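    A tiny sketch of that aliasing (file name invented for the demo):

```perl
use strict;
use warnings;

open my $mk, '>', 'alias.txt' or die $!;    # hypothetical scratch file
print $mk 'hello world';
close $mk;

open my $F, '<', 'alias.txt' or die $!;
my $G = $F;                  # copies only the reference
read $F, my $buf, 5;         # read through $F...
print tell($G), "\n";        # ...and $G's position moved too: 5
unlink 'alias.txt';
```

    There is only one filehandle here; $F and $G are just two names for it.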

    If you instead say open $H, ">&=", $F or die;, then the filehandle object is copied but the filedescriptor is the same. In this case, you can close $F and $H separately (either implicitly by refcount or with the close function), but only when both are closed will perl really close the underlying filedescriptor. You can binmode $F and $H separately without affecting the other one. $F and $H will share the same fileno, and the same file position (tell).
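    A minimal sketch of the "&=" form for reading (file name invented; sysread is used so each handle's own PerlIO buffer doesn't obscure the shared descriptor):

```perl
use strict;
use warnings;

open my $mk, '>', 'eqdup.txt' or die $!;    # hypothetical scratch file
print $mk '0123456789';
close $mk;

open my $F, '<', 'eqdup.txt' or die $!;
open my $H, '<&=', $F or die $!;            # note the '=': same descriptor
die "filenos differ" unless fileno($F) == fileno($H);

sysread $F, my $a, 3;                       # kernel offset now 3
sysread $H, my $b, 3;                       # reads '345', not '012'
print "$a $b\n";
unlink 'eqdup.txt';
```

    Note that buffered read/seek through two such handles is exactly what bit the original poster: each handle buffers independently while the descriptor underneath is one and the same.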

    You can open $I, ">&", $F or die;, which differs from the above only by the equals sign. This is indistinguishable from the above as long as you only use the handles from perl and you don't fork or exec. However, here a new filedescriptor gets created (with dup), and you can set the FD_CLOEXEC flag separately so that a fork-execed process inherits one of the descriptors but not the other. $F and $I will be two filehandles pointing to two different filedescriptors which point to the same file status structure. Thus, they can be binmoded separately and have different filenos, but they share the same file position.
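    The same sketch for the "&" form (file name invented): the filenos now differ, yet the file position is still shared through the common file status structure:

```perl
use strict;
use warnings;

open my $mk, '>', 'dupdemo.txt' or die $!;   # hypothetical scratch file
print $mk '0123456789';
close $mk;

open my $F, '<', 'dupdemo.txt' or die $!;
open my $I, '<&', $F or die $!;              # dup(2): brand-new descriptor
die "expected distinct filenos" if fileno($F) == fileno($I);

# The two descriptors still share one kernel file position:
sysread $F, my $a, 3;                        # offset now 3
sysread $I, my $b, 3;                        # reads '345', not '012'
print "$a $b\n";
unlink 'dupdemo.txt';
```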

    The only way to get a new file status structure for the same file is a real open, such as open $J, ">", "filename" or die;. Only then will $J and $F have different file positions in the same file, and only then are they independent enough that if you lock a part of the file with fcntl through one handle, you cannot write that part of the file, or unlock it, through the other filehandle.
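    So for the original poster's many-independent-readers problem, the fix is simply to open the same filename once per handle. A minimal sketch (file name invented for the demo):

```perl
use strict;
use warnings;

open my $mk, '>', 'indep.txt' or die $!;   # hypothetical scratch file
print $mk 'abcdefghij';
close $mk;

# A second real open() is what gives an independent file position:
open my $F, '<', 'indep.txt' or die $!;
open my $J, '<', 'indep.txt' or die $!;

read $F, my $a, 4;          # $F advances to 4...
read $J, my $b, 4;          # ...but $J still starts at 0
print "$a $b\n";            # both read 'abcd'
unlink 'indep.txt';
```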

      Fantastic reply. It answers all my questions, plus more that I hadn't known how to ask. Thanks bunches.

      --
      Marvin Humphrey
      Rectangular Research ― http://www.rectangular.com
      Thanks ambrus. I needed to be able to print stuff to the terminal after redirecting STDOUT to some other file, and your info (after a pointer from the Chatterbox) provided useful guidance.

      open( SAVE_OUT, ">&STDOUT" ) or die "Could not open SAVE_OUT: '$!'\n";
      open STDOUT, ">> $out_file" or die "Could not redirect STDOUT to '$out_file': '$!'\n";
      my $dd_version = $0 . " 1.1";
      print SAVE_OUT "$dd_version\n";