Melly has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I'm trying to understand the different behaviour of file-handles and similar (e.g. '<DATA>').

I don't think I'd ever really thought about them - they just worked - but I ran into a problem with CSV_XS and getline. Basically, why does a FH variable (e.g. $IN) work, but <IN> and <DATA> don't?

The following code is non-functional without commenting out stuff, but should illustrate what I mean.

    use Text::CSV_XS;
    my $csv = Text::CSV_XS->new();
    open(my $IN, '<', 'test.csv'); # ok
    open(IN, '<', 'test.csv'); # nope
    my $row = $csv->getline(<DATA>); # nope - Usage: Text::CSV_XS::getline(self, io) at test_02.pl line 11, <IN> line 2.
    my $row = $csv->getline(<IN>); # nope - Usage: Text::CSV_XS::getline(self, io) at test_02.pl line 11, <IN> line 2.
    my $row = $csv->getline($IN); # ok
    print ${$row}[0];
    __DATA__
    a,b,c
    d,e,f

So, what is the difference between "open(IN..." and "open($IN..."? (and is there a way to alias <DATA> to, e.g., $DATA?)

UPDATE

Ah - \*DATA or \*IN - err, what does '\*' imply?

map{$a=1-$_/10;map{$d=$a;$e=$b=$_/20-2;map{($d,$e)=(2*$d*$e+$a,$e**2 -$d**2+$b);$c=$d**2+$e**2>4?$d=8:_}1..50;print$c}0..59;print$/}0..20
Tom Melly, pm (at) cursingmaggot (stop) co (stop) uk

Replies are listed 'Best First'.
Re: Filehandles and CSV_XS (updated)
by haukex (Archbishop) on Sep 01, 2023 at 10:46 UTC

      Ah! That (sort-of) makes sense - in all honesty, my brain gives up on at least three things - quantum physics, Trump-supporters, and type-globs...


        in all honesty, my brain gives up on at least three things - quantum physics, Trump-supporters, and type-globs...

        Thanks for making me laugh. :) I'm relieved you didn't mention your brain giving up on use strict, use warnings and lexical variables. You really need to embrace all three. To give a simple example why, notice that this code:

        use strict;
        use warnings;

        sub fred {
            my $fname = 'f.tmp';
            open( FH, '<', $fname ) or die "error: open '$fname': $!";
            print "file '$fname' opened ok\n";
            # ... process file here
            die "oops"; # if something went wrong
            close(FH);
        }

        eval { fred() };
        if ($@) { print "died: $@\n" }
        # oops, handle FH is still open if an exception was thrown.
        my $line = <FH>;
        print "oops, FH is still open:$line\n";

        is not exception-safe because the ugly global file handle FH is not closed when die is called.

        A simple remedy, as noted at Exceptions and Error Handling References, is to replace the ugly global FH with a lexical file handle my $fh, which is auto-closed at end of scope (RAII):

        use strict;
        use warnings;

        sub fred {
            my $fname = 'f.tmp';
            open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
            print "file '$fname' opened ok\n";
            # ... process file here
            die "oops"; # if something went wrong
            close($fh);
        }

        eval { fred() };
        if ($@) { print "died: $@\n" }
        print "ok, \$fh is auto-closed when sub fred exits (normally or via die)\n";

        I know that feeling :-) and yeah I also had that experience when first encountering typeglobs. But, here's the simple explanation I arrived at for my own understanding:

        Perl allows you to create different types of global things using the same name, like '$foo', '@foo', '%foo', 'sub foo' and so on. To store them internally, one logical design would have been to have a hash table of "ScalarGlobals", another hash table of "ArrayGlobals", another hash table of "HashGlobals", and so on. In perl-speak, it might look like $globals{$package_name}[Scalars]{$scalar_name}. But that makes a lot of hash tables per package name. Larry chose instead to create one hash table per package, and then store a struct which has a "slot" for each type of thing. In perl-speak, it might look like $globals{$package_name}{$thing_name}[ScalarSlot]. The end result is lower memory use, because there are fewer hash tables.

        In the original language, NAME referred to the file handle, $NAME the scalar, @NAME the array, %NAME the hash, and &NAME the subroutine. But, for reasons I am uninformed of, it was decided that people needed access to these slots in the perl language (probably so that Exporter can be written in Perl instead of written in C) and so *NAME gives you access to this C struct that has a slot for each type of thing. The notation *NAME{IO} is how you access the file handle slot of that struct.
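As a rough illustration of those slots (a minimal sketch, not taken from the thread; the name foo and its values are made up for the example), each *NAME{SLOT} lookup returns a reference to whatever lives in that slot:

```perl
use strict;
use warnings;

# Hypothetical package variables and a sub that all share the name "foo".
our $foo = 'scalar value';
our @foo = (1, 2, 3);
our %foo = (key => 'value');
sub foo { return 'sub result' }

# Each *foo{SLOT} lookup returns a reference to the thing in that slot.
my $scalar_ref = *foo{SCALAR};
my $array_ref  = *foo{ARRAY};
my $hash_ref   = *foo{HASH};
my $code_ref   = *foo{CODE};

print "scalar: $$scalar_ref\n";
print "array:  @$array_ref\n";
print "hash:   $hash_ref->{key}\n";
print "code:   ", $code_ref->(), "\n";

# The IO slot of *STDOUT holds the file handle itself.
my $io = *STDOUT{IO};
print {$io} "printed via the IO slot of *STDOUT\n";
```

All four names resolve through the same symbol table entry; only the slot differs.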

        Now that bare-word file handles are discouraged, that implementation detail of the C-side typeglob structs (and the awkward Perl syntax for it) is more in-our-face than ever, especially since there is no native '$' variable for STDIN, STDOUT, STDERR, or DATA.
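For the original question, a small sketch of the difference between <IN> and \*IN may help. It assumes an in-memory string in place of test.csv and a made-up helper, first_line, standing in for anything that (like getline) wants a handle rather than a line of text. The same \* approach should apply to DATA, STDIN and friends:

```perl
use strict;
use warnings;

# Stand-in for test.csv: an in-memory file opened on the bareword handle IN.
my $text = "a,b,c\nd,e,f\n";
open(IN, '<', \$text) or die "error: open in-memory file: $!";

# Hypothetical helper that expects a handle, not a line of text.
sub first_line {
    my ($fh) = @_;
    my $line = readline($fh);
    chomp $line;
    return $line;
}

# A reference to the glob fits in an ordinary scalar...
my $IN    = \*IN;
my $first = first_line($IN);

# ...whereas <IN> reads from the handle and returns text.
my $second = <IN>;
chomp $second;

print "via \$IN: $first\n";
print "via <IN>: $second\n";
print "ref(\$IN) is ", ref($IN), "\n";

# The same trick gives a built-in handle a '$' name.
my $OUT = \*STDOUT;
print {$OUT} "printed through \$OUT\n";

close(IN);
```

So $csv->getline(\*DATA) passes the handle, while $csv->getline(<DATA>) passes whatever was read from it.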

Re: Filehandles and CSV_XS
by ikegami (Patriarch) on Sep 01, 2023 at 13:36 UTC

    Ah - \*DATA or \*IN - err, what does '\*' imply?

    *NAME refers to the symbol table entry for "NAME". It contains $NAME (*NAME{SCALAR}), @NAME (*NAME{ARRAY}), %NAME (*NAME{HASH}), &NAME (*NAME{CODE}) and a number of other things, including a file handle (*NAME{IO}) and a directory handle.

    So when you write

    print FH ...

    the bareword FH is effectively shorthand for *FH, which in turn is effectively shorthand for *FH{IO}.

    Things commonly accepted as file handles:

    • IO object (*FH{IO}).
    • reference to glob (\*FH).
    • glob (*FH).
    • name of the glob ("FH"). This is less supported than the others.

    open( my $fh, ... ) assigns a reference to a glob to $fh (like \*FH).
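The list above can be tried out with a short sketch. It assumes an in-memory buffer as the output target so the result is easy to inspect, and it leaves out the name-of-the-glob form since that is the least supported one:

```perl
use strict;
use warnings;

# Write into an in-memory buffer via the bareword handle FH.
my $buffer = '';
open(FH, '>', \$buffer) or die "error: open in-memory file: $!";

my $io_object = *FH{IO};   # IO object
my $glob_ref  = \*FH;      # reference to glob
my $glob      = *FH;       # glob

print {$io_object} "via IO object\n";
print {$glob_ref}  "via glob reference\n";
print {$glob}      "via glob\n";
print FH           "via bareword\n";
close(FH);

# A lexical handle from open() is itself a reference to a glob.
open(my $fh, '<', \$buffer) or die "error: open in-memory file: $!";
my $kind = ref($fh);
close($fh);

print $buffer;
print "ref(\$fh) is $kind\n";
```

All four print statements write to the same underlying handle, and ref($fh) reports GLOB for the lexical one.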