
> Perl is certainly often not the best tool. But if it comes to sed and awk it's hard to believe

Yes, this was my point.

> I bet, I could easily translate this given awk script in a one2one fashion to Perl, by encapsulating the open on demand into a short sub.

You don't need to; Larry did that already :-) a2p was part of the Perl core up to 5.20, and now it lives on CPAN.

$ a2p 11121118.awk
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if $running_under_some_shell;
            # this emulates #! processing on NIH machines.
            # (remove #! line above if indigestible)

eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
            # process any FOO=bar switches

$FS = ' ';      # set field separator
$, = ' ';       # set output field separator
$\ = "\n";      # set output record separator

$FS = "\t";

line: while (<>) {
    chomp;  # strip record separator
    @Fld = split($FS, $_, -1);
    if (($.-$FNRbase) == 1) {
        @Fields = split($FS, '', -1);   # clear fields array
        for ($i = 1; $i <= ($#Fld+1); $i++) {
            $Fields[($i)-1] = $Fld[$i];
        }
        next line;
    }
    for ($i = 1; $i <= ($#Fld+1); $i++) {
        &Pick('>', $Fields[($i)-1]) && (print $fh $Fld[$i]);
    }
}
continue {
    $FNRbase = $. if eof;
}

sub Pick {
    local($mode,$name,$pipe) = @_;
    $fh = $name;
    open($name,$mode.$name.$pipe) unless $opened{$name}++;
}

Unfortunately, there's apparently a bug in the translator, and the above script needs a s/\$Fld\[\$i\K\]/-1]/g to fix it.
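To see what that substitution does, here is a small sketch that applies it to the two generated lines that read from @Fld. Thanks to \K, only the closing bracket is replaced, so $Fld[$i] becomes $Fld[$i-1] while the already-adjusted $Fields[($i)-1] is left alone. The @buggy/@fixed names are just for illustration.

```perl
use strict;
use warnings;

# The two a2p-generated statements that index @Fld without adjusting for 0-based arrays.
my @buggy = (
    '$Fields[($i)-1] = $Fld[$i];',
    q{&Pick('>', $Fields[($i)-1]) && (print $fh $Fld[$i]);},
);

# Apply the suggested fix to copies: \K keeps '$Fld[$i' as-is,
# and only the following ']' is replaced by '-1]'.
my @fixed = @buggy;
s/\$Fld\[\$i\K\]/-1]/g for @fixed;   # 'for' aliases $_ to each element of @fixed

print "$_\n" for @fixed;
```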

Re^9: Split tab-separated file into separate files, based on column name (open on demand)
by Leon Timmermans (Acolyte) on Aug 30, 2020 at 00:18 UTC
    > Unfortunately, there's apparently a bug in the translator, and the above script needs a s/\$Fld\[\$i\K\]/-1]/g to fix it.
    Patches are welcome!
      > Patches are welcome!

      If I had the time at the moment, I would look into it more deeply :-/ Though I think my approach to "fixing" the issue would probably be to go back to the older $[ = 1; approach, implemented with Array::Base, instead of adjusting all the array indices...

Re^9: Split tab-separated file into separate files, based on column name (open on demand)
by LanX (Saint) on Aug 28, 2020 at 20:03 UTC
    I know, and I didn't mention a2p on purpose ;p

    As you can see, it produces Perl 4 code.

    I'd implement Pick() differently, and this whole script is twice as long as it needs to be.
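For comparison, a minimal modern-Perl sketch of the same job, with open on demand handled by a hash of lexical filehandles instead of Pick() and symbolic filehandles. It assumes, as the a2p output suggests, that the first line holds the column names and each column goes to a file named after its header. The split_by_column sub and its arguments are hypothetical; this is not Rolf's own implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: read tab-separated lines from $in and write each
# column's values to a file in $dir named after that column's header.
# Output files are opened on demand and cached in %out (replacing Pick()).
sub split_by_column {
    my ($in, $dir) = @_;
    my $header = <$in>;
    return unless defined $header;          # empty input: nothing to do
    chomp $header;
    my @names = split /\t/, $header, -1;    # column names, index 0 = first column

    my %out;                                # column name => open output filehandle
    while (my $line = <$in>) {
        chomp $line;
        my @fld = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        for my $i (0 .. $#fld) {
            my $name = $names[$i];
            next unless defined $name;      # extra field without a header: skip it
            $out{$name} //= do {            # open on demand, once per column
                open my $fh, '>', "$dir/$name" or die "Can't open '$dir/$name': $!";
                $fh;
            };
            print { $out{$name} } $fld[$i], "\n";
        }
    }
    close $_ or die "Can't close: $!" for values %out;
    return sort keys %out;                  # names of the files written
}
```

For example, split_by_column(\*STDIN, '.') would write one file per column into the current directory. Keeping the handles in a lexical hash avoids symbolic filehandles and makes the "opened yet?" check the hash lookup itself.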

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Re^9: Split tab-separated file into separate files, based on column name (open on demand)
by LanX (Saint) on Aug 28, 2020 at 21:04 UTC
      Did you try the code?

      That's how I discovered the issue... ;-P

      I think the issue is that awk's arrays are not zero-based.

      I checked, and older versions of a2p did indeed set $[ = 1, although their output apparently has other strange bugs: at least on my system, it is oddly chopped up, e.g. "next linine;" or "}tinue {". After 5.10, a2p dropped the $[ assignment and instead adjusts array indices, but only in some places and not in others. For example, for ($i = 1; $i <= $#Fld; $i++) { $Fields[$i] = $Fld[$i]; } became for ($i = 1; $i <= ($#Fld+1); $i++) { $Fields[($i)-1] = $Fld[$i]; }, where $Fields got the -1 adjustment but $Fld did not.

      Awk does not have numerically-indexed arrays at all. There is a convention for using digit strings to emulate numeric indexing, and, like Perl, Awk will convert numbers to digit strings on demand, but Awk arrays are really what Perl calls hashes.
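To put this in Perl terms: an awk "array" filled by split() behaves like a Perl hash whose keys are the digit strings "1", "2", and so on, not like a 0-based Perl array. The hypothetical awk_split below is a minimal sketch of that behaviour, assuming a literal, non-space separator; it ignores awk's special handling of a single-space separator and regex separators.

```perl
use strict;
use warnings;

# Hypothetical helper emulating awk's split(str, arr, sep) for a literal,
# non-space separator: the result is a hash keyed "1".."n", like an awk array.
sub awk_split {
    my ($str, $sep) = @_;
    my @parts = split /\Q$sep\E/, $str, -1;   # Perl's split gives a 0-based list
    my %arr;
    $arr{ $_ + 1 } = $parts[$_] for 0 .. $#parts;
    return \%arr;
}

my $arr = awk_split("id\tname\tvalue", "\t");
print join(', ', map { "$_=$arr->{$_}" } sort { $a <=> $b } keys %$arr), "\n";
```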

        Yes, Awk, like JS, uses the same datatype for arrays and hashes, but ...

        > Awk does not have numerically-indexed arrays at all.

        Awk has implicitly numbered arrays, like ARGV.

        And ARGV[1] corresponds to $ARGV[0] in Perl.

        Similarly, awk's split() indexes the first element of the resulting "array" with 1.

        And the first field from auto-split is $1, not $0, while perl -a puts it into $F[0].
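These correspondences amount to a fixed off-by-one mapping between awk's 1-based references and Perl's 0-based arrays. A minimal sketch with a hypothetical awk_field helper, assuming tab-separated records; it also mirrors awk's $0 being the whole record and awk returning an empty string for fields beyond the last one.

```perl
use strict;
use warnings;

# Hypothetical helper: return what awk would call $n for a tab-separated record.
# awk's $0 is the whole record; awk's $n (n >= 1) is Perl's $F[n-1].
sub awk_field {
    my ($record, $n) = @_;
    return $record if $n == 0;
    my @F = split /\t/, $record, -1;   # 0-based, as with perl -a's @F
    return $F[$n - 1] // '';           # awk yields "" for fields past the last one
}

# The same shift applies to arguments: awk's ARGV[1] is Perl's $ARGV[0].
```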
