Re^4: Split tab-separated file into separate files, based on column name (open on demand)

Re^5: Split tab-separated file into separate files, based on column name (open on demand) by jcb (Parson) on Aug 28, 2020 at 01:30 UTC
> Perl is a replacement for awk and sed and can do everything they can, and much, much more.

Yes, but sometimes the older tools are better fits for the problem at hand. Some time ago I suggested to another questioner to either use `sed` in his shell script or rewrite the entire script in Perl because `sed` could do the work in less time than Perl needs for startup/shutdown overhead. Perl is more flexible and powerful, but that power does come at a cost and this question happens to fit Awk's domain almost exactly.

Awk's greatest strength and greatest limitation is the implicit outer loop. On one hand, that feature allows Awk programs to be very efficient, but on the other hand, it limits Awk to processing input text streams.

(though this isn't AwkMonks) I firmly believe that every Perl programmer should learn Awk because learning Awk will make you a better Perl programmer.
Re^6: Split tab-separated file into separate files, based on column name (open on demand) (updated) by haukex (Archbishop) on Aug 28, 2020 at 14:00 UTC
> ... do the work in less time than Perl needs for startup/shutdown overhead. Perl is more flexible and powerful, but that power does come at a cost ...

If that really was the point you were trying to make here, then it probably would have been better if you'd benchmarked and shown a solution that's actually faster than Perl. On a longer input file (OP never specified file length, but the fact that the number of columns grew from 3 to 20 is a hint), this pure Perl solution I whipped up is twice as fast as the `awk` code you showed:

    use warnings;
    use strict;

    # The first input line holds the column names, which double as file names.
    my @cols = split /\t/, <>;
    chomp($cols[-1]);

    # Open one output file per column up front.
    my @fh = map { open my $fh, '>', $_ or die $!; $fh } @cols;

    # Write each field of every remaining line to its column's file.
    while ( my $line = <> ) {
        chomp($line);
        my @row = split /\t/, $line;
        print {$fh[$_]} $row[$_], "\n" for 0..$#row;
    }

> I firmly believe that every Perl programmer should learn Awk because learning Awk will make you a better Perl programmer.

Sure, in general, the more programming languages a programmer is exposed to, the better they (usually) become. And yet, there are other situations:

> Some time ago I suggested to another questioner to either use `sed` in his shell script ...

And I once showed someone who was writing an installer shell script how to use a oneliner to do a search and replace to change a configuration variable. And what happened? As the installation script grew, the oneliner just got called over and over again for different variables. While you, I, and the OP may know there are better solutions (as you said yourself, "rewrite the entire script in Perl"), these posts are public and may be read by people who may not know better, and in particular in comparison to `awk`, I disagree with an unqualified "Sometimes Perl is not the best tool for the job."

Update - I also wanted to mention: In environments where there are several programmers on a team, most of whom are only focused on one language, having a product consist of code written in several different languages is more likely to cause maintenance problems. These are the reasons I said "throwing yet another new language into the mix" isn't necessarily a good thing. (Also, just in case there's any confusion with non-native speakers, the definition of "unqualified" I was using is "not modified or restricted by reservations", as in an "unqualified statement", and not "not having requisite qualifications", as in an "unqualified person".)
Re^7: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 28, 2020 at 15:46 UTC
> "Sometimes Perl is not the best tool for the job."

OK, this is a "Jein" (yes-and-no) situation.

Perl is certainly often not the best tool. But when it comes to sed and awk, that is hard to believe, because Larry meticulously copied all their features.

I bet I could easily translate the given awk script one-to-one into Perl by encapsulating the open on demand in a short sub.

Just look at all the details concerning awk in `perlvar`, `perlrun` and `perltrap`.

Now, the startup argument for short data, where overhead counts: startup isn't the issue it was 25 years ago. To make it matter, we would need to start a script over and over again. The realistic approach in that case is to write a persistent service, which doesn't even need to start up.

We are not talking about heavy apps like perltidy, which may need a second to initialize.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
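As an illustration of the "open on demand in a short sub" idea, here is a minimal sketch. It is not code from the thread: `make_opener` and `split_columns` are hypothetical names, and the `.txt` suffix and the output-directory argument are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a closure that hands out
# a write handle per file name, opening each file only on first use.
sub make_opener {
    my %fh_for;    # cache of already-opened handles, keyed by file name
    return sub {
        my ($name) = @_;
        return $fh_for{$name} //= do {
            open my $fh, '>', $name or die "Cannot open $name: $!";
            $fh;
        };
    };
}

# Split tab-separated lines read from $in into one file per column in $dir.
# The first line supplies the column names (assumed to be safe file names).
sub split_columns {
    my ( $in, $dir ) = @_;
    my $out_fh = make_opener();
    my @cols;
    while ( my $line = <$in> ) {
        chomp $line;
        my @row = split /\t/, $line, -1;
        if ( !@cols ) { @cols = @row; next }    # header line
        for my $i ( 0 .. $#cols ) {
            next unless defined $row[$i];       # tolerate short rows
            print { $out_fh->("$dir/$cols[$i].txt") } "$row[$i]\n";
        }
    }
    return @cols;    # handles are closed when the opener goes out of scope
}
```

Calling `split_columns(\*STDIN, '.')` would process standard input into the current directory. A column whose rows are all missing never gets a file, which is the practical difference from opening every handle up front.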
Re^8: Split tab-separated file into separate files, based on column name (open on demand) by haukex (Archbishop) on Aug 28, 2020 at 17:00 UTC
Re^9: Split tab-separated file into separate files, based on column name (open on demand) by Leon Timmermans (Acolyte) on Aug 30, 2020 at 00:18 UTC
Re^9: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 28, 2020 at 20:03 UTC
Re^9: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 28, 2020 at 21:04 UTC
Re^8: Split tab-separated file into separate files, based on column name (open on demand) by jcb (Parson) on Aug 28, 2020 at 23:38 UTC
Re^6: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 28, 2020 at 10:56 UTC
> Awk's greatest strength and greatest limitation is the implicit outer loop

Are you aware about Perl's command switches? If not, just have a look at `perlrun` and search for "awk".

> Perl needs for startup/shutdown overhead.

Probably, but do I want to install awk and sed on Windows?

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
Re^7: Split tab-separated file into separate files, based on column name (open on demand) by jcb (Parson) on Aug 28, 2020 at 23:23 UTC
> Are you aware about Perl's command switches?

Yes, but the optional implicit outer loops in Perl are different from the implicit outer loop in Awk: Awk's syntax is built around its implicit outer loop, using PATTERN-RULE pairs, while Perl's implicit outer loops are purely for convenience. I have always just written an outer loop explicitly even in one-liners.

> do I want to install awk and sed on Windows?

Unless you already have them, perhaps from Cygwin, possibly not. That is a fair point: on Windows, the Perl startup/shutdown overhead is (probably still) dwarfed by the system startup/shutdown overhead for each process. I kicked Windows out of my personal LAN years ago, though, so I usually do not think to consider its inadequacies. :-)
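To make that distinction concrete, the Awk model (every record is tested against each pattern, and matching patterns trigger their rule) can be imitated in Perl with an explicit outer loop. This is only a rough sketch under stated assumptions: `run_rules` is a hypothetical name, input is assumed to be tab-separated, and fields are passed 0-based rather than as Awk's `$1..$NF`.

```perl
use strict;
use warnings;

# Hypothetical illustration: awk-style pattern/rule pairs expressed as
# [ test, action ] code-ref pairs, applied to every record by a loop that
# Awk would supply implicitly.
sub run_rules {
    my ( $in, @rules ) = @_;
    while ( my $line = <$in> ) {
        chomp $line;
        my @F = split /\t/, $line, -1;    # fields of the current record
        for my $rule (@rules) {
            my ( $pattern, $action ) = @$rule;
            $action->(@F) if $pattern->(@F);
        }
    }
    return;
}
```

A call such as `run_rules(\*STDIN, [ sub { $_[2] eq 'CEO' }, sub { print "$_[1]\n" } ])` would print the second field of every record whose third field is `CEO`. In Awk the loop and the field splitting are part of the language itself; here they have to be written out.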
Re^8: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 29, 2020 at 13:56 UTC
Re^9: Split tab-separated file into separate files, based on column name (open on demand) by jcb (Parson) on Aug 30, 2020 at 02:21 UTC
Re^5: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 27, 2020 at 21:52 UTC
I think his point was that Awk has an open_on_demand.

And if you know Perl, well, it's not very difficult to decipher this Awk script ... ( ... oh, that's where Larry got these "ideas" from ;-)

My concern is that it's neither easier nor shorter than Perl.

For comparison, here is a script version of my one-liner, already without taking advantage of command-line switches.

    $\ = "\n";    # append a newline to every print

    while (<DATA>) {
        @F = split;    # split $_ on whitespace
        unless (@FH) {
            # header line: open one output file per column name
            open $FH[@FH], ">", "$_.txt" for @F;
        }
        else {
            # data line: write each field to its column's file
            print {$_} shift @F for @FH;
        }
    }

    __DATA__
    id name position
    1 Nick boss
    2 George CEO
    3 Christina CTO

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
Re^6: Split tab-separated file into separate files, based on column name (open on demand) by haukex (Archbishop) on Aug 28, 2020 at 13:56 UTC
> I think his point was that Awk has an open_on_demand.

That's not what I quoted and was replying to, though. It's certainly an interesting feature of `awk`'s to note, but going from there to saying that awk is better than Perl for the job is too much of a stretch, IMHO. And, of course, I was disappointed to see the "Best Nodes of The Day" list that node first.