Re^2: Split tab-separated file into separate files, based on column name (open on demand)

Replies are listed 'Best First'.
Re^3: Split tab-separated file into separate files, based on column name (open on demand) by jcb (Parson) on Aug 27, 2020 at 03:57 UTC
> Point is Perl has no means to `print_and_open_if_necessary()`

Sometimes Perl is not the best tool for the job. Awk does have that feature, and here is an Awk program that does what our questioner asks:

```awk
#!/usr/bin/awk -f

BEGIN { FS = "\t" }

FNR == 1 {
    split("", Fields)   # clear fields array
    for (i = 1; i <= NF; i++)
        Fields[i] = $i
    next
}

{
    for (i = 1; i <= NF; i++)
        print $i > Fields[i]
}
```

Save it in a file and mark it executable; tested with GNU Awk. Feed it input on stdin or list the files you want it to read on the command line.

If you want to add prefixes or suffixes to the output file names, add them to the `print` statement, like so: `print $i > ("out."Fields[i]".txt")`; the parentheses ensure that the invisible concatenation operator will be parsed correctly.
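For comparison, the open-on-demand behaviour of Awk's `print > file` can be approximated in Perl with a small cache of filehandles. This is only a minimal sketch under stated assumptions: `print_and_open_if_necessary` borrows its name from the quote above and is a hypothetical helper, not a Perl built-in; `close_all_outputs` and `split_columns` are made-up names as well; the sample rows are the ones used elsewhere in this thread. Like the Awk program, it does not write the header line to the output files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my %handle_for;    # cache: output path => filehandle that is already open

# Hypothetical helper (not a Perl built-in): open $path for writing the first
# time it is seen, then reuse the cached handle for every later call.
sub print_and_open_if_necessary {
    my ($path, @text) = @_;
    my $fh = $handle_for{$path};
    if (!$fh) {
        open $fh, '>', $path or die "Cannot open '$path': $!";
        $handle_for{$path} = $fh;
    }
    print {$fh} @text;
    return;
}

# Close every cached handle so that buffered output reaches the disk.
sub close_all_outputs {
    for my $path (sort keys %handle_for) {
        close $handle_for{$path} or die "Cannot close '$path': $!";
    }
    %handle_for = ();
    return;
}

# Write each column of a tab-separated stream to a file in $dir that is
# named after the column's header.
sub split_columns {
    my ($in, $dir) = @_;
    my @names;
    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        if (!@names) {                         # first line: remember the headers
            @names = @fields;
            next;
        }
        for my $i (0 .. $#fields) {
            next if $i > $#names;              # ignore fields without a header
            print_and_open_if_necessary("$dir/$names[$i]", "$fields[$i]\n");
        }
    }
    return;
}

# Demo: in-memory input, output files in a temporary directory.
my $dir   = tempdir(CLEANUP => 1);
my $input = "id\tname\tposition\n1\tNick\tboss\n2\tGeorge\tCEO\n3\tChristina\tCTO\n";

open my $in, '<', \$input or die "Cannot read in-memory input: $!";
split_columns($in, $dir);
close $in or die "Cannot close in-memory input: $!";
close_all_outputs();

for my $name (qw(id name position)) {
    open my $fh, '<', "$dir/$name" or die "Cannot read '$dir/$name': $!";
    chomp(my @lines = <$fh>);
    close $fh or die "Cannot close '$dir/$name': $!";
    print "$name: @lines\n";
}
```

The cache is the whole trick: the first `print` to a given name opens the file, and every later `print` reuses the handle, which is roughly what Awk does behind the scenes for `print > Fields[i]`.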
Re^4: Split tab-separated file into separate files, based on column name (open on demand) by haukex (Archbishop) on Aug 27, 2020 at 19:19 UTC
Since this is currently the top node of the past 24 hours, I'll comment.

> Sometimes Perl is not the best tool for the job. Awk ...

I strongly disagree. Perl is a replacement for awk and sed and can do everything they can, and much, much more. tobyink pointed out IO::All - and while this module may not be in the core, note that CPAN is one of Perl's greatest strengths.

If you're familiar enough with awk to whip up this script that's fine, and it's certainly interesting to see how it's done in other languages (though this isn't AwkMonks), but consider that the OP may already not be very familiar with Perl, and throwing yet another new language into the mix is unlikely to be the most efficient approach in the long run.
Re^5: Split tab-separated file into separate files, based on column name (open on demand) by jcb (Parson) on Aug 28, 2020 at 01:30 UTC
> Perl is a replacement for awk and sed and can do everything they can, and much, much more.

Yes, but sometimes the older tools are better fits for the problem at hand. Some time ago I suggested that another questioner either use `sed` in his shell script or rewrite the entire script in Perl, because `sed` could do the work in less time than Perl needs for startup/shutdown overhead. Perl is more flexible and powerful, but that power does come at a cost, and this question happens to fit Awk's domain almost exactly.

Awk's greatest strength and greatest limitation is the implicit outer loop. On one hand, that feature allows Awk programs to be very efficient, but on the other hand, it limits Awk to processing input text streams.

> (though this isn't AwkMonks)

I firmly believe that every Perl programmer should learn Awk because learning Awk will make you a better Perl programmer.
Re^6: Split tab-separated file into separate files, based on column name (open on demand) (updated) by haukex (Archbishop) on Aug 28, 2020 at 14:00 UTC
Re^7: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 28, 2020 at 15:46 UTC
Some notes below your chosen depth have not been shown here
Re^6: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 28, 2020 at 10:56 UTC
Re^7: Split tab-separated file into separate files, based on column name (open on demand) by jcb (Parson) on Aug 28, 2020 at 23:23 UTC
Re^5: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 27, 2020 at 21:52 UTC
I think his point was that Awk has an open_on_demand.

And if you know Perl, well, it's not very difficult to decipher this Awk script ... ( ... oh, that's where Larry got these "ideas" from ;-)

My concern is that it's neither easier nor shorter than Perl.

For comparison, here is a script version of my one-liner, already without taking advantage of command-line switches.

```perl
$\="\n";

while (<DATA>) {
    @F = split;
    unless (@FH) {
        open $FH[@FH], ">", "$_.txt" for @F;
    } else {
        print $_ shift @F for @FH;
    }
}

__DATA__
id name position
1 Nick boss
2 George CEO
3 Christina CTO
```

Cheers Rolf
(addicted to the Perl Programming Language :) Wikisyntax for the Monastery
Re^6: Split tab-separated file into separate files, based on column name (open on demand) by haukex (Archbishop) on Aug 28, 2020 at 13:56 UTC
Re^4: Split tab-separated file into separate files, based on column name (open on demand) by Anonymous Monk on Aug 27, 2020 at 10:01 UTC
Hi jcb, excellent post, thank you! I did write a Perl script after all, but I suspect that your way is much faster! Thanks to all who offered their advice, much appreciated :)
Re^4: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 27, 2020 at 10:16 UTC
> Sometimes Perl is not the best tool for the job

Well, the OP asked for a one-liner, but you have now provided a script.

I have trouble seeing why a Perl script would be worse than an Awk script. (?)

Cheers Rolf
(addicted to the Perl Programming Language :) Wikisyntax for the Monastery
Re^3: Split tab-separated file into separate files, based on column name (open on demand) by Eily (Monsignor) on Aug 26, 2020 at 15:26 UTC
> This might be considered dirty in a real Perl script but should be acceptable in a one-liner.

100% agree with that sentence (which says a lot, since the sentence is "this might be").

You could use operator overloading to replicate that feature: `"Value" > file("path");` or `"Value" >> file("path")`, where `file` returns an object that overloads `>` and `>>`.

Or you could do something closer to C++:

```perl
fstream("path") << 120 << " in hexadecimal is " << ctrl::hex << 120;
fstream("logs", "a") << ctrl::autoline << "I'm adding this line to the logs" << "and also this line";
```
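The `"Value" > file("path")` idea can be sketched with Perl's `overload` pragma. Everything named here is an assumption made for illustration: `StreamFile` is a hypothetical class, `file()` is a hypothetical constructor function matching the call shown above, and `_emit` is a made-up private method. The sketch reopens the file on every write, which keeps it short but is not efficient.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

{
    package StreamFile;    # hypothetical class name, for illustration only

    # Overload handlers receive (object, other operand, swapped flag).
    use overload
        '>'  => sub { _emit('>',  @_) },    # TEXT >  file(PATH): truncate, then write
        '>>' => sub { _emit('>>', @_) };    # TEXT >> file(PATH): append

    sub new {
        my ($class, $path) = @_;
        return bless { path => $path }, $class;
    }

    sub _emit {
        my ($mode, $self, $text, $swapped) = @_;
        # $swapped is true when the object is the right-hand operand,
        # which is the only spelling this sketch accepts.
        die "Write it as: TEXT $mode file(PATH)\n" unless $swapped;
        open my $fh, $mode, $self->{path}
            or die "Cannot open '$self->{path}': $!";
        print {$fh} $text;
        close $fh or die "Cannot close '$self->{path}': $!";
        return $self;
    }
}

# Hypothetical constructor function, as used in the post above.
sub file { return StreamFile->new(@_) }

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.txt";

{
    # These expressions are evaluated only for their side effect, so the
    # compile-time "Useless use of ... in void context" warning is disabled
    # for this block.
    no warnings 'void';
    "first line\n"  >  file($path);
    "second line\n" >> file($path);
}

open my $fh, '<', $path or die "Cannot read '$path': $!";
print while <$fh>;
close $fh or die "Cannot close '$path': $!";
```

Note that the calling code has to switch off the `void` warning category itself, and that `>` and `>>` keep their usual precedence, so longer expressions may need parentheses.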
Re^4: Split tab-separated file into separate files, based on column name (open on demand) by tobyink (Canon) on Aug 27, 2020 at 16:40 UTC
IO::All does this.

toby döt ink
Re^5: Split tab-separated file into separate files, based on column name (open on demand) by jo37 (Curate) on Aug 28, 2020 at 16:37 UTC
Great module! So with IO::All this could yield:

```
$ perl -MIO::All -F'/\t/' -lnae '@files = @F, next if 1 .. 1; @f = @files; $_ >> io(shift @f) for @F' <<EOF
id	name	position
1	Nick	boss
2	George	CEO
3	Christina	CTO
EOF
$ paste id name position
id	name	position
1	Nick	boss
2	George	CEO
3	Christina	CTO
```

EDIT Removed `do` statement.

Greetings,
-jo

`$gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$`
Re^4: Split tab-separated file into separate files, based on column name (open on demand) by LanX (Saint) on Aug 26, 2020 at 15:38 UTC
> You could use operator overloading

I don't think it's a good idea to overload two very different operators like `>` "greater-than" and `>>` "shift".

That's begging for inconsistency problems (syntax, precedence, you name it ...).

Cheers Rolf
(addicted to the Perl Programming Language :) Wikisyntax for the Monastery
Re^4: Split tab-separated file into separate files, based on column name (tangent = open on demand => stream-like) by pryrt (Abbot) on Aug 26, 2020 at 17:32 UTC
So it got me curious, and I did a quick-and-dirty test implementation of `scalar > file()` and `fstream() << scalar`. But I get the "useless use of ... in void context" warnings.

So my tangential question: Is there a way to "export" the `no warnings 'void'` from inside the streaming package, rather than requiring it in ::main? It would be best if it could just turn off the warnings for the streaming objects, but leave the warnings on for non-overloaded uses of comparison and bitshift. I tried putting the no-warnings inside the overloaded functions, to try to keep the scope limited, but that's not the right place to prevent the warning.

(Yes, I understand this isn't necessarily good practice, or "nice" to the external user. This is just for my own curiosity, and not something I'd put in practical code.)
Re^5: Split tab-separated file into separate files, based on column name (tangent = open on demand => stream-like) by LanX (Saint) on Aug 26, 2020 at 18:06 UTC
That's another good example of why overloading is doomed to fail if the operator isn't semantically compatible.

Regarding your question: either you can try to manipulate the `__WARN__` handler in `%SIG`, or you can try exporting warnings inside `import()`, as demonstrated in Modern::Perl.

(And I agree about the production code part. ;)

Cheers Rolf
(addicted to the Perl Programming Language :) Wikisyntax for the Monastery

Updates: Rephrased and linked
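The second suggestion can be sketched as follows. This is a minimal illustration, not the code of Modern::Perl or of the hidden test implementation: `QuietStream` is a hypothetical module name, `warnings_from` is a made-up helper used only to count warnings, and the `%INC` line exists solely so that the example fits in a single file. The sketch relies on `import()` running while the caller's `use` line is being compiled, so `warnings->unimport('void')` acts on the caller's lexical scope.

```perl
#!/usr/bin/perl
use strict;
use warnings;

BEGIN {
    package QuietStream;    # hypothetical module name, for illustration only

    # Mark the module as loaded so that "use QuietStream" works in one file.
    $INC{'QuietStream.pm'} = __FILE__;

    sub import {
        # Runs during compilation of the caller's "use" statement, so this
        # disables the 'void' category in the caller's lexical scope.
        warnings->unimport('void');
        return;
    }
}

# Made-up helper: compile and run a snippet, returning the warnings it raised.
sub warnings_from {
    my ($code) = @_;
    my @seen;
    local $SIG{__WARN__} = sub { push @seen, @_ };
    eval "$code; 1" or die $@;
    return @seen;
}

# A comparison whose result is thrown away, as in the streaming experiments.
my $snippet = 'my ($x, $y) = (1, 2); $x > $y';

my @without = warnings_from($snippet);
my @with    = warnings_from("use QuietStream; $snippet");

printf "without the module: %d warning(s)\n", scalar @without;
printf "with the module:    %d warning(s)\n", scalar @with;
```

This switches off the whole `void` category for the importing scope, including plain numeric comparisons, so it does not limit the effect to the streaming objects as asked for in the question.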
Re^6: Split tab-separated file into separate files, based on column name (tangent = open on demand => stream-like) by Eily (Monsignor) on Aug 27, 2020 at 08:34 UTC
Re^7: Split tab-separated file into separate files, based on column name ( Operator overloading ) by LanX (Saint) on Aug 27, 2020 at 10:38 UTC
