Better method to cut columns from delimited file

stevieb has asked for the wisdom of the Perl Monks concerning the following question:

Helped a fellow Perler with a problem on a lesser Perl site, and am looking for feedback on my code. It's late in the day, and somehow I think this can be written shorter and better, but I can't wrap my head around how at the moment. I'm looking for how to make this code shorter and/or more efficient (examples using modules welcome).

The request was to take data like this:

NAME|AGE|CITY|ZIP
AAA|23|STAT|60001
BBB|34|PPOR|12345
CCC|11|TRET|2345
[download]

...then print out a sub-selection of columns in the same format so the output was like this (from cols NAME & CITY):

NAME|CITY
AAA|STAT
BBB|PPOR
CCC|TRET
[download]

Here is my code that I'm having difficulty thinking about how to make smaller/better (requires perl v5.12+ due to the use of each() on the array):

#!/usr/bin/perl

use warnings;
use strict;

use List::Util qw( first );

my @master_cols = qw( NAME CITY );

my @results;
my %positions;

while ( my $line = <DATA> ){
    
    chomp $line;

    my @row = split /\|/, $line;

    if ( $. == 1 ){
        while ( my ( $i, $elem ) = each @row ){
            if ( first { $elem eq $_ } @master_cols ){
                $positions{ $elem } = $i;
            }
        }
    }

    my @row_items;

    for my $pos ( sort values %positions ){
        push @row_items, $row[ $pos ];
    }
    push @results, [ @row_items ];
}

for my $result ( @results ){
    print "\n";
    print join '|', @{ $result };
}

__DATA__
NAME|AGE|CITY|ZIP
AAA|23|STAT|60001
BBB|34|PPOR|12345
CCC|11|TRET|2345
[download]

ps. I've never used any csv-type modules, but I have heard of their use, so if I've overlooked their use for this, feel free to point one out and be harsh on me ;)

Update: I should have clarified... in the code above, if I opt to select different column headers (eg ZIP, AGE, NAME as opposed to NAME, CITY), the order is preserved regardless of the column name or the order passed in via the @master_cols array. I'm sorry this wasn't clear originally. It doesn't take away from the quality responses so far though.

Comment on Better method to cut columns from delimited file Select or Download Code

Replies are listed 'Best First'.
Re: Better method to cut columns from delimited file by BrowserUk (Patriarch) on May 10, 2012 at 04:47 UTC
You're in one-liner terrritory, but the syntax can be a bit tricky. The quotes and backslash in `-F"\\|"` are both required (at least on a windows shell): `C:\test>copy con junk.csv NAME\|AGE\|CITY\|ZIP AAA\|23\|STAT\|60001 BBB\|34\|PPOR\|12345 CCC\|11\|TRET\|2345 ^Z 1 file(s) copied. C:\test>perl -F"\\|" -anle"print join'\|', @F[ 0,2 ]" junk.csv NAME\|CITY AAA\|STAT BBB\|PPOR CCC\|TRET` [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?	[reply] [d/l] [select]
Re^2: Better method to cut columns from delimited file by stevieb (Canon) on May 10, 2012 at 04:56 UTC
Thank you BrowserUK :) To be honest, I haven't often used slices... if I wanted to change the column name(s) I want selected on the fly, could that be fit into this one-liner equation relatively easily? (I'm well past the OP's problem, I'm curious for myself now)	[reply]
Re^3: Better method to cut columns from delimited file by NetWallah (Canon) on May 10, 2012 at 05:21 UTC
Still room for optimization, but here it is (Linux style quotes used): `perl -F"\\|" -anlE'%x= %x?%x:(map{$_=>$i++}@F);print "$F[ $x{NAME}]\|$F +[$x{ZIP} ]"' junk2.txt` [download] (Unwound the slice, and used Names as indexes) I hope life isn't a big joke, because I don't get it. -SNL	[reply] [d/l]
Re^3: Better method to cut columns from delimited file by BrowserUk (Patriarch) on May 10, 2012 at 05:33 UTC
if I wanted to change the column name(s) I want selected on the fly, could that be fit into this one-liner equation relatively easily? If you mean "change which columns are selected", or their ordering, that's trivial: `C:\test>perl -F"\\|" -anle"print join'\|', @F[ 1,3, 0,2 ]" junk.csv AGE\|ZIP\|NAME\|CITY 23\|60001\|AAA\|STAT 34\|12345\|BBB\|PPOR 11\|2345\|CCC\|TRET` [download] If you want to select by column name, that's a bit harder to do as a one-liner: `C:\test>perl -F"\\|" -anle"$n=0;if($.==1){ %names=map{$_,$n++}@F;next}; + print join'\|', @F[@names{'AGE','NAME'}]" junk.csv 23\|AAA 34\|BBB 11\|CCC` [download] But why would you want to do that as a one liner? (Ie. Why would you use the names rather than the column numbers directly in a one-liner?) If you want to write a script that gets names from the command line and uses that to select columns from a file: `#! perl -slw use strict; my @selection = delete @ARGV[ 1 .. $#ARGV ]; my $n = 0; my @colnames = split /\\|/, scalar <>; my %colnames = map{ $_ => $n++ } @colnames; print join '\|',@colnames[ @colnames{ @selection } ]; while( <> ) { print join '\|', ( split '\\|' )[ @colnames{ @selection } ]; } __END__ C:\test>junk.pl junk.csv AGE NAME AGE\|NAME 23\|AAA 34\|BBB 11\|CCC` [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?	[reply] [d/l] [select]
Re: Better method to cut columns from delimited file by Marshall (Canon) on May 10, 2012 at 07:48 UTC
You are in "one liner territory", but there is no real reason to do that - quoting rules between O/S's are different - just write a simple program and be done with it: #!/usr/bin/perl -w use strict; while (<DATA>) { chomp; # deletes line ending, # If this is a file: # Windows, Unix, Old Apple # <CR><LF>, <LF>, <CR> doesn't matter at all # Perl chomp() in text mode handles all that. # allows $zip to be used in future code # as index [3] without having to worry about # the line ending after the zip code... # assign name and city to index [0,2] respectively # using list slice # just split() upon the '\|" character. # many dataBsase export formats use '\|' as a field separator. # instead of a "," to make the parsing easier. # "John Smith, Sr.", is harder than FIRST\|LAST\|TITLE #Now to be sure, there are fancy CSV modules that can #handle this complexity, but that does not appear to be #the case required here (commas within quotes can mean #different than the ones outside of quotes). my ($name, $city) = (split /\\|/, $_)[0,2]; print "$name",'\|',$city,"\n"; } =prints NAME\|CITY AAA\|STAT BBB\|PPOR CCC\|TRET =cut __DATA__ NAME\|AGE\|CITY\|ZIP AAA\|23\|STAT\|60001 BBB\|34\|PPOR\|12345 CCC\|11\|TRET\|2345 [download]	[reply] [d/l]
Re: Better method to cut columns from delimited file by roboticus (Chancellor) on May 10, 2012 at 10:39 UTC
stevieb: Yet another version, slightly whackier: $ cat plonk.pl #!/usr/bin/perl use strict; use warnings; my @wanted = qw(CITY NAME); # get map of column headings (COLNAME=>INDEX) my %headings = do { my $t = <DATA>; chomp $t; my $cnt=0; map { $_=>$cnt++ } split /\\|/, $t; }; while (<DATA>) { chomp; my @cols = split /\\|/; # A hash slice to get the column numbers we want, and # an array slice to extract those columns from the # list.... print join(", ",@cols[@headings{@wanted}]),"\n"; } __DATA__ NAME\|AGE\|CITY\|ZIP AAA\|23\|STAT\|60001 BBB\|34\|PPOR\|12345 CCC\|11\|TRET\|2345 $ perl plonk.pl STAT, AAA PPOR, BBB TRET, CCC [download] Update: Improved the (pre-existing) comments a little. ...roboticus When your only tool is a hammer, all problems look like your thumb.	[reply] [d/l]
Re: Better method to cut columns from delimited file by Tux (Canon) on May 10, 2012 at 13:37 UTC
Spreadsheet::Read comes with xlscat, which can filter rows and/or columns from CSV, XLS and other spreadsheet like formats: `$ cat test.csv NAME\|AGE\|CITY\|ZIP AAA\|23\|STAT\|60001 BBB\|34\|PPOR\|12345 CCC\|11\|TRET\|2345 $ xlscat --in-sep-char=\\| -C1,3 test.csv NAME\|CITY AAA\|STAT BBB\|PPOR CCC\|TRET $` [download] For CSV-like formatted files, it uses Text::CSV_XS. Enjoy, Have FUN! H.Merijn	[reply] [d/l]
Re: Better method to cut columns from delimited file by stevieb (Canon) on May 10, 2012 at 08:18 UTC
Thank you all for the extremely clear, concise and detailed responses. I even further appreciate the quick adaption to my clarification. I have much to ponder tomorrow. Cheers! Steve	[reply]