stevieb has asked for the wisdom of the Perl Monks concerning the following question:

Helped a fellow Perler with a problem on a lesser Perl site, and am looking for feedback on my code. It's late in the day, and somehow I think this can be written shorter and better, but I can't wrap my head around how at the moment. I'm looking for how to make this code shorter and/or more efficient (examples using modules welcome).

The request was to take data like this:

NAME|AGE|CITY|ZIP AAA|23|STAT|60001 BBB|34|PPOR|12345 CCC|11|TRET|2345

...then print out a sub-selection of columns in the same format so the output was like this (from cols NAME & CITY):

NAME|CITY AAA|STAT BBB|PPOR CCC|TRET

Here is my code that I'm having difficulty thinking about how to make smaller/better (requires perl v5.12+ due to the use of each() on the array):

#!/usr/bin/perl use warnings; use strict; use List::Util qw( first ); my @master_cols = qw( NAME CITY ); my @results; my %positions; while ( my $line = <DATA> ){ chomp $line; my @row = split /\|/, $line; if ( $. == 1 ){ while ( my ( $i, $elem ) = each @row ){ if ( first { $elem eq $_ } @master_cols ){ $positions{ $elem } = $i; } } } my @row_items; for my $pos ( sort values %positions ){ push @row_items, $row[ $pos ]; } push @results, [ @row_items ]; } for my $result ( @results ){ print "\n"; print join '|', @{ $result }; } __DATA__ NAME|AGE|CITY|ZIP AAA|23|STAT|60001 BBB|34|PPOR|12345 CCC|11|TRET|2345

ps. I've never used any csv-type modules, but I have heard of their use, so if I've overlooked their use for this, feel free to point one out and be harsh on me ;)

Update: I should have clarified... in the code above, if I opt to select different column headers (eg ZIP, AGE, NAME as opposed to NAME, CITY), the order is preserved regardless of the column name or the order passed in via the @master_cols array. I'm sorry this wasn't clear originally. It doesn't take away from the quality responses so far though.

Replies are listed 'Best First'.
Re: Better method to cut columns from delimited file
by BrowserUk (Patriarch) on May 10, 2012 at 04:47 UTC

    You're in one-liner terrritory, but the syntax can be a bit tricky. The quotes and backslash in -F"\|" are both required (at least on a windows shell):

    C:\test>copy con junk.csv NAME|AGE|CITY|ZIP AAA|23|STAT|60001 BBB|34|PPOR|12345 CCC|11|TRET|2345 ^Z 1 file(s) copied. C:\test>perl -F"\|" -anle"print join'|', @F[ 0,2 ]" junk.csv NAME|CITY AAA|STAT BBB|PPOR CCC|TRET

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Thank you BrowserUK :)

      To be honest, I haven't often used slices... if I wanted to change the column name(s) I want selected on the fly, could that be fit into this one-liner equation relatively easily? (I'm well past the OP's problem, I'm curious for myself now)

        Still room for optimization, but here it is (Linux style quotes used):
        perl -F"\|" -anlE'%x= %x?%x:(map{$_=>$i++}@F);print "$F[ $x{NAME}]|$F +[$x{ZIP} ]"' junk2.txt
        (Unwound the slice, and used Names as indexes)

                     I hope life isn't a big joke, because I don't get it.
                           -SNL

        if I wanted to change the column name(s) I want selected on the fly, could that be fit into this one-liner equation relatively easily?

        If you mean "change which columns are selected", or their ordering, that's trivial:

        C:\test>perl -F"\|" -anle"print join'|', @F[ 1,3, 0,2 ]" junk.csv AGE|ZIP|NAME|CITY 23|60001|AAA|STAT 34|12345|BBB|PPOR 11|2345|CCC|TRET

        If you want to select by column name, that's a bit harder to do as a one-liner:

        C:\test>perl -F"\|" -anle"$n=0;if($.==1){ %names=map{$_,$n++}@F;next}; + print join'|', @F[@names{'AGE','NAME'}]" junk.csv 23|AAA 34|BBB 11|CCC

        But why would you want to do that as a one liner? (Ie. Why would you use the names rather than the column numbers directly in a one-liner?)

        If you want to write a script that gets names from the command line and uses that to select columns from a file:

        #! perl -slw use strict; my @selection = delete @ARGV[ 1 .. $#ARGV ]; my $n = 0; my @colnames = split /\|/, scalar <>; my %colnames = map{ $_ => $n++ } @colnames; print join '|',@colnames[ @colnames{ @selection } ]; while( <> ) { print join '|', ( split '\|' )[ @colnames{ @selection } ]; } __END__ C:\test>junk.pl junk.csv AGE NAME AGE|NAME 23|AAA 34|BBB 11|CCC

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The start of some sanity?

Re: Better method to cut columns from delimited file
by Marshall (Canon) on May 10, 2012 at 07:48 UTC
    You are in "one liner territory", but there is no real reason to do that - quoting rules between O/S's are different - just write a simple program and be done with it:
    #!/usr/bin/perl -w use strict; while (<DATA>) { chomp; # deletes line ending, # If this is a file: # Windows, Unix, Old Apple # <CR><LF>, <LF>, <CR> doesn't matter at all # Perl chomp() in text mode handles all that. # allows $zip to be used in future code # as index [3] without having to worry about # the line ending after the zip code... # assign name and city to index [0,2] respectively # using list slice # just split() upon the '|" character. # many dataBsase export formats use '|' as a field separator. # instead of a "," to make the parsing easier. # "John Smith, Sr.", is harder than FIRST|LAST|TITLE #Now to be sure, there are fancy CSV modules that can #handle this complexity, but that does not appear to be #the case required here (commas within quotes can mean #different than the ones outside of quotes). my ($name, $city) = (split /\|/, $_)[0,2]; print "$name",'|',$city,"\n"; } =prints NAME|CITY AAA|STAT BBB|PPOR CCC|TRET =cut __DATA__ NAME|AGE|CITY|ZIP AAA|23|STAT|60001 BBB|34|PPOR|12345 CCC|11|TRET|2345
Re: Better method to cut columns from delimited file
by roboticus (Chancellor) on May 10, 2012 at 10:39 UTC

    stevieb:

    Yet another version, slightly whackier:

    $ cat plonk.pl #!/usr/bin/perl use strict; use warnings; my @wanted = qw(CITY NAME); # get map of column headings (COLNAME=>INDEX) my %headings = do { my $t = <DATA>; chomp $t; my $cnt=0; map { $_=>$cnt++ } split /\|/, $t; }; while (<DATA>) { chomp; my @cols = split /\|/; # A hash slice to get the column numbers we want, and # an array slice to extract those columns from the # list.... print join(", ",@cols[@headings{@wanted}]),"\n"; } __DATA__ NAME|AGE|CITY|ZIP AAA|23|STAT|60001 BBB|34|PPOR|12345 CCC|11|TRET|2345 $ perl plonk.pl STAT, AAA PPOR, BBB TRET, CCC

    Update: Improved the (pre-existing) comments a little.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Better method to cut columns from delimited file
by Tux (Canon) on May 10, 2012 at 13:37 UTC

    Spreadsheet::Read comes with xlscat, which can filter rows and/or columns from CSV, XLS and other spreadsheet like formats:

    $ cat test.csv NAME|AGE|CITY|ZIP AAA|23|STAT|60001 BBB|34|PPOR|12345 CCC|11|TRET|2345 $ xlscat --in-sep-char=\| -C1,3 test.csv NAME|CITY AAA|STAT BBB|PPOR CCC|TRET $

    For CSV-like formatted files, it uses Text::CSV_XS.


    Enjoy, Have FUN! H.Merijn
Re: Better method to cut columns from delimited file
by stevieb (Canon) on May 10, 2012 at 08:18 UTC

    Thank you all for the extremely clear, concise and detailed responses. I even further appreciate the quick adaption to my clarification.

    I have much to ponder tomorrow.

    Cheers!

    Steve