PerlMonks  

Extracting selected fields from file record

by Anonymous Monk
on Feb 06, 2022 at 12:42 UTC ( #11141167=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello, is there a way in Perl I can extract with a one-line instruction some selected fields from a file record with arbitrary space separated fields (spaces before and after, e.g. " a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 \n"), like so?
(a,b,c,d,e,f) = [line.split()[i] for i in (0,1,3,5,7,9)]
Thank you

Replies are listed 'Best First'.
Re: Extracting selected fields from file record
by soonix (Canon) on Feb 06, 2022 at 13:04 UTC
      another way is filling in undef for the ignored fields

      my ($first,$second,undef,$third,undef,$fourth,undef,$fifth,undef,$sixth) = split / /, $line;

      please also note that split operates on regexes, so /\s+/ might be what is really wanted.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        A single space character (' ') is a special case: it skips leading whitespace and splits on runs of whitespace. I admit that it is a bit buried in the split documentation. Of course, only the OP knows whether leading whitespace should be kept…
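        To make that concrete, here is a minimal sketch combining the undef placeholders with the ' ' special case. The sub name pick_fields is made up for illustration, and the sample line is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns fields 0,1,3,5,7,9.
sub pick_fields {
    my ($line) = @_;

    # split ' ' (a string holding one space) skips leading whitespace
    # and splits on runs of whitespace, so the newline is swallowed too.
    my ($first, $second, undef, $third, undef, $fourth, undef, $fifth, undef, $sixth)
        = split ' ', $line;

    return ($first, $second, $third, $fourth, $fifth, $sixth);
}

my @picked = pick_fields(" a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 \n");
print join(',', @picked), "\n";    # a0,a1,a3,a5,a7,a9
```

        The undef slots simply discard the fields at those positions, so the variable list has to mirror the field layout exactly.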
Re: Extracting selected fields from file record
by kcott (Archbishop) on Feb 06, 2022 at 18:04 UTC
    "(a,b,c,d,e,f) = [line.split()[i] for i in (0,1,3,5,7,9)]"
    $ perl -Mstrict -Mwarnings -e '(a,b,c,d,e,f) = [line.split()[i] for i in (0,1,3,5,7,9)]'
    Bareword "line" not allowed while "strict subs" in use at -e line 1.
    syntax error at -e line 1, near ")["
    Execution of -e aborted due to compilation errors.

    So, step one would be to learn Perl. See "Perl introduction for beginners".

    "... extract with a one-line instruction ..."

    Ask yourself why you think this requirement is necessary. It rarely has any benefits. It will often reduce readability and, as such, make your code more error-prone.

    "... extract ... from a file record ..."

    For records with fixed-width fields, use unpack. See the perlpacktut tutorial; the "Packing Text" section has an example showing exactly how to do this.
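    As a rough illustration of the unpack approach (the three-column layout of 5, 8 and 3 characters is invented for this sketch, not taken from the question):

```perl
use strict;
use warnings;

# Hypothetical fixed-width record: columns of 5, 8 and 3 characters.
my $record = sprintf '%-5s%-8s%-3s', 'id42', 'widget', 'ok';

# "A5" reads 5 characters and strips trailing spaces,
# "x8" skips the 8-character middle column, "A3" reads the last column.
my ($id, $status) = unpack 'A5 x8 A3', $record;

print "$id|$status\n";    # id42|ok
```

    Skipping a column with x plays the same role here that an undef placeholder plays in a split-based list assignment.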

    For records with variable-width fields, use split. Do be aware of these differences (the linked documentation has details):

    $ perl -E '$_ = " a0 a1 a2 \n"; my ($x, @y) = split; say "|$x|@y|";'
    |a0|a1 a2|
    $ perl -E '$_ = " a0 a1 a2 \n"; my ($x, @y) = split " "; say "|$x|@y|";'
    |a0|a1 a2|
    $ perl -E '$_ = " a0 a1 a2 \n"; my ($x, @y) = split / /; say "|$x|@y|";'
    ||a0 a1 a2 
    |
    $ perl -E '$_ = " a0 a1 a2 \n"; my ($x, @y) = split /\s+/; say "|$x|@y|";'
    ||a0 a1 a2|

    "... extract ... some selected fields ..."

    There are a variety of ways to achieve this. The best one to choose will probably depend on how you want to subsequently process the selected fields. Here are a couple of examples:

    my @wanted = (extraction_function($string))[0, 1, 3];
    my ($f1, $f2, undef, $f3) = extraction_function($string);
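    A runnable sketch of the list-slice form, assuming the extraction_function placeholder is just a split on whitespace (extract_fields below is a stand-in name, not something from the original post):

```perl
use strict;
use warnings;

# Stand-in for the extraction_function placeholder (an assumption).
sub extract_fields {
    my ($string) = @_;
    return split ' ', $string;
}

my $string = " a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 \n";

# List slice: parenthesise the call, then index the resulting list.
my @wanted = (extract_fields($string))[0, 1, 3, 5, 7, 9];

print "@wanted\n";    # a0 a1 a3 a5 a7 a9
```

    The slice keeps the wanted indices in one place, which may be easier to maintain than a long row of undef placeholders when the positions change.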

    — Ken

Re: Extracting selected fields from file record (updated)
by LanX (Sage) on Feb 06, 2022 at 18:56 UTC
    ehm ...

    > > spaces before and after

    the spaces before are IMHO best dealt with by stripping them before splitting.

    $line =~ s/^\s+//;

    Even your pseudo python code can't do this in a one-liner with split (IMHO).

    But more importantly your definition of "field" is fuzzy now.

    Please clarify

    • do you allow empty fields, and if so, how?
    • are all whitespace characters allowed as separators (like tab...)?
    • are multiple whitespace characters allowed as a single separator?

    update

    provided there are no "empty fields" and "multiple whitespaces" are allowed as separators:

    You can use a regex like /(\S+)/g ( \S is non-whitespace, the opposite of \s)

    Debugger demo:

    DB<35> $line = " a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 \n"

    DB<36> x ($line =~ /(\S+)/g)[0,1,3,5,7,9]
    0  'a0'
    1  'a1'
    2  'a3'
    3  'a5'
    4  'a7'
    5  'a9'
    DB<37>

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    update

    ) or by using the magic of ' ' soonix showed us here.

      We can omit parentheses, i.e. capture group number 1 isn't necessary.
      my( $first, $second, $third, $fourth, $fifth, $sixth ) = ( $line =~ m/\S+/g )[ 0, grep { $_ % 2 == 1 } 1 .. 9 ];
      Thanks everyone. Using ' ' is working as I need on records from an ASCII file.
      And I'm going to load the records with File::Slurp, so I don't need to worry about chomp:
      use File::Slurp qw(read_file);
      my @lines = read_file('/path/file', chomp => 1);
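      For anyone who wants to try the whole thing without installing a module or creating a file, here is a small sketch using only core Perl. The in-memory filehandle and the second sample record are assumptions made for the demo.

```perl
use strict;
use warnings;

# In-memory stand-in for the file, so the sketch runs without any input file.
my $data = " a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 \n"
         . " b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 \n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @rows;
while (my $line = <$fh>) {
    # split ' ' ignores leading whitespace and swallows the trailing
    # newline along with the other whitespace, so no chomp is needed here.
    push @rows, [ (split ' ', $line)[0, 1, 3, 5, 7, 9] ];
}
close $fh;

print "@$_\n" for @rows;
```

      Each element of @rows is an array reference holding the six selected fields of one record.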
