johny has asked for the wisdom of the Perl Monks concerning the following question:

Hi. Can anyone give me code to read the file below and extract only the required columns? Content of the file:
aanis <aanis@xyz.com> (Anis Ahmed A) accessed 2007/10/04
aaputin <aaputin@xyz.com> (Artem Aputin) accessed 2007/10/04
aazarov <alexey.azarov@tlmcom.ru> (Alexey Azarov) accessed 2007/10/04
I want to extract only the 1st, 3rd and 5th columns.

Replies are listed 'Best First'.
Re: reading from file
by bruceb3 (Pilgrim) on Oct 10, 2007 at 07:29 UTC
    There is not much to say about this.
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;

    while (<DATA>) {
        chomp;
        # Skip lines that do not match, so $1, $3 and $5 are never stale.
        next unless /^(.+) (<.+>) (\(.+\)) (.+) (.+)$/;
        print "$1 $3 $5\n";
    }
    __DATA__
    aanis <aanis@xyz.com> (Anis Ahmed A) accessed 2007/10/04
    aaputin <aaputin@xyz.com> (Artem Aputin) accessed 2007/10/04
    aazarov <alexey.azarov@tlmcom.ru> (Alexey Azarov) accessed 2007/10/04

    And the output is:

    aanis (Anis Ahmed A) 2007/10/04
    aaputin (Artem Aputin) 2007/10/04
    aazarov (Alexey Azarov) 2007/10/04
      Hi, thanks a lot, bruceb3.
      Can you explain what exactly this line does? /^(.+) (<.+>) (\(.+\)) (.+) (.+)$/;
      Also, I don't want the parentheses around the name (I want the output to be: aanis Anis Ahmed A 2007/10/04), and a line should be taken only if the first column is a word and excluded if it is a number.
      Lines like this should be excluded: 000-01 <000-01@BONNIEB> (000-01) accessed 2007/08/21
      Thanks a lot once again. Regards, Johny
        For a Unix shell:
        perl -lane 's/[)(]//g for @F; print"@F[0,2..$#F-2,$#F]" unless /^\d/' input
        For the Windows cmd shell:
        perl -lane "s/[)(]//g for @F; print qq{@F[0,2..$#F-2,$#F]} unless /^\d/" input
        Regards

        mwa

        Then you must use this pattern:

        /^(\w+) <(.+)> (\(.+\)) (.+) (.+)$/;
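
For readers following the sub-thread above, here is a minimal sketch that spells out what such a pattern does, field by field, using the /x modifier so each piece can carry a comment. It also covers the two follow-up requests: no parentheses around the name, and no records whose first column starts with a digit. The helper name extract_fields and the in-script sample list are assumptions made for illustration; they do not come from any of the posters.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns "login name date"
# for a well-formed record, or undef for records that should be skipped.
sub extract_fields {
    my ($line) = @_;

    # Skip records whose first column starts with a digit.
    return if $line =~ /^\d/;

    my ($login, $name, $date) = $line =~ m{
        ^(\S+)          \s+   # column 1: login, up to the first whitespace
        <[^>]*>         \s+   # column 2: address in angle brackets, not captured
        \( ([^)]*) \)   \s+   # column 3: full name, captured without the parentheses
        \S+             \s+   # column 4: the word "accessed"
        (\S+) \s* $           # column 5: the date
    }x or return;

    return "$login $name $date";
}

# Sample records copied from the question and the follow-up.
my @sample = (
    'aanis <aanis@xyz.com> (Anis Ahmed A) accessed 2007/10/04',
    '000-01 <000-01@BONNIEB> (000-01) accessed 2007/08/21',
    'aazarov <alexey.azarov@tlmcom.ru> (Alexey Azarov) accessed 2007/10/04',
);

for my $line (@sample) {
    my $out = extract_fields($line);
    print "$out\n" if defined $out;
}
```

With the sample records above this prints "aanis Anis Ahmed A 2007/10/04" and "aazarov Alexey Azarov 2007/10/04"; the 000-01 record is skipped. Matching the name as "anything up to the closing parenthesis" is what lets a name have any number of words.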
Re: reading from file
by McDarren (Abbot) on Oct 10, 2007 at 07:36 UTC
    There are lots of ways that this could be done.
    Here is a not-very-elegant solution using a regular expression.
    cat users.txt
    aanis <aanis@xyz.com> (Anis Ahmed A) accessed 2007/10/04
    aaputin <aaputin@xyz.com> (Artem Aputin) accessed 2007/10/04
    aazarov <alexey.azarov@tlmcom.ru> (Alexey Azarov) accessed 2007/10/04

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %users;
    open my $fh, '<', 'users.txt' or die "$!\n";
    while (my $line = <$fh>) {
        chomp($line);
        my ($user, $fullname, $lastaccess) =
            ($line =~ m/([a-z]+).*?\((.*?)\).*?([\d\/]+)$/);
        $users{$user}{fullname}   = $fullname;
        $users{$user}{lastaccess} = $lastaccess;
    }
    print Dumper(\%users);
    Which gives:
    $VAR1 = {
        'aanis' => {
            'lastaccess' => '2007/10/04',
            'fullname' => 'Anis Ahmed A'
        },
        'aazarov' => {
            'lastaccess' => '2007/10/04',
            'fullname' => 'Alexey Azarov'
        },
        'aaputin' => {
            'lastaccess' => '2007/10/04',
            'fullname' => 'Artem Aputin'
        }
    };
    Cheers,
    Darren
Re: reading from file
by atemon (Chaplain) on Oct 10, 2007 at 07:30 UTC

    For each line in the file, use a regex:

    $line =~ m{(\w+)\s+<[\w+\@\.]+>\s+(\([\w\s]+\))[\sa-zA-Z]+(\d+/\d+/\d+)};
    print "$1, $2, $3\n";

    --VC

Re: reading from file
by mwah (Hermit) on Oct 10, 2007 at 07:47 UTC
    On a command line, do a
    perl -lane 'print "@F[0,2..$#F-2,$#F]"' input
    if your file is 'input'. If it's a Windows system, do a
    perl -lane "print qq{@F[0,2..$#F-2,$#F]}" input
    instead.
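
The one-liner relies on three switches: -n wraps the code in a loop over the input lines, -a splits each line on whitespace into @F, and -l handles the line endings. Because the name may contain any number of words, the slice takes field 0, then everything from field 2 up to two before the last, then the last field. A rough script equivalent might look like the sketch below; the helper name pick_columns and the minimum-field guard are assumptions added for illustration, not part of the original one-liner.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): mirrors the @F slice
# used in the one-liner above.
sub pick_columns {
    my ($line) = @_;

    # What -a does: split on runs of whitespace into @F.
    my @F = split ' ', $line;

    # Assumed guard: a record needs login, address, at least one
    # name word, "accessed" and a date to be worth slicing.
    return unless @F >= 5;

    # Field 0 is the login, fields 2 .. $#F-2 are the words of the
    # name, and field $#F is the date. Fields 1 (address) and
    # $#F-1 ("accessed") are dropped.
    return join ' ', @F[0, 2 .. $#F - 2, $#F];
}

for my $line (
    'aanis <aanis@xyz.com> (Anis Ahmed A) accessed 2007/10/04',
    'aaputin <aaputin@xyz.com> (Artem Aputin) accessed 2007/10/04',
) {
    my $out = pick_columns($line);
    print "$out\n" if defined $out;
}
```

For the three-word name the slice is fields 0, 2 to 4, and 6; for the two-word name it is fields 0, 2 to 3, and 5. The parentheses stay attached to the name words because the split is purely on whitespace.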

    Regards

    mwa
      Very nice trick this 2..$#F-2 to deal with the variable number of fields in the input!

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: reading from file
by apl (Monsignor) on Oct 10, 2007 at 09:31 UTC
    It's a shame you didn't show us what you'd tried first.