nurulnad has asked for the wisdom of the Perl Monks concerning the following question:

First of all, I'm not even sure split() is the best way to do this. I just started learning Perl two weeks ago, so I apologize if I seem silly. I have data that look like this:
1xny_01 PROPIONYL-COA CARBOXYLASE COMPLEX B -0.8192 A A A
1xqd_00 CYTOCHROME P450 55A1 -46.5601 A B A
What I need to do is read each line, compare the last three characters (e.g. A A A), and output the line if they are all the same. I figured I'd store each field in a separate variable, so I used split() like this:
while ($line = <FILE>) {
    chomp $line;
    ($a, $b, $c, $d, $e, $f, $g, $h) = split(/\s+/, $line);
    # ... compare $f, $g and $h here ...
}
I figured I could then check whether $f, $g and $h are equal and output the line if they are. But the problem is that PROPIONYL-COA CARBOXYLASE COMPLEX B is split into 4 fields while CYTOCHROME P450 55A1 is split into only 2, so the last three letters don't always end up in $f, $g and $h. How can I get the name read as a single string? Or, if split is not the best way, can you suggest another way to do this?

EDIT: Thank you for the replies, everyone! What I did was simply this:

while ($line = <FILE_NEW>) {
    chomp $line;
    ($a, $b, $c) = (split /\s+/, $line)[-1, -2, -3];
}
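For completeness, here is a minimal sketch of that approach with the comparison and print step filled in. It assumes the goal stated in the question (print a line when its last three whitespace-separated fields are identical). The sub name `last_three_equal` and the inline sample lines are illustrative only; the slice indices are written left to right, which does not change the result.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the last three whitespace-separated
# fields of $line are identical strings.
sub last_three_equal {
    my ($line) = @_;
    # Negative list-slice indices count from the end, so the number of
    # words in the protein name no longer matters.
    my ($x, $y, $z) = (split /\s+/, $line)[-3, -2, -1];
    return 0 unless defined $x;    # fewer than three fields
    return $x eq $y && $y eq $z;
}

my @lines = (
    "1xny_01 PROPIONYL-COA CARBOXYLASE COMPLEX B -0.8192 A A A\n",
    "1xqd_00 CYTOCHROME P450 55A1 -46.5601 A B A\n",
);

for my $line (@lines) {
    chomp $line;
    print "$line\n" if last_three_equal($line);
}
```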

Re: ignore some delimiters while using split
by JavaFan (Canon) on Aug 13, 2010 at 00:49 UTC
    You can index lists from the end. Untested:
    my ($str1, $str2, $str3) = (split ' ', $line)[-3, -2, -1];
    print "$line\n" if $str1 eq $str2 && $str2 eq $str3;
    Or as a one-liner (also untested):
    perl -ane 'print if $F[-3] eq $F[-2] && $F[-2] eq $F[-1]' data-file
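The reply above uses `split ' '` rather than the question's `split /\s+/`. The single-space string is a documented special case of split: like awk, it splits on runs of whitespace and also skips leading whitespace, whereas `/\s+/` produces an empty first field when a line starts with whitespace. A small sketch of the difference; the indented sample line is made up for illustration.

```perl
use strict;
use warnings;

my $line = "  1xqd_00 CYTOCHROME P450 55A1 -46.5601 A B A";

# split ' ' (awk mode): leading whitespace is skipped entirely.
my @awk_mode = split ' ', $line;

# split /\s+/: the leading whitespace yields an empty first field.
my @regex_mode = split /\s+/, $line;

printf "split ' '   -> %d fields, first is '%s'\n", scalar @awk_mode,   $awk_mode[0];
printf "split /\\s+/ -> %d fields, first is '%s'\n", scalar @regex_mode, $regex_mode[0];

# Counting from the end is unaffected either way.
print "last three: @awk_mode[-3 .. -1]\n";
```

For this particular problem either form works, since only the fields counted from the end are used.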
Re: ignore some delimiters while using split
by BrowserUk (Patriarch) on Aug 13, 2010 at 00:50 UTC

    It's easier to just extract the bits you need:

    while( <FILE> ) {
        # skip lines that don't end in three single characters
        my( $x, $y, $z ) = m[(\w)\s(\w)\s(\w)\s*$] or next;
        print if $x eq $y and $y eq $z;
    }
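The same capture-based check can be wrapped in a sub so it can be tried on the sample lines without a file. The `or return 0` relies on the fact that a list assignment in scalar context yields the number of values on its right-hand side, which is 0 when the match fails. The sub name `ends_in_three_same` and the third test line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the line ends in three identical
# single-character fields, found with a regex instead of split.
sub ends_in_three_same {
    my ($line) = @_;
    # A failed match returns an empty list, so the assignment counts 0
    # and we return early instead of comparing undefined values.
    my ($x, $y, $z) = $line =~ m[(\w)\s(\w)\s(\w)\s*$] or return 0;
    return $x eq $y && $y eq $z;
}

for my $line ("1xny_01 PROPIONYL-COA CARBOXYLASE COMPLEX B -0.8192 A A A",
              "1xqd_00 CYTOCHROME P450 55A1 -46.5601 A B A",
              "malformed line") {
    print "$line\n" if ends_in_three_same($line);
}
```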

      print if /([A-Z])\s+\1\s+\1\s*\z/;

      will let you do the whole check within the regex: each \1 has to match the same letter that the group captured.

        And
            print if m{ (whatever) (?: \s+ \1){$n} \s* \Z }xms;
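        generalizes to any number of fields: set $n to one less than the number of fields to compare, since the first one is matched by the capture itself (the example below does this with $n -= 1).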

        >perl -wMstrict -le "my @lines = ( 'foo A A A', 'foo bar A B A', 'foo bar baz A A A' ); my $n = 3; $n -= 1; for my $line (@lines) { print qq{'$line'} if $line =~ m{ \b ([[:alpha:]]) (?: \s+ \1){$n} \s* \Z }xms; } "
        'foo A A A'
        'foo bar baz A A A'
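A sketch that packages the backreference idea as a sub taking the number of trailing fields, so the "subtract one" step lives in one place. The sub name `ends_in_n_same` is hypothetical, and `[[:alpha:]]` follows the one-liner above, so it assumes the trailing fields are single letters.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $line ends with $count copies of the same
# single letter, separated by whitespace.
sub ends_in_n_same {
    my ($line, $count) = @_;
    my $repeats = $count - 1;    # the first letter is matched by the capture
    return $line =~ m{ \b ([[:alpha:]]) (?: \s+ \1 ){$repeats} \s* \z }x ? 1 : 0;
}

for my $line ('foo A A A', 'foo bar A B A', 'foo bar baz A A A') {
    print "'$line'\n" if ends_in_n_same($line, 3);
}
```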
Re: ignore some delimiters while using split
by roboticus (Chancellor) on Aug 13, 2010 at 01:58 UTC

    nurulnad:

    If the input file is fixed format, as the two sample lines indicate, you can even extract the fields using unpack:

    my ($f1, $name, $num, $f, $g, $h) = unpack "A9A30A9A2A2A2", $_;

    or substr:

    my $f1   = substr $_, 0, 9;
    my $name = substr $_, 9, 30;
    ...etc...

    ...roboticus
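A self-contained sketch of the unpack approach. The real file's column widths can't be recovered from the post, so the layout here is an assumption. If the whole "PROPIONYL-COA CARBOXYLASE COMPLEX B" string is one column, it is 35 characters long, so this sketch widens the name column to 36 rather than using A30. The records are built with sprintf only so there is a fixed-width line to parse. The `A` format returns each field with trailing spaces stripped.

```perl
use strict;
use warnings;

# Assumed column widths (not the real file's): id, name, number, then three
# 2-character letter columns. The 'A' format strips trailing spaces.
my $template = 'A9 A36 A9 A2 A2 A2';
my $format   = '%-9s%-36s%-9s%-2s%-2s%-2s';    # same widths, for sprintf

# Build padded sample records in that layout so no input file is needed.
my @records = (
    sprintf($format, '1xny_01', 'PROPIONYL-COA CARBOXYLASE COMPLEX B', '-0.8192',  'A', 'A', 'A'),
    sprintf($format, '1xqd_00', 'CYTOCHROME P450 55A1',                '-46.5601', 'A', 'B', 'A'),
);

my @kept;
for my $record (@records) {
    my ($id, $name, $num, $f, $g, $h) = unpack $template, $record;
    print "id='$id' name='$name' num='$num' letters=$f$g$h\n";
    push @kept, $id if $f eq $g && $g eq $h;
}
print "kept: @kept\n";
```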

Re: ignore some delimiters while using split
by shawnhcorey (Friar) on Aug 13, 2010 at 01:52 UTC
    It looks like your data is column-oriented. If it is, substr() would be a better choice.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    # Make Data::Dumper pretty
    $Data::Dumper::Sortkeys = 1;
    $Data::Dumper::Indent   = 1;

    # Set maximum depth for Data::Dumper, zero means unlimited
    local $Data::Dumper::Maxdepth = 0;

    my @data;
    while( <DATA> ){
        print;
        chomp;
        @data = ();
        $data[0] = substr( $_,  0,  7 );
        $data[1] = substr( $_,  9, 35 );
        $data[2] = substr( $_, 45,  8 );
        $data[3] = substr( $_, 54,  1 );
        $data[4] = substr( $_, 56,  1 );
        $data[5] = substr( $_, 58,  1 );
        print '@data ', Dumper \@data;
    }
    __DATA__
    1xny_01 PROPIONYL-COA CARBOXYLASE COMPLEX B -0.8192 A A A
    1xqd_00 CYTOCHROME P450 55A1 -46.5601 A B A
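Unlike unpack's `A` format, substr returns the padding along with the text, so each field usually needs trimming. A sketch that drives substr from a table of [offset, length] pairs; the offsets and the padded sample record are assumptions for illustration, since the original file's column alignment did not survive in the post.

```perl
use strict;
use warnings;

# Assumed [offset, length] for each column; not the real file's layout.
my @columns = ([0, 9], [9, 36], [45, 9], [54, 2], [56, 2], [58, 2]);

# Hypothetical padded record matching the assumed layout.
my $record = sprintf '%-9s%-36s%-9s%-2s%-2s%-2s',
    '1xqd_00', 'CYTOCHROME P450 55A1', '-46.5601', 'A', 'B', 'A';

# substr keeps the padding, so strip surrounding whitespace from each field.
my @fields = map {
    my $field = substr $record, $_->[0], $_->[1];
    $field =~ s/^\s+|\s+$//g;
    $field;
} @columns;

print join('|', @fields), "\n";
```

Keeping the offsets in one table means a change in the file layout only touches that one line.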
Re: ignore some delimiters while using split
by AnomalousMonk (Archbishop) on Aug 13, 2010 at 10:33 UTC

    The idiomatic Perl way to implement "the last n elements of an array" would be code along the lines of
        my @last_n = @array[-$n .. -1];
    or
        for my $i (-$n .. -1) { func($array[$i]) }
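A small sketch of that idiom applied to this problem, with `$n` as a variable. Lines with fewer than `$n` fields are rejected up front rather than compared against missing elements. The sub name `last_n_equal` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the last $n fields of $line are all equal.
sub last_n_equal {
    my ($line, $n) = @_;
    my @fields = split ' ', $line;
    return 0 if $n < 1 || @fields < $n;      # not enough fields to compare
    my @last_n = @fields[-$n .. -1];         # the last $n elements
    # All equal if nothing differs from the first of them.
    my $mismatches = grep { $_ ne $last_n[0] } @last_n;
    return $mismatches == 0 ? 1 : 0;
}

for my $line ('1xny_01 PROPIONYL-COA CARBOXYLASE COMPLEX B -0.8192 A A A',
              '1xqd_00 CYTOCHROME P450 55A1 -46.5601 A B A') {
    print "$line\n" if last_n_equal($line, 3);
}
```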

Re: ignore some delimiters while using split
by perlpie (Beadle) on Aug 14, 2010 at 21:59 UTC
    You could do this at the prompt with something like
    perl -ne 'print if / (\S) \1 \1$/' data.txt
    assuming your file is named data.txt. If you want to save that to a second file then
    perl -ne 'print if / (\S) \1 \1$/' data.txt > second.file

    The -ne flags are very handy at the command line. The -e flag means "evaluate this code". The -n flag doesn't have a convenient mnemonic, but it wraps a while (<>) { ... } loop around the code, so the code runs once for every line of input with that line in $_.
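A sketch of roughly what `perl -ne 'print if / (\S) \1 \1$/' data.txt` amounts to as an ordinary script. So that it runs without a data.txt, it reads from an in-memory filehandle over a sample string instead of `<>`, and it collects matching lines in `@kept` (a name introduced here) before printing, only so the result is easy to check.

```perl
use strict;
use warnings;

# Stand-in for data.txt: an in-memory filehandle over a string.
my $text = "1xny_01 PROPIONYL-COA CARBOXYLASE COMPLEX B -0.8192 A A A\n"
         . "1xqd_00 CYTOCHROME P450 55A1 -46.5601 A B A\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @kept;
# Roughly what -n does: loop over the input lines with each one in $_.
while (<$fh>) {
    push @kept, $_ if / (\S) \1 \1$/;
}
close $fh;

print @kept;
```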

    For example...

    perl -e 'print "hello world\n"'

    ... is a nice canonical example. And...

    perl -ne 'print "hello world. Look at this: $_"' data.txt

    ...will prepend text to every line of data.txt and print the result.

    more docs for command line options at perlrun

    more docs for regular expressions at perlre