Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need help with this one: how do I match the last two words of each line in a tab-separated file?

A line from the file looks like this:

12\tWS:00001\tAny number of (\t.+)\tword_1\tword_2\n
I have been trying to crack this one, but I couldn't.

The result I'm trying to get is word_1 and word_2.

Thank you.

Re: regex question
by moritz (Cardinal) on Jul 28, 2009 at 14:38 UTC
    Use Text::CSV with its sep_char option set to "\t", then extract the last two fields. Or split on the tab delimiter and use the last two fields.

    If you really want a regex, you can use /.*\t([^\t]*)\t([^\t]*)$/ and the two fields end up in $1 and $2.
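    A minimal sketch of that regex applied line by line, assuming the data arrives on a filehandle (an in-memory string stands in for the real file here, and the sub name last_two_fields is hypothetical). Note the chomp: [^\t] also matches a newline, so without it $2 would end in "\n".

```perl
use strict;
use warnings;

# Hypothetical helper: returns the last two tab-separated fields of a line,
# or an empty list if the line has fewer than two tabs.
sub last_two_fields {
    my ($line) = @_;
    chomp $line;    # [^\t] would otherwise swallow the trailing newline
    return $line =~ /.*\t([^\t]*)\t([^\t]*)$/ ? ($1, $2) : ();
}

# In-memory filehandle standing in for the real tab-separated file
my $data = "12\tWS:00001\tAny number of\tfields\tword_1\tword_2\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    my ($first, $second) = last_two_fields($line);
    print "$first $second\n" if defined $second;    # word_1 word_2
}
close $fh;
```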

Re: regex question
by ELISHEVA (Prior) on Jul 28, 2009 at 17:32 UTC

    Or you could use split + negative indices

        my @aFlds = split(/\t/, $line);
        my $last_field = $aFlds[-1];
        my $next_to_last_field = $aFlds[-2];
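    One caveat with the split approach: split drops trailing empty fields unless you pass a negative LIMIT, so if the final column can be empty, $aFlds[-1] quietly returns the wrong field. A sketch using a list slice with negative indices and a limit of -1 (the sub name last_two_split is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: last two tab-separated fields via a list slice.
# The -1 limit keeps trailing empty fields, so an empty last column
# comes back as '' instead of being dropped.
sub last_two_split {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;
    return @fields >= 2 ? @fields[-2, -1] : ();
}

my ($a1, $b1) = last_two_split("12\tWS:00001\tword_1\tword_2\n");
print "[$a1] [$b1]\n";    # [word_1] [word_2]

# Empty final column: without the -1 limit, split would return only
# three fields here and the "last two" would be off by one.
my ($a2, $b2) = last_two_split("12\tWS:00001\tword_1\t\n");
print "[$a2] [$b2]\n";    # [word_1] []
```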

    If you also want to remove the final two fields from the array as you assign them to variables, use split + pop:

        # Note: if $line = "A\tB\tC\tD\tE", then
        # @aFlds = ('A','B','C','D','E') before popping
        # and ('A','B','C') after popping
        my @aFlds = split(/\t/, $line);
        my $last_field = pop @aFlds;
        my $next_to_last_field = pop @aFlds;
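    If you want the last two fields removed in a single step and kept in their original order, splice with a negative offset is an alternative to two pops. A sketch reusing the variable names from the example:

```perl
use strict;
use warnings;

my $line  = "A\tB\tC\tD\tE";
my @aFlds = split /\t/, $line, -1;

# splice with offset -2 and no length removes everything from the
# second-to-last element onward and returns it in original order
my ($next_to_last_field, $last_field) = splice @aFlds, -2;

print "$next_to_last_field $last_field\n";    # D E
print join(',', @aFlds), "\n";                # A,B,C
```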

    Best, beth

Re: regex question
by kennethk (Abbot) on Jul 28, 2009 at 14:41 UTC
    I'm assuming you are going to read in the file and chomp each line, thus removing the newline from the end. You can do similar things if you slurp the file, but then the regex has to change. Info can be found in perlretut. For this case, you can capture the last two entries in a tab-delimited record by anchoring to the end of the string ($) and using a negated character class, [^\t] (see Simple word matching), à la:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $string = "12\tWS:00001\tAny number of (\t.+)\tword_1\tword_2";
        print join("\n", $string =~ /\t([^\t]*)\t([^\t]*)$/);

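    For the slurped-file case mentioned above, one way the regex can change, sketched under the assumption that lines end in "\n": add /m so $ matches at each line end, add /g to walk through every line, and exclude \n from the character class so a capture cannot run across lines. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Stand-in for a slurped file (e.g. read with local $/ = undef)
my $text = "12\tWS:00001\tfoo\tword_1\tword_2\n"
         . "13\tWS:00002\tbar\tword_3\tword_4\n";

my @pairs;
# /m: $ matches before each newline; /g: iterate over every line.
# [^\t\n] keeps each capture within a single line.
while ($text =~ /\t([^\t\n]*)\t([^\t\n]*)$/mg) {
    push @pairs, [$1, $2];
}

print "$_->[0] $_->[1]\n" for @pairs;
```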
Re: regex question
by locked_user sundialsvc4 (Abbot) on Jul 29, 2009 at 02:59 UTC

    This, to me, is definitely one of those “painful Perl lessons” that this language so often gently teaches all of us: often, “jumping to an algorithm” equals “jumping to a conclusion” equals “making extra work for yourself!”

    Let's face it: we're taught to figure things out on our own. (In a university setting, doing anything else is called “cheating.”) As a result, we tend to re-invent the wheel. But CPAN gently and repeatedly teaches us to do otherwise.

    CPAN serves to remind us all that, no matter what it is that we need to do, we do not all have to independently (re-)arrive at “how” to do it.

    In short... we actually do not have to “figure out how to solve” a problem in order to successfully solve it.

    The practical implications of this are rather profound. Instead of “instinctively and reflexively fast-forwarding” from “the problem that now faces us” to “our first notion of how to solve it,” it behooves us to do something that we are not accustomed to doing: to regard the problem at hand, not as the inspiration for yet another custom algorithm, but rather as a problem that has undoubtedly been solved a hundred times before. Our true objective, then, is just to find it.

      ... but rather as a problem that has undoubtedly been solved a hundred times before. Our true objective, then, is just to find it.

      CPAN often has more than one solution, and the true problem then is to find the solution or implementation that fits best. Often, this means finding the fastest solution, and/or the solution that can cope with huge amounts of data, and/or simply the one with the fewest critical bugs. Another restriction may be the lack of a C compiler, so XS-based modules cannot be used unless someone else has already compiled them. (A typical problem with ActivePerl on Win32.)

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: regex question
by ww (Archbishop) on Jul 28, 2009 at 20:20 UTC

    TIMTOWTDI: reverse the string and capture (using lookahead -- see perlretut) the word before each tab.
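    One possible reading of that suggestion, as a sketch (the sub name last_two_reversed is hypothetical, and the exact use of lookahead is an assumption about what was meant): after reversing, the last two fields become the first two, so the match can be anchored at the start. As written, the (?=\t) lookahead requires at least three fields; drop it to also accept lines with exactly two.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the reverse-the-string idea.
# The lookahead (?=\t) checks that a tab follows the second field
# without consuming it.
sub last_two_reversed {
    my ($line) = @_;
    chomp $line;
    my $rev = reverse $line;    # scalar assignment: reverses the characters
    my ($r_last, $r_prev) = $rev =~ /^([^\t]*)\t([^\t]*)(?=\t)/
        or return;
    # Reverse each capture back and restore the original field order
    my $last = reverse $r_last;
    my $prev = reverse $r_prev;
    return ($prev, $last);
}

my ($first, $second) =
    last_two_reversed("12\tWS:00001\tAny number of\tword_1\tword_2\n");
print "$first $second\n";    # word_1 word_2
```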