Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to parse a log file. I have this code:
while(<DATA>) {
    /(\w+)/g
    print "$1,$2,$3";
}
__DATA__
word1 word2 word3
.. .. ..
but this is printing:
word1,, ...
What am I doing wrong here?

Re: global regex
by si_lence (Deacon) on Jul 08, 2009 at 13:12 UTC
    Your code matches the first 'word' on the line and prints it, then goes on to the next line. One way of matching all the words on a line is:
    while(<DATA>) {
        while ( /(\w+)/g ) {
            print "$1 ";
        }
    }
    cheers

    si_lence

Re: global regex
by Transient (Hermit) on Jul 08, 2009 at 13:16 UTC
    Try
    while( <DATA> ) {
        my @matches = /(\w+)/g;
        print join( ',', @matches ), "\n";
    }
    or
    while ( <DATA> ) {
        while ( /(\w+)/g ) {
            print $1, "\n";
        }
    }
    Also, you're missing a semicolon after the regexp, but I assume that you just typed that out and didn't cut and paste =)
Re: global regex
by psini (Deacon) on Jul 08, 2009 at 13:15 UTC

    or use match in array context:

    while(<DATA>) {
        my @a=/(\w+)/g;
        print "@a\n";
    }
    __DATA__
    word1 word2 word3
    .. .. ..

    Update: typo corrected: s/scalar/array/, thanks to AnomalousMonk

    Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

Re: global regex
by jethro (Monsignor) on Jul 08, 2009 at 13:26 UTC

    Whatever the first set of parentheses matches is always stored in $1, and only in $1, with or without the g switch. You can either use

    while(<DATA>) {
        while (1) {
            last if not /(\w+)/g;
            print $1,"\n";
        }
    }

    or

    while(<DATA>) {
        my @ar= /(\w+)/g;
        print "@ar\n";   # interpolation separates the words with spaces
    }

    As you can see, the g switch works differently in scalar and list context. Read perlre (and the m// section of perlop) to get the details. You might also add "use warnings;" to your scripts.
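
    To make the context difference concrete, here is a minimal sketch that runs the same pattern both ways. The sample string and the variable names ($line, @all, @one_by_one) are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample line, standing in for one line of the log file.
my $line = "word1 word2 word3";

# List context: //g returns every captured substring in one go.
my @all = $line =~ /(\w+)/g;
print join( ',', @all ), "\n";

# Scalar context: each evaluation finds the next match and sets $1,
# so a loop is needed to walk through all of them.
my @one_by_one;
while ( $line =~ /(\w+)/g ) {
    push @one_by_one, $1;
}
print join( ',', @one_by_one ), "\n";
```

    Both print statements should produce the same comma-separated list; the only difference is whether Perl hands back all the matches at once or one per iteration.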

Re: global regex
by ww (Archbishop) on Jul 08, 2009 at 13:27 UTC
    Several things.
    1. $2 and $3 are not populated by simply using /g
    2. You're not testing for matches
    3. Your __DATA__ section reflects a "log file" which has precisely 3 words per line. Note what happens to line 3 of my __DATA__.
    #!/usr/bin/perl
    use strict;
    use warnings;
    # 778252
    while(<DATA>) {
        if ( $_ =~ /(\w+\s*)(\w+\s*)(\w+\s*)/ ) {
            print "$1 $2 $3";
        }
    }
    __DATA__
    word1 word2 word3
    word4 word5 word6
    7word 12345 8word *@- 9word

    Output:
    word1 word2 word3
    word4 word5 word6
    7word 12345 8word
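
    If the real log does not have exactly three words per line, one possible approach is to collect however many words each line has and skip lines with none. This is only a sketch: the sample lines and the names @lines, @words and @output are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical sample lines with varying numbers of words.
# Single quotes keep Perl from interpolating "@-" as an array.
my @lines = (
    'word1 word2 word3',
    '7word 12345 8word *@- 9word',
    '*@- !!',
);

my @output;
for my $line (@lines) {
    my @words = $line =~ /(\w+)/g;   # list context: all words on the line
    next unless @words;              # test for matches: skip lines with none
    push @output, join( ',', @words );
}
print "$_\n" for @output;
```

    Unlike the fixed three-group pattern, this keeps the fourth word on the second sample line and silently drops the line that contains no word characters.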

Re: global regex
by Anonymous Monk on Jul 08, 2009 at 13:37 UTC
    But shouldn't the /g make it match across the whole string, instead of stopping after it finds the first match?

      Depends on context, but yes in this case (assuming your code is simply missing a ";").

      It, however, does not affect what gets put in $1, $2, etc. The earliest set of parens goes in $1, the next earliest set of parens goes in $2, etc. Since you only have one set of parens, only $1 will ever get populated.
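
      As a rough illustration of that numbering rule, the sketch below fills $1, $2 and $3 by writing three sets of parens, then gets the same three words from a single set of parens by assigning the list-context result of /g. The sample string and variable names are invented for the example, and it assumes the line holds at least three words.

```perl
use strict;
use warnings;

my $line = "word1 word2 word3";   # hypothetical sample line

# Three sets of parens: a single match fills $1, $2 and $3.
my $three = '';
if ( $line =~ /(\w+)\s+(\w+)\s+(\w+)/ ) {
    $three = "$1,$2,$3";
}
print "$three\n";

# One set of parens with /g in list context: every match is returned,
# so the results can be unpacked into named variables instead.
# Assumes at least three words; otherwise some variables are undef.
my ( $first, $second, $third ) = $line =~ /(\w+)/g;
my $unpacked = "$first,$second,$third";
print "$unpacked\n";
```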

Re: global regex
by biohisham (Priest) on Jul 08, 2009 at 14:52 UTC
    OK, everybody has hit the nail on the head; here is a bullet of mercy. You're using a loop, but you have to realize that every time this loop iterates it runs the pattern and the code you wrote again:
    /(\w+)/g; print "$1";
    So you use only one backreference ($1), and it gets populated from scratch every time the loop iterates. What you are doing wrong is assuming that each word on a line gets its own variable. In fact, while(<DATA>){} fetches a new line from DATA every time it goes through, and the first word matched on that line replaces whatever was in $1. Hope you got the idea. Best of luck.
    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind
      I thought I would extend my previous reply with this code to show you that $1 is filled over and over with new data from DATA:
      use strict;
      use warnings;
      my $match=0;
      while(<DATA>){
          while(my $text = /Name: *(\w+)/g){
              ++$match;
              print "Match no. $match is $1\n";
          }
      }
      __DATA__
      Name: Alpha
      Name: Beta
      Name: Gamma
      Name: Epsilon
      Name: Delta
      Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind