regex for assignment

by mkmcconn (Chaplain)
on Feb 27, 2001 at 03:21 UTC (#60974=perlquestion)

mkmcconn has asked for the wisdom of the Perl Monks concerning the following question:

The snippet below gives an example of what I would like to do. My questions follow the snippet.

#!/usr/bin/perl -wl
use strict;
for (<DATA>){
    chomp;
    my ($base, $end) = $_ =~ /(\w+)\.(tif(?:f)?\b)/gi;
    next if not defined $base;
    print "$base \t $end";
}
__DATA__
list
1.tif
convert 3.tif to baloney.pic
path is c:\TIFS\009.tif
10.tif
how many "11.tifs" are there?
14.TIFF

Until I came across the idea that regular expressions may be used for assignment to multiple variables as above, I might have tried to concoct a split expression to handle the job. This is a new idea to me. Are there potential gotchas or performance penalties attached to using regular expressions for assignment to variables, as above, instead of split?

split asks me to look for an element I want to discard, but my brain wants 'assignment' to be equal to the elements I'm trying to keep. This is what I'd call a "psychological" advantage: using an idiom that more closely expresses what I want to do. Do you agree?
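To make the comparison concrete, here is a minimal sketch that runs both approaches over a few of the sample lines. The helper names by_regex and by_split are hypothetical, and splitting on the first dot is only one guess at what a split-based version might look like. It says nothing about performance; it only shows how each idiom behaves on lines with surrounding text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: describe what to KEEP and capture it.
sub by_regex {
    my ($line) = @_;
    my @parts = $line =~ /(\w+)\.(tiff?)\b/i;
    return @parts;    # empty list when the pattern does not match
}

# Hypothetical helper: describe what to DISCARD (the first dot) and split on it.
sub by_split {
    my ($line) = @_;
    my ($base, $ext) = split /\./, $line, 2;
    return ($base, $ext);
}

for my $line ('10.tif', 'path is c:\TIFS\009.tif', 'how many "11.tifs" are there?') {
    my @kept  = by_regex($line);
    my @split = by_split($line);
    print "line:  $line\n";
    print "regex: ", (@kept ? "@kept" : "(no match)"), "\n";
    print "split: @split\n";
}
```

On a bare filename the two agree. Once the line carries other text, the split version keeps that text in its pieces, while the capture version returns only the parts the pattern describes, or nothing at all.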

Replies are listed 'Best First'.
Re: regex for assignment
by merlyn (Sage) on Feb 27, 2001 at 03:26 UTC
    Another way to write this that I find more natural than your "if defined" above is:
    while (<DATA>) { # not "for"... no need to suck all into memory
        if (my ($base, $ext) = /(\w+)\.(tiff?)\b/) {
            # we have a good $base and $ext
        }
    }
    I'm using a list assignment in a scalar context (the boolean for the if). If the match fails, it's 0 elements, thus false. If the match succeeds, it's one or more elements, thus true.
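A small sketch of that behavior, for anyone who wants to see the count directly. The helper name capture_count is hypothetical; it evaluates the same list assignment in scalar context and returns the resulting number instead of testing it in an if.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: a list assignment evaluated in scalar context
# yields the number of elements produced by the right-hand side.
sub capture_count {
    my ($line) = @_;
    my $count = ( my ($base, $ext) = $line =~ /(\w+)\.(tiff?)\b/ );
    return $count;
}

for my $line ('10.tif', 'baloney.pic') {
    printf "%-12s => %d\n", $line, capture_count($line);
}
```

The count is 2 when the pattern matches (one element per capture group) and 0 when it does not, which is why the assignment can stand in directly as the if condition.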

    -- Randal L. Schwartz, Perl hacker

      merlyn's code doesn't really do the same thing as mkmcconn's code. He checks for a good $base or a good $ext.

      mkmcconn's code is actually looking for a good $base explicitly. Assuming this is the intended behavior, the code should look like this:

      # mkmcconn meaning here:
      while (<DATA>) { # not "for"... no need to suck all into memory
          (my ($base, $ext) = /(\w+)\.(tiff?)\b/);
          if ($base) {
              # we have a good $base
          }
      }

      # merlyn style here
      while (<DATA>) { # not "for"... no need to suck all into memory
          if (my ($base, $ext) = /(\w+)\.(tiff?)\b/) {
              # we have a good $base or $ext
          }
      }

      It is possible to squeeze the check for $base's validity into the same line with the declaration and assignment, but I couldn't think of a way to do it that wasn't ugly. Perhaps someone more skilled than I can think of a way.

      Thanks to mkmcconn for pointing out the cool way to use regexes, and to merlyn for the elegant restatement of the script.

      Update: TGI removes foot from mouth and attempts to put it right back in. If I am not completely confused, the distinction would need to be made if either parenthetical in the regex could match a null value, like (\d*). If you do get a null match, are the $\d variable assignments unaffected? How about assigning into a list: are the undefined values dropped? Not that I thought of this before I posted; I didn't get that the match count had to be 0 or 2.

      Update 2: I didn't have time to check the questions I asked yesterday, but I did this morning. Look below and you'll see what happens with the NULL behavior.

      @strings=('aaaaaa','11111aaaaa');
      foreach (@strings) {
          my $ord = (my ($number, $letter) = /(\d*)([a-zA-Z]+)/);
          print "$_\n";
          print "\tORD = $ord\tNUM = $number\tLET = $letter\n";
      }


              ORD = 2 NUM =   LET = aaaaaa
              ORD = 2 NUM = 11111     LET = aaaaa

      TGI says moo
        merlyn's code uses the feature of list assignment in scalar context. This yields the number of elements on the right-hand side of the assignment, which, in this case, is either 0 or 2. If the regex matches, then it returns $1 and $2, which means that $base and $ext get their values.

        If the regex matches, then the two variables have values: with this pattern, one cannot be defined if the other isn't. Thus, merlyn's code does the right thing.

        japhy -- Perl and Regex Hacker

        Actually, it's not "or", it's "and". I'm checking for both a good $base and $ext implicitly, or rather, I don't need to check. If the regex matches, both $base and $ext will be set properly. If the regex fails, the match list length is 0, and the if fails.

        -- Randal L. Schwartz, Perl hacker
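One related case may be worth a quick check: a capture group that is itself optional. The (\d*) test earlier in the thread always participates in the match and yields an empty string. The sketch below instead uses (\d+)?, a variation that does not appear in the thread, and a hypothetical helper named number_and_letters. It assumes only the documented behavior that a match in list context returns one element per capture group.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the digit group is optional, so it may not
# take part in a successful match at all.
sub number_and_letters {
    my ($string) = @_;
    my $count = ( my ($number, $letters) = $string =~ /(\d+)?([a-zA-Z]+)/ );
    return ($count, $number, $letters);
}

for my $string ('11111aaaaa', 'aaaaaa', '12345') {
    my ($count, $number, $letters) = number_and_letters($string);
    printf "%-10s count=%d number=%s letters=%s\n",
        $string, $count,
        defined $number  ? "'$number'"  : 'undef',
        defined $letters ? "'$letters'" : 'undef';
}
```

For 'aaaaaa' the count is still 2, but $number is undef rather than an empty string: the undefined value is not dropped from the list. So the "both are set if the match succeeds" guarantee holds for the tif pattern because both of its groups are mandatory, and a pattern with optional groups would still need a defined test on the individual variables.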

Re: regex for assignment
by danger (Priest) on Feb 27, 2001 at 04:32 UTC

    I would go with merlyn's while loop, but perhaps do a next unless test on the pattern match, possibly combining it with the assignment if I wanted to use something other than $1, $2, ...:

    while(<DATA>){
        next unless /(\w+)\.(tiff?)\b/i;
        print "$1 \t $2\n";
    }

    # or this
    while(<DATA>){
        next unless my($base, $ext) = /(\w+)\.(tiff?)\b/i;
        print "$base \t $ext\n";
    }
