in reply to pattern matching

Hi vineet2004,

One quick way is to use the "?" modifier like so:

    while (<>) {
        # Only require a non-null first capture
        if (/(\w\w)(\w\w)?(\w\w)?(\w\w)?/) {
            # Do something
        }
    }

In each of the captures, the "?" says that the item is optional. Therefore, the match succeeds as long as the first capture matches.

Note that if you use warnings, you will still need to test each capture with defined before you use it, to avoid "uninitialized value" warnings.
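Here is a minimal sketch of the optional-capture idea. It assumes the pairs are packed together with no separator, and the helper name `optional_pairs` is hypothetical. It uses `grep { defined }` to drop the captures that did not participate, which is one way to avoid the uninitialized-value warnings. The last call shows what happens when a space separates the pairs: only the first pair is captured.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the optional-capture pattern and return
# only the captures that actually matched (undef ones are filtered out).
sub optional_pairs {
    my ($line) = @_;
    return () unless $line =~ /(\w\w)(\w\w)?(\w\w)?(\w\w)?/;
    return grep { defined } ($1, $2, $3, $4);
}

print join(',', optional_pairs('deadbeef')), "\n";   # de,ad,be,ef
print join(',', optional_pairs('dead')), "\n";       # de,ad
print join(',', optional_pairs('de ad')), "\n";      # de (the space stops the match)
```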

Update:  bart's comment below about the whitespace is a good one.  I think my solution will still work if you modify it slightly:

    if (/(\s*\w\w)(\s+\w\w)?(\s+\w\w)?(\s+\w\w)?/) {
        # Do something
    }

Come to think of it, though, that just makes the whitespace part of each capture, so ++bart: his solution looks like a better fit for what you need.
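To see what the whitespace-tolerant variant actually captures, here is a minimal sketch (the helper name `spaced_pairs` is hypothetical). Every capture after the first comes back with its leading space. Trimming the captures afterwards, as done at the end, is one possible workaround and is an assumption, not something proposed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: the whitespace-tolerant pattern from the update.
# Each optional group captures its leading whitespace along with the pair.
sub spaced_pairs {
    my ($line) = @_;
    return () unless $line =~ /(\s*\w\w)(\s+\w\w)?(\s+\w\w)?(\s+\w\w)?/;
    return grep { defined } ($1, $2, $3, $4);
}

my @raw = spaced_pairs('de ad be ef');
print join('|', @raw), "\n";        # de| ad| be| ef

# Assumed workaround: strip the leading whitespace from each capture copy.
my @trimmed = map { (my $p = $_) =~ s/^\s+//; $p } @raw;
print join('|', @trimmed), "\n";    # de|ad|be|ef
```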



Re^2: pattern matching
by bart (Canon) on Dec 25, 2006 at 18:54 UTC
    I think liverpole is close, but he's ignoring one important factor: the data contains a space between the pairs the OP wants to capture, and that space isn't part of them. So just making the captures optional won't cut it.

    Here's what I would do:

        if (/(\w\w)(?: (\w\w)(?: (\w\w)(?: (\w\w))?)?)?/) {
            # Do something
        }

    I don't like the idea of making the parallel items optional. Instead, I nest them, so $3 cannot match if $2 didn't, as both are part of the same optional pattern. Ditto with $4, which can only match if both $2 and $3 matched.
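    A minimal sketch of the nested pattern, wrapped in a hypothetical helper `nested_pairs`. It shows that the pairs are captured even though they are separated by spaces, that a later capture is only filled in when every earlier one matched, and that trailing junk is simply ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: the nested pattern. Each deeper group sits inside
# the previous optional group, so $3 can only match if $2 did, and so on.
sub nested_pairs {
    my ($line) = @_;
    return () unless $line =~ /(\w\w)(?: (\w\w)(?: (\w\w)(?: (\w\w))?)?)?/;
    return grep { defined } ($1, $2, $3, $4);
}

print join(',', nested_pairs('de ad be ef #junk')), "\n";   # de,ad,be,ef
print join(',', nested_pairs('de ad')), "\n";              # de,ad
```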

      thanks guys....
      thanks bart for your reply, your code is fine... but my actual pattern-matching problem involves 31 (or fewer) pairs instead of the 4 (or fewer) I mentioned in my post. I made that change so I could give an example easily. If I follow your code pattern, it might turn out a bit too long. Any shorter version? Thanks again for your help so far. vineet
        Hmm, I see what you mean: extending the code with more matches soon gets really unwieldy. You need some kind of looping construct.

        Now I wish I could say you could handle this easily with a single pattern, but unfortunately, a repetition modifier around captures doesn't produce the desired results:

            $_ = 'de ad be ef #junk';
            /^(\w\w)(?: (\w\w))*/;

        This retains only two captures: in the end, $1 will be 'de', the first pair, and $2 will be 'ef', the last one. The pairs in between are simply forgotten.

        There's no way around it: this requires a two-step approach. Step 1) capture the entire run of pairs at once; Step 2) split it into parts.

        1. The first approach is to use split for step 2:
             $_ = 'de ad be ef #junk';
             /^(\w\w(?: \w\w)*)/;
             my @capture = split ' ', $1;
        2. Use //g, either in a loop or in list context.
          1. //g in list context:
               $_ = 'de ad be ef #junk';
               my @capture = /\G(?:^|\ )(\w\w)/g;
          2. A loop with //g in scalar context:
               $_ = 'de ad be ef #junk';
               my @capture;
               while (/\G(?:^|\ )(\w\w)/g) {
                   push @capture, $1;
               }
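        The three approaches above can be wrapped in subs and checked against each other. This is a sketch; the sub names are hypothetical, and each works on a copy of the line rather than on $_. None of them depends on the number of pairs, so the same code handles 31 pairs as easily as 4.

```perl
use strict;
use warnings;

# Approach 1: capture the whole run of pairs once, then split it.
sub pairs_by_split {
    my ($line) = @_;
    return () unless $line =~ /^(\w\w(?: \w\w)*)/;
    return split ' ', $1;
}

# Approach 2.1: //g in list context returns every $1 in one go.
sub pairs_by_list_g {
    my ($line) = @_;
    my @capture = $line =~ /\G(?:^|\ )(\w\w)/g;
    return @capture;
}

# Approach 2.2: //g in scalar context, one pair per loop iteration.
sub pairs_by_loop_g {
    my ($line) = @_;
    my @capture;
    while ($line =~ /\G(?:^|\ )(\w\w)/g) {
        push @capture, $1;
    }
    return @capture;
}

my $data = 'de ad be ef #junk';
for my $sub (\&pairs_by_split, \&pairs_by_list_g, \&pairs_by_loop_g) {
    print join(',', $sub->($data)), "\n";    # de,ad,be,ef
}
```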

        Be extremely careful with the latter not to accidentally cause an endless loop. I did, with

        /(?:^|\G\ )(\w\w)/g
        I'm still not sure why.
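        One defensive habit, sketched here as an assumption rather than a diagnosis of that loop: record pos() after each successful match and leave the loop if it ever fails to move forward, so a pattern that restarts from an earlier position cannot spin forever. The helper name `guarded_pairs` is hypothetical. Like the working version, the example pattern keeps \G at the very start.

```perl
use strict;
use warnings;

# Hypothetical helper: scalar //g loop with a progress guard.
# $re is expected to be a qr// pattern with one capture group.
sub guarded_pairs {
    my ($line, $re) = @_;
    my @capture;
    my $last_pos = -1;
    while ($line =~ /$re/g) {
        my $now = pos($line);
        # If pos() did not advance, the match restarted or stalled: stop.
        last if $now <= $last_pos;
        $last_pos = $now;
        push @capture, $1;
    }
    return @capture;
}

my @pairs = guarded_pairs('de ad be ef #junk', qr/\G(?:^|\ )(\w\w)/);
print join(',', @pairs), "\n";    # de,ad,be,ef
```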