
Hi Darren,

I need to get the preferred patterns from the @str array. Since the strings follow different patterns, I don't have a single general rule, so I guess the following assumptions should give good results.


My assumptions are,

#. Check the number of characters before the first "_" (underscore).
#. If there are only two characters, then take the string up to the second "_" (underscore); otherwise, if there are more than two characters, it must be a targeted pattern.
#. I guess I can't use the "\w" (word) pattern, because it will take the whole string as a word.

Thank you in advance.
-kulls

Re^3: Regular Expression - pattern matching
by McDarren (Abbot) on Feb 22, 2006 at 10:21 UTC
    #. Check the number of characters before the first "_" (underscore). #. If there are only two characters, then take the string up to the second "_" (underscore)
    Okay, well because you are looking for anything that is not an underscore, then a negated character class is probably the way to go. Something like this:
    if (/^([^_]{2}_[^_]+)_/) { print $1; }
    otherwise, if there are more than two characters, it must be a targeted pattern
    Sorry, but I don't get that. What is a "targeted pattern"?

    I guess I can't use the "\w" (word) pattern, because it will take the whole string as a word.
    Yes and no. It's okay, because you are limiting it to only the first two characters with {2}. But it's probably not okay because \w will also match an underscore. So \w{2} would match something like "_a", which I'm pretty sure you don't want.
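
To make the difference concrete, here is a small sketch comparing \w{2} with the negated class [^_]{2} at the start of a string. The sample strings and the two helper names are made up for illustration; they are not from the original @str array.

```perl
use strict;
use warnings;

# Hypothetical sample strings, not taken from the original @str array.
my @samples = ('ab_cd_ef', '_a_bc', 'a__b');

# Returns 1 if the first two characters are word characters (\w), else 0.
# Note that \w includes the underscore.
sub starts_with_two_word_chars { return $_[0] =~ /^\w{2}/ ? 1 : 0 }

# Returns 1 if the first two characters are both non-underscores, else 0.
sub starts_with_two_non_underscores { return $_[0] =~ /^[^_]{2}/ ? 1 : 0 }

for my $s (@samples) {
    printf "%-10s \\w{2}: %d  [^_]{2}: %d\n", $s,
        starts_with_two_word_chars($s), starts_with_two_non_underscores($s);
}
```

Only the negated class rejects "_a_bc" and "a__b". Keep in mind that [^_] also accepts characters that \w would reject, such as "-" or a space, so it is slightly more permissive in that direction.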

    Cheers,
    Darren :)

      Hi Darren,
      If the string is like "xxxx_sssss", I need "xxxx", and if it's like "xx_ssss", I need "xx_ssss".
      I have reused your code
      if (/^([^_]{2}_[^_]+)_/) { print $1; }
      and it looks like
      for (@str) {
          my $temp;
          if (/^([^_]{2}_[^_]+)_/) {
              $temp = $1;
              print $temp."\n";
          }
          else {
              ($temp) = $_ =~ /^([a-z0-9]+)\_/xi;
              print $temp."\n";
          }
      }

      Unfortunately, I didn't get the exact output that I showed earlier.
      Can you please correct me if I'm wrong?
      Thank you in advance.
      -kulls
        Okay, after scratching my head for about 20 minutes, I think I've worked out what you want.

        • If the first two characters are not underscores, but the third character is, then keep everything up to (but not including) the second underscore. Otherwise...
        • Keep everything up to (but not including) the first underscore.
        Is that correct?
        If yes, then the following should do it:
        for (@strings) {
            my $temp = "";
            if (($temp) = $_ =~ m/^([^_]{2}_[^_]+)_?/) {
                print $temp;
            }
            elsif (($temp) = $_ =~ m/^([^_]+)/) {
                print $temp;
            }
        }
        Update: One thing I forgot to point out is that there was an error in the code in my earlier reply. I hadn't accounted for the fact that there may not be any underscores at all. That's been fixed in the above snippet.
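
For anyone who wants to try the two rules in isolation, they can be wrapped in a small function and run against a few strings. This is only a sketch: the name extract_prefix and the test inputs are invented here, since the real contents of @strings are not shown in the thread.

```perl
use strict;
use warnings;

# Returns the wanted prefix of $s, or undef when $s is empty or starts
# with an underscore, since neither pattern can match in that case.
sub extract_prefix {
    my ($s) = @_;
    # Two non-underscores, an underscore, then a run of non-underscores:
    # keep everything up to (but not including) the second underscore.
    return $1 if $s =~ /^([^_]{2}_[^_]+)/;
    # Otherwise keep everything up to (but not including) the first underscore.
    return $1 if $s =~ /^([^_]+)/;
    return;
}

# Hypothetical inputs; the original @strings array is not shown in the thread.
for my $s ('ab_cdef_gh', 'abcd_efgh', 'abcd', 'ab_cd') {
    my $prefix = extract_prefix($s);
    print defined $prefix ? "$prefix\n" : "(no match)\n";
}
```

The trailing _? from the snippet above is left out here because it is optional and sits outside the capture group, so it does not change what gets captured. Unlike the snippet, this version prints a newline after each result.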