sas429s has asked for the wisdom of the Perl Monks concerning the following question:

Hi, can someone help me understand the regex below? I know there are 3 letters followed by 2 numbers and a space; after that I don't know. Also, what is the join doing here?
    } elsif($line =~ /interlock/) {
        @words = split '[a-zA-Z][a-zA-Z][a-zA-Z]\d\d,\s+\interlock +', $line, 2;
        $line = join '', @words[0],"$date_intlck_rel_res";
        print OUT $line;
Thanks

Replies are listed 'Best First'.
Re: Meaning of the regex?? Help!!
by chakram88 (Pilgrim) on Jan 30, 2008 at 15:19 UTC

    Allow me to introduce you to a handy CPAN module that I discovered by way of another kind monk some time ago (paying it forward, if you will, even though I no longer recall who pointed me in this direction).

    YAPE::Regex::Explain is a handy tool for explaining regular expressions. I use it frequently when I come across a regex that I don't understand. Here's a quick example based on your request:

    #!/usr/bin/perl -wl
    use YAPE::Regex::Explain;
    my $regex = qr/[a-zA-Z][a-zA-Z][a-zA-Z]\d\d,\s+\interlock/;
    print YAPE::Regex::Explain->new($regex)->explain;

    You will see that the \interlock yields an "Unrecognized escape '\i' passed through" warning, which YAPE ignores.

    moritz pointed this out in an earlier reply.

    YAPE will produce the following output. (Notice that the '\i' has been removed from the regex.)

    The regular expression:

    (?-imsx:[a-zA-Z][a-zA-Z][a-zA-Z]\d\d,\s+interlock)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      [a-zA-Z]                 any character of: 'a' to 'z', 'A' to 'Z'
    ----------------------------------------------------------------------
      [a-zA-Z]                 any character of: 'a' to 'z', 'A' to 'Z'
    ----------------------------------------------------------------------
      [a-zA-Z]                 any character of: 'a' to 'z', 'A' to 'Z'
    ----------------------------------------------------------------------
      \d                       digits (0-9)
    ----------------------------------------------------------------------
      \d                       digits (0-9)
    ----------------------------------------------------------------------
      ,                        ','
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      interlock                'interlock'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
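
To see the "passed through" behaviour for yourself, here is a minimal sketch. The sample string is made up, not taken from the original post, and the regexp warning is switched off only so the demo runs quietly. The unrecognized \i is treated as a plain "i", so the pattern still matches the word interlock.

```perl
use strict;
use warnings;

# Hypothetical sample text; not from the original post.
my $text = 'ABC12, interlock';

my $matched;
{
    # Silence "Unrecognized escape \i passed through" for this demo only.
    no warnings 'regexp';
    $matched = $text =~ /[a-zA-Z][a-zA-Z][a-zA-Z]\d\d,\s+\interlock/ ? 1 : 0;
}

# The same pattern without the stray backslash behaves identically.
my $matched_clean = $text =~ /[a-zA-Z][a-zA-Z][a-zA-Z]\d\d,\s+interlock/ ? 1 : 0;

print "with \\i:    $matched\n";
print "without \\i: $matched_clean\n";
```

Dropping the backslash is the safer fix, since perldiag notes that the meaning of an unrecognized escape may change in a future version of Perl.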
Re: Meaning of the regex?? Help!!
by moritz (Cardinal) on Jan 30, 2008 at 13:56 UTC
    It's "three letters, two digits, followed by a comma, by one or more whitespace characters, and" then... uhm... there is no \i defined for perl regexes. So probably just ".. the word interlock".

    split looks for this pattern and returns the part to the left of the first match, followed by everything after the first match (the 2 at the end of the line limits it to two items, so the remainder is not split any further).

    The join concatenates the first part of the previous expression (that is, the part of the string before the first match) with $date_intlck_rel_res.
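
A small sketch of the split and join steps described here. The input line and the contents of $date_intlck_rel_res are invented for illustration, and the stray backslash before "i" is left out of the pattern.

```perl
use strict;
use warnings;

# Hypothetical input line and replacement text, made up for this sketch.
my $line = 'pump 7 status: ABC12, interlock   tripped at 10:15';
my $date_intlck_rel_res = 'interlock released 2008-01-30';

# Same pattern as in the question, minus the stray backslash before "i".
# The limit of 2 makes split stop after the first match.
my @words = split /[a-zA-Z][a-zA-Z][a-zA-Z]\d\d,\s+interlock +/, $line, 2;

printf "before match: '%s'\n", $words[0];
printf "after match:  '%s'\n", $words[1];

# join with an empty separator simply glues the pieces together.
my $new_line = join '', $words[0], $date_intlck_rel_res;
print "$new_line\n";
```

The text matched by the pattern itself is discarded, which is why the new line keeps only the prefix and then the replacement text.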

    (Update: I forgot the digits... johngg++)

Re: Meaning of the regex?? Help!!
by toolic (Bishop) on Jan 30, 2008 at 14:17 UTC
    If you use warnings;, you should get a warning message stating:
    Scalar value @words[0] better written as $words[0]

    So, you should use $words[0].

    You may find it more straightforward to just use the concatenation operator, instead of join:

    $line = $words[0] . $date_intlck_rel_res;
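
As a quick check that the two forms give the same string, here is a minimal sketch. The values in @words and $date_intlck_rel_res are placeholders, not data from the original script.

```perl
use strict;
use warnings;

# Placeholder values standing in for the result of the earlier split.
my @words = ('pump 7 status: ', 'tripped at 10:15');
my $date_intlck_rel_res = 'interlock released 2008-01-30';

# $words[0] is a single scalar element; @words[0] would be a one-element
# slice, which is what triggers the warning.
my $via_join   = join '', $words[0], $date_intlck_rel_res;
my $via_concat = $words[0] . $date_intlck_rel_res;

print $via_join eq $via_concat ? "same result\n" : "different result\n";
```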
Re: Meaning of the regex?? Help!!
by johngg (Canon) on Jan 30, 2008 at 14:58 UTC
    There's a couple of other things to note.

    • You can use a {n} quantifier to get an exact number of things, so the [a-zA-Z][a-zA-Z][a-zA-Z] could be written [a-zA-Z]{3}. (You can also use {m,n} for a range of occurrences and {m,} for m or more occurrences, but before Perl 5.34 you can't do {,n} for up to n occurrences; {0,n} works in every version.)

    • split is most commonly used with a pattern as its first argument, so your split '...', ... should perhaps be split /.../, .... You can have an expression as the first argument, but that is usually for patterns that may change at run time; since your expression is a single-quoted string which will not change, you should be using a pattern. (Read the documentation for the special case of splitting on the empty string ''.)
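
Both points can be tried out with a short sketch. The test strings are invented, and the anchors ^ and $ are added here only so that each string is checked as a whole.

```perl
use strict;
use warnings;

# The long-hand pattern and its {n} equivalent.
my $long  = qr/^[a-zA-Z][a-zA-Z][a-zA-Z]\d\d$/;
my $short = qr/^[a-zA-Z]{3}\d{2}$/;

my @agree;
for my $s ('ABC12', 'AB123', 'abcd1') {    # made-up test strings
    my $l = $s =~ $long  ? 1 : 0;
    my $r = $s =~ $short ? 1 : 0;
    push @agree, $l == $r ? 1 : 0;
    printf "%-6s long=%d short=%d\n", $s, $l, $r;
}

# split with a pattern literal instead of a single-quoted string.
my ($head, $tail) =
    split /[a-zA-Z]{3}\d{2},\s+interlock +/, 'xx ABC12, interlock yy', 2;
printf "head='%s' tail='%s'\n", $head, $tail;
```

Writing the pattern between slashes also makes it obvious to the next reader that the first argument is a regex rather than a literal separator string.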

    I hope this is of interest.

    Cheers,

    JohnGG

Re: Meaning of the regex?? Help!!
by planetscape (Chancellor) on Jan 31, 2008 at 05:31 UTC