seaver has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I'm pretty sure this is obvious to some, but I can't find anything on the topic right now, and home beckons, so here I write:

    @group1 = ('A','B','C','D');
    @group2 = ('E','F','G','H');
    $string = 'AHIGURBOSUOGUBSREGPIUABGAPPRSIUBGSR';
    $string =~ /([@group1]{5}[@group2]{2})/;
    print "$1\n";
Basically, I'm looking for a pattern where the first five letters are any letters in @group1 and the next two letters are any letters in @group2.

I know how to do this by just typing out all the requisite letters between '|' symbols, but when handling long patterns in even longer sequences, I'd like to be able to call @group1 or @group2 at any time.

Cheers
Sam

Replies are listed 'Best First'.
Re: Using array slices during string matching
by davido (Cardinal) on Oct 08, 2003 at 03:30 UTC
    Your question is a little vague. In the subject you ask about array slices, and in the example code and text of the question there is no mention of array slices. But I think I understand where you're going. Here are some illustrations. Forgive me for having taken liberties with your code. I did so with the objective of isolating the issues at hand.

    First, interpolating an array into a character class in a regular expression.... You might think it to be as simple as /[@array]{n}/. But you would be wrong. Consider the following snippet:

        use strict;
        use warnings;
        my @group1 = ('A','B','C','D');
        my @group2 = ('E','F','G','H');
        my $string = "DEAB CDGHEF";
        print "Match: $1\n" while $string =~ /([@group1]{4})/g;
    Why on earth are you getting output of "AB C"? Well, remember that arrays interpolated into strings have their elements joined with the list separator, $", which defaults to a single space. So you've interpolated the values of @group1 AND the space character (a few times) into the character class. That's easy to fix:
    local $" = '';
    placed before the regexp match will do it, but don't forget that the list separator has now been set to an empty string, and will remain so until "local" falls out of scope or you change it back. I used local in this case so that at very worst $" would revert back to its old self when its changed definition falls out of scope.
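
    To see the effect in isolation, here is a minimal sketch (not part of the original reply) that prints what "[@group1]" expands to with the default list separator, with $" localized to an empty string inside a bare block, and again after the block ends. The variable names $default, $scoped and $after are made up for this illustration.

```perl
use strict;
use warnings;

my @group1 = ('A', 'B', 'C', 'D');

# With the default list separator (a single space),
# the elements are joined with spaces.
my $default = "[@group1]";

my $scoped;
{
    local $" = '';           # only in effect until the end of this block
    $scoped = "[@group1]";   # elements are joined with nothing in between
}

# Outside the block, $" is back to its previous value.
my $after = "[@group1]";

print "$default\n$scoped\n$after\n";
# prints:
# [A B C D]
# [ABCD]
# [A B C D]
```

    Wrapping the change in a block keeps the modified $" from leaking into unrelated string interpolations elsewhere in the program.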

    Since it's kinda ugly goofing around with $" just so that your regexp will work the way you want it to (will you really remember 6 months down the road that $" makes it so that your character class won't match space characters?) you should probably instead pre-concatenate the list contained in @group1 like this:

        my @group1 = ('A','B','C','D');
        my $char_class1 = join "", @group1;   # "ABCD", no separators
        my $string = "DEABCDGHEF";
        print "Match: $1\n" while $string =~ /([$char_class1]{4})/g;
    Ok, that's taken care of. But what if you are dead set on interpolating an array into a regexp character class, and more specifically, you really want to interpolate an array slice (as the question asked)? Brace yourself, you need a really ugly construct known as ...... er... I can't remember what it's called, which is all for the best because you should forget you saw it. Here's how to use it inside of a character class. You're not going to like it:
        use strict;
        use warnings;
        my @group1 = ('A','B','C','D');
        my @group2 = ('E','F','G','H');
        my $string = "DEABCDGHEF";
        local $" = '';
        print "Match: $1\n" while $string =~ /([@{[@group1[0..2],@group2[2,3]]}]{5})/g;
    If the sheer ugliness and unmaintainability of that construct doesn't scare you away from it you should probably keep your day job.

    Here's how to do what you're trying to do without jumping through voodoo hoops:

        use strict;
        use warnings;
        my @group1 = ('A','B','C','D');
        my @group2 = ('E','F','G','H');
        my $string = "DEABCDGHEF";
        my $group = join '', @group1[0..2], @group2[2,3];
        print "Match: $1\n" while $string =~ /([$group]{5})/g;

    Hope this helps! I'll leave it to you to adapt that final solution to your needs. It doesn't match the attempted functionality of your original code snippet, but learn from the technique; it should be the answer to your question, and can easily be applied to the snippet you used to illustrate your need.
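
    For what it's worth, one possible adaptation to the original question might look like the sketch below. It is only an illustration under a few assumptions: the names $class1, $class2 and $pattern are invented here, the test string is borrowed from Roger's reply in this thread, and every array element is assumed to be a plain letter (characters such as '-', '^', ']' or '\' would need escaping before going into a character class).

```perl
use strict;
use warnings;

my @group1 = ('A', 'B', 'C', 'D');
my @group2 = ('E', 'F', 'G', 'H');

# Build each character class body once; join avoids touching $".
my $class1 = join '', @group1;   # "ABCD"
my $class2 = join '', @group2;   # "EFGH"

# Five characters from group 1 followed by two from group 2.
my $pattern = qr/([$class1]{5}[$class2]{2})/;

# Test string borrowed from Roger's reply.
my $string  = 'ABDCDEFABDECAFBBCDAGH';

# In list context, a /g match returns every captured substring.
my @matches = $string =~ /$pattern/g;

print "Match: $_\n" for @matches;
# prints:
# Match: ABDCDEF
# Match: BBCDAGH
```

    Compiling the pattern once with qr// means it can be reused against many strings without rebuilding the character classes each time.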


    Dave


    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein
Re: Using array slices during string matching
by Roger (Parson) on Oct 07, 2003 at 23:25 UTC
    Perhaps you need to be a bit more specific about how you define your matching patterns. You said the first five letters are any letters in @group1. Should that be rephrased as: the first five letters should be any letters in @group1, but can also include other letters that are not in @group2? Otherwise, your example already works well for patterns whose first five characters come only from @group1 and whose next two characters come only from @group2.

        @group1 = ('A','B','C','D');
        @group2 = ('E','F','G','H');
        $string = 'ABDCDEFABDECAFBBCDAGH';
        while ($string =~ /([@group1]{5}[@group2]{2})/g) {
            print "$1\n";
        }
    And the output:
        ABDCDEF
        BBCDAGH
Re: Using array slices during string matching
by Abigail-II (Bishop) on Oct 08, 2003 at 11:16 UTC
    You almost got it. The only problem you have is that the interpolation of the arrays gives you:
    /([A B C D]{5}[E F G H]{2})/
    so you have this extra space that can match. To fix this, set the variable $" to the empty string:
    $" = "";
    You may want to do this scoped, by using 'local'.
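
    A minimal sketch of the scoped version, assuming the match is wrapped in a subroutine (find_runs is a hypothetical name, and the test string is made up to contain a space that the unfixed pattern would wrongly accept):

```perl
use strict;
use warnings;

my @group1 = ('A', 'B', 'C', 'D');
my @group2 = ('E', 'F', 'G', 'H');

# Hypothetical helper: returns every run of five @group1 letters
# followed by two @group2 letters found in $string.
sub find_runs {
    my ($string) = @_;
    local $" = '';   # reverts automatically when the sub returns
    return $string =~ /([@group1]{5}[@group2]{2})/g;
}

# With the default $", "AB CDEF" would match because of the space.
my @runs = find_runs('AB CDEFABCDAGH');

print "$_\n" for @runs;
# prints:
# ABCDAGH
```

    Because the change is made with local inside the subroutine, code outside find_runs still sees the default single-space separator.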

    Abigail

      Guys,

      thank you!

      I admit I got the wrong idea with the 'array slices' term, and simply using a string of letters with no spaces in between, i.e.:

          $group1 = 'ABCDE';
          $group2 = 'FGHIJ';
      is what I should do.

      Cheers
      Sam