Spooky has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to look for a pattern of at least two but no more than five capital letters, followed by five digits: e.g., AA12345, ABCDE67890. Is it possible to perform this as a one-liner?

Re: pattern matching
by FunkyMonk (Bishop) on Apr 10, 2009 at 12:07 UTC
      Curious. You restrict yourself to the Western, unaccented, letters (ASCII), but you allow hundreds of different digits.

      I would have used either

      /[A-Z]{2,5}[0-9]{5}/ # ASCII ranges
      or
      /\p{Lu}{2,5}\p{Nd}{5}/ # Full Unicode set
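
      The difference described above can be seen with a small sketch. It runs both patterns, plus the same pattern written with \d, against three made-up sample IDs: a plain ASCII one, one with ARABIC-INDIC digits (U+0661 to U+0665), and one with Greek capitals. I would expect \d to accept the Arabic-Indic digits while [0-9] rejects them, and only the \p{Lu} version to accept the Greek letters. The anchors \A and \z are added here only so that the whole string has to match.

```perl
use strict;
use warnings;

# The two patterns from the reply above, plus the \d variant,
# anchored so the whole string must match.
my %pattern = (
    ascii   => qr/\A[A-Z]{2,5}[0-9]{5}\z/,
    d       => qr/\A[A-Z]{2,5}\d{5}\z/,
    unicode => qr/\A\p{Lu}{2,5}\p{Nd}{5}\z/,
);

# Hypothetical sample IDs, written with \x{...} escapes so the source stays ASCII.
my %sample = (
    plain         => 'AB12345',
    arabic_digits => "AB\x{661}\x{662}\x{663}\x{664}\x{665}",  # ARABIC-INDIC DIGIT ONE..FIVE
    greek_letters => "\x{391}\x{392}12345",                     # GREEK CAPITAL ALPHA, BETA
);

my %result;
for my $s (sort keys %sample) {
    for my $p (sort keys %pattern) {
        $result{$s}{$p} = $sample{$s} =~ $pattern{$p} ? 1 : 0;
        printf "%-14s %-8s %s\n", $s, $p, $result{$s}{$p} ? 'match' : 'no match';
    }
}
```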

        Premature generalisation is the root of much evil.

        In practice, since the OP did not mention any character-set complications, it is a reasonable assumption that the task in question does not require matching non-ASCII capital letters or require excluding non-ASCII numerals. So there is no need to worry about Unicode ranges, and distinguishing between \d and [0-9] is splitting hairs.

Re: pattern matching
by Bloodnok (Vicar) on Apr 10, 2009 at 12:14 UTC
    Supplementing FunkyMonk's reply: assuming you want a one-liner to scan a file and print any matching line, you want
    perl -ne 'print if /[A-Z]{2,5}\d{5}/' some_file
    In addition to the references provided by FunkyMonk, also look at perlrun.
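
    For anyone wondering what the one-liner actually does: perlrun documents that -n wraps your code in a while (<>) { ... } loop, so the command above is roughly the script below. This is only a sketch; it reads hypothetical sample lines from an in-memory string instead of some_file, and collects the matching lines before printing them so the result can be checked.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for some_file.
my $input = <<'END';
order AA12345 shipped
no id on this line
ref ABCDE67890 pending
lowercase ab12345 ignored
END

# An in-memory filehandle, so the sketch runs without a real file.
open my $fh, '<', \$input or die "cannot open in-memory file: $!";

# perlrun says -n wraps the -e code in a loop like
#   LINE: while (<>) { ... }
# This is the same loop reading $fh instead of <>, and collecting
# matching lines instead of printing each one immediately.
my @matched;
LINE: while (<$fh>) {
    push @matched, $_ if /[A-Z]{2,5}\d{5}/;    # same test as the one-liner
}
close $fh;

print @matched;
# order AA12345 shipped
# ref ABCDE67890 pending
```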

    A user level that continues to overstate my experience :-))
      Excellent, thanks!
Re: pattern matching
by leocharre (Priest) on Apr 10, 2009 at 14:06 UTC

    I often need to find things like these inside larger chunks of text.
    The question and answers here are great for validating or checking a value, but what if you're fishing these out of a chunk of text?

    For example, if the text you are matching against is $text = 'ABCDE67890';

    then, yes:

    $text =~ /([A-Z]{2,5}\d{5})/ or die;
    # the parentheses "remember" what we matched,
    # so we can get it with $1 later
    print "got $1";

    However, if your text chunk is:

    $text = 'ADABCDE6789023424';
    $text =~ /([A-Z]{2,5}\d{5})/ or die;
    print $1;
    # will print 'ABCDE67890' (I first wrote 'DE67890'; thanks gwadej, see below)

    then you still get a match. Is that the behaviour you want? I don't know the context you are matching in. If you want to check that the *entire* string is *the* pattern, you need to anchor it instead:

    $text =~ /^([A-Z]{2,5}\d{5})$/ or die;

    (^ anchors the match at the start of the string, and $ at the end.)
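
    Here is a sketch of the anchored check as a reusable routine; the name whole_id and the test strings are made up for illustration. It uses \A and \z rather than ^ and $, because $ also matches just before a trailing newline, so /^[A-Z]{2,5}\d{5}$/ accepts "AB12345\n". It also takes the capture in list context instead of reading $1 afterwards, so a failed match simply leaves $id undefined.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the ID if the whole string is
# 2-5 capitals followed by 5 digits, otherwise undef.
sub whole_id {
    my ($text) = @_;
    # \A and \z pin the match to the very start and end of the string;
    # unlike $, \z does not let a trailing newline slip through.
    # In list context a successful match returns the captured group and
    # a failed one returns an empty list, leaving $id undefined.
    my ($id) = $text =~ /\A([A-Z]{2,5}\d{5})\z/;
    return $id;
}

for my $text ('ABCDE67890', 'ADABCDE6789023424', "AB12345\n") {
    my $id        = whole_id($text);
    my $dollar_ok = $text =~ /^[A-Z]{2,5}\d{5}$/ ? 'yes' : 'no';
    (my $shown = $text) =~ s/\n/\\n/;    # make the newline visible
    printf "%-18s whole_id: %-12s ^...\$ match: %s\n",
        $shown, defined $id ? $id : 'undef', $dollar_ok;
}
```

    For these three inputs, whole_id should return ABCDE67890, undef and undef, while the ^...$ version still accepts the string with the trailing newline.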

    If you want to find matches inside a large chunk of text such as:

    This YU123456 is a piece of text and the id code would be AG12345 orAH12345

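    then this pattern will match AG12345 but NOT AH12345 (there is no word boundary between "or" and "AH"), and NOT the YU12345 inside YU123456 (there is no word boundary between the fifth and sixth digits): /\b[A-Z]{2}\d{5}\b/ This pattern, on the other hand, will match all three: /[A-Z]{2}\d{5}/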

    Just something to keep in mind.
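
    Pulling every ID out of the example sentence is a job for m//g in list context, which returns all the matches at once (the whole match, since these patterns have no capturing parentheses). A short sketch using the two patterns from this post:

```perl
use strict;
use warnings;

my $text = 'This YU123456 is a piece of text and the id code would be AG12345 orAH12345';

# m//g in list context returns every non-overlapping match.
my @bounded   = $text =~ /\b[A-Z]{2}\d{5}\b/g;
my @unbounded = $text =~ /[A-Z]{2}\d{5}/g;

print "with \\b:    @bounded\n";      # with \b:    AG12345
print "without \\b: @unbounded\n";    # without \b: YU12345 AG12345 AH12345
```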

      Although I don't disagree with the overall thrust of your argument, there is a tiny mistake.

      However.. if your text chunk is:
      $text='ADABCDE6789023424'; $text=~/([A-Z]{2,5}\d{5})/ or die; print $1; # will print 'DE67890'

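      It actually would print ABCDE67890. Remember that the engine reports the match starting at the leftmost position where the whole pattern can succeed, which here is the 'A' at offset 2; from there {2,5} has to take all five letters, ABCDE, to reach the digits.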
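
      A quick way to check this is to print where the match starts. The sketch below uses $-[0] (the start offset of the last successful match) and also tries the non-greedy form {2,5}? on the same input. Both should report ABCDE67890 at offset 2: since the letters must be followed directly by a digit, only one letter count can work from any given start position.

```perl
use strict;
use warnings;

my $text = 'ADABCDE6789023424';

my %pattern = (
    greedy     => qr/([A-Z]{2,5}\d{5})/,
    non_greedy => qr/([A-Z]{2,5}?\d{5})/,
);

my %found;
for my $name (sort keys %pattern) {
    if ($text =~ $pattern{$name}) {
        my $at = $-[0];    # offset where the overall match began
        $found{$name} = [ $1, $at ];
        print "$name: $1 at offset $at\n";
    }
}
# greedy: ABCDE67890 at offset 2
# non_greedy: ABCDE67890 at offset 2
```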

      G. Wade