jmaya has asked for the wisdom of the Perl Monks concerning the following question:

I will skip the usual newbie intro.
Basically, I am writing a script that takes file names and derives content from the names.
Example:
001WhitePottery.jpg: (001) Serial, (White) Name of Product, (Pottery) Category.
I have the (001) part working with (\d+), but I don't know how to tell the regex engine to find a capital letter and read until it finds the next capital. That is basically my problem. Thank you in advance.
John Maya

Replies are listed 'Best First'.
Re: Newbie RegEX question
by suaveant (Parson) on May 13, 2003 at 14:25 UTC
    $file =~ /^(\d+)([A-Z][a-z]+)([A-Z][a-z]+)\.(.*)$/;
    print "Serial   : $1\n";
    print "Name     : $2\n";
    print "Category : $3\n";
    print "Extension: $4\n";
    # though... you may want to change [a-z] to include any other possible chars
    Untested, but should work...
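As one possible illustration of the closing comment above, the sketch below widens [a-z] to also accept digits. The file name 002Blue2Vase.jpg and the choice of [a-z0-9] are assumptions made up for this example, not something from the original question; the right set of characters depends on the actual file names.

```perl
use strict;
use warnings;

# Hypothetical file name with a digit inside the product name.
my $file = '002Blue2Vase.jpg';

# The original pattern only allows lowercase letters after each capital.
my $strict_ok = $file =~ /^(\d+)([A-Z][a-z]+)([A-Z][a-z]+)\.(.*)$/ ? 1 : 0;

# Widened class (an assumption): lowercase letters or digits.
my @parts = $file =~ /^(\d+)([A-Z][a-z0-9]+)([A-Z][a-z0-9]+)\.(.*)$/;

print "Original pattern matched: ", ($strict_ok ? "yes" : "no"), "\n";
print "Widened pattern captured: @parts\n" if @parts;
```

Assigning the match in list context returns the captures directly, so @parts stays empty when the pattern fails.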

                    - Ant
                    - Some of my best work - (1 2 3)

Re: Newbie RegEX question
by jdporter (Paladin) on May 13, 2003 at 14:26 UTC
    Something like this:
    my( $serial, $name, $category ) = $filename =~ /^(\d+)([A-Z][a-z]*)([A-Z][a-z]*)/;
    Or if you want to use the POSIX character classes:
    my( $serial, $name, $category ) = $filename =~ /^(\d+)([[:upper:]][[:lower:]]*)([[:upper:]][[:lower:]]*)/;
    • Updated:
      Normally I don't update my nodes, but Aristotle makes an important point below, and I'd be afraid that someone might cargo-cult my original code, which had [:lower:] instead of [[:lower:]], etc.

    jdporter
    The 6th Rule of Perl Club is -- There is no Rule #6.

      Have you ever used them? The character classes you used will match any of the characters in :epru and :elorw respectively. POSIX classes are used like this:
      /^(\d+)([[:upper:]][[:lower:]]*)([[:upper:]][[:lower:]]*)/;
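To make the difference concrete, here is a small sketch; the test strings are made up for illustration. Written without the outer brackets, [:upper:] is an ordinary bracketed class containing the characters ':', 'u', 'p', 'e' and 'r', and perl warns about it under use warnings (silenced here with no warnings 'regexp').

```perl
use strict;
use warnings;
no warnings 'regexp';   # silence the "POSIX syntax [: :] belongs inside character classes" warning

# Mistaken form: a plain class of the characters : u p e r
my $plain_matches_p = 'p' =~ /^[:upper:]$/ ? 1 : 0;   # 'p' is in the set
my $plain_matches_W = 'W' =~ /^[:upper:]$/ ? 1 : 0;   # 'W' is not

# Correct form: the POSIX class sits inside a bracketed class
my $posix_matches_p = 'p' =~ /^[[:upper:]]$/ ? 1 : 0;
my $posix_matches_W = 'W' =~ /^[[:upper:]]$/ ? 1 : 0;

printf "%-14s p=%d W=%d\n", '[:upper:]',   $plain_matches_p, $plain_matches_W;
printf "%-14s p=%d W=%d\n", '[[:upper:]]', $posix_matches_p, $posix_matches_W;
```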

      Makeshifts last the longest.

Re: Newbie RegEX question
by broquaint (Abbot) on May 13, 2003 at 14:27 UTC
    Assuming your data is as simple as that provided
    $_ = '001WhitePottery.jpg';
    my($serial, $name, $cat) = m< ^ (\d+) ([A-Z][a-z]+) ([A-Z][a-z]+) >x;
    print "($serial) Serial\n",
          "($name) Name of Product\n",
          "($cat) Category\n";
    __output__
    (001) Serial
    (White) Name of Product
    (Pottery) Category
    See perlre and perlop for more info.
    HTH

    _________
    broquaint

      broquaint,
      I agree, as long as "the data is as simple as that provided". I would only add that it might be worthwhile to create a catch bucket, so that the RE can be refined over time to handle more and more atypical cases.
      my @bitbucket;
      $_ = '001WhitePottery.jpg';
      if (m< ^ (\d+) ([A-Z][a-z]+) ([A-Z][a-z]+) >x) {
          my($serial, $name, $cat) = ($1, $2, $3);
          print "($serial) Serial\n",
                "($name) Name of Product\n",
                "($cat) Category\n";
      }
      else {
          # no match: keep the name so the pattern can be refined later
          push @bitbucket, $_;
      }
      if (@bitbucket) {
          print "The following files didn't match rule - please fix\n";
          print "$_\n" foreach (@bitbucket);
      }

      I only point this out as the OP claimed newbie status.

      Cheers - L~R
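
The same catch-bucket idea extends naturally to a list of names. A rough sketch, where the file names in @files are invented for illustration and the pattern is the one from the reply above:

```perl
use strict;
use warnings;

# Hypothetical input; in practice this might come from readdir or glob.
my @files = ('001WhitePottery.jpg', '002bluevase.jpg', '003RedGlass.png');

my (@parsed, @bitbucket);
for my $file (@files) {
    if ($file =~ m< ^ (\d+) ([A-Z][a-z]+) ([A-Z][a-z]+) >x) {
        push @parsed, { serial => $1, name => $2, category => $3 };
    }
    else {
        push @bitbucket, $file;   # keep for later inspection
    }
}

print "$_->{serial}: $_->{name} / $_->{category}\n" for @parsed;
if (@bitbucket) {
    print "The following files didn't match rule - please fix\n";
    print "$_\n" for @bitbucket;
}
```

Collecting the failures in one place makes it easier to see which naming variations the pattern still needs to cover.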