rmflow has asked for the wisdom of the Perl Monks concerning the following question:

Is it possible in Perl to do something like this:

/($someRegex)/i;

where $someRegex is a regex and, if it matches, $1 should contain the result with the same case as in the regex, i.e.:
$_ = "aAbc"; $someRegex = "Aa.*";
after performing

/($someRegex)/i;

I need $1 to somehow equal "Aabc", not "aAbc".


The objective is to make some rules for formatting of certain texts, for example:
rule: PowerGenerator\d+ text: pOwerGeNERator53

the text should be transformed to PowerGenerator53

other example:
rule: Data\d+Bus_[ABC]\d+ text: DATA5Bus_b3

should be converted to Data5Bus_B3

If the text does not match the rule, then no changes should be made.

upd: edited misspoken code
upd2: objective examples added

Re: regex capture case
by moritz (Cardinal) on Jun 23, 2009 at 07:09 UTC
    In Perl 6 there's the :samecase regex modifier for a similar purpose (though it applies to substitutions and not captures). I implemented that functionality in Perl6::Str. If you don't want to use the whole module, you can still get some inspiration on how to implement such a function.
Re: regex capture case
by grizzley (Chaplain) on Jun 23, 2009 at 07:15 UTC
    No, it isn't possible. $1 is a part of the text from the $_ variable, so whatever is in $_ will be in $1.
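A minimal sketch of that point, using the values from the question. The variable names $text and $captured are only illustrative:

```perl
use strict;
use warnings;

my $text      = 'aAbc';
my $someRegex = 'Aa.*';

# /i lets "Aa" in the pattern match "aA" in the text, but the capture
# is copied from the text, so it keeps the text's case.
my ($captured) = $text =~ /($someRegex)/i;

print "captured: $captured\n";    # prints "captured: aAbc"
```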
Re: regex capture case
by chomzee (Beadle) on Jun 23, 2009 at 07:22 UTC
    Can't you just modify "aAbc" into "Aabc" afterwards? You could use substr to cut off the first two chars and then concatenate "Aa" with the result of substr.
      The nature of the regex is not known; this is just an example. To do what you say I would need to fully parse the regex, and that would not be a "simple solution". I mean, I would need to distinguish regexes like "Dd" and "D\d", etc.
Re: regex capture case
by Crian (Curate) on Jun 23, 2009 at 10:26 UTC

    I don't know if it is the right solution for your problem, but did you think about using ucfirst on the result?

      what if
      $_ = "Aabc"; $someRegex = "aA.*";
      then I'll need to uc the second char.

      $_ and $someRegex are not known.

        What about

        $someRegex = "(aA|Aa).*";

        ...what effect should that have? Or what about

        $someRegex = "(?!aa)[Aa]+.*"

        Basically, what you're asking for is not feasible in the general case. If you know that $someRegex will always be very simple, it becomes tractable; you can use something like YAPE::Regex to parse the regex and match elements thereof to the extracted string to determine which characters' case needs to change. But for arbitrary regexes, it gets hard quickly.

        Depending on what you actually need this for, you may find it simpler to request both a regex and a case template, or to have your input take the form of a restricted pattern that you yourself can then translate into a regex and a case template.
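One way the "restricted pattern" idea could look is sketched below. It is only a sketch under stated assumptions: the helper name normalize is hypothetical, a rule may contain nothing but literal text, \d and bracketed classes such as [ABC] (each optionally followed by + or *), and the whole text must match the rule. Literal runs take their spelling from the rule; a class that lists only upper-case letters forces upper case, and one that lists only lower-case letters forces lower case.

```perl
use strict;
use warnings;

# Hypothetical helper: returns $text recased according to $rule,
# or $text unchanged when it does not match the rule.
sub normalize {
    my ($rule, $text) = @_;
    my $pattern = '';
    my @fix;    # one code ref per capture group, in order

    # Tokenize the rule: \d token, [class] token, or a run of literal text.
    while ($rule =~ /\G(?:(\\d[+*]?)|(\[[^\]]+\][+*]?)|([^\\\[]+))/gc) {
        my ($digits, $class, $literal) = ($1, $2, $3);
        if (defined $literal) {
            $pattern .= '(' . quotemeta($literal) . ')';
            push @fix, sub { $literal };    # use the rule's own spelling
        }
        elsif (defined $class) {
            $pattern .= "($class)";
            push @fix, $class !~ /[a-z]/ ? sub { uc $_[0] }
                     : $class !~ /[A-Z]/ ? sub { lc $_[0] }
                     :                     sub { $_[0] };
        }
        else {
            $pattern .= "($digits)";
            push @fix, sub { $_[0] };       # digits have no case
        }
    }
    die "unsupported rule: $rule\n"
        unless @fix && (pos($rule) // 0) == length $rule;

    # Match case-insensitively, then rebuild the text piece by piece.
    my @caps = $text =~ /\A$pattern\z/i or return $text;
    return join '', map { $fix[$_]->($caps[$_]) } 0 .. $#caps;
}

print normalize('PowerGenerator\d+',   'pOwerGeNERator53'), "\n";  # PowerGenerator53
print normalize('Data\d+Bus_[ABC]\d+', 'DATA5Bus_b3'),      "\n";  # Data5Bus_B3
print normalize('PowerGenerator\d+',   'newPowerGenErATor6'), "\n"; # unchanged
```

Anything outside that tiny grammar (alternation, lookaheads, nested groups) makes the helper die rather than guess, which is the point of restricting the pattern language in the first place.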

Re: regex capture case
by Marshall (Canon) on Jun 23, 2009 at 12:32 UTC
    Your match statement /($regex)/i means: match any of Aa, aA, aa, AA followed by zero or more characters. The /i means case-insensitive. Also, "Aa.*" does not anchor the expression to the front, so "Baababoo" would match as well. Anchor the regex with the ^ character: "^Aa.*".

    But it appears that to make this work, you should just delete the "i". I am assuming that you misspoke and ($regex) should be ($someRegex). Of course it is possible that I've misunderstood your intent.

    Aa.* means 'A', then 'a', then anything, and "anything" is by definition case-insensitive.
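A short sketch of how /i and the ^ anchor behave on the strings mentioned here. The helper name capture is made up for the illustration:

```perl
use strict;
use warnings;

# Returns the captured text, or undef when the pattern does not match.
sub capture {
    my ($text, $re) = @_;
    my ($hit) = $text =~ /($re)/;
    return $hit;
}

my $loose    = capture('Baababoo', qr/Aa.*/i);   # 'aababoo': matches mid-string
my $anchored = capture('Baababoo', qr/^Aa.*/i);  # undef: ^ pins the match to the start
my $no_i     = capture('aAbc',     qr/Aa.*/);    # undef: without /i there is no "Aa" here
my $with_i   = capture('aAbc',     qr/Aa.*/i);   # 'aAbc': matches, case comes from the input
```

Note that with the question's data, dropping the "i" makes the match fail outright instead of changing the case of the capture.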

Re: regex capture case
by dsheroh (Monsignor) on Jun 23, 2009 at 12:20 UTC
    As already noted by grizzley, $1 will contain the text from $_ which matched the regex and the text will be in exactly the same form as it appeared in $_. You will need to first extract the matching text with the regex and then carry out any necessary alterations of the match.

    If you tell us the rules used to determine what alterations need to be made to $1, then we may be able to suggest the most efficient ways of accomplishing that, but the regex itself will not be able to do that for you.

      If you tell us the rules used to determine what alterations need to be made to $1, then we may be able to suggest the most efficient ways of accomplishing that

      The objective is to make some rules for formatting of certain texts, for example:
      rule: PowerGenerator\d+ text: pOwerGeNERator53

      the text should be transformed to PowerGenerator53

      other example:
      rule: Data\d+Bus_[ABC]\d+ text: DATA5Bus_b3

      should be converted to Data5Bus_B3

      If the text does not match the rule, then no changes should be made.

        Given your rules, this code seems to do what you want, but it uses a string eval, which should be treated with caution.

        use strict;
        use warnings;

        my @phrases = (
            q{Supply from pOwerGeNERator53 today.},
            q{DATA5Bus_C3 routed via PoweRgeNerator71 to data17buS_a3},
            q{The newPowerGenErATor6 will not change},
        );

        my %rules = (
            q{(?i)\bpowergenerator(\d+)\b}
                => q{qq{PowerGenerator$1}},
            q{(?i)\bdata(\d+)bus_([ABC])(\d+)\b}
                => q{qq{@{ [ qq{Data$1Bus_} . uc $2 . $3 ] }}},
        );

        foreach my $phrase ( @phrases )
        {
            print qq{Original: $phrase\n};
            my $newPhrase = $phrase;
            foreach my $rule ( keys %rules )
            {
                $newPhrase =~ s{$rule}{ eval $rules{ $rule } }eg;
            }
            print qq{ Amended: $newPhrase\n\n};
        }

        The output.

        Original: Supply from pOwerGeNERator53 today.
         Amended: Supply from PowerGenerator53 today.

        Original: DATA5Bus_C3 routed via PoweRgeNerator71 to data17buS_a3
         Amended: Data5Bus_C3 routed via PowerGenerator71 to Data17Bus_A3

        Original: The newPowerGenErATor6 will not change
         Amended: The newPowerGenErATor6 will not change
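If the string eval is a worry, the same two rules could probably be written with code refs instead. This is a sketch, not a drop-in replacement: the names @rules and amend are made up here, and each rule's sub is assumed to receive up to three captures.

```perl
use strict;
use warnings;

# Each rule pairs a compiled pattern with a code ref that rebuilds the match.
# An array (rather than a hash) keeps the order of application fixed.
my @rules = (
    [ qr{\bpowergenerator(\d+)\b}i
        => sub { "PowerGenerator$_[0]" } ],
    [ qr{\bdata(\d+)bus_([ABC])(\d+)\b}i
        => sub { "Data$_[0]Bus_" . uc( $_[1] ) . $_[2] } ],
);

# Apply every rule to the phrase and return the amended copy.
sub amend {
    my ($phrase) = @_;
    for my $rule (@rules) {
        my ( $re, $fix ) = @$rule;
        $phrase =~ s{$re}{ $fix->( $1, $2, $3 ) }ge;
    }
    return $phrase;
}

print amend(q{DATA5Bus_C3 routed via PoweRgeNerator71 to data17buS_a3}), "\n";
# prints "Data5Bus_C3 routed via PowerGenerator71 to Data17Bus_A3"
```

The replacement is still evaluated as code because of /e, but it is code compiled once with the program rather than text assembled and evaluated at run time.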

        I hope this is of interest.

        Cheers,

        JohnGG