Capture Groups

by Bod (Parson)
on Nov 14, 2023 at 20:11 UTC ( [id://11155620] : perlquestion )

Bod has asked for the wisdom of the Perl Monks concerning the following question:

I've had cause to run some code from the console that usually only gets run on a webserver. In its usual environment, it doesn't give me warnings. But, from the console, I get a warning...

$result =~ s/(Mr(s)?) a/\1 A/s;

\1 better written as $1 at...

I have always been under the impression (or perhaps illusion) that \1 should be used within the regular expression and $1 outside of it. Like this:

if ($test =~ /some_test/) { $foo = $1; }

Has something changed in the preferred way to capture in a regular expression, or have I been doing it wrong all the time?

Re: Capture Groups
by jo37 (Deacon) on Nov 14, 2023 at 20:29 UTC
    I have always been under the impression (or perhaps illusion) that \1 should be used within the regular expression and $1 outside of it.

    That is correct. But you used \1 in the replacement part of a substitution, which is certainly outside the regex.

    Greetings,
    -jo

    $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$
      But you used \1 in the replacement part of a substitution

      Oh!
      So it's a conceptual problem...I was thinking of the whole of the substitution as being part of the regex...

        It's part of the substitution operator, but the regular expression pattern passed to that operator is (Mr(s)?) a.
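The distinction drawn above can be sketched as follows. This is a minimal illustration, not code from the thread: the sample strings and the variable names `$doubled` and `$has_repeat` are made up for the example. It uses `$1` in the replacement half of `s///` and keeps `\1` for a backreference inside a pattern. Perl only emits the "\1 better written as $1" diagnostic when warnings are enabled.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the thread.
my $result = 'Dear Mrs a. smith';

# The pattern is (Mr(s)?) a; everything after the second slash is the
# replacement, which is ordinary interpolated text, so use $1 there.
$result =~ s/(Mr(s)?) a/$1 A/s;
print "$result\n";    # prints "Dear Mrs A. smith"

# Inside the pattern itself, \1 refers back to what group 1 matched.
# Here it detects a word repeated twice in a row.
my $doubled    = 'the the cat';
my $has_repeat = $doubled =~ /\b(\w+) \1\b/ ? 1 : 0;
print "$has_repeat\n";    # prints "1"
```

With warnings enabled, writing `\1` in the replacement instead of `$1` still works, but produces the warning quoted in the question.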

Re: Capture Groups
by eyepopslikeamosquito (Archbishop) on Nov 14, 2023 at 21:52 UTC

    While jo37 has answered your specific question, I was wondering if this question is related to last year's Split first and last names ... and if you have any update on your learnings from your long name parsing journey.

    👁️🍾👍🦟
      I was wondering if this question is related to last year's Split first and last names

      No connection at all...
      But bonus points for your memory recall and for joining similar dots!

      any update on your learnings from your long name parsing journey

      I did create an internal discussion document and went on to create a parser for dealing with name strings and splitting them up reasonably well. But, given that they are difficult to split programmatically with certainty, we parse the string then show the user how we have split them, allowing them to adjust as their superior human brain sees fit. The exception is where the names are a known firstname (looked up from a long list) and a single surname. Then we don't show the parse results, but the user can adjust if they think it's appropriate.

      It is working well for the low volumes of traffic we currently have.

      Our roadmap includes adding AI to this parsing process. It is something that AI should be as good as a human at doing. At least, nearly as good as a human. So far I have written a couple of prompts and fed the AI a variety of tricky names to split up into their component parts and the results look promising. The AI is formatting them nicely as JSON so we should be able to deal with the results.

      It's parsed Johannes Adam Ferdinand Alois Josef Maria Marko d'Aviano Pius von und zu Liechtenstein and told me that I don't have enough fields to properly accommodate all the constituent parts. But I doubt the ruler of Liechtenstein will need us to store his name!

        Can it handle Picasso's full name ? (̿▀̿‿ ̿▀̿ ̿)
        Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso

                        "If it happens once, it's a bug. If it happens twice, it's a feature. If it happens more than twice, it's a design philosophy."

Re: Capture Groups
by eyepopslikeamosquito (Archbishop) on Nov 15, 2023 at 04:05 UTC

    $result =~ s/(Mr(s)?) a/\1 A/s;

    While this is presumably just a cut-down version - used to ask about the unexpected warning you were seeing - your posted sample code made me pull a face ... so if you were to post the complete production regex(en) you are using, we might be able to offer suggestions to improve them. :)

    👁️🍾👍🦟