in reply to Leaking Regex Captures

I am not sure what you want.
It would help if you gave an OUTPUT section showing the results you expect, just as you have a DATA section.
Why does this have to be so complex?
Update: small formatting change.
#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
    print "testing: $_";
    chomp;
    my @digits = m/\d+/g;
    print "digits only: @digits\n";
    my @numletters = m/\d[^\d]+/g;
    print "digits_and_letters:@numletters\n\n";
}
#Prints:
#testing: 1c
#digits only: 1
#digits_and_letters:1c
#
#testing: 2w
#digits only: 2
#digits_and_letters:2w
#
#testing: 2c3w
#digits only: 2 3
#digits_and_letters:2c 3w
#
#testing: 1w1w
#digits only: 1 1
#digits_and_letters:1w 1w
#
#testing: 1w2r
#digits only: 1 2
#digits_and_letters:1w 2r
#
#testing: 2r1c
#digits only: 2 1
#digits_and_letters:2r 1c
__DATA__
1c
2w
2c3w
1w1w
1w2r
2r1c

Re^2: Leaking Regex Captures
by SuicideJunkie (Vicar) on Aug 05, 2009 at 15:33 UTC
    Note that this is closely related to: Re: Regex - Matching prefixes of a word

    The original goal of the regex is to match a command string similar to:
    beam 15 crew 5 wounded 2 critical to S.S.Kevorkian
    The number-type pairs are optional and may appear in any order, provided that at least one pair is present. (No point in beaming nobody over.)

    Hence the (\d+)\s*literals form of each piece,
    and the (?: (capture)X | (capture)Y | (capture)Z )+ overall structure.
    Wrapped around that structure is a /^(?:$regexSubstringOf{beam}|$regexSubstringOf{transport}\s* )\s*(?:$structure)\s+(?:to\s+)?$regexObjectName\s*$/i

    And then it all ends up in an addCommand('transport', {crew=>$1,wound=>$2,crit=>$3},$4) if $cmd =~ /regex/i; ($4 is the ship name, captured by the $regexObjectName)


    To work around the problem, I capture the whole pair, and then inside the addCommand() function I run a further regex, s/\D//g, over the hash values that are defined.
    I also have to add a negative lookahead in the captures to prevent '5 crit' from matching as a substring of 'crew' (as "5cr") and stomping the $1 value before backtracking kicks in.
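A minimal sketch of that workaround, under stated assumptions: %prefixOf is a made-up stand-in for the real $regexSubstringOf{...} patterns, parseTransport() is a hypothetical wrapper, the command is hard-wired to "beam", and this addCommand() only returns a hash reference so the result can be inspected. It is meant to show the shape of the idea, not the actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for the $regexSubstringOf{...} patterns: each accepts
# a prefix of its word. The trailing (?![a-z]) is the negative lookahead; it
# stops the "cr" of "crit" from being accepted as a prefix of "crew".
my %prefixOf = (
    crew  => qr/c(?:r(?:ew?)?)?(?![a-z])/i,
    wound => qr/w(?:o(?:u(?:n(?:d(?:ed?)?)?)?)?)?(?![a-z])/i,
    crit  => qr/cri(?:t(?:i(?:c(?:al?)?)?)?)?(?![a-z])/i,
);

# Each capture holds the whole pair ("15 crew"), not just the number, and the
# lookahead sits inside the capture so a failed attempt never closes the group.
my $structure = qr/(?:\s*(?: (\d+\s*$prefixOf{crew})
                           | (\d+\s*$prefixOf{wound})
                           | (\d+\s*$prefixOf{crit}) ))+/x;

# Stand-in for the real addCommand(): strips the non-digits from every defined
# count and returns the command as a hash reference.
sub addCommand {
    my ($name, $counts, $target) = @_;
    for my $value (values %$counts) {
        $value =~ s/\D//g if defined $value;    # "15 crew" becomes "15"
    }
    return { name => $name, counts => $counts, target => $target };
}

# One regex guarding one call to addCommand(), as in the real code.
sub parseTransport {
    my ($cmd) = @_;
    return addCommand('transport', { crew => $1, wound => $2, crit => $3 }, $4)
        if $cmd =~ /^beam\s*$structure\s+(?:to\s+)?(\S+)\s*$/i;
    return;
}

for my $cmd ('beam 15 crew 5 wounded 2 critical to S.S.Kevorkian',
             'beam 5 crit to S.S.Kevorkian') {
    my $parsed = parseTransport($cmd) or next;
    printf "%s: crew=%s wound=%s crit=%s\n", $parsed->{target},
        map { $_ // '-' } @{ $parsed->{counts} }{qw(crew wound crit)};
}
```

The calling line stays a one-liner; the cost is the extra s/\D//g pass inside addCommand().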



    To sum up: I want the numbers out of those pairs, with $1 = number of healthy crew, $2 = number of wounded, $3 = number of critically injured.
    How I get them is not important. If a pair appears more than once in the command string, I don't care which copy gets picked, although consistency is desirable. The last one is better than the first, since a user who makes a mistake can then just keep typing instead of backspacing to change the number.

      Well, how about this?
      #!/usr/bin/perl -w
      use strict;

      while (<DATA>)
      {
          print "testing: $_";
          chomp;
          my @pairs = m/(\d+)\s+(\w+)/g;
          print "@pairs\n\n";
      }
      #Prints:
      #testing: beam 15 crew 5 wounded 2 critical to S.S.Kevorkian
      #15 crew 5 wounded 2 critical
      #
      #testing: oh, my gosh, darn 5 killed 2 want_sex_change 10 drunk
      #5 killed 2 want_sex_change 10 drunk
      #
      #testing: what a day:5 wounded 2 critical 20 crew
      #5 wounded 2 critical 20 crew
      #
      #testing: 20 crew and 6 killed and 14 MIA
      #20 crew 6 killed 14 MIA
      __DATA__
      beam 15 crew 5 wounded 2 critical to S.S.Kevorkian
      oh, my gosh, darn 5 killed 2 want_sex_change 10 drunk
      what a day:5 wounded 2 critical 20 crew
      20 crew and 6 killed and 14 MIA

        That would involve a lot of post-processing to match up the numbers with the categories and filter the categories to just the valid ones ('crew', 'wounded' and 'crit'). And it can't be inserted into a larger regex match.

        (A lot of work, compared to just: "passing $1, $2, ... $N and some constants into the addCommand() function if and only if the regex matches")
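For illustration only, the post-processing meant here might look roughly like the sketch below. The %categoryFor table and the countsFromPairs() helper are made-up names, and the accepted words are assumed to be spelled out in full, which the real commands do not require.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table mapping the accepted words to the hash keys that
# addCommand() expects; anything else ("killed", "MIA", ...) is dropped.
my %categoryFor = (crew => 'crew', wounded => 'wound', critical => 'crit');

# Pull every number/word pair out of the command, then match the numbers up
# with their categories and keep only the valid ones.
sub countsFromPairs {
    my ($cmd) = @_;
    my %counts;
    my @pairs = $cmd =~ m/(\d+)\s+(\w+)/g;
    while (my ($number, $word) = splice @pairs, 0, 2) {
        my $key = $categoryFor{lc $word} or next;    # skip unknown categories
        $counts{$key} = $number;                     # a later pair overwrites an earlier one
    }
    return \%counts;
}

my $counts = countsFromPairs('beam 15 crew 5 wounded 2 critical to S.S.Kevorkian');
print join(', ', map { "$_=$counts->{$_}" } sort keys %$counts), "\n";
```

Even this small version needs a lookup table and a loop, and it still does nothing about the command name or the ship name.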


        At the moment I have around 20-25 lines, each with a single regex guarding one call to addCommand(). I thus have a strong aversion to post-processing of the matches, which would cause the code to balloon.

        As noted earlier in the thread, I do have a workaround, which is suboptimal but adequate. Optimal would be needing no post-processing at all, because the captures never get stomped on.