in reply to regex, pos, \G, and substr

This seems somewhat simpler, though you might want to strengthen the regex to validate the input more.

#! perl -slw
use strict;

my $data_stg = 'junk text update 8923 mark complete update 8324 mark '
    . 'complete more junk update 5438 and 5843 and 1522 mark '
    . 'complete update 8435 and 9323 mark complete true junk';

$data_stg =~ s[update (.+?) mark complete]{
    join ' ', map{ "update $_ mark complete" } split '\s+and\s+', $1
}ge;

print $data_stg;

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: regex, pos, \G, and substr
by ff (Hermit) on Jun 03, 2007 at 03:02 UTC
    I think it's perceptive to split the guts of the phrases on [ ]and[ ], but it's really important in my case that the leftovers are only digits. While I could throw grep { /^\d+$/ } in front of the split, I'd lose sight of any non-digit stuff that was (mistakenly) there once the replace side of the (s)ubstitute operator had run. In other words, I'd rather leave everything alone if there's anything "non-digit" besides the and splitters in there. BTW, I like the single-quotes for delimiting the split regex.

      That's what I meant by strengthening the regex. Note that the non-conformant additional third line is left untouched:

      #! perl -slw
      use strict;

      my $data_stg = 'junk text update 8923 mark complete update 8324 mark '
          . 'complete more junk update 5438 and 5843 and 1522 mark '
          . 'complete more junk update junk and 5843 and 1522 mark '
          . 'complete update 8435 and 9323 mark complete true junk';

      $data_stg =~ s[update ((?:\d+|\s|and)+) mark complete]{
          join ' ', map{ "update $_ mark complete" } split '\s+and\s+', $1
      }ge;

      print $data_stg;

      __END__
      ## Output wrapped to match input for easier verification.
      junk text update 8923 mark complete update 8324 mark
      complete more junk update 5438 mark complete update 5843 mark complete update 1522 mark
      complete more junk update junk and 5843 and 1522 mark
      complete update 8435 mark complete update 9323 mark complete true junk

      Alternatively, verify that the split values are numeric, produce a warning and put the original back if not:

      #! perl -slw
      use strict;

      my $data_stg = 'junk text update 8923 mark complete update 8324 mark '
          . 'complete more junk update 5438 and 5843 and 1522 mark '
          . 'complete more junk update junk and 5843 and 1522 mark '
          . 'complete update 8435 and 9323 mark complete true junk';

      $data_stg =~ s[(update (.+?) mark complete)]{
          my @numbers = split '\s+and\s+', $2;
          if( grep{ !/^\d+$/ } @numbers ) {
              warn "Malformed request: '$1'\n";
              $1;
          }
          else {
              join ' ', map{ "update $_ mark complete" } @numbers;
          }
      }ge;

      print $data_stg;

      __END__
      ## Output wrapped to match input for easier verification.
      Malformed request: 'update junk and 5843 and 1522 mark complete'
      junk text update 8923 mark complete update 8324 mark
      complete more junk update 5438 mark complete update 5843 mark complete update 1522 mark
      complete more junk update junk and 5843 and 1522 mark
      complete update 8435 mark complete update 9323 mark complete true junk

      BTW, I like the single-quotes for delimiting the split regex.

      Most don't. They consider it a bad habit of mine.
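
For anyone unsure what the single quotes are doing in those split calls: split compiles a plain string in its first argument as a pattern, so the two forms in this small sketch should behave the same. The sample string and the variable names are made up for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the thread's data.
my $list = '5438 and 5843 and 1522';

# A string as split's first argument is used as a regex pattern.
# Inside single quotes, \s stays a literal backslash-s, so the pattern is \s+and\s+.
my @quoted = split '\s+and\s+', $list;

# The conventional /.../ form matches the same separators.
my @slashed = split /\s+and\s+/, $list;

print "@quoted\n";    # 5438 5843 1522
print "@slashed\n";   # 5438 5843 1522
```

The one string that split does treat specially is a single space, ' ', which splits on runs of whitespace and skips leading whitespace. That special case is probably part of why the /.../ form is usually preferred.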


      I'd rather leave everything alone if there's anything "non-digit" besides the and splitters in there.
      Then leave that part the same as in your original looping regex:
      s[update (\d+(?: and \d+)+) mark complete]{...}ge;
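
Spelled out in full, that might look like the sketch below. It assumes the elided {...} is the same join/map/split replacement used in the earlier examples, and it reuses their sample data, including the malformed request.

```perl
use strict;
use warnings;

# Same sample data as the earlier examples, including the malformed request.
my $data_stg = 'junk text update 8923 mark complete update 8324 mark '
    . 'complete more junk update 5438 and 5843 and 1522 mark '
    . 'complete more junk update junk and 5843 and 1522 mark '
    . 'complete update 8435 and 9323 mark complete true junk';

# Assumption: the elided replacement is the join/map/split from earlier.
# The capture only accepts digits separated by ' and ', so a request with
# anything else between 'update' and 'mark complete' never matches and is
# left exactly as it was. Single-number requests do not match either,
# which is fine because they need no rewriting.
$data_stg =~ s[update (\d+(?: and \d+)+) mark complete]{
    join ' ', map { "update $_ mark complete" } split / and /, $1
}ge;

print "$data_stg\n";
```

With this data the result should be the same as in the two earlier runs: the multi-number requests are expanded, and "update junk and 5843 and 1522 mark complete" passes through unchanged, with no warning because the regex simply never matches it.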